Learning Bayesian Networks Using the EM Algorithm

Deqing Li

Introduction


In my research project on deception detection [1], we use Bayesian Networks (BNs) to simulate a group of human experts, since their reasoning processes can be represented as inference over BNs. Deception is detected by predicting each BN's answer and comparing the prediction with the possibly deceptive answer given by the expert. In this project, an alternative, machine-learning approach to deception detection is proposed. Specifically, we first simulate human experts with BNs and then learn their BNs from observations generated from the original BNs. The original BN thus serves as a simulator of an expert, and the learned BN as a model of the expert. To test the reliability of an expert's answer to given evidence, we reason through the model and compare its answer with the expert's answer inferred from the simulator. The architecture of the system is depicted in Figure 1.

Fig. 1 Architecture of Deception Detection by Learning Bayesian Networks

Method

To learn a BN, one needs to specify two things: the structure of the BN and its parameters (the conditional probability tables, or CPTs) [2]. Both can be learned from observations. For example, to learn a BN over random variables (r.v.s) A, B, and C, observations such as {A=1, B=2, C=3} can be used to learn the CPTs given the structure, or to learn the structure itself. The observations are randomly generated from the simulator BNs. However, learning the structure of a BN is much harder than learning its parameters, and learning from incomplete data is harder than from complete data.
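As a concrete illustration of how such observations can be generated, here is a minimal sketch of forward sampling from a toy discrete BN. The network, its probabilities, and the helper names (NETWORK, sample_value, forward_sample) are illustrative assumptions, not the project's actual simulators:

```python
import random

# A toy discrete BN over A, B, C: each node maps to (parents, CPT).
# CPT keys are tuples of parent values; CPT values are distributions
# over the node's states. Structure and numbers are illustrative only.
NETWORK = {
    "A": ((), {(): {0: 0.6, 1: 0.4}}),
    "B": (("A",), {(0,): {0: 0.7, 1: 0.3},
                   (1,): {0: 0.2, 1: 0.8}}),
    "C": (("A", "B"), {(0, 0): {0: 0.9, 1: 0.1},
                       (0, 1): {0: 0.5, 1: 0.5},
                       (1, 0): {0: 0.4, 1: 0.6},
                       (1, 1): {0: 0.1, 1: 0.9}}),
}

def sample_value(dist):
    """Draw one value from a {value: probability} distribution."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r <= acc:
            return value
    return value  # guard against floating-point rounding

def forward_sample(network, order):
    """Sample every variable in topological order (parents first)."""
    obs = {}
    for var in order:
        parents, cpt = network[var]
        key = tuple(obs[p] for p in parents)
        obs[var] = sample_value(cpt[key])
    return obs

observations = [forward_sample(NETWORK, ["A", "B", "C"]) for _ in range(1000)]
```

Incomplete observations can then be produced by simply deleting the hidden variables from each sampled record.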
  • Learning parameters from complete data
With complete data, learning the CPTs reduces to maximizing the log-likelihood of the observations [2]:

$$\ell(\theta : D) = \sum_{m=1}^{M} \sum_{i=1}^{n} \log P\big(x_i[m] \mid \mathrm{pa}_{x_i}[m]\,;\,\theta\big)$$

where $x_i[m]$ denotes the value of the $i$th r.v. in the $m$th observation, and $\mathrm{pa}_{x_i}[m]$ denotes the values of the parents of $x_i$ in that observation. For multinomial CPTs, the maximizing parameters are simply the observed conditional frequencies.
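In code, this maximization reduces to frequency counting. A minimal sketch, assuming observations shaped like those produced by the sampling sketch above (the helper name learn_cpt_mle is hypothetical):

```python
from collections import Counter, defaultdict

def learn_cpt_mle(observations, var, parents):
    """MLE estimate of P(var | parents) by counting:
    theta(x | pa) = N(pa, x) / N(pa)."""
    joint = Counter()     # counts of (parent assignment, var value)
    marginal = Counter()  # counts of parent assignments
    for obs in observations:
        pa = tuple(obs[p] for p in parents)
        joint[(pa, obs[var])] += 1
        marginal[pa] += 1
    cpt = defaultdict(dict)
    for (pa, value), n in joint.items():
        cpt[pa][value] = n / marginal[pa]
    return dict(cpt)

# e.g. recover P(C | A, B) from the sampled observations above:
# cpt_c = learn_cpt_mle(observations, "C", ("A", "B"))
```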
  • Learning parameters from incomplete data
Incomplete data means that some r.v.s are hidden in the observations. In this case, we use the Expectation-Maximization (EM) algorithm [3] to learn the parameters. EM finds maximum-likelihood estimates of the parameters of a statistical model when some of the data are unobserved. It alternates between an E step and an M step: in the E step, we compute the expected log-likelihood based on the current estimate of the unobserved data, and in the M step, we find the parameters that maximize that expected log-likelihood. Each iteration is guaranteed not to decrease the likelihood, so EM converges to a (local) optimum [4].
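To make the two steps concrete, here is a minimal, self-contained sketch of EM for a tiny BN A → B, A → C in which binary A is hidden and B, C are observed. The model, variable names, and initialization are illustrative assumptions, not the project's actual networks:

```python
import random

def bern(p, x):
    """P(X = x) for a Bernoulli variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p

def em_learn(data, iters=50):
    """EM for a toy BN A -> B, A -> C where binary A is hidden and
    B, C are observed. data is a list of dicts like {"B": 0, "C": 1}.
    Returns (P(A=1), [P(B=1|A=a)], [P(C=1|A=a)])."""
    # Random initialization; EM converges to a local optimum of the likelihood.
    p_a = random.uniform(0.3, 0.7)
    p_b = [random.uniform(0.3, 0.7) for _ in (0, 1)]
    p_c = [random.uniform(0.3, 0.7) for _ in (0, 1)]
    for _ in range(iters):
        n_a = [0.0, 0.0]   # expected counts E[N(A=a)]
        n_ab = [0.0, 0.0]  # expected counts E[N(A=a, B=1)]
        n_ac = [0.0, 0.0]  # expected counts E[N(A=a, C=1)]
        # E step: fill in hidden A with its posterior given B and C.
        for obs in data:
            like = [bern(p_a, a) * bern(p_b[a], obs["B"]) * bern(p_c[a], obs["C"])
                    for a in (0, 1)]
            z = like[0] + like[1]
            for a in (0, 1):
                post = like[a] / z  # P(A=a | B, C) under current parameters
                n_a[a] += post
                n_ab[a] += post * obs["B"]
                n_ac[a] += post * obs["C"]
        # M step: re-estimate the CPTs from the expected counts.
        p_a = n_a[1] / len(data)
        p_b = [n_ab[a] / n_a[a] for a in (0, 1)]
        p_c = [n_ac[a] / n_a[a] for a in (0, 1)]
    return p_a, p_b, p_c
```

The same expected-counts pattern generalizes to larger networks, with the posterior over the hidden variables computed by BN inference instead of the explicit two-term sum used here.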
  • Learning structure
The traditional way to learn the structure of a BN is to search the space of structures for one that maximizes the posterior probability of the structure given the data. An exhaustive search has super-exponential complexity in the number of variables, so researchers use heuristics to make the search tractable [5]. Because of the difficulty of structure learning, I may not implement it in this project.
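Though structure learning may be out of scope here, for reference a hedged sketch of the standard greedy hill-climbing heuristic follows. The score argument is an assumed placeholder for a decomposable scoring function (e.g., BIC or a Bayesian score) and is not implemented:

```python
from itertools import permutations

def creates_cycle(edges):
    """True if the directed edge set contains a cycle (Kahn's algorithm)."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for x, v in edges:
            if x == u:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return seen < len(nodes)

def hill_climb(variables, data, score):
    """Greedy structure search: start from the empty graph and repeatedly
    apply the single edge addition or removal that most improves the score,
    stopping at a local maximum. score(edges, data) is a placeholder for a
    decomposable metric; edge reversal and tabu lists are omitted."""
    edges = set()
    while True:
        current = score(edges, data)
        best_delta, best_edges = 0.0, None
        for u, v in permutations(variables, 2):
            candidate = edges - {(u, v)} if (u, v) in edges else edges | {(u, v)}
            if creates_cycle(candidate):
                continue  # every candidate must remain a DAG
            delta = score(candidate, data) - current
            if delta > best_delta:
                best_delta, best_edges = delta, candidate
        if best_edges is None:
            return edges
        edges = best_edges
```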

Timeline
 
Week 1: Implement the code to randomly generate observations from the simulator BNs (both complete and incomplete).
Week 2: Implement the algorithm to learn parameters from complete data.
Week 3: Study the EM algorithm and how to apply it to learning BNs.
Week 4: Implement the algorithm to learn parameters from incomplete data.
Milestone Presentation
Week 5: Test the learned BNs by reasoning with random evidence.
Week 6: Implement the heuristic to detect deception.
Week 7: Finalize and work on the report.

References

[1] E. Santos, Jr., D. Li, and X. Yuan, "On deception detection in multi-agent systems and deception intent", in Proc. SPIE, vol. 6965, Orlando, FL, March 2008.
[2] K. Murphy, "A Brief Introduction to Graphical Models and Bayesian Networks", http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html, 1998.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society B 39, 1–38, 1977.
[4] S. L. Lauritzen, "The EM algorithm for graphical association models with missing data", Computational Statistics and Data Analysis 19, 191–201, 1995.
[5] N. Friedman and D. Koller, "Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks", Machine Learning 50, 95–126, 2003.