**Q:**What is the difference between EM and Variational Bayes (aka "Variational Bayesian EM")?

**A:**Both are iterative methods for learning about the parameters theta and the posterior p(z | x, theta) in a latent variable model p(x, z | theta) by trying to a maximize a lower bound to the marginal likelihood p(x) (or p(x | theta) when using EM). The difference between the two techniques lies in following:

- VB assumes that, even when given a setting of the parameters theta, it is not tractable to compute the posterior p(z | x, theta). Thus, it must be approximated by some variational distribution q(z). On the other hand, EM makes the opposite assumption - given a setting of the parameters theta, it will compute the exact posterior p(z | x, theta). In this sense, VB is a relaxation of EM.
- VB is a bayesian technique while EM is a frequentist technique. As such, VB treats theta as a latent, random variable and computes a distribution q(theta) that approximates the posterior distribution p(theta | x). EM, a maximum likelihood technique, computes a point estimate to the parameters theta instead of a full distribution.

__Objective:__

EM’s objective is to maximize a lower bound to p(x | theta), E_q(z)[log(p(x, z | theta))].

VB’s objective is to maximize a lower bound to p(x), E_q(z, theta)[log(p(x, z, theta))].

__E-step:__

EM first maximizes the objective by fixing theta and then finding an

*optimal*setting of q(z). This optimal setting is always calculated to be the posterior p(z | x, theta).

VM first maximizes the objective by fixing q(theta) and finding a

*better*setting of q(z) (often by some gradient method).

__M-step:__

EM maximizes the objective by fixing q(z) to the posterior from the previous E-step, and then calculates a maximum likelihood point estimate of the parameters theta.

VM maximizes the objective by fixing q(z) and then finding a better setting of q(theta) (again, often by some gradient method).

The E and M step are then repeated until convergence (i.e. until the point at which the log marginal likelihood no longer improves).

One last point: there is another algorithm called "Variational EM". This algorithm is just the E-step of VB combined with the M-step of EM. Since parameters are not treated as random variables, it is not a bayesian technique.

## No comments:

## Post a Comment