###
Common distributions:

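- Beta: conjugate prior for the Bernoulli (and binomial). $\alpha$ and $\beta$ act as pseudo-counts: the larger the total pseudo-count $\alpha+\beta$, the more concentrated (peaked) the Beta distribution is around its mean $\alpha/(\alpha+\beta)$. $\alpha,\beta<1$ gives a U-shaped density that favors sparse values (near 0 or 1).
- Dirichlet: multivariate generalization of the Beta; conjugate prior for the categorical/multinomial distribution.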
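A small sketch of how the pseudo-counts behave, using only the Python standard library. The Beta mean and variance use the standard closed forms, and the Dirichlet draw uses the usual construction of normalizing independent Gamma samples. `beta_mean_var` and `dirichlet_sample` are hypothetical helper names for illustration, not a library API.

```python
import random


def beta_mean_var(alpha: float, beta: float) -> tuple[float, float]:
    """Closed-form mean and variance of Beta(alpha, beta)."""
    total = alpha + beta  # total pseudo-count
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, var


def dirichlet_sample(alphas: list[float], rng: random.Random) -> list[float]:
    """One Dirichlet draw via the normalized-Gamma construction."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


# Same mean (0.5), growing pseudo-count: the variance shrinks, i.e. the density gets more peaked.
for a in (1.0, 10.0, 100.0):
    print(a, beta_mean_var(a, a))

rng = random.Random(0)
# Concentration < 1: most of the mass tends to land on a few components (sparse).
print(dirichlet_sample([0.1] * 5, rng))
# Large concentration: samples stay close to the uniform vector [0.2] * 5.
print(dirichlet_sample([50.0] * 5, rng))
```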

###
Conjugate prior relationships (diagram):

http://www.johndcook.com/conjugate_prior_diagram.html
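As one concrete instance of a conjugate pair, here is the Beta-Bernoulli update as a minimal sketch (the function name is hypothetical). Because the posterior stays in the Beta family, updating reduces to adding observed counts to the pseudo-counts, and batch and one-at-a-time updates agree.

```python
def beta_bernoulli_update(alpha: float, beta: float, observations: list[int]) -> tuple[float, float]:
    """Posterior Beta parameters after observing 0/1 Bernoulli outcomes.

    Beta(alpha, beta) prior x Bernoulli likelihood gives
    Beta(alpha + #ones, beta + #zeros): the posterior stays in the Beta family.
    """
    if any(o not in (0, 1) for o in observations):
        raise ValueError("observations must be 0 or 1")
    ones = sum(observations)
    zeros = len(observations) - ones
    return alpha + ones, beta + zeros


prior = (2.0, 2.0)
data = [1, 0, 1, 1, 1, 0, 1]
print(beta_bernoulli_update(*prior, data))  # (7.0, 4.0)

# Conjugacy also means the update can be applied one observation at a time.
a, b = prior
for obs in data:
    a, b = beta_bernoulli_update(a, b, [obs])
print((a, b))  # (7.0, 4.0)
```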
###
Time Series Analysis

###
EM algorithm:

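Andrew Ng's formulation:
E: construct a lower bound on the log-likelihood using Jensen's inequality. For convex $f$, $E[f(X)] \ge f(E[X])$; since $\log$ is concave the inequality flips, $\log E[X] \ge E[\log X]$, which is what yields the lower bound. Choosing $Q(z) = p(z \mid x; \theta)$ makes the bound tight at the current $\theta$.
M: maximize that lower bound with respect to $\theta$.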
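The E/M steps can be sketched for a two-component 1-D Gaussian mixture. The mixture model, the synthetic data, and all function names are illustrative assumptions, not something these notes specify. The E-step computes the posterior responsibilities (the tight choice of $Q$), and the M-step performs responsibility-weighted maximum likelihood. The recorded log-likelihood should never decrease from one iteration to the next.

```python
import math
import random


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(xs, weights, mus, variances) -> float:
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)))
        for x in xs
    )


def em_gmm_1d(xs, init_mus, n_iter=50):
    """EM for a 1-D Gaussian mixture; returns params and the log-likelihood per iteration.

    Toy sketch: no safeguard against a component collapsing onto a single point.
    """
    k = len(init_mus)
    weights, mus, variances = [1.0 / k] * k, list(init_mus), [1.0] * k
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities Q_i(j) = p(z_i = j | x_i; theta).
        # This choice of Q makes the Jensen lower bound tight at the current theta.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: maximize the lower bound -> responsibility-weighted MLE updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
        history.append(log_likelihood(xs, weights, mus, variances))
    return weights, mus, variances, history


rng = random.Random(42)
xs = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, mus, variances, history = em_gmm_1d(xs, init_mus=[-1.0, 1.0])
print([round(m, 2) for m in mus])  # roughly [-2, 3]
print(history[0] <= history[-1])   # True: EM does not decrease the log-likelihood
```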

###
Resources

Machine Learning Summer School 2009 - Cambridge

Overview of ML methods
- SVM: max-margin classifier
- HMM
- CRF
- MEMM
- SVM-HMM
- HMM-LDA
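- Discriminative vs. generative: discriminative models (e.g. CRF, MEMM) model the posterior $p(y \mid x)$ directly; generative models (e.g. HMM) model the joint distribution $p(x, y) = p(x \mid y)\, p(y)$, i.e. the likelihood plus a class prior.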
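To make the distinction concrete, consider a toy 1-D setting (an illustrative assumption). The generative route specifies $p(x \mid y)$ as Gaussians with a shared variance plus a prior $p(y)$, then obtains $p(y \mid x)$ by Bayes' rule. The discriminative route parameterizes $p(y \mid x)$ directly as a sigmoid. In this particular setting, the generative posterior is exactly a logistic function of $x$, so both routes agree when $w, b$ are set from the Gaussians. A real discriminative model would instead fit $w, b$ from labeled data. All names are hypothetical.

```python
import math


def generative_posterior(x, mu0, mu1, var, prior1):
    """p(y=1 | x) via Bayes' rule from p(x | y) = N(mu_y, var) and p(y=1) = prior1."""
    # The Gaussian normalizer is shared by both classes, so it cancels in the ratio.
    lik0 = math.exp(-((x - mu0) ** 2) / (2 * var))
    lik1 = math.exp(-((x - mu1) ** 2) / (2 * var))
    num = lik1 * prior1
    return num / (num + lik0 * (1 - prior1))


def logistic_posterior(x, w, b):
    """p(y=1 | x) parameterized directly as sigmoid(w*x + b)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


mu0, mu1, var, prior1 = -1.0, 2.0, 1.5, 0.3
# Weights implied by the generative model (from the log-odds of the two Gaussians).
w = (mu1 - mu0) / var
b = (mu0 ** 2 - mu1 ** 2) / (2 * var) + math.log(prior1 / (1 - prior1))
for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, generative_posterior(x, mu0, mu1, var, prior1), logistic_posterior(x, w, b))
```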

sLDA (supervised LDA)
HDP (hierarchical Dirichlet process), Y. W. Teh. EPX
Prior: represents knowledge or belief about an unknown quantity before the data are observed.
Point estimation:
$p(\theta \mid x) = \dfrac{p(x \mid \theta)\, p(\theta)}{p(x)}$
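MLE: choose the $\theta$ that maximizes the likelihood $p(x \mid \theta)$; fits the observed data as closely as possible.
MAP: choose the $\theta$ that maximizes the posterior $p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$. Since $p(x)$ does not depend on $\theta$, this is MLE plus a $\log p(\theta)$ term; with a uniform prior, MAP reduces to MLE.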
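The MLE/MAP contrast can be seen in a small worked case. The sketch below assumes a Bernoulli likelihood with a Beta prior, chosen to tie back to the Beta entry at the top of these notes. The MAP estimate is the mode of the Beta posterior. Both function names are hypothetical.

```python
def bernoulli_mle(heads: int, n: int) -> float:
    """argmax_theta p(x | theta) for n Bernoulli trials with `heads` ones."""
    return heads / n


def bernoulli_map(heads: int, n: int, alpha: float, beta: float) -> float:
    """argmax_theta p(theta | x) under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + heads, beta + n - heads); its mode is
    (a - 1) / (a + b - 2), valid when a >= 1, b >= 1 and a + b > 2.
    """
    a, b = alpha + heads, beta + n - heads
    if a < 1 or b < 1 or a + b <= 2:
        raise ValueError("posterior mode formula needs a >= 1, b >= 1, a + b > 2")
    return (a - 1) / (a + b - 2)


# Three heads in three flips: MLE says the coin always lands heads,
# while a Beta(2, 2) prior pulls the MAP estimate back toward 0.5.
print(bernoulli_mle(3, 3))            # 1.0
print(bernoulli_map(3, 3, 2.0, 2.0))  # 0.8
# With a uniform prior Beta(1, 1), MAP coincides with MLE.
print(bernoulli_map(3, 3, 1.0, 1.0))  # 1.0
```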