A number of widely used machine learning methods, chiefly clustering methods, can be cast in the nonnegative matrix factorization framework with minor variations. Applying matrix factorization to a user-item rating (or document-word frequency) matrix serves two purposes: discovering underlying latent factors and/or predicting the missing values of the matrix.
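
For the prediction use case, here is a minimal sketch with numpy: a hypothetical toy rating matrix (zeros marking missing entries), factored by plain gradient descent on the observed entries only. The data, rank, and hyperparameters are illustrative assumptions, not from any particular library.

```python
import numpy as np

# Hypothetical toy user-item ratings; 0 marks a missing entry.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
observed = R > 0

rng = np.random.default_rng(0)
k = 2                                   # number of latent factors (assumed)
W = rng.random((R.shape[0], k))         # user factors
H = rng.random((k, R.shape[1]))         # item factors

lr, reg = 0.01, 0.02                    # step size and L2 regularization
for _ in range(5000):
    E = observed * (R - W @ H)          # error only where ratings exist
    W += lr * (E @ H.T - reg * W)
    H += lr * (W.T @ E - reg * H)

R_hat = W @ H                           # dense reconstruction fills the missing cells
```

The mask is what distinguishes this from plain SVD: the loss is computed only over observed entries, so the learned factors generalize to the missing ones instead of fitting zeros.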

- A Tutorial on Principal Component Analysis (and its relation to SVD)
- A Unified View of Matrix Factorization Models
- Learning the parts of objects by non-negative matrix factorization compares NMF with PCA and VQ.
- discussion: http://hebb.mit.edu/NMF/
- Take face data as an example: all three methods learn a linear combination of basis images, $V \approx WH$. NMF learns a localized, parts-based representation; VQ learns prototypes, each of which is a whole face; PCA learns eigenfaces, which resemble distorted versions of whole faces.
- Why the difference? In VQ, each weight vector (column of $H$) is unary, so every face is approximated by a single prototype. In PCA, the columns of $W$ are orthogonal, and so are the rows of $H$; entries may take either sign, so subtraction can occur. In contrast, NMF allows only nonnegative entries, forcing purely additive combinations, which matches the intuition of assembling parts into a whole.
- A neural-network interpretation: VQ allows only one hidden node to be active; PCA allows several, but since contributions can be negative it is hard to interpret the semantics of individual nodes. NMF yields a sparse, distributed representation: components are shared across all objects, while each object activates only a few of them. NMF has only one layer, though, while many tasks may require multiple layers.
- SVD? PCA amounts to SVD of the mean-centered data: https://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca Note that plain SVD applies only to a fully observed matrix with no missing values.
- PCA can be disastrous http://blog.explainmydata.com/2012/07/should-you-apply-pca-to-your-data.html
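
The sign structure above is easy to check numerically. A sketch on random stand-in data (not faces): the PCA basis is taken from the SVD of the centered matrix and mixes signs, while NMF, run here via the classic Lee-Seung multiplicative updates for the Frobenius objective, keeps both factors nonnegative.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((50, 30))            # hypothetical nonnegative data matrix

# PCA basis via SVD of the mean-centered data: rows of Vt mix signs,
# so reconstruction uses both addition and subtraction.
Xc = V - V.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# NMF via Lee-Seung multiplicative updates: factors stay nonnegative,
# so every component contributes additively.
k, eps = 5, 1e-9
W = rng.random((V.shape[0], k)) + 0.1
H = rng.random((k, V.shape[1])) + 0.1
err_before = np.linalg.norm(V - W @ H)
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
err_after = np.linalg.norm(V - W @ H)
```

The multiplicative form is what preserves nonnegativity: each update multiplies the current factor by a ratio of nonnegative matrices, so entries can shrink toward zero but never change sign.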

- Why non-negative? It enforces additive components, it encourages sparse representations, and its output is nonnegative, which makes more sense for ratings.
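
The nonnegative-output point shows up clearly against truncated SVD on a sparse nonnegative matrix (random stand-in "ratings", rank chosen arbitrarily; NMF again via Lee-Seung multiplicative updates):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical sparse nonnegative "ratings": roughly 70% of entries are zero.
R = rng.random((20, 10)) * (rng.random((20, 10)) < 0.3)

# Rank-3 truncated SVD: the reconstruction is unconstrained in sign,
# so it dips below zero around the zero entries -- awkward as a rating.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_svd = (U[:, :3] * s[:3]) @ Vt[:3]

# Rank-3 NMF: W @ H is nonnegative by construction.
k, eps = 3, 1e-9
W = rng.random((R.shape[0], k)) + 0.1
H = rng.random((k, R.shape[1])) + 0.1
for _ in range(300):
    H *= (W.T @ R) / (W.T @ W @ H + eps)
    W *= (R @ H.T) / (W @ H @ H.T + eps)
R_nmf = W @ H
```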

- http://www.csie.ntu.edu.tw/~cjlin/libmf/
- http://www.libfm.org/ Factorization Machines
- https://github.com/JohnLangford/vowpal_wabbit/wiki/Matrix-factorization-example