- What is the insight of matrix diagonalization? What is a necessary and sufficient condition for a matrix to be diagonalizable?
- What is the insight of eigenvalue and eigenvector of a matrix?
- What is the insight of singular value? What are the nice properties about SVD?
- All real square matrices have eigenvalues and eigenvectors over the complex field C. A real square matrix may not have real eigenvalues and eigenvectors. What about a non-square matrix?
- All complex square matrices have eigenvalues and eigenvectors over the complex field C. The eigenvalues of a Hermitian matrix are all real (proof?). What about its eigenvectors?
- The Kronecker product and Kronecker sum are tools that can represent a high-dimensional operator as an ordinary 2-dimensional matrix, e.g., the transition probability matrix/infinitesimal generator of a discrete/continuous-time Markov chain whose state is a vector.
- The tensor (Kronecker) product of two matrices, A x B, is a matrix that contains all possible products of two elements, one from A and one from B. That is, A x B = [a_{i,j}*B], a block matrix whose (i,j) block is a_{i,j}*B.
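A quick numerical check of the block-matrix definition above; a minimal sketch in Python/NumPy (the 2-by-2 matrices are arbitrary examples), using `np.kron`:

```python
import numpy as np

# Kronecker product: A x B stacks the blocks a_{i,j} * B.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

K = np.kron(A, B)  # 4x4 block matrix: [[1*B, 2*B], [3*B, 4*B]]

# Top-left block equals a_{0,0} * B = 1 * B.
assert np.array_equal(K[:2, :2], 1 * B)
# Bottom-right block equals a_{1,1} * B = 4 * B.
assert np.array_equal(K[2:, 2:], 4 * B)
```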
- Definiteness of a matrix
    - Positive definite
        - Necessary condition:
            - All the diagonal elements are positive.
        - Necessary and sufficient condition (for a symmetric matrix):
            - All the eigenvalues are positive.
    - Positive semi-definite, a.k.a. non-negative definite
        - Necessary condition:
            - All the diagonal elements are non-negative.
        - Necessary and sufficient condition (for a symmetric matrix):
            - All the eigenvalues are non-negative.
        - A covariance matrix of r.v.'s must be non-negative definite.
            - Proof: Denote a zero-mean random vector \vec{r}=[r_1, r_2, ..., r_n]^T (a column vector). Its covariance matrix is R=E[\vec{r}*\vec{r}^T]. For any nonzero column vector x, x^T*R*x = E[x^T*\vec{r}*\vec{r}^T*x] = E[a^2] >= 0, where a = x^T*\vec{r} is a scalar.
    - Negative definite
        - Necessary condition:
            - All the diagonal elements are negative.
        - Necessary and sufficient condition (for a symmetric matrix):
            - All the eigenvalues are negative.
    - Negative semi-definite, a.k.a. non-positive definite
        - Necessary condition:
            - All the diagonal elements are non-positive.
        - Necessary and sufficient condition (for a symmetric matrix):
            - All the eigenvalues are non-positive.
    - Indefinite
        - Sufficient conditions (either one suffices):
            - The diagonal has both positive and negative elements.
            - The matrix has both positive and negative eigenvalues.
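The eigenvalue tests above can be applied numerically; a minimal sketch in Python/NumPy, assuming a real symmetric matrix (so `np.linalg.eigvalsh` applies and all eigenvalues are real). The tolerance `tol` is an arbitrary choice to absorb floating-point noise:

```python
import numpy as np

def definiteness(A, tol=1e-10):
    """Classify a real symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(A)  # real eigenvalues of a symmetric matrix
    if np.all(w > tol):
        return "positive definite"
    if np.all(w >= -tol):
        return "positive semi-definite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w <= tol):
        return "negative semi-definite"
    return "indefinite"

print(definiteness(np.array([[2.0, 0.0], [0.0, 3.0]])))   # positive definite
print(definiteness(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```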

- Matrix factorization/decomposition:
    - Cholesky factorization for a positive semidefinite symmetric matrix A:
        - A=L*L^T, where L is a lower triangular matrix and L^T is its transpose (an upper triangular matrix).
        - A=L*D*L^T, where D is a diagonal matrix and the diagonal elements of L are all one.
        - Cholesky factorization is very useful in statistical signal processing, since the covariance matrix of samples is positive semidefinite and symmetric.
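A minimal sketch of the signal-processing use case in Python/NumPy (the sample size and dimension are arbitrary): build a sample covariance matrix, which is symmetric positive semi-definite (and, with enough samples, almost surely positive definite), and factor it with `np.linalg.cholesky`:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 3))  # 1000 samples of a 3-dim random vector
R = np.cov(samples, rowvar=False)         # sample covariance: symmetric, PSD

L = np.linalg.cholesky(R)                 # R = L @ L.T, L lower triangular
assert np.allclose(L @ L.T, R)
assert np.allclose(L, np.tril(L))         # L is indeed lower triangular
```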

- QR factorization of a square matrix A (definiteness not required)
    - A=QR, where Q is an orthogonal matrix and R is an upper triangular matrix.
    - Geometric interpretation: the columns of Q are basis vectors, and the columns of R are the coordinates of the columns of A w.r.t. that basis (the columns of Q).
- Given a matrix A, how to compute QR?
- Gram-Schmidt procedure: performs poorly under round-off errors.
- Householder transformation (reflection): performs well under round-off errors, compared to Gram-Schmidt procedure.
- Givens transformation (rotation): performs well under round-off errors, compared to Gram-Schmidt procedure.
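A minimal sketch in Python/NumPy contrasting the two routes: a hand-rolled classical Gram-Schmidt QR (for illustration only; it is the numerically fragile variant noted above) next to `np.linalg.qr`, which uses Householder reflections via LAPACK. The 3-by-2 matrix is an arbitrary example:

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR (numerically fragile; for illustration only)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # coordinate of column j along basis vector i
            v -= R[i, j] * Q[:, i]        # remove the component along Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
Q, R = gram_schmidt_qr(A)
assert np.allclose(Q @ R, A)              # columns of A = basis * coordinates
assert np.allclose(Q.T @ Q, np.eye(2))    # orthonormal columns

Q2, R2 = np.linalg.qr(A)                  # Householder-based, numerically stable
assert np.allclose(Q2 @ R2, A)
```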

- Singular Value Decomposition (SVD) of an n-by-m matrix A (definiteness not required)
    - A=U*\Lambda*V^H, where U and V are unitary matrices, V^H is the conjugate (Hermitian) transpose of V, and \Lambda is an n-by-m rectangular diagonal matrix of singular values. Singular values are non-negative (and may be zero).
    - Easy to obtain the pseudoinverse: A^# = V*\Lambda^{-1}*U^H, where \Lambda^{-1} denotes the m-by-n matrix obtained by transposing \Lambda and inverting only its nonzero singular values (zeros stay zero).
- Easy to obtain projection onto the column space of A: P_A=A*A^#
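Both constructions above can be checked numerically; a minimal sketch in Python/NumPy (the rank-2, 3-by-2 matrix is an arbitrary example, and `tol` is an arbitrary cutoff for "nonzero" singular values):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])            # rank-2, 3-by-2 matrix

U, s, Vh = np.linalg.svd(A)           # A = U @ Sigma @ Vh; s holds singular values
assert np.all(s >= 0)                 # singular values are non-negative

# Pseudoinverse: invert only the nonzero singular values.
tol = 1e-12
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]
A_pinv = Vh.T @ np.diag(s_inv) @ U[:, :len(s)].T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# Projection onto the column space of A.
P = A @ A_pinv
assert np.allclose(P @ P, P)          # idempotent, as a projection must be
assert np.allclose(P @ A, A)          # leaves the columns of A fixed
```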

Purpose of matrix factorization:

Matrix factorization in **numerical linear algebra** (NLA) typically serves the purpose of restating a given problem in such a way that it can be **solved more readily**; for example, one major application is **solving a linear system of equations**.
- What is the property of a companion matrix? Every eigenvalue has geometric multiplicity one (the matrix is nonderogatory), even when its algebraic multiplicity is larger.
- A necessary and sufficient condition for a matrix to be diagonalizable is that the geometric multiplicity of each eigenvalue equals its algebraic multiplicity. The geometric multiplicity of an eigenvalue is the dimension of its eigenspace (the span of its eigenvectors). The algebraic multiplicity is its multiplicity as a root of the characteristic polynomial.
- If the above condition is not satisfied, the matrix is not diagonalizable; its Jordan canonical form then contains at least one Jordan block of size greater than one.
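A minimal numerical sketch of the two multiplicities in Python/NumPy, using a 2-by-2 Jordan block as the example (floating-point eigenvalue and rank tests need a tolerance, handled here by `np.isclose` and `matrix_rank`):

```python
import numpy as np

# Jordan block: eigenvalue 2 with algebraic multiplicity 2 but geometric
# multiplicity 1, so this matrix is NOT diagonalizable.
J = np.array([[2.0, 1.0],
              [0.0, 2.0]])

eigvals = np.linalg.eigvals(J)
lam = 2.0
# Algebraic multiplicity: multiplicity as a root of the characteristic polynomial.
alg_mult = int(np.sum(np.isclose(eigvals, lam)))
# Geometric multiplicity: dimension of the eigenspace = n - rank(J - lam*I).
geo_mult = J.shape[0] - np.linalg.matrix_rank(J - lam * np.eye(2))

print(alg_mult, geo_mult)  # 2 1 -> geometric < algebraic, not diagonalizable
```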
- **Pseudo-inverse** of a matrix A
    - Left pseudo-inverse, a.k.a. generalized inverse, pseudo-inverse, and Moore-Penrose inverse (A^{+}*A=I makes A^{+} a left inverse of A; it exists when A has full column rank)
        - A^{+} = (A^T*A)^{-1}*A^T
        - A^{+}*A=I
        - Least-squares solution (minimum squared error) of an over-constrained (overdetermined) system of linear equations
          Proof: solve min (A*x-b)^T*(A*x-b). Assume A has full column rank, so that A^T*A is invertible.

Zero the derivative of the objective function w.r.t. x and then we have

2* A^T* (A*x-b) =0

A^T* A*x = A^T* b

x=(A^T*A)^{-1}*A^T*b = A^{+}* b (since A^T*A is a square matrix and invertible)
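The normal-equation derivation above can be checked numerically; a minimal sketch in Python/NumPy (the 3-by-2 system is an arbitrary example), comparing against `np.linalg.lstsq`:

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal-equation solution x = (A^T A)^{-1} A^T b (A has full column rank).
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver (SVD-based, more robust numerically).
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_ne, x_ls)
```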

    - Right pseudo-inverse, a.k.a. matrix 1-inverse (A*A^{-}=I makes A^{-} a right inverse of A; it exists when A has full row rank)
        - A^{-} = A^T*(A*A^T)^{-1}
        - A*A^{-} = I
        - Minimum-norm solution of an under-constrained (underdetermined) system of linear equations
          Proof: solve min x^T*x subject to A*x=b. Assume A has full row rank, so that A*A^T is invertible.

Lagrange multiplier method: L=x^T*x+\lambda^T*(A*x-b)

dL/dx= 2*x+ A^T*\lambda =0 -> x= - A^T*\lambda/2

A*x= - A*A^T*\lambda/2=b -> \lambda = -2*(A*A^T)^{-1}*b (since A*A^T is a square matrix and invertible)

x=- A^T*\lambda/2= A^T*(A*A^T)^{-1}*b = A^{-}* b

- In the solution space (which contains infinitely many solutions), the point x_0 with minimum norm is unique. Draw a figure showing the vector from the origin to the point x_0. It is unique because it is the global optimum of a strictly convex function over a convex constraint set (refer to the KKT sufficient conditions).
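The minimum-norm formula derived above can be checked numerically; a minimal sketch in Python/NumPy (the 1-by-3 system is an arbitrary example), compared against `np.linalg.pinv`:

```python
import numpy as np

# Underdetermined system: 1 equation, 3 unknowns (a plane of solutions).
A = np.array([[1.0, 2.0, 2.0]])
b = np.array([3.0])

# Minimum-norm solution x = A^T (A A^T)^{-1} b (A has full row rank).
x_mn = A.T @ np.linalg.solve(A @ A.T, b)

assert np.allclose(A @ x_mn, b)                  # it solves the system
assert np.allclose(x_mn, np.linalg.pinv(A) @ b)  # matches the pseudoinverse solution
```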

- Application to solving a system of linear equations A*x=b
    - If no exact solution x exists because the system is over-constrained, use a linear regression model (stochastic static linear model) A*x+n=b, where n is a Gaussian noise vector whose elements are independent of each other. Least-squares (LS) estimation then gives the LS solution x=A^{+}*b.
