Statistical Signal Processing (Detection and Estimation)

The goal of hypothesis testing (detection) is to decide which of n hypotheses is true, where n can be any integer greater than 1.  A hypothesis is a statement about a random event.  Note that a hypothesis need not be a random variable, which is a function mapping an elementary event to a real number.
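For concreteness, a binary (n = 2) detector is commonly implemented as a likelihood ratio test. The sketch below assumes Gaussian observations with known means under each hypothesis; the means, variance, and threshold are illustrative choices, not from these notes.

```python
import math

def likelihood_ratio_test(x, mu0=0.0, mu1=1.0, sigma=1.0, threshold=1.0):
    """Decide H1 (return 1) if p(x|H1)/p(x|H0) > threshold, else H0 (return 0).

    For equal-variance Gaussian likelihoods, the log-ratio is linear in x.
    """
    log_lr = ((mu1 - mu0) * x - 0.5 * (mu1**2 - mu0**2)) / sigma**2
    return 1 if log_lr > math.log(threshold) else 0
```

With these default values the decision boundary sits at x = 0.5, halfway between the two hypothesized means.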

The goal of estimation is to extract a quantity of interest from noisy data/signals.  Estimation problems can be classified into the following categories: 

Key techniques: all detection/estimation problems are optimization problems with respect to a certain criterion, as follows.  So whenever someone talks about an optimal detector/estimator, don't forget to ask which optimality criterion is used.

Types of estimators

Minimum Description Length (MDL) model selection

If someone asks you whether a Markov chain is a reasonable model for a time series analysis, you should ask: compared to what?  Without a baseline solution, anything could be a reasonable starting point.  So the first step in modeling is selecting a model.  The second step is then to estimate the parameters of that model.

MDL can be used as a criterion for model selection.  It measures the universal code length of the data, given a model.  One can compare two different models by their MDL on the available data and favor the one with the shorter code length.  However, it is not clear how to optimize over a model set (a functional space) containing infinitely many models based on MDL.  Note that MDL is used for comparing given models in a parameterized family, rather than for searching for the best model over the whole functional space.
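As a rough illustration of comparing two given models by code length, the sketch below uses the common BIC-style approximation of two-part MDL: the negative log-likelihood of the data plus (k/2) log n nats for k parameters. The polynomial-degree comparison is an invented example, not from these notes.

```python
import numpy as np

def mdl_score(y, y_hat, k):
    """BIC-style two-part MDL approximation: codelength of the residuals
    under a Gaussian model, plus (k/2) log n nats for the k parameters."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    neg_log_lik = 0.5 * n * np.log(rss / n)   # Gaussian codelength, up to constants
    return neg_log_lik + 0.5 * k * np.log(n)

# Compare two polynomial models on the same data; prefer the smaller score.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + rng.normal(scale=0.1, size=50)  # data generated by a linear model

scores = {}
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    scores[degree] = mdl_score(y, np.polyval(coeffs, x), degree + 1)
```

On this data the degree-5 polynomial fits the noise slightly better, but its extra parameters cost more in code length than they save, so the linear model gets the shorter description.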

Kalman filters require 1) a linear system equation and 2) Gaussian noise.   The linearity of the system equation preserves the Gaussianity of the state process (a Gauss-Markov process; in fact, an AR process).  Since the state process is Gauss-Markov, a Kalman filter only has to propagate the mean and covariance.  Kalman filters can also estimate the state of a time-varying linear system: just change the system matrices from constant ones to time-varying ones.
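A minimal sketch of the propagate-mean-and-covariance idea; the matrix names (F, Q, H, R) follow common textbook notation rather than these notes, and supplying different matrices at each call handles the time-varying case.

```python
import numpy as np

def kalman_step(m, P, y, F, Q, H, R):
    """One predict/update cycle: propagate the mean m and covariance P
    through the linear dynamics, then correct them with the measurement y.
    Passing different F, Q, H, R at each step handles time-varying systems."""
    # Predict: push the Gaussian through the linear dynamics
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q
    # Update: condition on the new measurement
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    return m_new, P_new
```

Because the posterior stays Gaussian, these two moments are a complete description of the filter's belief about the state.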

Graphical Models

"Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering -- uncertainty and complexity -- and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity -- a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms.

Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism -- examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism. This view has many advantages -- in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely. Moreover, the graphical model formalism provides a natural framework for the design of new systems." --- Michael Jordan, 1998.

EM algorithm vs. BCJR

EM is a sub-optimal algorithm for computing maximum-likelihood or MAP estimates.  What EM obtains is a fixed point (a stationary point of the likelihood), rather than the desired global optimum.  Each EM iteration is guaranteed not to decrease the likelihood, so the algorithm converges to such a fixed point. 
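A minimal EM sketch for a two-component, unit-variance 1-D Gaussian mixture (an illustrative model choice, not from these notes): the likelihood never decreases across iterations, but which fixed point is reached depends on the initialization.

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances.
    Returns the mixture weights and component means at the fixed point."""
    mu = np.array([x.min(), x.max()])   # crude initialization from the data range
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form re-estimation of weights and means
        w = r.mean(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    return w, mu
```

Swapping the initialization (e.g., both means started at the same point) can send the iteration to a different, worse fixed point, which is exactly the sub-optimality noted above.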

BCJR is an algorithm that computes a posteriori probabilities exactly. 

EM is an iterative algorithm, while BCJR is a sequential algorithm: one forward pass and one backward pass over the trellis, with no iteration.
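On a trellis, BCJR is the forward-backward algorithm. The sketch below computes exact posterior state probabilities for a small discrete hidden Markov model in one forward and one backward pass; the discrete-HMM setting and variable names (alpha, beta) are standard textbook conventions, not taken from these notes.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Exact posteriors p(state_t | all observations) for a discrete HMM.
    pi: initial state distribution, A: transition matrix, B: emission matrix,
    obs: sequence of observation indices.  No iteration: one forward pass
    (alpha) and one backward pass (beta)."""
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

Unlike EM, nothing here is repeated until convergence: the two passes already yield the exact posteriors.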