# Bayesian Linear Regression

Linear regression is a simple machine learning method
in which each datapoint is a pair of vectors: the input vector and
the output vector. In the simplest case, linear regression assumes
that the k'th output vector was formed as a linear combination of
the components of the k'th input vector plus a constant
term, with Gaussian noise then added.
Classical linear regression can then be used to identify, based on
the data, the best fitting linear relationship between the inputs
and outputs. It turns out that this is an efficient process (at least for
fewer than around 30 or 40 inputs) because it simply involves building
two matrices from the data and then solving a DxD system of linear
equations where D is (1 + the number of inputs).
Instead of performing linear regression on the raw inputs, it is
almost as easy to perform regression on a vector of basis functions
of the inputs. This gives rise to polynomial regression or radial basis
function regression (with fixed centers). Linear function approximation is
simple but has many nice properties.
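The "two matrices, then a DxD solve" recipe above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names `design_matrix`, `fit_least_squares`, and `polynomial_basis` are hypothetical, and the two matrices are assumed to be the usual X^T X and X^T Y of the normal equations.

```python
import numpy as np

def design_matrix(inputs):
    """Prepend a column of ones so the constant term is just another coefficient."""
    inputs = np.asarray(inputs, dtype=float)
    if inputs.ndim == 1:
        inputs = inputs[:, None]
    return np.column_stack([np.ones(len(inputs)), inputs])

def fit_least_squares(features, outputs):
    """Build X^T X and X^T Y from the data, then solve the D x D system."""
    X = design_matrix(features)            # N x D, with D = 1 + number of inputs
    Y = np.asarray(outputs, dtype=float)   # N, or N x (number of outputs)
    XtX = X.T @ X                          # first matrix, D x D
    XtY = X.T @ Y                          # second matrix
    return np.linalg.solve(XtX, XtY)       # coefficients, constant term first

def polynomial_basis(x, degree):
    """Map a scalar input to the fixed basis functions [x, x^2, ..., x^degree]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** p for p in range(1, degree + 1)])
```

Regression on basis functions reuses the same solver: calling `fit_least_squares(polynomial_basis(x, 2), y)` fits a quadratic in `x`, because the model is still linear in its coefficients.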

**Bayesian** linear regression provides a fairly natural mechanism
for coping with insufficient or poorly distributed data. It allows
you to put a prior on the coefficients and on the noise so that, in
the absence of data, the priors can take over. More importantly, you
can ask Bayesian linear regression which parts (if any) of its fit
to the data it is confident about, and which parts are very uncertain
(perhaps based entirely on the priors). Specifically, you can ask
it the following:

- What is the estimated linear relation, what is the confidence on that,
and what is the full posterior distribution on that?
- What is the estimated noise and the posterior distribution on that?
- What is the estimated gradient and the posterior distribution on that?
- With more numerical effort you can also ask about the direction of
steepest ascent and the distribution on that. Also (if doing
quadratic regression), you can ask about the location of an
optimum or saddle-point and the distribution on that.
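One way to make the first three questions concrete is the conjugate normal-inverse-gamma update, sketched below. This is an assumption chosen for illustration, not necessarily the prior the authors use: the output is taken to be scalar, the coefficients have prior N(m0, sigma^2 V0) given the noise variance sigma^2, and sigma^2 has an inverse-gamma prior with shape `a0` and scale `b0`. The function name `bayes_linreg` and all parameter names are hypothetical. `X` is expected to already contain a leading column of ones for the constant term.

```python
import numpy as np

def bayes_linreg(X, y, m0, V0, a0, b0):
    """Posterior parameters under a normal-inverse-gamma prior (scalar output)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m0 = np.asarray(m0, dtype=float)
    V0_inv = np.linalg.inv(np.asarray(V0, dtype=float))

    Vn_inv = V0_inv + X.T @ X                  # prior precision plus data precision
    Vn = np.linalg.inv(Vn_inv)
    mn = Vn @ (V0_inv @ m0 + X.T @ y)          # posterior mean of the coefficients

    an = a0 + len(y) / 2.0                     # inverse-gamma shape for the noise
    bn = b0 + 0.5 * (y @ y + m0 @ V0_inv @ m0 - mn @ Vn_inv @ mn)
    return mn, Vn, an, bn
```

Under these assumptions, `mn` is the estimated linear relation, and its marginal posterior is a multivariate Student-t with `2 * an` degrees of freedom and scale matrix `(bn / an) * Vn`. The noise variance has posterior mean `bn / (an - 1)` when `an > 1`. For a purely linear model the gradient with respect to the inputs is just the non-constant coefficients `mn[1:]`, so their posterior is also the posterior on the gradient. With no data at all the function returns the prior parameters unchanged, which is the "priors take over" behavior described above.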

We like to use Bayesian regression analysis in conjunction with
locally weighted queries.
