# Bayesian Linear Regression

Linear Regression is a very simple machine learning method in which each datapoints is a pair of vectors: the input vector and the output vector. In the simplest case linear regression assumes that the k'th output vector was formed as some linear combination of the components of the k'th input vector plus a constant term, and then Gaussian noise was added. Classical linear regression can then be used to identify, based on the data, the best fitting linear relationship between the inputs and outputs. It turns out that this is an efficient process (at least for fewer than around 30 or 40 inputs) because it simply involves building two matrices from the data and then solving a DxD system of linear equations where D is (1 + the number of inputs).

Instead of performing linear regression on the raw inputs it is almost as easy to perform regression on a vector of basis functions. This gives rise to polynomial regression or radial basis function regression (with fixed centers). Linear function approximation is simple but has many nice properties. Read more about it here.

Bayesian linear regression allows a fairly natural mechanism to survive insufficient data, or poor distributed data. It allows you to put a prior on the coefficients and on the noise so that in the absence of data, the priors can take over. More importantly, you can ask Bayesian linear regression which parts (if any) of its fit to the data is it confident about, and which parts are very uncertain (perhaps based entirely on the priors). Specifically, you can ask it about

• What is the estimated linear relation, what is the confidence on that, and what is the full posterior distribution on that?
• What is the estimated noise and the posterior distribution on that?
• What is the estimated gradient and the posterior distribution on that?
• With more numerical effort you can also ask about the direction of steepest ascent and the distribution on that. Also (if doing quadratic regression), you can ask about the location of an optimum or saddle-point and the distribution on that.

We like to use Bayesian regression analysis in conjunction with locally weighted queries.