
Bayesian Learning and Forecasting

The goal of Bayesian forecasting is to obtain the predictive distribution $P(y_{n+1} \mid D)$, where $y_{n+1}$ is the variable we want to predict at time $n+1$ and $D$ is the knowledge we have, expressed in a database that must at least contain the past values $y_1, \ldots, y_n$. In the general case:

\begin{equation}
P(y_{n+1} \mid D) = P(y_{n+1} \mid x_1, \ldots, x_{n+1}, y_1, \ldots, y_n)
\end{equation}

where $x$ is the "cause" of a causal (or multivariate) forecasting model. So finally $P(y_{n+1} \mid x_{n+1}, D)$ is the predictive distribution.
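
As a purely illustrative sketch (the numbers and variable names below are hypothetical, not from any dataset discussed here), the database $D$ of a causal model can be pictured as past (cause, value) pairs, and forecasting means characterizing the whole distribution of the next value given the next cause:

# Past observations: D = {(x_i, y_i)}, i = 1..n; x_i is the exogenous
# "cause", y_i the value of the series being forecast.
D = [(15.2, 101.0), (17.8, 104.5), (16.1, 103.2), (18.4, 107.1)]

x_next = 19.0   # known (or assumed) cause at time n+1
# Goal: the full distribution P(y_{n+1} | x_next, D),
# not just a point forecast of y_{n+1}.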

Model fitting is done in the Bayesian framework by writing the probability of the parameters $w$ given the data $D$ and the model $H$ as

\begin{equation}
P(w \mid D, H) = \frac{P(D \mid w, H)\, P(w \mid H)}{P(D \mid H)}
\end{equation}

which is equivalent to saying that

\[
\mathrm{posterior} = \frac{\mathrm{likelihood} \times \mathrm{prior}}{\mathrm{evidence}} .
\]

$P(D \mid H)$ at the first level of inference is just a normalizing constant, but at the second level it is called the evidence.
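
To make the two levels of inference concrete, the following is a minimal numerical sketch, not taken from the paper: a toy model $H$ with a single parameter $w$, a Gaussian likelihood and prior, and a grid in place of analytic integration. The same normalizing constant computed at the first level is the quantity that would be compared across models at the second level.

import numpy as np

# Toy data, assumed drawn from N(w, sigma^2) with known noise sigma;
# w is the single parameter of the (hypothetical) model H.
D = np.array([0.9, 1.1, 1.3, 0.8])
sigma = 0.5

# Grid over the parameter w and a Gaussian prior P(w | H) with std 2.
w = np.linspace(-5.0, 7.0, 2401)
dw = w[1] - w[0]
prior = np.exp(-0.5 * (w / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

# Likelihood P(D | w, H): product of Gaussian densities of the data.
log_lik = np.zeros_like(w)
for d in D:
    log_lik += -0.5 * ((d - w) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
likelihood = np.exp(log_lik)

# First level: P(w | D, H) = P(D | w, H) P(w | H) / P(D | H).
# Here the denominator is just a normalizing constant ...
evidence = np.sum(likelihood * prior) * dw
posterior = likelihood * prior / evidence

# ... but at the second level P(D | H) is the evidence used to compare models.
print("evidence P(D|H):", evidence)
print("posterior mean :", np.sum(w * posterior) * dw)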

The evaluation of this distribution involves marginalizing over all levels of uncertainty: model selection ($H$), hyperparameters ($\alpha$) and parameters ($w$).

\[
P(y_{n+1} \mid x_{n+1}, D) = \sum_{H} P(y_{n+1} \mid x_{n+1}, D, H)\, P(H \mid D)
\]

\begin{equation}
P(y_{n+1} \mid x_{n+1}, D, H) = \int\!\!\int P(y_{n+1} \mid x_{n+1}, w, \alpha, H)\, P(w, \alpha \mid D, H)\, dw\, d\alpha
\end{equation}

where $w$ are the parameters (weights and biases) and $\alpha$ are the hyperparameters, such as the noise level and the parameters of the prior over $w$.

The evaluation of $P(y_{n+1} \mid x_{n+1}, w, \alpha, H)$ only requires a single forward pass through the network.
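
For instance, with a small feed-forward network and a Gaussian noise model (both assumed here purely for illustration, with hypothetical layer sizes and names), this conditional distribution is obtained by one forward pass followed by evaluating a Gaussian density whose precision is a noise hyperparameter:

import numpy as np

def forward(x, w):
    """One forward pass of a small 1-hidden-layer tanh network;
    w holds the weights and biases (the parameters)."""
    h = np.tanh(w["W1"] @ x + w["b1"])
    return (w["W2"] @ h + w["b2"])[0]

def cond_density(y, x, w, beta):
    """P(y | x, w, beta): Gaussian centred on the network output,
    with the noise precision beta playing the role of a hyperparameter."""
    mu = forward(x, w)
    return np.sqrt(beta / (2.0 * np.pi)) * np.exp(-0.5 * beta * (y - mu) ** 2)

# Example with 3 inputs and 5 hidden units; the weights are random stand-ins.
rng = np.random.default_rng(0)
w = {"W1": rng.normal(size=(5, 3)), "b1": np.zeros(5),
     "W2": rng.normal(size=(1, 5)), "b2": np.zeros(1)}
x_new = np.array([0.2, -1.0, 0.5])
print(cond_density(1.0, x_new, w, beta=4.0))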

Typically, marginalization over $w$ and $H$ affects the predictive distribution (eq. 3) significantly, but integration over the hyperparameters $\alpha$ has a lesser effect. Marginalization can rarely be done analytically. The alternatives are Gaussian approximations [5], [6] and Monte Carlo methods [7].
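
Below is a minimal sketch of the Monte Carlo alternative, under the assumption that samples from the posterior over $(w, \alpha)$ are already available (in practice they would come from a Gaussian approximation [5], [6] or from a sampler as in [7]); the model is reduced to a toy scalar one so the example stays self-contained:

import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for samples from the posterior P(w, alpha | D, H); here they are
# simply fabricated for illustration rather than produced by an actual sampler.
w_samples = rng.normal(loc=1.0, scale=0.3, size=500)       # toy scalar weight
beta_samples = rng.gamma(shape=10.0, scale=0.4, size=500)  # noise precisions

def cond_density(y, x, w, beta):
    """P(y | x, w, beta) for a toy linear 'network' y ~ N(w * x, 1/beta)."""
    return np.sqrt(beta / (2.0 * np.pi)) * np.exp(-0.5 * beta * (y - w * x) ** 2)

# Monte Carlo version of eq. 3: average the conditional density over the
# posterior samples instead of integrating over w and alpha analytically.
x_new = 2.0
y_grid = np.linspace(-1.0, 5.0, 601)
dy = y_grid[1] - y_grid[0]
pred = np.mean([cond_density(y_grid, x_new, w, b)
                for w, b in zip(w_samples, beta_samples)], axis=0)

print("predictive mean of y_{n+1}:", np.sum(y_grid * pred) * dy)

With $S$ posterior samples the estimate converges at the usual $O(1/\sqrt{S})$ Monte Carlo rate, independently of the dimension of $w$, which is why sampling remains attractive when the analytic integrals are intractable.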

Complex models may overfit the data, and they also carry an obvious extra computational cost. Bayesian inference can help tackle the problems of complexity, uncertainty and selection of probabilistic models.


