The goal of Bayesian forecasting is to obtain the predictive distribution
\[
P(y_{n+1} \mid D),
\]
where $y_{n+1}$ is the variable we want to predict at time $n+1$ and $D$ is the knowledge we have, expressed in a database that must at least contain the past values $y_1, \dots, y_n$. In the general case
\[
D = \{(x_1, y_1), \dots, (x_n, y_n)\},
\]
where $x$ is the "cause" of a causal (or multivariate) forecasting model. So finally
\[
P(y_{n+1} \mid x_{n+1}, D)
\]
is the predictive distribution.
Model fitting is done in the Bayesian framework by writing the probability of a parameter vector $w$ given the data $D$ and the model $H$ as
\[
P(w \mid D, H) = \frac{P(D \mid w, H)\, P(w \mid H)}{P(D \mid H)},
\]
which is equivalent to saying that
\[
\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}.
\]
The denominator $P(D \mid H)$ is, at the first inference level, just a normalizing constant, but at the second level it is called the evidence.
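To make this explicit, the second-level formula (not written out above, so this is a sketch of the standard reading) applies Bayes' rule one level up, to the models themselves:
\[
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)} \;\propto\; P(D \mid H)\, P(H),
\]
so the quantity that merely normalizes the parameter posterior at the first level becomes, at the second level, the evidence that ranks candidate models $H$.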
The evaluation of this distribution involves marginalizing over all levels of uncertainty: model selection ($H$), hyperparameters ($\alpha$, $\beta$) and parameters ($w$):
\[
P(y_{n+1} \mid x_{n+1}, D) = \sum_{H} \int\!\!\int\!\!\int P(y_{n+1} \mid x_{n+1}, w, \alpha, \beta, H)\, P(w, \alpha, \beta, H \mid D)\, \mathrm{d}w\, \mathrm{d}\alpha\, \mathrm{d}\beta,
\]
where $w$ are the parameters (weights and biases) and $\alpha$, $\beta$ are the hyperparameters controlling, respectively, the prior over the parameters and the noise.
The evaluation of $P(y_{n+1} \mid x_{n+1}, w, \alpha, \beta, H)$ only requires a single pass through the network.
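For instance, under a Gaussian noise assumption (an illustrative choice; the noise model is not specified above), with $f(x; w)$ denoting the network output,
\[
P(y_{n+1} \mid x_{n+1}, w, \beta, H) = \mathcal{N}\!\left(y_{n+1};\; f(x_{n+1}; w),\; \beta^{-1}\right),
\]
so evaluating this term only needs the single forward pass that computes $f(x_{n+1}; w)$.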
Typically, marginalization over $w$ and $H$ affects the predictive distribution above significantly, but integration over the hyperparameters $\alpha$ and $\beta$ has a lesser effect.
Marginalization can rarely be done analytically. The alternatives are Gaussian approximations [5], [6] and Monte Carlo methods [7].
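As an illustration of the Monte Carlo alternative, the following Python sketch approximates the predictive mean and variance by averaging network predictions over posterior weight samples. The network f, the stand-in weight sampler, the noise precision beta, and the input x_next are all assumptions made for this example; in practice the weight samples would come from an actual posterior sampler of the kind referenced above [7].

import numpy as np

rng = np.random.default_rng(0)

def f(x, w):
    # One-hidden-layer network output for input x under weight sample w = (W1, b1, W2, b2).
    W1, b1, W2, b2 = w
    return np.tanh(x @ W1 + b1) @ W2 + b2

def sample_weights():
    # Stand-in for draws from P(w | D, H); real samples would come from the
    # Gaussian approximation or Monte Carlo methods cited above.
    return (rng.normal(size=(2, 5)), rng.normal(size=5),
            rng.normal(size=(5, 1)), rng.normal(size=1))

posterior_samples = [sample_weights() for _ in range(200)]
beta = 10.0                        # assumed noise precision (hyperparameter)
x_next = np.array([[0.3, -1.2]])   # x_{n+1}, the known "cause" at time n+1

# Monte Carlo approximation of the predictive distribution:
# P(y_{n+1} | x_{n+1}, D) ~ (1/S) * sum_s P(y_{n+1} | x_{n+1}, w_s)
preds = np.array([f(x_next, w).item() for w in posterior_samples])
pred_mean = preds.mean()                 # predictive mean
pred_var = preds.var() + 1.0 / beta      # parameter uncertainty plus noise (law of total variance)
print(f"predictive mean ~ {pred_mean:.3f}, predictive std ~ {np.sqrt(pred_var):.3f}")

Each weight sample costs one forward pass, consistent with the remark above that evaluating the conditional term is cheap; the expense lies in obtaining the posterior samples themselves.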
Complex models might overfit the data, and they also carry an obvious extra computational cost. Bayesian inference can help tackle the problems of complexity, uncertainty and selection of probabilistic models.