next up previous
Next: Results Up: Benchmarking Bayesian neural networks Previous: Markov Chain Monte Carlo

 

Data and Methods

The Mackey-Glass equation was originally developed to model the production of white blood cells. It is a time-delay ordinary differential equation:

\[ \frac{dx}{dt} = \frac{a\,x(t-\tau)}{1 + x(t-\tau)^{c}} - b\,x(t) \]

Choosing the delay \(\tau\) large enough (\(\tau = 17\) is the standard benchmark choice), the equation becomes chaotic and only short-term forecasts are feasible.

Integrating the equation over the interval \([t, t+\Delta t]\) by the trapezoidal rule yields, after rearranging terms:

\[ (2 + b\,\Delta t)\,x(t+\Delta t) = (2 - b\,\Delta t)\,x(t) + a\,\Delta t \left[ \frac{x(t+\Delta t-\tau)}{1 + x(t+\Delta t-\tau)^{c}} + \frac{x(t-\tau)}{1 + x(t-\tau)^{c}} \right] \]

\[ x(t+\Delta t) = \frac{(2 - b\,\Delta t)\,x(t) + a\,\Delta t \left[ \dfrac{x(t+\Delta t-\tau)}{1 + x(t+\Delta t-\tau)^{c}} + \dfrac{x(t-\tau)}{1 + x(t-\tau)^{c}} \right]}{2 + b\,\Delta t} \]

The constants were assigned the values a = 0.2, b = 0.1 and c = 10, with an integration step \(\Delta t\). The resulting series is sampled every 6 points.
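As a sketch, the discretized trapezoidal update can be implemented as follows. This is an illustrative reimplementation, not the code used in the paper; the integration step dt = 0.1, the delay tau = 17 and the constant initial history x = 1.2 are assumptions.

```python
# Illustrative sketch of Mackey-Glass integration by the trapezoidal
# update derived above.  dt = 0.1, tau = 17 and the constant initial
# history x = 1.2 are assumptions, not values taken from the text.
a, b, c = 0.2, 0.1, 10.0

def mackey_glass(n_steps, dt=0.1, tau=17.0, x0=1.2):
    """Generate n_steps samples of x(t) with step dt."""
    lag = int(round(tau / dt))          # delay expressed in steps
    x = [x0] * (lag + 1)                # constant initial history
    f = lambda u: a * u / (1.0 + u**c)  # delayed production term
    for _ in range(n_steps):
        x_tau  = x[-lag - 1]            # x(t - tau)
        x_tau1 = x[-lag]                # x(t + dt - tau)
        x_new = ((2.0 - b * dt) * x[-1]
                 + dt * (f(x_tau) + f(x_tau1))) / (2.0 + b * dt)
        x.append(x_new)
    return x[lag + 1:]                  # drop the artificial history

series = mackey_glass(3000)
```

Each loop iteration is exactly the rearranged trapezoidal formula, with the two delayed terms read back from the stored history.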

Patterns were generated by windowing the series into 6 inputs and 1 output. 500 patterns were used for training and 500 for testing in the Bayesian approach; in the backpropagation approach, 50 of the training patterns had to be held out for cross-validation. The fact that the Bayesian approach does not overfit, so no data is wasted on validation, is of great importance in real-world applications where data is scarce. The second set of pattern files is identical to the first, but with positive uniform noise of 20% of the original signal added to the targets.
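The windowing step might look like the sketch below. The placeholder sine signal, the random seed, and the exact noise construction are assumptions for illustration, not the paper's recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_patterns(series, n_inputs=6):
    """Slide a window of n_inputs consecutive values over the
    (already subsampled) series; the next value is the target."""
    x = np.asarray(series)
    X = np.stack([x[i:i + n_inputs] for i in range(len(x) - n_inputs)])
    y = x[n_inputs:]
    return X, y

# Placeholder signal standing in for the Mackey-Glass series:
signal = np.sin(np.linspace(0.0, 60.0, 6060))[::6]  # keep every 6th point
X, y = make_patterns(signal)
X_train, y_train = X[:500], y[:500]
X_test,  y_test  = X[500:1000], y[500:1000]

# One reading of the noisy pattern set: positive uniform noise of up to
# 20% of the signal magnitude added to the targets (an interpretation,
# not necessarily the paper's exact construction).
y_noisy = y_train + rng.uniform(0.0, 0.2, size=y_train.shape) * np.abs(y_train)
```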

All the backpropagation and Bayesian networks had 6 input units, 5 tanh hidden units and 1 output unit. The Bayesian implementation we used supports only linear outputs, so in backpropagation we tried both linear and the more commonly used tanh output units. For tanh output units the data was normalized by dividing by 2, so that all points lie in the range (0,1).
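A minimal sketch of the 6-5-1 architecture, with random untrained weights; the weight scale and the function names are assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 6-5-1 network: tanh hidden layer, linear or tanh output.
# Weight scale 0.5 is an arbitrary choice for this sketch.
W1 = rng.normal(scale=0.5, size=(6, 5)); b1 = np.zeros(5)
W2 = rng.normal(scale=0.5, size=(5, 1)); b2 = np.zeros(1)

def forward(X, output="linear"):
    """Forward pass: tanh hidden layer, then linear or tanh output."""
    h = np.tanh(X @ W1 + b1)            # 5 tanh hidden units
    out = h @ W2 + b2                   # linear output (Bayesian case)
    if output == "tanh":                # tanh-output variant used in
        out = np.tanh(out)              # some backpropagation runs
    return out.ravel()

pred = forward(rng.normal(size=(10, 6)))
```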

Each run of the Bayesian learning had an initialization phase and a sampling phase. During initialization, different stepsize factors (\(\epsilon\)) and trajectory lengths L are tried until the dynamics are stable. During sampling, different L were tried, choosing the one with the smallest mean error over the last 200 states. \(\epsilon\) is chosen so that no more than 20% of the candidate states are rejected, but large enough that the sampling remains efficient. The limitations of the method are computational: one sampling run takes about 15 hours on a Sun Ultra I.
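The stepsize/trajectory-length tuning can be illustrated on a toy problem. The sketch below runs hybrid Monte Carlo on a 1-D standard Gaussian and reports the rejection rate; it is not the network sampler used in the paper, and eps = 0.2 and L = 10 are arbitrary choices for this toy target.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy hybrid Monte Carlo on a 1-D standard Gaussian, only to illustrate
# the tuning described in the text: pick a stepsize eps whose rejection
# rate stays below ~20%, and compare several trajectory lengths L.
def U(q):  return 0.5 * q * q     # potential energy (-log density)
def dU(q): return q               # its gradient

def leapfrog(q, p, eps, L):
    """L leapfrog steps of size eps."""
    p = p - 0.5 * eps * dU(q)               # initial half momentum step
    for i in range(L):
        q = q + eps * p                     # full position step
        if i < L - 1:
            p = p - eps * dU(q)             # full momentum step
    p = p - 0.5 * eps * dU(q)               # final half momentum step
    return q, p

def hmc_chain(eps, L, n=2000, q0=0.0):
    q, accepted, samples = q0, 0, []
    for _ in range(n):
        p = rng.normal()                    # fresh momentum
        q_new, p_new = leapfrog(q, p, eps, L)
        dH = (U(q_new) + 0.5 * p_new**2) - (U(q) + 0.5 * p**2)
        if rng.random() < np.exp(-dH):      # Metropolis accept step
            q, accepted = q_new, accepted + 1
        samples.append(q)
    return np.array(samples), 1.0 - accepted / n

samples, rejection = hmc_chain(eps=0.2, L=10)
```

In practice one would sweep eps downward until the rejection rate falls under the 20% target, then pick L by the mean-error criterion described above.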

We have used version 3.6 of the MARS library provided by Friedman, allowing a maximum of 2 interactions and either 30 or 100 basis functions; we noticed very little improvement from more complex regression schemes.



Rafael A. Calvo
Fri Apr 18 12:26:35 GMT+1000 1997