(15) |

where is the output of network , and is the weight associated to that network. If the networks have more than one output, a different weight is usually assigned to each output. The ensembles of neural networks have some of the advantages of large networks without their problems of long training time and risk of over-fitting.

Moreover, this combination of several networks that cooperate in solving a given task has other important advantages, such as [LYH00,Sha96]:

- They can perform more complex tasks than any of their subcomponents.
- They can make an overall system easier to understand and modify.
- They are more robust than a single network.

Techniques using multiple models usually consist of two independent phases: model generation and model combination [Mer99b]. Once each network has been trained and assigned a weights (model generation), there are, in a classification environment three basic methods for combining the outputs of the networks (model combination):

*Majority voting.*Each pattern is classified into the class where the majority of networks places it [Mer99b]. Majority voting is effective, but is prone to fail in two scenarios:- When a subset of redundant and less accurate models comprise the majority, and
- When a dissenting vote is not recognized as an area of specialization for a particular model.

*Sum of the outputs of the networks.*The output of the ensemble is just the sum of the outputs of the individual networks.*Winner takes all.*The pattern is assigned to the class with the highest output over all the outputs of all the networks. That is, the network with the largest outputs directly classify the pattern, without taking into account the other networks.

The most commonly used methods for combining the networks are the *majority voting* and *sum of the outputs of the networks*, both
with a weight vector that measures the confidence in the prediction of
each network. The problem of obtaining the weight vector
is not an easy task. Usually, the values of the
weights are constrained:

(16) |

in order to help to produce estimators with lower prediction error [LT93], although the justification of this constraint is just intuitive [Bre96]. When the method of majority voting is applied, the vote of each network is weighted before it is counted:

arg max | (17) |

The problem of finding the *optimal* weight vector is a very
complex task. The ``Basic ensemble method (BEM)'', as it is called by
Perrone and Cooper [PC93], consists of weighting
all the networks equally.
So, having networks, the output of the ensembles is:

(18) |

Perrone and Cooper [PC93] defined the *Generalized
Ensemble Method*, which is equivalent to the *Mean Square Error
- Optimal Linear Combination* (MSE-OLC) without a constant term of
Hashem [Has97]. The form of the output of the ensemble is:

where the are real and satisfy the constraint . The values of are given by:

where is the symmetric correlation matrix
, where
defines the *misfit* of function , that
is the deviation from the true solution
,
.
The previous methods are commonly used. Nevertheless, many other
techniques have been proposed over the last few years. Among others,
there are methods based on linear regression [LT93],
principal components analysis and least-square regression
[Mer99a], correspondence analysis [Mer99b], and the use of
a validation set [OS96].

In this application, we use a genetic algorithm for obtaining the weight of each component. This approach is similar to the use of a gradient descent procedure [KW97], avoiding the problem of being trapped in local minima. The use of a genetic algorithm has an additional advantage over the optimal linear combination, as the former is not affected by the collinearity problem [PC93,Has97].