
Fast feature selection with LOO-XVE

Feature selection becomes an important part of modeling as the dimensionality of a data set grows. Even when there are many attributes, it is common that only a small subset is needed for good predictions. Blackbox is very good at finding that subset because LOO-XVE (leave-one-out cross-validation error) can be computed so quickly.
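To make the idea concrete, here is a minimal sketch of a leave-one-out error for a memory-based learner. It is not Blackbox's implementation: the Gaussian kernel, the bandwidth value, the function name loo_error, and the synthetic target are all assumptions chosen for illustration. It shows that leaving a point out only means skipping it in the weighted average, so no model has to be retrained for each held-out point.

```python
import math
import random


def loo_error(xs, ys, features, bandwidth):
    """Mean squared leave-one-out error of Gaussian kernel regression
    that measures distance using only the given feature indices."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # hold out point i: just skip it, nothing is retrained
            d2 = sum((xs[i][f] - xs[j][f]) ** 2 for f in features)
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))
            num += w * ys[j]
            den += w
        # If every weight underflowed, fall back to the mean of the other points.
        pred = num / den if den > 0.0 else (sum(ys) - ys[i]) / (n - 1)
        total += (pred - ys[i]) ** 2
    return total / n


# Hypothetical data: 4 inputs, but the output depends only on inputs 0 and 1.
random.seed(0)
xs = [[random.random() for _ in range(4)] for _ in range(200)]
ys = [math.sin(2 * math.pi * x[0]) + x[1] + random.gauss(0.0, 0.05) for x in xs]

err_relevant = loo_error(xs, ys, [0, 1], 0.1)
err_irrelevant = loo_error(xs, ys, [2, 3], 0.1)
err_all = loo_error(xs, ys, [0, 1, 2, 3], 0.1)
print("relevant inputs only:  ", err_relevant)
print("irrelevant inputs only:", err_irrelevant)
print("all four inputs:       ", err_all)
```

Because each candidate feature subset costs only one pass over the stored points, many subsets can be scored in the time it would take to refit a trained model once per held-out point.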

To demonstrate the feature selection capability of Blackbox, we'll give it a data file with 8 inputs and 1 output in which only 4 of the inputs are relevant to the output. The data file was synthetically generated as 800 points from the function: tex2html_wrap_inline1634 noise. The function is globally linear in tex2html_wrap_inline1636 and tex2html_wrap_inline1638 and nonlinear in tex2html_wrap_inline1640 and tex2html_wrap_inline1642. We'll use Blackbox to find the best metacode for this function.

File -> Open -> 4of8.mbl
Blackbox -> Seconds  300
            Launch!

The time required to find the best metacode will vary with the machine, but 300 seconds should be enough on many PCs. You can watch the search as it progresses. Blackbox almost immediately finds the four relevant features and observes that a very local kernel regression does well (metacode A30:-9-9-9-9). Next, it finds that linear regression is better than kernel regression with these features (metacode L30:-9-9-9-9). This is no surprise, since the function is globally linear in two of the features. After further exploration, it identifies the two globally linear features (metacode L30:-0-0-9-9). Finally, it observes that once the two globally linear features are handled globally, the smoothing can be made even more local, giving the best model, L20:-0-0-9-9.
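The feature-finding part of such a search can be sketched with a generic greedy forward selection driven by leave-one-out error. This is a stand-in, not Blackbox's actual search, which also explores kernel widths and local versus global treatment of each feature. The helper names, the fixed bandwidth, and the synthetic data (6 inputs, of which only inputs 0 and 1 matter) are assumptions for illustration.

```python
import math
import random


def loo_error(xs, ys, features, bandwidth=0.15):
    """Mean squared leave-one-out error of Gaussian kernel regression
    using only the given feature indices."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave point i out of its own prediction
            d2 = sum((xs[i][f] - xs[j][f]) ** 2 for f in features)
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))
            num += w * ys[j]
            den += w
        pred = num / den if den > 0.0 else (sum(ys) - ys[i]) / (n - 1)
        total += (pred - ys[i]) ** 2
    return total / n


def greedy_select(xs, ys, n_features):
    """Add one feature at a time, keeping the one that lowers the
    leave-one-out error most; stop when no candidate improves it."""
    selected = []
    remaining = list(range(n_features))
    best_err = float("inf")
    while remaining:
        scored = [(loo_error(xs, ys, selected + [f]), f) for f in remaining]
        err, f = min(scored)
        if err >= best_err:
            break  # no remaining feature helps
        selected.append(f)
        remaining.remove(f)
        best_err = err
    return selected, best_err


# Hypothetical data: 6 inputs, output depends only on inputs 0 and 1.
random.seed(1)
xs = [[random.random() for _ in range(6)] for _ in range(150)]
ys = [math.sin(2 * math.pi * x[0]) + 2.0 * x[1] + random.gauss(0.0, 0.05)
      for x in xs]

selected, best_err = greedy_select(xs, ys, 6)
print("selected features (in order added):", selected)
print("leave-one-out error:", best_err)
```

A greedy search like this evaluates only a small fraction of all possible subsets, which is practical precisely because each leave-one-out evaluation is cheap.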



Jeff Schneider
Fri Feb 7 18:00:08 EST 1997