Date: Tue, 10 Dec 1996 14:53:46 GMT Server: NCSA/1.4.2 Content-type: text/html Last-modified: Wed, 03 Jan 1996 23:13:43 GMT Content-length: 3637 Brute

Brute


Principal Investigators: Richard Segal and Oren Etzioni .

Overview

Brute is an inductive system for performing both data mining and classification tasks. At the core of Brute, is an algorithm for efficiently searching the space of conjunctive rules up to a user specified depth. When used for data mining, this core search can identify rules that meet a specific search criteria. When used for classification, the rules found by Brute's core search must be combined to form a classifier. Brute supports several mechanisms for building a classifier.

Brute differs from existing data mining and classification algorithms in that it uses massive search rather than greedy search. Massive search can avoid many of the pitfalls of greedy search, albeit at additional cost. Empirical analysis shows that brute performs much better than greedy algorithms for data mining and has similar performance when used for classification. Surprisingly, Brute's running time is often quite reasonable.

Efficiency

Brute's efficiency is important because of its reliance on massive search. Brute was implemented in C to achieve maximum search speed. Brute can process 100,000 rules per second on a 500 item database when run a SPARC-10 processor. Brute's running time grows linearly with the number of training examples.

Brute uses several pruning axioms to reduce the size of the space it must search. These axioms are sound in that they only remove portions of the search space guaranteed not to contain useful rules. Brute can commonly reduce the search space by a factor of a 1,000 or more.

Flexibility

Brute is a highly parameterized algorithm that allows the exploration of several different learning algorithms. Brute's options allow it to simulate algorithms such as BruteDL, Greedy3, Kdl, 1R, and many variations. This flexibility makes Brute a valuable tool for learning research.

Brute supports a wide variety of data formats. Brute can be used with minimal effort on databases from the UCI repository, C4.5 databases, and IND databases. A program is provided for automatically creating attribute description files that makes it easy to use Brute on new data sets.

Availability

Brute is available for both research and commercial uses. Please contact Oren Etzioni (etzioni@cs.washington.edu) for licensing details.

References

P. Riddle, O. Etzioni, C. Pearson, and R. Segal. Process improvement through automated feedback (preliminary report). In Proceedings of the Machine Learning Workshop on Integrated Learning in Real-World Domains, July 1992.

P. Riddle, R. Segal, and O. Etzioni. Representation design and brute-force induction in a Boeing manufacturing domain. Applied Artificial Intelligence, 8:125-147, 1994.

R. Segal and O. Etzioni. Learning decision lists using homogeneous rules. In Proceedings of the Twelfth National Conference on Artificial Intelligence, July, 1994.


segal@cs.washington.edu