Data Mining

EMV - CS537 Project Proposal



Classification in Data Mining


Eric Vitrano


Common Level

  • Class table
  • Class tuple


Data Mining Classifiers

Stage 1
Once the above groundwork is complete, I will implement a version of an elementary data mining classification algorithm. This algorithm will be based on the ID-3 decision tree model, with limited pruning. A summary of the algorithm in pseudocode form is as follows:
The above algorithm will be implemented in Visual C++ to build a decision tree that classifies tuples into defined classes. The tree must first be trained on a training set in which the class of each tuple is known, and then tested on data to check that the returned classes are correct. The results can then be used for directing queries on incoming data, as well as for classifying existing data.
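
As a rough illustration of the train-then-classify cycle described here, the following is a minimal sketch of an ID-3-style tree builder in modern C++ (C++14). It is not the proposal's own pseudocode: the names `Tuple`, `Node`, `buildTree`, and `classify` are hypothetical, attributes are assumed to be categorical strings, and no pruning is performed.

```cpp
#include <cmath>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: categorical attribute values plus a class label.
struct Tuple {
    std::vector<std::string> attrs;
    std::string cls;
};

struct Node {
    int attr = -1;    // attribute index tested at this node; -1 marks a leaf
    std::string cls;  // majority class of the training tuples reaching this node
    std::map<std::string, std::unique_ptr<Node>> children;  // one branch per value
};

// Shannon entropy (in bits) of the class labels in `rows`.
double entropy(const std::vector<Tuple>& rows) {
    std::map<std::string, int> counts;
    for (const Tuple& t : rows) ++counts[t.cls];
    double h = 0.0;
    for (const auto& kv : counts) {
        double p = static_cast<double>(kv.second) / rows.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Most frequent class label in `rows` (ties resolved by map order).
std::string majorityClass(const std::vector<Tuple>& rows) {
    std::map<std::string, int> counts;
    for (const Tuple& t : rows) ++counts[t.cls];
    std::string best;
    int bestCount = -1;
    for (const auto& kv : counts)
        if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }
    return best;
}

// Partition `rows` by their value of attribute `a`.
std::map<std::string, std::vector<Tuple>> splitBy(const std::vector<Tuple>& rows, int a) {
    std::map<std::string, std::vector<Tuple>> parts;
    for (const Tuple& t : rows) parts[t.attrs[a]].push_back(t);
    return parts;
}

// Choose the unused attribute with the highest information gain, split on it,
// and recurse. Stops when the node is pure or no attribute reduces entropy.
std::unique_ptr<Node> buildTree(const std::vector<Tuple>& rows, std::set<int> unused) {
    auto node = std::make_unique<Node>();
    node->cls = majorityClass(rows);
    const double base = entropy(rows);
    if (base == 0.0 || unused.empty()) return node;

    int bestAttr = -1;
    double bestGain = 0.0;
    for (int a : unused) {
        double remainder = 0.0;
        for (const auto& kv : splitBy(rows, a))
            remainder += entropy(kv.second) * kv.second.size() / rows.size();
        const double gain = base - remainder;
        if (gain > bestGain) { bestGain = gain; bestAttr = a; }
    }
    if (bestAttr < 0) return node;  // no split helps: keep this node as a leaf

    node->attr = bestAttr;
    unused.erase(bestAttr);
    for (const auto& kv : splitBy(rows, bestAttr))
        node->children[kv.first] = buildTree(kv.second, unused);
    return node;
}

// Walk the tree; an attribute value never seen in training falls back to the
// majority class stored at the current node.
std::string classify(const Node& node, const std::vector<std::string>& attrs) {
    if (node.attr < 0) return node.cls;
    auto it = node.children.find(attrs[node.attr]);
    if (it == node.children.end()) return node.cls;
    return classify(*it->second, attrs);
}
```

In this sketch, training amounts to calling `buildTree(trainingRows, {0, 1, ...})` with the set of attribute indices, and testing amounts to calling `classify(*tree, attrs)` on tuples whose class is known and comparing the results. Storing the majority class at every node is one simple way to handle unseen attribute values; it is a design choice of the sketch, not something the proposal specifies.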

Stage 2
Once the above algorithm is implemented, I will implement a second algorithm. This algorithm will either be related to SLIQ or be derived from observing the development and behavior of the general case. Possible areas of improvement include pruning on the fly, limiting searches of the data and the amount of data that must be kept in memory, and presorting or partially classifying the data.
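
To make the presorting idea concrete, here is a minimal sketch, assuming a single numeric attribute: the attribute list is sorted once, and every candidate split point is then evaluated in one linear scan by updating class histograms incrementally. SLIQ uses presorted attribute lists and the gini index in a similar spirit, but this is only an illustration and not an implementation of SLIQ; `Entry`, `Split`, `presort`, and `bestSplit` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical attribute-list entry: one numeric attribute value and the
// class of the tuple it came from.
struct Entry {
    double value;
    std::string cls;
};

struct Split {
    double threshold;  // the test is "value <= threshold"
    double impurity;   // weighted gini index of the two sides
    bool found;        // false if no valid split point exists
};

// Gini index of a class histogram covering `total` tuples.
double gini(const std::map<std::string, int>& counts, std::size_t total) {
    if (total == 0) return 0.0;
    double g = 1.0;
    for (const auto& kv : counts) {
        const double p = static_cast<double>(kv.second) / total;
        g -= p * p;
    }
    return g;
}

// Done once per attribute, before any split is evaluated.
void presort(std::vector<Entry>& list) {
    std::sort(list.begin(), list.end(),
              [](const Entry& a, const Entry& b) { return a.value < b.value; });
}

// One pass over an already sorted list: move entries from the right
// histogram to the left one and score each boundary between distinct values.
Split bestSplit(const std::vector<Entry>& sorted) {
    std::map<std::string, int> left, right;
    for (const Entry& e : sorted) ++right[e.cls];

    Split best{0.0, std::numeric_limits<double>::infinity(), false};
    const std::size_t n = sorted.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        ++left[sorted[i].cls];
        --right[sorted[i].cls];
        if (sorted[i].value == sorted[i + 1].value) continue;  // not a boundary
        const std::size_t nl = i + 1, nr = n - nl;
        const double w = (gini(left, nl) * nl + gini(right, nr) * nr) / n;
        if (w < best.impurity) {
            best.threshold = (sorted[i].value + sorted[i + 1].value) / 2.0;
            best.impurity = w;
            best.found = true;
        }
    }
    return best;
}
```

Because the sort happens once up front, each later evaluation of the attribute costs a single scan rather than a fresh sort, which is the kind of saving in data searches that the paragraph above has in mind.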


Time Estimates

I expect progress to follow approximately this schedule:



