~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A Novelty Detection Approach to Classification

Nathalie Japkowicz
Department of Computer Science
Rutgers University

Novelty Detection techniques are concept-learning methods that proceed by recognizing positive instances of a concept rather than discriminating between its positive and negative instances. Novelty Detection approaches consequently require very few, if any, negative training instances. In this presentation, I will introduce HIPPO, a Novelty Detection classifier that learns by training an autoencoder to reconstruct positive input instances at its output layer and then uses this autoencoder to recognize novel instances. Classification is possible after training because positive instances are expected to be reconstructed accurately while negative instances are not. In a preliminary set of experiments, HIPPO performed better than two standard discrimination-based classifiers on two out of three real-world domains, and as well as the most accurate classifier on the third. The system is currently being installed by the U.S. Navy in CH-46 helicopters as a gearbox monitoring device.

The purpose of this talk is to describe HIPPO's inner workings in order to explain its behavior. It begins by dispelling the common belief that autoencoders are equivalent to Principal Component Analysis and therefore cannot accurately classify nonlinear domains. It then turns to HIPPO's generalization strategy and shows that, when given an adequate number of hidden units, HIPPO and BACKPROP use different learning strategies even though both systems are trained with the backpropagation procedure: while BACKPROP uses a competitive strategy, HIPPO adopts a cooperative one.
This observation challenges yet another common belief in the connectionist community, namely, that the effective number of hidden units used by a network is smaller than the actual number of hidden units present in the model.

The last part of the talk turns to the practical properties of HIPPO that my study allowed me to infer. In particular, it discusses HIPPO's extreme sensitivity to the statistical distribution of the training set, a property that allows it to deal with noise and rare cases better than BACKPROP, the standard connectionist classification method.

This work is based on my dissertation research, conducted under the supervision of Stephen J. Hanson and Mark A. Gluck.
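
The reconstruction-based classification described in the abstract can be illustrated with a minimal sketch. This is not HIPPO's actual implementation: the tanh hidden layer, the linear output layer, the toy data, the threshold rule (a margin times the largest training reconstruction error), and all function names are assumptions made for illustration only. The sketch trains a small autoencoder on positive instances alone and flags an instance as novel when its reconstruction error exceeds the threshold.

```python
import math
import random

def forward(model, x):
    """Return hidden activations and reconstruction for input x."""
    w1, b1, w2, b2 = model
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    y = [sum(w * hj for w, hj in zip(row, h)) + b
         for row, b in zip(w2, b2)]
    return h, y

def train_autoencoder(positives, n_hidden=2, epochs=500, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder on positive instances only."""
    rng = random.Random(seed)
    n_in = len(positives[0])
    # Small random initial weights; biases start at zero.
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in
    for _ in range(epochs):
        for x in positives:
            h, y = forward((w1, b1, w2, b2), x)
            # Error signal at the linear output layer (squared-error loss).
            d_out = [y[i] - x[i] for i in range(n_in)]
            # Backpropagate the error through the tanh hidden layer.
            d_hid = [(1.0 - h[j] ** 2) *
                     sum(d_out[i] * w2[i][j] for i in range(n_in))
                     for j in range(n_hidden)]
            # Plain stochastic gradient descent updates.
            for i in range(n_in):
                for j in range(n_hidden):
                    w2[i][j] -= lr * d_out[i] * h[j]
                b2[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
                b1[j] -= lr * d_hid[j]
    return w1, b1, w2, b2

def reconstruction_error(model, x):
    """Sum of squared differences between x and its reconstruction."""
    _, y = forward(model, x)
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x))

def fit_threshold(model, positives, margin=1.5):
    """Assumed rule: margin times the worst training reconstruction error."""
    return margin * max(reconstruction_error(model, x) for x in positives)

def is_novel(model, threshold, x):
    """An instance is 'novel' (negative) if it is reconstructed poorly."""
    return reconstruction_error(model, x) > threshold

# Toy positive concept (hypothetical): points of the form (t, t, -t).
positives = [(k / 10, k / 10, -k / 10) for k in range(-5, 6)]
model = train_autoencoder(positives)
threshold = fit_threshold(model, positives)

print(is_novel(model, threshold, (0.3, 0.3, -0.3)))   # a positive instance
print(is_novel(model, threshold, (1.0, -1.0, 1.0)))   # an off-concept instance
```

No negative instances are used during training; the threshold is set from the positive instances alone, which mirrors the claim that Novelty Detection needs few, if any, negative examples. A real system would likely choose the threshold on held-out data.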