12:00, Wed 30 Jul 1997

Text Classification with Support Vector Machines

Thorsten Joachims

Text Classification is the process of assigning text documents to categories based on their content. When machine learning methods are used to learn text classifiers from examples, one has to deal with very high-dimensional feature spaces. Usually the number of available features exceeds the number of training examples, so care must be taken to avoid "overfitting" during learning. Overfitting is the problem of poor generalization performance despite a low error rate on the training data. Support Vector Machines (SVMs) [Vapnik et al.] are a new learning method designed specifically to prevent overfitting in high-dimensional feature spaces. They combine ideas from Computational Learning Theory and Neural Nets and have shown excellent performance on other high-dimensional learning tasks (e.g. OCR). SVMs are very different from conventional machine learning approaches and introduce interesting new concepts. The talk will identify the problems faced in Text Classification, relate them to SVMs, and give some empirical results. Much of the talk will be a tutorial about Support Vector Machines and the interesting research issues they suggest.
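The setting described above, where a text classifier has many more features than training examples, can be illustrated with a tiny linear SVM over bag-of-words vectors. This is a minimal sketch under stated assumptions, not the method presented in the talk: instead of solving the SVM quadratic program, it swaps in a Pegasos-style subgradient descent on the regularized hinge loss, learns no bias term, and uses a made-up four-document corpus. The helper names (`bag_of_words`, `train_linear_svm`, `predict`) and the parameters `lam` and `epochs` are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to a sparse term-frequency vector (term -> count)."""
    return Counter(text.lower().split())


def dot(w, x):
    """Inner product of a sparse weight vector and a sparse document vector."""
    return sum(w.get(term, 0.0) * count for term, count in x.items())


def train_linear_svm(docs, labels, lam=0.1, epochs=200):
    """Hypothetical helper: minimize
        lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i)
    by full-batch subgradient descent with Pegasos-style step size 1/(lam*t).
    Labels must be +1 or -1; no bias term is learned (an assumption)."""
    xs = [bag_of_words(d) for d in docs]
    n = len(xs)
    w = {}
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        # Examples inside the margin (or misclassified) contribute to the
        # hinge-loss subgradient; they are found using the current w.
        violators = [(x, y) for x, y in zip(xs, labels) if y * dot(w, x) < 1]
        # Gradient step for the regularizer lam/2 * ||w||^2 shrinks w.
        w = {term: (1 - eta * lam) * val for term, val in w.items()}
        # Subgradient step for the hinge loss moves w toward the violators.
        for x, y in violators:
            for term, count in x.items():
                w[term] = w.get(term, 0.0) + eta * y * count / n
    return w


def predict(w, text):
    """Classify a document by the sign of w.x (+1 or -1)."""
    return 1 if dot(w, bag_of_words(text)) >= 0 else -1


# Toy, made-up corpus: +1 = finance, -1 = sports.
docs = [
    "stocks fell as the market closed lower",
    "shares and bonds rallied on the market",
    "the team won the match in overtime",
    "the striker scored twice in the final match",
]
labels = [1, 1, -1, -1]

vocabulary = {term for d in docs for term in bag_of_words(d)}
w = train_linear_svm(docs, labels)
print(len(vocabulary), "features,", len(docs), "training examples")
print(predict(w, "stocks and shares rallied"), predict(w, "team scored twice"))
```

Running it prints `21 features, 4 training examples` and then `1 -1`: the vocabulary is already five times larger than the training set, which is the regime the abstract warns about, and the `lam/2 * ||w||^2` term is what limits the weight vector's norm, in line with the margin-maximization idea behind SVMs. The subgradient loop was picked only because it fits in a few lines; a real system would more likely use a dedicated SVM solver.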