Experiment Tools
| Linear regression [Applet] | SVMs [Applets] | 
| Naive Bayes [Applet] | PAC learning [Applets] | 
| Logistic regression [Applet] | Java Bayes [Applet] | 
| Discriminative vs. Generative models [Applet] | K-Means [Applet] | 
| Decision trees [Applet] | Mixture of Gaussians [Applet] | 
| Boosting [Adaboost Applet] | PCA [Applet] | 
| Instance-based learning [Applet] | Reinforcement Learning [RL Sim Applet] | 
| EM for estimating Gaussian mixtures | | 
Conference Proceedings
- Text REtrieval Conference (TREC)
- SIGIR
- Digital Libraries (DL)
- Conference on Information and Knowledge Management (CIKM)
- World Wide Web
Datasets
- Wikipedia database [English] [French] [German]
- University of California Irvine Machine Learning Repository
- Text REtrieval Conference (TREC)
- Reuters-21578 Dataset
- International Conference on Weblogs and Social Media Dataset
- NIST
- The Linguistic Data Consortium
- LDC resources at CMU
- The Rosetta Project
Online Books
- P. Ingwersen. Information Retrieval Interaction. London: Taylor Graham, 1992.
- C. J. van Rijsbergen. Information Retrieval. London: Butterworths, 1979.
Software and Tools
- The Porter stemmer
- Martin Porter's Porter algorithm Web page
- MXTERMINATOR English sentence boundary detector
- MXPOST English part of speech tagger
- Language identification tools
- R - free software environment for statistical computing and graphics