 
     
    
     CMU Text Learning Group
     
     
    
Goal
 
Our goal is to develop new machine learning algorithms for text and
hypertext data.  Applications of these algorithms include information
filtering systems for the Internet, and software agents that make
decisions based on text information.  For further information, contact
Professor Tom Mitchell.
Projects
 
 
- The World Wide Knowledge Base (Web->KB) project: converting the web to a giant symbolic knowledge base
- WebWatcher: a tour guide for the web, specialized to a site
- Personal WebWatcher: a tour guide for the web, specialized to a person
- NewsWeeder: a newsreader that learns your reading interests (former research project; became WiseWire)
- Cora: computer science research paper search engine
- ifile: mail filter program for EXMH
- CorpusBuilder: automatically building web search queries to construct language-specific corpora
Meetings
 
Meetings are held on Mondays at 2:30pm in Wean Hall
8220.  Refreshments served.
- January 31, 2000: Web->KB update and planning meeting
- February 7, 2000: Sean Slattery on "Discovering Test Set Regularities in Relational Domains"
- February 21, 2000: Thorsten Joachims on "Transductive Inference for Text Classification using Support Vector Machines"
- February 28, 2000: Kamal Nigam on "Analyzing the Effectiveness and Applicability of Co-training"
- March 6, 2000: Dunja Mladenic on "Feature selection for unbalanced class distribution and Naive Bayes"
- March 20, 2000: Antal van den Bosch on "Memory-based language processing: some findings and applications"
- March 27, 2000: Web->KB/SHIELD update and planning
- April 3, 2000: Dayne Freitag on "Boosted Wrapper Induction"
- April 10, 2000: Paul Bennett on "Using Naive Bayes to Estimate Posterior Distributions"
- April 24, 2000: Web->KB/SHIELD organizational meeting
- September 15, 2000: Rosie Jones on "Automatically Building a Corpus for a Minority Language from the Web"
- October 16, 2000: Kamal Nigam on "Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching"
- October 30, 2000: Nick Roy on "Spoken Dialogue Management using Probabilistic Reasoning"
- November 20, 2000: Krzysztof Czuba on "Learning a Pruning Strategy for a Chart Parser"
Past meetings: Fall 1999, Spring 1999, Fall 1998, Fall 1997, Spring 1997
Publications
All newer publications are listed on the individual project pages or on individuals' pages. Several publications from 1995 and 1996 are not associated with a specific project.
People
Alumni
Photos - Look for us in the Learning Lab.
The old text learning page
This page at /afs/cs.cmu.edu/project/theo-4/text-learning/www/index.html
Last modified: Tue Dec  9 13:17:39 EST 2003