TextMiner: Text Classification for Intelligent Agent Portfolio Management





There is a positive correlation between news reports on a company's financial outlook and the company's attractiveness as an investment. However, because of the volume of such reports, it is impossible for financial analysts or investors to track and read all of them.

A system that automatically classifies news reports as reflecting positively or negatively on a company's financial outlook would greatly benefit analysts and investors. In the application domain of stock portfolio management (see Warren), software agents that evaluate the risks associated with the individual companies in a portfolio should be able to read, classify, and weigh electronic news articles to give investors an indication of each company's financial outlook.

To accomplish this task, we treat the unsupervised reading and understanding of news articles as an automatic text classification problem. In this project, we are developing an automatic text classification technique that results in software agents we call "Domain Experts." Domain Experts use a sampling algorithm, based on the Weighted Majority algorithm, that uses frequently co-occurring phrases as its feature vector.
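
As a rough illustration of the Weighted Majority idea mentioned above, the following minimal sketch treats each phrase as a simple "expert" that votes when its phrase occurs in an article. It is not the TextMiner implementation: the names PhraseExpert, predict, and update, the abstain-when-absent rule, and the penalty factor beta are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PhraseExpert:
    """Hypothetical expert: votes `label` (+1 or -1) when `phrase` occurs."""
    phrase: str
    label: int
    weight: float = 1.0

    def vote(self, text: str) -> int:
        # Abstain (0) when the phrase is absent; this is an assumption,
        # not something the source specifies.
        return self.label if self.phrase in text.lower() else 0


def predict(experts, text):
    """Weighted vote: +1 (positive outlook) or -1 (negative outlook)."""
    score = sum(e.weight * e.vote(text) for e in experts)
    return 1 if score >= 0 else -1


def update(experts, text, true_label, beta=0.5):
    """Weighted Majority update: shrink the weight of each expert that voted wrongly."""
    for e in experts:
        v = e.vote(text)
        if v != 0 and v != true_label:
            e.weight *= beta


# Illustrative usage with made-up phrases and a made-up headline.
experts = [PhraseExpert("record profit", +1), PhraseExpert("shares rise", -1)]
headline = "Shares rise after record profit"
update(experts, headline, true_label=+1)  # the second expert is penalized
```

After the update, the expert that voted against the true label carries half its former weight, so later weighted votes lean toward the experts that have been right more often.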

We call the sampling technique used by Domain Experts "self-confident" sampling. Briefly, "self-confident" sampling is a technique for sampling more promising data from unlabeled data sets to improve a classifier's performance. It is a kind of pseudo-labeling method that predicts a label for unlabeled data on the basis of the data's entropy value and the trainer's confidence, which is acquired during the training phase.
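
The description above does not say exactly how entropy and trainer confidence are combined, so the following is only a sketch under stated assumptions: certainty is taken as one minus the normalized entropy of the predicted class distribution, it is multiplied by a single trainer-confidence number, and an item is pseudo-labeled when the product reaches a threshold. The function names, the product rule, and the threshold value are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_pseudo_labels(predictions, trainer_confidence, threshold=0.5):
    """Pick unlabeled items that look safe to pseudo-label.

    predictions: list of (item, {label: probability}) pairs.
    trainer_confidence: number in [0, 1], e.g. accuracy observed during
        training (an assumption; the source does not define it precisely).
    """
    selected = []
    for item, dist in predictions:
        if len(dist) < 2:
            certainty = 1.0  # a single-class distribution has zero entropy
        else:
            certainty = 1.0 - entropy(dist.values()) / math.log2(len(dist))
        # Assumed combination rule: product of the two signals.
        if trainer_confidence * certainty >= threshold:
            label = max(dist, key=dist.get)
            selected.append((item, label))
    return selected


# Illustrative usage with made-up predictions.
unlabeled = [
    ("article-1", {"pos": 0.95, "neg": 0.05}),  # low entropy
    ("article-2", {"pos": 0.50, "neg": 0.50}),  # maximum entropy
]
chosen = select_pseudo_labels(unlabeled, trainer_confidence=0.9)
```

Under these assumptions, only the low-entropy article is selected and given the label "pos"; the evenly split article is left unlabeled.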

Instructions for access to TextMiner:

To receive access to TextMiner, please print the CMU License Agreement
(.pdf). Click here to download Adobe Reader.

  • Read the Agreement carefully and, if you agree to the terms, complete the bottom portion. Include your name, institutional affiliation and address, a URL for the website that describes your group's or your own research activities, your email address, and, if you are a student, the name, position, URL, and email address of your advisor. Please sign and date the Agreement.
  • Send the completed agreement to us by mail at:
    Katia Sycara
    The Robotics Institute
    5000 Forbes Avenue
    Pittsburgh, PA 15213

  • We will send qualified users a user name and password via email, so that you can access the executable by downloading Communicator Library v1.4.1_Apr2003 (Jar) from the downloads page, here.

Download GoodNews (TextMiner) Interface, and instructions for use.

Access the Data for TextMiner Testing/Training


Young-Woo Seo, Joseph Giampapa, and Katia Sycara, Financial news analysis for intelligent portfolio management, Tech. Report CMU-RI-TR-04-04, Robotics Institute, Carnegie Mellon University, Jan. 2004.

Young-Woo Seo, Joseph A. Giampapa, and Katia P. Sycara, Text classification for intelligent agent portfolio management, Tech. Report CMU-RI-TR-02-14, Robotics Institute, Carnegie Mellon University, May 2002.

Robotics Institute Project Page


Copyright 2006 - 2012 © Advanced Agent-Robotics Technology Lab - The Robotics Institute - Carnegie Mellon University
