There is a positive correlation between news reports on a company's
financial outlook and the company's attractiveness as an
investment. However, because of the volume of such reports, it is impossible
for financial analysts or investors to track and read all
of them. A system that automatically classifies news reports that reflect
positively or negatively on a company's financial outlook
would greatly benefit analysts and investors. In the application
domain of stock portfolio management (see Warren),
software agents that evaluate the risks associated with
the individual companies of a portfolio should be able to
read, classify and weigh electronic news articles, to give
investors an indication of the financial outlook of a company.
To accomplish this task, we treat the unsupervised reading and understanding
of news articles as an automatic text classification problem.
In this project, we are developing an automatic text classification
technique, implemented as software agents that we call "Domain
Experts." Domain Experts use a sampling algorithm, based on
the Weighted Majority algorithm, that uses frequently
co-occurring phrases as its feature vector.
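The page does not say how phrase features feed the Weighted Majority vote, so the following is only a minimal Python sketch under stated assumptions, not the Domain Experts implementation. It assumes that a "phrase" is a lower-cased word bigram occurring in at least `min_count` training documents, that labels are +1 (positive outlook) and -1 (negative outlook), and that each phrase owns one expert per label, with experts for phrases absent from a document abstaining (a "sleeping experts" variant of Weighted Majority). The names `bigrams`, `frequent_phrases`, `WeightedMajority`, `min_count` and `beta`, and the toy headlines, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Phrase = Tuple[str, str]

def bigrams(text: str) -> Set[Phrase]:
    """Lower-cased adjacent word pairs; an assumed stand-in for 'phrases'."""
    words = text.lower().split()
    return set(zip(words, words[1:]))

def frequent_phrases(docs: Sequence[str], min_count: int = 2) -> Set[Phrase]:
    """Bigrams that occur in at least `min_count` distinct training documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(bigrams(doc))  # a set, so each document counts a bigram once
    return {p for p, c in counts.items() if c >= min_count}

class WeightedMajority:
    """Weighted Majority vote over phrase experts; labels are +1 and -1."""

    def __init__(self, vocabulary: Set[Phrase], beta: float = 0.5) -> None:
        self.beta = beta  # penalty factor in (0, 1)
        # One expert per (phrase, label), every weight initialised to 1.
        self.weights: Dict[Tuple[Phrase, int], float] = {
            (p, y): 1.0 for p in vocabulary for y in (1, -1)
        }

    def _awake(self, text: str) -> List[Phrase]:
        # Only experts whose phrase occurs in the document vote; the rest abstain.
        return [p for p in bigrams(text) if (p, 1) in self.weights]

    def predict(self, text: str) -> int:
        votes = {1: 0.0, -1: 0.0}
        for p in self._awake(text):
            for y in (1, -1):
                votes[y] += self.weights[(p, y)]
        return 1 if votes[1] >= votes[-1] else -1  # ties (incl. no known phrase) -> +1

    def update(self, text: str, label: int) -> None:
        # Multiply the weight of every awake expert that voted for the wrong label by beta.
        for p in self._awake(text):
            self.weights[(p, -label)] *= self.beta

    def fit(self, docs: Sequence[str], labels: Sequence[int], epochs: int = 1) -> None:
        for _ in range(epochs):
            for doc, y in zip(docs, labels):
                self.update(doc, y)

train = [
    "profits rose sharply this quarter",
    "profits rose again",
    "shares fell sharply after losses",
    "shares fell on weak guidance",
]
labels = [1, 1, -1, -1]
wm = WeightedMajority(frequent_phrases(train, min_count=2), beta=0.5)
wm.fit(train, labels)
print(wm.predict("profits rose modestly"))  # → 1
print(wm.predict("shares fell today"))      # → -1
```

Penalizing only the experts that are awake keeps each update proportional to the number of phrases in a document rather than to the size of the phrase vocabulary.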
We call the sampling technique used by Domain Experts "self-confident"
sampling. Briefly, "self-confident" sampling is a technique
for selecting the more promising examples from unlabeled data sets
to improve a classifier's performance. It is a kind of
pseudo-labeling method that predicts a label for unlabeled data
on the basis of the entropy value of the unlabeled data and the
trainer's confidence, which is acquired during the training phase.
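The description above leaves the exact selection rule open. As a minimal sketch, assume that "entropy value" means the Shannon entropy H of the classifier's predicted class distribution for an unlabeled document, and that the trainer's confidence is a number in [0, 1] (for instance, accuracy measured during training). One plausible rule then accepts a pseudo label only when the prediction's normalized certainty 1 - H/H_max reaches the trainer's confidence. The function names, the document ids, and this threshold rule are hypothetical illustrations, not the published algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def self_confident_sample(
    predictions: Dict[str, Dict[int, float]],
    trainer_confidence: float,
) -> List[Tuple[str, int]]:
    """Return (doc_id, pseudo_label) pairs for sufficiently certain predictions.

    predictions maps a (hypothetical) document id to the classifier's class
    distribution, e.g. {1: 0.9, -1: 0.1}. trainer_confidence is assumed to lie
    in [0, 1]; the acceptance rule below is an illustrative assumption.
    """
    selected: List[Tuple[str, int]] = []
    for doc_id, dist in predictions.items():
        h_max = math.log2(len(dist))  # entropy of a uniform distribution
        h = entropy(list(dist.values()))
        normalized = h / h_max if h_max > 0 else 0.0
        # Accept when certainty (1 - normalized entropy) reaches the trainer's confidence.
        if 1.0 - normalized >= trainer_confidence:
            label = max(dist, key=dist.get)  # most probable class becomes the pseudo label
            selected.append((doc_id, label))
    return selected

predictions = {
    "a": {1: 0.98, -1: 0.02},
    "b": {1: 0.60, -1: 0.40},
    "c": {1: 0.01, -1: 0.99},
}
print(self_confident_sample(predictions, trainer_confidence=0.8))
# → [('a', 1), ('c', -1)]
```

Under this assumed rule, a more confident trainer demands sharper predictions before it labels data itself; reversing the inequality would give the opposite policy, and the source does not specify which direction Domain Experts take.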
Instructions for access to TextMiner:
To receive access to TextMiner, please print the
CMU License Agreement.
Read the agreement carefully and, if you agree to its terms, complete the bottom
portion of the Agreement. Include your name, institutional
affiliation and address, the URL of a website that describes
your group's or your own research activities, your email
address, and, if you are a student, the name, position,
URL and email address of your advisor. Please sign and date
the completed agreement and return it to us by mail at:
The Robotics Institute
5000 Forbes Avenue
Pittsburgh, PA 15213
We will send qualified users a user name and password via email,
so that you can access the executable by downloading Communicator
Library v1.4.1_Apr2003 (Jar) from the downloads page.
Interface, and instructions for use.
the Data for TextMiner Testing/Training
Young-Woo Seo, Joseph A. Giampapa, and Katia P. Sycara, "Financial
news analysis for intelligent portfolio management," Tech.
Report CMU-RI-TR-04-04, Robotics Institute, Carnegie Mellon
University, Jan. 2004.
Young-Woo Seo, Joseph A. Giampapa, and Katia P. Sycara, "Text
classification for intelligent agent portfolio management,"
Tech. Report CMU-RI-TR-02-14, Robotics Institute, Carnegie
Mellon University, May 2002.
Robotics Institute Project Page