ICML 2003
Workshop on
Important Dates
Organizers:
The 20th International Conference on Machine Learning (ICML-2003) will be held in Washington, DC on August 21-24, 2003. It will be co-located with KDD and COLT.

Submissions should be sent to rayid.ghani@accenture.com in PDF or PostScript format.

There is a spectrum of ways to use data in machine learning and data mining. At one end is completely unsupervised learning, or clustering; at the other is supervised learning, where the target output is known for every example. This workshop aims to explore the space between these two extremes. Techniques that have been proposed include learning from unlabeled data with hints; learning from unlabeled and positive-only labeled data; learning from distantly and noisily labeled data; combining labeled and unlabeled data with co-training, EM, and other semi-supervised techniques; and transductive learning, where the test data is added as an additional source of unlabeled data. The possible sources of labels and hints are also broad: systematic hand-labeling, labels acquired through active learning, and hints derived from domain knowledge are among those that may be used.

The goal of this workshop is to bring together researchers from different fields to discuss their perspectives on this intersection and to share their latest ideas. We see the workshop as a venue not only for presenting papers on exploiting unlabeled data, but also as a forum for sharing ideas across application domains. In particular, it is an opportunity to discuss techniques that apply to multiple types of datasets, and experiments across many points in the continuum from unsupervised to supervised learning. The use of domain knowledge as a source of partial supervision, and the generation of examples to be labeled by domain experts through active learning, are of particular significance in the data mining context. We are also interested in promoting discussion of diagnostic techniques that can tell the user whether unlabeled data is helping or hurting the performance of the underlying learner.

We see this as a unique opportunity due to the co-location of ICML with KDD. Because of this co-location, we aim to attract researchers from both academia and industry who are involved in data mining. For many data mining problems, large amounts of data have been collected, and the labels are either unknown or expensive to obtain. Examples include security applications (intrusion detection, anomaly detection), CRM (customer interactions, transactional data, call center applications), the financial industry (fraud detection, loan defaults, banking), targeted marketing, and retail applications (supply chain optimization). Most of these applications capture large amounts of unlabeled data but rarely utilize it. We encourage the participation of people working on practical applications where some form of unlabeled data can be beneficial.

The workshop will consist of both regular paper presentations and debates. Regular papers can be up to 8 pages and may address work in progress.

Problem Descriptions from Machine Learning/Data Mining Practitioners: to be decided later

Rayid Ghani
Rosie Jones
Chuck Rosenberg
Kristin Bennett, Rensselaer Polytechnic Institute