---------------------------------------- ----------------------------------------

Tutorials to be held at the International Conference on Machine Learning
(ICML-2003)

---------------------------------------- ----------------------------------------

The Twentieth International Conference on Machine Learning (ICML-2003) will be held in Washington, D.C., August 21-24, 2003.
The conference will bring together researchers to exchange ideas and report on recent progress in the field of machine learning. The conference is collocated with The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003).

The tutorials will be held on the first day of the conference, August 21, 2003, at the conference site.

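List of Tutorials

Mining Time Series Data, by Christos Faloutsos
The State of the Art in Language Modeling, by Joshua Goodman
Practical Sample Complexity, by John Langford
Machine Learning and Genetic Microarrays, by Jude Shavlik and David Page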

To attend any of the tutorials, please register for the conference.

Description of Tutorials

Mining Time Series Data, by Christos Faloutsos

How can we find patterns in a sequence of sensor measurements (e.g., a sequence of temperatures or water-pollutant measurements)? How can we compress it? What are the major tools for forecasting and outlier detection? The objective of this tutorial is to provide a concise and intuitive overview of the most important tools that can help us find patterns in sensor sequences. Sensor data analysis is becoming increasingly important, thanks to the decreasing cost of hardware and the increasing on-sensor processing abilities. We review the state of the art in three related fields: (a) fast similarity search for time sequences, (b) linear forecasting with the traditional AR (autoregressive) and ARIMA methodologies, and (c) non-linear forecasting for chaotic/self-similar time sequences, using lag-plots and fractals. The emphasis of the tutorial is on giving the intuition behind these powerful tools, which is usually lost in the technical literature, and on presenting case studies that illustrate their practical use.
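To give a flavor of item (b), the sketch below fits the simplest autoregressive model, AR(1), by least squares and uses it to forecast. It is an illustration only and is not taken from the tutorial notes; the function names and the toy temperature-like series are assumptions made for this example.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + c (hypothetical helper)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var = sum((p - mean_prev) ** 2 for p in prev)
    if var == 0:
        raise ValueError("series is constant; slope is undefined")
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    a = cov / var
    c = mean_curr - a * mean_prev
    return a, c


def forecast(series, steps):
    """Iterate the fitted AR(1) recurrence 'steps' times past the last value."""
    a, c = fit_ar1(series)
    x = series[-1]
    predictions = []
    for _ in range(steps):
        x = a * x + c
        predictions.append(x)
    return predictions


# Toy series generated by x[t] = 0.5 * x[t-1] + 1 (assumed data, not real sensor readings).
readings = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]
print(fit_ar1(readings))
print(forecast(readings, 3))
```

Higher-order AR and ARIMA models extend the same idea with more lags, differencing, and moving-average terms.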

Tutorial Notes are available on line.

The State of the Art in Language Modeling, by Joshua Goodman

Language models give the probability of word sequences, e.g., "recognize speech" is much more probable than "wreck a nice beach." They are used in speech recognition, machine translation, and many other areas. Because of size and data issues, language modeling is an especially challenging subfield of machine learning. The bulk of the tutorial will describe current techniques in language modeling, including techniques like word clustering and smoothing (regularization) that are useful in many areas besides language modeling, and more language-model-specific techniques such as high-order n-grams and sentence mixture models. Finally, we will describe applications of language modeling in more detail, including applications outside of language, as well as available toolkits and corpora.
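As a rough illustration of what smoothing does, the sketch below trains a bigram model with add-one (Laplace) smoothing on a tiny corpus and scores two sentences. Add-one smoothing is chosen here only because it is the simplest scheme to show; it is not claimed to be the technique the tutorial recommends. The corpus and function names are assumptions made for this example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their contexts; '<s>' and '</s>' mark sentence boundaries."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])  # tokens that can be predicted ('<s>' never is)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def sentence_prob(sentence, context_counts, bigram_counts, vocab):
    """Probability of a sentence under the add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing: unseen bigrams get a small non-zero probability.
        prob *= (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + v)
    return prob


# Toy corpus (assumed for illustration only).
corpus = ["we recognize speech", "they recognize speech", "a nice beach"]
model = train_bigram(corpus)
print(sentence_prob("recognize speech", *model))
print(sentence_prob("wreck a nice beach", *model))
```

Without the added counts, any sentence containing an unseen bigram would receive probability zero. Words never seen in training ("wreck" here) are handled only crudely by this sketch.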

Tutorial Notes are available on line.

Practical Sample Complexity, by John Langford

A collection of work from several researchers has resulted in a set of sample complexity results which can be useful to machine learning researchers now. The goal of this tutorial is to make researchers in the field of machine learning aware of the state of the art in sample complexity bound techniques from learning theory. There are immediately useful implications for error reporting (which it would be nice to see adopted). There are also longer-term implications for the design of learning algorithms and research projects. We would like to increase the level of contact between the learning theory and machine learning communities. Learning theory might benefit by working on more relevant problems, while machine learning can benefit from the sharpened intuitions learning theory brings and from more rigorous systems for learning application and reporting. This presentation is a step on that path.
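As one example of how a bound can inform error reporting, the sketch below computes a held-out test-set bound from Hoeffding's inequality: with probability at least 1 - delta over the draw of m independent test examples, the true error is at most the observed test error plus sqrt(ln(1/delta) / (2m)). This is a standard but loose bound, used here for simplicity; it is not claimed to be the specific bound presented in the tutorial. Function names and the numbers in the usage lines are assumptions made for this example.

```python
import math


def hoeffding_upper_bound(test_errors, test_size, delta=0.05):
    """Upper bound on true error that holds with probability >= 1 - delta."""
    if test_size <= 0 or not 0 < delta < 1:
        raise ValueError("need test_size > 0 and 0 < delta < 1")
    empirical_error = test_errors / test_size
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * test_size))
    return min(1.0, empirical_error + slack)


def required_test_size(epsilon, delta=0.05):
    """Smallest test-set size for which the Hoeffding slack is at most epsilon."""
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.ceil(math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# Hypothetical experiment: 50 mistakes on 1000 held-out examples.
print(hoeffding_upper_bound(50, 1000, delta=0.05))
print(required_test_size(0.05, delta=0.05))
```

Reporting the bound alongside the raw test error makes explicit how much of an observed difference between two classifiers could be due to the finite test set.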

Tutorial Notes are available on line.

Machine Learning and Genetic Microarrays, by Jude Shavlik and David Page

Gene-expression microarrays, commonly called "gene chips", make it possible to simultaneously measure the rate at which a cell or tissue is expressing each of its thousands of genes. One can use these comprehensive snapshots of biological activity to infer regulatory pathways in cells, identify novel targets for drug design, and improve the diagnosis, prognosis, and treatment planning for those suffering from disease. However, the amount of data this new technology produces is more than one can manually analyze. Hence, the need for automated analysis of microarray data offers an opportunity for machine learning to have a significant impact on biology and medicine. In this tutorial we introduce microarray technology, the data it produces, and the types of machine learning tasks that naturally arise with this data. We review some of the recent prominent applications of machine learning to gene-chip data, as well as major challenges that gene-chip data poses for machine learning. We close the tutorial by describing related tasks where machine learning may have a further impact on biology and medicine, and by describing additional types of interesting data that recent advances in biotechnology allow biomedical researchers to collect.

Tutorial Notes are available on line.

Additional Information

For additional information, see the conference web site: http://www.hpl.hp.com/conferences/icml03/, which will provide additional details as they become available.

---------------------------------------- ----------------------------------------

For any further questions regarding the tutorials, please e-mail the ICML-2003 Workshop and Tutorial Chair: Dunja.Mladenic@ijs.si.