Research Notebook: The Discipline of Machine Learning

A scientific field is best defined by the central question it studies. In machine learning, that question is: "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?"

This covers a broad range of learning tasks, such as how to design autonomous mobile robots that learn to navigate from their own experience; how to data mine medical records to learn which future patients will respond best to which treatments; and how to build search engines that automatically customize themselves to their user's interests.

More precisely, we say that a machine "learns" with respect to a particular task T, performance metric P and type of experience E, if the system reliably improves its performance P at task T following experience E. Depending on how we define T, P and E, the learning task might also be called "data mining," "autonomous discovery," "database updating" or "programming by example."
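
To make the definition concrete, here is a toy sketch (not from the article; it assumes the scikit-learn library and its bundled digits data set, with the task and model chosen purely for illustration) in which task T is classifying handwritten digits, performance P is accuracy on held-out examples, and experience E is a growing number of labeled training examples:

    # A toy sketch of the T/P/E definition above, assuming scikit-learn is available.
    # Task T = classifying handwritten digits; performance P = accuracy on held-out
    # examples; experience E = a growing number of labeled training examples.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for n in (50, 200, 800):                               # growing experience E
        model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
        p = accuracy_score(y_test, model.predict(X_test))  # measured performance P
        print(f"E = {n:3d} examples  ->  P = {p:.2f}")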

Machine learning is a natural outgrowth of computer science and statistics. Where computer science has focused primarily on how to program computers, machine learning asks how we can get computers to program themselves, from experience plus some initial structure. Where statistics has focused primarily on what conclusions can be inferred from data, machine learning incorporates additional questions about what computational architectures and algorithms can be used to most effectively capture, store, index, retrieve and merge these data; how multiple learning subtasks can be orchestrated in a larger system; and questions of computational tractability.

Machine learning is also closely tied to psychology, neuroscience and allied fields. The questions of "how do animals learn?" and "how can computers learn?" are highly intertwined. So far, the insights machine learning has gained from studies of human learning are much weaker than those it has gained from statistics and computer science, due primarily to how little we yet understand about human learning.

Nevertheless, the synergy between studies of machine and human learning is growing, with machine learning algorithms now being suggested as explanations for neural signals observed in learning animals. It's reasonable to expect the synergy between studies of human learning and machine learning to grow substantially, as they are close neighbors in the landscape of core scientific questions.

Other fields --- from biology to economics to control theory --- also have a core interest in the question of how systems can automatically adapt or optimize to their environment, and machine learning will have an increasing exchange of ideas with these fields in the years ahead.

Application Successes

It's worth noting that as late as 1985, there were almost no commercial applications of machine learning. One measure of progress in the field is the growth of significant real-world applications.

Speech Recognition: All currently available commercial systems for speech recognition use machine learning in one fashion or another to train the system to recognize speech. The reason is simple --- speech recognition accuracy is greater if one trains the system than if one tries to program it by hand. In fact, many commercial speech recognition systems involve two distinct learning phases --- one before the software is shipped (training the general system in a speaker-independent fashion), and a second phase after the user purchases the software (to achieve greater accuracy by training in a speaker-dependent fashion).
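
That two-phase pattern can be sketched in miniature. The toy example below is an illustrative assumption only, with a simple scikit-learn classifier and synthetic features standing in for a real recognizer: a speaker-independent model is trained on data pooled from many speakers, then incrementally adapted to one user's data.

    # Phase 1: "speaker-independent" training on pooled data; Phase 2: adaptation to
    # one user whose feature distribution is shifted. Purely synthetic illustration.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(2000, 20))                        # many speakers
    y_pool = (X_pool[:, 0] > 0).astype(int)                     # two "words" to recognize
    model = SGDClassifier(random_state=0)
    model.partial_fit(X_pool, y_pool, classes=np.array([0, 1]))

    X_user = rng.normal(loc=0.8, size=(200, 20))                # one user's shifted data
    y_user = (X_user[:, 0] > 0.8).astype(int)
    for _ in range(5):                                          # a few adaptation passes
        model.partial_fit(X_user, y_user)
    print("accuracy on this user:", model.score(X_user, y_user))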

Computer Vision: Many current vision systems, from face recognition systems to systems that automatically classify microscope images of cells, are developed using machine learning, again because the resulting systems are more accurate than handcrafted programs. One massive-scale application is the system used by the U.S. Postal Service to automatically sort letters containing handwritten addresses. Over 85 percent of handwritten mail in the United States is sorted automatically, using handwriting analysis software trained to very high accuracy using machine learning over a very large data set.

Bio-Surveillance: A variety of government efforts to detect and track disease outbreaks now use machine learning. For example, the Real-time Outbreak and Disease Surveillance, or RODS, project launched by the University of Pittsburgh in 1999 involves collection of admissions reports by emergency rooms and the detection of anomalous patterns of symptoms and their geographical distribution. Current work involves adding in a rich set of additional data, such as retail purchases of over-the-counter medicines, to increase the information flow into the system, further increasing the need for automated learning methods given this even more complex data set.
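
The underlying detection task can also be sketched very simply. The toy example below is not the RODS system or its methods; it uses synthetic daily symptom counts and an off-the-shelf anomaly detector from scikit-learn to flag days whose pattern of complaints looks unlike the rest of the year.

    # Synthetic daily counts of four symptom categories, with an injected spike, fed
    # to a generic anomaly detector. Illustrative only; real surveillance systems
    # model space, time and many more data sources.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    counts = rng.poisson(lam=[30, 12, 8, 5], size=(365, 4))   # a year of ER symptom counts
    counts[200:205, 1] += 40                                   # outbreak-like surge
    flags = IsolationForest(random_state=0).fit_predict(counts)
    print("days flagged as anomalous:", np.flatnonzero(flags == -1))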

Robot Control: Machine learning methods have been successfully used in a number of robot systems. For example, several researchers have demonstrated the use of machine learning to acquire control strategies for stable helicopter flight and helicopter aerobatics.

Accelerating Empirical Sciences: Many data-intensive sciences now make use of machine learning methods to aid in the scientific discovery process. Machine learning is being used to learn models of gene expression in the cell from high-throughput data; to discover unusual astronomical objects from massive data collected by the Sloan Digital Sky Survey; and to characterize the complex patterns of brain activation that indicate different cognitive states of people in fMRI scanners. Machine learning methods are reshaping the practice of many data-intensive empirical sciences, and many of these sciences now hold workshops on machine learning as part of their field's conferences.

Placing Machine Learning Within Computer Science

Given this sample of applications, what can we infer about the future role of machine learning within computer science? The above applications suggest a niche where machine learning has a special role to play. In particular, machine learning methods are already the best methods available for developing two types of applications:

1.) Applications too complex for people to manually design the algorithm: Sensor-based perception tasks, such as speech recognition and computer vision, fall into this category. No one can write a sufficiently accurate recognition program by hand, yet it is relatively easy to collect labeled examples of the desired input-output behavior, which makes machine learning the software development method of choice.

2.) Applications that require software to customize itself to fit its users: This machine learning niche is growing rapidly. Examples include speech recognition programs that learn to recognize the user's voice patterns, online merchants that learn your purchasing preferences, and email readers that evolve to block your particular definition of spam.

While there will remain software applications where machine learning may never be useful, the niche where it will be used is growing as applications grow in complexity; as the demand grows for self-customizing software; as computers gain access to more data; and as we develop increasingly effective machine learning algorithms.

Beyond its obvious role as a method for software development, machine learning is also likely to help reshape our view of computer science. By shifting the question from "how to program computers" to "how to allow them to program themselves," machine learning emphasizes the design of self-monitoring systems that self-diagnose and self-repair, of programs that model their users, and of systems that take advantage of the steady stream of data flowing through them rather than simply processing it.

Some Current Research Questions

Substantial progress has already been made in the development of machine learning algorithms and their underlying theory. The field is moving forward in many directions, exploring a variety of learning tasks and developing a range of underlying theory. Here is a sample of current research questions:
  • Can unlabeled data be helpful for supervised learning? (A minimal self-training sketch appears after this list.)
  • Are there situations where unlabeled data can be guaranteed to improve the expected learning accuracy?
  • How can we transfer what is learned for one task to improve learning in other related tasks?
  • What is the relationship between different learning algorithms, and which should be used when?
  • For learners that actively collect their own training data, what is the most efficient strategy for choosing which new examples to gather as learning proceeds?
  • To what degree can we have both data privacy and the benefits of data mining?
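
As a concrete, deliberately simple answer to the unlabeled-data question above, here is a minimal self-training sketch: fit on the few labeled points, pseudo-label the unlabeled points the model is most confident about, and refit. The data are synthetic and the confidence threshold is an arbitrary assumption; nothing here guarantees improved accuracy.

    # Self-training on synthetic data: 20 labeled points, 480 unlabeled.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y_true = (X[:, 0] + X[:, 1] > 0).astype(int)     # ground truth, mostly hidden
    y_work = y_true.copy()
    labeled = np.zeros(len(X), dtype=bool)
    labeled[:20] = True

    model = LogisticRegression().fit(X[labeled], y_work[labeled])
    for _ in range(5):
        proba = model.predict_proba(X[~labeled])
        idx = np.flatnonzero(~labeled)[proba.max(axis=1) > 0.95]   # assumed threshold
        if idx.size == 0:
            break
        y_work[idx] = model.predict(X[idx])          # adopt pseudo-labels, not true labels
        labeled[idx] = True
        model = LogisticRegression().fit(X[labeled], y_work[labeled])
    print("final accuracy on all points:", model.score(X, y_true))
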
It's also interesting to consider longer-term research questions.

For instance, can we build never-ending learners? The vast majority of machine learning work to date involves running programs on particular data sets, then putting the learner aside and using the result. In contrast, learning in humans and other animals is an ongoing process in which the agent learns many different capabilities, often in a sequenced curriculum, and uses these different learned facts and capabilities in a highly synergistic fashion.

Why not build machine learners that learn in this same cumulative way, becoming increasingly competent rather than halting at some plateau?

Can machine learning theories and algorithms help explain human learning? Recently, theories and algorithms from machine learning have been found relevant to understanding aspects of human and animal learning; for example, machine learning algorithms for discovering sparse representations of naturally occurring images predict surprisingly well the types of visual features found in the early visual cortex of animals. However, theories of animal learning involve considerations that have not yet been addressed in machine learning, such as the role of motivation, fear, urgency, forgetting, and learning over multiple time scales. There is a rich opportunity for cross-fertilization here --- an opportunity to develop a general theory of learning processes covering animals as well as machines, with potential implications for improved strategies for teaching students.
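
The sparse-representation result mentioned above can at least be gestured at in code. The rough sketch below (the sample image, patch size and parameters are all arbitrary assumptions) learns a small dictionary of patch-level features from a natural image; whether such features resemble those found in visual cortex is the empirical claim in the literature, not something this toy example establishes.

    # Learn a sparse dictionary of 8x8 image-patch features with scikit-learn.
    import numpy as np
    from sklearn.datasets import load_sample_image
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.feature_extraction.image import extract_patches_2d

    image = load_sample_image("china.jpg").mean(axis=2) / 255.0   # grayscale natural image
    patches = extract_patches_2d(image, (8, 8), max_patches=2000, random_state=0)
    patches = patches.reshape(len(patches), -1)
    patches -= patches.mean(axis=1, keepdims=True)                # remove each patch's mean

    dico = MiniBatchDictionaryLearning(n_components=49, alpha=1.0, random_state=0)
    dico.fit(patches)
    print("learned dictionary of atoms:", dico.components_.shape)  # (49, 64)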

Can a new generation of computer programming languages directly support writing programs that learn? In many current machine-learning applications, standard machine-learning algorithms are integrated with hand-coded software into a final application program. Why not design a new computer programming language that supports writing programs in which some subroutines are hand-coded while others are specified as "to be learned"? Such a programming language could allow the programmer to declare the inputs and outputs of each "to be learned" subroutine, then select a learning algorithm from the primitives provided by the programming language. Interesting new research issues arise here, such as designing programming language constructs for declaring what training experience should be given to each "to be learned" subroutine, when, and with what safeguards against arbitrary changes to program behavior.
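
To make the idea tangible, here is a hypothetical sketch that fakes such a construct with a decorator in today's Python rather than a new language. The names `to_be_learned` and `train_from` are invented for illustration; no existing language or library provides them.

    # A stub subroutine whose body is "to be learned" from declared training experience.
    from sklearn.tree import DecisionTreeClassifier

    def to_be_learned(learner):
        """Replace the decorated stub with a model trained from supplied examples."""
        def decorate(stub):
            model = learner()
            def call(x):
                return model.predict([x])[0]
            call.train_from = lambda X, y: model.fit(X, y)   # the declared training experience
            call.__doc__ = stub.__doc__                      # keep the declared contract
            return call
        return decorate

    @to_be_learned(DecisionTreeClassifier)
    def is_spam(features):
        """Declared contract: feature vector in, 0/1 label out; the body is learned."""

    is_spam.train_from([[0, 1], [1, 0], [1, 1], [0, 0]], [0, 1, 1, 0])
    print(is_spam([1, 0]))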

Will computer perception merge with machine learning? Given the increasing use of machine learning for state-of-the-art computer vision, computer speech recognition and other forms of computer perception, can we develop a general theory of perception grounded in learning processes? One intriguing opportunity here is the incorporation of multiple senses (sight, sound, touch, etc.) to provide a setting in which self-supervised learning could be applied to predict one sensory experience from the others. Already researchers in developmental psychology and education have observed that learning can be more effective when people are provided multiple input modalities, and work on co-training methods from machine learning suggests the same.
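
The multi-sensory, self-supervised idea can be sketched in miniature as well. In the toy example below, two synthetic "senses" are just noisy projections of a shared hidden cause, and one is predicted from the other with no human-provided labels at all; the data and model are illustrative assumptions only.

    # Cross-modal prediction: learn to predict "sound" features from "sight" features.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(1000, 4))                     # shared hidden cause
    sight = latent @ rng.normal(size=(4, 30)) + 0.1 * rng.normal(size=(1000, 30))
    sound = latent @ rng.normal(size=(4, 10)) + 0.1 * rng.normal(size=(1000, 10))

    Xs_tr, Xs_te, Xo_tr, Xo_te = train_test_split(sight, sound, random_state=0)
    model = Ridge().fit(Xs_tr, Xo_tr)                       # supervision comes for free
    print("held-out R^2 predicting sound from sight:", round(model.score(Xs_te, Xo_te), 2))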

Considering Ethical Questions

While it's impossible to predict the future, further research in machine learning will almost certainly produce more powerful computer capabilities. This, in turn, will lead on occasion to ethical questions about where and when to apply the resulting technology.

Consider that today's technology could enable discovering unanticipated side effects of new drugs, if it were applied to data describing all doctor visits and medical records in the country along with all purchases of drugs. In recent cases where new drugs were recalled only after a number of unanticipated patient deaths, the harm might well have been reduced by machine-learning methods that are already available. However, applying this machine-learning technology would also affect our personal privacy, as our medical records and drug purchases would have to be captured and analyzed. Is this something we wish as a society to do?

Related questions occur about collecting data for security and law enforcement or for marketing purposes. Like any powerful technology, machine learning will raise its share of questions about whether it should be used for particular purposes. Although the answer to each of these questions will have a technical component, in some cases the question will also have a social policy component requiring all of us to become engaged in deciding its answer.

Tom Mitchell is the Fredkin Professor of Artificial Intelligence and Machine Learning and chair of the Machine Learning Department --- the first of its kind when it was established in 2006. In May 2009, he was named a University Professor, the highest distinction for a member of the Carnegie Mellon faculty. A graduate of the Massachusetts Institute of Technology, Mitchell earned his Ph.D. in electrical engineering with a minor in computer science at Stanford University. A former president of the Association for the Advancement of Artificial Intelligence, he is a fellow of both the AAAI and the American Association for the Advancement of Science and winner of the 2007 AAAI Distinguished Service Award.
For More Information: 
Jason Togyer | 412-268-8721 | jt3y@cs.cmu.edu