William W. Cohen
Director, Research Engineering, Google AI
News: I have moved to Google! Since June 2018 I have been
starting up and leading a new research group in AI/ML, based
in Pittsburgh at Google's Bakery Square location.
William Cohen is a Principal Scientist at Google,
and is based in Google's Pittsburgh office. He received his bachelor's
degree in Computer Science from
Duke University in 1984, and a PhD
in Computer Science from Rutgers
University in 1990. From 1990 to 2000 Dr. Cohen worked at
AT&T Bell Labs and
later AT&T Labs-Research,
and from April 2000 to May 2002 Dr. Cohen worked
at Whizbang Labs, a company
specializing in extracting information from the web. From 2002 to
2018, Dr. Cohen worked at Carnegie Mellon University in
the Machine Learning Department,
with a joint appointment in
the Language Technologies
Institute, as an Associate Research Professor, a Research
Professor, and a Professor. Dr. Cohen also was the Director of the
Undergraduate Minor in Machine Learning at CMU and co-Director of the
Master of Science in ML Program.
Dr. Cohen is a past president of
the International Machine
Learning Society. In the past he has also served as an action
editor for the Synthesis Lectures on Artificial Intelligence
and Machine Learning series of books published
by Morgan & Claypool, for the journal Artificial
Intelligence, the Journal of
Machine Learning Research, and
the Journal of Artificial
Intelligence Research. He was General Chair for
the 2008 International
Machine Learning Conference, held July 6-9 at
the University of Helsinki; Program Co-Chair of the 2006
International Machine Learning Conference; and Co-Chair of the 1994
International Machine Learning Conference. Dr. Cohen was also the
co-Chair for the 3rd
Int'l AAAI Conference on Weblogs and Social Media, which was held
May 17-20, 2009 in San Jose, and was the co-Program Chair for
the 4th Int'l AAAI
Conference on Weblogs and Social Media. He is an AAAI
Fellow, and was a winner of the 2008
"Test of Time" Award for the most influential SIGMOD paper of
1998, and the
"Test of Time" Award for the most influential SIGIR paper of
Dr. Cohen's research interests include information integration and
machine learning, particularly information extraction, text
categorization and learning from large datasets. He has a
long-standing interest in statistical relational learning, and more generally in learning
models of, or learning from, data that display non-trivial structure.
He holds seven
patents related to learning, discovery, information retrieval, and
data integration, and is the author of more than 200 publications.
Dr. Cohen is currently on leave from his position as a Professor in
the Machine Learning Department, with a joint appointment in the
Language Technologies Institute.
Announcements and FAQs
Projects, Publications, Software, Datasets, and Talks
These are now being distributed from my GitHub page.
- Spring 2018: Undergraduate Level Machine Learning with Large Datasets, 10-405, Mon-Wed 3:30-4:20 in GHC 4307
- Fall 2017: Machine Learning with Large Datasets, 10-605 and 10-805, Tues-Thurs 1:30-2:50pm, PH 100.
Machine Learning with Large Datasets, 10-605 and 10-805, Tues-Thurs
1:30-2:50pm, Wean Hall 7500.
- Spring 2016: Machine Learning 10-601, Mon-Wed 10:30-11:50am, GHC 4401, with Maria-Florina Balcan.
- Fall 2015: Machine Learning with Large Datasets, 10-605 and 10-805, Tu-Thu 4:30-5:50pm in Doherty Hall 2210.
- Spring 2015: Machine Learning with Large Datasets, 10-605 and 10-805, Tu-Thu 10:30-11:50am in BH A51
- Fall 2014: 10-601 Machine Learning, Tu-Thu 1:30-2:50, Wean 7500
- Spring 2014: 10-605 Machine Learning with Large Datasets, Mon-Wed 1:30-2:50, Doherty Hall 1112
- Fall 2013: 10-601 Machine Learning, Mon-Wed 4:30-5:50, Doherty Hall 2315 (with Eric Xing).
- Spring 2013: Machine Learning with Large Datasets, Mon-Wed 1:30-2:50, 4307 GHC
- Fall 2012: ML 10-802 and LTI 11-772 (Analysis of Social Media), 10:30-11:50am Tues & Thurs, 4303 Gates Building.
- Fall 2012: 10-915, the MLD Journal Club, 12-1:20pm Tue & Thu, 4101 Gates Building (with Roy Maxion).
- Spring 2012: Machine Learning with Large Datasets, Tues-Thurs 1:30-2:50pm, NSH 1305
- Fall 2011: Structured
Prediction for Language and Other Discrete Data (SPLODD-2011), ML
10-710 and LTI 11-763, Tues-Thursday 3:00-4:20 in Gates-Hillman 4211.
This is co-taught by myself and Noah Smith, and will include some
subjects from Information
Extraction and some from Language and Stats 2. A
machine learning course (10-701 or consent of the instructors) is a
prerequisite; we don't recommend that you take the course if you have
already taken Information Extraction or Language and Stats 2.
- Spring 2011: ML 10-802 and LTI 11-772 (Analysis of Social Media), 10:30-11:50am Tues & Thurs, 4303 Gates Building.
- Spring 2011: 10-915, the MLD Journal Club, 3-4pm Mon & Wed, 4101 Gates Building.
- Fall 2010: 10-707
(Information Extraction - cross-listed in LTI as 11-748),
1:30-2:50pm Mon & Wed, Gates 4101. The first class is 9/8, the
Wed after Labor Day, to allow incoming students time to attend the IC
- Spring 2010: 10-802 (Analysis of Social Media).
- Fall 2009: 10-707
(Information Extraction), 1:30-2:50pm Mon & Wed, 5222 Gates
- Spring 2008: 10-601 (Machine Learning)
with Tom Mitchell, on 3-4:30
Mon & Wed in Wean Hall 5409.
- Fall 2007: Analysis of Social
Media, Machine Learning 10-802 and LTI 11-772, with Natalie Glance
(of Google Pittsburgh) - a brand-new seminar course. 4:30-6:30
Tuesdays in Wean Hall 4623.
- Note: This site is the shattered remains of a once-beautiful wiki,
created by the students of 10-802, generously hosted for free by
ScribbleWiki, tragically lost (due to
a combination of RAID drive failures and low-bidder backup schemes),
and then largely recovered
from various internet caches and archives.
- Fall 2007: Current Topics
in Computational Biology (Journal Club), 02-701. (Announcements). Thursdays from 4:00-5:00 in 411
Mellon Institute (after Cell & Systems Modeling).
- Spring 2007: Information Extraction, Machine
Learning 10-707 and LTI 11-748 - back by popular demand for the first time since 2004!
- Fall 2006: Current Topics in Computational Biology (Journal Club), 02-701.
- Spring 2006: Read the Web, CALD 10-709.
- June 21, 23, and 25, 2005: A mini-course on Minorthird. Materials are below.
- Slides, notes, and sample files from the first session
- Slides, notes, and sample files from the second session
- PowerPoint slides from the third session
- Jar file for minorThird, if you
only want to run the code, not compile it or read it.
The installation process here is:
- Install Java 1.4 or higher (actually, a JRE is all you need).
- Download the jar for minorThird
and stick it in some directory.
- Optionally, download the sample data
repository and unpack it into the same directory.
- Change to that same directory and
then run Minorthird with the command
java -Xmx500M -jar minorthird.jar
What pops up is a small launch pad that can be used to
start any of the UI programs. You can also start a particular
main class by specifying minorthird.jar as your classpath, for example:
java -Xmx500M -cp minorthird.jar edu.cmu.minorthird.ui.Help
- If you want to do a real install, here's the home page on SourceForge, and
a document on how to do a CVS checkout.
- Spring 2004: "Learning to Turn Words into Data:
Machine Learning Approaches to Information Extraction and Information Integration", CALD 10-707 and LTI 11-748.
- Daniel Spokoyny, LTI PhD student, co-supervised with Taylor Berg-Kirkpatrick.