William W. Cohen
Director, Research Engineering, Google AI
News: I have moved to Google! Starting in June 2018, I will be
setting up and leading a new AI/ML research group, located at
Google's Bakery Square office in Pittsburgh.
William Cohen is a Director of Research & Engineering at Google AI,
and is based in Google's Pittsburgh office. He received his bachelor's
degree in Computer Science from
Duke University in 1984, and a PhD
in Computer Science from Rutgers
University in 1990. From 1990 to 2000 Dr. Cohen worked at
AT&T Bell Labs and
later AT&T Labs-Research,
and from April 2000 to May 2002 Dr. Cohen worked
at Whizbang Labs, a company
specializing in extracting information from the web. From 2002 to
2018, Dr. Cohen worked at Carnegie Mellon University in
the Machine Learning Department,
with a joint appointment in
the Language Technologies
Institute, as an Associate Research Professor, a Research
Professor, and a Professor. Dr. Cohen was also the Director of the
Undergraduate Minor in Machine Learning at CMU and co-Director of the
Master of Science in ML Program.
Dr. Cohen is a past president of
the International Machine
Learning Society. In the past he has also served as an action
editor for the
and Machine Learning series of books published
by Morgan & Claypool, for
Intelligence, the Journal of
Machine Learning Research, and
the Journal of Artificial
Intelligence Research. He was General Chair for
the 2008 International
Machine Learning Conference, held July 6-9 at
the University of
Program Co-Chair of
International Machine Learning Conference; and Co-Chair of
International Machine Learning Conference. Dr. Cohen was also the
co-Chair for the 3rd
Int'l AAAI Conference on Weblogs and Social Media, which was held
May 17-20, 2009 in San Jose, and was the co-Program Chair for
the 4th Int'l AAAI
Conference on Weblogs and Social Media. He is
Fellow, and was a winner of the 2008
"Test of Time" Award for the most influential SIGMOD paper of
1998, and the
"Test of Time" Award for the most influential SIGIR paper of
Dr. Cohen's research interests include information integration and
machine learning, particularly information extraction, text
categorization, and learning from large datasets. He has a
long-standing interest in statistical relational learning and learning
models, or learning from data, that display non-trivial structure.
He holds seven
patents related to learning, discovery, information retrieval, and
data integration, and is the author of more than 200 publications.
Dr. Cohen is currently on leave from his position as a Professor in
the Machine Learning Department, with a joint appointment in the
Language Technologies Institute.
Announcements and FAQs
Projects, Publications, Software, Datasets, and Talks
These are now being distributed from my GitHub page.
- Spring 2018: Undergraduate Level Machine Learning with Large Datasets, 10-405, Mon-Wed 3:30-4:20 in GHC 4307
- Fall 2017: Machine Learning with Large Datasets, 10-605 and 10-805, Tues-Thurs 1:30-2:50pm, PH 100.
Learning with Large Datasets, 10-605 and 10-805, Tues-Thurs
1:30-2:50pm, Wean Hall 7500.
- Spring 2016: Machine Learning 10-601, Mon-Wed 10:30-11:50am, GHC 4401, with Maria-Florina Balcan.
- Fall 2015: Machine Learning with Large Datasets, 10-605 and 10-805, Tu-Thu 4:30-5:50pm in Doherty Hall 2210.
- Spring 2015: Machine Learning with Large Datasets, 10-605 and 10-805, Tu-Thu 10:30-11:50am in BH A51
- Fall 2014: 10-601 Machine Learning, Tu-Thu 1:30-2:50, Wean 7500
- Spring 2014: 10-605 Machine Learning with Large Datasets, Mon-Wed 1:30-2:50, Doherty Hall 1112
- Fall 2013: 10-601 Machine Learning, Mon-Wed 4:30-5:50, Doherty Hall 2315 (with Eric Xing).
- Spring 2013: Machine Learning with Large Datasets, Mon-Wed 1:30-2:50, 4307 GHC
- Fall 2012: ML 10-802 and LTI 11-772 (Analysis of Social Media), 10:30-11:50am Tues & Thurs, 4303 Gates Building.
- Fall 2012: 10-915, the MLD Journal Club, 12-1:20pm Tue & Thu, 4101 Gates Building (with Roy Maxion).
- Spring 2012: Machine Learning with Large Datasets, Tues-Thurs 1:30-2:50pm, NSH 1305
- Fall 2011: Structured
Prediction for Language and Other Discrete Data (SPLODD-2011), ML
10-710 and LTI 11-763, Tues-Thursday 3:00-4:20 in Gates-Hillman 4211.
This is co-taught by myself and Noah Smith, and will include some
subjects from Information
Extraction and some from Language and Stats 2. A
machine learning course (10-701 or consent of the instructors) is a
prereq; we don't recommend that you take the course if you have
already taken Information Extraction or Language and Stats 2.
- Spring 2011: ML 10-802 and LTI 11-772 (Analysis of Social Media), 10:30-11:50am Tues & Thurs, 4303 Gates Building.
- Spring 2011: 10-915, the MLD Journal Club, 3-4pm Mon & Wed, 4101 Gates Building.
- Fall 2010: 10-707
(Information Extraction - cross-listed in LTI as 11-748),
1:30-2:50pm Mon & Wed, Gates 4101. The first class is 9/8, the
Wed after Labor Day, to allow incoming students time to attend the IC
- Spring 2010: 10-802 (Analysis of Social Media).
- Fall 2009: 10-707
(Information Extraction), 1:30-2:50pm Mon & Wed, 5222 Gates
- Spring 2008: 10-601 (Machine Learning)
with Tom Mitchell, 3:00-4:30
Mon & Wed in Wean Hall 5409.
- Fall 2007: Analysis of Social
Media, Machine Learning 10-802 and LTI 11-772, with Natalie Glance
(of Google Pittsburgh) - a brand-new seminar course. 4:30-6:30
Tuesdays in Wean Hall 4623.
- Note: This site is the shattered remains of a once-beautiful wiki,
created by the students of 10-802, generously hosted for free by
ScribbleWiki, tragically lost (due to
a combination of RAID drive failures and low-bidder backup schemes),
and then largely recovered using
from various internet caches and archives.
- Fall 2007: Current Topics
in Computational Biology (Journal Club), 02-701. (Announcements). Thursdays from 4:00-5:00 in 411
Mellon Institute (after Cell & Systems Modeling).
- Spring 2007: Information Extraction, Machine
Learning 10-707 and LTI 11-748 - back by popular demand for the first time since 2004!
- Fall 2006: Current Topics in Computational Biology (Journal Club), 02-701.
- Spring 2006: Read the Web, CALD 10-709.
- June 21,23,25, 2005: A mini-course on Minorthird. Materials are below.
- Slides, notes, and sample files from first
- Slides, notes, and sample files from second
- Powerpoint slides from third
- Jar file for minorThird, if you
only want to run the code, not compile it or read it.
The installation process here is:
- Install Java 1.4 or higher (actually, JRE is all you need).
- Download the jar for minorThird
and stick it in some directory.
- Optionally, download the sample data
repository and unpack it into the same directory.
- Change to that same directory and
then run Minorthird with the command:
java -Xmx500M -jar minorthird.jar
This pops up a small launch pad that can be used to
start any of the UI programs. You can also start a particular
main by specifying minorthird.jar as your classpath, for example:
java -Xmx500M -cp minorthird.jar edu.cmu.minorthird.ui.Help
- If you want to do a real install here's the home page on Sourceforge, and
a document on how to do a CVS
- Spring 2004: "Learning to Turn Words into Data:
Machine Learning Approaches to Information Extraction and Information Integration", CALD 10-707 and LTI 11-748.
- Rose Catherine Kanjirathinkal, LTI PhD student.
- Zhilin Yang, LTI PhD student, co-advised with Ruslan Salakhutdinov.
- Bhuwan Dhingra, LTI PhD student, co-advised with Ruslan Salakhutdinov.
- Yifeng Tao, CMU Comp Bio PhD student, co-supervised with Xinghua Lu.
- Fan Yang, MLD PhD student.
- Daniel Spokoyny, LTI PhD student, co-supervised with Taylor Berg-Kirkpatrick.
- Haitian Sun, MLD MS student.
- Qiao Jin, School of Medicine, Tsinghua University
- William Yang Wang (former LTI PhD student, now at UCSB).
- Dana Movshovitz-Attias (former CSD PhD student,
now at Google).
- Bhavana Dalvi Mishra (former LTI PhD student,
co-advised with Jamie Callan, now at AI2)
- Tae Yano, (former LTI
PhD student, co-advised
with Noah Smith, now at Microsoft)
- Nan Li, (former CSD PhD
student, co-advised with Ken
Koedinger, now at D. E. Shaw)
- Ramnath Balasubramanyan, (LTI PhD student, now at Twitter)
- Mahesh Joshi, (former LTI PhD student,
co-advised with Carolyn Rosé, now at eBay)
- Frank Lin, (former LTI PhD student, now at Airbnb)
- Ni Lao (former LTI PhD student, now at Google)
- Richard C. Wang,
(former LTI PhD student co-advised with Bob Frederking, now at Baidu).
- Andrew Arnold
(former MLD PhD student, now at Point72 Asset Management)
- Einat Minkov
(former LTI PhD student, now at Haifa University)
- Vitor Rocha de Carvalho (former LTI PhD student, now at Qualcomm)
- Zhenzhen Kou (former MLD PhD student, now at Google)
- Ezra Winston, MLD Master's student.
- Lanxio (Karen) Xu, MLD Master's student.
- Yuxing Zhang, MLD Master's student.
- Jakob Bauer, MLD 5th-year Master's student
- Kavya Srinet, MCDS Master's student.
- Bhawna Juneja, MCDS Master's student.
- Tom Shen, CMU CSD undergrad
- Yu-Hsin Allen Kuo, LTI MLT student, formerly co-advised with Natasa Miskov-Zivanov
- Rahul Goutam, former LTI MLT student, co-advised with Natasa Miskov-Zivanov
- Malcolm Greaves, former CSD master's student.
- Edoardo Airoldi
(former MLD/Stats PhD student, co-advised with Steve Fienberg)
- Ja-Hui Chang
(visiting faculty from National Central University, Taiwan, 2007-2008)
- Wen Haw Chong (PhD student at Singapore Management University,
visited CMU in 2015-2016).
- Ahn Hoang (PhD student at Singapore Management University,
visited CMU for the 2012-2013 academic year in my group).
- Chong Tat Chua (PhD student at Singapore Management University,
visited CMU for the 2011-2012 academic year in my group).
- Gustavo Lacerda
(former research assistant, co-supervised with Noboru Matsuda and Ken Koedinger, now at UBC)
- Lidong Bing, former
postdoc, now at Tencent.
- Ramesh Nallapati
(former postdoc, co-supervised with John Lafferty, now at IBM Watson)
- Noboru Matsuda
(former postdoc, co-supervised with Ken Koedinger,
now System Scientist in CMU's HCII)
- Pradeep Ravikumar
(former MLD PhD student, co-advised with Steve Fienberg)
- I have been an external committee member for the PhD theses of
I have also been an external committee member for the Master's theses of
Mehrbod Sharifi (CMU) and
Weam Abu-Zaki (CMU).
I am currently an external committee member for Tiancheng Zhao,
Shashank Srivastava, Pradeep Dasigi, and Abulhair Saparov.
William W. Cohen
Professor, Machine Learning Department
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213
8217 Gates Hillman Complex
(shipping address: 6105 Gates Hillman Complex)
voice: 412-268-7664 / fax: 412-268-2205
Assistant: Dorothy Holland-Minkley, GHC 8001, firstname.lastname@example.org
My preferred email address for CMU-related matters is: wcohen AT cs DOT cmu DOT edu