Eduard Hovy

Carnegie Mellon University
Language Technologies Institute
5000 Forbes Avenue
Pittsburgh, PA 15213
U.S.A.

tel: +1-412-268-6592
email: hovy@cmu.edu

Projects webpage: http://www.edvisees.cs.cmu.edu

Current Positions

Prof. Hovy currently holds the following positions:

Research Professor at the Language Technologies Institute of Carnegie Mellon University. His research focuses on various topics, including aspects of the computational semantics of human language (such as text analysis, event detection and coreference, text summarization and generation, question answering, discourse processing, ontologies, text mining, text annotation, and machine translation evaluation), aspects of social media (such as event detection and tracking, sentiment and opinion analysis, and author profile creation), analysis of the semantics of non-textual information such as tables, and aspects of digital government. For details see below.

Regular High-Level Visiting Scientist, International Guest Academic Talents (IGAT) Program for the Development of University Disciplines in China (111 Program), China (Jan 2008 -- Dec 2017).

Research Directions and Projects

Research can be organized into three principal overlapping directions:

(1) Natural Language Processing / Computational Linguistics / Human Language Technology

Development of sophisticated machine reading, information extraction, parsing, and text analysis technology (relevant publications).
DARPA's programs AIDA, World Modelers, Big Mechanism, DEFT, and Machine Reading all aim to develop NLP and knowledge representation and reasoning techniques for deeper semantic analysis of text and the resultant automated learning of domain information. Prof. Hovy leads or has led the following projects: OPERA (AIDA Program, 2018--, domain: automated hypothesis formation and reasoning based on multilingual news and reports); STORM/SOFIA (World Modelers Program, 2017--, domain: reading to help construct in-depth causal models of world situations) as part of the STORM project headed by researchers at the University of Pittsburgh; RUBICON (Big Mechanism program, 2014--2017, domain: research articles on cancer), which includes researchers at Carnegie Mellon University (CMU), the University of Southern California's Information Sciences Institute (USC/ISI), and Elsevier Inc.; and SAFT (Semantic Analysis and Filtering of Text) (DEFT program, 2012--2015, domain: news articles and reports on violent and legal events), which includes researchers at CMU and USC/ISI. From 2008--12, Prof. Hovy's groups participated in two of DARPA's Machine Reading Program (MRP) teams: RACR (headed by IBM, the team that developed the Watson QA game-playing engine) and ERUDITE (headed by BBN; the OntoNotes corpus was developed as part of this project). The SASO (2004--11) and MRE (2001--04) projects at the Institute for Creative Technologies of the University of Southern California developed virtual humans in virtual reality simulations, employing text-to-semantics parsers and opposite-direction generators developed by Prof. Hovy and students.

Development of text analytics systems and approaches in various domains. One example is Future of Work, a project with Lee Branstetter of CMU that attempts to quantify the effect of Artificial Intelligence developments on the economies of the US (and later, other countries) by identifying all AI-related patents and analyzing how they affect the workforce of the corporations that develop and/or employ the patents; another is the AI in the Auditing Process project, working with auditors at PricewaterhouseCoopers and Pierre Liang to employ NLP and AI methods in support of corporate auditing procedures.

Development of automated question answering systems (relevant publications).
Associated with the above are several QA systems developed at ISI, such as Textmap and Webclopedia (with Dr. Daniel Marcu, Dr. Ulf Hermjakob, Dr. Chin-Yew Lin, and others). This work employed information retrieval, clustering, text summarization, parsing, and text harvesting methods described elsewhere.

Development of automated text summarization systems and automated summarization evaluation theory and technology (relevant publications).
Summarization engines developed by Prof. Hovy, Dr. Chin-Yew Lin, and others at ISI include SUMMARIST (single documents), NeATS (multiple documents), and GOSP (producing headlines). Summarization was used in multilingual text access and management systems such as C*ST*RD and MuST. Summarization evaluation systems include ROUGE (2003--04) developed by Dr. Chin-Yew Lin of ISI with Prof. Hovy, and the BE package (2005--08) developed by Dr. Stephen Tratz and Prof. Hovy.

Research on various aspects of machine translation and automated MT evaluation systems and technology (relevant publications).
For MT evaluation, work in 2002--04 includes a systematization of all major machine translation evaluation measures (the FEMTI survey) with Prof. Maghi King and Dr. Andre Popescu-Belis at the University of Geneva, as well as students and researchers at other universities and commercial MT companies. Work on machine translation included development of the Pangloss MT system (1990--94) together with researchers at CMU and New Mexico State University, which helped establish ISI's Gazelle system headed by Dr. Kevin Knight. The NSF-sponsored IL-Annotation project IAMTC (2003--04), joint with researchers at CMU, University of Maryland, MITRE, Columbia University, and New Mexico State University, focused on Interlingua design and text annotation; see under lexical semantics below.

Development of sophisticated social media analysis and opinion identification technology (relevant publications).
One project focuses on identifying the personality and interaction patterns of people active on social media. This work, conducted with researchers at ISI and elsewhere in the Social Media project, develops techniques for classifying participants in online discussions into roles such as Leader, Follower, Idiot, etc., and for quantifying their degree of persuasiveness and persuadability. It builds upon prior research in the MKIDS-ISI project (2002--05), which developed methods to analyze emails for expertise (of people and groups) and relative social status, using topic signature and speech act recognition. Another project develops techniques to identify new events of interest and track their evolution, building on work done with Dr. Don Metzler and others at ISI in 2010--11 to recognize important events by analyzing the Twitter stream. Earlier, the Psyop project (2004--08) employed information extraction and sentiment analysis technology to extract from online texts entities, events, beliefs, goals, opinions, and other information of interest, and to compose the results into psychologically informative descriptions of people.

Development of theories and systems to perform automated text generation, including multi-sentence text and sentence planners (relevant publications) and single-sentence generators (relevant publications).
Work with researchers at USC's Institute for Creative Technologies to develop a parser and generator for the software agents in virtual reality simulations called SASO (2004--09) and Mission Rehearsal Exercise (2001--04) (this work in collaboration with Dr. David Traum, Dr. Anton Leuski, Dr. David DeVault, and others).
The project Quick!Help focuses on the generation of tailored recipes for poor people (this work in collaboration with Prof. Peter Clarke and Dr. Susan Evans from USC and Andrew Philpot from ISI). This work relates to language tailoring done earlier in the HealthDoc project with Prof. Chrysanne DiMarco from the University of Waterloo, Canada, and Prof. Graeme Hirst from the University of Toronto. Earlier work focused on the development of discourse relations and planners that employ them to ensure the production of coherent multisentential text. This includes a taxonomization of all available discourse relations collected from various sources (1992) and the RST Text Structurer (1987--92). Prof. Hovy's work in 1987--92 included participation in the Penman sentence generator project with researchers in various countries, to develop the then-largest sentence generator in the world.
Prof. Hovy's Ph.D. work focused on the development of a text generation program PAULINE that took into account the pragmatic aspects of communication, since the absence of sensitivity toward hearer and context has been a serious shortcoming of generator programs written to date. In general, he is interested in all facets of communication, especially language, as situated in the wider context of intelligent behavior. Related areas include Artificial Intelligence (work on planning and learning), Linguistics (semantics and pragmatics), Psychology, Philosophy (ontologies), and Theory of Computation.

Development of theory to address problems in multimedia human-computer communication (relevant publications).
This work (1989--2002), conducted with Dr. Yigal Arens of ISI and students, focused specifically on the question of dynamic planning and allocation of information to media during presentation design.

(2) Deep (neural), Distributional, and Lexical Semantics, Ontologies, and Text Mining/Harvesting

Exploration of deep (neural) embedding representations and neural network (deep) processing methods for a variety of NLP and related tasks (relevant publications).
Several separate projects explore ways to make embeddings explainable/understandable (for example, by retrofitting them against semantic ontologies or making them sparse) or to understand the specific information transformation done by neural networks.

Exploration of distributional semantics and its interrelationships with traditional propositional semantics (relevant publications).
In the in-depth reading projects OPERA (2018--), SAFT (2012--17), and earlier projects (funded under DARPA's AIDA, DEFT, and earlier programs), Prof. Hovy is exploring various ways of creating and working with distributional and deep (neural embedding) semantic models that are learned from text in various domains and used to enable automated reasoning and other NLP-based tasks.

Development of shallow semantic representation notations and tools that support manual annotation of large amounts of text with shallow semantic information (relevant publications).
The DARPA-funded OntoNotes project (2008--2012), joint with Dr. Ralph Weischedel and Dr. Lance Ramshaw of BBN, Prof. Mitch Marcus of the University of Pennsylvania, and Prof. Martha Palmer of the University of Colorado, focused on the creation of a large corpus of texts in English, Chinese, and Arabic that was annotated with shallow semantic information (word senses and some coreference). The word-sense information was incorporated into the Omega ontology (see below). The NSF-funded IL-Annot project IAMTC (2003--04), joint with researchers at CMU, University of Maryland, MITRE, Columbia University, and New Mexico State University, focused on stepwise Interlingua design and verification by annotation of texts in 7 languages. In both these projects, the Omega ontology (see below) provided the symbol set for semantic annotation.

Development of large concept taxonomies/ontologies through a combination of merging together existing ontologies, adding to the knowledge by extracting information from online text (see below), and enriching the interdependency relations by extracting information from dictionaries (relevant publications).
The Omega ontology, built at ISI since 2003, contains over 120,000 concept terms and several million instances, in addition to various other information, acquired from a variety of sources, including Princeton's WordNet, NMSU's Mikrokosmos, and ISI's earlier ontology SENSUS (1996--2000). During 2008--2011, in the OntoNotes project (see above), a new Upper Model was built for Omega, and its Entities were thoroughly re-organized. Work on Omega has been performed by Prof. Hovy in collaboration with Mr. Andrew Philpot, Dr. Patrick Pantel, Mr. Michael Fleischman, and Dr. Jerry Hobbs from ISI.

Development of techniques to extract large amounts of instance- and concept-level information from online text (relevant publications).
At ISI, Dr. Zornitsa Kozareva and Prof. Hovy developed the Double-Anchored Pattern (DAP) text harvesting technique and demonstrated its effectiveness for collecting terms and relations, and for organizing them hierarchically, over large amounts of domain texts. (This work was partially done in collaboration with Prof. Ellen Riloff from the University of Utah.)
In several earlier projects since 1996, Prof. Hovy, students, and collaborators developed a series of text mining and information extraction engines, and built collections comprising several million facts (about people, locations, objects, etc.). This information, stored in a database, was in many cases connected to the Omega ontology (see above). The Learning by Reading and Möbius (2005--08) experiments attempted to combine tagging, parsing, semantic analysis, and inference techniques to create a knowledge base automatically from a high school textbook of Chemistry and from texts about the heart and engines, and to answer high school-level test questions about them.

(3) Digital Government and Homeland Security

Assisting law enforcement in the fight against human trafficking (relevant publications).
Developing methods to identify instances of human trafficking, to help locate victims, and to collect and synthesize enough information that trends and patterns can be discovered and used to combat the problem. The system WAT developed in this project performed information extraction, data synthesis, and pattern analysis to help US law enforcement agencies locate and assist underage victims of trafficking.

Developing technology to assist people regarding cybersecurity (relevant publications).
The development of an environment that allows non-experts to learn about cybersecurity, to identify experts and/or publications relevant to specific topics and questions, and to check any software they have received.

Development of techniques to support data interoperability to support effective Digital Government (relevant publications).
The development of systems to automatically find alignments or aliases across and within databases (2003--06). The SiFT system used mutual information technology to detect patterns in the distribution of data values. The government partner in this NSF-funded project was the Environmental Protection Agency (EPA), which provided databases with air quality measurement data. (This work was done at ISI with Mr. Andrew Philpot and Dr. Patrick Pantel.)

Development of sophisticated text analysis of public commentary, such as email, letters, and reports, delivered to the government (relevant publications).
Several projects from 2000--07 addressed a problem faced by government regulation writers: they regularly receive tens to hundreds of thousands of emails and other comments about proposed regulations from the public. Funded by the NSF, the eRule projects were a collaboration between Prof. Stuart Shulman (a political scientist then at the University of Pittsburgh and the University of Massachusetts Amherst), Prof. Jamie Callan (a computer scientist at CMU), Prof. Steven Zavestoski (a sociologist at the University of San Francisco), and Prof. Hovy. Government partners providing data were the Environmental Protection Agency (EPA) and the Department of Transportation (DOT). Research at ISI focused on technology to perform opinion detection and argument structure extraction. This research fed into the analysis of text for psychological profiling in the Psyop project mentioned above.

Development of text analysis of public communications with city government via email (2005). The NSF funded a one-year project to collaborate with the QUALEG group, a European consortium of businesses, researchers, and three cities funded by the EU's eGovernment program to develop ICT for city-to-citizen interaction. Work at ISI focused on the development of a system to classify emails and extract speech acts, opinions, and stakeholders in German.

Development of systems to access multiple heterogeneous databases (relevant publications).
Funded by the NSF (1999--2003), a series of projects addressed a problem that many government agencies face: their data is distributed in various formats over different databases, and evolves to include slightly different variations over the years. Our EDC and AskCal systems provided access to over 50,000 tables of information about gasoline, produced by various Federal Statistics agencies, including the Census Bureau, the Bureau of Labor Statistics, and the Energy Information Administration. The system included a large ontology and a natural language question interpreter. This work was done at ISI in collaboration with Mr. Andrew Philpot and Dr. Jose-Luis Ambite. The external partner in this project was the DGRC team at Columbia University, New York, headed by Dr. Judith Klavans.

Work Experience

Research Professor (Jul 2015--), Associate Research Professor (Sep 2012 -- Jun 2015) at the Language Technologies Institute of Carnegie Mellon University, Pittsburgh, PA

Research Area Co-Lead (2017--), Center for Criminal Investigation and Network Analysis (CINA), a consortium of about 10 universities, including CMU, headed by George Mason University (2017--22). Funded under the Department of Homeland Security University Affiliates Center program.

Co-Director for Research (May 2009 -- Jun 2016) of the Command, Control, and Interoperability Center for Advanced Data Analysis (CCICADA), a Department of Homeland Security Center of Excellence consortium of over a dozen universities

Research Associate Professor (Nov 1999 --) and Research Assistant Professor (Dec 1989 -- Oct 1999), Department of Computer Science, University of Southern California, Los Angeles, CA

Advisory Professor (Oct 2005 -- Aug 2015), Beijing University of Posts and Telecommunications, Beijing, China

Adjunct Professor (Feb 1997 -- Jan 2003) and (May 2010 -- Apr 2015), School of Computer Science, University of Waterloo, Waterloo, Canada

Division Director (2011 -- 2012), ISI Fellow (Aug 2000 -- 2012), Deputy Division Director (Oct 2002 -- 2011), Senior Project Leader (May 1997 -- 2002), Project Leader (Jul 1989 -- Apr 1997), and Computer Scientist (Mar 1987 -- Jun 1989), Information Sciences Institute of the University of Southern California, Los Angeles, CA

Adjunct Professor (Sep 2005 -- Aug 2012), Department of Computer Science, KAIST, Daejeon, Korea

Concurrent Professor (Oct 2008 -- Sep 2011), Department of Computer Science, Northeastern University, Shenyang, China

Director of Research (2000 -- 2008) of the Digital Government Research Center (DGRC) at the Information Sciences Institute of the University of Southern California, Los Angeles, CA

Co-Director (May 1999 -- Jun 2005), Master's Degree Program in Computational Linguistics, University of Southern California, Los Angeles, CA

Honors

Ph.D. honoris causa, University of Antwerp, Belgium. Apr 2015

Ph.D. honoris causa, National University of Distance Education (UNED), Madrid, Spain. Jan 2013

AAAI Fellow, Association for the Advancement of Artificial Intelligence. Feb 2017

ACL Fellow, one of the original 17 Fellows of the Association for Computational Linguistics. Dec 2011

Regular High-Level Visiting Scientist, International Guest Academic Talents (IGAT) Program for the Development of University Disciplines in China (111 Program). (China's Ministry of Education launched the so-called "111" program in September 2006, aiming to invite 1,000 world class academics from the world's top 100 universities to establish 100 innovative research bases in China.) Jan 2008 -- Dec 2012, renewed to Dec 2015

Best Paper Award, North American ACL conference (HLT-NAACL). Faruqui, M., J. Dodge, S.K. Jauhar, C. Dyer, E.H. Hovy, and N.A. Smith. Retrofitting Word Vectors to Semantic Lexicons. Jun 2015

Best Paper Award, IEEE International Conference on Semantic Computing (IEEE-ICSC 2007). Nov 2007

Mellon Award for Excellence in Mentoring, awarded by the USC Center for Excellence in Teaching (Office of the Provost). University of Southern California. Apr 2006

Program Committee Honorary Chair, IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE). Wuhan, China (2005); Beijing, China (2007); Beijing, China (2010)

ISI Fellow. USC Information Sciences Institute. Aug 2000

Scientia prize for best science graduate. Rand Afrikaans University (now called University of Johannesburg), Johannesburg, South Africa. Dec 1977

De Beers Undergraduate Scholarship. De Beers Consolidated Mines. Johannesburg, South Africa. Jan 1975 -- Dec 1978