Hideki Shima

Last Updated: Nov 2015
Software Engineer at Duolingo
Office: 5533 Walnut St, 3rd floor
Pittsburgh, PA 15232 USA

I graduated from the Ph.D. program after defending my thesis in Aug 2014. I joined Duolingo in Sep 2014.

I have been interested in applying Natural Language Processing, Machine Learning, and Artificial Intelligence to solve real-world problems. At Duolingo, we develop a language learning app used by 100+ million users all over the world. I mainly work on optimizing the "engine" component that dynamically generates a sequence of challenges so that a student can learn a foreign language efficiently.
While at CMU, my research interests included automatically acquiring paraphrase knowledge using weakly-supervised machine learning models, and its application to Question Answering, Recognizing Textual Entailment, automatic evaluation of Information Access systems, Information Extraction, and Information Retrieval. I was one of the original core technical team members of DeepQA (IBM Watson) who were developing a system to compete on the Jeopardy! TV show against human champions. I created and led RITE, a series of shared tasks for recognizing textual entailment, paraphrase, and contradiction, in which more than 20 research organizations participated internationally.

News: In Aug 2015, my interview article appeared on Lifehacker Japan (link). In April 2013, I released WS4J Web Demo.


(Aug 2006 - Aug 2014)
Carnegie Mellon University, Pittsburgh, PA
Language Technologies Institute, School of Computer Science
Thesis: Paraphrase Pattern Acquisition by Diversifiable Bootstrapping (PDF)
(committee: Teruko Mitamura, Eric Nyberg, Eduard Hovy, and Patrick Pantel; degree awarded in May 2015)

(Aug 2004 - Aug 2006)
Carnegie Mellon University, Pittsburgh, PA
Language Technologies Institute, School of Computer Science

(Apr 2000 - Mar 2004)
B.S., Waseda University, Tokyo, Japan
Information and Computer Science

(Fall 2012 - 2014)
Deep Exploration and Filtering of Text (DEFT) - sponsored by DARPA
The goal of DEFT is to apply sophisticated AI / NLP to enable analysts to efficiently discover not only explicit but also implicit information from orders of magnitude more text documents. Our team, led by Eduard Hovy and Teruko Mitamura, focuses on research involving Event Detection and Event Coreference Resolution. In particular, I worked on the hard cases where the surface forms of the event mentions cannot be matched literally (e.g. "pulled a trigger at" and "shot"). To this end, I built a semi-supervised machine learning model that can automatically acquire a domain-specific thesaurus covering more diverse forms of expression than existing thesauri do.

(Winter 2011 - 2014)
SmartReader - sponsored by NPRP under Qatar National Research Fund
This project, led by Kemal Oflazer and Teruko Mitamura, aims to develop intelligent educational software for English language learners. I worked on a monolingual statistical Machine Translation model for paraphrase generation as well as automatic simplified vocabulary annotation on text.

(Spring 2010 - Fall 2012)
Machine Reading Program (MRP) - sponsored by DARPA under AFRL
The MRP's RACR (Reading and Contextual Reasoning) team was led by Chris Welty (IBM), Eric Nyberg, and Teruko Mitamura. We built a text engine that captures universal knowledge from naturally occurring unstructured text and transforms it into the formal representations used by artificial intelligence (AI) reasoning systems. My contributions include: (1) building a mixture-of-experts model that merges outputs from various NLP components (Entity/Relation Extractors), so that their strengths complement each other; (2) building automatic evaluation software for entity and relation mention extraction accuracy, which also provides tools to analyze errors and visualize result diagrams.

(Sep 2011 - Feb 2012)
Yahoo! FREP Project - employed by Yahoo! Labs (remote part-time)
In this project, led by Emre Velipasaoglu and Eric Nyberg, we worked on developing a model that can be used to identify complicated information needs, using years of question-answer data from Yahoo! Answers. We utilized Apache Hadoop to efficiently obtain features ranging from unigrams and POS tags to supersenses. Collaborators: Pinar Donmez and Ana-Maria Popescu. Position: Research Scientist Student.

(June 2009 - Aug 2009)
Watson (Jeopardy!) Project - employed by IBM Research (full-time)
At IBM T.J. Watson Research Center, New York, I spent three months as a full-time research intern in the Watson (Jeopardy! / DeepQA) Question Answering project (PI: David Ferrucci; mentor: Eric W. Brown). The Watson QA system later became very famous when it played in the Jeopardy! TV show and beat the grand champions. I researched one of the algorithms used to score answer candidates. The algorithm analyzes supporting evidence found in a set of text passages retrieved for each candidate answer, and estimates how well they support the answer by modeling the semantic similarity between the passages and the Jeopardy! clue. The research became part of the Watson system and led to one journal publication.

(Fall 2008 - Spring 2010)
KIJI QA Project - collaborated with IBM Research - Tokyo
We built an English-Japanese Complex Cross-lingual QA system, in collaboration with IBM Research - Tokyo (Koichi Takeda and Hiroshi Kanayama), which can answer various kinds of questions (e.g. definition, biography, relationship, event, person, location, etc.) that may be asked in a Business Intelligence scenario. The system was evaluated in NTCIR ACLIA and performed well. The project also resulted in an HTML-based annotation viewer for UIMA.

(Fall 2004 - Fall 2008)
JAVELIN QA Project - sponsored by AQUAINT under ARDA/DTO/IARPA
As a graduate research assistant, I contributed to building open-domain Factoid and Complex Question Answering systems, where I worked on crosslingual English-to-Japanese (EJ) and monolingual Japanese-to-Japanese (JJ) modules. My experience spans various aspects of QA research, e.g. Question Analysis, Named Entity Transliteration, Document Retrieval, Information Extraction, Answer Summarization, web-based demo, and batch evaluation with automatic error analysis. As a result of intensive research effort, the Javelin system achieved remarkable results in competition-style, evaluation-oriented QA tasks similar to TREC and CLEF; we achieved the best result among participants in the NTCIR-6 CLQA JJ subtask, the NTCIR-7 ACLIA CCLQA EJ and JJ tasks, and the NTCIR-7 ACLIA IR4QA EJ task. PIs: Eric Nyberg, Teruko Mitamura. Collaborators: Ni Lao, Mengqiu Wang, Frank Lin, Matthew Bilotti, Andy Schlaikjer, Jeongwoo Ko, Jim Rankin, Eric Riebling, David Svoboda.

Citations: 350,  h-index: 8,  i10-index: 8.  (Google Scholar as of Nov 2015)

Azab, Mahmoud, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura. 2013. "An NLP-based Reading Tool for Aiding Non-native English Readers". In Proceedings of the 9th Recent Advances in Natural Language Processing (RANLP), 41-48. (Link)

Azab, Mahmoud, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura. 2013. "An English Reading Tool as a NLP Showcase". In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP): System Demonstrations, 5-8. (Link)

Watanabe, Yotaro, Yusuke Miyao, Junta Mizuno, Tomohide Shibata, Hiroshi Kanayama, C.-W. Lee, C.-J. Lin, Shuming Shi, Teruko Mitamura, Noriko Kando, Hideki Shima, and Kohichi Takeda. 2013. "Overview of the Recognizing Inference in Text (RITE-2) at the NTCIR-10 Workshop". In Proceedings of NTCIR-10 Workshop Meeting. (Link)

Miyao, Yusuke, Hideki Shima, Hiroshi Kanayama, and Teruko Mitamura. 2012. "Evaluating Textual Entailment Recognition for University Entrance Examinations", ACM Transactions on Asian Language Information Processing (TALIP) 11 (4), 13. (Link)

Lee, Cheng-Wei, Chuan-Jie Lin, Hideki Shima and Wen-Lian Hsu. 2012. "Evaluating and Enhancing Cross-domain Rank Predictability of Textual Entailment Datasets", IEEE 13th International Conference on Information Reuse and Integration (IRI), 51-58. (Link)

Shima, Hideki and Teruko Mitamura. 2012. "Diversifiable Bootstrapping for Acquiring High-Coverage Paraphrase Resource", in Proceedings of The Language Resource and Evaluation Conference (LREC) 2012, Turkey. (PDF), (PPTX).

Murdock, J. William, James Fan, Adam Lally, Hideki Shima, and Branimir Boguraev. 2012. "Textual Evidence Gathering and Analysis". IBM Journal of Research and Development, Special Issue on DeepQA, 56(3/4), 8:1 - 8:14. (Link)

Shima, Hideki, Hiroshi Kanayama, Cheng-Wei Lee, Chuan-Jie Lin, Teruko Mitamura, Yusuke Miyao, Shuming Shi, and Koichi Takeda. 2011. "Overview of NTCIR-9 RITE: Recognizing Inference in TExt", in Proceedings of NTCIR-9 Workshop, Japan. (PDF), (CODE)

Shima, Hideki, Yuanpeng Li, Naoki Orii, and Teruko Mitamura. 2011. "LTI's Textual Entailment Recognizer System at NTCIR-9 RITE", in Proceedings of NTCIR-9 Workshop, Japan. (PDF).

Shima, Hideki and Teruko Mitamura. 2011. "Diversity-aware Evaluation for Paraphrase Patterns", in Proceedings of TextInfer 2011: The EMNLP 2011 Workshop on Textual Entailment. Edinburgh, Scotland. (PDF), (CODE)

Shima, Hideki and Teruko Mitamura. 2010. "Bootstrap Pattern Learning for Open-Domain CLQA", in Proceedings of NTCIR-8 Workshop, Japan. (PDF)

Mitamura, Teruko, Hideki Shima, Tetsuya Sakai, Noriko Kando, Tatsunori Mori, Koichi Takeda, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, and Cheng-Wei Lee. 2010. "Overview of the NTCIR-8 ACLIA Tasks: Advanced Cross-Lingual Information Access", in Proceedings of NTCIR-8 Workshop, Japan. (PDF)

Sakai, Tetsuya, Hideki Shima, Noriko Kando, Ruihua Song, Chuan-Jie Lin, Teruko Mitamura, and Miho Sugimoto. 2010. "Overview of NTCIR-8 ACLIA IR4QA", in Proceedings of NTCIR-8 Workshop, Japan. (PDF)

Sakai, Tetsuya, Noriko Kando, Hideki Shima, Chuan-Jie Lin, Ruihua Song, Miho Sugimoto and Teruko Mitamura. 2009. "Ranking the NTCIR ACLIA IR4QA Systems without Relevance Assessments", DBSJ Journal, Vol.8, No.2, pp.1-6 (2009) (PDF)

Sakai, Tetsuya, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, and Teruko Mitamura. 2009. "Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments", IPSJ SIG Technical Report Vol.2009-DBS-148 No.9 / Vol.2009-FI-95 No.9, 2009.

Shima, Hideki, Ni Lao, Eric Nyberg and Teruko Mitamura. 2008. "Complex Cross-lingual Question Answering as Sequential Classification and Multi-Document Summarization Task", in Proceedings of NTCIR-7 Workshop, Japan. (PDF)

Lao, Ni, Hideki Shima, Teruko Mitamura and Eric Nyberg. 2008. "Query Expansion and Machine Translation for Robust Cross-Lingual Information Retrieval", in Proceedings of NTCIR-7 Workshop, Japan. (PDF)

Mitamura, Teruko, Eric Nyberg, Hideki Shima, Tsuneaki Kato, Tatsunori Mori, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, Tetsuya Sakai, Donghong Ji and Noriko Kando. 2008. "Overview of the NTCIR-7 ACLIA: Advanced Cross-Lingual Information Access", in Proceedings of NTCIR-7 Workshop, Japan. (PDF)

Sakai, Tetsuya, Noriko Kando, Chuan-Jie Lin, Teruko Mitamura, Hideki Shima, Donghong Ji, Kuang-Hua Chen and Eric Nyberg. 2008. "Overview of the NTCIR-7 ACLIA IR4QA Task", in Proceedings of NTCIR-7 Workshop, Japan. (PDF)

Mitamura, Teruko, Frank Lin, Hideki Shima, Mengqiu Wang, Jeongwoo Ko, Justin Betteridge, Matthew Bilotti, Andrew Schlaikjer and Eric Nyberg. 2007. "JAVELIN III: Cross-Lingual Question Answering from Japanese and Chinese Documents", in Proceedings of NTCIR-6 Workshop, Japan. (PDF)

Shima, Hideki and Teruko Mitamura. 2007. "JAVELIN III: Answering Non-Factoid Questions in Japanese", in Proceedings of NTCIR-6 Workshop, Japan. (PDF)

Mitamura, Teruko, Mengqiu Wang, Hideki Shima and Frank Lin. 2006. "Keyword Translation Accuracy and Cross-Lingual Question Answering in Chinese and Japanese", in Proceedings of EACL 2006 Workshop on MLQA (PDF)

Shima, Hideki, Mengqiu Wang, Frank Lin and Teruko Mitamura. 2006. "Modular Approach to Error Analysis and Evaluation for Multilingual Question Answering", in Proceedings of LREC 2006 (PDF)

Lin, Frank, Hideki Shima, Mengqiu Wang and Teruko Mitamura. 2005. "CMU JAVELIN System for NTCIR5 CLQA1", In Proceedings of the NTCIR-5 Workshop, Tokyo, Japan (PDF)

Shima, Hideki. 2004. "Automatic News Aggregation for Children", Bachelor's Thesis in Waseda University, Tokyo, Japan. (In Japanese).

AWARD   The Allen Newell Award for Research Excellence (with Eric Nyberg, Teruko Mitamura and Nico Schlaefer). Past recipients of this award include Turing Award winners Ken Thompson and Edmund Clarke.

     Graduate teaching and research assistantships
     NTCIR-9 Travel Award by Google
     NTCIR-8 Travel Award by NII
     NTCIR-7 Travel Award by NII

Workshop Chair: IEEE EMRITE (2014), IEEE EMRITE (2013), IEEE EMRITE (2012)
Organizer: NTCIR-10 RITE, DARPA MRP Kick Off - Student Summit (2011), NTCIR-9 RITE, NTCIR-8 ACLIA and NTCIR-7 ACLIA
Program Committee: AIRS (2011), EMNLP (2011), AIRS (2010)
Reviewer: ACL (2010), ACM CIKM (2009), AIRS (2009), ACM SIGIR (2008)
TEACHING ASSISTANT   Spring 2009, 11-792 Software Engineering II (graduate level): Advising four student projects: HoneyDew (meeting scheduling agent that interprets emails), WebRecommender (web page recommendation system), STAT (unsupervised learning toolkit), PIGOptimizer (Hadoop's command optimization subproject).

Fall 2008, 2009 and 2010, 11-791 Software Engineering I (graduate level): Designing and grading individual assignments, exams and team projects. Giving a tutorial lecture on the tools/skills needed in the team project, including Subversion, Trac, Maven2 and Test-Driven Development with JUnit.

SOFTWARE   WS4J (WordNet Similarity for Java) provides Java APIs for several published semantic relatedness/similarity algorithms and runs with various WordNet databases and APIs. (web-based demo)

Wikipedia Redirect extracts pairs of a title and a redirected title (e.g. "USA" -> "United States") from a Wikipedia dump in any language. It is useful for addressing vocabulary mismatch in text, especially for proper nouns.
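As an illustration of how such redirect pairs can address vocabulary mismatch, here is a minimal Python sketch. It is not the tool's code and does not reflect its output format: the `REDIRECTS` table, `canonical_title`, and `same_entity` are hypothetical names, and the pairs other than "USA" -> "United States" are made up for the example.

```python
# Hypothetical redirect pairs (title -> redirected title). A real table would be
# loaded from whatever the extraction step produces; this one is hand-written.
REDIRECTS = {
    "USA": "United States",
    "U.S.A.": "United States",
    "Big Apple": "New York City",
}


def canonical_title(title, redirects=REDIRECTS, max_hops=5):
    """Follow redirects until reaching a title that redirects nowhere."""
    seen = set()
    current = title
    for _ in range(max_hops):
        # Stop at a non-redirecting title, or if a redirect cycle is detected.
        if current not in redirects or current in seen:
            break
        seen.add(current)
        current = redirects[current]
    return current


def same_entity(a, b, redirects=REDIRECTS):
    """Treat two surface forms as matching if they share a canonical title."""
    return canonical_title(a, redirects) == canonical_title(b, redirects)
```

With this table, `same_entity("USA", "U.S.A.")` holds even though the two strings share no exact surface form, which is the kind of proper-noun mismatch that plain string matching misses.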

DIMPLE (DIversity-aware Metric for Pattern Learning Experiment) evaluates paraphrase patterns taking lexical diversity into account. The software comes with a data loader for RTE, MS Paraphrase, and TREC Complex QA evaluation datasets, which could be reused in other projects.

RITE SDK provides a Java framework for rapidly building a Textual Entailment recognition system, particularly for participating in the NTCIR-9 RITE evaluation task. RITE SDK comes with sample code, so you can quickly build a working system by modifying it.

SEPIA is a web-based tool for topic development and evaluation for Information Retrieval and Complex Question Answering. It was officially used in NTCIR-7 ACLIA, NTCIR-8 ACLIA and GeoTime.

JAWJAW is a Java API for the Japanese/English WordNet.

Annotation Viewer based on UIMA CAS Consumer enables anyone unfamiliar with UIMA to browse NLP annotations. Contact me for the code. Here is a sample output from HTML Cas Consumer and ASCII Annotation Cas Consumer.

Indri CAS Consumer is a UIMA component that produces offset annotations for Indri. With this component, you can easily create an Indri index that makes document retrieval with annotated queries possible. Structured indexing (e.g. syntactic dependency, predicate-argument structure) is also supported. It is robust enough to work on a gigabyte-class corpus. (To be released.)

UCR, or UIMA Component Repository, is a web-based repository where developers can upload their UIMA components to share. As one of the start-up members, a group of CMU students and IBM researchers, I contributed to object-oriented analysis, design and implementation, especially in the search part.

... And many other (re-)implementations including:
  • Recall-optimized sequential classifier for answer-bearing sentence extraction
  • Text summarizer based on Maximal Marginal Relevance
  • Bootstrapping relation-instance learner based on Espresso
  • Pattern-based Pseudo-Relevance Feedback addressing NE vocabulary mismatch
  • Integrated automatic evaluation toolkit with BLEU, METEOR, ROUGE, BE, POURPRE,...
  • Factoid QA batch evaluation & error analysis tool (see sample output)
  • Japanese Named Entity tagger based on CRFs
  • Javelin web-based demo (see screendump)
  • Wikipedia inter-page and inter-language link mining (see figures)
  • Shallow Semantic Parser based on Tree-CRFs
  • Baum-Welch unsupervised learner for HMMs
  • Machine-generated fake text classifier
  • Spam filter based on Naive Bayes
  • Protein search engine on Medline biomedical corpus
  • Web-mining tool for proper noun translation (see result)
  • English-French alignment tool
  • Web-mining tool for person name transliteration
  • Wikipedia-gloss annotator for kids (see screendump)
  • Robust bitmap emboldening algorithm (undergrad research with Microsoft Japan, see result)
  • AIBO remote controller with gyroscope and head-mounted display
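To give a flavor of one item above, the following is a minimal Python sketch of sentence selection by Maximal Marginal Relevance. It is not the implementation listed here: the word-overlap (Jaccard) similarity, the function names `jaccard` and `mmr_select`, and the default parameters are assumptions chosen only to keep the example self-contained.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings (a stand-in for a real similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def mmr_select(query, sentences, k=2, lam=0.7):
    """Greedily pick up to k sentences, trading off query relevance against redundancy.

    Each step picks the candidate maximizing
        lam * sim(sentence, query) - (1 - lam) * max sim(sentence, already selected).
    """
    selected = []  # indices of chosen sentences, in selection order
    candidates = list(range(len(sentences)))

    def score(i):
        relevance = jaccard(sentences[i], query)
        redundancy = max(
            (jaccard(sentences[i], sentences[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

Setting `lam=1.0` reduces the procedure to plain relevance ranking, while lower values penalize sentences that repeat what has already been selected.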

IBM Watson Jeopardy! Challenge
15-681 Machine Learning
11-796 Question Answering Lab
11-792 Software Engineering II
11-791 Software Engineering
11-772 Analysis of Social Media
11-761 Language and Statistics
11-748 Information Extraction
11-741 Information Retrieval
11-731 Machine Translation
11-721 Grammars and Lexicons
11-711 Algorithms for NLP

Languages Spoken:   Japanese (mother tongue), English (fluent)
Languages Researched:   Japanese, English, Chinese
Languages Studied:   French (2 years), Chinese (1 year), Spanish (1 year)