I am a final-year Ph.D. student in the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University, working with my fantastic advisor, Eduard Hovy. I have interned at Facebook AI, the Allen Institute for AI (AI2), and Microsoft Research. My Ph.D. studies have been supported by the Allen Institute for AI (AI2) Fellowship, the CMU Presidential Fellowship, and the ILJU Graduate Fellowship. Partway through my studies, I completed my alternative military service in South Korea at Naver Labs and the KAIST Institute. Before joining CMU, I obtained my BS and MS in Computer Science Engineering at KAIST, Korea.
Neural-symbolic integration for knowledge-augmented generation. Open-ended QA is a sub-task of language generation in which external knowledge and reasoning over it ("what" in the ideational metafunction) are required to answer the question. To take advantage of two complementary paradigms (neural and symbolic), I developed three ways of integrating neural and symbolic systems: through embedding representations, data [ACL18adventure], and modules [EMNLP18nsnet]. These methods fill the knowledge gaps of the neural system and achieve state-of-the-art performance on science QA tasks.
Text-planning for coherently-structured generation. A multi-sentence text (e.g., a paragraph) contains various forms of inductive structure that can be represented linguistically: co-reference, discourse, scripts, and more. Our proposed text-planning approach horizontally or vertically controls the high-level plans ("how" in the textual metafunction) using these linguistic structures before surface realization. Our results show that linguistic structures (e.g., relations [EMNLP19flownet][EMNLP17cgraph], scripts, policies [EMNLP19gorecdial], aspects [NAACL18peerread]) guide the model to produce more coherent text across different paragraph-completion tasks.
Cross-stylization for stylistic generation. Style is a strategic choice of text for some personal or social goal ("who" in the interpersonal metafunction). It is formed by a complex combination of stylistic factors, including formality, emotion, personal demographic traits, and more. Studying the nature of these co-varying combinations across different styles, which we call cross-stylization, sheds light on stylistic language in general. We addressed two fundamental challenges in the cross-style study: (1) the lack of a parallel dataset for addressing semantic drift [EMNLP19pastel] and (2) the lack of a benchmark for cross-style correlation studies [ARXIV19xslue].
Title: "Linguistically Informed Language Generation: A Multifaceted Approach"
xSLUE: A Benchmark and Analysis Platform for Cross-Style Language Understanding and Evaluation
Dongyeop Kang and Eduard Hovy
Under review (last updated: November 9, 2019) [pdf | arxiv | data+leaderboard | code | bib]
Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization
Taehee Jung*, Dongyeop Kang*, Lucas Mentch and Eduard Hovy (*equal contribution)
EMNLP 2019 (Long) [pdf | arxiv | data+leaderboard | code | bib]
Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue
Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul Crook, Y-Lan Boureau and Jason Weston
EMNLP 2019 (Long) [pdf | arxiv | data | bib]
(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas
Dongyeop Kang, Varun Gangal and Eduard Hovy
EMNLP 2019 (Long, Oral) [pdf | arxiv | data+code | slides | bib]
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz
NAACL 2018 (Long) [pdf | arxiv | data+code | bib]
Eventera: Real-time Event Recommendation System from Massive Heterogeneous Online Media
Dongyeop Kang, DongGyun Han, Na Hea Park, Sangtae Kim, U Kang, Soobin Lee
ICDM 2014 (Demo) [pdf | bib | project page | demo]
Multidimensional Mining of Large-Scale Search Logs: A Topic-Concept Cube Approach
Dongyeop Kang, Daxin Jiang, Jian Pei, Zhen Liao, Xiaohui Sun, Ho-Jin Choi
WSDM 2011 (Long) [pdf | bib | journal version]
Program Committee / Reviewer of ICLR20, ACL20 (Generation track), NeurIPS20, EMNLP20
Program Committee / Reviewer of ICLR19, ICML19, NAACL19 (Style track), ACL19 (Machine Learning / Generation / Question Answering / Sentence-level Semantics / Applications track), EMNLP19 (Lexical Semantics track), NeurIPS19, W-NUT19, Scientometrics19
Program Committee / Reviewer of NeurIPS18 (top-30% reviewer), EMNLP18 (Discourse track), ACL18 (Machine Learning track) (top-reviewer), MRQA18
Eventrader, founder, 2015, Seoul, Korea
Naver Labs, researcher, 2014-2015, Seoul, Korea
KAIST Institute, researcher, 2012-2014, Seoul, Korea
ETRI, intern, 2008, Daejeon, Korea
CMU, teaching assistant, Machine Translation and Sequence to Sequence Models (instructor: Graham Neubig), 2017 Spring
Last updated in January 2020