Research
I study machine learning, natural language processing, and data mining.
My longterm goals include improving mathematical understanding of empirical phenomena, and developing principled approaches to modern deep learning applications.

Publications
In the following list, the asterisk symbol (*) in the author list means equal contribution or alphabetical order.


The Limitations of Limited Context for Constituency Parsing
Yuchen Li*,
Andrej Risteski*
Association for Computational Linguistics (ACL), 2021
bibtex /
slides /
My Twitter post /
Andrej's Twitter post
We prove that, if the context for the parsing decision at each word is unbounded & bidirectional,
then the parsing model has full representational power.
On the other hand, even if the context is bounded in one direction (which is the case for various leading approaches to constituency parsing),
the parsing power is quite limited for hard grammars.


Complexity of Leading Digit Sequences
Xinwei He*,
AJ Hildebrand*,
Yuchen Li*,
Yunyi Zhang*
Journal of Discrete Mathematics & Theoretical Computer Science (DMTCS), vol. 22 no. 1, Automata, Logic and Semantics, 2020
bibtex /
poster
Consider the sequence of leading digits of 2^n: 1, 2, 4, 8, 1, 3, 6, ......
What is the complexity of such sequences?
They are neither periodic nor random.
How to quantify and distinguish the middle ground?
We prove that, the block complexity is linear for the sequence of leading digits of a^n in base b (except for some special cases).
In fact, we give explicit formula for the linear coefficients in terms of a and b.


Discovering Hypernymy in TextRich Heterogeneous Information Network by Exploiting Context Granularity
Yu Shi*,
Jiaming Shen*,
Yuchen Li,
Naijing Zhang,
Xinwei He,
Zhengzhi Lou,
Qi Zhu,
Matthew Walker,
Myunghwan Kim,
Jiawei Han
Conference on Information and Knowledge Management (CIKM), 2019
bibtex
This work discovers hypernymhyponym ("isa" relation) pairs by an innovative combination of information from natural language and graph structure.
Our system outperforms existing methods that utilize signals from text, graph, or both.


ContextSensitive Malicious Spelling Error Correction
Hongyu Gong,
Yuchen Li,
Suma Bhat,
Pramod Viswanath
The Web Conference (WWW), 2019
bibtex
Given a misspelled word, our algorithm generates candidates based on the surface form,
represents the context as a linear space, and
selects the candidate whose embedding is closest to the context subspace.
The spelling correction accuracy is better than the stateoftheart methods, improving the performance in downstream applications.


Temporal Motifs in Heterogeneous Information Networks
Yuchen Li*,
Zhengzhi Lou*,
Yu Shi,
Jiawei Han
Workshop on Mining and Learning with Graphs (MLG), in conjunction with KDD, 2018
bibtex
This work shows the effectiveness of temporal motifs (frequent substructures)
in quantifying the relationship between nodes in heterogeneous information networks,
and proposes efficient algorithms for counting a family of useful motifs.

Template of this page: credit to: link.

