Xiaohua Yan


  • 5000 Forbes Ave, Pittsburgh
  • xiaohuay@cs.cmu.edu
  • +1 412 230 6988
  • www.cs.cmu.edu/~xiaohuay


Machine Learning | Natural Language Processing | Bioinformatics


Aug. 2012 - Present

Carnegie Mellon University, Pittsburgh, PA, USA
Master's in Language Technologies | GPA: 3.7 | Advisor: Joy Y. Zhang

Sep. 2008 - June 2012

Nanjing University of Technology, Nanjing, China
Bachelor of Science in Information and Computing Science | GPA: 3.9 | Advisor: Jianfeng Shao

Sep. 2005 - July 2008

Suzhou High School, Suzhou, China
Preparatory years for the Chinese National College Entrance Examination (Physics/Chemistry section)


Year 2013

Early Detection of Cyber Security Threats using Structured Behavior Modeling. Xiaohua Yan, Joy Y. Zhang.
Submitted to ACM Transactions on Information and System Security

SNR of DNA sequences mapped by general affine transformations of the indicator sequences. Jianfeng Shao, Xiaohua Yan, Shuo Shao.
Journal of Mathematical Biology

Year 2012

3-periodicity of DNA sequence signals (in Chinese). Jianfeng Shao, Xiaohua Yan, Wei Shao and Guoqing Liu.
Journal of Nanjing University of Technology

Numerical mappings of DNA sequences and their effects on 3-base periodicity behavior (in Chinese). Jianfeng Shao, Wei Shao, Xiaohua Yan and Jian Zhao.
Journal of Nanjing University of Technology



Mathematical Analysis, Numerical Analysis, Data Analysis, Probability Theory and Statistics, Discrete Mathematics, Optimization and Operations Research.

Computer Science

Data Structures, Operating Systems, Databases, Computer Graphics.

Artificial Intelligence

Machine Learning, Structured Prediction, Probabilistic Graphical Models, Algorithms in Natural Language Processing.


Digital Signal Processing, Statistical Signal Processing, Digital Image Processing, Information Theory.


Jan. 2013 - Present

Early Detection of Cyber-threats
Advisor: Prof. Joy Zhang

  • Proposed a structured early intrusion detection system that detects cyber attacks during their early planning stage, so that defensive actions can be taken before an actual compromise.

Sep. 2012 - Jan. 2013

Predicting the Outbreaks of Influenza
Advisor: Prof. Roni Rosenfeld

  • Experimented with different models (e.g., Bayesian networks, autoregression) for predicting flu outbreaks from seasonal influenza data.

Apr. 2013

Classification of Fake and Real Articles
Course project of Language and Statistics

  • Led the winning team of the course project, which presented a machine learning solution for distinguishing fake articles from real ones.

Feb. 2013 - May 2013

Named Entity Recognition on Tweets
Course project of Probabilistic Graphical Models

  • Compared sequential and topic models for Named Entity Recognition (NER) on tweets. Proposed a semi-supervised approach to NER on tweets based on the Latent Dirichlet Allocation topic model.

Sep. 2012 - Nov. 2012

Relational Classification of Noun Phrases
Course project of Machine Learning

  • Proposed a novel relation classification model for noun phrases in the NELL knowledge base that both checks whether a pair of noun phrases satisfies a given NELL relation and determines the most probable relation for the pair.

Aug. 2011

Shenzhen Zhongxing Information Technology Co, Ltd, Nanjing Branch
Software Engineer Intern

July 2011

Gene Prediction based on 3-periodicity of Genetic Power Spectra
Undergraduate thesis

  • Proposed a fast algorithm for computing the signal-to-noise ratio of DNA sequences that boosts the performance of gene prediction algorithms and achieves high accuracy at the nucleotide level.



  • Java
  • Python
  • SAS
  • SPSS


  • LaTeX
  • HTML
  • XML
  • CSS


  • English (Fluent) | TOEFL: 112 | GRE: V 660 Q 800 AW 5.0
  • Mandarin Chinese (Native)
