DI XU

5000 Forbes Ave. GHC 5404, Pittsburgh, PA 15213

(765) 409-9707

dix@cs.cmu.edu

OBJECTIVE

To advance in the field of Information Retrieval and Text Mining

To build Machine Learning skills and insight

To gain practical experience in Software Engineering

 

RESEARCH INTERESTS

Information Retrieval, Question Answering, Text Mining, Spoken Term Detection and Machine Learning

 

EDUCATION

Carnegie Mellon University, Pittsburgh, PA

August 2013 – August 2015 (Expected)

Master of Language Technologies, School of Computer Science

Graduate Research Fellowship

GPA: 3.89/4.0

Advisor: Prof. Florian Metze

 

Purdue University, West Lafayette, IN

August 2009 – May 2013

1st Bachelor of Science, Computer Science; 2nd Bachelor of Science, Statistics

GPA: 3.93/4.0 (with distinction)

Major Concentration: Machine Intelligence and Software Engineering

Dean's List since Admission

 

RESEARCH EXPERIENCE

Graduate Research Assistant

August 2014 – Present

Language Technologies Institute, School of Computer Science

Carnegie Mellon University, Pittsburgh, PA

Project: The IARPA Aladdin Video Program

Supervisor: Dr. Alex Hauptmann

•  Experimented with multiple learning-to-rank algorithms for Multimedia Event Detection (MED) late fusion; worked on Multimedia Event Recounting and Summarization

 

Independent Research

Summer 2014

Language Technologies Institute, School of Computer Science

Carnegie Mellon University, Pittsburgh, PA

Project: TREC 2014 – Web Search (Ad hoc)

Supervisor: Prof. Jamie Callan

•  Performed extensive studies of well-known retrieval models; explored topic-modeling-based pseudo-relevance feedback and query expansion approaches; explored multiple learning-to-rank and data fusion techniques

 

Independent Research

Summer 2014

Language Technologies Institute, School of Computer Science

Carnegie Mellon University, Pittsburgh, PA

Project: TREC 2014 – Contextual Suggestion Track

Supervisor: Prof. Jamie Callan

•  Performed large-scale web crawling and mining with the Google, Yelp and Wikipedia APIs; implemented user preference models with various text mining methods; developed a large-scale information system

 

Graduate Research Assistant

August 2013 – August 2014

Language Technologies Institute, School of Computer Science

Carnegie Mellon University, Pittsburgh, PA

Project: The IARPA Babel Program

Supervisor: Prof. Florian Metze

•  Developed and published the word-based Probabilistic Phonetic Retrieval model for spoken term detection in low-resource languages; implemented tools for significance testing; performed system fusion; coordinated and prepared final submissions for the IARPA Babel OP1 evaluation

 

Volunteer Research Assistant

August 2012 – Spring 2014

Department of Computer Science

University of Illinois at Urbana-Champaign, Urbana, IL

Project: Consistent Language Model for Keyword Search over Unstructured Documents

Supervisors: Prof. Marianne Winslett, Dr. Arash Termenchy

•  Implemented state-of-the-art smoothing methods; performed extensive studies to evaluate the effectiveness of multiple language modeling methods; designed novel modeling methods in vector space

 

Undergraduate Research Assistant

May 2012 – May 2013

Department of Statistics

Purdue University, West Lafayette, IN

Project: Pattern Mining over Time Series and Drought Detection

Supervisor: Prof. Sergey Kirshner

•  Implemented Hidden Markov Models and the Viterbi algorithm; developed a geographical and meteorological plotter for a specified area; conducted extensive training and testing on different models

 

Undergraduate Research Intern

May 2012 – August 2012

Information Trust Institute

University of Illinois at Urbana-Champaign, Urbana, IL

Project: Principled and Optimal Language Model for Keyword Search over Structured Documents

Supervisor: Prof. Marianne Winslett

•  Developed novel and effective search algorithms for keyword queries on semi-structured data; implemented multiple retrieval models based on statistical language modeling; developed novel smoothing techniques; performed extensive studies to evaluate the effectiveness of various language modeling approaches

 

Undergraduate Research Assistant

May 2010 – December 2011

Department of Computer Science

Purdue University, West Lafayette, IN

Project: Loop-level Data Dependence Profiling and Multicore Processing

Supervisor: Prof. Zhiyuan Li

•  Developed a framework for testing thread-based parallel C programs; analyzed loop-level data dependencies of the SPEC CPU2000 benchmark 197.parser and developed a thread-based parallel version of it using OpenMP

 

TEACHING EXPERIENCE

Purdue University, West Lafayette, IN

•  C Programming Applications for Engineers (Spring 2011)

•  Programming with Multimedia Objects (Summer 2011, Fall 2012)

•  Introduction to Computers (Fall 2011)

 

PUBLICATIONS

[1]   A. Ge, D. Xu and L. Yang. A Novel Maximum Power Point Tracking Method under Non-uniform Insolation Conditions. Infrared and Laser Engineering, Vol. 42, No. 6, 2013. ISSN: 1007-2276, CN 12-1261/TN.

[2]   D. Xu and F. Metze. Word-based probabilistic phonetic retrieval for low-resource spoken term detection. In 15th Annual Conference of the International Speech Communication Association (ISCA). INTERSPEECH 2014.

[3]   D. Xu, Y. Wang and F. Metze. EM-based phoneme confusion matrix generation for low-resource spoken term detection. In Spoken Language Technology (SLT). IEEE, 2014.

[4]   D. Xu and J. Callan. Modelling Psychological Needs for User-dependent Contextual Suggestion. In Proceedings of the Twenty-Third Text REtrieval Conference (TREC 2014). NIST, to appear.

[5]   D. Xu and J. Callan. Towards a simple and efficient web search framework. In Proceedings of the Twenty-Third Text REtrieval Conference (TREC 2014). NIST, to appear.

 

PROFESSIONAL SKILLS

Java, C, C++, Python, Bash, LaTeX

Lucene, Indri, UIMA, Maven, Git, OpenFST, OpenMP/MPI, SQL, Google Web App, Android

 

PROFESSIONAL ACTIVITIES

IEEEXtreme Programming Competition

Purdue University, Fall 2010

ACM-ICPC Regional Contest

University of Cincinnati, Fall 2011