Le Zhao

PhD Candidate
Language Technologies Institute
School of Computer Science

   lezhao @ cs . cmu . edu
Office: 4622 Newell Simon Hall
Carnegie Mellon University
Pittsburgh, PA 15213, USA

Advisor: Jamie Callan and our DIRGroup

Introduction

I am a fourth-year PhD student in the Language Technologies Institute (LTI), School of Computer Science (SCS), CMU.

My research interests are in text information retrieval and applications of natural language understanding. My current focus is on structured text retrieval and on identifying applications where structured queries might help. We develop and use the structured retrieval capabilities of the Indri search engine from the Lemur project. With this more powerful query language, we hope to exploit structure in the information needs of search applications and improve retrieval accuracy over simple keyword retrieval (which provides a surprisingly good baseline). We support applications such as Question Answering, Intelligent Tutoring and XML retrieval. Other search applications include legal search, where lawyers issue complex Boolean queries expressing precisely what they are looking for. Patent search might be another application: the documents are structured, and queries may ask about that structure.

Research Interests (+ interesting courses)

  • Query structure (aspects) and Document structure (annotations, parse trees, etc.), Structured retrieval models, XML Retrieval, Search Engine Indexing
  • Text Retrieval (sentence-level novelty detection, probabilistic models, language modeling, formalizing the notion of Relevance) and Web Search. Courses: IR 11-741, Advanced IR seminar 11-743
  • Natural Language Processing (syntactic and semantic theories of natural language understanding, statistical or rule-based). Courses: Algorithms in NLP 11-711, Language and Statistics I 11-761, Language and Statistics II 11-762, Grammar Formalisms 11-722 (There are not many courses about how to get the semantics -- e.g. event-patient-agent -- out of natural language texts, and this is a great basic course on that.)
  • Data Mining & Database. Courses: Multimedia Databases and Data Mining 15-826 (In the real world: many bursty distributions, power laws, fractals... ideas about graph mining and analyzing real-world data.)
  • Machine Learning (with its relation to language and intelligence, mostly applying/devising ML tools for NLP). Courses: Advanced ML seminar 11-745 (statistical analysis, problems, methods, bias-variance etc.), Graphical Models 10-708 (modeling random variables and the relations among them in a graph, and solving inference problems efficiently)

Selected Publications (full list here)

  • Le Zhao and Jamie Callan. "Effective and efficient structured retrieval" (poster description). In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009). Hong Kong. To appear. paper poster
  • Le Zhao and Jamie Callan. "A generative retrieval model for structured documents". In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008), Napa Valley, USA. paper slides
  • Yangbo Zhu, Le Zhao, Jamie Callan and Jaime Carbonell. "Structured Queries for Legal Search". TREC 2007. November 2007. paper TREC poster
  • Le Zhao, Min Zhang, Shaoping Ma. "The nature of novelty detection", Journal of Information Retrieval, 9(5): pp. 521-541, November 2006 paper

Working

  • 2006.3-2006.6 Internship at Sogou.com, a Chinese search engine company, where I worked to improve the relevance ranking of web documents.

Teaching

Latest!

  • 2008-02-20 -> Jasmine is staying in Pittsburgh and living happily ever after with Le!
  • 2007-01-25 -> 03-15 Jasmine coming to Pittsburgh!
  • 2006-08-15 -> 08-17 Graduate School Orientation, see my space for photos.
  • 2006-08-01 Leaving for LTI, CMU (Pittsburgh) for yet another Master's degree... hopefully to continue on to a PhD.
  • 2006-06-17 -> 06-18 Going to Jasmine's home (our home in Shijiazhuang).
  • 2006-06-13 -> 06-17 Honeymoon: Yalong Bay, Sanya. This will be the hotel.
  • 2006-03-31 Married to Jasmine! (Take a look if you can, as she is much better than me at expressing the excitement of marriage and the experience of love. Certainly I am at least as excited and happy as she is.)

Some Interesting Resources

  • Topical Words: Top 100 popular words in low grade level ranges (5-8 in K-12), and popular words in topics such as Arts, Business, Computer, Health, Science, Society, Sports, Music, MovieAndTheater, Biology, Fitness, Religion, Politics, LawAndCrime, History etc. Starts from 3rd column.
  • Kid sites list (a fairly complete one): This is a list of about 1,800 websites with a low reading difficulty level. Not every one of them is very good, but there are many interesting ones. A byproduct of research (the first number is a popularity score, indicating whether the site has many low-difficulty pages). Boys and girls, enjoy!

Other Interests

  • Probability; Statistics, Stochastic Processes, Measure Theory, Functional Analysis
  • Logic and Linguistics (especially interested; it's also related to my research work)
  • Differential Manifolds, Topology
  • Philosophy, Psychology, Buddhism, Aesthetics, Abstract theories of human intellects
  • Literature (reading), Classical Music (Beethoven, Chopin), Movies
  • Volleyball (LTI won the Championship in 2007, 2008 and 2009!), Badminton, Tennis, Swimming, Skiing (a lot of fun), Parachuting, Camping (Pity! Never tried these two before)
  • Cuisine (still improving...), Houseology (figuring out ways to keep housework simple while keeping the house neat -- I once read this term on the web, so I borrowed it here.)

Contact Me

username@cs.cmu.edu (change username->lezhao)
+1-412-268-6748 (Office)

Friends

Jaime Arguello, Matthew Bilotti, Vasco Calais Pedro, Vitor Carvalho, Sourish Chaudhuri, Wei Chen, Kevyn Collins-Thompson, Jonathan Elsas, Bin Fu, Wenjie Fu, Qin Gao, Kevin Gimpel, Fan Guo, Abhay Harpale, Michael Heilmann, Hongwen Kang, Yan Ke, Anagha Kulkarni (Joshi), Mohit Kumar, Rohit Kumar, Abhimanyu Lad, Ni Lao, Lei Li, Fan Li, Chenmin Liang, Frank Lin, Jialiu Lin, Liu Liu, Yan Liu, Jie Lu, Udhyakumar (Udhay) Nallasamy, Paul Ogilvie, Juan Pino, Yanjun Qi, Jinghai Rao, Runting Shi, Hideki Shima, Boting (Henry) Shu, Luo Si, Mengqiu Wang, Yang Wang, Wen Wu, Hong Yan, Rong Yan, Hui (Grace) Yang, Jun Yang, Wei Yu, Xin Zhang, Ying (Joy) Zhang, Yangbo Zhu

Qi-Xing Huang, Jiao Li, Yiqun Liu, Canhui Wang, Min Zhang

Links

Tsinghua IR group, AI lab, www.net9.org(by CS Undergraduates of Tsinghua)
2nd Middle School of Huzhou, Junior Middle School Alumni, High School Alumni


Le Zhao
Last Update: 2009-11-16