Research
I'm interested in text information retrieval and applications of natural language understanding, using tools such as statistical machine learning, logic/semantics and computational linguistics.
My current focus is on structured text retrieval. The grand goal is to automatically create structured queries from unstructured queries to improve retrieval effectiveness. We exploit several kinds of structure: the structure of the information need (as identified from the original keyword query), the predicted matching between query terms/concepts and relevant documents, and the available document structure from natural language parsing tools or XML/HTML fields, which enhances the expressivity of structured queries. We develop and use the structured retrieval abilities of the Indri search engine of the Lemur project. We support applications such as ad hoc retrieval, relevance feedback, pseudo-relevance feedback, question answering, intelligent tutoring and XML retrieval. I also work on legal search and patent search, which are structure-heavy in their own ways.
- Le Zhao and Jamie Callan. "Effective and efficient structured retrieval" (poster description). In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009). Hong Kong. To appear. paper poster
- Le Zhao and Jamie Callan. "A generative retrieval model for structured documents". In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008). Napa Valley, USA. paper slides
- Yangbo Zhu, Le Zhao, Jamie Callan and Jaime Carbonell. "Structured Queries for Legal Search". TREC 2007. November 2007. paper TREC poster
- Le Zhao, Min Zhang, Shaoping Ma. "The nature of novelty detection", Journal of Information Retrieval, 9(5): pp. 521-541, November 2006 paper
Teaching
Working
- 2006.3-2006.6 Internship at Sogou.com, a Chinese search engine company, where I worked to improve the relevance ranking of web documents.
Research Interests and Interesting Courses
- Query structure (aspects) and Document structure (annotations, parse trees, etc.), Structured retrieval models, XML Retrieval, Search Engine Indexing
- Text Retrieval (Sentence level novelty detection, probabilistic models, language modeling, formalizing the notion of Relevance) and Web Search IR 11-741, Advanced IR seminar 11-743
- Natural Language Processing (syntactic and semantic theories of natural language understanding, statistical or rule-based) Algorithms in NLP 11-711, Language and Statistics I 11-761, Language and Statistics II 11-762, Grammar Formalisms 11-722 (There are not many courses about how to get the semantics -- e.g. event-patient-agent -- out of natural language texts, and this is a great introductory course on that.)
- Data Mining & Database Multimedia Databases and Data Mining 15-826 (In the real world: many bursty distributions, power laws, fractals; ideas about graph mining and analyzing real-world data.)
- Machine Learning (with its relation to language and intelligence, mostly applying/devising ML tools for NLP) Advanced ML seminar 11-745 (statistical analysis, problems, methods, bias-variance etc.), Graphical Models 10-708 (modeling random variables and the relations among them in a graph, and solving inference problems efficiently)
Latest!
- 2008-02-20 -> Jasmine is staying in Pittsburgh and living happily ever after with Le!
- 2007-01-25 -> 03-15 Jasmine coming to Pittsburgh!
- 2006-08-15 -> 08-17 Graduate School Orientation, see my space for photos.
- 2006-08-01 Leaving for LTI, CMU (Pittsburgh) for yet another Master's degree... Hopefully continuing on to a PhD.
- 2006-06-17 -> 06-18 Going to Jasmine's home (our home in Shijiazhuang).
- 2006-06-13 -> 06-17 Honeymoon: Yalong Bay, Sanya. This will be the hotel.
- 2006-03-31 Married to Jasmine! (Take a look if you can, as she is much better than I am at expressing the excitement of marriage and the experience of love. I am certainly at least as excited and happy as she is.)
Some Interesting Resources
- Topical Words: Top 100 popular words in low grade-level ranges (grades 5-8 in K-12), and popular words in topics such as Arts, Business, Computer, Health, Science, Society, Sports, Music, MovieAndTheater, Biology, Fitness, Religion, Politics, LawAndCrime, History etc. The word lists start from the 3rd column.
- Kid sites list (a fairly complete one):
This is a list of about 1,800 websites that are of low reading difficulty level. Not every one of them is very good, but there are many interesting ones.
A byproduct of research (the first number is a popularity score, indicating whether the site has many low-difficulty pages). Boys and girls, enjoy!
Other Interests
- Probability, Statistics, Stochastic Processes, Measure Theory, Functional Analysis
- Logic and Linguistics (anything related to human intelligence, and computerizing it)
- Differential Manifolds, Topology
- Philosophy, Psychology, Buddhism, Aesthetics, Abstract theories of human intellects
- Literature (reading), Classical Music (Beethoven, Chopin, Mozart), Movies
- Volleyball (LTI won the championship in 2007, 2008 and 2009!), Badminton, Tennis, Swimming, Skiing (a lot of fun), Parachuting, Camping (Pity! Never tried these two before)
- Cuisine (keep improving), Houseology (figuring out ways to keep housework simple while keeping the house neat -- I once read this term on the web, so I borrowed it here.)
Contact Me
username@cs.cmu.edu (change username->lezhao)
Le Zhao
Last Update: 2010-01-23