|
ASHWIN TENGLI |
tengli [at] cs [dot] cmu [dot] edu |
||||
|
Educational Qualifications |
Degree | College/University | Year |
Masters in Language Technologies | | 2004 |
Bachelor of Engineering (Computer Science and Engineering) | Sri Jayachamarajendra College of Engineering, | 2000 |
P.U.C (Grade 12) | | 1996 |
S.S.L.C. (Grade 10) | Sri Ramakrishna Vidyashala, | 1994 |
||||
|
Areas of Interest |
Machine Learning, Data Mining, Artificial Intelligence, Machine Vision, Information Retrieval and Extraction |
||||
|
Experience |
Research Programmer |
October 2004-Present |
|||
|
Auton Lab, Robotics Institute, CMU. Designed and developed a class of algorithms called Predictive
Rule Lists that combine ideas from model trees and decision lists with
robust statistics to efficiently model small sets of dirty data. This
research was motivated by real-life data sets that were in short
supply, had relatively many predictor variables, and were noisy and suspected
of containing outliers. I am currently developing a collection of
fast classifiers that efficiently handle both real and symbolic attributes in
dense and sparse data. They also use cached sufficient statistics
to accelerate the underlying machine learning algorithms. In this work I have
investigated building efficient ball trees in metric spaces to
speed up k-nearest-neighbor search. I am also investigating
fast algorithms to compute the Least Median of Squares
regression line, for which the currently available solutions are all combinatorial in nature.
I intend to use geometric properties of the problem along with cached
sufficient statistics to speed up the search. |
|||||
|
|
Research Assistant |
September 2002-September 2004 |
|||
|
Cross LAnguage Information Retrieval (CLAIR) group, CMU. Experimented with focused crawling to direct the crawl toward web pages of interest. Designed a system to gather and extract multilingual news stories. Was involved in the design of a system that collects statistical information from distributed websites, extracts and indexes it, and provides a Question Answering interface for querying the indexed data. Designed and developed the table agent and the graph agent as part of this system. The table agent learns to extract data from HTML tables by example. The graph agent converts series data into curves and uses curve comparison measures to compute the similarity between data series. My work also involved developing a crawler to find web pages with statistical information about universities and querying websites whose information is hidden in databases. |
|||||
|
|
Software Engineer |
June 2001-July 2002 |
|||
|
PicoPeta Simputers Pvt. Ltd. Worked on the design and development of applications for the Simputer, a handheld device that is a low-cost alternative to PCs. I worked on the design and development of IMLI, a browser that supports the IML markup language. IMLI supports the display of Indian languages and is integrated with a speech-synthesis system capable of synthesizing speech in Indian languages. Also worked on customizing the PicoGUI windowing system for the Simputer and wrote the USB device driver that interfaces the WorldSpace™ satellite radio receiver with the Simputer. |
|||||
|
|
Software Engineer |
September 2000-June 2001 |
|||
|
Philips Software Center Pvt. Ltd. Worked on the BATE project (Browsing for Access to a TV Environment), which involved developing built-in embedded applications for Philips' range of Digital Satellite Receivers (set-top boxes). I was a designer and developer for some of the modules of the EPG, the Menu, and the supporting DLLs. I also designed and developed a generic display handler that renders visual components on the TV, and worked on the design of a simple internet browser for TV. |
|||||
|
Publications |
Predictive Rule Lists for Modeling Small Sets of Dirty Data
Ashwin Tengli, Artur Dubrawski and Lujie Chen
In the Proceedings of the International Conference on Information and Automation.
This paper introduces Predictive Rule Lists, new structures that combine ideas from decision lists and model trees with robust statistics. The paper illustrates the utility of the concept on a selection of problems typically approached with multiple linear regression, and provides empirical results that reveal features that may be appealing in practical situations.

Learning Table Extraction from Examples
Ashwin Tengli, Yiming Yang and Nianli Ma
In the Proceedings of the 20th International Conference on Computational Linguistics.
Developed algorithms to extract data from HTML tables. The algorithm leverages layout information in the HTML and learns lexical information from examples. The system performs the complete extraction task and achieves the best performance known to us so far.

Resolving Uncertainty in Guided Systems - The Cows and Bulls Problem
Yogananda A.P. and Ashwin Tengli
Presented at Cyberia’99 (a National Conference on Electrical & Computer Science).
Developed efficient algorithms to play a game called the Cows and Bulls problem. The problem involves finding the desired permutation of size k out of n elements. |
||||
|
Academic Projects |
Effectiveness of Wrapper Extraction from Distributed Websites
Studied the effectiveness of existing wrapper extraction techniques on HTML documents collected from distributed websites.

Stacked Bidirectional Maximum Entropy Markov Model (MEMM)
Developed a bidirectional version of MEMMs to overcome the label bias and observation bias problems without incurring the optimization costs of CRFs. Tested it on the sequential classification task of protein secondary structure prediction.

Consistency Measure of Sources
Developed trend similarity comparison methods to cluster multidimensional news source data. Developed methods to convert the multidimensional data to curves and to measure the similarity of news sources using those curves.

Role Based Named Entity Extraction
Explored methods for role-based named entity extraction for the template filling task. Used feature selection measures such as Information Gain, together with parts of speech, to weight terms. A Support Vector Machine classifier was used to assign the named entities to the slots of the template.

Converge: A prototype for a public web search engine
Developed a search engine that performs efficient query processing and incremental ranking of pages. |
||||
|
Graduate Courses |
Machine Learning, Information Extraction, Language and Statistics, Information Retrieval, Advanced IR Seminar and Lab, Algorithms for NLP, Advanced AI Concepts, Software Engineering for IT, NLP Lab |
||||
|
Technical Skills |
Languages: C, C++, Perl, Java, Pascal |
||||
|
|
Operating Systems: Linux, Windows |
||||
|
|
Databases: PostgreSQL, MySQL |
||||
|
|
Tools: CVS, Ant, Eclipse, Matlab |
||||
|
|
Object Oriented Analysis and Design: UML, Design Patterns |
||||