ASHWIN TENGLI

 

 

4 Bayard Rd., Apt #41

 Pittsburgh, PA  15213

 tengli [at] cs [dot] cmu [dot] edu

 

 

Educational Qualifications

Degree | College/University | Year
Masters in Language Technologies | School of Computer Science, Carnegie Mellon University, Pittsburgh | 2004
Bachelor of Engineering (Computer Science and Engineering) | Sri Jayachamarajendra College of Engineering, University of Mysore, India | 2000
P.U.C (Grade 12) | N.V. P.U. College, Gulbarga, India | 1996
S.S.L.C. (Grade 10) | Sri Ramakrishna Vidyashala, Mysore, India | 1994

 

 

Areas of Interest

Machine Learning, Data Mining, Artificial Intelligence, Machine Vision, Information Retrieval and Extraction

 

Experience

Research Programmer

October 2004-Present

Auton Lab, Robotics Institute, CMU

Designed and developed a class of algorithms called Predictive Rule Lists, which combine ideas from model trees and decision lists with robust statistics to efficiently model small sets of dirty data. This research was motivated by real-life data sets that were in short supply, had relatively many predictor variables, and were noisy and suspected of containing outliers. I am currently developing a collection of fast classifiers that efficiently handle both real and symbolic attributes in dense and sparse data. They also use cached sufficient statistics to accelerate various machine learning algorithms. As part of this work I have investigated building efficient ball trees in metric spaces to speed up k-nearest-neighbor search. I am also investigating fast algorithms to compute the Least Median of Squares regression line, for which only combinatorial solutions are currently known. I intend to use geometric properties of the problem along with cached sufficient statistics to speed up the search.

 

 

Research Assistant

September 2002-September 2004

Cross LAnguage Information Retrieval (CLAIR) group, CMU

Experimented with focused crawling to direct the crawl toward web pages of interest. Designed a system to gather and extract multilingual news stories. Was involved in the design of a system that collects statistical information from distributed websites, extracts and indexes it, and provides a Question Answering interface for querying the indexed data. Designed and developed the table agent and the graph agent as part of this system. The table agent learns to extract data from HTML tables by example. The graph agent converts series data into curves and uses curve comparison measures to compute similarity between the data. My work also involved developing a crawler to find web pages with statistical information about universities and querying websites whose information is hidden in databases.

 

 

Software Engineer

June 2001-July 2002

PicoPeta Simputers Pvt. Ltd., Bangalore, India

Worked on the design and development of applications for the Simputer, a handheld device that is a low-cost alternative to PCs. I worked on the design and development of IMLI, a browser that supports the IML markup language. IMLI supports the display of Indian languages and is integrated with a speech-synthesis system capable of synthesizing voice in Indian languages. Also customized the Picogui windowing system for the Simputer and wrote the USB device driver that interfaces the WorldSpace™ satellite radio receiver with the Simputer.

 

 

Software Engineer

September 2000-June 2001

Philips Software Center Pvt. Ltd., Bangalore, India

Worked on the BATE project (Browsing for Access to a TV Environment), which involved developing built-in embedded applications for the Philips range of Digital Satellite Receivers (set-top boxes). I worked as a designer and developer on several modules of the EPG, the Menu, and the supporting DLLs. I also designed and developed a generic display handler that renders the visual components on the TV, and worked on the design of a simple internet browser for TV.

 

Publications

Predictive Rule Lists for Modeling Small Sets of Dirty Data

Ashwin Tengli, Artur Dubrawski and Lujie Chen

In the Proceedings of International Conference on Information and Automation, Colombo, Sri Lanka, 2005

This paper introduces Predictive Rule Lists, new structures that combine ideas from decision lists and model trees with robust statistics. The paper illustrates the utility of the concept using a selection of problems typically approached with multiple linear regression, and provides empirical results that reveal features that may be appealing in practical situations.

 

Learning Table Extraction from Examples

Ashwin Tengli, Yiming Yang and Nianli Ma

In the Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland, 2004

Developed algorithms to extract data from HTML tables. The algorithm leverages layout information in the HTML and learns lexical information from examples. The system performs the complete extraction task and achieves the best performance known to us so far.

 

Resolving Uncertainty in Guided Systems - The Cows and Bulls Problem

Yogananda A.P. and Ashwin Tengli

Presented at Cyberia’99 (a National Conference on Electrical & Computer Science)

Developed efficient algorithms to play a game called the Cows and Bulls problem. The problem involved finding the desired permutation of size k out of n elements.

 

Academic Projects

 

Effectiveness of Wrapper Extraction from Distributed Websites

Studied the effectiveness of existing Wrapper Extraction techniques on HTML documents collected from distributed websites.

 

Stacked Bidirectional Maximum Entropy Markov Model (MEMM)

Developed a bidirectional version of MEMMs to overcome the label bias and observation bias problems, but without incurring the optimization costs of CRFs. Tested it on the sequential classification task of protein secondary structure prediction.

 

Consistency Measure of Sources

Developed trend similarity comparison methods to cluster multidimensional news source data. Developed methods to convert the multidimensional data to curves and methods to measure similarity of news sources using the curves.

 

Role Based Named Entity Extraction

Explored methods for role based named entity extraction for the template filling task. Used feature selection measures such as Information Gain, along with parts of speech, to weight terms. Support Vector Machines were used to classify the named entities into the slots of the template.

 

Converge: A prototype for a public web search engine

Developed a search engine that performs efficient query processing and incremental ranking of pages.

 

 

Graduate Courses

Machine Learning, Information Extraction, Language and Statistics, Information Retrieval, Advanced IR Seminar and Lab, Algorithms for NLP, Advanced AI Concepts, Software Engineering for IT, NLP Lab

 

 

Technical Skills

Languages: C, C++, Perl, Java, Pascal

Operating Systems:  Linux, Windows

 

Databases: PostgreSQL, MySQL

 

Tools: CVS, Ant, Eclipse, Matlab

 

Object Oriented Analysis and Design: UML, Design Patterns