Ashwin Tengli

ASHWIN TENGLI		4 Bayard Rd., Apt #41 Pittsburgh, PA 15213 tengli [at] cs [dot] cmu [dot] edu

Educational Qualifications	Degree		College/University		Year
	Masters in Language Technologies		School of Computer Science, Carnegie Mellon University, Pittsburgh		2004
	Bachelor of Engineering (Computer Science and Engineering)		Sri Jayachamarajendra College of Engineering, University of Mysore, India		2000
	P.U.C. (Grade 12)		N.V. P.U. College, Gulbarga, India		1996
	S.S.L.C. (Grade 10)		Sri Ramakrishna Vidyashala, Mysore, India		1994

Areas of Interest	Machine Learning, Data Mining, Artificial Intelligence, Machine Vision, Information Retrieval and Extraction
Experience	Research Programmer			October 2004-Present
	Auton Lab, Robotics Institute, CMU
	Designed and developed a class of algorithms called Predictive Rule Lists that combine ideas from model trees and decision lists with robust statistics to efficiently model small sets of dirty data. This research was motivated by real-life data sets that were in short supply, had relatively many predictor variables, were noisy, and were suspected of containing outliers. I am currently developing a collection of fast classifiers that efficiently handle both real and symbolic attributes in dense and sparse data. They also use cached sufficient statistics to accelerate various machine learning algorithms. As part of this work I have investigated building efficient ball trees in metric spaces to speed up k-nearest-neighbor search. I am also investigating fast algorithms to compute the Least Median of Squares regression line, for which only combinatorial solutions are currently known. I intend to use geometric properties of the problem along with cached sufficient statistics to speed up the search.
	Research Assistant			September 2002-September 2004
	Cross LAnguage Information Retrieval (CLAIR) group, CMU
	Experimented with focused crawling to direct the crawl toward web pages of interest. Designed a system to gather multilingual news stories and extract their content. Was involved in the design of a system that collects statistical information from distributed websites, extracts it, indexes it, and provides a Question Answering interface to query the indexed data. Designed and developed the table agent and the graph agent as part of this system. The table agent learns to extract data from HTML tables by example. The graph agent converts series data into curves and uses curve comparison measures to compute similarity between the data. My work also involved developing a crawler to find web pages with statistical information about universities, and querying websites whose information is hidden in databases.
	Software Engineer			June 2001-July 2002
	PicoPeta Simputers Pvt. Ltd., Bangalore, India
	Worked on the design and development of applications for the Simputer, a handheld device that is a low-cost alternative to PCs. I worked on the design and development of IMLI, a browser that supports the IML markup language. IMLI supports the display of Indian languages and is integrated with a speech-synthesis system capable of synthesizing voice in Indian languages. Also customized the Picogui windowing system for the Simputer and wrote the USB device driver that interfaces the WorldSpace™ satellite radio receiver with the Simputer.
	Software Engineer			September 2000-June 2001
	Philips Software Center Pvt. Ltd., Bangalore, India
	Worked on the BATE project (Browsing for Access to a TV Environment). This project involved developing built-in embedded applications for the Philips range of Digital Satellite Receivers (set-top boxes). I worked as designer and developer for some of the modules of the EPG, the Menu, and the supporting DLLs. I also designed and developed a generic display handler to render visual components onto the TV, and worked on the design of a simple internet browser for TV.
Publications	Predictive Rule Lists for Modeling Small Sets of Dirty Data
	Ashwin Tengli, Artur Dubrawski and Lujie Chen
	In the Proceedings of the International Conference on Information and Automation, Colombo, Sri Lanka, 2005
	This paper introduces Predictive Rule Lists, new structures that combine ideas from decision lists and model trees with robust statistics. The paper illustrates the utility of the concept on a selection of problems typically approached with multiple linear regression, and provides empirical results that reveal features that may be appealing in practical situations.
	Learning Table Extraction from Examples
	Ashwin Tengli, Yiming Yang and Nianli Ma
	In the Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland, 2004
	Developed algorithms to extract data from HTML tables. The algorithm leverages layout information in the HTML and learns lexical information from examples. The system performs the complete extraction task and achieves the best performance known to us so far.
	Resolving Uncertainty in Guided Systems - The Cows and Bulls Problem
	Yogananda A.P. and Ashwin Tengli
	Presented at Cyberia’99 (a National Conference on Electrical & Computer Science)
	Developed efficient algorithms to play a game called the Cows and Bulls problem. The problem involves finding the desired permutation of size k out of n elements.
Academic Projects	Effectiveness of Wrapper Extraction from Distributed Websites
	Studied the effectiveness of existing wrapper extraction techniques on HTML documents collected from distributed websites.
	Stacked Bidirectional Maximum Entropy Markov Model (MEMM)
	Developed a bidirectional version of MEMMs to overcome the label bias and observation bias problems without incurring the optimization costs of CRFs. Tested it on the sequential classification task of protein secondary structure prediction.
	Consistency Measure of Sources
	Developed trend similarity comparison methods to cluster multidimensional news source data. Developed methods to convert the multidimensional data to curves and to measure the similarity of news sources using those curves.
	Role Based Named Entity Extraction
	Explored methods for role-based named entity extraction for the template filling task. Used feature selection measures such as Information Gain, together with parts of speech, to weight terms. A Support Vector Machine was used to classify the named entities into the slots of the template.
	Converge: A prototype for a public web search engine
	Developed a search engine that performs efficient query processing and incremental ranking of pages.
Graduate Courses	Machine Learning, Information Extraction, Language and Statistics, Information Retrieval, Advanced IR Seminar and Lab, Algorithms for NLP, Advanced AI Concepts, Software Engineering for IT, NLP Lab
Technical Skills	Languages: C, C++, Perl, Java, Pascal
	Operating Systems: Linux, Windows
	Databases: PostgreSQL, MySQL
	Tools: CVS, Ant, Eclipse, Matlab
	Object Oriented Analysis and Design: UML, Design Patterns