navneetrao[at]cmu.edu
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
"The ones who are crazy enough to think that they can change the world, are the ones who do!” – Steve Jobs ------------------------------
About Me
I'm a master's student at the Language Technologies Institute, School of Computer Science at Carnegie Mellon University.
My interests include:
Aug 2013 - Dec 2014 (expected)
Carnegie Mellon University, Master's in Intelligent Information Systems
Aug 2008 - Aug 2012
University of Pune, Bachelor's in Computer Engineering
May 2014 - Aug 2014
IBM Watson, Graduate Summer Intern
Dec 2012 - May 2013
Tata Consultancy Services (TCS), Assistant System Engineer Trainee
Sep 2011 - Mar 2012
PuneTech Software, Intern
Certificate in Cloud Computing, TCS Business Domain Academy, May 2013
Foundation Certificate in Capital Markets, TCS Business Domain Academy, Apr 2013
Star of the Learner's Group, Tata Consultancy Services, Hyderabad, Apr 2013
Best Outgoing Student (Male), Rajarshi Shahu College of Engineering, Pune, May 2012
Best Project Award at 2 inter-university project competitions, Pune, Mar 2012
Best Paper Award at an inter-university technical symposium, Pune, Mar 2012
Certificate of Leadership, National Entrepreneurship Network, Sep 2011
Current Projects
Sept 2013 - May 2014
Text Content Classification & Organization using Machine Learning and Natural Language Processing
A 2009 TIME magazine article suggests that, on average, 78% of the 210 billion emails sent every day are spam, and that approximately 50% of those messages bypass spam filters. It is thus imperative to tackle this issue from several perspectives. We primarily address it from a language perspective: we seek to understand the characteristics of spam content using feature engineering and natural language processing, and we apply machine learning techniques to build models that classify different kinds of email by their content.
From an enterprise perspective, it is also imperative that the different forms of text content (both spam and non-spam) are organized from the perspectives of different stakeholders, with minimal supervision. We consider the area of cyber-security, and build and evaluate models that can organize text content. The data is organized according to custom-generated taxonomies that are easily extensible.
I recently presented this work at the 2014 Command, Control, and Interoperability Center for Advanced Data Analysis (CCICADA) Research Symposium held at RPI, New York.
Tech: Java, Python, Lightside, Weka, SQL, Mallet, Scrapy Framework
Past Projects
1. Grad School Projects:
Mar 2014 - May 2014
Twitter Analytics - Cloud Computing
Design and implementation of an efficient REST-based service that ran analytics jobs on a 250 GB Twitter dataset. Working in a team of 3 students, we iteratively built an efficient, performance-oriented front-end and back-end system within specific cost constraints.
The work began with the ETL phases, implemented as various MapReduce jobs, followed by the design and implementation of REST-based services with a MySQL and HBase back-end.
Tech: Java, MapReduce, MySQL, HBase, Hadoop, Amazon Web Services, Amazon Elastic Map Reduce, Amazon EC2, Amazon Cloudwatch
Feb 2014 - Mar 2014
MediQA - Big Data Analytics
Design of a question answering system using PubMed medical data, augmented by existing information retrieval systems like Google.
It involved the design of the pre-processing system using MapReduce, followed by the indexing and retrieval of data. The retrieved documents were then to be parsed for extraction of answer nuggets. The design was such that it integrated results from Web Search engines like Google and thus augmented the overall system.
Tech: Java, MapReduce, Scrapy Framework, Question Answering
Sept 2013 - Dec 2013
Predicting the Quality of Amazon Reviews - Applied Machine Learning
Thousands of consumers provide reviews on websites for the products that they have purchased. Reviews that are displayed more prominently tend to influence customer preferences. To ensure customer satisfaction, e-commerce portals try to predict the quality of online reviews, so that they can rank the reviews by quality and display the higher-quality ones more prominently.
I implemented a framework for predicting the quality of online reviews using scraped Amazon product review data, so as to enable a better user experience. Classification techniques including Naive Bayes, Logistic Regression, Support Vector Machines, Locally Weighted Learning, and Bagging with Random Forests were evaluated on this dataset. Various feature engineering techniques were also demonstrated, along with standard machine learning practices such as ablation studies.
Tech: Lightside, Java, Weka, SQL
Sept 2013 - Oct 2013
Text Search Engine using a 10% Wikipedia Corpus
I created a Text Search Engine using a pre-indexed corpus consisting of 10% of all Wikipedia webpages, as part of the Search Engines and Web Mining course. I created parsers that handle structured queries with operators like 'AND', 'OR', 'NEAR', 'WEIGHT', and 'WINDOW', as well as Bag of Words (BoW) queries.
Retrieval models starting from Unranked Boolean to Ranked Boolean and Okapi BM25 were implemented iteratively. I also implemented statistical models like Indri with pseudo-relevance feedback.
Tech: Java, Lucene API
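For readers unfamiliar with BM25, the scoring idea can be sketched in a few lines of Python. This is an illustration only, not the project's Java/Lucene code: the function name bm25_scores, the toy documents, and the defaults k1 = 1.2 and b = 0.75 are assumptions made for the example, and the IDF term uses a common smoothed variant rather than the original Okapi formulation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    `docs` is a list of token lists. Returns one score per document.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document
            # Smoothed IDF (assumed variant); the "1 +" keeps it positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical toy corpus, already tokenized.
    corpus = [
        ["apple", "banana"],
        ["apple", "apple", "cherry"],
        ["banana", "cherry"],
    ]
    print(bm25_scores(["apple"], corpus))
```

Documents that do not contain a query term score zero for that term, and repeated occurrences help with diminishing returns, which is the main difference from a raw term-frequency ranking.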
Oct 2013 - Nov 2013
Movie Recommender System using Netflix User Data
We implemented a Movie Recommender System using Collaborative Filtering as part of the Web Mining class. User-user and movie-movie similarities were calculated using similarity metrics such as cosine similarity, and recommendations were then provided to the user. The dataset for this project was made available by Netflix.
Tech: Java, Multi-threading
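The user-user variant of this idea can be sketched roughly as follows. This is a minimal Python illustration, not the original Java implementation: the names cosine and predict, the neighbourhood size k, and the tiny ratings dictionary are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {movie: rating}."""
    common = set(u) & set(v)
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, movie, k=2):
    """Predict `user`'s rating of `movie` as a similarity-weighted average
    over the k most similar users who have rated that movie."""
    neighbours = [
        (cosine(ratings[user], r), r[movie])
        for other, r in ratings.items()
        if other != user and movie in r
    ]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    if not top:
        return None  # no usable neighbours
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


if __name__ == "__main__":
    # Hypothetical ratings: user -> {movie: rating}.
    ratings = {
        "a": {"m1": 5, "m2": 3},
        "b": {"m1": 5, "m2": 3, "m3": 4},
        "c": {"m1": 1, "m3": 2},
    }
    print(predict(ratings, "a", "m3"))
```

Movie-movie similarity follows the same pattern with the dictionary transposed, so that each movie maps to the ratings it received from users.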
Jan 2014 - Apr 2014
Implementing Machine Learning Models
As part of the Machine Learning class, we implemented and experimented with various machine learning models, including Decision Trees, Naive Bayes, Neural Networks trained with the backpropagation algorithm, and Hidden Markov Models.
Tech: Java, Matlab
Nov 2013 - Dec 2013
Learning to Rank (LETOR)
We implemented preference ranking of retrieved documents in an IR system as part of the Web Mining class. By implementing the logistic regression algorithm to learn a ranking function, we were able to rank more relevant documents higher than less relevant ones in the list of retrieved documents.
Tech: Java, Multithreading
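One common way to learn a ranking function with logistic regression is the pairwise formulation, sketched below in Python. The source does not say which formulation the class project used, so treat this as an assumed illustration: the names train_pairwise and rank, the learning rate, the epoch count, and the two-feature toy data are all hypothetical.

```python
import math


def train_pairwise(pairs, n_features, lr=0.1, epochs=200):
    """Learn weights w so that w.better > w.worse for each training pair.

    `pairs` is a list of (better_features, worse_features) tuples.
    Logistic regression is fit on the feature differences, each treated
    as a positive example, using stochastic gradient ascent.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            z = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-z))  # P(better ranked above worse)
            # Gradient of the log-likelihood for a positive example.
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w


def rank(docs, w):
    """Sort documents (feature vectors) by descending learned score."""
    return sorted(
        docs,
        key=lambda x: sum(wi * xi for wi, xi in zip(w, x)),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical features per document, e.g. [retrieval score, spam score].
    pairs = [
        ([1.0, 0.2], [0.2, 0.9]),
        ([0.9, 0.1], [0.3, 0.8]),
    ]
    w = train_pairwise(pairs, n_features=2)
    print(rank([[0.1, 0.9], [0.8, 0.1]], w))
```

Because only score differences matter when sorting, no bias term is needed: it would cancel out in every pairwise comparison.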
Sept 2013 - Oct 2013
Cyber Security Taxonomies

Part of the operational taxonomy on cyber security
As part of an educational initiative at the Command, Control, and Interoperability Center for Advanced Data Analysis (CCICADA), I helped develop and visualize taxonomies for the field of cyber-security along with David Klaper. We created and visualized the central and user taxonomies ourselves, whereas the operational taxonomy was visualized from a pre-existing taxonomy taken from a research paper by the Software Engineering Institute at Carnegie Mellon.
Tech: HTML, CSS, Javascript, XML
2. TCS Projects:
Apr - May 2013
Royal Bank of Scotland's Financial Reconciliation Project
After my initial training, I was part of the Royal Bank of Scotland team for a short duration. While we underwent training in the usage of the existing suite of deployed applications, I completed the certification on financial reconciliation. Working on this project gave me an insight into the actual functioning of the corporate world.
Tech: SQL, Data Warehousing
Feb 2013 - Mar 2013
Internal Assessment System for TCS Hyderabad
Hundreds of trainees were joining the Hyderabad centre every one or two weeks, and all the assessments were done by instructors using Excel sheets. The instructors were spending a lot of time keeping track of multiple assessments over a 3-month period. During the last few weeks of the initial training, I was tasked with creating an assessment system for all the new trainees. Another trainee and I created an application that the instructors could use efficiently over the corporate intranet. We also built a few features to allow the management to derive useful insights from the data. For my contribution, I was honoured with the Star of the Learner's Group award, which was given to only 3 people from a batch of more than 350 trainees.
Tech: Java, SQL, JSP, Servlets, Javascript, HTML, CSS, Eclipse, Oracle 11
Dec 2012 - Feb 2013
Telecom Inventory Management System - Trainee Project
As part of the initial training at TCS, we were tasked with creating a dummy application using Enterprise Java. I headed a group of 6 trainees working on telecom inventory management. Our system was developed using the MVC 2 architecture and the front end was created using JSP.
Tech: Java, SQL, JSP, Servlets, Javascript, HTML, CSS, Eclipse, Oracle 11
3. College Projects:
Jul 2011 - Apr 2012
Decision Support System for Indian Stock Markets using Web Mining

Screen shot of the recommender engine's performance
For my senior year project, we worked on the problem of generating profitable stock market recommendations using recommendations given by analysts on Twitter and various financial websites.
It involved the creation of an efficient ranking mechanism, structuring of semi-structured and unstructured text recommendations, scraping of data from websites, creation of an efficient capital allocation engine and creation of a user friendly web interface for our clients.
While working on this project, our research paper on a new ranking mechanism was accepted at the IEEE International Conference on Intelligent Systems in Bulgaria (unfortunately we could not publish our paper). We also won 3 inter-university project competitions while working on this project.
Tech: Python, Scrapy Framework, SQL, PHP, Twitter API, MySQL, Linux, XPath, Yahoo Pipes.
Jan 2011 - April 2011
Votre Catalog - Personalized Reminder System using Google App Engine
We created a personalized reminder system which could be accessed by any user with a Google account. The intent was to explore the working of the Google App Engine.
Tech: Eclipse, Java, Google App Engine, App Engine Datastore, HTML, CSS, Javascript.
Aug 2009 - Aug 2011
Co-founder and President, Entrepreneurship Club
Helped create and grow the entrepreneurship club on campus to over 200 members. Led the student groups in organizing multiple events. Created partnerships with other colleges to run cross-campus events.
Aug 2011 - Sep 2011
Head of Student Co-ordinators, Campus Recruitment
Helped co-ordinate on-campus recruitment drives for over 800 students applying to companies like Tata Consultancy Services and L&T Infotech.
Aug 2010 - Aug 2011
Student Vice President, Computer Department Business Club
Helped create the computer department's business club to identify potential business opportunities and encourage students to apply computing to business problems.
Feb 2009 - Apr 2012
Content Writer, Campus Activities
I wrote press articles about college activities throughout my undergraduate studies. The articles were published in local newspapers in Pune.
Aug 2009 - Aug 2011
Student Committee Member, Association of Computer Engineering Students (ACES)
I was on the student committee of the Association of Computer Engineering Students from 2009 to 2010, where I worked on redressing student issues.
Jun 2010 - Jun 2011
Member, Sankalp
Taught basic computer skills to underprivileged children from rural areas of Pune, India.