About Me

Hi! My name is Leonardo Neves, and I am a student in the Master of Science in Intelligent Information Systems program at Carnegie Mellon's Language Technologies Institute. Next year I will be joining the Snapchat Research team in Los Angeles, CA.
During my time at CMU, I am focusing on expanding my knowledge of machine learning and scalable big data analytics. I also want to learn more about handling and learning from unstructured data such as speech, text, and video, and about combining these sources to better understand language.


Here are my planned courses throughout the program:

Fall 2015

  • 10701 - Introduction to Machine Learning (PhD version of the course)
  • 11676 - Big Data Analytics
  • 11751 - Speech Recognition and Understanding
Spring 2016

  • 11642 - Search Engines
  • 11611 - Natural Language Processing
  • 11641 - Machine Learning for Text Mining
  • 11695 - Competition Programming and Problem Solving
Fall 2016

  • 17-648 - Engineering Data Intensive Scalable Systems
  • 98-012 - Fun with Robots
  • 11676 - Big Data Analytics (as a teaching assistant for the course)
Work Experience

    Yelp May 2016 - Aug 2016

    Software Engineer, Intern

  • Reworked the pipeline for training and deploying the language models used to classify queries into business categories.
  • Reduced deployment effort from several days of engineering work to a single-click batch job, allowing models to be trained and deployed automatically in less than a day. Models are now trained once a week instead of once a year.
  • Built evaluation scripts to compare the performance of newly trained models.
    Graduate Researcher - Language Technologies Institute Sep 2015 - May 2016

    Under Professor Florian Metze

  • Applied SVM models to a multimedia dataset to detect events from audio features, achieving 40% accuracy on the test set for 18-class classification.
  • Worked on automatic audio labeling using video features to increase training data by learning “what image concepts are supposed to sound like”.
  • Currently working on improving recurrent neural network based language models using visual concepts from videos.
    Intel Parallel Computing Center - UFRJ Sep 2014 - Jul 2015

    Software Engineering, Intern

  • Enabled performance evaluation of SciCumulus, a cloud workflow engine designed to execute workflows in parallel, by modeling the database and adapting existing code with a minimal footprint.
  • Designed a performance visualization tool by integrating SciCumulus with the TAU ParaProf profiling tool.
  • Published a paper on performance evaluation for scientific applications at WPerformance 2015.
    Pivotal Software Jun 2014 - Aug 2014

    Hadoop Software Engineering, Intern
  • Developed an enterprise Disaster Recovery solution for Pivotal Hadoop, achieving performance 170 times faster than simple asynchronous backup.
    Fluxo Consulting Jul 2012 - Jul 2013

    Project Coordinator
  • Achieved the second-highest revenue in the history of Fluxo’s IT sector by managing more than 10 employees across more than 15 projects and establishing a results-driven culture.
  • Made projects more flexible by implementing Scrum and RUP, leading to increased client satisfaction.
    Undergraduate Researcher - NTT (UFRJ) Feb 2012 - Dec 2012

    Under Professor Alexandre Evsukoff
  • Increased the feasible dataset size and reduced the computation time of an existing Matlab text-mining spectral analysis algorithm by redesigning it in Python.
  • Mentioned in a prestigious newspaper for applying the term clustering algorithm to a set of records from a famous trial in a Brazilian Federal Court.
Professional skills

    Worked as both developer and manager, with a deep understanding of the software life cycle, presentation techniques, process modeling, UML diagrams, and the Scrum methodology.

    Software

    • Hadoop
    • mrjob
    • Spark
    • Git
    • Numpy
    • OpenCV
    • NLTK
    • Lasagne (Theano)
    • Knime
    • Weka
    • MySQL
    • Postgres
    • Cassandra
    • RedShift

    Languages

    • Python (Expert)
    • Java (Proficient)
    • Scala (Coursework)
    • JavaScript (Coursework)
    • C/C++ (Coursework)
    • PHP (Coursework)
    • Matlab (Coursework)

    Education

    Carnegie Mellon University Aug 2015 - Dec 2016

    MS in Intelligent Information Systems

    GPA: 3.7

    Cornell University Aug 2013 - May 2014

    One-year non-degree study program in Computer Science as an international student.

    GPA: 3.6

    Rio de Janeiro’s Federal University (UFRJ) Mar 2009 - Jun 2015

    B.S.E in Computer and Information Engineering

    GPA: 7.6/10.0 - Ranked #5 in class

    Projects

    Wikipedia QA System (2016) Mar 2016 - April 2016

  • Created a system from scratch to generate and answer questions based on a given set of Wikipedia articles by applying NLP/IR techniques. Python, NLTK, Stanford CoreNLP, NodeBox Linguistics, CLiPS Pattern
    Complete Search Engine System Jan 2016 - April 2016

  • Implemented a search engine on top of a Lucene index with Boolean, Indri and BM25 retrieval models, using relevance feedback, query expansion and learning-to-rank features. Java, Apache Lucene
    Recommendation System for Yelp Reviews Mar 2016 - Apr 2016

  • Implemented regularized multinomial logistic regression for predicting Yelp reviews, achieving over 61% accuracy and less than 0.75 RMSE on the test set. Ranked in the top 5 of the class. The training set had more than 1M reviews. Python, Numpy, Scipy, sklearn
    Recommendation System for Netflix Reviews Mar 2016 - Apr 2016

  • Implemented KNN and Probabilistic Matrix Factorization-based collaborative filtering recommender systems for the Netflix dataset. Achieved RMSE lower than 1. Python, Numpy, Scipy, sklearn
    Link Analysis Mar 2016

  • Implemented PageRank, Personalized Topic Sensitive PageRank and Query-Based Topic Sensitive PageRank for improving document retrieval performance. Scipy, Numpy, Python
    K-Means/KPP for Text Clustering Feb 2016

  • Implemented K-means and KPP algorithms for text clustering on a 1000-document dataset. Scipy, Numpy, Python
    Animus - Your Personal Media Virtual Station Jan 2016

  • The project consists of creating a virtual machine that aggregates speech, computer vision and NLP tools to analyze personal data. Information retrieval, face and scene detection, speech recognition and natural language understanding are some of the possibilities of the project.
    Pubmed Central Topic Visualization Dec 2015

  • Analyzed over 80 GB of text data from the PubMed Central publication corpus, preprocessing documents (stopword removal, tokenization, stemming, POS tagging) to identify co-occurrences of topics related to user input. Coded TF-IDF on Hadoop, persisted intermediate steps to Cassandra and enabled close to real-time computation for a given user query. From the results, generated dynamic visualizations to help users make informed decisions on which articles to read. Python, Java, JavaScript, NLTK, Hadoop, Cassandra, D3, joblib, flare.json
    Video Sequence Learning Oct 2015 - Dec 2015

  • Experimented with different techniques to recreate the original sequence from a shuffled set of video frames. Tested different regularization techniques (L1 and L2), cost functions (cosine similarity, Euclidean and Manhattan distances, MSE) and machine learning approaches (metric learning, feed-forward, recurrent (RNN) and long short-term memory (LSTM) neural networks) to recreate video sequencing from features extracted using a pre-trained convolutional neural network.
  • Achieved 49% next-frame accuracy, improving on a 32% baseline. Python, Theano, Lasagne, OpenCV, Theanets, Matlab
    Cyber Attack Classification Sept 2015 - Nov 2015

  • Explored two big datasets (~6M instances in total) to detect and classify cyber attacks using network data. Project stages included data preparation, decision tree and random forest implementation in Java, migrating computation to Hadoop and Cassandra, and integration with Spark using Scala. The final step was to create a non-technical presentation for the project. Java, Python, Scala, Spark, Cassandra, MS Excel
    Plankton Image Classification Oct 2015

  • Worked on 400 MB of low-resolution plankton images to classify instances into 121 different classes. Artificially increased training data for rarer classes using image processing techniques, explored how different sets of features could increase performance and used different machine learning techniques to improve on the baseline from 40% to over 60% accuracy. Python, sklearn, OpenCV
    Sentiment Classification of Movie Review Data Sept 2015

  • Coded the Naive Bayes algorithm from scratch to classify movie reviews as positive or negative, achieving 100% accuracy on the training set and 85% accuracy on the test set using bigrams, negation handling and Laplace smoothing. Python
    Capital One - Challenge Participant Sept 2015

  • Submitted proposal to Capital One for a small consulting project - "Use transaction data to categorize clients". Java
    Traffic Camera Speed Tracker Mar 2015 - May 2015

  • Implemented Lucas-Kanade optical flow algorithm and designed an alternative to track car speed from traffic camera records. Python, OpenCV
    Presentation Timer for Android Nov 2014 - Dec 2014

  • Developed an Android app to better time presentations, using colors and vibrations to signal sections and the end of the presentation so the presenter does not have to look at the phone or can read it from a distance.
  • Created the project because I couldn't get used to existing solutions and wanted a customizable tool. Android, Java
    TF-IDF Implementation on Hadoop Nov 2014 - Dec 2014

  • Implemented TF-IDF using MapReduce on Hadoop and analyzed the trend of important words in each chapter of the Bible. Java, Hadoop
    Fast Convergence PageRank in Hadoop Mar 2014 - May 2014

  • Implemented simple and blocked PageRank algorithms using Hadoop on AWS. Java, Hadoop, AWS, MapReduce
    Spanish League Performance Visualization Apr 2014

  • Developed a dynamic data visualization application using JavaScript and D3 to analyze and compare the performance of Spanish teams during the 2012/2013 season. JavaScript, D3
    Matchmaker for Independent Research – Cornell CS Department Aug 2013 - Dec 2013

  • Developed a J2EE web application using JSP, along with HTML and CSS components, that allowed professors to search for students for their research projects, under Prof. Andrew Myers.
  • It also allowed students to apply to projects for which they had adequate skills. Java, JSP, JPA, HTML, CSS, SQL
    Graphs Library Sep 2012 - Feb 2013

  • Implemented a graph library ranging from basic algorithms such as BFS and DFS to more advanced ones such as Prim's and Kruskal's MST algorithms, Dijkstra's algorithm and the travelling salesman problem with 2-opt.
  • Ranked best of the semester and in the all-time top 5%. C++
Additional Experience, Publications and Awards

    Visual Features For Context-Aware Speech Recognition May 2016

    To appear at ICASSP 2017.

    Audio-Based Multimedia Event Detection Using Deep Recurrent Neural Networks Sept 2015

    In Proc. ICASSP, Shanghai, China, March 2016. IEEE.

    Monitoramento de Desempenho usando Dados de Proveniência e de Domínio durante a Execução de Aplicações Científicas May 2015

    Awarded second-best paper at the XIV Workshop em Desempenho de Sistemas Computacionais e de Comunicação (WPerformance 2015)

    In English: "Real-time scientific computation performance evaluation using provenance and domain data"

    Full Scholarship Aug 2013 - May 2014

    Awarded a merit-based scholarship by the Brazilian Ministry of Education to study at Cornell University for one academic year

    Best Coordinator Jun 2013

    Highest evaluation rating among Fluxo coordinators, 4.8 out of 5.0

    Software Developer Volunteer Jan 2010 - May 2010

    Developed educational tools for underprivileged children

    Teaching Assistant Aug 2009 - Dec 2009

    Assisted the lead teacher of a Python programming class
