Dishan Gupta दिशान गुप्ता
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Office: 5716 Gates-Hillman Complex (map)
New malware classes constantly emerge, and the accuracy of labelling them as malicious or safe depends heavily on the expertise of the annotator. Proactive machine learning offers a good solution to this problem by jointly optimizing the selection of annotator and instance, since different annotators may be adept at discovering different malware classes. Previous proactive learning models assume that expertise is constant over time, but given the limited availability of real data in this domain, such an assumption can significantly degrade the predictions. Our aim is to develop a new model that incorporates the time-varying aspect of annotator expertise.
The Near-Synonym System (NeSS) uses an unsupervised, corpus-based conditional model for finding synonyms and near-synonyms at the phrasal level. It does so with a novel probabilistic scoring function, the Shared Feature Gain, which rests on the principle that the more often two phrases share a context, the more specific that context, and the longer the shared context, the stronger the synonymy relationship between the pair.
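The principle above can be illustrated with a toy score. This is only a sketch and not the Shared Feature Gain itself, whose formula is not given here: the function names, the context window of two tokens, and the weighting (shared count times context length times a log-based specificity term) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def ngram_count(tokens, gram):
    """Number of times the n-gram `gram` (a tuple) occurs in `tokens`."""
    n = len(gram)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tuple(tokens[i:i + n]) == gram)

def contexts(tokens, phrase, max_len=2):
    """Count left ("L") and right ("R") contexts of up to max_len tokens
    around every occurrence of `phrase` (a tuple of tokens)."""
    n = len(phrase)
    found = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) != phrase:
            continue
        for k in range(1, max_len + 1):
            if i - k >= 0:
                found[("L", tuple(tokens[i - k:i]))] += 1
            if i + n + k <= len(tokens):
                found[("R", tuple(tokens[i + n:i + n + k]))] += 1
    return found

def toy_shared_context_score(tokens, phrase_a, phrase_b, max_len=2):
    """Hypothetical score: grows with the number of shared contexts,
    with their length, and with their rarity in the corpus."""
    ctx_a = contexts(tokens, phrase_a, max_len)
    ctx_b = contexts(tokens, phrase_b, max_len)
    score = 0.0
    for key in ctx_a.keys() & ctx_b.keys():
        _, gram = key
        shared = min(ctx_a[key], ctx_b[key])          # instances in common
        specificity = math.log(1 + len(tokens) / ngram_count(tokens, gram))
        score += shared * len(gram) * specificity     # longer context weighs more
    return score

corpus = ("the movie was really good . the film was really good . "
          "the car was red .").split()
print(toy_shared_context_score(corpus, ("movie",), ("film",)))
print(toy_shared_context_score(corpus, ("movie",), ("car",)))
```

In this tiny corpus, "movie" and "film" share the longer right context "was really" in addition to the contexts they both share with "car", so the pair scores higher under this toy weighting.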
Studying the propagation of information (contagion) through a network is important in a variety of tasks, such as viral marketing and tracking the spread of infectious diseases. Typically, in such situations time-stamp data is given, but information about the underlying network structure, and therefore about the maximally influential nodes in it, is not. Previous attempts have studied the network inference and influence estimation problems separately. We studied both aspects in an integrated manner using a greedy approach. [PDF] [Slides]
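To make the greedy idea concrete, here is a minimal sketch of greedy seed selection for influence estimation. It is a simplification and not the integrated method described above: it assumes the network is already known (the project infers it from time-stamp data), and it assumes an independent cascade model with a single activation probability. The function names and parameters are hypothetical.

```python
import random

def simulate_cascade(graph, seeds, prob, rng):
    """One independent-cascade run: each newly active node gets one chance
    to activate each inactive neighbour with probability `prob`."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def expected_spread(graph, seeds, prob, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_cascade(graph, seeds, prob, rng)
               for _ in range(runs)) / runs

def greedy_seeds(graph, k, prob=0.1, runs=200, seed=0):
    """Pick k seed nodes, each time adding the node whose inclusion
    gives the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for cand in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [cand], prob, runs, rng)
            if spread > best_spread:
                best, best_spread = cand, spread
        chosen.append(best)
    return chosen

# Directed toy network given as adjacency lists.
toy_graph = {"hub": ["a", "b", "c"], "a": ["d"], "b": [], "c": [], "d": []}
print(greedy_seeds(toy_graph, k=2, prob=0.5))
```

Each greedy step re-estimates the spread for every remaining candidate, which is simple but expensive; it is shown only to illustrate the selection loop.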
Low-level hand-engineered features such as SIFT, SURF and HOG are neither robust nor invariant, and are in general difficult and time-consuming to extend to the video domain. Furthermore, research has shown that there is no universal set of hand-engineered features that works for all datasets. To improve on this, we used a deep convolutional neural network to extract both high-level and low-level features from the video in an unsupervised manner before classifying the events. [PDF] [Poster]