Exploratory Learning: Semi-supervised Learning in the Presence of Unanticipated Classes
Thesis document [PDF]
Traditional semi-supervised learning (SSL) techniques treat the missing labels of unlabeled data points as latent (unobserved) variables, and estimate these variables together with the model parameters using techniques such as Expectation Maximization (EM). We consider two extensions to traditional SSL methods that make them more suitable for many Automatic Knowledge Base Construction tasks.
First, we consider jointly assigning multiple labels to each instance, with a flexible scheme for encoding constraints between assigned labels: this makes it possible, for instance, to assign labels for multiple levels from a hierarchy.
Second, we account for another type of latent variable, in the form of unobserved *classes*. In open-domain, web-scale information extraction problems, it is unrealistic to assume that the class ontology or topic hierarchy we are using is complete. Our proposed framework combines structural search for the best class hierarchy with SSL, reducing the semantic drift associated with erroneously grouping unanticipated classes with expected classes.
Together, these extensions allow a single framework to handle a large number of knowledge extraction tasks, including macro-reading, micro-reading, multi-view macro- or micro-reading, alignment of KBs to Wikipedia or online glossaries, and ontology extension.
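As a rough illustration of the second extension, the sketch below runs a hard-EM (k-means style) semi-supervised loop that starts from seed examples of the known classes and opens a new class whenever a data point fits none of the current classes well. The distance threshold `new_class_dist`, the function names, and the toy 2-D data are all assumptions made for this example; the text above does not specify the criterion used in the thesis, so this should be read as a minimal sketch of the general idea rather than the actual method.

```python
import itertools
import math


def mean(pts):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def exploratory_kmeans(points, seeds, new_class_dist, n_iters=10):
    """Seeded hard-EM that may introduce classes absent from `seeds`.

    points: list of unlabeled (x, y) tuples
    seeds: dict mapping class name -> list of labeled (x, y) seed points
    new_class_dist: hypothetical threshold (an assumption of this sketch);
        a point farther than this from every centroid starts a new class
    """
    centroids = {name: mean(pts) for name, pts in seeds.items()}
    fresh_ids = itertools.count(1)  # monotonically increasing ids for new classes
    labels = []
    for _ in range(n_iters):
        # E-step: assign each point to its nearest class, or open a new one.
        labels = []
        for p in points:
            best = min(centroids, key=lambda c: math.dist(p, centroids[c]))
            if math.dist(p, centroids[best]) > new_class_dist:
                best = f"new_{next(fresh_ids)}"
                centroids[best] = p
            labels.append(best)
        # M-step: recompute centroids from seeds plus currently assigned points.
        members = {name: list(pts) for name, pts in seeds.items()}
        for p, name in zip(points, labels):
            members.setdefault(name, []).append(p)
        centroids = {name: mean(pts) for name, pts in members.items()}
    return labels, centroids


# Toy data: two seeded classes and a third, unanticipated cluster near (0, 10).
seeds = {"city": [(0.0, 0.0)], "country": [(10.0, 10.0)]}
points = [(0.5, 0.0), (0.0, 1.0), (10.0, 9.0), (9.5, 10.0),
          (0.0, 10.0), (1.0, 10.0), (0.5, 9.0)]

labels, centroids = exploratory_kmeans(points, seeds, new_class_dist=4.0)
print(labels)
# ['city', 'city', 'country', 'country', 'new_1', 'new_1', 'new_1']
```

With a very large threshold the loop behaves like ordinary seeded SSL, and the three points near (0, 10) are forced into `city` or `country`, which is the kind of erroneous grouping the abstract describes.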
- Exploratory Learning, Bhavana Dalvi Mishra, William W. Cohen and Jamie Callan, ECML/PKDD 2013
[Code: zipped folder]
- Automatic Gloss Finding for a Knowledge Base using Ontological Constraints, Bhavana Dalvi Mishra, Einat Minkov, Partha Pratim Talukdar, and William W. Cohen, WSDM 2015 (Acceptance rate: 16.8%)
[Slides] (presented in LTI SRS 2014),
[Dataset: link], [Code: zipped folder], Example glosses matched by our method: link
- Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies, Bhavana Dalvi Mishra, Aditya Mishra and William W. Cohen, WSDM 2016 (Acceptance rate: 18.2%) [PDF]
- Multi-View Hierarchical Semi-supervised Learning by Optimal Assignment of Sets of Labels to Instances, Bhavana Dalvi Mishra and William W. Cohen, in preparation. [Draft], [Dataset: link]
- WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction, Bhavana Dalvi, William W. Cohen and Jamie Callan, Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, WSDM 2012 (Acceptance rate: 20.7%)
[Datasets and Evaluation: link]
- Multi-view Exploratory Learning for AKBC Problems, Bhavana Dalvi Mishra and William W. Cohen, in Proceedings of AKBC 2014, 4th Automatic Knowledge Base Construction workshop at NIPS 2014.
- Classifying Entities into an Incomplete Ontology, Bhavana Dalvi Mishra, William W. Cohen and Jamie Callan, in Proceedings of AKBC 2013, 3rd Knowledge Extraction workshop at CIKM 2013.
Bhavana Dalvi, Prof. William Cohen, Prof. Jamie Callan, Prof. Einat Minkov, Prof. Partha Pratim Talukdar, and Prof. Aditya Mishra.
For any questions or comments regarding these papers and resources, please contact the following author:
Name: Bhavana Dalvi
Email: bhavana DOT dalvi AT gmail DOT com
Webpage: link