Search engines have come a long way, but image search still relies primarily on metadata such as keywords rather than on the images’ visual content. My thesis introduces a new form of interaction for image retrieval, where the user can give rich feedback to the system via semantic visual attributes (e.g., “metallic”, “pointy”, and “smiling”). The proposed WhittleSearch approach allows users to narrow down the pool of relevant images by comparing the properties of the results to those of the desired target.
Building on this idea, I develop a system-guided version of the method that actively engages the user in a 20-questions-like game in which the answers are visual comparisons. This enables the system to obtain the information it most needs. To ensure that the system interprets the user’s attribute-based queries and feedback as intended, I further show how to efficiently adapt a generic attribute model to align more closely with the individual user’s perception.
My work transforms the interaction between the image search system and its user from keywords and clicks to precise and natural language-based communication. I demonstrate the dramatic impact of this new search modality for effective retrieval on databases ranging from consumer products to human faces. This is an important step toward making the output of vision systems more useful, by allowing users both to express their needs more effectively and to better interpret the system’s predictions.
Adriana Kovashka is a PhD candidate in the Department of Computer Science at The University of Texas at Austin. Her advisor is Professor Kristen Grauman. Adriana received her B.A. in Computer Science and Media Studies from Pomona College, CA, in May 2008. Her research interests primarily lie in computer vision, with some overlap in machine learning, information retrieval, natural language processing, and human computation. Her focus is on enhancing the communication between computer vision systems and their human users, particularly for image retrieval. Her research has been published in the top computer vision conferences, such as Computer Vision and Pattern Recognition (CVPR) and the International Conference on Computer Vision (ICCV), as well as the annual conference of the Association for Computational Linguistics (ACL).
Host: Kris Kitani
kkitani [atsymbol] cs.cmu.edu