Image Matching in Large Scale Indoor Environment


Hongwen (Henry) Kang
Alexei Efros
Takeo Kanade
Martial Hebert


Dataset: a set of images, with features extracted from them, captured in an indoor environment for testing image matching. Available as a ZIP file (warning: over 800 MB).


Unlike outdoor scenes, indoor environments are full of self-repetitive structures, which pose a serious problem for image matching algorithms. We study this problem and propose a novel image matching algorithm, named Re-Search, that is designed to cope with self-repetitive structures and confusing patterns in indoor environments. The algorithm builds on state-of-the-art image search techniques and matches a query image with a two-pass strategy. In the first pass, a conventional image search algorithm retrieves a small number of images that are most similar to the query image. In the second pass, the retrieval results from the first pass are used to discover features that are more distinctive in the local context. In our system, this framework is first implemented using the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme of the vector space model from information retrieval.
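The two-pass strategy can be illustrated with a minimal Python sketch. It is not the implementation used in our system: it assumes each image has already been quantized into a list of visual-word IDs, and it assumes that "distinctive in the local context" means re-estimating the IDF weights over the first-pass shortlist only. The function names (`idf_weights`, `search`, `re_search`), the smoothed IDF, the cosine score, and the shortlist size `k` are all illustrative choices.

```python
import math
from collections import Counter


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency of each word over `docs`."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (dict) for one image's visual words."""
    tf = Counter(doc)
    total = sum(tf.values()) or 1
    return {w: (c / total) * idf.get(w, 0.0) for w, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, database, idf, candidates=None):
    """Rank candidate image indices by similarity to the query."""
    ids = range(len(database)) if candidates is None else candidates
    q = tfidf(query, idf)
    scored = [(cosine(q, tfidf(database[i], idf)), i) for i in ids]
    return [i for _, i in sorted(scored, key=lambda t: (-t[0], t[1]))]


def re_search(query, database, k=3):
    """Two-pass matching: global search, then re-rank with local weights."""
    vocab = {w for d in database for w in d} | set(query)
    # Pass 1: conventional TF-IDF search over the whole database.
    shortlist = search(query, database, idf_weights(database, vocab))[:k]
    # Pass 2 (assumed): IDF computed over the shortlist only, so words shared
    # by all of the confusable candidates are down-weighted.
    local_idf = idf_weights([database[i] for i in shortlist], vocab)
    return search(query, database, local_idf, candidates=shortlist)
```

Calling `re_search(query_words, database_words, k=3)` returns the shortlisted image indices, best match first. Words that occur in every shortlisted image (for example, generic door features) contribute less in the second pass than words that occur in only one of them.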

We demonstrate and evaluate the effectiveness of the Re-Search algorithm in an indoor localization scenario, where we want to localize a user based on an image captured from the user's point of view. For testing, we captured one set of images with rich and distinctive visual structures, which we call the "clean set". It acts as a control set and measures how our system performs under normal conditions. We also captured a much more challenging set of images that show finer details of the scene, or scenes with objects that could easily be found elsewhere, such as doors. We call this set the "confusing set". (See our paper for more details on the experimental setup.)

Given an image captured from the user's point of view, the location of the user is determined by finding images that match the input image.
Some examples of the testing "clean set" and the "confusing set".
The Precision-Recall curve on the "clean set".
The Precision-Recall curve on the "confusing set".

We achieve almost perfect performance on the "clean set", which demonstrates the practical usefulness of our system for indoor localization. Of more interest is the robustness of the system in the "confusing" situation, because this is the scenario in which a user will rely on the system the most. Compared to the "clean set", the performance of the initial search (baseline) degrades severely on the "confusing set", whereas with the Re-Search algorithm we still achieve about 85% precision at 80% recall, with a maximum gain of about 15% in precision over the initial search result.
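For readers unfamiliar with the "precision at a given recall" measure, the following Python sketch shows one common way to derive it from ranked match results. It is only an illustration under stated assumptions, not our evaluation code: it assumes one `(score, is_correct)` pair per query, and the names `pr_curve` and `precision_at_recall` are hypothetical.

```python
def pr_curve(results):
    """Return (recall, precision) points from (score, is_correct) pairs.

    Matches are accepted in order of decreasing score; each prefix of the
    ranked list yields one point on the curve.
    """
    ranked = sorted(results, key=lambda r: -r[0])
    total_correct = sum(1 for _, ok in ranked if ok)
    points, true_pos = [], 0
    for n, (_, ok) in enumerate(ranked, start=1):
        true_pos += int(ok)
        recall = true_pos / total_correct if total_correct else 0.0
        points.append((recall, true_pos / n))
    return points


def precision_at_recall(points, target):
    """Best precision among curve points whose recall reaches `target`."""
    eligible = [p for r, p in points if r >= target]
    return max(eligible) if eligible else 0.0
```

With this convention, a statement such as "85% precision at 80% recall" corresponds to `precision_at_recall(points, 0.8)` evaluating to about 0.85.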

Our ongoing work includes efficient indexing and search, as well as potential applications in data-driven object pop-out and data-driven zoom-in.

Data-driven object pop-out by matching the input image with a large number of images captured beforehand.
The user selects the part of the image (in the red rectangle) that she wants to see more clearly; our program generates a magnified view using an image that matches the input image but shows a much closer view of the selected region.


H. Kang, A. A. Efros, M. Hebert, and T. Kanade. Image Matching in Large Scale Indoor Environment. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Egocentric Vision, June 2009.


This research is supported by:
