Joint Aligning and Cosegmenting Multiple Photo Streams
A Matlab demo toolbox will be made available around the CVPR conference date.
Research Motivation
Suppose that we query and download millions of photo streams associated with the keyword scuba diving from the photo-sharing site Flickr. Obviously, the photo streams are neither aligned nor calibrated, since they are taken from different temporal, spatial, and personal perspectives. At the same time, however, they are likely to share common storylines: sequences of events and activities that recur across the scuba diving photo streams (e.g. riding a boat, wearing equipment, and diving underwater).
Our challenging goal is to build such collective storylines from the photo streams of millions of users. In this paper, as a first technical step, we propose a method to jointly perform alignment of multiple photo streams and cosegmentation of the aligned images, as shown in the figure below. In the alignment step, images from different photo streams are matched based on their visual content and associated metadata. In the cosegmentation step, the aligned images are segmented together to facilitate image understanding tasks such as pixel-level classification. We close a loop between the two tasks so that solving one helps enhance the performance of the other in a mutually rewarding way.
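The loop between the two tasks can be illustrated with a minimal sketch. This is not the paper's actual algorithm: it assumes each image is reduced to a single feature vector, stands in greedy cosine-similarity matching for the alignment step, and stands in a shared "foreground prototype" (a mean of matched features) for cosegmentation; the function names (`align_streams`, `coseg_prototype`, `joint_align_coseg`) are illustrative only.

```python
import numpy as np

def align_streams(stream_a, stream_b, weight=None, sim_thresh=0.5):
    """Toy alignment step: greedily match each image (feature vector) in
    stream_a to its most similar image in stream_b by cosine similarity,
    a stand-in for the paper's visual-content and metadata matching."""
    matches = []
    for i, fa in enumerate(stream_a):
        sims = stream_b @ fa / (
            np.linalg.norm(stream_b, axis=1) * np.linalg.norm(fa) + 1e-9)
        if weight is not None:
            sims = sims * weight  # reweight by agreement with shared model
        j = int(np.argmax(sims))
        if sims[j] >= sim_thresh:
            matches.append((i, j))
    return matches

def coseg_prototype(stream_a, stream_b, matches):
    """Toy cosegmentation step: average the matched feature pairs into a
    shared foreground prototype (stand-in for joint segmentation)."""
    if not matches:
        return None
    feats = [stream_a[i] for i, _ in matches] + [stream_b[j] for _, j in matches]
    return np.mean(feats, axis=0)

def joint_align_coseg(stream_a, stream_b, n_iters=3):
    """Close the loop: alignment feeds cosegmentation, whose shared
    prototype reweights the next round of alignment."""
    weight, matches = None, []
    for _ in range(n_iters):
        matches = align_streams(stream_a, stream_b, weight)
        proto = coseg_prototype(stream_a, stream_b, matches)
        if proto is not None:
            # emphasize stream_b images that agree with the shared model
            weight = stream_b @ proto / (
                np.linalg.norm(stream_b, axis=1) * np.linalg.norm(proto) + 1e-9)
    return matches
```

The point of the sketch is the alternation itself: each pass of the stand-in cosegmentation refines the similarities used by the stand-in alignment, mirroring the mutually rewarding loop described above.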
We design a scalable message-passing-based optimization framework to jointly achieve both tasks for the whole input image set at once; please see the paper for details. For evaluation, we collect about 1.5 million images from 13 thousand photo streams covering 15 outdoor recreational activities on Flickr.
We have proposed a scalable approach to jointly aligning and segmenting multiple uncalibrated Web photo streams of different users in an unsupervised, bottom-up way. The empirical results show that our method can serve as a key component toward our ultimate goal, inferring collective photo storylines from Web images, which is the next direction of our future work.