On Multiple Foreground Cosegmentation
Figure 3. Examples of the apple+picking and cow+pasture groups in the FlickrMFC dataset.
Motivation of Research
This project aims at cosegmenting a general user's photo stream. The following figure shows sampled images from an apple+picking photo stream on Flickr, which follow an ordinary user's photo-taking pattern.
For example, two girls (in red and blue), one baby, and apple buckets appear repeatedly but irregularly in the photo stream. Such a content-misaligned set of images has not yet been explicitly addressed by existing cosegmentation algorithms.
Figure 1. Motivation for multiple foreground cosegmentation.
Our approach alternates between two subtasks: foreground modeling and region assignment. The foreground modeling step learns the appearance models of K foregrounds and the background, which can be accomplished with any existing region classifiers or their combinations.
Our major technical novelty lies in the region assignment step, which is formulated as welfare maximization in a combinatorial auction. We first oversegment each image into a set of small segments. In the auction analogy, given a set of segments (items), the foreground models (bidders or buyers) submit sets of foreground candidates (package bids) that they want to take. Then a small number of feasible foreground candidates are chosen among all submitted ones so as to maximize the overall value. The general welfare maximization (or winner determination) problem is NP-complete and inapproximable. In this paper, we propose a tractable method that leverages structural properties commonly observed in the image space. Please see the details in the paper or the CVPR oral slides.
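To make the auction analogy concrete, the following is a minimal sketch of winner determination on a toy instance. It is not the tractable method proposed in the paper: it simply enumerates every subset of bids, which is only feasible for a handful of bids. The bidder names, segment indices, and bid values are made up for illustration, and the rule that each foreground model wins at most one package is an assumption of this sketch.

```python
from itertools import combinations

def winner_determination(bids):
    """Exhaustive winner determination for a toy combinatorial auction.

    bids: list of (bidder, frozenset_of_segment_ids, value) tuples.
    Returns (best_value, chosen_bids). Illustration only: the search
    is exponential in the number of bids.
    """
    best_value, best_choice = 0.0, ()
    for r in range(1, len(bids) + 1):
        for choice in combinations(bids, r):
            bidders = [bidder for bidder, _, _ in choice]
            if len(set(bidders)) < len(bidders):
                continue  # assumption: one winning package per foreground model
            segments = [s for _, package, _ in choice for s in package]
            if len(set(segments)) < len(segments):
                continue  # a segment (item) cannot be assigned twice
            value = sum(v for _, _, v in choice)
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice

# Hypothetical bids: four segments (0..3) and three foreground models.
bids = [
    ("girl_red", frozenset({0, 1}), 5.0),
    ("girl_red", frozenset({1}), 3.0),
    ("baby", frozenset({1, 2}), 4.0),
    ("baby", frozenset({2}), 2.5),
    ("bucket", frozenset({3}), 2.0),
]
best_value, best_choice = winner_determination(bids)
for bidder, package, value in best_choice:
    print(bidder, sorted(package), value)
```

In this toy instance the two packages that both contain segment 1 cannot be accepted together, so the search has to trade off larger, more valuable packages against compatibility with the other foregrounds. Segments left unclaimed would correspond to the background.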
Some segmentation examples from the Flickr and ImageNet datasets are shown below.
Figure 2. Examples of multiple foreground cosegmentation on Flickr and ImageNet datasets.
We believe that this mid-level combinatorial optimization is very promising for image segmentation. Instead of relying on complicated ML algorithms or high-order models, we can enumerate the candidate segmentations (hypotheses) in a smart rather than brute-force way, using state-of-the-art region descriptors and classifiers. Our previous paper at ICCV 2011, based on submodular optimization, follows the same line of thought.