Graph Partitioning and Matching Problems in Computer Vision

Jitendra Malik (
Computer Science Department, UC Berkeley

Download slides here (gzipped PostScript file).


Much progress has been made in the areas of reconstruction and visual control. The problem of segmentation (grouping) is currently at an intermediate stage leaving ample room for progress; however, even this is too strong a statement with respect to the area of recognition (classification). If progress is to be made in this area, chances are that it will occur through a vastly different approach then the ones currently being researched. This raises an interesting question: do these problems (especially the last) require computational power equivalent to that of humans.


The statistical factors that lead to grouping (segmentation) were derived primarily from psychological studies.


Although the techniques described on this page, such as gradient based edge detection, are useful, these are essentially local operations; whereas, it seems that global statistics should be the goal.


Current global techniques are mainly probablistic; that is, the image is treated as a signal with noise.


This is a global approach.


The weights are determined a priori, and from a computer vision perspective, the selection of a good weight function is the meat of the problem. As pointed out in the previous slide, the weight function should account for visual cues. w_ij should represent the probability that nodes i and j belong to the same region. In the image, the thickness of each edge represents its weight.


The idea is to partition the nodes (pixels) of the graph by finding some sort of cut, relative to the weight function, whose shores represent nodes (pixels) of similar objects, although it seems that a minimum cut is not appropriate for this task.


The normalized cut yields more balanced shores than the minimum cut.


The minimum Ncut is equivalent to the maximum Nassoc (NP-hard on a regular grid).


x is an incidence vector of the nodes in A, a shore of the cut.


The dynamical system interpretation stems from the considering each pixel as a point mass connected to each of its neighbors by a spring (whose constant, k_ij = d_ij-w_ij).