An applet demonstrating the version space of a linear classifier. (Get the source for this applet.)

The left half of the applet shows example space (x,y). Click in the left half of the applet to place a positive training example at the clicked position. Control-click to place a negative training example. Shift-click to remove training examples.

The right half of the applet shows weight space (v,w): our classification rule is sign(v x + w y - 1). Each training example (x,y) rules out a half-plane of weight space, namely, those weight vectors (v,w) which misclassify it. Click in the right half of the applet to pick a weight vector. Your weight vector will show up as a point in weight space (on the right) and a half-plane in example space (on the left). Control-clicking places a negated weight vector, so that positive examples are on the opposite side of the classification surface (i.e., classify by sign(1 - v x - w y) instead).
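The classification rule and the constraint it induces in weight space can be sketched in a few lines of Python. This is only an illustration, not the applet's code: the names classify and is_ruled_out are hypothetical, and treating a score of exactly zero as negative is an assumption made here for simplicity.

```python
def classify(weight, example, negated=False):
    """Label an example with sign(v x + w y - 1), or sign(1 - v x - w y) if negated."""
    v, w = weight
    x, y = example
    score = v * x + w * y - 1
    if negated:
        score = -score
    # Assumption: a score of exactly zero is treated as negative.
    return 1 if score > 0 else -1


def is_ruled_out(weight, example, label, negated=False):
    """True if this weight vector lies in the half-plane that the example rules out."""
    return classify(weight, example, negated) != label


print(classify((1.0, 1.0), (2.0, 2.0)))         # score = 2 + 2 - 1 = 3
print(is_ruled_out((1.0, 1.0), (2.0, 2.0), -1))  # a negative example here excludes (1, 1)
```

For a fixed example (x,y), the boundary v x + w y = 1 is a line in weight space, which is why each example shows up on the right as a half-plane constraint.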

Version space is the set of weight vectors that classify all examples correctly, i.e., the region of weight space that satisfies all of our constraints. If all of weight space is some shade of red or pink, try control-clicking on the right half of the applet; this will change the sign of the weight vector and might reveal a correct classifier.
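As a rough sketch of this definition, membership in version space is just a check that every labeled example lands strictly on the correct side of the decision surface. The function name in_version_space and the (point, label) data layout are assumptions made for illustration; the applet itself may represent things differently.

```python
def in_version_space(weight, examples, negated=False):
    """True if the weight vector classifies every labeled example correctly.

    examples is a list of ((x, y), label) pairs with label +1 or -1.
    Setting negated=True uses the rule sign(1 - v x - w y) instead.
    """
    v, w = weight
    flip = -1 if negated else 1
    return all(
        label * flip * (v * x + w * y - 1) > 0
        for (x, y), label in examples
    )


examples = [((0.2, 0.2), 1), ((2.0, 2.0), -1)]
# No luck with the ordinary rule at (1, 1), but the negated rule works,
# which mirrors the advice to try control-clicking.
print(in_version_space((1.0, 1.0), examples))
print(in_version_space((1.0, 1.0), examples, negated=True))
```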

As the slides describe in more detail, support vector machines find weight vectors with large margins. The margin is shown as a “fat plane” around the decision surface in the left half of the applet, and as multiple “fat planes” around the examples in the right half. To avoid clutter, in the right half, we only display the margin around the examples that are closest to the weight vector.

If the examples are all nearly unit norm, then the maximum-margin classifier is near the ball center of version space. (The ball center is the point that maximizes the minimum distance to any constraint boundary, i.e., the center of the largest ball that fits inside version space.) The required margin for each point scales according to the inverse of its norm, so if the examples have different norms, the maximum-margin classifier may not be near the center of version space. In any case, note that the “center” of version space may not be well defined if version space is unbounded.
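To see where the norm dependence comes from, it may help to compare two distances computed from the same score v x + w y - 1. Dividing by the norm of the weight vector gives the distance from the example to the decision line in example space; dividing by the norm of the example gives the distance from the weight vector to that example's constraint line in weight space. The sketch below uses hypothetical function names and assumes every example has nonzero norm and the weight vector is nonzero.

```python
import math


def signed_scores(weight, examples):
    """label * (v x + w y - 1) for each example; positive means correctly classified."""
    v, w = weight
    return [label * (v * x + w * y - 1) for (x, y), label in examples]


def example_space_margin(weight, examples):
    """Smallest distance from any example to the decision line, in example space."""
    norm_w = math.hypot(*weight)
    return min(s / norm_w for s in signed_scores(weight, examples))


def weight_space_radius(weight, examples):
    """Radius of the largest ball around the weight vector that stays inside version space."""
    scores = signed_scores(weight, examples)
    return min(s / math.hypot(x, y) for s, ((x, y), _) in zip(scores, examples))


examples = [((1.0, 0.0), 1), ((0.0, 2.0), 1)]
weight = (3.0, 4.0)
print(example_space_margin(weight, examples))
print(weight_space_radius(weight, examples))
```

The ball center is the weight vector that maximizes weight_space_radius. When all examples have norm close to 1, the two quantities differ only by the factor of the weight norm, which is the sense in which the maximum-margin classifier and the ball center nearly coincide.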

Weight vectors near the center of version space tend to be better choices because we learn more from their classification errors: if we make a mistake, the new constraint must chop off a large volume of version space. This fact can be seen as one justification for why SVMs work well.