VC dimension
The VC dimension (for Vapnik-Chervonenkis dimension) is a measure of the capacity of a statistical classification algorithm. It is one of the core concepts in Vapnik-Chervonenkis theory and was originally defined by Vladimir Vapnik and Alexey Chervonenkis.
Informally, the capacity of a classification model is related to how complicated the model can be. For example, consider thresholding a high-degree polynomial: if the polynomial evaluates to a value above zero at a point, that point is classified as positive, and otherwise as negative. A polynomial of very high degree can be very wiggly and can fit a training set exactly, but one should expect it to generalize poorly outside the training points, precisely because it is so wiggly. Such a polynomial has high capacity. A much simpler alternative is to threshold a linear function. Because a linear function has low capacity, it may not be able to fit the entire training set. This notion of capacity can be made more rigorous.
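As a toy illustration of this intuition, the following sketch (with made-up one-dimensional data and helper names of my own; none of it comes from the article) thresholds a least-squares polynomial fit to ±1 labels: a degree-9 polynomial has enough capacity to label 10 training points perfectly, while a thresholded linear function generally does not.

```python
# Capacity intuition: a wiggly high-degree polynomial can memorize arbitrary
# labels, a linear function usually cannot.  Data and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 10)
y_train = rng.choice([-1.0, 1.0], size=10)          # arbitrary +/-1 labels


def threshold_poly_fit(degree):
    """Fit a polynomial to the +/-1 labels and classify by its sign."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return lambda x: np.where(np.polyval(coeffs, x) > 0, 1.0, -1.0)


high = threshold_poly_fit(9)   # 10 points, degree 9: exact interpolation
low = threshold_poly_fit(1)    # a thresholded linear function

print("degree-9 training errors:", int(np.sum(high(x_train) != y_train)))  # 0
print("degree-1 training errors:", int(np.sum(low(x_train) != y_train)))   # typically nonzero for arbitrary labels
```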
First, one needs the concept of shattering. Consider a classification model <math>f</math> with some parameter vector <math>\theta</math>. The model <math>f</math> can shatter a set of data points (<math>x_1,x_2,\ldots,x_n</math>) if, for all assignments of labels to those data points, there exists a <math>\theta</math> such that the model <math>f</math> makes no errors when evaluating that set of data points.
For example, consider a straight line as the classification model: the model used by a perceptron. The line should separate positive data points from negative data points. There exist sets of 3 points that this model can shatter, but it cannot shatter any set of 4 points. Note that one may choose the arrangement of the points, but cannot then change it as the labels on the points are varied. Only 3 of the 8 possible labelings of the 3 points are shown below.
[Images VC1.png, VC2.png, VC3.png: 3 points shattered. Image VC4.png: 4 points impossible.]
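This can be checked by brute force. The sketch below is my own illustration (it assumes NumPy and SciPy are available; the function names are mine): it enumerates every labeling of a point set and tests whether a separating line exists by solving a small feasibility linear program, confirming that 3 points in general position are shattered while 4 points (here the corners of a square) are not.

```python
# Brute-force shattering check for linear classifiers in the plane.
from itertools import product

import numpy as np
from scipy.optimize import linprog


def linearly_separable(points, labels):
    """True if some line w.x + b = 0 classifies every point correctly."""
    # Feasibility LP: find (w1, w2, b) with y_i * (w . x_i + b) >= 1 for all i.
    points = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    A_ub = -y[:, None] * np.hstack([points, np.ones((len(points), 1))])
    b_ub = -np.ones(len(points))
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3, method="highs")
    return res.success


def shattered_by_lines(points):
    """True if every +/-1 labeling of the points is linearly separable."""
    return all(linearly_separable(points, labels)
               for labels in product([-1.0, 1.0], repeat=len(points)))


print(shattered_by_lines([(0, 0), (1, 0), (0, 1)]))          # True: 3 non-collinear points
print(shattered_by_lines([(0, 0), (1, 0), (0, 1), (1, 1)]))  # False: the XOR labeling is not separable
```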
Shattering provides a rigorous mathematical notion of capacity, called the VC dimension. The VC dimension of a model <math>f</math> is the maximum <math>h</math> such that some set of <math>h</math> data points can be shattered by <math>f</math>.
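For a concrete calculation, consider one-dimensional threshold classifiers of the form f_t(x) = +1 if x ≥ t and −1 otherwise. The sketch below is an illustrative example of my own (names and data assumed, not from the article); it checks shattering by brute force over labelings, showing that a single point can be shattered but no pair can, so the VC dimension of this model is 1.

```python
# Brute-force VC dimension check for 1-D threshold classifiers.
from itertools import product


def realizable(points, labels, thresholds):
    """True if some threshold t labels each point x as +1 exactly when x >= t."""
    return any(all((x >= t) == (lab == 1) for x, lab in zip(points, labels))
               for t in thresholds)


def shattered(points):
    # Candidate thresholds below, between, and above the sample points cover
    # every distinct behavior of a threshold on this finite set.
    xs = sorted(points)
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return all(realizable(points, labels, thresholds)
               for labels in product([-1, 1], repeat=len(points)))


print(shattered([0.0]))       # True: one point can be labeled either way
print(shattered([0.0, 1.0]))  # False: labeling (+1, -1) would need t <= 0 and t > 1 at once
# The largest shatterable set has size 1, so the VC dimension of thresholds is 1.
```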
The VC dimension has utility in statistical learning theory, because it can predict a probabilistic upper bound on the test error of a classification model.
The bound on the test error of a classification model (on data that is drawn i.i.d. from the same distribution as the training set) is given by
- Training error + <math>\sqrt{\frac{h(\log(2N/h)+1)-\log(\eta/4)}{N}}</math>
with probability <math>1-\eta</math>, where <math>h</math> is the VC dimension of the classification model, and <math>N</math> is the size of the training set.
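The bound can be evaluated numerically; in the following sketch the inputs (training error 0.05, h = 10, N = 10000, η = 0.05) are illustrative values of my own, not figures from the article.

```python
# Evaluate the VC generalization bound quoted above (natural logarithm assumed).
import math


def vc_bound(train_error, h, n, eta):
    """Training error plus the VC confidence term, valid with probability 1 - eta."""
    confidence = math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)
    return train_error + confidence


print(vc_bound(train_error=0.05, h=10, n=10_000, eta=0.05))  # roughly 0.05 + 0.095 = 0.145
```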
References and sources
- Andrew Moore's VC dimension tutorial
- V. Vapnik and A. Chervonenkis. "On the uniform convergence of relative frequencies of events to their probabilities." Theory of Probability and its Applications, 16(2):264–280, 1971.
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. "Learnability and the Vapnik-Chervonenkis dimension." Journal of the ACM, 36(4):929–965, 1989.