This is proven, for instance, in *An Introduction to Computational Learning Theory* by M. Kearns and U. Vazirani (MIT Press, 1994).
Up to exact constants, what you ask seems to be covered by Theorem 3.3 in Chapter 3, which states that to learn a concept class $\mathcal{C}$, any algorithm that outputs a consistent hypothesis will work with this many samples:
$$
O\!\left(\frac{d}{\varepsilon}\log\frac{1}{\varepsilon}+\frac{1}{\varepsilon}\log\frac{1}{\delta}\right)
$$
where $d$ is the VC dimension of $\mathcal{C}$. A lower bound of $\Omega\!\left(\frac{d}{\varepsilon}\right)$ is shown in Theorem 3.5. (Note that there is still a logarithmic gap between the upper and lower bounds.)
Since you use $\mathcal{H}$ in your question, I suppose you are also interested in the case where the algorithm uses a hypothesis (representation) class $\mathcal{H}$ to learn the concept class $\mathcal{C}$. In that case, the relevant result is Theorem 3.4 of the same chapter, with the same expression for the sufficient sample complexity (but with $d$ being the VC dimension of $\mathcal{H}$, not of $\mathcal{C}$).
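To make the orders of magnitude concrete, here is a minimal illustrative sketch in Python. It is not from the book: all hidden constants in the $O(\cdot)$ and $\Omega(\cdot)$ are set to 1, which is an assumption made purely to illustrate the functional forms.

```python
import math

def upper_bound_form(d, eps, delta):
    # Functional form of the Theorem 3.3/3.4 upper bound,
    # with all hidden constants set to 1 (illustration only).
    return (d / eps) * math.log(1 / eps) + (1 / eps) * math.log(1 / delta)

def lower_bound_form(d, eps):
    # Functional form of the Omega(d/eps) lower bound of Theorem 3.5,
    # again with the hidden constant set to 1.
    return d / eps

d, eps, delta = 10, 0.01, 0.05
print(upper_bound_form(d, eps, delta))  # ~4904.7
print(lower_bound_form(d, eps))         # 1000.0
```

The extra factor of roughly $\log\frac{1}{\varepsilon}$ on the $\frac{d}{\varepsilon}$ term is exactly the logarithmic gap mentioned above.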
The question of the tight sample complexity seems to have been resolved recently by S. Hanneke in "The Optimal Sample Complexity of PAC Learning" (2015, abs/1507.00473). This paper establishes that the sample complexity of PAC-learning a concept class $\mathcal{C}$ to error $\varepsilon$ with failure probability at most $\delta$ is
$$
\Theta\!\left(\frac{d}{\varepsilon}+\frac{1}{\varepsilon}\log\frac{1}{\delta}\right)
$$
where $d = \operatorname{VC}(\mathcal{C})$.
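Extending the same illustrative sketch (again with all hidden constants set to 1, an assumption made only to compare functional forms), Hanneke's bound drops the $\log\frac{1}{\varepsilon}$ factor on the $\frac{d}{\varepsilon}$ term, so it matches the $\Omega\!\left(\frac{d}{\varepsilon}\right)$ lower bound up to the $\frac{1}{\varepsilon}\log\frac{1}{\delta}$ term:

```python
import math

def classic_bound_form(d, eps, delta):
    # Theorem 3.3-style bound, hidden constants set to 1 (illustration only).
    return (d / eps) * math.log(1 / eps) + (1 / eps) * math.log(1 / delta)

def optimal_bound_form(d, eps, delta):
    # Hanneke's Theta(d/eps + (1/eps) log(1/delta)) bound,
    # hidden constants set to 1 (illustration only).
    return d / eps + (1 / eps) * math.log(1 / delta)

d, delta = 10, 0.05
for eps in (0.1, 0.01, 0.001):
    ratio = classic_bound_form(d, eps, delta) / optimal_bound_form(d, eps, delta)
    # The ratio grows roughly like log(1/eps): the removed logarithmic factor.
    print(eps, round(ratio, 2))  # ~2.0, ~3.77, ~5.55
```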