This is proven, for instance, in *An Introduction to Computational Learning Theory* by M. Kearns and U. Vazirani (MIT Press, 1994).
Up to exact constants, what you ask seems to be covered by Theorem 3.3 in Chapter 3, which states that to learn a concept class $\mathcal{C}$, any algorithm that outputs a consistent hypothesis will work with this many samples:
$$
O\!\left(\frac{d}{\varepsilon}\log\frac{1}{\varepsilon}+\frac{1}{\varepsilon}\log\frac{1}{\delta}\right)
$$
where $d$ is the VC dimension of $\mathcal{C}$. A lower bound of $\Omega\!\left(\frac{d}{\varepsilon}\right)$ is shown in Theorem 3.5. (Note that there is still a logarithmic gap between the upper and lower bounds.)
Since you use $\mathcal{H}$ in your question, I suppose you are also interested in the case where the algorithm uses a hypothesis (representation) class $\mathcal{H}$ to learn the concept class $\mathcal{C}$. In that case, the relevant result is Theorem 3.4 of the same chapter, with the same expression for the sufficient sample complexity (but with $d$ being the VC dimension of $\mathcal{H}$, not of $\mathcal{C}$).
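To make the orders of magnitude concrete, here is a minimal illustrative sketch in Python. It is not from the book: all hidden constants in the $O(\cdot)$ and $\Omega(\cdot)$ are set to 1, which is an assumption made purely to illustrate the functional forms.

```python
import math

def upper_bound_form(d, eps, delta):
    # Functional form of the Theorem 3.3/3.4 upper bound,
    # with all hidden constants set to 1 (illustration only).
    return (d / eps) * math.log(1 / eps) + (1 / eps) * math.log(1 / delta)

def lower_bound_form(d, eps):
    # Functional form of the Omega(d/eps) lower bound of Theorem 3.5,
    # again with the hidden constant set to 1.
    return d / eps

d, eps, delta = 10, 0.01, 0.05
print(upper_bound_form(d, eps, delta))  # ~4904.7
print(lower_bound_form(d, eps))         # 1000.0
```

The extra factor of roughly $\log\frac{1}{\varepsilon}$ on the $\frac{d}{\varepsilon}$ term is exactly the logarithmic gap mentioned above.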
The question of the tight sample complexity seems to have been resolved recently by S. Hanneke in "The Optimal Sample Complexity of PAC Learning" (2015, abs/1507.00473). This paper establishes that the sample complexity of PAC-learning a concept class $\mathcal{C}$ to error $\varepsilon$ with failure probability at most $\delta$ is
$$
\Theta\!\left(\frac{d}{\varepsilon}+\frac{1}{\varepsilon}\log\frac{1}{\delta}\right)
$$
where $d = \operatorname{VC}(\mathcal{C})$.
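Extending the same illustrative sketch (again with all hidden constants set to 1, an assumption made only to compare functional forms), Hanneke's bound drops the $\log\frac{1}{\varepsilon}$ factor on the $\frac{d}{\varepsilon}$ term, so it matches the $\Omega\!\left(\frac{d}{\varepsilon}\right)$ lower bound up to the $\frac{1}{\varepsilon}\log\frac{1}{\delta}$ term:

```python
import math

def classic_bound_form(d, eps, delta):
    # Theorem 3.3-style bound, hidden constants set to 1 (illustration only).
    return (d / eps) * math.log(1 / eps) + (1 / eps) * math.log(1 / delta)

def optimal_bound_form(d, eps, delta):
    # Hanneke's Theta(d/eps + (1/eps) log(1/delta)) bound,
    # hidden constants set to 1 (illustration only).
    return d / eps + (1 / eps) * math.log(1 / delta)

d, delta = 10, 0.05
for eps in (0.1, 0.01, 0.001):
    ratio = classic_bound_form(d, eps, delta) / optimal_bound_form(d, eps, delta)
    # The ratio grows roughly like log(1/eps): the removed logarithmic factor.
    print(eps, round(ratio, 2))  # ~2.0, ~3.77, ~5.55
```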