I'm working on Exercise 2.3(a) in Shalev-Shwartz and Ben-David's machine learning textbook (*Understanding Machine Learning*), which states:
An axis aligned rectangle classifier in the plane is a classifier that assigns 1 to a point if and only if it is inside a certain rectangle. Formally, given real numbers $a_1\leq b_1, a_2\leq b_2,$ define the classifier $h_{(a_1, b_1, a_2, b_2)}$ by $$ h_{(a_1, b_1, a_2, b_2)}(x_1, x_2) = \begin{cases}1&\textrm{if $a_1\leq x_1\leq b_1$ and $a_2\leq x_2\leq b_2$}\\ 0&\textrm{otherwise}\end{cases} $$ The class of all axis aligned rectangles in the plane is defined as $\mathcal{H}_\mathrm{rec}^2 = \{h_{(a_1, b_1, a_2, b_2)}:\textrm{$a_1\leq b_1$ and $a_2\leq b_2$}\}$ ... rely on the realizability assumption. Let $A$ be an algorithm that returns the smallest rectangle enclosing all positive examples in the training set. Show that $A$ is an ERM (empirical risk minimizer).
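To make the definitions concrete for myself, here is a minimal Python sketch (the names `rect_classifier` and `A` are my own, not from the book) of the rectangle classifier and of the algorithm $A$ that returns the smallest rectangle enclosing the positive examples:

```python
def rect_classifier(a1, b1, a2, b2):
    """The classifier h_(a1, b1, a2, b2): predicts 1 inside the rectangle, 0 outside."""
    return lambda x1, x2: 1 if (a1 <= x1 <= b1 and a2 <= x2 <= b2) else 0

def A(S):
    """Smallest axis-aligned rectangle enclosing all positive examples.

    S is a list of ((x1, x2), y) pairs; assumes at least one positive example.
    """
    pos = [x for x, y in S if y == 1]
    a1, b1 = min(p[0] for p in pos), max(p[0] for p in pos)
    a2, b2 = min(p[1] for p in pos), max(p[1] for p in pos)
    return rect_classifier(a1, b1, a2, b2)
```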
I don't understand why $A$ would be an ERM when the domain (instance space) is infinite. Below is the counterexample I considered against this claim.
For simplicity, consider the one-dimensional case, where the rectangles are intervals. Let the instances be drawn as $x\sim \mathcal{U}_{[0, 1]}$, the uniform distribution over $[0, 1]$, and let the labeling function $f:[0, 1]\to\{0, 1\}$ be defined by $$ f(x) = \begin{cases} 1 &\text{if } x = 0 \text{ or } x = 1\\ 0 &\text{otherwise.} \end{cases} $$ The realizability assumption holds: the classifier $h^*$ given by the degenerate interval $\{1\}$ errs only on the set $\{0\}$, which has measure $0$, so its true error is $0$.

However, consider the particular sample $$ S = \{0, 1/4, 1/2, 3/4, 1\}, $$ labeled by $f$, so that only $0$ and $1$ are positive. The smallest interval returned by $A$ is $[0, 1]$, and the resulting classifier $h_A$ has empirical error $L_S(h_A) = 3/5$. In contrast, the classifier $h^*$ has empirical error $L_S(h^*) = 1/5$, lower than that of the classifier returned by $A$, even though $h^*$ does not contain all positive examples. Therefore $A$ does not minimize the empirical error.
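To double-check my arithmetic, here is a small sketch (names are my own) that just computes the two empirical errors on this one-dimensional sample:

```python
def f(x):
    """The labeling function: positive only at the endpoints 0 and 1."""
    return 1 if x == 0 or x == 1 else 0

def interval(a, b):
    """The interval classifier h_(a, b)."""
    return lambda x: 1 if a <= x <= b else 0

S = [0, 1/4, 1/2, 3/4, 1]

def L_S(h):
    """Empirical error of h on S, with labels given by f."""
    return sum(h(x) != f(x) for x in S) / len(S)

h_A = interval(0, 1)      # smallest interval containing the positive examples 0 and 1
h_star = interval(1, 1)   # the degenerate interval {1}

print(L_S(h_A))     # 0.6  (= 3/5)
print(L_S(h_star))  # 0.2  (= 1/5)
```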