The argument is originally due to A. van Enter. Suppose that $k\leq d$ and $p>0$. For simplicity, let us consider the case $d=2$. The general case is similar but more elaborate (I presume).
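First, observe that an active rectangular region with a protuberance along one of its sides (an active site just outside the rectangle, adjacent to that side) grows, in a number of steps, into a larger active rectangular region. Indeed, an inactive site next to the protuberance and adjacent to the same side has at least two active neighbours, which suffices since $k\leq d=2$. So the protuberance spreads along the whole side, and the rectangle gains one extra row (or column).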

It follows that an active $n\times n$ square grows into an active $(n+2)\times(n+2)$ square provided there are four active sites just outside it, one adjacent to each of its four sides: applying the previous observation once for each side adds a row or column on every side.

Let us call a square region properly fenced if there are four active sites just outside it, one adjacent to each of its sides. So we know that if an active $n\times n$ square is properly fenced, it will grow into an active $(n+2)\times(n+2)$ square. If the new square is also properly fenced, the active square will grow even larger, and so on. In particular, if a square is active and properly fenced, and all the larger concentric squares around it are also properly fenced, then every site on the lattice will eventually be active.
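The fence mechanism can be checked on a small finite grid. The following is a minimal Python sketch, assuming the threshold rule $k=2$ on $\mathbb{Z}^2$ (an inactive site becomes active once at least two of its four nearest neighbours are active) and truncating the lattice to a finite `size`×`size` box. The names `bootstrap_closure` and `square` are hypothetical helpers, not from any library.

```python
def bootstrap_closure(active, size, k=2):
    """Return the final active set of k-neighbour bootstrap percolation
    on a size x size box, starting from the set `active` of (row, col) pairs.
    Sites outside the box are treated as permanently inactive."""
    active = set(active)
    changed = True
    while changed:
        changed = False
        for r in range(size):
            for c in range(size):
                if (r, c) in active:
                    continue
                neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if sum(nb in active for nb in neighbours) >= k:
                    active.add((r, c))
                    changed = True
    return active


def square(top, left, n):
    """The n x n square whose top-left corner is (top, left)."""
    return {(r, c) for r in range(top, top + n) for c in range(left, left + n)}


size, n, a = 12, 4, 4                 # active 4x4 square on rows/cols 4..7
start = square(a, a, n)
# One active fence site just outside each side; positions along the sides are arbitrary.
fence = {(a - 1, a + 1),              # above the top side
         (a + n, a + 2),              # below the bottom side
         (a + 3, a - 1),              # left of the left side
         (a, a + n)}                  # right of the right side

grown = bootstrap_closure(start | fence, size)
print(grown == square(a - 1, a - 1, n + 2))    # True: a 6x6 active square

# Without the left fence site, the left column never fills in.
partial = bootstrap_closure(start | (fence - {(a + 3, a - 1)}), size)
print(partial == square(a - 1, a - 1, n + 2))  # False
```

Sites are updated one at a time rather than all in parallel; this does not change the final active set, because the dynamics is monotone (an active site never becomes inactive).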

Let us call a $(2n+1)\times(2n+1)$ square centered at a site $(i,j)$ critical if for every $m\geq n$, the $(2m+1)\times(2m+1)$ square centered at $(i,j)$ is properly fenced. In conclusion:
($\star$)$\quad$ If there is a critical active square somewhere on the lattice, then
every site on the lattice will eventually be active.
In order to prove the claim, it is sufficient to show that with probability $1$, somewhere on the lattice there is a critical active square.
Let us first show that
($\star\star$)$\quad$ There is an $N$ sufficiently large such that with positive
probability, the $(2N+1)\times(2N+1)$ square centered at the origin is active
and critical.
Argument.
This is essentially a Borel-Cantelli argument.
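Let $B_n$ denote the $(2n+1)\times(2n+1)$ square centered at the origin. Each side of $B_n$ has $2n+1$ sites adjacent to it from outside, so in the initial configuration the probability that a given side has no active site next to it is $(1-p)^{2n+1}\leq(1-p)^n$. By a union bound over the four sides, the probability that $B_n$ is not properly fenced is no larger than $4(1-p)^n$. Therefore, for each $N$,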
\begin{align*}
\mathbb{P}(\text{$B_N$ not critical}) &=
\mathbb{P}(\text{$B_n$ not properly fenced for some $n\geq N$}) \\
&\leq
\sum_{n\geq N} \mathbb{P}(\text{$B_n$ not properly fenced}) \\
&\leq
\sum_{n\geq N} 4(1-p)^n \;.
\end{align*}
Since $p>0$, the sum on the right-hand side converges, and moreover it is smaller than $1$ for sufficiently large $N$.
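To make "sufficiently large" explicit, one can sum the geometric series (assuming $0<p<1$; for $p=1$ every square is trivially properly fenced):
\begin{align*}
\sum_{n\geq N} 4(1-p)^n = \frac{4(1-p)^N}{p} < 1
\quad\Longleftrightarrow\quad
N > \frac{\log(p/4)}{\log(1-p)} \;.
\end{align*}
For instance, with $p=\tfrac12$ the right-hand side is $\log(1/8)/\log(1/2)=3$, so $N=4$ already works.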
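Therefore, if we choose $N$ sufficiently large, $\mathbb{P}(\text{$B_N$ critical})>0$. The event that $B_N$ is critical depends only on the initial states of sites outside $B_N$ (the fence sites of every $B_m$ with $m\geq N$ lie outside $B_m\supseteq B_N$), whereas the event that $B_N$ is active depends only on sites inside $B_N$. Hence the two events are independent, and we obtain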
\begin{align*}
\mathbb{P}(\text{$B_N$ active and critical}) &=
\mathbb{P}(\text{$B_N$ active})\times\mathbb{P}(\text{$B_N$ critical}) \\
&=
p^{(2N+1)^2}\mathbb{P}(\text{$B_N$ critical}) \\
&> 0
\end{align*}
for sufficiently large $N$.
$\square$
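The quantities in this argument are easy to evaluate numerically. The sketch below assumes the reading of "properly fenced" in which the fence sites of $B_n$ are the $4(2n+1)$ sites adjacent to its sides from outside (corners excluded); since these four groups of $2n+1$ sites are disjoint, $\mathbb{P}(B_n\text{ properly fenced})=\bigl(1-(1-p)^{2n+1}\bigr)^4$ exactly. It finds the smallest $N$ for which the union bound drops below $1$ and reports $\log_{10}$ of the resulting lower bound $p^{(2N+1)^2}\bigl(1-4(1-p)^N/p\bigr)$ on $\mathbb{P}(B_N\text{ active and critical})$. All function names are hypothetical.

```python
import math


def prob_not_fenced(n, p):
    """Exact P(B_n not properly fenced): each of the four sides of the
    (2n+1)x(2n+1) square has 2n+1 adjacent outside sites, all disjoint."""
    side_empty = (1 - p) ** (2 * n + 1)
    return 1 - (1 - side_empty) ** 4


def union_bound_tail(N, p):
    """Closed form of sum_{n >= N} 4 (1 - p)^n, valid for 0 < p < 1."""
    return 4 * (1 - p) ** N / p


def smallest_N(p):
    """Smallest N with union_bound_tail(N, p) < 1, so that P(B_N critical) > 0."""
    N = 0
    while union_bound_tail(N, p) >= 1:
        N += 1
    return N


def log10_epsilon_lower_bound(p):
    """log10 of p^((2N+1)^2) * (1 - tail), a lower bound on P(B_N active and critical)."""
    N = smallest_N(p)
    return (2 * N + 1) ** 2 * math.log10(p) + math.log10(1 - union_bound_tail(N, p))


for p in (0.5, 0.2):
    print(p, smallest_N(p), log10_epsilon_lower_bound(p))
# p = 0.5 gives N = 4 and a lower bound of about 10**-24.7
```

The bound is positive, which is all the argument needs, but it is astronomically small; this is why the proof has to appeal to ergodicity rather than to a direct estimate.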
It follows from ($\star$) and ($\star\star$) that
($\clubsuit$)$\quad$ $\mathbb{P}(\text{every site will eventually be active})>0$.
Here is where the concept of ergodicity comes in.
Let $E\subseteq\{\square,\blacksquare\}^{\mathbb{Z}^d}$ denote the set of all initial configurations from which the bootstrap cellular automaton eventually makes every site active. As you mentioned, this event is invariant under translations: translating a configuration in $E$ gives another configuration in $E$, and translating a configuration outside $E$ gives another configuration outside $E$. This is because the bootstrap cellular automaton treats all sites in the same way, i.e., its update rule commutes with translations.
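The commutation with translations can be checked directly on a finite torus, used here as an assumed stand-in for $\mathbb{Z}^2$ (a torus has no boundary that would break the symmetry). Shifting a configuration and then running the dynamics gives the same final configuration as running the dynamics and then shifting. A minimal Python sketch with $k=2$; `torus_closure` and `shift` are hypothetical helpers.

```python
import random


def torus_closure(active, L, k=2):
    """Final active set of k-neighbour bootstrap percolation on the L x L torus."""
    active = set(active)
    changed = True
    while changed:
        changed = False
        for r in range(L):
            for c in range(L):
                if (r, c) in active:
                    continue
                nbrs = (((r - 1) % L, c), ((r + 1) % L, c),
                        (r, (c - 1) % L), (r, (c + 1) % L))
                if sum(nb in active for nb in nbrs) >= k:
                    active.add((r, c))
                    changed = True
    return active


def shift(sites, dr, dc, L):
    """Translate a set of torus sites by (dr, dc)."""
    return {((r + dr) % L, (c + dc) % L) for r, c in sites}


rng = random.Random(0)
L, p = 20, 0.1
# Independent biased coin flips: each site is active with probability p.
config = {(r, c) for r in range(L) for c in range(L) if rng.random() < p}
lhs = torus_closure(shift(config, 3, 7, L), L)   # shift, then run the dynamics
rhs = shift(torus_closure(config, L), 3, 7, L)   # run the dynamics, then shift
print(lhs == rhs)  # True
```

In particular, a configuration ends fully active if and only if its shift does, which is the finite analogue of the translation invariance of $E$.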
In your model, the starting configuration is chosen using independent biased coin flips ($\blacksquare$ with probability $p$ and $\square$ with probability $1-p$).
The probability law of independent coin flips has the property that every translation-invariant event has probability either $0$ or $1$ [proof via basic measure theory]. This property is one of the definitions of what it means for a probability law to be ergodic for translations. As a corollary, $\mathbb{P}(E)$ is either $0$ or $1$. Since we already know from ($\clubsuit$) that $\mathbb{P}(E)>0$, we conclude that $\mathbb{P}(E)=1$.
Q.E.D.
Let me give you an alternative intuition for why the concept of ergodicity comes in. Clearly, the origin has no special role in ($\star\star$): if the $(2N+1)\times(2N+1)$ square centered at the origin is active and critical with probability $\varepsilon>0$, then for every site $(i,j)$, the probability that the $(2N+1)\times(2N+1)$ square centered at $(i,j)$ is active and critical is also $\varepsilon$. (This is because the coin flips have the same distribution at every site, and the notions of "active" and "critical" do not distinguish between different frames of reference.)
Having the strong law of large numbers in mind, we may expect that
($\spadesuit$)$\quad$ Almost surely, the density of sites $(i,j)$ for which the
$(2N+1)\times(2N+1)$ square centered at $(i,j)$ is active and
critical (the limiting fraction of such sites in $B_L$ as $L\to\infty$) is $\varepsilon$.
This would follow from the law of large numbers if the events for different sites were independent, but these events are clearly not independent: squares centered at nearby sites overlap, and so do their fences. The ergodic theorem (a far-reaching generalization of the law of large numbers), together with the ergodicity of the distribution of independent coin flips mentioned above, implies that ($\spadesuit$) is indeed true. It follows that almost surely there is at least one (in fact, infinitely many) critical active square on the lattice.
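Criticality involves infinitely many squares and $\varepsilon$ is astronomically small, so ($\spadesuit$) itself cannot be observed in a simulation. As a stand-in, the sketch below swaps in a local event with the same kind of dependence: "the $3\times3$ square centered at $(i,j)$ is entirely active", which has probability $p^9$ and is dependent for nearby sites because the squares overlap. The empirical density over a large box should nonetheless approach $p^9$, as the ergodic theorem predicts. The grid sizes, $p$, the seed and the name `block_density` are arbitrary illustrative choices.

```python
import random


def block_density(L, p, seed=0):
    """Fraction of interior sites of an L x L grid of independent Bernoulli(p)
    coin flips whose 3x3 neighbourhood (the site and its 8 surrounding sites)
    is entirely active."""
    rng = random.Random(seed)
    grid = [[rng.random() < p for _ in range(L)] for _ in range(L)]
    hits = total = 0
    for i in range(1, L - 1):
        for j in range(1, L - 1):
            total += 1
            if all(grid[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)):
                hits += 1
    return hits / total


p = 0.7
for L in (50, 200, 400):
    # Empirical density versus the single-site probability p**9.
    print(L, block_density(L, p), p ** 9)
```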