Thomas Kojar has already provided an answer and some references, but here is an intuitive explanation that stays in the same spirit as your question.
1) Why $\mathrm{d}x^2 \sim 0$ in the standard case?
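Recall that, in the standard case of a function of a non-stochastic variable, the differential $\mathrm{d}f(x)$ can be seen as a "halfway unfinished" computation of the derivative $f'(x)$: before any limit is taken, a Taylor expansion gives $\mathrm{d}f = f'(x)\,\mathrm{d}x + \frac{1}{2}f''(x)\,\mathrm{d}x^2 + \ldots$, so that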
$$
\frac{\mathrm{d}f}{\mathrm{d}x} = f'(x) + \frac{1}{2}f''(x)\,\mathrm{d}x + \ldots,
$$
with all terms except the first one vanishing as $\mathrm{d}x \rightarrow 0$; this is why the $\mathrm{d}x^2$ term can be dropped. The confusion comes from the fact that the limit is usually already implicit in the "d" notation; here, the notation is slightly abused by using $\mathrm{d}f$ and $\mathrm{d}x$ before taking the limit.
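A quick numerical check of this expansion, using $f = \exp$ at $x = 0$ (an arbitrary choice, so that $f'(0) = f''(0) = 1$): the finite difference quotient exceeds $f'(0)$ by roughly $\frac{1}{2}f''(0)\,h = h/2$, which disappears as the step $h$ (playing the role of $\mathrm{d}x$) shrinks. This is a minimal Python sketch; the name `quotient_error` is purely illustrative.

```python
import math


def quotient_error(f, fprime, x, h):
    """Return (f(x + h) - f(x)) / h - f'(x): everything in df/dx beyond f'(x)."""
    return (f(x + h) - f(x)) / h - fprime(x)


# f = exp at x = 0, so f'(0) = f''(0) = 1 and the error should be close to h / 2.
for h in (1e-1, 1e-2, 1e-3):
    err = quotient_error(math.exp, math.exp, 0.0, h)
    # err -> 0 linearly in h, while err / h -> f''(0) / 2 = 0.5
    print(f"h = {h:g}: error = {err:.2e}, error / h = {err / h:.4f}")
```

The printed ratio `error / h` approaches $0.5$, i.e. the leftover term is exactly the $\frac{1}{2}f''(x)\,\mathrm{d}x$ contribution, which vanishes in the limit.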
2) How to treat $\mathrm{d}X_t^2$ in the stochastic case, and why $\mathrm{d}B_t\mathrm{d}t \sim 0$?
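In contrast, when the independent variable is random, namely a process $X_t$ with $\mathrm{d}X_t = a_t\,\mathrm{d}t + b_t\,\mathrm{d}B_t$, the terms of $\mathrm{d}f(X_t)$ coming from $\mathrm{d}X_t^2$ cannot all be ignored, because $\mathbb{E}[\mathrm{d}B_t^2] = \mathrm{d}t$, which is of first order in $\mathrm{d}t$. Note that another abuse of notation (in a way, stochastic calculus is full of formalized abuses of notation) is usually made by dropping the average, i.e. writing $\mathrm{d}B_t^2 = \mathrm{d}t$. This is justified because the fluctuation of $\mathrm{d}B_t^2$ around its mean is of higher order: $\mathrm{Var}[\mathrm{d}B_t^2] = 2\,\mathrm{d}t^2$, so over an interval $[0,T]$ the $T/\mathrm{d}t$ squared increments add up to $T$ with a variance of only $2T\,\mathrm{d}t \rightarrow 0$.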
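The replacement $\mathrm{d}B_t^2 = \mathrm{d}t$ can be checked by simulation: summing the squared increments of a discretized Brownian path over $[0, T]$ gives a total close to $T$ that tightens as the step shrinks (its standard deviation is $\sqrt{2T\,\mathrm{d}t}$), whereas the same sum for the deterministic increments $\mathrm{d}t$ equals $T\,\mathrm{d}t \to 0$. A minimal NumPy sketch; the helper name, seed and step counts are arbitrary illustrative choices.

```python
import numpy as np


def squared_increment_sums(T, n, rng):
    """Return (sum of dB^2, sum of dt^2) over n steps of size dt = T / n."""
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)  # Brownian increments ~ N(0, dt)
    return np.sum(dB**2), n * dt**2


rng = np.random.default_rng(seed=0)
T = 1.0
for n in (100, 10_000, 1_000_000):
    qv_brownian, qv_smooth = squared_increment_sums(T, n, rng)
    # sum of dB^2 stays near T; sum of dt^2 = T * dt shrinks to zero
    print(f"n = {n:>9}: sum dB^2 = {qv_brownian:.4f}, sum dt^2 = {qv_smooth:.2e}")
```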
As before, higher-order terms are considered negligible in the limit $\mathrm{d}t \rightarrow 0$, because they would vanish when computing a yet-to-be-formalized derivative $\frac{\mathrm{d}f(X_t)}{\mathrm{d}t}$ (with $\frac{\mathrm{d}B_t}{\mathrm{d}t}$ interpreted as white noise in that case). Consequently, the $o(\mathrm{d}t)$ terms, i.e. all the terms that are superlinear with respect to $\mathrm{d}t$, are (implicitly) cut from the initial Taylor expansion. From that point of view, $\mathrm{d}B_t$ is kept because it is distributed as $\mathcal{N}(0,\mathrm{d}t)$, i.e. $\mathrm{d}B_t = Z\sqrt{\mathrm{d}t}$ with $Z$ following $\mathcal{N}(0,1)$, so that $\mathrm{d}B_t \sim \mathrm{d}t^{1/2}$; but $\mathrm{d}B_t\mathrm{d}t$ is ruled out because $\mathrm{d}B_t\mathrm{d}t \sim \mathrm{d}t^{3/2} = o(\mathrm{d}t)$.
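Putting these rules together for $\mathrm{d}X_t = a_t\,\mathrm{d}t + b_t\,\mathrm{d}B_t$ (with $f$ not depending explicitly on $t$) yields Itô's lemma: only the $\mathrm{d}B_t^2$ part of $\mathrm{d}X_t^2$ survives, and the cubic and higher Taylor terms are $O(\mathrm{d}t^{3/2})$ and are dropped:
$$
\mathrm{d}X_t^2 = a_t^2\,\mathrm{d}t^2 + 2a_tb_t\,\mathrm{d}t\,\mathrm{d}B_t + b_t^2\,\mathrm{d}B_t^2 \simeq b_t^2\,\mathrm{d}t,
\qquad
\mathrm{d}f(X_t) = f'(X_t)\,\mathrm{d}X_t + \frac{1}{2}f''(X_t)\,b_t^2\,\mathrm{d}t.
$$
For $X_t = B_t$ and $f(x) = x^2$ this gives $\mathrm{d}(B_t^2) = 2B_t\,\mathrm{d}B_t + \mathrm{d}t$, i.e. $B_T^2 = 2\int_0^T B_t\,\mathrm{d}B_t + T$, whereas the ordinary chain rule would miss the $+T$. The NumPy sketch that follows checks this on one simulated path, approximating the Itô integral by the left-point sum $\sum_i B_{t_i}\,(B_{t_{i+1}} - B_{t_i})$; the function name, seed and step count are arbitrary illustrative choices.

```python
import numpy as np


def ito_check(T, n, rng):
    """Compare B_T^2 with the Itô and the naive chain-rule predictions on one path."""
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)   # increments ~ N(0, dt)
    B = np.concatenate(([0.0], np.cumsum(dB)))  # B_0 = 0, B_{t_1}, ..., B_T
    ito_integral = np.sum(B[:-1] * dB)          # left-point (Itô) sum of B dB
    exact = B[-1] ** 2
    ito_prediction = 2.0 * ito_integral + T     # keeps the dB^2 -> dt term
    naive_prediction = 2.0 * ito_integral       # ordinary chain rule drops it
    return exact, ito_prediction, naive_prediction


rng = np.random.default_rng(seed=0)
exact, ito, naive = ito_check(T=1.0, n=100_000, rng=rng)
print(f"B_T^2 = {exact:.4f}, Itô: {ito:.4f}, naive chain rule: {naive:.4f}")
```

On such a path the Itô prediction matches $B_T^2$ up to a discretization error of order $\sqrt{2T/n}$, while the naive prediction is off by about $T$.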
Final remark
All the above developments are valid when the stochastic process $X_t$ is made of a deterministic component, represented by the drift term $a_t\,\mathrm{d}t$, and a random component modelled by a normal distribution, represented here by the Gaussian noise term $b_t\,\mathrm{d}B_t$. When the randomness is not Gaussian, for example in the case of a Poisson process, you will need to adapt and rederive Itô's lemma, because the relation $\mathrm{d}B_t^2 \sim \mathrm{d}t$ no longer has a direct counterpart: for a Poisson process $N_t$, whose infinitesimal increments are 0 or 1, $\mathrm{d}N_t^2 = \mathrm{d}N_t$, which remains random instead of reducing to a deterministic multiple of $\mathrm{d}t$.
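The difference can be seen numerically: for a Poisson process with rate $\lambda$, the sum of squared increments over $[0, T]$ is essentially the random jump count $N_T$, which keeps fluctuating from path to path (standard deviation about $\sqrt{\lambda T}$) however small the step, while the Brownian sum settles on $T$. A minimal NumPy sketch, with arbitrary illustrative values for $\lambda$, the step count, the number of paths and the seed.

```python
import numpy as np


def squared_increment_totals(T, n, n_paths, lam, rng):
    """Per-path sum of squared increments, for Brownian motion and a Poisson process."""
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))  # Brownian: N(0, dt)
    dN = rng.poisson(lam * dt, size=(n_paths, n))          # Poisson counts, almost always 0 or 1
    return np.sum(dB**2, axis=1), np.sum(dN**2, axis=1)


rng = np.random.default_rng(seed=0)
qv_B, qv_N = squared_increment_totals(T=1.0, n=10_000, n_paths=200, lam=5.0, rng=rng)
# Brownian: mean close to T, tiny spread. Poisson: mean close to lam * T, spread near sqrt(lam * T).
print(f"Brownian: mean {qv_B.mean():.3f}, std {qv_B.std():.3f}")
print(f"Poisson:  mean {qv_N.mean():.3f}, std {qv_N.std():.3f}")
```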