Let $(B_t)_{t \geq 0}$ be a Brownian motion. Let's forget for the moment everything we already know about stochastic integrals and start at the very beginning of the story:
We would like to define a stochastic integral $\int \dots \, dB_r$ with respect to Brownian motion. Which properties do we expect the stochastic integral to have?
Wish Nr. 1: The stochastic integral should be an additive mapping, i.e. $$\int_0^t (G(r)+H(r)) \, dB_r = \int_0^t G(r) \, dB_r + \int_0^t H(r) \, dB_r$$ whenever we can make sense of the integrals.
Wish Nr. 2: We would like to have $$\int_0^t dB_r = B_t-B_0$$.
Wish Nr. 3: Some sort of pull-out property: if a random variable $\xi$ does not depend on the values of $(B_r)_{r > s}$, then $$\int_s^t \xi \, dB_r = \xi \int_s^t \, dB_r$$ ("pull out what is known").
If we combine these three properties, we find that the stochastic integral of a simple function of the form
$$H(r,\omega) = \sum_{j=1}^n \xi_j(\omega) 1_{[t_{j-1},t_j)}(r)$$
(where $t_0< \ldots < t_n \leq T$ and $\xi_j$ is $\mathcal{F}_{t_{j-1}}$-measurable) is given by
$$\int_0^T H(r) \, dB_r = \sum_{j=1}^n \xi_j (B_{t_j}-B_{t_{j-1}}) = \sum_{j=1}^n H(t_{j-1}) (B_{t_j}-B_{t_{j-1}}). \tag{2}$$
So far, all the integrals can be interpreted as pointwise ($\omega$-wise) integrals, and, moreover, formula $(2)$ gives them a nice intuitive interpretation.
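To make the pointwise nature of $(2)$ concrete, here is a minimal numerical sketch (my own addition, not part of the original argument; the particular partition and the choice $\xi_j = B_{t_{j-1}}$ are purely illustrative assumptions) which evaluates the sum in $(2)$ for a single sampled path:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative partition t_0 < ... < t_n <= T (an arbitrary choice).
t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# Brownian increments B_{t_j} - B_{t_{j-1}} and the path values B_{t_j}.
dB = rng.normal(0.0, np.sqrt(np.diff(t)))
B = np.concatenate(([0.0], np.cumsum(dB)))

# xi_j must be F_{t_{j-1}}-measurable; as an example we take xi_j = B_{t_{j-1}},
# i.e. H(r) = B_{t_{j-1}} for r in [t_{j-1}, t_j).
xi = B[:-1]

# Formula (2), evaluated omega-wise for this single path.
stochastic_integral = np.sum(xi * dB)
print(stochastic_integral)
```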
The bad news: Defining the stochastic integral
$$\int_0^t H(r) \, dB_r$$
as a pointwise limit of Riemann sums of the form $(2)$ works only for a very small class of integrands $H$; this is closely related to the fact that the sample paths of Brownian motion almost surely have unbounded variation (see this answer for more details). For instance, if $H:[0,1] \to \mathbb{R}$ is a continuous mapping, we cannot in general expect the limit of the Riemann sums
$$\lim_{n \to \infty} \sum_{j=1}^n H((j-1)/n) (B_{j/n} -B_{(j-1)/n})$$
to exist with probability $1$.
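To get a feeling for the obstruction, here is a small simulation (again my own illustrative sketch, not part of the original argument): it computes the total variation of one discretized Brownian path along finer and finer partitions of $[0,1]$. The discretized variation keeps growing (roughly like $\sqrt{n}$) instead of stabilising, which is precisely what rules out a pathwise Riemann-Stieltjes construction for general integrands.

```python
import numpy as np

rng = np.random.default_rng(1)

# One Brownian path sampled on a very fine grid of [0, 1].
N = 2**20
dB = rng.normal(0.0, np.sqrt(1.0 / N), N)
B = np.concatenate(([0.0], np.cumsum(dB)))

# Total variation of the path along coarser and coarser sub-partitions.
for n in [2**k for k in range(4, 21, 4)]:
    step = N // n
    coarse = B[::step]                      # path on a mesh of size 1/n
    tv = np.abs(np.diff(coarse)).sum()      # sum_j |B_{t_j} - B_{t_{j-1}}|
    print(f"n = {n:7d}   discretized total variation ~ {tv:8.1f}")
# The printed values grow roughly like sqrt(2 n / pi) instead of converging.
```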
The good news: We can simply consider a different type of convergence. Almost sure convergence of a sequence is a pretty strong requirement, and therefore it is quite natural to consider weaker forms of convergence. The most natural choice is clearly convergence in probability (or, alternatively, convergence in $L^2$). It turns out that the Riemann sums converge very nicely in probability, and this allows us to define the stochastic integral for a large class of integrands. Some nice things about this definition:
- Almost sure convergence implies convergence in probability. This means that if we have an integrand $H$ for which the integral $\int_0^t H(r) \, dB_r$ can be defined pathwise as the limit of Riemann sums, then both notions coincide.
- If $(H_t)_{t \geq 0}$ is a continuous adapted process, then $$\sum_{j=1}^n H(t_{j-1}) (B_{t_j}-B_{t_{j-1}}) \stackrel{n \to \infty}{\to} \int_0^t H(r) \, dB_r \quad \text{in probability}$$ for any sequence $\Pi_n = \{0=t_0 < \ldots < t_n = t\}$ of partitions of $[0,t]$ with mesh size converging to $0$. Note that this implies in particular that there exists a subsequence which converges almost surely. (A small numerical illustration of this convergence is sketched after this list.)
- The same construction works for a much larger class of driving processes; for instance we can easily replace $(B_t)_{t \geq 0}$ by a martingale with continuous sample paths.
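As an illustration of the convergence in probability stated in the second bullet point, here is a rough Monte Carlo sketch (my own addition, under the illustrative assumption $H = B$): for the continuous adapted integrand $H_r = B_r$ the Itô integral is known in closed form, $\int_0^1 B_r \, dB_r = (B_1^2-1)/2$, so we can watch $\mathbb{P}(|\text{Riemann sum} - \text{integral}| > \varepsilon)$ shrink as the mesh is refined.

```python
import numpy as np

rng = np.random.default_rng(2)

# Many Brownian paths on a fine grid of [0, 1].
N, n_paths = 2**14, 1000
dB = rng.normal(0.0, np.sqrt(1.0 / N), (n_paths, N))
B = np.concatenate((np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)), axis=1)

# Closed-form value of int_0^1 B_r dB_r for each path.
exact = 0.5 * (B[:, -1] ** 2 - 1.0)

for n in [2**k for k in range(2, 15, 4)]:
    step = N // n
    coarse = B[:, ::step]                    # path on a partition with mesh 1/n
    # Left-endpoint Riemann sums, one per path.
    riemann = np.sum(coarse[:, :-1] * np.diff(coarse, axis=1), axis=1)
    err = np.abs(riemann - exact)
    # Convergence in probability: this fraction should tend to 0.
    print(f"n = {n:6d}   P(|error| > 0.05) ~ {np.mean(err > 0.05):.3f}")
```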
I hope that this answer gave you some insight into why
... Riemann sums pop up in the Itô theory although the Itô integral is not defined as a Riemann integral (namely, because we want to have $(2)$ for simple functions)
... there is nothing magic about it. Pointwise convergence does not work, but by considering a weaker form of convergence we can get rid of this problem and obtain a nice stochastic calculus for a large class of integrands.