This is essentially because $S^2$ is a $C^1$ manifold, and as such admits $C^1$ partitions of unity. Indeed, if $F:S^2\to \mathbb{R}^3$ is a continuous function, we can take an open cover $\mathcal{U}$ of $S^2$ such that on each element of the cover the value of $F$ does not vary too much; say, on each $U\in\mathcal{U}$, $F$ takes values that are within $\delta$ of some $F_U\in\mathbb{R}^3$. Then, taking a $C^1$ partition of unity $\lambda_\bullet:\mathcal{U}\to C^1(S^2;[0,1])$ subordinate to $\mathcal{U}$, we have that $\sum_{U\in\mathcal{U}} F_U\lambda_U$ is a $C^1$ function that is within $\epsilon=\epsilon(\delta)$ of $F$.
Since $S^2$ is compact we may take $\mathcal{U}$ to be finite, so that the sum is finite and the dependency of $\epsilon$ on $\delta$ is explicit, but compactness is not necessary. Indeed we have the following more general density result:
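The estimate behind the last claim can be spelled out as a short computation. This is only a sketch under assumptions not stated explicitly above: "within $\delta$" is read as $|F(y)-F_U|\leq\delta$ for all $y\in U$ in the Euclidean norm of $\mathbb{R}^3$, and the partition of unity is taken to be locally finite with $\lambda_U\geq0$, $\operatorname{supp}\lambda_U\subseteq U$ and $\sum_{U}\lambda_U\equiv1$. For a fixed $x\in S^2$:

$$
\begin{aligned}
\Bigl|F(x)-\sum_{U\in\mathcal{U}}F_U\,\lambda_U(x)\Bigr|
&=\Bigl|\sum_{U\in\mathcal{U}}\lambda_U(x)\bigl(F(x)-F_U\bigr)\Bigr|
&&\text{since }\textstyle\sum_U\lambda_U(x)=1\\
&\leq\sum_{U\ni x}\lambda_U(x)\,\bigl|F(x)-F_U\bigr|
&&\text{since }\lambda_U(x)=0\text{ unless }x\in U\\
&\leq\delta\sum_{U\ni x}\lambda_U(x)=\delta.
\end{aligned}
$$

Under these assumptions one may therefore take $\epsilon(\delta)=\delta$.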
Theorem: Let $r\in\mathbb{Z}_{\geq1}\cup\{\infty\}$ and let $M$ be a $C^r$ manifold (of finite dimension ($\dagger$), but not necessarily compact). Then $C^r(M;\mathbb{R}^d)$ is dense in $C^0(M;\mathbb{R}^d)$, when the latter is endowed with the strong topology.
Recall that when $M$ is compact (as $S^2$ is), the strong ( = fine = Whitney) topology coincides with the weak ( = compact-open = topology of uniform convergence on compact subsets) topology.
(See "Equivalence of Definitions of Hirsch and Wall of Strong $C^r$-topologies" for the definition of the strong topology, and "Hirsch's Differential Topology vs Rudin Functional analysis definition of weak and strong topology" for the definition of the weak topology.)
This is Theorem 2.2 in Hirsch's book Differential Topology, p.44; see there for details. One can, for example, also replace the target with an arbitrary manifold, and replace $0$ with a positive integer (in which case one would need convolutions).
($\dagger$) There is also an infinite-dimensional version of this, assuming the manifold is separable and the local model is a Banach space with a certain "$C^r$ extension property" (a Banach space $B$ has the $C^r$ extension property if there is a $C^r$ function $\Phi:B\to \mathbb{R}$ (with finite $C^r$ norm) that vanishes outside the unit ball centered at $0$ and is $1$ on a smaller ball centered at $0$). For this see Ruelle's Elements of Differentiable Dynamics and Bifurcation Theory, p.144, which is a special case of a theorem from Bonic & Frampton's paper "Smooth Functions on Banach Manifolds".