For your first question, whether a function is "scalar-valued" or not doesn't depend on the coordinate system of the domain. Any function that evaluates to a value in the underlying field, in this case $\mathbb{R}$, is scalar-valued. Unless explicitly stated otherwise though, people usually assume Cartesian coordinates with the standard basis vectors when they are referencing $\mathbb{R}^n$.
Your second question takes a little more work to answer but the short answer is, yes, as it is typically defined, the gradient is specified in terms of Cartesian coordinates but there is a much better approach. I have recently been working on material that is directly related to this. See, for example, this question I recently posed. The point there was the following:
If $f:X\subset \mathbb{R}^n \rightarrow \mathbb{R}$, then the derivative of $f$ at $x_0$, $df_{x_0}$, is a linear function $df_{x_0}:\mathbb{R}^n \rightarrow \mathbb{R}$. By the Riesz representation theorem there exists a unique vector in $\mathbb{R}^n$, which we denote by $\nabla f(x_0)$, that satisfies
$$
df_{x_0}(v) = g(v, \nabla f(x_0))
$$
for every $v \in \mathbb{R}^n$, where $g$ is an inner product on $\mathbb{R}^n$. Note that this definition is free of coordinates but does require the existence of an inner product (which may or may not be the standard one). See, for example, [AMANN, p. 160] or [FRANKEL, p. 46] for a discussion of this perspective.
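As a quick sanity check of this defining relation, one can compare a finite-difference approximation of $df_{x_0}(v)$ against $g(v, \nabla f(x_0))$ for a concrete test function; the function $f(x,y) = x^2 + y$ and the sample point below are arbitrary choices of mine, with $g$ the standard dot product:

```python
# Sanity check of df_{x0}(v) = g(v, grad f(x0)) for the (hypothetical)
# test function f(x, y) = x^2 + y, with g the standard dot product.
def f(x, y):
    return x**2 + y

def directional_derivative(f, x0, v, h=1e-6):
    # Central-difference approximation of df_{x0}(v)
    (x, y), (vx, vy) = x0, v
    return (f(x + h * vx, y + h * vy) - f(x - h * vx, y - h * vy)) / (2 * h)

x0 = (1.0, 2.0)
grad = (2 * x0[0], 1.0)                  # grad f = (2x, 1)
v = (0.3, -0.7)
lhs = directional_derivative(f, x0, v)   # df_{x0}(v)
rhs = grad[0] * v[0] + grad[1] * v[1]    # g(v, grad f(x0))
```

The two sides agree to within the finite-difference error.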
Now, it will turn out that if you do use standard Cartesian coordinate vectors then you can recover the "typical" definition of the gradient from this one. To see this though, and to see where the expression for the gradient in spherical coordinates that you provided in your question comes from, requires us to dig deeper.
Now, it can be shown that
$$
\nabla f(x_0) = (g^{1k} \partial_k f(x_0), \dots, g^{nk} \partial_k f(x_0))
$$
where $g^{ij}$ denotes the $i,j$ entry of the inverse of the matrix $G = [g_{ij}]$ (and repeated indices are summed over). I'll refer you again to my previous question for the details of this statement. So, this expression gives us a concrete way to actually calculate the gradient, but in order to do so we will need to figure out how to compute the matrix $G$.
For a (tractable) example, let us consider polar coordinates. They are related to Cartesian coordinates by the well-known formulae $x = r \cos (\theta)$ and $y = r \sin (\theta)$. It can be shown that the matrix $G$ is determined by the relation $G = J^TJ$ where $J$ is the Jacobian of the transformation in question. See [KAY, p. 54] for a reference to this fact.
In the case of polar coordinates, the transformation is given by
$$
T(r,\theta)= (r \cos (\theta), r \sin (\theta))
$$
The Jacobian of the transformation then is
$$
J = \begin{pmatrix}
\cos (\theta) & -r \sin (\theta) \\
\sin (\theta) & r \cos(\theta)
\end{pmatrix}
$$
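If you'd rather not trust the hand computation, the Jacobian can be checked numerically with central differences; a minimal sketch (the sample point $(r_0, \theta_0)$ is an arbitrary choice):

```python
import math

def T(r, th):
    # Polar-to-Cartesian transformation T(r, theta) = (r cos(theta), r sin(theta))
    return (r * math.cos(th), r * math.sin(th))

def jacobian(T, r, th, h=1e-6):
    # 2x2 Jacobian of T at (r, theta), columns approximated by central differences
    col_r  = [(a - b) / (2 * h) for a, b in zip(T(r + h, th), T(r - h, th))]
    col_th = [(a - b) / (2 * h) for a, b in zip(T(r, th + h), T(r, th - h))]
    return [[col_r[0], col_th[0]], [col_r[1], col_th[1]]]

r0, th0 = 2.0, 0.7
J = jacobian(T, r0, th0)
J_exact = [[math.cos(th0), -r0 * math.sin(th0)],
           [math.sin(th0),  r0 * math.cos(th0)]]
err = max(abs(J[i][j] - J_exact[i][j]) for i in range(2) for j in range(2))
```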
After working through the details we find then that
$$
G = J^TJ = \begin{pmatrix}
1 & 0 \\
0 & r^2
\end{pmatrix}
$$
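Here is the same computation done numerically, confirming that $G = J^TJ$ comes out to $\mathrm{diag}(1, r^2)$ at a sample point (the point is an arbitrary choice):

```python
import math

r0, th0 = 2.0, 0.7   # arbitrary sample point
J = [[math.cos(th0), -r0 * math.sin(th0)],
     [math.sin(th0),  r0 * math.cos(th0)]]

# G = J^T J, i.e. G[i][j] = sum_k J[k][i] * J[k][j]
G = [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
# Expect G = [[1, 0], [0, r0**2]] up to rounding
```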
Therefore,
$$
G^{-1} = \begin{pmatrix}
1 & 0 \\
0 & \frac{1}{r^2}
\end{pmatrix}
$$
From this matrix we can read off the $g^{ij}$ components, from which it follows that
$$
\nabla f(x_0) = (\partial_r f(x_0), \frac{1}{r^2} \partial_{\theta} f(x_0))
$$
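One can check numerically that these components really do satisfy the defining relation $df_{x_0}(v) = g(v, \nabla f(x_0))$ with $G = \mathrm{diag}(1, r^2)$; a sketch using a hypothetical test function $f(r,\theta) = r^2 \cos(\theta)$ of my own choosing:

```python
import math

# Hypothetical test function in polar coordinates: f(r, theta) = r^2 cos(theta)
r0, th0 = 1.5, 0.4
df_dr  = 2 * r0 * math.cos(th0)       # partial_r f
df_dth = -r0**2 * math.sin(th0)       # partial_theta f

grad = (df_dr, df_dth / r0**2)        # gradient components from the G^{-1} formula
G = [[1.0, 0.0], [0.0, r0**2]]        # metric matrix in polar coordinates

# df applied to the coordinate direction v = (0, 1) is just partial_theta f,
# and it must equal g(v, grad f) = G[1][1] * grad[1].
lhs = df_dth
rhs = G[1][1] * grad[1]
```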
But we are still not done. This expression does not agree with the one you will usually encounter, which is
$$
\nabla f(x_0) = (\partial_r f(x_0), \frac{1}{r} \partial_{\theta} f(x_0))
$$
which differs from ours by a factor of $r$ in the second component. So, what's going on here? First, note that since we are working in $r$-$\theta$ coordinates, the gradient vector is expressed relative to the $r$-$\theta$ basis. Our component-wise notation is obscuring this fact. So what we actually have is
$$
\nabla f(x_0) = \partial_r f(x_0) e_r + \frac{1}{r^2} \partial_{\theta}f(x_0)e_{\theta}
$$
where $e_r$ and $e_{\theta}$ denote the $r$-$\theta$ basis vectors. So, what are they? Well, you can always use geometry to figure this out but, since I'm really lousy at geometry, I like to think of them as being defined analytically as tangent vectors to the coordinate curves. See [KOKS, p. 298] for a discussion of this perspective. To determine them we just differentiate our transformation $T$ with respect to $r$ and $\theta$, respectively. Thus
$$
e_r = \partial_r T = (\cos (\theta), \sin(\theta))
$$
and
$$
e_{\theta} = \partial_{\theta}T = r(-\sin( \theta), \cos (\theta))
$$
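Before going further, it's worth checking the lengths of these two vectors numerically at a sample point (the point $(r_0, \theta_0)$ below is an arbitrary choice):

```python
import math

r0, th0 = 2.0, 1.1                                 # arbitrary sample point
e_r  = (math.cos(th0), math.sin(th0))              # partial_r T
e_th = (-r0 * math.sin(th0), r0 * math.cos(th0))   # partial_theta T

norm_e_r  = math.hypot(*e_r)    # expect 1
norm_e_th = math.hypot(*e_th)   # expect r0
```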
Note, though, that while $e_r = \hat{e_r}$ is a unit vector, $e_{\theta}$ is not. Since $r \geq 0$, a quick computation shows that $|e_{\theta}| = r$, so $e_{\theta} = r\hat{e_{\theta}}$. Therefore, the gradient with respect to the unit basis vectors is given by
$$
\nabla f(x_0) = \partial_r f(x_0) \hat{e_r} + \frac{1}{r} \partial_{\theta}f(x_0)\hat{e_\theta}
$$
and we thus have agreement with the common expression for the gradient in polar coordinates.
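As a final consistency check, the polar expression can be reassembled into Cartesian components for a hypothetical test function of my own choosing, say $f(x,y) = x$, i.e. $f = r\cos(\theta)$ in polar coordinates, whose Cartesian gradient is of course $(1, 0)$:

```python
import math

# Hypothetical check: f(x, y) = x, i.e. f(r, theta) = r cos(theta) in polar coordinates
r0, th0 = 1.7, 0.9   # arbitrary sample point

# Polar gradient components against the unit vectors e_r_hat, e_theta_hat:
df_dr  = math.cos(th0)            # partial_r f
df_dth = -r0 * math.sin(th0)      # partial_theta f
comp_r, comp_th = df_dr, df_dth / r0

# Unit basis vectors at (r0, th0):
e_r_hat  = (math.cos(th0), math.sin(th0))
e_th_hat = (-math.sin(th0), math.cos(th0))

# Reassemble the gradient as a Cartesian vector; for f(x, y) = x it
# should come out to (1, 0) up to rounding.
grad = tuple(comp_r * a + comp_th * b for a, b in zip(e_r_hat, e_th_hat))
```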
References:
[AMANN] Amann and Escher, Analysis II
[FRANKEL] Frankel, The Geometry of Physics
[KAY] Kay, Schaum's Outline of Tensor Calculus
[KOKS] Koks, Explorations in Mathematical Physics