Background info: If $F:\mathbb R^n \to \mathbb R^m$ is differentiable at $x$, then $F'(x)$ is an $m \times n$ matrix which satisfies
$$
\tag{1} \underbrace{F(x + \Delta x)}_{m \times 1} \approx \underbrace{F(x)}_{m \times 1} + \underbrace{F'(x)}_{m \times n} \underbrace{\Delta x}_{n \times 1}.
$$
The approximation is good when $\Delta x$ is small. The local linear approximation (1) is sometimes called "Newton's approximation", and it is the basic idea at the heart of differential calculus. Most formulas of calculus can be derived easily just by applying Newton's approximation.
In the special case that $F:\mathbb R^n \to \mathbb R$, $F'(x)$ is a $1 \times n$ matrix (a row vector). Often we use the convention that the gradient of $F$ at $x$ is a column vector, so that
$$
\nabla F(x) = F'(x)^T.
$$
For $F(x) = \|x \|_2^2$, if you don't want to compute the partial derivatives of $F$ (which would be easy in this case), you could think directly in terms of Newton's approximation. With this choice of $F$, we have
\begin{align}
F(x + \Delta x) &= \|x + \Delta x \|_2^2 \\
&= \| x \|_2^2 + 2 x^T \Delta x + \| \Delta x \|_2^2 \\
&\approx F(x) + 2 x^T \Delta x.
\end{align}
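In the last step we dropped the term $\| \Delta x \|_2^2$, which is negligible compared with the linear term $2 x^T \Delta x$ when $\Delta x$ is small. Comparing with Newton's approximation, we discover that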
$$
F'(x) = 2 x^T.
$$
If we use the convention that $\nabla F(x)$ is a column vector, then
$$
\nabla F(x) = F'(x)^T = 2x.
$$
This is the result we would expect or guess based on what we know about single-variable calculus. (In single-variable calculus, if $F(x) = x^2$, then $F'(x) = 2x$.)
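If you want a quick numerical sanity check of this result, here is a minimal Python sketch. It compares $F(x + \Delta x)$ with $F(x) + 2 x^T \Delta x$ for shrinking steps $\Delta x = t d$. The helper names `F` and `newton_approx`, the test point `x`, and the direction `direction` are chosen purely for illustration and are not part of the derivation above.

```python
# Numerical check of Newton's approximation for F(x) = ||x||_2^2.
# All names and test values here are illustrative assumptions.

def F(x):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(xi * xi for xi in x)

def newton_approx(x, dx):
    """Right-hand side of (1) with F'(x) = 2 x^T, i.e. F(x) + 2 x^T dx."""
    return F(x) + 2.0 * sum(xi * di for xi, di in zip(x, dx))

x = [1.0, -2.0, 3.0]
direction = [0.5, 1.0, -1.0]

errors = []
for t in (1e-1, 1e-2, 1e-3):
    dx = [t * d for d in direction]
    x_plus = [xi + di for xi, di in zip(x, dx)]
    # The gap between F(x + dx) and the linear approximation is exactly
    # the dropped term ||dx||^2, so it should shrink like t^2.
    errors.append(abs(F(x_plus) - newton_approx(x, dx)))

print(errors)
```

Each time $t$ shrinks by a factor of 10, the error should shrink by roughly a factor of 100, which is what you expect when the only neglected term is $\| \Delta x \|_2^2$.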
To compute the gradient of the function $F(x) = \| Ax - b \|_2^2$, I recommend using the chain rule, as I explained here: https://math.stackexchange.com/a/3508376/40119
This makes the calculation simple and elegant.
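For convenience, here is a sketch of how that chain rule calculation might go. It assumes $A$ is an $m \times n$ matrix and $b \in \mathbb R^m$. The names $G$ and $H$ are introduced here only for illustration and may not match the notation in the linked answer. Write $F(x) = G(H(x))$, where
$$
H(x) = Ax - b, \qquad G(y) = \| y \|_2^2.
$$
Then $H'(x) = A$, and from the calculation above $G'(y) = 2 y^T$, so the chain rule gives
$$
F'(x) = G'(H(x)) \, H'(x) = 2 (Ax - b)^T A.
$$
With the convention that the gradient is a column vector,
$$
\nabla F(x) = F'(x)^T = 2 A^T (Ax - b).
$$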