I'm trying to derive the gradient in polar coordinates using the chain rule.
The idea is that when we take a function $f(x,y)$ and switch to polar coordinates, we're really composing $f$ with $P(r,\theta) = (r\cos(\theta),r\sin(\theta))$. So the gradient of $f$ in polar coordinates should just be $\nabla (f\circ P)(r,\theta)$.
From my last question I know that $$\begin{aligned}\nabla(f\circ P)(r,\theta) &= \begin{bmatrix}(\partial_1f\circ P)(r,\theta) & (\partial_2f\circ P)(r,\theta)\end{bmatrix}\begin{bmatrix}\partial_1P_1(r,\theta) & \partial_2P_1(r,\theta) \\ \partial_1P_2(r,\theta) & \partial_2P_2(r,\theta)\end{bmatrix} \\ &= \begin{bmatrix} (\partial_1f\circ P)(r,\theta)\cdot\partial_1P_1(r,\theta) + (\partial_2f\circ P)(r,\theta)\cdot\partial_1P_2(r,\theta) \\ (\partial_1f\circ P)(r,\theta)\cdot\partial_2P_1(r,\theta) +(\partial_2f\circ P)(r,\theta)\cdot\partial_2P_2(r,\theta)\end{bmatrix}^T\end{aligned}$$
But I also know that $$\begin{aligned}\partial_1(f\circ P)(r,\theta) &= (\partial_1f\circ P)(r,\theta)\cdot\partial_1P_1(r,\theta) + (\partial_2f\circ P)(r,\theta)\cdot\partial_1P_2(r,\theta) \\ \partial_2(f\circ P)(r,\theta) &= (\partial_1f\circ P)(r,\theta)\cdot\partial_2P_1(r,\theta) +(\partial_2f\circ P)(r,\theta)\cdot\partial_2P_2(r,\theta)\end{aligned}$$
by the regular chain rule.
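To convince myself that the two chain-rule expressions above really are the partials of $f\circ P$, here is a minimal numerical sketch. The test function `f`, its hand-computed partials `grad_f`, and the helper names are my own illustrative assumptions; any smooth $f$ should behave the same way.

```python
import math

def P(r, theta):
    """Polar-to-Cartesian map P(r, theta) = (r cos(theta), r sin(theta))."""
    return (r * math.cos(theta), r * math.sin(theta))

def f(x, y):
    """Hypothetical test function, chosen only for illustration."""
    return x * x * y + math.sin(y)

def grad_f(x, y):
    """Hand-computed Cartesian partials (d1 f, d2 f) of the test function."""
    return (2.0 * x * y, x * x + math.cos(y))

def composite_partials_fd(r, theta, h=1e-6):
    """Central-difference estimates of d1(f o P) and d2(f o P)."""
    def g(r_, t_):
        return f(*P(r_, t_))
    d_r = (g(r + h, theta) - g(r - h, theta)) / (2.0 * h)
    d_theta = (g(r, theta + h) - g(r, theta - h)) / (2.0 * h)
    return (d_r, d_theta)

def composite_partials_chain(r, theta):
    """Row vector (d1 f o P, d2 f o P) multiplied by the Jacobian of P."""
    fx, fy = grad_f(*P(r, theta))
    c, s = math.cos(theta), math.sin(theta)
    d_r = fx * c + fy * s                   # uses first column (cos, sin)
    d_theta = fx * (-r * s) + fy * (r * c)  # uses second column (-r sin, r cos)
    return (d_r, d_theta)

if __name__ == "__main__":
    print(composite_partials_fd(2.0, 0.7))
    print(composite_partials_chain(2.0, 0.7))
```

The two printed pairs should agree up to finite-difference error, which is all this check claims.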
So, putting that together I get $$\nabla(f\circ P)(r,\theta) = \begin{bmatrix}\partial_1(f\circ P)(r,\theta) & \partial_2(f\circ P)(r,\theta)\end{bmatrix}$$
i.e. $$\nabla(f\circ P) = \frac{\partial (f\circ P)}{\partial r}\mathbf {\hat r} + \frac{\partial (f\circ P)}{\partial \theta}\mathbf {\hat \theta}$$
If I were to then write this out in more traditional notation, where $f$ and $f\circ P$ are not distinguished, it should look like $$\nabla f = \frac{\partial f}{\partial r}\mathbf {\hat r} + \frac{\partial f}{\partial \theta}\mathbf {\hat \theta}$$
But comparing this to the correct formula for the gradient in polar coordinates, which for reference is usually written as $$\nabla f = \frac{\partial f}{\partial r}\mathbf {\hat r} + \frac 1r\frac{\partial f}{\partial \theta}\mathbf {\hat \theta},$$ I see that I'm missing a factor of $\frac 1r$ on the second term. Where does that come from?
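As a sanity check on which formula is right, here is a rough sketch that takes the ordinary Cartesian gradient, resolves it along $\mathbf{\hat r} = (\cos\theta, \sin\theta)$ and $\mathbf{\hat\theta} = (-\sin\theta, \cos\theta)$, and compares the result with both candidate coefficient pairs. The test function `f` and all helper names are assumptions made up for this illustration.

```python
import math

def f(x, y):
    """Hypothetical test function, chosen only for illustration."""
    return x * x * y + math.sin(y)

def grad_f(x, y):
    """Hand-computed Cartesian partials of the test function."""
    return (2.0 * x * y, x * x + math.cos(y))

def polar_partials(r, theta, h=1e-6):
    """Central-difference estimates of the r- and theta-partials of f o P."""
    def g(r_, t_):
        return f(r_ * math.cos(t_), r_ * math.sin(t_))
    d_r = (g(r + h, theta) - g(r - h, theta)) / (2.0 * h)
    d_theta = (g(r, theta + h) - g(r, theta - h)) / (2.0 * h)
    return (d_r, d_theta)

def cartesian_gradient_in_polar_basis(r, theta):
    """Components of the Cartesian gradient along r-hat and theta-hat."""
    c, s = math.cos(theta), math.sin(theta)
    fx, fy = grad_f(r * c, r * s)
    along_r = fx * c + fy * s          # dot product with r-hat
    along_theta = -fx * s + fy * c     # dot product with theta-hat
    return (along_r, along_theta)

def compare(r, theta):
    """Return (error of my formula, error of the 1/r formula)."""
    d_r, d_theta = polar_partials(r, theta)
    target = cartesian_gradient_in_polar_basis(r, theta)
    def err(v):
        return max(abs(v[0] - target[0]), abs(v[1] - target[1]))
    naive = (d_r, d_theta)        # coefficients without the 1/r
    scaled = (d_r, d_theta / r)   # coefficients with the 1/r
    return (err(naive), err(scaled))

if __name__ == "__main__":
    print(compare(2.0, 0.7))
```

At a point with $r \neq 1$ only the pair with the $\frac 1r$ matches the Cartesian gradient; at $r = 1$ the two candidates coincide, so that point can't tell them apart.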
Edit: BTW, I noticed something interesting, though it may have nothing to do with the problem I'm having. The Jacobian of $P$, $$\begin{bmatrix}\partial_1P_1(r,\theta) & \partial_2P_1(r,\theta) \\ \partial_1P_2(r,\theta) & \partial_2P_2(r,\theta)\end{bmatrix} = \begin{bmatrix}\cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta)\end{bmatrix},$$ would be an orthogonal matrix (in fact, a rotation) if we multiplied the second column by $\frac 1r$. And that's exactly the column that goes into $\partial_2(f\circ P)(r,\theta)$. So if I normalized that column, my formula would somehow come out correctly with the $\frac 1r$. But I see no reason why I should do that. Is this just a weird coincidence?
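The observation about the columns can be checked numerically. This is only a sketch with helper names I made up (`jacobian_P`, `column_norms`, `scale_second_column`, `gram`, `det2`): it computes the column lengths of the Jacobian and tests whether the rescaled matrix $Q$ satisfies $Q^TQ = I$ with determinant $1$.

```python
import math

def jacobian_P(r, theta):
    """Jacobian of P(r, theta) = (r cos(theta), r sin(theta)), as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -r * s],
            [s, r * c]]

def column_norms(J):
    """Euclidean lengths of the two columns of a 2x2 matrix."""
    return tuple(math.hypot(J[0][j], J[1][j]) for j in range(2))

def scale_second_column(J, factor):
    """Copy of J with its second column multiplied by factor."""
    return [[J[0][0], J[0][1] * factor],
            [J[1][0], J[1][1] * factor]]

def gram(J):
    """The product J^T J; it equals the identity iff J is orthogonal."""
    return [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(J):
    """Determinant of a 2x2 matrix."""
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]

if __name__ == "__main__":
    r, theta = 2.0, 0.7
    J = jacobian_P(r, theta)
    Q = scale_second_column(J, 1.0 / r)
    print(column_norms(J))
    print(gram(Q), det2(Q))
```

The first column has length $1$ and the second has length $r$, so dividing the second column by $r$ is the same as normalizing it.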