I understand that the $\theta$ basis vector has a length proportional to the radius.
A given change in $\theta$ traces a longer arc the farther the point is from the center, so the same $\Delta\theta$ has a bigger impact on the change of the function the larger the current $r$ is.
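The arc-length effect described above can be written out explicitly. This assumes the usual map from polar to Cartesian coordinates, $(r,\theta)\mapsto(r\cos\theta,\ r\sin\theta)$:

$$\mathbf{x}(r,\theta) = (r\cos\theta,\ r\sin\theta),\qquad \frac{\partial\mathbf{x}}{\partial r} = (\cos\theta,\ \sin\theta) = \hat{r},\qquad \frac{\partial\mathbf{x}}{\partial\theta} = (-r\sin\theta,\ r\cos\theta) = r\,\hat{\theta}$$

So a step $\Delta\theta$ at fixed $r$ moves the point a distance $\Delta s \approx r\,\Delta\theta$, while a step $\Delta r$ moves it a distance $\Delta r$.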
What I don't understand intuitively is why this behaviour shouldn't propagate to the gradient as well.
The correct formula for the gradient in polar coordinates, $$\nabla f = \frac{\partial f}{\partial r}\hat{r} + \frac{1}{r}\frac{\partial f}{\partial \theta}\hat{\theta}$$ explicitly compensates for this behaviour through the factor $\frac{1}{r}$.
But why isn't $$\nabla f = \frac{\partial f}{\partial r}\hat{r} + \frac{\partial f}{\partial \theta}\hat{\theta}$$ the "direction and rate of fastest increase" in polar coordinates?
I've seen the derivations and understand why they are correct, but intuitively the incorrect formula still seems more natural to me as the "direction and rate of fastest increase" in polar coordinates.
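One way to test the two candidate formulas against the literal meaning of "direction and rate of fastest increase" is a brute-force numerical check. The sketch below is only illustrative. It assumes a hypothetical test function $f(x,y) = x + y$ evaluated at the point $r = 2,\ \theta = 0$. It then tries many unit directions in the plane and keeps the one along which $f$ grows fastest per unit distance. Finally, it compares that direction with the vectors produced by the $\frac{1}{r}$ formula and by the formula without it. The helper names (`f_cart`, `partial`, `steepest_ascent`, ...) are invented for this example.

```python
import math

# Hypothetical test function (not from the question): f(x, y) = x + y
def f_cart(x, y):
    return x + y

# The same function expressed in polar coordinates
def f_polar(r, theta):
    return f_cart(r * math.cos(theta), r * math.sin(theta))

def partial(g, a, b, which, h=1e-6):
    # Central finite difference in the first (which=0) or second (which=1) argument
    if which == 0:
        return (g(a + h, b) - g(a - h, b)) / (2 * h)
    return (g(a, b + h) - g(a, b - h)) / (2 * h)

def to_cartesian(v_r, v_theta, theta):
    # Convert components along the unit vectors r_hat, theta_hat into (x, y) components
    r_hat = (math.cos(theta), math.sin(theta))
    theta_hat = (-math.sin(theta), math.cos(theta))
    return (v_r * r_hat[0] + v_theta * theta_hat[0],
            v_r * r_hat[1] + v_theta * theta_hat[1])

def steepest_ascent(x, y, n=3600, step=1e-6):
    # Brute force: try n unit directions u and return the largest rate of change
    # of f per unit distance, together with the direction that achieves it
    best_rate, best_dir = -math.inf, None
    for k in range(n):
        phi = 2 * math.pi * k / n
        u = (math.cos(phi), math.sin(phi))
        rate = (f_cart(x + step * u[0], y + step * u[1])
                - f_cart(x - step * u[0], y - step * u[1])) / (2 * step)
        if rate > best_rate:
            best_rate, best_dir = rate, u
    return best_rate, best_dir

r, theta = 2.0, 0.0
x, y = r * math.cos(theta), r * math.sin(theta)
df_dr = partial(f_polar, r, theta, 0)
df_dtheta = partial(f_polar, r, theta, 1)

correct = to_cartesian(df_dr, df_dtheta / r, theta)  # formula with the 1/r factor
naive = to_cartesian(df_dr, df_dtheta, theta)        # formula without it
rate, direction = steepest_ascent(x, y)

print("with 1/r:   ", correct, "length", math.hypot(*correct))
print("without 1/r:", naive, "length", math.hypot(*naive))
print("steepest:   ", direction, "rate", rate)
```

For this choice of $f$, $\frac{\partial f}{\partial r} = 1$ and $\frac{\partial f}{\partial \theta} = 2$ at the chosen point. The vector with the $\frac{1}{r}$ factor comes out close to $(1, 1)$. Its length (about $1.414$) matches the brute-force maximum rate, and its direction matches the diagonal. The vector without the factor comes out close to $(1, 2)$, so it both points elsewhere and overstates the rate. Here $\frac{\partial f}{\partial \theta}$ measures change per radian rather than per unit of distance.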