The most helpful answer I could find was Jonathan's answer here, and I decided not to comment and bump a literally 6 years old thread for clarification on the explanation, and also because my question, I believe, is not a direct duplicate, because it asks for a question following this, and because it can't be contained by the word limit.
His explanation, in "Why is gradient the direction of steepest ascent?" is this:
Consider a Taylor expansion of this function, $$f({\bf r}+{\bf\delta r})=f({\bf r})+(\nabla f)\cdot{\bf\delta r}+\ldots$$ The linear correction term $(\nabla f)\cdot{\bf\delta r}$ is maximized when ${\bf\delta r}$ is in the direction of $\nabla f$.
I find this is a very graceful answer, but I have one confusion.
For the linear term, it is implied that the derivative for $f(\mathbf r + \mathbf{\delta r})$ is $\mathbf{\nabla f}$, and I can't seem to figure out why this is. Also it seems clear that a Taylor series for a vector-valued function seems to replace multiplication by what seems like its vector analog, the dot product, for which I have no reliable, rigorous understanding as to why other than it being sort of what I would assume it would be if one created a Taylor series for a vector-valued function.