$
\renewcommand\vec\mathbf
\newcommand\R{\mathbb R}
\newcommand\PD[2]{\frac{\partial#1}{\partial#2}}
\newcommand\tPD[2]{\partial#1/\partial#2}
\newcommand\diff{\mathrm D}
\newcommand\dd{\mathrm d}
$See this answer of mine where I define $\nabla$ in arbitrary expressions rigorously. I'll give a short version of that exposition here and then explain what $(\vec v\cdot\nabla)\vec v$ means.
Let's just think about the gradient of some $f : \R^n \to \R$ for the moment. In this context you know that given Cartesian coordinates $(x^i)_{i=1}^n$ and the standard basis $\{\vec e_i\}_{i=1}^n$ we formally have
$$
\nabla = \sum_{i=1}^n\vec e_i\PD{}{x^i}
$$
in the sense that $\nabla f = \sum_i\vec e_i\tPD f{x^i}$. However, the gradient is an inherently coordinate-free concept: we define $\nabla f(\vec x)$ as the unique vector such that
$$
(\nabla f(\vec x))\cdot\vec w = \diff f_{\vec x}(\vec w)
$$
for all $\vec w \in \R^n$. The linear function $\diff f_{\vec x} : \R^n \to \R$ is the total differential of $f$ at $\vec x$, making $\diff f_{\vec x}(\vec w)$ the directional derivative of $f$ at $\vec x$ in the $\vec w$ direction. We can discover from this definition that if $(x^i)_{i=1}^n$ are any coordinates with corresponding basis $\{\vec e_i(\vec x)\}_{i=1}^n$ (which depends on a chosen point $\vec x$) then $\nabla$ can be given the form
$$
\nabla = \sum_{i=1}^n\vec e^i(\vec x)\PD{}{x^i}
$$
where $\vec e^i(\vec x)$ is the reciprocal basis uniquely defined by
$$
\vec e^i(\vec x)\cdot\vec e_j(\vec x) = \delta^i_j.
$$
As is common practice we will drop the $\vec x$-dependence and simply write $\vec e_i$ and $\vec e^i$. Note that this has nothing to do with the way the point $\vec x$ is expressed: we could express $\nabla$ using one set of coordinates and $\vec x$ using a completely different set of coordinates and the above expression for $\nabla$ would still be valid. We would just have to take into account the relationship between these coordinates via the chain rule (and the point-dependence of any basis vector once we extend $\nabla$ to act on vectors).
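As a concrete illustration (a sketch of my own, assuming the "corresponding basis" means the coordinate basis $\vec e_i = \tPD{\vec x}{x^i}$), take polar coordinates $(r, \theta)$ on $\R^2$ away from the origin, with $\vec x = (r\cos\theta, r\sin\theta)$. Then
$$
\vec e_r = (\cos\theta, \sin\theta),\quad
\vec e_\theta = r(-\sin\theta, \cos\theta),\quad
\vec e^r = \vec e_r,\quad
\vec e^\theta = \frac1{r^2}\vec e_\theta,
$$
and one can check directly that $\vec e^i\cdot\vec e_j = \delta^i_j$. Writing $\hat{\vec r} = \vec e_r$ and $\hat{\boldsymbol\theta} = \vec e_\theta/r$ for the unit vectors, this recovers the familiar polar form
$$
\nabla = \vec e^r\PD{}{r} + \vec e^\theta\PD{}{\theta} = \hat{\vec r}\PD{}{r} + \frac1r\hat{\boldsymbol\theta}\PD{}{\theta}.
$$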
Now let $L_{\vec x} : \R^n \to V$ be a linear function for each $\vec x \in \R^n$ with $V$ some arbitrary real vector space. The following formal manipulation motivates a definition:
$$
L_{\vec x}(\nabla) = L_{\dot{\vec x}}\left(\sum_{i=1}^n\vec e^i\PD{}{\dot x^i}\right) = \sum_{i=1}^n\PD{}{\dot x^i}L_{\dot{\vec x}}(\vec e^i).
$$
The overdots make it clear that we are not differentiating any point-dependence of $\vec e^i$, only that of $\vec x \mapsto L_{\vec x}$. You can convince yourself that this quantity is independent of the coordinates chosen; we will call this the derivative of $L$. Note that $\vec x \mapsto L_{\vec x}(\nabla) : \R^n \to V$.
- When $f : \R^n \to \R$ then the derivative of $L_{\vec x}(\vec w) = \vec w f(\vec x)$ is the gradient of $f$.
- When $\vec f : \R^n \to \R^n$ then the derivative of $L_{\vec x}(\vec w) = \vec w\cdot\vec f(\vec x)$ is the divergence of $\vec f$.
- When $\vec f : \R^3 \to \R^3$ then the derivative of $L_{\vec x}(\vec w) = \vec w\times\vec f(\vec x)$ is the curl of $\vec f$.
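
These three claims can be checked in Cartesian coordinates, where $\vec e^i = \vec e_i$ is constant, by applying the definition $L_{\vec x}(\nabla) = \sum_i\PD{}{x^i}L_{\vec x}(\vec e^i)$ directly. This is only a sketch; the components $f^i$ in $\vec f = \sum_i f^i\vec e_i$ are my notation:
$$
\begin{aligned}
\sum_i\PD{}{x^i}\left[\vec e_i f\right] &= \sum_i\vec e_i\PD f{x^i} = \nabla f,\\
\sum_i\PD{}{x^i}\left[\vec e_i\cdot\vec f\right] &= \sum_i\PD{f^i}{x^i} = \nabla\cdot\vec f,\\
\sum_i\PD{}{x^i}\left[\vec e_i\times\vec f\right] &= \sum_i\vec e_i\times\PD{\vec f}{x^i} = \nabla\times\vec f.
\end{aligned}
$$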
This gives an interpretation of $\nabla$ in any expression where it appears in a "linear slot". We can define multiple uses of $\nabla$ in the same expression as higher derivatives $L_{\vec x}(\nabla, \nabla,\dotsc)$ of a multilinear $L$; for instance the second derivative of $L_{\vec x}(\vec w_1, \vec w_2) = (\vec w_1\cdot\vec w_2)\vec f(\vec x)$ for any $\vec f : \R^n \to V$ is the Laplacian of $\vec f$:
$$
L_{\vec x}(\nabla, \nabla) = \nabla^2\vec f(\vec x).
$$
These higher derivatives make the most sense when partial derivatives commute since then the order of application of each $\nabla$ does not matter.
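A quick Cartesian check of the Laplacian claim, reading the second derivative as the derivative applied once in each slot (my reading of the definition above, not something spelled out there):
$$
L_{\vec x}(\nabla, \nabla)
= \sum_{i,j}\PD{}{x^i}\PD{}{x^j}\left[(\vec e_i\cdot\vec e_j)\vec f(\vec x)\right]
= \sum_{i,j}\delta_{ij}\frac{\partial^2\vec f}{\partial x^i\,\partial x^j}
= \sum_i\frac{\partial^2\vec f}{\partial(x^i)^2}
= \nabla^2\vec f(\vec x).
$$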
We can arrive at the following geometric interpretation of the derivative:
$$
L_{\vec x}(\nabla) = \lim_{R_{\vec x}\to 0}\frac1{|R_{\vec x}|}\oint_{\partial R_{\vec x}}L_{\vec y}(\vec n)\,\dd S = \frac n{|\partial B_n|}\oint_{\partial B_n}\diff L_{\vec x}(\vec y)(\vec y)\,\dd S.
$$
In the first integral the limit is taken over regions $R_{\vec x}$ of non-zero volume containing $\vec x$ "shrinking down" to zero volume, $\vec y$ is the variable of integration, $\vec n = \vec n(\vec y)$ is the outward-pointing unit normal of the boundary $\partial R_{\vec x}$, and $\dd S$ is the scalar surface area measure. In the second integral, $B_n$ is the unit $n$-ball centered at the origin with surface area $|\partial B_n|$ and $\diff L_{\vec x}$ is the differential of $\vec x \mapsto L_{\vec x}$ evaluated at $\vec x$; this is a linear map taking vectors to linear functions so $\diff L_{\vec x}(\vec y) : \R^n \to V$ and finally $\diff L_{\vec x}(\vec y)(\vec y) \in V$. I will abstain from discussing these integrals for the sake of brevity, except to say the last one is exactly $n$ times the average of $\diff L_{\vec x}(\vec y)(\vec y)$ over all directions $\vec y$.
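Here is a sketch of why the second integral agrees with the definition, in Cartesian coordinates. By linearity in both slots, $\diff L_{\vec x}(\vec y)(\vec y) = \sum_{i,j}y^iy^j\,\PD{}{x^i}L_{\vec x}(\vec e_j)$. The standard symmetry fact for the unit sphere is
$$
\frac1{|\partial B_n|}\oint_{\partial B_n}y^iy^j\,\dd S = \frac{\delta^{ij}}n,
$$
since the off-diagonal averages vanish by reflection symmetry while the $n$ diagonal averages are equal and sum to the average of $|\vec y|^2 = 1$. Therefore
$$
\frac n{|\partial B_n|}\oint_{\partial B_n}\diff L_{\vec x}(\vec y)(\vec y)\,\dd S
= n\sum_{i,j}\frac{\delta^{ij}}n\PD{}{x^i}L_{\vec x}(\vec e_j)
= \sum_i\PD{}{x^i}L_{\vec x}(\vec e_i)
= L_{\vec x}(\nabla).
$$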
Now we discuss the particular case $(\vec v\cdot\nabla)\vec v$ where $\vec v = \vec v(\vec x)$ and $\vec v : \R^n \to \R^n$. It is crucial to note that the intent of this notation is to only differentiate the second $\vec v$, or written explicitly $(\vec v\cdot\dot\nabla)\dot{\vec v}$.
First consider some constant $\vec u \in \R^n$ and $(\vec u\cdot\nabla)\vec v$. We formalize this as the derivative of $L_{\vec x}(\vec w) = (\vec u\cdot\vec w)\vec v$; but now notice that if we write $\vec v$ in Cartesian coordinates $\vec v = v^i\vec e_i$ then
$$
L_{\vec x}(\vec w) = \sum_{i=1}^n[\vec u\cdot(\vec w v^i)]\vec e_i.
$$
The derivative of $\vec w \mapsto \vec u\cdot(\vec w v^i)$ is precisely $\vec u\cdot(\nabla v^i)$, which is the derivative of $v^i$ in the $\vec u$ direction. This shows that $L_{\vec x}(\nabla) = (\vec u\cdot\nabla)\vec v$ is precisely the $\vec u$-directional derivative of $\vec v$. In fact
$$
\vec u\cdot\nabla
$$
is always the $\vec u$-directional derivative operator regardless of what it is applied to.
We get the expression $(\vec v\cdot\dot\nabla)\dot{\vec v}$ by simply setting $\vec u = \vec v(\vec x)$; so this expression is the derivative of $\vec v$ in the $\vec v$ direction. In other words, it is the rate of change of $\vec v$ along the direction in which $\vec v$ itself points at a particular $\vec x$. Just like any directional derivative we can write
$$
(\vec v\cdot\nabla)\vec v = \lim_{\epsilon\to0}\frac{\vec v(\vec x + \epsilon\vec v(\vec x))- \vec v(\vec x)}\epsilon = \left.\frac\dd{\dd t}\vec v(\vec x + t\vec v(\vec x))\right|_{t=0}.
$$
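As a small worked example (my own choice of field, not taken from the question), consider the rigid rotation $\vec v(x, y) = (-y, x)$ on $\R^2$, writing the point as $\vec x = (x, y)$. Componentwise,
$$
(\vec v\cdot\nabla)\vec v = \left(-y\PD{}{x} + x\PD{}{y}\right)(-y, x) = (-x, -y).
$$
The limit form gives the same answer: holding the direction $\vec v(\vec x) = (-y, x)$ fixed,
$$
\vec v(\vec x + t\vec v(\vec x)) = \vec v(x - ty,\ y + tx) = (-y - tx,\ x - ty),
$$
whose $t$-derivative at $t = 0$ is $(-x, -y)$. The result points toward the origin, which is the familiar centripetal acceleration of a rotation with unit angular speed.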
Applying the geometric interpretation we get
$$
(\vec v(\vec x)\cdot\nabla)\vec v(\vec x) = \lim_{R_{\vec x}\to 0}\frac1{|R_{\vec x}|}\oint_{\partial R_{\vec x}}(\vec v(\vec x)\cdot\vec n(\vec y))\vec v(\vec y)\,\dd S
$$
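with $\vec y$ the variable of integration. Notice how $\vec n(\vec y)$ enters only through $\vec v(\vec x)\cdot\vec n(\vec y)$, so each boundary point contributes in proportion to the component of its normal along $\vec v(\vec x)$.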
Finally we consider the identity
$$
\nabla\times[(\vec v\cdot\nabla)\vec v] = -\nabla\times[\vec v\times(\nabla\times\vec v)].
$$
The simple fact that
$$
L_{\vec x}(\vec w) = M_{\vec x}(\vec w)\text{ for all $\vec w$} \implies L_{\vec x}(\nabla) = M_{\vec x}(\nabla)
$$
means that we can manipulate $\nabla$ in any way as if it were a vector so long as we keep track of what it differentiates. For instance, since we have the identity
$$
\vec a\times(\vec b\times\vec c) = (\vec a\cdot\vec c)\vec b - (\vec a\cdot\vec b)\vec c
$$
the following is true:
$$
\vec v\times(\dot\nabla\times\dot{\vec v})
= (\vec v\cdot\dot{\vec v})\dot\nabla - (\vec v\cdot\dot\nabla)\dot{\vec v}.
\tag{$*$}
$$
To be completely explicit, this follows from the equality of the functions
$$
L_{\vec x}(\vec w) = \vec u\times(\vec w\times\vec v(\vec x)),\quad
M_{\vec x}(\vec w) = (\vec u\cdot\vec v(\vec x))\vec w - (\vec u\cdot\vec w)\vec v(\vec x)
$$
for any constant $\vec u$, setting $\vec u = \vec v$ after differentiating. The second term of ($*$) we are already familiar with; the first term is impossible to write with the standard convention of "$\nabla$ differentiates to the right" (though it turns out that this term is $(\diff \vec v_{\vec x})^T(\vec v)$ where $(\diff \vec v_{\vec x})^T$ is the adjoint of the differential). However, in this particular case we can use the very powerful but simple subexpression rule which I state without proof:
- The derivative of an expression is the sum of the derivatives of its subexpressions.
Practically, what this means is the following:
$$
\dot\nabla(\dot{\vec v}\cdot\dot{\vec v}) = \dot\nabla(\dot{\vec v}\cdot\vec v) + \dot\nabla(\vec v\cdot\dot{\vec v}) = 2\dot\nabla(\vec v\cdot\dot{\vec v}).
$$
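In Cartesian components this is just the product rule. A quick check, summing over the repeated index $j$ and using $\partial_i = \tPD{}{x^i}$ as my own shorthand:
$$
[\nabla(\vec v\cdot\vec v)]_i = \partial_i(v^jv^j) = (\partial_iv^j)v^j + v^j(\partial_iv^j) = 2v^j\partial_iv^j = 2[\dot\nabla(\vec v\cdot\dot{\vec v})]_i.
$$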
Then since $(\vec v\cdot\dot{\vec v})\dot\nabla = \dot\nabla(\vec v\cdot\dot{\vec v})$ because $\vec v\cdot\dot{\vec v}$ is a scalar we see
$$
\vec v\times(\nabla\times\vec v) = \frac12\nabla|\vec v|^2 - (\vec v\cdot\nabla)\vec v
$$
where we've returned to the standard convention that $\nabla$ differentiates to the right. Applying $\nabla\times$ to each side we see that the first term gives the curl of a gradient, which is zero, and the second term gives the desired identity.
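As a sanity check on a concrete field (my own example, chosen so that neither curl vanishes), take $\vec v(x, y, z) = (y^2, x, 0)$ on $\R^3$. Then
$$
\begin{aligned}
(\vec v\cdot\nabla)\vec v &= \left(y^2\PD{}{x} + x\PD{}{y}\right)(y^2, x, 0) = (2xy,\ y^2,\ 0),\\
\nabla\times\vec v &= (0,\ 0,\ 1 - 2y),\\
\vec v\times(\nabla\times\vec v) &= (x - 2xy,\ 2y^3 - y^2,\ 0),\\
\tfrac12\nabla|\vec v|^2 &= \tfrac12\nabla(y^4 + x^2) = (x,\ 2y^3,\ 0),
\end{aligned}
$$
so indeed $\frac12\nabla|\vec v|^2 - (\vec v\cdot\nabla)\vec v = (x - 2xy,\ 2y^3 - y^2,\ 0) = \vec v\times(\nabla\times\vec v)$. Taking curls, only the third component survives:
$$
\nabla\times[(\vec v\cdot\nabla)\vec v] = (0,\ 0,\ -2x),\qquad
\nabla\times[\vec v\times(\nabla\times\vec v)] = (0,\ 0,\ 2x),
$$
which agree up to the sign, as the identity requires.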