Suppose $E\subseteq\mathbb R^n$ and $f$ maps $E$ into $\mathbb R^m$. Let $g$ map a subset of $\mathbb R^m$ into $\mathbb R^p$. If $f$ is differentiable at $x\in E$ and $g$ is differentiable at $f(x) \in f(E)$, then the composition $g \circ f$ is differentiable at $x$ and $$(g\circ f)'(x) = g'(f(x))\, f'(x),$$
where the indicated product is matrix multiplication.
Although this version of the chain rule may look a bit strange, it is really just the familiar chain rule of calculus in a new guise. You can convince yourself of this fact by writing the formula out in terms of partial derivatives.
This is something I don't understand. I don't find this statement of the chain rule intuitive. Can someone explain it?
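For reference, here is a sketch of what "writing the formula out in terms of partial derivatives" might look like. The notation is my own assumption, not the book's: I write $f=(f_1,\dots,f_m)$ and $g=(g_1,\dots,g_p)$ in components, use $y_1,\dots,y_m$ for the coordinates on $\mathbb R^m$, and identify each derivative with its Jacobian matrix in the standard bases. Then $f'(x)$ is the $m\times n$ matrix with entries $\partial f_k/\partial x_j$, $g'(f(x))$ is the $p\times m$ matrix with entries $\partial g_i/\partial y_k$ evaluated at $f(x)$, and the $(i,j)$ entry of the product should be

$$\frac{\partial (g\circ f)_i}{\partial x_j}(x) \;=\; \sum_{k=1}^{m} \frac{\partial g_i}{\partial y_k}\bigl(f(x)\bigr)\,\frac{\partial f_k}{\partial x_j}(x), \qquad 1\le i\le p,\ 1\le j\le n.$$

If $n=m=p=1$, all three matrices are $1\times 1$ and this is the usual $(g\circ f)'(x)=g'(f(x))\,f'(x)$. If $n=p=1$ and $m=2$, it becomes

$$\frac{d}{dt}\,g\bigl(f_1(t),f_2(t)\bigr) \;=\; \frac{\partial g}{\partial y_1}\bigl(f(t)\bigr)\,f_1'(t) \;+\; \frac{\partial g}{\partial y_2}\bigl(f(t)\bigr)\,f_2'(t),$$

which looks like the multivariable chain rule from calculus.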