Let $x^{T}=[X_{1}\ X_{2}], \ h(x)=x$. Then,
$$\dfrac{\partial h(x)}{\partial x} = \dfrac{\partial}{\partial x}\begin{bmatrix}X_{1}\\ X_{2}\end{bmatrix} = \begin{bmatrix}\dfrac{\partial}{\partial X_{1}} X_{1} & \dfrac{\partial}{\partial X_{2}} X_{1}\\ \dfrac{\partial}{\partial X_{1}} X_{2} & \dfrac{\partial}{\partial X_{2}} X_{2}\end{bmatrix}$$
The derivative is therefore $2\times 2$, while $x$ is $2\times 1$. If we were to use this result for something like gradient descent, how would that work?
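
To make the shape mismatch concrete, here is a minimal sketch that approximates the derivative of $h(x)=x$ by central finite differences and compares its shape with that of $x$. The helper `numerical_jacobian`, the step size `eps`, and the test point are my own assumptions for illustration only.

```python
def h(x):
    # The map from the question: h(x) = x, the identity on R^2.
    return list(x)


def numerical_jacobian(f, x, eps=1e-6):
    # Hypothetical helper: central finite differences.
    # Entry J[i][j] approximates d f_i / d x_j.
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


x = [3.0, -1.0]  # arbitrary test point (assumption)
J = numerical_jacobian(h, x)

jacobian_shape = (len(J), len(J[0]))
print("shape of dh/dx:", jacobian_shape)  # 2 rows, 2 columns
print("length of x:", len(x))             # 2 entries
```

The result has two rows and two columns, so it cannot be subtracted entry by entry from the two-entry vector $x$, which is exactly the mismatch I am asking about.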