According to Wikipedia, given a differentiable mapping $F: \mathbb{R}^n \to \mathbb{R}^m$ with $y = F(x)$, its Jacobian matrix is the $m \times n$ matrix defined as:
$$ J_F=\begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}. $$
In particular, when $m=1$, the Jacobian matrix is also called the gradient $\nabla F$. So the differential is computed as $J_F \Delta x$, or as $\nabla F \Delta x$ when $m=1$.
In real analysis, optimization, and so on, some texts agree with Wikipedia's definitions. In others, however, the Jacobian matrix or the gradient of a differentiable mapping is defined as the transpose of the Wikipedia definition.
Moreover, in baby Rudin, $J_F$ is an $m \times n$ matrix, while for $m=1$, $\nabla F$ is an $n \times 1$ column vector.
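To make the difference concrete, here is a small example of my own (the function $F(x_1, x_2) = x_1^2 x_2$ is chosen only for illustration and does not come from any of the texts above). With $n=2$ and $m=1$, the two conventions give the same first-order change in $F$ but differ in shape and in where the transpose goes:

$$
\begin{aligned}
\text{row convention:} \quad & \nabla F = \begin{bmatrix} 2x_1x_2 & x_1^2 \end{bmatrix} \ (1 \times 2), & \Delta F &\approx \nabla F \, \Delta x, \\
\text{column convention:} \quad & \nabla F = \begin{bmatrix} 2x_1x_2 \\ x_1^2 \end{bmatrix} \ (2 \times 1), & \Delta F &\approx (\nabla F)^T \Delta x,
\end{aligned}
$$

and in both cases $\Delta F \approx 2x_1x_2\,\Delta x_1 + x_1^2\,\Delta x_2$.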
When writing my own formulas, which convention is most commonly adopted?
Thanks and regards!