Whether to transpose or not depends on which double-contraction operator is in use.
Differentiation of a scalar-valued function $f$ with respect to a (typically square) matrix $X$ is defined so that linearized changes can be expressed as
$$\mathrm{d}f = \frac{\delta f}{\delta X}\cdot\cdot\,\mathrm{d}X\tag{*}\label{ccorn-D}$$
Here the double-dot stands for the double-contraction operator.
It takes two matrix arguments and contracts those bilinearly to a scalar.
Some people define the double-contraction operator as
$$A\cdot\cdot B = \operatorname{tr}(AB)\tag{1}\label{ccorn-I}$$
whereas others use
$$A\cdot\cdot B = \operatorname{tr}(A^\top B)\tag{2}\label{ccorn-II}$$
These two agree for symmetric matrices, and both ensure that the double contraction of a symmetric matrix with a skew-symmetric matrix yields zero.
They differ in the sign of the result when contracting two skew-symmetric matrices.
As long as at least one of $A$ or $B$ is symmetric, it does not matter which of $\eqref{ccorn-I}$ or $\eqref{ccorn-II}$ you use, but it matters in formulae like the one you have encountered.
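
The behaviour of the two conventions can be checked on small examples. The following is a minimal sketch in plain Python; the helper names `transpose`, `contract_1`, `contract_2` and the sample matrices `S`, `K`, `G` are made up for this illustration.

```python
def transpose(M):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def contract_1(A, B):
    """Convention (1): tr(AB) = sum over i, j of A[i][j] * B[j][i]."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))

def contract_2(A, B):
    """Convention (2): tr(A^T B) = sum over i, j of A[i][j] * B[i][j]."""
    return contract_1(transpose(A), B)

S = [[1, 2], [2, 5]]    # symmetric
K = [[0, 3], [-3, 0]]   # skew-symmetric
G = [[1, 2], [3, 4]]    # neither symmetric nor skew-symmetric

# One symmetric argument: both conventions give the same value.
sym_general = (contract_1(S, G), contract_2(S, G))
# Symmetric with skew-symmetric: zero under both conventions.
sym_skew = (contract_1(S, K), contract_2(S, K))
# Two skew-symmetric arguments: same magnitude, opposite sign.
skew_skew = (contract_1(K, K), contract_2(K, K))

print(sym_general, sym_skew, skew_skew)
```

Since every square matrix splits into a symmetric and a skew-symmetric part, the skew-with-skew case is the only place where the two conventions can disagree.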
Note that $\eqref{ccorn-D}$ alone does not uniquely define $\delta f/\delta X$.
In particular, if $X$ is symmetric, $\delta f/\delta X$ could contain arbitrary skew-symmetric components that do not affect $\eqref{ccorn-D}$.
In such scenarios, $\delta f/\delta X$ is commonly understood to be symmetric as well.
As to the result named first in your question, the authors probably mean
$$\frac{\delta f}{\delta X} = \begin{pmatrix}
\frac{\partial f}{\partial X_{11}} & \cdots
& \frac{\partial f}{\partial X_{1n}}
\\\vdots & \ddots & \vdots
\\\frac{\partial f}{\partial X_{n1}} & \cdots
& \frac{\partial f}{\partial X_{nn}}
\end{pmatrix}\tag{3}\label{ccorn-III}$$
or the transpose thereof.
This is compatible with $\eqref{ccorn-D}$ if you choose the double-contraction
operator accordingly – $\eqref{ccorn-II}$ for the above version,
$\eqref{ccorn-I}$ for the transposed version – so that you end up with the
chain rule, as it should be:
$$\frac{\delta f}{\delta X}\cdot\cdot\,\mathrm{d}X
= \sum_{i,j}\frac{\partial f}{\partial X_{ij}}\mathrm{d}X_{ij}
= \mathrm{d}f$$
$\eqref{ccorn-III}$ assumes that the $X_{ij}$ are independent variables
insofar as the definition of $f$ is concerned. This does not require
that $X$ be unconstrained; it just means that $f(X)$ needs to be defined in
terms of the $X_{ij}$ so that the partial derivatives can be taken.
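
The pairing of layout and contraction convention can be checked numerically. The sketch below assumes the illustrative function $f(X)=\operatorname{tr}(AX)$ with a fixed non-symmetric $A$ (my choice, not taken from the question), for which $\partial f/\partial X_{ij} = A_{ji}$; because this $f$ is linear, $f(X+\mathrm{d}X)-f(X)$ equals $\mathrm{d}f$ exactly. All names are hypothetical.

```python
def transpose(M):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def contract_1(A, B):
    """Convention (1): tr(AB)."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))

def contract_2(A, B):
    """Convention (2): tr(A^T B)."""
    return contract_1(transpose(A), B)

A = [[1, 2], [3, 4]]            # fixed, deliberately non-symmetric

def f(X):
    """Illustrative scalar function f(X) = tr(AX)."""
    return contract_1(A, X)

# Layout (3): entry (i, j) holds df/dX_ij, which is A[j][i] for this f.
grad = transpose(A)

X = [[5, 6], [7, 8]]
dX = [[1, -2], [3, 0]]
X_plus = [[X[i][j] + dX[i][j] for j in range(2)] for i in range(2)]
df = f(X_plus) - f(X)           # exact, since f is linear

matched_2 = contract_2(grad, dX)             # layout (3) with convention (2)
matched_1 = contract_1(transpose(grad), dX)  # transposed layout with (1)
mismatched = contract_1(grad, dX)            # layout (3) with convention (1)

print(df, matched_2, matched_1, mismatched)
```

Both matched pairings reproduce `df`, while the mismatched pairing does not, which is the sense in which the choice of transposing depends on the contraction convention.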
If $X$ is constrained to be symmetric, one could make $f$ depend only on those $X_{ij}$ where $i\leq j$ by replacing $X_{ji}$ with $X_{ij}$ in the definition of $f$. Let us call this version $f^*$.
Then the right-hand side matrix in $\eqref{ccorn-III}$ becomes
upper triangular, and the former lower triangle gets transposed and added to the former upper triangle:
$$\frac{\delta f^*}{\delta X} = \begin{pmatrix}
\frac{\partial f}{\partial X_{11}}
& \frac{\partial f}{\partial X_{12}} + \frac{\partial f}{\partial X_{21}}
& \cdots
& \cdots
& \frac{\partial f}{\partial X_{1n}} + \frac{\partial f}{\partial X_{n1}}
\\ 0 & \frac{\partial f}{\partial X_{22}}
& \frac{\partial f}{\partial X_{23}} + \frac{\partial f}{\partial X_{32}}
& \cdots
& \frac{\partial f}{\partial X_{2n}} + \frac{\partial f}{\partial X_{n2}}
\\\vdots & \ddots & \ddots & \ddots & \vdots
\\ 0 & \cdots & 0
& \frac{\partial f}{\partial X_{n-1,n-1}}
& \frac{\partial f}{\partial X_{n-1,n}} + \frac{\partial f}{\partial X_{n,n-1}}
\\ 0 & \cdots & \cdots & 0 & \frac{\partial f}{\partial X_{nn}}
\end{pmatrix}\tag{4}\label{ccorn-IV}$$
This would still fulfill $\eqref{ccorn-D}$.
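
As a sketch of why this holds: for a symmetric increment, $\mathrm{d}X_{ji}=\mathrm{d}X_{ij}$, so both $\eqref{ccorn-I}$ and $\eqref{ccorn-II}$ reduce to the same entrywise sum, and the triangular matrix contracts to
$$\begin{aligned}
\frac{\delta f^*}{\delta X}\cdot\cdot\,\mathrm{d}X
&= \sum_{i}\frac{\partial f}{\partial X_{ii}}\mathrm{d}X_{ii}
+ \sum_{i<j}\left(\frac{\partial f}{\partial X_{ij}}
+ \frac{\partial f}{\partial X_{ji}}\right)\mathrm{d}X_{ij}
\\&= \sum_{i,j}\frac{\partial f}{\partial X_{ij}}\mathrm{d}X_{ij}
= \mathrm{d}f
\end{aligned}$$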
Another option would be to symmetrize the right-hand side of $\eqref{ccorn-IV}$
by averaging that triangular matrix with its transpose.
For symmetric $X$, that is equivalent to replacing $f(X)$ in $\eqref{ccorn-III}$ with
$$\bar{f}(X) = \frac{f(X)+f(X^\top)}{2}\tag{5}\label{ccorn-S}$$
However, the authors of your first-mentioned formula apparently chose to reinterpret the right-hand side of $\eqref{ccorn-III}$ as
$$\frac{\delta f}{\delta X} \stackrel{?}{=} \begin{pmatrix}
\frac{\partial f^*}{\partial X_{11}}
& \frac{\partial f^*}{\partial X_{12}}
& \cdots & \frac{\partial f^*}{\partial X_{1n}}
\\\frac{\partial f^*}{\partial X_{12}}
& \frac{\partial f^*}{\partial X_{22}}
& \cdots & \frac{\partial f^*}{\partial X_{2n}}
\\\vdots & \vdots & \ddots & \vdots
\\\frac{\partial f^*}{\partial X_{1n}}
& \frac{\partial f^*}{\partial X_{2n}} & \cdots
& \frac{\partial f^*}{\partial X_{nn}}
\end{pmatrix}\tag{6}\label{ccorn-bad}$$
So they sorted the indices and replaced $f$ with $f^*$. That's misleading:
- The replacement alone would have resulted in $\eqref{ccorn-IV}$
and thus been compatible with $\eqref{ccorn-D}$.
- The sorting would have been compatible with $\eqref{ccorn-D}$
if applied to the symmetric $\bar{f}$ from $\eqref{ccorn-S}$
instead of $f^*$.
- Doing both means that now the terms $\partial f/\partial X_{ij}$
with $i\neq j$ occur twice when contracting. This violates the chain rule.
In other words, their resulting formula is incompatible with $\eqref{ccorn-D}$.
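
The double counting can be made concrete with a small numerical sketch. It again assumes the illustrative linear function $f(X)=\operatorname{tr}(AX)$ (my choice, not the authors'), builds the triangular matrix of $\eqref{ccorn-IV}$, its symmetrized average, and the matrix of $\eqref{ccorn-bad}$, and contracts each with a symmetric $\mathrm{d}X$. All names are hypothetical.

```python
def transpose(M):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def contract(A, B):
    """Entrywise sum of A[i][j] * B[i][j], i.e. convention (2); for a
    symmetric B this coincides with convention (1)."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))

A = [[1, 2], [4, 3]]
n = len(A)
G = transpose(A)   # layout (3) for f(X) = tr(AX): G[i][j] = df/dX_ij

# (4): lower triangle transposed and added to the upper triangle.
triangular = [[G[i][j] + G[j][i] if i < j else (G[i][j] if i == j else 0)
               for j in range(n)] for i in range(n)]

# Average of (4) with its transpose, i.e. the gradient of the symmetrized f.
symmetrized = [[(triangular[i][j] + triangular[j][i]) / 2
                for j in range(n)] for i in range(n)]

# (6): derivatives of f* copied into both triangles.
doubled = [[triangular[min(i, j)][max(i, j)]
            for j in range(n)] for i in range(n)]

dX = [[1, 2], [2, 3]]          # symmetric increment
df = contract(G, dX)           # exact change of f, since f is linear

from_4 = contract(triangular, dX)
from_sym = contract(symmetrized, dX)
from_6 = contract(doubled, dX)

print(df, from_4, from_sym, from_6)
```

In this example the triangular and symmetrized matrices both reproduce `df`, whereas the matrix of the form $\eqref{ccorn-bad}$ overshoots by exactly the off-diagonal contribution.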