There is a serious flaw in the formulae reported in the The Matrix Cookbook for gradient with respect to symmetric matrices. (section 2.8 of the Matrix Cookbook, on Structured Matrices).
The arxiv paper cited in another answer has now been published https://doi.org/10.1007/s13226-022-00313-x (shareit link https://rdcu.be/cUwmr), and reading it will give a full picture.
I am the first author and I wanted to put up an answer that will hopefully educate other people who come here with the same questions as I did.
In my opinion, the whole confusion arises because somehow the idea of constrained optimization has got mixed up with something that I can only call constrained derivative. I am using derivative and gradient interchangeably here but that doesn't change things.
Let's consider an example. What is the derivative of $f(x, y) = x^2 + 2y^2$ given that the domain is the set of points lying on the unit circle C , $x^2 + y^2 = 1$, i.e, we require the derivative of $f( \mathbf{x})|_C$ ? If instead of asking what is the derivative I had asked a maxima/minima question then we talk of incorporating the unit circle constraint through a Lagrange multiplier etc. This is similar to all the talk of "off-diagonal elements are not independent", "symmetric matrix satisfies a constraint" etc.
However, in the context of taking a derivative, the formula needs $f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x})$ to make sense, i.e., both $\mathbf{x}$ and $\mathbf{x}+\mathbf{h}$ need to belong to whatever is the domain of definition. Here that is the boundary of the unit circle.
Clearly, taking $\mathbf{x} = (x, y), \mathbf{h} = (h, k)$, that is not true and the derivative definition has no meaning.
However, if we parametrize the unit circle constraint as $x = \sin(\theta), y= \cos(\theta)$, now the argument is $\theta$ and any perturbation of it like $\theta + \delta \theta$ also lies on the unit circle. Now the derivative restricted to the unit circle C can be evaluated from $f( \mathbf{x})|_C = g(\theta) = \sin^2(\theta) + 2\cos^2(\theta)$.
The point I am making is that special procedures are necessary only when the domain is not a subspace.
Now consider matrix functionals like $\phi(A)$ for $A$ symmetric, diagonal, upper triangular etc and the derivative of $\phi$. The analogy to the unit circle constraint should now be clear.
The Matrix Cookbook and the so-called matrix calculus attempt special procedures for all these cases, even though no special procedures are necessary --
In all these cases, $A, A+H$ belong to the domain of definition.
The exception is when $A$ is orthogonal. In that case, $A+H$ need not be orthogonal for $A, H$ orthogonal and the set of orthogonal matrices need to be parametrized in some way to construct the restriction of $\phi$ to the domain of orthogonal matrices and then evaluate the derivative.
In the paper linked above we show how the end result obtained after the special procedures is incorrect for symmetric matrices.