I'm reading Eli Bendersky's blog post deriving the softmax function and its associated loss function [link], and I'm stuck on one of the first steps of the softmax derivative.
His notation defines the softmax as follows:
$$S_j = \frac{e^{a_j}}{ \sum_{k=1}^{N} e^{a_k} } $$
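For concreteness, here's a minimal numerical sketch of that definition (NumPy, with arbitrary example inputs of my own choosing), which at least confirms the outputs sum to 1:

```python
import numpy as np

def softmax(a):
    """Compute S_j = exp(a_j) / sum_k exp(a_k) for every j."""
    e = np.exp(a)
    return e / e.sum()

a = np.array([1.0, 2.0, 3.0])  # arbitrary example inputs
S = softmax(a)
print(S)        # [0.09003057 0.24472847 0.66524096]
print(S.sum())  # ~1.0
```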
He then goes on to start the derivative:
$$ \frac{\partial S_i}{\partial a_j} = \frac{ \partial \frac{e^{a_i} }{ \sum_{k=1}^N e^{a_k}} } {\partial a_j} $$
Here we are computing the derivative of the $i$th output with respect to the $j$th input. Because $S_i$ is a quotient, he says one must apply the quotient rule from calculus:
$$ f(x) = \frac{g(x)}{h(x)} $$ $$ f'(x) = \frac{ g'(x)h(x) - h'(x)g(x) } { (h(x))^2 } $$
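Just to make sure I'm reading the rule correctly, here's how I'd apply it to a simple, unrelated function of my own choosing:
$$ f(x) = \frac{x}{x^2 + 1}, \qquad f'(x) = \frac{1 \cdot (x^2 + 1) - 2x \cdot x}{(x^2 + 1)^2} = \frac{1 - x^2}{(x^2 + 1)^2} $$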
In the case of $S_i$ above:
$$ g_i = e^{a_i} $$ $$ h_i = \sum_{k=1}^N e^{a_k} $$
So far so good. Here's where I get confused. He then says: "Note that no matter which $a_j$ we compute the derivative of $h_i$ for, the answer will always be $e^{a_j}$".
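A quick finite-difference sketch (NumPy, arbitrary inputs and an arbitrary choice of $j$) does seem to bear the claim out numerically, but I still don't see why it holds analytically:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])  # arbitrary example inputs
j = 1                           # arbitrary choice of input index
eps = 1e-6

def h(a):
    """h_i = sum_k exp(a_k); note it's the same sum for every i."""
    return np.exp(a).sum()

# Finite-difference approximation of dh/da_j
a_plus = a.copy()
a_plus[j] += eps
numeric = (h(a_plus) - h(a)) / eps

print(numeric)       # ~7.38906
print(np.exp(a[j]))  # 7.38906..., i.e. e^{a_j}
```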
If anyone could help me see why this is the case, I'd be very grateful.