Assume the weights before and after a gradient update (both $W_t$ and $W_{t+1}$) and the learning rate are known, while the data $X$ is unknown. Is it possible to deduce the loss $L$ used in the backprop algorithm that gave rise to the update $W_{t+1} - W_t$? If not, is it possible to verify whether a given loss is the one we are looking for? In other words, is the loss that gave rise to a known gradient update unique (assuming we know the model architecture)?
This depends on your definition of the loss $L$; a loss such as MSE is easily computed from its formula after each forward pass, and its error terms for each output unit are then used in backprop. – cinch Dec 14 '22 at 02:06
1 Answer
First, there is no way you could recover the exact loss. At best you could recover the loss up to an additive constant, because the gradient only gives information about the slope of the loss, not its actual values.
If we make assumptions like a standard model architecture, and we have access to the model architecture in addition to the weight values, then it might be possible to recover the loss up to an additive constant. Assuming the (scalar) loss depends only on the output layer, and assuming the weights are updated with vanilla SGD, we first have an equation like \begin{align*} \frac{1}{\alpha}(W_t - W_{t+1}) = \dfrac{\partial f}{\partial x_n}\Big(\dfrac{\partial x_n}{\partial W_t} + \dfrac{\partial x_n}{\partial x_{n-1}}\dfrac{\partial x_{n-1}}{\partial W_t} + \ldots \Big) \end{align*} where $\alpha$ is the learning rate, $f$ is the loss function, and $x_i$ denotes the activations of the $i$-th layer ($x_n$ is the output, $x_0$ the input). The LHS is the gradient you know; the RHS is the definition of that gradient via backprop. On the RHS, the unknowns are the partial derivatives $\frac{\partial f}{\partial x_n}$, of which there are as many as the dimension of the output layer. This is an over-determined system: there are only $\dim(x_n)$ unknowns but $\dim(W)$ equations. Normally such a system would have no solution, but in this case we know there actually are some loss partial derivatives $\partial f / \partial x_n$ that satisfy it.
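Here is a minimal NumPy sketch of that over-determined system for a hypothetical single linear layer $x_n = W x_0$ updated with vanilla SGD. All names and values are made up for illustration; following the setup above, the forward-pass Jacobian $\partial x_n / \partial W$ is treated as known (i.e. the input $x_0$ is assumed available), so the only unknowns are the loss partials $\partial f / \partial x_n$, recovered here by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_in, dim_out = 4, 2

x0 = rng.normal(size=dim_in)             # input (assumed known in this sketch)
W_t = rng.normal(size=(dim_out, dim_in))
alpha = 0.1

# Simulate one SGD step with some "true" loss gradient g_true = df/dx_n.
g_true = rng.normal(size=dim_out)
grad_W = np.outer(g_true, x0)            # dL/dW = (df/dx_n) x_0^T for x_n = W x_0
W_next = W_t - alpha * grad_W

# Observer side: knows W_t, W_next, alpha, and the architecture.
# (1/alpha)(W_t - W_next) = g x_0^T, which we flatten into a linear system
# A g = b with dim(W) equations and only dim(x_n) unknowns.
G = (W_t - W_next) / alpha
A = np.kron(np.eye(dim_out), x0[:, None])    # shape (dim_out*dim_in, dim_out)
b = G.flatten()
g_recovered, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(g_recovered, g_true))      # True: df/dx_n recovered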
But this doesn't give you the actual function $f$; you only get its partial derivatives at $x_n$. If you then wanted to recover $f$ (again, only up to an additive constant), you would need a way to evaluate $\partial f / \partial x_n$ at various different values of $x_n$. You could then "recover" $f$ (up to an additive constant) by integrating $\partial f / \partial x_n$ over the possible values of $x_n$.
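As a sketch of that last step (hypothetical values throughout): suppose we could evaluate the recovered $\partial f/\partial x_n$ at arbitrary output values. Then $f(x) - f(x_{\text{ref}})$ can be obtained as a line integral of the gradient from a reference point, which pins $f$ down up to the additive constant $f(x_{\text{ref}})$. Here the "unknown" $f$ is an MSE against a fixed target, used only to check the reconstruction.

```python
import numpy as np

# Hypothetical setup: grad_f stands in for the gradient we could recover at
# arbitrary outputs; f itself is only used to verify the reconstruction.
target = np.array([1.0, -2.0])
f = lambda x: 0.5 * np.sum((x - target) ** 2)
grad_f = lambda x: x - target

def reconstruct(x, x_ref, steps=1000):
    """Estimate f(x) - f(x_ref) by integrating grad_f along the segment x_ref -> x."""
    ts = np.linspace(0.0, 1.0, steps)
    pts = x_ref + ts[:, None] * (x - x_ref)
    vals = np.array([grad_f(p) @ (x - x_ref) for p in pts])
    # trapezoidal rule for the 1-D line integral
    return np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts))

x_ref = np.zeros(2)
x = np.array([0.5, 0.5])
print(np.isclose(reconstruct(x, x_ref), f(x) - f(x_ref)))  # True (up to numerical error)
```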
