
I was reading this paper, where the authors want to find the saddle point of the following problem:

$$\min_x \max_y \left\{ x^TAy + \frac{\lambda}{2} \|By - z\|^2 \right\}$$

Here $x, y, z$ are vectors, $A, B$ are matrices, and $\lambda$ is a real number.

In Section 2.1 they reformulate this problem as a variational inequality. I read here (page 2332) that it is easier to consider variational inequalities involving the function's differentials, so the authors of the first paper (I'll call it "the paper of doom") write:

$$\nabla_x \left(x^TAy + \frac{\lambda}{2} \|By - z\|^2\right) = -A^Ty \tag{1}$$

$$\nabla_y \left(x^TAy + \frac{\lambda}{2} \|By - z\|^2\right) = Ax + \lambda B^T(By - z) \tag{2}$$

I take $\nabla_x (\|By - z\|^2) = 0$, since that term does not depend on $x$.
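Since I no longer trust the paper, a finite-difference sanity check helps. Below is only a sketch: every size, seed, and name (`phi`, `num_grad`) is my own illustrative assumption, not something from the paper.

```python
# Sketch of a finite-difference sanity check; every size, seed, and name here
# (phi, num_grad) is my own illustrative assumption, not from the paper.
import numpy as np

L, N, lam = 4, 3, 0.7
rng = np.random.default_rng(0)
A = rng.standard_normal((L, N))
B = rng.standard_normal((L, N))
x, y, z = rng.standard_normal(L), rng.standard_normal(N), rng.standard_normal(L)

def phi(x, y):
    """Objective: x^T A y + (lam/2) * ||B y - z||^2."""
    return x @ A @ y + 0.5 * lam * np.sum((B @ y - z) ** 2)

def num_grad(f, v, eps=1e-6):
    """Central-difference approximation of the gradient of f at v."""
    g = np.zeros_like(v)
    for k in range(v.size):
        e = np.zeros_like(v)
        e[k] = eps
        g[k] = (f(v + e) - f(v - e)) / (2 * eps)
    return g

# The x-gradient gets no contribution from the norm term, and for L != N the
# paper's -A^T y from (1) is not even a valid matrix-vector product.
print(np.allclose(num_grad(lambda xv: phi(xv, y), x), A @ y, atol=1e-6))  # True
```

Note that for these shapes ($L \neq N$) the paper's $-A^Ty$ from (1) cannot even be formed, which already hints at the dimension problems discussed in the comments below.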

I was calculating $\nabla_y(\|By - z\|^2)$ as follows (as it turns out, the very first step below is already wrong, since $\|By - z\|^2 = \sum_{i=1}^{L}\big(\sum_{j=1}^{N} b_{ij}y_j - z_i\big)^2$ and the square does not distribute over the inner sum; see the edit):

$$\nabla_y(\|By - z\|^2) = \nabla_y \sum_{i=1}^{L} \sum_{j=1}^{N} (b_{ij}y_j - z_j)^2$$ $$= \nabla_y \sum_{i=1}^{L} \sum_{j=1}^{N} \left((b_{ij}y_j)^2 - 2b_{ij}y_j z_j + z_j^2\right)$$ $$= \sum_{i=1}^{L} \sum_{j=1}^{N} \nabla_y \left((b_{ij}y_j)^2 - 2b_{ij}y_j z_j + z_j^2\right)$$ $$= \sum_{i=1}^{L} \sum_{j=1}^{N} \left(2b_{ij}y_j - 2b_{ij}z_j\right)$$ $$= 2\sum_{i=1}^{L} \sum_{j=1}^{N} \left(b_{ij}y_j - b_{ij}z_j\right)$$
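The result above is a single scalar, not a vector in $\mathbb{R}^N$, so it cannot be the gradient. A quick numerical check (again a sketch, with sizes and seed I made up; nothing here comes from the paper) confirms that the standard matrix-calculus identity $\nabla_y \|By - z\|^2 = 2B^T(By - z)$ does match finite differences:

```python
# Finite-difference check of grad_y ||B y - z||^2; the sizes and seed are
# made-up assumptions, not values from the paper.
import numpy as np

L, N, eps = 4, 3, 1e-6
rng = np.random.default_rng(1)
B = rng.standard_normal((L, N))
y, z = rng.standard_normal(N), rng.standard_normal(L)

def f(v):
    """||B v - z||^2."""
    return np.sum((B @ v - z) ** 2)

num = np.array([(f(y + eps * np.eye(N)[j]) - f(y - eps * np.eye(N)[j])) / (2 * eps)
                for j in range(N)])
print(np.allclose(num, 2 * B.T @ (B @ y - z), atol=1e-5))  # True
```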

After reading this post and this post, I calculated the derivatives of $y^TAx$ as follows (typo here, see the edit):

Edit:

Thanks, Greg, for your comment; you helped me find a typo in my question and other mistakes by the authors of the paper of doom. Here's the correction:

$A \in \mathbb{R}^{L\times N}$, $x \in \mathbb{R}^{L}$, $y \in \mathbb{R}^{N}$.

I calculate the derivatives of $x^TAy$ as follows:

$$\nabla_x\, x^TAy = Ay$$

$$\nabla_y\, x^TAy = x^TA$$

Thanks to this wonderful paper, I learned that my calculations of $\nabla_x\, x^TAy$ and $\nabla_y\, x^TAy$ were right; in the case of $\nabla_y\, x^TAy$ I was just missing a last step, turning the row vector $x^TA \in \mathbb{R}^{1\times N}$ into a column vector: $$\nabla_y\, x^TAy = (x^TA)^T = A^Tx$$
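To double-check, here is the same kind of finite-difference sketch for the bilinear term; the sizes and seed are again my own illustrative choices:

```python
# Central-difference check of grad_x(x^T A y) = A y and grad_y(x^T A y) = A^T x.
# Sizes and seed are my own illustrative choices.
import numpy as np

L, N, eps = 5, 3, 1e-6
rng = np.random.default_rng(2)
A = rng.standard_normal((L, N))
x, y = rng.standard_normal(L), rng.standard_normal(N)

g = lambda xv, yv: xv @ A @ yv  # the bilinear form x^T A y

gx = np.array([(g(x + eps * np.eye(L)[i], y) - g(x - eps * np.eye(L)[i], y)) / (2 * eps)
               for i in range(L)])
gy = np.array([(g(x, y + eps * np.eye(N)[j]) - g(x, y - eps * np.eye(N)[j])) / (2 * eps)
               for j in range(N)])
print(np.allclose(gx, A @ y, atol=1e-5), np.allclose(gy, A.T @ x, atol=1e-5))  # True True
```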

For the Euclidean norm, the authors of the paper of doom compute (or at least seem to compute; I can't trust them anymore):

$$\nabla_y \left(\frac{\lambda}{2} \|By - z\|^2\right) = \lambda B^T(By - z)$$

where $z \in \mathbb{R}^{L}$ and $B \in \mathbb{R}^{L\times N}$.

My mistake was trying to force my results to look like the ones in the paper of doom. How to calculate the derivative of the Euclidean norm is well explained here, and closer to my case here. Below, $(By)_i$ is the $i$-th element of the vector $By$, and $y_i$ is the $i$-th element of the vector $y$:

$$\frac{\partial}{\partial y_j}\|By - z\|^2 = 2\sum_{i=1}^{L} \big((By)_i - z_i\big) \frac{\partial}{\partial y_j}\big((By)_i - z_i\big)$$

So I'm getting:

$$\nabla_x \left(x^TAy + \frac{\lambda}{2} \|By - z\|^2\right) = Ay \tag{3}$$ $$\frac{\partial}{\partial y_j} \left(x^TAy + \frac{\lambda}{2} \|By - z\|^2\right) = (A^Tx)_j + \lambda \sum_{i=1}^{L} \big((By)_i - z_i\big) \frac{\partial}{\partial y_j}\big((By)_i - z_i\big) \tag{4}$$

At this point, I'm unsure how to calculate $\frac{\partial}{\partial y_j}\big((By)_i - z_i\big)$. Does anyone know how to do it?
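In the meantime, any candidate formula for (4) can be tested numerically. Putting together the two pieces derived above, $A^Tx$ from the bilinear term and $\lambda B^T(By - z)$ from the norm term, here is a sketch of such a test (all sizes, the seed, and the names are my own assumptions):

```python
# Harness to test any candidate for grad_y of the whole objective against
# finite differences; all sizes, seed, and names are illustrative assumptions.
import numpy as np

L, N, lam, eps = 4, 3, 0.7, 1e-6
rng = np.random.default_rng(3)
A = rng.standard_normal((L, N))
B = rng.standard_normal((L, N))
x, y, z = rng.standard_normal(L), rng.standard_normal(N), rng.standard_normal(L)

def phi_y(v):
    """The objective as a function of y alone."""
    return x @ A @ v + 0.5 * lam * np.sum((B @ v - z) ** 2)

num = np.array([(phi_y(y + eps * np.eye(N)[j]) - phi_y(y - eps * np.eye(N)[j])) / (2 * eps)
                for j in range(N)])

# Candidate assembled from the pieces above: A^T x and lam * B^T (B y - z).
candidate = A.T @ x + lam * B.T @ (B @ y - z)
print(np.allclose(num, candidate, atol=1e-5))  # True
```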

  • I don't know what to make of the first paper. Their very first equation defines $$\Phi(x,y) = x^TAy + \frac{\lambda}{2}\|By - z\|^2, \qquad x\in\mathbb{R}^{L},\; y\in\mathbb{R}^{N},\; z\in\mathbb{R}^{L},\; A\in\mathbb{R}^{L\times N},\; B\in\mathbb{R}^{L\times N}$$ But in Section 2.1 they start writing stuff like $$(x-x^*)^T(-A^Ty)\ge 0$$ Never mind the accuracy of their differentials and gradients, this is just dimensionally incompatible nonsense. – greg Feb 10 '19 at 22:19
  • @greg, you're right, the authors messed up the dimensions horribly. Their $-A^Ty$ makes no sense unless $L=N$. Thanks a lot for making me notice it. – Broken_Window Feb 12 '19 at 17:03
  • They also mess up the dimensions where they write $(y-y^*)^T(Ax^* + \lambda B^T(By^*-z))$. – Broken_Window Feb 12 '19 at 21:16

0 Answers