Finding the gradient of the restricted function in terms of the gradient of the original function

Question

The following question showed up as part of a proof that I am doing for my research thesis.

If we have a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ and set $n-d$ of its coordinates to zero, we get a new differentiable function $g: \mathbb{R}^d \to \mathbb{R}$ of the remaining $d$ coordinates. Now, given the gradient $\nabla_x f(x)$, how can one obtain $\nabla_y g(y)$?

My try

Let $x \in \mathbb{R}^n$ and $S \subset \{1,\dots,n\}$ such that $|S|=d$, where $|\cdot|$ is the cardinality of the set. Let $U_S$ be the $n \times n$ identity matrix with its $j$-th diagonal entry kept if $j \in S$ and set to zero otherwise. Also, let $I_S$ be the $n \times d$ matrix obtained from $U_S$ by keeping its nonzero columns and removing its zero columns. Hence,

$$ g(y)=f(U_Sx) $$ where $y=I_S^{\top}x$.

The above is the translation of what I stated in terms of functions $f$ and $g$.
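As a concrete illustration of the two matrices just defined, here is a minimal sketch in plain Python. The helper names (`selection_matrices`, `matvec`, `transpose`), the 0-based indexing of $S$, and the sample vector are assumptions made for this example only; they are not part of the question.

```python
def selection_matrices(n, S):
    """Return (U_S, I_S) as nested lists; S holds 0-based indices (an assumption)."""
    cols = sorted(S)
    # U_S: n x n identity with the diagonal entries outside S set to zero.
    U = [[1.0 if (i == j and i in S) else 0.0 for j in range(n)] for i in range(n)]
    # I_S: n x d matrix made of the nonzero columns of U_S.
    I = [[U[i][j] for j in cols] for i in range(n)]
    return U, I

def matvec(A, v):
    # Matrix-vector product for nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

n, S = 4, {0, 2}
U_S, I_S = selection_matrices(n, S)
x = [1.0, 2.0, 3.0, 4.0]

y = matvec(transpose(I_S), x)   # y = I_S^T x keeps the coordinates in S
Ux = matvec(U_S, x)             # U_S x zeroes the coordinates outside S
Iy = matvec(I_S, y)             # I_S y embeds y back into R^n

print(y, Ux, Iy)
```

In this sketch `Ux` and `Iy` coincide, which is the identity $U_S x = I_S I_S^{\top} x = I_S y$ that relates the two matrices.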

From this point things are a little bit unclear. I think the answer should be $\nabla_y g(y)=I_S^{\top} \nabla_x f(x)$ but I do not know how to get it.

I also know that, by the chain rule, $J_x f(U_S x)=J_{W} f(W)J_x W= J_{W} f(W)U_S$, where $J$ denotes the Jacobian and $W=U_S x$. In addition, $\nabla^{\top}_x f(U_Sx) = J_x f(U_S x)=J_{W} f(W)U_S$. I do not know how to put these together.

Paradox · Accepted Answer · 2021-09-17T10:59:18.710

2

Since no one has posted an answer yet, and I get the same result as you suggest, I thought I would post my solution for you to judge:

We have that $$U_S x = I_S y$$ so that, viewing matrices as linear transformations, $$g(y) = f(U_S x) = f(I_S y) = (f\circ I_S)(y)$$ And similar to what you write about $J_{U_S}(x)$, we have $J_{I_S}(y) = I_S$. Applying the chain rule $J_{h_1 \circ h_2}(a) = J_{h_1}(h_2(a))J_{h_2}(a)$ then gives $$\begin{aligned} (\nabla_y g(y))^T &= J_g(y) \\ &= J_{f\circ I_S}(y) \\ &= J_f(I_S y)\,J_{I_S}(y) \\ &= J_f(U_S x)\,I_S \\ &= (\nabla_x f(U_S x))^T I_S, \end{aligned}$$ and, transposing both sides, $$\nabla_y g(y) = [(\nabla_x f(U_S x))^TI_S]^T = I_S^T\nabla_x f(U_S x)$$

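Due to the definitions of $U_S$ and $I_S$, multiplying by $I_S^T$ discards exactly the rows of the gradient that belong to the coordinates outside $S$. The gradient is still evaluated at $U_S x$, though, so the kept rows of $\nabla_x f(U_S x)$ and $\nabla_x f(x)$ agree in general only when $x = U_S x$, that is, when the coordinates of $x$ outside $S$ are already zero. In that case we finally obtain $$ \nabla_y g(y) = I_S^T\nabla_x f(U_S x) = I_S^T \nabla_x f(x) $$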

Edit:
As pointed out in the comments, it would be more correct to write $$ \nabla_y g(y) = I_S^T\nabla_x f(I_S y) $$
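
The formula $\nabla_y g(y) = I_S^T\nabla_x f(I_S y)$ can be checked numerically. The sketch below is only an illustration: the test function `f`, its hand-computed gradient `grad_f`, the index set `S` (0-based), and the helpers `embed`, `restrict`, and `numerical_grad` are all hypothetical choices made for this example.

```python
def f(x):
    # Hypothetical test function on R^3.
    return x[0] * x[1] + x[0] * x[2] ** 2 + x[1]

def grad_f(x):
    # Analytic gradient of the test function above.
    return [x[1] + x[2] ** 2, x[0] + 1.0, 2.0 * x[0] * x[2]]

n = 3
S = [0, 2]   # kept coordinates (0-based), so d = 2

def embed(y):
    # I_S y: put the entries of y at the positions in S, zeros elsewhere.
    x = [0.0] * n
    for k, j in enumerate(S):
        x[j] = y[k]
    return x

def restrict(v):
    # I_S^T v: keep only the entries of v at the positions in S.
    return [v[j] for j in S]

def g(y):
    return f(embed(y))

def numerical_grad(h, y, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    out = []
    for k in range(len(y)):
        yp = list(y)
        ym = list(y)
        yp[k] += eps
        ym[k] -= eps
        out.append((h(yp) - h(ym)) / (2 * eps))
    return out

y = [1.5, -0.5]
formula = restrict(grad_f(embed(y)))   # I_S^T grad f(I_S y)
numeric = numerical_grad(g, y)         # finite-difference gradient of g

x = [1.5, 2.0, -0.5]                   # same kept coordinates, but x[1] != 0
other = restrict(grad_f(x))            # I_S^T grad f(x)

print(formula, numeric, other)
```

Here `formula` and `numeric` agree up to finite-difference error, while `other` differs in its first entry. This shows why the point at which $\nabla_x f$ is evaluated matters: the dropped coordinate of $x$ still influences the kept partial derivatives.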

edited Sep 17 '21 at 10:59

answered Sep 17 '21 at 06:45


Could you make it a little more clear what's going on? – Mathemagician314 Sep 17 '21 at 10:49
@Mathemagician314 My answer was a bit confused, I at least had some unnecessary steps. Are there any steps in particular you find strange? – Paradox Sep 17 '21 at 11:04
@Paradox: it is not correct since $U_S\in \mathbb{R}^{n \times n}$ so $U_Sx$ is a vector in $\mathbb{R}^n$ and $I_sx \in \mathbb{R}^{d}$ – Saeed Sep 18 '21 at 16:48
@Sepide I assume you mean $I_S y \in \mathbb{R}^d$ since I have not written $I_S x$ anywhere. As far as I can see, $I_S$ is an $n \times d$ matrix, not $d \times n$, since it was the zero columns that were removed from $U_S$, not the zero rows. So $I_S y \in \mathbb{R}^n$. – Paradox Sep 18 '21 at 16:56

Rodrigo de Azevedo · Answer 2 · 2021-09-23T19:03:46.323

Let the fat matrix ${\bf S} \in \Bbb R^{d \times n}$ be

$${\bf S} := \begin{bmatrix} {\bf I}_d & {\bf O} \end{bmatrix} {\bf P}$$

where ${\bf P}$ is an $n \times n$ permutation matrix. Note that

$${\bf S} {\bf S}^\top = \begin{bmatrix} {\bf I}_d & {\bf O} \end{bmatrix} \underbrace{\,{\bf P} {\bf P}^\top}_{= {\bf I}_n} \begin{bmatrix} {\bf I}_d \\ {\bf O} \end{bmatrix} = {\bf I}_d$$

Let vector fields $\rho : \Bbb R^n \to \Bbb R^d$ and $\eta : \Bbb R^d \to \Bbb R^n$ be defined by

$$\rho := ({\bf x} \mapsto {\bf S} {\bf x}), \qquad \eta := ({\bf y} \mapsto {\bf S}^\top {\bf y})$$

and note that $\rho \circ \eta = \mbox{id}_{\Bbb R^d}$. Colloquially, if one "expands" and then "restricts", one ends up exactly where one started.
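
A small sanity check of the construction above, written as a plain-Python sketch. The permutation `perm`, the sizes `n` and `d`, and the helper names are assumptions chosen for illustration. In the question's notation, this ${\bf S}$ corresponds to $I_S^{\top}$ (up to the ordering of the kept coordinates), and ${\bf S}^\top {\bf S}$ corresponds to $U_S$.

```python
def matmul(A, B):
    # Matrix product for nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

n, d = 4, 2
perm = [2, 0, 3, 1]   # hypothetical permutation: row i of P is the unit vector e_perm[i]
P = [[1 if j == perm[i] else 0 for j in range(n)] for i in range(n)]
I_d_O = [[1 if j == i else 0 for j in range(n)] for i in range(d)]   # [I_d  O], d x n

S = matmul(I_d_O, P)            # first d rows of P: selects coordinates 2 and 0
SSt = matmul(S, transpose(S))   # S S^T, expected to be the d x d identity
StS = matmul(transpose(S), S)   # S^T S, an n x n diagonal 0/1 matrix

print(S, SSt, StS)
```

The product `SSt` being the identity is the matrix form of $\rho \circ \eta = \mbox{id}_{\Bbb R^d}$, whereas `StS` is only a projector, so $\eta \circ \rho$ is not the identity on $\Bbb R^n$.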

Given differentiable scalar field $f : \Bbb R^n \to \Bbb R$, let scalar field $g : \Bbb R^d \to \Bbb R$ be defined by

$$g := f \circ \eta$$

Hence,

$$\begin{aligned} g \left( {\bf y} + {\rm d} {\bf y} \right) = f \left( {\bf S}^\top {\bf y} + {\bf S}^\top {\rm d} {\bf y} \right) &= f \left( {\bf S}^\top {\bf y} \right) + \left\langle \nabla f \left( {\bf S}^\top {\bf y} \right) , {\bf S}^\top {\rm d} {\bf y} \right\rangle \\ &= f \left( {\bf S}^\top {\bf y} \right) + \left\langle {\bf S} \, \nabla f \left( {\bf S}^\top {\bf y} \right) , {\rm d} {\bf y} \right\rangle \end{aligned}$$

and, thus, the gradient of $g$ is

$$\nabla g \left( {\bf y} \right) = \color{blue}{{\bf S} \, \nabla f \left( {\bf S}^\top {\bf y} \right)}$$

or, more succinctly,

$$\boxed{\quad \nabla g = \color{blue}{\rho \circ \nabla f \circ \eta} \quad}$$

score 0 · Answer 3 · answered Oct 17 '21 at 14:34

$ \def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} $For typing convenience, name the gradient $$\eqalign{ p = \grad{f}{x} \\ }$$ and rename the matrices $I_S\to S$ and $U_S\to U$.

Also note that $\,U=SS^T\,$ and that $$y=S^Tx \qiq dy=S^Tdx$$ Write the differential of the function and rearrange it to recover the desired gradient. $$\eqalign{ g(y) &= f(Ux) \\ dg &= df \\ &= p:d(Ux) \\ &= p:\LR{U\,dx} \\ &= p:\LR{SS^T\,dx} \\ &= \LR{S^Tp}:\LR{S^T\,dx} \\ &= \LR{S^Tp}:dy \\ \grad{g}{y} &= S^Tp \\ }$$

In the preceding, a colon is used to denote the Frobenius product, which is a concise notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\ }$$
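
The rearrangement rules for the Frobenius product can be spot-checked on small matrices. The following plain-Python sketch uses arbitrary integer matrices and helper names (`frob`, `matmul`, `transpose`, `trace`) that are assumptions introduced for this example.

```python
def frob(A, B):
    # Frobenius product A:B = sum over i, j of A_ij * B_ij.
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4], [5, 6]]            # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]              # 2 x 3
C = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # 3 x 3

lhs = frob(C, matmul(A, B))                 # C : AB
mid = frob(matmul(C, transpose(B)), A)      # C B^T : A
rhs = frob(matmul(transpose(A), C), B)      # A^T C : B

AB = matmul(A, B)
as_trace = trace(matmul(transpose(C), AB))  # Tr(C^T (AB))

print(lhs, mid, rhs, as_trace)
```

All four printed values coincide. The step $p : SS^T\,dx = S^Tp : S^T\,dx$ in the derivation above is the rule $C:AB = A^TC:B$ with $C = p$, $A = S$, and $B = S^T\,dx$.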
