I'm reading this machine learning optimization paper: https://arxiv.org/pdf/2010.01412.pdf. In the last formula on page 3, the authors derive the following optimization problem:
$\boldsymbol{\epsilon}^*(\boldsymbol{w}) = \underset{\|\boldsymbol{\epsilon}\|_p \leq \rho}{\operatorname{argmax}}\; \boldsymbol{\epsilon}^{T}\nabla_{\boldsymbol{w}}L_s(\boldsymbol{w})$ (1)
They say this is a classical dual norm problem whose solution is:
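$\hat{\boldsymbol{\epsilon}}(\boldsymbol{w}) = \rho \operatorname{sign}\left(\nabla_{\boldsymbol{w}}L_s(\boldsymbol{w})\right) \left|\nabla_{\boldsymbol{w}}L_s(\boldsymbol{w})\right|^{q-1} / \left(\|\nabla_{\boldsymbol{w}}L_s(\boldsymbol{w})\|_q^q\right)^{\frac{1}{p}}$ (2)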
with $\frac{1}{p}+\frac{1}{q} = 1$, where $|\cdot|^{q-1}$ denotes the elementwise absolute value raised to the power $q-1$.
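As a sanity check (not a derivation), formula (2) can at least be tested numerically. Below is a minimal NumPy sketch under my own assumptions: a random vector `g` stands in for the gradient $\nabla_{w}L_s(w)$, the values $p = 3$ and $\rho = 0.05$ are arbitrary, and the function name `eps_hat` is mine, not the paper's. It checks that the candidate lies on the constraint boundary, that its objective value equals $\rho\,\|g\|_q$, and that random feasible points do not beat it.

```python
import numpy as np

def eps_hat(g, rho, p):
    """Formula (2): candidate maximizer of eps^T g subject to ||eps||_p <= rho.

    Assumes 1 < p < infinity so that the conjugate exponent q is finite.
    """
    q = p / (p - 1.0)                   # conjugate exponent: 1/p + 1/q = 1
    g_q_q = np.sum(np.abs(g) ** q)      # ||g||_q^q
    return rho * np.sign(g) * np.abs(g) ** (q - 1.0) / g_q_q ** (1.0 / p)

rng = np.random.default_rng(0)
g = rng.normal(size=10)                 # stand-in for the gradient (assumption)
rho, p = 0.05, 3.0                      # arbitrary illustrative values
q = p / (p - 1.0)

eps = eps_hat(g, rho, p)
norm_p = np.linalg.norm(eps, ord=p)     # p-norm of the candidate
value = eps @ g                         # objective value at the candidate
bound = rho * np.linalg.norm(g, ord=q)  # rho times the dual (q-)norm of g

# Random feasible points, rescaled onto the p-norm sphere of radius rho
trials = rng.normal(size=(10000, g.size))
trials *= rho / np.linalg.norm(trials, ord=p, axis=1, keepdims=True)
best_random = np.max(trials @ g)

print("||eps||_p      :", norm_p)
print("eps^T g        :", value)
print("rho * ||g||_q  :", bound)
print("best random try:", best_random)
```

In this sketch the first printed value should agree with `rho`, the second and third should agree with each other, and the last should not exceed them, which is consistent with (2) being the maximizer but does not explain where it comes from.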
Can anyone please show me how to solve optimization problem (1) to arrive at formula (2)? I would really appreciate it.