I am trying to understand the chain rule applied to a series of transformations in the context of the backpropagation algorithm for deep learning. Let $x \in \mathbb{R}^K$ and let $A, B$ be real-valued matrices of size $K \times K$. Then consider a network defined as $$y = Ax$$ $$u = \sigma (y)$$ $$v = Bx$$ $$z = A (u * v)$$ $$w = Az$$ $$L = ||w||^2$$
where $L$ is regarded as a function of $x$, $A$, and $B$; here $u * v$ denotes the element-wise (Hadamard) product, and $\sigma(y)$ applies the sigmoid function element-wise to $y$. Now I want to be able to calculate $\frac{\partial L }{\partial A}$ and $\frac{\partial L }{\partial B}$.
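To pin down the setup, here is a minimal numerical sketch of the network (the dimension $K = 4$ and the random inputs are arbitrary choices of mine; I'm only using JAX's autodiff as a reference to check any hand-derived formula against):

```python
import jax
import jax.numpy as jnp

def loss(A, B, x):
    y = A @ x
    u = jax.nn.sigmoid(y)   # element-wise sigmoid
    v = B @ x
    z = A @ (u * v)         # * is the element-wise (Hadamard) product
    w = A @ z
    return jnp.sum(w ** 2)  # L = ||w||^2

# Arbitrary test values (K = 4 is just an example dimension).
K = 4
kA, kB, kx = jax.random.split(jax.random.PRNGKey(0), 3)
A = jax.random.normal(kA, (K, K))
B = jax.random.normal(kB, (K, K))
x = jax.random.normal(kx, (K,))

# Reference gradients dL/dA and dL/dB (both K x K) from autodiff.
dL_dA, dL_dB = jax.grad(loss, argnums=(0, 1))(A, B, x)
print(dL_dA.shape, dL_dB.shape)  # (4, 4) (4, 4)
```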
From what I understand, $\frac{\partial L }{\partial A} = \frac {\partial {L}}{\partial w} \frac {\partial w} {\partial A}$.
I'm not sure how to express $\frac{\partial w} {\partial A}$, since $z$ is itself a function of $A$. My guess would be something like $\frac {\partial w}{\partial A} = \frac{\partial (Az)}{\partial A}\big|_{z \text{ held fixed}} + A \frac{\partial z}{\partial A}$, but I am not sure whether this step should be an application of the product rule or the chain rule.
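To make that concrete entry-wise (my notation: $E_{ij}$ denotes the $K \times K$ matrix with a $1$ in position $(i,j)$ and $0$ elsewhere), I think the product-rule reading of my guess, applied to a single entry $A_{ij}$, would be $$\frac{\partial w}{\partial A_{ij}} = E_{ij}\, z + A\, \frac{\partial z}{\partial A_{ij}},$$ since $A$ appears both explicitly in $w = Az$ and implicitly through $z$.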
I'm also not sure how to express $\frac {\partial z} {\partial A}$. Any insights would be appreciated.