Questions tagged [backpropagation]
Use for questions about backpropagation, which is commonly used in training neural networks in conjunction with an optimization method such as gradient descent.
310 questions
11
votes
1 answer
Synthetic Gradients good number of Layers & neurons
I would like to train my LSTM with a "synthetic gradients" Decoupled Neural Interface (DNI).
How to decide on the number of layers and neurons for my DNI?
Searching for them by trial and error or, what's worse, by a genetic
algorithm, which would…

Kari
- 2,726
- 2
- 20
- 49
8
votes
2 answers
Synthetic Gradients - what's the practical benefit?
I can see two motives to use Synthetic Gradients in an RNN:
To speed up training, by immediately correcting each layer with a predicted gradient
To be able to learn longer sequences
I see problems with both of them. Please note, I really like…

Kari
- 2,726
- 2
- 20
- 49
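For context, a minimal sketch of what a DNI-style synthetic-gradient module does, assuming a plain linear predictor; the names and the update rule below are illustrative, not the asker's setup. The module predicts $\partial L/\partial h$ from the activation $h$, which is what lets a layer update immediately, and is itself regressed onto the true gradient whenever that arrives:

    import numpy as np

    class SyntheticGradient:
        def __init__(self, dim, lr=0.01):
            self.M = np.zeros((dim, dim))   # linear gradient predictor
            self.lr = lr

        def predict(self, h):
            # Predicted dL/dh, available right after the forward pass,
            # so the layer below can update without waiting for backprop.
            return h @ self.M

        def update(self, h, true_grad):
            # Regress the predictor onto the true gradient once it arrives
            # (one SGD step on the mean squared prediction error).
            err = self.predict(h) - true_grad
            self.M -= self.lr * h.T @ err / len(h)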
8
votes
1 answer
How to apply the gradient of softmax in backprop
I recently did a homework assignment where I had to learn a model for MNIST 10-digit classification. The homework had some scaffolding code, and I was supposed to work within the context of this code.
My homework works and passes the tests, but now I'm trying to do it all…

SaldaVonSchwartz
- 299
- 1
- 3
- 7
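For reference, the identity that usually resolves this kind of exercise: with cross-entropy loss on softmax outputs and a one-hot target $y$, the gradient with respect to the logits is just $p - y$. A self-contained numerical check (all names illustrative):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())           # shift for numerical stability
        return e / e.sum()

    def loss(z, y):
        return -(y @ np.log(softmax(z)))  # cross-entropy, y one-hot

    z = np.array([2.0, 1.0, 0.1])
    y = np.array([1.0, 0.0, 0.0])
    analytic = softmax(z) - y             # claimed gradient dL/dz

    # finite-difference check of the analytic gradient
    eps = 1e-6
    numeric = np.zeros_like(z)
    for i in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[i] += eps
        zm[i] -= eps
        numeric[i] = (loss(zp, y) - loss(zm, y)) / (2 * eps)

    print(np.allclose(analytic, numeric))  # True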
4
votes
1 answer
Confusion in backpropagation algorithm
I have been trying to understand backpropagation for a while now.
I have come across two variants of it.
In the Andrew Ng class the derivatives of the weights of hidden layers are calculated using the error signal that is distributed back to…

lakshay taneja
- 73
- 5
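For reference, both variants usually reduce to the same layer-wise recursion (the notation here follows the common convention, e.g. in Ng's course, with superscripts indexing layers): the error signal is pulled back through the weights and gated by the activation derivative,

$$\delta^{(l)} = \big((W^{(l+1)})^{\top}\delta^{(l+1)}\big) \odot f'(z^{(l)}), \qquad \frac{\partial L}{\partial W^{(l)}} = \delta^{(l)}\,(a^{(l-1)})^{\top}$$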
4
votes
3 answers
What is backpropagation, actually?
I have a conceptual question about terminology that bothers me.
Is the backpropagation algorithm a neural network training algorithm, or is it just a recursive algorithm for calculating a Jacobian for a neural network? Then this Jacobian will be…

user3223137
- 63
- 4
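A common way to frame the distinction this question asks about: backpropagation is the reverse-mode chain rule that produces the gradient $\nabla_\theta L$ of the loss with respect to the parameters; the training algorithm is the optimizer that consumes that gradient, e.g.

$$\theta \leftarrow \theta - \alpha\,\nabla_\theta L$$

so backpropagation by itself trains nothing.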
3
votes
0 answers
Homework/class help: backward propagation of max pooling when each element in an array determines more than one value?
(This isn't actually my homework question; my homework didn't address it, which is why I'm confused.)
For example, if I have an array:
And I do max pooling with:
filter size = 2x2
stride = 1
I…

user127418
- 31
- 1
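A small sketch of the accumulation rule this question is about: with stride 1 the 2x2 windows overlap, and an element that is the max of several windows receives the sum of the corresponding upstream gradients. The array below is illustrative, since the question's own array was omitted from the excerpt:

    import numpy as np

    x = np.array([[1., 3.],
                  [2., 9.],
                  [4., 5.]])                 # 3x2 input
    grad_out = np.ones(2)                    # upstream gradient per window

    # Two 2x2 windows with stride 1 (rows 0-1 and rows 1-2) overlap on row 1.
    grad_in = np.zeros_like(x)
    for i in range(2):
        window = x[i:i + 2, :]
        r, c = np.unravel_index(window.argmax(), window.shape)
        grad_in[i + r, c] += grad_out[i]     # += sums grads for shared maxima

    print(grad_in)   # 9 is the max of both windows, so it receives 2.0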
2
votes
2 answers
Backpropagation During Neural Networks Training - Units while updating weights
I found this article that describes how neural networks work. This paragraph near the end caught my eye and explains how weights are updated:
So we see that $\theta_i := \theta_i + \nabla\theta_i$ where…

E. Kaufman
- 21
- 3
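For reference, in the usual gradient-descent convention the update the article describes carries an explicit learning rate and a minus sign; the article's $\nabla\theta_i$ presumably denotes the full update step rather than the raw gradient:

$$\theta_i := \theta_i + \Delta\theta_i, \qquad \Delta\theta_i = -\alpha\,\frac{\partial L}{\partial \theta_i}$$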
2
votes
3 answers
A good reference for the back propagation algorithm?
I'm trying to learn more about the fundamentals of neural networks. I feel like I understand the basics of back propagation, but I want to solidify the details in my mind.
I was working through Ian Goodfellow's famous Deep Learning text. However, I…

Tac-Tics
- 1,360
- 2
- 9
- 6
1
vote
0 answers
How to derive gradients for softmax function
We have the following feedforward equations:
$z_1 = W_1x + b_1$
$a_1 = f(z_1)$
$z_2 = W_2a_1 + b_2$
$a_2 = y^* = \operatorname{softmax}(z_2)$
$L(y, y^*) = -\frac{1}{N}\sum_{n \in N} \sum_{i \in C} y_{n,i} \log{y^*_{n,i}}$
Now, I'm trying to compute the following…

py1123
- 11
- 1
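For reference, the derivation this question is after usually goes through the combined softmax-plus-cross-entropy gradient. Using the softmax Jacobian $\frac{\partial y^*_i}{\partial z_{2,k}} = y^*_i(\delta_{ik} - y^*_k)$ and a one-hot $y$ (single sample, dropping the $\frac{1}{N}$ average), the gradient collapses to the standard result:

$$\frac{\partial L}{\partial z_{2,k}} = -\sum_i \frac{y_i}{y^*_i}\, y^*_i(\delta_{ik} - y^*_k) = y^*_k - y_k, \qquad \frac{\partial L}{\partial W_2} = (y^* - y)\, a_1^{\top}$$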
1
vote
1 answer
"residual error" of LSTM during backprop vs usual "error"
What does "residual error" mean when we are talking about LSTMs?
The term comes from the middle of section 3 of this paper, where it says:
"...of the residual error $\epsilon$"
Where $s_0$ is the initial state of the RNN network.
Question: how is a…

Kari
- 2,726
- 2
- 20
- 49
0
votes
0 answers
Backprop: backward pass way faster than forward pass
I started to work with my own implementation of the backpropagation algorithm, which I made five years ago. For each training sample (input-output pair), I do a forward pass (to compute the outputs of each neuron) and a backward pass (to compute the "Deltas" for…

Ivan Kuckir
- 101
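As a rough sanity check (standard flop counting, not specific to this implementation): per dense layer, the forward pass costs one matrix product, while the backward pass costs two, one to propagate the error and one for the weight gradients, so the backward pass would normally be somewhat slower, not faster:

$$\text{forward: } z = W a \;\sim\; n_{\text{out}}\, n_{\text{in}}, \qquad \text{backward: } W^{\top}\delta \text{ and } \delta\, a^{\top} \;\sim\; 2\, n_{\text{out}}\, n_{\text{in}}$$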
0
votes
0 answers
Forward or reverse accumulation in DL frameworks
Automatic differentiation can be accomplished using forward or reverse accumulation.
Quoting Wikipedia:
Which mode is used for implementation in DL frameworks, and why?
Is it motivated by complexity considerations, as in the…
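For context, a minimal sketch contrasting the two modes (jax.jvp and jax.vjp are real JAX entry points; the toy function is illustrative). DL frameworks implement reverse accumulation because a loss maps many parameters to a single scalar, so one reverse pass yields the entire gradient, while forward mode needs one pass per input direction:

    import jax
    import jax.numpy as jnp

    def f(w):                                   # toy "loss": R^3 -> R
        return jnp.sum(jnp.tanh(w) ** 2)

    w = jnp.array([0.1, 0.2, 0.3])

    # Forward mode (jvp): one pass per input direction; the full gradient
    # of an n-dimensional input needs n passes.
    _, dw0 = jax.jvp(f, (w,), (jnp.array([1.0, 0.0, 0.0]),))

    # Reverse mode (vjp): a single pass gives the whole gradient, which is
    # why backpropagation is reverse accumulation specialized to scalars.
    _, f_vjp = jax.vjp(f, w)
    (grad,) = f_vjp(1.0)

    print(dw0, grad)   # dw0 equals grad[0]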