Questions tagged [backpropagation]

Use for questions about Backpropagation, which is commonly used in training Neural Networks in conjunction with an optimization method such as gradient descent.

310 questions
11 votes • 1 answer

Synthetic Gradients: good number of layers & neurons

I would like to train my LSTM with a "synthetic gradients" Decoupled Neural Interface (DNI). How do I decide on the number of layers and neurons for my DNI? Searching for them by trial and error, or what's worse, by genetic algorithm, which would…
Kari • 2,726 • 2 • 20 • 49
8 votes • 2 answers

Synthetic Gradients - what's the practical benefit?

I can see two motives to use Synthetic Gradients in an RNN: to speed up training, by immediately correcting each layer with a predicted gradient, and to be able to learn longer sequences. I see problems with both of them. Please note, I really like…
Kari • 2,726 • 2 • 20 • 49
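The immediate-correction idea in the question can be made concrete. Below is a minimal numpy sketch, assuming a single linear layer, a linear synthetic-gradient model, and a dummy quadratic loss; all of these are hypothetical simplifications for illustration, not the DNI paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(scale=0.1, size=(4, 4))   # the layer being trained
M = np.zeros((4, 4))                     # synthetic-gradient model (here: linear)
lr = 0.01

for step in range(100):
    x = rng.normal(size=(1, 4))
    h = x @ W                            # forward through the layer

    # (1) Decoupled update: correct the layer immediately with the
    #     *predicted* gradient dL/dh ~= h @ M, before the true loss exists.
    g_pred = h @ M
    W -= lr * (x.T @ g_pred)             # dL/dW = x^T (dL/dh)

    # (2) When the true gradient arrives (dummy loss L = 0.5*||h||^2,
    #     so dL/dh = h), regress the predictor M toward it.
    g_true = h
    M -= lr * (h.T @ (g_pred - g_true))
```

This only illustrates the mechanics (update now with a prediction, train the predictor later); it says nothing about whether the speedup pays off in practice, which is what the question is really asking.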
8 votes • 1 answer

How to apply the gradient of softmax in backprop

I recently did a homework assignment where I had to learn a model for MNIST 10-digit classification. The HW had some scaffolding code, and I was supposed to work in the context of this code. My homework works / passes tests, but now I'm trying to do it all…
SaldaVonSchwartz • 299 • 1 • 3 • 7
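For this classic homework setting, the softmax Jacobian never has to be materialized: once softmax is followed by cross-entropy, the gradient with respect to the logits collapses to the probabilities minus the one-hot target. A minimal numpy sketch (the logits and label are made-up values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

z = np.array([[2.0, 1.0, 0.1]])   # logits for one sample
y = np.array([[1.0, 0.0, 0.0]])   # one-hot label
p = softmax(z)
loss = -np.sum(y * np.log(p))     # cross-entropy

# Backward: for softmax + cross-entropy combined, dL/dz = p - y,
# so no explicit softmax Jacobian is needed in backprop.
dz = p - y
```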
4 votes • 1 answer

Confusion in backpropagation algorithm

I have been trying to understand backpropagation for a while now. I have come across two variants of it. In the Andrew Ng class, the derivatives of the weights of hidden layers are calculated using the error signal that is distributed back to…
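The two variants compute the same thing; the Andrew Ng formulation just names the intermediate quantity. Each layer's error signal (delta) is the next layer's delta pulled back through the weights, times the local activation derivative. A minimal numpy sketch for a two-layer sigmoid network with squared-error loss (shapes and data are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x  = rng.normal(size=(3, 1))                # input
t  = rng.normal(size=(2, 1))                # target
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

# Forward pass
z1 = W1 @ x;  a1 = sigmoid(z1)
z2 = W2 @ a1; a2 = sigmoid(z2)

# Backward pass, "error signal" style: distribute the output error back.
delta2 = (a2 - t) * a2 * (1 - a2)           # output-layer delta
delta1 = (W2.T @ delta2) * a1 * (1 - a1)    # hidden-layer delta

dW2 = delta2 @ a1.T                         # weight gradients
dW1 = delta1 @ x.T
```

Expanding the deltas recovers the plain chain-rule derivation, which is why both variants give identical gradients.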
4 votes • 3 answers

What is backpropagation, actually?

I have a conceptual question due to terminology that bothers me. Is the backpropagation algorithm a neural network training algorithm, or is it just a recursive algorithm for calculating the Jacobian of a neural network? This Jacobian will then be…
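A toy way to see the distinction being asked about: backpropagation only computes derivatives; training is whatever algorithm then consumes them. In the sketch below, the "backprop" step for $L(w) = (w - 1)^2$ is written out by hand, and the gradient-descent update is a separate line:

```python
w = 3.0
for _ in range(50):
    grad = 2.0 * (w - 1.0)   # gradient computation (what backprop automates)
    w -= 0.1 * grad          # training step (gradient descent), a separate algorithm
print(w)                     # converges toward the minimum at w = 1
```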
3 votes • 0 answers

Homework/class help: backward propagation of max pooling when each element in an array determines more than one value?

(This isn't actually my homework; I was confused because my homework hadn't addressed this case.) For example, if I have an array, and I do max pooling with filter size = 2x2, stride = 1, I…
user127418 • 31 • 1
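When the stride is smaller than the filter size, windows overlap, so a single input element can be the argmax of several windows; its gradient is then the sum of the corresponding upstream gradients. A minimal numpy sketch (the maxpool_backward helper and the toy 3x3 input are made up for illustration):

```python
import numpy as np

def maxpool_backward(x, dout, k=2, stride=1):
    """Route each output gradient to the argmax of its window, accumulating."""
    dx = np.zeros_like(x)
    oh = (x.shape[0] - k) // stride + 1
    ow = (x.shape[1] - k) // stride + 1
    for i in range(oh):
        for j in range(ow):
            win = x[i*stride:i*stride+k, j*stride:j*stride+k]
            mi, mj = np.unravel_index(np.argmax(win), win.shape)
            dx[i*stride + mi, j*stride + mj] += dout[i, j]  # += accumulates overlaps
    return dx

x = np.array([[0., 0., 0.],
              [0., 9., 0.],
              [0., 0., 0.]])
dout = np.ones((2, 2))   # 2x2 filter, stride 1 on a 3x3 input -> 2x2 output
print(maxpool_backward(x, dout))
# The center element is the max of all four overlapping windows,
# so it receives gradient 1+1+1+1 = 4.
```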
2 votes • 2 answers

Backpropagation during neural network training - units while updating weights

I found this article that describes how neural networks work. This paragraph near the end caught my eye; it explains how weights are updated: So we see that $\theta_i := \theta_i + \nabla\theta_i$, where…
E. Kaufman • 21 • 3
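For comparison, the usual gradient-descent form of this update carries an explicit learning rate and a minus sign, since the parameters move against the gradient of the loss:

$$\theta_i := \theta_i - \alpha \frac{\partial L}{\partial \theta_i}$$

The article's $\theta_i := \theta_i + \nabla\theta_i$ presumably folds the learning rate and the sign into its definition of $\nabla\theta_i$.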
2 votes • 3 answers

A good reference for the back propagation algorithm?

I'm trying to learn more about the fundamentals of neural networks. I feel like I understand the basics of backpropagation, but I want to solidify the details in my mind. I was working through Ian Goodfellow's famous Deep Learning text. However, I…
Tac-Tics • 1,360 • 2 • 9 • 6
1 vote • 0 answers

How to derive gradients for softmax function

We have the following feedforward equations: $z_1 = W_1 x + b_1$, $a_1 = f(z_1)$, $z_2 = W_2 a_1 + b_2$, $a_2 = y^* = softmax(z_2)$, and the loss $L(y, y^*) = -\frac{1}{N}\sum_{n \in N} \sum_{i \in C} y_{n,i} \log{y^*_{n,i}}$. Now, I'm trying to compute the following…
py1123 • 11 • 1
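For this particular pairing of softmax and cross-entropy, the derivation collapses: the gradient with respect to the logits $z_2$ is the predicted distribution minus the true one (averaged over the batch), and everything upstream follows by the chain rule. Sketching the standard steps in the question's own notation:

$$\frac{\partial L}{\partial z_2} = \frac{1}{N}\left(y^* - y\right), \qquad
\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial z_2}\, a_1^\top, \qquad
\frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial z_2},$$

$$\frac{\partial L}{\partial a_1} = W_2^\top \frac{\partial L}{\partial z_2}, \qquad
\frac{\partial L}{\partial z_1} = \frac{\partial L}{\partial a_1} \odot f'(z_1), \qquad
\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial z_1}\, x^\top.$$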
1 vote • 1 answer

"residual error" of LSTM during backprop vs usual "error"

What does the residual error mean when we are talking about an LSTM? Taken from the middle of section 3 of this paper, where it says: "...of the residual error $\epsilon$", where $s_0$ is the initial state of the RNN. Question: how is a…
Kari • 2,726 • 2 • 20 • 49
0 votes • 0 answers

Backprop: backward pass way faster than forward pass

I started working again with my own implementation of the backpropagation algorithm, which I made five years ago. For each training sample (input-output pair), I make a forward pass (to compute the outputs of each neuron) and a backward pass (to compute "deltas" for…
0 votes • 0 answers

Forward or reverse accumulation in DL frameworks

Automatic differentiation can be accomplished using forward or reverse accumulation. Quoting Wikipedia: which mode do DL frameworks use for their implementation, and why? Does it have any motivation from the issue of complexity, as in the…
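The standard answer hinges on complexity. For $f:\mathbb{R}^n \to \mathbb{R}^m$, forward accumulation needs roughly one sweep per input ($n$ sweeps for the full Jacobian), while reverse accumulation needs one sweep per output ($m$ sweeps):

$$\text{forward mode: } O(n) \text{ sweeps}, \qquad \text{reverse mode: } O(m) \text{ sweeps}.$$

A training loss maps millions of parameters to a single scalar ($n \gg m = 1$), so reverse accumulation, which is exactly backpropagation, recovers the entire gradient in one backward sweep; this is why DL frameworks implement reverse mode (some also offer forward mode for cases like Jacobian-vector products).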