I am new to PyTorch and started with this GitHub code. I do not understand the comment on lines 60-61 of that code: "because weights have requires_grad=True, but we don't need to track this in autograd". I understand that we set requires_grad=True on the variables we need to calculate gradients for with autograd, but what does it mean to be "tracked by autograd"?
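For reference, here is a minimal sketch of the kind of update step such a comment usually sits above (the exact code in the linked repository may differ; the names w1, w2 and learning_rate are assumptions):

import torch

# Toy data and weights; requires_grad=True so autograd computes their gradients.
x = torch.randn(64, 1000)
y = torch.randn(64, 10)
w1 = torch.randn(1000, 100, requires_grad=True)
w2 = torch.randn(100, 10, requires_grad=True)
learning_rate = 1e-6

y_pred = x.mm(w1).clamp(min=0).mm(w2)   # forward pass
loss = (y_pred - y).pow(2).sum()
loss.backward()                          # populates w1.grad and w2.grad

# Because the weights have requires_grad=True, but we don't need to track
# the update itself in autograd, the update is wrapped in torch.no_grad():
with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad
    w1.grad.zero_()
    w2.grad.zero_()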

4 Answers
The wrapper with torch.no_grad() temporarily sets all of the requires_grad flags to False. Here is an example from the official PyTorch tutorial:
x = torch.randn(3, requires_grad=True)
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
print((x ** 2).requires_grad)
Output:
True
True
False
I recommend reading all the tutorials at the link above.
In your example, I guess the author does not want PyTorch to calculate gradients for the newly defined variables w1 and w2, since he just wants to update their values.
torch.no_grad() deactivates the autograd engine. This reduces memory usage and speeds up computations.
Uses of torch.no_grad():
To perform inference without gradient calculation.
To make sure there is no leakage of test data into the model.
It is generally used when performing validation: because no gradients are stored, one can use a larger validation batch size, as sketched below.
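A typical validation sketch (model, val_loader and criterion are assumed to be defined elsewhere):

import torch

def validate(model, val_loader, criterion, device="cpu"):
    model.eval()                      # eval mode for dropout/batchnorm layers
    total_loss = 0.0
    with torch.no_grad():             # no graph is built, saving memory
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            total_loss += criterion(outputs, targets).item()
    return total_loss / len(val_loader)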

with torch.no_grad() makes all the operations in the block compute no gradients.
In PyTorch, you can't change w1 and w2 in place, because they are two variables with requires_grad=True. I think in-place changes to w1 and w2 are avoided because they would cause an error in the backpropagation calculation, since an in-place change completely overwrites w1 and w2.
However, if you use no_grad(), the new w1 and new w2 have no gradients, since they are produced by untracked operations. That means you only change the values of w1 and w2, not their gradient part; they still carry the previously defined gradient information, and backpropagation can continue.
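A small illustration of both points (a sketch; the exact error message may vary between PyTorch versions):

import torch

w1 = torch.randn(3, requires_grad=True)
loss = (w1 ** 2).sum()
loss.backward()

# In-place update of a leaf tensor that requires grad raises a RuntimeError,
# roughly: "a leaf Variable that requires grad is being used in an in-place operation"
try:
    w1 -= 0.1 * w1.grad
except RuntimeError as e:
    print(e)

# Inside no_grad() the same in-place update is allowed: only the values of w1
# change; its requires_grad flag and existing .grad are left untouched.
with torch.no_grad():
    w1 -= 0.1 * w1.grad
print(w1.requires_grad)  # True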

Hey, I looked at the code and I can't see any problem with not having the torch.no_grad() line. I mean anyway we clear the grad so that tracking should not matter. Please correct me if I am wrong! – Black Jack 21 Apr 07 '20 at 10:42
I think if we do not use torch.no_grad, then the weight update step will be added to the computational graph of the neural network, which is not desired.
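A quick way to see this (a sketch): without no_grad() an out-of-place update gets a grad_fn, i.e. the update itself becomes a node in the graph, while inside no_grad() it does not.

import torch

w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()

# Without no_grad(): the result of the update is a non-leaf tensor that
# remembers the subtraction, so the update is part of the graph.
w_tracked = w - 0.1 * w.grad
print(w_tracked.grad_fn)    # e.g. <SubBackward0 ...>

# With no_grad(): the update is just a value change, nothing is recorded.
with torch.no_grad():
    w_untracked = w - 0.1 * w.grad
print(w_untracked.grad_fn)  # None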

[…] autograd? Is there a memory allocation reason or what? Thanks! – desmond13 Mar 30 '20 at 08:36
[…] model.eval() would mean that I didn't need to also use torch.no_grad(). Turns out that both have different goals: model.eval() will ensure that layers like batchnorm or dropout work in eval mode instead of training mode, whereas torch.no_grad() is used for the reason specified above in the answer. Ideally, one should use both in the evaluation phase. – Lakshay Sharma Apr 23 '20 at 22:34
torch.no_grad() does not set "all of the requires_grad flags" to False; it only sets these to False for new tensors. requires_grad will not be set to False for the parameters of the model (or, in this example, for the original tensor x). – eric.mitchell Sep 15 '21 at 17:33
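A quick check of that last point (a sketch):

import torch

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    y = x * 2
    print(x.requires_grad)  # True  - the flag on x itself is unchanged
    print(y.requires_grad)  # False - only newly created tensors are affected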