Some RL literature uses terms such as 'Bellman backup' and 'Bellman error'. What do these terms refer to?

nbro
user529295
  • There's already an answer that addresses both concerns/questions, but, please, next time, focus on one question per post, although, in this case, the terms are highly related (but I still think these "simple" questions could have been asked in separate posts). It may also be a good idea to provide more context (e.g. a link to an article that mentions these terms), although, again, in this case, anyone familiar with RL would be able to understand the question. – nbro Jun 28 '21 at 13:15

1 Answer

A Bellman backup is an application of a Bellman operator. For example, the step

$$ V(x)\leftarrow \alpha(R + \mathbf{E}[V(x')]) + (1-\alpha)V(x) $$

is a Bellman backup with learning rate $\alpha$, where $R$ is the reward and $x'$ is the next state.
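As a rough illustration, the update above could be sketched in Python for a small tabular problem. The function name `bellman_backup`, the three-state chain, and the transition probabilities are all hypothetical, chosen only to make the formula concrete; like the formula, the sketch has no discount factor and assumes the next-state distribution is known.

```python
def bellman_backup(V, x, reward, next_state_probs, alpha):
    """Return a copy of V after one backup at state x.

    Computes V(x) <- alpha * (R + E[V(x')]) + (1 - alpha) * V(x).
    No discount factor is used, matching the formula in the answer.
    """
    # E[V(x')] under the (assumed known) next-state distribution
    expected_next = sum(p * v for p, v in zip(next_state_probs, V))
    target = reward + expected_next  # R + E[V(x')]
    V_new = list(V)  # leave the input value table untouched
    V_new[x] = alpha * target + (1 - alpha) * V[x]
    return V_new


# Hypothetical 3-state chain; state 2 plays the role of a terminal state with value 0.
V = [0.0, 1.0, 0.0]
probs_from_0 = [0.0, 0.5, 0.5]  # assumed transition probabilities out of state 0
V1 = bellman_backup(V, x=0, reward=1.0, next_state_probs=probs_from_0, alpha=0.5)
```

With $\alpha = 1$ the old estimate is discarded and $V(x)$ is set directly to the target $R + \mathbf{E}[V(x')]$; smaller values of $\alpha$ move $V(x)$ only part of the way there.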

A Bellman error is

$$ d(V(x), R + \mathbf{E}[V(x')]) $$

for some distance function $d$, usually the squared difference $d(u, v) = (u-v)^2$.
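A minimal sketch of the squared Bellman error at a single state, under the same assumptions as the formula (no discount factor, known next-state distribution). The name `bellman_error` and the example numbers are hypothetical.

```python
def bellman_error(V, x, reward, next_state_probs):
    """Squared Bellman error at state x: (V(x) - (R + E[V(x')]))^2."""
    # E[V(x')] under the (assumed known) next-state distribution
    expected_next = sum(p * v for p, v in zip(next_state_probs, V))
    target = reward + expected_next  # R + E[V(x')]
    return (V[x] - target) ** 2


# Hypothetical 3-state example; state 2 has value 0.
probs_from_0 = [0.0, 0.5, 0.5]  # assumed transition probabilities out of state 0
err_before = bellman_error([0.0, 1.0, 0.0], 0, 1.0, probs_from_0)
# If V(0) already equals the target R + E[V(x')], the error vanishes.
err_at_fixed_point = bellman_error([1.5, 1.0, 0.0], 0, 1.0, probs_from_0)
```

The error is zero exactly when $V(x)$ agrees with its own backup target, so it measures how far the current estimate is from being consistent at $x$.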

harwiltz