What do the terms 'Bellman backup' and 'Bellman error' mean?

Question

Some RL literature uses terms such as 'Bellman backup' and 'Bellman error'. What do these terms refer to?

There's already an answer that addresses both concerns/questions, but, please, next time, focus on one question per post, although, in this case, the terms are highly related (but I still think these "simple" questions could have been asked in separate posts). It may also be a good idea to provide more context (e.g. a link to an article that mentions these terms), although, again, in this case, anyone familiar with RL would be able to understand the question. — nbro, Jun 28 '21 at 13:15

score 3 · Accepted Answer · answered Jun 28 '21 at 12:01


A Bellman backup is an application of a Bellman operator. For example, the step

$$ V(x)\leftarrow \alpha(R + \mathbf{E}[V(x')]) + (1-\alpha)V(x) $$

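is a Bellman backup for some learning rate $\alpha$, where $V$ is the current value estimate, $R$ is the reward, and $x'$ is the state that follows $x$.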
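To make the update concrete, here is a minimal sketch of that backup applied to every state at once. The three-state chain, the transition matrix `P`, the reward vector `R`, and the function name `bellman_backup` are all assumptions made up for illustration. State 2 is absorbing with zero reward, so the undiscounted values stay finite.

```python
import numpy as np

def bellman_backup(V, R, P, alpha=0.5):
    """One synchronous backup: V(x) <- alpha*(R(x) + E[V(x')]) + (1 - alpha)*V(x)."""
    target = R + P @ V  # R(x) + E[V(x')], expectation taken under P
    return alpha * target + (1.0 - alpha) * V

# Hypothetical chain 0 -> 1 -> 2; state 2 is absorbing and gives no reward.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 2.0, 0.0])

V = np.zeros(3)
for _ in range(100):
    V = bellman_backup(V, R, P, alpha=0.5)
# V is now approximately [3, 2, 0]
```

With $\alpha = 1$ and $V$ initialised to zero, the first backup gives `[1, 2, 0]` and the second gives `[3, 2, 0]`: the reward collected at state 1 reaches state 0 only on the second sweep, which is the "later states to earlier states" propagation the name refers to.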

A Bellman error is

$$ d(V(x), R + \mathbf{E}[V(x')]) $$

for some distance function $d$, usually the squared difference $d(a, b) = (a-b)^2$.
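As a rough illustration, the squared Bellman error can be computed per state like this. The three-state chain, `P`, `R`, and the function name `bellman_error` are hypothetical choices for the example, not anything taken from a specific library.

```python
import numpy as np

def bellman_error(V, R, P):
    """Per-state squared Bellman error: (V(x) - (R(x) + E[V(x')]))**2."""
    target = R + P @ V  # R(x) + E[V(x')], expectation taken under P
    return (V - target) ** 2

# Hypothetical chain 0 -> 1 -> 2; state 2 is absorbing and gives no reward.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 2.0, 0.0])

err_init = bellman_error(np.zeros(3), R, P)                  # [1, 4, 0]
err_fixed = bellman_error(np.array([3.0, 2.0, 0.0]), R, P)   # [0, 0, 0]
```

The error is zero exactly when $V$ satisfies $V(x) = R + \mathbf{E}[V(x')]$ at every state, which is why it is commonly used as a loss to minimise.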


harwiltz

What does 'backup' refer to here? – user529295 Jun 28 '21 at 12:55

It refers to propagating information from later states to earlier ones (backward in time sorta) – harwiltz Jun 28 '21 at 13:02

It may be a good idea to 1. also provide the figures of the backups e.g. that you can find in Sutton and Barto's book, 2. to link the OP to this question about what the Bellman operator is and 3. explain the symbols in your answer. – nbro Jun 28 '21 at 13:12
@nbro Thanks for the references. – user529295 Jun 28 '21 at 13:18
