I've been working on an implementation of TD-Gammon. The paper/project I'm basing my implementation on is here:
https://www.cs.cornell.edu/boom/2001sp/Tsinteris/gammon.htm
Everything makes sense to me up until the procedure for backpropagation. I haven't taken much upper-level math past Calc II, and I've never taken a formal course on ML/RL.
The description:
Backpropagation procedure:

1. Given an input vector V and a desired output O, calculate the error E between the network's output on V and the desired output O.
2. e(s) = (lambda)*e(s) + grad(V)
3. V = V + (alpha)*error(n)*e(s)

where error(n) is:

- For the weight between hidden node i and the output node: error(i) = E*activation(i)*weight(i)
- For the weight between input node j and hidden node i: error(j,i) = error(i)*activation(j)*weight(j,i)
The main points I'm confused about are:
What information is included in the "eligibility trace vector" e(s)?
What is "(lambda)" in step 2?
What is "grad(V)" in step 2. Does it stand for the gradient? and if so what does this mean?
What is meant by "alpha" in step 3?
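In case it helps to see where I'm at, here is my best-guess translation of the procedure into Python/NumPy. Everything below is my own interpretation (the names alpha, lam, and the per-layer traces e1/e2 are mine, not the paper's), and it may be wrong in exactly the places I'm asking about:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TDNet:
    """Single-hidden-layer value network with what I think are
    TD(lambda) eligibility traces. This is my guess at the procedure,
    not code from the paper."""

    def __init__(self, n_in, n_hidden, alpha=0.1, lam=0.7):
        self.alpha = alpha  # my guess: "alpha" in step 3 is a learning rate
        self.lam = lam      # my guess: "lambda" in step 2 is a trace-decay rate
        self.W1 = np.random.uniform(-0.5, 0.5, (n_hidden, n_in))
        self.w2 = np.random.uniform(-0.5, 0.5, n_hidden)
        # My guess: e(s) holds one trace per weight, so same shapes as the weights
        self.e1 = np.zeros_like(self.W1)
        self.e2 = np.zeros_like(self.w2)

    def forward(self, x):
        self.x = x
        self.h = sigmoid(self.W1 @ x)       # hidden activations
        self.y = sigmoid(self.w2 @ self.h)  # scalar value estimate
        return self.y

    def update(self, td_error):
        # My guess: grad(V) is the gradient of the network's output with
        # respect to every weight, computed via the chain rule.
        dy = self.y * (1.0 - self.y)                   # sigmoid derivative at the output
        grad_w2 = dy * self.h
        dh = self.h * (1.0 - self.h)                   # sigmoid derivative at the hidden layer
        grad_W1 = np.outer(dy * self.w2 * dh, self.x)
        # Step 2: e(s) = (lambda)*e(s) + grad(V)
        self.e2 = self.lam * self.e2 + grad_w2
        self.e1 = self.lam * self.e1 + grad_W1
        # Step 3: weights = weights + (alpha)*error*e(s)
        self.w2 += self.alpha * td_error * self.e2
        self.W1 += self.alpha * td_error * self.e1
```

My plan was to call forward() on each successive board position, pass the difference between successive outputs in as td_error, and zero the traces at the start of each game. Is that the right reading of steps 2 and 3?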
Any help or resources would be greatly appreciated.