Specialized for prediction problems.
Conventional prediction: error between predicted and actual
Temporal Difference: error (difference) between temporally successive predictions
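A minimal Python sketch contrasting the two error signals on a hypothetical sequence of predictions $P_1, \dots, P_4$ of an outcome $z$. The function names and the numbers are illustrative only. The final check shows that the conventional error $z - P_t$ equals the sum of the TD errors from step $t$ onward. TD therefore sees the same total error, split across successive steps.

```python
import math

def supervised_errors(preds, z):
    """Conventional error for each prediction: actual outcome minus prediction."""
    return [z - p for p in preds]

def td_errors(preds, z):
    """TD error for each prediction: next prediction minus current prediction.
    The 'next prediction' after the last one is the actual outcome z."""
    successors = preds[1:] + [z]
    return [nxt - p for nxt, p in zip(successors, preds)]

# Hypothetical predictions P_1..P_4 of an outcome z (values exact in binary).
preds = [0.5, 0.75, 0.25, 0.875]
z = 1.0

sup = supervised_errors(preds, z)  # [0.5, 0.25, 0.75, 0.125]
td = td_errors(preds, z)           # [0.25, -0.5, 0.625, 0.125]

# The conventional error for P_t equals the sum of the TD errors from t onward.
for t in range(len(preds)):
    assert math.isclose(sup[t], sum(td[t:]))
```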
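Advantage over conventional supervised learning from the actual outcome: TD is more incremental. Each update needs only the next prediction instead of waiting until the final outcome is known, which makes it cheaper to compute.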
The parameter λ introduces a recency bias. When a new difference is observed, more recent predictions are adjusted more strongly than older ones, with the weights decaying exponentially in λ.
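Formally, for $\lambda \in [0, 1]$, we have
$$\Delta w_t = \alpha \, (P_{t+1} - P_t) \sum_{k=1}^{t} \lambda^{t-k} \, \nabla_w P_k,$$
where $w$ are the weights, $\alpha$ is the learning rate, and $P_t$ is the prediction at step $t$. Summed over a sequence of $m$ observations (with $P_{m+1} = z$, the actual outcome), this adjusts each prediction $P_t$ in proportion to the weighted sum of future differences
$$\sum_{k=t}^{m} \lambda^{k-t} \, (P_{k+1} - P_k).$$
Note that if $\lambda = 1$ we have a uniform weighting. All future differences count equally, and the sum telescopes to $z - P_t$, the conventional supervised error.
If $\lambda = 0$ we have a singleton weighting, where only the next difference $P_{t+1} - P_t$ matters.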
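A minimal Python sketch of TD(λ) over one observation sequence. It assumes a linear predictor $P_t = w \cdot x_t$, so that $\nabla_w P_t = x_t$; Sutton's formulation allows any differentiable predictor. `td_lambda_update` follows the backward view. It keeps an exponentially decaying sum of past gradients (an eligibility trace, $e_t = \lambda e_{t-1} + x_t$) and scales it by each TD error $P_{t+1} - P_t$. `forward_view_update` instead moves each $P_t$ by the λ-weighted sum of future differences. Both function names and the toy data are hypothetical. In both functions, $w$ is held fixed within the sequence and the accumulated change is applied afterwards.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors (lists of floats)."""
    return sum(ai * bi for ai, bi in zip(a, b))

def td_lambda_update(w, xs, z, alpha, lam):
    """Backward view: accumulated TD(lambda) weight change for one sequence.

    Assumes a linear predictor P_t = w . x_t, so grad_w P_t = x_t.
    w is held fixed during the sequence; the caller adds the returned
    change to w once the outcome z has been observed.
    """
    preds = [dot(w, x) for x in xs] + [z]   # P_1..P_m, then P_{m+1} = z
    trace = [0.0] * len(w)                   # e_t = sum_k lam^(t-k) x_k
    delta = [0.0] * len(w)
    for t, x in enumerate(xs):
        # Decay older gradients by lam and add the current one.
        trace = [lam * e + xi for e, xi in zip(trace, x)]
        err = preds[t + 1] - preds[t]        # TD error P_{t+1} - P_t
        delta = [d + alpha * err * e for d, e in zip(delta, trace)]
    return delta

def forward_view_update(w, xs, z, alpha, lam):
    """Forward view: each P_t is moved by lam-weighted future differences."""
    preds = [dot(w, x) for x in xs] + [z]
    m = len(xs)
    delta = [0.0] * len(w)
    for t, x in enumerate(xs):
        # sum_{k >= t} lam^(k-t) (P_{k+1} - P_k); note 0.0 ** 0 == 1.0
        g = sum(lam ** (k - t) * (preds[k + 1] - preds[k]) for k in range(t, m))
        delta = [d + alpha * g * xi for d, xi in zip(delta, x)]
    return delta

# Hypothetical example: three one-hot states visited in order, outcome z = 1.
w = [0.5, 0.5, 0.5]
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for lam in (0.0, 0.5, 1.0):
    back = td_lambda_update(w, xs, 1.0, alpha=0.5, lam=lam)
    fwd = forward_view_update(w, xs, 1.0, alpha=0.5, lam=lam)
    assert all(math.isclose(b, f) for b, f in zip(back, fwd))
```

With λ = 1 the accumulated change reduces to $\alpha \sum_t (z - P_t)\, x_t$, the conventional supervised (Widrow-Hoff) update. With λ = 0, the toy example changes only the last state's weight, because that is the only step with a nonzero difference.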
@article{article,
author = {Sutton, Richard},
year = {1988},
month = {08},
pages = {9--44},
title = {Learning to Predict by the Method of Temporal Differences},
volume = {3},
journal = {Machine Learning},
doi = {10.1007/BF00115009}
}