Temporal difference (TD) learning is a prediction method used mostly to solve the reinforcement learning problem. “TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.” TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. It is related to dynamic programming techniques because it updates its current estimate using previously learned estimates, a process known as bootstrapping. The TD learning algorithm is also related to the temporal difference model of animal learning.
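To make the two ingredients concrete, here is a minimal sketch of tabular TD(0) prediction, the simplest TD method. It is an illustration under stated assumptions, not a reference implementation: the environment is a small symmetric random walk chosen for the example, and the function name `td0_random_walk` and its parameters (`num_states`, `episodes`, `alpha`, `gamma`, `seed`) are hypothetical. The sampled step is the Monte Carlo part; using the stored value of the next state in the update target is the bootstrapping part.

```python
import random


def td0_random_walk(num_states=5, episodes=1000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values for a symmetric random walk with tabular TD(0).

    Assumed toy environment: non-terminal states 1..num_states sit in a row
    between two terminal states. Every step moves left or right with equal
    probability. Reaching the right terminal gives reward 1; all other
    transitions give reward 0.
    """
    rng = random.Random(seed)
    # Indices 0 and num_states + 1 are terminal; their values stay at 0.
    values = [0.0] * (num_states + 2)

    for _ in range(episodes):
        state = (num_states + 1) // 2  # start each episode in the middle
        while 0 < state < num_states + 1:
            # Sampling: draw one transition from the environment.
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == num_states + 1 else 0.0
            # Bootstrapping: the target uses the current estimate of next_state.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state

    return values[1:-1]  # estimates for the non-terminal states only


if __name__ == "__main__":
    print(td0_random_walk())
```

In this assumed environment with the default five states, the true value of state i is i/6 (the probability of finishing on the right), so the estimates should settle near those values. The update happens after every step rather than at the end of the episode, which is what distinguishes TD from a plain Monte Carlo estimate.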