SARSA (State-Action-Reward-State-Action) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was introduced in a technical note where the alternative name SARSA was only mentioned as a footnote.
This name simply reflects the fact that the main function for updating the Q-value depends on the current state of the agent “S1”, the action the agent chooses “A1”, the reward “R” the agent gets for choosing this action, the state “S2” that the agent will now be in after taking that action, and finally the next action “A2” the agent will choose in its new state. Taking every letter in the quintuple (st, at, rt, st+1, at+1) yields the word SARSA. …
State-Action-Reward-State-Action (SARSA) google