Topics and Objectives (Apr 15 – Apr 19)
- Policy Evaluation vs. Policy Learning
- Policy Iteration
- Temporal Difference Learning
- SARSA and Q-Learning
- Online vs. Offline
- On-Policy vs. Off-Policy
- Exploration vs. Exploitation
Lecture Notes
Homework
- See Homework 5 for details
- This is our last homework set.