📌 Last lecture: how to evaluate a given policy when we do not know how the world works and can only interact with the environment.
→ Model-free policy evaluation methods
- Monte Carlo policy evaluation
- TD learning
✔ This lecture: model-free control, where we learn good policies under the same constraint. This matters when
- MDP model is unknown but can be sampled
- MDP model is known but computing the value function via model based control methods is infeasible due to the size of the domain
1. Generalized Policy Iteration

- Line 4 (policy evaluation) can be done with model-free policy evaluation (Lecture 3)
- To make the entire algorithm model-free, we also need to improve the policy in a model-free way (Line 5)
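Policy improvement becomes model-free once we work with the state-action value function $Q$ instead of $V$: taking $\pi(s) = \arg\max_a Q(s,a)$ needs no transition or reward model, whereas improving via $V$ would require a one-step lookahead through the model. A minimal sketch (the Q-table values below are made up for illustration):

```python
# Model-free policy improvement: pi(s) = argmax_a Q(s, a).
# No transition model P(s'|s,a) is needed, unlike improvement via V(s).

def greedy_improvement(Q, states, actions):
    """Return the greedy policy w.r.t. a tabular Q: maps state -> action."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}

# Illustrative Q-table for a 2-state, 2-action MDP (values are made up).
Q = {("s0", "left"): 1.0, ("s0", "right"): 2.0,
     ("s1", "left"): 0.5, ("s1", "right"): 0.1}
pi = greedy_improvement(Q, ["s0", "s1"], ["left", "right"])
print(pi)  # {'s0': 'right', 's1': 'left'}
```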

1-1. MC for On-Policy Q Evaluation
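Monte Carlo Q evaluation works just like MC state-value evaluation from Lecture 3, except returns are averaged per $(s, a)$ pair instead of per state. A first-visit sketch, assuming episodes are given as `(state, action, reward)` triples (the sample episodes are made up):

```python
from collections import defaultdict

def mc_q_evaluation(episodes, gamma=0.9):
    """First-visit Monte Carlo evaluation of Q(s, a) from sampled episodes.

    episodes: list of trajectories [(s, a, r), ...] generated by the policy.
    Returns the average first-visit return for each (s, a) pair.
    """
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        first_visit_G = {}
        # Walk backwards, accumulating discounted returns; earlier visits
        # overwrite later ones, so the first visit's return survives.
        for s, a, r in reversed(episode):
            G = r + gamma * G
            first_visit_G[(s, a)] = G
        for sa, g in first_visit_G.items():
            returns[sa].append(g)
    return {sa: sum(gs) / len(gs) for sa, gs in returns.items()}

# Two illustrative episodes from some behaviour policy (rewards made up).
eps = [[("s0", "a0", 1.0), ("s1", "a1", 0.0)],
       [("s0", "a0", 0.0)]]
Q = mc_q_evaluation(eps, gamma=0.9)
print(Q[("s0", "a0")])  # mean of 1.0 and 0.0 -> 0.5
```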
2. Importance of Exploration
2-1. Policy Evaluation with Exploration
2-2. Monotonic $\epsilon$-greedy policy improvement
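The standard way to keep exploring while still improving is an $\epsilon$-greedy policy: with probability $\epsilon$ take a uniformly random action, otherwise act greedily with respect to $Q$. A minimal sketch (the Q-table and $\epsilon$ value are illustrative):

```python
import random

def epsilon_greedy_action(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore uniformly; otherwise exploit Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

# Illustrative Q-table: "right" is the greedy action in s0.
Q = {("s0", "left"): 0.0, ("s0", "right"): 1.0}
random.seed(0)
picks = [epsilon_greedy_action(Q, "s0", ["left", "right"], epsilon=0.2)
         for _ in range(1000)]
print(picks.count("right") / 1000)  # roughly (1 - eps) + eps/|A| = 0.9
```

The greedy action is taken with probability $1-\epsilon+\epsilon/|A|$, which is what the monotonic $\epsilon$-greedy improvement result in 2-2 relies on.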

3. Monte Carlo Control
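Putting the pieces together gives Monte Carlo control: generate episodes with an $\epsilon$-greedy policy, update $Q$ with first-visit MC returns, and decay $\epsilon$ (e.g. as $1/k$, satisfying GLIE). A sketch under assumed `env_reset`/`env_step` hooks (not a standard API) and a made-up one-step toy environment:

```python
import random
from collections import defaultdict

def mc_control(env_step, env_reset, actions, n_episodes=200, gamma=0.9):
    """GLIE Monte Carlo control sketch: epsilon-greedy episode generation
    plus first-visit Q updates via incremental means, with epsilon = 1/k."""
    Q = defaultdict(float)
    counts = defaultdict(int)
    for k in range(1, n_episodes + 1):
        eps = 1.0 / k  # GLIE schedule: exploration decays over episodes
        s, done, episode = env_reset(), False, []
        while not done:
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s2, r, done = env_step(s, a)
            episode.append((s, a, r))
            s = s2
        G, first_visit_G = 0.0, {}
        for s, a, r in reversed(episode):
            G = r + gamma * G
            first_visit_G[(s, a)] = G
        for sa, g in first_visit_G.items():
            counts[sa] += 1
            Q[sa] += (g - Q[sa]) / counts[sa]  # incremental mean of returns
    return Q

# Toy one-state environment (made up): "good" pays 1, "bad" pays 0,
# and every episode terminates after a single step.
def reset():
    return "s0"

def step(s, a):
    return "terminal", (1.0 if a == "good" else 0.0), True

random.seed(0)
Q = mc_control(step, reset, ["good", "bad"])
print(Q[("s0", "good")], Q[("s0", "bad")])  # Q favors "good" over "bad"
```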