📌 Last lecture: how to evaluate a given policy when we do not know how the world works and can only interact with the environment.

→ Model-free policy evaluation methods

✔ This lecture: model-free control, where we learn good policies under the same constraint. This is important when

  1. the MDP model is unknown but can be sampled, or
  2. the MDP model is known, but computing the value function with model-based control methods is infeasible due to the size of the domain.

1. Generalized Policy Iteration

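As a quick sketch (the notation $\pi_i$, $Q^{\pi_i}$ is assumed here): generalized policy iteration alternates policy evaluation with greedy policy improvement. In the model-free setting the improvement step is taken with respect to $Q$ rather than $V$, because being greedy with respect to $V$ would require the reward and transition model:

$$
\pi_{i+1}(s) = \arg\max_a Q^{\pi_i}(s, a)
\qquad \text{instead of} \qquad
\pi_{i+1}(s) = \arg\max_a \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi_i}(s') \Big].
$$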

1-1. MC for On-Policy Q Evaluation
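
A minimal sketch in Python of first-visit Monte Carlo estimation of $Q^\pi$, assuming episodes are given as lists of `(state, action, reward)` tuples generated by the policy being evaluated (the function name and data layout are illustrative assumptions, not the lecture's notation):

```python
from collections import defaultdict

def mc_q_evaluation(episodes, gamma):
    """First-visit Monte Carlo estimate of Q^pi from sampled episodes."""
    returns_sum = defaultdict(float)   # cumulative first-visit returns per (s, a)
    returns_count = defaultdict(int)   # number of first visits per (s, a)

    for episode in episodes:
        # Record the first time step at which each (state, action) pair occurs.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)

        # Compute discounted returns G_t by walking the episode backwards.
        G = 0.0
        returns_at = [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            _, _, r = episode[t]
            G = r + gamma * G
            returns_at[t] = G

        # Average the return observed at the first visit of each (s, a).
        for (s, a), t in first_visit.items():
            returns_sum[(s, a)] += returns_at[t]
            returns_count[(s, a)] += 1

    return {sa: returns_sum[sa] / returns_count[sa] for sa in returns_sum}
```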

2. Importance of Exploration

2-1. Policy Evaluation with Exploration
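
One standard way to keep exploring while evaluating and improving is to use an $\epsilon$-greedy policy, which takes the greedy action most of the time but gives every action nonzero probability (writing $|A|$ for the number of actions):

$$
\pi(a \mid s) =
\begin{cases}
1 - \epsilon + \frac{\epsilon}{|A|} & \text{if } a = \arg\max_{a'} Q(s, a') \\
\frac{\epsilon}{|A|} & \text{otherwise.}
\end{cases}
$$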

2-2. Monotonic $\epsilon$-greedy policy improvement

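A sketch of the standard argument: let $\pi$ be an $\epsilon$-greedy policy and let $\pi'$ be the $\epsilon$-greedy policy with respect to $Q^{\pi}$. Then for every state $s$,

$$
\begin{aligned}
Q^{\pi}(s, \pi'(s)) &= \sum_a \pi'(a \mid s)\, Q^{\pi}(s, a) \\
&= \frac{\epsilon}{|A|} \sum_a Q^{\pi}(s, a) + (1 - \epsilon) \max_a Q^{\pi}(s, a) \\
&\ge \frac{\epsilon}{|A|} \sum_a Q^{\pi}(s, a) + (1 - \epsilon) \sum_a \frac{\pi(a \mid s) - \frac{\epsilon}{|A|}}{1 - \epsilon}\, Q^{\pi}(s, a) \\
&= \sum_a \pi(a \mid s)\, Q^{\pi}(s, a) = V^{\pi}(s),
\end{aligned}
$$

where the inequality holds because the weights $\big(\pi(a \mid s) - \frac{\epsilon}{|A|}\big)/(1-\epsilon)$ are nonnegative and sum to $1$ when $\pi$ is $\epsilon$-greedy. By the policy improvement theorem this gives $V^{\pi'}(s) \ge V^{\pi}(s)$ for all $s$: the $\epsilon$-greedy update is a monotonic improvement.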

3. Monte Carlo Control
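
To tie the pieces together, here is a hedged sketch of on-policy Monte Carlo control: $\epsilon$-greedy action selection, incremental first-visit MC updates of $Q$, and a decaying $\epsilon = 1/k$ so that the GLIE conditions hold in the tabular case. The environment interface (`env.reset()`, `env.step(a)` returning `(next_state, reward, done)`) is an assumption for illustration.

```python
import random
from collections import defaultdict

def mc_control(env, actions, num_episodes, gamma):
    """On-policy Monte Carlo control with an epsilon-greedy policy (sketch)."""
    Q = defaultdict(float)       # Q[(s, a)] estimates
    visits = defaultdict(int)    # first-visit counts per (s, a)

    for k in range(1, num_episodes + 1):
        epsilon = 1.0 / k        # decaying exploration (GLIE schedule)

        # Generate one episode by following the current epsilon-greedy policy.
        episode, s, done = [], env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)
            episode.append((s, a, r))
            s = s_next

        # First-visit incremental Monte Carlo update of Q.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = r + gamma * G
            if first_visit[(s, a)] == t:
                visits[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / visits[(s, a)]

    return Q
```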