1. Model Based RL


1-1. Model Learning
- Goal: Estimate model $M$ from experience $\{S_1, A_1, R_2, \dots, S_T\}$
Table Lookup Model
- Count visits $N(s,a)$ to each state-action pair.
- Estimate the transition probabilities and rewards by their empirical means over the recorded experience (see the sketch below).
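
A minimal table-lookup model sketch in Python, assuming discrete (hashable) states and actions. The class name `TableLookupModel` and its method names are illustrative, not from the lecture; it simply keeps counts $N(s,a)$ and empirical means.

```python
from collections import defaultdict
import random

class TableLookupModel:
    """Illustrative table-lookup model: count visits and average outcomes."""

    def __init__(self):
        self.counts = defaultdict(int)                              # N(s, a)
        self.transitions = defaultdict(lambda: defaultdict(int))    # counts of (s, a) -> s'
        self.reward_sums = defaultdict(float)                       # running sum of rewards for (s, a)

    def update(self, s, a, r, s_next):
        """Record one real transition (s, a, r, s')."""
        self.counts[(s, a)] += 1
        self.transitions[(s, a)][s_next] += 1
        self.reward_sums[(s, a)] += r

    def p_hat(self, s, a, s_next):
        """Empirical transition probability P_hat(s' | s, a)."""
        n = self.counts[(s, a)]
        return self.transitions[(s, a)][s_next] / n if n else 0.0

    def r_hat(self, s, a):
        """Empirical mean reward R_hat(s, a)."""
        n = self.counts[(s, a)]
        return self.reward_sums[(s, a)] / n if n else 0.0

    def sample(self, s, a):
        """Sample (r, s') from the learned model, used later for sample-based planning."""
        n = self.counts[(s, a)]
        next_states = list(self.transitions[(s, a)].keys())
        weights = [self.transitions[(s, a)][sp] / n for sp in next_states]
        s_next = random.choices(next_states, weights=weights)[0]
        return self.r_hat(s, a), s_next
```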
Planning with a Model
- Given a model $M$, solve the MDP with value iteration, policy iteration, etc. (see the value-iteration sketch below)
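
A hedged sketch of planning with value iteration over the learned model above; the `model`, `states`, `actions`, `gamma`, and `theta` names are assumptions for illustration.

```python
def value_iteration(model, states, actions, gamma=0.9, theta=1e-6):
    """Solve the estimated MDP (P_hat, R_hat) by value iteration."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # one-step lookahead using the learned model's estimates
            q_values = [
                model.r_hat(s, a)
                + gamma * sum(model.p_hat(s, a, s2) * V[s2] for s2 in states)
                for a in actions
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:   # stop when values have converged
            return V
```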
Sample based Planning
- Use model to generate samples
- Apply model-free RL to the samples (Monte-Carlo control, Sarsa, Q-learning), as in the sketch below
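
A hedged sketch of sample-based planning: run ordinary Q-learning, but on transitions sampled from the learned model rather than from the real environment. It reuses the `TableLookupModel` sketch above; the other names (`n_updates`, `alpha`, `epsilon`) are illustrative hyperparameters.

```python
from collections import defaultdict
import random

def sample_based_planning(model, states, actions, n_updates=10_000,
                          alpha=0.1, gamma=0.9, epsilon=0.1):
    """Q-learning on simulated experience drawn from the learned model."""
    Q = defaultdict(float)                 # Q[(s, a)]
    s = random.choice(states)
    for _ in range(n_updates):
        # epsilon-greedy action in the simulated state
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a_: Q[(s, a_)])
        if model.counts[(s, a)] == 0:
            s = random.choice(states)      # unvisited pair: reset instead of sampling an empty model
            continue
        # sample imagined experience from the model, not the real environment
        r, s_next = model.sample(s, a)
        # standard Q-learning update on the simulated transition
        target = r + gamma * max(Q[(s_next, a_)] for a_ in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s_next
    return Q
```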

<aside>
💡 Efficient because we can generate as much data as we like from the model and can plan under different assumptions!
</aside>
2. Simulation-Based Search
2-1. Forward Search