1. Dynamic programming

Untitled

2. Monte Carlo policy evaluation

2-1. First-visit Monte Carlo

Untitled

2-2. Every-visit Monte Carlo

Untitled

2-3. Incremental Monte Carlo

Untitled

<aside> 👩🏼‍🏫 skew it so that you’re running average is more weighted towards recent data because your real domain is non-stationary (when MDP is changing over time)

</aside>