1. Markov Process
- Memoryless random process
- Sequence of random states with the Markov property (the future is independent of the past given the present)
<aside>
👩🏼‍🏫 In a Markov process (Markov chain) there is no control and there are no actions; the idea is that you have a stochastic process that evolves over time (e.g., the stock market)
</aside>
- Definition of Markov Process
- S : (finite) set of states ($s \in S$)
- P : dynamics/transition model that specifies $p(s_{t+1} = s^\prime | s_t = s)$
- No rewards, no actions
- Define an MP by the tuple ($S, P$); a sampling sketch follows below
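As a concrete illustration, here is a minimal Python sketch of sampling a trajectory from a Markov process. The 3-state chain and transition matrix `P` are hypothetical, not from the lecture.

```python
import numpy as np

# Hypothetical 3-state Markov process: S = {0, 1, 2}.
# P[s, s'] = p(s_{t+1} = s' | s_t = s); each row sums to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing

def sample_chain(P, s0, n_steps, seed=0):
    """Sample a state sequence (s_0, s_1, ..., s_n) from the chain."""
    rng = np.random.default_rng(seed)
    states = [s0]
    for _ in range(n_steps):
        # The next state depends only on the current state (Markov property).
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return states

print(sample_chain(P, s0=0, n_steps=10))
```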

2. Markov Reward Processes (MRPs)
2-1. Markov Reward Process
- Markov chain + Rewards
- Definition of Markov Reward Process (MRP)
- S : (finite) set of states ($s \in S$)
- P : dynamics/transition model that specifies $p(s_{t+1} = s^\prime | s_t = s)$
- R : reward function $R(s_t=s) = \mathbb{E}[r_t|s_t=s]$
- Discount factor $\gamma \in [0,1]$
- No actions
- Define an MRP by the tuple ($S, P, R, \gamma$); an episode-sampling sketch follows below
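Continuing the illustration, the sketch below extends the same hypothetical chain into an MRP by attaching a reward function `R` and a discount factor `gamma` (both values assumed for illustration) and sampling an episode $(s_0, r_0, s_1, r_1, \dots)$.

```python
import numpy as np

# Same hypothetical 3-state chain as above, now with rewards and a discount.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 0.0, 10.0])  # assumed R(s) = E[r_t | s_t = s]
gamma = 0.9                     # assumed discount factor

def sample_episode(P, R, s0, n_steps, seed=0):
    """Sample (s_0, r_0, s_1, r_1, ...): r_i accompanies the transition s_i -> s_{i+1}."""
    rng = np.random.default_rng(seed)
    s, episode = s0, []
    for _ in range(n_steps):
        episode += [s, R[s]]            # reward here is deterministic given s
        s = rng.choice(len(P), p=P[s])  # transition to the next state
    return episode

print(sample_episode(P, R, s0=0, n_steps=5))
```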
2-2. Reward function
- For the Markov reward process with state sequence $(s_0, s_1, s_2, \dots)$, each transition $s_i \to s_{i+1}$ is accompanied by a reward $r_i$ for all $i = 0, 1, \dots$, so a particular episode of the MRP is represented as $(s_0, r_0, s_1, r_1, \dots)$
- $R(s) = \mathbb{E}[r_0|s_0=s]$ : expected reward obtained during the first transition, when the Markov process starts in state $s$
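Because $r_0$ may itself be random, $R(s)$ can be estimated by averaging sampled first-step rewards. The Gaussian noise model below is purely an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_first_reward(s, means=np.array([1.0, 0.0, 10.0]), noise=0.5):
    """Hypothetical noisy reward with mean R(s) = means[s]."""
    return means[s] + rng.normal(0.0, noise)

# Monte Carlo estimate of R(s) = E[r_0 | s_0 = s] for s = 0.
estimate = np.mean([sample_first_reward(0) for _ in range(10_000)])
print(estimate)  # ~= 1.0, the assumed mean reward of state 0
```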
2-3. Return & Value Function
- Definition of Horizon
- Number of time steps in each episode (can be infinite)