1. Markov Process
- Memoryless random process
- Sequence of random states with the Markov property (the future is independent of the past given the present)
<aside>
👩🏼‍🏫 In a Markov process (Markov chain) there is no control and there are no actions; the idea is that you have a stochastic process that evolves over time (e.g., the stock market)
</aside>
- Definition of Markov Process
- S : (finite) set of states ($s \in S$)
- P : dynamics/transition model that specifies $p(s_{t+1} = s^\prime | s_t = s)$
- No rewards, no actions
- Define an MP by the tuple ($S, P$); a sampling sketch follows below
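As a concrete illustration, here is a minimal Python sketch of sampling a trajectory from a Markov process. The 3-state chain and transition matrix `P` are hypothetical, not from the lecture.

```python
import numpy as np

# Hypothetical 3-state Markov process: S = {0, 1, 2}.
# P[s, s'] = p(s_{t+1} = s' | s_t = s); each row sums to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing

def sample_chain(P, s0, n_steps, seed=0):
    """Sample a state sequence (s_0, s_1, ..., s_n) from the chain."""
    rng = np.random.default_rng(seed)
    states = [s0]
    for _ in range(n_steps):
        # The next state depends only on the current state (Markov property).
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return states

print(sample_chain(P, s0=0, n_steps=10))
```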

2. Markov Reward Processes (MRPs)
2-1. Markov Reward Process
- Markov chain + Rewards
- Definition of Markov Reward Process (MRP)
- S : (finite) set of states ($s \in S$)
- P : dynamics/transition model that specifies $p(s_{t+1} = s^\prime | s_t = s)$
- R : reward function $R(s_t=s) = \mathbb{E}[r_t|s_t=s]$
- Discount factor $\gamma \in [0,1]$
- No actions
- Define an MRP by the tuple ($S, P, R, \gamma$); an episode-sampling sketch follows below
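Continuing the illustration, the sketch below extends the same hypothetical chain into an MRP by attaching a reward function `R` and a discount factor `gamma` (both values assumed for illustration) and sampling an episode $(s_0, r_0, s_1, r_1, \dots)$.

```python
import numpy as np

# Same hypothetical 3-state chain as above, now with rewards and a discount.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 0.0, 10.0])  # assumed R(s) = E[r_t | s_t = s]
gamma = 0.9                     # assumed discount factor

def sample_episode(P, R, s0, n_steps, seed=0):
    """Sample (s_0, r_0, s_1, r_1, ...): r_i accompanies the transition s_i -> s_{i+1}."""
    rng = np.random.default_rng(seed)
    s, episode = s0, []
    for _ in range(n_steps):
        episode += [s, R[s]]            # reward here is deterministic given s
        s = rng.choice(len(P), p=P[s])  # transition to the next state
    return episode

print(sample_episode(P, R, s0=0, n_steps=5))
```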
2-2. Reward function
- For the Markov reward process with state sequence $(s_0, s_1, s_2, \dots)$, each transition $s_i \to s_{i+1}$ is accompanied by a reward $r_i$ for all $i = 0, 1, \dots$, so a particular episode of the MRP is represented as $(s_0, r_0, s_1, r_1, \dots)$
- $R(s) = \mathbb{E}[r_0|s_0=s]$ : expected reward obtained during the first transition, when the Markov process starts in state $s$
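Because $r_0$ may itself be random, $R(s)$ can be estimated by averaging sampled first-step rewards. The Gaussian noise model below is purely an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_first_reward(s, means=np.array([1.0, 0.0, 10.0]), noise=0.5):
    """Hypothetical noisy reward with mean R(s) = means[s]."""
    return means[s] + rng.normal(0.0, noise)

# Monte Carlo estimate of R(s) = E[r_0 | s_0 = s] for s = 0.
estimate = np.mean([sample_first_reward(0) for _ in range(10_000)])
print(estimate)  # ~= 1.0, the assumed mean reward of state 0
```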
2-3. Return & Value Function
- Definition of Horizon
- Number of time steps in each episode (can be infinite)