- Reinforcement learning involves:
- Optimization
- Delayed Consequences
- Exploration
- Generalization
1. Sequential Decision Making
<aside>
👩🏼🏫 An agent is going to do exactly what we tell it to do in terms of the reward function we specify → It is important to design a good reward function
</aside>
History
- History $h_t = (a_1, o_1, r_1, ..., a_t, o_t, r_t)$
- Agent chooses action based on history
- State is information assumed to determine what happens next
- Function of history: $s_t = f(h_t)$
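The two definitions above can be sketched in a few lines of code. This is an illustrative sketch, not from the source: the names `Step` and `state_from_history` are made up, and the choice of $f$ (keeping only the latest observation) is just one possible state function.

```python
from typing import NamedTuple, Optional

class Step(NamedTuple):
    """One (action, observation, reward) tuple in the history h_t."""
    action: int
    observation: int
    reward: float

def state_from_history(history: list) -> Optional[int]:
    # One possible choice of f(h_t): keep only the most recent observation.
    # A different f could keep the full history, or a summary statistic.
    return history[-1].observation if history else None

h = [Step(0, 5, 1.0), Step(1, 7, 0.0)]
print(state_from_history(h))  # 7
```

Different choices of $f$ trade off memory against how much of the history's information the agent retains.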
Agent State
- What the agent (the algorithm) uses to make decisions about how to act
- Function of the history $s_t = f(h_t)$
- Could include meta-information:
    - State of the algorithm (e.g., how many computations performed)
    - Decision process (e.g., how many decisions remain until the end)
World State
- True state of the world used to determine how world generates next observation and reward
- Often hidden or unknown to the agent, so the agent's observation might not contain all the information it needs
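A toy sketch of this separation, with assumed names and dynamics (nothing here is from the source): the world state drives the next observation and reward, but the agent only ever receives a lossy observation of it.

```python
def world_step(world_state: int, action: int):
    """Hidden world dynamics: only observation and reward reach the agent."""
    next_state = (world_state + action) % 4   # true state (hypothetical dynamics)
    observation = next_state % 2              # agent sees a lossy view of it
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, observation, reward

s, o, r = world_step(3, 1)
print(s, o, r)  # 0 0 1.0
```

Here two different world states (0 and 2) produce the same observation, so the agent cannot recover the true state from a single observation.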
2. Markov Assumption
- A state $s_t$ is Markov if and only if: $p(s_{t+1} \mid s_t, a_t) = p(s_{t+1} \mid h_t, a_t)$
- i.e., the future is independent of the past given the present state
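The Markov property can be checked empirically on a toy chain. In this hedged sketch (the transition matrix `P` is made up), the state is Markov by construction, so conditioning on extra history should not change the estimated next-state distribution:

```python
import random

random.seed(0)
# Hypothetical 3-state Markov chain: row s gives p(s' | s).
P = {0: [0.9, 0.1, 0.0], 1: [0.1, 0.8, 0.1], 2: [0.0, 0.1, 0.9]}

def step(s: int) -> int:
    return random.choices([0, 1, 2], weights=P[s])[0]

# Roll out one long trajectory.
traj = [0]
for _ in range(200_000):
    traj.append(step(traj[-1]))

# Estimate p(s_{t+1}=1 | s_t=1) vs p(s_{t+1}=1 | s_{t-1}=0, s_t=1):
# for a Markov state, the extra history term should not change the answer.
next_given_s = [traj[t + 1] for t in range(1, len(traj) - 1) if traj[t] == 1]
next_given_hs = [traj[t + 1] for t in range(1, len(traj) - 1)
                 if traj[t] == 1 and traj[t - 1] == 0]
p1 = next_given_s.count(1) / len(next_given_s)
p2 = next_given_hs.count(1) / len(next_given_hs)
print(round(p1, 2), round(p2, 2))  # both ≈ 0.8, matching P[1][1]
```

If the state were *not* Markov (e.g., an observation that hides part of the world state), the two conditional estimates would differ.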