- Reinforcement learning involves:
- Optimization
- Delayed Consequences
- Exploration
- Generalization
1. Sequential Decision Making
<aside>
👩🏼🏫 An agent is going to do exactly what we tell it to do in terms of the reward function we specify → It is important to design a good reward function
</aside>
History
- History $h_t = (a_1, o_1, r_1, ..., a_t, o_t, r_t)$
- Agent chooses action based on history
- State is information assumed to determine what happens next
- Function of history: $s_t = f(h_t)$
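The two definitions above can be sketched in a few lines of code. This is an illustrative sketch, not from the source: the names `Step` and `state_from_history` are made up, and the choice of $f$ (keeping only the latest observation) is just one possible state function.

```python
from typing import NamedTuple, Optional

class Step(NamedTuple):
    """One (action, observation, reward) tuple in the history h_t."""
    action: int
    observation: int
    reward: float

def state_from_history(history: list) -> Optional[int]:
    # One possible choice of f(h_t): keep only the most recent observation.
    # A different f could keep the full history, or a summary statistic.
    return history[-1].observation if history else None

h = [Step(0, 5, 1.0), Step(1, 7, 0.0)]
print(state_from_history(h))  # 7
```

Different choices of $f$ trade off memory against how much of the history's information the agent retains.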
Agent State
- What the agent (the algorithm) uses to make decisions about how to act
- Function of the history $s_t = f(h_t)$
- Could include meta-information:
    - State of the algorithm (e.g., how many computations performed)
    - Decision process (e.g., how many decisions remain until the end)
World State
- True state of the world used to determine how world generates next observation and reward
- Often hidden or unknown to the agent, so the agent's observation might not contain all the information it needs
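A toy sketch of this separation, with assumed names and dynamics (nothing here is from the source): the world state drives the next observation and reward, but the agent only ever receives a lossy observation of it.

```python
def world_step(world_state: int, action: int):
    """Hidden world dynamics: only observation and reward reach the agent."""
    next_state = (world_state + action) % 4   # true state (hypothetical dynamics)
    observation = next_state % 2              # agent sees a lossy view of it
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, observation, reward

s, o, r = world_step(3, 1)
print(s, o, r)  # 0 0 1.0
```

Here two different world states (0 and 2) produce the same observation, so the agent cannot recover the true state from a single observation.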
2. Markov Assumption
- A state $s_t$ is Markov if and only if: $p(s_{t+1} \mid s_t, a_t) = p(s_{t+1} \mid h_t, a_t)$
- i.e., the future is independent of the past given the present state
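The Markov property can be checked empirically on a toy chain. In this hedged sketch (the transition matrix `P` is made up), the state is Markov by construction, so conditioning on extra history should not change the estimated next-state distribution:

```python
import random

random.seed(0)
# Hypothetical 3-state Markov chain: row s gives p(s' | s).
P = {0: [0.9, 0.1, 0.0], 1: [0.1, 0.8, 0.1], 2: [0.0, 0.1, 0.9]}

def step(s: int) -> int:
    return random.choices([0, 1, 2], weights=P[s])[0]

# Roll out one long trajectory.
traj = [0]
for _ in range(200_000):
    traj.append(step(traj[-1]))

# Estimate p(s_{t+1}=1 | s_t=1) vs p(s_{t+1}=1 | s_{t-1}=0, s_t=1):
# for a Markov state, the extra history term should not change the answer.
next_given_s = [traj[t + 1] for t in range(1, len(traj) - 1) if traj[t] == 1]
next_given_hs = [traj[t + 1] for t in range(1, len(traj) - 1)
                 if traj[t] == 1 and traj[t - 1] == 0]
p1 = next_given_s.count(1) / len(next_given_s)
p2 = next_given_hs.count(1) / len(next_given_hs)
print(round(p1, 2), round(p2, 2))  # both ≈ 0.8, matching P[1][1]
```

If the state were *not* Markov (e.g., an observation that hides part of the world state), the two conditional estimates would differ.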