- Representing value functions by a lookup table (tabular representation)
- state → $V(s)$
- state-action pair → $Q(s,a)$
→ Generalization problems in large state and state-action spaces
- Prefer learning approximations
- $w$ : parameters (weights) of the function approximator
$$
v_{\pi}(s) \approx \hat{v}(s;w)
$$

$$
q_{\pi}(s,a) \approx \hat{q}(s,a;w)
$$

<aside>
👩🏼‍🏫 Trade-off: representation capacity vs. memory, computation, and experience
</aside>
→ As the function approximator's representational capacity increases, so do the memory, computation, and data required, and so does the attainable accuracy
1. Linear Feature representation
VFA for Policy estimation with oracle
- Assume we know $V^{\pi}(s)$ for all $s$ → want to fit a parameterized function that represents all the data accurately
- MSE as the loss between $V^{\pi}(s)$ and $\hat{V}(s;w)$
- Minimize with gradient descent (GD) or stochastic gradient descent (SGD)
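The steps above can be sketched as follows — a minimal illustration, assuming oracle values $V^{\pi}(s)$ are given and using a linear approximator $\hat{V}(s;w) = x(s)^{\top}w$ (the function name and toy data are hypothetical):

```python
import numpy as np

def sgd_vfa_with_oracle(features, oracle_values, alpha=0.05, epochs=500, seed=0):
    """Fit w so that x(s)^T w ~= V^pi(s), given oracle targets.

    features: (N, d) array, one feature vector x(s) per state
    oracle_values: (N,) array of true values V^pi(s)
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w = np.zeros(d)
    for _ in range(epochs):
        i = rng.integers(n)                   # sample a state
        x, target = features[i], oracle_values[i]
        v_hat = x @ w                         # current prediction
        # SGD step on 1/2 (V^pi(s) - v_hat)^2:  w += alpha * (target - v_hat) * x
        w += alpha * (target - v_hat) * x
    return w

# Toy check: 3 states with one-hot features, so w should converge to V^pi
X = np.eye(3)
V_pi = np.array([1.0, 2.0, 3.0])
w = sgd_vfa_with_oracle(X, V_pi)
```

With one-hot features the weights simply converge to the oracle values themselves; with richer features the same update finds the best linear fit.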
Model-free VFA policy estimation
- Don't have access to $V^{\pi}(s)$
- Estimate $V^{\pi}(s)$ and $Q^{\pi}(s,a)$ with Monte Carlo or TD methods, using data collected under a fixed policy
- In VFA, updating $V^{\pi}(s)$ or $Q^{\pi}(s,a)$ means updating the function approximator's parameters $w$
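A sketch of the model-free case: semi-gradient TD(0) policy evaluation with a linear VFA, run on a hypothetical 5-state random-walk chain (the environment, reward scheme, and function name are assumptions for illustration, not from the notes):

```python
import numpy as np

def td0_linear_vfa(num_episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Semi-gradient TD(0) on a toy 5-state random walk: the agent moves
    left/right uniformly at random, gets +1 on the right terminal and 0
    on the left. True values are 1/6, 2/6, ..., 5/6."""
    rng = np.random.default_rng(seed)
    n_states = 5
    X = np.eye(n_states)              # one-hot feature vectors x(s)
    w = np.zeros(n_states)
    for _ in range(num_episodes):
        s = 2                         # start in the middle state
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:            # left terminal: reward 0
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:  # right terminal: reward +1
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next, done = 0.0, X[s_next] @ w, False
            # TD target r + gamma * v(s') replaces the oracle value;
            # updating the estimate IS updating the parameters w
            td_error = r + gamma * v_next - X[s] @ w
            w += alpha * td_error * X[s]
            if done:
                break
            s = s_next
    return w

w = td0_linear_vfa()
```

The only change from the oracle version is the target: a bootstrapped TD target (or a Monte Carlo return) stands in for the unknown $V^{\pi}(s)$.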
Feature vector
- feature vector $x(s)$ to represent a state $s$
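For concreteness, a hypothetical hand-crafted feature vector for a two-dimensional state (the variable names and choice of features are illustrative, not from the notes):

```python
import numpy as np

def features(position, velocity):
    """Hand-crafted feature vector x(s) for a state s = (position, velocity):
    a bias term, the raw variables, and two simple nonlinear combinations."""
    return np.array([
        1.0,                  # bias
        position,
        velocity,
        position * velocity,  # interaction term
        position ** 2,        # quadratic term
    ])

x = features(0.5, -1.0)
```

The quality of the features determines how well a linear VFA can do: if two genuinely different states map to the same $x(s)$, they are aliased.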

<aside>
👩🏼‍🏫 With a state representation that has partial aliasing (distinct states sharing the same features), the problem is no longer Markov
</aside>
Linear VFA for prediction with oracle
- Represent the value function of a given policy as a linear combination of features: $\hat{V}(s;w) = x(s)^{\top}w$
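In the linear case the MSE objective and its gradient work out cleanly (a standard derivation, consistent with the loss above):

$$
J(w) = \mathbb{E}_{\pi}\left[\left(V^{\pi}(s) - x(s)^{\top}w\right)^2\right], \qquad \nabla_w \hat{V}(s;w) = x(s)
$$

$$
\Delta w = \alpha \left(V^{\pi}(s) - x(s)^{\top}w\right) x(s)
$$

So each SGD step moves $w$ along the feature vector, in proportion to the prediction error.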
