Rectified Flow VQVAE
[x] Measure vanilla RF VQVAE performance
→ Tradeoff between stochastic / deterministic sampling in FID / MPJPE
→ Effect of reflow is not significant
→ Implemented with a very basic MLP (BxSxD → B*SxD): no attention used
[x] Architecture Improvements
[ ] UNet / Transformer (or at least some attention in temporal dimension)
[ ] End-to-End Training (following the "Sample what you cannot compress" scheme)
[x] Text Conditioning (expected to be definitely necessary later on)
[x] Verify the effectiveness of reflow
[ ] Verify various parameters and designs
The paper also shows that this works better than conditioning. (And with a pretrained world model already available, there seems to be little need for conditioning anyway.)
→ There is an explanation for this (with conditioning, it becomes equivalent to posterior MSE)
[ ] Theoretical Background
Measure T2M performance of Vanilla (2D)VQVAE + RF
[ ] Follow the MMM scheme as-is for Bidirectional 1 Stage Generation (flatten to 1D first for a quick check?) (Fixed Length) → WIP
[ ] Since we already have a 2DVQVAE and its performance seems to have been sufficiently demonstrated, handle the 2D Token Map + 2D Masking Strategy of MogenTS (but the connection to the Autoregressive + Bidirectional setting below must also be considered.)
[ ] Adopt the Autoregressive + Bidirectional setting of BAMM
[ ] Once the architecture is fixed, run ablations while varying the VQVAE design, etc.
[ ] Investigate why VQVAE does not work (like Cross Entropy)
→ Why should we use a VQVAE + continuous refinement pipeline?
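Sketch for the "very basic MLP (BxSxD → B*SxD)" note above: a minimal NumPy illustration, not the actual implementation. It assumes the usual rectified-flow convention (noise at t=0, data at t=1, dx/dt = v(x, t)) and deterministic Euler sampling. The names init_mlp, velocity, euler_sample and all sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def init_mlp(dim, hidden, rng):
    """Random weights for a toy 2-layer velocity MLP (hypothetical sizes); input is [x_t, t]."""
    return {
        "w1": rng.normal(0.0, 0.1, size=(dim + 1, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, size=(hidden, dim)),
        "b2": np.zeros(dim),
    }

def velocity(params, x, t):
    """Per-token velocity v(x_t, t). x has shape (B, S, D); t is a scalar in [0, 1]."""
    B, S, D = x.shape
    flat = x.reshape(B * S, D)            # BxSxD -> (B*S)xD: every token becomes an independent row
    t_col = np.full((B * S, 1), t)        # same time value appended to every row
    h = np.tanh(np.concatenate([flat, t_col], axis=1) @ params["w1"] + params["b1"])
    out = h @ params["w2"] + params["b2"]
    return out.reshape(B, S, D)           # restore the sequence layout

def euler_sample(params, x0, num_steps=8):
    """Deterministic sampling: Euler-integrate dx/dt = v(x, t) from t=0 to t=1."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(params, x, k * dt)
    return x

rng = np.random.default_rng(0)
params = init_mlp(dim=4, hidden=16, rng=rng)
noise = rng.normal(size=(2, 3, 4))        # B=2, S=3, D=4
sample = euler_sample(params, noise)
print(sample.shape)
```

Because the flatten treats each of the B*S rows separately, a token at one time step never sees its neighbours; this is the limitation that the UNet / Transformer (temporal attention) item is meant to address.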
Diffusion VQVAE
MAR