Keywords: VQVAE, Diffusion Refinement, 2D Conv, 2D Attention, Autoregressive Generation
VQVAE + Diffusion implementation
Option 1: reduce the spatial dim all the way to 1 inside the convolution stage
EX) (B, S, 263) → (B, S, 12, 21) → (B, 1, 49, 16) → (B, 49, 16)
Option 2: keep the spatial dimension through the convolutions, then reshape
EX) (B, S, 263) → (B, S, 12, 21) → (B, 4, 16, 16) → (B, 64, 16)
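A minimal sketch of the last step in each example above, working on shape tuples only. It assumes the latent layout is (B, spatial, temporal, channel), which is inferred from the two shape chains and not stated in these notes. The function names `squeeze_spatial` and `flatten_spatial` are hypothetical. With a tensor library such as PyTorch, the same steps would likely be `z.squeeze(1)` and `z.flatten(1, 2)`.

```python
def squeeze_spatial(latent_shape):
    """Spatial dim already reduced to 1 by the conv stack: drop that axis.

    Assumed layout (hypothetical): (B, spatial, temporal, channel).
    """
    b, spatial, temporal, channel = latent_shape
    if spatial != 1:
        raise ValueError(f"expected spatial dim 1, got {spatial}")
    return (b, temporal, channel)


def flatten_spatial(latent_shape):
    """Spatial dim kept: merge spatial and temporal axes into one token axis."""
    b, spatial, temporal, channel = latent_shape
    return (b, spatial * temporal, channel)


B = 2  # arbitrary batch size for illustration

# (B, 1, 49, 16) -> (B, 49, 16): one token per downsampled frame
print(squeeze_spatial((B, 1, 49, 16)))

# (B, 4, 16, 16) -> (B, 64, 16): 4 spatial tokens per downsampled frame
print(flatten_spatial((B, 4, 16, 16)))
```

In both cases the result is a (B, tokens, channel) sequence. The difference is whether a token covers a whole frame (49 tokens) or one spatial group within a frame (4 × 16 = 64 tokens).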
ex)

