<aside> 👩‍🏫 To argue that “generation should happen in a continuous space,” we first have to show the limitations of a discrete motion prior.

</aside>

https://github.com/EricGuo5513/momask-codes

→ MoMask uses a Residual VQ-VAE and is currently SOTA on the HumanML3D benchmark.
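As a rough illustration of what residual quantization does, here is a minimal pure-Python sketch. It is not MoMask's implementation: the helper names (`sq_dist`, `residual_quantize`, `make_codebooks`), the random codebooks, and the zero entry added to each codebook are assumptions made for the example. Each layer picks the nearest entry to whatever the previous layers left unexplained, and the leftover squared error is recorded after every layer.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))


def residual_quantize(x, codebooks):
    """Quantize x layer by layer; each layer encodes what earlier layers missed.

    Returns the reconstruction and the squared error left after each layer.
    """
    residual = list(x)
    recon = [0.0] * len(x)
    errors = []
    for codebook in codebooks:
        # Nearest-neighbour lookup against the current residual.
        code = min(codebook, key=lambda c: sq_dist(c, residual))
        recon = [r + c for r, c in zip(recon, code)]
        residual = [r - c for r, c in zip(residual, code)]
        errors.append(sum(r * r for r in residual))
    return recon, errors


def make_codebooks(dim, num_layers, size, seed=0):
    """Hypothetical random codebooks; a trained model would learn these."""
    rng = random.Random(seed)
    books = []
    for layer in range(num_layers):
        scale = 1.0 / (2 ** layer)  # assumption: deeper layers hold finer codes
        entries = [[rng.gauss(0.0, scale) for _ in range(dim)]
                   for _ in range(size - 1)]
        # Assumption for the demo: a zero entry lets a layer add nothing,
        # so the error can never grow from one layer to the next.
        entries.append([0.0] * dim)
        books.append(entries)
    return books


codebooks = make_codebooks(dim=4, num_layers=3, size=8)
x = [0.7, -1.2, 0.3, 0.9]
recon, errors = residual_quantize(x, codebooks)
print(errors)  # squared error remaining after each layer
```

In this toy setup the error shrinks or stays flat as layers are added but does not reach zero with a finite codebook, which is the kind of loss a discrete prior carries into generation.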

The released RVQ's performance on the HumanML3D test set is as follows.

FID: 0.019, conf. 0.000
Diversity: 9.634, conf. 0.098
TOP1: 0.509, conf. 0.003, TOP2. 0.701, conf. 0.003, TOP3. 0.795, conf. 0.003
Matching: 2.998, conf. 0.007
MAE:0.029, conf.0.000

Ours

t2m_TemporalVAE_qf2_hd256_fd512_nl2_nh8_klw0.0001_0830015023

	FID: 0.010, conf. 0.000
	Diversity: 8.373, conf. 0.068
	TOP1: 0.444, conf. 0.002, TOP2. 0.638, conf. 0.003, TOP3. 0.744, conf. 0.002
	Matching: 3.390, conf. 0.006
	MAE:0.013, conf.0.000

t2m_TemporalVAE_qf2_hd256_fd512_nl2_nh8_klw0.0001_0830015023

FID: 0.011, conf. 0.000
Diversity: 8.542, conf. 0.064
TOP1: 0.439, conf. 0.003, TOP2. 0.635, conf. 0.002, TOP3. 0.741, conf. 0.002
Matching: 3.408, conf. 0.007
MAE:0.014, conf.0.000

t2m_TemporalVAE_qf2_hd128_fd512_nl2_nh8_klw0.0001_0915193643

FID: 0.018, conf. 0.000
Diversity: 8.480, conf. 0.088
TOP1: 0.438, conf. 0.002, TOP2. 0.634, conf. 0.002, TOP3. 0.744, conf. 0.002
Matching: 3.423, conf. 0.007
MAE:0.019, conf.0.000

t2m_TemporalVAE_qf2_hd128_fd512_nl2_nh8_klw0.0001_bsz1024_0921205552

FID: 0.008, conf. 0.000
Diversity: 8.410, conf. 0.063
TOP1: 0.437, conf. 0.002, TOP2. 0.634, conf. 0.002, TOP3. 0.740, conf. 0.002
Matching: 3.423, conf. 0.006
MAE:0.017, conf.0.000
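To compare runs without reading the logs by eye, the pasted blocks can be parsed into numbers. The sketch below is a convenience script and not part of momask-codes; `METRIC_RE`, `parse_metrics`, and `mean_delta` are names invented here, and the regular expression assumes the exact `NAME: mean, conf. value` layout seen in this note (including the `TOP2.` and `MAE:0.029` variants).

```python
import re

# Matches pairs such as "FID: 0.019, conf. 0.000", "TOP2. 0.701, conf. 0.003"
# and "MAE:0.029, conf.0.000".
METRIC_RE = re.compile(
    r"([A-Za-z]+\d*)\s*[:.]\s*(\d+\.\d+),\s*conf\.\s*(\d+\.\d+)"
)


def parse_metrics(text):
    """Map metric name -> (mean, confidence value) for one pasted log block."""
    return {name: (float(mean), float(conf))
            for name, mean, conf in METRIC_RE.findall(text)}


def mean_delta(ours, baseline):
    """Difference of means (ours minus baseline) for metrics present in both."""
    return {name: round(ours[name][0] - baseline[name][0], 3)
            for name in baseline if name in ours}


# Released RVQ numbers, copied from this note.
rvq = parse_metrics("""
FID: 0.019, conf. 0.000
Diversity: 9.634, conf. 0.098
TOP1: 0.509, conf. 0.003, TOP2. 0.701, conf. 0.003, TOP3. 0.795, conf. 0.003
Matching: 2.998, conf. 0.007
MAE:0.029, conf.0.000
""")

# Last run in this note (..._bsz1024_0921205552).
ours = parse_metrics("""
FID: 0.008, conf. 0.000
Diversity: 8.410, conf. 0.063
TOP1: 0.437, conf. 0.002, TOP2. 0.634, conf. 0.002, TOP3. 0.740, conf. 0.002
Matching: 3.423, conf. 0.006
MAE:0.017, conf.0.000
""")

for name, delta in mean_delta(ours, rvq).items():
    print(f"{name:>9}: {delta:+.3f}")
```

On the numbers recorded above, this run has lower FID (0.008 vs 0.019) and MAE (0.017 vs 0.029) than the released RVQ, while TOP1 to TOP3 and Diversity are lower and Matching is higher.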