adapting MAR baseline on random VAE settings
Reformatting data so that it fits our motion data
Motion Data ⇒ $T\times263$ since SMPL has $j=22$ joints
$\dot{r}^{a} \isin \mathbb{R}$ : root angular velocity along the y-axis
$\dot{r}^{x}\isin \mathbb{R}$ : root linear velocity along the x-axis
$\dot{r}^{z}\isin \mathbb{R}$ : root linear velocity along the z-axis
${r}^{y}\isin \mathbb{R}$ : root height
$\bold{j}^p \isin \mathbb{R}^{(j - 1)*3}$ : local joint position (in the root space)
$\bold{j}^v\isin \mathbb{R}^{j*3}$ : local joint velocity (in the root space)
$\bold{j}^r\isin \mathbb{R}^{(j-1)*6}$ : local joint rotation (in the root space)
$\bold{c}^{f}\isin \mathbb{R}^{4}$ : foot contact
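As a sanity check on the $T\times263$ figure, the component sizes listed above can be summed for $j=22$. The sketch below is only bookkeeping: the names `feature_layout` and `feature_slices` and the component keys are hypothetical, and the block order simply follows the list above, which is an assumption. The dataset's actual layout may order the blocks differently, so check it before slicing real data.

```python
def feature_layout(num_joints=22):
    """Return (name, size) pairs for one motion frame, in the order listed above.

    The ordering is an assumption taken from the notes, not a verified dataset layout.
    """
    j = num_joints
    return [
        ("root_ang_vel_y", 1),       # root angular velocity about the y-axis
        ("root_lin_vel_x", 1),       # root linear velocity along x
        ("root_lin_vel_z", 1),       # root linear velocity along z
        ("root_height", 1),          # root height (y)
        ("joint_pos", (j - 1) * 3),  # local joint positions, root excluded
        ("joint_vel", j * 3),        # local joint velocities, root included
        ("joint_rot", (j - 1) * 6),  # 6D local joint rotations, root excluded
        ("foot_contact", 4),         # binary foot contact labels
    ]


def feature_slices(num_joints=22):
    """Map each component name to its slice within the per-frame vector."""
    slices, start = {}, 0
    for name, size in feature_layout(num_joints):
        slices[name] = slice(start, start + size)
        start += size
    return slices, start  # start now equals the total feature dimension


slices, total_dim = feature_slices(22)
print(total_dim)  # 263
```

With $j=22$ the sizes are $4 + 63 + 66 + 126 + 4 = 263$, which matches the per-frame dimension stated above.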
Text Conditioned Motion Generation.
Sampling Order
Which VAE?
Normal Temporal VAE
Input $T\times263\rightarrow T/l\times263$, where $l$ is the temporal downsampling factor
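To make the shape change concrete, here is a minimal sketch that shortens the time axis by a factor $l$ while keeping the 263 feature channels. It uses plain mean-pooling as a stand-in for the learned encoder, which is an assumption made only to show the shapes. The function name `downsample_time` is hypothetical, and a real temporal VAE would use learned (for example strided convolutional) layers and output a mean and variance.

```python
def downsample_time(motion, l):
    """Mean-pool non-overlapping windows of l frames along the time axis.

    motion: list of T frames, each a list of D floats.
    Returns T // l pooled frames; a trailing partial window is dropped.
    This is a placeholder for a learned encoder, not a VAE.
    """
    num_windows = len(motion) // l
    pooled = []
    for w in range(num_windows):
        window = motion[w * l:(w + 1) * l]
        # zip(*window) iterates over feature channels across the l frames
        pooled.append([sum(channel) / l for channel in zip(*window)])
    return pooled


T, D, l = 8, 263, 4
motion = [[float(t)] * D for t in range(T)]  # dummy data: frame t filled with value t
latent = downsample_time(motion, l)
print(len(latent), len(latent[0]))  # 2 263
```

Note that $T$ must be padded or cropped to a multiple of $l$ if no frames should be dropped.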
Can we extend the Transformer VAE?
→ Need to read MLD
Why did MLD adopt that architecture in the first place?
Question: is motion more like image or text? Do we actually need to avoid VQ?
If we push "doing it without VQ" too hard as the main point, we could get attacked with "so how is this any different?"