Todo [JS]
조정빈's To-Dos
Todos
ToDos(10/1 - 10/7)
ToDos (10/8~10/11)
ToDos (~10/24)
Refining Human Motions via Rectified Flow in Various Domains
🏗️ Find Best Architecture
Our architecture is built on frozen VQVAEs. By transporting the VQVAE output distribution to the GT distribution, we expect higher-fidelity motions. We also expect this improvement to transfer to the VQVAE-based Stage 2.
- Rectified Flow Architecture → Fixed to DiT
- [ ] Encoder Noise (for generalization) → Not very effective yet.
- [ ] Decoder Noise (for Rectified Flow) → Large noise hurts at inference time.
- [ ] Maybe come up with other augmentations to close the gap between the Transformer output distribution and the VQVAE output distribution.
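A minimal sketch of the refinement idea above, assuming the standard rectified flow formulation (straight-line interpolation between a source and a target sample, with a constant velocity target). The names `interpolate_pair`, `rf_loss`, `euler_refine`, `velocity_fn`, and `noise_std` are hypothetical and not taken from our codebase; `velocity_fn` stands in for the DiT, and `noise_std` stands in for the decoder noise item.

```python
import numpy as np

def interpolate_pair(x0, x1, t):
    """Build the rectified flow training pair.

    x0: VQVAE-decoded motion, x1: GT motion, both shaped (B, T, D).
    t:  per-sample time in [0, 1], shaped (B,).
    """
    t = t.reshape(-1, 1, 1)
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight path from x0 to x1
    target_v = x1 - x0              # constant velocity along that path
    return x_t, target_v

def rf_loss(velocity_fn, x0, x1, rng, noise_std=0.0):
    """MSE between predicted and target velocity at a random time.

    noise_std > 0 perturbs the source motion (assumed form of "decoder noise").
    """
    x0 = x0 + noise_std * rng.standard_normal(x0.shape)
    t = rng.uniform(0.0, 1.0, size=x0.shape[0])
    x_t, target_v = interpolate_pair(x0, x1, t)
    pred_v = velocity_fn(x_t, t)
    return float(np.mean((pred_v - target_v) ** 2))

def euler_refine(velocity_fn, x0, num_steps=10):
    """Refine a VQVAE output by integrating the learned ODE from t=0 to t=1."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

At inference time only `euler_refine` is needed; whether noise should also be injected there is exactly the open question in the decoder noise item.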
Conditioning (Condition on Text → Not really possible)
- [ ] Condition on Codebook Latent → Recover information lost in the VQVAE Decoder (JB)
- [ ] Condition on something else (Think of some)
- [ ] VQVAE FID → T2M FID (Which transfers best?)
- [ ] Decoder RF / Conditional RF
📊 Metrics to support our idea
Our method is built on the idea that an MSE-based loss recovers semantic details well but cannot capture high-frequency details because of its smoothing nature.
- [ ] To show improvements in such perceptual qualities, we may have to find (or come up with) metrics other than FID that align well with human perception.
- [x] Implement Jitter (Jerk) and compare with our baseline models
- Results → RF tends to make motions sharper (Expected…?)
- [ ] Qualitative results: how do we show that the motions actually got better?
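For reference, a minimal sketch of a jerk-based jitter metric, assuming jerk is taken as the third finite difference of joint positions over time. The function name `mean_jerk`, the `(T, J, 3)` layout, and the default `fps=20.0` are assumptions for illustration, not our implemented version.

```python
import numpy as np

def mean_jerk(motion, fps=20.0):
    """Mean jerk magnitude of a motion clip.

    motion: joint positions shaped (T, J, 3), with T >= 4 frames.
    fps:    frame rate used to convert frame differences to time derivatives.
    """
    dt = 1.0 / fps
    # Third-order finite difference along the time axis approximates jerk.
    jerk = np.diff(motion, n=3, axis=0) / dt**3
    # Per-joint, per-frame jerk magnitude, averaged over the whole clip.
    return float(np.linalg.norm(jerk, axis=-1).mean())
```

A higher value means more high-frequency content, so a sharper refined motion and a jittery one both raise it; the number is best read next to the GT value rather than on its own.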
🔬 Experiments for Different Domains
Our method's strength is that it can be easily applied on top of almost any VQVAE-based task. We expect at least a small improvement wherever our method is applied.
- We must fix our code baselines (ASAP)
- [ ] Co-speech motion Generation → (MG)
- [ ] Music Driven Dance Generation → (JS)
- [ ] Multitask, Multimodal Motion Generation → (JS)
- [ ] Text to Motion
→ The best hyperparameters and minor details may vary across tasks.
- What should we show??
📝 Paper Writing
Since our idea may face criticism such as "What is it actually doing?" or "Is the performance gain just marginal?", we have to be solid and give strong reasons for why this is valuable.
- Idea Cleanup
- [ ] Abstract and Introduction → What is our message? / Why is it valuable? (JW)
- [ ] How are we going to sell this well?
- Related Works
- Write down things we probably won't recall right away (whether papers themselves or theory)
- Experiments
- [ ] Ablation study settings
- [ ] Compare with diffusion based refinement?
🥽 Visualization
🧹Code Cleanup
- Clean up the GitHub repo and code → could just create a new branch entirely