hUwL (score 4 / conf 3)
Major Weaknesses:
- The claim of "scene-awareness without sacrificing semantic diversity" is overstated given the reported numbers: RP@3 and FID are both worse than the MDM baseline.
- Without training with the text-scene-motion triplet, how does the model handle scene-related text prompts? For example, is the model able to generate motions properly with a prompt like "Go and sit on the chair"? This should be discussed as a potential limitation of the proposed method.
Justification Of Preliminary Recommendation:
The proposed method is well-motivated and technically sound. However, the paper should not overstate its claims and should discuss its limitations properly.
<aside>
📝
without sacrificing → we meant to emphasize that the tradeoff is small / will reword
limitation → summarize the cases that do not work, share them, and say we will add this in the revision
</aside>
Qg3t (score 4 / conf 3)
Major Weaknesses:
- Using inbetweening as the proxy task, the paper's central design decision, is never rigorously motivated. Other text-free alternatives (masked completion, trajectory prediction, motion denoising) could serve the same role. Tab. 5 shows the pipeline works, but not that inbetweening is the best choice.
- The evaluation set is assembled by matching HML3D pairs with trajectory positions sampled from TRUMANS, which may favor SceneAdapt. Evaluation on an independently collected scene set would strengthen validity.
- Missing comparison with SceneMI [20]. SceneMI directly addresses scene-aware inbetweening using scene-motion data, highly overlapping with this work's problem formulation.
<aside>
📝
Evaluate on a different evaluation set? Which one should we use?
Run the SceneMI evaluation and report it as an additional result?
Justification: address together with the points below
</aside>
Alternatives