Rebuttal (2)

hUwL (score 4 / conf 3)

Major Weaknesses:

The claim of "scene-awareness without sacrificing semantic diversity" is overstated given the reported numbers: RP@3 and FID are worse than those of the MDM baseline.
Without training on text-scene-motion triplets, how does the model handle scene-related text prompts? For example, can the model properly generate motions for a prompt like "Go and sit on the chair"? This should be discussed as a potential limitation of the proposed method.

Justification Of Preliminary Recommendation:

The proposed method is well-motivated and technically sound. However, the paper should not overstate its claims and should discuss its limitations properly.

<aside> 📝

without sacrificing → we meant to emphasize that the tradeoff is small / will revise the wording

limitation → summarize the cases that don't work and say we will address them in the revision

</aside>

Major Weaknesses:

Using inbetweening as the proxy task, the paper's central design decision, is never rigorously motivated. Other text-free alternatives (masked completion, trajectory prediction, motion denoising) could serve the same role. Tab. 5 shows the pipeline works, but not that inbetweening is the best choice.
The evaluation set is assembled by matching HML3D pairs with trajectory positions sampled from TRUMANS, which may favor SceneAdapt. Evaluation on an independently collected scene set would strengthen validity.
Missing comparison with SceneMI [20]. SceneMI directly addresses scene-aware inbetweening using scene-motion data, which overlaps heavily with this work's problem formulation.

<aside> 📝

Evaluate on a different evaluation set? Which one should we use?

Evaluate SceneMI and report the results additionally?

Address the justification together below.

</aside>

Alternatives