Human motion generation based on diverse conditional signals has become a key focus in computer vision, especially due to its significance in real-world applications such as Virtual Reality (VR), animation, and gaming. For generated motion to integrate seamlessly into such real-world applications, it must meet two essential criteria: 1) accurate alignment with the conditional signal and 2) a natural quality that causes no sense of artificiality or discomfort for human observers.

While the human motion generation task has seen substantial progress recently, most methods tend to focus on one of these criteria while overlooking the other. Motivated by the observation that neither criterion can be neglected, we present a new framework that effectively addresses both.

Since motion is inherently a continuous modality, human motion generation based on continuous representations is naturally well-suited to producing smooth, uninterrupted movement. However, learning such continuous representations from scratch requires substantial data, which is not yet available in the human motion domain, and in such data-limited scenarios the results may fall short of satisfactory quality.

We find that the discrete representations produced by discrete-quantization-based methods can serve as a strong condition for building a continuous representation space. Exploiting the ability of score-based methods to construct continuous representations effectively through iterative procedures, NAME iteratively refines the motions generated by discrete-quantization-based methods.
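
The iterative-refinement idea can be illustrated with a minimal, self-contained sketch. Everything here is an assumption for illustration: `score_fn` stands in for a learned score network, and the perturb-then-denoise loop follows a generic SDEdit-style Langevin scheme, not the exact procedure of NAME.

```python
import numpy as np

def refine(coarse_motion, score_fn, num_steps=50, noise_scale=0.3, seed=0):
    # Perturb the coarse (discretely generated) motion with Gaussian noise,
    # then take small Langevin-like steps along score_fn, an estimate of
    # grad log p(x) under the distribution of natural motion.
    # Hypothetical, simplified scheme for illustration only.
    rng = np.random.default_rng(seed)
    x = coarse_motion + noise_scale * rng.normal(size=coarse_motion.shape)
    step = noise_scale ** 2 / num_steps
    for _ in range(num_steps):
        x = x + step * score_fn(x)
    return x
```

With a real score network the loop would also anneal the noise level and step size over iterations; the fixed step here only serves to show that the iterates move toward higher-density, smoother motion.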

We note that our method can be applied to any discrete-based motion generation framework, regardless of the conditioning signal, to enhance performance. Through extensive experiments, we demonstrate that our method can be seamlessly integrated into existing approaches and achieves state-of-the-art (SOTA) performance in most cases. In addition to the conventional metrics commonly used in existing tasks, we conduct an extensive analysis using an additional metric, Jitter \cite{yi2021transpose}, the derivative of acceleration, which effectively measures physical naturalness. This analysis shows that our method not only improves performance on conventional metrics but also outperforms existing approaches in terms of naturalness.
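
Since Jitter is the magnitude of jerk (the time derivative of acceleration), it can be computed directly from joint positions with finite differences. A minimal sketch, assuming motion stored as a (frames, joints, 3) array; the function name and the fps default are illustrative assumptions, not taken from \cite{yi2021transpose}:

```python
import numpy as np

def jitter(joints, fps=30.0):
    """Mean jerk magnitude of a joint-position sequence.

    joints: array of shape (T, J, 3) -- frames x joints x xyz.
    The third finite difference of position approximates jerk
    (the derivative of acceleration); fps**3 rescales the
    per-frame difference to physical units.
    """
    jerk = np.diff(joints, n=3, axis=0) * fps ** 3  # (T-3, J, 3)
    return np.linalg.norm(jerk, axis=-1).mean()
```

Lower values indicate physically smoother motion: a constant-velocity trajectory has (numerically) zero jitter, while frame-wise noise inflates it sharply.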

We achieve SOTA performance in most existing human motion generation scenarios, while also addressing the problem of generating natural human motion, which is overlooked by SOTA discrete-based generation methods.

However, we found that existing ground-truth-based metrics alone are insufficient for judging the physical naturalness of motion. To address physical naturalness, therefore, in addition to the conventional metrics commonly used in existing tasks, we perform a detailed analysis using an additional metric, Jitter.