This paper introduces NymeriaHMG, a scene-aware human motion generation benchmark and dataset that has more clips, a longer total duration, and more scenes than previous benchmarks, along with free-form text annotations. NymeriaHMG is derived from the original Nymeria dataset; the authors apply processing steps that make it suitable for scene-aware human motion generation. The authors also introduce scenemask, an extension of MoMask that additionally takes the scene as input.
The motivation of this paper is timely. Text-conditioned, scene-aware 3D human motion generation lacks a large-scale dataset that combines diverse scenes with diverse human motion semantics. Current scene-text-motion triplet datasets are either synthetic (HUMANISE), small, or limited to a handful of motion types such as "get up", "sit down", and "lie down". Because NymeriaHMG is derived from Nymeria, it is real rather than synthetic, large, and covers diverse motions.