This paper introduces NymeriaHMG, a scene-aware human motion generation benchmark and dataset that has more clips, longer total duration, and more scenes than previous benchmarks, along with free-form text annotations. NymeriaHMG is derived from the original Nymeria dataset; the authors apply processing steps that make it suitable for scene-aware human motion generation. The authors also introduce scenemask, an extension of MoMask that additionally takes the scene as input.

The motivation of this paper is timely. Text-conditioned, scene-aware 3D human motion generation lacks a large-scale dataset that combines diverse scenes with diverse human motion semantics. Current scene-text-motion triplet datasets are either synthetic (HUMANISE), small, or cover only a handful of motion types such as "get up", "sit down", and "lie down". NymeriaHMG, being derived from Nymeria, is not synthetic, is large, and covers diverse motions.

1. Lack of contribution and novelty