NeurIPS review

This paper introduces NymeriaHMG, a scene-aware human motion generation benchmark and dataset. Compared with previous benchmarks, it has more clips, a longer total duration, and more scenes, and it comes with free-form text. NymeriaHMG is derived from the original Nymeria dataset, and the authors apply processing steps that make it suitable for scene-aware human motion generation. The authors also introduce SceneMask, an extension of MoMask that additionally takes the scene as input.

The motivation of this paper is timely. Text-conditioned, scene-aware 3D human motion generation lacks a large-scale dataset with both diverse scenes and diverse motion semantics. Current scene-text-motion triplet datasets have one of three limitations. Some are synthetic (HUMANISE). Some are small. Others cover only a handful of motion types, such as "get up", "sit down", and "lie down". NymeriaHMG is derived from Nymeria, so it is real rather than synthetic, it is large, and its motions are diverse.

1. Lack of contribution and novelty

NymeriaHMG: NymeriaHMG is obtained by processing Nymeria. However, the processing steps are simple and read more like implementation details than contributions. For motion processing, the authors use the Atomic Action annotations as text, together with their timestamp annotations. This is not a contribution. For scene processing, the authors 1) crop, 2) remove outliers, 3) downsample, 4) augment floor points, and 5) canonicalize the scenes. These are preprocessing implementation details, not a contribution.
SceneMask: SceneMask is MoMask with an added scene input. This is not a contribution.