To make 3D human avatars widely available, we must be able to generate a variety of 3D virtual humans with varied identities and shapes in arbitrary poses. This task is challenging due to the diversity of clothed body shapes, their complex articulations, and the resulting rich, yet stochastic, geometric detail in clothing. Hence, current methods for representing 3D people do not provide a full generative model of people in clothing. In this paper, we propose a novel method that learns to generate detailed 3D shapes of people in a variety of garments with corresponding skinning weights. Specifically, we devise a multi-subject forward skinning module that is learned from only a few posed, un-rigged scans per subject. To capture the stochastic nature of high-frequency details in garments, we leverage an adversarial loss formulation that encourages the model to capture the underlying statistics. We provide empirical evidence that this leads to realistic generation of local details such as wrinkles. We show that our model is able to generate natural human avatars wearing diverse and detailed clothing. Furthermore, we show that our method can be used for the task of fitting human models to raw scans, outperforming the previous state-of-the-art.
To generate diverse 3D humans, we build an implicit, multi-subject articulated model. We model clothed human shapes and detailed surface normals in a pose-independent canonical space via a neural implicit surface representation conditioned on latent codes.
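For illustration, below is a minimal PyTorch sketch of such a latent-conditioned implicit network in canonical space; the MLP layout, layer widths, latent dimension, and output parameterization are illustrative assumptions rather than the exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CanonicalImplicitShape(nn.Module):
    """Sketch of an implicit surface in canonical space, conditioned on a
    per-subject latent code (all dimensions here are illustrative)."""

    def __init__(self, latent_dim=64, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden_dim), nn.Softplus(beta=100),
            nn.Linear(hidden_dim, hidden_dim), nn.Softplus(beta=100),
            nn.Linear(hidden_dim, hidden_dim), nn.Softplus(beta=100),
            nn.Linear(hidden_dim, 1 + 3),  # occupancy logit + a 3D normal prediction
        )

    def forward(self, x_canonical, z_shape):
        # x_canonical: (B, N, 3) query points in canonical space
        # z_shape:     (B, latent_dim) latent code, broadcast to all points
        z = z_shape[:, None, :].expand(-1, x_canonical.shape[1], -1)
        out = self.mlp(torch.cat([x_canonical, z], dim=-1))
        occupancy, normal = out[..., :1], out[..., 1:]
        return occupancy, F.normalize(normal, dim=-1)
```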
In previous work, learning shape and skinning requires either direct supervision or many poses of the same subject, which dramatically limits the amount and quality of usable data. With our proposed multi-subject forward skinning module, our method learns shape and skinning weights jointly from "crowd-sampled" poses, i.e., one or very few poses per subject across many subjects. As a result, our method can leverage high-quality commercial scans with varying topology and only one or a few poses per subject, which was not possible before.
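To make the idea concrete, here is a minimal sketch of forward linear blend skinning with learned skinning weights; the number of bones, network size, and conditioning on the shape code are assumptions for illustration, not the exact module used in the paper.

```python
import torch
import torch.nn as nn

class ForwardSkinning(nn.Module):
    """Sketch: predict per-point skinning weights in canonical space and warp
    points to posed space via linear blend skinning (dimensions illustrative)."""

    def __init__(self, num_bones=24, latent_dim=64, hidden_dim=128):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_bones),
        )

    def forward(self, x_canonical, z_shape, bone_transforms):
        # x_canonical:     (B, N, 3) points in canonical space
        # z_shape:         (B, latent_dim) latent code
        # bone_transforms: (B, num_bones, 4, 4) rigid bone transforms
        z = z_shape[:, None, :].expand(-1, x_canonical.shape[1], -1)
        w = torch.softmax(self.weight_net(torch.cat([x_canonical, z], -1)), -1)  # (B, N, J)
        # Blend bone transforms per point, then apply to homogeneous coordinates.
        T = torch.einsum("bnj,bjkl->bnkl", w, bone_transforms)                    # (B, N, 4, 4)
        x_h = torch.cat([x_canonical, torch.ones_like(x_canonical[..., :1])], -1)
        x_posed = torch.einsum("bnkl,bnl->bnk", T, x_h)[..., :3]
        return x_posed, w
```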
We found that learning high-quality surface details with a pure reconstruction loss is difficult due to the stochastic nature of wrinkles. While adversarial losses are a promising way to improve details, applying them directly in 3D is infeasible due to memory requirements. We therefore propose to learn 3D wrinkles by projecting them to 2D and applying a 2D adversarial loss to optimize our 3D detailed normal field. In this way, our method learns to produce faithful wrinkles.
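The following sketch illustrates the 2D adversarial part: a patch discriminator on normal maps with a non-saturating GAN loss. The rendering of the 3D normal field into a 2D normal map is omitted, and the discriminator architecture and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalMapDiscriminator(nn.Module):
    """Sketch of a patch discriminator on 2D normal maps (architecture illustrative)."""

    def __init__(self, channels=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base * 4, 1, 4, stride=1, padding=1),  # per-patch real/fake logits
        )

    def forward(self, normal_map):
        return self.net(normal_map)

def adversarial_losses(disc, fake_normal_map, real_normal_map):
    """Non-saturating GAN losses; `fake_normal_map` is assumed to be rendered
    from the predicted 3D normal field, `real_normal_map` from a real scan."""
    d_loss = (F.softplus(disc(fake_normal_map.detach())).mean()
              + F.softplus(-disc(real_normal_map)).mean())
    g_loss = F.softplus(-disc(fake_normal_map)).mean()
    return d_loss, g_loss
```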
Our model enables disentangled control over coarse shape, fine details, body pose and body size.
We interpolate between latent codes of training samples during animation.
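As a simple illustration (the code names and interface are assumptions), interpolation between two latent codes over an animation can look like this:

```python
import torch

def interpolate_codes(z_a, z_b, num_frames):
    """Sketch: linearly interpolate between two latent codes of training samples,
    producing one code per animation frame."""
    alphas = torch.linspace(0.0, 1.0, num_frames)
    return [(1 - a) * z_a + a * z_b for a in alphas]
```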
We can randomly sample 3D avatars with diverse identities and clothing, and animate them with pose sequences from existing motion databases.
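A minimal sketch of this sampling step is shown below; the code dimensions and the `model.generate` interface are hypothetical names used only for illustration.

```python
import torch

# Illustrative sketch of random avatar sampling (dimensions are assumptions).
torch.manual_seed(0)
num_avatars = 4
z_shape  = torch.randn(num_avatars, 64)   # coarse shape / clothing codes
z_detail = torch.randn(num_avatars, 64)   # fine detail codes

# Each sampled code pair would be decoded into an avatar and driven frame by
# frame with skeleton poses from a motion database, e.g.:
# for pose in pose_sequence:                            # poses from a motion database
#     mesh = model.generate(z_shape, z_detail, pose)    # hypothetical interface
```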
@article{chen2022gdna,
  title   = {gDNA: Towards Generative Detailed Neural Avatars},
  author  = {Chen, Xu and Jiang, Tianjian and Song, Jie and Yang, Jinlong and Black, Michael J. and Geiger, Andreas and Hilliges, Otmar},
  journal = {arXiv},
  year    = {2022}
}