We present a novel pipeline for learning high-quality triangular human avatars from multi-view videos. Recent methods for avatar learning are typically based on neural radiance fields (NeRF), which are not compatible with the traditional graphics pipeline and pose great challenges for operations like editing or rendering under different environments. To overcome these limitations, our method represents the avatar with an explicit triangular mesh extracted from an implicit SDF field, complemented by an implicit material field conditioned on given poses. Leveraging this triangular avatar representation, we incorporate physics-based rendering to accurately decompose geometry and texture. To enhance both the geometric and appearance details, we further employ a 2D UNet as the network backbone and introduce pseudo-normal ground truth as additional supervision. Experiments show that our method can learn triangular avatars with high-quality geometry reconstruction and plausible material decomposition, inherently supporting editing, manipulation, and relighting operations.
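To make the hybrid representation concrete, below is a minimal sketch (PyTorch + scikit-image, not the authors' code) of the idea described above: marching cubes extracts an explicit triangle mesh from an SDF (here an analytic sphere stands in for the learned SDF network), and a pose-conditioned MLP standing in for the implicit material field is queried at the mesh vertices for PBR material parameters. All module and parameter names (`MaterialField`, `pose_dim`, the SMPL-style 72-dim pose vector) are illustrative assumptions, not the paper's actual interfaces.

```python
# Sketch: explicit mesh from an implicit SDF + pose-conditioned material queries.
import numpy as np
import torch
import torch.nn as nn
from skimage.measure import marching_cubes


def sdf_sphere(pts: np.ndarray, radius: float = 0.5) -> np.ndarray:
    """Analytic SDF standing in for a learned SDF network."""
    return np.linalg.norm(pts, axis=-1) - radius


def extract_mesh(sdf_fn, res: int = 64, bound: float = 1.0):
    """Run marching cubes on a dense SDF grid to get an explicit triangle mesh."""
    xs = np.linspace(-bound, bound, res)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sdf = sdf_fn(grid.reshape(-1, 3)).reshape(res, res, res)
    verts, faces, _, _ = marching_cubes(sdf, level=0.0)
    verts = verts / (res - 1) * 2 * bound - bound  # grid indices -> world coords
    return verts.astype(np.float32), faces.astype(np.int64)


class MaterialField(nn.Module):
    """Hypothetical pose-conditioned material MLP: (x, pose) -> albedo, roughness."""

    def __init__(self, pose_dim: int = 72, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 albedo channels + 1 roughness
        )

    def forward(self, x: torch.Tensor, pose: torch.Tensor):
        h = self.mlp(torch.cat([x, pose.expand(x.shape[0], -1)], dim=-1))
        return torch.sigmoid(h[:, :3]), torch.sigmoid(h[:, 3:4])


verts, faces = extract_mesh(sdf_sphere)
material = MaterialField()
pose = torch.zeros(1, 72)  # e.g. an SMPL-style pose vector
albedo, roughness = material(torch.from_numpy(verts), pose)
print(verts.shape, faces.shape, albedo.shape, roughness.shape)
```

The extracted mesh can then be fed to any standard rasterizer or path tracer, which is what makes this representation compatible with the traditional graphics pipeline in a way NeRF-style volumes are not.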
Comparison with state-of-the-art methods:

| | Ours | AvatarReX | Animatable Gaussians | Animatable Gaussians* | Xu et al. | Lin et al. | Intrinsic Avatar |
|---|---|---|---|---|---|---|---|
| Representation | hybrid | SDF | 3DGS | 3DGS | SDF | SDF | SDF |
| Relightable? | ✔ | | | ✔ | ✔ | ✔ | ✔ |
| Training Time (~100 frames) | ~3h | | | | | 2.5 days | 4h (mono.) |
| Training Time (~1000 frames) | ~16h | 2 days | 2 days (RTX 4090) | 2 days (RTX 4090) | 30h | | |
| Inference Time (per image) | 180ms | 30s | 100ms | 4~10s | 5s | 40s | 20s |
@misc{chen2024meshavatar,
title={MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos},
author={Yushuo Chen and Zerong Zheng and Zhe Li and Chao Xu and Yebin Liu},
year={2024},
eprint={2407.08414},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.08414},
}