Researchers at BIGAI and Peking University have recently introduced a promising new framework that addresses this challenge. The framework, presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, builds on previous work known as HUMANISE. The team's goal was to improve the model's ability to generalize to new settings, such as producing realistic motions in unseen scenes in response to specific prompts.
The new framework operates in two stages: an Affordance Diffusion Model (ADM) that predicts an affordance map, and an Affordance-to-Motion Diffusion Model (AMDM) that generates human motion conditioned on the language description and the predicted affordance. By deriving affordance maps from the distance field between human skeleton joints and scene surfaces, the model effectively links 3D scene grounding with conditional motion generation.
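To make the affordance-map idea concrete, here is a minimal sketch of how a distance-field-based affordance score could be computed for a set of scene points. The function name, the exponential kernel, and the `sigma` parameter are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def affordance_map(scene_points, joints, sigma=0.5):
    """Toy affordance map: for each scene point, take the distance to the
    nearest skeleton joint and pass it through an exponential kernel so
    that surfaces close to the body score near 1. Illustrative only."""
    # (N, J) pairwise Euclidean distances between scene points and joints
    d = np.linalg.norm(scene_points[:, None, :] - joints[None, :, :], axis=-1)
    nearest = d.min(axis=1)            # distance to the closest joint
    return np.exp(-nearest / sigma)    # high affordance near the body

# Example: 4 scene points, 2 skeleton joints (hypothetical coordinates)
scene = np.array([[0., 0., 0.], [1., 0., 0.], [0., 2., 0.], [5., 5., 5.]])
joints = np.array([[0., 0., 0.], [1., 1., 0.]])
m = affordance_map(scene, joints)
```

In this sketch, a point coinciding with a joint scores 1.0, while distant points decay toward 0, giving the motion model a dense, geometry-aware signal about where interaction is plausible.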
One of the key advantages of this framework is its ability to clearly delineate the scene region associated with a user's description or prompt. This enhanced 3D grounding allows the model to generate convincing motions with minimal training data. Additionally, the use of affordance maps captures the geometric relationship between scenes and motions, facilitating generalization across diverse scene geometries.
The research conducted by Zhu and his colleagues showcases the potential of conditional motion generation models that incorporate scene affordances. The team anticipates that their model and approach will inspire innovation within the generative AI research community. The model could potentially be further refined and applied to real-world problems, such as producing animated films using AI or generating synthetic training data for robotics applications. Future research will focus on addressing data scarcity through improved collection and annotation strategies for human-scene interaction data.