
The Future of AI in Film and Robotics

In recent years, machine learning models have made significant strides in autonomously generating various types of content. These models have opened new possibilities in filmmaking and robotics, for instance by producing synthetic datasets for training algorithms. While some existing models excel at generating realistic or artistic images from text descriptions, developing AI that can generate videos of moving human figures from human instructions has proven a greater challenge.

Researchers at BIGAI and Peking University have recently introduced a promising new framework that addresses this challenge. The framework, presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, builds on previous work known as HUMANIZE. The team's goal was to improve the model's ability to generalize to novel scenes and instructions, such as creating realistic motions in response to specific prompts.

The Two-Stage Approach

The new framework operates in two stages: an Affordance Diffusion Model (ADM) that predicts an affordance map, followed by an Affordance-to-Motion Diffusion Model (AMDM) that generates human motion conditioned on the text description and the pre-computed affordance map. By deriving affordance maps from the distance field between human skeleton joints and scene surfaces, the model effectively links 3D scene grounding with conditional motion generation.
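The core geometric idea above, an affordance representation built from distances between skeleton joints and scene surfaces, can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name, array shapes, and the use of raw Euclidean distance are assumptions made for clarity.

```python
import numpy as np

def affordance_map(joint_positions, scene_points):
    """Hypothetical sketch of a distance-field affordance map.

    joint_positions: (J, 3) array of 3D skeleton joint coordinates.
    scene_points:    (S, 3) array of points sampled on scene surfaces.

    Returns a (J, S) matrix of Euclidean distances from each joint to
    each surface point; low values highlight scene regions a joint can
    plausibly contact, grounding the motion in the 3D scene.
    """
    # Broadcast to (J, S, 3) pairwise differences, then take norms.
    diff = joint_positions[:, None, :] - scene_points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy usage: two joints (e.g. head and foot) against three surface points.
joints = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.0, 0.0]])
scene = np.array([[0.0, 0.0, 0.0],   # floor point under the skeleton
                  [1.0, 0.0, 0.0],   # floor point one meter away
                  [0.0, 1.0, 1.0]])  # point on a nearby surface
dist = affordance_map(joints, scene)  # shape (2, 3)
```

In the paper's pipeline this kind of map would be predicted by the ADM and then passed, together with the text prompt, to the AMDM as a conditioning signal for motion generation.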

One of the key advantages of this framework is its ability to clearly delineate the scene region associated with a user's description or prompt. This enhanced 3D grounding allows the model to generate convincing motions with minimal training data. Additionally, the affordance maps encode the geometric relationship between scenes and motions, which helps the model generalize across diverse scene geometries.

The research conducted by Zhu and his colleagues showcases the potential of conditional motion generation models that incorporate scene affordances. The team anticipates that their model and approach will inspire innovation within the generative AI research community. The model could potentially be further refined and applied to real-world problems, such as producing animated films using AI or generating synthetic training data for robotics applications. Future research will focus on addressing data scarcity through improved collection and annotation strategies for human-scene interaction data.
