As diffusion models come to dominate visual content generation, efforts have been made to adapt them to multi-view image generation for creating 3D content. Traditionally, these methods learn 3D consistency only implicitly by generating RGB frames alone, which can lead to artifacts and inefficiencies in training. In contrast, we propose generating Normalized Coordinate Space (NCS) frames alongside RGB frames. NCS frames capture each pixel’s global coordinate, providing strong pixel correspondence and explicit supervision for 3D consistency. Additionally, by jointly modeling RGB and NCS frames during training, our approach can infer their conditional distributions at inference time through an inpainting strategy applied during denoising. For example, given ground-truth RGB frames, we can inpaint the NCS frames and estimate camera poses, enabling camera estimation from unposed images. We train our model on a diverse set of datasets. Through extensive experiments, we demonstrate its capacity to integrate multiple 3D-related tasks into a unified framework, setting a new benchmark for foundational 3D models.
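To make the inpainting-based conditional inference concrete, the sketch below shows one standard way such conditioning can be realized during denoising (in the spirit of RePaint-style inpainting): the known RGB frames are re-noised to the current timestep and substituted into the sample at every reverse step, while the NCS frames are denoised freely. The function name `sample_ncs_given_rgb`, the channel layout (RGB in channels 0:3, NCS in 3:6), the plain DDPM schedule, and the `model(x, t)` interface are illustrative assumptions, not the paper's exact formulation.

```python
import torch

@torch.no_grad()
def sample_ncs_given_rgb(model, rgb_gt, betas):
    """Hypothetical RePaint-style conditional sampling sketch.

    rgb_gt: (V, 3, H, W) ground-truth RGB frames for V views.
    model(x, t) is assumed to predict the noise added to the joint
    (V, 6, H, W) stack of RGB (channels 0:3) and NCS (channels 3:6) frames.
    """
    alphas = 1.0 - betas
    alphas_cum = torch.cumprod(alphas, dim=0)
    V, _, H, W = rgb_gt.shape
    x = torch.randn(V, 6, H, W)                     # start from pure noise
    for t in reversed(range(len(betas))):
        a_t, a_bar = alphas[t], alphas_cum[t]
        # Keep the RGB channels consistent with the known frames by
        # replacing them with a forward-noised version of the ground truth.
        noise = torch.randn_like(rgb_gt)
        x[:, :3] = a_bar.sqrt() * rgb_gt + (1 - a_bar).sqrt() * noise
        # Standard DDPM reverse step on the full RGB+NCS stack.
        eps = model(x, torch.full((V,), t, dtype=torch.long))
        mean = (x - (1 - a_t) / (1 - a_bar).sqrt() * eps) / a_t.sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x[:, 3:]                                 # inpainted NCS frames
```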
Figure 1: Pipeline of the proposed World-consistent Video Diffusion Model.
- † The Chinese University of Hong Kong
- ‡ Work done while at Apple