Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action

33d ago · Global · primary source: huggingface.co

Multi-source synthesis by The Embedding Report from 2 sources. Every numeric and quoted claim traces to a cited source body (see methodology).

NVIDIA has released Cosmos 3, an open omni-model for physical AI reasoning and action, built on a Mixture-of-Transformers (MoT) architecture. A single unified model handles multiple input and generation modalities^[1].

Cosmos 3 is designed to help build physical AI systems that understand the real world, including motion, causality, physics, and action. It can generate realistic, physically plausible video worlds from text, image, video, or action inputs, and it can reason about physical properties such as motion and spatial relationships^[2].

The model is available in two sizes: Cosmos 3 Nano, with 8 billion parameters, optimized for efficient inference, and Cosmos 3 Super, with 32 billion parameters, designed for large-scale synthetic data generation and research^[1].

Cosmos 3 is integrated with the Hugging Face Diffusers library, so it can be used in world generation pipelines. For example, developers can use the Cosmos3OmniPipeline to generate images or videos from text prompts. NVIDIA has also released a set of Synthetic Data Generation (SDG) datasets to help the physical AI community train and evaluate world foundation models. These datasets cover several domains, including robotics, physics, and autonomous driving^[2].
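
To make the Diffusers integration mentioned above more concrete, here is a minimal sketch of how a text-to-video or text-to-image call might look. Only the class name Cosmos3OmniPipeline comes from the sources. Everything else is an assumption: that the class is exported from the top-level `diffusers` package, that it follows the usual Diffusers `from_pretrained` loading convention, and that it accepts a `prompt` keyword. The helper names `resolve_pipeline_class` and `generate_from_text` are hypothetical, and the checkpoint id is left as a parameter because the sources do not name one.

```python
import importlib


def resolve_pipeline_class(class_name="Cosmos3OmniPipeline"):
    """Return the named pipeline class from diffusers, or None if unavailable.

    Assumption: the class is exported from the top-level diffusers package.
    """
    try:
        diffusers = importlib.import_module("diffusers")
        return getattr(diffusers, class_name, None)
    except Exception:
        # diffusers is not installed, or an optional dependency failed to import
        return None


def generate_from_text(model_id, prompt):
    """Load the pipeline and run one text-conditioned generation.

    model_id is a placeholder for a checkpoint id taken from the model card.
    The prompt keyword follows the common Diffusers convention and is an
    assumption here, not a confirmed signature.
    """
    pipeline_cls = resolve_pipeline_class()
    if pipeline_cls is None:
        raise RuntimeError(
            "This diffusers install does not expose Cosmos3OmniPipeline"
        )
    pipe = pipeline_cls.from_pretrained(model_id)
    # The structure of the returned object is not described in the sources,
    # so it is passed back to the caller unmodified.
    return pipe(prompt=prompt)
```

Calling `resolve_pipeline_class()` first is a cheap way to check whether the installed Diffusers version includes the class before downloading any weights. The exact arguments and output format should be taken from the model card or the Diffusers documentation.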

Background sources we checked (1)

en.wikipedia.org ↗ A world model in artificial intelligence is a machine learning system that builds an internal representation of an environment. The model predicts how that environment changes over time in response to actions. Researchers design world models to help agents plan, reason, and act w…

Sources cited (2)
