NVIDIA is charting a new direction in artificial intelligence, moving the technology beyond its traditional confines in two-dimensional digital realms to introduce what it calls 'physical AI.' This ambitious shift was articulated by Jensen Huang, NVIDIA's CEO, in a recent address where he detailed the company's vision of an AI that can not only perceive but also interact with the three-dimensional world around us.
Huang compared this leap to stepping out of Edwin Abbott Abbott's Flatland and into a realm where an AI model processes its physical surroundings much as humans do, rather than processing only text. He envisions a future where prompts become requests for action, and the model produces what he calls action tokens instead of mere text output.
This breakthrough positions NVIDIA's AI beyond the capabilities of traditional robotics, which have typically been limited to executing specific, repetitive tasks within controlled environments. In contrast, physical AI is set to navigate complex, dynamic settings with adaptability and deeper understanding, a game-changer in industries where operational flexibility is crucial.
Kimberly Powell, NVIDIA's Vice President of Healthcare, highlighted the transformative potential of physical AI at the J.P. Morgan Healthcare Conference. Powell announced plans to integrate physical AI into all facets of healthcare, from patient rooms to sensor networks across medical facilities. By understanding the physical world, these advanced AI systems aim to enhance healthcare delivery significantly.
Huang credits NVIDIA's continued advances in GPU performance for these strides in AI development. The company's recent enhancements to its Hopper architecture, culminating in a fivefold improvement within a year, demonstrate its capability to push the boundaries of AI technology. Such improvements, Huang states, were made possible by NVIDIA's hardware-software co-design approach. This co-design is instrumental in moving closer to the goal of artificial general intelligence (AGI) and what Huang refers to as artificial general robotics, a future where machines not only think but also act with a greater degree of autonomy.
A significant contributor to these innovations is NVIDIA's new Cosmos platform, a computational architecture engineered for the development of autonomous systems. Cosmos offers comprehensive frameworks designed for processing visual and physical data, crucial for applications in robotics and autonomous vehicles. Key components of this architecture include models for video generation and data compression, which collectively support the platform’s aim to advance physics-aware world modeling and video creation.
Highlighting the practical implications of these advances, Huang used his CES 2025 keynote to stress how different this kind of AI will be from text-centric models. He described a scenario in which AI does not just interpret verbal commands but can be prompted to interact directly with physical environments. This capability could herald a new era for robotics, driven by the increasing demand for multimodal large language models (LLMs).
NVIDIA's vision is not limited to new AI deployments; it also encompasses what Huang calls 'brownfield' opportunities. These are innovations that require no new infrastructure, such as self-driving cars and humanoid robots, which fit naturally into the existing built environment.
The technical structure underpinning the Cosmos platform promises significant improvements in pose estimation accuracy and data compression. Its training framework draws on an extensive repository of robotics and driving data. According to Huang, the capabilities of such advanced models correlate directly with the scale of data and computational power used to train them.
This bold new vision positions AI as a partner to human endeavor, offering assistive roles across sectors from healthcare to manufacturing. It suggests a future where AI integrates seamlessly into daily life, taking over mundane tasks, adapting dynamically to changes, and operating continuously to support human productivity.
As physical AI approaches the mainstream, its potential to transform industries and everyday experiences seems boundless. NVIDIA's foray into this domain marks the dawn of a new era, where AI's once-flat world is expanding into a future rich with three-dimensional potential.