The Year of Physical AI: A Trillion-Dollar Gamble on 'How the World Works'

marsbit | Published on 2026-04-03 | Last updated on 2026-04-03

Abstract

The year 2026 is being positioned as the dawn of the "Physical AI" era, marked by major funding rounds and technological breakthroughs. This shift signifies AI's evolution from understanding the digital world to perceiving and acting within the physical world. Key events include Yann LeCun's AMI Labs raising $1.03 billion to develop "world models," Fei-Fei Li's World Labs securing funding, and companies like Tesla deploying humanoid robots (Optimus) in factories. This transition expands the AI model competition into a broader infrastructure battle encompassing hardware, data, simulation, and real-world integration. The core debate is between two AI paths: the established LLM (Large Language Model) approach focused on text prediction and the emerging "world model" approach, which aims to understand physical states for action-oriented tasks. Hardware, particularly dexterous robotic hands, is a critical and expensive challenge. Companies are racing to build capable robotic bodies, with Tesla, Boston Dynamics, and Figure AI making significant progress. NVIDIA is positioning itself as the essential infrastructure provider for this new era, offering a full suite of development tools and platforms. A major bottleneck is the scarcity of high-quality physical world interaction data, with companies exploring solutions through real-world data collection, synthetic data generation, and human teleoperation. Substantial investments in Q1 2026, exceeding $6.4 billion, signal strong bel...

In March 2026, AMI Labs, co-founded by Turing Award winner and former Meta Chief AI Scientist Yann LeCun, announced the completion of a $1.03 billion seed funding round.

Almost simultaneously:

  • World Labs, founded by Fei-Fei Li, completed a new funding round of approximately $1 billion
  • Google DeepMind released the Genie 3 world model
  • Tesla continued to advance the deployment of its Optimus humanoid robots in factories

These events did not occur in isolation but collectively point to a clearer trend: AI is moving from 'understanding the digital world' to 'understanding and acting upon the physical world.'

If 2024 was the expansion period for large language models and 2025 was the exploration period for putting agents into production, then in 2026 the core narrative in Silicon Valley is shifting to a more fundamental question: Can AI truly understand 'how the world works' and complete tasks in reality?

This is not just a change in technical direction; it also means the industrial value chain is being rewritten. Over the past two years, the main battlefield of AI competition has been concentrated in a few high-barrier areas like models, computing power, and data centers. But when AI truly enters the physical world, competition will no longer occur only at the model layer; it will simultaneously expand to robot hardware, system integration, data collection, simulation environments, supply chain coordination, and real-world deployment. In other words, Physical AI brings not a single-point breakthrough, but a reconstruction of an entire infrastructure system.

Precisely because of this, this round of change, for the Chinese-speaking world and especially for Chinese entrepreneurs, engineers, and investors, might not just be a new wave of technological enthusiasm, but a rare structural window of opportunity. Unlike the previous competition, which was dominated mainly by large model training resources and super capital, Physical AI inherently relies more on composite capabilities: one must understand both algorithms and engineering, and be capable of system coordination while also working deep inside manufacturing, supply chains, and industrial scenarios. Teams that combine technical depth, hardware coordination capabilities, and a cross-border view of Chinese and US industry are, if anything, better placed to occupy key positions in this new cycle.

In other words, Physical AI is not just a new story Silicon Valley is telling; it might also be the most important entry ticket for Chinese talent in the next round of global technological infrastructure transformation.

01 The Debate of the Century Between Two Paths: The LLM Camp vs. The World Model Camp

Over the past three years, large language models (LLMs) have almost dominated the development path of AI, with their core paradigm being next-token prediction based on massive text data. But the boundaries of this paradigm are gradually becoming apparent: it can 'describe' the physical world but lacks executable understanding; it lacks the ability to model causality and physical constraints; and its performance is limited in continuous decision-making and long-horizon tasks.

Therefore, a faction represented by Yann LeCun began promoting another path, the world model: predicting 'state,' not 'text.' The core difference between the two is that LLMs take text as the learning object and language as the output form, essentially stopping at 'cognition and expression,' whereas world models take the state of the physical world as the modeling object, aiming directly at a closed loop of 'perception-decision-execution.'

This is not just LeCun's judgment. In Q1 2026, the world model direction saw several key advances almost simultaneously: AMI Labs, with JEPA as its core architecture, explicitly bet on a long-term 'research first, product later' route; World Labs entered through 'spatial intelligence,' attempting to make AI truly understand relationships, occlusions, and physical constraints in the three-dimensional world; and Google DeepMind, through Genie 3, advanced dynamically generated environments that support real-time interaction and used them for agent training.
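The 'perception-decision-execution' loop can be illustrated with a deliberately tiny sketch. Everything here is hypothetical: the one-axis gripper, the hand-written `toy_world_model` standing in for a learned model, and the candidate actions are assumptions for illustration, not how JEPA or any system named in this article works.

```python
from typing import Callable, Sequence

# Hypothetical toy setup: a gripper moving along one axis (positions in metres).
State = float   # current gripper position
Action = float  # commanded displacement for one control step


def toy_world_model(state: State, action: Action) -> State:
    """Predict the next physical state from the current state and an action.

    Stand-in for a learned model: simple kinematics with an assumed hard
    travel limit of 0 to 2 metres.
    """
    return min(2.0, max(0.0, state + action))


def decide(state: State, target: State,
           model: Callable[[State, Action], State],
           candidates: Sequence[Action] = (-0.1, 0.0, 0.1)) -> Action:
    """Decision: imagine each candidate action with the model, keep the best."""
    return min(candidates, key=lambda a: abs(model(state, a) - target))


def run_closed_loop(start: State, target: State, steps: int = 20) -> State:
    """Run perception, decision, and execution for a fixed number of steps."""
    state = start
    for _ in range(steps):
        observed = state                                    # perception
        action = decide(observed, target, toy_world_model)  # decision
        # Execution: the same toy model also plays the role of the real world.
        state = toy_world_model(state, action)
    return state


final_position = run_closed_loop(start=0.0, target=1.0)
print(f"final position: {final_position:.3f} m")
```

The point of the sketch is the interface: the model maps a state and an action to a predicted state, and the decision step chooses actions by querying that prediction, rather than producing text.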

The three companies have different paths, but they point to the same trend: The next leap in AI is not just about generating better text, but about modeling the world more accurately and completing actions within it.

02 The Hardware War: Who is Building the 'Body'?

The world model solves the 'brain' problem—how AI understands the physical world. But the other half of the Physical AI battlefield is equally fierce: who will build the 'body'?

By 2026, the humanoid robot sector has moved decisively from the 'lab demo' stage to the 'factory mass production' stage. A few key numbers:

Tesla Optimus Gen 3: Over 1,000 units have been deployed at Gigafactory Texas and the Fremont factory, performing parts handling and assembly tasks. This is the largest factory deployment of humanoid robots to date. Tesla is building a dedicated factory at Giga Texas with an annual capacity of 10 million units, aiming to reduce the cost per unit to $20,000; two years ago, the industry average was still $50,000-$250,000.

Boston Dynamics Atlas: The production version of Atlas, shown at CES 2026, is 6.2 feet tall, has 56 degrees of freedom, and can lift 110 lbs. More noteworthy is its 'soul': Boston Dynamics announced a collaboration with Google DeepMind to integrate cutting-edge foundation models into Atlas. The 2026 annual production capacity has already been booked by Hyundai and Google DeepMind, and a factory with a capacity of 30,000 units per year is being planned.

Figure 03: Figure AI raised $1 billion at a $39 billion valuation (2025). Its Figure 02, during an 11-month trial run at the BMW Spartanburg factory, participated in the production of over 30,000 BMW X3s, moved over 90,000 parts, and logged a cumulative 1,250 hours of operation. Figure 03 is a comprehensive upgrade based on this, equipped with 48+ degrees of freedom and the proprietary Helix AI platform.

Mind Robotics: Just announced a $500 million funding round in March, focusing on industrial-scale AI robot deployment.

But in this hardware race, an underestimated link is emerging: the dexterous hand.

The legs of a humanoid robot solve the mobility problem and the torso solves the load-bearing problem, but what truly determines whether a robot can work in a complex environment is the hand. Taking Tesla Optimus as an example, the hand accounts for 17% of the cost of the complete robot, about $9,500, making it the most expensive single component.

The difficulty of the dexterous hand lies in a fundamental contradiction: the finger space is too small to fit large motors; small motors have insufficient torque, requiring high reduction ratio gearboxes to amplify force; and high reduction ratio gearboxes bring inertial distortion, loss of force feedback, and mechanical wear—these three problems will 'poison' the AI's learning process from the physical level.

A batch of new companies is trying to break through this bottleneck. Some use an axial flux motor architecture to cut the reduction ratio from 288:1 to 15:1, achieving a fully backdrivable dexterous hand; others co-design data collection gloves with the hand so that human operation data can be transferred to the robot hardware with zero loss. These seemingly small hardware innovations might be among the most critical pieces of infrastructure in the entire Physical AI ecosystem.
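Why the reduction ratio matters can be made concrete with the standard ideal-gearbox relation: the rotor inertia felt at the joint scales with the square of the gear ratio. The following is a minimal sketch, assuming the same rotor inertia in both designs (an assumption; an axial flux motor will in practice have a different rotor) and a hypothetical placeholder inertia value.

```python
def reflected_inertia(rotor_inertia: float, gear_ratio: float) -> float:
    """Rotor inertia as felt at the joint side of an ideal (lossless) gearbox.

    Standard relation: J_joint = J_rotor * N**2, where N is the reduction ratio.
    """
    return rotor_inertia * gear_ratio ** 2


def inertia_reduction_factor(old_ratio: float, new_ratio: float) -> float:
    """How many times smaller the reflected inertia becomes, for the same rotor."""
    return (old_ratio / new_ratio) ** 2


# Hypothetical rotor inertia in kg*m^2, chosen only for illustration.
ROTOR_INERTIA = 1e-7

high_ratio = reflected_inertia(ROTOR_INERTIA, 288)  # 288:1 ratio from the text
low_ratio = reflected_inertia(ROTOR_INERTIA, 15)    # 15:1 ratio from the text
factor = inertia_reduction_factor(288, 15)          # (288 / 15)**2 = 19.2**2

print(f"reflected inertia at 288:1: {high_ratio:.3e} kg*m^2")
print(f"reflected inertia at 15:1:  {low_ratio:.3e} kg*m^2")
print(f"reduction factor: {factor:.1f}x")
```

Under these assumptions the joint feels roughly 370 times less rotor inertia at 15:1, which is what lets external contact forces drive the motor back and be sensed. The trade-off is that the motor itself must supply 19.2 times more torque for the same fingertip force.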

03 NVIDIA: The 'Shovel Seller' in the Physical AI Era

Every technological wave has a 'shovel seller.'

In the era of large models, NVIDIA became the biggest beneficiary thanks to GPUs and the CUDA ecosystem; in the Physical AI era, its role is being upgraded further: it is not just providing computing power, but attempting to build the entire infrastructure for the robotics era.

At the GTC conference in March 2026, NVIDIA released a full suite of platform capabilities built around Physical AI, including the vision-language-action model Isaac GR00T for humanoid robots, the Cosmos series for generating large-scale synthetic data, and toolchains covering training, evaluation, and deployment (such as Isaac Lab and OSMO). These capabilities are not single-point tools but are gradually forming a complete development and operation system.

Robot companies including Boston Dynamics, Caterpillar, Franka Robotics, LG, and NEURA Robotics are already building next-generation systems on the NVIDIA platform.

Its strategy is also very clear:

Do not compete directly in end products, but become the underlying standard for the entire industry.

If Physical AI is a city under construction, then NVIDIA is simultaneously providing the cement, steel bars, and power grid.

04 Data: The Most Scarce 'Oil' of Physical AI

In the world of large language models, the internet provides almost unlimited text data. But in Physical AI, a more fundamental question emerges:

Manipulation data from the real world is extremely scarce.

This makes data one of the most critical and scarce resources in the entire industry chain.

Currently, the industry is mainly exploring three paths.

Real Data Route. Represented by Physical Intelligence, whose π0 model is trained on over 10,000 hours of real robot operation data covering multiple robot morphologies and task types, and can complete complex operations such as folding clothes and assembling cardboard boxes. Its open-source release essentially provides the industry with a 'manipulation pre-training base.'

Synthetic Data Route. Google DeepMind's Genie 3 and NVIDIA's Cosmos attempt to generate large numbers of simulated environments through world models, complete training in the virtual world, and then transfer the result to the real world. The core challenge of this path is the sim-to-real gap, but as simulation accuracy improves, this gap is gradually narrowing.

Human Teleoperation Route. Through devices like data collection gloves, human operations are directly mapped to the robot system. This method has the highest data quality but still has limitations in cost and scalability.

Tesla is trying a hybrid path: continuously collecting human operation behaviors through factory videos and using them to train Optimus's motion capabilities.

In the long term, the competitive landscape of Physical AI will likely depend not on whose model is best, but on who possesses the most, and the highest-quality, physical world interaction data. Once the data flywheel starts turning, its barriers will strengthen exponentially.

05 What Money is Saying: A Full Picture of Physical AI Financing in Q1 2026

Numbers don't lie. Here are the key financing events in the Physical AI field in Q1 2026:

【World Model Layer】

· AMI Labs (LeCun) — $1.03B Seed Round, Valuation $3.5B

· World Labs (Fei-Fei Li) — $1B New Round, Autodesk invested $200M

【Foundation Model Layer】

· Physical Intelligence — Negotiating a $1B new round, valuation will exceed $11B

· RLWRLD — $41M Seed Round Extension

【Humanoid Robots (Complete Machines)】

· Figure AI — Previously raised $1B at $39B valuation (2025)

· Mind Robotics — $500M, industrial-scale deployment

· Galaxea — $434M, Series B Unicorn

· Humanoid — $290M Seed Round, an instant unicorn

· Generative Bionics — €70M Seed Round

【Infrastructure & Tools】

· NVIDIA — Continued investment in Isaac GR00T / Cosmos platform

· RoboForce — $52M, Physical AI labor platform

The public figures above for Q1 alone already exceed $6.4 billion. And this does not include the internal investments of major players such as Tesla, Hyundai/Boston Dynamics, and Google DeepMind.

The flow of capital makes one thing clear: Physical AI has moved past the 'proof of concept' stage and entered the 'infrastructure construction' stage. Investors are no longer asking 'can robots be used,' but 'whose infrastructure can scale robots the fastest.'
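As a sanity check, the itemized amounts can be tallied in a few lines. This is only a sketch: the status labels ('closed', 'negotiating', '2025') are an assumed reading of the list, not categories from the source, and the euro-denominated round is kept separate rather than converted.

```python
# Amounts in millions, copied from the itemized list. The status field is an
# assumed label marking entries the list itself describes as still under
# negotiation or as dating from 2025.
ROUNDS = [
    ("AMI Labs", 1030, "USD", "closed"),
    ("World Labs", 1000, "USD", "closed"),
    ("Physical Intelligence", 1000, "USD", "negotiating"),
    ("RLWRLD", 41, "USD", "closed"),
    ("Figure AI", 1000, "USD", "2025"),
    ("Mind Robotics", 500, "USD", "closed"),
    ("Galaxea", 434, "USD", "closed"),
    ("Humanoid", 290, "USD", "closed"),
    ("Generative Bionics", 70, "EUR", "closed"),
    ("RoboForce", 52, "USD", "closed"),
]


def total_millions(currency: str, statuses: set) -> int:
    """Sum the listed amounts for one currency and a set of status labels."""
    return sum(amount for _, amount, cur, status in ROUNDS
               if cur == currency and status in statuses)


closed_usd = total_millions("USD", {"closed"})
all_usd = total_millions("USD", {"closed", "negotiating", "2025"})
eur = total_millions("EUR", {"closed"})

print(f"USD, rounds listed as completed: ${closed_usd / 1000:.2f}B")
print(f"USD, all itemized entries:       ${all_usd / 1000:.2f}B")
print(f"EUR, itemized separately:        EUR {eur}M")
```

On these assumptions the itemized USD entries come to about $3.3 billion for rounds described as completed, and about $5.3 billion once the Physical Intelligence round under negotiation and Figure AI's 2025 round are counted, so the stated total presumably draws on rounds beyond those itemized here.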

06 Cold Thinking: Bubble or Inflection Point?

Of course, Silicon Valley is never short of bubbles. Amid the frenzy over Physical AI, a few sober questions are worth considering:

Demo ≠ Deployment. As industry insiders agreed at Davos 2026, the gap between a spectacular demo and a system that can run 10,000 times consecutively without error is much larger than the publicity implies. Figure 02 did participate in the production of 30,000 cars at the BMW factory, but it performed relatively standardized parts handling, not dexterous assembly.
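A back-of-the-envelope calculation shows why this gap is so large. The sketch below assumes every run succeeds independently with the same probability, which is a simplification (real failures are often correlated); the function names are illustrative.

```python
def prob_consecutive_successes(per_run_success: float, runs: int) -> float:
    """Probability that `runs` independent runs all succeed."""
    return per_run_success ** runs


def required_per_run_success(target_prob: float, runs: int) -> float:
    """Per-run success rate needed to finish `runs` runs cleanly with `target_prob`."""
    return target_prob ** (1.0 / runs)


RUNS = 10_000  # the consecutive-run figure quoted in the text

for rate in (0.99, 0.999, 0.9999):
    streak = prob_consecutive_successes(rate, RUNS)
    print(f"per-run success {rate:.2%}: clean streak probability {streak:.3g}")

needed = required_per_run_success(0.99, RUNS)
print(f"per-run success needed for a 99% clean streak: {needed:.6%}")
```

Under this assumption, a system that succeeds 99.99% of the time per run still completes 10,000 consecutive runs only about 37% of the time, and a 99% chance of a clean streak requires a per-run success rate of roughly 99.9999%.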

Sim-to-real is still a hard nut to crack. The fidelity of world models is improving, but the long-tail complexity of the physical world—lighting changes, material differences, unexpected collisions—remains the biggest challenge for the synthetic data route.

Business models have not yet been proven. LeCun himself said AMI Labs will only do research in its first year. World Labs is trying a free + paid model. Physical Intelligence open-sourced its core model. Currently, these companies have almost zero revenue; capital is betting on a paradigm monopoly 3-5 years out.

The gray rhino of safety and regulation. When thousands of robots with autonomous decision-making capabilities enter factories and even homes, who is responsible for accidents? The global regulatory framework for Physical AI is almost a blank slate.

But precisely these problems indicate that we are in the early stages of a technological inflection point, not at the top of a bubble. Every true paradigm shift (the internet, smartphones, cloud computing) was accompanied in its early stages by a phase in which the demo was far better than the product. The key difference is whether the underlying technology is truly advancing, or whether only the slide decks are improving.

From LeCun's JEPA architecture, to Genie 3's real-time world generation, to π0's 68-task generalization capability, to Optimus's factory deployment of 1,000 units: the progress in Q1 2026 reflects real engineering breakthroughs, not castles in the air.

07 Physical AI is Not an Independent Track; It is the Final Form of AI.

Physical AI is not a new track; it is more like one of the endgame forms of AI.

When AI moves from 'understanding the world' to 'entering the world,' what is truly being rewritten is not just the boundary of model capabilities, but also the way industrial division of labor and value distribution occur. Future competition will not only happen in model parameters and computing clusters, but also in robot bodies, dexterous hands, data collection, simulation systems, industrial scenarios, and supply chain organizational capabilities.

This is also why this round is particularly important for Chinese talent.

The reason is that one of the deepest strengths Chinese talent has accumulated over the past two decades has never been a single-dimensional technical label, but the ability to truly tie together cutting-edge technology, engineering execution, hardware manufacturing, and cross-regional industrial coordination. Whether entrepreneurs, engineers, investors, or industrial resource organizers, as long as they can grasp this migration from digital intelligence to physical intelligence, they have the opportunity not only to participate in the trend but to become part of the trend itself at some key layers.

In 2026, Physical AI may still be far from mature, but precisely because it is still early, the window is just opening. For Chinese talent, this might not be another cycle of participating as followers, but a new starting point with a better chance to break into the infrastructure layer, platform layer, and key component layer.

This article is from the WeChat public account "硅兔君" (ID: gh_1faae33d0655), author: 硅兔君 (Silicon Rabbit Jun)

Related Questions

Q: What is the core trend that events in Q1 2026, such as the funding of AMI Labs and World Labs, point towards?

A: The events point to a clear trend that AI is shifting from 'understanding the digital world' to 'understanding and acting in the physical world'.

Q: According to the article, what is the fundamental difference between the LLM approach and the World Model approach to AI?

A: The core difference is that LLMs learn from text data and output language, focusing on 'cognition and expression,' while World Models model the state of the physical world, aiming for a closed loop of 'perception-decision-execution'.

Q: Why is the 'dexterous hand' considered a critical and underestimated component in the humanoid robotics competition?

A: The dexterous hand is critical because it determines a robot's ability to work in complex environments. It is the most expensive single component and presents a fundamental engineering challenge due to the contradiction of fitting sufficiently powerful actuators into a small space without compromising force feedback or creating mechanical wear that hinders AI learning.

Q: What role is NVIDIA playing in the Physical AI era, as described in the article?

A: NVIDIA is positioning itself as the foundational infrastructure provider or 'shovel seller' for the Physical AI era. It is building a comprehensive platform with tools like the Isaac GR00T model, Cosmos synthetic data, and development toolchains, aiming to become the underlying standard for the entire industry without making end products.

Q: What does the article identify as the most scarce and critical resource for the development of Physical AI?

A: The article identifies real-world physical interaction data as the most scarce and critical 'oil' for Physical AI development, as it is far less abundant than the text data used for large language models.
