Physical AI is Hot, Some New Thoughts from Me

marsbit | Published on 2026-05-18 | Last updated on 2026-05-18

Abstract

The term "Physical AI" is gaining significant traction, marking a shift from AI that processes information to AI that understands and interacts with the physical world. Unlike traditional AI confined to screens, Physical AI involves integrating intelligence into robotic bodies to perform tasks in environments governed by gravity, friction, and inertia. The concept, formally defined in a 2020 paper, focuses on creating embodied systems that can complete perception-to-action cycles. 2026 is identified as a pivotal "deployment year," where the focus moves from demonstrations to practical utility. Companies like China's Zhiyuan Robotics have transitioned to live, unscripted factory deployments and announced mass production targets. Internationally, Figure AI, after a major funding round, shifted to its own neural system, while NVIDIA partnered with major industrial robot firms to upgrade millions of existing units with AI capabilities. A key trend is the crossover from the automotive supply chain. Companies like Aptiv and Valeo are entering the Physical AI space, leveraging their expertise in sensors, control systems, and mass production from the autonomous vehicle sector. This "technology spillover" is accelerating development, as seen with Tesla's plans to repurpose automotive production lines for its Optimus robot. The technical breakthrough enabling this progress is the engineering maturity of "world models." Previously theoretical, these AI models can now simulate physica...

Article | New Mou, Author | Lu Yao

Recently, a term has been buzzing in certain circles: "Physical AI".

This term was actually mentioned over ten times by Jensen Huang in his speech at the Las Vegas CES early last year, but it wasn't until this year that "Physical AI" truly exploded in significance.

So, what exactly is "Physical AI"?

A couple of days ago, I saw a video of a robot watering flowers. The robot first walked to the faucet, turned on the valve, filled the watering can, then turned around, walked to the flower pot, adjusted its angle, and poured the water in evenly. The spout didn't hit the edge of the pot, and no water spilled out.

For a machine to understand "carrying a cup of water," it needs to know the cup is cylindrical, calculate the precise force needed to grip it without slipping or crushing it, understand that water is a liquid and will spill if shaken, and constantly adjust its arm angle while walking to compensate for body movement.

A three-year-old can do all of this intuitively. But for AI, this is a huge leap. Over the past decade, AI learned to see, hear, speak, and draw, but it remained trapped within screens. What Physical AI aims to do is put this smart brain into a body that can run, jump, grasp, and manipulate objects in the real world.

Simply put, Physical AI is about making AI understand and act upon the physical world. It's no longer just processing text and images; it's about performing correct actions in an environment governed by gravity, friction, and inertia.
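To make the cup example concrete, here is a minimal sketch of the "grip without slipping or crushing" calculation. It assumes a two-finger parallel gripper and simple Coulomb friction; the function name, the safety factor, and every number are hypothetical and chosen only for illustration.

```python
# Toy grip-force window for a two-finger parallel gripper (illustrative only).
# Assumptions: Coulomb friction, rigid cup, both fingers press with equal force.

G = 9.81  # gravitational acceleration, m/s^2


def grip_force_window(mass_kg, friction_coeff, crush_limit_n,
                      extra_accel=0.0, safety_factor=1.5):
    """Return (min_force, max_force) in newtons per finger, or None if no safe grip exists.

    Two fingers each supply friction up to mu * N, so holding the cup requires
    2 * mu * N >= m * (g + a), where a is the extra upward acceleration of the hand.
    """
    required = mass_kg * (G + extra_accel) / (2.0 * friction_coeff)
    min_force = required * safety_factor
    if min_force > crush_limit_n:
        return None  # any grip firm enough to hold the cup would also crush it
    return (min_force, crush_limit_n)


# Hypothetical numbers: a 0.3 kg cup of water that deforms above 15 N per finger.
standing = grip_force_window(0.3, friction_coeff=0.4, crush_limit_n=15.0)
walking = grip_force_window(0.3, friction_coeff=0.4, crush_limit_n=15.0, extra_accel=3.0)
slippery = grip_force_window(0.3, friction_coeff=0.1, crush_limit_n=15.0)
print(standing, walking, slippery)
```

Under these assumptions the minimum force rises as soon as the robot starts walking, and a slippery cup leaves no safe window at all. That is the kind of constraint a machine has to satisfy continuously, which a person handles without thinking.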

A fact seldom discussed domestically is that the term "Physical AI" didn't originate from some chip giant's PR department. This concept first appeared in a 2020 paper published in *Nature Machine Intelligence*. The paper systematically defined Physical AI for the first time:

A class of embodied systems capable of performing tasks typically associated with intelligent organisms. The core lies in deeply integrating physical laws into the AI system, so machines are no longer "physically blind" and can complete the perception-to-action loop.

From the academic world's opening shot in 2020 to the industry's full embrace in 2026, there was a gap of six whole years. In these six years, sensor costs dropped by several orders of magnitude, edge AI computing power moved from theory to engineering, and the reliability and mass production capability of robot bodies quietly reached an inflection point — these were the hidden forces pushing Physical AI from papers to production lines.

From Demonstration to Working

If the large language models of 2023 taught AI to chat, then the keyword for Physical AI in 2026 is just one thing: work.

The change is visible to the naked eye.

This time last year, the way robot companies showed off their muscles was still by filming demo videos, setting up scenes, rehearsing repeatedly, and shooting in one take. Impressive to watch, but you never knew how many takes they did.

This year, the playbook is completely different. Zhiyuan Robotics did something on a 3C production line in Nanchang: they put a robot into a real factory and had it work continuously for several hours, live-streaming the entire process. There was no preset script and no restricted scene, just the same production line workers face daily. Hundreds of thousands of people watched online.

A month later, Zhiyuan announced in Hong Kong the mass production of 10,000 humanoid robots. The leap from one prototype in the lab to 10,000 on a production line is a milestone that changes the game.

Zhiyuan's approach is interesting. Most robotics startups focus on a specific segment: some only on the body, some only on the large model, some only on dexterous hands. Zhiyuan chose another path: doing the full stack, simultaneously developing body manufacturing, the AI model, dexterous manipulation, and data collection, while also investing in over 60 upstream and downstream companies in the industry chain.

The cost of this approach is clear: the parent company has over a thousand employees, a number expected to grow further by the end of this year, and annual salary expenditure alone runs into the billions. This path burns cash, but once proven, its moat is also the deepest.

Zhiyuan's founder Deng Taihua proposed an analytical framework called the "XYZ Curve." He said embodied intelligence development has three stages: X is the development and experimentation phase, where people are still playing with demos; Y is the deployment and growth phase, where robots actually start working on production lines; Z is the ultimate intelligent emergence phase.

He characterized 2026 as: "the first year of deployment phase, officially moving from 'can move' to 'can work'." The difference between "can move" and "can work" is just one word, but it marks the entire industry's coming of age.

The pace across the Pacific is just as intense.

American humanoid robot company Figure AI is an unavoidable name on this track. In September last year, they completed a funding round of over $1 billion, raising their valuation to $39 billion, making them the world's highest-valued humanoid robot company at the time.

A month later, they released a new generation product, Figure 03, standing 1.68 meters tall and weighing about 60 kilograms, demonstrating household chores like watering plants, serving dishes, and folding clothes. Founder Brett Adcock specifically added on social media: all actions were autonomously completed by the robot, with no human remote control.

Technologically, it's noteworthy that Figure made a major strategic pivot, terminating its cooperation with OpenAI and fully transitioning to its self-developed neural network system, Helix.

This system mimics human cognition with a three-layer structure: the bottom layer handles balance and instinctive reactions, the middle layer translates the brain's commands into motor control commands 200 times per second, and the top layer is the logical brain, responsible for understanding scenes and making decisions. This "instinct-reflex-thought" three-tier architecture is quite clever, essentially giving the robot a nervous system that does not crash.
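The layering described above can be pictured as three loops running at different rates. The sketch below is a toy scheduler, not Figure's implementation: only the 200 Hz middle-layer rate comes from the description above, while the 1000 Hz reflex rate, the 5 Hz reasoning rate, and all names are assumptions made for the example.

```python
# Toy multi-rate scheduler for a three-tier controller (illustrative only).
# Only the 200 Hz middle-layer rate comes from the article; the 1000 Hz reflex
# rate and the 5 Hz reasoning rate are assumptions chosen for this sketch.

BASE_HZ = 1000  # simulation tick rate (assumed)
RATES_HZ = {"thought": 5, "motor": 200, "reflex": 1000}  # top layer listed first


def run(seconds):
    """Simulate the loop tick by tick and count how often each layer runs."""
    counts = {name: 0 for name in RATES_HZ}
    goal = None           # latest decision from the top layer
    motor_command = None  # latest command from the middle layer
    for tick in range(int(seconds * BASE_HZ)):
        for name, hz in RATES_HZ.items():
            if tick % (BASE_HZ // hz) != 0:
                continue  # this layer is not due on this tick
            counts[name] += 1
            if name == "thought":
                goal = "pour water into pot"     # slow: scene understanding
            elif name == "motor":
                motor_command = ("track", goal)  # mid: turn the goal into joint targets
            # the reflex layer would correct balance here on every tick
    return counts, motor_command


counts, last_command = run(seconds=2)
print(counts, last_command)
```

The design point is that the fast layers never wait for the slow one: the reflex loop keeps running on every tick using the most recent command, even while the top layer is still deciding.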

Another thing worth mentioning: at this year's GTC conference, NVIDIA announced deep cooperation with the world's four industrial robotics giants, ABB, KUKA, Yaskawa, and Fanuc. Over 2 million industrial robots already installed on production lines worldwide can now use NVIDIA's simulation platform for virtual commissioning and AI training.

These four companies combined account for over half of the global industrial robot market share. In the next decade, these robots will undergo an upgrade from "traditional programming" to "AI-driven." Whichever software platform can embed itself into this process will essentially secure the "operating system" layer for the next generation of industrial automation. NVIDIA clearly doesn't want to miss this boat.

Crossover Sprint from the Supply Chain

Another interesting phenomenon: automotive supply chain companies are entering the Physical AI track en masse.

At this year's Beijing Auto Show, automotive suppliers such as Aptiv, Valeo, Horizon Robotics, and Qianxun SI showcased a cluster of robotics-related solutions. Many industry insiders realized then that perception for embodied intelligence is largely the same problem as perception for intelligent driving, so automotive solutions can be applied to humanoid robots with little change.

Thinking about it carefully, it makes sense. The automotive intelligent driving system is essentially a perception-decision-execution loop for a "mobile robot." Its three core modules — visual perception, path planning, and real-time control — are highly homologous in technical architecture with traditional industrial robots and humanoid robots.

Automotive suppliers' cameras, radars, steer-by-wire chassis, and real-time operating systems can be migrated to the robotics field with slight adaptation. In this sense, the hundreds of billions in R&D spending that the automotive industry burned over the past decade on vehicle intelligence are now flowing into the Physical AI track as "technology spillover."

This might explain why Chinese robotics companies can so quickly enter the mass production stage. Manufacturing capabilities and supply chain management aren't built from scratch; many are readily available. Those component suppliers already honed on automotive production lines for over a decade are now applying their skills on a new battlefield.

There are ready-made cases abroad too. Take Tesla, whose first-generation humanoid robot Optimus is also moving faster toward production. In its Q1 2026 earnings call, Tesla stated plainly that the company would transition to "a future centered on AI, autonomous taxis, and humanoid robots," with the first-generation robot production line having a capacity of 1 million units and replacing the current Model S and Model X production lines.

The number 1 million might seem exaggerated in today's context, but Tesla's logic is clear: it wants to directly replicate the large-scale production capabilities and supply chain management experience accumulated in automobile manufacturing into the humanoid robotics field.

What Musk wants is not a "robot that can move," but a "mass-produced tool" that can work alongside humans in factories. Once this path is proven, its impact on the manufacturing automation landscape will be no less than that of the Model 3 on the fuel vehicle market.

World Models: Why They Became Usable This Year

Having covered the major players' moves at the industry level, let's zoom in one layer deeper: what's the technological foundation of this Physical AI race?

To sum it up in one sentence: the engineering breakthrough of world models. I think this is also the most critical point for understanding this wave.

The concept of "world model" isn't new; it was proposed back in 2018. The core idea is simple: let AI develop an internal understanding of how the physical world operates, so it can predict "what will happen if I push this cup." But previously, this mostly existed only in papers — too computationally expensive, unstable generation quality, unsuitable for real-time interaction.
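The "what will happen if I push this cup" idea can be shown with a deliberately tiny example. A real world model is a large neural network that predicts images or video; the sketch below swaps in a one-parameter fit so the learn-then-predict loop is visible. The sliding-distance model (distance proportional to speed squared), the function names, and the observations are all assumptions.

```python
# Toy "world model": learn how far a pushed cup slides, then predict before acting.
# Illustrative only; real world models are large neural networks, not a one-parameter fit.

def fit_slide_model(observations):
    """Least-squares fit of distance = c * speed**2 from (speed, distance) pairs."""
    num = sum(d * v ** 2 for v, d in observations)
    den = sum(v ** 4 for v, _ in observations)
    return num / den


def predict_slide(c, push_speed):
    """Predicted sliding distance in metres for a given initial push speed in m/s."""
    return c * push_speed ** 2


def safe_to_push(c, push_speed, distance_to_edge):
    """Imagine the outcome first: does the cup stop before the table edge?"""
    return predict_slide(c, push_speed) < distance_to_edge


# Hypothetical past experience: (push speed in m/s, observed slide in m).
experience = [(0.5, 0.032), (1.0, 0.127), (1.5, 0.287), (2.0, 0.510)]
c = fit_slide_model(experience)

print(round(c, 3))
print(safe_to_push(c, 1.0, distance_to_edge=0.30))
print(safe_to_push(c, 2.0, distance_to_edge=0.30))
```

The robot never has to knock a cup off the table to learn that a hard push is a bad idea: it queries its internal model first and acts only on the predicted outcome.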

The turning point happened in the last year. NVIDIA launched a series of models called Cosmos, whose core capability is generating action data conforming to physical laws from text or images.

For example: if you want to train a robot to move boxes in various weather conditions, you don't need to actually film videos in factories during rain, snow, or at night. Set the parameters in a simulation environment, and Cosmos can directly generate massive amounts of highly realistic training data covering various extreme scenarios.

Early this year, the Ant Lingbo team open-sourced a framework called LingBot-World, specifically for interactive world models. It can achieve nearly 10 minutes of continuous, stable video generation, with end-to-end interaction latency controlled within seconds. Users can control virtual characters in real-time with a keyboard and mouse like playing a game, with the model providing instant feedback on scene changes. The significance is that world models moved from "offline rendering" to "online interaction," boosting training efficiency by an order of magnitude.

Another startup, Jijia Vision, released the GigaWorld-1 platform, positioned as a "digital sandbox" for the physical world. A month later, Alibaba's ABot-PhysWorld surpassed it on a benchmark called WorldArena, topping the comprehensive rankings. Competition is advancing month by month.

The importance of these open-source projects lies not in how large their parameter counts are, but in turning a game "only giants could play" into a tool "small teams can also use." When enough people are building the wheels, more cars will truly start running.

The reason world models have become a core component in the Physical AI era is that they answer that long-unresolved question: how to enable robots to learn the complex laws of the physical world in a low-cost, high-efficiency way?

Training data from the real world is extremely costly to obtain and inherently carries distribution bias. It's hard to gather all edge scenarios in reality, like factory night shifts during a blizzard, emergency situations during a logistics warehouse blackout, or sudden human intervention on a production line. But synthetic data can. By manipulating scene parameters with prompts in a simulation environment, researchers can generate large-scale training videos covering extreme conditions within hours, which would take months or even years under the traditional real-data collection route.
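The idea of covering rare conditions by varying scene parameters can be sketched in a few lines. This is a toy sampler that only produces scene descriptions, which a simulator or generative model would then have to render; the parameter names and values are hypothetical, and it is not the interface of Cosmos or any other real tool.

```python
import random

# Toy scenario sampler for synthetic training data (illustrative only).
# The parameter names and values are hypothetical; this is not the API of
# Cosmos or any other real simulator.

SCENE_PARAMS = {
    "weather": ["clear", "rain", "snow", "blizzard"],
    "lighting": ["day", "night", "blackout"],
    "human_nearby": [False, True],
    "box_mass_kg": [2.0, 10.0, 25.0],
}


def sample_scenes(n, seed=0):
    """Draw n scene configurations uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in SCENE_PARAMS.items()}
            for _ in range(n)]


def is_edge_case(scene):
    """Conditions that are rare and costly to capture on a real site."""
    return scene["weather"] == "blizzard" or scene["lighting"] == "blackout"


scenes = sample_scenes(1000)
edge_cases = [s for s in scenes if is_edge_case(s)]
print(len(scenes), len(edge_cases))
```

Because the sampler draws every value with equal probability, blizzards and blackouts show up about as often as ordinary daytime scenes. Real-world collection cannot do that, which is the distribution-bias point made above.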

The leverage effect of this breakthrough might exceed any single algorithm improvement.

The Paradigm Has Changed

The breakthrough in world models is actually just one part of the evolution of the Physical AI tech stack. Changes in underlying technology are driving a fundamental architectural rebuild of the entire robotics industry.

Traditional robots use a "sense, plan, act" three-stage approach. First, sensors perceive the environment, then engineers write rules telling the machine how to plan its path, and finally, it executes the action. This works fine in structured environments like factory assembly lines, but once the scenario gets complex, its shortcomings are exposed. The machine only follows the preset script and gets stuck when encountering unseen situations.

Physical AI takes a different path: "perception, reasoning, execution." After perception, it doesn't go through human-written rules but uses a trained neural network to reason what to do and then execute. The essential difference is that the former is "the engineer thinks for the machine," while the latter is "the machine understands the physical world itself."
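The contrast between the two approaches can be illustrated with a toy example. Here a hand-written rule table plays the role of the traditional planner, and a 1-nearest-neighbour lookup over a few demonstrations stands in for the trained neural network; the stand-in, the states, the actions, and the numbers are all assumptions made for the sketch.

```python
# Toy contrast between a hand-written rule table and a learned policy (illustrative only).
# A 1-nearest-neighbour lookup over demonstrations stands in for the trained
# neural network; all states, actions, and numbers are hypothetical.

RULES = {("box", "on_belt"): "pick", ("box", "on_floor"): "stop"}


def rule_based_plan(obj, location):
    """Sense-plan-act: the engineer's rules decide; unseen cases have no answer."""
    return RULES.get((obj, location))  # None when no rule matches


# Demonstrations: (object width in m, height above floor in m) -> action.
DEMOS = [((0.30, 0.90), "pick"), ((0.32, 0.88), "pick"),
         ((0.30, 0.00), "stop"), ((0.28, 0.05), "stop")]


def learned_policy(features):
    """Perception-reasoning-execution: act like the most similar past experience."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, action = min(DEMOS, key=lambda demo: sq_dist(demo[0], features))
    return action


print(rule_based_plan("box", "on_belt"))
print(rule_based_plan("box", "on_shelf"))
print(learned_policy((0.31, 0.80)))
```

The rule table has nothing to say about a box on a shelf, while the learned policy generalizes from nearby examples. The catch is that a learned policy always returns some action, which is not the same as returning the right one, so it still needs validation on the scenarios it will face.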

The International Federation of Robotics released a technology roadmap this year, predicting that within the next three years, 80% of new robot models will adopt this new architecture, with the traditional three-stage approach gradually exiting the mainstream. This isn't a minor tweak; it's a full paradigm shift.

As an industry expert aptly summarized: Physical AI is the ultimate mode of AI development because it needs to understand not only human instructions but also all the laws of the physical world.

Jensen Huang said the "ChatGPT moment" for robotics has arrived. In my view, the nature of Physical AI's moment is completely different from that of language models. The moment for language models was when ordinary people worldwide first got their hands on AI. The moment for Physical AI is when AI truly starts working for the first time.

Currently, this track is at a very special stage: the direction is locked in, the concept is validated, but the landscape isn't settled.

On one hand, making demos and achieving mass production are two completely different capability systems. Getting one prototype to work is one thing; having ten thousand products perform consistently in real-world scenarios tests manufacturing consistency, supply chain resilience, scenario generalization ability, and operational systems. These have little to do with AI algorithms, but each is enough to halt a batch of players. On the other hand, real-world data collection is expensive, time-consuming, and has limited coverage, which almost predestines that large-scale training for Physical AI will heavily rely on synthetic data.

At the same time, from automotive supply chains and traditional industrial automation to consumer electronics manufacturing, industries that seem unrelated to "AI" are accelerating their entry into Physical AI through technology spillover. Their manufacturing capabilities, supply chain management experience, and scenario resources might be the key variables determining the speed of Physical AI's practical application.

An intuitive judgment is this: look back at the AI wave ignited by ChatGPT in early 2023. The ones who captured the most value weren't the model makers, but the infrastructure providers. Will this wave of Physical AI replay the same script?

NVIDIA's moves suggest it's betting on this direction, but the story isn't finished. 2026 is the first year of the deployment phase; industrial competition has just begun. Looking back three years from now, which names are still at the table and which have been eliminated might surprise most people.

Related Questions

Q: What is Physical AI and how is it fundamentally different from previous AI developments?

A: Physical AI refers to an intelligent, embodied system that can understand and interact with the physical world by integrating physical laws into its AI framework. Unlike earlier AI models confined to processing digital data like text and images, Physical AI operates within environments governed by gravity, friction, and inertia, enabling it to perform tasks like grasping, moving, and manipulating real-world objects.

Q: What were the key industry developments in 2026 that marked the transition of Physical AI into a deployment phase?

A: In 2026, key developments included Zhiyuan Robotics conducting live, unscripted demonstrations of its humanoid robots on real 3C production lines, announcing mass production of 10,000 units. Internationally, Figure AI released its Figure 03 model and shifted to its in-house Helix neural system. Additionally, NVIDIA partnered with four major industrial robotics firms to integrate AI training into existing robotic fleets, signaling a shift from prototype demonstrations to practical, scalable deployment.

Q: How is the automotive supply chain contributing to the advancement of Physical AI?

A: Automotive suppliers are leveraging their expertise in sensors (cameras, radar), drive-by-wire systems, and real-time operating systems developed for autonomous vehicles. This technology is highly transferable to robotics for perception, planning, and control. Companies like Aptiv, Valeo, and Horizon Robotics are applying these solutions to the Physical AI domain, providing mature manufacturing capabilities and supply chain management that accelerate the transition of robots from labs to mass production.

Q: What is a 'World Model' and why has it become a critical technological foundation for Physical AI in 2026?

A: A 'World Model' is an AI system that learns an internal understanding of physical world dynamics, allowing it to predict outcomes of actions (e.g., what happens if a cup is pushed). In 2026, its engineering breakthrough, led by models like NVIDIA's Cosmos and open-source frameworks like LingBot-World, enabled the efficient generation of massive, realistic synthetic training data. This allows robots to learn complex physical interactions and edge-case scenarios in simulation at low cost and high speed, which is impractical with real-world data collection alone.

Q: How is the traditional robotics architecture being transformed by the Physical AI paradigm?

A: The traditional 'Sense, Plan, Act' architecture, which relies on pre-programmed rules for specific environments, is being replaced by Physical AI's 'Perception, Reasoning, Execution' paradigm. Instead of following fixed scripts, robots now use trained neural networks to reason and make decisions based on their understanding of the physical world. This shift enables adaptability in unstructured environments. Industry forecasts suggest that 80% of new robot models will adopt this new architecture within three years, representing a fundamental paradigm change in the field.
