a16z's 10,000-Word Article: The Next Frontier of AI Is Not in Language, But in the Physical World—The Triple Flywheel of Robotics, Autonomous Science, and Brain-Computer Interfaces

marsbit · Published 2026-04-16 · Last updated 2026-04-16

Summary

The next frontier of AI lies in the physical world, moving beyond language and code into robotics, autonomous science, and novel human-computer interfaces. These domains are powered by five core technical primitives: learned representations of dynamics, embodied action architectures, simulation and synthetic data infrastructure, expanded sensory channels, and closed-loop agent systems. Robotics applies these to real-time physical interaction, autonomous science enables AI-driven discovery through self-driving labs, and new interfaces—like AR, silent speech, and brain-computer interfaces—expand human-AI interaction bandwidth. Together, they form a mutually reinforcing flywheel: robotics enables automated science, science produces structured physical data to improve AI models, and new interfaces generate rich human-world interaction data. This convergence promises to unlock emergent capabilities as AI begins to scale in the physical domain.

Author: Oliver Hsu (a16z)

Compiled by: Deep Tide TechFlow

Deep Tide Introduction: This article comes from a16z researcher Oliver Hsu and is the most systematic "Physical AI" investment map to appear so far in 2026. His judgment: the language/code mainline is still scaling, but the areas most likely to produce the next generation of disruptive capabilities are three fields adjacent to that mainline—general-purpose robots, autonomous science (AI scientists), and new human-computer interfaces such as brain-computer interfaces. The author breaks down the five underlying capabilities that support them and argues that these three fronts will form a structurally reinforcing flywheel, each feeding the others. For anyone who wants to understand the investment logic of Physical AI, this is currently the most complete framework.

Today's dominant AI paradigm is organized around language and code. The scaling laws of large language models have been clearly established, the commercial flywheel of data, compute, and algorithmic improvement is turning, and the returns from each step up in capability remain significant and largely visible. This paradigm deserves the capital and attention it attracts.

But another set of adjacent fields has quietly made substantial progress through its incubation period. These include general-purpose robotics approaches such as VLAs (vision-language-action models) and WAMs (world action models), physical and scientific reasoning centered on "AI scientists," and new interfaces that leverage AI advances to reshape human-computer interaction (including brain-computer interfaces and neurotechnology). Beyond the technology itself, these directions are beginning to attract talent, capital, and founders. The technical primitives for extending frontier AI into the physical world are maturing simultaneously, and progress over the past 18 months suggests these fields will soon enter their own scaling phases.

In any technological paradigm, the areas with the largest delta between current capabilities and medium-term potential often share two characteristics: first, they can benefit from the same scaling advantages driving the current frontier; second, they are just one step removed from the mainstream paradigm—close enough to inherit its infrastructure and research momentum, yet distant enough to require substantial additional work. This distance itself has a dual effect: it naturally forms a moat against fast followers, while also defining a problem space with sparser information and less crowding, thus more likely to give rise to new capabilities—precisely because the shortcuts haven't been exhausted.

Caption: Schematic of the relationship between the current AI paradigm (language/code) and adjacent frontier systems

Three fields fit this description today: robotic learning, autonomous science (especially in materials and life sciences), and new human-computer interfaces (including brain-computer interfaces, silent speech, neuro-wearables, and new sensory channels like digital olfaction). They are not entirely independent efforts; thematically, they belong to the same group of "frontier systems for the physical world." They share a set of underlying primitives: learned representations of physical dynamics, architectures for embodied action, simulation and synthetic data infrastructure, expanding sensory channels, and closed-loop agent orchestration. They reinforce each other through cross-domain feedback relationships. They are also the most likely places for qualitative capability leaps to emerge—products of the interaction between model scale, physical deployment, and new data modalities.

This article will outline the technical primitives supporting these systems, explain why these three fields represent frontier opportunities, and propose that their mutual reinforcement forms a structural flywheel pushing AI into the physical world.

Five Underlying Primitives

Before looking at specific applications, it is worth understanding the shared technical foundation of these frontier systems. Pushing frontier AI into the physical world relies on five main primitives. These technologies are not exclusive to any single application domain; they are building blocks from which systems that extend AI into the physical world can be assembled. Their simultaneous maturation is what makes the current moment special.

Caption: The five underlying primitives supporting Physical AI

Primitive 1: Learned Representations of Physical Dynamics

The most fundamental primitive is the ability to learn a compressed, general representation of physical world behavior—how objects move, deform, collide, and react to forces. Without this layer, every physical AI system would have to learn the physics of its domain from scratch, a cost no one can afford.

Several architectural schools are approaching this goal from different directions. VLA models start from the top: take pre-trained vision-language models—which already possess semantic understanding of objects, spatial relationships, and language—and add an action decoder on top to output motion control commands. The key point is that the enormous cost of learning to see and understand the world can be amortized by internet-scale image-text pre-training. Physical Intelligence's π0, Google DeepMind's Gemini Robotics, and NVIDIA's GR00T N1 are all validating this architecture at increasingly larger scales.
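A minimal sketch of the VLA pattern just described, assuming a generic frozen backbone (the class names, dimensions, and the placeholder "VLM" below are illustrative, not any specific model's API): the expensive perception stack is reused as-is, and only a small action head is trained on robot data to emit chunks of continuous control commands.

```python
import torch
import torch.nn as nn

class ActionDecoder(nn.Module):
    """Small trainable head mapping frozen VLM features to action chunks."""
    def __init__(self, feat_dim: int = 1024, action_dim: int = 7, horizon: int = 16):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 512),
            nn.GELU(),
            nn.Linear(512, horizon * action_dim),  # a chunk of future actions
        )

    def forward(self, vlm_features: torch.Tensor) -> torch.Tensor:
        # vlm_features: (batch, feat_dim) pooled embedding of image + instruction
        return self.mlp(vlm_features).view(-1, self.horizon, self.action_dim)

# Hypothetical stand-in for a pre-trained VLM; its weights stay frozen and
# only the decoder is trained on robot trajectories.
backbone = nn.Linear(768, 1024)
for p in backbone.parameters():
    p.requires_grad = False

decoder = ActionDecoder()
features = backbone(torch.randn(4, 768))  # placeholder for pooled VLM output
actions = decoder(features)               # (4, 16, 7): 16 steps of 7-DoF control
print(actions.shape)
```

The design point is the asymmetry: the backbone amortizes internet-scale pre-training, while the decoder is small enough to train on comparatively scarce robot data.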

WAMs start from the bottom: they build on video diffusion Transformers pre-trained on internet-scale video, inheriting rich priors about physical dynamics (how objects fall, become occluded, and interact when force is applied), and then couple those priors with action generation. NVIDIA's DreamZero demonstrated zero-shot generalization to novel tasks and environments, cross-embodiment transfer from a handful of human video demonstrations, and meaningful improvements in real-world generalization.

A third approach might be most indicative of future directions, skipping pre-trained VLMs and video diffusion backbones entirely. Generalist's GEN-1 is a natively embodied foundation model trained from scratch on over 500,000 hours of real physical interaction data, primarily collected from people performing daily manipulation tasks using low-cost wearable devices. It is not a standard VLA (no vision-language backbone is being fine-tuned), nor a WAM. It is a foundation model specifically designed for physical interaction, learning from scratch not the statistical patterns of internet images, text, or video, but the statistical patterns of human-object contact.

Spatial intelligence, as pursued by companies like World Labs, is valuable for this primitive because it addresses a common shortcoming of VLA, WAMs, and natively embodied models: none explicitly model the 3D structure of the scene they are in. VLAs inherit 2D visual features from image-text pre-training; WAMs learn dynamics from video, which is a 2D projection of 3D reality; models learning from wearable sensor data capture force and kinematics but not scene geometry. Spatial intelligence models can help fill this gap—learning to reconstruct and generate the complete 3D structure of physical environments and reason about it: geometry, lighting, occlusion, object relationships, spatial layout.

The convergence of these various approaches is itself significant. Whether the representation is inherited from a VLM, co-learned from video, or built natively from physical interaction data, the underlying primitive is the same: a compressed, transferable model of physical world behavior. The data flywheel these representations can tap into is enormous and largely untapped—not just internet video and robot trajectories, but also the vast corpus of human bodily experience that wearable devices are beginning to collect at scale. The same representation can serve a robot learning to fold towels, an autonomous lab predicting reaction outcomes, and a neural decoder interpreting motor cortex grasp intentions.

Primitive 2: Architectures for Embodied Action

Physical representation alone is not enough. Translating "understanding" into reliable physical action requires architectures to solve several interrelated problems: mapping high-level intent to continuous motion commands, maintaining consistency over long action sequences, operating under real-time latency constraints, and continuously improving with experience.

A dual-system hierarchical architecture has become a standard design for complex embodied tasks: a slow but powerful vision-language model handles scene understanding and task reasoning (System 2), paired with a fast, lightweight visuomotor policy for real-time control (System 1). Variants of this approach are used by GR00T N1, Gemini Robotics, and Figure's Helix, addressing the fundamental tension between "large models providing rich reasoning" and "physical tasks requiring millisecond-level control frequencies." Generalist takes a different path, using "resonant reasoning" to allow thinking and acting to occur simultaneously.
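A toy sketch of the dual-system split described above (the functions, timings, and dimensions are all placeholders, not any product's implementation): the slow reasoner refreshes a latent plan about once per second, while the fast policy consumes the most recent latent at the control rate.

```python
import time

def system2_reason(observation: dict) -> list:
    """Stand-in for a slow VLM pass (System 2): returns a latent task plan.

    A real system would run this asynchronously; blocking is shown for brevity.
    """
    time.sleep(0.5)          # simulate heavy inference latency
    return [0.0] * 64        # placeholder latent conditioning vector

def system1_act(latent: list, proprio: list) -> list:
    """Stand-in for a fast, lightweight visuomotor policy (System 1)."""
    return [0.01 * p for p in proprio]   # placeholder motor command

obs = {"image": None, "instruction": "fold the towel"}
latent = system2_reason(obs)
last_replan = time.monotonic()

for step in range(100):                  # ~2 s of control at 50 Hz
    proprio = [0.0] * 7                  # placeholder joint state
    command = system1_act(latent, proprio)
    # send `command` to the robot here
    if time.monotonic() - last_replan > 1.0:   # System 2 refreshes ~1 Hz
        latent = system2_reason(obs)
        last_replan = time.monotonic()
    time.sleep(0.02)                     # 50 Hz control tick
```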

The action generation mechanisms themselves are also evolving rapidly. The flow-matching and diffusion-based action heads pioneered by π0 have become the mainstream method for generating smooth, high-frequency continuous actions, replacing the discrete tokenization borrowed from language modeling. These methods treat action generation as a denoising process similar to image synthesis, producing trajectories that are physically smoother and more robust to error accumulation than autoregressive token prediction.
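A minimal sketch of the flow-matching objective the paragraph refers to, with an invented toy network rather than any production action head: the model regresses the constant velocity field that transports Gaussian noise to expert action chunks along straight-line interpolation paths.

```python
import torch
import torch.nn as nn

action_dim, horizon = 7, 16
# Toy velocity-field network; a real action head would also condition on
# observations and the backbone's features.
net = nn.Sequential(
    nn.Linear(horizon * action_dim + 1, 256),
    nn.GELU(),
    nn.Linear(256, horizon * action_dim),
)

def flow_matching_loss(expert_actions: torch.Tensor) -> torch.Tensor:
    # expert_actions: (batch, horizon * action_dim) flattened demo chunks
    noise = torch.randn_like(expert_actions)
    t = torch.rand(expert_actions.shape[0], 1)     # interpolation time in [0, 1]
    x_t = (1 - t) * noise + t * expert_actions     # point on the straight path
    target_velocity = expert_actions - noise       # constant along that path
    pred = net(torch.cat([x_t, t], dim=-1))
    return ((pred - target_velocity) ** 2).mean()

loss = flow_matching_loss(torch.randn(32, horizon * action_dim))
loss.backward()
print(float(loss))
# At inference, integrating the learned field from pure noise (e.g. a few
# Euler steps) yields a smooth continuous trajectory, with no token-by-token
# autoregression to accumulate error.
```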

But perhaps the most critical architectural advancement is the extension of reinforcement learning to pre-trained VLAs—a foundation model trained on demonstration data can continue to improve through autonomous practice, just as a person refines a skill through repetition and self-correction. Physical Intelligence's π*0.6 work is the clearest large-scale demonstration of this principle. Their method, called RECAP (Reinforcement Learning with Experience and Corrections via Advantage-conditioned Policies), addresses the long-horizon credit assignment problem that pure imitation learning cannot solve. If a robot picks up an espresso machine handle at a slightly skewed angle, the failure may not be immediate but manifest several steps later during insertion. Imitation learning has no mechanism to attribute that failure back to the earlier grasp; RL does. RECAP trains a value function to estimate the probability of success from any intermediate state and then has the VLA choose high-advantage actions. Crucially, it integrates multiple heterogeneous data types—demonstration data, on-policy autonomous experience, and corrections provided by expert teleoperation during execution—into the same training pipeline.
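The full RECAP method is more involved than can be shown here; the following is a deliberately simplified sketch of just the value-function idea the paragraph describes, not Physical Intelligence's implementation (all networks and dimensions are invented for illustration). Episode-level success labels supervise a per-state value estimate, and the value difference across a transition acts as an advantage signal that can condition or filter the policy's actions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim = 128   # invented dimension for illustration
value_head = nn.Sequential(
    nn.Linear(state_dim, 64), nn.GELU(),
    nn.Linear(64, 1), nn.Sigmoid(),   # estimated probability of task success
)

def value_loss(states: torch.Tensor, episode_succeeded: torch.Tensor) -> torch.Tensor:
    # Every state in an episode shares that episode's binary outcome label;
    # this is how credit flows back to early mistakes (e.g. a skewed grasp
    # that only causes a failure several steps later).
    pred = value_head(states).squeeze(-1)
    return F.binary_cross_entropy(pred, episode_succeeded)

def advantage(state: torch.Tensor, next_state: torch.Tensor) -> torch.Tensor:
    # Positive when the transition moved the system toward likely success;
    # the policy is then steered toward high-advantage actions.
    with torch.no_grad():
        return value_head(next_state) - value_head(state)

states = torch.randn(256, state_dim)
labels = torch.randint(0, 2, (256,)).float()
value_loss(states, labels).backward()
print(float(advantage(torch.randn(1, state_dim), torch.randn(1, state_dim))))
```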

The results of this approach are good news for the prospects of RL in the action domain. π*0.6 reliably folds 50 types of unseen clothing in real home environments, assembles cardboard boxes, and makes espresso on professional machines, running for hours continuously without human intervention. On the most difficult tasks, RECAP more than doubled throughput and halved failure rates compared to pure imitation baselines. The system also demonstrated that RL post-training produces qualitative behaviors not seen in imitation learning: smoother recovery motions, more efficient grasping strategies, adaptive error correction not present in the demonstration data.

These gains indicate one thing: the compute power scaling dynamics that pushed large models from GPT-2 to GPT-4 are beginning to operate in the embodied domain—only now at an earlier point on the curve, where the action space is continuous, high-dimensional, and subject to the unforgiving constraints of the physical world.

Primitive 3: Simulation and Synthetic Data as Scaling Infrastructure

In the language domain, the data problem was solved by the internet: trillions of naturally occurring, freely available text tokens. In the physical world, the problem is orders of magnitude harder, a point now widely accepted and signaled most directly by the rapid rise of startups providing physical-world data. Collecting real-world robot trajectories is expensive, risky to scale, and limited in diversity. A language model can learn from billions of conversations; a robot (for now) cannot have billions of physical interactions.

Simulation and synthetic data generation are the infrastructure layers addressing this constraint. Their maturation is a key reason why Physical AI is accelerating now, not five years ago.

Modern simulation stacks combine physics-based simulation engines, photorealistic ray-traced rendering, procedural environment generation, and world foundation models that generate photorealistic video from simulation inputs—the latter responsible for bridging the sim-to-real gap. The entire pipeline starts with neural reconstruction of real environments (possible with just a smartphone), populates them with physically accurate 3D assets, and proceeds to large-scale synthetic data generation with automatic labeling.
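A toy sketch of the procedural-generation pattern this pipeline implies (the environment parameters and the simulate_episode stub are invented placeholders for a real physics engine and renderer): environments are randomized cheaply, rolled out in simulation, and every episode arrives with free, automatically generated labels.

```python
import random

def sample_environment() -> dict:
    """Procedurally randomize the scene; every field is an illustrative knob."""
    return {
        "object": random.choice(["mug", "towel", "box"]),
        "friction": random.uniform(0.2, 1.0),
        "lighting": random.uniform(0.3, 1.0),
        "camera_jitter": random.gauss(0.0, 0.02),
    }

def simulate_episode(env: dict) -> dict:
    """Placeholder rollout: a real stack would step a physics engine and
    render frames; labels (poses, contacts, success) come for free."""
    return {"env": env, "frames": 120, "success": env["friction"] > 0.3}

# Scaling with compute, not manpower: more data is just a bigger loop.
dataset = [simulate_episode(sample_environment()) for _ in range(1000)]
print(sum(ep["success"] for ep in dataset), "auto-labeled successful episodes")
```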

The significance of simulation stacks is that they are changing the economic assumptions underpinning Physical AI. If the bottleneck for Physical AI shifts from "collecting real data" to "designing diverse virtual environments," the cost curve collapses. Simulation scales with compute, not with manpower and physical hardware. This transformation of the economic structure for training Physical AI systems is of the same kind as the transformation internet text data brought to training language models—meaning investment in simulation infrastructure has enormous leverage for the entire ecosystem.

But simulation is not just a robotics primitive. The same infrastructure serves autonomous science (digital twins of lab equipment, simulated reaction environments for hypothesis pre-screening), new interfaces (simulated neural environments for training BCI decoders, synthetic sensory data for calibrating new sensors), and other domains where AI interacts with the physical world. Simulation is the universal data engine for physical world AI.

Primitive 4: Expanding Sensory Channels

The signals conveying information in the physical world are far richer than vision and language. Haptics conveys material properties, grasp stability, contact geometry—information cameras cannot see. Neural signals encode motor intent, cognitive states, and perceptual experiences with a bandwidth far exceeding any existing human-computer interface. Subvocal muscle activity encodes speech intent before any sound is produced. The fourth primitive is the rapid expansion of AI's access to these previously hard-to-reach sensory modalities—driven not only by research but also by an entire ecosystem building consumer-grade devices, software, and infrastructure.

Caption: Expanding AI sensory channels, from AR and EMG to brain-computer interfaces

The most intuitive indicator is the emergence of new device categories. AR devices have significantly improved in experience and form factor in recent years (companies are already building applications for consumer and industrial scenarios on this platform); voice-first AI wearables give language-based AI a more complete physical-world context—they literally follow users into physical environments. Over the long term, neural interfaces may unlock even more complete interaction modalities. The shift in computing paradigms brought by AI creates an opportunity for a major upgrade in human-computer interaction, with companies like Sesame building new modalities and devices for this purpose.

More mainstream modalities like voice also create tailwinds for emerging interaction methods. Products like Wispr Flow push voice as a primary input method (due to its high information density and natural advantages), improving the market conditions for silent speech interfaces as well. Silent speech devices use various sensors to capture tongue and vocal cord movements, recognizing language silently—they represent a human-computer interaction modality with even higher information density than voice.

Brain-computer interfaces (invasive and non-invasive) represent a deeper frontier, with the commercial ecosystem around them steadily advancing. Signals will emerge at the confluence of clinical validation, regulatory approval, platform integration, and institutional capital—a convergence point for a technology category that was purely academic just a few years ago.

Haptic perception is entering embodied AI architectures, with some models in robotic learning explicitly incorporating touch as a first-class citizen. Olfactory interfaces are becoming real engineering products: wearable olfactory displays using micro odor generators with millisecond response times have been demonstrated in mixed reality applications; olfactory models are also beginning to pair with visual AI systems for chemical process monitoring.

The common pattern across these developments is that they converge on one another at the limit. AR glasses continuously generate visual and spatial data of user interaction with the physical environment; EMG wristbands capture the statistical patterns of human movement intent; silent speech interfaces capture the mapping from subvocalization to language output; BCIs capture neural activity at the highest resolution currently available; tactile sensors capture the contact dynamics of physical manipulation. Each new device category is also a data generation platform, feeding the underlying models across multiple application domains. A robot trained on data that uses EMG to infer movement intent learns different grasping strategies than one trained only on teleoperation data; a lab interface responding to subvocal commands enables a completely different scientist-machine interaction than a keyboard-controlled lab; a neural decoder trained on high-density BCI data produces motor planning representations unavailable through any other channel.

The proliferation of these devices is expanding the effective dimensionality of the data manifold available for training frontier physical AI systems—and a significant portion of this expansion is driven by well-capitalized consumer goods companies, not just academic labs, meaning the data flywheel can expand along with market adoption rates.

Primitive 5: Closed-Loop Agent Systems

The final primitive is more architectural. It refers to the orchestration of perception, reasoning, and action into sustained, autonomous, closed-loop systems that operate over long time horizons without human intervention.

In language models, the corresponding development is the rise of agent systems—multi-step reasoning chains, tool use, self-correction processes—pushing models from single-turn Q&A tools to autonomous problem solvers. In the physical world, the same transition is happening, only with much more demanding requirements. A language agent can roll back errors at no cost; a physical agent cannot undo a spilled reagent.

Physical world agent systems have three characteristics that distinguish them from their digital counterparts. First, they need to be instrumented for experimentation or operate in a closed loop: directly interfacing with raw instrument data streams, physical state sensors, and execution primitives, grounding reasoning in physical reality, not textual descriptions of it. Second, they need long-sequence persistence: memory, provenance tracking, safety monitoring, recovery behaviors, linking multiple run cycles together, not treating each task as an independent episode. Third, they need closed-loop adaptation: revising strategies based on physical outcomes, not just textual feedback.
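A structural skeleton of the three properties just listed, with every method a placeholder (no real instrument drivers, planners, or safety monitors are implied): the loop grounds decisions in raw sensing, gates actions through a safety check that triggers recovery rather than rollback, and persists provenance across cycles so later planning can adapt to physical outcomes.

```python
class PhysicalAgent:
    """Skeleton only: every method stands in for real drivers and planners."""

    def __init__(self):
        self.memory = []                      # provenance across run cycles

    def sense(self) -> dict:
        return {"sensor": 0.0}                # raw instrument/sensor stream

    def plan(self, obs: dict) -> str:
        return "noop"                         # grounded in obs, not in text

    def is_safe(self, action: str, obs: dict) -> bool:
        return True                           # safety monitor gate

    def act(self, action: str) -> dict:
        return {"outcome": "ok"}              # an irreversible physical step

    def run(self, cycles: int) -> None:
        for i in range(cycles):
            obs = self.sense()
            action = self.plan(obs)
            if not self.is_safe(action, obs): # recovery, not rollback:
                action = "recover"            # physical errors can't be undone
            result = self.act(action)
            self.memory.append(               # persistence: later cycles see
                {"cycle": i, "obs": obs,      # earlier physical outcomes
                 "action": action, "result": result})

PhysicalAgent().run(cycles=3)
```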

This primitive fuses individual capabilities (good world models, reliable action architectures, rich sensor suites) into complete systems capable of autonomous operation in the physical world. It is the integration layer, and its maturation is the prerequisite for the three application areas below to exist as real-world deployments rather than isolated research demonstrations.

Three Domains

The primitives above are general enabling layers; they themselves do not specify where the most important applications will emerge. Many domains involve physical action, physical measurement, or physical perception. What distinguishes "frontier systems" from "merely improved versions of existing systems" is the degree to which compounding returns occur from model capability improvements and scaling infrastructure within the domain—not just better performance, but the emergence of new capabilities previously impossible.

Robotics, AI-driven science, and new human-computer interfaces are the three domains with the strongest compounding effects. Each uniquely assembles the primitives, each is constrained by the limitations the current primitives are removing, and each, in operation, generates as a byproduct a form of structured physical data—data that in turn makes the primitives themselves better, creating a feedback loop that accelerates the entire system. They are not the only Physical AI domains worth watching, but they are where frontier AI capabilities interact most intensively with physical reality, and are furthest from the current language/code paradigm—thus offering the largest space for new capabilities to emerge—while also being highly complementary to it and able to benefit from its advantages.

Robotics

Robotics is the most literal embodiment of Physical AI: an AI system must perceive, reason, and exert physical action on the material world in real time. It also constitutes a stress test for every primitive.

Consider what a general-purpose robot must do to fold a towel. It needs a learned representation of how deformable materials behave under force—a physical prior not provided by language pre-training. It needs an action architecture that can translate high-level instructions into sequences of continuous motion commands at control frequencies above 20Hz. It needs simulation-generated training data because no one has collected millions of real towel-folding demonstrations. It needs tactile feedback to detect slippage and adjust grip force because vision cannot distinguish a secure grasp from a failing one. It also needs a closed-loop controller that can recognize errors during folding and recover, not just blindly execute memorized trajectories.

Caption: Robotics tasks simultaneously invoke all five underlying primitives

This is why robotics is a frontier system, not a mature engineering discipline with better tools. These primitives are not improving existing robotic capabilities; they are enabling categories of manipulation, movement, and interaction previously impossible outside narrow, controlled industrial environments.

Frontier progress has been significant in recent years—we have written about this before. First-generation VLAs proved that foundation models can control robots across diverse tasks. Architectural advances are bridging high-level reasoning and low-level control in robotic systems. On-device inference is becoming feasible. Cross-embodiment transfer means a model can be adapted to a new robot platform with limited data. The remaining core challenge is reliability at scale, which is still the deployment bottleneck: a 95% per-step success rate compounds to only about 60% over a 10-step chain, while production environments require far higher rates. RL post-training holds great potential here to help the field cross the capability and robustness threshold the scaling phase requires.
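The reliability arithmetic in this paragraph, worked out explicitly:

```python
per_step, steps = 0.95, 10
print(f"chain success: {per_step ** steps:.1%}")        # ~59.9%
# Per-step reliability needed to keep a 10-step chain at 95% overall:
print(f"required per step: {0.95 ** (1 / steps):.2%}")  # ~99.49%
```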

These advancements have implications for market structure. Value in the robotics industry has for decades been captured in the mechanical systems themselves. Mechanics remain a critical part of the stack, but as learned strategies become more standardized, value will migrate towards models, training infrastructure, and data flywheels. Robotics also feeds back into the primitives: every real-world trajectory is training data to improve world models, every deployment failure exposes gaps in simulation coverage, every test on a new embodiment expands the diversity of physical experience available for pre-training. Robotics is both the most demanding consumer of primitives and one of their most important sources of improvement signals.

Autonomous Science

If robotics tests the primitives with "real-time physical action," autonomous science tests something slightly different—sustained multi-step reasoning about causally complex physical systems, over timeframes of hours or days, where experimental results must be interpreted, contextualized, and used to revise strategies.

Caption: How autonomous science (AI scientists) integrates the five underlying primitives

AI-driven science is the domain that composes the primitives most thoroughly. A self-driving lab (SDL) needs learned representations of physical and chemical dynamics to predict experimental outcomes; embodied action to pipette, position samples, and operate analytical instruments; simulation to pre-screen candidate experiments and allocate scarce instrument time; and expanded sensing capabilities (spectroscopy, chromatography, mass spectrometry, and increasingly novel chemical and biological sensors) to characterize results. And it needs the closed-loop agent orchestration primitive more than any other domain: the ability to sustain multi-round hypothesis-experiment-analysis-revision workflows without human intervention, retaining provenance, monitoring safety, and revising strategy based on what each round reveals.
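A toy sketch of the hypothesis-experiment-analysis-revision loop just described, with a placeholder objective standing in for a physical experiment (the parameter names and the naive local-search strategy are illustrative only; a real SDL would use far richer planners and drive actual instruments):

```python
import random

def run_experiment(params: dict) -> float:
    """Placeholder for a physical experiment with a noisy readout."""
    return -(params["temp"] - 180.0) ** 2 + random.gauss(0.0, 50.0)

def propose(history: list) -> dict:
    """Naive strategy revision: search near the best result so far."""
    if not history:
        return {"temp": random.uniform(100.0, 300.0)}
    best_params, _ = max(history, key=lambda h: h[1])
    return {"temp": best_params["temp"] + random.gauss(0.0, 10.0)}

history = []                                   # provenance of every round
for round_idx in range(20):                    # closed loop, no human inside
    params = propose(history)                  # hypothesis
    result = run_experiment(params)            # experiment
    history.append((params, result))           # analysis + provenance

print("best so far:", max(history, key=lambda h: h[1]))
```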

No other domain invokes these primitives so deeply. This is why autonomous science is a frontier "system," not just laboratory automation with better software. Companies like Periodic Labs and Medra, in materials science and life sciences respectively, synthesize scientific reasoning capabilities with physical validation capabilities to achieve scientific iteration, producing experimental training data along the way.

The value of such systems is intuitively obvious. Traditional materials discovery takes years from concept to commercialization; AI-accelerated workflows could compress that timeline dramatically. The key constraint is shifting from hypothesis generation (where foundation models already help a great deal) to fabrication and validation (which require physical instruments, robotic execution, and closed-loop optimization). SDLs target exactly this bottleneck.

Another important property of autonomous science, one true of all physical-world systems, is its role as a data engine. Every experiment run by an SDL produces not just a scientific result but also a physically grounded, experimentally validated training signal. A measurement of how a polymer crystallizes under specific conditions enriches the world model's understanding of material dynamics; a validated synthesis pathway becomes training data for physical reasoning; a characterized failure tells the agent system where its predictions break down. The data produced by an AI scientist running real experiments is qualitatively different from internet text or simulation output—it is structured, causal, and empirically verified. This is precisely the kind of data physical reasoning models need most and cannot get from other sources. Autonomous science is the pathway that directly converts physical reality into structured knowledge, improving the entire Physical AI ecosystem.

New Interfaces

Robotics extends AI into physical action; autonomous science extends it into physical research. New interfaces extend it into the direct coupling of artificial intelligence with human perception, sensory experience, and bodily signals, across devices spanning from AR glasses and EMG wristbands all the way to implantable brain-computer interfaces. What binds this category together is not a single technology but a common function: expanding the bandwidth and modalities of the channel between human intelligence and AI systems, and in the process generating human-world interaction data directly usable for building Physical AI.

Caption: The spectrum of new interfaces, from AR glasses to brain-computer interfaces

The distance from the mainstream paradigm is both the challenge and the potential of this field. Language models know about these modalities conceptually but are not natively familiar with the motor patterns of silent speech, the geometry of olfactory receptor binding, or the temporal dynamics of EMG signals. Representations to decode these signals must be learned from the expanding sensory channels. Many modalities lack internet-scale pre-training corpora; data often can only be produced by the interfaces themselves—meaning the system and its training data co-evolve, something without parallel in language AI.

The clearest recent signal in this field is the rapid rise of AI wearables as a consumer category. AR glasses are perhaps the most visible example, but other wearables that use primarily voice or vision as input are emerging alongside them.

This ecosystem of consumer devices both provides new hardware platforms for extending AI into the physical world and is becoming infrastructure for physical-world data. A person wearing AI glasses continuously produces first-person video of how people navigate physical environments, manipulate objects, and interact with the world; other wearables continuously capture biometric and motion data. The installed base of AI wearables is becoming a distributed physical-world data acquisition network, recording human physical experience at a previously impossible scale. Consider the scale smartphones reached as consumer devices: a new device category operating at comparable scale, one that lets computers perceive the world through new modalities, opens a huge new channel for AI's interaction with the physical world.

Brain-computer interfaces represent a deeper frontier. Neuralink has implanted its device in multiple patients, with both its surgical robot and its decoding software continuing to iterate. Synchron's intravascular Stentrode has been used to let paralyzed users control digital and physical environments. Echo Neurotechnologies is developing a BCI system for speech restoration based on its research in high-resolution cortical speech decoding. New companies like Nudge are also being formed, gathering talent and capital to build new neural interface and brain-interaction platforms. Technical milestones at the research level are also noteworthy: the BISC chip demonstrated wireless neural recording with 65,536 electrodes on a single chip, and the BrainGate team decoded internal speech directly from the motor cortex.

The common thread running through AR glasses, AI wearables, silent speech devices, and implantable BCIs is not just that "they are all interfaces," but that they collectively constitute an increasing-bandwidth spectrum between human physical experience and AI systems—every point on this spectrum supports the continuous progress of the primitives behind the three major domains discussed here. A robot trained on high-quality first-person video from millions of AI glasses users learns operational priors completely different from one trained on curated teleoperation datasets; a lab AI responding to subvocal commands is a completely different experience in terms of latency and fluency compared to a keyboard-controlled lab; a neural decoder trained on high-density BCI data produces motor planning representations unavailable through any other channel.

New interfaces are the mechanism for making the sensory channels themselves larger—they open up previously non-existent data channels between the physical world and AI. And this expansion is driven by consumer device companies pursuing scaled deployment, meaning the data flywheel will accelerate along with consumer adoption.

Systems for the Physical World

The reason to view robotics, autonomous science, and new interfaces as different instances of frontier systems composed from the same set of primitives is that they enable each other and compound.

Caption: The mutual feedback flywheel between robotics, autonomous science, and new interfaces

Robotics enables autonomous science. Self-driving labs are essentially robotic systems. The manipulation capabilities developed for general-purpose robots—dexterous grasping, liquid handling, precise positioning, multi-step task execution—can be directly transferred to laboratory automation. Every step forward in the generality and robustness of robot models expands the range of experimental protocols an SDL can execute autonomously. Every advance in robotic learning lowers the cost and increases the throughput of autonomous experimentation.

Autonomous science enables robotics. The scientific data produced by self-driving labs—validated physical measurements, causal experimental results, material property databases—can provide the kind of structured, grounded training data most needed by world models and physical reasoning engines. Furthermore, the materials and components needed for next-generation robots (better actuators, more sensitive tactile sensors, higher density batteries, etc.) are themselves products of materials science. Autonomous discovery platforms that accelerate materials innovation directly improve the hardware substrate on which robotic learning operates.

New interfaces enable robotics. AR devices are a scalable way to collect data on "how humans perceive and interact with the physical environment." Neural interfaces produce data about human movement intent, cognitive planning, and sensory processing. This data is extremely valuable for training robotic learning systems, especially for tasks involving human-robot collaboration or teleoperation.

There is a deeper observation here about the nature of frontier AI progress itself. The language/code paradigm has produced extraordinary results and is still rising strongly in the scaling era. But the new problems, new data types, new feedback signals, and new evaluation standards offered by the physical world are almost limitless. By grounding AI systems in physical reality (robots manipulating objects, labs synthesizing materials, interfaces coupling to the biological and physical world), we open up new scaling axes complementary to the existing digital frontier, and likely ones that improve each other.

Caption: Interaction and emergence across the various scaling axes of Physical AI

Precisely which behaviors will emerge from these systems is difficult to predict; emergence is, by definition, the product of interactions among capabilities that are individually understandable but unprecedented in combination. But the historical pattern is encouraging. Each time AI systems gained a new modality of interaction with the world, whether seeing (computer vision), speaking (speech recognition), or reading and writing (language models), the resulting capability leap far exceeded the sum of the individual improvements. The transition to physical-world systems represents the next such phase shift. The primitives discussed in this article are being built right now, and together they may enable frontier AI systems to perceive, reason about, and act upon the physical world, unlocking significant value and progress there.

Disclaimer: This article is for informational purposes only and does not constitute any investment advice. It should not be used as a basis for legal, commercial, investment, or tax advice.

Related Questions

Q: What are the three key adjacent fields identified as the next frontier for AI beyond language and code, according to the a16z article?

A: The three key adjacent fields are general-purpose robotics, autonomous science (AI scientists), and new human-computer interfaces, including brain-computer interfaces.

Q: What are the five underlying primitives that enable the development of AI systems for the physical world, as outlined in the article?

A: The five underlying primitives are: 1. learned representations of physical dynamics; 2. architectures for embodied action; 3. simulation and synthetic data as scaling infrastructure; 4. expanding sensory channels; and 5. closed-loop agent systems.

Q: How do the fields of robotics, autonomous science, and new interfaces create a mutually reinforcing "flywheel effect"?

A: They enable each other: robotics enables autonomous science by providing the physical automation for labs; autonomous science enables robotics by generating structured, validated physical data to improve world models; and new interfaces enable robotics by providing vast amounts of data on human physical interaction and intent, collected from devices like AR glasses and wearables.

Q: What is the significance of the RECAP method developed by Physical Intelligence, as mentioned in the article?

A: RECAP (Reinforcement Learning with Experience and Corrections via Advantage-conditioned Policies) is significant because it combines imitation learning with reinforcement learning. It uses a value function to estimate the probability of success from any state, allowing a robot to learn from its own autonomous practice and from expert corrections. The method demonstrated substantial improvements in success rates and failure reduction for long-horizon tasks in real-world home environments.

Q: Why is simulation considered a critical scaling infrastructure for physical AI, analogous to internet text data for language models?

A: Simulation is critical because collecting real-world physical interaction data (e.g., robot trajectories) is extremely costly, risky, and limited in diversity. Simulation, powered by physics engines and photorealistic rendering, allows massive, automatically labeled synthetic datasets to be generated at scale, mirroring how internet text solved the scaling problem for language models and dramatically altering the economics of training physical AI systems.
