Introduction to the Concept of World Models: A Story from Psychology to the Main Battlefield of AI

marsbit | Published on 2026-06-29 | Last updated on 2026-06-29

Abstract

**World Models: From Psychology to AI's Core Concept** "World model" is a trending but often confusing term in AI, describing a system that allows machines to internally simulate, predict, and rehearse potential outcomes before taking real-world action—like a mental "sandbox." While definitions vary—Yann LeCun emphasizes physical understanding, OpenAI's Sora is a video-based "world simulator," Google DeepMind's Genie 3 creates interactive 3D environments, and companies like Alibaba and Tesla focus on practical applications—the core goal is consistent: reduce reliance on vast real-world data by creating an internal, predictive model for safer and more efficient AI. The concept has deep roots, tracing back to psychologist Kenneth Craik (1943). In AI, it was revitalized by researchers like David Ha and Jürgen Schmidhuber (2018). Major technical approaches include: 1) generative video models (e.g., Sora) for visual realism; 2) abstract predictive models (e.g., LeCun's JEPA) for efficiency and physical reasoning; and 3) explicit 3D simulators (e.g., NVIDIA Omniverse) for precision. Fei-Fei Li proposes a classification based on the AI action loop: renderers (output observations), simulators (output world states), and planners (output actions). The emerging "World Action Model" (WAM) paradigm aims to unify future prediction and action generation. An industry framework is forming: upstream (data, compute, sensors), midstream (general and vertical platforms), and downstream appli...

The world model is currently one of the hottest concepts in AI circles, and one of the most confusing for ordinary readers. Some say it's the ability for AI to dream, others call it a simulator for autonomous driving, and still others describe it as the brain of a robot.

Fei-Fei Li, Yann LeCun, OpenAI, Google DeepMind, and NVIDIA, as well as Chinese giants like Alibaba, Tencent, and Huawei and a number of automakers, each have their own definition.

This article attempts to explain in plain language:

What problem world models aim to solve; why these scholars and big tech companies are fascinated by them; and why this concept has become an industrial battleground even before its name has been standardized.

I. Understanding in One Sentence: Letting AI Pre-enact the World in a 'Mental Sandbox'

Imagine you're standing at an intersection about to cross the street.

Your eyes see the green light, vehicles, pedestrians; your brain constructs a miniature scenario within milliseconds: if I walk now, will that car accelerate? Will that cyclist suddenly turn?

You haven't actually stepped out; you've first run through several possibilities in your mind.

Psychologists call this ability a 'mental model,' while AI researchers term it a 'world model.'

In other words, a world model is a 'mental sandbox' inside a machine.

It doesn't simply recognize what's in a scene; it can predict what will happen next and run repeated trial and error without taking real action.

For autonomous driving, it can generate virtual test scenarios for heavy rain, blizzards, and irregular obstacles; for robotics, it can let a humanoid robot fall 100,000 times in a simulated world before going outside; for gaming and film companies, it could be an infinitely explorable parallel universe.

By 2026, the frequency of the term 'world model' appearing in tech reports had already surpassed the clarity of its definition.

Alibaba developed Qwen-AgentWorld, HappyOyster, Qwen-RobotWorld, targeting language worlds, virtual worlds, and physical worlds respectively; Tencent's HY-World 2.0 emphasizes 3D editable worlds; Nio, Xpeng, Li Auto prefer terms like 'driving world model' or 'world behavior model'; Huawei and Baidu seldom use the term alone in public materials.

The confusion in naming makes the concept seem like a catch-all basket.

But behind all the terms lies a common core:

Allowing the machine to first establish an internally deducible, reviewable environment before taking real action. This environment can be pixels, 3D structures, physical parameters, or abstract states. The goal is to reduce unlimited reliance on real data, compressing the real world into a data engine capable of infinite generation, infinite mistakes, and infinite retries.

The lack of unified naming precisely indicates that world models are in the early stage of transitioning from an academic concept to industrial infrastructure.

II. The Source of Thought: A WWII Psychologist and Several AI Pioneers

2.1 Kenneth Craik: The First to Talk About a 'Small Model in the Mind'

The idea of world models predates deep learning by most of a century. In 1943, Scottish psychologist Kenneth Craik, in his book 'The Nature of Explanation,' proposed that the human brain constructs 'small-scale models' of reality to predict and understand external events.

Craik was only 29 then, a scholar at the Cambridge University Psychological Laboratory who was also engaged in applied psychology research in Britain during WWII.

His book was published two years before he died in a bicycle accident at the age of 31.

But the idea persisted: humans don't need to fully replicate the world; a sufficiently useful internal model allows pre-enactment before action.

This view aligns almost perfectly with the core of today's AI world models. Machines, too, don't need to remember every detail of the world; they need to learn the laws governing it and deduce the future when needed.

After Craik, in the 1980s, British psychologist Philip Johnson-Laird further systematized this idea, arguing that much human reasoning involves manipulating 'mental models' in the mind. He spent long periods at Princeton and Cambridge and is a key figure in cognitive science.

2.2 Marvin Minsky: The One Who Wanted Machines to Have a Common-Sense Framework

The field of artificial intelligence took up the idea early on. In the 1970s, Marvin Minsky at MIT proposed 'frame theory.'

He was a co-founder of the MIT AI Lab, a 1969 Turing Award laureate, and often regarded as one of the founders of the AI discipline.

Frame theory attempted to capture human commonsense knowledge about the world using structured knowledge frames:

Entering a door requires finding the handle first; restaurants typically have tables and chairs; objects fall under gravity.

What Minsky aimed to do is exactly what world models today still haven't accomplished—giving machines a structured, deducible common-sense knowledge base of the world.

2.3 David Ha & Jürgen Schmidhuber: Bringing World Models Back to the Deep Learning Mainstream

The field of reinforcement learning approached the same goal from another path.

In 2018, David Ha and Jürgen Schmidhuber's NeurIPS paper, 'Recurrent World Models Facilitate Policy Evolution,' reintroduced the term 'world model' to the deep learning mainstream.

David Ha was at Google Brain then and later became an independent researcher. His working style leans towards engineering, and he is skilled at creating impressive demos with concise architectures.

Jürgen Schmidhuber is a co-founder of the Swiss AI Lab IDSIA and one of the inventors of Long Short-Term Memory networks (LSTM), known in the AI field for being outspoken and holding independent views. He is sometimes called the 'father of modern AI'; the title is debated, but his academic influence is undeniable.

Their architecture was simple:

Use a VAE to compress high-dimensional frames into low-dimensional latent vectors, use an RNN to learn the changes of these vectors over time, then use a simple controller to train policies in 'imagination.'

The agent first dreams in the learned world model, then transfers the policy back to the real environment.
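The three-part design can be sketched in a few lines of NumPy. This is a toy under loud assumptions, not the paper's implementation: the encoder is a fixed random projection standing in for a trained VAE, the recurrent step is an untrained stand-in for the RNN, the reward is invented for illustration, and a keep-the-best random search stands in for the policy-evolution step. All names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, HIDDEN_DIM, ACTION_DIM = 64, 8, 16, 2

# V: stand-in for the VAE encoder (a fixed random linear projection here).
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))

def encode(frame):
    """Compress a flattened frame into a low-dimensional latent vector z."""
    return np.tanh(W_enc @ frame)

# M: stand-in for the RNN that models how z changes over time.
W_h = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM + LATENT_DIM + ACTION_DIM))
W_z = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN_DIM))

def dream_step(z, h, action):
    """Advance the hidden state and predict the next latent, with no real environment."""
    h_next = np.tanh(W_h @ np.concatenate([h, z, action]))
    z_next = np.tanh(W_z @ h_next)
    return z_next, h_next

# C: a small linear controller acting on [z, h].
def act(params, z, h):
    W_c = params.reshape(ACTION_DIM, LATENT_DIM + HIDDEN_DIM)
    return np.tanh(W_c @ np.concatenate([z, h]))

def imagined_return(params, first_frame, horizon=20):
    """Roll the controller forward inside the model and sum a toy reward."""
    z, h = encode(first_frame), np.zeros(HIDDEN_DIM)
    total = 0.0
    for _ in range(horizon):
        a = act(params, z, h)
        z, h = dream_step(z, h, a)
        total += -float(np.sum(z ** 2))  # hypothetical reward: keep the latent near zero
    return total

def evolve_controller(first_frame, generations=30, population=16, sigma=0.1):
    """Keep-the-best random search, standing in for an evolution strategy."""
    n_params = ACTION_DIM * (LATENT_DIM + HIDDEN_DIM)
    best = np.zeros(n_params)
    best_score = imagined_return(best, first_frame)
    for _ in range(generations):
        candidates = best + sigma * rng.normal(size=(population, n_params))
        scores = [imagined_return(c, first_frame) for c in candidates]
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best, best_score = candidates[i], scores[i]
    return best, best_score
```

Note that `evolve_controller` never touches a real environment: every score comes from `dream_step`, which is the sense in which the policy is trained 'in imagination.'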

This paper was selected for a NeurIPS oral presentation, directly inspiring the later Dreamer series and turning 'world model' from a psychological concept into an engineering goal in deep learning.

III. World Models in the Eyes of Scholars

3.1 Yann LeCun: Don't Just Generate Videos, Understand Physics

Yann LeCun is French, a professor at New York University, and Chief AI Scientist at Meta.

He is one of the inventors of Convolutional Neural Networks (CNN) and was jointly awarded the 2018 Turing Award with Geoffrey Hinton and Yoshua Bengio; the trio is hailed as the 'Godfathers of Deep Learning.'

LeCun has consistently been critical of the current large language model path, believing that merely predicting the next word cannot produce true intelligence.

In 2022, in an article titled 'A Path Towards Autonomous Machine Intelligence,' he proposed that true intelligence requires a configurable predictive world model.

The goal is not generating text or images but understanding the laws of the physical world and predicting action consequences. He even criticized continuing to scale up large language models as 'nonsense,' arguing that the core of intelligence lies in learning the physical structure of the real world.

JEPA is the technical vehicle for this path. JEPA stands for Joint Embedding Predictive Architecture.

Rather than predicting the next frame in pixel space, JEPA predicts how the world state changes in an abstract representation space.

An analogy: video generation models are drawing the next picture; JEPA is 'feeling' what will happen next in the mind.
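The difference between 'drawing' and 'feeling' comes down to where the prediction error is measured. The sketch below contrasts the two objectives with untrained random weights. It is only an illustration of the loss placement, not Meta's JEPA code: the encoders, predictor, decoder, and dimensions are all hypothetical, and the target encoder is simply a frozen copy of the context encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME_DIM, EMBED_DIM = 256, 16

W_context = rng.normal(scale=0.1, size=(EMBED_DIM, FRAME_DIM))  # context encoder
W_target = W_context.copy()                                     # target encoder (frozen copy here)
W_pred = np.eye(EMBED_DIM)                                      # predictor in embedding space
W_decode = rng.normal(scale=0.1, size=(FRAME_DIM, EMBED_DIM))   # pixel decoder for the generative baseline

def pixel_loss(frame_t, frame_next):
    """Generative objective: the error is measured on every pixel of the next frame."""
    predicted_pixels = W_decode @ (W_context @ frame_t)
    return float(np.mean((predicted_pixels - frame_next) ** 2))

def latent_loss(frame_t, frame_next):
    """JEPA-style objective: the error is measured between embeddings, not pixels."""
    predicted_embedding = W_pred @ (W_context @ frame_t)
    target_embedding = W_target @ frame_next
    return float(np.mean((predicted_embedding - target_embedding) ** 2))

frame_t = rng.normal(size=FRAME_DIM)
noise = rng.normal(scale=0.01, size=FRAME_DIM)  # unpredictable pixel-level detail
frame_next = frame_t + noise
```

Because `latent_loss` compares 16-dimensional embeddings rather than 256 pixels, the model is free to ignore detail that the encoder discards, which is the efficiency argument made for this path.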

The 2023 I-JEPA, 2024 V-JEPA, 2025 LeJEPA, and 2026 LeWorldModel form a continuously evolving system.

LeCun also draws on the 'System 1 / System 2' distinction from cognitive psychology: System 1 is intuitive, fast reaction; System 2 invokes the world model for deliberate reasoning and planning.
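A toy contrast between the two modes, assuming a hand-written point-mass model in place of a learned world model. The function names, the friction constant, and the cost function are illustrative assumptions, not anything taken from LeCun's paper.

```python
import itertools

def world_model(position: float, velocity: float, push: float):
    """Hypothetical dynamics: a point mass with friction moving on a line."""
    velocity = 0.8 * velocity + push
    return position + velocity, velocity

def system1(position: float, target: float) -> float:
    """Fast reflex: push toward the target, with no lookahead."""
    return 1.0 if target > position else -1.0

def system2(position, velocity, target, horizon=4, pushes=(-1.0, 0.0, 1.0)):
    """Deliberate planning: imagine every push sequence, keep the best first push."""
    best_push, best_cost = 0.0, float("inf")
    for plan in itertools.product(pushes, repeat=horizon):
        p, v = position, velocity
        for push in plan:
            p, v = world_model(p, v, push)
        cost = abs(p - target) + abs(v)  # end near the target and nearly at rest
        if cost < best_cost:
            best_push, best_cost = plan[0], cost
    return best_push
```

With the mass at position 4.0, already moving at velocity 2.0 toward a target at 5.0, `system1(4.0, 5.0)` still pushes forward, while `system2(4.0, 2.0, 5.0)` brakes, because the imagined rollouts show that pushing forward overshoots.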

Latest theoretical work even proves that under certain conditions, the representations learned by JEPA can establish a linear correspondence with real physical variables, meaning the model mathematically learns physical structure, not just a useful encoding.

3.2 Fei-Fei Li: Classifying World Models Using an 'Action-Observation' Loop

Fei-Fei Li is a professor of computer science at Stanford University, the primary creator of the ImageNet dataset. ImageNet catalyzed the deep learning revolution in 2012, earning her the title 'Godmother of AI.'

She previously served as Chief Scientist of AI at Google Cloud and founded World Labs in 2023 to focus on spatial intelligence and 3D world models. In 2024, she received multiple honors for promoting AI democratization and applications in areas such as healthcare, and she is one of the most influential Chinese scientists in AI today.

In June 2026, Fei-Fei Li and the World Labs team published a widely circulated article attempting to establish a taxonomy for the chaotic world model concept.

She referenced POMDP (Partially Observable Markov Decision Process) from reinforcement learning.

This concept sounds complex but describes a simple cycle: the agent takes an action, the action changes the world state, the agent obtains an observation, then takes the next action based on the observation.

She pointed out that all systems called world models are essentially projections of this cycle in different directions, each outputting a fragment of the cycle.

Based on this, she classified world models into three categories.

The first is Renderers, outputting observations—pixels for the human eye. Typical examples are video generation models and Google Genie 3, optimizing for visual fidelity.

The second is Simulators, outputting states—faithful world representations at geometric, physical, and dynamic levels. Typical examples are NVIDIA Omniverse and World Labs' Marble, optimizing for structural accuracy.

The third is Planners, outputting actions—answering 'what to do next' given observations and goals. Typical examples are VLA and World Action Models.

Li believes these three capabilities rely on the same underlying knowledge, and the ultimate trend is towards a unified world model.
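The loop and the three roles can be made concrete with a deliberately tiny example: an agent walking along a ten-cell track. Everything here is a hypothetical sketch, with hand-written functions standing in for learned models, and in this toy the rendering happens to reveal the whole state, whereas a real POMDP observation is only partial.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    position: int  # true position on a 1-D track of 10 cells
    goal: int

def simulator(state: State, action: int) -> State:
    """Simulator: (state, action) -> next state."""
    return State(position=state.position + action, goal=state.goal)

def renderer(state: State) -> str:
    """Renderer: state -> observation (a text 'image' of the track)."""
    cells = ["."] * 10
    cells[state.goal] = "G"
    cells[state.position] = "A"  # the agent hides the goal when standing on it
    return "".join(cells)

def planner(observation: str) -> int:
    """Planner: observation -> action. It sees only the rendering, never the State."""
    agent, goal = observation.find("A"), observation.find("G")
    if goal == -1:  # goal no longer visible: the agent is standing on it
        return 0
    return 1 if goal > agent else -1

def run_loop(state: State, max_steps: int = 20) -> list[str]:
    """Action -> state -> observation -> action, repeated until the goal is reached."""
    observations = [renderer(state)]
    for _ in range(max_steps):
        action = planner(observations[-1])
        if action == 0:
            break
        state = simulator(state, action)
        observations.append(renderer(state))
    return observations
```

Calling `run_loop(State(position=2, goal=6))` yields five observations, from `..A...G...` to `......A...`. Each function outputs a different fragment of the same cycle, which is the point of the taxonomy.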

3.3 Tsinghua FIB-Lab: Only Two Types of World Models—Understanding the World or Predicting the Future

Tsinghua University FIB-Lab is a team with a long research record in AGI, embodied intelligence, and robot learning. FIB is typically understood as a 'Future Intelligence and Brain' related lab, affiliated with the Institute for AI Industry Research, Tsinghua University.

The team has published numerous surveys and papers on world models and robotics and is a significant force in Chinese research in this direction.

In 2026, they released the survey 'Understanding World or Predicting Future: A Comprehensive Survey of World Models,' dividing the field in another way.

They classified the core functions of world models into two broad categories: Understanding the World and Predicting the Future.

Understanding the World emphasizes constructing implicit representations of the external environment to support decision-making, represented by the Dreamer series and world knowledge based on large language models.

Predicting the Future emphasizes explicitly generating future states, typified by video or 3D environment generation models like Sora, Genie 3, Cosmos.

This classification's advantage is being closer to engineering practice: the former serves reinforcement learning and decision-making, the latter serves generation and simulation.

3.4 Peking University OpenWorldLib: Making a Standardized Toolbox for World Models

In April 2026, Peking University, together with institutions such as Kuaishou, released OpenWorldLib. Peking University is a Chinese powerhouse in foundational AI research, housing institutions like the Key Laboratory of Machine Perception and Intelligence (MoE); Kuaishou is a Chinese short-video giant that has invested heavily in large models and multimodal generation in recent years.

Their joint release of OpenWorldLib shows both academia and industry are realizing world models need unified standards and reusable components.

OpenWorldLib first attempted a standardized definition for world models: a model or framework with perception as its core, possessing interactive and long-term memory capabilities, used for understanding and predicting the complex world.

They criticized equating world models simply with 'predicting the next frame' as too narrow, believing true world models must embody genuine understanding of physical laws.

OpenWorldLib splits world models into five core modules: Operator, Synthesis, Reasoning, Representation, Memory, coordinated by a pipeline module.

This framework resembles a toolbox, aiming to let different research teams combine modules like building blocks.

IV. World Models in the Eyes of Big Tech

4.1 OpenAI: Sora as a 'World Simulator'

OpenAI is currently one of the most influential AI companies globally. It is famous for the GPT series of large language models and ChatGPT. After releasing Sora in 2024, it again sparked global attention on video generation and world simulation.

In February 2024, OpenAI released Sora's technical report titled 'Video Generation Models as World Simulators,' directly positioning video generation models as world simulators. Sora doesn't rely on explicit 3D modeling or physics engines but trains generative models on massive video data, enabling emergent abilities like 3D consistency, long-term coherence, object permanence, and simple world interactions.

OpenAI believes large-scale scaling of video generation models is a promising path to building a general simulator of the physical world.

But Sora's limitations are evident: inability to accurately simulate basic physics like glass breaking, inconsistencies in long samples, objects appearing uncontrollably. So it's more a directional statement than a mature definition.

4.2 Google DeepMind: Genie 3 as a Real-Time, Interactive General World Model

Google acquired the UK AI company DeepMind in 2014 and merged it with its Google Brain team in 2023 to form Google DeepMind; Demis Hassabis is the co-founder and CEO.

DeepMind developed milestone systems like AlphaGo and AlphaFold and is at the global frontier of AI research. Demis Hassabis himself is a computer scientist, neuroscientist, and game designer who has long focused on AGI.

In August 2025, Google DeepMind released Genie 3, officially defined as 'the first real-time, interactive, photorealistic world model.'

It can generate explorable 3D environments from simple text descriptions, runs at 20-24 fps, supports character control, promptable world events, and interactive memory up to one minute. Genie 3 generates frames autoregressively, anchors the real world using Google Maps street view data, and is positioned as a key milestone towards AGI.

4.3 NVIDIA: Cosmos as the 'World Foundation Model' for Physical AI

NVIDIA was founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, with Jensen Huang long serving as CEO. The company started with graphics chips (GPUs) and became the core supplier of global AI infrastructure over the past decade due to exploding demand for AI training compute.

Jensen Huang frequently proposes judgments like 'Physical AI' and 'The next wave of AI is robotics.' NVIDIA also continuously launches software/hardware platforms for robotics, autonomous driving, and simulation.

In January 2025, NVIDIA released Cosmos, positioned as a 'World Foundation Model Platform.' It's not a single model but a series of physics-aware video models that can predict and generate future states of virtual environments, divided into Nano, Super, Ultra tiers, trained on 20 million hours of real-world data.

Cosmos's ambition is to become the underlying infrastructure for Physical AI, serving robotics, autonomous driving, industrial simulation, etc.

NVIDIA also open-sourced it, allowing commercial use.

4.4 Domestic Giants: Not Calling It World Models, But Doing World Models

Chinese enterprises rarely offer philosophical definitions in public materials; instead they go straight to products and scenarios.

Alibaba's three products cover language world simulation, virtual world generation, and robot physical world respectively;

Tencent's HY-World 2.0 focuses on 3D editable worlds; ByteDance's Seed world model aims to reach Genie 3's SOTA level by year-end;

Huawei's Pangu Model Intelligent Driving Edition emphasizes physical law learning and closed-loop simulation; Baidu Apollo ADFM integrates world model capabilities into the autonomous driving large model; Xiaomi's OneVL attempts to unify VLA with world models.

Among automakers, there are Nio's NWM, Li Auto's reconstruction-plus-generation world model, Xpeng's X-World, Geely's WAM, BYD's pre-research, and Great Wall's VLA plus world model; the core uses are end-to-end intelligent driving training and long-tail scenario generation.

V. Three Technical Paths: Drawing, Mental Calculation, Building Blocks

From an engineering perspective, current world models roughly have three main technical paths, understandable through three metaphors.

The first is the 'Drawing' path, i.e., generative video models. Sora, Genie 3, Cosmos, Kuaishou's Kling, and Pika belong here. The core ability is generating future frames in pixel space. The advantages are strong visual realism, a low data threshold, and results that are easy to understand. The disadvantage is weak physical consistency: watch long enough and you see object distortion, gravity failure, and timeline confusion.

The second is the 'Mental Calculation' path, represented by LeCun's JEPA and Ha & Schmidhuber's RNN world model. The core idea is to predict abstract representations rather than pixels. The advantages are high efficiency and more stable learning of physical structure; the disadvantages are poor interpretability of the representation space and long engineering cycles. It is more like an athlete's intuition: there is no need to mentally replay the action frame by frame to anticipate where the ball will land.

The third is the 'Building Blocks' path, represented by NVIDIA Omniverse, World Labs Marble, and Tencent HY-World. The core idea is to directly generate 3D environments with geometric, physical, and dynamic properties. The advantage is that the result is precise, controllable, editable, and verifiable; the disadvantages are scarce data, high computational cost, and limited generalization. It is more like an engineer's CAD software: precisely measurable and repeatedly adjustable, but distant from the natural world.

The three paths currently have their own territories, but boundaries are blurring. Video generation models are adding physical constraints; 3D simulators are introducing generative capabilities; JEPA architectures are merging with VLA into WAM. The unified world model predicted by Fei-Fei Li is precisely the result of their fusion.

VI. World Action Model: From 'Seeing the World' to 'Taking Action'

In May 2026, the Fudan OpenMOSS team jointly with multiple institutions released a WAM survey, formally proposing the World Action Models paradigm.

Fudan OpenMOSS is one of the earliest teams to promote the open-source large model ecosystem in China; the MOSS series models are well recognized in the Chinese community.

WAM's core definition: Future state prediction and action generation must be jointly learned within the same policy, not training a VLA first then attaching a world model as an auxiliary.

A plain-language comparison: a VLA 'sees the scene, understands the instruction, then takes action'; a world model 'knows the current state and action, and can imagine the next frame'; a WAM 'sees the scene, understands the instruction, and simultaneously imagines the next frame and takes action.'

These three combined are the true 'unity of knowledge and action' ability robots need.

WAM is divided into Cascaded and Joint architectures.

The Cascaded design generates future frames first and then decodes actions from them; it is easier to engineer but has higher latency, and errors propagate easily. The Joint design uses a single model to output the future and the action simultaneously; it is theoretically more robust, but its training objective is harder to design.
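The structural difference between the two designs can be shown with a minimal sketch. The weights below are random and untrained, the dimensions are arbitrary, and the function names are hypothetical; the sketch only shows how data flows in each design, not how any published WAM is built.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 6, 2

# Hypothetical weights; in practice each would be learned from data.
W_future = rng.normal(scale=0.3, size=(STATE_DIM, STATE_DIM))        # state -> predicted future
W_inverse = rng.normal(scale=0.3, size=(ACTION_DIM, 2 * STATE_DIM))  # (state, future) -> action
W_joint = rng.normal(scale=0.3, size=(STATE_DIM + ACTION_DIM, STATE_DIM))

def cascaded_wam(state):
    """Two stages: imagine the future first, then decode an action from it."""
    future = np.tanh(W_future @ state)
    action = np.tanh(W_inverse @ np.concatenate([state, future]))
    return future, action

def joint_wam(state):
    """One forward pass whose output is split into a future and an action."""
    out = np.tanh(W_joint @ state)
    return out[:STATE_DIM], out[STATE_DIM:]

state = rng.normal(size=STATE_DIM)
future_c, action_c = cascaded_wam(state)
future_j, action_j = joint_wam(state)
```

In `cascaded_wam` the action is computed from the predicted future, so any error in the first stage flows into the second, and the two stages run one after the other. In `joint_wam` both outputs come from the same pass, which removes that dependency but means one training objective has to balance both targets.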

NVIDIA's Jim Fan even asserted at the 2026 Sequoia AI Ascent conference, 'VLA is dead, world action models are the future.' Jim Fan is a senior research scientist at NVIDIA, head of the GEAR team, researching robotics, simulation, embodied intelligence.

Though controversial, this statement highlights how hot the field has become.

VII. Industry Framework: A Three-Tier Structure Has Formed

The world model industry chain is transitioning from papers and demos to layered infrastructure. Imagine building a house: some mine and smelt steel, some produce prefabricated panels, some build residences, malls, factories on top.

The upstream is the Basic Support Layer, including high-precision data collection, computing services, and sensor hardware.

Data collection involves HD maps, spatial scanning, video capture, and teleoperation; computing services center on GPUs and cloud servers; sensor hardware includes LiDAR, cameras, and IMUs. NVIDIA, with its GPUs, holds a quietly dominant position here; almost all world model training relies on its computing power.

Cost is the core pain point: training trillion-parameter world models requires thousands of GPUs, and a single training run can cost millions of dollars.

The midstream is the Technology Platform Layer, divided into general-purpose platforms and vertical platforms.

General-purpose platforms provide cross-industry general capabilities, represented by NVIDIA Omniverse, SenseTime OpenDIL, Huawei Pangu, and the Alibaba Tongyi series. Vertical platforms focus on specific industries, such as autonomous driving world models, architectural world models, and embodied intelligence world models. Platform companies are gaining dominance through ecosystem integration, and it is estimated that by 2030 they may account for over 50% of the industry chain's market share.

The downstream is the Scenario Application Layer, covering autonomous driving, embodied intelligence, smart construction, gaming/entertainment, spatial services, medical simulation, climate prediction, etc.

Automotive, electronics, and healthcare are believed to contribute over 60% of current industry revenue. Autonomous driving is the most mature application scenario; almost all mainstream automakers have incorporated world models into core R&D processes. Embodied intelligence is the most promising emerging direction; over 60% of industrial robots use world models for assisted training.

VIII. Why Lack of Conceptual Unity is Actually Good

The chaos surrounding the world model concept often makes outsiders think it's a hyped-up trend.

But from an industrial history perspective, lack of conceptual unity is often the norm in the early stages of a technological revolution.

Early cloud computing had its IaaS, PaaS, SaaS debates; early big data had its Hadoop, NoSQL, data warehouse debates; early AI even had symbolism, connectionism, and behaviorism debates. Disagreement over naming reflects different groups approaching the same grand problem from different angles.

The current disagreement over world models is essentially a debate over what form the 'world' should be compressed into.

Video generation folks see the world as pixel sequences; 3D engine folks see it as geometry and physics; autonomous driving folks see it as traffic rules and driving behaviors; robotics folks see it as action consequences.

Each compression method corresponds to different data, compute, and application scenarios. In the industry's early stage, such disagreement is necessary, allowing parallel exploration of different paths.

But beneath the disagreement, goals have converged.

Whether it's LeCun's JEPA, Fei-Fei Li's POMDP loop, Sora's video generation, Genie 3's 3D interaction, or various domestic giants' products, all ultimately point to the same capability: endowing machines with an internal world that is deducible, reviewable, and generalizable, enabling them to act safer, more efficiently, and more generally in the real world.

Language models gave machines the ability to talk about the world; world models attempt to give them the ability to understand, imagine, reason, and interact with the world.

The concept will unify, but that will happen after the landscape settles. Until then, the chaos in naming is precisely the sign that world models have entered the main battlefield.

This article is from the WeChat public account 'IT桔子' (ID: itjuzi521), author: Judy

Related Questions

Q: What is the core idea behind a 'World Model' in AI, according to the article?

A: The core idea is to enable machines to have an internal 'sandbox' or model of the world where they can predict what will happen next and simulate different actions and their consequences without actually acting in the real world. This allows for trial-and-error learning and planning before real-world execution.

Q: How does the article categorize different types of World Models based on the work of Fei-Fei Li?

A: Based on Fei-Fei Li's framework, the article categorizes World Models into three types: 1) Renderer (outputs observations/pixels, like video generation models), 2) Simulator (outputs states/accurate world representations, like 3D simulation platforms), and 3) Planner (outputs actions, answering 'what to do next' given observations and a goal).

Q: What are the three main technical approaches to building World Models mentioned in the article?

A: The three main technical approaches are: 1) The 'Drawing' route (Generative video models like Sora, focusing on pixel-space generation), 2) The 'Mental Calculation' route (Models like JEPA that predict abstract representations, not pixels), and 3) The 'Building Blocks' route (Systems like NVIDIA Omniverse that generate precise 3D environments with geometry and physics).

Q: What is a World Action Model (WAM) and how does it differ from a Vision-Language-Action (VLA) model?

A: A World Action Model (WAM) integrates future state prediction and action generation within a single policy. Unlike a VLA model, which 'sees a scene, understands an instruction, and then produces an action,' a WAM 'sees a scene, understands an instruction, simultaneously imagines the next frame, *and* produces an action.' It aims for a more unified 'knowledge-action' capability essential for robots.

Q: Why does the article suggest that the current lack of a unified definition for 'World Model' is actually a good sign for the field?

A: The article suggests the lack of a unified definition is a sign of an early-stage technological revolution. Different groups (video generation, 3D simulation, autonomous driving, robotics) are approaching the same grand problem from different angles, focusing on different data and application needs. This parallel exploration allows for necessary experimentation. The underlying goal, enabling machines to have a predictable, simulatable internal world, is already converging despite the surface-level naming confusion.

Related Reads

"King of Pump Calls" Arthur Hayes Strikes Again, This Time Targeting Deribit

On June 29, BitMEX co-founder Arthur Hayes purchased approximately 6.16 million SYN tokens via OTC platform Flowdesk for around $2.2 million. Hayes subsequently declared on X that SYN represents one of the most asymmetric investments he has seen since HYPE, stating it's time for an options DEX to challenge the dominant platform Deribit, and identifying Hypercall as that challenger. SYN's price surged over 40% following his comments, with a tenfold increase in June 2026 alone, bringing its FDV to roughly $110 million. The article details Synapse Protocol's evolution from a cross-chain messaging and liquidity network into the chain-based options trading protocol Hypercall. Hypercall, built on the Hyperliquid ecosystem's HyperEVM, aims to be a universal options exchange supporting any asset size with capped loss (limited to premium paid) and no forced liquidations. Deribit, established in 2016, remains the centralized leader in crypto options with an estimated 85% market share in BTC and ETH options and $3.588 billion in assets. Its strengths include deep liquidity and professional tools, but it faces criticisms over custody risk, KYC requirements, and regulatory uncertainty. The analysis positions Hypercall not as an immediate replacement for Deribit's entrenched network effects, but as a potential complementary and differentiated competitor, particularly for DeFi-native assets and new asset classes like RWA. The article concludes by noting Hayes's recent mixed "call" record, including fully exiting and later re-buying HYPE, and the controversial price target for CARDS from his family office Maelstrom, which was followed by a significant price drop.

marsbit5m ago

"King of Pump Calls" Arthur Hayes Strikes Again, This Time Targeting Deribit

marsbit5m ago



