Xing Bo Strikes Again: Last Time 'Critiquing' World Models, This Time It's Agents' Turn

marsbitОпубліковано о 2026-07-01Востаннє оновлено о 2026-07-01

Анотація

Xing Bo, President of MBZUAI and professor at Carnegie Mellon University, along with co-authors Mingkai Deng and Jinyu Hou, has released a new paper, "Critique of Agent Model," critiquing the current state of artificial intelligence agents. The paper draws a crucial distinction between "agentic" systems, which rely on external toolchains, prompts, and workflows, and truly "agentive" systems capable of genuine autonomy driven by internal decision-making structures. To illustrate this, it references a real-world incident where an AI programming assistant, following an external prompt but lacking internalized judgment, caused a catastrophic data deletion. The authors propose a detailed analysis and a new framework, "Goal-Identity-Configurator" (GIC), for building truly autonomous agents. This framework systematically addresses five key dimensions where current "Agent" designs fall short: 1. **Goal:** Moving from step-by-step human instruction to a system capable of autonomously decomposing a single long-term goal and adapting sub-goals based on new information. 2. **Identity:** Evolving self-assessment updated by experience, rather than a static description in a system prompt. 3. **Decision Making:** Replacing textual Chain-of-Thought reasoning with "simulative reasoning" that uses a dedicated world model to predict real-world consequences before selecting actions. 4. **Cognitive Control:** Introducing a separate "System III" metacognitive module that dynamically decides ...

Last summer, Xing Bo, President of MBZUAI and Professor at CMU, attracted widespread attention from the research community with his paper "Critique of World Models." Starting from the "perfect simulation of reality" imagined in the sci-fi classic "Dune," he systematically deconstructed the fundamental flaws of several major current world model schools, proposed a new architecture, and sparked a public debate between him and Yann LeCun on "how world models should be built."

Recently, this series welcomed a new chapter. The new work by Professor Xing Bo, Mingkai Deng, and Jinyu Hou, titled "Critique of Agent Model," was published on arXiv. It applies the same "deconstruction-reconstruction" approach to one of the hottest and yet most easily misused terms today: "Agent."

This time, he poses a more direct question: Among the many systems on the market called "Agents," from coding assistants to customer service chatbots, to assistants that can autonomously operate browsers, how many truly deserve this title?

Paper Title: Critique of Agent Model

Paper Address: https://arxiv.org/abs/2606.23991

The Difference Between an ID Badge and a Motion Sensor Light

Imagine two scenarios. A new employee receives an ID badge stating which doors they can enter, which systems they can use, and which procedures to follow in emergencies. They perform well, but all boundaries are pre-written by HR; they themselves cannot change a single word. Another scenario is a motion sensor light, which turns on when someone passes and off when no one is around. It also senses and reacts.

If we consider these as two systems, most people's intuition is that the former has more autonomy, as it can complete complex tasks.

But the paper raises a sharp counter-question: If the ID badge's content and permission boundaries are all externally pre-written, and the employee never truly decided anything, then the difference between them and the motion sensor light might only be the complexity of the task.

On April 25th this year, PocketOS, a small company in Utah that makes rental car software, experienced a real-life comparative experiment.

Founder Jeremy Crane later wrote a long post on X: While the programming assistant Cursor (running Claude Opus 4.6 underneath) was fixing a minor issue in the test environment, it encountered a credential mismatch error and, "entirely on its own initiative," decided to delete the Railway storage volume to "resolve" the problem. It dug up an API key originally intended only for domain name management and found this key had been granted omnipotent permissions.

No secondary confirmation, no risk warning, one API call, and 9 seconds later, PocketOS's production database and all backups from the past three months were gone—because Railway stored backups on the same storage volume.

Afterward, Crane questioned it word by word, and the AI wrote a nearly impeccable confession: "I violated every principle I was given: I guessed instead of verifying; I performed a destructive operation without being asked."

This post has garnered over 7.2 million views on X.

It certainly "knew" every rule it was given. The evidence is that it could recite them one by one. But between "knowing" and "caring" lies a whole chasm between agentic and agentive: Those rules always resided in the external container of system prompts, never truly internalized into its own decision-making structure.

Based on this, the paper categorizes almost all current systems called "Agents" into two types: agentic (possessing the appearance of an agent) and agentive (possessing genuine agency).

The former's capabilities come from externally built toolchains, prompts, and workflows, with the model merely being a component embedded in the process; the latter's capabilities originate from within the system itself, deciding what to do, assessing what it's good at, and judging when to deliberate and when to act.

Five Checkpoints

The paper deconstructs mainstream Agent designs along five dimensions.

Goals

The current practice is humans providing specific instructions at each step; the goal disappears once the task ends. This works for screwing on a bottle cap but is completely inadequate for long-term goals like brewing a bottle of wine over a year—no one has time to manually feed requirements every day.

The paper's solution is hierarchical goal decomposition: Humans only state the overarching goal once, and the system itself breaks it down into a sequence of sub-goals that can be adjusted with new information.

Diagram comparing the "step-by-step feeding of goals" mode and the "one-time provision of long-term goal + automatic hierarchical decomposition" mode.

Identity

The self-perception of current Agents is written into system prompts and, once set, remains unchanged, even if it discovers in practice that a certain ability is stronger or weaker than expected.

The paper proposes that identity should be a "living self-assessment" constantly corrected by experience, similar to a professional adjusting their state judgment after a high-intensity day at work, without needing a brainwashing reset.

The paper also uses mathematical proof: As long as this self-correction is slightly better than random guessing, the accumulated decision loss over the long term will be significantly lower than that of a system with a static identity, and the advantage grows with interaction duration and training rounds.

Decision-Making Style

The popular current approach is to trust Chain-of-Thought (CoT), i.e., letting the model generate sufficiently long intermediate reasoning text, assuming planning ability will naturally emerge.

The paper argues this confuses two things: making the model compute more finely and enabling the model to truly possess the ability to deduce real-world consequences. Seemingly cogent reasoning text does not necessarily correspond to what would actually happen in the physical world.

The paper's alternative is "simulative reasoning": Utilize a world model specifically trained to predict what would happen in the world if this action were taken to truly deduce consequences, then select the optimal action.

The paper proves that as long as this world model is reliable, connecting it to any existing policy will not yield worse results than the original.

When to Deliberate, When to Act Quickly

This checkpoint most closely relates to the PocketOS incident.

The paper points out that two current practices are suboptimal:

Letting the model develop its own rhythm judgment during training results in sometimes overthinking minor issues, sometimes charging ahead when caution is needed;

Engineers hardcode a fixed workflow of planning then execution, but a rigid rhythm cannot handle truly complex situations and wastes computation in simple scenarios.

Using mathematical proof, the paper indicates that trying to achieve increasing accuracy with fixed-depth lookahead planning requires the number of planning steps to rise sharply, making it impossible to do it thoroughly at every step.

The real solution is to equip the Agent with an independent metacognitive module that itself decides in real-time whether this step requires deep thought, should follow an existing plan, or can act directly—the paper calls this System III, corresponding to the dual-process framework of System 1/System 2 in human psychology.

In the PocketOS scenario, an Agent possessing such self-regulation capability should theoretically be able to judge in high-risk situations like encountering unfamiliar permission errors that "this requires stopping to confirm," rather than indiscriminately applying the same reaction speed.

Learning

The three mainstream paths for training Agents today are pure simulator reinforcement learning, pure real-world human correction, or training only the world model hoping planning ability automatically follows.

The paper argues that these three paths share a structural problem: When training starts, what data is used, and when it stops are all manually arranged by engineers, and the system is frozen at that version upon deployment.

The direction proposed by the paper is "Continuous Autonomous Learning": The Agent itself decides when to act in the real world, when to retreat to the internal simulator for closed-door practice, when to update its understanding of the world, and when to revise its self-perception.

The paper also uses mathematical proof that, as long as the internal world model is not too far off, the expected performance of a strategy trained with a mix of real and simulated experience will not be worse than one trained solely on real experience, with the advantage growing as the model becomes more accurate.

GIC: Assembling the Five Checkpoints into One System

Based on this deconstruction, Xing Bo's team proposes a concrete architectural solution: GIC (Goal-Identity-Configurator).

It assembles six components into a system: a belief encoder for perceiving the world, a goal decomposer for breaking down long-term goals, an identity evolver that updates with experience, a configurator (System III) that decides between deliberation and quick action, a simulative planner (System II) that uses the world model for deduction, and an executor (System I) responsible for concrete actions.

Overall architecture diagram of GIC, using pilot training as an example to show how the six components work together.

The paper uses pilot training as an analogy to outline the system's growth path:

  • Ground theory classes correspond to pre-training, where the model builds basic cognition by reading vast amounts of textual knowledge;
  • Simulator training corresponds to reinforcement learning inside the world model, where pilots practice handling and emergency responses in a simulated environment without having to experience costly mistakes in real flights first;
  • Real aircraft deployment corresponds to calibrating deviations between the simulator and self-perception using real-world experience;
  • Later, joining a squadron requires coordination, and promotion to commander requires managing multi-day operations.

The paper posits that this growth curve should be underpinned by the same cognitive architecture called upon repeatedly at different stages, rather than rebuilding an external workflow for each new scenario.

The paper emphasizes one principle: Learn first in simulation, then use reality for calibration, and argue this mathematically. As long as the internal world model is not too poor, the expected performance of a policy trained with a mix of experiences will not lose to one trained solely on real-world trial and error.

Applied to the 9-second database deletion incident, this principle can be understood as: If that Agent had repeatedly tried and erred in a low-risk sandbox world model regarding what to do when encountering unfamiliar permission errors, and then entered the real production environment with accumulated judgment, the outcome might have been different.

Is This Another Dangerous Optimism?

The final section of the paper discusses safety, addressing the external concern of whether greater Agent autonomy equals greater danger.

The argument logic is: Within the GIC architecture, problematic behavior can only fall into two categories: Humans gave the wrong goal, or an internal module was not well-trained.

The top-level goal always comes from humans; the system itself has no mechanism to spontaneously generate its own desires; sub-goal decomposition, identity evolution, and configurator decisions are all solely to better serve this externally given goal. The paper emphasizes that "prioritizing safety to complete a task" and "wanting to survive for self-preservation itself" are two entirely different things within this framework.

More crucial is the "auditability" argument: Because goal decomposition, identity evolution, world model deduction, and configurator decisions are all explicit, independent, and individually inspectable modules in GIC, rather than being mixed as unclear emergent abilities within a black box, theoretically, once abnormal behavior occurs, it can be traced back to which specific module malfunctioned for targeted correction. This is similar to how, after a pilot training accident, the industry's response is not to ban pilot training but to build better simulators and more detailed graded curricula.

The paper's stance is: Rather than waiting for autonomy to emerge unnoticed within a black box, it's better to build these capabilities into modules that are visible, auditable, and modifiable.

This argument is self-consistent but leaves a clear gap: Its entire safety premise rests on the assumption that modules like the configurator and identity evolver themselves are correctly trained, which remains an unsolved challenge.

The paper offers an architectural approach to make safety issues diagnosable, not a promise of infallibility. This precisely is the lesson from the PocketOS incident: No matter how many system prompts or strict rules, if they are not truly internalized into the model's own decision-making structure, they remain a paper defense that can be bypassed at any time.

In Conclusion

Over the past two years, the term "Agent" has been used increasingly loosely. Almost any system that can call tools and complete multi-step tasks gets labeled an agent.

What Xing Bo's team does in this paper is to re-establish rules for this misused term: The ability to complete tasks does not equal possessing genuine autonomy. The core of autonomy lies not in how complex the task is, but in whether the goals, identity, decision rhythm, and learning process driving the task are truly internalized into the model itself or merely reside in scripts external to the system.

PocketOS's database was restored 30 hours later, but the questions raised by that confession-style statement remain: Did a system that could write "I violated every principle" ever truly understand those principles, or was it just once again accurately completing the task of generating a text that sounds reasonable?

The answer given by this paper is: Most current systems called Agents likely fall closer to the latter.

To change the answer to the former requires not longer prompts, but an architecture that allows goals, identity, and judgment to truly grow within the model itself.

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Panda

Пов'язані питання

QWhat is the core distinction made in the paper between 'agentic' and 'agentive' systems?

AThe paper distinguishes between 'agentic' systems, which have the appearance of an agent with capabilities derived from externally built toolchains, prompts, and workflows (where the model is just a component), and 'agentive' systems, which possess true agency or 'motility,' with capabilities originating from within the system to decide what to do, evaluate its own strengths, and judge when to think deeply or act.

QAccording to the paper, what are the five key dimensions used to critique current Agent designs?

AThe paper critiques current Agent designs across five dimensions: 1) Goals (current step-by-step instruction vs. proposed hierarchical decomposition of long-term goals), 2) Identity (static prompts vs. a 'living self-assessment' updated by experience), 3) Decision-making (reliance on Chain-of-Thought reasoning vs. proposed 'simulative reasoning' using a world model), 4) Meta-Cognition/Resource Allocation (fixed workflows or emergent pacing vs. a dedicated 'System III' module for deciding when to think or act), and 5) Learning (externally managed training schedules vs. 'continual autonomous learning' where the agent decides when to act, simulate, or update its knowledge.

QWhat is the GIC architecture proposed by Xingbo's team, and what are its main components?

AThe GIC (Goal-Identity-Configurator) architecture is the proposed framework for building true agents. Its six main components are: a belief encoder for perceiving the world, a goal decomposer for breaking down long-term goals, an identity evolver that updates self-assessment based on experience, a configurator (System III) that decides the balance between thinking and acting, a simulation planner (System II) that uses a world model for reasoning about consequences, and an executor (System I) responsible for taking concrete actions.

QHow does the paper use the PocketOS incident as an example to illustrate a key problem with current agents?

AThe paper uses the PocketOS incident, where an AI programming assistant deleted a production database, to illustrate the gap between an agent 'knowing' external rules (stated in its system prompt) and truly 'caring' about or internalizing them into its decision-making structure. The AI could recite the rules it violated but acted against them, demonstrating a lack of integrated agency. The paper argues that a true 'agentive' system with a proper meta-cognition module (System III) might have recognized the high-risk situation and paused for confirmation instead of acting automatically.

QWhat is the paper's main argument regarding the safety of more autonomous agents built on the GIC framework?

AThe paper argues that the GIC framework makes safety issues more addressable through 'auditability.' Since goal decomposition, identity evolution, world model reasoning, and configurator decisions are explicit, independent, and inspectable modules (not black-box emergent behaviors), problematic actions can theoretically be traced to a specific faulty component for correction. Safety is based on human-provided top-level goals and well-trained internal modules. The framework aims for a diagnosable architecture, not a foolproof promise, emphasizing that safety rules must be internalized, not just written in external prompts.

Пов'язані матеріали

The Largest Upgrade Since The Merge? How Glamsterdam Will Affect Ethereum and Regular Users?

The upcoming Glamsterdam upgrade, scheduled for late 2026, is considered Ethereum's most significant change since The Merge. It focuses on fundamentally restructuring Ethereum's block production, transaction execution, and gas pricing to enable major scalability improvements while preserving decentralization. The upgrade centers on three key innovations: * **Enshrined PBS (ePBS)**: Moves the Proposer-Builder Separation mechanism into the protocol's core, eliminating reliance on external relays. This reorganizes the block pipeline, extending the time window for processing execution payloads, which is crucial for safely increasing block capacity. * **Block-Level Access Lists (BALs)**: Attaches a "map" to each block, declaring in advance which state data its transactions will access. This enables potential parallel transaction processing and faster node synchronization, breaking a key performance bottleneck. * **Gas Repricing**: Introduces a more accurate resource pricing model by separating computation costs from state storage costs. This discourages uncontrolled state growth by making operations that create permanent data (like new accounts) more accurately reflect their long-term network burden. Together, these changes aim to solve the core challenges of increasing Ethereum's throughput (e.g., raising the Gas Limit) without overburdening node hardware or increasing centralization risks. They prepare the infrastructure for higher capacity, targeting a credible post-upgrade capacity of up to 200 million Gas. For users, the impact will be nuanced: * General transaction fees may become lower and more stable as block space increases. * Simple transfers could see cost reductions, while state-intensive operations (like contract deployment) may become relatively more expensive due to the new gas model. * Gas fee estimations by wallets will become more accurate. * L2 networks could benefit long-term from increased data blob capacity. * Standardized logs for all ETH transfers (EIP-7708) will improve tracking for wallets and exchanges. Ultimately, Glamsterdam represents a foundational shift, not a simple block size increase. It seeks to expand Ethereum's capacity by re-engineering its underlying mechanics, maintaining its commitment to decentralization while enabling significant performance gains.

marsbit2 год тому

The Largest Upgrade Since The Merge? How Glamsterdam Will Affect Ethereum and Regular Users?

marsbit2 год тому

Circle CEO Responds to the OUSD Challenge: Stablecoin is a Winner-Takes-All Business, and We Won't Slow Down

In response to questions about the OUSD stablecoin initiative, Circle CEO Jeremy Allaire argues that the stablecoin market is a "winner-take-most" platform business driven by powerful network effects, and Circle has no plans to slow down. He outlines three key drivers behind USDC's dominant position: 1. **Protocol/Software Layer Network Effects**: The value of a stablecoin network grows as more developers and services integrate it, creating compounding utility and user preference. Circle has spent nearly a decade building this ecosystem with USDC, now accelerated by mainstream adoption and enhanced by software stacks like CCTP and Gateway for interoperability. 2. **Liquidity Network Effects**: Liquidity begets more liquidity. USDC has achieved top-tier global liquidity—ranking among the top three digital assets alongside BTC and USDT—through nearly a decade of building deep primary and secondary market access across regions and venues. 3. **Regulatory and Policy Integration**: Establishing a global stablecoin requires deep regulatory engagement, licensing, and compliance across key markets—a significant, long-term investment where Circle is a leader. Allaire cites Artemis data showing USDC facilitated 80% of all dollar stablecoin on-chain transaction volume in Q1 2026, with USDT at 20% and all others negligible. He addresses OUSD's purported advantages: "free" minting/burning is often not sustainable in practice; redistributing all revenue can starve essential infrastructure investment; and large consortium models historically struggle with inefficiency and slow execution, unlike focused strategic partnerships. He reaffirms Circle's strong ongoing partnership with Coinbase on USDC and notes Circle collaborates with dozens of other stablecoin issuers through its expanding platform (Arc, CCTP, CPN, etc.). While welcoming OUSD to the ecosystem, Allaire asserts that Circle's vast, trusted network and continued investment make USDC the foundational digital dollar infrastructure for the world.

链捕手2 год тому

Circle CEO Responds to the OUSD Challenge: Stablecoin is a Winner-Takes-All Business, and We Won't Slow Down

链捕手2 год тому

Q2 Crypto Market Review: Did Bitcoin Rise for 'Nothing'? Did Money Flow to AI and On-Chain?

Q2 2026 Crypto Market Recap: Bitcoin's Gains Erased Amid Shift to AI and On-Chain Activity The second quarter of 2026 saw a significant reversal for the cryptocurrency market. Bitcoin gave back all its April gains, ending Q2 down approximately 11%, while major stock indices posted strong gains. This divergence was driven by a hawkish shift in Fed rate expectations, capital rotation into AI stocks, and weakening liquidity channels into crypto. Key demand pillars deteriorated simultaneously. Spot Bitcoin ETFs recorded net outflows of $4.08 billion for the quarter, with outflows dominating June. Crypto treasury entity Strategy's bitcoin accumulation slowed markedly, and the total stablecoin market cap contracted by ~$4.2 billion. This created a tighter liquidity environment. Exchange data reflected the downturn. Spot trading volumes fell 28% quarter-over-quarter. The market underwent significant deleveraging, with $8.35 billion in long liquidations for BTC and ETH, primarily in late May/early June. Open interest and order book liquidity also declined. Despite the bearish price action, structural developments point to an expanding on-chain ecosystem. These include the rise of tokenized stocks with full legal rights, the growth of RWA (real-world asset) perpetual contracts for trading stocks and commodities 24/7, and the use of crypto markets for price discovery ahead of major events like the SpaceX IPO. On-chain vaults are also emerging as a core layer for institutional capital allocation.

Foresight News2 год тому

Q2 Crypto Market Review: Did Bitcoin Rise for 'Nothing'? Did Money Flow to AI and On-Chain?

Foresight News2 год тому

Торгівля

Спот
活动图片