World Models Shift from Prediction to Planning: HWM and the Challenge of Long-Horizon Control

marsbit | Published on 2026-04-17 | Last updated on 2026-04-17

Abstract

World models have evolved from focusing on representation learning and future prediction to addressing long-horizon planning challenges. While models like V-JEPA 2 demonstrate strong predictive capabilities using large-scale video pre-training, they struggle with multi-stage control tasks due to error accumulation and exponential growth in action search space. HWM (Hierarchical World Model) introduces a two-level planning structure: a high-level planner outlines coarse subgoals over longer time horizons, while a low-level executor handles short-term actions. This decomposition reduces planning complexity and error propagation. In experiments, HWM achieved 70% success in real-world robotic tasks where flat models failed entirely. Complementary efforts include V-JEPA (focused on representation), HWM (on hierarchical planning), and WAV (World Action Verifier, on self-correction). Together, they mark a shift from pure world modeling to integrated systems capable of prediction, planning, and verification—key to deploying world models in real-world agents and long-term tasks.

Over the past year, world model research has centered on representation learning and future prediction: a model first learns to understand the world and then simulates future states internally. This approach has already produced a number of representative results. V-JEPA 2 (Video Joint Embedding Predictive Architecture 2, a video world model suite released by Meta in 2025) was pre-trained on over 1 million hours of internet video and then combined with a small amount of robot interaction data, demonstrating the potential of world models in understanding, prediction, and zero-shot robot planning.

However, a model's ability to predict does not equate to its ability to handle long-horizon tasks. When faced with multi-stage control, systems typically encounter two challenges. One is that prediction errors accumulate over long rollouts (multi-step simulations), causing the entire path to increasingly deviate from the goal. The other is that the action search space expands rapidly as the planning horizon increases, leading to continuously rising planning costs. HWM does not rewrite the underlying learning approach of world models but instead adds a hierarchical planning structure on top of existing action-conditioned world models, enabling the system to first organize stage paths and then handle local actions.

From a technical perspective, V-JEPA 2 (https://ai.meta.com/research/vjepa/) leans more toward world representation and basic prediction, HWM leans more toward long-horizon planning, and WAV (World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry, https://arxiv.org/abs/2604.01985) leans more toward the model's ability to identify and correct its own prediction distortions. These three lines of research are gradually converging. The focus of world model research has shifted from merely predicting the future to transforming predictive capabilities into executable, correctable, and verifiable system capabilities.

I. Why Long-Horizon Control Remains a Bottleneck for World Models

The difficulties of long-horizon control become clearer when applied to robotic tasks. Take robotic arm manipulation as an example: picking up a cup and placing it in a drawer is not a single action but a sequence of consecutive steps. The system must approach the object, adjust its posture, complete the grasp, move to the target location, and then handle the drawer and placement. As the chain lengthens, both problems grow together: prediction errors accumulate along the rollout, and the action search space expands rapidly.
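To make the two effects concrete, the following sketch uses a deliberately simplified model. It assumes that each rollout step adds a fixed relative error that compounds multiplicatively, and that a planner would have to enumerate every sequence over a small discrete action set. Both assumptions, the function names, and the numbers (a 2% per-step error, 4 actions) are illustrative and are not taken from the HWM paper.

```python
def compounded_error(per_step_error: float, steps: int) -> float:
    """Relative drift after `steps` rollout steps, assuming each step
    multiplies the accumulated error by (1 + per_step_error)."""
    return (1.0 + per_step_error) ** steps - 1.0


def flat_search_size(num_actions: int, horizon: int) -> int:
    """Number of distinct action sequences an exhaustive flat search faces."""
    return num_actions ** horizon


if __name__ == "__main__":
    # Assumed values for illustration only: 2% error per step, 4 discrete actions.
    for horizon in (5, 20, 50):
        drift = compounded_error(0.02, horizon)
        sequences = flat_search_size(4, horizon)
        print(f"horizon={horizon:>2}  drift={drift:.3f}  sequences={sequences:.3e}")
```

Under these assumptions the drift grows geometrically and the sequence count grows exponentially with the horizon. Real error growth depends on the model and the environment, so the numbers indicate the trend rather than a measurement.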

What the system often lacks is not local predictive ability but the capacity to organize distant goals into stage paths. Many actions may appear to deviate from the goal locally but are actually intermediate steps required to achieve it. Examples include raising the arm before grasping, or moving back slightly and adjusting the angle before opening a drawer.

In demonstration tasks, world models can already provide coherent predictions. In real control scenarios, however, performance declines. The bottleneck lies not only in the representation itself but also in the immaturity of the planning layer.

II. How HWM Restructures the Planning Process

HWM splits the originally single-layer planning process into two layers. The upper layer is responsible for stage direction at a longer time scale, while the lower layer handles local execution at a shorter time scale. The model plans at two different temporal rhythms simultaneously, rather than at a single pace.

When handling long tasks, single-layer methods typically need to search the entire action chain directly in the underlying action space. The longer the task, the higher the search cost, and the more likely prediction errors are to compound along multi-step rollouts. After HWM's decomposition, the upper layer only handles route selection at a longer time scale, and the lower layer only handles the execution of the current segment. The entire long task is broken into multiple shorter segments, reducing planning complexity.
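A rough count shows why splitting the horizon helps. The sketch below assumes exhaustive search over a discrete action set, a fixed number of candidate subgoals per stage, and equal-length segments. These are simplifying assumptions for illustration, and the function names are hypothetical rather than taken from the paper.

```python
def flat_cost(num_actions: int, horizon: int) -> int:
    """Sequences a single-layer exhaustive search must consider."""
    return num_actions ** horizon


def hierarchical_cost(num_actions: int, horizon: int,
                      num_segments: int, subgoal_options: int) -> int:
    """Candidates under a two-level split (toy counting model).

    The upper layer picks one of `subgoal_options` candidates for each of
    `num_segments` stages; the lower layer then searches each short
    segment separately.
    """
    if horizon % num_segments != 0:
        raise ValueError("horizon must divide evenly into segments")
    segment_len = horizon // num_segments
    upper = subgoal_options ** num_segments
    lower = num_segments * num_actions ** segment_len
    return upper + lower


if __name__ == "__main__":
    # Assumed toy numbers: 4 actions, 20 steps, 4 segments, 8 subgoal candidates.
    print(flat_cost(4, 20))                # 4**20 sequences
    print(hierarchical_cost(4, 20, 4, 8))  # 8**4 + 4 * 4**5 candidates
```

With these toy numbers the flat search faces 4^20 (about 1.1 trillion) sequences, while the two-level split considers 8^4 + 4 * 4^5 = 8,192 candidates. Practical planners sample candidates rather than enumerate them, so real savings are far smaller than this exhaustive count suggests.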

Another key design choice is that a high-level action is not simply the difference between two states. Instead, an encoder compresses a sequence of low-level actions into a higher-level action representation. For long tasks, the key is not just the difference between the start and end points but also how the intermediate steps are organized. If the upper layer only looks at displacement differences, it may lose the path information contained in the action chain.
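The difference between a displacement and an encoded action sequence can be seen in a small example. The encoder below is a fixed random linear map followed by `tanh`, chosen only because it is sensitive to order and path. It is a stand-in, not the learned encoder described in the paper, and all names and dimensions are assumptions.

```python
import numpy as np


def displacement(actions: np.ndarray) -> np.ndarray:
    """Net start-to-end change: the sum of all low-level actions."""
    return actions.sum(axis=0)


def encode_sequence(actions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy path-sensitive encoder: linear map over the flattened sequence."""
    return np.tanh(weights @ actions.reshape(-1))


# Assumed sizes for illustration: 4 steps, 2-D actions, 3-D high-level code.
rng = np.random.default_rng(0)
num_steps, action_dim, latent_dim = 4, 2, 3
weights = rng.normal(size=(latent_dim, num_steps * action_dim))

# Two hypothetical action chains with the same net motion.
lift_then_move = np.array([[0, 1], [1, 0], [1, 0], [0, -1]], dtype=float)
move_directly = np.array([[1, 0], [1, 0], [0, 0], [0, 0]], dtype=float)

if __name__ == "__main__":
    print("displacements:", displacement(lift_then_move), displacement(move_directly))
    print("encoding A:", encode_sequence(lift_then_move, weights))
    print("encoding B:", encode_sequence(move_directly, weights))
```

Both chains end at the same offset, so a displacement-only representation cannot tell them apart, whereas the encoded sequences differ. That is the path information the upper layer would otherwise lose, for instance the detour of raising the arm before moving it.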

HWM embodies a hierarchical task organization approach. Faced with a multi-stage task, the system no longer unfolds all actions at once but first forms a coarse stage path and then executes and corrects it segment by segment. Once this hierarchical relationship is incorporated into the world model, predictive capabilities begin to transform more stably into planning capabilities.
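The "coarse path first, then segment-by-segment execution and correction" loop can be sketched on a toy 2-D point task. Everything here is an assumption made for illustration: the upper layer is plain linear interpolation rather than a learned planner, the world model is a trivial additive rule, and the environment adds a constant drift that the model does not know about.

```python
import itertools
import numpy as np

# Hypothetical low-level action set: four unit moves plus "stay".
ACTIONS = np.array([[1, 0], [-1, 0], [0, 1], [0, -1], [0, 0]], dtype=float)
DRIFT = np.array([0.0, 0.1])  # disturbance the model is unaware of


def model_step(state, action):
    """Stand-in world model: predicts the action is applied exactly."""
    return state + action


def true_step(state, action):
    """Stand-in environment: the action plus the unmodeled drift."""
    return state + action + DRIFT


def high_level_plan(start, goal, num_subgoals):
    """Coarse stage path: evenly spaced waypoints ending at the goal."""
    fractions = np.linspace(0.0, 1.0, num_subgoals + 1)[1:]
    return [start + f * (goal - start) for f in fractions]


def low_level_action(state, subgoal, horizon):
    """Search all short sequences in the model; return the best first action."""
    best_cost, best_action = float("inf"), ACTIONS[-1]
    for seq in itertools.product(range(len(ACTIONS)), repeat=horizon):
        s, cost = state, 0.0
        for idx in seq:
            s = model_step(s, ACTIONS[idx])
            cost += np.linalg.norm(s - subgoal)  # cumulative distance to subgoal
        if cost < best_cost:
            best_cost, best_action = cost, ACTIONS[seq[0]]
    return best_action


def run_hierarchical(start, goal, num_subgoals=3, horizon=2, tol=0.75, max_steps=100):
    state = np.array(start, dtype=float)
    goal = np.array(goal, dtype=float)
    steps = 0
    for subgoal in high_level_plan(state, goal, num_subgoals):
        # Replan from the observed state at every step, so drift is corrected
        # inside the current segment instead of accumulating over the task.
        while np.linalg.norm(state - subgoal) > tol and steps < max_steps:
            state = true_step(state, low_level_action(state, subgoal, horizon))
            steps += 1
    return state, steps


if __name__ == "__main__":
    final_state, steps = run_hierarchical((0, 0), (6, 3))
    print(final_state, steps)
```

The lower layer only ever searches two steps ahead toward the nearest waypoint, yet the point still ends within tolerance of the distant goal, because the upper layer supplies the stage path and each segment absorbs its own share of the drift.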

III. From 0% to 70%: What the Experimental Results Indicate

In the real-world grasp-and-place task set up in the paper, the system was given only the final goal condition, without manually decomposed intermediate goals. Under these conditions, HWM achieved a success rate of 70%, while the single-layer world model had a 0% success rate. A long task that the flat planner could not complete at all became achievable in the majority of attempts once hierarchical planning was introduced.

The paper also tested simulation tasks such as object pushing and maze navigation. The results showed that hierarchical planning not only improved success rates but also reduced the computational cost of the planning phase. In some environments, the computational cost of the planning phase could be reduced to about a quarter of the original while maintaining higher or comparable success rates.

IV. From V-JEPA to HWM to WAV

V-JEPA 2 represents the world representation approach. It was pre-trained on over 1 million hours of internet video and then post-trained (targeted training after pre-training) on fewer than 62 hours of robot video, yielding a latent action-conditioned world model (a world model that predicts in an abstract representation space while incorporating action information) usable for understanding, prediction, and planning in the physical world. It demonstrates that models can acquire world representations through large-scale observation and transfer them to robot planning.

HWM is the next step. The model already possesses world representation and basic predictive capabilities, but once multi-stage control is involved, error accumulation and search space expansion become acute. HWM does not change the underlying representation learning approach but adds a multi-timescale planning structure on top of existing action-conditioned world models. It addresses how the model organizes distant goals into a set of intermediate steps and then advances segment by segment.

WAV further focuses on verification. For world models to enter policy optimization and deployment scenarios, they cannot just predict; they must also identify where their predictions are prone to distortion and correct them accordingly. In short, WAV addresses how the model checks itself.

V-JEPA 2 leans toward world representation, HWM toward task planning, and WAV toward result verification. Although their focuses differ, their overall direction is consistent. The next phase of world models is no longer just internal prediction but the gradual integration of prediction, planning, and verification into a single system capability.

V. From Internal Prediction to Executable Systems

Many past world model efforts concentrated on improving the continuity of future-state predictions or the stability of internal world representations. The research focus is now changing. Systems must not only form judgments about the environment but also translate those judgments into actions and keep adjusting the next steps once results are observed. Getting closer to real deployment requires controlling error propagation in long-horizon tasks, compressing the search space, and reducing inference costs.

Such changes will also affect AI agents. Many agent systems can already handle short-chain tasks, such as calling tools, reading files, and executing multi-step commands. However, once tasks become long-chain, multi-stage, and require mid-course re-planning, performance declines. This is not fundamentally different from the difficulties in robotic control; both stem from insufficient high-level path organization capabilities, leading to a disconnect between local execution and overall goals.

The hierarchical approach provided by HWM—where the high level is responsible for paths and stage goals, the low level handles local actions and feedback processing, and result verification is layered on top—will continue to appear in more systems in the future. The next phase of world models will focus not only on predicting the future but on organizing prediction, execution, and correction into a viable path.
