Tremble, Humans: AI Continues Its Accelerated Sprint

marsbit | Published on 2026-06-13 | Last updated on 2026-06-13

Abstract

Yes, AI is still accelerating rapidly. While deep learning seemed to stall early on, large models show no sign of hitting their ceiling after years of development. At the 2026 BAAI Conference, the focus is on moving AI from the digital world into the physical world. Scaling Law remains effective, continuing to drive advances in both large language models and multimodal models. The industry is now entering a phase of pursuing World Models, though unresolved technical routes and data issues mean this exploration may take another 3-5 years. Meanwhile, breakthroughs in Agents are accelerating AI's real-world application in fields like healthcare and meetings. Making Agents truly useful requires software-hardware co-design, as the strong presence of chip vendors at the conference shows. We stand at a new historical threshold where AI is becoming a foundational force reshaping the world. The first day of the conference highlighted AI's evolution from "knowing how to chat" to "knowing how to work." Scaling Law persists, World Models are the next key battleground, and Agents are transitioning from usable to useful. Scaling Law is not ending but diversifying. New models like Anthropic's Fable 5 demonstrate scaling through parameter size, synthetic data, and reinforcement learning. Advancements in AI Coding and Agent deployment are enabling a trend of AI self-evolution, poten...

That's right, AI is still in an accelerated sprint.


In 2016, deep learning had only been exploding for a year before it almost stagnated. In 2026, after four years of explosive growth, large models still haven't hit their ceiling.


At the 2026 BAAI Conference, Guangzhui Intelligent observed that everything from models to software and hardware to products is pushing AI to 'run' from the digital world into the physical world.


On one hand, Scaling Law continues to function steadily, propelling the ongoing development of large language models and multimodal models. The AI industry has entered a phase of pursuing World Models. However, questions about technical routes and data remain unresolved, and working through them will likely take at least another 3-5 years.


On the other hand, breakthroughs in Agents are accelerating the deployment of AI in real-world scenarios. As Agents have reached a usable stage, the industry is advancing their application in areas like healthcare and meetings. To transition Agents from usable to useful, software-hardware co-design has become key. At the exhibition booths of the BAAI Conference, chip manufacturers occupied 'half the room,' with nearly all leading domestic AI chip companies present.



"We are standing at a new historical inflection point. Artificial intelligence is no longer just a tool transforming a specific industry but is becoming the underlying force reconstructing the world. AI Coding, autonomous agents, and model self-evolution are opening up possibilities for creating AI. World Models, embodied intelligence, and robotics are extending intelligence from the digital world to the physical world," said Wang Zhongyuan, President of the Beijing Academy of Artificial Intelligence (BAAI).


What exactly is happening within this wave of reconstruction?


On the first day of the BAAI Conference, the guests offered this answer: AI is moving from 'being able to chat' to 'being able to work.' Scaling Law persists; World Models, whose technical directions have not yet converged, are the focus of the next phase; and Agents have started transitioning from usable to useful, with many optimization challenges remaining.


AI Has Not Hit Its Technical Ceiling, And Has Learned Self-Evolution


Over the past year, as high-quality internet text data was being exhausted, a pessimistic sentiment spread throughout the industry that 'Scaling Law is about to peak.'


In multiple forums at the BAAI Conference, the question 'Has the Scaling Law dividend diminished?' was frequently raised. Several guests denied this notion.


"I still firmly believe scaling is far from over," said Wang He, Founder and CTO of Galaxy Universal. "Looking back today, Scaling Law hasn't failed; it has just become more diversified."


Scaling continues to show its effect on a series of newly released large language models. Analyzing Anthropic's recently released Fable 5, Luo Fuli from Xiaomi suggested this model itself is a product of scientifically advancing scaling. It is the result of extending large models by combining three dimensions: parameter scale, synthetic data, and reinforcement learning.


"We speculate that Fable 5's parameter scale itself is likely several times that of the current largest open-source models. Additionally, it involved significant computational investment in Test-Time Scaling or reinforcement learning. Furthermore, synthetic data generated by humans and agents brought the data scale to a new order of magnitude," said Luo Fuli.


In the multimodal field, performance improvements brought by scaling are equally significant. Zhu Jun, Founder and Chief Scientist of Shengshu Technology, stated that data quality, model size, and large-scale training all enhance model performance. With improved foundational model capabilities, models also learn physical laws and understand 3D scenes more efficiently.


With scaling still effective, AI Coding maturing, and Agent deployment accelerating, a trend of AI self-evolution is becoming evident: AI is moving from writing code to autonomously completing product iterations and updates.


"The foundation of the vast human digital world is largely constructed through code. With AI Coding making substantial progress and becoming mainstream, it means AI could gradually take over everything in the digital world," said Wang Zhongyuan.


Globally, using AI for product updates has become the norm.


World Models: The Next Key Battleground for Large Models


Pushing outward along the boundaries of the digital world, World Models have become the next key battleground for large models.


"Currently, no single world model truly feels particularly impressive or solves all kinds of problems in the real physical world," said Wang Zhongyuan.


World Models are still at an early stage, and the industry has not reached full consensus on the technologies involved. With technical routes yet to converge, a series of problems remain unresolved. Taking data as an example, Wang Zhongyuan noted that it is still unclear whether video data, simulation data, or real-world physical data is what is needed; no clear methodology has been found yet.


Taking Galaxy Universal as an example, Wang He introduced their application of synthetic data at the event.


"Before the WAM (World Action Model) paradigm emerged, we conducted extensive experiments within the VLA paradigm using synthetic data, specifically for grasping tasks," said Wang He. "We used 1 billion frames of simulation data to prove: as long as you scale the data to this extent, you can achieve complete zero-shot learning. Give me any object in the real world, and it can handle the grasp."


As for how quickly World Models will develop, BAAI predicts that 'at least several more years' are needed. The next three to five years will likely be a phase of continuous evolution and iteration for World Models.


Over the past few years, various world models with different technical routes have emerged in the industry, each progressing distinctively.


Taking multimodal world models as an example, Zhu Jun stated that video models and world models are closely related because world models need three capabilities: understanding and interpreting states, prediction, and action. Among currently accessible training data, video data is most relevant to world models.
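The three capabilities Zhu Jun lists (understanding states, prediction, and action) can be pictured as a small interface. The following is only a toy sketch under stated assumptions: the names `WorldModel`, `ToyLineWorld`, `encode`, `predict`, and `choose_action` are hypothetical and are not taken from any model discussed at the conference, and the "world" is a single number on a line rather than video or physical data.

```python
from typing import Protocol, Sequence


class WorldModel(Protocol):
    """Hypothetical interface covering the first two capabilities."""

    def encode(self, observation: float) -> float: ...
    def predict(self, state: float, action: float) -> float: ...


class ToyLineWorld:
    """Hypothetical 1-D world: the state is a position, an action is a step."""

    def encode(self, observation: float) -> float:
        # Understanding: turn a raw (noisy) observation into a state estimate.
        return float(round(observation))

    def predict(self, state: float, action: float) -> float:
        # Prediction: the next state that would follow from taking `action`.
        return state + action


def choose_action(model: WorldModel, state: float, goal: float,
                  candidates: Sequence[float]) -> float:
    # Action: pick the candidate whose predicted outcome lands closest to the goal.
    return min(candidates, key=lambda a: abs(model.predict(state, a) - goal))


world = ToyLineWorld()
state = world.encode(2.2)
action = choose_action(world, state, goal=5.0, candidates=[-1.0, 0.0, 1.0])
next_state = world.predict(state, action)
```

In this toy setting the action is chosen by asking the model to imagine each candidate's outcome, which is the sense in which prediction supports action; real world models would replace the arithmetic with learned components.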


With technical routes diverging and industry consensus yet to form, BAAI classifies world models into four categories:


First, language-centric world models, mapping other modalities and abilities into language space, including LLMs, VLMs, VLAs, etc.


Second, pixel-centric world models. Video generation essentially predicts the next frame; video generation models are related to world models but are not equivalent to them. The World Action Model (WAM), which may become very popular this year, is evolving from this pixel-centric perspective.


Third, 3D structure-centric world models, including 3D reconstruction which focuses purely on the three-dimensional world.


Fourth, visual representation-centric world models.



Currently, BAAI is exploring a 'fifth' path – the fusion of language-centric and visual representation-centric approaches, namely latent space representation. This involves compressing information like text and images into a vector space to represent various states of the real physical world.


"Future unified latent space modeling will not be limited to visual space but encompass full-modal latent space. This is highly likely to be the true next possible path for world models," said Wang Zhongyuan.


At the conference, BAAI introduced the world model it is developing – WuJie · Physis-v0.1. Centered on physical space modeling to predict the next physical state, it is positioned as the world's first general-purpose world foundation model, emphasizing four key capabilities: 'physically correct, causally traceable actions, long-term temporal consistency, and general-purpose generalization.'



Currently, this model is still in the training phase. BAAI will continue to share progress in the second half of the year and will open-source the model upon training completion.


From 'Usable' to 'Useful': Agents Face More Challenges


On the model side, progress in World Models drives the realization of physical AI; on the product side, Agents (Intelligent Agents) become the key products for AI to enter public life.


Since 2025, dubbed the 'Year of the Agent,' some impressive Agent products have emerged, showing signs of taking off. Even so, the sudden surge in popularity of 'Lobsters' this year still came as a surprise.


Compared with last year, when agents mostly just executed instructions, this year's agents are clearly more proactive and capable, taking the initiative to carry out more complex tasks for users.


At this year's BAAI Conference, BAAI also released four vertical-focused agents: BAAI Cardiac Agent, the world's first auxiliary diagnosis agent for cardiac magnetic resonance, aiding doctor decision-making by integrating multimodal capabilities and medical expertise; the autonomous research agent AREX for the scientific research field; SoulAgent, an agent helping users listen to meetings in real-time and capture key points; and a risk discovery agent targeting hazardous protein acquisition.


For example, regarding the meeting-listening agent, Guangzhui Intelligent tested its ability to summarize different meeting contents. SoulAgent did provide simple summaries of meeting content. While not as complete as minutes, the core viewpoints were accurate. This is particularly suitable for situations where parallel forum sessions overlap.



However, current agents still face numerous technical issues requiring further optimization. Yang An, President's Chair Professor at Nanyang Technological University, mentioned that to maintain and enhance agent capabilities, the most crucial aspects currently are related to context engineering, such as Memory, orchestration, etc.


At the agent sub-forum, Harness became a high-frequency keyword. The term literally means a horse's harness and refers to the entire engineering framework or environment built around an agent; it received little attention last year but gained significant popularity this year.


"If the model determines an agent's capabilities, then the Harness determines the upper limit of those capabilities," said Li Jingqiu. "Its difficulty lies in further improving problem clarification, verification, and feedback on top of the model."


For example, a model that must understand a problem entirely on its own will inevitably run into limits. The Harness needs to elaborate and enrich the user's simple one-sentence instruction so the model can better comprehend the requirement, which requires the Harness to perform intent understanding. After receiving the task, it must design the subsequent workflow and then orchestrate the model to execute it. This process may require human intervention and correction, followed by checks before the task is completed.
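The workflow described here (enrich the instruction, design the steps, orchestrate the model, allow human correction, then check before completion) can be sketched as a small loop. This is a minimal illustration under stated assumptions, not any vendor's Harness: `run_harness`, `HarnessResult`, `toy_model`, and the `CLARIFY:`/`PLAN:`/`EXECUTE:` prompt prefixes are all hypothetical names invented for this example, and the "model" is a stub so the code runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Model = Callable[[str], str]  # hypothetical: any text-in, text-out model call


@dataclass
class HarnessResult:
    enriched_task: str
    steps: List[str]
    outputs: List[str] = field(default_factory=list)
    approved: bool = False


def run_harness(instruction: str, model: Model,
                verify: Callable[[str], bool],
                ask_human: Callable[[str], str],
                max_retries: int = 1) -> HarnessResult:
    # 1. Intent understanding: expand the terse instruction into a fuller brief.
    enriched = model(f"CLARIFY: {instruction}")
    # 2. Workflow design: ask the model for a plan, one step per line.
    steps = [s.strip() for s in model(f"PLAN: {enriched}").splitlines() if s.strip()]
    result = HarnessResult(enriched_task=enriched, steps=steps)
    # 3. Orchestration: run each step; on a failed check, retry with human feedback.
    for step in steps:
        output = model(f"EXECUTE: {step}")
        retries = 0
        while not verify(output) and retries < max_retries:
            correction = ask_human(output)
            output = model(f"EXECUTE: {step}\nFEEDBACK: {correction}")
            retries += 1
        result.outputs.append(output)
    # 4. Final check before the task is reported as complete.
    result.approved = all(verify(o) for o in result.outputs)
    return result


def toy_model(prompt: str) -> str:
    """Stand-in for a real model (assumption: replies depend only on the prefix)."""
    if prompt.startswith("CLARIFY:"):
        return prompt[len("CLARIFY:"):].strip() + " (audience: team, length: short)"
    if prompt.startswith("PLAN:"):
        return "collect notes\nwrite summary"
    first_line = prompt.splitlines()[0][len("EXECUTE:"):].strip()
    return ("done: " if "FEEDBACK:" in prompt else "draft: ") + first_line


result = run_harness(
    "summarize the meeting",
    model=toy_model,
    verify=lambda out: out.startswith("done:"),
    ask_human=lambda out: "please finalize",
)
```

With the stub, each step first yields a draft that fails the check, is corrected once through the human-feedback hook, and then passes, so `result.approved` ends up `True`. In a real system each of these hooks would be far richer; the point is only that clarification, planning, verification, and feedback sit around the model rather than inside it.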


In short, the Harness must work like a real human assistant: every detailed step needs product-level refinement before it can further improve the Agent's execution.


Currently, Agents are still in the early stages of development. It is foreseeable that this industry has immense room for growth. Both improvements in model capabilities and solidification of engineering details will continue to enhance Agents' task-handling abilities.

This article is from WeChat Official Account: Guangzhui Intelligent, Author: Focus on Frontier Technology

Related Questions

Q: What is the main theme of the 2026 BAAI Conference according to the article?

A: The main theme is that AI is evolving from a tool for chatting into one capable of performing practical tasks ('work'), with a focus on scaling laws, the pursuit of world models, and the advancement of Agents from usable to useful.

Q: What does the article suggest about the current status and future of Scaling Law in AI development?

A: The article states that Scaling Law is far from reaching its limit and is still effectively driving progress. While its form has diversified, it continues to push advancements in large language models and multimodal models through increased parameters, synthetic data, and reinforcement learning.

Q: What are the four categories of world models outlined by BAAI, and what is the 'fifth path' it is exploring?

A: The four categories are: 1) language-centric models, 2) pixel-centric models, 3) 3D structure-centric models, and 4) visual representation-centric models. The 'fifth path' is a fusion of language-centric and visual representation-centric approaches, aiming for unified latent space modeling across all modalities.

Q: According to the article, what is 'Harness' in the context of AI Agents, and why is it important?

A: Harness refers to the engineering framework or environment built around an AI Agent. It is crucial because it determines the upper limit of an Agent's capabilities. Its role is to clarify user intent, design task workflows, orchestrate model execution, and incorporate human intervention and verification so that tasks are completed correctly, going beyond what the model alone can understand and execute.

Q: What example does the article give to illustrate the current practical application and capability of AI Agents?

A: The article gives the example of SoulAgent, an AI Agent designed for listening to and summarizing meetings. It was tested at the conference and provided simple, accurate summaries of the core points from different sessions, demonstrating its utility when forum sessions overlap.
