Tsinghua's Prediction 2 Years Ago Is Becoming Global Consensus: Meta and Two Other Major AI Institutions Have Reached the Same Conclusion

marsbitPublished on 2026-04-13Last updated on 2026-04-13

Abstract

Summary: In a remarkable validation of Chinese AI research, Meta and METR have independently reached conclusions that align perfectly with the "Density Law" proposed by a Tsinghua University and FaceWall Intelligent team two years ago. Published in Nature Machine Intelligence in late 2025, the law states that the computational power required to achieve a specific level of AI performance halves every 3.5 months. This convergence was starkly evident in April 2026. METR reported that AI capabilities are doubling every 88.6 days, while Meta's new model, Muse Spark, demonstrated it could match the performance of a model from the previous year using less than one-tenth of the training compute. When plotted, the growth curves from all three sources—using different metrics (parameters, compute, task length)—show an almost identical exponential slope. The findings have profound implications: AI inference costs are collapsing faster than anticipated, powerful edge-computing AI is becoming rapidly feasible, and the industry's strategy of simply scaling model size is becoming economically inefficient. The Chinese team, which has been building its "MiniCPM" model series based on this law since 2024, is seen as having a significant two-year lead in practical engineering experience, marking a rare instance where Chinese researchers pioneered a fundamental predictive trend in AI.

【Insight】Too crazy! The AI evolution data recently measured by Meta and METR perfectly aligns with the "Density Law" proposed by a Chinese team two years ago. Silicon Valley suddenly realized that Chinese researchers have been leading this path for two years!

Three of the world's most serious AI research institutions collectively collided in the past week!

On April 3, the American research institution METR quietly updated a technical report, with the core conclusion compressed into one sentence.

AI capabilities double every 88.6 days.

Five days later, on April 8, Meta's Super Intelligence Lab released a new model, Muse Spark, and disclosed an internal training efficiency curve called the scaling ladder, with the conclusion also being one sentence.

To catch up with the performance of Llama 4 Maverick from a year ago, the new model requires less than one-tenth of the training compute.

One measured task duration, the other measured training compute. The two institutions had no interaction, and their research methods had no overlap.

But when the two curves were converted to the same coordinate system, their slopes were almost identical.

At this point, things were already bizarre enough.

Even more bizarre is that this curve was completely drawn by a Chinese team two years ago and even published in a Nature sub-journal.

It's called the Density Law.

Two Years Ago, Someone Drew This Line in Advance

This concept first appeared in a paper called "Densing Law of LLMs".

The authors were a joint team from ModelBest and Tsinghua University, led by Professors Sun Maosong and Liu Zhiyuan, with doctoral student Xiao Chaojun as the first author.

The paper was posted on arXiv in December 2024 and accepted by Nature Machine Intelligence in November 2025.

Paper address: https://arxiv.org/abs/2412.04315

Paper address: https://www.nature.com/articles/s42256-025-01137-0

The core assertion of the paper is just one sentence.

Model intelligence density increases exponentially over time, and the number of parameters required to reach a specific intelligence level halves every 3.5 months.

Back in late 2024, this sounded a bit radical.

At that time, the entire industry was worshipping the scaling law. OpenAI was scaling models, Anthropic was scaling models, Meta was scaling models.

Everyone thought that bigger parameters meant stronger intelligence, and burning GPUs to the extreme was the right way.

But the research team didn't see it that way.

They put all influential open-source foundation models at the time, from Llama-1 all the way to Gemma-2 and MiniCPM-3, a total of 51 models, into the same ruler to measure.

After running five major benchmarks, the result was an almost perfect exponential relationship, with an R² of 0.934.

Considering that LLM evaluation is easily contaminated by data pollution, they retested using a newly constructed contamination-filtered dataset, MMLU-CF. R²=0.953.

Both fits achieved an R² close to 1. Statistically, this is almost impossible to be a coincidence.

In other words, every mainstream open-source model released in these two years, regardless of which team it came from or what architecture it used, fell on the same exponential line of "doubling every 3.5 months".

Up to this point, the story was just "a Chinese team proposed a seemingly radical empirical rule".

What truly made this a "moment" was what happened in the next half year.

Three Institutions, Three Methods, One Slope

Laying out the conclusions from ModelBest, Meta, and METR:

ModelBest's Density Law measures "how many parameters are needed for the same intelligence level". The conclusion is that the parameter requirement halves every 3.5 months.
Meta's scaling ladder measures "how much training compute is needed for the same intelligence level". The conclusion is that Muse Spark saves an order of magnitude compared to Llama 4 Maverick from a year ago.
METR's time horizon report measures "how long a task a model can handle". The conclusion is that the task length doubles every 88.6 days.

Three rulers. Three academic institutions. Three research paths with no overlap whatsoever.

But when all the numbers are converted and viewed in the same coordinate system, the slopes of their curves are almost identical.

The most easily overlooked point about this is that the Density Law was the first proposed among these three. It was nearly two years earlier than Meta's scaling ladder and more than a year earlier than METR's complete modeling.

And when Meta drew that scaling ladder in their early April blog post, they probably didn't realize it themselves. The shape of this graph was almost the same line as the curve on a PPT from an academic conference in Beijing in 2024.

What Kind of Observation Deserves to Be Called a "Law"

In the scientific community, there is an unwritten standard to judge whether an empirical observation qualifies as a "law".

It's not about how beautiful the data is, but whether it holds true across multiple independent measurement systems.

The reason Moore's Law is a law is because the semiconductor industry has verified it over decades from three completely different dimensions: lithography precision, transistor density, and cost per unit of computing power.

The Density Law follows the same path.

It was initially just a fitting curve from a single team. By the time it was accepted by the Nature sub-journal, it could already be reproduced on contamination-filtered datasets. And this month, it was independently verified twice more in Meta's training data and METR's task evaluations.

Viewed in a larger coordinate system, this moment is very much like when electricity first entered New York in the 1880s.

Back then, it was also different inventors, different engineers, different cities, each working on their own power grids. It wasn't until someone plotted the development curves of all the projects on one sheet of paper that people realized. This wasn't a few scattered engineering advances; this was a new era quietly unfolding.

Only this time, it took less than a year from the paper's publication to its verification by global peers.

Three Inferences, Each Rewriting Industry Assumptions

If the Density Law holds, it will rewrite many things simultaneously.

First, inference costs will crash faster than anyone expected.

One inference of the Density Law is that the inference cost for LLMs achieving the same performance roughly halves every 2.6 months.

Today, this rate of decline has already been exceeded by reality.

The latest tracking data from Epoch AI shows that the token price for LLMs achieving Claude 3.5 Sonnet performance level has dropped 400 times in the past year. The fastest decline for the same performance tier reached 900x/year.

The level that GPT-3.5 priced at $20 per million tokens in late 2022, today Mistral Nemo charges only $0.02, 1000 times cheaper, and the model is even stronger.

In retrospect, the prediction in the paper was conservative.

Second, the explosion point for on-device AI is closer than anyone thought.

Multiplying the Density Law by Moore's Law yields an even more exciting number.

According to current estimates, the maximum effective model size that can run on chips at the same price roughly doubles every 88 days.

This number is almost identical to the 88.6 days calculated by METR. Two completely different calculation paths collided after the decimal point.

In the next three to five years, running a current top-tier GPT-level model on an ordinary laptop or even a mobile phone may no longer be science fiction.

Third, the optimal strategy for the large model industry is quietly reversing.

For the past three years, the industry's understanding of the scaling law has remained at "stack parameters, stack data."

But the Density Law offers a counterintuitive judgment. Given the continuous exponential growth in density, any state's strongest model only has an optimal window of a few months.

Throwing all resources into training a larger model, only to be surpassed by a new model half the size three months later, is economically unwise.

The truly sustainable path is to invest resources in improving the density itself. Better architectures, higher quality data, smarter training algorithms.

ModelBest, Has Been Walking Along the Ruler They Drew

It's worth mentioning that the Density Law is not a paper that ended after publication.

ModelBest, which proposed this theory, has been verifying it with their own "Small Cannon" MiniCPM series models for the past two years.

When MiniCPM-1-2.4B was released in February 2024, its benchmark scores could match or exceed Mistral-7B from September 2023. That is, in four months, with 35% of the parameters, it achieved equivalent performance.

This number was directly written into the Nature sub-journal paper as the first empirical case of the Density Law.

Since then, the Small Cannon series has been open-sourced all the way, covering four major directions for parameters below 10B: text, multimodal, speech, and full-modality. This level of open-source completeness domestically, besides Alibaba, only ModelBest has achieved.

So far, the global open-source downloads of the Small Cannon series have exceeded 24 million.

It is not the largest model in the industry. But it is the first team in the industry to implement "density first" as a company methodology.

And when Meta and METR verified the Density Law in their respective ways in April 2026, this Chinese company that started training models using this methodology back in 2024 actually had a two-year lead in engineering experience.

This Time, Chinese Researchers Are at the Starting Point of the Curve

A theoretical framework proposed by a Chinese research team two years ago is being rediscovered, in their own ways, by overseas institutions like Meta and METR, the most serious players.

The weight of this matter may take some time to fully understand.

It is not a "we can do it too" story. It is a "we saw it a bit earlier" story.

There aren't many such moments in the history of science. A judgment doubted in 2024 became the same curve pointed to by multiple independent pieces of evidence in 2026.

This kind of "coincidence" across regions, methods, and institutions has happened a few times in physics, each time marking the end of an old paradigm and the beginning of a new one.

This time, Chinese AI researchers are standing at that starting point.

And that curve is still rising, doubling every 88 days.

References:

The "Density Law"首创 (first proposed) by ModelBest gains recognition from top overseas institutions like Meta

https://arxiv.org/abs/2412.04315

https://www.nature.com/articles/s42256-025-01137-0

https://metr.org/blog/2026-1-29-time-horizon-1-1/

https://ai.meta.com/blog/introducing-muse-spark-msl/

This article is from the WeChat public account "新智元" (New Zhiyuan), edited: 好困 (Hao Kun) 桃子 (Tao Zi)

An Opening For Ripple: Why XRP Is Set To Dominate This Crypto Sector

Ripple's XRP is positioned as a key asset poised to drive the next wave of transformation in the crypto sector, particularly in decentralized finance (DeFi). According to XRP Ledger validator Vet, XRP could challenge traditional finance (TradFi) by addressing structural limitations like slow settlement times, high transaction costs, and restricted cross-border accessibility. Vet emphasizes XRP's superior protocol design and high-value use cases, such as cross-border payments and liquidity provisioning, which make it ideal for real-world financial applications. However, Flare Network co-founder Hugo Philion criticized these claims as premature, arguing that protocol superiority should be proven at scale under real-world conditions. Vet defended his stance, highlighting XRP's risk-minimized design and pointing to the upcoming XLS-66 upgrade as evidence of its structural advantages.

bitcoinist4m ago

An Opening For Ripple: Why XRP Is Set To Dominate This Crypto Sector

bitcoinist4m ago

Bitcoin Has Likely Found Bottom—3 Indicators Make $100,000 Seem Much Nearer

Bitcoin (BTC) is showing signs of having bottomed out and may be poised for a rally toward $100,000, according to analyst Ali Martinez. He cites three key indicators supporting this outlook. First, the Sharpe Ratio rebounded from -43 to around 20.35, indicating reduced market volatility and improved risk-adjusted conditions. Second, the Percentage Realized Cap fell below 7%, suggesting low retail activity and a market dominated by long-term holders—a historical bottom signal. Third, the Inter-exchange Flow Pulse shows BTC moving toward derivatives exchanges, indicating growing trader confidence and leveraged long interest. Martinez identifies $73,700 as a critical support level. Holding above it could push BTC toward $96,000, while a break below may lead to a decline toward $55,000.

bitcoinist1h ago

Bitcoin Has Likely Found Bottom—3 Indicators Make $100,000 Seem Much Nearer

bitcoinist1h ago

PEPE’s TCT Model Distribution Predicts The Top Of The Rally

Crypto analyst The Composite Trader predicts that PEPE could rally to $0.004 based on its TCT model distribution, though it may drop to $0.0038 afterward. He notes that PEPE may have formed a "naked low," indicating weak accumulation and a potential setup for bearish reversals. While not a guaranteed scenario, he is monitoring lower timeframes for confirmation before taking positions. Another analyst, Sweep, suggests PEPE is days away from a major upward move, with signs of a market bottom. Despite a recent 3% drop to ~$0.000003764, whale activity—including a $3 million PEPE withdrawal from Coinbase—hints at accumulation ahead of a potential rally. Broader crypto market optimism, fueled by possible de-escalation in U.S.-Iran tensions, may also support meme coins.

bitcoinist3h ago

PEPE’s TCT Model Distribution Predicts The Top Of The Rally

bitcoinist3h ago

XRP Co-Creator Arthur Britto Is Real, David Schwartz Says In Rare Comments

David Schwartz, Ripple's Chief Technology Officer, has publicly confirmed that Arthur Britto, the elusive co-creator of the XRP Ledger, is a real and highly private individual, not a myth or pseudonym. In detailed comments, Schwartz described Britto as a key architect whose ideas were fundamental to core XRP Ledger mechanics like its exchange functionality and auto-bridging. He emphasized Britto's extreme value for privacy, noting he has attended events anonymously and was even upset about a photo showing the back of his head. Schwartz compared Britto's unique intelligence to Mozart's, stating he arrives at brilliant solutions intuitively, without needing to verbalize the intermediate logic, unlike Schwartz's own step-by-step reasoning. Despite his absence from public life, Britto is confirmed to be "doing really well." At the time of reporting, XRP was trading at $1.4228.

bitcoinist5h ago

XRP Co-Creator Arthur Britto Is Real, David Schwartz Says In Rare Comments

bitcoinist5h ago

Verifiable Bitcoin Accounts for Institutional Bitcoin. Your Custody, Your Terms.

Threshold Network has launched Verifiable Bitcoin Accounts (VBA), a new institutional Bitcoin framework built on its existing signer infrastructure, which has processed over $5 billion with zero losses. VBA uses Bitcoin Script and PSBTs to define preauthorized spending paths, signer combinations, timelocks, and recovery routes. It allows institutions to engage in onchain lending and yield strategies while maintaining their current custody setups—including Qualified Custodians, MPC networks, or self-custody—without transferring asset title. VBA ensures consensus-enforced spending, multi-party controls, predefined recovery options, and whitelisted deployment into approved platforms like Aave and Curve. Every transaction is verifiable onchain, reducing reliance on trust and enabling institutional-scale Bitcoin adoption through enforceable, auditable operations. Available to qualified participants, VBA extends Threshold’s proven infrastructure to meet institutional standards for security and compliance.

TheNewsCrypto5h ago

Verifiable Bitcoin Accounts for Institutional Bitcoin. Your Custody, Your Terms.

TheNewsCrypto5h ago

Trading

Spot

Futures

Hot Articles

In-Depth Research Report on Account Abstraction (AA): Generational Leap in Ethereum’s Account System & Landscape Reshaping in the Next Five Years

As a major evolution of Ethereum’s account system, AA is designed to address the fundamental security and experience bottlenecks of the “private key equals account” model in the EOA era.

3.6k Total ViewsPublished 2025.12.18Updated 2025.12.18

In-Depth Research Report on Account Abstraction (AA): Generational Leap in Ethereum’s Account System & Landscape Reshaping in the Next Five Years

Hot Tokens Learning Week 7: Privacy Coins Rally in Rotation, with RIVER Standing Out as 2026’s Surprise Performer

The privacy + payments narrative has been the primary catalyst driving rotation and substantial price gains in privacy coins such as DASH and XMR.

16.3k Total ViewsPublished 2026.01.20Updated 2026.01.20

Hot Tokens Learning Week 7: Privacy Coins Rally in Rotation, with RIVER Standing Out as 2026’s Surprise Performer

Hot Tokens Learning Week 8: ADA's Ouroboros Leios Mainnet Expected to Launch in 2026

ADA's Ouroboros Leios mainnet is expected to launch in 2026, and the hard fork to Protocol Version 11 is planned for Q1 2026.

40.1k Total ViewsPublished 2026.02.10Updated 2026.02.12

Hot Tokens Learning Week 8: ADA's Ouroboros Leios Mainnet Expected to Launch in 2026

Discussions

Welcome to the HTX Community. Here, you can stay informed about the latest platform developments and gain access to professional market insights. Users' opinions on the price of S (S) are presented below.