Tsinghua's Prediction 2 Years Ago Is Becoming Global Consensus: Meta and Two Other Major AI Institutions Have Reached the Same Conclusion

marsbitPublished on 2026-04-13Last updated on 2026-04-13

Abstract

Summary: In a remarkable validation of Chinese AI research, Meta and METR have independently reached conclusions that align perfectly with the "Density Law" proposed by a Tsinghua University and FaceWall Intelligent team two years ago. Published in Nature Machine Intelligence in late 2025, the law states that the computational power required to achieve a specific level of AI performance halves every 3.5 months. This convergence was starkly evident in April 2026. METR reported that AI capabilities are doubling every 88.6 days, while Meta's new model, Muse Spark, demonstrated it could match the performance of a model from the previous year using less than one-tenth of the training compute. When plotted, the growth curves from all three sources—using different metrics (parameters, compute, task length)—show an almost identical exponential slope. The findings have profound implications: AI inference costs are collapsing faster than anticipated, powerful edge-computing AI is becoming rapidly feasible, and the industry's strategy of simply scaling model size is becoming economically inefficient. The Chinese team, which has been building its "MiniCPM" model series based on this law since 2024, is seen as having a significant two-year lead in practical engineering experience, marking a rare instance where Chinese researchers pioneered a fundamental predictive trend in AI.

【Insight】Too crazy! The AI evolution data recently measured by Meta and METR perfectly aligns with the "Density Law" proposed by a Chinese team two years ago. Silicon Valley suddenly realized that Chinese researchers have been leading this path for two years!

Three of the world's most serious AI research institutions collectively collided in the past week!

On April 3, the American research institution METR quietly updated a technical report, with the core conclusion compressed into one sentence.

AI capabilities double every 88.6 days.

Five days later, on April 8, Meta's Super Intelligence Lab released a new model, Muse Spark, and disclosed an internal training efficiency curve called the scaling ladder, with the conclusion also being one sentence.

To catch up with the performance of Llama 4 Maverick from a year ago, the new model requires less than one-tenth of the training compute.

One measured task duration, the other measured training compute. The two institutions had no interaction, and their research methods had no overlap.

But when the two curves were converted to the same coordinate system, their slopes were almost identical.

At this point, things were already bizarre enough.

Even more bizarre is that this curve was completely drawn by a Chinese team two years ago and even published in a Nature sub-journal.

It's called the Density Law.

Two Years Ago, Someone Drew This Line in Advance

This concept first appeared in a paper called "Densing Law of LLMs".

The authors were a joint team from ModelBest and Tsinghua University, led by Professors Sun Maosong and Liu Zhiyuan, with doctoral student Xiao Chaojun as the first author.

The paper was posted on arXiv in December 2024 and accepted by Nature Machine Intelligence in November 2025.

Paper address: https://arxiv.org/abs/2412.04315

Paper address: https://www.nature.com/articles/s42256-025-01137-0

The core assertion of the paper is just one sentence.

Model intelligence density increases exponentially over time, and the number of parameters required to reach a specific intelligence level halves every 3.5 months.

Back in late 2024, this sounded a bit radical.

At that time, the entire industry was worshipping the scaling law. OpenAI was scaling models, Anthropic was scaling models, Meta was scaling models.

Everyone thought that bigger parameters meant stronger intelligence, and burning GPUs to the extreme was the right way.

But the research team didn't see it that way.

They put all influential open-source foundation models at the time, from Llama-1 all the way to Gemma-2 and MiniCPM-3, a total of 51 models, into the same ruler to measure.

After running five major benchmarks, the result was an almost perfect exponential relationship, with an R² of 0.934.

Considering that LLM evaluation is easily contaminated by data pollution, they retested using a newly constructed contamination-filtered dataset, MMLU-CF. R²=0.953.

Both fits achieved an R² close to 1. Statistically, this is almost impossible to be a coincidence.

In other words, every mainstream open-source model released in these two years, regardless of which team it came from or what architecture it used, fell on the same exponential line of "doubling every 3.5 months".

Up to this point, the story was just "a Chinese team proposed a seemingly radical empirical rule".

What truly made this a "moment" was what happened in the next half year.

Three Institutions, Three Methods, One Slope

Laying out the conclusions from ModelBest, Meta, and METR:

  • ModelBest's Density Law measures "how many parameters are needed for the same intelligence level". The conclusion is that the parameter requirement halves every 3.5 months.
  • Meta's scaling ladder measures "how much training compute is needed for the same intelligence level". The conclusion is that Muse Spark saves an order of magnitude compared to Llama 4 Maverick from a year ago.
  • METR's time horizon report measures "how long a task a model can handle". The conclusion is that the task length doubles every 88.6 days.

Three rulers. Three academic institutions. Three research paths with no overlap whatsoever.

But when all the numbers are converted and viewed in the same coordinate system, the slopes of their curves are almost identical.

The most easily overlooked point about this is that the Density Law was the first proposed among these three. It was nearly two years earlier than Meta's scaling ladder and more than a year earlier than METR's complete modeling.

And when Meta drew that scaling ladder in their early April blog post, they probably didn't realize it themselves. The shape of this graph was almost the same line as the curve on a PPT from an academic conference in Beijing in 2024.

What Kind of Observation Deserves to Be Called a "Law"

In the scientific community, there is an unwritten standard to judge whether an empirical observation qualifies as a "law".

It's not about how beautiful the data is, but whether it holds true across multiple independent measurement systems.

The reason Moore's Law is a law is because the semiconductor industry has verified it over decades from three completely different dimensions: lithography precision, transistor density, and cost per unit of computing power.

The Density Law follows the same path.

It was initially just a fitting curve from a single team. By the time it was accepted by the Nature sub-journal, it could already be reproduced on contamination-filtered datasets. And this month, it was independently verified twice more in Meta's training data and METR's task evaluations.

Viewed in a larger coordinate system, this moment is very much like when electricity first entered New York in the 1880s.

Back then, it was also different inventors, different engineers, different cities, each working on their own power grids. It wasn't until someone plotted the development curves of all the projects on one sheet of paper that people realized. This wasn't a few scattered engineering advances; this was a new era quietly unfolding.

Only this time, it took less than a year from the paper's publication to its verification by global peers.

Three Inferences, Each Rewriting Industry Assumptions

If the Density Law holds, it will rewrite many things simultaneously.

First, inference costs will crash faster than anyone expected.

One inference of the Density Law is that the inference cost for LLMs achieving the same performance roughly halves every 2.6 months.

Today, this rate of decline has already been exceeded by reality.

The latest tracking data from Epoch AI shows that the token price for LLMs achieving Claude 3.5 Sonnet performance level has dropped 400 times in the past year. The fastest decline for the same performance tier reached 900x/year.

The level that GPT-3.5 priced at $20 per million tokens in late 2022, today Mistral Nemo charges only $0.02, 1000 times cheaper, and the model is even stronger.

In retrospect, the prediction in the paper was conservative.

Second, the explosion point for on-device AI is closer than anyone thought.

Multiplying the Density Law by Moore's Law yields an even more exciting number.

According to current estimates, the maximum effective model size that can run on chips at the same price roughly doubles every 88 days.

This number is almost identical to the 88.6 days calculated by METR. Two completely different calculation paths collided after the decimal point.

In the next three to five years, running a current top-tier GPT-level model on an ordinary laptop or even a mobile phone may no longer be science fiction.

Third, the optimal strategy for the large model industry is quietly reversing.

For the past three years, the industry's understanding of the scaling law has remained at "stack parameters, stack data."

But the Density Law offers a counterintuitive judgment. Given the continuous exponential growth in density, any state's strongest model only has an optimal window of a few months.

Throwing all resources into training a larger model, only to be surpassed by a new model half the size three months later, is economically unwise.

The truly sustainable path is to invest resources in improving the density itself. Better architectures, higher quality data, smarter training algorithms.

ModelBest, Has Been Walking Along the Ruler They Drew

It's worth mentioning that the Density Law is not a paper that ended after publication.

ModelBest, which proposed this theory, has been verifying it with their own "Small Cannon" MiniCPM series models for the past two years.

When MiniCPM-1-2.4B was released in February 2024, its benchmark scores could match or exceed Mistral-7B from September 2023. That is, in four months, with 35% of the parameters, it achieved equivalent performance.

This number was directly written into the Nature sub-journal paper as the first empirical case of the Density Law.

Since then, the Small Cannon series has been open-sourced all the way, covering four major directions for parameters below 10B: text, multimodal, speech, and full-modality. This level of open-source completeness domestically, besides Alibaba, only ModelBest has achieved.

So far, the global open-source downloads of the Small Cannon series have exceeded 24 million.

It is not the largest model in the industry. But it is the first team in the industry to implement "density first" as a company methodology.

And when Meta and METR verified the Density Law in their respective ways in April 2026, this Chinese company that started training models using this methodology back in 2024 actually had a two-year lead in engineering experience.

This Time, Chinese Researchers Are at the Starting Point of the Curve

A theoretical framework proposed by a Chinese research team two years ago is being rediscovered, in their own ways, by overseas institutions like Meta and METR, the most serious players.

The weight of this matter may take some time to fully understand.

It is not a "we can do it too" story. It is a "we saw it a bit earlier" story.

There aren't many such moments in the history of science. A judgment doubted in 2024 became the same curve pointed to by multiple independent pieces of evidence in 2026.

This kind of "coincidence" across regions, methods, and institutions has happened a few times in physics, each time marking the end of an old paradigm and the beginning of a new one.

This time, Chinese AI researchers are standing at that starting point.

And that curve is still rising, doubling every 88 days.

References:

The "Density Law"首创 (first proposed) by ModelBest gains recognition from top overseas institutions like Meta

https://arxiv.org/abs/2412.04315

https://www.nature.com/articles/s42256-025-01137-0

https://metr.org/blog/2026-1-29-time-horizon-1-1/

https://ai.meta.com/blog/introducing-muse-spark-msl/

This article is from the WeChat public account "新智元" (New Zhiyuan), edited: 好困 (Hao Kun) 桃子 (Tao Zi)

Related Questions

QWhat is the 'Density Law' proposed by the Chinese research team from Tsinghua University and ModelBest, and what is its core finding?

AThe 'Density Law' is a theory proposed by a joint team from Tsinghua University and ModelBest (面壁智能). Its core finding is that the intelligence density of AI models increases exponentially over time, with the number of parameters required to achieve a specific level of intelligence halving approximately every 3.5 months.

QWhich three independent research institutions recently reached conclusions that align with the Density Law, and what did each one measure?

AThe three institutions are ModelBest (the original proposer), Meta's Super Intelligence Lab, and METR. ModelBest measured the parameter efficiency (parameters needed for a given intelligence level). Meta measured training compute efficiency (compute needed for a given intelligence level). METR measured the expansion of AI capabilities over time (task horizon length doubling every 88.6 days).

QAccording to the article, what are three major implications of the Density Law for the AI industry?

A1. Inference costs are collapsing faster than expected, potentially halving every ~2.6 months. 2. The arrival of powerful on-device AI is much closer than previously thought, as smaller models can achieve performance previously requiring large data centers. 3. The optimal strategy for AI companies is shifting from simply scaling up model size (scaling law) to focusing on improving intelligence density through better architectures, data, and algorithms.

QHow did the MiniCPM model series from ModelBest serve as an early validation of the Density Law?

AThe MiniCPM-1-2.4B model, released in February 2024, achieved performance comparable to the Mistral-7B model from September 2023. This demonstrated that in just four months, a model with only 35% of the parameters could achieve the same level of performance, providing an early empirical case for the Density Law's prediction of rapidly increasing parameter efficiency.

QWhat is the significance of multiple independent institutions arriving at the same conclusion as the Density Law using different methods?

AThis convergence of evidence from independent paths (parameter efficiency, training compute, and capability growth) significantly strengthens the validity of the Density Law. In science, a principle gains the status of a 'law' when it holds across multiple, independent measurement systems, similar to how Moore's Law was validated across different aspects of semiconductor progress. It suggests a fundamental shift in the AI development paradigm is underway.

Related Reads

Trading

Spot
Futures

Hot Articles

Discussions

Welcome to the HTX Community. Here, you can stay informed about the latest platform developments and gain access to professional market insights. Users' opinions on the price of S (S) are presented below.

活动图片