Author: Ada, Deep Tide TechFlow
San Jose, California: live from GTC at the San Jose Convention Center.
NVIDIA Chief Scientist Bill Dally sat on stage across from Google's Jeff Dean. Halfway through their conversation, Dally dropped a number: "Porting a standard cell library of about 2,500 to 3,000 cells used to take a team of 8 engineers about 10 months."
He paused.
"Now it only takes a single GPU card, running overnight."
The audience did not gasp, but those who understood the remark knew what it meant: the work of 8 engineers over 10 months, done overnight by a GPU NVIDIA makes itself. Dally added that the results matched or even exceeded the human designs on all three key metrics: area, power, and delay.
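Dally's comparison is easier to weigh when both sides are put in the same units. The sketch below is a back-of-the-envelope calculation. It assumes roughly 21 working days per month, 8-hour workdays, 30-day calendar months, and an "overnight" run of about 12 hours. These are assumptions, not figures Dally gave.

```python
# Back-of-the-envelope comparison of the two workflows Dally described.
# ASSUMPTIONS (not from Dally): 21 workdays/month, 8-hour workdays,
# 30-day calendar months, and an "overnight" GPU run of about 12 hours.
ENGINEERS = 8
MONTHS = 10
WORKDAYS_PER_MONTH = 21      # assumption
HOURS_PER_WORKDAY = 8        # assumption
DAYS_PER_MONTH = 30          # assumption
OVERNIGHT_HOURS = 12         # assumption

# Human effort expressed in engineer-months and engineer-hours.
engineer_months = ENGINEERS * MONTHS
engineer_hours = engineer_months * WORKDAYS_PER_MONTH * HOURS_PER_WORKDAY

# Elapsed calendar time: about 10 months versus one night.
calendar_hours = MONTHS * DAYS_PER_MONTH * 24
wall_clock_ratio = calendar_hours / OVERNIGHT_HOURS

print(f"{engineer_months} engineer-months = {engineer_hours:,} engineer-hours")
print(f"Elapsed time shrinks roughly {wall_clock_ratio:.0f}x")
```

The exact multiplier depends entirely on the assumed run length. The point is the order of magnitude: hundreds of times faster in elapsed time.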
The next day, news outlets interpreted it as "NVIDIA uses AI to design GPUs."
But the truth of the matter is far more intriguing than the headline.
What is NVIDIA running internally?
What NVIDIA runs internally is no black box. It is a set of toolchains refined over several years.
NVCell is a reinforcement learning-based program built specifically for the grueling work of standard cell library migration. PrefixRL tackles a long-standing research problem: how to structure the parallel prefix (carry-lookahead) networks that compute the carries inside adders. Dally said the circuits this system produces are ones "humans could never have conceived of," with key metrics about 20% to 30% better than human designs.
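For readers unfamiliar with prefix circuits: a fast adder computes all of its carries through a "parallel prefix" network of generate/propagate logic. PrefixRL's job is to search over the shape of that network. The sketch below is not NVIDIA's design. It assumes the classic hand-designed Kogge-Stone topology, which is one fixed point in the space such a search explores. The function name `kogge_stone_add` is hypothetical.

```python
def kogge_stone_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned `width`-bit integers via a Kogge-Stone prefix network.

    Hypothetical helper for illustration. Returns the full sum, including
    the carry-out as bit `width`.
    """
    # Per-bit generate (both inputs 1) and propagate (exactly one input 1).
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]

    # Prefix network: after the level with span d, G[i]/P[i] describe the
    # bit range [max(0, i - 2d + 1), i]. Spans double each level, so there
    # are ceil(log2(width)) levels in total.
    G, P = g[:], p[:]
    span = 1
    while span < width:
        next_G, next_P = G[:], P[:]
        for i in range(span, width):
            # Merge the group ending at i with the group ending at i - span.
            next_G[i] = G[i] | (P[i] & G[i - span])
            next_P[i] = P[i] & P[i - span]
        G, P = next_G, next_P
        span *= 2

    # G[i] is now the group generate of bits 0..i, i.e. the carry out of bit i.
    carries = [0] + G[:-1]  # no carry-in, so the carry into bit 0 is 0
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total | (G[width - 1] << width)  # append the carry-out
```

For `width=8` the loop runs three levels (spans 1, 2, 4). Kogge-Stone minimizes logic depth at the cost of many cells and wires. Sparser prefix topologies such as Brent-Kung use fewer cells but more levels. That area-versus-delay trade-off is what a search like PrefixRL explores.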
Then there are two internal LLMs, ChipNeMo and BugNeMo. NVIDIA fed them the RTL code, architecture documents, and design specifications of every GPU in its history. As Dally described it, this amounts to distilling twenty years of NVIDIA's muscle memory, from G80 to Blackwell, into an internal model. A new hire effectively gets a senior engineer with twenty years of experience on call.
So, can AI design GPUs now?
Not yet. Dally's exact words were: "I would love to one day just say 'design me a new GPU,' but we are still far from that."
NVIDIA did not use AI to design a GPU. But it did something else, something that will leave the entire industry unable to function without it.
$2 Billion Entry into the Heart of EDA
On December 1, 2025, NVIDIA invested $2 billion in Synopsys, one of the three EDA giants. The two parties signed a joint development agreement to embed NVIDIA's accelerated computing stack into Synopsys's entire EDA workflow. Blackwell and the next-generation Rubin GPU are to be deeply integrated with Synopsys.ai.
Synopsys's position needs some explanation. Nearly every advanced-node chip in the world, from Apple's M-series and AMD's MI-series to Google's TPU, is designed on toolchains from either Synopsys or Cadence. These two, plus Siemens EDA, effectively monopolize the underlying tools of chip design. You might not use Qualcomm's chips, and you might not use TSMC's production lines, but you cannot escape the software from these three companies.
Three months after the investment in Synopsys, NVIDIA brought Cadence, Siemens, and Dassault on board, announcing that they are all developing AI-driven chip design tools based on NVIDIA GPUs.
NVIDIA's published benchmarks are staggering. Synopsys PrimeSim runs 30x faster on Blackwell and Proteus 20x faster, while Sentaurus runs 12x faster on B200 than on CPUs. MediaTek used H100 to speed up Cadence Spectre by 6x. Astera Labs used Synopsys plus NVIDIA to accelerate chip verification by 3.5x.
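To translate those multipliers into schedule terms, the sketch below applies each quoted speedup to a hypothetical 24-hour CPU job. The 24-hour baseline is an assumption for illustration only. Real jobs vary widely, and the quoted figures come from different tools, workloads, and GPU generations.

```python
# Vendor-quoted speedups from the paragraph above (GPU vs. CPU baseline).
SPEEDUPS = {
    "Synopsys PrimeSim": 30.0,
    "Synopsys Proteus": 20.0,
    "Synopsys Sentaurus (B200)": 12.0,
    "Cadence Spectre (MediaTek, H100)": 6.0,
    "Verification (Astera Labs)": 3.5,
}

CPU_JOB_HOURS = 24.0  # hypothetical baseline, not a published figure


def accelerated_minutes(cpu_hours: float, speedup: float) -> float:
    """Runtime in minutes after a job is sped up by the given factor."""
    return cpu_hours * 60.0 / speedup


for tool, factor in SPEEDUPS.items():
    print(f"{tool:34s} {accelerated_minutes(CPU_JOB_HOURS, factor):6.1f} min")
```

Under this assumption, a day-long PrimeSim run would finish in under an hour (48 minutes).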
One detail is worth highlighting separately: Cadence's Millennium M2000 platform is labeled as "exclusively based on NVIDIA Blackwell, specifically built for the EDA market."
The word "exclusively" is the most telling. EDA tools used to run on CPUs, a market where Intel and AMD could compete. Now, if you want the fastest EDA, you have to buy NVIDIA cards.
The True Shape of the Flywheel
NVIDIA's flywheel, as most people understand it, goes like this: sell GPUs to AI companies, AI companies train large models, large models prove GPUs are irreplaceable, more people buy GPUs.
This flywheel is scary enough. But there's another layer beneath it.
NVIDIA uses its own tools to design the next generation of GPUs, creating a generational gap in design efficiency, while simultaneously tying the entire industry's EDA toolchain to its own hardware. Competitors want to catch up, but even the tools for catching up must be rented from NVIDIA's ecosystem.
This is precisely the anxiety behind the plunge in AMD's stock after its earnings report. NVIDIA and Synopsys publicly state that the "investment does not carry any obligation to purchase NVIDIA hardware," but the market understands that accelerated EDA features debut on NVIDIA hardware first. AMD and Intel are left with a path "optimized for their biggest competitor's platform."
Imagine an AMD engineer in the future wanting to design a chip to rival Blackwell. They open Synopsys's tool, which runs fastest on NVIDIA GPUs. So, they either endure a design cycle twice as long, or buy a bunch of NVIDIA cards to design the chip meant to defeat NVIDIA.
The shovels are still being sold. But the way they are sold has changed.
The Reality of China's Domestic GPUs
At this point, it's necessary to present some sobering numbers.
In fiscal year 2025, when NVIDIA's net profit exceeded $70 billion, China's GPU "Four Little Dragons" (Moore Threads, MetaX, Biren, and Enflame) were lining up for their IPOs.
Moore Threads' prospectus shows a cumulative net loss of 5 billion yuan from 2022 to 2024, plus another 271 million yuan loss in the first half of 2025. As of June 30, 2025, its accumulated uncovered loss was 1.478 billion yuan. Management itself expects consolidated profitability no earlier than 2027. MetaX is slightly better off, with cumulative losses exceeding 3 billion yuan over three years. The worst is Biren, with losses exceeding 6.3 billion yuan over three and a half years. Its revenue in the first half of 2025 was only 58.9 million yuan, less than one-tenth of Moore Threads' 702 million yuan in the same period.
Look at the R&D intensity. Moore Threads' R&D expenses were 2422.51% of revenue in 2022 and still 309.88% in 2024. In a single year, it spent more than three times its revenue on R&D. This isn't a business in operation; it's life support, sustained by continuous transfusions from the primary market and the recently opened Sci-Tech Innovation Board (STAR Market) listing window.
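The ratios in the two paragraphs above can be checked directly against the figures they quote. The sketch below uses only those numbers. Revenues are in millions of yuan, and the R&D percentages are converted to multiples of revenue.

```python
# Figures quoted above from the companies' prospectuses and reports.
MOORE_H1_2025_REVENUE = 702.0   # million yuan
BIREN_H1_2025_REVENUE = 58.9    # million yuan
MOORE_RD_RATIO_2022 = 24.2251   # 2422.51% R&D expense / revenue
MOORE_RD_RATIO_2024 = 3.0988    # 309.88% R&D expense / revenue

# Biren's half-year revenue as a fraction of Moore Threads'.
biren_share = BIREN_H1_2025_REVENUE / MOORE_H1_2025_REVENUE

print(f"Biren revenue as a share of Moore Threads': {biren_share:.1%}")
print(f"2022: R&D spend = {MOORE_RD_RATIO_2022:.1f}x revenue")
print(f"2024: R&D spend = {MOORE_RD_RATIO_2024:.2f}x revenue")
```

Biren's revenue works out to about 8.4% of Moore Threads', and the 2024 R&D ratio to about 3.1x revenue.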
On tools, the chokehold is even tighter. Empyrean Technology's (Huada Jiutian) 2022 IPO prospectus showed that its tools only partially support 5nm advanced processes. Primarius Technologies (Gailun Electronics) can cover the 7nm, 5nm, and 3nm nodes, but it only makes point tools, far from a full flow.
Empyrean founder Liu Weiping was very candid: "Domestic EDA still has obvious deficiencies in supporting advanced processes, especially current ones like 7nm, 5nm, and 3nm. Currently, domestic EDA can reach the 14nm level. Although the 7nm process has been mastered, deeply integrating 7nm with practical applications still requires a concerted effort from the entire industry chain."
In other words, China has essentially no usable domestic full-flow EDA for advanced processes, and its GPU companies still design their chips with Synopsys and Cadence tools. In 2025, Trump announced export controls on all critical software. Although these were never substantively implemented, EDA tools for advanced processes below 7nm remain under strict control. If the license is cut off, the switch is in someone else's hands.
The capital market's reaction is surreal. On its listing day, MetaX's stock closed at 829.9 yuan, a single-day gain of 692.95%. After Moore Threads went public, its share price at one point rose to the third highest in the A-share market, behind only Kweichow Moutai and Cambricon. Based on the share price at the time, some media put its total market capitalization at about 359.5 billion yuan.
The real business behind the numbers is this: a group of companies that are still burning cash and losing money, and still depend on controlled foreign toolchains to design their chips, are being priced in the secondary market as the heirs to the title of "China's NVIDIA."
Meanwhile, the tools these companies use to design chips are becoming part of the NVIDIA ecosystem. NVIDIA's $2 billion tie-up with Synopsys and the "exclusively based on NVIDIA Blackwell" label on Cadence's Millennium M2000 make the very act of catching up a paradox.
A Complete Chain from Design to Manufacturing
Back to that GTC conversation.
Dally was modest throughout. NVIDIA has been saying "AI is still far from designing chips on its own" for four or five years, but the phrasing changes every year. Four years ago it was "AI can assist design," three years ago it was "AI can automate certain steps," and this year it's "overnight, it does what took 8 people 10 months." Each year NVIDIA pushes one step forward and leaves behind a new "still far from the ultimate goal." Three years later, the old "still far" has been achieved, and the new "still far" sits at a place no competitor can yet reach.
What NVIDIA has done in the past twelve months is essentially one thing: applying AI to the most valuable, deepest moat segments of the chip industry chain, and then selling these tools layer by layer to the entire industry.
The front end of chip design is being taken over by internal LLMs such as ChipNeMo. Mid-stage tasks such as standard cell library migration and layout optimization are being taken over by NVCell and PrefixRL. The entire EDA toolchain is being tied to NVIDIA's GPUs through the $2 billion Synopsys deal and Cadence's "exclusively based on Blackwell" platform. On the manufacturing end, computational lithography is being taken over by cuLitho, which TSMC already uses.
From design to manufacturing, every segment has been re-done by NVIDIA using AI. Every segment ultimately leads to the same conclusion: if you want the fastest tools, you have to buy NVIDIA's cards.
For all opponents who want to build a chip that can defeat Blackwell, the most awkward thing has already happened. The EDA tools needed to design this chip have their fastest versions running on NVIDIA GPUs; the computational lithography needed to manufacture this chip has its fastest algorithm library provided by NVIDIA; the compute power used to train the design AI is still NVIDIA's cards.
The one you want to defeat is now renting you all the tools needed to defeat it. The rent is paid yearly, and the contract price increases annually.