The CPU Returns to the Table as a $170 Billion "Power Grab" Begins

marsbit | Published on 2026-06-19 | Last updated on 2026-06-19

Abstract

A new era is dawning for the server CPU (Central Processing Unit), driven by the shift from AI model training to large-scale inference and the rise of Agentic AI. This article explores how the CPU is reclaiming a central role in the AI data center. For years, the focus has been on the GPU (Graphics Processing Unit) for AI training. However, as AI moves to the inference and Agent phase, where tasks involve complex, multi-step reasoning, tool calls, and data management, the workload balance is flipping. Industry estimates put the CPU's share of the workload in Agentic AI at over 70%, up from 10-30% in training. One reason is that Agent tasks generate massive intermediate data (KV Cache) that exceeds GPU memory, forcing it to be offloaded to the CPU's larger, more scalable memory pools. This increased importance is translating into market changes. Major players are taking note: NVIDIA launched its first standalone CPU line, Vera, based on the ARM architecture and optimized for Agent performance. AMD doubled its server CPU market forecast to over $120 billion by 2030. Analyst reports project the total server CPU market could reach $170 billion by 2030, with AI-driven demand being a primary driver. Furthermore, the classic ratio of CPUs to GPUs in AI servers is rapidly changing, converging from 1:8 toward 1:1 for Agent deployments. This surge in demand has led to a rare industry-wide price increase of 10-15% for server CPUs from Intel and AMD, breaking a decade-long trend of "more performance for ...

On June 1st, at the GTC Taipei 2026 conference held during Computex Taipei, NVIDIA unveiled the Vera CPU. OpenAI and Anthropic are among the first customers of its newly released next-generation AI supercomputing platform, Vera Rubin.

This marks NVIDIA's first foray into a standalone CPU product line. NVIDIA's growth over the past 20 years has been almost entirely built on GPUs. NVIDIA CEO Jensen Huang stated at the launch that in the era of AI agents, the CPU has become a key bottleneck in data center performance, and the speed of token production in AI factories cannot be slowed down by the CPU.

Earlier in May, AMD CEO Lisa Su announced during an earnings call that she was doubling the market size forecast for server CPUs from $60 billion to over $120 billion. The corresponding compound annual growth rate from 2025 to 2030 was raised from 18% to 35%.

According to IDC statistics, the global server market reached $444.1 billion in 2025, a year-on-year increase of 80.4%, with AI servers contributing the majority of the growth. UBS predicted in a recent semiconductor industry report that the total addressable market (TAM) for server CPUs would grow from approximately $30 billion in 2025 to about $170 billion in 2030, a nearly fivefold increase in five years.
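As a rough consistency check, the growth rates implied by the AMD and UBS figures quoted above can be recomputed directly. The sketch below is illustrative arithmetic only: the function names are our own, it assumes simple annual compounding over the five years from 2025 to 2030, and it treats "over $120 billion" as exactly $120 billion.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


def implied_base(end, rate, years):
    """Starting value implied by an end value and a compound growth rate."""
    return end / (1 + rate) ** years


YEARS = 5  # 2025 -> 2030

# AMD: old forecast ($60B at 18% CAGR) vs. new forecast ($120B at 35% CAGR)
amd_old_base = implied_base(60, 0.18, YEARS)
amd_new_base = implied_base(120, 0.35, YEARS)

# UBS: about $30B in 2025 growing to about $170B in 2030
ubs_cagr = cagr(30, 170, YEARS)

print(f"AMD implied 2025 base (old forecast): ${amd_old_base:.1f}B")
print(f"AMD implied 2025 base (new forecast): ${amd_new_base:.1f}B")
print(f"UBS implied CAGR: {ubs_cagr:.1%}")
```

Both AMD forecasts imply a 2025 base of roughly $26 billion to $27 billion, so the two pairs of figures are mutually consistent. The UBS range corresponds to a 2030 market about 5.7 times the 2025 size, or a compound rate a little above 41% a year.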

Data from market research firm Mercury Research shows that in Q1 2026, AMD's revenue share in server CPUs reached 46.2%, while Intel's was 53.8%. However, AMD's unit shipment share was only 33.2%, with Intel still holding 66.8%. In other words, AMD captured nearly half of the revenue with about a third of the chips, which implies a markedly higher average selling price. The premium pricing power of high-core-count products was particularly evident this quarter.
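The pricing gap implied by the Mercury Research shares can be estimated by dividing each vendor's revenue share by its unit share. This is a back-of-the-envelope sketch: the helper name is our own, and it assumes the two vendors together account for the whole market measured, as the quoted shares suggest.

```python
def revenue_per_unit_index(revenue_share, unit_share):
    """Revenue share divided by unit share: above 1 means above-average price."""
    return revenue_share / unit_share


amd_index = revenue_per_unit_index(46.2, 33.2)
intel_index = revenue_per_unit_index(53.8, 66.8)

# Ratio of the two indices approximates AMD's average price relative to Intel's
asp_ratio = amd_index / intel_index

print(f"AMD index: {amd_index:.2f}, Intel index: {intel_index:.2f}")
print(f"AMD average selling price relative to Intel: {asp_ratio:.2f}x")
```

On these figures, AMD's average server CPU sold for roughly 1.7 times the price of Intel's in the quarter.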

Lin Meibing, chief analyst at ChipTalk ICTIME, told the Economic Observer that the CPU is the most unexpected variable in the current AI cycle. As AI moves from dialogue to Agents (intelligent agents), the demand for CPUs in inference has already surpassed that in training.

GPUs Are "Waiting" for CPUs

Intel and Georgia Tech jointly published a paper in November 2025 titled "A CPU-Centric Perspective on Agentic AI." In this paper, the research team tested five typical Agent workloads. The results showed that the time spent on CPU-side tool processing accounted for 43.8% to 90.6% of the total latency.
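To make the metric concrete, the sketch below shows one way a "CPU-side share of latency" could be computed from a per-step trace of an Agent request. It is not the paper's methodology. The `Step` record, the `tool_latency_share` helper, and the timings are all hypothetical and exist only to illustrate the ratio.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str       # "llm" for GPU-side model calls, "tool" for CPU-side tool work
    seconds: float  # wall-clock time spent in this step


def tool_latency_share(trace):
    """Fraction of total request latency spent in CPU-side tool steps."""
    total = sum(step.seconds for step in trace)
    tool = sum(step.seconds for step in trace if step.kind == "tool")
    return tool / total if total else 0.0


# Made-up trace for a single agent request (not measured data)
trace = [
    Step("llm", 0.5),
    Step("tool", 0.75),
    Step("llm", 0.5),
    Step("tool", 1.25),
]

share = tool_latency_share(trace)
print(f"CPU-side tool share of latency: {share:.1%}")
```

In this invented trace, two thirds of the request time is spent in tool steps, which falls inside the 43.8% to 90.6% range the paper reports.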

A securities analyst who has long followed the semiconductor sector said that during the large model training phase, the CPU's workload share is typically only about 10-30%, possibly approaching 40% for certain workloads, with the vast majority of computation handled by GPUs. This is because AI model training consists of highly regular computation: billions of parameters go through matrix multiplications again and again over massive datasets. The parallel architecture of GPUs is designed for such tasks, while CPUs are responsible for data loading, communication scheduling, and copying results, not the core matrix operations.

However, this ratio flips during the inference stage. The CPU's workload share rises to over 70%, and even higher in Agent scenarios. This is because Agent tasks require multi-step reasoning, calling external tools, executing code, reading/writing databases, searching the web, and then orchestrating intermediate results into a final output.

Programming assistants, data analysis tools, and automated research agents fall into this category and are currently the fastest-growing scenarios in large model applications. The common characteristic of these tasks is control flow intensity, complex branching, and frequent input/output operations. GPU utilization significantly declines when faced with such serialized, fragmented tasks.

Multiple industry insiders stated that in Agent tasks, overall GPU utilization is generally below 50%, far lower than the 70% to 85% seen in traditional inference services. The token consumption for AI deployment in Agent mode is typically 20 to 30 times that of ordinary conversations, as a single user interaction often involves dozens of tool calls and intermediate reasoning steps.

IDC predicts that the global number of Agent-executed tasks per year will grow from about 44 billion in 2025 to over 400 trillion in 2030.

Intel management stated during the Q1 2026 earnings call that in the era of AI agents, the number of CPU cores required per gigawatt of power may increase from the current approximately 30 million to 120 million. Market research firm Gartner also predicts that by 2027, 40% of Agent projects will be scaled back or canceled due to infrastructure cost overruns, with a significant portion of these overruns stemming from the ongoing tool invocation and context management overhead on the CPU side.

Agents generate a large amount of intermediate data when handling long conversations and complex tasks. AI systems need to remember all previous dialogue content and tool call results during the reasoning process. This stored state is known in the industry as the KV Cache (Key-Value Cache). It keeps growing with each conversation turn, but the on-package memory of GPUs is very limited. NVIDIA's H100 has only 80GB, and the newer B200 has 192GB. The intermediate data generated by a complex Agent task can easily exceed this limit.
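A rough sizing shows why the cache outgrows GPU memory. For a standard transformer, the KV Cache stores one key and one value vector per layer and per KV head for every token. The sketch below applies that formula to a hypothetical model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values). These parameters are assumptions chosen for illustration, not figures from the article or from any specific product.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Keys and values (the factor 2) for every layer and KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape (assumption, not from the article)
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
per_session = kv_cache_bytes(128_000, LAYERS, KV_HEADS, HEAD_DIM)

H100_BYTES = 80 * 10**9  # 80GB as quoted above, in decimal gigabytes
sessions_that_fit = H100_BYTES // per_session

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache per 128k-token session: {per_session / 10**9:.1f} GB")
print(f"Sessions that fit in 80 GB (ignoring model weights): {sessions_that_fit}")
```

Under these assumptions a single 128,000-token session needs about 42 GB, so an 80GB GPU could hold only one such session even before counting the model weights.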

Currently, the common industry practice is to move this intermediate data from GPU memory to the CPU side. CPUs use external DDR5 memory, with per-socket capacities reaching several terabytes, one to two orders of magnitude larger than GPU memory.
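The offloading idea can be sketched as a two-tier cache: a small, fast tier standing in for GPU memory and a large tier standing in for CPU-attached DRAM. The class below is a toy model with a least-recently-used eviction policy. `TieredKVCache` and its methods are hypothetical names, and production inference servers use more elaborate policies and real device-to-host transfers.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small 'GPU' tier backed by a larger 'host' tier."""

    def __init__(self, gpu_capacity_blocks):
        self.gpu_capacity = gpu_capacity_blocks
        self.gpu = OrderedDict()  # block_id -> data, least recently used first
        self.host = {}            # blocks offloaded to CPU-side memory

    def put(self, block_id, data):
        """Store a block in the GPU tier, offloading old blocks if needed."""
        self.host.pop(block_id, None)
        self.gpu[block_id] = data
        self.gpu.move_to_end(block_id)
        self._evict()

    def get(self, block_id):
        """Return a block, pulling it back from the host tier on a miss."""
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        data = self.host.pop(block_id)  # raises KeyError for unknown blocks
        self.put(block_id, data)
        return data

    def _evict(self):
        """Move least recently used blocks to the host tier until within capacity."""
        while len(self.gpu) > self.gpu_capacity:
            victim, data = self.gpu.popitem(last=False)
            self.host[victim] = data


cache = TieredKVCache(gpu_capacity_blocks=2)
for block in ("turn-1", "turn-2", "turn-3"):
    cache.put(block, f"kv for {block}")

print(sorted(cache.host))  # the oldest block has been offloaded
cache.get("turn-1")        # reloading it pushes another block out
print(sorted(cache.host))
```

The design choice worth noting is that nothing is discarded: blocks that no longer fit in the fast tier are kept in the larger tier and fetched back on demand, which is why host memory capacity matters so much for long-running Agent sessions.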

The CXL (Compute Express Link) industry consortium, comprising chip manufacturers like Intel, AMD, and ARM, released the CXL 4.0 protocol in November 2025. This open standard for high-speed interconnection between chips allows multiple CPUs to share a single large-capacity memory pool, reducing the overhead of moving data between chips.

Thus, CPUs are no longer solely responsible for task scheduling; they also handle data storage and memory management during the AI inference process.

Additionally, CPUs themselves have undergone intensive technological upgrades in recent years. The core count in server CPUs has risen from 28 cores in 2017 to 288 cores (Intel Clearwater Forest) and 256 cores (AMD Venice) in 2026, roughly a tenfold increase in cores per chip.

Intel introduced the AMX (Advanced Matrix Extensions) instruction set in 2023, giving CPUs dedicated matrix computing units for the first time. According to Intel's test data, in deep learning inference scenarios, the 4th Gen Xeon Scalable processors with AMX showed up to a 10x improvement in AI performance compared to the previous generation. The memory subsystem was also upgraded from DDR4 to DDR5, doubling the bandwidth and capacity per platform.

The increase in core count and the instruction set upgrades coincide with changes in the CPU-to-GPU ratio. Intel CEO Lip-Bu Tan said during the Q1 2026 earnings call that in training scenarios, the typical ratio is 7 to 8 GPUs per CPU. In inference scenarios, this converges to 3 to 4 GPUs per CPU, and in Agent scenarios, it is expected to further converge to 1:1.

Intel CFO David Zinsner added during the same call that the industry's overall CPU-to-GPU ratio has converged from the previous 1:8 to approximately 1:4.
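The effect of these ratios on CPU demand is easy to work through. The sketch below takes the upper end of each range quoted on the earnings call (8, 4, and 1 GPUs per CPU) and applies it to a hypothetical cluster of 1,024 GPUs. The cluster size and the helper name are assumptions for illustration only.

```python
import math

# GPUs served by one CPU in each scenario, per the ratios quoted above
GPUS_PER_CPU = {"training": 8, "inference": 4, "agent": 1}


def cpus_needed(gpu_count, gpus_per_cpu):
    """CPUs required for a GPU fleet at a given GPUs-per-CPU ratio, rounded up."""
    return math.ceil(gpu_count / gpus_per_cpu)


GPU_COUNT = 1024  # hypothetical cluster size

for scenario, ratio in GPUS_PER_CPU.items():
    print(f"{scenario}: {cpus_needed(GPU_COUNT, ratio)} CPUs for {GPU_COUNT} GPUs")
```

For the same 1,024 GPUs, the CPU count goes from 128 in training to 256 in inference and 1,024 in Agent deployments, an eightfold increase from one end to the other.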

First Major Price Hike in Over a Decade

The aforementioned ratio change has already translated into product pricing.

Jia Bin, head of marketing at a CPU distributor in Shenzhen, told the reporter that since February 2026, Intel and AMD have successively raised prices across their server CPU product lines, with overall increases ranging from 10% to 15%. The spot premium for some high-end AI server CPUs is even higher, and there might be another round of price increases in the second half of the year.

Jia Bin said that for over a decade, server CPUs essentially followed a "more performance for the same price" trend, with performance improving with process nodes while unit prices remained stable. This year's price increase is rare in the industry. The capacity utilization rate of Intel's main production lines has risen from less than 80% previously to 100%, with multiple models in short supply and lead times extending to 3-4 months.

AMD is also facing tight capacity. Jia Bin said that 2026 is the first time since he entered the industry that he has seen Intel's and AMD's server CPU capacity essentially fully booked. "In the past, CPU supply was always sufficient; this year, it's the opposite."

Jia Bin also noted that customer demand for CPUs when procuring AI servers is splitting into two categories. One is CPUs deployed inside the rack to work alongside GPU computing, which pursue extreme core counts (128+ cores), with an average price above $4,000. Traditional server CPUs average around $2,000. The other category is CPUs for independent deployment outside the rack, used for Agent tool execution, sandbox operation, and task orchestration. These don't require extreme performance; around 64 cores are sufficient, but the required quantity is much larger.

Jia Bin explained that ideally, each Agent task exclusively uses one CPU. Independent deployment is more efficient than virtualization partitioning. The average price for these "outside-the-rack" CPUs is about $3,000. "The unit price increase is greater for higher core counts, not proportional. So, the common practice among customers now is to use mid-range products for volume deployment outside the rack and flagship products for performance inside the rack."

In a semiconductor industry report titled "Rise of the Agents" released on June 11th, Bank of America Securities (BofA Securities) raised its forecast for the total addressable market (TAM) of server CPUs in 2030 to over $170 billion. For the first time, it segmented this market into three parts: traditional cloud computing CPUs (~$30 billion), AI cluster head node CPUs (~$70 billion), and AI agent independent node CPUs (~$70 billion). Among these, the third part had a size close to zero in 2025 and is a completely new market emerging in 2026.

Morgan Stanley also predicted in a report on June 4th that agentic AI would bring $32.5 billion to $60 billion in new demand for the server CPU market by 2030. Zhongtai Securities stated in a CPU in-depth research report released on June 7th that 2026 marks the "first year of CPUs benefiting from AI scale-out."

The aforementioned BofA Securities report also provided a historical comparison of shipment volumes: In 2022, AI CPU shipments were equivalent to 19% of AI accelerator (GPU, etc.) shipments. By 2025, this ratio rose to 51% and is projected to reach 127% by 2030. According to this prediction, the number of CPUs in AI servers will surpass that of GPUs within five years.

New Demand for Domestic CPUs

Information released by NVIDIA during Computex Taipei indicates that its newly unveiled Vera CPU is based on the ARM architecture (a CPU instruction set architecture known for low power consumption and high efficiency, one of the two mainstream architectures alongside x86). Up to 256 units can be deployed per rack, using liquid cooling.

In Agent sandbox scenarios, Vera's performance is 1.8 times that of x86 processors. In NVIDIA's latest Vera Rubin supercomputing cluster (NVIDIA's next-generation AI data center platform), a 40-rack POD (the smallest complete computing unit composed of multiple racks) contains 1152 Rubin GPUs and up to 1088 Vera CPUs, achieving a ratio close to 1:1.

NVIDIA also mentioned that its previously released Grace CPU has cumulatively shipped nearly 2.5 million units, and CPU-related revenue in 2026 is expected to approach $20 billion.

Jia Bin believes that the statistical scope for the aforementioned $20 billion is broad, encompassing CPU revenue attribution across various product forms, which is not entirely the same as the revenue from selling standalone CPU chips in the traditional sense. But even considering differences in scope, this volume is significant for a company that did not have an independent CPU business in 2024.

Lin Meibing believes that the symbolic significance of NVIDIA making CPUs is greater than the product itself. In the past, AI servers were GPU-centric, with CPUs merely serving as supporting components. When the world's largest GPU company starts making CPUs itself and locks in OpenAI and Anthropic as its first customers, the market status of the CPU has fundamentally changed compared to two years ago.

According to AMD's Q1 2026 financial report, the company's Data Center segment revenue reached $5.775 billion, surpassing Intel's $5.1 billion in the same period for the first time. Furthermore, Lisa Su set a five-year goal during the earnings call: to move towards $100 billion in annual Data Center revenue.

Intel CEO Lip-Bu Tan has also repeatedly expressed in public his firm confidence in the core role of CPUs in the AI era.

This also presents an opportunity for China's CPU industry chain companies. Jia Bin stated that domestic leading cloud vendors are increasing their procurement of server CPUs this year. On one hand, this is to accompany GPU purchases for newly built AI data centers. On the other hand, it's because the CPU-to-GPU ratio has converged from the previous 1:8 to 1:4 or even higher, meaning the same data center requires more than double the number of CPUs compared to last year.

In fact, a relatively complete industrial chain around server CPUs has already formed domestically.

Hygon Information Technology (688041.SH) is one of the largest domestic vendors in terms of x86 architecture server CPU shipments. According to relevant financial reports, Hygon's revenue in 2025 was 14.377 billion yuan, a year-on-year increase of 56.92%. Its Q1 2026 revenue was 4.034 billion yuan, with the growth rate accelerating further to 68.06% year-on-year.

According to public information, Huawei's Kunpeng follows a full-stack self-developed ARM route. The Kunpeng 920/950 deeply synergizes with the Ascend AI chips, primarily serving Huawei's own ecosystem and the information innovation (信创) market.

In terms of supporting chips, Montage Technology's (688008.SH) main products are memory interface chips (signal relay chips between server CPUs and memory modules). According to public information, its memory interface chips held a 36.8% global market share in 2024, ranking first worldwide. Another product line, PCIe Retimer chips (used for signal amplification and repair in high-speed data transmission), held a 10.9% global market share in 2024, ranking second.

In the packaging, testing, and manufacturing segment, according to public information, Tongfu Microelectronics (002156.SZ) is one of AMD's most important packaging and testing partners globally.

Li Bin told the reporter that the software ecosystem for domestic chips is nearing a tipping point. He gave an example: on the day DeepSeek V4 was released, multiple domestic chip manufacturers completed adaptation the same day, whereas adaptation for the earlier DeepSeek R1 took 1 to 2 months. The significant acceleration in adaptation speed indicates that the software toolchains and driver layers for domestic chips are maturing rapidly, which benefits the entire domestic CPU and accelerator industry chain.

In Lin Meibing's view, the benefit logic for domestic CPUs operates on two levels: one is the industry growth driven by the global increase in server CPU demand, and the other is the domestic substitution driven by the information innovation (信创) policy.

According to relevant documents issued by the State-owned Assets Supervision and Administration Commission (SASAC) in 2022, central state-owned enterprises are required to complete the domestic transformation of their information systems by the end of 2027. The reporter also learned during the interview process that the domestic substitution rate for high-end server CPUs in China is still relatively low, indicating vast room for replacement. With less than two years until the policy deadline, the delivery window for information innovation (信创) CPUs is narrowing, presenting a concentrated test of the product maturity and shipment capabilities of domestic CPU manufacturers like Hygon and Loongson Technology (688047.SH).

Lin Meibing believes that the current CPU price increase cycle is different from the past. The increment comes from the entirely new demand for CPUs driven by AI Agents, not from replacement demand driven by process node upgrades.

Ying Zhiwei's judgment is similar. He said that for the past few years, market attention has been almost entirely focused on GPUs. But when AI applications truly enter the large-scale deployment stage, the scheduling and management functions undertaken by CPUs will only become more important. In his view, this is not about CPUs replacing GPUs; GPUs are still crucial. However, what will truly create differentiation going forward is how well CPUs and GPUs work together, rather than the performance parameters of a single chip.

This article is from the WeChat public account Economic Observer. Author: Zheng Chenye


Related Questions

Q: Why has the CPU become a critical bottleneck in AI data centers according to Jensen Huang and the article?

A: Jensen Huang stated that in the AI agent era, the CPU has become a key performance bottleneck in data centers. This is because AI agent tasks involve complex, multi-step reasoning, calling external tools, executing code, and managing intermediate data (like KV Cache). These tasks are control-flow intensive and have low GPU utilization (often below 50%), requiring significant CPU resources for orchestration and data management to prevent slowing down the overall 'token' production speed.

Q: What is the projected growth for the server CPU market according to UBS and what are the main drivers?

A: According to UBS, the total addressable market (TAM) for server CPUs is projected to grow from approximately $30 billion in 2025 to about $170 billion by 2030, a nearly fivefold increase. The main drivers are the massive growth in AI servers and, more specifically, the emergence of new demand from AI Agent workloads, which require a higher ratio of CPUs to GPUs and new types of CPU deployments for tool execution and task orchestration.

Q: How is the CPU-to-GPU ratio in AI servers changing from training to agent scenarios?

A: The CPU-to-GPU ratio is converging significantly as workloads move from training to inference to agent tasks. For training, the typical ratio is 1 CPU to 7-8 GPUs. For inference, it converges to about 1 CPU to 3-4 GPUs. For AI agent scenarios, the ratio is expected to converge further towards 1:1. Industry-wide, the ratio has already shifted from a historical 1:8 to approximately 1:4.

Q: What are the two distinct types of CPU demand emerging in AI server procurement as described in the article?

A: Two distinct types of CPU demand are emerging: 1) High-end, high-core-count CPUs (128+ cores, priced above $4,000) deployed inside the server rack alongside GPUs for intensive compute support. 2) Mid-range CPUs (around 64 cores, priced around $3,000) deployed outside the rack in independent nodes dedicated to tool execution, sandboxing, and task orchestration for AI agents. This second type requires larger quantities of CPUs.

Q: What opportunities and challenges does the AI-driven CPU boom present for China's domestic CPU industry?

A: Opportunities include: 1) Growth from increased global CPU demand for AI data centers. 2) Policy-driven domestic substitution under the information innovation (信创) policy, which requires central state-owned enterprises to complete IT system localization by the end of 2027 and creates a vast replacement market. Challenges include: 1) The need for domestic CPU makers (like Hygon Information Technology and Loongson Technology) to rapidly mature their products and scale delivery to meet the tight policy deadline. 2) Strengthening software ecosystems and toolchains so that new AI models can be adapted quickly, as happened with the same-day adaptation to DeepSeek V4.
