Where Is the AI Infrastructure Industry Chain Stuck?

marsbit | Published on 2026-04-21 | Last updated on 2026-04-21

Abstract

The AI infrastructure (AI Infra) industry chain is facing unprecedented systemic bottlenecks, despite the rapid emergence of applications like DeepSeek and Seedance 2.0. The surge in global computing demand has exposed critical constraints across multiple layers of the supply chain, from core manufacturing equipment and data center cabling to specialty materials and cleanroom facilities. Key challenges include four major "walls":

- **Memory Wall**: High-bandwidth memory (HBM) and DRAM face structural shortages as AI inference demand outpaces training, with new capacity not expected until 2027.
- **Bandwidth Wall**: Data transfer speeds lag behind computing power, causing multi-level bottlenecks in-chip, between chips, and across data centers.
- **Compute Wall**: Advanced chip manufacturing, reliant on EUV lithography and monopolized by ASML, remains the fundamental constraint, with supply chain fragility affecting production.
- **Power Wall**: While energy demand from data centers is rising, power supply is a solvable near-term challenge through diversified energy infrastructure.

Expansion is further hindered by shortages in testing equipment, IC substrates (critical for GPUs and seeing price hikes over 30%), specialty materials like low-CTE glass fiber, and high-end cleanroom facilities. Connection technologies are evolving, with copper cables resurging for short-range links due to cost and latency advantages, while optical solutions dominate long-range scenari...

As groundbreaking AI applications like DeepSeek and Seedance 2.0 continue to emerge, global demand for computing power is surging at an unprecedented pace. However, behind this computing arms race, the AI infrastructure (AI Infra) industry chain is facing systemic bottlenecks like never before. From core equipment in chip manufacturing to a single copper cable in data centers, from specialty materials to cleanroom facilities, nearly every critical link is flashing a "red light."

Four Major "Walls" in Computing Power Development

The development of AI computing power is not just about improving chip performance; it is a complex systems engineering challenge involving computing, storage, transmission, and energy.

(1) Memory Wall: The First Shackle in the AI Inference Era

Currently, the AI industry is shifting its focus from large model training to inference, with global AI inference demand expected to surpass training demand by 2026. This surge in inference demand directly drives the need for high-bandwidth memory (HBM) and high-capacity DRAM.

Although major memory chip manufacturers are planning to expand production capacity, it takes at least two years from investment to actual production line operation, meaning the supply shortage is unlikely to ease in the short term. New capacity is primarily set to come online in 2027 and beyond, leading to a structural mismatch in 2026 where demand grows rapidly while supply lags.

(2) Bandwidth Wall: The "Clogged Capillaries" of Data Flow

The speed of computing power improvement far exceeds that of data transmission. This contradiction has led to a severe "bandwidth wall" problem—data flow within chips, between chips, within server racks, and between data centers has become the performance bottleneck of the entire computing system.

The current bandwidth bottleneck is multi-layered: within chips, interconnect delays and power consumption between transistors are continuously rising; between chips, traditional PCB board interconnects can no longer meet the high-bandwidth, low-latency demands of AI chips; within server racks, interconnect bandwidth between servers has become a constraint for Scale Up (vertical scaling); between data centers, long-distance transmission bandwidth and latency limit the efficiency of Scale Out (horizontal scaling) and cross-regional computing power scheduling.

Estimates show that in current AI training clusters, the energy consumption of data movement already exceeds that of computation itself. How to unclog the "capillaries" of data flow and reduce transmission latency and power consumption is a critical issue that must be addressed for AI Infra development.

(3) Compute Wall: High-End Chip Manufacturing as the Fundamental Constraint

AI chip performance iteration relies heavily on advanced process technologies, and the production capacity of these advanced processes is entirely constrained by upstream high-end manufacturing equipment, particularly EUV (Extreme Ultraviolet) lithography machines.

Currently, only ASML can produce EUV lithography machines globally, with extremely limited capacity and strict export controls. This directly results in a severe shortage of capacity for processes below 7nm, unable to meet the explosive demand for AI chips. As the global leader in AI chips, NVIDIA's delivery of high-end chips like the H100 and H200 has been constrained by TSMC's advanced process capacity, with lead times stretching to several months or even over a year.

More critically, chip manufacturing is a highly globalized industry chain; a break in any single link affects the entire production capacity. From raw materials like photoresists, target materials, and electronic special gases to key equipment like etching and deposition tools, there are varying degrees of monopoly and supply constraints. This makes high-end chip manufacturing capability the most challenging bottleneck to break through in the AI Infra industry chain.

(4) Power Wall: A Relatively Controllable Short-Term Challenge

Compared to the first three, the power wall is a relatively easier bottleneck to solve. AI data centers are major energy consumers; the annual electricity consumption of a single ultra-large data center campus can even exceed that of a medium-sized city with hundreds of thousands of people. Currently, global data center electricity consumption accounts for 2% to 3% of total global electricity use and is still climbing. But the power issue is essentially an infrastructure construction problem that can be addressed through diversified energy supply methods like gas turbines, fuel cells, and photovoltaics.

In the long run, with the development of renewable energy technologies and the improvement of energy infrastructure, power supply will not become the biggest mid-to-long-term bottleneck for AI computing power development. However, in some regions, short-term power supply pressures due to lagging grid construction may still limit the pace of data center construction.

The "Invisible Killer" of Capacity Expansion: Comprehensive Shortages in Equipment and Materials

The pace of AI chip capacity expansion is far slower than expected, with the core constraint not being the chips themselves but comprehensive shortages in upstream equipment and materials.

(1) Rapid Growth in Demand for Testing Equipment

AI chip technology upgrades are driving higher precision and efficiency requirements for testing equipment. Compared to ordinary logic chips, AI GPUs have a massive increase in signal ports, consuming more signal channel resources of testers; simultaneously, the surge in transistor count leads to a significant increase in corresponding test vector scale and per-chip testing time. More critically, while only a certain percentage of chips in traditional consumer electronics are tested, for AI chips, 100% of chips must be tested, often through multiple stages, to ensure the entire chipset operates normally. Driven strongly by AI computing demand and the memory market explosion, semiconductor test equipment (ATE) has become one of the fastest-growing categories in the semiconductor equipment sector.

Advantest, the world's largest chip test equipment supplier, has stated that it expects record results for the fiscal year ending March 2026, with revenue projected to grow 37% and net profit to more than double from the previous year.

(2) IC Substrates/Package Substrates: The "Choke Point" More Expensive Than Chips

Surprisingly, the biggest supply chain pain point for leading chip manufacturers like NVIDIA is not the chips themselves, but IC substrates (package substrates). IC substrates are key components connecting chips to PCB boards, providing electrical connection and physical support. AI chips have extremely high requirements for IC substrates—they need larger area, higher wiring density, better thermal performance, and lower signal loss. This also means their value is inevitably much higher than ordinary PCBs. Estimates show that IC substrates account for about 50% of the total packaging cost, and in advanced flip-chip packaging, this proportion can even reach 70%–80%. Depending on the resin material used, IC substrates are mainly divided into BT substrates and ABF substrates. BT substrates are primarily used for various memory chips, while ABF is more focused on logic chips like CPUs, GPUs, FPGAs, and ASICs.

According to incomplete statistics, IC substrate prices have risen by a cumulative 30% or more since 2025. The price hike has two main causes. The first is cost transmission from upstream raw materials: core materials like high-end glass fiber cloth and copper foil have been in continuous short supply since 2025, with the capacity gap continuing to widen. The second is the explosion in demand for 2.5D/3D advanced packaging: high-end chips like GPUs commonly adopt multi-chip stacking architectures, and the significant increase in chip layers and area directly drives up the demand for substrate area.

Unlike ordinary PCBs, IC substrates have high technical barriers and complex processes. Global capacity for high-end IC substrates is mainly concentrated in a few Taiwanese manufacturers like Unimicron and Nan Ya PCB, with capacity expansion cycles as long as 18-24 months. This means the tight supply situation for IC substrates is unlikely to be fundamentally alleviated within the next two years.

(3) Key Specialty Materials: The Extremely Scarce "Industrial MSG"

Some seemingly insignificant specialty materials are becoming the "Achilles' heel" of the AI industry chain. Materials like Low-CTE (low coefficient of thermal expansion) glass fiber, specialty copper foil, and high-end drill bits, though used in small quantities, are indispensable "industrial MSG" for manufacturing high-end IC substrates and PCB boards.

The high power consumption and performance requirements of AI chips necessitate the use of materials with extremely low thermal expansion coefficients for substrates and PCBs to prevent deformation under high-temperature operating conditions. At the same time, as fillers are used, the lifespan of the drill bits used during processing drops to between one-fifth and one-seventh of the original, driving explosive growth in demand for drill bits.
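The drill-bit figure can be turned into a rough demand multiplier. The following is a minimal sketch, assuming that bit consumption for a fixed drilling workload is inversely proportional to bit lifespan; that proportionality and the function name `drill_bit_demand_multiplier` are illustrative assumptions, and only the one-fifth and one-seventh lifespan ratios come from the text.

```python
from fractions import Fraction


def drill_bit_demand_multiplier(lifespan_ratio: Fraction) -> Fraction:
    """Return how many times more bits the same workload consumes.

    lifespan_ratio is the new bit lifespan divided by the original one,
    for example Fraction(1, 5). Assumes consumption is inversely
    proportional to lifespan (an illustrative simplification).
    """
    if lifespan_ratio <= 0:
        raise ValueError("lifespan_ratio must be positive")
    return 1 / lifespan_ratio


# Lifespan ratios quoted in the article: 1/5 to 1/7 of the original.
low = drill_bit_demand_multiplier(Fraction(1, 5))
high = drill_bit_demand_multiplier(Fraction(1, 7))
print(f"Bits needed for the same workload: {low}x to {high}x")
```

Under this simplification, the same volume of drilling would consume roughly five to seven times as many bits, before any growth in substrate output is counted.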

These specialty materials have extremely high technical barriers, global capacity is highly concentrated, and expansion is difficult. Any supply interruption will directly impact the normal operation of the entire AI industry chain.

(4) High-End Cleanrooms: The Overlooked High-Barrier Segment

In the AI industry chain's capacity expansion, high-end cleanrooms are another severely overlooked high-barrier segment. Advanced process chips and advanced packaging have extremely high requirements for production environment cleanliness—a single speck of dust in the air can cause an entire wafer to be scrapped.

The construction of high-end cleanrooms requires not only huge capital investment but also extremely high technical expertise. From air purification systems to anti-static facilities, from temperature and humidity control to vibration isolation, every aspect has strict standards. Currently, the global high-end cleanroom market is mainly dominated by overseas companies, with net profit margins potentially exceeding 20%, far higher than domestic counterparts.

With the global expansion of AI chip capacity, demand for high-end cleanrooms remains strong, making it a segment with extremely strong certainty and high prosperity within the industry chain.

The "Route Dispute" in Connection Technology: Copper Resurgence and Photonic-Electronic Integration

Beyond computing and expansion bottlenecks, connection technology inside data centers is undergoing a profound transformation. The route dispute between copper and light, along with the technological upgrades of PCBs and substrates, is reshaping the connectivity landscape of AI Infra.

(1) Scenario-Based Competition and Substitution Between Copper and Light

For a long time, optical modules have been considered the future direction for high-speed interconnection in data centers. But with the explosion of AI computing demand, copper cable technology is experiencing a "resurgence," with copper and light forming a relationship of complementarity and substitution in different scenarios.

Short Distance (≤7 meters): Copper cables (AEC, Active Electrical Cables), with advantages of low cost, high reliability, and low latency, are comprehensively replacing laser-based optical modules. In short-distance interconnection scenarios within servers and within server racks, copper cables offer significant cost-performance advantages.

Medium Distance (~30 meters): Micro LED optical cables have become a compromise solution. They combine the advantages of copper cables and optical modules, offering better reliability than laser optical modules and lower cost than traditional optical modules, suitable for medium-distance interconnection between racks.

Long Distance (Between Data Centers): Traditional pluggable optical modules and fiber optics remain mainstream. CPO (Co-Packaged Optics) technology is considered the future direction; it integrates the optical engine with the chip package, significantly increasing bandwidth and reducing power consumption. However, it still faces challenges like high cost and poor reliability, and widespread commercial use is still some time away.
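The three distance tiers above can be summarized as a simple lookup. This is a minimal sketch, not a selection rule from the article: the 7-meter limit is quoted in the text, while treating the approximate 30-meter figure as a hard cutoff and assigning everything beyond it to pluggable optics are assumptions made for illustration. The function name `pick_interconnect` is hypothetical.

```python
SHORT_REACH_MAX_M = 7.0    # "≤7 meters" in the article
MEDIUM_REACH_MAX_M = 30.0  # article says "~30 meters"; hard cutoff is an assumption


def pick_interconnect(distance_m: float) -> str:
    """Map a link distance in meters to the option the article pairs with it."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    if distance_m <= SHORT_REACH_MAX_M:
        return "copper AEC"
    if distance_m <= MEDIUM_REACH_MAX_M:
        return "Micro LED optical cable"
    # Assumption: any longer link falls back to today's mainstream optics.
    return "pluggable optical module over fiber"


for d in (2, 7, 25, 5000):
    print(f"{d} m -> {pick_interconnect(d)}")
```

A real deployment would also weigh cost, power, and reliability, which the article names as the deciding factors but does not quantify.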

It is worth noting that the procurement scale and performance specifications for optical fiber in AI data centers already differ by an order of magnitude from those of traditional telecom networks. To meet the low-latency, high-bandwidth interconnection needs of GPU clusters, demand for specialty optical fibers like G.657.A2 continues to rise, and more cutting-edge hollow-core fiber solutions have entered the deployment stage. Hollow-core fiber replaces the traditional glass core with air, significantly improving transmission: loss can be reduced from the conventional 0.14 dB/km to below 0.1 dB/km and delay from 5 μs/km to 3.46 μs/km, while tolerating higher optical power.
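To see what these per-kilometer figures mean for a whole link, here is a minimal sketch that scales them by link length. The per-kilometer values are the ones quoted above; the 80 km link length and the function name `compare_fibers` are hypothetical, and the calculation covers one-way propagation only, ignoring transceivers and switching.

```python
# Per-kilometer figures quoted in the article.
SOLID_CORE_DELAY_US_PER_KM = 5.0
HOLLOW_CORE_DELAY_US_PER_KM = 3.46
SOLID_CORE_LOSS_DB_PER_KM = 0.14
HOLLOW_CORE_LOSS_DB_PER_KM = 0.10  # article says "below 0.1", so this is an upper bound


def compare_fibers(length_km: float) -> dict[str, float]:
    """Return one-way delay and loss savings of hollow-core over conventional fiber."""
    if length_km <= 0:
        raise ValueError("length_km must be positive")
    delay_saved = (SOLID_CORE_DELAY_US_PER_KM - HOLLOW_CORE_DELAY_US_PER_KM) * length_km
    loss_saved = (SOLID_CORE_LOSS_DB_PER_KM - HOLLOW_CORE_LOSS_DB_PER_KM) * length_km
    reduction_pct = 100.0 * (1 - HOLLOW_CORE_DELAY_US_PER_KM / SOLID_CORE_DELAY_US_PER_KM)
    return {
        "delay_saved_us": delay_saved,
        "delay_reduction_pct": reduction_pct,
        "loss_saved_db": loss_saved,  # a lower bound, since hollow-core loss is "below 0.1"
    }


result = compare_fibers(80.0)  # hypothetical 80 km link between data centers
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With the quoted figures, the delay reduction is about 31% regardless of length, which is why hollow-core fiber matters most on the longer links used for cross-regional scheduling.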

Currently, the number of participants in the hollow-core fiber market is expanding rapidly, but prices remain relatively stable at about 30,000-40,000 RMB per kilometer, far higher than ordinary optical fiber.

(2) Technological Upgrade Pressure on PCB/Substrates

To meet the high-bandwidth demands of AI chips, PCB and substrate technologies are also continuously upgrading. Currently, PCB/substrates are moving towards n+m layer structures, glass substrates, and modified Semi-Additive Process (mSAP) technology.

The n+m structure increases the number of layers and wiring density, enhancing the substrate's bandwidth capability; glass substrates have a lower coefficient of thermal expansion and better high-frequency performance, representing an important future direction for high-end substrates; mSAP technology enables finer circuit wiring, meeting high-density interconnection demands.

These technological upgrades place new demands on upstream equipment, materials, and manufacturing processes, and they bring new industrial opportunities and challenges.

Summary

The AI Infra industry chain is facing intertwined constraints from multiple bottlenecks. From the computing-level memory wall, bandwidth wall, compute wall, and power wall, to expansion-level shortages in testing equipment, IC substrates, specialty materials, and cleanrooms, to the technological route dispute at the connection level, every link affects the large-scale deployment of AI computing power.

High-end chip manufacturing capability is the most fundamental constraint, determining the performance ceiling and production scale of AI chips. Testing equipment, high-end IC substrates, and key specialty materials are currently the segments with the strongest certainty and the most acute supply-demand imbalance. In the long run, AI Infra development will show two major trends: first, the technological evolution of copper cable resurgence and photonic-electronic integration, where different technological routes will coexist in their respective advantageous scenarios; second, the restructuring of the global industry chain and the acceleration of localization, where domestic companies are expected to achieve breakthroughs in some segments.

This article is from the WeChat public account "Semiconductor Industry Vertical and Horizontal" (ID: ICViews), author: Peng Cheng

Related Questions

Q: What are the four major bottlenecks ("walls") mentioned in the article that are constraining AI infrastructure development?

A: The four major bottlenecks are: 1) The Memory Wall, caused by the shift to AI inference and the resulting shortage of HBM and DRAM. 2) The Bandwidth Wall, where data transfer speeds cannot keep up with computing power, creating a performance bottleneck. 3) The Compute Wall, where the manufacturing of high-end chips is fundamentally constrained by the limited supply of advanced equipment like EUV lithography machines. 4) The Power Wall, a relatively more solvable short-term challenge concerning the massive energy consumption of AI data centers.

Q: According to the article, what is a more immediate supply chain pain point for chipmakers like NVIDIA than the chips themselves?

A: The article states that the most immediate supply chain pain point for chipmakers like NVIDIA is not the chips themselves, but IC substrates (packaging substrates). These are the critical components that connect the chip to the PCB, and their production is constrained by high technical barriers and long expansion cycles of 18-24 months.

Q: How is the "Bandwidth Wall" problem described in the context of AI clusters?

A: The "Bandwidth Wall" is described as a multi-level performance bottleneck where the speed of data movement cannot keep up with the speed of computation. This occurs within chips (interconnect delays), between chips (traditional PCB interconnects are insufficient), inside server racks (limiting scale-up), and between data centers (limiting scale-out). It's noted that the energy consumed by moving data in an AI training cluster already exceeds the energy consumed by the computation itself.

Q: What two key factors are driving the price increases and shortages of IC substrates?

A: The two key factors driving IC substrate price increases and shortages are: 1) Cost transmission from upstream raw materials like high-end glass fiber cloth and copper foil, which have been in short supply. 2) The explosive demand from 2.5D/3D advanced packaging, where multi-chip stacking architectures used in GPUs significantly increase the required substrate area.

Q: In data center connectivity, what are the competing technological routes for different distance scenarios as outlined in the article?

A: The article outlines a scenario-based competition between copper and optical technologies: 1) Short distance (≤7m): Active Electrical Cables (AEC) are replacing optical modules due to lower cost and higher reliability. 2) Medium distance (~30m): Micro LED optical cables are a compromise solution. 3) Long distance (between data centers): Traditional pluggable optical modules and fiber optics remain the mainstream, with Co-Packaged Optics (CPO) seen as a future direction.
