Where Is the AI Infrastructure Industry Chain Stuck?

marsbit, published 2026-04-21, updated 2026-04-21

Article Summary

The AI infrastructure (AI Infra) industry chain is facing unprecedented systemic bottlenecks, despite the rapid emergence of applications like DeepSeek and Seedance 2.0. The surge in global computing demand has exposed critical constraints across multiple layers of the supply chain, from core manufacturing equipment and data center cabling to specialty materials and cleanroom facilities. Key challenges include four major "walls":

- **Memory Wall**: High-bandwidth memory (HBM) and DRAM face structural shortages as AI inference demand outpaces training, with new capacity not expected until 2027.
- **Bandwidth Wall**: Data transfer speeds lag behind computing power, causing multi-level bottlenecks in-chip, between chips, and across data centers.
- **Compute Wall**: Advanced chip manufacturing, reliant on EUV lithography and monopolized by ASML, remains the fundamental constraint, with supply chain fragility affecting production.
- **Power Wall**: While energy demand from data centers is rising, power supply is a solvable near-term challenge through diversified energy infrastructure.

Expansion is further hindered by shortages in testing equipment, IC substrates (critical for GPUs and seeing price hikes over 30%), specialty materials like low-CTE glass fiber, and high-end cleanroom facilities. Connection technologies are evolving, with copper cables resurging for short-range links due to cost and latency advantages, while optical solutions dominate long-range scenari...

As groundbreaking AI applications like DeepSeek and Seedance 2.0 continue to emerge, global demand for computing power is surging at an unprecedented pace. However, behind this computing arms race, the AI infrastructure (AI Infra) industry chain is facing systemic bottlenecks like never before. From core equipment in chip manufacturing to a single copper cable in data centers, from specialty materials to cleanroom facilities, nearly every critical link is flashing a "red light."

Four Major "Walls" in Computing Power Development

The development of AI computing power is not just about improving chip performance; it is a complex systems engineering challenge involving computing, storage, transmission, and energy.

(1) Memory Wall: The First Shackle in the AI Inference Era

Currently, the AI industry is shifting its focus from large model training to inference, with global AI inference demand expected to surpass training scenarios by 2026. The explosion in AI inference demand directly drives the need for high-bandwidth memory (HBM) and high-capacity DRAM.

Although major memory chip manufacturers are planning to expand production capacity, it takes at least two years from investment to actual production line operation, meaning the supply shortage is unlikely to ease in the short term. New capacity is primarily set to come online in 2027 and beyond, leading to a structural mismatch in 2026 where demand grows rapidly while supply lags.

(2) Bandwidth Wall: The "Clogged Capillaries" of Data Flow

The speed of computing power improvement far exceeds that of data transmission. This contradiction has led to a severe "bandwidth wall" problem—data flow within chips, between chips, within server racks, and between data centers has become the performance bottleneck of the entire computing system.

The current bandwidth bottleneck is multi-layered: within chips, interconnect delays and power consumption between transistors are continuously rising; between chips, traditional PCB board interconnects can no longer meet the high-bandwidth, low-latency demands of AI chips; within server racks, interconnect bandwidth between servers has become a constraint for Scale Up (vertical scaling); between data centers, long-distance transmission bandwidth and latency limit the efficiency of Scale Out (horizontal scaling) and cross-regional computing power scheduling.
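One common way to reason about the gap between compute and data movement is a roofline-style bound, in which attainable throughput is the smaller of the compute peak and the memory bandwidth multiplied by arithmetic intensity. The sketch below is illustrative only: the peak and bandwidth figures are hypothetical assumptions and do not come from the article or from any specific chip.

```python
def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     intensity_flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute peak and bandwidth x intensity."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity_flops_per_byte)


def ridge_point(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which the chip is compute-bound."""
    return peak_flops / bandwidth_bytes_per_s


# Hypothetical accelerator; both numbers are assumptions for illustration only.
PEAK_FLOPS = 1000e12   # 1000 TFLOP/s
BANDWIDTH = 3e12       # 3 TB/s

if __name__ == "__main__":
    for intensity in (10, 100, 1000):
        bound = attainable_flops(PEAK_FLOPS, BANDWIDTH, intensity)
        limiter = "bandwidth" if bound < PEAK_FLOPS else "compute"
        print(f"{intensity:>5} FLOP/byte -> {bound / 1e12:7.1f} TFLOP/s ({limiter}-bound)")
```

Under these assumed numbers, any workload that performs fewer than roughly 333 operations per byte moved is limited by bandwidth rather than by the chip's arithmetic units, which is the situation the "bandwidth wall" describes.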

Estimates show that in current AI training clusters, the energy consumption of data movement already exceeds that of computation itself. How to unclog the "capillaries" of data flow and reduce transmission latency and power consumption is a critical issue that must be addressed for AI Infra development.

(3) Compute Wall: High-End Chip Manufacturing as the Fundamental Constraint

AI chip performance iteration heavily relies on advanced process technologies, and the production capacity of these advanced processes is entirely constrained by upstream high-end manufacturing equipment, particularly EUV (extreme ultraviolet) lithography machines.

Currently, only ASML can produce EUV lithography machines globally, with extremely limited capacity and strict export controls. This directly results in a severe shortage of capacity for processes below 7nm, which cannot meet the explosive demand for AI chips. Even for NVIDIA, the global leader in AI chips, delivery of high-end chips like the H100 and H200 has been constrained by TSMC's advanced process capacity, with lead times stretching to several months or even over a year.

More critically, chip manufacturing is a highly globalized industry chain; a break in any single link affects the entire production capacity. From raw materials like photoresists, target materials, and electronic special gases to key equipment like etching and deposition tools, there are varying degrees of monopoly and supply constraints. This makes high-end chip manufacturing capability the most challenging bottleneck to break through in the AI Infra industry chain.

(4) Power Wall: A Relatively Controllable Short-Term Challenge

Compared to the first three, the power wall is a relatively easy bottleneck to solve. AI data centers are major energy consumers; the annual electricity consumption of a single ultra-large data center campus can exceed that of a medium-sized city with hundreds of thousands of people. Currently, global data center electricity consumption accounts for 2% to 3% of total global electricity use and is still climbing. But the power issue is essentially an infrastructure construction problem that can be addressed through diversified energy supply methods like gas turbines, fuel cells, and photovoltaics.

In the long run, with the development of renewable energy technologies and the improvement of energy infrastructure, power supply will not become the biggest mid-to-long-term bottleneck for AI computing power development. However, in some regions, short-term power supply pressures due to lagging grid construction may still limit the pace of data center construction.

The "Invisible Killer" of Capacity Expansion: Comprehensive Shortages in Equipment and Materials

The pace of AI chip capacity expansion is far slower than expected, with the core constraint not being the chips themselves but comprehensive shortages in upstream equipment and materials.

(1) Rapid Growth in Demand for Testing Equipment

AI chip technology upgrades are driving higher precision and efficiency requirements for testing equipment. Compared to ordinary logic chips, AI GPUs have a massive increase in signal ports, consuming more signal channel resources of testers; simultaneously, the surge in transistor count leads to a significant increase in corresponding test vector scale and per-chip testing time. More critically, while only a certain percentage of chips in traditional consumer electronics are tested, for AI chips, 100% of chips must be tested, often through multiple stages, to ensure the entire chipset operates normally. Driven strongly by AI computing demand and the memory market explosion, semiconductor test equipment (ATE) has become one of the fastest-growing categories in the semiconductor equipment sector.

Advantest, the world's largest chip test equipment supplier, also stated that it expects record highs for the fiscal year ending March 2026, with revenue projected to grow 37% and net profit more than doubling from the previous year.

(2) IC Substrates/Package Substrates: The "Choke Point" More Expensive Than Chips

Surprisingly, the biggest supply chain pain point for leading chip manufacturers like NVIDIA is not the chips themselves, but IC substrates (package substrates). IC substrates are key components connecting chips to PCB boards, providing electrical connection and physical support. AI chips have extremely high requirements for IC substrates—they need larger area, higher wiring density, better thermal performance, and lower signal loss. This also means their value is inevitably much higher than ordinary PCBs. Estimates show that IC substrates account for about 50% of the total packaging cost, and in advanced flip-chip packaging, this proportion can even reach 70%–80%. Depending on the resin material used, IC substrates are mainly divided into BT substrates and ABF substrates. BT substrates are primarily used for various memory chips, while ABF is more focused on logic chips like CPUs, GPUs, FPGAs, and ASICs.

According to incomplete statistics, IC substrate prices have risen by a cumulative 30% or more since 2025. Two factors drive the increase. First, costs are being passed through from upstream raw materials: core materials such as high-end glass fiber cloth and copper foil have been in continuous short supply since 2025, and the capacity gap keeps widening. Second, demand for 2.5D/3D advanced packaging has exploded: high-end chips such as GPUs commonly adopt multi-chip stacking architectures, and the significant increase in chip layers and area directly drives up demand for substrate area.
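To get a feel for what these figures imply, the short sketch below combines the substrate cost shares quoted above (about 50% of packaging cost, and 70%-80% in advanced flip-chip packaging) with the roughly 30% price increase. It rests on a simplifying assumption that is not in the article: every other packaging cost item stays constant and the full substrate price increase is passed through.

```python
def packaging_cost_increase(substrate_share: float,
                            substrate_price_increase: float) -> float:
    """Fractional rise in total packaging cost when only the substrate price changes.

    Assumes all other cost items are unchanged (a simplification).
    """
    if not 0.0 <= substrate_share <= 1.0:
        raise ValueError("substrate_share must be between 0 and 1")
    return substrate_share * substrate_price_increase


PRICE_INCREASE = 0.30  # cumulative substrate price increase cited in the text

if __name__ == "__main__":
    for share in (0.50, 0.70, 0.80):
        rise = packaging_cost_increase(share, PRICE_INCREASE)
        print(f"substrate share {share:.0%}: packaging cost up about {rise:.0%}")
```

Under that assumption, a 30% substrate price increase alone would lift total packaging cost by about 15% at a 50% share, and by about 21% to 24% at the flip-chip shares.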

Unlike ordinary PCBs, IC substrates have high technical barriers and complex processes. Global capacity for high-end IC substrates is mainly concentrated in a few Taiwanese manufacturers like Unimicron and Nan Ya PCB, with capacity expansion cycles as long as 18-24 months. This means the tight supply situation for IC substrates is unlikely to be fundamentally alleviated within the next two years.

(3) Key Specialty Materials: The Extremely Scarce "Industrial MSG"

Some seemingly insignificant specialty materials are becoming the "Achilles' heel" of the AI industry chain. Materials like Low-CTE (low coefficient of thermal expansion) glass fiber, specialty copper foil, and high-end drill bits, though used in small quantities, are indispensable "industrial MSG" for manufacturing high-end IC substrates and PCB boards.

The high power consumption and performance requirements of AI chips necessitate the use of materials with extremely low thermal expansion coefficients for substrates and PCBs to prevent deformation under high-temperature operating conditions. Meanwhile, because fillers are used, the lifespan of the drill bits used in processing falls sharply, to 1/5-1/7 of the original, driving explosive growth in drill-bit demand.
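The link between shorter drill-bit life and higher demand is simple arithmetic: if each bit drills only 1/5 to 1/7 as many holes, the same drilling workload consumes 5 to 7 times as many bits. A minimal sketch follows; the baseline bit life and the workload are hypothetical numbers chosen only so the division comes out evenly.

```python
def bits_required(total_holes: int, holes_per_bit: int) -> int:
    """Number of drill bits needed for a workload (integer ceiling division)."""
    if total_holes < 0 or holes_per_bit <= 0:
        raise ValueError("total_holes must be >= 0 and holes_per_bit must be > 0")
    return -(-total_holes // holes_per_bit)


# Hypothetical baseline values, for illustration only.
BASELINE_HOLES_PER_BIT = 2_100
WORKLOAD_HOLES = 2_100_000

if __name__ == "__main__":
    baseline = bits_required(WORKLOAD_HOLES, BASELINE_HOLES_PER_BIT)
    for divisor in (5, 7):  # lifespan reduced to 1/5 and 1/7, as cited in the text
        reduced_life = BASELINE_HOLES_PER_BIT // divisor
        needed = bits_required(WORKLOAD_HOLES, reduced_life)
        print(f"life 1/{divisor}: {needed} bits vs {baseline} baseline "
              f"({needed / baseline:.0f}x)")
```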

These specialty materials have extremely high technical barriers, global capacity is highly concentrated, and expansion is difficult. Any supply interruption will directly impact the normal operation of the entire AI industry chain.

(4) High-End Cleanrooms: The Overlooked High-Barrier Segment

In the AI industry chain's capacity expansion, high-end cleanrooms are another severely overlooked high-barrier segment. Advanced process chips and advanced packaging have extremely high requirements for production environment cleanliness—a single speck of dust in the air can cause an entire wafer to be scrapped.

The construction of high-end cleanrooms requires not only huge capital investment but also extremely high technical expertise. From air purification systems to anti-static facilities, from temperature and humidity control to vibration isolation, every aspect has strict standards. Currently, the global high-end cleanroom market is mainly dominated by overseas companies, with net profit margins potentially exceeding 20%, far higher than domestic counterparts.

With the global expansion of AI chip capacity, demand for high-end cleanrooms remains strong, making it a segment with extremely strong certainty and high prosperity within the industry chain.

The "Route Dispute" in Connection Technology: Copper Resurgence and Photonic-Electronic Integration

Beyond computing and expansion bottlenecks, connection technology inside data centers is undergoing a profound transformation. The route dispute between copper and light, along with the technological upgrades of PCB/substrates, is reshaping the connectivity landscape of AI Infra.

(1) Scenario-Based Competition and Substitution Between Copper and Light

For a long time, optical modules have been considered the future direction for high-speed interconnection in data centers. But with the explosion of AI computing demand, copper cable technology is experiencing a "resurgence," with copper and light forming a relationship of complementarity and substitution in different scenarios.

Short Distance (≤7 meters): Copper cables (AEC, Active Electrical Cables), with advantages of low cost, high reliability, and low latency, are comprehensively replacing laser-based optical modules. In short-distance interconnection scenarios within servers and within server racks, copper cables offer significant cost-performance advantages.

Medium Distance (~30 meters): Micro LED optical cables have become a compromise solution. They combine the advantages of copper cables and optical modules, offering better reliability than laser optical modules and lower cost than traditional optical modules, suitable for medium-distance interconnection between racks.

Long Distance (Between Data Centers): Traditional pluggable optical modules and fiber optics remain mainstream. CPO (Co-Packaged Optics) technology is considered the future direction; it integrates the optical engine with the chip package, significantly increasing bandwidth and reducing power consumption. However, it still faces challenges like high cost and poor reliability, and widespread commercial use is still some time away.
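The three distance tiers above can be summarized as a simple lookup. The sketch below is only an illustration of that division of labor: the function name is hypothetical, and treating 7 m and 30 m as hard cutoffs is an assumption, since the article gives "~30 meters" as an approximate figure and does not say what applies between 30 m and inter-data-center distances.

```python
def pick_interconnect(distance_m: float) -> str:
    """Map a link length to the interconnect tier described in the text.

    The 7 m and 30 m cutoffs are treated as hard limits for simplicity.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    if distance_m <= 7:
        return "AEC copper cable"
    if distance_m <= 30:
        return "Micro LED optical cable"
    return "pluggable optical module over fiber"


if __name__ == "__main__":
    for d in (2, 7, 25, 500):
        print(f"{d:>4} m -> {pick_interconnect(d)}")
```

In practice the choice also depends on cost, power budget, and reliability targets, which this lookup deliberately ignores.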

It is worth noting that the procurement scale and performance specifications for optical fiber in AI data centers already differ by an order of magnitude from those of traditional telecom networks. To meet the low-latency, high-bandwidth interconnection needs of GPU clusters, demand for specialty optical fibers like G.657.A2 continues to rise, and more cutting-edge hollow-core fiber solutions have entered the deployment stage. Hollow-core fiber replaces the traditional glass core with air, significantly optimizing transmission: transmission loss can be reduced from the conventional 0.14 dB/km to below 0.1 dB/km, and transmission delay from 5 μs/km to 3.46 μs/km, while the fiber tolerates higher optical power.
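The per-kilometer figures above translate directly into link-level numbers. The sketch below applies them to a hypothetical 100 km link (the link length is an assumption, not a figure from the article) and also compares each quoted delay with the vacuum limit of 1/c. Because the article gives hollow-core loss only as "below 0.1 dB/km", 0.1 is used here as an upper bound.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def one_way_delay_us(distance_km: float, delay_us_per_km: float) -> float:
    """One-way propagation delay in microseconds."""
    return distance_km * delay_us_per_km


def span_loss_db(distance_km: float, loss_db_per_km: float) -> float:
    """Total fiber attenuation over the span in dB."""
    return distance_km * loss_db_per_km


def implied_group_index(delay_us_per_km: float) -> float:
    """Ratio of the quoted delay to the vacuum delay (1/c)."""
    vacuum_delay_us_per_km = 1e6 / C_KM_PER_S
    return delay_us_per_km / vacuum_delay_us_per_km


LINK_KM = 100.0  # hypothetical link length, for illustration only

if __name__ == "__main__":
    for name, delay, loss in (("conventional", 5.0, 0.14),
                              ("hollow-core", 3.46, 0.10)):
        print(f"{name:>12}: {one_way_delay_us(LINK_KM, delay):.0f} us one way, "
              f"{span_loss_db(LINK_KM, loss):.0f} dB loss, "
              f"implied index {implied_group_index(delay):.3f}")
```

On this hypothetical link the hollow-core figures save about 154 μs of one-way delay and at least 4 dB of attenuation. The implied index of roughly 1.04 shows that the quoted 3.46 μs/km is already close to the vacuum limit of about 3.34 μs/km.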

Currently, the number of participants in the hollow-core fiber market is expanding rapidly, but prices remain relatively stable at about 30,000-40,000 RMB per kilometer, far higher than ordinary optical fiber.

(2) Technological Upgrade Pressure on PCB/Substrates

To meet the high-bandwidth demands of AI chips, PCB and substrate technologies are also continuously upgrading. Currently, PCB/substrates are moving towards n+m layer structures, glass substrates, and modified Semi-Additive Process (mSAP) technology.

The n+m structure increases the number of layers and wiring density, enhancing the substrate's bandwidth capability; glass substrates have a lower coefficient of thermal expansion and better high-frequency performance, representing an important future direction for high-end substrates; mSAP technology enables finer circuit wiring, meeting high-density interconnection demands.

These technological upgrades place entirely new demands on upstream equipment, materials, and manufacturing processes, and bring new industrial opportunities and challenges.

Summary

The AI Infra industry chain faces intertwined constraints from multiple bottlenecks. From the memory, bandwidth, compute, and power walls at the computing level, to expansion-level shortages in testing equipment, IC substrates, specialty materials, and cleanrooms, to the technological route dispute at the connection level, every link affects the large-scale deployment of AI computing power.

High-end chip manufacturing capability is the most fundamental constraint, determining the performance ceiling and production scale of AI chips. Testing equipment, high-end IC substrates, and key specialty materials are currently the segments with the strongest certainty and the most acute supply-demand imbalance. In the long run, AI Infra development will show two major trends: first, the technological evolution of copper cable resurgence and photonic-electronic integration, where different technological routes will coexist in their respective advantageous scenarios; second, the restructuring of the global industry chain and the acceleration of localization, where domestic companies are expected to achieve breakthroughs in some market segments.

This article is from the WeChat public account "Semiconductor Industry Vertical and Horizontal" (ID: ICViews), author: Peng Cheng

Related Q&A

Q: What are the four major bottlenecks ("walls") mentioned in the article that are constraining AI infrastructure development?

A: The four major bottlenecks are: 1) The Memory Wall, caused by the shift to AI inference and the resulting shortage of HBM and DRAM. 2) The Bandwidth Wall, where data transfer speeds cannot keep up with computing power, creating a performance bottleneck. 3) The Compute Wall, where the manufacturing of high-end chips is fundamentally constrained by the limited supply of advanced equipment like EUV lithography machines. 4) The Power Wall, a relatively more solvable short-term challenge concerning the massive energy consumption of AI data centers.

Q: According to the article, what is a more immediate supply chain pain point for chipmakers like NVIDIA than the chips themselves?

A: The article states that the most immediate supply chain pain point for chipmakers like NVIDIA is not the chips themselves, but IC substrates (packaging substrates). These are the critical components that connect the chip to the PCB, and their production is constrained by high technical barriers and long expansion cycles of 18-24 months.

Q: How is the 'Bandwidth Wall' problem described in the context of AI clusters?

A: The 'Bandwidth Wall' is described as a multi-level performance bottleneck where the speed of data movement cannot keep up with the speed of computation. This occurs within chips (interconnect delays), between chips (traditional PCB interconnects are insufficient), inside server racks (limiting scale-up), and between data centers (limiting scale-out). It's noted that the energy consumed by moving data in an AI training cluster already exceeds the energy consumed by the computation itself.

Q: What two key factors are driving the price increases and shortages of IC substrates?

A: The two key factors driving IC substrate price increases and shortages are: 1) Cost transmission from upstream raw materials like high-end glass fiber cloth and copper foil, which have been in short supply. 2) The explosive demand from 2.5D/3D advanced packaging, where multi-chip stacking architectures used in GPUs significantly increase the required substrate area.

Q: In data center connectivity, what are the competing technological routes for different distance scenarios as outlined in the article?

A: The article outlines a scenario-based competition between copper and optical technologies: 1) Short distance (≤7m): Active Electrical Cables (AEC) are replacing optical modules due to lower cost and higher reliability. 2) Medium distance (~30m): Micro LED optical cables are a compromise solution. 3) Long distance (between data centers): Traditional pluggable optical modules and fiber optics remain the mainstream, with Co-Packaged Optics (CPO) seen as a future direction.
