The Underlying Logic of Bottleneck Propagation in the AI Computing Power Industry Chain

marsbit | Published on 2026-05-22 | Last updated on 2026-05-22

Abstract

The article analyzes the evolving bottleneck progression within the AI compute supply chain. Initially constrained by GPU chip and advanced packaging capacity (2022-2024), the primary bottleneck shifted to HBM memory (2024-2025) due to massive model parameter growth. As cluster scale expands, physical limits of copper interconnects are making optical interconnect technologies the next critical phase (2025-2026). The ultimate, emerging constraint is power delivery and advanced liquid cooling (from 2026 onward), driven by skyrocketing rack power densities exceeding traditional infrastructure limits. The core thesis is that AI compute demand follows a "Leontief" production function where solving one bottleneck immediately exposes the next in the sequence: Compute (GPU) → Memory (HBM) → Interconnect (Optics) → Power & Cooling. Each shift reallocates value and investment across the semiconductor and infrastructure landscape.

Author: qinbafrank

In February, the article "What Does This War of Capital Expenditure Mean?" argued that key segments of the computing power industry chain can still capture the greatest value: chips, packaging and testing, memory, optical modules, and so on. Segments whose capacity is hard to expand quickly, or which have extremely high moats, will reap the dividends of massive capital expenditures.

There is still significant room for efficiency optimization: distillation, quantization, MoE, and dedicated chips on the inference side, along with liquid cooling and (in the long term) nuclear fusion, may reduce the energy consumption and cost per unit of computing power by another 10-100 times. Opportunities should be sought in these segments.

Recently, multiple investment banks including Morgan Stanley, J.P. Morgan, Bank of America, Goldman Sachs, UBS, Citi, Bernstein, and HSBC have published update reports on AI/semiconductors/power/memory. The bottlenecks for AI hardware have expanded from the single dimension of "GPU supply" to collective tension across five dimensions: power, chips, memory, equipment, and materials.

The scale of AI demand has broken through the forecast intervals of all traditional power planning, semiconductor equipment capacity, memory price models, and robot installation assumptions.

Morgan Stanley's global thematic research review points out that global weekly large language model token consumption soared from 6.4 trillion to 22.7 trillion within 3 months, roughly 3.5 times the starting level (an increase of about 2.5 times), and that the U.S. data center power gap for 2025-28 is 55 GW. J.P. Morgan's inaugural coverage of debt for data center and high-performance computing projects gives a "122 GW financing gap in the next 5 years" figure outright; U.S. 5-year power planning has surged from 101 GW to 230 GW, with 44% of new projects facing grid connection waits of more than 4 years. Bank of America's latest target price report for Alphabet revises its 2026 capital expenditure upward to $181.5 billion, doubling year-on-year, with free cash flow declining 62%. These three sets of data are not outputs of the same framework, but independent portraits drawn by three separate institutions along different research paths.
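Growth figures like these are easy to misread, so a small check helps. The sketch below recomputes the multiples from the quoted start and end values only; the helper name `growth` is illustrative and does not come from any of the cited reports.

```python
def growth(start: float, end: float) -> tuple[float, float]:
    """Return (end/start multiple, percentage increase)."""
    multiple = end / start
    return multiple, (multiple - 1.0) * 100.0

# Weekly LLM token consumption, in trillions (figures quoted in the text).
token_multiple, token_pct = growth(6.4, 22.7)

# U.S. 5-year power planning, in GW (figures quoted in the text).
power_multiple, power_pct = growth(101.0, 230.0)

print(f"tokens: {token_multiple:.2f}x the starting level, +{token_pct:.0f}%")
print(f"power plan: {power_multiple:.2f}x the starting level, +{power_pct:.0f}%")
```

On these inputs, token consumption ends at about 3.5 times its starting level, which is the same thing as an increase of about 2.5 times.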

The evolution of bottlenecks in the semiconductor industry chain (especially in AI computing power) follows a clear sequence: "Computing (GPU) → Memory (HBM, etc.) → Optical Interconnect → Power/Liquid Cooling". This is the industry consensus for 2025-2026. As AI training and inference clusters scale from single cabinets (dozens of GPUs) to super-large deployments (thousands to hundreds of thousands of GPUs), resolving the bottleneck in one segment immediately exposes the next physical or supply chain constraint. The result is a set of "Leontief-style" complementary constraints: if any one input is missing, nothing can be shipped.
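A minimal sketch of what "Leontief-style" means in practice: with fixed-proportion inputs, the number of cabinets that can ship is the minimum, across inputs, of available supply divided by the amount one cabinet needs. Every number and key name below is a hypothetical placeholder chosen for illustration, not industry data.

```python
def leontief_output(capacity: dict[str, float],
                    need_per_unit: dict[str, float]) -> tuple[float, str]:
    """Deployable units under fixed-proportion (Leontief) inputs.

    Output is the minimum over inputs of capacity / requirement per unit.
    Returns that output and the name of the binding input.
    """
    supported = {name: capacity[name] / need_per_unit[name]
                 for name in need_per_unit}
    binding = min(supported, key=supported.get)
    return supported[binding], binding

# Hypothetical supply of each input, and what a single cabinet consumes.
capacity = {"gpu": 7200, "hbm_tb": 840, "optical_ports": 9000, "power_kw": 9600}
need_per_cabinet = {"gpu": 72, "hbm_tb": 14, "optical_ports": 72, "power_kw": 120}

cabinets, bottleneck = leontief_output(capacity, need_per_cabinet)
print(f"deployable cabinets: {cabinets:.0f}, limited by {bottleneck}")
```

In this made-up scenario there are enough GPUs for 100 cabinets, but only 60 can ship because memory runs out first. Adding more GPUs would change nothing.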

It is necessary to understand why this evolution occurs, the current status, and the underlying physical/engineering reasons:

1. First Phase Bottleneck: GPU Computing (Dominant from 2022-2024)

Core Constraint: High-end GPU (e.g., NVIDIA Hopper H100 → Blackwell B200 → Rubin) wafer capacity itself + advanced packaging.

Why it was the bottleneck: AI large models require massive parallel computing. TSMC's 4nm/3nm/2nm logic processes + CoWoS (2.5D/3D packaging) capacity once became the biggest choke point. Even if front-end wafers were sufficient, the back-end capability to package logic chips + HBM stacks couldn't keep up, preventing the entire GPU from being produced.

Easing situation: TSMC aggressively expanded CoWoS (capacity doubling 2024-2025), NVIDIA Blackwell is shipping in large volumes. But this only unlocked the "computing" segment, immediately exposing new problems.

2. Second Phase Bottleneck: Memory (HBM High Bandwidth Memory, becoming the tightest from 2024-2025)

Core Constraint: HBM3/HBM3e/HBM4 capacity.

Why it became the next bottleneck: GPU computing power increased, but model parameters exploded (trillions to tens of trillions of parameters), making data movement (memory bandwidth) the "memory wall." HBM can transmit several TB of data per second, over 20 times faster than conventional DDR memory. Because HBM is adjacent to the logic chip, data doesn't need to travel far, thus saving energy.
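A back-of-the-envelope sketch of the memory wall: if generating one token requires streaming every weight through the processor once, memory bandwidth caps the token rate no matter how much compute is available. All numbers below are illustrative assumptions (100 billion parameters at one byte each, 8 TB/s for HBM, and a conventional memory 20 times slower, per the ratio in the text), not vendor specifications.

```python
def max_tokens_per_second(param_count: float,
                          bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode rate when every weight is read once per token."""
    bytes_per_token = param_count * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token

PARAMS = 100e9          # assumed: 100 billion parameters
BYTES_PER_PARAM = 1.0   # assumed: 8-bit weights
HBM_BW = 8e12           # assumed: 8 TB/s of HBM bandwidth
DDR_BW = HBM_BW / 20    # assumed: 20x slower conventional memory

hbm_rate = max_tokens_per_second(PARAMS, BYTES_PER_PARAM, HBM_BW)
ddr_rate = max_tokens_per_second(PARAMS, BYTES_PER_PARAM, DDR_BW)

print(f"bandwidth-bound ceiling with HBM: {hbm_rate:.0f} tokens/s")
print(f"bandwidth-bound ceiling with DDR: {ddr_rate:.0f} tokens/s")
```

The ceiling scales linearly with bandwidth and inversely with model size, which is why parameter growth puts pressure on memory before it puts pressure on compute.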

A single B200 GPU carries 192GB or more of HBM3e. Total fast memory in a single NVL72 cabinet (HBM plus CPU-attached memory) has reached 30-40TB, and bandwidth demands far exceed what traditional DRAM can deliver.
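A quick check of the per-cabinet HBM arithmetic, assuming 72 GPUs at the 192 GB lower bound quoted above. On these inputs, HBM alone comes to roughly 13.8 TB per cabinet, so any larger per-cabinet total would have to count other memory as well, for example CPU-attached DRAM.

```python
GPUS_PER_CABINET = 72   # NVL72, as stated in the text
HBM_PER_GPU_GB = 192    # lower bound quoted in the text

hbm_total_gb = GPUS_PER_CABINET * HBM_PER_GPU_GB
hbm_total_tb = hbm_total_gb / 1000  # decimal terabytes

print(f"HBM per cabinet: {hbm_total_gb} GB = {hbm_total_tb:.1f} TB")
```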

Supply chain status: Only SK Hynix, Samsung, and Micron can mass-produce HBM, with complex processes (TSV + stacking). 2025 supply is already sold out, 2026 remains in short supply, with prices soaring 246% year-on-year. Even if GPU chips are ready, without HBM, assembly and delivery are impossible, causing delays in entire AI cluster deployments.

Result: Memory transformed from a "commodity" into a strategic choke point, potentially accounting for 30% of capital expenditures.

3. Third Phase Bottleneck: Optical Interconnect (Transition underway in 2025-2026)

Core Constraint: Physical limits of copper cables (NVLink/NVSwitch) in bandwidth, distance, power consumption, and weight.

Why a shift to optics is inevitable: Copper still works within a single cabinet (72 GPUs), but its limits show when scaling to multi-cabinet interconnects or clusters of thousands of GPUs. Copper cable attenuation is severe (effective distance under 1 meter at 1.8TB/s bandwidth), and weight explodes (an NVL72 cabinet holds more than 5,000 copper cables, with a total weight of 1.36 tons). Conventional pluggable optical modules are not a simple fix either, since using them in place of copper would add an extra 20,000W of power consumption. On this basis, signal integrity, latency, and cooling cannot support larger clusters.

Solution: Shift to optical interconnect (CPO Co-Packaged Optics + Silicon Photonics). Embedding optical engines directly next to the GPU/ASIC, using fiber optics for scale-out, achieving higher bandwidth density, lower per-bit power consumption, and longer distances.
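To put "per-bit power consumption" on a concrete scale, the sketch below divides the 20,000 W optics overhead quoted above by the aggregate bandwidth of 72 GPUs at 1.8 TB/s each. Treating the whole 20,000 W as serving exactly that bandwidth is a simplifying assumption made here for illustration; the result is an order-of-magnitude figure, not a measured specification.

```python
GPUS = 72
BANDWIDTH_BYTES_PER_S = 1.8e12    # 1.8 TB/s per GPU, as quoted in the text
EXTRA_OPTICS_POWER_W = 20_000.0   # extra power for pluggable optics, as quoted

# Aggregate link bandwidth in bits per second (assumed to be fully used).
aggregate_bits_per_s = GPUS * BANDWIDTH_BYTES_PER_S * 8

joules_per_bit = EXTRA_OPTICS_POWER_W / aggregate_bits_per_s
picojoules_per_bit = joules_per_bit * 1e12

print(f"implied energy cost: {picojoules_per_bit:.1f} pJ/bit")
```

Under these assumptions the overhead works out to about 19 pJ per bit, which is the figure that co-packaged designs aim to cut by shortening the electrical path to the optical engine.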

NVIDIA bet heavily on this at GTC 2026 and has invested in optical companies. Demand for 800G/1.6T optical modules is exploding, and companies like Lumentum, Broadcom, Coherent, and Ayar Labs are becoming the new winners.

Current progress: Copper has reached its limit. Optics are shifting from "optional" to "mandatory," breaking through AI data center performance ceilings.

4. Fourth Phase Bottleneck (The Current Frontier): Power + Liquid Cooling (Becoming the ultimate physical constraint from 2026 onwards)

Core Constraint: Power Wall + Cooling Wall + Grid Access.

Why it's the ultimate bottleneck: Per-GPU power consumption has risen from 300W to 700-1200W. Single-cabinet power has surged from 10-20kW (CPU era) to 120-200kW or higher. Traditional air cooling has a physical limit of only 20-50kW, beyond which noise, airflow, and energy consumption become unacceptable.
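The gap between GPU-era cabinets and air cooling can be checked with simple arithmetic. The sketch below counts only the GPUs in a 72-GPU cabinet at the per-GPU figures quoted above; CPUs, switches, and power-conversion losses are deliberately left out, so real cabinet power is higher.

```python
AIR_COOLING_LIMIT_KW = 50.0  # upper end of the 20-50 kW range in the text

def gpu_power_kw(gpus: int, watts_per_gpu: float) -> float:
    """Power drawn by the GPUs alone, in kW (no CPUs, switches, or fans)."""
    return gpus * watts_per_gpu / 1000.0

low = gpu_power_kw(72, 700)    # 72 GPUs at 700 W
high = gpu_power_kw(72, 1200)  # 72 GPUs at 1200 W

for label, kw in (("700 W", low), ("1200 W", high)):
    verdict = "exceeds" if kw > AIR_COOLING_LIMIT_KW else "fits within"
    print(f"72 GPUs at {label}: {kw:.1f} kW, {verdict} the air-cooling limit")
```

Even at the low end, the GPUs alone already sit above the most generous air-cooling figure.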

Power side: Data centers require GW-level power supply, with grid connection queues potentially lasting years. Delivery cycles for transformers, solid-state transformers, and other equipment are extending to 100 weeks. Microsoft's CEO once bluntly stated, "We have GPUs but no electricity to plug them into."

Liquid cooling side: Must switch to Direct-to-Chip liquid cooling or immersion cooling, combined with microfluidics, cold plates, and other technologies. TSMC has demonstrated silicon-based liquid cooling on the CoWoS platform, supporting >2.6kW TDP. Liquid cooling/thermal management companies like Vertiv (VRT) are becoming new infrastructure core players.

Chain reaction: PUE (Power Usage Effectiveness) requirements are <1.2. Waste heat recovery, nuclear/new energy grid integration have become new topics. Even if all previous segments are solved, without power and cooling, cabinets cannot be racked and operated.
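PUE is the ratio of total facility power to IT equipment power, so it translates directly into grid demand. The sketch below assumes a 1 GW IT load (in line with the GW-level scale mentioned above) and compares a PUE of 1.2 against 1.5; the 1.5 comparison point is an assumption chosen for illustration.

```python
def facility_power_mw(it_power_mw: float, pue: float) -> float:
    """Total facility power = IT equipment power x PUE."""
    return it_power_mw * pue

IT_LOAD_MW = 1000.0  # assumed: a 1 GW IT load

total_at_1_2 = facility_power_mw(IT_LOAD_MW, 1.2)
total_at_1_5 = facility_power_mw(IT_LOAD_MW, 1.5)  # assumed comparison point
saved_mw = total_at_1_5 - total_at_1_2

print(f"PUE 1.2: {total_at_1_2:.0f} MW from the grid")
print(f"PUE 1.5: {total_at_1_5:.0f} MW from the grid")
print(f"difference: {saved_mw:.0f} MW")
```

At this scale, each 0.1 of PUE is worth 100 MW of grid capacity, which is why cooling efficiency and grid access are treated as one constraint.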

The Essential Logic of AI Computing Power Industry Chain Bottleneck Shifts

AI computing power is not a "single-point" issue but a systemic Leontief production function: GPU, HBM, interconnect, power, and cooling must all be matched, and output is set by the lowest-capacity component. Each time hyperscalers (Google, Microsoft, Meta, etc.) solve one constraint, they immediately push capital and innovation to the next segment.
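The propagation itself can be sketched as a loop: find the input that supports the fewest cabinets, expand it, and repeat. The starting capacities below are hypothetical and were chosen so that the order matches the sequence described in this article; doubling the binding input each round is likewise an arbitrary assumption.

```python
def binding_input(supported: dict[str, float]) -> str:
    """Name of the input that supports the fewest cabinets."""
    return min(supported, key=supported.get)

# Hypothetical number of cabinets each input could support at the start.
supported = {"compute": 50.0, "memory": 60.0, "optics": 70.0, "power_cooling": 80.0}

sequence = []
for _ in range(4):
    name = binding_input(supported)
    sequence.append(name)
    # Relieve the current bottleneck by doubling its capacity.
    supported[name] *= 2

print(" -> ".join(sequence))
print(f"deployable cabinets after four rounds: {min(supported.values()):.0f}")
```

Each round of investment raises total output only up to the next-lowest input, which then becomes the new constraint.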

Currently (2026), we are in the transition period of "accelerated optical interconnect deployment + large-scale commercialization of power/liquid cooling." New bottlenecks may yet emerge (e.g., lasers, fiber materials, or grid transformers), but this chain of "computing → memory → optics → power/cooling" has become the recognized industry path.

This also explains why the investment logic is shifting from NVIDIA/TSMC to the HBM trio (SK Hynix, etc.), optical manufacturers (Lumentum, Coherent), and liquid cooling/power infrastructure companies (Vertiv, related power supply companies).

Every bottleneck shift is reshaping the value distribution across the entire semiconductor + data center industry chain.

Related Questions

Q: What are the four sequential bottleneck stages in the AI computing power supply chain as described in the article, and which one is identified as the 'ultimate bottleneck'?

A: The four sequential bottleneck stages are: 1) GPU/Computing, 2) Memory (HBM), 3) Optical Interconnect, and 4) Power + Liquid Cooling. The article identifies the fourth stage, Power and Liquid Cooling, as the 'ultimate bottleneck' or final physical constraint, as even if all other components are ready, a lack of power and cooling prevents the AI clusters from running.

Q: Why did High Bandwidth Memory (HBM) become a critical bottleneck after the initial GPU shortage was alleviated?

A: HBM became the critical bottleneck because as GPU computing power increased to handle massive AI models with trillions of parameters, the need for faster data transfer (memory bandwidth) created a 'memory wall.' HBM, which is much faster than traditional DDR memory, is essential for feeding data to these powerful GPUs. Its complex manufacturing process (involving TSVs and stacking) and limited suppliers (SK Hynix, Samsung, Micron) made its supply unable to keep up with explosive demand, delaying entire AI cluster deployments even when GPU chips were available.

Q: According to the article, what is the fundamental reason the industry is transitioning from copper cables to optical interconnects for scaling AI clusters?

A: The fundamental reason is the physical limitations of copper cables. While usable within a single server rack, copper cables face severe signal attenuation, excessive weight (e.g., over 1.36 tons for an NVL72 rack), high power consumption for signal integrity, and distance constraints when scaling to multi-rack clusters with thousands of GPUs. Optical interconnects (like CPO and silicon photonics) offer higher bandwidth density, lower power per bit, and longer transmission distances, making them a necessity for breaking the performance ceiling of large-scale AI data centers.

Q: How does the article characterize the nature of bottlenecks in the AI computing power supply chain, and what investment shift does this logic explain?

A: The article characterizes the bottlenecks as forming a system-level 'Leontief production function,' where components like GPU, HBM, interconnect, power, and cooling are complementary constraints: the system's capacity is determined by the lowest-performing (most bottlenecked) component. This logic explains the shift in investment focus from earlier leaders like NVIDIA and TSMC to companies in subsequent bottleneck areas: HBM suppliers (SK Hynix, etc.), optical component makers (Lumentum, Coherent), and power/cooling infrastructure providers (Vertiv, power companies), as each bottleneck shift reshapes value distribution in the industry chain.

Q: What specific data points from major investment banks does the article cite to illustrate the scale and unpredictability of current AI infrastructure demand?

A: The article cites several independent data points: Morgan Stanley noted that global weekly LLM token consumption rose from 6.4 trillion to 22.7 trillion in 3 months, roughly 3.5 times the starting level. J.P. Morgan identified a 122 GW financing gap for data center projects over 5 years and that 44% of new U.S. power projects face over 4-year grid connection waits. Bank of America significantly raised Alphabet's 2026 CAPEX forecast to $181.5 billion (a doubling year-over-year), expecting a 62% drop in free cash flow. These figures from different research paths collectively show AI demand has exceeded all traditional planning models for power, semiconductor equipment, and memory pricing.
