The Underlying Logic of Bottleneck Propagation in the AI Computing Power Industry Chain

marsbit | Published 2026-05-22 | Updated 2026-05-22

Introduction

The article analyzes the evolving bottleneck progression within the AI compute supply chain. Initially constrained by GPU chip and advanced packaging capacity (2022-2024), the primary bottleneck shifted to HBM memory (2024-2025) due to massive model parameter growth. As cluster scale expands, physical limits of copper interconnects are making optical interconnect technologies the next critical phase (2025-2026). The ultimate, emerging constraint is power delivery and advanced liquid cooling (from 2026 onward), driven by skyrocketing rack power densities exceeding traditional infrastructure limits. The core thesis is that AI compute demand follows a "Leontief" production function where solving one bottleneck immediately exposes the next in the sequence: Compute (GPU) → Memory (HBM) → Interconnect (Optics) → Power & Cooling. Each shift reallocates value and investment across the semiconductor and infrastructure landscape.

Author: qinbafrank

The February article "What Does This War of Capital Expenditure Mean?" argued that key segments of the computing power industry chain can still capture the greatest value: chips, packaging & testing, memory, optical modules, etc. Segments whose capacity is difficult to expand rapidly, or which have extremely high moats, will enjoy the windfall of massive capital expenditures.

There is still significant room for efficiency optimization: distillation, quantization, MoE, dedicated chips, and liquid cooling on the inference side, along with nuclear fusion in the long term, may reduce the energy consumption and cost per unit of computing power by another 10–100 times. Opportunities should be sought in these segments.

Recently, multiple investment banks including Morgan Stanley, J.P. Morgan, Bank of America, Goldman Sachs, UBS, Citi, Bernstein, and HSBC have published updated reports on AI/semiconductors/power/memory. The bottlenecks for AI hardware have expanded from the single dimension of "GPU supply" to collective tension across five dimensions: power, chips, memory, equipment, and materials.

The scale of AI demand has broken through the forecast intervals of all traditional power planning, semiconductor equipment capacity, memory price models, and robot installation assumptions.

Morgan Stanley's global thematic research review points out that global weekly large language model token consumption soared from 6.4 trillion to 22.7 trillion within 3 months, roughly 3.5 times the starting level (an increase of about 2.5 times). The U.S. data center power gap for 2025-28 is 55 GW. J.P. Morgan's inaugural coverage of data center high-performance computing project debt directly gives a "122 GW financing gap in the next 5 years" figure. U.S. 5-year power planning has surged from 101 GW to 230 GW, with 44% of new projects experiencing grid connection wait times exceeding 4 years. Bank of America's latest target price report for Alphabet directly revises its 2026 capital expenditure upward to $181.5 billion, doubling year-on-year, with free cash flow declining 62%. These three sets of data are not outputs from the same framework, but independent portraits from three separate institutions on different research paths.
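The growth figures quoted above are easy to misread, because "an increase of N times" and "N times the starting level" differ by one. The following is a minimal sketch that recomputes both readings from the numbers in the text; the helper name `growth` is purely illustrative.

```python
def growth(start: float, end: float) -> tuple[float, float]:
    """Return (multiple of the starting level, fractional increase over it)."""
    multiple = end / start
    return multiple, multiple - 1.0


# Figures quoted in the text above.
token_multiple, token_increase = growth(6.4, 22.7)     # trillion tokens per week
power_multiple, power_increase = growth(101.0, 230.0)  # GW in U.S. 5-year power planning

print(f"tokens: {token_multiple:.2f}x the starting level, +{token_increase:.0%}")
print(f"power planning: {power_multiple:.2f}x the starting level, +{power_increase:.0%}")
```

Under this reading, weekly token consumption ended the period at about 3.5 times its starting level, while planned power roughly doubled.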

The bottlenecks in the semiconductor industry chain (especially in the AI computing power field) are evolving in a clear sequential order: "Computing (GPU) → Memory (HBM, etc.) → Optical Interconnect → Power/Liquid Cooling". This is the industry consensus for 2025-2026. As AI training/inference clusters scale from single cabinets (dozens of GPUs) to super-large scale (thousands to hundreds of thousands of GPUs), each time a bottleneck in one segment is resolved, the next physical/supply chain constraint is immediately exposed, forming "Leontief-style" complementary constraints (if one is missing, nothing can be shipped).

It is necessary to understand why this evolution occurs, the current status, and the underlying physical/engineering reasons:

1. First Phase Bottleneck: GPU Computing (Dominant from 2022-2024)

Core Constraint: High-end GPU (e.g., NVIDIA Hopper H100 → Blackwell B200 → Rubin) wafer capacity itself + advanced packaging.

Why it was the bottleneck: AI large models require massive parallel computing. TSMC's 4nm/3nm/2nm logic processes + CoWoS (2.5D/3D packaging) capacity once became the biggest choke point. Even if front-end wafers were sufficient, the back-end capability to package logic chips + HBM stacks couldn't keep up, preventing the entire GPU from being produced.

Easing situation: TSMC aggressively expanded CoWoS (capacity doubling 2024-2025), NVIDIA Blackwell is shipping in large volumes. But this only unlocked the "computing" segment, immediately exposing new problems.

2. Second Phase Bottleneck: Memory (HBM High Bandwidth Memory, becoming the tightest from 2024-2025)

Core Constraint: HBM3/HBM3e/HBM4 capacity.

Why it became the next bottleneck: GPU computing power increased, but model parameters exploded (trillions to tens of trillions of parameters), making data movement (memory bandwidth) the "memory wall." HBM can transmit several TB of data per second, over 20 times faster than conventional DDR memory. Because HBM is adjacent to the logic chip, data doesn't need to travel far, thus saving energy.

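A single B200 GPU requires 192GB+ of HBM3e. At that figure, the 72 GPUs in a single NVL72 cabinet carry about 13.8TB of HBM (72 × 192GB); the 30-40TB often cited for the cabinet is consistent with total fast memory (HBM plus CPU-attached memory) rather than HBM alone. Either way, bandwidth demands far exceed what traditional DRAM can supply.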

Supply chain status: Only SK Hynix, Samsung, and Micron can mass-produce HBM, with complex processes (TSV + stacking). 2025 supply is already sold out, 2026 remains in short supply, with prices soaring 246% year-on-year. Even if GPU chips are ready, without HBM, assembly and delivery are impossible, causing delays in entire AI cluster deployments.

Result: Memory transformed from a "commodity" into a strategic choke point, potentially accounting for 30% of capital expenditures.

3. Third Phase Bottleneck: Optical Interconnect (Transition underway in 2025-2026)

Core Constraint: Physical limits of copper cables (NVLink/NVSwitch) in bandwidth, distance, power consumption, and weight.

Why a shift to optics is inevitable: Copper still works within a single cabinet (72 GPUs), but it does not scale to multi-cabinet clusters or interconnects spanning thousands of GPUs. Copper cable attenuation is severe (effective distance <1 meter at 1.8TB/s bandwidth) and weight explodes (an NVL72 cabinet contains more than 5,000 copper cables with a total weight of 1.36 tons). Conventional pluggable optical modules are not a free fix either: substituting them for the copper in such a cabinet would add an extra 20,000W of power consumption. Signal integrity, latency, and cooling therefore cannot support larger clusters on the current approach.

Solution: Shift to optical interconnect (CPO Co-Packaged Optics + Silicon Photonics). This embeds optical engines directly next to the GPU/ASIC and uses fiber optics for scale-out, achieving higher bandwidth density, lower per-bit power consumption, and longer distances.

NVIDIA bet heavily on this at GTC 2026 and has invested in optical companies. Demand for 800G/1.6T optical modules is exploding. Companies like Lumentum, Broadcom, Coherent, and Ayar Labs are becoming the new winners.

Current progress: Copper has reached its limit. Optics are shifting from "optional" to "mandatory," breaking through AI data center performance ceilings.

4. Fourth Phase Bottleneck (The Current Frontier): Power + Liquid Cooling (Becoming the ultimate physical constraint from 2026 onwards)

Core Constraint: Power Wall + Cooling Wall + Grid Access.

Why it's the ultimate bottleneck: Each GPU's power consumption rose from 300W to 700-1200W. Single cabinet power surged from 10-20kW (CPU era) to 120-200kW+ or even higher. Traditional air cooling has a physical limit of only 20-50kW per cabinet, beyond which noise, airflow, and energy consumption become unacceptable.
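The arithmetic behind the cooling wall can be sketched in a few lines. The example below multiplies the per-GPU power figures from the text by the 72 GPUs of an NVL72-style cabinet and compares the result with the 20-50kW air-cooling range quoted above. Applying 72 GPUs to every power level is an assumption made for illustration, the function names are hypothetical, and the figures cover GPUs only (CPUs, switches, and fans would add more).

```python
AIR_COOLING_LIMIT_KW = (20.0, 50.0)  # air-cooling range quoted in the text


def gpu_rack_power_kw(num_gpus: int, watts_per_gpu: float) -> float:
    """Power drawn by the GPUs alone, in kW (ignores CPUs, switches, fans)."""
    return num_gpus * watts_per_gpu / 1000.0


def cooling_regime(rack_kw: float) -> str:
    """Classify a rack against the quoted air-cooling range."""
    low, high = AIR_COOLING_LIMIT_KW
    if rack_kw <= low:
        return "air"
    if rack_kw <= high:
        return "air (at the limit)"
    return "liquid"


for watts in (300, 700, 1200):
    kw = gpu_rack_power_kw(72, watts)  # 72 GPUs per cabinet is an assumption
    print(f"{watts} W/GPU x 72 GPUs = {kw:.1f} kW -> {cooling_regime(kw)}")
```

Even before counting anything other than GPUs, a 72-GPU cabinet at 700W per GPU already sits just above the upper end of the air-cooling range, which is consistent with the text's point that liquid cooling stops being optional at this density.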

Power side: Data centers require GW-level power supply, with grid connection queues potentially lasting years. Delivery cycles for transformers, solid-state transformers, and other equipment are extending to 100 weeks. Microsoft's CEO once bluntly stated, "We have GPUs but no electricity to plug them into."

Liquid cooling side: Must switch to Direct-to-Chip liquid cooling or immersion cooling, combined with microfluidics, cold plates, and other technologies. TSMC has demonstrated silicon-based liquid cooling on the CoWoS platform, supporting >2.6kW TDP. Liquid cooling/thermal management companies like Vertiv (VRT) are becoming new infrastructure core players.

Chain reaction: PUE (Power Usage Effectiveness) requirements are <1.2. Waste heat recovery, nuclear/new energy grid integration have become new topics. Even if all previous segments are solved, without power and cooling, cabinets cannot be racked and operated.
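PUE is conventionally defined as total facility power divided by the power delivered to IT equipment, so a target of 1.2 caps cooling and power-conversion overhead at 20% of the IT load. A minimal sketch of that relationship follows; the 120kW rack is taken from the range quoted in the text, while the function names and the 144kW facility draw are illustrative assumptions.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def max_overhead_kw(it_equipment_kw: float, target_pue: float = 1.2) -> float:
    """Largest non-IT load (cooling, conversion losses) allowed under a PUE target."""
    return it_equipment_kw * (target_pue - 1.0)


rack_kw = 120.0  # low end of the 120-200kW cabinet range in the text
print(f"PUE at 144 kW total draw: {pue(144.0, rack_kw):.2f}")
print(f"Overhead budget at PUE 1.2: {max_overhead_kw(rack_kw):.1f} kW")
```

For a 120kW rack this leaves about 24kW for everything that is not compute, which helps explain why the efficiency of the cooling system itself becomes a design constraint.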

The Essential Logic of AI Computing Power Industry Chain Bottleneck Shifts

AI computing power is not a "single-point" issue, but a systemic Leontief production function: GPU, HBM, interconnect, power, and cooling must all be matched, and deliverable output is set by the lowest-capacity component. Each time hyperscalers (Google, Microsoft, Meta, etc.) solve one bottleneck, they immediately push capital and innovation to the next segment.
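A Leontief production function sets output to the minimum, across inputs, of available capacity divided by the amount required per unit of output. The sketch below is a toy model of that idea, not a forecast: the capacity numbers are hypothetical, chosen only so that relieving each binding constraint exposes the next one in the order the article describes.

```python
def leontief_output(capacity: dict, requirement: dict) -> float:
    """Deliverable units: min over inputs of capacity / per-unit requirement."""
    return min(capacity[name] / requirement[name] for name in requirement)


def binding_constraint(capacity: dict, requirement: dict) -> str:
    """Name of the input that currently limits output."""
    return min(requirement, key=lambda name: capacity[name] / requirement[name])


# Hypothetical capacities, expressed in "clusters' worth" of each input.
capacity = {"gpu": 40.0, "hbm": 60.0, "optics": 80.0, "power_cooling": 100.0}
requirement = {name: 1.0 for name in capacity}  # one unit of each per cluster

sequence = []
for _ in range(4):
    name = binding_constraint(capacity, requirement)
    output = leontief_output(capacity, requirement)
    sequence.append(name)
    print(f"output = {output:.0f} clusters, limited by {name}")
    capacity[name] = 200.0  # hypothetical expansion of the bottleneck segment

print(" -> ".join(sequence))
```

In this toy run, expanding the limiting input raises output only as far as the next-scarcest input allows, which is the mechanism behind the capital rotation from one segment to the next.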

Currently (2026), we are in the transition period of "accelerated optical interconnect deployment + large-scale commercialization of power/liquid cooling." New bottlenecks may yet emerge (e.g., lasers, fiber materials, or grid transformers), but this chain of "computing → memory → optics → power/cooling" has become the recognized industry path.

This also explains why the investment logic is shifting from NVIDIA/TSMC to the HBM trio (SK Hynix, etc.), optical manufacturers (Lumentum, Coherent), and liquid cooling/power infrastructure companies (Vertiv, related power supply companies).

Every bottleneck shift is reshaping the value distribution across the entire semiconductor + data center industry chain.

Related Questions

Q: What are the four sequential bottleneck stages in the AI computing power supply chain as described in the article, and which one is identified as the 'ultimate bottleneck'?

A: The four sequential bottleneck stages are: 1) GPU/Computing, 2) Memory (HBM), 3) Optical Interconnect, and 4) Power + Liquid Cooling. The article identifies the fourth stage, Power and Liquid Cooling, as the 'ultimate bottleneck' or final physical constraint, as even if all other components are ready, a lack of power and cooling prevents the AI clusters from running.

Q: Why did High Bandwidth Memory (HBM) become a critical bottleneck after the initial GPU shortage was alleviated?

A: HBM became the critical bottleneck because as GPU computing power increased to handle massive AI models with trillions of parameters, the need for faster data transfer (memory bandwidth) created a 'memory wall.' HBM, which is much faster than traditional DDR memory, is essential for feeding data to these powerful GPUs. Its complex manufacturing process (involving TSVs and stacking) and limited suppliers (SK Hynix, Samsung, Micron) made its supply unable to keep up with explosive demand, delaying entire AI cluster deployments even when GPU chips were available.

Q: According to the article, what is the fundamental reason the industry is transitioning from copper cables to optical interconnects for scaling AI clusters?

A: The fundamental reason is the physical limitations of copper cables. While usable within a single server rack, copper cables face severe signal attenuation, excessive weight (e.g., about 1.36 tons for an NVL72 rack), high power consumption, and distance constraints when scaling to multi-rack clusters with thousands of GPUs. Optical interconnects (like CPO and silicon photonics) offer higher bandwidth density, lower power per bit, and longer transmission distances, making them a necessity for breaking the performance ceiling of large-scale AI data centers.

Q: How does the article characterize the nature of bottlenecks in the AI computing power supply chain, and what investment shift does this logic explain?

A: The article characterizes the bottlenecks as forming a system-level 'Leontief production function,' where components like GPU, HBM, interconnect, power, and cooling are complementary constraints: the system's capacity is determined by the lowest-capacity (most bottlenecked) component. This logic explains the shift in investment focus from earlier leaders like NVIDIA and TSMC to companies in subsequent bottleneck areas: HBM suppliers (SK Hynix, etc.), optical component makers (Lumentum, Coherent), and power/cooling infrastructure providers (Vertiv, power companies), as each bottleneck shift reshapes value distribution across the industry chain.

Q: What specific data points from major investment banks does the article cite to illustrate the scale and unpredictability of current AI infrastructure demand?

A: The article cites several independent data points: Morgan Stanley noted global weekly LLM token consumption rising from 6.4 trillion to 22.7 trillion in 3 months (an increase of about 2.5x). J.P. Morgan identified a 122 GW financing gap for data center projects over 5 years and that 44% of new U.S. power projects face over 4-year grid connection waits. Bank of America significantly raised Alphabet's 2026 CAPEX forecast to $181.5 billion (a doubling year-over-year), expecting a 62% drop in free cash flow. These figures from different research paths collectively show AI demand has exceeded all traditional planning models for power, semiconductor equipment, and memory pricing.
