Editor's Note: As AI transitions from a "tool" to a "workflow infrastructure," GPU rental prices are accelerating upwards, with supply continuously tightening.
From the nearly 40% price surge in H100 one-year contracts to computing power being locked in until the second half of 2026, and AI labs continuously securing supply through long-term contracts and renewal mechanisms, the operating logic of the GPU market has fundamentally changed: prices are no longer primarily determined by hardware costs but are shaped by token consumption, model capabilities, and production efficiency.
Changes on the demand side are particularly critical. New paradigms like multi-agent systems, native content generation, and AI programming tools are driving token usage into an exponential growth phase. The core conclusion of the report is also becoming clear: the return on investment (ROI) of AI tools has been validated, with 5–10x returns making it difficult for computing power prices to effectively constrain demand for a considerable period.
The resulting tension is increasingly evident: the real-world computing power market shows across-the-board shortages, with pricing power shifting toward suppliers, while the capital market remains stuck in the expectation of "eventual oversupply and commoditization." This misalignment between expectations and reality is reshaping the valuation logic of the AI infrastructure sector.
As computing power becomes a new factor of production, its pricing mechanism, supply structure, and capital returns are undergoing a deep restructuring.
The following is the original text:
Demand for Anthropic's Claude 4.6 Opus and Claude Code has surged. Anthropic's Annual Recurring Revenue (ARR) leaped from $9 billion at the end of last year to over $25 billion today, nearly tripling in just one quarter. Meanwhile, open-source models such as GLM and Kimi K2.5 have rapidly expanded the range of applications built on open-source models. Continued fundraising by companies including Anthropic, OpenAI, and several Neolabs is also intensifying the demand for GPU resources.
This inflection point means demand has risen sharply in a short period, triggering a GPU buying frenzy among hyperscalers and emerging cloud service providers (Neoclouds).
This new demand is pushing prices higher along the entire supply chain, from DRAM and NAND storage to fiber optic cables, data center colocation, and infrastructure like gas turbines—almost all related products and services are experiencing price increases.
GPU rental prices have become the latest area among computing power-related products and services to experience supply tightness and price surges. The price of a one-year H100 GPU rental contract rose from a low of $1.70 per GPU per hour in October 2025 to $2.35 in March 2026, an increase of nearly 40%.
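The size of this move can be checked with a few lines of Python. This is only an illustrative sketch: the two prices come from the text, while the helper name `pct_change` and the assumption that a GPU is billed for all 8,760 hours of a 365-day year are hypothetical choices made for the example.

```python
def pct_change(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


low_oct_2025 = 1.70  # $/GPU/hr, one-year H100 contract low (from the text)
mar_2026 = 2.35      # $/GPU/hr, March 2026 level (from the text)

increase = pct_change(low_oct_2025, mar_2026)

# Assumption: the GPU is billed for every hour of a 365-day year.
hours_per_year = 24 * 365
extra_per_gpu_year = (mar_2026 - low_oct_2025) * hours_per_year

print(f"Increase: {increase:.1f}%")                         # Increase: 38.2%
print(f"Extra cost per GPU-year: ${extra_per_gpu_year:,.0f}")  # $5,694
```

Under that billing assumption, the $0.65 hourly difference adds roughly $5,700 per GPU per year to a one-year contract.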
On-demand GPU rental capacity is almost completely sold out across all models—users who have secured on-demand instances are unwilling to release computing power back to the market even after price increases. In early 2026, finding GPU computing power was almost like trying to snag a ticket for the "last flight out": prices were high, and tickets were scarce. A more apt analogy might be "finding a channel to buy medicine."
At SemiAnalysis, we have long tracked trends and key issues across the Neocloud and hyperscaler ecosystem in depth, including GPU rental prices. This capability stems from our ongoing research and practice in projects like ClusterMAX, InferenceX, and AI Cloud Total Cost of Ownership (TCO).
Simultaneously, we invest significant effort in helping various AI labs connect with Neocloud service providers, search for GPU rental resources on the market, and continuously exchange insights on GPU rental price trends with almost all participants in the ecosystem.
Since 2023, we have established and maintained a GPU rental price index system for our clients, covering mainstream GPU models (such as H100, H200, B200, B300, GB200, GB300, MI300, MI325, MI355) across different lease terms, from on-demand and 1-month short-term leases to long-term contracts of up to 5 years. This index is built based on survey data from multiple Neocloud service providers and computing power buyers, cross-validated with actual transaction data and our participation in facilitating negotiations and deals.
Today, we are making the SemiAnalysis H100 One-Year GPU Rental Price Index publicly available, hoping to provide the industry with more data and insights. This index is updated monthly, and we will also continuously publish the latest trend interpretations and market observations via X and LinkedIn. As for the complete pricing data covering different lease structures and other mainstream GPU models, it is currently only available to institutional subscribers of our AI Cloud TCO model.
This report will focus on the latest trends in the GPU rental market, firsthand market observations, and key data, analyzing how we understand the overall market structure and providing a preliminary judgment on the future direction of rental prices.
GPU Rental Market Enters "Dynamic Pricing" Phase
Looking solely at the H100 one-year rental price curve is insufficient to fully capture the market's tightness—our actual experiences sourcing computing power on the front lines and feedback from market participants paint a more severe picture.
Current demand comes from multiple highly heterogeneous use cases, with almost no "one-size-fits-all" solution. For instance, on the inference side, large-scale Mixture-of-Experts (MoE) models are better suited to run on the latest large-scale systems like the GB300 NVL72; whereas on the training side, H100 still holds a cost-performance advantage, keeping demand for even relatively "older generation" GPUs high.
Clients are now even scrambling to pay $14 per GPU per hour for AWS p6-b200 spot instance prices; some leading Neocloud providers have stopped selling single nodes; renewal prices for some H100 contracts are identical to those signed two or three years ago; and some H100 contracts have been directly renewed until 2028, a lease term of 4 years. Finding even an 8-node (64 GPU) H100 or H200 cluster is not easy now—half the providers we asked were completely sold out, and most replied that no Hopper architecture GPUs would be released from expiring contracts anytime soon.
We've even heard that some computing power lessees have started subdividing and subletting the clusters they've rented, much like splitting apartments for short-term rentals during the Monaco Grand Prix. The emergence of so-called "Neocloud subletters" might not be a joke anymore.
Blackwell supply is also extremely tight. We understand that due to strong demand for open-weight models and the ongoing inference boom, the deployment and delivery cycle for new Blackwell clusters has now extended to June-July. Moreover, these upcoming clusters are mostly pre-booked. In fact, looking at the entire market, almost all new capacity scheduled to come online until August-September 2026 has already been reserved.
GPU Rental Prices: Making a Comeback
But how did the market get here? Just 6 months ago, most market observers were skeptical about the GPU's "terminal value" and generally believed GPU rental prices would inevitably decline over time. Back then, a Neocloud or hyperscaler that used a 6-year depreciation cycle for GPU computing assets in its financial models might even have been criticized by financial analysts. Before discussing future trends, let's quickly review how things evolved to this point.
Before the second half of 2025, the mainstream expectation across the ecosystem was that with the large-scale deployment of Blackwell and its significantly lower cost per unit of compute, Hopper (i.e., H100 and H200) rental prices would noticeably fall. The opposite happened. By H2 2025, H100 demand not only didn't weaken but intensified in many scenarios. The rapid adoption of open-weight models and the continued acceleration of inference demand at that time were the earliest signals of this near-limitless wave of computing demand.
By January 2026, the computing power market reached its next inflection point: DRAM and NAND storage prices, after several quarters of rapid increases, began a near-"parabolic" surge. According to our storage models, LPDDR5 and DDR5 contract prices rose by close to 4x and 5x year-on-year, respectively, in Q1 2026.
To mitigate margin risks from sharply rising component costs, OEMs began raising AI server prices, with increases significantly higher than the underlying component price hikes themselves. This complicated cluster capital expenditure decisions: higher server procurement costs compressed project expected returns, forcing some operators to slow deployment pace or even cancel projects outright. The result was that some potential new supply was delayed or shelved, further exacerbating the tightness in the rental market.
Amid this procurement chaos triggered by "AI server pricing getting out of control," GPU rental demand accelerated significantly, and the remaining computing power on the market was almost completely absorbed in January and February. By March, available capacity was nearly impossible to find for H100, H200, or B200 across any lease term. One-year rental prices broke through $2 per GPU per hour by the end of January and rose another 15%–20% from late January levels by mid-to-late February, with an expected further 15%–20% month-on-month increase by the end of March.
A key driver of demand earlier this year came from native media generation. Applications like Seedance and Nano Banana are driving users to generate and iterate images and videos at scale, significantly increasing token throughput. But a more critical and visible source of demand is the rise of multi-agent workloads—these systems execute multi-step processes, continuously iterating in high-concurrency environments, driving token consumption and computing demand in an "exponential" growth pattern.
This trend is particularly evident in the data related to Claude Code, which we have mentioned in several articles. Taking SemiAnalysis as an example, in just the past 7 days, the company internally consumed billions of tokens, at an average cost of about $5 per million tokens. But the resulting time savings, workflow expansion, and capability enhancements far exceeded the cost itself. Today, SemiAnalysis has embedded a suite of AI tools into multiple workflows, no longer limited to simple search and summarization but extending to data dashboards, automated scraping, large-scale data processing, and agent-based financial modeling.
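To give a sense of scale, the spend implied by these figures can be sketched as follows. The $5 per million tokens comes from the text; the weekly token counts are hypothetical, since the text only says "billions of tokens," and `token_spend_usd` is an illustrative helper name.

```python
def token_spend_usd(tokens: int, usd_per_million: float = 5.0) -> float:
    """Cost in USD for a token count at a blended per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


# Hypothetical weekly volumes; the text does not give an exact figure.
for tokens in (1_000_000_000, 3_000_000_000):
    print(f"{tokens:,} tokens -> ${token_spend_usd(tokens):,.0f}")
```

At this price, each billion tokens costs about $5,000, so a week of "billions of tokens" lands in the low tens of thousands of dollars at most.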
We also track this explosive demand growth through metrics like Claude Commits Daily. On the current trend, we expect Claude Code to account for over 20% of all code commits by the end of 2026. It's fair to say that, while few were paying attention, AI has begun "eating" the entire software development process. Institutional clients interested in accessing this dataset can contact our API team. A sneak peek: commit volume is already significantly higher than when we first released this metric.
In our circle, almost everyone is a heavy user of Claude Code. But we also know this circle is deeply immersed in AI and semiconductors, essentially just "a small group on the front lines."
For many Fortune 500 companies and the broader public, Claude Code and the "agent world" are merely slightly novel fringe topics, occasionally appearing in Facebook feeds or NPR podcasts. They have hardly realized that a productivity wave and structural shock driven by agents is approaching.
As more participants from the real economy gradually realize the astonishing ROI offered by using AI tools and join this "computing power wave," token consumption will continue to see step-like increases. The debate about AI ROI is, in fact, settled—the value created by using AI tools often exceeds their cost by an order of magnitude. Against this backdrop, the continuous rightward shift of the token demand curve is forming a strong and (at this stage) relatively inelastic force pushing GPU rental prices higher.
Simply put, if the ROI from using AI tools can reach 5–10x, then GPU rental prices still have considerable room to rise before they truly start to suppress demand. We also cannot rule out the possibility that further increases in rental prices will continue to be passed upstream, pushing server and core component costs even higher.
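The headroom argument can be made concrete with a deliberately simple sketch. It rests on assumptions that are not in the text: that a fixed fraction of an AI tool's cost (`compute_share`, a hypothetical parameter) scales linearly with GPU rental prices, that the rest of the cost is fixed, and that the value created does not change. Real pass-through is likely messier.

```python
def roi_after_price_change(base_roi: float, price_multiple: float,
                           compute_share: float = 1.0) -> float:
    """ROI multiple after GPU rental prices are multiplied by price_multiple.

    compute_share is the assumed fraction of the tool's cost that scales
    with GPU rental prices; the remainder is treated as fixed.
    """
    new_cost = (1.0 - compute_share) + compute_share * price_multiple
    return base_roi / new_cost


def breakeven_price_multiple(base_roi: float,
                             compute_share: float = 1.0) -> float:
    """Price multiple at which value created just equals cost (ROI = 1x)."""
    return (base_roi - 1.0 + compute_share) / compute_share


for roi in (5.0, 10.0):
    print(roi, roi_after_price_change(roi, 2.0), breakeven_price_multiple(roi))
```

Under full pass-through (`compute_share=1.0`), a doubling of rental prices would still leave a 5x tool at 2.5x, and ROI would only fall to 1x once prices rose by the same multiple as the original ROI.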
SemiAnalysis H100 One-Year Rental Price Index Release
Today, we are making the SemiAnalysis H100 One-Year Rental Contract Price Index freely available to the public, aiming to enhance market awareness and transparency regarding GPU rental price trends.
This index is built based on monthly survey data from over 100 market participants (including Neocloud providers, computing power buyers, and sellers) to determine the representative range (25th to 75th percentile) of GPU rental prices. It is also cross-validated with actual transaction data, and we facilitate deals between buyers and sellers within our network, directly participating in some transactions to further calibrate price levels.
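As an illustration of what a 25th to 75th percentile range looks like in practice, here is a minimal Python sketch. The quotes are hypothetical and are not index data, and the choice of the standard library's `statistics.quantiles` with the `"inclusive"` method is an assumption; the text does not describe how the percentiles are actually computed.

```python
from statistics import quantiles

# Hypothetical one-year H100 quotes in $/GPU/hr (illustrative, not index data).
quotes = [2.10, 2.20, 2.25, 2.30, 2.35, 2.40, 2.45, 2.50, 2.60]


def representative_range(prices: list[float]) -> tuple[float, float]:
    """Return the 25th and 75th percentiles of the surveyed prices."""
    q1, _median, q3 = quantiles(prices, n=4, method="inclusive")
    return q1, q3


low, high = representative_range(quotes)
print(f"25th-75th percentile range: ${low:.2f} to ${high:.2f} per GPU-hour")
```

Reporting the interquartile range rather than a single average keeps outlier quotes, such as distressed sellers or unusually small deals, from moving the headline number.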
Since 2023, we have continuously tracked contract prices for GPUs including H100, H200, B200, B300, GB200, GB300 across lease terms from 3 months to 5 years; data for the AMD series (MI300, MI325, MI355) is also included.
Compared to existing GPU indices on the market, the SemiAnalysis H100 One-Year Contract Price Index has several key differences:
First, many GPU rental indices are based on spot/on-demand quotes or publicly listed prices, but in reality the vast majority of GPU rental transactions are completed through long-term contracts, typically with terms of 6 months or more. These prices are usually formed through bilateral negotiations and do not appear in any public database. Most large Neocloud providers prefer leases of at least 1 year; 2–3 years is better still, and 5-year large-scale offtake agreements are the most attractive. The SemiAnalysis H100 One-Year Rental Index focuses precisely on this "contract market," where actual transaction volume is most concentrated. By clearly targeting a specific lease term, the index also makes it easier for users to understand the market segment it covers and compare it with their own observations.
Second, publicly disclosed prices do not represent actual transaction prices. Prices published by hyperscalers and Neoclouds provide more of a directional reference for trends rather than actual transaction levels. These prices often lag behind changes in the contract market, usually adjusting only after computing demand has already shifted. Especially in the on-demand market, prices are often set at relatively fixed levels, while actual supply-demand changes are reflected through utilization or occupancy rates, with adjustments made only when necessary. This market mechanism will be discussed further later in the article.
Third, while many indices can process large-scale quote, price, and transaction data and offer advantages in trend analysis, our approach emphasizes direct interaction with market participants. Behind every quote and every transaction there is specific context and decision logic. We aim to complement quantitative data with these qualitative insights and frontline observations to reconstruct the true structure of the GPU rental market more fully.
For institutional subscribers, we also provide complete term structure data covering almost the entire mainstream GPU rental market.
Alongside releasing the H100 One-Year Contract Price Index, we have also launched the SemiAnalysis Tokenomics Dashboard for institutional Tokenomics model subscribers, to track and understand the frontier AI model landscape. This dashboard allows users to perform custom comparisons across dimensions like code, reasoning, math, and agent evaluation, compare API pricing across different models and service providers, and view key data disclosed by major AI labs, including token usage, revenue, valuation, and customer scale.
Current Structure of the GPU Rental Market
Before the second half of 2025, the pricing environment in the GPU rental market was more competitive. At that time, operators had ample GPU inventory, and end demand was just beginning to accelerate. Competition among Neocloud service providers was therefore fierce: they generally competed for customers with more attractive prices, with the core goal of increasing utilization and "extracting" as much value as possible from existing computing assets before the next GPU iteration cycle arrived.
Since then, the market landscape has done a 180-degree turn. Today, Neoclouds and hyperscalers completely hold the initiative: they can demand higher upfront payments, better pricing, longer contract terms, and even choose contract start and end dates themselves to match their own inventory and capacity plans. Time is also on the supply side's side: they can proceed with deployment at their own pace and, in a continuously rising price environment, gradually select the highest-quality customer mix.
Structurally, the GPU rental market can be roughly divided into three segments, corresponding to different types of customer demand:
Short-Term Leases: On-demand, spot, and contracts under 3 months
Mid-Term Contracts: Contracts from 3 months to over 3 years
Long-Term Offtakes: 4–5 year contracts, with 5 years being most common
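The three segments can be expressed as a small lookup by lease term. This is a sketch only: the text leaves the boundary between mid-term and long-term slightly open ("over 3 years" versus "4–5 year"), so the 48-month cutoff below is an assumption, as is treating on-demand and spot as a term of 0 months.

```python
def classify_lease(term_months: float) -> str:
    """Map a lease term in months to one of the three market segments."""
    if term_months < 0:
        raise ValueError("term_months must be non-negative")
    if term_months < 3:
        return "short-term lease"   # on-demand, spot, contracts under 3 months
    if term_months < 48:            # assumed cutoff between mid- and long-term
        return "mid-term contract"
    return "long-term offtake"      # 4-5 year contracts


for months in (0, 1, 12, 36, 48, 60):
    print(f"{months:>2} months -> {classify_lease(months)}")
```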
Short-Term Leases: On-Demand, Spot, and Sub-3-Month Contracts
Short-term leases are at the very front end of the entire term structure and often correspond to "excess capacity." However, some providers (like Runpod, Lambda) specialize in providing sizable, flexible on-demand or spot computing power.
It's important to note that the pricing mechanism of the on-demand market differs significantly from other contract markets. Typically, service providers set a relatively fixed price level for on-demand resources and adjust it only in rare circumstances. In other words, prices in the short-term market are not entirely driven by real-time supply and demand but rather reflect market tightness through changes in resource utilization.
Service providers usually make one-time adjustments to prices based on resource utilization: when utilization is low, they stimulate demand by lowering prices; when utilization is near full capacity, they raise prices because demand can still be sustained even at higher price levels.
This also explains why, viewed over time, the on-demand prices published by Neoclouds often remain unchanged for long periods before suddenly experiencing "jump-like" increases or decreases. For the on-demand market, the true high-frequency indicator of demand change is not price, but resource utilization.
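The "flat, then jump" pattern described above can be reproduced with a toy step-pricing rule. Everything numeric here is hypothetical: the utilization thresholds, the 10% step size, the starting price, and the function name `adjust_on_demand_price` are assumptions for illustration, not any provider's actual policy.

```python
def adjust_on_demand_price(price: float, utilization: float,
                           low: float = 0.60, high: float = 0.95,
                           step: float = 0.10) -> float:
    """Return the next list price given current utilization (0.0 to 1.0).

    The price only moves when utilization crosses a threshold; in between,
    it stays fixed and tightness shows up in utilization instead.
    """
    if utilization >= high:
        return round(price * (1 + step), 2)  # near full: raise the price
    if utilization <= low:
        return round(price * (1 - step), 2)  # slack capacity: cut the price
    return price                             # otherwise leave it unchanged


price = 3.00  # hypothetical starting list price, $/GPU/hr
path = []
for utilization in (0.70, 0.80, 0.90, 0.97, 0.98):
    price = adjust_on_demand_price(price, utilization)
    path.append(price)

print(path)  # [3.0, 3.0, 3.0, 3.3, 3.63]
```

In this toy run utilization climbs steadily for three periods with no price change at all, which is why utilization, not list price, is the earlier signal.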
Mid-Term Contracts
From an economic perspective, the more critical segment is the "contract market," as the vast majority of GPU rental transaction value occurs here. Among these, 1-year contracts are particularly important—they reflect both the marginal demand from non-AI lab customers and the spillover demand from large customers, making them the most sensitive indicator for gauging market tightness.
AI-native companies and small-to-medium-sized AI labs are primarily active in the 1–3 year range. A clear recent trend, however, is that these organizations are also starting to lock in computing resources through longer contracts. Many now extend to 4 years or more, and some buyers are even willing to pay more than 20% upfront, which was uncommon for contracts over 4 years in the past.
Long-Term Offtakes
In the longer-term 4–5 year market, the dominant force is large AI labs, which lock in large-scale computing resources early on. These deals typically correspond to clusters of 50MW, 100MW, or even larger scale, roughly equivalent to about 24,000 to 48,000 GB300 NVL72 GPUs. Overall, such long-term offtake agreements already account for a considerable share of the Neocloud GPU rental market.
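The power and GPU figures above imply a consistent all-in power budget per GPU, which the following sketch works out. The MW and GPU counts come from the text and 72 GPUs per NVL72 rack is the system's defining spec; whether the megawatt figures refer to IT load or total facility power is not stated, so the per-GPU number should be read as an implied ratio only.

```python
import math

GPUS_PER_NVL72_RACK = 72  # an NVL72 rack-scale system contains 72 GPUs


def implied_kw_per_gpu(cluster_mw: float, gpu_count: int) -> float:
    """All-in kilowatts per GPU implied by a cluster size and GPU count."""
    return cluster_mw * 1_000 / gpu_count


def racks_needed(gpu_count: int) -> int:
    """Number of NVL72 racks needed to house gpu_count GPUs (rounded up)."""
    return math.ceil(gpu_count / GPUS_PER_NVL72_RACK)


for mw, gpus in ((50, 24_000), (100, 48_000)):
    print(f"{mw} MW / {gpus:,} GPUs -> "
          f"{implied_kw_per_gpu(mw, gpus):.2f} kW per GPU, "
          f"{racks_needed(gpus)} racks")
```

Both endpoints work out to roughly 2.08 kW per GPU, so the quoted range is simply a linear scaling of cluster power.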
AI labs favor such contracts because they can lock in large-scale computing power at once to cope with rapidly growing end demand. Simultaneously, these organizations often deeply participate in cluster design, including key aspects like storage, networking, and CPU configuration. These transactions are often delivered in **bare metal** form, as AI labs possess sufficient engineering capability to customize the technology stack at a lower level, achieving optimal TCO (Total Cost of Ownership) and performance.
For Neocloud service providers, such deals are also attractive. On one hand, they can concentrate sales efforts on a few large orders rather than handling numerous small clients for the same revenue; on the other hand, long-term contracts help secure better debt financing terms. Matching financing duration with contract terms can effectively reduce maturity mismatch and price volatility risk, and in most cases locks in project internal rates of return (IRR) of several percentage points.
Furthermore, hyperscalers often play the role of "backstop": they act as the direct offtaker, purchasing computing power from Neoclouds and reselling it to AI labs. This structure is a win-win for all parties: Neoclouds can secure better financing terms on the strength of AAA-rated offtakers, while hyperscalers can share in a portion of the project's profits by providing credit backing without expanding their own balance sheets.
The table below lists some large offtake agreements we are tracking. We conduct in-depth analysis of these deals to reverse-engineer the implied GPU hourly price ($/hr/GPU), as well as key profitability metrics like project IRR and EBIT margins.
In the current market environment, the vast majority of large AI clusters being expanded are actually "internally consumed" by AI labs. However, these organizations still enter the sub-4-year contract market to supplement computing power, while also indirectly preventing supply from re-entering this market by renewing existing H100 and H200 clusters. As GB200 and GB300 ultra-large-scale clusters gradually come online, how the supply-demand relationship evolves in the 1–3 year contract market will become a key variable to watch.
"Where The Puck is Going"
Currently, the most striking feature is the clear divergence between underlying reality and market sentiment. Tightening supply and rising prices are clear signals that should be bullish for Neoclouds, since they imply margin expansion and longer useful asset lives. Yet the public market has grown increasingly pessimistic about companies like CoreWeave, Nebius, and Iris Energy, whose stock prices remain near the lows of the past 6–12 months.
The market is still dominated by the narrative of "eventual oversupply and compute commoditization," and the aforementioned changes have not truly alleviated investor concerns about the long-term value of GPUs. But from the frontline perspective, sustained tightness and stronger pricing power mean almost all computing power is being "absorbed" by demand: even with performance variations, it remains in short supply in this extreme shortage environment.
Three Key Future Observables
To judge whether GPU rental prices will remain high, focus on three variables:
1. GB300 Cluster Expansion Pace (2026)
The key is the relative speed of new compute capacity versus token demand: whether supply alleviates tightness or demand continues to outpace supply. This will directly affect whether AI labs continue to participate in the sub-4-year market and the price trend in that segment.
2. Worsening Chip Shortages
This includes key bottlenecks such as TSMC's N3 process capacity, HBM, DRAM, and NAND; any fluctuations in manufacturing execution could further tighten supply.
3. AI Lab Revenue (ARR) & Token Consumption Growth Rate
The expansion of AI commercialization and usage scale will determine the strength of end demand, which is the core variable driving computing power demand.
Prices Move Unidirectionally Upward, Returns Follow
Overall, a relatively clear conclusion is: the probability of GPU rental prices continuing to rise is higher than the probability of them falling.
This process is distinctly self-reinforcing: when Neoclouds observe supply tightening and prices rising, they lock in more hardware in advance, further compressing market supply and pushing prices even higher. This is similar to the GPU shortage cycle of 2023–2024—where supply tightness drove significant profit expansion for OEMs and led to substantial server price increases (though this process may not fully repeat given the market's higher maturity this cycle).
Simultaneously, the renewed rise in GPU rental prices is also improving Neoclouds' Return on Invested Capital (ROIC):
On one hand, it increases the profit margin of deployed assets
On the other hand, it extends the economic useful life of GPUs, allowing capital to generate cash flow for a longer period
Who Benefits Most Currently?
The most direct beneficiaries currently are computing power providers with the following characteristics:
· Predominantly short-cycle contracts (can be repriced quickly)
· Possess a large installed base of H100 equipment
· Have new capacity coming online in the short term
Neoclouds with short-lease structures can release old contracts faster and re-sign at higher prices, quickly achieving profit expansion. Also, hyperscalers and Neoclouds that locked in next-generation computing power (multi-year contracts) early will benefit in the future cycle.
So the question arises: This time, will it really be "different"?