Can Alibaba Cloud Rewrite Itself?

marsbit. Published 2026-05-20. Last updated 2026-05-20.

Introduction

Over the past five months, Alibaba Cloud's MaaS (Model as a Service) revenue has surged 15x, marking a strategic overhaul in which the company is shifting its 17-year-old system designed for "humans using the cloud" to a new paradigm centered on "Agents consuming Tokens." At its recent summit, Alibaba Cloud announced a full-stack upgrade encompassing "chip-cloud-model-inference," all optimized for AI Agents. Key launches include the new AI product portal "Qianwen Cloud," hyper-node servers powered by the in-house AI chip Zhenwu M890, and the latest flagship model, Qwen3.7-Max. Senior VP Liu Weiguang described this as building "China's largest AI factory," where chips are raw materials, the cloud is the workshop, models are machines, and the inference platform is the assembly line, with Tokens as the final product. The company is now emphasizing its chip strategy, unveiling the Zhenwu M890 and a two-year roadmap for future chips. With over 560,000 chips deployed across 400+ clients, Alibaba Cloud aims to control the marginal cost per Token, mirroring Google's integration of TPU and Gemini for optimal cost-performance. The cloud infrastructure itself is being rewritten. Traditional cloud interfaces are being transformed into standardized, Agent-callable Skills. A new scheduling logic focuses on "task scheduling" over "resource scheduling" to handle the unpredictable, elastic workloads of Agents. Liu noted that AI applications now automatically provision cloud resources, with one cu...

Over the past five months, Alibaba Cloud's MaaS revenue has grown 15-fold, and that is just one facet of the company's self-reconstruction. At the summit, Alibaba Cloud announced the completion of a full-stack Agentization upgrade covering "chip-cloud-model-inference," simultaneously launching the new official AI product website "Qianwen Cloud," a hyper-node server equipped with the self-developed AI chip Zhenwu M890, and the latest flagship model, Qwen3.7-Max.

In the words of Alibaba Cloud Senior Vice President Liu Weiguang, "We are building China's largest AI factory." The factory metaphor implies a complete production logic: chips are the raw materials, the cloud is the workshop, models are the machines, the inference platform is the assembly line, and the final commodity produced is the Token.

The essence of this reconstruction is to transform the entire system built over the past 17 years around "humans using the cloud" into a new system centered on "Agents consuming Tokens."

Why Play the Chip Card Now?

Alibaba Cloud has rarely emphasized chips in public. At this summit, it not only released the Zhenwu M890, a new-generation AI chip that handles both training and inference, but also disclosed for the first time its chip roadmap for the next two years, with the successor generations Zhenwu V900 and Zhenwu J900 to follow year by year.

The Zhenwu M890 has 144GB of video memory, 800GB/s of inter-chip interconnect bandwidth, and three times the performance of the previous-generation Zhenwu 810E. With the self-developed ICN Switch interconnect chip, 128 AI chips can be combined into a single machine, with P2P latency kept within 150 nanoseconds.
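
To put those figures in perspective, the short sketch below works through the arithmetic for one 128-chip machine. It assumes that per-chip memory simply adds up across the machine and that a transfer runs at the full quoted bandwidth, which ignores real-world overhead. The function names are illustrative, and the results are back-of-the-envelope estimates rather than vendor specifications.

```python
# Figures quoted in the article for the Zhenwu M890 hyper-node.
MEMORY_PER_CHIP_GB = 144      # video memory per chip
CHIPS_PER_MACHINE = 128       # chips combined into a single machine
INTERCONNECT_GB_PER_S = 800   # inter-chip interconnect bandwidth


def aggregate_memory_gb(chips: int = CHIPS_PER_MACHINE,
                        per_chip_gb: int = MEMORY_PER_CHIP_GB) -> int:
    """Total memory if per-chip memory simply adds up (an assumption)."""
    return chips * per_chip_gb


def ideal_transfer_seconds(data_gb: float,
                           bandwidth_gb_per_s: float = INTERCONNECT_GB_PER_S) -> float:
    """Lower bound on transfer time at the full quoted bandwidth."""
    return data_gb / bandwidth_gb_per_s


total_gb = aggregate_memory_gb()
print(f"Aggregate memory: {total_gb} GB ({total_gb / 1000:.3f} TB)")
print(f"Time to move one chip's memory: {ideal_transfer_seconds(MEMORY_PER_CHIP_GB):.2f} s")
```

Under these assumptions, one machine holds about 18.4TB of memory in total, and moving the full contents of one chip's memory to a neighbor would take at least 0.18 seconds.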

But beyond the specifications, the more crucial information is scale. The Zhenwu series has cumulatively shipped 560,000 units and has already entered over 400 clients across more than 20 industries, including telecommunications, FAW, and SPDB.

Liu Weiguang repeatedly used Google as an analogy. The deep integration of Google's TPU and Gemini allowed Google to achieve optimal cost-effectiveness within its own framework. Alibaba Cloud certainly wants to follow the same path. He summarized the competitive logic in one sentence: "If the future competition is about every chip generating more high-quality Tokens than competitors, then we win."

Combined with the Yitian CPU, the Panmai Smart NIC, and the Zhenyue storage controller chip, the portfolio of T-Head, Alibaba's chip unit, has expanded from a single point to complete coverage of computing power, networking, and storage. When inference demand expands exponentially, only by holding chips in one's own hands can the marginal cost of each Token be controlled.

The reasoning is not complicated. Model companies can compete on parameters, but cloud providers ultimately compete on whose Tokens are cheaper, more stable, and faster. Chips are the starting point of this cost war.

The Cloud Itself Must Be Rewritten

Chips solve the problem of "being able to run," but Agents' demands on the cloud extend far beyond computing power.

The interaction logic of traditional cloud products is designed for humans: opening the console, looking at menus, configuring parameters, clicking buttons. This setup is completely unusable for Agents. Agents don't view web pages or click buttons; they need structured capability descriptions, standardized calling protocols, and predictable feedback.

Alibaba Cloud CTO Li Feifei used a set of comparisons to illustrate the problem: Traditional cloud workloads are steady-state; an ECS instance, once launched, might run for months or even years. But Agent workloads are characterized by "irregular elasticity, short lifecycles, and instant scaling up and down." After an Agent completes a task, its sandbox is destroyed. The next request might come in a few milliseconds, or it might not come for several hours.
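
The short-lifecycle pattern described here can be illustrated with a minimal sketch. It uses a temporary local directory as a stand-in for a real sandbox, and `ephemeral_sandbox` and `run_task` are hypothetical names, not part of any Alibaba Cloud product. The point is only the lifecycle: the workspace exists for the duration of one task and is destroyed as soon as the task returns.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_sandbox():
    """Create a scratch workspace for one task and delete it afterwards."""
    with tempfile.TemporaryDirectory(prefix="agent-sandbox-") as tmp:
        yield Path(tmp)
    # Leaving the inner `with` removes the directory and everything in it.


def run_task(payload: str) -> tuple[str, Path]:
    """Run one toy task inside a fresh sandbox and return its result."""
    with ephemeral_sandbox() as workdir:
        scratch = workdir / "scratch.txt"
        scratch.write_text(payload)           # intermediate state lives in the sandbox
        result = scratch.read_text().upper()  # the "work" done by the task
    return result, workdir                    # workdir no longer exists here


result, old_path = run_task("resize the image")
print(result, old_path.exists())
```

Because nothing survives between tasks, capacity has to follow the request rate rather than a fixed fleet of long-running instances, which is the contrast with the ECS example above.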

To address this, Alibaba Cloud has done three things.

First, it exposed cloud products as Skills, through MCP, and through the CLI. Simply put, each cloud product is packaged into a standardized interface that Agents can call directly, so that using the cloud is like calling a function.

Second, it built a dedicated runtime environment for Agents—lightweight sandboxes, multi-Agent collaboration, cross-task memory, and data flow channels.

Third, it rebuilt the scheduling logic, shifting from "resource scheduling" to "task scheduling," because traditional resource orchestration methods cannot withstand the concurrency of massive numbers of Agents.
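
The first of these three steps, packaging a cloud product as something an Agent can call like a function, can be sketched as follows. This is a minimal illustration under stated assumptions, not Alibaba Cloud's actual interface: `Skill`, `SkillRegistry`, and `create_database` are hypothetical names, and the handler only simulates provisioning. It shows the three things the article says Agents need: a structured capability description, a standardized way to call, and predictable feedback.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Skill:
    name: str
    description: str
    parameters: dict[str, type]   # parameter name -> expected Python type
    handler: Callable[..., Any]


@dataclass
class SkillRegistry:
    skills: dict[str, Skill] = field(default_factory=dict)

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def describe(self) -> list[dict[str, Any]]:
        """Structured capability descriptions an Agent can read."""
        return [
            {
                "name": s.name,
                "description": s.description,
                "parameters": {k: t.__name__ for k, t in s.parameters.items()},
            }
            for s in self.skills.values()
        ]

    def call(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        """Standardized call that always returns the same result shape."""
        skill = self.skills.get(name)
        if skill is None:
            return {"ok": False, "error": f"unknown skill: {name}"}
        if set(arguments) != set(skill.parameters):
            expected = sorted(skill.parameters)
            return {"ok": False, "error": f"expected parameters: {expected}"}
        for key, expected_type in skill.parameters.items():
            if not isinstance(arguments[key], expected_type):
                return {"ok": False, "error": f"{key} must be {expected_type.__name__}"}
        return {"ok": True, "result": skill.handler(**arguments)}


def create_database(engine: str, storage_gb: int) -> dict[str, Any]:
    # Hypothetical handler: simulates provisioning instead of calling a real API.
    return {"engine": engine, "storage_gb": storage_gb, "status": "provisioned"}


registry = SkillRegistry()
registry.register(Skill(
    name="create_database",
    description="Provision a database instance (simulated).",
    parameters={"engine": str, "storage_gb": int},
    handler=create_database,
))

print(registry.describe())
print(registry.call("create_database", {"engine": "mysql", "storage_gb": 20}))
```

Errors come back as data in the same shape as successes, so an Agent can branch on `ok` instead of parsing console output or error pages.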

Liu Weiguang stated that some AI applications, after going online, automatically provision cloud resources in the background—virtual machines, database instances, sandbox environments—without any human intervention. The volume of resources automatically provisioned for one customer in a single day is equivalent to two weeks of manual operations in the past.

"This essentially means Agents are using the cloud by themselves." Liu Weiguang provided an internal conversion formula: Token consumption can be proportionally converted into GPU usage, and each additional GPU card roughly drives a one-to-one increase in CPU. In other words, the growth in Token revenue is not cannibalizing traditional cloud revenue but is pulling it up, provided the cloud platform can handle the Agent workload.

Therefore, Alibaba Cloud is not merely adding a layer of AI capability to the original system. It is rewriting everything from interaction methods, scheduling logic, and billing models to product forms.
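
The conversion Liu Weiguang describes, from Token consumption to GPU usage to CPU demand, can be written as a small estimate. The article gives only the shape of the relationship, so `tokens_per_gpu_per_day` below is a hypothetical placeholder rather than a disclosed figure, and the default ratio of 1.0 reflects the "roughly one-to-one" remark.

```python
import math


def estimate_resources(tokens_per_day: float,
                       tokens_per_gpu_per_day: float,
                       cpus_per_gpu: float = 1.0) -> dict[str, int]:
    """Convert daily Token demand into GPU and CPU counts.

    tokens_per_gpu_per_day is an assumed throughput, not a published number.
    cpus_per_gpu defaults to the roughly one-to-one ratio quoted in the article.
    """
    if tokens_per_gpu_per_day <= 0:
        raise ValueError("tokens_per_gpu_per_day must be positive")
    gpus = math.ceil(tokens_per_day / tokens_per_gpu_per_day)
    cpus = math.ceil(gpus * cpus_per_gpu)
    return {"gpus": gpus, "cpus": cpus}


# Doubling Token demand doubles both GPU and CPU demand under these assumptions.
print(estimate_resources(1e9, 2.5e8))
print(estimate_resources(2e9, 2.5e8))
```

Under this linear model, growth in Token consumption raises demand for conventional compute alongside GPUs, which is the basis for the claim that Token revenue pulls traditional cloud revenue up rather than replacing it.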

Models Are Not for Chatting

The third layer of the full-stack reconstruction is the model. Qwen3.7-Max ranked first among domestic models on the overall leaderboard of the global Arena blind test, surpassing Kimi-K2.6, DeepSeek-v4-pro, and GLM-5.1. But the real focus of this release is how Alibaba is redefining the direction of model capabilities.

Alibaba Group's Tongyi large model leader Zhou Jingren said, "In the past, we pursued how well the model 'spoke.' Now we demand that the model 'gets things done.'"

Take Alibaba Cloud's own chip work as an example. On the Zhenwu M890, a chip the model had not previously been trained on, Qwen3.7-Max worked autonomously for 35 hours from scratch, relying solely on a task description, and independently wrote and optimized a production-grade AI computing kernel. The final performance was 10 times higher than the official version, with no human intervention or intermediate guidance throughout the entire process.

This demonstrates the model's core capability in Agent scenarios: long-range autonomous execution. It takes a task, breaks it down, plans, writes code, debugs—working continuously for 35 hours without stopping.

To support this level of inference demand, the Bailian platform has also undergone corresponding upgrades: pooled scheduling to improve GPU utilization, context caching to eliminate redundant calculations, and elastic throughput scheduling to handle concurrency peaks.
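
Of the three upgrades, context caching is the easiest to illustrate. The sketch below is a toy model under stated assumptions, not Bailian's implementation: real inference platforms typically cache intermediate model state for a shared prompt prefix, whereas here `fake_encode` is a hypothetical stand-in for that expensive step and `ContextCache` simply memoizes it.

```python
import hashlib
from typing import Callable


class ContextCache:
    """Memoize the expensive processing of a shared prompt prefix."""

    def __init__(self, encode: Callable[[str], list[int]]):
        self._encode = encode
        self._store: dict[str, list[int]] = {}
        self.hits = 0
        self.misses = 0

    def get_state(self, prefix: str) -> list[int]:
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1            # prefix seen before: reuse the stored state
        else:
            self.misses += 1          # first time: pay the full cost once
            self._store[key] = self._encode(prefix)
        return self._store[key]


def fake_encode(text: str) -> list[int]:
    # Stand-in for the costly prefix computation; returns one number per word.
    return [len(word) for word in text.split()]


def answer(cache: ContextCache, system_prompt: str, question: str) -> str:
    state = cache.get_state(system_prompt)   # shared prefix, possibly cached
    new_tokens = len(question.split())       # only the new part is processed
    return f"{len(state)} cached prefix tokens + {new_tokens} new tokens"


cache = ContextCache(fake_encode)
prompt = "You are a cloud ops agent"
print(answer(cache, prompt, "list instances"))
print(answer(cache, prompt, "create a database"))
print(cache.hits, cache.misses)
```

Agents tend to resend the same long instructions and tool descriptions on every step, so the share of work that a cache like this can skip grows with the length of the shared prefix.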

In terms of ecosystem, Bailian remains open for access. Besides the Qianwen model matrix, it has also onboarded third-party models such as Zhipu's GLM-5.1, MiniMax's M2.7, and Moonshot AI's Kimi K2.6.

Liu Weiguang mentioned, "Clients in practice don't use just one model; they use a combination of multiple models. We provide the combinations; clients find the mix that best suits them on the platform." At the summit, executives from six leading domestic model companies collectively took the stage, creating a scene reminiscent of a "domestic AI alliance."

Within the last three months, the Qianwen flagship model has iterated through three versions: 3.5, 3.6, and 3.7. This release rhythm itself sends a signal: the competition in model capabilities is far from over, and Alibaba intends to establish a long-term advantage through the vertical integration of self-developed chips and self-developed models.

The Real Bet of This Reconstruction

Looking back, the underlying logic of Alibaba Cloud's full-stack reconstruction is simple and pure. When AI revenue growth far outpaces traditional cloud business, when Tokens might replace ECS as the largest product line, when Agents start automatically provisioning cloud resources without humans needing to log into the console, the entire technical system designed for humans has reached a point where it must be changed.

But the difficulty of execution is another matter. Liu Weiguang himself admitted that the transformation is "easy to talk about, very hard to do." In the past, the sales team interacted with clients' IT departments. Now, doing MaaS requires dialogue with business units or even the CEO.

"Your conversational ability, your experience, are requirements of a completely different level." Alibaba Cloud has already established dedicated MaaS sales roles for large enterprise clients, operating and being assessed separately from traditional IaaS sales.

Performance metrics are also changing: the focus is no longer just call volume but "high-quality Tokens," meaning Tokens that solve real problems rather than chit-chat Tokens. There are three core metrics: daily growth in paying customers, the number of core business systems integrated with models, and the efficiency with which Agents autonomously complete task loops.

These adjustments at the organizational and mechanism levels often indicate a company's true judgment more than technical announcements. Alibaba Cloud wants to rebuild its revenue structure, customer relationships, and sales system. Liu Weiguang stated, "When we were building the cloud before, the client's IT budget was calculable—how many servers offline, roughly how much to move them up—you could see the problem. But with MaaS, the answer to this problem is unknown; once you get in, it might exceed your imagination."

The problem statement can't be seen, and the answer is uncertain, but Alibaba Cloud has still decided to dismantle and rewrite its entire system because the only certainty is that AI is an opportunity ten or even a hundred times larger than any before.

This is probably the most noteworthy information from this summit: not which chip has more computing power, or which model ranks where, but that China's largest cloud provider is betting on a future it believes will come, with an aggressive posture approaching that of a startup. (Author: Zhang Shuai, Editor: Yang Lin)

Related Questions

Q: What is the core reason behind Alibaba Cloud's comprehensive 'chip-cloud-model-inference' stack reconstruction according to the article?

A: The core reason is to transform its entire 17-year-old system built around 'people using the cloud' into a new system centered on 'Agents consuming Tokens'. This is driven by the recognition that AI revenue growth vastly outpaces traditional cloud business, and that Tokens could replace ECS as the largest product line as Agents begin to automatically provision cloud resources.

Q: Why does the article emphasize Alibaba Cloud's focus on developing its own AI chips like Zhenwu M890 now?

A: The article emphasizes it because, in an era where inference demand is expanding exponentially, controlling the cost of each Token is crucial for competition. By developing its own chips (like Zhenwu), Alibaba Cloud aims to control the marginal cost per Token from the source, similar to Google's strategy with TPU and Gemini, to ultimately provide cheaper, more stable, and faster Tokens.

Q: How does Agents' demand for the cloud differ from traditional human users' demand as described in the text?

A: Agent demand differs fundamentally: it is 'irregularly elastic, short-lived, and instantly scalable then gone,' requiring structured capability descriptions, standardized calling protocols, and predictable feedback. In contrast, traditional cloud workloads are steady-state (e.g., an ECS instance running for months). Agents do not interact with web consoles; they need cloud services encapsulated as standardized, function-like interfaces.

Q: What key shift in Alibaba Cloud's model capability focus is highlighted with the release of Qwen3.7-Max?

A: The key shift is moving from pursuing models that 'speak well' to models that 'can accomplish tasks.' The focus is now on long-range autonomous execution. For example, Qwen3.7-Max autonomously wrote and optimized a production-level AI computing kernel for the new Zhenwu M890 chip over 35 hours without human intervention, improving performance tenfold.

Q: According to the article, what organizational and operational changes is Alibaba Cloud making to support its MaaS (Model-as-a-Service) transformation?

A: Alibaba Cloud is making several changes: 1) Establishing dedicated MaaS sales roles for large clients, separate from traditional IaaS sales with independent assessments. 2) Shifting key performance indicators (KPIs) from mere call volume to 'high-quality Tokens' that solve real problems. Core metrics now include daily growth of paying customers, number of core business systems integrated with models, and the efficiency of Agents completing task loops autonomously. 3) Changing sales dialogue from IT departments to business units or CEOs, requiring different skills and experience levels.
