Can Alibaba Cloud Rewrite Itself?

marsbit · Published 2026-05-20 · Last updated 2026-05-20

Introduction

Over the past five months, Alibaba Cloud's MaaS (Model as a Service) revenue has surged 15x, marking a strategic overhaul in which the company is shifting its 17-year-old system designed for "humans using the cloud" to a new paradigm centered on "Agents consuming Tokens." At its recent summit, Alibaba Cloud announced a full-stack upgrade encompassing "chip-cloud-model-inference," all optimized for AI Agents. Key launches include the new AI product portal "Qianwen Cloud," hyper-node servers powered by the in-house AI chip Zhenwu M890, and the latest flagship model, Qwen3.7-Max. Senior VP Liu Weiguang described this as building "China's largest AI factory," where chips are raw materials, the cloud is the workshop, models are machines, and the inference platform is the assembly line, with Tokens as the final product. The company is now emphasizing its chip strategy, unveiling the Zhenwu M890 and a two-year roadmap for future chips. With over 560,000 chips shipped to 400+ clients, Alibaba Cloud aims to control the marginal cost per Token, mirroring Google's integration of TPU and Gemini for optimal cost-performance. The cloud infrastructure itself is being rewritten. Traditional cloud interfaces are being transformed into standardized, Agent-callable Skills. A new scheduling logic focuses on "task scheduling" over "resource scheduling" to handle the unpredictable, elastic workloads of Agents. Liu noted that AI applications now automatically provision cloud resources, with one cu...

Alibaba Cloud's MaaS revenue has grown 15-fold over the past five months, and that is just one facet of its self-reconstruction. At the summit, Alibaba Cloud announced that it had completed a full-stack Agentization upgrade covering "chip-cloud-model-inference," and simultaneously launched Qianwen Cloud, its new official website for AI products; a hyper-node server equipped with its self-developed AI chip, the Zhenwu M890; and its latest flagship model, Qwen3.7-Max.

In the words of Alibaba Cloud Senior Vice President Liu Weiguang, "We are building China's largest AI factory." The factory metaphor implies a complete production logic: chips are the raw materials, the cloud is the workshop, models are the machines, the inference platform is the assembly line, and the final commodity produced is the Token.

The essence of this reconstruction is to transform the entire system built over the past 17 years around "humans using the cloud" into a new system centered on "Agents consuming Tokens."

Why Play the Chip Card Now?

Alibaba Cloud had rarely put chips front and center in public before. At this summit, it not only released the Zhenwu M890, its new-generation AI chip for both training and inference, but also, for the first time, disclosed its chip roadmap for the next two years, with the successor products Zhenwu V900 and Zhenwu J900 following at a yearly cadence.

The Zhenwu M890 has 144 GB of device memory, an inter-chip interconnect bandwidth of 800 GB/s, and three times the performance of the previous-generation Zhenwu 810E. Paired with the self-developed ICN Switch interconnect chip, 128 AI chips can be combined into a single machine, with point-to-point (P2P) latency kept within 150 nanoseconds.

Beyond the specifications, however, the more important information is scale. The Zhenwu series has shipped 560,000 units cumulatively and is already used by over 400 clients across more than 20 industries, with clients ranging from the telecommunications sector to automaker FAW and Shanghai Pudong Development Bank (SPDB).

Liu Weiguang repeatedly used Google as an analogy. The deep integration of Google's TPU and Gemini allowed Google to achieve optimal cost-effectiveness within its own framework. Alibaba Cloud certainly wants to follow the same path. He summarized the competitive logic in one sentence: "If the future competition is about every chip generating more high-quality Tokens than competitors, then we win."

Together with the Yitian CPU, the Panmai Smart NIC, and the Zhenyue storage controller chip, the chip lineup of T-Head, Alibaba's chip unit, has expanded from a single product to complete coverage of computing, networking, and storage. As inference demand grows exponentially, only a provider that controls its own chips can control the marginal cost of each Token.

The reasoning is not complicated. Model companies can compete on parameters, but cloud providers ultimately compete on whose Tokens are cheaper, more stable, and faster. Chips are the starting point of this cost war.

The Cloud Itself Must Be Rewritten

Chips solve the problem of "being able to run," but Agents' demands on the cloud extend far beyond computing power.

The interaction logic of traditional cloud products is designed for humans: opening the console, looking at menus, configuring parameters, clicking buttons. This setup is completely unusable for Agents. Agents don't view web pages or click buttons; they need structured capability descriptions, standardized calling protocols, and predictable feedback.
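To make the contrast concrete, here is a minimal sketch of what "structured capability descriptions, standardized calling protocols, and predictable feedback" could look like in practice. Everything in it is hypothetical: the `Skill` class, the `create_sandbox` operation, and the `call_skill` dispatcher are illustrative names chosen for this example, not Alibaba Cloud's actual Skill, MCP, or CLI interfaces.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Skill:
    """A cloud operation packaged for Agents: a structured description plus a handler."""
    name: str
    description: str
    parameters: Dict[str, type]  # parameter name -> required Python type
    handler: Callable[..., Dict[str, Any]]


def create_sandbox(cpu: int, memory_gb: int) -> Dict[str, Any]:
    # Hypothetical stand-in for a real provisioning call.
    return {"status": "ok", "sandbox": {"cpu": cpu, "memory_gb": memory_gb}}


# Hypothetical registry of Agent-callable skills.
SKILLS: Dict[str, Skill] = {
    "create_sandbox": Skill(
        name="create_sandbox",
        description="Create a short-lived sandbox for one Agent task.",
        parameters={"cpu": int, "memory_gb": int},
        handler=create_sandbox,
    ),
}


def describe_skills() -> str:
    """Machine-readable capability descriptions: what an Agent reads instead of a console."""
    return json.dumps([
        {"name": s.name, "description": s.description,
         "parameters": {p: t.__name__ for p, t in s.parameters.items()}}
        for s in SKILLS.values()
    ])


def call_skill(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Standardized call path: validate against the schema, then always return a dict."""
    skill = SKILLS.get(name)
    if skill is None:
        return {"status": "error", "reason": f"unknown skill: {name}"}
    for param, expected in skill.parameters.items():
        # Exact type check, so e.g. "2" or True is rejected where int is required.
        if type(args.get(param)) is not expected:
            return {"status": "error", "reason": f"{param} must be {expected.__name__}"}
    return skill.handler(**{p: args[p] for p in skill.parameters})


if __name__ == "__main__":
    print(describe_skills())
    print(call_skill("create_sandbox", {"cpu": 2, "memory_gb": 4}))
    print(call_skill("create_sandbox", {"cpu": "2", "memory_gb": 4}))
```

The design choice worth noting is that success and failure both come back in the same structured shape, which is the kind of predictable feedback an Agent needs to decide its next step without a human reading an error page.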

Alibaba Cloud CTO Li Feifei used a set of comparisons to illustrate the problem: Traditional cloud workloads are steady-state; an ECS instance, once launched, might run for months or even years. But Agent workloads are characterized by "irregular elasticity, short lifecycles, and instant scaling up and down." After an Agent completes a task, its sandbox is destroyed. The next request might come in a few milliseconds, or it might not come for several hours.

To address this, Alibaba Cloud has done three things.

First, it made its cloud products available as Skills, through MCP (Model Context Protocol), and via CLI. Simply put, each cloud product is packaged into a standardized interface that Agents can call directly, so using the cloud becomes like calling a function.

Second, it built a dedicated runtime environment for Agents—lightweight sandboxes, multi-Agent collaboration, cross-task memory, and data flow channels.

Third, it rebuilt the scheduling logic, shifting from "resource scheduling" to "task scheduling," because traditional resource orchestration methods cannot withstand the concurrency of massive numbers of Agents.
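The article does not say how Alibaba Cloud's task scheduler works internally. As a rough illustration under our own assumptions, the sketch below treats the task, not the machine, as the unit of scheduling: each incoming Agent task gets an ephemeral sandbox that exists only while the task runs, and a hypothetical `max_concurrent` cap stands in for platform capacity. Under resource scheduling, by contrast, a user would first reserve long-lived instances and then place work on them.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    task_id: int
    work_seconds: float  # simulated duration of the Agent's work


class TaskScheduler:
    """Schedules tasks rather than machines: each task gets a sandbox for its lifetime only."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)  # hypothetical capacity cap
        self.live_sandboxes = 0
        self.peak_sandboxes = 0
        self.completed: List[int] = []

    async def _run(self, task: Task) -> None:
        async with self._slots:  # wait for capacity instead of pre-reserving instances
            self.live_sandboxes += 1  # "create" a sandbox when the task starts
            self.peak_sandboxes = max(self.peak_sandboxes, self.live_sandboxes)
            try:
                await asyncio.sleep(task.work_seconds)  # stand-in for the Agent's work
                self.completed.append(task.task_id)
            finally:
                self.live_sandboxes -= 1  # "destroy" it as soon as the task ends

    async def run_burst(self, tasks: List[Task]) -> None:
        await asyncio.gather(*(self._run(t) for t in tasks))


async def main() -> TaskScheduler:
    scheduler = TaskScheduler(max_concurrent=3)
    # A burst of short-lived tasks with uneven durations.
    burst = [Task(task_id=i, work_seconds=0.01 * (i % 4)) for i in range(10)]
    await scheduler.run_burst(burst)
    return scheduler


if __name__ == "__main__":
    s = asyncio.run(main())
    print(len(s.completed), s.peak_sandboxes, s.live_sandboxes)
```

Idle sandboxes drop to zero once the burst is done, which mirrors the point that an Agent's sandbox is destroyed after its task; a real scheduler would also need priorities, placement, and failure handling, none of which are modeled here.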

Liu Weiguang stated that some AI applications, after going online, automatically provision cloud resources in the background—virtual machines, database instances, sandbox environments—without any human intervention. The volume of resources automatically provisioned for one customer in a single day is equivalent to two weeks of manual operations in the past.

"This essentially means Agents are using the cloud by themselves." Liu Weiguang provided an internal conversion formula: Token consumption can be converted proportionally into GPU usage, and each additional GPU card drives a roughly one-to-one increase in CPU usage. In other words, growth in Token revenue is not cannibalizing traditional cloud revenue but pulling it up, provided the cloud platform can handle the Agent workload.
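Liu's rule of thumb can be written down as a short calculation. The article gives neither the Token-to-GPU ratio nor a unit for CPU, so the throughput figure `tokens_per_gpu_hour` below is a made-up placeholder, and CPU demand is simply counted per GPU card to reflect the stated roughly one-to-one relationship. The sketch only shows the direction of the argument: Token growth flows through to GPU demand and then to traditional CPU compute.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    gpu_cards: float
    cpu_units: float  # counted per GPU card, per the rough 1:1 rule


def demand_from_tokens(tokens_per_hour: float,
                       tokens_per_gpu_hour: float,
                       cpu_per_gpu: float = 1.0) -> Demand:
    """Token consumption -> GPU cards (proportional) -> CPU (about one-to-one per GPU card).

    tokens_per_gpu_hour is a hypothetical throughput figure; the article gives none.
    """
    if tokens_per_gpu_hour <= 0:
        raise ValueError("tokens_per_gpu_hour must be positive")
    gpu = tokens_per_hour / tokens_per_gpu_hour
    return Demand(gpu_cards=gpu, cpu_units=gpu * cpu_per_gpu)


if __name__ == "__main__":
    before = demand_from_tokens(tokens_per_hour=3.6e9, tokens_per_gpu_hour=3.6e6)
    after = demand_from_tokens(tokens_per_hour=7.2e9, tokens_per_gpu_hour=3.6e6)
    # Doubling Token consumption doubles GPU demand and, with it, CPU demand.
    print(before, after)
```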

Therefore, Alibaba Cloud is not merely adding a layer of AI capability to the original system. It is rewriting everything from interaction methods, scheduling logic, and billing models to product forms.

Models Are Not for Chatting

The third layer of the full-stack reconstruction is the model. Qwen3.7-Max ranked first among Chinese models on the overall leaderboard of the global Arena blind-test evaluation, ahead of Kimi-K2.6, DeepSeek-v4-pro, and GLM-5.1. The real focus of this release, however, is Alibaba's redefinition of where model capabilities should go.

Zhou Jingren, head of Alibaba Group's Tongyi large models, said, "In the past, we pursued how well the model 'spoke.' Now we demand that the model 'get things done.'"

One example comes from Alibaba Cloud's own chip work. Given only a task description for the Zhenwu M890, a chip it had never been trained on, Qwen3.7-Max worked autonomously for 35 hours from scratch and independently wrote and optimized a production-grade AI computing kernel. The final kernel performed 10 times better than the official version, with no human intervention or intermediate guidance at any point.

This demonstrates the model's core capability in Agent scenarios: long-horizon autonomous execution. It takes a task, breaks it down, plans, writes code, and debugs, working continuously for 35 hours without stopping.

To support this level of inference demand, the Bailian platform has also undergone corresponding upgrades: pooled scheduling to improve GPU utilization, context caching to eliminate redundant calculations, and elastic throughput scheduling to handle concurrency peaks.
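The article does not explain how Bailian's context caching is implemented. Many inference engines cache the attention key/value state for shared prompt prefixes; the toy below, an assumption for illustration only, just counts how many tokens of each request would need fresh computation when an earlier request already covered the same prefix, such as a common system prompt. The `PrefixCache` class is a hypothetical name.

```python
from typing import List, Set, Tuple


class PrefixCache:
    """Toy context cache: tokens in a previously seen prefix are not recomputed."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, ...]] = set()  # every prefix processed so far
        self.tokens_computed = 0

    def process(self, tokens: List[str]) -> int:
        """Return how many tokens of this request needed fresh computation."""
        reused = 0
        # Find the longest prefix of this request that has already been computed.
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._seen:
                reused = end
                break
        fresh = len(tokens) - reused
        self.tokens_computed += fresh
        # Remember the new prefixes so later requests can reuse them.
        for end in range(reused + 1, len(tokens) + 1):
            self._seen.add(tuple(tokens[:end]))
        return fresh


if __name__ == "__main__":
    cache = PrefixCache()
    system = ["<sys>", "You", "are", "an", "agent"]  # shared instructions
    print(cache.process(system + ["deploy", "app"]))      # 7 tokens computed
    print(cache.process(system + ["summarize", "logs"]))  # only 2 new tokens
    print(cache.tokens_computed)                          # 9 instead of 14
```

Agent workloads tend to resend the same instructions and tool descriptions on every call, which is why prefix reuse of this kind can remove a large share of redundant computation.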

On the ecosystem side, Bailian remains open. Besides the Qianwen model family, it has also onboarded third-party models such as Zhipu's GLM-5.1, MiniMax's M2.7, and Moonshot AI's Kimi K2.6.

Liu Weiguang mentioned, "Clients in practice don't use just one model; they use a combination of multiple models. We provide the combinations; clients find the mix that best suits them on the platform." At the summit, executives from six leading domestic model companies collectively took the stage, creating a scene reminiscent of a "domestic AI alliance."

Within the last three months, the Qianwen flagship model has iterated through three versions: 3.5, 3.6, and 3.7. This release rhythm itself sends a signal: the competition in model capabilities is far from over, and Alibaba intends to establish a long-term advantage through the vertical integration of self-developed chips and self-developed models.

The Real Bet of This Reconstruction

Looking back, the underlying logic of Alibaba Cloud's full-stack reconstruction is simple. When AI revenue growth far outpaces the traditional cloud business, when Tokens may replace ECS as the largest product line, and when Agents start provisioning cloud resources automatically without anyone logging into the console, a technical system designed entirely for humans has reached the point where it must change.

But the difficulty of execution is another matter. Liu Weiguang himself admitted that the transformation is "easy to talk about, very hard to do." In the past, the sales team interacted with clients' IT departments. Now, doing MaaS requires dialogue with business units or even the CEO.

"Your conversational ability, your experience, are requirements of a completely different level." Alibaba Cloud has already established dedicated MaaS sales roles for large enterprise clients, operating and being assessed separately from traditional IaaS sales.

Performance metrics are also changing. Instead of call volume alone, the focus is now on "high-quality Tokens," meaning Tokens that solve real problems rather than chit-chat. There are three core metrics: daily growth in paying customers, the number of core business systems integrated with models, and how efficiently Agents complete task loops autonomously.

Adjustments at the organizational and incentive level often reveal a company's true judgment more than its technical announcements do. Alibaba Cloud wants to rebuild its revenue structure, customer relationships, and sales system. As Liu Weiguang put it: "When we were building the cloud before, the client's IT budget was calculable: how many servers they had on-premises, roughly how much it would cost to move them to the cloud. You could see the problem. But with MaaS, the answer is unknown; once you get in, it might exceed your imagination."

The problem cannot be fully seen and the answer is uncertain, yet Alibaba Cloud has still decided to dismantle and rewrite its entire system, because the one certainty is that AI is an opportunity ten or even a hundred times larger than any before.

This is probably the most noteworthy information from this summit: not which chip has more computing power, or which model ranks where, but that China's largest cloud provider is betting on a future it believes will come, with an aggressive posture approaching that of a startup. (Author: Zhang Shuai, Editor: Yang Lin)

Related Questions

Q: What is the core reason behind Alibaba Cloud's full-stack 'chip-cloud-model-inference' reconstruction, according to the article?

A: The core reason is to transform its entire 17-year-old system, built around 'humans using the cloud', into a new system centered on 'Agents consuming Tokens'. This is driven by the recognition that AI revenue growth vastly outpaces the traditional cloud business, that Tokens could replace ECS as the largest product line, and that Agents have begun to provision cloud resources automatically.

Q: Why does the article emphasize Alibaba Cloud's focus on developing its own AI chips like Zhenwu M890 now?

A: The article emphasizes it because, in an era where inference demand is expanding exponentially, controlling the cost of each Token is crucial for competition. By developing its own chips (like Zhenwu), Alibaba Cloud aims to control the marginal cost per Token from the source, similar to Google's strategy with TPU and Gemini, to ultimately provide cheaper, more stable, and faster Tokens.

Q: How does Agents' demand for the cloud differ from traditional human users' demand, as described in the text?

A: Agent demand differs fundamentally: it is 'irregularly elastic, short-lived, and scales up and down instantly', and it requires structured capability descriptions, standardized calling protocols, and predictable feedback. In contrast, traditional cloud workloads are steady-state (e.g., an ECS instance running for months). Agents do not interact with web consoles; they need cloud services packaged as standardized, function-like interfaces.

Q: What key shift in Alibaba's model capability focus is highlighted by the release of Qwen3.7-Max?

A: The key shift is from pursuing models that 'speak well' to models that 'get things done', with the focus now on long-horizon autonomous execution. For example, Qwen3.7-Max autonomously wrote and optimized a production-grade AI computing kernel for the new Zhenwu M890 chip over 35 hours without human intervention, achieving 10 times the performance of the official version.

Q: According to the article, what organizational and operational changes is Alibaba Cloud making to support its MaaS (Model-as-a-Service) transformation?

A: Alibaba Cloud is making several changes: 1) Establishing dedicated MaaS sales roles for large clients, separate from traditional IaaS sales with independent assessments. 2) Shifting key performance indicators (KPIs) from mere call volume to 'high-quality Tokens' that solve real problems. Core metrics now include daily growth of paying customers, number of core business systems integrated with models, and the efficiency of Agents completing task loops autonomously. 3) Changing sales dialogue from IT departments to business units or CEOs, requiring different skills and experience levels.
