Why Did Zhipu Surge Nearly 30% in a Single Day?

marsbit. Published 2026-05-23, updated 2026-05-23

Article Summary

"Global AI Model Unicorn" Zhipu's stock surged nearly 30% in a single day, reaching a new market cap high. The catalyst was the launch of its GLM-5.1-highspeed API, boasting a generation speed of **400 tokens per second**, setting a new global benchmark. This speed, roughly 3-5 times faster than industry leaders like OpenAI's GPT-4o and Anthropic's Claude, is achieved **without compromising the full-scale model's capabilities**. In the era of AI Agents requiring dozens of self-calls, such latency reduction is critical, transforming speed from a system metric into a determinant of intelligence limits. The breakthrough stems from a three-layer technical overhaul: 1. **TileRT Inference Engine**: Compiles the entire model into a continuous, always-on computation pipeline using "Warp Specialization," minimizing GPU idle time by having different processor groups handle data loading, computation, and communication in parallel. 2. **Heterogeneous Parallelism for MLA**: To efficiently run the GLM-5.1 model using the MLA attention mechanism, TileRT employs a heterogeneous strategy. One GPU handles sparse indexing/routing, while the others perform dense computation, optimizing for MLA's unique workflow. 3. **ZCube Network Architecture**: Replaces the standard Spine-Leaf (ROFT) network topology with a flat, dual-group interconnect. This design creates a single optimal path between any two GPUs, eliminating network congestion at scale and reducing latency. The business impact is sig...

By AIDeepDive

Today, Zhipu (02513.HK), hailed as the "world's first listed large language model company," surged once again.

Its intraday increase once exceeded 30%. It closed at HK$1,282, up over 26% for the day, with its market capitalization reaching HK$571.57 billion, setting another historical high.

The trigger for this surge was a specific technical metric: 400 tokens/s.

On May 22, Zhipu officially opened access to the GLM-5.1 Highspeed API (GLM-5.1-highspeed) for enterprise clients. One headline parameter stands out: a model output speed of 400 tokens per second, a new global ceiling for API speed among major LLM providers.

I initially thought this was just another public relations stunt by a domestic LLM company, but after examining the technical details, I finally understood the logic behind the capital market's reaction.

What does 400 tokens/s mean?

The model can generate approximately 200 Chinese characters per second, equivalent to the high-intensity output of a professional writer in one minute, compressed into just one second.

A volume of text that would take a creator several days of desk work to complete can be delivered by the GLM-5.1 Highspeed in just 1 minute; a system refactoring task that would occupy an engineer for 3 days can be completed in the time it takes to drink a cup of coffee.

01 Speed Is More Important Than You Think

Speed has historically been the most easily overlooked dimension in AI model competition.

Over the past three years, the LLM arms race has centered on two tracks: parameter scale (making models larger and smarter) and price wars (making tokens cheaper and more accessible). "Speed" was never the protagonist.

This is because, in the past, "speed" was typically achieved by reducing model parameters. To increase speed, one had to use smaller, more streamlined models, at the cost of diminished capabilities.

The significance of the GLM-5.1 Highspeed lies in its achievement of pushing speed to 400 tokens/s while retaining the capabilities of the flagship full-size base model.

Among both domestic and international models, this is the first time "flagship-level capability" and "ultra-low latency" have been delivered together without a trade-off.

Why is speed so critical? Because the main battlefield for AI is undergoing a fundamental shift.

As AI moves from the ChatBot era into the Agent era, Q&A is no longer the primary scenario. For an Agent to complete a task, it often requires the model to make dozens or even hundreds of self-calls: writing code, calling APIs, searching for information, utilizing tools...

In this operational mode, the latency between each call is mercilessly magnified. For a task requiring 50 calls, saving 1 second per call speeds up the entire task by nearly 1 minute. For AI programming assistants, voice interaction, and commercial decision systems, this difference can be a matter of life or death.
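To make the compounding effect concrete, here is a minimal back-of-the-envelope sketch. The workload (50 sequential calls, 400 output tokens each) and the function name `agent_task_seconds` are assumptions chosen for illustration, not figures from Zhipu.

```python
def agent_task_seconds(calls: int, tokens_per_call: int,
                       tokens_per_second: float,
                       fixed_overhead_s: float = 0.0) -> float:
    """Wall-clock time for a chain of sequential model calls."""
    per_call = fixed_overhead_s + tokens_per_call / tokens_per_second
    return calls * per_call

# Hypothetical Agent task: 50 sequential calls, 400 output tokens per call.
slow = agent_task_seconds(calls=50, tokens_per_call=400, tokens_per_second=100)
fast = agent_task_seconds(calls=50, tokens_per_call=400, tokens_per_second=400)

print(f"at 100 tokens/s: {slow:.0f} s")
print(f"at 400 tokens/s: {fast:.0f} s")
print(f"saved: {slow - fast:.0f} s")
```

Under these assumptions each call drops from 4 seconds to 1 second of generation time, and the saving scales linearly with the number of calls in the chain.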

At a deeper level, within a fixed time budget, faster inference means the model can explore deeper reasoning paths and perform more rounds of self-verification. Speed is transforming from a system metric into an upper limit of intelligence itself.

02 How Difficult Is Achieving Speed?

So, what's the current industry standard for speed?

Among leading providers, OpenAI's GPT-4o is around 100–150 tokens/s, Anthropic's Claude Sonnet series around 80–120 tokens/s, while mainstream domestic flagship model APIs mostly fall within the 50–100 tokens/s range. 400 tokens/s is approximately 3 to 5 times the industry average.

More crucially, this gap cannot be bridged simply by throwing more computing power at it.

A server equipped with 8 H200 GPUs can theoretically move up to 38TB of data per second. For GLM-5.1, generating a single token requires reading only about 42GB of activated parameters, meaning the weights actually used for that token. A purely theoretical calculation suggests output speed should approach 1000 tokens/s.

But real-world systems often only achieve a few dozen tokens/s.

This is a gap of an order of magnitude. The GPUs aren't inherently too slow; rather, a significant amount of time is wasted on waiting, idling, and inefficient scheduling.
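The theoretical ceiling above is a simple memory-bandwidth bound, and it can be reproduced in a few lines. This sketch uses only the two figures quoted in the text (38TB/s aggregate bandwidth, 42GB read per token); the 50 tokens/s baseline is an assumed stand-in for "a few dozen", and the function names are mine.

```python
def bandwidth_bound_tokens_per_second(aggregate_bandwidth_tb_s: float,
                                      gb_read_per_token: float) -> float:
    """Upper bound on decode speed if memory bandwidth were the only limit."""
    aggregate_gb_s = aggregate_bandwidth_tb_s * 1000  # decimal TB -> GB
    return aggregate_gb_s / gb_read_per_token

def bandwidth_utilization(actual_tokens_per_second: float, ceiling: float) -> float:
    """Fraction of the bandwidth-bound ceiling that a real system reaches."""
    return actual_tokens_per_second / ceiling

ceiling = bandwidth_bound_tokens_per_second(38, 42)
print(f"ceiling: {ceiling:.0f} tokens/s")
for speed in (50, 400):  # 50 is an assumed typical figure; 400 is GLM-5.1 Highspeed
    print(f"{speed} tokens/s uses {bandwidth_utilization(speed, ceiling):.0%} of the ceiling")
```

Read this way, 400 tokens/s is still well under half of the bandwidth-bound ceiling, which is why the remaining gap is attributed to waiting and scheduling rather than to raw hardware speed.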

Zhipu's breakthrough this time stems from simultaneous innovations at three levels: the inference engine, parallelization strategy, and network architecture.

03 Three-Layer Technology Stack, Approaching Hardware Physical Limits

Here's how traditional LLM inference frameworks operate: the model is decomposed into independent operators (kernels). Each operator launches a computing kernel, computes, stops, synchronizes and waits, then launches the next one.

During the training phase, each computation takes seconds or even minutes, making these startup and wait overheads negligible. But during inference, generating a single token, a key step might only require tens of microseconds, making the startup and wait overheads proportionally significant.
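A rough illustration of why the same overhead is negligible in training but costly in inference. The 5-microsecond launch-and-sync overhead and the 20-microsecond inference step are assumed round numbers for this sketch, not measurements from TileRT or any specific GPU.

```python
def overhead_fraction(overhead_us: float, compute_us: float) -> float:
    """Share of total step time spent on launch and synchronization."""
    return overhead_us / (overhead_us + compute_us)

LAUNCH_AND_SYNC_US = 5  # assumed per-kernel overhead, in microseconds

# Training-style step: about one second of useful computation.
training = overhead_fraction(LAUNCH_AND_SYNC_US, compute_us=1_000_000)
# Inference-style step: tens of microseconds of useful computation.
inference = overhead_fraction(LAUNCH_AND_SYNC_US, compute_us=20)

print(f"training step:  {training:.4%} overhead")
print(f"inference step: {inference:.0%} overhead")
```

With these assumed values the overhead share goes from a rounding error to a fifth of every step, and a decode pass chains many such steps per token.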

TileRT's Core Idea: Compile the entire model into a continuously running engine, start once, run perpetually.

TileRT statically unfolds all of the model's computational logic into a continuous pipeline during the code compilation phase. At runtime, the GPU maintains high-speed operation, with computation, data movement, and communication proceeding in parallel. Intermediate results are kept within the GPU's high-speed cache as much as possible, avoiding repeated writes to slow VRAM and subsequent re-reads.

There's a crucial design detail here: Warp Specialization.

Understanding Warp requires first understanding GPU operation. The biggest difference between a GPU and a CPU is that a GPU contains thousands of relatively simple computing units, bundled together in groups of 32. This group is called a Warp.

All 32 units within the same Warp must always act synchronously, executing the same instruction, like a squad in the army where the squad leader orders everyone to perform the same action simultaneously.

In traditional frameworks, all Warps execute the same sequence of instructions. TileRT assigns different Warp groups different responsibilities: some specialize in prefetching the next batch of data, some in mathematical computation, some in communicating with other GPUs. The three groups work simultaneously, pipelining seamlessly without waiting for each other.

It's akin to moving from "one worker moving bricks, laying walls, and inspecting serially" to "a brick-moving group, a wall-laying group, and an inspection group operating concurrently."
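The brick-moving analogy corresponds to a classic three-stage pipeline, and its benefit can be estimated with a simple timing model. This is a sketch of the general pipelining idea, not of TileRT's actual scheduler; the stage times and tile count are hypothetical.

```python
def serial_time(n_tiles: int, load: float, compute: float, comm: float) -> float:
    """One worker does load, compute, and communicate for each tile in turn."""
    return n_tiles * (load + compute + comm)

def pipelined_time(n_tiles: int, load: float, compute: float, comm: float) -> float:
    """Three specialized groups overlap their work on consecutive tiles.

    The first tile pays the full latency; after that, one tile completes
    every interval of the slowest stage.
    """
    return (load + compute + comm) + (n_tiles - 1) * max(load, compute, comm)

# Hypothetical stage times in microseconds for 100 tiles of work.
n, load, compute, comm = 100, 3, 4, 2
print("serial:   ", serial_time(n, load, compute, comm), "us")
print("pipelined:", pipelined_time(n, load, compute, comm), "us")
```

In this model the pipelined time is governed by the slowest stage alone, so the speedup is largest when the three stages take similar time.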

With single-GPU efficiency solved, multi-GPU parallelism presents a new challenge.

The industry standard approach is Tensor Parallelism (TP): Split the model's weight matrices into several parts, with each GPU responsible for one part. After computing, results are aggregated via high-speed interconnects (NVLink).

This solution works well for regular, dense computations like matrix multiplication and is the standard multi-GPU solution for almost all current LLM inference frameworks.

GLM-5.1 employs MLA (Multi-head Latent Attention), an attention mechanism proposed by DeepSeek.

Traditional attention mechanisms require storing large amounts of intermediate data (KV Cache) generated at each step for later use, which consumes significant VRAM. MLA's approach is to first compress this intermediate data into a compact "latent vector" for storage, then expand and restore it when needed, drastically reducing VRAM requirements and improving inference efficiency.
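The VRAM saving can be estimated by counting cached values per token. The dimensions below are illustrative assumptions chosen for this sketch and are not GLM-5.1's actual configuration; the helper name `kv_cache_gib` is also mine.

```python
def kv_cache_gib(tokens: int, layers: int, elements_per_token_per_layer: int,
                 bytes_per_element: int = 2) -> float:
    """Cache size in GiB for one sequence, assuming 16-bit cached values."""
    total_bytes = tokens * layers * elements_per_token_per_layer * bytes_per_element
    return total_bytes / 2**30

# Assumed, illustrative model dimensions.
N_HEADS, HEAD_DIM, LAYERS = 128, 128, 60
LATENT_DIM = 512  # compressed latent vector cached by MLA
ROPE_DIM = 64     # MLA also caches a small positional (RoPE) key beside the latent

mha_elems = 2 * N_HEADS * HEAD_DIM   # full keys and values for every head
mla_elems = LATENT_DIM + ROPE_DIM    # one compact vector per token per layer

context = 32_768  # tokens
print(f"standard attention: {kv_cache_gib(context, LAYERS, mha_elems):.1f} GiB")
print(f"MLA:                {kv_cache_gib(context, LAYERS, mla_elems):.1f} GiB")
print(f"reduction:          {mha_elems / mla_elems:.1f}x")
```

The price of the compression is the extra expand-and-restore computation at read time, which is the part of the workflow the next paragraphs focus on.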

However, MLA's computational flow has a special step: sparse indexing over a large amount of historical information, similar to quickly finding the few most relevant books in a vast library before reading them carefully.

The "book-finding" step relies on global information and is not well-suited for distribution across multiple GPUs; the "careful reading" is the dense computation suitable for multi-GPU parallelism. If all 8 GPUs are forced to participate in "book-finding," a lot of time would be wasted on inter-GPU synchronization communication.

TileRT's solution is to have GPUs operate heterogeneously: GPU 0 specializes as the "library retriever," handling sparse indexing and routing decisions; GPUs 1–7 act as "detailed analysts," responsible for dense attention computation and matrix operations. The two types of workers each adopt the parallelization strategy best suited to them, collaborating to complete the entire computational layer.

Next, TileRT embeds inter-GPU communication operations directly into the execution pipeline, no longer treating them as separate steps. Externally, the entire 8-GPU system completing one layer of attention computation requires only one kernel launch; internal communication and computation are all seamlessly completed within the continuous pipeline.

The above two layers address problems within a single server. When scaling clusters to hundreds or thousands of GPUs, data transmission between GPUs itself becomes the new bottleneck.

The industry standard approach is ROFT (Rail-Optimized Fat-Tree), NVIDIA's officially recommended solution and the absolute industry standard.

Its structure is like a tree: servers connect first to underlying Leaf switches (access layer, directly facing servers). Leaf switches then connect upward to Spine switches (backbone layer, responsible for interconnecting different Leafs, like highway hubs). Data transmission between two GPUs must "go up to a Spine, then down to the target Leaf," traversing at least 3 hops.

To prevent traffic from concentrating on a few links, this architecture relies on the ECMP algorithm to distribute data across multiple paths, functioning well under the premise of "statistically uniform" internet traffic.

But inference traffic is completely non-uniform. Context lengths between different requests can vary by tens of times, and the direction of KV Cache transmission between GPUs is almost random. A few Leaf switches periodically become hotspots, triggering backpressure mechanisms that spread congestion from local to the entire link. This congestion cannot be solved by protocol parameter tuning; it's inherent to the topology structure.
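A toy simulation shows why size-blind path selection struggles with this kind of traffic. The random choice below stands in for an ECMP hash over a flow's header fields; the flow sizes and the number of uplinks are hypothetical.

```python
import random

def ecmp_link_loads(flow_sizes, num_uplinks, seed=0):
    """Place each flow on one uplink, chosen without regard to flow size."""
    rng = random.Random(seed)
    loads = [0] * num_uplinks
    for size in flow_sizes:
        # Stand-in for hash(flow header fields) % num_uplinks:
        # roughly uniform across links, but blind to how big the flow is.
        loads[rng.randrange(num_uplinks)] += size
    return loads

# Hypothetical KV Cache transfers whose sizes differ by tens of times.
flow_sizes = [1, 2, 1, 40, 3, 1, 35, 2, 1, 50, 2, 1, 1, 45, 3, 2]
loads = ecmp_link_loads(flow_sizes, num_uplinks=8)
mean_load = sum(flow_sizes) / len(loads)

print("per-uplink load:", loads)
print(f"busiest uplink carries {max(loads) / mean_load:.1f}x the mean load")
```

Even with a perfectly uniform hash, a handful of very large flows guarantees that some uplink carries several times the mean, and changing the seed only moves the hotspot around.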

ZCube's fundamental breakthrough: Architecturally preventing this type of congestion from physically occurring.

The core design consists of two steps:

First, eliminate the Spine backbone layer, flatten the entire network. Divide all Leaf switches into two groups based on odd/even numbering, with the two groups fully interconnected. Any odd-numbered switch connects to all even-numbered switches, and vice versa. Any two GPUs can reach each other via at most two switches, reducing hops from 3 to 2.

The second step, and the most ingenious part: Connect each GPU network card to the two groups of switches in two completely different ways. This special topology yields a key mathematical property: Between any two GPUs in the entire network, there is one and only one optimal path.

The "single path" directly eliminates the root cause of congestion. Traditional architectures are prone to hotspots precisely because there are multiple paths to choose from; if the load-balancing algorithm makes a wrong choice, traffic concentrates. ZCube eliminates "choice" itself by design: no balancing is needed because there are no forks.
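The first of the two steps, the odd/even switch wiring, can be sketched as a complete bipartite graph. This covers only the switch-to-switch layer; how each GPU network card attaches to the two groups is not specified in enough detail here to model, so the single-path property is not reproduced. The function names and the 8-switch example are assumptions.

```python
def build_flat_leaf_fabric(num_leaves: int) -> dict:
    """Connect every odd-numbered Leaf to every even-numbered Leaf (no Spine)."""
    leaves = range(1, num_leaves + 1)
    return {a: {b for b in leaves if (a + b) % 2 == 1} for a in leaves}

def inter_switch_links(adj: dict, a: int, b: int) -> int:
    """Number of switch-to-switch links on the shortest path from Leaf a to Leaf b."""
    if a == b:
        return 0
    if b in adj[a]:
        return 1          # opposite groups: directly connected
    if adj[a] & adj[b]:
        return 2          # same group: relay through the other group
    raise ValueError("leaves are not connected")

adj = build_flat_leaf_fabric(8)
total_links = sum(len(neighbors) for neighbors in adj.values()) // 2

print("links per switch:", len(adj[1]))
print("total switch-to-switch links:", total_links)
print("Leaf 1 -> Leaf 2:", inter_switch_links(adj, 1, 2), "link")
print("Leaf 1 -> Leaf 3:", inter_switch_links(adj, 1, 3), "links")
```

Each switch needs one port per switch in the opposite group, which is the port-density requirement that grows with cluster size.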

04 Under the Same Hardware Conditions, How Does the Math Work?

After upgrading the GLM-5.1 production cluster from traditional ROFT to ZCube, Zhipu obtained three key numbers: throughput up 15%, tail latency down 40.6%, and network equipment cost down by one-third.

In summary, with the same GPU investment, the cluster can serve more users; with the same user experience requirements, the cluster can be built with one-third fewer network devices. Efficiency and cost are improved in both directions.

Specifically, throughput increased by 15%, equivalent to gaining 15% more computing power for free. With the same number of GPUs, a 15% higher throughput is equivalent to approximately a 13% reduction in the amortized hardware cost per token, or the ability to serve 15% more users at the same cost.

If a cluster has 1000 GPUs, this upgrade is equivalent to gaining the productive capacity of 150 additional cards for free. At current market prices for high-end inference GPUs, that is computing power worth on the order of tens of millions of yuan.
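The 13% and 150-card figures both follow directly from the 15% throughput gain. A minimal check, with function names of my own choosing:

```python
def cost_per_token_reduction(throughput_gain: float) -> float:
    """Drop in amortized hardware cost per token when the same GPUs produce more tokens."""
    return 1 - 1 / (1 + throughput_gain)

def equivalent_extra_gpus(num_gpus: int, throughput_gain: float) -> float:
    """Number of additional GPUs that would deliver the same extra output."""
    return num_gpus * throughput_gain

gain = 0.15
print(f"cost per token falls by {cost_per_token_reduction(gain):.1%}")
print(f"1000 GPUs behave like {1000 + equivalent_extra_gpus(1000, gain):.0f} GPUs")
```

The cost reduction is smaller than the throughput gain because the same spend is divided by 1.15 rather than reduced by 15%.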

A 40.6% reduction in tail latency addresses stability, not average speed. For an Agent task requiring 50 calls, if tail latency is reduced by 1 second per call, the worst-case completion time for the entire task is compressed by nearly 1 minute.
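Tail latency matters more for Agents than for single requests because a long chain of calls is very likely to hit the tail at least once. The sketch below assumes calls are independent and uses a hypothetical 2.5-second tail latency; only the 50 calls and the 40.6% reduction come from the text.

```python
def prob_any_call_hits_tail(num_calls: int, quantile: float = 0.99) -> float:
    """Chance that at least one of num_calls independent calls lands beyond the quantile."""
    return 1 - quantile ** num_calls

def worst_case_saving_s(num_calls: int, tail_latency_s: float, reduction: float) -> float:
    """Seconds saved if every call hits the tail and the tail shrinks by `reduction`."""
    return num_calls * tail_latency_s * reduction

p = prob_any_call_hits_tail(50)
saving = worst_case_saving_s(50, tail_latency_s=2.5, reduction=0.406)

print(f"chance a 50-call task hits the slowest 1% at least once: {p:.1%}")
print(f"worst-case saving with a 2.5 s tail: {saving:.2f} s")
```

Under the independence assumption, roughly two in five such tasks see at least one slowest-1% call, so the tail rather than the average sets how long users actually wait.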

A one-third cost reduction is a direct saving at the construction level. ZCube eliminates the Spine layer, directly reducing the number of switches and optical modules required for the same cluster scale by one-third. According to Zhipu's calculations, in a ten-thousand-GPU scale cluster, this alone could save approximately 210 million to 640 million yuan.

In the long term, as cluster sizes expand exponentially, the complexity of inter-GPU communication grows manifold, and the probability and impact of congestion amplify accordingly. This means the value of architectural innovations like ZCube will accelerate as inference clusters continue to expand. The gains for tomorrow's ten-thousand-GPU clusters may far exceed today's 15%.

05 Final Thoughts

After reading Zhipu's technical report, I wondered: Could this bring a storm to the industry, much like DeepSeek's sudden emergence?

Upon careful consideration, their impacts seem to lie in different aspects. When DeepSeek emerged, it proved that the same level of intelligence could be achieved with far less computing power. The market worried that "fewer GPUs would be needed," causing NVIDIA's market cap to evaporate nearly $600 billion that day.

But Zhipu's technology today proves: The same computing power can produce more output. It is reshaping "what other infrastructure outside of GPUs should look like."

In the short term, NVIDIA may not be affected. But in the long run, the moat of GPU + NVLink interconnect + InfiniBand network + CUDA software ecosystem is being "loosened," especially the InfiniBand technology NVIDIA acquired with its $6.9 billion purchase of Mellanox in 2019. NVIDIA's premium on the network side will be significantly eroded.

Furthermore, while ZCube eliminates the Spine layer, it actually imposes higher requirements on the port density of Leaf switches. This benefits manufacturers capable of producing high-density, large-port Leaf switches (like Ruijie, Arista, Broadcom switching chips) and disadvantages those who primarily rely on high-end Spine layer switches for premium pricing.

In 2025, Celestica and NVIDIA together held about 50% of the AI backend network switch market share. This landscape faces a potential reshuffle if the ZCube paradigm proliferates.

Optical modules are the most directly beneficial segment in this industry chain change, with a very clear logic. For domestic optical module manufacturers (like Zhongji Innolight, Tianfu Communications, etc.), this is a structural positive: not only is the total volume growing, but the demand for high-speed optical modules (800G, 1.6T) under the ZCube paradigm is more concentrated and urgent compared to traditional architectures.

Whether it's TileRT or the ZCube architecture, this is a set of pure software inference engines running on standard GPUs, not reliant on NVIDIA's proprietary hardware features. In theory, they can be ported to domestic chips like Huawei's Ascend. Once this direction is viable, it will significantly lower the software stack barrier for domestic AI chips in inference scenarios.

This is perhaps the even greater significance behind this technological innovation.


