Tencent Releases a New Model, and Yao Shunyu Hands In His Paper

marsbit | Published on 2026-04-27 | Last updated on 2026-04-27

Source: Xinmou (新眸) | Author: Li Xiaodong (李小东)

At the large-model card table, Tencent has finally played a new card.

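On April 23, Tencent's Hunyuan Hy3 preview language model was officially released and open-sourced. It is a mixture-of-experts (MoE) model that fuses fast and slow thinking, with 295B (295 billion) total parameters, 21B activated parameters, and a maximum context length of 256K. The official description is crisp: "the first model trained after Hunyuan's rebuild" and "Hunyuan's most intelligent model to date."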

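Look back four months and Tencent's AI business was having a hard time. At this year's annual meeting, Ma Huateng (Pony Ma) admitted in person that the company had "moved too slowly," by nine months to a year. Liu Chiping (Martin Lau), reviewing what went wrong, said Hunyuan was like a high-school student who memorizes answers for the test: the report card looks good, but it falls apart in a real exam room. Meanwhile, ByteDance's Doubao had reached 345 million monthly active users and Alibaba's Qwen 166 million, while Yuanbao had about 57 million. The gap was widening rather than narrowing.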

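So when 28-year-old Yao Shunyu, a former OpenAI researcher, a graduate of Tsinghua's Yao Class, and the most closely watched young prodigy in China's internet industry, was formally appointed last year as chief AI scientist in Tencent's "CEO/President's Office," outsiders read it only one way: Tencent was getting serious.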

Four months later, Hy3 preview is live. The moment to hand in the paper seems to have arrived.

01 A "Tear It Down and Start Over" Rebuild

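Hy3 preview is not a routine iteration. In Tencent's own words, it is a reconstruction of the underlying engineering. In February, Hunyuan rebuilt its pretraining and reinforcement-learning infrastructure rather than carrying over the old training framework. Yao Shunyu completed the entire infrastructure rebuild within a month of taking up the post.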

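The rebuild followed three clear principles: systematic capabilities, authentic evaluation, and cost-effectiveness. In plain terms: no lopsided "one-subject student," no leaderboard gaming, and no model that turns into a bottomless money pit.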

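The "no lopsided student" principle deserves a closer look. Hy3 preview was designed around agent scenarios from the start, and Yao's view is that even a single application such as a coding agent requires deep coordination among reasoning, long-context handling, instruction following, dialogue, coding, and tool use. A model cannot be able to write code yet fail to read documentation, or be able to chat yet fail to call an API.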

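Yao also pointed out that the old Hunyuan chased leaderboard scores too hard: putting benchmark-style data into the training set contaminated the data and hurt performance in real scenarios. He told the team to "stop chasing leaderboards," to step away from public leaderboards that are easy to game, and to assess the model's "real combat strength" through in-house test questions, the latest exams, human evaluation, and crowdsourced product testing.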

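In terms of pace, Hy3 preview formally began training at the end of January 2026 and went from training to launch in under three months. Inside Tencent it is described as the point where the Hunyuan model moves from "reading ten thousand books" to "traveling ten thousand miles" and starts trying to solve complex real-world problems.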

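In under three months: rebuild the infrastructure, set the direction, train the model, and release it as open source. Inside a big-company system, that pace is quite aggressive.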

The core technical idea behind Hy3 preview is the "fusion of fast and slow thinking."

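The concept corresponds to dual-process theory in cognitive science: System 1 (fast thinking) is quick, automatic, and intuitive, while System 2 (slow thinking) is slow, deep reasoning that draws on substantial computational resources. Traditional large models usually had to pick one of the two paths at design time: fast but limited, or capable but slow to respond.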

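Hy3 preview's approach is to let the model choose its thinking mode automatically according to task difficulty: fast thinking for simple tasks and slow thinking for complex ones, seeking the best balance between speed and capability.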

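On the engineering side, this mechanism relies on the MoE architecture. Of the 295B total parameters, each inference step activates only 21B, about 7.1% of the total. The actual compute is therefore far smaller than that of a dense 295-billion-parameter model.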
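The arithmetic behind that figure can be checked in a few lines. This is a minimal sketch: the two parameter counts come from the article, while the "about 2 FLOPs per active parameter per token" rule of thumb is an assumption added here for illustration and is not a measurement of Hy3 preview.

```python
# Parameter counts quoted in the article.
TOTAL_PARAMS = 295e9   # total parameters (295B)
ACTIVE_PARAMS = 21e9   # parameters activated per inference step (21B)


def activation_ratio(active: float, total: float) -> float:
    """Fraction of the model's parameters that take part in one step."""
    return active / total


def approx_forward_flops_per_token(active_params: float) -> float:
    """Assumed rule of thumb: roughly 2 FLOPs per active parameter per token."""
    return 2.0 * active_params


ratio = activation_ratio(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = approx_forward_flops_per_token(TOTAL_PARAMS)

print(f"activation ratio: {ratio:.1%}")
print(f"dense / MoE compute per token: {dense_flops / moe_flops:.1f}x")
```

Under that assumption, a dense model of the same total size would need roughly 14 times the per-token compute, which is the gap the article is pointing at.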

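Slow-thinking tasks activate more experts and draw on more compute, while fast-thinking tasks activate only a few experts and save compute. Switching between the two is not a matter of stacking two models; compute is allocated adaptively, task by task, inside a single model.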
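To make "activating only some experts" concrete, here is a toy sketch of generic top-k expert routing. It is not Hy3 preview's router, whose details the article does not give: the eight scalar "experts," the gate logits, and the function names `route_top_k` and `moe_layer` are all hypothetical, chosen only to show that a larger `k` means more experts run for a given input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k):
    """Pick the k highest-scoring experts; renormalize their weights to sum to 1."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, gate_logits, k):
    """Weighted sum of the selected experts' outputs; the rest are never evaluated."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))


# Hypothetical toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, a=a: a * x for a in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]

selected = route_top_k(gate_logits, k=2)   # cheaper: 2 of 8 experts run
wider = route_top_k(gate_logits, k=4)      # costlier: 4 of 8 experts run
output = moe_layer(1.0, experts, gate_logits, k=2)

print([i for i, _ in selected], [i for i, _ in wider])
```

In this toy setup the cost of a call scales with `k`, which is one simple way a single model could spend more compute on harder inputs.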

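The design idea is not new, but completing architecture selection, training, and launch in under three months reflects engineering capability that should not be underestimated.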

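For a company like Tencent, which holds products with massive user bases such as WeChat, QQ, and Tencent Docs, controllable inference cost largely determines whether a model can actually make it into products. That is why Hy3 preview's architectural choice also carries a practical commercial rationale.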

02 What Gives Tencent the Confidence to Skip the Leaderboards?

If the rule is "no leaderboard chasing," the evaluation system has to be built in-house.

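Tencent Hunyuan proposed two evaluation frameworks, CL-bench and CL-bench-Life, which focus on a model's ability to understand information, follow complex rules, and complete tasks within long, messy contexts. They target exactly the problems that are most common in real production and everyday scenarios and hardest for traditional leaderboards to cover.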

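On concrete performance, Hy3 preview posted competitive results on several key benchmarks. On the coding benchmark SWE-Bench Verified it scored 74.4%, up from 53.0% for its predecessor Hy2. That is a relative improvement of just over 40% (21.4 percentage points) and brings it close to the level of GLM-4.7.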
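The "more than 40%" figure is a relative gain, not a gain in percentage points, and the two are easy to confuse. A short check using only the two scores quoted above (the helper name `gains` is introduced here for illustration):

```python
def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain as a fraction of old)."""
    absolute = new - old
    relative = absolute / old
    return absolute, relative


# SWE-Bench Verified scores quoted in the article: Hy2 53.0%, Hy3 preview 74.4%.
abs_gain, rel_gain = gains(53.0, 74.4)
print(f"{abs_gain:.1f} points absolute, {rel_gain:.1%} relative")
```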

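On complex reasoning, Hy3 preview performed strongly on demanding science and engineering tasks such as FrontierScience-Olympiad and IMOAnswerBench. It also scored well on other hard reasoning tests such as the national high school biology competition (CHSBO 2025), which suggests that its complex logical reasoning generalizes.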

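Although it does not deliberately chase "SOTA" on any single dimension, Hy3 preview shows fairly balanced competitiveness across the board. That choice matches the signal Yao sent at the AGI-Next summit: the industry needs to break free of leaderboard chasing and focus on real user value.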

That said, some of Hy3 preview's results in hands-on tests are not perfect.

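First-hand testing by one organization found that, in an end-to-end task covering data scraping, numerical computation, visualization, and text analysis, Hy3 preview repeatedly stalled at the data-acquisition stage. After API authentication failed, it switched through several data sources in succession, and where rate limits left data missing it was forced to substitute simulated data.

Most critically, the prompt explicitly asked for a 500-character cross-market asset allocation memo, yet the model returned only a few bullet-point lines of allocation ratios, with no written analysis.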

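This suggests that Hy3 preview still has considerable room to improve in delivering complete results in complex real-world scenarios. For a preview release, though, such flaws are largely within expectations.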

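Pricing is probably another of the most closely watched questions right now. On TokenHub, Tencent Cloud's large-model service platform, Hy3 preview is priced from 1.2 yuan per million input tokens, 0.4 yuan per million cache-hit input tokens, and from 4 yuan per million output tokens. The custom Token Plan package that Tencent Cloud launched together with Hunyuan starts at 28 yuan per month for the individual tier.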
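To see what these rates mean for an agent workload, the following sketch estimates the cost of one call and of a multi-call run. The three prices are the lowest list prices quoted above; the workload (50 calls, 40,000 input tokens each with 30,000 served from cache, 1,000 output tokens each) is a made-up example, and the function name `request_cost_yuan` is hypothetical.

```python
# Lowest list prices quoted in the article, in yuan per million tokens.
PRICE_INPUT = 1.2
PRICE_CACHED_INPUT = 0.4
PRICE_OUTPUT = 4.0


def request_cost_yuan(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one call's cost; cached_tokens is the part of the input billed at the cache-hit rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# Hypothetical agent run: 50 calls sharing a long, mostly cached context.
per_call = request_cost_yuan(input_tokens=40_000, output_tokens=1_000, cached_tokens=30_000)
run_cost = 50 * per_call
print(f"per call: {per_call:.3f} yuan, 50-call run: {run_cost:.2f} yuan")
```

With these assumed numbers the run costs about 1.4 yuan. Without cache hits the same call would cost 0.052 yuan instead of 0.028, which shows how much long-chain agent costs depend on the cache-hit rate.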

Set against the current market, Hy3 preview's pricing is not aggressive.

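For comparison, DeepSeek-V4-Flash charges 0.2 yuan per million input tokens, and V4-Pro's cache-hit input price falls as low as 0.025 yuan per million tokens after a limited-time discount. On OpenRouter, DeepSeek-V4-Flash's average output price per million tokens is just 1.55‰ (0.155%) of GPT-5.5 Pro's.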

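But now that the "hundred-model war" has entered the agent era, Tencent's pricing logic is clear: do not compete on the absolute lowest price, and aim instead for a three-way balance of capability, cost, and scenario.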

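The 21B activated parameters are themselves the foundation of the cost play. Combined with the efficient inference of the MoE architecture, they provide a relatively controllable cost base for the high-frequency, long-chain calls typical of agent scenarios.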

In other words, the model has reached the threshold for putting agents into production.

03 Tencent AI's Trump Card Is Still Its Own Ecosystem

A model's real value lies in being used.

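Hy3 preview debuted on Tencent Cloud, Yuanbao, ima, CodeBuddy, WorkBuddy, QQ, QQ Browser, Tencent Docs, and Tencent Lexiang (腾讯乐享), among others. WeChat Official Accounts, Peacekeeper Elite (和平精英), Tencent News, Tencent Zixuangu (腾讯自选股), Tencent Customer Service, WeChat Reading, and other mainline products are being connected in turn.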

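Also worth noting is its entry into the open-source ecosystem: Hy3 preview can be plugged into popular open-source agent products such as OpenClaw, OpenCode, and KiloCode. Tencent is therefore not only arming its own product matrix with its own model but also trying to enter the broader open-source agent ecosystem.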

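The product-side challenges are just as direct. After Yuanbao integrated DeepSeek-R1, its daily active users surged more than 20-fold, but its search pipeline was split between two systems, Hunyuan and DeepSeek. The experience was inconsistent, and converting that traffic into retention has remained a persistent problem. Whether the full integration of Hy3 preview can resolve this "split traffic" problem will be the first real test of the model's combat strength.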

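Yuanbao, Tencent's largest AI application, has now fully integrated Hy3 preview. From WeChat to QQ, from Tencent Docs to Peacekeeper Elite, Tencent's product matrix is rallying around a single model foundation. This "own ecosystem plus own model" approach makes an interesting contrast with ByteDance's Doubao, which relies on Volcano Engine.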

Back to Hy3 preview: on the evening of its release day, OpenAI launched GPT-5.5. Less than 24 hours later, the DeepSeek V4 preview followed.

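This is the industry in miniature. At this year's large-model table, rivals are playing their cards far faster than outsiders imagine.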

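Meta recently staged a comeback with Muse Spark, and its stock rose sharply that day. Google's Gemini 3.1 series remains strong, with its share of the AI chatbot market climbing from under 6% to roughly 20% or more. In China, there are Alibaba's Qwen3.6-Max-Preview and Moonshot AI's Kimi K2.6. Earlier still, the Doubao large model 2.0 marked its first major generational upgrade, and Baidu released the official version of ERNIE 5.0 (文心大模型5.0), a natively omni-modal model with 2.4 trillion parameters.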

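As for DeepSeek, V4-Pro reached the best level among open-source models in agent capability, world knowledge, and reasoning performance, and its prices were cut repeatedly within two days, with some falling to one-fortieth of the original. V4-Flash's cache-hit input price is just 0.02 yuan per million tokens.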

The industry has reached a consensus: competing with DeepSeek on price is not a good deal for any vendor.

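Against this backdrop, Tencent is moving at its own pace along a path of "pragmatism plus ecosystem deployment." As Tang Daosheng (Dowson Tong) has previously judged, the capability gap among mainstream large models is narrowing. What enterprises need most is no longer the best model but a way to get the most out of a model through systems engineering, and what really separates the players is "engineering delivery capability."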

04 Yao Shunyu: From "Defining the Second Half" to "Delivering a Model"

The most distinctive part of the whole story is one person: Yao Shunyu.

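In April 2025, while still at OpenAI, Yao published a blog post titled "The Second Half." It argued that AI had moved from its first half into its second: the focus is no longer on training stronger models but on defining problems worth solving and evaluating models in ways closer to the real world.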

The post earned him the label "the person who defined AI's second half."

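After joining Tencent, he had to move from making judgments to putting them into practice. Four months, a new infrastructure, a new model, an open-source release. To the outside world, Hy3 preview is the beginning of an answer.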

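Yao's own statement is clear-eyed: "Hy3 preview is the first step in rebuilding the Hunyuan large model. Through this open-source release, we hope to get real feedback from the open-source community and from users, to help us make the official version of Hy3 more practical."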

There is no boasting in the statement. It reads more like an interim project report.

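Public information shows that, besides Yao, Tencent has brought in at least 10 leading AI figures over the past year from top teams at Microsoft, Alibaba, DeepSeek, and elsewhere, including Hu Han (胡瀚), former principal researcher in the Visual Computing Group at Microsoft Research Asia, and Xu Can (徐灿), creator of Microsoft's WizardLM project. In salary, rank, and scope of responsibility, Tencent has offered candidates close to the best available in the industry.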

Hy3 preview is not the work of a lone prodigy. It is the first product from a regrouped team, built on a rebuilt foundation.

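For Tencent, Hy3 preview is essentially answering one question: is Tencent's large model still up to the job? Judging by its parameters, architecture, evaluation data, and product rollout, this answer sheet is at least above the passing line.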

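But a preview release is only a starting point. At a table surrounded by rivals, with the pace quickening, what Tencent needs is a model system that keeps iterating, takes real root in its own ecosystem, and ultimately delivers differentiated value.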

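These are the questions that deserve attention next: when the official Hy3 release will arrive; whether Tencent's product matrix can form a truly coherent "model, application, business" loop around it; whether Yuanbao can achieve retention and growth on Hunyuan's own foundation; and whether, when the agent era truly arrives, Tencent's ecosystem depth can turn into a real competitive advantage.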

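Four months ago, Yao Shunyu was given a seat at a new card table. Four months later, Hy3 preview is the first card. How the hand is played from here is where the real skill will show.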

