Behind Meta's Sky-High Purchase of Nearly Half of Scale AI: How Can Web3 AI Shake Off Its Stigma?

深潮 (TechFlow) · Published on 2025-06-11 · Last updated on 2025-06-11

Both Web3 AI and Web2 AI have moved from "competing on compute" to the crossroads of "competing on data quality."

By Haotian

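On one side, Meta is spending $14.8 billion to acquire nearly half of Scale AI, and all of Silicon Valley is marveling that a tech giant has repriced "data labeling" at a sky-high valuation. On the other side, @SaharaLabsAI, which is about to hold its TGE, is still stuck with the Web3 AI stigma of "riding the hype with nothing to prove it." What has the market overlooked behind this stark contrast?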

First, data labeling is a more valuable track than decentralized compute aggregation.

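The story of challenging the cloud giants with idle GPUs is certainly compelling, but compute is essentially a standardized commodity whose differences come down mainly to price and availability. A price advantage may appear to open a gap in the giants' monopoly, but availability is constrained by geographic distribution, network latency, and weak user incentives. Once the giants cut prices or add supply, that advantage is wiped out almost instantly.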

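Data labeling is entirely different: it is a differentiated field that requires human intelligence and professional judgment. Every high-quality label carries unique domain expertise, cultural context, and cognitive experience, so it cannot be replicated in a "standardized" way like GPU compute.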

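An accurate diagnostic label on a cancer scan requires the professional intuition of a senior oncologist; a seasoned sentiment analysis of financial markets depends on the hands-on experience of a Wall Street trader. This natural scarcity and irreplaceability give data labeling a moat deeper than compute can ever match.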

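On June 10, Meta officially announced that it would acquire a 49% stake in the data labeling company Scale AI for $14.8 billion, one of the largest single investments in AI this year. More notably, Scale AI founder and CEO Alexandr Wang will also head Meta's newly formed "superintelligence" research lab.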

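The 28-year-old Chinese-American entrepreneur was an MIT dropout when he founded Scale AI in 2016, and the company he runs is now valued at $30 billion. Scale AI's client list reads like an AI "all-star lineup": OpenAI, Tesla, Microsoft, and the U.S. Department of Defense are all long-term partners. The company specializes in high-quality data labeling for AI model training and has more than 300,000 professionally trained annotators.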

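You see, while everyone is still arguing over whose model scores higher on benchmarks, the real players have quietly moved the battlefield to the source of the data.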

A "shadow war" over control of AI's future has already begun.

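Scale AI's success exposes an overlooked truth: compute is no longer scarce, model architectures are converging, and what really determines the ceiling of AI intelligence is carefully curated data. What Meta bought at a sky-high price is not an outsourcing company but the "oil drilling rights" of the AI era.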

Every monopoly story has its rebels.

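Just as compute aggregation platforms are trying to disrupt centralized cloud services, Sahara AI is trying to use blockchain to rewrite the value-distribution rules of data labeling. The fatal flaw of the traditional data labeling model is not a technical problem but an incentive-design problem.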

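A doctor may spend hours labeling medical images and receive only a few dozen dollars in fees, while the AI models trained on that data are worth billions of dollars, and the doctor does not get a cent of it. Such extremely unfair value distribution severely dampens the willingness to supply high-quality data.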

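With Web3 token incentives as a catalyst, annotators are no longer cheap data "migrant workers" but genuine "shareholders" in the AI model network. Clearly, Web3's strength in reshaping production relations is better suited to data labeling than to compute.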
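To make the contrast between a one-off fee and a "shareholder" stake concrete, here is a minimal sketch of the arithmetic. It assumes a hypothetical pro-rata reward pool; it does not describe Sahara AI's actual token mechanism, and every name and number in it (`flat_fee_payout`, `pro_rata_shares`, the contribution weights, the pool size) is illustrative only.

```python
from fractions import Fraction


def flat_fee_payout(hours, hourly_rate):
    """Traditional model: a one-off fee, independent of the model's later value."""
    return hours * hourly_rate


def pro_rata_shares(contributions, reward_pool):
    """Hypothetical token model: split a reward pool in proportion to each
    annotator's contribution weight (integer weights assumed)."""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution weight must be positive")
    # Fraction keeps the split exact, so the shares always sum to the pool.
    return {
        name: reward_pool * Fraction(weight, total)
        for name, weight in contributions.items()
    }


# Illustrative numbers only: 3 hours of labeling at a flat $20 per hour,
# versus a 3:1 split of a hypothetical 100-token reward pool.
fee = flat_fee_payout(hours=3, hourly_rate=20)
shares = pro_rata_shares({"doctor": 3, "trader": 1}, reward_pool=100)
print(fee, {name: float(amount) for name, amount in shares.items()})
```

Under the flat fee, the payout stays fixed no matter how large the pool grows; under the pro-rata split, each annotator's payout scales with the pool, which is the incentive difference the argument relies on.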

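Interestingly, Sahara AI's TGE lands right at the moment of Meta's sky-high acquisition. Coincidence, or careful planning? In my view, it reflects a market inflection point: both Web3 AI and Web2 AI have moved from "competing on compute" to the crossroads of "competing on data quality."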

While the traditional giants build data barriers with money, Web3 is using tokenomics to run a larger experiment in "data democratization."
