The Recursive AI Anthropic Warned About: Tian Yuandong's New Company Has Just Taken the "First Step"

Published by marsbit on 2026-06-12; updated 2026-06-12

Article Summary

Anthropic recently highlighted the rapid progress toward "recursive self-improvement," in which AI systems autonomously design and train their successors. Meanwhile, Recursive Superintelligence, a new company co-founded by former Meta researcher Tian Yuandong, has publicly demonstrated its first step toward automating AI research. The company released a system designed to autonomously execute the full AI research cycle: generating ideas, implementing code, running experiments, and learning from results. It validated this approach by achieving state-of-the-art results on three diverse benchmarks:

1. **NanoChat Autoresearch:** Lowering a small language model's validation loss under a fixed 5-minute GPU budget, improving on the community's best result.
2. **NanoGPT Speedrun:** Reducing the time to train a GPT model to a target loss on 8 H100 GPUs from 79.7 seconds to 77.5 seconds, beating a highly optimized, human-driven community effort.
3. **SOL-ExecBench:** Raising the overall score on NVIDIA's suite of 235 GPU kernel optimization tasks from 0.699 to 0.754, closing 18% of the remaining gap to the hardware limit. The system discovered novel optimizations in this highly specialized domain without direct human expertise.

Recursive's system operates as a general framework, capable of parallel exploration and cross-task knowledge transfer, with built-in safeguards against reward hacking. The company, backed by $650M in funding and a star-studded team including Richard Socher and Alexey Dosov...

Recently, Anthropic published an article titled "When AI Builds Itself," which quickly sparked widespread discussion. The article revealed a striking set of internal data: as of May 2026, over 80% of the code in Anthropic's codebase had been written by Claude, and engineers were merging eight times as much code per day as in 2024. In an internal test, Claude sped up a piece of training code by roughly 52x relative to a baseline, whereas an experienced human researcher typically needs 4 to 8 hours to achieve a 4x speedup.

Anthropic sees this trajectory heading toward a deeper destination: "recursive self-improvement," in which AI systems autonomously design, build, and train their own successors, with humans no longer driving every step. Notably, the company also called for industry coordination so that frontier AI development could be paused, or even temporarily halted, when the moment of recursive self-improvement arrives. Anthropic is already acting on this: it restricts its latest model, Claude Fable 5, from being used for frontier AI research.

Now, Recursive Superintelligence has announced it has taken the first step toward automated AI research.

This new company co-founded by Tian Yuandong has been out of stealth mode for just one month, and has now released its first public technical achievement. They have built an open-ended automated knowledge discovery system and achieved state-of-the-art (SOTA) results on three benchmarks. Simply put, they have succeeded in making AI run experiments for you.

https://x.com/tydsh/status/2065062838255649082

The First Result: Let AI Run Experiments for You

Recursive's first public technical achievement is called "First Steps Toward Automated AI Research."

Tweet: https://x.com/Recursive_SI/status/2064980090702962699

Repo: https://github.com/recursive-org/first-steps-toward-automated-ai-research

Blog: https://www.recursive.com/articles/first-steps-toward-automated-ai-research

In one sentence: the work builds a system that can autonomously advance the AI research cycle, and that system set new records on three benchmarks.

Before dissecting the results, it's necessary to understand the design logic of this system.

The traditional AI research process is a highly human-dependent closed loop of "propose idea—write code—run experiment—analyze results—propose new idea." Its efficiency bottleneck lies not in computing power, but in people. The number of researchers worldwide who can design frontier training pipelines is exceedingly small, and each round of experimental iteration requires their intensive involvement.

Recursive's system attempts to automate this closed loop.

Its working method is: for a clearly defined optimization objective, the system automatically proposes experimental ideas, implements code, runs validation, learns from it, and then decides how to search next. Multiple research lines can be advanced in parallel, effective discoveries can be reused across tasks, and mechanisms for detecting reward hacking are embedded within the entire loop to prevent the system from "taking shortcuts" to inflate evaluation metrics without genuinely improving anything.
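The loop described above can be sketched in a few lines. This is a minimal toy under stated assumptions, not Recursive's implementation. The names `research_loop`, `propose`, `run`, and `looks_like_hack` are hypothetical. Idea generation is a fixed rotation rather than a model, "parallel research lines" are simulated as several candidates per round, and the reward-hacking check is a simple flag that each experiment reports about itself.

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    idea: str
    score: float    # lower is better (e.g. validation loss)
    accepted: bool  # False if the reward-hacking check rejected the run


def research_loop(propose, run, looks_like_hack, rounds, width, seed=0):
    """Toy propose -> run -> check -> learn loop (hypothetical API).

    propose(history, n) returns n candidate ideas given all past trials;
    run(idea, rng) returns (score, diagnostics) for one experiment;
    looks_like_hack(idea, diagnostics) is True if the run seems to game the metric.
    `width` candidates per round stand in for parallel research lines.
    """
    rng = random.Random(seed)
    history, best = [], None
    for _ in range(rounds):
        for idea in propose(history, width):
            score, diagnostics = run(idea, rng)
            ok = not looks_like_hack(idea, diagnostics)
            trial = Trial(idea, score, ok)
            history.append(trial)  # every result, good or bad, informs later proposals
            if ok and (best is None or trial.score < best.score):
                best = trial
    return best, history


# Usage with hypothetical ideas and scores.
IDEAS = ["wd_schedule", "aux_loss", "eval_on_train"]
BASE_SCORE = {"wd_schedule": 0.93, "aux_loss": 0.92, "eval_on_train": 0.50}


def propose(history, n):
    # Deterministic rotation through the menu; a real system would use a model here.
    return [IDEAS[(len(history) + k) % len(IDEAS)] for k in range(n)]


def run(idea, rng):
    # "eval_on_train" looks great only because it leaks training data into evaluation.
    return BASE_SCORE[idea] + rng.uniform(0, 0.005), {"leak": idea == "eval_on_train"}


best, history = research_loop(propose, run, lambda idea, d: d["leak"], rounds=2, width=3)
print(best.idea)  # aux_loss: the leaky run scored lower but was rejected
```

The design point is that the hack check sits inside the loop. A run that "wins" by gaming the metric is still recorded in the history, but it can never become the reported best.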

This is not a specialized tool fine-tuned for a single problem, but rather a general-purpose research automation framework spanning different domains. Recursive demonstrates this using three significantly different test scenarios.

Three Battlefields, Three New Records

Scenario One: Small Model Training Under Fixed Compute Budget (NanoChat Autoresearch)

The rules for this benchmark come from the autoresearch project started by Andrej Karpathy (a founding member of OpenAI). On a single GPU with a fixed five-minute training budget, the task is to train a small language model to the lowest possible validation loss, measured in bits per byte (BPB; lower is better).
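BPB divides the model's total cross-entropy by the number of UTF-8 bytes in the evaluated text, which makes scores comparable across tokenizers. Below is a minimal sketch. It assumes per-token losses in nats and the decoded text of each target token are available. The benchmark's own evaluation code may differ in details, such as which tokens are counted.

```python
import math


def bits_per_byte(token_nll_nats, token_texts):
    """Validation bits per byte.

    token_nll_nats: per-token cross-entropy losses in nats (natural log).
    token_texts: decoded text of each target token, used only to count bytes.
    """
    total_nats = sum(token_nll_nats)
    total_bytes = sum(len(t.encode("utf-8")) for t in token_texts)
    return total_nats / (math.log(2) * total_bytes)  # nats -> bits, then per byte


# Two tokens of two bytes each, each costing 4 bits (4 * ln 2 nats): 8 bits / 4 bytes.
print(bits_per_byte([4 * math.log(2), 4 * math.log(2)], ["ab", "cd"]))  # 2.0

# The improvement reported in this article, in BPB:
print(round(0.9372 - 0.9109, 4))  # 0.0263
```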

This scenario is naturally suited to automated research: experiment cycles are short, the metric has low variance, and cheating is relatively easy to detect. For exactly this reason, a community project called "autoresearch@home" has long been running on this benchmark. Dozens of human researchers and hundreds of AI agents work together to keep pushing the metric down.

Starting from the same initial code, Recursive's system lowered validation BPB from the community's best of 0.9372 to 0.9109, an improvement of 0.0263 BPB. Put another way, the previous best solution would need roughly 1.3 times as much training time to reach the same quality.

The system's improvements were not a single silver bullet. It combined changes to the architecture, auxiliary losses, the attention mechanism, optimizer behavior, weight decay scheduling, compiler settings, and more. One key discovery was a richer short-context memory mechanism. In the attention's value path, the system adds bigram (adjacent token pair) and trigram (three-token) information looked up from hash tables, mixed with learnable gates. Each Transformer layer uses a different hash function, so n-grams that collide in one layer's table are unlikely to collide in the other layers as well.
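The hashed n-gram idea can be illustrated with a small sketch. Every specific choice here is an assumption, not Recursive's code:

- The class and function names are hypothetical.
- BLAKE2 stands in for whatever hash the system actually used.
- Each gate is assumed to be a sigmoid of one scalar per n-gram order.
- The n-gram embeddings are assumed to be added to the value vectors.

Plain Python lists replace GPU tensors so the example runs anywhere.

```python
import hashlib
import math
import random


def ngram_bucket(tokens, i, n, layer, num_buckets):
    """Hash the n-gram ending at position i into one of num_buckets slots.

    The layer index is part of the hashed bytes, so every layer gets an
    effectively independent hash function. Positions before the start pad with -1.
    """
    gram = tuple(tokens[j] if j >= 0 else -1 for j in range(i - n + 1, i + 1))
    digest = hashlib.blake2b(repr((layer, n, gram)).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_buckets


class HashedNgramValues:
    """Toy: add gated bigram/trigram embeddings to one layer's attention values."""

    def __init__(self, layer, dim, num_buckets=64, seed=0):
        rng = random.Random(seed * 1000 + layer)
        self.layer = layer
        self.num_buckets = num_buckets
        # One small embedding table per n-gram order (plain lists instead of tensors).
        self.tables = {
            n: [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_buckets)]
            for n in (2, 3)
        }
        # Gate logits; these would be learned parameters in a real model.
        self.gate_logits = {2: 0.0, 3: 0.0}

    def __call__(self, tokens, values):
        out = []
        for i, v in enumerate(values):
            mixed = list(v)
            for n, table in self.tables.items():
                gate = 1.0 / (1.0 + math.exp(-self.gate_logits[n]))  # sigmoid in (0, 1)
                row = table[ngram_bucket(tokens, i, n, self.layer, self.num_buckets)]
                mixed = [m + gate * r for m, r in zip(mixed, row)]
            out.append(mixed)
        return out


tokens = [5, 17, 5, 17, 9]
values = [[1.0, 0.0, -1.0, 0.5] for _ in tokens]
layer0 = HashedNgramValues(layer=0, dim=4)
print(len(layer0(tokens, values)))  # 5: one augmented value vector per position
```

Because the layer index feeds into the hash, two n-grams that share a bucket in layer 0 will generally land in different buckets in layer 1. That is the collision-spreading property described above.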

This trick is conceptually related to work such as DeepSeek Engram, but the specific variant the system used for the fixed-budget setting has not yet appeared in published literature.

Scenario Two: Training Speed Limit Race (NanoGPT Speedrun)

If the previous scenario was about "going one step further" on an active community's results, this scenario is much harder.

NanoGPT Speedrun is another benchmark built on Karpathy's GPT-2 training code (nanoGPT/llm.c), and the community has optimized it continuously for about two years. The goal is to train a GPT model to a validation loss of 3.28 on 8 H100 GPUs in as little time as possible. Since mid-2024, the community has cut the time from about 45 minutes to 79.7 seconds through 83 documented contributions. Each new record has to squeeze more time out of an already heavily optimized codebase, so every further gain is hard-won.

Starting from the current record, Recursive's system cut the training time further to 77.5 seconds, a saving of 2.2 seconds. That gain is comparable to, or larger than, the improvements made by recent human contributors.
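To put the 2.2-second gain in proportion, here is a quick calculation using only the figures quoted in this section. The 45-minute starting point is approximate.

```python
def record_delta(old_seconds, new_seconds):
    """Return (seconds saved, fraction of the old record saved)."""
    saved = old_seconds - new_seconds
    return saved, saved / old_seconds


saved, fraction = record_delta(79.7, 77.5)
print(f"{saved:.1f} s saved, {fraction:.1%} of the previous record")
# 2.2 s saved, 2.8% of the previous record

# For scale: the ~45-minute starting point versus the new record.
print(f"{45 * 60 / 77.5:.1f}x faster overall")  # 34.8x faster overall
```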

The core tricks found by the system this time include:

FP8-Precision Attention Computation. The community solution used FP8 (8-bit floating point) only in the model's final layer, the language model head. The system extended FP8 to the matrix multiplications in the attention layers. It runs the forward pass in FP8, which gets twice the Tensor Core throughput of BF16, and keeps the backward pass in BF16 for stability.
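This is a toy, pure-Python illustration of what FP8 costs in precision; it is not the speedrun's code and shows none of the speed benefit, which comes from the H100's Tensor Cores. The sketch makes two assumptions:

- It simulates rounding to the E4M3 format: 4 exponent bits, 3 mantissa bits, maximum finite value 448.
- It applies per-tensor scaling, a common recipe for FP8 matmuls.

The same numbers explain the text's point about keeping the backward pass in BF16: BF16 carries 8 significant bits, versus 4 for E4M3.

```python
import math

E4M3_MAX = 448.0  # largest finite value of the common "E4M3FN" FP8 format
MIN_NORMAL_EXP = -6  # smallest normal exponent; smaller values become subnormal
MANTISSA_BITS = 3


def round_to_e4m3(x):
    """Round x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)                 # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)     # unbiased exponent, clamped for subnormals
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between representable values
    return sign * min(round(a / step) * step, E4M3_MAX)


def fp8_dot(xs, ws):
    """Dot product with both operands quantized to E4M3 using per-tensor scales.

    Each vector is scaled so its largest magnitude maps to E4M3_MAX, rounded,
    multiplied and accumulated in full precision, then unscaled. This loosely
    mimics a scaled FP8 matmul with a higher-precision accumulator.
    """
    sx = E4M3_MAX / max(abs(v) for v in xs)
    sw = E4M3_MAX / max(abs(v) for v in ws)
    acc = sum(round_to_e4m3(x * sx) * round_to_e4m3(w * sw) for x, w in zip(xs, ws))
    return acc / (sx * sw)


xs = [0.1 * i for i in range(1, 17)]
ws = [0.05 * (17 - i) for i in range(1, 17)]
exact = sum(x * w for x, w in zip(xs, ws))
print(exact, fp8_dot(xs, ws))  # compare: with 3 mantissa bits the FP8 result is approximate
```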

Annealed Exploration Noise in the Optimizer. The system injected zero-mean Gaussian noise into the update steps of the NorMuon optimizer, with the noise amplitude decaying linearly to zero over the course of training. This gives the optimizer an "explore boldly first, then converge steadily" behavior, which helps the final solution settle into a flatter basin of the loss landscape.
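The annealed-noise idea fits in a few lines. In this sketch the noise is added to a plain SGD update on a toy quadratic rather than to NorMuon, whose internals are not shown here. The name `noisy_update`, the parameter `sigma0`, and the choice to add the noise after the update is computed are all assumptions. Only the two properties stated above are modeled: the noise is zero-mean Gaussian, and its standard deviation falls linearly to zero by the end of training.

```python
import random


def noisy_update(update, step, total_steps, sigma0, rng):
    """Add zero-mean Gaussian noise whose std decays linearly from sigma0 to 0."""
    sigma = sigma0 * max(0.0, 1.0 - step / total_steps)
    return [u + rng.gauss(0.0, sigma) for u in update]


# Toy usage: gradient descent on f(w) = w0**2 + w1**2 with annealed noise.
rng = random.Random(0)
w = [3.0, -2.0]
total = 200
for t in range(total + 1):
    update = [-0.05 * 2.0 * x for x in w]  # plain SGD step (lr 0.05); not NorMuon
    w = [x + d for x, d in zip(w, noisy_update(update, t, total, 0.05, rng))]
print(w)  # both coordinates end close to 0; the final steps are noise-free
```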

A Leaner Fused MLP Kernel. The system rewrote a Triton GPU kernel so that the forward pass stores only the activations after the ReLU squaring. The backward pass then recomputes the unsquared intermediate values inside the kernel. This saves one full round trip of the activation tensor through GPU high-bandwidth memory (HBM), a direct hardware-level speedup.
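The arithmetic behind the leaner kernel can be checked in plain Python. The derivative of relu(x)**2 is 2 * relu(x), and relu(x) equals the square root of the stored squared activation, so the backward pass needs nothing beyond the saved output. Recovering the unsquared value with a square root is an assumption made for this sketch, since the article does not say exactly how the kernel recomputes it. A real Triton kernel would also do this on tiles in registers rather than on Python lists.

```python
import math


def relu_sq_forward(pre):
    """Forward: squared-ReLU activations; this is the only tensor kept for backward."""
    return [max(p, 0.0) ** 2 for p in pre]


def relu_sq_backward(saved_act, grad_out):
    """Backward from the saved squared activations alone.

    d/dx relu(x)**2 = 2 * relu(x), and relu(x) = sqrt(relu(x)**2), so the unsquared
    value is recomputed here instead of being stored and read back from memory.
    """
    return [g * 2.0 * math.sqrt(a) for a, g in zip(saved_act, grad_out)]


pre = [-1.5, 0.0, 0.5, 2.0]
act = relu_sq_forward(pre)                      # [0.0, 0.0, 0.25, 4.0]
grad = relu_sq_backward(act, [1.0] * len(pre))  # [0.0, 0.0, 1.0, 4.0]
reference = [2.0 * max(p, 0.0) for p in pre]    # gradient using stored pre-activations
print(grad == reference)  # True
```

In Python this only shows that the two gradients agree. On a GPU, the benefit comes from never writing the second tensor to HBM and never reading it back.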

The three improvements come from three different specialties: precision strategy, optimizer design, and GPU kernel programming. That the system still found room for improvement in a codebase the community had optimized for two years speaks for itself.

Scenario Three: GPU Kernel Optimization (SOL-ExecBench)

The first two scenarios operated at the model training level. The third scenario delves deeper: optimizing GPU compute kernels.

SOL-ExecBench is a benchmark from NVIDIA containing 235 kernel-writing tasks. They cover real-world workloads such as matrix multiplication, reductions, normalization layers, attention components, quantization routines, and fused blocks. It is scored with the SOL ("speed of light") score, where 0.5 corresponds to a baseline PyTorch implementation and 1.0 to the hardware's theoretical limit. The previous best public score was 0.699.

Recursive's system ran on all 235 kernels, allowing discovered optimization patterns (e.g., memory access strategies, tiling methods, reduction techniques) to be reused across tasks. The final score improved to 0.754, reducing the gap to the hardware limit by 18%.
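The "18%" figure refers to the remaining gap, not to the score itself. Assuming, as described above, that 1.0 marks the hardware limit, the arithmetic is:

```python
def gap_closed(old, new, limit=1.0):
    """Fraction of the remaining distance to `limit` closed by moving old -> new."""
    return (new - old) / (limit - old)


print(round(gap_closed(0.699, 0.754), 3))  # 0.183, the "18%" in the text
print(round(0.754 / 0.699 - 1.0, 3))       # 0.079, the raw increase in score
```

So the score rose by roughly 8%, while the distance to the hardware limit shrank by about 18%.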

This scenario is particularly significant because kernel engineering is an extremely specialized field—engineers who can write efficient Triton/CUDA kernels are rare globally. The Recursive team candidly admits in their blog, "We ourselves are not experts in kernel engineering. These ideas came from the system itself, not from our specialized background."

Recursive: Using AI to Research and Recursively Improve AI

The company releasing this achievement, Recursive Superintelligence, was founded between late 2025 and early 2026 and only came out of stealth last month. In addition to Tian Yuandong, former Research Scientist Director at Meta FAIR, the founding team includes:

Richard Socher, Recursive CEO, former Chief Scientist at Salesforce.

Alexey Dosovitskiy, former Google DeepMind Research Scientist and first author of the Vision Transformer (ViT) paper, with over 160,000 Google Scholar citations.

Tim Rocktäschel, former DeepMind Principal Scientist and Professor of AI at UCL.

Peter Norvig, former Google Director of Research, co-author with Stuart Russell of the famous AI textbook "Artificial Intelligence: A Modern Approach."

Caiming Xiong, former VP of AI at Salesforce.

Tim Shi, former OpenAI researcher, co-founder and CTO of enterprise AI company Cresta.

Josh Tobin, Recursive CTO, former Research Lead at OpenAI and Uber ATG.

Jeff Clune, former VP of Research at Google DeepMind, Professor of Computer Science at the University of British Columbia, Canada.

Remarkably, this startup, without even having a public product yet, has already secured $650 million in funding with a valuation of $4.65 billion, led by GV (Google Ventures) and Greycroft, with follow-on investment from NVIDIA and AMD Ventures.

The company's core proposition directly corresponds to its name: building AI systems that can recursively enhance their own research capabilities, allowing AI to participate in and accelerate the R&D process of AI itself, ultimately forming a self-reinforcing closed loop.

For more details, refer to the report "After Leaving Meta, Tian Yuandong Just Announced His Startup."

Of course, Recursive is not alone in this arena. Yann LeCun's AMI Labs raised $1 billion in March this year, and David Silver's Ineffable Intelligence secured a $1.1 billion seed round in April, both pointing in a similar direction: enabling AI systems to autonomously generate knowledge and reduce human intervention in the research process. However, in terms of the pace of public achievements, Recursive's "First Steps" is likely one of the most concrete and reproducible technical demonstrations among similar companies to date.

The Dawn of the Recursive Paradigm

Seen in the broader industry context, Recursive's result is an early realization of a new AI R&D paradigm: making the AI system itself the primary agent of research.

The core logic of this "recursive AI" is not complicated: AI enhances AI research capabilities, and the improved AI can then more effectively enhance itself, in a virtuous cycle. It does not rely on a single breakthrough, but on a system that continuously generates breakthroughs.

This approach has significant implications for the economics of AI research itself. The training pipelines for frontier models still heavily depend on a small number of researchers with specific skills, numbering no more than a few thousand globally. If automated research systems can take over even a portion of this work, both the speed and cost curve of AI progress will change.

This assessment echoes other recent voices in the industry. Anthropic's "When AI Builds Itself," mentioned at the start of this article, strikes a serious tone. It calls for industry coordination so that frontier AI development can be paused or temporarily halted when the moment of recursive self-improvement arrives, giving societal structures and alignment research time to catch up. For more details, see "AI Self-Evolution Too Fast, Anthropic Calls for Global Halt on R&D."

https://www.anthropic.com/institute/recursive-self-improvement

These two events happening simultaneously are thought-provoking. On one side, Anthropic is documenting and warning about the direction of this trajectory; on the other side, teams like Recursive are making step-by-step progress to turn this trajectory into reality.

Of course, Recursive itself acknowledges that this is only the "first step." The current system works best in settings with clear metrics, fast feedback, and detectable cheating, and it is still a long way from autonomously advancing open-ended scientific questions. Preventing reward hacking will be a core challenge as the approach scales.

But a closed loop has begun to turn. The question now is simply how fast it will spin.

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Machine Heart in Recursive Evolution, editor: Panda
