The Recursive AI Anthropic Warned About: Tian Yuandong's New Company Has Just Taken the "First Step"

marsbit | Published 2026-06-12 | Updated 2026-06-12

Introduction

Anthropic recently highlighted the rapid progress toward "recursive self-improvement," where AI systems autonomously design and train their successors. Against this backdrop, Recursive Superintelligence, a new company co-founded by former Meta researcher Tian Yuandong, has publicly demonstrated its first step toward automating AI research. The company released a system designed to autonomously execute the full AI research cycle: generating ideas, implementing code, running experiments, and learning from results. It validated this approach by achieving state-of-the-art results on three diverse benchmarks:

1. **NanoChat Autoresearch:** Optimizing a small language model's validation loss under a fixed 5-minute GPU budget, improving upon the community's best result.
2. **NanoGPT Speedrun:** Reducing the time to train a GPT model to a specific loss on 8 H100 GPUs from 79.7 seconds to 77.5 seconds, beating a highly optimized, human-driven community effort.
3. **SOL-ExecBench:** Raising the overall score on NVIDIA's suite of 235 GPU kernel optimization tasks from 0.699 to 0.754, which closes 18% of the gap to the hardware limit. The system discovered novel optimizations in this highly specialized domain without direct human expertise.

Recursive's system operates as a general framework, capable of parallel exploration and cross-task knowledge transfer while incorporating safeguards against reward hacking. The company, backed by $650M in funding and a star-studded team including Richard Socher and Alexey Dosov...

Recently, Anthropic published an article titled "When AI Builds Itself," which quickly sparked widespread discussion. The article revealed a striking set of internal data: as of May 2026, over 80% of the code in Anthropic's codebase had been written by Claude, with engineers merging eight times more code per day than in 2024. In an internal test, Claude improved the runtime of a piece of training code by approximately 52x over a baseline, whereas an experienced human researcher typically takes 4 to 8 hours to achieve a 4x speedup.

Anthropic sees this trajectory heading toward a deeper destination, "Recursive Self-Improvement": AI systems autonomously designing, building, and training their own successive versions, with humans no longer driving every step. Notably, the company also called for industry coordination so that there is an option to pause or even temporarily halt frontier AI development when the moment of recursive self-improvement arrives. And Anthropic is already acting on this: it restricts its latest model, Claude Fable 5, from being used for frontier AI research.

Now, Recursive Superintelligence has announced it has taken the first step toward automated AI research.

This new company co-founded by Tian Yuandong has been out of stealth mode for just one month, and has now released its first public technical achievement. They have built an open-ended automated knowledge discovery system and achieved state-of-the-art (SOTA) results on three benchmarks. Simply put, they have succeeded in making AI run experiments for you.

https://x.com/tydsh/status/2065062838255649082

The First Result: Let AI Run Experiments for You

Recursive's first public technical achievement is called "First Steps Toward Automated AI Research."

Tweet: https://x.com/Recursive_SI/status/2064980090702962699

Repo: https://github.com/recursive-org/first-steps-toward-automated-ai-research

Blog: https://www.recursive.com/articles/first-steps-toward-automated-ai-research

In one sentence: the team built a system that can autonomously advance the AI research cycle, and that system set new records on three benchmarks.

Before dissecting the results, it's necessary to understand the design logic of this system.

The traditional AI research process is a highly human-dependent closed loop of "propose idea—write code—run experiment—analyze results—propose new idea." Its efficiency bottleneck lies not in computing power, but in people. The number of researchers worldwide who can design frontier training pipelines is exceedingly small, and each round of experimental iteration requires their intensive involvement.

Recursive's system attempts to automate this closed loop.

Its working method is: for a clearly defined optimization objective, the system automatically proposes experimental ideas, implements code, runs validation, learns from it, and then decides how to search next. Multiple research lines can be advanced in parallel, effective discoveries can be reused across tasks, and mechanisms for detecting reward hacking are embedded within the entire loop to prevent the system from "taking shortcuts" to inflate evaluation metrics without genuinely improving anything.
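
The loop described above can be sketched in a few lines of Python. This is only an illustration of the propose, run, check, learn cycle, not Recursive's implementation: the names `Experiment`, `ResearchLoop`, `tolerance`, and the held-out re-measurement used as a reward-hacking check are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    idea: str
    score: float          # metric reported by the experiment (lower is better)
    holdout_score: float  # same metric re-measured on data the agent never saw


@dataclass
class ResearchLoop:
    best_score: float
    tolerance: float = 0.05  # hypothetical threshold for the hacking check
    knowledge: list = field(default_factory=list)  # accepted ideas, reusable later

    def looks_like_reward_hacking(self, exp: Experiment) -> bool:
        # Flag results whose reported metric is much better than the held-out one.
        return exp.holdout_score - exp.score > self.tolerance

    def step(self, propose, run) -> bool:
        idea = propose(self.knowledge)   # propose an idea, given what worked so far
        exp = run(idea)                  # implement it and run the experiment
        if self.looks_like_reward_hacking(exp):
            return False                 # reject suspicious gains
        if exp.score < self.best_score:  # learn from the result
            self.best_score = exp.score
            self.knowledge.append(idea)
            return True
        return False


# Toy usage with hard-coded stand-ins for the proposer and the experiment runner.
loop = ResearchLoop(best_score=1.00)
accepted = loop.step(
    propose=lambda knowledge: "wider MLP",
    run=lambda idea: Experiment(idea, score=0.97, holdout_score=0.98),
)
```

In this sketch the `knowledge` list stands in for the cross-task reuse the article mentions: a proposer for a second task could be handed the same list.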

This is not a specialized tool fine-tuned for a single problem, but rather a general-purpose research automation framework spanning different domains. Recursive demonstrates this using three significantly different test scenarios.

Three Battlefields, Three New Records

Scenario One: Small Model Training Under Fixed Compute Budget (NanoChat Autoresearch)

The rules for this benchmark come from the autoresearch project initiated by Andrej Karpathy (author of nanoGPT and nanochat, OpenAI co-founder): on a single GPU, with a fixed training budget of five minutes, train a small language model to the lowest possible validation loss (measured in BPB, bits per byte; lower is better).
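
For readers unfamiliar with the metric, here is a minimal sketch of how a BPB figure can be derived, assuming BPB means bits per byte: the summed cross-entropy over the validation text, converted from nats to bits and divided by the number of bytes in that text. The function name and the sample numbers are hypothetical, not taken from the benchmark.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (in nats) over a text into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_loss_nats / (math.log(2) * total_bytes)


# Hypothetical numbers: 1,000 tokens at a mean loss of 3.0 nats per token,
# covering 4,500 bytes of UTF-8 text.
mean_loss, n_tokens, n_bytes = 3.0, 1000, 4500
bpb = bits_per_byte(mean_loss * n_tokens, n_bytes)
```

Normalizing by bytes rather than tokens keeps scores comparable across models with different tokenizers.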

This scenario is naturally suited for automated research: short experimental cycles, low metric variance, relatively easy detection of cheating behavior. Precisely because of this, a community project called "autoresearch@home" has been running on this benchmark for a long time—dozens of human researchers collaborating with hundreds of AI agents continuously pushing the metric down.

Recursive's system started from the same initial code and improved the validation BPB from the community's best of 0.9372 to 0.9109, a reduction of 0.0263 BPB. Put another way, Recursive's solution reaches the same training quality roughly 1.3 times faster than the community's best solution.

The improvements discovered by the system were not a single silver bullet. It combined architecture adjustments, auxiliary losses, attention mechanism modifications, optimizer behavior, weight decay scheduling, compiler settings, and more. One of the key discoveries was a richer short-context memory mechanism: within the attention's value path, embedding both bigram (adjacent word pairs) and trigram (triplet) information via hash tables, with weighted mixing via learnable gating. Different Transformer layers use different hash functions, reducing the probability of cross-layer collision.

This trick is conceptually related to works like DeepSeek Engram, but the system deployed it in a specific variant not yet seen in published literature for the fixed-budget scenario.
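
To make the idea concrete, here is a rough pure-Python sketch of a hashed n-gram memory: bigrams and trigrams ending at a position are hashed into embedding tables, mixed through gates, and each layer salts the hash differently. It is not the variant the system found. The table size, dimension, FNV-style hash, sigmoid gates, and all names (`ngram_bucket`, `HashedNgramMemory`) are assumptions, and the gates that would be learned in a real model are fixed here.

```python
import math
import random

N_BUCKETS = 1024  # hypothetical hash-table size
DIM = 8           # hypothetical value dimension
MASK64 = 0xFFFFFFFFFFFFFFFF


def ngram_bucket(tokens, layer, n_buckets=N_BUCKETS):
    """Hash an n-gram of token ids to a bucket, salted by the layer index."""
    h = (0xCBF29CE484222325 ^ (layer * 0x9E3779B97F4A7C15)) & MASK64
    for t in tokens:
        h = ((h ^ t) * 0x100000001B3) & MASK64  # FNV-1a style mixing
    return h % n_buckets


class HashedNgramMemory:
    def __init__(self, layer, dim=DIM, n_buckets=N_BUCKETS, seed=0):
        rng = random.Random(seed * 1000 + layer)
        self.layer, self.dim, self.n_buckets = layer, dim, n_buckets
        self.bigram = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_buckets)]
        self.trigram = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_buckets)]
        self.gate_logits = [0.0, 0.0]  # learnable in a real model; fixed here

    def lookup(self, tokens, pos):
        """Return the gated bigram + trigram vector to add to the value at `pos`."""
        out = [0.0] * self.dim
        gates = [1.0 / (1.0 + math.exp(-g)) for g in self.gate_logits]
        tables = ((2, self.bigram, gates[0]), (3, self.trigram, gates[1]))
        for n, table, gate in tables:
            if pos >= n - 1:  # enough left context for an n-gram ending at pos
                row = table[ngram_bucket(tokens[pos - n + 1:pos + 1], self.layer, self.n_buckets)]
                out = [o + gate * r for o, r in zip(out, row)]
        return out


memory = HashedNgramMemory(layer=0)
extra_value = memory.lookup([5, 17, 42, 7], pos=3)
```

Because the hash is salted per layer, two n-grams that collide in one layer's table are unlikely to collide again in another, which is the collision argument made in the text.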

Scenario Two: Training Speed Limit Race (NanoGPT Speedrun)

If the previous scenario was about "going one step further" on an active community's results, this scenario is much harder.

NanoGPT Speedrun is another benchmark initiated by Karpathy and continuously optimized by the community for over two years. The goal is to minimize the time required to train a GPT model to a validation loss of 3.28 on 8 H100 GPUs. Since mid-2024, the community has compressed the time from about 45 minutes to 79.7 seconds through 83 documented contributions. Each new solution must squeeze more time out of an already extremely optimized codebase, so the difficulty is self-evident.

Recursive's system started from the existing optimal solution and further compressed the training time to 77.5 seconds, saving 2.2 seconds. This improvement is comparable to, or even better than, what recent human contributors have achieved.

The core tricks found by the system this time include:

FP8 Precision Attention Computation. The community solution used FP8 (8-bit floating point) computation only in the model's final layer (language model head). The system extended FP8 into the matrix operations of the attention layers, using FP8 for forward propagation to get up to twice the Tensor Core throughput of BF16, while retaining BF16 for backward propagation to maintain stability.

Annealing Exploration Noise in the Optimizer. The system injected zero-mean Gaussian noise into the update steps of the NorMuon optimizer, with the noise amplitude linearly annealing to zero as training progressed. This is somewhat like giving the optimizer a behavior pattern of "explore boldly first, then converge robustly," helping the final solution settle in a flatter loss basin.
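
A minimal sketch of the annealing schedule, under loud assumptions: plain gradient descent stands in for NorMuon (which is not reproduced here), and the function names, learning rate, and noise scale are illustrative only. What it does show is the described mechanism: zero-mean Gaussian noise added to each update step, with an amplitude that decays linearly to zero.

```python
import random


def noise_scale(step, total_steps, initial_scale):
    """Linearly anneal the noise amplitude from initial_scale down to zero."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial_scale * (1.0 - frac)


def noisy_descent(grad_fn, x, lr, total_steps, initial_scale, seed=0):
    """Gradient descent (a stand-in optimizer) with annealed update noise."""
    rng = random.Random(seed)
    for step in range(total_steps):
        sigma = noise_scale(step, total_steps, initial_scale)
        update = [-lr * g for g in grad_fn(x)]
        # Zero-mean Gaussian noise is added to the update, not to the gradient.
        x = [xi + ui + rng.gauss(0.0, sigma) for xi, ui in zip(x, update)]
    return x


# Toy objective f(x) = sum(x_i^2), whose gradient is 2x.
final = noisy_descent(
    lambda x: [2.0 * xi for xi in x],
    [3.0, -2.0],
    lr=0.1,
    total_steps=200,
    initial_scale=0.05,
)
```

Early steps are perturbed the most, and the last steps are almost noise-free, matching the "explore first, converge later" behavior described above.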

More Streamlined Fused MLP Kernel. The system rewrote a Triton GPU kernel so that forward propagation only stores activation values after ReLU squaring, and during backward propagation, the unsquared intermediate results are recomputed internally within the kernel, saving one full round-trip read/write of the activation tensor in high-bandwidth GPU memory—a direct hardware-level speedup.
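
The article does not say how the kernel recomputes the unsquared values. One mathematically valid possibility, shown here purely as an assumption, is that for y = relu(x)^2 the unsquared activation can be rebuilt from the stored output as relu(x) = sqrt(y), so the backward pass needs only y. The sketch uses plain Python lists instead of a Triton kernel, and all function names are hypothetical.

```python
import math


def relu_sq_forward(x):
    """Forward pass: compute y = relu(x)^2 and keep only y for backward."""
    y = [max(v, 0.0) ** 2 for v in x]
    return y, y  # (output, saved tensor); x itself is not stored


def relu_sq_backward(saved_y, grad_out):
    """Backward pass: rebuild relu(x) = sqrt(y), then use d/dx relu(x)^2 = 2 * relu(x)."""
    return [g * 2.0 * math.sqrt(y) for y, g in zip(saved_y, grad_out)]


def reference_backward(x, grad_out):
    """Reference gradient computed from the original input, for comparison."""
    return [g * 2.0 * max(v, 0.0) for v, g in zip(x, grad_out)]


x = [-1.5, 0.0, 0.5, 2.0]
grad_out = [1.0, 1.0, 1.0, 1.0]
_, saved = relu_sq_forward(x)
grad_from_saved = relu_sq_backward(saved, grad_out)
grad_reference = reference_backward(x, grad_out)
```

Both gradients agree, so nothing is lost by dropping the pre-square tensor; the saving comes from not writing it to, and reading it back from, GPU memory.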

Three improvements, belonging to three different specialized areas: precision strategy, optimizer design, and GPU kernel programming. The fact that the system found room for improvement on a result optimized by the community for two years speaks for itself.

Scenario Three: GPU Kernel Optimization (SOL-ExecBench)

The first two scenarios operated at the model training level. The third scenario delves deeper: optimizing GPU compute kernels.

SOL-ExecBench is a benchmark introduced by NVIDIA, containing 235 kernel writing tasks covering various real-world workloads like matrix multiplication, reduction, normalization layers, attention components, quantization routines, fused blocks, etc. The scoring metric is the SOL score: 0.5 corresponds to a baseline PyTorch implementation, and 1.0 corresponds to the hardware's theoretical limit. The previous best public score was 0.699.

Recursive's system ran on all 235 kernels, allowing discovered optimization patterns (e.g., memory access strategies, tiling methods, reduction techniques) to be reused across tasks. The final score improved to 0.754, reducing the gap to the hardware limit by 18%.
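
The 18% figure refers to the remaining gap to the hardware limit, not to the score itself. A short check using the numbers reported above (the helper name `gap_closed` is introduced here for illustration):

```python
def gap_closed(old_score, new_score, limit=1.0):
    """Fraction of the remaining gap to `limit` that the new score closes."""
    old_gap = limit - old_score
    new_gap = limit - new_score
    return (old_gap - new_gap) / old_gap


# Scores reported in the article: previous best 0.699, new result 0.754.
fraction_of_gap = gap_closed(0.699, 0.754)  # 0.055 / 0.301, about 0.18
relative_gain = (0.754 - 0.699) / 0.699     # about 0.08 relative to the old score
```

The absolute gain is 0.055 SOL points, which is roughly an 8% relative improvement in the score but about 18% of the distance that was left to the theoretical limit.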

This scenario is particularly significant because kernel engineering is an extremely specialized field—engineers who can write efficient Triton/CUDA kernels are rare globally. The Recursive team candidly admits in their blog, "We ourselves are not experts in kernel engineering. These ideas came from the system itself, not from our specialized background."

Recursive: Using AI to Research and Recursively Improve AI

The company releasing this achievement, Recursive Superintelligence, was founded between late 2025 and early 2026 and only came out of stealth last month. In addition to Tian Yuandong, former Research Scientist Director at Meta FAIR, the founding team includes:

Richard Socher, Recursive CEO, former Chief Scientist at Salesforce.

Alexey Dosovitskiy, former Google DeepMind Research Scientist and first author of Vision Transformer, with over 160,000 Google Scholar citations.

Tim Rocktäschel, former DeepMind Principal Scientist and UCL AI Professor.

Peter Norvig, former Google Director of Research, co-author with Stuart Russell of the famous AI textbook "Artificial Intelligence: A Modern Approach."

Caiming Xiong, former VP of AI at Salesforce.

Tim Shi, former OpenAI researcher, co-founder and CTO of enterprise AI company Cresta.

Josh Tobin, Recursive CTO, former Research Lead at OpenAI and Uber ATG.

Jeff Clune, former VP of Research at Google DeepMind, Professor of Computer Science at the University of British Columbia, Canada.

Remarkably, this startup, without even having a public product yet, has already secured $650 million in funding with a valuation of $4.65 billion, led by GV (Google Ventures) and Greycroft, with follow-on investment from NVIDIA and AMD Ventures.

The company's core proposition directly corresponds to its name: building AI systems that can recursively enhance their own research capabilities, allowing AI to participate in and accelerate the R&D process of AI itself, ultimately forming a self-reinforcing closed loop.

For more details, refer to the report "After Leaving Meta, Tian Yuandong Just Announced His Startup."

Of course, Recursive is not alone in this arena. Yann LeCun's AMI Labs raised $1 billion in March this year, and David Silver's Ineffable Intelligence secured a $1.1 billion seed round in April, both pointing in a similar direction: enabling AI systems to autonomously generate knowledge and reduce human intervention in the research process. However, in terms of the pace of public achievements, Recursive's "First Steps" is likely one of the most concrete and reproducible technical demonstrations among similar companies to date.

The Dawn of the Recursive Paradigm

Placed within the broader industry context, Recursive's released achievement represents the preliminary realization of a new type of AI R&D paradigm: making the AI system itself the primary agent of research.

The core logic of this "recursive AI" is not complicated: AI enhances AI research capabilities, and the improved AI can then more effectively enhance itself, in a virtuous cycle. It does not rely on a single breakthrough, but on a system that continuously generates breakthroughs.

This approach has significant implications for the economics of AI research itself. The training pipelines for frontier models still heavily depend on a small number of researchers with specific skills, numbering no more than a few thousand globally. If automated research systems can take over even a portion of this work, both the speed and cost curve of AI progress will change.

This assessment also echoes other recent voices from the industry. For instance, Anthropic's "When AI Builds Itself" mentioned at the beginning of this article has a serious tone—it calls for industry coordination to have options to pause or temporarily halt frontier AI development when the moment of recursive self-improvement arrives, to allow time for societal structures and alignment research to catch up. For more details, see "AI Self-Evolution Too Fast, Anthropic Calls for Global Halt on R&D."

https://www.anthropic.com/institute/recursive-self-improvement

These two events happening simultaneously are thought-provoking. On one side, Anthropic is documenting and warning about the direction of this trajectory; on the other side, teams like Recursive are making step-by-step progress to turn this trajectory into reality.

Of course, Recursive itself acknowledges this is still the "first step": the current system works best in scenarios with clear metrics, rapid feedback, and detectable cheating. There is still considerable distance from autonomously advancing open scientific questions. Preventing reward hacking will be a core challenge on the path to scaling.

But a closed loop has begun to turn. The question now is simply how fast it will spin.

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Machine Heart in Recursive Evolution, editor: Panda

Related Questions

Q: What is recursive self-improvement in AI, and why is it significant according to the article?

A: Recursive self-improvement refers to AI systems autonomously designing, building, and training their own successor versions, reducing human intervention at every step. According to the article, this is significant because it points towards a future where AI progress could accelerate dramatically. As highlighted by Anthropic's data, AI (like Claude) is already writing most of the company's code and, in an internal test, sped up training code far more than a human researcher typically would, potentially leading to a self-reinforcing cycle of improvement that changes the economics and speed of AI development.

Q: What specific achievement did Recursive Superintelligence announce, and how does it work?

A: Recursive Superintelligence announced its first public technical achievement: an open-ended automated knowledge discovery system for AI research. The system automates the traditional AI research loop of 'idea generation - coding - experimentation - analysis.' It works by autonomously proposing experimental ideas, implementing code, running validations, learning from results, and deciding the next search direction for a given optimization goal. It demonstrated this by achieving state-of-the-art results on three different benchmark tests.

Q: What are the three benchmark tests where Recursive's system achieved new records, and what were the key improvements?

A:
1. NanoChat Autoresearch: The system improved validation loss (BPB) by 0.0263 on a small model training task with a fixed 5-minute compute budget. A key improvement was a richer short-context memory mechanism using hashed bigram/trigram information in attention layers.
2. NanoGPT Speedrun: It reduced the training time to reach a target validation loss from 79.7 seconds to 77.5 seconds. Key improvements included FP8 precision in attention calculations, annealed exploration noise in the optimizer, and a more efficient fused MLP GPU kernel.
3. SOL-ExecBench: The system improved the overall score for GPU kernel optimization tasks from 0.699 to 0.754 (closing 18% of the gap to the hardware limit) by discovering and reusing optimization patterns across 235 different kernel tasks.

Q: Why is the development of automated AI research systems like Recursive's considered a potential concern, as hinted by the article?

A: The development of automated AI research systems is a potential concern because it could lead to rapid, uncontrolled recursive self-improvement. As noted with Anthropic's warning, if AI systems become proficient at autonomously improving themselves, the pace of AI advancement could outstrip society's ability to develop safety measures, governance, and alignment research. This creates a risk scenario where highly capable AI emerges before adequate safeguards are in place, prompting calls for coordinated pauses in frontier AI development.

Q: Who are some of the notable founders and backers of Recursive Superintelligence mentioned in the article?

A: The notable founders include Tian Yuandong (former Meta FAIR), Richard Socher (CEO, former Salesforce), Alexey Dosovitskiy (first author of Vision Transformer), Tim Rocktäschel (former DeepMind), Peter Norvig (co-author of 'Artificial Intelligence: A Modern Approach'), Caiming Xiong (former Salesforce AI VP), Tim Shi (former OpenAI), Josh Tobin (CTO, former OpenAI), and Jeff Clune (former Google DeepMind). The company raised $650 million in funding at a $4.65 billion valuation, led by GV (Google Ventures) and Greycroft, with participation from NVIDIA and AMD Ventures.
