Claude Science Completes Two Years' Work in a Few Weeks: Is 10x Research Acceleration Really Here?

marsbit | Published 2026-07-01 | Updated 2026-07-01

Summary

Claude Science, a new AI workbench from Anthropic, is being tested by scientists, reportedly accelerating specific research workflows by up to 10x. A neuroscientist at the Allen Institute completed a lengthy literature review in weeks instead of nearly two years using the tool, which automates tasks like citation verification. The platform is an integrated environment for macOS and Linux, connecting to local or remote computing resources. It streamlines the fragmented research process (literature analysis, computation, visualization, and drafting) into a single, auditable workflow. A key feature is its emphasis on reproducibility: every chart generated includes the exact code, environment, and history used to create it. Claude Science uses a multi-agent system. A coordinator manages over 60 pre-configured skills for life sciences (genomics, proteomics, etc.) and can spawn specialized agents. A dedicated reviewer agent checks citations and calculations for accuracy, creating a form of internal AI peer review. The system operates with a human in the loop, requiring user approval for major steps. Initial applications are in life sciences. Examples include target identification for biotech company Manifold Bio and germline variant analysis for glioma research at UCSF, completing analyses in roughly one-tenth the previous time. The approach contrasts with competitors: Google focuses on proprietary models like AlphaFold, while OpenAI is advancing models' scientific reasoning wit...

Two years' work, now completed in a few weeks.

Recently, neuroscientist Jérôme Lecoq and his team at the Allen Institute compressed the writing time of a long-form review article from nearly two years to just a few weeks.

Lecoq had a backlog of about 10 reviews, many exceeding 100 pages; an AI agent checked every single citation, sentence by sentence.

The tool helping him was Claude Science, the new application just launched by Anthropic.

On June 30, 2026, Anthropic released Claude Science, positioned as an AI workbench for scientists. (Image source: Anthropic official blog)

According to Anthropic, this task would previously have taken Lecoq and his team nearly two years.

Anthropic positions Claude Science not as just a smarter research model, but as an AI workbench tailored for scientists.

Its real breakthrough is that, for the first time, it breaks scientific research down into an auditable, step-by-step pipeline.

Currently, Claude Science is in beta on macOS and Linux, available to Pro, Max, Team, and Enterprise users.

What's Really Changing Is the Entire Research Toolchain

Anyone who has done research understands the drudgery:

A project requires jumping between dozens of databases, each with its own schema and query language.

File formats are all over the place, each requiring its own custom pipeline and viewer.

You keep a row of tools on hand: PubMed for literature, Jupyter for code, R for statistics, cluster terminals for submitting jobs, and so on.

The constant context switching, on top of the work of moving, splicing, and debugging, leaves little time for actual scientific thinking.

What Claude Science does is bundle these fragmented scenarios into a single execution environment:

Literature analysis, multi-step computation, chart polishing, manuscript drafting—all stages are completed in the same environment, so you don't interrupt your train of thought switching tools.

It can run on your local macOS or Linux machine, connect via SSH to remote machines, or attach to a high-performance computing (HPC) login node.

As with ordinary Jupyter use, it runs wherever the data is.

It even handles computational resource scheduling.

Large tasks like protein folding or running a genomic pipeline on massive data used to require researchers to babysit: setting up jobs, queuing for cluster time, monitoring success or failure, and pulling results back—half a day gone in a flash.

Claude Science takes over this flow: it drafts a plan, asks for permission before touching new resources, lets you review or revert tasks before they are written and submitted, and scales analysis from one GPU to hundreds.

Claude Science dispatches an 8-group scVI hyperparameter sweep to a lab's A100 cluster. The notebook on the right shares the same live kernel with the agent, with variables and state synchronized in real time. (Image source: Anthropic official blog)

More importantly, sensitive data doesn't leave the original system; only the context truly needed for each step is sent to Claude.

Every Chart Comes with Traceable Code

Science inherently deals with visuals: protein 3D structures, genome browser tracks, chemical formulas—these are essentially diagrams.

Accordingly, when Claude Science generates charts and drafts, it also outputs the code that created them and can render them natively.

The key lies in reproducibility.

Whenever Claude Science generates a chart, it "pins" the exact code that created it, along with the runtime environment, plain-language descriptions, and the full conversation history, right onto the chart.

On the left, a cell chart across 138 species; on the right, the exact code that generated it, displayed side by side. A one-sentence annotation is enough to have the agent modify the chart. Every result is reproducible and traceable to its code. (Image source: Anthropic official blog)

Months often pass from a paper's submission to publication; months later, when a reviewer asks you to rerun a specific chart, you can easily reproduce the entire chain of inputs, process, and results on the spot.
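To make the idea of "pinning" concrete, here is a minimal Python sketch of a provenance record that travels with a figure. It is only an illustration under stated assumptions: the article does not describe how Claude Science stores this information, and the names `FigureProvenance`, `pin_provenance`, and `is_unmodified`, as well as the sample plotting code, are hypothetical.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


@dataclass
class FigureProvenance:
    """Hypothetical record attached to one generated figure."""
    description: str      # plain-language description of the figure
    code: str             # exact source code that produced the figure
    code_sha256: str      # fingerprint of that code, to detect later edits
    python_version: str   # part of the runtime environment
    platform: str         # OS / machine description
    history: list = field(default_factory=list)  # requests that led to the figure


def pin_provenance(code: str, description: str, history: list) -> FigureProvenance:
    """Capture the code, a snapshot of the environment, and the request history."""
    return FigureProvenance(
        description=description,
        code=code,
        code_sha256=hashlib.sha256(code.encode("utf-8")).hexdigest(),
        python_version=sys.version.split()[0],
        platform=platform.platform(),
        history=list(history),
    )


def is_unmodified(record: FigureProvenance) -> bool:
    """Check that the stored code still matches its recorded fingerprint."""
    digest = hashlib.sha256(record.code.encode("utf-8")).hexdigest()
    return digest == record.code_sha256


# Made-up plotting code, used only to exercise the record.
plot_code = "import math\nys = [math.log10(x) for x in (1, 10, 100)]\n"
record = pin_provenance(
    plot_code,
    "Log-scaled values for three inputs",
    ["switch the y-axis to log"],
)

# Serialize the record as a JSON sidecar that can be stored next to the image.
sidecar = json.dumps(asdict(record), indent=2)
```

When a reviewer asks for a rerun months later, a record like this tells you which code to execute and in which environment; a real system would presumably also pin package versions and input data, which this sketch leaves out.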

Want to edit the chart? Just say so: "remove gridlines" or "switch the y-axis to log," and the agent directly edits the code it wrote.

You can also fork the session at any point to try two different approaches simultaneously, without disrupting the original thread.

In short, research is integrated for the first time into an auditable workflow, with code, environment, and history all placed within a closed loop.

One Agent Writes, Another Specializes in Finding Errors

Behind Claude Science is not a single agent working alone.

You're facing a coordinating agent that manages over 60 pre-configured skills and connectors for genomics, single-cell analysis, proteomics, structural biology, and cheminformatics.

When work piles up, it can spawn additional agents for division of labor and can call upon expert agents you've created yourself.

The cleverest part is the reviewer agent.

It specializes in checking citations and calculations: it hunts down wrong citations, unsourced numbers, and charts that don't match their code, flags them, and fixes them itself.

In the Allen Institute case, the team used an actor-critic pairing: one agent responsible for writing, another specialized in evaluating its accuracy and the veracity of citations.

This structure already hints at a prototype for "AI internal peer review."
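The division of labor between a writer and a reviewer can be sketched in a few lines of Python. This is a toy illustration, not Anthropic's implementation: the `Sentence` type, the `review` function, the two rules it applies, and the sample draft are all made up for the example, and a real reviewer would check the cited sources themselves rather than just citation keys.

```python
import re
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    citations: tuple  # citation keys attached by the writer, e.g. ("smith2020",)


def review(draft, known_references):
    """Return (sentence_index, problem) pairs for the writer to fix."""
    issues = []
    for i, sentence in enumerate(draft):
        # Rule 1: every cited key must exist in the reference list.
        for key in sentence.citations:
            if key not in known_references:
                issues.append((i, f"unknown citation: {key}"))
        # Rule 2: a sentence containing a number must cite something.
        if re.search(r"\d", sentence.text) and not sentence.citations:
            issues.append((i, "number without a source"))
    return issues


# Placeholder reference keys and draft sentences, invented for illustration.
references = {"smith2020", "lee2023"}
draft = [
    Sentence("Cortical responses adapt to repeated stimuli.", ("smith2020",)),
    Sentence("Adaptation reduces firing by 40 percent.", ()),
    Sentence("The effect generalizes across areas.", ("doe1999",)),
]

issues = review(draft, references)
for index, problem in issues:
    print(f"sentence {index}: {problem}")
```

Here the second sentence is flagged for an unsourced number and the third for a citation key missing from the reference list; in a writer-reviewer loop, the writer would revise and the reviewer would run again until no issues remain.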

But one boundary must be clear: it's human-in-the-loop throughout.

It seeks authorization before using new resources, and you can review and revoke every decision. It automates the process, not the scientific discovery itself.

It also integrates with NVIDIA's BioNeMo Agent Toolkit, connecting natively to life science models like Evo 2, Boltz-2, and OpenFold3.

Models, data, and pipelines your lab trusts can be saved as reusable skills, which future sessions inherit automatically.

Claude Science's First Stop Is Life Sciences

Claude Science's initial focus is on life sciences.

Genomics, single-cell, proteomics, structural biology, cheminformatics—ready to use out of the box.

It can read literature and query 60+ scientific databases such as UniProt, PDB, Ensembl, ClinVar, ChEMBL, and GEO, so you no longer need to learn each disparate database one by one.

Claude Science comes with pre-configured environments for genomics, single-cell, proteomics, and cheminformatics, backed by 60+ scientific databases. (Image source: Anthropic official blog)

Manifold Bio works on tissue-targeting drugs.

They use Claude Science to nominate targets for the latest experiments, evaluating surface expression, transport, and safety for each tissue and target, then ranking candidates according to standards learned from their own proprietary data.

Manifold says ordinary programming assistants can't do this; Claude Science can complete it end-to-end, getting the right data, making the right judgments, and carrying context from past projects.

There are even more demanding examples.

An epidemiology associate professor at the UCSF Brain Tumor Center uses it for molecular epidemiology studies of glioma, analyzing how thousands of small-effect germline variants combine to shape individual susceptibility.

According to Anthropic, Claude Science completed this germline analysis in about one-tenth the time, and his team independently verified the results, confirming both speed and reliability.

However, these 10x acceleration scenarios are currently limited to review writing, genomic analysis, and specific pipeline automation, and do not equate to "10x acceleration of research overall."

Simultaneously, the threshold for research credibility is being redefined.

In the past, the reliability of a study depended on peer review and whether others could reproduce it.

Reproducibility has long been a major pain point in science: code gets lost, environments change, and months later even the authors themselves can't reproduce that original chart.

Claude Science ensures every chart has traceable code and every result is linked to its environment and history. It might be the first tool to truly clear the reproducibility hurdle.

Three Players on the Same Track

In the bio-research track, the three giants are all competing, each with a different approach.

Google bets on proprietary models, OpenAI bets on the model's scientific IQ, while Anthropic bets on workflow.

Google holds proprietary models such as AlphaFold and AlphaGenome that others don't have, and takes them directly to market.

OpenAI takes a different path.

In April this year, it launched GPT-Rosalind, a cutting-edge model specifically built for biological reasoning and drug discovery.

Now it's going further, training the model's "scientific judgment."

It just launched GeneBench-Pro, a benchmark that tests whether models can make judgments like a computational biologist: 129 questions, spanning genomics, population genetics, and clinical diagnosis, that probe the feel for "does the data support this question" and "which step should be redone."

The strongest model, GPT-5.6 Sol, scored 28.7%, or 31.5% with Pro mode; GPT-5, from a few generations ago, scored less than 5%.

OpenAI itself says that at this rate, the benchmark could be topped by the end of the year.

But even the strongest models solve less than a third of the questions. The unsolved portion is precisely where human scientists are still needed.

The AI shortcomings exposed by GeneBench-Pro are also obvious:

Models can start an analysis but can't close the final loop: whether to exclude a batch of anomalous data, or how to change course when a hypothesis is overturned. These judgments still require the scientist's own decision.

Claude Science doesn't bypass this either: proposals are submitted for human review, and every decision can be revoked by a human. It automates the process, but judgment is not handed to the model; humans remain in the loop.

For scientists like Lecoq, whether a review is reproducible, whether it still holds up months later, matters more than an extra few percentage points on a leaderboard.

Claude Science is betting precisely on bringing AI-assisted research into the daily routine of the lab.

References:

https://www.anthropic.com/news/claude-science-ai-workbench

https://openai.com/index/introducing-genebench-pro/

This article is from the WeChat public account "New Zhiyuan," author: ASI Apocalypse

Related Questions

Q: What specific scientific task was dramatically accelerated by Claude Science according to the article?

A: The article describes how Claude Science enabled neuroscientist Jérôme Lecoq and his team to complete the writing of a long-form scientific review in a few weeks, a task that would have taken nearly two years previously.

Q: What is the core difference between Claude Science and other AI models for science like Google's or OpenAI's, as stated in the article?

A: The article states that while Google bets on proprietary models (e.g., AlphaFold) and OpenAI focuses on a model's research intelligence (e.g., GPT-Rosalind), Anthropic's Claude Science bets on workflow. It is positioned as an AI workbench that integrates and automates the entire scientific toolchain into an auditable pipeline, not just a smarter model.

Q: How does Claude Science address the critical challenge of reproducibility in scientific research?

A: Claude Science addresses reproducibility by packaging every generated figure with the exact code that produced it, the runtime environment, a natural language description, and the full conversation history. This creates a traceable, closed-loop workflow where results can be easily recreated and audited later.

Q: Describe the role of the 'reviewer agent' within the Claude Science system.

A: According to the article, the 'reviewer agent' is a specialized component that checks citations and computations. It identifies incorrect citations, numbers without sources, and figures that don't match the code, then flags and corrects them. This creates a system resembling 'internal AI peer review'.

Q: What limitation of current AI models in scientific research was highlighted by OpenAI's GeneBench-Pro benchmark, as mentioned in the article?

A: OpenAI's GeneBench-Pro benchmark highlighted that while models can start analyses, they struggle with the final, critical judgment calls. This includes decisions like whether to exclude a batch of outlier data or how to change the research approach when a hypothesis is disproven. These core scientific judgments still require a human scientist's decision.
