Claude Science Completes Two Years' Work in a Few Weeks: Is 10x Research Acceleration Really Here?

marsbit | Published 2026-07-01 | Last updated 2026-07-01

Summary

Claude Science, a new AI workbench from Anthropic, is being tested by scientists, reportedly accelerating specific research workflows by up to 10x. A neuroscientist at the Allen Institute used the tool, which automates tasks like citation verification, to complete a lengthy literature review in weeks instead of nearly two years. The platform is an integrated environment for macOS and Linux that connects to local or remote computing resources. It streamlines the fragmented research process (literature analysis, computation, visualization, and drafting) into a single, auditable workflow. A key feature is its emphasis on reproducibility: every chart generated includes the exact code, environment, and history used to create it. Claude Science uses a multi-agent system. A coordinator manages over 60 pre-configured skills for life sciences (genomics, proteomics, etc.) and can spawn specialized agents. A dedicated reviewer agent checks citations and calculations for accuracy, creating a form of internal AI peer review. The system keeps a human in the loop, requiring user approval for major steps. Initial applications are in life sciences. Examples include target identification for biotech company Manifold Bio and germline variant analysis for glioma research at UCSF, completing analyses in roughly one-tenth the previous time. The approach contrasts with competitors: Google focuses on proprietary models like AlphaFold, while OpenAI is advancing models' scientific reasoning wit...

Two years' work, now completed in a few weeks.

Recently, neuroscientist Jérôme Lecoq and his team at the Allen Institute compressed the writing time of a long-form review article from nearly two years to just a few weeks.

Lecoq had a backlog of about 10 reviews, many exceeding 100 pages, and an AI agent checked every single citation sentence by sentence.

The tool helping him was Claude Science, the new application just launched by Anthropic.

On June 30, 2026, Anthropic released Claude Science, positioned as an AI workbench for scientists. (Image source: Anthropic official blog)

According to Anthropic, this task would previously have taken the scientist and his team two years.

Anthropic positions Claude Science not as just a smarter research model but as an AI workbench tailored for scientists.

Its real breakthrough is that, for the first time, it breaks scientific research down into an auditable, step-by-step pipeline.

Currently, Claude Science is in beta on macOS and Linux, available to Pro, Max, Team, and Enterprise users.

What's Really Changing is the Entire Research Toolchain

Anyone who has done research understands the drudgery:

A project requires jumping between dozens of databases, each with its own schema and query language;

File formats are all over the place, requiring custom pipelines and viewers for each;

You have a row of tools on hand: PubMed for literature, Jupyter for code, R for statistics, cluster terminals for submitting jobs...

The constant context switching leaves little time for actual scientific thinking amid all the moving, splicing, and debugging.

What Claude Science does is bundle these fragmented scenarios into a single execution environment:

Literature analysis, multi-step computation, chart polishing, manuscript drafting—all stages are completed in the same environment, so you don't interrupt your train of thought switching tools.

It can run on your local macOS or Linux machine, connect via SSH to remote machines, or attach to a high-performance computing (HPC) login node.

As with Jupyter, it goes where the data is.

It even handles computational resource scheduling.

Large tasks like protein folding or running a genomic pipeline on massive data used to require researchers to babysit: setting up jobs, queuing for cluster time, monitoring success or failure, and pulling results back—half a day gone in a flash.

Claude Science takes over this flow: it drafts a plan, asks for permission before touching new resources, lets you review or revert jobs before they are written and submitted, and scales an analysis from one GPU to hundreds.

Claude Science dispatches an 8-group scVI hyperparameter scan to a lab's A100 cluster. The Notebook on the right shares the same live kernel with the agent, with variables and state synchronized in real-time. (Image source: Anthropic official blog)

More importantly, sensitive data doesn't leave the original system; only the context truly needed for each step is sent to Claude.

Every Chart Comes with Traceable Code

Science inherently deals with visuals: protein 3D structures, genome browser tracks, chemical formulas—these are essentially diagrams.

Building on this, Claude Science, while generating charts and drafts, also outputs the code that created them and can render them natively.

The key lies in reproducibility.

Whenever Claude Science generates a chart, it "pins" the exact code that created it, along with the runtime environment, plain-language descriptions, and the full conversation history, right onto the chart.

On the left, a cell chart across 138 species; on the right, the exact code that generated it is displayed side-by-side. A one-sentence annotation is enough to have the agent modify the chart. Every result is reproducible and traceable to its code. (Image source: Anthropic official blog)

Months often pass from a paper's submission to publication; months later, when a reviewer asks you to rerun a specific chart, you can easily reproduce the entire chain of inputs, process, and results on the spot.

Want to edit the chart? Just say so: "remove gridlines," "switch the y-axis to log," and the agent directly edits the code it wrote.

You can also fork the session at any point to try two different approaches simultaneously, without disrupting the original thread.

In short, research is integrated for the first time into an auditable workflow, with code, environment, and history all placed within a closed loop.
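To make the idea of "pinning" concrete, here is a minimal sketch of what such a provenance record could look like. It is not Anthropic's format: the `FigureProvenance` class, its field names, and the `capture_environment` helper are all hypothetical, chosen only to mirror the four items the article lists (code, runtime environment, plain-language description, conversation history).

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


@dataclass
class FigureProvenance:
    """Hypothetical record attached to one generated chart."""
    code: str                                        # exact source that drew the chart
    description: str                                 # plain-language summary
    history: list = field(default_factory=list)      # conversation turns
    environment: dict = field(default_factory=dict)  # runtime details

    def fingerprint(self) -> str:
        # Hash a canonical JSON form so that any change to the code,
        # environment, description, or history yields a different ID.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def capture_environment() -> dict:
    # Interpreter and OS only; a real tool would also pin package versions.
    return {"python": sys.version.split()[0], "platform": platform.system()}


record = FigureProvenance(
    code="ax.plot(x, y); ax.set_yscale('log')",
    description="Line plot of y against x with a log-scaled y-axis.",
    history=["user: plot y against x", "user: switch the y-axis to log"],
    environment=capture_environment(),
)
print(record.fingerprint())
```

Storing the fingerprint next to the rendered image would let a later reader check that the code and environment on file are the ones that produced it.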

One Agent Writes, Another Specializes in Finding Errors

Behind Claude Science is not a single agent working alone.

You're facing a coordinating agent that manages over 60 pre-configured skills and connectors for genomics, single-cell analysis, proteomics, structural biology, and cheminformatics.

When work piles up, it can spawn additional agents for division of labor and can call upon expert agents you've created yourself.

The cleverest part is the reviewer agent.

It specializes in checking citations and calculations: it hunts down wrong citations, unsourced numbers, and charts that don't match their code, then flags them and fixes them itself.

In the Allen Institute case, the team used an actor-critic pairing: one agent responsible for writing, another specialized in evaluating its accuracy and the veracity of citations.

This structure already hints at a prototype for "AI internal peer review."
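One of the reviewer's checks, spotting unsourced numbers, can be illustrated with a deliberately simple rule. This is a toy sketch, not how Claude Science's reviewer works: the function name `flag_unsourced_numbers` is hypothetical, and it assumes citations are written as bracketed numbers such as [1].

```python
import re

CITATION = re.compile(r"\[\d+\]")        # assumed citation style: [1], [2], ...
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")  # integers, decimals, percentages


def flag_unsourced_numbers(draft: str) -> list:
    """Return sentences that state a number but carry no citation marker."""
    flagged = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        # Remove citation markers first so "[1]" is not counted as a number.
        without_citations = CITATION.sub("", sentence)
        if NUMBER.search(without_citations) and not CITATION.search(sentence):
            flagged.append(sentence)
    return flagged


draft = (
    "The cohort included 412 patients [1]. "
    "Response rates rose by 18%. "
    "Further work is needed."
)
print(flag_unsourced_numbers(draft))
```

In an actor-critic setup, the writing agent would produce `draft` and the critic would return the flagged sentences for revision; a real reviewer would also need to verify that each cited source actually supports the claim.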

But one boundary must be clear: it's human-in-the-loop throughout.

It seeks authorization before using new resources, and you can review and revoke every decision. It automates the process, not the scientific discovery itself.
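The approval boundary described here can be sketched as a gate in front of each step. The `Step` class, the `run_plan` function, and the `approve` callback are hypothetical names for illustration only; the article does not describe how Claude Science implements this.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    uses_new_resource: bool  # e.g. the first job on a not-yet-approved cluster


def run_plan(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Walk the plan in order, pausing for approval before new resources."""
    log = []
    for step in steps:
        if step.uses_new_resource and not approve(step):
            log.append(f"skipped: {step.description}")
            continue
        log.append(f"ran: {step.description}")
    return log


plan = [
    Step("load count matrix from local disk", uses_new_resource=False),
    Step("submit training job to GPU cluster", uses_new_resource=True),
]

# Stand-in for the human reviewer; here every request is declined.
log = run_plan(plan, approve=lambda step: False)
print(log)
```

Passing the decision in as a callback keeps the human's choice outside the automated loop, which is the point of the design: the agent proposes, and the person decides.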

It also integrates with NVIDIA's BioNeMo Agent Toolkit, connecting natively to life science models like Evo 2, Boltz-2, and OpenFold3.

Models, data, and pipelines your lab trusts can be saved as reusable skills, which future sessions inherit automatically.

Claude Science's First Stop is Life Sciences

Claude Science's initial focus is on life sciences.

Genomics, single-cell, proteomics, structural biology, cheminformatics—ready to use out of the box.

It can read literature and query 60+ scientific databases such as UniProt, PDB, Ensembl, ClinVar, ChEMBL, and GEO, so you no longer need to learn each disparate database one by one.

Claude Science comes with pre-configured environments for genomics, single-cell, proteomics, and cheminformatics, backed by 60+ scientific databases. (Image source: Anthropic official blog)

Manifold Bio works on tissue-targeting drugs.

They use Claude Science to nominate targets for the latest experiments, evaluating surface expression, transport, and safety for each tissue and target, then ranking candidates according to standards learned from their own proprietary data.

Manifold says ordinary programming assistants can't do this; Claude Science can complete it end-to-end, getting the right data, making the right judgments, and carrying context from past projects.

There are more demanding examples, too.

An associate professor of epidemiology at the UCSF Brain Tumor Center uses it for molecular epidemiology studies of glioma, analyzing how thousands of small-effect germline variants combine to shape individual susceptibility.

According to Anthropic, this germline analysis was completed by Claude Science in about 1/10th the time, and his team independently verified the results, confirming both speed and reliability.

However, these 10x acceleration scenarios are currently limited to review writing, genomic analysis, and specific pipeline automation, and do not equate to "10x acceleration of research overall."

Simultaneously, the threshold for research credibility is being redefined.

In the past, the reliability of a study depended on peer review and whether others could reproduce it.

Reproducibility has long been a major pain point in science—code gets lost, environments change, months later even the authors themselves can't reproduce that original chart.

Claude Science ensures that every chart has traceable code and that every result is linked to its environment and history. It might be the first to truly cross the reproducibility hurdle.

Three Players on the Same Track

In the bio-research track, the three giants are all competing, each with a different approach.

Google bets on proprietary models, OpenAI bets on the model's scientific IQ, while Anthropic bets on workflow.

Google holds exclusive models like AlphaFold and AlphaGenome that others don't have, going directly to market.

OpenAI takes a different path.

In April this year, it launched GPT-Rosalind, a cutting-edge model specifically built for biological reasoning and drug discovery.

Now it's going further, training the model's "scientific judgment."

It just launched GeneBench-Pro, which tests whether models can make judgments like a computational biologist: 129 questions spanning genomics, population genetics, and clinical diagnosis, all probing the feel for "does the data support this question" and "which step should be redone."

The strongest model, GPT-5.6 Sol, scored 28.7% (31.5% with Pro mode); GPT-5, from a few generations ago, scored less than 5%.

OpenAI itself says that at this rate, the benchmark could be topped by the end of the year.

But even the strongest models solve less than a third of the questions. The unsolved portion is precisely where human scientists remain.

The AI shortcomings exposed by GeneBench-Pro are also obvious:

Models can start an analysis but can't close the final loop: whether to exclude a batch of anomalous data, or how to change course when a hypothesis is overturned. These judgments still require the scientist's own decision.

Claude Science doesn't bypass this either: proposals are submitted for human review, and every decision can be revoked by a human. It automates the process without handing judgment to the model; humans remain in the loop.

For scientists like Lecoq, whether a review is reproducible, whether it still holds up months later, matters more than an extra few percentage points on a leaderboard.

Claude Science is betting precisely on making AI-assisted research part of the lab's daily routine.

References:

https://www.anthropic.com/news/claude-science-ai-workbench

https://openai.com/index/introducing-genebench-pro/

This article is from the WeChat public account "New Zhiyuan," author: ASI Apocalypse

Related Questions

Q: What specific scientific task was dramatically accelerated by Claude Science according to the article?

A: The article describes how Claude Science enabled neuroscientist Jérôme Lecoq and his team to complete the writing of a long-form scientific review in a few weeks, a task that would have taken nearly two years previously.

Q: What is the core difference between Claude Science and other AI models for science like Google's or OpenAI's, as stated in the article?

A: The article states that while Google bets on proprietary models (e.g., AlphaFold) and OpenAI focuses on a model's research intelligence (e.g., GPT-Rosalind), Anthropic's Claude Science bets on workflow. It is positioned as an AI workbench that integrates and automates the entire scientific toolchain into an auditable pipeline, not just a smarter model.

Q: How does Claude Science address the critical challenge of reproducibility in scientific research?

A: Claude Science addresses reproducibility by packaging every generated figure with the exact code that produced it, the runtime environment, a natural language description, and the full conversation history. This creates a traceable, closed-loop workflow where results can be easily recreated and audited later.

Q: Describe the role of the 'reviewer agent' within the Claude Science system.

A: According to the article, the 'reviewer agent' is a specialized component that checks citations and computations. It identifies incorrect citations, numbers without sources, and figures that don't match the code, then flags and corrects them. This creates a system resembling 'internal AI peer review'.

Q: What limitation of current AI models in scientific research was highlighted by OpenAI's GeneBench-Pro benchmark, as mentioned in the article?

A: OpenAI's GeneBench-Pro benchmark highlighted that while models can start analyses, they struggle with the final, critical judgment calls. This includes decisions like whether to exclude a batch of outlier data or how to change the research approach when a hypothesis is disproven. These core scientific judgments still require a human scientist's decision.
