Biology's Paradigm Shift: Zuckerberg's New Open-Source Model Completely Overturns Google's AlphaFold Throne

marsbit · Published 2026-05-29 · Updated 2026-05-29

Introduction

The AlphaFold era faces a major challenge. ESMFold2, a new open-source AI model from Biohub, the research institution founded by Meta CEO Mark Zuckerberg and his wife, has been released alongside a database of 1.1 billion predicted protein structures, about 800 million more than the AlphaFold database holds. According to Nature, the model is reported to outperform AlphaFold3 in key areas, particularly in predicting protein complexes. Crucially, it is fully open-source with no commercial restrictions. ESMFold2 takes a different technical approach: it builds on a protein language model trained on billions of sequences, including microbial data from environments such as soil and the ocean, which are less well covered by AlphaFold. The team validated its utility by designing novel functional proteins and successfully synthesizing them in the lab. The decision to open-source everything is seen as a strategic move, similar to Meta's approach with its Llama models, aimed at building an ecosystem and accelerating global research. While scientists welcome the resource, some urge caution, noting the need for independent validation of the predictions and questioning the model's performance on entirely novel protein folds. The development signals intensified competition in protein AI, a field now evolving as rapidly as large language models, and marks a significant step forward in using AI to decode and engineer the machinery of life.

The AlphaFold throne is in peril!

Nature reports that the Zuckerberg-backed Biohub has dropped a bombshell: 1.1 billion protein structure predictions released in one go, some 800 million more than the AlphaFold database holds.

The underlying AI model, ESMFold2, is claimed to comprehensively outperform AlphaFold3.

More crucially, it is completely open-source and unrestricted for commercial use.

https://www.nature.com/articles/d41586-026-01686-3

Google DeepMind's hard-earned dominance in protein AI over the years is being shaken by an open-source disruptor.

The landscape of the protein AI race may be rewritten.

1.1 Billion Protein Structures, Served All at Once

On May 27th, the biomedical institution Biohub, founded by Mark Zuckerberg and his wife, officially launched the ESM Atlas protein structure database.

1.1 billion predicted protein structures, plus 6.8 billion protein sequence entries.

AlphaFold's database has accumulated over 200 million structure predictions. ESM Atlas arrives with 800 million more.
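The size comparison can be checked with quick arithmetic. The sketch below uses only the two figures quoted above and treats "over 200 million" as a lower bound of exactly 200 million, which is an assumption; the constant and function names are illustrative.

```python
# Figures as quoted in the article. The AlphaFold count is given only as
# "over 200 million", so 200 million is used here as an assumed lower bound.
ESM_ATLAS_STRUCTURES = 1_100_000_000
ALPHAFOLD_STRUCTURES_MIN = 200_000_000


def compare(new: int, old: int) -> tuple[int, float]:
    """Return (absolute gap, size ratio) between two database sizes."""
    return new - old, new / old


gap, ratio = compare(ESM_ATLAS_STRUCTURES, ALPHAFOLD_STRUCTURES_MIN)

# Because the AlphaFold figure is a lower bound, both results are upper bounds.
print(f"gap <= {gap:,} structures, ratio <= {ratio:.1f}x")
```

Since the true AlphaFold count is somewhat above 200 million, the real gap falls a little below 900 million, which is in line with the "800 million more" quoted in the article.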

The AI model generating these predictions is called ESMFold2, developed by a team led by Biohub's scientific lead, Alex Rives.

Rives stated:

This atlas reveals the full picture of protein biology, especially its most unknown parts.

Why is protein structure prediction important?

Proteins are the core components driving life. Knowing a protein's shape helps us understand its function, which in turn makes it possible to design new drugs and fight disease.

AlphaFold's creators shared the 2024 Nobel Prize in Chemistry for this work, a landmark case of AI transforming science.

Now a new model has arrived with a dataset roughly five times the size.

As an AI Model, Where Does ESMFold2 Excel?

ESMFold2 takes a different technical path than AlphaFold.

It's built upon a "protein language model" released in 2024. Its core idea borrows from NLP practices, treating protein sequences as "language" to understand. Trained on tens of billions of protein data points, the model learns to predict 3D structure directly from sequence.

Practitioners elsewhere in AI should find this familiar: it is the same logic large language models use to learn human language.
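To make the "sequence as language" idea concrete, here is a minimal sketch of how a protein sequence can be turned into tokens and partially masked, so that a model can be trained to recover the hidden residues. Masked-token prediction is a common training objective for protein language models, but the article does not describe ESMFold2's tokenizer or objective, so the vocabulary, special tokens, 15% masking rate, and function names below are all assumptions made for illustration.

```python
import random

# The 20 standard amino acids, one letter each. The special tokens and the
# id assignment are illustrative, not taken from any specific model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK, PAD = "<mask>", "<pad>"
VOCAB = {tok: i for i, tok in enumerate([PAD, MASK, *AMINO_ACIDS])}


def tokenize(sequence: str) -> list[int]:
    """Map each residue letter to an integer id, one token per residue."""
    return [VOCAB[residue] for residue in sequence.upper()]


def mask_tokens(ids: list[int], rate: float = 0.15, seed: int = 0):
    """Hide a random subset of residues.

    Returns the masked id list and a {position: true id} dict that a model
    would be trained to predict from the surrounding context.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(ids) * rate))
    positions = rng.sample(range(len(ids)), n_masked)
    targets = {p: ids[p] for p in positions}
    masked = [VOCAB[MASK] if p in targets else t for p, t in enumerate(ids)]
    return masked, targets


sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
ids = tokenize(sequence)
masked, targets = mask_tokens(ids)
print(len(ids), "tokens,", len(targets), "masked")
```

A real model would feed `masked` through a neural network and be scored on how well it recovers `targets`; the structure-prediction step described in the article sits on top of representations learned this way.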

The coverage of training data is a key variable.

ESMFold2 incorporates a vast amount of microbial protein data from environments like soil and oceans, which is a gap in AlphaFold's database.

With broader coverage, the model's understanding of the "protein world" is more complete.

The Biohub team claims ESMFold2 outperforms AlphaFold3 in predicting the structures of protein-protein complexes.

The most convincing evidence, however, is not the benchmarks but the real-world validation.

The team used ESMFold2 to design novel proteins, synthesized them in the lab for testing, and a high proportion of the designs functioned as intended.

Running the full pipeline, from "prediction" to "design" to "verification," carries the model's value from papers into the real world.

Completely Open-Source: This Is the Real Killer App

ESMFold2's sharpest competitive weapon is being fully open-source with no commercial restrictions.

The strategic significance of this choice becomes clearer when viewed within the context of the entire AI industry.

While AlphaFold has an open database, AlphaFold3 was released with restrictions on commercial use.

Isomorphic Labs, the Alphabet company spun out of Google DeepMind, went a step further this year, keeping its protein interaction prediction model entirely closed-source.

MIT computational biologist Ovchinnikov directly pointed out the value of open-source: "I expect many people will be excited to try ESMFold2."

The leverage effect of open-source AI has been fully demonstrated in the large language model arena, with Meta's Llama series being the prime example.

A sufficiently powerful open-source model can mobilize the global community to iterate, apply, and discover uses the original developers never imagined.

The situation in protein AI is even more distinctive. Countless labs and research institutions around the world urgently need a free, unrestricted structure prediction tool. No matter how powerful a closed-source model is, the user base it can reach is limited.

Biohub's choice to go fully open-source aligns with Meta's strategy in large language models.

The strategy behind Zuckerberg-affiliated AI efforts is becoming increasingly clear: open source as infrastructure, and an ecosystem as the moat.

Do Fellow Experts Buy It?

The academic response is positive, but reservations are also clear.

Gemma Atkinson from Lund University, Sweden, said ESM Atlas "should be a phenomenal resource for biology."

Christine Orengo from University College London acknowledged its value but emphasized that the predictions need independent verification.

A sharper question came from Martin Steinegger of Seoul National University.

He is concerned about ESMFold2's performance when facing "novel structures" that differ significantly from known proteins.

His team previously found that the first version of ESMFold was not stellar in this regard. This question remains unresolved for ESMFold2.

MIT's Ovchinnikov offered the most measured judgment, suggesting that ESM Atlas might be better positioned as a supplement to the AlphaFold database.

He also noted that Isomorphic Labs' closed-source model and some other open-source models not directly compared by Biohub have achieved similar levels of results.

The lead of ESMFold2 might not be as large as the paper suggests.

This caution reflects just how white-hot competition in protein AI has become.

Open-source, closed-source, academic, commercial—models of all stripes are iterating at an extremely fast pace.

Today's "strongest" might be surpassed in six months. This pace is already very similar to the arms race in the large language model field.

When AI Begins to Read Life's Source Code

In the past, determining a protein's 3D structure could take months to years of lab work.

AlphaFold first proved AI could do it in minutes.

Now, ESMFold2 pushes the prediction scale to 1.1 billion, covering a vast number of proteins never before characterized.

Extrapolating along this path, once AI can accurately predict all protein structures and design novel functional proteins that hold up in experiments, AGI's arrival in the life sciences may be closer than most anticipate.

If and when ASI truly arrives, biology would no longer be a discipline it needs to "study," but a system that can be "engineered."

Designing life at the molecular level, customizing proteins on demand, rewriting the rules of evolution.

It sounds like science fiction, but tools like ESMFold2 are gradually turning "science fiction" into an "engineering problem."

Today, 1.1 billion protein structures are laid out on the table, free for any scientist worldwide with an internet connection to use.

This means AI's ability to understand life has reached another level.

Reference: https://www.nature.com/articles/d41586-026-01686-3

This article is from the WeChat public account "New Zhiyuan" (新智元), author: ASI启示录; editor: Marco

Related Questions

Q: What major protein structure database was recently released by Biohub, and how does its size compare to the AlphaFold database?

A: Biohub recently released the ESM Atlas protein structure database, which contains 1.1 billion predicted protein structures. This is approximately 800 million more entries than the AlphaFold database, which has over 200 million predictions.

Q: What is the key technical difference between the ESMFold2 and AlphaFold approaches?

A: ESMFold2 is based on a "protein language model" approach. It treats protein sequences as a "language" to be understood and is trained on billions of protein data points to predict 3D structure directly from sequence. AlphaFold uses a different methodology that integrates multiple sequence alignments and physical constraints.

Q: What is stated as the biggest competitive advantage of the ESMFold2 model compared to recent models like AlphaFold3?

A: The biggest competitive advantage of ESMFold2 is that it is completely open-source with no restrictions on commercial use, unlike AlphaFold3, which was released with usage restrictions.

Q: According to the article, what is one significant area of concern raised by researchers like Martin Steinegger about the new ESMFold2 model?

A: Researchers like Martin Steinegger raised concerns about ESMFold2's performance when predicting the structure of "novel" proteins that are very different from known proteins, noting that the first version of ESMFold was not strong in this area.

Q: What strategic AI principle, common to Meta's approach with LLMs, does the article suggest Biohub's open-sourcing of ESMFold2 follows?

A: The article suggests Biohub's strategy follows the same principle as Meta's with its Llama models: using open-source software as infrastructure and building an ecosystem as a competitive moat.
