AlphaGo's Creator Puts AI into a 23-Year-Old Artificial Society: All Three Toughest Challenges for AI Agents Are Here

marsbit | Published on 2026-05-25 | Last updated on 2026-05-25

Abstract

Demis Hassabis, CEO of DeepMind, has embarked on a new AI research venture by partnering with the long-running space MMO, EVE Online. This collaboration, announced in early May, aims to use the game's 23-year-old, player-driven persistent universe as a testbed for tackling three core challenges in AI agent research: long-horizon planning, memory, and continual learning. Unlike previous DeepMind environments like AlphaGo (Go) or AlphaStar (StarCraft II), EVE Online features no fixed end state. Its single-shard universe has fostered complex, emergent player societies with real economies, political alliances, and wars that can span months or years. These conditions naturally demand the very skills—long-term strategic planning, maintaining memories over extended periods, and adapting to constant change—that are hardest for current AI agents to master. The research will initially use an offline version of EVE, providing a controlled, complex sandbox without interfering with the live player server. This move continues DeepMind's trajectory of using increasingly complex and open-ended virtual worlds for AI training, from Atari games and Go to StarCraft II and the SIMA project. The EVE environment represents a significant step towards testing AI in a persistent, socially complex, and continuously evolving world shaped by human behavior over decades.

DeepMind CEO and AlphaGo creator Demis Hassabis has been using games for AI research for over a decade.

This time, he has thrown AI into a "living universe" that has been running for 23 years: the space-themed massively multiplayer online game EVE Online, whose new-player tutorial alone is enough to scare people away.

A board game ends; EVE does not.

In early May, DeepMind officially announced a research collaboration with EVE Online for a simple reason: EVE's complex, player-driven universe is the perfect safe sandbox to test AI memory, continual learning, and long-term planning.

DeepMind's collaboration with EVE is not about pursuing fun gameplay or enhancing game mechanics. Instead, it aims to tackle the three toughest, most widely recognized challenges in current AI agent research. Hassabis is betting on finding answers in a 23-year-old game.

Fenris Creations (formerly CCP Games) announces partnership with DeepMind

On the same day, May 6th, the company behind EVE Online announced four things:

  • Regained independence from its parent company Pearl Abyss;
  • Renamed to Fenris Creations;
  • Completed a $120 million transaction;
  • As part of the deal, Google acquired a minority stake in Fenris Creations, and the company simultaneously launched a research partnership with Google DeepMind.

Fenris Creations CEO Hilmar Veigar Pétursson stated in the announcement:

This transition does not involve layoffs or restructuring. The team, products, and development plans remain unchanged. EVE continues.

Looking at operational figures, this company came to the table with "real ammunition" for collaboration, not to sell assets for survival.

EVE Online's revenue in 2025 exceeded $70 million, with November setting a revenue record and Q4 becoming the second-highest-revenue quarter in the game's more than two-decade history.

Fenris Creations' independence means EVE now has a parent company that can autonomously decide on research collaborations, no longer constrained by the strategic goals of a larger game publishing company.

The box of a board game published by Fenris in 1997. The Fenris name predates EVE Online by six years, so the rename to Fenris Creations is a return to roots, not a fresh start.

Why did DeepMind choose EVE?

A 23-Year "Artificial Society": An AI Benchmark That Is Hard to Replicate

When many people hear "games + AI research," their first thought is of AlphaGo or AlphaStar. EVE is different from both.

Go and StarCraft share a common characteristic: a match has a beginning, an end, and clear win/lose rules.

AlphaGo's goal was to win a Go game. AlphaStar's goal was to win a StarCraft match. Both represent a "single-game intelligence" research paradigm. But EVE has no endgame.
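To make the "no endgame" contrast concrete, here is a minimal Python sketch of the difference between an episodic task (a Go or StarCraft match that terminates with a win/lose signal) and a continuing task with no terminal state. The classes `EpisodicMatch` and `PersistentWorld` and their reward scheme are hypothetical toys, not DeepMind's or Fenris Creations' interfaces; the point is only that without an end state, performance has to be tracked over a sliding window instead of scored once per match.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Policy = Callable[[int], int]  # maps a time step to an action (hypothetical)


class EpisodicMatch:
    """Go/StarCraft-style toy: the match ends after `length` moves."""

    def __init__(self, length: int = 3) -> None:
        self.length = length
        self.t = 0

    def step(self, action: int) -> Tuple[float, bool]:
        self.t += 1
        done = self.t >= self.length
        # Reward arrives only at the terminal step: +1 for a win, -1 for a loss.
        reward = (1.0 if action > 0 else -1.0) if done else 0.0
        return reward, done


class PersistentWorld:
    """EVE-style toy: there is no terminal state, so `done` is always False."""

    def step(self, action: int) -> Tuple[float, bool]:
        return float(action), False


def play_match(env: EpisodicMatch, policy: Policy) -> float:
    """Run until the match ends; the summed reward is the final outcome."""
    total, done, t = 0.0, False, 0
    while not done:
        reward, done = env.step(policy(t))
        total += reward
        t += 1
    return total


def rolling_reward(env: PersistentWorld, policy: Policy,
                   steps: int, window: int) -> List[float]:
    """No final score exists, so report a moving average over recent steps."""
    recent: Deque[float] = deque(maxlen=window)
    averages: List[float] = []
    for t in range(steps):
        reward, _ = env.step(policy(t))
        recent.append(reward)
        averages.append(sum(recent) / len(recent))
    return averages
```

For instance, `play_match(EpisodicMatch(), lambda t: 1)` returns `1.0` once the match ends, while `rolling_reward(PersistentWorld(), lambda t: t % 2, steps=4, window=2)` returns `[0.0, 0.5, 0.5, 0.5]`: a stream of evaluations rather than a single verdict.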

EVE Online is famous for its "single-shard / single shared universe," where a vast number of players compete, trade, form alliances, and wage war in a persistent world over the long term.

Players here have built real economic systems, political alliances, military coalitions, trade routes, historical grudges, and warfare plans that span years.

Some campaigns take an entire year from preparation to conclusion. The rise and fall of some alliances are studied by later players as real history.

Hilmar stated in the announcement: "EVE is one of the few places where we can explore questions of intelligence in an environment that already operates like the real world."

Hassabis further explained that he has played games since childhood, that his career began with designing AI-driven simulation games, and that his work on AlphaGo, AlphaStar, and SIMA has been deeply tied to games. EVE is his choice for the next stage:

I'm thrilled to partner with Fenris Creations to safely explore new game experiences and advance AI research within this player-created, uniquely complex universe.

Most AI benchmarks are like medical checkups. EVE is more like throwing AI into an "artificial society" that has been running for 23 years.

The Three Toughest Challenges for Agents Happen to Be Daily Life for EVE Players

The official announcement explicitly lists three research directions: long-horizon planning, memory, and continual learning.

These three directions are widely acknowledged as the three toughest challenges in current AI agent research.

If you know someone who has played EVE Online for over ten years, ask them to open their account and show you their friend list. You'll likely see dozens of groups and hundreds of names, with notes in the remarks field like "Debt owed from the 2018 Delve campaign," "Traitor within Goonswarm, do not cooperate," "This guy is a spy, everyone in the corp knows."

This isn't a context window; it's cross-session long-term memory spanning at least a decade.
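The gap between a context window and cross-session memory can be sketched in a few lines of Python. In this hypothetical example, `ContextWindow` is a bounded buffer that silently drops old entries, while `NoteBook` persists notes keyed by name to a JSON file so they survive from one session to the next, much like the remarks field in a veteran player's contact list. Both classes, the file name, and the sample notes are illustrative assumptions, not a description of how DeepMind's agents store memory.

```python
import json
import os
import tempfile
from collections import deque
from typing import Deque, Dict, Optional


class ContextWindow:
    """Bounded short-term memory: once full, the oldest entry is dropped."""

    def __init__(self, size: int) -> None:
        self.items: Deque[str] = deque(maxlen=size)

    def add(self, note: str) -> None:
        self.items.append(note)


class NoteBook:
    """Hypothetical long-term memory: notes keyed by name, saved to disk
    so a later session can read what an earlier session wrote."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.notes: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.notes = json.load(f)

    def remember(self, name: str, note: str) -> None:
        self.notes[name] = note
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.notes, f)

    def recall(self, name: str) -> Optional[str]:
        return self.notes.get(name)


if __name__ == "__main__":
    window = ContextWindow(size=2)
    for note in ["ally in 2018", "owes ISK", "spy, do not trust"]:
        window.add(note)
    print(list(window.items))  # the first note has already fallen out

    path = os.path.join(tempfile.mkdtemp(), "contacts.json")
    NoteBook(path).remember("PilotX", "spy, do not trust")  # session 1
    print(NoteBook(path).recall("PilotX"))  # a fresh session still recalls it
```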

EVE players deal with the memory challenge every day. The same goes for continual learning.

In January 2014, the B-R5RB battle lasted about 21 hours, involving over 7,500 characters, the destruction of 75 Titans, with losses equivalent to roughly $300,000 in real-world currency. The trigger for the entire battle was a sovereignty bill that failed to auto-pay.

After this battle, fleet tactics across the entire game were rewritten. For years afterward, alliance fleet compositions and tactical systems revolved around post-battle analysis and iteration, with updates rolled out month by month and every failure broken down into actionable strategic changes.
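One simple way to picture "learning from every failure" is an online value update that never resets between battles. The sketch below assumes a hypothetical `DoctrineLog` that keeps an estimated win rate per fleet doctrine and nudges it after each engagement with a constant step size, so recent results keep mattering as the metagame shifts. The doctrine names and outcomes are invented, and this is a textbook incremental update, not a claim about how EVE alliances or DeepMind actually work.

```python
from typing import Dict, List


class DoctrineLog:
    """Hypothetical per-doctrine win-rate estimates, updated after every fight."""

    def __init__(self, doctrines: List[str], step_size: float = 0.2) -> None:
        # Every doctrine starts at a neutral 50% estimated win rate.
        self.value: Dict[str, float] = {d: 0.5 for d in doctrines}
        self.step_size = step_size

    def update(self, doctrine: str, won: bool) -> None:
        # Constant-step-size update: V <- V + alpha * (outcome - V).
        # Older results decay geometrically, so the estimate tracks a
        # changing environment instead of averaging over all history.
        outcome = 1.0 if won else 0.0
        self.value[doctrine] += self.step_size * (outcome - self.value[doctrine])

    def best(self) -> str:
        # Pick the doctrine with the highest current estimate.
        return max(self.value, key=lambda d: self.value[d])


if __name__ == "__main__":
    log = DoctrineLog(["armor_brawl", "shield_kite"])
    log.update("armor_brawl", won=False)  # a costly defeat
    log.update("shield_kite", won=True)
    print(log.value, log.best())
```

Because the step size is constant, no estimate ever freezes: a doctrine that dominated years ago loses weight as new defeats come in, which is the kind of ongoing adjustment described above.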

As for long-horizon planning, the standard time unit for EVE alliance warfare isn't hours; it's months. From preparation to execution, a cross-regional war involves shipbuilding, logistics, diplomacy, infiltration, and counter-espionage, with hundreds of players spontaneously collaborating without any task manager to advance a common goal over months.

This collaborative system evolved organically from the players over 23 years.

The three hardest challenges recognized in current AI agent evaluation happen to be the daily life of EVE players.

Twenty-three years of player-driven evolution in EVE have produced an environment that is always changing, always complex, with no shortcuts. This level of complexity cannot be synthetically created in a lab.

DeepMind's SIMA 2, released in November 2025, has evolved from "executing instructions" to "understanding goals, reasoning about processes, and learning while playing."

From a research question perspective, the EVE project shares the same "games as a training ground for agents" path as SIMA 2. The difference is that the venue has been swapped for a real universe that has been running for 23 years.

In-game battle scene from EVE Online. These large-scale, player-organized battles, often lasting for hours, are the core reason DeepMind chose EVE as a research environment for long-horizon planning and continual learning.

DeepMind Is Entering an Offline Sandbox, Not the Live Player Universe

DeepMind's collaboration method with Fenris is more conservative than one might imagine. DeepMind does not have direct access to the live player servers.

DeepMind officially stated in the announcement: Initial research will be conducted on an offline version of EVE Online, using local servers in a controlled environment to test and evaluate models, without connecting to EVE Online's live operational servers.

On one hand, the offline version means DeepMind will not consume live player PvP data or disrupt the actual server economy, avoiding any privacy and compliance complexities.

On the other hand, the offline version of EVE can still retain the complex rule systems, ship and economic mechanics, star system structure, and other core design elements.

What DeepMind gets is a "complex world pressure-tested by players for 23 years" to serve as the proving ground where its agents must survive.

From Atari to EVE: Where This Path Leads

Looking back at DeepMind's choice of training grounds over the past decade, there's a clear evolutionary line.

2013 to 2015: Atari was the starting point. DQN put agents into games like *Breakout* and *Space Invaders* with clear levels and closed rules. It tested reaction and value estimation.

2016 to 2017: AlphaGo and AlphaZero. Go has clean rules and a huge but closed action space. It tested search and long-chain reasoning.

2019: AlphaStar entered *StarCraft II*, DeepMind's first move into a real-time, imperfect-information adversarial environment that demands heavy multitasking. It tested decision-making under partial observability.

2024: SIMA aimed to be a generalist agent across multiple games. It tested transfer and generalization.

2025: SIMA 2 upgraded: not just executing instructions, but also conversing with users, reasoning about goals, and self-improving during gameplay.


Each generation of environment incorporates more aspects of the real world than the last: from closed rules to open rules, from perfect information to imperfect information, from single-game competition to cross-game transfer.

However, these earlier environments were still relatively closed, segmentable, and repeatable task settings. Atari offered fixed-rule arcade games; AlphaStar faced StarCraft matches that ended one at a time; SIMA tested cross-game generalization across multiple 3D virtual environments.

The difference with EVE is that it is a persistent world that has been running long-term, driven by players, with continuously evolving economic and political structures.

It has evolved organically over 23 years, shaped by real players in an open-ended world: a complete player-driven economy (ISK prices fluctuate much like real financial markets), political structures spanning alliances (diplomacy, espionage, ceasefires), and a full warfare ecosystem ranging from small skirmishes to 21-hour mega-battles.

The field's consensus on agent evaluation is increasingly clear: running one-off task benchmarks has long since stopped producing anything new, while long-term memory, planning across weeks, and learning from failure still lack decent evaluation arenas.

Therefore, DeepMind's choice this time is: rather than creating another synthetic environment, step into an "artificial society" that has already been pressure-tested by human players for 23 years.

But a bigger question then emerges:

What still separates an AI agent that can persist, learn continually, and plan within EVE from an autonomous agent operating in the real world?

References:

https://x.com/GoogleDeepMind/status/2052011542707630461

https://www.ccpgames.com/news/2026/studio-behind-eve-online-goes-independent-rebrands-as-fenris-creations-enters-research-partnership-with-google-deepmind

https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

This article is from the WeChat public account "新智元" (New Zhiyuan), author: ASI启示录 (ASI Revelation), editor: 元宇 (Yuanyu).

Related Questions

Q: Why did DeepMind choose EVE Online as a research environment for AI agents?

A: DeepMind chose EVE Online because it provides a complex, player-driven, persistent universe that has evolved over 23 years. This environment is an ideal safe sandbox for testing key challenges in AI research, specifically long-horizon planning, memory, and continual learning, which are difficult to replicate in standard, closed-ended AI benchmarks.

Q: What are the three main research challenges DeepMind aims to tackle in its EVE Online collaboration?

A: The three main research challenges are long-horizon planning, memory, and continual learning. These are considered among the hardest problems in current AI agent research, and they correspond to the everyday activities and adaptations of long-term EVE Online players.

Q: How is the DeepMind and Fenris Creations research collaboration structured in terms of access to the EVE Online game world?

A: Initial research will be conducted on an offline version of EVE Online running on local servers. DeepMind will not connect to the live, operational game servers. This approach provides a controlled environment for testing and evaluation without affecting the active player economy or raising privacy and compliance issues.

Q: According to the article, how does the complexity of EVE Online's environment differ from previous DeepMind research platforms like Atari games or StarCraft?

A: Unlike previous platforms such as Atari (closed rules, single sessions) or StarCraft (individual matches with a clear end), EVE Online is a persistent, single shared universe with no defined end. Its complexity lies not just in its rules but in the player-driven, long-term evolution of its economy, politics, and warfare, which have developed organically over 23 years.

Q: What major corporate changes happened to the studio behind EVE Online alongside the announcement of the DeepMind partnership?

A: The studio (formerly CCP Games) became independent from its parent company Pearl Abyss, rebranded as Fenris Creations, and completed a $120 million transaction. As part of the deal, Google acquired a minority stake in the company, and a research partnership with Google DeepMind was launched.
