AlphaGo's Creator Puts AI into a 23-Year-Old Artificial Society: All Three Toughest Challenges for AI Agents Are Here

marsbit · Published 2026-05-25 · Updated 2026-05-25

Introduction

Demis Hassabis, CEO of DeepMind, has embarked on a new AI research venture by partnering with the long-running space MMO EVE Online. The collaboration, announced in early May, aims to use the game's 23-year-old, player-driven persistent universe as a testbed for three core challenges in AI agent research: long-horizon planning, memory, and continual learning. Unlike the games behind earlier DeepMind systems such as AlphaGo (Go) and AlphaStar (StarCraft II), EVE Online has no fixed end state. Its single-shard universe has fostered complex, emergent player societies with real economies, political alliances, and wars that can span months or years. These conditions naturally demand the very skills that are hardest for current AI agents to master: long-term strategic planning, maintaining memories over extended periods, and adapting to constant change. The research will initially use an offline version of EVE, providing a controlled, complex sandbox without interfering with the live player server. This move continues DeepMind's trajectory of using increasingly complex and open-ended virtual worlds for AI training, from Atari games and Go to StarCraft II and the SIMA project. The EVE environment represents a significant step towards testing AI in a persistent, socially complex, and continuously evolving world shaped by human behavior over decades.

DeepMind CEO and AlphaGo creator Demis Hassabis has been using games for AI research for over a decade.

This time, he has thrown AI into a "living universe" that has been running for 23 years: the space-themed massively multiplayer online game EVE Online, a game whose new player tutorial alone can deter players.

Chess games have an end, but EVE does not.

In early May, DeepMind officially announced a research collaboration with EVE Online for a simple reason: EVE's complex, player-driven universe is the perfect safe sandbox to test AI memory, continual learning, and long-term planning.

DeepMind's collaboration with EVE is not about pursuing fun gameplay or enhancing game mechanics. Instead, it aims to tackle the three toughest, most widely recognized challenges in current AI agent research. Hassabis is betting on finding answers in a 23-year-old game.

Fenris Creations (formerly CCP Games) announces partnership with DeepMind

On the same day, May 6th, the company behind EVE Online announced four things:

Regained independence from its parent company Pearl Abyss;
Renamed to Fenris Creations;
Completed a $120 million transaction;
As part of this independence, Google acquired a minority stake in Fenris Creations and simultaneously initiated a research partnership with Google DeepMind.

Fenris Creations CEO Hilmar Veigar Pétursson stated in the announcement:

This transition does not involve layoffs or restructuring. The team, products, and development plans remain unchanged. EVE continues.

Judging by its operating figures, the company came to the table with "real ammunition" for a collaboration, not as a seller offloading assets to survive.

EVE Online's revenue in 2025 exceeded $70 million, with November setting a historical revenue record, and Q4 becoming the second-highest revenue quarter in the game's 20-year history.

Fenris Creations' independence means EVE now has a parent company that can autonomously decide on research collaborations, no longer constrained by the strategic goals of a larger game publishing company.

A box of a board game product published by Fenris in 1997. The name "Fenris" predates EVE Online by 6 years. Renaming to Fenris Creations is a look back, not a fresh start.

Why did DeepMind choose EVE?

A 23-Year "Artificial Society"

An AI Benchmark Difficult to Replicate

When many people hear "games + AI research," their first thought is of AlphaGo or AlphaStar. EVE is different from both.

Go and StarCraft share a common characteristic: a match has a beginning, an end, and clear win/lose rules.

AlphaGo's goal was to win a Go game. AlphaStar's goal was to win a StarCraft match. Both represent a "single-game intelligence" research paradigm. But EVE has no endgame.

EVE Online is famous for its "single-shard / single shared universe," where a vast number of players compete, trade, form alliances, and wage war in a persistent world over the long term.

Players here have built real economic systems, political alliances, military coalitions, trade routes, historical grudges, and warfare plans that span years.

Some campaigns take an entire year from preparation to conclusion. The rise and fall of some alliances are studied by later players as real history.

Hilmar stated in the announcement: "EVE is one of the few places where we can explore questions of intelligence in an environment that already operates like the real world."

Hassabis further explained that he has played games since childhood, his career started with designing AI simulation games, and his work on AlphaGo, AlphaStar, and SIMA has been deeply tied to games. EVE is the choice for the next stage:

I'm thrilled to partner with Fenris Creations to safely explore new game experiences and advance AI research within this player-created, uniquely complex universe.

Most AI benchmarks are like medical checkups. EVE is more like throwing AI into an "artificial society" that has been running for 23 years.

The Three Toughest Challenges for Agents

Happen to be Daily Life for EVE Players

The official announcement explicitly lists three research directions: long-horizon planning, memory, and continual learning.

These three directions are widely acknowledged as the three toughest challenges in current AI agent research.

If you know someone who has played EVE Online for over ten years, ask them to open their account and show you their friend list. You'll likely see dozens of groups and hundreds of names, with notes in the remarks field like "Debt owed from the 2018 Delve campaign," "Traitor within Goonswarm, do not cooperate," "This guy is a spy, everyone in the corp knows."

This isn't a context window; it's cross-session long-term memory spanning at least a decade.

EVE players navigate the memory challenge every day. The continual learning challenge is the same.

In January 2014, the B-R5RB battle lasted about 21 hours, involving over 7,500 characters, the destruction of 75 Titans, with losses equivalent to roughly $300,000 in real-world currency. The trigger for the entire battle was a sovereignty bill that failed to auto-pay.

After this battle, the entire game's fleet tactics were rewritten. Alliance fleet compositions and tactical systems for years after revolved around post-battle analysis and iteration. Updates were made monthly, with every failure broken down into actionable strategic updates.

As for long-horizon planning, the standard time unit for EVE alliance warfare isn't hours; it's months. From preparation to execution, a cross-regional war involves shipbuilding, logistics, diplomacy, infiltration, and counter-espionage, with hundreds of players spontaneously collaborating without any task manager to advance a common goal over months.

This collaborative system evolved organically from the players over 23 years.

The three hardest challenges recognized in current AI agent evaluation happen to be the daily life of EVE players.

Twenty-three years of player-driven evolution in EVE have produced an environment that is always changing, always complex, with no shortcuts. This level of complexity cannot be synthetically created in a lab.

DeepMind's SIMA 2, released in November 2025, has evolved from "executing instructions" to "understanding goals, reasoning about processes, and learning while playing."

From a research question perspective, the EVE project shares the same "games as a training ground for agents" path as SIMA 2. The difference is that the venue has been swapped for a real universe that has been running for 23 years.

In-game battle scene from EVE Online. These large-scale, player-organized battles, often lasting for hours, are the core reason DeepMind chose EVE as a research environment for long-horizon planning and continual learning.

DeepMind is Entering an Offline Sandbox

Not the Live Player Universe

DeepMind's collaboration method with Fenris is more conservative than one might imagine. DeepMind does not have direct access to the live player servers.

DeepMind officially stated in the announcement: Initial research will be conducted on an offline version of EVE Online, using local servers in a controlled environment to test and evaluate models, without connecting to EVE Online's live operational servers.

On one hand, the offline version means DeepMind will not consume live player PvP data or disrupt the actual server economy, avoiding any privacy and compliance complexities.

On the other hand, the offline version of EVE can still retain the complex rule systems, ship and economic mechanics, star system structure, and other core design elements.

DeepMind is getting a "complex world pressure-tested by players for 23 years" as the examination hall where its agents must survive.

From Atari to EVE

Where This Path Leads

Looking back at DeepMind's choice of training grounds over the past decade, there's a clear evolutionary line.

2013 to 2015: Atari was the starting point. DQN put agents into games like *Breakout* and *Space Invaders* with clear levels and closed rules. It tested reaction and value estimation.

2016 to 2017: AlphaGo and AlphaZero. Go has tidy rules and a huge but closed action space. It tested search and long-chain reasoning.

2019: AlphaStar entered *StarCraft II*, DeepMind's first entry into a real-time, imperfect-information, multi-threaded strategic game environment. It tested decision-making under partial observability.

2024: SIMA aimed to be a generalist agent across multiple games. It tested transfer and generalization.

2025: SIMA 2 upgraded: not just executing instructions, but also conversing with users, reasoning about goals, and self-improving during gameplay.

Each generation of environment incorporates more aspects of the "real world" than the last: from closed rules to open rules, from perfect information to imperfect information, from single-game competition to cross-game transfer.

However, these previous environments were still relatively closed, segmentable, and repeatable task settings. Atari offered fixed-rule arcade games; AlphaStar faced discrete StarCraft matches, each with a clear end; SIMA tested cross-game generalization across multiple 3D virtual environments.

The difference with EVE is that it is a persistent world that has been running long-term, driven by players, with continuously evolving economic and political structures.

Over 23 years, real players in this open-ruled world have organically built a complete player-driven economy (with ISK price fluctuations comparable to real financial markets), political structures that span alliances (diplomacy, espionage, ceasefires), and a whole warfare ecosystem ranging from small skirmishes to 21-hour mega-battles.

The field's consensus on agent evaluation is increasingly clear: running single-task benchmarks has long since stopped yielding anything new, while long-term memory, planning across weeks, and learning from failure still lack decent evaluation arenas.

Therefore, DeepMind's choice this time is: rather than creating another synthetic environment, step into an "artificial society" that has already been pressure-tested by human players for 23 years.

But a bigger question then emerges:

An AI agent that can persist, continually learn, and plan within EVE—what is still missing between it and an autonomous agent operating in the real world?

References:

https://x.com/GoogleDeepMind/status/2052011542707630461

https://www.ccpgames.com/news/2026/studio-behind-eve-online-goes-independent-rebrands-as-fenris-creations-enters-research-partnership-with-google-deepmind

https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

This article is from the WeChat public account "新智元" (New Zhiyuan), author: ASI启示录 (ASI Revelation), editor: 元宇 (Yuanyu).

Related Questions

Q: Why did DeepMind choose EVE Online as a research environment for AI agents?

A: DeepMind chose EVE Online because it provides a complex, player-driven, persistent universe that has evolved over 23 years. This environment is a safe sandbox for testing key challenges in AI research, specifically long-horizon planning, memory, and continual learning, which are difficult to replicate in standard, closed-ended AI benchmarks.

Q: What are the three main research challenges DeepMind aims to tackle in its EVE Online collaboration?

A: The three main research challenges are long-horizon planning, memory, and continual learning. These are considered among the hardest problems in current AI agent research, and they correspond to the everyday activities and adaptations of long-term EVE Online players.

Q: How is the DeepMind and Fenris Creations research collaboration structured in terms of accessing the EVE Online game world?

A: The initial research will be conducted in an offline version of EVE Online on local servers. DeepMind will not connect to the live, operational game servers. This approach provides a controlled environment for testing and evaluation without impacting the active player economy or raising privacy and compliance issues.

Q: According to the article, how does the complexity of EVE Online's environment differ from previous DeepMind research platforms like Atari games or StarCraft?

A: Unlike previous platforms such as Atari (closed rules, single sessions) or StarCraft (individual matches with a clear end), EVE Online is a persistent, single-shard shared universe without a defined end. Its complexity lies not just in its rules but in the player-driven, long-term evolution of its economy, politics, and warfare, which have developed organically over 23 years.

Q: What major corporate changes happened to the studio behind EVE Online alongside the announcement of the DeepMind partnership?

A: The studio (formerly CCP Games) became independent from its parent company Pearl Abyss, rebranded as Fenris Creations, and completed a $120 million transaction. As part of the deal, Google acquired a minority stake in the new company and initiated the research partnership with Google DeepMind.
