Top AI Models Could Not Handle 1990s Video Games

cryptonews.ru | Published on 2025-03-21 | Last updated on 2025-04-21

Even the most advanced AI models cannot play the classic first-person shooter Doom effectively. Experts reached this conclusion after testing neural networks on the new VideoGameBench benchmark.

Claude can play Pokemon, but can it play DOOM?

With a simple agent, we let VLMs play it, and found Sonnet 3.7 to get the furthest, finding the blue room!

Our VideoGameBench (twenty games from the 90s) and agent are open source so you can try it yourself now —> 🧵 pic.twitter.com/vl9NNZPBHY

— Alex Zhang (@a1zhang) April 17, 2025

The test is designed to check whether modern neural networks can play and win 20 popular video games. The models may use only the information shown on the screen.

"Modern VLMs struggle with video games because of high inference latency. When an agent takes a screenshot and asks the VLM what action it should take, by the time the response arrives the game state has changed significantly and the action is no longer relevant," the researchers noted.
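The latency problem the researchers describe can be illustrated with a little arithmetic. The sketch below is not part of VideoGameBench. The frame rate, model latency, and enemy speed are assumed values chosen only for illustration, and the `Enemy` class and both helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Enemy:
    x: float      # position when the screenshot is taken (arbitrary units)
    speed: float  # units per second (assumed value, for illustration only)


def stale_frames(latency_s: float, fps: float) -> int:
    """Number of frames the game advances while the agent waits for the model."""
    return round(latency_s * fps)


def aim_error(enemy: Enemy, latency_s: float) -> float:
    """Gap between where the enemy was in the screenshot and where it is
    when the model's action finally reaches the game."""
    return abs(enemy.speed * latency_s)


if __name__ == "__main__":
    fps = 35.0      # assumed game update rate
    latency = 3.0   # assumed model response time in seconds
    enemy = Enemy(x=0.0, speed=2.0)

    print(stale_frames(latency, fps))  # 105 frames have passed
    print(aim_error(enemy, latency))   # the enemy has moved 6.0 units
```

Under these assumed numbers the model's decision is based on a picture that is more than a hundred frames old, which is why an action that was sensible at capture time can be useless on arrival.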

The test uses classic games from the 1990s because of their simple visuals and varied input styles, such as mouse, keyboard, and game controller. This approach makes it possible to test a model's spatial reasoning and "vision."

VideoGameBench was developed by scientist and AI researcher Alex Zhang. The benchmark includes Warcraft II, Age of Empires, Prince of Persia, and other games.

List of games in the VideoGameBench benchmark. Source: vgbench website.

Sonnet 3.7 handled Doom better than the other models: it found the blue room.

The researchers stressed that reaction latency is the main problem in first-person shooters. In a fast-changing environment, an enemy can move, or even reach the player, before the model reacts to what is happening.

Beyond their problems with understanding the game environment, the models also could not perform basic actions.

"We often observed cases where the agent could not understand how its actions, such as moving right, would appear on screen. The most common error across all the frontier models we tested was an inability to reliably control the mouse in games such as Civilization and Warcraft II, where precise and frequent movements are very important," the experts noted.

The models also do not always understand game mechanics when there is no direct instruction about the required actions.

As a reminder, in February the AI startup Anthropic introduced its "most intelligent model," Claude 3.7 Sonnet, which demonstrated that it could play Pokemon.
