# Related Articles on Continual Learning

The HTX News Center provides the latest articles and in-depth analysis on "Continual Learning", covering market trends, project updates, technological developments, and regulatory policy in the crypto industry.

AlphaGo's Creator Puts AI into a 23-Year-Old Artificial Society: All Three Toughest Challenges for AI Agents Are Here

Demis Hassabis, CEO of DeepMind, has launched a new AI research effort in partnership with the long-running space MMO EVE Online. The collaboration, announced in early May, will use the game's 23-year-old, player-driven persistent universe as a testbed for three core challenges in AI agent research: long-horizon planning, memory, and continual learning. Unlike the games behind earlier DeepMind systems such as AlphaGo (Go) and AlphaStar (StarCraft II), EVE Online has no fixed end state. Its single-shard universe has fostered complex, emergent player societies with real economies, political alliances, and wars that can span months or years. These conditions demand exactly the skills that current AI agents find hardest to master: long-term strategic planning, retaining memories over extended periods, and adapting to constant change.

The research will initially use an offline version of EVE, which provides a controlled but complex sandbox without interfering with the live player server. The move continues DeepMind's trajectory of training AI in increasingly complex and open-ended virtual worlds, from Atari games and Go to StarCraft II and the SIMA project. The EVE environment is a significant step toward testing AI in a persistent, socially complex, and continuously evolving world shaped by human behavior over decades.

marsbit, 05/25 00:08

OpenAI Post-Training Engineer Weng Jiayi Proposes a New Paradigm Hypothesis for Agentic AI

OpenAI engineer Weng Jiayi's "Heuristic Learning" experiments propose a new paradigm for agentic AI: intelligent agents can improve not only by training neural networks but also by autonomously writing and refining code based on environmental feedback. In the experiment, a coding agent (powered by Codex) was tasked with developing and maintaining a programmatic strategy for the Atari game Breakout. Starting from a basic prompt, the agent iteratively wrote code, ran the game, analyzed logs and video replays to identify failures, and then modified the code. Through this "code-run-debug-update" engineering loop, it evolved a pure Python heuristic strategy that achieved a perfect score of 864 in Breakout and performed competitively with deep reinforcement learning (RL) algorithms on MuJoCo control tasks such as Ant and HalfCheetah.

This approach, termed Heuristic Learning (HL), contrasts with deep RL. In HL, experience is captured in a software system of readable, modifiable code, tests, logs, and configurations, rather than being encoded solely in opaque neural network weights. This offers potential advantages in explainability, auditability for safety-critical applications, easier integration of regression tests to combat catastrophic forgetting, and more sample-efficient early learning, as demonstrated in broader tests on 57 Atari games.

However, the blog acknowledges clear limitations. Programmatic strategies struggle with tasks that require long-horizon planning or complex perception (e.g., Montezuma's Revenge), areas where neural networks excel. The future vision is a hybrid architecture: specialized neural networks for fast perception (System 1), HL systems for rules, safety, and local recovery (also System 1), and LLM agents that provide high-level feedback and learn from the HL system's data (System 2).

The core proposition is that in the era of capable coding agents, a significant portion of an AI's learned experience could be maintained as an auditable, evolving software system.
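The "code-run-debug-update" loop and the role of regression tests can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the setup from the experiments: the one-dimensional paddle game, the parameter names `max_step` and `dead_zone`, and the fixed list of proposed edits (standing in for a coding agent) are all hypothetical.

```python
"""Toy code-run-debug-update loop with a regression gate.

Hypothetical throughout: the paddle game, the parameter names, and the fixed
list of proposals stand in for a real game and a real coding agent.
"""
from typing import Dict, List, Tuple

Params = Dict[str, int]
Regression = Tuple[List[int], int]  # (ball path, minimum hits required)
LogEntry = Tuple[Params, int, bool, bool]  # (edit, score, regressions ok, accepted)


def run_episode(params: Params, ball_path: List[int]) -> int:
    """Run one episode and count the ticks where the paddle is within 1 cell."""
    paddle = 0
    hits = 0
    for ball in ball_path:
        diff = ball - paddle
        if abs(diff) > params["dead_zone"]:
            # Move toward the ball, at most max_step cells per tick.
            paddle += max(-params["max_step"], min(params["max_step"], diff))
        if abs(ball - paddle) <= 1:
            hits += 1
    return hits


def passes_regressions(params: Params, regressions: List[Regression]) -> bool:
    """A candidate must keep every previously solved case solved."""
    return all(run_episode(params, path) >= need for path, need in regressions)


def improve(
    params: Params,
    proposals: List[Params],
    train_path: List[int],
    regressions: List[Regression],
) -> Tuple[Params, int, List[LogEntry]]:
    """Try each proposed edit; keep it only if it scores higher and breaks nothing."""
    best = dict(params)
    best_score = run_episode(best, train_path)
    log: List[LogEntry] = []
    for edit in proposals:
        candidate = {**best, **edit}
        score = run_episode(candidate, train_path)
        ok = passes_regressions(candidate, regressions)
        accepted = ok and score > best_score
        log.append((edit, score, ok, accepted))
        if accepted:
            best, best_score = candidate, score
    return best, best_score, log


TRAIN_PATH = [0, 3, 6, 9, 6, 3, 0, 3, 6, 9]   # fast-moving ball
REGRESSIONS = [([0, 2, 0, 2, 0, 2], 6)]       # a case the baseline already solves
PROPOSALS = [{"max_step": 3, "dead_zone": 2}, {"max_step": 3}, {"dead_zone": 5}]

best, best_score, log = improve(
    {"max_step": 1, "dead_zone": 0}, PROPOSALS, TRAIN_PATH, REGRESSIONS
)
for edit, score, ok, accepted in log:
    print(edit, "score:", score, "regressions ok:", ok, "accepted:", accepted)
print("final strategy:", best, "score:", best_score)
```

In this toy run the first proposal reaches the top training score but is rejected because it fails the regression case, which is the kind of safeguard against forgetting that the summary attributes to keeping experience in code and tests.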

marsbit, 05/11 00:17

a16z: AI's 'Amnesia', Can Continual Learning Cure It?

This a16z article explores the limitations of current large language models (LLMs), which, like the protagonist of the film *Memento*, are trapped in a perpetual present, unable to form new memories after training. Methods such as in-context learning (ICL), retrieval-augmented generation (RAG), and external scaffolding (e.g., chat history, prompts) provide temporary workarounds, but they do not let a model truly internalize new knowledge. The authors argue that compression, the core of learning during training, halts at deployment, which prevents models from generalizing, discovering novel solutions (e.g., mathematical proofs), or handling adversarial scenarios.

The piece presents *continual learning* as a critical research direction for addressing this and groups the approaches into three paths:

1. **Context**: Scaling external memory via longer context windows, multi-agent systems, and smarter retrieval.
2. **Modules**: Using pluggable adapters or external memory layers for specialization without full retraining.
3. **Weights**: Enabling parameter updates through sparse training, test-time training, meta-learning, distillation, and reinforcement learning from feedback.

Challenges include catastrophic forgetting, safety risks, and auditability, but overcoming them could unlock models that learn iteratively from experience. The article concludes that although context-based methods are effective, true breakthroughs require models to compress new information into their weights after deployment, moving from mere retrieval to genuine learning.
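The catastrophic forgetting named among the challenges can be reproduced in a model with a single weight. The sketch below is only an illustration and is not taken from the article: the one-weight linear model, the two synthetic tasks, and the learning rate are assumptions chosen to keep the example small.

```python
"""Toy illustration of catastrophic forgetting under sequential weight updates.

The model, tasks, and hyperparameters are illustrative assumptions only.
"""
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def mse(w: float, data: List[Sample]) -> float:
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def sgd(w: float, data: List[Sample], lr: float = 0.05, epochs: int = 50) -> float:
    """Plain stochastic gradient descent on squared error, one sample at a time."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x
    return w


task_a = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]    # task A: y = 2x
task_b = [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)]   # task B: y = -x

w_after_a = sgd(0.0, task_a)
loss_a_before = mse(w_after_a, task_a)

# Keep updating the same weight on task B only, with no replay of task A.
w_after_b = sgd(w_after_a, task_b)
loss_a_after = mse(w_after_b, task_a)

print("task A loss after training on A:", loss_a_before)
print("task A loss after then training on B:", loss_a_after)
```

Because both tasks share the same weight, fitting task B overwrites what was learned for task A. Approaches on the context and modules paths sidestep this by leaving the base weights untouched, which is one reason weight-level continual learning is the harder problem.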

marsbit, 04/25 04:23
