# Related Articles on Paradigm Shift

The HTX News Center provides the latest articles and in-depth analysis on the topic "Paradigm Shift", covering market trends, project updates, technological developments, and regulatory policy in the crypto industry.

Large Language Models Ace All Exams, Yet Move Farther from AGI: What Does This Paper Reveal?

The article discusses the ongoing challenge of defining and achieving Artificial General Intelligence (AGI). It notes that industry leaders have set vague, often profit- or time-based benchmarks for AGI, while the concept itself lacks a consensus definition—a situation the article compares to a "Rorschach test." It highlights a recent 2025 paper by researcher Michael Timothy Bennett, who proposes a new, measurable definition. Bennett frames AGI not as mimicking human performance on tests, which current large language models (LLMs) have already mastered, but as an "artificial scientist." A true AGI, according to this view, should be able to widely and efficiently adapt to new environments and tasks within real-world constraints (like computational and energy limits), focusing on the *discovery of new knowledge* rather than the replication of existing data. The author contrasts this with the current dominant approach of "scale-maxing"—massively scaling up data, parameters, and compute. While powerful, this method leads to models that fail on out-of-distribution problems and lack core intelligent abilities: they are passive learners, cannot reason causally, and cannot actively experiment or balance exploration with exploitation. The article argues that Bennett's framework offers a crucial shift. It makes AGI a quantifiable engineering problem and proposes new evaluation "adaptation benchmarks" that test an AI's ability to actively learn in novel scenarios. The conclusion is that achieving AGI will require a fundamental reset—a fusion of multiple methodologies beyond simple scaling, moving AI from mimicking patterns to embodying the scientific spirit of inquiry and discovery.

marsbit · 5 hours ago

OpenAI Post-Training Engineer Weng Jiayi Proposes a New Paradigm Hypothesis for Agentic AI

OpenAI engineer Weng Jiayi's "Heuristic Learning" experiments propose a new paradigm for Agentic AI, suggesting that intelligent agents can improve not just by training neural networks, but also by autonomously writing and refining code based on environmental feedback. In the experiment, a coding agent (powered by Codex) was tasked with developing and maintaining a programmatic strategy for the Atari game Breakout. Starting from a basic prompt, the agent iteratively wrote code, ran the game, analyzed logs and video replays to identify failures, and then modified the code. Through this engineering loop of "code-run-debug-update," it evolved a pure Python heuristic strategy that achieved a perfect score of 864 in Breakout and performed competitively with deep reinforcement learning (RL) algorithms in MuJoCo control tasks like Ant and HalfCheetah. This approach, termed Heuristic Learning (HL), contrasts with Deep RL. In HL, experience is captured in readable, modifiable code, tests, logs, and configurations—a software system—rather than being encoded solely into opaque neural network weights. This offers potential advantages in explainability, auditability for safety-critical applications, easier integration of regression tests to combat catastrophic forgetting, and more efficient sample use in early learning stages, as demonstrated in broader tests on 57 Atari games. However, the blog acknowledges clear limitations. Programmatic strategies struggle with tasks requiring long-horizon planning or complex perception (e.g., Montezuma's Revenge), areas where neural networks excel. The future vision is a hybrid architecture: specialized neural networks for fast perception (System 1), HL systems for rules, safety, and local recovery (also System 1), and LLM agents providing high-level feedback and learning from the HL system's data (System 2). 
The core proposition is that in the era of capable coding agents, a significant portion of an AI's learned experience could be maintained as an auditable, evolving software system.
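The "code-run-debug-update" loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy bouncing-ball game stands in for Breakout, the three hand-written policies stand in for successive code revisions that a coding agent might produce, and the `improve` function with its score-based regression gate is a hypothetical simplification, not the method from the blog post.

```python
from typing import Callable, List, Tuple

WIDTH = 10  # hypothetical playfield width, not taken from the article

# A policy maps (paddle_x, ball_x, ball_dx) to a move in {-1, 0, +1}.
Policy = Callable[[int, int, int], int]


def run_episode(policy: Policy, steps: int = 50) -> int:
    """Run the toy game; score one point per step the paddle sits under the ball."""
    paddle, ball, dx = 0, 5, 1
    score = 0
    for _ in range(steps):
        move = policy(paddle, ball, dx)
        paddle = max(0, min(WIDTH - 1, paddle + move))
        if not 0 <= ball + dx < WIDTH:  # ball bounces off a wall
            dx = -dx
        ball += dx
        if paddle == ball:
            score += 1
    return score


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


# Three "revisions" of the strategy, kept as plain readable code.
def stay(paddle: int, ball: int, dx: int) -> int:
    return 0  # baseline: never move


def chase(paddle: int, ball: int, dx: int) -> int:
    return sign(ball - paddle)  # move toward where the ball is now


def anticipate(paddle: int, ball: int, dx: int) -> int:
    # Move toward where the ball will be after its next step, including bounces.
    nxt = ball + dx if 0 <= ball + dx < WIDTH else ball - dx
    return sign(nxt - paddle)


def improve(candidates: List[Tuple[str, Policy]]) -> Tuple[str, int, List[str]]:
    """Accept a revision only if it beats the current best score (regression gate)."""
    best_name, best_policy = candidates[0]
    best_score = run_episode(best_policy)
    accepted = [best_name]
    for name, policy in candidates[1:]:
        score = run_episode(policy)
        if score > best_score:
            best_name, best_score = name, score
            accepted.append(name)
    return best_name, best_score, accepted


if __name__ == "__main__":
    revisions = [("stay", stay), ("chase", chase), ("anticipate", anticipate)]
    print(improve(revisions))
```

In this sketch the learned "experience" lives in the source of the accepted policy and in the list of accepted revisions, both of which can be read, diffed, and tested. A real agent would generate the revisions itself from logs and replays rather than pick from a fixed list.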

marsbit · 05/11 00:17

Cursor 3 Released: The IDE Becomes Irrelevant, Agent Console Takes Over, The VS Code Era Begins to Fade

Cursor 3, codenamed Glass, represents a fundamental shift in AI-assisted development by replacing the traditional code editor with an agent management console as the primary interface. While engineers can still write code, the core design philosophy now centers on users spending most of their time directing AI agents, reviewing their outputs, and deciding which tasks to deploy. Key features include multi-repository support, a unified sidebar for all agents (local and cloud), and Cloud Handoff, which allows seamless movement of agent sessions between local and cloud environments. This release is part of Cursor's accelerated response to competitive pressure from tools like Anthropic's Claude Code. The company also recently launched Automations for triggering agents automatically, Composer 2 (its proprietary model claiming superior performance to Claude Opus), and self-hosted cloud agents for enterprise customers. The transition signals a broader industry paradigm shift where agent orchestration becomes the new control plane, similar to how cloud consoles replaced SSH for infrastructure management. This challenges the decades-long dominance of IDEs like VS Code, suggesting that software engineering roles are evolving toward overseeing AI agents rather than directly editing code. The architectural debate now centers on whether this orchestration layer should exist inside the IDE (Cursor, Google), as a separate tool (Anthropic, OpenAI), or be omnipresent.

marsbit · 04/08 10:16
