6 Major AI Paradigm Shifts in 2025: From RLVR Training and Vibe Coding to Nano Banana

marsbit · Published 2025-12-22 · Updated 2025-12-22

Article Summary

Summary: In 2025, six key paradigm shifts are redefining the AI landscape. RLVR (Reinforcement Learning with Verifiable Rewards) has become a core training method, enabling models to develop reasoning-like strategies through optimization on objective tasks like math and coding. This has shifted computational focus from pre-training to extended RL training. The concept of "ghost" vs. "animal" intelligence highlights the unique, jagged capability profile of LLMs, which excel in verifiable domains but remain brittle elsewhere, leading to widespread skepticism of benchmark performance. Cursor emerged as a new application-layer paradigm, demonstrating how vertical-specific tools can orchestrate multiple LLM calls into complex workflows. Claude Code redefined local AI by running powerful coding agents directly on user devices, integrating deeply with private data and environments. "Vibe Coding" lowered the barrier to programming, allowing both amateurs and professionals to build software through natural language description. Finally, Google's Nano banana signaled the next major computing paradigm by moving beyond text to a multi-modal, graphical user interface for LLMs, better aligning with human visual and spatial cognition.

Author: Andrej Karpathy

Compiled by: Tim, PANews

2025 has been a year of rapid development and significant changes for large language models, yielding abundant achievements. Below are the "paradigm shifts" that I personally find noteworthy and somewhat surprising—changes that have altered the landscape and, at least on a conceptual level, left a deep impression on me.

1. Reinforcement Learning with Verifiable Rewards (RLVR)

At the beginning of 2025, the LLM production stack at all AI labs generally looked like this:

Pre-training (GPT-2/3 from 2020);
Supervised Fine-Tuning (InstructGPT from 2022);
And Reinforcement Learning from Human Feedback (RLHF, from 2022).

For a long time, this was the stable, mature recipe for training production-grade large language models. By 2025, Reinforcement Learning with Verifiable Rewards (RLVR) had become a widely adopted core stage in this stack. When large language models are trained in environments with automatically verifiable rewards (such as math and programming problems), they spontaneously develop strategies that humans perceive as "reasoning." They learn to break problem-solving into intermediate computational steps and pick up a number of strategies for working back and forth toward a solution (refer to the DeepSeek-R1 paper for examples). These strategies were difficult to obtain in the previous stack because it is not clear in advance what the optimal reasoning traces and backtracking behavior look like for a large language model; the model has to discover what works for itself by optimizing against rewards.
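To make "automatically verifiable reward" concrete, here is a minimal sketch of a reward function for math problems. It is an illustration only, not any lab's actual training code: the `Answer:` marker convention and the function names `extract_final_answer` and `math_reward` are assumptions made for this example.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if there is none.

    The 'Answer:' convention is an assumption of this sketch.
    """
    matches = re.findall(r"Answer:\s*([^\n]+)", completion)
    return matches[-1].strip() if matches else None


def math_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the final answer equals the reference value, else 0.0.

    Comparing exact rationals means "0.75" and "3/4" count as the same answer,
    and no human judgment is needed to score a completion.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        return 1.0 if Fraction(answer) == Fraction(ground_truth) else 0.0
    except (ValueError, ZeroDivisionError):
        # Unparseable answers simply earn no reward.
        return 0.0
```

The reward only inspects the final answer, so the intermediate text is unconstrained. That is what leaves room for the model to find its own reasoning strategies during optimization.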

Unlike the Supervised Fine-Tuning and RLHF stages (both relatively short, computationally minor fine-tunes), RLVR involves long-running optimization against objective, non-gameable reward functions. Running RLVR has turned out to deliver large capability gains per unit of cost, and it has absorbed much of the compute originally earmarked for pre-training. As a result, most of the capability progress in 2025 came from the major AI labs working through the enormous compute demands of this new stage. Overall, we saw models of roughly similar size but with much longer RL training runs. This new stage also gave us a new control knob (with its own scaling laws): capability as a function of test-time compute, adjusted by generating longer reasoning traces and increasing "thinking time." OpenAI's o1 model (released in late 2024) was the first demonstration of an RLVR model, but the release of o3 (early 2025) was the clear turning point, where the qualitative leap could be felt intuitively.

2. Ghosts vs. Animals: Jagged Intelligence

2025 was the year when I (and I believe the entire industry) began to intuitively understand the "shape" of large language model intelligence. We are not "evolving or raising animals" but "summoning ghosts." Everything about the large language model stack (neural architecture, training data, training algorithms, and especially optimization objectives) is different, so it is no surprise that we end up with entities that differ vastly from biological intelligence, and it is inappropriate to view them through an animal lens. In terms of supervision signal, human neural networks are optimized for the survival of a tribe in the jungle, while large language model neural networks are optimized for imitating human text, earning rewards on math puzzles, and winning human upvotes in arenas. Because verifiable domains lend themselves to RLVR, model capabilities in those areas "spike," producing an interesting, jagged overall capability profile. The same model can be an erudite genius and a confused, cognitively struggling grade-schooler who may at any moment be tricked into leaking your data by a manipulative prompt.

Figure (meme): human intelligence in blue, AI intelligence in red. I like this version of the meme (sorry, I can't find the original Twitter post) because it points out that human intelligence is also jagged in its own way.

Relatedly, in 2025 I developed a general indifference toward, and distrust of, benchmarks. The core issue is that benchmarks are essentially verifiable environments, which makes them highly susceptible to RLVR and to weaker forms of it via synthetic data generation. In the usual score-maximization process, LLM teams inevitably build training environments adjacent to the small pockets of embedding space that benchmarks occupy and grow capability spikes to cover them. "Training on the test set" has become a new norm.

So what does it look like to sweep every benchmark and still not achieve artificial general intelligence?

3. Cursor: A New Tier of LLM Applications

What impressed me about Cursor (besides its rapid rise this year) is that it convincingly revealed a new tier of "LLM application," to the point that people began talking about "the Cursor for X." As I emphasized in my Y Combinator talk this year, LLM applications like Cursor integrate and orchestrate LLM calls for a specific vertical domain:

They handle "context engineering";
Orchestrate multiple LLM calls under the hood into increasingly complex directed acyclic graphs, carefully balancing performance and cost;
Provide an application-specific graphical interface for the human in the loop;
And offer an "autonomy adjustment slider."
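The list above can be illustrated with a toy orchestrator. This is a sketch under stated assumptions, not how Cursor works internally: the step names in `PIPELINE`, the `RISK` scores, and the stand-in `fake_llm` are all hypothetical. Steps run in dependency order over a directed acyclic graph, and a single `autonomy` value in [0, 1] decides which steps need human approval.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical pipeline: each step maps to the steps whose outputs it consumes.
PIPELINE: Dict[str, List[str]] = {
    "retrieve_context": [],
    "draft_patch": ["retrieve_context"],
    "write_tests": ["retrieve_context"],
    "review": ["draft_patch", "write_tests"],
}

# Hypothetical per-step risk scores in [0, 1], compared against the slider.
RISK: Dict[str, float] = {
    "retrieve_context": 0.1,
    "draft_patch": 0.6,
    "write_tests": 0.4,
    "review": 0.2,
}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only wraps the prompt in brackets."""
    return f"<{prompt}>"


def run_pipeline(
    graph: Dict[str, List[str]],
    llm: Callable[[str], str],
    autonomy: float,
    approve: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Run every step after its dependencies; ask a human when risk > autonomy."""
    outputs: Dict[str, str] = {}
    for step in TopologicalSorter(graph).static_order():
        # "Context engineering" in miniature: feed upstream outputs into the prompt.
        inputs = [outputs[dep] for dep in graph[step]]
        result = llm(f"{step}({', '.join(inputs)})")
        if RISK[step] > autonomy and not approve(step, result):
            raise PermissionError(f"human rejected step {step!r}")
        outputs[step] = result
    return outputs
```

With `autonomy=1.0` the pipeline never asks for approval; lowering it toward 0 routes more steps through the `approve` callback, which is where an application-specific interface would sit.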

In 2025, there was extensive discussion about how much room this emerging application layer has. Will LLM platforms capture all applications, or is there still broad room for LLM applications? I personally suspect that LLM platforms will gravitate toward producing "generalist university graduates," while LLM applications will organize these "graduates," fine-tune them, and turn them into deployment-ready professional teams in specific vertical domains by supplying private data, sensors, actuators, and feedback loops.

4. Claude Code: AI Running Locally

Claude Code was the first convincing demonstration of what an LLM agent looks like: something that strings together tool use and reasoning in a loop to sustain extended, complex problem-solving. What also impressed me is that it runs on the user's own computer, deeply integrated with the user's private environment, data, and context. I believe OpenAI misjudged this direction by focusing its coding assistants and agents on cloud deployment (that is, containerized environments orchestrated from ChatGPT) rather than on localhost. Although agent swarms running in the cloud seem like the "ultimate form on the way to AGI," we are currently in a transitional phase with uneven capabilities and relatively slow progress. Under these conditions, it makes more sense to run agents directly on the local machine, working closely with developers and their specific setups. Claude Code got this order of priorities right and packaged it into a concise, elegant, and highly appealing command-line tool, reshaping how AI presents itself. It is no longer just a website you visit, like Google, but a little sprite or ghost "living" in your computer. This is a new and distinct paradigm for interacting with AI.
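The loop of tool use and reasoning described above can be sketched in a few lines. This is a toy illustration, not Claude Code's implementation: `scripted_policy` is a hypothetical stand-in for the model, and the tools operate on an in-memory `FILES` dictionary rather than a real file system.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (tool name, argument), or ("final", answer)

# Hypothetical local environment the agent can inspect.
FILES: Dict[str, str] = {"notes.txt": "ship on friday"}

TOOLS: Dict[str, Callable[[str], str]] = {
    "list_files": lambda _path: ", ".join(sorted(FILES)),
    "read_file": lambda path: FILES[path],
}


def scripted_policy(history: List[str]) -> Action:
    """Stand-in for the model: choose the next action from the transcript so far."""
    if not any(line.startswith("list_files ->") for line in history):
        return ("list_files", ".")
    if not any(line.startswith("read_file ->") for line in history):
        return ("read_file", "notes.txt")
    # Enough has been observed; answer with the last observation.
    return ("final", history[-1].split("-> ", 1)[1])


def run_agent(
    task: str,
    policy: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> str:
    """Alternate between deciding (policy) and acting (tools) until a final answer."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        name, arg = policy(history)
        if name == "final":
            return arg
        observation = tools[name](arg)
        # Each observation is appended so the next decision can build on it.
        history.append(f"{name} -> {observation}")
    raise RuntimeError("step budget exhausted")
```

Calling `run_agent("what do the notes say?", scripted_policy, TOOLS)` lists the files, reads `notes.txt`, and returns its contents. The step budget is a simple guard against a policy that never terminates.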

5. Vibe Coding

In 2025, AI crossed a critical capability threshold that makes it possible to build all sorts of impressive programs purely from English descriptions, without even looking at the underlying code. Amusingly, I coined the term "Vibe Coding" in a casual shower-thought tweet, never expecting it to spread as far as it has. With vibe coding, programming is no longer reserved for highly trained professionals but becomes something anyone can do. In this sense it is another example of the phenomenon I described in "Empowering People: How Large Language Models Change the Mode of Technology Diffusion": in stark contrast to all other technologies so far, ordinary people benefit more from large language models than professionals, businesses, and governments do. Vibe coding not only gives ordinary people access to programming; it also lets professional developers write far more "software that would never have been implemented." While developing nanochat, I used vibe coding to write a custom, efficient BPE tokenizer in Rust without relying on existing libraries or learning Rust in depth. This year I also used vibe coding to quickly prototype multiple projects just to check whether certain ideas were feasible. I even wrote entire one-off applications just to track down a specific bug, because code has suddenly become free, ephemeral, malleable, and disposable. Vibe coding will reshape the software development ecosystem and profoundly change job definitions.
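For readers unfamiliar with BPE (byte pair encoding), the core training loop is small enough to sketch. The version below is an unoptimized Python illustration of the general technique, not the Rust tokenizer from nanochat; the function names are chosen for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out: List[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> Tuple[Dict[Pair, int], List[int]]:
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges: Dict[Pair, int] = {}
    for step in range(num_merges):
        if len(ids) < 2:
            break
        # The most frequent adjacent pair becomes one new token.
        pair = Counter(zip(ids, ids[1:])).most_common(1)[0][0]
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges, ids


def decode(ids: List[int], merges: Dict[Pair, int]) -> str:
    """Expand token ids back to text by replaying merges in training order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dicts preserve insertion order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")
```

Training on `"aaabdaaabac"` first merges the byte pair for `"aa"` into token 256, and `decode` recovers the original string from the shortened id sequence. A production tokenizer would add pre-tokenization and much faster pair counting, which is where a language like Rust helps.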

6. Nano Banana: LLM Graphical Interface

Google's Gemini Nano Banana was one of the most disruptive paradigm shifts of 2025. In my view, large language models are the next major computing paradigm, comparable to computers in the 1970s and 80s. We should therefore expect the same kinds of innovations for similar underlying reasons, with analogues of personal computing, microcontrollers, and even the internet. In human-computer interaction in particular, today's "chat" mode with LLMs resembles typing commands into a computer terminal in the 1980s. Text is the most primitive data representation for computers (and LLMs), but it is not the format humans prefer (especially for input). Humans actually dislike reading text; it is slow and laborious. They prefer to take in information visually and spatially, which is precisely why graphical user interfaces emerged in traditional computing. Similarly, large language models should communicate with us in the forms humans prefer: images, infographics, slides, whiteboards, animations, videos, web applications, and so on. Early versions of this already exist as "visual text decorations" such as emojis and Markdown (headings, bold, lists, tables, and other typographic elements). But who will actually build the graphical interface for large language models? Seen this way, Nano Banana is an early prototype of that future. Notably, its breakthrough lies not only in image generation itself but in the combined capability that comes from text generation, image generation, and world knowledge being interwoven in the same model weights.
