6 Major AI Paradigm Shifts in 2025: From RLVR Training and Vibe Coding to Nano Banana

marsbit | Published on 2025-12-22 | Last updated on 2025-12-22

Abstract

Summary: In 2025, six key paradigm shifts are redefining the AI landscape. RLVR (Reinforcement Learning with Verifiable Rewards) has become a core training method, enabling models to develop reasoning-like strategies through optimization on objective tasks like math and coding. This has shifted computational focus from pre-training to extended RL training. The concept of "ghost" vs. "animal" intelligence highlights the unique, jagged capability profile of LLMs, which excel in verifiable domains but remain brittle elsewhere, leading to widespread skepticism of benchmark performance. Cursor emerged as a new application-layer paradigm, demonstrating how vertical-specific tools can orchestrate multiple LLM calls into complex workflows. Claude Code redefined local AI by running powerful coding agents directly on user devices, integrating deeply with private data and environments. "Vibe Coding" lowered the barrier to programming, allowing both amateurs and professionals to build software through natural language description. Finally, Google's Nano banana signaled the next major computing paradigm by moving beyond text to a multi-modal, graphical user interface for LLMs, better aligning with human visual and spatial cognition.

Author: Andrej Karpathy

Compiled by: Tim, PANews

2025 was a year of rapid progress and significant change for large language models. Below are the "paradigm shifts" that I personally find noteworthy and somewhat surprising: changes that altered the landscape and, at least conceptually, left a deep impression on me.

1. Reinforcement Learning with Verifiable Rewards (RLVR)

At the beginning of 2025, the LLM production stack at all AI labs generally looked like this:

  • Pre-training (GPT-2/3 from 2020);
  • Supervised Fine-Tuning (InstructGPT from 2022);
  • And Reinforcement Learning from Human Feedback (RLHF, from 2022).

For a long time, this was a stable and mature stack for training production-grade large language models. By 2025, Reinforcement Learning with Verifiable Rewards (RLVR) had become a widely adopted core stage of that stack. When large language models are trained in environments with automatically verifiable rewards (such as math and programming problems), they spontaneously develop strategies that humans perceive as "reasoning." They learn to break problem-solving down into intermediate computational steps and pick up multiple strategies for working back and forth toward a solution (see the DeepSeek-R1 paper for examples). In the previous stack, these strategies were difficult to achieve because it is not clear in advance what the optimal reasoning paths and backtracking steps look like for a large language model; the model has to discover what works for it through reward optimization.
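To make "automatically verifiable reward" concrete, here is a minimal sketch of a reward function for a math-style task. It is an illustration only, not any lab's actual setup: the names `extract_final_answer` and `verifiable_reward` are hypothetical, and treating the last number in a completion as the final answer is a simplifying assumption.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number in the completion, or None if there is none.

    Assumption for this sketch: the final answer is the last number written.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the final answer equals the reference, else 0.0.

    No human rater is involved: the check is a plain numeric comparison.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(reference) else 0.0


# Three sampled completions for the prompt "What is 17 * 3 + 4?"
samples = [
    "17 * 3 = 51, then 51 + 4 = 55. Answer: 55",
    "17 * 3 = 41, so the total is 45",
    "I am not sure.",
]
rewards = [verifiable_reward(s, "55") for s in samples]
print(rewards)
```

In an RL loop, rewards like these would be used to upweight the sampled reasoning traces that ended in a correct answer. The intermediate steps are never graded directly, only the checkable outcome.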

Unlike the Supervised Fine-Tuning and RLHF stages (both relatively short fine-tuning stages that use little compute), RLVR involves long optimization runs against objective, non-gameable reward functions. Running RLVR turned out to deliver large capability gains per unit of cost, and it absorbed much of the compute originally earmarked for pre-training. As a result, most of the capability progress in 2025 came from AI labs working through the enormous compute demands of this new stage. Overall, we saw models of roughly similar size but with much longer RL training runs. Another unique aspect of this new stage is that it provides a new control knob (with its own scaling law): capability as a function of test-time compute, adjusted by generating longer reasoning traces and increasing "thinking time." OpenAI's o1 model (released in late 2024) was the first demonstration of an RLVR model, and the release of o3 (early 2025) marked the clear turning point at which people could intuitively feel a qualitative leap.

2. Ghosts vs. Animals and Jagged Intelligence

2025 was the year when I (and I believe the entire industry) began to intuitively grasp the "shape" of large language model intelligence. We are not "evolving or raising animals" but "summoning ghosts." Everything about the large language model stack (neural architecture, training data, training algorithms, and especially optimization objectives) is different, so it is no surprise that we end up with entities in the space of intelligence that are very different from biological ones. It is inappropriate to view them through an animal lens. In terms of supervision signal, human neural networks are optimized for the survival of a tribe in the jungle, while large language model neural networks are optimized for imitating human text, earning rewards on math puzzles, and winning human upvotes in arenas. Because verifiable domains lend themselves to RLVR, model capabilities in those areas "spike," producing an interesting, jagged overall performance profile. The same model can be an erudite genius and a confused, cognitively struggling grade-schooler who may at any moment be tricked by a jailbreak prompt into leaking your data.

Human intelligence: blue, AI intelligence: red. I like this version of the meme (sorry, I can't find the original Twitter post) because it points out that human intelligence also has its own jagged waves in its own way.

Related to this, in 2025 I developed a general apathy and distrust towards benchmarks. The core issue is that benchmarks are essentially verifiable environments, which makes them highly susceptible to RLVR and to weaker forms of it via synthetic data generation. In the typical "score maximization" process, LLM teams inevitably construct training environments adjacent to the small pockets of embedding space that benchmarks occupy, and grow capability "jaggies" to cover them. "Training on the test set" has become the new norm.

So what would it look like to sweep every benchmark and still fall short of artificial general intelligence?

3. Cursor: A New Tier of LLM Applications

What impressed me about Cursor (besides its rapid rise this year) is that it convincingly revealed a new "LLM application" tier, as people began talking about "the Cursor of XX field." As I emphasized in my Y Combinator speech this year, LLM applications like Cursor focus on integrating and orchestrating LLM calls for specific vertical domains:

  • They handle "context engineering";
  • Orchestrate multiple LLM calls under the hood into increasingly complex directed acyclic graphs, carefully balancing performance against cost;
  • Provide an application-specific graphical interface for the human in the loop;
  • And offer an "autonomy adjustment slider."
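The orchestration idea in the list above can be sketched as a small directed acyclic graph of model calls executed in dependency order. This is an illustrative sketch, not Cursor's implementation: the step names (`retrieve`, `plan`, `edit`, `review`) and the `fake_llm` stand-in are assumptions made for the example. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; wraps the prompt in <>."""
    return f"<{prompt}>"


# Each step maps to the list of steps whose outputs it needs (its predecessors).
steps: Dict[str, List[str]] = {
    "retrieve": [],
    "plan": ["retrieve"],
    "edit": ["plan", "retrieve"],
    "review": ["edit"],
}


def run_dag(graph: Dict[str, List[str]], call: Callable[[str], str]) -> Dict[str, str]:
    """Run every step after its dependencies, feeding their outputs as context."""
    results: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        context = " ".join(results[dep] for dep in graph[name])
        results[name] = call(f"{name}({context})")
    return results


results = run_dag(steps, fake_llm)
order = list(results)  # dicts preserve insertion order, i.e. execution order
print(order)
print(results["review"])
```

A real application would swap `fake_llm` for actual model calls, possibly routing cheap steps to a smaller model and expensive steps to a larger one, which is where the performance-versus-cost balancing comes in.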

In 2025, there was extensive discussion about how much room this emerging application layer has to grow. Will LLM platforms capture all applications, or is there still broad room for LLM applications? I personally suspect that LLM platforms will tend toward producing "generalist university graduates," while LLM applications will organize these "graduates," fine-tune them, and turn them into deployment-ready professional teams in specific vertical domains by supplying private data, sensors, actuators, and feedback loops.

4. Claude Code: AI Running Locally

The emergence of Claude Code was the first convincing demonstration of what an LLM agent looks like: something that combines tool use and reasoning in a loop to sustain longer, more complex problem-solving. What also impressed me about Claude Code is that it runs on the user's personal computer, deeply integrated with the user's private environment, data, and context. I believe OpenAI misjudged this direction by focusing its code assistants and agents on cloud deployment (that is, containerized environments orchestrated by ChatGPT) rather than on localhost. Although agent clusters running in the cloud seem like the "ultimate form on the way to AGI," we are currently in a transitional stage with uneven capability development and relatively slow progress. Under these conditions, deploying agents directly on local computers, working closely with developers in their specific environments, is the more reasonable path. Claude Code got this priority order right and packaged it into a concise, elegant, and highly appealing command-line tool, thereby reshaping how AI is presented. AI is no longer just a website you visit, like Google, but a little sprite or ghost "living" in your computer. This is a brand-new, unique paradigm for interacting with AI.
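The "tool use and reasoning in a loop" pattern described above can be sketched in a few lines. This is a toy illustration, not Claude Code's implementation: `scripted_model` is a hypothetical stand-in for an LLM that chooses the next action, and the two tools operate on an in-memory `workspace` dictionary rather than a real file system.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name, argument)

# Hypothetical local workspace standing in for the user's files.
workspace: Dict[str, str] = {"notes.txt": "ship the fix today"}

tools: Dict[str, Callable[[str], str]] = {
    "read_file": lambda path: workspace.get(path, "ERROR: missing file"),
    "count_words": lambda text: str(len(text.split())),
}


def scripted_model(history: List[str]) -> Action:
    """Stand-in for an LLM: picks the next action based on what it has seen."""
    if not any(entry.startswith("read_file") for entry in history):
        return ("read_file", "notes.txt")
    last_observation = history[-1].split(" -> ", 1)[1]
    if not any(entry.startswith("count_words") for entry in history):
        return ("count_words", last_observation)
    return ("finish", last_observation)


def run_agent(model: Callable[[List[str]], Action],
              max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Alternate between choosing an action and observing the tool's result."""
    history: List[str] = []
    for _ in range(max_steps):
        tool, arg = model(history)
        if tool == "finish":
            return arg, history
        observation = tools[tool](arg)
        history.append(f"{tool} -> {observation}")
    return None, history  # step budget exhausted without finishing


answer, history = run_agent(scripted_model)
print(answer)
print(history)
```

The `max_steps` budget is one simple way to keep such a loop from running forever. In a real local agent the tools would be shell commands and file edits in the developer's own environment, which is the integration the paragraph above emphasizes.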

5. Vibe Coding

In 2025, AI crossed a critical capability threshold, making it possible to build all kinds of impressive programs purely from English descriptions, without even caring about the underlying code. Amusingly, I coined the term "Vibe Coding" in a casual shower-thought tweet, never expecting it to go as far as it has. With vibe coding, programming is no longer strictly reserved for highly trained professionals; it becomes something anyone can do. In that sense, it is another example of the phenomenon I described in "Empowering People: How Large Language Models Change the Mode of Technology Diffusion." In stark contrast to all other technologies so far, ordinary people benefit more from large language models than professionals, businesses, and governments do. But vibe coding not only gives ordinary people access to programming; it also lets professional developers write much more software that "would never have been implemented" otherwise. While developing nanochat, I used vibe coding to write a custom, efficient BPE tokenizer in Rust without relying on existing libraries or learning Rust in depth. This year, I also used vibe coding to quickly prototype multiple projects just to check whether certain ideas were feasible. I even wrote entire one-off applications just to track down a specific bug, because code has suddenly become free, ephemeral, malleable, and disposable. Vibe coding will reshape the software development ecosystem and profoundly change the boundaries of job definitions.
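For readers unfamiliar with what a BPE (byte pair encoding) tokenizer does, here is a minimal training-and-decoding sketch. It is written in Python for brevity and is not the Rust tokenizer from nanochat: the function names are hypothetical, and it makes no attempt at efficiency. The idea is to start from raw UTF-8 bytes and repeatedly replace the most frequent adjacent pair with a new token id.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out: List[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> Tuple[List[int], Dict[Pair, int]]:
    """Learn up to `num_merges` merges; return the encoded text and the merges."""
    ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
    merges: Dict[Pair, int] = {}
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        new_id = 256 + step
        merges[best] = new_id
        ids = merge(ids, best, new_id)
    return ids, merges


def decode(ids: List[int], merges: Dict[Pair, int]) -> str:
    """Expand merged tokens back to bytes, relying on merge insertion order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


text = "aaabdaaabac"
ids, merges = train_bpe(text, num_merges=3)
print(len(text), "bytes ->", len(ids), "tokens")
print(decode(ids, merges) == text)
```

A production tokenizer would also pre-split text into chunks, handle special tokens, and avoid recounting all pairs on every merge, which is where most of the engineering effort goes.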

6. Nano Banana: LLM Graphical Interface

Google's Gemini Nano Banana was one of the most disruptive paradigm shifts of 2025. In my view, large language models are the next major computing paradigm after the computers of the 1970s and 80s. We will therefore see the same kinds of innovation for similar fundamental reasons, akin to the evolution of personal computing, microcontrollers, and even the internet. In human-computer interaction in particular, the current "conversation" mode with LLMs resembles typing commands into a computer terminal in the 1980s. Text is the most primitive data representation for computers (and LLMs), but it is not the format humans prefer (especially for input). Humans actually dislike reading text: it is slow and laborious. Humans prefer to take in information visually and spatially, which is precisely why graphical user interfaces emerged in traditional computing. Similarly, large language models should communicate with us in the forms humans prefer: images, infographics, slides, whiteboards, animations, videos, web applications, and so on. The earliest versions of this already exist as "visual text decorations" such as emojis and Markdown (headings, bold, lists, tables, and other typographic elements). But who will actually build the graphical interface for large language models? From this perspective, Nano Banana is an early prototype of that future. Notably, its breakthrough lies not only in image generation itself but in the combined capability that comes from text generation, image generation, and world knowledge being interwoven in the model weights.

Related Questions

Q: What is RLVR and how does it differ from previous training methods like RLHF?

A: RLVR (Reinforcement Learning with Verifiable Rewards) is a training method where LLMs are optimized in environments with automatically verifiable rewards, such as math or programming problems. Unlike RLHF, which relies on human feedback, RLVR uses objective, non-gameable reward functions and involves long-duration optimization. It allows models to develop reasoning-like strategies and significantly improves capabilities per unit of compute, consuming resources originally allocated for pre-training.

Q: How does the concept of 'Ghost Intelligence' contrast with 'Animal Intelligence' in AI?

A: 'Ghost Intelligence' refers to the unique, non-biological form of intelligence exhibited by LLMs, which is optimized for mimicking human text, solving verifiable problems, and winning human approval. It contrasts with 'Animal Intelligence,' which is evolved for survival in natural environments. LLMs show a jagged performance profile, excelling in specific verifiable domains while potentially failing in others, making them fundamentally different from biological intelligence.

Q: What makes Cursor represent a new layer of LLM applications?

A: Cursor represents a new layer of LLM applications by specializing in vertical domains through context engineering, orchestrating multiple LLM calls into complex graphs, providing domain-specific GUIs for human-in-the-loop interaction, and offering an 'autonomy slider.' It acts as a specialized team that fine-tunes general-purpose LLMs (like 'university graduates') for practical use cases with private data, sensors, and actuators.

Q: Why is Claude Code's local execution significant for AI agents?

A: Claude Code's local execution is significant because it runs on the user's computer, deeply integrating with their private environment, data, and context. This approach, which prioritizes local deployment over cloud-based containers, allows for more effective collaboration with developers in their specific workflows. It presents AI as a 'local ghost' or assistant, offering a new paradigm of interaction distinct from cloud-centric models.

Q: What is 'Vibe Coding' and how does it change software development?

A: 'Vibe Coding' is a paradigm where programs are built through natural English descriptions, eliminating the need for deep coding expertise. It democratizes programming, enabling non-experts to create software and professionals to rapidly prototype or implement ideas that would otherwise be unfeasible. This approach makes code 'free, ephemeral, malleable, and disposable,' reshaping the software development landscape and blurring professional boundaries.
