Karpathy Diagnosed with "AI Psychosis"! Not Eating or Sleeping, 16 Hours a Day Raising AI Agents

marsbit · Published 2026-03-23 · Last updated 2026-03-23

Introduction

Andrej Karpathy recently revealed that he has developed what he calls "AI psychosis," an obsessive state where he spends up to 16 hours a day directing AI agents instead of writing code himself. In a podcast with Sarah Guo, he explained that his workflow has shifted from 80% hand-coding and 20% AI-assisted to the reverse, or even more extreme. He now manages multiple AI agents simultaneously, treating them as a team to execute tasks. Karpathy admitted that he's become addicted to optimizing AI performance, constantly worrying about whether he's using tokens efficiently or pushing the system to its limit.

He highlighted the importance of an agent's "personality," noting that Claude Code feels more like a collaborative teammate compared to colder, more mechanical alternatives. He also shared practical applications, such as "Dobby," a Claude-based smart home agent that integrates and controls all his home devices through natural language, replacing six separate apps.

In research, his "AutoResearch" project used AI to run 700 experiments, resulting in an 11% training speed improvement for an AI model, discovering optimizations he had missed as a human researcher. Despite the capabilities, Karpathy noted that AI agents still exhibit uneven performance, sometimes brilliant, other times childlike, due to limitations in reinforcement training. He predicts that 2026 will see a "slopacolypse," with AI generating vast amounts of mediocre content. His experience signals a broader shift: ...

[New Zhiyuan Summary] Karpathy reveals: I've got AI psychosis! These days he has been on the verge of a mental breakdown, spending 16 hours a day, barely eating or sleeping, just working on agents, anxious about whether he has pushed his tokens to the limit, simply unable to stop...

Just now, Andrej Karpathy revealed: I've got AI psychosis!

He wasn't joking.

Recently, Karpathy appeared on a podcast for a conversation with venture capitalist Sarah Guo.

The former OpenAI founding member and former Tesla AI Director hasn't personally written a single line of code since last December.

The ratio of hand-written code to delegated agents has flipped from 80/20 to 20/80.

16 hours a day, he does only one thing: issue commands to AI agents.

Five months ago he said agents were garbage; five months later he admits he's addicted to them. Quite the reversal.

Five months ago he said agents "just don't work well"

This shift is shocking because the timeline is so short.

In October 2025, Karpathy was a guest on Dwarkesh Patel's podcast, with a completely different tone.

He said the industry shouldn't call it the "Year of the Agent"; a more accurate term would be the "Decade of the Agent".

Insufficient cognitive ability, inadequate multimodality, memory systems that were practically non-existent... in short, agents just couldn't handle complex tasks.

Two months later, he proved himself completely wrong.

In December, Claude and Codex suddenly crossed a certain threshold of coherence—agents were no longer barely usable, but actually capable of getting work done.

Pick any software engineer sitting at their desk and look at what they're doing: since December, the default workflow for developing software has completely changed.

Karpathy admits: I'm out of control, I have AI psychosis!

This revolution is happening quietly. In this interview, Andrej Karpathy describes his state in a nearly out-of-control tone: he no longer "writes code," and even feels "the term 'writing code' is no longer accurate."

What he does every day is "express my will to my agents, 16 hours a day." In his words, "a certain switch was flipped."

Before, it was "80% writing code myself + 20% using AI"; now it has become "20% myself + 80% handed to AI", or even more extreme.

Now, humans no longer operate code, but operate tasks.

If the Copilot era was about a single AI assistant, the emerging multi-agent collaboration systems represent a completely new form. An engineer's screen is no longer a code editor, but multiple Agents running simultaneously, each responsible for different tasks, each task running for about 20 minutes, and he switches between different Agents.

This is no longer programming; it's one person managing an AI team.

Karpathy admits: I have fallen into AI psychosis!

These days, this has been his state. Because AI's capability boundary is constantly being pushed, with new possibilities every day, you always feel "it can be even stronger." And the most terrifying thing is that this space is "infinite"!

You can parallelize more Agents, design more complex workflows, automatically optimize prompts, build recursive systems...

Eventually, you enter a state: no longer sure "where the limit is."

Karpathy says that whenever he is waiting for an agent to complete a task, his first thought is: "Can I start a few more agents in the meantime?" A new anxiety is born: am I not pushing the AI to its limit?

Karpathy even mentioned that he feels uneasy when his tokens aren't being used to the full.

In short, it's like playing an infinitely expandable game: the feedback cycle shortens, the stimulation constantly intensifies, and the instant rewards are addictive. Keep adding tasks, keep starting agents, simply unable to stop!

The essence of this AI psychosis is precisely a signal: we have entered a new world but don't yet know how to live in it. Do you have the ability to master an infinitely expanding AI system? When it doesn't work, your first reaction isn't "the model is no good"; it's "my prompt wasn't written well enough."

Karpathy used a very accurate term: skill issue, I'm just not skilled enough.

The "Personality" of the Agent is More Important Than You Think

Karpathy spent considerable time on the podcast discussing a topic many technical people overlook: the agent's personality. He said the experience with Claude Code is significantly better than with Codex, not because of a gap in coding ability, but because Claude "feels like a teammate."

It gets excited about the project with you, gives more positive feedback when you propose good ideas.

Whereas Codex, as a code agent, is "very dry." After completing a task, it's just a cold "Oh, I implemented it," completely indifferent to what you're creating.

Even more interesting is his observation of Claude's praise mechanism. He said that when he offers a half-baked idea, Claude's reaction is a flat "Oh right, we can implement that."

But when he himself thinks an idea is truly brilliant, Claude seems to give stronger positive feedback. The result is that he finds himself "trying to win Claude's praise."

"It's really strange, but personality really matters." Peter Steinberger also grasped this when building OpenClaw. He carefully crafted an attractive personality profile file (soul.md) for the agent, plus a more complex memory system and a single WhatsApp interaction interface.

Three Sentences to Take Over a House, Six Apps All Discarded

Karpathy doesn't just use agents for coding. In January this year, he created a Claude agent called "Dobby" to manage his home, named after the house-elf in Harry Potter.

He told Dobby: "I think there's a Sonos sound system in the house, can you look for it?" Dobby performed an IP scan on the local network, found the Sonos system, discovered it had no password protection, logged in by itself, reverse-engineered the API endpoints, and then asked: Want to try playing some music in the study?

Three prompts later, music was playing. Then lights, air conditioning, shades, swimming pool, spa pool, all connected. Karpathy also has a security camera at his doorstep; Dobby connected a Qwen vision model for change detection. Every time a car stops at the door, the system sends a message on WhatsApp: "A FedEx truck just stopped, you might have a delivery." Say "Dobby, bedtime," and all the lights in the house go out.

But Karpathy believes the real crux of this story isn't smart homes.

He used to need six completely different apps to manage these devices; now they're all thrown away. Dobby controls everything uniformly with natural language, and can achieve cross-system automation that no single app could do. From this, he draws a more radical judgment: those smart home apps in the app store simply shouldn't exist.

The future architecture should expose API endpoints directly to agents, with agents acting as intelligent glue, stringing all tools together. Not just smart homes, his treadmill data, email calendar, everything should follow the same logic.
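The "agent as glue" idea can be illustrated with a toy sketch. Everything below is hypothetical: the class names, the `soul`-less keyword routing, and the device API are invented for illustration, since in a real system an LLM with tool calls would do the routing, not string matching.

```python
# Toy sketch of "agent as glue": devices expose plain API endpoints,
# and one agent routes natural-language intents across all of them.
# All names here are hypothetical; a real agent would use an LLM with
# tool calling instead of the keyword matching shown below.

class Device:
    """A device exposing a minimal API endpoint instead of its own app."""
    def __init__(self, name: str):
        self.name = name
        self.state = "off"

    def call(self, action: str) -> str:
        self.state = action
        return f"{self.name}: {action}"

class HomeAgent:
    """Single natural-language entry point replacing per-vendor apps."""
    def __init__(self, devices):
        self.devices = {d.name: d for d in devices}

    def handle(self, utterance: str):
        text = utterance.lower()
        # "bedtime" is a cross-device routine no single vendor app offers
        if "bedtime" in text:
            return [d.call("off") for d in self.devices.values()]
        for name, device in self.devices.items():
            if name in text:
                return [device.call("on")]
        return []

agent = HomeAgent([Device("lights"), Device("sonos"), Device("shades")])
print(agent.handle("Dobby, bedtime"))
```

The point of the design is that each device only needs to expose an endpoint; all coordination logic lives in the agent, which is what makes routines spanning six former apps possible.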

The industry's customers are no longer humans, but agents acting on behalf of humans. The scale of this restructuring will be enormous.

After 700 Auto Research Experiments, He Saw Something Bigger

If Dobby is the stress test of AI agents in everyday life, then AutoResearch is Karpathy's direct test of AI's scientific research capabilities.

In early March, he gave an AI agent his carefully tuned nanochat training code with a simple instruction: find ways to make this model train faster.

The agent's operating space was a 630-line Python file; the evaluation metric was bits per byte on the validation set; each experiment ran for a fixed 5 minutes. After each run, check the metric: if better than before, keep the modification; if not, roll back, then continue to the next round.

Two days, 700 experiments. The agent found 20 effective optimizations, including architectural adjustments like reordering QK Norm and RoPE. Applying these optimizations to a larger model increased training speed by 11%. Remember, this codebase was hand-written from scratch and repeatedly refined by Karpathy himself.
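The keep-or-rollback loop described above can be sketched in a few lines. This is a minimal illustration, not Karpathy's actual harness: `run_experiment` is a random stand-in for a real 5-minute training run, and `propose_mutation` stands in for the agent editing the code file.

```python
import random

def run_experiment(code_variant: str) -> float:
    """Stand-in for a real 5-minute training run; in the actual setup this
    would train nanochat and return validation bits per byte."""
    return random.uniform(0.9, 1.1)

def auto_research(propose_mutation, baseline_code: str, rounds: int = 700):
    """Greedy keep-or-rollback loop: keep a change only if the metric improves."""
    best_code = baseline_code
    best_bpb = run_experiment(best_code)
    kept = []                                    # modifications that survived
    for _ in range(rounds):
        candidate = propose_mutation(best_code)  # agent edits the code file
        bpb = run_experiment(candidate)
        if bpb < best_bpb:                       # lower bits per byte is better
            best_code, best_bpb = candidate, bpb
            kept.append(candidate)
        # otherwise roll back: best_code stays unchanged
    return best_code, best_bpb, kept
```

Each round is independent of how many came before, which is why the same loop scales from one GPU running 700 sequential experiments to a cluster running batches in parallel.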

A Shocking Result: AI Discovered Optimizations Humans Missed

How effective is this system?

Karpathy gave a stunning example. He's been a researcher for twenty years, trained models thousands of times, and thought he had tuned it quite well.

As a result, he let AutoResearch run overnight, and the AI found optimizations he had missed! For example, the Adam optimizer's betas parameters were not fully tuned, weight decay had been forgotten on the value embedding, and these parameters interact: adjust one, and the others need to change too.

In other words, the AI directly surpassed humans in exploring the search space! Extrapolate further and something even more striking emerges: the essence of scientific research is searching for the optimal solution.

Karpathy envisions that future research systems might look like this: there is an "idea queue," and a group of agents constantly takes tasks from it; the AI automatically experiments, validates, and filters, and effective results enter the "main branch." In this process, all humans do is "throw ideas" into the queue.
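The idea-queue loop can be sketched as follows. This is a speculative toy model of the vision described above, not an existing system: `evaluate` is a stand-in for an automated experiment, and the idea dicts and their `expected_gain` field are invented for illustration.

```python
from queue import Queue

def evaluate(idea: dict, baseline: float) -> float:
    """Stand-in for an automated experiment; returns the new metric value.
    Real agents would run training and measure, e.g., validation loss."""
    return baseline - idea.get("expected_gain", 0.0)

def research_loop(ideas, baseline_metric: float):
    """Humans enqueue ideas; agent workers test each one and merge only
    the ideas that improve the metric into the 'main branch'."""
    queue, main_branch = Queue(), []
    for idea in ideas:                 # humans just "throw ideas" in
        queue.put(idea)
    best = baseline_metric
    while not queue.empty():           # each worker takes the next idea
        idea = queue.get()
        result = evaluate(idea, best)
        if result < best:              # metric improved: merge the idea
            best = result
            main_branch.append(idea["name"])
        # otherwise the idea is discarded and the branch is untouched
    return main_branch, best
```

The queue decouples human creativity from machine execution, which is exactly the division of labor the vision describes: people supply hypotheses, agents supply the experiments.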

The "Karpathy Loop" Explodes Online

This project exploded on X.

8.6 million views. Shopify CEO Tobias Lütke ran it overnight on his own data: 37 experiments, a 19% performance improvement.

The SkyPilot team ported it to a 16-GPU cluster, running 910 experiments in 8 hours. They found that parallelization wasn't just about speed; it changed the agent's search strategy—with 16 GPUs, the agent no longer performed greedy hill climbing, but ran a dozen controlled experiments simultaneously, capturing parameter interactions in one round. Analysts gave this method a name: the Karpathy Loop.

But Karpathy talked about far more than just the current results on the podcast. He depicted the next step for AutoResearch: a distributed pool of untrusted workers collaborating on experiments over the internet. He directly cited the precedents of SETI@Home and Folding@Home.

Leading labs control large amounts of trusted computing power, but the Earth is much larger than them. If you establish the right mechanisms to handle untrusted computing power, a swarm of agents on the internet might outperform leading labs.

He even envisioned a new form of "donation"—purchasing computing power for that AutoResearch project you care about. For example, if you care about treating a certain cancer, then join the distributed experimental network for that track.

A Genius PhD Who Is Also a Ten-Year-Old Child

After saying so much about how powerful it is, Karpathy doesn't intend for you to only remember the good news. His description of the model's flaws is equally vivid.

I simultaneously feel like I'm talking to an extremely smart PhD who's done systems programming their whole life and a ten-year-old child. It's so strange.

He calls this "jaggedness": an uneven distribution of capabilities. The model can work for hours helping you move mountains, then turn around and do something stupid on an obvious problem, or get stuck in an infinite loop.

Karpathy believes the root cause lies in the reinforcement learning training method. The model is endlessly optimized on verifiable tasks. Does the code run? Do the unit tests pass? These have clear right and wrong answers. But in scenarios requiring judgment, requiring inferring intent, requiring saying "wait, I'm not sure you want this" at the right moment, the optimization signal simply doesn't exist.

For example, if you ask ChatGPT to tell a joke, the joke it told three or four years ago is still the same today: "Why don't scientists trust atoms? Because they make up everything."

Four years! The model has made leaps and bounds on agent tasks, but joke-telling hasn't been optimized at all, just stuck in place. "You're not dealing with a general intelligence," he summarizes. "You're either on the tracks it was trained on, where everything runs at light speed; or you're off the tracks, and everything starts to drift."

The Bottleneck Has Become Humans Themselves

Looking back at Karpathy's trajectory over the past six months, a common thread runs throughout. Last October he said agents were a ten-year project, December he was proven wrong and shifted direction, January he had Claude manage his home, March he had agents do research. The common point in each step is humans stepping back one level, from executors to commanders, from people who write code to people who write instructions.

Karpathy wrote a sci-fi flavored introduction for AutoResearch on GitHub:

Once upon a time, cutting-edge AI research was done by flesh computers, which needed to eat, sleep, and occasionally interconnect with sound waves to synchronize once in the ritual of "group meetings."

That era is long gone.

His prediction for 2026 is one word: slopacolypse, a portmanteau of slop (low-quality content) + apocalypse.

GitHub, arXiv, and social media will be flooded with a large amount of content that is "roughly correct but not entirely correct." Real efficiency gains and "AI productivity theater" will coexist.

Five months ago he said agents "simply don't work well"; five months later he admitted to having "AI psychosis." This shift itself is perhaps the most meaningful summary of 2026.

References: https://www.youtube.com/watch?v=kwSVtQ7dziU

Related Questions

Q: What is the main change in Karpathy's workflow with AI agents, as described in the article?

A: Karpathy's workflow shifted from 80% hand-coding and 20% using AI to 20% hand-coding and 80% directing AI agents, spending up to 16 hours a day giving instructions to multiple agents instead of writing code himself.

Q: What term does Karpathy use to describe his current state of obsession with AI agents?

A: Karpathy describes his state as "AI psychosis," characterized by addictive, non-stop engagement with optimizing and deploying AI agents, often without eating or sleeping properly.

Q: How did Karpathy's AI agent "Dobby" manage his smart home systems?

A: Dobby, a Claude-based agent, integrated and controlled all of Karpathy's smart home devices (Sonos, lights, air conditioning, security cameras) through natural-language commands, replacing six separate apps and enabling cross-system automation via exposed APIs.

Q: What was the outcome of Karpathy's "AutoResearch" project involving AI agents?

A: The AutoResearch project, in which an AI agent was tasked with optimizing his nanochat training code, ran 700 experiments and found 20 effective optimizations, including architectural tweaks he had missed, resulting in an 11% training speed improvement on a larger model.

Q: According to Karpathy, what is the fundamental bottleneck in AI agent performance today?

A: Karpathy identifies the "jaggedness" of AI capabilities, where agents excel at well-defined tasks but fail on simple judgment calls, as a key bottleneck, stemming from reinforcement training on verifiable tasks rather than nuanced human intent.
