Is Writing Prompts Outdated? AI Programming is Shifting to Loop Engineering

marsbit | Published on 2026-06-10 | Last updated on 2026-06-10

Abstract

The era of manually writing prompts for AI coding agents is shifting towards "Loop Engineering," where developers design automated, self-managing workflows instead of step-by-step guidance. A loop consists of five core components: Automations (scheduled task discovery and triage), Worktrees (isolated environments to prevent file conflicts), Skills (project-specific knowledge and conventions), Plugins/Connectors (integration with tools like GitHub, Linear, or Slack), and Sub-agents (separate agents for execution and verification). An external memory layer, such as Markdown files or Linear boards, tracks progress across sessions. Tools like Claude Code and Codex now embed these components, enabling loops to run autonomously: discovering tasks, assigning work, checking results, and deciding next steps. This approach amplifies developer leverage but doesn't replace human judgment. Key risks include "comprehension debt" (losing understanding of the codebase) and "cognitive surrender" (over-relying on automation). Effective loop design requires balancing automation with ongoing verification and deep system understanding, making it a more complex but higher-leverage skill than prompt engineering.

Editor's Note: The way AI coding agents are used is shifting from "manually writing prompts and advancing tasks step by step" to "people designing loops that let the system continuously schedule agents." What Addy Osmani calls Loop Engineering is, at its core, about setting up a workflow that can automatically discover tasks, assign them, check results, track progress, and decide the next steps.

This loop roughly consists of five modules: Automations (periodically discovering and triaging tasks), Worktrees (isolating multiple parallel development environments), Skills (documenting project knowledge and team conventions), Plugins/Connectors (connecting to real tools like GitHub, Linear, Slack, databases, etc.), and Sub-agents (separating the executor and the reviewer). Additionally, there's an external memory layer, such as Markdown files or a Linear board, to preserve state and progress.

The article reminds us that the significance of Loop Engineering is not just about "letting the AI run a few more rounds," but about front-loading an engineer's judgment into the system design. Loops can significantly amplify a developer's leverage, but they won't replace verification, understanding, and judgment. The real risk isn't in using loops, but in using them as an excuse to avoid understanding the code and the system. Perhaps the key skill for programming with AI will no longer be just writing a good prompt, but designing reliable, verifiable, and sustainable agent workflows.

Here is the original article:

Loop engineering is replacing your role as the "person who writes prompts for the agent." You design a system that prompts the agent on your behalf. The "loop" here can be understood as a recursive goal: you define an objective, and the AI iterates continuously until the task is complete. A loop roughly consists of five components, and both Claude Code and Codex now ship all five.

I believe this might be how we collaborate with coding agents in the future. However, this is still in its early stages, and I remain skeptical. You absolutely need to be mindful of token costs, as the difference between usage patterns can be massive, especially depending on whether you are "token-rich" or "token-poor." You also need some mechanism to ensure quality doesn't degrade. Concerns about "AI slop" are valid. That said, let's see what this is all about.

@steipete recently said: "You shouldn't be writing prompts for coding agents anymore. You should be designing loops that prompt your agents." Similarly, Anthropic's Claude Code lead @bcherny said: "I don't prompt Claude anymore. I have a bunch of loops running that prompt Claude and decide what to do next. My job is to write loops."

So what does this mean?

For the past two years or so, the basic way to get a coding agent to do something was to write a good prompt and provide enough context. You'd input a sentence, read the response, then input the next sentence. The agent was a tool, and you held it, pushing it forward round after round. That phase is somewhat over, or at least some think it's about to end.

Now, you build a small system: it discovers work, assigns tasks, checks results, logs completion, and decides what to do next. In other words, you let this system drive the agent, rather than prompting it yourself over and over. I previously wrote about two close relatives: agent harness engineering, which builds the runtime environment for a single agent, and the factory model, which is the system that builds software. Loop engineering sits one layer above the harness. It's like a harness that runs on a timer, spawns little assistants, and feeds itself.

What surprised me is that this is no longer just a "tool-level" concern. A year ago, if you wanted a loop, you'd write a bunch of bash scripts and maintain them forever. That was your own thing, for you alone. Now, these components are built directly into products. The capabilities Steinberger listed map almost one-to-one onto the Codex app, and nearly as closely onto Claude Code. Once you realize they share the same shape, you stop worrying about which tool to use and start designing a loop that keeps running regardless of which tool you happen to be sitting in.

Five Components, and Some Explanations

A loop needs five things, plus a place to remember information. I'll list them first, then explain each.

First, Automations: triggered on a schedule, they automatically discover and triage work.

Second, Worktrees: Ensure two agents working in parallel don't step on each other's files.

Third, Skills: Write down project knowledge so the agent doesn't have to guess every time.

Fourth, Plugins and connectors: Connect the agent to tools you already use.

Fifth, Sub-agents: One proposes solutions, another reviews them.

Then the sixth thing: Memory. It can be a Markdown file, a Linear board, or anything separate from a single conversation that can store "what's done" and "what's next." It sounds simple and unimportant, but this is the same trick every long-running agent relies on. I've written in detail about long-running agents: the model forgets between runs, so memory must be on disk, not in context. The agent forgets, but the code repository doesn't.

Now, both products already have these five components.

Their naming differs slightly, but the capabilities are essentially the same. Let me explain each, because frankly, whether a loop runs stably or quietly leaks everywhere comes down to the details.

Automations: This is the loop's heartbeat

Automations are what make a loop a loop, not a one-off task you ran manually once. In the Codex app, you can create an automation on the Automations tab, selecting the project, the prompt it runs, the frequency, and whether it runs in your local checkout or a background worktree. Runs that find issues go to the Triage inbox; those that find nothing auto-archive, which is nice. OpenAI also uses them internally for boring but necessary tasks: daily issue triage, summarizing CI failures, writing commit digests, tracking bugs introduced last week. Automations can also call skills, so you can keep recurring tasks maintainable: trigger $skill-name instead of pasting a wall of instructions into a scheduled task no one will ever update.
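A rough model of that routing, purely hypothetical and not the Codex API: each scheduled run returns a list of findings; runs with findings land in a triage inbox, empty runs are archived. The `skill` field stands in for the `$skill-name` the scheduled prompt would call instead of a pasted wall of instructions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Automation:
    """Hypothetical model of a scheduled automation (not the Codex API)."""
    name: str
    skill: str                       # which skill the scheduled prompt invokes, e.g. "$triage"
    check: Callable[[], list[str]]   # one run: returns findings; empty means nothing to do


@dataclass
class Router:
    inbox: list[tuple[str, str]] = field(default_factory=list)
    archive: list[str] = field(default_factory=list)

    def run(self, automation: Automation) -> None:
        findings = automation.check()
        if findings:
            # Runs that found something surface in the triage inbox for a human.
            self.inbox.extend((automation.name, f) for f in findings)
        else:
            # Runs that found nothing are archived quietly.
            self.archive.append(automation.name)
```

The point of the split is attention: you only look at the inbox, and silence means "nothing to do" rather than "go check."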

Claude Code achieves the same, just via a different path: scheduling and hooks. You can use /loop to run a prompt or command at fixed intervals, schedule a cron job, or use hooks to trigger shell commands at certain points in an agent's lifecycle. If you want it to keep running after you close your laptop, you can push the whole thing to GitHub Actions. The idea is identical: define an autonomous task, give it a rhythm, and have findings come to you instead of you checking everywhere.

There's also an in-session primitive worth knowing, closer to the real core of this article. /loop repeats at intervals; /goal continues until a condition you write actually holds. After each round, a separate small model judges if the task is done, so the coding agent isn't grading its own work. You give it a condition like "all tests in test/auth pass and lint is clean" and walk away. Codex has the same capability, also called /goal. It works persistently across rounds until a verifiable stop condition is met, with support for pausing, resuming, and clearing. Same primitive, both tools. That's basically the recurring pattern in this article.
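The `/goal` idea reduces to a loop with an external stop check. This sketch assumes nothing about how either tool implements it: `work` stands in for one round of the coding agent and `judge` for the separate model that decides whether the condition holds. `max_rounds` is an added safety cap, not something the article describes, so an unreachable goal cannot burn tokens forever.

```python
from typing import Callable


def run_until_goal(
    work: Callable[[int], str],
    judge: Callable[[str], bool],
    max_rounds: int = 10,
) -> tuple[bool, int]:
    """Hypothetical /goal-style loop: repeat rounds of work until an independent
    judge says the stop condition holds, or give up after max_rounds.
    Returns (goal_met, rounds_used)."""
    for round_no in range(1, max_rounds + 1):
        report = work(round_no)   # the coding agent does one round of work
        if judge(report):         # a separate checker, not the worker, decides "done"
            return True, round_no
    return False, max_rounds
```

In a real loop the judge should check something verifiable, such as actually running the tests in `test/auth` and the linter, rather than trusting the worker's report.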

So, Automations surface the work. The rest of the loop processes it.

Worktrees: Keeping parallelism from becoming chaos

Once you run more than one agent, file conflicts become a failure point. Two agents writing to the same file is inherently problematic, just like two engineers modifying the same line without communicating. git worktree solves this. It's a separate working directory on an isolated branch but sharing the same repository history, so changes from one agent physically cannot touch another agent's checkout.
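Giving each agent its own checkout is a single `git worktree add` call. The helpers below only assemble and run the standard git commands; the `agent/<name>` branch naming, the sibling-directory layout, and the `main` base branch are assumptions you would adapt to your repo.

```python
import subprocess
from pathlib import Path


def worktree_path(repo: Path, agent: str) -> Path:
    # Assumed layout: a sibling directory, e.g. /work/app -> /work/app-fix-auth.
    return repo.parent / f"{repo.name}-{agent}"


def add_worktree_cmd(repo: Path, agent: str, base: str = "main") -> list[str]:
    """`git worktree add -b <branch> <path> <base>`: new branch, new checkout, shared history."""
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", f"agent/{agent}", str(worktree_path(repo, agent)), base]


def remove_worktree_cmd(repo: Path, agent: str) -> list[str]:
    """`git worktree remove <path>`: clean up afterwards.
    git refuses if the checkout still has uncommitted changes."""
    return ["git", "-C", str(repo), "worktree", "remove", str(worktree_path(repo, agent))]


def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
```

For example, `run(add_worktree_cmd(Path("/work/app"), "fix-auth"))` gives one agent `/work/app-fix-auth` on branch `agent/fix-auth`; a second agent with a different name gets a separate directory, so their edits cannot collide.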

Codex has built-in worktree support, so multiple threads can work on the same repo simultaneously without collisions. Claude Code can achieve the same isolation via git worktree: you can open a session in an isolated checkout with the --worktree flag, or set isolation: worktree on a subagent, giving each little assistant a fresh checkout that gets cleaned up automatically afterwards. I wrote about the human side in the orchestration tax: worktrees remove mechanical conflicts, but you are still the bottleneck. What truly determines how many agents you can run concurrently isn't the tool, but your review bandwidth.
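One way to respect that bottleneck is to have the loop refuse to start new agent work while too many results are waiting on you. This gate is a hypothetical sketch, not a feature of either tool; `max_pending_reviews` is whatever number of open results you can genuinely review well.

```python
class ReviewGate:
    """Hypothetical gate: agents may only start new work while the human review
    queue has room. The limit models your review bandwidth, not the tool's."""

    def __init__(self, max_pending_reviews: int) -> None:
        self.max_pending = max_pending_reviews
        self.pending: list[str] = []  # tasks started but not yet reviewed by you

    def try_start(self, task: str) -> bool:
        if len(self.pending) >= self.max_pending:
            return False  # you are the bottleneck: don't spawn another agent yet
        self.pending.append(task)  # occupies a review slot until you sign off
        return True

    def reviewed(self, task: str) -> None:
        self.pending.remove(task)  # frees a slot for the next agent
```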

Skills: So you don't have to re-explain the project every time

A skill exists so you don't have to re-explain the same project context from scratch every session, like a goldfish. Both tools use the same format: a folder with a SKILL.md holding instructions and metadata, plus optional scripts, references, and resource files. Codex runs a skill when you invoke it with $ or /skills, and also runs it automatically when your task matches the skill's description. That's why a tight, plain description often works better than a clever, flashy one. Claude Code does the same; I've written about this pattern in agent skills.

Skills are also where you stop paying for the same intent over and over. As I said in intent debt, agents start each session cold; wherever your intent has a gap, they fill it with confident guesses. A skill writes that intent down outside the session: project conventions, build steps, "we don't do this because of that earlier incident," all written once in a place the agent reads on every run. Without skills, every loop round re-derives your entire project from zero; with skills, knowledge compounds like interest.

One distinction: a skill is an authoring format, a plugin is a distribution method. When you want to share a skill across multiple repos or bundle several together, you package them as a plugin. Codex and Claude Code both work this way.

Plugins and connectors: Letting the loop touch your real tools

A loop that only sees the filesystem is a very small loop. Connectors, built on MCP, let the agent read your issue tracker, query databases, call staging APIs, or post in Slack. Both Codex and Claude Code support MCP, so a connector you write for one often works in the other. Plugins package connectors and skills together, letting your teammates install a complete setup at once instead of rebuilding it from memory.

This is the difference between "an agent tells you 'here's the fix'" and "a loop opens the PR itself, links the Linear ticket, and notifies the channel when CI passes." Connectors matter because they let the loop act within your real environment, not just tell you "if I could, I would."
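The difference is easy to see in code. In this sketch the `Connector` interface, the action names (`open_pr`, `link`, `post`), and `FakeConnector` are all hypothetical; in practice each connector would be an MCP server for GitHub, Linear, or Slack. The in-memory fake lets you exercise the loop's logic without touching real systems.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Connector(Protocol):
    """Hypothetical interface; real connectors would be MCP servers."""
    def call(self, action: str, **args: str) -> str: ...


@dataclass
class FakeConnector:
    """In-memory stand-in that records every call, for trying the loop offline."""
    name: str
    log: list[str] = field(default_factory=list)

    def call(self, action: str, **args: str) -> str:
        self.log.append(f"{action} {sorted(args.items())}")
        return f"{self.name}:{action}:{len(self.log)}"  # fake id for the created object


def ship_fix(github: Connector, linear: Connector, slack: Connector,
             branch: str, ticket: str, ci_passed: bool) -> str:
    """Open the PR, link the ticket, and announce only once CI is green."""
    pr = github.call("open_pr", branch=branch)
    linear.call("link", ticket=ticket, pr=pr)
    if ci_passed:
        slack.call("post", text=f"{pr} for {ticket} is green")
        return "announced"
    return "waiting_for_ci"
```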

Sub-agents: Keeping the maker away from the checker

Within a loop, one of the most useful structural choices is simply separating the "writer" from the "checker." The model that wrote the code is too generous when grading its own work. Another agent with different instructions, sometimes even a different model, can catch issues the first agent talked itself into overlooking.

Codex spawns subagents only when you ask; they run in parallel and merge their results back into a single answer. You can define your own agents in .codex/agents/ with TOML files: each has a name, description, instructions, and an optional model and reasoning effort. So your security reviewer can be a strong model with high reasoning effort, while your explorer can be a fast, read-only lightweight model. Claude Code achieves something similar through subagents and agent teams in .claude/agents/, letting multiple agents pass work among themselves. The most common division on both sides is: one agent explores, one implements, one verifies against the spec.

I've argued this point twice: once in code agent orchestra, and again in adversarial code review. It's especially important in a loop because the loop runs unattended, so a verifier you truly trust is the only reason you dare walk away. Subagents do consume more tokens, as each agent makes its own model calls and tool calls, so you should use them where a second opinion is worth the cost. This is also basically what Claude Code's /goal does under the hood: a fresh model judges if the loop is done, not the one that did the work. In other words, it applies the maker/checker separation to the stop condition itself.

What a Loop Looks Like

Putting this together, a single thread becomes a small control panel. Here's a structure I often use.

Every morning, an automation runs on the repo. Its prompt calls a triage skill, reads yesterday's CI failures, open issues, recent commits, and writes findings to a Markdown file or Linear board. For each issue worth handling, the thread opens an isolated worktree, dispatches a sub-agent to draft a fix, then a second sub-agent to review that fix against project skills and existing tests.

Connectors let this loop open PRs itself and update tickets. Anything the loop can't handle goes to the triage inbox for me. The status file is the spine of the whole system: it remembers what was tried, what passed, what remains undone. So the next morning's run picks up where today left off.
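Wired together, the morning run might look like the sketch below. Every callable is a hypothetical stand-in: `discover` for the triage skill, `worth_handling` for the decision about what the loop may take on alone, `draft` for the maker sub-agent working in its own worktree, `review` for the checker, and `open_pr` for a connector. `LoopState` plays the status file; a real loop would write it to Markdown or a Linear board between runs rather than keep it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    """In-memory stand-in for the status file: what passed, what needs you, what's left."""
    done: list[str] = field(default_factory=list)
    inbox: list[str] = field(default_factory=list)        # handed to a human
    open_issues: list[str] = field(default_factory=list)  # carried over to the next run


def morning_run(
    state: LoopState,
    discover: Callable[[], list[str]],        # triage skill: CI failures, issues, commits
    worth_handling: Callable[[str], bool],    # can the loop take this on by itself?
    draft: Callable[[str], str],              # maker sub-agent, in its own worktree
    review: Callable[[str, str], bool],       # checker sub-agent, different instructions
    open_pr: Callable[[str, str], None],      # connector: open the PR, update the ticket
) -> LoopState:
    # Merge newly discovered issues with whatever the previous run left open.
    known = set(state.done) | set(state.inbox) | set(state.open_issues)
    for issue in discover():
        if issue not in known:
            state.open_issues.append(issue)
            known.add(issue)

    still_open: list[str] = []
    for issue in state.open_issues:
        if not worth_handling(issue):
            state.inbox.append(issue)      # outside what the loop should do alone
            continue
        patch = draft(issue)
        if review(issue, patch):
            open_pr(issue, patch)
            state.done.append(issue)
        else:
            still_open.append(issue)       # rejected by the checker: retry next run
    state.open_issues = still_open
    return state
```

Issues that fail review stay open and are retried on the next run; anything the loop decides not to handle goes to the inbox for you.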

Notice what you're actually doing. You designed the loop once; you did not prompt any of those steps yourself, one at a time. This is the real-world version of Steinberger's statement. And the same loop can run in Codex or Claude Code because both tools share the same components.

What Loops Still Won't Do For You

Loops change how work is done, but they don't remove you from the work. In fact, as loops get stronger, three problems become sharper, not easier.

Verification still depends on you. A loop running unattended can also be making mistakes unattended. The reason you separate the verifier sub-agent from the maker is to make the loop's statement of "done" somewhat meaningful. Even so, "done" is still a claim, not proof. I keep repeating the same line in code review in the age of AI: your job is to ship code you have confirmed works.

If left unchecked, your own understanding still rots. The faster loops ship code you didn't personally write, the faster the gap grows between what you actually understand and what's actually in the system. This is comprehension debt. A smooth loop just lets that debt accumulate faster if you don't read its output.

And yes, the most comfortable posture is likely the most dangerous. When loops run themselves, it's easy to stop forming your own judgment and just accept whatever they return. I call this cognitive surrender. If you design a loop with judgment, it's the antidote; if you design a loop to avoid thinking, it's the accelerant. The same action yields completely opposite results.

Build Loops, But Still Be an Engineer

I think this signals how our work will evolve. That said, if I stop reviewing code myself, or rely entirely on automated loops to fix it, my product quality suffers. I'd likely end up in a downward spiral, digging myself deeper.

So, by all means, go build your loops. But don't forget that directly prompting your agent is still effective. The key is finding the right balance.

Loop outcomes will also vary by person. Two people can build identical loops with opposite results. One uses it to accelerate work they deeply understand; the other uses it to avoid understanding the work itself. The loop doesn't know the difference. You do.

That's why loop design is harder than prompt engineering, not easier. Cherny's point isn't that the work gets easier; it's that the leverage point shifts.

Build loops. But build them like someone who still intends to be an engineer, not like someone whose only job is to press "start."

Related Questions

Q: What is Loop Engineering in the context of AI programming, and what is its core purpose?

A: Loop Engineering refers to designing systems where AI agents are automatically prompted and managed within a continuous workflow, rather than manually writing prompts for each step. Its core purpose is to create a recursive loop that can autonomously discover tasks, assign them, check results, track progress, and decide on next steps, thereby amplifying developer leverage while incorporating human judgment into the system design.

Q: According to the article, what are the five key components of a loop in Loop Engineering?

A: The five key components are: 1. Automations (scheduled discovery and triage of tasks), 2. Worktrees (isolated environments for parallel development), 3. Skills (documented project knowledge and team conventions), 4. Plugins/Connectors (integration with real tools like GitHub, Linear, Slack), and 5. Sub-agents (separating executors from reviewers). Additionally, an external memory layer (like Markdown files or Linear boards) is needed to preserve state.

Q: How do 'Skills' and 'Sub-agents' specifically contribute to the effectiveness of a loop?

A: 'Skills' store project-specific knowledge and conventions externally, allowing agents to avoid cold starts and repetitive explanations, thus enabling loops to build cumulative understanding. 'Sub-agents' separate the roles of making (e.g., drafting code) and checking (e.g., reviewing against specifications), which reduces bias and improves reliability, especially in unattended operations, by having different agents or models validate each other's work.

Q: What are the potential risks or challenges associated with using Loop Engineering, as mentioned in the article?

A: Key risks include: 1. Verification still ultimately depends on human engineers; loops can make unsupervised mistakes. 2. 'Comprehension debt' can grow if developers don't actively understand the code produced by loops. 3. 'Cognitive surrender', where developers stop forming their own judgments and blindly accept loop outputs, can lead to a dangerous downward spiral in code quality and system understanding.

Q: How does the article suggest the role of an engineer evolves with the adoption of Loop Engineering?

A: The article suggests that the engineer's role shifts from manual prompt writing and step-by-step task management to designing reliable, verifiable, and sustainable agent workflows. The key competency becomes loop design, embedding judgment into system architecture, rather than just crafting good prompts. However, engineers must still actively review outputs, maintain comprehension, and avoid using loops as an excuse to disengage from understanding the code and system.
