Slow Down: The Answer in the Age of Agents

marsbit · Published on 2026-03-29 · Last updated on 2026-03-29

Abstract

In the era of generative AI, the software industry is shifting from amazement to efficiency anxiety. However, as coding agents are increasingly used in production, issues like amplified errors, uncontrolled complexity, and reduced system reliability emerge. The author argues that agents lack human-like learning from mistakes and, without proper bottlenecks and feedback, minor issues quickly escalate. Their limited perspective and low recall in complex codebases further worsen structural chaos. The core problem isn’t the technology but the premature surrender of human judgment and control driven by anxiety. Instead of fully outsourcing work to agents, the author advocates for a balanced approach: assign agents localized, well-defined tasks while retaining human oversight for system design, quality assurance, and critical decisions. Slowing down becomes a strength—it ensures understanding, enables informed trade-offs, and maintains control. Ultimately, what’s scarce in the AI age isn’t faster code generation but the judgment to manage complexity and the discipline to choose quality over speed.

Editor's Note: As generative AI rapidly integrates into software engineering, industry sentiment is shifting from "awe at capabilities" to "anxiety about efficiency." Not coding fast enough, not using AI enough, not automating thoroughly enough: each creates pressure not to be left behind. But as coding Agents truly enter production environments, more practical issues emerge: errors are amplified, complexity spirals out of control, systems become increasingly incomprehensible, and efficiency gains do not translate proportionally into quality improvements.

Based on firsthand practice, this article offers a sober reflection on the current "agentic coding" frenzy. The author points out that Agents do not learn from mistakes like humans do; without bottlenecks and feedback mechanisms, minor issues are rapidly magnified. Furthermore, in complex codebases, their local perspective and limited recall capabilities exacerbate the chaos of the system structure. The essence of these problems lies not in the technology itself, but in humans, driven by anxiety, prematurely relinquishing judgment and control.

Therefore, rather than succumbing to the anxiety of "must we fully embrace AI," it's better to recalibrate the relationship between humans and tools: let Agents handle local, controllable tasks, while firmly keeping system architecture, quality control, and key decision-making in our own hands. In this process, "slowing down" becomes a capability—it means you still understand the system, can make trade-offs, and still retain a sense of control over your work.

In an era of constantly evolving tools, what is truly scarce might not be faster generation capabilities, but the judgment to handle complexity and the fortitude to make choices between efficiency and quality.

The original text follows:

About a year ago, coding Agents that could genuinely help you "complete entire projects from start to finish" began to appear. Earlier tools like Aider and the early Cursor existed, but they were more like assistants than "agents." The new generation of tools is extremely attractive, and many people spent a lot of their free time doing all those projects they always wanted to do but never had time for.

I think that's fine in itself. Working on things in your free time is inherently enjoyable, and most of the time you don't really need to worry about code quality and maintainability. It also gives you a path to learn new tech stacks.

During the Christmas holidays, Anthropic and OpenAI even gave out some "free credits," sucking people in like a slot machine. For many, this was the first real experience of the magic of "Agents writing code." More and more people got involved.

Now, coding Agents are also starting to enter production codebases. Twelve months on, we are beginning to see the consequences of this "progress." Here are my current thoughts.

Everything is Broken

While this is mostly anecdotal, software today feels fragile and ready to break. 98% availability is becoming the norm rather than the exception, even for large services. User interfaces are filled with outrageous bugs, the kind a QA team should catch at a glance.

I admit this situation existed before Agents appeared. But now, the problem is clearly accelerating.

We can't see what's happening inside companies, but occasionally information leaks out, like the rumored "AI-induced AWS outage." Amazon Web Services was quick to "correct" the story, but then immediately launched a 90-day remediation plan internally.

Satya Nadella, Microsoft's CEO, has also recently emphasized that more and more of the company's code is written by AI. While there's no direct evidence, there is a widespread feeling that Windows quality is declining, and even Microsoft's own blog posts seem to tacitly acknowledge it.

Companies that claim "100% of the product code is AI-generated" almost always ship the worst products you can imagine. No offense, but memory leaks measured in GB, chaotic UIs, incomplete features, frequent crashes... these are hardly the "quality endorsements" they think they are, let alone positive examples of "letting the Agent do everything for you."

Privately, you hear more and more people, from both large companies and small teams, saying the same thing: they have been backed into a corner by "Agent-written code." No code reviews, design decisions handed to Agents, features nobody needs piled on: the outcome is predictably bad.

Why We Shouldn't Use Agents This Way

We have almost abandoned all engineering discipline and subjective judgment, instead falling into an "addictive" way of working: the sole goal is to generate the most code in the shortest time, with no consideration for the consequences.

You're building an orchestration layer to command an army of automated Agents. You install Beads, completely unaware that it's essentially malware that is almost impossible to uninstall. Just because the internet says "everyone is doing it." If you don't, you're "not gonna make it" (ngmi).

You're consuming yourself in a constant "recursive loop."

Look—Anthropic used a group of Agents to make a C compiler. It has problems now, but the next-gen model will fix it, right?

Look again—Cursor used a large group of Agents to make a browser. It's basically unusable now and needs manual intervention from time to time, but the next-gen model will handle it, right?

"Distributed," "divide and conquer," "autonomous systems," "lights-out factory," "solving software in six months," "SaaS is dead, my grandma just built a Shopify with Claw"...

These narratives sound exciting.

Sure, this approach might "still work" for your side project that almost no one uses (including yourself). Maybe, just maybe, there exists a genius who can use this method to create a non-garbage, actually-used software product. If you are that person, I sincerely admire you.

But at least in my circle of developer acquaintances, I haven't seen a case where this method actually works. Of course, maybe we're all just too incompetent.

Errors Compound: No Learning, No Bottlenecks, and Delayed Explosions

The problem with Agents is that they make mistakes. That's fine in itself; humans make mistakes too. They might be correctness errors, easy to identify and fix, where adding a regression test keeps them from coming back. Or they might be code smells that linters can't catch: an unused method here, a questionable type there, some duplicate code, and so on. Individually, these are harmless; human developers make these minor mistakes too.

But "machines" are not people. After making the same mistake a few times, humans usually learn not to repeat it—either scolded into awareness or through genuine process improvement.

Agents lack this learning capability, at least by default. They will repeat the same mistakes over and over, and might even "create" wonderful combinations of different errors based on training data.

You can certainly try to "train" it: write rules in AGENTS.md telling it not to make this mistake; design a complex memory system for it to query historical errors and best practices. This can work for certain specific types of problems. But the prerequisite is—you must first observe it making this error.
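In practice, such rules often end up as short imperative entries in AGENTS.md. The excerpt below is invented purely to show the shape of the pattern; the rules and the file path they mention are hypothetical, not recommendations:

```markdown
<!-- AGENTS.md: illustrative excerpt, hypothetical rules -->

## Known failure modes (add an entry only after observing the mistake)

- Do not create a new HTTP client; reuse the existing wrapper in `utils/http.py`.
- Never widen parameter types to `Any` just to silence the type checker.
- Before writing a new helper, search for an existing implementation first.
```

Note the parenthetical in the heading: every entry in such a file is reactive, written only after the error has already been paid for at least once.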

The more critical difference is: humans are a bottleneck, Agents are not.

A human cannot spit out twenty thousand lines of code in a few hours. Even with a non-trivial error rate, only a limited number of errors can be introduced per day, and their accumulation is slow. Usually, when the "pain from errors" accumulates to a certain level, humans (instinctively averse to pain) will stop to fix them. Or the person is replaced, and someone else fixes it. In short, problems get handled.

But when you use a whole orchestrated army of Agents, there is no bottleneck and no "pain sensation." These originally trivial minor errors compound at an unsustainable rate. You have been removed from the loop, unaware that these seemingly harmless small issues have grown into a behemoth. By the time you truly feel the pain, it's often too late.
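To put illustrative numbers on it (the rates here are assumptions, not measurements): one developer writing 200 lines a day at one defect per 100 lines introduces roughly 2 defects daily, each felt soon after it lands. An orchestrated fleet emitting 20,000 lines a day at the same defect rate introduces roughly 200, two orders of magnitude more, while your capacity to notice them has not grown at all.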

Until one day, you want to add a new feature and discover that the current architecture (essentially a pile of errors) cannot support the change; or users start complaining frantically because the latest release is broken, or has even lost their data.

That's when you realize: you can no longer trust this code.

Worse, the thousands of unit tests, snapshot tests, and end-to-end tests you had the Agent generate are also no longer trustworthy. The only way left to determine if "the system is working properly" is manual testing.

Congratulations, you've screwed yourself (and the company).

Purveyors of Complexity

You have completely lost track of what's happening in the system because you handed control to the Agent. And Agents, by nature, are "purveyors of complexity." They have seen tons of terrible architectural decisions in their training data, and these patterns are reinforced during their RL process. Letting them design the system leads to predictable results.

What you end up with is: an extremely complex system, a mishmash of poor imitations of "industry best practices," which you failed to constrain before the problems got out of hand.

But the problem goes further. Your Agents do not share execution context with each other, cannot see the entire codebase, and do not understand the decisions you or other Agents made previously. Therefore, their decisions are always "local."

This directly leads to the problems mentioned earlier: massive code duplication, structures abstracted for abstraction's sake, various inconsistencies. These problems compound, eventually forming an irredeemably complex system.

This is actually very similar to human-written enterprise codebases. Except that kind of complexity is usually the result of years of accumulation: the pain is distributed across many people, no single person reaches the "must fix" breaking point, and the organization itself has high tolerance, so complexity "co-evolves" with the organization.

But in a human + Agent combination, this process is greatly accelerated. Two people, plus a bunch of Agents, can reach this level of complexity in weeks.

Agentic Search Has Low Recall

You might pin your hopes on the Agent to "clean up the mess," to help you refactor, optimize, and clean the system. But the problem is: they can't do it anymore.

Because the codebase is too large, the complexity too high, and they can only ever see locally. This isn't just about the context window being too small, or long-context mechanisms failing against millions of lines of code. The problem is more subtle.

Before the Agent attempts to fix the system, it must first find all the code that needs modification, as well as existing implementations that can be reused. This step is called agentic search.

How the Agent does this depends on the tools you give it: it could be Bash + ripgrep, a queryable code index, an LSP service, a vector database...

But no matter the tool, the essence is the same: the larger the codebase, the lower the recall. And low recall means: the Agent cannot find all relevant code, and therefore cannot make correct modifications.

This is also why those minor "code smell" errors appeared in the first place; it didn't find the existing implementation, so it reinvented the wheel, introducing inconsistency. Eventually, these problems spread and compound, blooming into an extremely complex "flower of rot."
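A toy model of the recall failure, with the mini "codebase" and the grep-style tool invented for illustration (no real agent works exactly this way):

```python
# Toy model of agentic search recall: three files implement the same
# "retry with backoff" concept, but only one contains the word the
# Agent is likely to search for.
import re

codebase = {
    "net/client.py":  "def retry_request(req): ...",
    "jobs/runner.py": "def run_with_backoff(job): ...",  # same concept, different name
    "utils/http.py":  "def resilient_get(url): ...",     # same concept, different name
}
relevant = set(codebase)  # ground truth: all three files matter for the change

def agentic_search(pattern: str) -> set[str]:
    """Stand-in for a Bash + ripgrep style tool call."""
    return {path for path, src in codebase.items() if re.search(pattern, src)}

found = agentic_search(r"retry")
recall = len(found & relevant) / len(relevant)
print(f"found={found}, recall={recall:.2f}")  # recall=0.33: two files are missed
```

Every independently named reimplementation both lowers recall for the next query and was itself caused by a previous low-recall query; that is the compounding loop in miniature.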

So how do we avoid all this?

How We Should Collaborate with Agents (For Now)

Coding Agents are like sirens, luring you in with extremely fast code generation speed and that "intermittent yet occasionally stunning" intelligence. They can often complete simple tasks with astonishing speed and high quality. The real problems start when you get the idea—"This is so powerful, computer, do my work for me!"

There's nothing wrong with assigning tasks to Agents per se. Good Agent tasks typically have several characteristics: the scope can be well-defined, not requiring understanding of the entire system; the task is closed-loop, meaning the Agent can evaluate the result itself; the output is not on the critical path, just some temporary tool or internal software, not affecting real users or revenue; or you just need a "rubber duck" to aid thinking—essentially taking your ideas and colliding them with the compressed knowledge of the internet and synthetic data.

If these conditions are met, then it's a task suitable for an Agent, provided that you, the human, remain the final quality gatekeeper.

For example, using Andrej Karpathy's auto-research method to optimize application startup time? Great. But you must be clear that the code it spits out is absolutely not production-ready. Auto-research works because you give it an evaluation function, allowing it to optimize around a specific metric (like startup time or loss). But that evaluation function only covers a very narrow dimension. The Agent will blithely ignore every metric not in the evaluation function: code quality, system complexity, even correctness in some cases, if the evaluation function itself is flawed.
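A minimal sketch of an eval-driven loop of this kind, with every name hypothetical; this shows the shape of the technique, not Karpathy's actual code:

```python
# Sketch of an eval-driven optimization loop. The loop keeps whatever
# scores best on the single metric and is blind to everything else.
import random

def startup_time(candidate: str) -> float:
    """The evaluation function: lower is better. This is ALL the loop measures."""
    return random.uniform(0.5, 2.0)  # stand-in for actually timing a build

def propose_patch(best: str) -> str:
    """Stand-in for an Agent proposing a variant of the current best version."""
    return best + "'"

best, best_score = "baseline", startup_time("baseline")
for _ in range(100):
    candidate = propose_patch(best)
    score = startup_time(candidate)
    # Nothing here checks quality, duplication, complexity, or even
    # correctness: if the candidate scores better, it wins. That
    # narrowness is exactly what makes the output unfit for production.
    if score < best_score:
        best, best_score = candidate, score
print(f"best startup time: {best_score:.2f}s")
```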

The core idea is simple: let Agents do the boring things that don't teach you anything new, or the exploratory work you never had time to try. Then you evaluate the results, pick out the parts that are actually reasonable and correct, and complete the final implementation. Of course, you can also use an Agent for this final step.

But what I want to emphasize more is: really, slow down a bit.

Give yourself time to think about what you are actually doing and why. Give yourself a chance to say no: "No, we don't need this." Set a clear upper limit for the Agent: how much code it is allowed to generate per day, an amount that matches your actual ability to review it (a sketch of one way to enforce this follows the next paragraph). All the parts that determine the "overall shape" of the system, like architecture and APIs, should be written by hand. You can use autocomplete to keep the feel of handwritten code, or pair program with an Agent, but the key is: you must be in the code.

Because, writing code yourself, or watching it being built step by step, brings a sense of "friction." It is precisely this friction that makes you clearer about what you want to do, how the system works, and the overall "feel." This is where experience and "taste" come into play, and this is precisely what the most advanced models currently cannot replace. Slowing down, enduring a bit of friction, is exactly how you learn and grow.
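Circling back to the daily limit: one possible way to make it mechanical rather than aspirational is a small pre-merge check. This is a sketch under assumptions; the 400-line budget and the git-reflog-based counting are illustrative, not a prescribed tool:

```python
# Fail the merge when more lines were added in 24h than you can review.
import subprocess
import sys

DAILY_LINE_BUDGET = 400  # pick a number you can genuinely review end to end

# Lines added on the current branch over the last 24 hours (uses the reflog).
diff = subprocess.run(
    ["git", "diff", "--numstat", "@{1.day.ago}", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout

added = sum(
    int(line.split("\t")[0])
    for line in diff.splitlines()
    if line.split("\t")[0].isdigit()  # skip binary files, reported as "-"
)

if added > DAILY_LINE_BUDGET:
    sys.exit(f"{added} lines added in 24h exceeds the {DAILY_LINE_BUDGET}-line review budget")
print(f"{added} lines added in 24h: within budget")
```

Run it in CI or as a pre-merge hook; the exact number matters far less than the fact that you chose it deliberately to match your review capacity.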

In the end, what you get will be a system that is still maintainable—at least no worse than before Agents appeared. Yes, past systems weren't perfect either. But your users will thank you because your product is "usable," not a pile of slapped-together garbage.

You will do fewer features, but more correctly. Learning to say "no" is a capability in itself. You can also sleep soundly because you at least still know what's happening in the system; you still hold the initiative. It is this understanding that allows you to compensate for the recall problems of agentic search, making the Agent's output more reliable and requiring less patching.

When the system has problems, you can step in and fix it; when the design was flawed from the start, you can understand the issue and refactor it into a better form. Whether there's an Agent or not isn't really that important.

All of this requires discipline. All of this depends on people.

Related Questions

Q: What are the main risks of using AI coding agents in production environments without proper oversight?

A: The main risks include amplified errors due to lack of learning and feedback loops, uncontrolled complexity from poor architectural decisions, low recall in agentic search leading to inconsistencies, and eventual system unmaintainability. Without human bottlenecks, minor issues compound rapidly, making the codebase untrustworthy and difficult to modify.

Q: How does the author suggest humans should collaborate with coding agents effectively?

A: The author recommends using agents for bounded, non-critical tasks like exploratory work or automating tedious processes, while humans retain control over system design, architecture, and quality assurance. Humans should set limits on code generation, review all outputs, and maintain friction by writing core components themselves to ensure understanding and maintainability.

Q: Why do AI-generated systems often become overly complex and unmanageable?

A: Agents act as "purveyors of complexity," imitating poor architectural patterns from their training data and making localized decisions without a global view of the codebase. This results in redundant code, unnecessary abstractions, and inconsistencies. Without human intervention, these issues accumulate far more quickly than in human-driven development, creating an unmanageable system.

Q: What is the "agentic search" problem mentioned in the article?

A: Agentic search refers to an agent's ability to find and recall relevant code in a large codebase. As the system grows, recall drops significantly, causing agents to miss existing implementations, introduce duplicates, or make inconsistent changes. This low recall exacerbates system chaos and reduces the reliability of agent-generated code.

Q: What does the author mean by "slowing down" as a solution in the AI agent era?

A: "Slowing down" means prioritizing thoughtful decision-making, human oversight, and disciplined development over raw code generation speed. It involves saying "no" to unnecessary features, setting limits on agent output, and maintaining hands-on involvement in coding and design. This approach preserves system understanding, control, and quality, ultimately leading to more reliable and maintainable software.
