Why More AI Agents Do Not Equal Higher Productivity

marsbit | Published 2026-05-31 | Updated 2026-05-31

Introduction


Editor's Note: As AI Agents become cheaper and easier to invoke, software development is entering a new phase: the problem is no longer whether we can launch more Agents, but whether humans still have enough attention to manage, judge, and integrate their outputs.

This article introduces a very thought-provoking concept: the "orchestration tax." The cost of launching an Agent is low, requiring just a prompt or a click. But the subsequent steps are truly expensive: checking whether the result is correct, understanding its impact on the system architecture, handling conflicts between different Agents, and finally deciding which code can be merged into the main branch. This work cannot simply be parallelized; it still returns to the same serial resource: human judgment.

The author compares developers to the "GIL" in an AI Agent system: the global lock that lets only one thread execute at a time and thereby caps the throughput of a concurrent system. Multiple Agents can run simultaneously, but as soon as their work reaches architectural judgment, code review, or conflict resolution, it must pass back through the developer's brain. Thus, more Agents don't necessarily mean higher output; they may just create a longer queue of tasks awaiting review, pushing the developer into more frequent context switching and cognitive fatigue.

This is also a point easily overlooked in the current wave of AI programming tools: a sense of efficiency is not always synonymous with real productivity. A dashboard filled with running Agents creates an illusion of "high productivity"; but if the developer doesn't truly understand, review, and integrate these changes, what the system ultimately accumulates may not be productivity, but technical debt and cognitive debt.

Therefore, the real discussion here is not "how to use more Agents," but "how to redesign workflows around human attention." In the age of Agents, the key skill is not just knowing how to ask questions or delegate tasks, but knowing which tasks can be handled in parallel by machines and which must be reserved for human judgment; knowing when to batch reviews and when to stop orchestrating to refocus on a core problem.

AI is expanding the concurrency of software production, but human attention remains the system's scarcest, non-replicable resource. A truly mature Agent workflow doesn't throw every task at the machine; it deliberately designs its own attention architecture, much as one would design a production system.

Here is the original text:

It's now very easy to launch more AI Agents. But having more Agents running simultaneously does not mean "you" have multiplied. Your cognitive bandwidth cannot be parallelized. All the judgment truly needed to guide them, evaluate results, and merge changes must ultimately pass through the same serial processor—you yourself.

The so-called "orchestration tax" is essentially the price you pay for forgetting this. And the only real solution is to start designing your own attention, just as you would design any concurrent system.

I recently participated in a roundtable discussion at Google I/O with Richard Seroter, Aja Hammerly, and Ciera Jaspan, talking about the current state and future evolution of software engineering. Near the end, Richard asked us: What's the one thing developers should take away from this and change?

I shared a point I've been pondering repeatedly these past months: feeling busy is not the same as being productive. You can run 20 Agents simultaneously and feel incredibly busy. But that doesn't mean you've delivered the output of 20 Agents.

Earlier in that conversation, Richard gave this problem a name. He said, "What you're describing is essentially the orchestration tax. You cannot successfully manage 20 Agents in your own head."

He was absolutely right. I want to unpack this concept more fully because this isn't a discipline problem; it's an architecture problem.

There was a line I almost casually uttered during that roundtable that has stuck with me since: Running multiple Agents does not mean there is another you in the world.

The Unaccounted-For Asymmetry

There is a hidden asymmetry in Agent workflows.

Launching an Agent is very cheap. You just press a key or write a prompt. But closing the loop on an Agent is not cheap at all. Someone must check if its returned result is correct and reconcile it with changes made by other Agents.

That someone is you. And there is only one of you.

Last month, I wrote about part of this problem in "Your Parallel Agent Limit," mainly discussing the ambient anxiety of not knowing which parallel thread is quietly failing. This article aims to discuss the structure behind this cost.

When you start viewing Agent development as a concurrent system, you realize that the human is just a component in that system. A very slow, serial component.

You Are the Single-Threaded Resource

If you've written concurrent code, you already possess the intuition to understand this problem. You've just been applying this intuition in the wrong place.

Python (specifically CPython, the reference interpreter) has a Global Interpreter Lock, or GIL. You can create as many threads as you want, but only one thread can execute Python bytecode at any given time, because each must first acquire this lock.

You are the GIL for your AI Agents.

They can all run concurrently. But whenever their work requires a genuine understanding of the system architecture, or a merge conflict needs resolving, they must acquire that lock first. And there is only one such lock, held by you.
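To make the analogy concrete, here is a toy Python simulation. It is an illustration under stated assumptions, not a model of any real agent framework: each hypothetical agent "works" in parallel via `time.sleep`, then must acquire a single `threading.Lock` that stands in for the human reviewer. The names `run_agent` and `simulate` and the timing values are made up for illustration.

```python
import threading
import time

# The one human reviewer: only one thread can hold it at a time (the "GIL").
reviewer_lock = threading.Lock()

def run_agent(agent_id: int, work_s: float, review_s: float, log: list) -> None:
    """Simulate one agent: concurrent generation, then serialized human review."""
    time.sleep(work_s)  # generation overlaps freely across agents
    with reviewer_lock:  # judgment must go through the single lock
        start = time.monotonic()
        time.sleep(review_s)  # review time cannot overlap with other reviews
        end = time.monotonic()
        log.append((agent_id, start, end))

def simulate(n_agents: int, work_s: float = 0.05, review_s: float = 0.02):
    """Run n_agents concurrently; return wall-clock seconds and the review log."""
    log: list = []
    threads = [
        threading.Thread(target=run_agent, args=(i, work_s, review_s, log))
        for i in range(n_agents)
    ]
    t0 = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - t0, log

if __name__ == "__main__":
    for n in (1, 4, 8):
        elapsed, _ = simulate(n)
        print(f"{n} agents: {elapsed:.2f}s wall clock")
```

With these assumed timings, wall-clock time grows roughly as `work_s + n_agents * review_s`: the generation phase overlaps, but the reviews queue up behind the one lock.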

Amdahl's Law states this precisely: the maximum speedup from parallelization is bounded by the portion of the work that must still be done serially. If a large part of your process cannot be parallelized, then no matter how many cores you throw at it, you will eventually hit a hard ceiling.
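In its usual form, with p the parallelizable fraction of the work and N the number of parallel workers, Amdahl's Law gives speedup(N) = 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p). The sketch below applies it to a hypothetical split in which 60% of the effort is agent-parallelizable generation and 40% is serial human judgment; that split is an assumption chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work benefits
    from `workers`-way parallelism and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

def amdahl_ceiling(parallel_fraction: float) -> float:
    """Upper bound on speedup as workers -> infinity: 1 / serial fraction."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0 else 1.0 / serial

if __name__ == "__main__":
    p = 0.6  # hypothetical: 40% of the effort is serial human judgment
    for n in (1, 2, 4, 8, 20):
        print(f"{n:>2} agents -> {amdahl_speedup(p, n):.2f}x")
    print(f"ceiling -> {amdahl_ceiling(p):.2f}x")
```

Under this assumed split, 20 agents yield only about a 2.3x speedup, and no number of agents can get past 2.5x.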

In Agent development, that serial portion is judgment.

Launching 8 Agents does not accelerate your judgment time. It only makes the queue waiting for you longer.

This is a very old fact in performance engineering, yet many are still surprised by it: Optimizing a non-bottleneck part does not increase overall throughput. You're just piling up more unfinished work in front of the bottleneck.

Adding Agents optimizes the part that was never the constraint. The real constraint is the review phase, and the system's overall throughput is exactly equal to that phase's throughput.
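The claim that overall throughput equals the review phase's throughput can be checked with a tiny two-stage pipeline model. This is a sketch under simplifying assumptions: hypothetical hourly rates, a single review stage with fixed capacity, and every produced change needing review. `simulate_backlog` is an illustrative name, not something from the article.

```python
from typing import Tuple

def simulate_backlog(agents: int, outputs_per_agent_hr: float,
                     reviews_per_hr: float, hours: int) -> Tuple[float, float]:
    """Two-stage pipeline: agents produce changes, one human reviews them.
    Returns (total merged, unreviewed backlog at the end)."""
    backlog = 0.0
    merged = 0.0
    for _ in range(hours):
        backlog += agents * outputs_per_agent_hr  # production stage
        done = min(backlog, reviews_per_hr)       # review stage is capped
        backlog -= done
        merged += done
    return merged, backlog

if __name__ == "__main__":
    for n in (2, 8, 20):
        merged, left = simulate_backlog(n, 1, 4, 8)
        print(f"{n:>2} agents: merged {merged:.0f}, backlog {left:.0f}")
```

With a review capacity of 4 changes per hour over 8 hours, 8 agents and 20 agents both merge 32 changes; the only difference is that the 20-agent run ends with a backlog of 128 instead of 32.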

The orchestration tax is the structural gap between Agent production capacity and what you can actually merge. It happens when you task a single-threaded resource with managing a concurrent system.

Pushing Harder Doesn't Solve Structural Limits

During that roundtable, I said something: my tools have never felt so efficient, and I have never felt so exhausted.

Both feelings are completely real, and they stem from the same reason.

This exhaustion has a very specific source: it's the feeling of keeping a serial processor at 100% utilization with no slack.

Every time you check back on an Agent that has left your sphere of attention, you pay a context-switching cost. You must flush your brain and reload another context from scratch.

A CPU can do this in microseconds, and architects still try to avoid frequent switching. It takes you minutes, and you can never perfectly restore context.

Five Agents are not simply five times one Agent's workload. They mean five cold-start context reloads, plus a background process in your head that keeps worrying about which Agent you should be checking now.

You cannot solve a structural limit by "trying harder." This tax will always be paid.

If you try to brute-force it, it will eventually manifest in another form: either code reviews become increasingly shallow, or you enter a state of "cognitive surrender"—because forming your own judgment is too taxing, you simply accept whatever code the Agent wrote.

You either pay this tax consciously, or you let it slowly erode your understanding of your system in the dark.

Design Your Attention Like a System

So, you must treat your attention as a scarce serial resource.

You wouldn't design a distributed system without considering bottlenecks. Give your brain the same respect.

Here are some methods that have genuinely worked for me:

Scale your Agent team according to review capacity, not UI capacity.

A good concurrent system uses backpressure mechanisms to prevent queues from growing indefinitely. Producers must slow down to match the consumer's processing capability.

Your Agents are the producers; your review capacity is the consumer. The right number of parallel Agents is the number whose output you can seriously review. For most people, that is a low single-digit number.
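In Python, the standard-library `queue.Queue(maxsize=...)` provides exactly this kind of backpressure: `put()` blocks while the queue is full. The sketch below uses threads as stand-ins for agents and a single consumer thread for the human reviewer; `REVIEW_CAPACITY` and the helper names are hypothetical.

```python
import queue
import threading
import time
from typing import List

REVIEW_CAPACITY = 3  # hypothetical: max unreviewed results allowed to pile up

def agent(q: queue.Queue, agent_id: int, n_tasks: int) -> None:
    """Producer: an agent emitting changes. put() blocks while the queue is full."""
    for i in range(n_tasks):
        q.put(f"agent{agent_id}-change{i}")  # backpressure happens here

def reviewer(q: queue.Queue, reviewed: List[str], peak: List[int]) -> None:
    """Single consumer: the human, reviewing one change at a time."""
    while True:
        peak[0] = max(peak[0], q.qsize())  # deepest queue observed
        item = q.get()
        if item is None:  # sentinel: all agents are done
            break
        time.sleep(0.01)  # stand-in for a careful review
        reviewed.append(item)

def run(n_agents: int = 6, tasks_per_agent: int = 3):
    q: queue.Queue = queue.Queue(maxsize=REVIEW_CAPACITY)
    reviewed: List[str] = []
    peak = [0]
    consumer = threading.Thread(target=reviewer, args=(q, reviewed, peak))
    consumer.start()
    producers = [
        threading.Thread(target=agent, args=(q, a, tasks_per_agent))
        for a in range(n_agents)
    ]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(None)  # tell the reviewer no more work is coming
    consumer.join()
    return reviewed, peak[0]

if __name__ == "__main__":
    done, deepest = run()
    print(f"reviewed {len(done)} changes; queue depth never exceeded {deepest}")
```

Adding more agent threads here does not grow the pile of unreviewed work; the extra agents simply wait, which is the behavior the article asks of your Agent count.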

AI tools will happily let you launch 20 Agents, but that's a UI feature, not an indication of your actual management capacity.

Categorize tasks.

When Richard asked me how I handle this, I mentioned this method. I separate tasks into two piles.

The first pile is relatively independent work I'm willing to delegate to Agents running in the cloud background. These tasks can be executed asynchronously and usually only require a final check from me.

The second pile is complex tasks where the real work *is* judgment, such as a weird bug or an architectural design.

The biggest mistake is trying to parallelize this second category as well. Running several complex tasks in parallel doesn't expand your output; it just creates heavy contention for that lock and ultimately degrades every result.
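The two-pile split can be written down as a simple routing rule. This is a minimal sketch: the `Task` fields `independent` and `judgment_heavy` are hypothetical labels you would assign yourself, not attributes any Agent tool provides, and anything ambiguous defaults to the serial pile.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Task:
    name: str
    independent: bool     # hypothetical flag: little overlap with other in-flight work
    judgment_heavy: bool  # hypothetical flag: e.g. a weird bug or an architecture decision

def split_piles(tasks: List[Task]) -> Tuple[List[Task], List[Task]]:
    """Pile 1: delegate to background agents and check at the end.
    Pile 2: keep for yourself and work through one at a time."""
    delegate: List[Task] = []
    serial: List[Task] = []
    for task in tasks:
        if task.independent and not task.judgment_heavy:
            delegate.append(task)
        else:
            # When in doubt, keep it serial rather than contend for the lock.
            serial.append(task)
    return delegate, serial

if __name__ == "__main__":
    backlog = [
        Task("rename a config flag", independent=True, judgment_heavy=False),
        Task("debug a flaky race condition", independent=False, judgment_heavy=True),
        Task("redesign the storage layer", independent=True, judgment_heavy=True),
    ]
    to_delegate, to_keep = split_piles(backlog)
    print("delegate:", [t.name for t in to_delegate])
    print("serial, one at a time:", [t.name for t in to_keep])
```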

Batch reviews.

Each context switch costs you dearly. Sitting down to review the results of 4 Agents in one go is much cheaper than checking one, doing something else, and then cold-starting again for the next.

Give your Agents a longer leash. Let work accumulate a bit, then process it as a batch.
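A back-of-the-envelope cost model makes the batching argument concrete. All numbers and parameter names are assumptions: an interleaved check pays a full cold-start reload (`reload_min`) every time, while a batch pays it once and then only a smaller hop between related results (`hop_min`).

```python
def review_cost(n: int, review_min: float, reload_min: float,
                batched: bool, hop_min: float = 2.0) -> float:
    """Minutes spent reviewing n agent results under a simple, assumed model."""
    if n <= 0:
        return 0.0
    if batched:
        # One cold start, then cheap hops between results already in context.
        return reload_min + (n - 1) * hop_min + n * review_min
    # Interleaved: every check is a fresh cold start.
    return n * (reload_min + review_min)

if __name__ == "__main__":
    for mode in (False, True):
        label = "batched" if mode else "interleaved"
        print(f"{label}: {review_cost(4, 10, 15, batched=mode):.0f} min")
```

With 4 results, 10 minutes of actual review each, and a 15-minute reload, this model gives 100 minutes interleaved versus 61 minutes batched; the saving comes entirely from paying the reload once.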

Use that lock only for judgment.

Don't waste your brain on things a machine can verify on its own. Have Agents back up their work with machine-checkable evidence, such as passing tests or screenshots.

Let them prove the dull but verifiable 80% themselves. Then your scarce attention only needs to cover the 20% that truly requires human judgment.
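One way to reserve the lock for judgment is an automated gate in front of your review queue: changes reach you only after every machine-verifiable check passes, and failures go straight back to the Agent with the reasons. This is a minimal sketch; the change fields (`tests_passed`, `screenshot`) and check names are hypothetical placeholders for whatever your CI actually reports.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[dict], bool]

def triage(changes: List[dict],
           checks: Dict[str, Check]) -> Tuple[List[dict], List[Tuple[dict, List[str]]]]:
    """Run machine-verifiable checks first. Only changes passing all of them
    reach the human queue; the rest are bounced with the failed check names."""
    human_queue: List[dict] = []
    bounced: List[Tuple[dict, List[str]]] = []
    for change in changes:
        failed = [name for name, check in checks.items() if not check(change)]
        if failed:
            bounced.append((change, failed))  # back to the agent, no human time spent
        else:
            human_queue.append(change)        # worth spending judgment on
    return human_queue, bounced

if __name__ == "__main__":
    checks: Dict[str, Check] = {
        "tests_pass": lambda c: bool(c.get("tests_passed", False)),
        "has_screenshot": lambda c: c.get("screenshot") is not None,
    }
    incoming = [
        {"id": 1, "tests_passed": True, "screenshot": "login.png"},
        {"id": 2, "tests_passed": False, "screenshot": None},
    ]
    ready, rejected = triage(incoming, checks)
    print("for human review:", [c["id"] for c in ready])
    print("bounced:", [(c["id"], why) for c, why in rejected])
```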

Protect your serial time.

The bottleneck needs your best time, not the leftover scraps between Agent checks.

Sometimes, the highest-leverage action is to completely stop orchestrating: turn off the computer filled with Agents, focus solely on thinking about one problem, and hold that lock firmly throughout the entire process.

Orchestration is not the real work. It's just the overhead generated around the work.

Aja pointed out that architectural ability has become the most urgent skill now: you need to know what tasks fit into an Agent and what tasks are too big for it.

I'd add: You yourself are also a component in this system. Your attention has a known, low serial throughput. The system either respects this number, or it will bypass it by quietly lowering your standards.

Busy Does Not Equal Productive

This point is crucial because this failure mode is almost invisible to you personally.

Twenty running Agents give you a feeling of "productivity explosion." The dashboard is full, everything is moving. But this feeling has become decoupled from actually merging high-quality code into the main branch.

You can be busy to the limit yet produce almost nothing real. From the inside, these two states feel almost identical.

Ciera mentioned Margaret-Anne Storey's research on debt. We talked about technical debt and cognitive debt.

Unpaid orchestration tax makes you accumulate both simultaneously.

You merge things you haven't read carefully. Your mental model of the codebase becomes completely outdated. These problems won't appear on the dashboard today. They'll surface when production breaks—when you look at the system and suddenly realize you no longer understand how it actually works.

So, the real conclusion is: Launching Agents is not a capability. Anyone can run 20.

The real capability is designing the system around the one resource that cannot be cloned or parallelized.

That resource is your attention.

Design it as you would design any key component your production environment depends on.

Related Questions

Q: What is the core argument of the article regarding AI Agents and productivity?

A: The core argument is that simply running more AI Agents does not automatically translate into higher productivity. While Agents can perform tasks in parallel, the final, crucial work of reviewing, judging, integrating, and architecting their outputs requires a single, serial resource: human attention and cognitive bandwidth. Left unmanaged, adding Agents incurs an 'orchestration tax': longer review queues, context-switching fatigue, and the accumulation of technical and cognitive debt, which ultimately caps real throughput.

Q: What is the 'orchestration tax' as described in the article?

A: The 'orchestration tax' is the hidden cost incurred when you forget that your cognitive bandwidth is a serial bottleneck. It is the structural gap between the parallel production capacity of multiple AI Agents and your actual ability to properly review, understand, judge, and merge their work. This tax shows up as mental fatigue, shallow code reviews, an eroding understanding of the system, and the accumulation of technical and cognitive debt.

Q: How does the article use the concept of a 'GIL' to explain the human role in an AI Agent system?

A: The article compares the human developer to the Global Interpreter Lock (GIL) in Python. Just as the GIL allows only one thread to execute Python bytecode at a time despite multiple threads existing, the human developer is the single-threaded resource (the 'lock') in an Agent system. While multiple Agents can run in parallel, any task requiring architectural judgment, code review, or conflict resolution must pass through this single, serial 'lock': the developer's brain. This creates the fundamental bottleneck for system throughput.

Q: According to the article, what are some practical strategies for managing one's attention as a scarce resource when working with AI Agents?

A: Key strategies include: 1) Scaling the number of Agents to your review capacity, not UI capacity. 2) Categorizing tasks into independent ones for async Agents and complex, judgment-heavy ones for focused human work. 3) Batch reviewing Agent outputs to minimize costly context switches. 4) Having Agents prove the 80% of verifiable work, reserving human judgment for the critical 20%. 5) Protecting dedicated, uninterrupted 'serial time' for deep thinking and complex problem-solving, sometimes by stopping orchestration altogether.

Q: What is the dangerous disconnect the article warns about between the feeling of productivity and actual output?

A: The article warns that a dashboard full of simultaneously running AI Agents creates an illusion of high productivity ('feeling busy'). However, this feeling is dangerously disconnected from the actual output of merging high-quality, well-understood code into the main branch. One can be extremely busy managing numerous Agents yet produce very little real, sustainable value. This failure mode is hard to detect internally and leads to the silent accumulation of technical and cognitive debt, which only becomes apparent when systems fail or become incomprehensible.
