Agentic Design Patterns: A Book That Made Me Re-Understand "What Is an Agent, Really?"

链捕手 (ChainCatcher) | Published 2026-05-25 | Updated 2026-05-25

Summary

"Agentic Design Patterns" is a 2025 book by Antonio Gullí, a Google engineering director, which offers a systematic framework for AI Agent development through 21 design patterns. A core contribution is the "Four Levels of Agency": Level 0 (bare LLMs) are not true agents. Level 1 agents actively decide when and how to use tools. Level 2 agents engage in strategic planning, context engineering (curating and filtering information), and self-reflection. Level 3 involves multi-agent collaboration with defined communication topologies. The book introduces **Context Engineering** as a superset of prompt engineering, managing four layers of information for the agent: system prompts, external data, implicit context (user history, environment), and feedback loops for automated optimization. A key pattern is **Reflection (Producer-Critic)**, where two distinct agents with different prompts collaborate iteratively—one produces output, the other critiques it—until quality is satisfactory or a max iteration limit is reached. For **Memory**, a three-layer model is proposed: Session (ephemeral conversation context), State (temporary task data), and Memory (persistent, long-term storage). Regarding **Multi-Agent Systems**, the book advises against unnecessary complexity, recommending simple topologies like Supervisor or Peer-to-Peer based on task needs. It emphasizes perfecting a single Level 2 agent before moving to multi-agent setups. The author concludes with three actionable takeawa...

Author: Yanhua

Antonio Gullí is an Engineering Director at Google. He wrote a 453-page book, breaking down AI Agent development into 21 design patterns.

But this is not a book review. My motivation for reading this book was specific: I've written about Harness Engineering, shared my experience with pitfalls in Clawdbot, and written "AI Agents Are Not Magic" about the seven turning points from burning tokens to becoming truly usable. After each piece, there remained an unanswered question: Is there an underlying, reusable logic behind all these things?

This book gave me an answer, and it went deeper than I expected.

What You're Writing Might Not Be an Agent At All

The most incisive judgment in the book is hidden in the prologue.

The "AI" most people are using is just Level 0: a bare LLM, with no tools, no memory, and no ability to act. You ask it which film won Best Picture at the 2025 Oscars, and it guesses. The book is blunt: Level 0 stuff is not an Agent.

Only the higher levels are true Agents:

Level 1: Tool User

The Agent starts using tools: search, APIs, databases. But it's not just "able to call an interface"; it must decide *when* to call, *what* to call, and *how* to use the result. The book gives a concrete example: a user asks "What are some new TV shows?" The Agent realizes on its own that this information isn't in its training data, actively calls a search tool to find it, then synthesizes the results. The key step is "realizing on its own." It's not a human telling it "go search for this"; it judges *for itself* that a search is needed. This judgment ability is the threshold for Level 1.
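
That threshold can be sketched in a few lines of Python. This is only an illustration, not code from the book: `decide` is a hypothetical keyword check standing in for the model's own judgment, and `search_tool` is a stub that returns canned results.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    """What the agent chose to do for one user request."""
    tool: Optional[str]   # None means "answer from the model's own knowledge"
    query: Optional[str]  # argument passed to the tool, if any


def decide(user_message: str) -> Decision:
    """Hypothetical stand-in for the model's judgment.

    A real Level 1 agent lets the LLM make this call; the keyword check
    only keeps the sketch runnable without a model.
    """
    time_sensitive = ("new", "latest", "today", "current")
    words = user_message.lower().split()
    if any(marker in words for marker in time_sensitive):
        return Decision(tool="search", query=user_message)
    return Decision(tool=None, query=None)


def search_tool(query: str) -> list[str]:
    """Stub search tool returning one canned result."""
    return [f"result for: {query}"]


def run_agent(user_message: str, tools: dict[str, Callable[[str], list[str]]]) -> str:
    decision = decide(user_message)
    if decision.tool is None:
        return "answered from training data"
    results = tools[decision.tool](decision.query)
    # Synthesis step: a real agent would hand `results` back to the LLM.
    return f"answered using {len(results)} {decision.tool} result(s)"
```

Calling `run_agent("What are some new TV shows?", {"search": search_tool})` takes the tool branch, while a question with no time-sensitive marker is answered directly. The point is where the decision lives: inside the agent, not in the caller.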

Level 2: Strategic Thinker

Adds two more things: Planning and Context Engineering. The book defines Context Engineering: it's not about dumping information, but about carefully selecting, trimming, and packaging context. A great example: a user wants to find a coffee shop between two locations. The Agent first calls a mapping tool to get a bunch of data, then judges that "only street names are needed for the next step," trims the map output into a short list, and feeds it to a local search tool. Every step is about reducing information noise.

There's a sentence in the book I read several times: "To achieve the highest accuracy from AI, you must give it short, focused, and powerful context." Context Engineering is exactly about doing this.
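
The coffee shop example can be sketched as a trimming step between two tool calls. The shape of `raw_route` below is an assumption made up for illustration (real mapping APIs return different fields), and both function names are hypothetical.

```python
def trim_route_to_streets(route: dict) -> list[str]:
    """Keep only the unique street names, in route order."""
    seen = set()
    streets = []
    for step in route["steps"]:
        name = step["street"]
        if name not in seen:
            seen.add(name)
            streets.append(name)
    return streets


def build_search_query(streets: list[str], what: str) -> str:
    """Build the short input that the local search tool actually needs."""
    return f"{what} near " + ", ".join(streets)


# Hypothetical mapping-tool output: most of it is noise for the next step.
raw_route = {
    "distance_m": 2400,
    "duration_s": 1800,
    "steps": [
        {"street": "Main St", "lat": 40.1, "lng": -73.9, "instruction": "Head north"},
        {"street": "Main St", "lat": 40.2, "lng": -73.9, "instruction": "Continue"},
        {"street": "Oak Ave", "lat": 40.3, "lng": -73.8, "instruction": "Turn right"},
    ],
}

streets = trim_route_to_streets(raw_route)
query = build_search_query(streets, "coffee shop")
```

Coordinates, distances, and turn instructions never reach the second tool. Only the two street names do.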

At this level, the Agent can also self-reflect. It reviews its own work after finishing, identifies issues, and makes corrections itself. I'll talk about this in more detail later.

Level 3: Multi-Agent Collaboration

The book's stance is clear: stop trying to build one all-powerful super agent. The reliable approach is to build a team: a Project Manager Agent + a Researcher Agent + a Designer Agent + a Copywriter Agent. The example given is for a new product launch: a "Project Manager Agent" coordinates overall, assigning tasks to "Market Research Agent," "Product Design Agent," and "Marketing Agent." The key is communication: how Agents pass data, synchronize state, and handle conflicts. This chapter diagrams six communication topologies, from the simplest single Agent to the most flexible custom hybrids, with explanations for each scenario.

After reading these four levels, I suddenly understood why many people say "my Agent doesn't work well." The model isn't the problem; the problem is you're using it like a chatbot, and it might not even be at Level 1.

Context Engineering: The Book's Most Underrated Concept

I wrote an article about Harness Engineering, arguing that the design of the racetrack matters more than the engine's horsepower. After reading this book, I realized that Context Engineering is what Harness Engineering looks like at the prompt level.

Traditional Prompt Engineering only cares about "how you ask." Context Engineering in the book cares about "what the Agent sees in front of it before it's asked." It includes four layers of information:

First layer, the system prompt. Defines who the Agent is, its tone, its boundaries. Most people only write this layer.
Second layer, external data. Documents retrieved via RAG, return values from tool calls, real-time API data. This is where most people get stuck: they know they need to feed data, but not how to do it without overwhelming the model.
Third layer, implicit data. User identity, interaction history, environmental state. Things you don't explicitly state but the Agent should know. For example, if you tell the Agent, "Help me email John to confirm tomorrow's meeting," it should know what tomorrow's meeting in your calendar is and what your relationship with John is.
Fourth layer, the feedback loop. After each output, the Agent automatically evaluates quality and adjusts the context strategy for next time. The book calls this "automated context optimization." Google's Vertex AI Prompt Optimizer is the engineering implementation of this idea.
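
One way to picture the four layers is as a single structure that gets rendered into the model's input. This is a sketch under my own assumptions: `ContextBundle`, its field names, and the bracketed section labels are hypothetical, and the feedback layer is reduced to carrying notes from earlier evaluations rather than doing any automated optimization.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    system_prompt: str                                            # layer 1
    external_data: list[str] = field(default_factory=list)        # layer 2
    implicit_data: dict[str, str] = field(default_factory=dict)   # layer 3
    feedback_notes: list[str] = field(default_factory=list)       # layer 4

    def render(self, max_external_items: int = 3) -> str:
        """Assemble the context, capping external data to limit noise."""
        parts = [f"[system]\n{self.system_prompt}"]
        if self.implicit_data:
            facts = "\n".join(f"{k}: {v}" for k, v in self.implicit_data.items())
            parts.append(f"[implicit]\n{facts}")
        if self.external_data:
            kept = self.external_data[:max_external_items]
            parts.append("[external]\n" + "\n".join(kept))
        if self.feedback_notes:
            parts.append("[feedback]\n" + "\n".join(self.feedback_notes))
        return "\n\n".join(parts)


bundle = ContextBundle(
    system_prompt="You are an email assistant.",
    external_data=["doc-0", "doc-1", "doc-2"],
    implicit_data={"tomorrow_meeting": "10:00 budget review with John"},
    feedback_notes=["Last draft was too long; keep it under 80 words."],
)
context_text = bundle.render(max_external_items=2)
```

The cap on external items is the crude version of the "don't overwhelm the model" problem: a real system would rank and trim by relevance instead of taking the first few.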

When I read this part, I remembered my article "AI Agents Are Not Magic," which included the insight: "Your Agent needs rules, and a lot of them." Looking back now, those rules were essentially a manual version of Context Engineering; the book systematizes it.

Reflection: Two Agents Are Truly Better Than One

This is the pattern with the most practical value for me in the entire book.

The core of Reflection is simple: after finishing work, the Agent reviews itself, finds problems, and corrects them. But the implementation matters. The book states clearly: The Producer and the Critic must be two different Agents, with different system prompts. The same persona reviewing its own work will always have blind spots. If you let the same LLM write code and then review the code it just wrote, it will most likely say, "It's fine."

The book provides a complete code example.

The Producer's prompt is: "You are a Python developer. Write a function to calculate factorial, handle edge cases and exceptions."
The Critic's prompt is: "You are a nitpicking senior engineer. Review the code line by line, check for bugs, style, missed edge cases, and areas for improvement. If perfect, output CODE_IS_PERFECT, otherwise list all issues."
Then there's a for loop: Producer writes code → Critic reviews → Producer revises based on feedback → Critic reviews again → until Critic says CODE_IS_PERFECT or the maximum iteration count is reached.
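
The loop described above can be sketched as follows. This is my own minimal version, not the book's listing: the `LLM` callable type and the `reflect` function are hypothetical, and any real model client would need to be wrapped to match the `(system_prompt, user_message) -> reply` shape assumed here.

```python
from typing import Callable

# Assumed interface: (system_prompt, user_message) -> model reply.
LLM = Callable[[str, str], str]

PRODUCER_PROMPT = (
    "You are a Python developer. Write a function to calculate factorial, "
    "handle edge cases and exceptions."
)
CRITIC_PROMPT = (
    "You are a nitpicking senior engineer. Review the code line by line, "
    "check for bugs, style, missed edge cases, and areas for improvement. "
    "If perfect, output CODE_IS_PERFECT, otherwise list all issues."
)
APPROVAL_TOKEN = "CODE_IS_PERFECT"


def reflect(task: str, producer: LLM, critic: LLM, max_iterations: int = 3) -> tuple[str, int]:
    """Run produce -> review -> revise until approval or the iteration cap.

    Returns the final draft and the number of review rounds performed.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    draft = producer(PRODUCER_PROMPT, task)
    for round_number in range(1, max_iterations + 1):
        review = critic(CRITIC_PROMPT, draft)
        if APPROVAL_TOKEN in review:
            break
        if round_number < max_iterations:
            # Send only the latest draft and critique, not the full history,
            # so the context does not fill up with earlier versions.
            revision_request = (
                f"Task: {task}\n\nPrevious draft:\n{draft}\n\n"
                f"Review:\n{review}\n\nRevise the draft to address every issue."
            )
            draft = producer(PRODUCER_PROMPT, revision_request)
    return draft, round_number
```

Two design choices are worth noting. The Producer and Critic share nothing except the draft, which is what keeps their personas separate. And the revision request carries only the most recent draft and review, which is one way to keep earlier versions from crowding the context window.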

It's that simple. But the book warns about a cost issue easily overlooked: each reflection loop is a new LLM call; the more iterations, the more expensive. Also, as the conversation history expands, the context window gets filled with earlier versions and criticisms, reducing the actual reasoning space available. So the best practice for Reflection is: set a reasonable maximum number of iterations (the book uses 3), stop once the Critic is satisfied, don't pursue perfection.

Its uses go far beyond writing code. Writing articles, making plans, summarizing documents, solving logic puzzles—the Producer-Critic model applies everywhere. The book lists seven application scenarios, with the same core logic: produce, review, revise.

Multi-Agent Isn't About Being More Complex

In the Multi-Agent Collaboration chapter, my favorite part is the six communication topology diagrams. Many people start with complex structures, but in reality, three are sufficient for most scenarios:

Single Agent (Independent Execution): The task can be broken down into independent sub-problems, each handled by its own Agent. Simple, easy to maintain.
Peer-to-Peer Network: Agents communicate directly with each other, with no central control node. Decentralized, good fault tolerance—if one Agent fails, it doesn't affect the whole. But coordination costs are high, and it can get chaotic.
Supervisor (Centralized Orchestration): A Supervisor Agent manages a group of Worker Agents. Assigns tasks, collects results, resolves conflicts. Clear hierarchy, easy to manage. But the Supervisor is a single point of failure and a performance bottleneck.
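
To make the Supervisor topology concrete, here is a small sketch using the product launch roles mentioned earlier. Everything in it is an assumption for illustration: the `Supervisor` class is hypothetical, the workers are stub functions rather than LLM-backed agents, and the assignments are supplied by the caller instead of being planned by a model.

```python
from typing import Callable

Worker = Callable[[str], str]  # takes a subtask, returns a result


class Supervisor:
    """Central coordinator: workers never talk to each other directly."""

    def __init__(self, workers: dict[str, Worker]) -> None:
        self.workers = workers

    def run(self, assignments: dict[str, str]) -> dict[str, str]:
        """Dispatch each subtask to the worker registered for its role."""
        results: dict[str, str] = {}
        for role, subtask in assignments.items():
            worker = self.workers.get(role)
            if worker is None:
                # Surface the gap rather than failing silently.
                results[role] = "unassigned: no worker registered"
            else:
                results[role] = worker(subtask)
        return results


# Stub workers; real ones would each wrap an LLM with its own system prompt.
workers: dict[str, Worker] = {
    "market_research": lambda subtask: f"research notes on {subtask}",
    "product_design": lambda subtask: f"design draft for {subtask}",
    "marketing": lambda subtask: f"campaign outline for {subtask}",
}

supervisor = Supervisor(workers)
launch_results = supervisor.run({
    "market_research": "competitor pricing",
    "marketing": "launch announcement",
    "legal": "trademark check",
})
```

Every message passes through `Supervisor.run`, which is exactly why the hierarchy is easy to reason about and also why the Supervisor is the single point of failure and the bottleneck.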

The other three (Supervisor-as-Tool, Hierarchical, Custom Hybrid) are variations and combinations of the first three. The book is very practical: the topology you need depends on your task complexity. The more fragmented the task, the higher the communication cost. At a certain point, the Supervisor pattern becomes more efficient than the hierarchical one.

My takeaway is that many people building Multi-Agent systems spend 80% of their time on communication protocols, forgetting to ask a more fundamental question: does this task *really* need multiple Agents? The book is clear: a single Level 2 Agent with Reflection is often sufficient. Level 3 is for scenarios where a single Agent genuinely can't handle it.

The Three-Layer Memory Model: I Felt It Vaguely But Never Named It

The Memory chapter resonated with me most, because when I wrote my two articles about Obsidian + Claude, I kept wondering: how should an Agent's memory be layered?

The book provides the answer:

Session (Conversation Layer): The context window for the current conversation. This is the shortest memory; it's gone when the conversation ends. Long-context models simply enlarge this window, but it's still temporary, and each inference has to process the entire window, which is expensive and slow.
State (State Layer): Temporary data during the current task. For example, "what is the ongoing task," "what step has been completed," "what intermediate data has been generated." Longer than Session, but cleaned up when the task ends. The book provides a complete example using Google ADK's State mechanism.
Memory (Persistent Layer): Long-term memory across sessions and tasks. User preferences, learned experiences, important historical decisions, stored in databases or vector stores, retrieved semantically. The book emphasizes an important point: Memory isn't just about storing; you must design a full strategy for *what* to store, *when* to store it, and *how* to retrieve it. Store too much, noise increases; store too little, it's insufficient.
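
The three layers and their lifetimes can be sketched in plain Python. This is not Google ADK's State mechanism or any library's API: `AgentMemory` and its methods are hypothetical, long-term storage is an in-memory dict instead of a database or vector store, and `recall` uses a naive keyword match as a stand-in for semantic retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    session: list[str] = field(default_factory=list)          # conversation turns
    state: dict[str, object] = field(default_factory=dict)    # per-task scratch data
    long_term: dict[str, str] = field(default_factory=dict)   # survives sessions and tasks

    def end_task(self, keep: tuple[str, ...] = ()) -> None:
        """Clear task state, promoting only the named keys to long-term memory."""
        for key in keep:
            if key in self.state:
                self.long_term[key] = str(self.state[key])
        self.state.clear()

    def end_session(self) -> None:
        """Drop the conversation context; long-term memory is untouched."""
        self.session.clear()

    def recall(self, keyword: str) -> dict[str, str]:
        """Naive keyword lookup standing in for semantic retrieval."""
        needle = keyword.lower()
        return {
            key: value
            for key, value in self.long_term.items()
            if needle in key.lower() or needle in value.lower()
        }


memory = AgentMemory()
memory.session.append("user: I prefer short emails")
memory.state["current_step"] = 2
memory.state["email_style"] = "short"
memory.end_task(keep=("email_style",))
memory.end_session()
```

The `keep` argument is where the "what to store" decision becomes explicit: the step counter is discarded with the task, while the user preference is promoted to the persistent layer.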

In my previous article about Clawdbot, I mentioned "state files" and "workspace documents," which were essentially handcrafted versions of the State and Memory layers. The book gives them a framework.

Five Hypotheses, the Fifth Is the Most Outlandish

At the end of the book, it presents five hypotheses about the future of Agents. The first four are within reasonable speculation: General-purpose Agents evolve from writing code to managing projects; Deep Personalization proactively discovers your needs; Embodied Intelligence moves from screens into the physical world; Agents become independent economic entities.

The fifth one stunned me: Shape-Shifting Multi-Agent.

You only declare a goal, like "start a premium coffee e-commerce business." The system automatically decides: first create a "Market Research Agent" and a "Brand Agent." After running a round of data, it judges that the Brand Agent is no longer needed, splitting it into three new ones: "Logo Design Agent," "Website Builder Agent," "Supply Chain Agent." If the Website Builder Agent becomes a bottleneck, the system automatically replicates three parallel Agents to work on different pages simultaneously. Throughout the process, the system continuously auto-tunes each Agent's prompts and constantly restructures the team architecture.

The book calls this a "goal-driven, self-transforming multi-agent system." It's not executing a plan you wrote; it's generating the plan itself, adjusting the plan itself, and reorganizing the execution team itself.

This reminds me of Karpathy's AutoResearch: write a program.md, define goals, metrics, boundaries, and press "Launch." Humans are outside the loop. But this book pushes further: even how the Agent team is formed and restructured is left for the system to decide. Humans only declare "what they want."

Three Things You Can Do Immediately

After reading this book, I have three actionable items to implement immediately:

First, add a Critic to your current Agent. Whether you use Claude Code, CrewAI, or your own framework, add one step at the end of your existing workflow: have another Agent (with a different system prompt) review the previous step's output. Code generation + code review, article writing + fact-checking, plan creation + feasibility assessment. It costs one more LLM call, but the output quality often improves markedly. The book's Producer-Critic pattern is plug-and-play.
Second, start doing Context Engineering, not just Prompt Engineering. Go back and look at your instruction files for the Agent. If they are all rules about "how you should do things" but lack the context of "what environment you are currently facing," add it. Tell the Agent which project it's in, what decisions it made before, what the user's preferences are. The Context Engineering chapter in the book and your AGENTS.md are two expressions of the same thing.
Third, don't rush into Multi-Agent yet. Get your single Agent to Level 2 first: with tools, Reflection, and Memory. The book repeatedly emphasizes that a Level 2 single Agent with Producer-Critic and Context Engineering can cover the vast majority of practical scenarios. Level 3 is for truly cross-domain, multi-stage tasks requiring parallel division of labor. Most people's problem isn't having too few Agents; it's that they haven't even tuned one Agent properly.

The book runs 453 pages and was published by Springer in 2025. Its code examples cover LangChain/LangGraph, Google ADK, CrewAI, and the OpenAI API. The foreword is by a Google Cloud AI VP, and there's a surprising and engaging recommendation preface from a Goldman Sachs CIO.

But I'm not recommending it because it's comprehensive. I'm recommending it because you'll realize something after reading: the pitfalls you've encountered with Agents in the past six months have been organized into patterns. You don't need to reinvent Reflection, guess how Memory should be layered, or experiment with which communication topology to use for Multi-Agent.

Someone has drawn the map for you. The rest is just walking.

Are you using AI Agents for development? What Level is your current Agent at?

Related Questions

Q: What are the four levels of AI Agent maturity described in the book "Agentic Design Patterns"?

A: The book describes four levels: Level 0 (Bare LLM, not a real Agent), Level 1 (Tool User), Level 2 (Strategic Thinker with planning and Context Engineering), and Level 3 (Multi-Agent Collaboration).

Q: According to the article, what is the core difference between Prompt Engineering and Context Engineering?

A: Prompt Engineering focuses on "how you ask," while Context Engineering manages "what the Agent sees in front of it before it's asked." It involves structuring four layers of information: system prompt, external data, implicit data, and feedback loops to provide the Agent with focused, actionable context.

Q: What is the "Reflection" pattern, and what is a key practical implementation detail highlighted in the book?

A: The Reflection pattern involves having an Agent review and revise its own work. A key implementation detail is that the Producer (who creates) and the Critic (who reviews) must be two different Agents with different system prompts to avoid blind spots. The process involves iterative loops until the Critic approves or a maximum iteration limit (e.g., 3) is reached.

Q: What are the three main memory layers defined for AI Agents in the book's model?

A: The three memory layers are: 1) Session (the current conversation's context window), 2) State (temporary data for an ongoing task), and 3) Memory (the persistent, long-term storage for cross-session and cross-task information like user preferences and learned experiences).

Q: What are the three actionable recommendations the article author suggests after reading the book?

A: The three recommendations are: 1) Add a Critic Agent to your current workflow for review. 2) Start doing Context Engineering, not just Prompt Engineering, by providing environmental context. 3) Focus on perfecting a single Level 2 Agent with tools, reflection, and memory before rushing into Multi-Agent systems.
