How to Conduct Deep Research Using Claude's Dynamic Workflows

marsbit | Published on 2026-06-09 | Last updated on 2026-06-09

Abstract

The article "How to Use Claude's Dynamic Workflows for Deep Research" discusses overcoming the pitfalls of technical research, where both humans and AI can get overwhelmed by information, leading to vague conclusions. It introduces Claude Code's new "Dynamic Workflows" feature, which automatically designs and executes task-specific workflows before starting a task, unlike simpler "planning modes." This approach incorporates validation, result convergence, and adversarial verification from the outset. The core of Dynamic Workflows is six predefined scheduling patterns that address how to decompose tasks and synthesize results:

1. **Classify-and-Act (Routing):** An agent classifies the task and routes it to the most suitable specialist agent for execution. It's precise and efficient but struggles with ambiguous tasks.
2. **Fan-out & Merge:** The task is split into parallel, independent subtasks whose results are later merged. It's fast and isolates contexts but is more expensive and challenging to synthesize.
3. **Adversarial Verification:** Multiple "challenger" agents critique a worker agent's conclusion, requiring majority approval. This counters confirmation bias and self-assessment errors but relies on verifiable facts.
4. **Generate & Filter:** Multiple agents generate many candidate solutions, which are then filtered against a rubric to output only the best. It fosters diversity but depends heavily on the filter's quality.
5. **Tournament:** Multiple agents compete...

Technical research is full of traps, for humans and AI alike. From the very start of an investigation you are flooded with information; views and opinions pile up, and conclusions grow increasingly vague. That is why it is crucial to keep returning to the original goal.

This is also why AI was not very good at research for a long time. In terms of attention and associative thinking, it gets trapped by the sheer volume of information in front of it even more easily than people do, and it is weak at making truly valuable cross-disciplinary connections.

Where AI does excel is execution. Working as agents, it can search, summarize, and draw conclusions layer by layer without dropping details along the way.

Although I haven't published much on my public account in the past six months, I've been closely following and researching major developments across almost all mainstream sectors in the industry. What supports this input and output is my own deep-research system.

When the Dynamic Workflows feature launched in Claude Code last week, I wanted to pit it against my own system and see whether its default capabilities could fully surpass mine.

2. What is Dynamic Workflows?

Dynamic Workflows rests on one core idea: before executing a task, the AI first designs the workflow that should be used to complete it, and only then starts executing.

This is fundamentally different from the "plan mode" and "skills" we used before. Plan mode breaks a task into finer steps, but not necessarily along a sensible workflow. It adds acceptance criteria (crucial for research) only if your prompt asks for them, and likewise it sets up good harness rules only when prompted to.

But Dynamic Workflows automatically incorporates acceptance logic, result convergence, adversarial verification, and similar elements.

The trigger is simple: run the /deep-research command in Claude Code and provide some research templates and starting materials. If you want the dynamic workflow capability on its own, you can use a specific prompt or simply say 'ultracode'. Be aware before you start that token consumption is tens of times higher than normal.

3. The Six Built-in Workflow Patterns

Dynamic Workflows is built on six core orchestration patterns distilled by the developers. This is why it is more powerful than regular conversations, agents, or skills.

Behind these six patterns there are really only two core questions: how to split the task, and how to merge the results. The six patterns are essentially different combinations of answers to these two questions.

3.1 Router Pattern (Classify-And-Act)

First, one agent determines the task type, then hands the task to the most suitable specialized agent. The core of the pattern is the router's selection logic, not parallelism or iteration. A task follows exactly one path; the other paths are never executed.

For example, I could pre-define three sub-agent roles: one analytical agent that strictly verifies data, one output agent skilled in writing, and one challenge agent specialized in finding vulnerabilities. The routing layer judges which sub-task is suitable for whom, rather than having one agent handle everything.

The value of this pattern is precision and frugality. Each agent's prompt can be fully independent, free of interference from other objectives, which allows deep vertical exploration. Token consumption is the lowest and responses are the fastest, and responsibility boundaries are very clear.

The drawback is just as significant: it handles tasks with blurry boundaries poorly (e.g., an issue that is "both a technical and an account issue").
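The routing idea can be sketched in a few lines of Python. This is only an illustration: the three specialist functions and the keyword-based `classify` are hypothetical stand-ins for real agent and LLM calls, not how Claude Code implements routing.

```python
from typing import Callable, Dict


# Hypothetical specialist "agents", stubbed as plain functions.
def analyst(task: str) -> str:
    return f"[analyst] verified data for: {task}"


def writer(task: str) -> str:
    return f"[writer] drafted text for: {task}"


def challenger(task: str) -> str:
    return f"[challenger] looked for holes in: {task}"


SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "analyze": analyst,
    "write": writer,
    "challenge": challenger,
}


def classify(task: str) -> str:
    """Toy keyword classifier standing in for an LLM routing call."""
    text = task.lower()
    if any(word in text for word in ("verify", "data", "measure")):
        return "analyze"
    if any(word in text for word in ("attack", "risk", "vulnerab")):
        return "challenge"
    return "write"


def route(task: str) -> str:
    """Pick exactly one specialist; the other paths never run."""
    return SPECIALISTS[classify(task)](task)


if __name__ == "__main__":
    print(route("Verify the fee data for the new pool"))
    print(route("List attack vectors on the bridge"))
```

Note that a mixed task such as "verify the data and list the risks" still goes down a single path here, which is exactly the blurry-boundary weakness described above.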

3.2 Split & Merge (Fan-out & Merge)

This is also my most commonly used pattern. The core logic is to parallelize, then merge: the task is split into N independent subtasks that run simultaneously, and once all have completed, their results are merged in a single step.

The advantage lies in speed and isolation. The total time taken is approximately that of the slowest subtask, not the sum of all subtasks. Each subtask has its own independent context, so subtasks do not interfere with each other and noise from one won't pollute the others.

The weakness is that token cost is roughly N times that of a single serial run, since every subtask carries its own context, and that the merge layer (Synthesize) is itself hard: fusing N structurally inconsistent outputs is a design challenge. Poor subtask division can also lead to omissions or repeated coverage.
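To make the timing claim concrete, here is a minimal sketch that assumes each subtask is an independent function call. `research_subtask` is a hypothetical stub that sleeps instead of calling an agent, and the merge step is deliberately trivial.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple


def research_subtask(topic: str, seconds: float) -> Tuple[str, str]:
    """Stub worker with isolated inputs; sleeps to simulate agent work."""
    time.sleep(seconds)
    return topic, f"finding about {topic}"


def fan_out_and_merge(subtasks: Dict[str, float]) -> Tuple[Dict[str, str], float]:
    """Run every subtask in parallel, then merge the results into one dict."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = [pool.submit(research_subtask, t, s) for t, s in subtasks.items()]
        results = [f.result() for f in futures]  # blocks until all subtasks finish
    merged = dict(results)  # trivial merge; real synthesis is the hard part
    return merged, time.perf_counter() - start


if __name__ == "__main__":
    merged, elapsed = fan_out_and_merge(
        {"docs": 0.1, "source code": 0.3, "sentiment": 0.2}
    )
    print(merged)
    print(f"wall time {elapsed:.2f}s, serial sum would be about 0.60s")
```

The wall time tracks the 0.3-second subtask rather than the 0.6-second sum, while the amount of work (and, for real agents, the token bill) is still paid N times.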

3.3 Adversarial Verification

The core logic is inspection. Multiple agents challenge the same conclusion from a "refutation" perspective, and it passes only if a majority of them approve.

The advantage is that, since the Verifier doesn't know the Worker's line of thought and only looks at the result, it structurally eliminates the self-assessment bias that occurs when "asking the model to check its own code."

This pattern solves a problem that has long troubled me: we often chat with AI colloquially, and AI tends to answer in line with our expectations, which easily leads to "confirmation bias." Adversarial verification forces the AI to look for counterexamples and to verify against data and experiments rather than simply cater to your ideas.

The catch is the verifier itself: if it makes wrong judgments, it can push the Worker into catering to the Verifier. It is therefore better to verify against reproducible facts than to rely on opinions.

Jokingly speaking, if you ask AI to find problems, it can find endless issues, so you must limit the boundaries within which it can search for problems.

3.4 Generate & Filter

The core logic is diverge then converge. Deliberately produce an excessive number of candidates first, then use a rubric to filter down to the essence, retaining only high-confidence results for output.

Instead of letting one agent output a single "decent" answer, have it generate ten, then filter them with a verification layer. The advantage is diversity: multiple Generators can use different strategies and prompts to produce solutions that are hard to anticipate manually, and the filtering step keeps the quality of the final output concentrated.

The weakness is that the quality of the Filter's rubric directly determines the final outcome. A poorly designed rubric renders the entire process useless.

It suits situations where the correct answer is unknown beforehand and the best must be chosen from multiple possibilities, or where there is a clear need for diversity.

It is only superficially similar to Fan-out & Merge. Both are "multi-path parallel → single output," which makes them the easiest pair to confuse.

The key difference lies in the intent. In Fan-out & Merge, each path handles a different part of the task; results are complementary, and all paths contribute during merging. In Generate & Filter, each path handles the same task; results are competitive, and most are discarded during merging. The former is a "jigsaw puzzle," the latter is a "beauty pageant."
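The "beauty pageant" can be sketched as follows. The generators are hypothetical stubs that each answer the same prompt with a different strategy, and the three-point rubric is an invented example; in practice the rubric is where most of the design effort goes.

```python
from typing import Callable, List

Generator = Callable[[str], str]


def rubric_score(candidate: str) -> int:
    """Toy rubric: one point per criterion met (0 to 3)."""
    score = 0
    if any(ch.isdigit() for ch in candidate):
        score += 1  # cites a concrete figure
    if "should" in candidate.lower():
        score += 1  # gives a recommendation
    if len(candidate) <= 80:
        score += 1  # stays concise
    return score


def generate_and_filter(generators: List[Generator], prompt: str, min_score: int) -> List[str]:
    """Every generator answers the same prompt; most answers are discarded."""
    candidates = [generate(prompt) for generate in generators]
    return [c for c in candidates if rubric_score(c) >= min_score]


if __name__ == "__main__":
    generators = [
        lambda p: f"{p}: fees fell 40%, so you should migrate now.",
        lambda p: f"{p} is interesting and has many aspects worth considering.",
        lambda p: f"{p}: you should wait.",
    ]
    print(generate_and_filter(generators, "Upgrade", min_score=3))
```

Unlike the fan-out sketch, every path here receives the identical prompt, and the filter throws away whatever fails the rubric instead of stitching all outputs together.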

3.5 Tournament Pattern

The core logic is competitive elimination. N agents each independently do the same task; their results are then eliminated round by round through pairwise comparison until the best solution remains.

I've done this manually before: running two or three versions of the same code change, then having AI compare which is better. Now it can be orchestrated directly within the workflow.

The advantage lies in judgment stability. Pairwise comparison ("Which is better, A or B?") is much more stable than absolute scoring ("Score A") because it eliminates the problem of drifting scoring standards. The result goes through multiple rounds of competition, so the credibility of the final winner is high.

It is also superficially similar to Generate & Filter, since both select the best from multiple candidates. The key difference is the selection mechanism: Generate & Filter scores each candidate against a rubric, whereas Tournament uses pairwise judges, "making candidates compete against each other." When the rubric is difficult to quantify or the judgment is essentially relative, Tournament is more reliable.
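A single-elimination bracket is one simple way to realize this. In the sketch below the `judge` callable is a hypothetical stand-in for an agent that answers "Which is better, A or B?"; the demo judge simply prefers the shorter patch.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tournament(candidates: List[T], judge: Callable[[T, T], T]) -> T:
    """Single elimination: compare neighbors pairwise until one remains."""
    if not candidates:
        raise ValueError("need at least one candidate")
    current = list(candidates)
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            next_round.append(judge(current[i], current[i + 1]))
        if len(current) % 2 == 1:
            next_round.append(current[-1])  # odd one out gets a bye
        current = next_round
    return current[0]


def prefer_shorter(a: str, b: str) -> str:
    """Toy pairwise judge standing in for an LLM comparison."""
    return a if len(a) <= len(b) else b


if __name__ == "__main__":
    patches = ["aaaa", "bb", "ccc", "d", "eeeee"]
    print(tournament(patches, prefer_shorter))
```

The judge never has to produce an absolute score, only a relative preference, which is the property that keeps the scoring standard from drifting between candidates.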

3.6 Loop Pattern (Loop Until Done)

The core logic is adaptive iteration: try, hit resistance, collect the error information, add context, and retry until the acceptance conditions are met.

Essentially, it counters AI's randomness: try a few more times and you will eventually stumble upon a better result. A more mature approach, though, is to combine it with adversarial verification, so that each iteration runs with more information instead of relying on randomness alone.

Its advantage is that it can handle tasks whose workload is unknown in advance. The other five patterns assume the task boundary is fixed; Loop Until Done is the only one that can cope with "not knowing how many rounds are needed."

The weakness is the risk of losing control: poorly designed stopping conditions can lead to infinite loops. In addition, the agent in each round starts with a brand-new context and cannot accumulate state across rounds unless that state is explicitly written to a file.
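Both weaknesses suggest the same guard rails: a hard cap on rounds and state that lives in a file. The sketch below assumes a JSON state file with hypothetical `open_questions` and `notes` fields; `attempt` is a stub for one agent round.

```python
import json
import tempfile
from pathlib import Path


def attempt(state: dict) -> dict:
    """Stub agent round: resolves one open question per round."""
    if state["open_questions"]:
        question = state["open_questions"].pop(0)
        state["notes"].append(f"resolved: {question}")
    return state


def loop_until_done(state_file: Path, max_rounds: int) -> int:
    """Retry until the acceptance condition holds; return rounds used."""
    for round_no in range(1, max_rounds + 1):
        state = json.loads(state_file.read_text())  # fresh context: reload from disk
        state = attempt(state)
        state_file.write_text(json.dumps(state))    # persist state for the next round
        if not state["open_questions"]:             # acceptance condition
            return round_no
    raise RuntimeError(f"not done after {max_rounds} rounds")  # hard stop, no infinite loop


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "state.json"
        path.write_text(json.dumps({"open_questions": ["q1", "q2", "q3"], "notes": []}))
        print(loop_until_done(path, max_rounds=5))
```

The acceptance condition and the round cap are separate on purpose: the first says when the work is good enough, the second says when to give up and report failure instead of looping forever.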

4. The Battle Between My Own Skill and the Official Workflow

Before Dynamic Workflows came out, I had specifically designed my own deep-research skill. The logic of my skill roughly went like this:

  1. Give only simple information (e.g., a certain project launched a new feature).
  2. Let the AI search all related materials: official documentation, source code, market sentiment.
  3. Compress the information into meaningful summaries.
  4. Multiple agent roles conduct adversarial analysis to generate a report.
  5. Automatically deduplicate because multi-agent content often has high repetition rates.

After using it for a while, I thought it was quite useful. But it had a fundamental flaw: it lacked goal-oriented convergence.

Moreover, the fifth step, deduplication, often deleted valuable information. Yet without it, the skill would readily hand you a 10,000-word article: very comprehensive, but never telling you directly "what does this have to do with you, and what should you do?"

But research exists to serve decision-making. This is why many skills stop at the research itself, scoring 80 points but missing the crucial final 20.

As a result, after the AI finished its preliminary research, it often took another ten rounds of thinking and dialogue to reach a satisfactory, comprehensive conclusion.

What More Does the Official Dynamic Workflow Do?

Through several complex research task experiments this week, I found that the deep-research workflow built into Claude Code (note, not just a skill, but a module compiled and embedded into CC), compared to my own skill, adds several key steps:

  • Problem Decomposition Layer: It doesn't start searching directly. Instead, it first asks questions, breaking my query into multiple sub-questions: What do you really want to clarify? How does this relate to you? Which dimensions are worth delving into? This step I used to skip.
  • Credibility Assessment: Evaluates the falsifiability of each piece of information, similar to authority scoring in traditional SEO—is the source credible? What about citation counts? This is a step I hadn't thought to add before.
  • Cross-cutting Deletion, Not Average Merging: My previous approach was to blend all conclusions together and keep most of them, so documents grew large. Dynamic Workflows has multiple agents vote on each conclusion and deletes those with insufficient votes instead of simply merging everything.
  • Goal-oriented Output: The final report isn't an information dump, but provides judgment and suggested solutions centered around your original goal. The key to achieving this lies in its preset ability to orchestrate multiple sub-agents. My skill often lacked this final goal orientation precisely because the weight of the original instruction decays once massive amounts of information have piled up.
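The difference between "merge everything" and "delete by vote" is easy to show on toy data. In this sketch each agent's report is modeled as a set of conclusion strings, which is my own simplification; how the built-in workflow actually represents and tallies votes is not something I can confirm.

```python
from collections import Counter
from typing import List, Set


def merge_all(reports: List[Set[str]]) -> List[str]:
    """Old approach: keep every conclusion any agent produced."""
    return sorted(set().union(*reports))


def keep_by_vote(reports: List[Set[str]], min_votes: int) -> List[str]:
    """Vote-based approach: keep only conclusions enough agents endorse."""
    votes = Counter(conclusion for report in reports for conclusion in report)
    return sorted(c for c, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    reports = [
        {"fees dropped", "audit pending"},
        {"fees dropped", "token unlock soon"},
        {"fees dropped", "audit pending", "team is anonymous"},
    ]
    print(merge_all(reports))
    print(keep_by_vote(reports, min_votes=2))
```

With three agents and a two-vote threshold, four merged conclusions shrink to the two that more than one agent independently reached, which is what keeps the final report short.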

What Problems Do These Mechanisms Solve?

They target several typical problems when AI handles long tasks:

Goal Drift: Good state at the start of the task, losing track in the middle, and regaining rhythm at the end—similar to a human zoning out in class. More obvious with longer tasks.

Premature Stopping: Running along, encountering difficulty, the AI thinks it's "done" and stops, even though the acceptance criteria aren't met at all.

Context Pollution: When a single agent handles a complex task, the large volume of preceding prompts squeezes the room left for executing later steps. A better way is to keep the preceding prompts within a few thousand tokens and distribute context across multiple agents.

Output Bias: AI tends to answer in line with your expectations, and colloquial questioning triggers this problem more easily.

Dynamic Workflows addresses these four problems structurally: it decomposes the problem and constrains the AI layer by layer so that it understands the goal before acting, which counters goal drift; it adds acceptance criteria automatically to prevent premature stopping; it isolates context across parallel agents to limit context pollution; and it uses adversarial verification to counteract output bias.

5. Summary

Finally, as a long-term research practitioner, I am astounded by this new CC mechanism. Its six built-in patterns—Router Selection, Split & Merge, Adversarial Verification, Generate & Filter, Tournament Competition, and Loop Cycle—cover the orchestration needs of the vast majority of complex research tasks.

I no longer need to manually design agent orchestration, nor handle deduplication and cross-validation myself; these are now built into the workflow itself.

Moreover, it is particularly suitable for open-ended exploration where information is scarce, because its built-in multi-agent orchestration combined with task goal decomposition raises its generality yet again. In fact, as early as three years ago, AI already did quite well at solving extremely clear, small problems under layered constraints. But the qualitative leap for AI truly lies in generality. Only with generality does it move from rigidly solving one fixed kind of problem to becoming a true Agent that can handle any problem.

So, Dynamic Workflows is not "smarter single conversations," but the structuring of the research process itself.

Research that originally required me to start more than ten independent conversations is now compressed to 3-4, although token consumption has increased by tens of times.

So why does it still take 3-4 conversations? I think the root cause lies in the following demands, which the default workflow does not yet fully meet.

First, the rigor of the verification mechanism. I mainly research new blockchain technologies. For many of them, official documentation lags behind, and sources such as open-source code and on-chain transactions are more valuable references. Currently, the AI still defaults to treating official documentation as the standard rather than verifying against factual evidence.

Second, truly cross-disciplinary deep thinking. Workflow presets can solve part of this (by pre-defining sub-agents that examine the same issue along different dimensions), but AI excels at mainstream thinking models. For topics that are very new, very deep, or lacking a data basis, it still falls slightly short.

Third, solution design and verification. The value of a solution lies not in proposing it but in verifying and supporting it, which depends on weighing existing mechanisms, investment, and costs. If AI is trained specifically for this, it can certainly do better, but that somewhat contradicts generality.

Finally, extreme information condensation. This requires going back to an understanding of the audience's background. Some people have no foundation and need a vivid, personified explanation, while others need to be convinced in a single sentence.

Related Questions

Q: What are the two core issues that the six workflow patterns in Dynamic Workflows address?

A: The two core issues are: How to break down a task, and how to synthesize/merge the results.

Q: What are the key advantages of the Adversarial Verification workflow pattern?

A: Its key advantages are that it structurally eliminates self-evaluation bias by having a separate agent verify the results, and it forces the AI to seek counter-examples and base conclusions on data rather than conforming to user expectations, thus mitigating confirmation bias.

Q: What major advantage does the Fan-out & Merge workflow offer compared to sequential execution?

A: Its major advantage is speed and isolation. The total time is roughly equal to the slowest subtask, not the sum of all subtasks. Each subtask runs with independent context, preventing noise or pollution between them.

Q: According to the author, what is the fundamental flaw of their own previous deep-research skill system?

A: The fundamental flaw is a lack of goal-oriented convergence. It produced comprehensive information but failed to clearly relate it to the user's objectives or provide actionable conclusions, stopping at the research itself.

Q: What is one significant additional step the Claude Code built-in deep research workflow performs compared to the author's old skill?

A: It performs a problem decomposition layer, where it starts by asking clarifying questions to break the user's query into sub-questions about their true goals and relevant dimensions, before beginning any research.
