How to Conduct Deep Research Using Claude's Dynamic Workflows

marsbit · Published 2026-06-09 · Last updated 2026-06-09

Summary

The article "How to Use Claude's Dynamic Workflows for Deep Research" discusses overcoming the pitfalls of technical research, where both humans and AI can get overwhelmed by information, leading to vague conclusions. It introduces Claude Code's new "Dynamic Workflows" feature, which automatically designs and executes task-specific workflows before starting a task, unlike simpler "planning modes." This approach incorporates validation, result convergence, and adversarial verification from the outset. The core of Dynamic Workflows is six predefined scheduling patterns that address how to decompose tasks and synthesize results: 1. **Classify-and-Act (Routing):** An agent classifies the task and routes it to the most suitable specialist agent for execution. It's precise and efficient but struggles with ambiguous tasks. 2. **Fan-out & Merge:** The task is split into parallel, independent subtasks whose results are later merged. It's fast and isolates contexts but is more expensive and challenging to synthesize. 3. **Adversarial Verification:** Multiple "challenger" agents critique a worker agent's conclusion, requiring majority approval. This counters confirmation bias and self-assessment errors but relies on verifiable facts. 4. **Generate & Filter:** Multiple agents generate many candidate solutions, which are then filtered against a rubric to output only the best. It fosters diversity but depends heavily on the filter's quality. 5. **Tournament:** Multiple agents compete...

Technical research is full of traps, for humans and AI alike. From the very start of an investigation you take in a massive amount of information. Views and opinions pile up, and conclusions grow increasingly vague. So it's crucial to keep returning to the original goal.

This is also why AI has long been weak in this area. In terms of attention and associative thinking, it gets trapped even more easily by the sheer volume of information, and it is weak at making truly valuable cross-disciplinary connections.

Where AI does excel is execution. Working as agents, it can search, summarize, and draw conclusions layer by layer without dropping details.

Although I haven't published much on my public account in the past six months, I've been closely following and researching major developments across almost all mainstream sectors in the industry. What supports this input and output is my own deep-research system.

When the Dynamic Workflows feature launched in Claude Code last week, I wanted to pit it against my own system and see whether its default capabilities could surpass mine outright.

2. What is Dynamic Workflows?

Dynamic Workflows has one core concept: before executing a task, the AI first designs the workflow that should be used to complete it, and only then starts execution.

This is fundamentally different from the "plan mode" and "skills" we used before. Plan mode breaks a task down into finer steps, but not necessarily along a reasonable workflow. It adds acceptance criteria (crucial for research) only if your prompt asks for them, and likewise sets up better harness rules only when prompted.

But Dynamic Workflows automatically incorporates acceptance logic, result convergence, adversarial verification, and similar elements.

The trigger is simple: run the /deep-research command in Claude Code and give it some research templates and starting materials. If you want to use the dynamic workflow capability on its own, you can use a specific prompt or simply say 'ultracode'. Before using it, note that token consumption is roughly tens of times higher than normal.

3. The Six Built-in Workflow Patterns

The foundation of Dynamic Workflows is six core orchestration patterns summarized by the developers. This is why it's more powerful than regular conversations, agents, or skills.

Behind these six patterns lie only two core questions: How to split the task? How to merge the results? The six patterns are essentially different combinations of answers to these two questions.

3.1 Router Pattern (Classify-And-Act)

First, one agent determines the task type, then hands the task to the most suitable specialized agent. The core is the router's selection logic, not parallelism or iteration. A task follows only one path; the other paths are not executed at all.

For example, I could pre-define three sub-agent roles: an analytical agent that strictly verifies data, an output agent skilled in writing, and a challenge agent specialized in finding vulnerabilities. The routing layer judges which sub-task suits which agent, rather than having one agent handle everything.

The value of this pattern lies in precision and frugality. Each agent's prompt can be highly independent and free of interference from other objectives, which allows exploration in vertical depth. Token consumption is the lowest and response speed the fastest of the six. Responsibility boundaries are very clear.

The drawback is also significant: it handles tasks with blurry boundaries poorly (e.g., "both a technical and an account issue").
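A minimal sketch of the routing idea, assuming plain Python functions stand in for the three sub-agent roles. The names `analyst`, `writer`, `challenger`, `classify`, and `route` are hypothetical, and the keyword rules only imitate what a real classifier agent would decide.

```python
from typing import Callable, Dict

# Hypothetical specialists; real ones would be model calls with their own prompts.
def analyst(task: str) -> str:
    return f"[analyst] verified data for: {task}"

def writer(task: str) -> str:
    return f"[writer] drafted text for: {task}"

def challenger(task: str) -> str:
    return f"[challenger] looked for holes in: {task}"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "analyze": analyst,
    "write": writer,
    "challenge": challenger,
}

def classify(task: str) -> str:
    """Stand-in for the router's classification step (keyword rules, not a model)."""
    text = task.lower()
    if "verify" in text or "data" in text:
        return "analyze"
    if "risk" in text or "attack" in text:
        return "challenge"
    return "write"

def route(task: str) -> str:
    """Run exactly one specialist; the other paths are never executed."""
    return SPECIALISTS[classify(task)](task)

print(route("Verify the TVL data in this report"))
print(route("List attack vectors for the bridge"))
print(route("Summarize the launch for newcomers"))
```

A task that is both a data question and a risk question still goes down a single path here, which is the blurry-boundary weakness described above.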

3.2 Split & Merge (Fan-out & Merge)

This is the pattern I use most. The core logic is parallel + merge. The task is split into N independent subtasks that run simultaneously, and once all of them finish, their results are merged in one step.

The advantage lies in speed and isolation. The total time taken is approximately equal to the slowest subtask, not the sum of all subtasks. Each subtask has an independent context, not interfering with each other, and noise from one subtask won't pollute others.

The weakness is that token cost is N times that of serial execution, and the merge layer (Synthesize) is itself hard to design: it has to fuse N structurally inconsistent outputs. Poor subtask division can also lead to omissions or repeated coverage.
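The parallel + merge shape can be sketched as follows, assuming threads and `time.sleep` stand in for sub-agents doing real work. `research_subtask`, `synthesize`, and `fan_out_and_merge` are hypothetical names, and the durations are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def research_subtask(topic: str, seconds: float) -> Dict[str, str]:
    """Hypothetical sub-agent: sleeps to simulate work and sees only its own topic."""
    time.sleep(seconds)
    return {"topic": topic, "finding": f"notes on {topic}"}

def synthesize(results: List[Dict[str, str]]) -> str:
    """Merge step: every branch contributes to the final output."""
    return "; ".join(f"{r['topic']}: {r['finding']}" for r in results)

def fan_out_and_merge(subtasks: Dict[str, float]) -> str:
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = [pool.submit(research_subtask, t, s) for t, s in subtasks.items()]
        results = [f.result() for f in futures]  # block until every branch is done
    return synthesize(results)

subtasks = {"docs": 0.1, "source code": 0.3, "market sentiment": 0.2}
start = time.perf_counter()
report = fan_out_and_merge(subtasks)
elapsed = time.perf_counter() - start

print(report)
print(f"parallel: {elapsed:.2f}s, serial would be about {sum(subtasks.values()):.2f}s")
```

The wall-clock time tracks the slowest branch (0.3 s here) rather than the 0.6 s sum. The merge in this sketch is trivial only because every branch returns the same structure, which real sub-agents rarely do.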

3.3 Adversarial Verification

The core logic is inspection. Multiple agents challenge the same conclusion from a "refutation" perspective, and it passes only if a majority approves.

The advantage is that, since the Verifier doesn't know the Worker's line of thought and only looks at the result, it structurally eliminates the self-assessment bias that occurs when "asking the model to check its own code."

This pattern solves a problem that has long troubled me: we often chat with AI colloquially, and AI tends to answer in line with your expectations, which easily leads to "confirmation bias." Adversarial verification forces the AI to look for counterexamples and to verify against data and experiments, not just cater to your ideas.

However, a Verifier that judges wrongly can push the Worker to cater to the Verifier instead. It is therefore better to verify against reproducible facts than to rely on opinions.

Half-jokingly: if you ask AI to find problems, it will find endless ones, so you must limit the boundaries within which it may look.
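As a rough sketch of majority-vote verification, assume each verifier is a small check that sees only the Worker's result, never its reasoning. The field names (`figures`, `claimed_total`, `sources`, `allowed_topics`) and the three checks are hypothetical; they are chosen to be reproducible facts rather than opinions, and the scope check bounds where challengers may look.

```python
from typing import Any, Callable, Dict, List

Result = Dict[str, Any]
Verifier = Callable[[Result], bool]  # True means "I could not refute this"

def check_has_source(result: Result) -> bool:
    return len(result["sources"]) > 0

def check_total_reproduces(result: Result) -> bool:
    # Recompute the claimed total from the raw figures instead of trusting it.
    return sum(result["figures"]) == result["claimed_total"]

def check_in_scope(result: Result) -> bool:
    # Bounded challenge: only the agreed topics are open to dispute.
    return result["topic"] in result["allowed_topics"]

VERIFIERS: List[Verifier] = [check_has_source, check_total_reproduces, check_in_scope]

def passes_review(result: Result, verifiers: List[Verifier]) -> bool:
    approvals = sum(1 for verify in verifiers if verify(result))
    return approvals * 2 > len(verifiers)  # strict majority

sound = {
    "topic": "fees",
    "allowed_topics": {"fees", "throughput"},
    "figures": [120, 80, 50],
    "claimed_total": 250,
    "sources": ["on-chain export"],
}
shaky = dict(sound, claimed_total=260, sources=[])  # wrong total, no source

print(passes_review(sound, VERIFIERS))  # True: 3 of 3 approve
print(passes_review(shaky, VERIFIERS))  # False: only 1 of 3 approves
```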

3.4 Generate & Filter

The core logic is diverge then converge. Deliberately produce an excessive number of candidates first, then use a rubric to filter down to the essence, retaining only high-confidence results for output.

Instead of letting one agent output a "decent" answer, have it generate ten, then filter them with a verification layer. The advantage therefore lies in diversity. Multiple Generators can use different strategies and prompts to produce solutions that are hard to anticipate manually, and the filtering step keeps the quality of the final output high.

The weakness is that the quality of the Filter's rubric directly determines the final outcome. A poorly designed rubric renders the entire process useless.

It suits cases where the correct answer is unknown beforehand and the best must be chosen from multiple possibilities, or where diversity is clearly needed.

It is only superficially similar to Fan-out & Merge. Both are "multi-path parallel → single output," which makes them the easiest pair to confuse.

The key difference lies in the intent. In Fan-out & Merge, each path handles a different part of the task; results are complementary, and all paths contribute during merging. In Generate & Filter, each path handles the same task; results are competitive, and most are discarded during merging. The former is a "jigsaw puzzle," the latter a "beauty pageant."
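A small sketch of diverge-then-converge, assuming the candidate answers were already produced by several generators for the same question. The candidate texts, the three rubric criteria, and the threshold are all hypothetical; the point is only that every candidate is scored against the same rubric and most are discarded.

```python
from typing import Callable, List

# Hypothetical outputs of four generators answering the same question:
# "Should we adopt the new release?"
CANDIDATES: List[str] = [
    "Upgrade now; fees drop 30% per the changelog.",
    "It is interesting.",
    "Wait: the audit is pending, so hold until it lands.",
    "Fees, audits, roadmap, team, tokenomics, history, and much more could all matter here in many ways.",
]

def rubric(candidate: str) -> int:
    """Score 0-3: actionable, cites a checkable fact, concise."""
    text = candidate.lower()
    score = 0
    if any(verb in text for verb in ("upgrade", "wait", "hold")):
        score += 1  # gives a recommendation
    if any(ch.isdigit() for ch in candidate) or "audit" in text:
        score += 1  # points at something that can be checked
    if len(candidate) <= 60:
        score += 1  # short enough to act on
    return score

def generate_and_filter(
    candidates: List[str], score: Callable[[str], int], threshold: int
) -> List[str]:
    """Keep only candidates at or above the threshold, best first."""
    ranked = sorted(candidates, key=score, reverse=True)  # stable sort keeps ties in order
    return [c for c in ranked if score(c) >= threshold]

for kept in generate_and_filter(CANDIDATES, rubric, threshold=3):
    print(kept)
```

Two of the four candidates survive. Changing one rubric rule changes which ones do, which is the dependence on rubric quality described above.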

3.5 Tournament Pattern

The core logic is competitive elimination. N agents each independently do the same task, and their results are eliminated round by round through pairwise comparison until the best solution remains.

I've done this manually before—running two or three versions of the same code change, then having AI compare which is better. Now it can be orchestrated directly within the workflow.

The advantage lies in judgment stability. Pairwise comparison ("Which is better, A or B?") is much more stable than absolute scoring ("Score A") because it eliminates the problem of drifting scoring standards. The result goes through multiple rounds of competition, so the credibility of the final winner is high.

It is also superficially similar to Generate & Filter: both select the best from multiple candidates. The key difference is the selection mechanism. Tournament uses pairwise judges that compare candidates two at a time, "making candidates compete against each other." When the rubric is difficult to quantify or the judgment is essentially relative, this is more reliable.
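Pairwise elimination can be sketched like this, assuming a deterministic function stands in for the judge agent. The candidate fields (`tests_pass`, `lines_changed`) and the judge's preference rule are hypothetical; a real judge would be a model answering "which is better, A or B?".

```python
from typing import Callable, Dict, List

Candidate = Dict[str, object]

def judge(a: Candidate, b: Candidate) -> Candidate:
    """Hypothetical pairwise judge: passing tests wins, then the smaller change."""
    if a["tests_pass"] != b["tests_pass"]:
        return a if a["tests_pass"] else b
    return a if a["lines_changed"] <= b["lines_changed"] else b

def tournament(
    candidates: List[Candidate],
    pick: Callable[[Candidate, Candidate], Candidate],
) -> Candidate:
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    while len(pool) > 1:
        next_round = [pick(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # odd one out gets a bye this round
        pool = next_round
    return pool[0]

versions = [
    {"name": "v1", "tests_pass": True, "lines_changed": 40},
    {"name": "v2", "tests_pass": False, "lines_changed": 12},
    {"name": "v3", "tests_pass": True, "lines_changed": 25},
    {"name": "v4", "tests_pass": True, "lines_changed": 18},
    {"name": "v5", "tests_pass": True, "lines_changed": 30},
]

print(tournament(versions, judge)["name"])  # v4
```

No candidate ever receives an absolute score; the winner is simply whatever survives every comparison it takes part in.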

3.6 Loop Pattern (Loop Until Done)

The core logic is adaptive iteration: try, hit resistance, collect the error information, supplement the context, and retry until the acceptance conditions are met.

Essentially, it combats AI's randomness: try a few more times, and you'll eventually stumble upon a better result. But a more mature approach is to combine it with adversarial verification, so that each iteration runs with more information instead of relying on randomness alone.

The advantage lies in handling tasks whose workload is unknown. The other five patterns assume the task boundary is determined. Loop Until Done is the only pattern that can handle "not knowing how many rounds are needed."

The weakness is the potential risk of losing control—poorly designed stopping conditions can lead to infinite loops. The agent in each round has a brand-new context and cannot accumulate cross-round state (unless explicitly written to a file).
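A minimal sketch of the loop, assuming a stub worker that succeeds only once earlier error messages have been fed back to it. `attempt`, `loop_until_done`, and the two hints are hypothetical. The `notes` list plays the role of the file that carries state across rounds, and `max_rounds` is the hard stop that guards against an infinite loop.

```python
from typing import List, Tuple

def attempt(task: str, notes: List[str]) -> Tuple[bool, str]:
    """Hypothetical worker with a fresh context: it knows only `task` and `notes`."""
    needed = ["use API v2", "add auth header"]
    missing = [hint for hint in needed if hint not in notes]
    if missing:
        return False, missing[0]  # the error names what went wrong this round
    return True, "ok"

def loop_until_done(task: str, max_rounds: int = 5) -> Tuple[bool, int, List[str]]:
    notes: List[str] = []  # cross-round state lives outside the worker
    for round_no in range(1, max_rounds + 1):
        done, feedback = attempt(task, notes)
        if done:  # acceptance condition met
            return True, round_no, notes
        notes.append(feedback)  # next round starts with more information
    return False, max_rounds, notes  # gave up at the cap

print(loop_until_done("fetch balances"))
print(loop_until_done("fetch balances", max_rounds=2))
```

With the default cap the task is accepted in round 3. With a cap of 2 the loop stops unfinished and reports that, rather than claiming success.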

4. The Battle Between My Own Skill and the Official Workflow

Before Dynamic Workflows came out, I had specifically designed my own deep-research skill. The logic of my skill roughly went like this:

1. Give only simple information (e.g., a certain project launched a new feature).
2. Let the AI search all related materials: official documentation, source code, market sentiment.
3. Compress the information into meaningful summaries.
4. Have multiple agent roles conduct adversarial analysis to generate a report.
5. Automatically deduplicate, because multi-agent content often has high repetition rates.

After using it for a while, I thought it was quite useful. But it had a fundamental flaw: it lacked goal-oriented convergence.

Moreover, the deduplication in step five often deleted valuable information. Without deduplication, the skill would easily hand you a 10,000-word article: very comprehensive, but failing to tell you directly "what does this have to do with you, and what should you do?"

Research, however, serves decision-making. This is why many skills stop at the research itself, scoring 80 points but missing the crucial final 20.

As a result, after the AI completes the preliminary research, it often takes ten more rounds of thinking and dialogue to reach a satisfactory, comprehensive conclusion.

What More Does the Official Dynamic Workflow Do?

After experimenting with several complex research tasks this week, I found that the deep-research workflow built into Claude Code (note: not just a skill, but a module compiled and embedded into CC) adds several key steps compared with my own skill:

Problem Decomposition Layer: It doesn't start searching directly. Instead, it first asks questions, breaking my query into multiple sub-questions: What do you really want to clarify? How does this relate to you? Which dimensions are worth delving into? I used to skip this step.
Credibility Assessment: It evaluates the falsifiability of each piece of information, similar to authority scoring in traditional SEO: is the source credible? What about citation counts? I hadn't thought to add this step before.
Cross-cutting Deletion, Not Average Merging: My previous approach was to average and select across all conclusions, so the documents grew large. Dynamic Workflows runs a multi-agent vote on each conclusion and deletes those with insufficient votes instead of simply merging.
Goal-oriented Output: The final report isn't an information dump; it provides judgment and suggested solutions centered on your original goal. The key to achieving this lies in its preset ability to orchestrate multiple sub-agents. My skill often lacked this final goal orientation precisely because the weight of the original instruction decays once massive amounts of information pile up.

What Problems Do These Mechanisms Solve?

They target several typical problems when AI handles long tasks:

Goal Drift: The AI is in good shape at the start of the task, loses track in the middle, and regains its rhythm at the end, much like a human zoning out in class. The longer the task, the more obvious this is.

Premature Stopping: Partway through, the AI hits a difficulty, decides it is "done," and stops, even though the acceptance criteria aren't met at all.

Context Pollution: When a single agent handles a complex task, the large volume of preceding prompts squeezes the space left for executing later steps. A better way is to keep the preceding prompts within a few thousand tokens and distribute context across multiple agents.

Output Bias: AI tends to answer in line with your expectations, and colloquial questioning triggers this more easily.

Dynamic Workflows addresses these four problems structurally: it decomposes the problem and constrains the AI layer by layer, so that it understands the goal before acting (goal drift); it adds acceptance criteria automatically (premature stopping); it isolates context across parallel agents (context pollution); and it uses adversarial verification (output bias).

5. Summary

Finally, as a long-term research practitioner, I am astounded by this new CC mechanism. Its six built-in patterns (Router, Split & Merge, Adversarial Verification, Generate & Filter, Tournament, and Loop) cover the orchestration needs of the vast majority of complex research tasks.

I no longer need to manually design agent orchestration, nor handle deduplication and cross-validation myself; these are now built into the workflow itself.

Moreover, it's particularly suitable for open-ended exploration where information is lacking, because its built-in multi-agent orchestration and goal decomposition raise its generality once again. In fact, as early as three years ago, AI already did quite well at solving extremely clear, small problems under layered constraints. But the qualitative leap for AI truly lies in generality. Only then does it shift from simply solving one problem to becoming a true Agent, moving from fixed-form problem-solving to handling any problem.

So, Dynamic Workflows is not "smarter single conversations," but the structuring of the research process itself.

Research that originally required me to initiate more than ten independent conversations is now compressed to 3-4, although the corresponding token consumption has increased tens of times.

So why are 3-4 conversations still needed? I think the root cause lies in the following demands, which it does not yet fully meet.

First, the rigor of the verification mechanism. I mainly research new technologies on the blockchain. For many things, official documentation lags; more valuable reference sources include open-source code and on-chain transactions. Currently, the AI still defaults to official documentation as the standard rather than verifying against factual evidence.

Second, truly cross-disciplinary deep thinking. Although workflow presets can solve some of this (pre-defining sub-agents for various dimensions to think about the same issue), AI excels at mainstream thinking models. For topics that are very new, very deep, or lacking a data basis, it falls slightly short.

Third, solution design and verification. The significance of a solution lies not in proposing it but in verifying and supporting it. That relies on measuring existing mechanisms, investment, and costs. A well-trained AI can certainly do better here, but this somewhat contradicts generality.

Finally, extreme information condensation. This requires returning to an understanding of the audience's background. Some people have no foundation and need a vivid, anthropomorphic explanation, while some listeners need you to impress them in one sentence.

Related questions

Q: What are the two core issues that the six workflow patterns in Dynamic Workflows address?

A: The two core issues are: How to break down a task, and how to synthesize/merge the results.

Q: What are the key advantages of the Adversarial Verification workflow pattern?

A: Its key advantages are that it structurally eliminates self-evaluation bias by having a separate agent verify the results, and it forces the AI to seek counter-examples and base conclusions on data rather than conforming to user expectations, thus mitigating confirmation bias.

Q: What major advantage does the Fan-out & Merge workflow offer compared to sequential execution?

A: Its major advantage is speed and isolation. The total time is roughly equal to the slowest subtask, not the sum of all subtasks. Each subtask runs with independent context, preventing noise or pollution between them.

Q: According to the author, what is the fundamental flaw of their own previous deep-research skill system?

A: The fundamental flaw is a lack of goal-oriented convergence. It produced comprehensive information but failed to clearly relate it to the user's objectives or provide actionable conclusions, stopping at the research itself.

Q: What is one significant additional step the Claude Code built-in deep research workflow performs compared to the author's old skill?

A: It performs a problem decomposition layer, where it starts by asking clarifying questions to break the user's query into sub-questions about their true goals and relevant dimensions, before beginning any research.
