The Right Way to Use Skills: 5 Reflections After Anthropic Publicly Shared Its Internal Methodology

marsbit · Published on 2026-06-08 · Last updated on 2026-06-08

Abstract

A deep dive into Anthropic's internal methodology for building effective AI "Skills" reveals five key insights for maximizing their value. First, Skills should focus on capturing "Gotchas" and tacit organizational knowledge—like common pitfalls and undocumented rules—rather than restating general information the AI already knows. Second, think of Skills as a form of "Context Engineering"; they are best structured as folders, not monolithic documents. A core `SKILL.md` file should act as a navigational index, progressively pulling in detailed references, examples, and assets only as needed to avoid overwhelming the model's context window. Third, whenever possible, automate repetitive tasks with scripts. This preserves the model's reasoning capacity for judgment and analysis, while scripts reliably handle the execution, saving tokens and improving accuracy. Instructions within a Skill provide the "why" and the expert judgment, while scripts provide the concrete "how." Fourth, a Skill's description is critical and often misunderstood. It should not be a list of features but a routing rule that clearly signals *when* the Skill should be triggered based on user intent and common phrasing. Finally, as Skills scale from personal tools to team-wide assets, management is crucial. Anthropic advocates for a lightweight, organic approach: let new Skills spread organically within small groups first. Those that prove genuinely useful through adoption naturally graduate to a formal marke...

Author: AI Product Aying

I read a blog post by the Anthropic team titled "Lessons from building Claude Code: How we use skills." This is probably the most in-depth practical summary I've seen about Skills so far.

Skills aren't that complicated, but doing them well isn't that easy either.

I remember that when Skills first became popular, everyone loved making all kinds of writing-style Skills and composition Skills. It seemed that as long as you stuffed your writing style into one, the model could consistently output in that style.

But later, after trying a bunch myself, I found it often just didn't work.

The reason is that a writing-style Skill may pack in thousands or even tens of thousands of words. Once the Skill loads, it eats up a big chunk of the context, and when the context gets heavy, the model's reasoning ability tends to drop.

You often end up with this situation: the style is learned, but the content becomes shallow, and the analytical ability weakens.

There's another common scenario.

When many people write Skills, they love stuffing them with operating instructions: step one do this, step two do that, step three do something else. When you run it, you find the model's execution isn't stable.

Later I slowly understood that much of this repetitive execution work is better fixed into a Script than written out as long Instructions.

After reading this Anthropic article, my biggest takeaway is that many people use Skills without truly understanding them.

A Skill is essentially about Context Engineering. It takes real experience to decide when knowledge should go into a Skill, when it should be split into References, when it should be written as a Script, and when Gotchas should be used to constrain the model.

After understanding how Skills work, looking back at those excellent Skills, you'll find they're never solving prompt problems; they're solving problems related to context, experience accumulation, and capability reuse.

If you want to study Skills in depth, I highly recommend reading these two articles:

https://claude.com/blog/lessons-from-building-claude-code-how-we-use-skills

https://research.perplexity.ai/articles/designing-refining-and-maintaining-agent-skills-at-perplexity

#01 Don't Write Nonsense

Skills are essentially about accumulating "tacit knowledge" within an organization. So, don't repeat common sense the model already knows in a Skill. What's truly valuable is the information the model fundamentally doesn't know.

Anthropic internally often emphasizes that what Skills really need to document are the Gotchas, the common pitfalls.

For example:

1. This table cannot be sorted by `created_at`

2. Staging returning 200 doesn't mean success

3. `request_id` and `trace_id` are the same thing

This kind of information usually lives only in employees' experience, so keep in mind what a Skill essentially is.

Skill = Writing down the experienced master's knowledge.

Through Skills, you accumulate the experience originally scattered in different people's minds.

#02 Skill is Actually Context Engineering

This might be one of Anthropic's most profound points.

A Skill is not a markdown file; it's a folder. For people who have used Skills, this sounds like stating the obvious.

But I've been mulling it over these past few days and slowly realized: they precisely want to use the folder form to express the concept of Context Engineering.

Let's look again at a typical Skill structure:

skill/
├── SKILL.md
├── references/   - detailed instructions, API references, edge cases
├── scripts/      - executable scripts
├── examples/     - examples
└── assets/       - templates, images, fixed materials

When a Skill is invoked, the model first reads SKILL.md. If we cram all information into this file, context will explode very quickly.

Assume this is a payment troubleshooting Skill, containing Stripe error code explanations, historical failure cases, troubleshooting scripts, and final report templates.

If all this content is piled into SKILL.md, every time the Skill is invoked, Claude has to read it all again.

Even if the user only wants to confirm the meaning of one error code, or to check why a payment status hasn't updated, a large amount of completely unnecessary information still gets shoved into the context.

Anthropic's approach is completely different.

SKILL.md is more like a navigation page. Its job is to tell the model where to look: when it encounters a Stripe error, go to `references` for the corresponding explanation; when it needs historical cases, check `examples` for similar issues; when it needs to actually run troubleshooting actions, run the script in `scripts`; and finally, when generating the troubleshooting report, use the template in `assets`.

The whole process is one of progressive disclosure: details are exposed only when they are needed.
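To make the "navigation page first, details on demand" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `SkillFolder` class, the file names, and the file contents are all hypothetical, and character counts stand in very roughly for context cost. It does not show how Claude actually loads Skills.

```python
from pathlib import Path
import tempfile


class SkillFolder:
    """Toy model of a Skill folder: SKILL.md is read up front, everything else on demand."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.loaded: dict[str, str] = {}  # relative path -> text currently "in context"
        self.load("SKILL.md")  # the navigation page is always loaded first

    def load(self, relative_path: str) -> str:
        """Read one file into the simulated context, only when it is asked for."""
        if relative_path not in self.loaded:
            self.loaded[relative_path] = (self.root / relative_path).read_text(encoding="utf-8")
        return self.loaded[relative_path]

    def context_chars(self) -> int:
        """Rough stand-in for context cost: total characters loaded so far."""
        return sum(len(text) for text in self.loaded.values())


def build_demo_skill(root: Path) -> None:
    """Create a tiny payment-troubleshooting Skill on disk (all contents are made up)."""
    (root / "references").mkdir()
    (root / "assets").mkdir()
    (root / "SKILL.md").write_text(
        "For Stripe error codes, read references/stripe_errors.md.\n"
        "For the final report, use assets/report_template.md.\n",
        encoding="utf-8",
    )
    (root / "references" / "stripe_errors.md").write_text("card_declined: ...\n" * 200, encoding="utf-8")
    (root / "assets" / "report_template.md").write_text("# Incident report\n" * 50, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    skill_root = Path(tmp) / "payment-troubleshooting"
    skill_root.mkdir()
    build_demo_skill(skill_root)

    skill = SkillFolder(skill_root)
    index_only = skill.context_chars()          # just the navigation page
    skill.load("references/stripe_errors.md")   # pulled in only for an error-code question
    after_reference = skill.context_chars()
    print(index_only, after_reference)
```

In this toy run the report template is never read, because nothing asked for it. That is the whole point of keeping SKILL.md small and pushing detail into subfolders.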

I strongly suggest you save the image below.

#03 Use Scripts Whenever Possible

Don't let the model waste its limited context and reasoning power on repetitive labor. Hand these tasks over to scripts.

For example, many people write Skills like this:

1. Query registration data

2. Query payment data

3. Calculate conversion rate

4. Analyze root causes

This way of writing is fine, of course. The model can complete it. But every time it executes, it has to run through the entire analysis process from the beginning.

Querying data, organizing data, handling various edge cases — this work is all repetitive.

These capabilities have already been verified countless times, so why make the model reinvent them each time? Just provide the concrete scripts directly.

And through scripts, Skill execution becomes more accurate and also saves tokens.
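As a rough illustration of what "provide the concrete script" could look like for the conversion-rate step, here is a small Python sketch. The function name, the input shape (lists of user IDs), and the edge cases it handles are assumptions made for this example; a real team script would encode its own data sources and rules.

```python
def conversion_rate(registered_user_ids: list[str], paid_user_ids: list[str]) -> float:
    """Share of registered users who also paid, as a fraction between 0 and 1."""
    registered = set(registered_user_ids)          # de-duplicate repeated registrations
    if not registered:
        return 0.0                                 # edge case: avoid dividing by zero
    converted = registered & set(paid_user_ids)    # ignore payments from unknown users
    return len(converted) / len(registered)


if __name__ == "__main__":
    # Made-up sample data: four distinct registrations, two of them paid.
    registrations = ["u1", "u2", "u3", "u4", "u4"]
    payments = ["u2", "u4", "u9"]
    print(f"conversion rate: {conversion_rate(registrations, payments):.0%}")
```

With a script like this in `scripts/`, the model only needs to call it and interpret the number, which leaves its reasoning for the last step, analyzing root causes.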

From this perspective, the Scripts in a Skill are actually solidifying organizational capability. Behind each script is often the best practice summarized by the team after countless past pitfalls.

After solidifying these capabilities, Claude can work based on this accumulated experience every time, instead of starting from scratch again and again.

So I increasingly feel that within a Skill, Instructions and Scripts solve problems at two different levels.

Instructions provide experience and judgment; Scripts provide capability and execution.

For example, a payment troubleshooting Skill might have this line:

If Stripe returns 200, don't assume payment success directly; you need to further check the `payment_events` table.

This belongs to Instructions, because it's experience. `check_payment_events()`, by contrast, belongs to Scripts, because it's execution capability.

If you only have the Script, the model knows *how* to check, but may not know *why* to check.

If you only have Instructions, the model knows it *should* check but has to re-implement the check every time. Both are indispensable.
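For a sense of what the Script half might look like, here is a hedged Python sketch of `check_payment_events()`. The article only names the function and the `payment_events` table; the column names (`payment_id`, `event_type`), the `'succeeded'` value, and the use of SQLite are all assumptions made so the example can run on its own.

```python
import sqlite3


def check_payment_events(conn: sqlite3.Connection, payment_id: str) -> bool:
    """Return True only if a 'succeeded' event was recorded for this payment.

    The table layout (payment_id, event_type) is an assumption for this sketch.
    """
    row = conn.execute(
        "SELECT 1 FROM payment_events WHERE payment_id = ? AND event_type = 'succeeded' LIMIT 1",
        (payment_id,),
    ).fetchone()
    return row is not None


# Demo with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment_events (payment_id TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO payment_events VALUES (?, ?)",
    [("pay_1", "created"), ("pay_1", "succeeded"), ("pay_2", "created")],
)

print(check_payment_events(conn, "pay_1"))  # True
print(check_payment_events(conn, "pay_2"))  # False: accepted (200) but never confirmed
```

The Instruction tells the model that a 200 response is not proof of success; the script gives it a reliable way to act on that warning without rewriting the query each time.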

#04 Description is More Like a Routing Rule

The way many people write Skill Descriptions is inherently wrong.

That's because people are used to writing them as feature introductions. For example: "PR Management Skill helps users monitor PR status, handle CI issues, automatically complete Merges."

But the problem is, the model doesn't find Skills by their functionality. When Claude Code starts up, it first scans the names and Descriptions of all Skills.

Then, based on the user's current question, it decides which Skill should be loaded.

So the most important information in the Description is not what this Skill can *do*, but under what circumstances it *should* be loaded.

The Description actually handles the routing work for the entire Skill.

In the real world, few people say "help me invoke a PR management tool." People are more likely to say: "help me keep an eye on this PR," "the CI is down again," and so on.

So a good Description should try to describe the user's *intent*, not list features.

I even think you can use a very simple method to check.

After writing the Description, delete the entire Skill, keeping only this one line Description. Then ask yourself: after the model sees the user's question, can it know when to load this Skill?

If it can't, you probably need to keep revising.
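The "keep only the Description" check can be tried out with a toy experiment. The Python sketch below scores a feature-style and an intent-style Description against things users actually say, using plain keyword overlap. This is a deliberately crude proxy and an assumption of this example: the model does not route by counting words, and the intent-style Description text here is invented for illustration.

```python
import re

# A few filler words to ignore so they do not inflate the overlap count.
STOPWORDS = {"the", "is", "a", "an", "to", "or", "on", "me", "this"}


def keywords(text: str) -> set[str]:
    """Lowercase word set with a few filler words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap(description: str, user_message: str) -> int:
    """Number of keywords the description shares with the user's message."""
    return len(keywords(description) & keywords(user_message))


FEATURE_STYLE = (
    "PR Management Skill helps users monitor PR status, handle CI issues, "
    "automatically complete Merges."
)
# Hypothetical rewrite that describes when to load the Skill.
INTENT_STYLE = (
    "Use when the user asks to keep an eye on a PR, says the CI is down or "
    "failing again, or wants a PR merged once checks pass."
)

for message in ["help me keep an eye on this PR", "the CI is down again"]:
    print(message, "->", overlap(FEATURE_STYLE, message), "vs", overlap(INTENT_STYLE, message))
```

Even with this blunt measure, the intent-style Description lines up better with how people phrase requests, which is the property the check above is meant to test.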

#05 Skill Management and Distribution

Another point is about Skill management.

When one person uses Skills, it's pretty simple. Write a few Skills yourself, maintain them yourself, upgrade them yourself. But I believe most teams will eventually face the same problem.

When Skills grow from a few to dozens, or even hundreds, how should these Skills be managed? How should they be upgraded? How should they be distributed to team members?

I think Anthropic's experience in this area is quite worth referencing.

When the team size is relatively small, Skills can travel directly with the code repository. Just put them in the project's .claude/skills directory. Everyone shares the same set of Skills and the same working methods.

But as the number of Skills increases, a new problem appears.

When Claude Code starts up, it scans the names and Descriptions of all Skills, then decides which Skill should be invoked for the current task. The more Skills there are, the higher the routing cost.
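To see why routing cost grows with the number of Skills, it helps to picture the index that gets built from names and Descriptions. The Python sketch below is an assumption-laden illustration: it supposes each Skill folder has a `SKILL.md` with `name` and `description` lines between `---` markers, and it measures cost simply as characters of routing text. Claude Code's actual scanning logic is not described in the article and is not reproduced here.

```python
from pathlib import Path
import tempfile


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse simple `key: value` lines between the leading '---' markers."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def scan_skills(skills_dir: Path) -> list[dict[str, str]]:
    """Collect name and description for every <skill>/SKILL.md under skills_dir."""
    index = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        index.append({
            "name": meta.get("name", skill_md.parent.name),
            "description": meta.get("description", ""),
        })
    return index


def index_size(index: list[dict[str, str]]) -> int:
    """Characters of name + description text that must be considered for routing."""
    return sum(len(s["name"]) + len(s["description"]) for s in index)


with tempfile.TemporaryDirectory() as tmp:
    skills_dir = Path(tmp) / ".claude" / "skills"
    for i in range(3):
        folder = skills_dir / f"demo-skill-{i}"
        folder.mkdir(parents=True)
        (folder / "SKILL.md").write_text(
            f"---\nname: demo-skill-{i}\ndescription: Use when the user asks about topic {i}.\n---\n"
            "Body text that the index does not keep.\n",
            encoding="utf-8",
        )
    index = scan_skills(skills_dir)
    print(len(index), "skills,", index_size(index), "characters of routing text")
```

Every Skill added to the directory adds its name and Description to this index, so the routing text grows linearly even though no Skill body has been loaded yet.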

This is also why Anthropic later started making a Marketplace. But what's even more interesting is how they manage the Marketplace.

When many companies encounter this problem, their first reaction is often to establish an approval process: whoever writes a Skill submits an application first, and only after approval does it enter the official Skill library. We did this internally before too, but it's very heavy. It becomes management for management's sake.

I found Anthropic's approach to be very lightweight.

Let new Skills spread in a small scope first; let colleagues install and try them themselves.

If more and more people start using it, it shows this Skill truly solves a real problem. At this stage, the author can then submit it to the formal Marketplace.

So they don't first debate whether a Skill is valuable; they first let it be tested in real usage scenarios. If many people use it, it naturally enters the formal system. The Skills that remain this way are basically the ones the team truly needs.

Related Questions

Q: According to the article, what is the fundamental purpose of a Skill in AI systems like Claude?

A: The fundamental purpose of a Skill is to be a form of Context Engineering. It aims to capture and codify the 'tacit knowledge' or 'experienced master's knowledge' within an organization, such as gotchas, common pitfalls, and specific operational insights that the AI model wouldn't inherently know. It's about solving problems related to context, experience accumulation, and capability reuse, rather than just being a lengthy prompt or instruction set.

Q: Based on Anthropic's methodology, what is the key structural concept for organizing a Skill to avoid context overload?

A: The key structural concept is to treat a Skill not as a single markdown file, but as a folder with organized subdirectories. A typical Skill folder includes `SKILL.md` (acting as a navigation page), `references/` for detailed documentation, `scripts/` for executable scripts, `examples/` for case studies, and `assets/` for templates. This structure allows for progressive exposure of information, where only the necessary components are loaded into the context as needed, preventing 'context explosion' and preserving the model's reasoning capabilities.

Q: What is the recommended distinction between 'Instructions' and 'Scripts' within a Skill, and why is it important?

A: Instructions and Scripts solve problems at different levels. Instructions provide 'experience and judgment': they tell the AI *what* to do and *why*, based on accumulated knowledge (e.g., 'If Stripe returns 200, don't assume success; check the payment_events table'). Scripts provide 'capability and execution': they are concrete, reusable pieces of code that perform repetitive tasks (e.g., a `check_payment_events()` function). This distinction is important because scripts prevent the model from wasting context and reasoning power on re-implementing verified actions, making execution more accurate and token-efficient, while instructions ensure the model applies the correct logic and understanding.

Q: What is the primary function of a Skill's Description, and what common mistake do people make when writing it?

A: The primary function of a Skill's Description is to act as a routing rule. It should clearly indicate *when* the Skill should be loaded based on the user's intent or the problem context, not just list the Skill's features. The common mistake is writing it as a feature introduction (e.g., 'This Skill helps monitor PR status...'). Instead, it should describe user intent (e.g., phrases users might say like 'help me watch this PR' or 'the CI is broken again') so the AI can accurately decide which Skill to invoke for a given query.

Q: How does Anthropic manage the distribution and evolution of Skills within a team as their number grows, according to the article?

A: Anthropic employs a lightweight, usage-driven approach. Initially, Skills are shared within a project's `.claude/skills` directory. For broader distribution and management (like in a Marketplace), they avoid heavy approval processes. Instead, new Skills are first shared informally among colleagues for installation and trial. If a Skill gains organic adoption and proves useful by solving a real problem for many users, the author can then submit it to the official Marketplace. This method ensures that only genuinely valuable and tested Skills become part of the formal system.
