Thin Harness, Fat Skills: The True Source of 100x AI Productivity

marsbit · Published on 2026-04-13 · Last updated on 2026-04-13

Abstract

The article "Thin Harness, Fat Skills: The True Source of 100x AI Productivity" argues that the key to massive productivity gains in AI is not more advanced models, but a superior system architecture. This framework, "fat skills + thin harness," decouples intelligence from execution. Core components are defined:

1. **Skill Files:** Reusable markdown documents that teach a model *how* to perform a process, acting like parameterized function calls.
2. **Harness:** A thin runtime layer that manages the model's execution loop, context, and security, staying minimal and fast.
3. **Resolver:** A context router that loads the correct documentation or skill at the right time, preventing context window pollution.
4. **Latent vs. Deterministic:** A strict separation between tasks requiring AI judgment (latent space) and those needing predictable, repeatable results (deterministic).
5. **Diarization:** The critical process where the model reads all materials on a topic and synthesizes a structured, one-page summary, capturing nuanced intelligence.

The architecture prioritizes pushing intelligence into reusable skills and execution into deterministic tools, with a thin harness in between. This allows the system to learn and improve over time, as demonstrated by a YC system that matches startup founders. Skills like `/enrich-founder` and `/match` perform complex analysis and matching that pure embedding searches cannot. A learning loop allows skills to rewrite themselves based on f...

Editor's Note: While "more powerful models" have become the default answer in the industry, this article offers a different perspective: what truly creates a 10x, 100x, or even 1000x productivity gap is not the model itself, but the entire system design built around it.

The author of this article is Garry Tan, the current President and CEO of Y Combinator, who has long been deeply involved in AI and the early-stage startup ecosystem. He proposes the "fat skills + thin harness" framework, breaking down AI applications into key components such as skills, execution harness, context routing, task division, and knowledge compression.

Within this system, the model is no longer the entirety of capability but merely an execution unit; what truly determines the output quality is how you organize context, solidify processes, and delineate the boundary between "judgment" and "computation."

More importantly, this method is not just conceptual; it has been validated in real-world scenarios: faced with the task of processing and matching data for thousands of entrepreneurs, a system achieved capabilities close to a human analyst through a "read-organize-judge-write back" loop, and continuously self-optimized without rewriting code. This kind of "learning system" transforms AI from a one-time tool into infrastructure with compound effects.

Thus, the core reminder from the article becomes clear: in the AI era, the efficiency gap no longer depends on whether you use the most advanced model, but on whether you build a system capable of continuously accumulating capabilities and evolving automatically.

Below is the original text:

Steve Yegge says that people using AI programming agents are "10x to 100x more efficient than engineers who only use Cursor and chat tools to write code, roughly 1000x more efficient than Google engineers in 2005."

This is not an exaggeration. I've seen it with my own eyes, and I've experienced it myself. But when people hear such a gap, they often attribute it to the wrong reasons: a stronger model, a smarter Claude, more parameters.

In reality, the person achieving a 2x efficiency boost and the one achieving a 100x boost are using the same model. The difference isn't in "intelligence," but in "architecture," and this architecture is simple enough to be written on a card.

The Harness (Execution Framework) Is the Product Itself.

On March 31, 2026, an accident at Anthropic led to the full source code of Claude Code being published on npm—totaling 512,000 lines. I read through it all. This confirmed what I've been saying at YC (Y Combinator): the real secret isn't the model, but the "layer that wraps the model."

Real-time code repository context, prompt caching, tools designed for specific tasks, compressing redundant context as much as possible, structured session memory, sub-agents running in parallel—none of these make the model smarter. But they give the model the "right context" at the "right time," while avoiding being flooded with irrelevant information.

This layer of "wrapping" is called the harness (execution framework). And the question all AI builders should really ask is: what should go into the harness, and what should stay out?

This question actually has a very specific answer—I call it: thin harness, fat skills.

Five Definitions

The bottleneck has never been the model's intelligence. Models have long known how to reason, synthesize information, and write code.

They fail because they don't understand your data—your schema, your conventions, the specific shape of your problem. And these five definitions are precisely meant to solve this problem.

1. Skill file

A skill file is a reusable markdown document that teaches the model "how to do something." Note, it's not telling it "what to do"—that part is provided by the user. The skill file provides the process.

The key point most people miss is: a skill file is actually like a method call. It can receive parameters. You can call it with different parameters. The same set of processes, because different parameters are passed in, can exhibit completely different capabilities.

For example, there is a skill called /investigate. It contains seven steps, including: define the data scope, build a timeline, diarize each document, synthesize and summarize, argue from both positive and negative sides, and cite sources. It receives three parameters: TARGET, QUESTION, and DATASET.

If you point it at a safety scientist and 2.1 million discovery emails, it becomes a medical research analyst, judging whether a whistleblower was suppressed.

If you point it at a shell company and FEC (Federal Election Commission) filing documents, it becomes a forensic investigator, tracking coordinated political donations.

It's the same skill. The same seven steps. The same markdown file. A skill describes a judgment process, and what grounds it in the real world are the parameters passed during the call.
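The "skill as method call" idea can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the skill text below is a hypothetical stand-in (the real /investigate file is not shown), it lists only the steps named above, and `invoke_skill` is an invented helper. Only the parameter names TARGET, QUESTION, and DATASET come from the text.

```python
from string import Template

# Hypothetical skill text. Only the steps named in the article are listed.
INVESTIGATE_SKILL = Template("""\
# /investigate
Target: $TARGET
Question: $QUESTION
Dataset: $DATASET

- Define the data scope.
- Build a timeline.
- Diarize each document.
- Synthesize and summarize.
- Argue both the positive and negative sides.
- Cite sources.
""")


def invoke_skill(skill: Template, **params: str) -> str:
    """Bind parameters to a skill, the way arguments bind to a method call."""
    # substitute() raises KeyError if a required parameter is missing.
    return skill.substitute(**params)


# Same process, different parameters, different "capability".
whistleblower_case = invoke_skill(
    INVESTIGATE_SKILL,
    TARGET="a scientist",
    QUESTION="Was a whistleblower suppressed?",
    DATASET="2.1 million emails",
)
donations_case = invoke_skill(
    INVESTIGATE_SKILL,
    TARGET="a shell company",
    QUESTION="Were political donations coordinated?",
    DATASET="FEC filings",
)
```

The process text is identical in both calls; only the bound parameters differ, which is what makes the file reusable.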

This isn't prompt engineering; it's software design, except that here markdown is the programming language and human judgment is the runtime environment. In fact, markdown is even better suited to encapsulating capabilities than rigid source code, because it describes processes, judgments, and context, which is precisely the language that models "understand" best.

2. Harness (Execution Framework)

The harness is the program layer that drives the LLM's operation. It only does four things: run the model in a loop, read/write your files, manage context, and enforce security constraints.

That's it. This is "thin."
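To make the four responsibilities concrete, here is a rough sketch of what such a loop could look like. Everything in it is an assumption for illustration: the action format (`read`, `write`, `done`), the `Model` callable, and the message-count context budget are invented, and a real harness would call an actual LLM API instead of a plain function.

```python
from pathlib import Path
from typing import Callable

# Hypothetical action format: the model returns one dict per turn.
Action = dict
Model = Callable[[list[str]], Action]

MAX_CONTEXT_ITEMS = 20  # assumed budget, counted in messages for simplicity


def safe_path(workdir: Path, name: str) -> Path:
    """Security constraint: refuse any path that escapes the working directory."""
    root = workdir.resolve()
    path = (root / name).resolve()
    if not path.is_relative_to(root):
        raise PermissionError(f"{name} is outside the working directory")
    return path


def run_harness(model: Model, task: str, workdir: Path, max_steps: int = 10) -> str:
    context = [task]
    for _ in range(max_steps):                     # 1. run the model in a loop
        action = model(context)
        if action["type"] == "done":
            return action["text"]
        path = safe_path(workdir, action["path"])  # 4. enforce security constraints
        if action["type"] == "read":               # 2. read/write your files
            context.append(path.read_text())
        elif action["type"] == "write":
            path.write_text(action["text"])
            context.append(f"wrote {action['path']}")
        # 3. manage context: keep the task plus the most recent items
        context = context[:1] + context[1:][-(MAX_CONTEXT_ITEMS - 1):]
    return "stopped: step limit reached"
```

Note what is absent: no domain logic, no tool catalogue, no prompts. In this sketch those would live in skill files and deterministic tools, which is the point of keeping the harness thin.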

The anti-pattern is: fat harness, thin skills.

You must have seen this kind of thing: 40+ tool definitions, with descriptions eating up half the context window; an all-powerful God-tool, taking 2 to 5 seconds per MCP round trip; or, wrapping every REST API endpoint as a separate tool. The result is triple the token usage, triple the latency, and triple the failure rate.

The ideal approach is to use purpose-built, fast, and narrowly focused tools.

For example, use a Playwright CLI where each browser operation takes 100 milliseconds, rather than a Chrome MCP that takes 15 seconds for one screenshot → find → click → wait → read sequence. The former is 75x faster.

There's no need for software to be "over-engineered to the point of bloat" anymore. What you should do is: only build what you truly need, and nothing more.

3. Resolver

A resolver is essentially a context routing table. When task type X appears, prioritize loading document Y. Skills tell the model "how to do"; resolvers tell the model "when to load what."

For example, a developer changes a prompt. Without a resolver, they might just deploy after the change. With a resolver, the model first reads docs/EVALS.md. And this document says: run the evaluation suite first, compare the scores before and after; if accuracy drops by more than 2%, roll back and investigate the cause. This developer might not even have known an evaluation suite existed. The resolver loaded the correct context at the correct moment.
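A routing table of this kind can be very small. The sketch below assumes a plain dictionary keyed by task type; `docs/EVALS.md` is taken from the example above, while the task-type names, the second entry, and the helper functions are hypothetical.

```python
from pathlib import Path

# Hypothetical routing table: task type -> documents to load first.
# docs/EVALS.md comes from the article; the other entry is illustrative.
ROUTES: dict[str, list[str]] = {
    "prompt_change": ["docs/EVALS.md"],
    "schema_change": ["docs/MIGRATIONS.md"],
}


def resolve(task_type: str) -> list[str]:
    """Return the documents to load for a task type (none if unknown)."""
    return ROUTES.get(task_type, [])


def load_context(task_type: str, repo_root: Path) -> str:
    """Read only the routed documents, so nothing else enters the context."""
    parts = []
    for rel in resolve(task_type):
        path = repo_root / rel
        if path.exists():
            parts.append(f"## {rel}\n{path.read_text()}")
    return "\n\n".join(parts)
```

With a table like this, a prompt change pulls in the evaluation instructions automatically, while an unrelated task loads nothing extra.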

Claude Code has a built-in resolver. Each skill has a description field, and the model automatically matches user intent with the skill's description. You don't even need to remember if the /ship skill exists—the description itself is the resolver.

Frankly: my previous CLAUDE.md was a full 20,000 lines. All quirks, all patterns, all lessons I'd ever encountered, all stuffed in. Absurd. The model's attention quality noticeably declined. Claude Code even told me directly to cut it down.

The final fix was about 200 lines—just keeping a few document pointers. When a specific document is truly needed, let the resolver load it at the critical moment. This way, the 20,000 lines of knowledge are still available on demand, but don't pollute the context window.

4. Latent & Deterministic

In your system, every step belongs to one category or the other. And confusing these two is the most common error in agent design.

· Latent space is where intelligence resides. The model reads, understands, judges, and makes decisions here. This handles: judgment, synthesis, pattern recognition.

· Deterministic is where reliability resides. Same input, always the same output. SQL queries, compiled code, arithmetic operations belong on this side.

An LLM can help you seat 8 people for a dinner party, considering each person's personality and social relationships. But ask it to seat 800 people, and it will confidently generate a "seemingly reasonable, actually completely wrong" seating chart. Because that's no longer a problem for the latent space, but a deterministic problem—a combinatorial optimization problem—forced into the latent space.

The worst systems always misplace work on either side of this dividing line. The best systems draw the boundary ruthlessly.
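As an illustration of the deterministic side, the sketch below seats any number of guests with a fixed rule. It is not a solver for the full seating optimization and does not model personalities; it only shows the property that matters here: the same input always produces the same chart. The function name and the `(name, group)` input shape are assumptions.

```python
def assign_tables(guests: list[tuple[str, str]], table_size: int) -> list[list[str]]:
    """Deal guests to tables round-robin so members of a group are spread out.

    guests: (name, group) pairs. The result depends only on the set of guests,
    not on the order they arrive in.
    """
    if table_size < 1:
        raise ValueError("table_size must be at least 1")
    n_tables = -(-len(guests) // table_size)  # ceiling division
    tables: list[list[str]] = [[] for _ in range(n_tables)]
    # Sorting by (group, name) places group members next to each other in the
    # deal order, so consecutive members land on consecutive tables.
    ordered = sorted(guests, key=lambda guest: (guest[1], guest[0]))
    for i, (name, _group) in enumerate(ordered):
        tables[i % n_tables].append(name)
    return tables
```

A reasonable split under this design is to let the model decide the grouping labels (a latent-space judgment) and let code like this do the assignment.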

5. Diarization (Document Organization / Topic Profiling)

The diarization step is what truly gives AI value for real knowledge work.

It means: the model reads all materials related to a topic and then writes a structured profile. It condenses the judgments from dozens or even hundreds of documents onto one page.

This is not something an SQL query can produce. This is not something a RAG pipeline can produce. The model must actually read, hold conflicting information in its mind simultaneously, notice what changed and when, and synthesize this into structured intelligence.

This is the difference between a database query and an analyst briefing.
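The reading itself happens inside the model, but the surrounding plumbing is simple. As a rough sketch, assuming the documents fit in one context window, the prompt could be assembled like this. The function name, the section titles, and the bracketed source-id convention are all hypothetical.

```python
def build_diarization_prompt(topic: str, documents: dict[str, str]) -> str:
    """Put every document, tagged with its source id, into one prompt."""
    sections = [f"[{doc_id}]\n{text}" for doc_id, text in documents.items()]
    instructions = (
        f"Read every document below about {topic}, then write a one-page profile.\n"
        "Sections: Summary; Conflicting information; What changed and when; Sources.\n"
        "Cite document ids in square brackets."
    )
    return instructions + "\n\n" + "\n\n".join(sections)
```

Unlike a retrieval pipeline that passes along a few top-ranked chunks, this sends every document whole, so the model can notice conflicts between them.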

This Architecture

These five concepts can be combined into a very simple three-layer architecture.

· The top layer is fat skills: processes written in markdown, carrying judgment, methodology, and domain knowledge. 90% of the value is in this layer.
· The middle is a thin CLI harness: about 200 lines of code, takes JSON input, outputs text, read-only by default.
· The bottom layer is your application system: QueryDB, ReadDoc, Search, Timeline—these are the deterministic infrastructure.

The core principle is directional: push "intelligence" up into the skills as much as possible; push "execution" down into deterministic tools as much as possible; keep the harness thin and light.

The result is: whenever model capabilities improve, all skills automatically become stronger; while the underlying deterministic system remains stable and reliable.
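The middle layer's contract (JSON in, text out, read-only by default) might be sketched as follows. ReadDoc is one of the tool names listed above; WriteDoc, the request shape, the `allow_writes` flag, and the tool bodies are invented here purely to show the read-only default.

```python
import json
from typing import Callable


# Stand-ins for the deterministic layer. Bodies and argument shapes are invented.
def read_doc(args: dict) -> str:
    return f"contents of {args['doc_id']}"


def write_doc(args: dict) -> str:  # hypothetical mutating tool
    return f"saved {args['doc_id']}"


# name -> (function, mutates_state)
TOOLS: dict[str, tuple[Callable[[dict], str], bool]] = {
    "ReadDoc": (read_doc, False),
    "WriteDoc": (write_doc, True),
}


def handle(request_json: str, allow_writes: bool = False) -> str:
    """Take one JSON request, return plain text. Mutating tools are opt-in."""
    request = json.loads(request_json)
    tool, mutates = TOOLS[request["tool"]]
    if mutates and not allow_writes:
        return f"refused: {request['tool']} needs allow_writes=True"
    return tool(request.get("args", {}))
```

For example, `handle('{"tool": "ReadDoc", "args": {"doc_id": "a1"}}')` returns text immediately, while the same call for `WriteDoc` is refused unless writes are explicitly enabled.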

The Learning System

Let me use a real system we are building at YC to show how these five definitions work together.

July 2026, Chase Center. Startup School has 6000 founders attending. Everyone has structured application materials, questionnaire responses, transcripts of 1:1 conversations with mentors, and public signals: posts on X, GitHub commit history, Claude Code usage (which can indicate their development speed).

The traditional approach is: a 15-person project team reads applications one by one, makes intuitive judgments, and updates a spreadsheet.

This method works at a scale of 200 people but completely fails at 6000. No human can hold so many profiles in their mind and realize: the three strongest candidates in the AI agent infrastructure direction are a dev tools founder in Lagos, a compliance entrepreneur in Singapore, and a CLI tool developer in Brooklyn—and they described the same pain point using completely different expressions in different 1:1 conversations.

The model can do it. Here's how:

Enrichment

There is a skill called /enrich-founder that pulls all data sources, performs enrichment, diarization, and flags discrepancies between "what the founder says" and "what they actually do."

The underlying deterministic system handles: SQL queries, GitHub data, browser testing of Demo URLs, social signal scraping, CrustData queries, etc. A scheduled task runs daily. 6000 founder profiles are always up to date.

The output of diarization captures information that keyword searches completely miss:

This kind of "stated vs. actual behavior" discrepancy requires simultaneously reading GitHub commit history, application materials, and conversation transcripts, and integrating them mentally. No embedding similarity search can do this, nor can keyword filtering. The model must read completely and then make a judgment. (This is exactly the kind of task that belongs in the latent space!)
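The judgment is latent, but the evidence it relies on can come from a deterministic tool. As one hedged example, a helper like the one below could summarize where a founder's commits actually land, so the model can compare that with what the application claims. The function name and the use of the top-level directory as the "module" are assumptions.

```python
from collections import Counter


def module_share(changed_paths: list[str]) -> dict[str, float]:
    """Fraction of changed files per top-level directory."""
    counts = Counter(path.split("/", 1)[0] for path in changed_paths)
    total = sum(counts.values())
    return {module: n / total for module, n in counts.items()} if total else {}
```

The tool only counts; deciding whether the result contradicts the founder's description is left to the model.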

Matching

This is where "skill = method call" shows its power.

The same matching skill, called three times, can produce completely different strategies:

/match-breakout: processes 1200 people, clusters by domain, 30 people per group (embedding + deterministic assignment)

/match-lunch: processes 600 people, cross-domain "serendipitous matching," 8 people per table with no repeats—LLM generates themes first, then deterministic algorithm assigns seats

/match-live: processes participants live, in real time, based on nearest neighbor embedding, completes 1-to-1 matching within 200ms, excluding people already met

And the model can make judgments that traditional clustering algorithms cannot:

"Santos and Oram are both in AI infrastructure, but not competitors—Santos does cost attribution, Oram does orchestration. Should be in the same group."
"Kim's application said developer tools, but the 1:1 conversation shows they're doing SOC2 compliance automation. Should be re-categorized to FinTech / RegTech."

This re-categorization is something embeddings completely miss. The model must read the entire profile.
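The deterministic half of /match-live, nearest neighbor over embeddings while excluding people already met, can be sketched with a brute-force search. This is an illustration under assumptions: the function names and the toy two-dimensional vectors are invented, and a linear scan like this makes no claim about the 200ms budget, which at scale would likely call for a proper vector index.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    person: str,
    embeddings: dict[str, list[float]],
    already_met: set[str],
) -> Optional[str]:
    """Most similar other participant, skipping anyone already met."""
    candidates = [
        name for name in embeddings
        if name != person and name not in already_met
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: cosine(embeddings[person], embeddings[name]))
```

The similarity search finds who is close; judgments such as "not competitors" or "mis-categorized" still require the model to read the profiles.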

Learning Loop

After the event, an /improve skill reads NPS survey results, performs diarization on the "just okay" responses (not the bad ones, but the "almost good" ones), and extracts patterns.

Then, it proposes new rules and writes them back into the matching skill:

When a participant says "AI infrastructure," but 80%+ of their code is billing modules:
→ Categorize as FinTech, not AI Infra

When two people in a group already know each other:
→ Reduce matching weight
→ Prioritize introducing new relationships

These rules are written back to the skill file. They take effect automatically on the next run. The skill is "rewriting itself." At the July event, "just okay" ratings were 12%; at the next event, they dropped to 4%.

The skill file learned what "just okay" means, and the system got better without anyone rewriting code.
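Mechanically, "the skill rewrites itself" can be as simple as appending text to a markdown file. The sketch below assumes the learned rules live under a dedicated heading and that duplicates should be skipped; the heading name, the function, and the sample skill text are hypothetical, and proposing the rules remains the model's job.

```python
LEARNED_HEADING = "## Learned rules"


def write_back(skill_text: str, new_rules: list[str]) -> str:
    """Append rules under a 'Learned rules' heading, skipping ones already present."""
    lines = skill_text.rstrip("\n").split("\n")
    if LEARNED_HEADING not in lines:
        lines += ["", LEARNED_HEADING]
    existing = set(lines)
    for rule in new_rules:
        bullet = f"- {rule}"
        if bullet not in existing:
            lines.append(bullet)
            existing.add(bullet)
    return "\n".join(lines) + "\n"


skill = "# /match\n\nCluster participants by domain.\n"
rule = (
    'When a participant says "AI infrastructure" but 80%+ of their code '
    "is billing modules: categorize as FinTech, not AI Infra"
)
updated = write_back(skill, [rule])
```

Because the rule is now part of the skill text, the next invocation reads it along with everything else, with no code change and no redeploy.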

This pattern can be migrated to any domain:

Retrieve → Read → Diarize → Count → Synthesize

Then: Investigate → Survey → Diarize → Rewrite skill

If you ask what the most valuable loop in 2026 is, it's this one. It can be applied to almost all knowledge work scenarios.

Skills Are Permanent Upgrades

I recently posted an instruction for OpenClaw on X, and the response was bigger than expected:

This content received thousands of likes and over two thousand bookmarks. Many thought it was a prompt engineering trick.

Actually, it's not; it's the architecture described earlier. Every skill you write is a permanent upgrade to the system. It doesn't degrade, doesn't forget. It runs automatically at 3 AM. And when the next generation of models is released, all skills instantly become stronger—the latent judgment capabilities improve, while the deterministic parts remain stable and reliable.

This is the source of the 100x efficiency Yegge talks about.

Not a smarter model, but: Fat Skills, Thin Harness, and the discipline to solidify everything into capabilities.

The system grows with compound interest. Build it once, run it long-term.
