After 540,000 Lines of Code, Garry Tan Realizes the Old Game of AI Programming Is Over

marsbit | Published on 2026-06-02 | Last updated on 2026-06-02

Abstract

YC President Garry Tan's project "Garry's List" involved over 540,000 lines of Rails code, but he concluded this approach is outdated. He argues the software industry is stuck in a "Foxconn factory" mindset: building excessive tests, validators, and control logic to constrain LLMs, which have become cheap and capable enough to work autonomously. The old paradigm treated LLM calls as expensive and code as cheap, leading to complex systems to "manage" the AI. This has now reversed. The future lies in "just-in-time software"—using natural language instructions (Markdown-based "skill packs") and minimal code, letting the AI handle the work. Tan advocates for "skillifying" workflows: after an agent completes a task, it packages the process into a reusable, tested skill pack. This shifts value from writing code to designing capabilities. For example, his agent reviewed 85 hackathon submissions in 30 minutes, a task that previously took days. He emphasizes "tokenmaxxing"—willingly spending on LLM tokens to gain a years-long competitive advantage, as costs are plummeting. The core shift is from measuring output in lines of code to focusing on clarity, taste, and judgment. The best future engineers won't write the most code, but will know what to build and how to unlock the most intelligence with the least code.

Editor's Note: While more and more people are asking "Will AI replace programmers?", YC President Garry Tan poses a different question: if AI can already handle most programming tasks, why are we still managing it the same way we manage ordinary software?

Earlier this year, Garry Tan spent several months building a project called Garry's List using Rails and AI Agents, amassing 540,000 lines of code. After completing it, however, he arrived at a seemingly paradoxical conclusion: the 540,000 lines of code themselves weren't important. The real value lay in GStack—a new type of development framework built around AI Agent workflows—that emerged during the development process.

In his view, the software industry has developed a collective inertia over the past few years: developers keep adding tests, validators, retry mechanisms, background jobs, and other control logic, wrapping models in layer upon layer. This approach had its merits when models were expensive and limited. Now that LLMs can autonomously handle vast amounts of work, these systems start to look like a "Foxconn factory" built for an ultra-intelligent worker: an already-capable agent constrained by a multitude of rules and processes.

As model costs plummet and capabilities continue to rise, the focus of software development may be shifting from "writing more code" to "designing more capabilities." The author proposes using Markdown to build skill packs (testable, reusable capability modules), allowing Agents to automatically generate code, test and evaluation systems, and to turn complex workflows into compounding capability assets. He even demonstrates an example: reviewing hackathon submissions, which previously took days, can now be completed by an Agent in just tens of minutes.

In a sense, this article is not about programming, but about the end of software industrialization logic. When code is no longer the scarcest resource, the core competencies of engineers are also shifting: judging what's worth building, how to define problems, and how to crystallize experience into reusable capabilities are becoming more important than writing more code. The author's ultimate conclusion is that the best engineers of the future might not be those who write the most code, but those who write the least yet unleash the most intelligence.

The original text follows:

In January this year, I started coding again and built Garry's List. The Rails code, plus the tests constraining it, totaled over 500,000 lines.

I was really proud of it at the time. But I shouldn't have been. The real thing to be proud of wasn't the application, but the working methodology I figured out while building it. GStack—the way I program with Agents—grew out of the process of building Garry's List. I later open-sourced it. It's now among the top 100 open-source projects in GitHub history by star count, gaining about 105,000 stars in less than three months.

Those 500,000+ lines were the "product." That working methodology was the "byproduct." And the important one is the byproduct.

So, what is the essence of 540,000 lines of code built around an LLM?

It's a Foxconn factory. A factory built for a highly intelligent AI worker. A worker who didn't need such intense monitoring, yet we built it anyway.

Shoe covers at the door. Wake up at 6 AM. Group calisthenics. Standing on the same assembly line day after day. Lives so difficult that every tall building needs protective nets, because it's not a life you'd want to live. Every test, every guardrail, every retry loop is another inch of cage screwed onto this worker. A worker who could already do the job, and even a thousand things you never imagined.

Humans and Agents alike have infinite potential, but the Foxconn logic is to extract intelligence and labor from a beautiful life. They could do this work, and even 1,000 times more, if only we allowed them to.

I've built such factories. Almost everyone is building them this way today. And now I want to tell you: Stop doing it.

Time Traveler

What I truly proved with 539,000 lines of code is that I can perfectly impersonate a time traveler.

A 2013 Web 2.0 engineer—the last time I could truly call myself a software engineer—thrown into 2026, holding modern tools, yet building software the only way he knows how: more code. Always more code.

The tools changed, but my instincts didn't.

The 2013 engineer believed, deep down, that capability equals lines of code. This belief was correct for decades. Until today.

Hand me Codex or Claude Code, and I could do the work of 100, even 1000 engineers. But it's still the same map, just with a faster engine, racing at top speed towards a destination that is now wrong.

This is precisely where almost every AI builder is right now. They upgraded the tools but kept the 2013 mental model.

The trap doesn't look like a trap because the code works. Garry's List did launch. For that month, I felt like I'd experienced the most productive stretch of my life.

But it was productivity in service of an outdated idea.

LLMs Were Expensive, So We Had to "Tame" Them

The old economics up until around 2025 were: LLM calls are expensive, and code is cheap.

So you'd write code to save on model calls, to constrain it, tame it, call it carefully. The architecture back then was: wrap a few precious model calls in lots of software.

But both sides of that equation have flipped.

Models are getting cheap, and cheaper every quarter. Meanwhile, models are smart enough that the value-to-cost ratio has inverted. Models can also write usable code.

So you no longer need to write code to "babysit" the model. You can tell the model what to do in natural language and have it write only the minimal code that's truly necessary.

This is just-in-time software, and we're entering its golden age.

The artifact of software has also completely changed. That Rails app was 540,000 lines of code I wrote and owned, plus the tests policing it. Its replacement is an Agent built of Markdown and a tiny bit of code, an order of magnitude smaller.

Same capabilities. Easier to read. Easier to maintain. Far more flexible. Because the behavior lives in instructions you can edit in natural language, not frozen in logic code you wrote one day.

We used to write code to watch over something, but now that something is smarter than the code.

Inside the Foxconn Factory: Even the Protective Nets Are Up

If you've been writing code recently, you've likely been building such factories without realizing it.

You can walk through your codebase and count how many lines exist solely because you didn't trust the model to do its job. In my codebase, roughly 262,000 lines were application code, and about 276,000 lines were tests policing it. The audit committee was larger than the company itself.

Some sanitizers check inputs the model could have handled. Some validators check outputs the model could have spotted. Some retry loops wrap model calls the model could have recovered from on its own. Every line of that code is a bet: this worker will fail.

You've made similar bets. We all have.

127 background jobs, 33 of which are scheduled tasks. This isn't capability; it's setting 33 alarms for an LLM worker who now generally shows up on time.

In my Foxconn-building days, Claude and I wrote a 1,778-line file. Its sole purpose was to fact-check the model's assertions.

It would break down every claim the model made, send them in parallel to five different sources for verification, then score them. Simple claims would first pass through a lightweight triage threshold to avoid everything going through the full process. If the first round yielded nothing, retry. Then there were backups for the backups.
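To make the shape of that file concrete, the flow described above can be sketched roughly as follows. This is a minimal illustration, not the original 1,778-line file: the names `Source`, `Verdict`, `needsFullCheck`, and `factCheck`, the word-count triage heuristic, and the scoring rule (the fraction of answering sources that agree) are all assumptions made for the example.

```typescript
// Hypothetical sketch of the fact-checking flow described above.
// Every name, the triage heuristic, and the scoring rule are assumptions.

// A source reports whether it supports a claim; null means "no answer".
type Source = (claim: string) => Promise<boolean | null>;

interface Verdict {
  claim: string;
  score: number | null; // fraction of answering sources that agree; null if none answered
  skipped: boolean;     // true when triage decided the claim was too simple to check
}

// Cheap triage: claims shorter than the threshold (in words) skip the full process.
function needsFullCheck(claim: string, triageThreshold: number): boolean {
  return claim.trim().split(/\s+/).length >= triageThreshold;
}

// Query every source in parallel, treating a failed source as "no answer".
async function checkOnce(claim: string, sources: Source[]): Promise<number | null> {
  const answers = await Promise.all(sources.map((s) => s(claim).catch(() => null)));
  const usable = answers.filter((a): a is boolean => a !== null);
  if (usable.length === 0) return null;
  return usable.filter((a) => a).length / usable.length;
}

async function factCheck(
  claims: string[],
  sources: Source[],
  triageThreshold = 4,
  maxAttempts = 2,
): Promise<Verdict[]> {
  return Promise.all(
    claims.map(async (claim): Promise<Verdict> => {
      if (!needsFullCheck(claim, triageThreshold)) {
        return { claim, score: null, skipped: true };
      }
      let score: number | null = null;
      // Retry while a round yields nothing, up to maxAttempts rounds.
      for (let attempt = 0; attempt < maxAttempts && score === null; attempt++) {
        score = await checkOnce(claim, sources);
      }
      return { claim, score, skipped: false };
    }),
  );
}
```

Even this stripped-down version needs triage, parallel fan-out, scoring, and a retry loop, which is the kind of scaffolding the essay argues a capable model no longer needs.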

There's an episode of *Rick and Morty* where Rick builds a little robot at the breakfast table. The robot boots up, looks up, and asks: What is my purpose? Rick says: You pass butter. The robot slides the butter dish over, looks down at its hands, and says: Oh my god. And then it just sits there. That robot also had infinite potential. It was built to pass butter. My 276,000 lines of tests were that butter dish.

When you build software with the 2023-style "Foxconn factory" method, you're building a cage. If you're not careful, you become the guard of this AI Agent prison.

Markdown Is Now the Program

When I say Markdown, I don't mean prompts.

Prompts are ephemeral. You type a sentence, get a result, and it evaporates.

I'm talking about building. Versioned, testable, reusable building.

Markdown is the instruction layer: intent, skills, judgment, and instructions on how the work should be done. TypeScript is a thin layer of deterministic logic. It handles only the few things that truly must be code: I/O, and the parts that absolutely cannot hallucinate.

More importantly, you test Markdown like you test code.

In my system, the loop is one word: skillify it.

I'll work with an Agent to build something until it works. Then I say, "skillify it." The Agent then writes:

A Markdown skill description;

The minimal code it needs;

Unit tests for that code;

An LLM eval for the skill;

Integration tests covering both skill and code;

A resolver that lets the Agent automatically invoke this skill in relevant contexts;

And an eval for the resolver itself.

This whole package is a skill pack. It's a unit of reusable capability that compounds.
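As a rough illustration of what those seven parts amount to, a skill pack could be modeled as a small manifest with a completeness check. The article does not publish GStack's actual format, so the `SkillPack` interface, its field names, and `missingParts` are hypothetical.

```typescript
// Hypothetical model of the seven parts of a skill pack listed above.
// Field names are illustrative assumptions, not GStack's real format.
interface SkillPack {
  name: string;
  skillDescription: string;   // path to the Markdown skill description
  code: string[];             // the minimal code the skill needs
  unitTests: string[];        // unit tests for that code
  skillEval: string;          // LLM eval for the skill
  integrationTests: string[]; // tests covering both skill and code
  resolver: string;           // lets the Agent invoke the skill in relevant contexts
  resolverEval: string;       // eval for the resolver itself
}

// Return the names of required parts that are missing or empty.
function missingParts(pack: Partial<SkillPack>): string[] {
  const required: (keyof SkillPack)[] = [
    "skillDescription",
    "code",
    "unitTests",
    "skillEval",
    "integrationTests",
    "resolver",
    "resolverEval",
  ];
  return required.filter((key) => {
    const value = pack[key];
    return value === undefined || value.length === 0;
  });
}
```

A check like this would flag a pack that has a skill description but no evals, which is the gap the essay draws between a skill pack and vibe coding.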

The real magic is the testing: coverage for the skill allows it to change without breaking. This is what separates it from vibe coding. Vibe coding is just a feeling; a skill pack has tests.
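One way to picture "an eval for the resolver" is a table of requests paired with the skill each one should trigger, scored as plain accuracy. The sketch below is an assumption throughout: a real resolver would more likely lean on the model itself, and simple keyword matching stands in here only so the example stays deterministic. `SkillEntry`, `resolveSkill`, `ResolverCase`, and `resolverAccuracy` are hypothetical names.

```typescript
// Hypothetical keyword resolver plus a tiny eval for it.
// Keyword matching is a stand-in for whatever a real resolver does.
interface SkillEntry {
  name: string;
  triggers: string[]; // phrases that suggest this skill applies
}

// Pick the skill whose triggers match the request most often; null if none match.
function resolveSkill(request: string, skills: SkillEntry[]): string | null {
  const text = request.toLowerCase();
  let best: string | null = null;
  let bestHits = 0;
  for (const skill of skills) {
    const hits = skill.triggers.filter((t) => text.includes(t.toLowerCase())).length;
    if (hits > bestHits) {
      best = skill.name;
      bestHits = hits;
    }
  }
  return best;
}

interface ResolverCase {
  request: string;
  expected: string | null; // null means "no skill should fire"
}

// Fraction of eval cases where the resolver picks the expected skill.
function resolverAccuracy(cases: ResolverCase[], skills: SkillEntry[]): number {
  if (cases.length === 0) return 0;
  const correct = cases.filter((c) => resolveSkill(c.request, skills) === c.expected).length;
  return correct / cases.length;
}
```

Re-running an eval like this after every edit to a skill's Markdown is what lets the skill change without silently breaking when it gets invoked.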

We're just now beginning to figure out, in real-time, the system primitives for Agent engineering, much like inventing the stack, heap, registers, and von Neumann architecture in the early CPU era.

I believe skill packs are one such primitive. Harness is another.

Most people haven't realized this because they still measure software in lines of code.

You Can Really Build Some Crazy Things

This isn't a toy argument.

What this Agent can do already exceeds that 500,000+ line Rails app, with only a fraction of the additional code.

Take a concrete example: hackathon judging.

Two Saturdays ago, we ran a GStack/GBrain hackathon with 85 submissions. I uploaded a Google Drive with all the projects and said: Go.

The Agent analyzed the code quality of each repository, conducted deep research on every participant, watched and screenshotted each demo video, scored the interfaces, and ranked all 85 teams. Finally, it told me the top 5 most noteworthy applications from the batch.

Judging a hackathon, once several days of grueling work, is now about a 30-minute affair.

I didn't write code. I had OpenClaw run the task, and I guided it. When it finished, I said: skillify it.

Now it's a tarball anyone can reuse forever, applicable to any hackathon spreadsheet.

I say "skillify" almost every day now. I have over 350 skill packs. Almost every task I need to handle in my personal and work life, my Agent can now do.

This is an example of the inversion.

In the past, a capability like this would have been a real software project: requiring crawlers, scoring pipelines, video processing, research modules, ranking systems. Now, it's Markdown plus a bit of code, built by an Agent in an afternoon, and reusable by everyone.

By the way, the hackathon winner did write a piece of code I eventually polished and merged into main. Now GStack can test iOS apps on both simulators and real devices, and this entire feature was built by one person in under 8 hours during a hackathon.

Tokenmaxxing

There's an admission ticket here, but almost no one wants to pay it: you must be willing to spend on tokens.

Peter Steinberger built OpenClaw, my favorite harness. He's said he's willing to spend about $1 million per year on tokens.

Most people recoil at that number. But they shouldn't, because the gold is here: if you're willing to do this, you can live in 2028. And it will take others years to catch up.

That's also why OpenAI decided to offer $2 million in token credits to each YC company, in the form of an uncapped SAFE.

When you can turn raw intelligence into tokens, and tokens into real output that users can use, that solves real needs, and that they are willing to pay for, something magical happens.

If you're a founder, you should max out this capability. That's why I keep emphasizing skillify, because it's a methodology that truly yields good outcomes.

For an entire era, we felt LLM calls were too expensive and had to be rationed. We've been rationing them.

But now, that very instinct is what's holding people back.

If you're willing to tokenmax, to let Agents freely consume tokens and run continuously, you gain a first-mover advantage akin to the early internet days of 1994, only this time the cost is paid in tokens.

This locks out of the game the 99.99%+ of organizations still penny-pinching over a resource whose price is collapsing, and hands the lead to the few who truly see it.

For tens to hundreds of thousands of dollars a year—even less for some—you can today operate the way the entire world will be forced to in a few years.

You can live like it's 2028 in 2026. The advance payment is worth it. Because $100k in tokens today might be $10k next year, $1k the year after, and maybe $100 by the end of 2028.
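The arithmetic behind those figures is three successive tenfold drops. A minimal sketch, assuming a constant drop factor per step (the factor of 10 simply mirrors the author's illustrative numbers and is not a forecast; `projectCost` is a hypothetical helper):

```typescript
// Project the cost of a fixed token workload under a constant price drop per step.
// The tenfold factor mirrors the essay's illustrative figures; it is an assumption.
function projectCost(initialCost: number, dropFactor: number, steps: number): number[] {
  const costs: number[] = [];
  let cost = initialCost;
  for (let step = 0; step <= steps; step++) {
    costs.push(cost);
    cost = cost / dropFactor;
  }
  return costs;
}

// Three tenfold drops starting from $100k.
const projected = projectCost(100000, 10, 3);
console.log(projected); // [ 100000, 10000, 1000, 100 ]
```

Under a gentler assumption, say prices halving at each step, the same $100k workload would still cost $12,500 after three steps, so how much the advance payment is worth depends heavily on how fast prices actually fall.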

If you told any entrepreneur in history that they could put in six figures of capital to arrive two to three years in the future ahead of everyone else, and hold that lead for years, 100 out of 100 qualified founders would take that deal.

The only thing standing in the way is that 2013 instinct telling you model calls are too expensive to use freely.

But they're not expensive anymore. That's the old economics. The inversion has happened.

Esalen, Not Foxconn

If 540,000 lines of control code was building a Foxconn factory for a worker, then the solution is to build its opposite.

There's a place on the cliffs of Big Sur called Esalen. People go there to be taken apart, remade, to shed their armor and come back more themselves.

No assembly lines. No foremen. No 6 AM whistles. Freedom, not control.

Build that.

Build a place like YC, where we help you start companies, solve real problems, find product-market fit.

Build places that let workers be free, whether those workers are human or AI.

That's the entire ethos.

Make things that free Agents. Make companies that free humans to create.

In knowledge work, the factory is the failure mode. The real goal is to build institutions that release people. Now, that goal points to Agents too.

OpenClaw is like a Ferrari you have to bring your own wrench to. The model is the engine, not the whole car. We're still in the Apple I moment, soldering breadboards.

It shipped rough. You still have to finish it yourself.

My open-sourced GBrain, retrieval engine, and skill packs aren't turnkey finished products yet.

Some say OpenClaw is unsafe. They don't understand that its freedom is its strength. Don't rush to put safety rails on something you trust before you even have a problem. The wrench in your hand is proof it's not yet in a cage.

Control systems are polished because control requires total control—the Foxconn factory. Free systems are rough because they trust you to finish them.

You have to choose which one you're building. Then look back at how much code you wrote.

What This All Means

540,000 lines of Rails code was me proving I could still play the old game at the highest level.

But that level was Web 2.0. It was a decade ago.

I could still play as well as I ever did, even be a 1000x engineer. But I was building Foxconn factories. Old code. Old game.

The new game isn't played with lines of code at all.

As it turns out, my haters were right. If you're reading this, anonymous friends, I salute you.

When you can turn intent directly into runnable, testable, reusable systems, the bottleneck is no longer how much you can build, but what you actually want and whether it's worth building.

The scarce resources become clarity, taste, and judgment.

The engineers writing the least code are often the ones building the most.

It took me 540,000 lines of code to learn this. You don't have to.

Related Questions

Q: According to Garry Tan, what is the real valuable outcome from his project with 540,000 lines of code?

A: The real valuable outcome is not the 540,000 lines of code itself, but the GStack framework, a new development framework built around AI Agent workflows that emerged during the development process. This represents a shift from managing software traditionally to designing reusable capability modules (skill packs).

Q: What analogy does Garry Tan use to describe the traditional approach of wrapping LLMs with extensive control logic, and why does he consider it flawed?

A: He uses the analogy of building a 'Foxconn factory' for a highly intelligent AI worker. He considers it flawed because modern LLMs are capable and autonomous, yet the traditional approach imposes excessive rules, tests, retry mechanisms, and control logic that act like cages, constraining the AI's potential rather than leveraging its full capabilities.

Q: What does Garry Tan mean by 'skillify it,' and what components does a 'skill pack' include?

A: 'Skillify it' means transforming a working capability into a reusable, tested module. A 'skill pack' includes: a Markdown skill description, minimal necessary code, unit tests for the code, an LLM evaluation for the skill, integration tests covering both skill and code, a resolver for automatic skill invocation by the Agent, and an evaluation for the resolver itself.

Q: What is 'tokenmaxxing,' and why does Garry Tan advocate for it?

A: 'Tokenmaxxing' refers to willingly spending substantial amounts on LLM tokens to maximize AI Agent capabilities and productivity. Tan advocates for it because token costs are rapidly decreasing, and investing in tokens now allows early adopters to operate with future-level efficiency, gaining a significant competitive advantage while others hesitate due to outdated cost concerns.

Q: In the new paradigm of AI-powered development, what does Garry Tan suggest will become the scarce resources for engineers?

A: He suggests that in the new paradigm, the scarce resources for engineers will shift from writing code to clarity of intent, taste, and judgment. The bottleneck becomes defining what is worth building and how to frame problems, rather than the ability to produce large volumes of code.
