After 540,000 Lines of Code, Garry Tan Realizes the Old Game of AI Programming Is Over

marsbit | Published on 2026-06-02 | Last updated on 2026-06-02

Abstract

YC President Garry Tan's project "Garry's List" involved over 540,000 lines of Rails code, but he concluded this approach is outdated. He argues the software industry is stuck in a "Foxconn factory" mindset: building excessive tests, validators, and control logic to constrain LLMs, which have become cheap and capable enough to work autonomously. The old paradigm treated LLM calls as expensive and code as cheap, leading to complex systems to "manage" the AI. This has now reversed. The future lies in "just-in-time software"—using natural language instructions (Markdown-based "skill packs") and minimal code, letting the AI handle the work. Tan advocates for "skillifying" workflows: after an agent completes a task, it packages the process into a reusable, tested skill pack. This shifts value from writing code to designing capabilities. For example, his agent reviewed 85 hackathon submissions in 30 minutes, a task that previously took days. He emphasizes "tokenmaxxing"—willingly spending on LLM tokens to gain a years-long competitive advantage, as costs are plummeting. The core shift is from measuring output in lines of code to focusing on clarity, taste, and judgment. The best future engineers won't write the most code, but will know what to build and how to unlock the most intelligence with the least code.

Editor's Note: While more and more people debate "Will AI replace programmers?", YC President Garry Tan poses another question: If AI can already handle most programming tasks, why are we still managing it the same way we manage ordinary software?

Earlier this year, Garry Tan spent several months building a project called Garry's List using Rails and AI Agents, amassing 540,000 lines of code. After completing it, however, he arrived at a seemingly paradoxical conclusion: the 540,000 lines of code themselves weren't important. The real value lay in GStack—a new type of development framework built around AI Agent workflows—that emerged during the development process.

In his view, the software industry has developed a collective inertia over the past few years: developers keep adding tests, validators, retry mechanisms, background jobs, and various control logic, wrapping models in layer upon layer. This approach had its merits when models were expensive and limited, but now that LLMs can autonomously handle vast amounts of work, these systems start to look like a "Foxconn factory" built for an ultra-intelligent worker: a multitude of rules and processes constraining an already-capable agent.

As model costs plummet and capabilities continue to rise, the focus of software development may be shifting from "writing more code" to "designing more capabilities." The author proposes using Markdown to build skill packs (testable, reusable capability modules), letting Agents automatically generate the code, tests, and evaluation systems, and turning complex workflows into compounding capability assets. He even demonstrates an example: reviewing hackathon submissions, which previously took days, can now be completed by an Agent in just tens of minutes.

In a sense, this article is not about programming, but about the end of software industrialization logic. When code is no longer the scarcest resource, the core competencies of engineers are also shifting: judging what's worth building, how to define problems, and how to crystallize experience into reusable capabilities are becoming more important than writing more code. The author's ultimate conclusion is that the best engineers of the future might not be those who write the most code, but those who write the least yet unleash the most intelligence.

The original text follows:

In January this year, I started coding again and built Garry's List. The Rails code, plus the tests constraining it, totaled over 500,000 lines.

I was really proud of it at the time. But I shouldn't have been. The real thing to be proud of wasn't the application, but the working methodology I figured out while building it. GStack—the way I program with Agents—grew out of the process of building Garry's List. I later open-sourced it. It's now among the top 100 open-source projects in GitHub history by star count, gaining about 105,000 stars in less than three months.

Those 500,000+ lines were the "product." That working methodology was the "byproduct." And the important one is the byproduct.

So, what is the essence of 540,000 lines of code built around an LLM?

It's a Foxconn factory. A factory built for a highly intelligent AI worker. A worker who didn't need such intense monitoring, yet we built it anyway.

Shoe covers at the door. Wake up at 6 AM. Group calisthenics. Standing on the same assembly line day after day. Lives so difficult that every tall building needs protective nets, because it's not a life you'd want to live. Every test, every guardrail, every retry loop is another inch of cage screwed onto this worker. A worker who could already do the job, and even a thousand things you never imagined.

Humans and Agents alike have infinite potential, but the Foxconn logic is to extract intelligence and labor out of a beautiful life. They could do this work, and even 1000 times more, if only we allowed them to.

I've built such factories. Almost everyone is building them this way today. And now I want to tell you: Stop doing it.

Time Traveler

What I truly proved with 539,000 lines of code is that I can perfectly impersonate a time traveler.

A 2013 Web 2.0 engineer—the last time I could truly call myself a software engineer—thrown into 2026, holding modern tools, yet building software the only way he knows how: more code. Always more code.

The tools changed, but my instincts didn't.

The 2013 engineer believed, deep down, that capability equals lines of code. This belief was correct for decades. Until today.

Hand me Codex or Claude Code, and I could do the work of 100, even 1000 engineers. But it's still the same map, just with a faster engine, racing at top speed towards a destination that is now wrong.

This is precisely where almost every AI builder is right now. They upgraded the tools but kept the 2013 mental model.

The trap doesn't look like a trap because the code works. Garry's List did launch. For that month, I felt like I'd experienced the most productive stretch of my life.

But it was productivity in service of an outdated idea.

LLMs Were Expensive, So We Had to "Tame" Them

The old economics up until around 2025 were: LLM calls are expensive, and code is cheap.

So you'd write code to save on model calls, to constrain it, tame it, call it carefully. The architecture back then was: wrap a few precious model calls in lots of software.

But both sides of that equation have flipped.

Models are getting cheap, and cheaper every quarter. Meanwhile, models are smart enough that the value-to-cost ratio has inverted. Models can also write usable code.

So you no longer need to write code to "babysit" the model. You can tell the model what to do in natural language and have it write only the minimal code that's truly necessary.

This is just-in-time software, and we're entering its golden age.

The artifact of software has also completely changed. That Rails app was 540,000 lines of code I wrote and owned, plus the tests policing it. Its replacement is an Agent built of Markdown and a tiny bit of code, an order of magnitude smaller.

Same capabilities. Easier to read. Easier to maintain. Far more flexible. Because the behavior lives in instructions you can edit in natural language, not frozen in logic code you wrote one day.

We used to write code to watch over something, but now that something is smarter than the code.

Inside the Foxconn Factory: Even the Protective Nets Are Up

If you've been writing code recently, you've likely been building such factories without realizing it.

You can walk through your codebase and count how many lines exist solely because you didn't trust the model to do its job. In my codebase, roughly 262,000 lines were application code, and about 276,000 lines were tests policing it. The audit committee was larger than the company itself.
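The arithmetic behind that comparison is easy to reproduce. What follows is a minimal sketch in TypeScript that uses only the two line counts quoted above; the function and field names are illustrative assumptions, not anything from GStack or Garry's List.

```typescript
// Line counts as quoted in the article; the names below are illustrative only.
interface LineCounts {
  app: number;   // application code
  tests: number; // tests policing the application code
}

// Summarize how much of a codebase exists to police the rest of it.
function auditRatio(counts: LineCounts): {
  total: number;
  testsPerAppLine: number;
  testShare: number;
} {
  const total = counts.app + counts.tests;
  return {
    total,
    testsPerAppLine: counts.tests / counts.app,
    testShare: counts.tests / total,
  };
}

const garrysList = auditRatio({ app: 262000, tests: 276000 });
console.log(garrysList.total);                         // 538000
console.log(garrysList.testsPerAppLine.toFixed(2));    // "1.05"
console.log((garrysList.testShare * 100).toFixed(1));  // "51.3"
```

On these rounded figures, slightly more than half of the codebase is tests: about 1.05 lines of test for every line of application code.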

Some sanitizers check inputs the model could have handled. Some validators check outputs for errors the model could have spotted itself. Some retry loops wrap model calls whose failures the model could have recovered from on its own. Every line of that code is a bet: this worker will fail.

You've made similar bets. We all have.

127 background jobs, 33 of which are scheduled tasks. This isn't capability; it's setting 33 alarms for an LLM worker who now generally shows up on time.

In my Foxconn-building days, Claude and I wrote a 1,778-line file. Its sole purpose was to fact-check the model's assertions.

It would break down every claim the model made, send them in parallel to five different sources for verification, then score them. Simple claims would first pass through a lightweight triage threshold to avoid everything going through the full process. If the first round yielded nothing, retry. Then there were backups for the backups.

There's an episode of *Rick and Morty* where Rick builds a little robot at the breakfast table. The robot boots up, looks up, and asks: What is my purpose? Rick says: You pass butter. The robot slides the butter dish over, looks down at its hands, and says: Oh my god. And then it just sits there. That robot also had infinite potential. It was built to pass butter. My 276,000 lines of tests were that butter dish.

When you build software with the 2023-style "Foxconn factory" method, you're building a cage. If you're not careful, you become the guard of this AI Agent prison.

Markdown Is Now the Program

When I say Markdown, I don't mean prompts.

Prompts are ephemeral. You type a sentence, get a result, and it evaporates.

I'm talking about building. Versioned, testable, reusable building.

Markdown is the instruction layer: intent, skills, judgment, and instructions on how the work should be done. TypeScript is a thin layer of deterministic logic. It handles only the few things that truly must be code: I/O, and the parts that absolutely cannot hallucinate.

More importantly, you test Markdown like you test code.

In my system, the loop is one word: skillify it.

I'll work with an Agent to build something until it works. Then I say, "skillify it." The Agent then writes:

A Markdown skill description;

The minimal code it needs;

Unit tests for that code;

An LLM eval for the skill;

Integration tests covering both skill and code;

A resolver that lets the Agent automatically invoke this skill in relevant contexts;

And an eval for the resolver itself.

This whole package is a skill pack. It's a unit of reusable capability that compounds.
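One way to picture the seven parts listed above is as a manifest that can be checked for completeness. This is a sketch under assumptions: the `SkillPack` type, its field names, and the example file names are all hypothetical, since the article lists the parts but does not describe how GStack stores them.

```typescript
// Hypothetical manifest for one skill pack. Field names and file names are
// assumptions made for illustration; the article only lists the seven parts.
interface SkillPack {
  name: string;
  skillMarkdown: string;       // 1. Markdown skill description
  code: string[];              // 2. the minimal code the skill needs
  unitTests: string[];         // 3. unit tests for that code
  skillEval: string;           // 4. LLM eval for the skill
  integrationTests: string[];  // 5. integration tests covering skill and code
  resolver: string;            // 6. resolver: when the Agent should invoke the skill
  resolverEval: string;        // 7. eval for the resolver itself
}

// Return the names of any parts that are missing or empty.
function missingParts(pack: SkillPack): string[] {
  const checks: [string, boolean][] = [
    ["skillMarkdown", pack.skillMarkdown.trim() !== ""],
    ["code", pack.code.length > 0],
    ["unitTests", pack.unitTests.length > 0],
    ["skillEval", pack.skillEval.trim() !== ""],
    ["integrationTests", pack.integrationTests.length > 0],
    ["resolver", pack.resolver.trim() !== ""],
    ["resolverEval", pack.resolverEval.trim() !== ""],
  ];
  return checks.filter(([, ok]) => !ok).map(([part]) => part);
}

// Made-up example: everything is present except the resolver eval.
const hackathonJudging: SkillPack = {
  name: "hackathon-judging",
  skillMarkdown: "# Hackathon judging\nScore each submission and rank the teams.",
  code: ["rank.ts"],
  unitTests: ["rank.test.ts"],
  skillEval: "evals/skill.md",
  integrationTests: ["integration.test.ts"],
  resolver: "Use when the user asks to judge or rank hackathon submissions.",
  resolverEval: "",
};

console.log(missingParts(hackathonJudging)); // ["resolverEval"]
```

A check like this treats a skill pack as incomplete until every part exists, which matches the idea that the tests and evals are part of the unit rather than an afterthought.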

The real magic is the testing: coverage for the skill allows it to change without breaking. This is what separates it from vibe coding. Vibe coding is just a feeling; a skill pack has tests.
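To make "coverage for the skill" concrete, here is a minimal sketch of an eval that catches a breaking edit to the Markdown. Everything in it is an assumption for illustration: the `Model` type, the `runEval` helper, and the stubbed model are hypothetical, and a real eval would call an actual LLM rather than a deterministic stub.

```typescript
// Stand-in for a model call. A real harness would call an LLM here; this
// stub is deterministic so the sketch can run offline.
type Model = (instructions: string, input: string) => string;

interface EvalCase {
  input: string;
  check: (output: string) => boolean; // property the output must satisfy
}

// Run every case against the skill's Markdown and count how many pass.
function runEval(
  model: Model,
  skillMarkdown: string,
  cases: EvalCase[]
): { passed: number; total: number } {
  let passed = 0;
  for (const c of cases) {
    if (c.check(model(skillMarkdown, c.input))) passed += 1;
  }
  return { passed, total: cases.length };
}

// Stub: follows the output format only if the instructions ask for it.
const stubModel: Model = (instructions, input) =>
  instructions.indexOf("SCORE:") !== -1
    ? "SCORE: " + Math.min(10, input.length)
    : "This submission looks fine.";

const isScoreLine = (output: string): boolean => /^SCORE: \d+$/.test(output);
const cases: EvalCase[] = [
  { input: "a todo app", check: isScoreLine },
  { input: "iOS test runner", check: isScoreLine },
];

const v1 = "Judge the submission. Reply with SCORE: <0-10> on a single line.";
const v2 = "Judge the submission and share your thoughts."; // edit dropped the format

console.log(runEval(stubModel, v1, cases)); // { passed: 2, total: 2 }
console.log(runEval(stubModel, v2, cases)); // { passed: 0, total: 2 }
```

The second run is the point: an edit to the instructions that quietly removes the output format shows up as a failing eval, the same way a regression in code shows up as a failing unit test.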

We're just now beginning to figure out, in real-time, the system primitives for Agent engineering, much like inventing the stack, heap, registers, and von Neumann architecture in the early CPU era.

I believe skill packs are one such primitive. Harness is another.

Most people haven't realized this because they still measure software in lines of code.

You Can Really Build Some Crazy Things

This isn't a toy argument.

What this Agent can do already exceeds that 500,000+ line Rails app, with only a fraction of the code.

Take a concrete example: hackathon judging.

Two Saturdays ago, we ran a GStack/GBrain hackathon with 85 submissions. I uploaded a Google Drive with all the projects and said: Go.

The Agent analyzed the code quality of each repository, conducted deep research on every participant, watched and screenshotted each demo video, scored the interfaces, and ranked all 85 teams. Finally, it told me the top 5 most noteworthy applications from the batch.

Judging a hackathon, once several days of grueling work, is now about a 30-minute affair.

I didn't write code. I had OpenClaw run the task, and I guided it. When it finished, I said: skillify it.

Now it's a tarball anyone can reuse forever, applicable to any hackathon spreadsheet.

I say "skillify" almost every day now. I have over 350 skill packs. Almost every task I need to handle in my personal and work life, my Agent can now do.

This is an example of the inversion.

In the past, a capability like this would have been a real software project: requiring crawlers, scoring pipelines, video processing, research modules, ranking systems. Now, it's Markdown plus a bit of code, built by an Agent in an afternoon, and reusable by everyone.
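As a rough picture of what "Markdown plus a bit of code" could look like for the judging example, the sketch below pairs a short instruction text with one deterministic function. It is an assumption-laden illustration: the instruction wording, the `ScoredTeam` type, the `topTeams` function, and the sample scores are all invented here and are not taken from the actual skill pack.

```typescript
// Instruction layer: what the Agent should do, in natural language.
// The wording is invented for this sketch.
const judgingSkill: string = [
  "# Hackathon judging",
  "For each submission: review the repository, research the team,",
  "watch the demo video, and give a score from 0 to 10.",
  "Then call topTeams to produce the final ranking.",
].join("\n");

interface ScoredTeam {
  team: string;
  score: number;
}

// Deterministic layer: sort by score (highest first), break ties by team
// name, and keep the top n. No model call is involved in this step.
function topTeams(scores: ScoredTeam[], n: number): ScoredTeam[] {
  return scores
    .slice() // copy so the caller's array is not mutated
    .sort((a, b) => b.score - a.score || a.team.localeCompare(b.team))
    .slice(0, n);
}

// Made-up scores, standing in for what the Agent would produce.
const sampleScores: ScoredTeam[] = [
  { team: "delta", score: 7 },
  { team: "alpha", score: 9 },
  { team: "charlie", score: 9 },
  { team: "bravo", score: 4 },
];

console.log(judgingSkill.split("\n")[0]);                  // "# Hackathon judging"
console.log(topTeams(sampleScores, 2).map((t) => t.team)); // ["alpha", "charlie"]
```

The split follows the division described earlier in the article: judgment lives in the instructions, while ranking, which must never be hallucinated, stays in a few lines of ordinary code.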

By the way, the hackathon winner did write a piece of code I eventually polished and merged into main. Now GStack can test iOS apps on both simulators and real devices, and this entire feature was built by one person in under 8 hours during a hackathon.

Tokenmaxxing

There's an admission ticket here, but almost no one wants to pay it: you must be willing to spend on tokens.

Peter Steinberger built OpenClaw, my favorite harness. He's said he's willing to spend about $1 million per year on tokens.

Most people recoil at that number. But they shouldn't, because this is where the gold is: if you're willing to do this, you can live in 2028. And it will take others years to catch up.

That's also why OpenAI decided to offer $2 million in token credits to each YC company, in the form of an uncapped SAFE.

When you can turn raw intelligence into tokens, and tokens into real output that users can use, that solves real needs, and that they are willing to pay for, something magical happens.

If you're a founder, you should max out this capability. That's why I keep emphasizing skillify, because it's a methodology that truly yields good outcomes.

For an entire era, we felt LLM calls were too expensive and had to be rationed. We've been rationing them.

But now, that very instinct is what's holding people back.

If you're willing to tokenmax, to let Agents freely consume tokens and run continuously, you gain a first-mover advantage akin to the early internet days of 1994, only this time the cost is paid in tokens.

This locks out of the game the 99.99%+ of organizations still penny-pinching over a resource whose price is collapsing, and hands the lead to the few who truly see it.

For tens to hundreds of thousands of dollars a year—even less for some—you can today operate the way the entire world will be forced to in a few years.

You can live like it's 2028 in 2026. The advance payment is worth it. Because $100k in tokens today might be $10k next year, $1k the year after, and maybe $100 by the end of 2028.
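The figures in that last sentence amount to a price that falls roughly tenfold at each step. A minimal sketch of the projection follows, assuming a constant 10x drop per step; that factor is read off the article's rough numbers, not a measured rate, and `projectSpend` is a hypothetical helper.

```typescript
// Project the cost of a fixed workload, assuming the price falls by a
// constant factor at each step. The 10x factor is an assumption taken from
// the article's rough figures, not a measured rate.
function projectSpend(startUsd: number, dropFactor: number, steps: number): number[] {
  const out: number[] = [];
  for (let k = 0; k <= steps; k++) {
    out.push(startUsd / Math.pow(dropFactor, k));
  }
  return out;
}

const spend = projectSpend(100000, 10, 3);
console.log(spend); // [100000, 10000, 1000, 100]

// Share of the total paid at the first step, i.e. the price of being early.
const total = spend.reduce((sum, x) => sum + x, 0);
console.log(total);                               // 111100
console.log((spend[0] / total * 100).toFixed(0)); // "90"
```

Under this assumption, about 90% of the total spend lands in the first step, which is the "advance payment" the argument asks founders to accept.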

If you told any entrepreneur in history: you can put in six figures of capital to get yourself two to three years into the future ahead of everyone else, and maintain that lead for years, 100 out of 100 qualified founders would take that deal.

The only thing standing in the way is that 2013 instinct telling you model calls are too expensive to use freely.

But they're not expensive anymore. That's the old economics. The inversion has happened.

Esalen, Not Foxconn

If 540,000 lines of control code was building a Foxconn factory for a worker, then the solution is to build its opposite.

There's a place on the cliffs of Big Sur called Esalen. People go there to be taken apart, remade, to shed their armor and come back more themselves.

No assembly lines. No foremen. No 6 AM whistles. Freedom, not control.

Build that.

Build a place like YC, where we help you start companies, solve real problems, find product-market fit.

Build places that let workers be free, whether those workers are human or AI.

That's the entire ethos.

Make things that free Agents. Make companies that free humans to create.

In knowledge work, the factory is the failure mode. The real goal is to build institutions that release people. Now, that goal points to Agents too.

OpenClaw is like a Ferrari you have to bring your own wrench to. The model is the engine, not the whole car. We're still in the Apple I moment, soldering breadboards.

It shipped rough. You still have to finish it yourself.

My open-sourced GBrain, retrieval engine, and skill packs aren't turnkey finished products yet.

Some say OpenClaw is unsafe. They don't understand that its freedom is its strength. Don't rush to put safety rails on something you trust before you even have a problem. The wrench in your hand is proof it's not yet in a cage.

Control systems are polished because control requires total control—the Foxconn factory. Free systems are rough because they trust you to finish them.

You have to choose which one you're building. Then look back at how much code you wrote.

What This All Means

540,000 lines of Rails code was me proving I could still play the old game at the highest level.

But that level was Web 2.0. It was a decade ago.

I could still play as well as I ever did, even be a 1000x engineer. But I was building Foxconn factories. Old code. Old game.

The new game isn't played with lines of code at all.

As it turns out, my haters were right. If you're reading this, anonymous friends, I salute you.

When you can turn intent directly into runnable, testable, reusable systems, the bottleneck is no longer how much you can build, but what you actually want and whether it's worth building.

The scarce resources become clarity, taste, and judgment.

The engineers writing the least code are often the ones building the most.

It took me 540,000 lines of code to learn this. You don't have to.

Related Questions

Q: According to Garry Tan, what is the real valuable outcome from his project with 540,000 lines of code?

A: The real valuable outcome is not the 540,000 lines of code itself, but the GStack framework, a new development framework built around AI Agent workflows that emerged during the development process. This represents a shift from managing software traditionally to designing reusable capability modules (skill packs).

Q: What analogy does Garry Tan use to describe the traditional approach of wrapping LLMs with extensive control logic, and why does he consider it flawed?

A: He uses the analogy of building a 'Foxconn factory' for a highly intelligent AI worker. He considers it flawed because modern LLMs are capable and autonomous, but the traditional approach imposes excessive rules, tests, retry mechanisms, and control logic, like cages, that constrain the AI's potential rather than leveraging its full capabilities.

Q: What does Garry Tan mean by 'skillify it,' and what components does a 'skill pack' include?

A: 'Skillify it' means transforming a working capability into a reusable, tested module. A 'skill pack' includes: a Markdown skill description, minimal necessary code, unit tests for the code, an LLM evaluation for the skill, integration tests covering both skill and code, a resolver for automatic skill invocation by the Agent, and an evaluation for the resolver itself.

Q: What is 'tokenmaxxing,' and why does Garry Tan advocate for it?

A: 'Tokenmaxxing' refers to willingly spending substantial amounts on LLM tokens to maximize AI Agent capabilities and productivity. Tan advocates for it because token costs are rapidly decreasing, and investing in tokens now allows early adopters to operate with future-level efficiency, gaining a significant competitive advantage while others hesitate due to outdated cost concerns.

Q: In the new paradigm of AI-powered development, what does Garry Tan suggest will become the scarce resources for engineers?

A: He suggests that in the new paradigm, the scarce resources for engineers will shift from writing code to clarity of intent, taste, and judgment. The bottleneck becomes defining what is worth building and how to frame problems, rather than the ability to produce large volumes of code.
