Editor's Note: While more and more people are asking "Will AI replace programmers?", YC President Garry Tan poses a different question: if AI can already handle most programming tasks, why are we still managing it the same way we manage ordinary software?
Earlier this year, Garry Tan spent several months building a project called Garry's List using Rails and AI Agents, amassing 540,000 lines of code. After completing it, however, he arrived at a seemingly paradoxical conclusion: the 540,000 lines of code themselves weren't important. The real value lay in GStack—a new type of development framework built around AI Agent workflows—that emerged during the development process.
In his view, the software industry has developed a collective inertia over the past few years: developers keep adding tests, validators, retry mechanisms, background jobs, and various control logic, wrapping models in layers upon layers. This approach had its merits in an era when models were expensive and limited, but when LLMs are now capable of autonomously handling vast amounts of work, these systems start to look like building a "Foxconn factory" for an ultra-intelligent worker—constraining an already-capable agent with a multitude of rules and processes.
As model costs plummet and capabilities continue to rise, the focus of software development may be shifting from "writing more code" to "designing more capabilities." The author proposes using Markdown to build skill packs (testable, reusable capability modules), allowing Agents to automatically generate code, tests, and evaluation systems, and to turn complex workflows into compounding capability assets. He even gives an example: judging hackathon submissions, which previously took days, can now be completed by an Agent in about 30 minutes.
In a sense, this article is not about programming but about the end of software's industrial logic. When code is no longer the scarcest resource, the core competencies of engineers shift too: judging what's worth building, defining problems, and crystallizing experience into reusable capabilities are becoming more important than writing more code. The author's ultimate conclusion is that the best engineers of the future might not be those who write the most code, but those who write the least yet unleash the most intelligence.
The original text follows:
In January this year, I started coding again and built Garry's List. The Rails code, plus the tests constraining it, totaled over 500,000 lines.
I was really proud of it at the time. But I shouldn't have been. The real thing to be proud of wasn't the application, but the working methodology I figured out while building it. GStack—the way I program with Agents—grew out of the process of building Garry's List. I later open-sourced it. It's now among the top 100 open-source projects in GitHub history by star count, gaining about 105,000 stars in less than three months.
Those 500,000+ lines were the "product." That working methodology was the "byproduct." And the important one is the byproduct.
So, what is the essence of 540,000 lines of code built around an LLM?
It's a Foxconn factory. A factory built for a highly intelligent AI worker. A worker who didn't need such intense monitoring, yet we built it anyway.
Shoe covers at the door. Wake up at 6 AM. Group calisthenics. Standing on the same assembly line day after day. Lives so hard that every tall building needs protective nets, because it's not a life you'd want to live. Every test, every guardrail, every retry loop is another inch of cage screwed onto this worker. A worker who could already do the job, and a thousand things you never imagined besides.
Humans and Agents alike have infinite potential, but the Foxconn logic is to extract intelligence and labor out of a beautiful life. They could do this work, and even 1,000 times more, if only we let them.
I've built such factories. Almost everyone is building them this way today. And now I want to tell you: Stop doing it.
Time Traveler
What I truly proved with 539,000 lines of code is that I can perfectly impersonate a time traveler.
A 2013 Web 2.0 engineer—the last time I could truly call myself a software engineer—thrown into 2026, holding modern tools, yet building software the only way he knows how: more code. Always more code.
The tools changed, but my instincts didn't.
The 2013 engineer believed, deep down, that capability equals lines of code. This belief was correct for decades. Until today.
Hand me Codex or Claude Code, and I could do the work of 100, even 1000 engineers. But it's still the same map, just with a faster engine, racing at top speed towards a destination that is now wrong.
This is precisely where almost every AI builder is right now. They upgraded the tools but kept the 2013 mental model.
The trap doesn't look like a trap because the code works. Garry's List did launch. For that month, I felt like I'd experienced the most productive stretch of my life.
But it was productivity in service of an outdated idea.
LLMs Were Expensive, So We Had to "Tame" Them
The old economics up until around 2025 were: LLM calls are expensive, and code is cheap.
So you'd write code to save on model calls, to constrain it, tame it, call it carefully. The architecture back then was: wrap a few precious model calls in lots of software.
But both sides of that equation have flipped.
Models are getting cheap, and cheaper every quarter. Meanwhile, models are smart enough that the value-to-cost ratio has inverted. Models can also write usable code.
So you no longer need to write code to "babysit" the model. You can tell the model what to do in natural language and have it write only the minimal code that's truly necessary.
This is just-in-time software, and we're entering its golden age.
The artifact of software has also completely changed. That Rails app was 540,000 lines of code I wrote and owned, plus the tests policing it. Its replacement is an Agent built of Markdown and a tiny bit of code, an order of magnitude smaller.
Same capabilities. Easier to read. Easier to maintain. Far more flexible. Because the behavior lives in instructions you can edit in natural language, not frozen in logic code you wrote one day.
We used to write code to watch over something, but now that something is smarter than the code.
Inside the Foxconn Factory: Even the Protective Nets Are Up
If you've been writing code recently, you've likely been building such factories without realizing it.
You can walk through your codebase and count how many lines exist solely because you didn't trust the model to do its job. In my codebase, roughly 262,000 lines were application code, and about 276,000 lines were tests policing it. The audit committee was larger than the company itself.
Some sanitizers check inputs the model could have handled. Some validators check outputs for errors the model could have spotted. Some retry loops wrap failures the model could have recovered from on its own. Every line of that code is a bet: this worker will fail.
You've made similar bets. We all have.
127 background jobs, 33 of which are scheduled tasks. This isn't capability; it's setting 33 alarms for an LLM worker who now generally shows up on time.
In my Foxconn-building days, Claude and I wrote a 1,778-line file. Its sole purpose was to fact-check the model's assertions.
It would break down every claim the model made, send them in parallel to five different sources for verification, then score them. Simple claims would first pass through a lightweight triage threshold to avoid everything going through the full process. If the first round yielded nothing, retry. Then there were backups for the backups.
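To make the shape of that file concrete, here is a minimal TypeScript sketch of the flow just described: triage, fan-out to several sources, scoring, and retry. It is not the original 1,778-line file. The names `checkClaim`, `Source`, and `TRIAGE_MAX_WORDS`, the word-count triage rule, and the scoring formula are all assumptions made for illustration, and the sources are called in a plain loop rather than in parallel to keep the example synchronous.

```typescript
// Hypothetical sketch of the control code described above; every name and
// threshold here is an illustrative assumption.
type Verdict = "supported" | "refuted" | "unknown";

// A source checks one claim. A real one would call a search API or a database.
type Source = (claim: string) => Verdict;

interface CheckedClaim {
  claim: string;
  score: number | null; // share of decisive sources that support the claim
  attempts: number;     // verification rounds actually run
  triaged: boolean;     // true if the claim skipped the full process
}

// Lightweight triage: very short claims are treated as simple (made-up rule).
const TRIAGE_MAX_WORDS = 4;

function isSimple(claim: string): boolean {
  return claim.trim().split(/\s+/).length <= TRIAGE_MAX_WORDS;
}

function checkClaim(claim: string, sources: Source[], maxAttempts = 2): CheckedClaim {
  if (isSimple(claim)) {
    return { claim, score: null, attempts: 0, triaged: true };
  }
  let attempts = 0;
  while (attempts < maxAttempts) {
    attempts += 1;
    const verdicts = sources.map((source) => source(claim));
    const decisive = verdicts.filter((v) => v !== "unknown");
    if (decisive.length > 0) {
      const supported = decisive.filter((v) => v === "supported").length;
      return { claim, score: supported / decisive.length, attempts, triaged: false };
    }
    // No source returned a decisive verdict: fall through and retry.
  }
  return { claim, score: null, attempts, triaged: false };
}

// Five stub sources standing in for the five real ones.
const always = (v: Verdict): Source => () => v;
const fiveSources: Source[] = [
  always("supported"), always("supported"), always("refuted"),
  always("unknown"), always("unknown"),
];
const example = checkClaim("The app has 127 background jobs in production", fiveSources);
// example.score is 2/3 (two of three decisive sources agree); example.attempts is 1
```

Even this toy version needs a triage rule, a scoring rule, and a retry budget, each of which is a decision made on the model's behalf.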
There's an episode of *Rick and Morty* where Rick builds a little robot at the breakfast table. The robot boots up, looks up, and asks: What is my purpose? Rick says: You pass butter. The robot slides the butter dish over, looks down at its hands, and says: Oh my god. And then it just sits there. That robot also had infinite potential. It was built to pass butter. My 276,000 lines of tests were that butter dish.
When you build software with the 2023-style "Foxconn factory" method, you're building a cage. If you're not careful, you become the guard of this AI Agent prison.
Markdown Is Now the Program
When I say Markdown, I don't mean prompts.
Prompts are ephemeral. You type a sentence, get a result, and it evaporates.
I'm talking about building. Versioned, testable, reusable building.
Markdown is the instruction layer: intent, skills, judgment, and instructions on how the work should be done. TypeScript is a thin layer of deterministic logic. It handles only the few things that truly must be code: I/O, and the parts that absolutely cannot hallucinate.
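One way to picture this split, as a sketch rather than GStack's actual layout: the Markdown carries the judgment, and the code carries only arithmetic and ordering. The skill text, the `rankByTotal` function, and the field names below are hypothetical, and the Markdown is held in a string only so the example stays self-contained.

```typescript
// Instruction layer: intent and judgment live in Markdown.
// (Held in a string here; in practice this would be a .md file.)
const SCORING_SKILL_MD = [
  "# Skill: score-submissions",
  "",
  "Score each submission from 0 to 10 on code quality and on demo quality.",
  "Explain each score in one sentence.",
  "Do not compute totals or rankings yourself; call `rankByTotal` for that.",
].join("\n");

// Deterministic layer: sums and sort order must not be hallucinated,
// so they stay in code.
interface Scored {
  team: string;
  codeQuality: number;
  demoQuality: number;
}

interface Ranked extends Scored {
  total: number;
  rank: number;
}

function rankByTotal(scores: Scored[]): Ranked[] {
  return scores
    .map((s) => ({ ...s, total: s.codeQuality + s.demoQuality }))
    // Highest total first; ties broken alphabetically so output is stable.
    .sort((a, b) => b.total - a.total || a.team.localeCompare(b.team))
    .map((s, i) => ({ ...s, rank: i + 1 }));
}

const ranked = rankByTotal([
  { team: "alpha", codeQuality: 7, demoQuality: 6 },
  { team: "beta", codeQuality: 9, demoQuality: 8 },
]);
// ranked[0] is beta (total 17, rank 1); ranked[1] is alpha (total 13, rank 2)
```

Changing how submissions are judged means editing the Markdown. The code changes only if the arithmetic does.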
More importantly, you test Markdown like you test code.
In my system, the loop comes down to one phrase: skillify it.
I'll work with an Agent to build something until it works. Then I say, "skillify it." The Agent then writes:
A Markdown skill description;
The minimal code it needs;
Unit tests for that code;
An LLM eval for the skill;
Integration tests covering both skill and code;
A resolver that lets the Agent automatically invoke this skill in relevant contexts;
And an eval for the resolver itself.
This whole package is a skill pack. It's a unit of reusable capability that compounds.
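The seven parts listed above can be written down as a manifest with a completeness check. This is a sketch under assumptions: the `SkillPack` field names, the file paths, and the rule that unit tests are required only when the pack ships code are all hypothetical, not GStack's real format.

```typescript
// Hypothetical manifest for a skill pack; field names and paths are assumptions.
interface SkillPack {
  name: string;
  skillMarkdown: string;      // the Markdown skill description
  code: string[];             // the minimal code it needs (may be empty)
  unitTests: string[];        // unit tests for that code
  skillEval: string;          // an LLM eval for the skill
  integrationTests: string[]; // tests covering both skill and code
  resolver: string;           // lets the Agent invoke the skill in context
  resolverEval: string;       // an eval for the resolver itself
}

// Returns the names of the parts a draft pack still lacks.
function missingParts(pack: Partial<SkillPack>): string[] {
  const missing: string[] = [];
  const needText: (keyof SkillPack)[] = [
    "skillMarkdown", "skillEval", "resolver", "resolverEval",
  ];
  for (const key of needText) {
    const value = pack[key];
    if (typeof value !== "string" || value.length === 0) missing.push(key);
  }
  // Unit tests are only required when the pack actually ships code.
  const code = pack.code || [];
  if (code.length > 0 && (pack.unitTests || []).length === 0) missing.push("unitTests");
  if ((pack.integrationTests || []).length === 0) missing.push("integrationTests");
  return missing;
}

const draft: Partial<SkillPack> = {
  name: "judge-hackathon",
  skillMarkdown: "skills/judge-hackathon/SKILL.md",
  code: ["src/rank.ts"],
  unitTests: ["test/rank.test.ts"],
  skillEval: "evals/skill.yaml",
  integrationTests: ["test/integration.test.ts"],
  resolver: "resolvers/judge-hackathon.md",
};
const gaps = missingParts(draft);
// gaps is ["resolverEval"]: the resolver has no eval of its own yet
```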
The real magic is the testing: coverage for the skill allows it to change without breaking. This is what separates it from vibe coding. Vibe coding is just a feeling; a skill pack has tests.
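As a rough illustration of what "an eval for the resolver" can mean, the sketch below scores a resolver against labeled requests. The keyword matcher is a deliberately naive stand-in (a real resolver would more likely be driven by the model), and `Trigger`, `EvalCase`, `resolve`, and `evalResolver` are hypothetical names.

```typescript
interface Trigger {
  skill: string;
  keywords: string[];
}

interface EvalCase {
  request: string;
  expected: string | null; // null means no skill should fire
}

// Naive resolver: pick the skill whose keywords appear most often in the request.
function resolve(request: string, triggers: Trigger[]): string | null {
  const text = request.toLowerCase();
  let best: string | null = null;
  let bestHits = 0;
  for (const t of triggers) {
    const hits = t.keywords.filter((k) => text.indexOf(k.toLowerCase()) >= 0).length;
    if (hits > bestHits) {
      best = t.skill;
      bestHits = hits;
    }
  }
  return best;
}

// The eval: fraction of labeled requests routed to the expected skill.
function evalResolver(cases: EvalCase[], triggers: Trigger[]): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter((c) => resolve(c.request, triggers) === c.expected).length;
  return passed / cases.length;
}

const triggers: Trigger[] = [
  { skill: "judge-hackathon", keywords: ["hackathon", "submissions", "judge"] },
  { skill: "expense-report", keywords: ["receipt", "expense"] },
];
const cases: EvalCase[] = [
  { request: "Judge these hackathon submissions", expected: "judge-hackathon" },
  { request: "File this receipt as an expense", expected: "expense-report" },
  { request: "What's the weather?", expected: null },
  { request: "Rank the demo videos from Saturday", expected: "judge-hackathon" },
];
const accuracy = evalResolver(cases, triggers);
// accuracy is 0.75: the last request never mentions a keyword, and the eval catches it
```

The failing case is the point: with the eval in place, the resolver can be changed and the regression or improvement shows up as a number.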
We're just now beginning to figure out, in real time, the system primitives for Agent engineering, much like inventing the stack, heap, registers, and von Neumann architecture in the early CPU era.
I believe skill packs are one such primitive. The harness is another.
Most people haven't realized this because they still measure software in lines of code.
You Can Really Build Some Crazy Things
This isn't a toy argument.
What this Agent can do already exceeds that 500,000+ line Rails app, with only a fraction of the additional code.
Take a concrete example: hackathon judging.
Two Saturdays ago, we ran a GStack/GBrain hackathon with 85 submissions. I uploaded a Google Drive folder with all the projects and said: Go.
The Agent analyzed the code quality of each repository, conducted deep research on every participant, watched and screenshotted each demo video, scored the interfaces, and ranked all 85 teams. Finally, it told me the top 5 most noteworthy applications from the batch.
Judging a hackathon, once several days of grueling work, is now about a 30-minute affair.
I didn't write code. I had OpenClaw run the task, and I guided it. When it finished, I said: skillify it.
Now it's a tarball anyone can reuse forever, applicable to any hackathon spreadsheet.
I say "skillify" almost every day now. I have over 350 skill packs. Almost every task I need to handle in my personal and work life, my Agent can now do.
This is an example of the inversion.
In the past, a capability like this would have been a real software project: requiring crawlers, scoring pipelines, video processing, research modules, ranking systems. Now, it's Markdown plus a bit of code, built by an Agent in an afternoon, and reusable by everyone.
By the way, the hackathon winner did write a piece of code I eventually polished and merged into main. Now GStack can test iOS apps on both simulators and real devices, and this entire feature was built by one person in under 8 hours during a hackathon.
Tokenmaxxing
There's an admission ticket here, but almost no one wants to pay it: you must be willing to spend on tokens.
Peter Steinberger built OpenClaw, my favorite harness. He's said he's willing to spend about $1 million per year on tokens.
Most people recoil at that number. But they shouldn't, because the gold is here: if you're willing to do this, you can live in 2028. And it will take others years to catch up.
That's also why OpenAI decided to offer $2 million in token credits to each YC company, in the form of an uncapped SAFE.
When you can turn raw intelligence into tokens, and tokens into real output that users can use, that solves real needs, and that they are willing to pay for, something magical happens.
If you're a founder, you should max out this capability. That's why I keep emphasizing skillify, because it's a methodology that truly yields good outcomes.
For an entire era, we felt LLM calls were too expensive and had to be rationed. We've been rationing them.
But now, that very instinct is what's holding people back.
If you're willing to tokenmax, to let Agents freely consume tokens and run continuously, you gain a first-mover advantage akin to the early internet days of 1994, only this time the cost is paid in tokens.
This locks out of the game the 99.99%+ of organizations still penny-pinching over a resource whose price is collapsing, and hands the lead to the few who truly see it.
For tens to hundreds of thousands of dollars a year—even less for some—you can today operate the way the entire world will be forced to in a few years.
You can live like it's 2028 in 2026. The advance payment is worth it. Because $100k in tokens today might be $10k next year, $1k the year after, and maybe $100 by the end of 2028.
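The arithmetic behind that sequence is a constant tenfold price drop per year. The sketch below works it through; the 10x factor is read off the figures in the text and is an assumption, not a forecast, and `projectedCost` and `cumulativeCost` are names chosen for this example.

```typescript
// Assumption: the price of the same workload falls by a constant factor each
// year (10x here, matching $100k -> $10k -> $1k -> $100).
function projectedCost(costToday: number, yearlyDrop: number, yearsAhead: number): number {
  return costToday / Math.pow(yearlyDrop, yearsAhead);
}

// Total paid by someone who buys the same workload every year, starting today.
function cumulativeCost(costToday: number, yearlyDrop: number, years: number): number {
  let total = 0;
  for (let y = 0; y < years; y += 1) {
    total += projectedCost(costToday, yearlyDrop, y);
  }
  return total;
}

const costByYear = [0, 1, 2, 3].map((y) => projectedCost(100000, 10, y));
// costByYear is [100000, 10000, 1000, 100]

const threeYearTotal = cumulativeCost(100000, 10, 3);
// threeYearTotal is 111000: about 90% of the spend lands in the first year
```

Under this assumption the early adopter pays almost the entire premium up front, which is why the text frames it as an advance payment.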
If you told any entrepreneur in history that six figures of capital would put them two to three years into the future ahead of everyone else, and let them hold that lead for years, 100 out of 100 qualified founders would take that deal.
The only thing standing in the way is that 2013 instinct telling you model calls are too expensive to use freely.
But they're not expensive anymore. That's the old economics. The inversion has happened.
Esalen, Not Foxconn
If writing 540,000 lines of control code was building a Foxconn factory for a worker, then the solution is to build its opposite.
There's a place on the cliffs of Big Sur called Esalen. People go there to be taken apart, remade, to shed their armor and come back more themselves.
No assembly lines. No foremen. No 6 AM whistles. Freedom, not control.
Build that.
Build a place like YC, where we help you start companies, solve real problems, find product-market fit.
Build places that let workers be free, whether those workers are human or AI.
That's the entire ethos.
Make things that free Agents. Make companies that free humans to create.
In knowledge work, the factory is the failure mode. The real goal is to build institutions that release people. Now, that goal points to Agents too.
OpenClaw is like a Ferrari you have to bring your own wrench to. The model is the engine, not the whole car. We're still in the Apple I moment, soldering breadboards.
It shipped rough. You still have to finish it yourself.
My open-sourced GBrain, retrieval engine, and skill packs aren't turnkey finished products yet.
Some say OpenClaw is unsafe. They don't understand that its freedom is its strength. Don't rush to put safety rails on something you trust before you even have a problem. The wrench in your hand is proof it's not yet in a cage.
Controlled systems are polished because control has to be total: that is the Foxconn factory. Free systems are rough because they trust you to finish them.
You have to choose which one you're building. Then look back at how much code you wrote.
What This All Means
540,000 lines of Rails code was me proving I could still play the old game at the highest level.
But that level was Web 2.0. It was a decade ago.
I could still play as well as I ever did, even be a 1000x engineer. But I was building Foxconn factories. Old code. Old game.
The new game isn't played with lines of code at all.
As it turns out, my haters were right. If you're reading this, anonymous friends, I salute you.
When you can turn intent directly into runnable, testable, reusable systems, the bottleneck is no longer how much you can build, but what you actually want and whether it's worth building.
The scarce resources become clarity, taste, and judgment.
The engineers writing the least code are often the ones building the most.
It took me 540,000 lines of code to learn this. You don't have to.







