Late on the night of February 9th, Beijing time, tens of millions of developers around the world opened GitHub and saw the same page.
It wasn't a 404. It was something more unsettling: the yellow warning banner that sends a chill down every engineer's spine, and a status page where indicator after indicator was turning from green to red.
github.com was down.
The API was down.
GitHub Actions was down.
Git operations were down. Even Copilot wasn't spared.
That night, some teams' CI/CD pipelines stalled at the most critical moment, some automated deployments froze halfway, and some developers sat waiting on a pull request that just wouldn't merge, with a feature behind it waiting to go live for real users.
GitHub later published an incident report. In technical terms, the root cause was "an overload of a core database cluster responsible for authentication and user management." Behind those words lay a startling chain of events.
Two days earlier, the engineering team, in a hurry to roll out a new model to users, had changed the refresh interval of a "user settings cache" from 12 hours to 2 hours. Just one number in a configuration.
The result: cache rewrites that had been spread across 12 hours were compressed into 2, creating a dense "cache rewrite storm." Asynchronous task queues were quickly overwhelmed, shared infrastructure components failed, and the effects cascaded into the services that proxy Git operations over HTTPS, eventually exhausting all of the platform's available connections.
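To see why one number mattered so much, here is a back-of-the-envelope sketch, not GitHub's code. It assumes every cached entry must be rewritten once per refresh window and that rewrites are spread evenly across the window; the cache size `ENTRIES` is a made-up placeholder, since the real figure was not disclosed. Under those assumptions, cutting the window from 12 hours to 2 multiplies the steady rewrite rate by six, and a real storm can peak higher still if expirations bunch together.

```python
# Rough model (an assumption, not GitHub's implementation): each cached entry
# is rewritten once per refresh window, spread evenly across that window.

SECONDS_PER_HOUR = 3600


def rewrite_rate_per_second(num_entries: int, refresh_hours: float) -> float:
    """Average cache rewrites per second when num_entries are refreshed
    uniformly over a window of refresh_hours."""
    return num_entries / (refresh_hours * SECONDS_PER_HOUR)


def load_multiplier(old_hours: float, new_hours: float) -> float:
    """Growth in steady-state rewrite rate when the refresh window shrinks
    from old_hours to new_hours, with the number of entries unchanged."""
    return old_hours / new_hours


# Hypothetical cache size, for illustration only.
ENTRIES = 100_000_000

before = rewrite_rate_per_second(ENTRIES, 12)
after = rewrite_rate_per_second(ENTRIES, 2)
print(f"12h window: {before:,.0f} rewrites/s")
print(f" 2h window: {after:,.0f} rewrites/s")
print(f"multiplier: {load_multiplier(12, 2):.0f}x")
```

The multiplier depends only on the ratio of the two windows, so it holds whatever the true cache size was; the absolute rates are only as meaningful as the placeholder.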
One number, changed from 12 to 2.
GitHub was brought down by a configuration change it made itself.
But if you only see that one config change, you've probably missed the most important part of this story.
01 Not One Accident, But Ten
The February 9th incident was not an isolated event.
In fact, GitHub suffered at least 8 major incidents in the first three months of 2026, and February alone saw 37 recorded failures, large and small. GitHub CTO Vlad Fedorov later acknowledged in a blog post that in two of those months GitHub had failed to maintain the "three nines" (99.9% availability) it promises its enterprise customers.
Look through the failure records for those two months and a peculiar pattern emerges: each incident seems to have a different cause.
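For a sense of how little slack "three nines" leaves, the sketch below computes the raw downtime budget for a month. It is only arithmetic: how GitHub actually measures its SLA (per service, exclusions, reporting window) is not described in the article, and the 4-hour figure rounds up the "nearly 4 hours" of the February 2nd Actions outage.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability: float, days: int) -> float:
    """Minutes of allowed downtime over `days` days at the given
    availability target (e.g. 0.999 for "three nines")."""
    return (1.0 - availability) * days * MINUTES_PER_DAY


# February 2026 has 28 days; a 30-day month is shown for comparison.
feb_budget = downtime_budget_minutes(0.999, 28)
month_budget = downtime_budget_minutes(0.999, 30)
print(f"99.9% over 28 days: {feb_budget:.1f} min")    # 40.3
print(f"99.9% over 30 days: {month_budget:.1f} min")  # 43.2

# One outage of roughly 4 hours, compared with the February budget.
outage_minutes = 4 * 60
print(f"4-hour outage: {outage_minutes / feb_budget:.1f}x the monthly budget")
```

Under this simple reading, a single multi-hour outage on one service consumes several months' worth of downtime budget at once.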
February 2nd: A problem with Azure, GitHub's compute provider, took GitHub Actions down for nearly 4 hours and also affected Copilot Chat, CodeQL, Dependabot, and more.
February 9th: Cache rewrite storm, authentication database overload.
March 5th: Redis cluster failure; 95% of GitHub Actions workflows failed to start within 5 minutes, with an average delay of 30 minutes.
March 18th: Webhook latency spiked to 32 times the normal level.
Each one looked like an "accident," each with a different immediate cause. But Fedorov's explanation strings them together into the same story. He said these incidents share three common structural causes: "rapid load growth, tight coupling between services leading to localized failure propagation, and systems lacking protection capabilities against abnormal client traffic."
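Fedorov's third cause, systems "lacking protection capabilities against abnormal client traffic," is the kind of gap that per-client rate limiting is meant to close. The sketch below uses a token bucket, a standard technique swapped in here for illustration; the article does not say which mechanism GitHub uses, and the `rate`/`capacity` values and the client ID are hypothetical.

```python
import time


class TokenBucket:
    """Per-client token bucket: allows bursts of up to `capacity` requests
    and refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per client, so a single runaway client cannot starve the rest.
buckets: dict[str, TokenBucket] = {}


def handle_request(client_id: str) -> int:
    """Return an HTTP-style status: 200 if admitted, 429 if throttled."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(rate=5, capacity=10)
    return 200 if buckets[client_id].allow() else 429


# Usage: a client firing 15 requests at once gets its burst of 10 through,
# then starts receiving 429s until tokens refill.
statuses = [handle_request("agent-42") for _ in range(15)]
print(statuses.count(200), statuses.count(429))
```

Keying buckets by client is the design choice that matters here: throttling a misbehaving agent at the edge keeps its traffic from reaching shared components like the databases and queues named in the incidents above.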
Put plainly, GitHub's foundations are starting to crack under the pressure of new kinds of load.
And this "new load" has a specific name.
02 275 Million Commits Per Week
Key Data
Total commits for all of 2025: Approximately 1 billion
Weekly commit volume in 2026: 275 million
Projected annual total for 2026 at this rate: 14 billion (a 14-fold year-over-year increase)
GitHub Actions compute minutes: 5 billion per week in 2023 → 10 billion per week in 2025 → 21 billion in a single week in early 2026
If you're a GitHub infrastructure engineer, the comparison between your monitoring dashboard in 2025 and 2026 would probably leave you speechless.
Throughout 2025, GitHub processed around 1 billion code commits. That number is already massive, the product of years of platform growth. But in 2026, *weekly* commit volume reached 275 million. Do the math: if that pace holds for the full year, 2026 will end with roughly 14 billion commits, about 14 times the total for all of 2025.
This isn't a smooth growth curve; it's a cliff. GitHub Actions compute minutes are even more telling: 5 billion minutes per week in 2023, doubling to 10 billion in 2025, then 21 billion in a single week in early 2026.
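The projection above is simple arithmetic, sketched here using only the figures reported in the article and assuming a 52-week year. Note that the 21 billion Actions figure is a single week in early 2026, so it is a peak comparison rather than an annual average.

```python
# Figures as reported in the article; this only checks the arithmetic.
commits_2025 = 1_000_000_000        # total commits, all of 2025
weekly_commits_2026 = 275_000_000   # commits per week in 2026
WEEKS_PER_YEAR = 52                 # assumption: full 52-week year

projected_2026 = weekly_commits_2026 * WEEKS_PER_YEAR
growth = projected_2026 / commits_2025
print(f"Projected 2026 commits: {projected_2026 / 1e9:.1f} billion")  # 14.3
print(f"Growth over 2025: {growth:.1f}x")                             # 14.3

# GitHub Actions compute minutes per week, in billions.
actions = {"2023": 5, "2025": 10, "early 2026": 21}
print(f"2023 -> 2025: {actions['2025'] / actions['2023']:.1f}x")              # 2.0
print(f"2025 -> early 2026: {actions['early 2026'] / actions['2025']:.1f}x")  # 2.1
```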
What's submitting code so frantically?
Not human developers.
GitHub's data shows that AI Agents are becoming the most active "users" on the platform. Claude Code alone now accounts for 4.5% of all commits to public repositories on GitHub: 2.6 million commits per week, up from just 100,000 in late September 2025, a roughly 25-fold increase in about three months.
The number of PRs opened by AI Agents is exploding too. In September 2025, AI-generated PRs numbered about 4 million per month; by March 2026 the figure had jumped to 17 million, more than a fourfold increase in half a year.
A picture might help you understand what this means.
Before, GitHub's "users" were mainly human programmers. They work during the day, sleep at night, and rest on weekends. Each commit involves thought and hesitation, and there are limits to how fast they can type. System load followed human schedules, with peaks and troughs that could be predicted.
Now, more and more of those "users" are AI Agents. They don't sleep, don't rest, and don't hesitate. One task can spawn several Agents running in parallel, and a single Agent can easily commit more code in an hour than a human engineer does in a week. More importantly, they don't just commit code; they constantly create new repositories, treating a repository as the "output artifact" of a workflow rather than a human's "workspace."
GitHub's infrastructure engineers are no longer facing a larger version of the same problem, but a fundamentally different kind of problem.
03 Copilot Can't Keep Burning Money
Frequent failures are only one side of the problem. GitHub has another, even thornier headache: when it ran the numbers, it found it was losing money.
Copilot's original pricing rested on a reasonable assumption: users mainly relied on "assistive completion," each interaction was brief, and compute demand was predictable. Per-seat pricing, $10/month for the individual plan and $19/month for the business plan, worked well for several years.
Then, Agentic AI arrived.
Agentic workflows and traditional completion are different species. Standard code completion consists of linear, predictable requests with short compute cycles. An agentic coding session might run for hours, spawn multiple parallel threads, and perform multi-step reasoning, self-correction, and cross-repository refactoring; the tokens consumed in a single session can easily cost more than an average user's entire monthly subscription.
GitHub now faces a situation where a minority of heavy agentic users consume compute worth hundreds of dollars while paying a flat monthly fee of $10 or $19.
Faced with this, GitHub's response was direct: throttle usage first, then change the price.
Starting early this year, GitHub introduced two parallel rate limits for Copilot: session caps and weekly usage caps, both calculated as token consumption multiplied by a model's compute weight. At the same time, it paused new registrations for some individual Copilot plans.
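The "token consumption multiplied by model compute weight" rule can be sketched as follows. This is a minimal illustration, not GitHub's implementation: the model names, weights, and cap values are hypothetical placeholders, and both caps are expressed in the same weighted units.

```python
from dataclasses import dataclass

# Hypothetical per-model compute weights; the article gives no actual values.
MODEL_WEIGHTS = {"small-model": 0.5, "standard-model": 1.0, "large-model": 4.0}


@dataclass
class UsageLimiter:
    session_cap: float         # weighted units allowed in one session
    weekly_cap: float          # weighted units allowed per week
    session_used: float = 0.0
    weekly_used: float = 0.0

    def charge(self, tokens: int, model: str) -> bool:
        """Admit a request only if both caps still have room for its
        weighted cost (tokens x model weight); otherwise refuse it."""
        cost = tokens * MODEL_WEIGHTS[model]
        if self.session_used + cost > self.session_cap:
            return False
        if self.weekly_used + cost > self.weekly_cap:
            return False
        self.session_used += cost
        self.weekly_used += cost
        return True

    def new_session(self) -> None:
        self.session_used = 0.0

    def new_week(self) -> None:
        self.session_used = 0.0
        self.weekly_used = 0.0


# Usage: the same 2,000 tokens cost four times as much on the heavier model.
limiter = UsageLimiter(session_cap=10_000, weekly_cap=25_000)
print(limiter.charge(2_000, "large-model"))  # True: 8,000 of 10,000 used
print(limiter.charge(1_000, "large-model"))  # False: would reach 12,000
```

Weighting by model means a user who switches to a more expensive model hits the same cap sooner, which is what lets a single pair of limits cover very different compute costs.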
On June 1st, GitHub completed a more fundamental pricing overhaul: Copilot switched fully to usage-based billing, replacing the old plan fees with "AI Credits." One AI Credit equals one US cent, and usage is metered in real time based on token consumption.
Faced with agentic AI, per-seat pricing has reached the end of the road.
This shift isn't only GitHub's headache. It is a pricing crisis the entire AI tools industry is going through in 2026: once AI starts executing entire workflows in place of humans, rather than merely "assisting" them, any subscription model built on "per user per month" becomes unsustainable.
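A sketch of how metering in AI Credits might work. Only the conversion (1 AI Credit = 1 US cent) comes from the article; the per-token rates, model names, and the rounding to hundredths of a credit are assumptions chosen for illustration. `Decimal` is used to avoid floating-point drift in money amounts.

```python
from decimal import Decimal, ROUND_HALF_UP

CENTS_PER_CREDIT = Decimal(1)  # from the article: 1 AI Credit = 1 US cent

# Hypothetical prices in US cents per 1,000 tokens; real rates differ by model.
CENTS_PER_1K_TOKENS = {
    "standard-model": Decimal("0.2"),
    "large-model": Decimal("1.5"),
}


def credits_for(tokens: int, model: str) -> Decimal:
    """AI Credits consumed by one request, rounded to 0.01 credit."""
    cents = Decimal(tokens) / 1000 * CENTS_PER_1K_TOKENS[model]
    credits = cents / CENTS_PER_CREDIT
    return credits.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def monthly_bill_usd(usage: list[tuple[int, str]]) -> Decimal:
    """Sum credits over (tokens, model) records and convert to dollars."""
    total_credits = sum((credits_for(t, m) for t, m in usage), Decimal(0))
    return (total_credits * CENTS_PER_CREDIT / 100).quantize(Decimal("0.01"))


# Usage: a heavy agentic month of 20 million large-model tokens.
print(monthly_bill_usd([(20_000_000, "large-model")]))
```

Under these placeholder rates, that month bills 30,000 credits, or $300: the same order of magnitude as the heavy-user cost gap described above, but now charged to the user instead of absorbed by a flat seat price.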
04 30 Times, Not 10 Times
Back to the infrastructure problem. How does GitHub actually plan to handle this "14-fold growth"?
A detail here illustrates the severity of the situation:
In late December 2025, agentic workflows suddenly began to accelerate, and GitHub's engineers realized that planning for 10x growth would not be enough. By February 2026, after that major outage, GitHub announced it needed to redesign its architecture for 30 times today's scale.
Not scaling, but redesigning.
The difference between these two words is significant. Scaling means adding more machines and more memory to existing databases: same direction, just bigger. Redesigning means the architecture's underlying assumptions will break down systematically at 30x scale, forcing a ground-up rethink of how services are split, how data flows, and how failures are isolated.
The specific directions GitHub has disclosed include decoupling critical services to prevent cascading failures, adding backpressure and traffic degradation capabilities, moving hotspot services onto dedicated hosts, eliminating single points of failure, and tightening change management, so that a change like cutting a cache TTL from 12 hours to 2 no longer goes live without sufficient load testing.
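Backpressure means refusing or deferring new work once downstream capacity is exhausted, instead of letting queues grow until they take shared components down with them, the kind of pile-up the February 9th report describes. The sketch below shows the idea with a bounded queue plus a degraded mode near capacity; it is a generic illustration, and the class name, thresholds, and status strings are hypothetical rather than anything GitHub has published.

```python
import queue


class BoundedIngress:
    """Backpressure sketch: a bounded work queue that rejects new work when
    full and signals a degraded (cheaper) mode when it is nearly full."""

    def __init__(self, max_depth: int, degrade_at: float = 0.8):
        self.q: queue.Queue = queue.Queue(maxsize=max_depth)
        self.max_depth = max_depth
        self.degrade_at = degrade_at  # fraction of max_depth that triggers degradation

    def submit(self, job) -> str:
        try:
            self.q.put_nowait(job)
        except queue.Full:
            # Push back on the caller (e.g. an HTTP 429/503 with Retry-After)
            # rather than buffering unbounded work.
            return "rejected"
        if self.q.qsize() >= self.degrade_at * self.max_depth:
            # Accepted, but callers should switch to a degraded response path.
            return "accepted-degraded"
        return "accepted"

    def take(self):
        """Worker side: remove the next job (raises queue.Empty if none)."""
        return self.q.get_nowait()


# Usage: with room for 5 jobs, the 4th and 5th trigger degradation
# and the 6th is rejected outright.
ingress = BoundedIngress(max_depth=5, degrade_at=0.8)
print([ingress.submit(i) for i in range(6)])
```

The point of the explicit rejection is that overload becomes visible to the caller immediately, so a burst from one source is shed at the edge instead of cascading into shared infrastructure.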
It's worth noting GitHub isn't alone.
Stripe has already run into AI Agents creating accounts in bulk, and AWS is building Agent-specific identity, logging, and production control systems. These moves aren't precautionary: the signals have already appeared on their monitoring dashboards, and they had to respond.
GitHub was simply the first to buckle, because it sits at the very core of the AI toolchain.
05 Code Repositories Are Becoming AI's Exhaust Pipe
Stop and think about the nature of this whole thing.
What is GitHub? The most intuitive answer: it's where programmers store code. On a deeper level, though, it's the infrastructure for human software collaboration: commits are the trail of collaboration, PRs are containers for discussion, Issues are records of intent, and Actions are pipelines for execution. The entire system was designed around human work rhythms, thought processes, and collaborative patterns.
AI Agents have changed all that.
When an AI Agent can commit code hundreds of times a day, and each "commit" is just a step in a task loop rather than the product of human thought and trade-offs, is a code repository still a "container for collaboration"?
When AI tools automatically create repos, open PRs, run CI, and merge, are developers still the primary actors in the process, or have they been reduced to "reviewers," or even "bystanders"?
GitHub's CTO described the crisis as "rapid load growth," but the term probably understates what is going on. This isn't just quantitative growth; it's a qualitative change in how the platform is used. In the old model, GitHub was a "developer's tool." In the new one, it is becoming "AI's exhaust pipe," an output channel for automated workflows.
What this ultimately means for GitHub is still an open question. Scaling 30x can solve the traffic problem, but it can't redefine the business model, nor can it answer the identity question of "who is my real user?"
One telling recent development: after the outages, GitHub published a flurry of engineering blog posts that detailed the root cause of each incident with an almost surprising degree of transparency. Some see this as GitHub actively rebuilding trust; others see it as trading transparency for the developer community's patience, because the coming period of re-architecture will bring more instability.
A platform overwhelmed by its own success now has to tear itself down and rebuild, and that process is itself a test of whether it can hold together.
On the night of February 9th, the engineer waiting for that PR to merge probably saw the green light eventually. But they may not have realized that the outage that kept them waiting wasn't just a GitHub accident. It was a signal that the entire software development industry is entering a new era.
This article is from WeChat Official Account "GeekPark" (ID: geekpark), author: Yu Hang Yuan