GitHub, Transfixed by AI

marsbit · Published 2026-06-04 · Updated 2026-06-04

Introduction

On the night of February 9th, GitHub suffered a major outage caused by a simple configuration change—reducing a cache refresh interval from 12 to 2 hours—that triggered a cascade of failures. This was not an isolated event, but part of a broader pattern. In early 2026, GitHub experienced at least 8 major incidents, failing to meet its promised 99.9% availability. These outages stemmed from structural issues: explosive growth in load, tight service coupling, and insufficient protection against abnormal traffic. This unprecedented load is driven by AI Agents. In 2025, GitHub handled ~1 billion commits. By 2026, weekly commits reached 275 million, projecting to ~14 billion for the year—a 14x increase. AI tools like Claude Code now contribute 4.5% of all public repository commits, with weekly submissions surging 25x in just three months. AI-generated pull requests jumped from 4 million to 17 million per month in half a year. Unlike human developers, AI Agents work continuously, generating commits at a scale that overwhelms infrastructure designed for human rhythms. The surge also shattered GitHub's business model. Copilot's flat-rate pricing, based on assisting human developers, became unsustainable as Agentic AI sessions consumed resources worth hundreds of dollars for a few dollars in fees. In response, GitHub imposed usage limits and, by June 1st, shifted to a pay-per-use "AI Credits" system. Facing this new reality, GitHub realized a 10x scaling plan was insufficient. It a...

Late on the night of February 9th, Beijing time, tens of millions of developers worldwide opened GitHub and saw the same page.

It wasn't a 404, but something more anxiety-inducing than a 404—that chilling yellow warning bar that sends shivers down every engineer's spine, alongside a status page full of indicators turning from green to red.

github.com was down.

The API was down.

GitHub Actions was down.

Git operations were down—even Copilot wasn't spared.

That night, some people's CI/CD pipelines ground to a halt at the most critical juncture, some saw their automated deployments stuck mid-air, and others waited for a pull request that just wouldn't merge—behind it, a feature waiting to go live, waiting for real users.

GitHub later published an incident report. The root cause, in technical terms, was "an overload of a core database cluster responsible for authentication and user management." But behind those words lay a startling chain of events—

Two days prior, the engineering team, in a hurry to push a new model to users, changed the refresh time of a "user settings cache" from 12 hours to 2 hours. Just one configuration number.

The result: cache rewrites that were supposed to be spread over 12 hours were compressed into 2, creating a dense "cache rewrite storm." Asynchronous task queues were instantly overwhelmed, shared infrastructure components crashed, and the cascading effects spread to services responsible for proxying HTTPS Git operations, eventually exhausting all platform connections.
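The arithmetic behind the "cache rewrite storm" can be sketched in a few lines. This is only an illustration: the 12-hour and 2-hour intervals come from the incident description above, while the cache size and the assumption that every entry is rewritten once per interval are hypothetical.

```python
# Only the 12 h -> 2 h change comes from the incident description.
# The cache size below is an assumed figure, for illustration only.

def rewrites_per_second(cached_entries: int, refresh_hours: float) -> float:
    """Average rewrite rate if every entry is refreshed once per interval."""
    return cached_entries / (refresh_hours * 3600)


ENTRIES = 100_000_000  # hypothetical number of cached user-settings entries

before = rewrites_per_second(ENTRIES, 12)
after = rewrites_per_second(ENTRIES, 2)

print(f"before: {before:,.0f} rewrites/s")
print(f"after:  {after:,.0f} rewrites/s")
print(f"ratio:  {after / before:.1f}x")
```

Whatever the real cache size, the ratio is the same: the same volume of rewrites squeezed into one sixth of the time means roughly six times the sustained write rate on everything downstream.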

One number, changed from 12 to 2.

GitHub was brought down by a configuration change it made itself.

But if you only see that one config change, you've probably missed the most important part of this story.

01 Not One Accident, But Ten

The February 9th incident was not an isolated event.

In fact, in the first three months of 2026, GitHub experienced at least 8 major incidents. February alone saw 37 recorded failures, big and small. GitHub's CTO Vlad Fedorov later admitted in a blog post that GitHub had failed to maintain the "three nines"—99.9% availability—it promises its enterprise customers during those two months.
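To put "three nines" in perspective, here is a small sketch of the downtime budget that 99.9% availability allows. The 30-day and 90-day windows are assumptions; the article does not say how GitHub measures availability for its enterprise customers.

```python
def downtime_budget_minutes(availability: float, days: int) -> float:
    """Minutes of downtime allowed in a window at the given availability."""
    return (1 - availability) * days * 24 * 60


# Assumed measurement windows; not taken from the article.
monthly = downtime_budget_minutes(0.999, 30)
quarterly = downtime_budget_minutes(0.999, 90)

print(f"30-day budget: {monthly:.1f} minutes")
print(f"90-day budget: {quarterly:.1f} minutes")
```

Under these assumptions the budget is about 43 minutes per 30 days, so a single multi-hour outage of a covered service would be enough to exceed it.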

Looking through the failure records of those two months, you'll find a peculiar pattern: each incident appears to have a different cause.

February 2nd: Issues with the Azure compute provider, causing GitHub Actions to be down for nearly 4 hours, affecting Copilot Chat, CodeQL, Dependabot, and more.

February 9th: Cache rewrite storm, authentication database overload.

March 5th: Redis cluster failure, 95% of GitHub Actions workflows unable to start within 5 minutes, average delay of 30 minutes.

March 18th: Webhook latency spiked to 32 times the normal level.

Each one looked like an "accident," each with a different immediate cause. But Fedorov's explanation strings them together into the same story. He said these incidents share three common structural causes: "rapid load growth, tight coupling between services leading to localized failure propagation, and systems lacking protection capabilities against abnormal client traffic."

In engineer speak, GitHub's foundation is starting to crack under the pressure of new loads.

And this "new load" has a specific name.

02 275 Million Commits Per Week

Key Data

Total commits for all of 2025: Approximately 1 billion

Weekly commit volume in 2026: 275 million

Projected annual total for 2026 at this rate: 14 billion (a 14-fold year-over-year increase)

GitHub Actions compute minutes: 5 billion minutes per week in 2023 → 10 billion in 2025 → 21 billion minutes in a week in early 2026

If you're a GitHub infrastructure engineer, the comparison between your monitoring dashboard in 2025 and 2026 would probably leave you speechless.

Throughout all of 2025, GitHub processed around 1 billion code commits. That number itself is massive, the result of years of platform growth. But by 2026, the *weekly* commit volume reached 275 million. Doing the math—if this pace continues for the whole year, the total commits for 2026 would be close to 14 billion, a full 14 times the total for all of 2025.

This isn't a smooth growth curve; it's a cliff. The change in GitHub Actions compute minutes is even more telling: 5 billion minutes per week in 2023, doubling to 10 billion in 2025, and then in one week in early 2026, it skyrocketed to 21 billion minutes.
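The projection is simple enough to check by hand. The sketch below reproduces it from the figures quoted above, assuming a 52-week year and a constant weekly rate (the article's own "if this pace continues" assumption).

```python
# Figures quoted in the article.
TOTAL_COMMITS_2025 = 1_000_000_000
WEEKLY_COMMITS_2026 = 275_000_000
ACTIONS_MINUTES_PER_WEEK = {"2023": 5e9, "2025": 10e9, "early 2026": 21e9}

WEEKS_PER_YEAR = 52  # assumption: constant weekly rate for the whole year

projected_2026 = WEEKLY_COMMITS_2026 * WEEKS_PER_YEAR
commit_growth = projected_2026 / TOTAL_COMMITS_2025

# Growth of Actions compute minutes between the quoted data points.
actions_2023_to_2025 = ACTIONS_MINUTES_PER_WEEK["2025"] / ACTIONS_MINUTES_PER_WEEK["2023"]
actions_2025_to_2026 = ACTIONS_MINUTES_PER_WEEK["early 2026"] / ACTIONS_MINUTES_PER_WEEK["2025"]

print(f"projected 2026 commits: {projected_2026:,}")
print(f"growth vs. 2025: {commit_growth:.1f}x")
print(f"Actions minutes, 2023 to 2025: {actions_2023_to_2025:.1f}x")
print(f"Actions minutes, 2025 to early 2026: {actions_2025_to_2026:.1f}x")
```

The commit projection comes out at 14.3 billion, which the article rounds to 14 billion. For Actions, the first doubling took about two years; the second took a matter of months.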

What's submitting code so frantically?

Not human developers.

GitHub's data shows that AI Agents are becoming the most active 'users' on the platform. Claude Code alone now accounts for 4.5% of all commits to public repositories on GitHub. That is 2.6 million commits per week, up from only 100,000 in late September 2025: roughly a 25-fold increase in three months.

The number of PRs opened by AI Agents is also exploding. In September 2025, AI-generated PRs numbered about 4 million per month. By March 2026, that number jumped to 17 million—more than four times higher in half a year.
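The same kind of check works for the Agent figures. The sketch below uses only the numbers quoted above; the implied total of public commits is a derived estimate, not a figure GitHub published in this article.

```python
# Figures quoted in the article.
CLAUDE_CODE_WEEKLY_NOW = 2_600_000
CLAUDE_CODE_WEEKLY_SEPT_2025 = 100_000
CLAUDE_CODE_SHARE = 0.045  # share of public-repository commits

AI_PRS_SEPT_2025 = 4_000_000
AI_PRS_MARCH_2026 = 17_000_000

commit_multiple = CLAUDE_CODE_WEEKLY_NOW / CLAUDE_CODE_WEEKLY_SEPT_2025
pr_multiple = AI_PRS_MARCH_2026 / AI_PRS_SEPT_2025

# Derived estimate: weekly public commits implied by the 4.5% share.
implied_public_commits = CLAUDE_CODE_WEEKLY_NOW / CLAUDE_CODE_SHARE

print(f"Claude Code weekly commits: {commit_multiple:.0f}x")
print(f"AI-generated PRs per month: {pr_multiple:.2f}x")
print(f"implied public commits per week: {implied_public_commits:,.0f}")
```

The commit ratio works out to 26, which the article rounds to 25-fold, and the PR ratio to 4.25, matching "more than four times."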

A picture might help you understand what this means.

Before, GitHub's "users" were mainly human programmers. They work during the day, sleep at night, rest on weekends. Each commit involves thought, hesitation; their typing speed has limits. System load followed human schedules, with peaks and troughs that could be predicted.

Now, more and more "users" are AI Agents. They don't sleep, don't rest, don't hesitate. One task can spawn multiple parallel Agents. A single Agent can easily commit more code per hour than a real engineer does in a week. More importantly, they're not just committing code; they're constantly creating new repositories—treating repositories as "output artifacts" of a workflow, not a human's "workspace."

GitHub's infrastructure engineers are no longer facing a larger version of the same problem, but a fundamentally different kind of problem.

03 Copilot's Money Isn't Enough to Burn Anymore

Frequent failures are just one side of the problem. GitHub has another, even more troublesome headache—when doing the math, they found they were losing money.

Copilot's original pricing logic was based on a reasonable assumption: users primarily engaged in "assistive completion," each interaction brief, with predictable compute demands. The personal plan at $10/month and the business plan at $19/month, charged per seat, worked well for several years.

Then, Agentic AI arrived.

Agentic workflows and traditional completion are different species. Standard code completion involves linear, predictable requests with short compute cycles. An Agentic coding session might run for hours, spawning multiple parallel threads, performing multi-step reasoning, self-correction, cross-repository refactoring—the token consumption of one session can easily exceed the entire monthly subscription fee of an average user.

GitHub faces a situation where a minority of heavy Agentic users are consuming compute resources worth hundreds of dollars for a monthly fee of a few dollars.

Faced with this, GitHub's reaction was direct—control the flow first, then change the price.

Starting early this year, GitHub implemented two parallel rate-limiting mechanisms for Copilot: session duration caps and weekly usage caps, both calculated based on token consumption multiplied by model compute weight. At the same time, new user registration for some individual Copilot plans was paused.
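As a rough illustration of how caps "based on token consumption multiplied by model compute weight" might work, here is a minimal sketch. Everything in it is hypothetical: the class name, the weights, and the cap values are invented for the example, and the article does not describe GitHub's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical weights and caps; the article gives no actual values.
MODEL_WEIGHT = {"small": 1.0, "large": 10.0}
SESSION_CAP = 1_000_000.0   # weighted units allowed per session
WEEKLY_CAP = 5_000_000.0    # weighted units allowed per week


@dataclass
class UsageMeter:
    """Tracks weighted usage against a per-session and a weekly cap."""
    session_units: float = 0.0
    weekly_units: float = 0.0

    def try_consume(self, tokens: int, model: str) -> bool:
        """Record usage and return True, or return False if a cap would be exceeded."""
        cost = tokens * MODEL_WEIGHT[model]
        if self.session_units + cost > SESSION_CAP:
            return False
        if self.weekly_units + cost > WEEKLY_CAP:
            return False
        self.session_units += cost
        self.weekly_units += cost
        return True

    def start_new_session(self) -> None:
        """Reset the session counter; weekly usage keeps accumulating."""
        self.session_units = 0.0


meter = UsageMeter()
print(meter.try_consume(50_000, "large"))  # 500,000 units: within the session cap
print(meter.try_consume(60_000, "large"))  # would reach 1,100,000: rejected
meter.start_new_session()
print(meter.try_consume(60_000, "large"))  # new session: accepted again
```

The point of two counters is that they limit different things: the session cap bounds a single long-running Agent run, while the weekly cap bounds a user who simply starts many sessions.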

On June 1st, GitHub completed a more fundamental pricing overhaul: Copilot fully switched to usage-based billing, replacing old plan fees with "AI Credits." 1 AI Credit equals 1 US cent, with usage calculated in real-time based on token consumption.
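A minimal sketch of what token-based billing in credits could look like follows. The only figure taken from the article is that 1 AI Credit equals 1 US cent; the per-token prices, model names, and token counts are hypothetical and chosen purely to make the arithmetic visible.

```python
CENTS_PER_CREDIT = 1  # from the article: 1 AI Credit = 1 US cent

# Hypothetical prices in US dollars per million tokens; not GitHub's rates.
PRICE_PER_MILLION_TOKENS_USD = {"small": 0.50, "large": 5.00}


def credits_for(tokens: int, model: str) -> float:
    """Convert token consumption into AI Credits under the assumed prices."""
    usd = tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD[model]
    return usd * 100 / CENTS_PER_CREDIT


# A hypothetical long Agentic session versus the old $10/month personal plan.
agentic_session = credits_for(40_000_000, "large")
old_monthly_fee = 10 * 100 / CENTS_PER_CREDIT

print(f"agentic session: {agentic_session:,.0f} credits")
print(f"old monthly fee: {old_monthly_fee:,.0f} credits")
```

Under these invented prices, one heavy session costs twenty times the old monthly fee, which is the kind of mismatch the flat-rate plan could not absorb.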

The era of per-seat pricing has reached its end in the face of Agentic AI.

This shift isn't just GitHub's headache. It's a collective pricing crisis the entire AI tool industry is experiencing in 2026—when AI starts replacing humans in executing entire workflows, not just "assisting" human work, all subscription logic based on "per user per month" becomes unsustainable.

04 30 Times, Not 10 Times

Back to the infrastructure problem. How does GitHub actually plan to handle this "14-fold growth"?

A detail here illustrates the severity of the situation:

In late December 2025, Agentic workflows suddenly began accelerating. GitHub's engineers realized that their plan to scale capacity 10x wasn't enough. By February 2026, after that major outage, GitHub announced it needed to redesign its architecture for 30 times today's scale.

Not scaling, but redesigning.

The difference between these two words is significant. Scaling is adding more machines, more memory to existing databases—same direction, just bigger. Redesigning means the underlying architectural assumptions will fail systematically at 30x scale, forcing a fundamental rethinking of service decomposition, data flow, and failure isolation from the ground up.

GitHub's disclosed specific directions include decoupling critical services to prevent cascading failures, introducing backpressure and traffic degradation capabilities, deploying independent hosts for hotspot services, eliminating single points of failure, and implementing more robust change management—to avoid operations like "changing cache TTL from 12 hours to 2 hours" going live without sufficient load testing.
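Of the directions listed above, backpressure is the easiest to show in miniature. The sketch below is a generic bounded-queue pattern, not GitHub's design: the class name, queue size, and job names are hypothetical. The idea is that a full queue rejects new work immediately instead of letting a backlog grow until shared components fall over.

```python
import queue


class BoundedJobQueue:
    """Job queue that sheds load when full instead of growing without limit."""

    def __init__(self, max_pending: int) -> None:
        self._jobs = queue.Queue(maxsize=max_pending)
        self.rejected = 0

    def submit(self, job: str) -> bool:
        """Enqueue a job, or reject it right away if the queue is full."""
        try:
            self._jobs.put_nowait(job)
            return True
        except queue.Full:
            self.rejected += 1  # caller should retry later or degrade
            return False

    def pending(self) -> int:
        return self._jobs.qsize()


# Hypothetical burst: five cache-rewrite jobs hit a queue sized for three.
jobs = BoundedJobQueue(max_pending=3)
accepted = [jobs.submit(f"rewrite-{i}") for i in range(5)]

print(accepted)
print(f"pending={jobs.pending()} rejected={jobs.rejected}")
```

Rejecting two jobs looks worse than accepting all five, but it keeps the failure local: the producer sees the refusal and can slow down, rather than the queue silently passing the overload on to the next service.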

It's worth noting GitHub isn't alone.

Stripe has already encountered issues with AI Agents creating accounts in bulk; AWS is building Agent-specific identity systems, logging systems, and production control mechanisms. These moves aren't precautionary; they respond to signals that have already appeared on these companies' monitoring dashboards.

GitHub was just the first to be transfixed—because it's at the very core of the AI toolchain.

05 Code Repositories, Becoming AI's Exhaust Pipe

Stop and think about the nature of this whole thing.

What is GitHub? The most intuitive answer: it's where programmers store code. But on a deeper level, it's the infrastructure for human software collaboration—commits are the tracks of collaboration, PRs are containers for discussion, Issues are records of intent, Actions are pipelines for execution. The entire system was designed for human work rhythms, thought processes, and collaborative patterns.

AI Agents have changed all that.

When an AI Agent can commit code hundreds of times a day, each "commit" lacking human thought and trade-off, just being a step in a task loop—is a code repository still a "container for collaboration"?

When AI tools automatically generate repos, automatically open PRs, automatically run CI, automatically merge—are developers still the primary actors in this process, or have they devolved into "reviewers" or even "bystanders"?

GitHub's CTO described this crisis as "rapid load growth." But this term likely understates the essence—this isn't just quantitative growth; it's a qualitative change in usage. In the old model, GitHub was a "developer's tool." In the new model, GitHub is becoming "AI's exhaust pipe," an output channel for automated workflows.

What this means for GitHub actually has no answer yet. Scaling 30x can solve the traffic problem, but it can't solve the redefinition of the business model, nor can it solve the identity question of "who is my real user."

A rather telling recent phenomenon: after the outages, GitHub published a flurry of engineering blog posts describing the root cause of each incident in great detail, with a surprisingly high level of transparency. Some see this as GitHub actively building trust; others see it as trading transparency for the developer community's patience, because the upcoming refactoring period will bring more instability.

A platform, after being transfixed by its own success, needs to tear itself apart and rebuild—and that process itself is a test of whether it can hold on.

On the night of February 9th, that engineer waiting for a PR to merge probably eventually saw the green light. But they might not have realized that the outage that made them wait wasn't just a GitHub accident; it was a signal—a sound announcing the entire software development industry's entry into a new era.

This article is from WeChat Official Account "GeekPark" (ID: geekpark), author: Yu Hang Yuan

Related Questions

Q: What was the immediate cause of GitHub's major outage on February 9th, as detailed in the article?

A: The immediate cause was a configuration change that reduced the refresh time for a user settings cache from 12 hours to 2 hours. This compressed cache rewriting, creating a 'cache rewrite storm' that overloaded asynchronous job queues, crashed shared infrastructure, and ultimately exhausted platform connections.

Q: According to the article, what is the fundamental structural reason behind GitHub's series of outages in early 2026?

A: The fundamental structural reason is that GitHub's infrastructure was not designed for the nature and scale of traffic from AI Agents. This new workload leads to rapid load growth, causes cascading failures due to tight service coupling, and overwhelms systems that lack protection against abnormal client traffic patterns.

Q: How did the nature of GitHub's primary 'users' change, contributing to its infrastructure crisis?

A: GitHub's most active 'users' are increasingly AI Agents, not human developers. Unlike humans, AI Agents operate continuously, submit code at an extremely high rate (a single Agent can commit more in an hour than a human does in a week), and treat repositories as output artifacts of a workflow rather than collaborative workspaces, creating an unpredictable and massive load.

Q: Why did GitHub change its Copilot pricing model from a per-seat subscription to a usage-based 'AI Credits' system?

A: The old per-seat pricing became unsustainable with the rise of Agentic AI workflows. These sessions consume vastly more computational resources (tokens) than traditional code completion. A few heavy Agentic users could consume hundreds of dollars worth of resources for a few dollars in subscription fees, forcing GitHub to adopt a pay-per-use model.

Q: What does the article suggest is the deeper, symbolic shift occurring as AI Agents become dominant on platforms like GitHub?

A: The article suggests a profound shift in purpose: GitHub is transitioning from being a 'container for human collaboration' (tracking intent, discussion, and execution) to becoming an 'AI exhaust pipe', a mere output channel for automated workflows. This challenges its core identity and the design of its systems, which were built for human rhythms and collaboration.
