Confirmed: GPT-5.5 "Brain Drain" Exposed, OpenAI's Own Documentation Admits It

marsbit · Published 2026-05-27 · Updated 2026-05-27

Summary

Evidence is mounting that OpenAI's GPT-5.5 may be "silently" switching users to a less capable model mid-session. Users report that after roughly two hours, GPT-5.5 Extended Thinking starts answering instantly with markedly worse output, while the interface still shows the premium model's label. Complaints on OpenAI's developer forum describe lost instruction-following and poor code quality, with even the top "xhigh" tier affected. An OpenAI Help Center document corroborates part of this: once Plus users exceed 160 messages per 3 hours, the system "silently" switches them to a "mini" model without any notification. Pro users also report that "heavy thinking" mode is throttled when server load is high. In an earlier incident, trace commands showed users requesting GPT-5.3 Codex but receiving GPT-5.2 output. OpenAI acknowledged a performance degradation in mid-May and marked it resolved, but user reports surged again in late May. The pattern mirrors earlier controversies around the GPT-5, 5.2, 5.3, and 5.4 releases, each followed by complaints of reduced capability. The article suggests compute cost-cutting may be a factor, noting that while GPT-5.5 users struggle, GPT-5.6 is already being tested internally.

[Introduction] GPT-5.5 has been caught "fake thinking": after two hours of use it is secretly switched to "mini." A $200 monthly fee buys you a "Schrödinger's brain." A trace command provides hard evidence, and OpenAI's own documentation admits it. Users are piling on with complaints: OpenAI, who are you trying to fool?

ChatGPT has been caught "dumbing down" again!

Over the past couple of days, the story blew up first on X.

User Lisan al Gaib found that after an hour or two of use, GPT-5.5 suddenly turned stupid: every request was answered instantly, and quality fell off a cliff.

Yet the interface still displayed "GPT-5.5 Extended Thinking."

In other words, the thinking label was still there, but the thinking itself had vanished.

$200/month for a "Schrödinger's Model"

At the same time, a complaint post blew up on the OpenAI developer forum.

Agentify.sh stated that GPT-5.5 would suddenly lose its ability to follow instructions during use.

It would excitedly announce that something was "fixed," then produce code so poor that it forced a mass rollback.

UI tasks that 5.5-medium used to handle easily now failed at even the simplest changes.

Upgrading to 5.5-high didn't work. Upgrading again to xhigh, still no luck.

And xhigh runs, which used to last several hours, were now noticeably shorter.

As soon as the post went up, the replies exploded.

Some simply went back to 5.4.

One user on the highest tier, xhigh, found it "clearly worse than last week, frequent errors on long tasks, not following the workflow at all."

Another reported something even stranger: "Simple queries also take ages to process, and if you interrupt to correct its direction, it completely ignores you and continues with its previous incorrect plan."

That's right: everyone was describing the same phenomenon. At some unknown point, GPT's brain had been swapped out.

GPT-5.5 now performs on par with 5.3, no exaggeration. It was amazing for the first few days, but now there's no trace of the original model.

Not an illusion: OpenAI spells it out in black and white

To verify, Lisan al Gaib conducted a comparative test.

On the same account, Extended Thinking in ChatGPT produced garbage, while switching to xhigh in Codex immediately restored normal performance.

In his own words, Codex was "literally 4 billion times smarter than this thing."

Developer Andrew Curran came up with a clever trick: ask the model directly, "What is the cutoff date for your training data?"

The model answered: August 2025.

The problem? The cutoff date for GPT-5.5 Thinking is December. August is the cutoff date for the Instant version!

In other words, he selected Thinking, but the system actually ran Instant for him.

Not a single word of the model label in the interface changed, but the model behind it had been secretly swapped...

The irony is that this time OpenAI sealed the case for users itself, in its own help documentation.

According to the official explanation in the OpenAI Help Center, Plus users can send a maximum of 160 GPT-5.5 messages every 3 hours.

After that quota is used up, the system will silently switch to the mini model until the quota resets.

Note the word "silently."

No pop-up notification, no change in the model label, no visual feedback whatsoever.

You still think you're using the flagship model, while on the other end it has quietly been replaced with mini.
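To make the mechanism concrete, here is a minimal sketch of a per-user quota that falls back to a smaller model once 160 messages have been sent within 3 hours. It is hypothetical, not OpenAI's implementation: the Help Center page states only the limit and the silent switch, so the rolling window, the function name `route_model`, and the model ids `gpt-5.5` and `gpt-5.5-mini` are assumptions for illustration. Unlike the behavior users describe, it records the routed model in `ROUTED_MODEL`, so a client could display which model actually answered.

```sh
# Hypothetical sketch, not OpenAI's implementation: a per-user message quota
# with fallback. Limit and window follow the article (160 messages / 3 hours);
# the rolling window and the model ids are assumptions for illustration.
QUOTA=160
WINDOW=$((3 * 60 * 60))  # 3 hours, in seconds
SENT=""                  # space-separated timestamps of flagship-model messages

# route_model NOW
# Decides which model serves a message sent at time NOW (in seconds) and stores
# the answer in ROUTED_MODEL. It sets a variable instead of echoing, because
# calling it inside $(...) would run it in a subshell and lose updates to SENT.
route_model() {
  rm_now=$1
  rm_kept=""
  rm_count=0
  # Keep only timestamps that are still inside the rolling window.
  for rm_t in $SENT; do
    if [ $((rm_now - rm_t)) -lt "$WINDOW" ]; then
      rm_kept="$rm_kept $rm_t"
      rm_count=$((rm_count + 1))
    fi
  done
  SENT=$rm_kept
  if [ "$rm_count" -lt "$QUOTA" ]; then
    # Under quota: serve with the flagship model and count this message.
    SENT="$SENT $rm_now"
    ROUTED_MODEL="gpt-5.5"
  else
    # Quota exhausted: fall back until older messages age out of the window.
    ROUTED_MODEL="gpt-5.5-mini"
  fi
}
```

For example, calling `route_model` for messages at t = 1 through 160 seconds routes all of them to `gpt-5.5`, the 161st message goes to `gpt-5.5-mini`, and a message at t = 10801 is routed back to `gpt-5.5` because the first message has aged out of the window. Exposing the routed id is what would make such a switch visible rather than silent.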

Pro users, don't celebrate too soon either.

Heavy thinking mode, the top reasoning tier exclusive to Pro users, is also subject to capacity throttling when server load is high. Again, without any warning.

In other words, a $200/month Pro subscription buys you a service that can be "switched out" at any moment.

This kind of "label unchanged, brain swapped" operation was caught even earlier on the Codex side.

In February this year, a GitHub issue reported that a Pro user, using a trace command, had found that requests for GPT-5.3 Codex were actually returning GPT-5.2.

Not even 5.2 Codex, but the lower-tier base 5.2.

He posted the reproduction command:

RUST_LOG='codex_api::sse::responses=trace' codex exec --skip-git-repo-check -s read-only -m 'gpt-5.3-codex' 'hi' 2>&1 >/dev/null | rg -o --replace '$1' '"model":"([^"]+)"' | head -n1
Output: gpt-5.2-2025-12-11
Expected: gpt-5.3-codex
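In the issue, the comparison between the requested and returned model is done by eye. A small helper can turn it into a pass/fail check. This is a hypothetical sketch: the function name `check_model` is made up, and accepting `REQUESTED-YYYY-MM-DD` as a dated snapshot of the requested model is an assumption modeled on the `gpt-5.2-2025-12-11` id shown above, not a documented OpenAI naming rule.

```sh
# check_model REQUESTED RETURNED  (hypothetical helper)
# Prints MATCH and succeeds if RETURNED is the requested model id, or a dated
# snapshot of it (assumed format "REQUESTED-YYYY-MM-DD"); otherwise prints
# MISMATCH and returns a non-zero status.
check_model() {
  cm_requested=$1
  cm_returned=$2
  case "$cm_returned" in
    "$cm_requested" | "$cm_requested"-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
      echo "MATCH: $cm_returned"
      return 0
      ;;
    *)
      echo "MISMATCH: requested $cm_requested, got $cm_returned"
      return 1
      ;;
  esac
}
```

With the values from the issue, `check_model gpt-5.3-codex gpt-5.2-2025-12-11` prints `MISMATCH: requested gpt-5.3-codex, got gpt-5.2-2025-12-11` and fails, so the id extracted by the trace pipeline can be passed as the second argument in a script. Requiring a date-shaped suffix avoids treating a different variant such as `gpt-5.3-codex-mini` as a match.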

Multiple Pro users confirmed the same downgrade under the same issue.

This kind of downgrade is also "sticky": it doesn't revert on its own, and no explanation is given.

Even on GPT-5.5's release day in April, users reported that Fast mode ran at roughly Standard speed while still being billed at the Fast rate.

A simple task took 7 minutes 49 seconds (469 s) against a usual 5 to 6 minutes, roughly 30% to 56% longer.

OpenAI admitted it, and then... nothing

On May 15, an entry appeared on OpenAI's status page.

Titled "GPT5.5 Performance Degradation," it read: "We are investigating reports of performance degradation for GPT-5.5 from some users."

On May 17, the status was updated to "Resolved."

But judging from the timeline of forum posts, complaints about "brain drain" from May 24-26 were even more intense than the wave on May 15.

Either the "resolved" problem came back, or it was never truly solved in the first place.

Every upgrade comes with a "brain drain controversy"

Every AI company fields complaints that its models are "getting dumber," but OpenAI has drawn them with every single update from GPT-5 to GPT-5.5.

Each time, OpenAI says it is investigating, then says the issue is resolved, then moves on to the next version.

August 2025, GPT-5's debut. The top post on Reddit was bluntly titled "GPT-5 is so bad." Users complained about short replies, more refusals, and less personality.

OpenAI was forced to hastily restore the GPT-4o option. Altman himself admitted in a Reddit AMA that the rollout was "bumpier than we expected."

December 2025, GPT-5.2. Translation quality regressed, the model fabricated non-existent APIs, and it refused style instructions that 5.1 handled easily.

February 2026, GPT-5.3-Codex. Pro users silently downgraded to 5.2, trace command confirmed.

March 2026, GPT-5.4. A post titled "GPT-5.4 has clearly regressed in Codex" appeared on the OpenAI community forum, and every reply confirmed it.

Early May 2026, GPT-5.5 Instant launched. Replies got 30% shorter and emojis all but disappeared. As users summed it up: accuracy improved, but the warmth vanished.

Late May 2026, now. Complaints about Thinking mode "brain drain" erupted again.

Lisan al Gaib, who led the fight over ChatGPT Plus quotas when GPT-5 launched, says that ever since, "I receive DMs like this every week."

The latest one was someone asking him to help get their xhigh/heavy thinking back.

The day it benchmarks strongest is launch day

chatgptdisaster.com has compiled 1,087 verified user complaints. One frequently mentioned scenario is "routing layer failure": the UI shows GPT-5.5 Pro, but the output comes entirely from another tier.

Users describe a reproducible pattern: after a long session, the model starts "completely ignoring what you say," but the top-tier label is still hanging on the model selector.

The most absurd footnote: OpenAI's official documentation describes the mechanism that automatically switches Plus users to mini after 160 messages per 3 hours as a "feature."

Why is this happening? Lisan al Gaib's analysis suggests the answer comes down to one thing: saving money.

The squeeze on compute and profitability is hitting everyone, and companies are cutting corners wherever they can to save a buck.

Yet, in the same week GPT-5.5 users were collectively complaining, traces of GPT-5.6 had already appeared in Codex backend logs.

Internal codename: iris-alpha. Context window: 1.5 million tokens. Polymarket put the probability of a June release at over 85%.

On one side, 5.5 users can't even count on a baseline experience; on the other, 5.6 is already quietly serving real traffic in the background.

This is the 2026 ASI race.

New models are arriving faster and faster, while getting an existing model to run a single session properly is getting harder and harder.

The day it benchmarks strongest is always launch day, and every day after is Schrödinger's GPT.

Reference: https://x.com/scaling01/status/2058643470357590058?s=20

This article is from the WeChat public account "AI Era," author: ASI Apocalypse; Editor: Moses

Related Questions

Q: What is the main issue reported by users regarding GPT-5.5?

A: Users report that after using GPT-5.5 for a short period, its performance degrades significantly, with responses becoming instant and of much lower quality, while the interface still shows the 'GPT-5.5 Extended Thinking' label, indicating a silent model switch.

Q: According to the article, what does OpenAI's official documentation reveal about user limits?

A: OpenAI's official Help Center documentation states that Plus users are limited to 160 GPT-5.5 messages every 3 hours. Once this limit is reached, the system silently switches to a mini model until the quota resets, with no visual indication to the user.

Q: How did developers verify that they were receiving a different model than the one they selected?

A: They compared outputs between ChatGPT and Codex on the same account, asked the model for its training data cutoff date (it reported the Instant model's cutoff even though Thinking was selected), and used trace commands showing that the model actually returned was a lower-tier version than the one requested.

Q: What pattern does the article describe regarding OpenAI's model updates?

A: The article describes a recurring pattern where each major model update (GPT-5, 5.2, 5.3, 5.4, 5.5) is followed by widespread user complaints about performance degradation. OpenAI typically acknowledges and investigates the issue, but complaints resurface with subsequent releases.

Q: What reason does the article suggest is behind these performance issues and silent model switches?

A: The article suggests the primary reason is cost-saving. It cites an analysis stating that 'compute and profitability constraints are affecting everyone,' leading OpenAI to silently downgrade models to manage costs, even for users paying high subscription fees.
