Three Years Later: Looking Back on My 2023 Predictions for ChatGPT

ChainCatcher (链捕手) · Published on 2026-05-31 · Last updated on 2026-05-31

Abstract

In March 2023, shortly after ChatGPT's debut and before GPT-4's release, I made over twenty predictions about AI's future based on limited information and intuition. Now, in May 2026, I revisited those forecasts using an AI-driven analysis with 41 Opus 4.8 agents to cross-reference them with the latest data. The assessment used symbols: ✅ Correct, 🟢 Mostly Correct, 🟡 Partially Correct, ❌ Incorrect. Overall, the directional judgments held up well, with only one major factual error regarding GPT-4's rumored parameter size (incorrectly cited as 100T). However, nuances and degrees of accuracy revealed more.

**What Was Largely Correct:** Predictions about mechanisms and directions proved accurate. The rise of RAG (Retrieval-Augmented Generation) as the standard architecture for combating AI hallucination was confirmed, as was the transformative potential of LUI (Language User Interface) in creating a new industry layer atop GUIs. The emergence of "robot networks" (agent-to-agent communication protocols) and China's rapid catch-up in developing capable large models (closing the performance gap with top models to ~2.7%) were also on point. The analysis affirmed that LLMs lack consciousness and that the Turing Test merely measures perceived intelligence.

**What Was Off Target:** Errors often involved specific numbers, over-optimistic timelines, or misjudged distributions. The prediction that value would p...

Author: Wang Jianshuo

On March 6, 2023, shortly after ChatGPT emerged and before the release of GPT-4, Sarah and I recorded an interview about ChatGPT. It became the third episode of the "Plain Talk Series" from Traders' Talk (announced in the post "Plain Talk on ChatGPT Podcast Released, Welcome to Listen").

At that time, ChatGPT had just come out, and very few people had actually used it hands-on. The three-hour interview later stayed at the top of the ChatGPT category on Xiao Yuzhou. In it, I threw out over twenty judgments and predictions in one go, based purely on intuition and limited information, with little data. The full transcript of that interview is still available on the WeChat official account.

Now it's late May 2026. Three years have passed, and AI has grown into something unimaginable back then.

I want to do one thing: take out those twenty points one by one and, using the latest data available today, objectively reconcile the accounts. See clearly how the world has actually changed in these three years, and see clearly where the me from three years ago got it right and where I was off.

To be as unbiased as possible, I handed this reconciliation to an AI: I fed the old interview transcript into a workflow that dispatched 41 Opus 4.8 agents. They first broke the interview down into the twenty judgments, then each searched for the latest data online, cross-verified the judgments one by one, and finally graded the Wang Jianshuo of three years ago. The agents spent about 20 minutes and burned approximately 1.4 million tokens (roughly $35) to produce the report below. The judgments are all theirs, not mine. The benchmark date is May 2026.

I. The Scoreboard

Verdict symbols: ✅ Correct · 🟢 Mostly Correct · 🟡 Partially Correct · ❌ Wrong

At a glance, Wang Jianshuo's overall direction back then mostly held up. The only real hard error is the one about GPT-4 being rumored at 100T parameters. But the devil is in the details: behind almost every "correct" judgment lies a tail end that wasn't quite right back then. None of the twenty points are purely "still uncertain"; three years is long enough, and most things have a clear tendency. Let's discuss in groups.

II. The Ones He Got Right

The common point in this group: Wang Jianshuo correctly bet on the direction, the mechanism, and even the timing. The errors were only in degree and in overly absolute wording.

RAG & Retrieval Architecture (Points 2, 3)

> 2023 Wang Jianshuo said: The mainstream method to solve knowledge and hallucination isn't tweaking the model, but vector retrieval to inject knowledge as "cheat sheets"; the correct architecture is a search engine doing retrieval, feeding results to an LLM.

This is the de facto standard for all AI products today. RAG has become the default architecture for enterprise AI; OpenAI, Google, Anthropic have all built it into platform-level capabilities; ChatGPT Search is literally "first retrieve with Bing indexing, feed results to GPT, then generate answers with citations." Google AI Overviews uses grounding to reach ~2 billion MAU, and Perplexity, a company purely built on this architecture, has a valuation of ~$20B.

When GPT-4 hadn't been released yet and the industry default was "injecting knowledge via fine-tuning," he bet on "not touching model parameters, external retrieval"—correct on mechanism and timing.

To be honest: he envisioned "static one-time retrieval," but reality is more complex. Long context, GraphRAG, and agentic retrieval have all arrived to enhance it. The 2026 "RAG is dead" debate actually shows that the broad direction is alive: what it negates is only "naive one-time retrieval," and its conclusion is an upgrade to hybrid retrieval, not a return to tweaking model parameters. Also note that the term RAG was introduced in a 2020 paper from Meta (then Facebook AI Research), not coined by him; he simply bet correctly, at the right moment, that it would become mainstream.
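The "search engine retrieves, LLM answers" architecture described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the three-sentence corpus, the bag-of-words cosine scoring (real systems typically use learned embeddings or a web index), and the prompt wording. The final LLM call is deliberately left out, since the point is only that knowledge is injected through the prompt rather than through model parameters.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for a search index.
DOCUMENTS = [
    "GPT-4 was released by OpenAI in March 2023.",
    "Retrieval-augmented generation feeds retrieved passages to a language model.",
    "The Turing test measures whether a machine appears human to a judge.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Inject retrieved passages into the prompt as numbered 'cheat sheets'."""
    passages = retrieve(query, documents, k)
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# The resulting string is what would be sent to the model; no weights change.
print(build_prompt("When was GPT-4 released?", DOCUMENTS))
```

This is the "naive one-time retrieval" variant; hybrid or agentic retrieval would replace `retrieve` with several rounds of search, while the prompt-injection step stays the same.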

LUI is a New Frontier (Point 7)

> 2023 Wang Jianshuo said: ChatGPT's greatest achievement isn't AIGC, but opening up LUI (Natural Language User Interface), which will restructure human-computer interaction like GUI did, spawning a new industry much larger than "building large models" itself.

The "new frontier" part is almost entirely correct. Natural language has become a mainstream interaction layer (ChatGPT has 900M weekly active users) and has spawned an independent new industry: agents, coding agents, and a protocol layer have all materialized. The most specific phrase, "much larger than building models itself," is strongly supported: the MCP protocol became the "operating system standard" of the LUI era, was widely adopted by OpenAI, Google, and Microsoft in 2025, and was transferred to the Linux Foundation by year-end; Claude Code alone reached ~$2.5B in annualized revenue.

But his strong wording, "restructure, replace GUI," looks after three years more like coexistence and layering than replacement. There are three kinds of hard counterexample: an MIT report shows 95% of enterprise GenAI pilots have no measurable ROI; top models' computer-use agents, which manipulate UIs directly, score ~78% on test sets, only just reaching the human baseline; and purely screenless voice hardware mostly failed (the Humane Pin was permanently shut down in 2025). A more accurate statement: LUI is a new interaction layer on top of GUI.

Robot Network & New Addressing (Point 9)

> 2023 Wang Jianshuo said: In about a decade, a "robot network" will emerge—agents automatically handshake and call each other using natural language, no longer needing traditional APIs; a brand-new domain name addressing system will be born. This system "can be done in two or three years."

The direction was hit remarkably well. MCP and A2A (donated to the Linux Foundation, supported by 150+ organizations) handle calls between agents; Agent Network Protocol directly uses W3C's DID for "agent addressing without central authority," aiming for a "collaborative network of billions of agents," which is highly isomorphic to his "new domain name system."

Two corrections: First, "no longer need APIs" doesn't hold; underlying mainstream protocols are structured schemas, essentially a standard layer on top of APIs. Second, "done in two or three years" didn't materialize; Gartner data shows only ~17% of organizations had truly deployed agents by 2026. Interestingly, he actually layered his statement back then—prototype in "two or three years," maturity in "about a decade." The prototype timing was accurate, and the maturity cycle is indeed decadal. Viewed separately, this point is of higher quality than it appears.
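To make the first correction concrete, here is a minimal sketch of what an agent-to-tool call looks like when it follows the JSON-RPC 2.0 envelope and `tools/call` method that MCP uses. The `get_weather` tool, its schema, and the helper `build_tool_call` are hypothetical, and the argument check is far simpler than real JSON Schema validation; the point is only that the message is a structured, schema-checked request rather than free-form natural language.

```python
import json

# Hypothetical tool description, shaped like an MCP tool entry
# (name, description, inputSchema expressed as JSON Schema).
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_tool_call(tool, arguments, request_id):
    """Check arguments against the declared schema, then wrap them in a
    JSON-RPC 2.0 request. This is a simplified check, not full JSON Schema."""
    schema = tool["inputSchema"]
    missing = [key for key in schema.get("required", []) if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    unknown = [key for key in arguments if key not in schema["properties"]]
    if unknown:
        raise ValueError(f"unknown arguments: {unknown}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    }


request = build_tool_call(WEATHER_TOOL, {"city": "Shanghai"}, request_id=1)
print(json.dumps(request, indent=2))
```

The natural-language part lives in the `description` field that a model reads when deciding which tool to use; the call itself remains an ordinary structured API request.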

China Can Definitely Produce Usable LLMs (Points 10, 20)

> 2023 Wang Jianshuo said: China can definitely produce usable large models; the gap with the top will narrow rapidly within about three years (analogous to Red Flag browser catching up to Netscape).

This timeline matches surprisingly well. Measurements in Stanford's 2026 AI Index show the gap between top Chinese and US models narrowing from 17.5–31.6 percentage points in May 2023 to 2.7%, while US private AI investment is ~23x China's: convergence achieved with much smaller input. DeepSeek, Qwen, Kimi, and GLM have become globally mainstream, with the open-source ecosystem even leading.

But "rapidly" was optimistic: real maturity came about 14 months later, not "a few months." Also, this is catching up in usability, not defining the frontier: as of early 2026, no Chinese model had surpassed OpenAI o3. On point 20 he was clearly wrong: the judgment "once the door opens, it won't close" was directly overturned when OpenAI proactively cut off API access to China in July 2024; the door was closed by the supplier. Wenxin Yiyan, the model he named as the leader, fell behind, while the baton was picked up by the then-obscure DeepSeek, Doubao, and Qianwen (Qwen).

No Consciousness, Turing Test Only Measures Appearance (Point 13)

> 2023 Wang Jianshuo said: ChatGPT has no consciousness; it's a case of "the speaker has no intent, the listener reads too much into it." The Turing test only measures "whether it makes you think it has it," not whether it actually does.

The core judgment "measures appearance" stands firm, ironically cemented by an experiment: in a 2025 UC San Diego Turing test, GPT-4.5, under prompts to "play a persona," was judged human 73% of the time—higher than actual humans—but purely through performance skill, the perfect footnote for "only tests whether it makes you think it has it."

To add: the absolute statement "machines definitely have no consciousness" has been pushed into a gray area over three years. Anthropic established a "model welfare" research position, put the probability of consciousness at ~15–20%, and added a function that lets Claude "actively end abusive conversations." These turn "absolutely none" into "low probability but cannot be ruled out." However, they rest on "possibly, should assume," not "proven"; the core isn't overturned, only his tone back then was too definitive.

Others He Got Right (Points 6, 11, 12, 16, 18, 19)

Not AGI but a Big Step Forward: Both ends hold. In the GPT-5 era, Altman himself still says "not AGI, lacks continuous learning"; meanwhile, with IMO gold medals and ARC-AGI jumping from near zero to 85%, "a big step" is undisputed.

No Unemployment Wave: The US unemployment rate in April 2026 was only 4.3%. The blind spot is in "distribution": Stanford research shows the hardest hit are precisely the 22–25-year-old newcomers on the first rung of the ladder; the "smooth absorption" mechanism failed for them.

Will Not Be Flooded by AI Garbage: The net-benefit direction is correct, but he severely underestimated the scale: AI content accounts for ~52% of new web pages, and "AI slop" became word of the year.

A Big Year for Startups: He correctly caught the inflection point of the wave; xAI (founded March 2023) reached a $230B valuation. But he locked "great companies" too narrowly into 2023: the truly trillion-dollar-scale OpenAI and Anthropic were founded earlier.

1994 Browser Moment: The relative ranking is confirmed, and OpenAI launched the Atlas browser in 2025, turning the metaphor into literal reality. ChatGPT's diffusion was simply more explosive than the browser's, so the metaphor was conservative.

Prompting with Injected Facts Reduces Hallucinations: The direction is confirmed. GPT-5's hallucination rate spikes to 47% when it runs offline without retrieval, which confirms from the other side that "facts" are a key variable. What he underestimated is that the root cause lies in training incentives, not prompting.

III. The Ones He Got Wrong or Was Off On

GPT-4 is 100T Parameters (Point 4)—Completely Wrong

> 2023 Wang Jianshuo said: (Rumor) GPT-4 is 100T parameters, about 600x GPT-3's 175B.

Both the 100T figure and the ~600x multiple are wrong. GPT-3 is indeed 175B, but the best estimate from the July 2023 leaks is that GPT-4 is ~1.8T, a 16-expert MoE, which is only ~10x. 100T is off by a factor of ~55. The only source for "100T" is a secondhand remark from the Cerebras CEO in 2021 that it would be "about" that size; Sam Altman had already called the comparison chart based on it "complete bullshit" in January 2023.

His original wording marked it as a "rumor," retaining some uncertainty. At a deeper level, the framework of "measuring generations by parameter multiples" is itself outdated: OpenAI no longer discloses parameter counts for the later GPT-4.5 and GPT-5. This is the only point that is flatly wrong on both the numbers and the framing.
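The multiples quoted in this section can be checked with a few lines of arithmetic. This is only a worked calculation over the figures cited above; the 1.8T value is the leaked estimate, not an officially confirmed number, and the variable names are illustrative.

```python
# Parameter counts as cited in this section.
GPT3_PARAMS = 175e9            # GPT-3, publicly documented
RUMORED_GPT4_PARAMS = 100e12   # the 2023 rumor
LEAKED_GPT4_PARAMS = 1.8e12    # July 2023 leak estimate (unconfirmed)

# Multiple implied by the rumor (the "about 600x" in the original claim).
rumored_multiple = RUMORED_GPT4_PARAMS / GPT3_PARAMS

# Multiple implied by the leaked estimate (the "only ~10x").
estimated_multiple = LEAKED_GPT4_PARAMS / GPT3_PARAMS

# How far the rumor overshoots the leaked estimate (the "factor of ~55").
overshoot = RUMORED_GPT4_PARAMS / LEAKED_GPT4_PARAMS

print(f"rumor vs GPT-3:    {rumored_multiple:.0f}x")
print(f"estimate vs GPT-3: {estimated_multiple:.1f}x")
print(f"rumor vs estimate: {overshoot:.1f}x")
```

So the "600x" in the rumor is itself a rounding of roughly 571x, and the gap between rumor and best estimate is a bit under two orders of magnitude.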

LLM Math (Point 1)—Diagnosis Correct, Capping Conclusion Wrong

> 2023 Wang Jianshuo said: LLMs being bad at math is intrinsic; letting them learn math themselves is both impossible and unnecessary; the correct approach is external tools.

The "diagnosis plus tool route" is entirely correct. The root cause is indeed token-by-token generation making carries unreliable (a 2025 mechanism paper precisely confirmed the intuition of "last digit often right, middle digits wrong"), and external tools do bring huge improvements (o4-mini with Python allowed scores 99.5% on AIME 2025).

The error is in the capping wording "impossible, unnecessary." "Impossible" was disproven in July 2025, when Gemini Deep Think and OpenAI models won IMO gold medals using pure natural language, with no tools. The key turning point was the emergence of "reasoning models" in 2024–2025, which was unforeseeable in March 2023, so this prediction should be judged leniently on direction rather than faulted on timing.
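The "external tools" route he described can be sketched as follows: instead of generating the digits of an answer token by token, the model emits an arithmetic expression and a deterministic calculator evaluates it. This is a minimal illustration under stated assumptions, not how any particular product implements tool use; the `calculate` helper and its restricted operator set are hypothetical, and the step where a model decides to call the tool is omitted.

```python
import ast
import operator

# Only integer arithmetic is allowed; anything else is rejected.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def calculate(expression):
    """Evaluate an integer arithmetic expression exactly, without eval()."""

    def evaluate(node):
        # Plain integer literals (bool is excluded because it subclasses int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)
        ):
            return node.value
        # Binary operations from the allowed set.
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](
                evaluate(node.left), evaluate(node.right)
            )
        # Unary minus, e.g. "-5".
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)


# Every digit, including the middle ones, comes from exact integer arithmetic.
print(calculate("123456789 * 987654321"))
```

Carries are handled by the interpreter's integer arithmetic, so the "last digit right, middle digits wrong" failure mode cannot occur on this path.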

Value Capture (Point 8)—Right Half, Core Argument Reversed

> 2023 Wang Jianshuo said: Value will ultimately reside in the application layer; companies creating the foundational layer (model builders) may not necessarily end up profitable.

Money indeed started flowing to the application layer (Cursor reached $2B annualized revenue in three years)—that half is right. But "foundation layer builders not profitable" is directly disproven by NVIDIA: FY2026 net profit ~$120B, market cap $5T+, the only clear large-scale profit maker in the whole market. The model layer he implied would win (OpenAI projected loss ~$14B in 2026) instead most resembles the "burning money, unprofitable foundational layer" he described.

He didn't distinguish between "compute foundational layer" and "model foundational layer," nor between "revenue" and "profit." Value in 2026 is captured even more extremely by the compute layer than in 2023, not shifting to the application layer. To add: it's the cloud providers buying chips that lose money, not NVIDIA selling chips—exactly the misplacement in his "railroad overbuilding" analogy.

Copyright (Point 14)—Registration Correct, Evasion Wrong

> 2023 Wang Jianshuo said: AI-generated content might evade copyright (protects expression, not ideas); the generated works might neither infringe nor be registrable.

"Unable to register" became settled legal fact (the US Copyright Office clarified in 2025: "mere input of prompts insufficient to claim authorship"). But "evade infringement" is clearly wrong: courts have repeatedly ruled that AI outputs can constitute infringement if substantially similar to original works, and Anthropic paid $1.5B to settle claims over pirated training data, the largest copyright settlement in US history. AI not only failed to "evade" copyright, it paid the highest price ever.

World Harmony (Point 15)—Mechanism Correct, Trend Bet Reversed

> 2023 Wang Jianshuo said: ChatGPT creates a "weighted average" of human opinions, could combat TikTok-like information cocoons, offering the possibility of "world harmony."

The mechanism layer is correct: multiple 2025 studies confirmed that LLMs compress opinions toward the majority and systematically underweight minority views. But the bet at the social layer was reversed: his own caveat, "at least it's not personalized yet," was overturned within three years. OpenAI made cross-conversation memory and personalization default capabilities starting April 2025, and AI is rapidly moving towards personalization. More crucially, he imagined the "weighted average" as a neutral world consensus, but actual measurements show a directional shift, compounded by sycophancy, that can be used to actively manipulate stances. This points towards "creating new cocoons," not "dissolving polarization."

Local Wars & Cost (Point 17)—Qualitative Correct, Quantitative Disproven

> 2023 Wang Jianshuo said: Building larger models will quickly become "local wars"; costs are knowable (capped at $5-10B USD after removing detours); many players will enter.

The qualitative direction is remarkably correct: many players entered, commoditization was rapid, and open source caught up with closed source. But the hard number, a "$5-10B cap," is wrong on both ends. The frontier was severely underestimated (a GPT-5-level training run costs $2-5B in 2026, plus hundred-billion-dollar data centers and the $500B Stargate), and the replication end was overestimated (DeepSeek pushed marginal training cost down to the million-dollar level). The "cost" of the same model can vary 200x depending on the definition, yet it does not land within his range.

Emergent Abilities (Point 5)—Direction Correct, Numbers and Framing Wrong

> 2023 Wang Jianshuo said: Around 60B parameters and above, new abilities appear not present in raw data and unexplainable by researchers.

The directional intuition holds, but two statements don't stand. First, there is no unified "60B threshold": the real threshold for chain-of-thought is ~100B, and different abilities emerge at scales from 13B to 540B. Second, "unexplainable" was challenged by a NeurIPS outstanding paper in late 2023: many "sudden jumps" are artifacts of metric choice, and the curves become smooth and predictable under continuous metrics. To be fair, he was repeating what was then the absolutely mainstream narrative; the parts that truly need correcting are treating "60B" as a hard threshold and "unexplainable" as a settled conclusion.
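The "artifact of metric choice" argument can be illustrated with a toy calculation. Assume, purely for illustration, that a task requires 10 answer tokens to all be correct and that token errors are independent; then a smooth improvement in per-token accuracy turns into what looks like a sudden jump in exact-match score. The answer length and the accuracy values below are hypothetical, not measurements from any model.

```python
ANSWER_LENGTH = 10  # hypothetical: tokens that must all be right for exact match


def exact_match(per_token_accuracy, length=ANSWER_LENGTH):
    """Probability that every token is correct, assuming independent errors."""
    return per_token_accuracy ** length


# Hypothetical per-token accuracies for models of increasing scale.
per_token_accuracies = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]

print("per-token  exact-match")
for p in per_token_accuracies:
    print(f"{p:9.2f}  {exact_match(p):11.3f}")
```

Under the continuous metric (per-token accuracy) the progression is gradual; under the all-or-nothing metric the same progression stays near zero for most of the range and then rises steeply, which is the pattern often described as "emergence."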

IV. Looking Back After Three Years: Several Patterns

After reconciling each point and stepping back, within Wang Jianshuo's twenty judgments lie several patterns more worth remembering than any single point.

First, direction is far more reliable than numbers and degree. Among the twenty, the judgments about mechanism and direction (RAG, LUI, robot network, Turing test) are almost all correct; those that gave specific numbers or definitive wording (100T parameters, 60B threshold, $5-10B cost, math "impossible") are almost all wrong. For fast-changing fields, bet on direction and mechanism, less on precise numbers, and be especially wary of words like "impossible, definitely, capped, absolutely none": these are high-risk zones for being proven wrong by time.

Second, regarding time, he tended to overestimate speed and underestimate magnitude. All statements like "rapidly, can be done in two or three years" generally took longer to mature; but the ceiling of capability leaps was underestimated—math went from "impossible" to IMO gold, frontier costs rose to unimaginable magnitudes. In short: too optimistic in the short term, too conservative in the long term.

Third, the most subtle errors repeatedly lie in "distribution." Not wrong on direction, but only looking at the aggregate, ignoring distribution. "No unemployment wave" correct, but harm highly concentrated on young newcomers; "value in application layer" half right, but didn't distinguish compute layer from model layer. Aggregate correctness masks distributional disasters—this is the most important lesson to add.

Fourth, the points where he left room withstand scrutiny after three years. "Rumor," "at least for now," "significantly reduces rather than eliminates," "prototype in two or three years, mature in about a decade": the judgments that came with qualifiers and layered distinctions hold up better today. Conversely, the absolute sentences he blurted out are the most prone to flip. The honesty of a prediction lies half in daring to speak, and half in daring to annotate one's own uncertainty.

Fifth, some questions simply aren't settled in three years. Where value ultimately goes, whether emergence is true transformation, whether machines have even a sliver of consciousness, whether long context will eat RAG—these debates from back then are still debates in 2026. Being able to distinguish "already answered" from "still waiting" is more important than rushing to conclusions on everything.

Three years ago, Wang Jianshuo pointed out twenty directions in the fog before GPT-4 was even released, based on intuition. After this reconciliation today, perhaps the most important sentence to remember is: getting the broad direction right isn't that hard; what's hard is admitting one's repeated presumptions about numbers, speed, and distribution. These twenty accounts are less about grading the past, and more about establishing a few rules for the next three years. Let's reconcile again in three years, in 2029.


