Three Years Later: Looking Back at My Predictions About ChatGPT in 2023

marsbit · Published on 2026-05-31 · Last updated on 2026-05-31

Abstract

In March 2023, shortly after ChatGPT's launch, I made 20 predictions about its future. Now, in mid-2026, I've used AI agents to fact-check each one against the latest data. Overall, most major directional forecasts were correct, with only one outright error (incorrectly stating GPT-4 had 100 trillion parameters). Key successes included predicting that RAG and retrieval architectures would become the standard for handling knowledge and hallucinations, that natural language interfaces (LUI) would create a massive new industry layer beyond the models themselves, and that China would develop viable large language models, significantly closing the performance gap with Western counterparts within about three years. Predictions about the absence of mass unemployment, the rise of a new "agent network" for agent-to-agent communication, and ChatGPT not possessing consciousness also held true in their core arguments. However, the "devil was in the details." Errors frequently involved specific numbers, timelines, or overlooking distributional effects. I tended to overestimate the speed of adoption (e.g., for agent networks) while underestimating the ultimate scale of capabilities or costs (e.g., AI winning IMO gold without tools, or the extreme capital required for frontier models). Other misjudgments included: underestimating how AI would reinforce, not dissolve, information filter bubbles; incorrectly assuming AI-generated content ...

Author: Wang Jianshuo

On March 6, 2023, shortly after ChatGPT emerged and before GPT-4 was released, Sarah and I recorded an interview about ChatGPT: the third episode of the Traders' Talk "Plain Talk Series," released as the podcast "Plain Talk About ChatGPT."

At that time, ChatGPT had just come out, and very few people had actually used it. This three-hour interview later stayed at the top of the ChatGPT category on Xiaoyuzhou. In it, I boldly made over twenty judgments and predictions, based purely on intuition and limited information, with little data. The full transcript of that interview is still on my public account.

It's now late May 2026, three years later, and AI has grown into something unimaginable back then.

I want to do one thing: take those twenty points one by one and check them against the latest data available today, to see clearly how the world has actually changed in these three years, and where that version of me from three years ago was right and where I was off.

To be as unbiased as possible, I handed this accounting task to AI: I threw the old interview transcript into a workflow, which dispatched 41 Opus 4.8 agents. They first separated the twenty points, then each individually searched the internet for the latest data, cross-verified them point by point, and finally graded Wang Jianshuo from three years ago. This group of agents spent about 20 minutes, burning about 1.4 million tokens (equivalent to roughly $35), and produced the report below. The judgments come from them, not me. The baseline date is set at May 2026.
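The workflow described above (split the transcript into points, fan each point out to its own grader, collect verdicts) might look roughly like the sketch below. This is only an illustration under stated assumptions: `split_points`, `grade_point`, and `run_review` are hypothetical names, and the grader is a stub standing in for an agent that would actually search and cross-check. The last helper just restates the cost arithmetic implied by the figures in the text.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Graded:
    point_id: int
    claim: str
    verdict: str


def split_points(transcript: str) -> list[str]:
    """Stage 1 (stub): treat each non-empty line as one claim."""
    return [line.strip() for line in transcript.splitlines() if line.strip()]


def grade_point(point_id: int, claim: str) -> Graded:
    """Stage 2 (stub): a real agent would search and cross-verify here."""
    verdict = "incorrect" if "100T" in claim else "mostly correct"
    return Graded(point_id, claim, verdict)


def run_review(transcript: str, max_workers: int = 8) -> list[Graded]:
    """Fan the claims out to parallel graders and collect results in order."""
    claims = split_points(transcript)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with point numbers.
        return list(pool.map(grade_point, range(1, len(claims) + 1), claims))


def usd_per_million_tokens(total_usd: float, total_tokens: int) -> float:
    """Blended price implied by a total bill and a total token count."""
    return total_usd * 1_000_000 / total_tokens
```

With the figures quoted above, `usd_per_million_tokens(35, 1_400_000)` gives a blended rate of 25 dollars per million tokens.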

I. The Scoreboard

Verdict Symbols: ✅ Correct · 🟢 Mostly Correct · 🟡 Partially Correct · ❌ Incorrect

At first glance, Wang Jianshuo's overall direction from back then mostly held up. There's only one genuinely hard error—spreading the rumor that GPT-4 had 100T parameters. But the devil is in the details: behind almost every "correct" point, there lies a tail of details that wasn't quite right back then. None of the twenty points remain purely "uncertain"; three years is long enough for most things to have developed a clear tendency. Let's examine them in groups.

II. The Ones He Got Right

The commonality in this group: the direction, mechanism, and even the timing of the change that Wang Jianshuo predicted back then were all correct. Any errors were only in the "degree" or the "absolute wording."

RAG & Retrieval Architecture (Points 2, 3)

> 2023 Wang Jianshuo said: The mainstream method to solve knowledge and hallucination isn't changing the model, but rather using vector retrieval to inject knowledge as "cheat sheets"; the correct architecture is using a search engine for retrieval and feeding results to the LLM.

This is the de facto standard for all AI products today. RAG has become the default architecture for enterprise AI; OpenAI, Google, Anthropic have all made it a platform-level capability; ChatGPT Search literally is "use Bing index for retrieval, feed results to GPT, then generate answers with citations." Google AI Overviews uses grounding to reach ~2 billion MAU; Perplexity, a company built purely on this architecture, surged to a ~$200 billion valuation.

Back when GPT-4 wasn't released and the industry default was "inject knowledge via fine-tuning," he bet on "keeping model parameters unchanged, attaching external retrieval." The mechanism and timing were both right.

In fairness, what he envisioned was "static one-time retrieval," while reality is more complex: long context, GraphRAG, and agentic retrieval have all arrived to strengthen it. The 2026 "RAG is dead" debate shows precisely that the general direction isn't dead; what it rejected was only "naive one-time retrieval," and it concluded with an upgrade to hybrid retrieval, not a retreat to modifying model parameters. One more point: the term RAG was coined in a 2020 Meta paper and is not his creation; he merely bet correctly, during a crucial window, that it would become mainstream.
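The "cheat sheet" mechanism from the 2023 quote (retrieve first, then hand the results to the model alongside the question) can be sketched in a few lines. This is a minimal illustration, not any product's implementation: word-count vectors with cosine similarity stand in for learned embeddings, the function names are hypothetical, and the final call to a language model is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """One-shot retrieval: rank documents by similarity to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Inject retrieved notes into the prompt; model weights stay untouched."""
    notes = retrieve(query, docs, k)
    cited = "\n".join(f"[{i}] {d}" for i, d in enumerate(notes, start=1))
    return (
        "Answer using only the numbered notes and cite them.\n"
        f"Notes:\n{cited}\n"
        f"Question: {query}"
    )
```

The knowledge lives entirely in `docs`, so updating a fact means editing a document rather than retraining anything. This sketch is the "static one-time retrieval" variant; hybrid or agentic retrieval would replace `retrieve` with something that searches more than once.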

LUI is a New Continent (Point 7)

> 2023 Wang Jianshuo said: The greatest thing about ChatGPT isn't AIGC, but opening up LUI (Natural Language User Interface), which will reconstruct human-computer interaction like GUI did back in the day, spawning a new industry far larger than "building large models" itself.

The "new continent" part is almost completely right. Natural language has become the dominant interaction layer for the masses (ChatGPT has 900 million weekly active users) and has spawned a standalone new industry: agents, coding agents, and protocol layers have all materialized. The most specific phrase, "far larger than building models itself," was strongly confirmed: MCP became the "OS standard" of the LUI era, was fully adopted by OpenAI, Google, and Microsoft in 2025, and was transferred to the Linux Foundation by year's end; Claude Code alone reached ~$2.5 billion in annualized revenue.

But he used strong wording like "reconstruct" and "replace GUI." Three years later, LUI coexists as a layer on top of GUI rather than replacing it. Three kinds of counter-evidence are strong: an MIT report found that 95% of enterprise GenAI pilots show no measurable ROI; the top models for computer-use agents that operate interfaces directly score only ~78% on test sets, barely reaching the human baseline; and screenless, voice-only hardware has largely failed (the Humane AI Pin was permanently shut down in 2025). A more accurate statement: LUI is a new interaction layer superimposed on GUI.

Agent Network & New Addressing (Point 9)

> 2023 Wang Jianshuo said: In about ten years, an "agent network" will appear—agents automatically handshake and call each other using natural language, no longer needing traditional APIs; a new domain name addressing system will be born. This system "could be done in two or three years."

The direction is astonishingly spot-on. MCP and A2A (donated to the Linux Foundation, supported by 150+ organizations) handle calls between agents; the Agent Network Protocol directly uses the W3C's DID for "agent addressing without a central authority," aiming for a "billions-of-agents collaboration network," which is highly isomorphic with his "new domain name system."

Two corrections needed: first, "no longer needing APIs" didn't hold; mainstream protocols use structured schemas at the bottom, essentially a standard layer on top of APIs. Second, "done in two or three years" didn't materialize; Gartner data shows as of 2026 only ~17% of organizations have actually deployed agents. Interestingly, he actually layered his statement back then—prototype "in two or three years," maturity "in about ten years." He was quite accurate on the prototype timing, and the maturity cycle is indeed on a ten-year scale. Looking at the two layers separately, this point's quality is higher than it seems.
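The first correction, that agent protocols still rest on structured schemas rather than free-form language, can be illustrated with a small sketch. The descriptor format below is hypothetical and heavily simplified; it is not the wire format of MCP, A2A, or any other real protocol. It only shows the general shape: a natural-language description for the calling model, and a machine-checkable input schema underneath.

```python
import json

# Hypothetical tool descriptor (not a real protocol's format):
# prose on top for the model, a checkable schema underneath.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "input_schema": {
        "required": ["city"],
        "properties": {"city": str, "units": str},
    },
}


def validate_call(tool: dict, arguments: dict) -> list[str]:
    """Return schema violations; an empty list means the call is well-formed."""
    schema = tool["input_schema"]
    errors = [f"missing: {key}" for key in schema["required"] if key not in arguments]
    for key, value in arguments.items():
        expected = schema["properties"].get(key)
        if expected is None:
            errors.append(f"unknown: {key}")
        elif not isinstance(value, expected):
            errors.append(f"wrong type: {key}")
    return errors


def make_request(tool: dict, arguments: dict) -> str:
    """Serialize a validated call; reject anything the schema does not allow."""
    errors = validate_call(tool, arguments)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps({"tool": tool["name"], "arguments": arguments}, sort_keys=True)
```

An agent may decide in natural language that it wants the weather, but what crosses the wire is still a typed, validated request, which is why this reads as a standard layer on top of APIs rather than a replacement for them.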

China Will Definitely Produce Usable Large Models (Points 10, 20)

> 2023 Wang Jianshuo said: China will definitely produce usable large models; the gap with the top tier will close rapidly within about three years (analogous to Red Flag browser catching up to Netscape).

This timeline fits surprisingly well. The Stanford 2026 AI Index benchmark tests show the performance gap between top Chinese and US models narrowed from 17.5–31.6 percentage points in May 2023 to 2.7%. Meanwhile, US private AI investment is ~23 times that of China's—they closed the gap with far less investment. DeepSeek, Qwen, Kimi, GLM have become globally mainstream, with the open-source ecosystem even leading.

But the word "rapidly" was optimistic: real maturity arrived about 14 months later, not "a few months." And this is catching up on usability, not defining the frontier: as of early 2026, no Chinese model has yet surpassed OpenAI's o3. On point 20, he was clearly wrong: the judgment "once the door is open, it won't close" was directly overturned when OpenAI proactively cut off its API to China in July 2024; the door was closed by the supplier. Ernie Bot, which he named as the leader, has actually fallen behind, while the ones that truly took the baton were DeepSeek, Doubao, and Qwen, which were inconspicuous back then.

No Consciousness; Turing Test Only Measures Surface (Point 13)

> 2023 Wang Jianshuo said: ChatGPT has no consciousness; believing otherwise is wishful projection, a case of "the speaker means nothing, the listener reads too much into it"; the Turing test was always about testing "whether it makes you think it has it," not whether it actually does.

The core judgment "tests surface" stands firm and was ironically confirmed by an experiment: in a 2025 UC San Diego Turing test, GPT-4.5, when prompted to "role-play a persona," was judged as human 73% of the time, higher than actual humans, relying purely on acting skills—the perfect footnote for "tests only whether it makes you think it has it."

What needs adding: the absolute strong assertion "machine definitely has no consciousness" has moved into a gray area over three years. Anthropic established a "Model Welfare" research position, giving a ~15%–20% probability of consciousness, and even added a feature for Claude to "actively end abusive conversations." These have turned "absolutely none" into "low probability but cannot be ruled out." However, all are based on "possible, should assume" not "proven," so the core wasn't overturned, just his wording was too definitive back then.

Other Correct Ones (Points 6, 11, 12, 16, 18, 19)

  • Not AGI But a Big Step Forward: Both sides stand. Altman himself, in the GPT-5 era, still says "not AGI, lacks continual learning"; meanwhile, IMO gold medals and ARC-AGI jumping from near zero to 85% make "a big step forward" indisputable.
  • No Mass Unemployment Wave: The US unemployment rate was only 4.3% in April 2026. The blind spot is "distribution": Stanford research shows that those being squeezed out are precisely the 22–25-year-old newcomers on the first rung of the career ladder; the "smooth absorption" mechanism fails for them.
  • Will Not Be Flooded by AI Junk: The net-positive direction is correct, but he severely underestimated the magnitude: AI content now accounts for ~52% of new web pages, and "AI slop" became the word of the year.
  • A Big Year for Startups: He correctly captured the inflection point of the wave; xAI (founded March 2023) reached a $230 billion valuation. But he tied "great companies" too narrowly to 2023 itself: the truly trillion-scale ones, OpenAI and Anthropic, were founded earlier.
  • 1994 Browser Moment: The relative ranking is confirmed; OpenAI launched the Atlas browser in 2025, turning the metaphor into literal reality. However, ChatGPT's diffusion was more explosive than that of browsers, making the metaphor somewhat conservative.
  • Prompt + Injecting Facts Reduces Hallucinations: Direction confirmed; GPT-5's hallucination rate spikes to 47% when offline without retrieval, which conversely shows that "facts" are the key variable. What he underestimated is that the root cause lies in training incentives, not the prompt.

III. The Ones He Got Wrong or Partially Wrong

GPT-4 Has 100T Parameters (Point 4)—Completely Wrong

> 2023 Wang Jianshuo said: (Rumor) GPT-4 has 100T parameters, about 600 times larger than GPT-3's 175B.

Both the 100T figure and the ~600x multiple are wrong. GPT-3 is 175B; the best estimate from a July 2023 leak puts GPT-4 at ~1.8T, a 16-expert MoE, only ~10 times larger. 100T is off by a factor of ~55. The sole source for "100T" was a second-hand paraphrase of a "roughly" comment by Cerebras' CEO in 2021; Sam Altman had already called that comparison chart "complete bullshit" in January 2023.

His original statement was labeled "rumor," leaving room for uncertainty. Deeper: the entire framework of "using parameter multiples to measure generations" is outdated; OpenAI later stopped disclosing parameter counts for GPT-4.5, GPT-5. This is the only point that's wrong on numbers and also obsolete in perspective—a hard error.
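The multiples quoted in this section follow directly from the three parameter counts given in the text. A short check of the arithmetic, using only those figures (the variable names are just labels for illustration):

```python
GPT3_PARAMS = 175e9      # 175B, as stated in the text
RUMORED_GPT4 = 100e12    # the 100T rumor
LEAKED_GPT4 = 1.8e12     # ~1.8T, the July 2023 leak estimate

# Multiple implied by the rumor: roughly 571, which the rumor rounded to "about 600".
rumored_multiple = RUMORED_GPT4 / GPT3_PARAMS

# Multiple implied by the leak estimate: roughly 10.
leaked_multiple = LEAKED_GPT4 / GPT3_PARAMS

# How far the rumor overshoots the leak estimate: roughly 55 to 56 times.
overstatement = RUMORED_GPT4 / LEAKED_GPT4
```

So the "600 times" multiple was at least consistent with the rumored 100T figure; it was the 100T figure itself that had no basis.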

LLM Math (Point 1)—Diagnosis Right, Definitive Conclusion Wrong

> 2023 Wang Jianshuo said: LLM's poor math is intrinsic; letting it learn math itself is both impossible and unnecessary; the correct approach is attaching external tools.

"Diagnosis plus tool route" is completely right—the root cause is indeed that token-by-token generation makes carries unreliable (a 2025 mechanism paper precisely confirmed the intuition of "last digit often correct, middle digits wrong"); the improvement from external tools is also huge (o4-mini scores 99.5% on AIME 2025 when allowed to use Python).
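The "tool route" is simple to picture: instead of generating digits token by token, the model emits an expression and a deterministic tool computes it exactly. A minimal sketch of such a tool follows, assuming a hypothetical `calculator_tool` that accepts only integer arithmetic. It walks Python's `ast` tree rather than calling `eval`, so anything other than numbers and basic operators is rejected.

```python
import ast
import operator

# Only these binary operators are allowed; everything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def calculator_tool(expression: str) -> int:
    """Evaluate an integer arithmetic expression exactly, without eval()."""

    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)
```

Python integers are arbitrary precision, so a call such as `calculator_tool("987654321 * 123456789")` gets every carry right regardless of length, which is exactly where token-by-token generation tends to slip.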

The error lies in the definitive wording "impossible, unnecessary." "Impossible" was disproven: in July 2025, Gemini Deep Think and OpenAI models won IMO gold medals using pure natural language, with no tools. The key turning point was the emergence of "reasoning models" in 2024–2025, which was unforeseeable in March 2023, so this prediction should be judged leniently on direction, not faulted for timing.

Value Capture (Point 8)—Half Right, Core Assertion Reversed

> 2023 Wang Jianshuo said: Value will ultimately land at the application layer; companies building the foundational layer (model builders) may not end up profitable.

Money did start flowing to the application layer (Cursor reached $2 billion in annualized revenue within three years), so that half is right. But "foundational layer builders not making money" was directly disproven by NVIDIA: with FY2026 net profit of ~$120 billion and a market cap of $5 trillion+, it is the only clear, large-scale profit maker in the market. Meanwhile, the model layer he implied would win (OpenAI is projected to lose ~$14 billion in 2026) looks more like the "money-burning, unprofitable foundational layer" he described.

He didn't differentiate between the "computing foundational layer" and the "model foundational layer," nor between "revenue" and "profit." In 2026, value is captured even more extremely by the computing layer than in 2023, not transferring to the application layer. To add: it's the cloud providers buying chips that lose money, not NVIDIA selling them—a misplacement of his own "overbuilding railroads" analogy.

Copyright (Point 14)—Registration Right, Avoiding Infringement Wrong

> 2023 Wang Jianshuo said: AI-generated content might circumvent copyright (protects expression, not ideas); generated works might neither infringe nor be registrable.

"Not registrable" became established legal fact (in 2025, US Copyright Office clarified "inputting prompts alone insufficient to claim authorship"). But "circumvent infringement" is clearly wrong: courts repeatedly ruled that AI outputs can constitute infringement if substantially similar to original works; Anthropic settled for $1.5 billion over pirated training data, the largest copyright settlement in US history. AI didn't "circumvent" copyright; it paid the largest price ever.

World Unity (Point 15)—Mechanism Right, Trend Bet Reversed

> 2023 Wang Jianshuo said: ChatGPT doing a "weighted average" of human viewpoints could counter TikTok-style echo chambers, offering a possibility for "world unity."

The mechanism layer is right: multiple 2025 studies confirmed that LLMs push viewpoints toward the majority, systematically underweighting minority views. But the societal judgment reversed: his own qualifier, "at least not personalized now," was overturned within three years. From April 2025, OpenAI made cross-conversation memory and personalization default capabilities, and AI is rapidly moving toward personalization. More crucially, he imagined the "weighted average" as a neutral global common denominator, but tests show it is a directional shift combined with sycophancy, and it can be used to actively manipulate positions, pointing toward "creating new echo chambers," not "dissolving polarization."

Localized Wars & Cost (Point 17)—Qualitative All Right, Quantitative Disproven

> 2023 Wang Jianshuo said: Building large models would quickly become "localized wars," costs knowable (capped at ~$5–10 billion USD excluding detours), many players would enter.

The qualitative direction is startlingly right: massive entry, rapid commoditization, and open source catching up all materialized. But the hard number, the "$5–10 billion cap," is wrong on both ends: severely underestimated at the frontier (GPT-5-level training in 2026 reaching $200–500 million, plus $100 billion+ data centers and the $500 billion Stargate) and overestimated on the replication end (DeepSeek pushed marginal training costs down to the million-dollar level). The "cost" of the same model can vary by 200x depending on definition, just not within his given range.

Emergent Abilities (Point 5)—Direction Right, Numbers & Framing Wrong

> 2023 Wang Jianshuo said: Around 60B parameters, new abilities appear that are not in the raw training data and that researchers cannot explain.

The directional intuition holds, but two formulations don't stand. First, there is no uniform "60B threshold": the real threshold for chain-of-thought is ~100B, and different abilities emerge at scales from 13B to 540B. Second, "cannot explain" was challenged by a NeurIPS outstanding paper in late 2023: many "sudden jumps" are artifacts of metric choice, and the curves become smooth and predictable with continuous metrics. In fairness, he was repeating the absolute mainstream narrative of the time; what can be corrected is treating "60B" as a hard threshold and "inexplicable" as a qualitative conclusion.
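The "artifact of metric choice" argument can be reproduced with a toy model. The sketch below assumes, purely for illustration, that per-token accuracy improves linearly with scale and that token errors are independent; neither assumption comes from the paper or from real measurements. Under those assumptions, an all-or-nothing metric such as exact match on a 10-token answer stays near zero for a long time and then shoots up, even though the underlying skill improves smoothly.

```python
def per_token_accuracy(scale_step: int, steps: int = 10) -> float:
    """Toy assumption: per-token accuracy rises linearly from 0.5 to 1.0."""
    return 0.5 + 0.5 * scale_step / steps


def exact_match(p_token: float, answer_len: int = 10) -> float:
    """All answer_len tokens must be right, assuming independent errors."""
    return p_token ** answer_len


def curves(steps: int = 10, answer_len: int = 10) -> list[tuple[int, float, float]]:
    """Rows of (scale step, continuous metric, all-or-nothing metric)."""
    rows = []
    for step in range(steps + 1):
        p = per_token_accuracy(step, steps)
        rows.append((step, p, exact_match(p, answer_len)))
    return rows


if __name__ == "__main__":
    for step, p, em in curves():
        print(f"step={step:2d}  per-token={p:.2f}  exact-match={em:.3f}")
```

Halfway up the scale, per-token accuracy is already 0.75 while exact match is still below 6%; most of the exact-match gain arrives in the last two steps. The same model looks "emergent" or "smooth" depending only on which column is plotted.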

IV. Looking Back After Three Years: A Few Patterns

Having accounted for each point, stepping back, Wang Jianshuo's twenty judgments contain several patterns more worth remembering than any single point.

1. Direction is far more reliable than numbers and degrees. Among the twenty, the judgments about mechanisms and direction (RAG, LUI, agent network, Turing test) are almost all correct; those giving specific numbers or definitive wording (100T parameters, 60B threshold, $5–10 billion cost, math "impossible") are almost all wrong. For fast-changing fields, bet on direction, bet on mechanism, bet less on precise numbers, and be especially wary of words like "impossible, definitely, cap, absolutely none": they are high-risk areas for getting slapped by time.

2. On timing, he tended to overestimate speed and underestimate magnitude. Whenever he said "rapidly, done in two or three years," the maturity period was generally slower; but he underestimated the ceiling for capability leaps—math could go from "impossible" to IMO gold, frontier costs could rise to unimaginable magnitudes. In short: too optimistic in the short term, too conservative in the long term.

3. The most subtle errors repeatedly appear in "distribution." He was not wrong on direction, but saw only the aggregate and ignored the distribution. "No mass unemployment" was right, but the damage was highly concentrated on young newcomers; "value lands at application layer" was half right, but he didn't differentiate the computing layer from the model layer. Being correct in aggregate can mask distributional disasters; this is the most important lesson to add.

4. Where he left room, it stood the test of time three years later. "Rumor," "at least for now," "significantly reduces not eliminates," "prototype in two or three years, mature in about ten years": any judgment from back then that came with qualifiers and layers looks more solid today. Conversely, the absolute statements he blurted out were the most easily overturned. Half the honesty of prediction lies in daring to speak, the other half in daring to annotate one's own uncertainty.

5. For some questions, three years simply isn't enough. Where value ultimately goes, whether emergence is real or illusory, whether machines have a shred of consciousness, whether long context will consume RAG: these debates from back then are still debates in 2026. Being able to distinguish "already answered" from "still waiting" is more important than rushing to conclude everything.

Wang Jianshuo three years ago, relying on intuition, pointed in twenty directions in the fog before GPT-4's release. After this accounting, perhaps the sentence most worth remembering is: getting the big direction right isn't that hard; what's hard is admitting one's own repeated assumptions about numbers, speed, and distribution. These twenty accounts are less about scoring the past, more about setting a few rules for the next three years. Let's check again in another three years, in 2029.

Related Questions

Q: What was the overall accuracy of Wang Jianshuo's predictions about ChatGPT after three years, and what is the main takeaway?

A: Overall, Wang Jianshuo's predictions from 2023 about the direction and mechanisms of AI development were largely correct. His major missteps were in specific numbers, absolute statements, and the distribution of impact. The main takeaway is that judging the direction and mechanisms of a rapidly evolving field is more reliable than predicting exact numbers or timelines.

Q: According to the article's 2026 review, which of Wang Jianshuo's predictions were notably accurate regarding the technical architecture of AI?

A: His predictions about RAG and retrieval-augmented architectures becoming the standard, and the emergence of LUI (Natural Language User Interface) as a new major interaction layer, were notably accurate. He correctly foresaw that knowledge and hallucination issues would be addressed by augmenting models with external retrieval, not by fundamentally altering model parameters, and that LUI would create an industry larger than the foundational model industry itself.

Q: Which prediction was cited as the only clear-cut factual error, and what were the incorrect and correct figures?

A: The only clear-cut factual error was the prediction about GPT-4's parameter count. Wang Jianshuo cited a rumor that GPT-4 had 100 trillion parameters, a 600x increase over GPT-3's 175 billion. The article states that the best estimate for GPT-4, from a July 2023 leak, is around 1.8 trillion parameters (a ~10x increase), which puts the rumored figure off by a factor of ~55.

Q: How did the reality of the AI industry's 'value capture' differ from Wang Jianshuo's 2023 prediction?

A: Wang predicted that value would ultimately reside at the application layer, and that companies building foundational models might not be the most profitable. The 2026 review found this half-right: value did flow to applications, but the core assertion was contradicted by the dominance of the compute layer. NVIDIA, as the compute infrastructure provider, became the most profitable player, while foundational model companies like OpenAI were still incurring significant losses.

Q: What key pattern or lesson about making predictions does the article derive from analyzing these twenty predictions?

A: The analysis reveals several key patterns: 1) Judging direction and mechanism is far more reliable than predicting specific numbers or making absolute statements. 2) There is a tendency to overestimate the speed of development in the short term but underestimate its long-term scale and ceiling. 3) Errors often stem from looking at aggregate effects while missing critical distributional impacts (e.g., overall low unemployment vs. concentrated harm to young professionals). 4) Predictions that included qualifiers and acknowledged uncertainty held up better over time.
