Apple's Desired On-Device AI Sees a Dark Horse Emerge: The First Cognitive Model is Born, 4B Matches GPT-5.4

marsbit | Published on 2026-06-09 | Last updated on 2026-06-09

Abstract

A Chinese company, Tomorrow's Journey (Nextie), has introduced what it calls the industry's first "cognitive model" for edge devices. Named Nextie Alpha, this 4-billion-parameter model reportedly matches the performance of trillion-parameter giants like GPT-5.4 in collective intelligence tasks such as debate and group decision-making. The development follows Andrej Karpathy's vision of stripping vast factual knowledge from large language models to retain only a smaller "cognitive core" capable of reasoning, planning, and knowing its own limits. This approach directly addresses the soaring computational costs and token expenses hindering AI's widespread deployment, as highlighted by incidents like Amazon shutting down an internal AI leaderboard due to prohibitive costs. Built through reinforcement learning on open-source reasoning models, and informed by a study of academic papers from 1800 to 2020, the model is said to enable three key advancements: 1) improved decision quality in multi-agent systems, 2) drastically reduced compute costs, allowing for cost-effective cloud or on-device (e.g., MacBook) deployment, and 3) the feasibility of "proactive" AI agents that act autonomously without user prompts, unlocking new commercial possibilities beyond today's reactive models. Founded by the former Microsoft Xiaoice team, known for creating a 3.6B model that outperformed a 65B Llama model, the company is now focusing on the multi-agent systems sector, a field gaining significant investor interest. The model's ec...

[Introduction] At the just-concluded WWDC, Siri's rebirth powered by AI was a key topic, and 'on-device models' have become a trend! Earlier, Andrej Karpathy called for stripping models of knowledge, retaining only the 'cognitive core'. A Chinese company claims to have realized this direction—4B parameters, achieving performance akin to trillion-parameter large models in collective intelligence tasks. What can on-device cognitive models truly change?

Last night, Siri was reborn with the help of Google's 1.2 trillion-parameter Gemini.

Meanwhile, Amazon shut down its highly controversial internal AI leaderboard: employees had used AI tools so heavily that computing costs rose to a level management could no longer ignore.

Token cost has become the hardest barrier to large-scale AI adoption.

Andrej Karpathy previously suggested a direction in an interview: strip the massive knowledge from the model, leaving only a 'cognitive core' capable of thinking, planning, and knowing what it doesn't know—1B-level parameters would suffice.

https://www.youtube.com/watch?v=lXUZvyajciY

This direction is being validated.

A 4B-parameter model has achieved results equivalent to trillion-parameter large models like GPT-5.4 in collective intelligence tasks, and supports on-device deployment.

It comes from a founding team whose 3.6B model previously topped the Japanese Hugging Face leaderboard, beating the 65B Llama.

This time, they have created the industry's first on-device cognitive model.

Karpathy's Prediction and the Bill for Compute

The pressure of compute costs has shifted from a technical issue to a financial one, with Amazon's case being just the tip of the iceberg.

Amazon employees made heavy use of internal AI tools that call large-model inference, driving up overall compute spending and forcing management to urgently halt the leaderboard mechanism to curb usage.

https://www.ft.com/content/b1a62a7f-6df5-4c90-94ce-64ce9c9961b6?syn-25a6b1a6=1

The industry is experiencing its first 'Token retreat,' with the daily compute consumption of some companies reaching the hundred-million-yuan level.

The business model of large models is hitting a structural wall: the more powerful the capability and the deeper the reasoning chain, the higher the cost per call.

GPU Cost / Revenue is a key metric for all AI companies, and the trend of continuously expanding model parameters only worsens this ratio.
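The ratio can be made concrete with a minimal Python sketch. Every number below is an illustrative assumption, not a figure from the article or from any vendor. The sketch treats the cost of a call as tokens times a per-token price, and shows how a longer reasoning chain raises GPU Cost / Revenue when the price charged per call stays fixed.

```python
def cost_per_call(tokens: int, usd_per_million_tokens: float) -> float:
    """Inference cost of one call, assuming cost scales linearly with tokens."""
    return tokens * usd_per_million_tokens / 1_000_000


def gpu_cost_to_revenue(calls: int, tokens_per_call: int,
                        usd_per_million_tokens: float,
                        revenue_per_call: float) -> float:
    """GPU Cost / Revenue for a batch of identical calls."""
    total_cost = calls * cost_per_call(tokens_per_call, usd_per_million_tokens)
    total_revenue = calls * revenue_per_call
    return total_cost / total_revenue


# Hypothetical inputs: 1,000 calls, $10 per million tokens, $0.05 earned per call.
short_chain = gpu_cost_to_revenue(1_000, 2_000, 10.0, 0.05)   # shallow reasoning
long_chain = gpu_cost_to_revenue(1_000, 20_000, 10.0, 0.05)   # 10x more tokens

print(f"short chain ratio: {short_chain:.2f}")
print(f"long chain ratio:  {long_chain:.2f}")
```

Under these assumed inputs the ratio grows in step with tokens per call, which is the structural wall described above: a ratio above 1 means each call costs more to serve than it earns.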

Karpathy's thinking points to another path: he proposed the need to strip 'memory/knowledge' from models, retaining what he calls the 'cognitive core'—

An entity stripped of massive facts and knowledge, but retaining the thinking algorithms, the magic of intelligence, and problem-solving strategies.

He believes that even at a scale of 1 billion parameters, efficient human-like thinking can be achieved:

It will think like a human... If you ask it a factual question, it might need to look it up—it knows it doesn't know and will go check.

This statement sparked widespread discussion in the tech community.

Consensus on the direction is forming; the real variable is which teams can push the 'cognitive core' from concept to deployable product.

4B Matches Trillion-Parameter: What Nextie Alpha Has Done

The company that pushed the 'cognitive core' Karpathy described from concept to product is Nextie.

This company conducted reinforcement learning training on open-source reasoning models, decoupling knowledge from cognition—stripping the model of memorized knowledge reserves while enhancing generalization and abstract thinking capabilities.

The resulting model is named Nextie Alpha, with a parameter scale of 4B. It has completed training and been deployed, making it the industry's first product defined as a 'cognitive model'.

Its training method starts from an unusual place.

The Nextie team compiled human academic papers from 1800 to 2020, spanning 220 years, attempting to trace the evolutionary trajectory of collective intelligence as a reference for their technical roadmap.

Based on this research, they performed reinforcement learning on open-source reasoning models, focusing on improving generalization and abstraction capabilities.

To give an intuitive example: the trained model can transfer a Go player's decision-making patterns to daily life scenarios—Karpathy's idea of 'retaining thinking algorithms' finds concrete technical implementation here.

In terms of effectiveness, Nextie Alpha achieved output quality equivalent to large models like GPT-5.4 in collective intelligence tasks (debate, reflection, challenge, voting, etc.) with 4B parameters, offering significant advantages in compute consumption and inference speed.

More noteworthy is the application space unlocked by this model, with three layers of progressive significance.

First layer: improvement in multi-agent decision-making quality.

Within the Harness decision-making framework, using the cognitive model yields better output than using a reasoning model.

Upgrading the underlying model from 'reasoning' to 'cognition' brings a leap in the overall quality of the decision-making chain within multi-agent collaborative systems.
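The article does not describe how the Harness framework works internally. As a generic illustration of one of the collective intelligence tasks it lists (voting), here is a minimal Python sketch of majority voting across several agents. The `Agent` type, the stub agents, and the question are all hypothetical stand-ins for real model calls.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interface: an agent maps a question to an answer string.
Agent = Callable[[str], str]


def majority_vote(agents: List[Agent], question: str) -> str:
    """Ask every agent the same question and return the most common answer."""
    answers = [agent(question) for agent in agents]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Stub agents standing in for model calls (assumption, not Nextie's API).
agents: List[Agent] = [
    lambda q: "approve",
    lambda q: "reject",
    lambda q: "approve",
]

decision = majority_vote(agents, "Ship the release?")
print(decision)
```

In a setup like this, the quality of the final decision depends on the quality of each agent's individual answer, which is where the choice of underlying model would matter.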

Second layer: orders-of-magnitude reduction in compute costs.

Running a 4B model instead of a trillion-parameter one drastically reduces compute overhead in cloud deployment.

Nextie Alpha also supports on-device deployment—it can run directly on MacBooks, embodied intelligence devices, etc., converting compute costs into electricity costs.

This is particularly significant for the field of embodied intelligence: using a trillion-parameter model to drive a household robot consumes large amounts of Tokens with every 'thought,' potentially making the comprehensive cost higher than hiring a human. 4B on-device deployment fundamentally rewrites this equation.
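The idea of converting compute costs into electricity costs can be sketched as simple arithmetic. The Python below is a rough illustration only: the power draw, electricity tariff, token volume, and token price are all made-up assumptions, not measurements of Nextie Alpha or of any device or cloud service, and the electricity figure ignores hardware purchase and cooling.

```python
def daily_electricity_cost(power_watts: float, hours: float,
                           usd_per_kwh: float) -> float:
    """Energy used in kWh times the tariff."""
    return power_watts / 1000 * hours * usd_per_kwh


def daily_token_cost(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Cloud inference bill, assuming a flat per-token price."""
    return tokens_per_day * usd_per_million_tokens / 1_000_000


# Hypothetical on-device case: 30 W average draw, running 24 hours, $0.15 per kWh.
on_device = daily_electricity_cost(power_watts=30, hours=24, usd_per_kwh=0.15)

# Hypothetical cloud case: 50 million tokens per day at $10 per million tokens.
cloud = daily_token_cost(tokens_per_day=50_000_000, usd_per_million_tokens=10.0)

print(f"on-device electricity per day: ${on_device:.3f}")
print(f"cloud tokens per day:          ${cloud:.2f}")
```

The point of the comparison is structural rather than numerical: the on-device cost depends on power draw and running time, while the cloud cost scales with every token the agent generates.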

Third layer: unlocking proactive scenarios.

Currently, the vast majority of AI products operate in a reactive mode—the user gives a command, the model responds.

Proactive mode means the agent makes decisions and executes tasks autonomously, without waiting for commands. The commercial scale of proactive mode far exceeds that of reactive mode but has always been blocked by compute costs in the past.

Nextie Alpha supports 24/7 operation at a controllable cost, making previously shelved proactive agents—deemed too expensive—possible.
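The difference between the two modes can be shown with a minimal Python sketch. It is a toy under stated assumptions: the `decide` rule is a hypothetical stand-in for a model call, and the observation fields (`battery`, `unread_mail`) are invented for illustration.

```python
from typing import Iterable, List, Optional


def reactive_agent(command: str) -> str:
    """Reactive mode: nothing happens until the user issues a command."""
    return f"done: {command}"


def decide(observation: dict) -> Optional[str]:
    """Hypothetical decision rule standing in for a cognitive model call."""
    if observation.get("battery", 100) < 20:
        return "recharge"
    if observation.get("unread_mail", 0) > 0:
        return "summarize mail"
    return None  # nothing worth doing for this observation


def proactive_agent(observations: Iterable[dict]) -> List[str]:
    """Proactive mode: the agent inspects every observation and acts unprompted."""
    actions = []
    for observation in observations:
        action = decide(observation)
        if action is not None:
            actions.append(action)
    return actions


stream = [{"battery": 10}, {"unread_mail": 2}, {}]
print(reactive_agent("play music"))
print(proactive_agent(stream))
```

In the proactive loop `decide` runs on every observation, including the ones that lead to no action, so per-call cost determines whether continuous operation is affordable.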

Team's Trump Card and Positioning in the Race

Nextie was founded by the founding team of Microsoft Xiaoice.

This team's signature is 'winning with small parameters against large parameters'—their previously trained open-source model rinna (Japanese Xiaoice) topped the Japanese Hugging Face leaderboard with 3.6B parameters, defeating the 65B Llama.

Nextie Alpha achieving effects comparable to trillion-parameter models with 4B continues the same technical lineage.

Nextie is investing heavily in one track: Harness group multi-agent systems.

This track is gaining validation from top-tier capital—in March 2026, OpenAI invested in the startup Isara, directly pushing its valuation to $650 million. Isara's research focus is precisely multi-agent collaboration and collective intelligence.

https://www.wsj.com/tech/ai/openai-backs-new-ai-startup-seeking-bot-army-breakthroughs-a0b1fedc

In intelligent depth evaluations (IDI) for this field, Nextie's comprehensive performance is significantly higher than any single large model.

Capital validates the track's value, while evaluation data pinpoints Nextie's position within it.

The combination of these two signals points to the same judgment: group multi-agent is the next high-value direction in the AI application layer, and cognitive models are the key infrastructure driving it.

Cognitive Models Change Not Just Parameters, but the Ledger

GPU Cost / Revenue is the sword of Damocles hanging over all AI companies.

The solution offered by cognitive models points to a rebuilt economic model: achieving with 4B parameters what previously required a trillion means an entirely different cost structure for the same output quality.

Nextie revealed in an interview that the team is training a more generalized 8B cognitive model.

If 4B can already match GPT-5.4 in collective intelligence tasks, the capability boundaries of 8B are worth anticipating.

A more profound question is left for the entire industry: When the cost of running a cognitive model on-device 24/7 drops to a negligible level, all AI products designed today based on the 'user commands, model responds' reactive model may need to re-examine their product forms.

The commercial potential of proactive agents far exceeds anything possible under today's reactive agent paradigm.

This article is from the WeChat public account 'New Zhiyuan', author: ASI启示录

Related Questions

Q: What is the core idea behind Andrej Karpathy's proposed 'cognitive core' for AI models?

A: Andrej Karpathy proposed stripping the vast 'knowledge/memory' from large models, leaving only a 'cognitive core': a smaller entity (potentially around 1B parameters) that retains thinking algorithms, problem-solving strategies, and the awareness of what it does not know, enabling it to reason and look up information like a human.

Q: What company developed the 'Nextie Alpha' model, and what is its key achievement?

A: The 'Nextie Alpha' model was developed by Nextie (Tomorrow's New Journey). Its key achievement is being an industry-first edge-deployable cognitive model with 4B parameters that achieves output quality equivalent to trillion-parameter models like GPT-5.4 in collective intelligence tasks.

Q: What are the three main advantages or implications of deploying the 4B Nextie Alpha model?

A: The three main advantages are: 1) enhanced decision quality in multi-agent systems, 2) drastic reduction in compute costs (enabling edge deployment on devices like MacBooks), and 3) unlocking proactive AI scenarios where agents can operate autonomously 24/7 at a sustainable cost.

Q: What industry trend does the article mention regarding AI operating costs and business models?

A: The article says that token and inference costs are becoming a major barrier to AI scalability. Companies face worsening GPU Cost / Revenue ratios, leading to a 'Token retreat' in which even large firms like Amazon are curbing internal AI usage due to soaring compute expenses.

Q: Based on the article, what is the potential significance of the Harness multi-agent system and cognitive models?

A: Harness multi-agent systems, driven by efficient cognitive models like Nextie Alpha, represent a high-value future direction for AI applications. They enable superior collective intelligence performance and could fundamentally shift AI product design from reactive (command-response) to proactive (autonomous action) models, vastly expanding commercial possibilities.
