Understanding the New Economic Model of Tokenization

marsbitPublished on 2026-05-19Last updated on 2026-05-19

Abstract

Understanding the New Token Economics Model The commercialization of AI applications is evolving from selling software and subscriptions to selling token call capacity. Tokens, the fundamental unit of information processing for large language models (LLMs), have become the basis for API billing and consumption. With call volumes exploding, tokens themselves are now being traded—procured, routed, split, and resold—forming a new intermediary market. This layer connects upstream LLM providers with downstream developers and enterprises, acting as a global wholesale-to-retail liquidity network. The rise of this business is fueled by a massive surge in China's daily token call volume—growing over a thousandfold from 100 billion in early 2024 to over 140 trillion by March 2026—and significant improvements in domestic LLM capabilities, which are now competitive globally. The core value of token distribution platforms extends beyond simple arbitrage. Key functions include aggregating multiple models (like GPT, Claude, and domestic models such as Kimi and DeepSeek) under a unified API, lowering network and payment barriers, and providing enterprise services like model selection, prompt engineering, and system integration. Profit models are diversifying: (1) resale margins; (2) technical premiums from proprietary inference acceleration (e.g., reducing costs to 1/10 of the industry standard); and (3) enterprise value-added services. High-consumption scenarios like marketing, short-f...

Author: Zhao Ying

Source: Wall Street News

The commercialization of AI applications is extending from selling software and memberships to selling token-calling capabilities. Here, Tokens refer to the smallest units of information processed by large models, serving as the basis for API billing, settlement, and consumption. As the volume of calls increases, Tokens themselves are beginning to be procured, routed, split, and resold like a form of "inventory."

Chen Liangdong, an analyst at Huayuan Securities, summarized the core change in a recent media industry report: "Token operations are forming a new intermediary market, which involves exploring token distribution models to connect upstream large model providers with downstream developers, enterprises, and individuals. The essence is the liquidity infrastructure for a global network of token wholesale to retail."

The background for this business is not complex: On one hand, China's daily token call volume has surged rapidly, rising from 100 billion tokens per day at the beginning of 2024 to 100 trillion by the end of 2025, surpassing 140 trillion by March 2026. On the other hand, domestic large models have improved significantly, entering the global top tier in certain rankings and call volumes. With increasing demand and a growing number of models, the real barriers to transactions have become payment, network access, interfaces, compliance, distribution channels, and scenario implementation.

However, token distribution cannot be simply understood as "reselling API quotas." The thinnest layer of profit comes from resale margins, while the thicker portion comes from inference acceleration, unified interfaces, enterprise-level prompt engineering, Agent orchestration, model selection, and integration with business systems. Precisely because the entry barrier is not high, the risks in this market are equally direct: intensified competition, funding requirements for upfront payments, bad debts, and policy changes from upstream model providers can all squeeze the profits of the intermediary layer.

Tokens Now Have "Wholesalers" and "Retailers"

The basic chain of token distribution includes three types of roles.

Upstream are the model providers, including ByteDance's Seedance series, Alibaba's Qwen series, Zhipu's GLM series, Moonshot AI's Kimi series, DeepSeek series, etc. They are the original suppliers of tokens.

In the middle are agency platforms responsible for procuring resources from upstream model providers and distributing them to end-users. Their work is not just about reselling quotas; they also convert the interface protocols of different models into a unified API format, enabling downstream users to access multiple models through a single API Key.

Downstream are the actual consumers of tokens, including individual users, developers, enterprise clients, and possibly lower-tier distributors.

The value of this intermediary layer focuses on several areas: reducing network barriers through domestic direct connections; enabling a single codebase to adapt to multiple models; supporting both personal and corporate payments; potentially obtaining lower costs through bulk procurement; and aggregating models like GPT, Claude, DeepSeek, and Kimi on one platform to reduce the cost of repeated integration for developers.

Thus, token distribution appears to be asset-light, requiring neither the training of large models nor massive server clusters. The core assets become the API routing and scheduling system, upstream model resources, channel clients, and service capabilities.

The Surge in Call Volume is the Most Direct Fuel for This Business

For the token operation model to succeed, there must first be a sufficiently large consumption volume.

China's daily token call volume increased more than a thousandfold in two years, from 100 billion to over 140 trillion tokens. This expansion stems from the deployment of various vertical Agents and the embedding of generative AI into more business processes by enterprises.

IDC data presents an even more aggressive trajectory: the number of active intelligent agents in Chinese enterprises is expected to exceed 350 million by 2031, with a compound annual growth rate (CAGR) exceeding 135%. As the density and complexity of agent tasks increase, the annual growth rate in token consumption by agents is projected to exceed 30-fold.

This change is already visible in execution-oriented agents. The weekly token consumption of OpenClaw on the OpenRouter platform increased from 0.81T between February 2 and March 16, 2026, to 4.97T, with its share rising from 8.31% to 24.36%.

Once tokens become a mass-consumed commodity, their procurement, pricing, routing, and settlement naturally stratify. Model providers may not directly serve every client, and end customers may not be willing to integrate with each model individually, creating space for the intermediary layer.

The Cost-Effectiveness of Domestic Models Opens the Door for Token Export

The improvement in domestic large model capabilities is a key variable enabling token distribution to expand from domestic to cross-border markets.

Data from SuperCLUE shows that domestic models like ByteDance's Doubao and the DeepSeek series have achieved overall scores exceeding 70 points, narrowing the gap with leading overseas models like GPT-5.4 and Gemini. Models like Tongyi Qianwen, Kimi, and Zhipu GLM have also formed a relatively clear tiered structure.

According to OpenRouter data, for the week ending May 10, 2026, Tencent's Hy3 preview (free) topped the call volume list. Among the top 5, top 10, and top 20 models, there were 2, 6, and 9 domestic large models, respectively.

A more significant change occurred in Q1 2026. From February 9 to 15, the call volume of Chinese models on OpenRouter reached 4.12 trillion tokens, surpassing the 2.94 trillion tokens of US models for the first time in the same period. From February 16 to 22, the weekly call volume of Chinese models further increased to 5.16 trillion tokens. Among the top five models on the platform by call volume, four were from Chinese providers: MiniMax M2.5, Kimi K2.5, Zhipu GLM-5, and DeepSeek V3.2, collectively accounting for 85.7% of the total call volume of the top five.

The price advantage is also prominent. The input price for both MiniMax M2.5 and GLM-5 is $0.3 per million tokens, compared to $5 for Claude Opus 4.6. For output, MiniMax M2.5 is $1.1, GLM-5 is $2.55, and Claude Opus 4.6 is $25. The cost-effectiveness of domestic models becomes more pronounced in high-token-consumption scenarios like AI Agents and code development.

Global AI Resource Imbalance Makes Routing Platforms the "Transit Hubs"

Token distribution doesn't just solve price issues; it also addresses resource mismatches.

Leading overseas large models face barriers like regional access restrictions, compliance rules, and payment hurdles, preventing them from directly reaching certain user groups, including developers in mainland China. Similarly, high-quality domestic models expanding overseas encounter challenges in localization, channel development, and user acquisition.

This imbalance fuels the demand for cross-border flow, aggregated routing, and layered distribution.

OpenRouter is already a typical example. The volume of tokens processed on its platform increased from 5-7 trillion per week in 2025 to over 20 trillion per week by April 2026. Its annualized revenue in 2026 exceeded $50 million, a roughly fivefold increase from the over $10 million annualized revenue disclosed in October 2025.

Similar platforms exist domestically. Silicon Flow is a one-stop large model cloud service platform based on its self-developed inference engine for efficient inference acceleration, while also providing enterprise-grade large model services. As of December 2025, the platform had over 9 million registered users, more than 10,000 enterprise users, and over 150 models available.

Even politically connected capital in the US has entered this field. On May 5, 2026, WLFI, a cryptocurrency company closely linked to Trump and his family, partnered with WorldClaw to launch WorldRouter, integrating over 300 models including Claude, GPT, and Gemini. Settled in USD, its pricing is approximately 30% lower than official public rates.

Real Profits May Not Lie in "Resale Margins"

There are three ways to profit from token distribution.

The first is resale margins. Platforms purchase API quotas in bulk from upstream model providers and resell them at a markup to downstream clients. OpenRouter, which adds about a 5.5% premium to supplier costs, exemplifies this model.

The second is technological premium. Platforms use self-developed inference acceleration engines to reduce the cost per token. Even when selling at prices close to or lower than official rates, they can generate gross profit through computational efficiency advantages. Silicon Flow's SiliconLLM and OneDiff technologies improve language model inference speeds by 10 times and text-to-image efficiency by 3 times, reducing the cost of large model API calls to as low as 1/10th of the industry average.

The third is enterprise value-added services. The cost of deploying AI for enterprises isn't just in token unit prices; it also includes prompt engineering, multi-model selection, business system integration, workflow orchestration, operational scheduling, and employee AI skills development. As basic token prices decline, these hidden costs become more likely points for monetization.

Silicon Flow's enterprise-level MaaS platform follows this direction: providing enterprise users with capabilities across three layers—model training and fine-tuning, deployment and inference, and application development support—covering data processing, model fine-tuning, prompt engineering, and RAG, ultimately delivered in the form of standardized APIs to industries like energy, finance, and government.

Marketing, Short Videos, Gaming, and E-commerce Are Scenarios That Consume Tokens More Easily

To be profitable, token distribution must ultimately land in real-world scenarios.

Generative AI applications are entering industries like healthcare, transportation, and industrial manufacturing and are starting to participate in core processes like corporate decision support and strategic management. However, many enterprises have weak foundations for digital transformation, insufficient data asset accumulation, and limited computing power investment, making direct AI deployment challenging.

In contrast, marketing and advertising companies already possess clients and scenarios, especially in short videos, webtoons, gaming, and e-commerce. Their token consumption demand is more direct and sustained. For such companies, the opportunity isn't just about reselling model capabilities but embedding tokens into client workflows for content generation, ad placement, asset production, and video creation.

Investment leads also unfold along two main lines:

One category includes companies with strong model capabilities, such as Alibaba, Tencent Holdings, Kuaishou, Kunlun Tech, Zhipu, MiniMax, etc.

The other category includes companies with strong token consumption scenarios and quality client sources, especially those with overseas client resources and marketing scenarios, and a willingness to actively invest in AI marketing and AI videoization. Examples include EasyClick and BlueFocus.

Risks Are Also Concrete: Low Barriers, Upfront Funding Requirements, Upstream Dependence

The token distribution business model is asset-light, but its moat is not inherently deep.

Peer competition is the first risk. The technical barrier for distribution is relatively low. Once leading distributors with capital, client, and channel advantages enter, they can quickly replicate the model, compressing profit margins.

Upfront funding requirements and bad debts are the second risk. Distributors often offer monthly or quarterly settlements to downstream clients but need to fund the upfront purchase of API quotas from upstream providers. The larger the token consumption scale, the greater the funding pressure. If clients delay payment, bad debt risks amplify simultaneously.

Policy changes by upstream model providers are the third risk. Large model providers control API pricing and access rules and may adjust prices or tighten policies for third-party access. For the intermediary layer, this is the most difficult factor to control.

Related Questions

QWhat is the core change in the commercialization of AI applications as described in the article?

AThe core change is that AI application commercialization is extending from selling software and memberships to selling Token-calling capability. A new middle-layer market for Token distribution is forming, connecting upstream large model vendors with downstream developers, enterprises, and individuals, essentially creating a liquidity infrastructure for the wholesale-to-retail network of global Tokens.

QWhat are the three key roles involved in the Token distribution chain?

AThe three key roles are: 1) Upstream Model Providers (like ByteDance Seedance, Alibaba Qwen, etc.), who are the source of Tokens. 2) Middle-layer Agent Platforms, responsible for distributing resources to end-users and providing unified APIs. 3) Downstream Consumers, including individual users, developers, enterprise clients, and possibly sub-distributors.

QWhat are the three main profit models for Token distribution platforms mentioned in the article?

AThe three main profit models are: 1) Resale Margin: Buying API quotas in bulk from vendors and selling at a markup. 2) Technical Premium: Using proprietary inference acceleration engines to lower per-Token costs and profit from efficiency gains. 3) Enterprise Value-added Services: Offering services like Prompt engineering, multi-model selection, system integration, and workflow orchestration.

QAccording to the article, what is a key factor that has enabled the expansion of Token distribution from domestic to cross-border markets?

AThe key factor is the significant improvement in the capabilities and cost-effectiveness of domestic Chinese large models. Their performance scores have closed the gap with top overseas models, and their lower prices (e.g., $0.3 per million input Tokens for some Chinese models vs. $5 for Claude Opus) create a strong value proposition for high-Token consumption scenarios in the global market.

QWhat are the primary risks associated with the Token distribution business model?

AThe primary risks are: 1) Intensifying Competition due to low entry barriers. 2) Capital Commitment and Bad Debt risk, as distributors must prepay upstream vendors while offering credit terms to downstream clients. 3) Policy Changes by Upstream Model Vendors, who control API pricing and access rules, making this an uncontrollable variable for the middle layer.

Related Reads

Bitcoin's Weak Rebound Fails to Mask Adjustment Trend, HYPE's Top Signal Warns of Short-Term Risks | Invited Analysis

**Title:** Bitcoin's Weak Rebound Fails to Mask Downtrend; HYPE Top Signal Alerts of Short-Term Risks | Exclusive Analysis **Abstract:** This weekly market analysis examines the current technical structures of Bitcoin and HYPE, outlining key trading strategies. Bitcoin's daily chart shows it has broken below the median line of its primary ascending channel, indicating structural weakness. It is currently experiencing a weak rebound within a short-term descending channel, targeting resistance at $75,000-$76,000. Failure to break above this zone could lead to a resumption of the downtrend, testing support at $69,500-$70,500. Trading strategies include positioning for a rebound rejection (Plan A) or a breakdown below key support (Plan B) with controlled short positions. For HYPE, the 4-hour chart reveals a potential seven-wave advance from the May 14 low, now showing signs of exhaustion. A bearish divergence (momentum weakening) has been observed, coupled with a top signal from the proprietary "Spread Trading Model" at potential endpoint 47. The key this week is to monitor if a confirmed top forms here, especially upon a breach of the $62.5-$64.57 support area. If broken, a larger corrective move towards $54-$56.30 is anticipated. The short-term strategy for HYPE focuses on cautious long entries only upon confirmed stabilization within the support zone. The report also details a successful short BTC trade from the previous week, yielding a ~5.07% profit, executed based on model signals and price action. Strict risk management rules, including dynamic stop-loss adjustments, are emphasized.

marsbit8m ago

Bitcoin's Weak Rebound Fails to Mask Adjustment Trend, HYPE's Top Signal Warns of Short-Term Risks | Invited Analysis

marsbit8m ago

Bitcoin's Weak Rebound Fails to Conceal Adjustment Trend; HYPE's Top Signals Warn of Short-Term Risks | Guest Analysis

Bitcoin's Weak Bounce Fails to Mask Correction Trend; HYPE Top Signals Warn of Short-Term Risks | Invited Analysis Core Weekly View: Bitcoin's daily chart structure has weakened. The key question is whether its short-term rebound can effectively break above the upper boundary of the descending channel. Has HYPE's seven-wave advance reached its conclusion? This analysis systematically examines the current market structure across multiple timeframes and outlines operational strategies for the week. **Bitcoin (BTC) Analysis:** The daily chart shows BTC trading within a long-term rising channel (yellow) since February but has recently broken below its midline, indicating structural weakness. It is currently confined within a short-term descending channel (blue) originating from the May 6 high. The ongoing bounce appears to be a weak technical correction targeting the blue channel's upper rail (approx. $75,000-$76,000). The 4-hour chart reveals a complex 10-segment corrective structure from the May high, containing two downward pivot zones (Central D and E). The current rebound (segment 36-37) is expected to face resistance in the $75,000-$76,000 area. A failure to break above could lead to a resumption of the downtrend, testing support at $69,500-$70,500 and potentially $65,000. **BTC Weekly Strategy:** The price is currently below the "Bull-Bear Channel," placing it in a technically weak zone. The core focus is on the test of the $75,000-$76,000 resistance and $69,500-$70,500 support. * *Medium-term*: Consider initiating short positions (up to 30% allocation) if the price rejects the $75,000-$76,000 area. Increase exposure to 60% if the long-term rising channel's lower support fails. * *Short-term (30% allocation)*: Two scenarios are outlined: * *Plan A (Sell on Rally)*: Short on rejection at $75,000-$76,000, with a stop-loss above $77,000. * *Plan B (Breakdown Sell)*: Short on a confirmed breakdown below $69,500-$70,500, with a stop-loss above $72,000. **HYPE Analysis:** The 4-hour chart shows HYPE has completed a seven-wave advance from its May 14 low, including a central consolidation zone. A bearish divergence was noted at the prior high (point 45), leading to a 13% correction. The current rally leg (46-47) shows weakening momentum compared to the initial leg (42-43), suggesting a potential momentum divergence. Furthermore, the proprietary "Spread Trading Model" has triggered a strong top warning signal at point 47. A confirmed top here, combined with the momentum divergence, could signal the end of the current uptrend. **HYPE Weekly Strategy:** The core is observing whether a confirmed top at point 47 coincides with the momentum divergence. * Monitor the key support zone of $62.5-$64.75. A hold and bounce from this area, supported by model buy signals, could allow for a light long position (<30% allocation). * A decisive break below this support would indicate a shift to a larger-degree correction, targeting the $54-$56.3 area. **Trade Review:** A previous short trade on BTC was executed at $77,449 based on model top signals (bearish candlestick pattern, spread model warning, momentum divergence) and closed at $73,519 for a 5.07% profit. **Risk Management Reminder:** Always set an initial stop-loss immediately upon entry. Move the stop-loss to breakeven once a 1% profit is achieved, and trail it upwards to lock in profits as the trade progresses. *Disclaimer: Market conditions change rapidly. All views, models, and strategies are for educational purposes and personal trading logs only, not investment advice. Trading carries significant risk.*

Odaily星球日报14m ago

Bitcoin's Weak Rebound Fails to Conceal Adjustment Trend; HYPE's Top Signals Warn of Short-Term Risks | Guest Analysis

Odaily星球日报14m ago

How to Define "Real U.S. Stocks": Differences Between On-Chain Tokens, Price Contracts, and Direct Broker Connections

**Title:** Defining "Real US Stocks": Differences Among On-Chain Tokens, Price Contracts, and Broker-Direct Access **Summary:** In 2026, using stablecoins to purchase US stocks is mainstream, but products marketed as "buying US stocks with USDT" offer fundamentally different assets. This article analyzes three primary models. **1. Tokenized Stocks:** These are on-chain tokens representing economic exposure to underlying stocks, held by an issuer or custodian. They offer benefits like 24/7 trading and DeFi composability (e.g., use as loan collateral). However, users lack direct legal shareholder status; dividends may not be paid in cash, and voting rights are typically non-binding advisory expressions. Examples include platforms like Ondo Finance. **2. Stock Futures / Equity Perpetuals:** These are derivative contracts tracking a stock's price, allowing leveraged long/short positions 24/7, similar to crypto perpetuals. They offer high efficiency and flexibility but involve funding fees, which can be a significant long-term cost, especially during strong trends. Crucially, they confer no ownership rights (dividends, voting) to the holder. **3. Broker-Direct Model:** This model provides access to real securities via licensed broker-dealers. Stocks/ETFs are bought and held within the US clearing and custodial system (e.g., DTCC), making it the only path to genuine stock ownership. Users receive cash dividends and formal proxy voting rights (where applicable). It supports thousands of stocks and ETFs, far exceeding the coverage of the other two models. Key advantages include no funding fees, a clean cost structure for long-term holds, and the potential to transfer holdings to other brokers. Some platforms facilitate stablecoin (USDT/USDC) deposits, reducing reliance on traditional banking. A critical distinction exists *within* the broker-direct model: the underlying brokerage architecture (e.g., Fully Disclosed IB, Omnibus IB, Self-Clearing) determines how client assets are held, protected, and how safeguards like SIPC insurance are conveyed. Users should verify the specific clearing structure and regulatory compliance of any platform. In conclusion, "buying US stocks with USDT" can mean holding an on-chain economic proxy (Tokenized Stocks), trading a price derivative (Stock Futures), or owning the actual security (Broker-Direct). For users seeking full ownership rights and long-term investment, the broker-direct model is the definitive choice, though its implementation details require careful scrutiny.

marsbit1h ago

How to Define "Real U.S. Stocks": Differences Between On-Chain Tokens, Price Contracts, and Direct Broker Connections

marsbit1h ago

NVIDIA Launches DSX Platform, Expanding into AI Factory Infrastructure

NVIDIA has unveiled the DSX platform at its GTC Taipei event, marking a strategic expansion from GPU sales into comprehensive AI factory infrastructure solutions. The platform addresses challenges like power supply, cooling, and resource orchestration as AI models scale, shifting the industry focus from single-chip performance to overall infrastructure efficiency. DSX integrates NVIDIA's chips, systems, software, and partner technologies to cover the entire AI factory lifecycle—from design and simulation to deployment and operations. It aims to accelerate deployment, improve reliability and operational efficiency, and reduce the cost per generated token in AI inference. The software suite includes DSX MaxLPS, which uses 45°C liquid cooling and rack-level optimization to allow up to 40% more GPUs per megawatt, and DSX OS, an open-source platform for AI factory operations. The platform also encompasses reference designs, digital twin simulation (DSX Sim), dynamic workload adjustment based on grid conditions (DSX Flex), and data exchange between systems. Early adopters include cloud providers like CoreWeave and Lambda. Major hardware partners, including Dell, HPE, Lenovo, and Supermicro, are developing DSX-ready systems. Pilot projects for DSX Flex are underway with energy providers. Strategically, DSX represents NVIDIA's ongoing transition from an AI chip supplier to a full-stack AI infrastructure platform provider, aiming to set industry standards and solidify its market leadership.

marsbit1h ago

NVIDIA Launches DSX Platform, Expanding into AI Factory Infrastructure

marsbit1h ago

After Burning Tens of Billions of Dollars in Tokens, Silicon Valley Giants Start Limiting Employee Token Usage

After burning tens of billions of dollars on AI tokens, major Silicon Valley firms are now restricting employee usage. Companies like Microsoft, Uber, and Salesforce, which heavily promoted AI for "efficiency," are facing a cost crisis. The practice of "tokenmaxxing"—pushing employees to maximize AI tool usage—led to wasteful spending on trivial tasks like checking the weather or writing birthday messages, with studies showing significant hidden costs for bug fixes and code rewrites. The core issue is a misalignment between individual productivity gains and actual business value. While employees use AI to automate tasks they dislike, such as writing reports, this often doesn't translate to increased company revenue or improved core business outcomes. For instance, AI-generated code speeds up development but also sees an 800% increase in "code churn" (code being discarded or rewritten). As a result, only 14% of CFOs report seeing a clear, measurable return on AI investments. Firms are now shifting strategies. Microsoft has revoked most internal licenses for Claude Code, while others are implementing monitoring and cost controls. New tools from companies like Harness and CloudZero aim to track AI spending and tie costs to business results. Some AI vendors, like HubSpot, are moving from token-based pricing to charging based on outcomes, such as "resolved conversations" or "leads generated." This represents a necessary correction in the AI adoption cycle. The challenge now is for companies to move beyond using AI merely to speed up old tasks and instead rethink their workflows and business models fundamentally. The future of enterprise AI depends on proving its value, not just its usage.

marsbit1h ago

After Burning Tens of Billions of Dollars in Tokens, Silicon Valley Giants Start Limiting Employee Token Usage

marsbit1h ago

Trading

Spot
Futures

Hot Articles

Discussions

Welcome to the HTX Community. Here, you can stay informed about the latest platform developments and gain access to professional market insights. Users' opinions on the price of AI (AI) are presented below.

活动图片