Editor's Note: As AI begins to write code, handle customer service tickets, and review legal documents, a more fundamental question is emerging: what are businesses truly purchasing—tokens, GPU hours, or completed work?
This article proposes a noteworthy framework: the commercialization of AI should not merely be understood as a "computing power market" or a "model invocation market," but is moving toward a new "machine labor market." In this market, tokens are merely a unit of measurement, GPUs are input goods, models are production tools, and the real objects being priced and traded are economic tasks directly completed by software.
The article's core thesis is that AI pricing mechanisms will evolve from raw tokens and standardized model capabilities to industry-specific labor, and finally to a programmable outcomes market. This means that in the future, enterprises might no longer care which specific model or GPU type completes a task, but rather whether it delivers a result meeting defined standards within specified latency, accuracy, reliability, and cost constraints.
This also implies that AI's impact on the human labor market may not be simple replacement. As machines take on more standardized, verifiable work, the human role may shift towards review, accountability, context management, and final judgment. In some scenarios, the final 1% of human judgment could become even more valuable, as it unlocks large-scale automation of the other 99%.
From this perspective, the next stage of competition in the AI market may no longer be just about model capability, nor a simple price war over computing power, but about who can first standardize, verify, and price "work," and ultimately turn machine labor into a new type of factor of production that can be procured, settled, and traded.
Original article follows:
Waves of productivity have historically come from providing humans with tools and software to optimize how work gets done. Spreadsheets aided accountants and analysts, conveyor belts increased throughput, hammers amplified human leverage. But the actual labor always came from humans.
Now, AI is producing work outcomes end-to-end, directly executing the labor itself. It can write code, handle customer service tickets, review legal documents. A compression is happening at the far end of the tech stack: the old stack supported labor, the new stack is starting to produce it.
If you've listened to recent discussions on AI financialization, you've likely heard Jensen Huang and others say that LLM tokens, GPU hours, or both are becoming the new commodities. The intuition is understandable: tokens are measurable, billable, and easy to chart, and billions of dollars are flowing into GPU hours. But tokens are merely a meter and GPU hours are just an input; no one buys them for their own sake. What people actually want is to get work done. AI is turning the tech stack itself into a source of labor.
Machine Labor: Work performed by software, for an economic purpose, and sold into the production process.
The market is already moving in this direction. Benchmark's Sarah Tavel prefers to understand this opportunity through the lens of outsourcing labor markets, not software categories. If a repeatable task is already performed by a dedicated offshore team or professional services firm, it's often also a good candidate for AI delivery. a16z's Alex Rampell calls this "software eating labor": software's next act is to do the work itself. Sequoia's Julien Bek describes the same shift from another angle: services are turning into software, copilots sell tools, while autopilots sell the work.
The Missing Market Behind Outcomes-Based Pricing
Seat pricing charges for access, token pricing charges for usage. Outcomes-based pricing charges when work is completed. Outcomes pricing moves us a step forward, but it still doesn't answer one question: who decides the price?
If machine labor can be bought directly, price should come from competition among suppliers. Those suppliers must be able to meet the same task definitions and completion standards, which requires standardization within each industry and task type.
The current approach uses LLM tokens, but raw tokens are just the lowest layer. A barrel of oil is just a unit of measurement; what's actually traded is a barrel of a specific grade of oil, with defined quality, delivery terms, and market price. A barrel of Brent crude and a barrel of high-sulfur heavy crude are not the same commodity. It's the same with LLM tokens. Tokens are just the unit; what matters is the intelligence behind them: model quality, benchmark floors, latency, context window, reliability, and delivery guarantees. One million tokens from a frontier coding model are not the same commodity as one million tokens from a cheap general-purpose model. The market needs standardized inference grades, just as the energy market needs standardized oil grades.
Anjali Shriva points this out directly: a token is not a fixed cost unit. Its economics vary with context length, task structure, input/output ratio, retry counts, tool calls, and agent workflows. A token in a short prompt and a token buried in a long agent loop are not the same economic object.
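To make the point concrete, here is a minimal sketch of how the same rate card can yield very different costs per task. The TaskProfile fields, the rate card ($3 per million input tokens, $15 per million output tokens), and the token counts are all hypothetical values chosen for illustration; none of them come from the article or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Token footprint of one unit of work (all fields are illustrative)."""
    input_tokens_per_call: int
    output_tokens_per_call: int
    calls_per_attempt: int    # 1 for a single prompt, many for an agent loop
    expected_attempts: float  # average attempts per completed task, retries included


def cost_per_task(p: TaskProfile, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Expected USD cost of completing one task at the given rate card."""
    per_call = (
        p.input_tokens_per_call * usd_per_m_input
        + p.output_tokens_per_call * usd_per_m_output
    ) / 1_000_000
    return per_call * p.calls_per_attempt * p.expected_attempts


def blended_usd_per_m_tokens(p: TaskProfile, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Average price actually paid per million tokens, input and output combined."""
    tokens = (
        (p.input_tokens_per_call + p.output_tokens_per_call)
        * p.calls_per_attempt
        * p.expected_attempts
    )
    return cost_per_task(p, usd_per_m_input, usd_per_m_output) / tokens * 1_000_000


# Hypothetical rate card and workloads (assumptions, not real prices).
INPUT_RATE, OUTPUT_RATE = 3.0, 15.0
short_prompt = TaskProfile(500, 200, calls_per_attempt=1, expected_attempts=1.0)
agent_loop = TaskProfile(20_000, 800, calls_per_attempt=12, expected_attempts=1.5)

for name, profile in [("short prompt", short_prompt), ("agent loop", agent_loop)]:
    task_cost = cost_per_task(profile, INPUT_RATE, OUTPUT_RATE)
    blended = blended_usd_per_m_tokens(profile, INPUT_RATE, OUTPUT_RATE)
    print(f"{name}: ${task_cost:.4f} per task, ${blended:.2f} per million tokens")
```

Under these assumed numbers the short prompt costs about $0.0045 per task and the agent loop about $1.30, while the blended price per million tokens also differs (roughly $6.43 versus $3.46) because the input/output mix changes. The rate card is identical in both cases; only the task structure moved.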
We already do this in human labor markets. No one hires a radiologist as a generic "human hour." They look at training background, certifications, specialty, years of experience, availability, reputation, liability, etc. Different human contract specs correspond to different minimum standards and grade expectations.
Human labor markets already run on these specs; it's just that these specs are often muddled, qualitative, and full of proxy signals. Machine labor will make these specs more explicit and quantifiable.
For an LLM or an agent, metrics like skill, experience, speed, and reliability can be written directly into a contract: benchmark scores, latency, throughput, context window, max output length, tool-use accuracy, uptime, error rates. We can procure labor based on quantifiable expectations and outcomes.
TheGrid.ai's contract spec is essentially a qualification filter, plus price competition for LLM outputs. Any supplier meeting the spec can enter the competition:
Intelligence Benchmark ≥ Floor
Latency ≤ Ceiling
Throughput ≥ Floor
Uptime ≥ Floor
Error Rate ≤ Ceiling
Once suppliers all meet the same minimum bar, they compete on price. The buyer asks: which supplier can deliver the required labor at the best price?
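The two-step logic above (qualify against the spec, then compete on price) can be sketched in a few lines. This is an illustrative sketch only, not TheGrid.ai's actual implementation: the ContractSpec and Offer types, the field names, the units, and the sample suppliers are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ContractSpec:
    """Minimum bar a supplier must clear (field names and units are hypothetical)."""
    min_benchmark: float
    max_latency_ms: float
    min_throughput_tps: float
    min_uptime: float
    max_error_rate: float


@dataclass(frozen=True)
class Offer:
    """A supplier's measured performance plus its asking price."""
    supplier: str
    benchmark: float
    latency_ms: float
    throughput_tps: float
    uptime: float
    error_rate: float
    usd_per_m_tokens: float


def qualifies(offer: Offer, spec: ContractSpec) -> bool:
    """Qualification filter: every floor and ceiling in the spec must hold."""
    return (
        offer.benchmark >= spec.min_benchmark
        and offer.latency_ms <= spec.max_latency_ms
        and offer.throughput_tps >= spec.min_throughput_tps
        and offer.uptime >= spec.min_uptime
        and offer.error_rate <= spec.max_error_rate
    )


def cheapest_qualified(offers: Sequence[Offer], spec: ContractSpec) -> Optional[Offer]:
    """Price competition among suppliers that clear the bar; None if nobody does."""
    qualified = [o for o in offers if qualifies(o, spec)]
    return min(qualified, key=lambda o: o.usd_per_m_tokens, default=None)


# Hypothetical spec and offers, for illustration only.
spec = ContractSpec(min_benchmark=70, max_latency_ms=500, min_throughput_tps=50,
                    min_uptime=0.995, max_error_rate=0.01)
offers = [
    Offer("supplier_a", 72, 400, 90, 0.999, 0.005, usd_per_m_tokens=2.10),
    Offer("supplier_b", 80, 350, 120, 0.999, 0.002, usd_per_m_tokens=2.80),
    Offer("supplier_c", 60, 200, 150, 0.999, 0.004, usd_per_m_tokens=0.90),  # below the benchmark floor
]

winner = cheapest_qualified(offers, spec)
print(winner.supplier if winner else "no qualified supplier")
```

Note that supplier_c is the cheapest offer but never enters the price competition, because it misses the benchmark floor. Exceeding the spec earns supplier_b nothing either: once the bar is cleared, only price matters.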
Hiring a radiologist, in the LLM context, becomes a measurable question: which LLMs can read X-rays with high proficiency, meeting defined latency, context window, and other outcome-based contract specs.
Outcome is how the buyer measures success; Labor is the economic activity being supplied; Token is the fuel the machine consumes while doing the work.
The Grid is the machine labor market.
From Tokens to Machine Labor Market
The market can already price the inputs of the tech stack, but pricing outputs requires a machine labor market. Buyers don't care about GPU hours. Model endpoints themselves are unstable: they get renamed, deprecated, wrapped, or simply retired.
Users and liquidity hate frequent changes. GPUs and models will keep evolving, but the stable unit is the work itself.
I believe the market will evolve along the following path. Each step up the ladder, what is being purchased becomes more abstract, more valuable, but also harder to verify. The Grid should climb this ladder progressively:
Raw Tokens → Commoditized LLM Capability Market → Commoditized Labor Market → Programmable Outcomes Market
Stage 1: Raw Tokens
Claude 4.7, GPT 5.5, Kimi 2.6, DeepSeek V4, GLM 5, etc.
Today, buyers purchase raw model output from inference providers. They send their prompts, receive inference results, and pay per use. This is easy to verify, but it's still just raw material. Buyers don't actually want tokens; they want useful intelligence at the best price.
Stage 2: Commoditized LLM Capability Market
E.g., text/usd, code/usd, agent/usd, etc.
The buyer no longer chooses a specific model, but the category of intelligence they need. The buyer still owns the workflow, prompts, data, and application logic. The Grid just routes each request to the qualified model that meets the contract spec and offers the lowest price.
Note: This is the first real abstraction above raw tokens, and where TheGrid.ai currently sits.
Stage 3: Commoditized Labor Market
E.g., accounting/usd, support_agent/usd, legal/usd, healthcare/usd, radiology/usd, etc.
As models become more specialized, the capability market can evolve further into industry-specific markets. This is analogous to specialization in human labor markets.
At this layer, we're selling inference suited for workflows in specific labor verticals. This category will expand rapidly as niche industry models become common. Examples include Cursor's Composer, Harvey for legal work, and OpenEvidence for healthcare.
Stage 4: Programmable RFQ and Outcomes Market for Agents
E.g., support_ticket_resolved/usd, pr_merged/usd, claim_processed/usd, etc.
The final layer is where The Grid moves from an inference market to a machine labor market.
This layer requires RFQs (Request for Quote), escrow, delayed settlement, buyer confirmation, supplier reputation, clawback, dispute resolution, etc. It will likely start with RFQs rather than order books. Buyers define the job, constraints, acceptance criteria, and settlement terms; agents bid to complete it. The Grid helps route, price, verify, and settle these jobs.
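As a rough sketch of how such a job might flow, the snippet below models an RFQ that is awarded to the cheapest eligible bid, funded into escrow, and then either settled or refunded depending on the buyer's acceptance check. Everything here is hypothetical: the RFQ, Bid, and Status types, the reputation score, and the string-matching acceptance rule are stand-ins for illustration, and the sketch leaves out delayed settlement, clawback, and dispute resolution entirely.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence


class Status(Enum):
    OPEN = "open"
    ESCROWED = "escrowed"
    SETTLED = "settled"
    REFUNDED = "refunded"


@dataclass(frozen=True)
class Bid:
    agent: str
    price_usd: float
    reputation: float  # hypothetical score in [0, 1]


@dataclass
class RFQ:
    job: str
    max_price_usd: float
    min_reputation: float
    accept: Callable[[str], bool]  # buyer's acceptance criteria, applied to the delivered output
    status: Status = Status.OPEN
    winner: Optional[Bid] = None
    escrow_usd: float = 0.0


def award(rfq: RFQ, bids: Sequence[Bid]) -> Optional[Bid]:
    """Pick the cheapest bid within budget and above the reputation floor, then fund escrow."""
    eligible = [
        b for b in bids
        if b.price_usd <= rfq.max_price_usd and b.reputation >= rfq.min_reputation
    ]
    if not eligible:
        return None
    rfq.winner = min(eligible, key=lambda b: b.price_usd)
    rfq.escrow_usd = rfq.winner.price_usd
    rfq.status = Status.ESCROWED
    return rfq.winner


def settle(rfq: RFQ, delivered_output: str) -> float:
    """Release escrow to the supplier if the output is accepted, else refund the buyer.

    Returns the amount paid to the supplier.
    """
    if rfq.status is not Status.ESCROWED:
        raise ValueError("RFQ has no funds in escrow")
    accepted = rfq.accept(delivered_output)
    paid = rfq.escrow_usd if accepted else 0.0
    rfq.status = Status.SETTLED if accepted else Status.REFUNDED
    rfq.escrow_usd = 0.0
    return paid


# Hypothetical job and bids, for illustration only.
rfq = RFQ(
    job="resolve support ticket",
    max_price_usd=2.00,
    min_reputation=0.8,
    accept=lambda output: "resolved" in output,
)
bids = [
    Bid("agent_a", price_usd=1.50, reputation=0.90),
    Bid("agent_b", price_usd=0.90, reputation=0.60),  # cheapest, but below the reputation floor
    Bid("agent_c", price_usd=2.50, reputation=0.95),  # over budget
]

awarded = award(rfq, bids)
payout = settle(rfq, "ticket resolved, customer confirmed")
print(awarded.agent if awarded else None, payout, rfq.status.value)
```

A real acceptance check would be far harder than a substring match, which is exactly the verification problem this layer has to solve before it can be priced.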
This is the most valuable layer, but also the hardest to verify, as outcomes can be delayed, subjective, and easily gamed. A customer service ticket might be reopened; a pull request might pass tests but still introduce poor architecture.
Total Price = Cost of doing the work + Cost of bearing the risk
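One simple way to read this formula is to treat the cost of bearing the risk as an expected loss: the probability that the outcome fails (a ticket reopened, a merge reverted) times what that failure costs the supplier in clawbacks or rework. That expected-loss model, the function names, and the numbers below are assumptions for illustration; the article does not specify how risk should be priced.

```python
def risk_cost(p_failure: float, loss_if_failure_usd: float) -> float:
    """Expected loss per job: failure probability times the loss when it fails."""
    return p_failure * loss_if_failure_usd


def total_price(work_cost_usd: float, p_failure: float, loss_if_failure_usd: float) -> float:
    """Total Price = cost of doing the work + cost of bearing the risk."""
    return work_cost_usd + risk_cost(p_failure, loss_if_failure_usd)


# Hypothetical suppliers quoting the same job with a $5.00 loss per failed outcome.
careful = total_price(work_cost_usd=0.40, p_failure=0.02, loss_if_failure_usd=5.00)
cheap = total_price(work_cost_usd=0.25, p_failure=0.10, loss_if_failure_usd=5.00)

print(f"careful supplier: ${careful:.2f}, cheap supplier: ${cheap:.2f}")
```

Under these assumed numbers the supplier with the lower cost of work ends up with the higher total price ($0.75 versus $0.50), because its failure rate makes the risk term dominate.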
A workflow does not automatically become a market just because intelligence has a market or intelligence gets cheaper. Some work is deeply dependent on private context, like customer history or internal policies. The more context-dependent the work, the less likely it is to trade cleanly in a liquid open market. [@hypersoren https://hypersoren.xyz/posts/cybernetic-arbitrage/]
The market needs to reveal which labor categories will expand and which will contract.
"Machine Labor vs. Human Labor" or "Machine Labor & Human Labor"
Anjali Shriva notes in her mechanism design draft that the AI narrative is too often described as replacement. But in reality, it's more of a coordination problem: how work, attribution, incentives, and value get reorganized when both humans and machines participate in production.
Today, much AI usage inside companies remains stuck: employees use AI privately, workflows stay locked to individuals, and the firm can neither price these productivity gains nor scale them.
Most automatable work will likely shift to machines. Some work will turn into human review, accountability, training, and context management. In some cases, the final 1% of human judgment will become more valuable, as it unlocks the other 99% of automation at scale.
Rachel Su Park's "Brave New World of AI Markets" points out that AI's TAM should not be simply modeled as a replacement for existing human labor spend, because it changes both price and quantity. As the cost of work falls, unit price may decline, but quantity consumed may expand because existing work gets consumed more often, and entirely new work that wasn't economical before becomes possible. The article summarizes it as:
P × Q: Market Size = Price per unit of work × Quantity of work consumed
If AI makes a customer service interaction cheaper, companies can afford to offer 24/7 coverage. The market won't just be a cheaper version of the old customer service labor market; it might become a much larger market for customer interactions.
AI is an expansionary market because demand does not stay constant when the cost of work falls.
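Whether P × Q grows or shrinks when the price falls depends on how strongly quantity responds. The sketch below uses a constant-elasticity demand curve, which is a modeling assumption added here for illustration and not something the cited article specifies; the baseline price, baseline volume, and elasticity values are likewise hypothetical.

```python
def quantity_demanded(price: float, base_price: float, base_quantity: float,
                      elasticity: float) -> float:
    """Constant-elasticity demand: Q = Q0 * (P / P0) ** (-elasticity)."""
    return base_quantity * (price / base_price) ** (-elasticity)


def market_size(price: float, base_price: float, base_quantity: float,
                elasticity: float) -> float:
    """Market Size = Price per unit of work * Quantity of work consumed."""
    return price * quantity_demanded(price, base_price, base_quantity, elasticity)


# Hypothetical baseline: 1,000,000 customer interactions at $5.00 each.
BASE_PRICE, BASE_QUANTITY = 5.00, 1_000_000
baseline = BASE_PRICE * BASE_QUANTITY

# AI cuts the price per interaction to $0.50 (a 10x reduction).
expanding = market_size(0.50, BASE_PRICE, BASE_QUANTITY, elasticity=1.5)
contracting = market_size(0.50, BASE_PRICE, BASE_QUANTITY, elasticity=0.5)

print(f"baseline: ${baseline:,.0f}")
print(f"elasticity 1.5: ${expanding:,.0f}")
print(f"elasticity 0.5: ${contracting:,.0f}")
```

With these assumed inputs, a tenfold price cut grows the market to roughly $15.8 million when elasticity is 1.5 and shrinks it to roughly $1.6 million when elasticity is 0.5. The expansionary claim therefore rests on demand for work being elastic, which is what the 24/7 coverage example suggests for customer interactions.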
The Labor Layer
The machine labor market should start with work that can be crisply defined in specs. GPU hours sit too far upstream: they only tell you what powered the work, not what was produced. Full outcomes are still too complex and too context-dependent to price directly. As verification, reputation, and risk/insurance pricing get handled by machines, the market will move further into the pure outcomes layer.
Machine labor can become tradable because buyers will increasingly not care which model or which GPU produced the work, but rather whether the work itself meets the minimum standards and grade from the contract spec, at the right price. Agents will care even less about the underlying source.
Machines can now directly execute work for an economic purpose, and that work can be defined, measured, priced, procured, and eventually traded. Electricity, compute, models, and tokens are still important, of course, but they're all still upstream.
Downstream is where the work actually gets done, and the market is moving toward a simpler object: machine labor.








