Dragonfly Partner Haseeb: Why the Fastest-Growing Companies of the Future May All Stagnate at 149 Employees

ChainCatcher. Published on 2026-06-24. Last updated on 2026-06-24.

Abstract

Dragonfly partner Haseeb analyzes AI service pricing, particularly Anthropic's model, as a form of "tax policy" on AI labor. He observes a stark contrast: small companies (up to roughly 150 users) benefit from low or zero marginal cost via flat-rate subscriptions, incentivizing them to maximize AI usage ("tokenmaxxing"). In contrast, large enterprises pay per-token API prices that carry high gross margins (an estimated 75%), creating a disincentive for experimental or marginal automation. This pricing cliff at 150 users acts like a regulatory notch, potentially causing the fastest-growing AI-native companies to artificially cap their headcount to retain subsidized rates. Consequently, large-scale direct AI-for-human substitution within big corporations may not happen as expected; instead, job displacement may occur indirectly as lean, AI-heavy startups outcompete them and erode their market share. The article concludes that token pricing, though not designed as such, functions as a powerful de facto tax policy shaping company structure and automation incentives.

Author: Haseeb

Compiled by: Jiahuan, ChainCatcher

@SemiAnalysis_ recently discovered an incredible phenomenon in the economics of AI coding subscriptions. If you max out your usage, your effective price per token is 20 to 70 times lower than what you would pay for the same tokens via the API.

Many people see this and say: My God, look at how heavily these LLM companies are subsidizing tokens; the bubble must be about to burst.

This reaction is wrong. The reason LLM companies are willing to offer such generous plans is naturally because most users rarely hit the cap. This product is like a gym membership: the allowance is generous because the vast majority of people hardly use it.

But I spent a long time pondering this; there's something genuinely odd here.

We don't know their actual blended profit margin on subscriptions, but according to SemiAnalysis's estimates, Anthropic's Max 5x plan barely breaks even at an average utilization rate of 20%. A 20% utilization rate might even be on the high side, especially in organizations where everyone (including non-programmers) has a subscription but only uses it occasionally. Most institutions I know, including Dragonfly, hand out Claude Code subscriptions generously and encourage non-programming staff to try them.
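The break-even estimate can be turned into a rough per-user margin curve. The sketch below assumes that serving cost scales linearly with utilization, which is a simplifying assumption made here and not something the article or SemiAnalysis states; the function name and the sample utilization levels are illustrative only.

```python
def subscription_margin(utilization: float,
                        breakeven_utilization: float = 0.20) -> float:
    """Provider margin on one subscriber, as a fraction of the subscription fee.

    Assumption (hypothetical): serving cost is proportional to utilization,
    so cost equals the fee exactly at the break-even utilization.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if breakeven_utilization <= 0.0:
        raise ValueError("breakeven_utilization must be positive")
    cost_as_share_of_fee = utilization / breakeven_utilization
    return 1.0 - cost_as_share_of_fee


# Illustrative utilization levels: a light user, the break-even user, a maxed-out user.
for u in (0.05, 0.20, 1.00):
    print(f"utilization {u:.0%}: margin {subscription_margin(u):+.0%} of the fee")
```

Under that linear assumption, a subscriber who maxes out the plan costs about five times the fee to serve, while a subscriber at 5% utilization is highly profitable. That is the gym-membership logic in numbers.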

But what SemiAnalysis didn't delve into is that this is entirely a phenomenon for small businesses. Large enterprises cannot access this subscription pricing.

Here's why: once you exceed 150 users, you are forced off the subscription plan called "Team." You must switch to "Enterprise," which is priced at a base of $20/seat plus API fees for the tokens you actually use. Enterprise customers can only pay linearly per token, and SemiAnalysis estimates the gross margin on API tokens to be around 75%. This is a massive price hike that kicks in abruptly at 150 users.
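A minimal sketch of the cliff described above, written as a billing function. Only the 150-seat limit and the $20/seat Enterprise base come from the article; the Team seat price, the API price per million tokens, and the per-seat token volume are hypothetical placeholders chosen to make the jump visible.

```python
def monthly_bill(seats: int,
                 tokens_per_seat_m: float,
                 *,
                 team_max_seats: int = 150,         # from the article
                 enterprise_base: float = 20.0,     # $/seat, from the article
                 team_seat_price: float = 150.0,    # hypothetical flat $/seat
                 api_price_per_m: float = 10.0) -> float:  # hypothetical $/1M tokens
    """Total monthly bill in dollars for a company of `seats` users.

    Up to `team_max_seats`, the bill is flat per seat and ignores token volume.
    Above it, every seat pays the base fee plus metered API usage.
    """
    if seats <= team_max_seats:
        return seats * team_seat_price
    return seats * (enterprise_base + tokens_per_seat_m * api_price_per_m)


# Same hypothetical usage (100M tokens per seat per month) on both sides of the cliff.
for seats in (149, 151):
    bill = monthly_bill(seats, tokens_per_seat_m=100.0)
    print(f"{seats} seats: ${bill:,.0f} per month")
```

With these placeholder numbers, adding two seats multiplies the bill several times over. The exact size of the jump depends entirely on the assumed prices and usage, but the discontinuity itself follows from the plan structure.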

So, if you're a small business or startup (or an individual user), your perception of AI spending is distorted. Your token pricing is actually heavily subsidized; Anthropic likely operates on very low or even negative margins on you.

You might wonder why Microsoft and Uber are making a fuss about token spending and talking about "token-mining." This is the reason. They pay a structurally higher cost per token than startups and individuals.

But Anthropic doesn't care! For a B2B company, extracting maximum value from small companies or individuals isn't very meaningful. Look at companies like Datadog or Cloudflare: 80% to 90% of their revenue comes from large contracts (Annual Recurring Revenue > $100k). Making zero profit on the long tail is just a customer acquisition cost.

This is classic B2B sales thinking.

But there's another way to view the same situation: through the lens of tax policy.

Because if tokens are replacing labor, then the gross margin that OpenAI and Anthropic charge on tokens is effectively a tax on AI labor.

Viewing token pricing this way leads to two main consequences.

Token Pricing as Tax Policy

Assuming the profit margins from SemiAnalysis's article hold: subscriptions break even, large enterprise API gross margin is 75%. The immediate reaction is to call this a 75% tax on AI labor for large organizations and a 0% tax for startups.
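One caveat on the arithmetic: a gross margin is measured as a share of the price, not of the cost. The short sketch below converts a gross margin into the equivalent markup over cost, using the standard definition margin = (price - cost) / price. The function names are illustrative, and the 75% figure is the SemiAnalysis estimate quoted above.

```python
def price_multiple(gross_margin: float) -> float:
    """Price as a multiple of cost, from margin = (price - cost) / price."""
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return 1.0 / (1.0 - gross_margin)


def markup_over_cost(gross_margin: float) -> float:
    """Markup as a fraction of cost: (price - cost) / cost."""
    return price_multiple(gross_margin) - 1.0


margin = 0.75  # SemiAnalysis estimate for API tokens, as quoted in the article
print(f"price is {price_multiple(margin):.1f}x cost")
print(f"markup over cost is {markup_over_cost(margin):.0%}")
```

So a 75% "tax" in the article's sense is tax-inclusive: 75 cents of every API dollar is margin. Expressed the way a sales tax usually is, as a rate on top of cost, the same margin corresponds to a 300% markup.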

Standard tax analysis would say this discourages large companies from using AI labor internally, incentivizing them, at the margin, to automate less and retain more human labor. (Obviously, it also encourages switching to smaller or open-source models; the net effect is to push in both directions. Remember, we're talking about behavior at the margin here.)

However, what drives behavior more strongly is not the average tax rate. In tax policy, it never is. What we really care about is the marginal tax rate.

For startups on flat-rate subscriptions, the marginal price of the next token is zero until the cap is hit. And a zero marginal price is the maximum distortion a policy can create.

For startups, the subscription model is essentially an innovation subsidy. The most overwhelming incentive is to figure out how to spend the entire token budget as efficiently as possible. This means running Ralph loops, keeping screens full of Claude Code sessions, and scheduling swarms of agents to work together.

Until the cap is reached, exploration is free. So startups are essentially racing to squeeze every last drop of value from the subscription, overwhelming competitors with output. Paradoxically, the more you use, the lower the average token price gets. Every startup wants to be the one that costs Anthropic the most on the subscription.
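The difference between marginal and average price under a flat-rate plan can be sketched in a few lines. The fee and cap below are hypothetical placeholders, and treating the marginal price at the cap as infinite is a modeling assumption (no further tokens can be bought on the plan itself).

```python
import math


def marginal_price(tokens_used_m: float, cap_m: float) -> float:
    """Price of the next token on a flat-rate plan.

    Zero below the cap. At the cap it is modeled as infinite, on the
    assumption that the plan itself sells no additional tokens.
    """
    return 0.0 if tokens_used_m < cap_m else math.inf


def average_price(flat_fee: float, tokens_used_m: float) -> float:
    """Effective dollars per million tokens actually consumed."""
    if tokens_used_m <= 0.0:
        raise ValueError("tokens_used_m must be positive")
    return flat_fee / tokens_used_m


FLAT_FEE = 100.0  # hypothetical monthly fee, in dollars
CAP_M = 500.0     # hypothetical monthly cap, in millions of tokens

for used in (5.0, 50.0, 499.0):
    print(f"{used:>5.0f}M tokens used: "
          f"marginal ${marginal_price(used, CAP_M):.2f}, "
          f"average ${average_price(FLAT_FEE, used):.2f} per 1M")
```

The marginal price stays at zero all the way up to the cap while the average price keeps falling, which is why the rational strategy for a subscriber is to run as close to the cap as possible.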

Large enterprises face the opposite incentive. Once you exceed 150 seats, every exploratory token is billed at the full API price (with that 75% gross margin baked in!). So they pay a penalty that grows linearly with every step they take along the exploratory frontier.

Large firms will still automate the obvious high-volume tasks, but the marginal, experimental, risky automation is never discovered because the cost of discovery is too high. This tax structure ultimately encourages them to retain more human labor and maintain their original organizational structure.

This is the exact opposite of Japan. Due to a declining population, Japan faces a huge labor shortage. Historically, this meant Japan pursued high automation because expensive human labor incentivized it. That's why Japan has robots in restaurants, factories, hotels, and hospitals.

But curiously, large enterprises find themselves in the opposite dilemma: if they have to pay a high tax on using AI, it weakens the incentive to automate and strengthens the motive to retain existing employees (which is even more pronounced if wages stagnate during this period).

So where does the labor substitution flow in this model?

Everyone is watching large companies, waiting for the wave of AI layoffs. But with a 75% tax, aggressively replacing your own employees with AI might simply not be cost-effective; the token budget would explode.

But that doesn't mean substitution won't happen; it just manifests differently.

When large enterprises lose market share to AI-native startups with extremely low all-in labor costs, the large firms' declining revenue and stock prices trigger layoffs. But those eliminated jobs never reappear at the winning startups. The net job loss is the same; the displacement is simply shifted to a lower-taxed part of the economy.

This is also why "AI-washing," where a company attributes layoffs to newfound AI efficiency when it is really just masking ordinary business weakness, might not be a temporary phenomenon.

Many think this is just a fad in the current AI hype cycle. But even though everyone is ready to witness large enterprises conducting real AI layoffs, replacing positions with AI, this might never happen on a large scale.

Labor substitution may unfold differently: startups defeat large companies, large companies disguise decline under the banner of AI until they fail, and startups never rebuild those old jobs. Job substitution still occurs, just not where everyone is looking.

This is the first consequence of this model. But there's a second, even more bizarre consequence.

The 150-Person Cliff

A regulatory notch is a regulatory boundary that incentivizes a huge behavioral jump. For example: the 30-hour weekly threshold for full-time employment has spawned a lot of jobs that are exactly 29 hours per week.

It's well known that France has extremely strict labor regulations that kick in once a company reaches 50 employees (works councils, mandatory profit-sharing, firing protections), while smaller companies are exempt. This gives employers a huge incentive to keep their headcount below 50 at almost any cost.

From: Garicano, Luis, Claire Lelarge, and John Van Reenen, 2016, "Firm Size Distortions and the Productivity Distribution: Evidence from France."

Extend this analogy to AI. LLM companies have created a tax threshold that punishes companies exceeding 150 seats. This means you must stay small to keep that wonderful subsidized subscription price, where tokens are taxed at roughly 0% (or even negative), not 75%.

This could give rise to an entirely new philosophy of corporate management. Startups will become obsessed with using agents for everything, having smaller teams, frequent layoffs, more outsourcing, exhausting all means to minimize the parts that require humans.

Not because it's the "optimal" level of automation, but because the incentives push them there. If the magic number is 149, every seat is crucial; you can't waste a single person outside the company's core functions.

This discontinuity might be hailed by Harvard Business School types as "the new generation of AI-first management." But understood correctly, it's simply a rational response to enterprise pricing plans.

This might sound exaggerated. But you can already see the behavioral divergence between organizations. Talk to developers at large enterprises; they are meticulously counting tokens, growing anxious as leadership cuts token budgets. Meanwhile, developers at startups are aggressively tokenmaxxing, launching swarms of agents overnight to check the logs in the morning. I expect this trend to accelerate.

No one designed this intentionally. No committee decided to subsidize innovation for startups and tax incumbents. It flows directly from those tried-and-true enterprise pricing strategies.

But tax law has always been this way: a pile of incidental rules that ultimately determine which companies get built and how those companies contort themselves to minimize their tax burden.

You might argue this is temporary, that LLM companies will eventually meter everyone. GitHub Copilot has already made this transition. Maybe, maybe not. But before pricing normalizes, the 149-person company and this new AI-first management style might have already exploded, swallowed large market shares, and written the playbook for the next generation of startups.

Tax policy matters immensely. The entire "gig economy" concept exists precisely because of the legal distinction between W-2 (employee) and 1099 (independent contractor). As more and more labor is cannibalized by AI, token pricing could become the most impactful tax policy of the next decade. Yet, no one will ever vote on it.

(Don't be surprised if the fastest-growing companies in the next cycle are all conspicuously stuck at 149 seats.)
