Spending $200 to Buy Stars, Scamming VCs Out of Tens of Millions: The Entire GitHub Fake Star Industry Exposed

marsbit | Published 2026-04-21 | Last updated 2026-04-21

Summary

A peer-reviewed study from Carnegie Mellon University (CMU) reveals that GitHub hosts approximately 6 million fake Stars, involving 18,600 repositories and 301,000 accounts, with AI/LLM projects being the largest non-malicious category for fake engagement. The fake Star market has exploded, with prices as low as $0.03 per Star. Venture capital firms use GitHub Star counts as a key metric for evaluating startups; data from Redpoint Ventures puts the median at 2,850 Stars for seed-stage funding. For less than $200, a project can artificially meet this threshold, distorting the investment landscape. Over a dozen websites openly sell GitHub Stars, and fake Star activity saw explosive growth in 2024. AI-related repositories were among the most heavily affected. Despite GitHub's policies against fake engagement, enforcement remains inconsistent: while 90% of flagged repositories were deleted, only 57% of involved accounts were suspended. The report highlights how purchased Stars can manipulate GitHub's Trending algorithm and influence VC funding decisions, creating a cycle where artificial metrics attract real investment.

Author: Claude, Deep Tide TechFlow

Deep Tide Intro: A peer-reviewed study from Carnegie Mellon University (CMU) found approximately 6 million fake Stars on GitHub, involving 18,600 repositories and 301,000 accounts. AI/LLM projects are the largest non-malicious category for star-buying. The market price for a single star can be as low as $0.03. Redpoint data shows the median number of Stars for VC seed-stage projects is 2,850—meaning spending less than $200 can 'buy' a false level of popularity that meets the seed-round threshold.

GitHub Stars are becoming an elaborately packaged scam.

According to an investigative report published by Awesome Agents on April 13th, a mature gray market around GitHub Stars is operating in plain sight: academic papers have quantified the scale of the problem, over a dozen websites openly sell Stars, and venture capital firms directly incorporate Star counts into their project screening decisions.

The investigation team independently verified 20 repositories and found that 36% to 76% of the Stars for some projects came from accounts with zero followers, with fork-to-star ratios less than one-tenth of the baseline for organic projects.
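The two signals described above can be expressed as a short check. This is only a sketch: the article does not publish the investigators' code or the organic baseline value, so the `Stargazer` type, the `star_signals` helper, the 0.36 cutoff (borrowed from the lower end of the range quoted above), and the sample numbers are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Stargazer:
    """Hypothetical record for one account that starred a repository."""
    login: str
    followers: int


def zero_follower_share(stargazers):
    """Fraction of stargazers whose accounts have no followers."""
    if not stargazers:
        return 0.0
    return sum(1 for s in stargazers if s.followers == 0) / len(stargazers)


def fork_to_star_ratio(forks, stars):
    """Forks per Star; 0.0 for a repository with no Stars."""
    return forks / stars if stars else 0.0


def star_signals(stargazers, forks, stars, baseline_ratio, share_threshold=0.36):
    """Compute both signals and flag each against its (assumed) cutoff.

    baseline_ratio is the fork-to-star ratio of organic projects; the
    article gives no value for it, so the caller must supply one.
    """
    share = zero_follower_share(stargazers)
    ratio = fork_to_star_ratio(forks, stars)
    return {
        "zero_follower_share": share,
        "fork_to_star_ratio": ratio,
        "high_zero_follower_share": share >= share_threshold,
        # "Less than one-tenth of the baseline", as described in the article.
        "low_fork_ratio": ratio < baseline_ratio / 10,
    }


# Made-up sample: 6 of 10 stargazers have no followers.
sample = [Stargazer(f"user{i}", 0 if i < 6 else 12) for i in range(10)]
# baseline_ratio=0.15 is a placeholder, not a figure from the report.
signals = star_signals(sample, forks=5, stars=1000, baseline_ratio=0.15)
print(signals)
```

Neither signal is proof on its own; a repository that trips both is simply a candidate for closer inspection.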

The core academic support for this report comes from a peer-reviewed paper jointly published by CMU, North Carolina State University, and Socket at ICSE 2026 (International Conference on Software Engineering). The research team's detection tool, StarScout, analyzed 20TB of GitHub metadata (6.7 billion events, 326 million Stars, covering 2019 to 2024), ultimately flagging approximately 6 million suspicious fake Stars, 18,600 involved repositories, and about 301,000 participating accounts.

6 Million Fake Stars: Explosive Growth in 2024, AI Projects Heavily Affected

Fake Stars are not a new phenomenon, but their scale exploded in 2024. CMU paper data shows that before 2022, there were no more than 10 repositories involved in fake Star activity per month. By the peak in July 2024, this number skyrocketed to 3,216 repositories and 30,779 participating accounts. As of July 2024, 16.66% of repositories with more than 50 Stars had engaged in fake Star activity.

The detection accuracy of the research team was indirectly validated by GitHub's own actions: 90.42% of the repositories flagged by StarScout have been deleted, and 57.07% of the flagged accounts have been purged.

In the classification of fake Star usage, most are used to promote short-lived phishing/malware repositories. But among non-malicious categories, AI and LLM-related projects rank first, with a total of 177,000 fake Stars, surpassing blockchain/cryptocurrency projects. The paper notes that "many of these are academic paper repositories or products from LLM-related startups." More critically, 78 repositories detected with fake Star activity had appeared on the GitHub Trending page, proving that purchased Stars can indeed successfully manipulate the platform's recommendation algorithm.

A Star for as Low as 3 Cents: The Openly Operating Star-Buying Market

This is not a dark web transaction. The investigation confirmed that at least a dozen websites openly sell GitHub Stars, among them SocialPlug.io, Buy.fans, and Boost-Like.store. There are 24 active Star-buying services on Fiverr, ranging from basic packages for $5 to "organic promotion" packages for $25 and above.

Pricing is tiered: the cheap tier (disposable new accounts) costs $0.03 to $0.10 per Star, the mid-tier $0.20 to $0.50, and the premium tier (aged accounts with years of history) $0.80 to $0.90. Premium services promise "non-drop stars" and a 30-day refill guarantee. SocialPlug claims to have delivered 3.1 million Stars cumulatively, serving over 53,000 customers, and even offers an API for programmatic bulk purchasing.

Star exchange platforms like GithubStarMate.com and SafeStarExchange.com use a points-based star-swapping model, allowing users to trade Stars with one another without spending money. There are also at least 7 open-source tools on GitHub (such as fake-git-history and commit-bot) specifically designed to forge contribution history graphs. Pre-made GitHub accounts with 5 years of commit history and the Arctic Code Vault contributor badge are sold on Telegram for about $5,000.

A 2020 study from Tsinghua University documented the operations of promotion groups on QQ and WeChat in China: groups with over 1,020 members process about 20 repository star-buying tasks daily, and the study estimated the industry's annual profit at $3.4 million to $4.4 million.

VCs Use Stars for Project Screening, Spending $200 Can "Meet" Seed Round Standards

The relationship between Stars and funding is not speculation; it's something venture capital firms themselves publicly admit.

Redpoint Ventures partner Jordan Segall analyzed 80 developer tool companies and found that the median number of GitHub Stars at seed funding was 2,850, and 4,980 at Series A. He explicitly stated: "Many VCs write internal crawlers to find GitHub projects with fast Star growth. Stars are the metric they most commonly track."

These numbers essentially give startups a precise shopping list. At cheap-tier prices, $85 to $285 buys the 2,850 Stars needed to reach the seed round median; $990 to $4,500 reaches the Series A threshold. Compared to the typical seed round funding range of $1 million to $10 million, the return on investment ranges from 3,500x to 117,000x.
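The arithmetic behind these figures can be reproduced in a few lines. This is a sketch using only the prices and medians quoted in the article; pairing the Series A range with the mid-tier floor ($0.20) and the premium-tier ceiling ($0.90) is an inference from the numbers, not something the report states, and the names `cost_range`, `CHEAP`, and `MID_TO_PREMIUM` are illustrative.

```python
# Per-star price ranges quoted in the article (USD).
CHEAP = (0.03, 0.10)
# Assumption: the Series A figures span mid-tier floor to premium-tier ceiling.
MID_TO_PREMIUM = (0.20, 0.90)

SEED_MEDIAN = 2_850      # Redpoint median Stars at seed
SERIES_A_MEDIAN = 4_980  # Redpoint median Stars at Series A


def cost_range(stars, price_range):
    """Return (low, high) cost in USD of buying `stars` at the given prices."""
    low, high = price_range
    return round(stars * low, 2), round(stars * high, 2)


seed_cost = cost_range(SEED_MEDIAN, CHEAP)
series_a_cost = cost_range(SERIES_A_MEDIAN, MID_TO_PREMIUM)

# Return multiple on a seed round of $1 million to $10 million:
# smallest round over highest cost, largest round over lowest cost.
roi_low = 1_000_000 / seed_cost[1]
roi_high = 10_000_000 / seed_cost[0]

print("seed cost:", seed_cost)
print("series A cost:", series_a_cost)
print("return multiple:", round(roi_low), "to", round(roi_high))
```

The results land within rounding distance of the article's figures: the seed cost comes out at $85.50 to $285, the Series A cost at $996 to $4,482, and the return multiple at roughly 3,500x to 117,000x.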

The ROSS Index (Ranking of Open Source Startups), published quarterly by Runa Capital, further amplifies this incentive. According to TechCrunch, 68% of the companies on the ROSS Index received investment at the seed stage, with total tracked funding reaching $169 million. An independent analysis in the investigative report found that Union Labs, ranked first in the Q2 2025 ROSS Index (Star growth 54.2x, total 74,300 Stars), showed severe signs of star-buying: 32.7% of its Stars came from accounts with zero repositories, 52% from accounts with zero followers, and StarScout flagged 47.4% of its Stars as suspicious. The top project on an industry ranking widely cited by VCs had nearly half its Stars suspected of being fake.

Actual cases already corroborate the conversion chain from Stars to funding: Lovable (formerly GPT Engineer) secured a $7.5 million pre-seed round with 50,000+ Stars, with a Series A valuation of $1.8 billion; Browser-use received a $17 million seed round after gaining 50,000 Stars in three months; Pangolin entered Y Combinator with 1,000 Stars and completed a $4.7 million seed round within eight months.

GitHub's Asymmetric Enforcement: Delete Repositories but Keep Accounts

GitHub's Acceptable Use Policies explicitly prohibit "artificial engagement," "ranking manipulation," and creating a secondary market for fake Stars, even specifically banning star-buying behavior incentivized by "cryptocurrency airdrops."

But enforcement is passive and asymmetric. GitHub deleted 90.42% of the repositories flagged by StarScout but only purged 57.07% of the executing accounts. The "workforce" of the fake Star industry remains largely intact. After Dagster published an investigative report in 2023, the related fake Star accounts were deleted within 48 hours—but this was a reaction to public exposure, not the result of proactive detection.

The CMU research team suggested GitHub adopt a network centrality-based weighted popularity metric to replace the raw Star count, structurally dismantling the fake Star economy. GitHub has not implemented this to date.
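To illustrate what a centrality-weighted metric could look like, the sketch below weights each Star by the stargazer's PageRank in a follower graph. The article does not say which centrality measure the researchers have in mind, so PageRank is a stand-in chosen for illustration, and the graph, account names, and function names are all made up.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Plain power-iteration PageRank.

    `follows` maps each account to the set of accounts it follows,
    so rank flows from follower to followee.
    """
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by accounts that follow nobody is spread evenly.
        dangling = sum(rank[node] for node in nodes if not follows.get(node))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node, targets in follows.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def weighted_popularity(stargazers, rank):
    """Sum of stargazer centrality; accounts missing from the graph count as 0."""
    return sum(rank.get(user, 0.0) for user in stargazers)


# Hypothetical graph: three connected accounts and three isolated ones.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "bot1": set(),
    "bot2": set(),
    "bot3": set(),
}
rank = pagerank(follows)

organic_score = weighted_popularity(["alice", "bob", "carol"], rank)
bought_score = weighted_popularity(["bot1", "bot2", "bot3"], rank)
print(organic_score, bought_score)
```

Both hypothetical repositories have three Stars, but the one starred by well-connected accounts scores higher, which is the property that would make bulk-purchased Stars from isolated accounts worth less.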

This forms a self-reinforcing loop: VCs use Stars as a screening signal → Startups buy Stars → VCs see artificial hype → More VCs adopt Star tracking → More startups buy Stars. The benchmark numbers publicly released by Redpoint (seed: 2,850, Series A: 4,980) essentially gave startups a clearly priced shopping list.

As one commentator in the investigative report said: "Star counts can be faked, but saving someone a weekend of bug fixes cannot."

Related Questions

Q: What is the estimated number of fake GitHub Stars identified in the CMU study, and which category of projects had the highest number of non-malicious fake Stars?

A: The CMU study identified approximately 6 million fake GitHub Stars. Among non-malicious categories, AI and LLM-related projects had the highest number of fake Stars, totaling 177,000.

Q: How much does the cheapest fake GitHub Star cost, and what is the estimated cost to fake the median Star count for a seed-round project?

A: The cheapest fake GitHub Star costs as low as $0.03. To fake the median Star count of 2,850 for a seed-round project, it would cost less than $200 using the cheapest options.

Q: According to the article, what percentage of repositories with over 50 Stars had engaged in fake Star activities by July 2024?

A: By July 2024, 16.66% of repositories with over 50 Stars had engaged in fake Star activities.

Q: Which venture capital firm published data on the median GitHub Star counts for seed and Series A rounds, and what were those numbers?

A: Redpoint Ventures published the data. The median GitHub Star count was 2,850 for seed rounds and 4,980 for Series A rounds.

Q: What tool did the research team develop to detect fake GitHub Stars, and how was its accuracy indirectly validated?

A: The research team developed a tool called StarScout to detect fake GitHub Stars. Its accuracy was indirectly validated by GitHub's actions: 90.42% of the repositories flagged by StarScout were deleted, and 57.07% of the flagged accounts were purged.
