I Only Trust yuxinlu1 in the Hugging Face Model TOP Rankings Now

marsbit發佈於 2026-06-28更新於 2026-06-28

文章摘要

An independent developer has unexpectedly risen to the top of Hugging Face's Trending models chart, surpassing major tech companies. The developer, Yuxin Lu (yuxinlu1), has two 12B GGUF models (Coder and Agentic versions) based on Gemma-4-12B on the list, with downloads exceeding 200k and 500k respectively. His models distill coding and reasoning capabilities from sources like Fable 5 into a locally runnable format, requiring as little as 4.5GB of VRAM. Lu, a graduate student in AI, developed these models as a personal, self-funded project for skill development. He emphasizes data quality over quantity, using around 10,000 verified examples. The models' popularity stems from offering privacy and a free, offline alternative for coding assistance and agentic tasks, filling a niche not prioritized by larger companies focused on broader goals. Lu's other projects include general-purpose distillation models and Chinese web novel LoRAs. He advises fellow developers to prioritize honesty about model capabilities and persistence. He views his success not as defeating major players, but as focusing deeply on solving specific user needs with genuine effort. The models are best run using llama.cpp.

A solo developer actually fought their way to the top of the Hugging Face Models Trending list, standing tall among major tech companies?!

It was an ordinary day, and I was casually browsing the Trending list on Hugging Face as usual.

First place was GLM-5.2, Zhipu AI's latest open-source model, an old acquaintance with over 60k downloads—nothing surprising.

Second was Baidu's Unlimited-OCR, recently quietly open-sourced, capable of parsing over 40 pages of documents in one go, downloads also reaching 70k.

Looking further down, a personal account suddenly appeared: yuxinlu1.

Hmm...... Huh?!

And it occupied two spots at once.

Looking at the download counts—latest figures are a staggering 207k and 536k. Wow, what kind of magical model is this?

Even the week before, this solo developer's models had once dominated the Hugging Face charts, even surpassing GLM-5.2, with the head of Zhipu AI publicly recommending it on X:

So, among names like Zhipu AI, Baidu, Qwen, NVIDIA... a solo developer account squeezed into the TOP rankings, and with such high download counts.

This naturally raises curiosity: Who is luyuxin? How can they have so much influence?

"Amateur Model" Rushes Up the Hugging Face Hot List

On this wave of the Hugging Face hot list, the top spots are mostly held by major companies, star teams, and hot sectors.

For example, Zhipu AI's GLM-5.2, a massive 753B parameter count, a domestic star large model; Baidu's Unlimited-OCR, riding the recent wave of OCR and document understanding.

Further down are Qwen's AgentWorld, NVIDIA's LocateAnything, Microsoft's FastContext.

Familiar faces of domestic open-source large models are also there: MiniMax M3, Kimi-K2.7-Code, DeepSeek-V4-Pro.

The image generation direction also has Krea, with new models Krea-2-Turbo and Krea-2-Raw on the list.

And sandwiched in between are two of luyuxin's 12B GGUF models.

Nah... luyuxin, you stand out too much...

Looking closer, these two new models mainly distilled the programming and reasoning capabilities of Fable 5 into a small, locally runnable Gemma4-12B model.

It runs on just 4.5GB of VRAM, local, offline, zero API cost. An average user with a consumer-grade GPU, or even a Mac with unified memory, can run it.

The two models have different focuses.

V1 is the Coder version, focused on writing code, solving problems, generating executable code.

According to the model card, its training data consists of "verifiable" code reasoning: each reasoning chain's corresponding code had to actually run tests and pass before being kept.

Teacher data mainly came from Cursor's Composer 2.5, plus Fable 5—problems Composer 2.5 got wrong were re-reasoned by Fable 5 to generate new reasoning chains and correct code.

After V1's release, it topped the Hugging Face Trending list for multiple consecutive days.

V2 is the agentic version, adding multi-step tool-calling capability, usable as a local Agent, able to read, reason, act, and verify on its own.

The author also ran benchmarks—on the telecom subset of the tau2-bench, the base gemma-4-12B scored 15%, while the V2 model scored 55%, roughly 3.5 times the base performance.

However, the author also noted this is a relative value from local self-testing, a single domain, 20 tasks, and shouldn't be directly compared to official leaderboards. He candidly admitted there's still a significant gap compared to frontier large models.

The author also mentioned: Fable 5 was later taken offline, and only his own dataset still retains the "original" reasoning process from Fable 5.

For the missing reasoning portions in the community-contributed data, he used Claude Opus 4.8 (xhigh) to regenerate them, piece by piece, and filled them back in.

He also admitted the reconstructed trajectories "might differ from the original Fable 5," but it was the only feasible solution at the time.

He revealed in the discussion that this fine-tuning dataset actually only has about 10k examples. He emphasized that dataset size isn't as important as everyone thinks; what truly matters is quality, filtering, and verification.

Another very practical reason these models gained such high popularity on Hugging Face is: they can run locally.

Both models are in GGUF quantized format.

GGUF is a common local model format in the llama.cpp ecosystem. Users can load them directly with tools like llama.cpp, Ollama, LM Studio, Jan, etc.

This is especially attractive for coding scenarios. After all, writing code, browsing repositories, running commands, debugging often involve private projects and local environments. Being able to run on your own machine means not having to upload code to the cloud or pay API costs each time.

More importantly, the barrier to entry isn't too high.

The V1 model card states that the smallest Q2_K version is about 4.5GB. With just about 4.5GB of VRAM or unified memory, you can run a private, offline programming assistant.

The author's recommended sweet spot is Q4_K_M, about 6.87GB; the higher-quality Q8_0 is about 11.8GB.

For V2, being more agentic, the author didn't release a Q2_K version. The reason given is it failed stress tests and wasn't reliable enough.

So V2's smallest reliable version starts from Q3_K_M, about 5.7GB; the recommended Q4_K_M is still about 6.87GB.

The author also teased future plans—V3 is already on the way.

He said V3 will continue along the 12B path in the coding+agentic direction. The author admitted he didn't expect the performance gains from this post-training to be so significant, so he'll keep pushing forward.

Especially on the tau2-bench telecom subset, V2 still has some issues with "over-attempting, repeated retries," which V3 will aim to improve with more training.

On another front, he's also working on a larger version: Qwen3.6-27B. This essentially applies the same coding+agentic recipe to a larger base model, for users with more generous VRAM.

One Person, 40 Hours, Breaking into the Major Players' Midst

To single-handedly charge up the Hugging Face hot list, with combined downloads exceeding 700k, carving out a place among major companies and institutions...

Who exactly is this author?

After reaching out to the author, we also learned his story.

His name is Lu Yuxin, currently a graduate student in AI at a US university. His undergraduate degree was in Data and Business Analytics, and in between, he specifically studied full-stack development, learning front-end, back-end, software development, and data processing.

These two trending models are not his main focus; they are purely self-funded personal projects.

"Open source is actually just spending money; it doesn't bring you any income." He's very clear about this. His initial motivation for making V1 was actually "self-improvement":

University-taught knowledge updates too slowly. During his graduate studies, professors were still teaching content from two or three years ago, while AI evolves daily. So he used this project to force himself to catch up with the latest developments.

To create these models, he burned through an entire Claude Max 20× subscription. V2 alone took over 40 hours.

Synthesizing data piece by piece, manual cleaning, training, evaluation, re-training—almost all done alone.

For hardware, he used an RTX 5090 with 32GB VRAM; plus about 96GB of local SSD resources available. The actual usable resource scale was around 128GB.

Not bad for a solo developer, but completely incomparable to the compute pools of major companies and AI labs.

He told us that the most time-consuming part of the entire process wasn't training, but data processing.

Especially agentic data; real conversations are often long, with a task potentially having dozens of steps, thousands or even tens of thousands of tokens. But limited by VRAM, he could only feed 2048 tokens at a time during training.

So he did something like a "sliding window" process: within each multi-turn session, using the latest user message as an anchor point, centering around one tool call, and trimming the context to fit the budget.

Both V1 and V2 use Gemma 4-12B as the base. It wasn't chosen because it was easy; on the contrary, Gemma 4's format and tool protocols are quite special, making adaptation troublesome, and even many client-side supports aren't perfect.

Lu Yuxin said it was partly to challenge himself; on the other hand, because the 12B size is very attractive.

He calculated that if quantized to around 3-bit, many Mac users with 8GB unified memory could also run it, with some context window remaining.

I now know many people are still using computers with around 8GB of unified memory. So I wanted to make it usable for as many people as possible within the maximum feasible parameter count.

Lu Yuxin summarized the value of local models in two words:

Privacy, free.

He thinks many people just want AI to help them organize files, process data, make PPTs, or experience an agent, and aren't necessarily willing to pay monthly for Claude or GPT.

People might just want to play around, why does it have to be paid?

After releasing V1, he didn't pay much attention to the rankings at first, just stating in the model card as usual: if people liked it, and downloads and likes were high, he'd continue with V2.

Unexpectedly, two or three days later, the model suddenly jumped from an unknown rank to eighth; after sleeping, it surged to first.

Then, comments and issues flooded in.

He read almost every one. At most, he spent three to four hours a day reading Hugging Face comments, replying to questions, testing user feedback, and then informing the users of the results.

He said: "The community has needs, and I'm genuinely acting on them. That's the most crucial part."

Turns out, he's also a fan of web novels...

On HF, Lu Yuxin has released 9 public models in total. Besides the two trending models, he also made models that "directly distilled Claude."

For example, gemma-4-12B-it-Claude-4.6-4.8-Opus-GGUF can be understood as a general-purpose distilled version of Gemma4-12B.

It's not limited to programming; it's more about compressing Claude Opus's answer style, reasoning habits, and thinking capabilities into this 12B local model.

Another model simply uses JetBrains's programming model Mellum2 as the base, specifically for reasoning distillation.

Looking further down...

Wait, there are even fine-tuned models for web novels?

Wow, and they're divided into four genres, all Chinese web novel LoRAs, and all based on Qwen3.6.

Lu Yuxin told us this was actually his entry point into making Hugging Face models.

Because he personally enjoys reading novels. When chasing an unfinished novel, readers get anxious; authors also work hard writing daily updates.

So, he wanted to create a complete free novel generation pipeline, using different styles of Chinese novel LoRAs, allowing authors to speed up with AI, and readers to see content faster.

But Chinese novel LoRAs aren't that popular on HF. Later, he found users were more interested in coding and agentic models, so his direction gradually shifted to the current path.

When asked what advice he had for other solo developers, Lu Yuxin said: Honesty and persistence are most important.

Honesty means not exaggerating model capabilities. Be clear about what's strong and what's weak.

You have to tell everyone truthfully. If I lie to you about how strong my model is, but in actual use many problems arise, the next time I release something, you won't believe me.

Persistence means open-source authors must accept this: you will inevitably encounter negative voices.

After the models gained popularity, Lu Yuxin also faced skepticism, but he decided to persist.

In his view, the open-source path is inherently difficult.

Even topping the Hugging Face hot list doesn't directly bring income. More often, it's spending your own money on compute, time processing data, replying to comments, fixing bugs, and then facing a few negative voices.

What supported him along the way was also a very personal work rhythm.

Lu Yuxin mentioned he has ADHD.

In the past, this might have meant difficulty following a long-term, step-by-step schedule. But in the rapidly changing field of AI, quickly switching interests and rapidly entering hyperfocus has instead become an advantage.

He even believes: "The AI era belongs to those with ADHD." Because when one direction cools down, if you keep drilling into it, by the time you switch to learning something new, it might already be too late.

Towards the end of the conversation, we posed the initial question:

As a solo developer, how can you squeeze into the front row among major companies?

Lu Yuxin's answer was very balanced.

He believes major companies can certainly do better, with more researchers and stronger compute.

But when major companies release open-source small models, they often also bear goals like brand promotion, API traffic diversion, etc.; whereas solo developers don't have these burdens and can focus more on solving a specific pain point.

I'm happy, but it's not that I've completely defeated them. It's just that I might be a bit more dedicated.

In his view, this is precisely the opportunity for solo open-source authors: not needing to create a jack-of-all-trades model, but making a sufficiently specific problem work well.

If you also want to try this local model, the link is provided below.

Friendly reminder: Currently, the most compatible platform is llama.cpp, which is highly recommended for use~

HF Address: https://huggingface.co/yuxinlu1

This article is from the WeChat public account "QbitAI" (ID: QbitAI), author: Following Frontier Technology

你可能也喜歡

Bitmine以太坊储备增至98亿美元："加密货币最好的年份尚未到来"

比特浸入科技（Bitmine Immersion Technologies）近期再次成为头条，其在一周内增持了27,084枚以太坊（ETH）。这使得其以太坊总持有量达到5,700,040枚，按每枚1,569美元计算，价值约90.1亿美元，占以太坊总供应量的4.7%。此次增持发生在以太坊价格从约1780美元下跌至1578.54美元（撰稿时）的一周内。同时，根据SoSo Value数据，以太坊ETF在整个六月大部分时间出现资金外流，总额达5.0139亿美元。针对疲软的市场状况，比特浸入科技董事长汤姆·李（Tom Lee）表示，近期市场对加密货币投资者颇具挑战，并指出临近季度末的“粉饰橱窗”行为导致投资者减持过去三个月表现不佳的资产是常见现象。此外，迈克尔·赛勒（Michael Saylor）的公司Strategy正面临持续审查，据报道其持有约140亿美元未实现亏损，而其普通股和优先股价格均跌破100美元水平，引发加密社区部分人士建议其停止扩张比特币持仓。由于比特浸入科技常被称为“以太坊的Strategy”，市场担忧其持续的以太坊积累行为可能面临类似困境与批评。目前上市公司共持有价值约749.4亿美元的比特币和114.8亿美元的以太坊，Strategy是最大的比特币持仓上市公司。然而，目前这些担忧仅是推测。比特浸入科技并非单纯积累以太坊，其每年质押收入估计达2.11亿美元，同时持有5.55亿美元现金及等价物以及488万枚质押的ETH。该公司还于6月26日被纳入罗素1000大型股指数。汤姆·李强调，公司计划稳步增长至2026年，并认为市场正开启新一轮牛市周期，代币化和人工智能的快速进展将推动区块链和去中心化加密领域的指数级需求增长。最终摘要： * 新增持后，比特浸入科技持有5,700,040枚ETH，价值约90.1亿美元。 * 尽管以太坊价格疲软、ETF资金外流且Strategy面临批评，比特浸入科技仍持续购入以太坊。

ambcrypto1 小時前

ambcrypto1 小時前

英国FCA公布加密资产监管规则手册：基于风险的方法将于2027年10月启动

英国金融行为监管局公布新的加密货币监管框架，采取风险为本方法而非“一刀切”规则，将于2027年10月生效。新规要求加密公司持有充足资本覆盖潜在损失，具体金额将根据其风险状况浮动，较小或风险较低的公司可减少信息披露负担以节省合规成本。企业需自行评估资产负债表风险并进行年度压力测试，以确定所需资本水平，FCA将审核评估结果但不强加统一规则。此举旨在提升市场信心，吸引额外300-400万英国用户使用加密货币。针对稳定币，FCA保留了基本框架但简化了部分合规要求，例如取消储备构成预测估算，同时强化消费者保护，要求储备资产置于法定信托下并允许最多5%的流通稳定币作为储备。大型系统性发行机构可能面临更严监管。监管机构强调新规为加密行业提供了明确性与稳健基础，但也有市场人士提醒，监管虽可增强保护、减少欺诈，但无法完全消除风险。FCA将于下月开始提供许可申请前支持会议，以协助企业适应新规。

ambcrypto2 小時前

ambcrypto2 小時前

你天天用的Claude和Codex，Meta内部不让随便用了

今年5月，Meta为其应用AI工程部门的工程师划定了红线：限制内部使用Claude Code和Codex这两款流行的AI编程工具，相关限制至今仍在生效。作为这些工具的主要客户之一，Meta此举并非因其不好用，而是恰恰相反——担心其过于强大和好用。 Meta正在自研名为MetaCode的AI编程助手，旨在替代外部模型以节省成本并掌握核心技术。限制使用外部模型的核心原因，是防止“蒸馏陷阱”：即担忧员工在构建MetaCode的训练数据、编程题库和评测标准时，过度依赖或掺入Claude/Codex的输出。这会导致自研模型在不知不觉中学习对手的“本事”和判断标准，使能力来源模糊，并可能违反与OpenAI、Anthropic等竞争对手的服务条款，引发法律风险。内部指南明确禁止了可能让外部AI模型“定义能力”的三类任务：不能用其输出来生成测试题目、不能用其分析代码或设计测试点、其生成内容不得进入被测模型的访问环境。仅允许AI处理搭建工作流、整理文件等“打下手”的辅助性任务，且所有AI产出必须经过人工审核。这一事件揭示了AI行业的一个普遍困境：在利用强大外部工具加速自身研发的同时，如何清晰界定并守护自身模型能力的原创性，避免陷入知识产权与合同风险。随着AI参与创造AI的循环加深，“本事究竟是谁的”这条界线正变得越来越模糊。

marsbit2 小時前

marsbit2 小時前

为什么今天我们需要AI内容观？

亚马逊AI动画《朋克鸭》因伦理争议被叫停，折射出AI内容当前面临的困境。2026年AI视频技术取得突破，能产出完整视觉故事，推动短剧和仿真人内容爆发，院线级AI长片加速涌现。然而，AI在影视行业的应用也引发激烈争议，尤其围绕替代真人表演的伦理问题。 AI内容在不同媒介场景中适配度不同。短视频等“文化速食”内容追求快节奏、浅层情绪和免费模式，AI能高效提供海量供给，满足用户碎片化娱乐需求。但进入影视等“文化正餐”领域则面临挑战，因为影视承载着更深的情感表达、艺术创新和社会意义构建功能，其核心价值在于人的独特参与。 AI难以完全替代真人创作的价值。人在创作中的创新能力、劳动付出凝结的生命经验，以及基于真实情感和个性化表达的互动，是文化作品珍贵性的核心。尽管AI能提升生产效率、拉高质量均值，但易导致内容同质化，并可能通过低成本优势挤压人类创作空间，引发侵权和低质内容泛滥的风险。因此，发展AI内容需要建立明确的边界和规则，即“AI内容观”。其核心原则是：确保AI放大而非挤压人的创作空间；尊重而非掠夺人的创作成果；坚持人在创作中的主导地位与责任；保障AI创作的公开、透明与可溯源。最终目标是让人成为技术的“掌舵者”，在利用AI提升效率的同时，守护文化创作中人的主体性和核心价值，推动AI向善、文化向美。

marsbit3 小時前

marsbit3 小時前

普朗克被撤稿了？量子之父竟被算法绊了一跤

一篇新发表的论文指出，量子力学奠基人、诺贝尔奖得主马克斯·普朗克发表于1940年和1942年的两篇文章，在斯普林格出版社的数字平台上被标记为“已撤稿”。调查显示，这并非因为学术不端，而是现代出版平台的算法“误伤”。这两篇文章原是普朗克关于科学哲学的演讲与讨论，发表在当时德国重要的综合期刊《自然科学》上。在20世纪上半叶，这种将演讲内容发表于期刊或文集的做法是科学思想传播的常见方式。然而，现代数字出版平台的系统可能将其识别为“重复发表”或“版权违规”，从而自动添加了撤稿标记。更甚的是，原文在平台上已被替换为空白页，读者需通过互联网档案馆等非营利渠道才能查阅。此事暴露了历史文献数字化过程中的一个深层问题：当代基于文献计量和版权管理的自动化规则，与前数字时代的科学出版实践发生了错位。诸如“自我剽窃”等现代概念被反向施加于历史文献，导致其可访问性受损，科学记录的完整性面临挑战。在人工智能日益依赖结构化数据库的时代，此类错误标签或内容缺失可能被进一步放大，影响我们对科学历史的准确认知。这提醒我们，数字知识库并非中性镜像，而是受到商业逻辑和平台规则塑造的过滤器。

marsbit3 小時前

marsbit3 小時前

交易

現貨

I Only Trust yuxinlu1 in the Hugging Face Model TOP Rankings Now

文章摘要

"Amateur Model" Rushes Up the Hugging Face Hot List

One Person, 40 Hours, Breaking into the Major Players' Midst

Turns out, he's also a fan of web novels...

熱門幣種推薦

相關問答

你可能也喜歡

Bitmine以太坊储备增至98亿美元："加密货币最好的年份尚未到来"

英国FCA公布加密资产监管规则手册：基于风险的方法将于2027年10月启动

你天天用的Claude和Codex，Meta内部不让随便用了

为什么今天我们需要AI内容观？

普朗克被撤稿了？量子之父竟被算法绊了一跤

交易

熱門文章

如何購買TOP

相關討論

熱門問答

熱門分類

熱門標籤