Running Gemma 4 Locally on iPhone Goes Viral: How Far Are We from the Zero Token Era?

marsbit. Published 2026-04-06. Updated 2026-04-06.

Summary

Google's newly open-sourced Gemma 4 model, built on the same architecture as Gemini 3, has gained significant attention for its ability to run locally on mobile devices like the iPhone and Samsung Galaxy. With smaller versions such as E2B (2.3B parameters) and E4B (4.5B parameters), it supports native multimodal capabilities and offers a 128K context window. Users report impressive speeds—over 40 tokens per second on Apple chips with MLX optimization—making it feel "like magic." The model is accessible via Google’s official AI Edge Gallery app, ensuring ease of use and security. While Gemma 4 excels in tasks like text generation, coding, and image understanding, it struggles with more complex agent-based workflows, such as tool calling and structured outputs, where models like Qwen3-coder perform better. Despite some limitations in reasoning, Gemma 4’s local performance hints at a future where everyday AI tasks—chat, coding, reasoning—can be handled offline, reducing reliance on cloud-based token services. Although cloud models still lead in advanced reasoning and large-scale multi-agent tasks, the trend suggests that as hardware and quantization improve, on-device models will increasingly handle high-frequency simple tasks. This shift could disrupt the AI industry’s reliance on token sales and API subscriptions, pushing providers to focus on more complex, data-intensive capabilities. Gemma 4 is just the beginning of this transformation.

Machine Heart Editorial Department

Google's newly open-sourced model, Gemma 4, released a few days ago, gave the industry a huge surprise.

It uses the same architecture as Gemini 3, is natively multimodal, ranks third globally on the Arena AI leaderboard, and comes in multiple sizes. The smaller models, E2B (2.3B effective parameters) and E4B (4.5B effective parameters), can run locally on mobile devices with a 128K context window. They can be described as a "Gemini alternative that fits in your pocket".

As expected, the model quickly became a new toy for mobile users after its release.

One X user's post was viewed hundreds of thousands of times. It included a video of Gemma 4 running locally on an iPhone, processing images and audio and controlling the flashlight. He said Gemma 4 is incredibly fast and feels like magic.

Another user measured the speed on an iPhone 17 Pro and reported that, on Apple silicon, inference can exceed 40 tokens per second with MLX, Apple's machine learning framework optimized for its chips.
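To put a figure like "40 tokens per second" in context, the sketch below shows one way such a number could be measured and what it means for reply latency. It is an illustration only: the function names are hypothetical, the token stream stands in for whatever an on-device runtime yields, and nothing here reflects how the users quoted above actually took their measurements.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Consume a token stream and return the observed tokens per second.

    `stream` is a stand-in for any generator of decoded tokens; the clock
    is injectable so the function can be checked without real timing.
    """
    start = clock()
    count = 0
    for _ in stream:
        count += 1
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return count / elapsed


def seconds_for_reply(num_tokens: int, tok_per_s: float) -> float:
    """Estimate how long a reply of `num_tokens` takes at a given speed."""
    if tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    return num_tokens / tok_per_s
```

At 40 tokens per second, `seconds_for_reply(500, 40.0)` gives 12.5 seconds for a 500-token answer. This counts generation time only, not the time spent processing the prompt.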

Others achieved similar speeds on a Samsung Galaxy, even with a 'thinking mode' enabled. This led people to exclaim that it's "unbelievably fast".

Such speeds make running AI models on mobile devices a viable option, one that is particularly useful in sensitive scenarios such as healthcare.

The 128K context window also makes these small models more attractive.

So how do you run it? It is simple and not just for geeks: Google has released an official app, Google AI Edge Gallery. Anyone who wants to try it on a phone can install the app, download the desired model version, and run it.

Moreover, since it's officially released by Google, security concerns are naturally less of an issue.

Beyond the small models that run on phones, some have tried larger versions of Gemma 4 on more powerful hardware. One developer ran the 26B Mixture-of-Experts version of Gemma 4 on a MacBook Pro with an M5 Pro chip.

For direct conversation, this model is still very fast, with smooth text generation and code explanation.

But when the developer tried to use Gemma 4 as a coding agent, problems arose. Running an agent requires a large context (Gemma 4 26B has a 256K context window), complex prompts, and stable tool calls. Gemma 4 clearly could not handle this, often freezing, reporting errors, or producing incorrectly structured output.

The turning point came when he switched the model to qwen3-coder. In the same environment, file creation, command execution, and multi-step tasks all ran normally. He believes the problem lies not with the agent framework, but with whether the model itself has been optimized for "tool calling + structured output". In this regard, Gemma 4 might not be sufficient, or perhaps this developer hasn't found the correct method yet.
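What "tool calling + structured output" demands of a model can be illustrated with a minimal sketch. An agent framework typically expects the model to emit a machine-readable call, and rejects anything that does not parse or does not match a known tool. The tool names, the JSON shape, and the `parse_tool_call` helper below are all assumptions made for illustration; they are not the format used by Gemma 4, qwen3-coder, or the framework this developer ran.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "create_file": {"path": str, "content": str},
    "run_command": {"command": str},
}


def parse_tool_call(raw: str) -> dict:
    """Validate raw model output as a tool call, or raise ValueError."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")

    spec = TOOLS[name]
    missing = [key for key in spec if key not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    for key, expected in spec.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"argument {key!r} must be {expected.__name__}")

    return {"name": name, "arguments": args}
```

A model that wraps the JSON in extra prose, omits a required argument, or invents a tool name fails this kind of check at every step, which is consistent with the errors and incorrect structures described above.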

Additionally, some say that Gemma 4's intelligence level is still somewhat lacking.

Even so, the emergence of a "performance powerhouse" like Gemma 4 should not be underestimated. If in the future, a large number of daily queries, chats, simple reasoning, code generation, and image understanding tasks can all be run locally without needing to buy tokens, wouldn't vendors who sell tokens be in an awkward position?

Of course, the current situation is not that pessimistic yet. After all, there is still a gap between today's open-source models and the cutting-edge closed-source flagship models. Furthermore, most capable open-source models are still constrained by hardware and have not yet reached a usable level on the device side.

But the future trend is clear. In the short term, cloud-based closed-source models will still lead in cutting-edge complex reasoning and ultra-large-scale multi-agent collaboration. But in the long term, as hardware continues to advance and quantization techniques continue to optimize, on-device models will gradually encroach on the cloud's high-frequency simple tasks.
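Why quantization matters for on-device models comes down to simple arithmetic: the memory needed for weights scales with the number of bits stored per parameter. The sketch below is a rough estimate under stated assumptions. It counts only the effective parameter figures quoted in this article, and it ignores embeddings, the KV cache, and runtime overhead, so real footprints will be larger. The bit widths are illustrative, not the formats Google ships.

```python
def weight_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB for a given parameter count.

    Assumes every parameter is stored at the same bit width, which is a
    simplification of real quantization schemes.
    """
    if params_billions <= 0 or bits_per_weight <= 0:
        raise ValueError("inputs must be positive")
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Parameter counts are the "effective" figures quoted in the article.
    for name, params in [("E2B", 2.3), ("E4B", 4.5)]:
        for bits in (16, 8, 4):
            print(f"{name} at {bits}-bit: about {weight_gib(params, bits):.2f} GiB")
```

Under these assumptions, halving the bit width halves the weight storage, which is the lever that brings models of this size within reach of a phone's memory.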

Those vendors who rely solely on selling tokens and API subscriptions will have to compete more fiercely on the "truly tough" parts — super-powered Agents, ultra-long reliable context, and specialized capabilities requiring massive real-time data.

Gemma 4 is just the beginning. The next surprise might be an on-device model that, in daily use, leaves users unable to tell the difference between "local" and "cloud". When that day comes, the entire AI industry's business model will undergo a real reshuffle.

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Machine Heart

Related Questions

Q: What is the key feature of Google's newly open-sourced Gemma 4 model that makes it suitable for mobile devices?

A: Gemma 4 has smaller variants, E2B (2.3B effective parameters) and E4B (4.5B effective parameters), which are designed to run locally on mobile devices with a 128K context window.

Q: What speed was reported for running Gemma 4 on an iPhone 17 Pro with Apple's MLX framework?

A: The model's inference speed was reported to exceed 40 tokens per second on an iPhone 17 Pro using Apple's optimized MLX framework.

Q: What is the name of the official Google app that allows users to easily run Gemma 4 on their mobile devices?

A: The official app is called "Google AI Edge Gallery", where users can download the model and run it directly.

Q: What was a significant limitation observed when using the larger Gemma 4 26B model as a coding agent?

A: The Gemma 4 26B model struggled with tasks requiring large context (256K), complex prompts, and stable tool calls, often freezing, reporting errors, or producing incorrectly structured output.

Q: According to the article, what long-term impact could the advancement of on-device models like Gemma 4 have on the AI industry?

A: On-device models could gradually erode the market for cloud-based models on high-frequency simple queries, forcing API and token-selling companies to focus on more complex areas like super-powered Agents, ultra-long reliable context, and capabilities requiring massive real-time data.
