Running Gemma 4 Locally on iPhone Goes Viral: How Far Are We from the Zero Token Era?

marsbit | Published 2026-04-06 | Last updated 2026-04-06

Introduction

Google's newly open-sourced Gemma 4 model, built on the same architecture as Gemini 3, has gained significant attention for its ability to run locally on mobile devices like the iPhone and Samsung Galaxy. With smaller versions such as E2B (2.3B parameters) and E4B (4.5B parameters), it supports native multimodal capabilities and offers a 128K context window. Users report impressive speeds—over 40 tokens per second on Apple chips with MLX optimization—making it feel "like magic." The model is accessible via Google’s official AI Edge Gallery app, ensuring ease of use and security. While Gemma 4 excels in tasks like text generation, coding, and image understanding, it struggles with more complex agent-based workflows, such as tool calling and structured outputs, where models like Qwen3-coder perform better. Despite some limitations in reasoning, Gemma 4’s local performance hints at a future where everyday AI tasks—chat, coding, reasoning—can be handled offline, reducing reliance on cloud-based token services. Although cloud models still lead in advanced reasoning and large-scale multi-agent tasks, the trend suggests that as hardware and quantization improve, on-device models will increasingly handle high-frequency simple tasks. This shift could disrupt the AI industry’s reliance on token sales and API subscriptions, pushing providers to focus on more complex, data-intensive capabilities. Gemma 4 is just the beginning of this transformation.

Machine Heart Editorial Department

Google's newly open-sourced model, Gemma 4, released a few days ago, gave the industry a huge surprise.

It uses the same architecture as Gemini 3, is natively multimodal, ranked third globally on the Arena AI leaderboard, and comes in multiple sizes. The smaller models, E2B (2.3B effective parameters) and E4B (4.5B effective parameters), can run locally on mobile devices with a 128K context window. They have been described as a "Gemini alternative that fits in your pocket".

As expected, the model quickly became a new toy for mobile users after its release.

One post on X was viewed hundreds of thousands of times. In it, the user shared a video of Gemma 4 running locally on an iPhone: processing images and audio, and controlling the flashlight. He said Gemma 4 is incredibly fast and feels like magic.

Another user measured the speed on an iPhone 17 Pro. On Apple silicon, with MLX (Apple's machine learning framework, which is optimized for these chips), the model's inference speed can exceed 40 tokens per second.

Others achieved similar speeds on a Samsung Galaxy, even with a 'thinking mode' enabled. This led people to exclaim that it's "unbelievably fast".
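To put a figure like "40 tokens per second" in perspective, the sketch below shows one way such a number could be measured and what it implies for response time. It is a minimal illustration, not the method used in the posts above: `tokens_per_second` and `seconds_for` are hypothetical helper names, and the stream is assumed to be any iterable that yields one token at a time.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Consume a token stream and return its average throughput.

    The timer starts before the first token arrives, so the result
    includes prompt processing time, not only steady-state decoding.
    """
    start = clock()
    count = 0
    for _ in stream:
        count += 1
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else float("inf")


def seconds_for(n_tokens: int, rate: float) -> float:
    """Time needed to generate n_tokens at a given tokens-per-second rate."""
    return n_tokens / rate


if __name__ == "__main__":
    # At the reported 40 tokens/s, a 500-token answer takes 12.5 seconds.
    print(seconds_for(500, 40.0))
```

The `clock` parameter exists only so the measurement can be checked with a fake timer; in normal use the default wall-clock timer is enough.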

Such speeds make on-device inference a viable option, which is particularly useful in privacy-sensitive settings such as healthcare, where data can stay on the phone.

The 128K context window also makes these small models more attractive.

So how do you run it? It is simple and not reserved for geeks, because Google has released an official app, Google AI Edge Gallery. Anyone who wants to try it on a phone can download the app, download the desired model version, and run it.

Moreover, since it's officially released by Google, security concerns are naturally less of an issue.

Beyond the small models that run on phones, others have tried larger versions of Gemma 4 on more powerful hardware. One developer ran the 26B Mixture-of-Experts variant on a MacBook Pro with an M5 Pro chip.

For direct conversation, this model is still very fast, with smooth text generation and code explanation.

But when he tried to use Gemma 4 as a coding agent, problems arose. An agent needs a large context (Gemma 4 26B has a 256K context window), complex prompts, and stable tool calls, and Gemma 4 could not keep up: it often froze, reported errors, or produced incorrectly structured output.

The turning point came when he switched the model to qwen3-coder. In the same environment, file creation, command execution, and multi-step tasks all ran normally. He concluded that the problem lies not with the agent framework but with whether the model itself has been optimized for tool calling and structured output. On this front Gemma 4 may fall short, or the developer may simply not have found the right setup yet.
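To make "incorrect structures" concrete, here is a rough sketch of the kind of check an agent framework might run on each model reply before executing anything. It is not taken from the developer's setup: the `TOOLS` registry, the tool names, and the `{"name": ..., "arguments": {...}}` reply shape are all assumptions chosen for illustration.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hypothetical tool registry: tool name -> required arguments and their types.
TOOLS: Dict[str, Dict[str, type]] = {
    "create_file": {"path": str, "content": str},
    "run_command": {"command": str},
}


def parse_tool_call(raw: str) -> Tuple[Optional[Dict[str, Any]], Optional[str]]:
    """Return (call, None) if raw is a valid tool call, else (None, reason)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return None, "top-level value is not an object"

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return None, f"unknown tool: {name!r}"

    args = call.get("arguments")
    if not isinstance(args, dict):
        return None, "missing or non-object 'arguments'"

    # Every required argument must be present and have the expected type.
    for key, expected in TOOLS[name].items():
        if key not in args:
            return None, f"missing argument: {key}"
        if not isinstance(args[key], expected):
            return None, f"argument {key!r} should be {expected.__name__}"
    return call, None
```

A model tuned for tool calling passes a check like this almost every time. A model that wraps the JSON in chatty prose, drops a required field, or invents a tool name fails it, and over a multi-step task those failures compound.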

Additionally, some say that Gemma 4's intelligence level is still somewhat lacking.

Even so, the emergence of a "performance powerhouse" like Gemma 4 should not be underestimated. If in the future, a large number of daily queries, chats, simple reasoning, code generation, and image understanding tasks can all be run locally without needing to buy tokens, wouldn't vendors who sell tokens be in an awkward position?

Of course, the situation is not that pessimistic yet. There is still a gap between today's open-source models and the cutting-edge closed-source flagships. Furthermore, most capable open-source models are still constrained by hardware and have not yet reached a usable level on device.

But the trend is clear. In the short term, cloud-based closed-source models will still lead in cutting-edge complex reasoning and ultra-large-scale multi-agent collaboration. In the long term, as hardware advances and quantization techniques improve, on-device models will gradually take over the high-frequency simple tasks now handled in the cloud.
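A back-of-the-envelope calculation shows why quantization matters so much for phones. The sketch below estimates weight storage only, under the assumption that the "effective" parameter counts quoted earlier are what must sit in memory; it ignores the KV cache, activations, and runtime overhead, so real footprints will be higher. The function name is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Effective parameter counts as quoted in the article.
    for name, params in [("E2B", 2.3), ("E4B", 4.5)]:
        for bits in (16, 8, 4):
            size = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: about {size:.2f} GB")
```

Under these assumptions, E2B shrinks from about 4.6 GB at 16-bit to about 1.15 GB at 4-bit, and E4B from about 9 GB to about 2.25 GB, which is roughly the difference between not fitting and fitting comfortably in a phone's memory.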

Those vendors who rely solely on selling tokens and API subscriptions will have to compete more fiercely on the "truly tough" parts — super-powered Agents, ultra-long reliable context, and specialized capabilities requiring massive real-time data.

Gemma 4 is just the beginning. The next surprise might be an on-device model so capable that, in daily use, users can no longer tell the difference between "local" and "cloud". When that day comes, the AI industry's business model will undergo a real reshuffle.

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Machine Heart


Related Questions

Q: What is the key feature of Google's newly open-sourced Gemma 4 model that makes it suitable for mobile devices?

A: Gemma 4 has smaller variants, E2B (2.3B effective parameters) and E4B (4.5B effective parameters), which are designed to run locally on mobile devices with a 128K context window.

Q: What speed was reported for running Gemma 4 on an iPhone 17 Pro with Apple's MLX framework?

A: The model's inference speed was reported to exceed 40 tokens per second on an iPhone 17 Pro using Apple's MLX framework.

Q: What is the name of the official Google app that allows users to easily run Gemma 4 on their mobile devices?

A: The official app is called Google AI Edge Gallery. Users download the app, then download a model and run it directly.

Q: What was a significant limitation observed when using the larger Gemma 4 26B model as a coding agent?

A: The Gemma 4 26B model struggled with agent workloads that need a large context (256K), complex prompts, and stable tool calls, often freezing, reporting errors, or producing incorrectly structured output.

Q: According to the article, what long-term impact could the advancement of on-device models like Gemma 4 have on the AI industry?

A: On-device models could gradually take over high-frequency simple queries from cloud-based models, forcing API and token-selling companies to focus on harder areas such as super-powered Agents, ultra-long reliable context, and capabilities requiring massive real-time data.
