Running Gemma 4 Locally on iPhone Goes Viral: How Far Are We from the Zero Token Era?

marsbit · Published 2026-04-06 · Last updated 2026-04-06

Summary

Google's newly open-sourced Gemma 4 model, built on the same architecture as Gemini 3, has drawn wide attention for running locally on mobile devices such as the iPhone and Samsung Galaxy. Its smaller versions, E2B (2.3B effective parameters) and E4B (4.5B effective parameters), support native multimodal input and offer a 128K context window. Users report speeds of over 40 tokens per second on Apple chips with MLX optimization, fast enough to feel "like magic." The model is available through Google's official AI Edge Gallery app, which makes it easy to try and eases security concerns. Gemma 4 handles text generation, code explanation, and image understanding well, but it struggles with agent workflows that depend on tool calling and structured output, where models like qwen3-coder perform better. Despite some limits in reasoning, its local performance hints at a future where everyday AI tasks (chat, coding, simple reasoning) are handled offline, reducing reliance on cloud-based token services. Cloud models still lead in advanced reasoning and large-scale multi-agent tasks, but as hardware and quantization improve, on-device models will increasingly take over high-frequency simple tasks. This shift could disrupt the AI industry's reliance on token sales and API subscriptions, pushing providers to focus on more complex, data-intensive capabilities. Gemma 4 is just the beginning of this transformation.

Machine Heart Editorial Department

Google's newly open-sourced model, Gemma 4, released a few days ago, gave the industry a huge surprise.

It shares its architecture with Gemini 3, is natively multimodal, ranked third globally on the Arena AI leaderboard, and comes in multiple sizes. The smaller models, E2B (2.3B effective parameters) and E4B (4.5B effective parameters), can be deployed to run locally on mobile devices with a 128K context window. They can be described as a "Gemini alternative that fits in your pocket".
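To give a sense of why models of this size fit on a phone, here is a rough back-of-the-envelope sketch of weight storage at different precisions. It treats the quoted effective parameter counts as the number of weights held in memory, which is an assumption; the real footprint also depends on the KV cache, runtime overhead, and any parameters not counted as "effective". The bit widths are illustrative and do not come from the article.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 = bytes.
# Assumption: the article's "effective parameters" are treated as the weights
# resident in memory; KV cache and runtime overhead are ignored.

EFFECTIVE_PARAMS = {"E2B": 2.3e9, "E4B": 4.5e9}  # counts quoted in the article


def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Return the weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, params in EFFECTIVE_PARAMS.items():
        for bits in (16, 8, 4):  # illustrative precisions, not from the article
            print(f"{name} at {bits}-bit: ~{weight_memory_gb(params, bits):.2f} GB")
```

Under these assumptions, 4-bit weights come to roughly 1.15 GB for E2B and 2.25 GB for E4B, which is why quantization matters so much for on-device use.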

As expected, the model quickly became a new toy for mobile users after its release.

One post by an X user was viewed hundreds of thousands of times. It included a video of him running Gemma 4 locally on an iPhone to process images and audio and to control the flashlight. He said Gemma 4 is incredibly fast and feels like magic.

Someone measured this speed on an iPhone 17 Pro and reported that, on Apple silicon, inference can exceed 40 tokens per second when run through MLX, Apple's machine learning framework optimized for these chips.
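To put 40 tokens per second in perspective, the short sketch below converts a throughput figure into the time needed to generate a reply. The reply lengths are illustrative assumptions, and the calculation covers token generation only; it ignores the time spent processing the prompt.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decode rate (prompt time ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    rate = 40.0  # tokens per second, the figure reported in the article
    for reply_tokens in (100, 500, 2000):  # illustrative reply lengths
        seconds = generation_seconds(reply_tokens, rate)
        print(f"{reply_tokens} tokens at {rate:.0f} tok/s: {seconds:.1f} s")
```

At that rate a 500-token reply takes about 12.5 seconds to finish generating.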

Others achieved similar speeds on a Samsung Galaxy, even with "thinking mode" enabled, and called it "unbelievably fast".

Such speeds make running AI models on mobile devices a viable option for the future, and are particularly useful in sensitive scenarios like healthcare.

The 128K context window also makes these small models more attractive.

So how do you run it? It is simple and not just for geeks, because Google has released an official app, Google AI Edge Gallery. Anyone who wants to try it on a phone can download the app, download the desired model version, and run it.

Moreover, since it's officially released by Google, security concerns are naturally less of an issue.

Beyond these small models running on phones, some have tried larger versions of Gemma 4 on more powerful hardware, such as running Gemma 4 Mixture-of-Experts 26B on a MacBook Pro with an M5 Pro chip.

For direct conversation, this model is still very fast, with smooth text generation and code explanation.

But when the developer tried to use Gemma 4 as a coding agent, problems arose. Running an agent requires a large context (Gemma 4 26B has a 256K context window), complex prompts, and stable tool calls, and Gemma 4 clearly could not cope: it often froze, reported errors, or produced incorrectly structured output.

The turning point came when he switched the model to qwen3-coder. In the same environment, file creation, command execution, and multi-step tasks all ran normally. He believes the problem lies not with the agent framework, but with whether the model itself has been optimized for "tool calling + structured output". In this regard, Gemma 4 might not be sufficient, or perhaps this developer hasn't found the correct method yet.
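To make "tool calling + structured output" concrete, here is a minimal sketch of the kind of check an agent framework might apply to a model's reply before executing anything. The tool names, the JSON shape (`name` plus `arguments`), and the `parse_tool_call` helper are all hypothetical and not taken from any particular framework or from the developer's setup.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "create_file": {"path": str, "content": str},
    "run_command": {"command": str},
}


def parse_tool_call(raw: str):
    """Return (name, arguments) for a well-formed tool call, else raise ValueError."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    schema = TOOLS[name]
    for key, expected_type in schema.items():
        if key not in args:
            raise ValueError(f"missing argument: {key}")
        if not isinstance(args[key], expected_type):
            raise ValueError(f"argument {key} must be {expected_type.__name__}")
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"unexpected arguments: {sorted(extra)}")
    return name, args


if __name__ == "__main__":
    good = '{"name": "run_command", "arguments": {"command": "ls"}}'
    truncated = '{"name": "run_command", "arguments": {"command": '
    print(parse_tool_call(good))
    try:
        parse_tool_call(truncated)
    except ValueError as err:
        print("rejected:", err)
```

A model that has not been tuned for this format tends to fail such checks with truncated JSON, missing fields, or invented tool names, which matches the "incorrect structures" described above.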

Additionally, some say that Gemma 4's intelligence level is still somewhat lacking.

Even so, the emergence of a "performance powerhouse" like Gemma 4 should not be underestimated. If, in the future, a large share of everyday queries, chats, simple reasoning, code generation, and image understanding tasks can run locally without buying tokens, won't vendors who sell tokens be in an awkward position?

Of course, the current situation is not that pessimistic yet. After all, there is still a gap between today's open-source models and the cutting-edge closed-source flagship models. Furthermore, most capable open-source models are still constrained by hardware and have not yet reached a usable level on-device.

But the future trend is clear. In the short term, cloud-based closed-source models will still lead in cutting-edge complex reasoning and ultra-large-scale multi-agent collaboration. In the long term, as hardware advances and quantization techniques improve, on-device models will gradually take over the cloud's high-frequency simple tasks.

Vendors that rely solely on selling tokens and API subscriptions will have to compete harder on the "truly tough" parts: super-powered agents, ultra-long reliable context, and specialized capabilities requiring massive real-time data.

Gemma 4 is just the beginning. The next surprise might be an on-device model that, in daily use, leaves users unable to tell "local" from "cloud". When that day comes, the entire AI industry's business model will undergo a real reshuffle.

This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), author: Machine Heart


Related Questions

Q: What is the key feature of Google's newly open-sourced Gemma 4 model that makes it suitable for mobile devices?

A: Gemma 4 has smaller variants, E2B (2.3B effective parameters) and E4B (4.5B effective parameters), which are designed to run locally on mobile devices with a 128K context window.

Q: What speed was reported for running Gemma 4 on an iPhone 17 Pro with Apple's MLX framework?

A: The model's inference speed was reported to exceed 40 tokens per second on an iPhone 17 Pro using Apple's MLX framework.

Q: What is the name of the official Google app that allows users to easily run Gemma 4 on their mobile devices?

A: The official app is called Google AI Edge Gallery. Users download the app, download a model version, and run it directly.

Q: What was a significant limitation observed when using the larger Gemma 4 26B model as a coding agent?

A: The Gemma 4 26B model struggled with agent tasks that require a large context (256K), complex prompts, and stable tool calls, often freezing, reporting errors, or producing incorrectly structured output.

Q: According to the article, what long-term impact could the advancement of on-device models like Gemma 4 have on the AI industry?

A: On-device models could gradually erode the market for cloud-based models on high-frequency simple queries, forcing API and token-selling companies to focus on more complex areas like super-powered agents, ultra-long reliable context, and capabilities requiring massive real-time data.
