# Speedup Related Articles

HTX News Center provides the latest articles and in-depth analysis on "Speedup", covering market trends, project updates, tech developments, and regulatory policies in the crypto industry.

Running MoE on Mobile Phones? Meta Proposes MobileMoE, Speeding Up iPhone 16 Pro by 3.8x

Meta's MobileMoE, a mobile-optimized Mixture-of-Experts (MoE) language model architecture, enables efficient on-device large language model (LLM) inference for the first time on commercial smartphones. Designed for decoder-only Transformers, it replaces dense feed-forward layers with MoE layers. Key design choices include 8 experts with granularity g=8, top-4 routing, and a shared expert. The model undergoes a four-stage training process: pre-training, intermediate training, supervised fine-tuning, and quantization-aware training. Results show MobileMoE models, with similar memory footprint, achieve equal or higher average accuracy across 14 foundational benchmarks while using only 1/2 to 1/4 of the FLOPs compared to dense baselines. After INT4 quantization, they remain competitive. Notably, on an iPhone 16 Pro, MobileMoE-S demonstrates significant speedups: up to 3.8x faster in the prompt phase and 2.2-3.4x faster in per-token generation compared to a dense counterpart, with lower peak memory usage. While MobileMoE establishes a new Pareto frontier for on-device LLMs in accuracy-compute trade-offs, particularly excelling in code and math tasks, it currently lags behind models like Qwen3.5 2B in advanced instruction following and knowledge reasoning. Future work includes improving post-training techniques, exploring NPU deployment, and managing the runtime memory sensitivity of MoE models to varying inputs.

marsbit06/01 06:09

Running MoE on Mobile Phones? Meta Proposes MobileMoE, Speeding Up iPhone 16 Pro by 3.8x

marsbit06/01 06:09

活动图片