Apple Re-invented Image Compression with AI: Same Quality, One-Third the File Size

marsbit | Published on 2026-05-30 | Last updated on 2026-05-30

Abstract

Apple’s PICO: An AI-Powered Image Codec That Cuts File Size by Two-Thirds at Equal Perceived Quality

In 2025, JPEG AI became the first international standard for learned image compression. However, like most codecs, it still prioritizes mathematical metrics such as PSNR over true perceptual quality, meaning what the human eye finds pleasing. Apple researchers have introduced PICO (Perceptual Image Codec), a neural codec designed to optimize for human perception. It tackles three key practical challenges: 1) Speed: a novel "one-shot context model" accelerates entropy coding without sacrificing compression efficiency. 2) Artifacts: a dedicated TextFidelity loss preserves text clarity, and a TilingArtifact loss eliminates color seams between image tiles processed in parallel. 3) Control: it avoids the "hallucinations" common in GAN-based perceptual models. In a large-scale human evaluation (74,925 pairwise comparisons), PICO matched the perceived quality of standards such as AV1, VVC, and JPEG AI while using only 30-43% of their bitrate. It also needs 20-40% fewer bits than other learned perceptual codecs. On an iPhone 17 Pro Max, it encodes a 12 MP photo in 230 ms and decodes it in 150 ms. Although it is less efficient on synthetic graphics, PICO represents a significant shift from optimizing mathematical scores to directly targeting human visual experience, making high-quality perceptual compression practical on consumer devices. The work builds on expertise from WaveOne, whose team joined Apple and previously adv...

How small can an image be compressed?

In February 2025, the Joint Photographic Experts Group (JPEG) quietly announced a milestone that the industry celebrated: the official release of JPEG AI, the first international standard for end-to-end learned image coding, which had been years in the making and was highly anticipated.

The news spread quickly, with many researchers reposting it on social media alongside comments like 'AI has finally entered the standards.'

The JPEG standard was born in 1992 and has been a fundamental language for digital images for over three decades. Now, artificial intelligence is starting to rewrite the grammar of this language.

However, behind the celebration lies a subtle reality: even JPEG AI still falls well short of true 'perceptual compression.'

Engineers know that traditional metrics like Peak Signal-to-Noise Ratio (PSNR) have little to do with what the human eye perceives as 'good-looking.' An image scoring high on PSNR might look mediocre to a person, while another image with lower PSNR might appear detailed and realistic. Optimizing mathematical metrics and optimizing for human perception are two entirely different things.
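To make the PSNR point concrete, here is a toy Python sketch (not from the paper). PSNR is computed from the mean squared error alone, so it depends only on the total squared error, not on where that error lands or what structure it damages. The 1-D "image" and both distortions below are illustrative assumptions: one is a subtle global brightness shift, the other concentrates the same total squared error on a single pixel at an edge.

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 1-D "image": a flat dark region next to a bright one (a sharp edge).
original = [50] * 8 + [200] * 8

# Distortion A: every pixel brightened by 4 levels (a subtle global shift).
# Total squared error: 16 pixels * 4**2 = 256.
shifted = [p + 4 for p in original]

# Distortion B: all of the error lands on one edge pixel, off by 16 levels.
# Total squared error: 1 pixel * 16**2 = 256, the same as distortion A.
damaged_edge = list(original)
damaged_edge[8] -= 16

print(f"{psnr(original, shifted):.2f} dB")       # same score...
print(f"{psnr(original, damaged_edge):.2f} dB")  # ...for a very different artifact
```

Both lines print about 36.09 dB, even though one distortion spreads a faint change everywhere and the other damages the edge, the kind of structure the eye notices first.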

For decades, from JPEG to VVC and now JPEG AI, the design logic of almost all codecs has revolved around mathematical metrics. Perceptual compression (directly optimizing for the human visual experience) has long seemed a distant goal of academic papers rather than an engineering reality that could fit into a phone.

At this critical juncture, a team of engineers at Apple quietly published a paper presenting their answer: PICO.

Paper Title: What Matters in Practical Learned Image Compression

Paper Address: https://arxiv.org/pdf/2605.05148

Why is 'Looking Better' Much Harder Than 'Scoring Higher'?

To understand PICO, one must first understand what image compression is actually doing.

Saving a photo as a file is essentially a problem of 'choosing what to forget and what to remember.' With limited storage space, some information must be discarded while making it as unnoticeable as possible to the viewer. Different codecs follow different 'discarding' rules.

Traditional codecs like JPEG, AV1, and VVC are manually designed rule-based systems. They divide the image into blocks, transform, quantize, and entropy code—each step based on decades of accumulated human expertise. These systems can perform excellently on mathematical metrics like PSNR, but their design is inherently oriented toward 'reducing pixel error,' not 'reducing visual discomfort for the human eye.'
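The transform-quantize pipeline described above can be sketched in a few lines of Python. This is a simplified, hypothetical illustration rather than any codec's actual implementation: it applies a 1-D orthonormal 8-point DCT to a single row (JPEG uses a 2-D 8×8 DCT with per-frequency quantization tables) and uses one uniform step size chosen for the example. Quantization is where information is discarded; the resulting integer levels are what an entropy coder would then compress.

```python
import math
from typing import List

N = 8  # JPEG works on 8x8 blocks; this sketch uses one 8-sample row for brevity

def _alpha(k: int) -> float:
    """Normalization factor that makes the DCT orthonormal."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct(block: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II: pixel values -> frequency coefficients."""
    return [_alpha(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, x in enumerate(block))
            for k in range(N)]

def idct(coeffs: List[float]) -> List[float]:
    """Inverse transform (orthonormal DCT-III): coefficients -> pixel values."""
    return [sum(_alpha(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]

def quantize(coeffs: List[float], step: float) -> List[int]:
    """Uniform scalar quantization: the lossy step where information is discarded."""
    return [round(c / step) for c in coeffs]

def dequantize(levels: List[int], step: float) -> List[float]:
    """Map integer levels back to approximate coefficient values."""
    return [q * step for q in levels]

row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of 8-bit pixel values
step = 10.0
levels = quantize(dct(row), step)  # small integers an entropy coder would compress
reconstructed = [round(v) for v in idct(dequantize(levels, step))]
print("levels:       ", levels)
print("reconstructed:", reconstructed)
```

Because the DCT is orthonormal, the round trip without quantization is exact up to floating-point error; all of the loss comes from `quantize`, and a larger `step` discards more.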

The problem is that the human eye is not a pixel error meter. The human eye's sensitivity to texture, text, and detail is far more complex than mathematical formulas. When you compress a street scene photo heavily, the PSNR might still be respectable, but you might see blurred building edges or distorted text on street signs—precisely what the human eye detects first.

The emergence of learned codecs opened a new door, at least in principle: neural networks could be trained end-to-end directly for human perception rather than for mathematical formulas. But before PICO, existing perceptual learned codecs were too slow for practical use, lacked cross-device compatibility, or could not flexibly control bitrate, which kept them out of consumer-grade products.

Three Core Problems, Three Solutions

The full name of PICO is Perceptual Image Codec. This name directly states its goal: to satisfy the human eye.

The research team systematically explored millions of model configurations and introduced several key technological innovations.

First Problem: Entropy Coding is Slow. What to Do?

A major challenge in image compression: to compress further, a codec needs an 'entropy model' that accurately estimates the information content of each symbol it encodes (in learned codecs, each element of the quantized latent representation rather than raw pixels). The most accurate approach is autoregressive coding, in which each element is predicted sequentially from the surrounding elements that have already been coded. It is like a chef checking the pot after adding each ingredient before deciding the next step: accurate, but extremely slow.
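What it means for an entropy model to "estimate information content" can be illustrated with a small Python sketch. It assumes a discretized Gaussian, a common choice in learned image compression, although the source does not say which distribution PICO uses; the latent values and the per-element means and scales are made up for illustration. Each integer symbol costs about -log2 P bits, so a model that predicts sharper, better-centered distributions spends fewer bits on the same data.

```python
import math
from typing import List

def gaussian_cdf(x: float, mean: float, scale: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def symbol_bits(q: int, mean: float, scale: float) -> float:
    """Information content, in bits, of integer symbol q under a Gaussian
    discretized to unit-width bins centred on the integers."""
    p = gaussian_cdf(q + 0.5, mean, scale) - gaussian_cdf(q - 0.5, mean, scale)
    return -math.log2(max(p, 1e-12))  # floor avoids log(0) for very unlikely symbols

latents: List[int] = [0, 1, -1, 0, 3, 0, 0, -2]

# Weak model: one broad distribution shared by every element.
broad_bits = sum(symbol_bits(q, 0.0, 4.0) for q in latents)

# Stronger model: per-element mean/scale, e.g. predicted from context (made-up values).
means = [0.0, 0.8, -0.9, 0.1, 2.7, 0.0, -0.1, -1.8]
scales = [0.5, 0.6, 0.6, 0.5, 0.8, 0.5, 0.5, 0.7]
context_bits = sum(symbol_bits(q, m, s) for q, m, s in zip(latents, means, scales))

print(f"shared broad model: {broad_bits:.1f} bits")
print(f"per-element model:  {context_bits:.1f} bits")
```

The better the predicted mean and scale for each element, the fewer bits an arithmetic coder would spend, which is why codecs invest in context modeling despite its cost.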

PICO's solution is the 'one-shot context model': it decouples the crucial 'scale parameter' of the entropy model and computes it in a single forward pass, so nothing has to wait on previously coded elements, while the other parameters can be computed in parallel. This retains the precision of autoregressive methods while avoiding their speed bottleneck. In the paper's ablation, removing this module degrades model performance by 10.28%, while including it has almost no effect on speed.
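The difference in dependency structure can be sketched as follows. This is a hypothetical toy, not PICO's architecture (the source does not detail it): the scale predictor, the two-element causal context, and the "side information" available before the latents are all assumptions. The point is only that autoregressive scales need one sequential round per element, because each depends on already-decoded symbols, whereas scales computed in one shot from information available up front can all be produced in a single parallel pass.

```python
from typing import List, Tuple

def toy_scale(context: List[float]) -> float:
    """Hypothetical scale predictor: larger nearby magnitudes give a wider distribution."""
    return 0.5 + 0.25 * sum(abs(c) for c in context)

def autoregressive_scales(latents: List[float]) -> Tuple[List[float], int]:
    """Scale i depends on latents decoded before it, so a decoder must finish
    element i-1 before it can compute the distribution for element i."""
    scales: List[float] = []
    sequential_rounds = 0
    for i in range(len(latents)):
        causal_context = latents[max(0, i - 2):i]  # only already-decoded neighbours
        scales.append(toy_scale(causal_context))
        sequential_rounds += 1  # one dependent step per element
    return scales, sequential_rounds

def one_shot_scales(side_info: List[float]) -> Tuple[List[float], int]:
    """Every scale depends only on side information decoded beforehand, so all of
    them could be computed as one batched (parallel) pass of a network."""
    scales = [toy_scale([h]) for h in side_info]  # no dependence on other latents
    return scales, 1

latents = [0, 1, -1, 0, 3, 0, 0, -2]
side_info = [0.0, 1.0, 1.0, 0.0, 3.0, 0.0, 0.0, 2.0]  # hypothetical coarse summary

ar_scales, ar_rounds = autoregressive_scales(latents)
os_scales, os_rounds = one_shot_scales(side_info)
print(f"autoregressive: {ar_rounds} sequential rounds, one-shot: {os_rounds}")
```

For a real image the latent contains a large number of elements, so the gap between one round per element and a single round is what separates an impractically slow decoder from a fast one.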

Second Problem: Perceptual Training Can Cause Hallucinations. What to Do?

Codecs trained with GANs (Generative Adversarial Networks) often produce images that 'look realistic,' but the realism may be fabricated: hair strands turn into non-existent patterns, and smooth surfaces gain false textures. Worse, the human eye is extremely sensitive to text; even a slight distortion of a single letter is immediately noticeable.

PICO introduces a dedicated TextFidelityLoss for text: an off-the-shelf text detector automatically finds text regions in the image, strict pixel-fidelity constraints are applied in those areas, and the GAN's 'creative freedom' is suppressed there. In experiments, adding this loss halved the absolute error in text regions.
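A masked fidelity loss of the kind described can be sketched in Python. The exact formulation, its weights, and how PICO combines it with the adversarial term are not given in the source, so everything here is an assumption: the text mask would come from an off-the-shelf detector (it is hard-coded below), the fidelity term is a mean absolute error over text pixels only, and a per-pixel adversarial term is simply switched off inside text regions.

```python
from typing import List

def text_fidelity_loss(original: List[float], reconstructed: List[float],
                       text_mask: List[bool]) -> float:
    """Mean absolute error over pixels flagged as text; 0.0 if there is no text."""
    errors = [abs(o - r) for o, r, m in zip(original, reconstructed, text_mask) if m]
    return sum(errors) / len(errors) if errors else 0.0

def total_loss(original: List[float], reconstructed: List[float], text_mask: List[bool],
               adversarial_per_pixel: List[float], text_weight: float = 10.0) -> float:
    """Toy objective: the adversarial ('realism') term is masked out in text regions,
    and a heavily weighted fidelity term applies there instead. Weights are assumptions."""
    n = len(original)
    adversarial = sum(a for a, m in zip(adversarial_per_pixel, text_mask) if not m) / n
    return adversarial + text_weight * text_fidelity_loss(original, reconstructed, text_mask)

# Flattened toy image; the mask stands in for the output of a text detector.
original = [0.0, 0.0, 1.0, 1.0, 0.5, 0.5]
text_mask = [False, False, True, True, False, False]
adversarial = [1.0] * len(original)  # placeholder per-pixel adversarial loss values

crisp_text = [0.1, 0.0, 1.0, 1.0, 0.4, 0.6]    # errors only in non-text pixels
smeared_text = [0.0, 0.0, 0.7, 0.8, 0.5, 0.5]  # errors inside the text region

print(total_loss(original, crisp_text, text_mask, adversarial))
print(total_loss(original, smeared_text, text_mask, adversarial))
```

Errors inside the masked region are penalized ten times more heavily in this sketch, so a reconstruction that keeps text exact scores better even if it is slightly less faithful elsewhere.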

Third Problem: Processing Images in Tiles Leaves Color Seams. What to Do?

To run fast on mobile phone chips, PICO divides images into 504×504-pixel tiles, processes them separately, and then stitches them back together. However, GAN training tends to neglect low-frequency color, which often causes visible color discrepancies between adjacent tiles, much like a badly stitched panorama. The research team therefore introduced TilingArtifactLoss, a multi-resolution L1 loss that forces the model to keep colors consistent across multiple spatial frequencies. This reduced errors at tile boundaries by more than half.
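Two pieces of this paragraph can be sketched in Python: how many 504×504 tiles cover a photo, and a multi-resolution L1 loss. The loss below is a hypothetical form (equal weights over three scales produced by 2×2 average pooling), not the paper's exact TilingArtifactLoss, and the 4032×3024 image size is an assumption (a common 12 MP resolution). The toy comparison shows why such a loss targets color seams: a uniform color offset keeps its error at every scale, while fine-grained noise of the same per-pixel magnitude averages away at coarser scales, so the offset is penalized more.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
TILE = 504  # tile size reported for PICO

def tile_grid(width: int, height: int, tile: int = TILE) -> Tuple[int, int]:
    """Tile columns and rows needed to cover an image (ceiling division)."""
    return math.ceil(width / tile), math.ceil(height / tile)

def downsample(img: Image) -> Image:
    """2x2 average pooling: keeps low-frequency content such as overall color."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def mean_l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    total = sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))

def multires_l1(a: Image, b: Image, levels: int = 3) -> float:
    """Hypothetical multi-resolution L1: equal-weight sum at full, 1/2 and 1/4 scale."""
    loss = 0.0
    for level in range(levels):
        loss += mean_l1(a, b)
        if level < levels - 1:
            a, b = downsample(a), downsample(b)
    return loss

cols, rows = tile_grid(4032, 3024)  # assumed 12 MP photo
print(f"{cols} x {rows} = {cols * rows} tiles")

base = [[0.5] * 4 for _ in range(4)]
color_shift = [[0.6] * 4 for _ in range(4)]  # uniform tint, like a mismatched tile
fine_noise = [[0.6 if (x + y) % 2 == 0 else 0.4 for x in range(4)] for y in range(4)]
print(round(mean_l1(base, color_shift), 3), round(mean_l1(base, fine_noise), 3))
print(round(multires_l1(base, color_shift), 3), round(multires_l1(base, fine_noise), 3))
```

At full resolution both distortions have the same L1 error (0.1), but summed over scales the uniform tint costs about 0.3 versus 0.1 for the checkerboard noise.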

Experimental Results

The Apple team didn't rely solely on benchmark metrics. They commissioned a third-party platform, Mabyduck, to organize a large-scale human subjective evaluation.

The evaluation used blind pairwise comparisons: 610 screened evaluators (who had to pass color-blindness and compression-artifact detection tests) compared reconstructions of the same image produced by different codecs, and the results were aggregated into Bayesian Elo scores. A total of 74,925 pairwise comparisons were collected.
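The aggregation step can be illustrated with a Bradley-Terry model, the pairwise-comparison model underlying Elo-style ratings. The paper reports a Bayesian Elo score; the sketch below swaps in a plain maximum-likelihood Bradley-Terry fit using the standard MM (minorize-maximize) update, with no prior, reported on the Elo scale of 400·log10(strength). The codec names and vote counts are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

def fit_elo(comparisons: List[Tuple[str, str]], iterations: int = 500) -> Dict[str, float]:
    """Maximum-likelihood Bradley-Terry fit via MM updates, reported on an Elo scale.

    Each comparison is (winner, loser). Without a prior, every item needs at least
    one win, otherwise its ML strength would collapse to zero; a Bayesian prior
    avoids this."""
    wins: Dict[str, int] = defaultdict(int)
    pair_counts: Dict[Tuple[str, ...], int] = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[tuple(sorted((winner, loser)))] += 1
        items.update((winner, loser))
    if any(wins[i] == 0 for i in items):
        raise ValueError("every item needs at least one win for a finite ML estimate")

    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # MM update: p_i <- W_i / sum over pairs (i, j) of n_ij / (p_i + p_j)
            denom = 0.0
            for (a, b), n in pair_counts.items():
                if i in (a, b):
                    denom += n / (strength[a] + strength[b])
            updated[i] = wins[i] / denom
        # Strengths are defined only up to a common factor; fix the geometric mean at 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {i: v / math.exp(log_mean) for i, v in updated.items()}
    # Elo convention: P(i beats j) = 1 / (1 + 10 ** ((R_j - R_i) / 400))
    return {i: 400.0 * math.log10(s) for i, s in strength.items()}

def repeat(winner: str, loser: str, times: int) -> List[Tuple[str, str]]:
    return [(winner, loser)] * times

# Hypothetical blind A/B votes between three codecs.
votes = (repeat("codec_A", "codec_B", 30) + repeat("codec_B", "codec_A", 10)
         + repeat("codec_B", "codec_C", 25) + repeat("codec_C", "codec_B", 15)
         + repeat("codec_A", "codec_C", 35) + repeat("codec_C", "codec_A", 5))

ratings = fit_elo(votes)
for codec, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{codec}: {rating:+.0f}")
```

Only rating differences are meaningful here; the normalization simply centers the ratings so they sum to zero.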

The final numbers tell the story: at the same perceived quality, PICO's files are roughly one-third to just under half the size of those from AV1, AV2, VVC, ECM, and JPEG AI; that is, PICO needs only 30%-43% of the bits these codecs require to store the same image. Compared with the strongest existing perceptual learned codecs (HiFiC, MRIC, etc.), PICO also saves 20%-40% in file size.

In terms of speed, PICO encodes a 12 MP photo in just 230 milliseconds and decodes it in 150 milliseconds on an iPhone 17 Pro Max. Most top-tier ML codecs are slower than this even on NVIDIA V100 server GPUs.

Notably, the paper also records a 'counterexample': on the traditional PSNR metric, PICO's performance was only average, even below that of DCVC-RT and VVC. This neatly confirms the team's fundamental judgment: optimizing for perceptual quality and optimizing for mathematical metrics are inherently different directions, and you cannot have your cake and eat it too.

A Milestone, Not the Finish Line

PICO certainly has limitations. The paper acknowledges that for highly regular synthetic images like cartoons or schematic diagrams, PICO's compression efficiency is inferior to traditional codecs, as such content is inherently more suitable for rule-driven autoregressive modeling than perceptual generation.

But these limitations do not diminish the significance of this work.

For the past thirty years, progress in image compression has happened almost exclusively on the track of 'making the numbers look better.' From JPEG to HEVC to VVC, engineers optimized metrics like PSNR and SSIM generation after generation, while human visual perception remained a hard problem that was largely sidestepped.

PICO is the first work to tackle this hard problem systematically and directly: from architecture search and loss-function design to large-scale human subjective evaluation, culminating in a codec that runs in real time on a mobile phone.

The next time you share a photo from an Apple device, you might not notice anything different. But perhaps within that quiet compression process, an algorithm tailored for human perception is deciding which information is worth keeping and which can be quietly forgotten.

The Team: From WaveOne to Apple

The corresponding author of this paper is Oren Rippel, an Apple researcher and a familiar face in the compression field.

His name first gained widespread attention in 2017, when, at the startup WaveOne, he published a paper titled 'Real-Time Adaptive Image Compression' that used neural networks to outperform all mainstream codecs while running in real time. The paper made a significant splash in academia and established Rippel's standing in the field of learned compression.

Afterwards, the same core personnel continued their work at WaveOne, introducing ELF-VC for video compression, achieving a 44% bitrate saving compared to H.264 on the UVG video test set while running over five times faster than similar ML codecs.

This WaveOne team later joined Apple as a group, and PICO is their first systematic answer to perceptual image compression, backed by Apple's computing power and platform resources.

This article is from the WeChat public account "Almost Human" (ID: almosthuman2014), author: Compression is Intelligence
