Apple Re-invented Image Compression with AI: Same Quality, One-Third the File Size

marsbit | Published on 2026-05-30 | Last updated on 2026-05-30

Abstract

Apple’s PICO: An AI-Powered Image Codec That Cuts File Size by Two-Thirds at Equal Perceived Quality

In 2025, JPEG AI became the first international standard for learned image compression. However, it, like most codecs, still prioritizes mathematical metrics like PSNR over true perceptual quality, that is, what the human eye finds pleasing. Apple researchers have introduced PICO (Perceptual Image Codec), a neural codec designed to optimize for human perception. It tackles key practical challenges:

1) Speed: A novel "one-shot context model" accelerates entropy encoding without sacrificing compression efficiency.
2) Artifacts: A dedicated TextFidelity loss preserves text clarity, and a TilingArtifact loss eliminates color seams between image tiles processed in parallel.
3) Control: It avoids the "hallucinations" common in GAN-based perceptual models.

In a large-scale human evaluation (74,925 comparisons), PICO achieved the same perceived quality as standards like AV1, VVC, and JPEG AI while using only 30-43% of the bitrate. It also outperforms other learned perceptual codecs by 20-40%. Remarkably, it runs in 230ms (encode) and 150ms (decode) on an iPhone 17 Pro Max. While less efficient on synthetic graphics, PICO represents a significant shift from optimizing mathematical scores to directly targeting human visual experience, making high-quality perceptual compression practical for consumer devices. The work builds on expertise from WaveOne, whose team joined Apple and previously adv...

How small can an image be compressed?

In February 2025, the Joint Photographic Experts Group (JPEG) quietly announced a milestone celebrated within the industry: the official release of JPEG AI, the first international standard for end-to-end learned image coding, which had been years in the making and was highly anticipated.

The news spread, with many researchers reposting on social media, adding comments like 'AI has finally entered the standards.'

The JPEG standard was born in 1992 and has been a fundamental language for digital images for over three decades. Now, artificial intelligence is starting to rewrite the grammar of this language.

However, behind the celebration lies a subtle reality: even JPEG AI is still a considerable distance from true 'perceptual compression.'

Engineers know that traditional metrics like Peak Signal-to-Noise Ratio (PSNR) have little to do with what the human eye perceives as 'good-looking.' An image scoring high on PSNR might look mediocre to a person, while another image with lower PSNR might appear detailed and realistic. Optimizing mathematical metrics and optimizing for human perception are two entirely different things.
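To make the gap between PSNR and perception concrete, here is a minimal sketch of how PSNR is computed, assuming 8-bit pixel values stored in flat Python lists. The toy images and variable names are illustrative assumptions, not data from the paper.

```python
import math

def psnr(reference, reconstructed, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have no error
    return 10 * math.log10(max_value ** 2 / mse)

# Toy 16-pixel "images" (illustrative assumption).
reference = [100] * 16
uniform_shift = [102] * 16           # every pixel off by 2: squared error 4 each
single_defect = [100] * 15 + [108]   # one pixel off by 8: squared error 64 once

print(psnr(reference, uniform_shift))
print(psnr(reference, single_defect))
```

Both distortions have the same mean squared error, so they receive the same PSNR, although a faint uniform shift and a single concentrated defect can look quite different to a viewer. PSNR only measures average pixel error; it carries no notion of where the error sits or how visible it is.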

For decades, from JPEG to VVC and now JPEG AI, the design logic of almost all codecs has revolved within the framework of mathematical metrics. Perceptual compression (directly optimizing for the human visual experience) has always seemed like a distant goal in academic papers, not an engineering reality that could fit into a phone.

At this critical juncture, a team of engineers at Apple quietly published a paper with their answer, codenamed: PICO.

Paper Title: What Matters in Practical Learned Image Compression

Paper Address: https://arxiv.org/pdf/2605.05148

Why is 'Looking Better' Much Harder Than 'Scoring Higher'?

To understand PICO, one must first understand what image compression is actually doing.

Saving a photo as a file is essentially a problem of 'choosing what to forget and what to remember.' With limited storage space, some information must be discarded while making it as unnoticeable as possible to the viewer. Different codecs follow different 'discarding' rules.

Traditional codecs like JPEG, AV1, and VVC are manually designed rule-based systems. They divide the image into blocks, transform, quantize, and entropy code—each step based on decades of accumulated human expertise. These systems can perform excellently on mathematical metrics like PSNR, but their design is inherently oriented toward 'reducing pixel error,' not 'reducing visual discomfort for the human eye.'

The problem is that the human eye is not a pixel error meter. Its sensitivity to texture, text, and detail is far more complex than any simple mathematical formula captures. When you compress a street scene photo heavily, the PSNR might still be respectable, but you might see blurred building edges or distorted text on street signs, which is precisely what the human eye detects first.

The emergence of learned codecs theoretically opened a new door: neural networks could be trained end-to-end directly for human perception, rather than for mathematical formulas. But before PICO, existing perceptual learned codecs were either too slow for practical use, lacked cross-device compatibility, or couldn't flexibly control bitrate, making them impossible to integrate into a consumer-grade product.

Three Core Problems, Three Solutions

The full name of PICO is Perceptual Image Codec. This name directly states its goal: to satisfy the human eye.

The research team systematically explored millions of model configurations and introduced several key technological innovations.

First Problem: Entropy Coding is Slow. What to Do?

A major challenge in image compression: to compress further, a codec needs an 'entropy model' that accurately estimates the probability, and hence the information content, of each element it encodes. The most accurate method is autoregressive coding: each element is predicted sequentially from the neighboring elements that have already been coded. It's like a chef checking the pot's state after adding each ingredient before deciding the next step. Accurate, but extremely slow.
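The link between an entropy model and file size can be sketched with the standard rule that a symbol predicted with probability p costs about -log2(p) bits. The two toy models below are illustrative assumptions, not PICO's entropy model. They only show why a context-aware prediction shrinks the output and why it forces sequential processing.

```python
import math

def ideal_bits(symbols, prob_fn):
    """Sum of -log2 p(symbol | previous symbols): the ideal entropy-coded size in bits."""
    total = 0.0
    for i, symbol in enumerate(symbols):
        # The model may only look at symbols that were already coded,
        # which is what makes autoregressive decoding inherently serial.
        p = prob_fn(symbols[:i], symbol)
        total += -math.log2(p)
    return total

def uniform_model(context, symbol):
    # No context: each of 4 possible symbols is equally likely.
    return 0.25

def repeat_model(context, symbol):
    # Toy autoregressive model: expects the previous symbol to repeat.
    if not context:
        return 0.25
    return 0.85 if symbol == context[-1] else 0.05

sequence = [0, 0, 0, 0, 1, 1, 1, 1]
print(ideal_bits(sequence, uniform_model))  # 16.0 bits: 2 bits per symbol
print(ideal_bits(sequence, repeat_model))   # about 7.7 bits
```

The context-aware model roughly halves the size on this sequence, but each probability depends on the symbol before it, so a decoder cannot compute them all at once. That dependency is the speed bottleneck the article describes.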

PICO's solution is the 'One-shot Context Model': it decouples the crucial 'scale parameter' in entropy coding and computes it in a single forward pass, removing the step-by-step wait; the other parameters can be computed in parallel. This retains the precision of autoregressive methods while circumventing their speed bottleneck. The result: removing this module degrades model performance by 10.28%, while including it leaves speed almost unaffected.

Second Problem: Perceptual Training Can Cause Hallucinations. What to Do?

Images reconstructed by codecs trained with GANs (Generative Adversarial Networks) often 'look realistic,' but the realism may be fabricated: hair strands turn into non-existent patterns, and smooth surfaces gain false textures. More troublesome, the human eye is extremely sensitive to text; even a slight distortion of a single letter is immediately noticeable.

For text, the PICO team designed a dedicated TextFidelityLoss: an off-the-shelf text detector automatically finds text regions in the image, and strict pixel-fidelity constraints are then applied in those regions while the GAN's 'creative freedom' there is suppressed. Experiments showed that adding this loss function halved the absolute error in text regions.
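The idea of a fidelity penalty restricted to text regions can be sketched as a masked mean absolute error. This is a simplified illustration, not the paper's formulation: the function name `text_fidelity_l1`, the box format, and the grayscale list-of-rows images are all assumptions, and the text boxes are supplied by hand instead of coming from a real detector.

```python
def text_fidelity_l1(original, reconstructed, text_boxes):
    """Mean absolute error restricted to pixels inside the given text boxes.

    original, reconstructed: 2D lists (rows of grayscale values), same shape.
    text_boxes: list of (top, left, bottom, right) with bottom/right exclusive,
                assumed to come from some external text detector.
    """
    height, width = len(original), len(original[0])
    # Build a boolean mask so overlapping boxes count each pixel only once.
    mask = [[False] * width for _ in range(height)]
    for top, left, bottom, right in text_boxes:
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                mask[y][x] = True
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                total += abs(original[y][x] - reconstructed[y][x])
                count += 1
    return total / count if count else 0.0

original = [[0] * 4 for _ in range(4)]
reconstructed = [[0, 0, 0, 0],
                 [0, 10, 20, 0],
                 [0, 0, 0, 0],
                 [50, 0, 0, 0]]
# Only the two pixels inside the box contribute; the error of 50 outside is ignored.
print(text_fidelity_l1(original, reconstructed, [(1, 1, 2, 3)]))  # 15.0
```

In training, a term like this would be added to the overall loss with its own weight, so errors inside detected text are penalized on top of whatever the perceptual and adversarial terms already charge.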

Third Problem: Processing Images in Blocks Leaves Color Block Boundaries. What to Do?

To run fast on mobile phone chips, PICO divides images into 504×504 pixel tiles, processes them separately, and then stitches them back together. However, GAN-based training tends to neglect low-frequency color, often causing visible color discrepancies between adjacent tiles, much like the seams of a poorly stitched composite in photo editing. The research team introduced TilingArtifactLoss, a multi-resolution L1 loss that forces the model to maintain color consistency across multiple spatial frequencies. This measure reduced errors at tile boundaries by more than half.
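A multi-resolution L1 loss can be sketched as the sum of mean absolute errors over a pyramid of progressively downsampled images. The version below is a minimal illustration under stated assumptions (grayscale lists, 2x2 average pooling, three levels, equal weights); the paper's exact pyramid and weighting are not given in this article.

```python
def downsample_2x(image):
    """Average-pool a 2D list by 2x2 blocks (height and width assumed even)."""
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, len(image[0]), 2)
        ]
        for y in range(0, len(image), 2)
    ]

def l1(a, b):
    """Mean absolute error between two same-shaped 2D lists."""
    n = len(a) * len(a[0])
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)) / n

def multi_resolution_l1(original, reconstructed, levels=3):
    """Sum of L1 errors at full resolution and at each coarser pyramid level."""
    total = 0.0
    for level in range(levels):
        total += l1(original, reconstructed)
        if level < levels - 1:  # no need to downsample after the last level
            original = downsample_2x(original)
            reconstructed = downsample_2x(reconstructed)
    return total

flat = [[100] * 4 for _ in range(4)]
# High-frequency error: +4 / -4 checkerboard that averages out when downsampled.
checker = [[100 + (4 if (x + y) % 2 == 0 else -4) for x in range(4)] for y in range(4)]
# Low-frequency error: the whole tile is 4 levels too bright.
shifted = [[104] * 4 for _ in range(4)]

print(multi_resolution_l1(flat, checker))  # 4.0
print(multi_resolution_l1(flat, shifted))  # 12.0
```

Both errors have the same per-pixel magnitude at full resolution, but the uniform color shift survives downsampling and is charged at every level. That is the kind of low-frequency discrepancy that shows up as a seam between tiles.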

Experimental Results

The Apple team didn't rely solely on benchmark metrics. They commissioned a third-party platform, Mabyduck, to organize a large-scale human subjective evaluation.

The evaluation used a blind, pairwise comparison method: 610 screened evaluators (each required to pass color blindness and compression artifact detection tests) compared reconstructions of the same image produced by different codecs, and the results were aggregated into a Bayesian Elo score. A total of 74,925 pairwise comparisons were collected.
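How pairwise votes turn into a ranking can be sketched with the classic sequential Elo update. This is a simplification: the study reports a Bayesian Elo score, which this sketch does not reproduce, and the codec names, starting ratings, and K-factor are illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_ratings(ratings, comparisons, k=16):
    """Apply one Elo update per (winner, loser) comparison and return new ratings."""
    ratings = dict(ratings)  # do not mutate the caller's dictionary
    for winner, loser in comparisons:
        surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
        ratings[winner] += k * surprise
        ratings[loser] -= k * surprise
    return ratings

start = {"codec_a": 1500.0, "codec_b": 1500.0}
# Hypothetical votes: evaluators preferred codec_a in three of four comparisons.
votes = [("codec_a", "codec_b"), ("codec_a", "codec_b"),
         ("codec_b", "codec_a"), ("codec_a", "codec_b")]
final = update_ratings(start, votes)
print(final)
```

A rating gap maps back to a preference probability through `expected_score`, which is what lets codecs be compared at "equal perceived quality": two codecs with the same score are, on average, preferred equally often.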

The final numbers tell the story: at the same visual quality, PICO's files are roughly one-third to less than one-half the size of those from AV1, AV2, VVC, ECM, and JPEG AI. In other words, to store the same image, it requires only 30%-43% of the bits needed by these standards. Compared to the strongest existing perceptual learned codecs (HiFiC, MRIC, etc.), PICO also saves 20%-40% in file size.
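The same figures can be restated as savings and size ratios with a few lines of arithmetic. The helper name below is an illustrative assumption; the 30% and 43% inputs come from the article.

```python
def bitrate_savings(fraction_of_baseline):
    """Convert 'uses this fraction of the baseline bits' into (percent saved, size ratio)."""
    saved_percent = (1 - fraction_of_baseline) * 100
    baseline_ratio = 1 / fraction_of_baseline  # how many times larger the baseline file is
    return saved_percent, baseline_ratio

for fraction in (0.30, 0.43):
    saved, ratio = bitrate_savings(fraction)
    print(f"{fraction:.0%} of the bits: {saved:.0f}% saved, baseline is {ratio:.1f}x larger")
```

Using 30% to 43% of the bits therefore corresponds to saving about 70% to 57%, or to baseline files that are about 3.3 to 2.3 times larger at the same perceived quality.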

In terms of speed, on an iPhone 17 Pro Max, PICO encodes a 12MP photo in just 230 milliseconds and decodes in 150 milliseconds. Most top-tier ML codecs running on NVIDIA V100 server GPUs are slower than this.

Notably, the paper also recorded a 'counterexample': on the traditional PSNR metric, PICO's performance was only average, even inferior to DCVC-RT and VVC. This perfectly illustrates the team's fundamental judgment: optimizing perceptual quality and optimizing mathematical metrics are inherently two different directions; you cannot have your cake and eat it too.

A Milestone, Not the Finish Line

PICO certainly has limitations. The paper acknowledges that for highly regular synthetic images like cartoons or schematic diagrams, PICO's compression efficiency is inferior to traditional codecs, as such content is inherently more suitable for rule-driven autoregressive modeling than perceptual generation.

But these limitations do not diminish the significance of this work.

For the past thirty years, technological progress in image compression has almost exclusively occurred on the track of 'making the numbers look better.' From JPEG to HEVC to VVC, engineers optimized metrics like PSNR and SSIM generation after generation. Human visual perception remained a 'difficult problem' that was circumvented.

PICO is the first time someone has systematically and directly tackled this difficult problem: from architecture search and loss function design to large-scale human subjective evaluation, culminating in a codec that can run in real-time on a mobile phone.

The next time you share a photo from an Apple device, you might not notice anything different. But perhaps within that quiet compression process, an algorithm tailored for human perception is deciding which information is worth keeping and which can be quietly forgotten.

The Team: From WaveOne to Apple

The corresponding author of this paper is Oren Rippel, an Apple researcher and a familiar face in the compression field.

His name first gained widespread attention in 2017. At that time, he was at the startup WaveOne, publishing a paper titled 'Real-Time Adaptive Image Compression,' using neural networks to outperform all mainstream codecs while maintaining real-time speeds. That paper caused significant waves in academia and established Rippel's standing in the field of learned compression.

Afterwards, the same core personnel continued their work at WaveOne, introducing ELF-VC for video compression, achieving a 44% bitrate saving compared to H.264 on the UVG video test set while running over five times faster than similar ML codecs.

This team from WaveOne later joined Apple as a group. PICO is their first systematic answer on perceptual image compression, backed by Apple's computing power and platform resources.

This article is from the WeChat public account "Almost Human" (ID: almosthuman2014), author: Compression is Intelligence



