One SSD a Year: Codex Logging Bug Slammed as 'Slopware'

marsbit · Published 2026-07-02 · Updated 2026-07-02

Article summary

OpenAI's flagship AI coding tool, Codex, was found to have a critical bug causing its feedback logging system to silently and rapidly wear out users' SSDs. A developer reported that Codex was writing approximately 640 TB of data per year to a local SQLite database (`logs_2.sqlite`) through a constant cycle of inserting and immediately deleting log entries, primarily at the verbose TRACE level. While the database file itself remained around 1 GB, the underlying write-amplification from SQLite's WAL mechanism meant the physical SSD endured the full write load. This was enough to exceed the typical 600 TBW endurance rating of a consumer SSD within a year. The root cause was a hardcoded default logging level (`Level::TRACE`) in the configuration, which overrode any user attempts to reduce logging via environment variables. Analysis showed that over 96% of the logged data—including noisy WebSocket packet dumps and repeated system file events—was useless debug information. The issue, which had at least nine related bug reports in the Codex repository, remained latent because it didn't visibly consume disk space, only silently accumulated write cycles. After the report gained traction on Hacker News, OpenAI merged fixes estimated to reduce writes by about 85%. However, even post-fix, the tool would still write an estimated 96 TB annually. The incident sparked broader criticism of "slopware" in AI-assisted development tools, highlighting a lack of resource budgeting for disk, CPU, ...

One 1TB SSD 'eaten' per year?

OpenAI's flagship programming tool, Codex, is burning through your solid-state drive with 640TB of writes per year.

Not long ago, a developer filed an issue on GitHub. The issue, #28224, is now closed and bears the title:

Codex's SQLite feedback log writes 640TB/year, rapidly depleting SSD lifespan.

According to the reporter's actual measurements, his primary SSD lost 37TB of write endurance after 21 days of continuous operation. At this rate, it's about 640TB per year—enough to wear out a consumer-grade drive with a 600 TBW (Total Bytes Written) rating.

As evidence, he posted two tables.

In evidence 1, the log database always appears to be only 1.2GB, as if nothing had happened. Yet its auto-incrementing row ID has climbed to 5.5 billion, while just over 500,000 rows are actually retained: a gap of roughly ten thousand times.

The key is that SSD wear counts the total amount written, not what remains now: all those 5.5 billion rows were written to disk, and deleting them doesn't undo the writes already incurred. So you only ever see those 500,000 rows when checking the file, but the drive has already endured the write load of 5.5 billion rows.

Evidence 2 reveals the distribution of these 5.5 billion rows: over 90% are debug noise that even the developers themselves would never look at again. Verbatim dumps of entire WebSocket packets alone account for about half of it.

The culprit is a default configuration of Level::TRACE, treating your drive's write endurance as free scratch paper.

A highly upvoted comment on Hacker News directly defined the nature of this issue:

This is one of the most egregious examples of "slopware."

The same commenter added, with some resignation:

This is a tragedy. The world needs someone to compete with Anthropic.

What's even more awkward is that the problem had already been reported.

Sporadic feedback has existed since April this year, dragging on for over two months. Only after users calculated it themselves, wrote reports, and pushed it to the top of Hacker News did it receive serious attention. Even then, this round only cut about 85% of the log writes.

Some tried to fix it themselves but found they couldn't: the desktop versions of these tools are closed-source.

There was also a classic comment in the thread: How did the review process not catch such an obvious bug? Oh right... @codex, review this.

How exactly did the 640TB get written?

What does 640TB mean?

Mainstream consumer SSDs have a rated write endurance of roughly 150 to 600 TBW, enough to last the average user a decade or two.

Yet Codex's "record what I did" logging feature can write that much in a year.

The story begins when this user checked his drive usage. His machine, running continuously for 21 days, saw his primary SSD endure 37TB of writes.

At this speed, it's about 640TB per year.
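The extrapolation can be checked in a few lines. This is only a sketch of the arithmetic: the 37TB, 21-day, and 600 TBW figures come from the report and the article, and it assumes the write rate stays constant over a full year.

```python
# Extrapolate the reporter's 21-day measurement to a yearly write volume.
MEASURED_TB = 37      # writes observed on the primary SSD (from the report)
MEASURED_DAYS = 21    # length of the observation window (from the report)
RATED_TBW = 600       # upper end of the consumer range quoted in the article

tb_per_day = MEASURED_TB / MEASURED_DAYS
tb_per_year = tb_per_day * 365            # assumes a constant write rate
years_to_rated_tbw = RATED_TBW / tb_per_year

print(f"{tb_per_day:.2f} TB/day")
print(f"{tb_per_year:.0f} TB/year")
print(f"{years_to_rated_tbw:.2f} years to reach {RATED_TBW} TBW")
```

The yearly figure comes out a little above 640TB, so a drive rated at 600 TBW would reach its rating in under a year at this pace.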

What's more absurd is the write pattern.

Codex maintains a local SQLite database, logs_2.sqlite, specifically for feedback logs. The user monitored it for 15 seconds: 36,211 rows were inserted, yet the retained row count stayed at 681,774 from start to finish, not a single one more.

For every row inserted, one was deleted. The row count stayed constant, but the disk was being rewritten tens of thousands of times.

This mechanism has a nickname: insert-and-prune.
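The pattern is easy to reproduce in miniature. The sketch below is not Codex's code: the table name, the retention cap, and the number of events are all made up for illustration, and it uses an in-memory database so nothing touches the disk. It only shows why the retained row count can stay flat while the auto-incrementing ID keeps climbing.

```python
import sqlite3

RETAIN = 1_000     # hypothetical cap on retained rows
INSERTS = 10_000   # hypothetical number of log events

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
)

for i in range(INSERTS):
    # Insert one new log row...
    conn.execute("INSERT INTO logs (body) VALUES (?)", (f"event {i}",))
    # ...then prune everything except the newest RETAIN rows.
    conn.execute(
        "DELETE FROM logs WHERE id <= (SELECT MAX(id) FROM logs) - ?",
        (RETAIN,),
    )
conn.commit()

retained = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
max_id = conn.execute("SELECT MAX(id) FROM logs").fetchone()[0]
print(f"retained rows: {retained}, highest id ever assigned: {max_id}")
```

Every one of the inserted rows was written at some point, but only the newest `RETAIN` survive. On a real on-disk database, each of those inserts and deletes would also cost page writes.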

Even more ridiculous is what it records: a bunch of filesystem inotify events.

ld.so.cache was logged 128,764 times, locale.alias 37,982 times, passwd 23,843 times.

The same files, touched by the same program, were logged tens or even hundreds of thousands of times over.

The auto-incrementing ID in the logs had exceeded 5.5 billion, while only about 500,000 rows were retained.

A gap of roughly ten thousand times.

This isn't a bug; it's more like an AI programming tool chanting to its own hard drive.

The file is only 1GB, but writes amount to 640TB

If it writes and deletes simultaneously, how big can logs_2.sqlite remain? About 1GB.

This leads to the most counterintuitive point of the whole affair: SSD lifespan depends on "write amount," not "file size." A 1GB file rewritten about 640,000 times equals 640TB of writes for the drive.

SQLite uses a WAL (Write-Ahead Logging) mechanism. Changes are first written to a -wal file, then checkpointed back to the main database in batches. Codex performs over thirty thousand inserts and deletions every 15 seconds. Each one goes through WAL, index updates, and checkpoints—the same storage area, erased over and over.

An analogy: a 1GB notebook, where you erase and rewrite it 1,750 times a day for a year. The notebook is the same, but the paper is worn through.
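The analogy's figure of about 1,750 rewrites a day follows from dividing the yearly write volume by the file size. A quick check, assuming a file of exactly 1GB and decimal units (1TB = 1,000GB) as drive vendors use:

```python
ANNUAL_WRITES_TB = 640   # yearly estimate from the report
FILE_SIZE_GB = 1         # approximate size of logs_2.sqlite per the article
GB_PER_TB = 1000         # decimal units (assumption)

# How many times the whole file's worth of data is written per year and per day.
full_rewrites_per_year = ANNUAL_WRITES_TB * GB_PER_TB / FILE_SIZE_GB
full_rewrites_per_day = full_rewrites_per_year / 365

print(f"{full_rewrites_per_year:,.0f} full rewrites per year")
print(f"{full_rewrites_per_day:,.0f} full rewrites per day")
```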

This is also why this bug could remain hidden for so long: it doesn't take up space, only burns through lifespan.

Checking available disk space shows nothing unusual; the file size stays quiet. Only by reading the drive's own SMART health counters can you see the write amount silently accumulating.
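For readers who want to check their own drive, here is a rough sketch of the conversion. It assumes an NVMe drive, whose SMART "Data Units Written" counter is reported in units of 1,000 × 512 bytes (tools such as `smartctl -a` print it). SATA drives expose write totals through vendor-specific attributes and need a different conversion. The two counter readings below are made-up values, chosen to match the report's 37TB over 21 days.

```python
NVME_DATA_UNIT_BYTES = 512 * 1000   # one NVMe "data unit" is 1,000 512-byte blocks


def data_units_to_tb(data_units: int) -> float:
    """Convert an NVMe 'Data Units Written' count to decimal terabytes."""
    return data_units * NVME_DATA_UNIT_BYTES / 1e12


def yearly_rate_tb(units_start: int, units_end: int, days: float) -> float:
    """Extrapolate two counter readings taken `days` apart to TB per year."""
    return data_units_to_tb(units_end - units_start) / days * 365


# Hypothetical readings taken 21 days apart.
reading_start = 100_000_000
reading_end = 172_265_625

written_tb = data_units_to_tb(reading_end - reading_start)
print(f"written in window: {written_tb:.1f} TB")
print(f"yearly rate: {yearly_rate_tb(reading_start, reading_end, 21):.0f} TB")
```

Taking two readings a known number of days apart is what matters, since the counter itself only ever reports the lifetime total.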

Root cause: an ignored RUST_LOG line

Why were so many logs recorded?

The answer lies in a single line of configuration in the Codex source code: the SQLite feedback log sink is initialized with Targets::new().with_default(Level::TRACE).

In short, the log is set to TRACE by default: the most verbose level, the one that records everything.

Codex's logging framework is Rust's `tracing` ecosystem. The standard practice is to read the RUST_LOG environment variable. Users certainly tried setting RUST_LOG to info, warn, or even turning it off entirely.

No use.

with_default(Level::TRACE) hard-codes this sink's default level to TRACE, and RUST_LOG simply doesn't take effect on this path. You think you turned logging down, but it keeps writing regardless.
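The difference between a sink that ignores the environment and one that honors it can be modeled in a few lines. This is a deliberately simplified Python model, not the Rust `tracing` API: both function names are hypothetical, and it handles only a bare level in RUST_LOG (the real syntax also allows per-target directives).

```python
from typing import Mapping

# Severity order, least to most severe. TRACE is the most verbose level.
LEVELS = {"trace": 0, "debug": 1, "info": 2, "warn": 3, "error": 4}


def hardcoded_sink_accepts(event_level: str, env: Mapping[str, str]) -> bool:
    """Sink with a fixed TRACE threshold: `env` is never consulted."""
    return LEVELS[event_level] >= LEVELS["trace"]


def env_aware_sink_accepts(event_level: str, env: Mapping[str, str]) -> bool:
    """Sink that reads its threshold from RUST_LOG, defaulting to INFO."""
    threshold = env.get("RUST_LOG", "info").lower()
    if threshold == "off":
        return False
    # Unknown values fall back to INFO in this simplified model.
    return LEVELS[event_level] >= LEVELS.get(threshold, LEVELS["info"])


user_env = {"RUST_LOG": "warn"}
print(hardcoded_sink_accepts("trace", user_env))   # still records TRACE events
print(env_aware_sink_accepts("trace", user_env))   # drops them, as the user asked
```

In the first function the user's setting is accepted but has no influence on the result, which is the behavior described above.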

The most deceptive part of this bug isn't that "you forgot to configure," but that "you configured, and it pretended not to hear."

Even more glaring is a proportion.

Breaking down the retained logs by category, TRACE accounts for 70.7%, about 732.5 MB. Adding the two mirrored telemetry logs from codex_otel (log_only and trace_safe) takes up another 25.3%.

About 70% of the retained logs are TRACE noise. Combined with the mirrored telemetry, 96% is pure nonsense nobody would read.

Only 4% is actually meaningful content.
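The percentages can be cross-checked against the file size quoted earlier. The sketch below uses only the article's figures and assumes the 70.7% share and the 732.5 MB refer to the same total.

```python
TRACE_PCT = 70.7            # share of retained logs at TRACE level
MIRRORED_OTEL_PCT = 25.3    # codex_otel log_only + trace_safe mirrors
TRACE_MB = 732.5            # size of the TRACE share

noise_pct = TRACE_PCT + MIRRORED_OTEL_PCT
useful_pct = 100 - noise_pct
implied_total_mb = TRACE_MB / (TRACE_PCT / 100)   # total implied by the TRACE share

print(f"noise: {noise_pct:.1f}%  useful: {useful_pct:.1f}%")
print(f"implied total retained log size: {implied_total_mb:.0f} MB")
```

The implied total lands at a little over 1,000 MB, which is consistent with the roughly 1GB database file.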

Not the first report, but at least the ninth

The reporter checked the Codex repository and found at least 9 issues of this "unbounded log growth" type.

#17320: WAL writes wildly during streaming responses, root cause identical to this one—TRACE ignoring RUST_LOG.

#24275: Desktop version logs_2.sqlite explodes.

#22444: WAL grows infinitely and doesn't free up space.

#26374: Writes 0.75GB per day, no rotation.

#27911: A 4KB goals_1.sqlite gets written at 11MB/s.

#20563: Disk writes wildly even when process is idle.

#27020: Disk at 100% activity on Windows.

The earliest trace leads back to #12969, the very PR that connected the SQLite feedback log sink at the TRACE level.

A 4KB database being written at 11MB/s is enough for a standalone article. Yet, both it and the 640TB issue are symptoms of the same product, the same telemetry system.

This shows that from the beginning, Codex's logging and telemetry systems lacked the concept of a "resource budget."

The entire field is competing over token budgets, context lengths, and model capabilities.

But almost no one asks: who manages the disk, memory, and CPU budget for an Agent that resides on a user's machine, running 24/7?

Fixed, but in a very OpenAI way

The issue was filed on GitHub on June 14th. On June 23rd, the reporter posted an update: three PRs had been merged. According to his own Codex feedback, they reduced log writes by about 85%, so he closed the issue.

First, about that 85%—it's not 100%, and it's not fully deployed yet.

Of the three fixes, #29432 and #29457 were released with version 0.142.0, cutting out per-WebSocket logging and noisy targets. The third, #29599, stops another type of bridged redundant log and will be in version 0.143.0.

Even with all three in place, the remaining ~15% still amounts to ~96TB of writes per year, merely reducing it from "burning out a drive in a year" to "burning it out in six years."
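The post-fix numbers follow directly from the pre-fix estimate. A short check, assuming the 85% reduction applies uniformly to the 640TB/year baseline and comparing against a 600 TBW drive:

```python
BASELINE_TB_PER_YEAR = 640   # pre-fix estimate from the report
REDUCTION = 0.85             # reduction reported after the three PRs
RATED_TBW = 600              # upper end of the consumer range quoted earlier

remaining_tb_per_year = BASELINE_TB_PER_YEAR * (1 - REDUCTION)
years_to_rated_tbw = RATED_TBW / remaining_tb_per_year

print(f"remaining writes: {remaining_tb_per_year:.0f} TB/year")
print(f"years to reach {RATED_TBW} TBW: {years_to_rated_tbw:.2f}")
```

A drive at the lower end of the quoted range, 150 TBW, would reach its rating far sooner at the same rate.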

Some defended it: trace logs are stored by design for debugging, not a bug, and indeed help OpenAI track down edge cases.

But that's precisely the issue: using paying users' SSD endurance as free storage for the vendor's debugging—did the users ever agree to this?

In the programming battlefield, more than just SSDs are being burned through

Interestingly, Codex wasn't the only one singled out.

Comments quickly added: Claude Code also writes debug logs heavily to local storage. Some had to symlink the log directory to a RAM disk (tmpfs) to extend their SSD's life.

Both flagship products suffer from the same type of flaw.

Community discussion soon expanded from one bug to the broader quality issues of AI programming tools.

Some complained these agents keep the GPU maxed out and memory usage at 70GB+. Others simply coined a name for this generation of software: slopware.

The original developer's suggestion was simple: set a limit for the app, and don't exceed 3GB. It took Codex nine issues and several months to even consider drawing that one line.

The question is: why would a company that constantly talks about "AGI" stumble on a problem even an intern could spot?

Why could this flaw hide for so long? One comment also hit the nail on the head.

A decade ago, setting logs to TRACE would cause the program to freeze instantly, fixed the same day. Today, CPUs are fast enough, memory large enough, disks robust enough that such flaws are quietly absorbed by hardware performance. The program runs, the interface works, users feel nothing—until the SSD dies prematurely one day.

In recent years, software has been stuffed with AI-generated code. Features pile up, abstraction layers thicken, and resource consumption skyrockets, all barely propped up by hardware makers releasing faster chips every year.

Thus, an absurd cycle emerged: software gets worse, hardware gets more powerful. Users pay for new machines under the illusion that "it doesn't seem slower," when in reality, the new hardware is just barely supporting worse software.

A single small bug can't crush OpenAI, of course. But competition between Codex and Claude Code has already spread from model capabilities to the entry point of developers' workflows.

On this front, making quick changes and responding to developer needs was never a bonus—it's just the entry ticket.

References:

https://github.com/openai/codex/issues/28224

https://news.ycombinator.com/item?id=48626930

This article is from the WeChat public account "New Zhiyuan," author: ASI Revelation


Related Q&A

Q: What is the core issue reported regarding OpenAI's Codex programming tool in the article?

A: The core issue is that Codex's SQLite feedback logging mechanism is set to TRACE level by default, causing it to perform an excessive number of database insert and delete operations. This results in an estimated 640TB of write operations to a user's solid-state drive (SSD) per year, which can quickly deplete or exceed the drive's total write endurance (TBW), potentially leading to premature drive failure.

Q: How does the article describe the 'insert-and-prune' mechanism causing the excessive disk writes?

A: The 'insert-and-prune' mechanism described in the article involves Codex continuously inserting new log entries into a SQLite database (logs_2.sqlite) and immediately deleting old ones to keep the file size constant (around 1GB). However, each insert and delete operation contributes to the total data written to the SSD. Even though the file size remains small, the constant churn of data within the file leads to massive cumulative write volumes on the physical disk over time.

Q: According to the article, why was this logging bug difficult for users to detect initially?

A: The bug was difficult to detect initially because it did not consume significant disk space. The logs_2.sqlite file maintained a relatively constant size (~1GB). The massive write volume was only observable by monitoring the drive's SMART health data (which tracks total bytes written), not by checking typical metrics like available disk space or file size.

Q: What was the community's reaction on platforms like Hacker News, as mentioned in the article?

A: The community reaction on platforms like Hacker News was highly critical. A high-rated comment labeled the issue as one of the most egregious examples of 'slopware' (slop software). There was frustration over the delayed response from OpenAI and criticism that such a fundamental resource management flaw could persist in a flagship product, with some comments calling for more competition in the AI coding assistant space.

Q: What broader software development trend does the article suggest this Codex bug exemplifies?

A: The article suggests this bug exemplifies a broader trend of declining software efficiency, often referred to as 'slopware.' Modern software, increasingly bloated with AI-generated code and complex abstractions, consumes excessive system resources (CPU, memory, disk I/O). This inefficiency is often masked by the rapid performance improvements in hardware, creating a cycle where users upgrade their hardware to compensate for poorly optimized software rather than demanding better code quality.
