The More Frequently They Are Updated, the More Similar Claude Code and Codex Become

marsbit · Published 2026-04-19 · Last updated 2026-04-19

Summary

OpenAI's recent release of GPT-5.4-Cyber demonstrates a striking convergence with Anthropic's Claude Mythos, reflecting a broader trend of product and strategic alignment between the two AI giants. This is particularly evident in their flagship coding assistants, Codex and Claude Code, which have evolved from distinct philosophies into increasingly similar tools. Initially, Codex emphasized speed and real-time interaction, acting like a fast, junior developer, while Claude Code focused on handling extreme complexity with methodical, large-context analysis. However, both have adopted near-identical solutions to core challenges, such as using isolated sub-tasks or agent teams to prevent context pollution during large-scale code modifications. Benchmark results show a tight race: Codex leads in terminal tasks, while Claude Code excels in complex software engineering benchmarks. Community feedback highlights nuanced differences; Claude Code is faster but can accumulate technical debt, whereas Codex is slower but more deliberate and autonomous. The open-source framework OpenClaw has accelerated this homogenization by standardizing workflows, eroding proprietary advantages. Ultimately, the competition has shifted from pure capability to ecosystem strategy, pricing, and user experience. As these tools become ubiquitous, the developer's role evolves toward higher-level problem definition and architectural thinking, beyond automated code generation.

A few days ago, OpenAI officially released its new large model, GPT-5.4-Cyber. Like many netizens, we were struck by an extremely strong sense of déjà vu.

The new model mirrors Anthropic's recently released Claude Mythos almost completely in target users, application scenarios, and even promotional strategy. The head-to-head posturing has become completely unabashed. Even The New York Times pointed it out sharply in the headline of its latest report: "Like Anthropic, OpenAI...".

This trend of homogenization is by no means limited to the underlying base models. If you look at the series of products recently released by these two companies, you will find that they are becoming mirror images of each other!

Under the surgical lamp of the capital market, this convergence is even more obvious. The two companies' secondary-market valuations are currently very close, with Anthropic's recently edging slightly above OpenAI's thanks to its rapid advance in the enterprise market. Capital has the most sensitive nose; in its eyes, these two unicorns are growing the same horns.

It seems that the homogenization of the underlying large models will inevitably lead to the convergence of upper-layer applications.

Today, what I want to discuss with you are the two benchmark tools representing the highest level of AI-assisted programming today: OpenAI's Codex and Anthropic's Claude Code. From once going their separate ways to now converging on the same path, how did they gradually grow to look the same?

From Divergence to Convergence: The Evolution History of the Two Titans

Rewind the clock a few years, and Codex and Claude Code were products of completely different technological philosophies.

Codex's underlying logic is "the ultimate martial art is unsurpassable speed." It is like a senior developer with five years of experience sitting at your shoulder, ready to complete your code at any moment.

In OpenAI's conception, Codex is a lightweight, highly interactive terminal agent focused on rapid iteration and interactive programming. Its execution speed is extremely fast; with the support of Cerebras WSE-3 hardware, it can achieve a throughput of 1000 tokens per second. In specific workflows, Codex offers three clear approval modes: suggestion, auto-edit, and full-auto, keeping the developer always in the loop. This design philosophy fits perfectly with geek developers who need to quickly build prototypes and handle high-frequency interactions.
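The three approval modes described above can be sketched as a simple gate that decides which actions pause for the developer. This is purely an illustration of the concept; the mode names and the `needs_confirmation` function are assumptions, not Codex's actual API.

```python
from enum import Enum

class ApprovalMode(Enum):
    """The three modes described in the article (names are illustrative)."""
    SUGGEST = "suggest"      # propose diffs only; every change needs sign-off
    AUTO_EDIT = "auto-edit"  # apply file edits freely; pause before shell commands
    FULL_AUTO = "full-auto"  # edit files and run commands unattended

def needs_confirmation(mode: ApprovalMode, action: str) -> bool:
    """Return True when the agent should stop and ask the developer."""
    if mode is ApprovalMode.SUGGEST:
        return True                    # everything goes through a human
    if mode is ApprovalMode.AUTO_EDIT:
        return action == "shell"       # only shell execution is gated
    return False                       # FULL_AUTO: nothing is gated
```

The design point is that "keeping the developer in the loop" is just a policy function over (mode, action) pairs, which is why the same ladder of autonomy keeps reappearing across competing tools.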

In contrast, Claude Code, from its birth, carried a cold and restrained "architect" attribute.

Anthropic infused it with the genes to handle extremely complex tasks. It relies on a massive context window of up to 1 million tokens and unique "compression" technology to achieve infinite conversation. Claude Code's creed is "global control, plan before acting." Before performing any action, it uses agent search technology to thoroughly understand the context of the entire codebase, then coordinates multi-file consistency modifications. For enterprise-level refactoring tasks involving tens of thousands of lines of code migration, Claude Code has shown astonishing dominance.

However, as time passed and application scenarios continued to expand, these two tools, which originally had very different personalities, began to copy each other's homework.

Image source: MorphLLM

The biggest bottleneck a monolithic AI model faces when handling complex projects is context pollution. You ask the AI to refactor an authentication module; after it reads 40 files, it often forgets the design pattern of the first file. To solve this pain point, the two companies came up with almost identical answers: assign an independent context window for each subtask.

OpenAI quickly launched a new macOS desktop application, isolating tasks into different threads by project and running them independently in a cloud sandbox. Anthropic introduced an agent team architecture, allowing developers to spawn multiple sub-agents that share task lists and dependencies and work in parallel in their own independent windows. You'll find that whether it's called a "cloud sandbox" or an "agent team," their core engineering concepts have completely converged.
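The shared pattern, whatever each vendor calls it, is one fresh context per subtask drawn from a common task list. A minimal sketch of that idea, with entirely hypothetical names:

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """A worker with its own context window, isolated from its siblings."""
    name: str
    context: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        # Only this agent's history grows; siblings never see these tokens,
        # so a 40-file refactor cannot pollute an unrelated subtask.
        self.context.append(task)
        return f"{self.name}: done ({task})"

def fan_out(tasks: list[str]) -> list[str]:
    """Shared task list, one fresh agent (and context) per subtask."""
    agents = [SubAgent(f"agent-{i}") for i in range(len(tasks))]
    return [a.run(t) for a, t in zip(agents, tasks)]

results = fan_out(["refactor auth module", "update its tests"])
```

Whether the isolation boundary is a cloud sandbox thread or a sub-agent window, the engineering shape is the same: context is partitioned per task rather than accumulated in one monolithic conversation.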

On the benchmark scorecards, they also show a delicate balance. GPT-5.3-Codex leads the terminal task benchmark Terminal-Bench 2.0 with a score of 77.3%, while Claude Code scored 80.8% on the complex SWE-bench Verified leaderboard. Each has pushed its area of strength to the extreme while scrambling to shore up its own weaknesses.

The OpenClaw Effect: The Invisible Hand Toppling the Walls

If the internal strategies of the two companies determine the internal cause of their homogenization, then the pressure from the entire open-source ecosystem is an external force that cannot be ignored. Here, we must mention the profound impact OpenClaw has had on the entire AI programming tools track.

As a workflow framework launched by the open-source community, OpenClaw can be said to have toppled the ecosystem walls the giants painstakingly built. It standardized the interaction between large models and local terminal toolchains. In the past, how to let a model elegantly drive local Git commits, how to safely run test scripts in a sandbox, how to perform multi-step reasoning verification: these were all proprietary secret sauce that Codex and Claude Code prided themselves on.

But OpenClaw abstracted these processes into a universal protocol. Developers no longer need to be locked into a specific platform to get a particular collaboration mode. The open-source community's carnival made standardization an irreversible tide, and faced with it, both OpenAI and Anthropic had to swallow their pride and support the open standard.
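To make the "universal protocol" idea concrete, here is a toy tool-call message and a platform-agnostic dispatcher. The schema is entirely made up for illustration; it is not OpenClaw's actual format.

```python
# A toy "universal tool-call" message in the spirit of what the article
# describes. Any agent emitting this schema could drive any toolchain
# that implements a handler for it, which is what erodes vendor lock-in.
request = {
    "tool": "git",
    "action": "commit",
    "args": {"message": "refactor: extract auth helpers"},
    "sandbox": True,   # run inside an isolated environment
}

def dispatch(msg: dict) -> str:
    """Route a standardized tool-call message to a registered handler."""
    handlers = {"git": lambda m: f"git {m['action']} ok"}
    return handlers[msg["tool"]](msg)

print(dispatch(request))  # git commit ok
```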

Once the underlying technical barriers were leveled by open-source forces like OpenClaw, and all the advanced features became standard industry fare, the only way forward for Codex and Claude Code was endless involution at the more subtle level of user experience. This is also why they feel more and more alike: under a standardized framework there is often only one optimal solution, just like convergent evolution in biology.

Codex is Catching Up to Claude Code

Although Claude Code and Codex are on the path of convergent evolution, differences between the two still exist, and Codex is even preferred by developers in some aspects.

The other day, on the r/ClaudeCode community, a senior engineer with 14 years of experience who had worked at tech giants, u/Canamerican726, shared an extremely hardcore evaluation.

Specifically, he invested 100 hours using Claude Code and 20 hours using Codex in a complex project containing 80,000 lines of code.

From his perspective, using Claude Code was like directing an engineer chased by a deadline: it sprinted extremely fast, but it often ignored the specifications the developer had written in CLAUDE.md, and it liked to keep piling code into existing files to finish tasks, with little instinct to refactor.

In contrast, Codex felt more like a steady veteran with five or six years of experience. It ran three to four times slower, but it would proactively stop to think and refactor code midway, and it strictly respected instruction boundaries. This high degree of autonomy let the engineer hand it a task and walk away with confidence.

The same voices appear on social networks like X. Researcher Aran Komatsuzaki noted, based on his own experience, that Claude Code still holds the advantage on the front end, but in back-end planning and in keeping information current, Codex, which frequently calls web search, is clearly more solid.

The comment sections are full of hard-won lessons from real business scenarios. Some developers pointed out sharply that Opus-based models, although fast, tend to pile up "code cleaning debt" across a project; Codex is slow, but it tidies the floor as it goes. One user even distilled a survival rule: start a new session the moment context window usage reaches 70%, or you are almost guaranteed to receive hidden bugs as a free bonus.
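That 70% rule of thumb is easy to turn into a tiny helper; the function below is just a sketch of the community heuristic, not a feature of either tool.

```python
def should_rotate_session(tokens_used: int, context_limit: int,
                          threshold: float = 0.70) -> bool:
    """Community rule of thumb: once context usage crosses ~70%,
    start a fresh session rather than risk degraded, buggy output."""
    return tokens_used / context_limit >= threshold

# e.g. against a 1M-token window like the one the article attributes to Claude Code
print(should_rotate_session(720_000, 1_000_000))  # True: 72% used, roll over
print(should_rotate_session(500_000, 1_000_000))  # False: 50% used, keep going
```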

These frontline complaints make one thing clear: as the capability panels of the two great tools increasingly overlap, what ultimately decides which camp a developer joins is often these tiny experience gaps around pit-filling costs and maintenance mental load. Chinese users, of course, also face some special difficulties of their own.

Cold Thinking: The Ecosystem Battle Behind Homogenization

Of course, how Codex and Claude Code compare also depends on developers themselves and their own abilities. As the evaluation by u/Canamerican726 mentioned above concludes: if you don't understand software engineering, both tools will produce poor results; tools are not equivalent to skills.

This sentence punctures an illusion AI programming tools have long cultivated. We once thought that with a powerful enough AI assistant, even a vibe coder with no foundation could single-handedly build enterprise-level applications. The reality is that Claude Code needs an extremely focused and highly skilled "pilot," or it easily gets lost in a huge codebase; Codex, although more independent, still requires the developer to supply accurate system context to reach its full potential.

So, now that tool capabilities are highly homogenized, where have the two companies' moats moved to?

The answer lies in those boring financial statements and pricing strategies. For the same task, Claude Code often consumes three to four times as many tokens as Codex, so it simply costs more to run. For enterprise teams, Claude Code runs $100 to $200 per developer per month, while Codex bundles its capabilities into more affordable subscription plans and has built a large base of users through the vast GitHub community.
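Under linear per-token pricing, that three-to-four-times token multiplier translates directly into the bill. A quick sketch with purely hypothetical numbers (the $10-per-million rate and token counts below are illustrative, not either vendor's actual pricing):

```python
def task_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Linear token pricing: cost scales directly with tokens consumed."""
    return tokens / 1_000_000 * usd_per_million_tokens

# If one tool burns 3.5x the tokens of another on the same task, its bill
# scales by the same 3.5x factor even at an identical per-token rate.
base = task_cost_usd(200_000, 10.0)                  # 2.0 USD
heavy = task_cost_usd(int(200_000 * 3.5), 10.0)      # 7.0 USD
```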

Image source: MorphLLM

Anthropic's ambition is to embed Claude Code deep into the workflows of deep-pocketed tech giants. Stripe, for example, had 1,370 engineers use Claude Code to complete, in 4 days, a cross-language code migration that would have taken a 10-person team weeks; Ramp relied on it to cut incident response time by 80%. OpenAI, leaning on its ubiquitous ecosystem penetration, has made Codex the default choice for many ordinary developers.

This is no longer a pure technical competition, but a war of attrition over ecosystem lock-in, pricing strategy, and the reshaping of user habits.

The Developer's Crossroads

Looking back at the technological evolution of the past year, the release of GPT-5.4-Cyber is just a small footnote in this long battle. Codex and Claude Code moving towards "the same face" marks the official entry of AI programming tools from an early testing phase full of variables and novelty into a mature and boring industrialized production phase.

Claude Code now automatically generates 135,000 GitHub commits daily, a figure that already accounts for 4% of all public commits on the entire network. We can foresee that in the near future, most boilerplate code, basic test cases, and routine refactoring will be quietly completed in the background by these increasingly similar AI agents.

Image source: MorphLLM & SemiAnalysis / GitHub Search API

Facing two super tools whose capabilities are converging and whose experiences imitate each other, what core value do we human developers have left? Perhaps the tool dividend period is about to end completely. When everyone holds equally sharp weapons, victory will no longer be decided by who has faster code completion, but by who can better define problems, who has the broader architectural vision, and who can find the irreplaceability that belongs uniquely to humans in a code world filled with AI.

By the way, which one do you choose?

Reference Links

https://www.morphllm.com/comparisons/codex-vs-claude-code

https://www.reddit.com/r/ClaudeCode/comments/1sk7e2k/claude_code_100_hours_vs_codex_20_hours/

https://x.com/arankomatsuzaki/status/2044270102003196007

https://www.nytimes.com/2026/04/14/technology/openai-cybersecurity-gpt54-cyber.html

This article is from the WeChat public account 机器之心 (Machine Heart, ID: almosthuman2014), author: 机器之心 (Machine Heart).

