The More Frequently They Are Updated, the More Similar Claude Code and Codex Become

marsbit | Published 2026-04-19 | Last updated 2026-04-19

Introduction

OpenAI's recent release of GPT-5.4-Cyber demonstrates a striking convergence with Anthropic's Claude Mythos, reflecting a broader trend of product and strategic alignment between the two AI giants. This is particularly evident in their flagship coding assistants, Codex and Claude Code, which have evolved from distinct philosophies into increasingly similar tools. Initially, Codex emphasized speed and real-time interaction, acting like a fast, junior developer, while Claude Code focused on handling extreme complexity with methodical, large-context analysis. However, both have adopted near-identical solutions to core challenges, such as using isolated sub-tasks or agent teams to prevent context pollution during large-scale code modifications. Benchmark results show a tight race: Codex leads in terminal tasks, while Claude Code excels in complex software engineering benchmarks. Community feedback highlights nuanced differences; Claude Code is faster but can accumulate technical debt, whereas Codex is slower but more deliberate and autonomous. The open-source framework OpenClaw has accelerated this homogenization by standardizing workflows, eroding proprietary advantages. Ultimately, the competition has shifted from pure capability to ecosystem strategy, pricing, and user experience. As these tools become ubiquitous, the developer's role evolves toward higher-level problem definition and architectural thinking, beyond automated code generation.

A few days ago, OpenAI officially released its new large model, GPT-5.4-Cyber. Like many netizens, we felt an extremely strong sense of déjà vu.

This new model, in terms of target user base, application scenarios, and even promotional strategy, almost completely mirrors Anthropic's recently released Claude Mythos. This "close-quarters combat" posture has reached a point of being completely unabashed. Even The New York Times pointed out sharply in the headline of its latest report: "Like Anthropic, OpenAI...".

This trend of homogenization is by no means limited to the underlying base models. If you look at the series of products recently released by these two companies, you will find that they are becoming mirror images of each other!

Viewed under the harsh light of the capital markets, this convergence is even more obvious. The two companies' valuations on the secondary market are currently very close, and Anthropic's has recently even edged slightly above OpenAI's thanks to its rapid advance in the enterprise market. Capital has the most sensitive nose; in its eyes, these two unicorns are growing the same horns.

It seems that the homogenization of the underlying large models will inevitably lead to the convergence of upper-layer applications.

Today, what I want to discuss with you are the two benchmark tools representing the highest level of AI-assisted programming today: OpenAI's Codex and Anthropic's Claude Code. From once going their separate ways to now converging on the same path, how did they gradually grow to look the same?

From Divergence to Convergence: The Evolution History of the Two Titans

Rewind the clock a few years, and Codex and Claude Code were products of completely different technological philosophies.

Codex's underlying logic is "the ultimate martial art is unsurpassable speed." It is like a senior developer with 5 years of experience following behind you, ready to complete your code at any time.

In OpenAI's conception, Codex is a lightweight, highly interactive terminal agent focused on rapid iteration and interactive programming. Its execution speed is extremely fast; with the support of Cerebras WSE-3 hardware, it can achieve a throughput of 1000 tokens per second. In specific workflows, Codex offers three clear approval modes: suggestion, auto-edit, and full-auto, keeping the developer always in the loop. This design philosophy fits perfectly with geek developers who need to quickly build prototypes and handle high-frequency interactions.
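The three approval modes can be pictured as a small permission table. The sketch below is only an illustration: the article names the modes but does not define them, so the mapping of modes to permitted actions, and every identifier (`ApprovalMode`, `Action`, `AUTO_APPROVED`, `needs_human_approval`), is an assumption rather than Codex's actual implementation.

```python
from enum import Enum


class ApprovalMode(Enum):
    SUGGESTION = "suggestion"
    AUTO_EDIT = "auto-edit"
    FULL_AUTO = "full-auto"


class Action(Enum):
    EDIT_FILE = "edit_file"
    RUN_COMMAND = "run_command"


# Assumed semantics (not taken from the article): which actions each mode
# lets the agent perform without pausing for the developer.
AUTO_APPROVED = {
    ApprovalMode.SUGGESTION: frozenset(),
    ApprovalMode.AUTO_EDIT: frozenset({Action.EDIT_FILE}),
    ApprovalMode.FULL_AUTO: frozenset({Action.EDIT_FILE, Action.RUN_COMMAND}),
}


def needs_human_approval(mode: ApprovalMode, action: Action) -> bool:
    """Return True when the developer must confirm the action first."""
    return action not in AUTO_APPROVED[mode]
```

Under these assumptions, `needs_human_approval(ApprovalMode.AUTO_EDIT, Action.RUN_COMMAND)` is true, so the developer still confirms shell commands even when file edits go through automatically.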

In contrast, Claude Code, from its birth, carried a cold and restrained "architect" attribute.

Anthropic built it to handle extremely complex tasks. It relies on a massive context window of up to 1 million tokens and a "compression" technique that lets a conversation continue effectively without limit. Claude Code's creed is "global control, plan before acting." Before taking any action, it uses agentic search to thoroughly understand the context of the entire codebase, then coordinates consistent modifications across multiple files. For enterprise-level refactoring tasks involving migrations of tens of thousands of lines of code, Claude Code has shown astonishing dominance.

However, as time passed and application scenarios continued to expand, these two tools, which originally had very different personalities, began to copy each other's homework.

Image source: MorphLLM

The biggest bottleneck a monolithic AI model faces when handling complex projects is context pollution. You ask the AI to refactor an authentication module; after it reads 40 files, it often forgets the design pattern of the first file. To solve this pain point, the two companies came up with almost identical answers: assign an independent context window for each subtask.

OpenAI quickly launched a new macOS desktop application, isolating tasks into different threads by project and running them independently in a cloud sandbox. Anthropic introduced an agent team architecture, allowing developers to spawn multiple sub-agents that share task lists and dependencies and work in parallel in their own independent windows. You'll find that whether it's called a "cloud sandbox" or an "agent team," their core engineering concepts have completely converged.
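The shared idea, one independent context per subtask, can be sketched in a few lines. This is a toy model under stated assumptions, not either vendor's architecture: `SubAgent`, `orchestrate`, and the sample file names are hypothetical, and a real agent would call a model instead of counting files.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """Hypothetical worker that owns a private context window."""
    task: str
    context: list[str] = field(default_factory=list)

    def run(self, files: dict[str, str]) -> str:
        # Everything the worker reads stays in its own context.
        for path, text in files.items():
            self.context.append(f"{path}: {text}")
        # Only a short summary is handed back to the orchestrator.
        return f"{self.task}: reviewed {len(files)} file(s)"


def orchestrate(subtasks: dict[str, dict[str, str]]) -> tuple[list[str], list[SubAgent]]:
    """Run each subtask in isolation; the parent keeps summaries only."""
    parent_context: list[str] = []
    agents: list[SubAgent] = []
    for task, files in subtasks.items():
        agent = SubAgent(task)
        parent_context.append(agent.run(files))
        agents.append(agent)
    return parent_context, agents


# Hypothetical subtasks for the authentication refactor mentioned above.
subtasks = {
    "auth/login": {"login.py": "def login(): ...", "session.py": "class Session: ..."},
    "auth/tokens": {"tokens.py": "def issue(): ..."},
}
parent_context, agents = orchestrate(subtasks)
```

The point of the design is that the parent context grows by one summary line per subtask, however many files each worker had to read, so the contents of one subtask never crowd out another's.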

On the benchmark scorecards, they also show a delicate balance. GPT-5.3-Codex leads on the terminal-task benchmark Terminal-Bench 2.0 with a score of 77.3%, while Claude Code scored 80.8% on the more complex SWE-bench Verified leaderboard. Each has pushed its own strength to the extreme while working hard to compensate for its shortcomings.

The OpenClaw Effect: The Invisible Hand Toppling the Walls

If the internal strategies of the two companies determine the internal cause of their homogenization, then the pressure from the entire open-source ecosystem is an external force that cannot be ignored. Here, we must mention the profound impact OpenClaw has had on the entire AI programming tools track.

As a workflow framework launched by the open-source community, OpenClaw can be said to have toppled the ecosystem walls painstakingly built by the giants. It standardized the interaction between large models and local terminal toolchains. In the past, how to let a large model elegantly make local Git commits, how to safely run test scripts in a sandbox, and how to perform multi-step reasoning verification were all proprietary "secret sauce" that Codex and Claude Code were proud of.

But OpenClaw abstracted these processes into a universal protocol. This means that developers no longer need to be locked into a specific platform for a particular collaboration mode. The open-source community's enthusiasm made standardization an irreversible tide. Faced with this situation, both OpenAI and Anthropic had to climb down and make their tools compatible with this open standard.

When the underlying technical barriers were leveled by open-source forces like OpenClaw, and all advanced features became standard industry configuration, the only way forward for Codex and Claude Code was to compete ever more fiercely at the subtler level of user experience. This is also why they feel more and more similar: under a standardized framework, there is often only one optimal solution, just like convergent evolution in biology.

Codex is Catching Up to Claude Code

Although Claude Code and Codex are on the path of convergent evolution, differences between the two still exist, and Codex is even preferred by developers in some aspects.

The other day, on the r/ClaudeCode community, a senior engineer with 14 years of experience who had worked at tech giants, u/Canamerican726, shared an extremely hardcore evaluation.

Specifically, he invested 100 hours using Claude Code and 20 hours using Codex in a complex project containing 80,000 lines of code.

From his perspective, using Claude Code was like instructing an engineer chased by a deadline; it sprinted extremely fast but often ignored the specifications written by the developer in CLAUDE.md, and liked to continuously pile code into existing files to complete tasks, lacking refactoring thinking.

In contrast, Codex felt more like a steady veteran with 5 to 6 years of experience. It was 3 to 4 times slower, but it would proactively stop midway to think and refactor code, and it strictly adhered to instruction boundaries. This high degree of autonomy meant the engineer felt comfortable handing tasks straight to it and then going off to do other things with peace of mind.

Similar voices appear on social networks like X. Researcher Aran Komatsuzaki mentioned, based on his own experience, that Claude Code still has the advantage in front-end work, but in back-end planning and keeping information up to date, Codex, which frequently calls web search, is clearly more solid.

The comment section is filled with hard-won lessons from real business scenarios. Some developers pointed out very sharply that models based on Opus, although fast, often accumulate a large amount of "code cleaning debt" for projects; Codex is slow, but tidies up as it goes. I even saw users summarizing a survival rule: start a new session as soon as context window usage reaches 70%, otherwise it is extremely easy to pick up hidden bugs as an unwanted bonus.
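The 70% rule of thumb is easy to express as a check. This is a minimal sketch of the users' heuristic, not a feature of either tool: the function name and parameters are hypothetical, and integer arithmetic is used to avoid floating-point edge cases at the threshold.

```python
def should_start_new_session(
    used_tokens: int,
    window_tokens: int,
    threshold_percent: int = 70,  # the community rule of thumb quoted above
) -> bool:
    """Return True once context usage reaches the threshold share of the window."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    if used_tokens < 0:
        raise ValueError("used_tokens must not be negative")
    # Compare used/window >= threshold/100 without dividing.
    return used_tokens * 100 >= threshold_percent * window_tokens
```

With a 1,000,000-token window, for instance, `should_start_new_session(700_000, 1_000_000)` is true while `should_start_new_session(699_999, 1_000_000)` is false.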

These front-line complaints show clearly that as the two tools' capabilities increasingly overlap, what ultimately decides which camp a developer joins is often these small experience gaps in cleanup cost and maintenance overhead. Of course, there are some special difficulties for Chinese users, such as:

Cold Thinking: The Ecosystem Battle Behind Homogenization

Of course, the pros and cons of Codex and Claude Code also depend on the developers themselves and their own abilities. As summarized in the evaluation report by u/Canamerican726 mentioned above: If you don't understand software engineering, both tools will output poor results; tools are not equivalent to skills.

This sentence punctures an illusion long cultivated by AI programming tools. We once thought that with a powerful enough AI assistant, even a vibe coder with no foundation could single-handedly create enterprise-level applications. But the reality is that Claude Code needs an extremely focused and highly skilled "pilot," otherwise it can easily get lost in a huge codebase. Codex, although more independent, also requires developers to provide accurate system context to achieve maximum utility.

So, now that tool capabilities are highly homogenized, where have these two companies' moats moved to?

The answer lies in those boring financial statements and pricing strategies. For the same task, Claude Code often consumes 3 to 4 times as many tokens as Codex, so its usage cost is higher. For enterprise teams, using Claude Code costs $100 to $200 per developer per month. Codex, on the other hand, bundles its capabilities into more affordable subscription plans and has accumulated a large number of basic users through the vast GitHub community.
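To make these figures concrete, the arithmetic can be written out. The sketch below uses only the ranges quoted above ($100 to $200 per developer per month, 3 to 4 times the tokens); the function names, the team size, and the baseline token count are hypothetical inputs chosen for illustration.

```python
def monthly_team_cost(developers: int, low_usd: int = 100, high_usd: int = 200) -> tuple[int, int]:
    """Low and high monthly cost in USD for a team, using the quoted per-seat range."""
    return developers * low_usd, developers * high_usd


def token_range(baseline_tokens: int, low_multiplier: int = 3, high_multiplier: int = 4) -> tuple[int, int]:
    """Token usage implied by the quoted 3x to 4x multiplier over a baseline."""
    return baseline_tokens * low_multiplier, baseline_tokens * high_multiplier


# Hypothetical inputs: a 10-person team and a task that costs the baseline
# tool 1,000,000 tokens.
team_low, team_high = monthly_team_cost(10)
tokens_low, tokens_high = token_range(1_000_000)
```

Under these assumptions the team would pay between $1,000 and $2,000 per month, and the same task would consume between 3,000,000 and 4,000,000 tokens.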

Image source: MorphLLM

Anthropic's ambition is to embed Claude Code deeply into the workflows of tech giants that are not short of money. For example, Stripe had 1,370 engineers use Claude Code to complete, in 4 days, a cross-language code migration that would have taken 10 people weeks. Ramp relied on it to reduce incident response time by 80%. OpenAI, relying on its ubiquitous ecosystem penetration, has made Codex the default choice for many ordinary developers.

This is no longer a purely technical competition, but a war of attrition over ecosystem lock-in, pricing strategy, and the reshaping of user habits.

The Developer's Crossroads

Looking back at the technological evolution of the past year, the release of GPT-5.4-Cyber is just a small footnote in this long battle. Codex and Claude Code moving towards "the same face" marks the official entry of AI programming tools from an early testing phase full of variables and novelty into a mature and boring industrialized production phase.

Now, Claude Code automatically generates 135,000 GitHub commits daily, a number that already accounts for 4% of all public commits on the entire network. We can foresee that in the near future, most boilerplate code, basic test cases, and routine code refactoring will be silently completed in the background by these AI agents that look more and more alike.

Image source: MorphLLM & SemiAnalysis / GitHub Search API

Facing two super tools that are converging in capability and imitating each other in experience, what core value do we, as human developers, have left? Perhaps the era of easy gains from tooling is about to end completely. When everyone holds equally sharp weapons, what truly determines victory will no longer be who has faster code completion, but who can better define problems, who has a broader view of system architecture, and who can find the irreplaceable human contribution in a code world filled with AI.

By the way, which one do you choose?

Reference Links

https://www.morphllm.com/comparisons/codex-vs-claude-code

https://www.reddit.com/r/ClaudeCode/comments/1sk7e2k/claude_code_100_hours_vs_codex_20_hours/

https://x.com/arankomatsuzaki/status/2044270102003196007

https://www.nytimes.com/2026/04/14/technology/openai-cybersecurity-gpt54-cyber.html

This article is from the WeChat public account "机器之心" (ID: almosthuman2014), author: 机器之心 (Machine Heart)

Related Questions

Q: What is the main trend observed between OpenAI's Codex and Anthropic's Claude Code according to the article?

A: The main trend is that Codex and Claude Code are becoming increasingly similar and homogeneous in their capabilities and approaches, evolving from distinct technical philosophies to convergent solutions.

Q: How did the initial technical philosophies of Codex and Claude Code differ?

A: Codex was initially designed as a lightweight, high-interaction terminal agent focused on speed and iterative programming, while Claude Code was built as a high-level 'architect' focused on handling extremely complex tasks with a massive context window and thorough codebase analysis.

Q: What external factor is cited as a significant force pushing Codex and Claude Code towards standardization and homogeneity?

A: The OpenClaw open-source workflow framework is cited as a major external force that standardized the interaction between large models and local toolchains, breaking down proprietary barriers and forcing both platforms to adopt common protocols.

Q: According to user feedback, what is a key practical difference in how Codex and Claude Code handle complex coding tasks?

A: User feedback indicates that Claude Code often works very fast but can accumulate 'code cleaning debt' by ignoring specifications and stacking code, while Codex is slower but more thoughtful, proactively refactoring code and strictly adhering to instruction boundaries.

Q: Where has the competitive battleground between Codex and Claude Code shifted, now that their technical capabilities are converging?

A: The competition has shifted to ecosystem strategy, pricing models, and user habit formation, with Anthropic targeting deep integration into well-funded enterprise workflows and OpenAI leveraging its broad GitHub community penetration and more affordable subscription plans.
