The More Frequently They Are Updated, the More Similar Claude Code and Codex Become

marsbit | Published on 2026-04-19 | Last updated on 2026-04-19

Summary

OpenAI's recent release of GPT-5.4-Cyber demonstrates a striking convergence with Anthropic's Claude Mythos, reflecting a broader trend of product and strategic alignment between the two AI giants. This is particularly evident in their flagship coding assistants, Codex and Claude Code, which have evolved from distinct philosophies into increasingly similar tools. Initially, Codex emphasized speed and real-time interaction, acting like a fast, junior developer, while Claude Code focused on handling extreme complexity with methodical, large-context analysis. However, both have adopted near-identical solutions to core challenges, such as using isolated sub-tasks or agent teams to prevent context pollution during large-scale code modifications. Benchmark results show a tight race: Codex leads in terminal tasks, while Claude Code excels in complex software engineering benchmarks. Community feedback highlights nuanced differences; Claude Code is faster but can accumulate technical debt, whereas Codex is slower but more deliberate and autonomous. The open-source framework OpenClaw has accelerated this homogenization by standardizing workflows, eroding proprietary advantages. Ultimately, the competition has shifted from pure capability to ecosystem strategy, pricing, and user experience. As these tools become ubiquitous, the developer's role evolves toward higher-level problem definition and architectural thinking, beyond automated code generation.

A few days ago, OpenAI officially released a new large model, GPT-5.4-Cyber. Like many netizens, we came away with an extremely strong sense of déjà vu.

This new model almost completely mirrors Anthropic's recently released Claude Mythos in target users, application scenarios, and even promotional strategy. The head-to-head posture is now completely unabashed. Even The New York Times made the point bluntly in the headline of its latest report: "Like Anthropic, OpenAI...".

This trend of homogenization is by no means limited to the underlying base models. If you look at the series of products recently released by these two companies, you will find that they are becoming mirror images of each other!

Under the glare of the capital markets, this convergence is even more obvious. The two companies' valuations on the secondary market are now very close, and Anthropic's has recently edged slightly above OpenAI's, thanks to its rapid advance in the enterprise market. Capital has the keenest nose, and in investors' eyes these two unicorns are growing the same horns.

It seems that the homogenization of the underlying large models will inevitably lead to the convergence of upper-layer applications.

Today, I want to discuss the two flagship tools that represent the state of the art in AI-assisted programming: OpenAI's Codex and Anthropic's Claude Code. Once on separate paths and now converging on the same one, how did they gradually come to look alike?

From Divergence to Convergence: The Evolution History of the Two Titans

Rewind the clock a few years, and Codex and Claude Code were products of completely different technological philosophies.

Codex's underlying philosophy is "in martial arts, only speed is unbeatable." It is like a senior developer with 5 years of experience who follows right behind you, ready to complete your code at any moment.

As OpenAI conceived it, Codex is a lightweight, highly interactive terminal agent focused on rapid iteration and interactive programming. It is extremely fast: running on Cerebras WSE-3 hardware, it can reach a throughput of 1,000 tokens per second. In day-to-day workflows, Codex offers three clear approval modes (suggest, auto-edit, and full-auto) that keep the developer in the loop. This design suits hacker-minded developers who need to build prototypes quickly and work in high-frequency interaction loops.
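To make the three approval modes concrete, here is a minimal sketch of how such a policy gate could work. The mode names come from the text, but the `ApprovalMode` enum, the `needs_approval` function, the action categories, and the exact gating rules are hypothetical illustrations based on our own assumptions, not Codex's actual implementation.

```python
from enum import Enum


class ApprovalMode(Enum):
    """Hypothetical model of the three approval modes described above."""
    SUGGEST = "suggest"
    AUTO_EDIT = "auto-edit"
    FULL_AUTO = "full-auto"


# Assumed action categories an agent might request (illustrative only).
READ_FILE = "read_file"
WRITE_FILE = "write_file"
RUN_COMMAND = "run_command"


def needs_approval(mode: ApprovalMode, action: str) -> bool:
    """Return True if the developer must confirm `action` under `mode`.

    Assumed policy: reads never need approval; SUGGEST gates every write
    and command, AUTO_EDIT gates only commands, FULL_AUTO gates nothing.
    """
    if action == READ_FILE:
        return False
    if mode is ApprovalMode.SUGGEST:
        return True
    if mode is ApprovalMode.AUTO_EDIT:
        return action == RUN_COMMAND
    return False  # FULL_AUTO: the agent proceeds on its own


if __name__ == "__main__":
    for mode in ApprovalMode:
        gated = [a for a in (READ_FILE, WRITE_FILE, RUN_COMMAND)
                 if needs_approval(mode, a)]
        print(f"{mode.value:>9}: approval needed for {gated}")
```

Under these assumptions, the modes differ only in which side-effecting actions pause for a human, and that dial is what keeps the developer in or out of the loop.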

In contrast, Claude Code carried the cool, restrained temperament of an "architect" from the day it was born.

Anthropic gave it the DNA to handle extremely complex tasks. It relies on a massive context window of up to 1 million tokens and a distinctive context "compression" technique that makes conversations effectively unlimited in length. Claude Code's creed is "global control: plan before acting." Before taking any action, it uses agentic search to thoroughly understand the context of the entire codebase, then coordinates consistent changes across multiple files. On enterprise-level refactoring tasks involving migrations of tens of thousands of lines of code, Claude Code has shown remarkable dominance.
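The article does not explain how the "compression" works. One common way to keep a long conversation inside a fixed window is to fold older turns into a summary once a token budget is exceeded. The sketch below illustrates only that general idea, under stated assumptions: tokens are approximated by whitespace-separated words, and `summarize` is a hypothetical placeholder for what would be a model call in practice. It is not Anthropic's implementation.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(messages: List[str]) -> str:
    # Hypothetical placeholder: a real agent would ask the model to write this.
    return f"[summary of {len(messages)} earlier messages]"


def compact(history: List[str], budget: int, keep_recent: int = 4) -> List[str]:
    """Fold older messages into a single summary once `budget` is exceeded.

    The newest `keep_recent` messages stay verbatim so the agent keeps its
    immediate working context; everything before them is summarized.
    """
    total = sum(count_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    return [summarize(history[:split])] + history[split:]


if __name__ == "__main__":
    history = [f"message {i} " + "word " * 50 for i in range(10)]
    compacted = compact(history, budget=300)
    print(len(history), "->", len(compacted), "|", compacted[0])
```

The design trade-off is that the summary loses detail, so what survives compaction depends entirely on how good the summarization step is.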

However, as time passed and application scenarios continued to expand, these two tools, which originally had very different personalities, began to copy each other's homework.

Image source: MorphLLM

The biggest bottleneck a monolithic AI model faces when handling complex projects is context pollution. You ask the AI to refactor an authentication module; after it reads 40 files, it often forgets the design pattern of the first file. To solve this pain point, the two companies came up with almost identical answers: assign an independent context window for each subtask.
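The idea of one context window per subtask can be sketched as follows. Each hypothetical `SubAgent` starts from a fresh context that holds only its own task and the files it reads, and the orchestrator keeps just a one-line result from each. That way the 40 files read while refactoring one module never crowd the orchestrator's view. The class and method names are illustrative assumptions, and the "work" is simulated rather than performed by a model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubAgent:
    """Hypothetical worker with its own private context window."""
    task: str
    context: List[str] = field(default_factory=list)

    def run(self, files: Dict[str, str]) -> str:
        self.context.append(f"TASK: {self.task}")
        for name, body in files.items():
            # File contents land only in this sub-agent's context.
            self.context.append(f"{name}: {body}")
        # Only a compact result string is returned to the caller.
        return f"{self.task}: done after reading {len(files)} files"


@dataclass
class Orchestrator:
    """Coordinator that never sees the sub-agents' raw file contents."""
    context: List[str] = field(default_factory=list)

    def delegate(self, task: str, files: Dict[str, str]) -> str:
        worker = SubAgent(task)  # fresh, empty context for each subtask
        result = worker.run(files)
        self.context.append(result)  # orchestrator keeps only the summary
        return result
```

For example, delegating "refactor auth" with 40 files leaves a single line in the orchestrator's context, however large those files are.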

OpenAI quickly launched a new macOS desktop application, isolating tasks into different threads by project and running them independently in a cloud sandbox. Anthropic introduced an agent team architecture, allowing developers to spawn multiple sub-agents that share task lists and dependencies and work in parallel in their own independent windows. You'll find that whether it's called a "cloud sandbox" or an "agent team," their core engineering concepts have completely converged.
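The "shared task list with dependencies" idea can be illustrated with a small scheduler. Tasks whose dependencies are all finished are dispatched together in a "wave", and each one goes to its own worker. This is a minimal sketch built on Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool. The task names and the `run_task` worker are hypothetical stand-ins for sub-agents working in their own windows, not either vendor's actual architecture.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def run_task(name: str) -> str:
    # Hypothetical stand-in for a sub-agent working in its own context window.
    return f"{name} finished"


def run_team(tasks: Dict[str, Set[str]], max_workers: int = 4) -> List[List[str]]:
    """Run tasks in dependency order, executing each ready 'wave' in parallel.

    `tasks` maps each task to the set of tasks it depends on.
    Returns the waves in the order they were executed.
    """
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # sorted for deterministic output
            list(pool.map(run_task, ready))  # one worker per ready task
            sorter.done(*ready)
            waves.append(ready)
    return waves


if __name__ == "__main__":
    plan = {
        "design schema": set(),
        "write API": {"design schema"},
        "write UI": {"design schema"},
        "integration tests": {"write API", "write UI"},
    }
    print(run_team(plan))
```

With this hypothetical plan, "write API" and "write UI" run side by side once the schema exists, and the integration tests wait for both.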

On benchmark scorecards, the two are also finely balanced. GPT-5.3-Codex leads on the terminal-task benchmark Terminal-Bench 2.0 with a score of 77.3%, while Claude Code scored 80.8% on the more complex SWE-bench Verified leaderboard. Each has pushed its strengths to the limit while scrambling to make up for its own shortcomings.

The OpenClaw Effect: The Invisible Hand Toppling the Walls

If the two companies' own strategies are the internal cause of this homogenization, then pressure from the wider open-source ecosystem is an external force that cannot be ignored. Here we must mention the profound impact OpenClaw has had on the entire AI programming tools sector.

OpenClaw, a workflow framework launched by the open-source community, can fairly be said to have toppled the ecosystem walls the giants had painstakingly built. It standardized how large models interact with local terminal toolchains. In the past, how to let a model make local Git commits elegantly, how to run test scripts safely in a sandbox, and how to perform multi-step reasoning verification were all proprietary "secret sauce" that Codex and Claude Code took pride in.

But OpenClaw abstracted these processes into a universal protocol. This means developers no longer need to be locked into a specific platform to get a particular mode of collaboration. The open-source community's enthusiasm turned standardization into an irreversible tide. Faced with this, both OpenAI and Anthropic had to lower their stance and become compatible with this open standard.
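The article does not specify OpenClaw's protocol, so the following is only a hypothetical illustration of what "abstracting tool use into a universal protocol" means in practice. The model emits a structured request that names a tool and its arguments, and a thin dispatcher maps that request to a local action. The JSON message shape, the tool names (`git_status`, `run_tests`), and the registry are our assumptions, not OpenClaw's actual interface. The tools are simulated so the example stays self-contained and free of side effects.

```python
import json
from typing import Any, Callable, Dict

# Registry mapping protocol-level tool names to local handlers (hypothetical tools).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a handler under a protocol-level name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("git_status")
def git_status() -> str:
    # Simulated; a real handler might shell out to `git status --short`.
    return "M src/auth.py"


@tool("run_tests")
def run_tests(path: str) -> Dict[str, Any]:
    # Simulated test run; a real handler would execute tests in a sandbox.
    return {"path": path, "passed": 12, "failed": 0}


def dispatch(raw: str) -> str:
    """Decode a JSON tool request, call the matching handler, encode the reply."""
    request = json.loads(raw)
    handler = TOOLS.get(request["tool"])
    if handler is None:
        return json.dumps({"id": request.get("id"), "error": f"unknown tool {request['tool']!r}"})
    result = handler(**request.get("args", {}))
    return json.dumps({"id": request.get("id"), "result": result})


if __name__ == "__main__":
    print(dispatch('{"id": 1, "tool": "run_tests", "args": {"path": "tests/"}}'))
```

Because any model that can emit this request shape can drive any registered tool, such a layer is what would make the collaboration mode portable across platforms.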

Once open-source forces like OpenClaw had leveled the underlying technical barriers and every advanced feature had become standard industry configuration, the only way forward for Codex and Claude Code was an endless, cutthroat race over ever finer details of user experience. This is also why they feel increasingly alike: within a standardized framework there is often only one optimal solution, much like convergent evolution in biology.

Codex is Catching Up to Claude Code

Although Claude Code and Codex are on a path of convergent evolution, differences between them remain, and in some respects developers even prefer Codex.

A few days ago, in the r/ClaudeCode subreddit, u/Canamerican726, a senior engineer with 14 years of experience who has worked at major tech companies, shared an extremely hands-on evaluation.

Specifically, he invested 100 hours using Claude Code and 20 hours using Codex in a complex project containing 80,000 lines of code.

From his perspective, using Claude Code was like directing an engineer racing a deadline. It sprinted extremely fast but often ignored the specifications the developer had written in CLAUDE.md, and it tended to keep piling code into existing files to finish tasks, with little thought for refactoring.

In contrast, Codex felt more like a steady veteran with 5 to 6 years of experience. It was 3 to 4 times slower, but it would proactively stop to think and refactor code midway, and it strictly respected instruction boundaries. This high degree of autonomy meant the engineer could hand tasks straight to it and then go do other things with peace of mind.

Similar views appear on social networks like X. Researcher Aran Komatsuzaki said, based on his own experience, that Claude Code still has the edge in front-end work, but for back-end planning and keeping information up to date, Codex, which calls web search frequently, is clearly more solid.

The comment sections are full of hard-won lessons from real business scenarios. Some developers pointed out, very sharply, that Opus-based models, although fast, often saddle projects with a large amount of "code cleaning debt," whereas Codex is slow but tidies up as it goes. I even saw users sharing a survival rule: start a new session as soon as context window usage reaches 70%, or you are very likely to receive hidden bugs as a free "bonus" from the system.
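The community's 70% rule of thumb is easy to encode. In this sketch, `should_start_new_session` is a hypothetical helper, not a feature of either tool. The 200,000-token default window is also an assumption chosen for illustration, since the article gives no window size for this rule.

```python
def context_usage(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window currently occupied."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens


def should_start_new_session(used_tokens: int,
                             window_tokens: int = 200_000,
                             threshold: float = 0.70) -> bool:
    """Apply the community rule of thumb: start fresh at >= 70% usage.

    The 200,000-token default window is an assumption for illustration.
    """
    return context_usage(used_tokens, window_tokens) >= threshold


if __name__ == "__main__":
    for used in (90_000, 170_000):
        print(used, should_start_new_session(used))
```

The threshold is a parameter rather than a constant because the rule is folk wisdom: a larger window, such as the 1-million-token one mentioned earlier, simply moves the cut-off point.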

These real complaints from the front line clearly show that when the ability panels of the two great tools increasingly overlap, what ultimately determines which camp developers belong to is often these tiny experience gaps related to "pit-filling costs" and "maintenance mental load." Of course, there are some special difficulties for Chinese users, such as:

Cold Thinking: The Ecosystem Battle Behind Homogenization

Of course, the relative merits of Codex and Claude Code also depend on the developers themselves and their skills. As u/Canamerican726 concluded in the evaluation mentioned above: if you don't understand software engineering, both tools will produce poor results, because tools are no substitute for skill.

This punctures an illusion that AI programming tools have long fostered. We once believed that with a powerful enough AI assistant, even a vibe coder with no foundation could single-handedly build enterprise-grade applications. In reality, Claude Code needs a highly focused and highly skilled "pilot," or it can easily get lost in a huge codebase. Codex, although more independent, also needs developers to provide accurate system context before it can deliver its full value.

So, now that the two tools' capabilities are so homogenized, where have the two companies' moats moved?

The answer lies in dry financial statements and pricing strategies. For the same task, Claude Code often consumes 3 to 4 times as many tokens as Codex, so it costs more to use: for enterprise teams, Claude Code runs $100 to $200 per developer per month. Codex, by contrast, bundles its capabilities into more affordable subscription plans and has built a large base of everyday users through the vast GitHub community.
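A quick back-of-the-envelope calculation shows what the quoted figures mean at team scale. The sketch uses only the numbers stated in the text: $100 to $200 per developer per month, and 3 to 4 times as many tokens as Codex for the same task. The team sizes and the token count in the usage are hypothetical. The sketch does not price Codex, for which the article gives no figure.

```python
from typing import Tuple

# Per-developer monthly price range for enterprise teams, as quoted in the text.
CLAUDE_CODE_MONTHLY_USD = (100, 200)
# Token multiplier for the same task relative to Codex, as quoted in the text.
TOKEN_MULTIPLIER = (3, 4)


def team_cost_range(developers: int, months: int = 12) -> Tuple[int, int]:
    """Low/high spend in USD for a team over `months` at the quoted prices."""
    low, high = CLAUDE_CODE_MONTHLY_USD
    return low * developers * months, high * developers * months


def claude_tokens_range(codex_tokens: int) -> Tuple[int, int]:
    """Token range for Claude Code on a task where Codex uses `codex_tokens`."""
    lo, hi = TOKEN_MULTIPLIER
    return lo * codex_tokens, hi * codex_tokens


if __name__ == "__main__":
    for team in (10, 50):  # hypothetical team sizes
        low, high = team_cost_range(team)
        print(f"{team} developers: ${low:,} to ${high:,} per year")
    print(claude_tokens_range(50_000))  # hypothetical Codex token count
```

For a 10-person team, the quoted range works out to $12,000 to $24,000 a year.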

Image source: MorphLLM

Anthropic's ambition is to embed Claude Code deeply into the workflows of deep-pocketed tech companies. At Stripe, for example, 1,370 engineers use Claude Code, and a cross-language code migration that would have taken 10 people weeks was completed in 4 days. Ramp used it to cut incident response time by 80%. OpenAI, meanwhile, has used its pervasive ecosystem reach to make Codex the default choice for many ordinary developers.

This is no longer a purely technical contest but a war of attrition over ecosystem lock-in, pricing strategy, and the reshaping of user habits.

The Developer's Crossroads

Looking back at the technological evolution of the past year, the release of GPT-5.4-Cyber is just a small footnote in this long battle. Codex and Claude Code moving towards "the same face" marks the official entry of AI programming tools from an early testing phase full of variables and novelty into a mature and boring industrialized production phase.

Claude Code now automatically generates 135,000 GitHub commits a day, already 4% of all public commits. We can foresee that in the near future, most boilerplate code, basic test cases, and routine refactoring will be completed silently in the background by these increasingly similar AI agents.
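The two figures above also imply the size of the denominator. If 135,000 commits are 4% of all public commits, the daily total is 135,000 / 0.04, or about 3.4 million. The short calculation below simply makes that division explicit and introduces no data beyond the article's two numbers. The helper name `implied_total` is our own.

```python
CLAUDE_CODE_COMMITS_PER_DAY = 135_000  # figure quoted in the article
SHARE_OF_PUBLIC_COMMITS = 0.04         # 4%, as quoted in the article


def implied_total(part: float, share: float) -> float:
    """Total implied by a part and the share of the total it represents."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return part / share


total = implied_total(CLAUDE_CODE_COMMITS_PER_DAY, SHARE_OF_PUBLIC_COMMITS)
print(f"Implied public commits per day: {total:,.0f}")
```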

Image source: MorphLLM & SemiAnalysis / GitHub Search API

Facing two super-tools that keep converging in capability and imitating each other in experience, what core value do we human developers have left? Perhaps the era of easy gains from tools is about to end for good. When everyone wields equally sharp weapons, what truly decides the outcome will no longer be who has faster code completion, but who can define problems better, who has a broader architectural vision, and who can find what is uniquely and irreplaceably human in a code world filled with AI.

By the way, which one do you choose?

Reference Links

https://www.morphllm.com/comparisons/codex-vs-claude-code

https://www.reddit.com/r/ClaudeCode/comments/1sk7e2k/claude_code_100_hours_vs_codex_20_hours/

https://x.com/arankomatsuzaki/status/2044270102003196007

https://www.nytimes.com/2026/04/14/technology/openai-cybersecurity-gpt54-cyber.html

This article is from the WeChat public account "机器之心" (ID: almosthuman2014), author: 机器之心 (Machine Heart)

Related Questions

Q: What is the main trend observed between OpenAI's Codex and Anthropic's Claude Code according to the article?

A: The main trend is that Codex and Claude Code are becoming increasingly similar and homogeneous in their capabilities and approaches, evolving from distinct technical philosophies to convergent solutions.

Q: How did the initial technical philosophies of Codex and Claude Code differ?

A: Codex was initially designed as a lightweight, high-interaction terminal agent focused on speed and iterative programming, while Claude Code was built as a high-level "architect" focused on handling extremely complex tasks with a massive context window and thorough codebase analysis.

Q: What external factor is cited as a significant force pushing Codex and Claude Code towards standardization and homogeneity?

A: The OpenClaw open-source workflow framework is cited as a major external force that standardized the interaction between large models and local toolchains, breaking down proprietary barriers and forcing both platforms to adopt common protocols.

Q: According to user feedback, what is a key practical difference in how Codex and Claude Code handle complex coding tasks?

A: User feedback indicates that Claude Code often works very fast but can accumulate "code cleaning debt" by ignoring specifications and stacking code, while Codex is slower but more thoughtful, proactively refactoring code and strictly adhering to instruction boundaries.

Q: Where has the competitive battleground between Codex and Claude Code shifted, now that their technical capabilities are converging?

A: The competition has shifted to ecosystem strategy, pricing models, and user habit formation, with Anthropic targeting deep integration into well-funded enterprise workflows and OpenAI leveraging its broad GitHub community penetration and more affordable subscription plans.
