Just Now, Chinese AI Enters Top 2 in Global Programming, Only Claude Remains Ahead

marsbit | Published on 2026-05-27 | Last updated on 2026-05-27

Abstract

**China's AI Ranks Second Globally in Programming, Trailing Only Claude**

Today, Alibaba's Qwen3.7-Max scored 1541 on the Code Arena benchmark, securing fourth place globally and surpassing top models like GPT-5.5 and Gemini 3.5 Flash. It is now the only non-Claude model among the top positions, which puts Alibaba second overall behind Anthropic and its Opus models. Before this official ranking, Qwen3.7-Max had already gained recognition overseas. In practical tests, it outperformed rivals on tasks like creating a self-training Tetris AI and generating complex 3D models, often at a significantly lower cost. Developers praised its ability, especially when integrated with tools like Hermes Agent and OpenCode, to effectively replace models such as GPT-5.5. In a hands-on challenge to create a 3D racing game from a detailed prompt, Qwen3.7-Max delivered a playable HTML file on the first attempt, requiring only a minor bug fix. It was the only model to include a start menu and sound effects, details the other models missed. Competitors like Gemini 3.5 Flash and Claude Opus 4.6 produced less polished or less functional versions, and GPT-5.5 had its own quirks, while Qwen3.7-Max stood out for its initial completeness and playability. This performance stems from its design as an "Agent Foundation Model," built for long-duration, autonomous task execution. Internal tests show it can run continuously for 35 hours, making 1,158 tool calls without context degradation or instruction drift. Key technical ...

Today, the latest Code Arena leaderboard is out!

Qwen3.7-Max, with a score of 1541 points, broke into the global top four, surpassing top-tier models like GPT-5.5 and Gemini 3.5 Flash.

Ahead of it, only Claude Opus 4.7 and Opus 4.6 remain.

In other words, ranked by vendor, Alibaba sits second only to Anthropic in the global arena for programming models, and it is the only Chinese player to make it to this top table.

Qwen3.7-Max Breaks into Global Top Five

The Only Non-Claude Model

Even before the Code Arena leaderboard was released, Qwen3.7-Max had already made a name for itself among overseas developer communities.

Atomic Chat conducted a head-to-head comparison, pitting Opus 4.7, GPT-5.5, and Qwen3.7-Max against each other on a task to write a self-training Tetris AI.

The result? Qwen3.7-Max outperformed both Opus 4.7 and GPT-5.5 at a token cost of just $1.32, with a reported performance improvement of 56%.

Another overseas developer had Qwen3.7-Max build a 3D model of the universe, and the result was described as stunning.

In the task of generating a "3D Pixel Art Miniature Pagoda Model," Qwen3.7-Max also came out ahead on both output speed and quality.

Developer Paul Couvert praised Qwen3.7-Max highly, stating that once integrated with Hermes Agent and OpenCode, it could basically replace GPT-5.5 and Opus 4.7.

Programming, A True Contender

However, scores are one thing; real-world testing is another.

We arranged a hardcore "Racing Game" challenge for Qwen3.7-Max.

Given a detailed prompt, Qwen3.7-Max quickly output a playable HTML file.

The first version had a small bug: the A/D steering keys were reversed.

But after a second round of simple conversational refinement, a fully featured 3D racing game was up and running.

The moment it opened, to be honest, was a bit of a shock.

Four cars race together on a three-lap circular track with over 100 coins scattered along it, and hitting an obstacle causes a slowdown and loss of control.

The post-race results panel had everything: ranking, time, coins collected, and fastest lap.

But what was truly surprising were two details that only Qwen3.7-Max got right.

One was the start screen. In a side-by-side test of four models, only Qwen3.7-Max created a proper start page for the game, entering the race only after the player clicks "Start." The other three went straight into racing without even a title screen.

The other was sound effects. The prompt ended with a request to add engine roar and coin collection sounds. Of the four models, only Qwen3.7-Max delivered on this bonus requirement, adding engine sounds and coin dings.

Now let's look at the performance of the other contestants.

Gemini 3.5 Flash's visuals were noticeably flatter, lacking that immersive three-dimensional feel.

The UI layout was also problematic: dashboard information was spread across the four corners of the screen, leaving no clear visual focus.

In contrast, Qwen3.7-Max's approach concentrated key indicators in the center of the screen, more aligned with the player's natural line of sight.

Claude Opus 4.6's result was somewhat... hard to describe.

Not only were there pitifully few coins on the track, but the three AI cars also moved almost in sync, with no randomness, as if copied and pasted.

Finally, GPT-5.5.

Its visual quality was indeed better than that of the previous two, and the controls felt smoother.

But for some reason, the coins were rendered as yellow "donuts"...

The shape is a minor issue. The key point is that Gemini 3.5 Flash, Claude Opus 4.6, and GPT-5.5 all required several rounds of bug fixes to get every feature running.

Only Qwen3.7-Max's first-round generation was basically playable.

Comparable benchmark scores, solid real-world performance, and a fraction of the price: the rest is up to developers, who will vote with their feet.

The "Foundation" Model for the Agent Era

The reason Qwen3.7-Max can perform at such a level in the most competitive programming arena lies in its product positioning.

A few days ago, when Alibaba released Qwen3.7-Max, they gave it a very special label: Agent Foundation Model.

It was designed from the outset for long-duration autonomous task execution.

Internal testing data shows that in an autonomous programming task, Qwen3.7-Max ran continuously for 35 hours, executing 1,158 tool calls.

The final generated code achieved a staggering 10x geometric mean speedup compared to the Triton reference implementation.
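For readers unfamiliar with the metric, here is a minimal sketch of how a geometric mean speedup is typically computed. The function name and the timings are made up for illustration and are not taken from Alibaba's internal tests; the per-kernel speedup is assumed to be baseline time divided by optimized time.

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-kernel speedups (baseline time / optimized time)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need two non-empty lists of equal length")
    # Averaging the logs keeps one extreme kernel from dominating the result.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))


# Hypothetical timings in milliseconds (illustrative only).
baseline = [8.0, 16.0, 12.0]
optimized = [4.0, 2.0, 3.0]  # per-kernel speedups: 2x, 8x, 4x

gmean = geometric_mean_speedup(baseline, optimized)
amean = sum(b / o for b, o in zip(baseline, optimized)) / len(baseline)
print(f"geometric mean: {gmean:.2f}x, arithmetic mean: {amean:.2f}x")
```

With these made-up numbers the geometric mean lands near 4x while the arithmetic mean is higher, which is why benchmark reports usually prefer the geometric mean for ratios.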

Even more impressive is its "endurance": more than 30 hours into the reasoning process, the model remained sharp, continuously uncovering new room for optimization.

Throughout, there was zero context degradation, zero instruction drift, and zero infinite loops!

It must be said that the difficulty isn't in the 1,000 tool calls themselves. Since MCP (Model Context Protocol) adoption expanded, calling tools 1,000 times isn't that rare.

The difficulty lies in 35 hours of coherent reasoning.

Most models break down on long tasks: either the context becomes increasingly messy and they forget the goals set at the beginning, or they get stuck in infinite loops, repeatedly attempting the same failed solution.

Qwen3.7-Max has made "continuously doing the right thing" a reality.
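To make the second failure mode concrete, here is a minimal sketch of how an agent harness might detect a run that keeps retrying the same failed tool call. This is a harness-side illustration only: the `RepeatGuard` class, its threshold, and the tool names are hypothetical, and nothing here describes how Qwen3.7-Max itself avoids such loops.

```python
from collections import Counter


class RepeatGuard:
    """Flags an agent that keeps retrying the same failed tool call."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self._failures = Counter()

    def record(self, tool, args, succeeded):
        """Record one tool call; return True if the run looks stuck."""
        # Identical tool name plus identical arguments counts as the same attempt.
        key = (tool, repr(sorted(args.items())))
        if succeeded:
            # A success clears the failure history for this exact call.
            self._failures.pop(key, None)
            return False
        self._failures[key] += 1
        return self._failures[key] >= self.max_repeats


# Hypothetical usage: the same failing call three times in a row.
guard = RepeatGuard(max_repeats=3)
stuck = [
    guard.record("run_tests", {"path": "tests/"}, succeeded=False)
    for _ in range(3)
]
print(stuck)
```

In this sketch the third identical failure trips the guard, at which point a harness could force the agent to change strategy or stop the run.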

Revealing the Core Technology

We understand that this leap in programming for Qwen3.7-Max likely stems from upgrades in two key training methods.

First, Environmental Expansion.

During programming training for Qwen3.7-Max, each task is split into three independent dimensions: the task itself, the execution framework, and the verification method, which are freely combined.

The same problem might be solved in the Claude Code framework one time, in OpenClaw another time, and with a different verification method on yet another.

The effect is like an intern being rotated through all project teams. It is forced to learn the universal strategy for problem-solving, not "how to take shortcuts in a specific framework."

This explains a counterintuitive phenomenon: Qwen3.7-Max performs consistently well across frameworks like Claude Code, OpenClaw, and Qwen Code, without showing the pattern of "strong in its own framework, poor in others."
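The "freely combined" idea can be sketched as a cross product over the three dimensions. The task and verifier names below are placeholders invented for illustration; only the three framework names come from the article, and the actual training pipeline is not public in this text.

```python
import itertools
import random

# Illustrative placeholders, not the actual training configuration.
TASKS = ["fix_failing_test", "add_cli_flag", "optimize_kernel"]
FRAMEWORKS = ["Claude Code", "OpenClaw", "Qwen Code"]
VERIFIERS = ["unit_tests", "output_diff", "benchmark_threshold"]


def build_environments(tasks, frameworks, verifiers):
    """Cross the three independent dimensions into training environments."""
    return list(itertools.product(tasks, frameworks, verifiers))


def sample_batch(environments, batch_size, seed=0):
    """Draw distinct environments so one task shows up under varied setups."""
    rng = random.Random(seed)
    return rng.sample(environments, batch_size)


envs = build_environments(TASKS, FRAMEWORKS, VERIFIERS)
print(len(envs))  # 3 tasks x 3 frameworks x 3 verifiers = 27

for task, framework, verifier in sample_batch(envs, 5):
    print(f"{task} | {framework} | {verifier}")
```

Because every task appears under every framework and verifier, a strategy that only works in one framework would be penalized in the others, which is the intuition behind the intern analogy.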

Second, Long-Horizon Autonomous Execution.

In training, the team introduced a "Dynamic Accumulative Survival Game" framework.

This means having the model make over a thousand consecutive decisions in a continuously changing simulated environment, forming its own hypotheses, adjusting strategies based on feedback, and avoiding the "context corruption" that comes from running too long.

Here's a telling data point: in the YC-Bench simulation of running a startup for a full year, Qwen3.7-Max achieved $2.08 million in revenue, roughly double that of the previous generation ($1.05 million).

More crucially, it demonstrated strategic evolution: autonomously adjusting direction mid-term during a crisis, identifying and blocking malicious clients, eventually converging to a stable execution loop.

This is the underlying support for the 35-hour kernel optimization case, and it explains why Qwen3.7-Max achieved a speedup in 96% of scenarios on Kernel Bench L3.

And programming is just the first battlefield. This foundation of long-horizon reasoning and tool calling points to a greater ambition: a universal Agent foundation.

The Programming Finals Have a New Disruptor

Since its launch, Code Arena has always tested hard skills: multi-step reasoning, tool orchestration, and complete project delivery. All of them are real, Agent-level challenges.

Today, with a score of 1541 points, Qwen3.7-Max wedged itself into fourth place, positioned between Opus 4.6 Thinking and Opus 4.6.

On this track where Claude has dominated for over half a year, it has given its answer: Chinese models are not just followers; they can also be definers.

The global programming model competition is no longer a one-man show in Silicon Valley.

References:

https://arena.ai/leaderboard/code/webdev

This article is from the WeChat public account "AI Era Insights" (新智元), author: ASI启示录
