Just Now, Chinese AI Enters Top 2 in Global Programming, Only Claude Remains Ahead

marsbit, published on 2026-05-27, last updated on 2026-05-27

Abstract

**China's AI Ranks Second Globally in Programming, Trailing Only Claude** Today, Alibaba's Qwen3.7-Max scored 1541 on the Code Arena leaderboard, taking fourth place globally and surpassing top models such as GPT-5.5 and Gemini 3.5 Flash. It is the only non-Claude model near the top of the table, which puts Alibaba second only to Anthropic, whose Opus models hold every place above it. Before this official ranking, Qwen3.7-Max had already gained recognition overseas. In developers' tests, it outperformed rivals on tasks such as writing a self-training Tetris AI and generating complex 3D models, often at a significantly lower cost. Developers praised its ability, especially when paired with tools such as Hermes Agent and OpenCode, to effectively replace models such as GPT-5.5. In a hands-on challenge to build a 3D racing game from a detailed prompt, Qwen3.7-Max delivered a playable HTML file on the first attempt, needing only a minor bug fix. It alone included a start menu and sound effects, details the other models missed. Gemini 3.5 Flash and Claude Opus 4.6 produced less polished or less functional versions, and GPT-5.5 had its own quirks, while Qwen3.7-Max stood out for the completeness and playability of its first draft. This performance stems from its design as an "Agent Foundation Model," built for long-duration autonomous task execution. Internal tests show it running continuously for 35 hours and making 1,158 tool calls without context degradation or instruction drift. Key technical ...

Today, the latest Code Arena leaderboard is out!

Qwen3.7-Max, with a score of 1541 points, broke into the global top four, surpassing top-tier models like GPT-5.5 and Gemini 3.5 Flash.

Only Claude models rank above it: versions of Opus 4.7 and Opus 4.6.

In other words, Alibaba is the only Chinese company to reach the top table of the global programming-model arena, and, counted by company, it holds the number two spot, second only to Anthropic.

Qwen3.7-Max Breaks into Global Top Five

The Only Non-Claude Model

Even before the Code Arena leaderboard was released, Qwen3.7-Max had already made a name for itself among overseas developer communities.

Atomic Chat conducted a head-to-head comparison, pitting Opus 4.7, GPT-5.5, and Qwen3.7-Max against each other on a task to write a self-training Tetris AI.

The result? Qwen3.7-Max outperformed both Opus 4.7 and GPT-5.5 while spending just $1.32 in tokens, and it also improved performance by 56%.

Another overseas developer had Qwen3.7-Max build a 3D model of the universe, and the result was described as stunning.

In the task of generating a "3D Pixel Art Miniature Pagoda Model," Qwen3.7-Max also beat its rivals on both output speed and quality across the board.

Developer Paul Couvert praised Qwen3.7-Max highly, saying that, once integrated with Hermes Agent and OpenCode, it could essentially replace GPT-5.5 and Opus 4.7.

Programming, A True Contender

However, scores are one thing; real-world testing is another.

We set Qwen3.7-Max a hardcore "racing game" challenge.

Given a detailed prompt, Qwen3.7-Max produced a playable HTML file in no time.

The first version had a small bug: the A/D steering keys were reversed.

But after a second, brief round of conversational tweaks, a fully featured 3D racing game was up and running.

To be honest, the moment it launched was a bit of a shock.

Four cars race over three laps of a circular track, more than 100 coins are scattered along the course, and hitting an obstacle makes a car slow down and lose control.

The post-race results panel had everything: ranking, time, coins collected, and fastest lap.

But what was truly surprising were two details that only Qwen3.7-Max got right.

One was the start screen. Of the four models we tested side by side, only Qwen3.7-Max created a proper start page, so the race begins only after the player clicks "Start." The other three dropped straight into the race without even a title screen.

The other was sound effects. The prompt ended with a request for engine roar and coin-pickup sounds. Of the four models, only Qwen3.7-Max honored this bonus request, adding engine noise and coin dings.

Now let's look at the performance of the other contestants.

Gemini 3.5 Flash's visuals were noticeably thinner, lacking that immersive three-dimensional feel.

The UI layout was also problematic: dashboard information was spread across the four corners of the screen, pulling the player's eye in different directions.

Qwen3.7-Max, by contrast, grouped the key indicators in the center of the screen, closer to the player's natural line of sight.

Claude Opus 4.6's result was somewhat... hard to describe.

Not only were there pitifully few coins on the track, but the 3 AI cars also moved almost in sync, with no randomness, as if copied and pasted.

Finally, GPT-5.5.

Its visuals were clearly better than those of the previous two, and the controls felt smoother.

But for some reason, coins were made into yellow "donuts"...

The shape is a minor issue, though. The key point is that Gemini 3.5 Flash, Claude Opus 4.6, and GPT-5.5 all needed several rounds of bug fixes before every feature worked.

Only Qwen3.7-Max's first-round generation was basically playable.

Comparable benchmark scores, solid real-world performance, and a fraction of the price: what remains is for developers to vote with their feet.

The "Foundation" Model for the Agent Era

The reason Qwen3.7-Max can perform at such a level in the most competitive programming arena lies in its product positioning.

A few days ago, when Alibaba released Qwen3.7-Max, they gave it a very special label: Agent Foundation Model.

It was designed from the outset for long-duration autonomous task execution.

Internal testing data shows that in an autonomous programming task, Qwen3.7-Max ran continuously for 35 hours, executing 1158 tool calls.

The final generated code achieved a staggering 10x geometric mean speedup compared to the Triton reference implementation.
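The article does not say which kernels were timed or how, but a "geometric mean speedup" is a standard way to summarize per-kernel ratios. The sketch below shows how such a figure is typically computed, using made-up timings that are not from the article: each kernel's speedup is its reference time divided by the new time, and the ratios are averaged in log space. The function name `geometric_mean_speedup` is our own, for illustration only.

```python
import math


def geometric_mean_speedup(ref_times, new_times):
    """Geometric mean of per-kernel speedups (speedup = reference time / new time)."""
    if not ref_times or len(ref_times) != len(new_times):
        raise ValueError("need two non-empty timing lists of equal length")
    speedups = [ref / new for ref, new in zip(ref_times, new_times)]
    # Average the logarithms, then exponentiate: equivalent to the n-th root of the
    # product, but it avoids overflow when many large ratios are multiplied together.
    mean_log = sum(math.log(s) for s in speedups) / len(speedups)
    return math.exp(mean_log)


# Hypothetical timings in milliseconds, chosen for illustration only.
reference_ms = [5.0, 20.0, 300.0]
generated_ms = [5.0, 2.0, 3.0]  # per-kernel speedups: 1x, 10x, 100x

print(round(geometric_mean_speedup(reference_ms, generated_ms), 6))  # 10.0
```

With these hypothetical speedups of 1x, 10x, and 100x, the arithmetic mean would be 37x, while the geometric mean is 10x. The geometric mean is less swayed by a single outlier, and a 2x speedup paired with a 2x slowdown averages to exactly 1x, which is why it is the usual choice for ratios.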

Even more impressive is its endurance: more than 30 hours into the reasoning process, the model remained sharp and kept uncovering new room for optimization.

Throughout, there was zero context degradation, zero instruction drift, and zero endless loops!

To be fair, the hard part isn't the 1,000-plus tool calls themselves. Now that the Model Context Protocol (MCP) has spread widely, a thousand tool calls is not that unusual.

The difficulty lies in 35 hours of coherent reasoning.

Most models break down on long tasks: either the context grows messier until they forget the goals set at the start, or they fall into endless loops, retrying the same failed solution over and over.

Qwen3.7-Max has made "continuously doing the right thing" a reality.

Revealing the Core Technology

As we understand it, this leap in Qwen3.7-Max's programming ability likely stems from upgrades to two key training methods.

First, Environment Expansion.

In Qwen3.7-Max's programming training, each training setup is split into three independent dimensions (the problem itself, the execution framework, and the verification method), which are then freely recombined.

The same problem might be solved in Claude Code one time, in OpenClaw another, and under a different verification method a third.

The effect is like rotating an intern through every project team: the model is forced to learn general problem-solving strategies rather than "how to take shortcuts in one particular framework."

This explains a counterintuitive phenomenon: Qwen3.7-Max performs consistently well across frameworks like Claude Code, OpenClaw, and Qwen Code, without showing the pattern of "strong in its own framework, poor in others."
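The recombination of problem, framework, and verifier described in this section can be pictured as a Cartesian product. The sketch below is only an illustration under our own assumptions: the article names the frameworks, but the task and verifier names, the helper functions, and the uniform sampling scheme are all hypothetical and are not Alibaba's actual training pipeline.

```python
import itertools
import random

# Hypothetical pools. Only the framework names come from the article;
# the task and verifier names are invented for illustration.
TASKS = ["fix_failing_test", "add_cli_flag", "speed_up_kernel"]
FRAMEWORKS = ["Claude Code", "OpenClaw", "Qwen Code"]
VERIFIERS = ["unit_tests", "output_comparison"]


def build_environments(tasks, frameworks, verifiers):
    """Treat every (task, framework, verifier) triple as its own training environment."""
    return list(itertools.product(tasks, frameworks, verifiers))


def sample_episodes(environments, n, seed=0):
    """Draw n episodes uniformly, so each framework appears equally often in expectation."""
    rng = random.Random(seed)
    return [rng.choice(environments) for _ in range(n)]


envs = build_environments(TASKS, FRAMEWORKS, VERIFIERS)
print(len(envs))  # 18 = 3 tasks x 3 frameworks x 2 verifiers

# The same task shows up under every framework and every verifier.
print(sorted({fw for task, fw, _ in envs if task == "speed_up_kernel"}))
```

Because each problem is paired with every framework, a habit that only pays off inside one framework earns reward in just a fraction of the episodes for that problem, which is one way to read the "intern rotation" analogy.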

The second upgrade is Long-Horizon Autonomous Execution.

In training, the team introduced a "Dynamic Accumulative Survival Game" framework.

The model must make more than a thousand consecutive decisions in a constantly changing simulated environment, forming its own hypotheses, adjusting its strategy based on feedback, and avoiding the "context corruption" that comes from running too long.

Here's a telling data point: in the YC-Bench simulation of running a startup for a full year, Qwen3.7-Max generated $2.08 million in revenue, roughly double the previous generation's $1.05 million.

More crucially, it showed strategic evolution: changing course on its own midway through the year during a crisis, identifying and blocking malicious clients, and eventually settling into a stable execution loop.

This capability underpins the 35-hour kernel-optimization run and explains why, on KernelBench L3, Qwen3.7-Max delivered a speedup in 96% of scenarios.
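The article gives only the headline 96% figure. Assuming, as our own reading that the source does not confirm, that a scenario counts when the generated kernel is both correct and faster than the reference, the percentage is a simple ratio. The sketch below uses invented results and a hypothetical helper name, `fraction_with_speedup`.

```python
def fraction_with_speedup(results, threshold=1.0):
    """Share of tasks whose generated kernel is correct AND faster than the reference.

    results: list of (is_correct, speedup) pairs, where speedup = reference time / new time.
    A task counts only if its output is correct and its speedup exceeds `threshold`.
    """
    if not results:
        return 0.0
    wins = sum(1 for is_correct, speedup in results if is_correct and speedup > threshold)
    return wins / len(results)


# Hypothetical per-task results, not data from the article.
sample = [(True, 1.8), (True, 0.9), (False, 3.0), (True, 1.2)]
print(fraction_with_speedup(sample))  # 0.5: the incorrect 3.0x kernel does not count
```

Requiring correctness matters: without it, the incorrect but fast 3.0x kernel above would inflate the score to 0.75.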

And programming is just the first battlefield. This foundation of long-horizon reasoning and tool calling points to a greater ambition: a general-purpose agent foundation.

The Programming Finals Have a New Disruptor

Since its launch, Code Arena has always tested hard skills: multi-step reasoning, tool orchestration, and end-to-end project delivery, all of them real agent-level challenges.

Today, with a score of 1541 points, Qwen3.7-Max wedged itself into fourth place, positioned between Opus 4.6 Thinking and Opus 4.6.

On a track Claude has dominated for more than half a year, it has given its answer: Chinese models are not just followers; they can also set the pace.

The global programming model competition is no longer a one-man show in Silicon Valley.

References:

https://arena.ai/leaderboard/code/webdev

This article is from the WeChat public account "AI Era Insights" (新智元), author: ASI启示录
