Google's Deep Think Dominates Eight-Language Olympiads, Autonomously Solves Four Unsolved Problems, Research Barriers Collapse

marsbit · Published 2026-04-08 · Updated 2026-04-08

Summary

Google DeepMind's "Deep Think" AI system has demonstrated exceptional performance across eight languages in regional academic competitions, including mathematics and informatics Olympiads. It achieved perfect scores in Japanese and French contests, and high results in Chinese, Korean, Hindi, Vietnamese, Russian, and Portuguese exams. This multi-language capability aims to reduce linguistic barriers in scientific research, enabling non-English-speaking researchers to access advanced AI tools equally. Beyond competitions, Deep Think has solved four previously unsolved mathematical problems and contributed to breakthroughs in computer science, physics, and economics. It powers the Aletheia agent, which autonomously generates and verifies research-level mathematical solutions. Despite these achievements, the results are based on internal evaluations without third-party verification or detailed methodology disclosure. Google positions Deep Think as a "human intelligence multiplier," expanding AI's role in global scientific collaboration beyond English-dominated benchmarks.

"Deep Think has defeated/matched competitors in all competitions"!

Just now, Google DeepMind senior researcher Conglong Li posted 12 messages on the X platform, revealing an unprecedented scorecard.

One AI, the same brain, eight exam papers in different languages, all submitted with high scores.

Such results are rare for any model.

From IMO Gold Medals to Full Coverage of Regional Competitions

Deep Think's high scores across multiple leaderboards are not a sudden breakthrough but part of a nearly year-long evolution of capabilities.

First, it topped the most rigorous reasoning competitions.

In July 2025, Gemini Deep Think achieved the gold medal standard at the International Mathematical Olympiad (IMO) for the first time, scoring 35 out of 42 points. It also achieved similarly high-level performance at the ICPC World Finals around the same time.

Google DeepMind announced both results on its official blog, marking Deep Think's crossing of the "world-class competition threshold" in mathematics and programming.

Next, Deep Think began moving from "world-champion-level individual breakthroughs" to "systematic validation across languages, disciplines, and scenarios."

In February 2026, Google published three blog posts.

One introduced the Gemini 3.1 Pro model itself, one detailed a major upgrade to the Deep Think specialized reasoning mode, and one from the DeepMind scientific discovery team directly positioned Deep Think as a "human intelligence multiplier."

The upgraded Deep Think delivered a series of hard metrics:

48.4% on Humanity's Last Exam (without tool assistance), 84.6% on ARC-AGI-2 (officially verified by the ARC Prize Foundation), a Codeforces competitive programming Elo rating of 3455, and gold medal-level performance on the written portions of the 2025 International Physics and Chemistry Olympiads.

The strategy is very clear: first use world-class competitions like the IMO and ICPC to prove its powerful reasoning abilities, then use multi-language, regional competition, and cross-disciplinary Olympiad results to prove its general, deep reasoning ability that stably transfers across languages and domains.

[Figure: Gemini Deep Think's capability evolution from IMO gold medals to PhD-level research acceleration]

A Detailed Look at the 8-Language Scorecard

Now, let's take a closer look at this scorecard.

Japanese results are the most impressive.

2025 35th Japanese Mathematical Olympiad Finals (JMO Finals), perfect score.

ICPC Asia Japan Preliminary Contest, perfect score.

Among these, the JMO Finals score even exceeded the level corresponding to the top 80% of scores that year, meeting the official "gold medal equivalent" standard.

French results were also a perfect 100%.

The Chinese results are interesting.

At the 41st Chinese Mathematical Olympiad (CMO), Deep Think scored 86.3%, which is quite outstanding. But at the Chinese National Olympiad in Informatics (NOI), it only scored 63.3%.

The gap between 86.3% and 63.3% outlines the real boundaries of AI reasoning ability.

In math competitions, the model faces abstract deduction, proof construction, and multi-step reasoning, which happens to be Deep Think's strongest suit.

But in informatics competitions, the problem is not just "figuring it out," but also translating logic into executable code, controlling boundary conditions, considering complexity constraints, and avoiding implementation errors.

The former is closer to pure reasoning, while the latter requires "reasoning + algorithm design + engineering implementation" to be successful simultaneously.

In the other languages—Korean, Hindi, Vietnamese, Russian, Portuguese—Deep Think also achieved results that either defeated competitors or at least matched them.

Looking at Japanese, French, and Chinese together, the most unusual aspect this time is not necessarily scoring a perfect mark in any single subject, but rather that the same model, the same Deep Think reasoning system, delivered first-tier results on exam papers in multiple languages.

Is This Scorecard Reliable?

But there is a key omission:

Conglong Li did not list specific comparative data from competitors: all results come from Google evaluations. There is no independent third-party replication, no official certification from the competitions, and the evaluation methodology is completely undisclosed.

Was each problem attempted once or many times with the best score taken? How much computational power was used during reasoning? Was there any manual prompt engineering involved?

These details, which directly affect the credibility of the results, were also not mentioned.
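To see why the attempt count matters, here is a minimal sketch of how a best-of-k reporting policy can lift a headline score. It assumes attempts are independent and uses a hypothetical per-attempt success rate; neither the function name nor the numbers come from Google's evaluation, whose protocol is undisclosed.

```python
def best_of_k_solve_rate(p_single: float, k: int) -> float:
    """Probability that at least one of k independent attempts solves a problem."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    # The problem stays unsolved only if all k attempts fail.
    return 1.0 - (1.0 - p_single) ** k


# Hypothetical per-attempt success rate; not a figure from the article.
P_SINGLE = 0.5

if __name__ == "__main__":
    for k in (1, 4, 16):
        rate = best_of_k_solve_rate(P_SINGLE, k)
        print(f"k={k:>2}: reported solve rate = {rate:.3f}")
```

Under these assumptions the same model looks far stronger when only the best of several attempts is counted, which is why a reported score is hard to interpret without knowing k.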

Another easily overlooked point: these exams are all regional selection competitions, not international finals.

There is an order of magnitude difference in difficulty between regional competition problems and international finals.

The researcher explicitly stated that these results "will be included in the model card." As of publication, the model card has not been officially updated.

So, for now, this still seems like a scorecard graded by the examinee themselves, announced by themselves, and not yet stamped by the academic affairs office.

Multilingual Research Equity: The Overlooked Real Battlefield

Why did Google specifically invest effort in evaluating 8 different regional languages?

Current evaluations of AI reasoning ability are almost entirely based on English.

MATH, GSM8K, HumanEval, ARC-AGI... these are all in English.

Mathematicians, physicists, and engineers worldwide whose native language is not English must first overcome a language barrier when using AI research tools.

Google's selection of these 8 languages is not random.

Japanese, Korean, and Chinese cover East Asian research powerhouses; Hindi and Vietnamese cover emerging markets; French, Russian, and Portuguese cover Europe and South America.

Together, this represents the majority of global research output.

In its official blog, DeepMind positioned Deep Think as a "human intelligence multiplier," saying it can "handle knowledge retrieval and rigorous verification, allowing scientists to focus on conceptual depth and creative direction."

Combined with these multi-language results, the subtext of this statement is not hard to understand: this multiplier is not just for scientists who use English.

More notable is how far Deep Think has already gone in applying its abilities to real research.

DeepMind announced a mathematical research agent called Aletheia, powered by Deep Think, capable of autonomously generating, verifying, and revising solutions to research-level mathematical problems.

[Figure: Aletheia, driven by Deep Think, iteratively generates, verifies, and corrects solutions to research-level mathematical problems]

Aletheia has already contributed to multiple research papers, one of which was completed entirely autonomously by the AI, calculating specific structural constants in arithmetic geometry.

Furthermore, in a semi-autonomous evaluation of 700 open mathematical problems, it independently solved 4 previously unsolved problems.

The Gemini Deep Think mode also shows great potential in computer science, physics, economics, and other fields.

In computer science, Deep Think helped refute a conjecture that had remained open for a decade; in physics, it found a new analytical solution for gravitational radiation from cosmic strings; in economics, it extended an auction theory theorem.

[Figure: Schematic of the AI reasoning process, showing how large-scale exploration of the solution space at the network layer is aggregated into structured reasoning and confirmed through automated and manual verification]

By collaborating with experts to solve 18 research challenges, the advanced version of Gemini Deep Think helped break through long-standing bottlenecks in algorithms, machine learning and combinatorial optimization, information theory, and economics.

This goes far beyond "solving competition problems."

While competitors are still competing on English benchmark leaderboards, Google has already found a new battlefield in the "AI research accelerator" field.

The scores are not the most important part. The real signal is that the language barrier for AI research tools is being treated as an engineering problem to be solved.

If this path succeeds, scientists conducting research in Japanese, Korean, Chinese, Hindi, and other languages will, for the first time, stand on the same starting line as native English speakers.

This time, Google has laid its cards on the table.

Which competitors will follow suit should become clear soon.

References:

https://blog.google/intl/ja-jp/company-news/technology/gemini-31-pro-gemini-31-pro-deep-think/

https://deepmind.google/blog/accelerating-mathematical-and-scientific-discovery-with-gemini-deep-think/

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-deep-think/

This article is from the WeChat public account "新智元" (New Zhiyuan), author: 新智元

Related Questions

Q: What is the key achievement of Google's Deep Think AI model as reported in the article?

A: Deep Think achieved top-tier results in academic competitions held in eight different languages, including perfect scores in Japanese and French math and programming contests, and high performance in Chinese, Korean, Hindi, Vietnamese, Russian, and Portuguese exams.

Q: In which world-class competitions did Deep Think first demonstrate its reasoning capabilities?

A: Deep Think first demonstrated its reasoning capabilities by reaching the gold medal standard at the International Mathematical Olympiad (IMO) with a score of 35 out of 42 in July 2025, and by achieving similarly high performance at the ICPC World Finals.

Q: What is the significance of Deep Think's performance across multiple languages according to the article?

A: Its performance across multiple languages points to lower language barriers in AI research tools, potentially allowing non-English-speaking scientists worldwide to access advanced AI research assistance on an equal footing.

Q: What research breakthroughs achieved using Deep Think does the article mention?

A: Deep Think autonomously solved 4 previously unsolved mathematical problems, helped refute a decade-old conjecture in computer science, found a new analytical solution for gravitational radiation from cosmic strings in physics, and extended an auction theory theorem in economics.

Q: What concerns does the article raise about the reliability of Deep Think's reported results?

A: The article notes that all results come from internal Google evaluations, with no third-party verification, no official certification from the competitions, and no disclosure of the evaluation methodology, such as attempt counts, computational resources used, or possible manual prompt engineering.
