The World Cup Has Only Just Begun, but AI Predictions Already Have Some Models Hailed as 'Godly' and Others Flopping

Odaily Planet Daily (Odaily星球日报) | Published on 2026-06-15 | Last updated on 2026-06-15

Abstract

After only a few days of the World Cup, AI models are being widely used for match predictions, with mixed early results. These models analyze details like scores, upsets, red cards, and key players, offering users in prediction markets an extra layer of analysis beyond odds and news. Qwen gained early attention for its remarkably accurate calls on the opening day, correctly predicting Mexico's 2-0 win over South Africa and Korea's 2-1 victory over the Czech Republic, while also highlighting red card risks and match flow. Copilot had its own highlights, accurately forecasting the Mexico 2-0 result, the Korea 2-1 win, and a surprising 1-1 draw between Brazil and Morocco. However, it also misjudged several matches, like predicting a Swiss win that ended in a draw with Qatar and missing Australia's upset over Turkey. ChatGPT provided detailed pre-match analysis and correctly called the Mexico 2-0 score, explaining factors like home-field advantage. Yet, it struggled to anticipate upsets, often siding with the stronger team on paper, as seen in its missed calls for the Australia-Turkey and Japan-Netherlands matches. Social media tests pitted models like Gemini, Grok, and Claude against each other for the same games, revealing different predictive "scripts" even for the same fixture. Overall, while AI models like Qwen and Copilot have shown promising, high-profile successes in early matches, their consistency and ability to predict genuine upsets remain in question. As the tour...

Original | Odaily Planet Daily (@OdailyChina)

Author | Asher (@Asher_0210)

The liveliest action at this World Cup isn't only on the pitch.

As the heat around World Cup prediction events rises, more and more users are starting to participate in trading with real money. Who will win, what will the score be, will there be an upset, will there be a red card, which player will score—topics that originally belonged to pre-match fan chatter have now been broken down into tradable prediction events.

And when predictions become trades, what users need is more than just emotion and intuition: odds movements, team form, injury news, historical matchups, market sentiment, all become references before a trade. In this process, AI models are being frequently pulled into World Cup prediction scenarios.

Large models like Qwen, ChatGPT, Gemini, Claude, DeepSeek, and Copilot can not only answer "which team is more likely to win," but also give score predictions, upset possibilities, red card risks, key player performance, and match flow analysis. For prediction market participants, AI's pre-match deduction is becoming another layer of reference beyond odds, news, team data, and market sentiment.

However, predictions ultimately have to return to the matches themselves.

As the World Cup officially kicks off and the results of the first few matches come in, the AI analyses that users leaned on before kickoff finally have answers to be checked against: Was the score predicted? Were upsets spotted in advance? How many details, such as red cards, last-minute winners, and match flow, did the models truly capture?

The First to Go Viral Was, Unexpectedly, Qwen

The most dramatic performance on the first day of the World Cup was undoubtedly Qwen's.

For the opening match between Mexico and South Africa, Qwen's pre-match prediction was Mexico 2:0 South Africa. After the match ended, the score was indeed 2:0. What's more interesting is that the total of three red cards in the match also basically matched Qwen's pre-match risk assessment of "South Africa's overly physical defending, possibly getting into an early one-man-down situation."

If it were just predicting a Mexico win, that wouldn't be too surprising. As one of the hosts, Mexico was already favored. But what Qwen hit this time were more specific match details: the 2:0 scoreline, South Africa's red card risk, and the gradually widening gap in the latter stages of the match.

Immediately after, for the South Korea vs. Czech Republic match, Qwen gave another prediction of South Korea 2:1.

This match wasn't easy to call beforehand. The Czechs have physicality, set-piece threats, and the big-tournament experience typical of European teams. The match itself was far from one-sided: the Czechs took the lead, South Korea equalized, and the score stayed at 1:1 for a long stretch. Only in the closing stages did South Korea score the winner to make it 2:1.

At this point, Qwen's predictions started to feel "scripted." Picking the winner can rest on paper strength, and hitting a score can involve luck, but it is process details like red cards, comebacks, and late winners that truly make people feel "there's something to it." After two matches on the first day, Qwen was the first model to draw attention to AI World Cup predictions.

Copilot: Moments of Genius, and Clear Flops

Before the tournament, USA Today had Copilot predict all 104 matches of this World Cup. Looking at the matches that have concluded so far, this prediction has both highlights and clear misses.

Three match predictions stood out the most.

For the opening match Mexico vs. South Africa, Copilot predicted Mexico 2:0, which hit the final score exactly. For South Korea vs. Czech Republic, it predicted South Korea 2:1, again matching the result. For Brazil vs. Morocco, Copilot gave a 1:1 prediction, and Brazil was indeed held to a draw by Morocco.

The Brazil 1:1 Morocco prediction in particular had considerable merit. Brazil is, after all, a traditional powerhouse whose squad strength and profile are both top tier. Although Morocco reached the semi-finals at the last World Cup, predicting a draw against Brazil before the match wasn't a particularly safe choice. In the event, Brazil failed to open with a win, and Morocco carried its tournament resilience forward. Copilot's prediction for this match was indeed a "stroke of genius."

But Copilot's issues also quickly surfaced.

It predicted Canada would beat Bosnia and Herzegovina 2:1, but they drew 1:1; predicted Switzerland would narrowly beat Qatar 1:0, but Switzerland was also held to a draw; predicted the USA would beat Paraguay 2:0—the direction was correct, but the actual score was 4:1, significantly underestimating the offensive intensity.

More obvious flops appeared in several upset matches and matches where strong teams were held back.

For Turkey vs. Australia, Copilot predicted Turkey to win 2:1, but Australia pulled off a 2:0 upset. For Ecuador vs. Ivory Coast, it predicted Ecuador 2:1, but Ivory Coast won 1:0. For Netherlands vs. Japan, it predicted Netherlands 2:1, but Japan equalized twice and the match ended in a 2:2 draw. For Sweden vs. Tunisia, it predicted 1:1, but Sweden won 5:1 outright.

That Copilot could nail the exact scores for Mexico, South Korea, and Brazil matches shows it's not just giving answers favoring the favorites. But matches like Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with Netherlands also expose that its judgment on upsets and draws is still relatively conservative.
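The Copilot record described above can be tallied in a few lines. The following is a minimal sketch, not a method used by the article: the names `Match`, `outcome`, and `tally` are hypothetical, scores are written in fixture order, and the 1:1 result for Switzerland vs. Qatar is taken from the scoreline quoted in the ChatGPT section. It separates exact-score hits from predictions that only got the direction (win, draw, or loss) right.

```python
from typing import List, NamedTuple, Tuple

Score = Tuple[int, int]  # goals in fixture order: (first team, second team)


class Match(NamedTuple):
    fixture: str
    predicted: Score
    actual: Score


def outcome(score: Score) -> int:
    """Return 1 if the first team wins, 0 for a draw, -1 if it loses."""
    first, second = score
    return (first > second) - (first < second)


# Copilot predictions and final scores as reported in the article.
MATCHES: List[Match] = [
    Match("Mexico vs. South Africa", (2, 0), (2, 0)),
    Match("South Korea vs. Czech Republic", (2, 1), (2, 1)),
    Match("Brazil vs. Morocco", (1, 1), (1, 1)),
    Match("Canada vs. Bosnia and Herzegovina", (2, 1), (1, 1)),
    Match("Switzerland vs. Qatar", (1, 0), (1, 1)),
    Match("USA vs. Paraguay", (2, 0), (4, 1)),
    Match("Turkey vs. Australia", (2, 1), (0, 2)),
    Match("Ecuador vs. Ivory Coast", (2, 1), (0, 1)),
    Match("Netherlands vs. Japan", (2, 1), (2, 2)),
    Match("Sweden vs. Tunisia", (1, 1), (5, 1)),
]


def tally(matches: List[Match]) -> Tuple[int, int, int]:
    """Count exact-score hits, correct directions, and total matches."""
    exact = sum(m.predicted == m.actual for m in matches)
    direction = sum(outcome(m.predicted) == outcome(m.actual) for m in matches)
    return exact, direction, len(matches)


if __name__ == "__main__":
    exact, direction, total = tally(MATCHES)
    print(f"exact scores: {exact}/{total}, correct direction: {direction}/{total}")
```

On these ten matches, the tally gives three exact scores and four correct directions. The only direction Copilot got right without the exact score was the USA win, which is consistent with the article's point that upsets and draws are where it slips.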

ChatGPT: Analysis Is Comprehensive, But Not Sharp Enough on Upsets

Compared to Copilot's full schedule prediction, ChatGPT is more of a "pre-match analysis player."

In the opening match prediction, ChatGPT predicted Mexico 2:0 South Africa, hitting the final score. The reasoning it gave was also quite comprehensive, including Mexico's home advantage, recent form, South Africa's lack of attacking power, and factors like Mexico City's high altitude and home atmosphere. In this prediction, ChatGPT didn't just give a result; the underlying judgment logic also aligned with the match outcome.

But across the full World Cup schedule, ChatGPT's predictions are less stable. It hit Mexico 2:0 South Africa and Brazil 1:1 Morocco, and correctly called the winner in several matches, such as the Scotland, Germany, and Sweden games. But in matches like South Korea 2:1 Czech Republic, Qatar 1:1 Switzerland, Australia 2:0 Turkey, and Japan 2:2 Netherlands, ChatGPT sided each time with the team that was stronger on paper: Switzerland should beat Qatar, Turkey should beat Australia, and the Netherlands should narrowly beat Japan.

ChatGPT isn't without predictive ability; it can break down team strength, home environment, and recent form clearly, and can hit the score in some matches. But based on current results, it's better at explaining "why the favorite is more reasonable" than identifying in advance which matches might deviate from the favorite's script.

Gemini, Grok, Claude: Different Models Write Different Scripts for the Same Match

Besides Qwen, Copilot, and ChatGPT, some social media users fed the same match to multiple models for pre-match predictions.

Taking the opening match, Mexico vs. South Africa, as an example, one blogger asked four AI models (ChatGPT, Gemini, Grok, and Claude) for pre-match predictions at the same time. ChatGPT and Gemini both predicted Mexico 2:0 South Africa, hitting the final score exactly. Grok predicted Mexico 2:1 and Claude predicted Mexico 3:1; both correctly saw Mexico winning but missed the exact score.

For this opening match prediction, different models gave three different "scripts." ChatGPT and Gemini Pro were closer to the actual match: Mexico dominant, South Africa lacking in attack, eventually kept scoreless. Grok seemed to give a more open scoreline, thinking South Africa would get a goal back from a counter. Claude Sonnet raised expectations for Mexico's attack higher, giving a more open 3:1 result.
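One simple way to compare these four "scripts" is to measure how many goals each prediction was off by. The sketch below is illustrative only: `goal_error` is a hypothetical metric chosen here (the sum of absolute goal differences for each side), not one the article or the blogger used, and the scores are the ones reported above.

```python
from typing import Dict, Tuple

Score = Tuple[int, int]  # (Mexico goals, South Africa goals)

ACTUAL: Score = (2, 0)

# Pre-match predictions for the opening match, as reported in the article.
PREDICTIONS: Dict[str, Score] = {
    "ChatGPT": (2, 0),
    "Gemini": (2, 0),
    "Grok": (2, 1),
    "Claude": (3, 1),
}


def goal_error(predicted: Score, actual: Score) -> int:
    """Sum of absolute goal differences for each side (0 means exact score)."""
    return abs(predicted[0] - actual[0]) + abs(predicted[1] - actual[1])


ERRORS: Dict[str, int] = {
    model: goal_error(score, ACTUAL) for model, score in PREDICTIONS.items()
}

if __name__ == "__main__":
    # Smallest error first; ties keep their original order.
    for model, error in sorted(ERRORS.items(), key=lambda item: item[1]):
        print(f"{model}: off by {error} goal(s)")
```

A single match says little about any model, so a metric like this only becomes meaningful once it is averaged over many fixtures.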

Summary

Since the currently reviewable sample of AI predictions is still limited, we can't directly judge which model is the most "football-savvy" at this stage.

But even among the few matches that have concluded, differences are starting to show. Qwen has the most memorable performance so far: it hit Mexico 2:0 South Africa and South Korea 2:1 Czech Republic on the first day, and also anticipated the red card risk and match flow, a standout result in a small sample. Whether it can sustain that accuracy will take more matches to verify.

Copilot and ChatGPT both have highlights of hitting exact scores, but also expose a common problem—their judgment still isn't sensitive enough when facing matches that deviate from paper strength, like Australia beating Turkey, Qatar drawing Switzerland, or Japan drawing the Netherlands.

As for models like Gemini, Grok, and Claude, the publicly available samples are mostly single matches or social media comparisons, which have reference value but aren't yet suitable for direct ranking.

AI can already serve as a layer of reference for World Cup prediction market users, but it's far from a standard answer. Next, Odaily Planet Daily will continue to collect pre-match predictions from various models and keep reviewing them as the tournament progresses: which models just had good luck at the start, and which models can truly withstand the test of results across more matches.

Related Questions

Q: Which AI model achieved a notable success in predicting the first two matches of the World Cup, including specific scores and details like red cards?

A: The AI model Qwen (千问) successfully predicted the exact scores of Mexico 2:0 South Africa and South Korea 2:1 Czech Republic in the first two matches. It also correctly identified the risk of red cards for South Africa.

Q: What were the major strengths and weaknesses of Copilot's World Cup match predictions according to the article?

A: Copilot's strengths included accurately predicting the exact scores of Mexico 2:0 South Africa, South Korea 2:1 Czech Republic, and Brazil 1:1 Morocco. Its weaknesses were significant misses on several matches, such as incorrectly predicting wins for Turkey against Australia and for Switzerland against Qatar, showing a conservative bias against upsets and draws.

Q: How did ChatGPT's approach to World Cup prediction differ from a model like Copilot?

A: ChatGPT functioned more as a 'pre-match analytical player,' providing detailed reasoning behind its predictions (e.g., considering Mexico's home advantage and altitude). While it accurately predicted some scores like Mexico 2:0, its full-tournament predictions showed less stability and a tendency to favor the stronger team on paper, missing several upsets and draws.

Q: What were the different predictions made by ChatGPT, Gemini, Grok, and Claude for the opening match between Mexico and South Africa?

A: For the opening match Mexico vs. South Africa, ChatGPT and Gemini predicted a 2:0 win for Mexico. Grok predicted a 2:1 win for Mexico, and Claude predicted a 3:1 win for Mexico. Only ChatGPT and Gemini predicted the exact final score of 2:0.

Q: What is the article's overall conclusion about the current reliability of AI models for World Cup predictions?

A: The article concludes that while AI models can provide an additional layer of reference for prediction market participants, they are far from being a standard answer. Their reliability varies, with some showing promising initial results but needing more matches for validation, and most models currently showing a lack of sensitivity in predicting upsets and unexpected draws.
