The World Cup has only been played for a few days, but some AI prediction models have already been crowned as oracles, while others have stumbled badly.

marsbit | Published 2026-06-16 | Last updated 2026-06-16

Abstract

The 2026 FIFA World Cup has sparked significant interest not only on the pitch but also in AI-driven match prediction. Major models like Qwen, Copilot, and ChatGPT are being used to forecast outcomes, scores, upsets, red cards, and key player performances. Qwen gained early attention by accurately predicting Mexico's 2-0 win over South Africa (including a red card risk) and South Korea's 2-1 victory over the Czech Republic in the opening matches. Copilot's pre-tournament predictions had notable successes, such as correctly calling the Mexico 2-0 scoreline, South Korea's 2-1 win, and Brazil's 1-1 draw with Morocco. However, it also had clear misses, failing to predict upsets like Australia's 2-0 win over Turkey or Switzerland's draw with Qatar. ChatGPT provided detailed analytical reasoning, correctly predicting Mexico's 2-0 win, but its full-tournament predictions tended to favor favorites, missing several underdog results and draws. Tests pitting multiple models (ChatGPT, Gemini, Grok, Claude) against the same match, like Mexico vs. South Africa, showed varying predictions, with only some hitting the exact score. In summary, while AI models like Qwen have shown promising early results in specific match details, and others have had isolated successes, they collectively struggle to consistently identify upsets and underdog performances. AI is becoming an additional reference tool for prediction markets but is far from a definitive source.

The most exciting place at this World Cup isn't just on the pitch.

As interest in World Cup prediction events heats up, more and more users are trading on them with real money. Who will win, what the score will be, whether there will be an upset or a red card, which player will score: these topics, once just casual pre-match chatter among fans, are now broken down into individual tradable prediction events.

When predictions become trades, users need more than emotion and intuition: odds fluctuations, team form, injury news, head-to-head history, and market sentiment all become reference points before making a trade. In this process, AI models are frequently being brought into World Cup prediction scenarios.

Large models like Qwen, ChatGPT, Gemini, Claude, DeepSeek, and Copilot can not only answer 'which team is more likely to win' but also provide score predictions, likelihood of upsets, red card risks, key player performances, and match flow analysis. For prediction market participants, AI's pre-match analysis is becoming another layer of reference beyond odds, news, team data, and market sentiment.

However, predictions ultimately have to be judged against the actual matches.

With the official start of the World Cup, the results of the first few matches have come in. Those AI analyses that users consulted to aid their judgments before the matches now have answers to compare against: Were the scores predicted correctly? Were upsets foreseen? How many details like red cards, last-minute winners, and match flow were actually captured by the models?

The first to go viral was, surprisingly, Qwen

The most entertaining performance on the opening day of the World Cup undoubtedly belonged to Qwen.

For the opening match between Mexico and South Africa, Qwen's pre-match prediction was Mexico 2:0 South Africa. After the match ended, the score was indeed 2:0. What's more interesting is that the match saw a total of three red cards, which also largely aligned with Qwen's pre-match risk assessment of 'South Africa's overly aggressive defending, potentially leading to playing with ten men early on.'

If it were just predicting a Mexico win, that wouldn't be too surprising. As one of the hosts, Mexico was favored anyway. But what Qwen nailed this time were the more specific match details: the 2:0 scoreline, South Africa's red card risk, and the pace of the game gradually opening up in the later stages.

Next, for the match between South Korea and the Czech Republic, Qwen gave a prediction of South Korea 2:1.

This match wasn't easy to call before kick-off. The Czech Republic had physicality, set-piece threats, and the usual big-tournament experience of European teams. The match itself was indeed not one-sided: the Czechs took the lead, South Korea equalized later, and the game was deadlocked at 1:1 for a long time. It wasn't until the final stages that South Korea scored the winning goal, making the final score 2:1.

This gave Qwen's prediction an even stronger sense of 'scriptwriting.' Predicting the winner can rely on paper strength, and score predictions can involve luck, but process details like red cards, comebacks, and last-minute winners are what truly make people think 'there's something to this.' After two matches on the opening day, Qwen was the first to raise the profile of AI World Cup predictions.

Copilot: Moments of brilliance, but also obvious stumbles

Before the tournament, USA Today had Copilot predict all 104 matches of this World Cup. Judging from the completed matches so far, these predictions have both highlights and obvious misses.

Among them, three match predictions stood out.

For the opening match Mexico vs. South Africa, Copilot predicted Mexico 2:0, which matched the final score exactly. For South Korea vs. the Czech Republic, it predicted South Korea 2:1, again consistent with the result. For Brazil vs. Morocco, Copilot gave a 1:1 prediction, and Brazil was indeed held to a draw by Morocco.

The Brazil 1:1 Morocco prediction deserves particular credit. Brazil is, after all, a traditional powerhouse, with a squad and level of attention in the top tier.

Although Morocco reached the semi-finals at the last World Cup, predicting a draw against Brazil before the match was not a particularly safe choice. In the event, Brazil failed to get a winning start and Morocco continued to show its resilience in major tournaments, so Copilot's prediction for this match was indeed a 'stroke of genius.'

But Copilot's issues also became apparent quickly.

It predicted Canada would beat Bosnia and Herzegovina 2:1, but the match ended 1:1; it predicted Switzerland would edge Qatar 1:0, but Switzerland was also held to a 1:1 draw; it predicted the USA would beat Paraguay 2:0, and while the direction was correct, the actual score was 4:1, so the prediction significantly underestimated the attacking intensity.

More obvious stumbles occurred in several matches where underdogs won or favorites were held to a draw.

For Turkey vs. Australia, Copilot predicted Turkey would win 2:1, but Australia pulled off a 2:0 upset win. For Ecuador vs. Ivory Coast, it predicted Ecuador 2:1, but Ivory Coast won 1:0. For the Netherlands vs. Japan, it predicted the Netherlands 2:1, but Japan came back twice to level, ending in a 2:2 draw. For Sweden vs. Tunisia, it predicted 1:1, but Sweden thrashed them 5:1.

The fact that Copilot could nail the exact scores for Mexico, South Korea, and Brazil shows it doesn't just follow the favorites. But matches like Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with the Netherlands also expose its judgments on upsets and draws as still being relatively conservative.
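The Copilot results listed above can be tallied in a few lines. The following is only a minimal sketch: the three grading labels ("exact", "direction", "miss") and the helper names are assumptions made for illustration, not a method used by USA Today or Copilot, and the data covers only the ten matches this article mentions rather than all 104 predictions.

```python
from collections import Counter

# (first team, second team, Copilot's predicted score, actual score),
# transcribed from the ten matches mentioned in this article. Scores are
# written as (first team's goals, second team's goals). The 1:1 result for
# Switzerland vs. Qatar is the score given in the ChatGPT section.
COPILOT_MATCHES = [
    ("Mexico", "South Africa", (2, 0), (2, 0)),
    ("South Korea", "Czech Republic", (2, 1), (2, 1)),
    ("Brazil", "Morocco", (1, 1), (1, 1)),
    ("Canada", "Bosnia and Herzegovina", (2, 1), (1, 1)),
    ("Switzerland", "Qatar", (1, 0), (1, 1)),
    ("USA", "Paraguay", (2, 0), (4, 1)),
    ("Turkey", "Australia", (2, 1), (0, 2)),
    ("Ecuador", "Ivory Coast", (2, 1), (0, 1)),
    ("Netherlands", "Japan", (2, 1), (2, 2)),
    ("Sweden", "Tunisia", (1, 1), (5, 1)),
]


def outcome(score):
    """Return 'first', 'second', or 'draw' depending on who scored more."""
    first_goals, second_goals = score
    if first_goals > second_goals:
        return "first"
    if first_goals < second_goals:
        return "second"
    return "draw"


def grade(predicted, actual):
    """Hypothetical grading: exact score, correct outcome only, or miss."""
    if predicted == actual:
        return "exact"
    if outcome(predicted) == outcome(actual):
        return "direction"
    return "miss"


tally = Counter(grade(pred, act) for _, _, pred, act in COPILOT_MATCHES)
for label in ("exact", "direction", "miss"):
    print(f"{label}: {tally[label]}")
```

On this small sample the tally comes out to 3 exact scores, 1 correct direction with the wrong score (USA vs. Paraguay), and 6 misses, which matches the pattern described above: the misses are concentrated in upsets and draws.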

ChatGPT: Analysis is thorough, but not sharp enough on upsets

Compared to Copilot's full tournament predictions, ChatGPT is more like a 'pre-match analytical player.'

In its opening match prediction, ChatGPT predicted Mexico 2:0 South Africa, hitting the final score. The reasoning it provided was also quite thorough, including Mexico's home advantage, recent form, South Africa's lack of attacking threat, and factors like the high altitude of Mexico City and the home crowd atmosphere. In this prediction, ChatGPT didn't just give a result; the underlying logic also aligned with the match outcome.

However, when it comes to full tournament predictions, ChatGPT is less consistent. It correctly predicted Mexico 2:0 South Africa and Brazil 1:1 Morocco, and got the win/loss direction right for several matches, such as those involving Scotland, Germany, and Sweden. But for matches like South Korea 2:1 Czech Republic, Qatar 1:1 Switzerland, Australia 2:0 Turkey, and Japan 2:2 the Netherlands, ChatGPT's predictions favored the team that was stronger on paper. For example, it predicted Switzerland should beat Qatar, Turkey should beat Australia, and the Netherlands should edge Japan.

ChatGPT is not without predictive ability; it can break down team strength, home conditions, and recent form clearly, and can hit the score in some matches. But based on current results, it seems better at explaining 'why the favorite is more logical' rather than identifying in advance which matches might deviate from the favorite's script.

Gemini, Grok, Claude: Different models write different scripts for the same match

Besides Qwen, Copilot, and ChatGPT, some social media users have fed the same match to multiple models for pre-match predictions.

Taking the opening match Mexico vs. South Africa as an example, one blogger simultaneously tested four AI models (ChatGPT, Gemini, Grok, and Claude) for pre-match predictions. The results showed that both ChatGPT and Gemini predicted Mexico 2:0 South Africa, hitting the final score; Grok predicted Mexico 2:1, and Claude predicted Mexico 3:1. While Grok and Claude both correctly predicted a Mexico win, they didn't nail the exact score.

For this opening match prediction, different models offered three different 'scripts.' ChatGPT Go and Gemini Pro were closer to the actual match: Mexico dominant, South Africa lacking in attack, ending with a clean sheet. Grok gave a more open scoreline, suggesting South Africa would get a goal back on the counter. Claude Sonnet set higher expectations for Mexico's attack, predicting a more open 3:1 result.
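One simple way to express how far each 'script' was from the actual result is to add up the goal differences per team. This is only an illustrative sketch: the `goal_error` metric is an assumption chosen for this example and was not used by the blogger, and a single match is far too small a sample to rank models.

```python
# Actual result and the four predictions reported above for
# Mexico vs. South Africa, written as (Mexico goals, South Africa goals).
ACTUAL = (2, 0)
PREDICTIONS = {
    "ChatGPT": (2, 0),
    "Gemini": (2, 0),
    "Grok": (2, 1),
    "Claude": (3, 1),
}


def goal_error(predicted, actual):
    """Hypothetical metric: total goals by which the prediction was off."""
    return abs(predicted[0] - actual[0]) + abs(predicted[1] - actual[1])


errors = {model: goal_error(score, ACTUAL) for model, score in PREDICTIONS.items()}

# Sort from closest to furthest; sorted() is stable, so ties keep input order.
ranking = sorted(errors.items(), key=lambda item: item[1])
for model, error in ranking:
    print(f"{model}: off by {error} goal(s)")
```

Under this metric ChatGPT and Gemini are off by 0 goals, Grok by 1, and Claude by 2, which is consistent with the description above of a clean-sheet script, a slightly more open one, and a higher-scoring one.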

Summary

Since the number of AI prediction samples available for review is still limited at this stage, it's not yet possible to directly judge which model is the most 'football-savvy.'

But just looking at the few matches completed so far, differences are already starting to show. Qwen currently has the most memorable moments, hitting Mexico 2:0 South Africa and South Korea 2:1 Czech Republic on the opening day, and also catching red card risks and match flow, representing a standout performance in a small sample. However, whether it can sustain this accuracy requires verification from more matches.

Copilot and ChatGPT both have highlights of hitting exact scores, but they also share a common issue—their judgment remains insufficiently sensitive to matches that deviate from paper strength, like Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with the Netherlands.

As for models like Gemini, Grok, and Claude, the publicly available samples are more focused on single matches or social media comparisons; they have reference value but are not yet suitable for direct rankings.

AI can already serve as one layer of reference for World Cup prediction market users, but it is far from being the standard answer.

Related Questions

Q: According to the article, which AI model had the most impressive start in predicting the World Cup matches?

A: According to the article, the AI model Qwen (千问) had the most impressive start. It correctly predicted the exact scores (2:0 for Mexico vs. South Africa and 2:1 for South Korea vs. Czech Republic) for the first two matches it covered, and also accurately flagged the risk of a red card for South Africa.

Q: What are the major strengths and weaknesses identified for Copilot's predictions in the article?

A: The article states that Copilot's major strengths included accurately predicting exact scores for several matches, notably a 1:1 draw for Brazil vs. Morocco. Its major weakness was a tendency to be conservative in predicting upsets and draws, as it missed calls for matches like Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with the Netherlands.

Q: How does ChatGPT's approach to World Cup prediction differ from Copilot's, as described in the article?

A: The article describes ChatGPT as more of a 'pre-match analysis' tool that provides detailed reasoning for its predictions, such as considering home advantage and team form. In contrast, Copilot provided a complete forecast for all 104 tournament matches. However, both models shared a similar weakness in underestimating the likelihood of upsets.

Q: What was the key difference in the predictions for the Mexico vs. South Africa opener between models like ChatGPT/Gemini and Grok/Claude?

A: For the Mexico vs. South Africa opener, ChatGPT and Gemini correctly predicted the exact 2:0 scoreline. Grok predicted a 2:1 win for Mexico, and Claude predicted a 3:1 win. While all four models correctly predicted a Mexico victory, only ChatGPT and Gemini got the specific score and the fact that South Africa would be shut out.

Q: What is the article's overall conclusion about the current state of AI models in predicting World Cup outcomes?

A: The article concludes that while AI models can provide a useful additional reference for prediction market participants, they are far from being a definitive 'standard answer.' Their performance varies, and with a limited sample size of matches, it's too early to definitively judge which model is best. Models have shown they can predict specific scores and trends, but they still struggle with consistently identifying potential upsets or unexpected results.

