Original | Odaily Planet Daily (@OdailyChina)
Author | Asher (@Asher_0210)

Before each World Cup match, I always have AI models make predictions. Almost every model makes logical and detailed arguments.
Some discuss team value, some break down group stage data, some analyze injuries and tactics, and others directly give scripts for scorelines, extra time, and penalty shootouts. At first glance, ChatGPT, Grok, Qwen, DeepSeek, Gemini, and Claude all seem to know football well.
But as a user of prediction markets, what I really care about is not which model provides the most complete analysis, but which one is most worth relying on.
Once the World Cup entered the knockout stage, Odaily Planet Daily began asking different AI models the same questions before every match, starting with the first game, and then checked their answers against the real results afterwards to see which models merely sounded plausible and which ones truly foresaw the outcome.
So far, in the concluded World Cup knockout matches: Canada narrowly defeated South Africa 1-0, Brazil edged Japan 2-1, Germany was eliminated by Paraguay after being dragged into a penalty shootout, and the Netherlands also fell to Morocco on penalties. The Belgium vs. Senegal match was even more dramatic, ending 2-2 in regulation before an extra-time turnaround, fully highlighting the unpredictability of knockout football.
DeepSeek and Gemini: Gained Fame by Predicting the Morocco Match
The most memorable predictions so far are the ones DeepSeek and Gemini made for the Netherlands vs. Morocco match. It was easy to pick the wrong side before this game: the Netherlands appeared stronger on paper and had a more complete squad. Many models acknowledged Morocco would be tough, but ultimately trusted the Netherlands to advance.
The brilliance of DeepSeek and Gemini lies in not stopping at "this will be a tight match." They wrote the subsequent script. Before the match, Gemini predicted a 1-1 draw in regulation time and a penalty shootout win for Morocco. The match indeed ended 1-1, with Morocco winning 3-2 on penalties. Gemini didn't just guess the direction correctly; it essentially predicted how the match would reach penalties and who would prevail.

Gemini's prediction for the Netherlands vs. Morocco match
DeepSeek was also very close. It judged that the match was most likely to end 1-1 or 0-0 in regulation, could go to extra time or even penalties, and leaned towards Morocco causing an upset with defense and counter-attacks.

DeepSeek's prediction for the Netherlands vs. Morocco match
After this match, the profile of DeepSeek and Gemini rose sharply. Gemini in particular seemed less to be making a pre-match prediction than to have read the match script in advance.
Grok and Qwen Consistently Nailed Exact Scores, More Stable Than Expected
Beyond DeepSeek and Gemini's highlight in the Morocco match, Grok and Qwen also made their presence felt. What impressed most is that in matches with a relatively clear favorite, they not only predicted the advancing team correctly but also forecast scores quite close to the final results.
South Africa vs. Canada is an example. Most AI models favored Canada before the match, but the disagreement was over whether Canada would win comfortably. Grok predicted a 1-0 win for Canada, and Qwen also predicted a narrow one-goal victory. In the end, Canada advanced with just one goal, not the big win some had imagined.

Qwen's prediction for the South Africa vs. Canada match
Brazil vs. Japan was similar. Most AI models thought Brazil was stronger, but whether Japan could keep the match tight was the key. Both Grok and Qwen predicted a 2-1 scoreline, and the match indeed ended with Brazil narrowly winning 2-1. They correctly foresaw not just "Brazil will win," but that Japan could cause enough trouble.
They were also quite accurate for Ivory Coast vs. Norway. Norway, with Haaland, was an understandable favorite, but Ivory Coast's physicality and wing attacks wouldn't make it one-sided. Both Grok and Qwen predicted a 2-1 win for Norway, which matched the final "script."

Grok's prediction for the Ivory Coast vs. Norway match
The advantage of Grok and Qwen is their fine-grained reading of matches with a clear favorite. They didn't write the grand script of Morocco eliminating the Netherlands in advance, but in matches involving Canada, Brazil, Norway, France, and others, their predictions for both the advancing team and the score were quite accurate. In other words, they might not be the best at spotting upsets, but they are skilled at judging whether a favorite will cruise through or struggle to a narrow win.
ChatGPT Didn't Have Many Miraculous Scores, But Its Match Process Analysis Was Quite Accurate
ChatGPT didn't pull off a prediction like Gemini's call of Morocco eliminating the Netherlands on penalties, nor did it hit exact scores in consecutive matches like Grok and Qwen. Its strength lies elsewhere: in many matches where the favorite seemed clear beforehand, ChatGPT warned more explicitly that the match might not be that easy.
Brazil vs. Japan is an example. ChatGPT predicted Brazil would advance but didn't portray it as a comfortable rout. It noted that Japan's pressing, running, and discipline would make Brazil uncomfortable, and that Japan even had a chance to score first or equalize. Ivory Coast vs. Norway was similar: ChatGPT predicted Norway would advance but warned it wouldn't be an easy game, highlighting that Ivory Coast's physicality, wing attacks, and transition ability would cause problems.
Additionally, for the England vs. DR Congo knockout match, ChatGPT didn't simply predict an England rout. It suggested the match might be cagey, with DR Congo using low-block defense to slow the tempo. England eventually advanced but didn't win comfortably.

ChatGPT's prediction for the England vs. DR Congo match
ChatGPT's strength lies not in always predicting scores accurately, but often in identifying where the resistance in a match will come from in advance. It's well-suited for understanding the dynamics of a match, but less so if one only wants a final score prediction. It can describe the process accurately, but when it comes to calling a major upset, it often lacks decisive conviction.
Germany's Exit Became a Collective Wreck for the AI Models
If previous matches showed the highlights of different models, then Germany vs. Paraguay was a collective failure.
Before the match, all AI models sided with Germany. ChatGPT, Grok, Qwen, Gemini, and Claude all favored Germany, with score predictions mostly clustered around 2-0, 3-0, or 3-1. The reasoning was consistent: all believed Germany was stronger on paper, had better squad depth, and had more firepower.
But this was the match where things went wrong. The AI models underestimated Paraguay's ability to drag the match into a quagmire. Germany failed to settle it in regulation, couldn't break the deadlock in extra time, and was eventually dragged into a penalty shootout and eliminated by Paraguay.
Who's Most Accurate So Far?
Judging from the concluded knockout matches, different models are starting to show their characteristics.
DeepSeek and Gemini have the most highlights. They not only predicted that favorites like Brazil and France would advance but also provided valuable answers in harder-to-call upset matches. For the Netherlands vs. Morocco match, their key advantage was daring to write the script for a Morocco upset and a penalty shootout in advance. Gemini's direct prediction of a Morocco win on penalties was the standout call of this match.
Grok and Qwen are more like "score-type predictors." They hit many exact scores, performing especially well in matches involving Canada, Brazil, Norway, and France. The problem is that when traditional powerhouses like Germany or the Netherlands were involved, they ultimately leaned towards the favorite and missed the upset.
ChatGPT and Claude are more like "analysis-type predictors." Their reasoning is well-written, their direction is mostly on point, and they can warn about risks like extra time. The issue is, they often see that a match will be tough but are reluctant to conclude with an upset. The Netherlands vs. Morocco was a case in point; they saw the risks of extra time and penalties but still trusted the Netherlands more.
So, rather than hastily asking which model knows football best, it's better to see which scenarios they are respectively suited for.
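One way to make that comparison concrete is to count two things separately for each model: how often it picked the right team to advance, and how often it also hit the exact score. The Python sketch below is only an illustration, not the method Odaily used. The function name `tally` and the data layout are assumptions, and the table contains only the handful of picks and results spelled out in this article, so the totals are a partial example rather than a full scorecard.

```python
from collections import defaultdict

# Results as stated in the article: (advancing team, score as written).
# None means the article gives no score for that match.
RESULTS = {
    "South Africa vs Canada": ("Canada", "1-0"),
    "Brazil vs Japan": ("Brazil", "2-1"),
    "Netherlands vs Morocco": ("Morocco", "1-1"),  # 1-1, Morocco won on penalties
    "Germany vs Paraguay": ("Paraguay", None),     # decided on penalties
}

# Picks as described in the article: (model, match, pick to advance, score).
# None means the article does not give that model's exact score for the match.
PREDICTIONS = [
    ("Grok", "South Africa vs Canada", "Canada", "1-0"),
    ("Grok", "Brazil vs Japan", "Brazil", "2-1"),
    ("Grok", "Netherlands vs Morocco", "Netherlands", None),
    ("Grok", "Germany vs Paraguay", "Germany", None),
    ("Gemini", "Netherlands vs Morocco", "Morocco", "1-1"),
    ("Gemini", "Germany vs Paraguay", "Germany", None),
]


def tally(predictions, results):
    """Count matches, correct advancing-team picks, and exact-score hits per model."""
    table = defaultdict(lambda: {"matches": 0, "direction": 0, "exact": 0})
    for model, match, pick, score in predictions:
        winner, actual_score = results[match]
        row = table[model]
        row["matches"] += 1
        if pick == winner:
            row["direction"] += 1
            # An exact hit requires the right team and a matching known score.
            if score is not None and score == actual_score:
                row["exact"] += 1
    return dict(table)


if __name__ == "__main__":
    for model, row in tally(PREDICTIONS, RESULTS).items():
        print(model, row)
```

Keeping the two counts apart mirrors the distinction drawn above: a model can be strong on exact scores in favorite-led matches and still miss every upset.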





