The World Cup has only been played for a few days, but some AI prediction models have already been crowned as oracles, while others have stumbled badly.

marsbit, published 2026-06-16, updated 2026-06-16

Abstract

The 2026 FIFA World Cup has sparked significant interest not only on the pitch but also in AI-driven match prediction. Major models like Qwen, Copilot, and ChatGPT are being used to forecast outcomes, scores, upsets, red cards, and key player performances. Qwen gained early attention by accurately predicting Mexico's 2-0 win over South Africa (including a red card risk) and South Korea's 2-1 victory over the Czech Republic in the opening matches. Copilot's pre-tournament predictions had notable successes, such as correctly calling the Mexico 2-0 scoreline, South Korea's 2-1 win, and Brazil's 1-1 draw with Morocco. However, it also had clear misses, failing to predict upsets like Australia's 2-0 win over Turkey or Switzerland's draw with Qatar. ChatGPT provided detailed analytical reasoning, correctly predicting Mexico's 2-0 win, but its full-tournament predictions tended to favor favorites, missing several underdog results and draws. Tests pitting multiple models (ChatGPT, Gemini, Grok, Claude) against the same match, like Mexico vs. South Africa, showed varying predictions, with only some hitting the exact score. In summary, while AI models like Qwen have shown promising early results in specific match details, and others have had isolated successes, they collectively struggle to consistently identify upsets and underdog performances. AI is becoming an additional reference tool for prediction markets but is far from a definitive source.

The most exciting place at this World Cup isn't just on the pitch.

As interest in World Cup prediction events heats up, more and more users are participating in trading with real money. Who will win, what will the score be, will there be an upset, will there be a red card, which player will score—these topics, originally just casual pre-match chatter among fans, are now broken down into individual tradable prediction events.

When predictions become trades, users need more than just emotions and intuition: odds fluctuations, team form, injury news, head-to-head history, and market sentiment all become reference points before making a trade. In this process, AI models are being frequently brought into World Cup prediction scenarios.

Large models like Qwen, ChatGPT, Gemini, Claude, DeepSeek, and Copilot can not only answer 'which team is more likely to win' but also provide score predictions, the likelihood of upsets, red card risks, key player performances, and match flow analysis. For prediction market participants, AI's pre-match analysis is becoming another layer of reference beyond odds, news, team data, and market sentiment.

However, predictions ultimately have to be judged against the actual matches.

With the official start of the World Cup, the results of the first few matches have come in. Those AI analyses that users consulted to aid their judgments before the matches now have answers to compare against: Were the scores predicted correctly? Were upsets foreseen? How many details like red cards, last-minute winners, and match flow were actually captured by the models?

The first to go viral was, surprisingly, Qwen

The most entertaining performance on the opening day of the World Cup undoubtedly belonged to Qwen.

For the opening match between Mexico and South Africa, Qwen's pre-match prediction was Mexico 2:0 South Africa. After the match ended, the score was indeed 2:0. What's more interesting is that the match saw a total of three red cards, which also largely aligned with Qwen's pre-match risk assessment of 'South Africa's overly aggressive defending, potentially leading to playing with ten men early on.'

If it were just predicting a Mexico win, that wouldn't be too surprising. As one of the hosts, Mexico was favored anyway. But what Qwen nailed this time were the more specific match details: the 2:0 scoreline, South Africa's red card risk, and the pace of the game gradually opening up in the later stages.

Next, for the match between South Korea and the Czech Republic, Qwen gave a prediction of South Korea 2:1.

This match wasn't easy to call before kick-off. The Czech Republic had physicality, set-piece threats, and the big-tournament experience typical of European teams. The match itself was indeed not one-sided: the Czechs took the lead, South Korea equalized, and the game stayed level at 1:1 for a long time. It wasn't until the final stages that South Korea scored the winner to make the final score 2:1.

This gave Qwen's prediction an even stronger sense of 'scriptwriting.' Predicting the winner can rely on paper strength, and score predictions can involve luck, but process details like red cards, comebacks, and last-minute winners are what truly make people think 'there's something to this.' After two matches on the opening day, Qwen was the first model to raise the profile of AI World Cup predictions.

Copilot: Moments of brilliance, but also obvious stumbles

Before the tournament, USA Today had Copilot predict all 104 matches of this World Cup. Judging from the completed matches so far, these predictions have both highlights and obvious misses.

Among them, three match predictions stood out.

For the opening match Mexico vs. South Africa, Copilot predicted Mexico 2:0, which matched the final score exactly. For South Korea vs. the Czech Republic, it predicted South Korea 2:1, again consistent with the result. For Brazil vs. Morocco, Copilot gave a 1:1 prediction, and Brazil was indeed held to a draw by Morocco.

The Brazil 1:1 Morocco prediction had particular merit. Brazil is, after all, a traditional powerhouse, with a top-tier squad and level of attention.

Although Morocco reached the semi-finals at the last World Cup, predicting a draw against Brazil before the match was not a particularly safe choice. In the event, Brazil failed to start with a win and Morocco again showed its resilience in major tournaments, so Copilot's prediction for this match was indeed a 'stroke of genius.'

But Copilot's issues also became apparent quickly.

It predicted Canada would beat Bosnia and Herzegovina 2:1, but the match ended 1:1; it predicted Switzerland would edge Qatar 1:0, but Switzerland was also held to a draw; it predicted the USA would beat Paraguay 2:0—the direction was correct, but the actual score was 4:1, significantly underestimating the attacking intensity.

More obvious stumbles occurred in several matches involving upsets and strong teams being held back.

For Turkey vs. Australia, Copilot predicted Turkey would win 2:1, but Australia pulled off a 2:0 upset win. For Ecuador vs. Ivory Coast, it predicted Ecuador 2:1, but Ivory Coast won 1:0. For the Netherlands vs. Japan, it predicted the Netherlands 2:1, but Japan came back twice to level, ending in a 2:2 draw. For Sweden vs. Tunisia, it predicted 1:1, but Sweden thrashed them 5:1.

The fact that Copilot could nail the exact scores for Mexico, South Korea, and Brazil shows it doesn't just follow the favorites. But matches like Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with the Netherlands also expose its judgments on upsets and draws as still being relatively conservative.
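As a rough illustration, the ten Copilot predictions quoted above can be tallied in a few lines of Python. This is a minimal sketch, not the method used by the article or by USA Today. The names `MATCHES`, `outcome`, and `tally` are hypothetical, scores are listed in the order the teams are named in the text, and the Switzerland vs. Qatar result is assumed to be 1:1, based on the score quoted later in the article.

```python
# Copilot's predictions vs. actual results, as quoted in this article.
# Each entry: (team_a, team_b, predicted (a, b), actual (a, b)).
# Assumption: Switzerland vs. Qatar ended 1:1 (score quoted elsewhere in the article).
MATCHES = [
    ("Mexico", "South Africa", (2, 0), (2, 0)),
    ("South Korea", "Czech Republic", (2, 1), (2, 1)),
    ("Brazil", "Morocco", (1, 1), (1, 1)),
    ("Canada", "Bosnia and Herzegovina", (2, 1), (1, 1)),
    ("Switzerland", "Qatar", (1, 0), (1, 1)),
    ("USA", "Paraguay", (2, 0), (4, 1)),
    ("Turkey", "Australia", (2, 1), (0, 2)),
    ("Ecuador", "Ivory Coast", (2, 1), (0, 1)),
    ("Netherlands", "Japan", (2, 1), (2, 2)),
    ("Sweden", "Tunisia", (1, 1), (5, 1)),
]


def outcome(score):
    """Reduce a scoreline to 'A' (first team wins), 'B' (second team wins), or 'draw'."""
    goals_a, goals_b = score
    if goals_a > goals_b:
        return "A"
    if goals_a < goals_b:
        return "B"
    return "draw"


def tally(matches):
    """Count exact-score hits and correct win/draw/loss calls."""
    exact = 0
    correct_outcome = 0
    for _team_a, _team_b, predicted, actual in matches:
        if predicted == actual:
            exact += 1
        if outcome(predicted) == outcome(actual):
            correct_outcome += 1
    return {
        "matches": len(matches),
        "exact_score": exact,
        "correct_outcome": correct_outcome,
        "missed_outcome": len(matches) - correct_outcome,
    }


if __name__ == "__main__":
    print(tally(MATCHES))
```

On these ten matches the tally gives 3 exact scores and 4 correct outcomes, with 6 missed outcomes. Five of those six misses were matches where the predicted winner drew or lost, which is consistent with the point about upsets and draws made above.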

ChatGPT: Analysis is thorough, but not sharp enough on upsets

Compared with Copilot's full-tournament predictions, ChatGPT is more like a 'pre-match analyst.'

In its opening match prediction, ChatGPT predicted Mexico 2:0 South Africa, hitting the final score. The reasoning it provided was also quite thorough, including Mexico's home advantage, recent form, South Africa's lack of attacking threat, and factors like the high altitude of Mexico City and the home crowd atmosphere. In this prediction, ChatGPT didn't just give a result; the underlying logic also aligned with the match outcome.

However, when it comes to full-tournament predictions, ChatGPT is less consistent. It correctly predicted Mexico 2:0 South Africa and Brazil 1:1 Morocco, and got the win/loss direction right for several matches involving Scotland, Germany, and Sweden. But for matches like South Korea 2:1 Czech Republic, Qatar 1:1 Switzerland, Australia 2:0 Turkey, and Japan 2:2 the Netherlands, its predictions favored the team that was stronger on paper. For example, it predicted that Switzerland would beat Qatar, Turkey would beat Australia, and the Netherlands would edge Japan.

ChatGPT is not without predictive ability; it can break down team strength, home conditions, and recent form clearly, and can hit the score in some matches. But based on current results, it seems better at explaining 'why the favorite is more logical' rather than identifying in advance which matches might deviate from the favorite's script.

Gemini, Grok, Claude: Different models write different scripts for the same match

Besides Qwen, Copilot, and ChatGPT, some social media users have fed the same match to multiple models for pre-match predictions.

Taking the opening match Mexico vs. South Africa as an example, one blogger simultaneously tested four AI models—ChatGPT, Gemini, Grok, and Claude—for pre-match predictions. The results showed that both ChatGPT and Gemini predicted Mexico 2:0 South Africa, hitting the final score; Grok predicted Mexico 2:1, and Claude predicted Mexico 3:1. While both correctly predicted a Mexico win, they didn't nail the exact score.

For this opening match prediction, different models offered three different 'scripts.' ChatGPT Go and Gemini Pro were closer to the actual match: Mexico dominant, South Africa lacking in attack, ending with a clean sheet. Grok gave a more open scoreline, suggesting South Africa would get a goal back on the counter. Claude Sonnet set higher expectations for Mexico's attack, predicting a more open 3:1 result.

Summary

Since the number of AI prediction samples available for review is still limited at this stage, it's not yet possible to directly judge which model is the most 'football-savvy.'

But just looking at the few matches completed so far, differences are already starting to show. Qwen currently has the most memorable moments, hitting Mexico 2:0 South Africa and South Korea 2:1 Czech Republic on the opening day, and also catching red card risks and match flow, representing a standout performance in a small sample. However, whether it can sustain this accuracy requires verification from more matches.

Copilot and ChatGPT both have highlights of hitting exact scores, but they also share a common issue—their judgment remains insufficiently sensitive to matches that deviate from paper strength, like Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with the Netherlands.

As for models like Gemini, Grok, and Claude, the publicly available samples are more focused on single matches or social media comparisons; they have reference value but are not yet suitable for direct rankings.

AI can already serve as one layer of reference for World Cup prediction market users, but it is far from being the standard answer.

Related Questions

Q: According to the article, which AI model had the most impressive start in predicting the World Cup matches?

A: According to the article, the AI model Qwen (千问) had the most impressive start. It correctly predicted the exact scores (2:0 for Mexico vs. South Africa and 2:1 for South Korea vs. Czech Republic) for the first two matches it covered, and also accurately flagged the risk of a red card for South Africa.

Q: What are the major strengths and weaknesses identified for Copilot's predictions in the article?

A: The article states that Copilot's major strengths included accurately predicting exact scores for several matches, notably a 1:1 draw for Brazil vs. Morocco. Its major weakness was a tendency to be conservative in predicting upsets and draws, as it missed calls for matches like Australia beating Turkey, Qatar drawing with Switzerland, and Japan drawing with the Netherlands.

Q: How does ChatGPT's approach to World Cup prediction differ from Copilot's, as described in the article?

A: The article describes ChatGPT as more of a 'pre-match analysis' tool that provides detailed reasoning for its predictions, such as considering home advantage and team form. In contrast, Copilot provided a complete forecast for all 104 tournament matches. However, both models shared a similar weakness in underestimating the likelihood of upsets.

Q: What was the key difference in the predictions for the Mexico vs. South Africa opener between models like ChatGPT/Gemini and Grok/Claude?

A: For the Mexico vs. South Africa opener, ChatGPT and Gemini correctly predicted the exact 2:0 scoreline. Grok predicted a 2:1 win for Mexico, and Claude predicted a 3:1 win. While all four models correctly predicted a Mexico victory, only ChatGPT and Gemini got the specific score and the fact that South Africa would be shut out.

Q: What is the article's overall conclusion about the current state of AI models in predicting World Cup outcomes?

A: The article concludes that while AI models can provide a useful additional reference for prediction market participants, they are far from being a definitive 'standard answer.' Their performance varies, and with a limited sample size of matches, it's too early to definitively judge which model is best. Models have shown they can predict specific scores and trends, but they still struggle with consistently identifying potential upsets or unexpected results.
