AI Prediction Record: Want to Make Money in Prediction Markets with AI? But It Might Not Even Have Read the Question Clearly

marsbitPublished on 2026-01-04Last updated on 2026-01-04

Abstract

Based on an experiment comparing AI predictions against human "smart money" on Polymarket, this article investigates whether AI can reliably profit in prediction markets. The author tested Google's Gemini 2.5 Pro and xAI's Grok (via OpenRouter), both equipped with web search, on 21 resolved non-crypto market questions. The core finding is a divergence in performance: Grok achieved the highest win rate at 75%, outperforming humans (66.7%) and significantly beating Gemini (52.4%). However, a detailed analysis of the AI's reasoning revealed critical flaws. Gemini frequently misjudged the current date, leading to erroneous conclusions. Both models sometimes relied on superficial or commonsense assumptions instead of deep, evidence-based logic. A major failure mode was misinterpreting the specific settlement conditions of a market, such as confusing "any files" with "all files" being released. While Grok's results are promising, the experiment concludes that AI often fails to fully comprehend the question's nuances, highlighting a significant gap between raw information retrieval and true contextual understanding needed for reliable prediction.

Author|Nan Zhi (@Assassin_Malvo)

After many sectors were proven false, prediction markets have become one of the few sectors within the Crypto space that is still experiencing positive growth. On November 20, Nan Zhi began attempting to use last year's approach of finding smart money in Meme coins to search for smart money in prediction markets, achieving good results in the early stages.

In early December, coinciding with the launch of Gemini 3 Pro, the idea arose while testing related models: could AI be used to analyze and predict prediction markets, pitting humans against AI to see which side makes more accurate predictions?

When introducing prediction markets, they are often described as moving the market closer to the "truth" by "allowing insightful people to place real-money bets." However, some argue that Crypto + prediction markets allow "insiders" to safely profit from information asymmetry, thereby driving the market towards the "insider outcome." This is essentially a clash between the views of "wisdom of the crowd" and "truth is in the hands of the few." AI prediction leans more towards "wisdom of the crowd," thus requiring a large amount of available knowledge and insights.

Therefore, in selecting the AI model, Gemini and Grok were initially chosen because they rely on Google and the X platform, respectively, allowing for the most direct access to vast amounts of knowledge and insights. Recently, Nan Zhi added the combination of "Douban (Douyin Knowledge)," but due to the limited number of prediction questions involving it, it is not covered in this article.

Basic Rules

  • AI Versions: Gemini 2.5 pro (with built-in Google Search), Grok 4 Fast (called via OpenRouter, native search function enabled)
  • Question Selection: Humans choose the betting questions, AI follows with predictions, but the Crypto category is excluded.
  • Input Content: Official question (title), official description (Description), optional answers (actually only Yes and No)

Note: Polymarket's questions are divided into major categories (Events) and subcategories (Markets). Major Events are broad questions like "Who will be the next Fed Chair?" or "When will Strategy sell Bitcoin?" Under each Event, there are N sub-markets, such as "Will Hassett become the next Fed Chair?" or "Will Strategy sell Bitcoin before March 31, 2026?" To align with human predictions, Markets were chosen as the questions for AI judgment, without inputting other options. For example, the AI is only asked to judge "Will Hassett become the next Fed Chair?" rather than asking it to choose the most likely candidate from N possibilities.

  • Prompt Design:
  • Require the AI to search for the latest news, official announcements, expert analysis reports
  • Require the removal/prohibition of using prediction market data
  • Make judgments based on "evidence" using logical reasoning
  • Only allow Yes or No outputs, accompanied by a paragraph explaining the reasoning logic

Current Results

Among the predicted questions, 21 have been settled. Grok has the highest win rate at 75%, humans at 66.7%, and Gemini the lowest at 52.4%. Current results can be viewed on the relevant website.

What Mistakes Did the AI Make?

Gemini Occasionally Misjudges the Current Time

In the question "Will Trump's approval rating hit 35% in 2025?", Gemini stated that it is currently the first half of 2025, so anything is possible, and gave a random answer.

However, when the author directly asked Gemini to output the current time using a program, Gemini could provide the correct answer. It is still unclear why such an erroneous time perception occurred.

AI Lacks Depth of Thought

In the question "Gemini 3.0 Flash released by December 16?", Grok based its judgment on "official sources recently only mentioned Gemini 3 Pro and related 2.5 versions, with极少 mention of 3 Flash, therefore evidence is insufficient to judge," considering only immediate information.

Whereas Gemini pointed out "Gemini 1.0 was released in December 2023, and the experimental version of Gemini 2.0 Flash was launched in December 2024. Continuing this pattern, a 3.0 version release by the end of 2025 is logical," and also noted "a leaked demo about 'Gemini 3.0 Flash' circulating in online communities recently (December 14, 2025), further enhancing the possibility of its imminent public release."

Although, conclusion-wise, Gemini's answer was actually wrong, in this question, the obvious difference in the breadth of information relied upon by the two is evident.

AI Relies on Common Sense Rather Than Evidence + Logic for Inference

In the question "Trump approval Up or Down this week?", Gemini stated that "predicting the approval rating for a single week more than a year later is highly uncertain," first showing the "time misjudgment" issue again. Then Gemini said "in any ordinary week, the probability of events causing a slight decrease in support is likely slightly higher than the probability of positive events significantly boosting support," so a decrease in support is more likely. The generated conclusion was based solely on subjective common sense assumptions.

In this question, Grok based its judgment on news reports and polling data regarding "government shutdown, economic concerns, immigration policy disputes, and negative backlash from comments on Rob Reiner's death," which aligned with the design expectations.

Incorrect Judgment of Settlement Conditions

In the question "Will Trump release the Epstein files by December 20?", both Gemini and Grok already knew that "the government will release 'hundreds of thousands of pages' of documents on Friday (December 19th)." The settlement conditions clearly stated "if the government publicly releases any files related to Epstein's illegal activities that were not public before the listed date, it will be judged as Yes."

However, under this condition, Gemini stated that "completing the release of 'all' files by December 20th is impossible," clearly misjudging the conditions required for settlement, thus giving the wrong answer.

Summary

In summary, Grok's prediction win rate has surpassed that of these smart money players who have profited hundreds of thousands or even millions of dollars in prediction markets. However, upon深入探究 its prediction logic, there are still many areas that can be guided and corrected.

Related Questions

QWhat was the main purpose of the author's experiment with AI in prediction markets?

AThe author aimed to test whether AI could be used to analyze and predict outcomes in prediction markets, pitting AI predictions against human predictions to see which was more accurate.

QWhich two AI models were initially selected for the experiment and why?

AGemini and Grok were initially selected because they rely on Google and X platform, respectively, allowing them to directly access vast amounts of knowledge and insights.

QWhat was the key instruction given to the AI models regarding the data they could use for their predictions?

AThe AI models were instructed to search for the latest news, official announcements, and expert analysis reports, but were strictly prohibited from using prediction market data itself.

QWhat was one of the common errors the Gemini model made during the predictions?

AThe Gemini model occasionally misjudged the current time, leading to flawed reasoning, such as incorrectly assuming it was already 2025 when making a prediction.

QWhich AI model achieved the highest win rate in the experiment, and what was its performance?

AGrok achieved the highest win rate at 75%, outperforming both the human prediction rate of 66.7% and Gemini's rate of 52.4%.

Related Reads

KOL's Perspective: Why Is SOL Set to Rise from This Point?

**Summary: Why SOL is Positioned for Growth at This Level** The article argues that SOL is poised for an upward move from its current price point, citing several key factors. Primarily, SOL has just broken out of a 4-month consolidation phase. This breakout signals a return of risk appetite to the broader crypto market, as SOL is seen as a key indicator of overall crypto health. The token's ownership has reportedly shifted from short-term traders and tourists to long-term accumulators, leading to low volume. Any meaningful increase in trading activity could thus trigger significant upward momentum. Fundamental strengths include strong institutional adoption, integration with DeFi and RWAs (Real-World Assets), and the potential benefits from the Clarity Act. Despite its high volatility—having dropped 70% from its all-time high but still up 12x from its bear market low—SOL is highlighted as one of the few tokens from the last cycle to reach new highs. It boasts a robust ecosystem of applications, users, and protocols. Future catalysts include the expected influx of AI developers following the Miami Accelerate conference, which focused on AI on Solana. Furthermore, Solana is positioned as the premier chain for memecoin activity, a trend expected to continue and drive network usage and fees. The article concludes that recent price action reflects a healthy transfer to long-term holders, setting the stage for growth.

marsbit40m ago

KOL's Perspective: Why Is SOL Set to Rise from This Point?

marsbit40m ago

Those Pre-Bitcoin PoW Protocols Have Recently Been Reimplemented

This article details a recent surge in replicating pre-Bitcoin Proof-of-Work (PoW) protocols, specifically focusing on Hal Finney's 2004 RPOW (Reusable Proofs of Work). Within five days in May 2026, multiple independent builders in the Bitcoin/cypherpunk community launched projects inspired by this early electronic cash proposal. The initiative began with Fred Krueger's `rpow2.com`, a centralized but auditable system that replaced RPOW's original IBM 4758 hardware with Ed25519 signatures. Initially a faithful replica, it later adopted Bitcoin-like features (21M supply cap, difficulty adjustment) and a controversial 5.24% founder allocation. This sparked rapid forks, including `rpow4.com` which incorporated full Bitcoin parameters, a prediction market (`rpowmarket.com`), and a DEX (`rpow2swap.com`). Concurrently, Mike In Space created a prototype of Wei Dai's 1998 b-money proposal (`b-money.replit.app`), pushing the historical exploration even further back. The article contrasts these centralized, server-dependent experiments with Bitcoin's core innovation of decentralized, trustless consensus. It also highlights a parallel development: the `HASH` project on Ethereum, which uses smart contract hooks to enable a purely fair-launch, browser-mineable PoW token with 0% allocations to team or VCs. The collective activity is framed as a meme-driven, educational exploration of cypherpunk history rather than a serious financial movement, with all projects heavily disclaiming any investment value.

marsbit44m ago

Those Pre-Bitcoin PoW Protocols Have Recently Been Reimplemented

marsbit44m ago

South Korean Exchanges 'Battle' Regulators, Challenging the Boundaries of Enforcement and Legislation

South Korea's cryptocurrency industry is engaged in a rare, direct confrontation with regulators. The Financial Intelligence Unit (FIU), the primary anti-money laundering (AML) watchdog, has recently imposed heavy penalties on major exchanges like Upbit and Bithumb for alleged violations involving unregistered overseas VASPs and AML procedures. However, exchanges are now actively challenging these actions in court and through industry associations. In a significant shift, the Seoul Administrative Court ruled in favor of Upbit's operator, Dunamu, overturning part of an FIU-ordered business suspension. The court found the FIU's penalty criteria and justification insufficiently clear. Similarly, the court suspended the enforcement of a six-month business suspension against Bithumb pending a final ruling, citing potential irreversible harm to the exchange. Beyond legal battles, the industry is contesting proposed legislative amendments. The Digital Asset eXchange Alliance (DAXA) strongly opposes a draft rule that would mandate Suspicious Transaction Reports (STRs) for all crypto transfers over 10 million KRW (~$6,800). DAXA argues this "poison pill" clause violates legal principles and would overwhelm the STR system, increasing reports from 63,000 to an estimated 5.45 million annually for major exchanges, thereby crippling effective AML monitoring. This conflict highlights a structural tension in South Korea's crypto governance: comprehensive digital asset laws are still developing, while regulators rely heavily on AML enforcement. The industry's move from passive compliance to active legal and legislative challenges signifies a new phase, pressing for clearer rules and more proportionate enforcement. While short-term disputes may intensify, this clash could ultimately lead to a more mature and sustainable regulatory framework for South Korea's vibrant crypto market.

marsbit1h ago

South Korean Exchanges 'Battle' Regulators, Challenging the Boundaries of Enforcement and Legislation

marsbit1h ago

After 50x Storage Surge, Justin Sun Always Looks to the Next Decade

Sun Yuchen, known for his controversial stunts like a $30 million lunch with Warren Buffett (canceled due to a kidney stone) and eating a $6.2 million duct-taped banana, is often overshadowed by a significant fact: his decade-long track record of spotting major investment trends. In 2016, he famously advised young people to invest in Bitcoin, Nvidia, Tesla, and Tencent instead of buying property. A hypothetical $20,000 investment in Nvidia and Tesla from that list would now be worth over 50 million RMB. His latest major call was on November 6, 2025, predicting a "50x storage opportunity" tied to the AI boom, which materialized with Sandisk's stock surging nearly 50-fold by 2026. Looking ahead, Sun now focuses on the next frontier: Physical AI. He identifies four key areas: 1. **Embodied AI/Robotics**: He sees this reaching its "iPhone moment," with companies like UBTech and Galaxy General leading in commercialization. 2. **Drones**: Viewed as the first commercially viable form of Physical AI, revolutionizing sectors from warfare (e.g., AeroVironment's Switchblade) to logistics. 3. **Spatial Computing**: Beyond VR, it's about AI understanding physical space, a foundational technology for robotics and autonomous systems, exemplified by Apple's Vision Pro. 4. **Space Exploration**: After a 2025 suborbital flight with Blue Origin, Sun advocates for space as the ultimate frontier, discussing blockchain's potential role in space asset management and data transactions. His investment philosophy involves betting on entire, inevitable trends rather than single companies. For robotics, he sees Tesla (the body/manufacturer) and Nvidia (the brain/AI platform) as complementary plays. In defense drones, he highlights companies making tanks obsolete (AeroVironment) and those augmenting fighter jets (Kratos). For space, he participated in Blue Origin's flight and anticipates SpaceX's potential IPO to redefine the sector's valuation. Sun Yuchen's vision frames the next two decades not as a revolution in information flow (like the internet), but in the fundamental operation of the physical world through AI-powered robots, autonomous systems, and spatial intelligence, ultimately extending human and AI activity into space. While many still focus on conventional assets, he continues to look toward the next technological horizon.

marsbit2h ago

After 50x Storage Surge, Justin Sun Always Looks to the Next Decade

marsbit2h ago

Trading

Spot
Futures

Hot Articles

Discussions

Welcome to the HTX Community. Here, you can stay informed about the latest platform developments and gain access to professional market insights. Users' opinions on the price of AI (AI) are presented below.

活动图片