AI Prediction Record: Want to Make Money in Prediction Markets with AI? It Might Not Even Have Read the Question Carefully

marsbit · Published on 2026-01-04 · Last updated on 2026-01-04

Abstract

Based on an experiment comparing AI predictions against human "smart money" on Polymarket, this article investigates whether AI can reliably profit in prediction markets. The author tested Google's Gemini 2.5 Pro and xAI's Grok (via OpenRouter), both equipped with web search, on 21 resolved non-crypto market questions. The core finding is a divergence in performance: Grok achieved the highest win rate at 75%, outperforming humans (66.7%) and significantly beating Gemini (52.4%). However, a detailed analysis of the AI's reasoning revealed critical flaws. Gemini frequently misjudged the current date, leading to erroneous conclusions. Both models sometimes relied on superficial or commonsense assumptions instead of deep, evidence-based logic. A major failure mode was misinterpreting the specific settlement conditions of a market, such as confusing "any files" with "all files" being released. While Grok's results are promising, the experiment concludes that AI often fails to fully comprehend the question's nuances, highlighting a significant gap between raw information retrieval and true contextual understanding needed for reliable prediction.

Author|Nan Zhi (@Assassin_Malvo)

After many narratives failed to deliver, prediction markets have become one of the few sectors within the Crypto space still showing positive growth. On November 20, Nan Zhi began applying last year's approach of finding smart money in Meme coins to prediction markets, with good results in the early stages.

In early December, coinciding with the launch of Gemini 3 Pro, an idea arose while testing the related models: could AI be used to analyze and predict prediction markets, pitting humans against AI to see which side predicts more accurately?

When introducing prediction markets, they are often described as moving the market closer to the "truth" by "allowing insightful people to place real-money bets." However, some argue that Crypto + prediction markets allow "insiders" to safely profit from information asymmetry, thereby driving the market towards the "insider outcome." This is essentially a clash between the views of "wisdom of the crowd" and "truth is in the hands of the few." AI prediction leans more towards "wisdom of the crowd," thus requiring a large amount of available knowledge and insights.

Therefore, in selecting AI models, Gemini and Grok were chosen first because they are backed by Google and the X platform, respectively, giving them the most direct access to vast amounts of knowledge and opinion. Recently, Nan Zhi also added Doubao (Douyin's AI model) to the lineup, but since few prediction questions involved it, it is not covered in this article.

Basic Rules

  • AI Versions: Gemini 2.5 Pro (with built-in Google Search), Grok 4 Fast (called via OpenRouter, with native search enabled)
  • Question Selection: Humans choose the betting questions, AI follows with predictions, but the Crypto category is excluded.
  • Input Content: Official question (title), official description (Description), optional answers (actually only Yes and No)

Note: Polymarket's questions are divided into major categories (Events) and subcategories (Markets). Major Events are broad questions like "Who will be the next Fed Chair?" or "When will Strategy sell Bitcoin?" Under each Event, there are N sub-markets, such as "Will Hassett become the next Fed Chair?" or "Will Strategy sell Bitcoin before March 31, 2026?" To align with human predictions, Markets were chosen as the questions for AI judgment, without inputting other options. For example, the AI is only asked to judge "Will Hassett become the next Fed Chair?" rather than asking it to choose the most likely candidate from N possibilities.

  • Prompt Design:
      ◦ Require the AI to search for the latest news, official announcements, and expert analysis reports
      ◦ Prohibit the use of prediction market data (odds or prices)
      ◦ Make judgments based on evidence and logical reasoning
      ◦ Output only Yes or No, accompanied by a paragraph explaining the reasoning
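The prompt rules above can be sketched as a template plus a verdict parser. The author's actual prompt wording was not published, so the text below is an assumption that merely implements the four listed rules:

```python
def build_prompt(title: str, description: str) -> str:
    """Assemble a prediction prompt implementing the four rules listed above.

    The wording is illustrative; the author's actual prompt was not published.
    """
    return (
        f"Question: {title}\n"
        f"Official settlement description: {description}\n\n"
        "Rules:\n"
        "1. Search for the latest news, official announcements, and expert analysis.\n"
        "2. Do NOT use prediction market odds or prices as evidence.\n"
        "3. Reason from evidence, step by step.\n"
        "4. Answer with exactly 'Yes' or 'No' on the first line, "
        "then one paragraph explaining your reasoning.\n"
    )

def parse_verdict(reply: str) -> str:
    """Extract the Yes/No verdict from the first line of the model's reply."""
    first = reply.strip().splitlines()[0].strip().rstrip(".").lower()
    if first not in ("yes", "no"):
        raise ValueError(f"Unparseable verdict: {first!r}")
    return first.capitalize()
```

Forcing the verdict onto the first line makes settlement-time scoring a trivial string comparison rather than an exercise in interpreting free-form prose.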

Current Results

Among the predicted questions, 21 have been settled. Grok has the highest win rate at 75%, humans at 66.7%, and Gemini the lowest at 52.4%. Current results can be viewed on the relevant website.
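The win rates above are simple hit rates over the settled questions. A minimal tally (the four-question record below is a toy example, not the experiment's actual data) looks like:

```python
def win_rate(predictions: list[str], outcomes: list[str]) -> float:
    """Fraction of settled questions where the prediction matched the outcome."""
    assert len(predictions) == len(outcomes), "every prediction needs an outcome"
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)

# Toy example: 3 of 4 predictions correct.
rate = win_rate(["Yes", "No", "Yes", "No"],
                ["Yes", "No", "No",  "No"])
print(rate)  # → 0.75
```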

What Mistakes Did the AI Make?

Gemini Occasionally Misjudges the Current Time

In the question "Will Trump's approval rating hit 35% in 2025?", Gemini claimed it was currently the first half of 2025, so anything was possible, and gave an essentially arbitrary answer.

However, when the author directly asked Gemini to output the current time by running code, it gave the correct answer. Why this erroneous time perception occurs remains unclear.
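One possible hedge against this failure mode (my own suggestion, not something the author reports doing) is to inject the verified current date into every prompt, so the model never has to infer "now" on its own:

```python
from datetime import datetime, timezone

def with_date_anchor(prompt: str) -> str:
    """Prepend the real current date so the model cannot misjudge 'now'."""
    today = datetime.now(timezone.utc).date().isoformat()
    return f"Today's date (UTC) is {today}. Treat this as ground truth.\n\n{prompt}"
```

Deadline-sensitive questions ("…by December 16?", "…in 2025?") are exactly where a wrong internal date flips the answer, so an explicit anchor is cheap insurance.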

AI Lacks Depth of Thought

In the question "Gemini 3.0 Flash released by December 16?", Grok based its judgment on "official sources recently only mentioned Gemini 3 Pro and related 2.5 versions, with very little mention of 3 Flash, therefore the evidence is insufficient to judge," considering only immediate information.

Gemini, by contrast, pointed out that "Gemini 1.0 was released in December 2023, and the experimental version of Gemini 2.0 Flash was launched in December 2024. Continuing this pattern, a 3.0 version release by the end of 2025 is logical," and also noted "a leaked demo about 'Gemini 3.0 Flash' circulating in online communities recently (December 14, 2025), further raising the likelihood of an imminent public release."

Although Gemini's conclusion turned out to be wrong, the gap in the breadth of information the two models drew on is obvious in this question.

AI Relies on Common Sense Rather Than Evidence + Logic for Inference

In the question "Trump approval Up or Down this week?", Gemini first stated that "predicting the approval rating for a single week more than a year in the future is highly uncertain," again showing the time-misjudgment issue. Gemini then said "in any ordinary week, the probability of events causing a slight decrease in support is likely slightly higher than the probability of positive events significantly boosting support," so a decrease was more likely. The conclusion rested entirely on subjective common-sense assumptions.

In this question, Grok based its judgment on news reports and polling data regarding "government shutdown, economic concerns, immigration policy disputes, and negative backlash from comments on Rob Reiner's death," which aligned with the design expectations.

Incorrect Judgment of Settlement Conditions

In the question "Will Trump release the Epstein files by December 20?", both Gemini and Grok already knew that "the government will release 'hundreds of thousands of pages' of documents on Friday (December 19th)." The settlement conditions clearly stated "if the government publicly releases any files related to Epstein's illegal activities that were not public before the listed date, it will be judged as Yes."

However, under this condition, Gemini stated that "completing the release of 'all' files by December 20th is impossible," clearly misjudging the conditions required for settlement, thus giving the wrong answer.
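The any-vs-all distinction that tripped up Gemini maps directly onto a one-line logical check. Spelled out with a toy file list (the three booleans are hypothetical, just marking which files were public by the deadline):

```python
# Toy data: a partial release where only one file became public by December 20.
files_released_before_deadline = [False, True, False]

# Settlement rule as written: Yes if ANY previously non-public file is released.
resolves_yes = any(files_released_before_deadline)        # True → "Yes"

# Gemini's misreading: Yes only if ALL files are released.
misread_resolution = all(files_released_before_deadline)  # False → "No"
```

Under the stated rule a partial release is enough to settle Yes, which is exactly the case Gemini got wrong.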

Summary

In summary, Grok's prediction win rate has surpassed that of smart-money players who have earned hundreds of thousands or even millions of dollars in prediction markets. However, a closer look at its prediction logic shows there are still many areas that could be guided and corrected.

Related Questions

Q: What was the main purpose of the author's experiment with AI in prediction markets?

A: The author aimed to test whether AI could be used to analyze and predict outcomes in prediction markets, pitting AI predictions against human predictions to see which was more accurate.

Q: Which two AI models were initially selected for the experiment and why?

A: Gemini and Grok were initially selected because they rely on Google and the X platform, respectively, allowing them to directly access vast amounts of knowledge and insights.

Q: What was the key instruction given to the AI models regarding the data they could use for their predictions?

A: The AI models were instructed to search for the latest news, official announcements, and expert analysis reports, but were strictly prohibited from using prediction market data itself.

Q: What was one of the common errors the Gemini model made during the predictions?

A: The Gemini model occasionally misjudged the current time, leading to flawed reasoning, such as believing it was still the first half of 2025 when making a prediction.

Q: Which AI model achieved the highest win rate in the experiment, and what was its performance?

A: Grok achieved the highest win rate at 75%, outperforming both the human prediction rate of 66.7% and Gemini's rate of 52.4%.
