AI Prediction Record: Want to Make Money in Prediction Markets with AI? But It Might Not Even Have Read the Question Clearly

marsbit | Published on 2026-01-04 | Last updated on 2026-01-04

Abstract

Based on an experiment comparing AI predictions against human "smart money" on Polymarket, this article investigates whether AI can reliably profit in prediction markets. The author tested Google's Gemini 2.5 Pro and xAI's Grok (via OpenRouter), both equipped with web search, on 21 resolved non-crypto market questions. The core finding is a divergence in performance: Grok achieved the highest win rate at 75%, outperforming humans (66.7%) and significantly beating Gemini (52.4%). However, a detailed analysis of the AI's reasoning revealed critical flaws. Gemini frequently misjudged the current date, leading to erroneous conclusions. Both models sometimes relied on superficial or commonsense assumptions instead of deep, evidence-based logic. A major failure mode was misinterpreting the specific settlement conditions of a market, such as confusing "any files" with "all files" being released. While Grok's results are promising, the experiment concludes that AI often fails to fully comprehend the question's nuances, highlighting a significant gap between raw information retrieval and true contextual understanding needed for reliable prediction.

Author|Nan Zhi (@Assassin_Malvo)

With many crypto narratives having failed to pan out, prediction markets have become one of the few sectors in the crypto space that is still showing positive growth. On November 20, Nan Zhi began applying last year's approach of tracking smart money in meme coins to prediction markets, and the early results were good.

In early December, around the launch of Gemini 3 Pro, an idea came up while testing the related models: could AI be used to analyze and predict prediction markets, pitting humans against AI to see which side predicts more accurately?

Prediction markets are often introduced as moving the market closer to the "truth" by "allowing insightful people to place real-money bets." However, some argue that crypto plus prediction markets lets "insiders" safely profit from information asymmetry, thereby driving the market towards the "insider outcome." This is essentially a clash between "wisdom of the crowd" and "truth is in the hands of the few." AI prediction leans towards "wisdom of the crowd," and therefore requires a large amount of available knowledge and insight.

Gemini and Grok were therefore chosen as the initial models because they draw on Google and the X platform, respectively, which gives them the most direct access to vast amounts of knowledge and insight. Nan Zhi recently added "Doubao (Douyin knowledge)" as a further combination, but because few predicted questions involve it so far, it is not covered in this article.

Basic Rules

AI Versions: Gemini 2.5 Pro (with built-in Google Search), Grok 4 Fast (called via OpenRouter, with native search enabled)
Question Selection: Humans choose which questions to bet on, and the AI then predicts the same questions; the Crypto category is excluded.
Input Content: The official question (title), the official description, and the available answers (in practice only Yes and No)

Note: Polymarket's questions are divided into major categories (Events) and subcategories (Markets). Major Events are broad questions like "Who will be the next Fed Chair?" or "When will Strategy sell Bitcoin?" Under each Event, there are N sub-markets, such as "Will Hassett become the next Fed Chair?" or "Will Strategy sell Bitcoin before March 31, 2026?" To align with human predictions, Markets were chosen as the questions for AI judgment, without inputting other options. For example, the AI is only asked to judge "Will Hassett become the next Fed Chair?" rather than asking it to choose the most likely candidate from N possibilities.

Prompt Design:
Require the AI to search for the latest news, official announcements, and expert analysis reports
Prohibit the use of prediction market data
Require judgments to be based on "evidence" and logical reasoning
Allow only a Yes or No output, accompanied by a paragraph explaining the reasoning
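
The four prompt requirements above can be sketched as a small prompt builder. This is a minimal illustration only: the article does not publish its actual prompt, so the `Market` class, the `build_prompt` function, and the exact wording of each rule are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Market:
    """Hypothetical container for the three inputs the article lists."""
    title: str
    description: str
    options: tuple = ("Yes", "No")


def build_prompt(market: Market) -> str:
    """Assemble a prompt that encodes the four rules described above.

    The wording is illustrative, not the author's real prompt.
    """
    rules = [
        "Search for the latest news, official announcements, and expert analysis reports.",
        "Do not use prediction market data such as prices or odds.",
        "Base your judgment on evidence and logical reasoning.",
        "Answer with exactly one of: " + " / ".join(market.options)
        + ", followed by one paragraph explaining your reasoning.",
    ]
    lines = [
        "Question: " + market.title,
        "Description: " + market.description,
        "Options: " + ", ".join(market.options),
        "",
        "Rules:",
    ]
    lines += [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    example = Market(
        title="Will Hassett become the next Fed Chair?",
        description="(official market description goes here)",
    )
    print(build_prompt(example))
```

Only the single Market is passed in, which mirrors the choice described in the note above: the model is asked about one Yes/No sub-market rather than being shown the other candidates in the Event.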

Current Results

Among the predicted questions, 21 have been settled. Grok has the highest win rate at 75%, humans at 66.7%, and Gemini the lowest at 52.4%. Current results can be viewed on the relevant website.
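
As a quick arithmetic check on these figures, the sketch below searches for the number of wins that would produce each reported rate over 21 settled questions. The helper names are hypothetical. The human and Gemini rates map cleanly to 14 and 11 wins out of 21, while no whole number of wins out of 21 gives 75%. One possible explanation, which the article does not confirm, is that Grok was scored on fewer than 21 questions.

```python
def tenths_of_percent(wins: int, total: int) -> int:
    """Win rate expressed in tenths of a percent, rounded (e.g. 667 for 66.7%)."""
    return round(1000 * wins / total)


def wins_matching(rate_percent: float, total: int) -> list:
    """All win counts out of `total` that round to the reported rate."""
    target = round(rate_percent * 10)
    return [w for w in range(total + 1) if tenths_of_percent(w, total) == target]


def totals_matching(rate_percent: float, max_total: int) -> list:
    """All question counts up to `max_total` for which the rate is achievable."""
    return [t for t in range(1, max_total + 1) if wins_matching(rate_percent, t)]


if __name__ == "__main__":
    SETTLED = 21  # number of settled questions reported in the article
    print("humans 66.7%:", wins_matching(66.7, SETTLED))
    print("Gemini 52.4%:", wins_matching(52.4, SETTLED))
    print("Grok   75.0%:", wins_matching(75.0, SETTLED))
    print("totals where 75.0% is possible:", totals_matching(75.0, SETTLED))
```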

What Mistakes Did the AI Make?

Gemini Occasionally Misjudges the Current Time

In the question "Will Trump's approval rating hit 35% in 2025?", Gemini stated that it was currently the first half of 2025, so anything was possible, and gave a random answer.

However, when the author directly asked Gemini to output the current time using a program, Gemini could provide the correct answer. It is still unclear why such an erroneous time perception occurred.

AI Lacks Depth of Thought

In the question "Gemini 3.0 Flash released by December 16?", Grok based its judgment on "official sources recently only mentioned Gemini 3 Pro and related 2.5 versions, with very little mention of 3 Flash, therefore evidence is insufficient to judge," considering only immediate information.

Gemini, by contrast, pointed out that "Gemini 1.0 was released in December 2023, and the experimental version of Gemini 2.0 Flash was launched in December 2024. Continuing this pattern, a 3.0 version release by the end of 2025 is logical," and also noted "a leaked demo about 'Gemini 3.0 Flash' circulating in online communities recently (December 14, 2025), further enhancing the possibility of its imminent public release."

Although Gemini's final answer turned out to be wrong, this question clearly shows the difference in the breadth of information the two models relied on.

AI Relies on Common Sense Rather Than Evidence + Logic for Inference

In the question "Trump approval Up or Down this week?", Gemini stated that "predicting the approval rating for a single week more than a year later is highly uncertain," first showing the "time misjudgment" issue again. Then Gemini said "in any ordinary week, the probability of events causing a slight decrease in support is likely slightly higher than the probability of positive events significantly boosting support," so a decrease in support is more likely. The generated conclusion was based solely on subjective common sense assumptions.

In this question, Grok based its judgment on news reports and polling data regarding "government shutdown, economic concerns, immigration policy disputes, and negative backlash from comments on Rob Reiner's death," which is the kind of evidence-based reasoning the prompt was designed to produce.

Incorrect Judgment of Settlement Conditions

In the question "Will Trump release the Epstein files by December 20?", both Gemini and Grok already knew that "the government will release 'hundreds of thousands of pages' of documents on Friday (December 19th)." The settlement conditions clearly stated "if the government publicly releases any files related to Epstein's illegal activities that were not public before the listed date, it will be judged as Yes."

However, under this condition, Gemini stated that "completing the release of 'all' files by December 20th is impossible," clearly misjudging the conditions required for settlement, thus giving the wrong answer.
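
The gap between the two readings can be made concrete with a small sketch. The `File` class, the sample data, and both function names are hypothetical and exist only to contrast an "any file" rule with an "all files" rule. The real settlement logic is whatever the market description specifies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class File:
    """Hypothetical record for one document or batch of documents."""
    label: str
    released_on: Optional[date]  # None means not released
    previously_public: bool


def released_by(f: File, deadline: date) -> bool:
    return f.released_on is not None and f.released_on <= deadline


def resolves_yes_any(files, deadline: date) -> bool:
    """Reading that matches the quoted rule: one newly public file is enough."""
    return any(released_by(f, deadline) and not f.previously_public for f in files)


def resolves_yes_all(files, deadline: date) -> bool:
    """The misreading: every file must be released by the deadline."""
    return all(released_by(f, deadline) for f in files)


if __name__ == "__main__":
    deadline = date(2025, 12, 20)
    # Illustrative data only: one batch released on December 19, the rest withheld.
    files = [
        File("batch released on December 19", date(2025, 12, 19), previously_public=False),
        File("remaining material", None, previously_public=False),
    ]
    print("any-file rule:", resolves_yes_any(files, deadline))
    print("all-files rule:", resolves_yes_all(files, deadline))
```

With the same facts, the two rules give opposite answers, which is the mistake described above: Gemini had the right information but applied the "all files" reading.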

Summary

In summary, Grok's prediction win rate has surpassed that of the smart money players who have made hundreds of thousands or even millions of dollars in prediction markets. However, a closer look at its prediction logic shows that there are still many areas that can be guided and corrected.
