AI Prediction Record: Want to Make Money in Prediction Markets with AI? It Might Not Even Have Read the Question Properly
This article draws on an experiment that compared AI predictions against human "smart money" on Polymarket to investigate whether AI can reliably profit in prediction markets. The author tested Google's Gemini 2.5 Pro and xAI's Grok (via OpenRouter), both with web search enabled, on 21 resolved non-crypto market questions.
The core finding is a split in performance: Grok achieved the highest win rate at 75%, ahead of the human traders (66.7%) and well ahead of Gemini (52.4%).
However, a closer look at the models' reasoning revealed critical flaws. Gemini frequently misjudged the current date, which led it to erroneous conclusions. Both models sometimes relied on superficial, commonsense assumptions instead of evidence-based reasoning. A major failure mode was misreading a market's specific settlement conditions, such as confusing "any files" with "all files" being released. While Grok's results are promising, the experiment concludes that AI often fails to grasp a question's nuances, exposing a significant gap between raw information retrieval and the contextual understanding that reliable prediction requires.
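The "any files" versus "all files" confusion is easy to underestimate, because both readings look at the same facts. The following Python sketch is purely illustrative and not taken from the experiment: the `FileStatus` record and the two settlement helpers are hypothetical, and it assumes a market whose outcome depends only on which files have been released. It shows how a single partial release settles YES under an "any" rule but NO under an "all" rule.

```python
# Hypothetical illustration (not from the experiment): the same release
# statuses settle a market differently under an "any" vs. an "all" rule.
from dataclasses import dataclass


@dataclass
class FileStatus:
    name: str       # hypothetical file identifier
    released: bool  # whether this file has been made public


def settles_yes_any(files: list[FileStatus]) -> bool:
    """YES if at least one file has been released."""
    return any(f.released for f in files)


def settles_yes_all(files: list[FileStatus]) -> bool:
    """YES only if every listed file has been released."""
    return all(f.released for f in files)


if __name__ == "__main__":
    # Hypothetical scenario: only one of three files has been released.
    files = [
        FileStatus("file_a", True),
        FileStatus("file_b", False),
        FileStatus("file_c", False),
    ]
    print("any-files rule:", "YES" if settles_yes_any(files) else "NO")
    print("all-files rule:", "YES" if settles_yes_all(files) else "NO")
```

With one of three files released, the first rule prints YES and the second prints NO, so a model that reads the wrong rule reaches the opposite answer from identical evidence. Note also that Python's `all()` returns `True` for an empty list, so an "all files" check would need an explicit guard if no files were listed.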
MarsBit, 01/04 08:59