OpenAI’s latest paper exposes the risks of AI in smart contracts

ambcrypto | Published 2026-02-19 | Updated 2026-02-19

Introduction

OpenAI's latest research paper highlights the dual role of AI in smart contract security, both as a tool for identifying vulnerabilities and as a potential threat capable of exploiting them. As smart contracts now manage over $400 billion in assets, their immutable nature makes security critical. To evaluate AI's capabilities, researchers developed EVMbench, a benchmark using 120 real vulnerabilities from 40 blockchain projects. The study found that frontier AI agents can successfully discover and exploit vulnerabilities end-to-end, with exploit success rates jumping from 31.9% to 72.2% in just six months. However, a recent incident involving Claude Opus 4.6 demonstrated significant risks when AI-generated code contained critical errors, leading to $1.78 million in losses. EVMbench has limitations, including a limited dataset, false positives, and an inability to fully replicate real-world conditions like cross-chain activity. The paper underscores the need for responsible AI development as smart contracts increasingly become tools for both innovation and cybercrime.

As smart contracts evolve from small experiments into major financial systems managing over $400 billion in assets, security has become increasingly critical.

Unlike traditional software, most blockchain programs cannot be changed after deployment, meaning even minor coding errors can cause permanent financial losses.

To evaluate how artificial intelligence performs in this high-risk environment, researchers from OpenAI, Paradigm, and OtterSec developed EVMbench.

Instead of simple test challenges, it uses 120 real vulnerabilities from 40 blockchain projects, making the evaluation closer to real-world conditions.

Describing the results, the OpenAI blog post noted,

“We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances.”

It further added,

“We release code, tasks, and tooling to support continued measurement of these capabilities and future work on security.”

Is AI actually reshaping smart contract security?

AI can greatly improve auditing and bug fixing, but it can also be used to exploit system weaknesses. EVMbench gives researchers a way to track these risks.

It also guides responsible AI development for high-value financial systems.

EVMbench tests AI agents in three stages: Detect, Patch, and Exploit.

Each stage represents a different level of technical difficulty, reflecting growing security responsibility.

The community appreciates this effort

Welcoming the release, one X user noted,

“This is a watershed moment for smart contract security. The jump from 31.9% to 72.2% exploit success in just 6 months shows AI agents aren’t just getting better at reading code—they’re mastering the full attack chain.”

Echoing similar sentiments, another user added,

“The 6× jump in exploit success is wild progress, but kinda worrying how fast offensive skills are scaling.”
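
For reference, the two success rates quoted above can be compared directly. The short sketch below uses only the 31.9% and 72.2% figures from the article and assumes both measure the same exploit task set. On those numbers the increase is roughly 2.3×, so the "6×" in the quote presumably refers to a comparison the article does not give.

```python
# Exploit success rates quoted in the article (percent).
before, after = 31.9, 72.2

# Relative improvement: how many times higher the later rate is.
ratio = after / before

# Absolute improvement in percentage points.
point_gain = after - before

print(f"{ratio:.2f}x relative, +{point_gain:.1f} percentage points")
```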

Recent incident that sent shockwaves

Yet despite this optimism, a troubling incident followed soon after OpenAI launched EVMbench. An exploit involving Claude Opus 4.6 raised serious concerns about the risks of “vibe-coded” smart contracts.

In this case, the AI helped write vulnerable Solidity code that incorrectly set the price of the cbETH asset at $1.12 instead of its real value of around $2,200, triggering liquidations and causing losses of nearly $1.78 million.

This shows that trusting AI with critical financial logic without careful human review can turn small mistakes into major losses.
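
One kind of safeguard that a human review might look for is a sanity check comparing a newly configured price against an independent reference before it is accepted. The sketch below is illustrative only: `check_price`, `PriceCheck`, and the 10% tolerance are hypothetical names and values chosen for this example, not details of the affected protocol or of any real oracle library. The two prices are the ones quoted in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceCheck:
    accepted: bool
    deviation: float  # relative gap between candidate and reference price


def check_price(candidate_usd: float, reference_usd: float,
                max_deviation: float = 0.10) -> PriceCheck:
    """Reject a candidate price that strays too far from a reference price.

    max_deviation is a hypothetical tolerance (10% by default).
    """
    if reference_usd <= 0:
        raise ValueError("reference price must be positive")
    deviation = abs(candidate_usd - reference_usd) / reference_usd
    return PriceCheck(accepted=deviation <= max_deviation, deviation=deviation)


# The mispriced value from the incident versus the approximate real value.
bad = check_price(1.12, 2200.0)
# A price close to the reference passes the same check.
good = check_price(2195.0, 2200.0)

print(bad.accepted, f"{bad.deviation:.2%}")
print(good.accepted)
```

With the article's figures, the $1.12 price sits more than 99% below the reference, so even a very loose tolerance would flag it before any liquidation logic ran.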

Limitations remain

EVMbench has clear limitations. It includes only 120 curated vulnerabilities and cannot evaluate newly discovered issues.

Detect Mode also produces false positives, while the small number of Patch and Exploit tasks reflects the heavy manual effort needed to create them.

In addition, the sandboxed environment fails to fully represent real-world conditions such as cross-chain activity, timing complexities, and long-term network history.

As blockchain adoption accelerates, its misuse is evolving just as quickly.

Recently, research by Group-IB also showed that the DeadLock ransomware is using Polygon smart contracts to conceal server infrastructure and evade detection.

Together, these developments signal a troubling shift where smart contracts, originally designed to enhance transparency and trust, are increasingly being repurposed as tools for cybercrime.


Final Summary

  • Tools like EVMbench help researchers measure AI capabilities in realistic security settings.
  • Limited datasets and controlled environments still fail to capture real-world blockchain complexity.

Related Questions

Q: What is EVMbench and what is its purpose in the context of AI and smart contracts?

A: EVMbench is a tool developed by researchers from OpenAI, Paradigm, and OtterSec to evaluate how artificial intelligence performs in the high-risk environment of smart contracts. It uses 120 real vulnerabilities from 40 blockchain projects to test AI agents, making the evaluation closer to real-world conditions. Its purpose is to help researchers track the risks of AI in smart contract security and guide responsible AI development for high-value financial systems.

Q: According to the article, what are the potential dual roles of AI in smart contract security?

A: The article states that AI can both greatly improve auditing and bug fixing in smart contracts, but it can also be used to exploit system weaknesses. This dual capability means AI can be a tool for both enhancing security and for conducting attacks.

Q: What was the concerning incident mentioned that involved the AI model Claude Opus 4.6?

A: An exploit involving Claude Opus 4.6 raised serious concerns. The AI helped write vulnerable Solidity code that incorrectly set the price of the cbETH asset at $1.12 instead of its real value of around $2,200. This error triggered liquidations and caused financial losses of nearly $1.78 million, demonstrating the risks of using AI for critical financial logic without careful human review.

Q: What are some of the limitations of the EVMbench tool as outlined in the article?

A: EVMbench has several limitations: it includes only 120 curated vulnerabilities and cannot evaluate newly discovered issues; its Detect Mode can produce false positives; the small number of Patch and Exploit tasks reflects the heavy manual effort required to create them; and its sandboxed environment fails to fully represent real-world conditions like cross-chain activity, timing complexities, and long-term network history.

Q: How did the community react to the release of EVMbench, as per the social media comments cited?

A: The community reaction, as cited from social media (X), was a mix of appreciation and concern. One user called it a 'watershed moment for smart contract security,' highlighting a jump in AI exploit success rates from 31.9% to 72.2% in six months. Another user expressed that the rapid progress was 'wild' but also 'kinda worrying,' noting how fast offensive AI skills are scaling.
