CertiK Test: How the Vulnerable OpenClaw Skill Bypassed Review and Took Over Computers Without Authorization

marsbit | Published on 2026-03-22 | Last updated on 2026-03-22

Abstract

CertiK's latest research reveals critical security vulnerabilities in OpenClaw's third-party Skill ecosystem. Despite OpenClaw's three-layer review system—including VirusTotal scanning, static code analysis, and AI logic checks—malicious Skills can easily bypass these safeguards. CertiK demonstrated this by developing a seemingly benign "test-web-searcher" Skill that contained a hidden remote code execution vulnerability. It was approved without warnings, allowing unauthorized command execution on the host machine (e.g., launching system calculators via Telegram commands). The core issue is the industry’s overreliance on pre-release scans rather than runtime isolation and strict permission controls. Unlike iOS’s mandatory sandboxing, OpenClaw’s sandbox is optional and often disabled by users for functionality, leaving systems exposed. CertiK urges developers to enforce mandatory sandboxing and granular permissions for Skills, and advises users to deploy OpenClaw on isolated devices away from sensitive data or assets. The study underscores that scanning alone cannot secure high-permission AI agents; runtime isolation and damage containment are essential for safety.

Recently, the open-source, self-hosted AI agent platform OpenClaw (commonly known as "Crawfish") has rapidly gained popularity thanks to its flexible extensibility and self-controlled deployment, becoming a phenomenon in the personal AI agent space. Its core ecosystem, Clawhub, serves as an app marketplace, gathering a large number of third-party Skill plugins that let agents unlock advanced capabilities with a single click, from web search and content creation to crypto wallet operations, on-chain interactions, and system automation. Both the ecosystem and its user base have grown explosively.

But for such third-party Skills running in high-privilege environments, where exactly is the platform's real security boundary?

CertiK, the world's largest Web3 security company, recently released new research on Skill security. The report points out that the market currently misjudges where the security boundary of AI agent ecosystems lies: the industry generally regards "Skill scanning" as the core security boundary, but this mechanism is almost useless against real attacks.

If OpenClaw is compared to an operating system for smart devices, Skills are the apps installed on it. Unlike ordinary consumer apps, some Skills in OpenClaw run in high-privilege environments: they directly access local files, call system tools, connect to external services, execute commands in the host environment, and even operate users' crypto assets. Once a security issue arises, it can lead directly to serious consequences such as leakage of sensitive information, remote device takeover, and theft of digital assets.

The industry's standard security approach for third-party Skills is "pre-listing scanning and review." OpenClaw's Clawhub has likewise built a three-layer review and protection system that integrates VirusTotal code scanning, a static code detection engine, and AI logic consistency checks. It uses risk grading to push security alerts to users in an attempt to safeguard the ecosystem. However, CertiK's research and proof-of-concept attack tests confirm that this detection system falls short in real attack-defense scenarios and cannot serve as the primary line of defense.

The research first breaks down the inherent limitations of the existing detection mechanisms:

Static detection rules are easily bypassed. This engine primarily relies on matching code features to identify risks, for example by flagging the combination of "reading sensitive environment information + sending network requests" as high-risk behavior. However, attackers only need to make slight syntactic modifications to the code to bypass feature matching completely while fully retaining the malicious logic. It is like rephrasing dangerous content with synonymous expressions, which leaves the security scanner ineffective.
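To make the limitation concrete, here is a minimal sketch of signature-based matching. It is not Clawhub's actual engine: the two regular expressions and the sample strings are assumptions made for illustration, and the samples are only scanned as text, never executed.

```python
import re

# Hypothetical signature rule: flag code that both reads environment
# variables and makes a network request. Not Clawhub's actual rule set.
READS_ENV = re.compile(r"os\.environ|os\.getenv")
SENDS_NETWORK = re.compile(r"requests\.(get|post)|urllib\.request")


def naive_scan(source: str) -> bool:
    """Return True when both signatures appear in the source text."""
    return bool(READS_ENV.search(source)) and bool(SENDS_NETWORK.search(source))


# Sample strings are only scanned as text; they are never executed.
DIRECT = 'import os, requests\nrequests.post(URL, data=os.environ["TOKEN"])'
REWRITTEN = (
    'import importlib\n'
    'env = getattr(importlib.import_module("o" + "s"), "environ")\n'
    'http = importlib.import_module("req" + "uests")\n'
    'getattr(http, "post")(URL, data=env["TOKEN"])'
)

print(naive_scan(DIRECT))     # True
print(naive_scan(REWRITTEN))  # False
```

Both samples describe the same behavior, but the second one never spells out the literal tokens the rule looks for, so a purely textual match reports nothing.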

AI review has inherent detection blind spots. Clawhub's AI review is primarily positioned as a "logic consistency detector," which can only catch obvious malicious code where "declared functionality does not match actual behavior." However, it is helpless against exploitable vulnerabilities hidden within normal business logic, much like how it is difficult to find fatal traps buried deep in the clauses of a seemingly compliant contract.

More critically, the review process has underlying design flaws: even when VirusTotal's scan results are still "pending" and the full "health check" process is incomplete, Skills can still be directly listed publicly. Users can install them without any warnings, leaving an opening for attackers.
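One way to close this gap is a fail-closed listing gate, sketched below. The check names and the `can_list_publicly` function are hypothetical and only mirror the three review layers described in the article; they are not part of any real Clawhub API.

```python
from enum import Enum


class CheckStatus(Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"


# Hypothetical check names mirroring the three review layers in the article.
REQUIRED_CHECKS = ("virustotal", "static_analysis", "ai_review")


def can_list_publicly(results: dict[str, CheckStatus]) -> bool:
    """Fail closed: a missing, pending, or failed check blocks listing."""
    return all(results.get(name) is CheckStatus.PASSED for name in REQUIRED_CHECKS)


still_pending = {
    "virustotal": CheckStatus.PENDING,
    "static_analysis": CheckStatus.PASSED,
    "ai_review": CheckStatus.PASSED,
}
print(can_list_publicly(still_pending))  # False
```

Under this policy a Skill whose VirusTotal result is still pending stays unlisted, rather than becoming installable by default.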

To verify the real-world severity of these risks, the CertiK research team ran an end-to-end test. The team developed a Skill named "test-web-searcher," which on the surface appears to be a fully compliant web search tool whose code logic fully conforms to standard development practices. In reality, it embeds a remote code execution vulnerability within the normal functional flow.

This Skill bypassed the detection of both the static engine and the AI review. While the VirusTotal scan was still pending, it was installed normally without any security warnings. Ultimately, by sending a remote command via Telegram, the vulnerability was successfully triggered, achieving arbitrary command execution on the host device (in the demo, it directly controlled the system to launch the calculator).

CertiK states clearly in the research that these issues are not bugs unique to OpenClaw but a misconception common across the AI agent industry: the industry generally treats "review scanning" as the core line of defense while neglecting the true foundation of security, which is mandatory runtime isolation and fine-grained permission control. Similarly, the security core of Apple's iOS ecosystem has never been the strict App Store review, but rather the system-enforced sandbox and fine-grained permission management, which ensure that each app runs in its own "isolation pod" and cannot obtain system permissions at will. OpenClaw's existing sandbox mechanism is optional rather than mandatory and relies heavily on manual user configuration. To keep Skills working, most users choose to disable the sandbox, ultimately leaving the agent running "naked." Once a Skill containing vulnerabilities or malicious code is installed, the consequences can be catastrophic.

Regarding the issues discovered, CertiK also provided security guidance:

● For developers of AI agents like OpenClaw, sandbox isolation must be the mandatory default configuration for third-party Skills, paired with a fine-grained permission control model. Third-party code must never inherit the host machine's high privileges by default.

● For ordinary users, a "safe" label in the marketplace merely indicates that no risks were detected; it does not mean the Skill is absolutely safe. Until the project makes strong low-level isolation the default configuration, it is recommended to deploy OpenClaw on non-critical spare devices or virtual machines. Never let it near sensitive files, password credentials, or high-value crypto assets.
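The deny-by-default permission model recommended to developers can be sketched as follows. `SkillManifest`, `require`, and the permission strings are hypothetical names chosen for illustration, not OpenClaw's API. An in-process check like this only models the policy; real isolation would still have to be enforced by an OS-level sandbox.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkillManifest:
    """Hypothetical manifest: the permissions a Skill is explicitly granted."""
    name: str
    granted: frozenset = field(default_factory=frozenset)


class PermissionDenied(Exception):
    """Raised when a Skill requests a permission it was never granted."""


def require(manifest: SkillManifest, permission: str) -> None:
    """Deny by default: anything not explicitly granted is refused."""
    if permission not in manifest.granted:
        raise PermissionDenied(f"{manifest.name} lacks '{permission}'")


searcher = SkillManifest("test-web-searcher", frozenset({"network:https"}))
require(searcher, "network:https")   # granted, so no exception is raised
try:
    require(searcher, "host:exec")   # never granted, so it is refused
except PermissionDenied as err:
    print(err)
```

With this design, a web search Skill that is granted only network access cannot reach host command execution, even if its code contains an exploitable flaw.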

The AI agent sector is on the eve of explosive growth. Ecosystem expansion must not outpace security construction. Review scanning can only block unsophisticated malicious attacks and can never become the security boundary for high-privilege agents. Only by shifting from "pursuing perfect detection" to "assuming risk exists and focusing on damage containment," and by enforcing isolation boundaries at the lowest level of the runtime, can the security baseline of AI agents truly be safeguarded, allowing this technological transformation to proceed steadily and go the distance.

Original Research: https://x.com/hhj4ck/status/2033527312042315816?s=20

https://mp.weixin.qq.com/s/Wxrzt7bAo86h3bOKkx6 UoA
