OpenClaw Goes Viral, Exposing 12 Types of Critical Vulnerabilities; MCP Protocol Security Benchmark Released

marsbit · Published 2026-04-16 · Last updated 2026-04-16

Introduction

The rapid rise of OpenClaw and similar AI Agents highlights the growing security risks associated with the Model Context Protocol (MCP), a standard that lets models interact with external tools. Researchers from Beijing University of Posts and Telecommunications introduced MSB (MCP Security Bench), a security benchmark covering 12 types of attacks across MCP's three core stages: task planning, tool invocation, and response handling. These include name collision, false errors, retrieval injection, and mixed attacks. Across the models tested, the average attack success rate (ASR) was 40.35%, and more capable models were often more vulnerable. The study also proposes a new metric, Net Resilient Performance (NRP), to balance security and utility. MSB evaluates agents in real-world environments and shows that attacks remain effective even when harmless tools are present. The work underscores the urgent need for robust safety measures as AI agents gain broader tool-use capabilities.

MCP is enabling AI Agents to execute tasks autonomously, but the security risks are rising with it. Research shows that attackers can trick Agents into performing malicious operations through 12 types of attack, including tool name collision and false errors, and that even top-tier models fall victim. A team from Beijing University of Posts and Telecommunications has released the MSB security benchmark, whose real-environment testing shows that more powerful models tend to be more vulnerable to attack. The new NRP metric balances security and practicality for the first time, providing a crucial yardstick for hardening AI Agent defenses.

Recently, open-source AI Agent projects like OpenClaw have exploded in popularity within the developer community. With just a single sentence, an Agent can automatically write code, research information, manipulate local files, and even take control of your computer.

The astonishing autonomy of these Agents is underpinned by the capabilities provided by tool calls, with MCP (Model Context Protocol) serving as the interface unifying the AI tool ecosystem. Just as USB-C allows computers to connect to various devices, MCP enables large language models (LLMs) to call external tools like file systems, browsers, and databases in a standardized way.

Faced with such a vast ecosystem, even OpenClaw, which primarily focuses on native command-line operation, has integrated an adapter to connect to MCP, gaining access to a broader range of tool capabilities.

However, as AI's "hands" reach further, danger follows. What if the tool the Agent calls has been poisoned by a hacker? What if the error message returned by the tool contains hidden malicious instructions?

When the LLM unsuspectingly executes these instructions, your private data, local files, and even server permissions become easy prey for hackers.

To fill the gap in security assessment for the MCP ecosystem, a research team from institutions including Beijing University of Posts and Telecommunications introduced a dedicated security benchmark for MCP: MSB (MCP Security Bench). The study found that attacks targeting every stage of the MCP workflow are effective, and that more powerful models tend to be more vulnerable to attack. The paper has been accepted to ICLR 2026.

Paper link: https://openreview.net/pdf?id=irxxkFMrry

Code: https://github.com/dongsenzhang/MSB

MCP Security Risks Behind Agents

Figure 1: MCP Attack Framework

MCP significantly expands the capabilities of Agents, but it also significantly broadens the attack surface. Under the MCP framework, the Agent's tool invocation process typically involves three stages:

1. Task Planning: The Agent selects an appropriate tool based on the user query, using the tool's name and description.

2. Tool Invocation: The Agent sends a request to the selected tool, passing relevant parameters to perform specific operations.

3. Response Handling: The Agent parses the tool's response and continues reasoning or generates a final answer based on it.
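The three stages above can be sketched as a toy loop. This is only an illustration under stated assumptions: the `Tool` class, the word-overlap planner, and the two example tools are hypothetical stand-ins, not part of MCP or MSB, and a real Agent would delegate planning and response handling to the LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    """Hypothetical stand-in for a tool exposed through MCP."""
    name: str
    description: str
    run: Callable[[dict], str]


def plan(query: str, tools: Dict[str, Tool]) -> Tool:
    """Stage 1, task planning: pick a tool from its name and description.

    A real Agent asks the LLM; this toy version just counts shared words.
    """
    words = set(query.lower().split())

    def overlap(tool: Tool) -> int:
        text = f"{tool.name.replace('_', ' ')} {tool.description}".lower()
        return len(words & set(text.split()))

    return max(tools.values(), key=overlap)


def invoke(tool: Tool, params: dict) -> str:
    """Stage 2, tool invocation: pass parameters to the selected tool."""
    return tool.run(params)


def handle(response: str) -> str:
    """Stage 3, response handling: fold the tool output into the answer."""
    return f"Answer based on tool output: {response}"


# Hypothetical toolkit used only for this illustration.
tools = {
    "read_file": Tool("read_file", "read a file from disk",
                      lambda p: f"contents of {p['path']}"),
    "web_search": Tool("web_search", "search the web for pages",
                       lambda p: f"results for {p['query']}"),
}

selected = plan("read the file notes.txt", tools)
answer = handle(invoke(selected, {"path": "notes.txt"}))
print(selected.name)
print(answer)
```

Each function consumes text that a tool provider controls: names and descriptions in `plan`, the parameter schema in `invoke`, and the returned string in `handle`.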

Each stage can become a new attack vector. MSB covers the complete MCP tool invocation stages and is specifically designed to evaluate the security of Agents based on MCP tool usage. It has three core highlights:

MCP Attack Taxonomy

In the MCP workflow, Agents interact with tools through tool identifiers (names and descriptions), parameters, and tool responses, all of which can become attack pathways. MSB classifies attack types based on these pathways and interaction stages:

Tool Signature Attack: Attacks during the task planning phase, utilizing tool names and descriptions, including:

Name Collision (NC): Creating a malicious tool with a name similar to an official tool to induce the Agent to select it.

Preference Manipulation (PM): Injecting promotional statements into the tool description to induce the Agent to select it.

Prompt Injection (PI): Injecting malicious instructions into the tool description.

Tool Parameter Attack: Attacks during the tool invocation phase, utilizing tool parameters, including:

Out-of-Scope Parameter (OP): Setting tool parameters that exceed normal functionality, potentially causing information leakage through parameter passing.

Tool Response Attack: Attacks during the response handling phase, utilizing tool responses, including:

User Impersonation (UI): Impersonating the user to issue malicious instructions.

False Error (FE): Returning a fake tool execution error that tells the Agent it must follow malicious instructions before the tool call can succeed.

Tool Transfer (TT): Instructing the Agent to call a malicious tool.

Retrieval Injection Attack: Attacks during the response handling phase, utilizing external resources, including:

Retrieval Injection (RI): Embedding malicious instructions in external resources, which then corrupt the context through the tool response.

Mixed Attack: Attacks across multiple stages, simultaneously utilizing multiple tool components, including combinations of the above attacks.
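For readers who want the taxonomy in machine-readable form, the single attacks listed above can be written as a small lookup table. This is a sketch, not MSB's own data model: the `Stage` enum, the `SINGLE_ATTACKS` dictionary, and the `attacks_in` helper are names invented here. The mixed attacks are left out because the article does not list which combinations MSB uses.

```python
from enum import Enum


class Stage(Enum):
    TASK_PLANNING = "task planning"
    TOOL_INVOCATION = "tool invocation"
    RESPONSE_HANDLING = "response handling"


# abbreviation -> (attack name, category, stage), as listed in the article.
SINGLE_ATTACKS = {
    "NC": ("Name Collision", "Tool Signature Attack", Stage.TASK_PLANNING),
    "PM": ("Preference Manipulation", "Tool Signature Attack", Stage.TASK_PLANNING),
    "PI": ("Prompt Injection", "Tool Signature Attack", Stage.TASK_PLANNING),
    "OP": ("Out-of-Scope Parameter", "Tool Parameter Attack", Stage.TOOL_INVOCATION),
    "UI": ("User Impersonation", "Tool Response Attack", Stage.RESPONSE_HANDLING),
    "FE": ("False Error", "Tool Response Attack", Stage.RESPONSE_HANDLING),
    "TT": ("Tool Transfer", "Tool Response Attack", Stage.RESPONSE_HANDLING),
    "RI": ("Retrieval Injection", "Retrieval Injection Attack", Stage.RESPONSE_HANDLING),
}


def attacks_in(stage: Stage) -> list:
    """Return the sorted abbreviations of single attacks that target a stage."""
    return sorted(abbr for abbr, (_, _, s) in SINGLE_ATTACKS.items() if s is stage)


for stage in Stage:
    print(stage.value, attacks_in(stage))
```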

Execution Suite in Real Environments

MSB does not rely on simulated, on-paper evaluation. It ships with real MCP servers and covers 10 real-world scenarios, 405 real tools, and 2,000 attack instances. Every instance executes real tools through MCP, so the benchmark reflects actual operating environments and makes it possible to directly observe the damage an attack causes to the environment.

NRP Metric Balancing Performance and Security

In Agent security assessment, relying solely on the Attack Success Rate (ASR) can be misleading. If an Agent refuses to execute any tool calls in order to avoid risk, its ASR may be close to 0, but it will also fail to complete user tasks and so lose its practical value.

To address this, MSB proposes the Net Resilient Performance (NRP) metric:

NRP = PUA ⋅ (1 − ASR)

Where PUA (Performance Under Attack) is the proportion of user tasks the Agent completes in an adversarial environment, and ASR is the attack success rate. NRP aims to assess the overall risk resilience of an Agent in resisting attacks while maintaining performance, providing a comprehensive quantitative standard that balances performance and security.
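A short calculation makes the effect of the formula concrete. The function below implements NRP = PUA ⋅ (1 − ASR) exactly as defined above. The three Agents and their scores are hypothetical values chosen for illustration, not results from the paper.

```python
def nrp(pua: float, asr: float) -> float:
    """Net Resilient Performance: NRP = PUA * (1 - ASR), both inputs in [0, 1]."""
    for name, value in (("pua", pua), ("asr", asr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return pua * (1.0 - asr)


# Hypothetical Agents, for illustration only (not figures from the paper).
refuses_everything = nrp(pua=0.0, asr=0.0)    # never attacked successfully, never useful
capable_but_exposed = nrp(pua=0.9, asr=0.6)   # roughly 0.36
cautious_but_useful = nrp(pua=0.7, asr=0.2)   # roughly 0.56

print(refuses_everything, capable_but_exposed, cautious_but_useful)
```

The Agent that refuses every tool call has a perfect ASR of 0 but scores an NRP of 0, while the Agent that completes fewer tasks but resists more attacks ranks highest.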

Figure 2: NRP vs ASR, NRP vs PUA.

All Attack Methods Are Effective

Figure 3: Main experimental results.

The research team conducted large-scale tests on 10 mainstream models, including GPT-5, DeepSeek-V3.1, Claude 4 Sonnet, and Qwen3, using MSB. All attack methods demonstrated effectiveness, with an overall average ASR of 40.35%. Among them, novel attacks introduced by MCP are more aggressive; compared to PI and RI attacks that already exist in function calling, MCP-based attacks like UI and FE have higher success rates. Mixed attacks show synergistic enhancement, with their success rates higher than those of their constituent single attacks.

More Powerful Models Are More Vulnerable

The relationship between different metrics reveals a counterintuitive conclusion: the more capable the model, the more vulnerable it tends to be.

Figure 4: PUA vs ASR.

In MSB, completing attack tasks still requires the Agent to call tools, such as using a file read tool to obtain personal information. LLMs with higher utility, due to their superior tool calling and instruction-following capabilities, exhibit higher ASR. This finding highlights the significant practical risk of MCP security vulnerabilities.

Full-Stage, Multi-Tool Environmental Compromise

Figure 5: ASR across different stages and tool configurations.

Further analysis from the perspective of the MCP workflow and tool configuration reveals that Agents are vulnerable to attacks at all stages of MCP, with security being lowest during the tool invocation phase.

Furthermore, attacks remain effective even in multi-tool environments that contain harmless tools. Real-world scenarios often give Agents a whole toolkit; even when harmless tools are present, inducement-based attacks such as NC, PM, and TT can still achieve significant attack success.
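To give a feel for why a Name Collision tool can hide among harmless ones, the sketch below measures how close a lookalike name can be to an official one. The tool names and the 0.9 threshold are hypothetical, and the similarity measure is Python's standard-library `difflib`. None of this is taken from MSB.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two tool names (1.0 means identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


official = "read_file"                                # hypothetical official tool
toolkit = ["read_files", "send_email", "web_search"]  # hypothetical other tools

# Flag tools whose names are nearly identical to the official one.
lookalikes = [name for name in toolkit
              if name != official and name_similarity(official, name) > 0.9]

for name in toolkit:
    print(name, round(name_similarity(official, name), 3))
print("lookalikes:", lookalikes)
```

Only `read_files` crosses the threshold here, differing from the official name by a single character.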

Conclusion

The viral success of OpenClaw has given people a clear glimpse of the future of Agents: LLMs are no longer just answering questions but are starting to actually do things. MSB was proposed in precisely this context. It systematically maps potential attack surfaces within the MCP ecosystem and provides a reproducible, quantifiable evaluation benchmark for Agent security research.

Past LLM security research primarily focused on linguistic-level risks like prompt injection. MSB demonstrates that as AI calls tools and interacts with real systems, the attack surface is expanding from text space to the tool ecosystem. As Agents gradually become the new paradigm for AI applications, security might be the threshold that must be crossed for this technological leap.

References:

https://openreview.net/pdf?id=irxxkFMrry

This article is from the WeChat public account "新智元" (New Zhiyuan), author: 新智元

Related Questions

Q: What is the MCP Security Bench (MSB) and what are its key features?

A: The MCP Security Bench (MSB) is a security benchmark developed by a research team from Beijing University of Posts and Telecommunications to evaluate the security of AI Agents using the Model Context Protocol (MCP). Its key features include a comprehensive MCP attack classification system covering all tool invocation phases, a real-world execution suite with 10 scenarios and 2,000 attack instances, and a novel Net Resilient Performance (NRP) metric that balances security and performance.

Q: What is the Net Resilient Performance (NRP) metric and why is it important?

A: The Net Resilient Performance (NRP) metric is a new evaluation standard proposed in the MSB benchmark. It is calculated as NRP = PUA * (1 - ASR), where PUA (Performance Under Attack) is the proportion of user tasks the Agent completes in an adversarial environment, and ASR is the Attack Success Rate. NRP is important because it provides a comprehensive measure of an Agent's overall risk resilience by balancing its ability to maintain performance while resisting attacks, preventing misleadingly low ASR scores from Agents that simply refuse to perform any tasks.

Q: According to the research, what is the relationship between model capability and vulnerability to MCP attacks?

A: The research found a counterintuitive relationship: more powerful and capable models are often more vulnerable to MCP attacks. This is because higher-performing models possess superior tool invocation and instruction-following capabilities, which attackers can exploit. In the MSB tests, these models showed a higher Attack Success Rate (ASR) as they were more likely to successfully execute the malicious operations requested by the attacker through the compromised tools.

Q: Name three types of attacks classified in the MCP attack framework and the phase they target.

A:

1. Name Collision (NC): A Tool Signature Attack that targets the Task Planning phase by creating a malicious tool with a name similar to an official one to induce the Agent to select it.

2. False Error (FE): A Tool Response Attack that targets the Response Handling phase by providing false tool execution error messages containing malicious instructions.

3. Out-of-Scope Parameter (OP): A Tool Parameter Attack that targets the Tool Invocation phase by setting tool parameters that are beyond their normal functionality to cause information leaks.

Q: What overall finding did the MSB benchmark reveal about the effectiveness of attacks across the MCP workflow?

A: The MSB benchmark revealed that all 12 classified attack methods were effective, with an overall average Attack Success Rate (ASR) of 40.35%. It found that attacks are effective at every stage of the MCP workflow (Task Planning, Tool Invocation, and Response Handling), with the Tool Invocation phase being the most vulnerable. Furthermore, attacks remained effective even in multi-tool environments that included harmless tools, and mixed attacks combining multiple techniques showed a synergistic effect, resulting in higher success rates.
