Capturing 15 Top-Tier Zero-Day Vulnerabilities: A Consensus Protocol Debug Agent Framework Built by 0G Labs in Collaboration with Teams from NUS, PKU, and BUPT

marsbit | Published 2026-06-11 | Updated 2026-06-11

Summary

"Agents Capture 15 Critical Zero-Day Bugs: 0G Lab's Multi-Agent Framework Automates Debugging in Consensus Protocols" Distributed consensus protocols are notoriously difficult to debug due to complex, intertwined states. A novel framework, Agora, developed by 0G Labs with researchers from NUS, Peking University, and Beijing University of Posts and Telecommunications, tackles this by fusing deep domain expertise with a collaborative multi-agent LLM architecture. Agora moves beyond the limitations of single LLMs and traditional testing like fuzzing. It employs three specialized agents: an Orchestrator for global state, a Strategy agent for generating attack scenarios using distributed systems knowledge, and a TestGen agent that creates executable tests. A core innovation is its efficient "Succinct Memory & Communication" mechanism and a dynamic test harness. This allows the system to translate abstract hypotheses into concrete tests across languages like Go and Rust, run them, capture failures, and refine the approach in a closed loop—all with minimal token overhead. In rigorous evaluations on production-level protocols including Raft, EPaxos, and components from etcd and Sui, Agora discovered 15 previously unknown deep logic bugs (e.g., execution divergence, liveness violations). In stark contrast, powerful standalone LLMs like GPT-5.2 and Claude 4.5 found zero such bugs. Agora achieved this with a high precision of 73.9% and at an average cost of only about $40 per bug fou...

The "Holy Grail" of distributed systems—consensus protocols—has long been a "Bug Hell" for top-tier infrastructure engineers. Due to their extremely complex states and intertwined multi-node interactions, traditional testing and monolithic LLMs are almost powerless against hardcore Deep Bugs (deep logical vulnerabilities).

Recently, in a paper accepted at the upcoming ICML 2026, researchers from 0G Labs and top academic-industry teams including the National University of Singapore, Peking University, and Beijing University of Posts and Telecommunications proposed Agora—the first automated testing framework that deeply integrates domain knowledge with large language model multi-agent collaboration.

Through an architecture aimed squarely at the pain points of consensus protocols, the framework captured 15 previously unknown protocol-level Deep Bugs in core industrial and academic protocols such as Raft, EPaxos, HotStuff, and BullShark. In stark contrast, leading standalone large models such as GPT-5.2 and Claude 4.5 all scored zero. As multi-agent systems and "Agentic Quality Control" become the hottest tracks of 2026, Agora delivers not just a paper but a practical, industrial-grade solution.

Paper: "Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents"

1. Background: A Powerful Alliance between 0G and NUS, Merging Long-Term System Knowledge with the Cross-Generational Multi-Agent Paradigm

The evolution of distributed consensus protocols is both a history of genius innovation and a bloody chronicle of pitfalls encountered by countless top engineers. As Turing Award winner Lamport stated, ensuring the correctness of distributed protocol implementations is as challenging as navigating a constantly shaking maze blindfolded. On this "hellish" track, the market is quietly shifting. According to Gartner observations, enterprise consulting demand for multi-agent systems has surged more than tenfold in just over a year, and the multi-agent platform market is entering a period of rapid expansion, nearly doubling annually. Using "multi-agent collaboration" for the most hardcore low-level system verification is turning from a frontier concept into an industry necessity.

Facing this challenge, well-funded tech giants were the first to explore capital-intensive approaches. For example, Anthropic's recent internal Glasswing project within Claude Code attempted to use agents for low-level infrastructure testing, but its architecture still relies heavily on top-tier commercial large models, project details remain vague, and collaboration is closed-door and limited to a handful of large institutions and multinational corporations. More critically, such giant-led solutions may consume enormous numbers of tokens in operation. This high computational barrier and capital-intensive approach shut out startups and SMEs with limited budgets.

Are smaller companies and open-source communities doomed to be unable to afford top-tier automated vulnerability auditing tools?

Engineers from 0G Labs, collaborating with Xiang Liu from the National University of Singapore, Sa Song and Yong Sun from Beijing University of Posts and Telecommunications, and Ph.D. student Zhao-wei Zhang and researcher Ce-yao Zhang from Peking University's School of Intelligence, applied their deep knowledge of agents to systems testing, launching a disruptive "David vs. Goliath" innovation. Their work has been accepted at ICML 2026, a top AI conference.

The academic world's "long-term accumulation of system knowledge" meets the industry's "pain points and keen insight." How can this ignite the next revolution in system security?

As a leader in modular AI infrastructure and high-performance decentralized data availability networks, the 0G team has accumulated extensive production-level attack and defense experience, along with real-world protocol defect samples, from implementing blockchain consensus protocols and high-concurrency BFT (Byzantine Fault Tolerance) architectures in industry. The academic team, for its part, has deep expertise in high-performance distributed systems, low-level concurrency control, and formal verification. Both are keenly aware that traditional methods such as fuzzing often struggle with state-space explosion on industrial-scale codebases.

The researchers decided to infuse the "soul" of the work, their long-accumulated knowledge of reasoning about global invariants in distributed systems, into the cutting-edge multi-agent collaboration paradigm and an automated harness architecture, launching the open-source and accessible Agora framework.

This cross-domain fusion fundamentally changes the game: it is neither blind brute-force testing nor large models "fumbling in the dark" without domain knowledge. Instead, through specialized agent roles, it transforms the decades of logical deduction intuition from seasoned system experts into strategic interaction and collaboration among agents, thereby acquiring the hardcore capability to outperform traditional testing tools.

Unlike Glasswing's capital-intensive approach, which consumes large volumes of expensive top-tier tokens, Agora offers a highly accessible alternative for SMEs. It shows that even with a cheaper, "slightly inferior" base model, a well-designed domain-aware multi-agent architecture can still unearth hardcore Deep Bugs!

2. Pain Point: Monolithic LLMs Struggle to Break Through, Distributed Systems Hang Under the "Damocles' Sword" of Deep Logic

In today's world dominated by big data, blockchain, and distributed databases, consensus protocols (like Paxos, Raft, PBFT, etc.) form the foundational bedrock of the entire digital world. However, implementing consensus protocols is notoriously "hellishly difficult." Even industrial-grade benchmark projects like etcd, honed by countless top engineers worldwide over years of operation, still harbor Deep Bugs (deep logical vulnerabilities) that send chills down one's spine.

These vulnerabilities differ from ordinary low-level implementation bugs such as memory leaks or integer overflows. They span multiple execution phases and depend on complex concurrent states. If maliciously triggered, they can not only corrupt core data but also lead to catastrophic financial losses.
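One way to picture the difference: a deep bug usually shows up as a broken protocol-wide property rather than as a crash in a single function. The sketch below is purely illustrative and is not taken from Agora. It assumes a toy key-value state machine and two hypothetical checks, one for monotonicity (a commit index must never go backwards) and one for execution divergence (replicas must end in the same state).

```python
from typing import Dict, List, Optional, Tuple


def first_monotonicity_violation(values: List[int]) -> Optional[int]:
    """Return the index of the first value lower than its predecessor, or None."""
    for i in range(1, len(values)):
        if values[i] < values[i - 1]:
            return i
    return None


def apply_log(log: List[Tuple[str, int]]) -> Dict[str, int]:
    """Toy state machine (an assumption): each entry is a (key, value) write."""
    state: Dict[str, int] = {}
    for key, value in log:
        state[key] = value
    return state


def diverged(log_a: List[Tuple[str, int]], log_b: List[Tuple[str, int]]) -> bool:
    """True if applying the two logs leaves the replicas in different states."""
    return apply_log(log_a) != apply_log(log_b)


# Hypothetical observations from a test run.
commit_index_trace = [0, 1, 3, 2, 4]       # the index drops from 3 to 2
replica_a = [("x", 1), ("x", 2)]
replica_b = [("x", 2), ("x", 1)]           # same writes, different order

print(first_monotonicity_violation(commit_index_trace))  # 3
print(diverged(replica_a, replica_b))                     # True
```

Neither violation is visible from one function in isolation: each only appears when a sequence of events across nodes is examined as a whole.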

While Large Language Models (LLMs), hugely popular in recent years, have shown promise in general code analysis, they fall well short when facing distributed consensus. At best they find shallow defects in local code. Confronted with protocol-level logical vulnerabilities that depend on global state, monolithic LLMs get bogged down in local code and cannot perform global temporal reasoning.

3. The Breakthrough: Agora's Three-Agent Paradigm and Core Harness Architecture

To break this deadlock, Agora is the first to introduce the classic academic paradigm of Hypothesis-Driven Testing (HDT) into large model agent systems. To achieve efficient global reasoning, Agora completely abandons the traditional "lone wolf" mode, elegantly decoupling the workflow into three highly specialized agents with distinct roles:

Orchestrator Agent: Maintains the global state and handles "exploitation" of known bugs, extrapolating from vulnerabilities already found to look for related ones.

Strategy Agent: Injects distributed-systems domain knowledge and generates highly aggressive anomalous scenarios tailored for crash-fault-tolerant (CFT) and Byzantine-fault-tolerant (BFT) protocols.

TestGen Agent: The practical executor, which turns hypotheses into runnable tests. What lets Agora generate effective tests in a closed loop is its core automated testing architecture.

The architecture is illustrated in the following diagram:

In Agora's overall design, this "David vs. Goliath" accessible magic does not come out of thin air; it stems from the deep integration of its ingenious agent interaction mechanisms and the testing harness architecture.

The research team designed a deliberately succinct communication and memory mechanism (Succinct Memory & Communication) for the framework. It keeps each agent focused on its core task while minimizing the redundant context passed between agents. Under this tight communication budget, the Orchestrator Agent (responsible for global coordination and state control), the Strategy Agent (responsible for generating distributed anomalous environments and scenarios), and the TestGen Agent (responsible for code testing and dynamic evaluation) work together to drive the Harness architecture:

Automated Closed-Loop Synergy: When the Strategy Agent deduces an abstract distributed attack scenario, relying on the highly decoupled interaction framework, the TestGen Agent can immediately launch the underlying test harness. This architecture not only possesses strong environmental adaptability, capable of spanning different programming language environments like Go and Rust to translate attack hypotheses into real, runnable unit tests, but also incorporates efficient reflection-loop technology.

Once a test throws an error during execution, the system captures the call stack and execution logs in real time and feeds them back to the agents in condensed form for targeted self-correction. This combination of "minimal multi-agent interaction + dynamic harness closed loop" not only allows Agora to capture the most elusive deep logical bugs at extremely low token cost but also produces detailed analysis reports with very low false-positive rates.

The final operational overview is illustrated in the following diagram:

4. Results: Capturing 15 Top-Tier Zero-Day Deep Bugs, Baseline Large Models Score Zero

The evaluation results are striking. The research team conducted a comprehensive assessment on four well-known consensus protocol libraries (including production-grade etcd and core components of the emerging public chain Sui), comparing against top-tier models such as GPT-5.2, Gemini 3.0 Pro Preview, Claude Sonnet 4.5, and Qwen3 Coder.

The outcome not only made 0G's own operational consensus systems more secure but also demonstrated overwhelming superiority:

15 New Logic Deep Bugs Uncovered: Agora successfully discovered 15 previously unknown protocol-level deep logical vulnerabilities. These span high-risk areas such as execution divergence, monotonicity violations, topology flaws, and signature vulnerabilities.

Standalone Large Models All Score Zero: In contrast, the baseline models, even when equipped with advanced ReAct dynamic toolchains, found none of these deep logical vulnerabilities (0/15). They consumed massive amounts of tokens but could only find low-level code implementation bugs.

High Precision and High Cost-Effectiveness: Among all bug reports generated by Agora, genuine logical vulnerabilities accounted for 73.9% (the remaining 26.1% were false positives). Even more impressive, it takes only about 5.32M tokens (approximately $40) on average to unearth one top-tier logical bug of the kind that would make seasoned architects lose their hair.
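The figures above can be put in perspective with some back-of-the-envelope arithmetic. The inputs below are the article's numbers; the derived totals assume, without confirmation from the source, that the per-bug averages are taken over all 15 bugs.

```python
precision = 0.739          # share of reports that were genuine (from the article)
tokens_per_bug = 5.32e6    # average tokens per bug (from the article)
usd_per_bug = 40.0         # approximate average cost per bug (from the article)
bugs_found = 15            # from the article

false_positive_share = 1.0 - precision
usd_per_million_tokens = usd_per_bug / (tokens_per_bug / 1e6)
reports_per_genuine_bug = 1.0 / precision

# Assumption: the averages cover all 15 bugs.
total_tokens = tokens_per_bug * bugs_found
total_usd = usd_per_bug * bugs_found

print(f"{false_positive_share:.1%} of reports are false positives")
print(f"~${usd_per_million_tokens:.2f} per million tokens")
print(f"~{reports_per_genuine_bug:.2f} reports reviewed per genuine bug")
print(f"~{total_tokens / 1e6:.1f}M tokens and ~${total_usd:.0f} in total")
```

Under these assumptions the implied blended price is roughly $7.5 per million tokens, and the whole set of 15 bugs would come to about 79.8M tokens and $600. The 26.1% figure is the share of reports that turned out to be wrong, which means a reviewer would read about 1.35 reports per genuine bug.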

Results across multiple LLMs are shown below:

5. The Future: High Generalizability, Advancing into More Hardcore "Uncharted Territories"

Agora's success not only injects confidence into the security of distributed systems but also points the way for large model applications in vertical, industrial-grade scenarios.

Critically, Agora's architectural design is highly generalizable. The research team emphasizes that Agora can be quickly reproduced and used by a broad user base in the form of plugins or skills, and the team's code (github.com/0gfoundation/agora) provides corresponding skills to aid reproduction. Furthermore, Agora's "Large Model + Multi-Agent Collaboration + Hypothesis-Driven" paradigm is not limited to consensus protocols. Because its underlying workflow control is deeply decoupled from the upper-layer domain knowledge base and testing harness, the architecture can not only help users quickly debug consensus protocols but can also be extended, in a "plug-and-play" manner, to other hardcore fields similarly plagued by "deep logical vulnerability hell":

Database Concurrency Control: For testing complex transaction conflict defects in distributed databases under extreme isolation levels (like Serializable).

Operating System Kernels / Concurrent Systems: For deeply discovering hidden deadlocks and race conditions in multi-threaded infrastructure.

Web3 Smart Contract Auditing: For in-depth security boundary exploration of cross-chain protocols and DeFi logic involving complex economic models. The blockchain security market is projected to reach about $8.5 billion by 2026, and commercial products using "multi-agent security systems" for smart contract auditing, compressing audit cycles from weeks to hours, are already emerging. Market demand is exploding.

The era of AI-automated security for industrial-grade low-level infrastructure may have been officially inaugurated by Agora and its harness architecture.

We have reason to believe that Agora can help better test the capabilities of coding LLMs by discovering more deep bugs across various domains, and the deep bug use cases it finds can also help enhance coding LLMs' code comprehension abilities.

Agora can significantly improve the security of the code repositories that underpin secure financial transactions, such as consensus protocols, concurrency control, and smart contracts. Moreover, Agora can help more tech companies discover deeper logic bugs while consuming fewer tokens, saving both money and time.

More importantly, this precisely aligns with the two hottest current trends: First, multi-agent systems are transitioning from experimentation to production—Gartner predicts that by 2028, over 30% of enterprise software will have agentic AI built-in, and the multi-agent platform market size is expected to surge from the tens of billions to hundreds of billions of dollars within a few years. Second, "using agents to audit agents"—Agentic Quality Control—is becoming the industry standard for 2026.

Against the backdrop where the Veracode 2025 report indicates approximately 45% of AI-generated code contains security vulnerabilities and the agentic AI security market is growing at a ~42% CAGR, Agora enables tech companies to unearth deeper Logic Bugs with lower token costs, upgrading security auditing from a "human-powered task billed by the week" to an "automated capability delivered by the hour."

And as the landscape of this track becomes clearer, those who truly seize the early advantage are often not the loudest giants, but the team that first operationalizes the methodology and can consistently replicate it.

Related Questions

Q: What is the core innovation of the Agora framework presented in the article?

A: The core innovation of the Agora framework is the first integration of deep domain knowledge with a large language model (LLM) multi-agent collaboration paradigm for autonomous bug detection in consensus protocols. It specifically uses a hypothesis-driven testing (HDT) approach with three specialized agents (Orchestrator, Strategy, and TestGen) coordinated within an automated test harness architecture to find deep logic bugs.

Q: How does Agora's approach differ from traditional methods or using a single large language model (LLM) for bug detection in consensus protocols?

A: Traditional methods like fuzzing struggle with state space explosion in industrial codebases. Single LLMs are limited to finding shallow, local implementation bugs and fail at the global state and temporal reasoning required for protocol-level deep logic bugs. Agora overcomes this by decomposing the task into specialized agents that collaboratively perform global reasoning, hypothesis generation, and automated test execution with a reflection loop, enabling it to find complex, cross-stage vulnerabilities.

Q: What were the key experimental results of the Agora framework's evaluation on real consensus protocol codebases?

A: In evaluations on four major consensus protocol libraries (including etcd and Sui's components), Agora discovered 15 previously unknown protocol-level deep logic bugs across categories like execution divergence and monotonicity violations. In stark contrast, state-of-the-art single LLM baselines (GPT-5.2, Claude 4.5, etc.) equipped with advanced toolchains found zero such bugs (0/15). Agora achieved this with a high true positive rate (73.9%) and high cost-efficiency, averaging about 5.32M tokens (~$40) per deep bug found.

Q: What is the significance of Agora's design in terms of cost and accessibility compared to other industry approaches mentioned, like Anthropic's Glasswing project?

A: Agora's design provides a cost-effective and accessible alternative to heavyweight, proprietary industry approaches. Unlike projects like Glasswing, which rely on top-tier commercial models and incur high computational and token costs, Agora uses a streamlined multi-agent architecture with succinct communication. This allows it to achieve state-of-the-art bug detection using more cost-efficient base models, making advanced automated security auditing feasible for startups, SMEs, and open-source communities.

Q: Beyond consensus protocols, what other hardcore system domains does the article suggest the Agora framework's methodology could be applied to?

A: The article suggests that Agora's plug-and-play architecture, which decouples the core workflow from domain knowledge, can be generalized to other domains plagued by deep logic bugs. These include database concurrency control (e.g., testing transaction conflicts), operating system kernels and concurrent systems (e.g., for deadlocks and race conditions), and Web3 smart contract auditing (e.g., for complex cross-chain or DeFi protocol logic).
