Japan's AI Dark Horse Emerges: How a Small 7B Model Challenges Fable and Mythos

marsbit, published 2026-06-22, updated 2026-06-22

Article Summary

In June 2026, Sakana AI's new model Fugu caused a stir in the AI community. Its Fugu Ultra variant achieved scores of 73.7 on SWE-Bench Pro and 82.1 on TerminalBench 2.1, surpassing GPT-5.5 and Claude Opus 4.8, and was claimed to be comparable to export-restricted models like Fable 5 and Mythos Preview. Remarkably, the core of this high-performance system is not a massive model, but a small 7B-parameter RL Conductor model. Fugu operates as a multi-agent orchestrator: the 7B model acts as a "foreman," dynamically analyzing user tasks and delegating subtasks to a pool of top-tier global models (e.g., GPT-5, Gemini 3.1 Pro). It then synthesizes and verifies their outputs. This architecture represents a paradigm shift from monolithic models to an expert-team approach. It enhances performance in complex, multi-step engineering tasks like code review and security testing by enabling cross-validation from specialized models, improving long-session stability and token efficiency. However, Fugu's strengths come with trade-offs: it faces inherent latency due to multiple API calls, relies heavily on underlying US model APIs (creating dependency risks), and its benchmark comparisons with Fable/Mythos are based on reported scores, not head-to-head testing. For Japan's AI ecosystem, which lacks the massive compute and data resources of the US or China, Fugu exemplifies an "asymmetric breakthrough" strategy. Instead of competing directly in parameter scale, it focuses on intelligent orche...

June 22, 2026: "Fugu," the new model released by Sakana AI, sent shockwaves through the AI community. On the rigorous SWE-Bench Pro and TerminalBench 2.1 benchmarks, Fugu Ultra scored 73.7 and 82.1 points respectively, surpassing GPT-5.5 and Claude Opus 4.8, and Sakana AI even claims it is on par with the export-controlled Fable 5 and Mythos Preview. Surprisingly, the core of this system, which topped the charts in engineering and reasoning capabilities, is not a massive model with hundreds of billions of parameters but a model with only 7B parameters. It doesn't do the work itself; instead, it acts as a "project manager," dynamically orchestrating top global large models. This counter-intuitive architecture not only challenges the belief that more parameters always win but also reflects Japan's path to AI breakthroughs amid constrained computing resources.

The 7B "Project Manager": The Counter-Intuitive Architecture of Fugu

To understand what makes Fugu unusual, one must first look at its origins. Sakana AI was founded in Tokyo in 2023 by Llion Jones, a co-author of the Transformer paper, and former Google researcher David Ha. From its inception, the company has had a "nature-inspired" identity, dedicated to solving AI problems with evolutionary algorithms and swarm intelligence drawn from nature. In 2025, Sakana AI secured investments from giants like NVIDIA and Google, valuing the company at over $25 billion. Despite this backing, Japan still lacks the massive computing infrastructure and data pools found in China and the US. Under these resource constraints, Sakana AI chose not to compete head-on with trillion-parameter models and instead took an "orchestration" route.

Fugu is officially positioned as "a multi-agent orchestration system acting as a single foundational model." In traditional AI architecture, a large model is a "monolithic beast." A user inputs a prompt, and the model calculates from the first neural network layer to the last, outputting the result. This mode is extremely efficient for simple problems but often leads to hallucinations or logical breakdowns when facing complex, multi-step engineering tasks.

Fugu fundamentally changed this paradigm. Its core is a 7B-parameter model trained with reinforcement learning, called the RL Conductor. This 7B model does not directly generate the final answer; instead, it plays the role of a "project manager." When a user submits a task through a single OpenAI-compatible API, the RL Conductor dynamically analyzes the task type and then assigns subtasks to top global models in its agent pool, such as GPT-5, Gemini 3.1 Pro, or Claude Opus 4.8. It is responsible for scheduling, verifying, and synthesizing the outputs of these models, ultimately providing a result that has undergone multiple rounds of verification.
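As a rough illustration of the dispatch loop described above, the following Python sketch routes subtasks to a pool of workers and merges their outputs. Everything in it is an assumption made for illustration: the `Subtask` type, the `decompose` and `conduct` functions, and the stub workers are hypothetical, and the fixed three-step plan merely stands in for the learned policy of the RL Conductor, whose actual implementation the article does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A worker is any callable that takes a prompt and returns text.
# In a real deployment these would wrap remote model APIs; here they are stubs.
Worker = Callable[[str], str]


@dataclass
class Subtask:
    kind: str    # hypothetical label, e.g. "plan", "code", "review"
    prompt: str


def decompose(task: str) -> List[Subtask]:
    """Stand-in for the conductor's task analysis (a fixed plan, not a learned one)."""
    return [
        Subtask("plan", f"Outline the steps for: {task}"),
        Subtask("code", f"Implement: {task}"),
        Subtask("review", f"Check the implementation of: {task}"),
    ]


def conduct(task: str, pool: Dict[str, Worker]) -> str:
    """Route each subtask to the worker registered for its kind, then merge."""
    outputs = []
    for sub in decompose(task):
        worker = pool[sub.kind]       # routing decision
        result = worker(sub.prompt)   # delegation
        if not result.strip():        # minimal verification: reject empty output
            raise ValueError(f"empty result for subtask {sub.kind!r}")
        outputs.append(f"[{sub.kind}] {result}")
    return "\n".join(outputs)         # synthesis


# Stub workers standing in for the models in the agent pool.
pool = {
    "plan": lambda p: "1. reproduce bug 2. patch 3. test",
    "code": lambda p: "def fix(): ...",
    "review": lambda p: "looks correct",
}
answer = conduct("fix the failing login test", pool)
print(answer)
```

The point of the sketch is the separation of roles: the conductor only decides who does what and checks what comes back, while all content is produced by the workers.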

The theoretical underpinning for this architecture comes from two papers at ICLR 2026: "TRINITY: An Evolved LLM Coordinator" and "Learning to Orchestrate Agents in Natural Language with the Conductor." The papers detail how a small model can "conduct" large models through reinforcement learning. This changes the paradigm of test-time scaling. In the past, inference-time compute was spent mainly on deep reasoning inside a single model, making the model "struggle" toward an answer. Now it is spent on external scheduling, verification, and synthesis. Traditional large models are monolithic all-rounders, while Fugu is a team of experts. The 7B RL Conductor shows that parameter count is no longer the sole determinant of capability; knowing how to call tools and external agents can also produce leaps in performance.

The Truth Behind the Scores: Matching Fable and Surpassing GPT-5.5

Fugu caused a sensation chiefly because of its scores on demanding benchmarks. In the AI industry, benchmark scores are the hard currency for measuring model capabilities, but different benchmarks emphasize entirely different things. SWE-Bench Pro and TerminalBench 2.1, the two chosen by Sakana AI, are both "tough nuts" geared toward real-world engineering environments.

SWE-Bench Pro focuses on software engineering capabilities, requiring models to locate and fix bugs in real codebases. According to data published in the Sakana AI console, Fugu Ultra scored 73.7 on SWE-Bench Pro. For comparison, Claude Opus 4.8 scored 69.2, GPT-5.5 scored 58.6, and Gemini 3.1 Pro scored 54.2. On TerminalBench 2.1, a test of system operation capabilities, Fugu Ultra scored 82.1, surpassing GPT-5.5's 78.2 and Opus 4.8's 74.6. These two tests examine not only a model's code generation ability but also its logical stability and tool-calling capability in multi-step, long-chain tasks. Fugu Ultra's lead suggests that it suffers fewer mid-process failures or deviations from the goal than monolithic models when handling complex engineering problems.
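To make the size of these leads concrete, the short Python sketch below computes Fugu Ultra's margin over each listed model. The scores are the ones quoted above; the `margins` helper is a hypothetical name introduced only for this illustration.

```python
# Scores as quoted in the article (reported by the Sakana AI console).
scores = {
    "SWE-Bench Pro": {
        "Fugu Ultra": 73.7,
        "Claude Opus 4.8": 69.2,
        "GPT-5.5": 58.6,
        "Gemini 3.1 Pro": 54.2,
    },
    "TerminalBench 2.1": {
        "Fugu Ultra": 82.1,
        "GPT-5.5": 78.2,
        "Claude Opus 4.8": 74.6,
    },
}


def margins(benchmark, leader="Fugu Ultra"):
    """Return the leader's point lead over every other model on one benchmark."""
    table = scores[benchmark]
    return {
        model: round(table[leader] - score, 1)
        for model, score in table.items()
        if model != leader
    }


for name in scores:
    print(name, margins(name))
```

On these figures the lead over the nearest competitor is 4.5 points on SWE-Bench Pro (over Claude Opus 4.8) and 3.9 points on TerminalBench 2.1 (over GPT-5.5).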

The comparison between Fugu and Fable 5/Mythos Preview drew even more attention. Anthropic's Fable series and another frontier lab's Mythos series represent the pinnacle of current AI reasoning capabilities. However, because of export controls or incomplete public release, these two models are not part of Fugu's agent pool. Sakana AI officially claims that Fugu Ultra is "on par" with Fable 5 and Mythos Preview on engineering and science benchmarks. It must be clarified, however, that this comparison is not based on head-to-head testing in the same environment. Fugu's scores come from actual runs of its own system, while the Fable and Mythos figures are scores publicly reported by their respective vendors.

This comparison methodology has sparked some controversy in the developer community. Some argue that test conditions across different systems and environments are difficult to align perfectly, making direct score comparisons unfair. Other developers point out that referencing vendor-reported data is industry practice in the absence of a unified testing environment. Setting aside the controversy over Fable and Mythos, Fugu Ultra's lead over GPT-5.5 and Opus 4.8 on SWE-Bench Pro and TerminalBench 2.1 is a real, like-for-like comparison. The lead comes not from Fugu's underlying model being smarter than GPT-5.5 but from the RL Conductor decomposing tasks and scheduling experts more precisely. In experiments requiring multiple rounds of reasoning and verification, such as AutoResearch, Rubik's Cube solving, and mechanical design, Fugu consistently showed advantages. This indicates that for "long, messy, multi-step" real-world workflows, the multi-agent orchestration architecture is more resilient than monolithic models.

Real Development Scenario Tests: Code Review and Long Session Stability

For developers and AI tool users, benchmark scores are only references. What truly determines a model's usefulness is its performance in real work scenarios. Fugu underwent beta testing with nearly 500 early users before release. Their feedback revealed Fugu's unique value in practical applications.

Code review is one of the most common AI scenarios for developers. Traditional monolithic models often only find superficial syntax errors or common logic bugs when reviewing code. In beta testing, some developers reported that Fugu demonstrated unusually detailed performance in code reviews, capable of uncovering deep architectural bugs, while other tools often found only a few surface-level issues. This difference stems from Fugu's architecture. Upon receiving a code review task, the RL Conductor can call models specializing in static analysis, logical reasoning, and security auditing respectively to conduct cross-validation on the same piece of code from multiple angles. This "expert consultation" model naturally uncovers more hidden problems than the "solo effort" of a single model.
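One simple way to picture the cross-validation described above is a vote over the findings of several reviewers. The Python sketch below is only an assumption about how such merging could work; the `cross_validate` function, the `min_votes` threshold, the reviewer names, and the sample findings are all hypothetical and are not taken from Sakana AI.

```python
from collections import Counter


def cross_validate(findings_by_reviewer, min_votes=2):
    """Split findings into those flagged by at least `min_votes` reviewers and the rest."""
    votes = Counter()
    for findings in findings_by_reviewer.values():
        votes.update(set(findings))  # each reviewer counts once per finding
    confirmed = sorted(f for f, n in votes.items() if n >= min_votes)
    unconfirmed = sorted(f for f, n in votes.items() if n < min_votes)
    return confirmed, unconfirmed


# Hypothetical output from three specialised reviewers on the same code.
findings = {
    "static_analysis": ["unused import", "unchecked null"],
    "logic": ["unchecked null", "race on cache"],
    "security": ["unchecked null", "race on cache"],
}
confirmed, unconfirmed = cross_validate(findings)
print("confirmed:", confirmed)
print("needs a second look:", unconfirmed)
```

Findings that several independent reviewers agree on can be reported with more confidence, while single-reviewer findings can be sent back for another check rather than discarded.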

Another frequently mentioned advantage is long-session stability. When building AI agent products, one of developers' biggest headaches is the model's "persona drift" in long conversations. As the number of dialogue rounds increases, monolithic models often forget the initial setup or drift in how they follow instructions. After testing, some enterprise executives reported that Fugu's persona in long conversations is exceptionally stable, with almost no drift. This is because the RL Conductor itself is not responsible for maintaining long-text memory; in each dialogue round it only selects the most appropriate underlying model to generate a response based on the current context. This "separation of control and generation" greatly improves agent stability during long-running sessions.

In cybersecurity, Fugu also demonstrated end-to-end practical capability. In tests, it independently completed the entire workflow, from reconnaissance through XSS/SQLi vulnerability detection to authentication review, and generated a complete penetration test report while strictly following instructions to stay within scope and avoid damaging systems. Completing a task this complex depends on the RL Conductor's precise orchestration of security toolchains and of the capabilities of different large models.

Token efficiency is another highlight of Fugu. Traditional large models often generate lengthy chains of thought (CoT) and consume large numbers of tokens on complex problems. Fugu's RL Conductor avoids wasteful long CoT through precise routing. Official data and early testing indicate that it significantly reduces wasted tokens. For developers billed by the token, this means not only lower costs but also faster responses.

The Achilles' Heel of Underlying Dependency: The Cost of Multi-Agent Orchestration

Although Fugu shines in architecture and benchmark scores, as a tool for practical work, it is not without weaknesses. The multi-agent orchestration architecture, while bringing performance breakthroughs, also introduces significant risks and limitations.

The core issue is underlying dependency risk. Fugu's agent pool heavily relies on underlying APIs from US giants like GPT, Claude, and Gemini. Although the RL Conductor has dynamic routing capabilities and can switch to other models if one fails or is rate-limited, this only mitigates single-supplier risk. It does not and cannot detach from the entire US AI infrastructure ecosystem. If these underlying models collectively raise prices, impose large-scale rate limits, or change API terms, Fugu's cost structure and stability will be directly impacted. This "parasitic" mode, living atop others' infrastructure, has inherent fragility in commercialization and long-term stability.
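The failover behaviour described above, and its limit, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `ProviderUnavailable` exception, the `call_with_failover` function, and the stub providers `model_a` and `model_b` are hypothetical and do not represent Fugu's actual routing code.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider stub when it is down or rate-limited."""


def call_with_failover(prompt, providers):
    """Try providers in priority order; return (name, reply) from the first that answers."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors[name] = str(exc)  # remember why this provider was skipped
    # Every provider failed: routing cannot help if the whole pool is unavailable.
    raise RuntimeError(f"all providers failed: {errors}")


def flaky(prompt):
    raise ProviderUnavailable("rate limited")


def healthy(prompt):
    return f"ok: {prompt}"


providers = [("model_a", flaky), ("model_b", healthy)]
served_by, reply = call_with_failover("hello", providers)
print(served_by, reply)
```

The final `raise` is the part that matters for the dependency argument: switching providers covers the failure of one supplier, but when every entry in the pool is unavailable or repriced at once, the router has nothing left to fall back on.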

Next is the trade-off between latency and cost structure. While the RL Conductor saves on ineffective token consumption through precise routing, multi-agent orchestration inevitably involves multiple API calls and inter-model communication. For real-time interaction scenarios requiring extremely low latency, such as real-time voice conversations or high-frequency trading assistance, Fugu Ultra's "deep thinking and scheduling" time may be longer than directly calling a monolithic model. In scenarios where response speed is paramount, Fugu's architectural advantage could become a drag on user experience.
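A back-of-the-envelope model shows why orchestration adds latency even when calls run in parallel. In the Python sketch below, all numbers are invented for illustration and are not measurements of Fugu or any other model, and `orchestrated_latency` is a hypothetical helper. The assumption is that calls within a stage run concurrently while stages run one after another.

```python
def orchestrated_latency(conductor_s, stages_s):
    """Estimate end-to-end latency in seconds.

    conductor_s: time the orchestrator spends deciding and merging.
    stages_s: list of stages; each stage is a list of per-call latencies.
    Calls inside a stage are assumed parallel, so a stage costs its slowest call.
    """
    return conductor_s + sum(max(stage) for stage in stages_s)


# Made-up example: one direct call versus a two-stage orchestrated run.
direct_s = 2.0
orchestrated_s = orchestrated_latency(0.3, [[2.0, 2.5], [1.5]])
print(f"direct: {direct_s:.1f}s, orchestrated: {orchestrated_s:.1f}s")
```

Under these assumed numbers the orchestrated run takes 4.3 seconds against 2.0 seconds for a single direct call: each sequential stage adds the latency of its slowest model, which is acceptable for a long engineering task but costly for real-time interaction.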

Furthermore, controversies over fairness of comparison persist. As mentioned, Fugu claims parity with Fable and Mythos, but the latter two are not in its agent pool. In the developer community, some voices question whether comparisons based on vendor-reported data have practical reference value. After all, model performance can vary greatly across different task distributions, and simple aggregate score comparisons might mask specific strengths and weaknesses. For developers needing precise model capability assessments, the lack of head-to-head test data means they must remain cautious during selection.

Not Competing on Compute, but on Orchestration: Japan's Asymmetric Breakthrough in Large Models

Looking beyond the specific product review, Fugu's birth carries deeper implications for Japan's large model ecosystem. In the global AI arms race, Japan is in an awkward position. It lacks both the continuous influx of top-tier computing power and frontier algorithm accumulation of the US, and the massive data pools and fiercely competitive market environment of China. More critically, Japan also faces export control risks from US frontier models (like Fable/Mythos). Against this backdrop, Sakana AI's "evolutionary algorithm" and "multi-agent orchestration" route showcase the logic of "asymmetric breakthrough" for a resource-constrained nation.

Japan does have domestic large model players. NTT released tsuzumi, and institutions like ELYZA, Rinna, and LLM-jp are also working hard to train local language models. However, most follow the traditional "train from scratch" route, struggling to compete with top US and Chinese models in parameter scale and general capabilities. Sakana AI is the only Japanese lab with global frontier influence that champions an "asymmetric architecture."

Fugu's dynamic routing capability essentially helps Japanese companies and institutions establish "AI Sovereignty." Under limited computing resources, instead of spending huge sums to train a hundred-billion-parameter model that is inferior to GPT-5.5 in all aspects, it's better to train a clever 7B "project manager." This manager can flexibly connect to the world's best models based on task needs. If one day a US model faces export controls or supply cuts, the RL Conductor can quickly route tasks to other available models, even connecting to Japan's domestic specialized models. This architecture gives Japan a degree of autonomy and risk resilience in utilizing AI capabilities.

Observing the global AI tool ecosystem, OmniTools notes that large model capabilities are gradually converging and that the main battleground is shifting from mere parameter stacking to toolchains and real-world deployment scenarios. The emergence of Fugu confirms this trend. It no longer pursues perfection in a single model but optimality at the system level. This thinking holds significant reference value for nations and regions that lack advantages in compute and data.

Of course, this "asymmetric breakthrough" has its ceiling. As long as the core technology of underlying models remains in the hands of a few giants, the capability ceiling of orchestration systems will be limited by those underlying models. Fugu proves a 7B model can be an excellent conductor, but it cannot magically create capabilities that the underlying models lack. For Japan's large models to truly achieve a breakthrough, beyond architectural innovation in orchestration, continued investment in underlying computing power, core algorithms, and high-quality data is still necessary. Fugu is an ingenious system-level innovation, but it's not a panacea. For developers and enterprise users, Fugu provides a highly competitive new option in complex engineering scenarios. However, when using it, one must also be clear-eyed about its underlying dependency vulnerabilities and the latency-cost trade-offs.

