# All Articles on Open Source

Browse the latest news and in-depth analysis related to "Open Source" in the HTX news center. Covering market trends, project developments, technical progress, and regulatory policy, it provides authoritative crypto-industry insights.

Can Humans Control AI? Anthropic Conducted an Experiment Using Qwen

Anthropic conducted an experiment to explore whether humans can supervise AI systems smarter than themselves—a core challenge in AI safety known as scalable oversight. The study simulated a “weak human overseer” using a small model (Qwen1.5-0.5B-Chat) and a “strong AI” using a more powerful model (Qwen3-4B-Base). The goal was to see if the strong model could learn effectively despite imperfect supervision. The key metric was Performance Gap Recovered (PGR). A PGR of 1 means the strong model reached its full potential, while 0 means it was limited by the weak supervisor. Initially, human researchers achieved a PGR of 0.23 after a week of work. Then, nine AI agents (Automated Alignment Researchers, or AARs) based on Claude Opus took over. In five days, they improved PGR to 0.97 through iterative experimentation—proposing ideas, coding, training, and analyzing results. The findings suggest that, in well-defined and automatically scorable tasks, AI can help overcome the supervision gap. However, the methods didn’t generalize perfectly to unseen tasks, and applying them to a production model like Claude Sonnet didn’t yield significant improvements. The study highlights that while AI can automate parts of alignment research, human oversight remains essential to prevent “gaming” of evaluation systems and to handle more complex, real-world problems. Anthropic chose Qwen models for their open-source nature, performance, scalability, and reproducibility—key for rigorous and repeatable experiments. The research demonstrates progress toward automated alignment tools but also underscores that AI supervision remains a nuanced, human-AI collaborative effort.
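The PGR metric can be made concrete with a small sketch. This follows the standard weak-to-strong definition described above; the accuracy numbers in the example are illustrative only, not figures from the experiment:

```python
def performance_gap_recovered(weak_acc: float,
                              weak_to_strong_acc: float,
                              strong_ceiling_acc: float) -> float:
    """PGR = fraction of the gap between the weak supervisor's
    performance and the strong model's ceiling that is recovered
    when the strong model is trained under weak supervision."""
    return (weak_to_strong_acc - weak_acc) / (strong_ceiling_acc - weak_acc)

# Illustrative numbers only: weak supervisor scores 60%, strong ceiling 90%.
print(performance_gap_recovered(0.60, 0.60, 0.90))  # 0.0: fully limited by supervisor
print(performance_gap_recovered(0.60, 0.90, 0.90))  # 1.0: full potential reached
```

A PGR of 0.23 versus 0.97 thus means the AAR agents recovered nearly all of the gap the human baseline had left on the table.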

marsbit · yesterday 09:28

Hermes Agent Guide: Surpassing OpenClaw, Boosting Productivity by 100x

A guide to Hermes Agent, an open-source AI agent framework by Nous Research, positioned as a powerful alternative to OpenClaw. It is described as a self-evolving agent with a built-in learning loop that autonomously creates skills from experience, continuously improves them, and solidifies knowledge into reusable assets. Its core features include a memory system (storing environment info and user preferences in MEMORY.md and USER.md) and a skill system that generates structured documentation for complex tasks. The agent boasts over 40 built-in tools for web search, browser automation, vision, image generation, and text-to-speech. It supports scheduling automated tasks and can run on various infrastructures, from a $5 VPS to GPU clusters. Popular tools within its ecosystem include the Hindsight memory plugin, the Anthropic Cybersecurity Skills pack, and the mission-control dashboard for agent orchestration. Key differentiators from OpenClaw are its architecture philosophy—centered on the agent's own execution loop rather than a central controller—and its autonomous skill generation versus OpenClaw's manually written skills. Installation is a one-line command, and setup is guided. It integrates with messaging platforms like Telegram, Discord, and Slack. It's suited for scenarios requiring a persistent, context-aware assistant that improves over time, automates workflows, and operates across various deployment environments.
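To make the skill system concrete, here is a sketch of what a natural-language skill file might look like. The file name, headings, and fields are illustrative assumptions, not Hermes's documented schema:

```markdown
# SKILL: summarize-weekly-metrics

## When to use
The user asks for a weekly summary of product metrics.

## Steps
1. Query the analytics dashboard for the last 7 days.
2. Compare each metric against the previous week.
3. Write a three-bullet summary and post it to the configured Slack channel.

## Learned notes
- The user prefers percentages over absolute numbers (see USER.md).
```

The point of the format is that a non-coder can author or refine such a file, and the agent can generate new ones from its own successful runs.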

marsbit · 2 days ago 13:11

The Creator of Kling Returns to Alibaba and Builds Another Dark Horse

The article discusses the rise of HappyHorse-1.0, an AI video generation model developed by Alibaba, which topped the Artificial Analysis leaderboard in both text-to-video and image-to-video categories in April 2026. The model was created under the leadership of Zhang Di, who returned to Alibaba in November 2025 after working at Kuaishou, where he led the development of the Kling model. HappyHorse is open-source and commercially available, similar to Alibaba's Qwen model. Zhang Di's background includes extensive experience in large-scale data systems and machine learning at Alibaba and Kuaishou, which contributed to the rapid development of HappyHorse within just five months. The model uses a 15-billion-parameter transformer architecture with native multimodal training, supporting multiple languages and lip-sync capabilities. It also focuses on reducing inference time and cost, making it practical for commercial use. The primary application of HappyHorse is in e-commerce, where it can generate product videos to enhance user engagement and conversion rates by creating contextual and personalized content. This aligns with Alibaba's strengths in commerce, advertising, and data feedback loops. The model's success with its open-source approach contrasts with challenges faced by closed-source models like OpenAI's Sora (shut down due to high costs) and ByteDance's Seedance 2.0 (paused over copyright issues). HappyHorse represents a strategic move for Alibaba to integrate AI video generation into its core business ecosystems.

marsbit · 04/13 05:10

When AI's Bottleneck Is No Longer the Model: Perseus Yang's Open Source Ecosystem Building Practices and Reflections

In 2026, the AI industry's primary bottleneck is no longer model capability but rather the encoding of domain knowledge, agent-world interfaces, and toolchain maturity. The open-source community is rapidly bridging this gap, evidenced by projects like OpenClaw and Claude Code experiencing explosive growth in their Skill ecosystems. Perseus Yang, a contributor to over a dozen AI open-source projects, argues that Skill systems are the most underestimated infrastructure of the AI agent era. They enable non-coders to program AI by writing natural language SKILL.md files, transferring power from engineers to all professionals. His project, GTM Engineer Skills, demonstrates this by automating go-to-market workflows, proving Skills can extend far beyond engineering into areas like product strategy and business analysis. He also identifies a critical blind spot: while browser automation thrives, agent operations are nearly absent from mobile apps, the world's dominant computing interface. His project, OpenPocket, is an open-source framework that allows agents to operate Android devices via ADB. It features human-in-the-loop security, agent isolation, and the ability for agents to autonomously create and save new reusable Skills. Yang believes the value of open source lies not in the code itself, but in defining the infrastructure standards during this formative period. His work validates the SKILL.md format as a portable unit for agent capability and pioneers new architectures for agent operation in API-less environments. His design philosophy prioritizes usability for non-technical users, ensuring the agent ecosystem can be expanded by practitioners from all fields, not just engineers.

marsbit · 04/13 01:29

Mysterious Model HappyHorse Tops the Chart Overnight: Is the Video Generation Arena Welcoming a "Game Changer"?

A mysterious AI video generation model named "HappyHorse-1.0" has quietly topped the AI Video Arena leaderboard on Artificial Analysis, surpassing established models like Seedance 2.0 and others in Elo score—a user-blind-test-based ranking reflecting real perceived quality. The model’s origin was initially unknown, but technical analysis later linked it to the open-source model "daVinci-MagiHuman," jointly developed by Shanghai SII GAIR Lab and Beijing-based Sand.ai. HappyHorse-1.0, likely an optimized iteration by Sand.ai, uses a 15-billion-parameter transformer architecture for joint audio-video-text modeling. Its strong performance in human-centric scenes (e.g., portraits, narrations) helped it excel in blind tests, though it still lags in multi-character or complex motion scenarios. The achievement signals a potential shift: an open-source model rivaling closed-source alternatives in perceived quality, which could lower costs and increase flexibility for developers in vertical applications like virtual avatars. However, limitations remain, including high computational requirements (H100 GPU needed) and shorter generation lengths. While not yet threatening market leaders, HappyHorse represents progress toward open models reaching "production-ready" quality, potentially accelerating community-driven improvements in the video AI space.
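The Elo score behind the leaderboard is the standard pairwise-rating scheme: each blind vote nudges the winner's rating up and the loser's down. A minimal sketch follows; the K-factor is an illustrative choice, not Artificial Analysis's exact setup:

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Modeled probability that model A is preferred over B in one blind vote.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    # Move both ratings toward the observed outcome of one comparison.
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b - k * (s_a - e_a)
```

With equal ratings the expected score is 0.5, so a single win moves each rating by k/2; upsets against higher-rated models move ratings further, which is how a newcomer can climb past established models quickly.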

marsbit · 04/08 07:57

Running Gemma 4 Locally on iPhone Goes Viral: How Far Are We from the Zero Token Era?

Google's newly open-sourced Gemma 4 model, built on the same architecture as Gemini 3, has gained significant attention for its ability to run locally on mobile devices like the iPhone and Samsung Galaxy. With smaller versions such as E2B (2.3B parameters) and E4B (4.5B parameters), it supports native multimodal capabilities and offers a 128K context window. Users report impressive speeds—over 40 tokens per second on Apple chips with MLX optimization—making it feel "like magic." The model is accessible via Google’s official AI Edge Gallery app, ensuring ease of use and security. While Gemma 4 excels in tasks like text generation, coding, and image understanding, it struggles with more complex agent-based workflows, such as tool calling and structured outputs, where models like Qwen3-coder perform better. Despite some limitations in reasoning, Gemma 4’s local performance hints at a future where everyday AI tasks—chat, coding, reasoning—can be handled offline, reducing reliance on cloud-based token services. Although cloud models still lead in advanced reasoning and large-scale multi-agent tasks, the trend suggests that as hardware and quantization improve, on-device models will increasingly handle high-frequency simple tasks. This shift could disrupt the AI industry’s reliance on token sales and API subscriptions, pushing providers to focus on more complex, data-intensive capabilities. Gemma 4 is just the beginning of this transformation.
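The reported throughput can be put in perspective with simple arithmetic; the token counts below are illustrative:

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    # Wall-clock time to stream a reply of n_tokens at a steady decode rate.
    return n_tokens / tokens_per_second

# At the reported ~40 tok/s, a 200-token chat reply streams in ~5 s,
# comfortably interactive for the everyday tasks the article describes.
print(generation_seconds(200, 40.0))  # 5.0
```

This is why on-device speed, not just model quality, determines whether high-frequency simple tasks can move off the cloud.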

marsbit · 04/06 05:53
