# Related Articles on Automation

The HTX News Center provides the latest articles and in-depth analysis on "Automation", covering market trends, project updates, technological developments, and regulatory policy in the crypto industry.

## Can Humans Control AI? Anthropic Conducted an Experiment Using Qwen

Anthropic conducted an experiment to explore whether humans can supervise AI systems smarter than themselves, a core challenge in AI safety known as scalable oversight. The study simulated a "weak human overseer" with a small model (Qwen1.5-0.5B-Chat) and a "strong AI" with a more powerful one (Qwen3-4B-Base), asking whether the strong model could learn effectively despite imperfect supervision.

The key metric was Performance Gap Recovered (PGR): a PGR of 1 means the strong model reached its full potential, while 0 means it was held to the weak supervisor's level. Human researchers initially achieved a PGR of 0.23 after a week of work. Nine AI agents (Automated Alignment Researchers, or AARs) based on Claude Opus then took over and, in five days, raised PGR to 0.97 through iterative experimentation: proposing ideas, coding, training, and analyzing results.

The findings suggest that, on well-defined and automatically scorable tasks, AI can help close the supervision gap. However, the methods did not generalize perfectly to unseen tasks, and applying them to a production model like Claude Sonnet did not yield significant improvements. The study highlights that while AI can automate parts of alignment research, human oversight remains essential to prevent "gaming" of evaluation systems and to handle more complex, real-world problems.

Anthropic chose Qwen models for their open-source availability, performance, scalability, and reproducibility, all key for rigorous, repeatable experiments. The research demonstrates progress toward automated alignment tools while underscoring that AI supervision remains a nuanced, human-AI collaborative effort.
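The PGR metric described above can be sketched as a simple ratio. This is a minimal illustration based on the definition given in the summary; the function and variable names are my own, not Anthropic's:

```python
def performance_gap_recovered(weak: float, strong_ceiling: float,
                              weak_to_strong: float) -> float:
    """PGR: the fraction of the gap between the weak supervisor's
    performance and the strong model's unsupervised ceiling that is
    recovered when the strong model trains under weak supervision."""
    gap = strong_ceiling - weak
    if gap == 0:
        raise ValueError("no performance gap to recover")
    return (weak_to_strong - weak) / gap

# If the weakly supervised model matches its ceiling, PGR is 1;
# if weak supervision caps it at the weak model's level, PGR is 0.
```

With this definition, the article's numbers read as: the human baseline recovered 23% of the gap (PGR 0.23), and the AAR agents recovered 97% (PGR 0.97).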

marsbit · 04/15 09:28


## Only Work 2 Hours a Day? This Google Engineer Uses Claude to Automate 80% of His Work

A Google engineer with 11 years of experience automated 80% of his work using Claude Code and a simple .NET application, cutting his workday from 8 hours to just 2–3 while generating $28,000 in monthly passive income. The transformation rests on three core elements:

1. A structured CLAUDE.md file built on Andrej Karpathy's principles (Think Before Coding, Simplicity First, Surgical Changes, and Goal-Driven Execution) reduced Claude's rule violations from 40% to just 3%.
2. The "Everything Claude Code" system acts as a full AI engineering team, with 27 pre-built agents for planning, reviewing, and executing tasks across multiple AI platforms.
3. A hidden token-consumption issue in Claude Code v2.1.100 was identified: 20,000 extra tokens were silently added, diluting instructions and reducing output quality. A quick fix using npx downgrades the version to avoid this.

The automated system runs code generation, testing, and review autonomously in 15-minute cycles; the engineer now only reviews output, saving 5–6 hours daily. Setup takes under 20 minutes, and the return on time invested is significant, potentially $10,000–$12,000 per month for those valuing their time at $100/hour. The article's larger point is that managing AI systems, not just using them, is the new critical skill, enabling a shift from doing work to overseeing automated processes.
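The four Karpathy-inspired principles could be encoded in a CLAUDE.md along these lines. This is a hypothetical sketch of such a rules file, not the engineer's actual file, which the article does not reproduce:

```markdown
# CLAUDE.md — project rules for Claude Code

## Think Before Coding
Outline the plan and the files you intend to touch before writing any code.

## Simplicity First
Prefer the smallest working change; add no speculative abstractions.

## Surgical Changes
Modify only the lines the task requires; never reformat unrelated code.

## Goal-Driven Execution
State the acceptance criterion up front and stop once it is met.
```

Claude Code reads CLAUDE.md from the project root as standing instructions, which is why a tight, structured file of this kind can measurably cut rule violations.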

marsbit · 04/15 04:10


## Hermes Agent Guide: Surpassing OpenClaw, Boosting Productivity by 100x

A guide to Hermes Agent, an open-source AI agent framework from Nous Research positioned as a powerful alternative to OpenClaw. It is described as a self-evolving agent with a built-in learning loop that autonomously creates skills from experience, continuously improves them, and solidifies knowledge into reusable assets.

Its core features include a memory system (environment information and user preferences stored in MEMORY.md and USER.md) and a skill system that generates structured documentation for complex tasks. The agent ships with over 40 built-in tools for web search, browser automation, vision, image generation, and text-to-speech; it supports scheduled automated tasks and runs on anything from a $5 VPS to GPU clusters. Popular ecosystem tools include the Hindsight memory plugin, the Anthropic Cybersecurity Skills pack, and the mission-control dashboard for agent orchestration.

Key differentiators from OpenClaw are its architectural philosophy, centered on the agent's own execution loop rather than a central controller, and its autonomous skill generation versus OpenClaw's manually written skills. Installation is a one-line command with guided setup, and it integrates with messaging platforms such as Telegram, Discord, and Slack. It suits scenarios that need a persistent, context-aware assistant that improves over time, automates workflows, and operates across varied deployment environments.
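A markdown-file memory system like the one described (MEMORY.md for environment facts, USER.md for preferences) can be approximated in a few lines. This is an illustrative sketch under that assumption, not Hermes Agent's actual implementation:

```python
from pathlib import Path

def remember(path: str, heading: str, note: str) -> None:
    """Append a bullet note to a markdown memory file, writing the
    heading on first use (sketch assumes one heading per file)."""
    f = Path(path)
    text = f.read_text() if f.exists() else ""
    if heading not in text:
        text += f"\n## {heading}\n"
    f.write_text(text + f"- {note}\n")

# Environment facts go to MEMORY.md, user preferences to USER.md.
remember("MEMORY.md", "Environment", "running on a $5 VPS, no GPU")
remember("USER.md", "Preferences", "reply in short bullet points")
```

The point of persisting memory as plain markdown is that both the agent and the human can read and edit the same files, which is what makes this pattern popular across agent frameworks.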

marsbit · 04/13 13:11


## When AI's Bottleneck Is No Longer the Model: Perseus Yang's Open Source Ecosystem Building Practices and Reflections

In 2026, the AI industry's primary bottleneck is no longer model capability but the encoding of domain knowledge, agent-world interfaces, and toolchain maturity. The open-source community is rapidly bridging this gap, as shown by projects like OpenClaw and Claude Code, whose Skill ecosystems are experiencing explosive growth.

Perseus Yang, a contributor to over a dozen AI open-source projects, argues that Skill systems are the most underestimated infrastructure of the AI agent era: they let non-coders program AI by writing natural-language SKILL.md files, transferring power from engineers to all professionals. His project GTM Engineer Skills demonstrates this by automating go-to-market workflows, proving Skills can extend far beyond engineering into areas like product strategy and business analysis.

He also identifies a critical blind spot: while browser automation thrives, agent operations are nearly absent from mobile apps, the world's dominant computing interface. His project OpenPocket is an open-source framework that lets agents operate Android devices via ADB, with human-in-the-loop security, agent isolation, and the ability for agents to autonomously create and save new reusable Skills.

Yang believes the value of open source lies not in the code itself but in defining infrastructure standards during this formative period. His work validates the SKILL.md format as a portable unit of agent capability and pioneers new architectures for agent operation in API-less environments. His design philosophy prioritizes usability for non-technical users, so the agent ecosystem can be expanded by practitioners from all fields, not just engineers.
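Operating an Android device over ADB, the channel OpenPocket builds on, ultimately means shelling out to the `adb` binary. A minimal sketch of that mechanism follows; the helper names are my own illustration, not OpenPocket's API:

```python
import subprocess
from typing import List, Optional

def adb_tap(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Build (not run) the adb command that injects a tap at (x, y)."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]  # target a specific device by serial
    return cmd + ["shell", "input", "tap", str(x), str(y)]

def run(cmd: List[str]) -> None:
    """Execute a command; requires adb on PATH and a connected device."""
    subprocess.run(cmd, check=True)

# Example (needs a device attached):
# run(adb_tap(540, 960))  # tap near the centre of a 1080x1920 screen
```

Separating command construction from execution is also where human-in-the-loop security naturally slots in: a supervisor can inspect or veto each built command before `run` is ever called.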

marsbit · 04/13 01:29

