AI2 Releases Fully Open-Source Web Agent MolmoWeb: Controlling Web Pages Using Only "Vision"

marsbit · Published 2026-03-26 · Updated 2026-03-26

Summary

AI2 has released MolmoWeb, a fully open-source web agent that operates solely by analyzing screenshots. Unlike traditional agents that rely on the DOM, MolmoWeb captures a screenshot and interprets it to choose the next action, such as clicking, scrolling, or typing, which keeps its decision process transparent and makes it more robust to changes in a page's underlying code. Despite its compact size (4B and 8B parameters), the 8B version scores 78.2% on the WebVoyager benchmark, close to OpenAI's proprietary o3 model (79.3%), and reaches up to 94.7% success when tasks are run multiple times and the best result is selected. It also surpasses Anthropic's Claude 3.7 on UI element localization benchmarks. Alongside the model, AI2 released MolmoWebMix, an open dataset with 36,000 human browsing tasks, over 2.2 million screenshot question-answer pairs, and synthetic data verified by GPT-4o. The model and data are available on Hugging Face and GitHub under the Apache 2.0 license. Challenges remain in complex instructions, login authentication, and legal compliance.

The Allen Institute for Artificial Intelligence (AI2) recently released MolmoWeb, a fully open-source web agent. Unlike traditional agents that rely on a webpage's underlying code (DOM), MolmoWeb makes decisions solely by reading screenshots, marking a significant step forward in "vision-driven" web navigation technology.

Core Technology: "Seeing" Web Pages Like a Human

MolmoWeb's operating logic is very intuitive: it captures a screenshot of the current browser window, decides the next action (such as clicking, scrolling, or paging) through visual analysis, then executes it and repeats. This "what you see is what you get" model makes it more robust than traditional agents because the visual layout of a webpage is generally more stable than its underlying code, and its decision-making process is completely transparent and explainable to human users.
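The capture, decide, execute loop described above can be sketched in a few lines of Python. This is a minimal illustration, not MolmoWeb's actual code: the `Action` type, the `Browser` interface, the `Policy` signature, and the `FakeBrowser` and `scripted_policy` stand-ins are all hypothetical names introduced here so the example runs without a real browser or model.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Action:
    """Hypothetical action record; field names are assumptions for this sketch."""
    kind: str          # "click", "scroll", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class Browser(Protocol):
    """Minimal interface the loop needs; not a real library API."""
    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


# Hypothetical policy signature: (task, screenshot, history) -> next action.
# In a real agent this call would wrap the vision model.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_agent(task: str, browser: Browser, policy: Policy,
              max_steps: int = 20) -> List[Action]:
    """Screenshot -> decide -> execute, repeated until "done" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = browser.screenshot()            # the only observation: pixels
        action = policy(task, shot, history)   # decide from the screenshot
        history.append(action)
        if action.kind == "done":
            break
        browser.execute(action)                # act, then loop again
    return history


class FakeBrowser:
    """Stand-in browser that records actions instead of driving a page."""
    def __init__(self) -> None:
        self.executed: List[Action] = []

    def screenshot(self) -> bytes:
        return f"frame-{len(self.executed)}".encode()

    def execute(self, action: Action) -> None:
        self.executed.append(action)


def scripted_policy(task: str, shot: bytes, history: List[Action]) -> Action:
    """Stand-in for the model: replays a fixed three-step script."""
    script = [Action("click", 120, 48), Action("type", text=task), Action("done")]
    return script[min(len(history), len(script) - 1)]


if __name__ == "__main__":
    browser = FakeBrowser()
    trace = run_agent("find open-source web agents", browser, scripted_policy)
    print([a.kind for a in trace])
```

Because the policy receives only the task, the screenshot, and its own past actions, every decision in the returned trace can be replayed and inspected against the exact image the agent saw.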

Performance Leap: Small Model Outperforms Giants

Despite having parameter sizes of only 4B and 8B, MolmoWeb demonstrates a "small but mighty" performance:

Topping the Charts: In the WebVoyager test, the 8B version scored an impressive 78.2%, not only ranking among the top open-source models but also approaching the performance of OpenAI's proprietary model o3 (79.3%).
Huge Potential: Research found that by running tasks multiple times and selecting the optimal result, its success rate could further jump to 94.7%.
Precise Localization: In UI element localization benchmarks, it even surpassed Anthropic's Claude 3.7.
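To give a feel for why repeated attempts raise the success rate, here is a small idealized calculation. The article does not say how many attempts were run or how the best result was selected, so the function below rests on an assumption made purely for illustration: each attempt succeeds independently with the same probability `p`, and a task counts as solved if any of `k` attempts succeeds.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds.

    Assumes every attempt has the same success probability p and that
    attempts are independent; real agent runs are usually correlated.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    single_attempt = 0.782  # the 8B WebVoyager score reported in the article
    for k in (1, 2, 3, 4):
        print(k, round(pass_at_k(single_attempt, k), 3))
```

Real attempts are rarely independent, since tasks that fail once tend to fail again, so this formula is an optimistic bound rather than a model of the reported 94.7% figure.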

Data Support: The Largest Open Dataset to Date

AI2 has not only open-sourced the model weights but also contributed a massive dataset named MolmoWebMix. This dataset contains:

36,000 real browsing tasks completed by human volunteers.
Over 2.2 million screenshot-question-answer pairs.
Automated synthetic data verified by GPT-4o. Experiments show that synthetic data is even better than human trajectories at guiding the agent to find the "optimal path".

Open-Source Spirit and Future Challenges

Currently, MolmoWeb is fully available under the Apache 2.0 license on Hugging Face and GitHub. Although it still faces challenges in handling complex instructions, login authentication, and legal compliance (such as terms of service), AI2 firmly believes that only through complete transparency and community collaboration can we truly counter the data monopoly of large tech companies.

Related Questions

Q: What is the name of the fully open-source web agent released by the Allen Institute for AI (AI2) that navigates using only screenshots?

A: The web agent is called MolmoWeb.

Q: How does MolmoWeb's approach to web navigation differ from traditional web agents?

A: Unlike traditional agents that rely on a webpage's underlying code (DOM), MolmoWeb makes decisions by reading and analyzing screenshots, making it a "vision-driven" technology.

Q: What was the performance score of the 8B parameter version of MolmoWeb on the WebVoyager test, and how does it compare to OpenAI's model?

A: The 8B version scored 78.2% on the WebVoyager test, which is very close to the performance of OpenAI's proprietary model o3, which scored 79.3%.

Q: What is the name of the large, open dataset released alongside MolmoWeb, and what does it contain?

A: The dataset is called MolmoWebMix. It contains 36,000 real browsing tasks completed by human volunteers, over 2.2 million screenshot-question-answer pairs, and automated synthetic data verified by GPT-4o.

Q: On which platforms has MolmoWeb been made available, and under what license?

A: MolmoWeb has been fully released on Hugging Face and GitHub under the Apache 2.0 license.
