Conversation with Mai-Lan from AWS: The Next Battlefield for S3 – How to Handle the Data Consumption Surge in the Agent Era

marsbit | Published 2026-05-08 | Last updated 2026-05-08

Abstract

The explosive rise of Agent AI, exemplified by OpenClaw in China, is putting unprecedented pressure on cloud data infrastructure. Unlike human engineers, Agents consume data in an "extremely active and aggressive" parallel fashion, launching dozens to hundreds of queries at once, which pushes call frequencies several orders of magnitude higher and drives throughput up sharply. Mai-Lan Tomsen Bukovec, VP of Technology at AWS, emphasizes that cost-effectiveness in this data layer is now a decisive factor for customers building Agent systems. To address this, AWS is positioning its foundational Amazon S3 service, now 20 years old, as the critical data platform for the Agent era. Recent key innovations include: **S3 Tables** with native Apache Iceberg support, enabling Agents to interact efficiently with structured data via familiar SQL; **S3 Vectors**, which introduces vectors as a native type for building contextual data and serving as a shared "memory space" for AI systems; and the newly launched **S3 Files**, which provides a POSIX-compliant file system interface over S3, allowing Agents to interact with data through the familiar paradigm of files and directories. These enhancements are designed to meet the unique data interaction patterns of Agents, which are built on models already proficient with SQL, file systems, and contextual vectors. By unifying these access methods on the scalable, durable, and cost-efficient S3 foundation, AWS aims to provide the data backbone capable of supporting the next w...

At the beginning of the year, the popularity of OpenClaw in the Chinese market allowed everyone to see the enormous potential of Agents. But what followed was a question that all cloud vendors must answer: When Agents begin to multiply like cybernetic lobsters and call data at high frequencies, are the AI cloud infrastructure layers, especially the data layer, ready?

For example, when enterprise data teams deploy Agents into production, they often hit bottlenecks at the data layer. Agents that draw on several platforms at once, such as vector databases, relational databases, graph databases, and data lakehouses, depend on synchronized data pipelines to keep their context information current. In real production environments, that context gradually goes stale.

The urgency of this problem stems from the fundamentally different data consumption patterns of Agents compared to human engineers.

"Agents are consuming data in an extremely active and aggressive way. Their call frequency to data warehouses or data lakes is astonishing."

Mai-Lan Tomsen Bukovec, Vice President of Technology at Amazon Web Services, recently told the author that Agents work in a "parallel comparison and selection" mode. Instead of issuing one query at a time, they run dozens or hundreds in parallel and compare the results to find the optimal path. This makes Agents far more aggressive data consumers than humans, with call frequencies several orders of magnitude higher and data throughput growing exponentially.
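The "parallel comparison and selection" pattern can be illustrated with a minimal Python sketch. It is an illustration only, not AWS code: `run_query` is a hypothetical stand-in that returns a made-up score instead of querying a real data lake, and the figure of 100 candidates is an assumption chosen to match the "dozens or hundreds" described above.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_query(candidate_id: int) -> tuple[int, float]:
    """Hypothetical stand-in for one query against a data lake.

    Returns (candidate_id, score). The score is simulated; a real Agent
    would score the rows that the query returned.
    """
    rng = random.Random(candidate_id)  # seeded so the sketch is repeatable
    return candidate_id, rng.random()


def fan_out_and_select(num_candidates: int, max_workers: int = 32) -> tuple[int, float]:
    """Issue every candidate query in parallel, then keep the best result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_query, range(num_candidates)))
    # One decision by the Agent cost num_candidates calls to the data layer.
    return max(results, key=lambda result: result[1])


best_id, best_score = fan_out_and_select(100)
print(f"issued 100 queries, kept candidate {best_id} (score {best_score:.3f})")
```

A human analyst would typically issue these queries one after another and stop early; here every decision multiplies into `num_candidates` calls, which is why per-request cost at the data layer weighs so heavily.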

Mai-Lan further pointed out, "Customers are now very eager to build Agent infrastructure. Cost, or rather cost-effectiveness, is no longer a secondary factor but has become a decisive one. In the next six months to a year, with the explosion of Agents, the choice of underlying data services will become crucial."

Now the OpenClaw frenzy is subsiding, leaving behind a stress-test warning for cloud vendors' underlying storage and compute capabilities. Mai-Lan believes that AWS holds a natural advantage here: the scale of Amazon S3 (Amazon Simple Storage Service) and the cost efficiency of Amazon Redshift and Amazon Athena under high concurrency are well suited to this ultra-large-scale, ultra-high-frequency pattern of Agent data interaction.

Coinciding with its 20th anniversary, and in response to customer demands for data processing in the AI era, Amazon S3 has recently undergone three major evolutions: S3 Tables (tabular data), S3 Files (files), and S3 Vectors (vectors).

Take S3 Tables' native support for Apache Iceberg, for example. Mai-Lan noted that when Agents process data, they tend to interact directly with data in Iceberg format via SQL. The underlying logic is that Agents are built on large language models (LLMs), and LLMs acquire mature capabilities for handling SQL syntax and the Iceberg data format during training. Storing all table data in Iceberg format on S3 lets Agents handle data efficiently without having to learn complex access APIs for multiple systems. Currently, Agents show a high degree of compatibility with S3 and Iceberg.

The introduction of Iceberg capabilities to S3 triggered a new wave of innovation. Data sources like Postgres and Oracle began writing directly to Iceberg, and Agent systems could interact directly with these tables. With the launch of S3 Vectors, more and more AI applications are using vectors as a shared memory medium, thereby injecting "state" into AI interaction experiences.

Mai-Lan also pointed out that vectors have been introduced as a native data type in S3. Vectors are mainly used in two ways: to build contextual information for data stored in S3, and to serve as shared memory. In the five months since S3 Vectors was released, market feedback has met expectations. A large number of customers have started using the feature, generating vectors via embedding models to enrich the context of their data. Usage of S3 Vectors as the memory space for Agent systems has seen explosive growth.

It is worth mentioning that S3 Files, released a few weeks ago, lets Agents work with data in S3 through the POSIX standard, that is, through a file system. In Agent systems, LLMs pay close attention to the "file" form: Python libraries and shell scripts alike are familiar content from LLM training, so Agents naturally prefer files as a data interface.

For this reason, the design concept of S3 Files is to mount an Amazon EFS (Elastic File System) file system on an S3 bucket. Through this mechanism, users can work with S3 data in a file system that follows POSIX standards: small files are accessed faster via EFS caching, while large files are streamed directly from S3. This allows Agents to interact natively with S3 data using the familiar language of the file system and to treat the shared file system as a "shared memory space" backed by S3.
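From the Agent's side, a file system interface means nothing more than ordinary file operations. The sketch below shows two Agents exchanging notes through a shared directory. It rests on assumptions: a temporary directory stands in for the mount point (the real mount path and setup for S3 Files are not described in this article), and the helper names and the `notes/` layout are hypothetical.

```python
import tempfile
from pathlib import Path


def write_note(mount: Path, agent: str, text: str) -> Path:
    """Write one Agent's note as a plain file under <mount>/notes/."""
    notes_dir = mount / "notes"
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / f"{agent}.txt"
    path.write_text(text, encoding="utf-8")
    return path


def read_all_notes(mount: Path) -> dict[str, str]:
    """Read every note back, keyed by the Agent name taken from the file name."""
    notes_dir = mount / "notes"
    return {p.stem: p.read_text(encoding="utf-8") for p in sorted(notes_dir.glob("*.txt"))}


with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp)  # assumption: stand-in for the mounted file system
    write_note(mount, "planner", "step 1 done")
    write_note(mount, "reviewer", "step 1 approved")
    shared = read_all_notes(mount)

print(shared)
```

Nothing in the code refers to buckets, keys, or an SDK; that is the appeal of the approach described above, since the same calls an LLM has seen in countless Python scripts and shell sessions work unchanged.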

This progress matters for the development of LLM memory capabilities. AI experiences are gradually gaining deeper conversational context and more personalized interaction, and model performance keeps evolving, whether the interaction is between Agents, between humans and Agents, or between Agents and data. Extending the natural interface of the file system is expected to further strengthen the memory capabilities of Agent systems.

The author notes that Amazon S3 started in 2006 primarily handling unstructured data such as images, later took on analytical data, and has carried customers from the early data warehouse to the rise of the data lake. AWS is now vigorously promoting Amazon S3 as the key foundation for AI workloads in order to meet current customer demands. Mai-Lan believes that the core of Amazon S3's design is to support the growth of mainstream data types cost-effectively while always adhering to principles such as data availability, durability, and resilience. That is why customers have entrusted their data to S3 for the past 20 years, and it is also what S3's possibilities for the next 20 years rest on.

(Author | Yang Li, Editor | Yang Lin)

Related Questions

Q: What is the core difference in data consumption patterns between AI agents and human engineers as highlighted in the article?

A: The article emphasizes that AI agents consume data in an "extremely active and aggressive" manner. They work in a "parallel comparison and selection" mode, issuing dozens or even hundreds of queries in parallel to find the best path. This results in call frequencies and data throughput several orders of magnitude higher than those of human engineers.

Q: What are the three major innovations recently implemented for Amazon S3 to meet the demands of the AI era?

A: To address AI-era data processing needs, Amazon S3 has recently introduced three major innovations: S3 Tables (with native support for the Apache Iceberg format), S3 Files (enabling POSIX file system access to S3 data), and S3 Vectors (introducing vectors as a native data type for building context and shared memory).

Q: Why does the article suggest that S3's support for Apache Iceberg is particularly beneficial for AI agents?

A: The article states that AI agents, built on large language models (LLMs), are already proficient in handling SQL syntax and the Iceberg data format because of their training. By storing all table data in Iceberg format on S3, agents can interact with the data efficiently without needing to learn multiple complex access APIs. This creates a high degree of compatibility between agents and the S3/Iceberg ecosystem.

Q: How does the newly released S3 Files feature enable better interaction for AI agents with data in S3?

A: S3 Files allows agents to interact with S3 data via the POSIX file system standard. It works by mounting an EFS file system on an S3 bucket. This lets agents use familiar file system operations: small files are accelerated via EFS caching, while large files are streamed directly from S3. This gives agents a natural "file" interface and lets them treat the shared file system as a "shared memory space" backed by S3.

Q: According to Mai-Lan, what has become a decisive factor for customers looking to build Agent infrastructure, rather than a secondary consideration?

A: Mai-Lan points out that for customers eager to build Agent infrastructure, "cost, or rather cost-effectiveness, is no longer a secondary factor but has become a decisive one." She emphasizes that in the next six months to a year, the choice of underlying data services will become crucial as Agent adoption explodes.
