Conversation with Mai-Lan from AWS: The Next Battlefield for S3 – How to Handle the Data Consumption Surge in the Agent Era

marsbit | Published 2026-05-08 | Updated 2026-05-08

Summary

The explosive rise of Agent AI, exemplified by OpenClaw in China, is putting unprecedented pressure on cloud data infrastructure. Unlike human engineers, Agents consume data in an "extremely active and aggressive" parallel fashion, launching tens to hundreds of queries simultaneously, which leads to far higher call frequencies and throughput. Mai-Lan Tomsen Bukovec, VP of Technology at AWS, emphasizes that cost-effectiveness in this data layer is now a decisive factor for customers building Agent systems. To address this, AWS is positioning its foundational Amazon S3 service, now 20 years old, as the critical data platform for the Agent era. Recent key innovations include: **S3 Tables** with native Apache Iceberg support, enabling Agents to interact efficiently with structured data via familiar SQL; **S3 Vectors**, which introduces vectors as a native type for building contextual data and serving as a shared "memory space" for AI systems; and the newly launched **S3 Files**, which provides a POSIX-compliant file system interface over S3, allowing Agents to interact with data through the familiar paradigm of files and directories. These enhancements are designed to meet the unique data interaction patterns of Agents, which are built on models already proficient with SQL, file systems, and contextual vectors. By unifying these access methods on the scalable, durable, and cost-efficient S3 foundation, AWS aims to provide the data backbone capable of supporting the next w...

At the beginning of the year, the popularity of OpenClaw in the Chinese market allowed everyone to see the enormous potential of Agents. But what followed was a question that all cloud vendors must answer: When Agents begin to multiply like cybernetic lobsters and call data at high frequencies, are the AI cloud infrastructure layers, especially the data layer, ready?

For example, when enterprise data teams deploy Agents into production environments, they often encounter bottlenecks at the data layer. Building Agents across different platforms such as vector databases, relational databases, graph databases, and data lakehouses requires synchronized data pipelines to maintain the timeliness of context information. But in real production environments, this context information gradually becomes outdated.

The urgency of this problem stems from the fundamentally different data consumption patterns of Agents compared to human engineers.

"Agents are consuming data in an extremely active and aggressive way. Their call frequency to data warehouses or data lakes is astonishing."

Mai-Lan Tomsen Bukovec, Vice President of Technology at Amazon Web Services, recently told the author that Agents work in a "parallel comparison and selection" mode. Instead of issuing one query at a time, they run dozens or hundreds in parallel and compare the results to find the best path. This makes Agents far more aggressive data consumers than humans, with call frequencies several orders of magnitude higher and data throughput growing exponentially.
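To make the "parallel comparison and selection" pattern concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `run_query` and `fan_out_and_select` are hypothetical names, and the scoring is a toy stand-in for a real call to a data warehouse or data lake.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(candidate: str) -> tuple[str, int]:
    """Hypothetical stand-in for one data-layer call; returns (candidate, score)."""
    # A real agent would query a warehouse or lake here; this toy scores by length.
    return candidate, len(candidate)


def fan_out_and_select(candidates: list[str], max_workers: int = 32) -> tuple[str, int]:
    """Issue every candidate query concurrently, then keep the best-scoring result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_query, candidates))
    return max(results, key=lambda result: result[1])


# One agent step fans out into 100 queries instead of a single one.
candidates = [f"plan-{'x' * i}" for i in range(1, 101)]
best = fan_out_and_select(candidates)
```

A single decision by the agent costs 100 data-layer calls in this sketch, which is the kind of multiplication in call frequency the quote describes.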

Mai-Lan further pointed out, "Customers are now very eager to build Agent infrastructure. Cost, or rather cost-effectiveness, is no longer a secondary factor but has become a decisive one. In the next six months to a year, with the explosion of Agents, the choice of underlying data services will become crucial."

Now the OpenClaw frenzy is subsiding, but it has left cloud vendors with a stress-test warning about their underlying storage and compute capabilities. Mai-Lan believes that AWS holds a natural advantage here: the scale of Amazon S3 (Amazon Simple Storage Service) and the cost efficiency of Amazon Redshift and Amazon Athena under high concurrency are well suited to this ultra-large-scale, ultra-high-frequency mode of Agent data interaction.

Coinciding with its 20th anniversary, and in response to customer demands for data processing in the AI era, Amazon S3 has recently evolved in three major directions: S3 Tables (tabular data), S3 Files (file access), and S3 Vectors (vector data).

Take S3 Tables' native support for Apache Iceberg as an example. Mai-Lan noted that when Agents process data, they tend to interact directly with Iceberg-format data via SQL. The reason is that Agents are built on large language models (LLMs), and LLMs have become proficient with SQL syntax and the Iceberg data format during training. Storing all table data in Iceberg format on S3 lets Agents handle data efficiently without learning complex access APIs for multiple systems. So far, Agents have shown a high degree of compatibility with S3 and Iceberg.
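The appeal of a single SQL surface can be sketched in a few lines of Python. This is not an S3 Tables or Iceberg example: SQLite from the standard library stands in for a SQL engine over Iceberg tables, and the `orders` table and its columns are hypothetical. The point is only that the agent's side of the interaction is an ordinary SQL string.

```python
import sqlite3

# SQLite is a stand-in for a SQL engine over Iceberg tables (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0)],
)

# The kind of statement an LLM-backed agent can write without a system-specific API.
agent_sql = (
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY region"
)
rows = conn.execute(agent_sql).fetchall()
conn.close()
```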

When Iceberg capabilities were introduced to S3, it triggered a new wave of innovation. Data sources like Postgres and Oracle began writing directly to Iceberg, and Agent systems could interact directly with these tables. And with the launch of S3 Vectors, more and more AI applications are using vectors as a shared memory medium, thereby injecting "state" into AI interaction experiences.

Mai-Lan also pointed out that vectors have been introduced as a native data type in S3. Vectors are used in two main ways: to build contextual information for data stored in S3, and to serve as shared memory. In the five months since S3 Vectors was released, market feedback has met expectations. A large number of customers have started using the feature, generating vectors via embedding models to enrich the context of their data. Use of S3 Vectors as the memory space for Agent systems has grown explosively.
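The "vectors as shared memory" idea can be illustrated with a small in-memory Python sketch. It does not use the S3 Vectors API: `SharedVectorMemory`, its methods, and the hand-written three-dimensional vectors are all hypothetical, and a real system would obtain vectors from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SharedVectorMemory:
    """Hypothetical in-memory store that several agents write to and read from."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[list[float], str]] = {}

    def put(self, key: str, vector: list[float], note: str) -> None:
        self._items[key] = (vector, note)

    def recall(self, query: list[float], top_k: int = 1) -> list[tuple[str, str]]:
        """Return the top_k (key, note) pairs most similar to the query vector."""
        ranked = sorted(
            self._items.items(),
            key=lambda item: cosine(query, item[1][0]),
            reverse=True,
        )
        return [(key, note) for key, (_, note) in ranked[:top_k]]


memory = SharedVectorMemory()
memory.put("agent-a/1", [1.0, 0.0, 0.0], "customer prefers weekly reports")
memory.put("agent-b/1", [0.0, 1.0, 0.0], "schema changed for orders table")
hits = memory.recall([0.9, 0.1, 0.0])  # a different agent looks up related context
```

Because both agents write to the same store, what one agent recorded can be recalled by another, which is the sense in which vectors carry "state" across interactions.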

It is worth mentioning that S3 Files was released a few weeks ago, enabling Agents to work with data in S3 through a POSIX file system interface. In Agent systems, LLMs gravitate toward the "file" form: Python libraries and shell scripts are familiar to them from training, so Agents naturally prefer files as a data interface.

For this reason, S3 Files is designed to mount an EFS file system on an S3 bucket. Users can then work with S3 data through a POSIX file system: small files are served faster via EFS caching, while large files are streamed directly from S3. This lets Agents interact natively with S3 data in the familiar language of the file system and treat the shared file system as a "shared memory space" backed by S3.
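From the Agent's side, a file interface means nothing more exotic than ordinary file calls. The Python sketch below is illustrative only: a temporary directory stands in for the mount point, `summarize_directory` is a hypothetical helper, and nothing here reflects how S3 Files is actually configured or mounted.

```python
import tempfile
from pathlib import Path


def summarize_directory(root: Path) -> dict[str, int]:
    """Walk a directory tree with ordinary file calls; return {relative path: size in bytes}."""
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


# A temporary directory stands in for the mount point (assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes").mkdir()
    (root / "notes" / "plan.txt").write_text("step 1: load data")
    (root / "run.sh").write_text("echo ok")
    listing = summarize_directory(root)
```

The same code would run unchanged against any mounted path, which is why a file system view requires no storage-specific client on the Agent's side.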

This is significant for the development of LLM memory capabilities. Current AI experiences are gradually adding deeper conversational context and more personalized interaction, whether between Agents, between humans and Agents, or between Agents and data, and model performance continues to evolve. Extending the natural interface of the file system is expected to further strengthen the memory capabilities of Agent systems.

The author notes that Amazon S3 started in 2006 by primarily handling semi-structured data such as images, later took on analytical data, and moved from the initial data warehouse to the rise of the data lake. AWS is now vigorously promoting it as the key foundation for AI workloads to meet current customer demands. Mai-Lan believes that the core of Amazon S3's design is to support the growth of mainstream data types cost-effectively while always adhering to principles such as data availability, durability, and resilience. This is why customers have entrusted their data operations to S3 for the past 20 years, and it is what S3 will build on for the next 20.

(Author | Yang Li, Editor | Yang Lin)

Related Questions

Q: What is the core difference in data consumption patterns between AI agents and human engineers, as highlighted in the article?

A: The article emphasizes that AI agents consume data in an "extremely active and aggressive" manner. They work in a "parallel comparison and selection" mode, issuing dozens or even hundreds of queries in parallel to find the best path. This results in call frequencies several orders of magnitude higher than those of human engineers, and in much higher data throughput.

Q: What are the three major innovations recently implemented for Amazon S3 to meet the demands of the AI era?

A: To address AI-era data processing needs, Amazon S3 has recently introduced three major innovations: S3 Tables (with native support for the Apache Iceberg format), S3 Files (enabling POSIX file system access to S3 data), and S3 Vectors (introducing vectors as a native data type for building context and shared memory).

Q: Why does the article suggest that S3's support for Apache Iceberg is particularly beneficial for AI agents?

A: The article states that AI agents, built on large language models (LLMs), are already proficient in handling SQL syntax and Iceberg data formats due to their training. By storing all table data in Iceberg format on S3, agents can interact with the data efficiently without needing to learn multiple complex access APIs. This creates a high degree of compatibility between agents and the S3/Iceberg ecosystem.

Q: How does the newly released S3 Files feature enable better interaction for AI agents with data in S3?

A: S3 Files allows agents to interact with S3 data via the POSIX file system standard. It works by mounting an EFS file system on an S3 bucket. This lets agents use familiar file system operations: small files are accelerated via EFS cache, while large files are streamed directly from S3. This provides agents with a natural "file" interface, treating the shared file system as a "shared memory space" sourced from S3.

Q: According to Mai-Lan, what has become a decisive factor for customers looking to build Agent infrastructure, rather than a secondary consideration?

A: Mai-Lan points out that for customers eager to build Agent infrastructure, "cost, or rather cost-effectiveness, is no longer a secondary factor but has become a decisive one." She emphasizes that in the next six months to a year, the choice of underlying data services will become crucial as Agent adoption explodes.
