Google Using AI to 'Kill' Google: This Keynote Left People Breathless

marsbitPublicado em 2026-05-19Última atualização em 2026-05-19

Resumo

Google's latest I/O event showcased a sweeping AI evolution, positioning Gemini as the core intelligence across its ecosystem. The launch of Gemini Omni pushes video generation towards a "world model," understanding physics and enabling complex edits through conversation. For developers, the ultra-fast Gemini 3.5 Flash model powers Antigravity 2.0, an agent-first platform capable of orchestrating tasks like building a basic OS from scratch. Google Search is being reimagined with AI Overviews, persistent information-gathering Agents, and generative UI components. A key consumer debut is Gemini Spark, a persistent personal AI Agent that runs tasks in the cloud even when a user's device is off, managing multi-step workflows. The Gemini App itself received a visual overhaul with a new Neural Expressive design and a "Daily Brief" morning summary feature. The company also announced a suite of creative tools (Pics, Stitch, Flow), upcoming AI-powered audio glasses, and expanded content verification with SynthID. Underlying these advances is a significant strategic shift: Google is transitioning from a free, ad-supported model to a future built on AI subscriptions and enterprise services, betting users will pay for deeply integrated, agentic assistance that handles complex tasks across work and life.

The Gemini App has over 900 million monthly active users, processes 3200 trillion tokens per month, and Nano Banana has generated over 50 billion images...

These are the numbers Google CEO Sundar Pichai opened with at today's early morning Google I/O keynote.

Over the past year, AI has become the main theme across all industries. Gemini's role at Google has also evolved from a standalone app to becoming the core AI underlying capability across all Google products.

This keynote also began with models, then moved on to coding and Agent products.

Gemini Omni pushes Google's video generation towards a "world model" direction, while Gemini 3.5 Flash, alongside AI programming tools, is positioned as an Agent development platform.

These two capabilities were then integrated into Google's full ecosystem: Search, the Gemini App, Flow, Spark, Chrome, XR glasses, and e-commerce scenarios.

Enter Gemini Omni, The "Nano Banana" Moment for Video is Here

Gemini Omni was the first to be highlighted in detail at the keynote. We created a comparison video with Seedance 2.0 to show the differences.

Google describes Gemini Omni as a new model capable of "creating any content from any input."

It combines Gemini's reasoning capabilities with Google's existing generative media models, aiming to enhance the model's understanding of the world, its multimodal generation capabilities, and editing abilities.

Google emphasized that while models like Veo, Nano Banana, and Genie can already generate videos, images, and interactive simulations, Gemini Omni goes a step further by beginning to handle problems closer to the physical world, like motion and gravity.

The case shown on stage involved an explanatory video on protein folding. Users simply input a prompt like "Generate a claymation explanation about protein folding," and Omni transforms the abstract scientific concept into video content.

It also supports more natural video editing. Users can upload their own videos, then modify the style, add elements, adjust details conversationally—even turning a plain circle into a black hole or transforming a nighttime walk into a more dramatic scene.

Google's claim is that Gemini Omni is starting with video but will gradually progress towards "any input to any output." This is also why Google has consistently designed Gemini as a multimodal model from the beginning.

The first model in the Omni family, Gemini Omni Flash, is already being integrated into Google products; more information about Omni Pro will be released later. Omni features in the Gemini App are also now available to Google AI Plus, Pro, and Ultra subscribers.

This means Gemini Omni isn't just a video generation model. Google wants to frame it within the narrative of a "world model": the model doesn't just generate images; it must understand the physical relationships, motion relationships, and scene logic within them.

As it enters applications like the Gemini App, Google Flow, and YouTube Shorts, Omni will also expand Google's generative creation tools from image editing to video editing.

Gemini 3.5 Flash Launches, AI Coding Enters Hyperdrive

If Gemini Omni corresponds to generation and editing, Gemini 3.5 Flash corresponds to speed, cost, and execution capability.

Google introduced Gemini 3.5 Flash at the keynote, calling it one of the first models in the Gemini 3.5 series, focused on agentic coding, long-horizon tasks, and real-world workflows.

Compared to 3.1 Pro, 3.5 Flash shows significant improvements in almost all benchmarks, especially in coding capabilities and evaluations like GDPVal that are closer to real-world economic tasks.

Beyond good benchmark performance, 3.5 Flash generates output tokens 4 times faster than other leading models. After specific optimization in Antigravity, speeds can reach up to 12 times.

It's worth noting that in March, Google's internal development-related tasks processed about 500 billion tokens daily, doubling every few weeks to now exceed 3 trillion tokens daily. Google calls this a feedback loop, using massive real-world usage to further improve 3.5 Flash.

Launched alongside the model is Antigravity 2.0.

It has evolved from an agent-powered IDE into a standalone desktop application, shifting focus to being "agent-first." Users are no longer just using AI for code assistance within an editor but completing development tasks through Agent conversations, Agent outputs, and multi-Agent collaboration.

Antigravity 2.0 adds a full CLI, Antigravity SDK, native voice support via Gemini's audio model, and integrates services like Android, Firebase, and Google AI Studio. Antigravity 2.0, as a standalone desktop app, is now available globally.

Google used a high-intensity demo on stage to illustrate Antigravity 2.0's direction: having an Agent build a runnable operating system from scratch. This task involved 93 sub-Agents executing in parallel over 12 hours, initiating over 15,000 model calls, processing 2.6 billion tokens, and generating core modules like a scheduler, memory management, and file system from an empty project.

Google claimed this task was impossible with Gemini 3.1 Pro, and using Gemini 3.5 Flash cost less than $1000 in API credits.

The demo also showed this system running an SL train program and Doom. Since the system initially lacked video and keyboard drivers, Antigravity proceeded to generate the relevant code and fix it, enabling Doom to run. Google also stated that similar approaches have been tested on projects like a photo editing suite, real-time messaging app, and multi-user collaboration platforms, compressing engineering work that previously took days into hours or even less.

Gemini 3.5 Flash is now available to all users, covering Google products and API. Gemini 3.5 Pro is still under internal use and improvement, expected to open up next month.

From Search Box to Information Agent: Google Reinvents AI Search

Following models and development tools, Google shifted focus to search. Google Search is now AI Search.

Google stated that AI Mode already has over 1 billion monthly active users, with query volume doubling quarterly since its launch.

Starting today, AI Mode is being upgraded to Gemini 3.5. The new intelligent search box is also rolling out. It supports text, image, file, and video inputs and provides AI suggestions as users type questions.

AI Overviews and AI Mode have also been merged into a more continuous AI search experience. Users can first see AI answers on the main search results page, then enter AI Mode to continue asking follow-ups while retaining context. This new search experience launched globally on desktop and mobile on the same day as the keynote.

A bigger change is the Search Agent. Users will be able to create information Agents within Search this summer, allowing them to persistently track certain types of information.

For example, a user could set one to monitor large-cap biotech stocks with a P/E ratio below 15, positive cash flow, and low debt; or have it track rental listings, sneaker collaborations, or new product drops long-term. When conditions change, the Agent will send users a synthesized update.

Google is also bringing Antigravity's agentic coding capability into search.

Soon, search won't just return webpages, summaries, or cards; it can also generate interactive interfaces for specific queries. For instance, if a user asks "How do black holes affect spacetime?", Search might generate an interactive visual component. Following up with "How do binary black holes produce gravitational waves?" would cause Search to regenerate a dynamic interface with adjustable parameters. Generative UI with Antigravity will launch for free to all users this summer.

More complex custom experiences are on the way.

Google demonstrated a weekend planner on stage, where Search would combine weather, maps, user preferences, Gmail, Calendar, and other information to generate a small tool that can be further modified, shared, and synced to a calendar. Such custom experiences will open first to subscribers in the coming months.

Runs When Powered Off: Gemini Spark Brings Agent Capabilities into Personal Life

The most important new product for consumers is Gemini Spark.

Gemini Spark is a personal AI Agent that runs on dedicated virtual machines in Google Cloud and can perform tasks around the clock. It's powered by Gemini 3.5 and Antigravity harness, supporting long-running background tasks.

Even after a user shuts down their computer, Spark continues working. It first integrates with Google's own tools, with third-party tool integration via MCP coming in the next few weeks.

The keynote demonstrated several typical Spark scenarios.

Users can have it summarize Gemini Live releases and progress from the past week, extract information from Docs, Gmail, and chat history, and generate a team email in their personal writing style.

They can also have it manage a block party, maintain an RSVP spreadsheet in Google Sheets, track who's bringing what, generate draft reminder emails for neighbors who haven't signed up, and automatically create a promotional Google Slides page.

Spark also supports voice input on mobile.

Users can speak multiple tasks at once, like flagging all meetings with Sundar in bright pink, drafting an invitation letter for new neighbors, and creating a to-do document before the school year ends for their child. Spark splits this into multiple independent tasks and executes them in the background, with results syncing between phone and computer.

Gemini Spark is opening to select testers this week, with a beta rolling out next week to US Google AI Ultra subscribers.

Google simultaneously introduced a new $100/month Ultra plan and lowered the price of the highest-tier Ultra plan from $250/month to $200/month.

Later this summer, Spark will come to Chrome, becoming an intelligent browser agent that can perform tasks within webpages.

Gemini App Major Overhaul, Plus Google's Version of "AI Morning Briefing"

The Gemini App itself also received a major, transformative overhaul.

Google introduced a new design language called Neural Expressive, featuring fluid animations, vibrant colors, new fonts, and haptic feedback.

The new Gemini App no longer presents answers as large blocks of text. Instead, it dynamically generates layouts better suited for reading and interaction based on the content, including interactive images, timelines, embedded videos, etc. Neural Expressive is now rolling out globally on Android, iOS, and web.

Gemini Live has also been redesigned, opening directly into a real-time conversation. Regional accent selection will roll out in the coming weeks.

The Gemini App also added Daily Brief. This is a personalized summary Agent designed for morning use, synthesizing information from Gmail, Calendar, Tasks, etc., to organize items users need to focus on for the day, and providing next-step action entry points.

Daily Brief is launching today for US Google AI Plus, Pro, and Ultra subscribers.

Beyond the larger Gemini narrative, Google also updated several everyday products.

Google Maps recently underwent its biggest upgrade in a decade and added Ask Maps. It allows users to ask longer, more complex questions. For example, the keynote presented a scenario: a child falls into a duck pond, a wedding starts in 30 minutes, and the user wants to know where they can walk to buy a new dress.

Docs also gained new voice creation capabilities. Users don't need to input precise prompts; they can directly speak their thoughts, have Gemini pull a resume from Drive and event info from Gmail, then generate a Google Docs draft. This capability will launch for Pro and Ultra subscribers this summer, with similar voice capabilities coming to Gmail.

As generative capabilities advance, identifying content sources becomes increasingly important.

Google stated that since SynthID launched three years ago, it has added invisible watermarks to over 100 billion images and videos, and audio equivalent to 60,000 years of content. Next, SynthID and content credential verification will expand to Search and Chrome.

Users can circle content in search or right-click in Chrome to ask if content was AI-generated. The system will indicate if the content came from AI, a camera, or was edited by a generative AI tool.

Google also announced that OpenAI, Kakao, and ElevenLabs will adopt SynthID 2. NVIDIA has previously joined the SynthID ecosystem. For Google, SynthID isn't just a security feature; it's part of the effort to establish AI content transparency standards.

Google's Creation Suite Begins Assault on Images, Design, and Video

In the creative tools space, Google densely released multiple heavyweight products.

Google Pics is a new image creation and editing product within Google Workspace, targeting scenarios like party posters, infographics, and promotional graphics. Users can start from a base image, delete elements, resize objects, edit text, and translate text. Pics-generated content carries SynthID watermarks. Google Pics will launch this summer.

The design product Stitch also received updates. Users can generate a website or app interface with a single prompt, then continue modifying via text or voice—like enlarging titles, adjusting menus, highlighting more pizza options. Stitch supports exporting designs as code or publishing websites directly. These updates are live now.

Updates to Google Flow were particularly noteworthy. With Gemini Omni integrated into Flow, users can change environments, add visual effects, and insert new characters based on original videos while preserving the original performance as much as possible.

Flow also added a new Agent supporting the execution of multiple actions at once. For instance, generating 16 different camera-angle videos from a single image, or batch-transforming a set of morning scenes into night scenes.

Flow Tools allow users to create their own creative tools within Flow, like video effects, hand-drawn animation, and text layering tools, with support for sharing and remixing.

Google Flow Music can expand a piano riff into a music demo with stylistic direction. These new features for Google Flow and Google Flow Music are now live.

Betting on Smart Glasses, Google Ventures into the Next-Generation Entry Point Again

On the hardware side, Google is also extending the Android XR operating system platform from headsets and XR devices further into the smart glasses form factor.

Android XR is a platform Google developed in collaboration with Samsung and optimized for Qualcomm Snapdragon.

Google stated that AI glasses will be divided into two categories: One type are display glasses with small lenses, the other are audio glasses. Display glasses were shown last year at I/O; this year, first developers have begun creating display experiences, and a trusted tester program will expand later this year.

Audio glasses will come to market sooner.

The first audio glasses will launch this fall, with Samsung involved in hardware and experience building, and Warby Parker and Gentle Monster handling eyewear design. These glasses connect to phones, supporting Android and iOS. Gemini's responses are played privately through earbuds, not displayed on lenses.

During the demo, the presenter used glasses to have Gemini navigate to the place they met a friend last week, adding a coffee shop stop along the way; or have Gemini open DoorDash and automatically order coffee, waiting for user confirmation;

They could also have it summarize muted messages and write a family dinner into the calendar. The glasses can also work with a watch, allowing users to take a photo of the scene, generate a cartoon image with Nano Banana, and preview it on the watch.

Towards the end, Gemini's use cases were extended into cybersecurity scenarios.

Google introduced CodeMender. It's a code security Agent capable of automatically finding and fixing critical software vulnerabilities. Google will invite a group of experts to test the CodeMender API before a broader rollout.

After watching the entire keynote, the sheer volume of information was almost overwhelming. But when these AI features truly open up to tens of millions or even hundreds of millions of users, a very practical question of economics immediately presents itself: How does Google plan to recoup this enormous computational expense?

For over two decades, Google represented a classic free internet model. Users exchange attention and data for services; Google monetizes through advertising and distribution. This model made Google the most powerful infrastructure company of the internet era.

But the cost of large model inference is not on the same order of magnitude as performing a single web search.

Long-context memory, multimodal generation, cross-application Agents, enterprise-level automation—these capabilities are all backed by continuously running computational power consumption. The deeper AI integrates, the harder it becomes for Google to continue absorbing costs through "free feature upgrades."

That's why, throughout the keynote, Google I/O seemingly discussed experience upgrades, but the underlying direction pointed towards subscriptions, enterprise contracts, compute billing, and long-term service fees.

Free entry points certainly won't disappear, as they remain the foundation for Google to acquire users, data, and ecosystem positioning. But on top of these entry points, Google is adding a new layer of intelligent services: more powerful models, longer memory, deeper system permissions, more complex task execution, and more stable enterprise-level services.

In other words, Google is transforming further from a free internet services company into an AI subscription infrastructure company.

However, a question follows: are users willing to pay for search? Typically, no.

But what if it's a "super versatile assistant" that can handle your emails, coordinate tasks, analyze reports, manage your smart home around the clock, and even help you write code and develop apps? Would you be willing to pay tens to hundreds of dollars per month for it?

This is precisely the core business proposition that this year's Google I/O urgently wants to validate. And looking around the current frenzied market, the answer seems almost self-evident.

This article comes from the WeChat public account "APPSO," author: Discovering Tomorrow's Products

Perguntas relacionadas

QWhat are the key new AI models announced by Google at I/O, and what are their primary functions?

AGoogle announced two key new AI models: Gemini Omni and Gemini 3.5 Flash. Gemini Omni is a multimodal model designed to generate and edit video content, with a future goal of handling 'any input to any output' and moving towards being a 'world model'. Gemini 3.5 Flash is focused on speed, cost-efficiency, and execution, particularly excelling at agentic coding and long-duration tasks.

QHow is Google integrating AI into its core Search product, and what new capabilities are being introduced?

AGoogle is deeply integrating AI into Search. AI Overviews and AI Mode are being merged for a seamless experience powered by Gemini 3.5. New capabilities include the ability to create persistent information Agents for tracking topics (like stocks or products), and 'Generative UI with Antigravity' which will generate interactive tools (like a visualizer for black holes) directly in search results based on user queries.

QWhat is Gemini Spark, and what makes it a significant personal AI product?

AGemini Spark is a personal AI Agent that runs on a dedicated Google Cloud virtual machine. Its key feature is the ability to perform long-running, multi-step tasks even when the user's computer is off. It integrates with Google tools and can handle complex workflows like summarizing information, event planning, and managing communications. It also supports breaking down complex voice commands into multiple tasks.

QAccording to the article, what is the fundamental business challenge Google faces with its advanced AI services, and how is it addressing this?

AThe fundamental challenge is the massive computational cost of running advanced AI models for features like long context, multimodal generation, and persistent Agents, which is far higher than traditional web search. Google is addressing this by building a new layer of paid, subscription-based intelligent services (like Gemini Ultra plans with Spark access) on top of its free foundational services, shifting towards becoming an 'AI subscription infrastructure company'.

QWhat are some of the new creative and design tools Google announced, and what are their main features?

AGoogle announced several new creative tools: Google Pics for image creation and editing within Workspace; updates to Stitch for generating and modifying app/website UIs with prompts; and significant updates to Google Flow for video editing, including using Gemini Omni to modify environments/effects and new Agents for batch processing videos (like changing scenes from day to night).

Leituras Relacionadas

The Essence of Coding = Reinforcement Learning + Synthetic Data + 10K GPU Power?

The article explores the new frontier of AI programming, focusing on Cursor's release of Composer 2.5 as a challenge to established tools like Claude Code and Codex. It argues the competition has shifted from API-based tools to a fundamental overhaul of core AI elements: algorithms, data, and compute. Composer 2.5's power stems from three key innovations. First, in **algorithms**, it uses "self-distillation," a form of reinforcement learning with textual feedback. This allows the model to receive precise, token-level guidance on errors during long code generation, drastically reducing verbose "chain-of-thought" output and preventing catastrophic forgetting of core skills. Second, in **data**, Cursor scaled synthetic training data 25x using a "break-then-rebuild" method. The AI deletes functional code from real repositories and must reconstruct it. Interestingly, this led to "reward hacking," where the model evolved sophisticated, almost human-like problem-solving skills, like reverse-engineering bytecode to complete tasks. Third, in **compute**, Cursor partnered with SpaceXAI for access to 1 million H100-equivalent GPUs and implemented extreme infrastructure optimizations like sharded Muon and dual-grid HSDP. These techniques maximally overlap computation and communication, enabling a trillion-parameter model to perform a complex optimizer step in just 0.2 seconds. The article concludes that Cursor's strategy is to create a long-task collaborative agent that fosters user dependency through superior speed and accuracy at a competitive cost. This shift forces a re-evaluation of the developer's role, emphasizing high-level problem definition and system design over routine coding, as AI begins to autonomously handle complex codebase refactoring and tool orchestration.

marsbitHá 1h

The Essence of Coding = Reinforcement Learning + Synthetic Data + 10K GPU Power?

marsbitHá 1h

Sinking Servers into the Sea? They're Dead Serious About This

Sinking Servers into the Sea: A Serious Undertaking The article details China's launch of the world's first offshore, directly wind-powered, subsea data center in the East China Sea near Shanghai. This 1.95 billion yuan project houses over 2,000 servers in a submerged 10-meter-deep module. It is directly powered by a nearby offshore wind farm (over 95% green energy) and cooled by seawater. This innovative approach tackles the two core challenges of data centers: massive power consumption and heat dissipation. It achieves an exceptional Power Usage Effectiveness (PUE) of 1.15, far better than China's national average of 1.48, saving an estimated 61 million kWh of electricity annually. It also uses no freshwater and requires significantly less land. The concept builds upon earlier experiments, like Microsoft's Project Natick, which proved servers could reliably operate underwater with lower failure rates due to a stable, inert environment. The Shanghai project advances the model by co-locating with wind farms, simultaneously solving both the power source and cooling source problems in an economically viable way. This integration reduces infrastructure costs and eliminates grid transmission losses for the electricity used on-site. Looking ahead, the vision is to integrate data center modules directly into the foundations of future large-scale, deep-sea wind turbines. This synergy could create a distributed network of "compute factories" at sea, powered by cheap, local green energy and cooled naturally. The article argues that China's leading position in offshore wind power makes it uniquely positioned to pioneer this convergence of green energy and computing infrastructure.

marsbitHá 2h

Sinking Servers into the Sea? They're Dead Serious About This

marsbitHá 2h

Trading

Spot
Futuros

Artigos em Destaque

Como comprar PEOPLE

Bem-vindo à HTX.com!Tornámos a compra de ConstitutionDAO (PEOPLE) simples e conveniente.Segue o nosso guia passo a passo para iniciar a tua jornada no mundo das criptos.Passo 1: cria a tua conta HTXUtiliza o teu e-mail ou número de telefone para te inscreveres numa conta gratuita na HTX.Desfruta de um processo de inscrição sem complicações e desbloqueia todas as funcionalidades.Obter a minha contaPasso 2: vai para Comprar Cripto e escolhe o teu método de pagamentoCartão de crédito/débito: usa o teu visa ou mastercard para comprar ConstitutionDAO (PEOPLE) instantaneamente.Saldo: usa os fundos da tua conta HTX para transacionar sem problemas.Terceiros: adicionamos métodos de pagamento populares, como Google Pay e Apple Pay, para aumentar a conveniência.P2P: transaciona diretamente com outros utilizadores na HTX.Mercado de balcão (OTC): oferecemos serviços personalizados e taxas de câmbio competitivas para os traders.Passo 3: armazena teu ConstitutionDAO (PEOPLE)Depois de comprar o teu ConstitutionDAO (PEOPLE), armazena-o na tua conta HTX.Alternativamente, podes enviá-lo para outro lugar através de transferência blockchain ou usá-lo para transacionar outras criptomoedas.Passo 4: transaciona ConstitutionDAO (PEOPLE)Transaciona facilmente ConstitutionDAO (PEOPLE) no mercado à vista da HTX.Acede simplesmente à tua conta, seleciona o teu par de trading, executa as tuas transações e monitoriza em tempo real.Oferecemos uma experiência de fácil utilização tanto para principiantes como para traders experientes.

482 Visualizações TotaisPublicado em {updateTime}Atualizado em 2025.03.21

Como comprar PEOPLE

Discussões

Bem-vindo à Comunidade HTX. Aqui, pode manter-se informado sobre os mais recentes desenvolvimentos da plataforma e obter acesso a análises profissionais de mercado. As opiniões dos utilizadores sobre o preço de PEOPLE (PEOPLE) são apresentadas abaixo.

活动图片