Gemini 3.5 is Here! Tonight, Google Overtakes Google

ChainCatcher (链捕手) | Published 2026-05-20 | Updated 2026-05-20

Summary

At Google I/O 2026, the company unveiled a suite of AI advancements headlined by three major releases. First, **Gemini Omni**, a true "omnimodal" model, can generate high-quality, coherent videos from any combination of text, image, audio, or video inputs, maintaining character consistency and physical logic across iterative edits. Second, Google introduced the new flagship **Gemini 3.5 Flash**, which decisively outperforms the previous Gemini 3.1 Pro on key benchmarks for coding, agent tasks, and multimodal reasoning, and is also significantly faster than competitors. This model powers the upgraded **Antigravity 2.0**, a standalone Agent development platform that demonstrated the ability to orchestrate 93 sub-agents to build a functional operating system from scratch in just 12 hours. Third, **Gemini Spark** debuted as a personal, always-on AI agent. Running 24/7 in the cloud and integrated with Google Workspace, it can autonomously execute complex multi-step tasks like drafting emails, managing schedules, and planning events by accessing apps like Gmail, Docs, and Sheets. Together, these releases move AI beyond simple generation toward autonomous understanding, decision-making, and task execution, signaling rapid progress on the path toward more advanced AI systems.

Author: XinZhiYuan

 

Google I/O 2026 goes all out!

Just now, Pichai and Demis Hassabis took the stage together, unveiling all the major releases they've been accumulating for half a year in one go.

Without any suspense, the biggest star of the night, Gemini Omni, officially debuted!

As a truly "omni" model, Omni can accept any form of input and generate any content. It debuts with video output support, making it the "video version of Nano Banana".

Another highlight of the night belongs to Gemini 3.5 Flash.

In almost all benchmarks, 3.5 Flash has achieved a crushing victory over its predecessor flagship, Gemini 3.1 Pro. Its output speed has also doubled, and it is over 4 times faster than GPT-5.5 and Opus 4.7. The more powerful 3.5 Pro will be released next month.

In addition, a slew of other major new products were unveiled:

  • Antigravity 2.0: A brand-new standalone desktop application, evolving from an IDE to an Agent development platform.

  • Gemini Spark: A personal AI agent, running 24/7 in the cloud.

  • Gemini App Redesign: Code-named "Neural Expressive," switching to compute-based billing.

  • AI Ultra Subscription Plan: Adds a new $100 tier; highest tier reduced from $250 to $200.

  • Google Search's Biggest Upgrade in 25 Years: Integrated with 3.5 Flash, adds intelligent search box, automatic mini-app generation, etc.

    ......

Without exaggeration, the density of substantive announcements at this I/O is the highest in years.

Gemini Omni Debut: The Birth of an 'Omni' AI

As hinted by the teaser video, the highly anticipated Gemini Omni has finally arrived. Hassabis personally took the stage to announce, "We are taking the next important step—Gemini Omni, a new model that can create content from any input."

That Hassabis made the announcement himself says it all. What Google aims to build this time is an "omni" AI creation engine. It combines Gemini's intelligence with the company's strongest generative AI, bringing together world understanding, multimodality, and editing. Put simply, given any combination of images, audio, video, and text, it can generate a high-quality video, and you can then edit that video through conversation.

More crucially, Omni does not just look convincing; it understands the physical world. Hassabis stated, "Previous systems often stumbled when simulating concepts like gravity and momentum, but Omni achieves a 'step change.'" Omni brings Gemini's world knowledge and reasoning ability into video generation.

  • Given the prompt "Explain protein folding using clay animation," the generated video accurately depicts amino acid chains folding into α-helices and β-sheets at every step, visually presented as exquisite stop-motion animation.

  • Another example: assigning corresponding objects to the 26 letters of the English alphabet. C for Capybara, D for Disco Ball, L for Lava Lamp. Omni isn't just pasting assets; it's genuinely connecting language, images, and semantics.

It has to be said, the leap from realism to meaningfulness is enormous.

On stage, Hassabis pulled out a selfie video and began editing it live. A circle drawn on a palm turned into a black hole; an evening street stroll became a cyberpunk scene. One sentence rewrites the scene, and another changes the world. Anything can become a canvas for new realities, whether that means conjuring fire in your palm from a selfie or turning a circle drawn on paper into a black hole.

Moreover, this isn't a one-time generation. You can continue the conversation. Characters remain consistent in Gemini Omni's video output, physical logic holds, and scene memory is coherent.

  • The starting point is an original performance clip. Round two: "Teleport the violinist into the environment of this picture," with a reference image of snowy mountains and meadows attached. The scene switches instantly, and the actions and lighting adapt fully to the new environment.

  • Round three: "Cut the shot to behind the violinist's shoulder." The perspective rotates, but the performance actions and music remain completely continuous.

No matter how the scene changes, the main subjects in the video stay consistent.

What's even more thought-provoking is Omni's input flexibility. Images, text, video, audio—any references can be mixed as input to generate a coherent output. You can even create your own avatar, allowing an AI version of you to appear in any scene, speaking with your voice and doing things you haven't done.

Omni Flash has officially launched, with the API version opening in the coming weeks. The more powerful Omni Pro is also on the way. Thanks to Google's integration across its products, Omni is available at launch in the Gemini App, Google Flow, and YouTube Shorts, and is even free for YouTube Shorts users.

Flash Overtakes Pro: 3.5 Redefines 'Flagship'

Following Gemini Omni, another major highlight of this I/O is the release of the new flagship, Gemini 3.5 Flash. Google defines it as the strongest coding and agent model to date.

On stage, Pichai personally announced, "3.5 Flash outperforms Gemini 3.1 Pro across virtually all benchmarks!" Remember, 3.1 Pro was the flagship model Google launched just three months ago. Now, a Flash-tier model is crushing it.

Unexpectedly, Google delivered such impressive results in such a short time:

  • Terminal-Bench 2.1 (Coding): 76.2%

  • GDPval-AA (Real-world Agent Tasks): 1656 Elo

  • MCP Atlas (Large-scale Tool Usage): 83.6%

  • CharXiv Reasoning (Multimodal Understanding): 84.2%

In the four major benchmarks above, compared to Gemini 3.1 Pro, 3.5 Flash represents a massive leap forward. In terms of speed, 3.5 Flash occupies its own quadrant at 289 tokens/second, over 4 times faster than other frontier models. Additionally, 3.5 Flash matches or even surpasses GPT-5.5 and Claude Opus 4.7 in some benchmarks. It must be said, 3.5 Flash is both fast and powerful, with virtually no rivals.
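To make the speed claim concrete, here is a rough worked example. It uses only the 289 tokens/second figure and the "over 4 times faster" ratio quoted above; the 10,000-token response length is a hypothetical value chosen purely for illustration, and the results are bounds rather than measurements.

```python
# Worked arithmetic from the quoted speed figures.
FLASH_TOKENS_PER_SEC = 289   # figure quoted in the article
SPEEDUP = 4                  # "over 4 times faster", so treat this as a lower bound
RESPONSE_TOKENS = 10_000     # hypothetical response length (assumption)

# If Flash is more than 4x faster, other models run below this rate.
other_max_tokens_per_sec = FLASH_TOKENS_PER_SEC / SPEEDUP

# Time to stream the hypothetical response at each rate.
flash_seconds = RESPONSE_TOKENS / FLASH_TOKENS_PER_SEC
other_min_seconds = RESPONSE_TOKENS / other_max_tokens_per_sec

print(f"Implied ceiling for other models: {other_max_tokens_per_sec:.2f} tokens/s")
print(f"3.5 Flash: about {flash_seconds:.1f} s for {RESPONSE_TOKENS:,} tokens")
print(f"Other models: more than {other_min_seconds:.1f} s for the same output")
```

In other words, under these assumptions an output that takes Flash roughly half a minute would take the compared models more than two minutes.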

Numbers are abstract; let's look at real demonstrations. In an instant, 3.5 Flash can digest an abstruse academic paper and write a fully interactive, visual website. In agent tasks, via Antigravity, it can complete multi-step workflows, automatically categorizing and naming sprawling assets. Or, using two agents, it reproduced the AlphaZero paper in just six hours and coded a fully playable game.

93 Agents Build an OS in Just 12 Hours

It's evident that all these capabilities of 3.5 Flash are enabled by the new Antigravity 2.0. Today, Google's agent development platform, Antigravity, has been upgraded to version 2.0, evolving from an IDE to a standalone desktop application, fully embracing an Agent-first design.

Varun took the stage and gave a demo that left the audience breathless. He tasked Antigravity powered by 3.5 Flash with building an operating system from scratch. 93 sub-agents worked in parallel, making over 15,000 model calls, processing 2.6 billion tokens. Twelve hours later, a completely blank project transformed into a fully functional OS kernel. Scheduler, memory management, file system—every line of code was written by agents, tested by agents, and audited by agents. The API cost was under $1,000.
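The demo figures can be turned into rough averages. The sketch below uses only the numbers quoted above; because "over 15,000" and "under $1,000" are bounds rather than exact values, every result is approximate, and the per-token cost is a blended figure that ignores any difference between input, output, and cached tokens.

```python
# Back-of-the-envelope averages from the figures quoted in the demo.
SUB_AGENTS = 93
MODEL_CALLS = 15_000       # reported as "over 15,000"
TOKENS = 2_600_000_000     # 2.6 billion tokens processed
HOURS = 12
API_COST_USD = 1_000       # reported as "under $1,000"

tokens_per_call = TOKENS / MODEL_CALLS            # upper bound, since calls > 15,000
calls_per_agent = MODEL_CALLS / SUB_AGENTS        # lower bound on the average
tokens_per_second = TOKENS / (HOURS * 3600)       # sustained rate across all agents
cost_per_million_tokens = API_COST_USD / (TOKENS / 1_000_000)  # upper bound

print(f"Tokens per model call: at most ~{tokens_per_call:,.0f}")
print(f"Model calls per sub-agent: at least ~{calls_per_agent:.0f}")
print(f"Aggregate throughput: ~{tokens_per_second:,.0f} tokens/s")
print(f"Blended cost: under ${cost_per_million_tokens:.3f} per million tokens")
```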

Then, he attempted to run DOOM on this AI-written operating system. The first attempt failed, lacking video and keyboard drivers. So he immediately entered a fix command in Antigravity 2.0, and the agents began automatically writing the driver code. After a moment, the DOOM screen appeared, and the venue erupted.

To summarize, Antigravity 2.0's core upgrades include:

  • Sub-agents can be dynamically generated; the main agent splits tasks into subtasks and assigns them out, running in parallel without interference.

  • Asynchronous task management prevents long-running operations from blocking the main thread.

  • Scheduled Tasks allow setting "timed tasks" for agents to execute automatically, like checking PR status once a day or running a health check script every hour.

  • New slash commands: /goal lets the agent run to completion; /grill-me makes the agent clarify requirements before acting; /browser explicitly controls browser usage.
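The first two upgrades follow a familiar fan-out pattern: a main agent splits a task, hands each piece to a sub-agent, and waits without blocking. The sketch below illustrates that pattern with Python's standard asyncio library. It is not Antigravity's API; `run_subagent`, `split_task`, and `main_agent` are hypothetical names, the fixed three-way split is an assumption, and a short sleep stands in for a model call.

```python
import asyncio


async def run_subagent(name: str, subtask: str) -> str:
    """Hypothetical sub-agent; a real one would call a model here."""
    await asyncio.sleep(0.01)  # stand-in for model latency
    return f"{name} finished: {subtask}"


def split_task(task: str) -> list[str]:
    """Hypothetical planner; a real main agent would decide the split itself."""
    parts = ("scheduler", "memory management", "file system")
    return [f"{task} / {part}" for part in parts]


async def main_agent(task: str) -> list[str]:
    subtasks = split_task(task)
    # One sub-agent per subtask; gather runs them concurrently and
    # returns results in the same order the jobs were submitted.
    jobs = [run_subagent(f"agent-{i}", s) for i, s in enumerate(subtasks, start=1)]
    return await asyncio.gather(*jobs)


results = asyncio.run(main_agent("build OS kernel"))
for line in results:
    print(line)
```

Because the sub-agents are awaited together rather than one after another, total wall-clock time is close to that of the slowest subtask, which is the point of running them in parallel.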

These capabilities have already been proven inside Google. Internal token throughput on Antigravity was 500 billion per day in March; it now runs at 3 trillion per day, a sixfold increase. Moreover, this 12x accelerated Flash is available in Antigravity starting today.

3.5 Flash is now the default model for both the Gemini App and Google Search AI Mode, available to all users worldwide. Developers can access it via Antigravity 2.0, Gemini API, and Google AI Studio. Enterprise users can onboard via Gemini Enterprise Agent Platform. Even more explosive, 3.5 Pro is currently in internal testing and will be released next month.

24/7 Personal Assistant: Gemini Spark Finally Arrives

The third major announcement tonight is undoubtedly Gemini Spark! Pichai's positioning for it is very clear: your personal AI agent. It doesn't stop even when you close your laptop. It runs on a dedicated virtual machine in the cloud, enabling 24/7 availability.

Gemini Spark is powered by Gemini 3.5 + the Antigravity framework, deeply integrated with Google's "Workspace suite." Product VP Josh Woodward took the stage to demonstrate two scenarios that drove the audience wild.

  • The first is a work scenario: Input an instruction, "Draft an email for the team summarizing all information from the past week about the Gemini Live launch." Spark automatically pulls information across Gmail, Docs, and chat logs, and also invokes a "ghostwriter" skill Woodward wrote himself, making the email automatically match his personal tone. The entire process is done in the background; a human only needs to review and send. Yes, Spark supports custom skills, allowing it to learn your voice, your preferences, and your work style.

  • The second is a life scenario: Planning a neighborhood block party. Upon receiving the task, Spark executes step by step. It creates an RSVP tracking sheet in Google Sheets, directly linked to Gmail, updating automatically as people reply. For neighbors who haven't signed up, Spark automatically drafts reminder emails, creating drafts for confirmation before sending. Then, it also generates a promotional deck in Google Slides, even including information about placing an inflatable castle in the neighborhood. The entire process didn't involve opening a single app.

Moreover, Spark possesses powerful voice input capabilities. Live on stage, Woodward pulled out his phone and directly issued three tasks via voice: "Find all meetings with Sundar and mark them bright pink," "Write an invitation for new neighbor John to join the block party list," "Create a doc listing things to do for the kids before the school year ends, sorted by deadline."

The voice directly converted into text instructions, and Spark automatically split the continuous voice input into three independent task threads, executing them in parallel in the background.

Regarding pricing, the $100/month AI Ultra subscription provides access to the Spark Beta. The highest-tier Ultra plan has been reduced from $250 to $200. Spark will be available as a Beta next week, initially for U.S. AI Ultra subscribers.

Tonight, Google Unveils the Gateway to ASI

Looking back at this I/O, what's truly chilling isn't any single product. It's that all these capabilities arrived simultaneously.

Full multimodal understanding, full multimodal generation, and 24/7 online Agents—these three puzzle pieces were all put in place by Google in one night. Omni turns a sentence into a world without humans providing any assets; 93 agents create an operating system from scratch without humans writing a single line of code; Spark works for you 24/7 without humans opening an app.

When AI no longer needs humans to "feed it," but understands, decides, executes, and iterates on its own—the end of this road is called ASI (Artificial Superintelligence).

No one can give a definitive timeline. But tonight's Google I/O made everyone realize one thing: On the path to superintelligence, the obstacle of "technically impossible" no longer exists. What remains is merely the speed of engineering deployment. Half a year ago, we were debating whether AGI was a bubble. Half a year later, Google is already writing operating systems with agents. The acceleration in this industry has already surpassed what human intuition can perceive.

References:

  • https://youtu.be/wYSncx9zLIU

  • https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/

  • https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/

  • https://antigravity.google/blog/introducing-google-antigravity-2-0

  • https://antigravity.google/blog/google-io-2026-feature-deep-dive

Edited by: Peach Moses

Related Questions

Q: What are the three major announcements made at Google I/O 2026 according to the article?

A: The three major announcements at Google I/O 2026 were: 1) The debut of Gemini Omni, a truly "omni" model capable of video output from any input. 2) The launch of the new flagship model Gemini 3.5 Flash, which significantly outperforms its predecessor. 3) The introduction of the personal AI agent Gemini Spark, which runs 24/7 in the cloud.

Q: What is the core capability of Gemini Omni as described in the article?

A: Gemini Omni is a truly "omni" AI creation engine. It can take any combination of inputs (images, audio, video, text) and generate high-quality, meaningful videos. Key features include its understanding of the physical world, conversational video editing, and character and scene consistency across edits.

Q: How does the Gemini 3.5 Flash model compare to the Gemini 3.1 Pro according to Google's presentation?

A: According to Google CEO Sundar Pichai, Gemini 3.5 Flash outperforms the previous flagship model, Gemini 3.1 Pro, in almost all benchmark tests. The article describes it as a massive leap forward in areas like coding, real-world agent tasks, and multimodal understanding, while also being over 4 times faster than competing models like GPT-5.5 and Claude Opus 4.7.

Q: What impressive feat did the upgraded Antigravity 2.0 platform with Gemini 3.5 Flash accomplish in the demonstration?

A: In a demonstration, the Antigravity 2.0 platform, powered by Gemini 3.5 Flash, coordinated 93 sub-agents to build a fully functional operating system kernel from scratch in just 12 hours. The agents made over 15,000 model calls and processed 2.6 billion tokens to create components like a scheduler, memory manager, and file system, and later successfully ran the classic game DOOM on this AI-built OS after a fix.

Q: What is the primary function of Gemini Spark, and what makes it unique?

A: Gemini Spark is a personal AI agent designed to perform tasks autonomously on behalf of the user. Its primary function is to act as a 24/7 personal assistant that runs continuously on a dedicated cloud VM. It is unique because it can operate even when the user's device is off, integrate deeply with Google Workspace apps to perform complex, multi-step workflows, and execute in parallel multiple tasks parsed from a single voice command.
