Gemini 3.5 is Here! Tonight, Google Overtakes Google

链捕手 (ChainCatcher) · Published 2026-05-20 · Last updated 2026-05-20

Summary

Gemini 3.5 Launches: Google Renders Itself Obsolete at I/O 2026

At Google I/O 2026, the company unveiled a sweeping set of AI advances headlined by three major releases. First, **Gemini Omni**, a true "omnimodal" model, can generate high-quality, coherent videos from any combination of text, image, audio, or video inputs, maintaining character consistency and physical logic across iterative edits. Second, the new flagship **Gemini 3.5 Flash** decisively outperforms the previous Gemini 3.1 Pro on key benchmarks for coding, agent tasks, and multimodal reasoning, and it is significantly faster than competing models. It powers the upgraded **Antigravity 2.0**, a standalone agent development platform that orchestrated 93 sub-agents to build a functional operating system kernel from scratch in 12 hours. Third, **Gemini Spark** debuted as a personal, always-on AI agent. Running 24/7 in the cloud and integrated with Google Workspace, it can autonomously carry out complex multi-step tasks such as drafting emails, managing schedules, and planning events across apps like Gmail, Docs, and Sheets. Together, these releases mark a significant leap. They move AI beyond simple generation toward autonomous understanding, decision-making, and task execution, and they signal rapid progress toward more advanced AI systems.

Author: XinZhiYuan

 

Google I/O 2026 goes all out!

Just now, Sundar Pichai and Demis Hassabis took the stage together and unveiled, in one go, the major releases Google had been saving up for half a year.

As expected, the biggest star of the night, Gemini Omni, officially debuted!

As a truly "omni" model, Omni accepts any form of input and can generate any kind of content. It launches with video output, which makes it the "video version of Nano Banana", Google's image generation model.

Another highlight of the night belongs to Gemini 3.5 Flash.

On almost every benchmark, 3.5 Flash decisively beats the previous flagship, Gemini 3.1 Pro. Its output speed has also doubled, and it runs over 4 times faster than GPT-5.5 and Claude Opus 4.7. The more powerful 3.5 Pro will be released next month.

A slew of other major products was also unveiled:

  • Antigravity 2.0: a brand-new standalone desktop application, evolving from an IDE into an agent development platform.

  • Gemini Spark: a personal AI agent that runs 24/7 in the cloud.

  • Gemini App redesign: code-named "Neural Expressive," with a switch to compute-based billing.

  • AI Ultra subscription: a new $100 tier; the top tier drops from $250 to $200.

  • Google Search's biggest upgrade in 25 years: powered by 3.5 Flash, with an intelligent search box, automatic mini-app generation, and more.

    ......

Without exaggeration, the density of substantive announcements at this I/O is the highest in years.

Gemini Omni Debut: The Birth of an 'Omni' AI

As hinted by the teaser video, the highly anticipated Gemini Omni has finally arrived. Hassabis personally took the stage to announce, "We are taking the next important step—Gemini Omni, a new model that can create content from any input."

That billing says it all. This time, Google is building an "omni" AI creation engine that combines Gemini's intelligence with its strongest generative AI, pushing world understanding, multimodality, and editing as far as they go. Put simply, given any combination of images, audio, video, and text, it can generate a high-quality video, and you can then edit that video through conversation.

More importantly, Omni doesn't just make things look right; it truly understands the physical world. Hassabis said, "Previous systems often stumbled when simulating concepts like gravity and momentum, but Omni achieves a 'step change.'" It brings Gemini's "world knowledge" and "reasoning ability" into video generation.

  • Given the prompt "Explain protein folding using clay animation," the generated video accurately depicts amino acid chains folding into α-helices and β-sheets at every step, visually presented as exquisite stop-motion animation.

  • Another example: matching an object to each of the 26 letters of the English alphabet, such as C for Capybara, D for Disco Ball, and L for Lava Lamp. Omni isn't just pasting in assets; it genuinely connects language, images, and meaning.

Make no mistake: the leap from realistic to meaningful is enormous.

On stage, Hassabis pulled out a selfie video and began editing it live, turning an evening street stroll into a cyberpunk scene. One sentence rewrites the scene; another changes the world. Anything can become a canvas for creating new realities, from conjuring fire in your palm in a selfie to turning a circle drawn on paper into a black hole.

Nor is this a one-shot generation. You can keep the conversation going, and across turns Gemini Omni keeps characters consistent, physical logic intact, and scene memory coherent.

  • Start from an original performance clip. In round two, the prompt "Teleport the violinist into the environment of this picture" is paired with a reference image of snowy mountains and meadows. The scene switches instantly, and the performer's movements and lighting adapt fully to the new environment.

  • Round three: "Cut the shot to behind the violinist's shoulder." The camera angle rotates, but the performance and music remain fully continuous.

No matter how the scene changes, the main subjects in the video stay intact.

What's even more thought-provoking is Omni's input flexibility. Images, text, video, audio—any references can be mixed as input to generate a coherent output. You can even create your own avatar, allowing an AI version of you to appear in any scene, speaking with your voice and doing things you haven't done.

Omni Flash is available now, with API access opening in the coming weeks, and the more powerful Omni Pro is on the way. Leveraging Google's ecosystem, Omni ships integrated with the Gemini App, Google Flow, and YouTube Shorts, where it is even free for Shorts users.

Flash Overtakes Pro: 3.5 Redefines 'Flagship'

Following Gemini Omni, the other major highlight of this I/O was the new flagship, Gemini 3.5 Flash, which Google calls its strongest coding and agent model to date.

On stage, Pichai personally announced, "3.5 Flash outperforms Gemini 3.1 Pro across virtually all benchmarks!" Remember, 3.1 Pro was the flagship model Google launched just three months ago. Now, a Flash-tier model is crushing it.

Few expected Google to deliver results this strong so quickly:

  • Terminal-Bench 2.1 (Coding): 76.2%

  • GDPval-AA (Real-world Agent Tasks): 1656 Elo

  • MCP Atlas (Large-scale Tool Usage): 83.6%

  • CharXiv Reasoning (Multimodal Understanding): 84.2%

On all four benchmarks above, 3.5 Flash is a major leap over Gemini 3.1 Pro. On speed, 3.5 Flash sits in a quadrant of its own at 289 tokens/second, over 4 times faster than other frontier models. It also matches or even surpasses GPT-5.5 and Claude Opus 4.7 on some benchmarks. In short, 3.5 Flash is both fast and powerful, with virtually no rivals.
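The following worked calculation uses only the quoted figures to make the speed claim concrete. It assumes that "over 4 times faster" compares steady-state output tokens per second. It also treats generation as a constant rate and ignores time-to-first-token. The 2,000-token response length is an illustrative choice, not a number from the presentation.

```python
# Assumption: "over 4 times faster" compares steady-state output tokens/second.
FLASH_TPS = 289   # 3.5 Flash output speed quoted at I/O (tokens/second)
SPEEDUP = 4       # "over 4 times faster than other frontier models"

# If Flash is more than 4x faster, the competing models run below this rate.
competitor_ceiling_tps = FLASH_TPS / SPEEDUP


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` at a constant rate; ignores time-to-first-token."""
    return tokens / tokens_per_second


RESPONSE_TOKENS = 2_000  # illustrative response length, not from the article
flash_seconds = seconds_to_generate(RESPONSE_TOKENS, FLASH_TPS)
competitor_seconds = seconds_to_generate(RESPONSE_TOKENS, competitor_ceiling_tps)

print(f"competitors: below {competitor_ceiling_tps:.2f} tokens/s")
print(f"{RESPONSE_TOKENS} tokens: Flash {flash_seconds:.1f} s, "
      f"competitors over {competitor_seconds:.1f} s")
```

Under these assumptions, a 2,000-token answer streams in about 7 seconds on 3.5 Flash. A model running below roughly 72 tokens/s would need more than 27 seconds.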

Numbers are abstract, so consider the live demos. In moments, 3.5 Flash digested an abstruse academic paper and turned it into a fully interactive, visual website. On agent tasks, working through Antigravity, it completed multi-step workflows such as automatically sorting and naming a sprawling set of assets. In another demo, two agents reproduced the AlphaZero paper in just six hours and coded a fully playable game.

93 Agents Build an OS in Just 12 Hours

These 3.5 Flash capabilities are inseparable from the new Antigravity 2.0. Google's agent development platform, Antigravity, was upgraded to version 2.0 today, evolving from an IDE into a standalone desktop application built around an agent-first design.

Varun took the stage and gave a demo that left the audience breathless. He tasked Antigravity, powered by 3.5 Flash, with building an operating system from scratch. 93 sub-agents worked in parallel, making over 15,000 model calls and processing 2.6 billion tokens. Twelve hours later, a completely blank project had become a fully functional OS kernel with a scheduler, memory management, and a file system. Every line of code was written, tested, and audited by agents, and the API cost came in under $1,000.
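Simple division gives a sanity check on the demo's figures. The sketch below treats "over 15,000" calls as a lower bound and "under $1,000" as an upper bound, so the derived per-call and per-token numbers are bounds rather than exact values. The article does not split input tokens from output tokens, so the price works out as a blended rate.

```python
SUB_AGENTS = 93
MODEL_CALLS = 15_000     # "over 15,000": treated as a lower bound
TOTAL_TOKENS = 2.6e9     # 2.6 billion tokens processed (input + output, not split)
WALL_CLOCK_HOURS = 12
API_COST_USD = 1_000     # "under $1,000": treated as an upper bound

# Calls is a lower bound, so this is an upper bound on average tokens per call.
max_tokens_per_call = TOTAL_TOKENS / MODEL_CALLS
# Likewise, each sub-agent made at least this many calls on average.
avg_calls_per_agent = MODEL_CALLS / SUB_AGENTS
tokens_per_hour = TOTAL_TOKENS / WALL_CLOCK_HOURS
# Cost is an upper bound, so the blended price per million tokens is too.
max_usd_per_million_tokens = API_COST_USD / (TOTAL_TOKENS / 1e6)

print(f"tokens per call:     <= {max_tokens_per_call:,.0f}")
print(f"calls per sub-agent: >= {avg_calls_per_agent:.0f}")
print(f"tokens per hour:        {tokens_per_hour:,.0f}")
print(f"USD per 1M tokens:   <  {max_usd_per_million_tokens:.3f}")
```

On average, each call handled at most about 173,000 tokens and each sub-agent made at least about 161 calls. The blended cost came to under $0.39 per million tokens.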

He then tried to run DOOM on the AI-written operating system. The first attempt failed because the OS lacked video and keyboard drivers, so he typed a fix request into Antigravity 2.0 and the agents began writing the driver code on their own. Moments later, the DOOM screen appeared, and the venue erupted.

To summarize, Antigravity 2.0's core upgrades include:

  • Dynamically generated sub-agents: the main agent splits a task into subtasks and hands them to sub-agents, which run in parallel without interfering with one another.

  • Asynchronous task management prevents long-running operations from blocking the main thread.

  • Scheduled Tasks allow setting "timed tasks" for agents to execute automatically, like checking PR status once a day or running a health check script every hour.

  • New slash commands: /goal lets the agent run to completion; /grill-me makes the agent clarify requirements before acting; /browser explicitly controls browser usage.
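The sub-agent fan-out and asynchronous task management listed above follow a familiar concurrency pattern, shown here with Python's asyncio. This is not Antigravity's API. `run_subagent` and `split_goal` are hypothetical stand-ins: the first sleeps instead of calling a model, and the second returns a hard-coded plan instead of asking a model for one.

```python
import asyncio
import time


async def run_subagent(name: str, subtask: str, seconds: float) -> str:
    """Hypothetical sub-agent: sleeps to simulate work instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{name} finished: {subtask}"


def split_goal(goal: str) -> list[str]:
    """Hypothetical planner: a real main agent would ask a model to decompose the goal."""
    return [f"{goal}: {part}" for part in ("scheduler", "memory manager", "file system")]


async def main_agent(goal: str) -> list[str]:
    subtasks = split_goal(goal)
    # Fan out: one sub-agent coroutine per subtask, all scheduled concurrently.
    jobs = [
        run_subagent(f"sub-agent-{i}", task, seconds=0.2)
        for i, task in enumerate(subtasks, start=1)
    ]
    # Fan in: gather waits for all of them and keeps results in submission order.
    return list(await asyncio.gather(*jobs))


async def demo() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Async task management: launch the long job as a Task so this coroutine
    # is not blocked while the sub-agents run.
    background = asyncio.create_task(main_agent("build OS kernel"))
    print("main loop still responsive while sub-agents work")
    results = await background
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(demo())
for line in results:
    print(line)
print(f"elapsed: {elapsed:.2f}s")
```

The three sub-agents run concurrently, so the total time is close to one sub-agent's duration rather than three. A real orchestrator would also have to handle failures and reconcile conflicting results, which this sketch leaves out.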

These capabilities have already been proven inside Google. Internal Antigravity usage processed 500 billion tokens per day in March and is now running at 3 trillion per day. Moreover, a 12x-accelerated version of Flash is available in Antigravity starting today.
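The two internal usage figures imply the following growth and average throughput. This is plain arithmetic on the quoted daily totals. It says nothing about the separate "12x" figure, which the article does not break down.

```python
MARCH_TOKENS_PER_DAY = 500e9   # internal Antigravity usage in March
NOW_TOKENS_PER_DAY = 3e12      # internal usage at the time of I/O
SECONDS_PER_DAY = 24 * 60 * 60

# Ratio of today's daily volume to March's.
growth = NOW_TOKENS_PER_DAY / MARCH_TOKENS_PER_DAY
# Average rate if the daily volume were spread evenly over 24 hours.
tokens_per_second_now = NOW_TOKENS_PER_DAY / SECONDS_PER_DAY

print(f"growth since March: {growth:.1f}x")
print(f"average rate now: {tokens_per_second_now:,.0f} tokens/s")
```

That works out to a 6x increase since March, or roughly 35 million tokens per second averaged over a day.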

3.5 Flash is now the default model in both the Gemini App and AI Mode in Google Search, available to all users worldwide. Developers can access it via Antigravity 2.0, the Gemini API, and Google AI Studio, and enterprise users can onboard via the Gemini Enterprise Agent Platform. And 3.5 Pro, now in internal testing, will be released next month.

24/7 Personal Assistant: Gemini Spark Finally Arrives

The third major announcement tonight is Gemini Spark. Pichai positioned it clearly as your personal AI agent, one that keeps working even after you close your laptop. It runs on a dedicated virtual machine in the cloud, so it is available 24/7.

Gemini Spark is powered by Gemini 3.5 and the Antigravity framework, and it is deeply integrated with Google's Workspace suite. Product VP Josh Woodward took the stage to demonstrate two scenarios that drove the audience wild.

  • The first is a work scenario. Given the instruction "Draft an email for the team summarizing all information from the past week about the Gemini Live launch," Spark automatically pulls information from Gmail, Docs, and chat logs. It also invokes a "ghostwriter" skill Woodward wrote himself, so the email matches his personal tone. Everything happens in the background; a human only needs to review and send. In other words, Spark supports custom skills, which let it learn your voice, your preferences, and your work style.

  • The second is a life scenario: planning a neighborhood block party. Spark works through the task step by step. It creates an RSVP tracking sheet in Google Sheets, linked directly to Gmail, that updates automatically as people reply. For neighbors who haven't signed up, it drafts reminder emails and holds them for confirmation before sending. It then generates a promotional deck in Google Slides, even including details about placing an inflatable castle in the neighborhood. At no point did anyone need to open an app.
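The RSVP workflow in the block-party demo reduces to a simple pattern. The agent keeps one row per invitee, updates that row as replies arrive, and drafts reminders (without sending them) for everyone still pending. The sketch below models the pattern in memory. `RsvpTracker` and its fields are hypothetical, and the sketch does not use the Google Sheets or Gmail APIs, which a real agent would call instead.

```python
from dataclasses import dataclass, field


@dataclass
class RsvpTracker:
    """Hypothetical in-memory stand-in for the RSVP sheet Spark builds in Google Sheets."""
    invitees: dict[str, str]                                  # name -> email
    responses: dict[str, str] = field(default_factory=dict)  # name -> "yes"/"no"

    def record_reply(self, name: str, answer: str) -> None:
        # Mirrors "updating automatically as people reply": each reply updates one row.
        if name not in self.invitees:
            raise KeyError(f"{name} is not on the invite list")
        self.responses[name] = answer.strip().lower()

    def rows(self) -> list[tuple[str, str]]:
        # Sheet view: one row per invitee, "pending" until they reply.
        return [(name, self.responses.get(name, "pending")) for name in self.invitees]

    def reminder_drafts(self) -> list[dict[str, str]]:
        # Drafts only: nothing is sent until a human confirms, as in the demo.
        return [
            {
                "to": email,
                "subject": "Block party RSVP reminder",
                "body": f"Hi {name}, are you coming to the block party?",
                "status": "draft",
            }
            for name, email in self.invitees.items()
            if name not in self.responses
        ]


tracker = RsvpTracker(invitees={"Ana": "ana@example.com",
                                "Ben": "ben@example.com",
                                "Cy": "cy@example.com"})
tracker.record_reply("Ana", "Yes")
tracker.record_reply("Cy", "no")
print(tracker.rows())
print([d["to"] for d in tracker.reminder_drafts()])
```

Keeping reminders in a `draft` state mirrors the demo's human-in-the-loop step: the agent prepares the emails and a person confirms them.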

Spark also has powerful voice input. Live on stage, Woodward pulled out his phone and issued three tasks by voice: "Find all meetings with Sundar and mark them bright pink," "Write an invitation for new neighbor John to join the block party list," and "Create a doc listing things to do for the kids before the school year ends, sorted by deadline."

The speech was transcribed directly into text instructions. Spark then automatically split the continuous voice input into three independent task threads and executed them in parallel in the background.

On pricing, the new $100/month AI Ultra tier includes access to the Spark Beta, and the top Ultra tier drops from $250 to $200. Spark will be available as a beta next week, initially for AI Ultra subscribers in the U.S.

Tonight, Google Unveils the Gateway to ASI

Looking back at this I/O, what's truly chilling isn't any single product. It's that all these capabilities arrived simultaneously.

Full multimodal understanding, full multimodal generation, and 24/7 online Agents—these three puzzle pieces were all put in place by Google in one night. Omni turns a sentence into a world without humans providing any assets; 93 agents create an operating system from scratch without humans writing a single line of code; Spark works for you 24/7 without humans opening an app.

When AI no longer needs humans to "feed it," but understands, decides, executes, and iterates on its own—the end of this road is called ASI (Artificial Superintelligence).

No one can give a definitive timeline. But tonight's Google I/O made everyone realize one thing: On the path to superintelligence, the obstacle of "technically impossible" no longer exists. What remains is merely the speed of engineering deployment. Half a year ago, we were debating whether AGI was a bubble. Half a year later, Google is already writing operating systems with agents. The acceleration in this industry has already surpassed what human intuition can perceive.

References:

  • https://youtu.be/wYSncx9zLIU

  • https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/

  • https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/

  • https://antigravity.google/blog/introducing-google-antigravity-2-0

  • https://antigravity.google/blog/google-io-2026-feature-deep-dive

Edited by: Peach Moses

Related Questions

Q: What are the three major announcements made at Google I/O 2026, according to the article?

A: The three major announcements were: 1) the debut of Gemini Omni, a truly "omni" model that can generate video from any input; 2) the launch of the new flagship model Gemini 3.5 Flash, which significantly outperforms its predecessor; and 3) the introduction of Gemini Spark, a personal AI agent that runs 24/7 in the cloud.

Q: What is the core capability of Gemini Omni, as described in the article?

A: Gemini Omni is a truly "omni" AI creation engine. It accepts any combination of inputs (images, audio, video, text) and generates high-quality, meaningful video. Key features include its understanding of the physical world, conversational video editing, and consistent characters and scenes across edits.

Q: How does Gemini 3.5 Flash compare to Gemini 3.1 Pro, according to Google's presentation?

A: According to Google CEO Sundar Pichai, Gemini 3.5 Flash outperforms the previous flagship, Gemini 3.1 Pro, on almost all benchmarks. It is described as a generational leap in coding, real-world agent tasks, and multimodal understanding, and it is over 4 times faster than competing models such as GPT-5.5 and Claude Opus 4.7.

Q: What impressive feat did the upgraded Antigravity 2.0 platform with Gemini 3.5 Flash accomplish in the demonstration?

A: In the demonstration, Antigravity 2.0, powered by Gemini 3.5 Flash, coordinated 93 sub-agents to build a fully functional operating system kernel from scratch in 12 hours. Across more than 15,000 model calls and 2.6 billion processed tokens, the agents wrote components such as the scheduler, memory manager, and file system. After a follow-up fix that added video and keyboard drivers, the AI-built OS successfully ran the classic game DOOM.

Q: What is the primary function of Gemini Spark, and what makes it unique?

A: Gemini Spark is a personal AI agent that performs tasks autonomously on the user's behalf. It acts as a 24/7 personal assistant that runs continuously on a dedicated cloud VM. It stands out because it keeps working even when the user's device is off. It also integrates deeply with Google Workspace apps to handle complex multi-step workflows, and it can execute several tasks parsed from a single voice command in parallel.

Related Articles

Alibaba 'Stocks Up', ByteDance 'Trains'

In late May, two closely timed events in China's AI industry revealed the diverging strategies of two tech giants, Alibaba and ByteDance. Alibaba is aggressively integrating AI into its existing commercial ecosystem and prioritizing near-term monetization. Its Qwen App now fully integrates with Taobao, drawing on the platform's 4-billion-item database for AI shopping features such as virtual try-on and price comparison. Internally, Alibaba has reorganized to reward AI-driven business growth, notably through the 'Agentic Commerce Trust Protocol', which enables transactions by AI agents. Financially, it emphasizes ROI, with CEO Eddie Wu stating that every AI chip it buys is generating revenue. Alibaba's strategy bets that foundational model capabilities will not be leapfrogged in the next five years, which would allow its 'AI-as-a-utility' approach to succeed. ByteDance's Seed division, by contrast, pushes the frontier of AGI with a long-term, research-oriented mindset. Its video generation model, Seedance 2.0, topped international benchmarks. The division, led by researcher Wu Yonghui and product head Zhu Wenjia, is tasked with 'exploring the upper limits of intelligence' and is even considering open-sourcing its models, a rare move among Chinese firms. ByteDance is investing heavily: its 2026 capital expenditure plan is reportedly nearly triple that of 2024, funded by its substantial private profits. This lets it pursue projects such as an eight-month research effort questioning whether video models are true 'world models', free of immediate commercial pressure. The core divergence is less about corporate philosophy than about structural constraints. As a publicly traded company, Alibaba answers to quarterly financial expectations, which forces a pragmatic, revenue-focused approach to AI. As a private company, ByteDance can afford long-term, high-risk foundational research without answering to public markets.

The article concludes that IPO status is the true determinant of a Chinese company's AI path: if ByteDance were public, or Alibaba private, their strategies might well be reversed.

marsbit · 1 hour ago


Why Don't More AI Agents Mean Higher Productivity?

Editor's Note: As AI Agents become cheaper and easier to use, a new constraint emerges: the cost isn't in launching more Agents, but in the human attention required to manage, judge, and integrate their outputs. This hidden cost is called the "orchestration tax." The article argues that a developer's cognitive bandwidth is the key bottleneck—a serial, non-parallelizable resource akin to a Global Interpreter Lock (GIL). While many Agents can run concurrently, their results ultimately require human judgment for review, conflict resolution, and final integration. Therefore, more Agents don't automatically mean higher productivity; they can simply create longer queues, lead to cognitive fatigue, and create the illusion of busyness without real output. The core solution is to design workflows around this scarce human attention. Key strategies include: scaling the number of Agents to match review capacity (not UI capacity), categorizing tasks (delegating independent ones, keeping complex judgment-heavy ones serial), batch reviewing results to minimize context-switching costs, automating verifiable checks to reserve human judgment for critical decisions, and protecting focused, uninterrupted thinking time. Ultimately, the critical skill is not launching many Agents, but architecting systems that respect the fundamental limit of human attention. Unpaid "orchestration tax" accumulates as both technical and cognitive debt, undermining system understanding and quality. True productivity comes from thoughtfully managing the single-threaded resource—your focus.

marsbit · 2 hours ago


Three Years Later: Looking Back at My Predictions About ChatGPT in 2023

In March 2023, shortly after ChatGPT's launch, I made 20 predictions about its future. Now, in mid-2026, I've used AI agents to fact-check each one against the latest data. Overall, most of the major directional forecasts were correct, with only one outright error (stating that GPT-4 had 100 trillion parameters). Key successes included three predictions. RAG and retrieval architectures would become the standard way to handle knowledge and hallucinations. Natural language user interfaces (LUI) would create a massive new industry layer beyond the models themselves. And China would develop viable large language models, significantly closing the performance gap with Western counterparts within about three years. Predictions about the absence of mass unemployment, the rise of a new "robot network" for agent communication, and ChatGPT not possessing consciousness also held up in their core arguments. However, the devil was in the details. Errors frequently involved specific numbers or timelines, or overlooked distributional effects. I tended to overestimate the speed of adoption (e.g., of agent networks) while underestimating the ultimate scale of capabilities or costs (e.g., AI winning IMO gold without tools, or the extreme capital required for frontier models). There were three other misjudgments. I underestimated how AI would reinforce, rather than dissolve, information filter bubbles. I wrongly assumed AI-generated content would easily sidestep copyright (it has instead triggered record-breaking settlements). And I misidentified where value would be captured (it accrued overwhelmingly to the compute layer, such as Nvidia, rather than just the application or model layers). Key lessons from reviewing these predictions: 1) Directional and mechanistic insights are far more reliable than precise numbers or absolute statements. 2) There is a consistent bias to overestimate short-term speed but underestimate long-term magnitude. 3) Errors often lie in missing distributional impacts within a generally correct aggregate trend. 4) Predictions phrased with nuance and caveats aged best. 5) Some fundamental debates (e.g., on machine consciousness or the ultimate value chain) remain unresolved even after three years. This exercise is less about scoring the past than about establishing rules for thinking more clearly about the next three years of AI.

marsbit · 9 hours ago

