Its coding models may still be a mess, but Google really does have a knack for "multimodality".
The Gemini Omni Flash API is now officially open, bringing a video edition of Nano Banana.
Magical remakes of "Harry Potter" are no longer a dream. Just watch these four digital magic tricks performed by Gemini Omni:
It's insane. This level of consistency and text clarity makes green screens and special effects almost obsolete—just go live as Doctor Strange.
Meanwhile, the beloved "Banana" has welcomed a "lightspeed edition".
Nano Banana 2 Lite: The fastest, most cost-effective Gemini image model to date.
No exaggeration: it takes just 4 seconds to generate one image, and a 1K-resolution image costs about $0.034.

Compared side-by-side with Nano Banana 2, this speed is practically taking off.
Not to mention GPT Image 2, which takes 3 minutes for a single image generation...
No wonder Gemini 3.5 Pro hasn't been released yet—they probably spent all their time on their beloved multimodality, right, Hassabis!!
Gemini Omni Flash
First unveiled at Google I/O 2026, Gemini Omni Flash deeply integrates Gemini's multimodal reasoning capabilities with video generation and editing, and it drew significant attention at the time.
Now, this model is officially available to developers via the Gemini API and Google AI Studio. It can easily generate and edit high-quality videos based on various inputs like text, images, and video.
Four key capabilities:
Conversational Video Editing: Modify and refine videos using natural language, just like editing a Lark document.
Multimodal Reference: Combine image, text, and video inputs to maintain scene control and consistency.
Real-World Knowledge: Leverage Gemini's knowledge in history, biology, narrative logic, etc., to construct videos, saving you from writing three pages of prompts to describe architectural styles.
Text and Action Synchronization: Connect text and graphics directly to video actions through simple prompts.

The pricing is also very competitive: $0.10 per second of video output, on par with Veo 3.1 Fast.
In terms of positioning, Omni Flash is likewise a lightweight video generation model, but it emphasizes Gemini's world knowledge and is fully aligned with the Gemini ecosystem.
But Google is also quite candid, proactively listing a bunch of current limitations:
1. Currently only supports 10-second video generation; longer support will come later.
2. Does not yet support audio reference uploads or scene expansion.
3. The API supports video reference uploads up to 3 seconds, but the model currently cannot correctly process such inputs.
4. There are still limitations in character consistency during scene changes and camera movements.
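To make the pricing concrete, here is a minimal cost sketch in Python. It uses only the two figures quoted above ($0.10 per second of output and the 10-second limit). The function names are hypothetical, and treating 10 seconds as an upper bound on clip length is an assumption, not something Google's announcement spells out.

```python
from decimal import Decimal

# Figures quoted in the article; treat them as subject to change.
PRICE_PER_SECOND_USD = Decimal("0.10")  # Gemini Omni Flash video output
MAX_CLIP_SECONDS = 10                   # assumed upper bound per generation


def clip_cost_usd(seconds: int) -> Decimal:
    """Return the estimated output cost of one generated clip."""
    if not 1 <= seconds <= MAX_CLIP_SECONDS:
        raise ValueError(f"clip length must be 1..{MAX_CLIP_SECONDS} seconds")
    return PRICE_PER_SECOND_USD * seconds


def campaign_cost_usd(clips: int, seconds_each: int) -> Decimal:
    """Return the estimated cost of several clips of equal length."""
    return clip_cost_usd(seconds_each) * clips


if __name__ == "__main__":
    print(clip_cost_usd(10))         # one maximum-length clip
    print(campaign_cost_usd(50, 8))  # fifty 8-second clips
```

Decimal is used instead of float so that per-second prices add up without rounding noise.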
Nano Banana 2 Lite
Nano Banana 2 Lite (also known as gemini-3.1-flash-lite-image) is designed specifically for high-speed processing.
Through targeted optimization, it aims at real-time application scenarios that are extremely sensitive to latency and require processing large volumes of images in a short time—such as bulk generation of e-commerce materials, rapid iteration of ad creatives, and automated content pipelines.
Two core selling points—
Lightspeed: Image generation latency is about 4 seconds, one-fifth of Nano Banana 2's (which is about 20 seconds).
Dirt Cheap: A 1K image costs about $0.034, half the price of Nano Banana 2 and one-quarter of Nano Banana Pro.
Latency and price have been cut, but image generation and editing capabilities haven't noticeably shrunk. Nano Banana 2 Lite still renders text very well, benchmarking on par with models like Grok.
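The latency and price figures above are easier to compare at batch scale. The Python sketch below is a rough estimate only: the Nano Banana 2 and Nano Banana Pro prices are derived from the "half" and "one-quarter" ratios quoted in this article rather than from an official price list, the generation time assumes images are produced one after another, and the helper name batch_estimate is hypothetical.

```python
from decimal import Decimal
from typing import Optional, Tuple

LITE_PRICE_USD = Decimal("0.034")  # per 1K image, as quoted in the article

# Prices for the other two models are derived from the "half" and
# "one-quarter" ratios in the article, not taken from a price list.
PRICES_USD = {
    "Nano Banana 2 Lite": LITE_PRICE_USD,
    "Nano Banana 2": LITE_PRICE_USD * 2,
    "Nano Banana Pro": LITE_PRICE_USD * 4,
}

# Approximate seconds per image; the article gives no figure for Pro.
LATENCY_S = {"Nano Banana 2 Lite": 4, "Nano Banana 2": 20}


def batch_estimate(model: str, images: int) -> Tuple[Decimal, Optional[int]]:
    """Return (total cost in USD, sequential generation time in seconds)."""
    cost = PRICES_USD[model] * images
    latency = LATENCY_S.get(model)
    seconds = latency * images if latency is not None else None
    return cost, seconds


if __name__ == "__main__":
    for name in PRICES_USD:
        cost, seconds = batch_estimate(name, 1000)
        print(f"{name}: ${cost} for 1000 images, sequential seconds: {seconds}")
```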

Therefore, Google's suggestion is: If you're still cheaping out with the first-gen Nano Banana, swap it now. The Lite version already comprehensively outperforms it in all key metrics.
Twin Blades United
Wait, hold on.
You might think this is just the parallel release of two models, but Google indicates there's more.
The real magic lies in chaining these models together.
As we all know, AIGC creation requires repeated iteration, and asset management can be quite troublesome.
Now, with these two models, you no longer need to repeatedly upload files—image generation and video creation are seamlessly connected.
Specifically, you can first use Nano Banana 2 Lite to generate images at high speed, then feed the generated images as reference material to Gemini Omni Flash to transform them into videos with one click.
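The image-then-video chain described above can be sketched in Python as follows. This is only an outline of the data flow, not Google's implementation: the article does not show the real API calls, so the two generator signatures, the Clip type, and the stand-in fake_image and fake_video functions are all hypothetical. In a real project, those two callables would wrap whatever calls the Gemini API actually exposes for each model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical signatures: the article does not show the real API calls.
ImageGenerator = Callable[[str], bytes]         # prompt -> image bytes
VideoGenerator = Callable[[bytes, str], bytes]  # reference image, prompt -> video bytes


@dataclass
class Clip:
    scene_prompt: str
    image: bytes
    video: bytes


def image_to_video_pipeline(
    scene_prompts: Sequence[str],
    motion_prompt: str,
    generate_image: ImageGenerator,
    generate_video: VideoGenerator,
) -> List[Clip]:
    """Generate one image per scene, then pass each image on as a video reference."""
    clips = []
    for prompt in scene_prompts:
        image = generate_image(prompt)                # role played by Nano Banana 2 Lite
        video = generate_video(image, motion_prompt)  # role played by Gemini Omni Flash
        clips.append(Clip(prompt, image, video))
    return clips


# Stand-in generators so the sketch runs offline without any API key.
def fake_image(prompt: str) -> bytes:
    return f"IMG[{prompt}]".encode()


def fake_video(image: bytes, prompt: str) -> bytes:
    return b"VID[" + image + b"|" + prompt.encode() + b"]"


if __name__ == "__main__":
    scenes = ["desk lamp on a beach", "desk lamp in a loft"]
    for clip in image_to_video_pipeline(scenes, "slow pan", fake_image, fake_video):
        print(clip.scene_prompt, len(clip.video))
```

Passing the generators in as parameters keeps the image bytes in memory between the two steps, which mirrors the "no repeated uploads" point, and lets the flow be tested without network access.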
To showcase this magical 1+1>2 workflow, Google even created 3 Demo APPs:
1. Anywhere
Take a selfie or upload a photo, and NB2 Lite instantly Photoshops you into dozens of landmark scenes.
Then click on the image, and Omni Flash turns the static scene into a dynamic short video.
Cyber tourism, now also end-to-end.
2. Space Lift
This is a bit scary. Combined with the Genie world model in the future, it might threaten many traditional interior design SaaS companies.
Upload a photo of a room. NB2 Lite first generates various interior design styles. Find one you like, click the video button, and Omni can directly create a cinematic space walkthrough for you.
3. Omni product studio
A boon for cross-border e-commerce.
Take a white-background photo of a product. NB2 Lite generates various contextual product images. Omni Flash then turns the static images into e-commerce short videos.
From "product" to "advertising material", the entire chain runs automatically.
So, what's the use of multimodality anyway?
Google has surely been asked this countless times.
Especially in 2026, when coding ability has become almost synonymous with model intelligence and everyone is competing fiercely on it.
Obsessing over multimodality, for what?
Forget the whole AGI narrative for a moment. In the short term, Google's suite of multimodal models can indeed empower many of its products—Stitch is one, the built-in photo editing in Pixel is another, and the emergence of NotebookLM was quite impressive.
The two new models released this time reveal even more potential for multimodality to land in vertical scenarios. E-commerce, interior design, short videos... the demand in these businesses is real, and so is the money.
Plus, with the Android ecosystem supporting it, there's little worry about commercialization.
Google might not catch up in coding for now, but at the multimodality poker table, it might be the only player holding a full deck.
But...
When is Gemini 3.5 Pro coming out already!!!

Reference: [1] https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni-flash-nano-banana-2-lite/
This article comes from the WeChat public account "QbitAI", author: Following Cutting-Edge Tech







