Video Edition Nano Banana Arrives: Built-in Gemini World Knowledge, Original Banana Generates Images in Just 4 Seconds

marsbit · Published 2026-07-01 · Updated 2026-07-01

Article Summary

Google has unveiled two new multimodal AI models: Gemini Omni Flash and Nano Banana 2 Lite. Gemini Omni Flash is a video generation and editing model that leverages Gemini's world knowledge. It allows for conversational video editing using natural language prompts, maintains scene consistency, and integrates text/graphics with video actions. Priced at $0.10 per second of output, its current limitations include a 10-second video cap. Nano Banana 2 Lite (gemini-3.1-flash-lite-image) is an optimized image generation model focused on speed and cost. It produces a 1K resolution image in about 4 seconds at a cost of roughly $0.034, making it significantly faster and cheaper than its predecessor. It retains strong text rendering capabilities. A key highlight is the combined workflow: users can rapidly generate images with Nano Banana 2 Lite and then seamlessly feed them into Gemini Omni Flash to create videos. Google demonstrated this with three application demos: "Anywhere" for creating travel videos from photos, "Space Lift" for generating interior design walkthroughs, and "Omni Product Studio" for automating e-commerce ad creation from product photos. The release underscores Google's strategic focus on advancing multimodal AI for practical, commercial applications in areas like marketing, design, and content creation, despite competitive pressures in other AI domains.

Although coding is still a mess, Google really has a knack for "multimodality".

The Gemini Omni Flash API is officially open, introducing the video edition Nano Banana.

Magical remakes of "Harry Potter" are no longer a dream. Just watch these four digital magic tricks performed by Gemini Omni:

It's insane. This level of consistency and text clarity makes green screens and special effects almost obsolete—just go live as Doctor Strange.

Meanwhile, the beloved "Banana" has welcomed a "lightspeed edition".

Nano Banana 2 Lite: The fastest, most cost-effective Gemini image model to date.

No exaggeration: it takes just 4 seconds to generate one image, and a 1K-resolution image costs about $0.034.

Compared side-by-side with Nano Banana 2, this speed is practically taking off.

Not to mention GPT Image 2, which takes 3 minutes for a single image generation...

No wonder Gemini 3.5 Pro hasn't been released yet—they probably spent all their time on their beloved multimodality, right, Hassabis!!

Gemini Omni Flash

First unveiled at Google I/O 2026, where it drew significant attention, Gemini Omni Flash deeply integrates Gemini's multimodal reasoning capabilities with video generation and editing.

Now, this model is officially available to developers via the Gemini API and Google AI Studio. It can easily generate and edit high-quality videos based on various inputs like text, images, and video.

Four key capabilities:

Conversational Video Editing: Modify and refine videos using natural language, just like editing a Lark document.

Multimodal Reference: Combine image, text, and video inputs to maintain scene control and consistency.

Real-World Knowledge: Leverage Gemini's knowledge in history, biology, narrative logic, etc., to construct videos, saving you from writing three pages of prompts to describe architectural styles.

Text and Action Synchronization: Connect text and graphics directly to video actions through simple prompts.

The pricing is also very competitive: $0.10 per second of video output, on par with Veo 3.1 Fast.

In terms of positioning, Omni Flash, also a lightweight video generation model, emphasizes Gemini's world knowledge and fully aligns with the Gemini ecosystem.

But Google is also quite candid, proactively listing a bunch of current limitations:

1. Currently only supports 10-second video generation; longer support will come later.

2. Does not yet support audio reference uploads or scene expansion.

3. The API supports video reference uploads up to 3 seconds, but the model currently cannot correctly process such inputs.

4. There are still limitations in character consistency during scene changes and camera movements.
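The pricing and the 10-second cap quoted above can be turned into a quick budget check. The following is only a back-of-the-envelope sketch: the function names are made up for illustration, and the only figures used are the $0.10 per second and 10-second limit stated in this article.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.10")  # USD per second of output, as quoted in the article
MAX_CLIP_SECONDS = 10               # current per-clip cap listed among the limitations


def estimate_video_cost(seconds: int) -> Decimal:
    """Return the estimated USD cost of one generated clip."""
    if seconds <= 0:
        raise ValueError("clip length must be positive")
    if seconds > MAX_CLIP_SECONDS:
        raise ValueError(f"clips are currently capped at {MAX_CLIP_SECONDS} seconds")
    return PRICE_PER_SECOND * seconds


def clips_needed(total_seconds: int) -> int:
    """Number of capped clips needed to cover a longer runtime (ceiling division)."""
    return -(-total_seconds // MAX_CLIP_SECONDS)
```

Under these numbers a full 10-second clip comes to $1.00, and anything longer, say 25 seconds, would have to be split into at least three separate clips until longer generation is supported.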

Nano Banana 2 Lite

Nano Banana 2 Lite (also known as gemini-3.1-flash-lite-image) is designed specifically for high-speed processing.

Through targeted optimization, it aims at real-time application scenarios that are extremely sensitive to latency and require processing large volumes of images in a short time—such as bulk generation of e-commerce materials, rapid iteration of ad creatives, and automated content pipelines.

Two core selling points—

Lightspeed: Image generation latency is about 4 seconds, one-fifth of Nano Banana 2's (which is about 20 seconds).

Dirt Cheap: A 1K image costs about $0.034, half the price of Nano Banana 2 and one-quarter of Nano Banana Pro.
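To see what these ratios mean for a bulk job, here is a rough sketch. The Nano Banana 2 and Nano Banana Pro prices below are derived from the "half" and "one-quarter" ratios in this article, not taken from an official price list, and the helper name is hypothetical. It also assumes images are generated one at a time, which a real pipeline would likely parallelize.

```python
from decimal import Decimal

LITE_PRICE = Decimal("0.034")   # USD per 1K image, as quoted
LITE_LATENCY_S = 4              # seconds per image, as quoted
NB2_LATENCY_S = 20              # seconds per image, as quoted

# Derived from the article's ratios, not from an official price list.
NB2_PRICE = LITE_PRICE * 2      # Lite is "half the price of Nano Banana 2"
PRO_PRICE = LITE_PRICE * 4      # Lite is "one-quarter of Nano Banana Pro"


def batch_estimate(n_images, price, latency_s):
    """Return (total USD cost, total seconds if generated one at a time)."""
    return price * n_images, latency_s * n_images


lite_cost, lite_seconds = batch_estimate(1000, LITE_PRICE, LITE_LATENCY_S)
nb2_cost, nb2_seconds = batch_estimate(1000, NB2_PRICE, NB2_LATENCY_S)
print(f"Lite: ${lite_cost:.2f}, {lite_seconds / 60:.0f} min")
print(f"NB2:  ${nb2_cost:.2f}, {nb2_seconds / 3600:.1f} h")
```

For 1,000 images, that works out to roughly $34 and a bit over an hour of sequential generation on the Lite model, against roughly $68 and five and a half hours on Nano Banana 2.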

Speed and price are cut, but image generation and editing capabilities haven't noticeably shrunk. Nano Banana 2 Lite still maintains excellent text rendering effects, benchmarking on par with models like Grok.

Therefore, Google's suggestion is: If you're still cheaping out with the first-gen Nano Banana, swap it now. The Lite version already comprehensively outperforms it in all key metrics.

Twin Blades United

Wait, hold on.

You might think this is just the parallel release of two models, but Google indicates there's more.

The real magic lies in chaining these models together.

As we all know, AIGC creation requires repeated iteration, and asset management can be quite troublesome.

Now, with these two models, you no longer need to repeatedly upload files—image generation and video creation are seamlessly connected.

Specifically, you can first use Nano Banana 2 Lite to generate images at high speed, then feed the generated images as reference material to Gemini Omni Flash to transform them into videos with one click.
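The shape of that hand-off can be sketched in a few lines. The article does not show the actual API calls, so the two model steps are passed in here as plain functions; every name below is a hypothetical stand-in rather than part of the Gemini API.

```python
from typing import Callable, List

# Hypothetical stand-ins: the article does not show the real API calls, so the
# two model steps are passed in as plain functions.
ImageGen = Callable[[str], bytes]          # prompt -> image bytes
VideoGen = Callable[[bytes, str], bytes]   # reference image + prompt -> video bytes


def image_to_video_pipeline(
    image_prompts: List[str],
    video_prompt: str,
    generate_image: ImageGen,
    generate_video: VideoGen,
    pick: Callable[[List[bytes]], bytes] = lambda images: images[0],
) -> bytes:
    """Generate candidate images, pick one, and pass it on as the video reference."""
    if not image_prompts:
        raise ValueError("need at least one image prompt")
    candidates = [generate_image(p) for p in image_prompts]
    chosen = pick(candidates)
    # The chosen image is handed over in memory; no manual re-upload step.
    return generate_video(chosen, video_prompt)
```

In practice `generate_image` would wrap a call to the fast image model and `generate_video` a call to the video model, while `pick` stands in for the user clicking on the candidate they like.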

To showcase this magical 1+1>2 workflow, Google even created three demo apps:

1. Anywhere

Take a selfie or upload a photo, and NB2 Lite instantly Photoshops you into dozens of landmark scenes.

Then click on the image, and Omni Flash turns the static scene into a dynamic short video.

Cyber tourism, now also end-to-end.

2. Space Lift

This is a bit scary. Combined with the Genie world model in the future, it might threaten many traditional interior design SaaS companies.

Upload a photo of a room. NB2 Lite first generates various interior design styles. Find one you like, click the video button, and Omni can directly create a cinematic space walkthrough for you.

3. Omni Product Studio

A boon for cross-border e-commerce.

Take a white-background photo of a product. NB2 Lite generates various contextual product images. Omni Flash then turns the static images into e-commerce short videos.

From "product" to "advertising material", the entire chain runs automatically.

So, what's the use of multimodality anyway?

Google has surely been asked this countless times.

Especially in 2026, when coding ability has become almost synonymous with model intelligence and everyone is fiercely competing on coding.

Obsessing over multimodality, for what?

Forget the whole AGI narrative for a moment. In the short term, Google's suite of multimodal models can indeed empower many of its products—Stitch is one, the built-in photo editing in Pixel is another, and the emergence of NotebookLM was quite impressive.

The two new models released this time reveal even more potential for multimodality to land in vertical scenarios. E-commerce, interior design, short videos... the demand in these businesses is real, and so is the money.

Plus, with the Android ecosystem supporting it, there's little worry about commercialization.

Google might not catch up in Coding for now, but at the multimodality poker table, Google might be the only player with a full deck.

But...

When is Gemini 3.5 Pro coming out already!!!

Reference: [1] https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni-flash-nano-banana-2-lite/

This article comes from the WeChat public account "QbitAI", author: Following Cutting-Edge Tech
