Video Edition Nano Banana Arrives: Built-in Gemini World Knowledge, Original Banana Generates Images in Just 4 Seconds

marsbit | Published 2026-07-01 | Last updated 2026-07-01

Introduction

Google has unveiled two new multimodal AI models: Gemini Omni Flash and Nano Banana 2 Lite. Gemini Omni Flash is a video generation and editing model that leverages Gemini's world knowledge. It allows for conversational video editing using natural language prompts, maintains scene consistency, and integrates text/graphics with video actions. Priced at $0.10 per second of output, its current limitations include a 10-second video cap. Nano Banana 2 Lite (gemini-3.1-flash-lite-image) is an optimized image generation model focused on speed and cost. It produces a 1K resolution image in about 4 seconds at a cost of roughly $0.034, making it significantly faster and cheaper than its predecessor. It retains strong text rendering capabilities. A key highlight is the combined workflow: users can rapidly generate images with Nano Banana 2 Lite and then seamlessly feed them into Gemini Omni Flash to create videos. Google demonstrated this with three application demos: "Anywhere" for creating travel videos from photos, "Space Lift" for generating interior design walkthroughs, and "Omni Product Studio" for automating e-commerce ad creation from product photos. The release underscores Google's strategic focus on advancing multimodal AI for practical, commercial applications in areas like marketing, design, and content creation, despite competitive pressures in other AI domains.

Although coding is still a mess, Google really has a knack for "multimodality".

The Gemini Omni Flash API is officially open, introducing the video edition Nano Banana.

Magical remakes of "Harry Potter" are no longer a dream. Just watch these four digital magic tricks performed by Gemini Omni:

It's insane. This level of consistency and text clarity makes green screens and special effects almost obsolete—just go live as Doctor Strange.

Meanwhile, the beloved "Banana" has welcomed a "lightspeed edition".

Nano Banana 2 Lite: The fastest, most cost-effective Gemini image model to date.

No exaggeration: it takes just 4 seconds to generate one image, and a 1K resolution image costs about $0.034.

Compared side-by-side with Nano Banana 2, this speed is practically taking off.

Not to mention GPT Image 2, which takes 3 minutes for a single image generation...

No wonder Gemini 3.5 Pro hasn't been released yet—they probably spent all their time on their beloved multimodality, right, Hassabis!!

Gemini Omni Flash

First unveiled at Google I/O 2026, Gemini Omni Flash deeply integrates Gemini's multimodal reasoning capabilities with video generation and editing, and it drew significant attention at the time.

Now, this model is officially available to developers via the Gemini API and Google AI Studio. It can easily generate and edit high-quality videos based on various inputs like text, images, and video.

Four key capabilities:

Conversational Video Editing: Modify and refine videos using natural language, just like editing a Lark document.

Multimodal Reference: Combine image, text, and video inputs to maintain scene control and consistency.

Real-World Knowledge: Leverage Gemini's knowledge in history, biology, narrative logic, etc., to construct videos, saving you from writing three pages of prompts to describe architectural styles.

Text and Action Synchronization: Connect text and graphics directly to video actions through simple prompts.

The pricing is also very competitive: $0.10 per second of video output, on par with Veo 3.1 Fast.

In terms of positioning, Omni Flash, also a lightweight video generation model, emphasizes Gemini's world knowledge and fully aligns with the Gemini ecosystem.

But Google is also quite candid, proactively listing a bunch of current limitations:

1. Currently only supports 10-second video generation; longer support will come later.

2. Does not yet support audio reference uploads or scene expansion.

3. The API supports video reference uploads up to 3 seconds, but the model currently cannot correctly process such inputs.

4. There are still limitations in character consistency during scene changes and camera movements.
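The pricing and the length cap above make per-clip costs easy to estimate. The following is a minimal sketch that uses only the two figures quoted in this article ($0.10 per second and a 10-second cap); the constant and function names are illustrative and are not part of any Google API.

```python
# Figures quoted in the article; all names here are illustrative only.
PRICE_PER_SECOND_USD = 0.10   # stated price per second of video output
MAX_CLIP_SECONDS = 10         # stated current cap on clip length


def video_cost_usd(seconds: float) -> float:
    """Estimate the cost of one generated clip, rejecting lengths above the cap."""
    if seconds <= 0:
        raise ValueError("clip length must be positive")
    if seconds > MAX_CLIP_SECONDS:
        raise ValueError(f"clips are currently capped at {MAX_CLIP_SECONDS} seconds")
    return round(seconds * PRICE_PER_SECOND_USD, 2)


if __name__ == "__main__":
    # A maximum-length clip under the current cap.
    print(video_cost_usd(10))
```

Under these figures, a maximum-length clip comes to about one dollar, so the cap also bounds the cost of any single generation.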

Nano Banana 2 Lite

Nano Banana 2 Lite (also known as gemini-3.1-flash-lite-image) is designed specifically for high-speed processing.

Through targeted optimization, it aims at real-time application scenarios that are extremely sensitive to latency and require processing large volumes of images in a short time—such as bulk generation of e-commerce materials, rapid iteration of ad creatives, and automated content pipelines.

Two core selling points—

Lightspeed: Image generation latency is about 4 seconds, one-fifth of Nano Banana 2's (which is about 20 seconds).

Dirt Cheap: A 1K image costs about $0.034, half the price of Nano Banana 2 and one-quarter of Nano Banana Pro.
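To see what those ratios mean for a bulk job, here is a rough back-of-the-envelope sketch. The Lite price and both latencies are the figures quoted above; the Nano Banana 2 and Nano Banana Pro prices are only what the "half" and "one-quarter" ratios imply, not official list prices, and the batch is assumed to run sequentially with no parallel requests.

```python
LITE_PRICE_USD = 0.034   # stated: about $0.034 per 1K image
LITE_LATENCY_S = 4       # stated: about 4 seconds per image
NB2_LATENCY_S = 20       # stated: about 20 seconds per image

# Implied by the article's ratios, not official list prices.
NB2_PRICE_USD = LITE_PRICE_USD * 2   # Lite is "half the price of Nano Banana 2"
PRO_PRICE_USD = LITE_PRICE_USD * 4   # Lite is "one-quarter of Nano Banana Pro"


def batch_estimate(images: int, price_usd: float, latency_s: float) -> tuple[float, float]:
    """Return (total cost in USD, total seconds) for a sequential batch."""
    return round(images * price_usd, 2), images * latency_s


if __name__ == "__main__":
    for name, price, latency in [
        ("Nano Banana 2 Lite", LITE_PRICE_USD, LITE_LATENCY_S),
        ("Nano Banana 2", NB2_PRICE_USD, NB2_LATENCY_S),
    ]:
        cost, seconds = batch_estimate(1000, price, latency)
        print(f"{name}: ${cost:.2f}, {seconds / 60:.0f} min for 1000 images")
```

In practice requests would likely be issued concurrently, so the wall-clock numbers are an upper bound rather than a prediction.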

Speed and price are cut, but image generation and editing capabilities haven't noticeably shrunk. Nano Banana 2 Lite still maintains excellent text rendering effects, benchmarking on par with models like Grok.

Therefore, Google's suggestion is: If you're still cheaping out with the first-gen Nano Banana, swap it now. The Lite version already comprehensively outperforms it in all key metrics.

Twin Blades United

Wait, hold on.

You might think this is just the parallel release of two models, but Google indicates there's more.

The real magic lies in chaining these models together.

As we all know, AIGC creation requires repeated iteration, and asset management can be quite troublesome.

Now, with these two models, you no longer need to repeatedly upload files—image generation and video creation are seamlessly connected.

Specifically, you can first use Nano Banana 2 Lite to generate images at high speed, then feed the generated images as reference material to Gemini Omni Flash to transform them into videos with one click.
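The shape of that chain can be sketched in a few lines. The article does not show the actual Gemini API calls, so the two callables below are hypothetical placeholders for whatever client functions produce an image and a video; only the hand-off (the image output becomes the video reference) reflects the workflow described here.

```python
from typing import Callable

# Hypothetical signatures: the real Gemini API calls are not shown in the article.
ImageFn = Callable[[str], bytes]          # prompt -> image bytes
VideoFn = Callable[[bytes, str], bytes]   # reference image, prompt -> video bytes


def image_to_video(image_prompt: str, video_prompt: str,
                   generate_image: ImageFn, generate_video: VideoFn) -> bytes:
    """Generate an image, then pass it straight on as the video reference."""
    image = generate_image(image_prompt)
    return generate_video(image, video_prompt)


# Stand-in backends so the sketch runs without network access.
def fake_image(prompt: str) -> bytes:
    return f"IMG[{prompt}]".encode()


def fake_video(reference: bytes, prompt: str) -> bytes:
    return b"VID[" + reference + b"|" + prompt.encode() + b"]"


if __name__ == "__main__":
    print(image_to_video("sneaker on a beach", "slow orbit shot", fake_image, fake_video))
```

Passing the backends in as parameters keeps the chaining logic separate from any particular client library, so the stand-ins can be swapped for real calls without changing the pipeline function.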

To showcase this magical 1+1>2 workflow, Google even built three demo apps:

1. Anywhere

Take a selfie or upload a photo, and NB2 Lite instantly Photoshops you into dozens of landmark scenes.

Then click on the image, and Omni Flash turns the static scene into a dynamic short video.

Cyber tourism, now also end-to-end.

2. Space Lift

This is a bit scary. Combined with the Genie world model in the future, it might threaten many traditional interior design SaaS companies.

Upload a photo of a room. NB2 Lite first generates various interior design styles. Find one you like, click the video button, and Omni can directly create a cinematic space walkthrough for you.

3. Omni Product Studio

A boon for cross-border e-commerce.

Take a white-background photo of a product. NB2 Lite generates various contextual product images. Omni Flash then turns the static images into e-commerce short videos.

From "product" to "advertising material", the entire chain runs automatically.

So, what's the use of multimodality anyway?

Google has surely been asked this countless times.

Especially in 2026, when coding ability has become almost synonymous with model intelligence and everyone is fiercely competing on it.

Obsessing over multimodality, for what?

Forget the whole AGI narrative for a moment. In the short term, Google's suite of multimodal models can indeed empower many of its products—Stitch is one, the built-in photo editing in Pixel is another, and the emergence of NotebookLM was quite impressive.

The two new models released this time reveal even more potential for multimodality to land in vertical scenarios. E-commerce, interior design, short videos... the demand in these businesses is real, and so is the money.

Plus, with the Android ecosystem supporting it, there's little worry about commercialization.

Google might not catch up in Coding for now, but at the multimodality poker table, Google might be the only player with a full deck.

But...

When is Gemini 3.5 Pro coming out already!!!

Reference: [1] https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni-flash-nano-banana-2-lite/

This article comes from the WeChat public account "QbitAI", author: Following Cutting-Edge Tech

Related Questions

Q: What are the two new Gemini models announced, and what are their primary functions?

A: Two new models are announced: Gemini Omni Flash and Nano Banana 2 Lite. Gemini Omni Flash is a multimodal video generation and editing model that can create videos from text, image, and video inputs. Nano Banana 2 Lite is an ultra-fast and cost-effective image generation model.

Q: What are the key features and performance claims for the Nano Banana 2 Lite image model?

A: Nano Banana 2 Lite is claimed to be the fastest and most cost-effective Gemini image model. It generates a 1K resolution image in about 4 seconds at a cost of approximately $0.034 per image. It maintains strong text rendering capabilities while being significantly faster and cheaper than its predecessor.

Q: How does Gemini Omni Flash leverage its 'world knowledge' capability, and what is one of its stated limitations?

A: Gemini Omni Flash can call upon Gemini's knowledge in areas like history, biology, and narrative logic to inform video generation, reducing the need for detailed user prompts. One stated limitation is that it currently only supports generating videos up to 10 seconds in length, with longer support planned for the future.

Q: According to the article, how can the two models be used together in a workflow? Provide one example.

A: The models can be used in a seamless 'image-to-video' workflow. For example, you can first use Nano Banana 2 Lite to quickly generate an image (like a product scene). Then, you can feed that generated image as a reference directly into Gemini Omni Flash to create a video based on it, eliminating the need to manually upload files between steps.

Q: What is the article's perspective on Google's focus on multimodal AI compared to coding capabilities?

A: The article suggests that while Google may be lagging in the 'Coding' race (often used as a proxy for model intelligence), it is a strong contender in multimodal AI. It argues that multimodal models have clear, immediate commercial applications in fields like e-commerce and content creation, and Google, with its ecosystem, is well-positioned to capitalize on this.
