Google's Latest "Banana" AI Image Model Has the Internet Hooked on "Vibe Photoshopping"

深潮 | Published on 2025-08-29 | Last updated on 2025-09-01

A high degree of character consistency delivers an unprecedented "Vibe Photoshopping" experience.

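Remember "nano-banana", the mysterious AI image-editing model everyone was talking about a while back? It caused a stir with its strong showing on LMArena, the large-model arena. Google Gemini's technical leads took turns teasing it on social media, and at one point it was even rumored to be Gemini 3.0 Pro.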

Now Google has finally lifted the veil.

In the early hours of August 27, Beijing time, Google AI Studio officially released Gemini 2.5 Flash Image (codename nano banana) 🍌.

Gemini 2.5 Flash Image finally arrives after a long warm-up | Image source: 极客公园 (GeekPark)

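This is Google's most advanced image generation and editing model to date. It is not only remarkably fast, with a near-"lightning" feel, but also posts SOTA results on multiple leaderboards and leads LMArena by a wide margin.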

Gemini 2.5 Flash Image reaches SOTA performance right at launch | Image source: LMarena.ai

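In its technical blog post, Google notes that Gemini 2.0 Flash had already won developers over with low latency and cost-effectiveness, but users kept asking for higher-quality images and stronger creative control. Gemini 2.5 Flash Image arrives with exactly those upgrades: character consistency is finally well preserved, prompt-based image editing is more precise, multi-image fusion looks natural, and real-world knowledge informs its output. That makes it less a standalone model than a "starting point" for the next generation of hit applications.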

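极客公园 (GeekPark) tried it right away. Unexpectedly, this is more than a model update: for the first time, it gives a real sense that the future of AI photo editing is within reach.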

Now available to try in Google AI Studio | Image source: GeekPark

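At first I really was just planning a routine test, curious about "where the new model is faster this time". But a few hours of use felt like an early glimpse of what the next hit application could look like.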

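We are used to tools like Meitu Xiuxiu (美图秀秀): tap a few buttons, apply a filter, and the photo quickly looks better. Gemini 2.5 Flash Image feels entirely different. It is remarkably fast and as perceptive as a designer who knows what you have in mind: you describe the effect you want, and it renders the result in seconds.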

Beyond output quality, speed is another clear difference between Gemini 2.5 Flash Image and earlier model-based image generation products | Image source: GeekPark

01 Blazing-Fast Generation: Results in Seconds

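The most immediate impression of nano banana is speed. With some open-source models, even on a well-equipped computer, going from a prompt to a decent image used to take tens of seconds or longer. For mobile users the wait was even more painful.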

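Gemini 2.5 Flash Image brings that wait down to a few seconds. Google bills it as its "newest, fastest, and most efficient" natively multimodal model, and the optimization work shows. In my tests, a one-sentence prompt produced a result in roughly three to four seconds, with sharp resolution and detail.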

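The experience resembles everyday editing in Meitu Xiuxiu: tap "Beautify" and the effect is nearly instant. The difference is that Meitu Xiuxiu applies ready-made filters algorithmically, whereas Gemini 2.5 Flash Image builds an image from scratch or substantially reworks a photo to match your request. That point-and-it's-done satisfaction was unimaginable in the tedious photo-editing workflows of the past.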

A request like "remove the passersby from the background" takes just one prompt | Image source: GeekPark

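If speed improves the experience for traditional photo-editing users, "native multimodality" addresses the boundaries of what AI image tools can do.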

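Gemini 2.5 Flash Image can not only generate images but also understand text and image inputs together. That means I can give it a photo and a text prompt at the same time, and it combines the two to work out what I actually want.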

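For example, I uploaded a photo taken on the street and told it to "change the background to a night view of Shinjuku, Tokyo". It identified the subject of the photo, cut the person out accurately, and replaced the background with a neon-lit Shinjuku street. More impressively, it kept the lighting on the person consistent, with none of the "hard cut-and-paste" look that manual masking often cannot avoid.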
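For readers who want to try the same photo-plus-instruction pattern programmatically, the following is a minimal sketch of how such a request body might be assembled for the Gemini API's REST `generateContent` method. It only builds the JSON payload and sends nothing over the network. The model identifier, the endpoint constant, and the helper name `build_edit_request` are assumptions made for illustration and are not taken from the article; check the current Gemini API documentation before relying on them.

```python
import base64
import json

# Assumption: the preview model identifier; the name in current docs may differ.
MODEL = "gemini-2.5-flash-image-preview"
# Assumption: public Gemini API REST endpoint for generateContent.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_edit_request(image_bytes: bytes, instruction: str,
                       mime_type: str = "image/jpeg") -> dict:
    """Hypothetical helper: pair one text instruction with one inline image.

    Both parts travel in the same turn, so the model can read the
    instruction in the context of the uploaded photo.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": instruction},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real street photo.
    photo = b"placeholder-image-bytes"
    body = build_edit_request(
        photo, "Change the background to a night view of Shinjuku, Tokyo."
    )
    print("POST", ENDPOINT)
    print(json.dumps(body, indent=2)[:200])
```

In practice the resulting dictionary would be sent as the JSON body of a POST request with an API key, and the edited image would come back as base64 data in the response. Those steps are left out here because they depend on credentials and on response details not covered by the article.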

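This level of understanding reminds me of a feature phone makers have often promoted in their built-in gallery apps in recent years: "one-tap background replacement". Back then, replaced backgrounds often had blurry edges and mismatched lighting, and looked fake. Gemini 2.5 Flash Image now uses world knowledge and visual understanding to fill in those details, so the result looks far more natural and preserves image detail much more accurately than traditional text-to-image or image-to-image tools.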

Original image & Gemini 2.5 Flash Image output | Image source: GeekPark

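This is why I think it will redefine the photo-editing experience: instead of relying on extensive manual adjustment, it leans on the model's natural-language understanding to get the job done in a brute-force-works-wonders fashion, for example in portrait retouching, a scenario that demands extremely fine detail.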

For portrait editing like this, the character consistency of Gemini 2.5 Flash Image really does deliver an unprecedented "Vibe Photoshopping" experience.

Helping a programmer save face in one second | Image source: GeekPark

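This experience overturns a common impression of AI image generation as "black magic": write a good prompt and the output is stunning; write a mediocre one and the result may go completely off track.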

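With Gemini 2.5 Flash Image, I found that "black magic" feeling much reduced. It interprets prompts more precisely and in a way closer to the user's intuition, which is why many people suddenly find it so much easier to use.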

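For instance, I told it to "blur the background and highlight the person in the foreground", and a few seconds later I got exactly the effect I wanted. I asked it to "give the person in the photo a smiling expression", and it not only lifted the corners of the mouth slightly but also adjusted the eyes, with well-handled detail. I even tried "colorize a black-and-white photo", and the color output was not random paint but stayed as close as possible to the tones a historical photo ought to have.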

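This "does what it says" ability reminds me of using Meitu Xiuxiu, where I only wanted to smooth the skin and the whole face ended up looking fake, as if beautification had been turned up to level ten. By contrast, Gemini 2.5 Flash Image edits precisely and with restraint: it understands what you want and tries to deliver just that.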

02 Stronger Capabilities: Once Used, Hard to Go Back

To make this more concrete, I compared it with the mobile photo-editing tools I use day to day.

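In Snapseed, blurring a background usually takes me a minute or two of manually selecting the foreground and then adjusting the blur strength. Even with practice, repeated corrections are unavoidable.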

Meitu Xiuxiu has a one-tap background blur, but it often blurs the edges of the person too, so the result looks unnatural.

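With Gemini 2.5 Flash Image, one sentence is enough: it automatically identifies the boundary between the person and the background, the blur looks natural, and no touch-up is needed.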

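While changing details in the picture, it avoids the "random scribbling" over the rest of the background that earlier AI tools often produced | Image source: Twitter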

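The comparison makes one thing clear: Gemini 2.5 Flash Image frees users from complex operations and hands more of the work to the model. For ordinary users it lowers the barrier to photo editing; for professionals it saves a great deal of time.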

My strongest impression after testing is that Gemini 2.5 Flash Image is no longer just a photo-editing tool; it is closer to an "intelligent assistant".

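In the past, using Meitu Xiuxiu meant using a preset collection of features: filters, beautification, mosaic, with each button mapped to one function. You selected and adjusted step by step until you were satisfied.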

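Gemini 2.5 Flash Image works on an entirely different logic. It no longer asks you to learn how the tool works; it understands your request directly. You say it, and it does it for you.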

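The shift looks subtle, but it fundamentally changes the relationship between user and tool in photo editing. We used to adapt to the tool; now the tool adapts to us. This mode of interaction is itself a prototype of the next generation of applications.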

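As of now, Gemini 2.5 Flash Image is still at an early stage and may have functional limits. But the speed, understanding, and fidelity it shows are enough to inspire plenty of imagination about what comes next.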

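What would combining it with Meitu Xiuxiu look like? You might open the app and say to your phone, "Touch up this photo and make the skin look more natural", and the result appears seconds later. On a trip, you might tell it to "change the weather to sunny", and the photo immediately turns bright and clear. In video editing, a single sentence might even change the mood of an entire clip.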

This approach may quickly become the mainstream image-editing feature in mobile operating systems | Image source: Twitter

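That is why I think it will rapidly transform existing workflows in photo-editing tools and define the next-generation "Meitu Xiuxiu": not just retouching, but reshaping how we interact with image processing, with AI as your post-production partner.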

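For now, though, Gemini 2.5 Flash Image cannot yet serve as an out-of-the-box photo-editing app for the general public. Its main purpose is still image generation rather than fine adjustment of existing images, and every image created or edited with Gemini 2.5 Flash Image carries a SynthID digital watermark, which lets social content platforms identify AI-generated content.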

03 The Breakout Point for a Hit App

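Looking back, Meitu Xiuxiu became a household app because it solved, in the simplest possible way, a problem everyone wanted solved: making photos look better.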

Gemini 2.5 Flash Image goes a step further on that foundation, polishing complex AI capabilities into an "image in seconds" experience anyone can use.

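The first time I said "blur the background for me" and saw the image handled naturally a few seconds later, it was clear to me that this is the starting point for breakout applications. It is not merely a model but an underlying capability for countless future products.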

The AI one-tap sky replacement feature that took off among phone users a few years ago | Image source: vivo 社区 (vivo community)

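Perhaps in a few years we will have forgotten the codename Banana, but we will see more and more image-processing tools built around this "say what you want and get it immediately" experience. They may, like Meitu Xiuxiu before them, become a shared memory for a generation of users.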

Only this time, AI will push imagination further.
