The Image Generation Model That's Hotter Than Nano Banana Has Leaked, Screenshots Are No Longer Evidence | Includes Prompts

marsbit · Published 2026-04-19 · Updated 2026-04-19

Article Summary

A new AI image generation model, widely referred to as "GPT Image 2," has been leaked and is demonstrating significant advancements over predecessors like DALL-E 3 and even Google's Nano Banana Pro. It excels in four key areas: text rendering, prompt adherence, photorealism, and world knowledge. The model can generate highly accurate text in multiple languages, including complex Chinese characters, making it capable of producing convincing fake documents, UI screenshots, and product labels. This capability also raises concerns about the reliability of using screenshots as evidence. The model is currently in A/B testing, with a full release expected around May 2026 when DALL-E services are officially retired. It is accessible for testing on the LM Arena platform. The article includes several prompt templates optimized for the model, such as generating realistic app screenshots, product photos with detailed labels, and street scenes with accurate signage. This advancement is reshaping creative workflows but also accelerating the displacement of some traditional design roles.

Is your impression of text-to-image still stuck on Nano Banana?

Times have changed yet again.

@johnAGI168 https://x.com/johnAGI168/status/2044781168151724067

@0115hippo https://x.com/0115hippo/status/2044722124611539160

In early April, three anonymous image models, codenamed maskingtape-alpha, packingtape-alpha, and gaffertape-alpha, appeared on the LM Arena evaluation platform. They disappeared a few hours later.

OpenAI has not officially announced this model yet, but based on the metadata returned by the API and user-side testing records, it has already gained a widely accepted name: GPT Image 2.

Screenshots Can No Longer Be Used as Evidence

Over the past few years, one of the most obvious weaknesses of AI image generation models has been text within images. In the DALL-E 3 era, if you asked it to write "Hello" in an image, it might output "Hellp" or even "Hl10", with letters tilting drunkenly. GPT Image 1 improved a lot, handling simple English labels. By GPT Image 1.5, its accuracy in rendering English text was close to 95%, but it still had significant flaws with non-Latin scripts like Chinese, Japanese, and Korean.

But the leaked sample images from GPT Image 2 have changed this impression.

@MrLarus https://x.com/MrLarus/status/2044824800909054181

@akokoi1 https://x.com/akokoi1/status/2044789531615056175

The text in the images is exactly what it should be. Chinese characters are clear, with accurate glyphs and complete strokes. Someone tested generating an ID card-style image, where the name, address, and ID number were all rendered correctly, with neat formatting, looking at first glance like a photo of a real document.

This is good news. The improvement in text rendering means generating infographics, posters, product packaging, and complex charts becomes more reliable.

But there is another side to the coin. A model that can generate photorealistic ID-style images and precisely render UI screenshots makes the assumption that screenshots can serve as evidence increasingly questionable.

This is also a core difference between the GPT Image series and other models. Midjourney has yet to make progress in text rendering, and the Stable Diffusion series shares the same long-standing problem. According to the leaked Arena test results, GPT Image 2 surpassed Midjourney in four dimensions: text rendering, instruction following, photorealism, and world knowledge. Midjourney retains its advantages mainly in artistic style and aesthetic control.

Does It Really Know What the World Looks Like?

A tester asked the model to generate a hypothetical GPT-8 product pricing page. The resulting image had a layout that was indeed in the style of the OpenAI website, with button placement and font choices resembling those from a real interface, and the hierarchical logic of the price table was correct.

GPT Image 2 can generate images extremely similar to real software interfaces, including browser windows, mobile app interfaces, and data visualization charts, with a level of fidelity unmatched by the previous generation.

@johnAGI168 https://x.com/johnAGI168/status/2044781168151724067

@levelsio https://x.com/levelsio/status/2040333489476681758

This opens up some interesting practical uses. When prototyping a product, designers no longer need to open Figma and draw a set of wireframes first; they can describe the desired interface in text and get a reference image suitable for team discussion. When preparing an investor deck, they can show a "product screenshot" without waiting for an engineer to write code. When writing documentation, they can generate example interface images directly instead of hunting for screenshots to fill a blank page.

@marmaduke091 https://x.com/marmaduke091/status/2040338311873515597

Image Generation Is No Longer Just "Image Generation"

OpenAI has already announced that DALL-E 2 and DALL-E 3 will officially cease service on May 12, 2026. Azure OpenAI's DALL-E 3 was retired early in February.

DALL-E was where many people first encountered AI image generation. The journey from those blurry early works to today took just a few short years.

Meanwhile, Google, which had just established its industry position with Nano Banana Pro in early 2026, might feel the pressure. Early test reports indicate that GPT Image 2 simultaneously surpasses Nano Banana Pro in three dimensions: realism, text rendering, and world knowledge. This kind of triple win is not common.

For creators, the feeling is complex. Illustrators, graphic designers, and photographers are not facing this topic for the first time. Since the release of GPT Image 1, the number of freelance graphic design positions has decreased by about 18%. AI has indeed replaced the decision to "hire someone to do this" in certain scenarios, but it is also creating new ways of working, allowing one person to do more.

The evolution speed of image generation models no longer leaves much time for adaptation. It was only a few months from GPT Image 1's launch to version 1.5. And from 1.5 to 2, it's only been about half a year. Each generation solves the core shortcomings of the previous one while opening up new possibilities.

GPT Image 2 is currently still in the A/B testing phase, with some ChatGPT users randomly gaining access. The official release window is widely predicted to be around May, coinciding with the retirement of DALL-E. If you want to experience it early, you can currently try your luck on the LM Arena evaluation platform.

Test Address: https://arena.ai

Based on community feedback and the known strengths of this model, the following prompt templates can maximize your chances of success:

UI/Screenshot Prompt: A photorealistic screenshot of a mobile banking app, clearly showing transaction history with dates, amounts, and merchant names legible. iPhone 16 screen, natural hand holding the phone, coffee shop background.

Product Label Prompt: A photographic product photo of a craft beer bottle, with clear label details showing the brewery name "Oakridge Brewing Co.", alcohol content 6.8%, a mountain logo, and an ingredient list. Studio lighting, white background.

Signage Prompt: A street scene photo of a Tokyo alley at night, showing multiple neon signs in both Japanese and English, including a ramen shop sign reading "Ichiban Ramen — Est. 1987", a karaoke bar sign, and various glowing advertisements. Wet, reflective pavement with light reflections.

Interface/World Knowledge Prompt: A photorealistic YouTube video screenshot showing a video titled "How to Assemble a Computer in 2026" with 2.3 million views, featuring realistic comments, sidebar video recommendations, and channel info. Desktop browser view.

Widescreen Trigger Prompt: A cinematic widescreen photo of an IKEA store exterior at dusk, showing the glowing IKEA sign, a parking lot with realistic cars, and shoppers entering and leaving. Golden hour lighting, 16:9 format.
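The templates above share a rough shape: a style phrase, a subject, the exact strings that must appear in the image, and a few scene details. As a minimal sketch of that pattern, the Python below assembles a prompt from those parts. The `ImagePrompt` class and its field names are hypothetical helpers invented for illustration; they are not part of any OpenAI or LM Arena API, and the sketch only builds the prompt text without calling any model.

```python
from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container for the parts the templates above have in common."""
    style: str                                            # e.g. "A photographic product photo"
    subject: str                                          # what the image depicts
    exact_text: list[str] = field(default_factory=list)   # strings to render verbatim
    details: list[str] = field(default_factory=list)      # lighting, background, format

    def render(self) -> str:
        # Start with the style and subject, then spell out the literal text
        # in quotes so the model is told exactly what to write.
        sentence = f"{self.style} of {self.subject}"
        if self.exact_text:
            quoted = ", ".join(f'"{t}"' for t in self.exact_text)
            sentence += f", with clearly legible text reading {quoted}"
        sentence += "."
        # Scene details go in a trailing sentence, as in the templates.
        if self.details:
            sentence += " " + ", ".join(self.details) + "."
        return sentence


beer = ImagePrompt(
    style="A photographic product photo",
    subject="a craft beer bottle",
    exact_text=["Oakridge Brewing Co.", "6.8%"],
    details=["Studio lighting", "white background"],
)
print(beer.render())
```

Keeping the literal strings in their own field makes it easy to swap label text between runs while holding the rest of the scene constant, which helps when comparing how accurately the text is rendered.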

Unattributed image sources and references: https://miraflow.ai/blog/how-to-use-duct-tape-ai-model-arena-gpt-image-2-guide

This article is from the WeChat public account "APPSO", author: Discovering Tomorrow's Products
