This Time, OpenAI Eliminated 90% of Human Designers

marsbit | Published on 2026-04-23 | Last updated on 2026-04-23

Abstract

OpenAI's latest release, GPT-Image 2, marks a paradigm shift in AI-generated imagery, moving beyond aesthetic quality to logical reasoning and contextual understanding. The model introduces a "thinking mode," where it performs background reasoning—such as mathematical calculations and geographic knowledge—before generating images. This enables highly accurate and context-aware outputs, like a livestream overlay showing precise distance metrics or a brand-aligned poster design. The model excels in rendering Chinese text with remarkable accuracy and aesthetic quality, a significant improvement over previous versions. It supports multi-turn conversational editing via the new Responses API, allowing iterative refinements similar to chatting with a large language model. While GPT-Image 2 demonstrates unprecedented capabilities in commercial applications like marketing material and illustration—potentially displacing many human designers due to its cost efficiency—it still has limitations. Minor artifacts in fine text details persist, and complex prompts can cause extended processing times. Additionally, the technology raises ethical concerns around deepfakes and digital trust. Overall, GPT-Image 2 transitions AI image generation from a novelty to a powerful production-ready tool, redefining industry standards and pushing the boundary of what’s possible in visual AI.

By Silicon-based Spark

That famous Sam Altman meme has now come true for everyone.

Last year, while promoting GPT-5, the OpenAI CEO said something that later became an internet sensation: "The feeling is like witnessing an atomic bomb explosion, leaving one dizzy and collapsing." Since then, whenever the AI community releases a new product with exaggerated marketing copy, this meme gets dragged out and ridiculed repeatedly.

But late the night before last, it wasn't Altman who was left dizzy and collapsing. This time, it was all the users staring at their screens waiting for OpenAI to play its hand.

Altman, as usual, played it coy, posting a tweet: "We've prepared something fun."

By 3 a.m., GPT-Image 2 was released. The global AI community exploded.

"Images are a language, not decoration."

This is the first sentence written on OpenAI's release page. Translated, it means one thing: from today, images are no longer just decorations; they are a language in themselves. This is a declaration of a generational leap for the entire computer vision industry.

For the past year, AI image generation was stuck in the aesthetic quagmire of "does it look realistic?" The arrival of GPT-Image 2 flipped the switch: AI image generation has officially entered the exam hall of "is the logic correct?"

The precision of this model can be described as "terrifying."

It topped both the text-to-image and image editing rankings on Artificial Analysis, and its practical performance is crushing.

The feeling is like when Seedance 2.0 arrived in the video generation field: the model is no longer just an auxiliary tool for humans; it is defining the new industry standard.

Note: All images in this article are generated by GPT-Image 2. The image content is purely fictional.

01 The Awakening of the Thinking Engine

In the past, the primary standard for judging an image model was how much it resembled a real person or a reference object.

In the face of this monster, GPT-Image 2, that standard is obsolete. Completely obsolete.

The core breakthrough of the new model is this: it is an image model that supports a thinking mode.

What does that mean? After the user inputs a prompt, the model doesn't simply denoise and stitch pixels. It first completes a round of thinking and modeling in the background, *then* it starts drawing.

A test image leaked from the Linux.do community best illustrates the point. The model simulated a live stream of Lei Jun running:

Image source: https://cdn3.linux.do/original/4X/0/f/3/0f37c8bc968e3d563cc6100d8e7f80ee305661ff.jpeg

This image made many developers gasp. Lei's facial features are reproduced accurately, almost like a photo, and the overlay clearly shows: Live stream target 1313km, Distance run 425.7km, Remaining distance 887.3km. Even more impressive, the current altitude is marked as 3658m.

What does 3658m mean? From Beijing to Lhasa, the typical altitude upon entering the Tibetan region is precisely this number.

In human eyes, this is simple arithmetic and common geographical knowledge. But think about it: for an image model, what does it mean to unify mathematical logic, geographical common sense, and UI specifications in a single image?

The conclusion is straightforward: Before generating the first pixel, GPT-Image 2 had already completed a round of reasoning. It understood the meaning of "distance," understood the logical relationship of addition and subtraction, and also understood the visual characteristics of high-altitude areas.

This isn't drawing. This is thinking.
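The overlay figures quoted above can be checked by hand or with a few lines of code. The sketch below is only an illustration of that check, not anything the model or OpenAI is known to run; the function name `remaining_distance` is hypothetical, and the three numbers are taken from the image as described in this article.

```python
from decimal import Decimal


def remaining_distance(target_km: str, run_km: str) -> Decimal:
    """Return target minus distance run, using exact decimal arithmetic."""
    return Decimal(target_km) - Decimal(run_km)


# Figures as quoted from the generated livestream overlay.
target = "1313"
run = "425.7"
shown_remaining = Decimal("887.3")

computed = remaining_distance(target, run)

# True when the number drawn in the image matches the subtraction.
is_consistent = computed == shown_remaining
print(computed, is_consistent)
```

Decimal is used instead of float so that 1313 - 425.7 compares exactly against the 887.3 shown in the image.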

02 From Toy to Productivity Tool

In the face of this capability, everyone's attitude towards image models needs to change.

It has long ceased to be a toy for drawing avatars or making wallpapers. It has stepped over the "usable" threshold and rushed directly into the "easy to use" zone: a tool that can be thrown into commercial scenarios to get work done.

Take poster design. GPT-Image 2's composition aesthetics, light and shadow processing, and grasp of brand tone have undoubtedly reached a height that the vast majority of ordinary human designers find difficult to achieve.

Image source: https://cdn3.linux.do/original/4X/7/a/1/7a12ccd6b745be5ad8828eb0ac225d218fb43cbc.jpeg

In human society, hiring a senior graphic designer to create a commercial-grade poster often entails significant communication costs, time costs, and design fees of over a thousand yuan, which can be a heavy burden for small and medium-sized enterprises.

However, with GPT-Image 2, even if you are unsatisfied and need to adjust dozens of times, the cost is only a few dollars.

In fields like poster design, marketing materials, and illustration, what users care about is not "realism," but "is it good-looking, is it accurate." Precisely because of this, AI's replacement efficiency is devastating.

The developer documentation, updated at the same time, also hides an exciting detail: the sample code frequently specifies model: "gpt-5.4".

The thinking mode combined with the flagship model hints at one thing: GPT-Image 2 is by no means an isolated product. It is the visual terminal born for the next generation of large language models.

Through the new Responses API, image generation becomes as natural as chatting with a large language model. The model adds multi-turn conversational editing. After the first version is generated, users can issue the kind of instructions that would send a contracted designer's blood pressure soaring: "Make the background a bit darker." "Move the logo a few pixels to the side."

These interactive real-time modification demands are precisely the most tedious and patience-consuming parts of a designer's daily work. Now, they are solved.
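The multi-turn flow described above can be sketched as a sequence of request payloads. This is a minimal sketch under stated assumptions: the article confirms only that the Responses API is used, that edits are conversational, and that the sample code names model: "gpt-5.4". The field names here (`input`, `tools` with an `image_generation` entry, `previous_response_id`) follow the publicly documented Responses API shape and are assumed to apply to GPT-Image 2 as well. The helper `build_image_turn`, the poster prompt, and the id "resp_123" are hypothetical placeholders. No network call is made.

```python
from typing import Any, Optional


def build_image_turn(
    instruction: str,
    previous_response_id: Optional[str] = None,
    model: str = "gpt-5.4",  # model name as quoted from the sample code
) -> dict[str, Any]:
    """Build the keyword arguments for one turn of conversational image editing."""
    request: dict[str, Any] = {
        "model": model,
        "input": instruction,
        # Assumed tool entry that lets the language model call the image model.
        "tools": [{"type": "image_generation"}],
    }
    if previous_response_id is not None:
        # Linking to the prior turn lets the model edit its earlier image
        # instead of starting from scratch.
        request["previous_response_id"] = previous_response_id
    return request


# Turn 1: the initial draft (hypothetical prompt).
first = build_image_turn("Design a poster for a spring tea launch.")

# Turn 2: a follow-up edit; "resp_123" stands in for the id returned by turn 1.
second = build_image_turn(
    "Make the background a bit darker.",
    previous_response_id="resp_123",
)

print(first)
print(second)
```

With the official Python SDK, each dictionary would presumably be passed as `client.responses.create(**request)`, with the id of each response fed into the next turn.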

03 The Pinnacle of Chinese Rendering

Although GPT-Image 2 is a foreign model, domestic users are overwhelmingly positive.

There's only one reason: Its support for Chinese characters is nearly perfect.

In the test images shared by the community, you can see the famous debate scene between Luo Yonghao and Wang Ziru:

Image source: https://cdn3.linux.do/original/4X/0/9/7/097ed46991d2464442aebc6b1076a292cc839fec.jpeg

You can see Elon Musk live-streaming sales of Lao Gan Ma chili sauce:

Image source: https://cdn3.linux.do/original/4X/2/f/a/2fa77cf040e6337643829df4ec5ca6467d2866b2.jpeg

You can even see a doctor's prescription:

Image source: https://cdn3.linux.do/original/4X/9/f/f/9ffeab83675648b43116cd0763f6c8b560611ae6.jpeg

The text in these images is no longer crooked, haphazardly assembled "pseudo-Chinese characters," but mature design drafts with calligraphic charm, typographical hierarchy, and layout artistry.

Clearly, OpenAI has injected a massive amount of Chinese language image data into the training set and conducted targeted intensive training.

Compared to the previous generation model, GPT-Image 2's power is even more vividly evident.

In comparative tests, the previous generation model, version 1.5, could draw something resembling a recipe, but upon closer inspection, the text was almost all gibberish.

Image source: https://cdn3.linux.do/optimized/4X/2/b/3/2b38f3c1a134515d564f07f81661c0bd9578c6b9_2_750x750.jpeg

But the same recipe generated by GPT-Image 2 shows a milestone breakthrough in text clarity and aesthetics.

Image source: https://cdn3.linux.do/original/4X/0/2/5/02513b10135d824ccb1c22bd0c7eb441f1e34455.jpeg

For prompts with over a hundred Chinese characters, the five steps are still clearly visible, and the text-image consistency is satisfactory. This isn't just an image; it's a reproducible practical guide.

However, this also raises an interesting technical question: Has the image model really completely solved the gibberish problem?

My judgment is: Probably not.

Large language models generate tokens based on semantic logic. During the reinforcement learning phase, it's based on probability; the higher the quality and quantity of the training data, the more logical the output. But the essence of an image model is, after all, pixel generation. The logical relationship between pixels is fundamentally different from the logical relationship between words.

In other words, as powerful as GPT-Image 2 is, it does not truly "understand" the rules of text. It has merely memorized the pixel-level appearance of text by rote.

An image of doing business with Altman exposes this point: the large characters "Mengniu" and "Wanglaoji" on the two boxes of drinks are written perfectly, but the small text below is still a set of blurry color blocks.

Image source: https://cdn3.linux.do/original/4X/d/7/c/d7c4fb063202bcbf56b9ca0623aa0ce6fc26e542.jpeg

Under the current technical paradigm, the generation logic is still "arrange by pixels," which is fundamentally different from "render by characters." Extremely subtle gibberish may never be completely eradicated.

But that said, for over 90% of commercial application scenarios, this is already sufficient.

04 Un-deified Flaws and Boundaries

Even though it already sits on the world's number one throne, GPT-Image 2 also has its clumsy side.

Actual tests found that because the thinking mode invokes web searches and performs logical reasoning, the model occasionally falls into a logical loop when processing extremely complex fictional tasks, thinking for nearly 40 minutes and still failing to produce an answer.

At the same time, the API's claimed support for 2K and even 4K resolution implies extremely high token consumption and latency.

For ordinary users, how to balance ultimate image quality with response speed will be a required course for future use.
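To get a rough sense of why higher resolutions are expensive, one can compare raw pixel counts. This is only a back-of-the-envelope sketch: the article gives no exact dimensions or pricing, so the square sizes below (1024, 2048, 4096 pixels per side) are assumptions, as is the idea that token consumption grows roughly with pixel count. The function name `pixel_ratio` is hypothetical.

```python
def pixel_ratio(width: int, height: int, base: tuple[int, int] = (1024, 1024)) -> float:
    """Return how many times more pixels an image has than the base size."""
    return (width * height) / (base[0] * base[1])


# Assumed square dimensions; the article only says "2K" and "4K".
sizes = {
    "1K": (1024, 1024),
    "2K": (2048, 2048),
    "4K": (4096, 4096),
}

for label, (w, h) in sizes.items():
    print(label, w * h, pixel_ratio(w, h))
```

Under these assumptions a 2K image carries 4 times the pixels of a 1K image and a 4K image carries 16 times, which is why the quality-versus-speed trade-off matters.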

In the field of technology, powerful capability is always a double-edged sword.

Whether it's image models or video models, they inevitably face the ethical challenges of deepfakes.

In most current test cases, the AI generates images of well-known figures, but if they are replaced with ordinary people who have posted photos on various social media platforms, it is already extremely difficult to distinguish the fake from the real without knowing the person.

Apart from the occasional gibberish in the background that might give the AI away, the human body itself has no flaws left.

Therefore, those fields that once required real people are facing an unprecedented crisis of trust.

The release of GPT-Image 2 has moved image generation models from toys to productivity tools.

In the past, people used AI for inspiration, but now AI is beginning to attempt to take over the entire process, from conception and calculation to typesetting and the finished product.

For design practitioners, this is an era filled with FOMO (Fear Of Missing Out).

But for those who are good at using tools and possess product aesthetics and logical thinking, this is also the best of times.

Images are beginning to learn to think, and text is no longer the noise of pixels.

People may truly be only one step away from that visual singularity of "what you think is what you get."
