Why Did Codex and ChatGPT Merge? What's Next for Codex? OpenAI Core Leader Answers Everything

marsbit · Published on 2026-07-05 · Last updated on 2026-07-05

Abstract

In 2026, OpenAI's Codex saw explosive growth, with weekly active users surging over 5x to 5 million since January, driven largely by the February launch of its desktop app. Codex desktop lead Andrew Ambrosino explains key shifts behind its evolution. A core change is the inversion of development costs: implementation is now cheap, while curation and taste—judging which of many AI-generated prototypes is valuable—have become the new scarcities. Ambrosino defines taste as a blend of aesthetics, systems thinking, direction, and semantic coherence in interaction. He notes AI still struggles with design because evaluating it requires human cultural context and abstract reasoning about how components relate—capabilities beyond current models. Timing is critical: the same Codex app would have failed months earlier; success hinges on the model's capabilities at launch. Roles are blurring within his team, with engineers, designers, and PMs overlapping significantly. However, Ambrosino cautions against eliminating specialized roles entirely, as each field retains deep expertise. On AI-assisted development, the focus has shifted from measuring code written by AI to distinguishing between supervised and unsupervised generation. A current challenge is teaching models to simplify code, not just add complexity. The merger of Codex and ChatGPT stems from observed user behavior: non-developers adopted Codex for general knowledge work despite its developer-centric interface. This revealed...

If you ask which AI product's growth has been the most remarkable in 2026, "Codex" undoubtedly ranks first.

Since this past January, the product's weekly active users have grown more than fivefold, a very steep curve, and now stand at 5 million. Knowledge workers (non-developers) are adopting it at more than three times the rate of developers.

A major catalyst behind this growth was the February release of the desktop app. It provided a dedicated, optimized interface that sharply lowered the barrier to entry and set off a surge in Codex downloads and adoption.

The person driving the changes in product form behind this curve is a less publicly discussed figure: Andrew Ambrosino, head of the Codex desktop application team.

As the person directly responsible for how the Codex desktop product evolves, he stands where two rapidly converging worlds meet: the developer toolchain built around writing code, and a fast-growing, general-purpose AI entry point that reaches almost every knowledge-work scenario. From release cycles to shifts in user behavior, and even to how the team is redrawing the internal boundaries between "design," "engineering," and "product," what he observes often gets closer to the essence of this transformation than the growth numbers do.

Through his eyes, the interview below breaks down what Codex has changed, why it merged with ChatGPT, and what its next iterations will look like.

Video link: https://www.youtube.com/watch?v=P3KDebPTUrw

We have compiled parts of the interview; please refer to the original video for full details.

Implementation Got Cheaper, So What Got More Expensive?

A few years ago, product development rested on one assumption: implementation was expensive. So before writing any code, you did a lot of upfront risk reduction, such as writing documents, doing research, and building prototypes, because those steps were cheap compared with building the real thing. Precisely because implementation was costly, you had to figure everything out in advance.

That assumption has now completely flipped. At OpenAI, things look like this: give people plenty of tokens, and since everyone has good ideas, everyone builds. The result is that for any given feature, maybe 90 different teams are exploring 90 different ways to implement it at the same time.

So implementation is no longer the expensive part. What got more expensive? Andrew was blunt: taste, or more specifically, curation. Faced with 90 different attempts, you need the discernment to decide which ones are good, how they should be folded into other features, how the thing should be framed, and even how many steps a toggle should take. These decisions are now the most expensive and most thought-intensive part of the work.

What exactly is taste?

The word "taste" is overused in Silicon Valley. But for Andrew, it has a very specific meaning.

Andrew recalled an anecdote: Linear's head of product pushed back on someone who overemphasized the aesthetic side of taste by pointing to Paul Graham, who obviously has great taste but wears cargo pants. Taste is far more than appearance. Andrew broke it down into several layers: aesthetics, which is only one part; systems thinking, or how a piece fits into the whole; direction, or which larger theme it belongs to; and presentation. Then there are the details, such as whether an interaction animation matches the meaning it is meant to convey, or whether it moves too fast to express the concept.

But the truly central taste questions are these: if we can build anything, what do we actually want? What is this thing? How do we get there?

It's not just about choosing what to do. It's also about how to present information, how to achieve goals, what medium to use. Taste is where the human mind remains most valuable in this new era.

Why is AI Still Not Good at Design?

This is an interesting paradox: Codex is already very strong at writing code, yet the designs it generates are often mediocre. It rarely produces something that makes you say, "wow, it nailed it."

Andrew sees several reasons. The first is practical. Design is harder to score than software, because human taste, the very thing that judges whether a design is good, is itself part of the feedback loop. That makes models hard to train: unlike code, design cannot be measured against objective checks such as "does it compile?" or "does the function work?" The second concerns research priorities. The lab has historically put the most resources into capabilities that accelerate AI research itself. In the early days of coding models, writing correct code obviously sped up research; better design ability does far less to accelerate it.

The deeper issues lie in the nature of design work. Design has a cultural dimension: what counts as "good design" is culturally determined. Last year, seemingly every new website copied Linear's design, and that was genuinely good design with real taste. But if a model produced something that looked like Linear every time, that would be failure, not progress. Design needs novelty, whereas software engineering is almost the opposite: you nearly always want code to follow known patterns.

The hardest problem sits at the level of abstraction. When code drives visual design, the two are deeply intertwined. For example, an element in the top-left corner and another further down the page may need to share the same abstraction in the codebase. The model does not just need to become a better designer; it needs to grasp these deeper structural relationships. If the company rebrands tomorrow, the shallow approach is to update 263 components one by one. The deeper understanding recognizes that two seemingly different things are semantically the same: both are lists, share the same styles, and express the same interaction patterns. This kind of understanding of abstraction layers is still far beyond what AI can do.
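The rebrand example can be made concrete with a minimal sketch. Everything in it is hypothetical and only illustrates the idea: the `ListStyle` and `ListComponent` names, the component names, and the hex colors are assumptions, not code from Codex or OpenAI. The point is the structure: two lists in different places on screen reference one shared style definition, so a rebrand edits a single object instead of every component.

```python
from dataclasses import dataclass


@dataclass
class ListStyle:
    """Hypothetical shared style for everything that is semantically 'a list'."""
    accent_color: str
    row_padding_px: int


@dataclass
class ListComponent:
    """Hypothetical UI element that reads its look from a shared ListStyle."""
    name: str
    style: ListStyle  # a reference to the shared abstraction, not a copy

    def css(self) -> str:
        # Build an inline style string from the shared definition.
        return f"color: {self.style.accent_color}; padding: {self.style.row_padding_px}px"


# Two components in different places on screen share one style object.
LIST_STYLE = ListStyle(accent_color="#5E6AD2", row_padding_px=8)
sidebar_tasks = ListComponent("sidebar-tasks", LIST_STYLE)
recent_threads = ListComponent("recent-threads", LIST_STYLE)


def rebrand(style: ListStyle, new_color: str) -> None:
    """A rebrand touches the shared definition once, not each component."""
    style.accent_color = new_color


rebrand(LIST_STYLE, "#10A37F")
print(sidebar_tasks.css())   # color: #10A37F; padding: 8px
print(recent_threads.css())  # color: #10A37F; padding: 8px
```

If each component had instead copied its own color string, the same rebrand would mean finding and editing every copy, which is the "update 263 components one by one" approach.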

Why Couldn't Codex Be Released Earlier?

Andrew's answer points to something important: a product's success depends not only on its design but also on the timing of model capabilities.

Andrew is certain that if the Codex app had launched last November, it would have flopped. Yet the same product, launched in February, was a huge success. The only variable was how much the model improved in the months between. In other words, the interaction design, the user interface, and the overall concept stayed the same; the jump in model intelligence alone changed the outcome.

The lesson is that in the AI era, whether a product is usable or valuable is not decided by UI or interaction design alone, but by what the model can do at that moment. The same idea built on an older model might be useless; on a newer model, it could be brilliant.

This also changes how products are planned. Andrew saw the shift at his previous company: instead of asking "what do we plan to do this year," teams ask "what do we believe the model will be able to do, and when?" They list everything they are interested in, prototype all of it, decide which ideas work now, shelve the rest, and revisit the shelved ideas each time the model makes a new leap. Whether a feature works at all hinges less on the shape of the design than on whether the model is smart enough.

Have the Boundaries Between Engineers, Designers, and PMs Disappeared?

Lenny pointed to Andrew's resume (engineer, designer, product manager, entrepreneur), noted that he now runs the entire desktop app, and asked whether the design team also reports to him. Andrew laughed and said it "depends on the week": reporting lines keep shifting, but the team has always worked closely together, embedded side by side.

Andrew noted that people outside the company are already talking about "role collapse," the idea that separate roles will disappear. His team is not there yet, but the overlap between roles is more pronounced than in other parts of the company, or even across the industry. Part of the reason is that Codex is itself a technical product aimed at engineers: the designers on the team speak the engineers' language, and the product managers can write code. Another product lead, Alexander, holds a master's degree in computer science, for instance, while Andrew himself does not.

In his view, a more accurate description today is that people are defined not by where design ends and engineering begins, but by what they spend most of their time doing. This also reflects how the team works. The entire app evolved through internal dogfooding: everyone tries to get their work done inside the app as much as possible, even when it is not yet the best tool for the task, so that it gradually becomes the best tool. The two also discussed the origin of the title "member of technical staff." Andrew thinks it may have started at Xerox and has since become a tradition at research-driven companies.

Lenny pressed further: will everyone eventually become a generic "builder" with no distinct function? Will skill categories like PM, design, and engineering still exist? Andrew's position is clear: he does not support abolishing role distinctions altogether. He has watched many companies proclaim "no more product roles, everyone is a builder," only to throw away years of best practices and hard-won lessons from the product discipline on the grounds that "I can write code too." He welcomes the end of territorial "this isn't your turf" boundaries, but every profession still has its own skill threshold. Being able to use Excel does not mean you can stand in for the finance department.

He added that switching roles is genuinely easier than before, because capability is no longer rigidly tied to mastering a specific tool. For a long time he felt he should not be an engineer, because he disliked digging into assembly language or memorizing TypeScript syntax; that barrier of "master the tool to do the job well" is now crumbling. Still, he cautioned that the trend is currently overstated.

The Most Cutting-Edge AI-Assisted Development Methods Today

Lenny then took the conversation up a level. Development has moved from purely manual coding, to AI writing 100% of the code, to a point where "writing code" really means "guiding the AI," so a person's output is measured less by lines written than by how often they corrected the AI's course. He asked whether the cutting edge is now the "loop" (autonomous, looping development), and how the most advanced AI teams actually operate.

Andrew said the underlying issue is that "how much code is written by AI" no longer matters as a question: by last year's standards, almost 100% of code is now AI-written. The real question is whether the code was written supervised or unsupervised, which are two completely different things. He is glad to see the bar keep moving, because it shows the product is advancing. The team has explored "autonomous software development" extensively, including many experiments in "harness engineering," such as letting the model run overnight to do a "garbage collection" style cleanup of the codebase.

He also acknowledged a flaw shared by every current model: a tendency to make code steadily more complex. Half-jokingly, he said that if any company's research team is listening, he hopes they will train models to be better at deleting code. This is a real obstacle to putting development fully on autopilot, for both the people and the codebase: how do you teach a model to decide which features to build, which to ignore, and which to merge or reclassify, and how do you teach it to build the right abstractions? These abilities are improving, but in his view they are not yet at the point where you can "set a loop and let it improve the product by itself while also monitoring Twitter, Slack, and email." The team is steadily working toward that goal.

Lenny pushed further: could the team one day simply give the AI an ultimate goal like "win" or "make me a billion dollars" and leave it at that? Andrew laughed and said he would not speak in absolutes; he would not casually say "never" or "definitely."

Why Did Codex and ChatGPT Have to Merge?

Where Is Codex Heading in the Future?

Codex began as a command-line tool and later became a standalone app, initially positioned clearly as a "developer tool": not an IDE, since it lets you view code but not edit it.

Before the app's public release, the team ran an internal trial at OpenAI from January to February. Feedback from engineering and research was clear and very positive. But the team also noticed that people in almost every department, including marketing, PR, finance, and legal, were using the app too, even though it was not built for them: the interface was full of code and command-line permission requests.

The team's first response was to port Codex's capabilities into other product surfaces, such as the ChatGPT desktop app and the Atlas browser, turning them into more general-purpose knowledge-work tools. But nobody wanted to leave the Codex app for those "specialized" apps. That made the team realize that the boundary between developer tools and general knowledge tools is collapsing: Codex and ChatGPT are better understood as different entry points to the same capability than as two separate product categories.

The team concluded that the product suite should be built as a sufficiently generic, extensible foundation that can also accommodate deep domains such as finance, law, and science. The real challenge is simply how to make it generic enough, and that is the team's answer to the question "is Codex a developer tool, or is it just ChatGPT?"

Host Lenny observed that Codex had already become more useful and more fun than the ChatGPT app itself, users were flocking to it, and merging the two was therefore the natural way to avoid confusing people.

Andrew laughed and said some people call this direction the "super app." He rather wishes the term had never been uttered, since he has been surrounded by it every day ever since.

Lenny pressed on: setting the "super app" label aside, is the core idea that users go to one place and get everything done? Or is that not settled yet?

Andrew's answer was the idea of a "home base": a good home court where users can track all their pending tasks across different product surfaces. Some tasks can be completed entirely within the app; for others, the app invokes and opens other applications to finish the job. Excel is one example. The app does have a built-in spreadsheet editor, but for someone at OpenAI working on multi-billion-dollar financing with complex financial models, that editor may fall far short. So the app talks directly to the Microsoft Excel add-in on the user's desktop, and once the work is done, the user can simply close Excel.

In other words, the goal was never to "draw a box on the screen and make everything happen inside it." It is for the app to become the user's home: you start work here, finish work here, automate work here, and whatever tool a task needs, the app goes and uses it.

Andrew illustrated this with a story. When the Codex app first launched, the team shot a batch of promotional videos, and the editing fell to an in-house photographer. It turned out the photographer edited the videos from start to finish with Codex. That was one of the first moments the team truly thought, "oh my god, people are actually using this for that sort of thing."

The photographer tried it purely out of curiosity, to see whether Codex could pull it off. Codex is not a video editor and has no editing UI, but it recognized that the photographer was using Premiere Pro and could make some edits by directly modifying the project files behind what Premiere Pro displays on screen. That did not cover everything, so Codex went a step further: it wrote itself an extension that could be installed in Premiere Pro, then "talked" to Premiere Pro through it, as in, "hey, Premiere Pro extension, can you change this marker for me?" The first time the team saw this happen, they could hardly believe it.

From this, Andrew drew a general model. The world already has a vast number of professional tools that are best-in-class in their fields. Codex, which will now include ChatGPT, wants to do two things at once.

The first is to work seamlessly with the tools users already rely on. The team does not need to build a better video editor from scratch; instead, Codex and ChatGPT learn to use existing tools, interacting with them and handing tasks off to them, typically through connectors, computer use capabilities, or, as in the Premiere Pro case, extensions.

The second is the vision Dan Shipper has described: users already have plenty of web applications they click around in, and they want to open those applications directly inside Codex and let Codex do more for them there. The two modes are near mirror images of each other, and the team is pushing hard on both at once.

This article is from the WeChat public account "Machine Heart" (ID:almosthuman2014), author: Machine Heart


Related Questions

Q: According to Andrew Ambrosino, what has become the "most expensive part" of product development in the AI era?

A: According to Andrew Ambrosino, the most expensive part is "taste," specifically the curation process. With implementation now cheap, the critical skill is the discernment to judge which of many potential features, prototypes, or ideas are valuable, how they should be integrated, and how the product should be framed and shaped.

Q: Why does Andrew Ambrosino believe AI still struggles with design tasks compared to coding?

A: Andrew gives several reasons: 1) Design is harder to evaluate objectively than code (does it compile? does it work?), because human taste is part of the feedback loop, which makes models harder to train. 2) Research investment has historically prioritized capabilities that directly accelerate AI research itself, such as writing correct code. 3) Design has a cultural component: "good design" is culturally defined and needs novelty, unlike engineering, which usually follows established patterns. 4) The deepest challenge is the need to understand the structural and semantic relationships between code and visual elements at the level of abstraction, which AI currently lacks.

Q: What key factor does Andrew identify as the difference between a failed launch and a successful one for a product like Codex?

A: Andrew says the key variable is the underlying model's capability at launch. He is certain that if the same Codex app had launched in November of the previous year, it would have failed completely. Its success in February came solely from the significant improvement in model intelligence during the intervening months, not from changes to the product's UI or interaction design.

Q: What is the core concept behind the integration of Codex and ChatGPT, and what example is given to illustrate it?

A: The core concept is a "home base," a central hub from which users start, automate, and manage work. It is not about containing all work within a single interface, but about seamlessly connecting to and orchestrating specialized tools. The example is an internal photographer who used Codex to edit a video. Codex, which is not a video editor, wrote a custom extension to interface with Adobe Premiere Pro, allowing it to direct tasks and make edits through the professional tool.

Q: What future direction for AI-assisted development does Andrew highlight, and what specific capability does he wish AI models had more of?

A: Andrew discusses the move toward more "unsupervised" code generation and the exploration of "loop," or autonomous, development cycles. The capability he most wishes models had is the ability to effectively delete code and simplify codebases. He notes that models tend to make code increasingly complex, and that improving their ability to clean up, refactor, and build the right abstractions is a crucial challenge.
