If you ask which AI product's growth has been the most remarkable in 2026, "Codex" undoubtedly ranks first.
Since January, the product's weekly active users have grown more than fivefold, a very steep growth curve. Its weekly active user base has now reached 5 million. Among them, the adoption rate among knowledge workers (non-developers) is more than 3 times that of the developer group.

Notably, a significant catalyst behind this steep growth was the release of the desktop App in February. The desktop version provided a dedicated, optimized interface, dramatically lowering the barrier to use and triggering explosive growth in Codex downloads and adoption.
The person driving these changes in product form is a figure who gets relatively little public attention: Andrew Ambrosino, Head of the Codex Desktop Application Team.
As the person directly responsible for the evolution of the Codex desktop product, he stands at the intersection of two rapidly converging worlds: one is the developer toolchain centered on writing code, and the other is the fast-expanding general-purpose AI entry point that reaches almost every kind of knowledge work. From product release cycles to changes in user behavior, and even to how the team internally redefines the boundaries between "design," "engineering," and "product," what he observes is often closer to the essence of this transformation than the growth data itself.
The following interview, from his perspective, deconstructs what Codex has changed, why it merged with ChatGPT, and what its future iteration directions are.

Video link: https://www.youtube.com/watch?v=P3KDebPTUrw
We have compiled parts of the interview; please refer to the original video for full details.
Implementation Got Cheaper, So What Got More Expensive?
A few years ago, product development followed one logic: implementation was expensive. So before writing code, you did a lot of upfront risk mitigation (writing documentation, conducting research, prototyping) while changes were still cheap at the design stage. Precisely because implementation itself was costly, you had to figure everything out beforehand.
Now that assumption has completely reversed. At OpenAI, the situation looks like this: people are given lots of tokens, everyone has great ideas, and so everyone is building things. As a result, for any one needed feature, perhaps 90 different teams are simultaneously exploring 90 different ways to implement it.
This means implementation is no longer the expensive part. So what got more expensive? Andrew was blunt: it's taste. More specifically, the curation process. When you face these 90 different attempts, you need the discernment to judge: which ones are good? How should these be folded into other features? How should this thing be framed? How many steps should this toggle button have? These decisions themselves are now the most expensive, most thought-intensive part.
What exactly is taste?
The word "taste" is overused in Silicon Valley. But for Andrew, it has a very specific meaning.
There's an interesting anecdote: Linear's Head of Product once argued that people overemphasize the aesthetic side of taste, citing Paul Graham as an example. Paul Graham obviously has great taste, yet he wears cargo pants. Taste, in other words, is far more than appearance. Andrew listed what taste covers: an aesthetic level, which is only one part; a systems-thinking level, meaning how a thing fits into the whole system; a sense-of-direction level, meaning what larger theme it belongs to; and a presentation level. There are also matters of detail, such as whether an interaction animation matches the meaning it is supposed to convey, or whether it is too fast to express the concept.

But the real core taste questions are like this: if we can build anything, then what do we want? What is this? How do we get there? These are the real taste questions.
It's not just about choosing what to do. It's also about how to present information, how to achieve goals, what medium to use. Taste is where the human mind remains most valuable in this new era.
Why Is AI Still Not Good at Design?
This is an interesting paradox: Codex is already very powerful at writing code, but when using it to generate design, the output quality is often mediocre. It's rare to say "wow, it nailed it completely".
Andrew thinks there are several reasons behind this. The first are practical. Design is harder to score than software, because the human taste that judges whether a design is good or bad is itself part of the feedback mechanism. This makes training models difficult: unlike code, design is hard to measure against objective standards (does the code compile? does the function work?). Second, from a research investment perspective, the lab has historically put the most resources into capabilities that accelerate AI research itself. In the early days of coding models, being able to write correct code obviously accelerated research. Better design capability does not accelerate AI research as directly.
Deeper issues involve the complexity of design work itself. There's a cultural aspect to design — what counts as "good design" is culturally determined. Last year, all new websites were copying Linear's design; that was truly good design, with taste. But if a model outputs something looking like Linear every time, that's not progress, it's failure. Design needs novelty, whereas software engineering is almost the opposite — you almost always want code to follow known patterns.
The hardest problem lies in the abstraction layer. When code drives visual design, the two interact deeply. For example, something in the top-left corner and something further down the page should share the same abstraction in the codebase. It is not just that the model needs to become a better designer; it also needs to understand these deeper structural relationships. If the company rebrands tomorrow, the shallow approach is to update 263 components one by one. The deeper understanding is that two seemingly different things are semantically the same: both are lists, share the same styles, and convey the same interaction patterns. This grasp of abstraction layers is still far out of reach for AI.
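The rebrand example can be made concrete with a minimal sketch. Everything here is hypothetical and not taken from the Codex codebase: the `ListStyle` class, the two component functions, and the color values are invented for illustration. The point is only that two visually separate components draw on one shared abstraction, so a rebrand touches one definition instead of every component.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ListStyle:
    """Hypothetical shared style for anything that is semantically a list."""
    accent_color: str
    row_height_px: int


def render_list(title: str, items: list[str], style: ListStyle) -> str:
    """Render any list-like component from the single shared abstraction."""
    header = f"[{title}] color={style.accent_color} row={style.row_height_px}px"
    rows = [f"  - {item}" for item in items]
    return "\n".join([header, *rows])


# Two components that look unrelated on screen but are both lists.
def sidebar_projects(style: ListStyle) -> str:
    return render_list("Projects", ["alpha", "beta"], style)


def recent_tasks(style: ListStyle) -> str:
    return render_list("Recent tasks", ["fix login bug"], style)


# Made-up brand values; a rebrand changes only the shared style.
brand = ListStyle(accent_color="#10a37f", row_height_px=32)
rebrand = replace(brand, accent_color="#ff6600")

print(sidebar_projects(rebrand))
print(recent_tasks(rebrand))
```

With this structure, both components pick up the new accent color from a single change. The shallow alternative would hard-code the color in each component and require editing every one of them.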
Why Couldn't Codex Be Released Earlier?
This is a very profound observation: a product's success depends not only on the design itself but also on the timing of model capabilities.
Andrew is very certain that if the Codex App had launched last November, it would have failed completely in the market. Yet the same product shape launched in February achieved huge success. The only variable was the improvement in model capability during those few months in between. In other words, the interaction design, user interface, the entire concept didn't change, but the increase in model intelligence completely changed the outcome.
This reveals a deep truth: in the AI era, whether a product is easy to use or valuable is not determined solely by UI design or interaction design, but by "what the model can do at this moment." The same idea, implemented with an old model, might be useless, but with a new model, it could be brilliant.
This also changes how product planning is done. Andrew saw this shift at his previous company. Planning is no longer "what do we plan to do for the whole year." It becomes: decide what you believe the model will be able to do and when, list everything you are interested in, prototype all of it, and then decide which ideas can ship now. The rest are set aside, and when the model makes a new leap, the shelved ideas are tried again with the upgraded model. The reason is that whether a feature works well depends less on the shape of the design than on whether the model is smart enough.
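As a rough illustration of that planning loop, here is a minimal sketch. It assumes, purely for the sake of the example, that each idea's readiness can be reduced to a single capability score; in practice the judgment would come from actually prototyping with the model. The `Idea` class, the `triage` function, the idea names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    required_capability: float  # hypothetical score the model must reach


def triage(ideas: list[Idea], model_capability: float) -> tuple[list[str], list[str]]:
    """Split ideas into those that work with today's model and those to shelve."""
    build_now = [i.name for i in ideas if i.required_capability <= model_capability]
    shelved = [i.name for i in ideas if i.required_capability > model_capability]
    return build_now, shelved


backlog = [
    Idea("inline code review", 0.5),
    Idea("overnight codebase cleanup", 0.8),
    Idea("spreadsheet modeling", 0.7),
]

# Triage against the current model, then again after a capability jump.
now, later = triage(backlog, model_capability=0.6)
after_leap, still_shelved = triage(backlog, model_capability=0.8)

print(now, later)
print(after_leap, still_shelved)
```

The same backlog is re-evaluated each time the model improves, which is the essential difference from a fixed annual roadmap.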
Have the Boundaries Between Engineers, Designers, and PMs Disappeared?
The host, Lenny, noted that Andrew's resume spans engineer, designer, product manager, and entrepreneur, and that he now runs the entire desktop App; he asked whether the design team also reports to him. Andrew laughed and said "depends on the week." Reporting lines keep changing, but the teams have always worked closely together and sit embedded with each other.
Andrew said the outside world is already discussing "role collapse," the idea that separate roles will no longer exist. His team has not reached that point, but the overlap between roles is indeed more pronounced than in other departments of the company, or even the industry as a whole. Part of the reason is that Codex is itself a technical product aimed at engineers: designers on the team can speak the engineers' language, and product managers can write code. Another product lead, Alexander, for example, has a master's degree in Computer Science, while Andrew himself does not.
He believes a more accurate description today is that a person is no longer defined by boundaries like "where design ends and engineering begins," but by what they spend most of their time doing. This is also related to the team's working style, because the entire App evolved through the team "eating its own dog food" internally. Everyone tries to get as much done inside the App as possible, even when it is not yet the best tool for the task, so that it can gradually become the best tool. The two also chatted about the origin of the title "member of technical staff." Andrew thinks it may have started at Xerox and is now considered a tradition at research-driven companies.
Lenny pressed further: does this mean everyone will become a generic "builder" with no distinct function? Will skill categories like PM, design, and engineering still exist? Andrew's stance is clear: he does not agree with abolishing role distinctions entirely. He has seen many companies proclaim "abolish product roles, everyone is a builder," with the result that the best practices and hard-won experience the product profession accumulated over years are thrown away on the notion that "I can write code too." He welcomes the disappearance of territorial "this isn't your turf" boundaries, but each profession still has its own skill threshold. Not just anyone who can use Excel can fill in for the finance department.
He also said that switching roles is indeed easier now than before, because ability is no longer rigidly tied to mastery of a specific tool. He himself long felt he should not be an engineer because he disliked delving into assembly language or memorizing TypeScript syntax, and that threshold of "you must master a specific tool to do well" is crumbling. However, he cautioned that outsiders currently overstate this trend.
The Most Cutting-Edge AI-Assisted Development Methods Today
Lenny stepped back a level: development has gone from purely manual coding, to AI being able to write 100% of the code, to a point where "writing code" means "guiding AI," and measuring how much code a person writes has almost become "how many times did you correct the AI's direction." He asked whether the most cutting-edge approach is now the "loop" (autonomous loop development), and how the most advanced AI teams actually operate today.
Andrew said the question "how much code is written by AI" is itself no longer important, because by last year's standards almost 100% of code is now written by AI. The real question is whether the code is written "supervised" or "unsupervised," which are two completely different things. He said he is happy to see the standard keep being reset, because it shows the product is moving forward. The team has done a lot of exploration in the direction of "autonomous software development," including many attempts related to "harness engineering," such as letting the model run on its own at night to do a "garbage collection" style cleanup of the codebase.
He also admitted that all current models share a flaw: a tendency to make code increasingly complex. He half-jokingly said that if any company's research team is listening, he hopes the model's ability to "delete code" can be trained better. This is also a practical problem, for both people and the codebase, when handing development over entirely to autopilot: how to teach the model to judge which features to implement, which to ignore, and which to merge and reclassify, and how to teach it to build the right abstractions. These abilities are improving, but he believes they are not yet at the level of "set a loop and let it improve the product by itself, while also monitoring Twitter, Slack, emails." The team keeps working in that direction.
Lenny pressed further: could there be a day when the team simply gives the AI an ultimate goal like "win" or "make me a billion dollars" and leaves it at that? Andrew laughed and said he would not dare speak in absolutes, and would not lightly assert either "never" or "definitely will."
Why Did Codex and ChatGPT Have to Merge?
Where Is Codex Heading in the Future?
Codex started as a command-line tool and later became a standalone App with a clear initial positioning as a "developer tool": it is not an IDE, and it lets you view code but not edit it.
Before the App's official public release, the team ran an internal trial at OpenAI (January to February). Feedback from engineering and research use was clear and very positive. But the team also discovered that people from almost every department, including marketing, PR, finance, and legal, were using the App too, even though it was not friendly to them: the interface was full of code and command-line permission requests, an experience not designed for them at all.
The team's initial response was to port Codex's capabilities into other product interfaces, like the ChatGPT desktop application and Atlas browser, making them more general-purpose knowledge work tools. But the result was that no one wanted to leave the Codex App to use those "specialized" Apps. This made the team realize: the boundary between developer tools and general-purpose knowledge tools is collapsing; Codex and ChatGPT are more like different entry points to the same capability, not two independent product categories.
The team's conclusion was: this product suite should be built as a sufficiently generic, extensible foundation, capable of simultaneously accommodating deep scenarios like finance, legal, science, etc. The real challenge lies only in "how to make it generic enough" — this is also the team's answer to the question "is Codex a developer tool, or is it just ChatGPT?".
Host Lenny thus pointed out: Codex has already become more useful, more fun than the ChatGPT App itself, users are flocking to it, so merging is an inevitable direction to avoid cognitive confusion.
Andrew responded with a laugh, saying some call this direction the "super app," and he somewhat regrets that the term was ever coined, because he has been surrounded by it daily ever since.
Lenny pressed: setting the "super app" label aside, is the core idea that users go to one place and can get everything done? Or is this not yet settled?
Andrew's answer is the concept of a "home base": a good home court where users can track all their pending tasks across different product interfaces. Some things users can complete entirely within the App; for others, the App is responsible for invoking and opening other applications. For example, the App can connect to Excel. It does have a built-in spreadsheet editor, but for someone at OpenAI working on multi-billion-dollar financing that requires complex financial modeling, the built-in editor may be far from sufficient. So the App talks directly to the Microsoft Excel add-in on the user's desktop, and once the work is done, the user can simply close Excel.
In other words, this has never been about "we draw a box on the screen, and everything must happen within this box," but rather — this thing should become a "home" for the user: you start work here, end work here, automate work here, and whatever tool is needed, it goes to invoke that tool.
To illustrate this, Andrew told a specific story. When the Codex App first launched, the team shot a batch of promotional videos. The editing of these videos fell to an internal photographer. As it turned out, the photographer used Codex to edit these videos from start to finish — this was one of the first moments the team truly realized "oh my god, people are actually using this for that sort of thing."
The photographer tried editing the videos with Codex purely out of curiosity, just to see whether it could actually do it. Codex is not a video editor at all, and its interface has no editing UI, but it understood that the photographer was using Premiere Pro, and it could perform some editing operations by directly modifying the Premiere Pro project files behind what is displayed on screen. That could not cover every need, however. So Codex next wrote itself an extension plugin that could be installed into Premiere Pro, then "talked" to Premiere Pro through that plugin: "hey, Premiere Pro extension, can you help me change this marker point?" The first time the team saw this happen, they found it incredible.
From this, Andrew distilled a model: the world already has a vast number of professional tools that are best-in-class in their respective fields. Codex, and now ChatGPT as well, wants to do two things simultaneously.
The first is how to seamlessly collaborate with the tools users already use: the team doesn't need to rebuild a better video editor from scratch, but instead, let Codex and ChatGPT learn to use existing tools — interact with them, hand off tasks to them, typically achieved through connectors, computer use capabilities, or as in the Premiere Pro case, through extension plugins.
The second thing is the kind of vision Dan Shipper mentioned: users already have a bunch of web applications they can click around in, but wish to open these applications directly within Codex, letting Codex do more things for them inside. These two modes are almost mirror images of each other, and the team is currently pushing forward aggressively on both fronts simultaneously.
This article is from the WeChat public account "Machine Heart" (ID:almosthuman2014), author: Machine Heart






