Editor's Note: This article is by Dominik Kundel of OpenAI's Developer Relations team and summarizes his experience with Codex's goal mode (/goal). It is less about a prompting technique than about a role shift in AI programming tools: Codex is no longer only a code assistant that answers single-turn instructions, but is becoming an agent that keeps working toward a clearly defined goal.
In /goal mode, what matters is not writing longer, more detailed requirements but giving Codex clear, verifiable exit criteria, such as "reduce deployment time by 30%," "reach 100% test parity," or "bring LCP below 2.5 seconds." Such metrics let Codex judge whether the task is complete and keep it from endless trial and error against a vague objective. Users also need to supply enough direction, tools, and a realistic environment so that Codex can measure progress and verify results, rather than settling for a solution that only looks plausible locally or under hypothetical conditions.
The article specifically warns that visual tasks can easily trap Codex in a quagmire of details. Instead of demanding "100% pixel-perfect replication," it is better to break visual objectives down into functional checklists, design system specifications, and measurable criteria. Long-running tasks that span hours or even days also need continuous tracking through commits, draft PRs, progress documents, Slack updates, or side chats, so that you do not end up with a pile of untraceable changes.
The article's value lies in reframing /goal as a mechanism for managing long-running tasks. When AI can work continuously for dozens or even hundreds of hours, the developer's core competency shifts as well: not just getting AI to generate code, but defining its goals, establishing how progress is measured, configuring the execution environment, and finally reviewing and reflecting on the result. In other words, AI programming is moving from "writing prompts" to "managing an engineering executor that works continuously."
The following is the original text:
We launched goal mode (or /goal) to help you keep Codex working toward a specific outcome. Once you set a goal, Codex keeps working until the goal is achieved, whether that takes a few hours or a few days. People have already had Codex work on a single goal for more than 120 consecutive hours.
Goal mode is extremely powerful. To get the most out of it, here are seven things to keep in mind when using /goal.
Set Clear, Verifiable Criteria
The prompt you enter when activating goal mode serves both as the initial prompt and, more importantly, as the exit criteria for this goal. After each round of work, Codex will check: has this goal been completed?
Your goal prompt therefore shouldn't be overly long. It should focus on one clear criterion: the conditions under which the goal counts as achieved.
In most cases, a good goal includes a specific numerical metric that the model can use to judge completion. For example:
"Reduce build and deployment time by 30%."
"Migrate this feature from TypeScript to Rust, achieving 100% test parity."
"Optimize the application scaffolding so that the Largest Contentful Paint (a metric measuring the speed of loading the main page content) in production is below 2.5 seconds."
A goal prompt doesn't always need to include a number, but numbers generally make the steps that follow easier.
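A numeric exit criterion such as the 30% build-time example can be written as a check that a script evaluates after each round of work. The following is a minimal sketch, not part of Codex: the function name `reduction_achieved` and the sample timings are assumptions made for illustration.

```python
def reduction_achieved(baseline_seconds: float,
                       current_seconds: float,
                       target_reduction: float = 0.30) -> bool:
    """Return True when the current time is at least `target_reduction`
    (a fraction, so 0.30 means 30%) below the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    reduction = (baseline_seconds - current_seconds) / baseline_seconds
    return reduction >= target_reduction


# Hypothetical measurements against a 600-second baseline build.
print(reduction_achieved(600, 400))  # True: roughly a 33% reduction
print(reduction_achieved(600, 450))  # False: only a 25% reduction
```

Because the check is a plain yes-or-no answer derived from measurements, it leaves little room for interpretation about whether the goal has been reached.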
If you're still unsure how to define the goal, or want to brainstorm the project with Codex first, you don't have to start the conversation in goal mode.
Codex can set its own goals. You can start with a normal conversation and, once you're ready for Codex to begin executing, have it set the goal based on the preceding discussion.
You can also edit the goal at any time: click the edit button in the Codex app, or use /goal again in the CLI.
Provide Guidance Whenever Possible
A prompt like "Reduce build and deployment time by 30%" sounds cool and might lead Codex to some creative solutions. But if you already have a rough idea of where the problem might lie, this kind of prompt could also send Codex down the wrong path.
So, whenever possible, it's best to tell Codex where to start investigating, which tools can be used to accomplish the goal, or give other hints to prevent it from heading in the wrong direction.
For example, my colleague @reach_vb did this in an experiment: he told Codex it could use the Chrome browser to access Google Colab and explained some acceptable constraints, such as allowing it to generate its own dataset when training models.
Similarly, if you want to shorten build times and already know where most of the time is being spent, it's best to point Codex to that area in the prompt first.
Another approach is to let Codex do some preliminary research in plan mode first and have it create a plan file to record potential solutions. Then, have your goal reference this plan.
Make Progress Measurable
If your goal is ambitious, or if Codex can approach it gradually in many different ways, it is important to give Codex the tools to measure progress.
For some tasks, such as optimizing build times or improving test coverage, this comes for free: Codex can usually already use the relevant tools or will naturally create them.
For other goals, it is better to brainstorm with Codex first about which tools would help judge progress, or to give it hints about how to confirm that it is moving toward the goal. Examples include a visual diff tool for comparing two screenshots, or an evaluation set for the agent you're debugging.
I once asked Codex to recreate some components based on a video, and Codex created a tool for itself to compare screenshots and check for differences. Later, it continuously iterated on this tool, adding different diff modes.
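To make the idea of a screenshot comparison tool concrete, here is a minimal sketch of one possible diff mode: the fraction of pixels that differ between two images. It is not the tool Codex built. Images are represented as nested lists of RGB tuples so that the example stays self-contained; a real tool would load the screenshots with an imaging library. The name `diff_ratio` and the `tolerance` parameter are assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def diff_ratio(a: Image, b: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels where any channel differs by more
    than `tolerance`. Both images must have identical dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    changed = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pixel_a, pixel_b)):
                changed += 1
    return changed / total


# Two hypothetical 2x2 "screenshots" that differ in one pixel.
white, black = (255, 255, 255), (0, 0, 0)
reference = [[white, white], [white, white]]
candidate = [[white, white], [white, black]]
print(diff_ratio(reference, candidate))  # 0.25
```

A single number like this gives Codex something to drive down across iterations, and the tolerance lets small rendering differences be ignored.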
Depending on the task, you also need to consider whether there are additional criteria that need to be measured or checked. Otherwise, Codex might think the task is complete, but from your perspective, it's not actually finished.
For example, Codex might directly crop the design reference image and embed it into the page to achieve "pixel-perfect" replication of a certain UI; or it might reduce the test coverage to make the test pass rate reach 100%. These are not the completion methods you actually want.
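One way to guard against the test-coverage shortcut is to pair the primary criterion with conditions that must not regress. The sketch below assumes you can collect a pass rate, a coverage figure, and a test count from your own tooling; `SuiteResult` and `goal_met` are hypothetical names, not part of Codex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteResult:
    pass_rate: float   # fraction of tests passing, 0.0 to 1.0
    coverage: float    # fraction of lines covered, 0.0 to 1.0
    test_count: int    # number of tests in the suite


def goal_met(baseline: SuiteResult, current: SuiteResult) -> bool:
    """Primary criterion: every test passes. Guard criteria: coverage
    and the number of tests must not fall below the baseline."""
    primary = current.pass_rate >= 1.0
    coverage_held = current.coverage >= baseline.coverage
    tests_kept = current.test_count >= baseline.test_count
    return primary and coverage_held and tests_kept


# Hypothetical numbers for illustration.
baseline = SuiteResult(pass_rate=0.92, coverage=0.81, test_count=240)
shortcut = SuiteResult(pass_rate=1.0, coverage=0.64, test_count=180)
genuine = SuiteResult(pass_rate=1.0, coverage=0.83, test_count=244)
print(goal_met(baseline, shortcut))  # False: tests were removed
print(goal_met(baseline, genuine))   # True
```

Stating the guard conditions in the goal itself lets Codex check them too, rather than leaving them for you to discover during review.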
Create a Realistic Environment
If you want Codex to make truly effective progress towards a goal, it needs to run in a sufficiently realistic environment.
In practice, this means that if you want to optimize deployment time or latency, Codex should have access to deployment and testing environments that mirror production as closely as possible: the same tech stack, the same configuration flags, and similar databases.
For example, we once worked on reducing build and deployment times for developers.openai.com. We were already using deployment previews, so Codex could deploy to these preview environments and read the related logs. The problem was that our preview deployments had some build paths disabled compared with the full production environment.
As a result, Codex ultimately had to deploy manually to environments configured more like production in order to examine the real issues.
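A mismatch like this between preview and production can be surfaced early by comparing the two configurations directly. The following is a minimal sketch under the assumption that each environment's settings can be exported as a flat dictionary; the flag names are invented for illustration and are not the actual configuration of developers.openai.com.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # placeholder for a key absent from one side


def config_drift(production: Dict[str, Any],
                 candidate: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map every key whose value differs to a pair of
    (production value, candidate value)."""
    drift = {}
    for key in sorted(set(production) | set(candidate)):
        prod_value = production.get(key, MISSING)
        cand_value = candidate.get(key, MISSING)
        if prod_value != cand_value:
            drift[key] = (prod_value, cand_value)
    return drift


# Hypothetical settings: the preview skips one build path.
production = {"BUILD_SEARCH_INDEX": True, "MINIFY": True, "DB": "postgres"}
preview = {"BUILD_SEARCH_INDEX": False, "MINIFY": True}
for key, (prod_value, preview_value) in config_drift(production, preview).items():
    print(f"{key}: production={prod_value!r} preview={preview_value!r}")
```

An empty result suggests that measurements taken in the candidate environment are more likely to carry over to production.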
Similarly, you can also let Codex use computer use (the ability for the model to operate real application interfaces) to test actual applications. To optimize some performance issues on iOS, @dimillian even used physical devices to obtain the most accurate testing environment.
Set Visual Goals Cautiously
Giving Codex a visual goal, like "100% pixel-perfect replication of this UI based on this image," is indeed tempting. But depending on the specific setup, it can also cause trouble.
If you don't provide proper guidance and constraints, Codex might get bogged down in certain details and neglect the overall goal. For example, if the reference image contains graphic elements that you expect Codex to generate, whether SVG icons or images, it might spend a lot of effort on precisely replicating those assets rather than on breaking the whole problem down correctly.
Additionally, Codex needs tools to perform visual comparisons correctly. That means more image input and higher overall token consumption, yet it doesn't necessarily give Codex a simple way to identify the improvements that actually matter.
Therefore, images are usually better suited as goal context, not the sole completion criterion. You should find other ways for Codex to judge whether the goal has been achieved, such as functional checklists, implementation specifications, compliance with design systems, etc.
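As one example of a non-visual completion check, compliance with a design system can be partly tested mechanically. The sketch below flags hex colors in a stylesheet that are not in an approved palette. It is a simplified illustration: the palette and CSS are invented, only 3-digit and 6-digit hex colors are matched, and short and long forms of the same color are not treated as equal.

```python
import re
from typing import List, Set

# Matches #rgb and #rrggbb only; other color notations are ignored.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def off_palette_colors(css: str, palette: Set[str]) -> List[str]:
    """Return the sorted hex colors used in `css` that are not in
    `palette`, compared case-insensitively."""
    allowed = {color.lower() for color in palette}
    found = {match.group(0).lower() for match in HEX_COLOR.finditer(css)}
    return sorted(found - allowed)


# Hypothetical design tokens and stylesheet.
palette = {"#FFFFFF", "#0a0a0a", "#abc"}
css = """
.card { color: #0A0A0A; background: #ffffff; }
.badge { border-color: #abc; background: #123456; }
"""
print(off_palette_colors(css, palette))  # ['#123456']
```

A goal can then require that this list be empty alongside a functional checklist, which gives Codex a clear signal without relying on screenshot comparisons alone.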
Track Progress
If Codex ends up working in the background for hours or even days, perhaps on another machine, it's easy to lose track of how far it has gotten and what work has already been done.
Depending on the goal, I've found the following methods helpful:
· Have Codex commit code at key milestones and push to a draft PR. This is especially useful when you're working on a website and have preview deployments.
· Have Codex keep a progress artifact up to date. It could be an HTML file you keep open in the in-app browser, a page deployed via Sites for the team to view, a rendered progress chart, or just a regular Markdown file.
· Instruct Codex to post progress updates proactively. You can write this into the goal itself: have Codex send an update to a Slack channel, or wherever else you log progress, whenever it reaches a significant milestone.
· Use another chat window to ask about status. If you just want a quick overview of the current state, you can run /side to start a new side chat and ask questions there. Because it forks from the current thread, it has all the context up to that point, but it is short-lived.
· Alternatively, in the Codex app, open a regular new chat and have Codex read the goal thread and answer your questions. This is especially powerful if you have Codex set up an automated task to check progress regularly.
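For the plain Markdown option among the methods above, the progress file can be as simple as an append-only log with one timestamped line per milestone. This is a minimal sketch of a helper you might ask Codex to call; the file layout and the names `format_entry` and `append_progress` are assumptions, not a Codex feature.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

HEADER = "# Goal progress log\n\n"


def format_entry(milestone: str, metric: str, when: datetime) -> str:
    """Format one Markdown bullet with a UTC timestamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"- {stamp}: {milestone} ({metric})\n"


def append_progress(path: Path, milestone: str, metric: str,
                    when: Optional[datetime] = None) -> str:
    """Append one entry to the log, writing the header first if the
    file is new or empty. Returns the entry that was written."""
    entry = format_entry(milestone, metric, when or datetime.now(timezone.utc))
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", encoding="utf-8") as handle:
        if is_new:
            handle.write(HEADER)
        handle.write(entry)
    return entry
```

Calling `append_progress(Path("PROGRESS.md"), "Enabled build cache", "build time 412s")` at each milestone produces a file that can be skimmed at any time, and committing it alongside the code ties each entry to a point in the history. The file name and the sample values here are illustrative.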
Clean Up and Finalize the Result
Great, the goal is finally completed! Can you now just toss the result to the team and call it a day?
Usually not. Especially for optimization tasks, I find it helpful to have Codex review and reflect on the work it has done. You can first run a local code review with /review, but it's also worth having Codex reflect more deeply: Which paths did it try in order to reach the goal? Which attempts were effective, and which were not? Then clean up the code accordingly.
Because Codex will keep working until the goal is reached, it might have tried methods that weren't good enough or even completely ineffective, and these leftover changes might still be in the final code.
Set a Goal for Your Next Task Too
Codex's goal feature is an incredibly powerful tool that can help you solve some of the most meaningful engineering challenges. But only when you provide the right environment and instructions can it reach the goal more efficiently.
What have you done with /goal?












