How Does Codex Use a Computer? Three Entry Points and Permission Boundaries

marsbit | Published 2026-06-21 | Updated 2026-06-21

Summary

This article explains the three primary methods for Codex to interact with a computer, each with distinct use cases, permission boundaries, and trust levels.

**1. Computer Use:** This offers the broadest access, allowing Codex to visually control and interact with the graphical user interface of authorized macOS/Windows apps, system settings, and even iOS simulators. It's ideal for tasks lacking APIs or structured tools, such as operating legacy software or multi-app workflows. However, it's the slowest method and has the widest permission scope, requiring careful supervision for sensitive actions.

**2. Chrome Extension:** This grants Codex access to the user's logged-in Chrome browser state, including cookies, profiles, and open tabs. It's best for tasks requiring user identity across websites like Gmail, LinkedIn, Salesforce, or internal dashboards. Its key advantage is multi-tab control for complex workflows. While more powerful for browser-based tasks than Computer Use, it carries higher sensitivity as actions are performed under the user's identity.

**3. In-App Browser:** This is a browser isolated within the Codex thread, separate from the user's personal browsing data. It excels in web development and debugging scenarios: previewing local servers, testing responsive layouts, or annotating designs directly on the page. Its isolation is a strength for development but a limitation for tasks requiring login sessions.

The core principle is to choose the narrowest, safest...

Editor's Note: This article outlines three entry points for Codex to operate external environments: Computer Use, Chrome Extension, and In-App Browser. While all three appear to address the problem of "letting Codex use a computer," they correspond to different task scenarios, permission boundaries, and levels of trust.

Among them, Computer Use has the broadest coverage, allowing direct operation of authorized native apps, system settings, iOS simulators on macOS/Windows, and even workflows spanning multiple apps. It's suitable for GUI processes without API, plugin, or structured tool support, but the trade-offs are slower speed and the widest permission boundaries. The Chrome Extension is suitable for tasks reliant on login state, cookies, multiple tabs, and browser identity, such as Gmail, LinkedIn, Salesforce, internal dashboards, or logged-in research across multiple websites. The In-App Browser leans more towards development and debugging scenarios, especially for local services, visual bugs, responsive layouts, and design annotations; it does not inherit the login state from the user's normal browser, has narrower capabilities, but also stronger isolation.

The core judgment of the article is that Codex does not have only one way to "use a computer." What's truly important is choosing the narrowest, safest, most structured operational interface based on the task. If a plugin or MCP can be used, visual control shouldn't be the first resort. For tasks only involving web development, prioritize the In-App Browser. When user browser identity and login state are needed, then switch to Chrome. Only when structured tools cannot cover it, and the task must rely on the desktop graphical interface, is Computer Use the last mile.

Appshots, however, are not a fourth way to control the computer, but a way to point Codex at the context currently on your screen. They solve the context input problem, while Browser, Chrome, and Computer Use solve the action problem. Viewed together, this layering reveals the key to AI Agent productization: not granting the model unlimited permissions, but continuously narrowing permissions, clarifying boundaries for specific tasks, and letting users retain the right to review critical actions.

The following is the original text:

Codex has three ways to use a computer: Computer Use, Chrome Extension, and In-App Browser.

There is some overlap between them, just enough to be confusing.

After reading this article, you'll know how to install and trigger these three methods, when to use each in different scenarios, how Appshots and Developer mode connect them, and what to write in AGENTS.md so Codex can choose the appropriate operational interface itself.

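The simple version is: use @Browser for the website you're developing, @Chrome for multi-tab and logged-in tasks, and @Computer for anything that can only be reached through a desktop GUI.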

That said, whenever possible, prioritize using plugins or MCP. For example, a Slack plugin can more precisely search a thread than clicking around in Slack; operations generated by a GitHub plugin are also easier to check than letting Codex drive the webpage. Visual control is best used where the capabilities of structured tools reach their limits.
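The priority order this article argues for (structured tools first, then the In-App Browser, then Chrome, then Computer Use) can be written down as a small decision function. This is only an illustrative sketch: the `Task` fields and the `choose_interface` name are hypothetical labels for the questions the article asks, not part of Codex.

```python
from dataclasses import dataclass
from enum import Enum


class Interface(Enum):
    STRUCTURED_TOOL = "plugin or MCP"
    IN_APP_BROWSER = "@Browser"
    CHROME = "@Chrome"
    COMPUTER_USE = "@Computer"


@dataclass
class Task:
    # Hypothetical fields; each one stands for a question from the article.
    has_structured_tool: bool = False  # a plugin or MCP server covers the task
    runs_in_browser: bool = False      # the whole task happens on web pages
    needs_login_state: bool = False    # depends on cookies, profiles, or open tabs


def choose_interface(task: Task) -> Interface:
    """Return the narrowest interface that can still complete the task."""
    if task.has_structured_tool:
        return Interface.STRUCTURED_TOOL
    if task.runs_in_browser and not task.needs_login_state:
        return Interface.IN_APP_BROWSER
    if task.runs_in_browser:
        return Interface.CHROME
    # Nothing narrower fits, so fall back to the desktop GUI.
    return Interface.COMPUTER_USE


if __name__ == "__main__":
    local_preview = Task(runs_in_browser=True)
    print(choose_interface(local_preview).value)
```

The checks are ordered from narrowest to widest permission scope, so Computer Use is reached only when every earlier option has been ruled out.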

Everything Can Be @Computer

Computer Use is the broadest among these three operational interfaces. It allows Codex to view and operate the graphical interface on macOS and Windows, including windows, menus, keyboard input, and the clipboard within apps you authorize.

It's also usually the slowest. Structured plugins can directly call APIs; Computer Use needs to observe the interface, decide where to click, wait for the app to respond, and then check the next state. This visual loop consumes time but also means Codex can operate apps that have no available API at all.
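To make the cost of that visual loop concrete, here is a toy simulation. `FakeApp` is an invented stand-in for a GUI application and has nothing to do with how Codex is actually implemented; the point is only that every action is bracketed by observations.

```python
from dataclasses import dataclass


@dataclass
class FakeApp:
    """Stand-in for a GUI app: a wizard that needs `steps_needed` clicks on Next."""
    steps_needed: int
    clicks: int = 0

    def screenshot(self) -> str:
        # A textual placeholder for what the agent would see on screen.
        return "done" if self.clicks >= self.steps_needed else "next-button-visible"

    def click(self, target: str) -> None:
        if target == "next-button":
            self.clicks += 1


def run_visual_loop(app: FakeApp, max_iterations: int = 20) -> int:
    """Observe, decide, act, then re-observe until the goal state is seen.

    Returns the number of observations that were needed.
    """
    observations = 0
    for _ in range(max_iterations):
        state = app.screenshot()   # observe the interface
        observations += 1
        if state == "done":        # check the resulting state
            return observations
        app.click("next-button")   # decide where to click, then act
    raise TimeoutError("goal state not reached")


if __name__ == "__main__":
    print(run_visual_loop(FakeApp(steps_needed=3)))  # prints 4
```

Three clicks cost four observations here, whereas a structured API call would skip the observing entirely. That extra looking is the price of being able to drive an app that exposes no API.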

On macOS, slow doesn't necessarily mean it will disturb you. Computer Use can operate your authorized apps in the background while you can still use other parts of the computer. Often, I open an app while using Codex only to find that Codex has already quietly completed a workflow in the background.

Depending on which apps you have installed and authorized on your computer, these operational targets can include Spotify, Xcode, System Settings, iOS Simulator, or even controlling your iPhone via iPhone Mirroring. It can also switch between multiple apps, handling workflows that span different applications.

Use it when a task relies on:

Native desktop applications, like Spotify or financial apps;

iOS Simulator, iPhone Mirroring, or other processes only operable via GUI;

System or application settings;

Data sources without plugins or APIs;

Workflows requiring switching between multiple apps;

The missing final step in a structured integration.

How to install: Open Codex's Settings > Computer Use, then click Install.

How to trigger: Mention @Computer, or explicitly ask Codex to use Computer Use. As model capabilities improve, it may also invoke it automatically when needed in the future.

Try a few examples first:

One of my favorite examples started when a package was stolen. Amazon told me it would take about 25 minutes to reach a customer service agent. I gave a Codex thread access to Computer Use and told it to check the chat window every five minutes, switch to checking every minute once an agent appeared, and do its best to get me a refund. When I returned from a shower, the refund had been completed.
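The polling instruction in that story is a simple two-speed schedule. The sketch below shows the logic only; the `agent_present` stream is a hypothetical input standing in for whatever Codex observes in the chat window.

```python
from typing import Iterable, Iterator

SLOW_INTERVAL_S = 5 * 60  # before an agent joins: check every five minutes
FAST_INTERVAL_S = 60      # after an agent joins: check every minute


def poll_schedule(agent_present: Iterable[bool]) -> Iterator[int]:
    """Yield the wait in seconds before the next check, one per observation.

    Once an agent has been seen, the schedule stays fast even if a later
    observation momentarily shows no agent.
    """
    seen_agent = False
    for present in agent_present:
        seen_agent = seen_agent or present
        yield FAST_INTERVAL_S if seen_agent else SLOW_INTERVAL_S


if __name__ == "__main__":
    print(list(poll_schedule([False, False, True, False])))
    # prints [300, 300, 60, 60]
```

Keeping the schedule fast after the first sighting is a design choice made for this sketch: a live conversation should not fall back to five-minute gaps because of one ambiguous screenshot.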

I also use Computer Use as the "last mile" in structured workflows. In one video launch, Codex could read feedback from Slack, modify code, and render a new video, but the Slack integration in that thread couldn't upload files. So Computer Use clicked Add file, filling in that missing step.

It also has the widest trust boundary among the three. Only give it one specific app or process at a time. Keep sensitive apps closed when they're not part of the task; carefully check permission prompts; for financial, account, payment, credential, privacy, and system security changes, it's best to supervise in person.

Use @Chrome for Multi-Tab and Login State Tasks

The Codex Chrome Extension gives Codex access to your already logged-in Chrome state. Use it when tasks depend on accounts, cookies, browser profiles, or tabs you have opened and authenticated.

This operational interface is suitable for work in tools like:

Gmail or LinkedIn;

Salesforce or customer service backends;

Internal dashboards;

Logged-in research across multiple websites;

Forms that rely on your account or browser extensions.

How to install: Open Codex's Plugins, add Chrome, and follow the setup process. Codex will guide you to install the Codex Chrome extension and approve Chrome permissions. When the extension shows Connected, start a new thread.

How to trigger: Mention @Chrome, or explicitly ask Codex to use your logged-in Chrome browser:

Chrome tasks run in tab groups, which helps keep tabs related to a specific Codex thread together. Unlike the In-App Browser, this interface carries your browser identity. This makes it more powerful and also more sensitive.

Another main advantage is multi-tab control. Chrome can associate multiple tabs with the same task, reading context in one page, cross-referencing information in another, and continuing the workflow in a third. Computer Use can also drive the browser visually, but Chrome treats the task as a browser workflow, not a series of screen coordinate operations.

Recently in a thread, I gave Codex an already open Strudel Composer tab and asked it to make the music more interesting. Chrome gave it the selected tab and the WebMCP tools exposed by that page. Codex examined the song structure, rewrote the harmony and the overall four-minute form, modified the tempo, saved the track, and let it play. It didn't need to visually hunt for every control in the interface because Chrome can combine tab context with the structured capabilities the page provides.

I also use it to run a long-running Twitter thread. The general instruction was:

The interesting part isn't that Codex can open Twitter, but that this thread can keep returning to the same logged-in working environment over time, connect what it finds to local files, and leave a result for me to review.

The trust boundary here is important. Websites may treat Codex's clicks, form submissions, and message sending as actions taken by you personally. Webpage content itself is also untrusted input. Clearly separate higher-consequence steps: research, navigation, and drafting can be automated; sending, posting, purchasing, or submitting require your review first.
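One way to picture that separation is a small gate that lets low-consequence steps run and holds the rest for review. The action names and the `plan_step` function are illustrative assumptions, not a Codex feature or setting.

```python
# Hypothetical action labels taken from the categories named in the article.
AUTOMATED = {"research", "navigate", "draft"}
NEEDS_REVIEW = {"send", "post", "purchase", "submit"}


def plan_step(action: str, approved_by_user: bool = False) -> str:
    """Return 'run' or 'hold-for-review' for a single step."""
    if action in AUTOMATED:
        return "run"
    if action in NEEDS_REVIEW:
        return "run" if approved_by_user else "hold-for-review"
    # Unknown actions take the cautious path by default.
    return "hold-for-review"


if __name__ == "__main__":
    for step in ["research", "draft", "send"]:
        print(step, "->", plan_step(step))
```

Defaulting unknown actions to review reflects the point that webpage content is untrusted input: anything the plan did not anticipate should wait for a human.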

If the entire task is completed within the browser, prioritize Chrome over Computer Use. Chrome has the browser-native context needed for this type of task without extending access to the entire desktop.

Use the In-App @Browser for the Website You're Developing

The In-App Browser is a browser that exists inside a Codex thread. You and Codex share the same rendered page, making it particularly suitable for building and debugging web applications.

I usually start here when dealing with:

Local development servers;

File-based preview pages;

Public pages that don't require login;

Reproducing visual bugs;

Checking responsive layouts;

Leaving design feedback on page elements.

Its most important constraint is isolation. The In-App Browser does not use your normal browser profile, cookies, extensions, login sessions, or existing tabs. When a task requires account identity, this is a limitation; but when a task doesn't need an account, this is a useful boundary.

How to set up: Open Codex's Plugins, add the Browser plugin and enable it.

How to trigger: Mention @Browser in the prompt, or explicitly ask Codex to use the In-App Browser:

This creates a tight feedback loop: Codex can edit code, operate the page, check rendering status, take screenshots, and then revalidate the same process after fixes.

My favorite part is annotation. When reviewing a local application, I can directly click on an element or select an area and leave a comment. Style controls also let me preview and give feedback on text, fonts, spacing, and color more precisely. I usually combine it with voice input and process guidance: I review the page, leave comments, and continue queuing more feedback while Codex processes the current batch. The page itself becomes the specification.

This is especially useful for design work. I often ask Codex to organize an idea, a research package, or a project status into a single-file index.html, then open it in the In-App Browser. Instead of trying to describe an entire design in another prompt, I can directly annotate on the real page: "This hierarchy is reversed," "Make this less card-like here," "These controls need more space," or "Use this font size ratio sitewide." Codex receives comments with relevant screenshots and element context, modifies the file, and reopens the same page for the next round.

This loop feels closer to working with a designer on the same canvas than exchanging screenshots and text descriptions.

The In-App Browser is also suitable as a starting point for hybrid workflows. In another thread, I opened an X post in the In-App Browser and asked Codex to investigate related discussions. The visible page helped it confirm which post I was referring to; then Codex switched to the Twitter CLI, retrieving 38 replies, including nested replies hidden from the browser view. This is the "use the narrowest operational interface" principle in practice: use the browser to confirm on-screen context, then use structured tools for deeper retrieval.

There are trade-offs here too. The isolation of the In-App Browser makes it a great development interface, but also means it's not suitable for handling Google logins, passkeys, or websites that rely on browser extensions. When identity is important, switch to Chrome.

Appshots

Appshot is not a fourth way for Codex to control the computer. It's a method to point Codex at the context in front of you.

On Mac, press the CMD key twice to capture the frontmost window. Codex will attach an image with all available text to the thread. You can take an Appshot of an error, an email, a design, a settings panel, or an unfamiliar form, and then directly say:

This is the mental model I find easiest to remember: Appshots are how you point at something on your computer; Browser, Chrome, and Computer Use are how Codex takes action.

Appshots are currently created via the Codex app on macOS. They capture the frontmost window, not the entire desktop. This makes it a useful way to provide focused context without granting control over that application.

How to Follow These Developments

These operational interfaces change quickly. If you want practical details rather than waiting for a massive release summary:

Follow Ari Weinstein (@AriX) for Computer Use and Appshots;

Follow James Sun (@JamesZmSun) for Browser-related content;

Follow Andrew Ambrosino (@ajambrosino) for Codex app releases and the broader desktop product narrative;

Follow OpenAI Developers (@OpenAIDevs) for broader Codex and OpenAI Platform news.

Related Questions

Q: What are the three ways Codex can operate on a computer, as mentioned in the article?

A: The three ways are: Computer Use, the Chrome extension, and the in-app Browser.

Q: According to the article, which method should be prioritized when structured tools (like plugins or MCP) are available?

A: When available, plugins or Model Context Protocol (MCP) should be prioritized over visual control methods. Structured tools provide more precise, faster, and easier-to-audit operations.

Q: What is the primary purpose of the Appshot feature, and is it considered a method for Codex to control the computer?

A: The primary purpose of Appshot is to provide context by allowing the user to point at something on their screen (capturing the frontmost window). It is not a method for Codex to control the computer; it solves the input/context problem, whereas Browser, Chrome, and Computer Use solve the action problem.

Q: In which scenario is the in-app Browser most suitable, and what is its key constraint?

A: The in-app Browser is most suitable for building and debugging web applications, such as working with local development servers, visual bugs, responsive layouts, and design annotations. Its key constraint is isolation: it does not inherit the user's normal browser profiles, cookies, extensions, or login sessions.

Q: What is the core judgment of the article regarding how Codex should interact with a computer?

A: The core judgment is that Codex does not have just one way to 'use the computer.' The important principle is to choose the narrowest, safest, and most structured interface for the specific task. Use plugins or MCP first, use the in-app Browser for web development, switch to Chrome for logged-in browser identity, and use Computer Use only as the last mile when structured tools cannot cover a task that depends on the desktop GUI.
