OpenAI’s Agents SDK update: harness vs sandbox for long runs
OpenAI’s updated Agents SDK pairs a model-native harness with native sandboxes. Here is the harness/compute split, when you need a sandbox, and what it changes for security and durability.

OpenAI updated the Agents SDK with two pieces that matter once you stop demoing agents and start operating them: a model-native harness and native sandbox execution for long-horizon work. The harness is meant to standardize how agents inspect files, run tools, and keep going across many steps, while the sandbox gives the agent a controlled Unix-like workspace to do that work safely. (Source: OpenAI announcement)
The important idea is not the marketing label. It is the boundary: harness as control plane and sandbox as compute plane. OpenAI’s docs and community post spell out this split directly, and it is a useful way to reason about security, durability, and scale. (Sources: OpenAI docs, community post)
What shipped (in one minute)
From OpenAI’s announcement, the update includes:
- A more capable harness with configurable memory, sandbox-aware orchestration, Codex-like filesystem tooling, and standard integrations with MCP, skills, AGENTS.md, the shell tool, and apply_patch. (Source: OpenAI announcement)
- Native sandbox support, so agents can run inside controlled environments with the files, tools, and dependencies they need. (Sources: OpenAI announcement, OpenAI docs)
- A Manifest abstraction to describe a portable workspace layout, including local file staging and mounts for storage providers such as S3, GCS, Azure Blob Storage, and Cloudflare R2. (Sources: OpenAI announcement, OpenAI docs)
- Built-in support for sandbox providers including Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. (Source: OpenAI announcement)
- Python first, with TypeScript support planned later. (Source: OpenAI announcement)
The mental model: harness vs sandbox
The OpenAI docs describe the key split as the boundary between harness and compute:
- Harness (control plane): the agent loop, tool routing, approvals, tracing, recovery, and run state. (Sources: OpenAI docs, community post)
- Sandbox (compute plane): the execution layer where model-directed work reads and writes files, runs commands, installs packages, uses mounted data, exposes ports, and snapshots state. (Sources: OpenAI docs, community post)
This gives you a clear architecture question for every capability you add:
- If it affects trust, policy, or auditability, it belongs in the harness.
- If it affects filesystem state, dependencies, or artifact generation, it belongs in the sandbox.
OpenAI notes you can run the harness inside the sandbox for convenience, but doing so changes the trust boundary: orchestration and model-directed execution then share the same compute environment. (Source: OpenAI docs)
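The two questions above can be made mechanical. Here is a minimal routing sketch, entirely our own illustration rather than SDK API, that classifies a capability by the plane it belongs to:

```python
# Illustrative routing check (our own sketch, not SDK API): classify a
# capability by the plane it belongs to, using the two questions above.
CONTROL_PLANE_CONCERNS = {"trust", "policy", "auditability"}
COMPUTE_PLANE_CONCERNS = {"filesystem", "dependencies", "artifacts"}

def plane_for(concerns: set) -> str:
    if concerns & CONTROL_PLANE_CONCERNS:
        return "harness"   # approvals, tracing, run state
    if concerns & COMPUTE_PLANE_CONCERNS:
        return "sandbox"   # files, packages, generated artifacts
    return "harness"       # default to the trusted side when unsure

print(plane_for({"policy"}))      # harness
print(plane_for({"filesystem"}))  # sandbox
```

Defaulting to the harness for unclassified capabilities keeps ambiguous features on the trusted side until you decide otherwise.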
When do you actually need a sandbox?
OpenAI’s sandbox guide frames it as: use a sandbox when the answer depends on work done in a workspace, not only reasoning over prompt context. (Source: OpenAI docs)
Use this as a practical decision filter.
You probably need a sandbox if
- The agent needs a directory of documents, not a single pasted excerpt.
- The agent should write files you plan to inspect, diff, or ship.
- The agent needs commands and packages to complete the work.
- The workflow produces artifacts like Markdown, CSV, JSONL, screenshots, or a runnable preview.
- The run pauses for review and must later resume with the same working state. (Source: OpenAI docs)
You probably do not need a sandbox if
- You want a short response with no persistent workspace.
- Shell access is an occasional tool call, not a core part of the product’s execution model. (Source: OpenAI docs)
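The checklist above reduces to a small predicate. This helper is our own (not part of the SDK) and just encodes the decision filter per workflow:

```python
# Hypothetical helper (our own, not part of the SDK): encode the
# decision filter above as a predicate applied per workflow.
def needs_sandbox(*, reads_file_tree: bool = False,
                  writes_artifacts: bool = False,
                  runs_commands: bool = False,
                  resumes_with_state: bool = False) -> bool:
    # Any "work done in a workspace" signal means a sandbox is warranted.
    return any([reads_file_tree, writes_artifacts,
                runs_commands, resumes_with_state])

print(needs_sandbox(runs_commands=True))  # True
print(needs_sandbox())                    # False: reasoning over context only
```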
Callout: The sandbox is not just "shell access"
The docs emphasize sandboxes as an isolated environment with filesystem, shell, installed packages, mounted data, exposed ports, and snapshots. If your product needs those properties, a sandbox is a design decision, not a convenience feature. (Source: OpenAI docs)
Why the harness/compute split matters in production
OpenAI motivates the split with three operational needs: security, durability, and scale. (Source: OpenAI announcement)
Security: assume prompt injection and exfil attempts
OpenAI’s announcement explicitly says agent systems should be designed assuming prompt injection and exfiltration attempts. (Source: OpenAI announcement)
Separating harness and compute helps because it keeps credentials and orchestration out of the environment where model-generated code executes. The docs put it plainly: your application can keep trusted control plane work in your infrastructure, while the sandbox focuses on provider-specific execution and stateful workspace changes. (Source: OpenAI docs)
This does not eliminate risk. It makes it easier to build safer defaults:
- Narrow what gets mounted into the sandbox.
- Keep auth and sensitive tokens out of the sandbox boundary when possible.
- Put approvals and auditing in the harness, where you can enforce them consistently.
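Those defaults can also be made mechanical. A sketch of a deny-by-default mount policy (illustrative, with an assumed `inputs/`/`outputs/` layout; not SDK API):

```python
# Illustrative mount policy (not SDK API): deny by default, with an
# explicit allowlist and a blocklist for secret-looking names.
ALLOWED_PREFIXES = ("inputs/", "outputs/")           # assumed layout
DENY_SUBSTRINGS = ("secret", "token", ".env", "credential")

def mount_allowed(path: str) -> bool:
    lowered = path.lower()
    if any(bad in lowered for bad in DENY_SUBSTRINGS):
        return False  # never mount anything that looks like auth material
    return lowered.startswith(ALLOWED_PREFIXES)

print(mount_allowed("inputs/docs/report.md"))  # True
print(mount_allowed("inputs/.env"))            # False
print(mount_allowed("/home/user/anything"))    # False
```

A deny-by-default policy like this lives in the harness, so the model cannot negotiate its way around it from inside the sandbox.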
Durability: long runs should survive container loss
Long-horizon work fails for mundane reasons: timeouts, crashes, paused workflows. The community post and OpenAI announcement describe snapshotting and rehydration so an agent can restore a session in a fresh sandbox and continue from its last saved state. (Sources: OpenAI announcement, community post)
If you have ever watched a multi-step agent lose a half-finished work tree because a container died, you already know why this matters.
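The shape of that recovery loop is easy to sketch. Every interface here is hypothetical (the real SDK calls will differ); the point is the pattern: persist a snapshot after each completed step, and on container loss rehydrate a fresh sandbox from the last snapshot and continue:

```python
# Illustrative recovery loop (hypothetical interfaces, not the SDK's):
# snapshot after each step; on container loss, start a fresh sandbox,
# rehydrate from the last snapshot, and resume at the failed step.

class ContainerLost(Exception):
    """Raised when the underlying container dies mid-run (hypothetical)."""

def run_with_recovery(steps, start_sandbox, restore_snapshot):
    snapshot = None
    i = 0
    while i < len(steps):
        sandbox = start_sandbox()
        if snapshot is not None:
            restore_snapshot(sandbox, snapshot)  # rehydrate saved state
        try:
            while i < len(steps):
                steps[i](sandbox)
                snapshot = sandbox.snapshot()    # hypothetical save point
                i += 1
        except ContainerLost:
            continue  # fresh sandbox next iteration, resume at step i
    return snapshot
```

The run state (`i`, `snapshot`) lives in the harness; only the workspace lives in the sandbox, which is exactly why the sandbox is replaceable.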
Scale: isolate work and parallelize intentionally
OpenAI describes runs that can use one sandbox or many, invoke sandboxes only when needed, route subagents to isolated environments, and parallelize work across containers. (Source: OpenAI announcement)
The key benefit is not raw speed. It is controlling blast radius and resource usage. You can allocate heavier execution to sandboxes without turning your harness into a fleet of long-running containers.
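A minimal fan-out sketch, under the assumption that `new_sandbox` and `run_subagent` are your own wrappers (not SDK calls): each subtask gets its own isolated environment, and the worker pool caps resource usage:

```python
# Sketch of intentional parallelism (illustrative wrappers, not SDK API):
# fan subtasks out to isolated sandboxes so one subagent's workspace
# cannot touch another's, while max_workers bounds concurrent containers.
from concurrent.futures import ThreadPoolExecutor

def run_in_isolation(subtasks, new_sandbox, run_subagent, max_workers=4):
    def one(task):
        sandbox = new_sandbox()           # isolated compute per subtask
        return run_subagent(task, sandbox)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, subtasks))  # results in subtask order
```

`max_workers` is the blast-radius and cost knob: it bounds how many containers are live at once regardless of how many subtasks the run produces.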
What the Manifest unlocks
The Manifest is a portable contract for a fresh sandbox workspace. It can stage local files, clone repos, create output directories, and mount storage. (Sources: OpenAI docs, community post)
Two practitioner details are easy to miss:
- Workspace-relative paths: manifest entries cannot be absolute paths or escape the workspace with `..`, which makes the same manifest portable across local and hosted clients. (Source: OpenAI docs)
- Permissions map to Unix semantics: the community post describes defining read-only vs write access per directory using familiar Unix permission ideas, so you can give a model read access to inputs while keeping writes constrained to outputs. (Source: community post)
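The workspace-relative rule is simple to check yourself. This is our own validator sketch, not the SDK's: reject absolute paths and anything that escapes the workspace via `..` after normalization:

```python
# Minimal check of the workspace-relative rule (illustrative, not the
# SDK's validator): no absolute paths, no escaping the workspace via "..".
import posixpath

def is_workspace_relative(path: str) -> bool:
    if posixpath.isabs(path):
        return False
    # Normalize first, so "a/../../x" is caught even though it starts
    # with an innocent-looking segment.
    normalized = posixpath.normpath(path)
    return normalized != ".." and not normalized.startswith("../")

print(is_workspace_relative("inputs/docs/a.md"))  # True
print(is_workspace_relative("/etc/passwd"))       # False
print(is_workspace_relative("a/../../escape"))    # False
```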
This changes the default workflow: instead of stuffing documents into context, you mount them and let the agent operate on the dataset in place, citing files directly.
Adoption notes
- Sandbox agents are currently available in the Python Agents SDK, and OpenAI says TypeScript support is planned. (Sources: OpenAI announcement, OpenAI docs)
- If you adopt this, start by writing down your boundaries: what stays in the harness (approvals, tracing, run state) and what is allowed in the sandbox (paths, mounts, commands). Treat this as a security boundary, not a developer convenience.
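One way to write those boundaries down is as a reviewable policy object. The shape here is our own, not an SDK schema, but it makes the boundary explicit and enforceable from the harness:

```python
# A reviewable boundary policy (the shape is our own, not an SDK schema):
# name what stays in the harness and what the sandbox may touch.
BOUNDARIES = {
    "harness": ["approvals", "tracing", "run_state"],
    "sandbox": {
        "readable_paths": ["inputs/"],        # assumed workspace layout
        "writable_paths": ["outputs/"],
        "allowed_commands": ["python", "pip"],
    },
}

def command_allowed(cmd: str) -> bool:
    # Check only the executable name; argument policy would go further.
    return cmd.split()[0] in BOUNDARIES["sandbox"]["allowed_commands"]

print(command_allowed("pip install requests"))  # True
print(command_allowed("curl https://evil"))     # False
```

Because the policy is data, it can be diffed in code review and enforced in one place rather than scattered across tool handlers.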
TL;DR
- The Agents SDK update adds a model-native harness plus native sandbox execution for long-horizon tool runs. (Source: OpenAI announcement)
- Use sandboxes when work depends on a real workspace: files, commands, artifacts, and resumable state. (Source: OpenAI docs)
- The harness/compute split is the architectural win: it helps you keep orchestration and credentials out of model-directed execution where possible, and improves durability through snapshotting and rehydration. (Sources: OpenAI announcement, OpenAI docs, community post)
AgenticWire Desk
Editorial