Self-Hosted Sandboxes Orchestration Dependency Explained

Anthropic’s new self-hosted sandboxes for Claude Managed Agents promise on-premise control. But the orchestration layer—the part that actually decides what your agent does—stays on Anthropic’s servers. That architectural split is the real constraint nobody’s naming. The self-hosted sandboxes orchestration dependency means companies believe they are gaining infrastructure sovereignty while silently accepting a hard external availability dependency on every agent task they run. This article maps the split precisely and tells you what to do about it before you build something you can’t fail safely.

Table of Contents

What Does ‘Self-Hosted’ Actually Mean in Claude Managed Agents?
What Happens When Anthropic’s API Goes Down?
Should You Rely on Claude Managed Agents for Mission-Critical Automation?
How Does This Compare to Fully On-Premise Agent Frameworks?
What You Need to Know Before Migrating to Self-Hosted Sandboxes
What Self-Hosted Sandboxes Orchestration Dependency Means for Your Stack
FAQ

What Does ‘Self-Hosted’ Actually Mean in Claude Managed Agents?

When Anthropic says “self-hosted sandboxes,” it means exactly one thing: tool execution moves to infrastructure you control. Your files stay in your environment. Your network policies apply. Your audit logs capture the tool calls. According to The Decoder’s reporting on Anthropic’s May 2026 update, companies can choose their own CPU, memory, and runtime image, or delegate that to managed sandbox providers like Cloudflare, Daytona, Modal, or Vercel.

What it does not mean: the agent deciding what to do next runs anywhere near your servers.

Anthropic is explicit about this. Per the same source: “Agent orchestration—context management, error handling, and the actual agent loop—stays on Anthropic’s infrastructure. A fully on-premise deployment of the agents isn’t possible.” That sentence sits quietly in a bullet point in the key takeaways. Most coverage moved past it to discuss the sandbox capabilities.

Here is the operational picture. Imagine a simple three-step agent task: read a file, analyze its contents, write a report.

Step 1 (Read file): Tool call executes in your self-hosted sandbox. File bytes never leave your environment.
Step 2 (Analyze): The file contents are sent to Anthropic’s servers as part of the context window. The model reasons there. The decision about what to do next originates there.
Step 3 (Write report): Another tool call hits your sandbox—but only because Anthropic’s orchestration layer issued the instruction.

Your data may stay local during tool execution. Your agent’s cognitive loop does not. The self-hosted sandboxes orchestration dependency is architectural, not configurable. There is no setting to flip that moves orchestration on-premise.

This distinction matters enormously for how you model failure. The sandbox is stateless infrastructure. The orchestration loop is the thing your automation actually depends on. Losing the sandbox stops one tool call. Losing orchestration stops the agent entirely—it cannot decide to retry, cannot issue the next command, cannot complete any in-flight task.

For teams evaluating Claude Managed Agents for AI automation tools in regulated or high-availability environments, this is the first question to answer before anything else.

What Happens When Anthropic’s API Goes Down?

This is the section every announcement post skipped. Let’s be direct about the failure mode.

Your self-hosted sandbox keeps running. The container is healthy. Your network is fine. Your internal MCP tunnel is alive. None of that matters because without Anthropic’s orchestration layer, the agent has no brain. It cannot evaluate tool outputs, cannot chain the next action, cannot determine whether the task succeeded or failed. It simply stops.

This is qualitatively different from a typical microservice dependency. When your payment processor goes down, your checkout fails but your catalog still loads. When Anthropic’s orchestration goes down, your entire agent workflow halts mid-task—potentially mid-write, mid-transaction, or mid-pipeline.

The MCP tunnels feature, also introduced in the same May 2026 update and currently only in research preview requiring explicit access approval, creates a second dependency vector. The tunnel connects your private internal services to Anthropic’s infrastructure via an outbound encrypted connection. When that connection breaks—whether from Anthropic’s side or a network interruption—your agent loses access to the internal APIs it was using as tools, compounding the failure.

There are three failure scenarios worth modeling explicitly:

Anthropic API outage (full): All in-flight agent tasks halt immediately. No graceful degradation. No local fallback. Recovery requires the outage to resolve and tasks to be restarted manually or via your own retry logic wrapping the API.
Anthropic API degradation (partial/slow): Agent tasks proceed but with unpredictable latency. Multi-step workflows that have wall-clock timeouts can fail at the orchestration wait step even when individual tool calls succeed.
MCP tunnel disconnection: Agent continues reasoning on Anthropic’s side but tool calls to internal services fail. Depending on how the agent handles tool errors, it may hallucinate completion, retry indefinitely, or surface an error upstream.

Anthropic’s own documentation notes that both self-hosted sandboxes and MCP tunnels are early-stage. Self-hosted sandboxes are public beta. MCP tunnels are research preview. Betting a critical automation pipeline on a research preview feature with a cloud-hosted brain is a risk profile that deserves explicit sign-off, not just a checkbox in a vendor evaluation form.

For a broader look at how AI vendor dependencies affect production systems, Anthropic’s own Claude Managed Agents update post describes the current scope but does not address offline or degraded-mode operation.

Should You Rely on Claude Managed Agents for Mission-Critical Automation?

The honest answer is no for three specific cases: any workflow with an external SLA, any compliance regime requiring full context data residency, and any pipeline where a mid-task halt creates a state your engineers cannot deterministically roll back.

Here is a practical decision framework. Ask four questions in order. If you hit a hard stop, the answer is no—at least for now.

Can this workflow tolerate a complete halt of unknown duration? Anthropic does not publish SLAs for the Claude API’s orchestration tier. If your answer is “no,” Claude Managed Agents cannot be the primary execution path for this workflow in its current form.
Does your security team’s definition of ‘self-hosted’ require that no task metadata or context leaves your perimeter? Because it does leave your perimeter—it goes to Anthropic’s orchestration layer on every agent step. Files may stay local during tool execution, but the reasoning context is processed externally. If your compliance posture requires data residency for the full agent context, this is a hard blocker.
Is the workflow stateful across multiple tool calls? Single-step tool executions are relatively safe to retry. Multi-step stateful workflows—database migrations, multi-file refactors, sequential API chains—are the ones where a mid-task orchestration failure creates the worst cleanup burden. The more steps, the higher the risk.
Are you building for today or for six months from now? Both self-hosted sandboxes and MCP tunnels are pre-GA. Features in research preview have changed shape before reaching general availability. Building a production dependency on a research preview is betting on a roadmap, not a product.

Where Claude Managed Agents makes clear sense right now:

Internal tooling workflows where a few hours of downtime per quarter is tolerable
Batch processing jobs that run during business hours and have human oversight at each stage
Security research workflows—notably, Cloudflare ran Mythos Preview through a multi-agent harness scanning 50+ repositories precisely because human reviewers were in the loop to validate findings
Prototyping and evaluation before committing to a fully on-premise architecture

Where it is a poor fit right now: 24/7 autonomous pipelines, workflows with hard SLA commitments to external parties, regulated industries where the full data path must be auditable and contained, and any scenario where your on-call engineer cannot afford a 3am page that reads “agent halted—Anthropic API unavailable.”

How Does This Compare to Fully On-Premise Agent Frameworks?

The self-hosted sandboxes orchestration dependency looks very different once you benchmark it against frameworks that were designed for local or private-cloud deployment from the start. The table below compares Claude Managed Agents to three common alternatives across the dimensions that matter for operational reliability and compliance.

Dimension	Claude Managed Agents (Self-Hosted Sandboxes)	LangChain / LangGraph (Local LLM)	LlamaIndex Workflows (Local LLM)	CrewAI (Local Deployment)
Orchestration location	Anthropic cloud (mandatory)	Your infrastructure	Your infrastructure	Your infrastructure
Tool execution location	Your infrastructure (sandbox) or managed provider	Your infrastructure	Your infrastructure	Your infrastructure
Offline / air-gapped operation	No — requires Anthropic API connectivity	Yes — with local model (e.g., Ollama)	Yes — with local model	Yes — with local model
Data residency (full agent context)	No — context processed on Anthropic servers	Yes	Yes	Yes
Vendor availability dependency	Hard dependency on Anthropic uptime	None (local model) or soft (API model)	None (local model) or soft (API model)	None (local model) or soft (API model)
Model quality at frontier capability	High — Claude frontier models	Variable — depends on chosen model	Variable — depends on chosen model	Variable — depends on chosen model
Setup complexity for teams without ML infra	Low — managed orchestration handles the hard parts	High — requires model serving infrastructure	Medium-High	Medium
Production readiness (as of May 2026)	Beta (sandboxes) / Research preview (MCP tunnels)	GA with extensive production use	GA with production use	GA with production use
Compliance auditability	Partial — tool calls auditable locally, orchestration is not	Full — all components under your control	Full	Full

The tradeoff is stark. Claude Managed Agents wins on setup simplicity and frontier model quality. Every alternative wins on operational independence. The question is not which framework is better—it is which failure mode your organization can actually tolerate.

Teams that have run LangGraph or CrewAI in production know that “fully on-premise” has its own costs: model hosting infrastructure, prompt engineering for smaller local models, and the operational burden of maintaining your own orchestration code. None of those costs are zero. But they are costs you control. The Anthropic API outage is a cost you do not control and cannot mitigate locally.

What You Need to Know Before Migrating to Self-Hosted Sandboxes

Before you move any production workflow to Claude Managed Agents’ self-hosted sandbox setup, work through this checklist. Each item corresponds to a real failure mode, not a theoretical one.

Audit your data classification for agent context, not just tool outputs. Your security team may have signed off on the idea that “files don’t leave the perimeter.” That is true for file bytes during tool execution. It is not true for the reasoning context, which includes file summaries, extracted content, intermediate analysis, and error messages. Before deployment, run a data classification exercise specifically on what gets included in the Claude API request payload—not just what the sandbox touches.
Map every workflow step that requires orchestration continuity. For each agent workflow you are considering, draw a sequence diagram showing which steps require the Anthropic API to be available. Any workflow with more than one tool call has this dependency on every transition. Mark the steps where a mid-task halt creates data inconsistency risk.
Build and test a graceful degradation path before going to production. Wrap your Claude Managed Agent calls in retry logic with exponential backoff and a circuit breaker. Define what “graceful halt” means for each workflow—which state gets saved, which gets rolled back, and who gets notified. Test the circuit breaker by simulating an API timeout before your first production deployment.
Request MCP tunnel access only if you have a clear operational owner for the tunnel connection. MCP tunnels are research preview. The lightweight gateway creates an outbound connection from your network. Someone on your team needs to own monitoring that connection, responding to its failures, and rotating credentials. If that person does not exist yet, do not deploy MCP tunnels in production workflows.
Do not use self-hosted sandboxes as your security team’s primary argument for compliance approval. “Our files stay on-premise” is true but incomplete. Present the full architectural picture—tool execution local, orchestration remote—and get explicit sign-off on the orchestration dependency. Discovering this split after a compliance audit is significantly worse than disclosing it before deployment.
Set a review date tied to GA availability. Both features are pre-GA. Set a calendar reminder for 90 days post-GA announcement to re-evaluate the feature set, SLA commitments, and any new on-premise options Anthropic may add. The architecture may change. Your deployment decision should change with it.

One concrete configuration note: if you are testing self-hosted sandboxes today, Anthropic’s documentation points to integration guides from Cloudflare, Daytona, Modal, and Vercel as managed sandbox providers. Using a managed sandbox provider does not change the orchestration dependency—it just means your tool execution environment is also partially external. If true data locality is the goal, you need a self-operated sandbox, not a managed provider’s sandbox.

What Self-Hosted Sandboxes Orchestration Dependency Means for Your Stack

The self-hosted sandboxes orchestration dependency is not a reason to avoid Claude Managed Agents. It is a reason to be precise about what you are actually buying.

What you are buying is delegation of the hardest part of agent engineering—reliable orchestration, context management, multi-step error handling—to a vendor that does it well. That has real value. Building equivalent multi-step error handling and context management in LangGraph typically costs 6–12 weeks of senior engineer time and produces systems that fail noisily on context overflow and tool retry storms—failure modes you own entirely.

What you are not buying is operational independence. Your agent workflows now have a single point of failure outside your control, and that failure point sits at the center of every task, not at the edge. The self-hosted sandbox is real infrastructure control. It is also the least critical component in the failure hierarchy. Files surviving an outage is cold comfort when the agent that was supposed to process them has no brain.

The teams this will hurt are the ones whose security approval reads ‘tool execution is on-premise’ but whose API request logs—if anyone ever pulls them—show Claude receiving file summaries, error traces, and intermediate agent state on every single step. The teams this will serve well are the ones who planned for the outage, built the circuit breakers, and are using the managed orchestration to move faster on features they would otherwise spend quarters engineering themselves.

The architectural split is real, the documentation discloses it, and most coverage ignored it. Now you have the full picture. Whether that picture fits your requirements is the one question only your team can answer—but at least you are asking the right question.

Frequently Asked Questions About Self-Hosted Sandboxes Orchestration Dependency

Q: Does using a self-hosted sandbox in Claude Managed Agents mean my data never leaves my infrastructure?

A: Not entirely. File bytes and tool execution outputs stay within your self-hosted sandbox environment during tool calls. However, the agent orchestration layer—which includes context management, intermediate reasoning, and task sequencing—runs on Anthropic’s servers. This means the reasoning context, including summaries and extracted content from your files, is processed externally on every agent step. Your security team should audit what gets included in the API request payload, not just what the sandbox touches.

Q: What happens to in-flight Claude Managed Agent tasks if the Anthropic API goes down?

A: In-flight agent tasks halt immediately. The self-hosted sandbox remains operational but the agent cannot issue new tool calls, evaluate previous tool outputs, or determine whether the task succeeded. There is no local fallback or graceful degradation built into the current architecture. Recovery requires the Anthropic API to restore and tasks to be restarted, which means multi-step stateful workflows face potential data inconsistency if they were mid-execution during the outage.

Q: How does Claude Managed Agents’ self-hosted sandbox compare to fully on-premise frameworks like LangChain or CrewAI?

A: Frameworks like LangChain, LlamaIndex, and CrewAI can run entirely on your infrastructure when paired with a locally hosted model, giving you full data residency, offline capability, and no vendor availability dependency. Claude Managed Agents offers significantly simpler setup and access to frontier-quality Claude models, but requires Anthropic’s cloud for all orchestration. The tradeoff is capability and ease of setup versus operational independence and compliance certainty.

Sources

Synthesized from reporting by simonwillison.net, the-decoder.com.