# Built-in Tools
The gateway can execute certain built-in tools on your behalf and stream the results back through the Responses API.
Current built-in tools:
- Code Interpreter
- Web Search
For the dedicated web_search guide, see Web Search.
## Code Interpreter
The Code Interpreter allows the model to write and execute Python code in a sandboxed environment. This is useful for:
- Mathematical calculations
- Data analysis and visualization
- String manipulation
- Solving logic puzzles
Hosted code-interpreter calls are isolated by default: each tool execution gets fresh top-level globals, while the underlying runtime may still reuse the same interpreter instance between calls.
### Enabling the Tool
To use the code interpreter, you must:

- Include it in the `tools` list with type `code_interpreter`.
- (Optional but recommended) Add `code_interpreter_call.outputs` to the `include` list if you want to receive, in the response object:
    - the captured stdout/stderr stream (e.g. `print(...)` output), and
    - the final expression display value (if any).
```python
response = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input=[{"role": "user", "content": "Calculate the 10th Fibonacci number."}],
    tools=[{"type": "code_interpreter"}],
    include=["code_interpreter_call.outputs"],
)
```
### Response Structure
When the model uses the code interpreter, the response will contain a code_interpreter_call output item.
```json
{
  "type": "code_interpreter_call",
  "id": "ci_123",
  "container_id": "pyodide-worker-1",
  "code": "def fib(n):...",
  "status": "completed",
  "outputs": [
    { "type": "logs", "logs": "P1\nP2\n" },
    { "type": "logs", "logs": "6" }
  ]
}
```
Output types:

- Logs: `{ "type": "logs", "logs": "stdout/stderr text" }`
- Images: `{ "type": "image", "url": "data:image/png;base64,..." }`
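Client code typically flattens the `outputs` array into log text and image URLs. A minimal sketch (the `collect_outputs` helper is not part of the API; it just walks an item shaped like the example above):

```python
def collect_outputs(item: dict) -> tuple[str, list[str]]:
    """Concatenate log outputs and gather image data URLs from a code_interpreter_call item."""
    logs, images = [], []
    for out in item.get("outputs", []):
        if out.get("type") == "logs":
            logs.append(out["logs"])
        elif out.get("type") == "image":
            images.append(out["url"])
    return "".join(logs), images

# The sample item from above: printed lines first, then the display value.
item = {
    "type": "code_interpreter_call",
    "status": "completed",
    "outputs": [
        {"type": "logs", "logs": "P1\nP2\n"},
        {"type": "logs", "logs": "6"},
    ],
}
text, images = collect_outputs(item)
print(repr(text))  # 'P1\nP2\n6'
```

Note that stdout/stderr logs and the final display value arrive as separate `logs` entries, so concatenation order matters.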
### Streaming Execution
One of the biggest benefits of the built-in runtime is streaming. You receive events while the code is running.
- `response.code_interpreter_call.in_progress`: The tool call has started.
- `response.code_interpreter_call_code.delta`: The model is writing the code.
- `response.code_interpreter_call.interpreting`: The code is finished and is now executing.
- `response.code_interpreter_call.completed`: Execution finished.
- `response.output_item.done`: The item is finalized, including outputs.
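A sketch of consuming these events, assuming a dict-shaped event stream for illustration (with a real client you would pass `stream=True` and read `event.type` attributes instead; `fold_stream` is a hypothetical helper):

```python
def fold_stream(events):
    """Accumulate code deltas and record the lifecycle phases of a code-interpreter call."""
    code, phases = [], []
    for ev in events:
        t = ev["type"]
        if t == "response.code_interpreter_call_code.delta":
            code.append(ev["delta"])             # model is still writing the code
        elif t.startswith("response.code_interpreter_call"):
            phases.append(t.rsplit(".", 1)[-1])  # in_progress / interpreting / completed
    return "".join(code), phases

# Synthetic event sequence in the order documented above.
events = [
    {"type": "response.code_interpreter_call.in_progress"},
    {"type": "response.code_interpreter_call_code.delta", "delta": "print("},
    {"type": "response.code_interpreter_call_code.delta", "delta": "'hi')"},
    {"type": "response.code_interpreter_call.interpreting"},
    {"type": "response.code_interpreter_call.completed"},
]
code, phases = fold_stream(events)
print(code)    # print('hi')
print(phases)  # ['in_progress', 'interpreting', 'completed']
```

Because code deltas arrive before the `interpreting` event, a UI can render the code as it is written and then switch to an "executing" state.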
### Security and Sandboxing
The code interpreter runs in a sandboxed environment:
- Runtime: Pyodide (Python compiled to WebAssembly) running inside a local Code Interpreter service. On Linux x86_64 wheels, the server is bundled as a native binary; for development/source installs it can run via Bun.
- Isolation: Runs in a WebAssembly sandbox with no direct host file system access.
- Network Access: HTTP requests are available through the supported Python HTTP libraries. Operators can restrict outbound HTTP/HTTPS destinations with `--code-interpreter-egress-policy` or `--responses-code-interpreter-egress-policy`.
- Resource Limits: Execution time is capped (configurable via startup flags).
### First start download

If the tool is enabled, the first start may download the Pyodide runtime (~400 MB) into a cache directory and extract it. You can control the cache location via `VR_PYODIDE_CACHE_DIR`.
The egress policy applies to Python HTTP/HTTPS requests made inside the Code Interpreter runtime. It does not block the host-side Pyodide bootstrap download.
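For example, to keep the runtime cache on a persistent volume so later restarts skip the download (the path is illustrative):

```shell
# Point the Pyodide cache at a persistent directory before starting the gateway.
export VR_PYODIDE_CACHE_DIR=/var/cache/vllm-pyodide
vllm-responses serve
```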
### Egress policy
Set `--code-interpreter-egress-policy /path/to/policy.json` under `vllm-responses serve`, or `--responses-code-interpreter-egress-policy /path/to/policy.json` under integrated `vllm serve --responses`, to enforce a deployment-scoped outbound HTTP/HTTPS policy for Python code executed by the built-in Code Interpreter.
The policy supports allowlist and denylist modes with exact host, suffix, and CIDR rules, plus blocking for IP
literals and private/special-use networks.
`rules` may be empty, which is useful for a denylist policy that only blocks IP literals and internal/special-use networks.
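As an illustration only, an allowlist-style policy combining the rule kinds described above might look like the following. Every field name here is an assumption; consult your deployment's policy schema for the actual keys:

```json
{
  "mode": "allowlist",
  "rules": [
    { "host": "api.example.com" },
    { "suffix": ".internal.example.org" },
    { "cidr": "203.0.113.0/24" }
  ],
  "block_ip_literals": true,
  "block_private_networks": true
}
```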
Use the CLI flag for normal gateway startup.
This policy is defense in depth. Deployments that require strong protection against internal network access should still enforce network-layer controls such as container network policy, VPC/firewall egress rules, or an outbound proxy.
### Concurrency
If you need more code-interpreter throughput, you can configure a worker pool for the Code Interpreter service via `--code-interpreter-workers` under `vllm-responses serve`, or `--responses-code-interpreter-workers` in integrated mode. This uses Bun Workers (experimental). Use 2+ for actual parallelism; 1 enables worker mode but does not increase throughput. Each worker loads its own Pyodide runtime, so higher worker counts increase RAM usage and startup time. Worker mode always runs executions with fresh globals; it does not provide shared-state semantics across requests.
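For example (model name and worker count are placeholders):

```shell
# Standalone gateway with a 4-worker Code Interpreter pool
vllm-responses serve --code-interpreter-workers 4

# Integrated mode: same pool size via the responses-prefixed flag
vllm serve meta-llama/Llama-3.2-3B-Instruct --responses \
  --responses-code-interpreter-workers 4
```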
### Production Deployment
While the sandbox provides isolation, running arbitrary code from an LLM always carries risks. Ensure your deployment environment is properly secured and monitored.