# Built-in Tools
The gateway can execute certain built-in tools on your behalf and stream the results back through the Responses API.
Current built-in tools:
- Code Interpreter
- Web Search
For the dedicated web_search guide, see Web Search.
## Code Interpreter
The Code Interpreter allows the model to write and execute Python code in a sandboxed environment. This is useful for:
- Mathematical calculations
- Data analysis and visualization
- String manipulation
- Solving logic puzzles
Hosted code-interpreter calls are isolated by default: each tool execution gets fresh top-level globals, while the underlying runtime may still reuse the same interpreter instance between calls.
### Enabling the Tool
To use the code interpreter, you must:

- Include it in the `tools` list with type `code_interpreter`.
- (Optional but recommended) Add `code_interpreter_call.outputs` to the `include` list if you want to receive, in the response object:
    - the captured stdout/stderr stream (e.g. `print(...)` output), and
    - the final expression display value (if any).
```python
response = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input=[{"role": "user", "content": "Calculate the 10th Fibonacci number."}],
    tools=[{"type": "code_interpreter"}],
    include=["code_interpreter_call.outputs"],
)
```
### Response Structure
When the model uses the code interpreter, the response will contain a code_interpreter_call output item.
```json
{
  "type": "code_interpreter_call",
  "id": "ci_123",
  "container_id": "pyodide-worker-1",
  "code": "def fib(n):...",
  "status": "completed",
  "outputs": [
    { "type": "logs", "logs": "P1\nP2\n" },
    { "type": "logs", "logs": "6" }
  ]
}
```
Output types:

- Logs: `{ "type": "logs", "logs": "stdout/stderr text" }`
- Images: `{ "type": "image", "url": "data:image/png;base64,..." }`
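Client code typically flattens the `outputs` array into log text and image URLs. A minimal sketch (the `collect_outputs` helper is not part of the API; it just walks an item shaped like the example above):

```python
def collect_outputs(item: dict) -> tuple[str, list[str]]:
    """Concatenate log outputs and gather image data URLs from a code_interpreter_call item."""
    logs, images = [], []
    for out in item.get("outputs", []):
        if out.get("type") == "logs":
            logs.append(out["logs"])
        elif out.get("type") == "image":
            images.append(out["url"])
    return "".join(logs), images

# The sample item from above: printed lines first, then the display value.
item = {
    "type": "code_interpreter_call",
    "status": "completed",
    "outputs": [
        {"type": "logs", "logs": "P1\nP2\n"},
        {"type": "logs", "logs": "6"},
    ],
}
text, images = collect_outputs(item)
print(repr(text))  # 'P1\nP2\n6'
```

Note that stdout/stderr logs and the final display value arrive as separate `logs` entries, so concatenation order matters.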
### Streaming Execution
One of the biggest benefits of the built-in runtime is streaming. You receive events while the code is running.
- `response.code_interpreter_call.in_progress`: The tool call has started.
- `response.code_interpreter_call_code.delta`: The model is writing the code.
- `response.code_interpreter_call.interpreting`: The code is finished and is now executing.
- `response.code_interpreter_call.completed`: Execution finished.
- `response.output_item.done`: The item is finalized, including outputs.
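A sketch of consuming these events, assuming a dict-shaped event stream for illustration (with a real client you would pass `stream=True` and read `event.type` attributes instead; `fold_stream` is a hypothetical helper):

```python
def fold_stream(events):
    """Accumulate code deltas and record the lifecycle phases of a code-interpreter call."""
    code, phases = [], []
    for ev in events:
        t = ev["type"]
        if t == "response.code_interpreter_call_code.delta":
            code.append(ev["delta"])             # model is still writing the code
        elif t.startswith("response.code_interpreter_call"):
            phases.append(t.rsplit(".", 1)[-1])  # in_progress / interpreting / completed
    return "".join(code), phases

# Synthetic event sequence in the order documented above.
events = [
    {"type": "response.code_interpreter_call.in_progress"},
    {"type": "response.code_interpreter_call_code.delta", "delta": "print("},
    {"type": "response.code_interpreter_call_code.delta", "delta": "'hi')"},
    {"type": "response.code_interpreter_call.interpreting"},
    {"type": "response.code_interpreter_call.completed"},
]
code, phases = fold_stream(events)
print(code)    # print('hi')
print(phases)  # ['in_progress', 'interpreting', 'completed']
```

Because code deltas arrive before the `interpreting` event, a UI can render the code as it is written and then switch to an "executing" state.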
### Security and Sandboxing
The code interpreter runs in a sandboxed environment:
- Runtime: Pyodide (Python compiled to WebAssembly) running inside a local Code Interpreter service. On Linux x86_64 wheels, the server is bundled as a native binary; for development/source installs it can run via Bun.
- Isolation: Runs in a WebAssembly sandbox with no direct host file system access.
- Network Access: HTTP requests are available through the supported Python HTTP libraries. Operators can restrict outbound HTTP/HTTPS destinations with `--code-interpreter-egress-policy` or `--responses-code-interpreter-egress-policy`.
- Resource Limits: Execution time is capped (configurable via startup flags).
### First start download

If the tool is enabled, the first start may download the Pyodide runtime (~400 MB) into a cache directory and extract it. You can control the cache location via `VR_PYODIDE_CACHE_DIR`.
The egress policy applies to Python HTTP/HTTPS requests made inside the Code Interpreter runtime. It does not block the host-side Pyodide bootstrap download.
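For example, to keep the runtime cache on a persistent volume so later restarts skip the download (the path is illustrative):

```shell
# Point the Pyodide cache at a persistent directory before starting the gateway.
export VR_PYODIDE_CACHE_DIR=/var/cache/vllm-pyodide
vllm-responses serve
```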
### Egress policy
Set `--code-interpreter-egress-policy /path/to/policy.json` under `vllm-responses serve`, or `--responses-code-interpreter-egress-policy /path/to/policy.json` under integrated `vllm serve --responses`, to enforce a deployment-scoped outbound HTTP/HTTPS policy for Python code executed by the built-in Code Interpreter.
The policy supports allowlist and denylist modes with exact host, suffix, and CIDR rules, plus blocking for IP
literals and private/special-use networks.
`rules` may be empty, which is useful for a denylist policy that only blocks IP literals and internal/special-use networks.
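As an illustration only, an allowlist-style policy combining the rule kinds described above might look like the following. Every field name here is an assumption; consult your deployment's policy schema for the actual keys:

```json
{
  "mode": "allowlist",
  "rules": [
    { "host": "api.example.com" },
    { "suffix": ".internal.example.org" },
    { "cidr": "203.0.113.0/24" }
  ],
  "block_ip_literals": true,
  "block_private_networks": true
}
```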
Use the CLI flag for normal gateway startup.
This policy is defense in depth. Deployments that require strong protection against internal network access should still enforce network-layer controls such as container network policy, VPC/firewall egress rules, or an outbound proxy.
### Concurrency
If you need more code-interpreter throughput, you can configure a worker pool for the Code Interpreter service via `--code-interpreter-workers` under `vllm-responses serve`, or `--responses-code-interpreter-workers` in integrated mode. This uses Bun Workers (experimental). Use 2+ for actual parallelism; 1 enables worker mode but does not increase throughput. Each worker loads its own Pyodide runtime, so higher worker counts increase RAM usage and startup time. Worker mode always runs executions with fresh globals; it does not provide shared-state semantics across requests.
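For example (model name and worker count are placeholders):

```shell
# Standalone gateway with a 4-worker Code Interpreter pool
vllm-responses serve --code-interpreter-workers 4

# Integrated mode: same pool size via the responses-prefixed flag
vllm serve meta-llama/Llama-3.2-3B-Instruct --responses \
  --responses-code-interpreter-workers 4
```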
### Production Deployment
While the sandbox provides isolation, running arbitrary code from an LLM always carries risks. Ensure your deployment environment is properly secured and monitored.