# Built-in Tools

The gateway can execute certain tools on your behalf, streaming the results back to the client immediately. The primary built-in tool is the Code Interpreter.
## Code Interpreter
The Code Interpreter allows the model to write and execute Python code in a sandboxed environment. This is useful for:
- Mathematical calculations
- Data analysis and visualization
- String manipulation
- Solving logic puzzles
### Enabling the Tool
To use the code interpreter, you must:
- Include it in the `tools` list with type `code_interpreter`.
- (Optional but recommended) Add `code_interpreter_call.outputs` to the `include` list if you want the response object to contain:
  - the captured stdout/stderr stream (e.g. `print(...)` output), and
  - the final expression display value (if any).
```python
from openai import OpenAI

# Any OpenAI-compatible client works; point base_url at your gateway deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input=[{"role": "user", "content": "Calculate the 10th Fibonacci number."}],
    tools=[{"type": "code_interpreter"}],
    include=["code_interpreter_call.outputs"],
)
```
### Response Structure
When the model uses the code interpreter, the response contains a `code_interpreter_call` output item.
```json
{
  "type": "code_interpreter_call",
  "id": "ci_123",
  "container_id": "pyodide-worker-1",
  "code": "def fib(n):...",
  "status": "completed",
  "outputs": [
    {
      "type": "logs",
      "logs": "P1\nP2\n"
    },
    {
      "type": "logs",
      "logs": "6"
    }
  ]
}
```
Output types:

- Logs: `{ "type": "logs", "logs": "stdout/stderr text" }`
- Images: `{ "type": "image", "url": "data:image/png;base64,..." }`
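Given the JSON shape above, a small helper can stitch the log chunks of a call item back together. This is only a sketch: the `collect_logs` function and the sample item are illustrative, not part of the API.

```python
def collect_logs(item: dict) -> str:
    """Concatenate the text of every 'logs' output on a code_interpreter_call item."""
    return "".join(
        out["logs"] for out in item.get("outputs", []) if out["type"] == "logs"
    )

# Sample item mirroring the response structure shown above
call = {
    "type": "code_interpreter_call",
    "status": "completed",
    "outputs": [
        {"type": "logs", "logs": "P1\nP2\n"},
        {"type": "logs", "logs": "6"},
    ],
}

print(collect_logs(call))  # prints P1, P2, then 6
```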
### Streaming Execution
One of the biggest benefits of the built-in runtime is streaming. You receive events while the code is running.
- `response.code_interpreter_call.in_progress`: The tool call has started.
- `response.code_interpreter_call_code.delta`: The model is writing the code.
- `response.code_interpreter_call.interpreting`: The code is finished and is now executing.
- `response.code_interpreter_call.completed`: Execution finished.
- `response.output_item.done`: The item is finalized, including outputs.
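As a rough sketch of how a client might fold these events into usable state, the snippet below processes dict-shaped events; a real SDK delivers typed event objects, and the `handle_event` helper is our illustration, not part of any API.

```python
def handle_event(event: dict, state: dict) -> None:
    """Accumulate code-interpreter streaming events into `state` (illustrative)."""
    etype = event["type"]
    if etype == "response.code_interpreter_call.in_progress":
        state["phase"] = "started"
    elif etype == "response.code_interpreter_call_code.delta":
        # Code arrives incrementally while the model writes it
        state["code"] = state.get("code", "") + event["delta"]
    elif etype == "response.code_interpreter_call.interpreting":
        state["phase"] = "executing"
    elif etype == "response.code_interpreter_call.completed":
        state["phase"] = "finished"

state: dict = {}
for ev in [
    {"type": "response.code_interpreter_call.in_progress"},
    {"type": "response.code_interpreter_call_code.delta", "delta": "print(1"},
    {"type": "response.code_interpreter_call_code.delta", "delta": " + 1)"},
    {"type": "response.code_interpreter_call.interpreting"},
    {"type": "response.code_interpreter_call.completed"},
]:
    handle_event(ev, state)

print(state["code"])   # print(1 + 1)
print(state["phase"])  # finished
```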
### Security and Sandboxing
The code interpreter runs in a sandboxed environment:
- Runtime: Pyodide (Python compiled to WebAssembly) running inside a local Code Interpreter service. On Linux x86_64 wheels, the server is bundled as a native binary; for development/source installs it can run via Bun.
- Isolation: Runs in a WebAssembly sandbox with no direct host file system access.
- Network Access: HTTP requests are available via `httpx` (useful for API calls, data fetching).
- Resource Limits: Execution time is capped (configurable via startup flags).
### First-start download

If the tool is enabled, the first start may download the Pyodide runtime (~400 MB) into a cache directory and extract it. You can control the cache location via `VR_PYODIDE_CACHE_DIR`.
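For example, to pin the cache to a known location before the first start (the path is only an example, and any remaining `serve` options are omitted):

```shell
# Choose where the Pyodide runtime is downloaded and extracted (example path)
export VR_PYODIDE_CACHE_DIR=/var/cache/vllm-responses/pyodide
vllm-responses serve
```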
### Concurrency

If you need more code-interpreter throughput, you can configure a worker pool for the Code Interpreter service via `VR_CODE_INTERPRETER_WORKERS` (or `--code-interpreter-workers` under `vllm-responses serve`). This uses Bun Workers and is experimental. Use 2 or more for actual parallelism; a value of 1 enables worker mode but does not increase throughput. Each worker loads its own Pyodide runtime, so higher worker counts increase RAM usage and startup time.
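A minimal launch sketch, assuming four workers (the value is an example; other `serve` options are omitted):

```shell
# Four parallel Pyodide workers; expect roughly 4x the RAM and a slower startup
VR_CODE_INTERPRETER_WORKERS=4 vllm-responses serve

# Equivalently, via the CLI flag:
vllm-responses serve --code-interpreter-workers 4
```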
### Production Deployment
While the sandbox provides isolation, running arbitrary code from an LLM always carries risks. Ensure your deployment environment is properly secured and monitored.