Configuration Reference
The gateway is configured using environment variables. All variables are prefixed with `VR_`.
Core Configuration
| Variable | Description | Default |
|---|---|---|
| `VR_LLM_API_BASE` | The URL of the upstream vLLM server (e.g., `http://localhost:8457/v1`). | `http://localhost:8080/v1` |
| `VR_HOST` | The interface the gateway should listen on. | `0.0.0.0` |
| `VR_PORT` | The port the gateway should listen on. | `5969` |
| `VR_WORKERS` | Number of Gunicorn worker processes. | `1` |
| `VR_LOG_TIMINGS` | Enable logging of request timings and overhead. | `False` |
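The following minimal sketch overrides the core settings for a gateway that accepts only local connections on a non-default port. All values are illustrative. The boolean spelling `true` for `VR_LOG_TIMINGS` is an assumption; this page uses `1` and `false` elsewhere.

```shell
# Upstream vLLM server (illustrative host and port).
export VR_LLM_API_BASE="http://127.0.0.1:8457/v1"
# Bind only to the loopback interface instead of 0.0.0.0.
export VR_HOST="127.0.0.1"
# Non-default listening port (illustrative).
export VR_PORT=6000
# Four Gunicorn worker processes.
export VR_WORKERS=4
# Log request timings and overhead (boolean spelling is an assumption).
export VR_LOG_TIMINGS=true
```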
Storage Configuration
| Variable | Description | Default |
|---|---|---|
| `VR_DB_PATH` | Database connection string. Use `sqlite+aiosqlite:///` or `postgresql+asyncpg://`. | `sqlite+aiosqlite:///vllm_responses.db` |
| `VR_RESPONSE_STORE_CACHE` | Enable Redis caching for the `ResponseStore`. | `False` |
| `VR_RESPONSE_STORE_CACHE_TTL_SECONDS` | Cache TTL in seconds. | `3600` |
| `VR_REDIS_HOST` | Redis host (if cache enabled). | `localhost` |
| `VR_REDIS_PORT` | Redis port. | `6379` |
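The two schemes above follow SQLAlchemy's `dialect+driver://` URL form. Assuming the gateway passes `VR_DB_PATH` to SQLAlchemy unchanged, a SQLite URL with three slashes (`sqlite+aiosqlite:///vllm_responses.db`) is a path relative to the working directory, and four slashes give an absolute path. The sketch below also shortens the cache TTL.

The `check_db_url` helper is hypothetical and is not part of the gateway. It is a pre-start sanity check that accepts only the two documented schemes. The database path is illustrative.

```shell
# Hypothetical pre-start check: accept only the two URL schemes documented above.
check_db_url() {
  case "$1" in
    sqlite+aiosqlite:///*) return 0 ;;   # SQLite via aiosqlite
    postgresql+asyncpg://*) return 0 ;;  # PostgreSQL via asyncpg
    *) echo "unsupported VR_DB_PATH: $1" >&2; return 1 ;;
  esac
}

# Absolute SQLite path: note the fourth slash (illustrative location).
export VR_DB_PATH="sqlite+aiosqlite:////var/lib/vllm-responses/vllm_responses.db"
# Cache stored responses in Redis for 10 minutes instead of the default hour.
export VR_RESPONSE_STORE_CACHE=1
export VR_RESPONSE_STORE_CACHE_TTL_SECONDS=600
export VR_REDIS_HOST="localhost"
export VR_REDIS_PORT=6379

check_db_url "$VR_DB_PATH"
```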
Code Interpreter Configuration
| Variable | Description | Default |
|---|---|---|
| `VR_CODE_INTERPRETER_MODE` | Runtime mode: `spawn`, `external`, or `disabled`. | `spawn` |
| `VR_CODE_INTERPRETER_PORT` | Port for the code interpreter server. | `5970` |
| `VR_CODE_INTERPRETER_WORKERS` | Worker pool size (Bun Workers) for the spawned code interpreter. | `0` |
| `VR_PYODIDE_CACHE_DIR` | Directory for the Pyodide runtime cache (downloaded and extracted files). | (see docs) |
| `VR_CODE_INTERPRETER_DEV_BUN_FALLBACK` | Development only: if `1`, allow falling back to `bun` when no bundled binary is available. | `0` |
Notes:
- `VR_CODE_INTERPRETER_WORKERS`:
  - `0` (default) runs in-process (no Bun Workers): single-threaded execution.
  - `1` enables the WorkerPool path but does not add parallelism (useful mainly to validate worker mode).
  - `2+` enables parallel execution via Bun Workers (experimental).
- Each worker initializes its own Pyodide runtime, so RAM usage and startup time scale with worker count.
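This sketch spawns the code interpreter with two Bun Workers and a fixed Pyodide cache directory. With two workers, two Pyodide runtimes are loaded. The cache path is illustrative. The `check_code_interpreter` helper is hypothetical and only checks the values against the options listed above.

```shell
# Spawn the code interpreter on its default port.
export VR_CODE_INTERPRETER_MODE=spawn
export VR_CODE_INTERPRETER_PORT=5970
# Two Bun Workers: parallel execution (experimental); each loads its own Pyodide.
export VR_CODE_INTERPRETER_WORKERS=2
# Persistent Pyodide cache location (illustrative path).
export VR_PYODIDE_CACHE_DIR="/var/cache/vllm-responses/pyodide"

# Hypothetical sanity check of the mode and worker count.
check_code_interpreter() {
  case "$VR_CODE_INTERPRETER_MODE" in
    spawn|external|disabled) ;;
    *) echo "invalid VR_CODE_INTERPRETER_MODE" >&2; return 1 ;;
  esac
  case "$VR_CODE_INTERPRETER_WORKERS" in
    ''|*[!0-9]*) echo "VR_CODE_INTERPRETER_WORKERS must be a non-negative integer" >&2; return 1 ;;
  esac
}

check_code_interpreter
```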
MCP Configuration (Built-in + Remote)
| Variable | Description | Default |
|---|---|---|
| `VR_MCP_CONFIG_PATH` | Path to the Built-in MCP runtime JSON configuration file. | unset |
| `VR_MCP_BUILTIN_RUNTIME_URL` | Loopback base URL for the singleton Built-in MCP runtime (`serve` default: `http://127.0.0.1:5981`). | unset |
| `VR_MCP_REQUEST_REMOTE_ENABLED` | Enable Remote MCP (`tools[].mcp.server_url`) handling. | `True` |
| `VR_MCP_REQUEST_REMOTE_URL_CHECKS` | Enable Remote MCP URL policy checks (require `https`, reject denylisted hosts). | `True` |
| `VR_MCP_HOSTED_STARTUP_TIMEOUT_SEC` | Built-in MCP startup/discovery timeout in seconds (applies to all hosted servers). | `10` |
| `VR_MCP_HOSTED_TOOL_TIMEOUT_SEC` | Built-in MCP tool-call timeout in seconds (applies to all hosted servers). | `60` |
If `VR_MCP_CONFIG_PATH` is unset, Built-in MCP is disabled.
Built-in MCP is designed for `vllm-responses serve`, which starts a singleton runtime and injects `VR_MCP_BUILTIN_RUNTIME_URL` for gateway workers.
In normal `serve` usage, leave `VR_MCP_BUILTIN_RUNTIME_URL` unset to use `http://127.0.0.1:5981`.
Set it only when you need a different loopback port (for example, to avoid a local port clash) or when manually wiring gateway workers to an externally managed runtime.
If `VR_MCP_REQUEST_REMOTE_ENABLED=false`, Remote MCP declarations are rejected while Built-in MCP remains available.
If `VR_MCP_REQUEST_REMOTE_URL_CHECKS=false`, gateway URL policy checks are fully disabled for Remote MCP declarations.
For the canonical `mcp.json` examples (URL and stdio styles), see MCP Examples -> Built-in MCP Runtime Config.
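The sketch below enables Built-in MCP, moves its runtime to another loopback port, and turns off Remote MCP. The config path and port 5982 are arbitrary illustrative choices.

The `is_loopback_runtime_url` helper is hypothetical. It approximates the `serve`-mode rule that the runtime URL must be `http://127.0.0.1:<port>` or `http://localhost:<port>` with no path, query, or fragment. It also rejects a trailing slash, which may be stricter than the gateway itself.

```shell
# Enable Built-in MCP by pointing at a runtime config file (illustrative path).
export VR_MCP_CONFIG_PATH="/etc/vllm-responses/mcp.json"
# Only needed when port 5981 is taken: use another loopback port.
export VR_MCP_BUILTIN_RUNTIME_URL="http://127.0.0.1:5982"
# Reject tools[].mcp.server_url declarations; Built-in MCP stays available.
export VR_MCP_REQUEST_REMOTE_ENABLED=false

# Hypothetical check mirroring the serve-mode loopback rule.
is_loopback_runtime_url() {
  case "$1" in
    http://127.0.0.1:*) _port=${1#http://127.0.0.1:} ;;
    http://localhost:*) _port=${1#http://localhost:} ;;
    *) return 1 ;;
  esac
  # The remainder must be a non-empty run of digits: no path, query, or fragment.
  case "$_port" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

is_loopback_runtime_url "$VR_MCP_BUILTIN_RUNTIME_URL"
```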
Notes:
- Labels under `mcpServers` are request-visible `server_label` values.
- Built-in MCP supports two server entry shapes:
  - URL-based HTTP: `url` (required, accepts `http://` or `https://`), `headers` (optional), `transport` (optional).
  - Command-style stdio: `command` (required), `args`/`env`/`cwd` (optional), `transport` (optional, but only `"stdio"`).
- Nested `transport` objects are rejected (for example, `"transport": {"type":"stdio", ...}`).
- `transport: "stdio"` without command-style keys is rejected.
- Mixing HTTP and stdio keys in one entry (for example, `command` + `url`) is rejected.
- Hosted startup and tool timeouts are configured globally with:
  - `VR_MCP_HOSTED_STARTUP_TIMEOUT_SEC`
  - `VR_MCP_HOSTED_TOOL_TIMEOUT_SEC`
- Unknown non-runtime server fields are forwarded to FastMCP.
- In `serve` mode, `VR_MCP_BUILTIN_RUNTIME_URL` must be loopback `http://127.0.0.1:<port>` (or `http://localhost:<port>`), with no path, query, or fragment.
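Here is a minimal `mcp.json` sketch with one entry of each shape described above. The server labels (`docs`, `local_tools`), the URL, the header value, and the `python -m my_mcp_server` command are hypothetical placeholders, not real services. The file goes into a temporary directory only so the snippet is self-contained.

```shell
# Write a Built-in MCP runtime config into a temporary directory.
cfg_dir=$(mktemp -d)
cat > "$cfg_dir/mcp.json" <<'EOF'
{
  "mcpServers": {
    "docs": {
      "url": "https://mcp.example.com/mcp",
      "headers": { "Authorization": "Bearer <token>" }
    },
    "local_tools": {
      "command": "python",
      "args": ["-m", "my_mcp_server"],
      "env": { "LOG_LEVEL": "info" },
      "transport": "stdio"
    }
  }
}
EOF

# Point vllm-responses serve (or the gateway) at the file.
export VR_MCP_CONFIG_PATH="$cfg_dir/mcp.json"
```

Requests would then refer to these servers through the `server_label` values `docs` and `local_tools`.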
Observability Configuration
| Variable | Description | Default |
|---|---|---|
| `VR_METRICS_ENABLED` | Enable Prometheus-compatible metrics and the `GET /metrics` endpoint. | `True` |
| `VR_METRICS_PATH` | Metrics endpoint path. | `/metrics` |
| `VR_TRACING_ENABLED` | Enable OpenTelemetry tracing (OTLP gRPC exporter). | `False` |
| `VR_OTEL_SERVICE_NAME` | Service name used in OpenTelemetry resources. | `vllm-responses` |
| `VR_TRACING_SAMPLE_RATIO` | Trace sampling ratio in [0.0, 1.0] (ratio-based). | `0.01` |
| `VR_OPENTELEMETRY_HOST` | OTLP endpoint host (gRPC). | `otel-collector` |
| `VR_OPENTELEMETRY_PORT` | OTLP endpoint port (gRPC). | `4317` |
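This sketch turns on tracing to the default collector endpoint and raises sampling from 1% to 10%. The service name is illustrative, and the `true` boolean spelling is an assumption. The `check_sample_ratio` helper is hypothetical; it only confirms that the value is a plain number in [0.0, 1.0].

```shell
# Export traces over OTLP gRPC to a collector (host and port are the defaults).
export VR_TRACING_ENABLED=true
export VR_OTEL_SERVICE_NAME="vllm-responses-prod"
export VR_OPENTELEMETRY_HOST="otel-collector"
export VR_OPENTELEMETRY_PORT=4317
# Sample 10% of traces instead of the default 1%.
export VR_TRACING_SAMPLE_RATIO=0.1

# Hypothetical check: non-negative decimal number no greater than 1.
check_sample_ratio() {
  awk -v r="$1" 'BEGIN { exit !(r ~ /^[0-9]*\.?[0-9]+$/ && r + 0 >= 0 && r + 0 <= 1) }'
}

check_sample_ratio "$VR_TRACING_SAMPLE_RATIO"
```

With the default `VR_PORT` and `VR_METRICS_PATH`, metrics are exposed at `http://<host>:5969/metrics`.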
Example Configurations
Local Development (Default)
```shell
export VR_LLM_API_BASE="http://127.0.0.1:8457/v1"
export VR_DB_PATH="sqlite+aiosqlite:///vllm_responses.db"
```
Production with PostgreSQL & Redis
```shell
export VR_LLM_API_BASE="http://vllm-service:8000/v1"
export VR_DB_PATH="postgresql+asyncpg://user:pass@db-host:5432/vllm_responses"
export VR_WORKERS=8
export VR_RESPONSE_STORE_CACHE=1
export VR_REDIS_HOST="redis-host"
```
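Every setting shares the `VR_` prefix, so one quick way to confirm what the gateway will inherit from the current shell is to list the exported variables with that prefix. The `show_vr_env` helper is hypothetical, not a gateway command. It masks the password part of URLs such as the `VR_DB_PATH` above before printing.

```shell
# Hypothetical helper: list exported VR_* variables sorted by name,
# masking the password in URLs of the form scheme://user:password@host.
show_vr_env() {
  env | grep '^VR_' | sort | sed 's#://\([^:/@]*\):[^@]*@#://\1:***@#'
}

show_vr_env
```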