Configuration Reference

The gateway uses both CLI flags and environment variables.

CLI flags own operator-facing runtime topology and helper wiring on supported entrypoints.
Environment variables own deployment-scoped settings, secrets, and process-level integrations.

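All documented gateway environment variables are prefixed with VR_. The one exception is EXA_API_KEY, which is passed through to the Exa MCP helper (see MCP Configuration).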

Core Configuration

Variable	Description	Default
`VR_MAX_CONCURRENCY`	Gunicorn/Uvicorn concurrency limit for standalone gateway startup.	`300`
`VR_LOG_TIMINGS`	Enable logging of request timings and overhead.	`False`
`VR_LOG_MODEL_MESSAGES`	Enable logging of model-facing messages for debugging.	`False`
`VR_OPENAI_API_KEY`	Upstream bearer token used when the gateway or proxy path must authenticate to the upstream.	(unset)

Notes:

- Supported entrypoints use CLI flags for upstream selection, bind address, worker count, and helper wiring.
- Integrated mode (`vllm serve --responses`) uses the native vLLM `--host` / `--port` flags and requires a single API server.
- `VR_MAX_CONCURRENCY` applies to direct standalone startup paths, which are used mainly for development and tests.
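
To make the defaults in the table above concrete, here is a minimal sketch of how the core variables could be read from the environment. The helper names (`env_bool`, `load_core_settings`) and the set of accepted truthy spellings are assumptions made for illustration; the gateway's actual parsing may differ.

```python
import os
from typing import Mapping

# Assumed truthy spellings; the gateway's real boolean parsing may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag, falling back to the documented default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in _TRUTHY


def load_core_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical loader for the core VR_ variables listed in the table."""
    return {
        "max_concurrency": int(env.get("VR_MAX_CONCURRENCY", "300")),
        "log_timings": env_bool(env, "VR_LOG_TIMINGS", False),
        "log_model_messages": env_bool(env, "VR_LOG_MODEL_MESSAGES", False),
        # None means the variable is unset, matching the "(unset)" default.
        "openai_api_key": env.get("VR_OPENAI_API_KEY"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the sketch easy to exercise in tests without touching the real process environment.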

Storage Configuration

Variable	Description	Default
`VR_DB_PATH`	Database connection string. Use `sqlite+aiosqlite:///` or `postgresql+asyncpg://`.	`sqlite+aiosqlite:///vllm_responses.db`
`VR_RESPONSE_STORE_CACHE`	Enable Redis caching for the ResponseStore.	`False`
`VR_RESPONSE_STORE_CACHE_TTL_SECONDS`	Cache TTL in seconds.	`3600`
`VR_REDIS_HOST`	Redis host (if cache enabled).	`localhost`
`VR_REDIS_PORT`	Redis port.	`6379`
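
As an illustration of the two connection-string forms accepted by `VR_DB_PATH`, the following sketch classifies a value by its prefix. The function name `db_backend` and the decision to raise on any other prefix are assumptions for this example, not the gateway's documented behavior.

```python
# The two connection-string prefixes named in the table above.
SUPPORTED_PREFIXES = {
    "sqlite+aiosqlite:///": "sqlite",
    "postgresql+asyncpg://": "postgresql",
}


def db_backend(db_path: str) -> str:
    """Hypothetical check: map a VR_DB_PATH value to its backend name."""
    for prefix, backend in SUPPORTED_PREFIXES.items():
        if db_path.startswith(prefix):
            return backend
    raise ValueError(
        "VR_DB_PATH must start with one of: " + ", ".join(SUPPORTED_PREFIXES)
    )
```

A check like this can catch a mistyped driver name (for example a missing `+asyncpg`) before the service tries to connect.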

Code Interpreter Configuration

Variable	Description	Default
`VR_PYODIDE_CACHE_DIR`	Directory for the Pyodide runtime cache (download + extracted files).	(see docs)
`VR_CODE_INTERPRETER_DEV_BUN_FALLBACK`	Development-only: if `1`, allow `bun` fallback when no bundled binary is available.	`0`

Notes:

- Supported entrypoints use CLI flags for code-interpreter mode, port, workers, and startup timeout.
- The code-interpreter worker count selects the execution path:
  - 0 (default) runs in-process (no Bun Workers): single-threaded execution.
  - 1 enables the WorkerPool path but does not add parallelism (useful mainly to validate worker mode).
  - 2+ enables parallel execution via Bun Workers (experimental).
- Each worker initializes its own Pyodide runtime, so RAM usage and startup time scale with worker count.
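
The worker-count values described above can be summarized as a small lookup. This is only a sketch: the function name `worker_mode` and the returned labels are hypothetical and do not correspond to identifiers in the gateway.

```python
def worker_mode(workers: int) -> str:
    """Map a code-interpreter worker count to the execution path it selects.

    The returned labels are illustrative names, not gateway identifiers.
    """
    if workers < 0:
        raise ValueError("worker count must be zero or greater")
    if workers == 0:
        # Default: in-process, single-threaded, no Bun Workers.
        return "in-process"
    if workers == 1:
        # WorkerPool path, but still no parallelism.
        return "worker-pool-serial"
    # Two or more workers: parallel execution (experimental).
    return "worker-pool-parallel"
```

Because each worker loads its own Pyodide runtime, moving from the second to the third mode trades memory and startup time for parallelism.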

MCP Configuration (Built-in + Remote)

Variable	Description	Default
`VR_MCP_REQUEST_REMOTE_ENABLED`	Enable Remote MCP (`tools[].mcp.server_url`) handling.	`True`
`VR_MCP_REQUEST_REMOTE_URL_CHECKS`	Enable Remote MCP URL policy checks (`https`, denylist hosts).	`True`
`VR_MCP_HOSTED_STARTUP_TIMEOUT_SEC`	Built-in MCP startup/discovery timeout in seconds (applies to all hosted servers).	`10`
`VR_MCP_HOSTED_TOOL_TIMEOUT_SEC`	Built-in MCP call timeout in seconds (applies to all hosted servers).	`60`
`EXA_API_KEY`	Optional Exa API key appended to the shipped `exa_mcp` helper URL when that profile is enabled.	(unset)

Built-in MCP enablement is CLI-owned on supported entrypoints:

vllm-responses serve --mcp-config /path/to/mcp.json [--mcp-port PORT]
vllm serve ... --responses --responses-mcp-config /path/to/mcp.json [--responses-mcp-port PORT]

Remote-upstream supervisor readiness controls are also CLI-owned:

vllm-responses serve --upstream-ready-timeout SECONDS
vllm-responses serve --upstream-ready-interval SECONDS

- If the MCP config flag is omitted, Built-in MCP is disabled.
- If `VR_MCP_REQUEST_REMOTE_ENABLED=false`, Remote MCP declarations are rejected while Built-in MCP remains available.
- If `VR_MCP_REQUEST_REMOTE_URL_CHECKS=false`, gateway URL policy checks are fully disabled for Remote MCP declarations.
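
The interaction between the two Remote MCP switches can be sketched as a single decision function. Everything here is an assumption for illustration: the function name `check_remote_mcp`, the returned strings, the `denied_hosts` parameter, and the exact form of the policy (requiring `https` and rejecting denylisted hosts) are inferred from the variable descriptions rather than taken from the gateway's code.

```python
from urllib.parse import urlsplit


def check_remote_mcp(
    server_url: str,
    *,
    remote_enabled: bool = True,
    url_checks: bool = True,
    denied_hosts: frozenset = frozenset(),
) -> str:
    """Hypothetical decision for one Remote MCP declaration."""
    if not remote_enabled:
        # VR_MCP_REQUEST_REMOTE_ENABLED=false rejects every declaration.
        return "rejected: remote MCP disabled"
    if not url_checks:
        # VR_MCP_REQUEST_REMOTE_URL_CHECKS=false skips the URL policy.
        return "allowed: URL checks disabled"
    parts = urlsplit(server_url)
    if parts.scheme != "https":
        return "rejected: scheme must be https"
    if (parts.hostname or "") in denied_hosts:
        return "rejected: host is denylisted"
    return "allowed"
```

Note the ordering: the enablement switch is evaluated first, so disabling URL checks cannot re-enable Remote MCP once it has been turned off.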

For the canonical mcp.json examples (URL + stdio styles), see MCP Examples -> Built-in MCP Runtime Config.

Notes:

- Labels under `mcpServers` are request-visible `server_label` values.
- `EXA_API_KEY` is not a `VR_`-prefixed gateway setting because it is passed through directly to the upstream Exa MCP helper contract.
- Built-in MCP supports two server entry shapes:
  - URL-based HTTP: `url` (required, accepts `http://` or `https://`), `headers` (optional), `transport` (optional).
  - Command-style stdio: `command` (required), `args`/`env`/`cwd` (optional), `transport` (optional; if present, it must be `"stdio"`).
- Nested transport objects are rejected (for example, `"transport": {"type":"stdio", ...}`).
- `transport: "stdio"` without command-style keys is rejected.
- Mixing HTTP and stdio keys in one entry (for example `command` + `url`) is rejected.
- Hosted startup and tool timeouts are configured globally with:
  - `VR_MCP_HOSTED_STARTUP_TIMEOUT_SEC`
  - `VR_MCP_HOSTED_TOOL_TIMEOUT_SEC`
- Unknown non-runtime server fields are forwarded to FastMCP.
- In supported entrypoints, Built-in MCP always binds on loopback; the CLI port flags control only the port.
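
The entry-shape rules in the notes above can be expressed as a small validator. This is a sketch under stated assumptions: `classify_entry` and `classify_config` are hypothetical helpers, the error messages are invented, and the order in which the rules are applied is a guess. Only the rules themselves come from the notes.

```python
HTTP_KEYS = {"url", "headers"}
STDIO_KEYS = {"command", "args", "env", "cwd"}


def classify_entry(entry: dict) -> str:
    """Return "http" or "stdio", or raise ValueError for a rejected shape."""
    transport = entry.get("transport")
    if isinstance(transport, dict):
        raise ValueError("nested transport objects are rejected")
    has_http = bool(entry.keys() & HTTP_KEYS)
    has_stdio = bool(entry.keys() & STDIO_KEYS)
    if has_http and has_stdio:
        raise ValueError("HTTP and stdio keys cannot be mixed in one entry")
    if has_stdio:
        if "command" not in entry:
            raise ValueError("stdio entries require command")
        if transport not in (None, "stdio"):
            raise ValueError('stdio entries only accept transport "stdio"')
        return "stdio"
    if transport == "stdio":
        raise ValueError('transport "stdio" requires command-style keys')
    url = entry.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("HTTP entries require an http:// or https:// url")
    return "http"


def classify_config(config: dict) -> dict:
    """Map each server label under mcpServers to its entry shape."""
    return {
        label: classify_entry(entry)
        for label, entry in config.get("mcpServers", {}).items()
    }
```

For example, a hypothetical config with one URL entry and one command entry, `{"mcpServers": {"docs": {"url": "https://mcp.example.com/mcp"}, "local": {"command": "python", "args": ["-m", "my_server"]}}}`, would classify `docs` as HTTP and `local` as stdio. The labels, URL, and module name in that example are made up.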

Observability Configuration

Variable	Description	Default
`VR_METRICS_ENABLED`	Enable Prometheus-compatible metrics and the `GET /metrics` endpoint.	`True`
`VR_METRICS_PATH`	Metrics endpoint path.	`/metrics`
`VR_TRACING_ENABLED`	Enable OpenTelemetry tracing (OTLP gRPC exporter).	`False`
`VR_OTEL_SERVICE_NAME`	Service name used in OpenTelemetry resources.	`vllm-responses`
`VR_TRACING_SAMPLE_RATIO`	Trace sampling ratio in `[0.0, 1.0]` (ratio-based).	`0.01`
`VR_OPENTELEMETRY_HOST`	OTLP endpoint host (gRPC).	`otel-collector`
`VR_OPENTELEMETRY_PORT`	OTLP endpoint port (gRPC).	`4317`
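
As a rough illustration of how the tracing variables fit together, the sketch below reads them with their documented defaults and validates the sampling ratio. The function name `otlp_settings` and the `host:port` endpoint format are assumptions; the gateway may assemble its exporter target differently.

```python
from typing import Mapping


def otlp_settings(env: Mapping[str, str]) -> dict:
    """Hypothetical reader for the tracing variables in the table above."""
    ratio = float(env.get("VR_TRACING_SAMPLE_RATIO", "0.01"))
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("VR_TRACING_SAMPLE_RATIO must be within [0.0, 1.0]")
    host = env.get("VR_OPENTELEMETRY_HOST", "otel-collector")
    port = int(env.get("VR_OPENTELEMETRY_PORT", "4317"))
    return {
        "service_name": env.get("VR_OTEL_SERVICE_NAME", "vllm-responses"),
        # Assumed endpoint format for a gRPC OTLP collector target.
        "endpoint": f"{host}:{port}",
        "sample_ratio": ratio,
    }
```

With the default ratio of `0.01`, roughly one trace in a hundred is sampled, so raising it temporarily can help when debugging a low-traffic deployment.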

Example Configurations

Local Development (Default)

export VR_DB_PATH="sqlite+aiosqlite:///vllm_responses.db"
vllm-responses serve --upstream http://127.0.0.1:8000/v1

Production with PostgreSQL & Redis

export VR_DB_PATH="postgresql+asyncpg://user:pass@db-host:5432/vllm_responses"
export VR_RESPONSE_STORE_CACHE=1
export VR_REDIS_HOST="redis-host"
vllm-responses serve \
  --upstream http://vllm-service:8000/v1 \
  --gateway-workers 8

Enable Built-in MCP

vllm-responses serve \
  --upstream http://127.0.0.1:8000/v1 \
  --mcp-config /etc/vllm-responses/mcp.json

Integrated Mode With Built-in MCP

vllm serve meta-llama/Llama-3.2-3B-Instruct \
  --responses \
  --responses-mcp-config /etc/vllm-responses/mcp.json