Configuration Reference
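The gateway is configured through CLI flags and environment variables, each with its own scope:
- CLI flags own operator-facing runtime topology (upstream selection, bind address, worker counts) and helper wiring on supported entrypoints.
- Environment variables own deployment-scoped settings, secrets, and process-level integrations.
All gateway environment variables are prefixed with `VR_`. The one documented exception is `EXA_API_KEY`, which is passed through to the Exa MCP helper.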
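Because gateway settings share the `VR_` prefix, one quick way to audit what a process will inherit is to list those names. The sketch below defines a hypothetical helper, `show_vr_env`, that prints variable names only, so secrets such as `VR_OPENAI_API_KEY` are never echoed. `EXA_API_KEY` falls outside the prefix and is not listed.

```shell
# Hypothetical helper: list the VR_-prefixed settings visible to this shell.
# Only names are printed, never values, so secrets stay out of logs.
show_vr_env() {
  # env emits NAME=VALUE lines; keep just the NAME part of VR_ entries.
  env | sed -n 's/^\(VR_[A-Za-z0-9_]*\)=.*/\1/p' | sort
}
```

Run `show_vr_env` in the same shell (or service unit) that launches the gateway to confirm which settings it will see.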
Core Configuration
| Variable | Description | Default |
|---|---|---|
| `VR_MAX_CONCURRENCY` | Gunicorn/Uvicorn concurrency limit for standalone gateway startup. | `300` |
| `VR_LOG_TIMINGS` | Enable logging of request timings and overhead. | `False` |
| `VR_LOG_MODEL_MESSAGES` | Enable logging of model-facing messages for debugging. | `False` |
| `VR_OPENAI_API_KEY` | Upstream bearer token, used when the gateway or proxy path must authenticate to the upstream. | (unset) |
Notes:
- Supported entrypoints use CLI flags for upstream selection, bind address, worker count, and helper wiring.
- Integrated mode (`vllm serve --responses`) uses vLLM's native `--host`/`--port` flags and requires a single API server.
- `VR_MAX_CONCURRENCY` applies to direct standalone startup paths, which are used mainly for development and tests.
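As a sketch of the split between the environment and the command line, the snippet below exports deployment-scoped settings and wraps the two startup styles in functions. `UPSTREAM_URL`, `GATEWAY_WORKERS`, and `MODEL` are hypothetical helper variables, and `replace-me` is a placeholder token. Only flags that appear elsewhere on this page are used.

```shell
# Deployment-scoped settings and secrets live in the environment.
export VR_LOG_TIMINGS=1
# Placeholder only: inject the real upstream token from your secret store.
export VR_OPENAI_API_KEY="${VR_OPENAI_API_KEY:-replace-me}"

# Standalone entrypoint: upstream and worker count are CLI flags.
serve_standalone() {
  vllm-responses serve \
    --upstream "${UPSTREAM_URL:-http://127.0.0.1:8000/v1}" \
    --gateway-workers "${GATEWAY_WORKERS:-4}"
}

# Integrated mode: vLLM's native --host/--port, single API server.
serve_integrated() {
  vllm serve "${MODEL:?set MODEL to the model to serve}" --responses \
    --host 0.0.0.0 --port 8000
}
```

Keeping topology in flags and secrets in the environment means the same exported settings can be reused whichever entrypoint is chosen.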
Storage Configuration
| Variable | Description | Default |
|---|---|---|
| `VR_DB_PATH` | Database connection string. Use `sqlite+aiosqlite:///` or `postgresql+asyncpg://`. | `sqlite+aiosqlite:///vllm_responses.db` |
| `VR_RESPONSE_STORE_CACHE` | Enable Redis caching for the ResponseStore. | `False` |
| `VR_RESPONSE_STORE_CACHE_TTL_SECONDS` | Cache TTL in seconds. | `3600` |
| `VR_REDIS_HOST` | Redis host (used when the cache is enabled). | `localhost` |
| `VR_REDIS_PORT` | Redis port. | `6379` |
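The storage settings can be combined as in the sketch below, which enables the Redis-backed ResponseStore cache with a shorter TTL and defines a hypothetical pre-flight helper, `check_db_path`, that rejects connection strings not using one of the two documented async drivers. The TTL and hostnames are illustrative values, not recommendations.

```shell
# Durable storage: the documented SQLite default unless already set.
export VR_DB_PATH="${VR_DB_PATH:-sqlite+aiosqlite:///vllm_responses.db}"
# Redis-backed ResponseStore cache with a 10-minute TTL (example value).
export VR_RESPONSE_STORE_CACHE=1
export VR_RESPONSE_STORE_CACHE_TTL_SECONDS=600
export VR_REDIS_HOST=localhost
export VR_REDIS_PORT=6379

# Hypothetical pre-flight check: accept only the two documented async drivers.
check_db_path() {
  case "${1-}" in
    sqlite+aiosqlite:///*|postgresql+asyncpg://*) return 0 ;;
    *)
      echo "unsupported VR_DB_PATH driver: ${1:-<unset>}" >&2
      return 1
      ;;
  esac
}
```

For example, `check_db_path "$VR_DB_PATH" && vllm-responses serve --upstream http://127.0.0.1:8000/v1` catches a plain `postgresql://` URL before the gateway starts.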
Code Interpreter Configuration
| Variable | Description | Default |
|---|---|---|
| `VR_PYODIDE_CACHE_DIR` | Directory for the Pyodide runtime cache (downloaded and extracted files). | (see docs) |
| `VR_CODE_INTERPRETER_DEV_BUN_FALLBACK` | Development only: if `1`, allow a Bun fallback when no bundled binary is available. | `0` |
Notes:
- Supported entrypoints use CLI flags for code-interpreter mode, port, workers, and startup timeout.
- The worker count selects the execution path:
  - `0` (default) runs in-process (no Bun Workers), so execution is single-threaded.
  - `1` enables the WorkerPool path but does not add parallelism (useful mainly to validate worker mode).
  - `2`+ enables parallel execution via Bun Workers (experimental).
- Each worker initializes its own Pyodide runtime, so RAM usage and startup time scale with the worker count.
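The sketch below sets the two code-interpreter environment variables and encodes the documented worker-count semantics in a hypothetical helper, `describe_ci_workers`. The cache path is an example location, not the default. The worker count itself is still passed through the entrypoint's CLI flag, whose exact name is not listed on this page.

```shell
# Keep the Pyodide download/extract cache on a persistent path (example location).
export VR_PYODIDE_CACHE_DIR="${XDG_CACHE_HOME:-$HOME/.cache}/vllm-responses/pyodide"
# Production setting: no bun fallback when the bundled binary is missing.
export VR_CODE_INTERPRETER_DEV_BUN_FALLBACK=0

# Hypothetical helper: summarize the execution path for a worker count.
describe_ci_workers() {
  case "$1" in
    ''|*[!0-9]*) echo "invalid worker count: $1" >&2; return 1 ;;
    0) echo "in-process, single-threaded (no Bun Workers)" ;;
    1) echo "WorkerPool path, no added parallelism" ;;
    *) echo "parallel Bun Workers (experimental): $1 Pyodide runtimes" ;;
  esac
}
```

Because each worker loads its own Pyodide runtime, a persistent cache directory mostly pays off in startup time when several workers initialize.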
MCP Configuration (Built-in + Remote)
| Variable | Description | Default |
|---|---|---|
| `VR_MCP_REQUEST_REMOTE_ENABLED` | Enable Remote MCP (`tools[].mcp.server_url`) handling. | `True` |
| `VR_MCP_REQUEST_REMOTE_URL_CHECKS` | Enable Remote MCP URL policy checks (`https` scheme, host denylist). | `True` |
| `VR_MCP_HOSTED_STARTUP_TIMEOUT_SEC` | Built-in MCP startup/discovery timeout in seconds (applies to all hosted servers). | `10` |
| `VR_MCP_HOSTED_TOOL_TIMEOUT_SEC` | Built-in MCP call timeout in seconds (applies to all hosted servers). | `60` |
| `EXA_API_KEY` | Optional Exa API key appended to the shipped `exa_mcp` helper URL when that profile is enabled. | (unset) |
Built-in MCP enablement is CLI-owned on supported entrypoints:
- `vllm-responses serve --mcp-config /path/to/mcp.json [--mcp-port PORT]`
- `vllm serve ... --responses --responses-mcp-config /path/to/mcp.json [--responses-mcp-port PORT]`
Remote-upstream supervisor readiness controls are also CLI-owned:
- `vllm-responses serve --upstream-ready-timeout SECONDS`
- `vllm-responses serve --upstream-ready-interval SECONDS`
If the MCP config flag is omitted, Built-in MCP is disabled.
If VR_MCP_REQUEST_REMOTE_ENABLED=false, Remote MCP declarations are rejected while Built-in MCP remains available.
If VR_MCP_REQUEST_REMOTE_URL_CHECKS=false, gateway URL policy checks are fully disabled for Remote MCP declarations.
For the canonical `mcp.json` examples (URL and stdio styles), see MCP Examples → Built-in MCP Runtime Config.
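For example, a deployment that only trusts operator-configured tools can turn off request-declared Remote MCP while keeping Built-in MCP, raise the hosted timeouts, and give a remote upstream more time to become ready. The function name, upstream URL, config path, and timeout values below are illustrative.

```shell
# Reject request-declared Remote MCP servers; Built-in MCP stays available.
export VR_MCP_REQUEST_REMOTE_ENABLED=false
# Allow slower Built-in MCP servers (documented defaults: 10 s and 60 s).
export VR_MCP_HOSTED_STARTUP_TIMEOUT_SEC=30
export VR_MCP_HOSTED_TOOL_TIMEOUT_SEC=120

# Built-in MCP only, with a longer readiness window for a remote upstream.
serve_builtin_mcp_only() {
  vllm-responses serve \
    --upstream "${UPSTREAM_URL:-http://vllm-service:8000/v1}" \
    --mcp-config /etc/vllm-responses/mcp.json \
    --upstream-ready-timeout 300 \
    --upstream-ready-interval 5
}
```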
Notes:
- Labels under `mcpServers` are the request-visible `server_label` values.
- `EXA_API_KEY` is not a `VR_`-prefixed gateway setting because it is passed directly through to the upstream Exa MCP helper contract.
- Built-in MCP supports two server entry shapes:
  - URL-based HTTP: `url` (required, accepts `http://` or `https://`), `headers` (optional), `transport` (optional).
  - Command-style stdio: `command` (required), `args`/`env`/`cwd` (optional); `transport` is optional but may only be `"stdio"`.
- Nested `transport` objects are rejected (for example, `"transport": {"type": "stdio", ...}`).
- `transport: "stdio"` without command-style keys is rejected.
- Mixing HTTP and stdio keys in one entry (for example, `command` + `url`) is rejected.
- Hosted startup and tool timeouts are configured globally with `VR_MCP_HOSTED_STARTUP_TIMEOUT_SEC` and `VR_MCP_HOSTED_TOOL_TIMEOUT_SEC`.
- Unknown non-runtime server fields are forwarded to FastMCP.
- In supported entrypoints, Built-in MCP always binds on loopback; the CLI port flags control only the port.
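The entry-shape rules above can be seen in a minimal `mcp.json` sketch with one URL-based HTTP entry and one command-style stdio entry. The labels (`search`, `local_tools`), URL, header value, and `my_mcp_server` module are hypothetical; the canonical examples remain in MCP Examples.

```shell
# Write a sample Built-in MCP config (default location: a temp directory).
# "search" uses the URL-based HTTP shape; "local_tools" uses command-style stdio.
MCP_CONFIG="${MCP_CONFIG:-${TMPDIR:-/tmp}/mcp.json}"
cat > "$MCP_CONFIG" <<'EOF'
{
  "mcpServers": {
    "search": {
      "url": "https://mcp.example.com/mcp",
      "headers": {"Authorization": "Bearer REPLACE_ME"}
    },
    "local_tools": {
      "command": "python",
      "args": ["-m", "my_mcp_server"],
      "env": {"LOG_LEVEL": "info"},
      "transport": "stdio"
    }
  }
}
EOF
```

Pass the file with `vllm-responses serve --upstream http://127.0.0.1:8000/v1 --mcp-config "$MCP_CONFIG"`; requests would then see `search` and `local_tools` as `server_label` values. Adding a `url` key to the `local_tools` entry would mix HTTP and stdio keys and be rejected.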
Observability Configuration
| Variable | Description | Default |
|---|---|---|
| `VR_METRICS_ENABLED` | Enable Prometheus-compatible metrics and the `GET /metrics` endpoint. | `True` |
| `VR_METRICS_PATH` | Metrics endpoint path. | `/metrics` |
| `VR_TRACING_ENABLED` | Enable OpenTelemetry tracing (OTLP gRPC exporter). | `False` |
| `VR_OTEL_SERVICE_NAME` | Service name used in OpenTelemetry resources. | `vllm-responses` |
| `VR_TRACING_SAMPLE_RATIO` | Trace sampling ratio in [0.0, 1.0] (ratio-based). | `0.01` |
| `VR_OPENTELEMETRY_HOST` | OTLP endpoint host (gRPC). | `otel-collector` |
| `VR_OPENTELEMETRY_PORT` | OTLP endpoint port (gRPC). | `4317` |
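A sketch of enabling tracing alongside metrics, with a hypothetical `valid_sample_ratio` helper that checks the ratio falls in [0.0, 1.0] before startup. The service name and 5% sampling ratio are example values; the collector host and port shown are the documented defaults.

```shell
# Turn on tracing next to the metrics endpoint (metrics are already on by default).
export VR_METRICS_ENABLED=1
export VR_TRACING_ENABLED=1
export VR_OTEL_SERVICE_NAME=vllm-responses-prod   # example name
export VR_TRACING_SAMPLE_RATIO=0.05                # sample roughly 5% of traces
export VR_OPENTELEMETRY_HOST=otel-collector        # documented default
export VR_OPENTELEMETRY_PORT=4317                  # documented default (OTLP gRPC)

# Hypothetical pre-flight check: ratio must be a plain number in [0.0, 1.0].
valid_sample_ratio() {
  case "$1" in
    ''|.|*[!0-9.]*|*.*.*) return 1 ;;   # empty, bare dot, non-numeric, or two dots
  esac
  awk -v r="$1" 'BEGIN { r += 0; exit !(r >= 0 && r <= 1) }'
}
```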
Example Configurations
Local Development (Default)
export VR_DB_PATH="sqlite+aiosqlite:///vllm_responses.db"
vllm-responses serve --upstream http://127.0.0.1:8000/v1
Production with PostgreSQL & Redis
export VR_DB_PATH="postgresql+asyncpg://user:pass@db-host:5432/vllm_responses"
export VR_RESPONSE_STORE_CACHE=1
export VR_REDIS_HOST="redis-host"
vllm-responses serve \
--upstream http://vllm-service:8000/v1 \
--gateway-workers 8
Enable Built-in MCP
vllm-responses serve \
--upstream http://127.0.0.1:8000/v1 \
--mcp-config /etc/vllm-responses/mcp.json