Configuration Guide

Configure the gateway's database, caching, workers, and service architecture for your deployment needs.

Overview

This guide covers configuration options for:

  • Storage backend (SQLite vs PostgreSQL)
  • Worker processes (single vs multiple)
  • Response caching optimization (optional Redis integration)
  • Service architecture (single-command runtime vs disaggregated)

For complete environment variable reference, see Configuration Reference.

Storage Backend

The gateway stores conversation state for previous_response_id functionality. Choose the storage backend that fits your deployment model.

Stored continuation anchors include terminal responses with status="completed" and status="incomplete" (when store=true).
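As an illustration, a continuation request against the gateway might look like the following curl sketch. The /v1/responses path, the gateway port 8000, the model name, and the response ID resp_abc123 are all assumptions for this example, not values defined in this guide.

```shell
# Illustrative continuation request; endpoint path, port, model name, and
# response ID are placeholders, not values defined by this guide
curl -s http://127.0.0.1:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "input": "Continue from where we left off.",
        "store": true,
        "previous_response_id": "resp_abc123"
      }'
```

Because store=true, the new response becomes a continuation anchor that later requests can reference in turn.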

SQLite (Default)

Zero-configuration storage using a local SQLite database file.

# Default - no configuration needed
vllm-responses serve --upstream http://127.0.0.1:8457

Characteristics:

  • Zero setup required
  • Single file database (vllm_responses.db)
  • Works with multiple workers on the same machine (uses WAL mode)
  • Does NOT work across multiple machines

PostgreSQL

Required for multi-machine deployments and high-availability scenarios.

export VR_DB_PATH="postgresql+asyncpg://user:password@db-host:5432/vllm_responses"
vllm-responses serve --upstream http://127.0.0.1:8457

Migration notes (moving from SQLite to PostgreSQL):

  1. Set VR_DB_PATH to your PostgreSQL connection string
  2. Restart the gateway; tables are created automatically on first startup
  3. Existing SQLite data is NOT migrated automatically
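The migration steps above can be sketched as a shell sequence; the connection credentials and host are placeholders.

```shell
# 1. Point the gateway at PostgreSQL (placeholder credentials and host)
export VR_DB_PATH="postgresql+asyncpg://user:password@db-host:5432/vllm_responses"

# 2. Restart the gateway; tables are created automatically on startup
vllm-responses serve --upstream http://127.0.0.1:8457

# 3. Note: rows in the old vllm_responses.db SQLite file are not carried over
```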

Worker Configuration

Control gateway throughput by adjusting the number of worker processes.

Single Worker (Default)

The default configuration runs one worker process.

vllm-responses serve --upstream http://127.0.0.1:8457

When this is sufficient:

  • Local development
  • Low to moderate traffic (<100 concurrent requests)
  • Testing and experimentation

Multiple Workers

Increase concurrency by running multiple worker processes.

vllm-responses serve --gateway-workers 4 --upstream http://127.0.0.1:8457

What this does:

  • Handles more concurrent requests
  • Utilizes multiple CPU cores
  • Each worker shares the same database

Compatibility notes:

  • SQLite: Works fine with multiple workers on the same machine (uses WAL mode for concurrent access)
  • PostgreSQL: Required for multiple workers across multiple machines (Kubernetes, multi-VM setups)
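Putting the two compatibility notes together, a multi-machine deployment would run the same command on every node against a shared PostgreSQL instance; the host and credentials below are placeholders.

```shell
# Run on each node; all workers on all machines share one PostgreSQL database
export VR_DB_PATH="postgresql+asyncpg://user:password@db-host:5432/vllm_responses"
vllm-responses serve --gateway-workers 4 --upstream http://127.0.0.1:8457
```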

Response Caching Optimization (Optional)

Add Redis caching to reduce database load for previous_response_id lookups.

Configuration

export VR_RESPONSE_STORE_CACHE=1
export VR_REDIS_HOST=localhost
export VR_REDIS_PORT=6379
export VR_RESPONSE_STORE_CACHE_TTL_SECONDS=3600  # 1 hour

vllm-responses serve --upstream http://127.0.0.1:8457

How It Works

Recent responses are cached in Redis. When a request includes previous_response_id, the gateway checks Redis first before querying the database. This significantly reduces database load and latency for active conversations.

Performance impact:

  • Cache hits: fast retrieval
  • Reduces database connection pool pressure
  • Especially beneficial with PostgreSQL over network
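Before enabling the cache, it can help to confirm the gateway host can actually reach Redis. This sketch assumes redis-cli is installed and that the exports from the configuration above are set.

```shell
# Sanity check: a reachable Redis server answers PING with PONG
redis-cli -h "$VR_REDIS_HOST" -p "$VR_REDIS_PORT" ping
```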

MCP Configuration (Optional)

Enable Built-in MCP by providing a runtime config file and setting VR_MCP_CONFIG_PATH.

Minimal Setup

export VR_MCP_CONFIG_PATH="/etc/vllm-responses/mcp.json"

vllm-responses serve --upstream http://127.0.0.1:8457

For mcp.json examples (URL + stdio styles), see MCP Examples -> Built-in MCP Runtime Config.

Operational Notes

  • If VR_MCP_CONFIG_PATH is unset, Built-in MCP is disabled.
  • With vllm-responses serve, Built-in MCP runs in a singleton internal runtime process shared by all gateway workers.
  • The supervisor injects VR_MCP_BUILTIN_RUNTIME_URL for gateway workers automatically.
  • Built-in MCP startup and call timeouts are configured globally:
    • VR_MCP_HOSTED_STARTUP_TIMEOUT_SEC
    • VR_MCP_HOSTED_TOOL_TIMEOUT_SEC
  • Runtime discovery endpoints:
    • GET /v1/mcp/servers
    • GET /v1/mcp/servers/{server_label}/tools
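The discovery endpoints above can be exercised with curl against a running gateway. The gateway address and port, and the my_tools server label, are assumptions for illustration.

```shell
# List configured Built-in MCP servers (gateway address is an assumption)
curl -s http://127.0.0.1:8000/v1/mcp/servers

# List tools for one server; "my_tools" is a hypothetical server_label
curl -s http://127.0.0.1:8000/v1/mcp/servers/my_tools/tools
```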

Remote MCP Gate

Remote MCP declarations (tools[].type="mcp" with server_url) are enabled by default. To disable them:

export VR_MCP_REQUEST_REMOTE_ENABLED=false

When disabled, any Remote MCP declaration is rejected as a request-level policy error. Built-in MCP mode is unaffected.

Remote MCP URL Policy Checks

Gateway URL policy checks for Remote MCP are enabled by default.

export VR_MCP_REQUEST_REMOTE_URL_CHECKS=true

Set to false to bypass gateway-side URL validation checks.

export VR_MCP_REQUEST_REMOTE_URL_CHECKS=false

Warning: disabling URL checks increases SSRF and unsafe-endpoint risk and should only be used in tightly controlled environments.

Service Architecture Patterns

The gateway can run in different architectural configurations depending on your scaling and operational needs.

Single-Command Runtime (Default)

The serve command runs the gateway with managed local components by default.

vllm-responses serve -- meta-llama/Llama-3.2-3B-Instruct --port 8457

Components:

  • vLLM subprocess
  • Gateway (1+ workers)
  • Code interpreter subprocess
  • Built-in MCP integration (optional, when VR_MCP_CONFIG_PATH is set)
    • runs as a singleton loopback runtime process shared by all gateway workers

Disaggregated

Run each component separately for flexibility and independent scaling.

Gateway + External vLLM

Use an existing vLLM deployment or scale inference separately from the gateway.

# Somewhere else: vLLM is already running
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8457

# Gateway points to external vLLM
vllm-responses serve --upstream http://127.0.0.1:8457

When to use:

  • Scaling inference and the gateway independently
  • Using existing vLLM infrastructure
  • Avoiding model reload when restarting gateway

Configuration Quick Reference

Configuration           Command / Environment
----------------------  ----------------------------------------------------
Database (PostgreSQL)   export VR_DB_PATH="postgresql+asyncpg://..."
Multiple workers        --gateway-workers 4
Redis cache             export VR_RESPONSE_STORE_CACHE=1
Built-in MCP config     export VR_MCP_CONFIG_PATH="/path/mcp.json"
Remote MCP              export VR_MCP_REQUEST_REMOTE_ENABLED=false
Remote URL checks       export VR_MCP_REQUEST_REMOTE_URL_CHECKS=false
External vLLM           --upstream http://vllm:8000

Next Steps