Codex CLI
Use vllm-responses as the model provider for Codex CLI.
Prerequisites
- A running vllm-responses gateway. See Quickstart.
- Codex CLI installed (codex on $PATH).
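A quick way to confirm the second prerequisite is to ask the shell whether it can resolve the binary. This is a minimal sketch; the helper name require_cmd is hypothetical and only wraps the standard `command -v` builtin.

```shell
# Hypothetical helper: succeed only if the named command is on $PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if require_cmd codex; then
  echo "codex found at $(command -v codex)"
else
  echo "codex is not on \$PATH" >&2
fi
```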
1. Gateway
Start the gateway with the --codex-approval-model flag so Codex's
guardian auto-review feature works:
Replace the model name with whatever your backend exposes.
2. Codex Config
Add the provider to ~/.codex/config.toml
(or $CODEX_HOME/config.toml):
[model_providers.vllm-responses]
name = "vllm-responses"
base_url = "http://127.0.0.1:8457/v1"
wire_api = "responses"
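To try the provider without editing an existing ~/.codex/config.toml, one option is to point $CODEX_HOME at a scratch directory that holds only this block. The sketch below assumes that approach; the helper name write_provider_config and the use of mktemp are illustrative choices, and the config values are copied from the snippet above. A fresh $CODEX_HOME carries none of your other Codex settings, so treat it as a test setup only.

```shell
# Hypothetical helper: write the provider block into <dir>/config.toml.
write_provider_config() {
  dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/config.toml" <<'EOF'
[model_providers.vllm-responses]
name = "vllm-responses"
base_url = "http://127.0.0.1:8457/v1"
wire_api = "responses"
EOF
}

# Use a scratch directory so an existing ~/.codex/config.toml is left alone.
scratch="$(mktemp -d)"
write_provider_config "$scratch"
export CODEX_HOME="$scratch"
```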
3. Run Codex
# -c model_context_window makes Codex's auto-compaction threshold follow the model's context length.
codex --disable image_generation \
-c model_provider=vllm-responses \
-m Qwen/Qwen3.6-35B-A3B \
-c model_context_window=262144
Compaction
The gateway's implementation of POST /v1/responses/compact is not ready.
However, Codex falls back to its own compaction method for custom
providers: it sends a normal POST /v1/responses with "tools": []
asking the model to summarize prior turns. The gateway accepts this,
so long interactive sessions still work.
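The shape of that fallback request can be approximated by hand. The sketch below is an assumption-laden illustration: the helper name compaction_body is hypothetical, the summary prompt is made up rather than the text Codex actually sends, and a real request carries the prior turns in "input" instead of a single string. Only the endpoint, the base URL, and the empty "tools" array come from the description above.

```shell
# Hypothetical helper: print a Responses API body that asks for a summary.
# The prompt text is illustrative, not the prompt Codex actually sends.
compaction_body() {
  model="$1"
  printf '{"model": "%s", "input": "Summarize the conversation so far.", "tools": []}\n' "$model"
}

# To send it, the gateway from step 1 must be running:
# compaction_body "Qwen/Qwen3.6-35B-A3B" | curl -sS http://127.0.0.1:8457/v1/responses \
#   -H 'Content-Type: application/json' --data-binary @-
```

The curl call is left commented out so the block runs without a gateway; uncomment it to check that the gateway accepts a tool-less request.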