Stateful Conversations
One of the most powerful features of the Responses API is the ability to maintain conversation state on the server.
Overview
In the standard Chat Completions API, the client is responsible for managing the entire conversation history. Every new request must include the full list of previous messages (messages=[...]). This is bandwidth-intensive and requires complex client-side state management.
The Responses API introduces statefulness via the previous_response_id parameter.
How It Works
- Initial Request: You send a request with your initial input (e.g., a user message).
- Storage: The gateway generates a response and stores the entire conversation context (including your input and its output) in its database for final responses (completed and incomplete, when store=true).
- Continuation: When you want to reply, you send only your new input and the previous_response_id from the last response.
- Rehydration: The gateway looks up the previous response, reconstructs the full history, and sends it to the model.
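The four steps above can be sketched with an in-memory dictionary standing in for the gateway's database. This is only an illustration of the store-then-rehydrate idea, not the gateway's actual implementation: InMemoryResponseStore, handle_request, and fake_model are hypothetical names introduced here, and the real ResponseStore may organize its data differently.

```python
import uuid


class InMemoryResponseStore:
    """Hypothetical stand-in for the gateway's database (illustration only)."""

    def __init__(self):
        self._contexts = {}  # response ID -> full conversation context

    def save(self, context):
        """Store a copy of the full context and return a new response ID."""
        response_id = f"resp_{uuid.uuid4().hex}"
        self._contexts[response_id] = list(context)
        return response_id

    def rehydrate(self, previous_response_id, new_input):
        """Rebuild the full history: stored context (if any) plus the new input."""
        if previous_response_id is None:
            return list(new_input)
        if previous_response_id not in self._contexts:
            raise KeyError(f"unknown response id: {previous_response_id}")
        return self._contexts[previous_response_id] + list(new_input)


def handle_request(store, model_fn, new_input, previous_response_id=None, store_response=True):
    """Rehydrate, call the model, and optionally store the resulting context."""
    history = store.rehydrate(previous_response_id, new_input)
    output = model_fn(history)
    response_id = store.save(history + [output]) if store_response else None
    return response_id, output


def fake_model(history):
    """Placeholder model: reports how many messages it was shown."""
    return {"role": "assistant", "content": f"seen {len(history)} messages"}


store = InMemoryResponseStore()
first_id, first_out = handle_request(
    store, fake_model, [{"role": "user", "content": "My name is Alice."}]
)
# The second request carries only the new input plus the previous ID.
second_id, second_out = handle_request(
    store,
    fake_model,
    [{"role": "user", "content": "What is my name?"}],
    previous_response_id=first_id,
)
```

In this sketch the second call sees three messages (the first user message, the first output, and the new input) even though the caller sent only one.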
Example Flow
Step 1: Start the conversation
# No previous ID provided
response_1 = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input=[{"role": "user", "content": "My name is Alice."}]
)
print(response_1.output[0].content)
# "Hello Alice!"
print(response_1.id)
# "resp_01J..."
Step 2: Continue the conversation
# Pass the ID from Step 1
response_2 = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    previous_response_id=response_1.id,
    input=[{"role": "user", "content": "What is my name?"}]
)
print(response_2.output[0].content)
# "Your name is Alice."
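The two steps generalize to any number of turns: keep the last response ID and pass it with each new input. The helper below is a minimal sketch of that loop. run_conversation is a hypothetical name, and the code assumes only what the example above shows: a client object exposing responses.create(...) that returns an object with an id attribute.

```python
def run_conversation(client, model, user_messages):
    """Send each user message as its own request, chained by response ID."""
    previous_id = None
    responses = []
    for text in user_messages:
        kwargs = {
            "model": model,
            "input": [{"role": "user", "content": text}],
        }
        # Only the first request omits previous_response_id.
        if previous_id is not None:
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        responses.append(response)
        previous_id = response.id
    return responses
```

Each request carries a single new message, so the payload size stays constant no matter how long the conversation grows.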
Storage Backends
Statefulness is powered by the ResponseStore.
- Development: By default, vLLM Responses uses a local SQLite database (vllm_responses.db). This works great for local setups and single-machine deployments.
- Production: For multi-machine deployments or high-traffic production, you should configure a PostgreSQL database.
See Configuration Reference and Configuration Guide for details.
Security & Lifecycle
- Capability-Based Access: The response_id acts as a capability token. Anyone who possesses the ID can continue the conversation. Treat these IDs as secrets (like session tokens).
- Persistence: By default, responses are stored indefinitely (or until an expiration policy is configured/implemented).
- store Parameter: You can control whether a response is stored using the store parameter (default: true). If store=false, the response cannot be used as a previous_response_id later.
- Terminal Statuses: Stored terminal responses include both completed and incomplete. Non-terminal and failed states are not continuation anchors.
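A client can apply the last two rules before attempting a continuation. The check below is a small sketch: is_continuation_anchor is a hypothetical helper, not part of the gateway or any SDK. The status names completed, incomplete, and failed come from the text above, while in_progress is an assumed example of a non-terminal status.

```python
# Statuses the text above names as valid continuation anchors.
ANCHOR_STATUSES = frozenset({"completed", "incomplete"})


def is_continuation_anchor(status, stored=True):
    """Return True if a response with this status and store flag can be continued.

    Hypothetical client-side check; the gateway performs its own validation.
    """
    return bool(stored) and status in ANCHOR_STATUSES


checks = {
    "completed, stored": is_continuation_anchor("completed"),
    "incomplete, stored": is_continuation_anchor("incomplete"),
    "failed, stored": is_continuation_anchor("failed"),
    "in_progress, stored (assumed status name)": is_continuation_anchor("in_progress"),
    "completed, store=false": is_continuation_anchor("completed", stored=False),
}
```

Only the first two entries in checks evaluate to True. A response created with store=false is rejected regardless of its status.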