# Quickstart

Get your Responses API gateway running in under 5 minutes.
## Prerequisites

- Completed Installation
## 1. Start the Gateway

Launch the gateway using the command from the Installation guide. You should see output indicating the server is running at `http://127.0.0.1:5969`.
## 2. Send a Request

Now, send a request to the Responses API endpoint (`/v1/responses`).
Streaming:

```bash
curl -X POST http://127.0.0.1:5969/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "input": [{"role": "user", "content": "Calculate the factorial of 5"}],
    "stream": true,
    "tools": [{"type": "code_interpreter"}],
    "include": ["code_interpreter_call.outputs"]
  }'
```
Non-streaming:

```bash
curl -X POST http://127.0.0.1:5969/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "input": [{"role": "user", "content": "Calculate the factorial of 5"}],
    "tools": [{"type": "code_interpreter"}],
    "include": ["code_interpreter_call.outputs"]
  }'
```
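The streaming and non-streaming requests differ only in the `stream` flag. As a sketch, the shared body can be built programmatically (field values taken from the curl examples above):

```python
import json

def build_body(prompt: str, stream: bool = False) -> str:
    """Build the Responses API request body used in the curl examples."""
    body = {
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "input": [{"role": "user", "content": prompt}],
        "tools": [{"type": "code_interpreter"}],
        "include": ["code_interpreter_call.outputs"],
    }
    if stream:
        body["stream"] = True  # opt in to Server-Sent Events
    return json.dumps(body)

print(build_body("Calculate the factorial of 5", stream=True))
```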
Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5969/v1", api_key="dummy")

with client.responses.stream(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input=[{"role": "user", "content": "Calculate the factorial of 5"}],
    tools=[{"type": "code_interpreter"}],
    include=["code_interpreter_call.outputs"],
) as stream:
    for event in stream:
        print(event)
```
## 3. Observe the Response

If you used `"stream": true`, you will see Server-Sent Events (SSE). Unlike standard Chat Completions, the Responses API provides rich lifecycle events:
```text
event: response.created
data: {"response":{...}}

event: response.output_item.added
data: {"output_item":{"type":"message", ...}}

event: response.content_part.added
data: {"part":{"type":"text", "text":""}, ...}

event: response.output_text.delta
data: {"delta":"I am a large language model...", ...}

...

event: response.completed
data: {"response":{...}}
```
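When consuming the stream without an SDK, the usual pattern is to parse each SSE frame and concatenate the `response.output_text.delta` payloads into the final answer. A minimal sketch (event names from the stream above; the `event:`/`data:` framing is standard SSE):

```python
import json

def accumulate_output_text(sse_lines):
    """Collect text deltas from a Responses API SSE stream."""
    text, event = [], None
    for line in sse_lines:
        if line.startswith("event: "):
            event = line[len("event: "):].strip()
        elif line.startswith("data: ") and event == "response.output_text.delta":
            payload = json.loads(line[len("data: "):])
            text.append(payload["delta"])
    return "".join(text)

frames = [
    "event: response.created",
    'data: {"response": {}}',
    "event: response.output_text.delta",
    'data: {"delta": "The factorial of 5 "}',
    "event: response.output_text.delta",
    'data: {"delta": "is 120."}',
    "event: response.completed",
    'data: {"response": {}}',
]
print(accumulate_output_text(frames))  # The factorial of 5 is 120.
```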
## 4. Optional: MCP Smoke Test (Built-in MCP)

If you enabled Built-in MCP (configured `VR_MCP_CONFIG_PATH` and a server label/tool), you can run a minimal forced tool call. If you need the Built-in MCP `mcp.json` format first, see the MCP Integration guide.
```bash
curl -X POST http://127.0.0.1:5969/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "stream": true,
    "input": [{"role": "user", "content": "Use the MCP docs tool to search for migration notes."}],
    "tools": [{"type": "mcp", "server_label": "github_docs"}],
    "tool_choice": {"type": "mcp", "server_label": "github_docs", "name": "search_docs"}
  }'
```
In the stream, you should see MCP lifecycle events such as:

- `response.mcp_call.in_progress`
- `response.mcp_call_arguments.done`
- `response.mcp_call.completed` (or `response.mcp_call.failed`)
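A quick way to eyeball these in a raw stream is to filter event names by their `response.mcp_call` prefix. A sketch, reusing the standard SSE `event:` framing:

```python
def mcp_events(sse_lines):
    """Yield MCP lifecycle event names from raw SSE lines."""
    for line in sse_lines:
        if line.startswith("event: response.mcp_call"):
            yield line[len("event: "):].strip()

frames = [
    "event: response.created",
    'data: {"response": {}}',
    "event: response.mcp_call.in_progress",
    "data: {}",
    "event: response.mcp_call.completed",
    "data: {}",
]
print(list(mcp_events(frames)))
# ['response.mcp_call.in_progress', 'response.mcp_call.completed']
```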
## Next Steps

Now that you have the basic loop working, try the advanced features:

- Code Interpreter: Ask the model to write and execute code.
- Stateful Conversations: Use `previous_response_id` to continue a chat.
- MCP Integration: Use Built-in MCP or Remote MCP declarations.
- Architecture: Learn how the gateway processes your request.
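The stateful-conversation pattern is just one extra field: a follow-up request carries the `id` of the previous response. A sketch with a hypothetical id (a real one is returned in the first response's `id` field):

```python
import json

# Hypothetical id; substitute the "id" field from your first response.
first_response_id = "resp_abc123"

follow_up = {
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "input": [{"role": "user", "content": "Now do the factorial of 6"}],
    "previous_response_id": first_response_id,  # link to the prior turn
}
print(json.dumps(follow_up))
```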