# API Reference
The gateway exposes a single primary endpoint compatible with the OpenAI Responses API.
## Spec Conformance

The gateway implements the OpenResponses specification. This ensures schema validity and correct event ordering.
### OpenAI Responses API Compatibility

The gateway implements the OpenAI Responses API specification as defined in the OpenResponses specification. This includes:

- Response structure: Full `ResponseResource` shape with all required fields
- Streaming events: Complete SSE event lifecycle with correct ordering
- Output items: Support for messages, function calls, reasoning, and built-in tools
- Statefulness: `previous_response_id` for multi-turn conversations
For details on the underlying architecture, see Architecture.
### Gateway Extensions

In addition to OpenResponses contract compatibility, this gateway provides MCP integration:

- Request-time MCP declarations in Built-in MCP mode (`server_label`) and Remote MCP mode (`server_url`)
- Built-in MCP discovery endpoints (`/v1/mcp/servers`, `/v1/mcp/servers/{server_label}/tools`)
- MCP lifecycle stream events (`response.mcp_call.*`) for both Built-in MCP and Remote MCP calls
### MCP Compatibility Matrix (Current)

| Capability | OpenAI Responses API | This Gateway (Current) |
|---|---|---|
| Request-declared remote MCP (`tools[].mcp.server_url`) | Supported | Supported as Remote MCP (URL checks enabled by default; no connector mode) |
| Request-declared connector MCP (`tools[].mcp.connector_id`) | Supported | Not supported |
| `server_label` resolution | Request-local label associated with request-declared MCP target | Built-in MCP mode: gateway-configured label. Remote MCP mode: request-local label. |
| Where MCP transport is defined | Request payload | Built-in MCP: gateway config; Remote MCP: request `server_url` |
| Local stdio MCP servers | Not available via request-declared MCP in OpenAI cloud runtime | Supported in Built-in MCP config (`mcpServers.<label>.command`); not request-declared |
| `require_approval` options | Richer approval forms are documented | `never` only |
This table reflects current behavior in this repository as of 2026-02-28.
## Create Response

```
POST /v1/responses
```

Creates a model response for the given input.
### Request Body

The request body is a JSON object with the following parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The ID of the model to use (e.g., `meta-llama/Llama-3.2-3B-Instruct`). |
| `input` | string or array | Yes | The input to the model. Can be a simple string prompt or a list of message objects. |
| `stream` | boolean | No | If `true`, the response is streamed as Server-Sent Events. Default: `false`. |
| `previous_response_id` | string | No | The ID of a previous response. Used to continue a conversation without re-sending history. |
| `tools` | array | No | A list of tools the model can call. Supports custom functions, built-in tools, and MCP declarations (Built-in MCP or Remote MCP). |
| `tool_choice` | string or object | No | Controls which tool is called. Options: `auto`, `none`, `required`, function selection, or MCP selection. Default: `auto`. |
| `store` | boolean | No | Whether to store the response in the database. Default: `true`. Stored responses can be reused via `previous_response_id`. |
| `include` | array | No | List of additional fields to include in the output (e.g., `code_interpreter_call.outputs`). |
| `temperature` | float | No | Sampling temperature between 0 and 2. Default: `1.0`. |
| `top_p` | float | No | Nucleus sampling probability mass. Default: `1.0`. |
| `max_output_tokens` | integer | No | Maximum number of tokens to generate. |
| `max_tool_calls` | integer | No | Pass-through parity field. Returned in response metadata but not enforced as a hard runtime cap by the gateway. |
| `instructions` | string | No | A system/developer message to guide the model's behavior. Not persisted across `previous_response_id`. |
| `reasoning` | object | No | Configuration for reasoning models (e.g., `effort`). |
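Putting the required fields together, a minimal non-streaming request body might look like this (the model name, prompt, and sampling values are illustrative, not defaults):

```json
{
  "model": "meta-llama/Llama-3.2-3B-Instruct",
  "input": "Write a haiku about rate limits.",
  "temperature": 0.7,
  "max_output_tokens": 128,
  "store": true
}
```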
### Input Item Schema

When `input` is an array, each item can be a Message or a Tool Output.

**Message (User/System):**

**Function Tool Output:**
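As a sketch of these two item shapes (field names assume Responses-style input items; the `call_id` value and output payload are illustrative), an `input` array mixing a user message and a function tool output might look like:

```json
[
  {
    "type": "message",
    "role": "user",
    "content": "What is the weather in Paris?"
  },
  {
    "type": "function_call_output",
    "call_id": "call_abc123",
    "output": "{\"temperature_c\": 18}"
  }
]
```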
### MCP Tools

MCP tools are declared in `tools` using `type: "mcp"`.

- Built-in MCP mode: provide `server_label` only.
- Remote MCP mode: provide `server_label` and `server_url`.
Built-in MCP declaration:

```json
{
  "type": "mcp",
  "server_label": "github_docs",
  "allowed_tools": ["search_docs"],
  "require_approval": "never"
}
```
Remote MCP declaration:

```json
{
  "type": "mcp",
  "server_label": "docs_remote",
  "server_url": "https://mcp.example.com/sse",
  "authorization": "$DOCS_ACCESS_TOKEN",
  "headers": {"X-Docs-Tenant": "acme"},
  "allowed_tools": ["search_docs"],
  "require_approval": "never"
}
```
Force MCP tool choice:
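A plausible `tool_choice` shape for forcing an MCP tool, mirroring the MCP-selection option listed under Request Body (the exact field set is an assumption of this sketch):

```json
{
  "tool_choice": {
    "type": "mcp",
    "server_label": "github_docs"
  }
}
```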
MCP request validation rules include:

- `tool_choice.type="mcp"` requires at least one matching MCP declaration in `tools`.
- Duplicate MCP declarations for the same `server_label` are rejected (including Built-in MCP / Remote MCP cross-mode duplicates).
- Hosted `server_label` values must reference available configured servers.
- Remote MCP URL checks are enabled by default and enforce `https` plus a denylist host policy (`localhost`, `*.localhost`, IP-literal hosts).
- Remote MCP URL checks can be disabled by setting `VR_MCP_REQUEST_REMOTE_URL_CHECKS=false`.
- Remote MCP execution can be disabled via `VR_MCP_REQUEST_REMOTE_ENABLED=false`.
- Remote MCP transport selection is delegated to FastMCP based on the request `server_url` and request headers; the gateway performs no transport inference or fallback logic of its own for Remote MCP declarations.
- Function tool names starting with `mcp__` are reserved and rejected.
- `connector_id` is unsupported and rejected.
- Remote MCP `headers` are supported; Built-in MCP declarations reject `headers`.
- `authorization` is supported only in Remote MCP mode; it is request-scoped and not persisted.
- `require_approval` currently supports `never` only (no interactive approval flow).
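As an example of the duplicate-label rule, a `tools` array like the following would be rejected, since `docs` is declared in both Built-in and Remote mode (label and URL are illustrative):

```json
{
  "tools": [
    { "type": "mcp", "server_label": "docs", "require_approval": "never" },
    { "type": "mcp", "server_label": "docs", "server_url": "https://mcp.example.com/sse" }
  ]
}
```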
MCP failure semantics:

- Remote MCP pre-run inventory/discovery failures are request-level `422` (`bad_input`).
- Runtime MCP tool failures are item-level (`mcp_call.status="failed"` with `mcp_call.error`) and do not by themselves fail the response.
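An item-level failure might then surface in `output` roughly as follows (a sketch built from the fields named above; the exact `error` payload shape is an assumption):

```json
{
  "id": "mcp_01JM...",
  "type": "mcp_call",
  "server_label": "docs_remote",
  "name": "search_docs",
  "status": "failed",
  "error": "tool execution failed"
}
```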
### Response Body (Non-Streaming)

On success, returns a JSON object representing the response.

```json
{
  "id": "resp_01JM...",
  "object": "response",
  "created_at": 1700000000,
  "model": "meta-llama/Llama-3.2-3B-Instruct",
  "status": "completed",
  "output": [
    {
      "id": "msg_01JM...",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "Hello! How can I help you?"
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 10,
    "output_tokens": 8,
    "total_tokens": 18
  }
}
```
### Response Body (Streaming)

See Streaming Events for the full event reference.

When generation stops due to output token limits or content filtering, Responses-style terminal semantics apply:

- `status: "incomplete"`
- `incomplete_details.reason` (for example `max_output_tokens` or `content_filter`)

If `store=true` (the default), both terminal statuses `completed` and `incomplete` are persisted and can be referenced by `previous_response_id`.
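For example, a response truncated by the output token limit would carry fields along these lines (a sketch; other response fields omitted):

```json
{
  "id": "resp_01JM...",
  "object": "response",
  "status": "incomplete",
  "incomplete_details": { "reason": "max_output_tokens" }
}
```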
## MCP Discovery Endpoints

In addition to `POST /v1/responses`, the gateway exposes Built-in MCP discovery endpoints:

- `GET /v1/mcp/servers` - Returns the configured server inventory with availability. Returns an empty list when Built-in MCP is disabled.
- `GET /v1/mcp/servers/{server_label}/tools` - Returns the runtime tool inventory for that server. Returns `404` for unknown server labels, `409` when a known server is currently unavailable, and `503` when the configured Built-in MCP runtime is unreachable.
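An illustrative (not normative) inventory payload for `GET /v1/mcp/servers`, assuming each entry carries a label plus an availability flag; the field names and wrapper key are assumptions of this sketch:

```json
{
  "servers": [
    { "server_label": "github_docs", "available": true },
    { "server_label": "internal_wiki", "available": false }
  ]
}
```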
## Error Responses

Errors follow the standard OpenAI error format.

```json
{
  "error": {
    "message": "Invalid model name",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}
```
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | `invalid_request_error` | Invalid input or parameters. |
| 401 | `authentication_error` | Missing or invalid API key (if auth enabled). |
| 404 | `invalid_request_error` | Unknown `previous_response_id` or model. |
| 422 | `bad_input` | Request contract validation failure. |
| 500 | `api_error` | Internal server error. |