MCP Integration (Built-in MCP + Remote MCP)
The gateway supports two MCP declaration modes in tools:
- Built-in MCP mode: reference a configured server by server_label.
- Remote MCP mode: provide request server_url (and request-scoped auth/headers).
This page focuses on Built-in MCP setup and call flow, then summarizes Remote MCP mode differences.
Choose a Mode
| Mode | Best When | Request Shape |
|---|---|---|
| Built-in MCP | You want centrally managed server inventory and policy | type: "mcp" + server_label |
| Remote MCP | You want to point directly to an MCP endpoint per request | type: "mcp" + server_label + server_url |
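The two request shapes in the table differ only in whether server_url is present. A minimal sketch of that difference follows; mcp_tool is a hypothetical helper, and the docs_remote label and mcp.example.com URL are placeholders rather than values the gateway defines.

```python
from typing import Any, Optional


def mcp_tool(
    server_label: str,
    server_url: Optional[str] = None,
    headers: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Build one entry for the request's tools array (hypothetical helper).

    Without server_url the entry has the Built-in MCP shape; with it,
    the entry has the Remote MCP shape.
    """
    tool: dict[str, Any] = {
        "type": "mcp",
        "server_label": server_label,
        "require_approval": "never",  # the only value currently supported
    }
    if server_url is not None:
        tool["server_url"] = server_url
        if headers:
            tool["headers"] = headers  # request-scoped auth/headers
    return tool


# Built-in MCP: the label must match a configured server.
builtin_tool = mcp_tool("github_docs")

# Remote MCP: placeholder label and URL, supplied per request.
remote_tool = mcp_tool("docs_remote", server_url="https://mcp.example.com/mcp")
```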
What It Solves
- Keep MCP execution inside the gateway request lifecycle.
- Use Responses-style streaming events for MCP call progress/results.
- Reuse response IDs with previous_response_id just like other tool flows.
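As a rough illustration of the last point, a follow-up request might carry the ID of an earlier response. This is a sketch only: follow_up_request is a hypothetical helper, resp_123 is a placeholder ID, and redeclaring the MCP tool on the follow-up turn is an assumption rather than a documented requirement.

```python
from typing import Any


def follow_up_request(previous_response_id: str, user_text: str) -> dict[str, Any]:
    """Build a follow-up payload that continues an earlier response."""
    return {
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "previous_response_id": previous_response_id,
        "input": [{"role": "user", "content": user_text}],
        # Assumption: the MCP tool is declared again on the follow-up turn.
        "tools": [
            {
                "type": "mcp",
                "server_label": "github_docs",
                "require_approval": "never",
            }
        ],
    }


# "resp_123" stands in for the id field of a previous response.
payload = follow_up_request("resp_123", "Summarize the first result.")
```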
Prerequisites
- Configure MCP runtime servers via VR_MCP_CONFIG_PATH.
- Ensure the target server_label is available (GET /v1/mcp/servers).
- Start the gateway with vllm-responses serve so the singleton Built-in MCP runtime is launched.
Built-in MCP Setup
Set the Built-in MCP config path with VR_MCP_CONFIG_PATH, pointing it at your mcp.json file.
mcp.json follows the common MCP client-style shape: a top-level mcpServers object keyed by your server labels.
In most cases, you can copy an MCP server entry from another MCP client config and reuse it here with minimal changes.
For canonical examples (URL + stdio styles), see MCP Examples -> Built-in MCP Runtime Config.
Built-in URL-style entries accept both http:// and https:// URLs. This differs from Remote MCP request URLs, which are policy-checked as https:// by default.
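To make the mcpServers shape concrete, here is a small sketch that builds such a config and serializes it. The url, command, and args field names follow the convention used by common MCP clients and are an assumption about what this gateway accepts; the local_files label, the example URL, and the /srv/docs path are placeholders.

```python
import json

# Assumed client-style shape: a top-level "mcpServers" object keyed by server label.
config = {
    "mcpServers": {
        # URL-style entry: Built-in entries accept http:// and https:// URLs.
        "github_docs": {"url": "https://mcp.example.com/mcp"},
        # stdio-style entry: a local command the runtime would spawn.
        "local_files": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/srv/docs"],
        },
    }
}

# Serialize to the text you would save as mcp.json.
config_text = json.dumps(config, indent=2)
print(config_text)
```

The keys under mcpServers are the values you later pass as server_label in requests.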
Verify server availability before requests:
curl http://127.0.0.1:5969/v1/mcp/servers
curl http://127.0.0.1:5969/v1/mcp/servers/github_docs/tools
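The same two checks can be scripted from Python with the standard library. This sketch assumes both endpoints return JSON; the response schema is not described on this page, so the result is returned unparsed beyond JSON decoding. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:5969"  # same host and port as the curl commands


def servers_url(base_url: str = BASE_URL) -> str:
    """URL that lists the configured Built-in MCP servers."""
    return f"{base_url}/v1/mcp/servers"


def tools_url(server_label: str, base_url: str = BASE_URL) -> str:
    """URL that lists the tools exposed by one server label."""
    return f"{base_url}/v1/mcp/servers/{quote(server_label, safe='')}/tools"


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def main() -> None:
    # Requires a running gateway; not called automatically.
    print(fetch_json(servers_url()))
    print(fetch_json(tools_url("github_docs")))
```

Call main() once the gateway is running to print both responses.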
Runtime architecture note:
- vllm-responses serve starts one internal Built-in MCP runtime process on loopback.
- All gateway workers share that runtime, so Built-in MCP startup/discovery/session state is not duplicated per worker.
Built-in MCP Usage
Use one complete request payload including both MCP declaration and tool choice:
{
  "model": "meta-llama/Llama-3.2-3B-Instruct",
  "stream": true,
  "input": [{"role": "user", "content": "Find migration notes in docs."}],
  "tools": [
    {
      "type": "mcp",
      "server_label": "github_docs",
      "allowed_tools": ["search_docs"],
      "require_approval": "never"
    }
  ],
  "tool_choice": {
    "type": "mcp",
    "server_label": "github_docs",
    "name": "search_docs"
  }
}
cURL
curl -X POST http://127.0.0.1:5969/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "stream": true,
    "input": [{"role":"user","content":"Find migration notes in docs."}],
    "tools": [{"type":"mcp","server_label":"github_docs","allowed_tools":["search_docs"],"require_approval":"never"}],
    "tool_choice": {"type":"mcp","server_label":"github_docs","name":"search_docs"}
  }'
OpenAI Python SDK
from openai import OpenAI

# Point the SDK at the local gateway.
client = OpenAI(base_url="http://127.0.0.1:5969/v1", api_key="dummy")

with client.responses.stream(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input=[{"role": "user", "content": "Find migration notes in docs."}],
    tools=[
        {
            "type": "mcp",
            "server_label": "github_docs",
            "allowed_tools": ["search_docs"],
            "require_approval": "never",
        }
    ],
    tool_choice={"type": "mcp", "server_label": "github_docs", "name": "search_docs"},
) as stream:
    # Print each streamed event type, including the MCP call events.
    for event in stream:
        print(event.type)
MCP Event Lifecycle
Both MCP modes stream these event types:
- response.mcp_call.in_progress
- response.mcp_call_arguments.delta
- response.mcp_call_arguments.done
- response.mcp_call.completed or response.mcp_call.failed
See Events Reference for payload details.
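One way to consume this lifecycle on the client is to reduce the stream of event types to a final outcome. The sketch below works on plain event-type strings so it runs without a gateway; summarize_mcp_events is a hypothetical helper, and the sample sequence is illustrative rather than captured output.

```python
from typing import Iterable, Optional

# Terminal MCP call events mapped to a short outcome label.
TERMINAL = {
    "response.mcp_call.completed": "completed",
    "response.mcp_call.failed": "failed",
}


def summarize_mcp_events(event_types: Iterable[str]) -> dict:
    """Collect MCP call events and report the final outcome, if any."""
    seen: list[str] = []
    outcome: Optional[str] = None
    for event_type in event_types:
        # Both mcp_call.* and mcp_call_arguments.* share this prefix.
        if not event_type.startswith("response.mcp_call"):
            continue
        seen.append(event_type)
        if event_type in TERMINAL:
            outcome = TERMINAL[event_type]
    return {
        "outcome": outcome,  # None if no terminal event was seen
        "argument_deltas": seen.count("response.mcp_call_arguments.delta"),
        "seen": seen,
    }


# Illustrative sequence; a real stream interleaves other response events.
sample = [
    "response.created",
    "response.mcp_call.in_progress",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.done",
    "response.mcp_call.completed",
]
summary = summarize_mcp_events(sample)
```

With the SDK example above, the same helper could be fed the event.type values yielded by the stream.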
Remote MCP Mode Notes
- Built-in MCP requests reference configured servers by server_label only.
- Remote MCP via request server_url does not require server registration in VR_MCP_CONFIG_PATH.
- Remote MCP transport selection is delegated to FastMCP from request server_url and headers.
- require_approval currently supports never only.
- Remote MCP host policy rejects localhost, *.localhost, and IP-literal hosts, and only https is accepted.
- For Remote MCP field compatibility (server_url, connector_id, headers), see API Reference.
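The host policy described above can be approximated client-side to catch rejected URLs before sending a request. This is a sketch of the stated rules only, not the gateway's implementation; remote_url_allowed is a hypothetical helper and the gateway may apply further checks.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_literal(host: str) -> bool:
    """True if host parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def remote_url_allowed(server_url: str) -> bool:
    """Apply the documented rules: https only, no localhost, no IP literals."""
    parts = urlsplit(server_url)
    host = (parts.hostname or "").lower().rstrip(".")
    if parts.scheme != "https" or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    return not is_ip_literal(host)


for url in (
    "https://mcp.example.com/mcp",  # placeholder host, passes the sketch
    "http://mcp.example.com/mcp",   # not https
    "https://api.localhost/mcp",    # *.localhost
    "https://127.0.0.1/mcp",        # IP literal
):
    print(url, remote_url_allowed(url))
```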
For end-to-end examples, see MCP Examples.