MCP Examples (Built-in MCP + Remote MCP)
Use MCP tools through the Responses API in either Built-in MCP mode or Remote MCP mode.
Built-in MCP Runtime Config (mcp.json)
Set VR_MCP_CONFIG_PATH to point to an MCP runtime config file:
Expected shape: top-level mcpServers, with each key as your server_label and each value as one MCP server entry.
This is intentionally close to common MCP client config formats, so you can usually copy existing entries directly.
Canonical example (url + stdio styles):
{
"mcpServers": {
"github_docs": {
"url": "https://mcp.example.com/mcp",
"headers": {
"Authorization": "Bearer <token>"
}
},
"local_fs": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/srv/docs"]
},
"docs_sse": {
"url": "https://mcp.example.com/sse",
"transport": "sse"
}
}
}
Built-in MCP: Discover Available Servers and Tools
Before sending tool requests, inspect runtime availability:
Set VR_MCP_CONFIG_PATH and start with vllm-responses serve so the singleton Built-in MCP runtime is active.
curl http://127.0.0.1:5969/v1/mcp/servers
curl http://127.0.0.1:5969/v1/mcp/servers/github_docs/tools
Built-in MCP: Force an MCP Tool Call
Canonical request payload:
{
"model": "meta-llama/Llama-3.2-3B-Instruct",
"stream": true,
"input": [{"role": "user", "content": "Find migration notes in docs."}],
"tools": [
{
"type": "mcp",
"server_label": "github_docs",
"allowed_tools": ["search_docs"],
"require_approval": "never"
}
],
"tool_choice": {
"type": "mcp",
"server_label": "github_docs",
"name": "search_docs"
}
}
Python SDK equivalent:
response = client.responses.create(
model="meta-llama/Llama-3.2-3B-Instruct",
stream=True,
input=[{"role": "user", "content": "Find migration notes in docs."}],
tools=[
{
"type": "mcp",
"server_label": "github_docs",
"allowed_tools": ["search_docs"],
"require_approval": "never",
}
],
tool_choice={
"type": "mcp",
"server_label": "github_docs",
"name": "search_docs",
},
)
Expected stream events include:
response.mcp_call.in_progressresponse.mcp_call_arguments.deltaresponse.mcp_call_arguments.doneresponse.mcp_call.completedorresponse.mcp_call.failed
Continue with previous_response_id
If store=true (default), the response ID from a terminal response can be reused:
follow_up = client.responses.create(
model="meta-llama/Llama-3.2-3B-Instruct",
previous_response_id=response.id,
input=[{"role": "user", "content": "Summarize that in one sentence."}],
)
Remote MCP Mode: Quick Example
Use Remote MCP mode when you want to declare an MCP endpoint directly in the request instead of using Built-in MCP registry config.
remote = client.responses.create(
model="meta-llama/Llama-3.2-3B-Instruct",
input=[{"role": "user", "content": "Search remote docs for migration notes."}],
tools=[
{
"type": "mcp",
"server_label": "docs_remote",
"server_url": "https://mcp.example.com/sse",
"authorization": "YOUR_TOKEN",
"allowed_tools": ["search_docs"],
"require_approval": "never",
}
],
tool_choice={"type": "mcp", "server_label": "docs_remote", "name": "search_docs"},
)