MCP Integration (Built-in MCP + Remote MCP)

The gateway supports two MCP declaration modes in tools:

  • Built-in MCP mode: reference a configured server by server_label.
  • Remote MCP mode: provide a server_url in the request (plus any request-scoped auth/headers).

This page focuses on Built-in MCP setup and call flow, then summarizes Remote MCP mode differences.

Choose a Mode

Mode         | Best When                                                  | Request Shape
Built-in MCP | You want centrally managed server inventory and policy     | type: "mcp" + server_label
Remote MCP   | You want to point directly to an MCP endpoint per request  | type: "mcp" + server_label + server_url
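
As a rough illustration of the two request shapes, the sketch below builds one tools entry per mode. The helper names and the example URL are hypothetical; only the field names (type, server_label, server_url, headers, allowed_tools, require_approval) come from this page.

```python
from typing import Any, Optional


def builtin_mcp_tool(server_label: str, allowed_tools: list[str]) -> dict[str, Any]:
    """Built-in mode: the gateway resolves server_label from its own config."""
    return {
        "type": "mcp",
        "server_label": server_label,
        "allowed_tools": allowed_tools,
        "require_approval": "never",
    }


def remote_mcp_tool(
    server_label: str,
    server_url: str,
    headers: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Remote mode: the request itself carries the endpoint URL (and headers)."""
    tool: dict[str, Any] = {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        "require_approval": "never",
    }
    if headers:
        # Request-scoped headers, e.g. an Authorization header for the endpoint.
        tool["headers"] = headers
    return tool
```

Either dictionary can be placed in the tools list of a /v1/responses request; the only structural difference is the presence of server_url (and optional headers) in Remote MCP mode.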

What It Solves

  • Keep MCP execution inside the gateway request lifecycle.
  • Use Responses-style streaming events for MCP call progress/results.
  • Reuse response IDs with previous_response_id just like other tool flows.

Prerequisites

  1. Configure MCP runtime servers via VR_MCP_CONFIG_PATH.
  2. Start the gateway with vllm-responses serve so the singleton Built-in MCP runtime is launched.
  3. Confirm that the target server_label is available (GET /v1/mcp/servers).

Built-in MCP Setup

Set the Built-in MCP config path:

export VR_MCP_CONFIG_PATH="/etc/vllm-responses/mcp.json"

mcp.json follows the common MCP client-style shape: a top-level mcpServers object keyed by your server labels. In most cases, you can copy an MCP server entry from another MCP client config and reuse it here with minimal changes. For canonical examples (URL + stdio styles), see MCP Examples -> Built-in MCP Runtime Config.
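
To make the shape concrete, here is a minimal sketch that renders an mcp.json with one URL-style and one stdio-style entry. The url, command, and args keys follow the convention used by common MCP client configs and are an assumption here; the local_files label, the example URL, and the /srv/docs path are hypothetical. Check MCP Examples for the keys the gateway actually accepts.

```python
import json
from pathlib import Path
from typing import Any

# Top-level "mcpServers" object keyed by server label (the label used in requests).
EXAMPLE_CONFIG: dict[str, Any] = {
    "mcpServers": {
        # URL-style entry (assumed key: "url"); hypothetical endpoint.
        "github_docs": {"url": "https://mcp.example.com/mcp"},
        # stdio-style entry (assumed keys: "command", "args"); hypothetical label/path.
        "local_files": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/srv/docs"],
        },
    }
}


def render_mcp_config(config: dict[str, Any]) -> str:
    """Serialize the config as pretty-printed JSON."""
    return json.dumps(config, indent=2) + "\n"


def write_mcp_config(path: Path, config: dict[str, Any]) -> None:
    """Write the config to the file that VR_MCP_CONFIG_PATH points to."""
    path.write_text(render_mcp_config(config), encoding="utf-8")
```

Calling write_mcp_config(Path("/etc/vllm-responses/mcp.json"), EXAMPLE_CONFIG) would produce a file matching the path exported above.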

Built-in URL-style entries accept both http:// and https:// URLs. This differs from Remote MCP request URLs, which the host policy requires to be https:// by default.

Verify server availability before requests:

curl http://127.0.0.1:5969/v1/mcp/servers
curl http://127.0.0.1:5969/v1/mcp/servers/github_docs/tools
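
The same check can be scripted as a preflight step. The sketch below uses only the Python standard library and assumes that the tools endpoint answers with HTTP 200 for a known label and a non-200 status otherwise; that status behavior, and the helper names, are assumptions rather than documented gateway behavior.

```python
import urllib.parse
import urllib.request

GATEWAY = "http://127.0.0.1:5969"


def tools_url(server_label: str, base_url: str = GATEWAY) -> str:
    """Build the per-server tools URL shown in the curl example."""
    label = urllib.parse.quote(server_label, safe="")
    return f"{base_url.rstrip('/')}/v1/mcp/servers/{label}/tools"


def server_is_available(
    server_label: str, base_url: str = GATEWAY, timeout: float = 5.0
) -> bool:
    """Return True if the tools endpoint responds with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(tools_url(server_label, base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError both derive from OSError).
        return False
```

A deployment script might call server_is_available("github_docs") and refuse to send requests until it returns True.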

Runtime architecture note:

  • vllm-responses serve starts one internal Built-in MCP runtime process on loopback.
  • All gateway workers share that runtime, so Built-in MCP startup/discovery/session state is not duplicated per worker.

Built-in MCP Usage

A complete request payload includes both the MCP declaration (tools) and the tool choice (tool_choice):

{
  "model": "meta-llama/Llama-3.2-3B-Instruct",
  "stream": true,
  "input": [{"role": "user", "content": "Find migration notes in docs."}],
  "tools": [
    {
      "type": "mcp",
      "server_label": "github_docs",
      "allowed_tools": ["search_docs"],
      "require_approval": "never"
    }
  ],
  "tool_choice": {
    "type": "mcp",
    "server_label": "github_docs",
    "name": "search_docs"
  }
}

cURL

curl -X POST http://127.0.0.1:5969/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "stream": true,
    "input": [{"role":"user","content":"Find migration notes in docs."}],
    "tools": [{"type":"mcp","server_label":"github_docs","allowed_tools":["search_docs"],"require_approval":"never"}],
    "tool_choice": {"type":"mcp","server_label":"github_docs","name":"search_docs"}
  }'

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5969/v1", api_key="dummy")
with client.responses.stream(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input=[{"role": "user", "content": "Find migration notes in docs."}],
    tools=[
        {
            "type": "mcp",
            "server_label": "github_docs",
            "allowed_tools": ["search_docs"],
            "require_approval": "never",
        }
    ],
    tool_choice={"type": "mcp", "server_label": "github_docs", "name": "search_docs"},
) as stream:
    for event in stream:
        print(event.type)

MCP Event Lifecycle

Both MCP modes stream these event types:

  • response.mcp_call.in_progress
  • response.mcp_call_arguments.delta
  • response.mcp_call_arguments.done
  • response.mcp_call.completed or response.mcp_call.failed

See Events Reference for payload details.
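
As a sketch of how a client might consume this lifecycle, the tracker below accumulates argument deltas and records the final status per call. It assumes each MCP event exposes item_id, that delta events carry a string delta, and that the done event carries the full arguments string; these attribute names mirror the Responses-style events but should be confirmed against the Events Reference. The class name is hypothetical.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass
class McpCallTracker:
    """Tracks argument text and status for each MCP call, keyed by item_id."""

    arguments: dict[str, str] = field(default_factory=dict)
    status: dict[str, str] = field(default_factory=dict)

    def handle(self, event: Any) -> None:
        item_id = getattr(event, "item_id", None)  # assumed attribute
        if item_id is None:
            return
        if event.type == "response.mcp_call.in_progress":
            self.status[item_id] = "in_progress"
        elif event.type == "response.mcp_call_arguments.delta":
            # Append the streamed fragment (assumed to be a string).
            self.arguments[item_id] = self.arguments.get(item_id, "") + event.delta
        elif event.type == "response.mcp_call_arguments.done":
            # The done event is assumed to carry the complete arguments.
            self.arguments[item_id] = event.arguments
        elif event.type == "response.mcp_call.completed":
            self.status[item_id] = "completed"
        elif event.type == "response.mcp_call.failed":
            self.status[item_id] = "failed"


# Stand-in events for illustration; a real stream yields SDK event objects.
demo_events = [
    SimpleNamespace(type="response.mcp_call.in_progress", item_id="mcp_1"),
    SimpleNamespace(type="response.mcp_call_arguments.delta", item_id="mcp_1", delta='{"q":'),
    SimpleNamespace(type="response.mcp_call_arguments.delta", item_id="mcp_1", delta='"x"}'),
    SimpleNamespace(type="response.mcp_call.completed", item_id="mcp_1"),
]
demo_tracker = McpCallTracker()
for ev in demo_events:
    demo_tracker.handle(ev)
```

In the streaming loop from the SDK example, calling tracker.handle(event) in place of print(event.type) would give the same bookkeeping for live events.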

Remote MCP Mode Notes

  • Built-in MCP requests reference configured servers by server_label only.
  • Remote MCP via request server_url does not require server registration in VR_MCP_CONFIG_PATH.
  • For Remote MCP, transport selection is delegated to FastMCP, which derives it from the request's server_url and headers.
  • require_approval currently supports only "never".
  • The Remote MCP host policy rejects localhost, *.localhost, and IP-literal hosts, and by default accepts only https URLs.
  • For Remote MCP field compatibility (server_url, connector_id, headers), see API Reference.
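
To illustrate the host policy described above, here is a client-side approximation that can catch obviously rejected URLs before a request is sent. It is a sketch of the stated rules only, not the gateway's implementation, and the function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_allowed_remote_mcp_url(server_url: str) -> bool:
    """Approximate the stated policy: https only, no localhost, no IP literals."""
    parts = urlsplit(server_url)
    if parts.scheme != "https":
        return False
    # hostname is lowercased by urlsplit and has IPv6 brackets stripped.
    host = (parts.hostname or "").rstrip(".")
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is treated as a regular DNS name.
        return True
    return False
```

For example, https://mcp.example.com/mcp passes this check, while http://mcp.example.com/mcp, https://localhost:8000/mcp, and https://127.0.0.1/mcp do not.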

For end-to-end examples, see MCP Examples.