Web Search Examples
Use the gateway-owned web_search built-in to let the model search the web,
open pages, and reuse page text inside the same request.
Prerequisite
Start a supported entrypoint with a shipped profile:
vllm-responses serve
export EXA_API_KEY="your-exa-api-key" # optional for exa_mcp
vllm-responses serve \
--upstream http://127.0.0.1:8000/v1 \
--web-search-profile exa_mcp
vllm serve --responses
export EXA_API_KEY="your-exa-api-key" # optional for exa_mcp
vllm serve meta-llama/Llama-3.2-3B-Instruct \
--responses \
--responses-web-search-profile exa_mcp
If EXA_API_KEY is omitted, the shipped exa_mcp helper uses the anonymous
default Exa MCP URL.
1. Minimal Request
response = client.responses.create(
model="meta-llama/Llama-3.2-3B-Instruct",
input=[{"role": "user", "content": "Find the official vLLM documentation site."}],
tools=[{"type": "web_search"}],
)
2. Return Search Sources
Add web_search_call.action.sources to include when you want the final
search action sources expanded in the response item.
response = client.responses.create(
model="meta-llama/Llama-3.2-3B-Instruct",
input=[{"role": "user", "content": "Find migration notes for vLLM and cite the sources."}],
tools=[{"type": "web_search"}],
include=["web_search_call.action.sources"],
)
3. Streaming Web Search Events
with client.responses.stream(
model="meta-llama/Llama-3.2-3B-Instruct",
input=[{"role": "user", "content": "Search for the latest vLLM release notes."}],
tools=[{"type": "web_search"}],
include=["web_search_call.action.sources"],
) as stream:
for event in stream:
print(event)
Expected event families include:
response.web_search_call.in_progressresponse.web_search_call.searchingresponse.web_search_call.completed
4. Search Then Reuse Page Text
find_in_page is backed by request-local cached page text. In practice, that
means the model must first open a page before it can search inside it during
the same request.
Prompt pattern: