Model Context Protocol (MCP) façade
AI Workbench can expose a workspace as a Model Context Protocol server, so external agents — Claude Code, Cursor, Continue, hosted MCP gateways — can use the workspace as a context backend. The agent sees the workspace's read surface (KB search, documents, chats) as MCP tools and resources; it never sees the raw HTTP API or has to implement client code beyond the standard MCP SDK.
The façade is on by default. It shares the /api/v1/* auth middleware and the workspace-scoped authz wrapper, so enabling it does not widen the security boundary — it just exposes the existing read surface over a second protocol. Disable explicitly with mcp.enabled: false if you want a narrower surface than the REST API.
Quick start
- Make sure `mcp.enabled` is not set to `false` in `workbench.yaml` (the default is `true`). Optionally surface the `chat_send` tool:

  ```yaml
  mcp:
    # Optional: also expose `chat_send`, which routes a message
    # through the runtime's global chat service. Inherits the
    # `chat:` block; the tool is silently skipped when chat is
    # unconfigured.
    exposeChat: true
  ```

- Point an MCP client at `http://<your-runtime>/api/v1/workspaces/{workspaceId}/mcp`.
The endpoint speaks Streamable HTTP — the modern MCP transport. Each request is stateless: no session id, no per-client state survives between requests.
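To make the request shape concrete, here is a minimal sketch of the JSON-RPC `tools/call` POST a client would send to the endpoint. It only builds the request and does not send it. The helper name `build_tool_call`, the workspace id `ws-123`, and the bearer-token header are assumptions for illustration; in practice the standard MCP SDK assembles this for you, including any handshake it performs first.

```python
import json
import urllib.request


def build_tool_call(base_url, workspace_id, tool, arguments, api_key=None, request_id=1):
    """Build (but do not send) a JSON-RPC `tools/call` POST for the workspace MCP endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/workspaces/{workspace_id}/mcp"
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients advertise that they accept plain JSON or SSE.
        "Accept": "application/json, text/event-stream",
    }
    if api_key:
        # Assumption: the workspace API key travels as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


# Hypothetical runtime address and workspace id.
req = build_tool_call("http://localhost:8080", "ws-123", "list_knowledge_bases", {})
print(req.get_method(), req.full_url)
```

Because each request is stateless, nothing from one call needs to be carried into the next; every request is built the same way.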
Configuration
```yaml
mcp:
  enabled: true | false      # default: true
  exposeChat: true | false   # default: false; ignored when chat is unset
```

| Field | Default | Notes |
|---|---|---|
| `enabled` | `true` | When `false` the MCP route returns `404 not_found` so the surface isn't probeable. |
| `exposeChat` | `false` | Adds the `chat_send` tool. Requires the `chat:` block; without it the tool is silently skipped. |
Auth
The MCP route is mounted under /api/v1/workspaces/{w}/mcp, which means the regular /api/v1/* auth middleware applies. The shared workspace-route authorization wrapper is enforced on every request: a scoped API key for workspace A cannot call MCP tools against workspace B, even with the URL.
The default auth.mode: disabled (single-tenant dev runtime) lets anonymous callers in. For any deployment exposing MCP to external agents, set auth.mode: apiKey (or stricter) and mint a workspace API key per agent.
Per-tool scopes
Workspace API keys carry a `scopes` field (see the API-keys card in the workspace UI). As of 0.5.0 the MCP server enforces a fine scope on each write tool, matching the scope its REST sibling requires:
| Tool | Required scope |
|---|---|
| `ingest_text`, `delete_document` | `write:ingest` |
| `create_knowledge_base`, `delete_knowledge_base` | `write:kb` |
| `search_kb`, `list_*`, `get_agent`, `chat_send`, `run_agent` | `read` (passes any authenticated caller) |
The check is hierarchical containment, not exact match: a coarse tier grants every fine scope it contains, so a legacy `["read", "write"]` key still passes every write tool exactly as before 0.5.0 (`write` contains both `write:ingest` and `write:kb`). The granularity is opt-in: mint a key with `["read", "write:ingest"]` and it can push content but not create or drop knowledge bases.
A key that lacks a tool's fine scope (e.g. ["read"], or ["read", "write:kb"] calling ingest_text) gets isError: true with a JSON body { outcome: "denied", code: "scope_required", required: "write:ingest", subjectScopes, message } — same shape MCP clients already handle for tool failures, so LangGraph / CrewAI / ADK / MAF / watsonx surface the denial as a regular tool error, not a transport-level 403.
OIDC + bootstrap operator credentials and anonymous (dev-mode) callers have scopes: null and pass every gate — the scope check only fires for concrete API-key subjects.
Denials are also recorded as mcp.invoke audit events with outcome: "denied" (distinct from generic failure) so SIEM rules can alert on bursts of scope rejections without parsing tool-specific reason strings.
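The containment rule and deny envelope described above can be sketched as follows. This is an illustration, not the server's actual code: `TOOL_SCOPES`, `scope_satisfies`, and `authorize` are hypothetical names, and the `message` text is invented.

```python
# Hypothetical lookup mirroring the "Required scope" table; unlisted tools need only `read`.
TOOL_SCOPES = {
    "ingest_text": "write:ingest",
    "delete_document": "write:ingest",
    "create_knowledge_base": "write:kb",
    "delete_knowledge_base": "write:kb",
}


def scope_satisfies(held, required):
    """Return True when the held scopes grant `required` by hierarchical containment."""
    if held is None:
        # OIDC, bootstrap-operator, and anonymous dev-mode subjects have scopes: null.
        return True
    if required == "read":
        # Read tools pass any authenticated caller.
        return True
    coarse = required.split(":", 1)[0]  # "write:ingest" -> "write"
    return required in held or coarse in held


def authorize(tool, held):
    """Return None when the call is allowed, else a deny envelope."""
    required = TOOL_SCOPES.get(tool, "read")
    if scope_satisfies(held, required):
        return None
    return {
        "outcome": "denied",
        "code": "scope_required",
        "required": required,
        "subjectScopes": list(held),
        "message": f"scope {required!r} is required to call {tool}",  # invented wording
    }


print(authorize("ingest_text", ["read", "write:kb"]))
```

Splitting on the first `:` is enough here because the documented scopes are only two levels deep; a deeper hierarchy would need a walk over every prefix.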
Tools
| Name | Args | Returns |
|---|---|---|
| `list_knowledge_bases` | none | JSON array of `{ knowledgeBaseId, name, description, status, language }` |
| `list_agents` | none | JSON array of `{ agentId, name, description, knowledgeBaseIds, llmServiceId, rerankEnabled }` |
| `get_agent` | `{ agentId }` | Full agent configuration: prompts, tool ids, KB bindings, reranking overrides. |
| `list_documents` | `{ knowledgeBaseId, limit? }` | JSON array of document metadata (`documentId`, `sourceFilename`, `status`, `chunkTotal`, `contentHash`, `ingestedAt`) |
| `search_kb` | `{ knowledgeBaseId, text? \| vector?, topK?, hybrid?, rerank? }` | JSON array of search hits (`chunkId`, `score`, `documentId`, `content`) |
| `list_chats` | `{ agentId }` | JSON array of chat summaries (`chatId`, `agentId`, `title`, `knowledgeBaseIds`, `createdAt`) |
| `list_chat_messages` | `{ chatId }` | Oldest-first message log (`messageId`, `role`, `content`, `messageTs`, `metadata`) |
| `ingest_text` | `{ knowledgeBaseId, text, sourceFilename?, sourceDocId?, metadata?, overwriteOnNameConflict? }` | JSON envelope with one of three `outcome` values: `completed` (new document: `documentId`, `sourceFilename`, `contentHash`, `chunks`), `duplicate` (content-hash match; the pipeline did not run; returns the existing `documentId`), or `name_conflict` (`isError: true`; filename matched but bytes differ; retry with `overwriteOnNameConflict: true` or pick a new name). Runs the same dedup + chunk + embed + upsert pipeline as the REST `POST /ingest`. Always synchronous from the MCP caller's POV. Requires the `write:ingest` scope on the calling key (a coarse `write` key grants it via containment); keys without it see `isError: true` + `outcome: "denied"` instead. |
| `delete_document` | `{ knowledgeBaseId, documentId }` | JSON object with `outcome`: `deleted` (`documentId`, `chunksDropped`) or `not_found` (no row matched the id; returned without `isError` so speculative cleanup doesn't need to branch). Wraps the same cascade helper the REST `DELETE /documents/{id}` route uses; vector chunks come down first, then the control-plane row. Requires the `write:ingest` scope on the calling key (a coarse `write` key grants it via containment). |
| `create_knowledge_base` | `{ name, chunkingServiceId, embeddingServiceId, description?, rerankingServiceId?, language?, attach?, vectorCollection? }` | JSON envelope with `outcome: "created"` plus the new `knowledgeBaseId`, resolved `vectorCollection`, and `owned` flag. Wraps the same `KnowledgeBaseService.create` the REST `POST /knowledge-bases` route uses, so the collection-provision + rollback dance runs identically across front doors. Validation failures (`kb_name_taken`, `collection_name_taken`, embedding/dimension mismatch) return `isError: true` with a recognizable `code`. Requires the `write:kb` scope on the calling key (a coarse `write` key grants it via containment). |
| `delete_knowledge_base` | `{ knowledgeBaseId }` | JSON object with `outcome`: `deleted` or `not_found` (idempotent: re-deleting a missing KB returns `not_found` without `isError`). For owned KBs, drops the underlying vector collection first; attached KBs are detached without touching the collection. Requires the `write:kb` scope on the calling key (a coarse `write` key grants it via containment). |
| `chat_send` (opt-in) | `{ agentId, chatId, content }` | The assistant's reply as a single text block. Persists both turns through the runtime's global chat service; the system prompt falls back to `DEFAULT_AGENT_SYSTEM_PROMPT` when `chat.systemPrompt` is unset. Use `run_agent` when you want the tool to resolve or create the conversation for you. |
| `run_agent` (opt-in) | `{ agentId, content, conversationId?, title? }` | JSON envelope `{ outcome, conversationId, agentId, content, finishReason, tokenCount, contextChunkIds }`. One-call agent invocation: resolves (or creates) a conversation bound to the agent's KB set, then drives the same retrieval → prompt → complete → persist pipeline as `chat_send`. Honors the agent's stored `systemPrompt`. Returns `outcome: "agent_not_found"` / `"chat_not_found"` / `"completion_error"` for failure shapes; `"completed"` on success. |
All tool results are returned as a single MCP text content item containing JSON; clients parse it into native objects. This keeps the wire format predictable across providers that handle structured content differently.
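As a rough illustration of the client side, the sketch below unpacks a `tools/call` result into a native object plus its error flag. The helper name `parse_tool_result` and the sample payload are hypothetical; the `content` / `isError` fields follow the standard MCP tool-result shape.

```python
import json


def parse_tool_result(result):
    """Return (payload, is_error) from an MCP tools/call result dict."""
    texts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    if len(texts) != 1:
        raise ValueError(f"expected exactly one text content item, got {len(texts)}")
    return json.loads(texts[0]), bool(result.get("isError", False))


# Hypothetical result of an ingest_text call that hit a filename clash.
sample = {
    "isError": True,
    "content": [{"type": "text", "text": json.dumps({"outcome": "name_conflict"})}],
}

payload, is_error = parse_tool_result(sample)
if is_error and payload["outcome"] == "name_conflict":
    print("retry with overwriteOnNameConflict: true or pick a new name")
```

Since `chat_send` is described as returning the reply as a text block, a caller may want to fall back to the raw text when JSON parsing fails for that tool.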
Why these tools and not others
The façade is mostly retrieval-shaped (`search_kb`, `list_*`) so external agents can ground their reasoning in the workspace. Two write tools, `ingest_text` and `delete_document`, were added once the LangGraph / CrewAI / ADK story made it clear that recording what an agent gathers (and cleaning up afterward) is half the value of the integration. Knowledge-base creation and deletion (`create_knowledge_base`, `delete_knowledge_base`) are exposed too, behind the separate `write:kb` scope. Larger mutations (KB updates, workspace mutation, service CRUD) stay off the surface. Reasons:
- Blast radius. A misbehaving agent that can `search_kb` is a performance / cost concern; one that can `delete_knowledge_base` is a data-loss concern. `ingest_text` falls in the middle: its only observable effect is more KB content, which is reversible by `delete_document`. `delete_document` itself is scoped to a single document at a time (no "delete by filter" surface) so the radius stays predictable.
- Auth is fine-scoped (0.5.0). Workspace API keys carry coarse tiers (`["read"]`, `["read", "write"]`) and/or fine grants (`write:ingest`, `write:kb`, …). Each write tool requires its fine scope (`ingest_text` / `delete_document` → `write:ingest`; `create_knowledge_base` / `delete_knowledge_base` → `write:kb`), resolved by hierarchical containment so a coarse `write` key still grants all of them, and no key minted before 0.5.0 loses access. See the Per-tool scopes subsection in Auth above for the deny envelope.
- Most useful surface first. Retrieval is the killer feature for an MCP integration; ingestion is the most-asked-for write tool; delete pairs naturally with ingest for agents that maintain their own KB; everything else is incremental.
chat_send is exposed under a separate flag because it's the only tool that costs LLM/model tokens. ingest_text and delete_document are unflagged: their cost is bounded by the chunker + embedder on the workspace, which the operator already controls through the regular ingest config.
Streaming
Streamable HTTP supports SSE-formatted responses for long-running tool calls; the SDK uses them automatically when the server chooses. Today our tool implementations are synchronous (the only long-running one is chat_send, and we return its full reply at once rather than streaming progress notifications), but the transport is ready when we add a streaming variant.
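For readers curious what an SSE-formatted response looks like on the wire, here is a minimal sketch that pulls the JSON-RPC messages out of a fully received SSE body. The function name `iter_sse_json` is hypothetical, and the MCP SDK already does this for you; the sketch follows the generic Server-Sent Events framing (`data:` lines, blank line ends an event) rather than anything specific to this runtime.

```python
import json


def iter_sse_json(body):
    """Yield the JSON payload of each SSE event in `body` (an already-decoded str)."""
    # SSE allows CRLF, LF, or CR as line terminators; normalise to LF.
    lines = body.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    data_lines = []
    for line in lines:
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":...") and other fields (event:, id:, retry:) are ignored here.
    if data_lines:
        yield json.loads("\n".join(data_lines))


raw = 'event: message\ndata: {"jsonrpc": "2.0", "id": 1, "result": {"content": []}}\n\n'
for message in iter_sse_json(raw):
    print(message["id"], message["result"])
```

When a tunnel or proxy buffers the stream, the client receives this same body late or not at all, which is why the buffering notes in the next section matter.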
For the chat UI's own streaming, see agents.md — it uses the POST /agents/{a}/conversations/{c}/messages/stream endpoint that emits structured SSE events tailored to the UI rather than going through MCP.
Tunnelling and reverse-proxy notes
The MCP endpoint uses SSE (Server-Sent Events) to stream JSON-RPC responses. Most reverse proxies and local-tunnel tools work fine, but there are a few gotchas:
Cloudflare quick tunnels (trycloudflare.com)
Quick tunnels (`cloudflared tunnel --url ...`) buffer SSE aggressively. The client often sees an empty body or a stalled connection because Cloudflare holds chunks until a flush threshold is reached or the connection closes, which is the opposite of what SSE needs.
Recommended alternatives for public dev access:
| Option | Notes |
|---|---|
| Cloudflare Tunnel (named) | `cloudflared tunnel create <name>` + `cloudflared tunnel route dns`. Persistent, named tunnels flush SSE correctly. |
| ngrok | `ngrok http 8080`. SSE works reliably out of the box. |
| Real reverse proxy | nginx with `proxy_buffering off`, or Caddy with its default config; both pass SSE through without buffering. |
nginx
Add to the location block that proxies the runtime:
```nginx
location /api/v1/ {
    proxy_pass http://localhost:8080;
    proxy_buffering off;
    proxy_cache off;
    proxy_read_timeout 3600s;
    proxy_set_header Connection '';
    chunked_transfer_encoding on;
}
```

Without `proxy_buffering off`, nginx accumulates the SSE stream and delivers it in one shot when the connection closes, which looks like a hanging request from the MCP client's perspective.
MCP client requirements
Most MCP clients require the endpoint URL to use https://. For local development this means either:
- a named tunnel / ngrok (both provide HTTPS automatically), or
- a local TLS terminator (Caddy's `localhost` cert, `mkcert` + nginx).
http://localhost:8080/... works fine if your MCP client explicitly allows plain HTTP local addresses.
Failure surface
| Symptom | Why | Fix |
|---|---|---|
| `404 not_found` from `/.../mcp` | `mcp.enabled: false` was set explicitly (the default is `true`). | Remove `mcp.enabled: false` from `workbench.yaml` (or flip it to `true`). |
| `404 workspace_not_found` | Path workspace id doesn't exist. | Check the workspace id. |
| `401` / `403` | Caller lacks access. | Verify the API key scope (workspace match). |
| `chat_send` tool isn't registered | `exposeChat: false`, OR `chat:` is unset. | Set `exposeChat: true` AND wire the `chat:` block. |
Related
- Specification: the MCP wire protocol.
- `docs/configuration.md`: full `workbench.yaml` schema.
- `docs/auth.md`: the auth surface MCP inherits.
- `docs/agents.md`: the agent surface that the chat UI uses; the `chat_send` MCP tool wraps the runtime's global chat service.