Model Context Protocol (MCP) façade

AI Workbench can expose a workspace as a Model Context Protocol server, so external agents (Claude Code, Cursor, Continue, hosted MCP gateways) can use the workspace as a context backend. The agent sees the workspace's read surface (KB search, documents, chats) and a small set of scoped write tools as MCP tools and resources; it never touches the raw HTTP API and needs no client code beyond the standard MCP SDK.

The façade is on by default. It shares the /api/v1/* auth middleware and the workspace-scoped authz wrapper, so enabling it does not widen the security boundary; it exposes the existing surface over a second protocol. Set mcp.enabled: false if you want to expose only the REST API.

Quick start

Make sure mcp.enabled is not set to false in workbench.yaml (the default is true). Optionally surface the chat_send tool:

```yaml
mcp:
  # Optional: also expose `chat_send`, which routes a message
  # through the runtime's global chat service. Inherits the
  # `chat:` block; the tool is silently skipped when chat is
  # unconfigured.
  exposeChat: true
```

Point an MCP client at http://<your-runtime>/api/v1/workspaces/{workspaceId}/mcp.

The endpoint speaks Streamable HTTP — the modern MCP transport. Each request is stateless: no session id, no per-client state survives between requests.
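As a rough sketch of what a client puts on the wire, the snippet below builds (but does not send) a JSON-RPC `tools/list` request for the endpoint above, using only the Python standard library. The host, the workspace id, the key value, and the `Authorization: Bearer` header are assumptions for illustration: this page does not say how the API key is presented. A real client would normally go through an MCP SDK, which also handles the `initialize` handshake.

```python
import json
import urllib.request


def build_tools_list_request(base_url, workspace_id, api_key=None):
    """Build a Streamable HTTP POST carrying a JSON-RPC tools/list call."""
    url = f"{base_url.rstrip('/')}/api/v1/workspaces/{workspace_id}/mcp"
    payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients advertise both plain JSON and SSE replies.
        "Accept": "application/json, text/event-stream",
    }
    if api_key is not None:
        # Assumption: the workspace API key travels as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical runtime address, workspace id, and key.
req = build_tools_list_request("http://localhost:8080", "ws-demo", api_key="wb_example")
```

Passing `req` to `urllib.request.urlopen` would send it; because the endpoint is stateless, no session id needs to be carried between calls.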

Configuration

```yaml
mcp:
  enabled: true       # boolean; default: true
  exposeChat: false   # boolean; default: false; ignored when chat is unset
```

Field	Default	Notes
`enabled`	`true`	When false the MCP route returns `404 not_found` so the surface isn't probeable.
`exposeChat`	`false`	Adds the `chat_send` tool. Requires the `chat:` block; without it the tool is silently skipped.
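The defaults and the `chat_send` registration rule from the table can be restated as a small Python sketch. It assumes `workbench.yaml` has already been parsed into a dict; the function name and the `chat: {model: ...}` contents are hypothetical, and the runtime's actual implementation may differ.

```python
def resolve_mcp_surface(config):
    """Return (route_enabled, chat_send_registered) for a parsed config dict."""
    mcp = config.get("mcp") or {}
    enabled = mcp.get("enabled", True)          # default: true
    expose_chat = mcp.get("exposeChat", False)  # default: false
    chat_configured = config.get("chat") is not None
    # chat_send needs the route, the flag, and a chat: block; otherwise it is skipped.
    return enabled, bool(enabled and expose_chat and chat_configured)


# Hypothetical configs: flag set without a chat block, then with one.
without_chat = resolve_mcp_surface({"mcp": {"exposeChat": True}})
with_chat = resolve_mcp_surface({"mcp": {"exposeChat": True}, "chat": {"model": "example"}})
```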

Auth

The MCP route is mounted under /api/v1/workspaces/{w}/mcp, so the regular /api/v1/* auth middleware applies. The shared workspace-route authorization wrapper is enforced on every request: a scoped API key for workspace A cannot call MCP tools against workspace B, even if the caller knows the URL.

The default auth.mode: disabled (single-tenant dev runtime) lets anonymous callers in. For any deployment exposing MCP to external agents, set auth.mode: apiKey (or stricter) and mint a workspace API key per agent.

Per-tool scopes

Workspace API keys carry a scopes field (see the API-keys card in the workspace UI). As of 0.5.0 the MCP server enforces a fine-grained scope per write tool, matching the scope its REST sibling requires:

Tool	Required scope
`ingest_text`, `delete_document`	`write:ingest`
`create_knowledge_base`, `delete_knowledge_base`	`write:kb`
`search_kb`, `list_*`, `get_agent`, `chat_send`, `run_agent`	`read` (passes any authenticated caller)

The check is hierarchical containment, not exact match: a held coarse tier grants its fine grants, so a legacy ["read", "write"] key still passes every write tool exactly as before 0.5.0 (write contains both write:ingest and write:kb). The granularity is opt-in — mint a key with ["read", "write:ingest"] and it can push content but not create or drop knowledge bases.
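One way to picture hierarchical containment is the Python sketch below. The prefix rule (`write` covers anything starting with `write:`) and both function names are assumptions for illustration; the server may resolve containment differently, for example through an explicit table.

```python
def scope_contains(held, required):
    """True when one held scope grants the required one."""
    # Exact match, or a coarse tier ("write") covering a fine grant ("write:kb").
    return held == required or required.startswith(held + ":")


def is_allowed(subject_scopes, required):
    """Decide whether a subject may call a tool that needs `required`."""
    if subject_scopes is None:
        # Operator and anonymous dev-mode subjects carry scopes: null
        # and pass every gate (documented in this section).
        return True
    if required == "read":
        # Read tools pass any authenticated caller.
        return True
    return any(scope_contains(held, required) for held in subject_scopes)


legacy_key = ["read", "write"]
ingest_only_key = ["read", "write:ingest"]
```

With these definitions, `legacy_key` passes both `write:ingest` and `write:kb`, while `ingest_only_key` passes `write:ingest` only.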

A key that lacks a tool's fine scope (e.g. ["read"], or ["read", "write:kb"] calling ingest_text) gets isError: true with a JSON body { outcome: "denied", code: "scope_required", required: "write:ingest", subjectScopes, message } — same shape MCP clients already handle for tool failures, so LangGraph / CrewAI / ADK / MAF / watsonx surface the denial as a regular tool error, not a transport-level 403.
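On the client side, a scope denial can be told apart from other tool errors by inspecting the JSON body. The Python sketch below assumes the standard MCP tool-result shape (`isError` plus a list of `content` items); the helper name and the sample values are hypothetical.

```python
import json


def scope_denial(tool_result):
    """Return the denial envelope if a tool result is a scope rejection, else None."""
    if not tool_result.get("isError"):
        return None
    for item in tool_result.get("content", []):
        if item.get("type") != "text":
            continue
        try:
            body = json.loads(item["text"])
        except (KeyError, ValueError):
            continue  # Not JSON: some other kind of tool error.
        if isinstance(body, dict) and body.get("code") == "scope_required":
            return body
    return None


# Hand-written sample mirroring the envelope described above.
sample = {
    "isError": True,
    "content": [{"type": "text", "text": json.dumps({
        "outcome": "denied",
        "code": "scope_required",
        "required": "write:ingest",
        "subjectScopes": ["read"],
        "message": "example message",
    })}],
}
denial = scope_denial(sample)
```

An agent framework could use `denial["required"]` to tell the operator which scope to add to the key.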

OIDC + bootstrap operator credentials and anonymous (dev-mode) callers have scopes: null and pass every gate — the scope check only fires for concrete API-key subjects.

Denials are also recorded as mcp.invoke audit events with outcome: "denied" (distinct from generic failure) so SIEM rules can alert on bursts of scope rejections without parsing tool-specific reason strings.

Tools

Name	Args	Returns
`list_knowledge_bases`	none	JSON array of `{ knowledgeBaseId, name, description, status, language }`
`list_agents`	none	JSON array of `{ agentId, name, description, knowledgeBaseIds, llmServiceId, rerankEnabled }`
`get_agent`	`{ agentId }`	Full agent configuration: prompts, tool ids, KB bindings, reranking overrides.
`list_documents`	`{ knowledgeBaseId, limit? }`	JSON array of document metadata (`documentId`, `sourceFilename`, `status`, `chunkTotal`, `contentHash`, `ingestedAt`)
`search_kb`	`{ knowledgeBaseId, text? | vector?, topK?, hybrid?, rerank? }`	JSON array of search hits (`chunkId`, `score`, `documentId`, `content`)
`list_chats`	`{ agentId }`	JSON array of chat summaries (`chatId`, `agentId`, `title`, `knowledgeBaseIds`, `createdAt`)
`list_chat_messages`	`{ chatId }`	Oldest-first message log (`messageId`, `role`, `content`, `messageTs`, `metadata`)
`ingest_text`	`{ knowledgeBaseId, text, sourceFilename?, sourceDocId?, metadata?, overwriteOnNameConflict? }`	JSON envelope with one of three `outcome` values: `completed` (new document — `documentId`, `sourceFilename`, `contentHash`, `chunks`), `duplicate` (content-hash match — pipeline did not run; returns the existing `documentId`), or `name_conflict` (`isError: true` — filename matched but bytes differ; retry with `overwriteOnNameConflict: true` or pick a new name). Runs the same dedup + chunk + embed + upsert pipeline as the REST `POST /ingest`. Always synchronous from the MCP caller's POV. Requires the `write:ingest` scope on the calling key (a coarse `write` key grants it via containment) — keys without it see `isError: true` + `outcome: "denied"` instead.
`delete_document`	`{ knowledgeBaseId, documentId }`	JSON object with `outcome`: `deleted` (`documentId`, `chunksDropped`) or `not_found` (no row matched the id — returned without `isError` so speculative cleanup doesn't need to branch). Wraps the same cascade helper the REST `DELETE /documents/{id}` route uses; vector chunks come down first, then the control-plane row. Requires the `write:ingest` scope on the calling key (a coarse `write` key grants it via containment).
`create_knowledge_base`	`{ name, chunkingServiceId, embeddingServiceId, description?, rerankingServiceId?, language?, attach?, vectorCollection? }`	JSON envelope with `outcome: "created"` plus the new `knowledgeBaseId`, resolved `vectorCollection`, and `owned` flag. Wraps the same `KnowledgeBaseService.create` the REST `POST /knowledge-bases` route uses — so the collection-provision + rollback dance runs identically across front doors. Validation failures (`kb_name_taken`, `collection_name_taken`, embedding/dimension mismatch) return `isError: true` with a recognizable `code`. Requires the `write:kb` scope on the calling key (a coarse `write` key grants it via containment).
`delete_knowledge_base`	`{ knowledgeBaseId }`	JSON object with `outcome`: `deleted` or `not_found` (idempotent — re-deleting a missing KB returns `not_found` without `isError`). For owned KBs, drops the underlying vector collection first; attached KBs are detached without touching the collection. Requires the `write:kb` scope on the calling key (a coarse `write` key grants it via containment).
`chat_send` (opt-in)	`{ agentId, chatId, content }`	The assistant's reply as a single text block. Persists both turns through the runtime's global chat service; the system prompt falls back to `DEFAULT_AGENT_SYSTEM_PROMPT` when `chat.systemPrompt` is unset. Use `run_agent` when you want the tool to resolve or create the conversation for you.
`run_agent` (opt-in)	`{ agentId, content, conversationId?, title? }`	JSON envelope `{ outcome, conversationId, agentId, content, finishReason, tokenCount, contextChunkIds }`. One-call agent invocation — resolves (or creates) a conversation bound to the agent's KB set, then drives the same retrieval → prompt → complete → persist pipeline as `chat_send`. Honors the agent's stored `systemPrompt`. Returns `outcome: "agent_not_found"` / `"chat_not_found"` / `"completion_error"` for failure shapes; `"completed"` on success.

All tool results are returned as a single MCP text content item containing JSON; clients parse it into native objects. This keeps the wire format predictable across providers that handle structured content differently.
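A minimal Python sketch of the client-side parsing step, followed by one way to branch on the `ingest_text` outcomes listed in the table. The helper names and the action labels they return are hypothetical; only the `outcome` values and field names come from this page.

```python
import json


def parse_tool_json(tool_result):
    """Decode the single text content item of a tool result into a Python object."""
    texts = [c["text"] for c in tool_result.get("content", []) if c.get("type") == "text"]
    if len(texts) != 1:
        raise ValueError(f"expected exactly one text item, got {len(texts)}")
    return json.loads(texts[0])


def next_ingest_step(tool_result):
    """Map an ingest_text result to a follow-up action (labels are illustrative)."""
    body = parse_tool_json(tool_result)
    outcome = body.get("outcome")
    if outcome in ("completed", "duplicate"):
        # Either way a document with this content exists; keep its id.
        return ("done", body.get("documentId"))
    if outcome == "name_conflict":
        return ("retry_with_overwrite_or_rename", None)
    if outcome == "denied":
        return ("needs_scope", body.get("required"))
    return ("unexpected", outcome)


def as_result(body, is_error=False):
    """Wrap a dict the way a tool result arrives: one JSON text item."""
    return {"isError": is_error, "content": [{"type": "text", "text": json.dumps(body)}]}


duplicate = next_ingest_step(as_result({"outcome": "duplicate", "documentId": "doc-1"}))
conflict = next_ingest_step(as_result({"outcome": "name_conflict"}, is_error=True))
```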

Why these tools and not others

The façade is mostly retrieval-shaped (search_kb, list_*) so external agents can ground their reasoning in the workspace. Two write tools, ingest_text and delete_document, were added once the LangGraph / CrewAI / ADK story made it clear that recording what an agent gathers (and cleaning up afterward) is half the value of the integration. Knowledge-base creation and deletion are exposed as well, behind the separate write:kb scope. Larger mutations (workspace mutation, service CRUD) stay off the surface. Reasons:

Blast radius. A misbehaving agent that can search_kb is a performance / cost concern; one that can call delete_knowledge_base is a data-loss concern. ingest_text falls in the middle: its only observable effect is more KB content, which is reversible by delete_document. delete_document itself is scoped to a single document at a time (no "delete by filter" surface) so the radius stays predictable.
Auth is fine-scoped (0.5.0). Workspace API keys carry coarse tiers (["read"], ["read", "write"]) and/or fine grants (write:ingest, write:kb, …). Each write tool requires its fine scope (ingest_text / delete_document → write:ingest; create_knowledge_base / delete_knowledge_base → write:kb), resolved by hierarchical containment so a coarse write key still grants all of them — no key minted before 0.5.0 loses access. See the Per-tool scopes subsection in Auth above for the deny envelope.
Most useful surface first. Retrieval is the killer feature for an MCP integration; ingestion is the most-asked-for write tool; delete pairs naturally with ingest for agents that maintain their own KB; everything else is incremental.

chat_send is exposed under a separate flag because it's the only tool that costs LLM/model tokens. ingest_text and delete_document are unflagged: their cost is bounded by the chunker + embedder on the workspace, which the operator already controls through the regular ingest config.

Streaming

Streamable HTTP supports SSE-formatted responses for long-running tool calls; the SDK uses them automatically when the server chooses. Today our tool implementations are synchronous (the only long-running one is chat_send, and we return its full reply at once rather than streaming progress notifications), but the transport is ready when we add a streaming variant.
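For clients that read the response body directly instead of using an SDK, an SSE-formatted reply has to be split into events before the JSON-RPC payload can be decoded. The Python sketch below is a simplified parser that assumes each event's `data:` lines together hold one JSON document; the sample body is hand-written, not captured from the runtime.

```python
import json


def parse_sse_messages(body):
    """Split an SSE response body into decoded JSON payloads, one per event."""
    messages = []
    data_lines = []
    for line in body.splitlines():
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                messages.append(json.loads("\n".join(data_lines)))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space from the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored here.
    if data_lines:
        # Lenient: also accept a final event with no trailing blank line.
        messages.append(json.loads("\n".join(data_lines)))
    return messages


sample = 'event: message\ndata: {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}\n\n'
messages = parse_sse_messages(sample)
```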

For the chat UI's own streaming, see agents.md — it uses the POST /agents/{a}/conversations/{c}/messages/stream endpoint that emits structured SSE events tailored to the UI rather than going through MCP.

Tunnelling and reverse-proxy notes

The MCP endpoint uses SSE (Server-Sent Events) to stream JSON-RPC responses. Most reverse proxies and local-tunnel tools work fine, but there are a few gotchas:

Cloudflare quick tunnels (`trycloudflare.com`)

Quick tunnels (cloudflared tunnel --url ...) buffer SSE aggressively. The client often sees an empty body or a stalled connection because Cloudflare holds chunks until a flush threshold is reached or the connection closes, which is the opposite of what SSE needs.

Recommended alternatives for public dev access:

Option	Notes
Cloudflare Tunnel (named)	`cloudflared tunnel create <name>` + `cloudflared tunnel route dns`. Persistent, named tunnels flush SSE correctly.
ngrok	`ngrok http 8080`. SSE works reliably out of the box.
Real reverse proxy	nginx with `proxy_buffering off`, or Caddy with its default config. Both pass SSE through without buffering.

nginx

Add to the location block that proxies the runtime:

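```nginx
location /api/v1/ {
    proxy_pass http://localhost:8080;
    # Stream SSE chunks to the client as they arrive instead of buffering them.
    proxy_buffering off;
    proxy_cache off;
    # Keep long-lived streams open for up to an hour.
    proxy_read_timeout 3600s;
    # HTTP/1.1 with an empty Connection header keeps the upstream connection open.
    proxy_http_version 1.1;
    proxy_set_header Connection '';
    chunked_transfer_encoding on;
}
```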

Without proxy_buffering off, nginx accumulates the SSE stream and delivers it in one shot when the connection closes — which looks like a hanging request from the MCP client's perspective.

MCP client requirements

Most MCP clients require the endpoint URL to use https://. For local development this means either:

a named tunnel / ngrok (both provide HTTPS automatically), or
a local TLS terminator (Caddy's localhost cert, mkcert + nginx).

http://localhost:8080/... works fine if your MCP client explicitly allows plain HTTP local addresses.

Failure surface

Symptom	Why	Fix
`404 not_found` from `/.../mcp`	`mcp.enabled: false` was set explicitly (the default is `true`).	Remove `mcp.enabled: false` from `workbench.yaml` (or flip it to `true`).
`404 workspace_not_found`	Path workspace id doesn't exist.	Check the workspace id.
`401` / `403`	Caller lacks access.	Verify the API key scope (workspace match).
`chat_send` tool isn't registered	`exposeChat: false`, OR `chat:` is unset.	Set `exposeChat: true` AND wire the `chat:` block.

Related

Specification: the MCP wire protocol.
docs/configuration.md: full workbench.yaml schema.
docs/auth.md: the auth surface MCP inherits.
docs/agents.md: the agent surface that the chat UI uses; the chat_send MCP tool wraps the runtime's global chat service.
