API Spec

The AI Workbench HTTP contract. Every green box — the default TypeScript runtime and any future language-native runtime — serves this surface. Conformance is enforced by cross-runtime fixtures.

The machine-readable OpenAPI document is served at /api/v1/openapi.json, and a Scalar-rendered reference UI is served at /docs. This document exists to explain the shape narratively and to flag what's coming.

Conventions

Base URL and versioning

  • Functional routes live under /api/v1/….
  • Operational routes (/, /healthz, /readyz, /version, /features, /metrics, /astra-cli, /astra-cli/profiles, /docs, /api/v1/openapi.json) are unversioned.
  • Breaking changes bump the prefix to /api/v2/…; /api/v1/… stays until deprecated.

Content type

  • Request and response bodies are JSON (application/json).
  • Streaming endpoints use text/event-stream. Today: async-ingest job progress at GET /jobs/{jobId}/events.

Identifiers

  • All IDs are RFC 4122 v4 UUIDs rendered as lowercase hyphenated strings.
  • Timestamps are ISO-8601 in UTC with millisecond precision (2026-04-22T10:11:12.345Z).
  • Secrets never appear by value. Fields like credentials or embedding.secretRef hold pointers of the form <provider>:<path> (e.g. env:ASTRA_DB_APPLICATION_TOKEN).
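As an illustration of the `<provider>:<path>` pointer form, a client could split a reference like this before sending it. This is a minimal sketch: `parseSecretRef` is a hypothetical helper, and the character rule for `provider` is an assumption, not something this spec defines.

```typescript
// Hypothetical client-side helper. Only the `<provider>:<path>` form comes
// from the spec; the provider character rule below is an assumption.
interface SecretRef {
  provider: string;
  path: string;
}

function parseSecretRef(value: string): SecretRef | null {
  // Split on the first colon only, so the path may itself contain colons.
  const idx = value.indexOf(":");
  if (idx <= 0 || idx === value.length - 1) return null;
  const provider = value.slice(0, idx);
  const path = value.slice(idx + 1);
  // Assumed rule: providers are short lowercase identifiers such as `env` or `file`.
  if (!/^[a-z][a-z0-9_-]*$/.test(provider)) return null;
  return { provider, path };
}
```

For example, `parseSecretRef("env:ASTRA_DB_APPLICATION_TOKEN")` yields provider `env` and path `ASTRA_DB_APPLICATION_TOKEN`, while a string with no colon yields `null`.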

Resource scoping

Every nested resource carries its parent IDs in the path:

/api/v1/workspaces/{workspaceId}
/api/v1/workspaces/{workspaceId}/knowledge-bases/{knowledgeBaseId}
/api/v1/workspaces/{workspaceId}/knowledge-bases/{knowledgeBaseId}/documents/{documentId}
/api/v1/workspaces/{workspaceId}/{chunking,embedding,reranking}-services/{serviceId}

A request whose path references a non-existent workspace returns 404 workspace_not_found before the nested resource is ever consulted.

Pagination

Control-plane list endpoints accept:

  • limit — number of items to return, 1–200, default 50.
  • cursor — opaque value from the previous page's nextCursor.

Paginated responses use:

json
{
  "items": [],
  "nextCursor": null
}

When nextCursor is non-null, pass it back as ?cursor=... to read the next page. Malformed cursors return 400 invalid_cursor.

The chat surface — an agent's conversations (GET /agents/{a}/conversations) and a conversation's messages (GET /agents/{a}/conversations/{c}/messages) — uses keyset cursors so an unbounded, write-heavy transcript pages without the runtime materializing the whole conversation per request. Two consequences for clients:

  • Cursors are opaque and are not stable across deploys. A cursor minted before a deploy may return 400 invalid_cursor after one; restart from the first page.
  • Drain on the cursor, not on emptiness. Because internal tool-call scaffolding is filtered out of the public message listing, a page can come back with fewer than limit items (or even empty) while nextCursor is still non-null. Keep following nextCursor until it is null; do not stop on a short or empty page.

The bounded control-plane list endpoints (workspaces, services, knowledge bases, API keys, agents, …) keep simple offset cursors.
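The "drain on the cursor" rule can be sketched as a small loop. This is an illustration rather than a client the project ships: `fetchPage` is a hypothetical caller-supplied function that performs the HTTP GET with `?limit=…&cursor=…` and returns the parsed `{ items, nextCursor }` body.

```typescript
interface Page<T> {
  items: T[];
  nextCursor: string | null;
}

// Hypothetical: the caller supplies the actual HTTP call.
type FetchPage<T> = (cursor: string | null) => Promise<Page<T>>;

async function drainAll<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor);
    all.push(...page.items);
    // Stop only when the server reports no next page. A short or empty
    // page is not a termination signal.
    cursor = page.nextCursor;
  } while (cursor !== null);
  return all;
}
```

A client that also wants to survive deploys could catch `400 invalid_cursor` inside `fetchPage` and restart from `cursor = null`; that recovery is left out here to keep the loop short.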

Error envelope

All error responses share one envelope:

json
{
  "error": {
    "code": "workspace_not_found",
    "message": "workspace '<workspaceId>' not found",
    "requestId": "b48e…",
    "hint": "Check the workspace ID in your URL — `aiw workspace list` shows the IDs the runtime knows about.",
    "docs": "docs/errors.md#workspace-not-found"
  }
}

hint is a one-line remediation drawn from the error-code registry (runtimes/typescript/src/lib/error-codes.ts), and docs points at a section of the docs root. Both are optional — older codes that aren't yet in the registry omit them. The complete registry is also reachable at runtime via GET /error-codes and rendered for humans in docs/errors.md.
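A client can model the envelope with `hint` and `docs` as optional fields. The sketch below is one possible shape; the names `ErrorEnvelope`, `isErrorEnvelope`, and `describeError` are hypothetical and not part of the runtime.

```typescript
interface ErrorEnvelope {
  error: {
    code: string;
    message: string;
    requestId: string;
    hint?: string; // optional: older codes omit it
    docs?: string; // optional: older codes omit it
  };
}

function isErrorEnvelope(body: unknown): body is ErrorEnvelope {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return false;
  const e = err as Record<string, unknown>;
  return (
    typeof e["code"] === "string" &&
    typeof e["message"] === "string" &&
    typeof e["requestId"] === "string" &&
    (e["hint"] === undefined || typeof e["hint"] === "string") &&
    (e["docs"] === undefined || typeof e["docs"] === "string")
  );
}

// Branch on the stable `code`; treat `message` as display text only.
function describeError(body: unknown): string {
  if (!isErrorEnvelope(body)) return "unrecognized error body";
  const { code, message, hint } = body.error;
  return hint ? `${code}: ${message} (${hint})` : `${code}: ${message}`;
}
```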

Codes are stable, lowercase, snake_case. Messages are human-readable and may change. Currently emitted:

| Status | Code | When |
| --- | --- | --- |
| 400 | validation_error | Request body / params / query failed Zod validation. message carries the first failing field path and its reason (name: Name is required, credentials.token: expected '<provider>:<path>', e.g. 'env:FOO'). |
| 401 | unauthorized | Missing / malformed / invalid bearer token. WWW-Authenticate: Bearer set. See auth.md. |
| 403 | forbidden | Token is valid but not authorized for the requested action — either the subject's workspaceScopes doesn't include the target workspace, or it's a scoped subject attempting a platform-level action (e.g. POST /workspaces). Also reserved for role-based checks in the upcoming RBAC phase. |
| 413 | payload_too_large | /api/v1/workspaces/* request body exceeded the runtime's 10 MB default JSON body limit, or an ingest request exceeded the 50 MB ingest-only limit. |
| 404 | not_found | Unknown route |
| 404 | workspace_not_found | Workspace ID doesn't exist |
| 404 | knowledge_base_not_found | Knowledge-base ID doesn't exist in workspace |
| 404 | document_not_found | Document ID doesn't exist in the knowledge base |
| 404 | chunking_service_not_found / embedding_service_not_found / reranking_service_not_found | Service ID doesn't exist in workspace |
| 404 | job_not_found | Job ID doesn't exist in the workspace |
| 409 | conflict | Create with an already-taken ID, or service deletion refused while a KB still references it |
| 501 | hybrid_not_supported | Caller asked for hybrid search on a workspace kind whose driver doesn't implement searchHybrid |
| 501 | rerank_not_supported | Caller asked for rerank on a workspace kind whose driver doesn't implement rerank |
| 400 | dimension_mismatch | Supplied vector length doesn't match the KB's bound embedding service |
| 400 | embedding_unavailable | Text search/upsert fallback could not build an embedder for the KB's bound embedding service |
| 400 | embedding_dimension_mismatch | Embedder output dimension doesn't match the bound embedding service |
| 422 | workspace_misconfigured | Workspace is missing url, token, keyspace, or similar driver-required config |
| 500 | internal_error | Unhandled exception |
| 503 | control_plane_unavailable | Backing store is unreachable |
| 503 | collection_unavailable | Underlying vector collection is unreachable or missing |
| 503 | driver_unavailable | Workspace kind has no registered vector-store driver |

Authentication

/api/v1/* runs through a configurable auth middleware. The default posture (auth.mode: disabled) tags every request anonymous and lets it through — same behavior as before the middleware existed. Flip auth.mode to turn enforcement on. See auth.md for the full contract, config, and rollout plan.

Header format is Authorization: Bearer <token> (RFC 6750). On failure the response carries WWW-Authenticate: Bearer and the canonical error envelope:

json
{ "error": { "code": "unauthorized", "message": "…", "requestId": "…" } }

Operational routes (/, /healthz, /readyz, /version, /features, /metrics, /health/details, /health/recent-errors, /error-codes, /setup-status, /setup/env, /setup/restart, /astra-cli, /astra-cli/profiles, /docs, /api/v1/openapi.json) bypass the middleware so load balancers and ops tooling can always reach them. /setup/env and /setup/restart additionally require the bootstrap token unless the runtime is in the fresh-install window (auth.mode === "disabled" AND no workspaces exist); see Setup wizard below.

API-key issuance, OIDC bearer verification, browser OIDC login, and silent token refresh are all implemented. All verifier modes flow through the same middleware — routes don't need to care which verifier accepted the token. Browser-only /auth/* routes (/auth/config, /auth/login, /auth/callback, /auth/me, /auth/refresh, /auth/logout) are documented in auth.md rather than here.

Request ID

Every response carries X-Request-Id. If the client supplies one, the runtime echoes it; otherwise the runtime generates a UUID-hex string. Error responses include the same value in error.requestId.


Operational routes

GET /

Service banner.

Response 200

json
{
  "name": "ai-workbench",
  "version": "0.0.0",
  "commit": "abc1234",
  "docs": "/docs"
}

GET /healthz

Liveness. Returns 200 as long as the process is running.

json
{ "status": "ok" }

GET /readyz

Readiness. 200 once the control-plane store is reachable and workspaces can be listed. The payload carries a workspace count rather than a list — avoids O(N) responses when the store grows.

json
{ "status": "ready", "workspaces": 3 }

Returns 503 draining during graceful shutdown (SIGINT / SIGTERM). Kubernetes-style readiness probes will stop routing traffic while the runtime finishes in-flight requests. See configuration.md for the drain sequence. /healthz stays 200 throughout so livenessProbe doesn't restart a healthy, draining process.

GET /version

Build metadata.

json
{
  "version": "0.0.0",
  "commit": "abc1234",
  "buildTime": "2026-04-21T10:30:00Z",
  "node": "v22.11.0"
}

GET /features

Runtime feature flags the bundled web UI reads to decide which surfaces to render. Reflects the active config (chat enabled, MCP enabled, auth posture, astra-cli inventory available, etc.). Never echoes secrets.

GET /metrics

Prometheus exposition (text/plain; version=0.0.4). No auth — same precedent as /healthz / /readyz. Families:

  • HTTP request counter + duration histogram labeled by method, matched route pattern, and status family (2xx/4xx/5xx)
  • Ingest semaphore gauges (workbench_ingest_workers_{active,queued})
  • Rate-limit rejections by key type
  • workbench_chat_requests_total{provider,outcome} — chat completions per provider, terminal outcome
  • workbench_ingest_documents_total{outcome} — documents fully processed by terminal outcome
  • workbench_search_requests_total{mode,outcome} and workbench_search_duration_seconds{mode} — KB search calls and latency, mode is vector / hybrid / rerank / text

GET /health/details

Deep health snapshot. Returns {controlPlane, chat, ingest, recentErrors} with per-probe {status: "ok"|"degraded"|"down", detail, durationMs}. The chat probe calls ChatService.ping() (an OpenAI-compatible /models call against the configured provider — OpenRouter, OpenAI, or Ollama) when a chat provider is configured. No auth.

GET /health/recent-errors

In-memory ring buffer of the last error envelopes (cap 100, newest first): code, status, method, matched route pattern, requestId, timestamp. No PII. No auth — same posture as /healthz. Drives the web /status page.

GET /error-codes

Returns the error-code registry as JSON for tooling — every registered code with its default message, hint, and docs fields. Powers aiw doctor --explain <code>, the web /status page, and external dashboards. No auth.

Setup wizard

First-run configuration routes used by the web onboarding flow. Both mutation routes (POST /setup/env and POST /setup/restart) require the bootstrap token, except during the fresh-install window (auth.mode === "disabled" AND zero workspaces), when they run unauthenticated.
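The gate on the two mutation routes reduces to a small predicate. The sketch below restates the rule as code; the function and field names are hypothetical, and how the runtime actually validates the bootstrap token is not shown in this document.

```typescript
interface SetupGateInput {
  authMode: string; // value of auth.mode, e.g. "disabled"
  workspaceCount: number; // number of workspaces the control plane knows about
  bootstrapTokenValid: boolean; // assumed: result of the bootstrap-token check
}

// Fresh-install window: auth is disabled AND no workspaces exist yet.
function isFreshInstallWindow(authMode: string, workspaceCount: number): boolean {
  return authMode === "disabled" && workspaceCount === 0;
}

// A setup mutation may proceed with a valid bootstrap token, or without one
// only while the fresh-install window is open.
function setupMutationAllowed(input: SetupGateInput): boolean {
  return (
    input.bootstrapTokenValid ||
    isFreshInstallWindow(input.authMode, input.workspaceCount)
  );
}
```

Note that creating the first workspace closes the window even if `auth.mode` stays `disabled`.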

GET /setup-status

json
{
  "setupComplete": false,
  "workspacesCount": 0,
  "controlPlane": "file",
  "hasAstraCreds": false,
  "hasChatProvider": false,
  "managedEnv": { "exists": false, "path": "/data/.env" }
}

POST /setup/env

Atomically writes a wizard-managed dotenv file at $WORKBENCH_DATA_DIR/.env (mode 0600). Allow-list: ASTRA_DB_API_ENDPOINT, ASTRA_DB_APPLICATION_TOKEN, OPENROUTER_API_KEY, OPENAI_API_KEY. Any other key returns 400 unsupported_setup_key.
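A client can pre-check its payload against the allow-list before calling the route. This sketch assumes the request carries a flat key-to-value map, which this document does not spell out; `unsupportedSetupKeys` is a hypothetical helper.

```typescript
// The four keys listed in the allow-list above.
const SETUP_ENV_ALLOW_LIST = new Set<string>([
  "ASTRA_DB_API_ENDPOINT",
  "ASTRA_DB_APPLICATION_TOKEN",
  "OPENROUTER_API_KEY",
  "OPENAI_API_KEY",
]);

// Returns the keys the runtime would reject with 400 unsupported_setup_key.
// Assumption: values are sent as a flat { KEY: "value" } object.
function unsupportedSetupKeys(values: Record<string, string>): string[] {
  return Object.keys(values).filter((key) => !SETUP_ENV_ALLOW_LIST.has(key));
}
```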

POST /setup/restart

Triggers a graceful shutdown so that the restart: unless-stopped policy in the bundled Compose file brings the runtime back with the new values loaded. Returns 202 immediately and begins draining.

GET /astra-cli

Auto-detected astra CLI defaults the runtime resolved at boot (active profile, default org, default DB id + name + endpoint, etc.). The web UI reads this to pre-fill the workspace onboarding form. Returns an empty payload when no CLI / profile is configured.

GET /astra-cli/profiles

Live shellout: lists every configured astra CLI profile and the databases visible to each. Drives the profile picker in the onboarding wizard. May take seconds depending on Astra API latency; not part of the hot path.

GET /docs

Scalar-rendered OpenAPI reference UI. Human-facing.

GET /api/v1/openapi.json

Machine-readable OpenAPI 3.1 document. Generated from the @hono/zod-openapi route definitions registered under /api/v1/*. The operational, setup, and /auth/* surfaces are mounted via plain Hono routers (not OpenAPI routers) and are therefore documented narratively here instead of in the generated spec; treat this file as the canonical reference for those.


/api/v1/workspaces

GET /api/v1/workspaces

List all workspaces, sorted by createdAt ascending with workspaceId as tie-breaker. Every backend (memory / file / astra) produces the same ordering so UI renders are deterministic.
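The ordering can be reproduced client-side with a two-key comparator. This is a sketch, not the runtime's code: it relies on the fixed-width UTC timestamp format from the Conventions section, which makes plain string comparison equivalent to chronological comparison.

```typescript
interface WorkspaceSortKey {
  workspaceId: string;
  createdAt: string; // ISO-8601 UTC, millisecond precision
}

// createdAt ascending, then workspaceId ascending as the tie-breaker.
function compareWorkspaces(a: WorkspaceSortKey, b: WorkspaceSortKey): number {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt ? -1 : 1;
  if (a.workspaceId === b.workspaceId) return 0;
  return a.workspaceId < b.workspaceId ? -1 : 1;
}
```

Usage: `workspaces.sort(compareWorkspaces)`.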

Response 200 — paginated Workspace records:

json
{
  "items": [
    {
      "workspaceId": "…",
      "name": "prod",
      "url": "env:ASTRA_DB_API_ENDPOINT",
      "kind": "astra",
      "credentials": { "token": "env:ASTRA_DB_APPLICATION_TOKEN" },
      "keyspace": "default_keyspace",
      "createdAt": "2026-04-22T10:11:12.345Z",
      "updatedAt": "2026-04-22T10:11:12.345Z"
    }
  ],
  "nextCursor": null
}

POST /api/v1/workspaces

Create a workspace. workspaceId is optional — the runtime generates one if omitted.

Request

json
{
  "name": "prod",
  "kind": "astra",
  "url": "env:ASTRA_DB_API_ENDPOINT",
  "credentials": { "token": "env:ASTRA_DB_APPLICATION_TOKEN" },
  "keyspace": "default_keyspace"
}

kind is one of astra | hcd | openrag | mock. (mock stays a first-class option for CI and offline work.) Once set, kind is immutable — changing it would orphan any already-provisioned KB collections.

url is the workspace's data-plane URL (for astra / hcd, the Astra Data API endpoint). Accepts either a literal URL or a SecretRef — the driver resolves refs at dial time so the same record works across dev and prod without code changes.

Each value in credentials must be a SecretRef (<provider>:<path>, e.g. env:ASTRA_DB_APPLICATION_TOKEN or file:/etc/workbench/secrets/astra-token). Raw secret values are rejected with 400.

Response 201 — the created Workspace.

GET /api/v1/workspaces/{workspaceId}

Fetch a single workspace.

  • 200 — Workspace
  • 404 workspace_not_found

PATCH /api/v1/workspaces/{workspaceId}

Patch one or more of name, url, credentials, keyspace. Every field is optional; omitted fields are preserved.

kind and workspaceId are immutable after creation and are rejected with 400. Unknown fields are likewise rejected (strict body).

  • 200 — updated Workspace
  • 400 — body contains kind or an unknown field
  • 404 workspace_not_found

DELETE /api/v1/workspaces/{workspaceId}

Cascades to the workspace's knowledge bases, execution services, RAG documents, and API keys. Before removing the control-plane rows, the runtime drops each KB's underlying Astra collection through the workspace's driver.

  • 204 — deleted
  • 404 workspace_not_found
  • 503 driver_unavailable — workspace has knowledge bases but no registered driver to drop their collections

POST /api/v1/workspaces/{workspaceId}/test-connection

Run a live workspace connection check. For mock workspaces, this always returns ok: true. Remote backends resolve their configured connection details and ask the driver to make a data-plane call.

Response 200 — always 200 regardless of check outcome; the ok field distinguishes success from failure:

json
{
  "ok": true,
  "kind": "astra",
  "details": "Astra Data API responded to listCollections."
}

json
{
  "ok": false,
  "kind": "astra",
  "details": "credential 'token' could not be resolved: env var 'ASTRA_DB_APPLICATION_TOKEN' is not set"
}

  • 200 — probe executed; inspect ok for pass/fail
  • 404 workspace_not_found

/api/v1/workspaces/{workspaceId}/api-keys

Workspace-scoped bearer tokens. Documented in auth.md; recapped here for the route contract.

GET

List every key ever issued for the workspace, including revoked ones. Never exposes the hash column.

An ApiKey:

json
{
  "workspaceId": "…",
  "keyId": "…",
  "prefix": "abc123xyz789",
  "label": "ci",
  "createdAt": "…",
  "lastUsedAt": null,
  "revokedAt": null,
  "expiresAt": null
}

  • 200 — paginated ApiKey records
  • 404 workspace_not_found

POST

Issue a new key. The plaintext is returned exactly once — the runtime stores only a scrypt digest.

Request

json
{ "label": "ci", "expiresAt": null }

Response 201

json
{
  "plaintext": "wb_live_abc123xyz789_…",
  "key": { "...ApiKey..." }
}

  • 201 — created; plaintext is the only time you'll see the token
  • 400 — missing / empty label
  • 404 workspace_not_found

DELETE /{keyId}

Soft-revoke: stamps revokedAt, leaves the row visible so audit tools still see the history. The next request bearing this token gets 401 unauthorized. Re-revoking an already-revoked key is a no-op that still returns 204.

  • 204 — revoked (or was already revoked)
  • 404 workspace_not_found / api_key_not_found

/api/v1/workspaces/{workspaceId}/{chunking,embedding,reranking}-services

Workspace-scoped execution services. Knowledge bases compose one chunking + one embedding + (optionally) one reranking service at create time. The three surfaces share an identical CRUD shape; only the body fields differ.

GET

List services in the workspace.

  • 200 — paginated ChunkingService / EmbeddingService / RerankingService records (sorted by createdAt ascending, *ServiceId as tie-breaker)
  • 404 workspace_not_found

POST

Create a service. The runtime generates the service ID if omitted. Required fields by kind:

| Kind | Required |
| --- | --- |
| chunking | name, engine |
| embedding | name, provider, modelName, embeddingDimension |
| reranking | name, provider, modelName |

Optional fields cover endpoint config (endpointBaseUrl, endpointPath, requestTimeoutMs, authType, credentialRef), provider/engine tuning, and supported language/content tags. See the OpenAPI spec for the full per-kind shape.

json
{
  "name": "openai-3-small",
  "provider": "openai",
  "modelName": "text-embedding-3-small",
  "embeddingDimension": 1536,
  "distanceMetric": "cosine",
  "endpointBaseUrl": "https://api.openai.com/v1",
  "credentialRef": "env:OPENAI_API_KEY",
  "supportedLanguages": ["en", "fr"],
  "supportedContent": ["text"]
}

supportedLanguages and supportedContent arrive as arrays and are returned deduplicated + sorted on the wire. (The Astra row layer stores them as SET<TEXT>; the converter normalises them at the boundary.)

  • 201 — the created record (with the generated *ServiceId)
  • 400 validation_error — schema failure
  • 404 workspace_not_found
  • 409 conflict — *ServiceId collision

GET /{serviceId} / PATCH /{serviceId} / DELETE /{serviceId}

Fetch / patch / delete. PATCH accepts every field from create (all optional). Strict bodies — unknown keys return 400.

DELETE is refused with 409 conflict while any KB still references the service. Drop or rebind the dependent KBs first. The error message names the offending KB so operators can navigate straight to it.


/api/v1/workspaces/{workspaceId}/knowledge-bases

Knowledge base provisioning

A knowledge base is the runtime's atomic retrieval unit: a logical group of documents indexed by exactly one embedding service and one chunking service, optionally re-ranked by one reranker. Creating a KB through POST does four things in lockstep:

  1. Validate the requested collection shape. Owned KBs use the KB name as the underlying collection identifier. Attach-mode KBs (attach: true) must supply vectorCollection, and the supplied value must equal name so the KB row and data-plane collection cannot drift apart.
  2. Insert the control-plane row. The KnowledgeBase record is written before owned collection provisioning; if provisioning fails, the runtime rolls the row back so callers never observe a KB that points at a missing collection.
  3. Materialize the underlying vector collection on the workspace's driver. The driver (mock for tests, astra for production) creates a collection sized for the bound embedding service's embeddingDimension with the requested vectorSimilarity. For Astra workspaces with an astra-provider embedding service, the collection is provisioned with a service: block so embedding runs server-side (see Configuration §Vectorize-on-ingest). Attach mode skips this step and binds to the existing data-plane collection after validating compatibility.
  4. Seed any default knowledge filters declared on the workspace. Filters are mutable post-create via POST /{kb}/filters.
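Steps 2 and 3 above follow an insert-then-provision pattern with a compensating rollback. The sketch below illustrates that pattern only; `KbStore`, `VectorDriver`, and `createKnowledgeBase` are hypothetical stand-ins, not the runtime's actual interfaces, and the attach-mode compatibility check and filter seeding are left out.

```typescript
interface KbRow {
  knowledgeBaseId: string;
  name: string;
  attach: boolean;
}

// Hypothetical control-plane store.
interface KbStore {
  insert(row: KbRow): Promise<void>;
  remove(knowledgeBaseId: string): Promise<void>;
}

// Hypothetical data-plane driver.
interface VectorDriver {
  createCollection(name: string, dimension: number): Promise<void>;
}

async function createKnowledgeBase(
  store: KbStore,
  driver: VectorDriver,
  row: KbRow,
  embeddingDimension: number,
): Promise<KbRow> {
  // Step 2: write the control-plane row first.
  await store.insert(row);
  // Attach mode binds to an existing collection and skips provisioning.
  if (row.attach) return row;
  try {
    // Step 3: owned KBs use the KB name as the collection identifier.
    await driver.createCollection(row.name, embeddingDimension);
  } catch (err) {
    // Roll the row back so callers never see a KB without a collection.
    await store.remove(row.knowledgeBaseId);
    throw err;
  }
  return row;
}
```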

Collection naming. Owned KBs derive vectorCollection from name, and the KB name must match Astra collection-name rules (letters, digits, underscores; starts with a letter; max 48 chars). To adopt a pre-existing collection, set attach: true and supply that collection name as both name and vectorCollection; the driver verifies its dimension and vectorize provider/model match the bound embedding service before the row is accepted. Renaming after create is not supported because the name is the collection identifier.
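The naming rule and the attach-mode constraint translate directly into a client-side pre-check. This is a sketch: the regular expression encodes the rule exactly as stated above (letters, digits, underscores; starts with a letter; max 48 chars), and the helper names are hypothetical.

```typescript
// One leading letter plus up to 47 letters, digits, or underscores.
const KB_NAME_PATTERN = /^[A-Za-z][A-Za-z0-9_]{0,47}$/;

function isValidKbName(name: string): boolean {
  return KB_NAME_PATTERN.test(name);
}

interface KbCreateNaming {
  name: string;
  attach?: boolean;
  vectorCollection?: string;
}

// Returns a human-readable problem, or null when the naming is acceptable.
function checkKbNaming(req: KbCreateNaming): string | null {
  if (!isValidKbName(req.name)) {
    return "name must start with a letter and use only letters, digits, and underscores (max 48 chars)";
  }
  if (req.attach === true && req.vectorCollection !== req.name) {
    return "attach mode requires vectorCollection to equal name";
  }
  return null;
}
```

Under this rule `support_docs` passes, while a hyphenated name such as `support-docs` does not.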

Idempotence. POST is not idempotent on its own: re-issuing the same request creates a second KB with a fresh knowledgeBaseId. To make creation safe to retry, supply an explicit knowledgeBaseId in the body; if a row with that ID already exists with the same name and service bindings, the route returns 409 conflict rather than mutating the existing KB, so a retry never produces a duplicate. To re-create a KB, drop it explicitly first.

Dimension binding. The bound embedding service's embeddingDimension is captured into the collection at create time and is not re-checked on subsequent ingest / search calls — the driver trusts the collection's dimension. Changing the embedding service binding via PATCH is rejected (the field is immutable) because the collection's stored vectors would no longer match the new service's dimension.

Cascade on DELETE. The route drops the underlying collection before the control-plane row so a partial failure leaves the KB intact. Once the collection is gone, the row is removed and the cascade clears RAG documents, knowledge filters, and any conversation references in agent.knowledgeBaseIds / conversation.knowledgeBaseIds.

GET

List knowledge bases in the workspace.

  • 200 — paginated KnowledgeBase records
  • 404 workspace_not_found

A KnowledgeBase:

json
{
  "workspaceId": "…",
  "knowledgeBaseId": "…",
  "name": "support_docs",
  "description": "customer support knowledge base",
  "status": "active",
  "embeddingServiceId": "…",
  "chunkingServiceId": "…",
  "rerankingServiceId": null,
  "language": "en",
  "vectorCollection": "support_docs",
  "lexical": { "enabled": false, "analyzer": null, "options": {} },
  "createdAt": "…",
  "updatedAt": "…"
}

POST

Create a KB and auto-provision its underlying Astra collection. Transactional — if collection provisioning fails, the KB row is rolled back so the control plane and data plane never drift.

For owned KBs, omit vectorCollection; the runtime uses name as the collection name. To adopt a pre-existing collection, set attach: true and supply the same collection name in both name and vectorCollection.

Request

json
{
  "name": "support_docs",
  "description": "customer support",
  "embeddingServiceId": "…",
  "chunkingServiceId": "…",
  "rerankingServiceId": null,
  "language": "en"
}

embeddingServiceId and chunkingServiceId are required. Both must reference services that exist in the same workspace.

  • 201 — the created KnowledgeBase (collection now exists)
  • 404 workspace_not_found / embedding_service_not_found / chunking_service_not_found / reranking_service_not_found
  • 409 conflict — knowledgeBaseId collision
  • 422 workspace_misconfigured — workspace is missing url or credentials.token required by its driver
  • 503 driver_unavailable — no driver registered for the workspace's kind

GET /{knowledgeBaseId} / PATCH /{knowledgeBaseId} / DELETE /{knowledgeBaseId}

GET reads the record. PATCH accepts a partial — description, status, rerankingServiceId, language, and lexical are mutable; name, embeddingServiceId, and chunkingServiceId are immutable post-create and the schema is .strict(), so accidentally including them in a body returns 400. DELETE drops the underlying collection first for owned KBs, then the KB row, then cascades RAG document rows. Attached KBs detach without dropping the external collection.

GET /api/v1/workspaces/{workspaceId}/adoptable-collections

Discover Astra collections in the workspace's keyspace that aren't already bound to a knowledge base. The web UI uses this to populate the "attach an existing collection" picker on the create-KB flow.

  • 200 — { "items": [ { "name": string, "vectorDimension": number | null, "vectorMetric": string | null } ] }
  • 404 workspace_not_found
  • 422 workspace_misconfigured — workspace driver missing required config
  • 503 driver_unavailable

Knowledge filters — …/knowledge-bases/{kb}/filters

Workspace-scoped, KB-scoped saved retrieval filters. They are shallow-equal payload constraints applied at search time without requiring the caller to remember the exact JSON. Used by the playground's filter dropdown and by agents that want pre-defined narrowings.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /{kb}/filters | List filters in the KB (paginated) |
| POST | /{kb}/filters | Create. Body: { knowledgeFilterId?, name, description?, filter }. 409 on duplicate explicit ID. |
| GET | /{kb}/filters/{filterId} | Fetch one |
| PATCH | /{kb}/filters/{filterId} | Mutate name, description, or filter |
| DELETE | /{kb}/filters/{filterId} | 204 |

filter is the same shape as POST /search's filter body — a shallow-equal map over payload keys. Filters are seeded from the workspace's configured defaults at KB-create time.

POST /{knowledgeBaseId}/records — upsert records

Request — each record carries exactly one of vector or text:

json
{
  "records": [
    { "id": "doc-1", "vector": [0.01, -0.02, ...], "payload": { "title": "…" } },
    { "id": "doc-2", "text": "winter sweater in blue" },
    { "id": "doc-3", "text": "summer shorts", "payload": { "tag": "apparel" } }
  ]
}

  • records — 1..500 items per request.
  • id is the application's identifier; re-upsert replaces the prior value.
  • vector.length must equal the bound embedding service's embeddingDimension.
  • Text dispatch mirrors search: the route tries driver.upsertByText() for all-text batches (Astra $vectorize inserts for collections with a service block). On NotSupportedError the runtime embeds each text record via the KB's bound embedding service and retries through plain upsert. Mixed batches always embed client-side so the whole batch stays in one transactional call.
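The request rules above (batch size, exactly one of vector or text, matching vector length) can be checked before the call. The following is a sketch of such a pre-check; `validateRecords` is a hypothetical helper that returns the error code this document associates with each failure, and it does not attempt to model the server's embedding fallback.

```typescript
interface UpsertRecord {
  id: string;
  vector?: number[];
  text?: string;
  payload?: Record<string, unknown>;
}

type UpsertProblem = "validation_error" | "dimension_mismatch";

function validateRecords(
  records: UpsertRecord[],
  embeddingDimension: number,
): UpsertProblem | null {
  // 1..500 items per request.
  if (records.length < 1 || records.length > 500) return "validation_error";
  for (const record of records) {
    const hasVector = record.vector !== undefined;
    const hasText = record.text !== undefined;
    // Exactly one of vector or text: both or neither is rejected.
    if (hasVector === hasText) return "validation_error";
    if (record.vector !== undefined && record.vector.length !== embeddingDimension) {
      return "dimension_mismatch";
    }
  }
  return null;
}
```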

Response 200

json
{ "upserted": 2 }

  • 400 validation_error — record has neither/both of vector/text
  • 400 dimension_mismatch — vector length doesn't match the bound embedding service's embeddingDimension
  • 400 embedding_unavailable / embedding_dimension_mismatch
  • 404 workspace_not_found / knowledge_base_not_found

DELETE /{knowledgeBaseId}/records/{recordId}

Delete a single record. recordId is the application's id (any non-empty string).

json
{ "deleted": true }

POST /{knowledgeBaseId}/search — vector or text search

Request — exactly one of vector or text, plus optional hybrid / lexicalWeight / rerank:

json
{
  "text": "how do refunds work?",
  "topK": 5,
  "filter": { "section": "billing" },
  "hybrid": true,
  "lexicalWeight": 0.3,
  "rerank": true
}

  • topK defaults to 10, clamped to [1, 1000].
  • filter is shallow-equal on payload keys.
  • hybrid: true runs the driver's vector + lexical lane (defaults to the KB's lexical.enabled). Requires text.
  • rerank: true reorders hits through the KB's bound reranking service. Defaults to true when rerankingServiceId is non-null. Requires text.

The route synthesises a driver-facing descriptor from the KB plus its bound services (see kb-descriptor.ts) so the dispatch layer stays unchanged.
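The defaulting and validation rules for a search request can be sketched as a normalisation step. This is an illustration under stated assumptions, not the route's implementation: the type and function names are hypothetical, and the sketch assumes that KB-level defaults for hybrid and rerank apply only to text queries, which the text above does not state explicitly.

```typescript
interface SearchRequest {
  vector?: number[];
  text?: string;
  topK?: number;
  hybrid?: boolean;
  rerank?: boolean;
}

interface KbSearchDefaults {
  lexicalEnabled: boolean; // the KB's lexical.enabled
  rerankingServiceId: string | null;
}

interface NormalizedSearch {
  topK: number;
  hybrid: boolean;
  rerank: boolean;
}

type SearchCheck =
  | { ok: true; value: NormalizedSearch }
  | { ok: false; code: "validation_error" };

function normalizeSearch(req: SearchRequest, kb: KbSearchDefaults): SearchCheck {
  const hasVector = req.vector !== undefined;
  const hasText = req.text !== undefined;
  // Exactly one of vector or text.
  if (hasVector === hasText) return { ok: false, code: "validation_error" };
  // Explicit hybrid or rerank requires text.
  if (!hasText && (req.hybrid === true || req.rerank === true)) {
    return { ok: false, code: "validation_error" };
  }
  // topK defaults to 10 and is clamped to [1, 1000].
  const topK = Math.min(1000, Math.max(1, req.topK ?? 10));
  return {
    ok: true,
    value: {
      topK,
      // Assumption: defaults are only applied when the query is text.
      hybrid: hasText && (req.hybrid ?? kb.lexicalEnabled),
      rerank: hasText && (req.rerank ?? (kb.rerankingServiceId !== null)),
    },
  };
}
```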

Response 200 — array of hits, sorted by score descending:

json
[
  { "id": "doc-1", "score": 0.94, "payload": { "title": "…" } },
  { "id": "doc-2", "score": 0.87, "payload": { "title": "…" } }
]

Score semantics match the bound embedding service's distanceMetric:

| Metric | Score |
| --- | --- |
| cosine | Cosine similarity in [-1, 1]; 1 = exact match |
| dot | Raw dot product; unbounded |
| euclidean | 1 / (1 + distance) so higher = closer |
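To make the three score definitions concrete, here is how each could be computed from a query vector and a stored vector. This is a sketch of the formulas only; the driver's actual computation is not shown in this document, and `similarityScore` is a hypothetical name.

```typescript
type DistanceMetric = "cosine" | "dot" | "euclidean";

function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

function similarityScore(metric: DistanceMetric, a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  switch (metric) {
    case "dot":
      // Raw dot product; unbounded.
      return dotProduct(a, b);
    case "cosine":
      // In [-1, 1]; NaN if either vector has zero length.
      return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
    case "euclidean": {
      const distance = Math.sqrt(
        a.reduce((sum, x, i) => sum + (x - (b[i] ?? 0)) ** 2, 0),
      );
      // Map distance to (0, 1] so that higher means closer.
      return 1 / (1 + distance);
    }
  }
}
```

For all three metrics a higher score means a better match, which is why hits are sorted by score descending.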
  • 400 validation_error — neither/both of vector/text, or hybrid/rerank without text
  • 400 dimension_mismatch / embedding_unavailable / embedding_dimension_mismatch
  • 404 workspace_not_found / knowledge_base_not_found
  • 501 hybrid_not_supported / rerank_not_supported

GET /{knowledgeBaseId}/documents

List RAG documents in the KB.

  • 200 — paginated RagDocument records
  • 404 workspace_not_found / knowledge_base_not_found

A RagDocument:

json
{
  "workspaceId": "…",
  "knowledgeBaseId": "…",
  "documentId": "…",
  "sourceDocId": null,
  "sourceFilename": "readme.md",
  "fileType": "text/markdown",
  "fileSize": 1024,
  "contentHash": "sha256:…",
  "chunkTotal": null,
  "ingestedAt": null,
  "updatedAt": "…",
  "status": "pending",
  "errorMessage": null,
  "metadata": { "source": "upload" }
}

status is one of pending | chunking | embedding | writing | ready | failed. The KB ingest pipeline is the canonical writer of status / errorMessage / chunkTotal / ingestedAt. Clients can also set these directly via PATCH if they own the lifecycle externally.

POST /{knowledgeBaseId}/documents

Register a document in the KB without running the ingest pipeline.

json
{
  "sourceFilename": "readme.md",
  "fileType": "text/markdown",
  "fileSize": 1024,
  "contentHash": "sha256:…",
  "metadata": { "source": "upload" }
}

  • 201 — the created RagDocument (status defaults to pending, metadata defaults to {})
  • 404 workspace_not_found / knowledge_base_not_found
  • 409 conflict — documentId collision within the same KB

GET /{knowledgeBaseId}/documents/{documentId} / PATCH /{documentId} / DELETE /{documentId}

Fetch / patch / delete. PATCH accepts every field from create (all optional). DELETE cascades into the KB's collection: chunks matched by payload.documentId are removed before the row is dropped, so a successful delete leaves no traces in KB-scoped search. Drivers exposing deleteRecords use a single bulk call; older drivers fall back to a listRecords + per-row delete loop.

GET /{knowledgeBaseId}/documents/{documentId}/chunks

Lists the chunks the ingest pipeline extracted from this document. Reads raw records out of the KB's collection filtered on documentId, sorts by the chunkIndex payload key, and returns:

json
[
  {
    "id": "<documentId>:0",
    "chunkIndex": 0,
    "text": "First paragraph about apples.",
    "payload": {
      "knowledgeBaseId": "…",
      "documentId": "…",
      "chunkIndex": 0,
      "chunkText": "First paragraph about apples.",
      "source": "seed"
    }
  }
]

Query params:

  • limit (1–1000, default 1000) — caps the number of chunks returned.

  • 200 — array of chunks, sorted by chunkIndex ascending

  • 404 workspace_not_found / knowledge_base_not_found / document_not_found

  • 501 list_records_not_supported — driver doesn't expose listRecords

POST /{knowledgeBaseId}/ingest

Synchronous end-to-end ingest. Chunks the input text, embeds every chunk through the KB's bound embedding service (server-side via $vectorize where the driver supports it, otherwise client-side), upserts the chunks into the KB's collection, and creates a RagDocument row with status: ready + chunkTotal.

Request

json
{
  "text": "Apples are red. Bananas are yellow.",
  "sourceFilename": "fruit.md",
  "metadata": { "source": "seed" },
  "chunker": { "maxChars": 1000, "minChars": 100, "overlapChars": 150 }
}

chunker overrides the runtime defaults for this call only. metadata is merged onto every chunk's payload; the reserved keys knowledgeBaseId, documentId, chunkIndex, and chunkText are always set by the runtime and override any caller-supplied values. text is capped at 200,000 characters.

Response 201

json
{
  "document": { "status": "ready", "chunkTotal": 3, "...": "..." },
  "chunks": 3
}

Chunk payloads. Every chunk upserted carries:

  • knowledgeBaseId — the KB's ID (used by /search)
  • documentId — the ID of the RagDocument row this ingest created
  • chunkIndex — 0-based position within the source document
  • chunkText — the chunk's raw text (read back through /chunks)
  • Plus every caller-supplied metadata key
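The precedence rule for reserved keys can be illustrated with a small sketch. This is not the runtime's code; `buildChunkPayload` and `ReservedChunkKeys` are hypothetical names, and the only behavior taken from this document is that reserved keys override caller-supplied metadata.

```typescript
type ChunkMetadata = Record<string, unknown>;

// The four keys this document says the runtime always sets.
interface ReservedChunkKeys {
  knowledgeBaseId: string;
  documentId: string;
  chunkIndex: number;
  chunkText: string;
}

// Hypothetical helper: spread caller metadata first and reserved keys last,
// so a caller-supplied "chunkIndex" (for example) cannot win.
function buildChunkPayload(
  metadata: ChunkMetadata,
  reserved: ReservedChunkKeys,
): ChunkMetadata & ReservedChunkKeys {
  return { ...metadata, ...reserved };
}
```

With metadata `{ source: "seed", chunkIndex: 99 }` the resulting payload keeps `source` but carries the runtime's `chunkIndex`.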

Failure semantics. When chunking or upsert throws, the RagDocument row is marked status: failed with errorMessage before the error is re-raised.

POST /{knowledgeBaseId}/ingest?async=true

Same body. The pipeline runs in the background; the response returns immediately with a job pointer.

Response 202

json
{
  "job": {
    "workspaceId": "…",
    "jobId": "…",
    "kind": "ingest",
    "knowledgeBaseId": "…",
    "documentId": "…",
    "status": "pending",
    "processed": 0,
    "total": null,
    "result": null,
    "errorMessage": null,
    "createdAt": "…",
    "updatedAt": "…"
  },
  "document": { "status": "writing", "…": "…" }
}

Errors are the same set as the sync path. A 4xx means the request was rejected outright; nothing was enqueued and no job row exists.

Once the job is running, failures are captured into the job record (status: failed, errorMessage populated) and the document row. The runKbIngestJob worker resolves the KB descriptor on every call so renames or service swaps mid-flight don't drift.

POST /{knowledgeBaseId}/ingest/file

Multipart counterpart to /ingest. Accepts a binary upload (PDF, DOCX, XLSX, or text) plus optional metadata, dispatches an extractor based on the file's MIME type / extension, then runs the same chunk → embed → upsert pipeline.

Form fields:

| Field | Required | Notes |
| --- | --- | --- |
| file | yes | The document bytes. Must be a File part in multipart/form-data. |
| metadata | no | JSON object string merged onto every chunk's payload (same semantics as the JSON /ingest metadata field). |
| chunker | no | JSON object string overriding the runtime's chunker defaults for this call only. |
| parser | no | native \| docling \| auto (default). When DOCLING_URL is unset, native is the only option. See Configuration § Document extraction. |

Query: ?async=true → 202 + job pointer (same response shape as the JSON variant). Body cap is 50 MB.

  • 201 { document, chunks }
  • 202 { job, document } (when async=true)
  • 400 invalid_multipart / missing_file — body wasn't multipart, or the file field was missing
  • 400 validation_error — bad metadata / chunker JSON
  • 400 extractor_unsupported — file type the runtime can't extract
  • 413 payload_too_large — body exceeded 50 MB
  • 503 docling_unavailable when parser=docling (or auto) couldn't reach the configured docling-serve
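A client-side sketch of assembling the multipart body follows. It assumes a runtime with the standard `FormData` and `Blob` globals (browsers, Node 18+); `buildFileIngestForm` and `FileIngestOptions` are hypothetical names, and only the four form field names come from this document.

```typescript
interface FileIngestOptions {
  metadata?: Record<string, unknown>;
  chunker?: { maxChars?: number; minChars?: number; overlapChars?: number };
  parser?: "native" | "docling" | "auto";
}

// Hypothetical helper: metadata and chunker are sent as JSON object strings,
// matching the form-field notes above. Optional fields are omitted when unset.
function buildFileIngestForm(
  file: Blob,
  filename: string,
  options: FileIngestOptions = {},
): FormData {
  const form = new FormData();
  form.append("file", file, filename);
  if (options.metadata) form.append("metadata", JSON.stringify(options.metadata));
  if (options.chunker) form.append("chunker", JSON.stringify(options.chunker));
  if (options.parser) form.append("parser", options.parser);
  return form;
}
```

The resulting form can be passed as the `body` of a `fetch` POST to the ingest/file route; leaving the Content-Type header unset lets the client add the multipart boundary itself.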

/api/v1/workspaces/{workspaceId}/jobs/{jobId}

Job poll surface for anything that runs in the background. Today only async ingest creates jobs; future bulk ops (reindex, export, batch delete) plug in with the same record shape.

GET /{jobId}

Point-in-time fetch, suitable for polling. Returns the Job record described above.

  • 200 Job
  • 404 job_not_found
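A polling loop over this endpoint might look like the sketch below. It is an illustration only: `pollJobUntilDone`, the default interval, and the attempt cap are assumptions, and the actual HTTP call is injected as `fetchJob` so the loop stays independent of any particular client.

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed";

// Subset of the Job record that a poller needs.
interface JobSnapshot {
  jobId: string;
  status: JobStatus;
  processed: number;
  total: number | null;
  errorMessage: string | null;
}

// succeeded and failed are the two terminal states named in the Job record.
function isTerminalStatus(status: JobStatus): boolean {
  return status === "succeeded" || status === "failed";
}

// Hypothetical helper: call fetchJob until the job is terminal or attempts run out.
async function pollJobUntilDone(
  fetchJob: () => Promise<JobSnapshot>,
  options: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<JobSnapshot> {
  const intervalMs = options.intervalMs ?? 1000; // assumed default
  const maxAttempts = options.maxAttempts ?? 60; // assumed default
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob();
    if (isTerminalStatus(job.status)) return job;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job not terminal after ${maxAttempts} polls`);
}
```

In practice `fetchJob` would wrap a GET on the job URL and parse the JSON body. For live progress, the events stream avoids repeated requests.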

GET /{jobId}/events

Server-Sent Events stream. Emits event: job with the full record as JSON on every update, plus a final event: done carrying { status } when the job hits a terminal state. The current record is replayed as the first job event so clients don't race the first update.

Headers: Content-Type: text/event-stream, Cache-Control: no-cache.

Same-replica updates fan out immediately through the in-process subscription registry. With the Astra job store, subscribers on other replicas poll the subscribed job records at controlPlane.jobPollIntervalMs so an SSE client can see progress even when the worker is running on a different pod. The memory and file job stores remain single-replica deployment shapes.
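For clients without a built-in EventSource, the event framing can be decoded by hand. The sketch below parses a complete buffer of SSE text into events; `parseSseFrames` is a hypothetical helper, and a production client would also need to handle frames split across network reads.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Hypothetical helper: frames are separated by a blank line; each frame carries
// an optional "event:" field and one or more "data:" lines.
function parseSseFrames(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no event field is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Strip one leading space after the colon, per the SSE format.
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

A consumer would `JSON.parse` the data of each job event and stop reading once a done event arrives.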

Job record

| Field | Type | Notes |
| --- | --- | --- |
| workspaceId | uuid | Owning workspace |
| jobId | uuid | |
| kind | "ingest" | Discriminator; more kinds arrive with more async ops |
| knowledgeBaseId | uuid or null | Set for ingest jobs |
| documentId | uuid or null | Set for ingest jobs |
| status | "pending" \| "running" \| "succeeded" \| "failed" | Terminal: succeeded, failed |
| processed | int | Units completed |
| total | int or null | Units expected (null if unknown) |
| result | object or null | Kind-specific summary on success (ingest: { chunks: N }) |
| errorMessage | string or null | Populated on failed |
| leasedBy | string or null | Replica currently driving the job |
| leasedAt | iso-8601 or null | Last heartbeat from the lease holder |
| ingestInput | object or null | Persisted ingest snapshot used for orphan replay |
| createdAt | iso-8601 | |
| updatedAt | iso-8601 | |

Persistence. The job store auto-matches the control-plane driver:

  • controlPlane.driver: memory → jobs live in-process (lost on restart).
  • controlPlane.driver: file → jobs serialize to <controlPlane.root>/jobs.json alongside workspaces.json, survive restart.
  • controlPlane.driver: astra → jobs live in wb_jobs_by_workspace, reusing the existing Data API connection; durable across restart and across replicas. Subscriptions poll across replicas while local updates still fan out immediately.

Clustered Astra deployments can set controlPlane.jobsResume.enabled: true. Running workers then stamp leasedBy / leasedAt; the orphan sweeper claims stale leases and, when ingestInput is present, replays the ingest pipeline. Chunk IDs are deterministic, so replay is idempotent. Older jobs without an input snapshot, or future job kinds that cannot replay yet, are claimed and marked failed so clients still see a terminal state.
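Why deterministic chunk IDs make replay idempotent can be shown with a toy model. The ID format `<documentId>:<chunkIndex>` follows the chunk listing example earlier in this document; the in-memory `Map` and the helper names are stand-ins for the real collection and are assumptions.

```typescript
// ID format taken from the chunk listing example ("<documentId>:0").
function chunkId(documentId: string, chunkIndex: number): string {
  return `${documentId}:${chunkIndex}`;
}

// Hypothetical stand-in for the KB collection: an upsert keyed by chunk ID.
function upsertChunks(
  store: Map<string, string>,
  documentId: string,
  chunkTexts: string[],
): void {
  chunkTexts.forEach((text, index) => {
    // Same document and index always map to the same key, so a replay overwrites.
    store.set(chunkId(documentId, index), text);
  });
}
```

Running `upsertChunks` twice with the same input leaves the store with the same set of keys, which is the property a replay by the orphan sweeper relies on.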

The cross-replica subscription model, lease/heartbeat protocol, and orphan-sweeper design are documented in full in cross-replica-jobs.md.

/api/v1/workspaces/{workspaceId}/llm-services

Workspace-scoped LLM execution services — describe how to call a chat-completion or generation model. Mirrors the chunking / embedding / reranking service surface. An agent in the same workspace may bind one of these via agent.llmServiceId; the agent's send + streaming pipeline then instantiates a chat service from the bound record.

Today provider: "openrouter", provider: "openai", and provider: "ollama" are wired end-to-end — all OpenAI-compatible and served by the same adapter. Other providers can be created and stored, but agent send returns 422 llm_provider_unsupported until their adapters land.

GET /llm-services

List services in the workspace, oldest-first. Paginated.

  • 200 — paginated LlmService records
  • 404 workspace_not_found
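Paginated list endpoints like this one can be drained with a small cursor loop. The sketch assumes the `{ items, nextCursor }` envelope from the Pagination section; `collectAllPages` is a hypothetical helper and the page fetcher is injected rather than tied to a specific HTTP client.

```typescript
// Envelope described in the Pagination section.
interface Page<T> {
  items: T[];
  nextCursor: string | null;
}

// Hypothetical helper: keep requesting pages until nextCursor comes back null.
async function collectAllPages<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return all;
}
```

A real `fetchPage` would append `cursor` (when non-null) and an optional `limit` as query parameters on the list URL.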

POST /llm-services

Create a service. Required: name, provider, modelName. Optional fields cover endpoint config (endpointBaseUrl, endpointPath, requestTimeoutMs, authType, credentialRef), provider tuning (engine, modelVersion, contextWindowTokens, maxOutputTokens, temperatureMin, temperatureMax, supportsStreaming, supportsTools, maxBatchSize), and language / content tags. See the OpenAPI spec for the full shape.

json
{
  "name": "openrouter-gpt-4o-mini",
  "provider": "openrouter",
  "modelName": "openai/gpt-4o-mini",
  "credentialRef": "env:OPENROUTER_API_KEY",
  "maxOutputTokens": 1024
}
  • 201 — the created LlmService
  • 400 validation_error
  • 404 workspace_not_found
  • 409 conflict — duplicate explicit llmServiceId

GET /llm-services/{llmServiceId} / PATCH /{id} / DELETE /{id}

Fetch / patch / delete. PATCH accepts every field from create (all optional). DELETE is refused with 409 conflict while any agent still references the service via llmServiceId. Reassign or delete the dependent agents first.

/api/v1/workspaces/{workspaceId}/agents

User-defined agents — workspace-scoped personas backed by the Stage-2 agentic tables. See agents.md for the full walkthrough; the route shapes are summarised below.

Historical note. Earlier drafts of this document described a parallel /chats route surface and a singleton "Bobbie" agent. Both were retired; the agent surface is the single way to chat against a workspace.

GET /agents

List agents in the workspace, oldest-first. Paginated.

POST /agents

  • Body: CreateAgentInput (see agents.md).
  • 201 Agent
  • 404 — workspace not found
  • 409 — duplicate explicit agentId

GET /agents/{agentId}

  • 200 Agent

PATCH /agents/{agentId}

Patch any optional field except agentId. Send null to clear nullable fields (including llmServiceId).

DELETE /agents/{agentId}

Returns 204. Deleting an agent cascades to its conversations and their messages.

GET /agents/{agentId}/conversations

List the agent's conversations, newest-first. Paginated.

POST /agents/{agentId}/conversations

  • Body: CreateConversationInput ({ conversationId?, title?, knowledgeBaseIds? }).
  • 201 Conversation
  • 404 — workspace or agent not found

GET|PATCH|DELETE /agents/{agentId}/conversations/{conversationId}

Single-conversation read / update (title + KB filter) / delete. Delete cascades messages. 404 when the conversation does not belong to the named agent.

GET /agents/{agentId}/conversations/{conversationId}/messages

Oldest-first message log, paginated.

  • 200 — paginated ChatMessage records
  • 404 when the workspace, agent, or conversation does not exist, or when the conversation does not belong to the named agent

POST /agents/{agentId}/conversations/{conversationId}/messages (synchronous)

Body: { content }. Persists the user turn, retrieves grounding context, calls the agent's LLM (per the resolution order below), persists the assistant turn, and returns:

json
{ "user": <ChatMessage>, "assistant": <ChatMessage> }

LLM resolution. When agent.llmServiceId is set the runtime instantiates a chat service from the bound LLM-service record. When unset it falls back to the runtime's global chat: block.

  • 201 { user, assistant }
  • 404 when the conversation does not belong to the named agent
  • 422 llm_provider_unsupported when agent.llmServiceId points at an LLM service whose provider is not one of openrouter, openai, or ollama
  • 422 llm_credential_missing — bound LLM service has no credentialRef
  • 503 chat_disabled — runtime has no global chat: block configured and the agent has no llmServiceId
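The resolution order and its failure codes can be summarised as a decision function. This is a sketch under stated assumptions, not the runtime's implementation: `resolveChatSource` is a hypothetical name, and the order in which the provider and credential checks run is a guess.

```typescript
// Subset of an LLM service record relevant to resolution.
interface BoundLlmService {
  provider: string;
  credentialRef: string | null;
}

type ChatResolution =
  | { ok: true; source: "llm-service" | "global-chat" }
  | { ok: false; status: 422 | 503; code: string };

// Providers this document lists as wired end-to-end.
const WIRED_PROVIDERS = new Set(["openrouter", "openai", "ollama"]);

// Hypothetical helper: bound is the service referenced by agent.llmServiceId,
// or null when the agent has none.
function resolveChatSource(
  bound: BoundLlmService | null,
  hasGlobalChatBlock: boolean,
): ChatResolution {
  if (bound === null) {
    return hasGlobalChatBlock
      ? { ok: true, source: "global-chat" }
      : { ok: false, status: 503, code: "chat_disabled" };
  }
  // Assumption: the provider check precedes the credential check.
  if (!WIRED_PROVIDERS.has(bound.provider)) {
    return { ok: false, status: 422, code: "llm_provider_unsupported" };
  }
  if (bound.credentialRef === null) {
    return { ok: false, status: 422, code: "llm_credential_missing" };
  }
  return { ok: true, source: "llm-service" };
}
```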

POST /agents/{agentId}/conversations/{conversationId}/messages/stream (SSE)

Same body. Returns text/event-stream:

| Event | Payload | When |
| --- | --- | --- |
| user-message | The persisted user ChatMessage | Once, after the user turn is persisted |
| token | { delta: string } | Per model emission |
| token-reset | {} | After a tool-call iteration so the UI can clear pre-tool narration before iteration N+1 streams in |
| tool-call | { toolName, args, callId } | The model requested a tool invocation. Native function calling works across all wired providers (openrouter, openai, ollama) since they share the OpenAI tool-call wire format, subject to the specific model supporting tools (OpenRouter's catalog is filtered to tool-capable models). |
| tool-result | { toolName, callId, result } | Each tool result fed back into the next iteration |
| done | The persisted assistant ChatMessage (metadata.finish_reason: "stop" / "length") | Terminal on success |
| error | The persisted assistant ChatMessage with metadata.finish_reason: "error" | Terminal on failure |

The stream emits exactly one of done / error. Tool-use loops are capped at 6 iterations per turn. Client disconnect is treated as a clean stop — whatever was already streamed gets persisted with finish_reason: "stop". Status codes are the same as the synchronous variant (404 / 422 / 503 surface as error events when they occur after the response has already started).
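On the client, these events map naturally onto a reducer. The sketch below shows one way a UI could fold the stream into display state, including clearing narration on token-reset; `applyStreamEvent` and `StreamView` are hypothetical, and payloads are trimmed to the fields the reducer uses.

```typescript
// Event names follow the stream's event list; payloads are reduced for illustration.
type AgentStreamEvent =
  | { event: "user-message" }
  | { event: "token"; delta: string }
  | { event: "token-reset" }
  | { event: "tool-call"; toolName: string; callId: string }
  | { event: "tool-result"; toolName: string; callId: string }
  | { event: "done" }
  | { event: "error" };

interface StreamView {
  text: string; // assistant text currently shown
  toolCalls: string[]; // names of tools requested so far
  outcome: "done" | "error" | null;
}

// Hypothetical reducer: returns a new view for each incoming event.
function applyStreamEvent(view: StreamView, ev: AgentStreamEvent): StreamView {
  switch (ev.event) {
    case "token":
      return { ...view, text: view.text + ev.delta };
    case "token-reset":
      return { ...view, text: "" }; // drop pre-tool narration
    case "tool-call":
      return { ...view, toolCalls: [...view.toolCalls, ev.toolName] };
    case "done":
    case "error":
      return { ...view, outcome: ev.event };
    default:
      return view; // user-message and tool-result leave the view unchanged
  }
}
```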

GET /agent-templates

Catalog of one-click agent templates the UI offers in the agent gallery. Workspace-scoped for authz, but the body is workspace-independent and ships with the binary. Returns the four entries (Bobby, Maven, Quill, Sage) with their templateId, name, description, persona prompt, and defaultOnNewWorkspace flag. See agents.md § Template catalog.

POST /agents/from-template

Instantiate a catalog template as a new agent in the workspace.

json
{ "templateId": "bobby" }

The new agent's name, description, and systemPrompt are copied from the template; other fields default to the same values as POST /agents. Audit event agent.create carries the templateId slug.

  • 201 Agent
  • 400 — unknown templateId
  • 404 — workspace not found

Agent record

| Field | Type | Notes |
| --- | --- | --- |
| workspaceId | uuid | |
| agentId | uuid | Server-assigned unless caller supplied. |
| name | string | |
| description | string \| null | |
| systemPrompt | string \| null | |
| userPrompt | string \| null | |
| llmServiceId | uuid \| null | When set, points at an LLM service in the same workspace; the agent's chat service is instantiated from that record. When null, the runtime's global chat: block is used. Mutable. |
| knowledgeBaseIds | uuid[] | Default RAG-grounding set. |
| ragEnabled | bool | |
| ragMaxResults | int \| null | |
| ragMinScore | number \| null | |
| rerankEnabled | bool | |
| rerankingServiceId | uuid \| null | Agent-level override of the KB-level reranker. |
| rerankMaxResults | int \| null | |
| createdAt | iso-8601 | |
| updatedAt | iso-8601 | |

Conversation record

| Field | Type | Notes |
| --- | --- | --- |
| workspaceId | uuid | |
| agentId | uuid | |
| conversationId | uuid | |
| title | string \| null | |
| knowledgeBaseIds | uuid[] | Per-conversation override of the agent's default KB set. |
| createdAt | iso-8601 | |

ChatMessage record

| Field | Type | Notes |
| --- | --- | --- |
| workspaceId | uuid | |
| conversationId | uuid | |
| messageId | uuid | |
| messageTs | iso-8601 | Cluster-key. Strictly increasing within a conversation. |
| role | "user" \| "agent" \| "system" \| "tool" | agent is the assistant turn. |
| content | string \| null | |
| tokenCount | int \| null | If the provider reports it. |
| metadata | Record<string, string> | RAG provenance (context_document_ids, context_chunks), model, finish_reason (stop/length/error), error_message. |

/api/v1/workspaces/{workspaceId}/mcp

Optional Model Context Protocol façade. Speaks Streamable HTTP (the modern MCP transport) with JSON-RPC payloads. On by default (mcp.enabled: true); set mcp.enabled: false in workbench.yaml to take the route down. See mcp.md for the full walkthrough.

| Method | Status | Body |
| --- | --- | --- |
| GET / POST / DELETE / OPTIONS | 200 | JSON-RPC response (or SSE stream for long-running tool calls). The four methods map to the Streamable-HTTP spec: POST for client→server messages, GET for the long-lived event stream, DELETE for session teardown, OPTIONS for CORS preflight. |
| any | 404 not_found | When mcp.enabled is false |
| any | 404 workspace_not_found | When the path workspace doesn't exist |

The full tool catalogue (read + write surfaces, scopes, JSON-Schema inputs) is documented in mcp.md. At a glance:

  • Read: list_knowledge_bases, list_documents, search_kb (vector / hybrid / rerank), list_agents, get_agent, list_chats, list_chat_messages
  • Write: ingest_text, delete_document, create_knowledge_base, delete_knowledge_base
  • Chat-gated: chat_send, run_agent (only when mcp.exposeChat: true and chat: is configured)

Auth flows through the regular /api/v1/* middleware plus the shared workspace-route authorization wrapper, so workspace scoping is enforced before any MCP tool is invoked.
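As a rough illustration of what a client sends to this route, the sketch below builds a JSON-RPC 2.0 message for the standard MCP `tools/list` method. `buildMcpRequest` is a hypothetical helper; authentication headers and the session handshake (an MCP session normally starts with an `initialize` request) are omitted, and mcp.md remains the reference for tool arguments.

```typescript
interface McpHttpRequest {
  path: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: wraps a JSON-RPC 2.0 call for the workspace's MCP route.
function buildMcpRequest(
  workspaceId: string,
  id: number,
  rpcMethod: string,
  params?: Record<string, unknown>,
): McpHttpRequest {
  const message: Record<string, unknown> = { jsonrpc: "2.0", id, method: rpcMethod };
  if (params !== undefined) message["params"] = params;
  return {
    path: `/api/v1/workspaces/${workspaceId}/mcp`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients accept either a JSON body or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(message),
  };
}
```

Calling `buildMcpRequest(workspaceId, 1, "tools/list")` yields a request whose response would enumerate the tools summarised above.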

Planned routes

These do not exist yet. Shapes may shift before they land.

Multi-provider LLM execution

openrouter, openai, and ollama are wired end-to-end today. All three speak the OpenAI tool-call wire format, so the agent tool-use loop fires across every wired provider (subject to the specific model supporting tools — OpenRouter's catalog is filtered to tool-capable models). Other providers (Cohere, Anthropic, Bedrock, …) can be created and stored, but agent send returns 422 llm_provider_unsupported until the provider is wired into the chat-service factory. Adding a provider is mostly a one-case addition to the dispatcher.

MCP tool execution

/api/v1/workspaces/{w}/mcp-tools — CRUD over the wb_config_mcp_tools_by_workspace rows, plus /api/v1/workspaces/{w}/agents/{a}/run for an agent execution loop with tool use. Now that the MCP server façade is in, the inverse — letting an agent call MCP tools — is the next step.

See roadmap.md for the phase plan.


OpenAPI

The generated document at /api/v1/openapi.json covers the /api/v1/* resource routes (workspaces, knowledge bases, agents, chat, search, jobs, MCP tools, etc.) and stays in sync with the running runtime — those routes register their Zod schemas through @hono/zod-openapi. Share it with downstream tooling (client generators, API gateway configs, etc.).

The operational (/healthz, /readyz, /metrics, /health/*, /error-codes), setup (/setup-*), and /auth/* surfaces are mounted as plain Hono routers and are documented in this file narratively (and in auth.md for /auth/*); they will not appear in the generated spec.

To consume locally:

bash
curl -s http://localhost:8080/api/v1/openapi.json > openapi.json

Released under the MIT license.