Configuration (`workbench.yaml`)

Runtime behavior is driven by a single YAML file, conventionally named workbench.yaml. The runtime loads it at startup and validates it against a strict schema.

Workspaces, knowledge bases, and execution services are not in config. They're runtime data, mutable via the HTTP API. workbench.yaml decides two things:

Where that data is persisted (the control-plane backend).
Optionally, which seed workspaces to load into the memory backend at startup.

Resolution order

The runtime looks for the config file in this order and takes the first match:

--config <file> CLI flag.
WORKBENCH_CONFIG environment variable.
./workbench.yaml in the process working directory.
./examples/workbench.yaml — the sample config this runtime ships with. Lets npm run dev work out-of-the-box when run from the runtime directory.
/etc/workbench/workbench.yaml (the Docker image default).

Sources are never merged: config is a single declarative document. Paths given via --config or WORKBENCH_CONFIG are used verbatim; if the target doesn't exist, startup fails loudly rather than silently falling through to the next step.
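The lookup order above can be sketched as a small pure function. This is a minimal illustration, not the runtime's actual code: resolveConfigPath, ResolveInput, and the injected exists callback are hypothetical names, and throwing when no default location matches is an assumption.

```typescript
// Hypothetical sketch: only the lookup order itself comes from the text above.
interface ResolveInput {
  cliConfig?: string;                       // value of --config, if passed
  env: Record<string, string | undefined>;  // e.g. process.env
  exists: (path: string) => boolean;        // e.g. fs.existsSync
}

const FALLBACK_PATHS = [
  "./workbench.yaml",
  "./examples/workbench.yaml",
  "/etc/workbench/workbench.yaml",
];

function resolveConfigPath({ cliConfig, env, exists }: ResolveInput): string {
  // Explicit sources are used verbatim and never fall through.
  const explicit = cliConfig ?? env["WORKBENCH_CONFIG"];
  if (explicit !== undefined) {
    if (!exists(explicit)) {
      throw new Error(`config file not found: ${explicit}`);
    }
    return explicit;
  }
  // Implicit sources: the first existing path wins.
  for (const candidate of FALLBACK_PATHS) {
    if (exists(candidate)) return candidate;
  }
  // Assumption: the text does not say what happens when nothing matches.
  throw new Error("no workbench.yaml found in any default location");
}
```

Injecting exists keeps the sketch testable without touching the real filesystem.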

Environment variable interpolation

Any string value may reference an environment variable with ${VAR} or ${VAR:-default} syntax. Interpolation happens before schema validation.

yaml

controlPlane:
  driver: astra
  endpoint: ${ASTRA_DB_API_ENDPOINT}
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN

References to unset variables without a default fail loudly at startup.

Note: tokenRef above is a SecretRef string, not an interpolation. Secret refs are resolved at use time by the runtime's SecretResolver, which is separate from YAML interpolation. See § Secrets below.
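As a rough illustration of the ${VAR} and ${VAR:-default} rules, the sketch below interpolates a single string. The function name interpolate is hypothetical, and treating a variable that is set but empty as "set" is an assumption the text does not settle.

```typescript
// Hypothetical sketch of ${VAR} / ${VAR:-default} interpolation on one string.
function interpolate(
  value: string,
  env: Record<string, string | undefined>,
): string {
  // Group 1 is the variable name, optional group 2 is the default (may be empty).
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g,
    (_match: string, name: string, fallback: string | undefined) => {
      const resolved = env[name];
      if (resolved !== undefined) return resolved;
      if (fallback !== undefined) return fallback;
      // Unset variable with no default: fail loudly.
      throw new Error(`environment variable ${name} is not set and has no default`);
    },
  );
}
```

A SecretRef such as env:ASTRA_DB_APPLICATION_TOKEN contains no ${...} pattern, so it passes through this step unchanged.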

Top-level schema

yaml

version: 1                          # required
runtime: { port, logLevel, ... }    # optional, with defaults
controlPlane: { driver, ... }       # optional, default: astra if its env vars are set, else file
seedWorkspaces: [ ... ]             # optional, memory-only

`version`

Schema version. Currently 1. The runtime refuses to start on an unknown version.

`runtime`

Field	Type	Default	Notes
`environment`	enum	`development`	`development \| production`. Production mode enforces durable persistence, enabled auth, rejected anonymous traffic, HTTPS `publicOrigin`, and a persistent OIDC session secret when browser login is configured.
`port`	int	`8080`	HTTP listen port
`logLevel`	enum	`info`	`trace \| debug \| info \| warn \| error`. The `LOG_LEVEL` env var overrides this when set.
`requestIdHeader`	string	`X-Request-Id`	Name of the request-ID header
`uiDir`	string \| null	`null`	Directory of pre-built UI assets to serve from `/` (with SPA fallback). `null` auto-detects `/app/public` → `${cwd}/public` → `${cwd}/apps/web/dist`. The `UI_DIR` env var also works as an override. The official Docker image sets this up automatically.
`replicaId`	string \| null	`null`	Identifier this replica writes into job leases (used by the cross-replica orphan sweeper to tell whose lease is whose). `null` auto-generates `${HOSTNAME or "wb"}-<short-uuid>` at boot — fine for single-replica deployments and tests; set explicitly for clustered runs if you want the lease holder to be deterministic.
`publicOrigin`	URL \| null	`null`	Externally visible browser origin, e.g. `https://workbench.example.com`. Used for OIDC redirect URI construction and secure-cookie decisions. Required for production OIDC browser login.
`trustProxyHeaders`	boolean	`false`	Trust `X-Forwarded-Proto` / `X-Forwarded-Host` when `publicOrigin` is not set. Also extends to the rate limiter (`X-Forwarded-For` / `X-Real-IP`). Enable only behind a trusted proxy that overwrites those headers.
`csrfOriginCheck`	boolean	`true`	CSRF Origin/Referer check on cookie-protected routes (`/api/v1/workspaces/*` state-changing methods, plus `/auth/refresh` and `/auth/logout`). Bearer-token requests bypass the check. Disable only for non-browser clients that authenticate with cookies but cannot send `Origin` — prefer Bearer auth instead.
`rateLimit`	object	(defaults below)	In-process per-IP rate limiter. See § Rate limiting.
`blockPrivateNetworkEndpoints`	boolean	`false`	Layered SSRF defense: when `true`, operator-supplied `endpointBaseUrl` values on chunking / embedding / reranking / LLM services are rejected if they resolve to RFC1918 (`10/8`, `172.16/12`, `192.168/16`), loopback, or IPv6 unique-local hosts. Auto-flipped to `true` when `runtime.environment: production`. Default `false` so the local-Ollama / local-vLLM dev workflow keeps working; production deployments should still pair this with VPC-level egress controls.
`maxConcurrentIngestJobs`	int (≥1)	`4`	Per-replica cap on in-flight ingest workers. Beyond the cap, queued jobs wait in-process for a slot rather than slamming the embedding provider's quota. Persisted job state is unaffected; raise for dedicated provisioned-throughput deployments. Surfaced as `workbench_ingest_workers_{active,queued}` on `/metrics`.
`tracing`	object	(off)	OpenTelemetry tracing knobs. See § Tracing.
`telemetry`	object	(off)	Opt-in anonymous usage telemetry. See § Telemetry.

Production deployments should start from runtimes/typescript/examples/workbench.production.yaml.
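To make the address ranges behind blockPrivateNetworkEndpoints concrete, here is a minimal sketch that classifies an IP literal. It is an illustration under stated assumptions: isPrivateAddress is a hypothetical name, and it only inspects literals, whereas the option as described rejects endpoints by what their host resolves to.

```typescript
// Hypothetical sketch: true when an IP literal is in a range the option blocks.
function isPrivateAddress(ip: string): boolean {
  const v4 = ip.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    if (a === 10) return true;                        // 10/8
    if (a === 172 && b >= 16 && b <= 31) return true; // 172.16/12
    if (a === 192 && b === 168) return true;          // 192.168/16
    if (a === 127) return true;                       // IPv4 loopback
    return false;
  }
  const lower = ip.toLowerCase();
  if (lower === "::1") return true;                   // IPv6 loopback
  // IPv6 unique-local is fc00::/7, so the first byte is 0xfc or 0xfd.
  return /^f[cd][0-9a-f]{2}:/.test(lower);
}
```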

Rate limiting

Defense-in-depth limiter applied to /api/v1/* (capacity from config) and /auth/* (a tighter fixed cap of 30 req/window — login flows shouldn't burst). Per-IP, per-replica fixed window. Distributed deployments should still front the runtime with an upstream WAF / API gateway for accurate aggregate ceilings; this layer protects against runaway clients and naive scanners.

yaml

runtime:
  rateLimit:
    enabled: true        # default
    capacity: 600        # max requests per window per IP for /api/v1/*
    windowMs: 60000      # window length, ms

Field	Type	Default	Notes
`enabled`	bool	`true`	Set `false` to skip the limiter entirely.
`capacity`	int (1–1_000_000)	`600`	Per-IP requests per window for `/api/v1/*`. The auth surface uses a fixed `30`.
`windowMs`	int (1000–3_600_000)	`60000`	Window length in milliseconds.

Rejected requests get 429 Too Many Requests with the canonical error envelope and a Retry-After header (seconds). X-RateLimit-{Limit,Remaining,Reset} headers are set on every response, rejected or not. Client IP is derived from the socket; set runtime.trustProxyHeaders: true to honor X-Forwarded-For / X-Real-IP instead.
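A per-IP fixed window of this kind can be sketched in a few lines. The class below is a minimal illustration, not the runtime's limiter: FixedWindowLimiter, Decision, and the field names are hypothetical, and the mapping from the result to the response headers is assumed.

```typescript
// Hypothetical sketch of a per-IP fixed-window limiter.
interface Decision {
  allowed: boolean;
  limit: number;        // would feed X-RateLimit-Limit
  remaining: number;    // would feed X-RateLimit-Remaining
  resetSeconds: number; // would feed X-RateLimit-Reset / Retry-After
}

class FixedWindowLimiter {
  private readonly capacity: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(capacity = 600, windowMs = 60000) {
    this.capacity = capacity;
    this.windowMs = windowMs;
  }

  check(ip: string, now: number): Decision {
    let w = this.windows.get(ip);
    // Start a fresh window on first sight or once the old one has elapsed.
    if (w === undefined || now - w.start >= this.windowMs) {
      w = { start: now, count: 0 };
      this.windows.set(ip, w);
    }
    const allowed = w.count < this.capacity;
    if (allowed) w.count += 1;
    return {
      allowed,
      limit: this.capacity,
      remaining: this.capacity - w.count,
      resetSeconds: Math.ceil((w.start + this.windowMs - now) / 1000),
    };
  }
}
```

Because the map lives in process memory, each replica counts independently, which is why the text recommends an upstream gateway for aggregate ceilings.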

Tracing

OpenTelemetry tracing knobs. Off by default — flipping enabled: true starts a NodeSDK with the OTLP HTTP trace exporter and the standard auto-instrumentations bundle. When disabled, the runtime still creates manual server spans through @opentelemetry/api so flipping tracing on later does not require code changes — the spans are just no-ops without a registered SDK.

yaml

runtime:
  tracing:
    enabled: false
    serviceName: null         # null → "ai-workbench-runtime"
    exporterUrl: null         # null → OTEL_EXPORTER_OTLP_ENDPOINT / SDK default

Field	Type	Default	Notes
`enabled`	bool	`false`	Start the NodeSDK + auto-instrumentations bundle.
`serviceName`	string \| null	`null`	Override the `service.name` resource attribute. `null` keeps the default `ai-workbench-runtime`.
`exporterUrl`	URL \| null	`null`	OTLP HTTP traces endpoint, e.g. `https://otel-collector.example.com/v1/traces`. `null` falls back to `OTEL_EXPORTER_OTLP_ENDPOINT` and the SDK default.

For full HTTP / fetch / pino auto-instrumentation, preload the SDK at process launch (node --import ./dist/lib/tracing-preload.js dist/root.js). Without --import, manual server spans cover every request but outbound HTTP / fetch / DB clients won't emit child spans. See production.md for the deploy-side walkthrough.

Telemetry

Opt-in anonymous usage telemetry. Off by default. When enabled without a sink, the runtime constructs each event and logs "telemetry: dark mode (no sink configured)" instead of sending it, which is useful for verifying the wiring before standing up a collector. Network failures never block the runtime: each emit is fire-and-forget with a 2 s timeout.

yaml

runtime:
  telemetry:
    enabled: false
    url: null            # e.g. https://telemetry.example.com/v1/events

Field	Type	Default	Notes
`enabled`	bool	`false`	Set `true` to construct + emit events. `WORKBENCH_TELEMETRY=1` is an env override; `WORKBENCH_TELEMETRY=0` disables even if YAML says `true`.
`url`	URL \| null	`null`	Sink for `POST`ed events. `WORKBENCH_TELEMETRY_URL` env override. `null` + `enabled: true` is dark mode (events constructed, never sent).

Every event carries an anonymous install id persisted at $WORKBENCH_DATA_DIR/.install-id. Three event types are emitted: runtime_start, error (code + status, no message bodies), and command_run from the CLI wrapper. The canonical event catalog and no-PII guarantee live in telemetry.md.
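The interaction between the YAML block and the two environment overrides can be sketched as follows. This is an illustration only: resolveTelemetry and TelemetryMode are hypothetical names, and falling back to the YAML value for any WORKBENCH_TELEMETRY value other than 0 or 1 is an assumption.

```typescript
// Hypothetical sketch of how the telemetry settings combine.
type TelemetryMode = "off" | "dark" | "send";

function resolveTelemetry(
  yaml: { enabled: boolean; url: string | null },
  env: Record<string, string | undefined>,
): { mode: TelemetryMode; url: string | null } {
  const flag = env["WORKBENCH_TELEMETRY"];
  // "1" forces on, "0" forces off even if YAML says true.
  const enabled = flag === "1" ? true : flag === "0" ? false : yaml.enabled;
  const url = env["WORKBENCH_TELEMETRY_URL"] ?? yaml.url;
  if (!enabled) return { mode: "off", url: null };
  // Enabled without a sink is dark mode: events are built but never sent.
  return url === null ? { mode: "dark", url: null } : { mode: "send", url };
}
```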

`controlPlane`

Picks where workspaces, knowledge bases, execution services, and RAG documents are persisted. Discriminated on driver.

When controlPlane: is omitted entirely, the runtime infers a default: if both ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN are populated (the astra-cli auto-detection on boot fills these for any developer with a working profile), the runtime selects the astra driver against ASTRA_DB_KEYSPACE (or default_keyspace). Otherwise it falls back to a file backend rooted at ./.workbench-data. Set controlPlane.driver: memory explicitly if you want pure in-process state without the on-disk fallback.
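The inference rule above can be summarized in a short sketch. The function name inferControlPlane and the ControlPlane type are hypothetical, and representing the token as an env: SecretRef in the inferred config is an assumption made for illustration.

```typescript
// Hypothetical sketch of the default chosen when controlPlane: is omitted.
type ControlPlane =
  | { driver: "astra"; endpoint: string; tokenRef: string; keyspace: string }
  | { driver: "file"; root: string };

function inferControlPlane(env: Record<string, string | undefined>): ControlPlane {
  const endpoint = env["ASTRA_DB_API_ENDPOINT"];
  const token = env["ASTRA_DB_APPLICATION_TOKEN"];
  // Both variables must be populated for the astra driver to be selected.
  if (endpoint && token) {
    return {
      driver: "astra",
      endpoint,
      tokenRef: "env:ASTRA_DB_APPLICATION_TOKEN", // assumption: kept as a SecretRef
      keyspace: env["ASTRA_DB_KEYSPACE"] || "default_keyspace",
    };
  }
  // Otherwise: file backend rooted at ./.workbench-data.
  return { driver: "file", root: "./.workbench-data" };
}
```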

`memory`

yaml

controlPlane:
  driver: memory

In-process Maps. State is lost when the runtime exits. Best for CI, tests, and ephemeral demos. Note that omitting controlPlane entirely no longer falls through to memory — the runtime's default prefers Astra (when env vars are present) or a file backend. Set driver: memory explicitly to opt in.

`file`

yaml

controlPlane:
  driver: file
  root: /var/lib/workbench

JSON-on-disk. One file per table, per-table mutex, atomic rename on writes. Single-node self-hosted. Not safe for multiple writers — if you run two containers pointing at the same directory, they'll clobber each other.

Field	Type	Required	Notes
`root`	string	yes	Directory that will hold `workspaces.json` et al. Created if absent.

`sqlite`

yaml

controlPlane:
  driver: sqlite
  path: /var/lib/workbench/workbench.db

SQLite via better-sqlite3, in WAL mode. Same durability and single-node posture as file, but row-level writes instead of a whole-file rewrite per mutation — built for chat-heavy deployments where the file backend goes quadratic as conversations grow. Not safe for multiple writers; run one replica per database file. WAL sidecars (-wal, -shm) live beside the database file. A path of :memory: selects an ephemeral in-process database (tests and throwaway demos).

Field	Type	Required	Notes
`path`	string	yes	Database file path. Parent directory must exist. `:memory:` selects an ephemeral in-process database.
`jobsResume`	object	off	Cross-replica orphan-sweeper config (see below). Single-node SQLite rarely needs it.
`jobPollIntervalMs`	int (50–60000)	—	Accepted for symmetry with `astra`; unused by the in-process SQLite job store, which fans out updates with no poller.

`astra`

yaml

controlPlane:
  driver: astra
  endpoint: https://<db-id>-<region>.apps.astra.datastax.com
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  keyspace: default_keyspace

Astra Data API Tables via @datastax/astra-db-ts. Production-grade, multi-writer-safe.

Tip — astra-cli auto-config. If you have the astra CLI installed and a profile configured, you can leave ASTRA_DB_APPLICATION_TOKEN and ASTRA_DB_API_ENDPOINT unset locally — the runtime will pick them up from the CLI at startup. Production deployments inject them from a secret manager and the CLI integration is automatically inert. See astra-cli.md.

Field	Type	Required	Notes
`endpoint`	URL	yes	Astra Data API endpoint
`tokenRef`	SecretRef	yes	Pointer to the application token (`env:…` / `file:…`)
`keyspace`	string	no (default `default_keyspace`)	Keyspace hosting the `wb_*` control-plane tables. Defaults to `default_keyspace` — the keyspace Astra DB auto-creates on every new database — so out-of-the-box deployments boot without pre-creating one.
`jobPollIntervalMs`	int (50–60000)	`500`	Cross-replica job-subscriber poll interval in ms. Each subscribed `(workspace, jobId)` pair is re-read at this cadence so SSE clients on a different replica from the worker still see updates. Same-replica updates fan out instantly; the poller is a no-op when no one is subscribed. Raise for cost-sensitive deployments where second-scale staleness is fine; lower for hot SSE paths. Astra-only — `memory` and `file` are single-replica by definition.
`jobsResume`	object	off	Cross-replica orphan-sweeper config. See below.
`reconcileOrphansOnStart`	bool	`false`	Run a one-shot orphaned-dependent reconciliation at boot. See below. Astra-only.

The runtime creates the wb_* tables at startup if they don't exist (using createTable(..., { ifNotExists: true })). The keyspace itself must already exist.

`controlPlane.jobsResume` (file / sqlite / astra)

Off by default — only useful for clustered deployments where one replica can crash mid-ingest while another stays up. Single-replica operators don't need it (their pipelines always fail-fast on the same process). When enabled, every replica scans the durable job store on an interval for running jobs whose lease is older than the grace window and CAS-claims them. Jobs with a persisted ingest snapshot replay the pipeline idempotently; older rows without a snapshot still become terminal failed records so SSE clients do not hang forever. See cross-replica-jobs.md.

jobsResume.enabled: true with controlPlane.driver: memory is rejected at validation time — there is no shared store for sibling replicas to scan when the leases live in another replica's process memory.

yaml

controlPlane:
  driver: astra
  endpoint: https://...
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  jobsResume:
    enabled: true
    graceMs: 60000     # how stale a lease must be before reclaim
    intervalMs: 60000  # how often each replica scans

Field	Type	Default	Notes
`enabled`	bool	`false`	Set to `true` to start the sweeper. Off by default; clustered deployments opt in.
`graceMs`	int (1000–600000)	`60000`	Maximum age of a lease (relative to last heartbeat) before the job is considered orphaned.
`intervalMs`	int (1000–600000)	`60000`	How often each replica scans for stale leases.

Heartbeats are stamped on every progress update (processed ticking, status flipping), so any active worker keeps its lease fresh. Each replica writes its own replicaId (see runtime.replicaId) into leasedBy so the sweeper can tell which claim belongs to whom.

`controlPlane.reconcileOrphansOnStart` (astra)

Off by default. When true, the runtime runs a single reconcileOrphans() pass during startup: it scans the wb_* dependent tables for rows whose owning workspace row no longer exists (the signature of a partial cross-partition cascade failure) and re-runs the idempotent dependent-delete cascade for each such workspace id.

You shouldn't normally need it. deleteWorkspace is children-first, parent-last: it removes every dependent partition before the workspace row and, on any partial failure (Astra has no cross-partition transaction), leaves the workspace row intact and returns 500 cascade_incomplete so the delete can simply be retried. Because the cascade is idempotent, the retry completes it, so new orphans don't occur. This flag exists to mop up orphans left by older deployments (which deleted the workspace row first) or by out-of-band row deletions. Enable it, restart once, then turn it back off. It is operator-gated by workbench.yaml ownership and never blocks startup: a reconcile failure is logged and boot continues.

`seedWorkspaces` (memory only)

Optional list of workspace records loaded into the memory backend at startup. Lets developers skip the POST /api/v1/workspaces dance when running locally.

yaml

seedWorkspaces:
  - name: demo
    kind: mock
  - name: prod-astra
    kind: astra
    url: env:ASTRA_DB_API_ENDPOINT
    credentials:
      token: env:ASTRA_DB_APPLICATION_TOKEN
    keyspace: workbench

Field	Type	Required	Notes
`name`	string	yes	Workspace name
`kind`	enum	yes	`astra \| hcd \| openrag \| mock`
`uid`	UUID	no (auto-generated)	Only useful if other seeds reference it
`url`	URL or SecretRef	no	Workspace-specific data-plane URL
`credentials`	map<string, SecretRef>	no	Per-key secret pointers
`keyspace`	string	no	Workspace-specific keyspace

Using seedWorkspaces with any driver other than memory is a validation error — workspaces already persist in the backend, so there's nothing to seed.

Embedding services and vectorize-on-ingest

Embedding services are workspace-scoped runtime data — created via POST /api/v1/workspaces/{w}/embedding-services, not in workbench.yaml. The yaml only seeds workspaces; everything past that flows through the API. Embedding services control how the runtime turns text into vectors during ingest and search.

The runtime supports two execution paths:

Client-side embedding (the default). The runtime resolves endpointBaseUrl + endpointPath + credentialRef, calls the provider's HTTP API at ingest / search time, and writes the resulting vectors to Astra. Works with any embedding provider (OpenAI, Cohere, NVIDIA NIM, self-hosted, etc.) — the runtime carries the bytes.
Vectorize-on-ingest (server-side, Astra-only). When the embedding service has provider: "astra" and the bound vector collection was created with a matching Astra service block (e.g. nvidia:nvidia/nv-embedqa-e5-v5), the runtime delegates embedding to Astra's $vectorize column type. Documents are upserted with raw text; Astra runs the embedding model in its own infrastructure and stores both the text and the vector. Search likewise sends the query as text and Astra embeds it server-side.

The vectorize path is faster on hot paths (no extra round-trip to a provider), simpler to operate (no provider credential lives on the runtime), and avoids the dimension-mismatch class of errors. Two constraints to know:

The embedding service and the collection must agree. Creating a knowledge base with provider: "astra" materializes the Astra collection with the service block baked in. Changing the embedding service binding on an existing KB is rejected if the dimension or provider doesn't match what the collection was provisioned with.
credentialRef is irrelevant on the runtime side. Astra itself holds whatever provider credentials the vectorize service needs; the runtime never sees them. Setting credentialRef on a provider: "astra" embedding service is a no-op.

Mixed-batch upserts (some records carry a precomputed vector, others carry text) always fall back to client-side embedding so the entire batch lands in one transactional call. See api-spec.md §POST /{knowledgeBaseId}/records for the exact dispatch rules.

`chat` (optional)

Wires up the runtime-wide default chat-completion executor used by agents that have no llmServiceId of their own. When unset and the agent also has no LLM service bound, agent send + streaming return 503 chat_disabled. See agents.md for the agent surface and the per-agent LLM-service binding.

yaml

chat:
  provider: openrouter
  tokenRef: env:OPENROUTER_API_KEY
  baseUrl: null
  model: openai/gpt-4o-mini
  maxOutputTokens: 1024
  retrievalK: 6
  systemPrompt: null
  requestTimeoutMs: null

Field	Type	Default	Notes
`provider`	enum	`openrouter`	One of `openrouter` (hosted default), `openai` (direct/BYOK), or `ollama` (local/offline). All three speak the OpenAI-compatible wire protocol and share one adapter.
`tokenRef`	SecretRef	`env:OPENROUTER_API_KEY`	Resolved once at boot. `env:VAR` or `file:/path`. Not required for `ollama`, which needs no credential.
`baseUrl`	string \| null	`null`	API base URL. `null` derives the provider default (`https://openrouter.ai/api/v1`, `https://api.openai.com/v1`, or `http://localhost:11434/v1`).
`model`	string	`openai/gpt-4o-mini`	The provider's model id. For OpenRouter this is a catalog slug such as `openai/gpt-4o-mini`, `anthropic/claude-3.5-sonnet`, or `meta-llama/llama-3.3-70b-instruct` — one key reaches 300+ models.
`allowDataCollection`	bool	`false`	OpenRouter only. Defaults to ZDR-only routing (`provider.data_collection: "deny"`), so prompts route exclusively to zero-data-retention upstreams. Set `true` to relax this and permit upstreams that may retain prompts. No effect for `openai` / `ollama`.
`maxOutputTokens`	int (1–8192)	`1024`	Per-turn cap on the assistant's reply length.
`retrievalK`	int (1–64)	`6`	Top-K KB chunks per knowledge base. The total injected into the prompt is `retrievalK * ceil(sqrt(numKbs))` so multi-KB conversations don't blow up the prompt.
`systemPrompt`	string \| null	`null`	Default system prompt when neither the agent nor the agent's LLM service supplies one. `null` falls back to the runtime's persona-agnostic `DEFAULT_AGENT_SYSTEM_PROMPT`.
`requestTimeoutMs`	int (1–600000) \| null	`null`	Hard per-request wall-clock for a single non-streaming completion. A hung or pathologically slow provider aborts at this bound and surfaces as an error turn instead of holding the request open. `null` defers to the transport's socket timeout. Streaming replies are not bounded by this (they already honor client aborts).
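The retrievalK formula from the table is easy to check by hand. The helper below is a hypothetical one-liner that evaluates retrievalK * ceil(sqrt(numKbs)); only the formula itself comes from the table.

```typescript
// Hypothetical helper: total chunks injected into the prompt across all KBs.
function totalInjectedChunks(retrievalK: number, numKbs: number): number {
  return retrievalK * Math.ceil(Math.sqrt(numKbs));
}
```

With the default retrievalK of 6, one KB yields 6 chunks, four KBs yield 12, and nine KBs yield 18, compared with 54 if every KB contributed its full top-K.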

Per-agent override. When an agent has llmServiceId set, the agent's bound LLM service overrides this block: the runtime instantiates a chat service from the LLM-service record instead of using the global block. The agent's own systemPrompt likewise wins over chat.systemPrompt when present. Three providers are wired end-to-end today: provider: "openrouter", provider: "openai", and provider: "ollama", all OpenAI-compatible and served by the same adapter. Other providers can be stored but return 422 llm_provider_unsupported until their adapters land.

Document extraction (optional)

The runtime exposes a multipart ingest route at POST /api/v1/workspaces/{w}/knowledge-bases/{kb}/ingest/file that accepts PDF, DOCX, XLSX, and text uploads. Extraction is dispatched based on the upload's MIME type / extension; configuration is via environment variables, not workbench.yaml.

Variable	Default	Notes
`DOCLING_URL`	unset	Base URL of a docling-serve instance. When set, the dispatcher prefers docling over the native pipeline for non-text files (PDF, DOCX, XLSX) and falls back to native if docling is unreachable. The route also accepts an explicit per-upload `parser=native\|docling\|auto` form field.
`DOCLING_TIMEOUT_MS`	`60000`	Per-request budget for docling-serve calls. Scanned/OCR'd PDFs can run long; raise this if you see `docling_unavailable` with `timed out after …` messages.

When DOCLING_URL is unset (the default), the runtime uses pdfjs-dist for PDFs, mammoth for DOCX, and exceljs for XLSX (rendered as one markdown table per worksheet). Native extraction is fast and zero-ops but flattens layout-specific structure and skips OCR; docling preserves layout and does OCR for scanned documents.

`mcp` (optional)

Toggles the Model Context Protocol façade at /api/v1/workspaces/{w}/mcp. On by default — it sits behind the same auth middleware as the REST API, so it does not widen the security boundary. Set enabled: false to take the route down (it will then return 404). See mcp.md for the full feature walkthrough.

yaml

mcp:
  enabled: true
  exposeChat: false

Field	Type	Default	Notes
`enabled`	bool	`true`	When false, MCP is not exposed at all.
`exposeChat`	bool	`false`	Adds the `chat_send` MCP tool. Requires the `chat:` block; silently skipped when chat is unset.

`auth`

Configures the /api/v1/* auth middleware. See auth.md for the full contract and rollout plan.

yaml

auth:
  mode: disabled          # disabled | apiKey | oidc | any
  anonymousPolicy: allow  # allow | reject
  # oidc: …               # required when mode is `oidc` or `any`

Field	Type	Default	Notes
`mode`	enum	`disabled`	Which verifiers are active.
`anonymousPolicy`	enum	`allow`	`allow` lets tokenless requests through as anonymous; `reject` returns `401 unauthorized`.
`bootstrapTokenRef`	SecretRef \| null	`null`	Optional 32+ character break-glass bearer token. Accepted as an unscoped operator subject when `mode` is `apiKey`, `oidc`, or `any`; invalid with `mode: disabled`.
`acknowledgeOpenAccess`	boolean	`true`	Controls how the deployment guard reacts when a durable control plane (`file` / `astra`) is paired with open auth (`mode: disabled` or `anonymousPolicy: allow`). Default `true` keeps that pairing as a loud startup warning so the dev loop (file CP + open auth) keeps booting. Flip to `false` in CI / shared environments to convert the warning into a hard fatal. Production deployments should set `runtime.environment: production` instead — that forces `apiKey`/`oidc`/`any` + `anonymousPolicy: reject` at the schema layer regardless of this flag.
`oidc`	object	—	Required when `mode` is `oidc` or `any`. See table below.

The default (disabled + allow) matches pre-auth behavior: the middleware runs, tags every request anonymous, and lets it through. Set anonymousPolicy: reject in CI to assert the middleware is mounted.
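The way acknowledgeOpenAccess and runtime.environment interact with open auth can be sketched as a small decision function. This is an illustration under assumptions: deploymentGuard and its types are hypothetical, only file and astra are treated as durable because those are the drivers the table names, and the production branch models just the auth and persistence requirements.

```typescript
// Hypothetical sketch of the deployment guard's outcome for a given config.
interface GuardInput {
  environment: "development" | "production";
  driver: "memory" | "file" | "sqlite" | "astra";
  mode: "disabled" | "apiKey" | "oidc" | "any";
  anonymousPolicy: "allow" | "reject";
  acknowledgeOpenAccess: boolean;
}

type GuardResult = "ok" | "warn" | "fatal";

function deploymentGuard(c: GuardInput): GuardResult {
  const openAuth = c.mode === "disabled" || c.anonymousPolicy === "allow";
  if (c.environment === "production") {
    // Production rejects open auth and non-durable persistence outright.
    return openAuth || c.driver === "memory" ? "fatal" : "ok";
  }
  // Assumption: the guard's durable set is exactly the drivers the table lists.
  const durable = c.driver === "file" || c.driver === "astra";
  if (durable && openAuth) return c.acknowledgeOpenAccess ? "warn" : "fatal";
  return "ok";
}
```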

`auth.oidc`

Field	Type	Default	Notes
`issuer`	url	required	Must equal the JWT `iss` claim exactly. Discovery URL is derived from this.
`audience`	string \| string[]	required	At least one value must match the JWT `aud` claim.
`jwksUri`	url \| null	`null`	When null, the runtime fetches `${issuer}/.well-known/openid-configuration` at startup and uses `jwks_uri` from the response.
`clockToleranceSeconds`	int	`30`	Skew allowance for `exp` / `nbf`.
`claims.subject`	string	`sub`	JWT claim → `AuthSubject.id`.
`claims.label`	string	`email`	JWT claim → `AuthSubject.label` (nullable).
`claims.workspaceScopes`	string	`wb_workspace_scopes`	Array-valued claim → `AuthSubject.workspaceScopes`. A JSON `null` marks the subject unscoped (admin).
`client`	object	—	Optional. When present, the runtime hosts `/auth/{login,callback,me,logout}` so the bundled web UI can drive a browser PKCE login. See table below.

`auth.oidc.client`

Field	Type	Default	Notes
`clientId`	string	required	OAuth client identifier registered at the IdP.
`clientSecretRef`	SecretRef \| null	`null`	Client secret. Omit for public (SPA-style) clients.
`redirectPath`	string	`/auth/callback`	Path the IdP redirects to after authorization. Must be in the IdP's allow-list.
`postLogoutPath`	string	`/`	Where `/auth/logout` sends the user.
`scopes`	string[]	`[openid, profile, email]`	OAuth scopes requested at login.
`sessionCookieName`	string	`wb_session`	Cookie that carries the encrypted session.
`sessionSecretRef`	SecretRef \| null	`null`	Key material for encrypting session cookies. Must resolve to ≥32 bytes. When null, runtime auto-generates an ephemeral key at boot (dev only).

Secrets

Secrets reach the runtime through two disjoint paths:

YAML interpolation (`${VAR}`)

Applies before schema validation. Good for non-secret runtime settings like endpoints, and for pulling secrets that need to be literal strings in the config document.

Secret refs (`env:` / `file:`)

The preferred path for anything credential-shaped. A SecretRef is a string like env:ASTRA_DB_APPLICATION_TOKEN or file:/etc/workbench/secrets/astra-token. The runtime resolves it when it actually needs the secret (at control-plane init, for example), so the value never lives in memory longer than necessary and never crosses process logs.

Providers available today:

Provider	Ref shape	Behavior
`env`	`env:VAR_NAME`	Reads `process.env.VAR_NAME`. Errors if unset or empty.
`file`	`file:/abs/path`	Reads the file and trims trailing whitespace.
`astra-cli`	`astra-cli:<profile>:<dbId>:<token\|endpoint>`	Sources the token / Data API endpoint from a specific `astra` CLI profile + database. Lets different workspaces target different Astra databases without restarting. Cached for the process lifetime; errors are not cached.

Future providers (Vault, AWS SM, etc.) plug into the same SecretProvider interface. See runtimes/typescript/src/secrets/provider.ts.
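For the env and file providers, resolution can be sketched as below. This is not the SecretProvider interface from the source tree: resolveSecretRef and SecretSources are hypothetical names, the file reader is injected so the sketch stays self-contained, and the astra-cli provider is left out because it depends on the external CLI.

```typescript
// Hypothetical sketch of resolving env: and file: SecretRefs.
interface SecretSources {
  env: Record<string, string | undefined>; // e.g. process.env
  readFile: (path: string) => string;      // e.g. (p) => fs.readFileSync(p, "utf8")
}

function resolveSecretRef(ref: string, sources: SecretSources): string {
  const sep = ref.indexOf(":");
  if (sep <= 0) throw new Error(`not a SecretRef: ${ref}`);
  const provider = ref.slice(0, sep);
  const target = ref.slice(sep + 1);
  switch (provider) {
    case "env": {
      const value = sources.env[target];
      // Errors if unset or empty, as the provider table states.
      if (value === undefined || value === "") {
        throw new Error(`secret env var ${target} is unset or empty`);
      }
      return value;
    }
    case "file":
      // Trailing whitespace (typically a final newline) is trimmed.
      return sources.readFile(target).trimEnd();
    default:
      throw new Error(`unsupported secret provider: ${provider}`);
  }
}
```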

Validation rules

At startup the runtime enforces:

Every ${VAR} reference resolves or has a default.
controlPlane.driver is one of memory | file | sqlite | astra.
Driver-specific required fields are present (e.g. root for file, path for sqlite, endpoint + tokenRef for astra).
Every tokenRef / credentials value matches the <prefix>:<path> shape.
seedWorkspaces is only non-empty when controlPlane.driver == memory.
No duplicate names within seedWorkspaces.

Validation failures abort startup with a non-zero exit code and a human-readable error message.
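A few of these rules can be sketched as a function that collects error messages. This is a simplified illustration, not the runtime's schema: validateConfig, ConfigLike, and the SECRET_REF pattern (one reading of the <prefix>:<path> shape) are assumptions.

```typescript
// Hypothetical sketch covering the driver, SecretRef, and seedWorkspaces rules.
interface SeedWorkspace {
  name: string;
  credentials?: Record<string, string>;
}

interface ConfigLike {
  controlPlane: { driver: string; tokenRef?: string };
  seedWorkspaces?: SeedWorkspace[];
}

const DRIVERS = ["memory", "file", "sqlite", "astra"];
const SECRET_REF = /^[a-z][a-z0-9-]*:.+$/; // assumed reading of "<prefix>:<path>"

function validateConfig(config: ConfigLike): string[] {
  const errors: string[] = [];
  const { driver, tokenRef } = config.controlPlane;
  const seeds = config.seedWorkspaces ?? [];
  if (!DRIVERS.includes(driver)) errors.push(`unknown controlPlane.driver: ${driver}`);
  if (tokenRef !== undefined && !SECRET_REF.test(tokenRef)) {
    errors.push(`tokenRef is not a SecretRef: ${tokenRef}`);
  }
  if (seeds.length > 0 && driver !== "memory") {
    errors.push("seedWorkspaces requires controlPlane.driver: memory");
  }
  const seen = new Set<string>();
  for (const seed of seeds) {
    if (seen.has(seed.name)) errors.push(`duplicate seedWorkspaces name: ${seed.name}`);
    seen.add(seed.name);
    for (const [key, ref] of Object.entries(seed.credentials ?? {})) {
      if (!SECRET_REF.test(ref)) errors.push(`credentials.${key} is not a SecretRef`);
    }
  }
  return errors;
}
```

A caller would print the collected messages and exit non-zero when the list is not empty.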

Hot reload

Not supported. The current model is "restart the process to pick up changes." Since workspaces, knowledge bases, and execution services are runtime data rather than config, most day-to-day operations don't require a config change anyway.

Graceful shutdown

SIGINT and SIGTERM trigger a graceful-shutdown sequence:

/readyz starts returning 503 draining. Kubernetes-style readiness probes will stop routing traffic here.
server.close() stops accepting new connections. In-flight requests keep going.
When every connection finishes (or after 15 seconds, whichever comes first), the control-plane store closes and the process exits 0. A timeout exits 1 so the supervisor knows we didn't drain cleanly.
A second SIGINT / SIGTERM while the first is still draining short-circuits straight to exit — the operator can force-kill a stuck process without waiting for the timeout.

/healthz stays 200 throughout the drain (the process is still alive, just closed to new traffic). That's the split that k8s expects — livenessProbe hits /healthz, readinessProbe hits /readyz.
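The drain sequence can be modeled as a tiny state holder, separate from any HTTP server. The sketch below is illustrative: ShutdownController and its method names are hypothetical, and wiring it to process signals, server.close(), and the 15-second timer is left out.

```typescript
// Hypothetical sketch of the drain state behind /readyz, /healthz, and exit codes.
class ShutdownController {
  private draining = false;

  // First signal starts the drain; a second one short-circuits to exit.
  onSignal(): "start-drain" | "force-exit" {
    if (this.draining) return "force-exit";
    this.draining = true;
    return "start-drain";
  }

  // Readiness flips to 503 while draining so probes stop routing traffic here.
  readyzStatus(): number {
    return this.draining ? 503 : 200;
  }

  // Liveness stays 200: the process is alive, just closed to new traffic.
  healthzStatus(): number {
    return 200;
  }

  // Exit 0 on a clean drain, 1 when the timeout fired first.
  exitCode(drainedBeforeTimeout: boolean): number {
    return drainedBeforeTimeout ? 0 : 1;
  }
}
```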

`.env` file (dev convenience)

The runtime auto-loads a .env file at startup so local dev doesn't need you to export secrets by hand. Uses Node 21.7+'s built-in process.loadEnvFile — no dotenv dependency.

Location. Put it at the repo root. The runtime walks up from the process's current working directory looking for .env, stopping at the repo root (.git sentinel). That means the same file works whether you run npm run dev from the repo root or from runtimes/typescript/.

Precedence. Values already present in process.env win: .env never overwrites shell exports or container env vars. This matches the default behavior of common dotenv loaders.

Override the path. Set WORKBENCH_ENV_FILE=/abs/path/to/.env to skip the walk and load an explicit file. Useful for production container boots where the token lives on a mounted secret. As of 0.2.0 the override is absent-tolerant — a missing file is no longer fatal, so a fresh container can boot before the file exists and the first-run setup wizard can populate it.

Managed env file. The setup wizard writes its allow-listed keys (ASTRA_DB_API_ENDPOINT, ASTRA_DB_APPLICATION_TOKEN, OPENROUTER_API_KEY, OPENAI_API_KEY) to $WORKBENCH_DATA_DIR/.env with mode 0600. The bundled Docker compose sets both WORKBENCH_DATA_DIR and WORKBENCH_ENV_FILE to that path so the runtime auto-loads the file on the next boot after POST /setup/restart.

WORKBENCH_DATA_DIR. Base directory for runtime-managed state files: the setup wizard's .env, the anonymous telemetry install id (.install-id), and the cli/ subdirectory the bundled aiw binary uses when running inside the compose container. The compose file points this at the persistent volume; outside compose, defaults to os.tmpdir()/ai-workbench.

Template. .env.example at the repo root is a committed starting point — copy to .env and fill in the secrets you need. .env itself is gitignored.

Production. The runtime ships the same loader in production, but standard container practice (Docker -e / K8s Secrets → env vars) usually means no .env is present and the loader silently skips.
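The precedence rule for .env values (anything already in process.env wins) can be sketched as a merge over already-parsed key/value pairs. The function name applyEnvFile is hypothetical, and parsing the file itself is out of scope here.

```typescript
// Hypothetical sketch: merge parsed .env values without overwriting existing ones.
function applyEnvFile(
  fileValues: Record<string, string>,
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const merged: Record<string, string | undefined> = { ...env };
  for (const [key, value] of Object.entries(fileValues)) {
    // Only fill keys that are absent; shell exports and container env win.
    if (merged[key] === undefined) merged[key] = value;
  }
  return merged;
}
```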

Examples

All canonical examples live under runtimes/typescript/examples/:

workbench.yaml — the default dev config the Docker image ships with, with annotated comments covering all three backends, seedWorkspaces, and auth stanzas.
workbench.production.yaml — hardened production preset (astra backend, OIDC, security headers).
workbench.memory.yaml — CI / smoke-test preset (in-memory only, no persistence).
