
Configuration (workbench.yaml)

Runtime behavior is driven by a single YAML file, conventionally named workbench.yaml. The runtime loads it at startup and validates it against a strict schema.

Workspaces, knowledge bases, and execution services are not in config. They're runtime data, mutable via the HTTP API. For that data, workbench.yaml decides two things:

  1. Where that data is persisted (the control-plane backend).
  2. Optionally, which seed workspaces to load into the memory backend at startup.

Resolution order

The runtime looks for the config file in this order and takes the first match:

  1. --config <file> CLI flag.
  2. WORKBENCH_CONFIG environment variable.
  3. ./workbench.yaml in the process working directory.
  4. ./examples/workbench.yaml — the sample config this runtime ships with. Lets npm run dev work out-of-the-box when run from the runtime directory.
  5. /etc/workbench/workbench.yaml (the Docker image default).

No cross-source merging — config is a single declarative document. --config and WORKBENCH_CONFIG are returned verbatim; they fail loudly if the target doesn't exist rather than silently falling through to the next step.
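
The lookup order above can be sketched as a small pure function. This is an illustrative sketch, not the runtime's actual loader: `resolveConfigPath`, `ResolveInput`, and the injected `exists` predicate are hypothetical names, and the error thrown when no candidate exists is an assumption (the text above does not say what happens in that case).

```typescript
// Hypothetical sketch of the documented resolution order.
type ResolveInput = {
  cliConfig?: string;                       // value of --config, if given
  env: Record<string, string | undefined>;  // e.g. process.env
  exists: (path: string) => boolean;        // injected so the sketch stays pure
};

const FALLBACK_PATHS = [
  "./workbench.yaml",
  "./examples/workbench.yaml",
  "/etc/workbench/workbench.yaml",
];

function resolveConfigPath(input: ResolveInput): string {
  // Steps 1 and 2: explicit sources are returned verbatim and never fall through.
  const explicit = input.cliConfig ?? input.env["WORKBENCH_CONFIG"];
  if (explicit !== undefined) {
    if (!input.exists(explicit)) {
      throw new Error(`config file not found: ${explicit}`);
    }
    return explicit;
  }
  // Steps 3 to 5: the first existing fallback wins.
  for (const candidate of FALLBACK_PATHS) {
    if (input.exists(candidate)) return candidate;
  }
  // ASSUMPTION: behavior when nothing matches is not documented above.
  throw new Error("no workbench.yaml found");
}
```

Injecting `exists` keeps the function free of filesystem access, which makes the "fail loudly instead of falling through" rule easy to check in isolation.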

Environment variable interpolation

Any string value may reference an environment variable with ${VAR} or ${VAR:-default} syntax. Interpolation happens before schema validation.

yaml
controlPlane:
  driver: astra
  endpoint: ${ASTRA_DB_API_ENDPOINT}
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN

References to unset variables without a default fail loudly at startup.

Note: tokenRef above is a SecretRef string, not an interpolation. Secret refs are resolved at use time by the runtime's SecretResolver, which is separate from YAML interpolation. See § Secrets below.
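
To make the two interpolation forms concrete, here is a minimal sketch of a `${VAR}` / `${VAR:-default}` expander. The function name `interpolate` is hypothetical, and treating an empty variable the same as an unset one is an assumption borrowed from shell `:-` semantics; the runtime may differ.

```typescript
// Hypothetical sketch: expand ${VAR} and ${VAR:-default} in a string.
function interpolate(
  text: string,
  env: Record<string, string | undefined>,
): string {
  return text.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g,
    (_match: string, name: string, fallback: string | undefined) => {
      const value = env[name];
      // ASSUMPTION: an empty value counts as unset, as with shell ${VAR:-default}.
      if (value !== undefined && value !== "") return value;
      if (fallback !== undefined) return fallback;
      throw new Error(`environment variable ${name} is not set and has no default`);
    },
  );
}
```

A SecretRef such as `env:ASTRA_DB_APPLICATION_TOKEN` contains no `${...}` and passes through this step unchanged, which matches the note above.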

Top-level schema

yaml
version: 1                          # required
runtime: { port, logLevel, ... }    # optional, with defaults
controlPlane: { driver, ... }       # optional, default: astra or file (see § controlPlane)
seedWorkspaces: [ ... ]             # optional, memory-only

version

Schema version. Currently 1. The runtime refuses to start on an unknown version.

runtime

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| environment | enum | development | development \| production. Production mode enforces durable persistence, enabled auth, rejected anonymous traffic, HTTPS publicOrigin, and a persistent OIDC session secret when browser login is configured. |
| port | int | 8080 | HTTP listen port |
| logLevel | enum | info | trace \| debug \| info \| warn \| error. The LOG_LEVEL env var overrides this when set. |
| requestIdHeader | string | X-Request-Id | Name of the request-ID header |
| uiDir | string \| null | null | Directory of pre-built UI assets to serve from / (with SPA fallback). null auto-detects by trying /app/public, then ${cwd}/public, then ${cwd}/apps/web/dist. The UI_DIR env var also works as an override. The official Docker image sets this up automatically. |
| replicaId | string \| null | null | Identifier this replica writes into job leases (used by the cross-replica orphan sweeper to tell whose lease is whose). null auto-generates ${HOSTNAME or "wb"}-<short-uuid> at boot, which is fine for single-replica deployments and tests; set it explicitly for clustered runs if you want the lease holder to be deterministic. |
| publicOrigin | URL \| null | null | Externally visible browser origin, e.g. https://workbench.example.com. Used for OIDC redirect URI construction and secure-cookie decisions. Required for production OIDC browser login. |
| trustProxyHeaders | boolean | false | Trust X-Forwarded-Proto / X-Forwarded-Host when publicOrigin is not set. Also extends to the rate limiter (X-Forwarded-For / X-Real-IP). Enable only behind a trusted proxy that overwrites those headers. |
| csrfOriginCheck | boolean | true | CSRF Origin/Referer check on cookie-protected routes (/api/v1/workspaces/* state-changing methods, plus /auth/refresh and /auth/logout). Bearer-token requests bypass the check. Disable only for non-browser clients that authenticate with cookies but cannot send Origin; prefer Bearer auth instead. |
| rateLimit | object | (defaults below) | In-process per-IP rate limiter. See § Rate limiting. |
| blockPrivateNetworkEndpoints | boolean | false | Layered SSRF defense: when true, operator-supplied endpointBaseUrl values on chunking / embedding / reranking / LLM services are rejected if they resolve to RFC1918 (10/8, 172.16/12, 192.168/16), loopback, or IPv6 unique-local hosts. Auto-flipped to true when runtime.environment: production. Default false so the local-Ollama / local-vLLM dev workflow keeps working; production deployments should still pair this with VPC-level egress controls. |
| maxConcurrentIngestJobs | int (≥1) | 4 | Per-replica cap on in-flight ingest workers. Beyond the cap, queued jobs wait in-process for a slot rather than slamming the embedding provider's quota. Persisted job state is unaffected; raise for dedicated provisioned-throughput deployments. Surfaced as workbench_ingest_workers_{active,queued} on /metrics. |
| tracing | object | (off) | OpenTelemetry tracing knobs. See § Tracing. |
| telemetry | object | (off) | Opt-in anonymous usage telemetry. See § Telemetry. |

Production deployments should start from runtimes/typescript/examples/workbench.production.yaml.

Rate limiting

Defense-in-depth limiter applied to /api/v1/* (capacity from config) and /auth/* (a tighter fixed cap of 30 req/window — login flows shouldn't burst). Per-IP, per-replica fixed window. Distributed deployments should still front the runtime with an upstream WAF / API gateway for accurate aggregate ceilings; this layer protects against runaway clients and naive scanners.

yaml
runtime:
  rateLimit:
    enabled: true        # default
    capacity: 600        # max requests per window per IP for /api/v1/*
    windowMs: 60000      # window length, ms

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | bool | true | Set false to skip the limiter entirely. |
| capacity | int (1–1_000_000) | 600 | Per-IP requests per window for /api/v1/*. The auth surface uses a fixed 30. |
| windowMs | int (1000–3_600_000) | 60000 | Window length in milliseconds. |

Rejected requests get 429 Too Many Requests with the canonical error envelope and a Retry-After header (seconds). Every response, rejected or not, carries X-RateLimit-{Limit,Remaining,Reset} headers. Client IP is derived from the socket; set runtime.trustProxyHeaders: true to honor X-Forwarded-For / X-Real-IP instead.
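
For readers unfamiliar with the technique, a per-IP fixed-window limiter can be sketched in a few lines. This is a generic illustration under stated assumptions, not the runtime's implementation: the class name `FixedWindowLimiter`, the `Decision` shape, and the choice to start each window at the client's first request are all hypothetical.

```typescript
// Hypothetical sketch of a per-IP fixed-window rate limiter.
type Decision = {
  allowed: boolean;
  limit: number;         // would feed X-RateLimit-Limit
  remaining: number;     // would feed X-RateLimit-Remaining
  retryAfterSec: number; // would feed Retry-After on a 429
};

class FixedWindowLimiter {
  private readonly capacity: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(capacity: number, windowMs: number) {
    this.capacity = capacity;
    this.windowMs = windowMs;
  }

  check(ip: string, nowMs: number): Decision {
    let win = this.windows.get(ip);
    // ASSUMPTION: a window opens on the first request and is replaced once it expires.
    if (win === undefined || nowMs - win.start >= this.windowMs) {
      win = { start: nowMs, count: 0 };
      this.windows.set(ip, win);
    }
    if (win.count >= this.capacity) {
      const resetInMs = win.start + this.windowMs - nowMs;
      return { allowed: false, limit: this.capacity, remaining: 0, retryAfterSec: Math.ceil(resetInMs / 1000) };
    }
    win.count += 1;
    return { allowed: true, limit: this.capacity, remaining: this.capacity - win.count, retryAfterSec: 0 };
  }
}
```

Because the map lives in process memory, each replica counts independently, which is why the section above still recommends an upstream gateway for aggregate ceilings.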

Tracing

OpenTelemetry tracing knobs. Off by default — flipping enabled: true starts a NodeSDK with the OTLP HTTP trace exporter and the standard auto-instrumentations bundle. When disabled, the runtime still creates manual server spans through @opentelemetry/api so flipping tracing on later does not require code changes — the spans are just no-ops without a registered SDK.

yaml
runtime:
  tracing:
    enabled: false
    serviceName: null         # null → "ai-workbench-runtime"
    exporterUrl: null         # null → OTEL_EXPORTER_OTLP_ENDPOINT / SDK default

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | bool | false | Start the NodeSDK + auto-instrumentations bundle. |
| serviceName | string \| null | null | Override the service.name resource attribute. null keeps the default ai-workbench-runtime. |
| exporterUrl | URL \| null | null | OTLP HTTP traces endpoint, e.g. https://otel-collector.example.com/v1/traces. null falls back to OTEL_EXPORTER_OTLP_ENDPOINT and the SDK default. |

For full HTTP / fetch / pino auto-instrumentation, preload the SDK at process launch (node --import ./dist/lib/tracing-preload.js dist/root.js). Without --import, manual server spans cover every request but outbound HTTP / fetch / DB clients won't emit child spans. See production.md for the deploy-side walkthrough.

Telemetry

Opt-in anonymous usage telemetry. Off by default. When enabled without a sink, the runtime constructs each event and logs "telemetry: dark mode (no sink configured)" instead of sending it, which is useful for verifying the wiring before standing up a collector. Network failures never block the runtime: each emit is fire-and-forget with a 2 s timeout.

yaml
runtime:
  telemetry:
    enabled: false
    url: null            # e.g. https://telemetry.example.com/v1/events

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | bool | false | Set true to construct + emit events. WORKBENCH_TELEMETRY=1 is an env override; WORKBENCH_TELEMETRY=0 disables even if YAML says true. |
| url | URL \| null | null | Sink for POSTed events. WORKBENCH_TELEMETRY_URL env override. null + enabled: true is dark mode (events constructed, never sent). |
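
The interaction between the YAML block and the two environment overrides can be summarized as a small decision function. This is a sketch: `resolveTelemetry` and `TelemetryPlan` are hypothetical names, and the assumption that WORKBENCH_TELEMETRY_URL takes precedence over the YAML url follows from the word "override" in the table rather than from the runtime source.

```typescript
// Hypothetical sketch: combine YAML settings with the documented env overrides.
type TelemetryYaml = { enabled: boolean; url: string | null };
type TelemetryPlan = { mode: "off" | "dark" | "send"; url: string | null };

function resolveTelemetry(
  yaml: TelemetryYaml,
  env: Record<string, string | undefined>,
): TelemetryPlan {
  let enabled = yaml.enabled;
  if (env["WORKBENCH_TELEMETRY"] === "1") enabled = true;
  if (env["WORKBENCH_TELEMETRY"] === "0") enabled = false; // wins even if YAML says true
  // ASSUMPTION: the env URL replaces the YAML url when both are present.
  const url = env["WORKBENCH_TELEMETRY_URL"] ?? yaml.url;
  if (!enabled) return { mode: "off", url: null };
  // Enabled without a sink is "dark mode": events are built and logged, never sent.
  return url === null ? { mode: "dark", url: null } : { mode: "send", url };
}
```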

Every event carries an anonymous install id persisted at $WORKBENCH_DATA_DIR/.install-id. Three event types are emitted: runtime_start, error (code + status, no message bodies), and command_run from the CLI wrapper. The canonical event catalog and no-PII guarantee live in telemetry.md.

controlPlane

Picks where workspaces, knowledge bases, execution services, and RAG documents are persisted. Discriminated on driver.

When controlPlane: is omitted entirely, the runtime infers a default: if both ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN are populated (the astra-cli auto-detection on boot fills these for any developer with a working profile), the runtime selects the astra driver against ASTRA_DB_KEYSPACE (or default_keyspace). Otherwise it falls back to a file backend rooted at ./.workbench-data. Set controlPlane.driver: memory explicitly if you want pure in-process state without the on-disk fallback.
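
The inference rule can be read as the following sketch. The function name `inferControlPlane` and the returned object shapes are hypothetical; only the environment variable names, the default keyspace, and the ./.workbench-data fallback come from the paragraph above.

```typescript
// Hypothetical sketch of the default chosen when controlPlane: is omitted.
type InferredControlPlane =
  | { driver: "astra"; endpoint: string; tokenRef: string; keyspace: string }
  | { driver: "file"; root: string };

function inferControlPlane(
  env: Record<string, string | undefined>,
): InferredControlPlane {
  const endpoint = env["ASTRA_DB_API_ENDPOINT"];
  const token = env["ASTRA_DB_APPLICATION_TOKEN"];
  if (endpoint && token) {
    // Both Astra variables are populated: prefer the astra driver.
    return {
      driver: "astra",
      endpoint,
      tokenRef: "env:ASTRA_DB_APPLICATION_TOKEN",
      keyspace: env["ASTRA_DB_KEYSPACE"] || "default_keyspace",
    };
  }
  // Otherwise fall back to the on-disk file backend.
  return { driver: "file", root: "./.workbench-data" };
}
```

Note that this function never returns the memory driver, which is why memory has to be requested explicitly.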

memory

yaml
controlPlane:
  driver: memory

In-process Maps. State is lost when the runtime exits. Best for CI, tests, and ephemeral demos. Note that omitting controlPlane entirely no longer falls through to memory — the runtime's default prefers Astra (when env vars are present) or a file backend. Set driver: memory explicitly to opt in.

file

yaml
controlPlane:
  driver: file
  root: /var/lib/workbench

JSON-on-disk. One file per table, per-table mutex, atomic rename on writes. Single-node self-hosted. Not safe for multiple writers — if you run two containers pointing at the same directory, they'll clobber each other.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| root | string | yes | Directory that will hold workspaces.json et al. Created if absent. |

sqlite

yaml
controlPlane:
  driver: sqlite
  path: /var/lib/workbench/workbench.db

SQLite via better-sqlite3, in WAL mode. Same durability and single-node posture as file, but row-level writes instead of a whole-file rewrite per mutation — built for chat-heavy deployments where the file backend goes quadratic as conversations grow. Not safe for multiple writers; run one replica per database file. WAL sidecars (-wal, -shm) live beside the database file. A path of :memory: selects an ephemeral in-process database (tests and throwaway demos).

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| path | string | yes | Database file path. Parent directory must exist. :memory: selects an ephemeral in-process database. |
| jobsResume | object | no (default off) | Cross-replica orphan-sweeper config (see below). Single-node SQLite rarely needs it. |
| jobPollIntervalMs | int (50–60000) | no | Accepted for symmetry with astra; unused by the in-process SQLite job store, which fans out updates with no poller. |

astra

yaml
controlPlane:
  driver: astra
  endpoint: https://<db-id>-<region>.apps.astra.datastax.com
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  keyspace: default_keyspace

Astra Data API Tables via @datastax/astra-db-ts. Production-grade, multi-writer-safe.

Tip — astra-cli auto-config. If you have the astra CLI installed and a profile configured, you can leave ASTRA_DB_APPLICATION_TOKEN and ASTRA_DB_API_ENDPOINT unset locally — the runtime will pick them up from the CLI at startup. Production deployments inject them from a secret manager and the CLI integration is automatically inert. See astra-cli.md.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| endpoint | URL | yes | Astra Data API endpoint |
| tokenRef | SecretRef | yes | Pointer to the application token (env:… / file:…) |
| keyspace | string | no (default default_keyspace) | Keyspace hosting the wb_* control-plane tables. Defaults to default_keyspace (the keyspace Astra DB auto-creates on every new database), so out-of-the-box deployments boot without pre-creating one. |
| jobPollIntervalMs | int (50–60000) | no (default 500) | Cross-replica job-subscriber poll interval in ms. Each subscribed (workspace, jobId) pair is re-read at this cadence so SSE clients on a different replica from the worker still see updates. Same-replica updates fan out instantly; the poller is a no-op when no one is subscribed. Raise for cost-sensitive deployments where second-scale staleness is fine; lower for hot SSE paths. Only takes effect on astra; memory, file, and sqlite are single-replica by definition. |
| jobsResume | object | no (default off) | Cross-replica orphan-sweeper config. See below. |
| reconcileOrphansOnStart | bool | no (default false) | Run a one-shot orphaned-dependent reconciliation at boot. See below. Astra-only. |

The runtime creates the wb_* tables at startup if they don't exist (using createTable(..., { ifNotExists: true })). The keyspace itself must already exist.

controlPlane.jobsResume (file / sqlite / astra)

Off by default — only useful for clustered deployments where one replica can crash mid-ingest while another stays up. Single-replica operators don't need it (their pipelines always fail-fast on the same process). When enabled, every replica scans the durable job store on an interval for running jobs whose lease is older than the grace window and CAS-claims them. Jobs with a persisted ingest snapshot replay the pipeline idempotently; older rows without a snapshot still become terminal failed records so SSE clients do not hang forever. See cross-replica-jobs.md.

jobsResume.enabled: true with controlPlane.driver: memory is rejected at validation time — there is no shared store for sibling replicas to scan when the leases live in another replica's process memory.

yaml
controlPlane:
  driver: astra
  endpoint: https://...
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  jobsResume:
    enabled: true
    graceMs: 60000     # how stale a lease must be before reclaim
    intervalMs: 60000  # how often each replica scans

controlPlane.reconcileOrphansOnStart (astra)

Off by default. When true, the runtime runs a single reconcileOrphans() pass during startup: it scans the wb_* dependent tables for rows whose owning workspace row no longer exists — the signature of a partial cross-partition cascade failure — and re-runs the idempotent dependent-delete cascade for each such workspace id.

You shouldn't normally need it. deleteWorkspace is children-first, parent-last: it removes every dependent partition before the workspace row and, on any partial failure (Astra has no cross-partition transaction), leaves the workspace row intact and returns 500 cascade_incomplete so the delete is simply retried — the cascade is idempotent, so the retry completes it. New orphans therefore don't occur. This flag exists to mop up orphans left by older deployments (which deleted the workspace row first) or out-of-band row deletions. Enable it, restart once, then turn it back off. It is operator-gated by workbench.yaml ownership and never blocks startup — a reconcile failure is logged and boot continues.

Fields of controlPlane.jobsResume (the orphan sweeper described in the previous section):

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | bool | false | Set to true to start the sweeper. Off by default; clustered deployments opt in. |
| graceMs | int (1000–600000) | 60000 | Maximum age of a lease (relative to last heartbeat) before the job is considered orphaned. |
| intervalMs | int (1000–600000) | 60000 | How often each replica scans for stale leases. |

Heartbeats are stamped on every progress update (processed ticking, status flipping), so any active worker keeps its lease fresh. Each replica writes its own replicaId (see runtime.replicaId) into leasedBy so the sweeper can tell which claim belongs to which replica.
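
The sweeper's two steps, finding stale leases and claiming them with a compare-and-set, can be illustrated with an in-memory stand-in. Everything here is a hypothetical sketch: the `JobLease` shape, the `heartbeatAtMs` field, the "succeeded" status, and the use of the heartbeat timestamp as the CAS token are assumptions, and a real store would perform the conditional update in the database.

```typescript
// Hypothetical in-memory sketch of orphan detection and CAS claiming.
type JobLease = {
  jobId: string;
  status: "running" | "succeeded" | "failed";
  leasedBy: string;       // replicaId of the current holder
  heartbeatAtMs: number;  // stamped on every progress update
};

// A running job is orphaned once its last heartbeat is older than graceMs.
function findOrphans(jobs: JobLease[], nowMs: number, graceMs: number): JobLease[] {
  return jobs.filter((j) => j.status === "running" && nowMs - j.heartbeatAtMs > graceMs);
}

// Claim succeeds only if nobody refreshed or claimed the lease since we read it.
function tryClaim(
  job: JobLease,
  expectedHeartbeatAtMs: number,
  replicaId: string,
  nowMs: number,
): boolean {
  if (job.heartbeatAtMs !== expectedHeartbeatAtMs) return false;
  job.leasedBy = replicaId;
  job.heartbeatAtMs = nowMs;
  return true;
}
```

If two replicas scan at the same moment, only the first `tryClaim` observes the expected heartbeat; the second sees the refreshed value and backs off.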

seedWorkspaces (memory only)

Optional list of workspace records loaded into the memory backend at startup. Lets developers skip the POST /api/v1/workspaces dance when running locally.

yaml
seedWorkspaces:
  - name: demo
    kind: mock
  - name: prod-astra
    kind: astra
    url: env:ASTRA_DB_API_ENDPOINT
    credentials:
      token: env:ASTRA_DB_APPLICATION_TOKEN
    keyspace: workbench

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| name | string | yes | Workspace name |
| kind | enum | yes | astra \| hcd \| openrag \| mock |
| uid | UUID | no (auto-generated) | Only useful if other seeds reference it |
| url | URL or SecretRef | no | Workspace-specific data-plane URL |
| credentials | map<string, SecretRef> | no | Per-key secret pointers |
| keyspace | string | no | Workspace-specific keyspace |

Using seedWorkspaces with any driver other than memory is a validation error — workspaces already persist in the backend, so there's nothing to seed.

Embedding services and vectorize-on-ingest

Embedding services are workspace-scoped runtime data — created via POST /api/v1/workspaces/{w}/embedding-services, not in workbench.yaml. The yaml only seeds workspaces; everything past that flows through the API. Embedding services control how the runtime turns text into vectors during ingest and search.

The runtime supports two execution paths:

  • Client-side embedding (the default). The runtime resolves endpointBaseUrl + endpointPath + credentialRef, calls the provider's HTTP API at ingest / search time, and writes the resulting vectors to Astra. Works with any embedding provider (OpenAI, Cohere, NVIDIA NIM, self-hosted, etc.) — the runtime carries the bytes.
  • Vectorize-on-ingest (server-side, Astra-only). When the embedding service has provider: "astra" and the bound vector collection was created with a matching Astra service block (e.g. nvidia:nvidia/nv-embedqa-e5-v5), the runtime delegates embedding to Astra's $vectorize column type. Documents are upserted with raw text; Astra runs the embedding model in its own infrastructure and stores both the text and the vector. Search likewise sends the query as text and Astra embeds it server-side.

The vectorize path is faster on hot paths (no extra round-trip to a provider), simpler to operate (no provider credential lives on the runtime), and avoids the dimension-mismatch class of errors. Two constraints to know:

  1. The embedding service and the collection must agree. Creating a knowledge base with provider: "astra" materializes the Astra collection with the service block baked in. Changing the embedding service binding on an existing KB is rejected if the dimension or provider doesn't match what the collection was provisioned with.
  2. credentialRef is irrelevant on the runtime side. Astra itself holds whatever provider credentials the vectorize service needs; the runtime never sees them. Setting credentialRef on a provider: "astra" embedding service is a no-op.

Mixed-batch upserts (some records carry a precomputed vector, others carry text) always fall back to client-side embedding so the entire batch lands in one transactional call. See api-spec.md §POST /{knowledgeBaseId}/records for the exact dispatch rules.

chat (optional)

Wires up the runtime-wide default chat-completion executor used by agents that have no llmServiceId of their own. When unset and the agent also has no LLM service bound, agent send + streaming return 503 chat_disabled. See agents.md for the agent surface and the per-agent LLM-service binding.

yaml
chat:
  provider: openrouter
  tokenRef: env:OPENROUTER_API_KEY
  baseUrl: null
  model: openai/gpt-4o-mini
  maxOutputTokens: 1024
  retrievalK: 6
  systemPrompt: null
  requestTimeoutMs: null

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| provider | enum | openrouter | One of openrouter (hosted default), openai (direct/BYOK), or ollama (local/offline). All three speak the OpenAI-compatible wire protocol and share one adapter. |
| tokenRef | SecretRef | env:OPENROUTER_API_KEY | Resolved once at boot. env:VAR or file:/path. Not required for ollama, which needs no credential. |
| baseUrl | string \| null | null | API base URL. null derives the provider default (https://openrouter.ai/api/v1, https://api.openai.com/v1, or http://localhost:11434/v1). |
| model | string | openai/gpt-4o-mini | The provider's model id. For OpenRouter this is a catalog slug such as openai/gpt-4o-mini, anthropic/claude-3.5-sonnet, or meta-llama/llama-3.3-70b-instruct; one key reaches 300+ models. |
| allowDataCollection | bool | false | OpenRouter only. Defaults to ZDR-only routing (provider.data_collection: "deny"), so prompts route exclusively to zero-data-retention upstreams. Set true to relax this and permit upstreams that may retain prompts. No effect for openai / ollama. |
| maxOutputTokens | int (1–8192) | 1024 | Per-turn cap on the assistant's reply length. |
| retrievalK | int (1–64) | 6 | Top-K KB chunks per knowledge base. The total injected into the prompt is retrievalK * ceil(sqrt(numKbs)) so multi-KB conversations don't blow up the prompt. |
| systemPrompt | string \| null | null | Default system prompt when neither the agent nor the agent's LLM service supplies one. null falls back to the runtime's persona-agnostic DEFAULT_AGENT_SYSTEM_PROMPT. |
| requestTimeoutMs | int (1–600000) \| null | null | Hard per-request wall-clock for a single non-streaming completion. A hung or pathologically slow provider aborts at this bound and surfaces as an error turn instead of holding the request open. null defers to the transport's socket timeout. Streaming replies are not bounded by this (they already honor client aborts). |
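
As a worked example of the retrievalK formula in the table, the helper below evaluates retrievalK * ceil(sqrt(numKbs)) for a few knowledge-base counts. The function name `totalInjectedChunks` is hypothetical; the formula itself is taken from the retrievalK row.

```typescript
// Hypothetical helper evaluating the documented prompt budget formula.
function totalInjectedChunks(retrievalK: number, numKbs: number): number {
  return retrievalK * Math.ceil(Math.sqrt(numKbs));
}

// With the default retrievalK of 6:
//   1 KB  -> 6 * 1 = 6
//   4 KBs -> 6 * 2 = 12
//   5 KBs -> 6 * 3 = 18
//   9 KBs -> 6 * 3 = 18
```

The budget grows with the square root of the number of knowledge bases, so nine KBs inject 18 chunks rather than the 54 a linear retrievalK * numKbs rule would produce.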

Per-agent override. When an agent has llmServiceId set, the agent's bound LLM service overrides this block: the runtime instantiates a chat service from the LLM-service record instead of using the global block. The agent's own systemPrompt likewise wins over chat.systemPrompt when present. Three providers are wired end-to-end today: provider: "openrouter", provider: "openai", and provider: "ollama", all OpenAI-compatible and served by the same adapter. Other providers can be stored but return 422 llm_provider_unsupported until their adapters land.

Document extraction (optional)

The runtime exposes a multipart ingest route at POST /api/v1/workspaces/{w}/knowledge-bases/{kb}/ingest/file that accepts PDF, DOCX, XLSX, and text uploads. Extraction is dispatched based on the upload's MIME type / extension; configuration is via environment variables, not workbench.yaml.

| Variable | Default | Notes |
| --- | --- | --- |
| DOCLING_URL | unset | Base URL of a docling-serve instance. When set, the dispatcher prefers docling over the native pipeline for non-text files (PDF, DOCX, XLSX) and falls back to native if docling is unreachable. The route also accepts an explicit per-upload parser=native\|docling\|auto form field. |
| DOCLING_TIMEOUT_MS | 60000 | Per-request budget for docling-serve calls. Scanned/OCR'd PDFs can run long; raise this if you see docling_unavailable with timed out after … messages. |

When DOCLING_URL is unset (the default), the runtime uses pdfjs-dist for PDFs, mammoth for DOCX, and exceljs for XLSX (rendered as one markdown table per worksheet). Native extraction is fast and zero-ops but flattens layout-specific structure and skips OCR; docling preserves layout and does OCR for scanned documents.

mcp (optional)

Toggles the Model Context Protocol façade at /api/v1/workspaces/{w}/mcp. On by default — it sits behind the same auth middleware as the REST API, so it does not widen the security boundary. Set enabled: false to take the route down (it will then return 404). See mcp.md for the full feature walkthrough.

yaml
mcp:
  enabled: true
  exposeChat: false

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| enabled | bool | true | When false, MCP is not exposed at all. |
| exposeChat | bool | false | Adds the chat_send MCP tool. Requires the chat: block; silently skipped when chat is unset. |

auth

Configures the /api/v1/* auth middleware. See auth.md for the full contract and rollout plan.

yaml
auth:
  mode: disabled          # disabled | apiKey | oidc | any
  anonymousPolicy: allow  # allow | reject
  # oidc: …               # required when mode is `oidc` or `any`

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| mode | enum | disabled | Which verifiers are active. |
| anonymousPolicy | enum | allow | allow lets tokenless requests through as anonymous; reject returns 401 unauthorized. |
| bootstrapTokenRef | SecretRef \| null | null | Optional 32+ character break-glass bearer token. Accepted as an unscoped operator subject when mode is apiKey, oidc, or any; invalid with mode: disabled. |
| acknowledgeOpenAccess | boolean | true | Controls how the deployment guard reacts when a durable control plane (file / astra) is paired with open auth (mode: disabled or anonymousPolicy: allow). Default true keeps that pairing as a loud startup warning so the dev loop (file CP + open auth) keeps booting. Flip to false in CI / shared environments to convert the warning into a hard fatal. Production deployments should set runtime.environment: production instead, which forces apiKey/oidc/any + anonymousPolicy: reject at the schema layer regardless of this flag. |
| oidc | object | | Required when mode is oidc or any. See table below. |

The default (disabled + allow) matches pre-auth behavior: the middleware runs, tags every request anonymous, and lets it through. Set anonymousPolicy: reject in CI to assert the middleware is mounted.

auth.oidc

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| issuer | url | required | Must equal the JWT iss claim exactly. Discovery URL is derived from this. |
| audience | string \| string[] | required | At least one value must match the JWT aud claim. |
| jwksUri | url \| null | null | When null, the runtime fetches ${issuer}/.well-known/openid-configuration at startup and uses jwks_uri from the response. |
| clockToleranceSeconds | int | 30 | Skew allowance for exp / nbf. |
| claims.subject | string | sub | JWT claim → AuthSubject.id. |
| claims.label | string | email | JWT claim → AuthSubject.label (nullable). |
| claims.workspaceScopes | string | wb_workspace_scopes | Array-valued claim → AuthSubject.workspaceScopes. A JSON null marks the subject unscoped (admin). |
| client | object | | Optional. When present, the runtime hosts /auth/{login,callback,me,logout} so the bundled web UI can drive a browser PKCE login. See table below. |
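
The three claims.* settings describe a mapping from a verified JWT payload to an AuthSubject. The sketch below shows one way that mapping could look. `toAuthSubject`, `ClaimNames`, and the exact `AuthSubject` type are hypothetical, and the handling of a missing scopes claim (treated as an empty list) is an assumption; only the claim defaults and the "JSON null means unscoped" rule come from the table.

```typescript
// Hypothetical sketch: map verified JWT claims to an AuthSubject.
type ClaimNames = { subject: string; label: string; workspaceScopes: string };
type AuthSubject = { id: string; label: string | null; workspaceScopes: string[] | null };

const DEFAULT_CLAIMS: ClaimNames = {
  subject: "sub",
  label: "email",
  workspaceScopes: "wb_workspace_scopes",
};

function toAuthSubject(
  payload: Record<string, unknown>,
  claims: ClaimNames = DEFAULT_CLAIMS,
): AuthSubject {
  const id = payload[claims.subject];
  if (typeof id !== "string" || id === "") {
    throw new Error(`missing claim: ${claims.subject}`);
  }
  const label = payload[claims.label];
  const scopes = payload[claims.workspaceScopes];
  let workspaceScopes: string[] | null;
  if (scopes === null) {
    workspaceScopes = null; // JSON null: unscoped (admin) subject
  } else if (Array.isArray(scopes)) {
    workspaceScopes = scopes.filter((s): s is string => typeof s === "string");
  } else {
    workspaceScopes = []; // ASSUMPTION: an absent claim grants no workspaces
  }
  return { id, label: typeof label === "string" ? label : null, workspaceScopes };
}
```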

auth.oidc.client

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| clientId | string | required | OAuth client identifier registered at the IdP. |
| clientSecretRef | SecretRef \| null | null | Client secret. Omit for public (SPA-style) clients. |
| redirectPath | string | /auth/callback | Path the IdP redirects to after authorization. Must be in the IdP's allow-list. |
| postLogoutPath | string | / | Where /auth/logout sends the user. |
| scopes | string[] | [openid, profile, email] | OAuth scopes requested at login. |
| sessionCookieName | string | wb_session | Cookie that carries the encrypted session. |
| sessionSecretRef | SecretRef \| null | null | Key material for encrypting session cookies. Must resolve to ≥32 bytes. When null, runtime auto-generates an ephemeral key at boot (dev only). |

Secrets

Secrets reach the runtime through two disjoint paths:

YAML interpolation (${VAR})

Applies before schema validation. Good for non-secret runtime settings like endpoints, and for pulling secrets that need to be literal strings in the config document.

Secret refs (env: / file:)

The preferred path for anything credential-shaped. A SecretRef is a string like env:ASTRA_DB_APPLICATION_TOKEN or file:/etc/workbench/secrets/astra-token. The runtime resolves it when it actually needs the secret (at control-plane init, for example), so the value never lives in memory longer than necessary and never crosses process logs.

Providers available today:

| Provider | Ref shape | Behavior |
| --- | --- | --- |
| env | env:VAR_NAME | Reads process.env.VAR_NAME. Errors if unset or empty. |
| file | file:/abs/path | Reads the file and trims trailing whitespace. |
| astra-cli | astra-cli:<profile>:<dbId>:<token\|endpoint> | Sources the token / Data API endpoint from a specific astra CLI profile + database. Lets different workspaces target different Astra databases without restarting. Cached for the process lifetime; errors are not cached. |

Future providers (Vault, AWS SM, etc.) plug into the same SecretProvider interface. See runtimes/typescript/src/secrets/provider.ts.
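
A SecretRef splits at the first colon into a provider prefix and a provider-specific path. The sketch below parses that shape and resolves the env and file providers. It is illustrative only: `parseSecretRef` and `resolveSecret` are hypothetical names, not the SecretProvider interface, and the file reader is injected so the example has no filesystem dependency.

```typescript
// Hypothetical sketch of SecretRef parsing and resolution.
type ParsedSecretRef = { provider: string; path: string };

function parseSecretRef(ref: string): ParsedSecretRef {
  const idx = ref.indexOf(":");
  // Require a non-empty prefix and a non-empty path: <prefix>:<path>.
  if (idx <= 0 || idx === ref.length - 1) {
    throw new Error(`invalid secret ref: ${ref}`);
  }
  return { provider: ref.slice(0, idx), path: ref.slice(idx + 1) };
}

function resolveSecret(
  ref: string,
  env: Record<string, string | undefined>,
  readFile: (path: string) => string, // injected stand-in for a file read
): string {
  const { provider, path } = parseSecretRef(ref);
  switch (provider) {
    case "env": {
      const value = env[path];
      if (value === undefined || value === "") {
        throw new Error(`environment variable ${path} is unset or empty`);
      }
      return value;
    }
    case "file":
      return readFile(path).trimEnd(); // trims trailing whitespace
    default:
      // astra-cli and future providers are out of scope for this sketch.
      throw new Error(`unsupported secret provider in this sketch: ${provider}`);
  }
}
```

Splitting at the first colon only means an astra-cli ref keeps its remaining colons in the path portion for the provider to interpret.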

Validation rules

At startup the runtime enforces:

  • Every ${VAR} reference resolves or has a default.
  • controlPlane.driver is one of memory | file | sqlite | astra.
  • Driver-specific required fields are present (e.g. root for file, path for sqlite, endpoint + tokenRef for astra).
  • Every tokenRef / credentials value matches the <prefix>:<path> shape.
  • seedWorkspaces is only non-empty when controlPlane.driver == memory.
  • No duplicate names within seedWorkspaces.

Validation failures abort startup with a non-zero exit code and a human-readable error message.
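
A few of the rules above (driver enum, driver-specific required fields, and the two seedWorkspaces checks) can be sketched as a function that collects error messages. This is not the runtime's schema: `validateConfig`, `ConfigShape`, and the message wording are hypothetical, and whether the real validator reports all errors or stops at the first is an assumption.

```typescript
// Hypothetical sketch of a subset of the startup validation rules.
type ControlPlane = { driver: string; [field: string]: unknown };
type ConfigShape = {
  controlPlane: ControlPlane;
  seedWorkspaces?: { name: string }[];
};

const REQUIRED_FIELDS = new Map<string, string[]>([
  ["memory", []],
  ["file", ["root"]],
  ["sqlite", ["path"]],
  ["astra", ["endpoint", "tokenRef"]],
]);

function validateConfig(cfg: ConfigShape): string[] {
  const errors: string[] = [];
  const driver = cfg.controlPlane.driver;
  const required = REQUIRED_FIELDS.get(driver);
  if (required === undefined) {
    errors.push("controlPlane.driver must be one of memory | file | sqlite | astra");
  } else {
    for (const field of required) {
      if (typeof cfg.controlPlane[field] !== "string") {
        errors.push(`controlPlane.${field} is required for driver ${driver}`);
      }
    }
  }
  const seeds = cfg.seedWorkspaces ?? [];
  if (seeds.length > 0 && driver !== "memory") {
    errors.push("seedWorkspaces requires controlPlane.driver: memory");
  }
  const seen = new Set<string>();
  for (const seed of seeds) {
    if (seen.has(seed.name)) errors.push(`duplicate seed workspace name: ${seed.name}`);
    seen.add(seed.name);
  }
  return errors;
}
```

A caller would treat a non-empty result as fatal, printing the messages and exiting with a non-zero code.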

Hot reload

Not supported. The current model is "restart the process to pick up changes." Since only the control-plane backend is configured here (workspaces themselves are runtime data), most day-to-day operations don't require a config change anyway.

Graceful shutdown

SIGINT and SIGTERM trigger a graceful-shutdown sequence:

  1. /readyz starts returning 503 draining. Kubernetes-style readiness probes will stop routing traffic here.
  2. server.close() stops accepting new connections. In-flight requests keep going.
  3. When every connection finishes, or after 15 seconds, whichever comes first, the control-plane store closes and the process exits. A clean drain exits 0; hitting the timeout exits 1 so the supervisor knows the process didn't drain cleanly.
  4. A second SIGINT / SIGTERM while the first is still draining short-circuits straight to exit — the operator can force-kill a stuck process without waiting for the timeout.

/healthz stays 200 throughout the drain (the process is still alive, just closed to new traffic). That's the split that k8s expects — livenessProbe hits /healthz, readinessProbe hits /readyz.
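
The drain sequence can be modeled as a small state machine with its side effects injected. This is a sketch of the described behavior, not the runtime's code: `DrainController` and `DrainHooks` are hypothetical names, and the exit code used when a second signal short-circuits the drain is an assumption (the text above only says it exits).

```typescript
// Hypothetical sketch of the graceful-shutdown state machine.
type DrainHooks = {
  stopAccepting: () => void;       // e.g. server.close()
  closeStore: () => void;          // close the control-plane store
  exit: (code: number) => void;    // e.g. process.exit
};

class DrainController {
  private state: "serving" | "draining" | "exited" = "serving";
  private readonly hooks: DrainHooks;

  constructor(hooks: DrainHooks) {
    this.hooks = hooks;
  }

  readyzStatus(): number { return this.state === "serving" ? 200 : 503; }
  healthzStatus(): number { return 200; } // alive throughout the drain

  onSignal(): void {
    if (this.state === "draining") {
      this.finish(1); // ASSUMPTION: a forced exit reports a non-zero code
      return;
    }
    if (this.state !== "serving") return;
    this.state = "draining";
    this.hooks.stopAccepting();
  }

  onAllConnectionsClosed(): void {
    if (this.state !== "draining") return;
    this.hooks.closeStore();
    this.finish(0); // clean drain
  }

  onTimeout(): void { // fired 15 seconds after the first signal
    if (this.state !== "draining") return;
    this.hooks.closeStore();
    this.finish(1); // did not drain cleanly
  }

  private finish(code: number): void {
    this.state = "exited";
    this.hooks.exit(code);
  }
}
```

In a real process the first SIGINT / SIGTERM handler would call `onSignal()` and arm a 15-second timer that calls `onTimeout()`.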

.env file (dev convenience)

The runtime auto-loads a .env file at startup so local dev doesn't need you to export secrets by hand. It uses Node's built-in process.loadEnvFile (available since Node 21.7, and backported to 20.12), so there is no dotenv dependency.

Location. Put it at the repo root. The runtime walks up from the process's current working directory looking for .env, stopping at the repo root (.git sentinel). That means the same file works whether you run npm run dev from the repo root or from runtimes/typescript/.

Precedence. Values already present in process.env win: .env never overwrites shell exports or container env vars. This matches the default behavior of the dotenv package.
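
The location and precedence rules can be sketched together. This is an illustration under assumptions, not the runtime's loader: `findEnvFile` and `applyEnvFile` are hypothetical names, paths are assumed to be absolute POSIX paths, and the walk is assumed to check for .env before checking for the .git sentinel in each directory.

```typescript
// Hypothetical sketch: walk up from cwd to the repo root looking for .env.
function findEnvFile(cwd: string, exists: (path: string) => boolean): string | null {
  let dir = cwd;
  for (;;) {
    if (exists(`${dir}/.env`)) return `${dir}/.env`;
    if (exists(`${dir}/.git`)) return null; // repo root reached without a .env
    const cut = dir.lastIndexOf("/");
    if (cut <= 0) return null;              // reached the filesystem root
    dir = dir.slice(0, cut);
  }
}

// Values already present in the environment win over values from the file.
function applyEnvFile(
  fileVars: Record<string, string>,
  env: Record<string, string | undefined>,
): void {
  for (const [key, value] of Object.entries(fileVars)) {
    if (env[key] === undefined) env[key] = value;
  }
}
```

With a .env at the repo root, the same file is found whether the starting directory is the root itself or a nested directory such as runtimes/typescript/.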

Override the path. Set WORKBENCH_ENV_FILE=/abs/path/to/.env to skip the walk and load an explicit file. Useful for production container boots where the token lives on a mounted secret. As of 0.2.0 the override is absent-tolerant — a missing file is no longer fatal, so a fresh container can boot before the file exists and the first-run setup wizard can populate it.

Managed env file. The setup wizard writes its allow-listed keys (ASTRA_DB_API_ENDPOINT, ASTRA_DB_APPLICATION_TOKEN, OPENROUTER_API_KEY, OPENAI_API_KEY) to $WORKBENCH_DATA_DIR/.env with mode 0600. The bundled Docker compose sets both WORKBENCH_DATA_DIR and WORKBENCH_ENV_FILE to that path so the runtime auto-loads the file on the next boot after POST /setup/restart.

WORKBENCH_DATA_DIR. Base directory for runtime-managed state files: the setup wizard's .env, the anonymous telemetry install id (.install-id), and the cli/ subdirectory the bundled aiw binary uses when running inside the compose container. The compose file points this at the persistent volume; outside compose, defaults to os.tmpdir()/ai-workbench.

Template. .env.example at the repo root is a committed starting point — copy to .env and fill in the secrets you need. .env itself is gitignored.

Production. The runtime ships the same loader in production, but standard container practice (Docker -e / K8s Secrets → env vars) usually means no .env is present and the loader silently skips.

Examples

All canonical examples live under runtimes/typescript/examples/:

  • workbench.yaml — the default dev config the Docker image ships with, with annotated comments covering all three backends, seedWorkspaces, and auth stanzas.
  • workbench.production.yaml — hardened production preset (astra backend, OIDC, security headers).
  • workbench.memory.yaml — CI / smoke-test preset (in-memory only, no persistence).

Released under the MIT license.