Audit logging

The TypeScript runtime emits structured audit events for the sensitive operations listed below. Events are pino log lines at `info` level that carry a stable discriminator field, `audit: true`, so deployments can route them to a dedicated sink (file, syslog, SIEM) by filter.

```jsonc
{
  "level": 30,
  "time": 1735603200000,
  "audit": true,
  "action": "api_key.create",
  "outcome": "success",
  "requestId": "01JFE5...",
  "subject": {
    "type": "oidc",            // "apiKey" | "oidc" | "bootstrap" | "anonymous"
    "id": "sub-123",
    "label": "alice@example.com"
  },
  "workspaceId": "ws-1",
  "details": { "keyId": "...", "label": "ci-deployer" },
  "msg": "audit api_key.create success"
}
```
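For consumers that parse these lines, the envelope above can be modelled as a type with a small runtime guard. This is a minimal sketch inferred from the sample only: the names `AuditEnvelope` and `parseAuditLine` are illustrative and are not exported by the runtime, and the optionality of `workspaceId` and `details` is an assumption.

```typescript
// Hypothetical consumer-side model of the envelope shown above.
// Field names come from the sample; the type names are illustrative.
type SubjectType = "apiKey" | "oidc" | "bootstrap" | "anonymous";

interface AuditSubject {
  type: SubjectType;
  id: string | null;
  label: string | null;
}

interface AuditEnvelope {
  level: number;
  time: number;
  audit: true;
  action: string;
  outcome: "success" | "failure" | "denied";
  requestId: string;
  subject: AuditSubject | null;
  workspaceId?: string; // assumed optional: absent when no workspace is resolved
  details?: Record<string, unknown>;
  msg: string;
}

// Returns the parsed envelope for route-level audit lines, or null for
// anything else (non-JSON input, non-audit lines, lines without an action).
function parseAuditLine(line: string): AuditEnvelope | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const rec = parsed as Record<string, unknown>;
  if (rec.audit !== true) return null;
  if (typeof rec.action !== "string" || typeof rec.outcome !== "string") return null;
  return rec as unknown as AuditEnvelope;
}
```

The guard checks only the discriminator and the two fields every route-level event carries, so it keeps working if new `details` keys are added later.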

What gets logged

Action	Trigger	Notes
`api_key.create`	`POST /api/v1/workspaces/{w}/api-keys`	Plaintext is never logged. Only `keyId` + caller-supplied `label`.
`api_key.revoke`	`DELETE /api/v1/workspaces/{w}/api-keys/{keyId}`	Soft revoke; emitted on the first revoke only.
`workspace.create`	`POST /api/v1/workspaces`	Includes the workspace `label` (the human-friendly `name`).
`workspace.delete`	`DELETE /api/v1/workspaces/{w}`	Emitted after the cascade completes.
`kb.create`	`POST /api/v1/workspaces/{w}/knowledge-bases`	Provisions an Astra collection — destructive on rollback. Includes `knowledgeBaseId` + `label`.
`kb.delete`	`DELETE /api/v1/workspaces/{w}/knowledge-bases/{kb}`	Cascades the underlying collection drop and all rag-document rows. Emitted after the cascade.
`document.delete`	`DELETE /api/v1/workspaces/{w}/knowledge-bases/{kb}/documents/{d}`	Cascades chunk wipe before the row drop. Includes `knowledgeBaseId` + `documentId`.
`agent.create`	`POST /api/v1/workspaces/{w}/agents` or `POST /api/v1/workspaces/{w}/agents/from-template`	Includes `agentId` + `label`. The from-template variant additionally includes `templateId` (the catalog slug).
`agent.delete`	`DELETE /api/v1/workspaces/{w}/agents/{a}`	Cascades conversations + chat messages owned by the agent. Emitted after the cascade.
`job.claim`	Cross-replica orphan reclaim in `jobs/sweeper.ts`	Emitted when a replica successfully CAS-claims an orphaned job. Includes `jobId` + `jobKind`. Subject is the replica id (synthetic), not a user.
`mcp.invoke`	Any tool call into `/api/v1/workspaces/{w}/mcp`	Includes the `toolName`. Argument payloads are not logged.
`tool.invoke`	Any agent tool call in the chat tool-call loop (`POST /api/v1/workspaces/{w}/agents/{a}/conversations/{c}/messages[/stream]`)	One row per tool call, whether built-in, native, Astra, or remote-MCP. Includes the `toolName`, the `source` (`builtin`/`native`/`astra`/`mcp`), and the `mcpServerId` for `mcp`-source calls. `outcome: "denied"` (with `reason`) when the tool is not on the agent's allow-list, or when the caller of an `mcp`-source call lacks the `tools:invoke` scope (`reason: "missing tools:invoke scope"`); `outcome: "failure"` (with `reason`) when the tool errors or times out; `outcome: "success"` otherwise. Argument payloads are not logged because they may contain secrets.
`auth.api_denied`	401/403 auth decisions on `/api/v1/*`	`outcome: "denied"` with `reason`; unauthenticated 401s have `subject: null`, while scoped 403s include the resolved subject when available. A 403 from a scope gate (`assertScope`/`requireScope`) also carries a structured `requiredScope` (e.g. `write:ingest`) so denials aggregate by scope; workspace-membership 403s and 401s omit it.
`auth.bootstrap_use`	Any request authenticated with the bootstrap operator token	Includes `scheme: "bootstrap"`. The plaintext bootstrap token is never logged.
`auth.csrf_rejected`	A state-changing request to a cookie-protected route was rejected by the Origin/Referer check	`outcome: "failure"` with `reason` ∈ `{ "no allowed origin available", "missing Origin and Referer on state-changing request", "origin mismatch (got <claimed>)" }`. The HTTP response is `403 forbidden_origin`. Bearer-token requests bypass the check and never emit this event.
`auth.login`	OIDC `/auth/callback`	`outcome: "success"` once the access token passes the runtime's own verifier; `outcome: "failure"` with `reason` on token-validation errors.
`auth.refresh`	OIDC `/auth/refresh`	`outcome: "success"` on a clean rotate. `outcome: "failure"` with `reason` ∈ `{ "no_refresh_token", "idp_rejected", "token_validation_failed" }` covers the three failure paths (missing cookie, IdP refused the refresh_token, freshly-issued access token failed self-verification).
`auth.logout`	OIDC `/auth/logout`	Emitted on every cookie clear, even when no session was present.
`auth.device.authorize`	`POST /auth/device/authorize`	RFC 8628 device-flow proxy for the `aiw` CLI. `outcome: "success"` once the IdP returns a `device_code`; the audit row includes the short `user_code` (the human-typed code) but never the `device_code` itself. `outcome: "failure"` with `reason` when the IdP rejects the authorize request.
`auth.device.token`	`POST /auth/device/token`	RFC 8628 device-flow proxy for the `aiw` CLI — one row per terminal poll. `outcome: "success"` when the IdP issues an access token; `outcome: "failure"` with `reason` for IdP-rejected polls (`expired_token`, `access_denied`, etc.). Pending polls (`authorization_pending` / `slow_down`) are NOT audited; only the final outcome lands a row, so a long-lived device grant doesn't flood the audit log.
`principal.create`	`POST /api/v1/workspaces/{w}/principals`	RLAC prototype. Includes the `principalId`.
`principal.update`	`PATCH /api/v1/workspaces/{w}/principals/{principalId}`	RLAC prototype. Includes the `principalId`.
`principal.delete`	`DELETE /api/v1/workspaces/{w}/principals/{principalId}`	RLAC prototype. Includes the `principalId`.

The set is intentionally small. Adding a new event is a one-line call from a route handler — see src/lib/audit.ts.
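The `requiredScope` field on `auth.api_denied` rows exists so that denials can be aggregated by scope. A minimal sketch of that aggregation on the consumer side follows. It assumes `requiredScope` sits inside `details` next to `reason` (the table does not say where the field lives), and `countDeniedByScope` is a hypothetical helper name.

```typescript
// Assumed shape: requiredScope is read from details; adjust if the runtime
// places it elsewhere in the envelope.
interface DeniedLine {
  action: string;
  outcome: string;
  details?: { reason?: string; requiredScope?: string };
}

// Counts auth.api_denied events per requiredScope. Events without a
// requiredScope (401s, workspace-membership 403s) are grouped under "(none)".
function countDeniedByScope(events: DeniedLine[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    if (e.action !== "auth.api_denied" || e.outcome !== "denied") continue;
    const scope = e.details?.requiredScope ?? "(none)";
    counts.set(scope, (counts.get(scope) ?? 0) + 1);
  }
  return counts;
}
```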

Sample envelopes

Concrete payloads from a live runtime, lightly redacted. After pino's own `level` and `time`, the field order is `audit`, `action`, `outcome`, `requestId`, `subject`, `workspaceId`, `details`, `msg`. pino emits fields in declaration order, which makes the envelope stable enough to grep with `cut`/`jq`.

`workspace.create` — anonymous in dev mode

```jsonc
{
  "level": 30,
  "time": 1735603195123,
  "audit": true,
  "action": "workspace.create",
  "outcome": "success",
  "requestId": "01KQG3MCDGC3VWP07BNQWX7NPB",
  "subject": {
    "type": "anonymous",
    "id": null,
    "label": null
  },
  "workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
  "details": { "label": "support-docs" },
  "msg": "audit workspace.create success"
}
```

subject.type: "anonymous" is normal in development (default auth.mode: disabled). In production, subject.type will be "apiKey" or "oidc" — the auth deployment guard refuses to start with anonymous access on a non-memory control plane.

`auth.login` — failed JWT validation

```jsonc
{
  "level": 30,
  "time": 1735603612877,
  "audit": true,
  "action": "auth.login",
  "outcome": "failure",
  "requestId": "01KQG3PV9ZH7T82R4KAE8WBN3X",
  "subject": {
    "type": "anonymous",
    "id": null,
    "label": null
  },
  "details": { "scheme": "oidc", "reason": "audience_mismatch" },
  "msg": "audit auth.login failure"
}
```

workspaceId is absent because the request never resolves a workspace before the auth middleware rejects it. details.reason is one of the verifier's terminal error codes (audience_mismatch, signature_invalid, token_expired, issuer_mismatch, malformed); see src/auth/oidc/verifier.ts.

`api_key.create` — authenticated by an OIDC subject

```jsonc
{
  "level": 30,
  "time": 1735603889104,
  "audit": true,
  "action": "api_key.create",
  "outcome": "success",
  "requestId": "01KQG3QRVF20YGD6MTFB8KKCN5",
  "subject": {
    "type": "oidc",
    "id": "auth0|7c2d4f12",
    "label": "alice@example.com"
  },
  "workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
  "details": { "keyId": "3a4977c8-3e01-4fd0-9b02-2e082950bd40", "label": "ci-deployer" },
  "msg": "audit api_key.create success"
}
```

The plaintext token (wb_live_…) is only in the HTTP response body, never the audit log. details.keyId is the row id; label is the operator-supplied tag.

Seed-failure events (non-route)

Workspace creation tries to seed default agents, LLM services, chunking services, and embedding services. Per-row failures emit audit: true error lines (not routed through audit() because they don't cleanly fit <resource>.<verb>):

```jsonc
{
  "level": 50,
  "time": 1735603195310,
  "audit": true,
  "workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
  "serviceName": "openai-text-embedding-3-small",
  "err": { "type": "ControlPlaneConflictError", "message": "..." },
  "msg": "failed to seed default embedding service"
}
```

When every seed of a kind fails (systemic — DB outage, broken config), an aggregate line follows with expected: <count> so monitoring can alert on "workspace shipped with no embedders" rather than counting individual failures.
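Because seed-failure lines carry `audit: true` but no `action` or `outcome`, a sink that filters on the discriminator alone receives both shapes. The sketch below shows one way a consumer might tell them apart. The name `classifyLine` is hypothetical, and treating `expected` as a top-level numeric field on the aggregate line is an assumption based on the description above.

```typescript
type LineKind =
  | "audit_event"
  | "seed_failure"
  | "seed_failure_aggregate"
  | "other";

// Classifies an already-parsed log line.
// - Route-level audit events have both action and outcome.
// - Seed-failure lines have audit: true, level 50, and neither field.
// - The aggregate line is assumed to add a numeric `expected` count.
function classifyLine(rec: Record<string, unknown>): LineKind {
  if (rec.audit !== true) return "other";
  if (typeof rec.action === "string" && typeof rec.outcome === "string") {
    return "audit_event";
  }
  if (rec.level === 50) {
    return typeof rec.expected === "number"
      ? "seed_failure_aggregate"
      : "seed_failure";
  }
  return "other";
}
```

Alerting on `seed_failure_aggregate` rather than on individual `seed_failure` lines matches the intent described above: one alert per systemic failure instead of one per row.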

Design rules

The audit module enforces a few rules so events stay safe to ship to external systems:

No secret material. The `details` field is typed and only accepts a known set of identifier fields (for example `keyId`, `knowledgeBaseId`, `scheme`, `reason`, `label`). Plaintext tokens, refresh tokens, hashes, OAuth codes, and PII are not part of the contract and have no path into the envelope.
Stable action names. <resource>.<verb> in snake_case. We never rename in place — adding a new action and keeping the old one for a release is the migration path.
Outcome is always set. success | failure | denied so SIEM rules can alert on bursts of denied without parsing status codes.
Best-effort. Audit logging must never break the request path. Logger errors are swallowed inside audit().
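The rules above (typed `details`, an outcome that is always set, and swallowed logger errors) can be sketched as a small helper. This is not the contents of `src/lib/audit.ts`, which may differ: every name and signature below is an assumption made for illustration, and `subject` is left out for brevity. The only external convention relied on is pino's `logger.info(obj, msg)` call style.

```typescript
// Illustrative only; the real implementation lives in src/lib/audit.ts.
type Outcome = "success" | "failure" | "denied";

// Only identifier-style fields are accepted. There is no free-form slot
// through which a token or hash could reach the envelope.
interface AuditDetails {
  keyId?: string;
  knowledgeBaseId?: string;
  scheme?: string;
  reason?: string;
  label?: string;
}

interface AuditInput {
  action: string; // <resource>.<verb>
  outcome: Outcome; // required, so it is always set
  requestId: string;
  workspaceId?: string;
  details?: AuditDetails;
}

// Minimal logger shape, compatible with pino's info(obj, msg).
interface InfoLogger {
  info(obj: object, msg: string): void;
}

function audit(logger: InfoLogger, input: AuditInput): void {
  try {
    logger.info(
      { audit: true, ...input },
      `audit ${input.action} ${input.outcome}`,
    );
  } catch {
    // Best-effort: a failing logger must never break the request path.
  }
}
```

With a shape like this, passing `details: { token: "..." }` is a compile-time error, which is what gives the "no path into the envelope" rule its teeth.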

Operating it

Single-replica. The default pino transport writes to stdout. Pipe the container's stdout into your log pipeline and filter on audit: true.
Multi-replica. Each replica writes its own events; correlate by requestId (already echoed in every audit envelope) and by the Strict-Transport-Security / replicaId markers documented in production.md.
Retention. The runtime does not retain audit events itself. Choose a retention period that satisfies your compliance posture and configure it on the sink.
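For the multi-replica case, correlation by `requestId` amounts to merging the per-replica streams and grouping on that field. A minimal sketch, assuming events have already been parsed and filtered on `audit: true`; `groupByRequestId` is a hypothetical helper, not part of the runtime.

```typescript
interface Correlatable {
  requestId: string;
  time: number; // pino epoch milliseconds
  action: string;
}

// Merges events from several replicas and groups them by requestId.
// Each group is sorted by timestamp so the sequence reads in order.
function groupByRequestId<T extends Correlatable>(
  streams: T[][],
): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const stream of streams) {
    for (const ev of stream) {
      const bucket = groups.get(ev.requestId);
      if (bucket) {
        bucket.push(ev);
      } else {
        groups.set(ev.requestId, [ev]);
      }
    }
  }
  groups.forEach((bucket) => bucket.sort((a, b) => a.time - b.time));
  return groups;
}
```

Sorting by `time` relies on replica clocks being reasonably in sync; where they are not, treat the order within a group as approximate.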

What's not yet logged

These are tracked as gaps:

Rate-limit denials. They are visible from the limiter's existing log lines but are not audit events yet.
Document and chunk mutation. Volume-sensitive; needs a sampling / batching strategy first.

When a new event lands, add it to the What gets logged table and the AuditAction union in src/lib/audit.ts — the audit-doc-drift test will fail otherwise.

Resolved findings

The following audit gaps have been addressed in the commits listed:

Finding	Status	Resolution
Unbounded agent `systemPrompt` / `userPrompt` fields	Resolved in #147 (8dbf741)	Capped at 128 KB; `name` at 200 chars, `description` at 2 KB.
Monolithic body-size cap (50 MB) on all `/api/v1/workspaces/*`	Resolved in #147 (8dbf741)	Split: 10 MB default, 50 MB only on explicit `.../knowledge-bases/*/ingest`.
Sequential `Promise.all` on multi-KB agent tools	Resolved in #147 (8dbf741)	Parallelized with `Promise.allSettled` in `chat/tools/registry.ts`.
SSRF surface on service endpoint URLs (RFC1918, loopback, cloud metadata)	Resolved in #147 (8dbf741)	Layered validation in `openapi/schemas.ts` + `root.ts`; `runtime.blockPrivateNetworkEndpoints` config flag.

PRs welcome.
