Audit logging
The TypeScript runtime emits structured audit events for the sensitive operations listed below. Events are pino log lines at info level with a stable discriminator field audit: true, so deployments can route them to a dedicated sink (file, syslog, SIEM) by filter.
{
"level": 30,
"time": 1735603200000,
"audit": true,
"action": "api_key.create",
"outcome": "success",
"requestId": "01JFE5...",
"subject": {
"type": "oidc", // "apiKey" | "oidc" | "bootstrap" | "anonymous"
"id": "sub-123",
"label": "alice@example.com"
},
"workspaceId": "ws-1",
"details": { "keyId": "...", "label": "ci-deployer" },
"msg": "audit api_key.create success"
}
What gets logged
| Action | Trigger | Notes |
|---|---|---|
| api_key.create | POST /api/v1/workspaces/{w}/api-keys | Plaintext is never logged. Only keyId + caller-supplied label. |
| api_key.revoke | DELETE /api/v1/workspaces/{w}/api-keys/{keyId} | Soft revoke; emitted on the first revoke only. |
| workspace.create | POST /api/v1/workspaces | Includes the workspace label (the human-friendly name). |
| workspace.delete | DELETE /api/v1/workspaces/{w} | Emitted after the cascade completes. |
| kb.create | POST /api/v1/workspaces/{w}/knowledge-bases | Provisions an Astra collection; destructive on rollback. Includes knowledgeBaseId + label. |
| kb.delete | DELETE /api/v1/workspaces/{w}/knowledge-bases/{kb} | Cascades the underlying collection drop and all rag-document rows. Emitted after the cascade. |
| document.delete | DELETE /api/v1/workspaces/{w}/knowledge-bases/{kb}/documents/{d} | Cascades chunk wipe before the row drop. Includes knowledgeBaseId + documentId. |
| agent.create | POST /api/v1/workspaces/{w}/agents or POST /api/v1/workspaces/{w}/agents/from-template | Includes agentId + label. The from-template variant additionally includes templateId (the catalog slug). |
| agent.delete | DELETE /api/v1/workspaces/{w}/agents/{a} | Cascades conversations + chat messages owned by the agent. Emitted after the cascade. |
| job.claim | Cross-replica orphan reclaim in jobs/sweeper.ts | Emitted when a replica successfully CAS-claims an orphaned job. Includes jobId + jobKind. Subject is the replica id (synthetic), not a user. |
| mcp.invoke | Any tool call into /api/v1/workspaces/{w}/mcp | Includes the toolName. Argument payloads are not logged. |
| tool.invoke | Any agent tool call in the chat tool-call loop (POST /api/v1/workspaces/{w}/agents/{a}/conversations/{c}/messages[/stream]) | One row per tool call (built-in, native, Astra, or remote-MCP). Includes the toolName, the source (builtin/native/astra/mcp), and the mcpServerId for mcp-source calls. outcome: "denied" (with reason) when the tool is not on the agent's allow-list or an mcp:-source call's caller lacks the tools:invoke scope (reason: "missing tools:invoke scope"); outcome: "failure" (with reason) when the tool errors or times out; outcome: "success" otherwise. Argument payloads are not logged (secrets could be in args). |
| auth.api_denied | 401/403 auth decisions on /api/v1/* | outcome: "denied" with reason; unauthenticated 401s have subject: null, while scoped 403s include the resolved subject when available. A 403 from a scope gate (assertScope/requireScope) also carries a structured requiredScope (e.g. write:ingest) so denials aggregate by scope; workspace-membership 403s and 401s omit it. |
| auth.bootstrap_use | Any request authenticated with the bootstrap operator token | Includes scheme: "bootstrap". The plaintext bootstrap token is never logged. |
| auth.csrf_rejected | A state-changing request to a cookie-protected route was rejected by the Origin/Referer check | outcome: "failure" with reason ∈ { "no allowed origin available", "missing Origin and Referer on state-changing request", "origin mismatch (got <claimed>)" }. The HTTP response is 403 forbidden_origin. Bearer-token requests bypass the check and never emit this event. |
| auth.login | OIDC /auth/callback | outcome: "success" once the access token passes the runtime's own verifier; outcome: "failure" with reason on token-validation errors. |
| auth.refresh | OIDC /auth/refresh | outcome: "success" on a clean rotate. outcome: "failure" with reason ∈ { "no_refresh_token", "idp_rejected", "token_validation_failed" } covers the three failure paths (missing cookie, IdP refused the refresh_token, freshly-issued access token failed self-verification). |
| auth.logout | OIDC /auth/logout | Emitted on every cookie clear, even when no session was present. |
| auth.device.authorize | POST /auth/device/authorize | RFC 8628 device-flow proxy for the aiw CLI. outcome: "success" once the IdP returns a device_code; the audit row includes the short user_code (the human-typed code) but never the device_code itself. outcome: "failure" with reason when the IdP rejects the authorize request. |
| auth.device.token | POST /auth/device/token | RFC 8628 device-flow proxy for the aiw CLI; one row per terminal poll. outcome: "success" when the IdP issues an access token; outcome: "failure" with reason for IdP-rejected polls (expired_token, access_denied, etc.). Pending polls (authorization_pending / slow_down) are NOT audited; only the final outcome lands a row, so a long-lived device grant doesn't flood the audit log. |
| principal.create | POST /api/v1/workspaces/{w}/principals | RLAC prototype. Includes the principalId. |
| principal.update | PATCH /api/v1/workspaces/{w}/principals/{principalId} | RLAC prototype. Includes the principalId. |
| principal.delete | DELETE /api/v1/workspaces/{w}/principals/{principalId} | RLAC prototype. Includes the principalId. |
The set is intentionally small. Adding a new event is a one-line call from a route handler — see src/lib/audit.ts.
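As a rough illustration of that one-line call, the sketch below defines a stand-in audit() helper and calls it from a handler. The real signature lives in src/lib/audit.ts and may differ: the AuditEvent shape, the AuditLogger interface, and the handler name createApiKeyHandler are assumptions made for this example, with field names borrowed from the envelope shown earlier.

```typescript
// Hypothetical sketch; the real helper is in src/lib/audit.ts and may differ.
type AuditOutcome = "success" | "failure" | "denied";

// Illustrative subset of the action names from the table above.
type AuditAction = "api_key.create" | "api_key.revoke" | "workspace.create";

interface AuditSubject {
  type: "apiKey" | "oidc" | "bootstrap" | "anonymous";
  id: string | null;
  label: string | null;
}

interface AuditEvent {
  action: AuditAction;
  outcome: AuditOutcome;
  requestId: string;
  subject: AuditSubject | null;
  workspaceId?: string;
  details?: Record<string, string>;
}

// Minimal logger surface, shaped like pino's logger.info(obj, msg).
interface AuditLogger {
  info(obj: object, msg: string): void;
}

// Emits one audit line and never throws, so the request path is unaffected.
function audit(logger: AuditLogger, event: AuditEvent): void {
  try {
    logger.info({ audit: true, ...event }, `audit ${event.action} ${event.outcome}`);
  } catch {
    // Best-effort: logger errors are swallowed on purpose.
  }
}

// Hypothetical route handler: the audit call is the single added line.
function createApiKeyHandler(logger: AuditLogger, requestId: string, workspaceId: string): string {
  const keyId = "key-1"; // placeholder for the id of the newly created row
  audit(logger, {
    action: "api_key.create",
    outcome: "success",
    requestId,
    subject: { type: "oidc", id: "sub-123", label: "alice@example.com" },
    workspaceId,
    details: { keyId, label: "ci-deployer" },
  });
  return keyId;
}
```

Passing the logger in explicitly keeps the sketch testable with an in-memory stand-in; the runtime's own helper may well bind its logger differently.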
Sample envelopes
Concrete payloads from a live runtime, lightly redacted. Field order is audit, action, outcome, requestId, subject, workspaceId, details, msg — pino emits in declaration order, which makes the envelope stable enough to grep with cut/jq.
workspace.create — anonymous in dev mode
{
"level": 30,
"time": 1735603195123,
"audit": true,
"action": "workspace.create",
"outcome": "success",
"requestId": "01KQG3MCDGC3VWP07BNQWX7NPB",
"subject": {
"type": "anonymous",
"id": null,
"label": null
},
"workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
"details": { "label": "support-docs" },
"msg": "audit workspace.create success"
}
subject.type: "anonymous" is normal in development (default auth.mode: disabled). In production, subject.type will be "apiKey" or "oidc"; the auth deployment guard refuses to start with anonymous access on a non-memory control plane.
auth.login — failed JWT validation
{
"level": 30,
"time": 1735603612877,
"audit": true,
"action": "auth.login",
"outcome": "failure",
"requestId": "01KQG3PV9ZH7T82R4KAE8WBN3X",
"subject": {
"type": "anonymous",
"id": null,
"label": null
},
"details": { "scheme": "oidc", "reason": "audience_mismatch" },
"msg": "audit auth.login failure"
}
workspaceId is absent because the request never resolves a workspace before the auth middleware rejects it. details.reason is one of the verifier's terminal error codes (audience_mismatch, signature_invalid, token_expired, issuer_mismatch, malformed); see src/auth/oidc/verifier.ts.
api_key.create — authenticated by an OIDC subject
{
"level": 30,
"time": 1735603889104,
"audit": true,
"action": "api_key.create",
"outcome": "success",
"requestId": "01KQG3QRVF20YGD6MTFB8KKCN5",
"subject": {
"type": "oidc",
"id": "auth0|7c2d4f12",
"label": "alice@example.com"
},
"workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
"details": { "keyId": "3a4977c8-3e01-4fd0-9b02-2e082950bd40", "label": "ci-deployer" },
"msg": "audit api_key.create success"
}
The plaintext token (wb_live_…) is only in the HTTP response body, never the audit log. details.keyId is the row id; label is the operator-supplied tag.
Seed-failure events (non-route)
Workspace creation tries to seed default agents, LLM services, chunking services, and embedding services. Per-row failures emit audit: true error lines (not routed through audit() because they don't cleanly fit <resource>.<verb>):
{
"level": 50,
"time": 1735603195310,
"audit": true,
"workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
"serviceName": "openai-text-embedding-3-small",
"err": { "type": "ControlPlaneConflictError", "message": "..." },
"msg": "failed to seed default embedding service"
}
When every seed of a kind fails (systemic: DB outage, broken config), an aggregate line follows with expected: <count> so monitoring can alert on "workspace shipped with no embedders" rather than counting individual failures.
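Because seed-failure lines carry audit: true but no action or outcome, a consumer that expects the full envelope has to treat them separately. A minimal sketch of that split, assuming each log line has already been parsed from JSON; the function name classifyLine and the choice to key on the missing action field are assumptions for illustration, not part of the runtime.

```typescript
type ParsedLine = Record<string, unknown>;

type LineKind = "audit_event" | "seed_failure" | "other";

// Separates route-level audit envelopes from seed-failure error lines.
function classifyLine(line: ParsedLine): LineKind {
  // Anything without the discriminator is ordinary application logging.
  if (line.audit !== true) return "other";
  // Envelopes emitted through audit() always name an action.
  if (typeof line.action === "string") return "audit_event";
  // pino's numeric level for "error" is 50; seed failures are logged at that level.
  if (line.level === 50) return "seed_failure";
  return "other";
}
```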
Design rules
The audit module enforces a few rules so events stay safe to ship to external systems:
- No secret material. The details field is typed and only accepts a known set of identifier fields (for example keyId, knowledgeBaseId, scheme, reason, label). Plaintext tokens, refresh tokens, hashes, OAuth codes, and PII are not part of the contract and have no path into the envelope.
- Stable action names. <resource>.<verb> in snake_case. We never rename in place; adding a new action and keeping the old one for a release is the migration path.
- Outcome is always set. It is one of success | failure | denied, so SIEM rules can alert on bursts of denied without parsing status codes.
- Best-effort. Audit logging must never break the request path. Logger errors are swallowed inside audit().
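To make the "bursts of denied" rule concrete, here is a small sketch of the kind of check a sink-side rule might run over parsed envelopes. It is not part of the runtime; the function name hasDeniedBurst, the window size, and the threshold are all assumptions chosen for the example. It relies only on the time (epoch milliseconds) and outcome fields shown in the sample envelopes.

```typescript
interface OutcomeEvent {
  time: number; // epoch milliseconds, as in the envelope's "time" field
  action: string;
  outcome: "success" | "failure" | "denied";
}

// Returns true if at least `threshold` denied events occur within
// any span of `windowMs` milliseconds.
function hasDeniedBurst(events: OutcomeEvent[], windowMs: number, threshold: number): boolean {
  const times = events
    .filter((e) => e.outcome === "denied")
    .map((e) => e.time)
    .sort((a, b) => a - b);

  // Sliding window over the sorted timestamps.
  let start = 0;
  for (let end = 0; end < times.length; end++) {
    while (times[end] - times[start] > windowMs) start++;
    if (end - start + 1 >= threshold) return true;
  }
  return false;
}
```

Because outcome is always present, the check never has to infer a denial from an HTTP status code or from the message text.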
Operating it
- Single-replica. The default pino transport writes to stdout. Pipe the container's stdout into your log pipeline and filter on audit: true.
- Multi-replica. Each replica writes its own events; correlate by requestId (already echoed in every audit envelope) and by the Strict-Transport-Security / replicaId markers documented in production.md.
- Retention. The runtime does not retain audit events itself. Choose a retention period that satisfies your compliance posture and configure it on the sink.
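The filter-and-correlate steps above can be sketched as two small functions over newline-delimited JSON, which is what pino writes to stdout by default. Both function names are assumptions for this example, and a real deployment would more likely do this in the log pipeline itself than in application code.

```typescript
interface AuditLine {
  audit: true;
  requestId?: string;
  [key: string]: unknown;
}

// Parses newline-delimited JSON and keeps only lines whose audit field is exactly true.
function filterAuditLines(ndjson: string): AuditLine[] {
  const out: AuditLine[] = [];
  for (const raw of ndjson.split("\n")) {
    const text = raw.trim();
    if (text === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(text);
    } catch {
      continue; // skip anything on stdout that is not JSON
    }
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as Record<string, unknown>).audit === true
    ) {
      out.push(parsed as AuditLine);
    }
  }
  return out;
}

// Groups audit lines, from any number of replicas, by requestId.
// Lines without a requestId (such as seed-failure lines) are left out.
function groupByRequestId(lines: AuditLine[]): Map<string, AuditLine[]> {
  const groups = new Map<string, AuditLine[]>();
  for (const line of lines) {
    if (typeof line.requestId !== "string") continue;
    const bucket = groups.get(line.requestId) ?? [];
    bucket.push(line);
    groups.set(line.requestId, bucket);
  }
  return groups;
}
```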
What's not yet logged
These are tracked as gaps:
- Rate-limit denials. They are visible from the limiter's existing log lines but are not audit events yet.
- Document and chunk mutation. Volume-sensitive; needs a sampling / batching strategy first.
When a new event lands, add it to the What gets logged table and the AuditAction union in src/lib/audit.ts — the audit-doc-drift test will fail otherwise.
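The audit-doc-drift test itself is not shown here. As one way such a check might work, the sketch below compares a list of action names against the first column of the documentation table and reports any that are missing. The function name, the regular expression, and the idea of passing the actions in as an array are assumptions for illustration only.

```typescript
// Returns the actions declared in code that do not appear in the doc table.
function findUndocumentedActions(actions: readonly string[], docMarkdown: string): string[] {
  const documented = new Set<string>();
  for (const row of docMarkdown.split("\n")) {
    // Take the first table cell, with or without a leading pipe,
    // and strip any inline-code backticks around the action name.
    const firstCell = row
      .replace(/^\s*\|/, "")
      .split("|")[0]
      .replace(/`/g, "")
      .trim();
    // Action names look like resource.verb or resource.sub.verb in snake_case.
    if (/^[a-z_]+(\.[a-z_]+)+$/.test(firstCell)) {
      documented.add(firstCell);
    }
  }
  return actions.filter((action) => !documented.has(action));
}
```

A drift test built on this would fail whenever the returned array is non-empty, which forces the table and the union to be updated together.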
Resolved findings
The following audit gaps have been addressed in the commits listed:
| Finding | Status | Resolution |
|---|---|---|
| Unbounded agent systemPrompt / userPrompt fields | Resolved in #147 (8dbf741) | Capped at 128 KB; name at 200 chars, description at 2 KB. |
| Monolithic body-size cap (50 MB) on all /api/v1/workspaces/* | Resolved in #147 (8dbf741) | Split: 10 MB default, 50 MB only on explicit .../knowledge-bases/*/ingest. |
| Sequential Promise.all on multi-KB agent tools | Resolved in #147 (8dbf741) | Parallelized with Promise.allSettled in chat/tools/registry.ts. |
| SSRF surface on service endpoint URLs (RFC1918, loopback, cloud metadata) | Resolved in #147 (8dbf741) | Layered validation in openapi/schemas.ts + root.ts; runtime.blockPrivateNetworkEndpoints config flag. |
PRs welcome.