diff --git a/docs/blog/2026-04-21-audit-chain-verification/index.md b/docs/blog/2026-04-21-audit-chain-verification/index.md new file mode 100644 index 00000000..9f5a7540 --- /dev/null +++ b/docs/blog/2026-04-21-audit-chain-verification/index.md @@ -0,0 +1,129 @@ +--- +title: "How Molecule AI's Audit Ledger Works: HMAC Chains and the Fix That Made It Production-Ready" +date: 2026-04-21 +slug: audit-chain-verification +description: "Every agent decision logged, chained with HMAC-SHA256, and verified tamper-evident. Here's the architecture behind Molecule AI's audit trail — and the panic bug fix that shipped in PR #1339." +tags: [security, audit, HMAC, enterprise, compliance] +--- + +# How Molecule AI's Audit Ledger Works: HMAC Chains and the Fix That Made It Production-Ready + +Every time an agent in your Molecule AI org does something — delegates a task, calls a tool, reads a secret, or makes an external API call — that event is written to an append-only audit log. That log is chained with HMAC-SHA256 so that any tampering with past entries is detectable, provable, and logged. + +This post explains how that system works and what changed in PR #1339. + +--- + +## The problem with plain audit logs + +A standard audit log is a list of events with timestamps. It's useful for debugging, but it has a structural weakness: nothing stops someone with database access from editing past rows. A malicious actor — or a buggy cleanup script — can remove or modify entries, and the log looks perfectly fine. + +For production multi-agent systems, that matters. Your compliance team needs to know: *did that agent actually call the API it was supposed to call, or did it skip the approval step?* A plain log can't answer that with confidence. + +Molecule AI's audit ledger is built to answer that question. + +--- + +## HMAC-SHA256 chain architecture + +The audit ledger is an **append-only, chain-verified log**. Each entry contains: + +- The event data (who did what, when, what the result was) +- An HMAC-SHA256 of the current entry, signed with a server-side secret +- The HMAC of the *previous* entry embedded as part of the signing context + +This creates a chain — like a blockchain, but not distributed. Every entry's HMAC depends on the previous entry's HMAC, which depends on the one before that, and so on back to the genesis entry. + +``` +Entry 0: HMAC₀ = HMAC(genesis_payload + genesis_secret) +Entry 1: HMAC₁ = HMAC(event₁ + HMAC₀ + secret) +Entry 2: HMAC₂ = HMAC(event₂ + HMAC₁ + secret) +... +``` + +If you change *any* past entry, its HMAC changes. That breaks the chain at the next verification step. The tampered entry is detectable. + +--- + +## Verifying the chain + +`verifyAuditChain` walks the log from the beginning, recomputing each HMAC and comparing it against the stored value. If every entry verifies, the chain is intact — no tampering. + +If an entry fails to verify, the function returns `false`. Your observability stack picks this up and can alert, halt, or log the discrepancy. The audit trail isn't just a record of what happened — it's a proof that the record hasn't been altered. + +This is what compliance auditors want: not a log, but a **tamper-evident log with cryptographic guarantees**. + +--- + +## What org-scoped keys add + +Org-scoped API keys are the attribution layer on top of the integrity layer. + +Each org key carries a name, a hash, and a prefix. Every authenticated call carries that prefix in the audit row: + +``` +org-token:mole_a1b2 POST /workspaces/ws_abc123/secrets 200 3ms +``` + +Combined with the HMAC chain, you get two guarantees simultaneously: +1. **Integrity** — the audit log hasn't been tampered with (HMAC chain) +2. **Attribution** — you know exactly which named key (and therefore which integration) made each call (org API keys) + +For teams running SOC 2 or ISO 27001, this is the difference between "here's a log" and "here's a cryptographically verifiable, attributable record of everything that happened." + +--- + +## The bug PR #1339 fixed + +In Go, slicing a string beyond its length causes a panic: + +```go +// This panics if len(ev.HMAC) < 12 +log.Printf("expected: %q got: %q", ev.HMAC[:12], expected[:12]) +``` + +`verifyAuditChain` was using `[:12]` to truncate HMACs for log readability — 12 characters is enough to identify a key without printing the full hash. But if an audit row had been corrupted (a database write failure, a migration bug, manual intervention), the stored HMAC could be shorter than 12 bytes. When that row was processed, the verification pass would panic and crash. + +A tamper attempt wouldn't just fail verification — it would take down the verification process. + +**The fix (PR #1339):** add a length check before truncation. + +```go +storedPrefix := ev.HMAC +if len(storedPrefix) > 12 { + storedPrefix = storedPrefix[:12] +} +computedPrefix := expected +if len(computedPrefix) > 12 { + computedPrefix = computedPrefix[:12] +} +log.Printf("expected: %q got: %q", storedPrefix, computedPrefix) +``` + +The logic is unchanged — if the HMAC is long enough, the same 12-char prefix is logged. If it's short or missing, a shorter prefix (or none) is logged. Either way, the chain verification still runs, and mismatches still fail correctly. + +The panic is gone. The integrity guarantee holds. + +--- + +## What this means for production deployments + +If you're running Molecule AI in a production environment: + +- **The audit log is tamper-evident by construction.** You can verify the chain integrity programmatically at any point and alert on failures. +- **Org-scoped keys give you per-integration attribution.** A compromised CI key is identifiable, revocable, and its entire call history is reconstructable. +- **PR #1339 ensures the verification pass itself is hardened.** Corrupt rows — whether from a bug, a migration, or an attack — are handled gracefully, not catastrophically. + +The combination of HMAC chain + org-scoped key attribution + immediate revocation is the foundation of Molecule AI's production trust model for enterprise teams. + +--- + +## Next steps + +- [Org-scoped API keys guide](/docs/guides/org-api-keys) — mint your first named key +- [Architecture: Org API Keys](/docs/architecture/org-api-keys) — the full design +- [Platform API Reference](/docs/api-reference) — audit log endpoints + +--- + +*HMAC-SHA256 audit ledger shipped in PR #594. HMAC truncation guard shipped in PR #1339. Org-scoped API keys shipped in PRs #1105, #1107, #1109, #1110.*