The phantom-producer detector (#795) was doing UPDATE + SELECT in two
roundtrips — first incrementing consecutive_empty_runs, then re-
reading to check the stale threshold. Switch to UPDATE ... RETURNING
so the post-increment value comes back in one query.
Called once per schedule per cron tick. At 100 tenants × dozens of
schedules per tenant, the halved DB traffic on the empty-response
path is measurable, not just cosmetic.
Also now properly logs if the bump itself fails (previously it silent-
swallowed the ExecContext error and still ran the SELECT, which would
confuse debugging).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Post-merge audit flagged cp_provisioner.go as the only new file from
the canary/C1 work without test coverage. Fills the gap:
- NewCPProvisioner_RequiresOrgID — self-hosted without MOLECULE_ORG_ID
refuses to construct (avoids silent phone-home to prod CP).
- NewCPProvisioner_FallsBackToProvisionSharedSecret — the operator
ergonomics of using one env-var name on both sides of the wire.
- AuthHeader noop + happy path — bearer only set when secret is set.
- Start_HappyPath — end-to-end POST to stubbed CP, bearer forwarded,
instance_id parsed out of response.
- Start_Non201ReturnsStructuredError — when CP returns structured
{"error":"…"}, that message surfaces to the caller.
- Start_NoStructuredErrorFallsBackToSize — regression gate for the
anti-log-leak change from PR #980: raw upstream body must NOT
appear in the error, only the byte count.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paired with molecule-controlplane PR #55 (GET /cp/tenants/config). Lets
existing tenants heal themselves when we rotate or add a CP-side env
var (e.g. MOLECULE_CP_SHARED_SECRET landing earlier today) without any
ssh or re-provision.
Flow: main() calls refreshEnvFromCP() before any other os.Getenv read.
The helper reads MOLECULE_ORG_ID + ADMIN_TOKEN from the baked-in
user-data env, GETs {MOLECULE_CP_URL}/cp/tenants/config with those
credentials, and applies the returned string map via os.Setenv so
downstream code (CPProvisioner, etc.) sees the fresh values.
Best-effort semantics:
- self-hosted / no MOLECULE_ORG_ID → no-op (return nil)
- CP unreachable / non-200 → log + return error (main keeps booting)
- oversized values (>4 KiB each) rejected to avoid env pollution
- body read capped at 64 KiB
Once this image hits GHCR, the 5-minute tenant auto-updater picks it
up, the container restarts, refresh runs, and every tenant has
MOLECULE_CP_SHARED_SECRET within ~5 minutes — no operator toil.
Also fixes workspace-server/.gitignore so `server` no longer matches
the cmd/server package dir — it only ignored the compiled binary but
pattern was too broad. Anchored to `/server`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the C1 integration (PR #50 on molecule-controlplane). The CP
now requires Authorization: Bearer <PROVISION_SHARED_SECRET> on all
three /cp/workspaces/* endpoints; without this change the tenant-side
Start/Stop/IsRunning calls would all 401 (or 404 when the CP's routes
refused to mount) and every workspace provision from a SaaS tenant
would silently fail.
Reads MOLECULE_CP_SHARED_SECRET, falling back to PROVISION_SHARED_SECRET
so operators can use one env-var name on both sides of the wire. Empty
value is a no-op: self-hosted deployments with no CP or a CP that
doesn't gate /cp/workspaces/* keep working as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from the pre-launch log-scrub audit:
1. handlers/workspace_provision.go:548 logged `token[:8]` — the exact
H1 pattern that panicked on short keys. Even with a length guard,
leaking 8 chars of an auth token into centralized logs shortens the
search space for anyone who gets log-read access. Now logs only
`len(token)` as a liveness signal.
2. provisioner/cp_provisioner.go:101 fell back to logging the raw
control-plane response body when the structured {"error":"..."}
field was absent. If the CP ever echoed request headers (Authorization)
or a portion of user-data back in an error path, the bearer token
would end up in our tenant-instance logs. Now logs the byte count
only; the structured error remains in place for the happy path.
Also caps the read at 64 KiB via io.LimitReader to prevent
log-flood DoS from a compromised upstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-launch review blocker. AdminAuth's Tier-1 fail-open fired whenever
the workspace_auth_tokens table was empty — including the window between
a hosted tenant EC2 booting and the first workspace being created. In
that window, every admin-gated route (POST /org/import, POST /workspaces,
POST /bundles/import, etc.) was reachable without a bearer, letting an
attacker pre-empt the first real user by importing a hostile workspace
into a freshly provisioned instance.
Fix: fail-open is now ONLY applied when ADMIN_TOKEN is unset (self-
hosted dev with zero auth configured). Hosted SaaS always sets
ADMIN_TOKEN at provision time, so the branch never fires in prod and
requests with no bearer get 401 even before the first token is minted.
Tier-2 / Tier-3 paths unchanged.
The old TestAdminAuth_684_FailOpen_AdminTokenSet_NoGlobalTokens test
was codifying exactly this bug (asserting 200 on fresh install with
ADMIN_TOKEN set). Renamed and flipped to
TestAdminAuth_C4_AdminTokenSet_FreshInstall_FailsClosed asserting 401.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two HIGH-severity DoS surfaces: both handlers read the entire HTTP
body with io.ReadAll(r.Body) and no upper bound, so a caller streaming
a multi-gigabyte request could exhaust memory on the tenant instance
before we even validated the JSON.
H3 (Discord webhook): wrap Body in io.LimitReader with a 1 MiB cap.
Discord Interactions payloads are well under 10 KiB in practice.
H4 (workspace config PATCH): wrap Body in http.MaxBytesReader with a
256 KiB cap. Real configs are <10 KiB; jsonb handles the cap
comfortably. Returns 413 Request Entity Too Large on overflow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sub of #795 (phantom-busy post-mortem). Adds last_outbound_at TIMESTAMPTZ
column to workspaces. Bumped async on every successful outbound A2A call
from a real workspace (skip canvas + system callers). Exposed in
GET /workspaces/:id response as "last_outbound_at".
PM/Dev Lead orchestrators can now detect workspaces that have gone silent
despite being online (> 2h + active cron = phantom-busy warning).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TestCommitMemory_GlobalScope_DelimiterSpoofingEscaped: verifies [MEMORY prefix
is escaped to [_MEMORY before DB insert (SAFE-T1201, #807)
- TestCommitMemory_LocalScope_NoDelimiterEscape: LOCAL scope stored verbatim
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SAFE-T1201 (#807): Escape [MEMORY prefix in GLOBAL memory content on
write to prevent delimiter-spoofing prompt injection. Content stored
as "[_MEMORY " so it renders as text, not structure, when wrapped with
the real delimiter on read.
SAFE-T1102 (#805): Pin @molecule-ai/mcp-server@1.0.0 in .mcp.json.example.
Prevents supply-chain attacks via unpinned npx -y.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
current_task exposes live agent instructions to any caller with a
valid workspace UUID. Also strips last_sample_error and workspace_dir
from the public endpoint. These fields remain available through
authenticated workspace-specific endpoints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Windows CRLF in org-template prompt text caused empty agent responses
and phantom-producing detection. Strips \r at the handler level before
DB persist, plus a one-time migration to clean existing rows.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a schema_migrations table that records which migration files
have been applied. On boot, only new migrations execute — previously
applied ones are skipped. This eliminates:
- Re-running all 33 migrations on every restart
- Risk of non-idempotent DDL failing on restart
- Unnecessary log noise from re-applying unchanged schema
First boot auto-populates the tracking table with all existing
migrations. Subsequent boots only apply new ones.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The supply_chain.go implementation was merged in #937 but never called
from the actual install handler. Plugins with a manifest.json sha256
field now get verified before staging completes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove compiled workspace-server/server binary from git
- Fix .gitignore, .gitattributes, .githooks/pre-commit for renamed dirs
- Fix CI workflow path filters (workspace-template → workspace)
- Replace real EC2 IP and personal slug in test_saas_tenant.sh
- Scrub molecule-controlplane references in docs
- Fix stale workspace-template/ paths in provisioner, handlers, tests
- Clean tracked Python cache files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>