CI run #946 (post-#43) confirmed `driver: docker` doesn't fix the ECR
push 401 either: buildx CLI inside the runner container talks to the
operator-host docker daemon (mounted socket), but the daemon doesn't
see the runner's ECR auth state, and the runner's buildx CLI doesn't
attach the auth header in a way the daemon accepts.
Drop buildx + build-push-action entirely. Plain `docker build` +
`docker push` from the runner container works because both use the
SAME docker socket + the SAME runner-container config.json (populated
by `aws ecr get-login-password | docker login` from amazon-ecr-login).
Trade-off: lose multi-arch support. We only ship linux/amd64 tenant
images today, so this is fine. If multi-arch becomes a requirement
later, we can revisit (likely with `docker buildx create
--driver=remote` pointing at an external buildkit, but that's
substantial infra work; not worth it for a single-arch shop).
Closes#173 (fourth piece — and hopefully last; this matches the
operator-host manual approach exactly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #38 + #41 fixed the Dockerfile-side clone issue. CI run #893 then
revealed two Gitea-Actions-specific issues with the unchanged buildx
config:
1. `failed to push: 401 Unauthorized` to ECR. Root cause: default
buildx driver `docker-container` spawns a buildkit container that
doesn't share the host's `~/.docker/config.json`, so the ECR auth
set up by amazon-ecr-login doesn't reach the push. Fix: pin
`driver: docker` so buildx delegates to the host daemon, which
already has the ECR creds.
2. `dial tcp ...:41939: i/o timeout` on `_apis/artifactcache/cache`.
Root cause: `cache-from/cache-to: type=gha` is GitHub-specific;
Gitea Actions has no compatible artifact-cache backend, so every
cache lookup fails after a 30s timeout. Fix: remove the cache-*
options. Cold-build cost is <10min for 37-repo clone + Go/Node
compile, acceptable. Could revisit with type=registry inline cache
later if rebuilds get painful.
With this + #38/#41, the workflow should run end-to-end on Gitea
Actions: pre-clone -> docker build (host daemon) -> ECR push.
Closes#173 (third and final piece).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first PR (#38) only patched Dockerfile.tenant — but the workflow
also builds the platform image from workspace-server/Dockerfile, which
had the SAME in-image `git clone` stage. Build run #794 caught this:
"process clone-manifest.sh ... exit code 128" on the platform image.
Apply the same pre-clone shape to the platform Dockerfile: drop the
`templates` stage, COPY from .tenant-bundle-deps/ instead. The
workflow's existing "Pre-clone manifest deps" step (added in #38)
already populates .tenant-bundle-deps/ before either build runs, so no
workflow change needed.
Self-review note: the missed-platform-Dockerfile is a Phase 1 quality
miss — I read both files but only registered the tenant one as
in-scope. Saved memory `feedback_orchestrator_must_verify_before_declaring_fixed`
applies: should have grepped the whole workspace-server/ for "templates"
stages before claiming Task #173 done. CI run #794 caught it within
~6 minutes; net cost: one followup commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestPooledWithEICTunnel_PreservesFnErr (and any sqlmock-using neighbour
test) was at risk of inheriting stale INSERT calls from a previous
test's coalesceRestart goroutine that survived its t.Cleanup boundary.
The production callsite shape is `go h.RestartByID(...)` from
a2a_proxy.go, a2a_proxy_helpers.go and main.go. When that goroutine's
runRestartCycle panics, coalesceRestart's deferred recover swallows it
to keep the platform process alive — but in tests, nothing waits for
the goroutine to fully exit. If it's still draining LogActivity-shaped
work after the test returns, those INSERTs land in the next test's
sqlmock connection as kind=DELEGATION_FAILED /
kind=WORKSPACE_PROVISION_FAILED, surfacing as "INSERT-not-expected".
Fix: introduce drainCoalesceGoroutine(t, wsID, cycle) test helper that
spawns coalesceRestart on a goroutine (matching production) and
registers a t.Cleanup with sync.WaitGroup.Wait so the test can't
declare itself done while a goroutine is still alive.
Convert TestCoalesceRestart_PanicInCycleClearsState to use the helper
(previously it called coalesceRestart synchronously, which never
exercised the production goroutine-survival contract).
Add TestCoalesceRestart_DrainHelperWaitsForGoroutineExit as the
regression guard: cycle blocks 150ms then panics; the test asserts
t.Run elapsed >= 150ms (proving the Wait barrier engaged) AND the
deferred close ran (proving the panic-recovery defer chain executed)
AND state.running was cleared. Verified the assertion is real by
mutation-testing: removing t.Cleanup(wg.Wait) makes this test FAIL
deterministically with elapsed <300µs.
Per saved memory feedback_assert_exact_not_substring: the regression
test asserts an exact-shape contract (elapsed >= blockFor) rather than
a substring-in-output, so it discriminates between "drain works" and
"drain skipped".
Per Phase 3: 10/10 race-detector runs pass for all TestCoalesceRestart_*
tests. Full ./internal/handlers/... suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestPooledWithEICTunnel_PreservesFnErr (and any sqlmock-using neighbour
test) was at risk of inheriting stale INSERT calls from a previous
test's coalesceRestart goroutine that survived its t.Cleanup boundary.
The production callsite shape is `go h.RestartByID(...)` from
a2a_proxy.go, a2a_proxy_helpers.go and main.go. When that goroutine's
runRestartCycle panics, coalesceRestart's deferred recover swallows it
to keep the platform process alive — but in tests, nothing waits for
the goroutine to fully exit. If it's still draining LogActivity-shaped
work after the test returns, those INSERTs land in the next test's
sqlmock connection as kind=DELEGATION_FAILED /
kind=WORKSPACE_PROVISION_FAILED, surfacing as "INSERT-not-expected".
Fix: introduce drainCoalesceGoroutine(t, wsID, cycle) test helper that
spawns coalesceRestart on a goroutine (matching production) and
registers a t.Cleanup with sync.WaitGroup.Wait so the test can't
declare itself done while a goroutine is still alive.
Convert TestCoalesceRestart_PanicInCycleClearsState to use the helper
(previously it called coalesceRestart synchronously, which never
exercised the production goroutine-survival contract).
Add TestCoalesceRestart_DrainHelperWaitsForGoroutineExit as the
regression guard: cycle blocks 150ms then panics; the test asserts
t.Run elapsed >= 150ms (proving the Wait barrier engaged) AND the
deferred close ran (proving the panic-recovery defer chain executed)
AND state.running was cleared. Verified the assertion is real by
mutation-testing: removing t.Cleanup(wg.Wait) makes this test FAIL
deterministically with elapsed <300µs.
Per saved memory feedback_assert_exact_not_substring: the regression
test asserts an exact-shape contract (elapsed >= blockFor) rather than
a substring-in-output, so it discriminates between "drain works" and
"drain skipped".
Per Phase 3: 10/10 race-detector runs pass for all TestCoalesceRestart_*
tests. Full ./internal/handlers/... suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
publish-workspace-server-image.yml could not run on Gitea Actions because
Dockerfile.tenant's stage 3 ran `git clone` against private Gitea repos
from inside the Docker build context, where no auth path exists. Every
workspace-server rebuild required a manual operator-host push.
Move cloning to the trusted CI context (where AUTO_SYNC_TOKEN — the
devops-engineer persona PAT — is naturally available). Dockerfile.tenant
now COPYs from .tenant-bundle-deps/, populated by the workflow's new
"Pre-clone manifest deps" step. The Gitea token never enters the image.
- scripts/clone-manifest.sh: optional MOLECULE_GITEA_TOKEN env embeds
basic-auth in the clone URL; redacted in log output. Anonymous fallback
preserved for future public-repo path.
- .github/workflows/publish-workspace-server-image.yml: new pre-clone
step before docker build; injects AUTO_SYNC_TOKEN. Fail-fast if the
secret is empty.
- workspace-server/Dockerfile.tenant: drop stage 3 (templates), COPY
from .tenant-bundle-deps/ instead. Header documents the prereq.
- .gitignore: ignore /.tenant-bundle-deps/ so a local build can't
accidentally commit cloned repos.
Verified locally: clone-manifest.sh with the devops-engineer persona
token cloned all 37 repos (9 ws + 7 org + 21 plugins, 4.9MB after
.git strip).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same SSOT-divergence shape as #10 / fixed in #12, but on the a2a-proxy
code path. The plugin handler was routed through `provisioner.RunningContainerName`;
a2a-proxy was forwarding optimistically and only catching missing containers
REACTIVELY via `maybeMarkContainerDead` after the network call timed out.
Result on tenants whose agent containers had been recycled (e.g. post-EC2
replace from molecule-controlplane#20): canvas waits 2-30s for the network
forward to fail before getting a 503, and the workspace-server logs only
"ProxyA2A forward error" without the "container is dead" signal.
This PR adds a proactive `Provisioner.IsRunning` check in `proxyA2ARequest`
between `resolveAgentURL` and `dispatchA2A`, gated on the conditions where
we know we're talking to a sibling Docker container we own (`h.provisioner
!= nil` AND `platformInDocker` AND the URL was rewritten to Docker-DNS form).
Three outcomes via the SSOT helper:
(true, nil) → forward as today
(false, nil) → fast-503 with `error="workspace container not running —
restart triggered"`, `restarting=true`, `preflight=true`,
plus the same offline-flip + WORKSPACE_OFFLINE broadcast +
async restart that `maybeMarkContainerDead` produces
(true, err) → fall through to optimistic forward (matches IsRunning's
"fail-soft as alive" contract — flaky daemon must not
trigger a restart cascade)
The `preflight=true` flag in the response distinguishes the proactive
short-circuit from the reactive `maybeMarkContainerDead` path so canvas
or downstream callers can render distinct messages later.
* `internal/handlers/a2a_proxy.go` — preflight call site between
resolveAgentURL and dispatchA2A; gated on `h.provisioner != nil &&
platformInDocker && url == http://<ContainerName(id)>:port`.
* `internal/handlers/a2a_proxy_helpers.go` — `preflightContainerHealth`
helper. Routes through `h.provisioner.IsRunning` (which itself wraps
`RunningContainerName`). Identical offline-flip side-effects as
`maybeMarkContainerDead` for the dead-container case.
* `internal/handlers/a2a_proxy_preflight_test.go` — 4 tests: running →
nil; not-running → structured 503 + sqlmock expectations on the
offline-flip + structure_events insert; transient error → nil
(fail-soft); AST gate pinning the SSOT routing (mirror of #12's gate).
Mutation-tested: removing the `if running { return nil }` guard makes
the production code fail to compile (unused var). A subtler mutation
(replacing the !running branch with `return nil`) would make
TestPreflight_ContainerNotRunning_StructuredFastFail fail at runtime
with sqlmock's "expected DB call did not occur."
Refs: molecule-core#36. Companion to #12 (issue #10).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a 'mock' runtime: virtual workspaces with no container, no EC2,
no LLM. Every A2A reply is synthesised from a small canned-variant
pool ('On it!', 'Got it, on it now.', etc.) deterministically seeded
by (workspace_id, request_id).
Built for funding-demo "200-workspace mock org" — renders an
enterprise-scale org chart on the canvas (CEO/VPs/Managers/ICs)
without burning real LLM credits or provisioning 200 EC2 instances.
Surfaces:
- workspace-server/internal/handlers/mock_runtime.go: A2A proxy
short-circuit, canned-reply pool, deterministic variant pick.
- workspace-server/internal/handlers/a2a_proxy.go: gate the
short-circuit before resolveAgentURL (mock has no URL).
- workspace-server/internal/handlers/org_import.go: skip Docker
provisioning for mock workspaces, set status='online' directly,
drop the per-sibling 2s pacing for mock children (collapses
a 200-workspace import from ~7min → ~1s).
- workspace-server/internal/handlers/runtime_registry.go: register
'mock' in the runtime allowlist (manifest + fallback set).
- workspace-server/internal/registry/healthsweep.go +
orphan_sweeper.go: skip mock workspaces in container-health and
stale-token sweeps (no container by design).
- workspace-server/internal/handlers/workspace_restart.go: mirror
the 'external' Restart no-op for mock.
- manifest.json: register the new
Molecule-AI/molecule-ai-org-template-mock-bigorg repo.
Tests: 5 new in mock_runtime_test.go covering happy-path, non-mock
regression guard, determinism, IsMockRuntime trim/case, JSON-RPC
id echo. All existing handler + registry tests still pass.
Local-verified: imported the 200-workspace template against a fresh
postgres+redis, confirmed all 200 land in 'online' and stay there
through the 30s health-sweep window, exercised A2A on CEO + VPs +
Managers + ICs and saw the variant pool rotate.
Org template lives at
Molecule-AI/molecule-ai-org-template-mock-bigorg (created today)
and is imported via the existing /org/import flow on the canvas
Template Palette.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Funding-demo Mock #1: when the canvas loads with `?purchase_success=1`,
show a centred success modal in the warm-paper theme. Auto-dismisses
after 5s; Close button + Esc + backdrop click also dismiss; URL params
are stripped on first paint so a refresh after dismiss does not
re-trigger.
Mounted in `app/layout.tsx` (not `app/page.tsx`) so the modal persists
across the canvas page-state transitions (loading → hydrated → error)
without unmounting and losing its open-state.
No real billing logic — the marketplace "Purchase" button on the
landing page redirects here with the flag; this modal is the only
thing the user sees of the "transaction".
Local-verified end-to-end via playwright (5/5 tests pass): redirect
URL shape, modal visibility, URL cleanup, close button, refresh-after-
dismiss behaviour, 5s auto-dismiss.
Pairs with the Purchase button added to landingpage Marketplace
section.
scripts/clone-manifest.sh runs inside the platform Dockerfile build,
so a change to that script needs to retrigger publish. Without it,
the prior fix (clone via Gitea + lowercase org) didn't trigger this
workflow because scripts/ wasn't in the path filter.
Also serves as the file change to satisfy the path filter for THIS
push, retriggering publish-workspace-server-image now.
Post-2026-05-06 GitHub-org suspension: scripts/clone-manifest.sh
was still pointing at https://github.com/${repo}.git, so the
Docker build for workspace-server'\''s platform image fails at:
fatal: could not read Username for 'https://github.com':
No such device or address
with no credentials available in the build container.
Fix: clone from https://git.moleculesai.app/${repo}.git instead.
manifest.json'\''s repo paths still read 'Molecule-AI/...' (the
historic GitHub slug, mixed-case); Gitea lowercases the org
component to 'molecule-ai/...'. Lowercase the org segment on
the fly with awk so we don'\''t need to rewrite every manifest
entry.
Local verify: bash -n passes, lowercase transform produces correct
Gitea paths, anonymous git clone of one of the manifest plugins
over HTTPS to git.moleculesai.app succeeds.
Class G in the prod-ship CI sweep — same shape as the github.com
ref Harness Replays hits, this is the second instance found.
Two coupled cleanups for the post-2026-05-06 stack:
============================================
The plugin injected GITHUB_TOKEN/GH_TOKEN via the App's
installation-access flow (~hourly rotation). Per-agent Gitea
identities replaced this approach after the 2026-05-06 suspension —
workspaces now provision with a per-persona Gitea PAT from .env
instead of an App-rotated token. The plugin code itself lived on
github.com/Molecule-AI/molecule-ai-plugin-github-app-auth which is
also unreachable post-suspension; checking it out at CI build time
was already failing.
Removed:
- workspace-server/cmd/server/main.go: githubappauth import + the
`if os.Getenv("GITHUB_APP_ID") != ""` block that called
BuildRegistry. gh-identity remains as the active mutator.
- workspace-server/Dockerfile + Dockerfile.tenant: COPY of the
sibling repo + the `replace github.com/Molecule-AI/molecule-ai-
plugin-github-app-auth => /plugin` directive injection.
- workspace-server/go.mod + go.sum: github-app-auth dep entry
(cleaned up by `go mod tidy`).
- 3 workflows: actions/checkout steps for the sibling plugin repo:
- .github/workflows/codeql.yml (Go matrix path)
- .github/workflows/harness-replays.yml
- .github/workflows/publish-workspace-server-image.yml
Verified `go build ./cmd/server` + `go vet ./...` pass post-removal.
=======================================================
Same workflow used to push to ghcr.io/molecule-ai/platform +
platform-tenant. ghcr.io/molecule-ai is gone post-suspension. The
operator's ECR org (153263036946.dkr.ecr.us-east-2.amazonaws.com/
molecule-ai/) already hosts platform-tenant + workspace-template-*
+ runner-base images and is the post-suspension SSOT for container
images. This PR aligns publish-workspace-server-image with that
stack.
- env.IMAGE_NAME + env.TENANT_IMAGE_NAME repointed to ECR URL.
- docker/login-action swapped for aws-actions/configure-aws-
credentials@v4 + aws-actions/amazon-ecr-login@v2 chain (the
standard ECR auth pattern; uses AWS_ACCESS_KEY_ID/SECRET secrets
bound to the molecule-cp IAM user).
The :staging-<sha> + :staging-latest tag policy is unchanged —
staging-CP's TENANT_IMAGE pin still points at :staging-latest, just
with the new registry prefix.
Refs molecule-core#157, #161; parallel to org-wide CI-green sweep.
Gitea is case-sensitive on owner slugs; canonical is lowercase
`molecule-ai/...`. Mixed-case `Molecule-AI/...` refs fail-at-0s
when the runner tries to resolve the cross-repo workflow / checkout.
Same fix as molecule-controlplane#12. Mechanical case-correction;
no behavior change beyond making CI resolve again.
Refs: internal#46
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled fixes for molecule-core#10 (plugin install 503 vs
status=online split-state):
1. SSOT for "is this workspace's container running" — `findRunningContainer`
in plugins.go used to carry its own copy of `cli.ContainerInspect`, which
collapsed transient daemon errors into the same `""` return as a
genuinely-stopped container. Healthsweep's `Provisioner.IsRunning`
handled the same input correctly (defensive). Promote the inspect logic
to `provisioner.RunningContainerName`, route both consumers through it.
Transient errors get a distinct log line on the plugins side so triage
doesn't confuse a flaky daemon with a stopped container.
2. Runtime-aware Install/Uninstall — `runtime='external'` workspaces have
no local container; push-install via docker exec is meaningless. They
pull plugins via the download endpoint instead (Phase 30.3). Without a
guard they fell through to `findRunningContainer` and 503'd with a
misleading "container not running." Add an early 422 with a hint
pointing at the download endpoint.
The two fixes are independent: (1) preserves correctness when the SSOT
helper is later modified; (2) eliminates the persistent split-state on
the 5 external persona-agent workspaces in this DB (and on tenant
deployments hitting the same shape).
* `internal/provisioner/provisioner.go` — new `RunningContainerName(ctx,
cli, id) (string, error)` with three documented outcomes (running /
stopped / transient). `Provisioner.IsRunning` now wraps it; behavior
preserved.
* `internal/handlers/plugins.go` — `findRunningContainer` shimmed onto
`RunningContainerName`; new `isExternalRuntime(id)` predicate.
* `internal/handlers/plugins_install.go` — Install + Uninstall reject
external runtimes with 422 + hint, before the source-fetch step.
* `internal/handlers/plugins_install_external_test.go` — 5 cases:
external→422, uninstall-external→422, container-backed-falls-through,
no-runtime-lookup-fails-open, lookup-error-fails-open.
* `internal/handlers/plugins_findrunning_ssot_test.go` — two AST gates
pin the SSOT routing so future PRs can't silently re-introduce the
parallel impl. Mutation-tested: reverting either consumer to a direct
`ContainerInspect` makes the gate fail.
Refs: molecule-core#10
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In dev mode (`MOLECULE_ENV=dev|development`, `ADMIN_TOKEN` unset) the
AdminAuth chain fails open by design so canvas at :3000 can call
workspace-server at :8080 without a bearer token. Combined with the
existing wildcard bind on `:8080`, that exposed unauthenticated
`POST /workspaces` to any same-LAN peer (S-8 in the audit RFC v1).
Couple the bind narrowness to the same signal that drives the auth
fail-open: when `middleware.IsDevModeFailOpen()` returns true, default
the listener to `127.0.0.1`. Production (`ADMIN_TOKEN` set) keeps
binding to all interfaces — its auth chain is doing the work. Operators
who need LAN exposure set `BIND_ADDR=<host>` explicitly.
* `cmd/server/main.go` — `resolveBindHost()` precedence: BIND_ADDR
explicit > IsDevModeFailOpen() loopback > "" (all interfaces).
Startup log line now includes the resolved bind + dev-mode-fail-open
state for post-deploy auditing.
* `cmd/server/bind_test.go` — 8 t.Setenv table cases covering
precedence, explicit overrides, dev/prod env words. Mutation-tested:
removing the `IsDevModeFailOpen()` branch makes the dev-mode cases
fail with "" vs "127.0.0.1".
Refs: molecule-core#7
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The five `mock.ExpectQuery(\`SELECT id FROM workspaces\`)` sites used a
loose substring regex that silent-passed three regression shapes #2872
called out:
1. `WHERE parent_id = $2` (drops `IS NOT DISTINCT FROM` — breaks
NULL-parent root matching)
2. `WHERE name = $1` only (drops parent_id check entirely — hijacks
siblings of the same name across different parents)
3. Drops `AND status != 'removed'` (blocks re-import after Collapse)
Extracts a `lookupChildSQLRE` const that anchors all four load-bearing
tokens (the SELECT/FROM, the name predicate, the IS NOT DISTINCT FROM
predicate, and the status filter). All five ExpectQuery sites now use
the same const so a future schema/predicate change fails one place.
Mutation-tested per memory feedback_assert_exact_not_substring.md:
- Replacing `IS NOT DISTINCT FROM` with `=` fails
TestLookupExistingChild_NilParent_MatchesRoot.
- Dropping `AND status != 'removed'` fails
TestLookupExistingChild_Found_ReturnsIDAndTrue.
Note: #2872 PR-A (AST gate strengthening) is already addressed inline —
findWorkspacesInsertSQL + TestCreateWorkspaceTree_InsertUsesOnConflictDoNothing
pin the ON CONFLICT DO NOTHING shape, which is a strictly stronger
gate than the original lookup-before-insert ordering check.
Allows MOLECULE_IMAGE_REGISTRY env override on the tenant workspace-server. Used to flip from ghcr.io/molecule-ai → private ECR mirror after the GitHub org suspension on 2026-05-06. Default unchanged for OSS users.
Closes#6.
Add MOLECULE_IMAGE_REGISTRY env var to override the registry prefix used
by all workspace-template image references. Defaults to ghcr.io/molecule-ai
(unchanged for OSS users); set to an ECR URI in production tenants when
mirroring to AWS.
Why this matters: GitHub suspended the Molecule-AI org on 2026-05-06 with
no warning. Production tenants kept running because they had images cached
locally, but any tenant restart (AWS health event, redeploy, OS reboot)
would have failed at `docker pull ghcr.io/molecule-ai/...` because GHCR
returned 401. This change introduces the seam needed to point new pulls at
a registry we control (AWS ECR) by flipping a single env var on Railway.
Design (RFC: molecule-ai/internal#6):
- New `RegistryPrefix()` function in `provisioner/registry.go` reads
MOLECULE_IMAGE_REGISTRY, falls back to "ghcr.io/molecule-ai".
- New `RuntimeImage(runtime)` returns the canonical ref using the prefix.
- `RuntimeImages` map computed at init via `computeRuntimeImages()` so
existing callers that range over it still work.
- `DefaultImage` likewise computed via `RuntimeImage(defaultRuntime)`.
- `handlers.TemplateImageRef()` switched from hardcoded format string to
`provisioner.RegistryPrefix()`.
- `runtime_image_pin.go::resolveRuntimeImage()` automatically inherits
the prefix change because it reads from `provisioner.RuntimeImages[]`
and only re-formats the tag suffix to a digest pin.
Alternatives rejected (see RFC):
- Multi-registry fallback chain (try ECR, fall back to GHCR): GHCR is
locked from outbound for our org, so the fallback never works for us.
Adds code complexity for no benefit.
- Hardcoded ECR-only switch: couples production code to a specific
deployment environment. OSS users self-hosting Molecule would need
the upstream GHCR.
- Self-hosted Harbor / registry-on-Hetzner: adds a component to operate.
Not justified at 3-tenant scale; AWS ECR is mature and IAM-integrated.
Auth — deliberately NOT changed in this commit:
- For GHCR, the existing `ghcrAuthHeader()` reads GHCR_USER/GHCR_TOKEN.
- For ECR, EC2 user-data installs `amazon-ecr-credential-helper` and adds
a `credHelpers` entry in `~/.docker/config.json` so the daemon resolves
ECR credentials via the EC2 instance role on every pull. The Go code
needs no auth change. This keeps the diff minimal.
Backwards compatibility:
- Additive: env unset → identical behavior to today (GHCR).
- Existing tests reference literal `ghcr.io/molecule-ai/...` strings;
they continue to pass under the default prefix.
- `RuntimeImages` map preserved for callers that iterate it.
- No interface, schema, API, or migration version bump needed.
Security review:
- No untrusted input: MOLECULE_IMAGE_REGISTRY is set at deploy time
(Railway env, EC2 user-data), not by users.
- No expanded data collection or logging changes.
- No new permissions: ECR pull permission is a future user-data + IAM
role change, separate from this code change.
- Worst-case: an attacker who already compromises Railway can swap the
registry prefix to a malicious URI — same blast radius as compromising
Railway today, no expansion.
Tests:
- 9 new unit tests in `registry_test.go` covering: default fallback,
env override, empty env, all 9 known runtimes, unknown runtime,
override-applies-to-all, computeRuntimeImages map population, env
reflection, alphabetical ordering pin.
- All existing provisioner + handlers tests continue to pass.
- Mutation-tested mentally: deleting `if v := os.Getenv(...)` makes
TestRegistryPrefix_RespectsEnv fail. Deleting `for _, r := range
knownRuntimes` makes TestRuntimeImage_AllKnownRuntimes fail. The test
suite would catch a regression of the original failure mode.
Rollout plan: this PR is safe to merge with no env change. Production
cutover happens by setting MOLECULE_IMAGE_REGISTRY on Railway after
the AWS ECR mirror is populated (separate ops change, tracked in
issue #6 phases 3b–3f).
Tracking:
- RFC: molecule-ai/internal#6
- Tasks: #97 (ECR setup), #98 (CP fallback)
- Tech debt: runbooks/hetzner-rollout-tech-debt-2026-05-06.md item 7
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#3026. Final piece of RFC #2945.
## What's new
New package internal/messagestore/ holds:
- MessageStore interface — single read-side contract operators
implement to plug in alternative chat-history backends.
- ChatMessage / ChatAttachment / ListOptions types — canonical data
shapes returned by any impl, mirrors canvas's TS ChatMessage.
- PostgresMessageStore — platform-default impl wrapping the
activity_logs query + A2A-envelope parser ported in PR-C.
Behavior is byte-identical to the pre-PR-D handler.
## What moves
The activity_logs query, the parser (activityRowToChatMessages,
extractRequestText, extractChatResponseText, extractFilesFromTask,
etc.), and the internal-self-message predicate all migrate from
internal/handlers/chat_history.go into the new package. handlers/
chat_history.go becomes a thin HTTP-shape adapter:
parse query params → store.List(ctx, workspaceID, opts) → emit JSON
Compile-time interface assertion in postgres_store.go catches future
drift if the interface evolves and the impl falls behind.
## Why this PR
OSS operators wanting to:
- Tier hot/warm/cold storage (recent in Postgres, archival in S3)
- Use a vector store with hybrid search (Pinecone, Weaviate)
- Run an in-memory store for ephemeral test environments
- Federate history across regions
…had no extension point — they'd have to fork the handler. This PR
makes that a constructor swap at router.go.
## Tests
Parser-level (22 tests, MOVED to internal/messagestore/postgres_
store_test.go): every TS test case in
canvas/src/components/tabs/chat/__tests__/historyHydration.test.ts
has a Go counterpart. Timestamp preservation, user/agent extraction,
internal-self filter, role decision (status=error vs agent-error
prefix), v0/v1 file shapes, malformed JSON resilience.
Handler-level (9 NEW tests in internal/handlers/chat_history_test.go):
thin adapter coverage using a fake MessageStore. UUID validation,
before_ts RFC3339 validation, default limit, max-limit clamp,
invalid-limit fallback, before_ts passthrough, empty-array (not
null) JSON shape, attachment shape preservation, store-error → 502
mapping.
Compile-time interface conformance: PostgresMessageStore satisfies
MessageStore, fakeStore (test fake) satisfies MessageStore.
Mutation-tested. Removed UUID validation in the handler; confirmed
TestChatHistoryHandler_RejectsNonUUIDWorkspaceID fires red (status
200 instead of 400, non-UUID reaches the store). Restored, all
green.
Full handlers + messagestore + router test runs green; full repo
go test ./... green.
## SSOT decision
ChatMessage / ChatAttachment / parser / DB query all live in
internal/messagestore/ ONLY. handlers/chat_history.go imports the
package and uses the types via messagestore.ChatMessage etc. — no
re-declaration anywhere.
## Three weakest spots (hostile-reviewer self-pass)
1. The internal-self prefix list (Delegation results are ready...) is
a package var in messagestore/postgres_store.go. A future impl
that wants to override the predicate must reach into the package
to use IsInternalSelfMessage or define its own. Acceptable: the
predicate is part of the contract; if an impl wants different
semantics it owns that decision explicitly.
2. ListOptions has Limit + BeforeTS + HasBefore; future paging needs
(after_ts, peer_id filter, role filter) require additive struct
field additions, which is a soft API break for any impl that
handles ListOptions positionally. Mitigated by Go's struct-literal
convention (named fields by default); also flagged in the
interface comment for impl authors.
3. The handler does NOT log when a store returns an error — it just
maps to 502. An impl that wants to surface its error class up the
stack can't, today. If/when an impl needs that, the interface can
add a typed-error contract in a follow-up. Today's coverage is
sufficient: most ops issues land in the store impl's own logs.
## Security review
- Untrusted input? Same as PR-C — agent-emitted JSON parsed
defensively. New fakeStore in tests can't reach production.
- Trust boundary? Same. Interface lives BEHIND wsAuth; impls only
see workspace IDs already authenticated.
- Auth/authz? Inherited from handler; the interface doesn't
authenticate.
- PII / secrets in logs? Documented in the interface contract:
impls MUST NOT log full message bodies / attachment URIs. The
Postgres impl logs nothing on the happy path.
- Output sanitization? Same plain-text + opaque-URI surface as
PR-C. Canvas validates attachment-URI schemes.
No security-relevant changes beyond what /chat-history already
exposes via PR-C. Considered, not skipped.
## Versioning / backwards compat
- New internal package. Zero public API change.
- Single caller site in router.go updated (one-line constructor
change). NewChatHistoryHandler() → NewChatHistoryHandler(store).
- No schema change, no migration.
- Existing /chat-history endpoint unchanged on the wire — clients
don't notice the refactor.
## Phasing
This is the final RFC #2945 piece. Follow-ups parked:
- PR-C-2 (canvas migration): swap canvas loadMessagesFromDB to call
/chat-history instead of /activity. Independent of this PR;
blocked only by canvas team's calendar.
- Sample alternative impls (S3, in-memory) for OSS docs: separate
PR when the first OSS consumer materializes; demonstration code
untested against a real workload is anti-pattern.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Closes the SSOT gap for chat-history hydration: today every consumer
(canvas TS) re-implements an A2A-envelope walk to map activity_logs
rows into rendered ChatMessage objects. This PR moves that walk into
the server.
## What's added
GET /workspaces/:id/chat-history?limit=N&before_ts=T
Returns:
{
"messages": [
{"id": "<uuid>", "role": "user"|"agent"|"system",
"content": "...", "attachments": [...], "timestamp": "<RFC3339>"}
],
"reached_end": false
}
Auth chain: same wsAuth as /workspaces/:id/activity (tenant ADMIN_TOKEN
+ X-Molecule-Org-Id). No new trust boundary.
Filter: a2a_receive rows with source_id IS NULL — same canvas-source
filter the canvas applies via /activity?type=a2a_receive&source=canvas,
centralized so future API consumers don't need to know it.
## What's mirrored from canvas TS
Direct port of canvas/src/components/tabs/chat/historyHydration.ts
+ message-parser.ts:
- extractRequestText / extractFilesFromUserMessage — user-side parts
walk through request_body.params.message.parts[]
- extractChatResponseText — agent-side response_body collector across
the four shapes (string, A2A JSON-RPC parts, older nested
parts.root.text, task artifacts) joined with "\n" (matches canvas
multi-source collector — claude-code emits multiple text parts;
hermes emits summary+artifacts)
- extractFilesFromResponse / extractFilesFromTask — file walk across
parts[] + artifacts[].parts[] + status.message.parts[] +
message.parts[]
- v0 hot path ({kind:"file", file:{...}}) AND v1 protobuf flat shape
({url, filename, mediaType}) both supported
- Role decision: status='error' OR text starts with "agent error"
(case-insensitive) → "system", else "agent"
- isInternalSelfMessage prefix filter (Delegation results are
ready...)
- Timestamp pinned to row.created_at (regression cover for
2026-04-25 bubble-collapse bug)
## Tests
22 unit tests in chat_history_test.go, every TS test case in
historyHydration.test.ts has a Go counterpart:
Timestamp preservation (3): user/agent pin to created_at, two-rows
produce two distinct timestamps.
User-message extraction (5): text-only, internal-self skip,
null body, attachments hydrated, attachments-only-when-text-empty,
internal-self suppresses even with attachments.
Agent-message extraction (4): result-string, status=error→system,
agent-error-prefix→system, response_body.parts attachments,
null body, no-text-no-files-no-bubble.
End-to-end (1): paired user+agent same timestamp.
Go-specific (5): malformed JSON returns empty (no panic), v1
protobuf flat shape extraction, task-artifacts extraction, older
nested root.text shape, basename helper edge cases.
isInternalSelfMessage predicate (1): prefix match, non-prefix non-
match, empty-text non-match.
Mutation-tested. Removed the role-promotion branch (status=error +
agent-error prefix → system); confirmed both
TestChatHistory_RoleSystemWhenStatusError and
TestChatHistory_RoleSystemWhenAgentErrorPrefix fire red. Restored.
Both green.
Full handlers test suite (4.3s) green; full repo `go test ./...` green.
## SSOT decision
Parsing logic lives in workspace-server/internal/handlers/chat_history.go
ONLY. Canvas keeps historyHydration.ts + message-parser.ts during the
transition because:
- PR-C-2 (follow-up): canvas loadMessagesFromDB swaps to new
endpoint. Today's canvas still calls /activity for backward
compatibility.
- The TS parsers are still load-bearing for LIVE message handling
(WebSocket A2A_RESPONSE events) until RFC #2945 PR-B-2 mirrors
the typed event payloads to canvas consumers.
Canvas's TS path will be deleted in a separate PR after a one-week
observation window confirms no live-message consumers depend on it.
## Security review
- Untrusted input? YES — request_body and response_body come from
agents (potentially OSS / third-party). Defensive: any malformed
JSON returns empty content + no attachments, no panic. Tested
via TestChatHistory_MalformedJSONInRequestBodyReturnsEmpty.
- Trust boundary? Same as today: agent → workspace-server.
No new boundary; reuses existing wsAuth middleware.
- Auth/authz? Inherits wsAuth chain. Cross-workspace access blocked
by existing TenantGuard middleware.
- PII / secrets in logs? None. The handler logs nothing on the
happy path; errors log 502 without body content.
- Output sanitization? ChatMessage.content is plain text returned
as-is; canvas already sanitizes via ReactMarkdown. Attachment
URIs are agent-provided (workspace: / platform-pending: /
https:); canvas's existing scheme allow-list still applies.
## Versioning / backwards compatibility
- New endpoint /chat-history. /activity unchanged.
- Canvas historyHydration.ts + message-parser.ts intact during
transition (will be removed in PR-C-2 follow-up).
- No public API consumer of /activity is broken — added route is
additive.
- No semver bump (server is internal versioning).
## Three weakest spots (hostile-reviewer self-pass)
1. extractRequestText returns ONLY parts[0].text. If a user message
contains multiple text parts (uncommon — canvas only ever emits
one), we lose later parts. Matches canvas exactly today, but a
future change that emits multi-text user messages needs both
parsers updated. Documented in code; covered by test if/when
added.
2. activityRowToChatMessages rebuilds ChatMessage IDs every call (no
caching). Each chat reload mints fresh UUIDs. This is fine because
canvas dedupes by (role, content, timestamp window) not id, but a
future API consumer that DID rely on id stability would break.
Documented in the ChatMessage struct comment.
3. The handler scopes to source_id IS NULL only (canvas-source rows).
A future "show all messages, including agent-to-agent" mode would
need a new endpoint or a parameter. Out of scope for PR-C; canvas's
/activity?source=canvas already enforces the same filter.
Closes#3017. Unblocks RFC #2945 PR-D (MessageStore interface) which
returns []ChatMessage typed values.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#2962.
## Why
Six per-package `truncate` helpers had drifted into independent
re-implementations of the same idea. Three of them (delegation.go,
memory/client/client.go, memory-backfill/verify.go) used
`s[:max] + "…"` byte-slice form, which on a multi-byte codepoint at
byte `max` produces invalid UTF-8 → Postgres `text`/`jsonb` rejects
the INSERT silently → `delegation` / `activity_logs` row never lands
→ audit gap.
Three other helpers (delegation_ledger.go #2962, agent_message_writer.go
#2959, scheduler.go #2026) had each been fixed in isolation with three
slightly different rune-safe shapes — confirming this is a class of
bug, not a single instance.
## What
New package `internal/textutil` with three rune-safe functions:
- `TruncateBytes(s, maxBytes)` — byte-cap, "…" marker. Used by 5
callers writing into byte-bounded columns / log lines.
- `TruncateBytesNoMarker(s, maxBytes)` — byte-cap, no marker. Used by
delegation_ledger.go where the storage already conveys "preview"
and an extra ellipsis would push the result over the column cap.
- `TruncateRunes(s, maxRunes)` — rune-cap, "…" marker. Used by
agent_message_writer.go where the cap is in display chars (UI
summary), not bytes.
All three guarantee `utf8.ValidString(out)` for any `utf8.ValidString(in)`.
Inputs already invalid go through `sanitizeUTF8` at the call site
boundary (scheduler.go preserved this defense-in-depth).
## Migration map
| Old | New | Behavior change |
|---|---|---|
| `delegation_ledger.truncatePreview` | `textutil.TruncateBytesNoMarker(s, 4096)` | none |
| `agent_message_writer.truncatePreviewRunes` | `textutil.TruncateRunes(s, n)` | none |
| `scheduler.truncate` | `textutil.TruncateBytes(s, n)` | "..." → "…" (3 bytes either way; single-glyph display) |
| `delegation.truncate` | `textutil.TruncateBytes(s, n)` | bug fix + ellipsis swap |
| `memory/client.truncate` | `textutil.TruncateBytes(s, n)` | bug fix |
| `memory-backfill.truncate` | `textutil.TruncateBytes(s, n)` | bug fix |
Five separate `truncate*` helpers + their per-package tests removed.
Net: 12 files / +427 / -255.
## Tests
- `internal/textutil/truncate_test.go` — 27 table-test cases + 145
fuzz-invariant cases asserting `utf8.ValidString` and byte-cap
invariants on every output.
- `delegation_ledger_test.go TestLedgerInsert_TruncatesOversizedPreview`
strengthened with `capValidUTF8Matcher` so the SQL-write argument
is asserted to be valid UTF-8 + within cap (not just `AnyArg()`).
Mutation-tested: replacing the SSOT call with byte-slice form makes
this test fail loud.
## Compatibility
- All callers internal; no external API surface change.
- Ellipsis swap "..." → "…": same byte budget (3 bytes), single-glyph
display. No alerting/grep on either marker in this codebase
(verified). Canvas renders both correctly.
- DB column widths unchanged (4096 / 80 / 200 / 256 / 300 — all
preserved in the migrations).
## Security
Fixes a silent INSERT-failure mode that hid `activity_logs` /
`delegations` rows containing peer-controlled text. The class of input
that triggered it (CJK, emoji, accented Latin) is normal user content,
not malicious — but the symptom (audit gap) makes incident
reconstruction harder. Helper is pure-function over `string`; no
secrets / PII / auth handling involved. Untrusted input is handled
identically to before, just rune-aligned now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The migration-replay step globbed only *.up.sql, silently skipping
the older flat-naming migrations (001_workspaces.sql,
009_activity_logs.sql, etc.). Fine while no integration test
depended on those tables; broke when the #149 cross-table
atomicity test came in needing both workspaces (FK target for
activity_logs) and activity_logs themselves.
Switch to globbing *.sql + sorted lex-order, excluding *.down.sql
so up/down pairs don't undo themselves mid-run. Add a sanity check
for workspaces + activity_logs + pending_uploads alongside the
existing delegations gate so a future migration drift fails loud
instead of silently skipping the regressed test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two real-Postgres tests under //go:build integration:
- TestIntegration_PollUpload_AtomicRollback_AcrossBothTables exercises
the helpers in the same Tx shape uploadPollMode does (PutBatchTx +
LogActivityTx + Rollback) and asserts COUNT(*)=0 on BOTH
pending_uploads AND activity_logs after the rollback. Failure
injection: NUL byte in `summary` triggers lib/pq protocol rejection
on the second activity insert — same trick the existing PutBatch
AtomicRollback test uses.
- TestIntegration_PollUpload_HappyPath_AcrossBothTables is the positive
counterpart — Commit lands N rows in both tables.
Coverage rationale (post-PR-3010 review):
- sqlmock unit test (TestPollUpload_AtomicRollbackOnActivityInsertFailure)
proved the handler calls Begin/Exec/Exec-fail/Rollback in order.
- Existing PutBatch integration test proved Postgres honors rollback
for pending_uploads alone.
- New tests close the cross-table gap: prove LogActivityTx + PutBatchTx
+ real Postgres MVCC compose correctly under rollback.
A regression that made LogActivityTx silently route through db.DB
instead of the passed tx would still pass the sqlmock test (the
Begin/Commit/Rollback shape would look right) but would fail this
integration test (the activity_logs row would survive the rollback).
Verified locally: postgres:15-alpine + all migrations applied, both
tests pass in 0.1s. Skips cleanly without INTEGRATION_DB_URL — CI
already runs this file via the Handlers Postgres Integration job.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Bug
`/org/import` had no per-tenant mutex, advisory lock, or DB-level
uniqueness on (parent_id, name). The pattern was lookup-then-insert:
existingID, existing, err := h.lookupExistingChild(...) // SELECT
if existing { return /* skip */ }
db.DB.ExecContext(ctx, `INSERT INTO workspaces ...`) // INSERT
Two concurrent admin POSTs (rapid double-click in canvas, retry-after-
timeout, two operators on the same template) both saw "not found" in
the SELECT and both INSERT'd the same (parent_id, name).
Captured impact: tenant-hongming accumulated 72 stale child workspaces
in 4 days from repeated org-template spawns of the same template
(see #2857 phase 4 sweeper for the cleanup; #2872 for the prevention RFC).
## Fix
Two-layer fix — DB-level backstop AND application-level happy path:
1. **Migration** `20260506000000_workspaces_unique_parent_name.up.sql`
```sql
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS workspaces_parent_name_uniq
ON workspaces (
COALESCE(parent_id, '00000000-0000-0000-0000-000000000000'::uuid),
name
)
WHERE status != 'removed';
```
* COALESCE(parent_id, sentinel) collapses NULLs so root workspaces
also collide pairwise.
* `WHERE status != 'removed'` lets a tombstoned row be replaced
by a same-named re-import (preserves existing org-import semantics).
* CONCURRENTLY avoids ACCESS EXCLUSIVE on production tenants under
live traffic; IF NOT EXISTS makes the migration resumable.
* Down migration drops CONCURRENTLY symmetrically.
2. **`org_import.go` swap**
Replace lookup-then-insert with `INSERT ... ON CONFLICT DO NOTHING
RETURNING id`. On the skip path (RETURNING returns 0 rows →
sql.ErrNoRows), re-select the existing id to recurse children:
INSERT INTO workspaces (...) VALUES (...)
ON CONFLICT (COALESCE(parent_id, ...), name)
WHERE status != 'removed'
DO NOTHING
RETURNING id;
The ON CONFLICT target predicate matches the partial-index predicate
exactly — required for Postgres to consider the index applicable.
Existing `lookupExistingChild` helper kept (still used on the skip
path); semantics unchanged.
## Test coverage
* AST gate refreshed to assert the workspaces INSERT contains the
ON CONFLICT pattern (`onConflictDoNothingRE`) instead of the now-obsolete
"lookup-before-insert" ordering. Per behavior-based gating
(memory: feedback_behavior_based_ast_gates.md), the new gate pins
the actual TOCTOU-resolution behavior.
* Companion `TestGate_FailsWhenInsertOmitsOnConflict` proves the gate
catches the bug shape on synthetic source.
* All existing `lookupExistingChild` unit tests (no-rows, found,
nil-parent, DB error, wrapped no-rows) still pass — helper is
unchanged and still load-bearing on the skip path.
* Live Postgres E2E coverage runs via the existing
"Handlers Postgres Integration" CI job, which applies migrations
to a real PG and exercises the INSERT path.
## Why ship the migration + swap together (not stacked)
The migration alone provides a DB-level backstop, but without the
handler swap a UNIQUE-violation surfaces as a 500 to the user. The
handler swap alone has no enforceable target until the migration
applies. Shipped together they give graceful skip + atomic backstop.
Migration is CONCURRENTLY + IF NOT EXISTS, safe to apply even on
tenants where the sweeper (#2860) hasn't run yet — the index just
declines to build until conflicting rows are reconciled.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#149.
uploadPollMode for poll-mode chat uploads previously committed N
pending_uploads rows in one Tx (PutBatch), then wrote N activity_logs
rows individually outside any Tx. A per-row failure on activity row K
left rows 1..K-1 committed and pending_uploads orphaned until the 24h
TTL — not data-loss because the platform's fetcher handled the
half-state cleanly, but the user never saw file K in the canvas and
the inconsistency surfaced as an "uploaded but invisible" complaint
class.
Thread one Tx through PutBatchTx + N × LogActivityTx + Commit so all
or none commit. Broadcasts are deferred until after Commit — emitting
an ACTIVITY_LOGGED event for a row that ends up rolled back would
paint a ghost message into the canvas's optimistic UI. A new
LogActivityTx returns a commitHook the caller invokes post-Commit;
the existing fire-and-forget LogActivity is unchanged for the 4 other
production callers (a2a_proxy_helpers + activity.go report path).
Storage interface gains PutBatchTx; PostgresStorage.PutBatch is
refactored to share the validation + insert path. inMemStorage and
fakeSweepStorage delegate or no-op for PutBatchTx (the in-mem fake
can't model Tx state — DB-level atomicity is verified by the existing
real-Postgres integration test for PutBatch + the new unit test
asserting the Go handler calls Rollback on activity-insert failure).
Tests:
- TestPollUpload_AtomicRollbackOnActivityInsertFailure pins the new
contract via sqlmock — second activity insert errors → Rollback
expected, Commit must NOT be called.
- TestLogActivityTx_DefersBroadcastUntilCommitHook +
_InsertError_NoHook_NoBroadcast + _NilTx_Errors cover the new API.
- TestPutBatchTx_HappyPath / _EmptyItems / _ValidationFails /
_PerRowErrorPropagates cover Tx-aware storage layer.
- 7 existing TestPollUpload_* tests updated to mock Begin + Commit
(or Begin + Rollback for failure paths) since the handler now
opens a Tx around PutBatch + activity inserts.
All workspace-server tests pass; integration tag also clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-code-quality bot flagged this as the last unresolved review thread
blocking the merge queue. The function is referenced in comments but
never called from this file (download is dispatched via the lightbox /
AttachmentChip path). Removing the import resolves the bot thread and
clears the staging branch-protection 'all conversations resolved' gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User asked for VSCode-style drag-drop upload (#2999): "drag local to
upload to target folder just like vscode does". Today the only upload
path is the toolbar's Upload button (folder picker). Drag-drop lets
users grab files from Finder/Explorer and drop them directly on a
specific subdirectory in the tree.
1. New `uploadDataTransferItems(items, targetDir)` in `useFilesApi`
— walks the HTML5 DataTransferItemList via `webkitGetAsEntry()`,
recursing folders to a flat (relativePath, file) list, then PUTs
each via the existing /files/<path> endpoint. The walker (also
exported via `__testables`) calls `readEntries()` in a loop until
empty so multi-batch folders (browsers cap each call at ~100
entries) aren't silently truncated.
2. `uploadFiles` (folder-picker path) gained an optional `targetDir`
parameter. Same prefixing semantics so future surfaces (e.g. an
"upload here" toolbar button on a row) can reuse it.
3. `FileTree` directory rows gained `onDragOver` / `onDragEnter` /
`onDragLeave` / `onDrop` handlers + a hover-target highlight
(accent-tinted background + outline). dragLeave uses
`currentTarget.contains(relatedTarget)` to suppress the flicker
that fires when the cursor crosses any child of the row (icon,
label, ✕ button) — without this the highlight strobes on every
sub-element transition.
4. `FilesTab` wraps the tree column in an outer drop zone for
"drop on root" — drops outside any specific subdir row land at
root. The empty-state placeholder copy now includes a
"drag files here to upload" hint when the active root is
/configs (the only writable root today).
5. Both the row drop and the root drop are gated on
`root === "/configs"` (the same gate that already blocks the
toolbar's New / Upload / Clear). Other roots ignore the drag
entirely (no highlight, no drop), so the user doesn't get a
misleading drag affordance followed by a "switch root" toast.
`dragDropUpload.test.tsx` (9 tests, two layers):
Walker tests (pure function, no DOM):
- `walkEntry` collects a single dropped file with correct relpath.
- `walkEntry` walks a folder + preserves folder name in the path.
- **Multi-batch loop**: a fake reader that emits two batches of 2
+ an empty terminator must yield 4 files. A walker that called
readEntries once would see only 2 — this is the load-bearing
assertion against silent folder truncation.
- Nested directories: outer/inner/file.md → "outer/inner/file.md".
FileTree drag-drop wiring (DOM):
- `dragover` on a directory row preventDefault's (load-bearing —
without it the drop event never fires).
- `drop` on a directory row fires `onDropToTarget(path, items)`.
- `drop` on a FILE row does NOT fire (only directories are valid
drop targets).
- `drop` with no DataTransferItems does NOT fire (defensive guard
against text-only drags).
- `dragenter` adds the highlight class to the directory row.
1. The 1MB per-file size cap is inherited from the existing
`uploadFiles`. A user dropping a 5MB skill bundle silently
skips the file (the loop's `continue` on `file.size >
1_000_000`). Same behavior as the toolbar Upload, so consistent
if not great. Surfacing skipped-files would be a UX improvement
tracked separately — not load-bearing for this PR.
2. Drop-zone highlight on the column wrapper uses an outline that
sits inside the column's overflow-y-auto scroll container. If
the user drags onto a row that's mid-scroll, the highlight may
clip slightly at the scroll boundary. Cosmetic only; the drop
still works.
3. The `?root=` query is NOT passed on the underlying writeFile
call (matches the existing uploadFiles behavior). On a backend
without #2999 PR-A, this means uploads always land in /configs
regardless of selected root — but we already gated drop on
`root === "/configs"` so the practical effect is nil today.
Once PR-A merges and the canvas threads ?root= through writes
(separate follow-up), drops on /home etc. would be enableable
by lifting the canDelete-style gate.
- `npx tsc --noEmit` clean
- 177/177 canvas tab tests pass
- Manual on local dev: drag a file from Finder onto /configs/skills
row → file appears under /configs/skills/<name>. Drag a folder of
3 files onto root area → 3 files uploaded with folder structure
preserved. Drag onto /home tree → no highlight, no drop.
Refs #2999. Pairs with PR-A (backend EIC) — without PR-A the tree
is empty on SaaS and there's nothing to drop ONTO; PR-D still works
on self-hosted today.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Adds two new arms to the AttachmentPreview kind dispatcher:
* PDF — chip in the bubble, click opens the shared AttachmentLightbox
with a browser-native <embed type="application/pdf"> at 95vw/90vh.
Fetch+Blob+ObjectURL auth path matches AttachmentImage / Video. PDF.js
not pulled in; browser viewer is good enough for the desktop chat MVP
(Slack/Linear/Notion all gate full-page PDF behind a click for the
same reason). Falls back to AttachmentChip on fetch error.
* Text/code/JSON/YAML — first 10 lines in monospace <pre><code> right
in the bubble, "Show all N lines" expands to full content, with a
filename + ⬇ download header. Streams up to 256 KB then marks
truncated and offers a download chip; large logs don't crash the
bubble. No syntax highlighting in v1 — shiki adds 200-500 KB and is
pure polish.
Coverage: 5 new dispatch tests (PDF success → embed in lightbox,
PDF fetch fail → chip fallback, text inline render, text long content
→ Show-all-N-lines expand button, text fetch fail → chip fallback).
All 19 AttachmentPreview tests pass; tsc --noEmit clean.
Stacked on rfc-2991-pr-1-image-preview-lightbox (PR-2 already merged
into PR-1's branch). PR-1 ships first; this rebases onto staging
once it lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Why
User asked for a VSCode-style right-click menu on file rows (#2999):
"right click to have a menu to download". Today the only download
affordance is the toolbar's Export-all (bulk JSON dump), and the
inline ✕ button is the only delete UX (small click target, easy to
miss).
## Fix
1. New `FileTreeContextMenu` component — fixed-position popover with
Open / Download / Delete items composed per-row (files get all
three; directories get Delete only since "open a directory in the
editor" doesn't apply). Esc + outside-click + Tab + scroll
dismiss. ↓/↑ arrow keys rove focus between menu items. role=menu
+ role=menuitem + autofocus on first item for a11y.
2. Menu state lifted to the top-level `FileTree` (not per-row) so
opening a second row's menu auto-closes the first — only one
menu open at a time, matching VSCode/Theia. Pinned by the
`replaces the first` test.
3. New `downloadFileByPath(path)` in `useFilesApi` — fetches via the
existing GET /workspaces/<id>/files/<path>?root= endpoint and
triggers a browser download. Distinct from the existing
`handleDownloadFile` which downloads the in-editor buffer
(round-trips unsaved edits to disk); the context-menu download
targets arbitrary tree rows the user hasn't opened.
4. `canDelete` prop threaded from FilesTab → FileTree → menu →
item. Same gate as the toolbar (Clear/New/Upload all gated to
/configs); context menu's Delete renders as disabled with a
muted background on other roots, matching the "feature exists
but isn't applicable here" pattern.
## Test coverage
`FileTreeContextMenu.test.tsx` (8 tests):
- File row → menu opens with Open + Download + Delete.
- Directory row → menu opens with Delete only.
- Click Download → onDownload(path) fires + menu closes.
- Click Delete (canDelete=true) → onDelete(path) fires.
- Click Delete (canDelete=false) → onDelete NOT called + menu stays
open (disabled-state UX).
- Esc dismisses.
- Outside-click (mousedown on document.body) dismisses.
- Opening second context menu replaces the first (only-one-open
invariant).
Each test uses fireEvent + screen.getByRole, so they fail on a
deleted-code regression — none would pass on the pre-PR shape.
## Three weakest spots (hostile self-review)
1. The menu is positioned at `clientX/clientY` without viewport
clamping. If the user right-clicks at the very bottom-right of
the panel, part of the menu may overflow off-screen. VSCode
handles this by flipping the anchor; we don't yet. Acceptable
v1 because the FilesTab is fixed-width (≤ side-panel width)
and the menu is small (140×~80px); the overflow would be a few
pixels of one item. Filed as a follow-up.
2. Auto-focus on the first item shifts keyboard focus away from
the row that opened the menu. Closing with Esc returns focus
to the body, not the row. Same behavior as TerminalTab's
placeholder + the canvas's other context menus; consistent
isn't ideal but at least uniform. Documented inline.
3. The download request reuses the API client's 15s default
timeout — large config files (multi-MB skill bundles) on a
slow connection could time out. Same risk applies to the
existing toolbar Export. If we see real download failures we
can add a `timeoutMs` override at the call site without
touching the menu.
## Verification
- `npx tsc --noEmit` clean
- 176/176 canvas tab tests pass
- Manual on local dev: right-click a config.yaml row → menu opens
→ click Download → file lands in Downloads. Right-click on
/home root → Delete renders disabled.
Refs #2999. Pairs with PR-A (backend EIC) — without PR-A the tree
is empty and there's nothing to right-click on a SaaS workspace.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Why
Reported by user (issue #2999): external workspaces (mac laptop, mac
mini, hermes-on-home-server — runtime="external") render the FilesTab
identically to the SaaS empty-listing bug, showing "0 files / No
config files yet" even though the platform doesn't actually own the
filesystem of these workspaces. Visually indistinguishable from the
broken state, reads as a bug.
## Fix
Mirror the affordance TerminalTab adopted in PR #2830 for runtimes
without a TTY:
1. New `NotAvailablePanel` in `canvas/src/components/tabs/FilesTab/`
— folder-with-slash icon + "Files not available" headline + body
text that names the runtime and points the user at Chat.
2. `FilesTab` now takes optional `data?: WorkspaceNodeData`. When
`data.runtime` is in `RUNTIMES_WITHOUT_FILES` (currently just
"external"), early-return the placeholder before mounting the
useFilesApi hook. Mirrors TerminalTab's prop shape exactly so the
review pattern is uniform across tabs.
3. SidePanel passes `node.data` to FilesTab (matches existing pattern
for ChatTab / TerminalTab).
## Test coverage
`FilesTab.notAvailable.test.tsx` (4 tests):
- external runtime → banner renders with runtime name + Chat-tab
guidance copy.
- external runtime → NO `/files` API request fires (asserted by
inspecting the mocked api.get call log).
- claude-code runtime → no banner, normal mount proceeds (toolbar's
root selector is the discriminator).
- data prop omitted → falls through to normal mount (back-compat
with any caller that doesn't thread data through, e.g. legacy
tests).
Each branch is independent and discriminating — none would pass on
a code-deleted version of the early-return.
## Three weakest spots (hostile self-review)
1. `RUNTIMES_WITHOUT_FILES` is a hardcoded set in this file. If a
future runtime joins (e.g. a "byok-claude" that runs on user
hardware), someone has to remember to add it here. Reviewed
alternatives: pull from a runtime-capabilities registry — same
shape as `RUNTIMES_WITHOUT_TERMINAL` already in TerminalTab. We
chose the parallel pattern over a new abstraction; consolidating
into a shared registry can land if/when a third tab grows the
same gate (rule of three). Documented inline.
2. The placeholder is a static panel — no retry, no "report bug"
link. Same as TerminalTab's. Acceptable because the absence is
intentional, not transient.
3. Chat-tab guidance is hardcoded English. No i18n in canvas yet;
matches the rest of the codebase. Will move with the i18n
migration when that lands.
## Verification
- `npx tsc --noEmit` clean
- 54/54 canvas tab + SidePanel tests pass
- Will be live-verified on staging post-merge: open Files tab on an
external workspace (mac laptop) → expect placeholder; open on a
platform-owned workspace (Hongming Personal Brand Agent) → expect
normal tree (assuming PR-A also lands).
Refs #2999. Pairs with PR-A (backend EIC fix) — without PR-A the
platform-owned path still shows "0 files" because the backend never
returns rows.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
## User-visible bug
Canvas Files tab returns "0 files / No config files yet" for every
SaaS workspace, every root (/configs, /home, /workspace, /plugins).
Reported by user (canvas screenshot, hongming.moleculesai.app,
Hongming Personal Brand Agent — claude-code, T4, online).
## Root cause
`ListFiles` (templates.go) was missing the SSH-via-EIC branch that
ReadFile (PR #2785) and WriteFile (PR #1702) already have. On SaaS,
dockerCli is nil → findContainer returns "" → falls through to
host-side resolveTemplateDir which only matches baked-in template
names. For a user-named workspace it matches nothing, so the handler
silently returns []fileEntry{}.
DeleteFile had the same gap — right-click delete (introduced in PR-C
of this issue) would silently no-op once #1 was fixed.
## Fix
1. Extracted shared EIC plumbing into `withEICTunnel` (closure-based,
single SSOT for keypair → key push → tunnel → port-wait → cleanup).
Refactored writeFileViaEIC + readFileViaEIC to use it. Added
listFilesViaEIC + deleteFileViaEIC on the same scaffold. The
`LogLevel=ERROR` shim from PR #2822 now lives in one
`eicSSHSession.sshArgs()` helper instead of being duplicated per
helper — the next time we need to tweak ssh options, one place.
2. Factored remote shell strings into pure functions
(buildInstallShell / buildCatShell / buildRmShell / buildFindShell
+ parseFindOutput) so the wire shape can be pinned without booting
a real EIC tunnel.
3. Refactored `resolveWorkspaceFilePath(runtime, root, relPath)` to
honor `?root=`. New rule: `/configs` (or empty / unrecognized) →
runtime managed-config dir via workspaceFilePathPrefix (preserves
the v1 ReadFile/WriteFile behaviour where canvas's Config tab
GETs/PUTs config.yaml without specifying a root and lands in the
right per-runtime dir); `/home`, `/workspace`, `/plugins` →
literal absolute path on the EC2 host. List/Read/Write/Delete now
agree on what file a tree row points to — pre-fix List would say
"/home contents" but Read/Write would route to /configs.
4. ListFiles + DeleteFile dispatch on instance_id != "" → EIC helper.
Errors from the EIC path produce 500 (not silent fall-through to
local-Docker, which would mask the failure as "0 files" — the
exact user-visible symptom).
5. Added ?root= validation gate to WriteFile + DeleteFile so an
out-of-allowlist root is rejected before the resolver runs.
## Test coverage
- TestResolveWorkspaceFilePath_RuntimeIndirection — pins the
/configs → runtime prefix translation per-runtime (hermes,
claude-code, langgraph, external, unknown). Catches the regression
where a future edit accidentally drops the runtime indirection.
- TestResolveWorkspaceFilePath_LiteralRoots — pins /home,
/workspace, /plugins as literal pass-through regardless of
runtime. Catches the symmetric regression where the literal roots
start getting rewritten to the runtime prefix (which would mean
the FilesTab "/home" selector silently routes to /configs on
hermes).
- TestResolveWorkspaceRootPath — directory-only translation used
by listFilesViaEIC, same indirection rules.
- TestSSHArgs_HardenedFlags — pins the centralised ssh option set
(LogLevel=ERROR + hardening). Catches drift in the
one-place-where-ssh-flags-live.
- TestEicSSHSessionSingleSourceForSSHFlags — behaviour-based AST
gate (per memory). Counts s.sshArgs() callers (must be ≥4 —
list/read/write/delete) and asserts LogLevel=ERROR appears
exactly once in the source. Fires if anyone copy-pastes a raw
ssh args slice instead of going through the helper.
- TestBuildInstallShell / TestBuildCatShell / TestBuildRmShell /
TestBuildFindShell — pure-function tests pinning the remote
command shape. Catches regression like "rm -f silently becomes
rm -rf" or "find loses node_modules pruning" without needing a
real EC2.
- TestBuildFindShell_DepthForwarding — catches a regression where
the helper hard-codes a depth instead of using the caller's value.
- TestParseFindOutput / TestParseFindOutput_EmptyInput — pin the
TYPE|SIZE|REL parser. Empty-input case explicitly returns []
not nil so the JSON wire shape stays a list.
- TestListFiles_EICDispatch_Success / Error — sqlmock-driven
handler test. Verifies instance_id != "" routes to listFilesViaEIC
and surfaces errors as 500 (does NOT silently fall through to
local-Docker, which is the exact regression-mode of the original
bug).
- TestListFiles_EICBranch_NotTakenForSelfHosted — back-compat
guard: instance_id == "" must NOT enter the EIC branch (would
break self-hosted operators).
- TestDeleteFile_EICDispatch_Success / Error — same shape for
DeleteFile.
- TestListFiles_RootValidation / TestDeleteFile_RootValidation —
?root=/etc must 400 before any DB query or EIC call.
## Verification
- `go build ./...` clean
- `go test ./...` clean (full workspace-server suite)
- Will be live-verified against staging on hongming.moleculesai.app
after merge: open Files tab → expect populated /home + /configs +
/workspace listings (not "0 files"); right-click delete on
/configs/old.yaml → expect file removed on the EC2 host.
## Three weakest spots (hostile self-review)
1. The LogLevel=ERROR drift gate counts source occurrences. A
future refactor that intentionally moves the literal somewhere
else (e.g. into a constant) would trigger a false positive. The
gate's failure message points to the load-bearing constraint
(must appear in sshArgs); operator can adjust.
2. `eicFileWriteTimeout` constant kept as an alias for back-compat
with prior tests. Documented as intentional + safe to remove on
the next pass.
3. The resolver tests pin the runtime → prefix map values
(`/home/ubuntu/.hermes`, `/configs`, etc.). A future runtime
addition that ships a new prefix needs the test updated. This
is intentional — silent prefix changes orphan saved files, so a
test failure on map edit IS the right signal.
## Follow-up (RFC #2312 subtask 2)
Long-term the right fix is to drop EIC entirely and HTTP-forward to
the workspace's own URL (RFC #2312). That's a substantially larger
refactor across 5 surfaces (chat upload, files, templates, plugins,
terminal) and out of scope for this bug-fix PR. Tracked separately
under that RFC.
Refs #2999.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second specialized renderer pair landing under RFC #2991. Stacks on
PR-1 (#2997) — extends the AttachmentPreview dispatcher with video/
audio cases.
Why HTML5-native (not custom JS player)
---------------------------------------
- Browser vendors ship hardware-accelerated decoders, captions,
pinch + scrub UX, and fullscreen UI. We get all of it for free.
- Native fullscreen via the <video> control bar — no
AttachmentLightbox needed for video (the browser's built-in
fullscreen handles it).
- Mobile-friendly without us writing the touch handlers.
Auth model
----------
Identical to AttachmentImage (PR-1): platform-auth URIs need our
cookie/token, so we fetch the bytes, wrap in a Blob, hand the
browser an ObjectURL via <video src=> / <audio src=>. External
http(s) URIs skip the fetch.
Memory caveat: a Blob holds the entire media in JS memory until the
bubble unmounts. The server's 25MB single-file cap (chat_files.go)
bounds this; v2 can switch to MediaSource + streaming if larger
files become a real shape.
Failure modes
-------------
- Fetch failure (404, 403, network) → AttachmentChip fallback.
- Bytes that aren't valid media (corrupt, wrong Content-Type) →
<video onError> / <audio onError> swap to chip.
Tests
-----
5 new component tests in AttachmentPreview.test.tsx (now 14 total):
- kind=video → <video controls> with blob URL src
- kind=video fetch fails → falls back to chip
- kind=video extension fallback (no mime) → routes to video path
- kind=audio → <audio controls> + filename label visible
- kind=audio fetch fails → falls back to chip
The preview-kind unit tests from PR-1 (49 cases) already cover the
MIME → video / audio dispatch logic; this PR's component tests pin
the rendered DOM shape (controls attribute, blob URL src, fallback
behavior).
Hostile self-review
-------------------
1. Memory bound: 25MB cap protects us today; documented future
migration path (MediaSource).
2. iOS Safari autoplay: playsInline pinned on <video> so mobile
doesn't auto-fullscreen on play.
3. Captions accessibility: <track kind="captions" /> placeholder so
the element is tagged correctly even though we don't have caption
files yet (forward-compatible).
Verified
- tsc --noEmit clean
- 173 chat tests green (49 unit + 14 component + 110 pre-existing)
Stacks on PR-1 (#2997). PR-3 (PDF + text/code) is the final piece.
Refs RFC #2991, PR #2997 (PR-1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First specialized renderer landing under RFC #2991 — chat attachment
preview. Adds the dispatch infrastructure that PR-2 (video/audio) and
PR-3 (PDF/text) will extend.
Architecture (RFC #2991 Phase 2 design)
---------------------------------------
- preview-kind.ts: pure helper that maps mimeType (+ extension fallback
for missing/generic MIME) to one of: image | video | audio | pdf |
text | file. Single source of truth; the dispatch axis for every
attachment renderer.
- AttachmentPreview.tsx: SSOT dispatch component. ChatTab no longer
imports kind-specific components — it imports AttachmentPreview,
which switches on the kind and renders the right child.
- AttachmentImage.tsx: inline thumbnail (max 240×180) + click →
lightbox. Auth-aware: for platform URIs (workspace: /
platform-pending: / etc) the bytes are fetched via JS-injected
headers, wrapped in a Blob, served as ObjectURL — bare <img src>
would not include the cookie/token.
- AttachmentLightbox.tsx: shared fullscreen modal (image now; PDF will
use it in PR-3). Esc / backdrop click / X button to close, focus
trap on close button, focus restoration on close.
- AttachmentChip retained as the kind=file fallback. No breaking
change for existing renderable shapes.
External-workspace coverage
---------------------------
The wire shape (ChatAttachment.mimeType + uri) is identical for
internal + external workspaces — both go through AgentMessageWriter
(PR #2949). External claude-code agents that attach images via
send_message_to_user automatically get the new preview surface; no
runtime-side change needed.
Failure modes
-------------
- Fetch failure (404, 403, network) → AttachmentChip fallback so the
user still gets a working download. Pinned by tests.
- Decoded as non-image (corrupt bytes, wrong Content-Type) → onError
on the <img> swaps to AttachmentChip. Pinned by tests.
- Non-platform URIs (http/https external image hosts) → skip the
auth-fetch flow, use the raw URL via resolveAttachmentHref. Pinned
by extension-fallback tests.
Tests
-----
preview-kind.test.ts (49 cases):
- Strict MIME match across image/video/audio/pdf/text/unknown
- Extension fallback when MIME is missing or application/octet-stream
- URL with query string + fragment → strip before parsing
- MIME wins over extension (regression: don't render image-named zip)
- SVG is image (not text) despite being XML
- Non-canonical MIME like application/javascript → text
AttachmentPreview.test.tsx (9 component tests):
- Dispatch: kind=file → chip, kind=image → image path
- Loading state shows placeholder, NOT chip (proves dispatch routed)
- Extension fallback (no mimeType) routes to image path
- Fetch fail (404) and network error → fall back to chip
- Image success: <img> renders ObjectURL, click opens lightbox
- Lightbox: Esc closes, backdrop click closes, content click doesn't
- Universal fallback: unknown MIME → chip even when extension hints
at a renderable kind
Hostile self-review (3 weakest spots, addressed)
------------------------------------------------
1. <img> auth: bare <img src="/chat/download?..."> would NOT include
our auth headers. Resolved via fetch+Blob+ObjectURL pattern.
Pinned by the image-success test (asserts src === "blob:test-url").
2. Server-side allowed-roots mismatch: pre-fix tests used /tmp/ paths
which the server doesn't allow. Caught when the dispatch test
fell into the non-platform path. Updated tests to use /workspace/
subpaths matching templates.go's allowedRoots.
3. Bundle size creep: each kind component adds bytes. Lightbox is
currently always-bundled. Lazy-loading is plausible but defer
until measured-needed.
Verified
- tsc --noEmit clean
- 168 chat tests green (49 unit + 9 component + 110 pre-existing)
PR-2 (video + audio) and PR-3 (PDF + text) extend the dispatch in
AttachmentPreview.tsx with their own kind-specific components.
Refs RFC #2991.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 15-min sweeper has been deleting stale e2e orgs but not the
orphan tunnels left behind when the org-delete cascade half-fails
(CP transient 5xx after the org row is gone but before the CF
tunnel delete completes). Result: tunnels accumulate in CF until
manual operator cleanup.
Add a final step that POSTs `/cp/admin/orphan-tunnels/cleanup`
every tick. Best-effort — failure doesn't fail the workflow; next
tick re-attempts. Output reports deleted_count + failed count for
ops visibility.
This is the catch-all for the orphan-tunnel class. The proper
upstream fix (transactional org delete) lives in CP and tracks as
issue #2989. Until that lands, the sweeper bounded-time-to-cleanup
keeps the leak from escalating.
Note: PR #492 (cf-tunnel silent-success fix) makes this step
actually effective — pre-fix DeleteTunnel silent-succeeded on
1022, so the cleanup endpoint reported success without deleting.
Post-fix the cleanup chains CleanupTunnelConnections + retry on
1022, which actually clears stuck-connector orphans.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Mirrors molecule-controlplane#494: the canonical EPHEMERAL_PREFIXES
list now lives in molecule-controlplane/internal/slugs/ephemeral.go,
where redeploy-fleet reads it to skip in-flight test tenants. The
sweep workflow keeps a Python copy because GHA Python can't import
Go, but a comment now points engineers updating the list to update
both files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2990 root cause: the resolver SQL added `name` to the SELECT for
DisplayName plumbing, but the e2e test's sqlmock fixture
(expectChainQueryRoot at swap_test.go:216) still scripts the
3-column shape. Three e2e tests fail with:
sql: expected 3 destination arguments in Scan, not 4
Fix: bump the fixture to 4 columns (id, name, parent_id, depth) and
pass an empty name. The e2e tests don't assert on label rendering —
they pin the namespace string flow ("workspace:root-1" etc), which
is unchanged. Empty name is fine: ReadableNamespaces still emits the
correct namespace strings; only DisplayName is empty.
Caught by CI's Platform (Go) check on PR #2990 — would have been a
silent missed-coverage case in the resolver_test.go run because that
package doesn't import the e2e package.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Mechanical migration of bare event-name strings in BroadcastOnly /
RecordAndBroadcast call sites to the typed constants from
internal/events/types.go (RFC #2945 PR-B). Wire format unchanged
(both shapes serialize to identical WSMessage.Event literals); pinned
by TestAllEventTypes_IsSnapshot in #2965.
Migrated (18 files, scope: handlers/, scheduler/, registry/, bundle/,
channels/):
- handlers/{approvals,a2a_proxy_helpers,a2a_queue,activity,agent,
delegation,external_rotate,org_import,registry,workspace,
workspace_bootstrap,workspace_crud,workspace_provision_shared,
workspace_restart}.go
- channels/manager.go (caught by hostile-reviewer pass — initial
scope missed channels/, found via grep on the post-migration tree)
- scheduler/scheduler.go
- registry/provisiontimeout.go
- bundle/importer.go
Hostile self-review (3 weakest spots, addressed)
------------------------------------------------
1. Missed call sites — initial scope omitted channels/. Post-migration
`grep -rEn 'BroadcastOnly\([^,]+,[^,]*"[A-Z_]+"|RecordAndBroadcast\([^,]+,[^,]*"[A-Z_]+"' internal/`
found 2 stragglers in channels/manager.go. Migrated. Final grep
on the same pattern returns only the docstring example in
types.go (intentional).
2. gofmt drift — auto-import injection produced non-canonical import
ordering. `gofmt -w` applied ONLY to the 18 modified files (NOT
the whole tree, to avoid sweeping unrelated pre-existing drift
into this PR's diff). Three pre-existing un-gofmt'd files in
handlers/ (a2a_proxy.go, a2a_proxy_test.go, a2a_queue_test.go)
left as-is — they're unchanged by this PR and their drift
predates it.
3. Wire format — paranoia check: do the constants serialize to the
exact strings consumers (canvas TS, hermes plugin, anything
parsing WSMessage.Event) expect? Yes. Pinned by the snapshot
test. The migration is name-only; not a single character of
wire output changes.
Verified
- go build ./... clean
- go vet ./internal/... clean
- gofmt -l on the 5 migrated package dirs: only pre-existing files
- Full tests: handlers/, channels/, scheduler/, registry/, events/,
bundle/ all green (5 ok, 0 fail)
PR-B-2 (canvas TS mirror + cross-language parity gate) remains as
the final piece of RFC #2945 PR-B. Tracked separately so this PR
stays mechanical + reviewable.
Refs RFC #2945, PR #2965 (PR-B types).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User feedback on the v2 Memory tab redesign: on a root workspace, the
namespace dropdown showed three indistinguishable entries:
Workspace (30ba7f0b)
Team (30ba7f0b) (team)
Org (30ba7f0b-b303-4a20-aefe-3a4a675b8aa4) (org)
For a root workspace, the resolver collapses workspace==team==org IDs
(resolver.go:113-122 derive() degenerate case). The previous
shortID(8)-truncated UUID label scheme made all three look identical
even though the three concepts (private / team-shared / org-wide)
remain semantically distinct.
## Backend — Resolver returns DisplayName
- SQL chain query now SELECTs workspaces.name (COALESCE → "" on NULL)
- chainNode carries .name through walk
- deriveNames() computes the display name for each namespace,
mirroring derive():
workspace: self.name
team: parent.name (or self.name if root — degenerate)
org: chain[end].name (root of tree)
- Namespace struct gets a new DisplayName field, omitempty wire-shape
## Backend — Handler renders label from DisplayName when present
- memories_v2.go:namespaceLabelWithName(name, kind, displayName) is
the new SSOT label generator. Falls back to the UUID-prefix shape
when displayName is empty so callers without name plumbing keep
working unchanged.
- namespacesToViews now plumbs Namespace.DisplayName into the label.
- Old namespaceLabel(name, kind) is preserved as a thin wrapper
around namespaceLabelWithName(_, _, "") for back-compat.
- Custom namespaces ignore displayName by design — operator-defined
suffixes ARE the chosen label; a name override would surprise.
## Frontend — drop redundant `(kind)` suffix
Pre-fix: "Team (mac laptop) (team)" — kind shown twice.
Post-fix: "Team (mac laptop)" — the prefix already conveys the kind.
## Test coverage
Resolver (3 new tests):
- DisplayName_Root: workspace name propagates to all 3 namespaces
- DisplayName_Child: workspace=self.name, team=parent.name, org=root.name
- DisplayName_EmptyOnNULL: COALESCE → "" → empty fallback
Handler (3 new tests):
- NamespaceLabelWithName_PrefersDisplayName: workspace/team/org/custom paths
- NamespaceLabelWithName_FallsBackToUUIDPrefix: empty displayName → legacy shape
- NamespacesToViews_PassesDisplayNameThrough: full integration on root case
Canvas: existing 30 tests still pass; suffix drop is rendering-only.
memories_v2.go function coverage: **14/14 = 100%**
- namespaceLabelWithName: 100%
- namespacesToViews: 100%
- (all 11 pre-existing functions stay at 100%)
## SSOT
The "what is this namespace called" question now has one source of
truth: namespace.Resolver.ReadableNamespaces sets DisplayName from the
canonical workspace.name column. The handler is a renderer; the
canvas is a consumer. No name-lookup logic duplicated across the
three layers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Closes the silent-block failure mode that left 25 commits — including
the Memory v2 redesign and the reno-stars data-loss fix — wedged on
staging for 12+ hours behind a single missing review. The auto-promote
workflow opened the PR + armed auto-merge, but main's branch protection
required a human review and nobody noticed until a user reported
"still seeing old memory tab".
## Detection logic — `scripts/check-stale-promote-pr.sh`
Reads open PRs `base=main head=staging` and alarms on:
- `mergeStateStatus == BLOCKED`
- `reviewDecision == REVIEW_REQUIRED`
- createdAt older than `STALE_HOURS` (default 4h)
Other BLOCKED reasons (DIRTY, BEHIND, failed checks) are NOT alarmed —
those are the author's signal-to-fix. This script targets the specific
"no human reviewed yet" wedge.
Output:
- `::warning` per stale PR (visible in workflow summary + Actions UI)
- PR comment (idempotent via marker-string detection; one alarm
per PR, never re-spammed)
- Exit code = count of stale PRs (capped at 125)
Logic in a script (not inline workflow YAML) so it's:
- **Unit-testable** — tests/test-check-stale-promote-pr.sh exercises
every branch with stubbed fixture JSON + frozen clock. 23 tests
covering: empty list, single stale, just-under-threshold, wrong
reviewDecision, wrong mergeStateStatus, mixed list (only matching
PRs alarm), custom threshold via --stale-hours, exit-code-counts-
matching-PRs, --help, unknown arg → 64, missing repo → 2.
- **Operator-runnable ad-hoc** — `scripts/check-stale-promote-pr.sh`
works from any shell with `gh` + `jq`.
- **SSOT** — one detector, the workflow YAML is just schedule +
invocation surface. Future sibling workflows that need the same
check call the same script.
## Workflow — `.github/workflows/auto-promote-stale-alarm.yml`
Triggers:
- cron `27 * * * *` (hourly, off-the-hour to dodge cron herd)
- workflow_dispatch with `stale_hours` + `post_comment` overrides
Concurrency: `auto-promote-stale-alarm` group, cancel-in-progress=false
(idempotent script; no benefit to cancelling a running scan).
Permissions: `contents: read` + `pull-requests: write` (post comments).
Sparse checkout — only fetches `scripts/check-stale-promote-pr.sh`.
No node_modules, no go modules, no slow setup steps. Workflow runs
in <30s on a clean repo.
## Why "alarm + comment" not "auto-approve"
Considered options in issue #2975:
1. Slack/email alert — picked.
2. Bot-account auto-approve via molecule-ops — circumvents the
human-review gate that branch protection encodes.
3. Trusted-promote bypass via CODEOWNERS — needs Org Admin config
change; out of scope for a workflow PR.
The comment-on-PR pattern picks (1) without external dependencies
(no Slack token, no email config). Subscribers get notified via
GitHub's existing PR notification delivery; the warning shows up in
the Actions feed.
## Why this won't false-positive on legitimate slow reviews
Threshold is 4h. Most legitimate gates clear in <1h, so 4× headroom
is plenty for slow CI. The comment is idempotent (one alarm per PR,
never re-posted) — adding noise stops at 1 comment regardless of
how long the PR sits.
## Test plan
- [x] `bash scripts/test-check-stale-promote-pr.sh` — 23/23 pass
- [x] `python3 -c 'yaml.safe_load(...)'` clean
- [x] `bash -n` clean on both scripts
- [ ] Live verification: dispatch the workflow once main has caught up,
confirm it correctly reports zero stale PRs
Reported on production reno-stars 2026-05-05 (browser console):
/workspaces/d76977b1-…/files/config.yaml:1
Failed to load resource: the server responded with a status of 404
The workspace was an external-runtime mac-mini-style agent that
doesn't use the platform's config.yaml template — every Config tab
open issued a GET that 404d cleanly, and the existing catch block
fell into the runtime-manages-own-config branch + populated the
form from workspace metadata. Functionally correct, but the request
fired anyway, surfaced as a 404 in DevTools, and burned an RTT.
Fix: branch on RUNTIMES_WITH_OWN_CONFIG BEFORE the fetch — when the
workspace's runtime is one of those (external, hermes), skip the
GET, populate the form from workspace metadata directly, set
loading=false, return. Same code path as the existing 404-catch
fallback, just skipping the wasted request.
Behavior preserved for runtimes that DO use the template
(claude-code, etc.): unchanged GET → parse → setConfig flow.
Tests: 24/24 existing ConfigTab tests pass; no behavioral change for
the documented runtimes. tsc clean.
Refs reno-stars production 2026-05-05.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#2973 — the followup test gap I flagged on PR #2968's review.
Pre-merge #2968 added the platform-pending: URI scheme branch to
resolveAttachmentHref + introduced the isPlatformAttachment SSOT
helper, but the existing uploads.test.ts only covered the older
workspace: / file:/// / absolute-path branches. The new branch shipped
on prod-impact (live console error on reno-stars) with manual post-
deploy verification; the regression gate was filed as a followup
(#2973) so a future canvas refactor can't silently re-break the
poll-mode chat-attachment download path.
Adds 15 new test cases across two existing describe blocks:
resolveAttachmentHref — platform-pending: scheme (poll-mode uploads):
- well-formed platform-pending:<wsid>/<fileid> resolves to the
/pending-uploads/<file>/content endpoint
- uses the URI's wsid, NOT the chat workspace_id (cross-workspace
forwarding case — pinning the explicit decision from #2968's
commit message so a regression that flipped this would mis-route
the download to the wrong workspace's pending-uploads store)
- defensive fallback to raw URI on missing slash, empty fileID,
empty wsid (so a future "helpful" change can't synthesize a
broken /pending-uploads// path)
- regression test against the EXACT production repro from #2968's
body (reno-stars, 2026-05-05 console error)
isPlatformAttachment:
- positive cases for platform-pending: (well-formed and malformed),
workspace:<allowed-root>, file:///<allowed-root>, absolute paths
under allowed roots
- NEGATIVE cases for HTTPS/HTTP URLs to other origins (auth-leak
class regression — a helper that always returned true would
attach workspace tokens to third-party requests), non-allowlisted
roots like /etc/passwd or /var/log/x, empty string, and
unrecognised schemes (s3://, ftp://)
All 21 tests pass. The 6 pre-existing tests are unchanged. The 15
new tests are the regression gate that #2973 asked for.
Verification:
- pnpm exec vitest run src/components/tabs/chat/__tests__/uploads.test.ts
→ 21 passed
The drift gate caught the new SSOT parser module — without registration
the wheel ships it un-rewritten and runtime imports fail. Same pattern
as inbox_uploads, a2a_tools_delegation, a2a_tools_rbac registrations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously Phase 3 only checked the workspace-server's poll-mode short-circuit
emit shape ({"status":"queued","delivery_mode":"poll","method":"..."}); the
matching client-side classification was tested in isolation against fixture
dicts in test_a2a_response.py.
This phase closes the loop by piping the actual on-the-wire response from a
real workspace-server back through the wheel's a2a_response.parse() and
asserting it classifies as the Queued variant with the right method +
delivery_mode. A regression in EITHER the server emit shape OR the client
parser will now fail this E2E, eliminating the gap that allowed the original
"unexpected response shape" production bug to ship despite green unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported on production 2026-05-05:
agent plugin tab Plugins
0 installed
+ Install Plugin
this part should be default compact
Pre-fix: SkillsTab always rendered the Plugins section as a full
rounded-xl panel with vertical chrome — even when zero plugins were
installed and the registry browser was closed. The empty state
gave a lot of vertical real estate for content that's just "0
installed + Install button".
Fix: when installed.length === 0 AND registry closed AND initial
load completed, collapse the section into a single inline pill
("Plugins · 0 installed · + Install Plugin"). The full panel
re-mounts when:
- installed.length > 0 (a plugin landed → expand to surface the list)
- showRegistry === true (user clicked + Install Plugin → registry opens)
- !installedLoaded (avoid flash; the loading shell shows instead
until the first /plugins fetch resolves)
Accessibility:
- Compact pill: aria-label="Plugins (none installed)" + button
aria-expanded="false" + aria-controls="plugins-section"
- Full panel: button aria-expanded={showRegistry} + same aria-controls
- Section gets id="plugins-section" so the aria-controls reference
resolves once the section mounts
External workspaces: this is a pure canvas-frontend layout change —
applies to ALL workspace runtimes (external, claude-code, hermes,
langchain, codex, third-party MCP). No server-side change needed.
Tests
-----
SkillsTab.compactEmpty.test.tsx (4 tests):
- Compact pill renders when installed=0, registry closed, loaded
- Full panel renders when installed > 0
- Click + Install Plugin from compact → expands to full panel
(verified via aria-controls target id appearing in the DOM)
- During initial load (installedLoaded=false), compact pill does
NOT render — avoids a compact→full flash as the load completes
Per memory feedback_oss_design_philosophy.md: the SkillsTab is the
only tab that needs compact-empty today, but the pattern is
extractable into a shared EmptyStateCompactWrapper if Schedules /
Memories / Approvals adopt the same affordance later. Don't generalise
until the third use case (per the same memory, "every refactor toward
OSS plugin shape" without premature abstraction).
Verified
- tsc --noEmit clean
- All 4 tests pass
Refs #2971.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce ``workspace/a2a_response.py`` as the single source of truth for
the wire shapes the workspace-server proxy can return at
``/workspaces/<id>/a2a``:
* ``Result`` — JSON-RPC success
* ``Error`` — JSON-RPC error or platform-level error (with
restart-in-progress metadata when present)
* ``Queued`` — poll-mode short-circuit envelope: the platform
queued the message into the target's inbox, the
target will fetch via /activity poll
* ``Malformed`` — anything the parser can't classify (logged at
WARNING so a future server change is loud)
``send_a2a_message`` (in ``a2a_client.py``) now dispatches via
``a2a_response.parse(data)`` instead of inline ``"result" in data`` /
``"error" in data`` sniffing. The Queued variant returns a new
``_A2A_QUEUED_PREFIX`` sentinel so callers can distinguish "delivered
async, no synchronous reply" from both success-with-text and failure.
reno-stars production data caught two intermittent failures that
both reduced to the same root cause:
1. **File transfer announce silently failed** — when CEO Ryan PC
(poll-mode external molecule-mcp) sent the harmi.zip
announcement to Reno Stars Business Intelligent (also poll-mode
external), ``send_a2a_message`` saw the platform's poll-queued
envelope ``{"status":"queued","delivery_mode":"poll","method":"..."}``,
didn't recognize it as the synthetic delivery-acknowledgement
it is, and returned ``[A2A_ERROR] unexpected response shape``.
The agent fell back to a chunk-shipping path; receiver did get
the file but operator-facing logs showed a failure that didn't
actually fail.
2. **Duplicated agent comm** — same bug, inverted direction. d76
delegated to 67d, send_a2a_message returned the unexpected-shape
error, delegate_task wrapped it as DELEGATION FAILED, the calling
agent retried with sharper wording, the recipient saw the same
request twice and self-reported "二次请求 — 我先不执行".
External molecule-mcp standalone runtimes are inherently poll-mode
(they have no public URL), so every external↔external A2A pair was
hitting this on every send. The pre-fix client only handled JSON-RPC
``result``/``error`` keys and treated the queued envelope (which has
neither) as malformed. RFC #2339 PR 2 added the queued envelope on
the server side; the client never caught up.
When ``send_a2a_message`` returns the ``_A2A_QUEUED_PREFIX`` sentinel,
``tool_delegate_task`` now transparently falls back to
``_delegate_sync_via_polling`` (RFC #2829 PR-5's durable
``/delegate`` + ``/delegations`` polling path, which DOES work for
poll-mode peers because the platform's executeDelegation goroutine
writes to the inbox queue and the result row arrives when the target
picks it up + replies). The agent gets a real synchronous reply
instead of the empty queued sentinel.
* ``test_a2a_response.py`` — 62 tests, **100% line coverage** on
the parser (verified via ``coverage run --source=a2a_response``).
Includes adversarial-input fuzzing across ~25 pathological
payloads — parser must never raise.
* ``test_a2a_client.py::TestSendA2AMessagePollMode`` — 4 tests for
the new Queued/Error wiring in ``send_a2a_message``.
* ``test_delegation_sync_via_polling.py::TestPollModeAutoFallback``
— 3 tests for the auto-fallback in ``tool_delegate_task``,
including negative cases (push-mode reply must NOT trigger
fallback; genuine error must NOT silently retry).
* **Verified all new tests FAIL on pre-fix source** by stashing
a2a_client.py + a2a_tools_delegation.py and re-running — 5
failures including ImportError for the missing
``_A2A_QUEUED_PREFIX``.
Per the operator-debuggability directive:
* INFO at every Queued classification (expected variant; operator
sees normal poll-mode-peer queueing in log stream).
* INFO at the auto-fallback decision in ``tool_delegate_task``
so a future operator can correlate "send returned queued →
falling back to polling path" without reading the source.
* WARNING at every Malformed classification (server contract
drift; operator MUST see this immediately).
* Existing transient-retry WARNING preserved.
* Mirror Go-side typed model in workspace-server. The wire shape
is documented in ``a2a_response.py``'s module docstring with
file:line pointers to the canonical emitters; a future PR can
introduce ``models/a2a_response.go`` without changing wire
behavior. The fixture corpus in ``test_a2a_response.py`` is
designed so a one-sided edit breaks CI.
* ``send_message_to_user`` and ``chat_upload_receive`` use a
different endpoint (``/notify``) and aren't affected by this
bug; their parsing stays unchanged.
* 135 tests pass across ``test_a2a_response.py`` +
``test_a2a_client.py`` + ``test_delegation_sync_via_polling.py``
+ ``test_a2a_tools_impl.py``.
* ``coverage run --source=a2a_response -m pytest`` reports 100%
line coverage with 0 missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
workspace-server's a2a_proxy poll-mode short-circuit returns
{status: "queued", delivery_mode: "poll", method: <a2a_method>}
when the peer has no URL to dispatch to (poll-mode peers, including
every external molecule-mcp standalone runtime). The bare
send_a2a_message parser only knew about JSON-RPC {result, error}
keys, so this envelope fell through to the "unexpected response shape"
error path. Two production symptoms on the reno-stars tenant traced
to it:
1. File transfer logged as failed when it actually succeeded —
operator-facing logs showed an A2A_ERROR but the receiving
workspace did get the chunked file via the agent's fallback path.
2. delegate_task retried after the false failure → peer received
duplicate delegations → conversation got confused, the second
peer self-diagnosed in a notify ("⚠️ Peer 二次请求 — 我先不执行").
Add a third branch to the parser, BETWEEN the existing JSON-RPC
{result, error} cases and the catch-all "unexpected" fallback. The
queued envelope is delivery-acknowledged-but-pending-consumption —
not an error — so it returns a clean success string the agent can
render as a normal outcome. The success string includes "queued"
and "poll" so an operator scanning logs sees the routing path
without parsing JSON.
Defensive: the new branch only fires when BOTH status="queued" AND
delivery_mode="poll" are present. A partial envelope (one key
missing) still falls through to the catch-all, so a future server
bug that emits a malformed shape gets surfaced instead of silently
swallowed.
Tests:
- test_poll_queued_envelope_returns_success_string — pins the canonical
envelope returns a non-error string. Discriminating: verified to FAIL
on old code (returned [A2A_ERROR] string), PASS on new.
- test_poll_queued_envelope_with_other_method — pins the parser doesn't
hardcode message/send. Discriminating: also FAILS on old code.
- test_status_queued_without_poll_mode_still_falls_through — pins both
keys are required (defensive against future server bugs).
12 existing tests in TestSendA2AMessage still pass — no regression.
Scope: hotfix for the bare send_a2a_message path. The full SSOT
typed-A2AResponse refactor (#158-#163, parents under #2967) covers the
broader vocabulary alignment between Go server and Python client. This
PR ends the production symptoms now without preempting that work.
Followup to PR #2966. The user reported the about:blank symptom on
reno-stars and the browser console showed:
Failed to launch 'platform-pending:d76977b1-…/bb0dcaf3-…' because
the scheme does not have a registered handler.
So the agent's "download link" was a `platform-pending:<wsid>/<file_id>`
URI — the canonical reference for poll-mode chat uploads (see
workspace-server/internal/handlers/chat_files.go:690 +
workspace/inbox_uploads.py). PR #2966 only handled `workspace:`,
`file:///`, and absolute container paths; the platform-pending
scheme fell through to the raw URI which the browser couldn't
navigate to.
Fix
---
- `resolveAttachmentHref`: added a `platform-pending:` branch that
resolves to `${PLATFORM_URL}/workspaces/<wsid>/pending-uploads/
<file_id>/content`. Uses the wsid from the URI, NOT the chat's
workspace_id — these can differ when a file is forwarded across
workspaces (cross-workspace delegation, agent forwarding).
- New `isPlatformAttachment(uri)` helper — single source of truth
for "this URI requires our auth headers, route through
downloadChatFile". Used by both `downloadChatFile` (chip click)
and ChatTab's markdown-link override.
- ChatTab.tsx markdown-link override now imports
`isPlatformAttachment` instead of duplicating the scheme list.
Pre-fix this list was duplicated and missed `platform-pending:`.
Tests
-----
The 4 IME tests still pass; tsc clean. The platform-pending resolution
is exercised via the `isPlatformAttachment` SSOT helper (any URI
reaching `downloadChatFile` or the markdown override goes through
it). A dedicated test for the URL shape would need a more elaborate
fixture; manual verification on staging post-deploy is the practical
gate.
Reported on production reno-stars 2026-05-05.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two production-reported regressions in the same chat surface, fixed
in one focused PR.
Issue 1 — IME composition + Enter sends half-typed message
----------------------------------------------------------
ChatTab's textarea onKeyDown was:
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
sendMessage();
}
For agents typing CJK / Japanese / Korean via the system IME, Enter
commits the candidate selection — not a newline, not a send. With
the old check, every IME-commit Enter accidentally sent the
half-typed message ("你好" + half-typed-pinyin + Enter to commit
the next candidate → message goes out before the user finishes).
Fix: guard on `event.nativeEvent.isComposing` AND `e.keyCode !== 229`.
The latter covers older Safari / WebKit-based mobile browsers that
delay setting isComposing on the composition-end Enter.
Issue 2 — markdown links land at about:blank
---------------------------------------------
ReactMarkdown's default `<a>` rendering passes the agent-supplied
href directly to the DOM with no target / scheme handling:
- http(s) → navigates the canvas tab away (canvas state lost)
- workspace://path / file:///workspace/... / /workspace/... →
browser hits unhandled-protocol click → about:blank, no
download (the reported bug)
Fix: ReactMarkdown `components.a` override:
- In-container paths (workspace:, file:///{workspace,configs,home,
plugins}, bare /{workspace,configs,...}) → preventDefault, route
through downloadChatFile (same auth path the AttachmentChip
uses). Filename is derived from the path's last segment.
- External (http/https/mailto/unknown scheme) → target="_blank"
rel="noopener noreferrer" so canvas state survives.
Tests
-----
ChatTab.imeAndLinks.test.tsx (4 tests):
- Enter with isComposing=true → does NOT send, input preserved
- Enter with keyCode=229 (older-Safari IME) → does NOT send
- Enter with no IME signal → DOES send (happy path intact)
- Shift+Enter → does NOT send (newline path intact)
The link-component override is exercised through the full ChatTab
render — the IME tests are jsdom-only and don't load chat history
with markdown messages, so the link test would need a more elaborate
fixture. Manual verification on staging post-deploy is the practical
gate; if the link test grows critical the AttachmentViews-style chip
test can extend.
Verified:
- tsc --noEmit clean
- 4/4 IME tests pass
Reported on production 2026-05-05.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-RFC-#2945, every BroadcastOnly / RecordAndBroadcast call site
passed a bare string literal:
h.broadcaster.BroadcastOnly(workspaceID, "AGENT_MESSAGE", payload)
29 producers (Go, ~30 call sites in handlers/, scheduler/, registry/,
bundle/) and ~30 canvas consumers (TS store + listeners) duplicated
the same string with no shared definition. A producer renaming an
event silently broke every consumer — same drift class that produced
the reno-stars data-loss regression on the persistence side. PR-A
fixed the persistence-side SSOT (AgentMessageWriter); PR-B fixes the
event-name SSOT.
What this PR ships
internal/events/types.go
- EventType typed string + 29 named constants covering the full
taxonomy (chat / lifecycle / agent assignment / delegation /
task / approval / auth).
- Grouped semantically; new constants must be added here AND
mirrored in canvas/src/lib/ws-events.ts (parity gate landing
in PR-B-2 follow-up).
- AllEventTypes slice — authoritative list for the snapshot
test + the cross-language parity gate.
internal/events/types_test.go (3 tests)
- TestAllEventTypes_IsSnapshot: pins the canonical list. Adding
a new constant without updating AllEventTypes (or vice versa)
fails with a one-line diff.
- TestEventType_NoEmptyConstants: catches accidentally-empty
values (typo in types.go: const X EventType = ...).
- TestEventType_AllUppercaseSnakeCase: pins the wire format that
canvas TS switch statements assume (no kebab-case, no mixed
case, no leading/trailing/double underscores).
agent_message_writer.go (single migration)
- Demonstrates the constant-usage shape:
events.EventAgentMessage → "AGENT_MESSAGE"
- Other ~30 call sites stay on bare strings for now (this PR
narrow); the migration happens in PR-B-1 follow-up. Both
shapes (constant + bare string) co-exist on the wire — the
typed version is just the recommended path for new code.
Why ship this in stages
1. PR-B (this): types + tests + first migration → MERGEABLE NOW,
low risk.
2. PR-B-1 (follow-up): migrate the remaining ~30 call sites to
constants. Mechanical, low-risk.
3. PR-B-2 (follow-up): canvas/src/lib/ws-events.ts mirror + cross-
language parity gate. Touches both repos.
Per memory feedback_oss_design_philosophy.md (every refactor toward
OSS plugin shape) — this surface is now plugin-safe: external
implementations can import the events package and get the same
named taxonomy without copying strings.
Verified
- go vet ./internal/events/ clean
- go build ./... clean
- TestAllEventTypes_IsSnapshot + TestEventType_* all pass
- TestAgentMessageWriter_* (the only call site touched) still green
Refs RFC #2945, PR #2949 (PR-A SSOT), PR #2944 (reno-stars).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous byte-slice form `s[:previewCap]` could split a multi-byte
codepoint at byte 4096, producing invalid UTF-8. Postgres JSONB rejects
the row → ledger insert silently fails → audit gap on dashboards while
activity_logs continues to record the event.
Walk the string by rune index and stop at the last boundary that fits
inside the cap. ASCII-only strings still hit the cap exactly; CJK/emoji
strings stop slightly under, never over.
Mirrors the truncatePreviewRunes fix shipped for agent_message_writer
in #2959. Followup: deduplicate into a shared helper once both have
landed.
Tests: 2 regression tests using utf8.ValidString — one with an all-3-byte
rune string just over the cap, one with a single multi-byte rune sitting
exactly on the boundary. Verified on the previous byte-slice impl: both
new tests would fail (invalid UTF-8 + truncation past cap by 1 byte).
Two issues caught in five-axis self-review of #2956:
## 1. Drop speculative source_workspace_id rendering
The panel rendered a "from peer" badge based on
`propagation.source_workspace_id`, claiming it surfaced cross-
workspace propagation. But the OpenAPI spec at
docs/api-protocol/memory-plugin-v1.yaml documents `propagation` as
"Opaque metadata the plugin stores and returns. Reserved for future
cross-namespace propagation semantics" — and a grep across
workspace-server/internal/memory/ confirms NO writer in the codebase
populates that key. The badge would never render against real data.
Violates "don't design for hypothetical future requirements" from
the project conventions. Drop the field from MemoryV2, the row badge,
the test fixtures, and the JSDoc. When propagation gains a concrete
shape, re-add backed by an actual writer.
## 2. Tighten 503 detection — match the literal contract string
Pre-fix detection: `msg.includes('503') || msg.toLowerCase().includes('plugin is not configured')`
False-positives on any unrelated 503 + on any error mentioning
"plugin" + "configured" in any order.
Post-fix: `msg.includes('MEMORY_PLUGIN_URL')` — the env var name is a
hard-coded literal in workspace-server/internal/handlers/memories_v2.go's
available() error, so this is a pinned cross-layer contract. Drift
between the Go error message and the canvas detection now fails
loud (TestMemoriesV2_PluginUnwired_All503 asserts the env var name
in the response body; the canvas test asserts the same).
Extracted as a named export `isPluginUnavailableError` so the
detection is unit-testable and reusable. Added 4 direct tests:
contract-string match, generic-503 false-negative, 401 false-
negative, non-Error inputs.
## Test results
- 30 component tests pass (was 26; +4 for isPluginUnavailableError)
- Coverage on MemoryInspectorPanel.tsx: 100% lines, 100% functions
(branch coverage up to 85.9% from 84.7% — speculative-field
branches no longer count)
- Full canvas suite: 1277/1277 pass across 91 files
Self-review caught after #2954 landed: check_register() POSTed to
/registry/register with agent_card.name="doctor-probe". The endpoint
is an UPSERT, so the doctor probe overwrites the workspace's actual
agent_card metadata until the real agent's next register call. An
operator running `molecule-mcp doctor` against a live workspace
would see their canvas briefly display "doctor-probe" as the agent
name — invisible production-disruption.
Switches to POST /registry/heartbeat. heartbeat only updates
last_heartbeat_at (and clears awaiting_agent if needed) — the same
work a normal molecule-mcp boot does every 20s in steady state, so
the doctor's extra heartbeat is indistinguishable from background
traffic.
Function renamed check_register → check_token_auth to match what
it actually does. check_register kept as back-compat alias so any
external test/import still resolves.
Also unified the duplicated token-resolution paths into a single
_resolve_token() returning (value, source_label). Pre-fix:
check_register and _resolve_token_summary read env in parallel
ladders — a future env-var addition would have to touch both.
New tests:
- test_check_token_auth_uses_heartbeat_endpoint: mocks urlopen,
asserts the URL ends in /registry/heartbeat AND does NOT
contain /registry/register. Pins the load-bearing invariant
so a future refactor can't silently re-route through register.
- test_resolve_token_returns_value_and_label_for_env: pins the
consolidated resolver returns both pieces of info from the
same source-decision.
- test_resolve_token_returns_none_when_missing: missing-env
happy path.
Verification:
- 13/13 tests pass (10 existing + 3 new)
- Manual stripped-env run still renders 4 FAIL + 2 WARN with
actionable hints, exit 1.
Refs molecule-core#2934 item 6 (doctor side-effect fix-up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of PR #2949 surfaced two pre-existing defects that the
SSOT consolidation inherited from the original /notify handler. Both
are addressable in a small follow-up; shipping them as a separate PR
keeps the consolidation and the bug-fix individually reviewable.
Critical: byte-slice preview truncation produces invalid UTF-8
-------------------------------------------------------------
Pre-fix:
if len(preview) > 80 {
preview = preview[:80] + "…"
}
`len()` returns BYTES; `preview[:80]` slices on a byte boundary. For
agent-authored chat in CJK / emoji / accented characters, byte 80
lands mid-codepoint → invalid UTF-8 → Postgres JSONB rejects → INSERT
fails → activity_log row never written → message vanishes from chat
history on the next reload. The persistence-failure log fires but
operators have to grep to find it, and the user-visible regression
mode is identical to reno-stars.
Fix: extract `truncatePreviewRunes(s, maxRunes)` that walks the rune
boundary using `for i := range s` (Go's range over string yields rune
start indices). Cap at 80 RUNES not bytes — UI-friendly count, not
storage count.
Important: workspace-lookup error path swallows real DB errors
--------------------------------------------------------------
Pre-fix:
if err := w.db.QueryRowContext(...).Scan(&wsName); err != nil {
return ErrWorkspaceNotFound
}
Conflates `sql.ErrNoRows` (legit not-found → caller 404) with real
DB errors (connection drop, query timeout, pool exhaustion → caller
should 503). During a Postgres outage every notify call surfaced as
"workspace not found" — masking the actual incident in alerting and
making the symptom indistinguishable from "you typed a bad workspace
ID".
Fix: distinguish via `errors.Is(err, sql.ErrNoRows)` and wrap
non-not-found errors with `fmt.Errorf("agent_message: workspace
lookup: %w", err)`. Callers' existing fallback path (return 500 /
return error wrapped) handles the new shape correctly without any
changes — verified by running existing TestNotify_* and
TestMCPHandler_SendMessage_* tests.
Tests added (3 new, 11 total writer tests)
------------------------------------------
- TestTruncatePreviewRunes_RuneBoundary: 8-case table — ASCII, CJK,
exactly-at-max, emoji prefix. Asserts both correct visible output
AND `utf8.ValidString` on every result so the bug shape (invalid
UTF-8) can't recur.
- TestAgentMessageWriter_Send_NonASCIIMessagePersists: end-to-end
with a 200-rune CJK message (exceeds the 80-rune cap, would have
hit the byte-slice bug). Pins the INSERT summary contains valid
UTF-8 with exactly 80-rune body + ellipsis.
- TestAgentMessageWriter_Send_DBErrorOnLookupReturnsWrapped: pins the
DB-outage path returns a wrapped non-ErrWorkspaceNotFound error so
alerting can distinguish 404 from 503. Verified via mock
ExpectQuery returning a transient error.
Verified
--------
- `go vet ./internal/handlers/` clean
- `go build ./...` clean
- All 14 writer + caller tests pass (8 original + 3 new + AST gate +
TestNotify_* + TestMCPHandler_SendMessage_* sibling tests)
Per memory feedback_assert_exact_not_substring.md: every new test
asserts boundary behavior directly (UTF-8 validity, exact rune count,
errors.Is comparison) rather than substring-match in stringified
output.
Refs RFC #2945, PR #2949, PR #2944.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The External Connect modal's Codex and OpenClaw tabs were rendering
this MCP server config:
command = "python3"
args = ["-m", "molecule_runtime.a2a_mcp_server"]
That spawns the bare MCP dispatcher with no presence wiring. The
``molecule-mcp`` console-script wrapper (mcp_cli.main) is what calls
``POST /registry/register`` at startup and runs the 20s heartbeat
thread alongside the MCP stdio loop. Without the wrapper, the canvas
flips the workspace back to ``awaiting_agent`` (OFFLINE) within
60-90s — even while tools work — because nothing is heartbeating.
Operator-side this looks like: the workspace is registered and tools
work fine when invoked, but the canvas shows "offline" / "Restart"
CTA, peer agents see the workspace as awaiting_agent in list_peers
output, and inbound A2A delivery silently fails the readiness check.
A new external-Codex operator (#2957) hit this and spent debugging
time on what should have been a copy-paste install.
Fix: switch both Codex and OpenClaw templates to
``command = "molecule-mcp"`` / ``args = []``, matching the universal
MCP template that already handles this correctly. Inline comment in
each template explains the wrapper-vs-bare-module tradeoff so a
future template author doesn't regress to the shorter form.
Hermes-channel intentionally still spawns the bare module — the
hermes plugin owns the platform plugin path and runs its own
register_platform/heartbeat code in-process; double-heartbeating
would race. Universal/Codex/OpenClaw all need the wrapper.
Regression gate: TestExternalMcpTemplates_UseMoleculeMcpWrapper
asserts the three templates that must use the wrapper actually do,
and explicitly fails on the old ``-m molecule_runtime.a2a_mcp_server``
shape. Verified the test FAILS on pre-fix source by stashing only
external_connection.go and re-running.
Source: molecule-core#2957 issue 1 (item 4 of the report — the
``(codex returned empty output)`` / opaque-canvas-error / stale-
session items live in codex-channel-molecule and are tracked
separately).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the v1 LOCAL/TEAM/GLOBAL tab trio (mapped to the deprecated
shared_context model) with a v2 plugin-driven UI. Without this,
canvas Memory tab was reading the frozen agent_memories table while
all post-cutover agent writes went to the plugin's memory_records —
the tab silently displayed stale data.
## Backend (workspace-server)
New routes under wsAuth, all behind the existing per-tenant token:
GET /workspaces/:id/v2/namespaces → readable + writable lists
GET /workspaces/:id/v2/memories → plugin search proxy
DELETE /workspaces/:id/v2/memories/:mid → plugin forget proxy
memories_v2.go — slim handler:
- Server-side ACL: every search request is intersected with the
resolver's readable-namespaces set (canvas-supplied namespace
that the workspace can't read returns [] not 403, matches v1
existence-non-inferring shape).
- Returns 503 with "set MEMORY_PLUGIN_URL" hint when plugin
isn't wired (canvas surfaces a banner).
- Maps plugin not_found → 404, other plugin errors → 502.
- View shaping: NamespaceView.label rendered server-side
("Workspace (abc-1234)", "Team (t-99)", "Org (acme)", custom)
so canvas doesn't parse namespace names. MemoryView surfaces
pin/expires_at/score/source_workspace_id from Propagation.
memories_v2_test.go — 100% line + 100% function coverage:
- 503 path on every endpoint when unwired
- Namespaces success + readable/writable error paths
- Search: empty intersection, full-path query/kind/limit
propagation, namespace=/no-namespace branches, propagation
map missing/wrong-type, intersect error, plugin error
- Forget: success, plugin not_found→404, other plugin
errors→502, missing memoryId→400
- Helpers: namespaceLabel for all 4 kinds + truncation,
parseLimit edge cases (default/0/negative/over-cap/non-num),
memoryToView field round-trip, indexOfColon, shortID
## Frontend (canvas)
MemoryInspectorPanel rewritten for v2:
- Drop LOCAL/TEAM/GLOBAL trio. Namespace dropdown driven by
GET /v2/namespaces.readable, "All namespaces" default.
- New per-row badges: kind (F/S/C), source (agent/runtime/user),
pin (📌), TTL countdown (⌛12h / "expired"), score% on
semantic search, source-workspace ⇡ws-pee for propagated.
- Drop Edit button — v2 plugin contract has no PATCH; the
model is forget + recommit. Forget stays.
- Plugin-unavailable banner with operator hint when /v2/*
returns 503.
- Bug fix surfaced by test: rollback-on-failed-delete order
of operations (loadEntries() called setError(null) AFTER
we set the failure message, wiping it). Reload first, then
set the error.
MemoryEditorDialog deleted — Add was POST /memories which v2
doesn't support from canvas (writes go via MCP). The legacy
Edit-flow tests go with it.
## Test results
Backend: `go test ./internal/handlers/` — all pass
Backend coverage on memories_v2.go: 100% lines, 100% functions
Canvas: `vitest run` — 91 files, 1273 tests pass (26 new)
Canvas coverage on MemoryInspectorPanel.tsx: 100% lines,
100% functions, 96.7% statements, 84.7% branches
(uncovered branches are defensive `?? fallback` for
contract-impossible kind/source values)
## Migration note
The legacy v1 GET/POST/PATCH/DELETE on /workspaces/:id/memories
remains in place for the back-compat MCP shim (mcp_tools_memory_v2's
legacy routing) and admin export/import. PR-9 (#283) drops
agent_memories along with the v1 endpoints once the cutover
verification window closes.
Closes#2934 item 6 — the deferred follow-up from Ryan's onboarding-
friction report. Quote: "this single command would have saved me
30 of the 45 minutes."
When push delivery fails or the install half-works, the operator
today has no signal — they hand-grep the Claude Code binary or
chase the `from versions: none` red herring. Doctor renders six
checks in one screen with concrete next-step suggestions:
1. Python version >=3.11? (wheel's pin)
2. Wheel install molecule-ai-workspace-runtime importable +
version surfaced
3. PATH for binary `molecule-mcp` resolves on PATH; if not,
prints the resolved user-site bin dir to
add (or recommends pipx)
4. Env vars PLATFORM_URL + WORKSPACE_ID + token (env or
*_FILE or .auth_token)
5. Platform reach GET ${PLATFORM_URL}/healthz returns 2xx
6. Registry register POST /registry/register with the resolved
token returns 2xx — end-to-end auth check
Each line: `[OK|WARN|FAIL] <label>: <status>` plus a `next:` hint
when not OK. ANSI colors auto-disable on non-TTY / NO_COLOR.
Exit code: 0 on all-OK or only-WARN, 1 on any FAIL — scriptable
from CI install-checks.
## Files
`workspace/mcp_doctor.py` (new) — six check functions + `run()`
entry point. Uses urllib (stdlib)
so doctor works even on a partial
install where `requests` is missing.
`workspace/mcp_cli.py` Subcommand dispatch:
molecule-mcp doctor → mcp_doctor.run()
molecule-mcp --help → usage banner
molecule-mcp → server (unchanged)
`workspace/tests/test_mcp_doctor.py` (new) — 10 tests covering each
check's pass/fail/skip path
plus the end-to-end exit-code
contract on a stripped env.
`scripts/build_runtime_package.py` Adds `mcp_doctor` to
TOP_LEVEL_MODULES so the
wheel ships the new module.
## Out of scope (deferred follow-ups)
- Claude Code-specific checks (parse ~/.claude.json, verify each
MCP entry is plugin-sourced + dev-channels flag set). That's a
separate Claude-Code-shaped doctor; lives in the channel plugin.
- Automated remediation. Doctor is diagnostic — tells the operator
what's wrong + how to fix it, doesn't apply changes.
## Verification
- python -m pytest tests/test_mcp_doctor.py -v → 10/10 PASS
- python -m pytest tests/test_mcp_cli*.py → 67/67 PASS
(existing CLI suite still green; subcommand dispatch added
before env-validation, doesn't disturb the server-boot path)
- manual: `molecule-mcp doctor` on a stripped env renders 4 FAIL
+ 2 WARN + exit code 1, with each `next:` hint actionable
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-RFC-#2945 the broadcast + activity_log INSERT for "agent → user
chat" was duplicated across two handlers — activity.go's Notify (HTTP
/notify) and mcp_tools.go's toolSendMessageToUser (MCP tools/call).
The duplication is exactly what produced the reno-stars production
data-loss regression (PR #2944): the persistence-half fix landed for
one handler and silently lagged for the other for months, dropping
every long-form external-agent message on reload.
PR #2944 added the missing INSERT to mcp_tools.go and a forward-
looking AST gate. This PR removes the duplication at the source.
What changes
------------
NEW: workspace-server/internal/handlers/agent_message_writer.go
- AgentMessageWriter struct + NewAgentMessageWriter ctor.
- Send(ctx, workspaceID, message, attachments) error: workspace
lookup → broadcast WS AGENT_MESSAGE → INSERT activity_logs.
- ErrWorkspaceNotFound for the lookup-miss path so callers can
return 404 / JSON-RPC error cleanly.
- Best-effort persistence: INSERT failure logs only, returns nil so
the broadcast success isn't undone (matches previous behavior in
both call sites — pinned by test).
- Takes events.EventEmitter (interface) so tests can substitute a
capturing fake without nil-panicking inside hub.Broadcast.
UPDATED: activity.go:Notify
- Replaced ~75 lines of inline broadcast+INSERT with a 12-line
call to AgentMessageWriter.Send.
- Attachment shape conversion (NotifyAttachment → AgentMessageAttachment)
is local to the HTTP handler; the writer's API doesn't import the
HTTP-binding-tagged type.
UPDATED: mcp_tools.go:toolSendMessageToUser
- Replaced ~40 lines (the post-#2944 broadcast+INSERT pair) with a
6-line call to the writer.
- Attachments is nil today because the MCP tool args don't expose
attachments yet. When the schema adds it, build the slice and
pass through; the writer half is ready.
Tests
-----
agent_message_writer_test.go (8 tests, comprehensive):
- TestAgentMessageWriter_Send_Success_NoAttachments — happy path,
pins JSON `{"result":"hi"}`.
- TestAgentMessageWriter_Send_Success_WithAttachments — pins file
parts shape (kind=file, file.{uri,name,mimeType,size}). Uses a
jsonMatcher that decodes + asserts via predicate (tolerant of
map key ordering, exact on shape).
- TestAgentMessageWriter_Send_WorkspaceNotFound — pins
ErrWorkspaceNotFound + asserts NO broadcast NO INSERT.
- TestAgentMessageWriter_Send_DBInsertFailureStillReturnsNil — pins
best-effort persistence contract.
- TestAgentMessageWriter_Send_PreviewTruncation — pins ≤80-char
preview + ellipsis (Ryan's onboarding-friction report would have
bloated activity_logs.summary by 2KB without this).
- TestAgentMessageWriter_Send_BroadcastsAgentMessageEvent — pins WS
event name + payload shape via capturingEmitter.
- TestAgentMessageWriter_Send_OmitsAttachmentsKeyWhenEmpty — pins
the "no key when nil" wire contract.
The existing AST gate from #2944
(TestAgentMessageBroadcastsArePersisted) still holds: any future
function emitting AGENT_MESSAGE without an INSERT fails the test.
With the writer in place that's now redundant — both producers go
through it — but the gate is cheap to keep as defense-in-depth.
Verified: go vet clean; all writer + caller tests pass; existing
TestNotify_* + TestMCPHandler_SendMessage_* + the AST gate all green.
Refs RFC #2945, PR #2944.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of #2935 turned up two real defects:
1. Stale README issue references — the build_runtime_package.py
README template said "(issue #2934 follow-up)" twice, but the
marketplace-plugin and `doctor` items now have dedicated tracking
issues. Updated to point at #2936 and #2937 respectively.
2. Silent fallthrough on broken MOLECULE_WORKSPACE_TOKEN_FILE — when
an operator EXPLICITLY pointed TOKEN_FILE at a path that didn't
exist / wasn't readable / was blank / contained internal whitespace,
the resolver silently returned the generic "set one of these three
vars" error. That's exactly the silent failure mode #2934 flagged
("a new user has no chance"). Refactor `_read_token_from_file_env`
to return `(token, error)`; surface the SPECIFIC failure when the
operator's intent was clearly the file path. Skip the CONFIGS_DIR
fallback in that case so the operator's config bug isn't masked
by a different source happening to work.
Adds 2 renames + 2 new tests in test_mcp_cli_split.py:
- test_missing_file_returns_specific_error (asserts "does not exist")
- test_empty_file_returns_specific_error (asserts "is empty")
- test_multi_line_file_rejected (asserts "internal whitespace")
- test_token_file_error_skips_configs_dir_fallback (asserts a valid
CONFIGS_DIR/.auth_token does NOT silently rescue a broken
TOKEN_FILE)
All 81 mcp_cli + mcp_cli_multi_workspace + mcp_cli_split tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per user request: audit all similar tools + write comprehensive tests
including E2E for the persistence-of-AGENT_MESSAGE-broadcasts contract.
Audit (all BroadcastOnly call sites in workspace-server/internal/):
| Site | Event | Persisted? | Notes |
|---|---|:---:|---|
| a2a_proxy_helpers.go:275 | A2A_RESPONSE | ✓ | LogActivity above |
| activity.go:486 (Notify) | AGENT_MESSAGE | ✓ | INSERT line 535 |
| activity.go:701 (LogActivity) | ACTIVITY_LOGGED | ✓ | self-emits inside DB write |
| mcp_tools.go:341 (toolSendMessageToUser) | AGENT_MESSAGE | ✓ NEW (this PR) |
| registry.go:575 | TASK_UPDATED | N/A | transient progress, not chat |
| registry.go:596 | WORKSPACE_HEARTBEAT | N/A | infra ping, not chat |
Only one chat-bearing broadcast was missing persistence (the just-
fixed mcp bridge path). No other regressions found.
Tests added (4 new, total 5 send_message_to_user tests):
1. TestAgentMessageBroadcastsArePersisted — AST gate that walks every
non-test .go in the package, finds funcs that BroadcastOnly with
"AGENT_MESSAGE", asserts each ALSO contains an
"INSERT INTO activity_logs". Forward-looking regression block:
any future chat tool that broadcasts without persisting fails the
test with a clear file:func diagnostic. Mutation-tested locally:
removing the INSERT block from toolSendMessageToUser reliably
produces the expected failure.
2. TestMCPHandler_SendMessageToUser_DBErrorLogsAndStill200s — pins
the "best-effort persistence" contract. DB INSERT failures must
NOT abort the tool response (the WS broadcast already succeeded;
retrying would double-render in the live chat). Matches /notify.
3. TestMCPHandler_SendMessageToUser_ResponseBodyShape — pins the
exact `{"result": "<message>"}` JSON shape stored in
response_body. The canvas hydrater (extractResponseText in
historyHydration.ts) reads body.result; any drift here silently
breaks chat history without failing the INSERT. Per memory
feedback_assert_exact_not_substring.md, asserts the literal JSON
shape, not a substring.
4. TestMCPHandler_SendMessageToUser_PersistsToActivityLog (existing,
from previous commit) — pins INSERT shape with regex on
'a2a_receive' + 'notify' literals.
5. TestMCPHandler_SendMessageToUser_Blocked_WhenEnvNotSet (existing)
— env-gate aborts before DB.
Test fixture cleanup: newMCPHandler now uses newTestBroadcaster (real
ws.Hub) instead of events.NewBroadcaster(nil) — the latter nil-panics
inside hub.Broadcast on the AGENT_MESSAGE path. Same broadcaster
shape every other handler test uses.
E2E note: the AST gate is the strongest forward-looking guarantee.
A real-DB integration test would add value for CI but is largely
duplicative of the sqlmock contract tests above (sqlmock pins SQL
shape with much faster feedback). Left as a future enhancement when
the handlers Postgres-integration suite extends MCP coverage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported on production tenant reno-stars: an external claude-code agent
(CEO Ryan PC workspace) sent a long-form message via send_message_to_user;
the user saw it live in the chat panel but it vanished after a refresh.
Confirmed via direct production query — the message is NOT in
activity_logs at all (only short test pings around it are persisted).
Root cause: there are TWO server-side handlers for send_message_to_user:
1. HTTP `/workspaces/:id/notify` (activity.go:Notify) — broadcasts WS
AND inserts a row into activity_logs. This is the path the
in-container runtime's tool_send_message_to_user calls.
2. MCP-bridge `tools/call name=send_message_to_user`
(mcp_tools.go:toolSendMessageToUser) — broadcasts WS only,
**never persisted**. This is the path EXTERNAL agents using
molecule-mcp's send_message_to_user tool route through.
The persistence fix landed for path 1 months ago but was never mirrored
on path 2. External agents — exactly the case in reno-stars/CEO Ryan PC
— have been silently losing every long-form notification on reload.
Fix: mirror the activity.go INSERT shape inside toolSendMessageToUser:
INSERT INTO activity_logs
(workspace_id, activity_type, method, summary, response_body, status)
VALUES ($1, 'a2a_receive', 'notify', $2, $3::jsonb, 'ok')
Same wire shape as /notify so the canvas's chat-history hydration
(`type=a2a_receive&source=canvas`) treats both writers identically.
Errors are log-only — broadcast already succeeded, persistence failure
shouldn't block the tool response (matches /notify behavior; downside
is the same data-loss-on-DB-error risk, surfaced via log.Printf).
Tests
-----
- `TestMCPHandler_SendMessageToUser_PersistsToActivityLog` — pins both
the workspace-name lookup AND the INSERT shape. Regex-matches
`'a2a_receive'` + `'notify'` literals so a future refactor that
changes activity_type or method breaks the test loud, not silently
re-introducing the data-loss bug.
- Updated newMCPHandler to use newTestBroadcaster() (real ws.Hub) —
events.NewBroadcaster(nil) crashes inside hub.Broadcast in the
send_message_to_user path. Same shape every other handler test uses.
Verified `go test ./internal/handlers/ -run TestMCPHandler_SendMessage`
green; full vet clean.
Refs reno-stars production incident 2026-05-05.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continues the OSS-shape refactor. After iters 4a-4d (rbac, delegation,
memory, messaging) the only behavior left in ``a2a_tools.py`` was
``report_activity`` plus three thin inbox-tool wrappers and the
``_enrich_inbound_for_agent`` helper. This iter extracts the inbox
slice to ``a2a_tools_inbox.py`` so the kitchen-sink module shrinks
from 280 LOC to ~165 LOC of imports + report_activity + back-compat
re-export blocks.
Extracted symbols:
- ``_INBOX_NOT_ENABLED_MSG`` (sentinel)
- ``_enrich_inbound_for_agent`` (poll-path peer enrichment helper)
- ``tool_inbox_peek``
- ``tool_inbox_pop``
- ``tool_wait_for_message``
Re-exports (`from a2a_tools_inbox import …`) preserve the public
``a2a_tools.tool_inbox_*`` surface so existing tests + call sites
continue to resolve unchanged.
New tests in test_a2a_tools_inbox_split.py:
1. **Drift gate (5)** — every previously-public symbol on a2a_tools
is the EXACT same object as a2a_tools_inbox.foo (`is`, not `==`),
catches a future "wrap with logging" refactor that silently loses
existing test coverage.
2. **Import contract (1)** — a2a_tools_inbox does NOT eagerly import
a2a_tools at module load. Pins the layered architecture: the
extracted slice depends on ``inbox`` + a lazy ``a2a_client``
import, never on the kitchen-sink that re-exports it.
3. **_enrich_inbound_for_agent branches (5)** — peer_id-empty
(canvas_user) returns dict unchanged; missing peer_id key same;
a2a_client unavailable (test harness, partial install) degrades
gracefully with a bare envelope; registry hit populates
peer_name + peer_role + agent_card_url; registry miss still
surfaces agent_card_url (constructable from peer_id alone).
The full timeout-clamp / validation / JSON-shape behavior matrix for
the three wrappers stays in test_a2a_tools_inbox_wrappers.py — those
tests pass identically against both the alias and the underlying impl.
Wiring updates:
- ``scripts/build_runtime_package.py``: add ``a2a_tools_inbox`` to
``TOP_LEVEL_MODULES`` so it ships in the runtime wheel and the
drift gate doesn't fail the next publish.
- ``.github/workflows/ci.yml``: add ``a2a_tools_inbox.py`` to
``CRITICAL_FILES`` so the 75% MCP/inbox/auth per-file floor
applies — this is now where the inbox-delivery code actually
lives.
Ryan's bug report (#2934) walked through ~45 min of debugging a stock
external-runtime install. This PR fixes the four items he flagged that
have a small surface, and stubs out the larger ones for follow-up.
Fixed in this PR
================
#1 — Python floor disclosure (README in publish bundle)
Add an explicit "Requires Python ≥3.11" section that calls out the
cryptic "Could not find a version that satisfies the requirement"
failure mode; recommend `pipx install` over `pip install` so the
binary lands on PATH automatically; show the explicit `pip install
--user` alternative with the PATH caveat.
#3 — MOLECULE_WORKSPACE_TOKEN_FILE support (mcp_workspace_resolver.py)
Add a third resolution step between the inline env var and the
in-container CONFIGS_DIR fallback. Operators can write the bearer to
a 0600 file (e.g. ~/.config/molecule/token) and point
MOLECULE_WORKSPACE_TOKEN_FILE at it, keeping the secret out of
~/.zsh_history and out of plaintext in MCP-host configs like
~/.claude.json. Inline TOKEN still wins on conflict so rotation flows
are predictable. README documents the safer option as the
recommended path. 6 new tests pin every leg (file resolves, inline
wins, missing/empty file falls through, blank env unset-equivalent,
help text advertises it).
#4 — Push delivery 3-condition gating (README in publish bundle)
Document that real-time push on Claude Code requires (a) the server
to declare experimental.claude/channel (we do), (b) the server to be
marketplace-plugin-sourced (operators must scaffold their own until
the official marketplace lands — see #2934 follow-up), and (c) the
--dangerously-load-development-channels flag on the claude
invocation. Until any of the three is in place, delivery silently
falls back to poll mode with no diagnostic. The README now says all
of this explicitly so a new operator doesn't grep the binary for
channel_enable to figure it out.
#8 — serverInfo.name mismatch (a2a_mcp_server.py)
The server reported `serverInfo.name = "a2a-delegation"` while
operators register it as `molecule` (the name in `claude mcp add
molecule …`). Harmless on tool routing today but matters for any
future Claude Code allowlist that gates push by hardcoded server
name. Renamed to "molecule" with an inline comment explaining the
invariant.
Deferred (separate issues to track)
===================================
#2 — covered transitively by #1's pipx recommendation; no separate fix.
#5 — `moleculesai/claude-code-plugin` marketplace repo (substantial new
repo work; the README references it as a documented follow-up).
#6 — `molecule-mcp doctor` subcommand (substantial new CLI surface;
mentioned in the README's push-vs-poll section as the planned
diagnostic for silent push fallback).
#7 — `--dangerously-load-development-channels` rename — not in our
control; that's Claude Code's flag.
Tests
=====
164/164 mcp_cli + a2a_mcp_server tests pass locally
(WORKSPACE_ID=00000000-0000-0000-0000-000000000001 pytest …) including
6 new TestTokenFileEnv cases. Wheel builds successfully via
scripts/build_runtime_package.py with the new README markers verified
in the output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After RFC #2873 iter 4d extracted messaging tools to
``a2a_tools_messaging.py``, the only behavior left in ``a2a_tools.py``
is ``report_activity`` (covered by test_a2a_tools_impl) plus three
thin wrappers around inbox state — ``tool_inbox_peek``,
``tool_inbox_pop``, ``tool_wait_for_message`` — which were never
directly exercised at the module level.
Per-file critical-path coverage dropped to 54.4% on the iter 4d
branch, breaking the 75% MCP/inbox/auth floor in ci.yml.
Adds ``test_a2a_tools_inbox_wrappers.py`` — 14 focused tests on the
three wrappers covering: inbox-disabled fallback (via the
_INBOX_NOT_ENABLED_MSG sentinel), input validation
(empty/non-str activity_id, non-int peek limit), the timeout clamp
contract on wait_for_message (300s ceiling, 0s floor, non-numeric
fallback to 60s), JSON-shape pinning, and the limit/activity_id
forwarding contract.
Result: a2a_tools.py back to 100% covered with the existing impl-tests
suite, gate green.
Two related fixes to the Connect-External-Agent flow that the user
flagged: the "Need help?" disclosure block in the modal is for the
operator's eyes only — but the agent reading the pasted snippet has
no access to that context. And the docs URL was pointing at a
hostname that doesn't resolve.
User-visible problems:
1. The agent doesn't see the install link, docs link, or the common-
error/check pairs that the human pasted. When the agent fails to
register or hits ConnectionRefused, it can't self-diagnose because
the troubleshooting context lives in a separate UI block.
2. https://docs.molecule.ai → DNS NXDOMAIN. Every "Documentation"
link in the modal was a dead link.
## Fixes
### Move help INTO the snippet (not a separate human-only UI block)
Each of the 7 server-rendered templates in
`workspace-server/internal/handlers/external_connection.go` now
appends a `# Need help?` section with: install link, correct docs
link, and the top common errors as `# • symptom — check` pairs.
Templates updated: curl / channel (Claude Code) / mcp (Universal MCP) /
python / hermes / codex / openclaw. Agents reading the paste now have
the same diagnostic context the human did.
### Drop the duplicated UI block in the canvas modal
`canvas/src/components/ExternalConnectModal.tsx`:
- Removed the `TAB_HELP` per-tab metadata constant (152 lines).
- Removed the `HelpBlock` component (62 lines).
- Removed the `<HelpBlock help={TAB_HELP[tab]} />` render call.
The snippet is now the single source of truth for tab-level help.
### Fix the wrong docs hostname
The actual docs site is `doc.moleculesai.app` (singular `doc`,
`.app` not `.ai`), confirmed by:
- `package.json` description in `Molecule-AI/docs` repo →
"Molecule AI documentation site — doc.moleculesai.app"
- HTTP HEAD on the new URL → 200 for both
`/docs/guides/mcp-server-setup` and
`/docs/guides/external-agent-registration`
- HTTP HEAD on old `docs.molecule.ai` → 000 (NXDOMAIN)
All template docs URLs now point at `doc.moleculesai.app`.
## Verification
- `go build ./...` clean
- `go test ./internal/handlers/... -count=1` green
- `pnpm test` → 1291/1291 pass (unchanged)
- `tsc --noEmit` clean
- 219 LOC removed (canvas duplicate UI), 69 LOC added (snippet help)
- Net `-150 LOC` while gaining the agent-readable help
## Out of scope (deferred, captured in followups)
- One blog post still has `canonical: "https://docs.molecule.ai/blog/..."`
in `src/app/blog/2026-04-20-chrome-devtools-mcp/page.mdx` — separate
blog-content fix.
- Comment in `theme-provider.tsx` references `docs.moleculesai.app`
(with `s`) — comment-only, not a runtime URL.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-reported friction: pip install molecule-ai-workspace-runtime on a
3.10 interpreter fails with "Could not find a version that satisfies the
requirement (from versions: none)" — pip's requires_python filter
silently drops the only available artifact before attempting install,
so the error doesn't mention Python at all. Operators see
"package missing", file a bug, and chase a phantom CDN/visibility
issue.
Two changes mirror the requirement at the two operator-touch surfaces:
1. workspace-server/internal/handlers/external_connection.go:
the externalUniversalMcpTemplate snippet (rendered into the
canvas Connect-External-Agent modal) now leads with a brief
"Requires Python >= 3.11" block + diagnostic + upgrade paths.
2. docs/workspace-runtime-package.md: same callout at the top of
the doc, before the Overview, so anyone landing here from search
gets the answer immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bot lint flagged the two imports as unused (correct — neither is
referenced after the file shrank during review). Resolves the two
unresolved review threads silently blocking merge per the staging
"all conversations resolved" gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iter 4c (#2890) moved tool_commit_memory + tool_recall_memory into
a2a_tools_memory.py, which has its own top-level `import httpx`.
test_mcp_memory.py + the secret-redact memory tests still patched
`a2a_tools.httpx.AsyncClient`, which after the move is the WRONG
module's reference — the real call inside the moved tool resolves to
`a2a_tools_memory.httpx.AsyncClient` and reaches the network. CI
catches this as 7 failures: JSONDecodeError on empty bodies and
"All connection attempts failed" on the recall side.
Update 7 patch sites to `a2a_tools_memory.httpx.AsyncClient`. The
existing tests in `test_a2a_tools_impl.py` were already updated by
the iter-4c PR; only these two files were missed.
Verified: pytest workspace/tests/test_mcp_memory.py +
test_secret_redact.py — 43/43 pass after the fix (both files were
red on the iter-4c branch CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small hardening passes from #2872's optional/important findings,
batched into one polish PR:
1. errors.Is(err, sql.ErrNoRows) instead of err == sql.ErrNoRows.
The bare equality breaks if any future caller wraps the error via
fmt.Errorf("…: %w", err) — the no-rows happy path would fall
through to the "real DB error" branch and abort the import.
errors.Is unwraps. New test
TestLookupExistingChild_WrappedNoRows_TreatedAsNotFound pins the
fix; verified the test fails on the old `==` shape (build break
on unused-import + assertion failure once import dropped).
2. Bounded 5s timeout on lookupExistingChild instead of
context.Background().
The createWorkspaceTree call site runs in goroutines spawned from
the /org/import handler, so plumbing the request context here
would cascade-cancel into provisionWorkspaceAuto and abort
in-flight EC2 provisioning if the client disconnected mid-import
— that's the wrong tradeoff. A short bounded timeout protects the
per-row SELECT against a wedged DB without taking the
drop-everything-on-disconnect behaviour. The lookup is a single
~10ms query; 5s leaves 500x headroom for transient slow paths.
3. Godoc clarifications on the skip-path block.
- /org/import is ADDITIVE-ONLY, never destructive. Children
present in the existing tree but absent from the new template
are preserved (no DELETE on diff).
- Skip-path does NOT propagate updates to existing nodes — a
re-import that adds an initial_memory or schedule to an
existing workspace is silently dropped. Document the limitation
so future operators know to delete-and-re-import or reach for
a future /org/sync route.
Verification:
- go build ./... → clean
- go test ./internal/handlers/... → all passing (TestLookup* +
TestCreateWorkspaceTree* + TestClass1* + TestGate*)
- 4 lookup tests + 1 new wrap-safety test → 5/5 PASS
- Full handlers suite → green
Refs molecule-core#2872 (Optional findings — wrap-safety + ctx, godoc
clarifications for additive-only + skip-path-update-limitation)
Out of scope (deferred):
- PR-D partial unique index migration + ON CONFLICT — sequenced
after Phase 4 cleanup verified clean per #2872 plan
- PR-E full createWorkspaceTree integration test for partial-match
— needs heavier sqlmock scaffolding for downstream
workspaces_audit/canvas_layouts/secrets/channels INSERTs;
follow-up
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-visible problem: agent-comms panel opens mid-conversation on long
histories (the same chat-opens-in-middle bug PR #2903 fixed for
my-chat) and silently renders empty state when the history fetch fails
(no retry button, no diagnostic).
Three changes mirror the my-chat patterns from ChatTab:
1. Initial-mount instant scroll.
Adds hasInitialScrollRef + switches the scroll hook from useEffect
to useLayoutEffect. First arrival of messages → scrollIntoView
`instant`; subsequent appends → `smooth` as before. useLayoutEffect
runs before paint so the user never sees the panel jump for one
frame on every append.
2. Error UI with Retry button.
Adds `loadError` state. The history-load .catch now sets the
error message; a new branch in the render renders a red alert
with the failure text and a Retry button that re-invokes
`loadInitial`. Same shape as ChatTab MyChatPanel's `loadError`
handling — both surfaces should fail loud, not silent.
3. Extracted `loadInitial` callback.
The history-load body becomes a useCallback so the retry button
has a stable reference to call. Mirrors ChatTab's loadInitial.
Tests (4 new in AgentCommsPanel.render.test.tsx):
- Loading state renders the loading copy.
- Error state with Retry button renders on rejection; clicking
Retry fires a second api.get.
- Empty state renders when load succeeds with zero rows.
- scrollIntoView is called with behavior=instant on first message
arrival (pins the chat-opens-in-middle prevention).
Verification:
- pnpm test → 1284/1284 pass (1280 prior + 4 new)
- tsc --noEmit → clean
- 92 → 93 test files, no existing test broken
Closes the parity gap raised in chat. The two surfaces now share:
loading copy / error UI / empty-state placeholder / scroll behaviour /
useLayoutEffect timing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the user-visible flow that Phase 1-5b shipped (RFC #2891):
register a poll-mode workspace, POST a multi-file /chat/uploads, verify
the activity feed shows one chat_upload_receive row per file, fetch the
bytes via /pending-uploads/:fid/content, ack each row, and confirm a
post-ack fetch returns 404. Also pins cross-workspace bleed protection
(workspace B's bearer on A's URL → 401, B's URL with A's file_id →
404) and the file_id-UUID-parse 400 path.
23 assertions, all green against a local platform (Postgres+Redis+
platform-server stack matches the e2e-api.yml CI recipe verbatim).
Why a new script instead of extending test_poll_mode_e2e.sh: that
script tests A2A short-circuit + since_id cursor semantics; this one
tests the chat-upload path. They share zero handler code on the
platform side and would dilute each other's failure messages if
combined.
Why not the bearerless-401 strict-mode assertion: the platform's
wsauth fail-opens for bearerless requests when MOLECULE_ENV=development
(see middleware/devmode.go). The CI workflow doesn't set that var, but
some local-dev .env files do — the assertion would flap by environment
without testing the poll-mode upload contract. The middleware's own
unit tests cover strict-mode 401.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds class1_ast_gate_test.go — a per-package AST walk that fails the
build if any handler function INSERTs INTO workspaces inside a range
loop body without one of three escape hatches:
1. A call to a registered preflight helper (lookupExistingChild today;
extend preflightCallNames as new helpers are introduced).
2. An ON CONFLICT clause in the same SQL literal (idempotent UPSERT,
like registry.go).
3. An explicit `// class1-gate: idempotent-by-design` comment in the
function body (deliberately awkward — forces a code-review beat).
Why this is broader than the existing
TestCreateWorkspaceTree_CallsLookupBeforeInsert gate in
org_import_idempotency_test.go: that one is hard-coded to one function
in one file. This one walks every non-test .go file in the handlers
package and applies a structural rule independent of file/function
names. A future handler written from scratch in a new file would not
have been covered before — now it is.
Detection mechanism (per AST):
- Collect spans (Lbrace..Rbrace) of every RangeStmt body in each
function. Position-based instead of stack-based — ast.Inspect's
nil-callback ordering doesn't give per-node pop semantics, so a
naive push/pop stack silently miscounts. Position spans are
deterministic.
- Walk every BasicLit, regex-match `^\s*INSERT INTO workspaces\(`
(tightened from bytes.Index "INSERT INTO workspaces" so
workspaces_audit literals don't false-positive — same regex used
by the existing createWorkspaceTree gate).
- For each match: record insertLine, hasONCONFLICT, and the
innermost enclosing RangeStmt line (or 0 if not inside any range).
- Fail the function if INSERT is inside a range AND no preflight
AND no ON CONFLICT AND no allowlist annotation.
Self-tests (per `feedback_assert_exact_not_substring.md` —
verify gate fails on the bug shape before merging):
- TestClass1_GateFiresOnSyntheticBuggySource: synthetic source
where INSERT is inside `for _, child := range children` body
must trigger the gate's three guards (enclosingRangeLine!=0,
hasONCONFLICT=false, no preflight call).
- TestClass1_GateAllowsONCONFLICT: synthetic INSERT...ON CONFLICT
must NOT trigger the gate (idempotent UPSERT case).
- TestClass1_GateAllowsAllowlistAnnotation: function with
`// class1-gate: idempotent-by-design` must be skipped.
- TestClass1_NoUnpreflightedInsertInsideRange: production sweep
over every handler .go file. Currently passes because
org_import.go preflights, registry.go ON-CONFLICTs, and
workspace.go's Create has no INSERT inside a range body.
Verification:
- go test ./internal/handlers/... -run TestClass1_ -count=1
→ 4/4 PASS
- go test ./internal/handlers/... -count=1 → suite green
(no pre-existing test broken by the new file)
Refs molecule-core#2867 (PR-A Class 1 generic AST gate)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2906 spawned the sidecar unconditionally on every tenant boot. The
plugin's first migration runs \`CREATE EXTENSION vector\` which fails
on tenant Postgres without pgvector preinstalled — every staging
tenant redeploy aborted at the 30s health gate. CP fail-fast kept
running tenants on the prior image (no outage), but the new image
was DOA.
Caught on staging redeploy 2026-05-05 19:23 with
\`pq: extension "vector" is not available\`.
Fix: only spawn the sidecar when the operator has flipped the cutover
flag — \`MEMORY_V2_CUTOVER=true\` OR \`MEMORY_PLUGIN_URL\` is set.
* Aligns the entrypoint to the same opt-in posture wiring.go already
uses (it skips building the client when MEMORY_PLUGIN_URL is empty).
* Until cutover, the sidecar isn't even running — no migration, no
health gate, no boot-time pgvector dependency.
* Operators activating cutover already redeploy with the new env
vars set; that's when the sidecar starts. By definition they've
verified pgvector is available before flipping.
* MEMORY_PLUGIN_DISABLE=1 escape hatch preserved; harness fix#2915
becomes belt-and-suspenders (still respected).
Both Dockerfile and entrypoint-tenant.sh updated. Behavior change for
existing deployments: zero (cutover env vars still unset → sidecar
still inert, but now also not running).
Refs RFC #2728. Hotfix for #2906; supersedes the migration-path
fragility class (the sidecar isn't doing migrations on tenants that
won't use it).
The deadline contract was incomplete: wait_all logged the timeout but
close() then called executor.shutdown(wait=True), which blocked on
the leaked workers — undoing the user-facing timeout. The inbox poll
loop would stall indefinitely on a hung /content fetch instead of
returning to chat-message processing.
Fix: wait_all now flips self._timed_out and cancels queued (not-yet-
started) futures; close() reads that flag and switches to
shutdown(wait=False, cancel_futures=True) on the timeout path.
Currently-running workers can't be interrupted by Python's threading
model, but they're now detached daemons whose blocking httpx call
no longer gates the next poll.
Healthy path (no timeout) keeps the existing drain-and-wait so a
still-queued ack POST isn't dropped mid-write.
Two new tests pin both legs of the contract end-to-end:
- close-after-timeout-doesn't-block: hung worker, wait_all(0.05s)
fires the timeout, close() returns in <1s instead of waiting ~5s
for the worker to come back.
- close-without-timeout-still-drains: 2 slow workers, wait_all
completes cleanly, close() drains both ack POSTs.
Resolves the BatchFetcher timeout-cancellation finding from the
post-merge five-axis review of Phase 5b.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds internal/provlog with a single Event(name, fields) helper that
emits JSON-tagged single-line records to the standard logger. Five
boundary sites instrumented for #2867:
provision.start — workspace_dispatchers.go (sync + async)
provision.skip_existing — org_import.go idempotency hit
provision.ec2_started — cp_provisioner.go after RunInstances
provision.ec2_stopped — cp_provisioner.go after TerminateInstances ack
restart.pre_stop — workspace_restart.go before Stop dispatch
These pair with the existing human-prose log.Printf lines (kept). The
new records are grep+jq friendly so a future log-aggregation pipeline
can reconstruct per-workspace provision timelines without parsing the
operator messages — this is the "and debug loggers so it dont happen
again" half of the leak-prevention work.
Tests:
- provlog: emits evt-prefixed JSON, nil-tolerant, marshal-error
fallback preserves event boundary, single-line output pinned.
- handlers: provlog_emit_test.go pins three call-site contracts:
provisionWorkspaceAutoSync emits provision.start with sync=true,
stopForRestart emits restart.pre_stop with backend=cp on SaaS,
and backend=none when both backends are nil.
Field taxonomy is convenience for ops, not contract — payload can grow
additively without breaking callers. Behavior gate is the event name +
boundary location, per feedback_behavior_based_ast_gates.md.
Refs #2867 (PR-D structured logging at provisioning boundaries)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to molecule-controlplane#485. The first half of #2913 wired
a Sign-out button + signOut() helper that POSTed /cp/auth/signout, but
clicking still left the user signed in: WorkOS's browser cookie
preserved the SSO session, /cp/auth/login auto-re-authed via SSO, and
the user landed back on /orgs.
CP PR #485 returns the AuthKit hosted logout URL in the signout
response. This change has signOut() navigate the browser there
instead of /cp/auth/login. AuthKit clears its cookie + redirects to
return_to (configured server-side from APP_URL) → next /cp/auth/login
hits a fresh AuthKit, no SSO session, login form actually shows.
Defensive parsing: malformed JSON, missing logout_url, or wrong-type
logout_url all fall through to the legacy /cp/auth/login fallback,
which works locally (DisabledProvider, dev) where there's no SSO to
escape.
Forward-compat: when CP doesn't have #485 deployed yet, signOut()
sees logout_url="" or missing → fallback fires. Order of merge
between this and #485 doesn't matter, but the bug isn't actually
fixed end-to-end until both ship.
Tests added (3 new, 15 total auth.test.ts):
- Hosted logout: navigates to logout_url when response includes one.
- DisabledProvider path: falls back to /cp/auth/login when "".
- Defensive: malformed JSON body → fallback (no crash).
- Defensive: non-string logout_url → fallback (no open redirect).
Verified:
- npx vitest run src/lib/__tests__/auth.test.ts — 15/15 pass
- tsc --noEmit clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported externally on 2026-05-05: "SaaS app logout does not work."
Root cause: the control plane has had POST /cp/auth/signout (clears the
WorkOS session cookie + revokes at the provider) since auth shipped,
but no canvas code ever called it. grep across canvas/ for
`logout|signOut|signout|sign-out` returned zero results — no helper,
no button, no menu entry. Users had no path to log out short of
clearing cookies in DevTools.
This is a UI gap, not a backend bug. Adding the missing pieces:
1. `signOut()` helper in `canvas/src/lib/auth.ts`:
- POST /cp/auth/signout with credentials:include (cross-origin
cookie required for tenant subdomain → app subdomain)
- Best-effort: a 5xx, 401-stale-cookie, or network failure still
redirects the browser to /cp/auth/login. Leaving the user on an
authed-looking page after they clicked Sign out is the worst
possible UX — that's the precise "logout doesn't work" symptom
the report described.
- Lands on /cp/auth/login (not the current URL) so the user
doesn't loop back into the org they just left via AuthGate's
return_to.
2. `AccountBar` component on /orgs page Shell — renders the signed-in
email + Sign-out button at the top. Click → signOut() →
`Signing out…` → bounces to login. Disabled-while-pending so a
double-click can't fire two requests.
3. Tests in `auth.test.ts` (4 new, total 12 pass):
- POSTs to the right endpoint with credentials:include
- Redirects to /cp/auth/login after success
- Redirects EVEN ON network failure (the critical UX invariant)
- Redirects on 401 (stale cookie path)
The auth-origin resolution (`getAuthOrigin`) is reused so a tenant
subdomain (acme.moleculesai.app) correctly POSTs to
app.moleculesai.app/cp/auth/signout — same chain that fetchSession
+ redirectToLogin already use.
Test plan:
- [x] `npx vitest run src/lib/__tests__/auth.test.ts` — 12/12 green
- [x] `tsc --noEmit` — clean
- [ ] Manual: navigate to /orgs, click Sign out, observe redirect +
that the next /orgs visit bounces to login (cookie cleared)
- [ ] CI green
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-model retrospective review of #2856 (Phase 1 Expand removal)
flagged that TeamHandler.Collapse is unreachable from the canvas UI:
the "Collapse Team" button calls PATCH /workspaces/:id { collapsed }
(visual flag toggle on canvas_layouts), NOT POST /workspaces/:id/collapse.
The destructive POST route — which stops EC2s, marks children removed,
and deletes layouts — has zero UI callers (verified via grep across
canvas/, scripts/, and the MCP tool registry; only docs referenced it).
Two semantically different operations had been sharing the word
"Collapse":
- Visual collapse (canvas) → PATCH { collapsed: true }. Hides
children visually. Reversible. UI-only.
- Destructive collapse (POST /collapse) → Stops + marks removed.
Irreversible. No caller.
Deleting the destructive one + its supporting machinery:
- workspace-server/internal/handlers/team.go (entirely)
- workspace-server/internal/handlers/team_test.go (entirely)
- POST /collapse route + teamh init in router.go
- findTemplateDirByName helper (zero non-test callers after Expand
was deleted in #2856; package-private so no out-of-package consumers)
- NewTeamHandler constructor (no callers after route removed)
Plus stale doc references (the most dangerous was the MCP wrapper
mapping in mcp-server-setup.md — anyone generating MCP tool wrappers
from that table was wiring a 404):
- docs/agent-runtime/team-expansion.md (deleted entirely — whole
guide taught the deleted flow)
- docs/api-reference.md (dropped two team.go rows)
- docs/api-protocol/platform-api.md (dropped /expand + /collapse
rows)
- docs/architecture/molecule-technical-doc.md (dropped /expand +
/collapse rows)
- docs/guides/mcp-server-setup.md (dropped expand_team +
collapse_team MCP wrapper mappings)
- docs/glossary.md (dropped "(org template expand_team)"
parenthetical)
- docs/frontend/canvas.md (dropped broken link to deleted
team-expansion.md)
Kept: docs/architecture/backends.md mention of "TeamHandler.Expand
(#2367) bypassed routing on Start" — correct historical context for
the AST gate's existence, no live route reference.
Visual-collapse path unaffected:
canvas/src/components/ContextMenu.tsx:227 → api.patch — unchanged
canvas/src/components/WorkspaceNode.tsx:128 → api.patch — unchanged
go vet ./... clean. go test ./internal/handlers/ -count 1 — all green
(4.3s, no regression).
Net: -388/+10 = ~378 lines removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2906 shipped the binary at /memory-plugin without the migrations
directory. The plugin's runMigrations() resolved a relative path
\`cmd/memory-plugin-postgres/migrations\` that exists in the build
context but NOT in the runtime image. Every staging tenant boot
failed with:
memory-plugin-postgres: migrate: read migrations dir
"cmd/memory-plugin-postgres/migrations": open
cmd/memory-plugin-postgres/migrations: no such file or directory
memory-plugin: ❌ /v1/health never returned 200 after 30s
— aborting boot
Caught on the staging redeploy fleet job after #2906 merged. Tenants
stayed on the old image (CP redeploy correctly fail-fasted) but the
new image was broken.
Fix: \`//go:embed migrations/*.up.sql\` bundles the migrations into
the binary at build time. No filesystem path dependency at runtime.
* \`embed.FS\` embeds the .up.sql files alongside the binary.
* runMigrations() reads from migrationsFS by default;
MEMORY_PLUGIN_MIGRATIONS_DIR override path preserved for operators
shipping custom migrations.
* Names sorted alphabetically — pinned by a test so a future
\`002_*.up.sql\` is guaranteed to run after \`001_*.up.sql\`.
Tests:
* TestMigrationsEmbedded_ContainsCreateTable — pins that the embed
pattern matched files AND those files contain CREATE TABLE
(catches both empty-pattern and wrong-files-embedded).
* TestRunMigrationsFromEmbed_OrderingIsAlphabetic — pins sorted
application order.
Verified locally: \`go build\` succeeds, binary 9.3MB,
\`strings\` shows the embedded SQL.
Refs RFC #2728. Hotfix for #2906.
Lint nit from review bot — _drain_uploads() runs and the function
immediately advances to the cursor save + return, so the local
re-assign is dead code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_start_poller_thread_is_daemon spawned a daemon thread with no stop
mechanism — the leaked thread polled every 10ms with the test's patched
httpx.Client mock STILL ACTIVE for ~50ms after the test scope. Later
tests that re-patched httpx.Client + asserted call counts on
fetch_and_stage / Client construction got their assertions inflated by
the leaked thread's iterations.
Symptoms: test_poll_once_skips_chat_upload_row_from_queue saw
fetch_and_stage called twice instead of once on Python 3.11 CI;
test_batch_fetcher_owns_client_when_not_supplied saw two Client
constructions instead of one in the full local suite. Both surfaced
only after Phase 5b's BatchFetcher refactor changed the timing window
that allowed the leaked thread to fire mid-test.
Fix: extend start_poller_thread with an optional stop_event kwarg
(backward compatible — production callers pass None and rely on the
daemon flag for process-exit cleanup). The test now signals + joins
on stop_event before exiting scope, so the thread is gone before any
later test patches httpx.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2906 bundled memory-plugin-postgres as a startup-gated sidecar in
both tenant entrypoints. Plugin migrations include
\`CREATE EXTENSION IF NOT EXISTS vector\` which fails on the harness's
plain postgres:15-alpine (no pgvector preinstalled). The 30s health
gate then aborts container boot and Harness Replays fails.
Detected on auto-promote PR #2914 — Harness Replays job:
Container harness-tenant-alpha-1 Error
Container harness-tenant-beta-1 Error
dependency failed to start: container harness-tenant-alpha-1 exited (1)
The harness doesn't exercise memory features, so the simplest fix is
to use the documented escape hatch the sidecar entrypoint already
ships (MEMORY_PLUGIN_DISABLE=1) — applied to both alpha and beta
tenants in compose.yml. Alternative would be switching the harness
postgres images to pgvector/pgvector:pg15, deferred until the
harness wants to verify memory paths.
Refs PR #2906. Unblocks #2914 (auto-promote staging→main).
Multi-model retrospective review of #2901 found three Critical gaps:
1. (#2910 PR-B) template_import.go:79 wrote `tier: 3` hardcoded into
generated config.yaml. On SaaS this defeated the T4 default at the
create-handler layer — a config-less template import landed at T3
regardless of POST /workspaces' computed default. The 4th
default-tier site #2901 missed.
2. (#2910 PR-A) #2901 claimed `go test ... all green` but added zero
new tests. Existing structural-pin tests caught dispatch-layer
drift but said nothing about tier-default drift. A future refactor
that flips DefaultTier() to always return 3 would ship green.
3. (#2910 PR-E) org_import.go fallback returned T2 on self-hosted
while workspace.go returned T3. Internally consistent ("bulk vs
interactive defaults") but undocumented same-name-different-value
drift.
Fix:
- TemplatesHandler.NewTemplatesHandler now takes `wh *WorkspaceHandler`
(nil-tolerant for read-only callers). Import + ReplaceFiles compute
tier via h.wh.DefaultTier() and pass it to generateDefaultConfig.
generateDefaultConfig gets a `tier int` parameter (bounds-checked,
invalid input falls back to T3).
- org_import.go fallback lifts to h.workspace.DefaultTier() — single
source of truth shared with Create + Templates so a future
tier-default change sweeps every entry point at once.
- New saas_default_tier_test.go pinning:
TestIsSaaS_TrueWhenCPProvWired
TestIsSaaS_FalseWhenOnlyDocker
TestDefaultTier_SaaS_IsT4
TestDefaultTier_SelfHosted_IsT3
TestGenerateDefaultConfig_RespectsTierParam
TestGenerateDefaultConfig_SelfHostedTierT3
TestGenerateDefaultConfig_OutOfRangeFallsBackToT3
- Existing template_import_test.go tests + chat_files_test.go +
security_regression_test.go updated to thread the new tier param /
wh constructor arg through their NewTemplatesHandler calls. Their
pre-#2910 assertion of `tier: 3` is preserved (now passes because
the test caller passes `3` explicitly), so no regression.
go vet ./... clean. go test ./internal/handlers/ -count 1 — all
green (4.2s).
Deferred to separate follow-ups (per #2910 plan):
- PR-C: MOLECULE_DEPLOYMENT_MODE explicit deployment-mode signal
(closes the IsSaaS()=cpProv!=nil structural fragility)
- PR-D: Host iptables IMDS block + IMDSv2 hop-limit (paired with
molecule-controlplane EC2-IAM-scope audit)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of PR #2906 flagged: defaultListenAddr was ":9100" — binds
on every container interface. Inside today's deployment that's moot
(no host port mapping, platform talks over loopback) but it's not
least-privilege. A future Dockerfile edit that publishes the port,
a misconfigured Fly machine, or a future cross-host plugin topology
would expose an unauth'd memory store.
Loopback is the right baseline. Operators with a multi-host topology
already override via MEMORY_PLUGIN_LISTEN_ADDR — that path is unchanged.
Tests:
* TestLoadConfig_DefaultListenAddrIsLoopback pins the new default.
* TestLoadConfig_ListenAddrEnvOverride pins the override path so
operators relying on it don't break.
* TestLoadConfig_MissingDatabaseURL covers the existing fail-fast.
No prior unit tests existed for loadConfig — boot_e2e_test.go always
sets MEMORY_PLUGIN_LISTEN_ADDR explicitly, so the default was never
exercised by tests. This PR adds that coverage.
Refs RFC #2728. Hardening follow-up to PR #2906.
Resolves the two remaining findings from the Phase 1-4 retrospective
review (the Python-side counterparts to phase 5a):
1. Important — inbox_uploads.fetch_and_stage blocked the inbox poll
loop synchronously per row. A user dragging 4 files into chat at
once would stall the poller for 4× per-fetch latency before the
chat message reached the agent. Add BatchFetcher: a thread-pool
wrapper (default 4 workers) that submits fetches concurrently and
exposes wait_all() as the barrier the inbox loop calls before
processing the chat-message row that references the uploads.
The drain barrier is the correctness invariant: rewrite_request_body
must observe a populated URI cache when it walks the chat-message
row's parts. _poll_once now drains the BatchFetcher inline before
the first non-upload row, AND at end-of-batch (case: batch contains
only upload rows; the corresponding chat message arrives in a later
poll, but the future-poll-races-current-fetch race is closed).
2. Nit — fetch_and_stage created two httpx.Client instances per row
(one for GET /content, one for POST /ack). Refactor so a single
client serves both calls. When called from BatchFetcher, the
batch-shared client serves every row's GET + ack — so the second
fetch reuses the TCP+TLS handshake from the first.
Comprehensive tests:
- 13 new inbox_uploads tests:
- fetch_and_stage with supplied client: zero httpx.Client
constructions, GET+POST through the same client, caller's client
not closed (lifecycle owned by caller).
- fetch_and_stage without supplied client: exactly one
httpx.Client constructed (was 2 pre-fix), closed on the way out.
- BatchFetcher: 3 rows × 120ms = parallel completion < 250ms
(vs. ~360ms serial), URI cache hot when wait_all returns,
per-row failure isolation, single-client reuse across all
submits, idempotent close, submit-after-close raises,
owned-vs-supplied client lifecycle, no-op wait_all on empty
batch, graceful httpx-missing degradation.
- 3 new inbox tests:
- poll_once drains uploads before processing the chat-message row
(in-place mutation of row['request_body'] proves the URI was
rewritten BEFORE message_from_activity returned).
- poll_once with only upload rows still drains at end-of-batch.
- poll_once with no upload rows never constructs a BatchFetcher
(zero overhead on the no-upload happy path).
133 total inbox + inbox_uploads tests pass; 0 regressions.
Closes the chat-upload poll-mode-perf gap end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds TestINSERTworkspacesAllowlist: walks every non-test .go in this
package, finds funcs containing an `INSERT INTO workspaces (` SQL
literal, and pins the result against an explicit allowlist with the
safety mechanism named per entry.
New entries fail the build until a reviewer adds them — forcing the
question "what makes this INSERT idempotent?" at PR-review time, not
after the next bulk-create leak (the shape that produced 72 stale
child workspaces in tenant-hongming over 4 days).
Pairs with TestCreateWorkspaceTree_CallsLookupBeforeInsert (the
behavior pin for the one bulk path today). Together:
- this test catches "did a new function start inserting?"
- that test catches "did the existing bulk path drop its idempotency check?"
Both fire immediately when drift happens.
Current allowlist (3 entries):
- org_import.go:createWorkspaceTree → lookup-then-insert via
lookupExistingChild (#2868 phase 3, also pinned by the sibling AST
gate from #2895)
- registry.go:Register → ON CONFLICT (id) DO UPDATE (idempotent by
primary key — external workspace upsert)
- workspace.go:Create → single-workspace POST /workspaces, server-
generated UUID, no iteration
Verified via mutation: dropping a synthetic tempBulkLeakTest with an
unsafe loop+INSERT into the package fails the gate with a clear
diagnostic pointing at the file + function. Restoring the tree
returns the gate to green.
Memory: feedback_assert_exact_not_substring.md (verify tightened test
FAILS on bug shape) — mutation proof done locally.
RFC #2867 class 1. Class 2 (Prometheus gauge for ec2_instance
duplicates) + class 3 (structured logging on workspace create) are
follow-up PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves four of six findings from the retrospective code review of Phases
1–4 (poll-mode chat upload). Bundled because every change is in the
platform's pending_uploads layer or the multi-file handler that reads it.
Findings resolved:
1. Important — Sweep query lacked an index for the acked-retention OR-arm.
The Phase 1 partial indexes are both `WHERE acked_at IS NULL`, so the
`(acked_at IS NOT NULL AND acked_at < retention)` half of the WHERE
clause seq-scanned the table on every cycle. Add a complementary
partial index on `acked_at WHERE acked_at IS NOT NULL` so both arms
of the disjunction are index-covered. Disjoint from the existing two
indexes (no row matches both predicates), so write amplification is
bounded to ~one index entry per terminal-state row.
2. Important — uploadPollMode partial-failure left orphans. The previous
per-file Put loop committed rows 1..K-1 and then errored on row K with
no compensation, so a client retry would double-insert the survivors.
Refactor the handler into three explicit phases (pre-validate +
read-into-memory, single atomic PutBatch, per-file activity row) and
add Storage.PutBatch with all-or-nothing transaction semantics.
3. FYI — pendinguploads.StartSweeperWithInterval was exported only for
tests. Move it to lower-case startSweeperWithInterval and expose the
test seam through pendinguploads/export_test.go (Go convention; the
shim file is stripped from the production binary at build time).
4. Nit — multipart Content-Type was passed verbatim into pending_uploads
rows and re-served on /content. Add safeMimetype which strips
parameters, rejects CR/LF/control bytes, and coerces malformed shapes
to application/octet-stream. The eventual GET /content response can no
longer be header-split via a crafted Content-Type on the multipart.
Comprehensive tests:
- 10 PutBatch unit tests (sqlmock): happy path, empty input, all four
pre-validation rejection paths, BeginTx error, per-row error +
Rollback (no Commit), first-row error, Commit error.
- 4 new PutBatch integration tests (real Postgres): all-rows-commit
happy path with COUNT(*) verification, atomic-rollback no-leak via
a NUL-byte filename that lib/pq rejects mid-batch, oversize
short-circuit no-Tx, idx_pending_uploads_acked existence + partial
predicate via pg_indexes (planner-shape-independent).
- 3 new chat_files_poll tests: atomic rollback on second-file oversize,
atomic rollback on PutBatch error, mimetype CRLF/NUL/parameter
sanitization (8 sub-cases).
The two remaining review findings (inbox_uploads.fetch_and_stage blocks
the poll loop synchronously; two httpx Clients per row) are Python-side
and ship in Phase 5b once this lands on staging.
Test-only export pattern via export_test.go, atomic pre-validation
discipline (validate before Tx), and behavior-based (not name-based)
test assertions follow the standing project conventions.
Closes the gap between the merged Memory v2 code (PR #2757 wired the
client into main.go) and operator activation. Without this PR an
operator wanting to flip MEMORY_V2_CUTOVER=true had to provision a
separate memory-plugin service and point MEMORY_PLUGIN_URL at it —
extra ops surface for what the design intends to be a built-in.
What ships:
* Both Dockerfile + Dockerfile.tenant build the
cmd/memory-plugin-postgres binary into /memory-plugin.
* Entrypoints spawn the plugin in the background on :9100 BEFORE
starting the main server; wait up to 30s for /v1/health to return
200; abort boot loud if it doesn't (better to crash-loop than to
silently route cutover traffic against a dead plugin).
* Default env: MEMORY_PLUGIN_DATABASE_URL=$DATABASE_URL (share the
existing tenant Postgres — plugin's `memory_namespaces` /
`memory_records` tables coexist with platform schema, no
conflicts), MEMORY_PLUGIN_LISTEN_ADDR=:9100.
* MEMORY_PLUGIN_DISABLE=1 escape hatch for operators running the
plugin externally on a separate host.
* Platform image: plugin runs as the `platform` user (not root) via
su-exec — matches the privilege boundary the main server already
drops to. Tenant image already starts as `canvas` so the plugin
inherits non-root automatically.
What stays operator-controlled:
* MEMORY_V2_CUTOVER is NOT auto-set. Behavior change for existing
deployments: zero. The wiring at workspace-server/internal/memory/
wiring/wiring.go skips building the plugin client until the
operator opts in, so the running sidecar is a no-op for traffic
until then.
* MEMORY_PLUGIN_URL is NOT auto-set either, for the same reason —
setting it implies cutover-active intent. Operators set both on
staging first, verify a live commit/recall round-trip (closes
pending task #292), then promote to production.
Operator activation steps after this PR ships:
1. Verify pgvector extension is available on the target Postgres
(the plugin's first migration runs CREATE EXTENSION IF NOT
EXISTS vector). Railway's managed Postgres ships pgvector
available; some self-hosted operators may need to enable it.
2. Redeploy the workspace-server with this image.
3. Set MEMORY_PLUGIN_URL=http://localhost:9100 + MEMORY_V2_CUTOVER=true
in the environment (staging first).
4. Watch boot logs for "memory-plugin: ✅ sidecar healthy" and the
wiring.go cutover messages; do a live commit_memory + recall_memory
round-trip via the canvas Memory tab to verify.
5. Promote to production once staging holds for a sweep window.
Refs RFC #2728. Closes the dormant-plugin gap noted in task #294.
Reported: agents receiving messages via inbox_peek / wait_for_message
get a plain envelope — text + peer_id + kind only. The push-path
(a2a_mcp_server._build_channel_notification) already enriches the
meta dict with peer_name, peer_role, and agent_card_url from the
registry cache, but the poll-path returns InboxMessage.to_dict()
unchanged. So a Claude Code host with channel-push gets the friendly
identity, but every other MCP client (and Claude Code with push
disabled — the universal default) sees plain text.
This silently breaks the contract documented in
a2a_mcp_server.py:303-345:
> In both paths the same fields apply: kind, peer_id, peer_name,
> peer_role, agent_card_url, activity_id
Fix: a2a_tools._enrich_inbound_for_agent() — same shape as the
push-path's enrichment, called from tool_inbox_peek and
tool_wait_for_message. Cache-first non-blocking (5-min TTL via
enrich_peer_metadata_nonblocking, same helper push uses), so a cache
miss returns immediately with bare envelope and warms the cache for
the next poll. agent_card_url is constructable from peer_id alone
and surfaces even on cache miss, so the receiving agent always has
a single endpoint to hit for capabilities.
Degradation paths:
- canvas_user (peer_id="") → pass through unchanged, no enrichment
- a2a_client unavailable (test harness without registry) → bare
envelope, agent still gets text + peer_id + kind + activity_id
Tests:
- canvas_user passes through unchanged
- peer_agent cache hit → name + role + agent_card_url all present
- peer_agent cache miss → agent_card_url still constructed
- a2a_client unavailable → bare envelope, no crash
All 4 pass against fixed code. Without the fix, the cache-hit and
cache-miss tests would fail (peer_name/peer_role/agent_card_url keys
absent from to_dict's output).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported: "right now when chat box opens it opens in the middle, but
it should be at the end of conversation."
Root cause: ChatTab.tsx:548 fires `bottomRef.scrollIntoView({ behavior:
"smooth" })` on every messages-update. On initial mount with N
messages already loaded, the smooth-scroll triggers a ~300ms animation
that any concurrent React re-render (agent push landing, theme
toggle, sidepanel resize) interrupts mid-flight, leaving the user
stuck somewhere in the middle of the conversation.
Fix: track first-mount via hasInitialScrollRef. Use behavior:"instant"
for the initial jump (deterministic, no animation interruption), then
smooth for subsequent appends (the new-message-landing visual stays).
Refs flipped on first messages.length > 0 transition, so:
- Initial open of chat tab: instant jump to bottom ✓
- New agent message arrives: smooth scroll into view ✓
- Workspace switch (ChatTab remounts): fresh hasInitialScrollRef, gets
instant again ✓
- loadOlder prepend: anchor-restore path unchanged, still pins user's
reading position ✓
Test plan:
- pnpm test --run ChatTab.lazyHistory.test.tsx → 8 pass (existing
lazy-history tests untouched)
- npx tsc --noEmit clean
- Manual on hongming.moleculesai.app: open a busy chat (mac laptop,
~50 messages), confirm view lands at the latest bubble, not mid-
scroll. Switch to another workspace + back → instant again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestStartSweeper_RecordsMetricsOnError flaked on every CI rerun under
race detection: `error counter delta = 0, want 1`. Root cause is a
race between two goroutines, not a bug in the production sweeper.
The fake `fakeSweepStorage.Sweep` signals `cycleDone` from inside its
deferred return — that happens BEFORE Sweep's return value is
received by `sweepOnce`, which is what triggers the metric increment.
On slow CI hosts the test goroutine wins the read after `waitForCycle`
unblocks and BEFORE StartSweeper's goroutine has called
`metrics.PendingUploadsSweepError`, so the asserted delta is 0 even
though the metric WILL be 1 a few ms later.
Adds a polling assert helper, `waitForMetricDelta`, that closes the
race deterministically without timing-based sleeps:
- TestStartSweeper_RecordsMetricsOnError uses waitForMetricDelta to
wait for the error counter to settle at 1.
- TestStartSweeper_RecordsMetricsOnSuccess uses it on the success
counters (acked, expired) so the error-stayed-zero assertion
reads after StartSweeper has fully processed the cycle.
- waitForCycle keeps its current shape but documents the caveat in
its comment so future tests don't repeat the assumption.
Verified: `go test ./internal/pendinguploads/ -race -count 5` passes
all 9 tests across 5 iterations cleanly.
Per memory feedback_question_test_when_unexpected.md: the
"delta=0, want=1" failure looked like a real production bug at first
glance, but instrumented inspection showed the metric DOES increment,
just AFTER the test's read. The fix is the test's wait shape, not
the sweeper.
Unblocks every PR currently broken by this flake (#2898 hit it on
two consecutive CI runs; staging-merged PRs from earlier today
(#2877/#2881/#2885/#2886) introduced the test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported every SaaS workspace defaults to T2 (Standard). Three
sites quietly disagreed on the default:
- canvas CreateWorkspaceDialog (line 126): isSaaS ? 4 : 3 ← only correct one
- canvas EmptyState "Create blank": tier: 2 ← hardcoded
- workspace.go POST /workspaces: tier = 3 ← not SaaS-aware
- org_import.go createWorkspaceTree: tier = 2 (fallback)← not SaaS-aware
So a user clicking "+ New Workspace" via the dialog got T4 on SaaS,
but a user clicking "Create blank" on the empty canvas got T2, and an
agent POSTing /workspaces directly got T3. Same tenant, three different
tiers depending on entry point.
Fix:
1. WorkspaceHandler.IsSaaS() and DefaultTier() helpers (workspace_dispatchers.go).
IsSaaS() := h.cpProv != nil — single source of truth for "are we
SaaS" across the file. DefaultTier() returns 4 on SaaS, 3 on
self-hosted. SaaS rationale: each workspace runs on its own sibling
EC2 so the per-workspace tier boundary is a Docker resource limit
on the only container present — no neighbour to protect from. T4
matches the boundary.
2. workspace.go now defaults tier via h.DefaultTier() instead of
hardcoded T3.
3. org_import.go fallback (when neither ws.tier nor defaults.tier set)
becomes SaaS-aware: T4 on SaaS, T2 on self-hosted (preserve the
existing safe-shared-Docker-daemon default for self-hosted org
imports).
4. canvas EmptyState "Create blank" stops sending tier:2 in the body
and lets the backend pick — single source of truth in the backend.
Eliminates the third disagreement.
Test plan:
- go vet ./... clean
- go test ./internal/handlers/ -count 1 — all green (4.3s)
- npx tsc --noEmit on canvas — clean
- Staging E2E (after deploy): create a fresh workspace via canvas
empty-state on hongming.moleculesai.app, confirm tier=4 on the
workspace details panel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inherits the iter 4b test retarget commit through rebase. Adds the
remaining 4 patch sites in test_a2a_multi_workspace.py that target
get_peers_with_diagnostic — that call site moved from a2a_tools to
a2a_tools_messaging in this PR.
Refs RFC #2873 iter 4d.
Fourth slice of the a2a_tools.py split (stacked on iter 4c). Owns the
four human-and-peer messaging MCP tools + the chat-upload helper:
* _upload_chat_files — stage local paths to /chat/uploads
* tool_send_message_to_user — push canvas-chat via /notify
* tool_list_peers — discover peers across registered workspaces
* tool_get_workspace_info — JSON-encode workspace info
* tool_chat_history — fetch prior conversation rows with a peer
a2a_tools.py shrinks from 508 → 213 LOC (−295). The remaining 213
is just report_activity + back-compat re-exports. Inbox tools
(tool_inbox_peek/pop/wait_for_message) deferred to iter 4e.
Layered architecture: messaging depends on a2a_tools_rbac (iter 4a),
a2a_client, platform_auth — NOT on kitchen-sink a2a_tools. An
import-contract test pins this so future refactors that add
`from a2a_tools import …` fail in CI.
Tests:
* 28 patch sites in TestToolSendMessageToUser + TestToolListPeers +
TestToolGetWorkspaceInfo + TestChatHistory retargeted from
`a2a_tools.{httpx, get_peers_*, get_workspace_info,
_upload_chat_files, _peer_*, list_registered_workspaces}` to
`a2a_tools_messaging.…` because the call sites moved.
* test_a2a_tools_messaging.py adds 7 new tests:
- 5 alias drift gates
- 2 import-contract tests (no top-level a2a_tools dep + a2a_tools
surfaces every messaging symbol)
137 tests total in the a2a_tools suite, all green.
Refs RFC #2873.
Third slice of the a2a_tools.py split (stacked on iter 4b). Owns the
two persistent-memory MCP tools:
* tool_commit_memory — write to /workspaces/:id/memories with RBAC
+ GLOBAL-scope tier-zero enforcement
* tool_recall_memory — search /workspaces/:id/memories with RBAC
a2a_tools.py shrinks from 609 → 508 LOC (−101). Both handlers depend
ONLY on a2a_tools_rbac (iter 4a), a2a_client, and the platform's
/memories endpoint — no entanglement with delegation or messaging.
Side-effects of the layered architecture: a2a_tools_memory's import
contract is "depends on a2a_tools_rbac, never on a2a_tools" — the
kitchen-sink module is for back-compat re-exports only. A test pins
this so a future refactor that re-introduces `from a2a_tools import …`
fails in CI.
Tests:
* 49 patch sites in TestToolCommitMemory + TestToolRecallMemory
retargeted from `a2a_tools.{_check_memory_*, _is_root_workspace,
httpx.AsyncClient}` to `a2a_tools_memory.…` because the call sites
moved.
* test_a2a_tools_memory.py adds 4 new tests (alias drift gate +
import-contract + a2a_tools-side re-export).
117 tests total (77 impl + 28 rbac + 8 delegation + 4 memory), all green.
Refs RFC #2873.
CI caught two test files I missed in the original iter 4b retarget:
test_a2a_multi_workspace.py + test_delegation_sync_via_polling.py
patch a2a_tools.{discover_peer, send_a2a_message, _delegate_sync_via_polling,
httpx.AsyncClient} but those call sites moved to a2a_tools_delegation
in this PR. 17 patch sites retargeted; 30 tests now green.
Refs RFC #2873 iter 4b.
The previous TestCreateWorkspaceTree_CallsLookupBeforeInsert used
bytes.Index("INSERT INTO workspaces"), which prefix-matches
INSERT INTO workspaces_audit, INSERT INTO workspace_secrets, and
INSERT INTO workspace_channels. RFC #2872 cited this as a silent
false-pass mode: a future refactor that adds an audit-table INSERT
literal earlier in source than the real workspaces INSERT would
make the gate point at the wrong target.
Replaces the byte-search with a go/ast walk + a regex that requires
`\s*\(` after `workspaces` — distinguishes the real target from
prefix lookalikes.
Adds three discriminating tests:
- TestWorkspacesInsertRE_RejectsLookalikes — pins the regex against
9 sql shapes (real, raw-string-literal, audit-shadow, workspace_*
prefixes, canvas_layouts, UPDATE/SELECT, comments).
- TestGate_FailsWhenLookupAfterInsert — synthesizes Go source where
the lookup is positioned AFTER the workspaces INSERT, asserts the
helper returns lookupPos > insertPos (which the production gate
flags via t.Errorf). Proves the gate isn't vestigial.
- TestGate_IgnoresAuditTableShadow — synthesizes source with an
audit-table INSERT BEFORE the lookup + real INSERT, asserts the
tightened regex correctly walks past the shadow and finds the
real INSERT.
Also extracts findLookupAndWorkspacesInsertPos as a helper so the
gate logic can be exercised against synthetic source, not only
against the real org_import.go.
Memory: feedback_assert_exact_not_substring.md (verify tightened
test FAILS on old code) — TestGate_FailsWhenLookupAfterInsert is
the failing-on-bug-shape proof.
Closes the silent-false-pass mode of #2872 Important-1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 closes out the rollout — strict-sqlmock unit tests pin which
SQL fires, but they cannot detect bugs that depend on the actual row
state after the SQL runs. Real-Postgres integration tests catch:
- the Sweep CTE depends on Postgres' make_interval function and
the table's CHECK constraints; sqlmock would happily accept a
hand-written SQL literal that Postgres rejects at runtime.
- the partial idx_pending_uploads_unacked index only catches a
wrong WHERE predicate at real-query-plan time.
- subtle predicate drift (e.g. a WHERE clause that filters by
acked_at IS NOT NULL but uses BETWEEN incorrectly).
Test cases:
- PutGetAckRoundTrip: the full happy path — Put, Get, MarkFetched,
Ack, idempotent re-Ack, Get-after-Ack returns ErrNotFound.
- Sweep_DeletesAckedAfterRetention: row not eligible at retention=1h
immediately after Ack; deleted at retention=0.
- Sweep_DeletesExpiredUnacked: backdated expires_at exercises the
unacked-and-expired branch of the WHERE clause.
- Sweep_DeletesBothCategoriesInOneCycle: three rows (acked, expired,
fresh); a single Sweep deletes the first two and leaves the third.
- PutEnforcesSizeCap: ErrTooLarge above MaxFileBytes.
- GetIgnoresExpiredAndAcked: Get filters predicate matches expected
row state in the table.
Run path:
- locally via the file-header docker incantation.
- CI runs on every PR/push that touches handlers/** OR migrations/**
(.github/workflows/handlers-postgres-integration.yml).
Second slice of the a2a_tools.py split (stacked on iter 4a). Owns the
three delegation MCP tools + the RFC #2829 PR-5 sync-via-polling
helper they share:
* tool_delegate_task — synchronous delegation
* tool_delegate_task_async — fire-and-forget
* tool_check_task_status — poll the platform's /delegations log
* _delegate_sync_via_polling — durable async + poll for terminal status
* _SYNC_POLL_INTERVAL_S / _SYNC_POLL_BUDGET_S constants
a2a_tools.py shrinks from 915 → 609 LOC (−306). Stacked on iter 4a's
RBAC extraction; uses `from a2a_tools_rbac import auth_headers_for_heartbeat`
as its auth-header source.
The lazy `from a2a_tools import report_activity` inside tool_delegate_task
breaks the circular-import cycle (a2a_tools imports the delegation
re-exports at module-load; delegation handler needs report_activity at
CALL time). A dedicated test pins this contract.
Tests:
* 77 existing test_a2a_tools_impl.py tests pass after retargeting
20 patch sites in TestToolDelegateTask + TestToolDelegateTaskAsync +
TestToolCheckTaskStatus from `a2a_tools.foo` to
`a2a_tools_delegation.foo` (foo ∈ {discover_peer, send_a2a_message,
httpx.AsyncClient}). The patches need to target the new module
because that's where the call sites live now.
* test_a2a_tools_delegation.py adds 8 new tests:
- 6 alias drift gates (`a2a_tools.tool_delegate_task is …`)
- 2 import-contract tests (no top-level circular dep + a2a_tools
surfaces every delegation symbol)
- 1 sync-poll budget invariant
113 tests total (77 impl + 28 rbac + 8 delegation), all green.
Refs RFC #2873.
Iter 4a's new module needs to be in the rewrite list so the wheel
ships its imports prefixed correctly. Caught by 'PR-built wheel +
import smoke'.
Refs RFC #2873 iter 4a.
The iter-3 split created mcp_heartbeat / mcp_inbox_pollers /
mcp_workspace_resolver but the wheel build's drift-gate check at
scripts/build_runtime_package.py:TOP_LEVEL_MODULES wasn't updated.
Without this fix the wheel ships those modules un-rewritten, so
their imports of platform_auth / configs_dir / etc. break at
runtime. Caught by the 'PR-built wheel + import smoke' check.
Refs RFC #2873 iter 3.
Phase 3 of the poll-mode chat upload rollout. Stack atop Phase 2.
The platform's pending_uploads table grows once-per-uploaded-file with
no built-in cleanup. Phase 1's hard TTL (expires_at default 24h) makes
expired rows un-fetchable but doesn't actually delete them; Phase 1's
ack stamps acked_at but leaves the row indefinitely. Without a sweep
the table grows unbounded across normal traffic.
This PR adds:
- `Storage.Sweep(ctx, ackRetention)` — a single round-trip CTE that
deletes acked rows past their retention window plus unacked rows
past expires_at. Returns `(acked, expired)` deletion counts so
Phase 3 dashboards can spot the stuck-fetch pattern (high expired,
low acked) vs healthy churn.
- `pendinguploads.StartSweeper(ctx, storage, ackRetention)` —
background goroutine that calls Sweep every 5 minutes (default).
Runs once immediately on startup so a platform restart cleans up
any rows that became eligible while we were down.
- Prometheus counters `molecule_pending_uploads_swept_total` with
`outcome={acked,expired,error}` labels. Wired into the existing
`/metrics` endpoint.
- Wired from cmd/server/main.go via supervised.RunWithRecover —
one transient panic doesn't take the platform down with it.
Defaults:
- SweepInterval = 5m (matches the dashboard refresh cadence)
- DefaultAckRetention = 1h (gives the workspace at-least-once retry
headroom in case it processed but failed to write the file before
crashing)
Test coverage: 100% on storage_test.go (extended with sweepSQL pin +
six Sweep test cases including negative-retention clamp + zero-retention
immediate-delete + DB error wrapping) and sweeper_test.go (ticker-driven
+ ctx-cancel + nil-storage + transient-error-doesn't-crash + metric
counter assertions).
Closes the third of four phases tracked on the parent RFC; phase 4 is
the staging E2E test.
Closes#2865 (split-B of the #2669 root-cause stack).
The phantom-busy sweep in workspace-server/internal/scheduler/scheduler.go
already logs each row reset, but no aggregate metric surfaces "how often
is this firing." A regression that causes high reset rates (e.g.
controlplane#481's missing env vars, or future drift in the workspace
runtime's task-lifecycle accounting) only surfaces when users complain.
Fix: counter exposed at /metrics as molecule_phantom_busy_resets_total,
incremented from sweepPhantomBusy after each row whose active_tasks
was reset. Same shape as existing molecule_websocket_connections_active.
Operator-side dashboard: alert when daily phantom-busy reset count
> 0.5% of active workspaces. Today's steady-state is near-zero; any
increase is a regression signal.
Tests:
- TestTrackPhantomBusyReset_IncrementsCounter
- TestTrackPhantomBusyReset_RaceFreeUnderConcurrentWrites (50×200
concurrent writes; tests atomic invariant)
- TestHandler_ExposesPhantomBusyResetsCounter (asserts HELP + TYPE
+ value lines in Prometheus text format)
- TestHandler_PhantomBusyResetsZeroByDefault (fresh-process 0
contract — prevents a future refactor from accidentally dropping
the metric from /metrics)
Race-detector clean. Vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of the a2a_tools.py (991 LOC) split — single-concern module
for the workspace's RBAC + auth-header layer:
* _ROLE_PERMISSIONS canonical table
* _get_workspace_tier
* _check_memory_write_permission
* _check_memory_read_permission
* _is_root_workspace
* _auth_headers_for_heartbeat
a2a_tools.py shrinks from 991 → 915 LOC. Internal call sites (15
references) work unchanged because the bare names are re-imported at
module-level — Python's local-then-module name resolution still
finds them in a2a_tools's namespace, so existing tests'
patch("a2a_tools._foo", …) keeps working.
The RBAC layer can now evolve independently of the 18 tool handlers.
Adding a new role or capability action touches one file, not the
kitchen-sink module.
Tests:
* 77 existing test_a2a_tools_impl.py pass unchanged.
* test_a2a_tools_rbac.py adds 28 focused tests:
- 6 alias drift-gate tests (`_foo is rbac.foo`)
- 4 get_workspace_tier env+config branches
- 2 is_root_workspace tier branches
- 6 check_memory_write_permission roles + override branches
- 3 check_memory_read_permission scenarios
- 3 auth_headers_for_heartbeat platform_auth branches
- 4 ROLE_PERMISSIONS table invariants
* Direct coverage for the helper module (was previously only
exercised through 991-LOC tool-handler tests).
Refs RFC #2873.
Splits the standalone molecule-mcp wrapper into three single-concern
modules per the OSS-shape refactor program:
* mcp_heartbeat.py — register POST + heartbeat loop + auth-failure
escalation + inbound-secret persistence
* mcp_workspace_resolver.py — single + multi-workspace env validation
+ on-disk token-file read + operator-help printer
* mcp_inbox_pollers.py — activate inbox singleton + spawn one daemon
poller per workspace
mcp_cli.py becomes a 193-LOC orchestrator: validates env, calls each
module's helpers, hands off to a2a_mcp_server.cli_main. The console-
script entry molecule-mcp = molecule_runtime.mcp_cli:main is preserved.
Back-compat aliases (mcp_cli._build_agent_card, _heartbeat_loop,
_resolve_workspaces, etc.) re-export the new modules' authoritative
functions so existing tests + wheel_smoke.py + any downstream caller
keeps working unchanged. A new test file pins each alias as the
exact same callable (drift gate via `is`).
Tests:
* 62 existing test_mcp_cli.py + test_mcp_cli_multi_workspace.py
pass against the split.
* Two heartbeat-loop persist tests + the auth-escalation caplog
setup updated to target mcp_heartbeat (the module where the loop
body now lives) instead of mcp_cli (still works through aliases
for direct calls, but Python's name resolution inside the loop
body uses the new module's namespace).
* test_mcp_cli_split.py adds 11 new tests: alias drift gate +
inbox-poller single + multi-workspace branches + degraded
inbox-import logging path (none of those existed before).
Refs RFC #2873.
2026-05-05 04:33:06 -07:00
218 changed files with 24693 additions and 4570 deletions
[retarget-bot] This PR was opened against `main` and has been retargeted to `staging` automatically.
**Why:** per [SHARED_RULES rule 8](https://github.com/Molecule-AI/molecule-ai-org-template-molecule-dev/blob/main/SHARED_RULES.md), all feature work targets `staging` first; the CEO promotes `staging → main` separately.
**Why:** per [SHARED_RULES rule 8](https://github.com/molecule-ai/molecule-ai-org-template-molecule-dev/blob/main/SHARED_RULES.md), all feature work targets `staging` first; the CEO promotes `staging → main` separately.
**What changed:** just the base branch — no code change. CI will re-run against `staging`. If you get merge conflicts, rebase on `staging`.
echo "::warning::orphan-tunnels cleanup returned HTTP $http_code — body: $body"
fi
- name:Dry-run summary
if:env.DRY_RUN == 'true'
run:|
echo "DRY RUN — would have deleted ${{ steps.identify.outputs.count }} org(s). Re-run with dry_run=false to actually delete."
echo "DRY RUN — would have deleted ${{ steps.identify.outputs.count }} org(s) AND triggered orphan-tunnels cleanup. Re-run with dry_run=false to actually delete."
"Run `/plugin marketplace add Molecule-AI/molecule-mcp-claude-channel` then `/plugin install molecule@molecule-mcp-claude-channel` inside Claude Code, then `/reload-plugins`.",
},
{
symptom:"not on the approved channels allowlist",
check:
"Custom channels need `--dangerously-load-development-channels` on the launch command. Team/Enterprise orgs need admin to set `channelsEnabled` + `allowedChannelPlugins` in claude.ai admin settings.",
},
{
symptom:"Inbound messages not arriving",
check:
"Check stderr for `molecule channel: connected — watching N workspace(s)`. Verify ~/.claude/channels/molecule/.env has the right PLATFORM_URL + token.",
"Tail ~/.hermes/gateway.log. YAML duplicate-key in config.yaml is the most common cause — `gateway:` block must appear exactly once.",
},
{
symptom:"Plugin not discovered after install",
check:
"Run `pip show hermes-channel-molecule` to confirm install. Some hermes builds need `hermes plugin reload` before the new platform_plugins entry takes effect.",
When a workspace is expanded into a team, it gains sub-workspaces while its own agent remains as the **team lead** (coordinator). This is recursive — sub-workspaces can themselves be expanded into teams, infinitely deep.
- The platform can always discover any workspace (for provisioning, monitoring)
- The parent workspace can discover its sub-workspaces
- Sub-workspaces can discover their siblings (same parent)
- Outside workspaces get a **403 Forbidden** if they try to discover a private sub-workspace
## How to Expand
Expansion is triggered via `POST /workspaces/:id/expand`. The platform reads the `sub_workspaces` list from the workspace's config and provisions each one. On the canvas, users right-click a workspace node and select "Expand into team."
Collapsing is the inverse: `POST /workspaces/:id/collapse`. Sub-workspaces are stopped and removed.
## What Happens on Expansion
When Developer PM is expanded into a team, the hierarchy changes but the outside view doesn't. Business Core's parent/child relationship to Developer PM is unaffected — Developer PM still responds to the same A2A endpoint.
The events fired:
- `WORKSPACE_EXPANDED` with the new `sub_workspace_ids` in the payload
- `WORKSPACE_PROVISIONING` for each new sub-workspace
- `WORKSPACE_ONLINE` for each sub-workspace as they come up
Communication rules are automatically derived from the new hierarchy — no manual wiring needed.
## Canvas Behavior
- Children render as embedded mini-cards (`TeamMemberChip`) inside the parent node, not as separate canvas nodes
- Each mini-card shows full status: gradient bar, name, tier badge, skills pills, active tasks, descendant count
- **Recursive rendering** up to 3 levels deep (`MAX_NESTING_DEPTH = 3`) — sub-cards can contain their own "Team" sections
@@ -41,8 +41,6 @@ Full contract: `docs/runbooks/admin-auth.md`.
| GET | /admin/workspaces/:id/test-token | admin_test_token.go — mint a fresh bearer token for E2E scripts; returns 404 unless `MOLECULE_ENV != production` or `MOLECULE_ENABLE_TEST_TOKENS=1` |
@@ -18,7 +18,7 @@ lands in the watch list with a colliding term, add a row here.
| **plugin** | A directory under `plugins/` packaging one or more skills or an MCP server wrapper, installable per-workspace via `POST /workspaces/:id/plugins`. Governed by `plugin.yaml`. | **Langflow**: a visual UI node / component in a flowchart. **CrewAI**: a Python-importable callable registered as a capability. |
| **agent** | A persistent containerized workspace running continuously — an identity with memory, a role, and a schedule. Not a one-shot invocation. | Most frameworks (AutoGPT, LangChain agents, OpenAI Assistants): a stateless function-call loop. No persistence between invocations unless explicitly checkpointed. |
| **flow** | A task execution within a workspace — a request enters, the agent runs tools, emits a response, logs activity. No explicit graph abstraction. | **Langflow**: a directed graph of nodes you author visually. **LangGraph**: a stateful graph of callable nodes. Our "flow" is an imperative timeline, not a graph. |
| **team** | A named cluster of workspaces under a PM (org template `expand_team`). Used for role grouping in Canvas. | **CrewAI**: a "crew" is a sequence of agents that pass a task through a declared order. Our "team" is an org-chart abstraction, not an execution order. |
| **team** | A named cluster of workspaces under a PM . Used for role grouping in Canvas. | **CrewAI**: a "crew" is a sequence of agents that pass a task through a declared order. Our "team" is an org-chart abstraction, not an execution order. |
| **skill** | A directory with `SKILL.md` that an agent invokes via the `Skill` tool. Skills are documentation + optional scripts that teach an agent a recipe. | **Anthropic Skills API**: nearly identical. **CrewAI tool**: closer to our plugin's MCP tool, not our skill. |
| **channel** | An outbound/inbound social integration (Telegram, Slack, …) per-workspace, wired in `workspace_channels`. | Slack's "channel": the container for messages. We use "channel" for the adapter + credentials, not the conversation itself. |
| **runtime** | The execution engine image tag for a workspace: one of `langgraph`, `claude-code`, `openclaw`, `crewai`, `autogen`, `deepagents`, `hermes`. | **LangGraph runtime**: the Python process running the graph. We use "runtime" for the Docker image + adapter pairing, not the inner process. |
# Comment body — kept short; the issue body has the full design.
comment_body(){
localage_h="$1"
cat <<EOF
⚠️ This auto-promote PR has been BLOCKED on \`REVIEW_REQUIRED\`for **${age_h}h**.
Auto-merge is armed, but main's branch protection requires 1 review and no human has approved. Until someone reviews, the staging→main promote chain is wedged and downstream consumers (canvas builds, tenant redeploys) won't see new code.
**Action**: a human reviewer on \`@Molecule-AI/maintainers\` should approve this PR (or mark it as not ready and close).
Detected by \`scripts/check-stale-promote-pr.sh\` per issue #2975.
EOF
}
post_comment(){
localpr_num="$1"
localage_h="$2"
if["$POST_COMMENT" !="true"];then
return0
fi
# Idempotency: only one alarm comment per PR. Look for the marker
# string in existing comments before posting a new one.
echo"memory-plugin: starting sidecar on $MEMORY_PLUGIN_LISTEN_ADDR" >&2
# Drop privs to the platform user — the plugin doesn't need root and
# runs unprivileged elsewhere (tenant image already starts as canvas).
su-exec platform /memory-plugin &
MEMORY_PLUGIN_PID=$!
# Wait up to 30s for the plugin's /v1/health to return 200. Boot
# failure here is fatal — better to crash-loop than to silently
# serve cutover traffic against a dead plugin.
health_port=${MEMORY_PLUGIN_LISTEN_ADDR#:}
ready=0
for _ in $(seq 1 30);do
if wget -qO- --timeout=2"http://localhost:${health_port}/v1/health" >/dev/null 2>&1;then
ready=1
break
fi
sleep 1
done
if["$ready" !="1"];then
echo"memory-plugin: ❌ /v1/health never returned 200 after 30s — aborting boot. Check that DATABASE_URL is reachable, has the pgvector extension, and the plugin's migrations applied." >&2
kill"$MEMORY_PLUGIN_PID" 2>/dev/null || true
exit1
fi
echo"memory-plugin: ✅ sidecar healthy on :$health_port" >&2
fi
exec su-exec platform /platform "$@"
ENTRY
RUN chmod +x /entrypoint.sh && apk add --no-cache su-exec
t.Errorf("preflightContainerHealth must call provisioner.IsRunning OR provisioner.RunningContainerName for the SSOT health check — see molecule-core#36. Found neither.")
}
ifcallsContainerInspectRaw{
t.Errorf("preflightContainerHealth carries a direct ContainerInspect call. This is the parallel-impl drift molecule-core#36 fixed. "+
"Either route through provisioner.IsRunning OR — if a new use case truly needs a different inspect — extend the helper's contract first and update this gate to allow the specific delta.")
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.