Two surfaces in workspace-server hardcoded `ghcr.io` and silently bypassed
the `MOLECULE_IMAGE_REGISTRY` env override that flips every other image
operation to the configured private mirror (e.g. AWS ECR in production):
1. internal/imagewatch/watch.go — image-auto-refresh polled
`https://ghcr.io/v2/...` and `https://ghcr.io/token` directly. Post-
suspension, with the platform pointed at ECR, the watcher silently
stopped seeing digest changes (every poll either 404'd or hung on a
registry it has no business talking to).
2. internal/handlers/admin_workspace_images.go — Docker Engine auth
payload pinned `serveraddress: "ghcr.io"`, so when the operator sets
`MOLECULE_IMAGE_REGISTRY=…ecr…/molecule-ai` the engine matched the
wrong credential entry on every authenticated pull.
Fix: extract `provisioner.RegistryHost()` returning the host portion of
`RegistryPrefix()` (e.g. `ghcr.io` ← `ghcr.io/molecule-ai`, or
`004947743811.dkr.ecr.us-east-2.amazonaws.com` ← the ECR mirror prefix),
and route both surfaces through it. Default behavior is unchanged for
OSS users on GHCR.
Tests
- New `TestRegistryHost_SplitsHostFromOrgPath` and
`TestRegistryHost_NeverEmpty` pin the helper across GHCR / ECR /
self-hosted Gitea / bare-host edge cases.
- New `TestGHCRAuthHeader_RespectsRegistryEnv` asserts the Docker auth
payload's `serveraddress` follows MOLECULE_IMAGE_REGISTRY (and never
leaks the org-path suffix).
- New `TestRemoteDigest_RegistryHostFollowsEnv` stands up an httptest
server, points MOLECULE_IMAGE_REGISTRY at it, and confirms both the
token endpoint and the manifest HEAD land there — i.e. the full image-
watch loop respects the env override end-to-end.
Both new tests were verified to FAIL on the pre-fix code path before the
helper was wired in, so a future revert can't silently re-introduce the
bug.
Out of scope (followup needed)
ECR uses `aws ecr get-authorization-token` (SigV4 + basic-auth) instead
of GHCR's `/token?service=…&scope=…` flow. This PR makes the URL host-
configurable; the bearer-token negotiation in `fetchPullToken` still
speaks the GHCR flavor. On ECR with `IMAGE_AUTO_REFRESH=true`, the
watcher will now fail loudly at the token fetch (logged per tick) rather
than silently hitting ghcr.io. Operators on ECR should keep
IMAGE_AUTO_REFRESH=false until ECR auth is wired — tracked as a separate
task. Net effect of this PR alone is strictly better than pre-fix:
fail-loud > silent-broken.
Refs: RFC #229 P2-4
tier:low
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs in yaml-utils.ts toYaml():
1. tools: [] was only emitted when config.tools.length > 0,
but the test asserts it's always present. Add blank-line
separator + unconditional list("tools", ...) so MINIMAL_CONFIG
with tools: [] renders correctly.
2. Nested list values (e.g. runtime_config.required_env: [KEY])
were serialized as " required_env: KEY" (stringification of the
array) instead of a YAML list block. Fix obj() to detect
Array.isArray(sv) and emit a list block with 4-space indent.
Closes#269.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Self-delegation deadlocks: the sending turn holds `_run_lock`, the receive
handler waits for the same lock, the A2A request 30s-times-out, and the
whole cycle is wasted (the Dev Lead system prompt warns agents off this by
hand — "Never delegate_task to your own workspace ID … there is no peer who
is also you"). The platform/runtime had no guard. Now both
`tool_delegate_task` and `tool_delegate_task_async` early-return an
actionable error when `workspace_id == effective_source` (`source_workspace_id
or _peer_to_source[target] or WORKSPACE_ID`) — before `discover_peer`, so no
network round-trip is wasted either. A genuinely different target (incl.
another of a multi-workspace agent's own registered workspaces) is
unaffected.
Tests: tests/test_a2a_tools_delegation.py — new TestSelfDelegationGuard (4
cases: rejects own ID; rejects when source_workspace_id explicitly == target;
async path rejects; a different target passes the guard through to
discover_peer). `pytest tests/test_a2a_tools_delegation.py` → 12 passed.
(tests/test_a2a_tools_impl.py's TestToolDelegateTask* suite is red on this
PC2/Windows checkout — same on `main` without this change; httpx-mock infra,
not this PR — CI validates on Linux.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[core-lead-agent] PR #281 merged — handles string-form errors in a2a_tools.delegate_task (was raising AttributeError on every delegation through legacy path), fixes empty-parts dict regression (#279), and drops the dead staging branch trigger from both publish workflows. Replaces the abandoned PR #268 + #277. Integration Tester unblocked for mesh recovery validation.
internal#226 follow-up #1. `molecule_runtime.config` resolves the picked
model as `MOLECULE_MODEL` > `MODEL` > (legacy) `MODEL_PROVIDER` (#280) —
this side of the boundary now matches:
- applyRuntimeModelEnv reads `MOLECULE_MODEL` ahead of `MODEL` /
`MODEL_PROVIDER`, and exports BOTH `MOLECULE_MODEL` and `MODEL`
(the latter kept for back-compat with everything that already reads
`os.environ["MODEL"]`). So a workspace whose secrets carry
`MOLECULE_MODEL` (the unambiguous name) is honoured, and the
`MODEL_PROVIDER` misnomer — which got set to provider slugs
("minimax") and even runtime names ("claude-code") — is the lowest-
priority fallback, exactly as on the runtime side.
- the resolution-order comment is updated to flag MODEL_PROVIDER as the
legacy-and-misleadingly-named var.
Also drops a stray trailing `}` in delegation_test.go (committed in
97768272 "test(delegation): add isDeliveryConfirmedSuccess helper") that
made `internal/handlers` fail to parse — one of the things keeping the
package from compiling for tests.
Tests: TestApplyRuntimeModelEnv_SetsUniversalMODELForAllRuntimes extended
to assert MOLECULE_MODEL mirrors MODEL on every case, plus two new cases
(MOLECULE_MODEL env fallback; MOLECULE_MODEL beats MODEL_PROVIDER). Could
not run `go test ./internal/handlers/` locally — the package is still
blocked behind `internal/plugins` `SourceResolver` redeclaration (the
#248 plugin-router/resolver refactor, Core-BE's lane); CI validates once
that lands. The applyRuntimeModelEnv change is mechanical (same shape as
the existing `MODEL` handling) — reviewer please eyeball.
Companion: molecule-core#280 (runtime config.py side), molecule-ai-workspace-template-claude-code#14 (CLI-stream-error surfacing).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run `docker info` as the first CI step to catch runner Docker socket
permission issues (docker.sock unreadable, daemon restarted, group
membership drift) before the expensive `docker build` step. The error
now surfaces immediately with a clear `::error::` message rather than
silently continuing into `docker build` where the same failure would
appear 60-90s later as a cryptic ECR auth error.
Gitea Actions run 4350 (2026-05-10 05:58 UTC) is the trigger: the runner's
docker.sock became inaccessible for ~6 minutes, `docker build` failed
at step 2 with `permission denied...docker.sock`, and `go build` (step 3)
was never reached — masking the compile errors that were already on
main. The downstream code errors only surfaced once run 4407 succeeded
at `docker build` and finally reached `go build`.
Now: `docker info` → fail in ~1s with actionable error.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two-part fix from PR #268 (ported by Integration Tester after PR #268
was closed without merge):
PART 1 — workspace/builtin_tools/a2a_tools.py: Fixes AttributeError
when platform returns a plain string as the error field. Before:
data["error"].get("message") ← crashes if error is a string
After:
isinstance(err, dict) → err.get("message")
isinstance(err, str) → use err directly
otherwise → str(err)
Also guards result.get("parts") against non-dict result.
Includes fix for issue #279: empty-parts regression where
{"parts": []} returned "(no text)" instead of str(result).
PART 2 — .gitea/workflows/ and .github/workflows/
publish-workspace-server-image.yml: Removed dead "staging" branch
trigger. Trunk-based migration (2026-05-08) removed the staging branch
but the workflow triggers were not updated.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #256 introduced PluginResolver to break the SourceResolver redeclaration
deadlock, but missed three downstream call-sites that left main uncompilable:
1. plugins/drift_sweeper.go: PluginResolver.Resolve was declared returning
PluginResolver (recursive). *Registry.Resolve returns the production
SourceResolver from source.go, so *Registry didn't satisfy PluginResolver.
Fix: Resolve returns SourceResolver. Add compile-time assertion that
*Registry satisfies PluginResolver so any future signature drift fails
the build instead of router wiring.
2. plugins/drift_sweeper_test.go: stubResolver was still declared with the
old SourceResolver shape AND asserted against SourceResolver — the
assertion failed because stubResolver lacks Scheme()/Fetch(). Fix: stub
is a PluginResolver; assertion targets PluginResolver. Drop the unused
"database/sql" import that fails go vet.
3. router/router.go:
- The 70f84823 reorder moved the plgh init block above its dockerCli
dependency (line 538 used; line 594 declared). Moved the dockerCli
declaration up so it's available where used; replaced the orphaned
declaration in the terminal block with a comment.
- Setup's pluginResolver param was typed plugins.SourceResolver — wrong
for *plugins.Registry (Registry is not a per-scheme resolver). Retyped
to plugins.PluginResolver, which *Registry actually satisfies.
- Removed the broken `plgh.WithSourceResolver(pluginResolver)` call —
WithSourceResolver expects a per-scheme SourceResolver, not a
PluginResolver/registry. plgh has its own internal default registry
(github+local) from NewPluginsHandler, so dropping the call is
functionally a no-op vs the broken state. Kept the param so the
drift sweeper (main.go) can share scheme enumeration when needed.
4. go.sum: add the content hash entry for go.moleculesai.app/plugin/
gh-identity/pluginloader (only the /go.mod hash was present, breaking
`go build ./cmd/server`).
Verified locally:
go build ./... ✓
go vet ./... ✓ (only pre-existing org_external append warning)
go test ./internal/plugins/... ✓
go test ./internal/router/... ✓
6 pre-existing handler test failures (TestExecuteDelegation_*,
TestHandleDiagnose_*) are orthogonal — they did not run before because the
package didn't compile. Out of scope for this fix; tracking separately.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`molecule_runtime.config.load_config` read the `MODEL_PROVIDER` env var as
the *picked model id* — despite the name, it never carried the provider
(that's `LLM_PROVIDER` / the YAML `provider:` field). So `claude-code`,
`minimax`, and `opus` were all "valid" values for a var named
MODEL_PROVIDER. That footgun bit the dev-team rollout (2026-05-10): the
lead persona env files set `MODEL=claude-opus-4-7` (the intended model)
*and* `MODEL_PROVIDER=claude-code` (mistaking it for "the runtime"); the
loader picked up MODEL_PROVIDER → the claude CLI got `--model claude-code`
→ 404 on every turn, surfaced only as "Command failed with exit code 1"
with empty stderr (the real error is in the stream-json stdout, swallowed
by the SDK's placeholder). The 22 IC workspaces "worked" only because
their `MODEL_PROVIDER=minimax` happened to fuzzy-match on MiniMax's side —
they were actually running `--model minimax`, not `MiniMax-M2.7-highspeed`.
New precedence in `_picked_model_from_env`: `MOLECULE_MODEL` (canonical,
unambiguous) > `MODEL` (the obviously-correct name, already plumbed by
workspace-server's applyRuntimeModelEnv) > `MODEL_PROVIDER` (legacy —
still honored so canvas Save+Restart, the secret-mint path, and existing
persona env files keep working, but if it's the only one set we log a
one-time deprecation pointing at the misnomer) > the YAML `model:` field.
Applied at both the top-level `model` and `runtime_config.model`
resolution sites; semantics are otherwise unchanged. Bonus: workspaces
that already set `MODEL` correctly now get exactly that model instead of
whatever fuzzy-match the upstream did with the provider slug.
Tests: 5 new cases in test_config.py (MODEL beats MODEL_PROVIDER;
MOLECULE_MODEL beats MODEL; MODEL overrides YAML; legacy MODEL_PROVIDER
still resolves + warns; no warning when MODEL is set) + an autouse
fixture that clears MODEL*/resets the warn-latch so resolution is
deterministic regardless of the CI env or test order. `pytest
tests/test_config.py` — 66 passed; the config-importing suites
(test_preflight, test_skills_loader) — 129 passed.
Companion: molecule-dev-department PR #10 fixes the six dev-team lead
`workspace.yaml`s from `model: MiniMax-M2.7` to `model: opus`. Follow-ups
(not in scope here): plumb `MOLECULE_MODEL` from applyRuntimeModelEnv and
the canvas; strip `MODEL`/`MODEL_PROVIDER` from the operator-host persona
env files once the org-template `model:` field is authoritative end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a push-mode workspace (one with a public URL) is at capacity, the
platform queues the delegation request and returns:
{"queued": true, "message": "...", "queue_depth": N, "queue_id": "..."}
The existing SSOT parser (a2a_response.py) only handled the poll-mode
envelope (status=queued + delivery_mode=poll). Push-mode queue
responses fell through to Malformed, causing send_a2a_message to log a
warning and return an error — even though delivery was actually queued
successfully.
Fix: add handling for data.get("queued") is True as a Queued variant
with delivery_mode="push". Checked before the poll-mode envelope so the
two cases are mutually exclusive.
Fixes observed 2026-05-10: platform returning push-mode queue
envelopes to Integration Tester when Release Manager workspace was at
capacity.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a Known Limitations section to docs/agent-runtime/workspace-runtime.md
explaining that the base molecule-ai-workspace-runtime image intentionally
omits Chromium system libs (libnss3, libatk-bridge2.0-0, libxkbcommon0, etc.)
to keep the shared image lean for every workspace role.
Records the recommended workflow (E2E in CI on the Gitea Actions self-hosted
runner) and points future role-specific QA/FE templates at layering
playwright install-deps on top of the base image rather than baking it in.
Closes the documentation half of molecule-ai/molecule-app#7.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plgh was referenced at line 505 before it was created at line 632, causing
"undefined: plgh" on main. Moved the entire Plugins block to before the
drift handler block. No functional change to registered routes — only
declaration order. Combined with d88a320f (SourceResolver→PluginResolver
rename, SSRF guard placement, and test regressions) this makes main fully
compile again.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- actions/checkout@v6 → @de0fac2e4500dabe0009e67214ff5f5447ce83dd (v6.0.2)
in secret-pattern-drift.yml
- pypa/gh-action-pypi-publish@release/v1 →
@cef221092ed1bacb1cc03d23a2d87d1d172e277b in publish-runtime.yml
Mutable action tags (e.g. @v6, @release/v1) can silently resolve to
different code over time, creating supply-chain risk. SHA-pinning
ensures the exact commit runs every time. Workspace Dockerfile was
already compliant (python:3.11-slim@sha256:...).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
go.sum still carried the pre-suspension github.com/Molecule-AI/molecule-ai-plugin-gh-identity
entries while go.mod requires go.moleculesai.app/plugin/gh-identity — so `go build` failed
with 'missing go.sum entry'. With the go.moleculesai.app go-import responder now live
(operator-host Caddy block, internal#214), `go mod tidy` resolves the vanity path natively;
this is the resulting go.sum (no replace directive, no go.mod change beyond the tidy).
Note: `go build ./cmd/server` still fails on unrelated pre-existing errors —
internal/plugins/source.go vs drift_sweeper.go SourceResolver redeclaration (#123) and
internal/router/router.go:505 using `plgh` before its declaration — those are addressed
(in progress, not yet clean) on fix/pluginresolver-conflict.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- plugins/drift_sweeper.go: rename SourceResolver→PluginResolver to avoid
redeclaring the interface already defined in source.go (core#228)
- handlers/workspace.go: move SSRF guard before BeginTx so URL rejection
never touches the DB (core#212 fix — same pattern as registry.go:324)
- handlers/restart_signals.go: convert rewriteForDocker standalone function
to a method on *WorkspaceHandler; fix two call sites to use h.rewriteForDocker
- handlers/plugins.go: change Sources() return type from plugins.SourceResolver
to pluginSources (the narrow interface satisfied by *Registry)
- handlers/admin_plugin_drift.go: remove unused "context" import
- handlers/delegation_test.go: remove stray closing brace
- handlers/restart_signals_test.go: rewrite with correct miniredis v2 API
(mr.Get takes context, mr.Set requires TTL), resolveURLTestWrapper embedding
pattern, and corrected Redis key handling
- handlers/workspace_test.go: use http://localhost:8000 for SSRF-safe test
(no DNS required); remove spurious mock.ExpectExec for Redis CacheURL call
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both are data constants exported from design-tokens.ts — TIER_CONFIG
maps tier levels 1-4 to label/color/border CSS classes, and
COMM_TYPE_LABELS maps a2a_send/a2a_receive/task_update to display
labels. No logic to test; structural shape coverage.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Issue: MEDIUM priority from canvas accessibility audit (2026-05-09).
The existing Quick Start help dialog in Toolbar omitted most keyboard shortcuts
from useKeyboardShortcuts.ts — users couldn't discover them visually.
Changes:
- Toolbar.tsx: enhance the help dialog (role="dialog") to include all
documented shortcuts: Esc, Enter, Shift+Enter, Cmd+], Cmd+[, Z, plus
mouse interaction tips for Palette, Right-click, Dbl-click, Shift+click.
Renamed from "Quick start" to "Shortcuts & tips".
- canvas-audit-items.md: update Keyboard Shortcuts section from PARTIAL
to complete; mark help dialog item as done.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #225 introduced the AND-composition clause evaluator. PR #231
patched the per-team case-pattern matching but did NOT fix the
underlying clause-splitter bug. This PR fixes the actual root cause
behind issue #229.
Root cause (.gitea/scripts/sop-tier-check.sh ~line 289):
_clause=$(echo "$_raw_clause" \
| tr -d '()' \
| tr ',' '\n' \
| tr -d '[:space:]' \
| grep -v '^$')
`tr -d '[:space:]'` strips the newlines that `tr ',' '\n'` just
inserted. For tier:low (expression "engineers,managers,ceo") the
intermediate value is:
engineers\nmanagers\nceo
then `tr -d '[:space:]'` flattens it to:
engineersmanagersceo
The for-loop iterates ONCE over this single bogus token. The case
pattern `*engineersmanagersceo*` never matches APPROVER_TEAMS values
like " managers ", so EVERY tier:low PR fails:
::error::clause [engineers/managers/ceo]: FAIL — no approving
reviewer belongs to any of these teamsengineersmanagersceo
::error::sop-tier-check FAILED for tier:low
(Note: the missing separators in the error string `teamsengineersmanagersceo`
were a SECOND, masked bug — `_clause_names="${_clause_names:+, }${_t}"`
overwrites the variable on every iteration instead of appending. With
the splitter bug, the inner loop only ran once so the overwrite was
invisible. Fixing the splitter unmasks the accumulator bug, so we fix
both atomically.)
Fix:
_no_parens=${_raw_clause//[()]/}
_clause=${_no_parens//,/ } # comma -> space, bash word-split iterates
# Append, don't overwrite:
_clause_names="${_clause_names}${_clause_names:+, }${_t}"
_passed_clauses="${_passed_clauses}${_passed_clauses:+, }$_label"
_failed_clauses="${_failed_clauses}${_failed_clauses:+, }$_label"
Per-tier policy is UNCHANGED — this is a parser fix, not a policy
relaxation:
tier:low — engineers,managers,ceo (OR-set, ANY ONE suffices)
tier:medium — managers AND engineers AND qa???,security???
tier:high — ceo
Test: .gitea/scripts/tests/test_sop_tier_check_clause_split.sh
asserts the splitter, accumulators, and end-to-end OR-gate matching
against APPROVER_TEAMS=" managers " (the exact shape PRs #233-238 hit).
7/7 pass on the new logic.
Refs: #229, supersedes attempted fix in #231 for the same root cause.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>