Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:
1. workspace/platform_tools/README.md — explains the ToolSpec
single-source-of-truth pattern (#2240), the CLI-block alignment
gap that hand-maintained generation can't close (#2258), the
snapshot golden files + LF-pinning (#2260), and the add/rename/
remove playbook. The next reader who lands in
workspace/platform_tools/ now has the design rationale + the
safe-edit procedure colocated with the code.
2. scripts/README.md — disambiguates the three measure-coordinator-
task-bounds.sh files that now exist across two repos:
- scripts/measure-coordinator-task-bounds.sh (canonical OSS, this repo)
- scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
- scripts/measure-coordinator-task-bounds.sh (production-shape, in molecule-controlplane)
Cross-references reference_harness_pair_pattern (auto-memory) for
the cross-repo design rationale. Documents the common safety
pattern (cleanup trap, DRY_RUN, non-target guard,
cleanup_*_failed events) and the heartbeat-trace caveat.
Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review follow-up on #2258 (registry snapshot tests, just merged).
The byte-exact snapshot comparisons in test_platform_tools.py would
fail mysteriously on a Windows contributor's machine with
core.autocrlf=true: checkout would convert LF → CRLF, the test would
fail locally with no useful diagnostic, and the regen instructions
in the test-file header would produce LF files that disagree with
the working copy.
Pin workspace/tests/snapshots/*.txt to text eol=lf so this can't
happen. All three current snapshots are already LF; the attribute
ensures it stays that way.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review follow-ups on #2257:
- Drop `local exit_code=$?` from cleanup(). `trap`-handler return values
are ignored, so capturing $? only misled a future reader into thinking
exit-code preservation was happening.
- Replace silenced `>/dev/null 2>&1` DELETE with `-w '%{http_code}'`
capture. ADMIN_TOKEN expiring mid-run was the realistic failure mode
here — previously we swallowed it under the silenced redirect, leaving
workspaces leaked with no signal. Now a 401/403/5xx surfaces as a
`cleanup_failed` JSON event with a remediation hint pointing at
cleanup-rogue-workspaces.sh; 404 is treated as success (the
post-condition — workspace absent — holds).
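The status-code policy can be sketched in Python (the real script is bash; the function name and JSON event shape here are illustrative):

```python
import json

def classify_delete(status: int, workspace_id: str) -> dict:
    """Map an HTTP status from a workspace DELETE to a cleanup outcome.

    404 counts as success: the post-condition (workspace absent) holds.
    Auth failures and server errors surface as a cleanup_failed event
    with a remediation hint instead of being silently swallowed.
    """
    if 200 <= status < 300 or status == 404:
        return {"event": "cleanup_ok", "workspace": workspace_id}
    return {
        "event": "cleanup_failed",
        "workspace": workspace_id,
        "http_code": status,
        "hint": "run cleanup-rogue-workspaces.sh to reap leaked workspaces",
    }

print(json.dumps(classify_delete(401, "ws-1")))
```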
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from the #2240 code review:
1. Snapshot tests for the rendered tool-instruction blocks. The
structural tests added in #2240 guarantee tool NAMES are present;
these new tests pin the SHAPE — bullet ordering, heading style,
footer placement — so a future contributor who reorders fields in
`_render_section` or rewrites a `when_to_use` paragraph sees the
diff in CI rather than shipping a silently-different system prompt.
Golden files live under workspace/tests/snapshots/.
2. CLI-block alignment test + corrected source-of-truth comment.
`_A2A_INSTRUCTIONS_CLI` is a separate hand-maintained surface for
ollama and other non-MCP runtimes — the registry can't auto-generate
it because the CLI subprocess interface uses different command
shapes (`peers` vs `list_peers`, etc.). A new
`_CLI_A2A_COMMAND_KEYWORDS` mapping declares the registry-tool →
CLI-keyword correspondence (or explicit `None` for tools not
exposed via subprocess). Two tests enforce coverage:
- every a2a tool in the registry is keyed in the mapping
- every non-None subcommand keyword literally appears in
`_A2A_INSTRUCTIONS_CLI`
Caught one real gap: `send_message_to_user` is in the registry but
has no CLI subcommand. Mapped to `None` with an explanatory comment.
The "no other source of truth" claim in registry.py's docstring
was wrong post-#2240 (the CLI block survived) — corrected to
describe the two surfaces explicitly and point at the alignment
tests as the gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from #2254 code review before the harness is safe to
run against staging:
1. Cleanup trap. Workspaces are now auto-deleted on EXIT/INT/TERM. A
Ctrl-C mid-run no longer leaks the PM + Researcher pair against
shared infra. KEEP_WORKSPACES=1 opts out for post-run inspection.
2. Tenant scoping + admin auth. Non-localhost PLATFORM values now
require both ADMIN_TOKEN and TENANT_ID; the script refuses to run
without them. The previous version sent unauthenticated POSTs that,
on staging, would either 401 every request or — worse — provision
into the wrong tenant. Memory
`feedback_never_run_cluster_cleanup_tests_on_live_platform` calls
out the same hazard class.

3. DRY_RUN=1 mode. Prints platform target, tenant id, auth fingerprint,
and the planned actions, then exits before any state mutation. The
intended pre-flight before running against staging.
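The guard and dry-run logic, sketched in Python for readability (the actual harness is a shell script; names and the fingerprint format are illustrative):

```python
import hashlib
import os
import sys

def preflight(platform: str, env: dict) -> list[str]:
    """Return the DRY_RUN plan lines, or refuse an unsafe configuration."""
    local = platform.startswith(("http://localhost", "http://127.0.0.1"))
    if not local:
        # Non-localhost targets require both credentials up front.
        missing = [k for k in ("ADMIN_TOKEN", "TENANT_ID") if not env.get(k)]
        if missing:
            raise SystemExit(f"refusing non-localhost run without: {missing}")
    token = env.get("ADMIN_TOKEN", "")
    fingerprint = hashlib.sha256(token.encode()).hexdigest()[:8] if token else "(none)"
    return [
        f"platform={platform}",
        f"tenant={env.get('TENANT_ID', '(local)')}",
        f"auth_fingerprint={fingerprint}",
        "would: provision coordinator + child, send kickoff, delete workspaces",
    ]

if os.environ.get("DRY_RUN") == "1":
    # Print the plan, then exit before any state mutation.
    plan = preflight(os.environ.get("PLATFORM", "http://localhost:8080"),
                     dict(os.environ))
    print("\n".join(plan))
    sys.exit(0)
```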
Also tightened OR_KEY check (the chained default silently accepted an
empty OPENROUTER_API_KEY) and added a heartbeat-trace caveat to the
interpretation guide explaining what `<endpoint_unavailable>` means
for the bound question.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds structured `rfc2251_phase=...` log lines at the deterministic phase
boundaries inside route_task_to_team and check_task_status, so an
operator running scripts/measure-coordinator-task-bounds.sh against
staging can correlate the harness's external timing trace with what
phase the coordinator was in at any given second.
The harness already exists in staging and measures end-to-end response
time + heartbeat trace. What it CAN'T do without this PR is answer
"the coordinator response took 7 minutes — was it stuck delegating, or
stuck polling children, or stuck synthesizing after all children
returned?" The phase logs answer that question.
Phases instrumented (deterministic Python boundaries, no agent prompt
involvement):
route_start → enter route_task_to_team
children_fetched → after get_children() returns
routing_decided → after build_team_routing_payload
delegate_invoked → just before delegate_task_async.ainvoke
delegate_returned → after delegate_task_async returns
check_status → every check_task_status poll (per-poll)
route_returning_decision_only → fall-through path
Each line includes elapsed_ms from route_start, so per-phase durations
are extractable by diffing elapsed_ms between consecutive lines:
    grep rfc2251_phase= <container.log> \
      | awk -F'elapsed_ms=' 'NR > 1 { print $2 - prev } { prev = $2 }'
The synthesis phase (after all children return, before agent emits
final A2A response) is NOT instrumented here because it's
agent-driven (no deterministic Python boundary). The harness operator
infers synthesis_secs = total_response_secs − max(check_status_ts).
This is reproduction-harness scaffolding; it adds zero behavior. Strip
the rfc2251_phase log lines when V1.0 ships and the phase data lands
in the structured heartbeat payload instead.
Refs:
- RFC: molecule-core#2251
- Harness: scripts/measure-coordinator-task-bounds.sh (shipped earlier)
- V1.0 gate: this is deliverable #2 of the four pre-V1.0 gates
Adds a reproduction harness for Issue 4 of the 2026-04-28 CP review,
referenced in RFC molecule-core#2251. The RFC review (issue #2251
comment) flagged that Issue 4 was hypothesized but not reproduced
before V1.0 implementation begins — this script closes that gap.
What it does:
- Provisions a coordinator (PM, claude-code-default) + 1 child
(Researcher, langgraph) via the platform API.
- Sends an A2A kickoff with a synthesis-heavy task that requires
SYNTHESIS_DEPTH (default 3) sequential delegations followed by a
600-word post-delegation synthesis.
- Times the coordinator's full A2A round-trip with millisecond
precision and emits one JSON event per phase (machine-readable).
- Pulls the coordinator's heartbeat trace post-run so the team can
see whether any platform-side state transition fired during the
long synthesis (the V1.0 RFC's MAX_TASK_EXECUTION_SECS would
surface as such a transition; absence of one in this trace
confirms the RFC's premise).
Why a measurement harness, not a pass/fail test:
Issue 4's claim is "absence of platform-side bound", which is hard
to assert in a single CI run. Outputting structured measurement
data lets the team interpret across multiple runs / staging vs
prod / different SYNTHESIS_DEPTH values rather than relying on one
reproduction snapshot.
The script's header has the full interpretation guide:
- ELAPSED < 60s → not informative (LLM was just fast)
- 60–300s → within DELEGATION_TIMEOUT, ambiguous
- >= 300s without trace transitions → BUG CONFIRMED
- curl_failed → coordinator hung past A2A_TIMEOUT or genuinely
slow (disambiguate by querying status separately)
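The interpretation guide reduces to a small decision function (illustrative Python; thresholds taken from the header above, with the with-transitions branch being the natural complement of the BUG CONFIRMED case):

```python
def interpret(elapsed_secs, trace_has_transitions: bool,
              curl_failed: bool = False,
              delegation_timeout: float = 300.0) -> str:
    """Classify one harness run per the script-header interpretation guide."""
    if curl_failed:
        # Could be a hang past A2A_TIMEOUT or a genuinely slow run.
        return "ambiguous: query status separately to disambiguate"
    if elapsed_secs < 60:
        return "not informative (LLM was just fast)"
    if elapsed_secs < delegation_timeout:
        return "within DELEGATION_TIMEOUT, ambiguous"
    if not trace_has_transitions:
        return "BUG CONFIRMED: >= 300s with no platform-side transition"
    return "platform-side transition observed: no missing-bound evidence"
```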
Doesn't run in CI by default — invoked manually against staging or a
local platform with PLATFORM=... and OPENROUTER_API_KEY=... env vars.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a comment block at the top of auto-promote-staging.yml naming the
load-bearing one-time repo setting that the workflow depends on:
Settings → Actions → General → Workflow permissions
→ ✅ Allow GitHub Actions to create and approve pull requests
Without this toggle, every workflow_run fails with
"GitHub Actions is not permitted to create or approve pull requests
(createPullRequest)". Observed 2026-04-29 01:43 UTC blocking the
fcd87b9 promotion (PRs #2248 + #2249); manually bridged via PR #2252.
The setting is invisible to anyone reading the workflow file, but the
workflow cannot do its job without it. Documenting here so the next
time it gets toggled off (org admin change, repo migration, audit
cleanup) the failure mode points at the cause rather than another
round of "why is auto-promote broken."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the sweep-cf-orphans hardening (#2248) on publish-runtime's
TEMPLATE_DISPATCH_TOKEN gate. The previous behaviour was to print a
⚠️ "skipping cascade — templates will pick up the new version on
their own next rebuild" warning and exit 0. That message is wrong: the 8
workspace-template repos only rebuild on this repository_dispatch
fanout. Without the dispatch they stay pinned to whatever runtime
version they last saw, and the gap is invisible until someone
notices a template several versions behind weeks later.
Behaviour after this PR:
- push (auto-trigger on workspace/runtime/** changes) → exit 1
- workflow_dispatch (manual operator) → exit 0
with a warning (operator already accepted state; let them rerun
after restoring the secret)
The token-missing path now also names the consequence concretely
("templates will NOT pick up the new version until this token is
restored") so future operators see the actionable line, not the
misleading "they'll catch up on their own" message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the soft-skip-with-warning behaviour for scheduled runs of the
hourly Cloudflare orphan sweeper with an explicit failure when the six
required secrets aren't set. Manual workflow_dispatch keeps the
soft-skip path so an operator can short-circuit a deliberate rerun
without redoing the secrets dance — they accepted the state when they
clicked the button.
Why: from some-date to 2026-04-28, all six secrets were unset on the
repo. Every hourly tick printed a yellow ⚠️ warning and exited 0,
which GitHub registers as "completed/success" — the sweeper was
indistinguishable from a healthy janitor with nothing to do. Cloudflare
orphans accumulated unobserved to 152/200 (~76% of the zone quota),
and only surfaced via a manual audit. The mechanism to catch this kind
of regression is to make the workflow loud: red runs prompt
investigation, green runs are presumed healthy.
Schedule/workflow_run/push paths now print three ::error:: lines
naming the missing secrets, the fix, and a one-line reference to this
incident, then exit 1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the fix from #2234 applied to auto-sync-main-to-staging.yml in the
reverse direction. Both workflows now use the same merge-queue path
that humans use; no special-case bypass.
Why
Every tick of auto-promote-staging.yml since main's branch protection
went stricter has been failing with:
remote: error: GH006: Protected branch update failed for refs/heads/main.
remote: - Required status checks "Analyze (go)", "Analyze (javascript-typescript)",
"Analyze (python)", "Canvas (Next.js)", "Detect changes",
"E2E API Smoke Test", "Platform (Go)", "Python Lint & Test",
and "Shellcheck (E2E scripts)" were not set by the expected
GitHub apps.
remote: - Changes must be made through a pull request.
The previous version did `git merge --ff-only origin/staging &&
git push origin main` directly. That works against a permissive
branch — it doesn't work against a ruleset that requires checks
satisfied by the expected GitHub apps. Only PR merges through the
queue produce check runs from the right apps.
Result was that today's 12+ merges to staging never propagated to
main; the auto-promote ran every tick and failed every tick, while
operators had to keep opening manual `staging → main` bridges.
Fix
- Replace the direct git push step with a step that opens (or reuses)
a PR base=main head=staging and enables auto-merge. The merge queue
lands it once gates are green on the merge_group ref.
- The PR's head IS the staging branch (no per-SHA promote branch
needed) — the whole purpose is "advance main to staging's tip".
- Add `pull-requests: write` permission so the workflow can call
gh pr create + gh pr merge --auto.
- Drop the `git merge-base --is-ancestor` divergence check — the
merge queue itself enforces branch protection now, and rejects
the PR if main has diverged from staging history.
Loop safety preserved: when this PR's merge lands on main, it
triggers auto-sync-main-to-staging.yml which opens a sync PR back
to staging. That sync PR's eventual merge is by GITHUB_TOKEN (the
merge queue) which doesn't trigger downstream workflow_run events
— so auto-promote-staging.yml does NOT re-fire from its own merge
landing.
Refs: #2234 (the parallel fix for auto-sync-main-to-staging.yml),
task #142, multiple failing runs visible in
https://github.com/Molecule-AI/molecule-core/actions/workflows/auto-promote-staging.yml
Consolidates the remaining safe-to-merge dependabot PRs from the
2026-04-28 wave into one consumable PR. Replaces three earlier
single-bump PRs (#2245, #2230, #2231) which were closed in favor of
this single batch — same pattern as #2235.
GitHub Actions majors (SHA-pinned per org convention):
github/codeql-action v3 → v4.35.2 (#2228)
actions/setup-node v4 → v6.4.0 (#2218)
actions/upload-artifact v4 → v7.0.1 (#2216)
actions/setup-python v5 → v6.2.0 (#2214)
npm dev deps (canvas/, lockfile regenerated in node:22-bookworm
container so @emnapi/* and other Linux-only optional deps are
properly resolved — Mac-native `npm install` strips them, which
caused the earlier #2235 batch to drop these two):
@types/node ^22 → ^25.6 (#2231)
jsdom ^25 → ^29.1 (#2230)
Why each is safe
setup-node v4 → v6 / setup-python v5 → v6:
Every consumer call pins node-version / python-version
explicitly. v5 / v6 changed defaults but pinned consumers
are unaffected. Confirmed via grep across .github/workflows/
— all setup-node call sites pin '20' or '22', all
setup-python call sites pin '3.11'.
codeql-action v3 → v4.35.2:
Used as init/autobuild/analyze sub-actions in codeql.yml.
v4 bundles a newer CodeQL CLI; ubuntu-latest auto-updates
so functional behavior is unchanged. The deprecated
CODEQL_ACTION_CLEANUP_TRAP_CACHES env var (per v4.35.2
release notes) is undocumented and we don't set it.
upload-artifact v4 → v7.0.1:
v6 introduced Node.js 24 runtime requiring Actions Runner
>= 2.327.1. All upload-artifact users (codeql.yml,
e2e-staging-canvas.yml) run on `ubuntu-latest` (GitHub-
hosted), which auto-updates the runner agent. Self-hosted
runners are NOT used for these jobs.
@types/node 22 → 25 / jsdom 25 → 29:
Both are dev-only — @types/node is type definitions,
jsdom backs vitest's DOM environment. Tests pass:
79 files / 1154 tests in node:22-bookworm container.
Verified locally (Linux container so the lockfile reflects what
CI's `npm ci` will install):
- cd canvas && npm install --include=optional → 169 packages
- npm test → 1154/1154 pass
- npm ci → clean install succeeds
- npm run build → Next.js prerendering succeeds
Closes when this lands (the 3 individual auto-merge PRs from earlier
were closed):
#2228, #2218, #2216, #2214, #2231, #2230
NOT included (CI failing on dependabot's own run — major framework
bumps that need code-side migration tasks, not safe auto-bumps):
#2233 next 15 → 16
#2232 tailwindcss 3 → 4
#2226 typescript 5 → 6
Branch protection on `main` requires "E2E API Smoke Test" as a status
check. With Design B's no-op + e2e-api job split, when paths-filter
excludes a commit:
- e2e-api job (name="E2E API Smoke Test"): SKIPPED
- no-op job (name="no-op"): SUCCESS
Branch protection counts the skipped check-run as not-satisfied →
auto-promote-staging's `git push origin main` rejected with GH006.
Observed 2026-04-28 00:22 UTC: every gate green at the workflow level,
all_green=true in auto-promote-staging's gate-check, but the FF push
itself rejected with:
Required status checks "..., E2E API Smoke Test, ..." were not set
by the expected GitHub apps.
Fix: give the no-op job the same `name:` as the real one. Now both
register as check-runs named "E2E API Smoke Test" — exactly one runs
per workflow execution (mutex `if`), the other registers as skipped
with the same name. Branch protection sees at least one success,
requirement satisfied.
Same fix applied to e2e-staging-canvas.yml's no-op (name → "Canvas
tabs E2E") for symmetry, even though "Canvas tabs E2E" isn't currently
in main's required check list — kept consistent so the next time a
required-checks reshuffle pulls it in, it doesn't recreate this bug.
Note: Design B's intent was always "emit a result auto-promote can
read" — that intent was satisfied at the workflow-conclusion level
(success), but missed the per-check-run-name level. This PR closes
that second-order gap.
The PR-built wheel + import smoke gate refused the platform_tools
package because it's a new subdirectory under workspace/ that wasn't
in scripts/build_runtime_package.py:SUBPACKAGES. The drift gate (which
exists for exactly this reason) caught it cleanly:
error: SUBPACKAGES drifted from workspace/ subdirectories:
in workspace/ but NOT in SUBPACKAGES (will ship un-rewritten or
be excluded): ['platform_tools']
Adding platform_tools to SUBPACKAGES wires the package into the
runtime wheel + applies the canonical
from platform_tools.<x> -> from molecule_runtime.platform_tools.<x>
import-rewrite step that every other subpackage uses.
Verified locally: scripts/build_runtime_package.py succeeds, the
rewritten a2a_mcp_server.py reads
from molecule_runtime.platform_tools.registry import TOOLS
which matches the package layout in the wheel.
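The drift gate is essentially a set comparison between workspace/ subdirectories and the SUBPACKAGES list; a minimal sketch (directory layout and list contents are illustrative, not the build script's exact code):

```python
from pathlib import Path

def check_subpackage_drift(workspace: Path, subpackages: list) -> None:
    """Fail loudly when a workspace/ package isn't wired into the wheel."""
    # Only real Python packages count (directories with __init__.py).
    on_disk = {p.name for p in workspace.iterdir()
               if p.is_dir() and (p / "__init__.py").exists()}
    missing = sorted(on_disk - set(subpackages))
    if missing:
        raise SystemExit(
            "error: SUBPACKAGES drifted from workspace/ subdirectories:\n"
            "  in workspace/ but NOT in SUBPACKAGES (will ship un-rewritten "
            f"or be excluded): {missing}"
        )
```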
e2e-staging-canvas had a single global concurrency group:
    concurrency:
      group: e2e-staging-canvas
      cancel-in-progress: false
That meant the entire repo shared one running + one pending slot. When a
staging push queued behind an in-flight run and a third entrant (a PR
run, a follow-on push) entered the group, the staging push got
cancelled. auto-promote-staging then saw `completed/cancelled` for a
required gate and refused to advance main.
Observed 2026-04-28 23:51-23:53: staging tip 3f99fede's e2e-staging-
canvas push run was cancelled within 2:20 of starting because a PR run
on a follow-on branch entered the group. Auto-promote-staging fired 8+
times after that, all skipped because canvas was still in the cancelled
state. The chain stayed stuck until the cancelled run was manually
re-dispatched.
e2e-api had a softer version of the same bug —
`group: e2e-api-${{ github.ref }}`. Per-ref isolates push events from
PR events, so this specific scenario didn't hit it, but back-to-back
specific scenario didn't hit it, but back-to-back pushes to staging at
SHA-A and SHA-B share refs/heads/staging and would still cancel SHA-A's
queued run when SHA-B enters.
Both workflows now use per-SHA grouping. The single-global-group's
original intent was to throttle parallel E2E provisions, but each E2E
run already isolates its state via fresh-org-per-run, and parallel
infrastructure cost at our scale (~$0.001/min × 10min × 2) is rounding
error compared to a stuck pipeline.
Per-SHA still dedupes accidental double-triggers for the SAME SHA.
It does not cancel obsolete-PR-version runs on force-push — that wasted
CI is acceptable given the alternative is losing staging-tip data that
auto-promote-staging depends on.
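A sketch of the per-SHA grouping (the exact group names in the workflows may differ):

```
concurrency:
  group: e2e-staging-canvas-${{ github.sha }}
  cancel-in-progress: false
```

Each SHA gets its own slot, so a queued staging-tip run can no longer be evicted by an unrelated PR run entering the group.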
Other gate workflows: ci.yml uses `cancel-in-progress: true` which is
correct for unit tests (intentional cancellation on supersede). codeql.yml
is per-ref like e2e-api was; same fix probably applies if the same
deadlock pattern is observed there, but no incident yet so deferring.
Establishes workspace/platform_tools/registry.py as THE place tool
naming and docs live. Every consumer reads from it; nothing duplicates
the source. Closes the architectural gap behind the doc/tool drift
discussion 2026-04-28 — adding hundreds of future runtime SDK adapters
should not require touching tool names anywhere except the registry.
What the registry owns
ToolSpec dataclass with: name, short (one-line description), when_to_use
(multi-paragraph agent-facing usage guidance), input_schema (JSON Schema),
impl (the actual coroutine in a2a_tools.py), section ('a2a' | 'memory').
TOOLS list with 8 entries — delegate_task, delegate_task_async,
check_task_status, list_peers, get_workspace_info, send_message_to_user,
commit_memory, recall_memory.
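A condensed sketch of the pattern (field names from the description above; the schema contents, guidance text, and wrap shape are illustrative):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class ToolSpec:
    name: str
    short: str                    # one-line description
    when_to_use: str              # multi-paragraph agent-facing guidance
    input_schema: dict            # JSON Schema
    impl: Callable[..., Any]      # the actual coroutine in a2a_tools.py
    section: str                  # 'a2a' | 'memory'

async def _delegate(**kwargs):    # stand-in for the real coroutine
    ...

TOOLS = [
    ToolSpec(
        name="delegate_task",
        short="Delegate a task to a peer workspace and wait for the result.",
        when_to_use="Use when a sub-task belongs to another team member.",
        input_schema={"type": "object",
                      "properties": {"task": {"type": "string"}}},
        impl=_delegate,
        section="a2a",
    ),
]

# An adapter (e.g. the MCP server) wraps the registry rather than
# duplicating strings -- the author writes zero tool names by hand:
mcp_tools = [
    {"name": t.name, "description": t.short, "inputSchema": t.input_schema}
    for t in TOOLS
]
```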
What now reads from the registry
- workspace/a2a_mcp_server.py
The hardcoded TOOLS list (167 lines of hand-maintained dicts) is
gone. Replaced with a 6-line list comprehension over the registry.
MCP description = spec.short. inputSchema = spec.input_schema.
- workspace/executor_helpers.py
get_a2a_instructions(mcp=True) and get_hma_instructions() now
GENERATE the agent-facing system-prompt text from the registry.
Heading + per-tool bullet (spec.short) + per-tool when_to_use +
a section-specific footer. No more hand-maintained instruction
blocks that drift from reality.
- workspace/builtin_tools/delegation.py
Renamed delegate_to_workspace -> delegate_task_async to match
registry. check_delegation_status -> check_task_status. Added
sync delegate_task @tool wrapping a2a_tools.tool_delegate_task
(was missing for LangChain runtimes — CP review Issue 3).
- workspace/builtin_tools/memory.py
Renamed search_memory -> recall_memory to match registry.
- workspace/adapter_base.py, workspace/main.py
Bundle all 7 core tools (was 6) into all_tools / base_tools.
- workspace/coordinator.py, shared_runtime.py, policies/routing.py
Updated system-prompt-text references to use the registry names.
Structural alignment tests
workspace/tests/test_platform_tools.py — 9 tests pin every
registry-to-adapter mapping:
- registry names are unique
- a2a + memory partition is complete (no orphans)
- by_name lookup works
- MCP server registers exactly the registry's tool set
- MCP description equals registry.short for every tool
- MCP inputSchema equals registry.input_schema for every tool
- get_a2a_instructions text contains every a2a tool name
- get_hma_instructions text contains every memory tool name
- pre-rename names (delegate_to_workspace, search_memory,
check_delegation_status) cannot leak back
Adding a future tool means adding one ToolSpec; the test failure
list tells the author exactly which adapter to update.
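Two of the structural tests, sketched against a toy registry (the real tests live in workspace/tests/test_platform_tools.py and read the actual TOOLS list):

```python
from collections import Counter

# Toy registry rows: (name, section)
REGISTRY = [
    ("delegate_task", "a2a"),
    ("check_task_status", "a2a"),
    ("commit_memory", "memory"),
    ("recall_memory", "memory"),
]

def test_names_unique():
    dupes = [n for n, c in Counter(n for n, _ in REGISTRY).items() if c > 1]
    assert not dupes, f"duplicate tool names: {dupes}"

def test_section_partition_complete():
    # Every tool must land in exactly one of the two known sections.
    orphans = [n for n, s in REGISTRY if s not in ("a2a", "memory")]
    assert not orphans, f"tools outside the a2a/memory partition: {orphans}"

test_names_unique()
test_section_partition_complete()
```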
Adapter pattern for future SDK support
When (e.g.) AutoGen or Pydantic AI gets adapters, the only work
needed for tool surfacing is "wrap registry.TOOLS in your SDK's
tool format." Names, descriptions, schemas, impl come from the
registry — adapter author writes zero strings.
Why this needed to ship now
PR #2237 (already in staging) injected MCP-world docs as the
default system-prompt content. Without the registry, those docs
said "delegate_task" while LangChain runtimes only had
"delegate_to_workspace" — workers see docs for tools that don't
exist (CP review Issue 1+3). PR #2239 was a tactical rename;
this PR is the structural fix that prevents the same class of
drift from recurring as new adapters ship.
PR #2239 was closed in favor of this — same renames, plus the
registry, plus structural tests. Single coherent change.
Tests: 1232 pass, 2 xfailed (pre-existing). 9 new in
test_platform_tools.py; 4 alignment tests in test_prompt.py from
#2237 still pass; original test_executor_helpers tests adapted to
the registry-driven world.
Refs: CP review Issues 1, 2, 3, 5; project memory
project_runtime_native_pluggable.md (platform owns A2A);
project memory feedback_doc_tool_alignment.md (this is the structural
fix for the tactical lesson).
Self-review caught a real correctness bug: a scenario where
publish-workspace-server-image completes BEFORE E2E Staging SaaS for a
runtime-touching SHA. Publish typically takes ~5-10min; E2E ~10-15min,
so this ordering is the common case for runtime-path PRs.
Previous gate logic:
- completed/success: proceed
- completed/failure: abort
- everything else (including in_progress): proceed ← BUG
If publish-trigger fires while E2E is still running, the gate returned
"in_progress/none" and fell through the catch-all "proceed" branch.
Result: :latest retagged on the publish signal alone. Then E2E ends
red — but :latest was already wrongly advanced; the E2E-completion
trigger's job-level if=conclusion==success filter just skips, never
rolls back.
Fix: explicit case for in_progress|queued|requested|waiting|pending
that DEFERS — sets gate.proceed=false, writes a "deferred" summary,
exits 0 (workflow run shows success, retag steps skipped). The E2E
completion trigger then fires later and either promotes (green) or
aborts (red), giving us correct ordering regardless of who finishes
first.
Subsequent steps now guarded by `if: steps.gate.outputs.proceed ==
'true'` instead of relying on `exit 1` for skip semantics.
Also added an explicit catch-all `*)` branch that aborts on unknown
states (forward-compat: GitHub adds a new status, we surface it
instead of silently promoting through it).
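The gate reduces to a three-way decision on the E2E run's state; a sketch in Python (the real gate is a workflow shell step, and the no-run case follows the companion trigger change described for paths-filtered SHAs):

```python
def gate_decision(status, conclusion) -> str:
    """Decide whether to retag :latest given the E2E run's state."""
    if status is None:
        # E2E never fired (paths-filtered); staging's pre-merge gates
        # already validated this SHA, so proceed.
        return "proceed"
    if status == "completed":
        return "proceed" if conclusion == "success" else "abort"
    if status in ("in_progress", "queued", "requested", "waiting", "pending"):
        # Defer: the E2E-completion trigger fires later and either
        # promotes (green) or aborts (red).
        return "defer"
    # Forward-compat catch-all: an unknown status aborts loudly
    # instead of silently promoting through it.
    return "abort"
```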
Previously this workflow only triggered on E2E Staging SaaS completion,
which is itself paths-filtered to runtime handlers
(workspace-server/internal/handlers/{registry,workspace_provision,
a2a_proxy}.go, middleware/**, provisioner/**). publish-workspace-server
-image fires on a STRICTLY BROADER path set (workspace-server/**,
canvas/**, manifest.json) — so canvas-only or cmd-only or sweep-only
PRs rebuilt the platform image without ever advancing :latest.
Result observed 2026-04-28: zero runs of this workflow since merge
despite eight main pushes. :latest sat ~7 hours / 9 PRs behind main.
Fix: add publish-workspace-server-image as a second trigger. Add an
explicit gate inside the job that aborts when E2E Staging SaaS for the
same SHA ended red. When E2E didn't fire (paths-filtered), proceed —
auto-promote-staging's pre-merge gates (CI + E2E Canvas + E2E API +
CodeQL on staging) already validated this SHA before main moved.
Concurrency group serializes promotes per-SHA so the publish+E2E both-
fired race lands cleanly. Idempotent crane tag makes it safe regardless.
Workers were registering platform tools (delegate_task, delegate_task_async,
list_peers, check_task_status, send_message_to_user, commit_memory,
recall_memory) but the build_system_prompt assembly never included
documentation for any of them. The instruction-text functions
get_a2a_instructions() and get_hma_instructions() exist in
executor_helpers.py and have unit tests, but were not called from any
production code path — workers received system-prompt.md content only
and saw the tools as bare names with no usage guidance.
Symptom: agents called commit_memory and delegate_task without knowing
they were platform tools. They worked when the agent guessed the API
correctly and silently failed when the agent didn't.
Fix: build_system_prompt() now appends both instruction sets between
the Skills section and the Peers section. The placement is intentional —
A2A docs explain how to call delegate_task; the peer list is the data
that delegate_task operates over, so the docs precede the peer table.
New parameter `a2a_mcp: bool = True` lets adapters opt into the CLI
subprocess variant of the A2A instructions for runtimes without MCP
support (ollama, custom CLI runtimes). Default True covers the
MCP-capable majority (claude-code, hermes, langchain, crewai). Adapter
callers don't need to change unless they specifically need CLI mode.
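The placement logic, sketched (function shape and section strings are illustrative; the real assembly lives in build_system_prompt):

```python
def build_system_prompt(base_md: str, skills: str, peers: str,
                        a2a_docs: str, hma_docs: str) -> str:
    """Assemble the worker prompt with tool docs between Skills and
    Peers, so the delegate_task docs precede the peer table they
    operate over."""
    return "\n\n".join([base_md, skills, a2a_docs, hma_docs, peers])

prompt = build_system_prompt(
    base_md="You are a worker agent.",
    skills="## Skills\n...",
    peers="## Peers\n| name | runtime |",
    a2a_docs="## A2A tools\ndelegate_task: hand work to a peer.",
    hma_docs="## Memory tools\ncommit_memory: persist a fact.",
)
# Ordering invariant the regression tests pin: docs before peer list.
assert prompt.index("delegate_task") < prompt.index("## Peers")
```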
Tests: 4 new regression tests in test_prompt.py pin
- A2A MCP variant injection (default)
- A2A CLI variant injection (a2a_mcp=False, with MCP-only fields absent)
- HMA instruction injection
- A2A docs precede peer list ordering
Full suite green: 1223 passed, 2 xfailed.
Brings the merge commit a3864eaf (PR #2211 staging→main UI merge) back
onto staging so the staging-as-superset-of-main invariant holds again.
Why this is needed instead of auto-sync firing:
- Push-to-main ran the OLD direct-push auto-sync workflow that was
on main at the time (a3864eaf), not the new PR-based version.
- Direct push to staging is blocked by the merge_queue ruleset
(GH013), so the run failed.
- The new PR-based auto-sync (#2234) lives on staging but isn't on
main yet — it can't fire on the push that would've triggered it.
- This PR breaks the deadlock manually. Once it merges, auto-promote
fast-forwards main and main picks up the new auto-sync workflow,
making the system self-healing for any future #2211-style merge.
Diff is empty — a3864eaf is itself a staging→main merge, so its tree
matches staging's tree at that point. This commit only adds the merge
graph, not new file content.
Consolidates 11 of the 17 open Dependabot PRs (#2215, #2217, #2219-#2225,
#2227, #2229) into one PR. Every entry is a patch / minor / floor bump
where the impact surface is small and CI carries the proof.
Same pattern as the 2026-04-15 batch.
Go (workspace-server/go.mod + go.sum, regenerated via `go mod tidy`):
- golang.org/x/crypto 0.49.0 → 0.50.0 (#2225)
- github.com/golang-jwt/jwt/v5 5.2.2 → 5.3.1 (#2222)
- github.com/gin-contrib/cors 1.7.2 → 1.7.7 (#2220)
- github.com/docker/go-connections 0.6.0 → 0.7.0 (#2223)
- github.com/redis/go-redis/v9 9.7.3 → 9.19.0 (#2217)
Python floor bumps (workspace/requirements.txt; current pip-resolved
versions don't change unless they happen to be below the new floor):
- httpx >=0.27 → >=0.28.1 (#2221)
- uvicorn >=0.30 → >=0.46 (#2229)
- temporalio >=1.7 → >=1.26 (#2227)
- websockets >=12 → >=16 (#2224)
- opentelemetry-sdk >=1.24 → >=1.41.1 (#2219)
GitHub Actions (SHA-pinned per existing convention):
- dorny/paths-filter@d1c1ffe (v3) → @fbd0ab8 (v4.0.1) (#2215)
REMOVED from this batch (lockfile platform mismatch):
- #2231 @types/node ^22 → ^25.6 (npm install on macOS strips
Linux-only @emnapi/* entries from package-lock.json that CI's
`npm ci` then refuses; needs a Linux-side install to land cleanly)
- #2230 jsdom ^25 → ^29.1 (same)
NOT included in this batch (deferred to per-PR human review):
- #2228 github/codeql-action v3 → v4 (CodeQL CLI alignment risk)
- #2218 actions/setup-node v4 → v6 (default Node version drift)
- #2216 actions/upload-artifact v4 → v7 (3 major versions)
- #2214 actions/setup-python v5 → v6 (action major)
NOT merged (CI failing on dependabot's own PR):
- #2233 next 15 → 16
- #2232 tailwindcss 3 → 4
- #2226 typescript 5 → 6
Verified:
- workspace-server: `go mod tidy && go build ./... && go test ./...` — green
- workspace requirements.txt: floor bumps only
The molecule-core/staging branch is protected by ruleset 15500102
(name: staging-merge-queue) which blocks ALL direct pushes — no
bypass even for org admins or the GitHub Actions integration. The
prior version of this workflow attempted `git push origin staging`
and was rejected with GH013:
! [remote rejected] staging -> staging
(push declined due to repository rule violations)
- Changes must be made through a pull request.
- Changes must be made through the merge queue
This was a real architectural mismatch: auto-sync was bypassing
the same gates everyone else goes through to land on staging,
which is exactly what the ruleset is designed to prevent.
The fix matches the org convention: the workflow now opens a PR
(base=staging, head=auto-sync/main-<sha>) and enables auto-merge.
The merge queue picks it up, runs required gates against the
merged result, and lands it. Same path human PRs take through
staging — no special-snowflake bypass.
Trade-off acknowledged
- Slight PR churn: every main push that needs sync opens a tracked
  PR. With the existing `concurrency` setting (`cancel-in-progress:
  false`) and the merge queue's serial processing, this is bounded —
  PRs land in order, no thundering herd.
- The previous direct-push approach worked on
molecule-controlplane (which has no merge_queue ruleset on
staging). That version of the workflow was correct for that
repo's protection model. Per-repo divergence is acceptable; the
invariant ("staging ⊇ main") is what matters, not how it's
enforced.
Loop safety preserved
GITHUB_TOKEN-authored merges (including the merge queue's land
of this PR) do NOT trigger downstream workflow runs. So the merge
to staging from this PR doesn't fire auto-promote-staging — same
as the direct-push version.
Idempotency
The branch name is derived from main's short sha
(`auto-sync/main-<sha>`) so workflow restarts on the same main
push reuse the existing branch + PR rather than opening duplicates.
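The derivation is a pure function of main's sha, which is what makes restarts idempotent. A minimal sketch (the 7-character short-sha width is an assumption; the real workflow would take whatever `git rev-parse --short` emits):

```python
def sync_branch_name(main_sha: str) -> str:
    """Derive the auto-sync branch name from main's commit sha.

    Reruns of the workflow on the same main push compute the same
    name, so they reuse the existing branch + PR instead of opening
    duplicates.
    """
    # 7 chars mirrors git's default short-sha width (an assumption;
    # the workflow itself would use `git rev-parse --short`).
    return f"auto-sync/main-{main_sha[:7]}"

print(sync_branch_name("3f2c9b1a77d04e5f8a6b"))  # → auto-sync/main-3f2c9b1
```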
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Supply-chain hardening for the CI pipeline. 23 workflow files
modified, 59 mutable-tag refs replaced with commit SHAs.
The risk
Every `uses:` reference in .github/workflows/*.yml was pinned to a
mutable tag (e.g., `actions/checkout@v4`). A maintainer of an
action — or a compromised maintainer account — can repoint that
tag to malicious code, and our pipelines silently pull it on the
next run. The tj-actions/changed-files compromise of March 2025 is
the canonical example: maintainer credential leak, attacker
repointed several `@v<N>` tags to a payload that exfiltrated
repository secrets. Repos that pinned to SHAs were unaffected.
The fix
Replace each `@v<N>` with `@<commit-sha> # v<N>`. The trailing
comment preserves human readability ("ah, this is v4"); the SHA
makes the reference immutable.
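The rewrite can be sketched as a regex substitution over workflow text. This is an illustrative sketch, not the tool actually used for this PR; the tag-to-SHA mapping entry is an example, and real SHAs would be resolved per action (e.g. via `git ls-remote` against the action's repo):

```python
import re

# Illustrative tag -> commit-sha mapping (an assumption, not the
# actual pin set); real values come from each action's repo.
PINS = {
    "actions/checkout@v4": "11bd71901bbe5b1630ceea73d27597364c9af683",
}

def pin_uses(workflow_text: str) -> str:
    def repl(match: re.Match) -> str:
        ref = match.group(1)
        sha = PINS.get(ref)
        if sha is None:
            return match.group(0)  # leave unmapped refs untouched
        owner_repo, tag = ref.split("@", 1)
        # Trailing comment keeps the ref human-readable ("ah, v4");
        # the SHA makes it immutable.
        return f"uses: {owner_repo}@{sha} # {tag}"
    return re.sub(r"uses:\s*([\w./-]+@v[\w.-]+)", repl, workflow_text)

print(pin_uses("      - uses: actions/checkout@v4"))
```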
Actions covered (10 distinct):
actions/{checkout,setup-go,setup-python,setup-node,upload-artifact,github-script}
docker/{login-action,setup-buildx-action,build-push-action}
github/codeql-action/{init,autobuild,analyze}
dorny/paths-filter
imjasonh/setup-crane
pnpm/action-setup (already pinned in molecule-app, listed here for completeness)
Excluded:
Molecule-AI/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main
— internal org reusable workflow; we control its repo, threat model
is different from third-party actions. Conventional to pin to @main
rather than SHA for internal reusables.
The maintenance cost
SHA pinning means upstream fixes require manual SHA bumps. Without
automation, pinned SHAs go stale. So this PR also enables Dependabot
across four ecosystems:
- github-actions (workflows)
- gomod (workspace-server)
- npm (canvas)
- pip (workspace runtime requirements)
Weekly cadence — the supply-chain attack window is "minutes between
repoint and pull"; weekly auto-bumps don't help with zero-days
regardless. The point is to pull in non-zero-day fixes without
operator effort.
Aligns with user-stated principle: "long-term, robust, fully-
automated, eliminate human error."
Companion PR: Molecule-AI/molecule-controlplane#308 (same pattern,
smaller surface).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a lint that diffs the canonical SECRET_PATTERNS array in
.github/workflows/secret-scan.yml against every known public
consumer mirror, failing on any divergence.
Why: every consumer that scans for credentials carries its own copy
of the pattern list. The copies drift — most recently the
workspace-runtime pre-commit hook lagged the canonical list by one
pattern (the sk-cp- / MiniMax F1088 vector), so a developer's local
pre-commit would let an sk-cp- token through that the org-wide CI
scan would then refuse. Useless friction; automated detection closes
the gap.
Implementation:
.github/scripts/lint_secret_pattern_drift.py — pure stdlib, fetches
each consumer's RAW file via urllib, extracts the
SECRET_PATTERNS=( ... ) array via anchored regex (the closing
`)` is anchored to the start of a line because pattern comments
like `# GitHub PAT (classic)` contain their own paren mid-line),
diffs against canonical, fails on missing or extra patterns.
Fetch failures are warnings, not errors — a consumer whose
branch was renamed shouldn't fail the lint until someone updates
the URL list.
.github/workflows/secret-pattern-drift.yml — daily 05:00 UTC cron
+ on-push gate (when canonical, the workflow, or the script
changes) + workflow_dispatch. Read-only token, 5-minute timeout.
Initial consumer set: workspace-runtime's bundled pre-commit hook
(the one that drifted on sk-cp-). molecule-controlplane's inlined
copy is private, so this workflow can't read it; that gap is tracked
separately and is covered only by the controlplane's own
self-monitoring.
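The line-anchored extraction described above can be sketched as follows. The hook excerpt, helper name, and diff logic are hypothetical illustrations of the technique, not the script's actual contents:

```python
import re

# Hypothetical excerpt of a consumer's pre-commit hook.
hook_text = """\
#!/usr/bin/env bash
SECRET_PATTERNS=(
  'ghp_[A-Za-z0-9]{36}'    # GitHub PAT (classic)
  'sk-cp-[A-Za-z0-9]{48}'  # control-plane token (illustrative)
)
"""

def extract_patterns(text: str):
    # Anchor the closing ')' to the start of a line (MULTILINE) so a
    # mid-line ')' in a comment like "# GitHub PAT (classic)" cannot
    # terminate the match early.
    m = re.search(r"^SECRET_PATTERNS=\(\n(.*?)^\)", text,
                  re.MULTILINE | re.DOTALL)
    if m is None:
        return None
    patterns = []
    for line in m.group(1).splitlines():
        body = line.split("#", 1)[0].strip()  # drop trailing comments
        if body:
            patterns.append(body.strip("'\""))
    return patterns

# Diff against a canonical set: missing entries are the drift signal.
canonical = {"ghp_[A-Za-z0-9]{36}", "sk-cp-[A-Za-z0-9]{48}", "extra_pat"}
consumer = set(extract_patterns(hook_text) or [])
print("missing:", sorted(canonical - consumer))  # → missing: ['extra_pat']
```

Note the sketch assumes patterns never contain a literal `#`; the real script would need a more careful comment strip if they did.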
Verified locally: lint detects drift correctly when the runtime
hook is missing sk-cp-, returns clean when aligned.
Refs: task #139.