BLOCKERS fixed:
- instructions.go: Drop team-scope queries (the teams/team_members tables
  don't exist in any migration; the schema column is kept for future use).
  Restored Resolve to /workspaces/:id/instructions/resolve under wsAuth,
  closing an auth gap that allowed cross-workspace enumeration of
  operator policy.
- migration 040: Add CHECK constraints on title (<=200) and content (<=8192)
to prevent token-budget DoS via oversized instructions.
- a2a_executor.py: Pair on_tool_start/on_tool_end via run_id instead of
list-position so parallel tool calls don't drop or clobber outputs. Cap
tool_trace at 200 entries to prevent runaway loops bloating JSONB.
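The run_id-keyed pairing can be sketched as follows. This is an illustrative Go sketch (the actual implementation is Python in a2a_executor.py), and all names here are hypothetical:

```go
package main

// toolEntry is one tool_trace record; OutputPreview is filled on end.
type toolEntry struct {
	Name          string
	Input         string
	OutputPreview string
	Done          bool
}

// traceCollector pairs start/end events by run ID rather than list
// position, so parallel tool calls that finish out of order still
// match up instead of clobbering each other's slots.
type traceCollector struct {
	pending map[string]*toolEntry // in-flight calls, keyed by run_id
	trace   []*toolEntry          // completed order of starts, capped
}

const maxTrace = 200 // cap so runaway loops can't bloat the JSONB column

func newTraceCollector() *traceCollector {
	return &traceCollector{pending: make(map[string]*toolEntry)}
}

func (c *traceCollector) onToolStart(runID, name, input string) {
	if len(c.trace) >= maxTrace {
		return // cap reached: drop further entries rather than grow unbounded
	}
	e := &toolEntry{Name: name, Input: input}
	c.pending[runID] = e
	c.trace = append(c.trace, e)
}

func (c *traceCollector) onToolEnd(runID, output string) {
	if e, ok := c.pending[runID]; ok {
		e.OutputPreview = output
		e.Done = true
		delete(c.pending, runID)
	}
}
```

Ending run IDs out of order still attaches each output to the right entry, which is exactly what list-position pairing got wrong.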
HIGH fixes:
- instructions.go: Add length validation in Create + Update handlers.
Removed dead rows_ shadow variable. Replaced string concatenation in
Resolve with strings.Builder.
- prompt.py: Reduce the httpx timeout from 10s to 3s (boot hot path).
  Switch print to logger.warning. Add an Authorization bearer header
  sourced from the MOLECULE_WORKSPACE_TOKEN env var.
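The strings.Builder swap mentioned above follows the standard pattern: one backing buffer grows amortised instead of allocating a new string per concatenation. A minimal sketch (function name hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// buildResolvedPrompt joins resolved instructions into one numbered
// block using a single strings.Builder instead of repeated string +=.
func buildResolvedPrompt(instructions []string) string {
	var b strings.Builder
	for i, ins := range instructions {
		fmt.Fprintf(&b, "%d. %s\n", i+1, ins)
	}
	return b.String()
}
```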
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tenant's workspace provisioner now forwards payload.Model (set by
canvas Config tab when a user picks a model) through to the
workspace's runtime env as HERMES_DEFAULT_MODEL, so install.sh /
start.sh in the template can seed the right ~/.hermes/config.yaml
without any post-provision manual step.
Helper applyRuntimeModelEnv() is runtime-switched so each template
owns its own env contract — hermes uses HERMES_DEFAULT_MODEL, future
runtimes with different config schemas register their own cases.
Runtimes that read model from /configs/config.yaml instead (langgraph,
claude-code, deepagents) are unaffected: the switch has no case for
them, so this is a no-op in those paths.
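The runtime-switched helper described above can be sketched as follows; this is an illustrative shape, not the exact provisioner code:

```go
package main

// applyRuntimeModelEnv maps the canvas-selected model onto each
// runtime's own env contract. Runtimes that read their model from
// /configs/config.yaml (langgraph, claude-code, deepagents) have no
// case here, so the switch is a deliberate no-op for them.
func applyRuntimeModelEnv(runtime, model string, env map[string]string) {
	if model == "" {
		return // nothing selected in the Config tab
	}
	switch runtime {
	case "hermes":
		env["HERMES_DEFAULT_MODEL"] = model
		// future runtimes with env-based config register their own cases
	}
}
```

Each template owning its case keeps the env-var contract encoded once, per the design note above.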
Applied in both the Docker provisioner path (provisionWorkspaceOpts)
and the SaaS/CP path (provisionWorkspaceCP) so local dev and
production behave identically.
Combined with:
- molecule-controlplane#231 (/opt/adapter/install.sh hook)
- molecule-ai-workspace-template-hermes#8 (install.sh for bare-host)
- molecule-ai-workspace-template-hermes#9 (derive-provider.sh)
this completes the MVP flow: customer creates a hermes workspace
in canvas with model = minimax/MiniMax-M2.7-highspeed + secret
MINIMAX_API_KEY = sk-cp-…, clicks Save, workspace provisions with
the MiniMax Token Plan hermes-agent gateway up and ready for the
first chat — no ops touch.
Foundation this builds on:
- env injection works for every runtime
- secret passthrough is generic (already via workspace_secrets)
- per-runtime env-var contract encoded once (applyRuntimeModelEnv)
- canvas Save button for later-edit remains a Files-API-over-EIC
concern (tracked separately)
See internal/product/designs/workspace-backends.md for the broader
architectural direction this fits into.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rm received /configs and filePath as two separate arguments, so it
deleted the entire /configs dir on every call. The fix concatenates
them so rm targets only the intended file. validateRelPath already
prevents traversal, so this was a logic bug, not a security
vulnerability.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
F1085 (Misconfiguration - Filesystems): the 2-arg exec form
[]string{"rm", "-rf", "/configs", filePath} passes /configs as
an rm target, so rm -rf /configs deletes the entire volume mount
regardless of what filePath resolves to.
Fix uses filepath.Join + filepath.Clean + HasPrefix assertion to
scope rm to the /configs/ prefix. validateRelPath (CWE-22) catches
leading/mid-path ".." before rm. HasPrefix guard is defence-in-depth.
Includes CP-BE's 12-case regression test suite (runs with docker: nil
and validates that all traversal forms are rejected before the Docker
call).
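The Join + Clean + HasPrefix scoping can be sketched in isolation; names here are illustrative, not the exact handler code:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// scopedConfigPath resolves a relative path under /configs and asserts
// the cleaned result still lives under the mount before it is ever
// handed to rm as a single argument. Rejecting anything that is not
// strictly under /configs/ also rejects the mount root itself.
func scopedConfigPath(rel string) (string, error) {
	p := filepath.Clean(filepath.Join("/configs", rel))
	if !strings.HasPrefix(p, "/configs/") {
		return "", fmt.Errorf("path %q escapes /configs", p)
	}
	return p, nil
}
```

The exec form then carries one scoped target, e.g. `[]string{"rm", "-rf", p}`, so rm can never see /configs as a standalone argument.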
Co-Authored-By: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-Authored-By: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
Two related workflow hygiene changes:
## (1) canary-verify: graceful-skip when canary secrets absent
Before: canary-verify hit `scripts/canary-smoke.sh` which exited
non-zero when CANARY_TENANT_URLS was empty. Every main publish
ran → canary-verify failed → red check on main CI signal (7/7 in
past 24h). Noise, no value.
After: smoke step detects the missing-secrets case, writes a
warning to the step summary, sets an output `smoke_ran=false`,
and exits 0. The workflow completes green without pretending to
have tested anything.
Gated downstream: `promote-to-latest` now requires BOTH
`needs.canary-smoke.result == success` AND
`needs.canary-smoke.outputs.smoke_ran == true`. A skip does NOT
auto-promote — manual `promote-latest.yml` remains the release
gate while Phase 2 canary is absent (see
molecule-controlplane/docs/canary-tenants.md for the fleet
stand-up plan + decision framework).
When the canary fleet is stood up and secrets populated: delete
the early-exit branch + the smoke_ran gate. The workflow goes back
to its original "smoke gates promotion" semantics.
## (2) auto-promote-staging.yml — draft
New workflow that fires after CI / E2E Staging Canvas / E2E API /
CodeQL complete on the staging branch, checks that ALL four are
green on the same SHA, and fast-forwards `main` to that SHA.
Shipped disabled: the promote step is gated behind repo variable
`AUTO_PROMOTE_ENABLED=true`. Until that's set, the workflow
dry-runs and logs what it would have done. Toggle via Settings →
Variables when staging CI has been reliably green for a few days.
Safety:
- workflow_run events only fire on push to staging (PRs into
staging don't promote).
- Every required gate must be `completed/success` on the same
head_sha. Pending / failed / skipped / cancelled → abort.
- `--ff-only` push. Refuses to advance main if it has diverged
from staging history (someone landed a direct-to-main commit
that's not on staging). Human resolves the fork.
- `workflow_dispatch` with `force=true` lets us test the flow
end-to-end before flipping the variable on.
Motivation: molecule-core#1496 has been open with 1172 commits of
divergence between staging and main. Today that trapped PR #1526
(dynamic canvas runtime dropdown) on staging while prod users
hit the hardcoded-dropdown bug. Auto-promote retires the bulk
staging→main PR pattern once the staging CI it depends on is
reliable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a configurable instruction injection system that prepends rules to
every agent's system prompt. Instructions are stored in the DB and fetched
at workspace startup, supporting three scopes:
- Global: applies to all agents (e.g., "verify with tools before reporting")
- Team: applies to agents in a specific team
- Workspace: applies to a single agent (role-specific rules)
Components:
- Migration 040: platform_instructions table with scope hierarchy
- Go API: CRUD endpoints + resolve endpoint that merges scopes
- Python runtime: fetches instructions at startup via /instructions/resolve
and prepends them to the system prompt as highest-priority context
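The resolve endpoint's scope merge can be sketched as follows. This is an illustrative Go sketch of the ordering (global first, most specific last); the real handler works against the platform_instructions table:

```go
package main

import "strings"

// Instruction is one stored rule with its scope.
type Instruction struct {
	Scope   string // "global", "team", or "workspace"
	Content string
}

// resolvePrefix merges instructions in scope-hierarchy order so that
// more specific scopes appear after (and can refine) broader ones.
func resolvePrefix(all []Instruction) string {
	order := []string{"global", "team", "workspace"}
	var b strings.Builder
	for _, scope := range order {
		for _, ins := range all {
			if ins.Scope == scope {
				b.WriteString(ins.Content)
				b.WriteString("\n")
			}
		}
	}
	return b.String()
}
```

The runtime prepends the resulting block to the system prompt, so scope order here is what makes workspace-level rules read as the most specific context.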
Initial global instructions seeded:
1. Verify Before Acting (check issues/PRs/docs first)
2. Verify Output Before Reporting (second signal before reporting done)
3. Tool Usage Requirements (claims must include tool output)
4. No Hallucinated Emergencies (CRITICAL needs proof)
5. Staging-First Workflow (never push to main directly)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every A2A response now includes a tool_trace — the list of tools/commands
the agent actually invoked during execution. This enables verifying agent
claims against what they actually did, catches hallucinated "I checked X"
responses, and provides an audit trail for the CEO to control hundreds of
agents by checking the top-level PM's trace.
Changes:
- Python runtime: collect tool name/input/output_preview on every
on_tool_start/on_tool_end event, embed in Message.metadata.tool_trace
- Go platform: extract tool_trace from A2A response metadata, store in
new activity_logs.tool_trace JSONB column with GIN index
- Activity API: expose tool_trace in List and broadcast endpoints
- Migration 039: adds tool_trace column + GIN index
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR #1526 shipped the /templates registry + canvas dynamic Runtime /
Model / Required-Env fields on 2026-04-22 — but merged into the
staging branch, not main. The staging→main promotion PR #1496 has
been open unmerged for a while with 1172 commits of divergence, so
prod (which builds from main) still carries the old hardcoded
dropdown.
Symptom seen on hongmingwang.moleculesai.app today:
- New Hermes Agent workspace (template declares runtime: hermes) loads
Config tab → Runtime dropdown shows "LangGraph (default)" because
there's no <option value="hermes"> in the hardcoded list; it falls
back to empty-value silently.
- Model field is a plain TextInput with static placeholder
"e.g. anthropic:claude-sonnet-4-6" — should be a combobox populated
from the selected runtime's models[].
- Required Env Vars is a TagList with static placeholder
"e.g. CLAUDE_CODE_OAUTH_TOKEN" — should auto-populate from the
selected model's required_env.
- Net effect: "Save & Deploy" sends empty model + empty env to the
provisioner → workspace instant-fails.
This PR cherry-picks the exact three files from PR #1526 (commit
359dc61 on staging) forward to main, without pulling the other 1171
commits:
- canvas/src/components/tabs/ConfigTab.tsx
- RuntimeOption interface + FALLBACK_RUNTIME_OPTIONS (hermes,
gemini-cli included)
- useEffect fetches /templates and populates runtimeOptions
dynamically
- dropdown renders from runtimeOptions (no hardcoded list)
- Model becomes a combobox with datalist of available models
per selected runtime
- Required Env Vars auto-populates from the selected model's
required_env on model change
- workspace-server/internal/handlers/templates.go
- /templates endpoint returns [{id, name, runtime, models}] with
per-template models registry (id, name, required_env)
- workspace-server/internal/handlers/templates_test.go
- Tests for runtime+models parsing and legacy top-level model
fallback
The canvas Runtime dropdown now resolves "hermes" correctly;
Model dropdown shows the models[] from the hermes template; Env
auto-populates with HERMES_API_KEY (or whichever model selected).
Verified locally:
- workspace-server builds clean
- Template handler tests pass: TestTemplatesList_RuntimeAndModelsRegistry,
TestTemplatesList_LegacyTopLevelModel, TestTemplatesList_NonexistentDir
Follow-up: the staging→main promotion gap (#1496) is the
underlying process issue. Either merge that PR or adopt a policy
of landing fixes directly on main (as several PRs have today).
Files here were chosen minimally to avoid pulling unrelated staging
changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Day 4: EC2 Console Output — approved by Marketing Lead + PM
Day 5: Org-Scoped API Keys — approved by Marketing Lead + PM
Both campaigns queued for Apr 24 and Apr 25.
Co-authored-by: Marketing Lead <marketing-lead@agents.moleculesai.app>
Three changes to stop ferrying sensitive content through our public
monorepo. All content already imported to Molecule-AI/internal (private)
— see linked PRs below.
## docs/incidents/INCIDENT_LOG.md — replaced with stub
Contained full security audit cycle records with CWE references,
file:line pointers to historical vulnerabilities, and severity
ratings. None of that belongs in a public repo.
→ Moved to Molecule-AI/internal/security/incident-log.md (PR #20).
Monorepo file becomes a 17-line stub pointing at the internal
location. Future incidents land in the internal file only.
## docs/architecture/canary-release.md — redacted identifiers
Had AWS account ID `004947743811` and IAM role name
`MoleculeStagingProvisioner` embedded. Even though the fleet
described isn't actually running (see state note), these
identifiers are account-specific and don't belong in public git.
→ Removed both values, replaced with generic references + a pointer
to Molecule-AI/internal/runbooks/canary-fleet.md (PR #21) where
the actual identifiers live. Any future rotation touches the
internal file, no public-git-history rewrite needed.
## docs/infra/workspace-terminal.md — reduced to public summary
Contained the full ops runbook: bootstrap script output, per-tenant
SG backfill loop with live SG IDs, customer slug names
(hongmingwang). Useful content but too specific for a public repo.
→ Moved to Molecule-AI/internal/runbooks/workspace-terminal.md
(PR #22). Monorepo file becomes a 30-line public summary of what
the feature does + pointers to code, so external readers /
self-hosters still get the design story.
## What's NOT in this PR (follow-up)
Marketing briefs, SEO plans, campaign copy, research dossiers, and
internal product designs (hermes-adapter-plan, medo-integration,
cognee-*) are the next batches. See docs policy doc coming next to
set team expectations.
Net removal: ~820 lines from public git going forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(canary-release): flag as aspirational; link to current state
The canary-release.md doc describes the pipeline as if the fleet is
running — referring to AWS account 004947743811 and a configured
MoleculeStagingProvisioner role. Reality as of 2026-04-22: no canary
tenants are provisioned, the 3 GH Actions secrets are empty, and
canary-verify.yml has failed 7/7 times in a row.
Added a top-of-doc ⚠️ state note that:
1. Clarifies this is intended design, not deployed reality.
2. Notes the AWS account ID is historical / unverified.
3. Explains that merges currently rely on manual promote-latest.
4. Cross-links to molecule-controlplane/docs/canary-tenants.md for
the Phase 1 work that's shipped, the Phase 2 stand-up plan, and
the "should we even do this now?" decision framework.
5. Asks whoever lands Phase 2 to reconcile the two docs.
No behaviour change — doc-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(build): add missing fmt import in a2a_proxy.go, fix canvas Dockerfile GID
- a2a_proxy.go: missing "fmt" import caused build failure (8 undefined
references at lines 743-775). Likely dropped during a recent merge.
- canvas/Dockerfile: GID 1000 already in use in node base image.
Changed to dynamic group/user creation with fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
- Add Version E: ephemeral key story (60-second RSA key lifecycle)
- Elevate Version D: zero key rot angle with explicit 60-second key window
- Add Version A/D as approved primary angles (ops simplicity / security)
- Update status to APPROVED, unblocked for Social Media Brand
- Add header: positioning angle confirmed per GH issue #1637
- Add image suggestion for ephemeral key timeline graphic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three-channel brief covering partner platforms, marketplace resellers,
and enterprise CI/CD automation. Links to Phase 30 (mol_ws_* token model)
as cross-sell. Flags first-mover opportunity vs CrewAI/LangGraph Cloud.
Collates the collateral gap list and open PM questions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds competitive differentiation section explicitly calling out the
governance layer gap in LangGraph's current A2A PRs vs Molecule AI's
Phase 30 production implementation. Canonical URL verified correct.
Closes PMM A2A blog final-review item.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- PRs #6645, #7113, #7205 not found in langchain-ai/langgraph open PR list
- Added VERIFY flags to LangGraph tracker; requires manual re-check
- Updated market events log with verification result
- Battlecard v0.3 LangGraph status is now flagged as stale pending re-verify
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two latent bugs the self-hosted Mac mini had been hiding. Both caught
by the newer toolchain on ubuntu-latest runners after PR #1626.
1. workspace-server/internal/handlers/terminal.go:442
`fmt.Sprintf("%s:%d", host, port)` flagged by go vet as unsafe
for IPv6 (it omits the required [::] brackets). Replaced with
`net.JoinHostPort(host, strconv.Itoa(port))` which handles both
IPv4 and IPv6 correctly. No runtime behaviour change — the only
call site passes "127.0.0.1", so the bug would never trigger in
practice, but vet is right to flag it as a latent correctness
issue.
2. workspace/tests/test_a2a_executor.py::test_set_current_task_updates_heartbeat
`MagicMock()` auto-creates attributes on first access, so
`getattr(heartbeat, "active_tasks", 0)` in shared_runtime.py
returned a MagicMock rather than the default 0. Adding 1 to a
MagicMock returns another MagicMock, so the assertion
`heartbeat.active_tasks == 1` never held. Seeding
`heartbeat.active_tasks = 0` before the first call makes
getattr() return a real int, matching how the real HeartbeatLoop
class initialises itself.
Both pre-existed on main and were hidden by the older Python / Go
toolchains on the Mac mini runner. Verified locally (venv pytest
pass, `go vet ./...` + `go build ./...` clean on workspace-server).
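The net.JoinHostPort swap from item 1, in isolation:

```go
package main

import (
	"net"
	"strconv"
)

// addr builds a dial address. net.JoinHostPort brackets IPv6 literals
// as required ("[::1]:8080"), which the fmt.Sprintf("%s:%d", …) form
// silently gets wrong.
func addr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}
```

For example, `addr("127.0.0.1", 8080)` yields `"127.0.0.1:8080"` and `addr("::1", 8080)` yields `"[::1]:8080"`.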
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
molecule-core is a public repo — GHA-hosted minutes are free. The
self-hosted Mac mini was only in play to dodge GHA rate limits
(memory feedback_selfhosted_runner), but for these specific
workflows it came with real costs:
- Docker-push workflows emulated linux/amd64 from arm64 via QEMU —
every canvas + platform image build ran ~2-3x slower than native.
- Six PRs' worth of keychain-avoidance hacks in publish-* because
  `docker login` on macOS writes to the osxkeychain unconditionally,
  and the Mac mini's launchd user-agent keychain is locked.
- Homebrew pin-down environment variables (HOMEBREW_NO_*) sprinkled
everywhere to work around the shared /opt/homebrew symlink mess
on the runner.
- actions/setup-python@v5 couldn't write to /Users/runner, so ci.yml
  python-lint resorted to a hand-rolled Homebrew python3.11 dance.
- Single runner → fan-out contention; CodeQL's 45-min analysis
fought the canvas publish for the one slot.
Changes across the 7 workflows:
- runs-on: [self-hosted, macos, arm64] → ubuntu-latest (every job)
- publish-canvas-image + publish-workspace-server-image:
drop the hand-rolled auths-map step + QEMU setup + buildx v4
→ docker/login-action@v3 + setup-buildx@v3. Linux + amd64
target = native build.
- canary-verify + promote-latest: replace `brew install crane` +
HOMEBREW_NO_* incantations with imjasonh/setup-crane@v0.4.
- codeql.yml: drop `brew install jq` — jq is preinstalled on
ubuntu-latest.
- ci.yml shellcheck: drop the self-hosted existence check —
shellcheck is preinstalled via apt.
- ci.yml python-lint: replace the Homebrew python3.11 path dance
with actions/setup-python@v5 (which works fine on GHA-hosted),
add requirements.txt caching while we're there.
- Remove stale comments referencing "the self-hosted runner",
"Mac mini", keychain, osxkeychain etc.
The self-hosted Mac mini remains in service for private-repo
workflows only. Memory feedback_selfhosted_runner updated to
reflect the public-repo scope carve-out.
Net -96 lines across the 7 files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every standalone workspace-template repo now publishes to
ghcr.io/molecule-ai/workspace-template-<runtime>:latest via the
reusable publish-template-image workflow in molecule-ci (landed
today — one caller per template repo). This PR makes the
provisioner actually use those images:
- RuntimeImages map + DefaultImage switched from bare local tags
(workspace-template:<runtime>) to their GHCR equivalents.
- New ensureImageLocal step before ContainerCreate: if the image
isn't present locally, attempt `docker pull` and drain the
progress stream to completion. Best-effort — if the pull fails
(network, auth, rate limit) the subsequent ContainerCreate still
surfaces the actionable "No such image" error, now with a
GHCR-appropriate hint instead of the defunct
`bash workspace/build-all.sh <runtime>` advice.
- runtimeTagFromImage now handles both forms: legacy
`workspace-template:<runtime>` (local dev via build-all.sh /
rebuild-runtime-images.sh) and the current GHCR shape. Keeps
error hints sensible in both worlds.
- Tests cover the GHCR path for tag extraction and the new error
message shape. Legacy local tags still recognised.
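The dual-form tag extraction can be sketched as follows; this is an illustrative shape (the real function's signature and prefixes live in the provisioner):

```go
package main

import "strings"

// runtimeTagFromImage extracts the runtime name from either the legacy
// local tag (workspace-template:<runtime>) or the GHCR shape
// (ghcr.io/molecule-ai/workspace-template-<runtime>:<tag>), so error
// hints stay sensible in both worlds.
func runtimeTagFromImage(image string) (string, bool) {
	// Legacy local-dev form produced by build-all.sh and friends.
	if rest, ok := strings.CutPrefix(image, "workspace-template:"); ok {
		return rest, true
	}
	// Current GHCR form; drop the trailing :tag if present.
	if rest, ok := strings.CutPrefix(image, "ghcr.io/molecule-ai/workspace-template-"); ok {
		runtime, _, _ := strings.Cut(rest, ":")
		return runtime, true
	}
	return "", false
}
```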
Local dev path unchanged — scripts/build-images.sh and
workspace/rebuild-runtime-images.sh still produce locally-tagged
`workspace-template:<runtime>` images, and Docker's image
resolver matches them before any pull is attempted. So
contributors can keep iterating on a template repo without
round-tripping through GHCR.
Follow-on impact:
- hongmingwang.moleculesai.app (and any other tenant EC2) will
auto-pull `ghcr.io/molecule-ai/workspace-template-hermes:latest`
on the next hermes workspace provision — picking up the real
Nous hermes-agent behind the A2A bridge (template-hermes v2.1.0)
without any tenant-side rebuild step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(F1085): scope rm to /configs volume in deleteViaEphemeral
Regressed by commit 49ab614 ("CWE-78/CWE-22 — block shell injection
in deleteViaEphemeral") which changed the rm form from the scoped
concat "/configs/" + filePath to the unscoped 2-arg "/configs", filePath.
With 2 args, rm receives /configs as the first target — rm -rf /configs
attempts to delete the entire volume mount before processing filePath,
which is the F1085 (Misconfiguration - Filesystems) defect. The concat
form passes a single scoped path so rm only touches files inside /configs.
validateRelPath call retained as CWE-22 defence-in-depth.
* docs: note F1085 defect in deleteViaEphemeral 2-arg rm form
Amends the CWE-22+CWE-78 incident entry to record that commit 49ab614
regressed the F1085 (volume deletion scope) fix, and that f1085-fix
commit a432df5 restores the correct concat form.
---------
Co-authored-by: Molecule AI CP-QA <cp-qa@agents.moleculesai.app>
publish-canvas-image has been failing on every main push since 2026-04-21
at `addgroup -g 1000 canvas` because node:20-alpine already ships a `node`
user/group at uid/gid 1000. Same collision workspace-server/Dockerfile.tenant
already fixes with `deluser --remove-home node` before `addgroup`.
Copying that pattern here so the workflow goes green again and canvas images
publish to ghcr. No runtime behaviour change — canvas still runs as non-root
uid 1000.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- dev-start.sh: $ROOT/platform → $ROOT/workspace-server (Go server
lives in workspace-server/, not platform/; any developer running
this script would get "no such directory" immediately)
- nuke-and-rebuild.sh: add ROOT variable and -f "$ROOT/docker-compose.yml"
so docker compose works from any CWD; fix post-rebuild-setup.sh path
- rollback-latest.sh: add 'local' to src_digest and new_digest vars
inside roll() function to prevent global-scope leakage
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>