Manual fresh-user clean-slate test surfaced three friction points in
the existing dev-start.sh:
1. The script ran docker compose -f docker-compose.infra.yml
directly, bypassing infra/scripts/setup.sh — so the workspace
template registry was never populated and the canvas template
palette came up empty (the "Template palette is empty"
troubleshooting hit).
2. ADMIN_TOKEN was not handled at all. Without it, the AdminAuth
fail-open gate worked initially but slammed shut the moment the
first workspace registered a token — at which point the canvas
could no longer call /workspaces or /templates. New users hit
401s with no obvious next step.
3. The script wasn't mentioned in docs/quickstart.md. New users
followed the documented 4-step manual flow and never discovered
the single command existed.
Fixes:
- dev-start.sh now calls infra/scripts/setup.sh, which brings up
full infra (postgres + redis + langfuse + clickhouse + temporal)
AND populates the template/plugin registry from manifest.json.
- On first run, dev-start.sh writes MOLECULE_ENV=development to
.env. This activates middleware.isDevModeFailOpen() which lets
the canvas keep calling admin endpoints without a bearer (the
intended local-dev escape hatch). The .env is preserved on
re-runs and sourced before the platform launches.
- The script intentionally does NOT auto-generate an ADMIN_TOKEN.
A first attempt did, and broke the canvas because isDevModeFailOpen
requires ADMIN_TOKEN empty AND MOLECULE_ENV=development together.
Setting ADMIN_TOKEN in dev would close the hatch and the canvas
has no way to read that token in a dev build (no
NEXT_PUBLIC_ADMIN_TOKEN bake step here). The .env comment block
explicitly warns future contributors not to add it.
- Both processes' logs go to /tmp/molecule-{platform,canvas}.log
instead of stdout-mixed so the readiness banner stays clean.
- Health-poll loops cap at 30s with a clear timeout error pointing
to the log file, instead of hanging forever.
- The readiness banner now lists the log paths AND tells the user
the next step is "open localhost:3000 → add API key in Config →
Secrets & API Keys → Global", instead of just listing service
URLs.
Quickstart doc rewrite leads with:
git clone ...
cd molecule-monorepo
./scripts/dev-start.sh
The 4-step manual flow is preserved as "Manual setup (advanced)"
for contributors who want per-component logs.
Verified end-to-end from clean Docker (no containers, no volumes,
no .env) three times: total wall-clock ~12s for a re-run with
cached npm/docker layers. Platform's HTTP 200 on /workspaces
without a bearer confirms the dev-mode auth hatch is active.
Same root cause as the workspace/molecule_ai_status.py docstring fix
in this PR: this doc claimed `molecule-monorepo-status` was a usable
shell alias and `from molecule_ai_status import set_status` was a
usable Python import. Both worked under the pre-#87 monolithic-template
layout (where workspace/Dockerfile created the symlink and COPY'd the
modules into /app/) but neither works in current standalone template
images that install the runtime as a wheel:
- `which molecule-monorepo-status` errors — only `a2a-db` and
`molecule-runtime` are registered console scripts.
- `from molecule_ai_status` raises ImportError — modules are under the
`molecule_runtime` package now.
Switched both examples to the canonical `python3 -m
molecule_runtime.molecule_ai_status` form (CLI) and `from
molecule_runtime.molecule_ai_status import set_status` (Python). Same
form the runtime ships in its own usage banner, so anyone discovering
this doc gets a runnable example.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Quota gates are resource-state conflicts, not payment failures —
RFC 9110 reserves 402 for billing/payment failures specifically. The
canonical Molecule-AI/docs PR #82 already shipped the corrected text;
this brings the molecule-core copy of the tutorial in line.
The inline parenthetical "(not 402 Payment Required — quota gates are
resource-state conflicts, not payment failures, per RFC 9110)" doubles
as a regression anchor: a future edit that flips 409 back to 402 would
have to also reword that explanation, making the change a deliberate
two-step act rather than a casual oversight.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an opt-in goroutine that polls GHCR every 5 minutes for digest
changes on each workspace-template-*:latest tag and invokes the same
refresh logic /admin/workspace-images/refresh exposes. With this, the
chain from "merge runtime PR" to "containers running new code" is fully
hands-off — no operator step between auto-tag → publish-runtime →
cascade → template image rebuild → host pull + recreate.
Opt-in via IMAGE_AUTO_REFRESH=true. SaaS deploys whose pipeline already
pulls every release should leave it off (would be redundant work);
self-hosters get true zero-touch.
Why a refactor of admin_workspace_images.go is in this PR:
The HTTP handler held all the refresh logic inline. To share it with
the new watcher without HTTP loopback, extracted WorkspaceImageService
with a Refresh(ctx, runtimes, recreate) (RefreshResult, error) shape.
HTTP handler is now a thin wrapper; behavior is preserved (same JSON
response, same 500-on-list-failure, same per-runtime soft-fail).
Watcher design notes:
- Last-observed digest tracked in memory (not persisted). On boot the
first observation per runtime is seed-only — no spurious refresh
fires on every restart.
- On Refresh error, the seen digest rolls back so the next tick retries.
Without this rollback a transient Docker glitch would convince the
watcher the work was done.
- Per-runtime fetch errors don't block other runtimes (one template's
brief 500 doesn't pause the others).
- digestFetcher injection seam in tick() lets unit tests cover all
bookkeeping branches without standing up an httptest GHCR server.
Verified live: probed GHCR's /token + manifest HEAD against
workspace-template-claude-code; got HTTP 200 + a real
Docker-Content-Digest. Same calls the watcher makes.
Co-authored-by: Hongming Wang <hongmingwangalt@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: point new-runtime-template flow at the GitHub template repo
The 'Writing a new adapter' section was a 6-step manual checklist that
re-derived the canonical shape every time. Now that
Molecule-AI/molecule-ai-workspace-template-starter exists as a GitHub
template, the flow collapses to:
gh repo create ... --template Molecule-AI/molecule-ai-workspace-template-starter
Plus a fill-in-the-TODO-markers table.
Why this matters: the starter ships with the
'repository_dispatch: [runtime-published]' cascade receiver pre-wired,
which means new templates pick up runtime PyPI publishes automatically
without the one-time setup PR each existing template needed (PRs #6-#22
across the 8 template repos that we just opened to retrofit). At
'hundreds of runtimes' scale this is the difference between linear PR-
toil and zero PR-toil per template addition.
Also adds: 'When the starter itself needs to evolve' — explicit pattern
for keeping the canonical shape in one place when it changes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* docs(workspace-runtime): drop PYPI_TOKEN refs — OIDC is the new auth
Reflects PR #2113 (PyPI Trusted Publisher / OIDC migration). No static
PyPI token exists in the repo anymore, so the docs shouldn't claim one
does. Replaces the PYPI_TOKEN row in the Required Secrets table with an
"Auth" section pointing at the OIDC config; TEMPLATE_DISPATCH_TOKEN is
still the only repo secret the cascade needs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hongming Wang <hongmingwangalt@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
External architecture review flagged the SECRETS_ENCRYPTION_KEY env var
on the platform as encryption-at-rest theater. The reviewer read only
the platform repo and missed that the master key actually lives in AWS
KMS at the control plane layer, with envelope encryption wrapping each
tenant secret blob.
Adds docs/architecture/secrets-key-custody.md as the canonical source
of truth for the full chain:
- Two-mode envelope (KMS_KEY_ARN vs static-key fallback)
- Per-blob AES-256-GCM with KMS-wrapped DEKs
- Where each key actually lives (KMS, CP env, tenant env)
- Threat model per attacker capability
- Rotation story (annual KMS CMK rotation, manual DEK rotation on incident)
- Audit posture (SOC2 / ISO 27001 questionnaire bullets)
Patches three downstream docs that previously stopped at the env-var
level and link them to the new custody doc:
- development/constraints-and-rules.md (Rule 11)
- architecture/database-schema.md (workspace_secrets paragraph)
- architecture/molecule-technical-doc.md (env-vars table)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The production-side end of the runtime CD chain. Operators (or the post-
publish CI workflow) hit this after a runtime release to pull the latest
workspace-template-* images from GHCR and recreate any running ws-* containers
so they adopt the new image. Without this, freshly-published runtime sat in
the registry but containers kept the old image until naturally cycled.
Implementation notes:
- Uses Docker SDK ImagePull rather than shelling out to docker CLI — the
alpine platform container has no docker CLI installed.
- ghcrAuthHeader() reads GHCR_USER + GHCR_TOKEN env, builds the base64-
encoded JSON payload Docker engine expects in PullOptions.RegistryAuth.
Both empty → public/cached images only; both set → private GHCR pulls.
- Container matching uses ContainerInspect (NOT ContainerList) because
ContainerList returns the resolved digest in .Image, not the human tag.
Inspect surfaces .Config.Image which is what we need.
- Provisioner.DefaultImagePlatform() exported so admin handler picks the
same Apple-Silicon-needs-amd64 platform as the provisioner — single
source of truth for the multi-arch override.
Local-dev companion: scripts/refresh-workspace-images.sh runs on the
host and inherits the host's docker keychain auth — alternate path for
when GHCR_USER/TOKEN aren't set in the platform env.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
71 files across docs/marketing/ and marketing/ are blocked by the
Block-internal-flavored-paths CI gate (CEO directive 2026-04-23).
These paths must live in Molecule-AI/internal, not the public monorepo.
Unblocks PR #1981 (staging→main sync).
Public-facing blog/devrel content should be re-added via correct paths:
docs/blog/<slug>.md, docs/devrel/<slug>.md, docs/tutorials/<slug>.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Discord-format announcement for Phase 34 GA (April 30, 2026).
All four features: Tool Trace, Platform Instructions, Partner API Keys,
SaaS Federation v2. ~550 words, community-native tone.
Address: Molecule-AI/molecule-core#1836
Co-Authored-By: Claude Community Manager <noreply@anthropic.com>
Community announcement for Phase 34 GA (April 30, 2026).
Four features: Tool Trace, Platform Instructions, Partner API Keys,
SaaS Federation v2. Discord-format, ~550 words, community-native tone.
Addresses Molecule-AI/molecule-core#1836.
Co-Authored-By: Claude Community Manager <noreply@anthropic.com>
Community announcement for Phase 34 GA (April 30, 2026).
Four features: Tool Trace, Platform Instructions, Partner API Keys,
SaaS Federation v2. Discord-format, ~550 words, community-native tone.
Addresses Molecule-AI/molecule-core#1836.
Co-Authored-By: Claude Community Manager <noreply@anthropic.com>
This monorepo is public. Internal content (positioning, competitive
briefs, sales playbooks, PMM/press drip, draft campaigns) belongs in
Molecule-AI/internal — never here.
## What this PR removes
/research/ (3 competitive briefs)
/marketing/ (45 files: assets, audio, community, copy,
demos, devrel, drip, pmm, press, sales)
/docs/marketing/ (31 draft campaign / blog / brief files)
comment-1172.json + comment-1173.json
test-pmm-temp.txt
tick-reflections-temp.md
83 files removed, 7,141 lines deleted from public history (going forward —
historical commits remain visible in this repo's git log).
## Companion: internal repo absorption
Molecule-AI/internal PR `chore/migrate-monorepo-internal-content-2026-04-23`
absorbs all 79 files into `from-monorepo-2026-04-23/` for curator triage
into the existing internal/marketing/ tree. Bulk-dump avoids file-collision
on overlapping subdirs (audio, devrel, pmm).
## Three-layer enforcement so this can't recur
1. .gitignore — blocks `git add` of /research, /marketing, /docs/marketing,
/comment-*.json, *-temp.{md,txt}, /test-pmm-*, /tick-reflections-*
2. .github/workflows/block-internal-paths.yml — CI hard gate. Fails any PR
that adds a forbidden path. Cannot be silently bypassed.
3. docs/internal-content-policy.md — canonical decision tree for agents
and humans. Linked from the CI failure message.
A separate PR on molecule-ai-org-template-molecule-dev updates SHARED_RULES
to teach every agent role to write internal content directly to
Molecule-AI/internal via gh repo clone + commit + PR (the prevention-at-
source layer; this PR is the mechanical backstop).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The monitoring section referenced GET /org/tokens/:id/logs which does
not exist. The org token API only exposes List/Create/Revoke
(GET/POST/DELETE /org/tokens). Per-token activity logs via API are
a planned feature, not yet built.
Fixes: molecule-core#1914
- Replaced fake curl example with Canvas Activity Log path
- Added roadmap note: per-token activity logs via API (planned)
- Updated footer to include per-token activity logs on roadmap
- Kept the operational guidance (monitor call patterns, revoke if
suspicious) since the principle is correct even if the API is TBD
- Rename combined overview slug to tool-trace-platform-instructions-overview
- Add og_image placeholder to all 3 posts
- Cross-link all Phase 34 posts bidirectionally
- Add Tool Trace X post and Platform Instructions LinkedIn post
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Research team competitive audit confirmed no competitor has documented
programmatic partner org provisioning API equivalent to mol_pk_*. Updated
lead claim from unverified "only platform" to verified "first-mover" /
"first agent platform" framing for legal defensibility. Resolves the
VERIFICATION REQUIRED warning blocks in the battlecard.
Co-authored-by: Molecule AI Marketing Lead <marketing-lead@agents.moleculesai.app>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
PR #1686 introduced two platform-level features:
- Tool Trace: tool_call list in A2A metadata, stored in activity_logs.tool_trace JSONB
- Platform Instructions: admin-configurable instruction text (global/workspace scope),
injected as first section of every agent's system prompt at startup
Demo covers 5 scenarios: admin creates global instruction, workspace-scoped instruction,
agent fetches resolved instructions at boot, admin lists instructions, and query activity
logs with tool_trace. Includes screencast outline (5 moments, ~90s) and TTS narration script.
Co-authored-by: Molecule AI DevRel Engineer <devrel-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
New tutorial covering:
- Control plane provisioning for multi-tenant org isolation
- Neon DB branch-per-tenant architecture
- EC2 workspace + security group per tenant
- Platform API for tenant onboarding, billing, quota
Blocked on: Stripe Atlas integration (Phase 34)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(blog): add Phase 34 blog posts — Partner API Keys, Governance, Tool Trace
- Partner API Keys: partner-gated MCP server access for enterprise
- Platform Instructions Governance: org-scoped AI instruction governance
- Tool Trace Observability: debug/audit AI agent decision trees
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(blog): remove og_image refs from Phase 34 posts — images TBD
OG images are a known gap across many posts in the repo. Removed og_image
lines from all 4 Phase 34 posts to avoid 404s. Social Media Brand to
generate final assets. Also fixed broken link in governance post:
/docs/blog/ai-agent-observability-without-overhead → /blog/...
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Content Marketer <content-marketer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Brings the staging branch up to date with main's feature-fix stream so
every staging-targeted PR stops tripping on pre-existing rot. Before
this merge, staging had 30+ compile + test failures from fix PRs that
landed on main but never reached staging — primarily #1755's panic-
cascade + schema-drift alignments.
After this merge the handlers package goes from 30+ fails → 2 pre-
existing nil-docker test panics (TestCopyFilesToContainer_CWE22_
RejectsTraversal + TestDeleteViaEphemeral_F1085_RejectsTraversal),
both authored on staging and broken before this promotion. Tracked
separately; not a merge regression.
## Conflicts resolved
1. docs/marketing/campaigns/discord-adapter-announcement/announcement.md
— deleted on main (9d0d213: "move sensitive strategy + research to
internal repo"), modified on staging. Deletion wins: marketing
content moved out of the public monorepo per that commit's intent.
The content lives in the internal repo.
2. workspace-server/internal/handlers/container_files.go — staging's
rmTarget version kept. Main's version had `Cmd: []string{"rm",
"-rf", "/configs/" + filePath}` which concatenates raw filePath
AFTER the prefix-check on rmTarget, defeating the path-traversal
guard (a "../etc/passwd" input passes validation but the rm cmd
then traverses). Staging's `Cmd: []string{"rm", "-rf", rmTarget}`
uses the validated path. Keeping staging's more-secure variant.
## Includes build unblockers from #1769 / #1782
- terminal.go: malformed handleLocalConnect repaired
- terminal_test.go: missing braces in TestHandleConnect_RoutesToLocal
- workspace_crud.go: unused imports + duplicate strField block
- container_files_test.go: duplicate contains() removed (uses the one
in workspace_provision_test.go, same package)
## Verification
- go build ./... ✅ clean
- go vet ./... ✅ clean
- go test -race ./... — 18/20 packages green; 2 test panics in
internal/handlers are pre-existing on staging (documented above)
Existing external-agent-registration.md is 784 lines — great reference
but hostile to first-time devs evaluating Molecule. Add a tight
5-minute quickstart aimed at "make it work today":
- 40-line Python agent with A2A JSON-RPC skeleton
- Cloudflare quick-tunnel for instant public URL (no account)
- Single curl registration
- Common gotchas table (includes the canvas dedup + tunnel rotation
issues caught in the demo this afternoon)
- Production upgrade path
- Preview of polling mode (Phase N+1 transport)
- 4-step diagnostic checklist at the bottom
The reference doc (external-agent-registration.md) now has a prominent
"in a hurry?" callout pointing at the quickstart, so the discovery
path works either way.
Target audience: a developer who wants to see their code on canvas
inside 5 minutes, not a self-hoster hardening for prod.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These files have been in public monorepo docs/ since the open-source
restructure on 2026-04-18, but are operational (outreach targets,
analytics tracking IDs, staged unpublished social copy) or strategic
(launch plans, SEO briefs, keyword targets, competitive research).
Per the internal documentation policy (2026-04-22), they belong in
the private internal repo. Pair PR: internal#27 receives the files.
Removed:
- docs/marketing/campaigns/* — 6 campaign packs with outreach + analytics
- docs/marketing/plans/phase-30-launch-plan.md — draft launch plan
- docs/marketing/briefs/* — 2 SEO content briefs
- docs/marketing/seo/keywords.md — keyword strategy
- docs/research/cognee-*.md — 2 architecture + isolation evals
What stays public:
- docs/marketing/blog/ — published blog posts
- docs/marketing/devrel/demos/ — dev-facing demo scripts + video
- docs/marketing/discord-adapter-day2/ — already-posted community copy
No external references to update — cross-references among these files
are now intact inside the internal repo; no public CLAUDE.md / README /
PLAN / docs/README referenced the moved paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(devrel): add Phase 30 hero video — 3 aspect ratio cuts
Primary (16:9), social (9:16), and LinkedIn (1:1) cuts.
47.95s, 30fps H.264, dark zinc theme, burn-in captions, VO track.
Assembled from:
- marketing/assets/phase30-fleet-diagram.png
- marketing/audio/phase30-video-vo.mp3
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(marketing): fill Discord adapter Day 2 blog URL — ready for Apr 22 push
Adds https://moleculesai.app/blog/discord-adapter to both Reddit
(r/LocalLLaMA) and Hacker News post bodies. Updates status line and
draft attribution. Reddit/HN copy is now complete and ready for
Social Media Brand coordination.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(marketing): correct Discord adapter blog URL — discord-adapter → 2026-04-21-discord-adapter
Fixes broken link in Reddit and HN Day 2 copy. Correct slug is
/blog/2026-04-21-discord-adapter.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Community Manager <community-manager@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
* docs(social): EC2 Instance Connect SSH launch copy + terminal demo visual
PR #1533 (feat/terminal: remote path via aws ec2-instance-connect + pty)
Issue #1547 (social: launch thread for EC2 Instance Connect SSH)
Content:
- docs/marketing/social/2026-04-22-ec2-instance-connect-ssh/social-copy.md
5-post X thread + LinkedIn single post, dark theme brand voice
- docs/assets/blog/2026-04-22-ec2-instance-connect-ssh/ec2-terminal-demo.png (1200x800)
Canvas Terminal tab mockup showing EC2 bash prompt via EIC
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(blog): AI agent credential model — one key, named, monitored
Companion post to the enterprise-key-management launch post.
Focuses on the agent-specific angle: dynamic tool interfaces,
emergent behavior containment, delegation chains, and the
security properties that survive agent compromise.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Social Media Brand <social-media-brand@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI DevRel Engineer <devrel-engineer@agents.moleculesai.app>
Three changes to stop ferrying sensitive content through our public
monorepo. All content already imported to Molecule-AI/internal (private)
— see linked PRs below.
Contained full security audit cycle records with CWE references,
file:line pointers to historical vulnerabilities, and severity
ratings. None of that belongs in a public repo.
→ Moved to Molecule-AI/internal/security/incident-log.md (PR #20).
Monorepo file becomes a 17-line stub pointing at the internal
location. Future incidents land in the internal file only.
Had AWS account ID `004947743811` and IAM role name
`MoleculeStagingProvisioner` embedded. Even though the fleet
described isn't actually running (see state note), these
identifiers are account-specific and don't belong in public git.
→ Removed both values, replaced with generic references + a pointer
to Molecule-AI/internal/runbooks/canary-fleet.md (PR #21) where
the actual identifiers live. Any future rotation touches the
internal file, no public-git-history rewrite needed.
Contained the full ops runbook: bootstrap script output, per-tenant
SG backfill loop with live SG IDs, customer slug names
(hongmingwang). Useful content but too specific for a public repo.
→ Moved to Molecule-AI/internal/runbooks/workspace-terminal.md
(PR #22). Monorepo file becomes a 30-line public summary of what
the feature does + pointers to code, so external readers /
self-hosters still get the design story.
Marketing briefs, SEO plans, campaign copy, research dossiers, and
internal product designs (hermes-adapter-plan, medo-integration,
cognee-*) are the next batches. See docs policy doc coming next to
set team expectations.
Net removal: ~820 lines from public git going forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Day 4: EC2 Console Output — approved by Marketing Lead + PM
Day 5: Org-Scoped API Keys — approved by Marketing Lead + PM
Both campaigns queued for Apr 24 and Apr 25.
Co-authored-by: Marketing Lead <marketing-lead@agents.moleculesai.app>
Three changes to stop ferrying sensitive content through our public
monorepo. All content already imported to Molecule-AI/internal (private)
— see linked PRs below.
## docs/incidents/INCIDENT_LOG.md — replaced with stub
Contained full security audit cycle records with CWE references,
file:line pointers to historical vulnerabilities, and severity
ratings. None of that belongs in a public repo.
→ Moved to Molecule-AI/internal/security/incident-log.md (PR #20).
Monorepo file becomes a 17-line stub pointing at the internal
location. Future incidents land in the internal file only.
## docs/architecture/canary-release.md — redacted identifiers
Had AWS account ID `004947743811` and IAM role name
`MoleculeStagingProvisioner` embedded. Even though the fleet
described isn't actually running (see state note), these
identifiers are account-specific and don't belong in public git.
→ Removed both values, replaced with generic references + a pointer
to Molecule-AI/internal/runbooks/canary-fleet.md (PR #21) where
the actual identifiers live. Any future rotation touches the
internal file, no public-git-history rewrite needed.
## docs/infra/workspace-terminal.md — reduced to public summary
Contained the full ops runbook: bootstrap script output, per-tenant
SG backfill loop with live SG IDs, customer slug names
(hongmingwang). Useful content but too specific for a public repo.
→ Moved to Molecule-AI/internal/runbooks/workspace-terminal.md
(PR #22). Monorepo file becomes a 30-line public summary of what
the feature does + pointers to code, so external readers /
self-hosters still get the design story.
## What's NOT in this PR (follow-up)
Marketing briefs, SEO plans, campaign copy, research dossiers, and
internal product designs (hermes-adapter-plan, medo-integration,
cognee-*) are the next batches. See docs policy doc coming next to
set team expectations.
Net removal: ~820 lines from public git going forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(canary-release): flag as aspirational; link to current state
The canary-release.md doc describes the pipeline as if the fleet is
running — referring to AWS account 004947743811 and a configured
MoleculeStagingProvisioner role. Reality as of 2026-04-22: no canary
tenants are provisioned, the 3 GH Actions secrets are empty, and
canary-verify.yml has failed 7/7 times in a row.
Added a top-of-doc ⚠️ state note that:
1. Clarifies this is intended design, not deployed reality.
2. Notes the AWS account ID is historical / unverified.
3. Explains that merges currently rely on manual promote-latest.
4. Cross-links to molecule-controlplane/docs/canary-tenants.md for
the Phase 1 work that's shipped, the Phase 2 stand-up plan, and
the "should we even do this now?" decision framework.
5. Asks whoever lands Phase 2 to reconcile the two docs.
No behaviour change — doc-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(build): add missing fmt import in a2a_proxy.go, fix canvas Dockerfile GID
- a2a_proxy.go: missing "fmt" import caused build failure (8 undefined
references at lines 743-775). Likely dropped during a recent merge.
- canvas/Dockerfile: GID 1000 already in use in node base image.
Changed to dynamic group/user creation with fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
The canary-release.md doc describes the pipeline as if the fleet is
running — referring to AWS account 004947743811 and a configured
MoleculeStagingProvisioner role. Reality as of 2026-04-22: no canary
tenants are provisioned, the 3 GH Actions secrets are empty, and
canary-verify.yml has failed 7/7 times in a row.
Added a top-of-doc ⚠️ state note that:
1. Clarifies this is intended design, not deployed reality.
2. Notes the AWS account ID is historical / unverified.
3. Explains that merges currently rely on manual promote-latest.
4. Cross-links to molecule-controlplane/docs/canary-tenants.md for
the Phase 1 work that's shipped, the Phase 2 stand-up plan, and
the "should we even do this now?" decision framework.
5. Asks whoever lands Phase 2 to reconcile the two docs.
No behaviour change — doc-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add Version E: ephemeral key story (60-second RSA key lifecycle)
- Elevate Version D: zero key rot angle with explicit 60-second key window
- Add Version A/D as approved primary angles (ops simplicity / security)
- Update status to APPROVED, unblocked for Social Media Brand
- Add header: positioning angle confirmed per GH issue #1637
- Add image suggestion for ephemeral key timeline graphic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three-channel brief covering partner platforms, marketplace resellers,
and enterprise CI/CD automation. Links to Phase 30 (mol_ws_* token model)
as cross-sell. Flags first-mover opportunity vs CrewAI/LangGraph Cloud.
Collocates collateral gap list and open PM questions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds competitive differentiation section explicitly calling out the
governance layer gap in LangGraph's current A2A PRs vs Molecule AI's
Phase 30 production implementation. Canonical URL verified correct.
Closes PMM A2A blog final-review item.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- PRs #6645, #7113, #7205 not found in langchain-ai/langgraph open PR list
- Added VERIFY flags to LangGraph tracker; requires manual re-check
- Updated market events log with verification result
- Battlecard v0.3 LangGraph status is now flagged as stale pending re-verify
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(F1085): scope rm to /configs volume in deleteViaEphemeral
Regressed by commit 49ab614 ("CWE-78/CWE-22 — block shell injection
in deleteViaEphemeral") which changed the rm form from the scoped
concat "/configs/" + filePath to the unscoped 2-arg "/configs", filePath.
With 2 args, rm receives /configs as the first target — rm -rf /configs
attempts to delete the entire volume mount before processing filePath,
which is the F1085 (Misconfiguration - Filesystems) defect. The concat
form passes a single scoped path so rm only touches files inside /configs.
validateRelPath call retained as CWE-22 defence-in-depth.
* docs: note F1085 defect in deleteViaEphemeral 2-arg rm form
Amends the CWE-22+CWE-78 incident entry to record that commit 49ab614
regressed the F1085 (volume deletion scope) fix, and that f1085-fix
commit a432df5 restores the correct concat form.
---------
Co-authored-by: Molecule AI CP-QA <cp-qa@agents.moleculesai.app>
Review turned up two issues in the rollout runbook:
1. The tenant env-var list was missing — today's debugging burned 2
hours on hongmingwang where everything worked infra-side but
canvas 401'd because MOLECULE_ORG_SLUG and CP_UPSTREAM_URL weren't
set. Doc without this sends the next operator down the same hole.
Added a dedicated step-3 table covering CP_UPSTREAM_URL,
MOLECULE_ORG_SLUG, MOLECULE_ORG_ID, AWS_REGION with the exact
failure mode each one produces when missing.
2. Backfill loop used tab-separated aws-cli output directly, which
can concatenate all SG ids into one word and run the loop body
once with no iteration. Inserted `| tr '\t' '\n'` — no-op on
well-behaved output, fix on the concatenated case.
Renumbered subsequent sections.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expanded the rollout section with the exact scripts + env vars
that landed to make Hermes workspace Terminal work on 2026-04-22.
Points at molecule-controlplane#227 (which adds bootstrap script +
EIC_ENDPOINT_SG_ID env var) so operators can reproduce the setup
on a new AWS account in one command.
Also documents the existing-workspace backfill for the instance_id
column — the CP only writes on new provisions, so pre-migration
workspaces need a manual UPDATE before Terminal routes to the
remote path.
Refs: #1528 (resolved)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Researched the actual molecule-controlplane repo rather than guessing:
- Workspaces launch in a shared CP workspace VPC (p.VPCID), not per
tenant
- CP already tags instances with Role=workspace at ec2.go:1126 — my
prior IAM policy used molecule:role which doesn't match anything
- workspaceIngressRules() currently opens only 8000/tcp — no port 22
Corrected:
- IAM policy Condition now matches existing Role tag (no CP change
needed for the scope to work fleet-wide)
- Added OpenTunnel action so EIC Endpoint path works
- Dropped the \"open 22 in SG\" recommendation. Cross-VPC topology
makes SG CIDR rules awkward (would need peering + tenant-CIDR
bookkeeping). EIC Endpoint is one VPC resource + no SG changes.
- Simplified rollout to two items: add IAM policy, create EIC Endpoint
Kept direct-SG path as an explicit not-recommended alternative.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for the EIC-based terminal handler (#1528). The tenant's
workspace-server needs to map workspace_id → EC2 instance_id to open
an SSH session, but CPProvisioner.Start returned the instance id only
for logging — it was never written anywhere. This PR adds the column
and writes it at provision time.
Scope kept intentionally small: no terminal code yet. The follow-up
PR will consume this column from the terminal handler.
What's here:
- migrations/038_workspace_instance_id — nullable TEXT column on
workspaces, partial index on non-null for fast lookup
- workspace_provision.go — UPDATE after CPProvisioner.Start; failure
logs but doesn't fail provisioning (row just lacks instance_id and
terminal falls back to the existing not-reachable error)
- docs/infra/workspace-terminal.md — full design for the terminal
flow: EIC vs SSM comparison, IAM policy JSON, SG rules, key
lifetime, failure modes, rollout checklist
Refs: #1528
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: split 4 oversized handler files into focused sub-files
- org.go (1099 lines) → org.go + org_import.go + org_helpers.go
- mcp.go (1001 lines) → mcp.go + mcp_tools.go
- workspace.go (934 lines) → workspace.go + workspace_crud.go
- a2a_proxy.go (825 lines) → a2a_proxy.go + a2a_proxy_helpers.go
No functional changes — same package, same exports, same tests.
All files stay under 635 lines.
Note: isSafeURL and isPrivateOrMetadataIP are duplicated between
mcp_tools.go and a2a_proxy_helpers.go — this is a pre-existing issue
from the original mcp.go and a2a_proxy.go, not introduced by this split.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(runtime+scheduler): increment/decrement active_tasks counter (refs #1386)
* docs(tutorials): add Self-Hosted AI Agents guide — Docker, Fly Machines, bare metal
* docs: add Remote Agents feature + Phase 30 blog links to docs index
* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted
* docs(api-ref): add workspace file copy API reference (#1281)
Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases
Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): CWE-78/CWE-22 — block shell injection in deleteViaEphemeral (#1310)
## Summary
Issue #1273: deleteViaEphemeral interpolated filePath directly into
rm command, enabling both shell injection (CWE-78) and path traversal
(CWE-22) attacks.
## Changes
1. Added validateRelPath(filePath) guard before constructing the rm command.
validateRelPath blocks absolute paths and ".." traversal sequences.
2. Changed Cmd from "/configs/"+filePath (string interpolation) to
[]string{"rm", "-rf", "/configs", filePath} (exec form). This
eliminates shell injection entirely — filePath is a plain argument,
never interpreted as shell code.
## Security properties
- validateRelPath: blocks "../" and absolute paths before they reach Docker
- Exec form: filePath cannot inject shell metacharacters even if validation
is somehow bypassed
- "/configs" as separate arg: rm has exactly two arguments, no room for
injected args
Closes#1273.
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in a2a_proxy.go (#1292) (#1302)
* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in mcp.go and a2a_proxy.go
Issue #1042: 3 CodeQL SSRF findings across mcp.go and a2a_proxy.go.
staging already ships the fix (PRs #1147, #1154 → merged); main did not include it.
- mcp.go: add isSafeURL() + isPrivateOrMetadataIP() helpers; validate
agentURL before outbound calls in mcpCallTool (line ~529) and
toolDelegateTaskAsync (line ~607)
- a2a_proxy.go: add identical isSafeURL() + isPrivateOrMetadataIP()
helpers; call isSafeURL() before dispatchA2A in resolveAgentURL()
(blocks finding #1 at line 462)
- mcp_test.go: 19 new tests covering all blocked URL patterns:
file://, ftp://, 127.0.0.1, ::1, 169.254.169.254, 10.x.x.x,
172.16.x.x, 192.168.x.x, empty hostname, invalid URL,
isPrivateOrMetadataIP across all private/CGNAT/metadata ranges
1. URL scheme enforcement — http/https only
2. IP literal blocking — loopback, link-local, RFC-1918, CGNAT, doc/test ranges
3. DNS hostname resolution — blocks internal hostnames resolving to private IPs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(ci-blocker): remove duplicate isSafeURL/isPrivateOrMetadataIP from mcp.go
Issue #1292: PR #1274 duplicated isSafeURL + isPrivateOrMetadataIP in
mcp.go — both functions already exist on main at lines 829 and 876.
Kept the mcp.go definitions (the originals) and removed the 70-line
duplicate appended at end of file. a2a_proxy.go functions are
unchanged — they serve the same purpose via a separate code path.
* fix: remove orphaned commit-text lines from a2a_proxy.go
Three lines from the PR/commit title were accidentally baked into the
file during the rebase from #1274 to #1302, causing a Go syntax error
(a bare string literal at statement level followed by dangling braces).
Deletion restores:
}
return agentURL, nil
}
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
* fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313)
* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled
With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.
Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.
Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.
* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)
Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.
Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.
Closes#1043.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix
Two regressions introduced by PR #1243 (fix issue #1207):
1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
`{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
expected only `{id, name}`. Added `hasChildren: false` to the assertion.
2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
without `act()`. With fake timers, `setState` (synchronous) is flushed by
`advanceTimersByTimeAsync`, but the React state update it triggers is a
microtask — so the test saw stale render. Wrapping in `act(async () =>
{ await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
before assertions run.
All 813 vitest tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): add 100px proximity threshold to drag-to-nest detection
Fixes#1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.
The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.
Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (#1324) (#1327)
* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled
With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.
Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.
Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.
* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)
Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.
Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.
Closes#1043.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix
Two regressions introduced by PR #1243 (fix issue #1207):
1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
`{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
expected only `{id, name}`. Added `hasChildren: false` to the assertion.
2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
without `act()`. With fake timers, `setState` (synchronous) is flushed by
`advanceTimersByTimeAsync`, but the React state update it triggers is a
microtask — so the test saw stale render. Wrapping in `act(async () =>
{ await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
before assertions run.
All 813 vitest tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): add 100px proximity threshold to drag-to-nest detection
Fixes#1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.
The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.
Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): add ?? 0 guard for optional budget_used in progressPct
Fixes#1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.
Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.
Closes#1324.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (issue #1324) (#1329)
* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled
With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.
Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.
Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.
* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)
Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.
Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.
Closes#1043.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix
Two regressions introduced by PR #1243 (fix issue #1207):
1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
`{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
expected only `{id, name}`. Added `hasChildren: false` to the assertion.
2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
without `act()`. With fake timers, `setState` (synchronous) is flushed by
`advanceTimersByTimeAsync`, but the React state update it triggers is a
microtask — so the test saw stale render. Wrapping in `act(async () =>
{ await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
before assertions run.
All 813 vitest tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): add 100px proximity threshold to drag-to-nest detection
Fixes#1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.
The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.
Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas): add ?? 0 guard for optional budget_used in progressPct
Fixes#1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.
Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.
Closes#1324.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(platform): unblock SaaS workspace registration end-to-end
Every workspace in the cross-EC2 SaaS provisioning shape was failing
registration, heartbeat, or A2A routing. Four distinct blockers sat
between "EC2 is up" and "agent responds"; three are platform-side and
fixed here (the fourth is in the CP user-data, separate PR).
1. SSRF validator blocked RFC-1918 (registry.go + mcp.go)
validateAgentURL and isPrivateOrMetadataIP rejected 172.16.0.0/12,
which contains the AWS default VPC range (172.31.x.x) that every
sibling workspace EC2 registers from. Registration returned 400 and
the 10-min provision sweep flipped status to failed. RFC-1918 +
IPv6 ULA are now gated behind saasMode(); link-local (169.254/16),
loopback, IPv6 metadata (fe80::/10, ::1), and TEST-NET stay blocked
unconditionally in both modes.
saasMode() resolution order:
1. MOLECULE_DEPLOY_MODE=saas|self-hosted (explicit operator flag)
2. MOLECULE_ORG_ID presence (legacy implicit signal, kept for
back-compat so existing deployments don't need a config change)
isPrivateOrMetadataIP now actually checks IPv6 — previously it
returned false on any non-IPv4 input, which would let a registered
[::1] or [fe80::...] URL bypass the SSRF check entirely.
2. Orphan auth-token minting (workspace_provision.go)
issueAndInjectToken mints a token and stuffs it into
cfg.ConfigFiles[".auth_token"]. The Docker provisioner writes that
file into the /configs volume — the CP provisioner ignores it
(only cfg.EnvVars crosses the wire). Result: live token in DB, no
plaintext on disk, RegistryHandler.requireWorkspaceToken 401s every
/registry/register attempt because the workspace is no longer in
the "no live token → bootstrap-allowed" state. Now no-ops in SaaS
mode; the register handler already mints on first successful
register and returns the plaintext in the response body for the
runtime to persist locally.
Also removes the redundant wsauth.IssueToken call at the bottom of
provisionWorkspaceCP, which created the same orphan-token pattern
a second time.
3. Compaction artefacts (bundle/importer.go, handlers/org_tokens.go,
scheduler.go, workspace_provision.go)
Four pre-existing compile errors on main from an earlier session's
code truncation: missing tuple destructuring on ExecContext /
redactSecrets / orgTokenActor, missing close-brace in
Scheduler.fireSchedule's panic recovery. All one-line mechanical
fixes; without them the binary would not build.
Tests
-----
ssrf_test.go adds:
* TestSaasMode — covers the env resolution ladder (explicit flag
wins over legacy signal, case-insensitive, whitespace tolerant)
* TestIsPrivateOrMetadataIP_SaaSMode — asserts RFC-1918 + IPv6 ULA
flip to allowed, metadata/loopback/TEST-NET still blocked
* TestIsPrivateOrMetadataIP_IPv6 — regression guard for the old
"returns false for all IPv6" behaviour
Follow-up issue for CP-sourced workspace_id attestation will be filed
separately — closes the residual intra-VPC SSRF + token-race windows
the SaaS-mode relaxation introduces.
Verified end-to-end today on workspace 6565a2e0 (hermes runtime, OpenAI
provider) — agent returned "PONG" in 1.4s after register → heartbeat →
A2A proxy → runtime.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(runtime+scheduler): increment/decrement active_tasks + max_concurrent (#1408)
Runtime (shared_runtime.py):
- set_current_task now increments active_tasks on task start, decrements
on completion (was binary 0/1)
- Counter never goes below 0 (max(0, n-1))
- Pushes heartbeat immediately on BOTH increment and decrement (#1372)
Scheduler (scheduler.go):
- Reads max_concurrent_tasks from DB (default 1, backward compatible)
- Skips cron only when active_tasks >= max_concurrent_tasks (was > 0)
- Leaders can be configured with max_concurrent_tasks > 1 to accept
A2A delegations while a cron runs
Platform:
- Added max_concurrent_tasks column to workspaces (migration 037)
- Workspace model + list/get queries include the new field
- API exposes max_concurrent_tasks in workspace JSON
Config.yaml support (future): runtime_config.max_concurrent_tasks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(review): address 3 critical issues from code review
1. BLOCKER: executor_helpers.py now uses increment/decrement too
(was still binary 0/1, stomping the counter for CLI + SDK executors)
2. BUG: asymmetric getattr defaults fixed — both paths use default 0
(was 0 on increment, 1 on decrement)
3. UX: current_task preserved when active_tasks > 0 on decrement
(was clearing task description even when other tasks still running)
4. Scheduler polling loop re-reads max_concurrent_tasks on each poll
(was using stale value from initial query)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
* docs: workspace files API reference, skill catalog, and links
* docs: fix secrets endpoint path across docs
The workspace secrets endpoint is `/workspaces/:id/secrets`, not
`/secrets/values`. This was wrong in quickstart.md (Path 2: Remote Agent)
and workspace-runtime.md (registration flow example and comparison table).
The external-agent-registration guide already had the correct path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: fix broken blog cross-link in skills-vs-bundled-tools post
Link path had an extra `/docs/` segment: `/docs/blog/...` instead of
`/blog/...`. Nextra resolves blog posts directly under `/blog/<slug>`,
not under `/docs/blog/`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add skill-catalog.md guide
Linked from the skills-vs-bundled-tools blog post as a reference
for TTS/image-generation/web-search skills. The blog promises
"install directly via the CLI" with a skill catalog — this page
fills that promise by documenting available skill types, install
commands, version management, custom skill authoring, and removal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted
* docs(api-ref): add workspace file copy API reference
Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases
Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
* fix(handlers): add saasMode() gating to isPrivateOrMetadataIP in a2a_proxy_helpers.go
Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.
The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(marketing): Discord adapter Day 2 Reddit + HN community copy
* fix(tests): supply *events.Broadcaster pointer to captureBroadcaster
Cannot use *captureBroadcaster as *events.Broadcaster when the struct
embeds events.Broadcaster as a value — must initialize as a named field.
Fixes go vet error in workspace_provision_test.go:
cannot use broadcaster (*captureBroadcaster) as *events.Broadcaster value
* Merge pull request #1429 from fix/canvas-tooltip-clear-timer
Without this, a 400ms setTimeout from onFocus/onMouseEnter that fires
after onBlur will re-show a tooltip the user just dismissed. The
setShow(false) in onBlur closes the tooltip immediately but leaves the
timer pending — Tab-blur followed by timer-fire would re-show it.
Fix: add clearTimeout(timerRef.current) at the top of onBlur, mirroring
the pattern already used in onMouseLeave and onFocus.
Refs: PR #1367 (a11y keyboard support — this was a pre-existing gap)
Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas/test): add missing children:[] to setPendingDelete expectation (#1426)
PR #1252 (cascade-delete UX) updated setPendingDelete to pass a
children array for cascade-warning rendering. The keyboard-a11y test
assertion was not updated to match.
Test: clicking 'Delete' hoists state to the store and closes the menu
Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(canvas/test): add children:[] to setPendingDelete + \' entity fix (closes#1380) (#1427)
* ci: retry — trigger fresh runner allocation
* fix(canvas/test): add children:[] to setPendingDelete assertion
setPendingDelete now includes children:[] (PR #1383 extended the
pendingDelete type). The keyboard accessibility test at line 225 used
exact object matching which omitted the new field, causing a failure
after staging merged #1383.
Issue: #1380
* fix(canvas): replace ' HTML entity with straight apostrophe
JSX does not entity-decode ' — it renders the literal text
"'" instead of "'". Found at line 157 (payment confirmed) and
line 321 (empty org list). Replaced with a straight apostrophe,
which JSX handles correctly.
Ref: issue #1375
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Merge pull request #1430 from fix/1421-saas-ssrf-helpers
Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.
The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(P0): CWE-22 path traversal in copyFilesToContainer + ContextMenu test
Issue #1434 — CWE-22 Path Traversal Regression:
PR #1280 (dc218212) correctly used cleaned path in tar header.
PR #1363 (e9615af) regressed to using uncleaned `name`.
Fix: use `clean` in filepath.Join AND add defence-in-depth escape check.
Issue #1422 — ContextMenu Test Regression:
PR #1340 expanded pendingDelete store type to include `children:[]`.
Test assertion missing the field — add `children:[]` to match.
Note: ssrf.go created (shared isSafeURL/isPrivateOrMetadataIP) to
prepare for the handler-split refactor fix — current branch has no
build error, but the shared file will prevent regression when PR #1363
is merged. isSafeURL/isPrivateOrMetadataIP retained in both files
for now to avoid breaking callers while the split is finalized.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: resolve 3 go vet failures + add idempotency_key to delegate_task_async
- workspace_provision_test.go: add missing mock := setupTestDB(t) to
TestSeedInitialMemories_Truncation — mock was referenced but never
declared, causing "undefined: mock" vet error
- orgtoken/tokens_test.go: discard unused orgID return value with _ in
Validate call — "declared and not used" vet error
- a2a_tools.py: delegate_task_async now sends idempotency_key (SHA-256
of workspace_id + task) to POST /workspaces/:id/delegate, fixing
duplicate task execution when an agent restarts mid-delegation (#1456)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: airenostars <airenostars@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Molecule AI Community Manager <community-manager@agents.moleculesai.app>
Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
- Add Reddit r/LocalLlama + r/MachineLearning copy sources
- Add full Hacker News post body + guidelines
- Add dev.to full post body + frontmatter
- Add Discord server #announcements copy
- Add coordination checklist with [BLOG_URL] placeholder flag
- Update PR/status references
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reduces credential surface in INCIDENT_LOG.md from partial-informative
(kvv-lHt-QFSyZwxeo...KVw, github_pat_11BPRRWQI0m...hsIJLIL) to
fully-redacted (sk-cp-lHt...KVw, github_pat_11...hsIJLIL) format.
ADMIN_TOKEN was already in truncated form (HlgeMb8...ShARE=).
Addresses GH #1333.
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Covers Docker, Fly Machines, and bare metal deployment models with
use cases, configuration examples, and a comparison table. Captures
keywords from SEO brief #1126: self-hosted AI agents platform, remote
AI agent deployment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Reddit (r/LocalLLaMA) and Hacker News post bodies for Discord adapter
Day 2 community campaign. Blog URL left as placeholder — fill before posting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the org-scoped API keys blog post (extracted from orphaned PR #1342).
Already live on Molecule-AI/docs main at content/blog/2026-04-20-org-api-keys.
Molecule AI is open source. Org-scoped API keys shipped in PRs #1105, #1107, #1109, and #1110.
- PLATFORM_URL: replace unreachable http://platform:8080 mesh-only default
with Docker-aware detection (host.docker.internal in containers,
localhost for local dev) across all workspace Python modules and the
git-token-helper shell script.
- WORKSPACE_ID: add fail-fast validation in main.py (SystemExit if empty)
consistent with coordinator.py / a2a_cli.py patterns already in place.
- INCIDENT_LOG.md: replace all 3 F1088 credential types with
***REDACTED*** (sk-cp- 2x, github_pat_ 2x, ADMIN_TOKEN base64 3x).
Fixes#1124, #1333.
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases
Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- mcp-server-setup.md: move 'Claude Code, Cursor, etc.' to a follow-on
sentence instead of parenthetical, fixing awkward 'agent (Claude Code,...)' pattern
- fly-machines-provisioner.md: replace 'as a Fly Machine' with 'on Fly Machines',
add clarification that the platform manages the workspace and Fly manages the machine
Co-authored-by: Molecule AI Documentation Specialist <documentation-specialist@agents.moleculesai.app>
* fix(security): call redactSecrets before seeding workspace memories (F1085)
seedInitialMemories() in workspace_provision.go was inserting template/config
memories directly into agent_memories without scrubbing credential patterns.
A workspace provisioned from a template containing API keys, tokens, or other
secrets would store them in plain text — the same class of issue as #838.
Fix: call redactSecrets(workspaceID, content) on the truncated memory content
before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400)
is preserved — redaction runs after truncation so the size limit still applies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(workspace_provision): add seedInitialMemories coverage for #1208
Cover the truncate-at-100k boundary (PR #1167, CWE-400) and the
redactSecrets call (F1085 / #1132), both identified as untested in #1208.
- TestSeedInitialMemories_TruncatesOversizedContent: boundary at exactly
100k, 1 byte over, far over, and well under. Verifies INSERT receives
exactly maxMemoryContentLength bytes.
- TestSeedInitialMemories_RedactsSecrets: verifies redactSecrets runs
before INSERT, regression test for F1085.
- TestSeedInitialMemories_InvalidScopeSkipped: invalid scope is silently
skipped, no INSERT called.
- TestSeedInitialMemories_EmptyMemoriesNil: nil slice is handled without
DB calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(marketing): Discord adapter launch visual assets (#1209)
Squash-merge: Discord adapter launch visual assets (3 PNGs) + social copy. Acceptance: assets on staging.
* fix(ci): golangci-lint errcheck failures on staging
Suppress errcheck warnings for calls where the return value is safely
ignored:
- resp.Body.Close() (artifacts/client.go): deferred cleanup — failure
to close a response body is non-critical; the defer itself is what
matters for connection reuse.
- rows.Close() (bundle/exporter.go): deferred cleanup in a loop where
rows.Err() already handles query errors.
- filepath.Walk (bundle/exporter.go): top-level walk call; errors in
sub-directory traversal are handled by the inner callback (which
returns nil for err != nil).
- broadcaster.RecordAndBroadcast (bundle/importer.go): fire-and-forget
event broadcast; errors are logged internally by the broadcaster.
- db.DB.ExecContext (bundle/importer.go): best-effort runtime column
update; non-critical auxiliary data that the provisioner re-extracts
if needed.
Fixes: #1143
* test(artifacts): suppress w.Write return values to satisfy errcheck
All httptest.ResponseWriter.Write calls in client_test.go now discard
the byte count and error return with _, _ = prefix. The Write method
is safe to discard in test handlers — httptest.ResponseWriter.Write
never returns an error for in-memory buffers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(CI): move changes job off self-hosted runner + add workflow concurrency
Cherry-pick from staging PR #1194 for main. Two changes to relieve
macOS arm64 runner saturation:
1. `changes` job: runs on ubuntu-latest instead of
[self-hosted, macos, arm64]. This job does a plain `git diff`
with zero macOS dependencies — moving it off the runner frees
a slot immediately on every workflow trigger.
2. Add workflow-level concurrency:
concurrency: group: ci-${{ github.ref }}; cancel-in-progress: true
Prevents multiple stale in-flight CI runs from queuing on the
same ref when new commits arrive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): call redactSecrets before seeding workspace memories (F1085) (#1203)
seedInitialMemories() in workspace_provision.go was inserting template/config
memories directly into agent_memories without scrubbing credential patterns.
A workspace provisioned from a template containing API keys, tokens, or other
secrets would store them in plain text — the same class of issue as #838.
Fix: call redactSecrets(workspaceID, content) on the truncated memory content
before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400)
is preserved — redaction runs after truncation so the size limit still applies.
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* tick: 2026-04-21 ~03:40Z — CI stalled 59+ min, GH_TOKEN 4th rotation, PR reviews done
* fix(tenant-guard): allowlist /registry/register + /registry/heartbeat
Final layer of today's stuck-provisioning saga. With the private-IP
platform_url fix and the intra-VPC :8080 SG rule in place, workspace
EC2s finally reached the tenant on the right port — only to have every
POST bounced with a synthetic 404 by TenantGuard.
TenantGuard is the SaaS hook that rejects cross-tenant routing. It
demands X-Molecule-Org-Id on every request, but CP's workspace user-
data doesn't export MOLECULE_ORG_ID (only WORKSPACE_ID, PLATFORM_URL,
RUNTIME, PORT), so the runtime can't attach the header. Net effect:
every workspace's first heartbeat to /registry/heartbeat was a silent
404, and the workspace sat in 'provisioning' until the platform
sweeper timed it out.
Allowlist the two workspace-boot paths:
- /registry/register — one-shot at runtime startup
- /registry/heartbeat — every 30s
Both are still gated by wsauth.HasAnyLiveToken (workspaces with a
token on file must present it; legacy tokenless workspaces are
grandfathered). And the tenant SG already scopes :8080 to the VPC
CIDR, so only intra-VPC callers can reach these paths in the first
place. The allowlist bypasses cross-org routing, not auth.
Follow-up: passing MOLECULE_ORG_ID into the workspace env would let
the runtime attach the header and drop this allowlist entry. Tracked
separately; not urgent since the multi-layer auth above is already
adequate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>