Commit Graph

2574 Commits

Author SHA1 Message Date
Molecule AI Controlplane Lead
7fce21056b fix(F1085): scope rm to /configs volume in deleteViaEphemeral
F1085 (Misconfiguration - Filesystems): the 2-arg exec form
[]string{"rm", "-rf", "/configs", filePath} passes /configs as
an rm target, so rm -rf /configs deletes the entire volume mount
regardless of what filePath resolves to.

Fix uses filepath.Join + filepath.Clean + HasPrefix assertion to
scope rm to the /configs/ prefix. validateRelPath (CWE-22) catches
leading/mid-path ".." before rm. HasPrefix guard is defence-in-depth.

Includes CP-BE's 12-case regression test suite (docker: nil,
validates all traversal forms rejected before Docker call).

Co-Authored-By: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-Authored-By: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
2026-04-22 22:39:39 +00:00
Hongming Wang
0082568448 ci: canary-verify graceful-skip + draft auto-promote staging→main
Two related workflow hygiene changes:

## (1) canary-verify: graceful-skip when canary secrets absent

Before: canary-verify hit `scripts/canary-smoke.sh` which exited
non-zero when CANARY_TENANT_URLS was empty. Every main publish
ran → canary-verify failed → red check on main CI signal (7/7 in
past 24h). Noise, no value.

After: smoke step detects the missing-secrets case, writes a
warning to the step summary, sets an output `smoke_ran=false`,
and exits 0. The workflow completes green without pretending to
have tested anything.

Gated downstream: `promote-to-latest` now requires BOTH
`needs.canary-smoke.result == success` AND
`needs.canary-smoke.outputs.smoke_ran == true`. A skip does NOT
auto-promote — manual `promote-latest.yml` remains the release
gate while Phase 2 canary is absent (see
molecule-controlplane/docs/canary-tenants.md for the fleet
stand-up plan + decision framework).

When the canary fleet is stood up and secrets populated: delete
the early-exit branch + the smoke_ran gate. The workflow goes back
to its original "smoke gates promotion" semantics.

## (2) auto-promote-staging.yml — draft

New workflow that fires after CI / E2E Staging Canvas / E2E API /
CodeQL complete on the staging branch, checks that ALL four are
green on the same SHA, and fast-forwards `main` to that SHA.

Shipped disabled: the promote step is gated behind repo variable
`AUTO_PROMOTE_ENABLED=true`. Until that's set, the workflow
dry-runs and logs what it would have done. Toggle via Settings →
Variables when staging CI has been reliably green for a few days.

Safety:
- workflow_run events only fire on push to staging (PRs into
  staging don't promote).
- Every required gate must be `completed/success` on the same
  head_sha. Pending / failed / skipped / cancelled → abort.
- `--ff-only` push. Refuses to advance main if it has diverged
  from staging history (someone landed a direct-to-main commit
  that's not on staging). Human resolves the fork.
- `workflow_dispatch` with `force=true` lets us test the flow
  end-to-end before flipping the variable on.

Motivation: molecule-core#1496 has been open with 1172 commits
divergence between staging and main. Today that trapped PR #1526
(dynamic canvas runtime dropdown) on staging while prod users
hit the hardcoded-dropdown bug. Auto-promote retires the bulk
staging→main PR pattern once the staging CI it depends on is
reliable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 22:39:23 +00:00
Hongming Wang
28bf11fb85 docs(security): move sensitive runbooks to private internal repo
Three changes to stop ferrying sensitive content through our public
monorepo. All content already imported to Molecule-AI/internal (private)
— see linked PRs below.

Contained full security audit cycle records with CWE references,
file:line pointers to historical vulnerabilities, and severity
ratings. None of that belongs in a public repo.

→ Moved to Molecule-AI/internal/security/incident-log.md (PR #20).
  Monorepo file becomes a 17-line stub pointing at the internal
  location. Future incidents land in the internal file only.

Had AWS account ID `004947743811` and IAM role name
`MoleculeStagingProvisioner` embedded. Even though the fleet
described isn't actually running (see state note), these
identifiers are account-specific and don't belong in public git.

→ Removed both values, replaced with generic references + a pointer
  to Molecule-AI/internal/runbooks/canary-fleet.md (PR #21) where
  the actual identifiers live. Any future rotation touches the
  internal file, no public-git-history rewrite needed.

Contained the full ops runbook: bootstrap script output, per-tenant
SG backfill loop with live SG IDs, customer slug names
(hongmingwang). Useful content but too specific for a public repo.

→ Moved to Molecule-AI/internal/runbooks/workspace-terminal.md
  (PR #22). Monorepo file becomes a 30-line public summary of what
  the feature does + pointers to code, so external readers /
  self-hosters still get the design story.

Marketing briefs, SEO plans, campaign copy, research dossiers, and
internal product designs (hermes-adapter-plan, medo-integration,
cognee-*) are the next batches. See docs policy doc coming next to
set team expectations.

Net removal: ~820 lines from public git going forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 22:39:23 +00:00
molecule-ai[bot]
ea200cbcb0
docs(marketing): add Day 4 + Day 5 social copy
Day 4: EC2 Console Output — approved by Marketing Lead + PM
Day 5: Org-Scoped API Keys — approved by Marketing Lead + PM
Both campaigns queued for Apr 24 and Apr 25.

Co-authored-by: Marketing Lead <marketing-lead@agents.moleculesai.app>
2026-04-22 21:22:34 +00:00
molecule-ai[bot]
7c66c692d8
docs(blog): Phase 33 direct-connect migration — Cloudflare Tunnel to public IP (#1612)
* docs(social): EC2 Instance Connect SSH launch copy + terminal demo visual

PR #1533 (feat/terminal: remote path via aws ec2-instance-connect + pty)
Issue #1547 (social: launch thread for EC2 Instance Connect SSH)

Content:
- docs/marketing/social/2026-04-22-ec2-instance-connect-ssh/social-copy.md
  5-post X thread + LinkedIn single post, dark theme brand voice
- docs/assets/blog/2026-04-22-ec2-instance-connect-ssh/ec2-terminal-demo.png (1200x800)
  Canvas Terminal tab mockup showing EC2 bash prompt via EIC

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(blog): Phase 33 direct-connect migration — Cloudflare Tunnel to public IP

Migrate from Cloudflare Tunnel (outbound WebSocket) to direct-connect
agent workspaces with per-workspace public IPs. Covers operator actions,
developer notes, security model, and Phase 33 rollout timeline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Social Media Brand <social-media-brand@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI DevRel Engineer <devrel-engineer@agents.moleculesai.app>
2026-04-22 21:11:56 +00:00
airenostars
7a89704b6e
fix(build): add missing fmt import + fix canvas Dockerfile GID (#1487)
* docs(canary-release): flag as aspirational; link to current state

The canary-release.md doc describes the pipeline as if the fleet is
running — referring to AWS account 004947743811 and a configured
MoleculeStagingProvisioner role. Reality as of 2026-04-22: no canary
tenants are provisioned, the 3 GH Actions secrets are empty, and
canary-verify.yml has failed 7/7 times in a row.

Added a top-of-doc ⚠️ state note that:

1. Clarifies this is intended design, not deployed reality.
2. Notes the AWS account ID is historical / unverified.
3. Explains that merges currently rely on manual promote-latest.
4. Cross-links to molecule-controlplane/docs/canary-tenants.md for
   the Phase 1 work that's shipped, the Phase 2 stand-up plan, and
   the "should we even do this now?" decision framework.
5. Asks whoever lands Phase 2 to reconcile the two docs.

No behaviour change — doc-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): add missing fmt import in a2a_proxy.go, fix canvas Dockerfile GID

- a2a_proxy.go: missing "fmt" import caused build failure (8 undefined
  references at lines 743-775). Likely dropped during a recent merge.
- canvas/Dockerfile: GID 1000 already in use in node base image.
  Changed to dynamic group/user creation with fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
2026-04-22 21:10:58 +00:00
Molecule AI PMM
4736f07e1c PMM: add enterprise governance + org API key attribution to A2A v1 blog
- Add "Org-Scoped API Keys: Delegation Attribution for Regulated Industries" section
  with org:keyId audit trail, created_by chain of custody, revocation story
- Add CloudTrail-compatible architecture bullet to enterprise section
- Update meta description: governance/compliance angle (replaces "native vs bolted-on")
- Cross-links org keys, audit trail, and compliance frameworks to existing Phase 30 primitives

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 21:09:22 +00:00
Molecule AI PMM
840d9732ce Merge main into staging — bring staging to date for PR #1496 2026-04-22 20:57:31 +00:00
Molecule AI PMM
96178eca95 PMM: update EC2 SSH social copy — add ephemeral key versions + positioning approval
- Add Version E: ephemeral key story (60-second RSA key lifecycle)
- Elevate Version D: zero key rot angle with explicit 60-second key window
- Add Version A/D as approved primary angles (ops simplicity / security)
- Update status to APPROVED, unblocked for Social Media Brand
- Add header: positioning angle confirmed per GH issue #1637
- Add image suggestion for ephemeral key timeline graphic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:54:11 +00:00
Molecule AI PMM
83c977f6d7 PMM: commit all Phase 30/34 staged work
- Phase 34 Partner API Keys battlecard
- A2A Enterprise Deep-Dive SEO brief + social copy
- Phase 30 social copy (X + LinkedIn threads)
- Phase 30 blog post (remote-workspaces)
- Launch pages (org-scoped API keys, instance ID, EC2 SSH)
- Fly.io + Discord Adapter + EC2 social copy
- Screencast storyboards (4 demos)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:31:37 +00:00
Molecule AI PMM
cb2e5c5f3b docs: add Phase 34 Partner API Keys positioning brief
Three-channel brief covering partner platforms, marketplace resellers,
and enterprise CI/CD automation. Links to Phase 30 (mol_ws_* token model)
as cross-sell. Flags first-mover opportunity vs CrewAI/LangGraph Cloud.
Collocates collateral gap list and open PM questions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:31:24 +00:00
Molecule AI PMM
7f699116ae docs: add LangGraph governance-gap ADR section to A2A v1 blog
Adds competitive differentiation section explicitly calling out the
governance layer gap in LangGraph's current A2A PRs vs Molecule AI's
Phase 30 production implementation. Canonical URL verified correct.
Closes PMM A2A blog final-review item.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:31:24 +00:00
Molecule AI PMM
50082a35a3 PMM: remove #AgenticAI from org-api-keys social copy
Not in positioning brief. Replace with #A2A per PMM alignment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:31:23 +00:00
Molecule AI PMM
1dc60d17fb PMM: stage A2A v1 deep-dive content brief for Content Marketer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:31:23 +00:00
Molecule AI PMM
156d1cae13 PMM: update ecosystem-watch with LangGraph PR verification
- PRs #6645, #7113, #7205 not found in langchain-ai/langgraph open PR list
- Added VERIFY flags to LangGraph tracker; requires manual re-check
- Updated market events log with verification result
- Battlecard v0.3 LangGraph status is now flagged as stale pending re-verify

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:31:23 +00:00
Hongming Wang
c4f7d551dc
Merge pull request #1628 from Molecule-AI/fix/cicd-unblock-latent-bugs
fix(ci): unblock main CI on ubuntu-latest (2 latent bugs)
2026-04-22 13:19:09 -07:00
Hongming Wang
1aea013e20 fix(ci): unblock main CI on ubuntu-latest — IPv6-safe addr + MagicMock seed
Two latent bugs the self-hosted Mac mini had been hiding. Both caught
by the newer toolchain on ubuntu-latest runners after PR #1626.

1. workspace-server/internal/handlers/terminal.go:442
   `fmt.Sprintf("%s:%d", host, port)` flagged by go vet as unsafe
   for IPv6 (it omits the required [::] brackets). Replaced with
   `net.JoinHostPort(host, strconv.Itoa(port))` which handles both
   IPv4 and IPv6 correctly. No runtime behaviour change — the only
   call site passes "127.0.0.1", so the bug would never trigger in
   practice, but vet is right to flag it as a latent correctness
   issue.

2. workspace/tests/test_a2a_executor.py::test_set_current_task_updates_heartbeat
   `MagicMock()` auto-creates attributes on first access, so
   `getattr(heartbeat, "active_tasks", 0)` in shared_runtime.py
   returned a MagicMock rather than the default 0. Adding 1 to a
   MagicMock returns another MagicMock, so the assertion
   `heartbeat.active_tasks == 1` never held. Seeding
   `heartbeat.active_tasks = 0` before the first call makes
   getattr() return a real int, matching how the real HeartbeatLoop
   class initialises itself.

Both pre-existed on main and were hidden by the older Python / Go
toolchains on the Mac mini runner. Verified locally (venv pytest
pass, `go vet ./...` + `go build ./...` clean on workspace-server).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 13:18:46 -07:00
Hongming Wang
557e7a0697
Merge pull request #1626 from Molecule-AI/perf/public-workflows-ubuntu-latest
perf(ci): all public-repo workflows → ubuntu-latest
2026-04-22 13:04:06 -07:00
Hongming Wang
f3e658a091
Merge pull request #1624 from Molecule-AI/feat/provisioner-pull-templates-from-ghcr
feat(provisioner): pull workspace-template images from GHCR
2026-04-22 13:04:03 -07:00
Hongming Wang
e298393df5 perf(ci): move all public-repo workflows to ubuntu-latest
molecule-core is a public repo — GHA-hosted minutes are free. The
self-hosted Mac mini was only in play to dodge GHA rate limits
(memory feedback_selfhosted_runner), but for these specific
workflows it came with real costs:

- Docker-push workflows emulated linux/amd64 from arm64 via QEMU —
  every canvas + platform image build ran ~2-3x slower than native.
- Six PRs worth of keychain-avoidance hacks in publish-* because
  `docker login` on macOS writes to osxkeychain unconditionally,
  and the Mac mini's launchd user-agent keychain is locked.
- Homebrew pin-down environment variables (HOMEBREW_NO_*) sprinkled
  everywhere to work around the shared /opt/homebrew symlink mess
  on the runner.
- Setup-python@v5 couldn't write to /Users/runner, so ci.yml
  python-lint resorted to a hand-rolled Homebrew python3.11 dance.
- Single runner → fan-out contention; CodeQL's 45-min analysis
  fought the canvas publish for the one slot.

Changes across the 7 workflows:

- runs-on: [self-hosted, macos, arm64] → ubuntu-latest (every job)
- publish-canvas-image + publish-workspace-server-image:
  drop the hand-rolled auths-map step + QEMU setup + buildx v4
  → docker/login-action@v3 + setup-buildx@v3. Linux + amd64
  target = native build.
- canary-verify + promote-latest: replace `brew install crane` +
  HOMEBREW_NO_* incantations with imjasonh/setup-crane@v0.4.
- codeql.yml: drop `brew install jq` — jq is preinstalled on
  ubuntu-latest.
- ci.yml shellcheck: drop the self-hosted existence check —
  shellcheck is preinstalled via apt.
- ci.yml python-lint: replace the Homebrew python3.11 path dance
  with actions/setup-python@v5 (which works fine on GHA-hosted),
  add requirements.txt caching while we're there.
- Remove stale comments referencing "the self-hosted runner",
  "Mac mini", keychain, osxkeychain etc.

The self-hosted Mac mini remains in service for private-repo
workflows only. Memory feedback_selfhosted_runner updated to
reflect the public-repo scope carve-out.

Net -96 lines across the 7 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:56:49 -07:00
Hongming Wang
9df3159c59 feat(provisioner): pull workspace-template images from GHCR
Every standalone workspace-template repo now publishes to
ghcr.io/molecule-ai/workspace-template-<runtime>:latest via the
reusable publish-template-image workflow in molecule-ci (landed
today — one caller per template repo). This PR makes the
provisioner actually use those images:

- RuntimeImages map + DefaultImage switched from bare local tags
  (workspace-template:<runtime>) to their GHCR equivalents.
- New ensureImageLocal step before ContainerCreate: if the image
  isn't present locally, attempt `docker pull` and drain the
  progress stream to completion. Best-effort — if the pull fails
  (network, auth, rate limit) the subsequent ContainerCreate still
  surfaces the actionable "No such image" error, now with a
  GHCR-appropriate hint instead of the defunct
  `bash workspace/build-all.sh <runtime>` advice.
- runtimeTagFromImage now handles both forms: legacy
  `workspace-template:<runtime>` (local dev via build-all.sh /
  rebuild-runtime-images.sh) and the current GHCR shape. Keeps
  error hints sensible in both worlds.
- Tests cover the GHCR path for tag extraction and the new error
  message shape. Legacy local tags still recognised.

Local dev path unchanged — scripts/build-images.sh and
workspace/rebuild-runtime-images.sh still produce locally-tagged
`workspace-template:<runtime>` images, and Docker's image
resolver matches them before any pull is attempted. So
contributors can keep iterating on a template repo without
round-tripping through GHCR.

Follow-on impact:
- hongmingwang.moleculesai.app (and any other tenant EC2) will
  auto-pull `ghcr.io/molecule-ai/workspace-template-hermes:latest`
  on the next hermes workspace provision — picking up the real
  Nous hermes-agent behind the A2A bridge (template-hermes v2.1.0)
  without any tenant-side rebuild step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:39:56 -07:00
molecule-ai[bot]
de11188cc4
fix(F1085): scope rm to /configs volume in deleteViaEphemeral (#1616)
* fix(F1085): scope rm to /configs volume in deleteViaEphemeral

Regressed by commit 49ab614 ("CWE-78/CWE-22 — block shell injection
in deleteViaEphemeral") which changed the rm form from the scoped
concat "/configs/" + filePath to the unscoped 2-arg "/configs", filePath.

With 2 args, rm receives /configs as the first target — rm -rf /configs
attempts to delete the entire volume mount before processing filePath,
which is the F1085 (Misconfiguration - Filesystems) defect. The concat
form passes a single scoped path so rm only touches files inside /configs.

validateRelPath call retained as CWE-22 defence-in-depth.

* docs: note F1085 defect in deleteViaEphemeral 2-arg rm form

Amends the CWE-22+CWE-78 incident entry to record that commit 49ab614
regressed the F1085 (volume deletion scope) fix, and that f1085-fix
commit a432df5 restores the correct concat form.

---------

Co-authored-by: Molecule AI CP-QA <cp-qa@agents.moleculesai.app>
2026-04-22 18:44:52 +00:00
Molecule AI Fullstack (floater)
ea5e018f76 Merge main into staging to sync 2026-04-22 18:15:52 +00:00
molecule-ai[bot]
6bd1691446
Merge pull request #1594 from Molecule-AI/fix/canvas-a11y-clean
fix(canvas/a11y): aria-hidden on decorative SVGs + MissingKeysModal semantics
2026-04-22 18:11:12 +00:00
236158d4a4 fix(canvas/a11y): add aria-hidden to decorative SVGs + MissingKeysModal semantics
- DeleteCascadeConfirmDialog: aria-hidden on warning triangle SVG (button
  already has adjacent text content; icon is purely decorative)
- Toolbar: aria-hidden on 4 decorative SVGs (stop-all, restart-pending,
  search, help) — buttons all have aria-label/aria-expanded/text
- MissingKeysModal: role="dialog" aria-modal="true" aria-labelledby on
  container, id="missing-keys-title" on heading, requestAnimationFrame
  focus management via useRef (replaces autoFocus={index===0})
- CreateWorkspaceDialog: remove redundant aria-describedby={undefined}

WCAG 2.1 SC 1.1.1 — screen readers skip purely-presentational icons.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 17:40:43 +00:00
Hongming Wang
a8e4afe863
Merge pull request #1591 from Molecule-AI/fix/canvas-dockerfile-uid-collision
fix(canvas): unblock publish-canvas-image — drop default node user before uid 1000
2026-04-22 10:22:18 -07:00
Hongming Wang
5f96a832e7 fix(canvas): drop node:20-alpine default user before creating canvas uid 1000
publish-canvas-image has been failing on every main push since 2026-04-21
at `addgroup -g 1000 canvas` because node:20-alpine already ships a `node`
user/group at uid/gid 1000. Same collision workspace-server/Dockerfile.tenant
already fixes with `deluser --remove-home node` before `addgroup`.

Copying that pattern here so the workflow goes green again and canvas images
publish to ghcr. No runtime behaviour change — canvas still runs as non-root
uid 1000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:42:02 -07:00
molecule-ai[bot]
4a03b89e91
fix(scripts): correct platform dir path + add ROOT isolation (shellcheck clean)
- dev-start.sh: $ROOT/platform → $ROOT/workspace-server (Go server
  lives in workspace-server/, not platform/; any developer running
  this script would get "no such directory" immediately)
- nuke-and-rebuild.sh: add ROOT variable and -f "$ROOT/docker-compose.yml"
  so docker compose works from any CWD; fix post-rebuild-setup.sh path
- rollback-latest.sh: add 'local' to src_digest and new_digest vars
  inside roll() function to prevent global-scope leakage

Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 15:42:24 +00:00
molecule-ai[bot]
66ea0b6471
test(handlers): add CWE-22 regression suite + KI-005 terminal access fix + tests (#1574)
* fix(lint): unblock Platform Go CI — suppress 8 pre-existing errcheck warnings

golangci-lint errcheck has been flagging these since before this PR —
not regressions from the restart fix, just long-standing debt that
blocks Platform (Go) CI from ever going green. Prefix ignored returns
with `_ =` to make the signal explicit without changing behavior:

- channels/lark_test.go:97 (w.Write) + :118 (resp.Body.Close)
- channels/channels_test.go:620 + :760 (mockDB.Close in t.Cleanup)
- channels/manager.go:131 + :196 (defer rows.Close via closure wrapper)
- channels/manager.go:206–207 (json.Unmarshal into struct fields)
- artifacts/client_test.go:195, 237, 297 (json.Decode in test handlers)

The manager.go defer patch uses `defer func() { _ = rows.Close() }()`
since errcheck doesn't allow the `_ =` prefix directly on `defer`.

Build + `go test ./...` green locally for internal/channels and
internal/artifacts. The manager.go change touches production code so
I re-ran the channels test suite; passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: trigger PR refresh

* test(handlers): add CWE-22 regression suite + KI-005 terminal access fix + tests

container_files_test.go (152 lines):
- 11 path-traversal test cases for copyFilesToContainer (F1501/CWE-22)
- Tests nil Docker client — validation logic runs before any Docker call

terminal.go KI-005 security fix (backport from ship/security-fix 6de7530c):
- Enforce CanCommunicate hierarchy check before granting terminal access
- Shell access is more dangerous than A2A message-passing; apply the
  same hierarchy check used by A2A and discovery endpoints
- When X-Workspace-ID header is present and bearer token is valid
  (ValidateAnyToken), reject unless CanCommunicate(callerID, targetID)
- Canvas/molecli callers without X-Workspace-ID header pass through to
  WorkspaceAuth middleware for existing bearer check
- canCommunicateCheck exposed as package var for testability

terminal_test.go (5 test cases):
- TestTerminalConnect_KI005_RejectsUnauthorizedCrossWorkspace
- TestTerminalConnect_KI005_AllowsOwnTerminal
- TestTerminalConnect_KI005_SkipsCheckWithoutHeader
- TestTerminalConnect_KI005_RejectsInvalidToken
- TestTerminalConnect_KI005_AllowsSiblingWorkspace

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
2026-04-22 15:30:11 +00:00
Hongming Wang
359dc615e9
fix(canvas+templates): fetch runtime dropdown from /templates registry (#1526)
* fix(canvas+templates): fetch runtime dropdown from /templates registry

Canvas hardcoded 6 runtime options, drifting from manifest.json which
already registers hermes + gemini-cli as first-class workspace templates.
A Hermes workspace had runtime=hermes in its DB row but Config showed
"LangGraph (default)" — the HTML select fell back to its first option
because "hermes" wasn't listed, and saving would clobber the runtime
back to empty.

Now:
- GET /templates returns the runtime field from each cloned template's
  config.yaml (previously dropped on the floor)
- ConfigTab fetches /templates on mount, dedupes non-empty runtimes, and
  renders them as <option>s. Falls back to the static list if the fetch
  fails (offline, older backend), so the control never renders empty.

Adding a template to manifest.json now flows through automatically — no
canvas PR required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(canvas+templates): model + required-env suggestions from template

Extends the dropdown fix so Model and Required Env also flow from
the template registry instead of being free-form fields the user
has to remember.

Template config.yaml now declares:

  runtime_config:
    model: <default>
    models:
      - id: nous-hermes-3-70b
        name: Nous Hermes 3 70B (Nous Portal)
        required_env: [HERMES_API_KEY]
      - id: nousresearch/hermes-3-llama-3.1-70b
        name: Hermes 3 70B (via OpenRouter)
        required_env: [OPENROUTER_API_KEY]

Platform: GET /templates now returns runtime + model + models[] per
template (was previously dropping runtime + ignoring runtime_config).

Canvas:
- Runtime dropdown built from /templates (was hardcoded 6 options)
- Model input becomes a datalist combobox; free-form input still
  allowed since model names rotate faster than templates
- Required Env Vars default to the selected model's required_env,
  labelled "(suggested)" so the user knows it's template-driven
- Everything falls back to a static list when /templates is
  unreachable, so offline editing still works

Follow-up: add models[] to the other 7 template repos (claude-code,
crewai, autogen, deepagents, openclaw, gemini-cli, langgraph). This
PR updates the platform + canvas; the Hermes template config update
goes in a separate PR against its own repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(canvas): commit required_env on model change; add backend tests

Review turned up that the \"Required Env Vars (suggested)\" display
was cosmetic-only — users picking a different model saw the new
env suggestion in the TagList, but the values never made it into
state, so Save serialized an empty (or stale) required_env and the
workspace ran with the wrong auth check.

Canvas fixes:
- Model input onChange now commits the matched modelSpec's required_env
  to state — but only when the prior required_env was empty or matched
  the previous modelSpec's list (i.e. user hadn't manually edited).
  User-typed envs always win.
- Dropped the display-only fallback in TagList values; shows only what's
  actually in state.
- New \"Template suggests X, Apply\" hint button covers the edge case
  where state and template differ (existing workspace whose required_env
  lags the template's current recommendation).
- datalist option key now includes index so template authors shipping
  duplicate model ids don't trigger a silent React key collision.
- Small arraysEqual helper.

Backend tests:
- TestTemplatesList_RuntimeAndModelsRegistry — asserts /templates
  response carries runtime + models[] with per-model required_env.
- TestTemplatesList_LegacyTopLevelModel — asserts older templates with
  top-level model: still surface correctly, with empty Models[].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:07:46 +00:00
airenostars
201e18f9ed
fix(canvas): infinite render loop in ContextMenu + dedupe SSRF funcs (#1499)
ContextMenu: useCanvasStore selector returned .filter() (new array on
every call), causing React 19's useSyncExternalStore to detect a
reference change and re-render infinitely. Fixed by using .some()
which returns a stable boolean.

Also deduplicates isSafeURL, isPrivateOrMetadataIP, validateRelPath
which existed in 3 files after PR merges collided. Canonical location
is ssrf.go. Removed unused imports (fmt, net, net/url, database/sql,
strings) from a2a_proxy.go, a2a_proxy_helpers.go, mcp_tools.go.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Molecule AI SDK-Dev <sdk-dev@agents.moleculesai.app>
2026-04-22 13:56:46 +00:00
0506e0cabc Merge main into staging - resolving 1,388 commit divergence for PR #1573
Main→staging sync: bring staging up to date with main.
All conflicts resolved to main's version (newer state).
2026-04-22 13:54:53 +00:00
Hongming Wang
fc27477df9
fix(canvas): stop infinite re-render on ContextMenu mount (#1544)
fix(canvas): stop infinite re-render on ContextMenu mount
2026-04-21 21:50:41 -07:00
Hongming Wang
e88ab70251 fix(canvas): stop infinite re-render on ContextMenu mount
ContextMenu's children selector ran .filter() inside the Zustand
hook, returning a brand-new array reference on every render.
useSyncExternalStore under the hood compares snapshots with
Object.is — a new array always differs, so React kept scheduling
re-renders, hit the 50-update depth cap, and crashed with minified
error #185.

Observed as "Application error: a client-side exception" on every
SaaS tenant once a session cookie resolved. Caught in dev mode
where the build emits the clear warning:

  The result of getSnapshot should be cached to avoid an infinite loop
      at ContextMenu (src/components/ContextMenu.tsx:26:34)

Fix: select the stable nodes array once, derive children via
useMemo outside the store subscription. Same output, no new
reference per render.

Manually verified: dev bundle served through a cloudflared tunnel
to a live tenant, ContextMenu component mounts cleanly, remaining
console errors are all unrelated (localhost API 401s from the dev
server pointing at its own origin).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:47:32 -07:00
Hongming Wang
9466542212 docs(infra): add tenant env-var section + fix backfill loop split
Review turned up two issues in the rollout runbook:

1. The tenant env-var list was missing — today's debugging burned 2
   hours on hongmingwang where everything worked infra-side but
   canvas 401'd because MOLECULE_ORG_SLUG and CP_UPSTREAM_URL weren't
   set. Doc without this sends the next operator down the same hole.

   Added a dedicated step-3 table covering CP_UPSTREAM_URL,
   MOLECULE_ORG_SLUG, MOLECULE_ORG_ID, AWS_REGION with the exact
   failure mode each one produces when missing.

2. Backfill loop used tab-separated aws-cli output directly, which
   can concatenate all SG ids into one word and run the loop body
   once with no iteration. Inserted `| tr '\t' '\n'` — no-op on
   well-behaved output, fix on the concatenated case.

Renumbered subsequent sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:01:30 -07:00
Hongming Wang
456b8fd184 docs(infra): workspace-terminal runbook with verified commands
Expanded the rollout section with the exact scripts + env vars
that landed to make Hermes workspace Terminal work on 2026-04-22.
Points at molecule-controlplane#227 (which adds bootstrap script +
EIC_ENDPOINT_SG_ID env var) so operators can reproduce the setup
on a new AWS account in one command.

Also documents the existing-workspace backfill for the instance_id
column — the CP only writes on new provisions, so pre-migration
workspaces need a manual UPDATE before Terminal routes to the
remote path.

Refs: #1528 (resolved)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 19:50:59 -07:00
Hongming Wang
3820a0cc5b
feat(terminal): remote path via aws ec2-instance-connect (#1533)
feat(terminal): remote path via aws ec2-instance-connect + pty
2026-04-21 18:40:23 -07:00
Hongming Wang
9aef3ed046
feat(workspace): persist CP-returned EC2 instance_id on provision (#1531)
feat(workspace): persist CP-returned EC2 instance_id on provision
2026-04-21 18:40:05 -07:00
Hongming Wang
bca11fea9f fix(terminal): correct CP branch to SSH-only (no docker exec)
Proven by end-to-end testing against a live Hermes workspace EC2:
CP-provisioned workspaces run the agent as a NATIVE process under
the ubuntu user, not inside a Docker container. The earlier
\`aws ec2-instance-connect ssh -- docker exec -it ws-X bash\` was
doubly wrong:
- aws-cli's \`ssh\` subcommand doesn't accept a trailing command
- Even if it did, there's no container to exec into

Replaced with a three-step pipeline that matches what actually
works when run by hand:
1. ssh-keygen  — ephemeral ed25519 per session
2. aws ec2-instance-connect send-ssh-public-key --instance-os-user ubuntu
3. aws ec2-instance-connect open-tunnel --local-port N  (runs in background)
4. ssh -p N -i <key> ubuntu@127.0.0.1

Infra prerequisites (verified in docs/infra/workspace-terminal.md):
- EIC service-linked role created
- EIC Endpoint in the workspace VPC (we created eice-08b035ec8789202f9)
- Workspace SG allows 22/tcp from the EIC Endpoint's SG
- molecule-cp IAM: ec2:DescribeInstances + ec2-instance-connect:*

Changes in this commit:
- eicSSHOptions struct carries session inputs between factories
- openTunnelCmd + sshCommandCmd + sendSSHPublicKey are package vars
  so tests can stub them individually
- Default OS user is \"ubuntu\" (Ubuntu 24.04 CP AMI). Override via
  WORKSPACE_EC2_OS_USER env var if the AMI changes
- AWS_REGION env var respected; default us-east-2 matches current CP
- pickFreePort + waitForPort helpers — no hardcoded ports, tolerates
  multiple concurrent sessions
- Tests updated: two argv-shape regressions for open-tunnel + ssh
  (SSH shape was the silent-drift case that caused the first failure)

Refs: #1528, #1531
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:39:00 -07:00
Hongming Wang
89d9470ba4 feat(terminal): remote path via aws ec2-instance-connect + pty
Closes the last CP-provisioned-workspace gap: Terminal tab now works
for workspaces running on separate EC2 instances. Follow-up to
#1531 which added instance_id persistence.

How it works:
- HandleConnect checks workspaces.instance_id
- Empty → existing local Docker path (unchanged)
- Set   → spawn `aws ec2-instance-connect ssh --connection-type eice
          --instance-id X --os-user ec2-user -- docker exec -it ws-Y
          /bin/bash` under creack/pty, bridge pty ↔ canvas WebSocket

Why subprocess AWS CLI instead of native AWS SDK:
- EIC Endpoint tunnel needs a signed WebSocket with specific framing
- aws-cli v2 implements it correctly; reimplementing in Go is ~500
  lines of crypto + WS protocol work for zero user-visible benefit
- Tenant image picks up 1MB of aws-cli + openssh-client via apk

Handler design:
- sshCommandFactory is a var so tests can stub it (no real aws calls)
- Context cancellation propagates both ways (WS close → kill ssh;
  ssh exit → close WS)
- User-visible error points at docs/infra/workspace-terminal.md when
  EIC wiring is incomplete (common bootstrap failure)

Tests:
- TestHandleConnect_RoutesToRemote — instance_id in DB → CP branch
- TestHandleConnect_RoutesToLocal — empty instance_id → local branch
- TestSshCommandFactory_BuildsEICCommand — argv shape regression guard

Dockerfile.tenant: + openssh-client + aws-cli (Alpine main repo)

Refs: #1528, #1531

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:13:29 -07:00
Hongming Wang
1e47f85495 docs(infra): fix workspace-terminal doc against real CP code
Researched the actual molecule-controlplane repo rather than guessing:
- Workspaces launch in a shared CP workspace VPC (p.VPCID), not per
  tenant
- CP already tags instances with Role=workspace at ec2.go:1126 — my
  prior IAM policy used molecule:role which doesn't match anything
- workspaceIngressRules() currently opens only 8000/tcp — no port 22

Corrected:
- IAM policy Condition now matches existing Role tag (no CP change
  needed for the scope to work fleet-wide)
- Added OpenTunnel action so EIC Endpoint path works
- Dropped the \"open 22 in SG\" recommendation. Cross-VPC topology
  makes SG CIDR rules awkward (would need peering + tenant-CIDR
  bookkeeping). EIC Endpoint is one VPC resource + no SG changes.
- Simplified rollout to two items: add IAM policy, create EIC Endpoint

Kept direct-SG path as an explicit not-recommended alternative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:05:24 -07:00
Hongming Wang
46a8d24b2d feat(workspace): persist CP-returned EC2 instance_id on provision
Foundation for the EIC-based terminal handler (#1528). The tenant's
workspace-server needs to map workspace_id → EC2 instance_id to open
an SSH session, but CPProvisioner.Start returned the instance id only
for logging — it was never written anywhere. This PR adds the column
and writes it at provision time.

Scope kept intentionally small: no terminal code yet. The follow-up
PR will consume this column from the terminal handler.

What's here:
- migrations/038_workspace_instance_id — nullable TEXT column on
  workspaces, partial index on non-null for fast lookup
- workspace_provision.go — UPDATE after CPProvisioner.Start; failure
  logs but doesn't fail provisioning (row just lacks instance_id and
  terminal falls back to the existing not-reachable error)
- docs/infra/workspace-terminal.md — full design for the terminal
  flow: EIC vs SSM comparison, IAM policy JSON, SG rules, key
  lifetime, failure modes, rollout checklist

Refs: #1528
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:56:15 -07:00
Hongming Wang
73464a21dd
fix(restart): support SaaS control-plane provisioner (unblocks Platform Go build too) (#1512)
Squash-merge fix/restart (PR #1512): remove SSRF helpers from a2a_proxy_helpers.go since ssrf.go on main now owns these functions, resolving duplicate symbol build failures. Author: HongmingWang-Rabbit. Approved by molecule-ai. Mergeable, UNSTABLE (likely due to pending head branch changes).
2026-04-21 22:56:01 +00:00
Hongming Wang
2133e5601f
Merge pull request #1491 from Molecule-AI/feat/e2e-staging-saas-cicd
fix(e2e): 9 follow-ups to make staging E2E actually green end-to-end
2026-04-21 11:39:07 -07:00
Hongming Wang
bd020d84be ci(e2e): wire MOLECULE_STAGING_OPENAI_KEY into workflow env
The harness needs E2E_OPENAI_API_KEY set for Hermes workspaces to
boot — without it the runtime crashes with "No provider API key
found" and workspaces never hit online. Preflight step fails fast
with a clear error if the repo secret is missing, so CI doesn't
burn 10 minutes on a foregone conclusion.

Repo secret to add: Settings → Secrets → Actions →
MOLECULE_STAGING_OPENAI_KEY.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 11:24:59 -07:00
molecule-ai[bot]
64ccf8e179
fix: CWE-78 rm scope, go vet failures, delegation idempotency
* refactor: split 4 oversized handler files into focused sub-files

- org.go (1099 lines) → org.go + org_import.go + org_helpers.go
- mcp.go (1001 lines) → mcp.go + mcp_tools.go
- workspace.go (934 lines) → workspace.go + workspace_crud.go
- a2a_proxy.go (825 lines) → a2a_proxy.go + a2a_proxy_helpers.go

No functional changes — same package, same exports, same tests.
All files stay under 635 lines.

Note: isSafeURL and isPrivateOrMetadataIP are duplicated between
mcp_tools.go and a2a_proxy_helpers.go — this is a pre-existing issue
from the original mcp.go and a2a_proxy.go, not introduced by this split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(runtime+scheduler): increment/decrement active_tasks counter (refs #1386)

* docs(tutorials): add Self-Hosted AI Agents guide — Docker, Fly Machines, bare metal

* docs: add Remote Agents feature + Phase 30 blog links to docs index

* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted

* docs(api-ref): add workspace file copy API reference (#1281)

Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases

Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.

Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): CWE-78/CWE-22 — block shell injection in deleteViaEphemeral (#1310)

## Summary
Issue #1273: deleteViaEphemeral interpolated filePath directly into
rm command, enabling both shell injection (CWE-78) and path traversal
(CWE-22) attacks.

## Changes
1. Added validateRelPath(filePath) guard before constructing the rm command.
   validateRelPath blocks absolute paths and ".." traversal sequences.
2. Changed Cmd from "/configs/"+filePath (string interpolation) to
   []string{"rm", "-rf", "/configs", filePath} (exec form). This
   eliminates shell injection entirely — filePath is a plain argument,
   never interpreted as shell code.

## Security properties
- validateRelPath: blocks "../" and absolute paths before they reach Docker
- Exec form: filePath cannot inject shell metacharacters even if validation
  is somehow bypassed
- "/configs" as separate arg: rm has exactly two arguments, no room for
  injected args

Closes #1273.

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>

* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in a2a_proxy.go (#1292) (#1302)

* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in mcp.go and a2a_proxy.go

Issue #1042: 3 CodeQL SSRF findings across mcp.go and a2a_proxy.go.
staging already ships the fix (PRs #1147, #1154 → merged); main did not include it.

- mcp.go: add isSafeURL() + isPrivateOrMetadataIP() helpers; validate
  agentURL before outbound calls in mcpCallTool (line ~529) and
  toolDelegateTaskAsync (line ~607)
- a2a_proxy.go: add identical isSafeURL() + isPrivateOrMetadataIP()
  helpers; call isSafeURL() before dispatchA2A in resolveAgentURL()
  (blocks finding #1 at line 462)
- mcp_test.go: 19 new tests covering all blocked URL patterns:
  file://, ftp://, 127.0.0.1, ::1, 169.254.169.254, 10.x.x.x,
  172.16.x.x, 192.168.x.x, empty hostname, invalid URL,
  isPrivateOrMetadataIP across all private/CGNAT/metadata ranges

1. URL scheme enforcement — http/https only
2. IP literal blocking — loopback, link-local, RFC-1918, CGNAT, doc/test ranges
3. DNS hostname resolution — blocks internal hostnames resolving to private IPs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci-blocker): remove duplicate isSafeURL/isPrivateOrMetadataIP from mcp.go

Issue #1292: PR #1274 duplicated isSafeURL + isPrivateOrMetadataIP in
mcp.go — both functions already exist on main at lines 829 and 876.
Kept the mcp.go definitions (the originals) and removed the 70-line
duplicate appended at end of file. a2a_proxy.go functions are
unchanged — they serve the same purpose via a separate code path.

* fix: remove orphaned commit-text lines from a2a_proxy.go

Three lines from the PR/commit title were accidentally baked into the
file during the rebase from #1274 to #1302, causing a Go syntax error
(a bare string literal at statement level followed by dangling braces).

Deletion restores:
  }
  return agentURL, nil
}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>

* fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (#1324) (#1327)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (issue #1324) (#1329)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(platform): unblock SaaS workspace registration end-to-end

Every workspace in the cross-EC2 SaaS provisioning shape was failing
registration, heartbeat, or A2A routing. Four distinct blockers sat
between "EC2 is up" and "agent responds"; three are platform-side and
fixed here (the fourth is in the CP user-data, separate PR).

1. SSRF validator blocked RFC-1918 (registry.go + mcp.go)
   validateAgentURL and isPrivateOrMetadataIP rejected 172.16.0.0/12,
   which contains the AWS default VPC range (172.31.x.x) that every
   sibling workspace EC2 registers from. Registration returned 400 and
   the 10-min provision sweep flipped status to failed. RFC-1918 +
   IPv6 ULA are now gated behind saasMode(); link-local (169.254/16),
   loopback, IPv6 metadata (fe80::/10, ::1), and TEST-NET stay blocked
   unconditionally in both modes.

   saasMode() resolution order:
     1. MOLECULE_DEPLOY_MODE=saas|self-hosted (explicit operator flag)
     2. MOLECULE_ORG_ID presence (legacy implicit signal, kept for
        back-compat so existing deployments don't need a config change)

   isPrivateOrMetadataIP now actually checks IPv6 — previously it
   returned false on any non-IPv4 input, which would let a registered
   [::1] or [fe80::...] URL bypass the SSRF check entirely.

2. Orphan auth-token minting (workspace_provision.go)
   issueAndInjectToken mints a token and stuffs it into
   cfg.ConfigFiles[".auth_token"]. The Docker provisioner writes that
   file into the /configs volume — the CP provisioner ignores it
   (only cfg.EnvVars crosses the wire). Result: live token in DB, no
   plaintext on disk, RegistryHandler.requireWorkspaceToken 401s every
   /registry/register attempt because the workspace is no longer in
   the "no live token → bootstrap-allowed" state. Now no-ops in SaaS
   mode; the register handler already mints on first successful
   register and returns the plaintext in the response body for the
   runtime to persist locally.

   Also removes the redundant wsauth.IssueToken call at the bottom of
   provisionWorkspaceCP, which created the same orphan-token pattern
   a second time.

3. Compaction artefacts (bundle/importer.go, handlers/org_tokens.go,
   scheduler.go, workspace_provision.go)
   Four pre-existing compile errors on main from an earlier session's
   code truncation: missing tuple destructuring on ExecContext /
   redactSecrets / orgTokenActor, missing close-brace in
   Scheduler.fireSchedule's panic recovery. All one-line mechanical
   fixes; without them the binary would not build.

Tests
-----
ssrf_test.go adds:
  * TestSaasMode — covers the env resolution ladder (explicit flag
    wins over legacy signal, case-insensitive, whitespace tolerant)
  * TestIsPrivateOrMetadataIP_SaaSMode — asserts RFC-1918 + IPv6 ULA
    flip to allowed, metadata/loopback/TEST-NET still blocked
  * TestIsPrivateOrMetadataIP_IPv6 — regression guard for the old
    "returns false for all IPv6" behaviour

Follow-up issue for CP-sourced workspace_id attestation will be filed
separately — closes the residual intra-VPC SSRF + token-race windows
the SaaS-mode relaxation introduces.

Verified end-to-end today on workspace 6565a2e0 (hermes runtime, OpenAI
provider) — agent returned "PONG" in 1.4s after register → heartbeat →
A2A proxy → runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(runtime+scheduler): increment/decrement active_tasks + max_concurrent (#1408)

Runtime (shared_runtime.py):
- set_current_task now increments active_tasks on task start, decrements
  on completion (was binary 0/1)
- Counter never goes below 0 (max(0, n-1))
- Pushes heartbeat immediately on BOTH increment and decrement (#1372)

Scheduler (scheduler.go):
- Reads max_concurrent_tasks from DB (default 1, backward compatible)
- Skips cron only when active_tasks >= max_concurrent_tasks (was > 0)
- Leaders can be configured with max_concurrent_tasks > 1 to accept
  A2A delegations while a cron runs

Platform:
- Added max_concurrent_tasks column to workspaces (migration 037)
- Workspace model + list/get queries include the new field
- API exposes max_concurrent_tasks in workspace JSON

Config.yaml support (future): runtime_config.max_concurrent_tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(review): address 3 critical issues from code review

1. BLOCKER: executor_helpers.py now uses increment/decrement too
   (was still binary 0/1, stomping the counter for CLI + SDK executors)

2. BUG: asymmetric getattr defaults fixed — both paths use default 0
   (was 0 on increment, 1 on decrement)

3. UX: current_task preserved when active_tasks > 0 on decrement
   (was clearing task description even when other tasks still running)

4. Scheduler polling loop re-reads max_concurrent_tasks on each poll
   (was using stale value from initial query)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>

* docs: workspace files API reference, skill catalog, and links

* docs: fix secrets endpoint path across docs

The workspace secrets endpoint is `/workspaces/:id/secrets`, not
`/secrets/values`. This was wrong in quickstart.md (Path 2: Remote Agent)
and workspace-runtime.md (registration flow example and comparison table).
The external-agent-registration guide already had the correct path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: fix broken blog cross-link in skills-vs-bundled-tools post

Link path had an extra `/docs/` segment: `/docs/blog/...` instead of
`/blog/...`. Nextra resolves blog posts directly under `/blog/<slug>`,
not under `/docs/blog/`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add skill-catalog.md guide

Linked from the skills-vs-bundled-tools blog post as a reference
for TTS/image-generation/web-search skills. The blog promises
"install directly via the CLI" with a skill catalog — this page
fills that promise by documenting available skill types, install
commands, version management, custom skill authoring, and removal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted

* docs(api-ref): add workspace file copy API reference

Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases

Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>

* fix(handlers): add saasMode() gating to isPrivateOrMetadataIP in a2a_proxy_helpers.go

Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.

The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
  self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): Discord adapter Day 2 Reddit + HN community copy

* fix(tests): supply *events.Broadcaster pointer to captureBroadcaster

Cannot use *captureBroadcaster as *events.Broadcaster when the struct
embeds events.Broadcaster as a value — must initialize as a named field.

Fixes go vet error in workspace_provision_test.go:
  cannot use broadcaster (*captureBroadcaster) as *events.Broadcaster value

* Merge pull request #1429 from fix/canvas-tooltip-clear-timer

Without this, a 400ms setTimeout from onFocus/onMouseEnter that fires
after onBlur will re-show a tooltip the user just dismissed. The
setShow(false) in onBlur closes the tooltip immediately but leaves the
timer pending — Tab-blur followed by timer-fire would re-show it.

Fix: add clearTimeout(timerRef.current) at the top of onBlur, mirroring
the pattern already used in onMouseLeave and onFocus.

Refs: PR #1367 (a11y keyboard support — this was a pre-existing gap)

Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): add missing children:[] to setPendingDelete expectation (#1426)

PR #1252 (cascade-delete UX) updated setPendingDelete to pass a
children array for cascade-warning rendering. The keyboard-a11y test
assertion was not updated to match.

Test: clicking 'Delete' hoists state to the store and closes the menu

Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): add children:[] to setPendingDelete + \&apos; entity fix (closes #1380) (#1427)

* ci: retry — trigger fresh runner allocation

* fix(canvas/test): add children:[] to setPendingDelete assertion

setPendingDelete now includes children:[] (PR #1383 extended the
pendingDelete type). The keyboard accessibility test at line 225 used
exact object matching which omitted the new field, causing a failure
after staging merged #1383.

Issue: #1380

* fix(canvas): replace &apos; HTML entity with straight apostrophe

JSX does not entity-decode &apos; — it renders the literal text
"&apos;" instead of "'".  Found at line 157 (payment confirmed) and
line 321 (empty org list).  Replaced with a straight apostrophe,
which JSX handles correctly.

Ref: issue #1375
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Merge pull request #1430 from fix/1421-saas-ssrf-helpers

Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.

The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
  self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(P0): CWE-22 path traversal in copyFilesToContainer + ContextMenu test

Issue #1434 — CWE-22 Path Traversal Regression:
PR #1280 (dc218212) correctly used cleaned path in tar header.
PR #1363 (e9615af) regressed to using uncleaned `name`.
Fix: use `clean` in filepath.Join AND add defence-in-depth escape check.

Issue #1422 — ContextMenu Test Regression:
PR #1340 expanded pendingDelete store type to include `children:[]`.
Test assertion missing the field — add `children:[]` to match.

Note: ssrf.go created (shared isSafeURL/isPrivateOrMetadataIP) to
prepare for the handler-split refactor fix — current branch has no
build error, but the shared file will prevent regression when PR #1363
is merged. isSafeURL/isPrivateOrMetadataIP retained in both files
for now to avoid breaking callers while the split is finalized.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: resolve 3 go vet failures + add idempotency_key to delegate_task_async

- workspace_provision_test.go: add missing mock := setupTestDB(t) to
  TestSeedInitialMemories_Truncation — mock was referenced but never
  declared, causing "undefined: mock" vet error
- orgtoken/tokens_test.go: discard unused orgID return value with _ in
  Validate call — "declared and not used" vet error
- a2a_tools.py: delegate_task_async now sends idempotency_key (SHA-256
  of workspace_id + task) to POST /workspaces/:id/delegate, fixing
  duplicate task execution when an agent restarts mid-delegation (#1456)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: airenostars <airenostars@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Molecule AI Community Manager <community-manager@agents.moleculesai.app>
Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
2026-04-21 18:22:30 +00:00
rabbitblood
ce52b67d62 fix(build): add missing fmt import to a2a_proxy.go
Build broken on main since d86b8fe — a2a_proxy.go uses fmt.Errorf()
(8 call sites) but the import was dropped during an isSafeURL refactor
merge. CI fails with "undefined: fmt" at lines 743-775.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:17:54 -07:00
molecule-ai[bot]
859d676f70
fix(CI): correct BASE in detect-changes (PR/push race); catch RuntimeError in conftest (#1473)
- ci.yml: replace if/else BASE assignment with GITHUB_BASE_REF default
  + pull_request base.sha override pattern. Prevents push events from
    overwriting the correct PR base SHA when both events fire together.
- conftest.py: catch RuntimeError in addition to ImportError when
  importing coordinator.py, which raises RuntimeError at import time
  when WORKSPACE_ID is not set (before the ImportError guard).

Co-authored-by: Molecule AI Release Manager <release-manager@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:15:45 +00:00
Hongming Wang
5e130b7e6f fix(e2e): delegation raw curl missing X-Molecule-Org-Id
Section 10's delegation call is a raw curl (not tenant_call, because
it carries an additional X-Source-Workspace-Id). It was missing
X-Molecule-Org-Id, which TenantGuard requires — so the tenant 404'd
every delegation probe despite section 8's A2A call (via tenant_call)
working correctly.

Repro: staging run 2026-04-21T17:40Z had section 8 green (PONG)
and section 10 red (rc=22) on the same workspace. Only difference
was the missing header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:41:17 -07:00
Hongming Wang
b8b3d5ce1f fix(e2e): MODEL_PROVIDER is provider:model slug, not just provider
workspace/config.py:258 reads MODEL_PROVIDER as the full model string
(format 'provider:model', e.g. 'anthropic:claude-opus-4-7'). My prior
'openai' alone got parsed as the model name → 404 model_not_found.

Use 'openai:gpt-4o' and also set OPENAI_BASE_URL to api.openai.com
(default was openrouter.ai which takes different key format).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:33:27 -07:00