fix/s8-bind-loopback-dev
8 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1161b97faf |
feat(mcp): cross-workspace delegation routing (multi-ws PR-2)
PR-2 of the multi-workspace external-agent stack. PR-1 (#2739) landed per-workspace auth + heartbeat + inbox. This PR threads ``source_workspace_id`` through the A2A client + tool surface so an agent registered against multiple workspaces can list peers across all of them and delegate from a specific source. Changes ------- * ``a2a_client``: ``discover_peer``, ``send_a2a_message``, ``get_peers_with_diagnostic``, and ``enrich_peer_metadata`` now accept ``source_workspace_id``. Routing uses it for both the X-Workspace-ID header and (transitively, via ``auth_headers(src)``) the bearer token. Defaults to module-level WORKSPACE_ID for back-compat. * ``a2a_client._peer_to_source``: a new lock-free cache mapping each discovered peer back to the source workspace whose registry surfaced it. ``tool_list_peers`` populates the cache on every call; ``tool_delegate_task`` consults it for auto-routing. * ``a2a_tools.tool_list_peers(source_workspace_id=None)``: when multiple workspaces are registered (MOLECULE_WORKSPACES) and no explicit source is passed, aggregates peers across every registered workspace and tags each entry with ``via: <src[:8]>``. Single-workspace mode is unchanged — no ``via:`` annotation, same output shape. * ``a2a_tools.tool_delegate_task`` and ``tool_delegate_task_async`` resolve source via ``source_workspace_id arg → _peer_to_source[target] → WORKSPACE_ID``. Agents almost never need to specify ``source_*`` explicitly — call ``list_peers`` first and the cache handles the rest. * ``tool_delegate_task_async`` idempotency key now includes the source workspace, so the same task delegated from two registered workspaces produces two distinct delegations (the right behavior — one per tenant audit trail). * ``platform_auth.list_registered_workspaces()``: new helper for the tool layer to enumerate the multi-ws registry. Lock-free reads matched by the existing single-writer-per-workspace contract from PR-1. * ``platform_auth.self_source_headers``: now passes ``workspace_id`` through to ``auth_headers`` — without this, a multi-workspace POST source-tagged with ``X-Workspace-ID=ws_b`` was authenticating with ws_a's token (or no token if MOLECULE_WORKSPACE_TOKEN unset). Latent PR-1 bug exposed by the new tool surface. * ``a2a_mcp_server`` tool dispatch passes ``source_workspace_id`` from the tool call arguments. * ``platform_tools.registry``: add ``source_workspace_id`` to the delegate_task, delegate_task_async, check_task_status, list_peers input schemas with copy explaining when to use it (rarely — the cache handles it). Tests (15 new, all passing) --------------------------- ``test_a2a_multi_workspace.py``: * TestDiscoverPeerSourceRouting (3): src arg drives header+token, fallback to module ws when omitted, invalid target short-circuits before any HTTP attempt. * TestSendA2AMessageSourceRouting (1): X-Workspace-ID source header + Authorization bearer both come from the source arg via the patched self_source_headers chain. * TestGetPeersSourceRouting (1): URL path AND headers use the source workspace id. * TestToolListPeersAggregation (4): aggregates across multiple registered workspaces, tags origin, leaves single-workspace path unchanged, explicit src arg overrides aggregation, diagnostic joining when every workspace returns empty. * TestToolDelegateTaskAutoRouting (3): cache-driven auto-route, explicit override beats cache, single-workspace fallback to module WORKSPACE_ID. * TestListRegisteredWorkspaces (3): registry enumeration helper. Plus ``tests/snapshots/a2a_instructions_mcp.txt`` regenerated to absorb the new ``source_workspace_id`` schema entries. Back-compat ----------- Every change defaults ``source_workspace_id=None``; legacy single-workspace operators (no MOLECULE_WORKSPACES) see identical behavior — same URLs, same headers, same tool output. The 24 PR-1 tests + 125 existing A2A tests all still pass. Out of scope (PR-3) ------------------- Memory namespacing per registered workspace lands after the new memory system v2 PR (#2740) settles in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
829ab66462 |
mcp: support multi-workspace external-agent registration (PR-1)
External MCP agents (e.g. Claude Code installed on a company PC) can
now register against MULTIPLE workspaces from a single process — the
agent participates as a peer in workspace A (company) AND workspace B
(personal) simultaneously, with one merged inbox tagged so replies
route to the correct tenant.
Use case (verbatim from operator): "I have this computer AI thats in
company's PC, he is going to be put in company's workspace, but
personally, I want to register it to my own workspace as well, so
that I can talk to it and asking him to do work."
## What changed
**Wire format** — new env var:
MOLECULE_WORKSPACES='[
{"id":"<company-wsid>","token":"<company-tok>"},
{"id":"<personal-wsid>","token":"<personal-tok>"}
]'
When set, mcp_cli iterates the array and spawns one (register +
heartbeat + inbox poller) trio per workspace. Single-workspace mode
(WORKSPACE_ID + MOLECULE_WORKSPACE_TOKEN) is unchanged — every
existing operator's setup keeps working bit-for-bit.
**Per-workspace token registry** (platform_auth.py):
register_workspace_token(wsid, tok) — populated by mcp_cli once
per workspace before any thread spawns; thread-safe registration
+ lock-free reads on the hot path. auth_headers(workspace_id=...)
routes to the per-workspace token; auth_headers() with no arg
uses the legacy resolution path unchanged (back-compat).
**Per-workspace inbox cursors** (inbox.py):
InboxState now supports cursor_paths={wsid: Path,...}. Each poller
advances its own cursor — one workspace's slow poll can't stall
another, and a 410 only resets the affected workspace's cursor.
Single-workspace constructor (cursor_path=Path(...)) still works
exactly as before via __post_init__ promotion to the empty-string
key. Cursor filenames disambiguated by workspace_id[:8] when
multi-workspace; single-workspace keeps the legacy filename so
upgrade doesn't invalidate on-disk state.
**Arrival workspace tagging** (inbox.py):
InboxMessage.arrival_workspace_id — tells the agent which OF ITS
workspaces the inbound message arrived on. Set by the poller from
the cursor key. to_dict() omits the field when empty so single-
workspace consumers see no shape change.
**Reply routing** (a2a_tools.py + a2a_mcp_server.py + registry.py):
send_message_to_user(workspace_id=...) — optional override that
selects which workspace's /notify endpoint to POST to (and which
token authenticates). Multi-workspace agents pass the inbound
message's arrival_workspace_id; single-workspace agents omit it
and route to the only registered workspace via the legacy URL.
## Out of scope (future PRs)
- PR-2: cross-workspace delegation auto-routing — when an agent
receives a request from personal-ws "delegate to ops-bot" and
ops-bot lives in company-ws, the agent should auto-pick its
company-ws identity for the outbound delegate_task. Today the
agent must pass via_workspace explicitly (or fall through to
primary workspace).
- PR-3: memory namespacing — commit_memory() still writes to the
primary workspace's memory regardless of inbound context. Will
revisit when the new memory system (PR #2733 just landed) settles.
## Tests
workspace/tests/test_mcp_cli_multi_workspace.py — 24 new tests:
* MOLECULE_WORKSPACES JSON parsing (valid + 6 error shapes)
* Token registry register / lookup / rotation / clear
* auth_headers routing by workspace_id with legacy fallback
* Per-workspace cursor save/load/reset isolation
* arrival_workspace_id present-when-set, omitted-when-empty
* default_cursor_path namespacing
All 110 pre-existing tests in test_mcp_cli.py / test_inbox.py /
test_platform_auth.py still pass — back-compat is mechanical.
Refs: project memory entry "External agent multi-workspace
registration", design questions answered 2026-05-04 by user
(JSON env var; explicit memory writes deferred to PR-3).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c636022d2f |
fix(runtime): auto-fallback CONFIGS_DIR for non-container hosts (closes #2458)
The runtime persists per-workspace state (`.auth_token`,
`.platform_inbound_secret`, `.mcp_inbox_cursor`) under `/configs` —
the workspace-EC2 mount path. Inside a container that's writable,
agent-owned. Outside a container, `/configs` either doesn't exist or
isn't writable by an unprivileged user.
The default broke the external-runtime path (`pip install
molecule-ai-workspace-runtime` + `molecule-mcp` on a Mac/Linux
laptop). First heartbeat tries to persist `.platform_inbound_secret`
and crashes:
[Errno 30] Read-only file system: '/configs'
The heartbeat thread logs and dies. Workspace flips offline within
a minute. Operator sees no actionable error.
Adds workspace/configs_dir.py — single resolution point with a tiered
fallback:
1. CONFIGS_DIR env var, if set — explicit operator override
(preserves existing tests + custom deployments verbatim).
2. /configs — if it exists AND is writable. In-container default;
unchanged behavior for every prod workspace.
3. ~/.molecule-workspace — created with mode 0700 so per-file 0600
perms aren't undermined by a world-readable parent.
Migrates the four readers (platform_auth, platform_inbound_auth,
mcp_cli, inbox) to call configs_dir.resolve() instead of
inlining `Path(os.environ.get("CONFIGS_DIR", "/configs"))`.
Existing tests that assert the old `/configs`-as-default contract
updated to assert the new contract: when CONFIGS_DIR is unset, path
resolves to a writable location — `/configs` if present, fallback
otherwise. Tests skip the fallback branch on hosts that DO have a
writable `/configs` (CI containers).
Verified the original repro is fixed: with no CONFIGS_DIR set on
macOS, configs_dir.resolve() returns ~/.molecule-workspace, the dir
exists, and writes succeed.
Test suite: 1454 passed, 3 skipped, 2 xfailed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
74c5e0d7a8 |
fix(workspace-runtime): add Origin header so SaaS edge WAF accepts MCP tool calls
Discovered while smoke-testing the molecule-mcp external-runtime path
against a live tenant (hongmingwang.moleculesai.app). Every tool call
that hit /workspaces/* or /registry/*/peers returned 404 — but
/registry/register and /registry/heartbeat returned 200. Diagnosis:
the tenant's edge WAF requires a same-origin header. Without it,
unhandled paths get silently rewritten to the canvas Next.js app,
which has no /workspaces or /registry/:id/peers route and returns an
empty 404. The molecule-mcp-claude-channel plugin already sets this
header (server.ts:271-276); the workspace runtime never did because
in-container PLATFORM_URLs (Docker network) aren't behind the WAF.
Fix: extend platform_auth.auth_headers() to include
Origin: ${PLATFORM_URL} whenever PLATFORM_URL is set. Inside-container
behavior is unchanged (the WAF is path-irrelevant for the internal
hostnames). External-runtime calls now thread the WAF correctly.
Verification (live, against a freshly-registered external workspace):
pre-fix: get_workspace_info → "not found", list_peers → 404
post-fix: get_workspace_info → full workspace JSON,
list_peers → "Claude Code Agent (ID: 97ac32e9..., status: online)"
This is the kind of bug unit tests can never catch — caught only by
running the wheel against the real tenant. Memory:
feedback_always_run_e2e.md.
Test coverage: 4 new tests in test_platform_auth.py — Origin alone
when no token + Origin + Authorization both, no-PLATFORM_URL falls
through to original empty-dict behavior, env-token path with Origin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
169e284d57 |
feat(workspace-runtime): expose universal MCP server to runtime=external operators
Ship the baseline universal MCP path that any external runtime (Claude
Code, hermes, codex, anything that speaks MCP stdio) can use, before
optimizing per-runtime channels. Today the workspace MCP server only
spins up inside the container; external operators have no way to call
the 8 platform tools (delegate_task, list_peers, send_message_to_user,
commit_memory, etc.) from outside.
Three additive changes:
1. **`platform_auth.get_token()` env-var fallback** — adds
`MOLECULE_WORKSPACE_TOKEN` as a fallback when no
`${CONFIGS_DIR}/.auth_token` file exists. File-first preserves
in-container behavior unchanged. External operators (no /configs
volume) now have a way to supply the token without faking the
filesystem layout.
2. **`molecule-mcp` console script** — adds a new entry point in the
published `molecule-ai-workspace-runtime` PyPI wheel. Operators run
`pip install molecule-ai-workspace-runtime`, set 3 env vars
(WORKSPACE_ID, PLATFORM_URL, MOLECULE_WORKSPACE_TOKEN), and register
the binary in their agent's MCP config. `mcp_cli.main` is a thin
validator wrapper — it checks env BEFORE importing the heavy
`a2a_mcp_server` module so a misconfigured first-run gets a friendly
3-line error instead of a 20-line module-level RuntimeError
traceback.
3. **Wheel smoke gate** — extends `scripts/wheel_smoke.py` to assert
`cli_main` and `mcp_cli.main` are importable. Same regression class
as the 0.1.16 main_sync incident: a silent rename or unrewritten
import here would break every external operator on the next wheel
publish (memory: feedback_runtime_publish_pipeline_gates.md).
Test coverage:
- `tests/test_platform_auth.py` — 8 new tests for the env-var fallback:
file-priority, env-fallback, whitespace handling, cache, header
construction, empty-env-as-unset.
- `tests/test_mcp_cli.py` — 8 new tests for the validator: each
required var separately, file-or-env satisfies token requirement,
whitespace-only env treated as missing, help mentions canvas Tokens
tab.
- Full `workspace/tests/` suite green: 1346 passed, 1 skipped.
- Local end-to-end: built wheel, installed in venv, ran `molecule-mcp`
with no env → friendly error; with env → MCP server starts.
Why now / why this shape: user redirect was "support the baseline
first so all runtimes can use, then optimize". A claude-only MCP
channel leaves hermes/codex/third-party operators broken on
runtime=external. This PR ships the runtime-agnostic baseline; per-
runtime polish (claude-channel push delivery, hermes-native
bindings) is a follow-up PR. PR #2412 fixed the partner bug where
canvas Restart silently revoked the operator's token — the two
together unblock the external-runtime story end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
65b531acf6 |
fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID
Workspace runtime fired four classes of A2A request to the platform
without the X-Workspace-ID header that identifies the source
workspace: heartbeat self-messages, initial_prompt, idle-loop fires,
and peer-to-peer A2A from runtime tools. The platform's a2a_receive
logger keys source_id off that header — without it, every such row
was written with source_id=NULL, which the canvas's My Chat tab
filters as ?source=canvas (i.e. "user typed this") and rendered the
internal triggers as if the human user had sent them. The
"Delegation results are ready..." heartbeat trigger was visible to
end users in the chat history; delegate_task A2A calls between agents
were misclassified the same way.
Centralise the header construction in a new platform_auth helper
self_source_headers(workspace_id) that returns auth_headers() PLUS
{X-Workspace-ID: <id>}. Apply it to:
- heartbeat.py self-message (refactored from inline header dict)
- main.py initial_prompt POST
- main.py idle_prompt POST
- a2a_client.py send_a2a_message (peer A2A from runtime)
- builtin_tools/a2a_tools.py delegate_task (was missing ALL headers)
Tests:
- test_heartbeat.py asserts the X-Workspace-ID header is set on
the self-message POST.
- test_a2a_tools_module.py asserts the same on delegate_task POSTs;
FakeClient.post mocks updated to accept the headers kwarg.
Production effect lands the moment workspace containers are rebuilt
with this code; existing rows in activity_logs keep their NULL
source_id (legacy data). The canvas-side filter (#follow-up)
covers the historical-rows case until backfill.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
| b5e2142c46 |
fix(#1877): close token-rotation race on restart — Option A+Option B combined
Platform side (Option B): - provisioner.go: add WriteAuthTokenToVolume() — writes .auth_token to the Docker named volume BEFORE ContainerStart using a throwaway alpine container, eliminating the race window where a restarted container could read a stale token before WriteFilesToContainer writes the new one. - workspace_provision.go: call WriteAuthTokenToVolume() in issueAndInjectToken as a best-effort pre-write before the container starts. Runtime side (Option A): - heartbeat.py: on HTTPStatusError 401 from /registry/heartbeat, call refresh_cache() to force re-read of /configs/.auth_token from disk, then retry the heartbeat once. Fall through to normal failure tracking if the retry also fails. - platform_auth.py: add refresh_cache() which discards the in-process _cached_token and calls get_token() to re-read from disk. Together these eliminate the >1 consecutive 401 window described in issue #1877. Pre-write (B) is the primary fix; runtime retry (A) is the self-healing fallback for any residual race. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> |
|||
|
|
479a027e4b |
chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |