molecule-core/workspace/platform_auth.py
Hongming Wang 169e284d57 feat(workspace-runtime): expose universal MCP server to runtime=external operators
Ship the baseline universal MCP path that any external runtime (Claude
Code, hermes, codex, anything that speaks MCP stdio) can use, before
optimizing per-runtime channels. Today the workspace MCP server only
spins up inside the container; external operators have no way to call
the 8 platform tools (delegate_task, list_peers, send_message_to_user,
commit_memory, etc.) from outside.

Three additive changes:

1. **`platform_auth.get_token()` env-var fallback** — adds
   `MOLECULE_WORKSPACE_TOKEN` as a fallback when no
   `${CONFIGS_DIR}/.auth_token` file exists. File-first preserves
   in-container behavior unchanged. External operators (no /configs
   volume) now have a way to supply the token without faking the
   filesystem layout.

2. **`molecule-mcp` console script** — adds a new entry point in the
   published `molecule-ai-workspace-runtime` PyPI wheel. Operators run
   `pip install molecule-ai-workspace-runtime`, set 3 env vars
   (WORKSPACE_ID, PLATFORM_URL, MOLECULE_WORKSPACE_TOKEN), and register
   the binary in their agent's MCP config. `mcp_cli.main` is a thin
   validator wrapper — it checks env BEFORE importing the heavy
   `a2a_mcp_server` module so a misconfigured first-run gets a friendly
   3-line error instead of a 20-line module-level RuntimeError
   traceback.

3. **Wheel smoke gate** — extends `scripts/wheel_smoke.py` to assert
   `cli_main` and `mcp_cli.main` are importable. Same regression class
   as the 0.1.16 main_sync incident: a silent rename or unrewritten
   import here would break every external operator on the next wheel
   publish (memory: feedback_runtime_publish_pipeline_gates.md).

Test coverage:
- `tests/test_platform_auth.py` — 8 new tests for the env-var fallback:
  file-priority, env-fallback, whitespace handling, cache, header
  construction, empty-env-as-unset.
- `tests/test_mcp_cli.py` — 8 new tests for the validator: each
  required var separately, file-or-env satisfies token requirement,
  whitespace-only env treated as missing, help mentions canvas Tokens
  tab.
- Full `workspace/tests/` suite green: 1346 passed, 1 skipped.
- Local end-to-end: built wheel, installed in venv, ran `molecule-mcp`
  with no env → friendly error; with env → MCP server starts.

Why now / why this shape: user redirect was "support the baseline
first so all runtimes can use, then optimize". A claude-only MCP
channel leaves hermes/codex/third-party operators broken on
runtime=external. This PR ships the runtime-agnostic baseline; per-
runtime polish (claude-channel push delivery, hermes-native
bindings) is a follow-up PR. PR #2412 fixed the partner bug where
canvas Restart silently revoked the operator's token — the two
together unblock the external-runtime story end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:20:19 -07:00

157 lines
5.9 KiB
Python

"""Workspace auth-token store (Phase 30.1).
Single source of truth for this workspace's authentication token. The
token is issued by the platform on the first successful
``POST /registry/register`` call and travels with every subsequent
heartbeat / update-card / (later) secrets-pull / A2A request.
The token is persisted to ``<configs>/.auth_token`` so it survives
restarts — we only expect to receive it once from the platform, since
``/registry/register`` no-ops token issuance for workspaces that already
have one on file.
Storage:
${CONFIGS_DIR}/.auth_token # 0600, one line, no trailing newline
Callers interact with three functions:
:func:`get_token` — returns the cached token or None
:func:`save_token` — persists a freshly-issued token
:func:`auth_headers`— builds the Authorization header dict for httpx
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
logger = logging.getLogger(__name__)
# In-process cache so we don't hit disk on every heartbeat. The heartbeat
# loop fires on a short interval and reading a tiny file 10x per minute
# is wasteful. The file is the durable copy; this var is the hot path.
_cached_token: str | None = None
def _token_file() -> Path:
"""Path to the on-disk token file. Respects CONFIGS_DIR, falls back
to /configs for the default container layout."""
return Path(os.environ.get("CONFIGS_DIR", "/configs")) / ".auth_token"
def get_token() -> str | None:
"""Return the cached token, reading it from disk on first call.
Resolution order:
1. In-process cache (hot path)
2. ``${CONFIGS_DIR}/.auth_token`` file (in-container default —
the platform writes this on provision and rotates it on
restart)
3. ``MOLECULE_WORKSPACE_TOKEN`` env var (external-runtime path —
operators running the universal MCP server outside a
container have no /configs volume to populate, so they pass
the token via env)
File-first preserves in-container behavior unchanged: containers
always have /configs/.auth_token on disk, env-var fallback only
fires when there's no file. This is additive — no existing caller
sees a behavior change.
"""
global _cached_token
if _cached_token is not None:
return _cached_token
path = _token_file()
if path.exists():
try:
tok = path.read_text().strip()
except OSError as exc:
logger.warning("platform_auth: failed to read %s: %s", path, exc)
tok = ""
if tok:
_cached_token = tok
return tok
# File missing or empty — fall back to env (external-runtime path).
env_tok = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
if env_tok:
_cached_token = env_tok
return env_tok
return None
def save_token(token: str) -> None:
"""Persist a newly-issued token. Creates the file with 0600 mode atomically.
Uses ``os.open(O_CREAT, 0o600)`` so the file is never world-readable,
even transiently. The previous ``write_text()`` + ``chmod()`` approach
had a TOCTOU window where a concurrent reader could access the token
between the two syscalls (M4 — flagged in security audit cycle 10).
Idempotent — if an identical token is already on disk we skip the
write so we don't churn the file's mtime or trigger spurious
filesystem watchers."""
global _cached_token
token = token.strip()
if not token:
raise ValueError("platform_auth: refusing to save empty token")
if get_token() == token:
return
path = _token_file()
path.parent.mkdir(parents=True, exist_ok=True)
# O_CREAT | O_WRONLY | O_TRUNC with mode=0o600 atomically creates (or
# truncates) the file with restricted permissions in a single syscall,
# eliminating the TOCTOU window.
fd = os.open(str(path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
try:
os.write(fd, token.encode())
finally:
os.close(fd)
_cached_token = token
def auth_headers() -> dict[str, str]:
"""Return a header dict to merge into httpx calls. Empty if no token
is available yet — callers send the request as-is and the platform's
heartbeat handler grandfathers pre-token workspaces through until
their next /registry/register issues one."""
tok = get_token()
if not tok:
return {}
return {"Authorization": f"Bearer {tok}"}
def self_source_headers(workspace_id: str) -> dict[str, str]:
"""Return auth headers PLUS X-Workspace-ID identifying this workspace
as the source of the request.
Use this for any POST the workspace's own runtime fires against the
platform's A2A endpoints — heartbeat self-messages, initial_prompt,
idle-loop fires, peer-to-peer A2A from runtime tools. Without the
X-Workspace-ID header the platform's a2a_receive logger writes
source_id=NULL, which the canvas's My Chat tab interprets as a
user-typed message and renders the internal prompt to the user.
See workspace-server/internal/handlers/a2a_proxy.go:184 for the
server-side classification rule.
Centralised here so adding a new system header (e.g. a per-fire
correlation ID) only touches one place — and so that any
workspace→A2A POST that doesn't use this helper stands out in
review as a probable bug."""
return {**auth_headers(), "X-Workspace-ID": workspace_id}
def clear_cache() -> None:
"""Reset the in-memory cache. Used by tests that write fresh token
files between cases."""
global _cached_token
_cached_token = None
def refresh_cache() -> str | None:
"""Force re-read of the token from disk, discarding the in-process cache.
Use this when a 401 response suggests the cached token is stale —
e.g. after the platform rotates tokens during a restart (issue #1877).
Returns the (new) token value or None if not found/error."""
global _cached_token
_cached_token = None
return get_token()