molecule-core/workspace/platform_inbound_auth.py
Hongming Wang 055e447355 feat(saas): deliver platform_inbound_secret via /registry/register (RFC #2312, PR-F)
Closes the SaaS-side gap that PR-A acknowledged but didn't fix: SaaS
workspaces have no persistent /configs volume, so the platform_inbound_secret
that PR-A's provisioner wrote at workspace creation never reaches the
runtime. Without this, even after the entire RFC #2312 stack lands,
SaaS chat upload would 401 (workspace fails-closed when /configs/.platform_inbound_secret
is missing).

Solution: return the secret in the /registry/register response body
on every register call. The runtime extracts it and persists to
/configs/.platform_inbound_secret at mode 0600. Idempotent — Docker-
mode workspaces also receive it and overwrite the value the provisioner
already wrote (same value until rotation).

Why on every register, not just first-register:
  * SaaS containers can be restarted (deploys, drains, EBS detach/
    re-attach) — /configs is rebuilt empty on each fresh start.
  * The auth_token is "issue once" because re-issuing rotates and
    invalidates the previous one. The inbound secret has no rotation
    flow yet (#2318) so re-sending the same value is harmless.
  * Eliminates the bootstrap window where a restarted SaaS workspace
    has no inbound secret on disk and would 401 every platform call.

Changes:
  * workspace-server/internal/handlers/registry.go — Register handler
    reads workspaces.platform_inbound_secret via wsauth.ReadPlatformInboundSecret
    and includes it in the response body. Legacy workspaces (NULL
    column) get a successful registration with the field omitted.
  * workspace-server/internal/handlers/registry_test.go — two new tests:
      - TestRegister_ReturnsPlatformInboundSecret_RFC2312_PRF: secret
        present in DB → secret in response, alongside auth_token.
      - TestRegister_NoInboundSecret_OmitsField: NULL column → field
        omitted, registration still 200.
  * workspace/platform_inbound_auth.py — adds save_inbound_secret(secret).
    Atomic write via tmp + os.replace, mode 0600 from os.open(O_CREAT,
    0o600) so a concurrent reader never sees 0644-default. Resets the
    in-process cache after write so the next get_inbound_secret() returns
    the freshly-written value (rotation-safe when it lands).
  * workspace/main.py — register-response handler extracts
    platform_inbound_secret alongside auth_token and persists via
    save_inbound_secret. Mirrors the existing save_token pattern.
  * workspace/tests/test_platform_inbound_auth.py — 6 new tests for
    save_inbound_secret: writes file, mode 0600, overwrite-existing,
    cache invalidation after save, empty-input no-op, parent-dir creation
    for fresh installs.

Test results:
  * go test ./internal/handlers/ ./internal/wsauth/ — all green
  * pytest workspace/tests/ — 1272 passed (was 1266 before this PR)

Refs #2312 (parent RFC), #2308 (chat upload 503 incident).
Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:12:34 -07:00

143 lines
5.7 KiB
Python

"""Auth gate for the /internal/* Starlette routes.
The platform calls into the workspace's HTTP server using a per-workspace
shared secret minted at provision time and stored in
``/configs/.platform_inbound_secret`` (see migration 044 + RFC #2312).
The workspace validates by string-equality against the file content —
the platform side stores the same plaintext in ``workspaces
.platform_inbound_secret`` and reads it back on every forward call.
Asymmetric to ``platform_auth.py``:
platform_auth.py platform_inbound_auth.py
──────────────── ────────────────────────
workspace → platform platform → workspace
/configs/.auth_token /configs/.platform_inbound_secret
workspace presents bearer workspace validates bearer
Fail-closed semantics (mirrors transcript_auth.py): if the secret file is
missing, empty, or unreadable, every request is rejected. The platform
will surface this as a structural error rather than silently sending
unauthenticated requests through.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
logger = logging.getLogger(__name__)
# In-process cache so we don't hit disk on every forward call. Same
# pattern as platform_auth._cached_token. The file is the durable copy;
# this var is the hot path.
_cached_secret: str | None = None
def _secret_file() -> Path:
"""Path to the on-disk inbound-secret file. Respects CONFIGS_DIR,
falls back to /configs for the default container layout."""
return Path(os.environ.get("CONFIGS_DIR", "/configs")) / ".platform_inbound_secret"
def get_inbound_secret() -> str | None:
"""Return the cached inbound secret, reading from disk on first call.
Returns None if the file is missing, empty, or unreadable. Callers
MUST treat None as an auth failure (fail-closed) — never substitute
a default or skip-auth-on-missing semantics.
"""
global _cached_secret
if _cached_secret is not None:
return _cached_secret
path = _secret_file()
if not path.exists():
return None
try:
secret = path.read_text().strip()
except OSError as exc:
logger.warning("platform_inbound_auth: read %s failed: %s", path, exc)
return None
if not secret:
return None
_cached_secret = secret
return secret
def reset_cache() -> None:
"""Drop the in-process cache. Used by tests + the rare runtime-side
path that needs to re-read after the file is overwritten (e.g. a
rotation flow lands in the future)."""
global _cached_secret
_cached_secret = None
def save_inbound_secret(secret: str) -> None:
"""Persist a freshly-received platform_inbound_secret to disk.
Called from the /registry/register response handler when the platform
returns a `platform_inbound_secret` field. Mirrors platform_auth.save_token's
pattern: 0600 file in CONFIGS_DIR, atomic write via tmp + rename so a
concurrent reader never sees a partial file.
Idempotent: writing the same value over an existing file is a no-op
from the workspace's perspective. Resets the in-process cache so the
next get_inbound_secret() returns the freshly-written value (matters
when a future rotation flow lands and the platform sends a different
secret on a subsequent register call).
"""
global _cached_secret
if not secret:
return
path = _secret_file()
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
try:
# Open with 0600 from the start so a concurrent reader can never
# see a 0644-default fd before the chmod. mode= is honored by
# os.open underneath; pathlib.write_text does not expose it.
fd = os.open(str(tmp), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w") as f:
f.write(secret)
os.replace(str(tmp), str(path))
# Race-safe in-process cache update: clear first, then let next
# caller re-read disk. Avoids the "stored new, cache still has
# old" window if get_inbound_secret races with this write.
_cached_secret = None
except OSError as exc:
logger.warning("platform_inbound_auth: save %s failed: %s", path, exc)
# Best-effort cleanup of the tmp file.
try:
os.unlink(str(tmp))
except OSError as cleanup_exc:
logger.debug("platform_inbound_auth: unlink tmp %s failed: %s", tmp, cleanup_exc)
def inbound_authorized(expected_secret: str | None, auth_header: str) -> bool:
"""Return True iff a /internal/* request should be served.
Args:
expected_secret: the workspace's stored inbound secret, or None
if /configs/.platform_inbound_secret is absent / empty /
unreadable.
auth_header: raw Authorization request header value.
Behavior:
- None / empty expected → fail closed. A missing secret file
is an auth failure, not a bypass.
- Non-empty expected → strict string-equality against
"Bearer <secret>". Bearer prefix is case-sensitive (matches
the platform's wsauth.BearerTokenFromHeader contract).
Constant-time comparison is used to avoid leaking the secret one
byte at a time via timing analysis on a network-reachable endpoint.
"""
if not expected_secret:
return False
expected = f"Bearer {expected_secret}"
# hmac.compare_digest is the stdlib constant-time string compare.
# Length mismatch is documented to short-circuit safely (returns
# False without leaking length-difference timing).
import hmac
return hmac.compare_digest(auth_header, expected)