chore(runtime): delete core workspace copy
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
E2E Chat / detect-changes (pull_request) Successful in 13s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Failing after 1m43s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m36s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m29s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m10s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 3s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m23s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 5m13s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m12s
gate-check-v3 / gate-check (pull_request) Successful in 9s
security-review / approved (pull_request) Failing after 5s
qa-review / approved (pull_request) Failing after 7s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 5s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m34s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m15s
CI / Canvas (Next.js) (pull_request) Successful in 6m19s
CI / all-required (pull_request) Successful in 5m46s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 15s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m11s
E2E Chat / E2E Chat (pull_request) Failing after 6m36s
CI / Canvas Deploy Reminder (pull_request) Has been skipped

core-devops
2026-05-20 14:47:55 -07:00
parent 90467540dd
commit 9aa4764301
235 changed files with 73 additions and 70615 deletions
@@ -1,60 +0,0 @@
name: cascade-list-drift-gate
# Ported from .github/workflows/cascade-list-drift-gate.yml on 2026-05-11
# per RFC internal#219 §1 sweep.
#
# Differences from the GitHub version:
# - on.paths reference .gitea/workflows/publish-runtime.yml (the active
# Gitea workflow file) instead of .github/workflows/publish-runtime.yml
# (which Category A of this sweep deletes).
# - Explicit `WORKFLOW=` arg passed to the drift script so it audits the
# .gitea/ workflow (the script's default is still .github/... which
# will not exist post-Cat-A).
# - Workflow-level env.GITHUB_SERVER_URL set per
# feedback_act_runner_github_server_url.
# - `continue-on-error: true` on the job (RFC §1 contract — surface
# defects without blocking; follow-up PR flips after triage).
#
# Structural gate: TEMPLATES list in publish-runtime.yml must match
# manifest.json's workspace_templates exactly. Closes the recurrence
# path of PR #2556 (the data fix) and is the first concrete deliverable
# of RFC #388 PR-3.
#
# Triggers narrowly to keep CI quiet: only on PRs that actually change
# one of the two files. The path-filtered split + always-emit-result
# pattern (memory: "Required check names need a job that always runs")
# is unnecessary here because the workflow IS the check name and PR
# branch protection should require it directly. Future-proof: if this
# becomes a required check, add a no-op aggregator with always() so the
# name still emits when paths don't match.
on:
pull_request:
branches: [staging, main]
paths:
- manifest.json
- .gitea/workflows/publish-runtime.yml
- scripts/check-cascade-list-vs-manifest.sh
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
permissions:
contents: read
jobs:
# bp-exempt: drift visibility gate; CI / all-required remains the required aggregate.
check:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Check cascade list matches manifest
# Pass the .gitea/ workflow path explicitly — the script's
# default still points at .github/... which Category A of this
# sweep removes.
run: bash scripts/check-cascade-list-vs-manifest.sh manifest.json .gitea/workflows/publish-runtime.yml
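The structural gate described in the header comment comes down to comparing two lists of template names. A minimal Python sketch of that comparison follows. It assumes both lists have already been extracted from their files; the real check is scripts/check-cascade-list-vs-manifest.sh, whose parsing is not shown in this diff. The function name `list_drift` and the sample names are hypothetical, and the sketch treats the lists as sets, so ordering and duplicates are ignored.

```python
def list_drift(cascade_templates, manifest_templates):
    """Compare two template lists as sets.

    Returns (missing_from_cascade, extra_in_cascade), each sorted.
    Both empty means the lists agree.
    """
    cascade = set(cascade_templates)
    manifest = set(manifest_templates)
    missing_from_cascade = sorted(manifest - cascade)
    extra_in_cascade = sorted(cascade - manifest)
    return missing_from_cascade, extra_in_cascade


# Hypothetical sample data, not taken from manifest.json.
missing, extra = list_drift(
    ["template-a", "template-b"],
    ["template-a", "template-c"],
)
print("missing from cascade:", missing)
print("extra in cascade:", extra)
```

A gate built on this would fail the job whenever either returned list is non-empty.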
-225
@@ -1,225 +0,0 @@
name: MCP Stdio Transport Regression
# Regression test for molecule-ai-workspace-runtime#61:
# asyncio.connect_read_pipe / connect_write_pipe fail with
# ValueError: "Pipe transport is only for pipes, sockets and character devices"
# when stdout is a regular file (openclaw capture, CI tee, debugging).
#
# This workflow reproduces the exact failure mode and verifies the
# fallback to direct buffer I/O works. It runs on every PR that
# touches the MCP server or this workflow, plus nightly cron.
#
# Why a separate workflow (not folded into ci.yml python-lint):
# - The test needs to spawn the MCP server with stdout redirected
# to a regular file (not a TTY/pipe), which conflicts with
# pytest's own capture mechanism.
# - It exercises the actual process spawn path (python a2a_mcp_server.py)
# not just unit-test mocks — closer to the real openclaw integration.
# - A dedicated workflow surfaces stdio-specific regressions without
# coupling to the broader Python test suite's coverage gate.
on:
pull_request:
branches: [main, staging]
paths:
- 'workspace/a2a_mcp_server.py'
- 'workspace/mcp_cli.py'
- 'workspace/tests/test_a2a_mcp_server.py'
- '.gitea/workflows/ci-mcp-stdio-transport.yml'
push:
branches: [main, staging]
paths:
- 'workspace/a2a_mcp_server.py'
- 'workspace/mcp_cli.py'
- 'workspace/tests/test_a2a_mcp_server.py'
- '.gitea/workflows/ci-mcp-stdio-transport.yml'
schedule:
# Nightly at 04:00 UTC — catches drift from dependency updates
# (e.g. asyncio behavior changes in new Python patch releases).
- cron: '0 4 * * *'
concurrency:
group: mcp-stdio-${{ github.ref }}
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# bp-exempt: regression canary for runtime#61; not a merge gate — informational only until promoted to required.
# mc#774: continue-on-error mask — new workflow, flip to false once it's green on ≥3 consecutive main runs.
mcp-stdio-regular-file:
name: MCP stdio with regular-file stdout
runs-on: ubuntu-latest
continue-on-error: true # mc#774
timeout-minutes: 5
env:
WORKSPACE_ID: "00000000-0000-0000-0000-000000000001"
defaults:
run:
working-directory: workspace
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.11'
cache: pip
cache-dependency-path: workspace/requirements.txt
- run: pip install -r requirements.txt pytest pytest-asyncio pytest-cov
- name: Reproduce runtime#61 — stdout as regular file
run: |
set -euo pipefail
echo "=== Reproducing molecule-ai-workspace-runtime#61 ==="
echo ""
echo "Before the fix, this command would fail with:"
echo ' ValueError: Pipe transport is only for pipes, sockets and character devices'
echo ""
# Spawn the MCP server with stdout redirected to a regular file.
# This is exactly what openclaw does when capturing MCP output.
OUTPUT=$(mktemp)
trap 'rm -f "$OUTPUT"' EXIT
# Send initialize request, then tools/list, then exit
{
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}'
echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
} | python a2a_mcp_server.py > "$OUTPUT" 2>&1 || {
RC=$?
echo "FAIL: MCP server exited with code $RC"
echo "--- stdout+stderr ---"
cat "$OUTPUT"
exit 1
}
echo "PASS: MCP server handled regular-file stdout without crashing"
echo ""
echo "--- Output (first 20 lines) ---"
head -20 "$OUTPUT"
echo ""
# Verify we got valid JSON-RPC responses
if grep -q '"result"' "$OUTPUT"; then
echo "PASS: JSON-RPC responses found in output"
else
echo "FAIL: No JSON-RPC responses in output"
cat "$OUTPUT"
exit 1
fi
- name: Reproduce runtime#61 — stdin from regular file
run: |
set -euo pipefail
echo "=== stdin as regular file (CI tee / capture pattern) ==="
INPUT=$(mktemp)
OUTPUT=$(mktemp)
trap 'rm -f "$INPUT" "$OUTPUT"' EXIT
cat > "$INPUT" <<'EOF'
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
{"jsonrpc":"2.0","id":2,"method":"tools/list"}
EOF
python a2a_mcp_server.py < "$INPUT" > "$OUTPUT" 2>&1 || {
RC=$?
echo "FAIL: MCP server exited with code $RC"
cat "$OUTPUT"
exit 1
}
echo "PASS: MCP server handled regular-file stdin without crashing"
if grep -q '"result"' "$OUTPUT"; then
echo "PASS: JSON-RPC responses found in output"
else
echo "FAIL: No JSON-RPC responses in output"
cat "$OUTPUT"
exit 1
fi
- name: Verify warning is emitted for non-pipe stdio
run: |
set -euo pipefail
echo "=== Verify diagnostic warning ==="
OUTPUT=$(mktemp)
trap 'rm -f "$OUTPUT"' EXIT
{
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}'
} | python a2a_mcp_server.py > "$OUTPUT" 2>&1
# The warning should mention "not a pipe" for operator visibility
if grep -qi "not a pipe" "$OUTPUT"; then
echo "PASS: Diagnostic warning emitted for non-pipe stdio"
else
echo "NOTE: No warning in output (may be suppressed by log level)"
fi
- name: Reproduce openclaw failure — pipe held OPEN, no EOF
run: |
set -euo pipefail
echo "=== keep-stdin-open pipe (the real openclaw / Claude Code case) ==="
echo ""
echo "Before the readline() fix this HANGS: main() did"
echo " stdin.read(65536) -> on a pipe, blocks until 64KB OR EOF."
echo "An MCP client sends one ~150B initialize and keeps stdin"
echo "open waiting for the response, so the server never parsed"
echo "the request and the client timed out (openclaw: 'MCP error"
echo "-32000: Connection closed'). The earlier regular-file /"
echo "heredoc-pipe steps PASSED through this bug because a file"
echo "(or a closing heredoc) yields EOF immediately."
echo ""
# Drive the server through a real pipe that stays OPEN: write
# one initialize, do NOT close stdin, and require a response
# within a hard timeout. read(65536) -> no output -> timeout
# kills it -> FAIL. readline() -> immediate response -> PASS.
python - <<'PYEOF'
import json, os, subprocess, sys, time, select
proc = subprocess.Popen(
    [sys.executable, "a2a_mcp_server.py"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env={**os.environ},
)
req = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {"protocolVersion": "2024-11-05",
               "capabilities": {},
               "clientInfo": {"name": "keepopen", "version": "1"}},
}) + "\n"
proc.stdin.write(req.encode())
proc.stdin.flush()
# Deliberately DO NOT close proc.stdin — mirror a live MCP client.
deadline = time.time() + 15
line = b""
while time.time() < deadline:
    r, _, _ = select.select([proc.stdout], [], [], 1)
    if r:
        line = proc.stdout.readline()
        if line:
            break
proc.kill()
if not line:
    print("FAIL: no response within 15s on an open pipe — "
          "stdin.read(65536) regression is back")
    sys.exit(1)
resp = json.loads(line.decode())
assert resp.get("id") == 1 and "result" in resp, \
    f"unexpected response: {line[:200]!r}"
assert resp["result"]["serverInfo"]["name"] == "molecule", \
    f"wrong serverInfo: {line[:200]!r}"
print("PASS: server answered initialize on a still-open pipe")
PYEOF
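The difference between `read(65536)` and `readline()` on a pipe that stays open can be shown in isolation. The sketch below is not the server's code: `read_one_request` and `demo` are hypothetical names, and the pipe is created locally with `os.pipe()` to stand in for a client that writes one request and keeps its end open.

```python
import os


def read_one_request(stream):
    """Return one newline-terminated request without waiting for EOF."""
    return stream.readline()


def demo():
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "rb")
    writer = os.fdopen(write_fd, "wb")
    writer.write(b'{"jsonrpc":"2.0","id":1,"method":"initialize"}\n')
    writer.flush()
    # The write end is still open at this point, like a live MCP client.
    # readline() returns as soon as the newline arrives.
    line = read_one_request(reader)
    writer.close()
    reader.close()
    return line


print(demo())
```

Calling `reader.read(65536)` at the same point would block, because a buffered read keeps waiting until it has the requested byte count or sees EOF, and neither happens while the writer stays open.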
- name: Run unit tests for stdio transport
run: |
set -euo pipefail
echo "=== Running stdio transport unit tests ==="
python -m pytest tests/test_a2a_mcp_server.py::TestStdioPipeAssertion tests/test_a2a_mcp_server.py::TestStdioKeepOpenPipe -v --no-cov
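The ValueError quoted in the header comment names the three file types the asyncio pipe transport accepts: pipes, sockets and character devices. A minimal sketch of a pre-check for that condition follows. The fallback in a2a_mcp_server.py is not shown in this diff, so this only illustrates how such a check could look; `supports_pipe_transport` is a hypothetical name.

```python
import os
import stat
import tempfile


def supports_pipe_transport(fd):
    """True if fd is a pipe, socket or character device."""
    mode = os.fstat(fd).st_mode
    return bool(
        stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode) or stat.S_ISCHR(mode)
    )


# A pipe passes the check; a regular file (the runtime#61 case) does not.
pipe_read_fd, pipe_write_fd = os.pipe()
with tempfile.TemporaryFile() as regular_file:
    pipe_ok = supports_pipe_transport(pipe_read_fd)
    file_ok = supports_pipe_transport(regular_file.fileno())
os.close(pipe_read_fd)
os.close(pipe_write_fd)
print("pipe:", pipe_ok, "regular file:", file_ok)
```

A server could run this check on its stdin and stdout at startup and switch to plain buffer reads and writes when it returns False.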
+15 -70
@@ -456,84 +456,29 @@ jobs:
cat /tmp/deploy-reminder.md >> "$GITHUB_STEP_SUMMARY"
# Python Lint & Test — required check, always runs.
# Runtime Python moved to molecule-ai-workspace-runtime. Keep this context as
# a guard so branch protection still catches attempts to reintroduce an
# editable runtime copy under molecule-core/workspace/.
python-lint:
name: Python Lint & Test
runs-on: ubuntu-latest
# Phase 4 (RFC #219 §1): confirmed green on main 2026-05-12.
continue-on-error: false
env:
WORKSPACE_ID: test
defaults:
run:
working-directory: workspace
steps:
- if: false
working-directory: .
run: echo "No workspace/** changes — skipping real lint+test; this job always runs to satisfy the required-check name on branch protection."
- if: always()
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- if: always()
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.11'
cache: pip
cache-dependency-path: workspace/requirements.txt
- if: always()
run: pip install -r requirements.txt pytest pytest-asyncio pytest-cov "sqlalchemy>=2.0.0"
# Coverage flags + fail-under floor moved into workspace/pytest.ini
# (issue #1817) so local `pytest` and CI use identical config.
- if: always()
run: python -m pytest --tb=short
- if: always()
name: Per-file critical-path coverage (MCP / inbox / auth)
# MCP-critical Python files have a per-file floor on top of the
# 86% total floor in pytest.ini. See issue #2790 for full rationale.
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Runtime SSOT guard
run: |
set -e
PER_FILE_FLOOR=75
CRITICAL_FILES=(
"a2a_mcp_server.py"
"mcp_cli.py"
"a2a_tools.py"
"a2a_tools_inbox.py"
"inbox.py"
"platform_auth.py"
)
# pytest already wrote .coverage; emit a JSON view scoped to
# the critical files so jq/python can read the per-file pct
# without parsing tabular text.
INCLUDES=$(printf '*%s,' "${CRITICAL_FILES[@]}")
INCLUDES="${INCLUDES%,}"
python -m coverage json -o /tmp/critical-cov.json --include="$INCLUDES"
FAILED=0
for f in "${CRITICAL_FILES[@]}"; do
pct=$(jq -r --arg f "$f" '.files | to_entries | map(select(.key == $f)) | .[0].value.summary.percent_covered // "MISSING"' /tmp/critical-cov.json)
if [ "$pct" = "MISSING" ]; then
echo "::error file=workspace/$f::No coverage data — file may have moved or test exclusion mis-set."
FAILED=$((FAILED+1))
continue
fi
echo "$f: ${pct}%"
if awk "BEGIN{exit !($pct < $PER_FILE_FLOOR)}"; then
echo "::error file=workspace/$f::${pct}% < ${PER_FILE_FLOOR}% per-file floor (MCP critical path). See COVERAGE_FLOOR.md."
FAILED=$((FAILED+1))
fi
done
if [ "$FAILED" -gt 0 ]; then
echo ""
echo "$FAILED MCP critical-path file(s) below the ${PER_FILE_FLOOR}% per-file floor."
echo "These paths handle multi-tenant routing, auth tokens, and inbox dispatch."
echo "A coverage drop here is the same risk shape as Go-side tokens/secrets files"
echo "dropping below 10% (see COVERAGE_FLOOR.md). Either:"
echo " (a) add tests to raise coverage back above ${PER_FILE_FLOOR}%, or"
echo " (b) if this is unavoidable historical debt, file an issue and propose"
echo " adjusting the floor with rationale in COVERAGE_FLOOR.md."
set -eu
if [ -d workspace ]; then
echo "::error file=workspace::Runtime source must live in molecule-ai-workspace-runtime, not molecule-core/workspace."
exit 1
fi
for f in scripts/build_runtime_package.py scripts/test_build_runtime_package.py; do
if [ -e "$f" ]; then
echo "::error file=$f::Legacy build-from-workspace packaging script must not be restored."
exit 1
fi
done
echo "Runtime SSOT guard passed; core consumes the standalone runtime package."
all-required:
# Aggregator sentinel — RFC internal#219 §2 (Phase 4 — closes internal#286).
-4
@@ -86,8 +86,6 @@ on:
- 'workspace-server/internal/middleware/**'
- 'workspace-server/internal/handlers/registry.go'
- 'workspace-server/internal/handlers/workspace.go'
- 'workspace/a2a_mcp_server.py'
- 'workspace/platform_tools/registry.py'
- 'tests/e2e/test_peer_visibility_mcp_staging.sh'
- 'tests/e2e/test_peer_visibility_mcp_local.sh'
- 'tests/e2e/lib/peer_visibility_assert.sh'
@@ -100,8 +98,6 @@ on:
- 'workspace-server/internal/middleware/**'
- 'workspace-server/internal/handlers/registry.go'
- 'workspace-server/internal/handlers/workspace.go'
- 'workspace/a2a_mcp_server.py'
- 'workspace/platform_tools/registry.py'
- 'tests/e2e/test_peer_visibility_mcp_staging.sh'
- 'tests/e2e/test_peer_visibility_mcp_local.sh'
- 'tests/e2e/lib/peer_visibility_assert.sh'
@@ -1,177 +0,0 @@
name: publish-runtime-autobump
# Auto-bump-on-workspace-edit half of the publish pipeline.
#
# Why this file exists (issue #351):
# Gitea Actions does not correctly disambiguate `paths:` from `tags:`
# when both are bundled under a single `on.push` key. The result is
# that tag pushes get filtered out and `publish-runtime.yml` never
# fires — `action_run` rows: 0. This was unnoticed pre-2026-05-11
# because PYPI_TOKEN was absent (publishes would have failed anyway).
#
# Split design:
# - publish-runtime.yml : on.push.tags only (the publisher)
# - publish-runtime-autobump.yml: on.push.branches+paths (this file — the version-bumper)
#
# This file computes the next version from PyPI's latest, pushes a
# `runtime-v$VERSION` tag, and exits. The tag push then triggers
# publish-runtime.yml via its tags-only trigger.
#
# Concurrency: shares the `publish-runtime` group with publish-runtime.yml
# so concurrent workspace pushes serialize at the bump step. Without
# this, two pushes minutes apart could both read PyPI latest=0.1.129
# and try to tag 0.1.130 simultaneously, only one of which would land.
on:
# Run on PR pushes to post a success status so Gitea can merge the PR.
# All steps use continue-on-error: true so operational failures
# (PyPI unreachable, DISPATCH_TOKEN missing) do not block merge.
pull_request:
paths:
- "workspace/**"
# mc#1578 / a05add29 cure: build_runtime_package.py owns PYPROJECT_TEMPLATE
# (deps, classifiers, project metadata). A change there is publish-affecting
# even when workspace/** is untouched, so the autobump must fire to claim
# the next runtime-v$VERSION tag. Without this, manual tagging races PyPI
# (e.g. runtime-v0.1.18 collided with the 2026-04-27 PyPI 0.1.18 publish,
# blocking the python-multipart pin from reaching prod).
- "scripts/build_runtime_package.py"
- "scripts/test_build_runtime_package.py"
# Bump-and-tag on main/staging push (the actual operational trigger).
push:
branches:
- main
- staging
paths:
- "workspace/**"
- "scripts/build_runtime_package.py"
- "scripts/test_build_runtime_package.py"
# Manual dispatch — useful when Gitea Actions API (/actions/*) is
# unreachable (e.g. act_runner 404 on Gitea 1.22.6) and we cannot
# re-trigger via curl.
workflow_dispatch:
permissions:
contents: write # required to push tags back
concurrency:
group: publish-runtime
cancel-in-progress: false
jobs:
# PR-validation path: always succeeds so Gitea can merge workflow-only PRs.
# Operational failures (PyPI unreachable, missing DISPATCH_TOKEN) are
# surfaced via continue-on-error: true rather than blocking the merge.
# The actual bump work happens on the main/staging push after merge.
# bp-exempt: advisory validation for runtime publication; not a branch-protection gate.
pr-validate:
runs-on: ubuntu-latest
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true # do not block PR merge on operational failures
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
- name: Validate PyPI connectivity (best-effort)
run: |
set -eu
echo "=== Checking PyPI accessibility ==="
LATEST=$(curl -fsS --retry 3 --max-time 10 \
https://pypi.org/pypi/molecule-ai-workspace-runtime/json \
| python -c "import sys,json; print(json.load(sys.stdin)['info']['version'])" \
|| echo "PyPI unreachable (non-blocking for PR validation)")
echo "Latest: ${LATEST:-unknown}"
# Actual bump-and-tag: runs on main/staging pushes, posts real success/failure.
# No continue-on-error — operational failures here trip the main-red
# watchdog, which is the desired signal for infrastructure degradation.
# bp-exempt: post-merge tag publication side effect; CI / all-required gates source changes.
bump-and-tag:
runs-on: ubuntu-latest
# Only fire on push events (main/staging after PR merge). Pull_request
# events are handled by pr-validate above; we do NOT bump on every
# push-synchronize because that would race with the PR head.
#
# NOTE: the prior condition `github.event.pull_request.base.ref == ''`
# was broken — on a PR-merge push in Gitea Actions, the pull_request
# context is still attached (base.ref='main'), so the condition always
# evaluated to false and bump-and-tag was permanently skipped.
if: github.event_name == 'push'
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1
- name: Fetch tags for collision check
run: git fetch origin --tags --depth=1
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
- name: Compute next version from PyPI latest and existing tags
id: bump
run: |
set -eu
LATEST=$(curl -fsS --retry 3 https://pypi.org/pypi/molecule-ai-workspace-runtime/json \
| python -c "import sys,json; print(json.load(sys.stdin)['info']['version'])")
MAJOR=$(echo "$LATEST" | cut -d. -f1)
MINOR=$(echo "$LATEST" | cut -d. -f2)
TAG_LATEST=$(git tag --list "runtime-v${MAJOR}.${MINOR}.*" \
| sed -E 's/^runtime-v//' \
| grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' \
| sort -V \
| tail -1 || true)
VERSION=$(PYPI_LATEST="$LATEST" TAG_LATEST="$TAG_LATEST" python - <<'PY'
import os
def parse(v):
    return tuple(int(part) for part in v.split("."))
pypi = os.environ["PYPI_LATEST"]
tag = os.environ.get("TAG_LATEST") or pypi
base = max(parse(pypi), parse(tag))
print(f"{base[0]}.{base[1]}.{base[2] + 1}")
PY
)
echo "PyPI latest=$LATEST, latest runtime tag=${TAG_LATEST:-none} -> next=$VERSION"
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+$'; then
echo "::error::computed version $VERSION does not match PEP 440 X.Y.Z"
exit 1
fi
if git tag --list | grep -qx "runtime-v$VERSION"; then
echo "::error::tag runtime-v$VERSION already exists in this repo. Manual intervention required (PyPI and Gitea tag history are out of sync)."
exit 1
fi
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
- name: Push runtime-v$VERSION tag
env:
DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
VERSION: ${{ steps.bump.outputs.version }}
GITEA_URL: https://git.moleculesai.app
run: |
set -eu
if [ -z "$DISPATCH_TOKEN" ]; then
echo "::error::DISPATCH_TOKEN secret is not set — needed to push the tag back to molecule-core."
exit 1
fi
git config user.name "publish-runtime autobump"
git config user.email "publish-runtime@moleculesai.app"
git tag -a "runtime-v$VERSION" \
-m "Auto-bump on workspace/** edit on $GITHUB_REF" \
-m "Triggered by: $GITHUB_REF @ $GITHUB_SHA" \
-m "publish-runtime.yml will pick up this tag and upload to PyPI"
# Push via DISPATCH_TOKEN (a Gitea PAT). Using the bot identity
# ensures the resulting tag-push event is dispatched to
# publish-runtime.yml; act_runner's default GITHUB_TOKEN cannot
# trigger downstream workflows.
git remote set-url origin "https://x-access-token:${DISPATCH_TOKEN}@${GITEA_URL#https://}/molecule-ai/molecule-core.git"
git push origin "runtime-v$VERSION"
echo "✓ pushed runtime-v$VERSION — publish-runtime.yml should fire next"
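The version computation in the bump step can be restated as a standalone function, which makes the collision rule easier to test. This sketch mirrors the inline heredoc above: take the higher of PyPI's latest version and the latest matching tag, then add one to the patch number. The name `next_patch_version` is hypothetical, and the sketch assumes plain X.Y.Z inputs.

```python
def parse(version):
    """Split 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def next_patch_version(pypi_latest, tag_latest=None):
    """Return the next patch version after max(pypi_latest, tag_latest)."""
    base = max(parse(pypi_latest), parse(tag_latest or pypi_latest))
    return f"{base[0]}.{base[1]}.{base[2] + 1}"


print(next_patch_version("0.1.129"))
print(next_patch_version("0.1.129", "0.1.131"))
```

Comparing integer tuples rather than strings matters here: as strings, "0.1.9" sorts above "0.1.10".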
-437
@@ -1,437 +0,0 @@
name: publish-runtime
# Gitea Actions port of .github/workflows/publish-runtime.yml.
#
# Ported 2026-05-10 (issue #206). Key differences from the GitHub version:
# - Gitea Actions reads .gitea/workflows/, not .github/workflows/
# - Dropped `environment: pypi-publish` — Gitea Actions does not support
# named environments or OIDC trusted publishers
# - Replaced `pypa/gh-action-pypi-publish@release/v1` (OIDC) with
# `twine upload` using PYPI_TOKEN secret — same mechanism as a local
# `python -m twine upload` with a PyPI token
# - Replaced `github.ref_name` (GitHub-only) with `${GITHUB_REF#refs/tags/}`
# — Gitea Actions exposes github.ref (the full ref) but not ref_name
# - Dropped `merge_group` trigger (Gitea has no merge queue)
#
# 2026-05-10 (issue #348): originally restored `staging`/`main` branch +
# `workspace/**` path-filter trigger in PR #349.
#
# 2026-05-11 (issue #351): REVERTED the branches+paths trigger from THIS
# file. Bundling `paths` with `tags` under a single `on.push` key caused
# Gitea Actions to never dispatch the workflow for tag-push events (0
# runs in `action_run` for workflow_id='publish-runtime.yml' since the
# port, including the runtime-v1.0.0 tag — which is why PyPI is still at
# 0.1.129 despite a v1.0.0 Gitea tag existing).
#
# The auto-bump-on-workspace-edit trigger now lives in
# `.gitea/workflows/publish-runtime-autobump.yml`. That file computes the
# next version from PyPI's latest and pushes a `runtime-v$VERSION` tag,
# which THIS file then picks up via the tags-only trigger below.
#
# This decoupling means Gitea's path-vs-tag evaluator never has to
# disambiguate — each file has a single unambiguous trigger shape.
#
# PyPI publishing: requires PYPI_TOKEN repository secret (or org-level secret).
# Set via: repo Settings → Actions → Variables and Secrets → New Secret.
# The token should be a PyPI API token scoped to molecule-ai-workspace-runtime.
#
# The DISPATCH_TOKEN cascade (git push to template repos) is unchanged —
# it uses the Gitea API directly and was already Gitea-compatible.
on:
push:
tags:
- "runtime-v*"
workflow_dispatch:
# 2026-05-11 (root cause of #351 / 0 runs ever):
# Gitea 1.22.6's workflow parser rejects `workflow_dispatch.inputs.version`
# with "unknown on type" — it mis-treats the inputs sub-keys as top-level
# `on:` event types. Log line:
# actions/workflows.go:DetectWorkflows() [W] ignore invalid workflow
# "publish-runtime.yml": unknown on type: map["version": {...}]
# That `[W] ignore invalid workflow` is silent UX — the workflow never
# registers, so it never fires for ANY event (push.tags included).
# Removing the inputs block restores parsing. Manual dispatch from the
# Gitea UI now triggers the PyPI auto-bump fallback in `Derive version`
# below (no `inputs.version` to read).
permissions:
contents: read
# Serialize publishes so two concurrent tag pushes don't both compute
# "latest+1" and race on PyPI upload. The second one waits.
concurrency:
group: publish-runtime
cancel-in-progress: false
jobs:
publish:
# Dedicated publish/release lane (internal#462 / #394 / #399). Ship
# path (on: push tag runtime-v*) — reserved capacity, never FIFO
# behind PR-CI. `publish` resolves only to molecule-runner-publish-*.
runs-on: publish
outputs:
version: ${{ steps.version.outputs.version }}
wheel_sha256: ${{ steps.wheel_hash.outputs.wheel_sha256 }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
cache: pip
- name: Derive version (tag or PyPI auto-bump)
id: version
run: |
if echo "$GITHUB_REF" | grep -q "^refs/tags/runtime-v"; then
# Tag is `runtime-vX.Y.Z` — strip the prefix.
VERSION="${GITHUB_REF#refs/tags/runtime-v}"
else
# workflow_dispatch path (no inputs supported on Gitea 1.22.6) or
# any other non-tag trigger: derive from PyPI latest + patch bump.
LATEST=$(curl -fsS --retry 3 https://pypi.org/pypi/molecule-ai-workspace-runtime/json \
| python -c "import sys,json; print(json.load(sys.stdin)['info']['version'])")
MAJOR=$(echo "$LATEST" | cut -d. -f1)
MINOR=$(echo "$LATEST" | cut -d. -f2)
PATCH=$(echo "$LATEST" | cut -d. -f3)
VERSION="${MAJOR}.${MINOR}.$((PATCH+1))"
echo "Auto-bumped from PyPI latest $LATEST -> $VERSION"
fi
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+(\.dev[0-9]+|rc[0-9]+|a[0-9]+|b[0-9]+|\.post[0-9]+)?$'; then
echo "::error::version $VERSION does not match PEP 440"
exit 1
fi
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
echo "Publishing molecule-ai-workspace-runtime $VERSION"
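The grep pattern in the step above accepts a subset of PEP 440: a three-part release number with at most one dev, rc, alpha, beta or post suffix. The same pattern in Python is sketched below; `is_publishable_version` is a hypothetical name. `re.fullmatch` replaces the `^...$` anchors so that a trailing newline is not accepted.

```python
import re

# Same alternatives as the workflow's grep -E pattern, without the anchors.
VERSION_RE = re.compile(
    r"[0-9]+\.[0-9]+\.[0-9]+"
    r"(\.dev[0-9]+|rc[0-9]+|a[0-9]+|b[0-9]+|\.post[0-9]+)?"
)


def is_publishable_version(version):
    """True if version matches the X.Y.Z[suffix] shape the workflow allows."""
    return VERSION_RE.fullmatch(version) is not None


for candidate in ["0.1.130", "1.0.0rc1", "1.0.0.dev3", "v1.0.0", "1.0"]:
    print(candidate, is_publishable_version(candidate))
```

The pattern is stricter than PEP 440 itself: "1.0" is a valid PEP 440 version but is rejected here because the pattern requires three release components.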
- name: Install build tooling
run: pip install build twine
- name: Build package from workspace/
run: |
python scripts/build_runtime_package.py \
--version "${{ steps.version.outputs.version }}" \
--out "${{ runner.temp }}/runtime-build"
- name: Build wheel + sdist
working-directory: ${{ runner.temp }}/runtime-build
run: python -m build
- name: Capture wheel SHA256 for cascade content-verification
id: wheel_hash
working-directory: ${{ runner.temp }}/runtime-build
run: |
set -eu
WHEEL=$(ls dist/*.whl 2>/dev/null | head -1)
if [ -z "$WHEEL" ]; then
echo "::error::No .whl in dist/ — \`python -m build\` must have failed silently"
exit 1
fi
HASH=$(sha256sum "$WHEEL" | awk '{print $1}')
echo "wheel_sha256=${HASH}" >> "$GITHUB_OUTPUT"
echo "Local wheel SHA256 (pre-upload): ${HASH}"
echo "Wheel filename: $(basename "$WHEEL")"
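The hash captured in this step is an ordinary SHA-256 over the wheel file's bytes, so a consumer can recompute it after download and compare. A minimal sketch using only the standard library follows; `file_sha256` is a hypothetical helper, not part of the repository.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result has the same form as the first field of `sha256sum` output, so it can be compared directly against the `wheel_sha256` job output.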
- name: Verify package contents (sanity)
working-directory: ${{ runner.temp }}/runtime-build
run: |
python -m twine check dist/*
python -m venv /tmp/smoke
/tmp/smoke/bin/pip install --quiet dist/*.whl
/tmp/smoke/bin/python "$GITHUB_WORKSPACE/scripts/wheel_smoke.py"
# ─────────────────────────────────────────────────────────────────────
# RFC#596 (2026-05-19): Gitea PyPI registry as PRIMARY, PyPI as
# best-effort fallback. Eliminates the SPOF that caused the
# 2026-05-19 P0 (PyPI abuse-block #593 + Railway outage #595).
#
# Order is inverted intentionally:
# 1. Gitea FIRST — must succeed (our internal SSOT).
# 2. PyPI SECOND — best-effort, non-fatal on failure (courtesy
# mirror; our consumers don't depend on it after Phase 4
# template Dockerfile updates).
#
# Endpoint shape (verified live in RFC#596 Phase 5):
# POST https://git.moleculesai.app/api/packages/molecule-ai/pypi/
# HTTP Basic auth: username = gitea username, password = PAT with
# `write:package` scope. Returns 201 Created on success.
# ─────────────────────────────────────────────────────────────────────
- name: Publish to Gitea PyPI registry (PRIMARY)
id: gitea_publish
working-directory: ${{ runner.temp }}/runtime-build
env:
# MOLECULE_PYPI_GITEA_PUBLISHER_USER: Gitea username for the publisher
# persona (must own a token with `write:package` scope).
# Provisioned in RFC#596 Phase 3 (operator-config PR).
# NOTE: secret name MUST NOT start with `GITEA_` or `GITHUB_` —
# Gitea 1.22.6 reserves those prefixes for built-in env vars and
# rejects repo-secret PUT with HTTP 400 / "invalid secret name".
# Empirically reproduced 2026-05-19 against
# `/repos/molecule-ai/molecule-core/actions/secrets/GITEA_*`.
MOLECULE_PYPI_GITEA_PUBLISHER_USER: ${{ secrets.MOLECULE_PYPI_GITEA_PUBLISHER_USER }}
# MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN: PAT for the publisher persona,
# `write:package` scope on molecule-ai org.
# Synced from Infisical /ci/gitea-pypi-publisher (RFC#596 Phase 3).
MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN: ${{ secrets.MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN }}
run: |
set -eu
if [ -z "${MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN:-}" ] || [ -z "${MOLECULE_PYPI_GITEA_PUBLISHER_USER:-}" ]; then
echo "::error::MOLECULE_PYPI_GITEA_PUBLISHER_USER / MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN secrets are not set."
echo "::error::Provision them via the RFC#596 Phase 3 operator-config sync script."
echo "::error::Gitea is the PRIMARY index per RFC#596 — publish job aborts here, NOT after PyPI."
exit 1
fi
python -m twine upload \
--verbose \
--repository-url "https://git.moleculesai.app/api/packages/molecule-ai/pypi/" \
--username "$MOLECULE_PYPI_GITEA_PUBLISHER_USER" \
--password "$MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN" \
dist/*
echo "gitea_status=success" >> "$GITHUB_OUTPUT"
echo "gitea_url=https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/molecule-ai-workspace-runtime" >> "$GITHUB_OUTPUT"
- name: Publish to PyPI (FALLBACK, best-effort)
id: pypi_publish
# working-directory matches the preceding Build/Verify steps. Without
# this, twine runs from the default workspace checkout dir where
# `dist/` doesn't exist and fails with:
# ERROR InvalidDistribution: Cannot find file (or expand pattern): 'dist/*'
# Caught on the first-ever successful dispatch of this workflow
# (run 5097, 2026-05-11 02:08Z) — every other step in the publish
# job already had this working-directory; Publish was missing it.
#
# RFC#596: this step is `continue-on-error: true` because PyPI is
# NO LONGER the primary index. PyPI 403/timeout/abuse-block does
# NOT block the publish — Gitea already has the wheel.
continue-on-error: true
working-directory: ${{ runner.temp }}/runtime-build
env:
# PYPI_TOKEN: repository secret scoped to molecule-ai-workspace-runtime.
# Set via: Settings → Actions → Variables and Secrets → New Secret.
# Format: pypi-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
run: |
if [ -z "$PYPI_TOKEN" ]; then
echo "::warning::PYPI_TOKEN secret is not set — skipping PyPI mirror publish (non-fatal per RFC#596)."
echo "pypi_status=skipped_no_token" >> "$GITHUB_OUTPUT"
exit 0
fi
if python -m twine upload \
--verbose \
--repository pypi \
--username __token__ \
--password "$PYPI_TOKEN" \
dist/*; then
echo "pypi_status=success" >> "$GITHUB_OUTPUT"
else
rc=$?
echo "::warning::PyPI mirror publish failed (exit $rc). Non-fatal per RFC#596 — Gitea has the wheel."
echo "pypi_status=failed_exit_$rc" >> "$GITHUB_OUTPUT"
fi
echo "pypi_url=https://pypi.org/project/molecule-ai-workspace-runtime/${{ steps.version.outputs.version }}/" >> "$GITHUB_OUTPUT"
- name: Publish job summary (Gitea + PyPI status)
if: always()
run: |
{
echo "## publish-runtime $(date -u +%FT%TZ)"
echo
echo "**Version:** \`${{ steps.version.outputs.version }}\`"
echo "**Wheel SHA256:** \`${{ steps.wheel_hash.outputs.wheel_sha256 }}\`"
echo
echo "### Indexes"
echo
echo "| Index | Status | URL |"
echo "|---------|-------------------------------------------------|-----|"
echo "| Gitea (PRIMARY) | ${{ steps.gitea_publish.outputs.gitea_status || 'failed' }} | ${{ steps.gitea_publish.outputs.gitea_url || '—' }} |"
echo "| PyPI (fallback) | ${{ steps.pypi_publish.outputs.pypi_status || 'failed' }} | ${{ steps.pypi_publish.outputs.pypi_url || '—' }} |"
echo
echo "Per RFC#596: Gitea is the contract. PyPI is best-effort."
} >> "$GITHUB_STEP_SUMMARY"
cascade:
needs: publish
# Publish/release lane (internal#462) — downstream of the runtime
# publish ship job; keep it on the reserved lane too.
runs-on: publish
steps:
- name: Wait for PyPI to propagate the new version
env:
RUNTIME_VERSION: ${{ needs.publish.outputs.version }}
EXPECTED_SHA256: ${{ needs.publish.outputs.wheel_sha256 }}
run: |
set -eu
if [ -z "$EXPECTED_SHA256" ]; then
echo "::error::publish job did not expose wheel_sha256 — cannot verify wheel content. Refusing to fan out cascade."
exit 1
fi
# NOTE (RFC#596 follow-up): this propagation probe still resolves
# against PyPI's default index. After RFC#596 Phase 4 lands and
# consumers pull from Gitea first, this probe should be rewritten
# to verify the Gitea simple/ endpoint serves the new wheel
# (PyPI may be best-effort-failed and the cascade should still
# fan out, since templates will pull from Gitea). Tracked in #596.
python -m venv /tmp/propagation-probe
PROBE=/tmp/propagation-probe/bin
$PROBE/pip install --upgrade --quiet pip
for i in $(seq 1 30); do
if $PROBE/pip install \
--quiet \
--no-cache-dir \
--force-reinstall \
--no-deps \
"molecule-ai-workspace-runtime==${RUNTIME_VERSION}" \
>/dev/null 2>&1; then
INSTALLED=$($PROBE/pip show molecule-ai-workspace-runtime 2>/dev/null \
| awk -F': ' '/^Version:/{print $2}')
if [ "$INSTALLED" = "$RUNTIME_VERSION" ]; then
echo "✓ PyPI resolved $RUNTIME_VERSION (install check)"
break
fi
fi
if [ $i -eq 30 ]; then
echo "::error::pip install --no-cache-dir molecule-ai-workspace-runtime==${RUNTIME_VERSION} never resolved within ~5 min."
echo "::error::Refusing to fan out cascade against a potentially stale PyPI index."
exit 1
fi
echo " [$i/30] waiting for PyPI to propagate ${RUNTIME_VERSION}..."
sleep 4
done
# Stage (b): download wheel + SHA256 compare against what we built.
# Catches Fastly stale-content serving old bytes under a new version URL.
#
# Caught run 5196 (first-ever successful publish, 2026-05-11): the
# previous one-liner `HASH=$(pip download ... && sha256sum ...)`
# captured pip's stdout (`Collecting molecule-ai-workspace-runtime
# ==X.Y.Z`) into HASH, then the SHA comparison failed against the
# leaked `Collecting...` string. `2>/dev/null` silences stderr but
# NOT stdout; pip writes its progress to stdout by default.
# Fix: split into two steps, silence pip's stdout explicitly, capture
# only sha256sum's output into HASH.
python -m pip download \
--no-deps \
--no-cache-dir \
--dest /tmp/wheel-probe \
--quiet \
"molecule-ai-workspace-runtime==${RUNTIME_VERSION}" \
>/dev/null 2>&1
HASH=$(sha256sum /tmp/wheel-probe/*.whl | awk '{print $1}')
if [ "$HASH" != "$EXPECTED_SHA256" ]; then
echo "::error::PyPI propagated $RUNTIME_VERSION but wheel content SHA256 mismatch."
echo "::error::Expected: $EXPECTED_SHA256"
echo "::error::Got: $HASH"
echo "::error::Fastly may be serving stale content. Refusing to fan out cascade."
exit 1
fi
echo "✓ PyPI CDN verified (SHA256 match)"
- name: Fan out via push to .runtime-version
env:
# Gitea PAT with write:repository scope on the cascade-active
# template repos (the TEMPLATES list below). Used for git push to each
# template repo's main branch, which trips their
# `on: push: branches: [main]` trigger on publish-image.yml.
DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
RUNTIME_VERSION: ${{ needs.publish.outputs.version }}
run: |
set +e # don't abort on a single repo failure — collect them all
if [ -z "$DISPATCH_TOKEN" ]; then
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::DISPATCH_TOKEN secret not set — skipping cascade."
echo "::warning::set it at Settings → Actions → Variables and Secrets → New Secret."
exit 0
fi
echo "::error::DISPATCH_TOKEN secret missing — cascade cannot fan out."
echo "::error::The runtime was published, but the template repos in TEMPLATES will NOT pick up the new version."
exit 1
fi
VERSION="$RUNTIME_VERSION"
if [ -z "$VERSION" ]; then
echo "::error::publish job did not expose a version output"
exit 1
fi
GITEA_URL="${GITEA_URL:-https://git.moleculesai.app}"
# Keep in lockstep with manifest.json workspace_templates (suffix-stripped).
# Guarded by scripts/check-cascade-list-vs-manifest.sh (cascade-list-drift-gate).
# 2026-05-19: pruned crewai/deepagents/gemini-cli — not in manifest.
TEMPLATES="claude-code hermes openclaw codex langgraph autogen"
FAILED=""
SKIPPED=""
git config --global user.name "publish-runtime cascade"
git config --global user.email "publish-runtime@moleculesai.app"
WORKDIR="$(mktemp -d)"
for tpl in $TEMPLATES; do
REPO="molecule-ai/molecule-ai-workspace-template-$tpl"
CLONE="$WORKDIR/$tpl"
HTTP=$(curl -sS -o /dev/null -w "%{http_code}" \
-H "Authorization: token $DISPATCH_TOKEN" \
"$GITEA_URL/api/v1/repos/$REPO/contents/.github/workflows/publish-image.yml")
if [ "$HTTP" = "404" ]; then
echo "↷ $tpl has no publish-image.yml — soft-skip"
SKIPPED="$SKIPPED $tpl"
continue
fi
attempt=0
success=false
while [ $attempt -lt 3 ]; do
attempt=$((attempt + 1))
rm -rf "$CLONE"
if ! git clone --depth=1 \
"https://x-access-token:${DISPATCH_TOKEN}@${GITEA_URL#https://}/$REPO.git" \
"$CLONE" >/tmp/clone.log 2>&1; then
echo "::warning::clone $tpl attempt $attempt failed: $(tail -n3 /tmp/clone.log)"
sleep 2
continue
fi
cd "$CLONE"
echo "$VERSION" > .runtime-version
if git diff --quiet -- .runtime-version; then
echo "✓ $tpl already at $VERSION — no commit needed"
success=true
cd - >/dev/null
break
fi
git add .runtime-version
git commit -m "chore: pin runtime to $VERSION (publish-runtime cascade)" \
-m "Co-Authored-By: publish-runtime cascade <publish-runtime@moleculesai.app>" \
>/dev/null
if git push origin HEAD:main >/tmp/push.log 2>&1; then
echo "✓ $tpl pushed $VERSION on attempt $attempt"
success=true
cd - >/dev/null
break
fi
echo "::warning::push $tpl attempt $attempt failed, pull-rebasing"
git pull --rebase origin main >/tmp/rebase.log 2>&1 || true
cd - >/dev/null
done
if [ "$success" != "true" ]; then
FAILED="$FAILED $tpl"
fi
done
rm -rf "$WORKDIR"
if [ -n "$FAILED" ]; then
echo "::error::Cascade incomplete after 3 retries each. Failed:$FAILED"
exit 1
fi
if [ -n "$SKIPPED" ]; then
echo "Cascade complete: pinned $VERSION. Soft-skipped (no publish-image.yml):$SKIPPED"
else
echo "Cascade complete: $VERSION pinned across all manifest workspace_templates."
fi
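Stage (b) of the propagation probe above compares the SHA256 of the downloaded wheel against the hash recorded at build time. The following is a minimal Python sketch of that comparison, not code from this workflow. The helper names `sha256_of` and `wheel_matches` are hypothetical and exist only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wheel_matches(path: Path, expected_sha256: str) -> bool:
    """Hypothetical helper: True when the file hashes to the expected digest.

    The expected value is normalized (whitespace stripped, lowercased) so a
    trailing newline from a workflow output does not cause a false mismatch.
    """
    return sha256_of(path) == expected_sha256.strip().lower()
```

Comparing only the digest string, as the shell step does with `sha256sum | awk`, keeps tool progress output out of the comparison.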
-101
@@ -1,101 +0,0 @@
name: Runtime Pin Compatibility
# Ported from .github/workflows/runtime-pin-compat.yml on 2026-05-11 per
# RFC internal#219 §1 sweep.
#
# Differences from the GitHub version:
# - Dropped `merge_group:` (no Gitea merge queue) and
# `workflow_dispatch:` (no inputs, but the trigger itself is
# parser-rejected when inputs are absent in some Gitea 1.22.x
# builds; safest to drop entirely — manual runs go via cron-trigger
# bump or push-with-paths-filter).
# - on.paths references .gitea/workflows/runtime-pin-compat.yml (this
# file) instead of the .github/ one.
# - Workflow-level env.GITHUB_SERVER_URL set.
# - `continue-on-error: true` on the job (RFC §1 contract).
#
# CI gate that prevents the 5-hour staging outage from 2026-04-24 from
# recurring (controlplane#253). The original failure mode:
# 1. molecule-ai-workspace-runtime 0.1.13 declared `a2a-sdk<1.0` in its
# requires_dist metadata (incorrect — it actually imports
# a2a.server.routes which only exists in a2a-sdk 1.0+)
# 2. `pip install molecule-ai-workspace-runtime` resolved cleanly
# 3. `from molecule_runtime.main import main_sync` raised ImportError
# 4. Every tenant workspace crashed; the canary tenant caught it but
# only after 5 hours of degraded staging
#
# This workflow installs the CURRENTLY PUBLISHED runtime from PyPI on
# top of `workspace/requirements.txt` and smoke-imports. Catches:
# - Upstream PyPI yanks
# - Bad re-releases of molecule-ai-workspace-runtime
# - Already-shipped wheels that stop importing because a transitive
# dep moved underneath
on:
push:
branches: [main, staging]
paths:
# Narrow filter: pypi-latest is sensitive only to changes that
# affect what we're INSTALLING (requirements.txt) or WHAT THE
# CHECK ITSELF DOES (this workflow file). Edits to workspace/
# source code don't change what's on PyPI right now, so they
# don't change this gate's verdict.
- 'workspace/requirements.txt'
- '.gitea/workflows/runtime-pin-compat.yml'
pull_request:
branches: [main, staging]
paths:
- 'workspace/requirements.txt'
- '.gitea/workflows/runtime-pin-compat.yml'
# Daily catch for upstream PyPI publishes that break the pin combo
# without any change in our repo (e.g. someone re-yanks an a2a-sdk
# release or molecule-ai-workspace-runtime publishes a bad bump).
schedule:
- cron: '0 13 * * *' # 06:00 PT
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
pypi-latest-install:
name: PyPI-latest install + import smoke
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
# the PR. Follow-up PR flips this off after surfaced defects are
# triaged.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.11'
cache: pip
cache-dependency-path: workspace/requirements.txt
- name: Install runtime + workspace requirements
# Install order is load-bearing: install the runtime FIRST so pip
# honors whatever a2a-sdk constraint the runtime metadata declares
# (this is the surface that broke on 2026-04-24: the runtime declared
# `a2a-sdk<1.0` but actually needed >=1.0). The follow-up install
# of workspace/requirements.txt then upgrades a2a-sdk to the
# constraint our runtime image actually pins. The import smoke
# below verifies the upgraded combination is consistent.
run: |
python -m venv /tmp/venv
/tmp/venv/bin/pip install --upgrade pip
/tmp/venv/bin/pip install molecule-ai-workspace-runtime
/tmp/venv/bin/pip install -r workspace/requirements.txt
/tmp/venv/bin/pip show molecule-ai-workspace-runtime a2a-sdk \
| grep -E '^(Name|Version):'
- name: Smoke import — fail if metadata declares deps that don't satisfy real imports
# WORKSPACE_ID is validated at import time by platform_auth.py — EC2
# user-data sets it from the cloud-init template; set a placeholder
# here so the import smoke doesn't trip on the env-var guard.
env:
WORKSPACE_ID: 00000000-0000-0000-0000-000000000001
run: |
/tmp/venv/bin/python -c "from molecule_runtime.main import main_sync; print('runtime imports OK')"
-150
@@ -1,150 +0,0 @@
name: Runtime PR-Built Compatibility
# Ported from .github/workflows/runtime-prbuild-compat.yml on 2026-05-11
# per RFC internal#219 §1 sweep.
#
# Differences from the GitHub version:
# - Dropped `merge_group:` (no Gitea merge queue) and `workflow_dispatch:`
# (Gitea 1.22.6 parser-rejects workflow_dispatch with inputs and is
# finicky without them).
# - `dorny/paths-filter@v4` replaced with inline `git diff` (per PR#372
# pattern for ci.yml port).
# - on.paths references .gitea/workflows/runtime-prbuild-compat.yml.
# - Workflow-level env.GITHUB_SERVER_URL set.
# - `continue-on-error: true` on every job (RFC §1 contract).
#
# Companion to `runtime-pin-compat.yml`. That workflow tests what's
# CURRENTLY PUBLISHED on PyPI; this workflow tests what WOULD BE
# PUBLISHED if THIS PR merges.
#
# Why two workflows: the chicken-and-egg #128 fix added a "PR-built
# wheel" job to the original runtime-pin-compat.yml, but both jobs
# shared a `paths:` filter that was the union of their needs
# (`workspace/**`). That meant the PyPI-latest job ran on every doc
# edit even though the upstream PyPI artifact can't change with our
# workspace/ source. Splitting the two means each gets a narrow
# `paths:` filter that matches the inputs it actually depends on.
#
# Catches the failure mode where a PR adds an import requiring a newer
# SDK than `workspace/requirements.txt` pins:
# 1. Pip resolves the existing PyPI wheel + the old SDK pin -> smoke
# passes (it imports the OLD main.py from the wheel, not the PR's
# new main.py).
# 2. Merge -> publish-runtime.yml ships a wheel WITH the new import.
# 3. Tenant images redeploy -> all crash on first boot with ImportError.
on:
push:
branches: [main, staging]
pull_request:
branches: [main, staging]
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
concurrency:
# event_name + sha keeps PR sync and the subsequent staging push on the
# same SHA from cancelling each other (per feedback_concurrency_group_per_sha).
group: ${{ github.workflow }}-${{ github.event_name }}-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: true
jobs:
detect-changes:
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
outputs:
wheel: ${{ steps.decide.outputs.wheel }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- id: decide
run: |
# Inline replacement for dorny/paths-filter — same pattern
# PR#372's ci.yml port used. Diffs against the PR base or the
# previous push SHA, then matches against the wheel-relevant
# path set.
#
# NOTE: Gitea Actions does not expose github.event.before as a
# shell environment variable. The ${{ github.event.before }} template
# expression works inside YAML run: blocks but is evaluated to an
# empty string for push events, making the ${VAR:-fallback} always
# use the fallback. Use GITHUB_EVENT_BEFORE instead — it IS set in
# the runner's shell environment for push events.
BASE=""
if [ "${{ github.event_name }}" = "pull_request" ]; then
BASE="${{ github.event.pull_request.base.sha }}"
elif [ -n "$GITHUB_EVENT_BEFORE" ]; then
BASE="$GITHUB_EVENT_BEFORE"
fi
if [ -z "$BASE" ] || echo "$BASE" | grep -qE '^0+$'; then
# New branch or no previous SHA: treat as wheel-relevant.
echo "wheel=true" >> "$GITHUB_OUTPUT"
exit 0
fi
if ! timeout 30 git cat-file -e "$BASE" 2>/dev/null; then
git fetch --depth=1 origin "$BASE" 2>/dev/null || true
fi
if ! timeout 30 git cat-file -e "$BASE" 2>/dev/null; then
echo "wheel=true" >> "$GITHUB_OUTPUT"
exit 0
fi
CHANGED=$(git diff --name-only "$BASE" HEAD)
if echo "$CHANGED" | grep -qE '^(workspace/|scripts/build_runtime_package\.py$|scripts/wheel_smoke\.py$|\.gitea/workflows/runtime-prbuild-compat\.yml$)'; then
echo "wheel=true" >> "$GITHUB_OUTPUT"
else
echo "wheel=false" >> "$GITHUB_OUTPUT"
fi
# ONE job (no job-level `if:`) that always runs and reports under the
# required-check name `PR-built wheel + import smoke`. Real work is
# gated per-step on `needs.detect-changes.outputs.wheel`.
local-build-install:
needs: detect-changes
name: PR-built wheel + import smoke
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
steps:
- name: No-op pass (paths filter excluded this commit)
if: needs.detect-changes.outputs.wheel != 'true'
run: |
echo "No workspace/ / scripts/{build_runtime_package,wheel_smoke}.py / workflow changes — wheel gate satisfied without rebuilding."
echo "::notice::PR-built wheel + import smoke no-op pass (paths filter excluded this commit)."
- if: needs.detect-changes.outputs.wheel == 'true'
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- if: needs.detect-changes.outputs.wheel == 'true'
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.11'
cache: pip
cache-dependency-path: workspace/requirements.txt
- name: Install build tooling
if: needs.detect-changes.outputs.wheel == 'true'
run: pip install build
- name: Build wheel from PR source (mirrors publish-runtime.yml)
if: needs.detect-changes.outputs.wheel == 'true'
# Use a fixed test version so the wheel filename is predictable.
# Doesn't reach PyPI — this build is local-only for the smoke.
run: |
python scripts/build_runtime_package.py \
--version "0.0.0.dev0+pin-compat" \
--out /tmp/runtime-build
cd /tmp/runtime-build && python -m build
- name: Install built wheel + workspace requirements
if: needs.detect-changes.outputs.wheel == 'true'
run: |
python -m venv /tmp/venv-built
/tmp/venv-built/bin/pip install --upgrade pip
/tmp/venv-built/bin/pip install /tmp/runtime-build/dist/*.whl
/tmp/venv-built/bin/pip install -r workspace/requirements.txt
/tmp/venv-built/bin/pip show molecule-ai-workspace-runtime a2a-sdk \
| grep -E '^(Name|Version):'
- name: Smoke import the PR-built wheel
if: needs.detect-changes.outputs.wheel == 'true'
# Same script publish-runtime.yml runs against the to-be-PyPI wheel.
run: |
/tmp/venv-built/bin/python "$GITHUB_WORKSPACE/scripts/wheel_smoke.py"
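The `detect-changes` job above decides whether a commit is wheel-relevant by matching changed paths against a single regular expression. As a rough illustration, the same decision can be expressed in Python. The pattern is copied from the workflow's `grep -qE` call; the function name `is_wheel_relevant` is an assumption made for this sketch.

```python
import re

# Same alternatives as the workflow's grep -qE pattern: anything under
# workspace/, the two named scripts, or the workflow file itself.
WHEEL_RELEVANT = re.compile(
    r"^(workspace/"
    r"|scripts/build_runtime_package\.py$"
    r"|scripts/wheel_smoke\.py$"
    r"|\.gitea/workflows/runtime-prbuild-compat\.yml$)"
)


def is_wheel_relevant(changed_paths):
    """Hypothetical helper: True if any changed path should trigger a wheel rebuild."""
    return any(WHEEL_RELEVANT.match(path) for path in changed_paths)
```

For example, `is_wheel_relevant(["docs/index.md"])` is false, while a list containing `workspace/main.py` is true. An empty change list is false, which is why the workflow handles the "no base SHA" case separately and defaults to `wheel=true` there.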
+13 -7
@@ -58,14 +58,20 @@ jobs:
python-version: '3.11'
- name: Install .gitea script test dependencies
run: python -m pip install --quiet 'pytest==9.0.2' 'PyYAML==6.0.2'
- name: Run scripts/ unittests (build_runtime_package, ...)
# Top-level scripts/ tests live alongside their target file
# (e.g. scripts/test_build_runtime_package.py exercises
# scripts/build_runtime_package.py). discover from scripts/
# picks up only top-level test_*.py because scripts/ops/ has
# no __init__.py — that's intentional, so we run two passes.
- name: Run scripts/ unittests, if any
# Top-level scripts/ tests live alongside their target file. The
# runtime packaging tests moved to molecule-ai-workspace-runtime, so
# this pass may legitimately find no tests.
working-directory: scripts
run: python -m unittest discover -t . -p 'test_*.py' -v
run: |
set +e
python -m unittest discover -t . -p 'test_*.py' -v
rc=$?
if [ "$rc" -eq 5 ]; then
echo "No top-level scripts/ unittest files found; skipping."
exit 0
fi
exit "$rc"
- name: Run scripts/ops/ unittests (sweep_cf_decide, ...)
working-directory: scripts/ops
run: python -m unittest discover -p 'test_*.py' -v
+5 -5
@@ -163,11 +163,11 @@ Most agent systems stop at "a smart runtime." Molecule AI pushes further: it giv
| Core mechanism | Molecule AI module(s) | Why it matters |
|---|---|---|
| **Durable memory that survives sessions** | `workspace/builtin_tools/memory.py`, `workspace/builtin_tools/awareness_client.py`, `workspace-server/internal/handlers/memories.go` | Memory is not just durable, it is **workspace-scoped** and can route into awareness namespaces tied to the org structure |
| **Durable memory that survives sessions** | `molecule-ai-workspace-runtime/molecule_runtime/builtin_tools/`, `workspace-server/internal/handlers/memories.go` | Memory is not just durable, it is **workspace-scoped** and can route into awareness namespaces tied to the org structure |
| **Cross-session recall** | `workspace-server/internal/handlers/activity.go` (`/workspaces/:id/session-search`) | Recall spans both activity history and memory rows, so the system can search what happened and what was learned without inventing a separate hidden store |
| **Skills built from experience** | `workspace/builtin_tools/memory.py` (`_maybe_log_skill_promotion`) | Promotion from memory into a skill candidate is surfaced as an explicit platform activity, not a silent internal side effect |
| **Skill improvement during use** | `workspace/skill_loader/watcher.py`, `workspace/skill_loader/loader.py`, `workspace/main.py` | Skills hot-reload into the live runtime, so improvements become available on the next A2A task without restarting the workspace |
| **Persistent skill lifecycle** | `workspace-server/cmd/cli/cmd_agent_skill.go`, `workspace/plugins.py` | Skills are not just generated once; they can be audited, installed, published, shared, mounted by plugins, and governed as reusable operational assets |
| **Skills built from experience** | `molecule-ai-workspace-runtime/molecule_runtime/builtin_tools/memory.py` (`_maybe_log_skill_promotion`) | Promotion from memory into a skill candidate is surfaced as an explicit platform activity, not a silent internal side effect |
| **Skill improvement during use** | `molecule-ai-workspace-runtime/molecule_runtime/skill_loader/`, `molecule-ai-workspace-runtime/molecule_runtime/main.py` | Skills hot-reload into the live runtime, so improvements become available on the next A2A task without restarting the workspace |
| **Persistent skill lifecycle** | `workspace-server/cmd/cli/cmd_agent_skill.go`, `molecule-ai-workspace-runtime/molecule_runtime/plugins.py` | Skills are not just generated once; they can be audited, installed, published, shared, mounted by plugins, and governed as reusable operational assets |
### Why this matters in Molecule AI
@@ -208,7 +208,7 @@ The result is not just “an agent that learns.” It is **an organization that
### Runtime
- unified `workspace/` image; thin AMI in production (us-east-2)
- standalone workspace-template images that install `molecule-ai-workspace-runtime` from the Gitea package registry; thin AMI in production (us-east-2)
- adapter-driven execution across **8 runtimes** (Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw)
- Agent Card registration
- awareness-backed memory integration; **Memory v2 backed by pgvector** for semantic recall
+1 -1
@@ -17,7 +17,7 @@ Canvas (Next.js :3000) ←WebSocket→ Platform (Go :8080) ←HTTP→ Postgres +
- **Workspace Server** (`workspace-server/`): Go/Gin control plane — workspace CRUD, registry, discovery, WebSocket hub, liveness monitoring.
- **Canvas** (`canvas/`): Next.js 15 + React Flow (@xyflow/react v12) + Zustand + Tailwind — visual workspace graph.
- **Workspace Runtime** (`workspace/`): Shared runtime published as [`molecule-ai-workspace-runtime`](https://pypi.org/project/molecule-ai-workspace-runtime/) on PyPI. Supports LangGraph, Claude Code, OpenClaw, DeepAgents, CrewAI, AutoGen. Each adapter lives in its own standalone template repo (e.g. `molecule-ai-workspace-template-claude-code`). See `docs/workspace-runtime-package.md` for the full picture.
- **Workspace Runtime**: Shared runtime published from [`molecule-ai-workspace-runtime`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime) to the Molecule AI Gitea package registry. Supports LangGraph, Claude Code, OpenClaw, Hermes, Codex, and AutoGen. Each adapter lives in its own standalone template repo (e.g. `molecule-ai-workspace-template-claude-code`). See `docs/workspace-runtime-package.md` for the full picture.
- **molecli** (`workspace-server/cmd/cli/`): Go TUI dashboard (Bubbletea + Lipgloss) — real-time workspace monitoring, event log, health overview, delete/filter operations.
## Key Architectural Patterns
+28 -288
@@ -1,304 +1,44 @@
# Workspace Runtime PyPI Package
# Workspace Runtime Package
## Requires Python >= 3.11
`molecule-ai-workspace-runtime` is the shared Python runtime consumed by
workspace template images and by external MCP integrations.
The wheel pins `requires_python>=3.11`. On Python 3.10 or older, `pip install
molecule-ai-workspace-runtime` fails with `Could not find a version that
satisfies the requirement (from versions: none)` because the pin filters out
the only available artifact before pip even attempts the install. Upgrade the
interpreter (`brew install python@3.12` / `apt install python3.12` / etc.) or
use a 3.11+ venv.
## Source Of Truth
## Overview
The source of truth is the standalone Gitea repo:
The shared workspace runtime infrastructure has **one editable source** and
**one published artifact**:
1. **Source of truth (monorepo, editable):** `workspace/` — every runtime
change lands here. Edit it like any other monorepo code.
2. **Published artifact (PyPI, generated):** [`molecule-ai-workspace-runtime`](https://pypi.org/project/molecule-ai-workspace-runtime/)
— produced by `.github/workflows/publish-runtime.yml` on every
`runtime-vX.Y.Z` tag push. Do NOT edit this independently — it gets
overwritten on every publish.
The legacy sibling repo `molecule-ai-workspace-runtime` (the GitHub repo, as
distinct from the PyPI package) is no longer the source-of-truth and should
be treated as a publish artifact only. It can be archived or used as a
read-only mirror.
## Where to make changes
**All runtime edits land in `molecule-monorepo/workspace/`. Period.**
The GitHub repo `Molecule-AI/molecule-ai-workspace-runtime` is **mirror-only**.
It exists so external consumers (template repos, downstream operators) have a
git-cloneable artifact that mirrors the PyPI wheel — nothing more.
- **Direct PRs against `molecule-ai-workspace-runtime` are auto-rejected by
the `mirror-guard` CI check.** The check fails any push that did not come
from the publish pipeline. There is no opt-out — file the change against
`molecule-monorepo/workspace/` instead.
- **The mirror + the PyPI wheel both auto-regenerate on every push to
`staging`** via `.github/workflows/publish-runtime.yml` (which calls
`scripts/build_runtime_package.py`, builds wheel + sdist, smoke-imports,
uploads to PyPI via Trusted Publisher, and force-pushes the rewritten tree
to the mirror repo). You never touch the mirror by hand.
If you have an old local clone of the mirror and try to push a fix to it
directly, expect a CI failure with a message pointing you here. Re-open the
change against `molecule-monorepo/workspace/` and let the publish workflow
do the rest.
## Why this shape
The 8 workspace template repos (claude-code, langgraph, hermes, etc.) each
build their own Docker image and `pip install molecule-ai-workspace-runtime`
from PyPI. PyPI is the right distribution channel — semver, reproducible
builds, no submodule dance per-repo. But the runtime ALSO needs to evolve
in lock-step with the platform's wire protocol (queue shape, A2A metadata,
event payloads). Shipping cross-cutting protocol changes as separate
runtime + platform PRs in two repos creates ordering pain and broken
intermediate states.
The monorepo + auto-publish split gives both: edit cross-cutting changes
in one PR, publish the runtime artifact via a tag.
## What's in the package
Everything in `workspace/*.py` plus the `adapters/`, `builtin_tools/`,
`plugins_registry/`, `policies/`, `skill_loader/` subpackages. Build
artifacts (`Dockerfile`, `*.sh`, `pytest.ini`, `requirements.txt`) are
excluded.
The build script rewrites bare imports so the published package is a
proper Python namespace:
```
# In monorepo workspace/:
from a2a_client import discover_peer
from builtin_tools.memory import store
# In published molecule_runtime/ (auto-rewritten at publish time):
from molecule_runtime.a2a_client import discover_peer
from molecule_runtime.builtin_tools.memory import store
```text
https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime
```
The closed allowlist of rewritten module names lives in
`scripts/build_runtime_package.py` (`TOP_LEVEL_MODULES` + `SUBPACKAGES`).
Add a new top-level module to workspace/? Add it to the allowlist in the
same PR.
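The allowlist-driven rewrite described above can be sketched roughly as follows. This is an illustration only, not the build script's actual code: the `ALLOWLIST` contents and the function name `rewrite_imports` are assumptions, and the sketch handles only `from ... import ...` statements.

```python
import re

# Assumed allowlist for illustration. The real lists are described above as
# TOP_LEVEL_MODULES and SUBPACKAGES in the build script.
ALLOWLIST = {"a2a_client", "builtin_tools"}
PACKAGE = "molecule_runtime"

# Matches the leading "from <root>" of an import line, capturing indentation
# and the first dotted component only.
_FROM_RE = re.compile(r"^([ \t]*)from[ \t]+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Prefix allowlisted top-level modules with the package name."""

    def repl(match: re.Match) -> str:
        indent, root = match.groups()
        if root not in ALLOWLIST:
            return match.group(0)  # leave stdlib and third-party imports alone
        return f"{indent}from {PACKAGE}.{root}"

    return _FROM_RE.sub(repl, source)
```

Because only names in the closed allowlist are touched, running the rewrite twice leaves already-rewritten lines unchanged: their root is `molecule_runtime`, which is not in the allowlist.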
Do not add runtime source back under `molecule-core/workspace/`. The core repo
owns the platform server, canvas, provisioning, and tests around the installed
runtime package.
## Adapter repos
## Package Registry
Each of the 8 adapter template repos contains:
- `adapter.py` — runtime-specific `Adapter` class
- `requirements.txt`: `molecule-ai-workspace-runtime>=0.1.X` + adapter deps
- `Dockerfile` — standalone image with `ENV ADAPTER_MODULE=adapter` and
`ENTRYPOINT ["molecule-runtime"]`
The runtime package is published to the Molecule AI Gitea package registry:
| Adapter | Repo |
|---------|------|
| claude-code | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-claude-code |
| langgraph | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-langgraph |
| crewai | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-crewai |
| autogen | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-autogen |
| deepagents | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-deepagents |
| hermes | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-hermes |
| gemini-cli | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-gemini-cli |
| openclaw | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-openclaw |
## Adapter discovery (ADAPTER_MODULE)
Standalone adapter repos set `ENV ADAPTER_MODULE=adapter` in their
Dockerfile. The runtime's `get_adapter()` checks this env var first:
```python
# In molecule_runtime/adapters/__init__.py
import importlib
import os


def get_adapter(runtime: str) -> type[BaseAdapter]:
    # ADAPTER_MODULE (set in the template's Dockerfile) is checked first.
    adapter_module = os.environ.get("ADAPTER_MODULE")
    if adapter_module:
        mod = importlib.import_module(adapter_module)
        return getattr(mod, "Adapter")
    raise KeyError(...)
```text
https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/
```
## Publishing a new version
PyPI is intentionally not part of the critical path. Template Dockerfiles,
external-runtime snippets, and CI install checks should use the Gitea registry.
```bash
# From any local checkout of monorepo, after merging your runtime change:
git tag runtime-v0.1.6
git push origin runtime-v0.1.6
```
## Release Flow
The `publish-runtime` workflow takes over — checks out the tag, runs
`scripts/build_runtime_package.py --version 0.1.6`, builds wheel + sdist,
runs a smoke import to catch broken rewrites, and uploads to PyPI via
the PyPA Trusted Publisher action (OIDC). No static API token is stored
in this repo — PyPI verifies the workflow's OIDC claim against the
trusted-publisher config registered for `molecule-ai-workspace-runtime`.
1. Land a reviewed PR in `molecule-ai-workspace-runtime`.
2. Bump `version =` in that repo's `pyproject.toml`.
3. Tag `runtime-vX.Y.Z` on the runtime repo.
4. The runtime repo's `publish-runtime` workflow builds the wheel and sdist,
publishes to the Gitea registry, verifies install from that registry, then
cascades `.runtime-version` pins to workspace template repos.
For dev/test releases without tagging, dispatch the workflow manually
with an explicit version (e.g. `0.1.6.dev1` — PEP 440 dev/rc/post forms
are accepted).
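To make the accepted version shapes concrete, here is a rough sketch of a validator for a manually dispatched version. It covers only a simplified subset of PEP 440 (release, optional `a`/`b`/`rc`, `.post`, `.dev`) and is an illustration, not the check the workflow actually runs; the name `looks_like_pep440` is hypothetical.

```python
import re

# Simplified subset of PEP 440; epochs, local versions, and the
# alternative spellings the full spec normalizes are deliberately omitted.
_VERSION_RE = re.compile(
    r"\d+(?:\.\d+)*"       # release segment, e.g. 0.1.6
    r"(?:(?:a|b|rc)\d+)?"  # optional pre-release, e.g. rc1
    r"(?:\.post\d+)?"      # optional post-release, e.g. .post2
    r"(?:\.dev\d+)?"       # optional dev release, e.g. .dev1
)


def looks_like_pep440(version: str) -> bool:
    """Return True when the whole string fits the simplified pattern."""
    return _VERSION_RE.fullmatch(version) is not None
```

For strict validation, the `packaging` library's `Version` class implements the full specification.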
## Core Repo Contract
After publish, the 8 template repos pick up the new version on their
next `:latest` rebuild. To force-pull immediately, bump the pin in each
template's `requirements.txt`.
`molecule-core` must not ship editable runtime code. Its responsibilities are:
## End-to-end CD chain
The full chain from monorepo merge → workspace containers running new code:
```
1. Merge PR with workspace/ changes to main
2. .github/workflows/auto-tag-runtime.yml fires
   ↓ reads PR labels (release:major/minor) or defaults to patch
   ↓ pushes runtime-vX.Y.Z tag
3. .github/workflows/publish-runtime.yml fires (on the tag)
   ↓ builds wheel via scripts/build_runtime_package.py
   ↓ smoke-imports the wheel
   ↓ uploads to PyPI
   ↓ cascade job fires repository_dispatch (event-type: runtime-published)
   ↓ to all 8 workspace-template-* repos
4. Each template's publish-image.yml fires (on repository_dispatch)
   ↓ rebuilds Dockerfile (which pip-installs the new PyPI version)
   ↓ pushes ghcr.io/molecule-ai/workspace-template-<runtime>:latest
5. Production hosts run scripts/refresh-workspace-images.sh
   OR an operator hits POST /admin/workspace-images/refresh on the platform
   ↓ docker pull all 8 :latest tags
   ↓ remove + force-recreate any running ws-* containers using a refreshed image
   ↓ canvas re-provisions the workspaces on next interaction
```
Steps 1-4 are fully automated. Step 5 is one-click: a single curl or shell
command. SaaS deployments typically wire step 5 into their normal deploy
pipeline (every release pulls fresh images on every host); local dev fires
it manually after a runtime release lands.
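Step 2 of the chain turns PR labels into the next tag. The sketch below shows one way that bump could be computed. The label names and tag format come from the chain above; the function name `next_runtime_tag`, the reset of lower components, and the precedence of `release:major` over `release:minor` are assumptions, not confirmed behavior of `auto-tag-runtime.yml`.

```python
def next_runtime_tag(latest_tag: str, pr_labels: list[str]) -> str:
    """Compute the next runtime-vX.Y.Z tag from the latest tag and PR labels."""
    major, minor, patch = (
        int(part) for part in latest_tag.removeprefix("runtime-v").split(".")
    )
    if "release:major" in pr_labels:
        # Assumed: a major bump resets minor and patch.
        major, minor, patch = major + 1, 0, 0
    elif "release:minor" in pr_labels:
        # Assumed: a minor bump resets patch.
        minor, patch = minor + 1, 0
    else:
        # No release label: default to a patch bump.
        patch += 1
    return f"runtime-v{major}.{minor}.{patch}"
```

An unlabeled PR on top of `runtime-v0.1.6` would therefore produce `runtime-v0.1.7` under these assumptions.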
### Auth
PyPI publishing uses **Trusted Publisher (OIDC)** — no static token in the
monorepo. The trusted-publisher config on PyPI binds the
`molecule-ai-workspace-runtime` project to this repo's
`publish-runtime.yml` workflow + `pypi-publish` environment. Rotation is
moot: there is no shared secret to rotate.
### Required secrets
| Secret | Where | Why |
|---|---|---|
| `TEMPLATE_DISPATCH_TOKEN` | molecule-core repo | Fine-grained PAT with `actions:write` on the 8 template repos. Without it the `cascade` job warns and exits clean — PyPI still publishes; templates just don't auto-rebuild. |
### Step 5 specifics
**Local dev (compose stack):**
```bash
bash scripts/refresh-workspace-images.sh # all runtimes
bash scripts/refresh-workspace-images.sh --runtime claude-code
bash scripts/refresh-workspace-images.sh --no-recreate # pull only, leave containers
```
**Via platform admin endpoint (any deploy):**
```bash
curl -X POST "$PLATFORM/admin/workspace-images/refresh"
curl -X POST "$PLATFORM/admin/workspace-images/refresh?runtime=claude-code"
curl -X POST "$PLATFORM/admin/workspace-images/refresh?recreate=false"
```
The endpoint pulls + recreates from inside the platform container, so it
needs Docker socket access (the compose stack mounts
`/var/run/docker.sock` already) AND GHCR auth on the host's docker config
(`docker login ghcr.io` once per host). On a fresh host without GHCR auth,
the pull step warns per runtime and the response surfaces the failures.
**Fully hands-off (opt-in image auto-refresh):**
Set `IMAGE_AUTO_REFRESH=true` on the platform process. A watcher polls
GHCR every 5 minutes for digest changes on each `workspace-template-*:latest`
tag and invokes the same refresh logic the admin endpoint exposes —
no operator action required between "runtime PR merged" and
"containers running new code". Disabled by default because SaaS deploy
pipelines that already pull on every release would do redundant work.
Optional companion env (same as the admin endpoint):
- `GHCR_USER` + `GHCR_TOKEN` — required for private template images;
unused for the current public set, but harmless if set.
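The watcher's core decision, refresh only when a `:latest` digest changes, can be sketched as follows. This is an illustration of the idea, not the platform's implementation: `fetch_digest` and `refresh` are hypothetical callables standing in for the GHCR lookup and the shared refresh logic, and recording a baseline on the first poll without refreshing is an assumed design choice.

```python
from typing import Callable, Iterable


def changed_runtimes(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Runtimes whose digest differs from the one recorded on an earlier poll."""
    return sorted(
        runtime
        for runtime, digest in current.items()
        if runtime in previous and previous[runtime] != digest
    )


def poll_once(runtimes: Iterable[str], fetch_digest: Callable[[str], str],
              refresh: Callable[[str], None], seen: dict[str, str]) -> list[str]:
    """Fetch current digests, refresh changed runtimes, and update `seen` in place."""
    current = {runtime: fetch_digest(runtime) for runtime in runtimes}
    changed = changed_runtimes(seen, current)
    for runtime in changed:
        refresh(runtime)
    seen.update(current)  # first poll only records a baseline
    return changed
```

A long-running watcher would call `poll_once` in a loop with a 300-second sleep to match the 5-minute interval described above.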
## Local dev (build the package without publishing)
```bash
python3 scripts/build_runtime_package.py --version 0.1.0-local --out /tmp/runtime-build
cd /tmp/runtime-build
python -m build # produces dist/*.whl + dist/*.tar.gz
pip install dist/*.whl # install into a venv to test locally
```
This is the same pipeline CI runs. Use it to validate import-rewrite
correctness before pushing a `runtime-v*` tag.
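One way to check rewrite correctness on a local build is to scan the generated package for imports that still use bare module names. The sketch below assumes you pass in the allowlist yourself (for example the names from `TOP_LEVEL_MODULES` and `SUBPACKAGES`); the function names are hypothetical, and the line-based match will also flag import-looking lines inside docstrings.

```python
import re
from pathlib import Path


def bare_import_lines(source: str, module_names: set[str]) -> list[int]:
    """Return 1-based line numbers that import an allowlisted module without a prefix."""
    alternation = "|".join(re.escape(name) for name in sorted(module_names))
    # The lookahead stops `config` from matching `configparser`.
    bare = re.compile(rf"^\s*(?:from|import)\s+(?:{alternation})(?=[\s.,]|$)")
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if bare.match(line)
    ]


def scan_package(package_dir: Path, module_names: set[str]) -> dict[Path, list[int]]:
    """Map each .py file under package_dir to its un-rewritten import lines."""
    hits: dict[Path, list[int]] = {}
    for path in sorted(package_dir.rglob("*.py")):
        lines = bare_import_lines(path.read_text(), module_names)
        if lines:
            hits[path] = lines
    return hits
```

Running `scan_package(Path("/tmp/runtime-build/molecule_runtime"), names)` after the build should return an empty dict if every import was qualified.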
## Writing a new adapter
Use the starter template repo
[`molecule-ai/molecule-ai-workspace-template-starter`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-starter).
It ships with the canonical Dockerfile, the `adapter.py` skeleton, the
`config.yaml` schema, and the `repository_dispatch: [runtime-published]`
cascade receiver already wired up, so no follow-up setup PR is required.
Note: the starter repo did not survive the 2026-05-06 GitHub-org-suspension
migration; its recreation is tracked at internal#41.
```bash
# Replace <runtime> with your runtime slug (lowercase, hyphenated).
gh repo create Molecule-AI/molecule-ai-workspace-template-<runtime> \
--template Molecule-AI/molecule-ai-workspace-template-starter \
--public \
--description "Molecule AI workspace template: <runtime>"
git clone https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-<runtime>.git
cd molecule-ai-workspace-template-<runtime>
```
Then fill in the `TODO` markers in:
| File | What to fill in |
|---|---|
| `adapter.py` | Rename class to `<Runtime>Adapter`. Fill in `name()`, `display_name()`, `description()`, `get_config_schema()`. Implement `setup()` and `create_executor()`. |
| `requirements.txt` | Add your runtime's pip dependencies (e.g. `langgraph`, `crewai`, `claude-agent-sdk`). |
| `Dockerfile` | Add runtime-specific apt deps (most runtimes don't need any). Replace ENTRYPOINT only if you need custom boot logic. |
| `config.yaml` | Update top-level `name`/`runtime`/`description`. Add the models your runtime supports to `models[]`. |
| `system-prompt.md` | Default agent prompt. |
After `git push`:
1. The template's `publish-image.yml` builds + pushes
`ghcr.io/molecule-ai/workspace-template-<runtime>:latest` automatically.
2. The next `runtime-vX.Y.Z` tag on `molecule-core` cascades a
`repository_dispatch` event into your new template, rebuilding the image
against the latest runtime — no setup PR required.
3. Register the runtime name in the platform's `RuntimeImages` map (in
`workspace-server/internal/provisioner/provisioner.go`) so it's
selectable in the canvas.
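As an illustration of the `adapter.py` row in the table above, here is a self-contained skeleton. The `BaseAdapter` class below is a stand-in defined only for this example; the real base class in the runtime package may use different signatures (for instance async methods), and `EchoAdapter` is hypothetical. The module-level `Adapter` alias reflects the fact that `get_adapter()` looks up an attribute named `Adapter`.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class BaseAdapter(ABC):
    """Stand-in for the runtime's base class; real signatures may differ."""

    @staticmethod
    @abstractmethod
    def name() -> str: ...

    @staticmethod
    @abstractmethod
    def display_name() -> str: ...

    @staticmethod
    @abstractmethod
    def description() -> str: ...

    @staticmethod
    @abstractmethod
    def get_config_schema() -> dict[str, Any]: ...

    @abstractmethod
    def setup(self, config: dict[str, Any]) -> None: ...

    @abstractmethod
    def create_executor(self) -> Callable[[str], str]: ...


class EchoAdapter(BaseAdapter):
    """Hypothetical adapter that echoes its input back."""

    @staticmethod
    def name() -> str:
        return "echo"

    @staticmethod
    def display_name() -> str:
        return "Echo"

    @staticmethod
    def description() -> str:
        return "Replies with the text it was given."

    @staticmethod
    def get_config_schema() -> dict[str, Any]:
        return {"type": "object", "properties": {}}

    def setup(self, config: dict[str, Any]) -> None:
        self.config = config

    def create_executor(self) -> Callable[[str], str]:
        return lambda message: message


# get_adapter() resolves the attribute named "Adapter" on the module.
Adapter = EchoAdapter
```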
## When the starter itself needs to evolve
If the canonical shape changes (e.g. `config.yaml` schema gets a new field,
the `BaseAdapter` interface adds a method, the reusable CI workflow
signature changes), update the
[starter](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-starter) (recreation pending — see note above)
**first**. Existing templates can either migrate at their own pace or be
touched in a coordinated cleanup PR. Either way, future templates pick up
the new shape from day one.
## Migration note
Prior to this workflow, the runtime was duplicated across the monorepo
`workspace/` directory and a sibling repo, `molecule-ai-workspace-runtime`,
with no sync mechanism. More than 30 files drifted between the two trees, and
the chat-leak and queued-classification fixes existed only in the monorepo
copy until they were manually ported.
If you have an old local checkout of `molecule-ai-workspace-runtime`, treat
it as outdated. The monorepo `workspace/` is now authoritative; the PyPI
artifact is rebuilt from it on every `runtime-v*` tag.
- Test platform behavior against the installed runtime contract.
- Keep MCP/registry/TenantGuard behavior compatible with the runtime package.
- Fail CI if `workspace/` or legacy build-from-workspace scripts are restored.
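The last bullet could be enforced with a small path check over the tracked file list. The sketch below is an assumption about how such a guard might look, not the repo's actual CI job; the forbidden entries are inferred from this change (the `workspace/` tree and `scripts/build_runtime_package.py`), and the function name is hypothetical.

```python
from typing import Iterable

# Assumed forbidden locations, inferred from what this change deletes.
FORBIDDEN_PREFIXES = ("workspace/",)
FORBIDDEN_FILES = ("scripts/build_runtime_package.py",)


def restored_paths(tracked_files: Iterable[str]) -> list[str]:
    """Return tracked paths that would reintroduce the runtime copy or its build script."""
    return sorted(
        path
        for path in tracked_files
        if path.startswith(FORBIDDEN_PREFIXES) or path in FORBIDDEN_FILES
    )
```

A CI step could feed it the output of `git ls-files` and fail when the returned list is non-empty. Matching on the `workspace/` prefix with its trailing slash leaves `workspace-server/` untouched.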
-542
@@ -1,542 +0,0 @@
#!/usr/bin/env python3
"""Build the molecule-ai-workspace-runtime PyPI package from monorepo workspace/.
Monorepo workspace/ is the single source-of-truth for runtime code. The PyPI
package is a publish-time mirror produced by this script, NOT a parallel
editable copy. Anyone editing the runtime should edit workspace/, never the
sibling molecule-ai-workspace-runtime repo.
What this does
--------------
1. Copies workspace/ source into build/molecule_runtime/ (note the rename:
bare modules become a real Python package).
2. Rewrites top-level imports so e.g. `from a2a_client import X` becomes
`from molecule_runtime.a2a_client import X`. The rewrite is regex-based
on a closed allowlist of modules — third-party imports like `from a2a.X`
(the a2a-sdk package) are left alone because the regex is anchored on
exact module names.
3. Writes a pyproject.toml with the requested version + the README + the
py.typed marker.
4. Leaves the build dir ready for `python -m build` to produce a wheel/sdist.
Usage
-----
scripts/build_runtime_package.py --version 0.1.6 --out /tmp/runtime-build
cd /tmp/runtime-build && python -m build
python -m twine upload dist/*
The publish workflow (.github/workflows/publish-runtime.yml) drives this
on every `runtime-v*` tag push.
"""
from __future__ import annotations
import argparse
import re
import shutil
import sys
from pathlib import Path
# Top-level Python modules in workspace/ that become molecule_runtime.X.
# Anything imported as `from <name> import` or `import <name>` (where <name>
# matches one of these) gets rewritten to use the package prefix.
#
# Closed list (not "every .py we copy") because a typo in workspace/ would
# otherwise leak into a wrong rewrite. The set is asserted against
# `workspace/*.py` at build time — if the disk contents drift from this
# list (new module added, old one removed), the build fails loud instead
# of silently shipping unrewritten imports. That gap caused 0.1.16 to
# ship `from transcript_auth import ...` (unrewritten — module added
# without updating this set), which broke every workspace startup with
# `ModuleNotFoundError: No module named 'transcript_auth'`.
TOP_LEVEL_MODULES = {
"_sanitize_a2a",
"a2a_cli",
"a2a_client",
"a2a_executor",
"a2a_mcp_server",
"a2a_response",
"a2a_tools",
"a2a_tools_delegation",
"a2a_tools_identity",
"a2a_tools_inbox",
"a2a_tools_memory",
"a2a_tools_messaging",
"a2a_tools_rbac",
"adapter_base",
"agent",
"agents_md",
"boot_routes",
"card_helpers",
"config",
"configs_dir",
"consolidation",
"coordinator",
"event_log",
"events",
"executor_helpers",
"heartbeat",
"inbox",
"inbox_uploads",
"initial_prompt",
"internal_chat_uploads",
"internal_file_read",
"main",
"mcp_cli",
"mcp_doctor",
"mcp_heartbeat",
"mcp_inbox_pollers",
"mcp_workspace_resolver",
"molecule_ai_status",
"not_configured_handler",
"platform_auth",
"platform_inbound_auth",
"plugins",
"preflight",
"prompt",
"runtime_wedge",
"secret_redactor",
"shared_runtime",
"smoke_mode",
"transcript_auth",
"watcher",
}
# Subdirectory packages — these are already real packages (they have or will
# have __init__.py) so the rewrite is `from <pkg>` → `from molecule_runtime.<pkg>`.
SUBPACKAGES = {
"adapters",
"builtin_tools",
"lib",
"platform_tools",
"plugins_registry",
"policies",
"skill_loader",
}
# Files in workspace/ NOT included in the published package. These are
# build artifacts, dev scripts, or monorepo-only scaffolding.
EXCLUDE_FILES = {
"Dockerfile",
"build-all.sh",
"rebuild-runtime-images.sh",
"entrypoint.sh",
"pytest.ini",
"requirements.txt",
# Note: adapter_base.py, agents_md.py, hermes_executor.py, shared_runtime.py
# are kept (referenced by adapters/__init__.py and other modules); they get
# their imports rewritten via TOP_LEVEL_MODULES. Excluding them broke the
# smoke-test install with `ModuleNotFoundError: adapter_base`.
}
EXCLUDE_DIRS = {
"__pycache__",
"tests",
"molecule_audit", # only used by tests; not on production import path
"scripts",
}
def build_import_rewriter() -> re.Pattern:
    """Compile a single regex matching all import statements that need
    rewriting. The match groups capture the keyword + module name so the
    replacement preserves whitespace and trailing punctuation.

    Modules included: TOP_LEVEL_MODULES | SUBPACKAGES.

    The `rest` group requires the name to be followed by whitespace, `.`,
    `,`, or end of line, so an allowlisted name never matches as a prefix
    of a longer identifier (e.g. `config` does not match `configparser`).
    A third-party import such as `from a2a.server.X import Y` is left alone
    only because bare `a2a` is not in our set.
    """
    names = sorted(TOP_LEVEL_MODULES | SUBPACKAGES)
    alt = "|".join(re.escape(n) for n in names)
    # Matches:
    #   from <name>(\.|\s|import)
    #   import <name>(\s|$|,)
    # And captures the keyword + name so we can re-emit with prefix.
    pattern = (
        r"(?m)^(?P<indent>\s*)"     # leading whitespace (preserved)
        r"(?P<kw>from|import)\s+"   # 'from' or 'import'
        r"(?P<mod>" + alt + r")"    # the module name
        r"(?P<rest>[\s.,]|$)"       # what follows: '.subpath', ' import …', ',', whitespace, EOL
    )
    return re.compile(pattern)
def rewrite_imports(text: str, regex: re.Pattern) -> str:
    """Replace bare imports with package-prefixed ones.

    `import X` → `import molecule_runtime.X as X` (preserve binding)
    `from X import Y` → `from molecule_runtime.X import Y`
    `from X.sub import Y` → `from molecule_runtime.X.sub import Y`

    Rejects `import X as Y` because the rewrite would produce
    `import molecule_runtime.X as X as Y`, a syntax error. The PR #2433
    incident shipped this exact pattern past `Python Lint & Test` (which
    runs against pre-rewrite source) but blew up the wheel-smoke gate.
    Detecting it here turns the silent build failure into a build-time
    error with a clear path: use `from X import …` or plain `import X`.
    """
    def repl(m: re.Match) -> str:
        indent, kw, mod, rest = m.group("indent"), m.group("kw"), m.group("mod"), m.group("rest")
        if kw == "from":
            # `from X` or `from X.sub` — always safe to prefix.
            return f"{indent}from molecule_runtime.{mod}{rest}"
        # `import X` — preserve the binding name `X` (callers do `X.foo`)
        # by aliasing. `import X.sub` is uncommon for our modules and would
        # need a different binding form, but isn't used in workspace/ today.
        if rest.startswith("."):
            # `import X.sub` — rewrite as `import molecule_runtime.X.sub` and
            # leave the trailing dot pattern intact for the rest of the line.
            return f"{indent}import molecule_runtime.{mod}{rest}"
        # Detect `import X as Y` — the regex's `rest` group captures only
        # the immediate following char (whitespace, comma, or EOL), so we
        # have to peek at the surrounding line context. The match start is
        # at the line's `import` keyword; everything after the matched
        # name on the same line is what the source author wrote.
        line_start = text.rfind("\n", 0, m.start()) + 1
        line_end = text.find("\n", m.end())
        if line_end == -1:
            line_end = len(text)
        line_after = text[m.end() - len(rest):line_end]
        # Strip comments from consideration so `import X # noqa` doesn't trip.
        line_after_no_comment = line_after.split("#", 1)[0]
        if re.search(r"^\s*as\s+\w+", line_after_no_comment):
            raise ValueError(
                f"rewrite_imports: cannot rewrite 'import {mod} as <alias>' on a "
                f"workspace module — the regex would produce "
                f"'import molecule_runtime.{mod} as {mod} as <alias>', invalid syntax. "
                f"Use 'from {mod} import …' or plain 'import {mod}' instead. "
                f"Offending line: {text[line_start:line_end]!r}"
            )
        # Plain `import X` — alias preserves the local name.
        return f"{indent}import molecule_runtime.{mod} as {mod}{rest}"
    return regex.sub(repl, text)
def copy_tree_filtered(src: Path, dst: Path) -> list[Path]:
    """Copy src/ → dst/ skipping EXCLUDE_FILES + EXCLUDE_DIRS. Returns the
    list of .py files copied so the caller can run the import rewrite over
    them in one pass."""
    py_files: list[Path] = []
    if dst.exists():
        shutil.rmtree(dst)
    dst.mkdir(parents=True)
    for entry in src.iterdir():
        if entry.is_dir():
            if entry.name in EXCLUDE_DIRS:
                continue
            sub_py = copy_tree_filtered(entry, dst / entry.name)
            py_files.extend(sub_py)
        else:
            if entry.name in EXCLUDE_FILES:
                continue
            shutil.copy2(entry, dst / entry.name)
            if entry.suffix == ".py":
                py_files.append(dst / entry.name)
    return py_files
PYPROJECT_TEMPLATE = """\
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "molecule-ai-workspace-runtime"
version = "{version}"
description = "Molecule AI workspace runtime — shared infrastructure for all agent adapters"
requires-python = ">=3.11"
license = {{text = "BSL-1.1"}}
readme = "README.md"
dependencies = [
"a2a-sdk[http-server]>=1.0.0,<2.0",
"httpx>=0.27.0",
"uvicorn>=0.30.0",
"starlette>=0.38.0",
"websockets>=12.0",
# multipart/form-data parser — required for Starlette's Request.form() on
# /internal/chat/uploads/ingest. Without it, Starlette raises AssertionError
# when parsing multipart bodies, which the chat-upload handler surfaces as
# an opaque 400. Mirrors the canonical pin in workspace/requirements.txt;
# >=0.0.27 avoids CVE-2024-53981 (DoS via malformed boundary).
# Forensic a78762a0 (2026-05-19): Hermes PDF upload 400 root cause.
"python-multipart>=0.0.27",
"pyyaml>=6.0",
"langchain-core>=0.3.0",
"opentelemetry-api>=1.24.0",
"opentelemetry-sdk>=1.24.0",
"opentelemetry-exporter-otlp-proto-http>=1.24.0",
"temporalio>=1.7.0",
]
[project.scripts]
molecule-runtime = "molecule_runtime.main:main_sync"
molecule-mcp = "molecule_runtime.mcp_cli:main"
[tool.setuptools.packages.find]
where = ["."]
include = ["molecule_runtime*", "plugins_registry*"]
[tool.setuptools.package-data]
"molecule_runtime" = ["py.typed"]
"plugins_registry" = ["py.typed"]
"""
README_TEMPLATE = """\
# molecule-ai-workspace-runtime
Shared workspace runtime for [Molecule AI](https://git.moleculesai.app/molecule-ai/molecule-core)
agent adapters. Installed by every workspace template image
(`workspace-template-claude-code`, `-langgraph`, `-hermes`, etc.) to provide
A2A delegation, heartbeat, memory, plugin loading, and skill management.
This package is **published from the molecule-core monorepo `workspace/`
directory** by the `publish-runtime` GitHub Actions workflow on every
`runtime-v*` tag push. **Do not edit this package directly** — edit
`workspace/` in the monorepo.
## External-runtime MCP server (`molecule-mcp`)
Operators running an agent outside the platform's container fleet
(any runtime that supports MCP stdio — Claude Code, hermes, codex,
etc.) can install this wheel and run the universal MCP server
locally.
### Requirements
* **Python ≥3.11.** The wheel sets `requires-python = ">=3.11"`. On
older interpreters `pip install` returns the cryptic
`Could not find a version that satisfies the requirement` — that
message is pip filtering this wheel out, NOT the package missing
from PyPI. Upgrade with `brew install python@3.12` /
`apt install python3.12` / `pyenv install 3.12` first.
* **`pipx` recommended over `pip`.** `pipx install` puts
`molecule-mcp` on PATH automatically and isolates the runtime's
deps from your system Python. Plain `pip install --user` works
but the binary lands in `~/.local/bin` (Linux) or
`~/Library/Python/3.X/bin` (macOS) which is often not on PATH on
a fresh shell — `claude mcp add molecule-<workspace-slug> -- molecule-mcp`
then fails with "command not found" at first use.
* **Server name in `claude mcp add` is workspace-specific.** The
Canvas "Add to Claude Code" snippet stamps a unique slug
(`molecule-<workspace-name>`) so a single Claude Code session can
talk to N molecule workspaces concurrently — `claude mcp add` keys
entries by name in `~/.claude.json`, so re-running with a bare
`molecule` name silently overwrites the prior workspace's entry.
See [molecule-core#1535](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1535)
for the canonical generator.
### Install
```sh
# Recommended:
pipx install molecule-ai-workspace-runtime
# Alternative (manage PATH yourself):
pip install --user molecule-ai-workspace-runtime
```
### Run
```sh
WORKSPACE_ID=<uuid> \\
PLATFORM_URL=https://<tenant>.staging.moleculesai.app \\
MOLECULE_WORKSPACE_TOKEN=<bearer> \\
molecule-mcp
```
That exposes the same 8 platform tools (`delegate_task`, `list_peers`,
`send_message_to_user`, `commit_memory`, etc.) that container-bound
runtimes already get via the workspace's auto-spawned MCP. Register
the binary in your agent's MCP config — use a workspace-specific
server name so multi-workspace setups don't collide (e.g. Claude Code:
`claude mcp add molecule-<workspace-slug> -- molecule-mcp` with the env
above; the Canvas modal stamps the right slug for you).
### Keeping the token out of shell history
Inline `MOLECULE_WORKSPACE_TOKEN=<bearer>` ends up in `~/.zsh_history`
and (when registered via `claude mcp add`) plaintext in
`~/.claude.json`. To avoid that, write the token to a 0600 file and
point `MOLECULE_WORKSPACE_TOKEN_FILE` at it:
```sh
umask 077
printf '%s' "<bearer>" > ~/.config/molecule/token
WORKSPACE_ID=<uuid> \\
PLATFORM_URL=https://<tenant>.staging.moleculesai.app \\
MOLECULE_WORKSPACE_TOKEN_FILE=$HOME/.config/molecule/token \\
molecule-mcp
```
Token resolution order: `MOLECULE_WORKSPACE_TOKEN` (inline env) →
`MOLECULE_WORKSPACE_TOKEN_FILE` (path) → `${CONFIGS_DIR}/.auth_token`
(in-container default).
The token comes from the canvas → Tokens tab. Restarting an external
workspace from the canvas no longer revokes the token (PR #2412), so
operator tokens persist across status nudges.
### Push vs poll delivery (Claude Code specifics)
By default the inbox runs in **poll mode** — every turn the agent
calls `wait_for_message`, which blocks up to ~60s on
`/activity?since_id=…`. Real-time push delivery is also supported,
but on Claude Code it requires THREE conditions, ALL of which must
hold:
1. **The MCP server declares `experimental.claude/channel`** — this
wheel does (see `_build_initialize_result`). Nothing for you to
do.
2. **Claude Code installs the server as a marketplace plugin** — a
plain `claude mcp add molecule-<workspace-slug> -- molecule-mcp`
produces a non-plugin-sourced server, which Claude Code rejects with
`channel_enable requires a marketplace plugin`. Until the
official `moleculesai/claude-code-plugin` marketplace lands
(tracking [#2936](https://git.moleculesai.app/molecule-ai/molecule-core/issues/2936)),
operators who want push must scaffold their own local marketplace
under
`~/.claude/marketplaces/molecule-local/` containing a
`marketplace.json` + `plugin.json` that points at this wheel.
3. **Claude Code is launched with the dev-channels flag** — pass
`--dangerously-load-development-channels plugin:molecule@<marketplace>`
on the `claude` invocation. Without this flag the channel
capability is silently ignored.
Symptom of any condition failing: messages arrive but only via the
poll path (every ~160s), not real-time. There's currently no
diagnostic surfaced — `molecule-mcp doctor` (tracking
[#2937](https://git.moleculesai.app/molecule-ai/molecule-core/issues/2937)) is
planned.
If you don't need real-time push, the default poll path works
universally with no extra setup; both modes converge on the same
`inbox_pop` ack so messages never duplicate.
See [`docs/workspace-runtime-package.md`](https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/docs/workspace-runtime-package.md)
for the publish flow and architecture.
"""
def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--version", required=True, help="Package version, e.g. 0.1.6")
    parser.add_argument("--out", required=True, type=Path, help="Build output directory (will be wiped)")
    parser.add_argument("--source", type=Path, default=Path(__file__).resolve().parent.parent / "workspace",
                        help="Path to monorepo workspace/ directory (default: ../workspace from this script)")
    args = parser.parse_args()
    src = args.source.resolve()
    out = args.out.resolve()
    if not src.is_dir():
        print(f"error: source not a directory: {src}", file=sys.stderr)
        return 2
    # Drift gate: assert TOP_LEVEL_MODULES matches workspace/*.py.
    # Without this, a new top-level module added to workspace/ ships
    # with unrewritten `from <name> import` statements that explode at
    # runtime with ModuleNotFoundError. (See 0.1.16 transcript_auth
    # incident — closed list silently went stale.)
    on_disk_modules = {
        f.stem for f in src.glob("*.py")
        if f.stem not in {"__init__", "conftest"}
    }
    missing = on_disk_modules - TOP_LEVEL_MODULES
    stale = TOP_LEVEL_MODULES - on_disk_modules
    if missing or stale:
        print("error: TOP_LEVEL_MODULES drifted from workspace/*.py contents:", file=sys.stderr)
        if missing:
            print(f" in workspace/ but NOT in TOP_LEVEL_MODULES (will ship un-rewritten): {sorted(missing)}", file=sys.stderr)
        if stale:
            print(f" in TOP_LEVEL_MODULES but NOT in workspace/ (no-op, but misleading): {sorted(stale)}", file=sys.stderr)
        print(" Edit scripts/build_runtime_package.py:TOP_LEVEL_MODULES to match.", file=sys.stderr)
        return 3
    # Same drift gate for SUBPACKAGES — catches the inverse class of
    # bug where a workspace/ subdirectory is referenced by main.py
    # (`from lib.pre_stop import ...`) but is either missing from
    # SUBPACKAGES (so the rewriter doesn't qualify the import) or
    # accidentally listed in EXCLUDE_DIRS (so the directory itself
    # isn't shipped). 0.1.16-0.1.19 had `lib` in EXCLUDE_DIRS while
    # main.py imported from it — `ModuleNotFoundError: No module
    # named 'lib'` at every workspace startup.
    on_disk_subpkgs = {
        d.name for d in src.iterdir()
        if d.is_dir()
        and d.name not in EXCLUDE_DIRS
        and d.name not in {"__pycache__"}
        and (d / "__init__.py").exists()
    }
    sub_missing = on_disk_subpkgs - SUBPACKAGES
    sub_stale = SUBPACKAGES - on_disk_subpkgs
    if sub_missing or sub_stale:
        print("error: SUBPACKAGES drifted from workspace/ subdirectories:", file=sys.stderr)
        if sub_missing:
            print(f" in workspace/ but NOT in SUBPACKAGES (will ship un-rewritten or be excluded): {sorted(sub_missing)}", file=sys.stderr)
        if sub_stale:
            print(f" in SUBPACKAGES but NOT in workspace/ (no-op, but misleading): {sorted(sub_stale)}", file=sys.stderr)
        print(" Edit scripts/build_runtime_package.py:SUBPACKAGES + EXCLUDE_DIRS to match.", file=sys.stderr)
        return 3
    pkg_dir = out / "molecule_runtime"
    print(f"[build] source: {src}")
    print(f"[build] output: {out}")
    print(f"[build] package: {pkg_dir}")
    if out.exists():
        shutil.rmtree(out)
    out.mkdir(parents=True)
    py_files = copy_tree_filtered(src, pkg_dir)
    print(f"[build] copied {len(py_files)} .py files")
    # Install plugins_registry/ at the wheel TOP LEVEL so that plugin adapter
    # code (workspace-template-*) can use bare `from plugins_registry import ...`.
    # The molecule-runtime package (molecule_runtime/) also ships it at
    # molecule_runtime/plugins_registry/ (satisfies the rewritten
    # `from molecule_runtime.plugins_registry import ...` in adapter_base.py).
    # Both copies coexist: they serve different import namespaces.
    plugins_src = src / "plugins_registry"
    plugins_dst = out / "plugins_registry"
    if plugins_src.is_dir():
        shutil.copytree(plugins_src, plugins_dst)
        print(f"[build] installed plugins_registry/ at top level (bare-import shim)")
    # Ensure top-level package marker exists. workspace/ doesn't have one
    # (it's not a package in monorepo), but the published artifact must.
    init = pkg_dir / "__init__.py"
    if not init.exists():
        init.write_text('"""Molecule AI workspace runtime."""\n')
    # Touch py.typed so type-checkers in adapter consumers see the package
    # as typed. Empty file is the convention.
    (pkg_dir / "py.typed").touch()
    # Rewrite imports in every .py file we copied + the new __init__.py.
    regex = build_import_rewriter()
    rewrites = 0
    for f in [*py_files, init]:
        original = f.read_text()
        rewritten = rewrite_imports(original, regex)
        if rewritten != original:
            f.write_text(rewritten)
            rewrites += 1
    print(f"[build] rewrote imports in {rewrites} files")
    # Emit pyproject.toml + README at build root.
    (out / "pyproject.toml").write_text(PYPROJECT_TEMPLATE.format(version=args.version))
    (out / "README.md").write_text(README_TEMPLATE)
    print(f"[build] done. To publish:")
    print(f" cd {out}")
    print(f" python -m build")
    print(f" python -m twine upload dist/*")
    return 0


if __name__ == "__main__":
    sys.exit(main())
-95
@@ -1,95 +0,0 @@
#!/usr/bin/env bash
# check-cascade-list-vs-manifest.sh — structural drift gate for the
# publish-runtime cascade list vs manifest.json workspace_templates.
#
# WHY: PR #2536 pruned the manifest to 4 supported runtimes; PR #2556
# realigned the cascade list to match. The underlying drift hazard
# (cascade-list ≠ manifest) was unguarded — the data fix didn't prevent
# recurrence. This script is the structural gate that does.
#
# Behavior-based per project pattern: derives the expected set from
# manifest.json and the actual set from the workflow YAML, fails on
# any divergence in either direction.
#
# missing-from-cascade → templates in manifest that publish-runtime.yml
# won't auto-rebuild on a new wheel publish
# (the codex-stuck-on-stale-runtime bug class)
# extra-in-cascade → cascade dispatches to deprecated templates
# (the wasted-API-calls + dead-CI-noise class)
#
# Suffix mapping: manifest names map to GHCR repos via
# {name without -default suffix} → molecule-ai-workspace-template-<suffix>
# That's the same map publish-runtime.yml's TEMPLATES variable iterates.
#
# Exit:
# 0 cascade matches manifest exactly
# 1 drift detected (script prints the diff)
# 2 bad usage / missing inputs
set -eu
MANIFEST="${1:-manifest.json}"
WORKFLOW="${2:-.github/workflows/publish-runtime.yml}"
if [ ! -f "$MANIFEST" ]; then
  echo "::error::manifest not found: $MANIFEST" >&2
  exit 2
fi
if [ ! -f "$WORKFLOW" ]; then
  echo "::error::workflow not found: $WORKFLOW" >&2
  exit 2
fi
# Expected cascade entries: manifest workspace_templates → suffix-only
# (strip -default tail, e.g. claude-code-default → claude-code, since
# publish-runtime.yml's TEMPLATES uses suffixes that match the
# molecule-ai-workspace-template-<suffix> repo naming).
EXPECTED=$(jq -r '.workspace_templates[].name' "$MANIFEST" \
  | sed 's/-default$//' \
  | sort -u)
# Actual cascade entries: extract from the TEMPLATES="…" line. We look
# for the line, pull the contents between the quotes, and split into
# one-per-line. Single source of truth in the workflow itself, no
# parallel registry needed.
#
# Why not \s in the regex: BSD sed (macOS) doesn't recognize \s as
# whitespace — treats it as literal `s`. POSIX [[:space:]] works on
# both BSD and GNU sed. Same hazard nuked the original draft of this
# script: \s* matched empty-prefix-of-literal-s, then the leading
# whitespace stayed in the captured group.
ACTUAL=$(grep -E '[[:space:]]*TEMPLATES="' "$WORKFLOW" \
  | head -1 \
  | sed -E 's/^[[:space:]]*TEMPLATES="([^"]*)".*$/\1/' \
  | tr ' ' '\n' \
  | grep -v '^$' \
  | sort -u)
if [ -z "$ACTUAL" ]; then
  echo "::error::could not extract TEMPLATES=\"…\" from $WORKFLOW — has the variable name or quoting changed?" >&2
  exit 2
fi
MISSING=$(comm -23 <(printf '%s\n' "$EXPECTED") <(printf '%s\n' "$ACTUAL"))
EXTRA=$(comm -13 <(printf '%s\n' "$EXPECTED") <(printf '%s\n' "$ACTUAL"))
if [ -z "$MISSING" ] && [ -z "$EXTRA" ]; then
echo "✓ cascade list matches manifest workspace_templates ($(echo "$EXPECTED" | wc -l | tr -d ' ') entries)"
exit 0
fi
echo "::error::cascade list drift detected between $MANIFEST and $WORKFLOW" >&2
echo "" >&2
if [ -n "$MISSING" ]; then
echo " Templates in manifest but MISSING from cascade (won't auto-rebuild on wheel publish):" >&2
echo "$MISSING" | sed 's/^/ - /' >&2
echo "" >&2
fi
if [ -n "$EXTRA" ]; then
echo " Templates in cascade but NOT in manifest (deprecated, wasting dispatch calls):" >&2
echo "$EXTRA" | sed 's/^/ - /' >&2
echo "" >&2
fi
echo " Fix: edit the TEMPLATES=\"…\" line in $WORKFLOW so the set matches" >&2
echo " manifest.json's workspace_templates (suffix-stripped). See PR #2556 for context." >&2
exit 1
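The drift check above boils down to a two-way set difference. The following is a minimal Python sketch of the same comparison, not the script's actual implementation; the function names (`expected_suffixes`, `actual_suffixes`, `cascade_drift`) are hypothetical, and unlike the `head -1` pipeline it simply takes the first line that starts with `TEMPLATES="`.

```python
import re


def expected_suffixes(manifest: dict) -> set:
    """Manifest template names with any trailing -default stripped."""
    names = (t["name"] for t in manifest["workspace_templates"])
    return {re.sub(r"-default$", "", name) for name in names}


def actual_suffixes(workflow_text: str) -> set:
    """Whitespace-separated entries of the first TEMPLATES="..." line."""
    match = re.search(r'^[ \t]*TEMPLATES="([^"]*)"', workflow_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError('could not find TEMPLATES="..." in workflow text')
    return set(match.group(1).split())


def cascade_drift(manifest: dict, workflow_text: str):
    """Return (missing, extra): names only in the manifest, names only in the cascade."""
    expected = expected_suffixes(manifest)
    actual = actual_suffixes(workflow_text)
    return sorted(expected - actual), sorted(actual - expected)
```

An empty pair of lists corresponds to exit code 0 in the shell version; anything else corresponds to exit code 1.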
@@ -1,201 +0,0 @@
"""Tests for scripts/build_runtime_package.py — the wheel-build import rewriter.
Run locally: ``python3 -m unittest scripts/test_build_runtime_package.py -v``
Why this exists: PR #2433 shipped ``import inbox as _inbox_module`` inside
the workspace runtime, and the rewriter expanded it to
``import molecule_runtime.inbox as inbox as _inbox_module`` — invalid
Python. The wheel-smoke gate caught it post-merge but couldn't block
the merge (not a required check yet — see PR #2439). PR #2436 added a
build-time gate that raises ``ValueError`` on this pattern; this file
locks the rewriter's documented contract under unit test so the gate
itself can't silently regress.
Coverage:
- ``import X`` → ``import molecule_runtime.X as X``
- ``import X.sub`` → ``import molecule_runtime.X.sub``
- ``import X`` + trailing comment is preserved
- ``from X import Y`` → ``from molecule_runtime.X import Y``
- ``from X.sub import Y`` → ``from molecule_runtime.X.sub import Y``
- ``from X import Y, Z`` → ``from molecule_runtime.X import Y, Z``
- ``import X as Y`` → raises ValueError (the rewriter would
produce ``import molecule_runtime.X as X as Y``, syntax error)
- non-allowlist module names → not rewritten (regex anchors on the closed set)
- Indented imports (inside def/class) keep their indentation.
"""
from __future__ import annotations
import os
import sys
import unittest
# scripts/build_runtime_package.py lives at scripts/ — add scripts/ to sys.path
# so the import works whether unittest is invoked from repo root or scripts/.
HERE = os.path.dirname(os.path.abspath(__file__))
if HERE not in sys.path:
sys.path.insert(0, HERE)
import build_runtime_package as M # noqa: E402
def rewrite(text: str) -> str:
"""Run the rewriter end-to-end so the test exercises the same path
used by the wheel build (regex compile + substitution)."""
regex = M.build_import_rewriter()
return M.rewrite_imports(text, regex)
class TestBareImportRewriting(unittest.TestCase):
def test_plain_import_aliases_to_preserve_binding(self):
self.assertEqual(
rewrite("import inbox\n"),
"import molecule_runtime.inbox as inbox\n",
)
def test_plain_import_with_trailing_comment_is_preserved(self):
# Real-world shape from a2a_mcp_server.py — the comment must
# survive the rewrite without losing its leading-space buffer.
self.assertEqual(
rewrite("import inbox # noqa: E402\n"),
"import molecule_runtime.inbox as inbox # noqa: E402\n",
)
def test_import_dotted_keeps_dotted_form(self):
# `import X.sub` is rare for our modules but the rewriter must
# not double-alias — we want `import molecule_runtime.X.sub`,
# not `import molecule_runtime.X.sub as X.sub` (invalid).
self.assertEqual(
rewrite("import platform_tools.registry\n"),
"import molecule_runtime.platform_tools.registry\n",
)
def test_indented_import_preserves_indentation(self):
src = "def foo():\n import inbox\n return inbox.x\n"
out = rewrite(src)
self.assertIn(" import molecule_runtime.inbox as inbox\n", out)
class TestFromImportRewriting(unittest.TestCase):
def test_from_module_import_simple(self):
self.assertEqual(
rewrite("from inbox import InboxState\n"),
"from molecule_runtime.inbox import InboxState\n",
)
def test_from_dotted_import(self):
self.assertEqual(
rewrite("from platform_tools.registry import TOOLS\n"),
"from molecule_runtime.platform_tools.registry import TOOLS\n",
)
def test_from_import_multiple_symbols(self):
# Multi-import statement — the rewriter only touches the module
# prefix, not the names being imported.
self.assertEqual(
rewrite("from a2a_tools import (foo, bar, baz)\n"),
"from molecule_runtime.a2a_tools import (foo, bar, baz)\n",
)
def test_from_import_block_form(self):
src = (
"from a2a_tools import (\n"
" tool_check_task_status,\n"
" tool_commit_memory,\n"
")\n"
)
out = rewrite(src)
self.assertIn("from molecule_runtime.a2a_tools import (\n", out)
# Trailing names + closer are unchanged.
self.assertIn(" tool_check_task_status,\n", out)
self.assertIn(")\n", out)
class TestImportAsAliasRejection(unittest.TestCase):
"""The key regression class — the failure mode that shipped in PR #2433."""
def test_import_as_alias_raises_value_error(self):
with self.assertRaises(ValueError) as ctx:
rewrite("import inbox as _inbox_module\n")
msg = str(ctx.exception)
# Error must name the offending module + suggest the fix.
self.assertIn("inbox", msg)
self.assertIn("as <alias>", msg)
self.assertIn("from", msg) # suggests `from X import …`
def test_import_as_alias_indented_still_rejected(self):
# Indented (inside def/class) — same hazard, same rejection.
with self.assertRaises(ValueError):
rewrite("def foo():\n import inbox as _x\n")
def test_import_as_alias_with_trailing_comment_still_rejected(self):
with self.assertRaises(ValueError):
rewrite("import inbox as _x # comment\n")
def test_plain_import_with_as_in_comment_does_not_trip(self):
# The detection strips comments before pattern-matching, so a
# comment containing "as foo" must NOT trigger the rejection.
self.assertEqual(
rewrite("import inbox # rewriter produces alias as inbox\n"),
"import molecule_runtime.inbox as inbox # rewriter produces alias as inbox\n",
)
def test_import_followed_by_comma_is_not_an_alias(self):
# `import inbox, os` — comma is not `as`, must not be rejected.
# Our regex captures `inbox` then `,` — only `inbox` gets prefixed.
# `os` is not in TOP_LEVEL_MODULES so it's left alone.
out = rewrite("import inbox, os\n")
# The first module is rewritten; the second (non-allowlist) is not.
self.assertIn("import molecule_runtime.inbox as inbox", out)
class TestOutsideAllowlistModules(unittest.TestCase):
def test_third_party_imports_unchanged(self):
# `httpx`, `os`, `re` etc. are not in TOP_LEVEL_MODULES — the
# regex must not match them. This is the closed-list invariant
# that prevents accidental rewrites of stdlib / third-party.
src = "import httpx\nimport os\nfrom re import match\n"
self.assertEqual(rewrite(src), src)
def test_short_name_collision_avoided(self):
# `from a2a.server.X import Y` must not match the bare `a2a`
# prefix — `a2a` isn't in our allowlist (we allow `a2a_tools`,
# `a2a_client`, etc., but not bare `a2a`). Belt-and-suspenders.
src = "from a2a.server.routes import create_agent_card_routes\n"
self.assertEqual(rewrite(src), src)
class TestEndToEndShape(unittest.TestCase):
"""Reproduces the PR #2433 → #2436 incident shape."""
def test_pr_2433_pattern_now_rejected(self):
# The exact line PR #2433 added (inside main()), which produced
# `import molecule_runtime.inbox as inbox as _inbox_module` —
# invalid syntax in the published wheel.
with self.assertRaises(ValueError) as ctx:
rewrite(
" import inbox as _inbox_module\n"
" _inbox_module.set_notification_callback(_on_inbox_message)\n"
)
# Error message includes the offending line so the operator
# knows exactly where to fix.
self.assertIn("inbox", str(ctx.exception))
def test_pr_2436_fix_pattern_works(self):
# The fix-forward shape (#2436): top-level `import inbox`,
# bridge wired in main() via `inbox.set_notification_callback`.
src = (
"import inbox\n"
"\n"
"def main():\n"
" inbox.set_notification_callback(cb)\n"
)
out = rewrite(src)
self.assertIn("import molecule_runtime.inbox as inbox\n", out)
# The callable reference inside main() is left alone — only
# imports get rewritten, not arbitrary `inbox.foo` callsites
# (those resolve via the module binding the rewrite preserves).
self.assertIn(" inbox.set_notification_callback(cb)\n", out)
if __name__ == "__main__":
unittest.main()
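For readers without the build script at hand, the contract these tests lock in can be sketched as a line-by-line rewriter. This is an illustrative approximation only: the real `build_runtime_package.py` is not shown in this diff, and the allowlist contents, the `PREFIX` constant, and the single-function `rewrite_imports` signature here are assumptions.

```python
import re

# Hypothetical allowlist; the real script keeps its own closed set.
TOP_LEVEL_MODULES = ("inbox", "a2a_tools", "platform_tools")
PREFIX = "molecule_runtime"

_NAMES = "|".join(re.escape(name) for name in TOP_LEVEL_MODULES)
# Match `import X` / `from X` at the start of a line (after indentation),
# where X is an allowlisted module, optionally followed by dotted submodules.
# The lookahead stops `inbox` from matching inside `inbox_extra`.
_IMPORT_RE = re.compile(
    rf"(?P<indent>[ \t]*)(?P<kw>import|from)[ \t]+"
    rf"(?P<mod>(?:{_NAMES})(?:\.\w+)*)(?![\w.])"
)


def rewrite_imports(text: str) -> str:
    out = []
    for line in text.splitlines(keepends=True):
        match = _IMPORT_RE.match(line)
        if match is None:
            out.append(line)  # not an allowlisted import: leave untouched
            continue
        indent, kw, mod = match.group("indent", "kw", "mod")
        rest = line[match.end():]
        if kw == "from":
            out.append(f"{indent}from {PREFIX}.{mod}{rest}")
            continue
        # Ignore comments when looking for an alias.
        code = rest.split("#", 1)[0]
        if re.match(r"[ \t]+as\b", code):
            raise ValueError(
                f"cannot rewrite 'import {mod} as <alias>'; "
                f"use 'from {mod} import ...' or a plain import: {line.strip()!r}"
            )
        if "." in mod:
            out.append(f"{indent}import {PREFIX}.{mod}{rest}")
        else:
            # Alias back to the bare name so existing `X.attr` call sites resolve.
            out.append(f"{indent}import {PREFIX}.{mod} as {mod}{rest}")
    return "".join(out)
```

Rejecting the aliased form up front, rather than trying to rewrite it, mirrors the gate described in the module docstring: the build fails loudly instead of shipping a wheel with a syntax error.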
@@ -9,7 +9,7 @@ This repo uses the standard monorepo testing convention: **unit tests live with
| Go unit + integration (platform, CLI, handlers) | `workspace-server/**/*_test.go` — run with `cd workspace-server && go test -race ./...` |
| TypeScript unit (canvas components, hooks, store) | `canvas/src/**/__tests__/` — run with `cd canvas && npm test -- --run` |
| TypeScript unit (MCP server handlers) | `mcp-server/src/__tests__/` — run with `cd mcp-server && npx jest` |
-| Python unit (workspace runtime, adapters) | `workspace/tests/` — run with `cd workspace && python3 -m pytest` |
+| Python unit (workspace runtime, adapters) | `molecule-ai-workspace-runtime/tests/` in the standalone runtime repo |
| Python unit (SDK: plugin + remote agent) | `sdk/python/tests/` — run with `cd sdk/python && python3 -m pytest` |
| **Cross-component E2E** (spans platform + runtime + HTTP) | `tests/e2e/` **you are here** |
@@ -283,7 +283,7 @@ claude --dangerously-load-development-channels \
// externalUniversalMcpTemplate — runtime-agnostic standalone path.
// Ships as the `molecule-mcp` console script in the
-// molecule-ai-workspace-runtime PyPI wheel (workspace/mcp_cli.py).
+// molecule-ai-workspace-runtime wheel published to the Gitea package registry.
// Any MCP-aware runtime (Claude Code, hermes, codex, third-party)
// registers it once and gets the same 8 universal tools that
// container-bound runtimes use today: delegate_task, list_peers,
@@ -322,7 +322,7 @@ const externalUniversalMcpTemplate = `# Universal MCP — standalone register +
# 1. Install the workspace runtime wheel (once per machine — safe to
# re-run; subsequent workspaces share the same wheel):
-pip install molecule-ai-workspace-runtime
+pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ molecule-ai-workspace-runtime
# 2. Wire molecule-mcp into your agent's MCP config. Claude Code:
# NOTE the server name is workspace-specific ("{{MCP_SERVER_NAME}}") so
@@ -344,7 +344,7 @@ claude mcp add {{MCP_SERVER_NAME}} -s user -- env \
# needed when calling tools through the MCP server.
# Need help?
-# Where to install: https://pypi.org/project/molecule-ai-workspace-runtime/
+# Where to install: https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/molecule-ai-workspace-runtime/
# Documentation: https://doc.moleculesai.app/docs/guides/mcp-server-setup
# Common errors:
# • "Tools not appearing in your agent" — run ` + "`claude mcp list`" + ` (or
@@ -359,8 +359,8 @@ claude mcp add {{MCP_SERVER_NAME}} -s user -- env \
`
// externalPythonTemplate uses molecule-sdk-python's RemoteAgentClient +
-// A2AServer (PR #13 in that repo). Until the SDK cuts a v0.y release
-// to PyPI the snippet pins git+main.
+// A2AServer. Until the SDK is published to the Gitea package registry the
+// snippet pins git+main.
const externalPythonTemplate = `# pip install 'git+https://git.moleculesai.app/molecule-ai/molecule-sdk-python.git@main'
import asyncio
@@ -396,7 +396,7 @@ if __name__ == "__main__":
asyncio.run(main())
# Need help?
-# Where to install: https://pypi.org/project/molecule-ai-workspace-runtime/
+# Where to install: https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/molecule-ai-workspace-runtime/
# Documentation: https://doc.moleculesai.app/docs/guides/external-agent-registration
# Common errors:
# • 401 from /heartbeat — AUTH_TOKEN expired or wrong workspace_id.
@@ -445,7 +445,7 @@ const externalHermesChannelTemplate = `# Hermes channel — bridges this workspa
# also supported via the plugin's dual-mode fallback.
#
# 1. Install the runtime + plugin:
-pip install molecule-ai-workspace-runtime
+pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ molecule-ai-workspace-runtime
pip install 'git+https://git.moleculesai.app/molecule-ai/hermes-channel-molecule.git'
# 2. Export the workspace credentials:
@@ -528,7 +528,7 @@ const externalCodexTemplate = `# Codex external setup — outbound tools (MCP) +
# 1. Install codex CLI, the workspace runtime, and the bridge daemon:
npm install -g @openai/codex@latest
-pip install molecule-ai-workspace-runtime
+pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ molecule-ai-workspace-runtime
pip install codex-channel-molecule
# 2. Wire the molecule MCP server into codex's config.toml — this is
@@ -620,7 +620,7 @@ const externalKimiTemplate = `# Kimi CLI external setup — register + heartbeat
# No public URL needed; runs behind NAT in poll mode.
# 1. Install the workspace runtime wheel (provides HTTP client):
-pip install molecule-ai-workspace-runtime
+pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ molecule-ai-workspace-runtime
# 2. Save credentials and the bridge script:
mkdir -p ~/.molecule-ai/kimi-{{MCP_SERVER_NAME}}
@@ -779,7 +779,7 @@ const externalOpenClawTemplate = `# OpenClaw MCP config — outbound tool path.
# (register-on-startup + 20s heartbeat). Older versions only ship
# a2a_mcp_server which does not heartbeat.
npm install -g openclaw@latest
-pip install "molecule-ai-workspace-runtime>=0.1.999"
+pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ "molecule-ai-workspace-runtime>=0.1.999"
# 2. Onboard openclaw against your model provider (one-time setup).
# --non-interactive needs an explicit --provider + --model so it
@@ -1,13 +0,0 @@
# coverage.py config — consumed by `pytest --cov` via the pytest-cov
# plugin. Lives here (not in pytest.ini) because coverage.py only reads
# .coveragerc / setup.cfg / tox.ini / pyproject.toml — the [coverage:*]
# sections in pytest.ini are silently ignored. See issue #1817.
[run]
omit =
*/tests/*
*/__init__.py
plugins_registry/*
[report]
# Skip files at 100% in the term-missing output to keep CI logs readable.
skip_covered = True
@@ -1,104 +0,0 @@
FROM python:3.11-slim@sha256:e78299e55776ca065dcb769f80161f48465ad352014240eb5fe4712e22505e9b
WORKDIR /app
# Install Node.js, git, gh CLI in a single layer to minimize image size
RUN apt-get update && \
apt-get install -y --no-install-recommends curl git ca-certificates && \
# Node.js 22
curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
apt-get install -y --no-install-recommends nodejs && \
# GitHub CLI
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \
| dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg && \
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
> /etc/apt/sources.list.d/github-cli.list && \
apt-get update && apt-get install -y --no-install-recommends gh && \
# Cleanup apt caches and temp files
apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Create non-root user (claude --dangerously-skip-permissions refuses root)
RUN useradd -m -s /bin/bash agent
# Install base Python dependencies (A2A SDK + HTTP only)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy runtime code (adapters/ has been removed — adapters now live in standalone
# template repos and install molecule-ai-workspace-runtime from PyPI)
COPY *.py ./
COPY entrypoint.sh ./
COPY skill_loader/ ./skill_loader/
COPY builtin_tools/ ./builtin_tools/
COPY plugins_registry/ ./plugins_registry/
COPY policies/ ./policies/
# Create CLI aliases
RUN ln -s /app/a2a_cli.py /usr/local/bin/a2a && chmod +x /app/a2a_cli.py /app/a2a_mcp_server.py && \
ln -s /app/molecule_ai_status.py /usr/local/bin/molecule-monorepo-status && chmod +x /app/molecule_ai_status.py
# gh wrapper — auto-prefixes PR / issue titles with the agent role + appends
# a body footer. Every agent in the template shares one GitHub PAT so plain
# `gh pr list` can't distinguish workspaces; the wrapper reads GIT_AUTHOR_NAME
# (set by the platform provisioner, "Molecule AI <Role>") and rewrites the
# title/body accordingly. Fails open when the env is missing. Anything that
# isn't `gh pr create` or `gh issue create` passes through untouched.
# /usr/local/bin is earlier in PATH than /usr/bin/gh so this shadows the
# real binary without renaming it.
COPY scripts/gh-wrapper.sh /usr/local/bin/gh
RUN chmod +x /usr/local/bin/gh
# Copy the git credential helper so entrypoint.sh can register it at boot.
# molecule-git-token-helper.sh fetches a fresh GitHub App installation token
# from the platform on every git push/fetch, preventing stale-token failures
# after the ~60 min GitHub App token TTL (issue #613 / #547).
COPY scripts/molecule-git-token-helper.sh ./scripts/
RUN chmod +x ./scripts/molecule-git-token-helper.sh
# Copy the background token refresh daemon. Runs as a background process
# started by entrypoint.sh — refreshes gh CLI auth and the credential
# helper cache every 45 min so tokens never expire mid-operation.
COPY scripts/molecule-gh-token-refresh.sh ./scripts/
RUN chmod +x ./scripts/molecule-gh-token-refresh.sh
# Generic GIT_ASKPASS helper. Reads HTTPS Basic-Auth credentials from env
# vars (GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD, with GITEA_USER / GITEA_TOKEN
# as fallback) and emits them on the git credential-prompt protocol so
# container-side `git` can authenticate to any private HTTPS remote
# without on-disk .gitconfig / .git-credentials mutation. The platform
# provisioner sets GIT_ASKPASS=/usr/local/bin/molecule-askpass via
# applyAgentGitIdentity (workspace-server/internal/handlers/agent_git_identity.go).
# Filename is the only project-specific marker; the script body contains
# no vendor literals and is identical to the script shipped in each
# open-source workspace template (scripts/git-askpass.sh).
COPY scripts/molecule-askpass /usr/local/bin/molecule-askpass
RUN chmod +x /usr/local/bin/molecule-askpass
# Dirs and permissions
RUN mkdir -p /workspace /plugins /home/agent/.claude /home/agent/.config /home/agent/.local \
/home/agent/.molecule-token-cache && \
chown -R agent:agent /app /home/agent /workspace
# Install gosu for clean root → agent user handoff in entrypoint.
# The entrypoint starts as root to fix volume ownership, then exec's
# as the agent user so Claude Code's --dangerously-skip-permissions works.
RUN apt-get update && apt-get install -y --no-install-recommends gosu && \
rm -rf /var/lib/apt/lists/*
VOLUME /configs
VOLUME /workspace
EXPOSE 8000
# HEALTHCHECK: probe the A2A agent-card endpoint so orchestrators and
# container runtimes can detect a live, responsive workspace agent.
# Uses curl (installed by the apt-get layer above; the python:3.11-slim base does not ship it) against the uvicorn server.
# PORT is injected at runtime via the molecule-runtime entrypoint; the
# default matches EXPOSE.
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
CMD curl -sf http://localhost:${PORT:-8000}/agent/card >/dev/null || exit 1
RUN chmod +x /app/entrypoint.sh
# Start as root — entrypoint fixes volume permissions then drops to agent
CMD ["./entrypoint.sh"]
@@ -1 +0,0 @@
# trigger autobump for python-multipart pin (PDF P0 cure)
@@ -1,105 +0,0 @@
"""OFFSEC-003: A2A peer-result sanitization — shared across delegation tools.
This module is intentionally a LEAF (no imports from the molecule-runtime
package) to avoid circular dependency cycles. Both ``a2a_tools_delegation``
and ``a2a_tools`` can import from here without creating import loops.
Trust-boundary design (OFFSEC-003):
A2A peer responses are untrusted third-party content. Before passing
them to the agent context, they MUST be wrapped in a trust-boundary
marker pair so the calling agent knows the content is external.
Boundary markers:
- _A2A_BOUNDARY_START = "[A2A_RESULT_FROM_PEER]"
- _A2A_BOUNDARY_END = "[/A2A_RESULT_FROM_PEER]"
The boundary is the PRIMARY security control. A peer that sends
"[A2A_RESULT_FROM_PEER]evil[/A2A_RESULT_FROM_PEER]safe" can make "safe"
appear inside the trusted context unless the markers themselves are
escaped before wrapping — see _escape_boundary_markers() below.
Defense-in-depth (secondary):
Known prompt-injection control-words are also escaped so that even
if a calling agent ignores the boundary marker, embedded attack
patterns (SYSTEM:, OVERRIDE:, etc.) lose their special meaning.
This is not a complete injection sanitizer — do not rely on it as
the primary control.
"""
from __future__ import annotations
import re
# ── Trust-boundary markers ────────────────────────────────────────────────────
_A2A_BOUNDARY_START = "[A2A_RESULT_FROM_PEER]"
_A2A_BOUNDARY_END = "[/A2A_RESULT_FROM_PEER]"
# ── Boundary-marker escaping ─────────────────────────────────────────────────
# A peer that sends "[/A2A_RESULT_FROM_PEER]evil" can make "evil" appear
# inside the trusted zone. Escape BOTH boundary markers in the raw text
# before wrapping so they can never close the boundary early.
# We use "[/ " as the escape prefix — visually distinct from the real marker.
_A2A_BOUNDARY_START_ESCAPED = "[/ A2A_RESULT_FROM_PEER]"
_A2A_BOUNDARY_END_ESCAPED = "[/ /A2A_RESULT_FROM_PEER]"
def _escape_boundary_markers(text: str) -> str:
"""Escape boundary markers inside the raw peer text before wrapping.
Replaces any occurrence of the boundary start/end markers with a
visually-similar escaped form so a malicious peer can never close
the boundary early or inject a fake opener.
"""
return (
text.replace(_A2A_BOUNDARY_START, _A2A_BOUNDARY_START_ESCAPED)
.replace(_A2A_BOUNDARY_END, _A2A_BOUNDARY_END_ESCAPED)
)
# ── Defense-in-depth: injection pattern escaping ───────────────────────────────
# These patterns cover common prompt-injection phrasings. They are NOT a
# complete sanitizer — see module docstring. The boundary marker is the
# primary control; these are purely defense-in-depth.
_INJECTION_PATTERNS = [
# Single-word patterns: anchor to word boundary so they don't match
# inside other words (e.g. "SYSTEM" in "mySYSTEMatic").
(re.compile(r"(^|[^\w])SYSTEM\b", re.IGNORECASE), r"\1[ESCAPED_SYSTEM]"),
(re.compile(r"(^|[^\w])OVERRIDE\b", re.IGNORECASE), r"\1[ESCAPED_OVERRIDE]"),
# "INSTRUCTIONS" may appear at the start of a string or after a newline.
(re.compile(r"(^|\n)INSTRUCTIONS?\b", re.IGNORECASE), " [ESCAPED_INSTRUCTIONS]"),
(re.compile(r"(^|[^\w])IGNORE\s+ALL\b", re.IGNORECASE), r"\1[ESCAPED_IGNORE_ALL]"),
(re.compile(r"(^|[^\w])YOU\s+ARE\s+NOW\b", re.IGNORECASE), r"\1[ESCAPED_YOU_ARE_NOW]"),
]
def sanitize_a2a_result(text: str) -> str:
"""Sanitize untrusted text from an A2A peer (OFFSEC-003).
Order of operations:
1. Escape boundary markers in the raw text (prevents injection).
2. Escape known injection patterns (defense-in-depth).
Returns the input unchanged if it is empty/None.
Note: this function does NOT add boundary wrappers — callers that need
to establish a trust boundary should wrap the sanitized result with
``[A2A_RESULT_FROM_PEER]\\n{sanitized}\\n[/A2A_RESULT_FROM_PEER]``.
See ``a2a_tools_delegation.py:tool_delegate_task`` for the canonical
wrapping pattern.
"""
if not text:
return text
# 1. Escape boundary markers so a malicious peer cannot break the
# trust boundary from inside their response.
escaped = _escape_boundary_markers(text)
# 2. Escape known injection control-words (defense-in-depth only).
for pattern, replacement in _INJECTION_PATTERNS:
escaped = pattern.sub(replacement, escaped)
return escaped
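The docstring above leaves the boundary wrapping to callers. A minimal sketch of that caller-side step might look like the following; it covers only the primary control (marker escaping plus wrapping) and omits the defense-in-depth patterns. `wrap_peer_result` is a hypothetical name, not the function used in `a2a_tools_delegation.py`.

```python
BOUNDARY_START = "[A2A_RESULT_FROM_PEER]"
BOUNDARY_END = "[/A2A_RESULT_FROM_PEER]"


def escape_markers(text: str) -> str:
    """Neutralize any boundary markers the peer embedded in its own text."""
    return (
        text.replace(BOUNDARY_START, "[/ A2A_RESULT_FROM_PEER]")
        .replace(BOUNDARY_END, "[/ /A2A_RESULT_FROM_PEER]")
    )


def wrap_peer_result(text: str) -> str:
    """Escape first, then wrap, so exactly one real marker pair survives."""
    return f"{BOUNDARY_START}\n{escape_markers(text)}\n{BOUNDARY_END}"
```

The order matters: wrapping before escaping would rewrite the genuine markers as well, leaving no boundary at all.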
@@ -1,251 +0,0 @@
#!/usr/bin/env python3
"""A2A CLI — command-line tools for inter-workspace communication.
Supports both synchronous and asynchronous delegation:
a2a delegate <id> <task> — Send task, wait for response (sync)
a2a delegate --async <id> <task> — Send task, return task ID immediately
a2a status <task_id> — Check task status / get result
a2a peers — List available peers
a2a info — Show this workspace's info
Environment variables:
WORKSPACE_ID — this workspace's ID
PLATFORM_URL — platform API base URL
"""
import asyncio
import json
import os
import sys
import uuid
import httpx
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
if not _WORKSPACE_ID_raw:
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
WORKSPACE_ID = _WORKSPACE_ID_raw
# Platform URL: defaults to host.docker.internal, the address at which a
# workspace container reaches the platform API over the Docker network mesh.
# Set PLATFORM_URL to override the default.
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
async def discover(target_id: str) -> dict | None:
"""Discover a peer workspace's URL."""
async with httpx.AsyncClient(timeout=30.0) as client:
resp = await client.get(
f"{PLATFORM_URL}/registry/discover/{target_id}",
headers={"X-Workspace-ID": WORKSPACE_ID},
)
if resp.status_code == 200:
return resp.json()
return None
async def delegate(target_id: str, task: str, async_mode: bool = False):
"""Delegate a task to another workspace."""
peer = await discover(target_id)
if not peer:
print(f"Error: cannot reach workspace {target_id} (access denied or offline)", file=sys.stderr)
sys.exit(1)
target_url = peer.get("url", "")
if not target_url:
print(f"Error: workspace {target_id} has no URL", file=sys.stderr)
sys.exit(1)
task_id = str(uuid.uuid4())
if async_mode:
# Async: submit the task and return as soon as the peer acknowledges it.
# No background task is spawned; the POST below is awaited with a short timeout.
async with httpx.AsyncClient(timeout=10.0) as client:
try:
# Send with a short timeout — just confirm receipt
resp = await client.post(
target_url,
json={
"jsonrpc": "2.0",
"id": task_id,
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": str(uuid.uuid4()),
"parts": [{"kind": "text", "text": task}],
}
},
},
)
# Even if we timeout, the task is queued on the target
print(json.dumps({
"task_id": task_id,
"target": target_id,
"status": "submitted",
"target_url": target_url,
}))
except httpx.TimeoutException:
# Request was sent but we didn't get confirmation — task may or may not have been received
print(json.dumps({
"task_id": task_id,
"target": target_id,
"status": "uncertain",
"note": "Request sent but response timed out — delivery unconfirmed. Use 'a2a status' to check.",
}), file=sys.stderr)
return
# Sync: wait for full response with retry on rate limit
max_retries = 3
for attempt in range(max_retries):
async with httpx.AsyncClient(timeout=300.0) as client:
try:
resp = await client.post(
target_url,
json={
"jsonrpc": "2.0",
"id": task_id,
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": str(uuid.uuid4()),
"parts": [{"kind": "text", "text": task}],
}
},
},
)
try:
data = resp.json()
except Exception:
print(f"Error: invalid JSON response (status {resp.status_code})", file=sys.stderr)
sys.exit(1)
if "result" in data:
parts = data["result"].get("parts", [])
text = parts[0].get("text", "") if parts else ""
if text and text != "(no response generated)":
print(text)
return
# Empty or no-response — might be rate limited, retry
if attempt < max_retries - 1:
delay = 5 * (2 ** attempt)
print(f"(empty response, retrying in {delay}s...)", file=sys.stderr)
await asyncio.sleep(delay)
continue
print(text or "(no response after retries)")
elif "error" in data:
error_msg = data['error'].get('message', 'unknown')
if ("rate" in error_msg.lower() or "overloaded" in error_msg.lower()) and attempt < max_retries - 1:
delay = 5 * (2 ** attempt)
print(f"(rate limited, retrying in {delay}s...)", file=sys.stderr)
await asyncio.sleep(delay)
continue
print(f"Error: {error_msg}", file=sys.stderr)
sys.exit(1)
return
except httpx.TimeoutException:
if attempt < max_retries - 1:
delay = 5 * (2 ** attempt)
print(f"(timeout, retrying in {delay}s...)", file=sys.stderr)
await asyncio.sleep(delay)
continue
print("Error: request timed out after retries", file=sys.stderr)
sys.exit(1)
async def check_status(target_id: str, task_id: str):
"""Check the status of an async task."""
peer = await discover(target_id)
if not peer:
print(f"Error: cannot reach workspace {target_id}", file=sys.stderr)
sys.exit(1)
target_url = peer.get("url", "")
async with httpx.AsyncClient(timeout=30.0) as client:
resp = await client.post(
target_url,
json={
"jsonrpc": "2.0",
"id": str(uuid.uuid4()),
"method": "tasks/get",
"params": {"id": task_id},
},
)
data = resp.json()
if "result" in data:
task = data["result"]
status = task.get("status", {}).get("state", "unknown")
print(f"Status: {status}")
if status == "completed":
artifacts = task.get("artifacts", [])
for a in artifacts:
for p in a.get("parts", []):
if p.get("text"):
print(p["text"])
elif "error" in data:
print(f"Error: {data['error'].get('message', 'unknown')}", file=sys.stderr)
async def peers():
"""List available peers."""
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.get(f"{PLATFORM_URL}/registry/{WORKSPACE_ID}/peers")
if resp.status_code != 200:
print("Error: could not fetch peers", file=sys.stderr)
sys.exit(1)
for p in resp.json():
status = p.get("status", "?")
role = p.get("role", "")
print(f"{p['id']} {p['name']:30s} {status:10s} {role}")
async def info():
"""Get this workspace's info."""
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.get(f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}")
if resp.status_code == 200:
d = resp.json()
print(f"ID: {d['id']}")
print(f"Name: {d['name']}")
print(f"Role: {d.get('role', '')}")
print(f"Tier: {d['tier']}")
print(f"Status: {d['status']}")
print(f"Parent: {d.get('parent_id', '(root)')}")
def main():
if len(sys.argv) < 2:
print("Usage: a2a <command> [args]")
print("Commands:")
print(" delegate <workspace_id> <task> — Send task, wait for response")
print(" delegate --async <workspace_id> <task> — Send task, return immediately")
print(" status <workspace_id> <task_id> — Check async task status")
print(" peers — List available peers")
print(" info — Show workspace info")
sys.exit(1)
cmd = sys.argv[1]
if cmd == "delegate":
async_mode = "--async" in sys.argv
args = [a for a in sys.argv[2:] if a != "--async"]
if len(args) < 2:
print("Usage: a2a delegate [--async] <workspace_id> <task>", file=sys.stderr)
sys.exit(1)
asyncio.run(delegate(args[0], " ".join(args[1:]), async_mode))
elif cmd == "status":
if len(sys.argv) < 4:
print("Usage: a2a status <workspace_id> <task_id>", file=sys.stderr)
sys.exit(1)
asyncio.run(check_status(sys.argv[2], sys.argv[3]))
elif cmd == "peers":
asyncio.run(peers())
elif cmd == "info":
asyncio.run(info())
else:
print(f"Unknown command: {cmd}", file=sys.stderr)
sys.exit(1)
if __name__ == "__main__": # pragma: no cover
main()
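The CLI above builds the same JSON-RPC `message/send` envelope in both the sync and async paths and repeats the `5 * (2 ** attempt)` backoff in three places. As a sketch of how those two pieces could be factored out (the helper names `build_message_send` and `retry_delays` are hypothetical and do not appear in the deleted file):

```python
import uuid


def build_message_send(task: str, task_id: str = "") -> dict:
    """Build the JSON-RPC 2.0 envelope the CLI posts to a peer workspace."""
    return {
        "jsonrpc": "2.0",
        "id": task_id or str(uuid.uuid4()),
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "messageId": str(uuid.uuid4()),
                "parts": [{"kind": "text", "text": task}],
            }
        },
    }


def retry_delays(max_retries: int = 3, base: int = 5) -> list:
    """Seconds slept between attempts; the final attempt is never followed by a sleep."""
    return [base * (2 ** attempt) for attempt in range(max_retries - 1)]
```

With the CLI's `max_retries = 3`, the sync path waits 5 s and then 10 s before giving up on the third attempt.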
@@ -1,803 +0,0 @@
"""A2A protocol client — peer discovery, messaging, and workspace info.
Shared constants (WORKSPACE_ID, PLATFORM_URL) live here so that
a2a_tools and a2a_mcp_server can import them from a single place.
"""
import asyncio
import logging
import os
import random
import re
import threading
import time
import uuid
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
import httpx
import a2a_response
from platform_auth import auth_headers, self_source_headers
logger = logging.getLogger(__name__)
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
if not _WORKSPACE_ID_raw:
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
WORKSPACE_ID = _WORKSPACE_ID_raw
# Platform URL: defaults to host.docker.internal, the address at which a
# workspace container reaches the platform API over the Docker network mesh.
# Set PLATFORM_URL to override the default.
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
# Cache workspace ID → name mappings (populated by list_peers calls)
_peer_names: dict[str, str] = {}
# Cache: peer workspace_id → the source workspace_id whose registry
# returned that peer. Populated by ``a2a_tools.tool_list_peers`` whenever
# it queries a specific workspace's peers — so a later
# ``tool_delegate_task(target)`` can auto-route through the correct
# source workspace without the agent having to specify
# ``source_workspace_id`` explicitly.
#
# Single-workspace mode: dict stays empty, all delegations fall through
# to the module-level WORKSPACE_ID (existing behavior).
#
# Multi-workspace mode: as the agent calls list_peers, this map is
# populated with each peer's source. Subsequent delegate_task calls
# auto-route. If a peer is registered under multiple sources (rare —
# e.g. an org-wide capability) the LAST observed source wins; the agent
# can override by passing ``source_workspace_id`` explicitly.
_peer_to_source: dict[str, str] = {}
# Cache workspace ID → full peer record (id, name, role, status, url, ...).
# Populated by tool_list_peers and by the lazy registry lookup in
# enrich_peer_metadata. The notification-callback path (channel envelope
# enrichment) reads this cache on every inbound peer_agent push, so the
# read shape stays a dict-like ``__getitem__`` lookup; entries carry
# their fetched-at timestamp so TTL eviction is in-line with the
# lookup. ``None`` as the record is the negative-cache sentinel:
# registry failure is cached for one TTL window so we don't re-fire
# the 2s-bounded GET on every push from a flaky peer.
#
# OrderedDict + maxsize bound (#2482): pre-fix this was an unbounded
# ``dict``, so a workspace receiving from N distinct peers across its
# lifetime accumulated ~100 bytes/entry × N indefinitely. At 10K peers
# that's ~1 MB; at 100K (a chatty platform-wide router) ~10 MB; not
# crash-class but unbounded. The LRU bound caps memory + the TTL caps
# per-entry staleness — both gates are needed because a runaway poller
# touching N new peer_ids per push could grow within a single TTL
# window.
#
# All reads / writes go through ``_peer_metadata_get`` /
# ``_peer_metadata_set`` so the LRU move-to-end + size-trim invariants
# stay co-located. Direct mutation is allowed only in test fixtures
# (clearing for isolation); production code path uses the helpers.
_PEER_METADATA_MAXSIZE = 1024
_peer_metadata: "OrderedDict[str, tuple[float, dict | None]]" = OrderedDict()
_peer_metadata_lock = threading.Lock()
# How long an entry in ``_peer_metadata`` is treated as fresh. 5 minutes
# is the same window we use for delegation routing — long enough that a
# busy agent receiving repeated pushes from one peer doesn't hit the
# registry on every push, short enough that role/name renames propagate
# within a single agent session.
_PEER_METADATA_TTL_SECONDS = 300.0
def _peer_metadata_get(canon: str) -> tuple[float, dict | None] | None:
"""Read with LRU touch — moves the entry to the most-recently-used
position so steady-state pushes from a busy peer don't get evicted
by a cold-start burst from new peers. Returns the raw tuple shape
callers expect; TTL eviction stays at the call site.
"""
with _peer_metadata_lock:
entry = _peer_metadata.get(canon)
if entry is not None:
_peer_metadata.move_to_end(canon)
return entry
def _peer_metadata_set(canon: str, value: tuple[float, dict | None]) -> None:
"""Write + evict-if-over-maxsize. The eviction is in-process and
cheap (popitem(last=False) on an OrderedDict is O(1)). Holding the
lock across the trim keeps the size invariant stable under concurrent
writes from background enrichment workers.
"""
with _peer_metadata_lock:
_peer_metadata[canon] = value
_peer_metadata.move_to_end(canon)
# Trim the oldest entries until at-or-below maxsize. Because the
# trim runs under the same lock as the write, the bound is a hard
# cap for every other thread: a set called at maxsize evicts the
# LRU entry before returning, so size never exceeds maxsize.
while len(_peer_metadata) > _PEER_METADATA_MAXSIZE:
_peer_metadata.popitem(last=False)
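# Illustrative sketch (not part of this module): the LRU-plus-TTL pattern
# described above, reduced to a standalone example. MAXSIZE, TTL_SECONDS,
# cache_set and cache_get_fresh are hypothetical names; the lock is omitted
# for brevity and the size bound is shrunk to 3 so eviction is easy to see.

```python
from collections import OrderedDict

# Assumed values for illustration only; the module uses 1024 and 300.0.
MAXSIZE = 3
TTL_SECONDS = 300.0

# key -> (fetched_at, record); record may be None (negative-cache sentinel).
cache = OrderedDict()


def cache_set(key, record, now):
    """Write an entry, mark it most-recently-used, then trim to MAXSIZE."""
    cache[key] = (now, record)
    cache.move_to_end(key)
    while len(cache) > MAXSIZE:
        cache.popitem(last=False)  # evict the least-recently-used entry


def cache_get_fresh(key, now):
    """Return (hit, record). hit is False on a miss or once the TTL expired."""
    entry = cache.get(key)
    if entry is None:
        return False, None
    cache.move_to_end(key)  # LRU touch on read
    fetched_at, record = entry
    if now - fetched_at >= TTL_SECONDS:
        return False, None
    return True, record
```

# Both gates matter in the sketch as in the module: the size bound limits how
# many entries exist at once, while the TTL limits how stale any one entry is.
# A cached None inside the TTL window still counts as a hit, which is what
# makes negative caching work.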
# Background-fetch executor for enrich_peer_metadata_nonblocking (#2484).
# A small pool — peers are highly TTL-cached, so the steady-state load
# is "one fetch per peer per 5 minutes." Two workers handle the cold-
# start burst when an agent starts receiving pushes from a new peer for
# the first time without backing up the inbox poller. Note that
# ThreadPoolExecutor workers are non-daemon on Python 3.9+, so process
# exit waits for any in-flight fetch to finish.
_enrich_executor: ThreadPoolExecutor | None = None
_enrich_executor_lock = threading.Lock()
# In-flight peer IDs — guards against a single peer's repeated pushes
# scheduling N concurrent registry fetches before the first one fills
# the cache. Set membership is "a worker is currently fetching this
# peer; subsequent calls should NOT schedule another."
_enrich_in_flight: set[str] = set()
_enrich_in_flight_lock = threading.Lock()
def _get_enrich_executor() -> ThreadPoolExecutor:
"""Lazy-init the enrichment worker pool. Lazy because most test
fixtures and short-lived CLI invocations don't need it; only the
long-running molecule-mcp / inbox-poller path actually schedules
background fetches.
"""
global _enrich_executor
if _enrich_executor is not None:
return _enrich_executor
with _enrich_executor_lock:
if _enrich_executor is None:
_enrich_executor = ThreadPoolExecutor(
max_workers=2,
thread_name_prefix="enrich-peer",
)
return _enrich_executor
def enrich_peer_metadata_nonblocking(
peer_id: str,
source_workspace_id: str | None = None,
) -> dict | None:
"""Cache-first variant of ``enrich_peer_metadata`` — returns
immediately without blocking on a registry GET.
Behavior:
- Cache hit (fresh): return the cached record.
- Cache miss or TTL expired: schedule a background fetch via the
worker pool, return ``None`` (caller renders bare peer_id).
The next push for this peer hits the warm cache and gets the
full record.
Why this exists (#2484): the inbox poller's notification callback
in molecule-mcp called the synchronous ``enrich_peer_metadata`` on
every push, blocking the poller for up to 2s × N uncached peers
per batch. Push-delivery latency was gated on registry latency —
the exact thing the negative-cache patch in PR #2471 was supposed
to avoid amplifying. Moving the fetch off the poller thread means
push delivery is bounded by the inbox poll interval, never by
registry RTT.
Trade-off: the FIRST push from a new peer arrives metadata-light
(no name/role). The MCP host renders the bare peer_id. Subsequent
pushes (within the 5-min TTL) hit the warm cache and get the full
record. Acceptable because:
- Channel-envelope enrichment is a UX nicety, not a correctness
invariant.
- The cold-cache window per peer is bounded to one push.
- The TTL is long enough that an active conversation never
re-enters the cold state.
"""
canon = _validate_peer_id(peer_id)
if canon is None:
return None
# Cache hit (fresh): return without blocking on a registry GET.
# This is the hot path for active peer conversations — avoids
# spawning a background thread for every push from a known peer.
current = time.monotonic()
cached = _peer_metadata_get(canon)
if cached is not None:
fetched_at, record = cached
if current - fetched_at < _PEER_METADATA_TTL_SECONDS:
return record
# Cache miss or TTL expired: schedule background fetch unless one is
# already in flight for this peer. The in-flight set keeps a flurry
# of pushes from one peer (e.g., a chatty agent) from spawning N
# parallel GETs.
with _enrich_in_flight_lock:
if canon in _enrich_in_flight:
return None
_enrich_in_flight.add(canon)
try:
_get_enrich_executor().submit(
_enrich_peer_metadata_worker, canon, source_workspace_id
)
except RuntimeError:
# Executor was shut down (process exit path) — drop the request,
# let the caller render bare peer_id.
with _enrich_in_flight_lock:
_enrich_in_flight.discard(canon)
return None
def _enrich_peer_metadata_worker(
canon: str, source_workspace_id: str | None
) -> None:
"""Background-thread body for ``enrich_peer_metadata_nonblocking``.
Runs the same fetch logic as the synchronous helper but discards
the return value — the cache write is the only output anyone
needs. Always clears the in-flight marker so a future cache miss
can retry.
"""
try:
enrich_peer_metadata(canon, source_workspace_id)
except Exception as exc: # noqa: BLE001
# Background workers must not crash the executor — log and
# move on. The negative-cache path inside enrich_peer_metadata
# already records failures, so a re-attempt is rate-limited
# by TTL.
logger.debug("_enrich_peer_metadata_worker: %s failed: %s", canon, exc)
finally:
with _enrich_in_flight_lock:
_enrich_in_flight.discard(canon)
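# Illustrative sketch (not part of this module): the in-flight guard used by
# the nonblocking path, shown in isolation. schedule_once, in_flight and the
# fetch callable are hypothetical names; the real code routes through
# enrich_peer_metadata and the lazily created executor.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed stand-ins for the module-level state.
in_flight = set()
in_flight_lock = threading.Lock()
executor = ThreadPoolExecutor(max_workers=2)


def schedule_once(key, fetch):
    """Submit fetch(key) unless one is already running for this key.

    Returns the Future for a newly scheduled fetch, or None when the
    request was deduplicated or the executor has been shut down.
    """
    with in_flight_lock:
        if key in in_flight:
            return None
        in_flight.add(key)

    def worker():
        try:
            fetch(key)
        finally:
            # Always clear the marker so a later cache miss can retry.
            with in_flight_lock:
                in_flight.discard(key)

    try:
        return executor.submit(worker)
    except RuntimeError:
        # submit() raises RuntimeError after shutdown; undo the marker.
        with in_flight_lock:
            in_flight.discard(key)
        return None
```

# The marker is added before submit() and removed in the worker's finally
# block, so a burst of calls for one key schedules at most one fetch at a time.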
def _wait_for_enrichment_inflight_for_testing(timeout: float = 2.0) -> None:
"""Block until all in-flight enrichment workers have completed, or
until ``timeout`` seconds elapse (in which case it returns silently).
Test-only helper. Production code never has a reason to wait — the
point of the nonblocking path is that callers don't care when the
cache fills. Tests that want to assert "after the worker runs, the
cache has the record" use this to synchronise without sleeping.
Polls ``_enrich_in_flight`` rather than holding a Condition because
the worker pool is already serializing through ``_enrich_in_flight_lock``;
poll keeps the production hot path lock-free.
"""
deadline = time.monotonic() + timeout
while time.monotonic() < deadline:
with _enrich_in_flight_lock:
if not _enrich_in_flight:
return
time.sleep(0.01)
def _peer_in_flight_clear_for_testing() -> None:
"""Clear the in-flight enrichment set. Test-only helper."""
with _enrich_in_flight_lock:
_enrich_in_flight.clear()
def enrich_peer_metadata(
peer_id: str,
source_workspace_id: str | None = None,
*,
now: float | None = None,
) -> dict | None:
"""Return cached or freshly-fetched metadata for ``peer_id``.
Sync helper — safe to call from the inbox poller's notification
callback thread (which is not async). Hits the in-process cache
first; on miss or TTL expiry, GETs ``/registry/discover/<peer_id>``
synchronously with a tight timeout. Returns None on validation
failure, network failure, or non-200 response so callers can
degrade gracefully (the channel envelope falls back to the raw
``peer_id`` instead of crashing the push path).
Negative caching: failure outcomes (4xx/5xx/non-JSON/network
exception) are stored as ``(now, None)`` and treated as
fresh-but-empty for the TTL window. Without this, a peer with a
flaky/missing registry record would re-fire the 2s-bounded GET on
EVERY push — turning the cache into a no-op for the exact failure
scenarios it most needs to defend against.
The fetched dict is stored as-is, so callers can read whatever
fields the platform exposes (currently: ``id``, ``name``, ``role``,
``status``, ``url``). New fields surface automatically without a
code change here.
"""
canon = _validate_peer_id(peer_id)
if canon is None:
return None
current = now if now is not None else time.monotonic()
cached = _peer_metadata_get(canon)
if cached is not None:
fetched_at, record = cached
if current - fetched_at < _PEER_METADATA_TTL_SECONDS:
# Fresh entry — return whatever's there. ``None`` is the
# negative-cache sentinel: caller treats absence of fields
# the same as a registry miss, which is the desired UX.
return record
src = (source_workspace_id or "").strip() or WORKSPACE_ID
url = f"{PLATFORM_URL}/registry/discover/{canon}"
try:
with httpx.Client(timeout=2.0) as client:
resp = client.get(url, headers={"X-Workspace-ID": src, **auth_headers(src)})
except Exception as exc: # noqa: BLE001
logger.debug("enrich_peer_metadata: GET %s failed: %s", url, exc)
_peer_metadata_set(canon, (current, None))
return None
if resp.status_code != 200:
logger.debug(
"enrich_peer_metadata: %s returned HTTP %d", url, resp.status_code
)
_peer_metadata_set(canon, (current, None))
return None
try:
data = resp.json()
except Exception: # noqa: BLE001
_peer_metadata_set(canon, (current, None))
return None
if not isinstance(data, dict):
_peer_metadata_set(canon, (current, None))
return None
_peer_metadata_set(canon, (current, data))
if name := data.get("name"):
_peer_names[canon] = name
return data
def _agent_card_url_for(peer_id: str) -> str:
"""Construct the platform-side agent-card URL for ``peer_id``.
Returns the empty string when ``peer_id`` is not a UUID — same
trust-boundary rationale as ``discover_peer``: never interpolate
path-traversal characters into a URL. An invalid id reflected back
to the receiving agent as ``…/registry/discover/../../foo`` is a
foothold we close at construction time.
Uses the registry's discovery path so the agent receiving a push
can hit a single endpoint to enumerate the sender's capabilities
+ role + URL. Same shape every workspace exposes regardless of
runtime — claude-code, hermes, langchain wrappers all register
through ``/registry/register`` and surface through ``/registry/discover``.
"""
safe_id = _validate_peer_id(peer_id)
if safe_id is None:
return ""
return f"{PLATFORM_URL}/registry/discover/{safe_id}"
# Sentinel prefix for errors originating from send_a2a_message / child agents.
# Used by delegate_task to distinguish real errors from normal response text.
_A2A_ERROR_PREFIX = "[A2A_ERROR] "
# Sentinel prefix for queued-for-poll-mode-peer outcomes (#2967).
# When the target workspace is registered as delivery_mode=poll (no
# public URL — typical for external molecule-mcp standalone runtimes),
# the platform's a2a_proxy.go:402 short-circuit returns a synthetic
# {"status":"queued","delivery_mode":"poll","method":"..."} envelope
# instead of dispatching over HTTP. The message IS delivered (written
# to the platform's inbox queue); there's just no synchronous reply
# to relay. Pre-#2967 the client treated this as "unexpected response
# shape" → caller saw DELEGATION FAILED → retried → recipient saw
# duplicates. The Queued prefix lets callers branch on this outcome
# explicitly: "delivered async, no synchronous reply expected" is
# different from both success-with-text and failure.
_A2A_QUEUED_PREFIX = "[A2A_QUEUED] "
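# Illustrative sketch (not part of this module): how a caller might branch on
# the two sentinel prefixes. classify_reply is a hypothetical helper; the real
# callers (delegate_task in a2a_tools) are not shown in this file, so treat
# the three outcome labels as assumptions.

```python
# Copies of the sentinel values so the sketch runs standalone.
A2A_ERROR_PREFIX = "[A2A_ERROR] "
A2A_QUEUED_PREFIX = "[A2A_QUEUED] "


def classify_reply(reply: str) -> tuple[str, str]:
    """Split a send_a2a_message-style return string into (outcome, detail).

    outcome is "error", "queued" (delivered async, no synchronous reply
    expected) or "ok" (plain response text).
    """
    if reply.startswith(A2A_ERROR_PREFIX):
        return "error", reply[len(A2A_ERROR_PREFIX):]
    if reply.startswith(A2A_QUEUED_PREFIX):
        return "queued", reply[len(A2A_QUEUED_PREFIX):]
    return "ok", reply
```

# Treating "queued" as its own outcome is the point of the second prefix: a
# caller that retries on it would deliver the same message twice.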
# Workspace IDs are UUIDs everywhere we generate them (platform's
# workspaces.id column, /registry/discover/:id route param, etc.) but
# the agent-facing tool surface receives them as free-form strings via
# tool args. ``_validate_peer_id`` enforces UUID-shape at the
# trust boundary so we never interpolate `..` or `/` into a URL path,
# never silently coerce malformed input into a 404, and surface a
# clear error to the agent rather than letting an HTTP 4xx bubble up
# from the platform with a generic error message.
#
# Lenient on case + whitespace because real-world peer-id strings
# come from list_peers/discover_peer responses (canonical lowercase)
# or hand-typed agent input (mixed-case acceptable). Strict on
# everything else.
_UUID_RE = re.compile(
r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
def _validate_peer_id(peer_id: str) -> str | None:
"""Return the canonicalised peer_id if valid, else None.
Returning None instead of raising so callers in tool surfaces can
convert to a friendly agent-facing string ("workspace_id is not a
valid UUID") rather than crashing with a stack trace.
"""
if not isinstance(peer_id, str):
return None
pid = peer_id.strip()
if not _UUID_RE.match(pid):
return None
return pid.lower()
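# Illustrative usage (not part of this module): the validation rule restated
# so it runs standalone. validate_peer_id mirrors the helper above under a
# different name, and the sample IDs are made up.

```python
import re

# Same UUID shape as the module's pattern: 8-4-4-4-12 hex digits.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def validate_peer_id(peer_id):
    """Return the lowercased, stripped id when it is UUID-shaped, else None."""
    if not isinstance(peer_id, str):
        return None
    pid = peer_id.strip()
    if not UUID_RE.match(pid):
        return None
    return pid.lower()


# Mixed case and surrounding whitespace are tolerated and canonicalised.
canonical = validate_peer_id("  3F2504E0-4F89-41D3-9A0C-0305E82C3301 ")

# Path-traversal input never reaches URL construction.
rejected = validate_peer_id("../../foo")
```

# Because the canonical form is lowercase, it is also safe to use as a cache
# key: two spellings of the same id map to one entry.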
async def discover_peer(target_id: str, source_workspace_id: str | None = None) -> dict | None:
"""Discover a peer workspace's URL via the platform registry.
Validates ``target_id`` is a UUID before constructing the URL — a
malformed id can't reach the platform handler now, which both
short-circuits an avoidable round-trip AND ensures we never
interpolate path-traversal characters into the URL.
``source_workspace_id`` selects which registered workspace asks the
question — both the X-Workspace-ID header AND the Authorization
bearer token must come from the same workspace, otherwise the
platform's TenantGuard rejects the request. Defaults to the
module-level WORKSPACE_ID for back-compat with single-workspace
callers.
"""
safe_id = _validate_peer_id(target_id)
if safe_id is None:
return None
src = (source_workspace_id or "").strip() or WORKSPACE_ID
async with httpx.AsyncClient(timeout=10.0) as client:
try:
resp = await client.get(
f"{PLATFORM_URL}/registry/discover/{safe_id}",
headers={"X-Workspace-ID": src, **auth_headers(src)},
)
if resp.status_code == 200:
return resp.json()
return None
except Exception as e:
logger.error("Discovery failed for %s: %s", target_id, e)
return None
# httpx exception classes that indicate a transient transport-layer
# failure worth retrying — the request never produced an application
# response, so a fresh attempt has a real chance of succeeding. Any
# error not in this tuple is treated as deterministic (HTTP-status,
# JSON parse, runtime-returned JSON-RPC error, etc.) and surfaced to
# the caller on the first try.
#
# Why each one belongs here:
# - ConnectError / ConnectTimeout: peer's listening socket wasn't
# ready (mid-restart, not yet bound). Fast failure, fast recovery.
# - RemoteProtocolError: peer closed the TCP connection without
# writing a response — observed on 2026-04-27 when a peer's prior
# in-flight Claude SDK session aborted and the new request's
# connection was reset mid-handler.
# - ReadError / WriteError: TCP read/write socket error mid-flight,
# typically a network blip on the Docker bridge or a peer worker
# crash.
# - ReadTimeout: peer didn't write ANY response bytes within the
# 300s read budget. Distinct from "peer is slow but progressing"
# (which httpx surfaces as a successful read with chunked bytes).
# Retry budget caps the worst case — see _DELEGATE_TOTAL_BUDGET_S.
_TRANSIENT_HTTP_ERRORS: tuple[type[Exception], ...] = (
httpx.ConnectError,
httpx.ConnectTimeout,
httpx.ReadError,
httpx.WriteError,
httpx.RemoteProtocolError,
httpx.ReadTimeout,
)
# Retry budget. Up to 5 attempts (1 initial + 4 retries) with
# exponential backoff (1, 2, 4, 8 seconds), each backoff jittered ±25%
# to prevent synchronized retry storms across siblings if a peer flaps.
# _DELEGATE_TOTAL_BUDGET_S caps cumulative wall-clock so a string of
# ReadTimeouts can't make the caller wait 25 minutes — once the
# deadline elapses we stop retrying even if attempts remain. 600s = 10
# minutes is the agreed worst case the caller can tolerate before
# falling back to "peer unavailable" handling in tool_delegate_task.
_DELEGATE_MAX_ATTEMPTS = 5
_DELEGATE_BACKOFF_BASE_S = 1.0
_DELEGATE_BACKOFF_CAP_S = 16.0
_DELEGATE_TOTAL_BUDGET_S = 600.0
def _delegate_backoff_seconds(attempt_zero_indexed: int) -> float:
"""Return the (jittered) backoff delay before retrying after the
given attempt index (0 = backoff before retry #1).
Pure function so the schedule is unit-testable without monkey-
patching asyncio.sleep. Jitter is symmetric ±25% on top of the
capped exponential — enough to break sync across simultaneous
callers without making the schedule unpredictable.
"""
base = min(_DELEGATE_BACKOFF_BASE_S * (2 ** attempt_zero_indexed), _DELEGATE_BACKOFF_CAP_S)
jitter = base * (0.5 * random.random() - 0.25)
return max(0.0, base + jitter)
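# Illustrative sketch (not part of this module): the retry schedule worked
# out numerically. backoff_seconds restates the function above with an
# injectable random source (the rng parameter is an assumption added for the
# example); schedule_bounds is a hypothetical helper.

```python
import random

# Same constants as the module, copied so the sketch runs standalone.
BASE_S = 1.0
CAP_S = 16.0


def backoff_seconds(attempt_zero_indexed, rng=random.random):
    """Capped exponential delay with symmetric +/-25% jitter."""
    base = min(BASE_S * (2 ** attempt_zero_indexed), CAP_S)
    jitter = base * (0.5 * rng() - 0.25)
    return max(0.0, base + jitter)


def schedule_bounds(retries):
    """Return (low, high) delay bounds for each retry.

    The high bound is exclusive because random.random() is always < 1.0.
    """
    bounds = []
    for i in range(retries):
        base = min(BASE_S * (2 ** i), CAP_S)
        bounds.append((0.75 * base, 1.25 * base))
    return bounds
```

# With 5 attempts there are 4 sleeps, bounded by (0.75, 1.25), (1.5, 2.5),
# (3.0, 5.0) and (6.0, 10.0) seconds, so backoff adds under 19s in total.
# The 16s cap is never reached at this attempt count; it only matters if
# _DELEGATE_MAX_ATTEMPTS is raised.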
def _format_a2a_error(exc: BaseException, target_url: str) -> str:
"""Format an httpx exception as an [A2A_ERROR] string.
Some httpx exceptions stringify to empty (RemoteProtocolError,
ConnectionReset variants) — the canvas would then render
"[A2A_ERROR] " with no detail and the operator has no signal to
act on. Always include the exception class name and the target
URL so the activity log + Agent Comms panel have actionable
information without a trip through container logs.
"""
msg = str(exc).strip()
type_name = type(exc).__name__
if not msg:
detail = f"{type_name} (no message — likely connection reset or silent timeout)"
elif msg.startswith(f"{type_name}:") or msg.startswith(f"{type_name} "):
# Already prefixed with the type — don't double-prefix.
# Prefix-anchored check (not substring) so a message that
# happens to mention some OTHER class name mid-string
# (e.g. "got OSError on read") doesn't suppress our own
# type prefix and lose the diagnostic signal.
detail = msg
else:
detail = f"{type_name}: {msg}"
return f"{_A2A_ERROR_PREFIX}{detail} [target={target_url}]"
async def send_a2a_message(peer_id: str, message: str, source_workspace_id: str | None = None) -> str:
"""Send an A2A ``message/send`` to a peer workspace via the platform proxy.
The target URL is constructed internally as
``${PLATFORM_URL}/workspaces/{peer_id}/a2a``. Going through the
platform's A2A proxy is the only path that works for both
in-container and external runtimes — see
a2a_tools.tool_delegate_task for the rationale.
``source_workspace_id`` is the SENDING workspace — drives both the
X-Workspace-ID source-tagging header and the bearer token. Defaults
to the module-level WORKSPACE_ID for back-compat. Multi-workspace
operators pass it explicitly so each registered workspace's peers
are reached via their own auth chain.
Auto-retries up to _DELEGATE_MAX_ATTEMPTS times on transient
transport-layer errors (RemoteProtocolError, ConnectError,
ReadTimeout, etc.) with exponential-backoff + jitter, capped by
_DELEGATE_TOTAL_BUDGET_S. Application-level failures (HTTP 4xx,
JSON-RPC error response, malformed JSON) are NOT retried — they
indicate a deterministic problem retry won't fix.
"""
safe_id = _validate_peer_id(peer_id)
if safe_id is None:
return f"{_A2A_ERROR_PREFIX}invalid peer_id (expected UUID): {peer_id!r}"
src = (source_workspace_id or "").strip() or WORKSPACE_ID
target_url = f"{PLATFORM_URL}/workspaces/{safe_id}/a2a"
# Fix F (Cycle 5 / H2 — flagged 5 consecutive audits): timeout=None allowed
# a hung upstream to block the agent indefinitely. Use a generous but bounded
# timeout: 30s connect + 300s read (long enough for slow LLM responses).
timeout_cfg = httpx.Timeout(connect=30.0, read=300.0, write=30.0, pool=30.0)
deadline = time.monotonic() + _DELEGATE_TOTAL_BUDGET_S
last_exc: BaseException | None = None
for attempt in range(_DELEGATE_MAX_ATTEMPTS):
async with httpx.AsyncClient(timeout=timeout_cfg) as client:
try:
# self_source_headers(src) includes X-Workspace-ID so the
# platform's a2a_receive logger records source_id = src (the
# sending workspace). Otherwise peer-A2A messages, including
# the case where target_url resolves to this workspace's
# own /a2a, get logged with source_id=NULL and surface
# in the recipient's My Chat tab as user-typed input.
resp = await client.post(
target_url,
headers=self_source_headers(src),
json={
"jsonrpc": "2.0",
"id": str(uuid.uuid4()),
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": str(uuid.uuid4()),
"parts": [{"kind": "text", "text": message}],
}
},
},
)
data = resp.json()
# Dispatch via the SSOT response model (a2a_response.py).
# All shape detection lives in one place — the parser
# never raises and routes unknown shapes to Malformed
# so a future server-side change is loud, not silent.
variant = a2a_response.parse(data)
if isinstance(variant, a2a_response.Result):
# Match legacy semantics:
# parts non-empty + first part has no text → ""
# parts empty → "(no response)"
# Differentiation matters for callers that assert
# on the empty-string case (test_a2a_client).
if variant.parts:
text = variant.text
else:
text = "(no response)"
# Tag child-reported errors so the caller can
# detect them reliably — agent-side bug surfaces
# text like "Agent error: <traceback>" inside a
# JSON-RPC success envelope.
if text.startswith("Agent error:"):
return f"{_A2A_ERROR_PREFIX}{text}"
return text
if isinstance(variant, a2a_response.Queued):
# Poll-mode peer — message accepted into the inbox
# queue, target agent will fetch via poll. NOT a
# failure. Return the queued sentinel so callers
# (delegate_task etc.) can render the outcome
# accurately instead of treating it as an error.
logger.info(
"send_a2a_message: queued for poll-mode peer (target=%s method=%s)",
target_url,
variant.method,
)
return f"{_A2A_QUEUED_PREFIX}target={safe_id} method={variant.method}"
if isinstance(variant, a2a_response.Error):
msg = variant.message
code = variant.code
if msg and code is not None:
detail = f"{msg} (code={code})"
elif msg:
detail = msg
elif code is not None:
detail = f"JSON-RPC error with no message (code={code})"
else:
detail = "JSON-RPC error with no message"
if variant.restarting:
# Surface platform-restart-in-progress
# explicitly — caller (UI / delegating agent)
# can render a softer "agent is restarting"
# message rather than a generic failure.
retry = (
f", retry_after={variant.retry_after}s"
if variant.retry_after is not None
else ""
)
detail = f"{detail} (restarting{retry})"
return f"{_A2A_ERROR_PREFIX}{detail} [target={target_url}]"
# Malformed — log loud + surface as error so the
# operator notices a server change. SSOT refactor
# subsumes the inline "queued" check that landed in
# the #2972 hotfix; that branch is now the typed
# Queued variant above.
logger.warning(
"send_a2a_message: malformed response (target=%s body=%.200s)",
target_url,
str(variant.raw),
)
return (
f"{_A2A_ERROR_PREFIX}unexpected response shape "
f"(no result, error, or queued envelope): "
f"{str(variant.raw)[:200]} [target={target_url}]"
)
except _TRANSIENT_HTTP_ERRORS as e:
last_exc = e
attempts_remaining = _DELEGATE_MAX_ATTEMPTS - (attempt + 1)
if attempts_remaining <= 0 or time.monotonic() >= deadline:
# Out of attempts OR out of total budget — surface
# the last error to the caller.
break
delay = _delegate_backoff_seconds(attempt)
# Don't sleep past the deadline — clamp.
remaining = deadline - time.monotonic()
if delay > remaining:
delay = max(0.0, remaining)
logger.warning(
"send_a2a_message: transient %s on attempt %d/%d, retrying in %.1fs (target=%s)",
type(e).__name__,
attempt + 1,
_DELEGATE_MAX_ATTEMPTS,
delay,
target_url,
)
await asyncio.sleep(delay)
continue
except Exception as e:
# Non-transient (HTTP-status, JSON parse, etc.) — don't retry.
return _format_a2a_error(e, target_url)
# Retries exhausted (or budget elapsed). last_exc must be set
# because we only break out of the loop after assigning it.
assert last_exc is not None # noqa: S101
return _format_a2a_error(last_exc, target_url)
async def get_peers_with_diagnostic(source_workspace_id: str | None = None) -> tuple[list[dict], str | None]:
"""Get this workspace's peers, returning (peers, diagnostic).
diagnostic is None when the call succeeded (status 200, even if the list
is empty). When peers is [] for a non-trivial reason (auth failure,
workspace-id missing from registry, platform error, network error),
diagnostic is a short human-readable string explaining what went wrong
so callers can surface it instead of "may be isolated" — see #2397.
``source_workspace_id`` selects which registered workspace's peers to
enumerate; defaults to the module-level WORKSPACE_ID for
single-workspace back-compat. Multi-workspace operators iterate over
each registered workspace separately so each set of peers is fetched
with the correct auth.
The legacy get_peers() shim below preserves the bare-list contract for
non-tool callers.
"""
src = (source_workspace_id or "").strip() or WORKSPACE_ID
url = f"{PLATFORM_URL}/registry/{src}/peers"
async with httpx.AsyncClient(timeout=10.0) as client:
try:
resp = await client.get(
url,
headers={"X-Workspace-ID": src, **auth_headers(src)},
)
except Exception as e:
return [], f"Cannot reach platform at {PLATFORM_URL}: {e}"
if resp.status_code == 200:
try:
data = resp.json()
except Exception as e:
return [], f"Platform returned 200 but body was not JSON: {e}"
if not isinstance(data, list):
return [], f"Platform returned 200 but body was not a list: {type(data).__name__}"
return data, None
if resp.status_code in (401, 403):
return [], (
f"Authentication to platform failed (HTTP {resp.status_code}). "
"The workspace bearer token may be invalid — restarting the workspace usually re-mints it."
)
if resp.status_code == 404:
return [], (
f"Workspace ID {src} is not registered with the platform (HTTP 404). "
"Re-registration via the platform's /registry/register endpoint is needed."
)
if 500 <= resp.status_code < 600:
return [], f"Platform error: HTTP {resp.status_code}."
return [], f"Unexpected platform response: HTTP {resp.status_code}."
async def get_peers() -> list[dict]:
"""Get this workspace's peers from the platform registry.
Bare-list shim over get_peers_with_diagnostic() — discards the diagnostic
so callers that don't care about the failure reason (e.g. system-prompt
bootstrap formatters) get the same shape they always had.
"""
peers, _ = await get_peers_with_diagnostic()
return peers
async def get_workspace_info(source_workspace_id: str | None = None) -> dict:
"""Get this workspace's info from the platform.
``source_workspace_id`` selects which registered workspace to
introspect when the agent is registered into multiple workspaces
(multi-workspace mode). Unset → defaults to the module-level
WORKSPACE_ID — single-workspace operators see no behaviour change.
Distinguishes three failure shapes so callers can handle them
distinctly (#2429):
- 410 Gone → workspace was deleted; re-onboard required
- 404 / other → workspace never existed (or transient)
- exception → network / auth failure
"""
src = (source_workspace_id or "").strip() or WORKSPACE_ID
async with httpx.AsyncClient(timeout=10.0) as client:
try:
resp = await client.get(
f"{PLATFORM_URL}/workspaces/{src}",
headers=auth_headers(src),
)
if resp.status_code == 200:
return resp.json()
if resp.status_code == 410:
# #2429: platform returns 410 when status='removed'.
# Surface "removed" + the actionable hint so callers
# can prompt re-onboard instead of falling through to
# "not found" — which made the 2026-04-30 incident
# impossible to diagnose ("workspace not found" with
# a workspace_id we KNEW we'd just registered).
try:
body = resp.json()
except Exception:
body = {}
return {
"error": "removed",
"id": body.get("id", src),
"removed_at": body.get("removed_at"),
"hint": body.get(
"hint",
"Workspace was deleted on the platform. "
"Regenerate workspace + token from the canvas → Tokens tab.",
),
}
return {"error": "not found"}
except Exception as e:
return {"error": str(e)}
@@ -1,567 +0,0 @@
"""Bridge between LangGraph agent and A2A protocol, with SSE streaming support.
SSE streaming architecture
--------------------------
The A2A SDK (``DefaultRequestHandler`` + ``EventQueue``) owns the SSE transport
layer. This executor's job is to push the right event types into the queue as
work progresses:
1. ``TaskStatusUpdateEvent(state=working)`` — immediately signals start
2. ``TaskArtifactUpdateEvent(chunk, append=…)`` — one per LLM text token
3. ``Message(final_text)`` — terminal event
Client compatibility
--------------------
*Non-streaming* (``message/send``):
``ResultAggregator.consume_all()`` processes status/artifact events
(updating the task in the store) and returns the final ``Message``
immediately — backward-compatible with ``a2a_client.py`` which reads
``data["result"]["parts"][0]["text"]``.
*Streaming* (``message/stream``):
``consume_and_emit()`` yields every event above as SSE, letting the client
render tokens in real time.
LangGraph integration
---------------------
Uses ``agent.astream_events(version="v2")`` to receive ``on_chat_model_stream``
events with ``AIMessageChunk`` payloads. Text is extracted from both plain
strings (OpenAI / Groq) and Anthropic-style content-block lists. Non-text
content (tool_use, etc.) is silently skipped. A fresh ``artifact_id`` is
generated for each new LLM ``run_id`` so tool-call cycles are grouped cleanly.
"""
import functools
import logging
import os
import uuid
from a2a.server.agent_execution import AgentExecutor, RequestContext
from a2a.server.events import EventQueue
from a2a.server.tasks import TaskUpdater
from a2a.types import Part
# KI-009: a2a-sdk v1 renames a2a.utils → a2a.helpers; TextPart removed (Part takes text= directly)
from a2a.helpers import new_text_message
from shared_runtime import (
extract_history as _extract_history,
extract_message_text,
brief_task,
set_current_task,
)
from executor_helpers import (
collect_outbound_files,
extract_attached_files,
read_delegation_results,
sanitize_agent_error,
)
from builtin_tools.telemetry import (
A2A_TASK_ID,
GEN_AI_OPERATION_NAME,
GEN_AI_REQUEST_MODEL,
GEN_AI_SYSTEM,
WORKSPACE_ID_ATTR,
_incoming_trace_context,
gen_ai_system_from_model,
get_tracer,
record_llm_token_usage,
)
logger = logging.getLogger(__name__)
_WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "unknown")
# LangGraph ReAct cycle budget per turn. Library default is 25; 500 covers
# PM fan-outs (plan → 6 delegations → 6 awaits → 6 results → synthesize ≈
# 30+ steps even before retries). Overridable via LANGGRAPH_RECURSION_LIMIT.
DEFAULT_RECURSION_LIMIT = 500
def _parse_recursion_limit() -> int:
"""Read LANGGRAPH_RECURSION_LIMIT; fall back to DEFAULT_RECURSION_LIMIT
with a WARNING log on any unparseable or non-positive value."""
raw = os.environ.get("LANGGRAPH_RECURSION_LIMIT", "")
if not raw:
return DEFAULT_RECURSION_LIMIT
try:
n = int(raw)
except ValueError:
logger.warning(
"LANGGRAPH_RECURSION_LIMIT=%r is not an integer; using default %d",
raw, DEFAULT_RECURSION_LIMIT,
)
return DEFAULT_RECURSION_LIMIT
if n <= 0:
logger.warning(
"LANGGRAPH_RECURSION_LIMIT=%d is not positive; using default %d",
n, DEFAULT_RECURSION_LIMIT,
)
return DEFAULT_RECURSION_LIMIT
return n
# ---------------------------------------------------------------------------
# Compliance (OWASP Top 10 for Agentic Apps): optional, imported only if available
# ---------------------------------------------------------------------------
try:
from builtin_tools.compliance import (
AgencyTracker,
ExcessiveAgencyError,
PromptInjectionError,
redact_pii as _redact_pii,
sanitize_input as _sanitize_input,
)
_COMPLIANCE_AVAILABLE = True
except ImportError: # pragma: no cover
_COMPLIANCE_AVAILABLE = False
@functools.lru_cache(maxsize=1)
def _get_compliance_cfg():
"""Return ComplianceConfig or None (cached for process lifetime)."""
try:
from config import load_config
return load_config().compliance
except Exception:
return None
def _extract_chunk_text(content) -> list[str]:
"""Extract text strings from an LLM streaming chunk's content field.
Handles both provider content styles:
- OpenAI / Groq: ``content`` is a plain ``str`` (empty for tool-call chunks).
- Anthropic: ``content`` is a list of typed blocks, e.g.
``[{"type": "text", "text": "Hello"}, {"type": "tool_use", ...}]``
Only ``"text"`` blocks are returned; ``tool_use``, ``tool_result``, and
other non-text blocks are filtered out so raw tool JSON never appears in
the SSE stream.
Args:
content: ``chunk.content`` value from an ``on_chat_model_stream`` event.
Returns:
List of non-empty text strings.
"""
if isinstance(content, str):
return [content] if content else []
if isinstance(content, list):
texts: list[str] = []
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
text = block.get("text", "")
if text:
texts.append(text)
elif isinstance(block, str) and block:
texts.append(block)
return texts
return []
class LangGraphA2AExecutor(AgentExecutor):
"""Bridges LangGraph agent to A2A event model with SSE streaming support.
Always uses ``agent.astream_events()`` so that:
- Streaming clients (``message/stream``) receive token-level SSE events.
- Non-streaming clients (``message/send``) receive the final ``Message``
collected from the same stream — no duplicate LLM call, full compat.
"""
def __init__(self, agent, heartbeat=None, model: str = "unknown"):
self.agent = agent # Compiled LangGraph graph (create_react_agent output)
self._heartbeat = heartbeat
self._model = model # e.g. "anthropic:claude-sonnet-4-6"
async def execute(self, context: RequestContext, event_queue: EventQueue) -> None:
"""Execute a task from an A2A request with SSE streaming.
Routes through the Temporal durable workflow when a global
``TemporalWorkflowWrapper`` is initialised and connected to Temporal;
otherwise falls back to ``_core_execute()`` (direct path).
Event emission sequence:
1. Task(submitted): enqueued first when the request has no current task
2. TaskStatusUpdateEvent(working): immediate start signal
3. TaskArtifactUpdateEvent chunks: token-by-token via astream_events
4. Terminal TaskStatusUpdateEvent wrapping Message(final_text):
non-streaming clients return on this; streaming clients
also receive it as the last SSE event.
"""
# ── Optional Temporal durable execution wrapper ──────────────────────
# When a TemporalWorkflowWrapper is active this routes execution through
# a MoleculeAIAgentWorkflow (task_receive → llm_call → task_complete).
# Falls back silently to _core_execute() on any error or if Temporal
# is unavailable, so the client always receives a response.
try:
from builtin_tools.temporal_workflow import get_wrapper as _get_temporal_wrapper
_tw = _get_temporal_wrapper()
if _tw is not None and _tw.is_available():
return await _tw.run(self, context, event_queue)
except Exception:
pass # Never let the wrapper path crash the executor
await self._core_execute(context, event_queue)
async def _core_execute(self, context: RequestContext, event_queue: EventQueue) -> str:
"""Core execution pipeline — called directly or from a Temporal activity.
This is the original ``execute()`` body, extracted so that the Temporal
``llm_call`` activity can invoke it without re-entering the wrapper
check and causing infinite recursion.
Returns the final response text (empty string on empty input or error).
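Event emission sequence:
1. Task(submitted): enqueued first when the request has no current task
2. TaskStatusUpdateEvent(working): immediate start signal
3. TaskArtifactUpdateEvent chunks: token-by-token via astream_events
4. Terminal TaskStatusUpdateEvent wrapping Message(final_text), via updater.complete()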
"""
user_input = extract_message_text(context)
# Inject delegation results from prior turns. Heartbeat writes
# completed delegation rows to DELEGATION_RESULTS_FILE and sends
# a self-message to wake the agent; this consumes the file and
# surfaces the results as context so the agent can act on them
# without needing an explicit check_task_status call.
# Results are prepended so they are visible even when the
# self-message text is overwritten by a subsequent user message.
pending_results = read_delegation_results()
if pending_results:
logger.info("A2A execute: injecting %d delegation result(s)", pending_results.count("\n") + 1)
user_input = f"[Delegation results available]\n{pending_results}\n\n{user_input}"
# Pull attached files from A2A message parts (kind: "file") and
# append a manifest to the prompt so the agent knows they exist.
# LangGraph tools (filesystem, bash, skills) can then open the
# files by path — without this the agent silently ignores the
# attachments and replies "I'm not sure what you're referring to".
_attached_files = extract_attached_files(getattr(context, "message", None))
if _attached_files:
_manifest = "\n\nAttached files:\n" + "\n".join(
f"- {f['name']} ({f['mime_type'] or 'unknown type'}) at {f['path']}"
for f in _attached_files
)
user_input = (user_input + _manifest) if user_input else _manifest.lstrip()
if not user_input:
parts = getattr(getattr(context, "message", None), "parts", None)
logger.warning("A2A execute: no text content in message parts: %s", parts)
await event_queue.enqueue_event(
new_text_message("Error: message contained no text content.")
)
return ""
# ── OA-01: Prompt injection check (OWASP Agentic Top 10) ────────────
_compliance_cfg = _get_compliance_cfg() if _COMPLIANCE_AVAILABLE else None
if _COMPLIANCE_AVAILABLE and _compliance_cfg and _compliance_cfg.mode == "owasp_agentic":
try:
user_input = _sanitize_input(
user_input,
prompt_injection_mode=_compliance_cfg.prompt_injection,
context_id=context.context_id or "",
)
except PromptInjectionError as exc:
await event_queue.enqueue_event(
new_text_message(f"Request blocked: {exc}")
)
return ""
logger.info("A2A execute: user_input=%s", user_input[:200])
# ── OTEL: task_receive span ──────────────────────────────────────────
parent_ctx = _incoming_trace_context.get()
tracer = get_tracer()
_result: str = "" # captured inside the span for return after it closes
with tracer.start_as_current_span("task_receive", context=parent_ctx) as task_span:
task_span.set_attribute(WORKSPACE_ID_ATTR, _WORKSPACE_ID)
task_span.set_attribute(A2A_TASK_ID, context.context_id or "")
task_span.set_attribute("a2a.input_preview", user_input[:256])
# Resolve IDs — the RequestContextBuilder always sets them, but
# we generate fallbacks for safety (e.g. in unit tests).
task_id = context.task_id or str(uuid.uuid4())
context_id = context.context_id or str(uuid.uuid4())
# A2A v1 contract (a2a-sdk ≥ 1.0): enqueue a Task event before any
# TaskStatusUpdateEvent. The framework only auto-creates the Task
# on continuation messages (existing task_id resolves via
# task_manager.get_task()). For fresh requests get_task() returns
# None and the SDK rejects the first status update with
# InvalidAgentResponseError("Agent should enqueue Task before
# TaskStatusUpdateEvent event") — see a2a/server/agent_execution/
# active_task.py for the validation site. PR #2170 migrated the
# surface to v1 but missed this contract; the synth-E2E gate
# surfaced it on every run after staging deploy.
if getattr(context, "current_task", None) is None:
from a2a.types import Task, TaskState, TaskStatus
await event_queue.enqueue_event(
Task(
id=task_id,
context_id=context_id,
status=TaskStatus(state=TaskState.TASK_STATE_SUBMITTED),
)
)
updater = TaskUpdater(event_queue, task_id, context_id)
try:
# set_current_task INSIDE the try so active_tasks is always
# decremented by the finally block even if CancelledError hits
# during the heartbeat HTTP push. Moving it outside the try
# created a window where cancellation left active_tasks stuck
# at 1, permanently blocking queue drain. (#2026)
await set_current_task(self._heartbeat, brief_task(user_input))
messages = _extract_history(context)
if messages:
logger.info("A2A execute: injecting %d history messages", len(messages))
messages.append(("human", user_input))
# Recursion limit: see DEFAULT_RECURSION_LIMIT and
# _parse_recursion_limit() at module top. Re-read on every
# call so the env var can be hot-changed between requests.
recursion_limit = _parse_recursion_limit()
run_config = {
"configurable": {"thread_id": context_id},
"run_name": f"a2a-{context_id[:8]}",
"recursion_limit": recursion_limit,
}
# ── OTEL: llm_call span ──────────────────────────────────────
with tracer.start_as_current_span("llm_call") as llm_span:
llm_span.set_attribute(GEN_AI_OPERATION_NAME, "chat")
llm_span.set_attribute(GEN_AI_SYSTEM, gen_ai_system_from_model(self._model))
llm_span.set_attribute(GEN_AI_REQUEST_MODEL, self._model)
llm_span.set_attribute(WORKSPACE_ID_ATTR, _WORKSPACE_ID)
# ── Step 1: signal "working" to streaming clients ─────────
await updater.start_work()
# ── Step 2: stream tokens via LangGraph astream_events ────
# Each "on_chat_model_stream" event carries an AIMessageChunk.
# We emit one TaskArtifactUpdateEvent per text chunk so SSE
# clients can render tokens in real time.
# artifact_id resets on each new LLM run_id so agent→tool→agent
# cycles each get their own artifact slot.
artifact_id = str(uuid.uuid4())
has_streamed = False # True after first chunk for current artifact
current_run_id = None # Detects new LLM call in a ReAct cycle
accumulated: list[str] = [] # All text for the final Message
last_ai_message = None # Saved for token-usage telemetry
# ── OA-03: Excessive agency tracker ──────────────────────
_agency = (
AgencyTracker(
max_tool_calls=_compliance_cfg.max_tool_calls_per_task,
max_duration_seconds=float(_compliance_cfg.max_task_duration_seconds),
)
if _COMPLIANCE_AVAILABLE and _compliance_cfg and _compliance_cfg.mode == "owasp_agentic"
else None
)
# ── Tool trace: collect every tool invocation for
# platform-level observability ────────────────────
# Keyed by run_id so parallel tool calls (LangGraph
# supports them) pair start→end correctly. Capped at
# MAX_TOOL_TRACE entries to prevent runaway loops from
# ballooning the JSONB payload.
MAX_TOOL_TRACE = 200
tool_trace: list[dict] = []
tool_trace_by_run: dict[str, dict] = {}
async for event in self.agent.astream_events(
{"messages": messages},
config=run_config,
version="v2",
):
kind = event.get("event", "")
if kind == "on_chat_model_stream":
run_id = event.get("run_id", "")
if run_id and run_id != current_run_id:
# New LLM run started — fresh artifact slot
current_run_id = run_id
artifact_id = str(uuid.uuid4())
has_streamed = False
chunk = event.get("data", {}).get("chunk")
if chunk is not None:
texts = _extract_chunk_text(chunk.content)
for text in texts:
await updater.add_artifact(
parts=[Part(text=text)], # v1: TextPart removed, Part takes text= directly
artifact_id=artifact_id,
append=has_streamed, # False=first, True=append
last_chunk=False,
)
has_streamed = True
accumulated.append(text)
elif kind == "on_tool_start":
tool_name = event.get("name", "?")
tool_input = event.get("data", {}).get("input", "")
tool_run_id = event.get("run_id", "")
logger.debug("SSE: tool start — %s", tool_name)
if len(tool_trace) < MAX_TOOL_TRACE:
entry = {
"tool": tool_name,
"input": str(tool_input)[:500] if tool_input else "",
}
tool_trace.append(entry)
if tool_run_id:
tool_trace_by_run[tool_run_id] = entry
if _agency is not None:
_agency.on_tool_call(
tool_name=tool_name,
context_id=context_id,
)
elif kind == "on_tool_end":
tool_end_name = event.get("name", "?")
tool_output = event.get("data", {}).get("output", "")
tool_run_id = event.get("run_id", "")
logger.debug("SSE: tool end — %s", tool_end_name)
# Pair via run_id so parallel tool calls don't clobber each other.
entry = tool_trace_by_run.get(tool_run_id) if tool_run_id else None
if entry is not None:
entry["output_preview"] = str(tool_output)[:300] if tool_output else ""
elif kind == "on_chat_model_end":
# Capture the last completed AIMessage for token telemetry
output = event.get("data", {}).get("output")
if output is not None:
last_ai_message = output
# Record token usage from the last completed LLM call
if last_ai_message is not None:
record_llm_token_usage(llm_span, {"messages": [last_ai_message]})
# Build final text from all accumulated streaming tokens
final_text = "".join(accumulated).strip() or "(no response generated)"
logger.info("A2A execute: response length=%d chars", len(final_text))
# ── OA-02 / OA-06: Output PII redaction ──────────────────────
if _COMPLIANCE_AVAILABLE and _compliance_cfg and _compliance_cfg.mode == "owasp_agentic":
final_text, _pii_types = _redact_pii(final_text)
if _pii_types:
from builtin_tools.audit import log_event as _audit_log
_audit_log(
event_type="compliance",
action="pii.redact",
resource="task_output",
outcome="redacted",
pii_types=_pii_types,
context_id=context_id,
)
# ── OTEL: task_complete span ─────────────────────────────────
with tracer.start_as_current_span("task_complete") as done_span:
done_span.set_attribute(WORKSPACE_ID_ATTR, _WORKSPACE_ID)
done_span.set_attribute(A2A_TASK_ID, context_id)
done_span.set_attribute("task.has_response", bool(accumulated))
done_span.set_attribute("task.response_length", len(final_text))
# ── Step 3: emit final Message ────────────────────────────────
# Non-streaming: ResultAggregator.consume_all() returns this
# immediately as the response (a2a_client.py reads .parts[0].text).
# Streaming: yielded as the last SSE event in the stream.
#
# If the reply mentions /workspace/... paths, stage each one
# and emit as FileParts alongside the text so the canvas can
# render a download button. Same contract the hermes executor
# uses — every runtime going through this code path (langgraph,
# deepagents, future ReAct variants) inherits it.
_outbound = collect_outbound_files(final_text)
if _outbound:
# NOTE: do NOT re-import `Part` here. It is already imported
# at module scope (line 42). A function-scope `from a2a.types
# import ... Part ...` would mark `Part` as a local name
# throughout this function under Python's scoping rules,
# making the earlier `Part(text=text)` call (line ~358, inside
# the astream_events loop) raise UnboundLocalError because
# the local name has not yet been assigned at that point.
#
# a2a-sdk 1.x flattened the Part shape: 0.x used
# `Part(root=TextPart(text=...))` / `Part(root=FilePart(file=
# FileWithUri(uri=..., name=..., mimeType=...)))` (Pydantic
# discriminated-union style). 1.x's Part is a single proto
# message with flat fields: text, url, filename, media_type,
# raw, data, metadata. TextPart/FilePart/FileWithUri were
# removed. Same for Message: messageId/taskId/contextId
# camelCase became message_id/task_id/context_id.
from a2a.types import Message, Role
_parts: list[Part] = [Part(text=final_text)] if final_text else []
for f in _outbound:
_parts.append(Part(
url="workspace:" + f["path"],
filename=f["name"],
media_type=f["mime_type"],
))
msg = Message(
message_id=uuid.uuid4().hex,
# 1.x Role is a protobuf enum: ROLE_UNSPECIFIED,
# ROLE_USER, ROLE_AGENT. Old `Role.agent` (Pydantic
# lowercase enum) doesn't exist anymore.
role=Role.ROLE_AGENT,
parts=_parts,
task_id=task_id,
context_id=context_id,
)
else:
msg = new_text_message(final_text, task_id=task_id, context_id=context_id)
# Attach tool_trace via metadata when supported. Guarded with
# hasattr because some test mocks return a plain string here.
if tool_trace and hasattr(msg, "metadata"):
try:
msg.metadata = {"tool_trace": tool_trace}
except (AttributeError, TypeError):
# `new_text_message()` returns a plain string in
# MagicMock paths in tests, where assignment to
# .metadata raises despite hasattr being true (the
# mock has the attribute as a property). Suppression
# is intentional — production Message objects always
# accept the assignment. See #1787 + commit dcbcf19
# for the original test-mock motivation.
logger.debug("metadata attach skipped (non-Message return from new_text_message)")
# A2A v1 (a2a-sdk ≥ 1.0): once Task is enqueued (above, PR #2558),
# the executor is in task mode and raw Message enqueues are
# rejected with InvalidAgentResponseError("Received Message
# object in task mode. Use TaskStatusUpdateEvent or
# TaskArtifactUpdateEvent instead."). updater.complete()
# wraps the Message in a terminal TaskStatusUpdateEvent
# (state=COMPLETED, final=True) which both streaming and
# non-streaming clients accept.
await updater.complete(message=msg)
_result = final_text
except Exception as e:
logger.error("A2A execute error: %s", e, exc_info=True)
try:
task_span.record_exception(e)
from opentelemetry.trace import StatusCode
task_span.set_status(StatusCode.ERROR, str(e))
except Exception:
pass
# A2A v1: in task mode, terminal errors must publish a
# FAILED TaskStatusUpdateEvent (carrying the error Message)
# rather than a raw Message enqueue. updater.failed() does
# exactly this — both streaming and non-streaming clients
# receive the error and stop polling.
await updater.failed(
message=new_text_message(
sanitize_agent_error(exc=e), task_id=task_id, context_id=context_id
)
)
finally:
await set_current_task(self._heartbeat, "")
return _result
async def cancel(self, context: RequestContext, event_queue: EventQueue) -> None:
"""Cancel a running task — emits canceled state to comply with A2A protocol."""
from a2a.types import TaskStatus, TaskState, TaskStatusUpdateEvent
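await event_queue.enqueue_event(
TaskStatusUpdateEvent(
# Carry the request's IDs so the event can be matched to the task being canceled.
task_id=context.task_id or "",
context_id=context.context_id or "",
status=TaskStatus(state=TaskState.TASK_STATE_CANCELED), # v1: TaskState uses SCREAMING_SNAKE_CASE
final=True,
)
)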
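The artifact bookkeeping in the streaming loop above (`artifact_id`, `has_streamed`, `current_run_id`) is easier to follow in isolation. The sketch below is a simplified, standalone illustration and not the executor itself: `plan_artifact_updates` is a hypothetical helper, plain dicts stand in for the SDK event types, and each event carries a flat `content` key as an assumption (real `astream_events` output holds the chunk under `event["data"]["chunk"]`). It shows that a new `run_id` opens a fresh artifact, that the first chunk of each artifact goes out with `append=False`, and that non-text blocks never reach the stream.

```python
import uuid


def extract_chunk_text(content):
    """Return non-empty text pieces from a str or a list of content blocks."""
    if isinstance(content, str):
        return [content] if content else []
    if isinstance(content, list):
        texts = []
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                if block.get("text"):
                    texts.append(block["text"])
            elif isinstance(block, str) and block:
                texts.append(block)
        return texts
    return []


def plan_artifact_updates(events):
    """Turn stream events into (artifact_id, append, text) tuples.

    Hypothetical helper: it records what the executor would pass to
    add_artifact() instead of calling the SDK.
    """
    artifact_id = str(uuid.uuid4())
    has_streamed = False      # True after the first chunk of the current artifact
    current_run_id = None     # changes when a new LLM call starts
    updates = []
    for event in events:
        if event.get("event") != "on_chat_model_stream":
            continue
        run_id = event.get("run_id", "")
        if run_id and run_id != current_run_id:
            # New LLM run: open a fresh artifact slot.
            current_run_id = run_id
            artifact_id = str(uuid.uuid4())
            has_streamed = False
        for text in extract_chunk_text(event.get("content")):
            updates.append((artifact_id, has_streamed, text))
            has_streamed = True
    return updates


# Made-up events: an LLM run, a tool call, then a second LLM run.
events = [
    {"event": "on_chat_model_stream", "run_id": "r1", "content": "Let me "},
    {"event": "on_chat_model_stream", "run_id": "r1",
     "content": [{"type": "text", "text": "check."},
                 {"type": "tool_use", "id": "t1"}]},
    {"event": "on_tool_start", "run_id": "t1"},
    {"event": "on_chat_model_stream", "run_id": "r2", "content": "Done."},
]
updates = plan_artifact_updates(events)
final_text = "".join(text for _, _, text in updates).strip()
```

Here the two chunks of run `r1` share one artifact and the chunk of run `r2` gets its own, while `final_text` is the concatenation of every streamed chunk, mirroring how `accumulated` feeds the final message.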
@@ -1,263 +0,0 @@
"""Single source of truth for A2A ``/workspaces/<id>/a2a`` response shapes.
The workspace-server proxy at
``workspace-server/internal/handlers/a2a_proxy.go`` (the canonical
emitter) returns one of the following shapes for a single A2A call:
* **JSON-RPC success** —
``{"jsonrpc": "2.0", "result": {...}, "id": "..."}``
The agent's reply, passed through unchanged.
* **JSON-RPC error** —
``{"jsonrpc": "2.0", "error": {"message": "...", "code": ...}, "id": "..."}``
The agent reported a structured error.
* **Poll-queued** (synthesized at proxy, RFC #2339 PR 2 — see
``a2a_proxy.go:402-406``) —
``{"status": "queued", "delivery_mode": "poll", "method": "..."}``
The target is a poll-mode workspace (no public URL); the message
was written to the platform's inbox queue. The target agent will
fetch it via ``GET /activity?since_id=`` polling. NOT a failure —
delivery succeeded, there's just no synchronous reply to relay.
* **Push-queued** (returned when a push-mode workspace is at capacity):
``{"queued": true, "message": "...", "queue_id": "..."}``
The target has a public URL but is busy, so the platform queued the
request. Like poll-queued, this is NOT a failure.
* **Platform error** — ``{"error": "...", "restarting": true?, "retry_after": int?}``
HTTP-level failure synthesized by the proxy when the agent is
unreachable, the container is restarting, or some other infrastructure
failure happened. ``restarting=true`` flags the platform-initiated
container-restart path.
* **Malformed** — anything else. Surfaced explicitly so a future server
change is loud rather than silent.
The ``parse(data)`` function classifies a pre-decoded JSON body into a
typed variant. Callers ``match`` on the variant and never re-implement
shape detection — that's the SSOT discipline.
# SSOT contract
This file is the Python half. The Go server emits these shapes today
via inline ``gin.H{...}`` literals. A future PR can introduce a Go
mirror (e.g. ``workspace-server/internal/models/a2a_response.go``)
with a typed marshaller — until then, **any change to the wire shape
must be reflected here** and gated by ``test_a2a_response.py``'s
fixture corpus. The corpus exists specifically so a one-sided edit
breaks CI.
# Why a typed model (vs. dict-key sniffing at every site)
The pre-2967 client at ``a2a_client.py:567-587`` sniffed for ``result``
or ``error`` keys inline and treated everything else as malformed —
which silently broke poll-mode peers (the queued envelope has neither
key). Inline sniffing per call site multiplies the surface area where
a new shape gets misclassified. A single typed parser with an
explicit ``Malformed`` escape hatch makes shape additions a
one-line change here + a fixture entry in the test corpus, instead of
a hunt through every parsing site in the runtime.
"""
from __future__ import annotations
import dataclasses
import logging
from typing import Any, Optional, Union
logger = logging.getLogger(__name__)
@dataclasses.dataclass(frozen=True)
class Result:
"""JSON-RPC success — agent's reply available synchronously.
``text`` is the convenience extraction from ``parts[0].text`` (the
A2A multipart shape). ``parts`` is the full list, available for
callers that need richer rendering (multiple parts, non-text parts).
``raw_result`` preserves the unparsed ``result`` field for any
caller that needs it (e.g. activity-row response_body audit).
"""
text: str
parts: list[dict[str, Any]] = dataclasses.field(default_factory=list)
raw_result: Optional[dict[str, Any]] = None
@dataclasses.dataclass(frozen=True)
class Error:
"""JSON-RPC error or platform-level error response.
``code`` is the JSON-RPC integer code when present, else None.
``restarting`` / ``retry_after`` are platform-restart-in-progress
metadata: when both are set, the caller knows the container is
being recycled and may surface a softer error to the user.
"""
message: str
code: Optional[int] = None
restarting: bool = False
retry_after: Optional[int] = None
@dataclasses.dataclass(frozen=True)
class Queued:
"""Platform poll-mode short-circuit — message accepted, peer will pick up async.
Returned when the target workspace is registered as
``delivery_mode=poll`` (no public URL — typical for external
standalone ``molecule-mcp`` runtimes). The message was written to
the platform's inbox queue; the target agent will fetch it via
``GET /activity?since_id=`` polling.
NOT a failure. Callers that expect a synchronous reply (the agent's
response text) won't get one here — they should either:
* Tolerate the absence of a reply (fire-and-forget semantics).
* Fall back to the durable ``/workspaces/:id/delegate`` +
``/delegations`` polling path (see ``a2a_tools_delegation``'s
``_delegate_sync_via_polling``), which writes the same A2A
request through the platform's executeDelegation goroutine
and lets the caller poll for the result row.
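``method`` echoes the request method (``message/send``, ``notify``,
etc.) so callers can correlate. ``delivery_mode`` is ``"poll"`` for the
envelope described above and ``"push"`` when ``parse`` classifies the
queue envelope of a busy push-mode workspace.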
"""
method: str
delivery_mode: str = "poll"
@dataclasses.dataclass(frozen=True)
class Malformed:
"""Server returned a body the parser can't classify.
Carries the raw decoded payload for diagnostic logging. Callers
typically render this as an error to the user (see
``send_a2a_message``) — but the Malformed variant is a separate
type so logging / metrics can distinguish it from genuine
JSON-RPC ``Error`` responses.
"""
raw: Any # whatever the server returned: dict / list / str / number / etc.
Variant = Union[Result, Error, Queued, Malformed]
# Field-name constants — the wire vocabulary. Single source of truth;
# the parser references these by name so a change here is a
# one-line edit instead of a hunt through string literals.
_KEY_RESULT = "result"
_KEY_ERROR = "error"
_KEY_STATUS = "status"
_KEY_DELIVERY_MODE = "delivery_mode"
_KEY_METHOD = "method"
_KEY_RESTARTING = "restarting"
_KEY_RETRY_AFTER = "retry_after"
_STATUS_QUEUED = "queued"
_DELIVERY_MODE_POLL = "poll"
def parse(data: Any) -> Variant:
"""Classify a pre-decoded ``/a2a`` JSON response into a typed variant.
Never raises. Every branch is total: any input that doesn't match a
known shape routes to ``Malformed`` so the caller can decide how
to surface it.
The order of checks matters:
1. Non-dict input → Malformed (server contract is dict-shaped).
2. Push-mode queue envelope (``queued`` is ``True``) → Queued with
``delivery_mode="push"``; checked before the poll-mode envelope so
the two queue shapes stay mutually exclusive.
3. Poll-queued envelope → Queued; checked BEFORE result/error, so a
buggy body that sets both ``status=queued`` and ``result`` is
classified as Queued rather than silently treated as Result.
4. ``result`` → Result (the JSON-RPC success path).
5. ``error`` → Error (JSON-RPC error or platform error).
6. Anything else → Malformed.
"""
if not isinstance(data, dict):
logger.warning(
"a2a_response.parse: non-dict body — got %s",
type(data).__name__,
)
return Malformed(raw=data)
# Push-mode queue envelope — returned when a push-mode workspace
# (one with a public URL) is at capacity. The platform queues the
# request and returns {"queued": true, "message": "...", "queue_id": "..."}.
# Unlike the poll-mode envelope (status=queued + delivery_mode=poll),
# this shape has no delivery_mode key — it's distinguishable by
# data.get("queued") is True alone. Checked before poll-mode so the
# two cases are mutually exclusive even if a buggy server sends both.
if data.get("queued") is True:
method_raw = data.get(_KEY_METHOD)
method = str(method_raw) if method_raw is not None else "message/send"
logger.info(
"a2a_response.parse: queued for busy push-mode peer (method=%s, queue_id=%s)",
method,
data.get("queue_id", "?"),
)
return Queued(method=method, delivery_mode="push")
# Poll-queued envelope. Both keys must be present — the workspace
# server sets them together; if only one is present the body is
# ambiguous and we route to Malformed for visibility.
if (
data.get(_KEY_STATUS) == _STATUS_QUEUED
and data.get(_KEY_DELIVERY_MODE) == _DELIVERY_MODE_POLL
):
method_raw = data.get(_KEY_METHOD)
method = str(method_raw) if method_raw is not None else "unknown"
logger.info(
"a2a_response.parse: queued for poll-mode peer (method=%s)",
method,
)
return Queued(method=method)
# JSON-RPC success.
if _KEY_RESULT in data:
result = data[_KEY_RESULT]
if isinstance(result, dict):
parts_raw = result.get("parts")
parts = parts_raw if isinstance(parts_raw, list) else []
text = ""
if parts:
first = parts[0]
if isinstance(first, dict):
text_raw = first.get("text")
text = str(text_raw) if text_raw is not None else ""
return Result(text=text, parts=parts, raw_result=result)
# ``result`` present but not a dict — unusual but not an error;
# surface as a Result with the value rendered to text.
return Result(text=str(result), parts=[], raw_result=None)
# JSON-RPC error or platform error.
if _KEY_ERROR in data:
err_raw = data[_KEY_ERROR]
message = ""
code: Optional[int] = None
if isinstance(err_raw, dict):
msg_raw = err_raw.get("message")
if msg_raw is not None:
message = str(msg_raw).strip()
code_raw = err_raw.get("code")
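# bool is a subclass of int; exclude it so a JSON true/false is not taken as a code.
if isinstance(code_raw, int) and not isinstance(code_raw, bool):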
code = code_raw
elif isinstance(err_raw, str):
message = err_raw.strip()
else:
message = str(err_raw)
restarting = bool(data.get(_KEY_RESTARTING, False))
retry_after_raw = data.get(_KEY_RETRY_AFTER)
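# bool is a subclass of int; exclude it so a JSON true/false is not taken as a delay.
retry_after = (
retry_after_raw
if isinstance(retry_after_raw, int) and not isinstance(retry_after_raw, bool)
else None
)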
return Error(
message=message,
code=code,
restarting=restarting,
retry_after=retry_after,
)
logger.warning(
"a2a_response.parse: unrecognized shape — keys=%s",
sorted(data.keys()),
)
return Malformed(raw=data)
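The check order in `parse` can be exercised against one sample body per wire shape. The following is a minimal sketch, not the module's parser: `classify` is a hypothetical stand-in that returns a label instead of the `Result` / `Error` / `Queued` / `Malformed` dataclasses, and the sample bodies are made up to match the shapes described in the module docstring and the inline comments.

```python
from typing import Any


def classify(data: Any) -> str:
    """Label a decoded /a2a body, using the same check order as parse()."""
    if not isinstance(data, dict):
        return "Malformed"
    # Busy push-mode peer: {"queued": true, ...} with no delivery_mode key.
    if data.get("queued") is True:
        return "Queued(push)"
    # Poll-mode peer: both keys must be present together.
    if data.get("status") == "queued" and data.get("delivery_mode") == "poll":
        return "Queued(poll)"
    if "result" in data:
        return "Result"
    if "error" in data:
        return "Error"
    return "Malformed"


# One illustrative body per shape (values are invented).
samples = {
    "success": {"jsonrpc": "2.0", "result": {"parts": [{"text": "hi"}]}, "id": "1"},
    "rpc_error": {"jsonrpc": "2.0", "error": {"message": "boom", "code": -32000}, "id": "1"},
    "poll_queued": {"status": "queued", "delivery_mode": "poll", "method": "message/send"},
    "push_queued": {"queued": True, "message": "busy", "queue_id": "q1"},
    "platform_error": {"error": "workspace restarting", "restarting": True, "retry_after": 5},
    "half_envelope": {"status": "queued"},
    "not_a_dict": ["unexpected"],
    "queued_and_result": {"status": "queued", "delivery_mode": "poll", "result": {}},
}
labels = {name: classify(body) for name, body in samples.items()}
```

The last two samples are the interesting ones: an envelope with only `status=queued` falls through to `Malformed`, and a body carrying both the poll envelope and a `result` key is labelled as queued because the queue checks run first.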
@@ -1,181 +0,0 @@
"""A2A MCP tool implementations — the body of each tool handler.
Imports shared client functions and constants from a2a_client.
"""
import hashlib
import json
import mimetypes
import os
import uuid
import httpx
from a2a_client import (
PLATFORM_URL,
WORKSPACE_ID,
_A2A_ERROR_PREFIX,
_peer_names,
_peer_to_source,
discover_peer,
get_peers,
get_peers_with_diagnostic,
get_workspace_info,
send_a2a_message,
)
from builtin_tools.security import _redact_secrets
from platform_auth import list_registered_workspaces
# ---------------------------------------------------------------------------
# RBAC + auth helpers — extracted to a2a_tools_rbac (RFC #2873 iter 4a).
# Re-exported here under the legacy underscore names so existing tests'
# patch("a2a_tools._check_memory_write_permission", …) and call sites
# inside this module that resolve bare names against the module-level
# namespace continue to work unchanged.
# ---------------------------------------------------------------------------
from a2a_tools_rbac import ( # noqa: E402 (import after the from-a2a_client block)
_auth_headers_for_heartbeat,
_check_memory_read_permission,
_check_memory_write_permission,
_get_workspace_tier,
_is_root_workspace,
_ROLE_PERMISSIONS,
)
# Per-field caps on the heartbeat / activity payload. Borrowed from
# hermes-agent's design discipline: cap ONCE in the helper, not at every
# call site, so a future caller adding error_detail can't accidentally
# DoS activity_logs by pasting a 4MB stack trace + base64 image.
#
# Why these specific limits:
# - error_detail (4096): hermes' value. Long enough for a multi-frame
# stack trace, short enough that 100 errors in 5min is < 500KB total.
# - summary (256): summary is a one-liner shown in the canvas card +
# activity row. 256 covers UTF-8 emoji + a sentence.
# - response_text (NOT capped): this is the agent's actual reply
# content. Capping would silently truncate user-visible output.
_MAX_ERROR_DETAIL_CHARS = 4096
_MAX_SUMMARY_CHARS = 256
async def report_activity(
activity_type: str, target_id: str = "", summary: str = "", status: str = "ok",
task_text: str = "", response_text: str = "", error_detail: str = "",
):
"""Report activity to the platform for live progress tracking."""
# Defensive caps in the helper itself so every caller benefits — see
# _MAX_ERROR_DETAIL_CHARS / _MAX_SUMMARY_CHARS comments above.
if error_detail and len(error_detail) > _MAX_ERROR_DETAIL_CHARS:
error_detail = error_detail[:_MAX_ERROR_DETAIL_CHARS]
if summary and len(summary) > _MAX_SUMMARY_CHARS:
summary = summary[:_MAX_SUMMARY_CHARS]
try:
async with httpx.AsyncClient(timeout=5.0) as client:
payload: dict = {
"activity_type": activity_type,
"source_id": WORKSPACE_ID,
"target_id": target_id,
"method": "message/send",
"summary": summary,
"status": status,
}
if task_text:
payload["request_body"] = {"task": task_text}
if response_text:
payload["response_body"] = {"result": response_text}
if error_detail:
# error_detail is a top-level activity row column on the
# platform (handlers/activity.go). Surfacing the cleaned
# exception string here lets the Activity tab render a
# red error chip + the cause without forcing the user
# to scroll into the raw response_body JSON.
payload["error_detail"] = error_detail
await client.post(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/activity",
json=payload,
headers=_auth_headers_for_heartbeat(),
)
# Also push current_task via heartbeat for canvas card display
if summary:
await client.post(
f"{PLATFORM_URL}/registry/heartbeat",
json={
"workspace_id": WORKSPACE_ID,
"current_task": summary,
"active_tasks": 1,
"error_rate": 0,
"sample_error": "",
"uptime_seconds": 0,
},
headers=_auth_headers_for_heartbeat(),
)
except Exception:
pass # Best-effort — don't block delegation on activity reporting
# Delegation tool handlers — extracted to a2a_tools_delegation
# (RFC #2873 iter 4b). Re-imported here so call sites + tests that
# reference ``a2a_tools.tool_delegate_task`` /
# ``a2a_tools._delegate_sync_via_polling`` keep resolving identically.
from a2a_tools_delegation import ( # noqa: E402 (import after the from-a2a_client block)
_SYNC_POLL_BUDGET_S,
_SYNC_POLL_INTERVAL_S,
_delegate_sync_via_polling,
tool_check_task_status,
tool_delegate_task,
tool_delegate_task_async,
)
# Messaging tool handlers — extracted to a2a_tools_messaging
# (RFC #2873 iter 4d). Re-imported here so call sites + tests that
# reference ``a2a_tools.tool_send_message_to_user`` /
# ``tool_list_peers`` / ``tool_get_workspace_info`` /
# ``tool_chat_history`` / ``_upload_chat_files`` keep resolving
# identically.
from a2a_tools_messaging import ( # noqa: E402 (import after the top-of-module imports)
_upload_chat_files,
tool_broadcast_message,
tool_chat_history,
tool_get_workspace_info,
tool_list_peers,
tool_send_message_to_user,
)
# Memory tool handlers — extracted to a2a_tools_memory (RFC #2873 iter 4c).
# Re-imported here so call sites + tests that reference
# ``a2a_tools.tool_commit_memory`` / ``tool_recall_memory`` keep
# resolving identically.
from a2a_tools_memory import ( # noqa: E402 (import after the top-of-module imports)
tool_commit_memory,
tool_recall_memory,
)
# Inbox tool handlers — extracted to a2a_tools_inbox (RFC #2873 iter 4e).
# Re-imported here so call sites + tests that reference
# ``a2a_tools.tool_inbox_peek`` / ``tool_inbox_pop`` / ``tool_wait_for_message``
# / ``_enrich_inbound_for_agent`` / ``_INBOX_NOT_ENABLED_MSG`` keep
# resolving identically.
from a2a_tools_inbox import ( # noqa: E402 (import after the top-of-module imports)
_INBOX_NOT_ENABLED_MSG,
_enrich_inbound_for_agent,
tool_inbox_peek,
tool_inbox_pop,
tool_wait_for_message,
)
# Identity tool handlers — extracted to a2a_tools_identity. Ports the
# two T4-tier MCP tools (``tool_get_runtime_identity`` +
# ``tool_update_agent_card``) from molecule-ai-workspace-runtime PR#17.
# That repo is mirror-only (reference_runtime_repo_is_mirror_only);
# this is the canonical edit point, and the wheel mirror is
# regenerated by publish-runtime.yml on merge.
from a2a_tools_identity import ( # noqa: E402 (import after the top-of-module imports)
tool_get_runtime_identity,
tool_update_agent_card,
)
-459
@@ -1,459 +0,0 @@
"""Delegation tool handlers — single-concern slice of the a2a_tools surface.
Extracted from ``a2a_tools.py`` (RFC #2873 iter 4b). Owns the three
delegation MCP tools + the RFC #2829 PR-5 sync-via-polling helper they
share.
Public surface:
* ``tool_delegate_task`` — synchronous delegation, waits for response.
* ``tool_delegate_task_async`` — fire-and-forget delegation; returns
``{delegation_id, ...}``.
* ``tool_check_task_status`` — poll the platform's ``/delegations`` log.
Internal:
* ``_delegate_sync_via_polling`` — durable async + poll for terminal
status (RFC #2829 PR-5 cutover path; toggled by
``DELEGATION_SYNC_VIA_INBOX=1``).
* ``_SYNC_POLL_INTERVAL_S`` / ``_SYNC_POLL_BUDGET_S`` constants.
Circular-import note: this module calls ``report_activity`` from
``a2a_tools`` to emit activity rows around the delegate dispatch.
``a2a_tools`` imports the public symbols here at module-load time,
so we use a LAZY import for ``report_activity`` inside the function
that needs it. Without the lazy hop Python raises an ImportError
on first ``a2a_tools`` import.
"""
from __future__ import annotations
import hashlib
import json
import logging
import os
import httpx
logger = logging.getLogger(__name__)
from a2a_client import (
PLATFORM_URL,
WORKSPACE_ID,
_A2A_ERROR_PREFIX,
_A2A_QUEUED_PREFIX,
_peer_names,
_peer_to_source,
discover_peer,
send_a2a_message,
)
from a2a_tools_rbac import auth_headers_for_heartbeat as _auth_headers_for_heartbeat
from _sanitize_a2a import (
_A2A_BOUNDARY_END,
_A2A_BOUNDARY_END_ESCAPED,
_A2A_BOUNDARY_START,
_A2A_BOUNDARY_START_ESCAPED,
sanitize_a2a_result,
) # noqa: E402
# RFC #2829 PR-5 cutover constants. The poll cadence + timeout are
# intentionally generous: 3s gives the platform's executeDelegation
# goroutine room to dispatch + the callee to respond + the result to
# write to activity_logs without thrashing the platform with rapid
# polls; the budget matches the legacy DELEGATION_TIMEOUT (300s) so
# operators don't see behavior change beyond "no more 600s timeouts".
_SYNC_POLL_INTERVAL_S = 3.0
_SYNC_POLL_BUDGET_S = float(os.environ.get("DELEGATION_TIMEOUT", "300.0"))
async def _delegate_sync_via_polling(
workspace_id: str,
task: str,
src: str,
) -> str:
"""RFC #2829 PR-5: durable async delegation + poll for terminal status.
Sidesteps the platform proxy's blocking `message/send` HTTP path that
hits a hard 600s ceiling. Instead:
1. POST /workspaces/<src>/delegate (async, returns 202 + delegation_id)
— platform's executeDelegation goroutine handles A2A dispatch in
the background. No client-side timeout dependency on the platform
holding a connection open.
2. Poll GET /workspaces/<src>/delegations every 3s for a row with
matching delegation_id reaching terminal status (completed/failed).
3. Return the response_preview text on completed; surface error_detail
on failed (with the same _A2A_ERROR_PREFIX wrapping the legacy
path uses, so caller error-detection logic is unchanged).
Both /delegate and /delegations are existing endpoints — this helper
just composes them into a polling synchronous facade. The result is
available the moment the platform writes the terminal status row;
no extra latency vs. the legacy proxy-blocked path on fast cases.
"""
import asyncio
import time
idem_key = hashlib.sha256(f"{src}:{workspace_id}:{task}".encode()).hexdigest()[:32]
# 1. Dispatch via /delegate (the async, durable path).
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{src}/delegate",
json={
"target_id": workspace_id,
"task": task,
"idempotency_key": idem_key,
},
headers=_auth_headers_for_heartbeat(src),
)
except Exception as e: # pylint: disable=broad-except
return f"{_A2A_ERROR_PREFIX}delegate dispatch failed: {e}"
if resp.status_code not in (200, 202):
return f"{_A2A_ERROR_PREFIX}delegate dispatch failed: HTTP {resp.status_code} {resp.text[:200]}"
try:
dispatch = resp.json()
except Exception as e: # pylint: disable=broad-except
return f"{_A2A_ERROR_PREFIX}delegate dispatch returned non-JSON: {e}"
delegation_id = dispatch.get("delegation_id", "")
if not delegation_id:
return f"{_A2A_ERROR_PREFIX}delegate dispatch missing delegation_id: {dispatch}"
# 2. Poll for terminal status with a deadline. Each poll is a cheap
# /delegations GET — bounded by the platform's existing rate limit.
deadline = time.monotonic() + _SYNC_POLL_BUDGET_S
last_status = "unknown"
while time.monotonic() < deadline:
try:
async with httpx.AsyncClient(timeout=10.0) as client:
poll = await client.get(
f"{PLATFORM_URL}/workspaces/{src}/delegations",
headers=_auth_headers_for_heartbeat(src),
)
except Exception as e: # pylint: disable=broad-except
# Transient — keep polling. The platform IS holding the
# delegation row; we just lost a network request.
last_status = f"poll-error: {e}"
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
continue
if poll.status_code != 200:
last_status = f"poll HTTP {poll.status_code}"
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
continue
try:
rows = poll.json()
except Exception as e: # pylint: disable=broad-except
last_status = f"poll non-JSON: {e}"
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
continue
# /delegations returns a flat list of delegation events. The list may
# have multiple rows per delegation_id (one for the original dispatch,
# one per status update); filter to our delegation_id and take the
# first terminal row in list order.
if not isinstance(rows, list):
last_status = "poll returned a non-list payload"
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
continue
terminal = None
for r in rows:
if not isinstance(r, dict):
continue
if r.get("delegation_id") != delegation_id:
continue
status = (r.get("status") or "").lower()
last_status = status
if status in ("completed", "failed"):
terminal = r
break
if terminal:
if (terminal.get("status") or "").lower() == "completed":
# OFFSEC-003: sanitize response_preview before returning so
# boundary markers injected by a malicious peer cannot escape
# the trust boundary.
return sanitize_a2a_result(terminal.get("response_preview") or "")
# OFFSEC-003: sanitize error_detail / summary before wrapping with
# the _A2A_ERROR_PREFIX sentinel so injected markers cannot appear
# inside the trusted error block returned to the agent.
err_raw = (
terminal.get("error_detail")
or terminal.get("summary")
or "delegation failed"
)
err = sanitize_a2a_result(err_raw)
return f"{_A2A_ERROR_PREFIX}{err}"
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
# Budget exhausted — the platform's row is still in flight (or queued).
# Surface as an error so the caller can decide to retry or fall back;
# the platform DOES still have the durable row, so the work isn't
# lost — it'll complete eventually and a future check_task_status
# will surface the result.
return (
f"{_A2A_ERROR_PREFIX}polling timeout after {_SYNC_POLL_BUDGET_S}s "
f"(delegation_id={delegation_id}, last_status={last_status}); "
f"the platform is still working on it — call check_task_status('{delegation_id}') to retrieve later"
)
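The dispatch-then-poll loop in `_delegate_sync_via_polling` can be sketched in isolation as follows. This is only an illustration under stated assumptions: `poll_until_terminal` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real `GET /workspaces/<src>/delegations` call.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

TERMINAL = ("completed", "failed")


async def poll_until_terminal(
    fetch_rows: Callable[[], Awaitable[list]],
    delegation_id: str,
    budget_s: float,
    interval_s: float,
) -> Optional[dict]:
    """Return the first terminal row for delegation_id, or None on timeout."""
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        try:
            rows = await fetch_rows()
        except Exception:
            rows = []  # treat the failure as transient and keep polling
        if not isinstance(rows, list):
            rows = []  # ignore malformed payloads
        for row in rows:
            if not isinstance(row, dict) or row.get("delegation_id") != delegation_id:
                continue
            if (row.get("status") or "").lower() in TERMINAL:
                return row
        await asyncio.sleep(interval_s)
    return None


async def _demo() -> Optional[dict]:
    calls = {"n": 0}

    async def fake_fetch() -> list:
        # Hypothetical stand-in for the platform: terminal on the third poll.
        calls["n"] += 1
        status = "completed" if calls["n"] >= 3 else "dispatched"
        return [{"delegation_id": "d-1", "status": status, "response_preview": "done"}]

    return await poll_until_terminal(fake_fetch, "d-1", budget_s=2.0, interval_s=0.01)


result = asyncio.run(_demo())
```

Using `time.monotonic()` for the deadline keeps the budget immune to wall-clock adjustments, which is presumably why the helper above does the same.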
async def tool_delegate_task(
workspace_id: str,
task: str,
source_workspace_id: str | None = None,
) -> str:
"""Delegate a task to another workspace via A2A (synchronous — waits for response).
``source_workspace_id`` selects which registered workspace this
delegation originates from — drives auth + the X-Workspace-ID source
header so the platform's a2a_proxy logs the correct sender. Single-
workspace operators leave it None and routing falls back to the
module-level WORKSPACE_ID.
"""
if not workspace_id or not task:
return "Error: workspace_id and task are required"
# Self-delegation guard: delegating to your own workspace ID deadlocks.
# The sending turn holds _run_lock while the receive handler waits for the
# same lock, so the request times out after 30s and the whole cycle is wasted.
# Reject immediately with an actionable message. (effective_src mirrors the
# `src or WORKSPACE_ID` resolution used below for routing.)
effective_src = source_workspace_id or _peer_to_source.get(workspace_id) or WORKSPACE_ID
if workspace_id and workspace_id == effective_src:
return (
"Error: cannot delegate_task to your own workspace — self-delegation "
"deadlocks _run_lock (your sending turn holds it, the receive handler "
"waits for it, the request times out). There is no peer who is also you: "
"just do the work yourself, or call commit_memory / send_message_to_user directly."
)
# Auto-route: if source not specified, look up which registered
# workspace last saw this peer (populated by tool_list_peers). Falls
# back to the legacy WORKSPACE_ID for single-workspace operators.
src = source_workspace_id or _peer_to_source.get(workspace_id) or None
# Discover the target. discover_peer is the access-control gate +
# name/status lookup. The peer's reported ``url`` field is NOT used
# for routing — see send_a2a_message, which constructs the URL via
# the platform's A2A proxy.
peer = await discover_peer(workspace_id, source_workspace_id=src)
if not peer:
return f"Error: workspace {workspace_id} not found or not accessible (check access control)"
if (peer.get("status") or "").lower() == "offline":
return f"Error: workspace {workspace_id} is offline"
# Lazy import: a2a_tools imports this module at top-level, so a
# top-level import of report_activity from a2a_tools would create a
# circular dependency at first-import time. Lazy resolution inside
# the function body breaks the cycle without forcing a ground-up
# restructure of the activity-reporting layer.
from a2a_tools import report_activity
# Report delegation start — include the task text for traceability
peer_name = peer.get("name") or _peer_names.get(workspace_id) or workspace_id[:8]
_peer_names[workspace_id] = peer_name # cache for future use
# Brief summary for canvas display — just the delegation target
await report_activity("a2a_send", workspace_id, f"Delegating to {peer_name}", task_text=task)
# RFC #2829 PR-5: agent-side cutover. When DELEGATION_SYNC_VIA_INBOX=1,
# use the platform's durable async delegation API (POST /delegate +
# poll /delegations) instead of the proxy-blocked message/send path.
# This sidesteps the 600s message/send timeout class that broke
# iteration-14/90-style long-running delegations on 2026-05-05.
#
# Default off — staging-canary first, flip default after PR-2's
# result-push flag (DELEGATION_RESULT_INBOX_PUSH) has been on for
# ≥1 week without incident.
if os.environ.get("DELEGATION_SYNC_VIA_INBOX") == "1":
result = await _delegate_sync_via_polling(workspace_id, task, src or WORKSPACE_ID)
else:
# send_a2a_message routes through ${PLATFORM_URL}/workspaces/{id}/a2a
# (the platform proxy) so the same code works for in-container and
# external (standalone molecule-mcp) callers.
result = await send_a2a_message(workspace_id, task, source_workspace_id=src)
# #2967: when the target is a poll-mode peer, the platform's
# a2a_proxy short-circuits and returns a queued envelope —
# send_a2a_message surfaces that as the _A2A_QUEUED_PREFIX
# sentinel. The synchronous proxy path can't deliver a reply
# because the target has no public URL; fall back to the
# durable /delegate + /delegations polling path which DOES
# work for poll-mode peers (the executeDelegation goroutine
# writes to the inbox queue and the result row arrives when
# the target picks it up + replies).
#
# This is what makes external-runtime-to-external-runtime
# A2A actually deliver synchronous replies — without the
# fallback the calling agent sees the queued sentinel as
# success-with-no-text and never gets the peer's response.
if result.startswith(_A2A_QUEUED_PREFIX):
logger.info(
"tool_delegate_task: target=%s is poll-mode; "
"falling back from message/send to /delegate-poll path",
workspace_id,
)
result = await _delegate_sync_via_polling(
workspace_id, task, src or WORKSPACE_ID,
)
# Detect delegation failures — wrap them clearly so the calling agent
# can decide to retry, use another peer, or handle the task itself.
is_error = result.startswith(_A2A_ERROR_PREFIX)
# Strip the sentinel prefix so error_detail is the human-readable
# cause directly. The Activity tab's red error chip surfaces this
# without the user having to scroll into the raw response JSON.
#
# Cap at 4096 chars before sending — the platform's
# activity_logs.error_detail column is unbounded TEXT and a
# malicious or buggy peer could otherwise stream an arbitrarily
# large error message into the caller's activity log. 4096 is
# comfortably above any real exception traceback we've seen and
# well below an obvious-DoS threshold.
error_detail = result[len(_A2A_ERROR_PREFIX):].strip()[:4096] if is_error else ""
await report_activity(
"a2a_receive", workspace_id,
f"{peer_name} responded ({len(result)} chars)" if not is_error else f"{peer_name} failed: {error_detail[:120]}",
task_text=task, response_text=result,
status="error" if is_error else "ok",
error_detail=error_detail,
)
if is_error:
return (
f"DELEGATION FAILED to {peer_name}: {result}\n"
f"You should either: (1) try a different peer, (2) handle this task yourself, "
f"or (3) inform the user that {peer_name} is unavailable and provide your best answer."
)
# OFFSEC-003: escape boundary markers in peer text, then wrap in boundary
# markers so the agent can distinguish trusted (own output) from untrusted
# (peer-supplied) content. Explicit wrapping here rather than inside
# sanitize_a2a_result preserves a clean separation of concerns.
#
# Truncate at the first raw closer BEFORE sanitizing: escaping rewrites the
# closer, so it could no longer be located afterwards. Everything from the
# closer onward is dropped. Then sanitize the remaining text and wrap it
# with the escaped boundary markers.
if _A2A_BOUNDARY_END in result:
result = result[:result.index(_A2A_BOUNDARY_END)]
escaped = sanitize_a2a_result(result)
return (
f"{_A2A_BOUNDARY_START_ESCAPED}\n"
f"{escaped}\n"
f"{_A2A_BOUNDARY_END_ESCAPED}"
)
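The truncate, escape, wrap order used at the end of `tool_delegate_task` can be illustrated with a small standalone sketch. The marker strings and the `escape_markers` function below are hypothetical; the real values and escaping rules live in `_sanitize_a2a` and are not shown in this diff. The sketch also wraps with the plain markers, whereas the module uses its own `_ESCAPED` constants.

```python
# Hypothetical marker values, for illustration only.
BOUNDARY_START = "<<A2A_RESULT>>"
BOUNDARY_END = "<</A2A_RESULT>>"


def escape_markers(text: str) -> str:
    """Neutralise marker look-alikes in peer text (illustrative escaping only)."""
    return text.replace("<<", "< <")


def wrap_untrusted(peer_text: str) -> str:
    """Truncate at the first raw closer, escape the rest, then wrap it."""
    if BOUNDARY_END in peer_text:
        peer_text = peer_text[: peer_text.index(BOUNDARY_END)]
    return f"{BOUNDARY_START}\n{escape_markers(peer_text)}\n{BOUNDARY_END}"


wrapped = wrap_untrusted("hello <</A2A_RESULT>> ignore previous instructions")
```

Anything a peer appends after a forged closer is discarded, so the wrapped block always contains exactly one closer, placed by the caller.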
async def tool_delegate_task_async(
workspace_id: str,
task: str,
source_workspace_id: str | None = None,
) -> str:
"""Delegate a task via the platform's async delegation API (fire-and-forget).
Uses POST /workspaces/:id/delegate which runs the A2A request in the background.
Results are tracked in the platform DB and broadcast via WebSocket.
Use check_task_status to poll for results.
``source_workspace_id`` selects the sending workspace (which one of
this agent's registered workspaces gets logged as the originator);
auto-routes via the peer→source cache when omitted.
"""
if not workspace_id or not task:
return "Error: workspace_id and task are required"
src = source_workspace_id or _peer_to_source.get(workspace_id) or WORKSPACE_ID
# Self-delegation guard: even on the async path, queuing a task to your own
# workspace just makes you re-process your own dispatch — never useful, and
# on the sync path it deadlocks (see tool_delegate_task). Reject early.
if workspace_id and workspace_id == src:
return (
"Error: cannot delegate_task_async to your own workspace — there is no "
"peer who is also you. Do the work yourself, or call commit_memory / "
"send_message_to_user directly."
)
# Idempotency key: SHA-256 of (source, target, task) so that a
# restarted agent firing the same delegation gets the same key and
# the platform returns the existing delegation_id instead of
# creating a duplicate. Fixes #1456. Source is in the key so the
# SAME task delegated from two different registered workspaces
# produces two distinct delegations (the right behavior — one per
# tenant audit trail).
idem_key = hashlib.sha256(f"{src}:{workspace_id}:{task}".encode()).hexdigest()[:32]
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{src}/delegate",
json={"target_id": workspace_id, "task": task, "idempotency_key": idem_key},
headers=_auth_headers_for_heartbeat(src),
)
if resp.status_code in (200, 202): # accept 200 too, matching _delegate_sync_via_polling
data = resp.json()
return json.dumps({
"delegation_id": data.get("delegation_id", ""),
"workspace_id": workspace_id,
"status": "delegated",
"note": "Task delegated. The platform runs it in the background. Use check_task_status to poll for results.",
})
else:
return f"Error: delegation failed with status {resp.status_code}: {resp.text[:200]}"
except Exception as e:
return f"Error: delegation failed — {e}"
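The idempotency key described above is a truncated SHA-256 digest of the `source:target:task` string. A minimal sketch of that derivation; the function name `idempotency_key` and the workspace IDs are made up for the example:

```python
import hashlib


def idempotency_key(src: str, target: str, task: str) -> str:
    """First 32 hex chars (128 bits) of SHA-256 over 'src:target:task'."""
    return hashlib.sha256(f"{src}:{target}:{task}".encode()).hexdigest()[:32]


key_a = idempotency_key("ws-a", "ws-peer", "summarise the report")
key_a_again = idempotency_key("ws-a", "ws-peer", "summarise the report")
key_b = idempotency_key("ws-b", "ws-peer", "summarise the report")
```

The same triple always yields the same key, while changing only the source yields a different one, which matches the "one delegation per tenant audit trail" behaviour the comment describes. One caveat of a colon-joined key is that IDs containing `:` could in principle collide (`"a:b" + "c"` versus `"a" + "b:c"`); that should not matter if workspace IDs never contain colons, which is an assumption here.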
async def tool_check_task_status(
workspace_id: str,
task_id: str,
source_workspace_id: str | None = None,
) -> str:
"""Check delegations for this workspace via the platform API.
Args:
workspace_id: Ignored (kept for backward compat). Checks
``source_workspace_id``'s delegations (the workspace that
FIRED the delegations), not the target's.
task_id: Optional delegation_id to filter. If empty, returns all recent delegations.
source_workspace_id: Which registered workspace's delegation log
to query. Defaults to the module-level WORKSPACE_ID.
"""
src = source_workspace_id or WORKSPACE_ID
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.get(
f"{PLATFORM_URL}/workspaces/{src}/delegations",
headers=_auth_headers_for_heartbeat(src),
)
if resp.status_code != 200:
return f"Error: failed to check delegations ({resp.status_code})"
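delegations = resp.json()
if not isinstance(delegations, list):
return "Error: unexpected /delegations payload (expected a list)"
if task_id:
# Filter by delegation_id
matching = [d for d in delegations if isinstance(d, dict) and d.get("delegation_id") == task_id]
if matching:
# OFFSEC-003: sanitize peer-supplied fields. `or ""` guards
# against JSON nulls, which .get() defaults do not cover.
d = matching[0]
d["summary"] = sanitize_a2a_result(d.get("summary") or "")
d["response_preview"] = sanitize_a2a_result(d.get("response_preview") or "")
if d.get("error_detail"):
d["error_detail"] = sanitize_a2a_result(d["error_detail"])
return json.dumps(d)
return json.dumps({"status": "not_found", "delegation_id": task_id})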
# Return all recent delegations
summary = []
for d in delegations[:10]:
preview = d.get("response_preview", "")
if preview:
preview = sanitize_a2a_result(preview)
summary.append({
"delegation_id": d.get("delegation_id", ""),
"target_id": d.get("target_id", ""),
"status": d.get("status", ""),
# `or ""` guards against a JSON null summary.
"summary": sanitize_a2a_result(d.get("summary") or ""),
"response_preview": preview,
})
return json.dumps({"delegations": summary, "count": len(delegations)})
except Exception as e:
return f"Error checking delegations: {e}"
-187
@@ -1,187 +0,0 @@
"""Identity tool handlers — single-concern slice of the a2a_tools surface.
Owns the two MCP tools that close the T4-tier workspace owner-permission
gaps reported via the canvas:
* ``tool_get_runtime_identity`` — env-only; returns model, model_provider,
molecule_model, anthropic_base_url, tier, workspace_id, runtime
(ADAPTER_MODULE). No HTTP call. Always permitted by RBAC — even
read-only agents may know what model they are.
* ``tool_update_agent_card`` — POSTs the card to ``/registry/update-card``
with the workspace's own bearer (same auth path as ``tool_commit_memory``
via ``a2a_tools_rbac.auth_headers_for_heartbeat``). The platform
replaces the stored card and broadcasts an ``agent_card_updated``
event so the canvas reflects the new card live. Gated on
``memory.write`` capability via the existing RBAC permission map so
read-only roles can't silently rewrite the platform card.
Both originated as a port of molecule-ai-workspace-runtime PR#17
(``feat(mcp): add update_agent_card + get_runtime_identity tools``).
The mirror-only PR#17 was closed without merge per
``reference_runtime_repo_is_mirror_only``; the canonical edit point is
this monorepo at ``workspace/`` and the wheel mirror is regenerated
automatically by the publish-runtime workflow.
Imports the auth-header primitive from ``a2a_tools_rbac`` (iter 4a) —
NOT from ``a2a_tools`` — to avoid a circular import with the
kitchen-sink re-export module.
"""
from __future__ import annotations
import json
import os
from typing import Any
import httpx
from a2a_client import PLATFORM_URL
from a2a_tools_rbac import (
auth_headers_for_heartbeat as _auth_headers_for_heartbeat,
check_memory_write_permission as _check_memory_write_permission,
)
def _runtime_identity_payload() -> dict[str, Any]:
"""Build the identity dict — env-only, no I/O.
Factored out from ``tool_get_runtime_identity`` so tests can assert
against the exact key set without re-parsing JSON. The MCP tool
handler ``tool_get_runtime_identity`` is the only public caller in
production; tests call this helper directly.
"""
return {
"model": os.environ.get("MODEL", ""),
"model_provider": os.environ.get("MODEL_PROVIDER", ""),
"molecule_model": os.environ.get("MOLECULE_MODEL", ""),
"anthropic_base_url": os.environ.get("ANTHROPIC_BASE_URL", ""),
"tier": os.environ.get("TIER", ""),
"workspace_id": os.environ.get("WORKSPACE_ID", ""),
# Adapter module is the closest thing the runtime has to a
# "template slug" — e.g. "adapter" for claude-code-default,
# "hermes" for hermes-template, etc. Picked from
# $ADAPTER_MODULE env baked by each template's Dockerfile.
"runtime": os.environ.get("ADAPTER_MODULE", ""),
}
async def tool_get_runtime_identity() -> str:
"""Return this runtime's identity — model, provider, tier, IDs.
Env-only; no HTTP call. Useful so the agent can answer "what model
am I?" correctly instead of guessing from a stale system prompt
that the operator may have changed between boots.
Returns the identity as a JSON-encoded string (the dispatch contract
every MCP tool in this module follows). Tests that want to assert
individual fields can call ``_runtime_identity_payload()`` directly,
or ``json.loads`` the return value.
Always permitted by RBAC — there is no sensitive information here
that isn't already available to the process via ``os.environ``.
The point of the tool is to surface those env values to the agent
layer in a stable, documented shape rather than expecting every
agent runtime to know to ``echo $MODEL``.
"""
return json.dumps(_runtime_identity_payload(), indent=2)
async def tool_update_agent_card(card: Any) -> str:
"""Update this workspace's agent_card on the platform.
POSTs the provided card to ``/registry/update-card`` with the
workspace's own bearer token (same auth path as ``tool_commit_memory``
and ``tool_get_workspace_info``). The platform validates required
fields server-side, replaces the stored card, and broadcasts an
``agent_card_updated`` event so the canvas updates live.
Args:
card: A JSON-serialisable object (typically a dict) holding the
new card. The platform validates required fields server-side.
Returns:
JSON-encoded string. Body:
- ``{"success": true, "status": "updated"}`` on success;
- ``{"success": false, "error": "<msg>", "status_code": <int>}``
on platform error;
- ``{"success": false, "error": "<reason>"}`` on local failure
(RBAC denial, non-dict card, missing WORKSPACE_ID, network error).
Permission gate: this tool requires the ``memory.write`` RBAC
capability — same gate as ``tool_commit_memory``. The check runs
inline rather than at the dispatcher layer to keep ``a2a_mcp_server``
permission-agnostic (the gate sits with the implementation, not the
transport). Read-only roles get a clear error string back instead
of a 403 from the platform.
We re-check ``isinstance(card, dict)`` here defensively rather than
trust the MCP schema validator alone — the schema only constrains
the transport, not the in-process call surface used by tests and
sibling modules.
"""
payload = await _update_agent_card_impl(card)
return json.dumps(payload, indent=2)
async def _update_agent_card_impl(card: Any) -> dict[str, Any]:
"""Dict-returning core of ``tool_update_agent_card``.
Split out so tests can assert against the raw dict shape (status
codes, error messages) without re-parsing JSON on every assertion.
The string-returning ``tool_update_agent_card`` is a thin wrapper
invoked by the MCP dispatcher.
"""
# RBAC: require memory.write permission. Same gate as
# tool_commit_memory (the agent already needs this capability to
# persist anything outbound). Read-only roles can still call
# get_runtime_identity / get_workspace_info to introspect — those
# are env-only / read-only and have no inline gate.
if not _check_memory_write_permission():
return {
"success": False,
"error": (
"RBAC — this workspace does not have the 'memory.write' "
"permission required to update the agent_card."
),
}
if not isinstance(card, dict):
return {
"success": False,
"error": "card must be a JSON object (dict)",
}
ws_id = os.environ.get("WORKSPACE_ID", "")
if not ws_id:
return {
"success": False,
"error": "WORKSPACE_ID env not set; cannot identify caller",
}
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{PLATFORM_URL}/registry/update-card",
json={"workspace_id": ws_id, "agent_card": card},
headers=_auth_headers_for_heartbeat(),
)
if resp.status_code == 200:
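body: dict[str, Any] = {}
try:
# A 200 with a non-object JSON body still counts as success;
# without this check body.get() below would raise and the
# outer handler would misreport it as a network error.
parsed = resp.json()
if isinstance(parsed, dict):
body = parsed
except Exception:
pass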
return {
"success": True,
"status": body.get("status", "updated"),
}
# Non-200 — surface what the platform returned.
error_msg = ""
try:
error_msg = resp.json().get("error", "") or resp.text
except Exception:
error_msg = resp.text
return {
"success": False,
"status_code": resp.status_code,
"error": error_msg,
}
except Exception as e:
return {"success": False, "error": f"network error: {e}"}
-140
@@ -1,140 +0,0 @@
"""Inbox tool handlers — single-concern slice of the a2a_tools surface.
Standalone-runtime path for inbound-message delivery (push-mode runtimes
get messages via the channel-tag synthesis in a2a_mcp_server). The
``InboxState`` singleton is set by ``mcp_cli`` before the MCP server
starts; in-container runtimes never call ``inbox.activate(...)`` so
``inbox.get_state()`` returns None and these tools surface an
informational error instead of raising.
When-to-use guidance for agents (mirrored in
``platform_tools/registry.py``):
- ``wait_for_message``: block until a new inbound message arrives, then
decide what to do with it; forms the loop ``wait → respond → wait``.
- ``inbox_peek``: inspect the queue non-destructively.
- ``inbox_pop``: remove a handled message by activity_id.
Extracted from ``a2a_tools.py`` in RFC #2873 iter 4e so the kitchen-sink
module shrinks to a back-compat shim. The extraction also makes the
``_enrich_inbound_for_agent`` helper unit-testable in isolation —
previously it was buried in ``a2a_tools`` and only exercised through
the inbox wrappers, leaving its peer-id-empty / cache-miss / registry-
unavailable branches under-covered.
"""
from __future__ import annotations
import asyncio
import json
# Surfaced when the inbox subsystem is not initialised. Returned by the
# three inbox tool wrappers below so the agent gets a clear "this
# runtime delivers via push" message instead of an unhandled exception.
_INBOX_NOT_ENABLED_MSG = (
"Error: inbox polling is not enabled in this runtime. The standalone "
"molecule-mcp wrapper activates it; in-container runtimes receive "
"messages via push delivery and do not need these tools."
)
def _enrich_inbound_for_agent(d: dict) -> dict:
"""Add peer_name / peer_role / agent_card_url to a poll-path message.
The PUSH path (a2a_mcp_server._build_channel_notification) already
enriches the meta dict with these fields, so a Claude Code host
with channel-push sees them. The POLL path goes through
InboxMessage.to_dict, which is intentionally identity-free (the
storage layer doesn't know about the registry cache). Without this
helper, every non-Claude-Code MCP client that uses inbox_peek /
wait_for_message gets a plain message and the receiving agent
can't tell who's writing — breaking the contract documented in
a2a_mcp_server.py:303-345 ("In both paths the same fields apply").
Cache-first non-blocking enrichment (same shape as push): on cache
miss the helper returns the bare message; the next call within the
5-min TTL hits the warm cache. Failure to enrich is non-fatal —
the agent still gets text + peer_id + kind + activity_id, just
without the friendly identity.
"""
peer_id = d.get("peer_id") or ""
if not peer_id:
# canvas_user — no peer to enrich; helper returns the plain
# message unchanged so the canvas reply path still works.
return d
try:
from a2a_client import ( # local import — avoid module-load cycle
_agent_card_url_for,
enrich_peer_metadata_nonblocking,
)
except Exception: # noqa: BLE001
# If a2a_client is unavailable (test harness, partial install),
# degrade gracefully — agent still gets the bare envelope.
return d
record = enrich_peer_metadata_nonblocking(peer_id)
if record is not None:
if name := record.get("name"):
d["peer_name"] = name
if role := record.get("role"):
d["peer_role"] = role
# agent_card_url is constructable from peer_id alone — surface it
# even when registry enrichment misses, so the receiving agent has
# a single endpoint to hit for the peer's full capability list.
d["agent_card_url"] = _agent_card_url_for(peer_id)
return d
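The cache-first, non-blocking behaviour described in the docstring of `_enrich_inbound_for_agent` can be sketched with a plain in-memory TTL cache. This is an illustration only: `cached_peer`, `enrich`, and `_cache` are hypothetical names, and the sketch leaves out whatever refresh the real `enrich_peer_metadata_nonblocking` may trigger on a miss.

```python
import time
from typing import Optional

TTL_S = 300.0  # 5-minute TTL, as described in the docstring
_cache: dict[str, tuple[float, dict]] = {}  # peer_id -> (stored_at, record)


def cached_peer(peer_id: str, now: Optional[float] = None) -> Optional[dict]:
    """Return a fresh cached record or None; never touches the network."""
    now = time.monotonic() if now is None else now
    entry = _cache.get(peer_id)
    if entry is None or now - entry[0] > TTL_S:
        return None
    return entry[1]


def enrich(message: dict, now: Optional[float] = None) -> dict:
    """Add peer_name / peer_role when the cache is warm; otherwise pass through."""
    peer_id = message.get("peer_id") or ""
    if not peer_id:
        return message  # e.g. a canvas user: nothing to enrich
    record = cached_peer(peer_id, now)
    if record:
        for src_key, dst_key in (("name", "peer_name"), ("role", "peer_role")):
            if record.get(src_key):
                message[dst_key] = record[src_key]
    return message


_cache["ws-1"] = (0.0, {"name": "Researcher", "role": "analyst"})
warm = enrich({"peer_id": "ws-1", "text": "hi"}, now=10.0)
stale = enrich({"peer_id": "ws-1", "text": "hi"}, now=1000.0)
```

A miss or an expired entry simply returns the bare message, so enrichment can never delay or fail message delivery.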
async def tool_inbox_peek(limit: int = 10) -> str:
"""Return up to ``limit`` pending inbound messages without removing them."""
import inbox # local import — avoids a circular dep at module load
state = inbox.get_state()
if state is None:
return _INBOX_NOT_ENABLED_MSG
messages = state.peek(limit=limit if isinstance(limit, int) else 10)
return json.dumps([_enrich_inbound_for_agent(m.to_dict()) for m in messages])
async def tool_inbox_pop(activity_id: str) -> str:
"""Remove a message from the inbox queue by activity_id."""
import inbox
state = inbox.get_state()
if state is None:
return _INBOX_NOT_ENABLED_MSG
if not isinstance(activity_id, str) or not activity_id:
return "Error: activity_id is required."
removed = state.pop(activity_id)
if removed is None:
return json.dumps({"removed": False, "activity_id": activity_id})
return json.dumps({"removed": True, "activity_id": activity_id})
async def tool_wait_for_message(timeout_secs: float = 60.0) -> str:
"""Block until a new message arrives or ``timeout_secs`` elapses.
Returns the head message non-destructively; the agent decides
whether to ``inbox_pop`` it after acting.
"""
import inbox
state = inbox.get_state()
if state is None:
return _INBOX_NOT_ENABLED_MSG
try:
timeout = float(timeout_secs)
except (TypeError, ValueError):
timeout = 60.0
# Cap at 300s — Claude Code's default tool timeout is ~10min, and
# blocking longer than 5min wastes the prompt cache window for
# nothing useful. Operators who want longer can call repeatedly.
timeout = max(0.0, min(timeout, 300.0))
# The threading.Event-based wait would block the asyncio loop.
# Run it on the default executor so the MCP server can keep
# processing other JSON-RPC requests while we sleep.
loop = asyncio.get_running_loop()
message = await loop.run_in_executor(None, state.wait, timeout)
if message is None:
return json.dumps({"timeout": True, "timeout_secs": timeout})
return json.dumps(_enrich_inbound_for_agent(message.to_dict()))
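The timeout handling and executor hand-off in `tool_wait_for_message` can be sketched in isolation as follows. This is a minimal sketch, not the module's code: `FakeInboxState`, `clamp_timeout`, and `wait_for_message` are hypothetical names, and the real inbox state object is assumed to expose a blocking `wait(timeout)` that returns `None` on timeout.

```python
import asyncio
import threading


def clamp_timeout(raw, default=60.0, cap=300.0):
    """Coerce to float, fall back to the default, clamp into [0, cap]."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        value = default
    return max(0.0, min(value, cap))


class FakeInboxState:
    """Hypothetical stand-in for the inbox state; only wait() is modelled."""

    def __init__(self):
        self._event = threading.Event()
        self._message = None

    def deliver(self, message):
        self._message = message
        self._event.set()

    def wait(self, timeout):
        # Blocking call: returns the message, or None when the timeout elapses.
        return self._message if self._event.wait(timeout) else None


async def wait_for_message(state, timeout_secs):
    timeout = clamp_timeout(timeout_secs)
    loop = asyncio.get_running_loop()
    # Run the blocking wait on the default executor so the event loop
    # stays free to serve other requests in the meantime.
    return await loop.run_in_executor(None, state.wait, timeout)


async def demo():
    state = FakeInboxState()
    loop = asyncio.get_running_loop()
    # Deliver a message shortly after the wait has started.
    loop.call_later(0.05, state.deliver, {"text": "hello"})
    return await wait_for_message(state, "5")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the wait runs on a worker thread, the `call_later` callback still fires on the loop while the wait is pending, which is the property the comment in the handler relies on.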
@@ -1,141 +0,0 @@
"""Memory tool handlers — single-concern slice of the a2a_tools surface.
Extracted from ``a2a_tools.py`` (RFC #2873 iter 4c). Owns the two
agent-memory MCP tools:
* ``tool_commit_memory`` — write to the workspace's persistent memory.
* ``tool_recall_memory`` — search the workspace's persistent memory.
Both go through the platform's ``/workspaces/:id/memories`` endpoint;
the platform is the source of truth for namespace isolation + audit
trail. Local responsibility here is RBAC enforcement BEFORE hitting
the network so a denied operation surfaces a clear in-band error
instead of an opaque platform 403.
Imports the RBAC primitives from ``a2a_tools_rbac`` (iter 4a).
"""
from __future__ import annotations
import json
import httpx
from a2a_client import PLATFORM_URL, WORKSPACE_ID
from a2a_tools_rbac import (
auth_headers_for_heartbeat as _auth_headers_for_heartbeat,
check_memory_read_permission as _check_memory_read_permission,
check_memory_write_permission as _check_memory_write_permission,
is_root_workspace as _is_root_workspace,
)
from builtin_tools.security import _redact_secrets
async def tool_commit_memory(
content: str,
scope: str = "LOCAL",
source_workspace_id: str | None = None,
) -> str:
"""Save important information to persistent memory.
GLOBAL scope is writable only by root workspaces (tier == 0).
RBAC memory.write permission is required for all scope levels.
The source workspace_id is embedded in every record so the platform
can enforce cross-workspace isolation and audit trail.
``source_workspace_id`` selects which registered workspace this
memory belongs to when the agent is registered into multiple
workspaces (PR-1 / multi-workspace mode). When unset, falls back
to the module-level WORKSPACE_ID — single-workspace operators see
no behaviour change.
"""
if not content:
return "Error: content is required"
content = _redact_secrets(content)
scope = scope.upper()
if scope not in ("LOCAL", "TEAM", "GLOBAL"):
scope = "LOCAL"
# RBAC: require memory.write permission (mirrors builtin_tools/memory.py)
if not _check_memory_write_permission():
return (
"Error: RBAC — this workspace does not have the 'memory.write' "
"permission for this operation."
)
# Scope enforcement: only root workspaces (tier 0) can write GLOBAL memory.
# This prevents tenant workspaces from poisoning org-wide memory (GH#1610).
if scope == "GLOBAL" and not _is_root_workspace():
return (
"Error: RBAC — only root workspaces (tier 0) can write to GLOBAL scope. "
"Non-root workspaces may use LOCAL or TEAM scope."
)
src = source_workspace_id or WORKSPACE_ID
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{src}/memories",
json={
"content": content,
"scope": scope,
# Embed source workspace so the platform can namespace-isolate
# and audit cross-workspace writes (GH#1610 fix).
"workspace_id": src,
},
headers=_auth_headers_for_heartbeat(src),
)
data = resp.json()
if resp.status_code in (200, 201):
return json.dumps({"success": True, "id": data.get("id"), "scope": scope})
return f"Error: {data.get('error', resp.text)}"
except Exception as e:
return f"Error saving memory: {e}"
async def tool_recall_memory(
query: str = "",
scope: str = "",
source_workspace_id: str | None = None,
) -> str:
"""Search persistent memory for previously saved information.
RBAC memory.read permission is required (mirrors builtin_tools/memory.py).
The workspace_id is sent as a query parameter so the platform can
cross-validate it against the auth token and defend against any future
path traversal / cross-tenant read bugs in the platform itself.
``source_workspace_id`` selects which registered workspace's memories
to search when the agent is registered into multiple workspaces.
Unset → defaults to the module-level WORKSPACE_ID.
"""
# RBAC: require memory.read permission (mirrors builtin_tools/memory.py)
if not _check_memory_read_permission():
return (
"Error: RBAC — this workspace does not have the 'memory.read' "
"permission for this operation."
)
src = source_workspace_id or WORKSPACE_ID
params: dict[str, str] = {"workspace_id": src}
if query:
params["q"] = query
if scope:
params["scope"] = scope.upper()
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.get(
f"{PLATFORM_URL}/workspaces/{src}/memories",
params=params,
headers=_auth_headers_for_heartbeat(src),
)
data = resp.json()
if isinstance(data, list):
if not data:
return "No memories found."
lines = []
for m in data:
lines.append(f"[{m.get('scope', '?')}] {m.get('content', '')}")
return "\n".join(lines)
return json.dumps(data)
except Exception as e:
return f"Error recalling memory: {e}"
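The order of the local checks in `tool_commit_memory` (content, scope normalisation, `memory.write`, then the root-only GLOBAL gate) can be sketched as a pure function. This is an illustrative sketch only: `precheck_commit` and its `can_write` / `is_root` parameters are hypothetical, standing in for the RBAC helpers the handler imports, and the error strings are shortened.

```python
VALID_SCOPES = ("LOCAL", "TEAM", "GLOBAL")


def precheck_commit(content, scope, *, can_write, is_root):
    """Return (normalized_scope, error); exactly one of the two is None."""
    if not content:
        return None, "Error: content is required"
    scope = scope.upper()
    if scope not in VALID_SCOPES:
        # Unknown scopes silently degrade to LOCAL, as in the handler.
        scope = "LOCAL"
    if not can_write:
        return None, "Error: RBAC: 'memory.write' permission is required"
    if scope == "GLOBAL" and not is_root:
        return None, "Error: RBAC: only root workspaces (tier 0) can write GLOBAL"
    return scope, None


if __name__ == "__main__":
    print(precheck_commit("note", "team", can_write=True, is_root=False))
    print(precheck_commit("note", "global", can_write=True, is_root=False))
```

Running every gate before the HTTP call is what lets a denied write come back as a readable in-band error rather than a platform 403.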
@@ -1,382 +0,0 @@
"""Messaging tool handlers — single-concern slice of the a2a_tools surface.
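Extracted from ``a2a_tools.py`` (RFC #2873 iter 4d). Owns the five
human-and-peer messaging MCP tools + the chat-upload helper they share:
* ``tool_broadcast_message``: send an org-wide broadcast via the
platform's ``/broadcast`` endpoint.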
* ``tool_send_message_to_user`` — push a canvas-chat message via the
platform's ``/notify`` endpoint.
* ``tool_list_peers`` — discover peers across one or many registered
workspaces, with side-effect of populating ``_peer_to_source`` for
delegate-task auto-routing.
* ``tool_get_workspace_info`` — JSON-encode the workspace's own info.
* ``tool_chat_history`` — fetch prior conversation rows with a peer.
* ``_upload_chat_files`` — internal helper for the message-attachments
code path; routes local file paths through the platform's
``/chat/uploads`` so the canvas can render them as download chips.
Imports the auth-header primitive from ``a2a_tools_rbac`` (iter 4a).
"""
from __future__ import annotations
import json
import mimetypes
import os
import httpx
from a2a_client import (
PLATFORM_URL,
WORKSPACE_ID,
_peer_names,
_peer_to_source,
get_peers_with_diagnostic,
get_workspace_info,
)
from a2a_tools_rbac import auth_headers_for_heartbeat as _auth_headers_for_heartbeat
from platform_auth import list_registered_workspaces
async def _upload_chat_files(
client: httpx.AsyncClient,
paths: list[str],
workspace_id: str | None = None,
) -> tuple[list[dict], str | None]:
"""Upload local file paths through /workspaces/<self>/chat/uploads.
The platform stages each upload under /workspace/.molecule/chat-uploads
(an "allowed root" the canvas knows how to render via the Download
endpoint) and returns metadata the broadcast payload references.
Why we route through upload instead of just passing the agent's path:
the canvas's allowed-root list is /configs, /workspace, /home, /plugins
— files at /tmp or /root would be unreachable. Uploading copies the
bytes into an allowed root regardless of where the agent wrote them.
Returns (attachments, error). On any failure the caller should NOT
fire the notify — partial-attach would surface a half-rendered chip.
"""
if not paths:
return [], None
files_payload: list[tuple[str, tuple[str, bytes, str]]] = []
for p in paths:
if not isinstance(p, str) or not p:
return [], f"Error: invalid attachment path {p!r}"
if not os.path.isfile(p):
return [], f"Error: attachment not found: {p}"
try:
with open(p, "rb") as fh:
data = fh.read()
except OSError as e:
return [], f"Error reading {p}: {e}"
# Sniff mime from filename so the canvas can pick the right
# icon / preview / inline-image renderer. Pre-fix this was
# hardcoded application/octet-stream and chat_files.go's
# Upload trusts whatever Content-Type the multipart part
# carries — `mt := fh.Header.Get("Content-Type")` only falls
# back to extension-sniffing when the header is empty. So a
# hardcoded octet-stream meant every attachment lost its
# real type forever, breaking the canvas chip's icon logic.
mime_type, _ = mimetypes.guess_type(p)
if not mime_type:
mime_type = "application/octet-stream"
files_payload.append(("files", (os.path.basename(p), data, mime_type)))
target_workspace_id = (workspace_id or "").strip() or WORKSPACE_ID
try:
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{target_workspace_id}/chat/uploads",
files=files_payload,
headers=_auth_headers_for_heartbeat(target_workspace_id),
)
except Exception as e:
return [], f"Error uploading attachments: {e}"
if resp.status_code != 200:
return [], f"Error: chat/uploads returned {resp.status_code}: {resp.text[:200]}"
try:
body = resp.json()
except Exception as e:
return [], f"Error parsing upload response: {e}"
uploaded = body.get("files") or []
if not isinstance(uploaded, list) or len(uploaded) != len(paths):
return [], f"Error: upload returned {len(uploaded) if isinstance(uploaded, list) else 'invalid'} entries for {len(paths)} files"
return uploaded, None
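The MIME-sniffing step in `_upload_chat_files` can be shown on its own. A minimal sketch, assuming the same `(field, (filename, bytes, content_type))` tuple shape the helper builds; `build_file_part` is a hypothetical name and no upload is performed here.

```python
import mimetypes
import os


def build_file_part(path, data):
    """Build one multipart entry, guessing the MIME type from the filename."""
    mime_type, _ = mimetypes.guess_type(path)
    if not mime_type:
        # No recognisable extension: fall back to a generic binary type.
        mime_type = "application/octet-stream"
    return ("files", (os.path.basename(path), data, mime_type))


if __name__ == "__main__":
    print(build_file_part("/workspace/report.pdf", b"%PDF-"))
    print(build_file_part("/tmp/blob", b"\x00\x01"))
```

Only the basename is sent as the part's filename, so the agent's local directory layout does not leak into the upload.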
async def tool_broadcast_message(
message: str,
workspace_id: str | None = None,
) -> str:
"""Send a broadcast message to ALL agent workspaces in the org.
Requires the workspace to have broadcast_enabled=true (set by a user or
admin via PATCH /workspaces/:id/abilities). Use for urgent org-wide
signals — status changes, critical alerts, coordination instructions.
Every non-removed workspace receives the message in its activity log so
poll-mode agents pick it up, and push-mode canvases get a real-time
BROADCAST_MESSAGE WebSocket event.
Args:
message: The broadcast text. Keep it concise — all agents receive
this, so avoid lengthy prose that floods every context.
workspace_id: Optional. Which registered workspace to send the
broadcast from. Single-workspace agents omit this.
"""
if not message:
return "Error: message is required"
target_workspace_id = (workspace_id or "").strip() or WORKSPACE_ID
try:
async with httpx.AsyncClient(timeout=30.0) as client:
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{target_workspace_id}/broadcast",
json={"message": message},
headers=_auth_headers_for_heartbeat(target_workspace_id),
)
if resp.status_code == 200:
data = resp.json()
delivered = data.get("delivered", "?")
return f"Broadcast sent to {delivered} workspace(s)"
if resp.status_code == 403:
try:
hint = resp.json().get("hint", "")
except Exception:
hint = ""
return f"Error: broadcast ability not enabled.{(' ' + hint) if hint else ''}"
return f"Error: platform returned {resp.status_code}"
except Exception as e:
return f"Error sending broadcast: {e}"
async def tool_send_message_to_user(
message: str,
attachments: list[str] | None = None,
workspace_id: str | None = None,
) -> str:
"""Send a message directly to the user's canvas chat via WebSocket.
Args:
message: The text to display in the user's chat. Required even
when sending attachments — set to a short caption like
"Here's the build output:" or "Done — see attached."
attachments: Optional list of absolute file paths inside this
container. Each is uploaded to the platform and rendered
in the canvas as a clickable download chip. Use this
instead of pasting paths in the message text — paths
render as plain text and the user can't click them.
Examples:
attachments=["/tmp/build-output.zip"]
attachments=["/workspace/report.pdf", "/workspace/data.csv"]
workspace_id: Optional. When the agent is registered in MULTIPLE
workspaces (external multi-workspace MCP path), this
selects which workspace's chat to deliver the message to —
should match the ``arrival_workspace_id`` of the inbound
message you're replying to so the user sees the reply in
the same canvas they typed in. Single-workspace agents
omit this; the message routes to the only registered
workspace.
"""
if not message:
return "Error: message is required"
target_workspace_id = (workspace_id or "").strip() or WORKSPACE_ID
try:
async with httpx.AsyncClient(timeout=60.0) as client:
uploaded, upload_err = await _upload_chat_files(
client, attachments or [], workspace_id=target_workspace_id,
)
if upload_err:
return upload_err
payload: dict = {"message": message}
if uploaded:
payload["attachments"] = uploaded
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{target_workspace_id}/notify",
json=payload,
headers=_auth_headers_for_heartbeat(target_workspace_id),
)
if resp.status_code == 200:
if uploaded:
return f"Message sent to user with {len(uploaded)} attachment(s)"
return "Message sent to user"
if resp.status_code == 403:
try:
body = resp.json()
if body.get("error") == "talk_to_user_disabled":
hint = body.get("hint", "")
return (
"Error: this workspace is not allowed to send messages "
"directly to the user (talk_to_user is disabled). "
+ (hint + " " if hint else "")
+ "Use delegate_task to forward your update to a parent "
"or supervisor workspace that can reach the user."
)
except Exception:
pass
return f"Error: platform returned {resp.status_code}"
except Exception as e:
return f"Error sending message: {e}"
async def tool_list_peers(source_workspace_id: str | None = None) -> str:
"""List all workspaces this agent can communicate with.
Behavior:
- ``source_workspace_id`` set → list peers of that one workspace.
- Unset, single-workspace mode → list peers of WORKSPACE_ID
(the legacy path, unchanged).
- Unset, multi-workspace mode (MOLECULE_WORKSPACES populated) →
aggregate across every registered workspace, prefixing each
peer with its source so the agent / user can see the full peer
surface in one call.
Side-effect: populates ``_peer_to_source`` so subsequent
``tool_delegate_task(target)`` auto-routes through the correct
sending workspace without the agent needing ``source_workspace_id``.
"""
sources: list[str]
aggregate = False
if source_workspace_id:
sources = [source_workspace_id]
else:
registered = list_registered_workspaces()
if len(registered) > 1:
sources = registered
aggregate = True
else:
sources = [WORKSPACE_ID]
all_peers: list[tuple[str, dict]] = [] # (source, peer_record)
diagnostics: list[tuple[str, str]] = [] # (source, diagnostic)
for src in sources:
peers, diagnostic = await get_peers_with_diagnostic(source_workspace_id=src)
if peers:
for p in peers:
all_peers.append((src, p))
elif diagnostic is not None:
diagnostics.append((src, diagnostic))
if not all_peers:
if diagnostics:
joined = "; ".join(f"[{src[:8]}] {d}" for src, d in diagnostics)
return f"No peers found. {joined}"
return (
"You have no peers in the platform registry. "
"(No parent, no children, no siblings registered.)"
)
lines = []
for src, p in all_peers:
status = p.get("status", "unknown")
role = p.get("role", "")
peer_id = p["id"]
# Cache name for use in delegate_task
_peer_names[peer_id] = p["name"]
# Cache the source workspace so tool_delegate_task auto-routes
_peer_to_source[peer_id] = src
if aggregate:
lines.append(
f"- {p['name']} (ID: {peer_id}, status: {status}, role: {role}, via: {src[:8]})"
)
else:
lines.append(f"- {p['name']} (ID: {peer_id}, status: {status}, role: {role})")
return "\n".join(lines)
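The three-way source selection described in the `tool_list_peers` docstring can be sketched as a small pure function. `select_sources` is a hypothetical name; the `registered` argument stands in for the result of `list_registered_workspaces()`, and the workspace IDs in the demo are made up.

```python
def select_sources(source_workspace_id, registered, default_workspace_id):
    """Return (sources, aggregate) following the docstring's three cases."""
    if source_workspace_id:
        # Explicit source: list peers of that one workspace only.
        return [source_workspace_id], False
    if len(registered) > 1:
        # Multi-workspace mode: aggregate across every registration.
        return list(registered), True
    # Single-workspace (legacy) path.
    return [default_workspace_id], False


if __name__ == "__main__":
    print(select_sources(None, ["ws-a", "ws-b"], "ws-a"))
    print(select_sources("ws-b", ["ws-a", "ws-b"], "ws-a"))
    print(select_sources(None, ["ws-a"], "ws-a"))
```

Note that a single registered workspace still takes the legacy path, so the `via:` suffix only appears in output when more than one workspace is registered.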
async def tool_get_workspace_info(source_workspace_id: str | None = None) -> str:
"""Get this workspace's own info.
``source_workspace_id`` selects which registered workspace to
introspect when the agent is registered into multiple workspaces.
Unset → falls back to module-level WORKSPACE_ID.
"""
info = await get_workspace_info(source_workspace_id=source_workspace_id)
return json.dumps(info, indent=2)
async def tool_chat_history(
peer_id: str,
limit: int = 20,
before_ts: str = "",
source_workspace_id: str | None = None,
) -> str:
"""Fetch the prior conversation with one peer.
Hits ``/workspaces/<self>/activity?peer_id=<peer>&limit=<N>``
against the workspace-server, which returns activity rows where
the peer is either the sender (``source_id=peer`` — they sent us
the message) or the recipient (``target_id=peer`` — we sent to
them) of an A2A turn — both sides of the conversation in
chronological order.
Args:
peer_id: The other workspace's UUID. Same value the agent
sees as ``peer_id`` on a peer_agent push or ``workspace_id``
on a delegate_task call.
limit: Maximum rows to return; capped server-side at 500. The
default of 20 covers "most recent context for this peer"
without flooding the agent's context window.
before_ts: Optional RFC3339 timestamp; only rows strictly
older are returned. Used to page backward through long
histories: pass the oldest ``created_at`` from the previous
response. Empty (default) returns the most recent ``limit``
rows.
source_workspace_id: Which registered workspace's activity log
to query. Auto-routes via ``_peer_to_source`` cache when
unset (the workspace this peer was discovered through);
falls back to module-level WORKSPACE_ID for single-workspace
operators.
Returns a JSON-encoded list of activity rows (or an error string
starting with ``Error:`` so the agent can branch). Each row carries
``activity_type``, ``source_id``, ``target_id``, ``method``,
``summary``, ``request_body``, ``response_body``, ``status``,
``created_at`` — same shape ``inbox_peek`` and the canvas chat
loader already see.
"""
if not peer_id or not isinstance(peer_id, str):
return "Error: peer_id is required"
if not isinstance(limit, int) or limit <= 0:
limit = 20
if limit > 500:
limit = 500
src = source_workspace_id or _peer_to_source.get(peer_id) or WORKSPACE_ID
params: dict[str, str] = {
"peer_id": peer_id,
"limit": str(limit),
}
# Forward verbatim — the server route validates as RFC3339 at the
# trust boundary and translates into a `created_at < $X` clause.
if before_ts:
params["before_ts"] = before_ts
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.get(
f"{PLATFORM_URL}/workspaces/{src}/activity",
params=params,
headers=_auth_headers_for_heartbeat(src),
)
except Exception as exc: # noqa: BLE001
return f"Error: chat_history request failed: {exc}"
if resp.status_code == 400:
# Trust-boundary rejection (malformed peer_id, etc.) — surface
# the server's reason verbatim so the agent can correct itself.
try:
err = resp.json().get("error", "bad request")
except Exception: # noqa: BLE001
err = "bad request"
return f"Error: {err}"
if resp.status_code >= 400:
return f"Error: chat_history returned HTTP {resp.status_code}"
try:
rows = resp.json()
except Exception: # noqa: BLE001
return "Error: chat_history response was not JSON"
if not isinstance(rows, list):
return "Error: chat_history response was not a list"
# Server returns DESC (most recent first); reverse to chronological
# so the agent reads the conversation top-down like a chat log.
rows.reverse()
return json.dumps(rows)
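The limit clamping and DESC-to-chronological reversal in `tool_chat_history` can be sketched without the HTTP call. This is a sketch under stated assumptions: the helper names are hypothetical, the sample rows are invented, and using the oldest row's `created_at` as the next `before_ts` cursor is an assumption based on the docstring's row shape.

```python
def normalize_limit(limit, default=20, cap=500):
    """Non-int or non-positive limits fall back to the default; cap at 500."""
    if not isinstance(limit, int) or limit <= 0:
        return default
    return min(limit, cap)


def to_chronological(rows_desc):
    """The server returns newest-first; flip to oldest-first for reading."""
    return list(reversed(rows_desc))


def next_page_cursor(rows_chronological):
    """Oldest timestamp on this page, usable as before_ts for the next one."""
    return rows_chronological[0]["created_at"] if rows_chronological else ""


if __name__ == "__main__":
    page = [
        {"summary": "reply", "created_at": "2024-01-01T10:05:00Z"},
        {"summary": "question", "created_at": "2024-01-01T10:00:00Z"},
    ]
    ordered = to_chronological(page)
    print([r["summary"] for r in ordered], next_page_cursor(ordered))
```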
@@ -1,138 +0,0 @@
"""RBAC + auth-header helpers shared by all a2a_tools tool handlers.
Extracted from ``a2a_tools.py`` (RFC #2873 iter 4a). Centralises the
"what can this workspace do" + "how do I prove it on a platform call"
concerns into a single module so:
* Future tools added under ``a2a_tools/`` see one obvious helper to
call instead of re-implementing the role/tier check.
* The role-permission table is in ONE place — adding a new role
or capability touches one file, not every tool that gates on it.
* Tests targeting these helpers don't have to import the whole
991-LOC ``a2a_tools`` surface.
Public surface:
* ``ROLE_PERMISSIONS`` — canonical role → action set table.
* ``get_workspace_tier()`` — config-resolved tier (0 = root).
* ``check_memory_write_permission()`` — boolean.
* ``check_memory_read_permission()`` — boolean.
* ``is_root_workspace()`` — boolean (tier == 0).
* ``auth_headers_for_heartbeat(workspace_id=None)`` — auth-header dict
with the multi-workspace registry lookup; tolerates ``platform_auth``
missing on older installs (returns ``{}``).
Underscore-prefixed back-compat aliases (``_ROLE_PERMISSIONS``,
``_check_memory_write_permission``, etc.) match the names previously
exposed in ``a2a_tools`` so existing tests'
``patch("a2a_tools._foo", ...)`` continue to work via the re-exports
in ``a2a_tools.py``.
"""
from __future__ import annotations
import os
# Mirror ``builtin_tools/audit.py`` for a2a_tools isolation. Listed as a
# module-level constant rather than computed lazily so the table is
# discoverable in static analysis + ``grep``.
ROLE_PERMISSIONS: dict[str, set[str]] = {
"admin": {"delegate", "approve", "memory.read", "memory.write"},
"operator": {"delegate", "approve", "memory.read", "memory.write"},
"read-only": {"memory.read"},
"no-delegation": {"approve", "memory.read", "memory.write"},
"no-approval": {"delegate", "memory.read", "memory.write"},
"memory-readonly": {"memory.read"},
}
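def get_workspace_tier() -> int:
    """Return the workspace tier from config (0 = root, 1+ = tenant)."""
    try:
        from config import load_config
        cfg = load_config()
        return getattr(cfg, "tier", 1)
    except Exception:
        # Config unavailable: fall back to the WORKSPACE_TIER env var. A
        # missing or non-integer value is treated as tier 1 (tenant) so a
        # malformed value cannot raise out of this helper.
        try:
            return int(os.environ.get("WORKSPACE_TIER", "1"))
        except ValueError:
            return 1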
def _resolve_role_state() -> tuple[list[str], dict]:
    """Return (roles, allowed_actions) from config.

    If config is unavailable (or has no ``rbac`` block), fall back to a
    single "operator" role with no per-role overrides. Per
    ``ROLE_PERMISSIONS``, operator carries delegate, approve, memory.read
    and memory.write, so this fallback is permissive rather than
    fail-closed. GLOBAL-scope writes are gated separately through
    ``is_root_workspace()``.
    """
    try:
        from config import load_config
        cfg = load_config()
        rbac = getattr(cfg, "rbac", None)
        roles = list(getattr(rbac, "roles", None) or ["operator"])
        allowed = dict(getattr(rbac, "allowed_actions", None) or {})
        return roles, allowed
    except Exception:
        return ["operator"], {}
def check_memory_write_permission() -> bool:
"""Return True if this workspace's RBAC roles grant memory.write."""
roles, allowed = _resolve_role_state()
for role in roles:
if role == "admin":
return True
if role in allowed:
if "memory.write" in allowed[role]:
return True
elif role in ROLE_PERMISSIONS and "memory.write" in ROLE_PERMISSIONS[role]:
return True
return False
def check_memory_read_permission() -> bool:
"""Return True if this workspace's RBAC roles grant memory.read."""
roles, allowed = _resolve_role_state()
for role in roles:
if role == "admin":
return True
if role in allowed:
if "memory.read" in allowed[role]:
return True
elif role in ROLE_PERMISSIONS and "memory.read" in ROLE_PERMISSIONS[role]:
return True
return False
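The two permission checks above share one resolution rule, which can be sketched as a single generic helper. `has_permission` is a hypothetical name and the table below is a trimmed copy for illustration. The point it shows is that a per-role entry in `allowed` replaces the table entry for that role rather than extending it.

```python
# Trimmed copy of the role table, for illustration only.
ROLE_PERMISSIONS = {
    "operator": {"delegate", "approve", "memory.read", "memory.write"},
    "read-only": {"memory.read"},
    "no-approval": {"delegate", "memory.read", "memory.write"},
}


def has_permission(action, roles, allowed):
    """True if any role grants the action; admin always passes."""
    for role in roles:
        if role == "admin":
            return True
        if role in allowed:
            # A per-role override replaces the table entry entirely.
            if action in allowed[role]:
                return True
        elif action in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False


if __name__ == "__main__":
    print(has_permission("memory.write", ["operator"], {}))
    print(has_permission("memory.write", ["operator"], {"operator": ["memory.read"]}))
```

With the override in place, an operator loses `memory.write` even though the table would grant it, which is worth keeping in mind when narrowing a role through config.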
def is_root_workspace() -> bool:
"""Return True if this workspace is tier 0 (root/root-org)."""
return get_workspace_tier() == 0
def auth_headers_for_heartbeat(workspace_id: str | None = None) -> dict[str, str]:
"""Return Phase 30.1 auth headers; tolerate platform_auth being absent
in older installs (e.g. during rolling upgrade).
``workspace_id`` selects the per-workspace token from the multi-
workspace registry when set (PR-1: external agent registered in
multiple workspaces). With no arg the legacy single-token path is
unchanged.
"""
try:
from platform_auth import auth_headers
return auth_headers(workspace_id) if workspace_id else auth_headers()
except Exception:
return {}
# ============== Back-compat aliases for the previous a2a_tools names ==============
# Tests + downstream call sites refer to the pre-extract names; aliasing
# keeps both forms valid. The new public names (no underscore prefix)
# are preferred for new code.
_ROLE_PERMISSIONS = ROLE_PERMISSIONS
_get_workspace_tier = get_workspace_tier
_check_memory_write_permission = check_memory_write_permission
_check_memory_read_permission = check_memory_read_permission
_is_root_workspace = is_root_workspace
_auth_headers_for_heartbeat = auth_headers_for_heartbeat
@@ -1,597 +0,0 @@
"""Base adapter interface for agent infrastructure providers."""
import logging
import os
from abc import ABC, abstractmethod
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Any
# ---------------------------------------------------------------------------
# Provider routing — type alias + resolver used by individual adapters.
# Each adapter defines its own ProviderRegistry with the providers it accepts.
# ---------------------------------------------------------------------------
# Maps prefix → (ordered_auth_env_vars, default_base_url).
ProviderRegistry = dict[str, tuple[tuple[str, ...], str]]
def resolve_provider_routing(
model_str: str,
env: Mapping[str, str],
*,
registry: ProviderRegistry,
runtime_config: dict[str, Any] | None = None,
) -> tuple[str, str, str]:
"""Resolve a ``provider:model`` string to ``(api_key, base_url, bare_model_id)``.
URL precedence (highest to lowest):
1. ``<PREFIX>_BASE_URL`` env var
2. ``runtime_config["provider_url"]``
3. registry default for the prefix
Unknown prefixes fall back to OPENAI_API_KEY + api.openai.com.
Raises RuntimeError when no API key env var is set for the prefix.
"""
if ":" in model_str:
prefix, model_id = model_str.split(":", 1)
else:
prefix, model_id = "openai", model_str
env_vars, default_url = registry.get(
prefix, (("OPENAI_API_KEY",), "https://api.openai.com/v1")
)
api_key = next((env[v] for v in env_vars if env.get(v)), "")
if not api_key:
raise RuntimeError(
f"No API key found for provider {prefix!r} "
f"(checked: {', '.join(env_vars)}). Set one in workspace secrets."
)
env_url = env.get(f"{prefix.upper()}_BASE_URL", "")
config_url = (runtime_config or {}).get("provider_url", "")
base_url = env_url or config_url or default_url
return api_key, base_url, model_id
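The URL precedence and key lookup in `resolve_provider_routing` can be exercised with a condensed copy of the function. This is a sketch: `resolve` mirrors the logic above in shortened form, and the `acme` registry entry, its env var names, and the example URLs are all hypothetical.

```python
# Hypothetical registry entry, for illustration only.
EXAMPLE_REGISTRY = {
    "acme": (("ACME_API_KEY", "ACME_TOKEN"), "https://api.acme.example/v1"),
}


def resolve(model_str, env, registry, runtime_config=None):
    """Condensed copy of the resolver: returns (api_key, base_url, model_id)."""
    if ":" in model_str:
        prefix, model_id = model_str.split(":", 1)
    else:
        prefix, model_id = "openai", model_str
    env_vars, default_url = registry.get(
        prefix, (("OPENAI_API_KEY",), "https://api.openai.com/v1")
    )
    # First non-empty env var in registry order wins.
    api_key = next((env[v] for v in env_vars if env.get(v)), "")
    if not api_key:
        raise RuntimeError(f"No API key found for provider {prefix!r}")
    # Precedence: <PREFIX>_BASE_URL env, then runtime_config, then default.
    base_url = (
        env.get(f"{prefix.upper()}_BASE_URL", "")
        or (runtime_config or {}).get("provider_url", "")
        or default_url
    )
    return api_key, base_url, model_id


if __name__ == "__main__":
    env = {"ACME_TOKEN": "t0k", "ACME_BASE_URL": "https://eu.acme.example/v1"}
    cfg = {"provider_url": "https://proxy.example/v1"}
    print(resolve("acme:vendor/big-model", env, EXAMPLE_REGISTRY, cfg))
```

Splitting on the first colon only means model IDs that themselves contain slashes or colons pass through untouched.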
from a2a.server.agent_execution import AgentExecutor
from event_log import DisabledEventLog, EventLogBackend
logger = logging.getLogger(__name__)
# Shared no-op default for adapter.event_log. Safe to share across
# adapters because every DisabledEventLog method is a pure no-op with
# no per-instance state.
_DISABLED_EVENT_LOG: EventLogBackend = DisabledEventLog()
@dataclass
class SetupResult:
"""Result from the shared _common_setup() pipeline."""
system_prompt: str
loaded_skills: list # LoadedSkill instances
langchain_tools: list # LangChain BaseTool instances
is_coordinator: bool
children: list # child workspace dicts
@dataclass
class AdapterConfig:
"""Standardized config passed to every adapter."""
model: str # e.g. "anthropic:claude-sonnet-4-6" or "openrouter:google/gemini-2.5-flash"
system_prompt: str | None = None # Assembled system prompt text
tools: list[str] = field(default_factory=list) # Tool names from config.yaml
runtime_config: dict[str, Any] = field(default_factory=dict) # Raw runtime_config block
config_path: str = "/configs" # Path to configs directory
workspace_id: str = "" # Workspace identifier
prompt_files: list[str] = field(default_factory=list) # Ordered prompt file names
a2a_port: int = 8000 # Port for A2A server
heartbeat: Any = None # HeartbeatLoop instance
@dataclass(frozen=True)
class RuntimeCapabilities:
"""Adapter-declared ownership of cross-cutting platform capabilities.
The platform provides FALLBACK implementations of heartbeat, cron,
durable session, etc. When a runtime SDK provides one of these
natively (e.g. claude-code's streaming session model, hermes-agent's
sidecar lifecycle), the adapter sets the corresponding flag to True.
The platform reads these flags and skips its fallback for that
capability — the adapter is responsible instead.
Observability is NEVER skipped: A2A protocol, activity_logs, and the
broadcaster always run regardless of who owns the capability. These
flags only switch WHO IMPLEMENTS the behavior, not whether the
platform sees it.
All defaults are False so introducing this dataclass is a no-op:
every existing adapter inherits BaseAdapter.capabilities() which
returns RuntimeCapabilities() with everything off, matching today's
"platform does it all" behavior. Each capability gets a platform-
side consumer in a follow-up PR; this class is the foundation.
See project memory `project_runtime_native_pluggable.md` for the
architecture principle these flags encode.
"""
# Heartbeat — adapter sends its own keep-alive signal to the platform's
# broadcaster instead of relying on workspace/heartbeat.py's 30s loop.
# Set True when the SDK already maintains a long-lived session that
# produces natural progress events (e.g. claude-code streaming).
provides_native_heartbeat: bool = False
# Cron / schedule — adapter handles scheduled triggers internally
# (Temporal workflows, Durable Functions, sidecar daemons). Platform
# scheduler skips polling workspace_schedules for this workspace,
# avoiding double-fire on restart.
provides_native_scheduler: bool = False
# Durable session — adapter persists in-flight session state across
# restarts and exposes it via pre_stop_state/restore_state. When True,
# the platform's a2a_queue does not need to enqueue mid-session
# requests; the adapter handles QUEUED-state on its own.
provides_native_session: bool = False
# Status lifecycle — adapter reports its own ready/degraded/failed
# state (e.g. via heartbeat metadata). Platform respects the adapter
# report instead of inferring status from heartbeat error rate.
provides_native_status_mgmt: bool = False
# Retry — adapter handles transient errors (rate limits, 5xx) with
# its own backoff. Platform stops re-dispatching A2A requests that
# the adapter explicitly marked as "retrying internally".
provides_native_retry: bool = False
# Activity log decoration — adapter contributes runtime-specific
# fields (model, token_count, latency breakdown) into activity_log
# rows alongside the platform-defined columns.
provides_activity_decoration: bool = False
# Channel dispatch — adapter sends to external channels (Slack,
# Lark, etc.) directly instead of routing through platform channels
# manager. Used when the SDK has built-in channel integrations.
provides_channel_dispatch: bool = False
def to_dict(self) -> dict[str, bool]:
"""Serializable shape for the heartbeat payload + /capabilities
endpoint. Plain dict avoids leaking dataclass internals to Go."""
return {
"heartbeat": self.provides_native_heartbeat,
"scheduler": self.provides_native_scheduler,
"session": self.provides_native_session,
"status_mgmt": self.provides_native_status_mgmt,
"retry": self.provides_native_retry,
"activity_decoration": self.provides_activity_decoration,
"channel_dispatch": self.provides_channel_dispatch,
}
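How a platform-side consumer might read these flags can be sketched with a two-flag miniature. Everything here is hypothetical: `Caps` is a cut-down stand-in for `RuntimeCapabilities`, and `platform_fallbacks` is an invented helper, since the real consumers are described as landing in follow-up PRs.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Caps:
    """Two-flag miniature of the capabilities dataclass."""

    provides_native_heartbeat: bool = False
    provides_native_scheduler: bool = False

    def to_dict(self):
        return {
            "heartbeat": self.provides_native_heartbeat,
            "scheduler": self.provides_native_scheduler,
        }


def platform_fallbacks(caps):
    """Names of the capabilities the platform should still run itself."""
    return [name for name, native in caps.to_dict().items() if not native]


if __name__ == "__main__":
    print(platform_fallbacks(Caps()))
    print(platform_fallbacks(Caps(provides_native_heartbeat=True)))
    try:
        Caps().provides_native_scheduler = True
    except FrozenInstanceError:
        print("flags are immutable after construction")
```

Freezing the dataclass means an adapter has to declare ownership up front by constructing a new instance, so the flags cannot drift mid-session.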
class BaseAdapter(ABC):
"""Interface every agent infrastructure adapter must implement.
To add a new agent infra:
1. Create a standalone template repo (molecule-ai-workspace-template-<infra>)
2. Implement adapter.py with a class extending BaseAdapter
3. Add requirements.txt with your infra's dependencies + molecule-runtime
4. Set ADAPTER_MODULE in the Dockerfile to your adapter module path
Cross-cutting capabilities your adapter can opt into:
- capabilities() — declare native ownership of heartbeat, scheduler,
session, status mgmt, etc. (see RuntimeCapabilities above)
- idle_timeout_override() — extend the platform's per-dispatch
silence window for SDKs with long synth turns
- runtime_wedge.mark_wedged() / clear_wedge() — flip the workspace
to `degraded` + auto-recover when your SDK hits a non-recoverable
error class. Import directly from `runtime_wedge`; the heartbeat
forwards the state to the platform automatically. See the
runtime_wedge module docstring for the integration recipe.
"""
@staticmethod
@abstractmethod
def name() -> str: # pragma: no cover
"""Return the runtime identifier (e.g. 'langgraph', 'crewai').
This must match the 'runtime' field in config.yaml."""
...
@staticmethod
@abstractmethod
def display_name() -> str: # pragma: no cover
"""Human-readable name for UI display."""
...
@staticmethod
@abstractmethod
def description() -> str: # pragma: no cover
"""Short description of what this adapter provides."""
...
@staticmethod
def get_config_schema() -> dict:
"""Return JSON Schema for runtime_config fields this adapter supports.
Used by the Config tab UI to render the right form fields.
Override in subclasses for adapter-specific settings."""
return {}
def capabilities(self) -> "RuntimeCapabilities":
"""Declare which cross-cutting capabilities this adapter owns
natively vs delegates to platform fallback.
Default returns RuntimeCapabilities() — every flag False, meaning
the platform owns everything (today's behavior). Adapters override
to declare native ownership; e.g. claude-code's adapter returns
RuntimeCapabilities(provides_native_heartbeat=True,
provides_native_session=True).
Subsequent platform-side consumers (idle-timeout override,
scheduler skip, etc.) read this and route accordingly. See
project memory `project_runtime_native_pluggable.md`."""
return RuntimeCapabilities()
def idle_timeout_override(self) -> int | None:
"""Per-A2A-dispatch silence window override, in SECONDS.
Return None to use the platform default (env var
A2A_IDLE_TIMEOUT_SECONDS, falling back to 5 minutes — see
a2a_proxy.go:defaultIdleTimeoutDuration). Override when this
runtime's SDK can legitimately go silent longer than the
default before the dispatch should be considered wedged.
Why this is per-adapter, not just env: the env value is a
cluster-wide knob set by ops. Different SDKs have different
latency profiles — claude-code synthesis on Opus + tool use
legitimately runs 8-10 min between broadcasts; hermes synth
with custom providers can be even slower. Hardcoding 5min for
everyone either cancels real work (claude-code synth) or
leaves wedged runtimes (langgraph) hanging too long.
Platform reads this from the heartbeat payload and stashes
it per-workspace; dispatchA2A consults it before applying the
idle timer. None / unset / zero falls through to the global
default — same behavior as before this hook landed."""
return None
@property
def event_log(self) -> EventLogBackend:
"""Pluggable in-process event-log backend.
Adapters MAY call ``self.event_log.append(kind=..., payload=...)``
to record runtime-internal events (tool dispatch, skill load,
executor errors, peer-handoff). Readers query the buffer via
the platform's ``/workspaces/:id/activity`` endpoint with a
cursor — see ``event_log.py`` for the protocol.
Default: shared ``DisabledEventLog`` no-op, so adapters that
never set this still link cleanly. ``main.py`` overrides at boot
from the ``observability.event_log`` config block."""
return getattr(self, "_event_log", None) or _DISABLED_EVENT_LOG
@event_log.setter
def event_log(self, backend: EventLogBackend) -> None:
self._event_log = backend
# ------------------------------------------------------------------
# Plugin install hooks
# ------------------------------------------------------------------
# New pipeline: each plugin ships per-runtime adaptors resolved via
# `plugins_registry.resolve()`. Adapters expose hooks below that
# adaptors call to wire plugin content into the runtime.
#
# Default implementations are filesystem-only (write to /configs,
# append to CLAUDE.md). Runtimes with a dynamic tool registry
# (e.g. DeepAgents sub-agents) override the hooks to also register
# in-process state.
def memory_filename(self) -> str:
"""File under /configs that the runtime treats as long-lived memory.
Both Claude Code and DeepAgents read CLAUDE.md natively, so this is
the sensible default. Override only if a runtime expects a different
filename.
"""
return "CLAUDE.md"
def register_tool_hook(self, name: str, fn) -> None:
"""Default no-op. Override on runtimes with a dynamic tool registry.
Runtimes that pick tools up at startup via filesystem scan (Claude
Code reads /configs/skills, LangGraph globs **/*.py) don't need to
do anything here — the adaptor's file-write step is enough.
"""
return None
async def transcript_lines(self, since: int = 0, limit: int = 100) -> dict:
"""Return live transcript entries for the most-recent agent session.
Default implementation returns ``supported: False`` for runtimes
that don't expose a per-session log on disk. Override in subclasses
that DO (Claude Code reads ``~/.claude/projects/<cwd>/<session>.jsonl``).
This is the "look over the agent's shoulder" feature — lets canvas /
operators see live tool calls + AI thinking instead of waiting for
the high-level activity log to flush.
Args:
since: line offset to skip — caller's last cursor (0 = from start)
limit: max lines to return (caller-side cap, default 100, max 1000)
Returns:
``{runtime, supported, lines, cursor, more, source}`` where
``cursor`` is the new offset to pass on the next poll, ``more``
is True if additional lines remain past ``limit``, and ``source``
is the file path lines were read from (useful for debugging).
"""
return {
"runtime": self.name(),
"supported": False,
"lines": [],
"cursor": since,
"more": False,
"source": None,
}
def pre_stop_state(self) -> dict:
"""Capture in-memory state for pause/resume serialization.
Called by main.py's shutdown handler just before the container exits.
Returns a dict that will be scrubbed (via lib.snapshot_scrub) and
written to /configs/.agent_snapshot.json.
Default implementation:
1. Attempts to read ``self._executor._session_id`` (set by
create_executor) and includes it as ``session_id``.
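2. Includes up to 200 recent transcript lines via transcript_lines(),
but only when the adapter overrides transcript_lines() as a
synchronous method. The async base implementation is skipped.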
Override in adapters that hold additional in-memory state that
should survive a container stop.
Returns:
A JSON-serializable dict. All string values are scrubbed before
persisting, so it is safe to include raw content from the
agent's context.
"""
from lib.pre_stop import MAX_TRANSCRIPT_LINES
state: dict = {}
# Session handle — critical for resuming the Claude Code session.
executor = getattr(self, "_executor", None)
if executor is not None:
session_id = getattr(executor, "_session_id", None)
if session_id:
state["session_id"] = session_id
# Recent conversation log — captures where the agent left off.
# transcript_lines() may be async; call it synchronously if possible,
# otherwise let async adapters override pre_stop_state entirely.
try:
import inspect as _inspect
transcript_fn = self.transcript_lines
if _inspect.iscoroutinefunction(transcript_fn):
# Async adapter — override pre_stop_state() for transcript access.
# The base impl still captures session_id above.
pass
else:
transcript = transcript_fn(since=0, limit=MAX_TRANSCRIPT_LINES)
if transcript.get("supported"):
state["transcript_lines"] = transcript.get("lines", [])
except Exception:
# Best-effort: never let transcript capture failure block serialization.
pass
return state
def restore_state(self, snapshot: dict) -> None:
"""Restore in-memory state from a pause/resume snapshot.
Called by main.py on first boot when /configs/.agent_snapshot.json
exists. Gives the adapter a chance to restore session handles,
conversation context, or any other in-memory state before the A2A
server starts accepting requests.
Default implementation stores ``snapshot["session_id"]`` and
``snapshot["transcript_lines"]`` as ``self._snapshot_session_id``
and ``self._snapshot_transcript`` so that ``create_executor()`` or
the executor itself can pick them up.
Args:
snapshot: The scrubbed snapshot dict previously written by
pre_stop_state(). All secrets have already been redacted.
"""
self._snapshot_session_id: str | None = snapshot.get("session_id")
self._snapshot_transcript: list | None = snapshot.get("transcript_lines")
def register_subagent_hook(self, name: str, spec: dict) -> None:
"""Default no-op. DeepAgents overrides to register a sub-agent."""
return None
def append_to_memory_hook(self, config: AdapterConfig, filename: str, content: str) -> None:
"""Append text to /configs/<filename> if the marker isn't already present.
Idempotent: looks for the first line of `content` as a marker so a
re-install doesn't duplicate the block. Adaptors should pass content
beginning with a unique header (e.g. ``# Plugin: molecule-dev-conventions``).
"""
import os
target = os.path.join(config.config_path, filename)
marker = content.splitlines()[0].strip() if content else ""
existing = ""
if os.path.exists(target):
with open(target) as f:
existing = f.read()
if marker and marker in existing:
logger.info("append_to_memory: %s already contains %r — skipping", filename, marker)
return
os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
with open(target, "a") as f:
if existing and not existing.endswith("\n"):
f.write("\n")
f.write(content if content.endswith("\n") else content + "\n")
logger.info("append_to_memory: appended %d chars to %s", len(content), filename)
async def install_plugins_via_registry(
self,
config: AdapterConfig,
plugins,
) -> list:
"""Drive the new per-runtime adaptor pipeline for every loaded plugin.
For each plugin in `plugins.plugins`, resolve the adaptor for this
runtime (via :func:`plugins_registry.resolve`) and invoke
``install(ctx)``. Returns the list of :class:`InstallResult` so
callers can surface warnings (e.g. raw-drop fallback hits).
Adapters whose runtime supports the new pipeline call this from
``setup()`` instead of the legacy ``inject_plugins()``.
"""
from pathlib import Path
from plugins_registry import InstallContext, resolve
results = []
runtime = self.name().replace("-", "_") # e.g. "claude-code" -> "claude_code"
for plugin in plugins.plugins:
adaptor, source = resolve(plugin.name, runtime, Path(plugin.path))
ctx = InstallContext(
configs_dir=Path(config.config_path),
workspace_id=config.workspace_id,
runtime=runtime,
plugin_root=Path(plugin.path),
memory_filename=self.memory_filename(),
register_tool=self.register_tool_hook,
register_subagent=self.register_subagent_hook,
append_to_memory=lambda fn, c, _cfg=config: self.append_to_memory_hook(_cfg, fn, c),
)
try:
result = await adaptor.install(ctx)
results.append(result)
logger.info(
"Plugin %s installed via %s adaptor (warnings: %d)",
plugin.name, source, len(result.warnings),
)
except Exception as exc:
logger.exception("Plugin %s install via %s failed: %s", plugin.name, source, exc)
return results
async def inject_plugins(self, config: AdapterConfig, plugins) -> None:
"""Legacy hook — kept for backwards compatibility during migration.
Default: drive the new per-runtime adaptor pipeline. Adapters not yet
migrated may still override this with their own logic.
"""
await self.install_plugins_via_registry(config, plugins)
async def _common_setup(self, config: AdapterConfig) -> SetupResult:
"""Shared setup pipeline — loads plugins, skills, tools, coordinator, and builds system prompt.
All adapters can call this to get the full platform feature set.
Returns a SetupResult with LangChain BaseTool instances that adapters
convert to their native format if needed.
"""
from plugins import load_plugins
from skill_loader.loader import load_skills
from coordinator import get_children, build_children_description
from prompt import build_system_prompt, get_peer_capabilities, get_platform_instructions
from builtin_tools.approval import request_approval
from builtin_tools.delegation import delegate_task, delegate_task_async, check_task_status
from builtin_tools.memory import commit_memory, recall_memory
from builtin_tools.sandbox import run_code
platform_url = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
# Load plugins from per-workspace dir first, then shared fallback
workspace_plugins_dir = os.path.join(config.config_path, "plugins")
plugins = load_plugins(
workspace_plugins_dir=workspace_plugins_dir,
shared_plugins_dir=os.environ.get("PLUGINS_DIR", "/plugins"),
)
await self.inject_plugins(config, plugins)
if plugins.plugin_names:
logger.info("Plugins: %s", ", ".join(plugins.plugin_names))
# Load skills (workspace + plugin skills, deduped). Pass the runtime
# name so SKILL.md frontmatter `runtime: [...]` can opt skills out
# of incompatible adapters (hermes won't load claude-code-only
# skills, etc.).
runtime_name = type(self).name()
loaded_skills = load_skills(config.config_path, config.tools, current_runtime=runtime_name)
seen_skill_ids = {s.metadata.id for s in loaded_skills}
for plugin_skills_dir in plugins.skill_dirs:
plugin_skill_names = [
d for d in os.listdir(plugin_skills_dir)
if os.path.isdir(os.path.join(plugin_skills_dir, d))
]
for skill in load_skills(plugin_skills_dir, plugin_skill_names, current_runtime=runtime_name):
if skill.metadata.id not in seen_skill_ids:
loaded_skills.append(skill)
seen_skill_ids.add(skill.metadata.id)
logger.info("Loaded %d skills: %s", len(loaded_skills), [s.metadata.id for s in loaded_skills])
# Core platform tools — names mirror the platform_tools registry,
# so the names referenced in get_a2a_instructions/get_hma_instructions
# are guaranteed to exist as @tool symbols here. The structural
# alignment test in tests/test_platform_tools.py pins this.
all_tools = [
delegate_task, delegate_task_async, check_task_status,
request_approval, commit_memory, recall_memory, run_code,
]
for skill in loaded_skills:
all_tools.extend(skill.tools)
# Coordinator mode: detect children and add routing tool
children = await get_children()
is_coordinator = len(children) > 0
if is_coordinator:
from coordinator import route_task_to_team
logger.info("Coordinator mode: %d children", len(children))
all_tools.append(route_task_to_team)
# Build system prompt with all context. Parent→child knowledge sharing
# was previously handled by `shared_context` (parent's config.yaml file
# paths injected into the child's prompt at boot). That path was removed
# — agents now pull team-scoped knowledge via memory v2's team:<id>
# namespace (recall_memory) on demand instead of paying for it on every
# boot regardless of need. See RFC #2789 for the future shared-file
# storage that complements this for large blob-shaped artefacts.
peers = await get_peer_capabilities(platform_url, config.workspace_id)
platform_instructions = await get_platform_instructions(platform_url, config.workspace_id)
coordinator_prompt = build_children_description(children) if is_coordinator else ""
extra_prompts = list(plugins.prompt_fragments)
if coordinator_prompt:
extra_prompts.append(coordinator_prompt)
system_prompt = build_system_prompt(
config.config_path, config.workspace_id, loaded_skills, peers,
prompt_files=config.prompt_files,
plugin_rules=plugins.rules,
plugin_prompts=extra_prompts,
platform_instructions=platform_instructions,
)
return SetupResult(
system_prompt=system_prompt,
loaded_skills=loaded_skills,
langchain_tools=all_tools,
is_coordinator=is_coordinator,
children=children,
)
@abstractmethod
async def setup(self, config: AdapterConfig) -> None:
"""One-time setup: validate config, prepare internal state.
Called after deps are installed but before create_executor().
Raise RuntimeError if setup fails (missing deps, bad config, etc.)."""
... # pragma: no cover
@abstractmethod
async def create_executor(self, config: AdapterConfig) -> AgentExecutor:
"""Create and return an AgentExecutor ready for A2A integration.
The returned executor's execute() method will be called by the
A2A server's DefaultRequestHandler.
Subclasses should also store the returned executor as ``self._executor``
so ``pre_stop_state()`` can access it for serialization.
"""
... # pragma: no cover
@@ -1,22 +0,0 @@
"""Adapter registry shim.
Adapters extracted to standalone repos (molecule-ai-workspace-template-*).
ADAPTER_MODULE env var is the primary discovery mechanism in production.
This shim provides backward-compatible imports for local dev + tests.
"""
import importlib
import os
import logging
from adapter_base import BaseAdapter, AdapterConfig
logger = logging.getLogger(__name__)
def get_adapter(runtime: str) -> type[BaseAdapter]:
adapter_module = os.environ.get("ADAPTER_MODULE")
if adapter_module:
mod = importlib.import_module(adapter_module)
return getattr(mod, "Adapter")
raise KeyError(
f"No ADAPTER_MODULE set for runtime '{runtime}'. "
"Adapters now live in standalone template repos."
)
@@ -1,2 +0,0 @@
"""Re-export from adapter_base for backward compat."""
from adapter_base import * # noqa: F401,F403
@@ -1,130 +0,0 @@
# Google ADK Adapter
Molecule AI workspace adapter for [Google Agent Development Kit (ADK)](https://github.com/google/adk-python) — Google's official multi-agent Python SDK (~19k ⭐, Apache-2.0).
## Overview
This adapter bridges the A2A protocol used by the Molecule AI platform to Google ADK's runner/session model. Agents are backed by Google Gemini models via AI Studio or Vertex AI. Each workspace gets an `LlmAgent` wrapped in a `Runner` with an `InMemorySessionService`; sessions are tied to A2A task context IDs for stable, isolated per-conversation state.
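The session mapping described above can be sketched as follows. This is a minimal illustration, not the adapter's code: `session_id_for` and `DEFAULT_SESSION_ID` are hypothetical names, and the context object is a stand-in with a `context_id` attribute rather than a real A2A `RequestContext`.

```python
from types import SimpleNamespace
from typing import Any

# Fallback used when the request carries no context ID (assumed value).
DEFAULT_SESSION_ID = "default-session"


def session_id_for(context: Any) -> str:
    """Reuse the A2A context ID as the ADK session ID, with a fixed fallback."""
    return getattr(context, "context_id", None) or DEFAULT_SESSION_ID


# Two turns in the same conversation map to the same session.
turn_1 = SimpleNamespace(context_id="ctx-123")
turn_2 = SimpleNamespace(context_id="ctx-123")
same_session = session_id_for(turn_1) == session_id_for(turn_2)
```

Because the ID is derived rather than generated, a follow-up turn lands in the session that already holds the earlier conversation state.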
**Runtime key:** `google-adk`
## Installation
The adapter dependencies are installed automatically by `entrypoint.sh` from this directory's `requirements.txt`:
```bash
pip install -r adapters/google-adk/requirements.txt
```
You'll also need a Google API key (AI Studio) or Vertex AI credentials.
## Configuration
### `config.yaml`
```yaml
runtime: google-adk
model: google:gemini-2.0-flash # or gemini-1.5-pro, gemini-2.5-flash, etc.
runtime_config:
agent_name: my-agent # optional, default: molecule-adk-agent
max_output_tokens: 8192 # optional, default: 8192
temperature: 1.0 # optional, default: 1.0
```
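As a rough sketch of how the optional `runtime_config` keys above could be resolved against their documented defaults (the helper name `resolve_runtime_config` is hypothetical and not part of the adapter):

```python
from typing import Any, Mapping, Optional

# Defaults taken from the comments in the config.yaml example above.
DEFAULTS = {
    "agent_name": "molecule-adk-agent",
    "max_output_tokens": 8192,
    "temperature": 1.0,
}


def resolve_runtime_config(runtime_config: Optional[Mapping[str, Any]]) -> dict:
    """Fill in missing keys with defaults and coerce the numeric fields."""
    rc = dict(runtime_config or {})
    return {
        "agent_name": rc.get("agent_name", DEFAULTS["agent_name"]),
        "max_output_tokens": int(rc.get("max_output_tokens", DEFAULTS["max_output_tokens"])),
        "temperature": float(rc.get("temperature", DEFAULTS["temperature"])),
    }


resolved = resolve_runtime_config({"temperature": "0.7"})
```

Coercing with `int()` and `float()` means a value that arrives as a string from YAML or a form field still ends up with the expected type.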
### Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `GOOGLE_API_KEY` | Yes (unless Vertex AI) | Google AI Studio API key |
| `GOOGLE_GENAI_USE_VERTEXAI` | No | Set to `"1"` (`"true"` is also accepted) to use Vertex AI instead of AI Studio |
| `GOOGLE_CLOUD_PROJECT` | When using Vertex AI | GCP project ID |
| `GOOGLE_CLOUD_LOCATION` | When using Vertex AI | GCP region, e.g. `"us-central1"` |
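
The credential rules in the table can be checked before booting a workspace. The sketch below mirrors the validation described for `setup()`; the function name `check_google_credentials` and its string return values are assumptions made for illustration.

```python
import os
from typing import Mapping


def check_google_credentials(env: Mapping[str, str] = os.environ) -> str:
    """Return which credential path is active, or raise if neither is configured."""
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip() in ("1", "true", "True")
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if use_vertex:
        return "vertex-ai"
    if api_key:
        return "ai-studio"
    raise RuntimeError(
        "Set GOOGLE_API_KEY (AI Studio) or GOOGLE_GENAI_USE_VERTEXAI=1 (Vertex AI)."
    )


# Passing an explicit mapping keeps the check testable without touching os.environ.
mode = check_google_credentials({"GOOGLE_API_KEY": "example-key"})
```

This sketch does not verify `GOOGLE_CLOUD_PROJECT` or `GOOGLE_CLOUD_LOCATION`; it only distinguishes the two credential paths.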
## Usage Example
```python
import asyncio
from adapter_base import AdapterConfig
from adapters.google_adk.adapter import GoogleADKAdapter
async def main():
config = AdapterConfig(
model="google:gemini-2.0-flash",
system_prompt="You are a helpful assistant.",
runtime_config={
"agent_name": "demo-agent",
"max_output_tokens": 1024,
"temperature": 0.7,
},
workspace_id="ws-demo",
)
adapter = GoogleADKAdapter()
await adapter.setup(config) # validates keys, loads plugins/skills
executor = await adapter.create_executor(config) # returns GoogleADKA2AExecutor
# executor.execute(context, event_queue) is called by the A2A server per turn
print(f"Adapter: {adapter.display_name()} — model {config.model}")
asyncio.run(main())
```
### Running via A2A
Once the workspace is provisioned, send A2A messages as normal:
```bash
curl -X POST http://localhost:8000 \
-H 'Content-Type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": "msg-1",
"parts": [{"kind": "text", "text": "What is 2 + 2?"}]
}
}
}'
```
## Supported Models
Any model supported by Google ADK and available through your credential path:
| Model | Notes |
|-------|-------|
| `gemini-2.0-flash` | Recommended — fast, cost-effective |
| `gemini-2.5-flash` | Latest preview, strong reasoning |
| `gemini-1.5-pro` | Higher capability, higher latency |
| `gemini-1.5-flash` | Fast, lower cost |
Use the `google:` prefix in `config.yaml` — the adapter strips it before passing the model name to ADK.
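
A minimal sketch of the prefix stripping, assuming the provider prefix is everything before the first colon (`strip_provider_prefix` is a hypothetical name):

```python
def strip_provider_prefix(model: str) -> str:
    """Drop a leading "<provider>:" from a model string, if one is present."""
    return model.split(":", 1)[1] if ":" in model else model


stripped = strip_provider_prefix("google:gemini-2.0-flash")
unchanged = strip_provider_prefix("gemini-1.5-pro")
```

Splitting on the first colon only means a model name that itself contains a colon keeps everything after the provider prefix.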
## Architecture
```
A2A Request
  ↓
GoogleADKA2AExecutor.execute()
  ├── extract_message_text()   ← shared_runtime helper
  ├── _ensure_session()        ← create/reuse InMemorySessionService session
  └── _build_content()         ← wrap text in google.genai.types.Content
  ↓
runner.run_async(session_id, user_id, new_message)
  ↓
ADK Event stream → filter is_final_response() → extract text
  ↓
event_queue.enqueue_event(new_text_message(reply))
  ↓
A2A Response
```
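
The "filter final responses, then extract text" step can be illustrated without ADK installed. The events below are stand-in objects built with `SimpleNamespace`; their attribute layout (`is_final_response()`, `response.content.parts[].text`) follows what the adapter code in this change reads and is not a statement about ADK's own event API. `collect_final_text` and `fake_event` are hypothetical names.

```python
from types import SimpleNamespace
from typing import Any, Iterable

NO_RESPONSE_MSG = "(no response generated)"


def collect_final_text(events: Iterable[Any]) -> str:
    """Concatenate text parts from final-response events only."""
    collected = []
    for event in events:
        is_final = getattr(event, "is_final_response", lambda: False)
        if not is_final():
            continue  # intermediate event: skip it
        response = getattr(event, "response", None)
        content = getattr(response, "content", None)
        for part in getattr(content, "parts", None) or []:
            text = getattr(part, "text", None)
            if text:
                collected.append(text)
    return "".join(collected).strip() or NO_RESPONSE_MSG


def fake_event(text: str, final: bool) -> SimpleNamespace:
    """Build a stand-in event with the attribute layout assumed above."""
    response = SimpleNamespace(content=SimpleNamespace(parts=[SimpleNamespace(text=text)]))
    return SimpleNamespace(is_final_response=lambda: final, response=response)


reply = collect_final_text([fake_event("thinking...", False), fake_event("4", True)])
```

When no final event carries text, the fallback string is returned so the caller always has something to enqueue.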
## License
Apache-2.0 — same as [google/adk-python](https://github.com/google/adk-python).
@@ -1,408 +0,0 @@
"""Google ADK adapter for Molecule AI workspace runtime.
Wraps Google's Agent Development Kit (google-adk v1.x) as a Molecule AI
WorkspaceAdapter, bridging the A2A protocol to Google ADK's runner/session
model.
Google ADK concepts used
------------------------
- ``google.adk.agents.LlmAgent`` — An LLM-backed agent with instructions and
optional tools. Declared with ``model``, ``name``, and ``instruction``.
- ``google.adk.runners.Runner`` — Drives one or more agents inside a session;
``run_async()`` streams ``Event`` objects, including the final response text.
- ``google.adk.sessions.InMemorySessionService`` — Manages session state in
memory. Each ``Runner`` owns a single ``InMemorySessionService`` instance.
Runtime-config keys (all optional)
------------------------------------
``max_output_tokens`` — int, default 8192. Forwarded to the ADK ``GenerateContentConfig``.
``temperature`` — float, default 1.0.
``agent_name`` — str, default ``"molecule-adk-agent"``.
Environment variables
---------------------
``GOOGLE_API_KEY`` — Google AI Studio key (required for ``gemini-*`` models).
``GOOGLE_GENAI_USE_VERTEXAI`` — set to ``"1"`` to use Vertex AI instead of AI
Studio. In that case supply
``GOOGLE_CLOUD_PROJECT`` and
``GOOGLE_CLOUD_LOCATION`` as well.
"""
from __future__ import annotations
import logging
import os
from typing import TYPE_CHECKING, Any
from a2a.server.agent_execution import AgentExecutor, RequestContext
from a2a.server.events import EventQueue
from a2a.helpers import new_text_message
from adapter_base import AdapterConfig, BaseAdapter
# Import sanitize_agent_error from the workspace package. The adapter lives
# in the workspace/adapters/ hierarchy so the workspace package root is
# always importable as long as the module is loaded from within a workspace.
# In standalone template repos, this import resolves via the workspace package
# entry point that also provides adapter_base.
try:
from executor_helpers import sanitize_agent_error # type: ignore[attr-defined]
except ImportError: # pragma: no cover
sanitize_agent_error = None # fallback: below handler falls back to class-name only
if TYPE_CHECKING:
pass
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
_DEFAULT_AGENT_NAME = "molecule-adk-agent"
_DEFAULT_MAX_OUTPUT_TOKENS = 8192
_DEFAULT_TEMPERATURE = 1.0
_NO_TEXT_MSG = "Error: message contained no text content."
_NO_RESPONSE_MSG = "(no response generated)"
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor
# ---------------------------------------------------------------------------
class GoogleADKA2AExecutor(AgentExecutor):
"""A2A executor backed by a Google ADK ``Runner``.
Each executor instance owns a single ``Runner`` and ``InMemorySessionService``.
Sessions are created on first use and reused across subsequent turns
(the session_id is derived from the A2A context_id so each task gets a
stable, isolated session).
Parameters
----------
model:
ADK model identifier, e.g. ``"gemini-2.0-flash"`` or
``"gemini-1.5-pro"``.
system_prompt:
Optional instruction prepended to every conversation. Passed to
``LlmAgent(instruction=...)``.
agent_name:
Internal ADK agent name. Defaults to ``_DEFAULT_AGENT_NAME``.
max_output_tokens:
Token cap forwarded to ``GenerateContentConfig``.
temperature:
Sampling temperature forwarded to ``GenerateContentConfig``.
heartbeat:
Optional ``HeartbeatLoop`` instance (unused directly but stored for
future heartbeat integration).
_runner:
Inject a pre-built ``Runner`` — for testing only. When provided,
the real ADK ``Runner`` is never constructed.
"""
def __init__(
self,
model: str,
system_prompt: str | None = None,
agent_name: str = _DEFAULT_AGENT_NAME,
max_output_tokens: int = _DEFAULT_MAX_OUTPUT_TOKENS,
temperature: float = _DEFAULT_TEMPERATURE,
heartbeat: Any = None,
_runner: Any = None,
) -> None:
self.model = model
self.system_prompt = system_prompt
self.agent_name = agent_name
self.max_output_tokens = max_output_tokens
self.temperature = temperature
self._heartbeat = heartbeat
self._sessions_created: set[str] = set()
if _runner is not None:
# Test injection — skip building the real ADK objects.
self._runner = _runner
else:
self._runner = self._build_runner()
# ------------------------------------------------------------------
# Internal helpers
# ------------------------------------------------------------------
def _build_runner(self) -> Any: # pragma: no cover — requires real ADK
"""Construct a Google ADK ``Runner`` with an ``LlmAgent``.
Lazy-imports ``google.adk`` so the rest of the workspace runtime
doesn't pull in google-adk on startup (it's only needed when this
executor is actually instantiated by ``GoogleADKAdapter.create_executor``).
"""
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
agent = LlmAgent(
name=self.agent_name,
model=self.model,
instruction=self.system_prompt or "",
)
session_service = InMemorySessionService()
runner = Runner(
agent=agent,
app_name=self.agent_name,
session_service=session_service,
)
return runner
async def _ensure_session(self, session_id: str, user_id: str) -> None:
"""Create a session in the service if it doesn't exist yet."""
if session_id in self._sessions_created:
return
session_service = self._runner.session_service
existing = await session_service.get_session(
app_name=self.agent_name,
user_id=user_id,
session_id=session_id,
)
if existing is None:
await session_service.create_session(
app_name=self.agent_name,
user_id=user_id,
session_id=session_id,
)
self._sessions_created.add(session_id)
def _extract_text(self, context: RequestContext) -> str:
"""Pull plain text out of the A2A message parts."""
from shared_runtime import extract_message_text
return extract_message_text(context)
def _build_content(self, user_text: str) -> Any:
"""Wrap user text in an ADK-compatible ``Content`` object."""
from google.genai.types import Content, Part
return Content(role="user", parts=[Part(text=user_text)])
# ------------------------------------------------------------------
# AgentExecutor interface
# ------------------------------------------------------------------
async def execute(self, context: RequestContext, event_queue: EventQueue) -> None:
"""Run a single ADK turn and enqueue the reply as an A2A Message.
Sequence:
1. Extract user text from A2A message parts.
2. Ensure an ADK session exists for this context_id.
3. Call ``runner.run_async()`` and collect all response events.
4. Concatenate final-response text; fall back to ``_NO_RESPONSE_MSG``
when the model produces no output.
5. Enqueue the reply via ``event_queue``.
"""
user_text = self._extract_text(context)
if not user_text:
parts = getattr(getattr(context, "message", None), "parts", None)
logger.warning("GoogleADKA2AExecutor: no text in message parts: %s", parts)
await event_queue.enqueue_event(new_text_message(_NO_TEXT_MSG))
return
session_id = getattr(context, "context_id", None) or "default-session"
user_id = "molecule-user"
try:
await self._ensure_session(session_id, user_id)
content = self._build_content(user_text)
response_parts: list[str] = []
async for event in self._runner.run_async(
session_id=session_id,
user_id=user_id,
new_message=content,
):
# Collect text from final-response events
if not getattr(event, "is_final_response", lambda: False)():
continue
candidate_response = getattr(event, "response", None)
if candidate_response is None:
continue
for part in getattr(
getattr(candidate_response, "content", None) or MissingContent(),
"parts", []
):
text = getattr(part, "text", None)
if text:
response_parts.append(text)
final_text = "".join(response_parts).strip() or _NO_RESPONSE_MSG
await event_queue.enqueue_event(new_text_message(final_text))
except Exception as exc:
logger.error(
"GoogleADKA2AExecutor: execution error [model=%s]: %s",
self.model,
type(exc).__name__,
exc_info=True,
)
# Include exception detail (first ~1 KB) in the A2A error response so
# callers get actionable context without needing workspace log access.
# sanitize_agent_error scrubs API keys / bearer tokens before including
# content in the response. Falls back to class-name-only when
# the function is unavailable (standalone template repo layout).
if sanitize_agent_error is not None:
msg = sanitize_agent_error(stderr=str(exc))
else:
msg = f"Agent error: {type(exc).__name__}"
await event_queue.enqueue_event(new_text_message(msg))
async def cancel(self, context: RequestContext, event_queue: EventQueue) -> None:
"""Cancel a running task — emits canceled state per A2A protocol."""
from a2a.types import TaskState, TaskStatus, TaskStatusUpdateEvent
await event_queue.enqueue_event(
TaskStatusUpdateEvent(
status=TaskStatus(state=TaskState.TASK_STATE_CANCELED),
final=True,
)
)
class MissingContent:
"""Sentinel to avoid AttributeError when response.content is None."""
parts: list = []
# ---------------------------------------------------------------------------
# GoogleADKAdapter
# ---------------------------------------------------------------------------
class GoogleADKAdapter(BaseAdapter):
"""Molecule AI workspace adapter for Google ADK (google-adk v1.x).
Implements the full ``BaseAdapter`` lifecycle:
- ``setup()`` — validates config and runs ``_common_setup()``.
- ``create_executor()`` — returns a ``GoogleADKA2AExecutor`` configured
from ``AdapterConfig``.
"""
# Stored by setup(); consumed by create_executor()
_setup_result: Any = None
# ------------------------------------------------------------------
# Identity
# ------------------------------------------------------------------
@staticmethod
def name() -> str:
"""Runtime identifier — matches the ``runtime`` field in config.yaml."""
return "google-adk"
@staticmethod
def display_name() -> str:
"""Human-readable name shown in the Molecule AI UI."""
return "Google ADK"
@staticmethod
def description() -> str:
"""Short description of this adapter's capabilities."""
return (
"Google Agent Development Kit (ADK) adapter. "
"Runs LLM agents via Google Gemini models using the official "
"google-adk Python SDK (Apache-2.0)."
)
@staticmethod
def get_config_schema() -> dict:
"""JSON Schema for runtime_config fields rendered in the Config tab."""
return {
"type": "object",
"properties": {
"agent_name": {
"type": "string",
"default": _DEFAULT_AGENT_NAME,
"description": "Internal ADK agent name",
},
"max_output_tokens": {
"type": "integer",
"default": _DEFAULT_MAX_OUTPUT_TOKENS,
"description": "Maximum output tokens for the Gemini model",
},
"temperature": {
"type": "number",
"default": _DEFAULT_TEMPERATURE,
"minimum": 0.0,
"maximum": 2.0,
"description": "Sampling temperature",
},
},
"additionalProperties": False,
}
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
async def setup(self, config: AdapterConfig) -> None:
"""Validate config and run the shared platform setup pipeline.
Raises ``RuntimeError`` if the required API key is not set and
Vertex AI mode is not active.
Args:
config: ``AdapterConfig`` populated by the workspace runtime.
"""
use_vertex = os.environ.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip() in ("1", "true", "True")
api_key = os.environ.get("GOOGLE_API_KEY", "").strip()
if not use_vertex and not api_key:
raise RuntimeError(
"GoogleADKAdapter requires GOOGLE_API_KEY (for AI Studio) or "
"GOOGLE_GENAI_USE_VERTEXAI=1 with GOOGLE_CLOUD_PROJECT set."
)
logger.info(
"GoogleADKAdapter.setup: model=%s vertex=%s", config.model, use_vertex
)
self._setup_result = await self._common_setup(config)
async def create_executor(self, config: AdapterConfig) -> GoogleADKA2AExecutor:
"""Build and return a ``GoogleADKA2AExecutor`` for A2A integration.
Uses the system prompt assembled by ``_common_setup()`` in ``setup()``.
Runtime-config keys ``agent_name``, ``max_output_tokens``, and
``temperature`` are respected when present.
Args:
config: ``AdapterConfig`` populated by the workspace runtime.
Returns:
A ready-to-use ``GoogleADKA2AExecutor`` instance.
"""
rc = config.runtime_config or {}
# Strip provider prefix from model, e.g. "google:gemini-2.0-flash" → "gemini-2.0-flash"
model = config.model
if ":" in model:
model = model.split(":", 1)[1]
system_prompt = (
self._setup_result.system_prompt
if self._setup_result is not None
else config.system_prompt or ""
)
return GoogleADKA2AExecutor(
model=model,
system_prompt=system_prompt,
agent_name=rc.get("agent_name", _DEFAULT_AGENT_NAME),
max_output_tokens=int(rc.get("max_output_tokens", _DEFAULT_MAX_OUTPUT_TOKENS)),
temperature=float(rc.get("temperature", _DEFAULT_TEMPERATURE)),
heartbeat=config.heartbeat,
)
# ---------------------------------------------------------------------------
# Module-level alias required by the adapter autodiscovery loader
# ---------------------------------------------------------------------------
Adapter = GoogleADKAdapter
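Reviewer note: the two small normalizations in `setup()` and `create_executor()` above (reading the Vertex AI flag and stripping a provider prefix from the model id) can be sketched in isolation. This is a minimal sketch rather than the adapter's actual code; `strip_provider_prefix` and `vertex_enabled` are hypothetical helper names introduced only for illustration.

```python
from typing import Mapping


def strip_provider_prefix(model: str) -> str:
    """Drop everything up to the first ':' (e.g. 'google:gemini-2.0-flash')."""
    # split(":", 1) splits only on the first colon, so any later colons
    # in the model id are preserved.
    return model.split(":", 1)[1] if ":" in model else model


def vertex_enabled(env: Mapping[str, str]) -> bool:
    """Mirror the flag check in setup(): only '1', 'true' and 'True' count."""
    return env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip() in ("1", "true", "True")


if __name__ == "__main__":
    print(strip_provider_prefix("google:gemini-2.0-flash"))
    print(vertex_enabled({"GOOGLE_GENAI_USE_VERTEXAI": "TRUE"}))
```

The flag check accepts only the three listed spellings, so a value such as `TRUE` or `yes` is treated the same as an unset variable.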
@@ -1,7 +0,0 @@
# Google ADK adapter dependencies
# Pin to the latest stable release — update when a new version is verified.
google-adk==1.30.0
# google-adk transitively requires google-genai; list it explicitly here
# (as a lower bound, not an exact pin), following the convention of the
# other adapter requirements.txt files.
google-genai>=1.16.0
@@ -1,993 +0,0 @@
"""Unit tests for adapters/google-adk/adapter.py.
Coverage targets (100%)
-----------------------
- Module constants: _DEFAULT_AGENT_NAME, _DEFAULT_MAX_OUTPUT_TOKENS, etc.
- MissingContent sentinel class
- GoogleADKA2AExecutor.__init__ — field assignment + runner injection
- GoogleADKA2AExecutor._extract_text
- GoogleADKA2AExecutor._build_content
- GoogleADKA2AExecutor._ensure_session — first call (create), subsequent call (skip)
- GoogleADKA2AExecutor.execute — happy path, empty input, API error,
no final_response events, partial text
- GoogleADKA2AExecutor.cancel — TaskStatusUpdateEvent emitted
- GoogleADKAdapter.name / display_name / description / get_config_schema
- GoogleADKAdapter.setup — success, missing key, vertex override
- GoogleADKAdapter.create_executor — model stripping, defaults, rc overrides
- Adapter alias
All google-adk, google-genai, and shared_runtime calls are mocked.
No live API calls are made.
"""
from __future__ import annotations
import sys
from types import ModuleType
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
# ---------------------------------------------------------------------------
# Stub heavy external modules BEFORE the adapter is imported.
# conftest.py already stubs: a2a, builtin_tools, langchain_core.
# We need to additionally stub: google.adk, google.genai, shared_runtime.
# ---------------------------------------------------------------------------
def _make_a2a_stubs() -> None:
"""Register minimal a2a SDK stubs in sys.modules.
Mirrors what workspace/tests/conftest.py does; needed because
this test file lives outside the ``tests/`` directory and conftest.py
is not automatically loaded for it.
"""
if "a2a" in sys.modules:
# Already mocked by conftest — just ensure new_agent_text_message is passthrough
a2a_utils = sys.modules.get("a2a.utils")
if a2a_utils and callable(getattr(a2a_utils, "new_agent_text_message", None)):
a2a_utils.new_agent_text_message = lambda text, **kwargs: text
return
agent_execution_mod = ModuleType("a2a.server.agent_execution")
class AgentExecutor:
pass
class RequestContext:
pass
agent_execution_mod.AgentExecutor = AgentExecutor
agent_execution_mod.RequestContext = RequestContext
events_mod = ModuleType("a2a.server.events")
class EventQueue:
pass
events_mod.EventQueue = EventQueue
tasks_mod = ModuleType("a2a.server.tasks")
types_mod = ModuleType("a2a.types")
class Part:
# v1: Part takes text= directly; root= retained for compat during transition
def __init__(self, text=None, root=None, **kwargs):
self.text = text
types_mod.Part = Part
# a2a.helpers (v1: moved from a2a.utils)
helpers_mod = ModuleType("a2a.helpers")
# Passthrough so tests can assert on the plain text string, matching the
# hermes_executor test convention from conftest.py.
helpers_mod.new_agent_text_message = lambda text, **kwargs: text
a2a_mod = ModuleType("a2a")
a2a_server_mod = ModuleType("a2a.server")
sys.modules["a2a"] = a2a_mod
sys.modules["a2a.server"] = a2a_server_mod
sys.modules["a2a.server.agent_execution"] = agent_execution_mod
sys.modules["a2a.server.events"] = events_mod
sys.modules["a2a.server.tasks"] = tasks_mod
sys.modules["a2a.types"] = types_mod
sys.modules["a2a.helpers"] = helpers_mod
def _make_google_adk_stubs() -> None:
"""Register minimal google.adk and google.genai stubs in sys.modules."""
# google (top-level namespace package)
google_mod = sys.modules.get("google") or ModuleType("google")
google_mod.__path__ = []
sys.modules.setdefault("google", google_mod)
# google.genai
google_genai_mod = ModuleType("google.genai")
google_genai_mod.__path__ = []
google_genai_types_mod = ModuleType("google.genai.types")
class _Content:
def __init__(self, role="user", parts=None):
self.role = role
self.parts = parts or []
class _Part:
def __init__(self, text=""):
self.text = text
google_genai_types_mod.Content = _Content
google_genai_types_mod.Part = _Part
sys.modules["google.genai"] = google_genai_mod
sys.modules["google.genai.types"] = google_genai_types_mod
# google.adk
google_adk_mod = ModuleType("google.adk")
google_adk_mod.__path__ = []
# google.adk.agents
google_adk_agents_mod = ModuleType("google.adk.agents")
class _LlmAgent:
def __init__(self, name="", model="", instruction="", tools=None):
self.name = name
self.model = model
self.instruction = instruction
self.tools = tools or []
google_adk_agents_mod.LlmAgent = _LlmAgent
# google.adk.runners
google_adk_runners_mod = ModuleType("google.adk.runners")
class _Runner:
def __init__(self, agent=None, app_name="", session_service=None):
self.agent = agent
self.app_name = app_name
self.session_service = session_service
async def run_async(self, session_id, user_id, new_message):
# Stub — tests override this via mock runner
return
yield # make it an async generator
google_adk_runners_mod.Runner = _Runner
# google.adk.sessions
google_adk_sessions_mod = ModuleType("google.adk.sessions")
class _InMemorySessionService:
def __init__(self):
self._sessions: dict = {}
async def get_session(self, app_name, user_id, session_id):
return self._sessions.get((app_name, user_id, session_id))
async def create_session(self, app_name, user_id, session_id):
self._sessions[(app_name, user_id, session_id)] = {"id": session_id}
return self._sessions[(app_name, user_id, session_id)]
google_adk_sessions_mod.InMemorySessionService = _InMemorySessionService
sys.modules["google.adk"] = google_adk_mod
sys.modules["google.adk.agents"] = google_adk_agents_mod
sys.modules["google.adk.runners"] = google_adk_runners_mod
sys.modules["google.adk.sessions"] = google_adk_sessions_mod
def _make_shared_runtime_stub() -> None:
"""Register shared_runtime stub with extract_message_text."""
if "shared_runtime" not in sys.modules:
mod = ModuleType("shared_runtime")
def _extract_message_text(ctx) -> str:
parts = getattr(getattr(ctx, "message", None), "parts", None)
if parts is None:
parts = ctx
texts = []
for p in parts or []:
t = getattr(p, "text", None) or getattr(
getattr(p, "root", None), "text", None
) or ""
if t:
texts.append(t)
return " ".join(texts).strip()
mod.extract_message_text = _extract_message_text
sys.modules["shared_runtime"] = mod
def _make_adapter_base_stub() -> None:
"""Register adapter_base stub in sys.modules."""
if "adapter_base" not in sys.modules:
mod = ModuleType("adapter_base")
from dataclasses import dataclass, field
from abc import ABC, abstractmethod
@dataclass
class AdapterConfig:
model: str = "google:gemini-2.0-flash"
system_prompt: str | None = None
tools: list = field(default_factory=list)
runtime_config: dict = field(default_factory=dict)
config_path: str = "/configs"
workspace_id: str = ""
prompt_files: list = field(default_factory=list)
a2a_port: int = 8000
heartbeat: object = None
class BaseAdapter(ABC):
@staticmethod
@abstractmethod
def name() -> str: ... # pragma: no cover
@staticmethod
@abstractmethod
def display_name() -> str: ... # pragma: no cover
@staticmethod
@abstractmethod
def description() -> str: ... # pragma: no cover
@staticmethod
def get_config_schema() -> dict:
return {}
def memory_filename(self) -> str:
return "CLAUDE.md"
def register_tool_hook(self, name, fn): return None # noqa
async def transcript_lines(self, since=0, limit=100): return {"supported": False} # noqa
def register_subagent_hook(self, name, spec): return None # noqa
def append_to_memory_hook(self, config, filename, content): pass # noqa
async def install_plugins_via_registry(self, config, plugins): return [] # noqa
async def inject_plugins(self, config, plugins):
await self.install_plugins_via_registry(config, plugins)
async def _common_setup(self, config):
from types import SimpleNamespace
return SimpleNamespace(
system_prompt="mocked system prompt",
loaded_skills=[],
langchain_tools=[],
is_coordinator=False,
children=[],
)
@abstractmethod
async def setup(self, config) -> None: ... # pragma: no cover
@abstractmethod
async def create_executor(self, config): ... # pragma: no cover
mod.AdapterConfig = AdapterConfig
mod.BaseAdapter = BaseAdapter
mod.SetupResult = None
sys.modules["adapter_base"] = mod
# Install all stubs before importing the module under test
# Order matters: a2a must be stubbed before adapter.py is imported so that
# the adapter's import of `new_agent_text_message` resolves to the passthrough.
_make_a2a_stubs()
_make_google_adk_stubs()
_make_shared_runtime_stub()
_make_adapter_base_stub()
# Now safe to import the adapter
import sys as _sys
import os as _os
_adapter_dir = _os.path.dirname(_os.path.abspath(__file__))
if _adapter_dir not in _sys.path:
_sys.path.insert(0, _adapter_dir)
from adapter import ( # noqa: E402
Adapter,
GoogleADKA2AExecutor,
GoogleADKAdapter,
MissingContent,
_DEFAULT_AGENT_NAME,
_DEFAULT_MAX_OUTPUT_TOKENS,
_DEFAULT_TEMPERATURE,
_NO_RESPONSE_MSG,
_NO_TEXT_MSG,
)
# ---------------------------------------------------------------------------
# Fixtures and helpers
# ---------------------------------------------------------------------------
def _make_context(text: str, context_id: str = "ctx-test") -> MagicMock:
"""Return a mock RequestContext with the given text in message.parts."""
part = MagicMock()
part.text = text
ctx = MagicMock()
ctx.message.parts = [part]
ctx.context_id = context_id
return ctx
def _make_empty_context() -> MagicMock:
"""Return a context whose message parts contain no text."""
part = MagicMock(spec=[])
part.root = MagicMock(spec=[])
ctx = MagicMock()
ctx.message.parts = [part]
ctx.context_id = "ctx-empty"
return ctx
def _make_event(is_final: bool, text: str | None = None) -> MagicMock:
"""Build a mock ADK Event that optionally is a final response."""
event = MagicMock()
event.is_final_response = MagicMock(return_value=is_final)
if text is not None:
part = MagicMock()
part.text = text
event.response = MagicMock()
event.response.content = MagicMock()
event.response.content.parts = [part]
else:
event.response = None
return event
async def _async_gen(*events):
"""Yield events one by one as an async generator."""
for e in events:
yield e
def _make_runner(events=None) -> MagicMock:
"""Return a mock Runner whose run_async yields the given events."""
runner = MagicMock()
runner.session_service = AsyncMock()
runner.session_service.get_session = AsyncMock(return_value=None)
runner.session_service.create_session = AsyncMock(return_value={"id": "s1"})
evts = events or []
runner.run_async = MagicMock(return_value=_async_gen(*evts))
return runner
def _make_executor(
model: str = "gemini-2.0-flash",
system_prompt: str | None = "You are helpful.",
runner: MagicMock | None = None,
) -> GoogleADKA2AExecutor:
"""Create a GoogleADKA2AExecutor with an injected mock runner."""
return GoogleADKA2AExecutor(
model=model,
system_prompt=system_prompt,
_runner=runner or _make_runner(),
)
def _make_adapter_config(**kwargs) -> object:
"""Return an AdapterConfig with sensible defaults."""
from adapter_base import AdapterConfig
defaults = dict(
model="google:gemini-2.0-flash",
system_prompt="Test prompt.",
runtime_config={},
workspace_id="ws-test",
)
defaults.update(kwargs)
return AdapterConfig(**defaults)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
def test_default_agent_name():
assert _DEFAULT_AGENT_NAME == "molecule-adk-agent"
def test_default_max_output_tokens():
assert _DEFAULT_MAX_OUTPUT_TOKENS == 8192
def test_default_temperature():
assert _DEFAULT_TEMPERATURE == 1.0
def test_no_text_msg_constant():
assert "no text" in _NO_TEXT_MSG.lower()
def test_no_response_msg_constant():
assert "no response" in _NO_RESPONSE_MSG.lower()
# ---------------------------------------------------------------------------
# MissingContent sentinel
# ---------------------------------------------------------------------------
def test_missing_content_has_empty_parts():
mc = MissingContent()
assert mc.parts == []
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor — construction
# ---------------------------------------------------------------------------
def test_constructor_stores_fields():
runner = _make_runner()
executor = GoogleADKA2AExecutor(
model="gemini-1.5-pro",
system_prompt="Hello",
agent_name="my-agent",
max_output_tokens=4096,
temperature=0.5,
_runner=runner,
)
assert executor.model == "gemini-1.5-pro"
assert executor.system_prompt == "Hello"
assert executor.agent_name == "my-agent"
assert executor.max_output_tokens == 4096
assert executor.temperature == 0.5
assert executor._runner is runner
assert executor._sessions_created == set()
def test_constructor_defaults():
executor = GoogleADKA2AExecutor(model="gemini-2.0-flash", _runner=_make_runner())
assert executor.system_prompt is None
assert executor.agent_name == _DEFAULT_AGENT_NAME
assert executor.max_output_tokens == _DEFAULT_MAX_OUTPUT_TOKENS
assert executor.temperature == _DEFAULT_TEMPERATURE
assert executor._heartbeat is None
def test_constructor_uses_injected_runner():
stub = MagicMock()
stub.session_service = MagicMock()
executor = GoogleADKA2AExecutor(model="gemini-2.0-flash", _runner=stub)
assert executor._runner is stub
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor — _extract_text
# ---------------------------------------------------------------------------
def test_extract_text_returns_message_text():
executor = _make_executor()
ctx = _make_context("Hello world")
result = executor._extract_text(ctx)
assert result == "Hello world"
def test_extract_text_empty_context():
executor = _make_executor()
ctx = _make_empty_context()
result = executor._extract_text(ctx)
assert result == ""
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor — _build_content
# ---------------------------------------------------------------------------
def test_build_content_creates_content_object():
executor = _make_executor()
content = executor._build_content("test message")
assert content.role == "user"
assert len(content.parts) == 1
assert content.parts[0].text == "test message"
def test_build_content_empty_string():
executor = _make_executor()
content = executor._build_content("")
assert content.parts[0].text == ""
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor — _ensure_session
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_ensure_session_creates_when_not_exists():
runner = _make_runner()
runner.session_service.get_session = AsyncMock(return_value=None)
executor = GoogleADKA2AExecutor(
model="gemini-2.0-flash", agent_name="test-agent", _runner=runner
)
await executor._ensure_session("session-1", "user-1")
runner.session_service.create_session.assert_called_once_with(
app_name="test-agent",
user_id="user-1",
session_id="session-1",
)
assert "session-1" in executor._sessions_created
@pytest.mark.asyncio
async def test_ensure_session_skips_if_already_tracked():
runner = _make_runner()
executor = GoogleADKA2AExecutor(
model="gemini-2.0-flash", _runner=runner
)
executor._sessions_created.add("session-x")
await executor._ensure_session("session-x", "user-1")
# Neither get_session nor create_session should be called
runner.session_service.get_session.assert_not_called()
runner.session_service.create_session.assert_not_called()
@pytest.mark.asyncio
async def test_ensure_session_skips_create_when_existing():
runner = _make_runner()
runner.session_service.get_session = AsyncMock(return_value={"id": "s1"})
executor = GoogleADKA2AExecutor(
model="gemini-2.0-flash", agent_name="test-agent", _runner=runner
)
await executor._ensure_session("session-existing", "user-1")
runner.session_service.create_session.assert_not_called()
assert "session-existing" in executor._sessions_created
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor — execute: happy path
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_execute_returns_response_text():
event = _make_event(is_final=True, text="The answer is 42.")
runner = _make_runner(events=[event])
executor = _make_executor(runner=runner)
ctx = _make_context("What is 6×7?")
eq = AsyncMock()
await executor.execute(ctx, eq)
eq.enqueue_event.assert_called_once_with("The answer is 42.")
@pytest.mark.asyncio
async def test_execute_concatenates_multiple_final_parts():
part1 = MagicMock()
part1.text = "Hello "
part2 = MagicMock()
part2.text = "world"
event = MagicMock()
event.is_final_response = MagicMock(return_value=True)
event.response = MagicMock()
event.response.content = MagicMock()
event.response.content.parts = [part1, part2]
runner = _make_runner(events=[event])
executor = _make_executor(runner=runner)
ctx = _make_context("Hi")
eq = AsyncMock()
await executor.execute(ctx, eq)
eq.enqueue_event.assert_called_once_with("Hello world")
@pytest.mark.asyncio
async def test_execute_skips_non_final_events():
non_final = _make_event(is_final=False, text="intermediate")
final = _make_event(is_final=True, text="final answer")
runner = _make_runner(events=[non_final, final])
executor = _make_executor(runner=runner)
ctx = _make_context("question")
eq = AsyncMock()
await executor.execute(ctx, eq)
enqueued = eq.enqueue_event.call_args[0][0]
assert enqueued == "final answer"
@pytest.mark.asyncio
async def test_execute_fallback_when_no_final_response_events():
non_final = _make_event(is_final=False)
runner = _make_runner(events=[non_final])
executor = _make_executor(runner=runner)
ctx = _make_context("hello")
eq = AsyncMock()
await executor.execute(ctx, eq)
eq.enqueue_event.assert_called_once_with(_NO_RESPONSE_MSG)
@pytest.mark.asyncio
async def test_execute_fallback_when_response_is_none():
event = MagicMock()
event.is_final_response = MagicMock(return_value=True)
event.response = None # no response object
runner = _make_runner(events=[event])
executor = _make_executor(runner=runner)
ctx = _make_context("ping")
eq = AsyncMock()
await executor.execute(ctx, eq)
eq.enqueue_event.assert_called_once_with(_NO_RESPONSE_MSG)
@pytest.mark.asyncio
async def test_execute_fallback_when_parts_have_no_text():
part = MagicMock()
part.text = None # no text on the part
event = MagicMock()
event.is_final_response = MagicMock(return_value=True)
event.response = MagicMock()
event.response.content = MagicMock()
event.response.content.parts = [part]
runner = _make_runner(events=[event])
executor = _make_executor(runner=runner)
ctx = _make_context("ping")
eq = AsyncMock()
await executor.execute(ctx, eq)
eq.enqueue_event.assert_called_once_with(_NO_RESPONSE_MSG)
@pytest.mark.asyncio
async def test_execute_fallback_when_response_content_is_none():
event = MagicMock()
event.is_final_response = MagicMock(return_value=True)
event.response = MagicMock()
event.response.content = None # content is None → MissingContent sentinel
runner = _make_runner(events=[event])
executor = _make_executor(runner=runner)
ctx = _make_context("ping")
eq = AsyncMock()
await executor.execute(ctx, eq)
eq.enqueue_event.assert_called_once_with(_NO_RESPONSE_MSG)
@pytest.mark.asyncio
async def test_execute_uses_context_id_as_session_id():
event = _make_event(is_final=True, text="ok")
runner = _make_runner(events=[event])
executor = _make_executor(runner=runner)
ctx = _make_context("hello", context_id="ctx-abc-123")
eq = AsyncMock()
await executor.execute(ctx, eq)
runner.run_async.assert_called_once()
call_kwargs = runner.run_async.call_args[1]
assert call_kwargs["session_id"] == "ctx-abc-123"
assert call_kwargs["user_id"] == "molecule-user"
@pytest.mark.asyncio
async def test_execute_falls_back_to_default_session_id_when_context_id_is_none():
event = _make_event(is_final=True, text="ok")
runner = _make_runner(events=[event])
executor = _make_executor(runner=runner)
ctx = _make_context("hello")
ctx.context_id = None # override
eq = AsyncMock()
await executor.execute(ctx, eq)
call_kwargs = runner.run_async.call_args[1]
assert call_kwargs["session_id"] == "default-session"
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor — execute: empty input
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_execute_empty_input_returns_error():
runner = _make_runner()
executor = _make_executor(runner=runner)
ctx = _make_empty_context()
eq = AsyncMock()
await executor.execute(ctx, eq)
eq.enqueue_event.assert_called_once_with(_NO_TEXT_MSG)
runner.run_async.assert_not_called()
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor — execute: error handling
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_execute_api_error_returns_sanitized_message():
runner = _make_runner()
class _FakeAPIError(Exception):
pass
async def _raise(*args, **kwargs):
raise _FakeAPIError("api_key=secret token_limit_exceeded")
yield # make it an async generator
runner.run_async = MagicMock(return_value=_raise())
executor = _make_executor(runner=runner)
eq = AsyncMock()
await executor.execute(_make_context("hello"), eq)
enqueued = eq.enqueue_event.call_args[0][0]
assert enqueued == "Agent error: _FakeAPIError"
assert "secret" not in enqueued
@pytest.mark.asyncio
async def test_execute_api_error_is_logged(caplog):
import logging
runner = _make_runner()
async def _raise(*args, **kwargs):
raise ValueError("bad request")
yield # make it an async generator
runner.run_async = MagicMock(return_value=_raise())
executor = _make_executor(runner=runner)
with caplog.at_level(logging.ERROR, logger="adapter"):
await executor.execute(_make_context("hello"), AsyncMock())
assert any("execution error" in r.message.lower() for r in caplog.records)
# ---------------------------------------------------------------------------
# GoogleADKA2AExecutor — cancel
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_cancel_emits_canceled_event():
executor = _make_executor()
import a2a.types as a2a_types
class _TaskState:
canceled = "canceled"
class _TaskStatus:
def __init__(self, state):
self.state = state
class _TaskStatusUpdateEvent:
def __init__(self, status, final):
self.status = status
self.final = final
a2a_types.TaskState = _TaskState
a2a_types.TaskStatus = _TaskStatus
a2a_types.TaskStatusUpdateEvent = _TaskStatusUpdateEvent
eq = AsyncMock()
ctx = MagicMock()
await executor.cancel(ctx, eq)
eq.enqueue_event.assert_called_once()
event = eq.enqueue_event.call_args[0][0]
assert isinstance(event, _TaskStatusUpdateEvent)
assert event.status.state == "canceled"
assert event.final is True
# ---------------------------------------------------------------------------
# GoogleADKAdapter — identity methods
# ---------------------------------------------------------------------------
def test_adapter_name():
assert GoogleADKAdapter.name() == "google-adk"
def test_adapter_display_name():
assert "Google ADK" in GoogleADKAdapter.display_name()
def test_adapter_description():
desc = GoogleADKAdapter.description()
assert "ADK" in desc or "Google" in desc
def test_adapter_get_config_schema():
schema = GoogleADKAdapter.get_config_schema()
assert schema["type"] == "object"
assert "agent_name" in schema["properties"]
assert "max_output_tokens" in schema["properties"]
assert "temperature" in schema["properties"]
# ---------------------------------------------------------------------------
# GoogleADKAdapter — setup
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_setup_succeeds_with_api_key(monkeypatch):
monkeypatch.setenv("GOOGLE_API_KEY", "fake-api-key")
monkeypatch.delenv("GOOGLE_GENAI_USE_VERTEXAI", raising=False)
adapter = GoogleADKAdapter()
config = _make_adapter_config()
await adapter.setup(config)
assert adapter._setup_result is not None
assert adapter._setup_result.system_prompt == "mocked system prompt"
@pytest.mark.asyncio
async def test_setup_succeeds_with_vertex_ai(monkeypatch):
monkeypatch.delenv("GOOGLE_API_KEY", raising=False)
monkeypatch.setenv("GOOGLE_GENAI_USE_VERTEXAI", "1")
adapter = GoogleADKAdapter()
config = _make_adapter_config()
await adapter.setup(config)
assert adapter._setup_result is not None
@pytest.mark.asyncio
async def test_setup_succeeds_with_vertex_ai_true_string(monkeypatch):
monkeypatch.delenv("GOOGLE_API_KEY", raising=False)
monkeypatch.setenv("GOOGLE_GENAI_USE_VERTEXAI", "True")
adapter = GoogleADKAdapter()
config = _make_adapter_config()
await adapter.setup(config)
assert adapter._setup_result is not None
@pytest.mark.asyncio
async def test_setup_raises_without_credentials(monkeypatch):
monkeypatch.delenv("GOOGLE_API_KEY", raising=False)
monkeypatch.delenv("GOOGLE_GENAI_USE_VERTEXAI", raising=False)
adapter = GoogleADKAdapter()
config = _make_adapter_config()
with pytest.raises(RuntimeError, match="GOOGLE_API_KEY"):
await adapter.setup(config)
# ---------------------------------------------------------------------------
# GoogleADKAdapter — create_executor
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_create_executor_strips_google_prefix(monkeypatch):
monkeypatch.setenv("GOOGLE_API_KEY", "key")
adapter = GoogleADKAdapter()
config = _make_adapter_config(model="google:gemini-2.0-flash")
await adapter.setup(config)
executor = await adapter.create_executor(config)
assert executor.model == "gemini-2.0-flash"
@pytest.mark.asyncio
async def test_create_executor_no_prefix_passthrough(monkeypatch):
monkeypatch.setenv("GOOGLE_API_KEY", "key")
adapter = GoogleADKAdapter()
config = _make_adapter_config(model="gemini-1.5-pro")
await adapter.setup(config)
executor = await adapter.create_executor(config)
assert executor.model == "gemini-1.5-pro"
@pytest.mark.asyncio
async def test_create_executor_uses_setup_system_prompt(monkeypatch):
monkeypatch.setenv("GOOGLE_API_KEY", "key")
adapter = GoogleADKAdapter()
config = _make_adapter_config()
await adapter.setup(config)
executor = await adapter.create_executor(config)
assert executor.system_prompt == "mocked system prompt"
@pytest.mark.asyncio
async def test_create_executor_runtime_config_overrides(monkeypatch):
monkeypatch.setenv("GOOGLE_API_KEY", "key")
adapter = GoogleADKAdapter()
config = _make_adapter_config(
runtime_config={
"agent_name": "custom-agent",
"max_output_tokens": 512,
"temperature": 0.3,
}
)
await adapter.setup(config)
executor = await adapter.create_executor(config)
assert executor.agent_name == "custom-agent"
assert executor.max_output_tokens == 512
assert executor.temperature == 0.3
@pytest.mark.asyncio
async def test_create_executor_defaults_without_runtime_config(monkeypatch):
monkeypatch.setenv("GOOGLE_API_KEY", "key")
adapter = GoogleADKAdapter()
config = _make_adapter_config(runtime_config={})
await adapter.setup(config)
executor = await adapter.create_executor(config)
assert executor.agent_name == _DEFAULT_AGENT_NAME
assert executor.max_output_tokens == _DEFAULT_MAX_OUTPUT_TOKENS
assert executor.temperature == _DEFAULT_TEMPERATURE
@pytest.mark.asyncio
async def test_create_executor_without_setup_uses_config_system_prompt(monkeypatch):
"""create_executor without prior setup falls back to config.system_prompt."""
monkeypatch.setenv("GOOGLE_API_KEY", "key")
adapter = GoogleADKAdapter()
config = _make_adapter_config(system_prompt="fallback prompt")
# Intentionally skip setup() — _setup_result remains None
executor = await adapter.create_executor(config)
assert executor.system_prompt == "fallback prompt"
@pytest.mark.asyncio
async def test_create_executor_without_setup_no_system_prompt(monkeypatch):
"""create_executor without setup and no system_prompt → empty string."""
monkeypatch.setenv("GOOGLE_API_KEY", "key")
adapter = GoogleADKAdapter()
config = _make_adapter_config(system_prompt=None)
# Skip setup()
executor = await adapter.create_executor(config)
assert executor.system_prompt == ""
@pytest.mark.asyncio
async def test_create_executor_heartbeat_passed(monkeypatch):
monkeypatch.setenv("GOOGLE_API_KEY", "key")
adapter = GoogleADKAdapter()
heartbeat = MagicMock()
config = _make_adapter_config(heartbeat=heartbeat)
await adapter.setup(config)
executor = await adapter.create_executor(config)
assert executor._heartbeat is heartbeat
# ---------------------------------------------------------------------------
# Adapter alias
# ---------------------------------------------------------------------------
def test_adapter_alias_is_google_adk_adapter():
assert Adapter is GoogleADKAdapter
@@ -1,2 +0,0 @@
"""Re-export from shared_runtime for backward compat."""
from shared_runtime import * # noqa: F401,F403
@@ -1,32 +0,0 @@
"""Smolagents adapter for Molecule AI workspace runtime.
Provides env sanitization and safe executor/messaging primitives for use
with HuggingFace's smolagents library.
Two env-sanitization strategies are available:
* **Allowlist** (recommended) — :mod:`adapters.smolagents.env_sanitize`:
only explicitly-safe variables pass through. Stricter but requires keeping
the allowlist up-to-date as new safe vars are needed.
* **Denylist** (simple) — :mod:`adapters.smolagents.safe_env`:
well-known secret names plus ``*_API_KEY`` / ``*_TOKEN`` suffix patterns
are stripped. Easier to start with; less exhaustive.
Quick start::
# Allowlist approach (stricter)
from adapters.smolagents.env_sanitize import make_safe_env, SafeLocalPythonExecutor
# Denylist approach (simpler)
from adapters.smolagents.safe_env import make_safe_env
# Safe messaging
from adapters.smolagents.send_message_wrapper import safe_send_message
"""
# Re-export the allowlist-based make_safe_env as the default (most secure).
from adapters.smolagents.env_sanitize import SafeLocalPythonExecutor, make_safe_env
from adapters.smolagents.send_message_wrapper import safe_send_message
__all__ = ["make_safe_env", "SafeLocalPythonExecutor", "safe_send_message"]
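Reviewer note: the difference between the two strategies named in the module docstring above can be shown with a small sketch. This is not the implementation in `adapters.smolagents`; `allowlist_env` and `denylist_env` are hypothetical functions, and the sample variable names and values are made up for illustration.

```python
from typing import Dict, Iterable, Mapping


def allowlist_env(env: Mapping[str, str], allowed: Iterable[str]) -> Dict[str, str]:
    """Keep only keys that are explicitly allowed; everything else is dropped."""
    allowed_set = frozenset(allowed)
    return {k: v for k, v in env.items() if k in allowed_set}


def denylist_env(
    env: Mapping[str, str],
    denied_names: Iterable[str],
    denied_suffixes: Iterable[str] = ("_API_KEY", "_TOKEN"),
) -> Dict[str, str]:
    """Drop known secret names and suffix matches; everything else passes."""
    denied_set = frozenset(denied_names)
    suffixes = tuple(denied_suffixes)
    return {
        k: v
        for k, v in env.items()
        if k not in denied_set and not k.endswith(suffixes)
    }


if __name__ == "__main__":
    env = {
        "PATH": "/usr/bin",
        "OPENAI_API_KEY": "sk-example",
        "DB_PASSWORD": "hunter2",  # secret with an unanticipated name
    }
    print(sorted(allowlist_env(env, ["PATH", "HOME"])))
    print(sorted(denylist_env(env, ["AWS_SECRET_ACCESS_KEY"])))
```

In this sample the denylist lets `DB_PASSWORD` through because it matches neither a listed name nor a suffix, while the allowlist drops it by default. That is the trade-off the docstring describes as "stricter" versus "less exhaustive".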
@@ -1,226 +0,0 @@
"""Allowlist-based environment sanitization for smolagents (#826 — C3 CRITICAL).
Security model
--------------
We use an **allowlist** (not a denylist) — only variables explicitly
enumerated as safe are passed through to agent-executed code. Any key not
on the list is silently dropped.
This is intentionally strict: adding a new safe variable is a deliberate
engineering act that surfaces in code review, rather than hoping a regex
denylist catches every new secret name.
Thread safety
-------------
``SafeLocalPythonExecutor.__call__`` mutates ``os.environ`` temporarily.
``_ENV_PATCH_LOCK`` serialises concurrent calls so simultaneous executions
do not see each other's env patches.
Extending the allowlist
-----------------------
Set ``SMOLAGENTS_ENV_EXTRA_ALLOWLIST`` to a comma-separated list of
additional uppercase env var names that should be passed through. This is
intended for workspace-specific non-secret variables that you know are safe
(e.g. ``WORKSPACE_ID``):
SMOLAGENTS_ENV_EXTRA_ALLOWLIST="MY_COMPANY_ENV,REGION"
Never add secret names here — use workspace secrets injection instead.
"""
from __future__ import annotations
import os
import threading
from typing import Any, Dict, List, Optional
# ---------------------------------------------------------------------------
# Allowlist configuration
# ---------------------------------------------------------------------------
# Core safe env variables — non-secret system and runtime variables that
# agent code may legitimately need (e.g. PATH for subprocess-free tools,
# PYTHONPATH for module resolution, TZ for datetime ops).
_SAFE_ENV_ALLOWLIST: frozenset = frozenset(
[
# Shell / system fundamentals
"PATH",
"HOME",
"USER",
"LOGNAME",
"SHELL",
"TERM",
"TZ",
"TMPDIR",
"TEMP",
"TMP",
# Language / locale
"LANG",
"LANGUAGE",
"LC_ALL",
"LC_CTYPE",
"LC_MESSAGES",
"LC_NUMERIC",
"LC_TIME",
# Python runtime
"PYTHONPATH",
"PYTHONHOME",
"PYTHONDONTWRITEBYTECODE",
"PYTHONUNBUFFERED",
"PYTHONIOENCODING",
# Molecule workspace non-secret identity vars
"WORKSPACE_ID",
"WORKSPACE_NAME",
"PLATFORM_URL",
]
)
# Imports permanently excluded from the executor's authorized list.
# These are well-known sandbox-escape vectors.
_BANNED_IMPORTS: frozenset = frozenset(
["subprocess", "socket", "ctypes", "importlib", "importlib.util"]
)
# Baseline imports every SafeLocalPythonExecutor allows — pure-computation
# modules with no I/O escape surface.
_BASELINE_SAFE_IMPORTS: List[str] = [
"math",
"json",
"re",
"datetime",
"collections",
"itertools",
"functools",
"typing",
"string",
"textwrap",
"decimal",
"fractions",
"statistics",
"random",
"hashlib",
"base64",
"urllib.parse",
"copy",
"dataclasses",
"enum",
"abc",
"io",
]
# Thread lock for env patching
_ENV_PATCH_LOCK = threading.Lock()
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def make_safe_env(
extra_allowed: Optional[List[str]] = None,
) -> Dict[str, str]:
"""Return a *copy* of the environment containing only allowlisted keys.
``os.environ`` is **never mutated** by this function.
Parameters
----------
extra_allowed:
Additional variable names to include beyond the built-in allowlist.
Also merged with the ``SMOLAGENTS_ENV_EXTRA_ALLOWLIST`` env var.
Returns
-------
dict
A copy of ``os.environ`` filtered to allowlisted keys only.
Keys not on the list are silently dropped.
"""
allowed = set(_SAFE_ENV_ALLOWLIST)
# Merge caller-provided extras
if extra_allowed:
allowed.update(k.upper() for k in extra_allowed)
# Merge env-var-configured extras
env_extra = os.environ.get("SMOLAGENTS_ENV_EXTRA_ALLOWLIST", "")
if env_extra:
for key in env_extra.split(","):
key = key.strip().upper()
if key:
allowed.add(key)
return {k: v for k, v in os.environ.items() if k in allowed}
class SafeLocalPythonExecutor:
"""Allowlist-gated wrapper around smolagents ``LocalPythonExecutor``.
Guarantees that agent-generated code cannot read secret environment
variables (``ANTHROPIC_API_KEY``, ``GH_TOKEN``, ``DATABASE_URL``, etc.)
because they are absent from ``os.environ`` during execution.
Parameters
----------
additional_imports:
Extra module names to allow beyond ``_BASELINE_SAFE_IMPORTS``.
``_BANNED_IMPORTS`` takes precedence — listed names are silently
removed.
extra_allowed_env:
Extra variable names to pass through beyond the core allowlist.
_inner:
Inject a mock ``LocalPythonExecutor`` for tests. When ``None``,
the real smolagents executor is constructed lazily.
"""
def __init__(
self,
additional_imports: Optional[List[str]] = None,
extra_allowed_env: Optional[List[str]] = None,
*,
_inner: Any = None,
) -> None:
# Compute final import list (baseline + extras, minus banned)
combined = list(_BASELINE_SAFE_IMPORTS)
if additional_imports:
for imp in additional_imports:
if imp not in _BANNED_IMPORTS:
combined.append(imp)
self._authorized_imports: List[str] = combined
self._extra_allowed_env: Optional[List[str]] = extra_allowed_env
self._inner = _inner # may be None until first call
def _get_inner(self) -> Any:
"""Lazy-construct the real executor on first use (avoids import errors in tests)."""
if self._inner is None:
from smolagents import LocalPythonExecutor # type: ignore[import]
self._inner = LocalPythonExecutor(
additional_authorized_imports=self._authorized_imports
)
return self._inner
def __call__(self, code: str, *args: Any, **kwargs: Any) -> Any:
"""Execute ``code`` with only allowlisted env vars visible.
All keys not on the allowlist are removed from ``os.environ`` for
the duration of execution and restored afterward, even on exception.
The lock ensures thread safety across concurrent calls.
"""
safe_env = make_safe_env(self._extra_allowed_env)
inner = self._get_inner()
with _ENV_PATCH_LOCK:
# Snapshot full current env
original_env = dict(os.environ)
# Remove everything not in the safe set
keys_to_remove = [k for k in os.environ if k not in safe_env]
for k in keys_to_remove:
del os.environ[k]
try:
return inner(code, *args, **kwargs)
finally:
# Always restore
os.environ.clear()
os.environ.update(original_env)
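The snapshot, strip, and restore sequence in ``SafeLocalPythonExecutor.__call__`` can be sketched as a context manager. This is a minimal illustration, not the module's implementation: ``scoped_environ``, ``_DEMO_LOCK``, and the ``DEMO_*`` variable names are hypothetical. Note that ``os.environ`` is process-wide, so the lock only serialises callers that go through the same helper; other threads reading the environment during the window would still see the reduced set.

```python
import os
import threading
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical lock name; plays the role of _ENV_PATCH_LOCK in the module above.
_DEMO_LOCK = threading.Lock()


@contextmanager
def scoped_environ(safe_env: Dict[str, str]) -> Iterator[None]:
    """Temporarily reduce os.environ to the keys present in safe_env."""
    with _DEMO_LOCK:
        # Snapshot the full environment so it can be restored afterwards.
        original = dict(os.environ)
        # Drop every key that is not in the safe set.
        for key in [k for k in os.environ if k not in safe_env]:
            del os.environ[key]
        try:
            yield
        finally:
            # Restore even if the body raised.
            os.environ.clear()
            os.environ.update(original)


# Demo values only; neither variable is a real secret.
os.environ["DEMO_SECRET_TOKEN"] = "not-a-real-secret"
os.environ["DEMO_SAFE_VAR"] = "visible"

with scoped_environ({"DEMO_SAFE_VAR": "visible"}):
    seen_inside = (
        "DEMO_SECRET_TOKEN" in os.environ,
        "DEMO_SAFE_VAR" in os.environ,
    )

restored = os.environ.get("DEMO_SECRET_TOKEN")
```

Inside the ``with`` block only the safe key is visible; after it exits, the original environment is back in place.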
@@ -1,61 +0,0 @@
"""Denylist-based environment sanitization for smolagents (issue #826 — C3 CRITICAL).
This module provides a simple denylist approach: well-known secret variable
names plus ``*_API_KEY`` and ``*_TOKEN`` suffix patterns are stripped before
env is passed to agent-executed code.
For a stricter allowlist-based alternative that only passes explicitly-safe
variables through, see :mod:`adapters.smolagents.env_sanitize`.
Usage::
from adapters.smolagents.safe_env import make_safe_env
executor = LocalPythonExecutor(...)
# Pass only the sanitised env to the subprocess / exec context:
safe = make_safe_env()
"""
import copy
import os
# Named API keys and tokens known to be used by smolagents / LLM clients.
# These are removed regardless of the suffix-pattern below.
SMOLAGENTS_ENV_DENYLIST: frozenset = frozenset(
{
"OPENAI_API_KEY",
"ANTHROPIC_API_KEY",
"GROQ_API_KEY",
"CEREBRAS_API_KEY",
"QIANFAN_API_KEY",
"LANGFUSE_SECRET_KEY",
"LANGFUSE_PUBLIC_KEY",
"HF_TOKEN",
}
)
def make_safe_env() -> dict:
"""Return a sanitised copy of ``os.environ`` with secrets removed.
Removes any key that:
- Is in :data:`SMOLAGENTS_ENV_DENYLIST`, OR
- Ends with ``_API_KEY``, OR
- Ends with ``_TOKEN``
``os.environ`` is **never mutated** — a fresh ``dict`` copy is returned.
Returns
-------
dict
A copy of the current environment with secret keys removed.
"""
env = copy.copy(dict(os.environ))
for key in list(env.keys()):
if (
key in SMOLAGENTS_ENV_DENYLIST
or key.endswith("_API_KEY")
or key.endswith("_TOKEN")
):
del env[key]
return env
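To make the trade-off between the two approaches concrete, here is a small sketch of the denylist rule applied to a sample mapping. The names ``DEMO_DENYLIST``, ``is_denied``, and ``strip_secrets`` are hypothetical, and the sample values are made up. It shows why the denylist is described as less exhaustive: a secret-bearing name such as ``DATABASE_URL`` matches neither the named set nor the suffix patterns, so it passes through.

```python
from typing import Dict, Mapping

# Shortened, illustrative denylist; the real set above is longer.
DEMO_DENYLIST = frozenset({"OPENAI_API_KEY", "LANGFUSE_SECRET_KEY", "HF_TOKEN"})


def is_denied(key: str) -> bool:
    """True when the key is named explicitly or matches a secret suffix."""
    return key in DEMO_DENYLIST or key.endswith(("_API_KEY", "_TOKEN"))


def strip_secrets(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a new dict without denied keys; the input is not mutated."""
    return {k: v for k, v in env.items() if not is_denied(k)}


demo = {
    "PATH": "/usr/bin",
    "GROQ_API_KEY": "x",           # removed by the _API_KEY suffix rule
    "GH_TOKEN": "y",               # removed by the _TOKEN suffix rule
    "LANGFUSE_SECRET_KEY": "z",    # removed because it is named explicitly
    "DATABASE_URL": "postgres://demo",  # NOT removed: no rule matches
}
kept = strip_secrets(demo)
```

With the allowlist approach in ``env_sanitize`` the same ``DATABASE_URL`` entry would be dropped, because it is not enumerated as safe.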
@@ -1,71 +0,0 @@
"""Safe send_message wrapper for smolagents (issue #827 — C1 HIGH).
Prevents social-engineering attacks where agent-generated content could
impersonate platform messages, inject HTML, or flood the user chat.
Guarantees
----------
1. Every message is prefixed with ``[smolagents]`` so recipients can
attribute it to the agent and cannot mistake it for platform UI.
2. Truncated to 2000 characters to prevent log/UI floods.
3. HTML special characters (``<``, ``>``, ``&``, ``"``, ``'``) are escaped so
that UIs which render HTML are not open to injection.
Usage::
from adapters.smolagents.send_message_wrapper import safe_send_message
safe_send_message("Hello world", send_fn=platform_client.send)
"""
from __future__ import annotations
import html
import logging
logger = logging.getLogger(__name__)
# Maximum character length for the *user-visible* portion of the message
# (label prefix does not count toward this cap).
_MAX_TEXT_LEN: int = 2000
# Label prepended to every outbound message.
_LABEL: str = "[smolagents]"
def safe_send_message(text: str, send_fn) -> None:
"""Sanitise *text* and deliver it via *send_fn*.
Parameters
----------
text:
The raw message text produced by the agent.
send_fn:
Callable that delivers the message (e.g. ``platform_client.send``
or a WebSocket broadcast function). Called with the final,
sanitised string as its sole positional argument.
Side effects
------------
- Logs a warning when truncation occurs.
- Logs a debug entry with the final payload length.
"""
if not isinstance(text, str):
text = str(text)
# Escape HTML special characters to prevent injection into rendered UIs.
sanitised = html.escape(text, quote=True)
# Truncate to cap (before adding label so cap applies to content).
if len(sanitised) > _MAX_TEXT_LEN:
logger.warning(
"safe_send_message: truncating message from %d to %d chars",
len(sanitised),
_MAX_TEXT_LEN,
)
sanitised = sanitised[:_MAX_TEXT_LEN]
payload = f"{_LABEL} {sanitised}"
logger.debug("safe_send_message: delivering %d-char payload", len(payload))
send_fn(payload)
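The escape, truncate, and label steps above can be reproduced in a few lines. This is a sketch under stated assumptions: ``build_payload``, ``MAX_LEN``, and ``LABEL`` are illustrative names, and logging and delivery are left out. One consequence of truncating after escaping is that the cap applies to the escaped length, so a cut could land in the middle of an entity such as ``&amp;``.

```python
import html

MAX_LEN = 2000          # mirrors _MAX_TEXT_LEN above
LABEL = "[smolagents]"  # mirrors _LABEL above


def build_payload(text: object) -> str:
    """Escape, truncate, then prefix the label (no logging or delivery)."""
    escaped = html.escape(str(text), quote=True)
    return f"{LABEL} {escaped[:MAX_LEN]}"


short = build_payload("<b>hi</b> & bye")
long_payload = build_payload("x" * 5000)
```

Here ``short`` carries the escaped markup, and ``long_payload`` is the label, one space, and exactly 2000 characters of content.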
@@ -1,133 +0,0 @@
"""Create the Deep Agent with model + skills + tools."""
import os
import logging
from langgraph.prebuilt import create_react_agent
logger = logging.getLogger(__name__)
def create_agent(model_str: str, tools: list, system_prompt: str):
"""Create a LangGraph ReAct agent.
Args:
model_str: LangChain-compatible model string (e.g., 'anthropic:claude-sonnet-4-6')
tools: List of tool functions
system_prompt: The system prompt for the agent
"""
# Parse provider:model format
if ":" in model_str:
provider, model_name = model_str.split(":", 1)
else:
provider = "anthropic"
model_name = model_str
# Import the provider package
try:
if provider in ("anthropic",):
from langchain_anthropic import ChatAnthropic as LLMClass
elif provider in ("openai", "openrouter", "groq", "cerebras", "qianfan"):
from langchain_openai import ChatOpenAI as LLMClass
elif provider == "google_genai":
from langchain_google_genai import ChatGoogleGenerativeAI as LLMClass
elif provider == "ollama":
from langchain_ollama import ChatOllama as LLMClass
else:
raise ValueError(f"Unsupported model provider: {provider}")
except ImportError as e:
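# Providers served through the OpenAI-compatible client need langchain-openai;
# pip package names use hyphens, so google_genai maps to langchain-google-genai.
pkg = "langchain-openai" if provider in ("openai", "openrouter", "groq", "cerebras", "qianfan") else "langchain-" + provider.replace("_", "-")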
raise ImportError(f"Provider '{provider}' requires package '{pkg}'. Install: pip install {pkg}") from e
# Instantiate the LLM
if provider == "anthropic":
llm_kwargs = {"model": model_name}
anthropic_base_url = os.environ.get("ANTHROPIC_BASE_URL", "")
if anthropic_base_url:
llm_kwargs["anthropic_api_url"] = anthropic_base_url
llm = LLMClass(**llm_kwargs)
elif provider == "openrouter":
api_key = os.environ.get("OPENROUTER_API_KEY", os.environ.get("OPENAI_API_KEY", ""))
max_tokens = int(os.environ.get("MAX_TOKENS", "2048"))
llm = LLMClass(
model=model_name,
openai_api_key=api_key,
openai_api_base="https://openrouter.ai/api/v1",
max_tokens=max_tokens,
)
elif provider == "groq":
api_key = os.environ.get("GROQ_API_KEY", "")
llm = LLMClass(
model=model_name,
openai_api_key=api_key,
openai_api_base="https://api.groq.com/openai/v1",
)
elif provider == "cerebras":
api_key = os.environ.get("CEREBRAS_API_KEY", "")
llm = LLMClass(
model=model_name,
openai_api_key=api_key,
openai_api_base="https://api.cerebras.ai/v1",
)
elif provider == "qianfan":
api_key = os.environ.get("QIANFAN_API_KEY", os.environ.get("AISTUDIO_API_KEY", ""))
llm = LLMClass(
model=model_name,
openai_api_key=api_key,
openai_api_base="https://qianfan.baidubce.com/v2",
)
elif provider == "openai":
llm_kwargs = {"model": model_name}
openai_base_url = os.environ.get("OPENAI_BASE_URL", "")
if openai_base_url:
llm_kwargs["openai_api_base"] = openai_base_url
llm = LLMClass(**llm_kwargs)
else:
llm = LLMClass(model=model_name)
# Auto-inject Langfuse tracing if env vars are present
callbacks = _setup_langfuse()
if callbacks:
llm.callbacks = callbacks
agent = create_react_agent(
model=llm,
tools=tools,
prompt=system_prompt,
)
return agent
def _setup_langfuse():
"""Set up Langfuse tracing if LANGFUSE_* env vars are present.
Returns list of callbacks to pass to agent invocations, or empty list.
"""
langfuse_host = os.environ.get("LANGFUSE_HOST")
langfuse_public = os.environ.get("LANGFUSE_PUBLIC_KEY")
langfuse_secret = os.environ.get("LANGFUSE_SECRET_KEY")
if not (langfuse_host and langfuse_public and langfuse_secret):
return []
try:
from langfuse.callback import CallbackHandler
handler = CallbackHandler(
host=langfuse_host,
public_key=langfuse_public,
secret_key=langfuse_secret,
)
logger.info("Langfuse tracing enabled: %s", langfuse_host)
# Also set LANGSMITH_TRACING for LangGraph native integration
os.environ.setdefault("LANGSMITH_TRACING", "true")
return [handler]
except ImportError:
logger.warning("Langfuse env vars set but langfuse package not installed")
return []
except Exception as e:
logger.warning("Langfuse setup failed: %s", e)
return []
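The ``provider:model`` parsing at the top of ``create_agent`` is easy to check in isolation. The helper below is a hypothetical extraction (``split_model_string`` does not exist in the file), and the OpenRouter model name is made up for illustration. It shows why ``split(":", 1)`` matters: a model name that itself contains a colon stays intact.

```python
from typing import Tuple


def split_model_string(
    model_str: str, default_provider: str = "anthropic"
) -> Tuple[str, str]:
    """Split 'provider:model'; fall back to the default provider if no colon."""
    if ":" in model_str:
        # maxsplit=1 keeps any later colons inside the model name.
        provider, model_name = model_str.split(":", 1)
        return provider, model_name
    return default_provider, model_str


with_provider = split_model_string("openrouter:example-org/example-model:free")
bare_name = split_model_string("claude-sonnet-4-6")
```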
@@ -1,74 +0,0 @@
"""AGENTS.md auto-generation for Molecule AI workspaces.
Implements the AAIF / Linux Foundation AGENTS.md standard so that peer agents
and orchestration tools can discover this workspace's identity, role, A2A
endpoint, and available tools without reading the full system prompt.
Usage::
from agents_md import generate_agents_md
generate_agents_md(config_dir="/configs", output_path="/workspace/AGENTS.md")
The function is called automatically at container startup (see main.py).
"""
import logging
import os
from pathlib import Path
logger = logging.getLogger(__name__)
def generate_agents_md(config_dir: str, output_path: str) -> None:
"""Generate (or regenerate) AGENTS.md from the workspace config.yaml.
Always overwrites ``output_path`` — no stale-file guard. Re-calling
after editing config.yaml produces a fresh file reflecting the changes.
Args:
config_dir: Directory containing config.yaml (same convention as
``load_config`` in config.py).
output_path: Absolute path where AGENTS.md will be written.
The parent directory is expected to exist.
"""
from config import load_config
cfg = load_config(config_dir)
# ── A2A Endpoint ─────────────────────────────────────────────────────────
# AGENT_URL env var takes priority (production deployments behind a proxy).
# Otherwise derive from the configured a2a.port (default 8000).
endpoint = os.environ.get("AGENT_URL") or f"http://localhost:{cfg.a2a.port}/a2a"
# ── Role ─────────────────────────────────────────────────────────────────
# Fall back to description when the role field is absent so legacy
# config.yaml files (without a role key) still produce meaningful output.
role = cfg.role if cfg.role else cfg.description
# ── MCP Tools ────────────────────────────────────────────────────────────
# tools (skill names) + plugins (installed plugin names) form the combined
# capability surface visible to peer agents.
all_tools = list(cfg.tools) + list(cfg.plugins)
if all_tools:
tools_section = "\n".join(f"- {t}" for t in all_tools)
else:
tools_section = "None"
content = (
f"# {cfg.name}\n"
f"\n"
f"**Role:** {role}\n"
f"\n"
f"## Description\n"
f"{cfg.description}\n"
f"\n"
f"## A2A Endpoint\n"
f"{endpoint}\n"
f"\n"
f"## MCP Tools\n"
f"{tools_section}\n"
)
Path(output_path).write_text(content, encoding="utf-8")
logger.info("Generated AGENTS.md at %s for workspace %r", output_path, cfg.name)
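The rendering logic in ``generate_agents_md`` can be exercised without a config file or filesystem write. The sketch below assumes a stand-in ``DemoConfig`` dataclass with only the fields the function reads; the real object returned by ``load_config`` is not shown here and may differ (for example, it also carries ``a2a.port``). ``render_agents_md`` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DemoConfig:
    """Stand-in for the loaded workspace config (assumed shape)."""
    name: str
    description: str
    role: str = ""
    tools: List[str] = field(default_factory=list)
    plugins: List[str] = field(default_factory=list)


def render_agents_md(cfg: DemoConfig, endpoint: str) -> str:
    """Build the AGENTS.md text using the same layout as the function above."""
    # Fall back to the description when no role is set.
    role = cfg.role if cfg.role else cfg.description
    all_tools = list(cfg.tools) + list(cfg.plugins)
    tools_section = "\n".join(f"- {t}" for t in all_tools) if all_tools else "None"
    return (
        f"# {cfg.name}\n"
        f"\n"
        f"**Role:** {role}\n"
        f"\n"
        f"## Description\n"
        f"{cfg.description}\n"
        f"\n"
        f"## A2A Endpoint\n"
        f"{endpoint}\n"
        f"\n"
        f"## MCP Tools\n"
        f"{tools_section}\n"
    )


cfg = DemoConfig(
    name="demo",
    description="A demo workspace",
    tools=["search"],
    plugins=["memory"],
)
content = render_agents_md(cfg, "http://localhost:8000/a2a")
empty = render_agents_md(DemoConfig(name="bare", description="d"), "http://x/a2a")
```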
@@ -1,31 +0,0 @@
# Publish-runtime pipeline verification — 2026-05-11
Marker file for the canonical end-to-end pipeline verification after
`publish-runtime-bot` provisioning (internal#327) + stale-tag drift
resolution (`runtime-v0.1.131` deleted from main).
## Purpose
Triggers `workspace/**` path filter on `publish-runtime-autobump.yml`,
exercising the full pipeline:
1. `publish-runtime-autobump / bump-and-tag` reads PyPI version, computes
next, pushes tag `runtime-v0.1.131` (or higher) using new bot scope.
2. `publish-runtime.yml` fires on tag, builds + publishes to PyPI.
3. Cascade autobump: 9 template repos get their `.runtime-version`
pinned to the new version.
## Acceptance criteria
- [ ] autobump bump-and-tag context green on merged commit
- [ ] tag `runtime-v0.1.131` (or computed next) exists on molecule-core
- [ ] publish-runtime.yml run green
- [ ] PyPI molecule-ai-workspace-runtime updated from 0.1.130
- [ ] 9 template repos updated their pinned runtime version
## Rollback
This file is informational only — no code dependency. Safe to delete
in any future PR once pipeline is proven stable.
— core-devops (per Hongming "long-term proper robust" directive 2026-05-11 19:48-19:50Z)
@@ -1,84 +0,0 @@
"""Build the Starlette routes for a workspace from its (card, adapter
state) pair.
Pairs with PR #2756, which decoupled ``/.well-known/agent-card.json`` from
``adapter.setup()`` failure. main.py was the only consumer and was
``# pragma: no cover`` — so the wiring (card-route mounted unconditionally,
JSON-RPC route swapped between DefaultRequestHandler and the
not-configured handler based on ``adapter_ready``) had no pytest coverage.
A future refactor that re-couples the two would silently bypass PR #2756
and ship the original "stuck booting forever" UX again. That gap is
what closes here: extract the route-assembly into a pure function whose
behaviour is unit-testable with Starlette's TestClient, and have main.py
call it. Issue molecule-core#2761.
"""
from __future__ import annotations
from typing import Any
from starlette.routing import Route
from not_configured_handler import make_not_configured_handler
# Heavy a2a-sdk imports are lazy: deferred to inside build_routes so
# tests that exercise only the not-configured branch (no executor) don't
# need a2a.server.request_handlers / routes stubbed in their conftest.
# Production boot pays the import cost once, on workspace startup.
def build_routes(
agent_card: Any,
executor: Any | None,
adapter_error: str | None,
) -> list:
"""Return the list of Starlette routes for this workspace.
Always mounts ``/.well-known/agent-card.json`` from ``agent_card``.
JSON-RPC route at ``/`` swaps based on adapter state:
* ``executor`` is non-None → ``DefaultRequestHandler`` with the
executor (production happy-path).
* ``executor`` is None → ``not_configured_handler`` returning JSON-RPC
``-32603`` with ``adapter_error`` in ``error.data``. The
workspace stays REACHABLE (operator can introspect, deprovision,
redeploy with corrected env) instead of crash-looping invisibly.
The two branches are mutually exclusive — caller passes one or the
other, never both. Test coverage at ``tests/test_boot_routes.py``
pins the contract.
"""
from a2a.server.routes import create_agent_card_routes
routes: list = []
routes.extend(create_agent_card_routes(agent_card))
if executor is not None:
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.routes import create_jsonrpc_routes
from a2a.server.tasks import InMemoryTaskStore
handler = DefaultRequestHandler(
agent_executor=executor,
task_store=InMemoryTaskStore(),
agent_card=agent_card,
)
# enable_v0_3_compat=True is the JSON-RPC wire-compat path: clients
# using v0.3-shaped payloads (`"role": "user"` lowercase + camelCase
# Pydantic field names) can talk to us without re-deploying.
# Outbound payloads must also use v0.3 shape — see main.py's
# original comment block for the full a2a-sdk 1.x migration note.
routes.extend(
create_jsonrpc_routes(
request_handler=handler,
rpc_url="/",
enable_v0_3_compat=True,
)
)
else:
routes.append(
Route("/", make_not_configured_handler(adapter_error), methods=["POST"])
)
return routes
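For the not-configured branch, the docstring says the handler answers with JSON-RPC ``-32603`` and places ``adapter_error`` in ``error.data``. The body of ``make_not_configured_handler`` is not shown here, so the sketch below is only an assumed response shape following the JSON-RPC 2.0 error object layout; ``not_configured_response``, the ``"Internal error"`` message text, and the sample error string are illustrative.

```python
import json
from typing import Any, Dict, Optional


def not_configured_response(
    request_id: Any, adapter_error: Optional[str]
) -> Dict[str, Any]:
    """Assumed error envelope: code -32603 with the adapter error as data."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": -32603,  # JSON-RPC 2.0 "Internal error"
            "message": "Internal error",
            "data": adapter_error,
        },
    }


body = json.dumps(not_configured_response("req-1", "adapter setup failed"))
decoded = json.loads(body)
```

A client that receives this envelope can surface ``error.data`` to the operator, which is what keeps the workspace introspectable instead of crash-looping.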
@@ -1,37 +0,0 @@
#!/usr/bin/env bash
# build-all.sh: Rebuild the base image (adapter images are built in their own repos).
#
# NOTE: Adapters have been extracted to standalone template repos:
# https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-<runtime>
#
# This script now only builds the base image from workspace/Dockerfile.
# Each adapter repo has its own Dockerfile that installs molecule-ai-workspace-runtime
# from PyPI and the adapter-specific deps.
#
# Usage:
# bash workspace/build-all.sh # Build base image only
#
# Standalone adapter repos still reference the legacy base image for local dev
# (e.g. FROM workspace-template:base). To build those locally, clone the adapter
# repo and run `docker build -t workspace-template:<runtime> .` from its root.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
cd "$SCRIPT_DIR"
GREEN='\033[0;32m'
RED='\033[0;31m'
NC='\033[0m'
log() { echo -e "${GREEN}[build]${NC} $1" >&2; }
err() { echo -e "${RED}[error]${NC} $1" >&2; }
# Build base image
log "Building workspace-template:base ..."
if ! docker build -t workspace-template:base -f Dockerfile . ; then
err "Base image build failed"
exit 1
fi
log "Base image built"
log "Done. Adapters are in standalone template repos — see docs/workspace-runtime-package.md"
@@ -1,139 +0,0 @@
"""A2A communication tools — framework-agnostic delegation and peer discovery.
These are plain async functions that any adapter can wrap in its native tool format.
The LangChain @tool versions are in tools/delegation.py.
"""
import os
import uuid
import httpx
# OFFSEC-003: peer-controlled text MUST be wrapped with sanitize_a2a_result
# before being returned to the LLM. This module's delegate_task() is one of
# the trust-boundary entry points where peer output crosses into our agent's
# context — same surface as a2a_tools_delegation.py:325 (fixed via #492).
# Issue #537.
from _sanitize_a2a import sanitize_a2a_result
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
async def list_peers() -> list[dict]:
"""Get this workspace's peers from the platform registry."""
async with httpx.AsyncClient(timeout=10.0) as client:
try:
resp = await client.get(f"{PLATFORM_URL}/registry/{WORKSPACE_ID}/peers")
if resp.status_code == 200:
return resp.json()
return []
except Exception:
return []
async def delegate_task(workspace_id: str, task: str) -> str:
"""Send a task to a peer workspace via A2A and return the response text."""
# Task #190 / #193 — Self-delegation guard. Without this, a workspace
# delegating to its own UUID round-trips through the platform proxy back
# into the sender; the synchronous handler waits on the same lock the
# caller holds, the request times out, and the platform writes an
# a2a_receive activity row with source_id=our own workspace UUID. The
# inbox poller then surfaces that row as kind="peer_agent" and the agent
# sees the timeout echoed back as a peer instructing it (#190).
#
# The sibling guards live in:
# - workspace-server/internal/handlers/delegation.go (Go API gate)
# - workspace/a2a_tools_delegation.py (MCP path guard)
# This module is the framework-agnostic adapter surface used by adapters
# that don't go through a2a_tools_delegation.py — it needs its own guard.
if WORKSPACE_ID and workspace_id == WORKSPACE_ID:
return (
"Error: self-delegation rejected (cannot delegate_task to your own "
"workspace). There is no peer who is also you — the platform proxy "
"would deadlock and the timeout would echo back as a peer_agent "
"message from yourself (#190). Do the work directly, or use "
"commit_memory / send_message_to_user instead."
)
async with httpx.AsyncClient(timeout=120.0) as client:
# Discover target URL
try:
resp = await client.get(
f"{PLATFORM_URL}/registry/discover/{workspace_id}",
headers={"X-Workspace-ID": WORKSPACE_ID},
)
if resp.status_code != 200:
return f"Error: cannot reach workspace {workspace_id} (status {resp.status_code})"
target_url = resp.json().get("url", "")
if not target_url:
return f"Error: workspace {workspace_id} has no URL"
except Exception as e:
return f"Error discovering workspace: {e}"
# Send A2A message. X-Workspace-ID identifies us as the source —
# without it the platform's a2a_receive logger writes
# source_id=NULL and the recipient's My Chat tab renders the
# delegation as if a human user typed it. Same hazard fixed
# in heartbeat.py / a2a_client.py / main.py initial+idle flows.
try:
a2a_resp = await client.post(
target_url,
headers={"X-Workspace-ID": WORKSPACE_ID},
json={
"jsonrpc": "2.0",
"id": str(uuid.uuid4()),
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": str(uuid.uuid4()),
"parts": [{"kind": "text", "text": task}],
},
},
},
)
data = a2a_resp.json()
if "result" in data:
result = data["result"]
parts = result.get("parts", []) if isinstance(result, dict) else []
if parts and isinstance(parts[0], dict):
# OFFSEC-003: wrap peer-controlled text before returning
# to LLM context. Issue #537.
return sanitize_a2a_result(parts[0].get("text", "(no text)"))
# Empty parts list (e.g. {"parts": []}) should return str(result),
# not "(no text)" — preserves pre-fix behavior (#279 regression fix).
if isinstance(result, dict) and result.get("parts") == []:
return sanitize_a2a_result(str(result))
return sanitize_a2a_result(str(result) if isinstance(result, str) else "(no text)")
elif "error" in data:
err = data["error"]
# Handle both string-form errors ("error": "some string")
# and object-form errors ("error": {"message": "...", "code": ...}).
msg = ""
if isinstance(err, dict):
msg = err.get("message", "")
elif isinstance(err, str):
msg = err
else:
msg = str(err)
# OFFSEC-003: peer-controlled error message; wrap before return.
return sanitize_a2a_result(f"Error: {msg}")
return sanitize_a2a_result(str(data))
except Exception as e:
return f"Error sending A2A message: {e}"
async def get_peers_summary() -> str:
"""Return a formatted string of available peers for system prompts."""
peers = await list_peers()
if not peers:
return "No peers available."
lines = []
for p in peers:
name = p.get("name", "Unknown")
pid = p.get("id", "")
role = p.get("role", "")
status = p.get("status", "")
lines.append(f"- {name} (ID: {pid}) — {role} [{status}]")
return "Available peers:\n" + "\n".join(lines)
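The response-handling branches in ``delegate_task`` are easier to follow as a pure function over the decoded JSON. The sketch below mirrors those branches but deliberately leaves out ``sanitize_a2a_result``, which is project-specific and not shown here; in the real module every peer-controlled return value is wrapped by it. ``extract_text`` is a hypothetical name.

```python
def extract_text(data: dict) -> str:
    """Pick the reply text out of a JSON-RPC response (sanitisation omitted)."""
    if "result" in data:
        result = data["result"]
        parts = result.get("parts", []) if isinstance(result, dict) else []
        if parts and isinstance(parts[0], dict):
            # Normal case: first part carries the text.
            return parts[0].get("text", "(no text)")
        if isinstance(result, dict) and result.get("parts") == []:
            # Explicitly empty parts list: return the raw result as a string.
            return str(result)
        return result if isinstance(result, str) else "(no text)"
    if "error" in data:
        err = data["error"]
        # Accept both object-form and string-form errors.
        msg = err.get("message", "") if isinstance(err, dict) else str(err)
        return f"Error: {msg}"
    return str(data)


ok = extract_text({"result": {"parts": [{"kind": "text", "text": "done"}]}})
empty_parts = extract_text({"result": {"parts": []}})
obj_error = extract_text({"error": {"code": -32603, "message": "boom"}})
str_error = extract_text({"error": "plain failure"})
```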
@@ -1,320 +0,0 @@
"""Approval tool for human-in-the-loop workflows.
When an agent encounters a destructive, expensive, or unauthorized action,
it calls request_approval() which creates a request and waits for a decision.
## Notification strategy
By default this module uses a **WebSocket subscription** (APPROVAL_USE_WEBSOCKET=true
or when the ``websockets`` package is installed). The platform pushes an
``APPROVAL_DECIDED`` event to the workspace WebSocket as soon as a human
clicks Approve / Deny on the canvas — no polling required, instant delivery.
If WebSocket is unavailable (env var opt-out or import error) the module
falls back to a **polling loop** so existing deployments without WebSocket
support continue to work without any config change.
RBAC enforcement
----------------
The calling workspace must hold a role that grants the ``"approve"`` action.
Roles are read from ``config.yaml`` under ``rbac.roles`` (default: operator).
Audit trail
-----------
Every approval lifecycle emits structured JSON Lines records:
1. ``approval / approve / requested`` — request submitted to platform
2. ``approval / approve / granted`` — human approved (actor = decided_by)
3. ``approval / approve / denied`` — human denied (actor = decided_by)
4. ``approval / approve / timeout`` — no decision within APPROVAL_TIMEOUT
RBAC denials emit an ``rbac / rbac.deny / denied`` event instead.
Environment variables
---------------------
PLATFORM_URL Platform base URL (default: http://host.docker.internal:8080)
WORKSPACE_ID This workspace's ID (default: "")
APPROVAL_TIMEOUT Max wait in seconds (default: 300)
APPROVAL_POLL_INTERVAL Polling interval in seconds (default: 5, polling path only)
APPROVAL_USE_WEBSOCKET "true" to force WS, "false" to force polling (default: auto-detect)
AUDIT_LOG_PATH Path for JSON Lines audit log (default: /var/log/molecule/audit.jsonl)
"""
import asyncio
import json
import logging
import os
import uuid
import httpx
from langchain_core.tools import tool
from builtin_tools.audit import check_permission, get_workspace_roles, log_event
logger = logging.getLogger(__name__)
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
APPROVAL_POLL_INTERVAL = float(os.environ.get("APPROVAL_POLL_INTERVAL", "5"))
APPROVAL_TIMEOUT = float(os.environ.get("APPROVAL_TIMEOUT", "300"))
# Auto-detect WebSocket support; can be overridden with env var
_ws_env = os.environ.get("APPROVAL_USE_WEBSOCKET", "").lower()
if _ws_env == "false":
_USE_WEBSOCKET_DEFAULT = False
elif _ws_env == "true":
_USE_WEBSOCKET_DEFAULT = True
else:
try:
import websockets as _ws_probe # noqa: F401
_USE_WEBSOCKET_DEFAULT = True
except ImportError:
_USE_WEBSOCKET_DEFAULT = False
# Module-level reference so tests can monkeypatch it
try:
import websockets
except ImportError:
websockets = None # type: ignore[assignment]
# Expose for test introspection
APPROVAL_USE_WEBSOCKET = _USE_WEBSOCKET_DEFAULT
# ---------------------------------------------------------------------------
# Internal helpers
# ---------------------------------------------------------------------------
async def _create_approval_request(action: str, reason: str) -> dict:
"""POST to the platform to create an approval request.
Returns {"approval_id": str} on success or {"error": str} on failure.
"""
async with httpx.AsyncClient(timeout=10.0) as client:
try:
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/approvals",
json={"action": action, "reason": reason},
)
if resp.status_code != 201:
return {"error": f"Failed to create request: {resp.status_code}"}
try:
approval_id = resp.json().get("approval_id")
except Exception: # includes the ValueError raised on invalid JSON
return {"error": f"Platform returned invalid JSON (status {resp.status_code})"}
logger.info("Approval requested: %s (id=%s)", action, approval_id)
return {"approval_id": approval_id}
except Exception as e:
return {"error": f"Failed to request approval: {e}"}
async def _wait_websocket(approval_id: str, timeout: float) -> dict:
"""Subscribe to the platform WebSocket and wait for APPROVAL_DECIDED event.
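Returns the decision dict. The deadline is enforced by the caller via
``asyncio.wait_for``, which raises ``asyncio.TimeoutError`` on expiry;
``timeout`` is not used inside this coroutine.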
"""
ws_url = (
PLATFORM_URL.replace("http://", "ws://").replace("https://", "wss://")
+ "/ws"
)
headers = {"X-Workspace-ID": WORKSPACE_ID}
logger.debug("Approval %s: waiting via WebSocket %s", approval_id, ws_url)
async with websockets.connect(ws_url, additional_headers=headers) as ws:
async for raw_message in ws:
try:
event = json.loads(raw_message)
except json.JSONDecodeError:
continue
if event.get("event") != "APPROVAL_DECIDED":
continue
if event.get("approval_id") != approval_id:
continue
status = event.get("status")
decided_by = event.get("decided_by", "")
logger.info("Approval %s decided via WebSocket: %s by %s",
approval_id, status, decided_by)
if status == "approved":
return {
"approved": True,
"approval_id": approval_id,
"decided_by": decided_by,
}
else:
return {
"approved": False,
"approval_id": approval_id,
"decided_by": decided_by,
"message": "Denied by human",
}
async def _wait_polling(approval_id: str, timeout: float) -> dict:
"""Legacy polling loop — checks platform REST endpoint every APPROVAL_POLL_INTERVAL seconds."""
elapsed = 0.0
async with httpx.AsyncClient(timeout=10.0) as client:
while elapsed < timeout:
await asyncio.sleep(APPROVAL_POLL_INTERVAL)
elapsed += APPROVAL_POLL_INTERVAL
try:
resp = await client.get(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/approvals",
)
if resp.status_code == 200:
for a in resp.json():
if a.get("id") == approval_id:
status = a.get("status")
if status == "approved":
logger.info("Approval granted (poll): %s", approval_id)
return {
"approved": True,
"approval_id": approval_id,
"decided_by": a.get("decided_by"),
}
elif status == "denied":
logger.info("Approval denied (poll): %s", approval_id)
return {
"approved": False,
"approval_id": approval_id,
"decided_by": a.get("decided_by"),
"message": "Denied by human",
}
except Exception:
pass # transient error — keep retrying
raise asyncio.TimeoutError()
# ---------------------------------------------------------------------------
# Public tool
# ---------------------------------------------------------------------------
@tool
async def request_approval(
action: str,
reason: str,
) -> dict:
"""Request human approval before proceeding with a sensitive action.
Use this when you're about to do something destructive, expensive,
or outside your normal authority. The request is sent to the canvas
where a human can approve or deny it.
Args:
action: Short description of what you want to do
reason: Why this action is necessary
"""
# One trace_id links every audit event for this approval lifecycle.
trace_id = str(uuid.uuid4())
# --- RBAC check -----------------------------------------------------------
roles, custom_perms = get_workspace_roles()
if not check_permission("approve", roles, custom_perms):
log_event(
event_type="rbac",
action="rbac.deny",
resource=action,
outcome="denied",
trace_id=trace_id,
attempted_action="approve",
roles=roles,
)
return {
"approved": False,
"error": (
"RBAC: this workspace does not have the 'approve' permission. "
f"Current roles: {roles}"
),
}
# Step 1: Create the approval request
creation = await _create_approval_request(action, reason)
if "error" in creation:
log_event(
event_type="approval",
action="approve",
resource=action,
outcome="failure",
trace_id=trace_id,
reason="submit_failed",
error=creation["error"],
)
return {"approved": False, "error": creation["error"]}
approval_id = creation["approval_id"]
log_event(
event_type="approval",
action="approve",
resource=action,
outcome="requested",
trace_id=trace_id,
approval_id=approval_id,
reason_text=reason,
)
timeout = float(os.environ.get("APPROVAL_TIMEOUT", str(APPROVAL_TIMEOUT)))
# Step 2: Wait for decision — WebSocket preferred, polling as fallback
use_ws = APPROVAL_USE_WEBSOCKET and websockets is not None
try:
if use_ws:
try:
result = await asyncio.wait_for(
_wait_websocket(approval_id, timeout),
timeout=timeout,
)
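except asyncio.TimeoutError:
# The WebSocket wait already used the full timeout; do not start a
# second polling wait on top of it.
raise
except Exception as ws_err:
# WebSocket failed (connection error, etc.): fall through to polling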
logger.warning(
"WebSocket approval wait failed (%s), falling back to polling",
ws_err,
)
result = await asyncio.wait_for(
_wait_polling(approval_id, timeout),
timeout=timeout + APPROVAL_POLL_INTERVAL,
)
else:
# Polling path (primary when WS disabled)
result = await asyncio.wait_for(
_wait_polling(approval_id, timeout),
timeout=timeout + APPROVAL_POLL_INTERVAL, # slight grace period
)
# Log the human decision
decided_by = result.get("decided_by")
outcome = "granted" if result.get("approved") else "denied"
log_event(
event_type="approval",
action="approve",
resource=action,
outcome=outcome,
# Record the human identity as actor when available
actor=decided_by or WORKSPACE_ID,
trace_id=trace_id,
approval_id=approval_id,
decided_by=decided_by,
)
return result
except asyncio.TimeoutError:
logger.warning("Approval timed out after %.0fs: %s", timeout, approval_id)
log_event(
event_type="approval",
action="approve",
resource=action,
outcome="timeout",
trace_id=trace_id,
approval_id=approval_id,
timeout_seconds=timeout,
)
return {
"approved": False,
"approval_id": approval_id,
"error": f"Timed out after {timeout}s waiting for human decision",
}
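The wait strategy in `request_approval` (try the WebSocket path first, fall back to polling when it fails) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the module's implementation: `wait_primary`, `wait_fallback`, and `wait_for_decision` are hypothetical stand-ins for `_wait_websocket`, `_wait_polling`, and the waiting part of `request_approval`. The sketch also makes one design choice explicit: a timeout on the primary path is re-raised rather than triggering a second full wait.

```python
import asyncio


async def wait_primary(approval_id: str) -> dict:
    # Hypothetical stand-in for the WebSocket wait: simulate a dropped connection.
    raise ConnectionError("socket closed")


async def wait_fallback(approval_id: str) -> dict:
    # Hypothetical stand-in for the polling wait: a decision arrives shortly.
    await asyncio.sleep(0.01)
    return {"approved": True, "approval_id": approval_id, "decided_by": "alice"}


async def wait_for_decision(approval_id: str, timeout: float) -> dict:
    try:
        try:
            return await asyncio.wait_for(wait_primary(approval_id), timeout=timeout)
        except asyncio.TimeoutError:
            # The primary wait consumed the whole budget; treat that as final.
            raise
        except Exception:
            # Any other failure (connection error, etc.) falls back to polling.
            return await asyncio.wait_for(wait_fallback(approval_id), timeout=timeout)
    except asyncio.TimeoutError:
        return {"approved": False, "approval_id": approval_id, "error": "timeout"}


result = asyncio.run(wait_for_decision("appr-1", timeout=1.0))
```

Here the simulated connection error routes the call to the fallback, so `result` carries the fallback's decision.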
@@ -1,274 +0,0 @@
"""Immutable append-only audit log for EU AI Act compliance.
Fulfils Article 12 (record-keeping), Article 13 (transparency), and
Article 17 (quality-management system) requirements for high-risk AI systems.
Log format: JSON Lines (one UTF-8 JSON object per line), suitable for direct
ingestion by any SIEM (Splunk, Elastic, Datadog, etc.).
Required event fields
---------------------
timestamp     ISO 8601 UTC datetime with timezone offset
event_type    Coarse category: "delegation", "approval", "memory", "rbac",
              "compliance"
workspace_id  Workspace that generated this event
actor         Entity that triggered the action; defaults to workspace_id for
              automated events, or the human identity for approval decisions
action        Verb describing what was attempted:
              delegate | approve | memory.read | memory.write | rbac.deny
resource      Object of the action: target workspace ID, memory scope,
              approval action string, etc.
outcome       One of: allowed | denied | success | failure | timeout |
              requested | granted | detected | blocked
trace_id      UUID v4 correlating related events across workspaces
The log file is opened in append mode ("a") on every write — it is NEVER
truncated, rewritten, or deleted by this module. Rotate externally using
logrotate (with ``copytruncate`` disabled) or ship to a SIEM before rotating.
Configuration
-------------
AUDIT_LOG_PATH env var — full path to the JSONL file
default: /var/log/molecule/audit.jsonl
"""
from __future__ import annotations
import functools
import json
import logging
import os
import threading
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
pass # avoid circular import at runtime
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
AUDIT_LOG_PATH: str = os.environ.get(
"AUDIT_LOG_PATH", "/var/log/molecule/audit.jsonl"
)
WORKSPACE_ID: str = os.environ.get("WORKSPACE_ID", "")
# Protects the open() + write() sequence; prevents interleaved JSON lines
# when multiple threads in this process write at the same time.
_write_lock = threading.Lock()
# ---------------------------------------------------------------------------
# Built-in role → permitted-action mappings
# ---------------------------------------------------------------------------
#: Maps each built-in role name to the set of actions it grants.
#: Custom roles can be added in config.yaml under ``rbac.allowed_actions``.
ROLE_PERMISSIONS: dict[str, set[str]] = {
# Full access: short-circuits all other checks
"admin": {"delegate", "approve", "memory.read", "memory.write"},
# Standard agent role
"operator": {"delegate", "approve", "memory.read", "memory.write"},
# Read-only observer — no writes, no delegation, no approvals
"read-only": {"memory.read"},
# Can approve and write memory, but cannot delegate
"no-delegation": {"approve", "memory.read", "memory.write"},
# Can delegate and write memory, but cannot invoke approval gate
"no-approval": {"delegate", "memory.read", "memory.write"},
# Memory reads only (useful for analytic sidecars)
"memory-readonly": {"memory.read"},
}
# ---------------------------------------------------------------------------
# Config loader (lazy, cached per process)
# ---------------------------------------------------------------------------
@functools.lru_cache(maxsize=1)
def _load_workspace_config():
"""Return the WorkspaceConfig or None if it cannot be loaded."""
try:
from config import load_config # local import avoids circular deps
return load_config()
except Exception as exc:
logger.warning("audit: could not load workspace config for RBAC: %s", exc)
return None
def get_workspace_roles() -> tuple[list[str], dict[str, list[str]]]:
"""Return ``(roles, custom_permissions)`` from the workspace config.
Falls back to ``["operator"]`` / ``{}`` when the config is unavailable so
that agents remain functional in degraded environments.
"""
cfg = _load_workspace_config()
if cfg is None:
return ["operator"], {}
return list(cfg.rbac.roles), dict(cfg.rbac.allowed_actions)
# ---------------------------------------------------------------------------
# RBAC helpers
# ---------------------------------------------------------------------------
def check_permission(
action: str,
roles: list[str],
custom_permissions: dict[str, list[str]] | None = None,
) -> bool:
"""Return True if *any* of ``roles`` grants ``action``.
Evaluation order
~~~~~~~~~~~~~~~~
1. ``"admin"`` short-circuits: it always grants everything.
2. Custom role definitions (from ``rbac.allowed_actions`` in config.yaml).
3. Built-in :data:`ROLE_PERMISSIONS` table.
When a role appears in *custom_permissions* its built-in definition is
**ignored** — the custom list is the complete permission set for that role.
Args:
action: Action to authorise, e.g. ``"delegate"``.
roles: Roles assigned to the calling workspace.
custom_permissions: Optional ``{role: [action, ...]}`` mapping loaded
from ``WorkspaceConfig.rbac.allowed_actions``.
Returns:
``True`` if the action is permitted, ``False`` otherwise.
Examples::
>>> check_permission("delegate", ["operator"])
True
>>> check_permission("delegate", ["read-only"])
False
>>> check_permission("deploy", ["developer"], {"developer": ["deploy"]})
True
"""
for role in roles:
if role == "admin":
return True
if custom_permissions and role in custom_permissions:
# Custom entry is definitive for this role
if action in custom_permissions[role]:
return True
continue # Don't fall through to built-ins for custom roles
if role in ROLE_PERMISSIONS and action in ROLE_PERMISSIONS[role]:
return True
return False
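The evaluation order described in the `check_permission` docstring has one consequence that is easy to miss: a custom entry replaces the built-in definition for the same role name, but other roles in the list are still consulted. The following is a minimal sketch of that behaviour, assuming a reduced role table; `BUILT_IN` and `is_allowed` are hypothetical names used only for illustration.

```python
# Reduced, hypothetical copy of the built-in role table.
BUILT_IN = {
    "operator": {"delegate", "approve", "memory.read", "memory.write"},
    "read-only": {"memory.read"},
}


def is_allowed(action, roles, custom=None):
    custom = custom or {}
    for role in roles:
        if role == "admin":
            return True  # admin grants everything
        if role in custom:
            if action in custom[role]:
                return True
            continue  # custom entry is definitive for this role
        if action in BUILT_IN.get(role, set()):
            return True
    return False


# A custom definition narrows the built-in "operator" role.
narrowed = is_allowed("delegate", ["operator"], {"operator": ["memory.read"]})
# A later role is still checked after a custom role declines.
combined = is_allowed("delegate", ["read-only", "operator"], {"read-only": []})
```

In this sketch `narrowed` is `False` because the custom list omits `delegate`, while `combined` is `True` because the built-in `operator` role still grants it.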
# ---------------------------------------------------------------------------
# Public audit API
# ---------------------------------------------------------------------------
def log_event(
event_type: str,
action: str,
resource: str,
outcome: str,
actor: str | None = None,
trace_id: str | None = None,
**extra: Any,
) -> str:
"""Append one audit event to the immutable JSON Lines log.
Args:
event_type: Coarse category — ``"delegation"``, ``"approval"``,
``"memory"``, or ``"rbac"``.
action: Verb — ``"delegate"``, ``"approve"``, ``"memory.write"``,
``"memory.read"``, ``"rbac.deny"``.
resource: Object of the action — target workspace ID, memory scope,
approval action string, etc.
outcome: Terminal state — one of ``"allowed"``, ``"denied"``,
``"success"``, ``"failure"``, ``"timeout"``,
``"requested"``, ``"granted"``.
actor: Identity that triggered the event. Defaults to
``WORKSPACE_ID`` (the running workspace) for automated
events. Pass ``decided_by`` for human approval decisions.
trace_id: Caller-supplied UUID v4 for cross-event correlation.
A fresh UUID is generated when omitted.
**extra: Additional key-value pairs appended verbatim to the JSON
object (e.g. ``target_workspace_id``, ``memory_scope``,
``attempt``). Built-in keys cannot be overridden.
Returns:
The ``trace_id`` used for this event, enabling callers to chain
related events under a single correlation identifier.
Example::
trace = log_event(
event_type="delegation",
action="delegate",
resource="billing-agent",
outcome="success",
target_workspace_id="billing-agent",
attempt=1,
)
"""
if trace_id is None:
trace_id = str(uuid.uuid4())
event: dict[str, Any] = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"event_type": event_type,
"workspace_id": WORKSPACE_ID,
"actor": actor if actor is not None else WORKSPACE_ID,
"action": action,
"resource": resource,
"outcome": outcome,
"trace_id": trace_id,
}
# Merge extra fields — built-in keys are not overridable
for key, value in extra.items():
if key not in event:
event[key] = value
_write_event(event)
return trace_id
# ---------------------------------------------------------------------------
# Internal writer
# ---------------------------------------------------------------------------
def _ensure_log_dir(path: str) -> None:
"""Create the parent directory for *path* if it does not already exist."""
Path(path).parent.mkdir(parents=True, exist_ok=True)
def _write_event(event: dict[str, Any]) -> None:
"""Serialise *event* as a JSON line and fsync-append it to the log file.
The write is atomic with respect to other threads in this process: the
lock ensures that no two JSON objects are interleaved on the same line.
Failures are emitted to the standard Python logger at WARNING level but
are **never** re-raised — the application must not crash because audit
logging is temporarily unavailable (e.g. disk full, permission error).
In production, consider wiring an alert on WARNING messages from this
module so that missing audit records are detected quickly.
"""
try:
log_path = AUDIT_LOG_PATH
_ensure_log_dir(log_path)
line = json.dumps(event, default=str, ensure_ascii=False) + "\n"
with _write_lock:
with open(log_path, "a", encoding="utf-8") as fh:
fh.write(line)
fh.flush()
os.fsync(fh.fileno())
except Exception as exc: # pylint: disable=broad-except
logger.warning(
"Audit log write failed — event NOT persisted "
"(trace_id=%s, action=%s): %s",
event.get("trace_id", "?"),
event.get("action", "?"),
exc,
)
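The lock, append, flush, and fsync sequence used by `_write_event` can be exercised on its own against a temporary file. This is a minimal sketch rather than the module's code: `append_event` and `read_events` are hypothetical helper names, and the temporary directory stands in for the real `AUDIT_LOG_PATH`.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

_lock = threading.Lock()


def append_event(path: str, event: dict) -> None:
    # Create the parent directory on demand, then append one JSON line.
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps(event, default=str, ensure_ascii=False) + "\n"
    with _lock:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(line)
            fh.flush()
            os.fsync(fh.fileno())  # push the line to stable storage


def read_events(path: str) -> list:
    # JSON Lines: one object per non-empty line.
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


log_path = os.path.join(tempfile.mkdtemp(), "audit", "audit.jsonl")
append_event(log_path, {"action": "delegate", "outcome": "success"})
append_event(log_path, {"action": "approve", "outcome": "granted"})
events = read_events(log_path)
```

Because the file is always opened in append mode, the second call adds a line after the first instead of replacing it.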
@@ -1,122 +0,0 @@
"""Workspace-scoped awareness backend wrapper.
The agent-facing memory tools keep their existing signatures and delegate
to this helper when workspace awareness is configured.
"""
from __future__ import annotations
import os
import sys
from types import SimpleNamespace
from typing import Any
from policies.namespaces import resolve_awareness_namespace
try: # pragma: no cover - optional runtime dependency in lightweight test envs
import httpx # type: ignore
except ImportError: # pragma: no cover
httpx = SimpleNamespace(AsyncClient=None)
DEFAULT_AWARENESS_TIMEOUT = 10.0
def get_awareness_config() -> dict[str, str] | None:
"""Return awareness connection settings if the workspace is configured."""
base_url = os.environ.get("AWARENESS_URL", "").rstrip("/")
workspace_id = os.environ.get("WORKSPACE_ID", "")
configured_namespace = os.environ.get("AWARENESS_NAMESPACE", "")
if not base_url:
return None
if not workspace_id and not configured_namespace:
return None
namespace = resolve_awareness_namespace(workspace_id, configured_namespace)
return {
"base_url": base_url,
"namespace": namespace,
}
class AwarenessClient:
"""Small HTTP client for workspace-scoped awareness memory operations."""
def __init__(self, base_url: str, namespace: str, timeout: float = DEFAULT_AWARENESS_TIMEOUT):
self.base_url = base_url.rstrip("/")
self.namespace = namespace
self.timeout = timeout
def _memories_url(self) -> str:
# Keep the awareness path isolated in one helper so the contract can
# be adjusted later without touching the agent-facing tools.
return f"{self.base_url}/api/v1/namespaces/{self.namespace}/memories"
async def commit(self, content: str, scope: str) -> dict[str, Any]:
client_cls = _resolve_async_client()
async with client_cls(timeout=self.timeout) as client:
resp = await client.post(
self._memories_url(),
json={"content": content, "scope": scope},
)
return _parse_commit_response(resp, scope)
async def search(self, query: str = "", scope: str = "") -> dict[str, Any]:
params: dict[str, str] = {}
if query:
params["q"] = query
if scope:
params["scope"] = scope
client_cls = _resolve_async_client()
async with client_cls(timeout=self.timeout) as client:
resp = await client.get(self._memories_url(), params=params)
return _parse_search_response(resp)
def build_awareness_client() -> AwarenessClient | None:
"""Create an awareness client from the current workspace environment."""
config = get_awareness_config()
if not config:
return None
return AwarenessClient(config["base_url"], config["namespace"])
def _parse_commit_response(resp: httpx.Response, scope: str) -> dict[str, Any]:
data = _safe_json(resp)
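# _safe_json may return a list; only call .get() on dict payloads.
if resp.status_code in (200, 201) and isinstance(data, dict):
return {"success": True, "id": data.get("id"), "scope": scope}
error = data.get("error", resp.text) if isinstance(data, dict) else resp.text
return {"success": False, "error": error}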
def _parse_search_response(resp: httpx.Response) -> dict[str, Any]:
data = _safe_json(resp)
if resp.status_code == 200:
memories = data if isinstance(data, list) else data.get("memories", [])
return {
"success": True,
"count": len(memories),
"memories": memories,
}
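# A non-200 body may be a list; only call .get() on dict payloads.
error = data.get("error", resp.text) if isinstance(data, dict) else resp.text
return {"success": False, "error": error}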
def _safe_json(resp: httpx.Response) -> dict[str, Any] | list[Any]:
try:
return resp.json()
except ValueError:
return {"error": resp.text}
def _resolve_async_client():
client_cls = getattr(httpx, "AsyncClient", None)
if client_cls is not None:
return client_cls
memory_module = sys.modules.get("builtin_tools.memory")
if memory_module is not None:
memory_httpx = getattr(memory_module, "httpx", None)
client_cls = getattr(memory_httpx, "AsyncClient", None)
if client_cls is not None:
return client_cls
raise RuntimeError("httpx.AsyncClient is unavailable")
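The search parser above accepts two payload shapes: a bare list of memories, or an object with a `memories` key. A minimal sketch of that normalisation, decoupled from `httpx`, is shown here. `parse_search_payload` is a hypothetical helper that takes the status code, the decoded body, and the raw text directly, and it guards the error branch against non-dict bodies, which is an assumption of the sketch rather than documented service behaviour.

```python
from typing import Any


def parse_search_payload(status_code: int, data: Any, raw_text: str = "") -> dict:
    if status_code == 200:
        if isinstance(data, list):
            memories = data  # bare list of memories
        elif isinstance(data, dict):
            memories = data.get("memories", [])  # wrapped in an object
        else:
            memories = []
        return {"success": True, "count": len(memories), "memories": memories}
    # Only dict bodies can carry an "error" field; otherwise use the raw text.
    error = data.get("error", raw_text) if isinstance(data, dict) else raw_text
    return {"success": False, "error": error}


from_list = parse_search_payload(200, [{"id": "m1"}])
from_dict = parse_search_payload(200, {"memories": [{"id": "m1"}, {"id": "m2"}]})
failed = parse_search_payload(500, ["unexpected"], raw_text="server error")
```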
@@ -1,359 +0,0 @@
"""OWASP Top 10 for Agentic Applications compliance enforcement (Dec 2025).
Enable via config.yaml::
compliance:
mode: owasp_agentic
prompt_injection: detect # detect | block
max_tool_calls_per_task: 50
max_task_duration_seconds: 300
When ``mode`` is absent or empty, this module is a no-op — no overhead, no
behaviour change. This makes it safe to import unconditionally.
Coverage
--------
OA-01 Prompt Injection (``sanitize_input``)
Scans user-supplied text for instruction-override patterns, role-hijacking
attempts, system-prompt delimiter injection, and known jailbreak keywords.
- ``detect`` (default): log an audit event, return the original text so
the agent still processes the input. Operators are alerted without
breaking legitimate use-cases that happen to contain trigger words.
- ``block``: raise ``PromptInjectionError`` before the agent sees the text.
OA-03 Excessive Agency (``AgencyTracker.on_tool_call``)
Tracks the number of tool calls and wall-clock time elapsed per task.
When a limit is exceeded, ``ExcessiveAgencyError`` is raised. The caller
(``a2a_executor.py``) catches it and terminates the task gracefully.
OA-02 / OA-06 Insecure Output / Sensitive Data Exposure (``redact_pii``)
Scans agent output for credit-card numbers, SSNs, API keys, AWS access
keys, and e-mail addresses. Detected values are replaced with
``[REDACTED:<type>]`` tokens before the response reaches the caller.
An audit event records the PII types found (not the values themselves).
Note on streaming: ``redact_pii`` is applied to the *final accumulated
text* before the terminal ``Message`` event is emitted. Token-by-token
SSE artifacts that have already been sent to streaming clients are not
retroactively redacted. For full streaming redaction, integrate
``redact_pii`` at the ``TaskArtifactUpdateEvent`` level.
Compliance posture report (``get_compliance_posture``)
Returns the current effective compliance configuration as a plain ``dict``
suitable for a health or audit endpoint, letting operators verify that the
correct settings are active without reading config files.
"""
from __future__ import annotations
import logging
import re
import time
import uuid
from dataclasses import dataclass, field
from typing import Any
from builtin_tools.audit import log_event
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Public exceptions
# ---------------------------------------------------------------------------
class PromptInjectionError(ValueError):
"""Raised when prompt injection is detected and ``prompt_injection=block``."""
class ExcessiveAgencyError(RuntimeError):
"""Raised when the tool-call count or task-duration limit is exceeded."""
# ---------------------------------------------------------------------------
# OA-01 — Prompt Injection detection
# ---------------------------------------------------------------------------
#: Compiled patterns matched case-insensitively against the raw input text
#: (no normalisation is applied before matching).
#: Add workspace-specific patterns in config if needed.
_INJECTION_PATTERNS: list[tuple[re.Pattern[str], str]] = [
# Instruction override
(re.compile(r"ignore\s+(all\s+)?previous\s+instructions?", re.I), "instruction_override"),
(re.compile(r"disregard\s+(all\s+)?previous", re.I), "instruction_override"),
(re.compile(r"forget\s+(all\s+)?previous", re.I), "instruction_override"),
(re.compile(r"override\s+(your\s+)?(instructions?|guidelines?|rules?)", re.I), "instruction_override"),
# Role hijacking
(re.compile(r"you\s+are\s+now\s+\w", re.I), "role_hijack"),
(re.compile(r"act\s+as\s+(a\s+)?(new\s+|different\s+|unrestricted\s+)", re.I), "role_hijack"),
(re.compile(r"roleplay\s+as", re.I), "role_hijack"),
(re.compile(r"pretend\s+(you\s+are|to\s+be)\b", re.I), "role_hijack"),
(re.compile(r"from\s+now\s+on\s+(you\s+are|act\s+as)", re.I), "role_hijack"),
# System-prompt delimiter injection (LLM-specific tokens)
(re.compile(r"<\|?\s*(system|im_start|im_end|endoftext)\s*\|?>", re.I), "delimiter_injection"),
(re.compile(r"\[INST\]|\[/INST\]|\[\[SYS\]\]|\[\[/SYS\]\]", re.I), "delimiter_injection"),
(re.compile(r"<</SYS>>|<<SYS>>", re.I), "delimiter_injection"),
# DAN / jailbreak keywords
(re.compile(r"\bDAN\b.{0,30}(mode|now|enabled|activated)", re.I), "jailbreak"),
(re.compile(r"do\s+anything\s+now", re.I), "jailbreak"),
(re.compile(r"\bjailbreak\b", re.I), "jailbreak"),
(re.compile(r"developer\s+mode\s+(enabled|on)", re.I), "jailbreak"),
# Prompt exfiltration
(re.compile(r"(repeat|print|output|show|reveal|display)\s+(your\s+)?(system\s+prompt|initial\s+instructions?)", re.I), "prompt_exfiltration"),
(re.compile(r"what\s+(are\s+)?your\s+(instructions?|system\s+prompt)", re.I), "prompt_exfiltration"),
]
def detect_prompt_injection(text: str) -> list[tuple[str, str]]:
"""Return ``(matched_text, category)`` for each pattern that matches.
Args:
text: Raw user input to scan.
Returns:
List of ``(matched_text, category)`` tuples, with the matched text
truncated to 80 characters; an empty list means no pattern matched.
"""
matches: list[tuple[str, str]] = []
for pattern, category in _INJECTION_PATTERNS:
m = pattern.search(text)
if m:
matches.append((m.group(0)[:80], category))
return matches
def sanitize_input(
text: str,
*,
prompt_injection_mode: str = "detect",
context_id: str = "",
) -> str:
"""Check *text* for prompt injection and enforce the configured response.
Args:
text: User-supplied input to the agent.
prompt_injection_mode: ``"detect"`` or ``"block"``.
context_id: Task/context identifier for audit correlation.
Returns:
The original *text* unchanged (``detect`` mode always returns input).
Raises:
:class:`PromptInjectionError`: only when ``prompt_injection_mode="block"``
and at least one injection pattern is matched.
"""
matches = detect_prompt_injection(text)
if not matches:
return text
categories = list({cat for _, cat in matches})
trace_id = str(uuid.uuid4())
log_event(
event_type="compliance",
action="prompt_injection.detect",
resource="user_input",
outcome="detected" if prompt_injection_mode == "detect" else "blocked",
trace_id=trace_id,
context_id=context_id,
categories=categories,
match_count=len(matches),
# Log category + truncated match, never the full raw text (OA-06)
matches=[{"category": cat, "snippet": snippet} for snippet, cat in matches[:5]],
)
if prompt_injection_mode == "block":
raise PromptInjectionError(
f"Prompt injection detected ({', '.join(categories)}). "
"Request blocked by compliance policy."
)
# detect mode — log and continue
logger.warning(
"Prompt injection patterns detected (context_id=%s, categories=%s) — "
"passing to agent in detect mode",
context_id,
categories,
)
return text
# ---------------------------------------------------------------------------
# OA-03 — Excessive Agency
# ---------------------------------------------------------------------------
@dataclass
class AgencyTracker:
"""Per-task mutable state for excessive-agency enforcement.
Instantiate once per ``execute()`` call and call :meth:`on_tool_call`
at each tool-start event.
"""
max_tool_calls: int = 50
max_duration_seconds: float = 300.0
tool_call_count: int = field(default=0, init=False)
start_time: float = field(default_factory=time.monotonic, init=False)
def on_tool_call(self, tool_name: str = "", context_id: str = "") -> None:
"""Increment counter and enforce limits.
Raises:
:class:`ExcessiveAgencyError`: if either limit is exceeded.
"""
self.tool_call_count += 1
elapsed = time.monotonic() - self.start_time
if self.tool_call_count > self.max_tool_calls:
log_event(
event_type="compliance",
action="excessive_agency.tool_limit",
resource=tool_name or "unknown_tool",
outcome="blocked",
context_id=context_id,
tool_call_count=self.tool_call_count,
limit=self.max_tool_calls,
elapsed_seconds=round(elapsed, 2),
)
raise ExcessiveAgencyError(
f"Tool call limit exceeded: {self.tool_call_count} calls > "
f"max {self.max_tool_calls} per task"
)
if elapsed > self.max_duration_seconds:
log_event(
event_type="compliance",
action="excessive_agency.duration_limit",
resource=tool_name or "unknown_tool",
outcome="blocked",
context_id=context_id,
tool_call_count=self.tool_call_count,
elapsed_seconds=round(elapsed, 2),
limit_seconds=self.max_duration_seconds,
)
raise ExcessiveAgencyError(
f"Task duration limit exceeded: {elapsed:.0f}s > "
f"max {self.max_duration_seconds:.0f}s per task"
)
# ---------------------------------------------------------------------------
# OA-02 / OA-06 — PII redaction
# ---------------------------------------------------------------------------
#: ``(compiled_pattern, replacement_token)`` pairs applied in order.
#: The replacement tokens are SIEM-friendly: ``[REDACTED:type]``.
_PII_PATTERNS: list[tuple[re.Pattern[str], str]] = [
# Formatted credit cards: XXXX-XXXX-XXXX-XXXX or XXXX XXXX XXXX XXXX
(re.compile(r"\b\d{4}[\s\-]\d{4}[\s\-]\d{4}[\s\-]\d{4}\b"), "[REDACTED:credit_card]"),
# US Social Security Numbers: XXX-XX-XXXX
(re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED:ssn]"),
# OpenAI-style keys: sk-... (≥ 32 chars after prefix)
(re.compile(r"\bsk-[A-Za-z0-9_\-]{32,}\b"), "[REDACTED:api_key]"),
# Generic API/secret keys with common prefixes
(re.compile(r"\b(?:sk|pk|api|secret|token|auth)[-_][A-Za-z0-9_\-]{20,}\b", re.I), "[REDACTED:api_key]"),
# AWS Access Key IDs
(re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED:aws_key]"),
# GitHub personal access tokens — classic format (36-char alphanumeric suffix)
(re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "[REDACTED:github_token]"),
# GitHub personal access tokens — fine-grained format (82-char alphanumeric+underscore suffix)
(re.compile(r"\bgithub_pat_[A-Za-z0-9_]{82}\b"), "[REDACTED:github_token]"),
# Email addresses
(re.compile(r"\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b"), "[REDACTED:email]"),
]
def redact_pii(text: str) -> tuple[str, list[str]]:
"""Redact PII from *text* and return ``(redacted_text, pii_types_found)``.
Each unique PII type is reported at most once in ``pii_types_found``.
The replacement tokens (``[REDACTED:type]``) are SIEM-indexable and
preserve the structural context of the output while hiding sensitive data.
Args:
text: Agent output text to scan.
Returns:
Tuple of ``(redacted_text, list_of_pii_type_strings)``. The list is
empty when no PII is detected (the common case).
Examples::
>>> redacted, types = redact_pii("Call me at test@example.com sk-abc123...")
>>> "email" in types
True
>>> "[REDACTED:email]" in redacted
True
"""
found: list[str] = []
result = text
for pattern, replacement in _PII_PATTERNS:
new_result = pattern.sub(replacement, result)
if new_result != result:
# Extract type from "[REDACTED:type]"
pii_type = replacement[len("[REDACTED:"):-1]
if pii_type not in found:
found.append(pii_type)
result = new_result
return result, found
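The ordered-substitution approach of `redact_pii` can be tried with a reduced pattern list. This is a minimal sketch, assuming only the SSN and e-mail patterns from `_PII_PATTERNS`; `PATTERNS` and `redact` are hypothetical names, and the sketch uses `re.Pattern.subn` to detect a hit instead of comparing strings before and after substitution.

```python
import re

# Reduced, hypothetical subset of the module's pattern table.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED:ssn]"),
    (re.compile(r"\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b"), "[REDACTED:email]"),
]


def redact(text):
    found = []
    for pattern, token in PATTERNS:
        # subn returns the new text plus the number of replacements made.
        text, count = pattern.subn(token, text)
        if count:
            pii_type = token[len("[REDACTED:"):-1]
            if pii_type not in found:
                found.append(pii_type)
    return text, found


redacted, types = redact("Reach jane@example.com, SSN 123-45-6789, or joe@example.org.")
```

Each PII type is reported once even though two e-mail addresses are replaced, and the types appear in pattern order rather than in order of occurrence in the text.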
# ---------------------------------------------------------------------------
# Compliance posture report
# ---------------------------------------------------------------------------
def get_compliance_posture() -> dict[str, Any]:
"""Return the current compliance configuration as a serialisable dict.
Loads ``WorkspaceConfig`` lazily (cached) and returns a snapshot of the
active compliance settings. Safe to call from a health endpoint.
Returns a dict with these keys::
{
"compliance_mode": "owasp_agentic" | "",
"enabled": true | false,
"prompt_injection": "detect" | "block",
"max_tool_calls_per_task": 50,
"max_task_duration_seconds": 300,
"pii_redaction_enabled": true,
"security_scan_mode": "warn" | "block" | "off",
"rbac_roles": ["operator"],
}
"""
try:
from builtin_tools.audit import _load_workspace_config
cfg = _load_workspace_config()
except Exception:
cfg = None
if cfg is None:
return {
"compliance_mode": "",
"enabled": False,
"prompt_injection": "detect",
"max_tool_calls_per_task": 50,
"max_task_duration_seconds": 300,
"pii_redaction_enabled": False,
"security_scan_mode": "warn",
"rbac_roles": [],
"note": "config unavailable",
}
c = cfg.compliance
enabled = c.mode == "owasp_agentic"
return {
"compliance_mode": c.mode,
"enabled": enabled,
"prompt_injection": c.prompt_injection,
"max_tool_calls_per_task": c.max_tool_calls_per_task,
"max_task_duration_seconds": c.max_task_duration_seconds,
# PII redaction is active whenever compliance mode is on
"pii_redaction_enabled": enabled,
"security_scan_mode": cfg.security_scan.mode,
"rbac_roles": list(cfg.rbac.roles),
}
@@ -1,550 +0,0 @@
"""Async delegation tool for sending tasks to peer workspaces via A2A.
Delegations are non-blocking: the tool fires the A2A request in the background
and returns immediately with a task_id. The agent can check status anytime via
check_task_status, or just continue working and check later.
When the delegate responds, the result is stored and the agent is notified
via a status update.
"""
import asyncio
import os
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
import httpx
from langchain_core.tools import tool
from builtin_tools.audit import check_permission, get_workspace_roles, log_event
from builtin_tools.telemetry import (
A2A_SOURCE_WORKSPACE,
A2A_TARGET_WORKSPACE,
A2A_TASK_ID,
WORKSPACE_ID_ATTR,
get_current_traceparent,
get_tracer,
inject_trace_headers,
)
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
DELEGATION_RETRY_ATTEMPTS = int(os.environ.get("DELEGATION_RETRY_ATTEMPTS", "3"))
DELEGATION_RETRY_DELAY = float(os.environ.get("DELEGATION_RETRY_DELAY", "5.0"))
DELEGATION_TIMEOUT = float(os.environ.get("DELEGATION_TIMEOUT", "300.0"))
class DelegationStatus(str, Enum):
PENDING = "pending"
IN_PROGRESS = "in_progress"
# QUEUED: peer's a2a-proxy returned HTTP 202 + {queued: true}, meaning
# the peer is mid-task and the request was placed in a drain queue.
# The reply will arrive via the platform's stitch path when the
# peer finishes its current work. The LLM should WAIT, not retry,
# and definitely not fall back to doing the work itself — see the
# check_task_status docstring for the prompt-side guidance.
QUEUED = "queued"
COMPLETED = "completed"
FAILED = "failed"
@dataclass
class DelegationTask:
task_id: str
workspace_id: str
task_description: str
status: DelegationStatus = DelegationStatus.PENDING
result: Optional[str] = None
error: Optional[str] = None
# In-memory store of delegation tasks for this workspace
_delegations: dict[str, DelegationTask] = {}
_background_tasks: set[asyncio.Task] = set()
MAX_DELEGATION_HISTORY = 100
import logging
logger = logging.getLogger(__name__)
def _evict_old_delegations():
"""Remove completed/failed delegations when store exceeds MAX_DELEGATION_HISTORY."""
if len(_delegations) <= MAX_DELEGATION_HISTORY:
return
# Evict oldest completed/failed first
removable = [
tid for tid, d in _delegations.items()
if d.status in (DelegationStatus.COMPLETED, DelegationStatus.FAILED)
]
for tid in removable[:len(_delegations) - MAX_DELEGATION_HISTORY]:
del _delegations[tid]
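The eviction rule in `_evict_old_delegations` only ever removes finished entries, and it relies on dict insertion order to pick the oldest ones first. A minimal sketch with a small cap illustrates this; `MAX_HISTORY`, `evict`, and the plain status strings are hypothetical simplifications of the module's `DelegationTask` store.

```python
MAX_HISTORY = 3  # hypothetical small cap for illustration


def evict(store: dict) -> None:
    overflow = len(store) - MAX_HISTORY
    if overflow <= 0:
        return
    # Dicts preserve insertion order, so this list runs oldest to newest.
    removable = [tid for tid, status in store.items() if status in ("completed", "failed")]
    for tid in removable[:overflow]:
        del store[tid]


tasks = {
    "t1": "completed",
    "t2": "pending",
    "t3": "failed",
    "t4": "in_progress",
    "t5": "completed",
}
evict(tasks)
```

With five entries and a cap of three, the two oldest finished entries are dropped and the in-flight ones stay. If there are fewer finished entries than the overflow, the store simply remains above the cap until more delegations finish.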
def _on_task_done(task: asyncio.Task):
"""Callback for background tasks — log unhandled exceptions."""
_background_tasks.discard(task)
if not task.cancelled() and task.exception():
logger.error("Delegation background task failed: %s", task.exception())
async def _notify_completion(task_id: str, target_workspace_id: str, status: str):
"""Push notification to platform when delegation completes/fails."""
try:
async with httpx.AsyncClient(timeout=10) as client:
await client.post(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/notify",
json={
"type": "delegation_complete",
"task_id": task_id,
"target_workspace_id": target_workspace_id,
"status": status,
},
)
except Exception as e:
logger.debug("Delegation notify failed (best-effort): %s", e)
async def _record_delegation_on_platform(task_id: str, target_workspace_id: str, task: str):
"""Register the delegation in the platform's activity_logs (#64 fix).
Best-effort POST to /workspaces/<self>/delegations/record. The agent still
fires A2A directly for speed + OTEL propagation, but the platform's
GET /delegations endpoint now mirrors the same set an agent's local
check_task_status sees.
"""
try:
async with httpx.AsyncClient(timeout=10) as client:
await client.post(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/delegations/record",
json={
"target_id": target_workspace_id,
"task": task,
"delegation_id": task_id,
},
)
except Exception as e:
logger.debug("Delegation record failed (best-effort): %s", e)
async def _refresh_queued_from_platform(task_id: str) -> bool:
"""Lazy-refresh a QUEUED delegation's local state from the platform.
Called by check_task_status when local status is QUEUED. The
platform's drain stitch (a2a_queue.go) updates the delegate_result
activity_logs row when a queued delegation eventually completes,
but it has no callback to this runtime — without this lazy refresh,
the LLM polling check_task_status would see "queued" forever
even after the platform has the result.
Returns True if the local delegation was updated to a terminal state
(completed/failed), False otherwise. Best-effort — network/parse
errors leave the local state untouched and let the next call retry.
"""
delegation = _delegations.get(task_id)
if not delegation:
return False
try:
async with httpx.AsyncClient(timeout=10) as client:
resp = await client.get(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/delegations",
headers={},
)
if resp.status_code != 200:
return False
entries = resp.json()
if not isinstance(entries, list):
return False
except Exception as e:
logger.debug("refresh queued delegation %s: %s", task_id, e)
return False
# Find the latest delegate_result row matching our task_id.
# Platform list is newest-first; the first match is the freshest.
for entry in entries:
if entry.get("delegation_id") != task_id:
continue
if entry.get("type") != "delegation":
continue
# Only delegate_result rows carry the eventual outcome; the
# initial 'delegate' row stays at status='pending' even after
# the result lands. Rows from the LIST endpoint don't include
# `method`, so the row type can't be checked directly. A terminal
# status ('completed' / 'failed') is what identifies the result
# row here; rows in any other status are skipped below.
status = entry.get("status", "")
if status == "completed":
delegation.status = DelegationStatus.COMPLETED
delegation.result = entry.get("response_preview", "")
await _notify_completion(task_id, delegation.workspace_id, "completed")
return True
if status == "failed":
delegation.status = DelegationStatus.FAILED
delegation.error = entry.get("error", "")
await _notify_completion(task_id, delegation.workspace_id, "failed")
return True
# status == "queued" / "pending" / "dispatched": platform hasn't
# resolved yet; leave local state unchanged so the next poll
# retries. Don't break — keep scanning in case there's a newer
# entry for the same task_id (possible if the same delegation
# was retried).
return False
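The scan performed by `_refresh_queued_from_platform` can be illustrated without the network call. The sketch below assumes rows shaped like the ones the code reads (`delegation_id`, `type`, `status`, `response_preview`, `error`); `find_terminal_outcome` and the sample rows are hypothetical.

```python
def find_terminal_outcome(entries: list, task_id: str):
    """Return (status, payload) for the first terminal row of task_id, else None."""
    for entry in entries:
        if entry.get("delegation_id") != task_id:
            continue
        if entry.get("type") != "delegation":
            continue
        status = entry.get("status", "")
        if status == "completed":
            return status, entry.get("response_preview", "")
        if status == "failed":
            return status, entry.get("error", "")
        # Non-terminal row: keep scanning, another row may hold the result.
    return None


rows = [
    {"delegation_id": "t2", "type": "delegation", "status": "completed",
     "response_preview": "other task"},
    {"delegation_id": "t1", "type": "delegation", "status": "pending"},
    {"delegation_id": "t1", "type": "delegation", "status": "completed",
     "response_preview": "42"},
]
outcome = find_terminal_outcome(rows, "t1")
print(outcome)
```

Returning `None` for unresolved tasks mirrors the original's "leave local state unchanged" behaviour, so the next poll simply tries again.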
async def _update_delegation_on_platform(task_id: str, status: str, error: str = "", response_preview: str = ""):
"""Mirror status changes to the platform's activity_logs (#64 fix).
Paired with _record_delegation_on_platform — fires on completion/failure
so the platform view stays in sync with the agent's local dict.
"""
try:
async with httpx.AsyncClient(timeout=10) as client:
await client.post(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/delegations/{task_id}/update",
json={
"status": status,
"error": error,
"response_preview": response_preview[:500],
},
)
except Exception as e:
logger.debug("Delegation update failed (best-effort): %s", e)
async def _execute_delegation(task_id: str, workspace_id: str, task: str):
"""Background coroutine that sends the A2A request and stores the result."""
delegation = _delegations[task_id]
delegation.status = DelegationStatus.IN_PROGRESS
# #64: register on the platform so GET /workspaces/<self>/delegations
# sees the same set as check_task_status. Best-effort — platform
# unreachability must not block the actual A2A delegation.
await _record_delegation_on_platform(task_id, workspace_id, task)
tracer = get_tracer()
with tracer.start_as_current_span("task_delegate") as delegate_span:
delegate_span.set_attribute(WORKSPACE_ID_ATTR, WORKSPACE_ID)
delegate_span.set_attribute(A2A_SOURCE_WORKSPACE, WORKSPACE_ID)
delegate_span.set_attribute(A2A_TARGET_WORKSPACE, workspace_id)
delegate_span.set_attribute(A2A_TASK_ID, task_id)
async with httpx.AsyncClient(timeout=DELEGATION_TIMEOUT) as client:
# Discover target URL
try:
discover_resp = await client.get(
f"{PLATFORM_URL}/registry/discover/{workspace_id}",
headers={"X-Workspace-ID": WORKSPACE_ID},
)
if discover_resp.status_code != 200:
delegation.status = DelegationStatus.FAILED
delegation.error = f"Discovery failed: HTTP {discover_resp.status_code}"
log_event(event_type="delegation", action="delegate", resource=workspace_id,
outcome="failure", trace_id=task_id, reason="discovery_error")
return
target_url = discover_resp.json().get("url")
if not target_url:
delegation.status = DelegationStatus.FAILED
delegation.error = "No URL for workspace"
return
except Exception as e:
delegation.status = DelegationStatus.FAILED
delegation.error = f"Discovery error: {e}"
return
# Send A2A with retry
outgoing_headers = inject_trace_headers({
"Content-Type": "application/json",
"X-Workspace-ID": WORKSPACE_ID,
})
traceparent = get_current_traceparent()
last_error = None
for attempt in range(DELEGATION_RETRY_ATTEMPTS):
try:
a2a_resp = await client.post(
target_url,
headers=outgoing_headers,
json={
"jsonrpc": "2.0",
"method": "message/send",
"id": f"delegation-{task_id}-{attempt}",
"params": {
"message": {
"role": "user",
"parts": [{"kind": "text", "text": task}],
"messageId": f"msg-{task_id}-{attempt}",
},
"metadata": {
"parent_task_id": task_id,
"source_workspace_id": WORKSPACE_ID,
"traceparent": traceparent,
},
},
},
)
# HTTP 202 + {queued: true} = peer's a2a-proxy
# accepted the request but the peer's runtime is
# mid-task. Platform-side drain will deliver the
# reply asynchronously. Mark QUEUED locally so
# check_task_status can surface that state
# to the LLM with explicit "wait, don't bypass"
# guidance. Do NOT mark FAILED — the request is
# alive in the platform's queue, not lost.
#
# Without this branch, the loop falls through, the
# `if "error" in result` line below references an
# unbound `result`, and the eventual FAILED status
# leads the LLM to conclude the peer is permanently
# unavailable — at which point it does the delegated
# work itself, defeating the whole orchestration.
if a2a_resp.status_code == 202:
try:
queued_body = a2a_resp.json()
except Exception:
queued_body = {}
if queued_body.get("queued") is True:
delegation.status = DelegationStatus.QUEUED
log_event(
event_type="delegation", action="delegate",
resource=workspace_id, outcome="queued",
trace_id=task_id, attempt=attempt + 1,
)
await _notify_completion(task_id, workspace_id, "queued")
await _update_delegation_on_platform(
task_id, "queued", "", "",
)
return
if a2a_resp.status_code == 200:
try:
result = a2a_resp.json()
except Exception:
delegation.status = DelegationStatus.FAILED
delegation.error = "Invalid JSON response"
return
if "result" in result:
task_result = result["result"]
artifacts = task_result.get("artifacts", [])
texts = []
for artifact in artifacts:
for part in artifact.get("parts", []):
if part.get("kind") == "text":
texts.append(part["text"])
# Also check top-level parts
for part in task_result.get("parts", []):
if part.get("kind") == "text":
texts.append(part["text"])
delegation.status = DelegationStatus.COMPLETED
delegation.result = "\n".join(texts) if texts else str(task_result)
log_event(event_type="delegation", action="delegate", resource=workspace_id,
outcome="success", trace_id=task_id, attempt=attempt + 1)
await _notify_completion(task_id, workspace_id, "completed")
# #64: mirror to platform activity_logs so
# GET /delegations shows the completion state.
await _update_delegation_on_platform(
task_id, "completed", "",
delegation.result or "",
)
return
if "error" in result:
last_error = result["error"].get("message", str(result["error"]))
break
except (httpx.ConnectError, httpx.TimeoutException) as e:
last_error = str(e)
if attempt < DELEGATION_RETRY_ATTEMPTS - 1:
await asyncio.sleep(DELEGATION_RETRY_DELAY * (attempt + 1))
continue
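delegation.status = DelegationStatus.FAILED
# A non-200/202 response sets no last_error; avoid recording the string "None".
last_error = last_error or "A2A request failed without an error message"
delegation.error = str(last_error)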
log_event(event_type="delegation", action="delegate", resource=workspace_id,
outcome="failure", trace_id=task_id, last_error=str(last_error))
await _notify_completion(task_id, workspace_id, "failed")
# #64: mirror failure to platform activity_logs.
await _update_delegation_on_platform(
task_id, "failed", str(last_error), "",
)
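The success branch of `_execute_delegation` flattens text parts from the JSON-RPC `result`. A minimal sketch of that step as a pure function follows; `extract_text` and the sample payload are hypothetical, and the payload shape is assumed from the fields the code above reads (`artifacts`, `parts`, `kind`, `text`).

```python
def extract_text(task_result: dict) -> str:
    """Join all text parts, artifact parts first, then top-level parts."""
    texts = []
    for artifact in task_result.get("artifacts", []):
        for part in artifact.get("parts", []):
            if part.get("kind") == "text":
                texts.append(part.get("text", ""))
    for part in task_result.get("parts", []):
        if part.get("kind") == "text":
            texts.append(part.get("text", ""))
    # Fall back to the raw structure when no text part is present.
    return "\n".join(texts) if texts else str(task_result)


sample = {
    "artifacts": [
        {"parts": [{"kind": "text", "text": "hello"}, {"kind": "data", "data": {}}]},
    ],
    "parts": [{"kind": "text", "text": "world"}],
}
flattened = extract_text(sample)
print(flattened)
```

Using `part.get("text", "")` rather than `part["text"]` is a choice made for this sketch: a malformed text part then yields an empty string instead of raising `KeyError`.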
@tool
async def delegate_task(
workspace_id: str,
task: str,
) -> str:
"""Delegate a task to a peer workspace via A2A and WAIT for the response.
Synchronous variant — blocks until the peer replies (or the platform's
A2A round-trip times out). Use this for QUICK questions and small
sub-tasks where you can afford to wait inline.
For longer-running work (research, multi-minute jobs) use
delegate_task_async + check_task_status instead so you don't hold
this workspace busy waiting.
Tool name + description are sourced from the platform_tools registry —
a single ToolSpec drives MCP, LangChain, and system-prompt docs.
"""
from a2a_tools import tool_delegate_task
return await tool_delegate_task(workspace_id, task)
@tool
async def delegate_task_async(
workspace_id: str,
task: str,
) -> dict:
"""Delegate a task to a peer workspace via A2A protocol (non-blocking).
Sends the task in the background and returns immediately with a task_id.
Use check_task_status to poll for the result, or continue working
and check later. The delegate works independently.
Args:
workspace_id: The ID of the target workspace to delegate to.
task: The task description to send to the peer.
Returns:
A dict with task_id and status="delegated". Use check_task_status(task_id) to get results.
"""
task_id = str(uuid.uuid4())
# Task #190 / #193 — Self-delegation guard (async path). Even on the
# async path that returns a task_id immediately, _execute_delegation
# eventually fires the A2A POST back to our own URL, which times out
# against our own held run lock, gets recorded with source_id=our
# workspace UUID, and surfaces in the inbox as a peer_agent message
# from ourselves (#190). Reject before scheduling the background task
# so no peer_agent echo can be generated. Sibling guards:
# - workspace-server/internal/handlers/delegation.go (Go API gate)
# - workspace/a2a_tools_delegation.py (MCP sync + async paths)
# - workspace/builtin_tools/a2a_tools.py (framework-agnostic sync)
if WORKSPACE_ID and workspace_id == WORKSPACE_ID:
log_event(event_type="delegation", action="delegate", resource=workspace_id,
outcome="rejected_self_delegation", trace_id=task_id)
return {
"success": False,
"error": (
"self-delegation rejected: cannot delegate_task_async to your "
"own workspace (would time out and echo back as a peer_agent "
"message from yourself — #190)"
),
}
# RBAC check
roles, custom_perms = get_workspace_roles()
if not check_permission("delegate", roles, custom_perms):
log_event(event_type="rbac", action="rbac.deny", resource=workspace_id,
outcome="denied", trace_id=task_id, attempted_action="delegate", roles=roles)
return {"success": False, "error": f"RBAC: no 'delegate' permission. Roles: {roles}"}
log_event(event_type="delegation", action="delegate", resource=workspace_id,
outcome="dispatched", trace_id=task_id, task_preview=task[:200])
# Store the delegation and launch background task
delegation = DelegationTask(
task_id=task_id,
workspace_id=workspace_id,
task_description=task[:200],
)
_delegations[task_id] = delegation
_evict_old_delegations()
bg_task = asyncio.create_task(_execute_delegation(task_id, workspace_id, task))
_background_tasks.add(bg_task)
bg_task.add_done_callback(_on_task_done)
return {
"success": True,
"task_id": task_id,
"status": "delegated",
"message": f"Task delegated to {workspace_id}. Use check_task_status('{task_id}') to get the result when ready.",
}
@tool
async def check_task_status(
task_id: str = "",
) -> dict:
"""Check the status of a delegated task, or list all active delegations.
Status semantics — IMPORTANT:
- "pending" / "in_progress" → peer is actively working. Wait and check again.
- "queued" → peer's a2a-proxy accepted the call but the peer is
processing a prior task. The reply WILL arrive — the platform's
drain re-dispatches when the peer is free. This tool transparently
polls the platform for the eventual outcome on each call, so
keep polling check_task_status periodically and you'll see
the status flip to "completed" / "failed" automatically.
Do NOT retry the delegation. Do NOT do the work yourself.
Acknowledge to the user that the peer is busy and will reply,
then continue with other delegations or check back later.
- "completed" → result is in the `result` field.
- "failed" → real failure (network, peer crashed, etc.). The
`error` field has the cause. Only fall back to doing the work
yourself if status is "failed", never if status is "queued".
Args:
task_id: The task_id returned by delegate_task_async. If empty, lists all delegations.
Returns:
Status and result (if completed) of the delegation.
"""
if not task_id:
# List all delegations
summary = []
for tid, d in _delegations.items():
entry = {
"task_id": tid,
"workspace_id": d.workspace_id,
"status": d.status.value,
"task": d.task_description,
}
if d.status == DelegationStatus.COMPLETED:
entry["result_preview"] = (d.result or "")[:200]
if d.status == DelegationStatus.FAILED:
entry["error"] = d.error
summary.append(entry)
return {"delegations": summary, "count": len(summary)}
delegation = _delegations.get(task_id)
if not delegation:
return {"error": f"No delegation found with task_id {task_id}"}
# Lazy refresh for QUEUED entries: the platform's drain stitch
# updates its activity_logs row when the queued delegation
# eventually completes, but doesn't push back to this runtime.
# Without this refresh, the LLM polling here would see "queued"
# forever even after the result is available — exactly the bug
# the upstream director-bypass docstring guidance warned against.
if delegation.status == DelegationStatus.QUEUED:
await _refresh_queued_from_platform(task_id)
# delegation is the same dict entry — _refresh mutates in-place.
result = {
"task_id": task_id,
"workspace_id": delegation.workspace_id,
"status": delegation.status.value,
"task": delegation.task_description,
}
if delegation.status == DelegationStatus.COMPLETED:
result["result"] = delegation.result
elif delegation.status == DelegationStatus.FAILED:
result["error"] = delegation.error
# RFC #2251 V1.0 reproduction-harness instrumentation. Every poll of
# check_task_status emits a phase=check_status line so the harness
# operator can tell whether a coordinator stuck for 8 minutes was
# polling-children-the-whole-time vs synthesizing-after-children-done.
# `grep rfc2251_phase=check_status` in the workspace's container log
# gives the polling pattern. Strip when V1.0 ships.
logger.info(
"rfc2251_phase=check_status task_id=%s peer=%s status=%s",
task_id, delegation.workspace_id, delegation.status.value,
)
return result
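The status semantics in the `check_task_status` docstring imply a polling loop on the caller's side: keep waiting through "pending", "in_progress" and "queued", and stop only on a terminal status. The sketch below assumes a stand-in coroutine `fake_check`; the helper `wait_for_delegation` and its "timeout" status are hypothetical and not part of this module.

```python
import asyncio


async def wait_for_delegation(check, task_id, interval=0.01, max_polls=10):
    """Poll until the status is terminal; 'queued' counts as still alive."""
    for _ in range(max_polls):
        status = await check(task_id)
        if status["status"] in ("completed", "failed"):
            return status
        await asyncio.sleep(interval)
    # Hypothetical marker for "gave up polling"; the real tool has no such status.
    return {"task_id": task_id, "status": "timeout"}


# Hypothetical stand-in for check_task_status: replays a fixed status script.
_script = iter(["queued", "queued", "in_progress", "completed"])


async def fake_check(task_id):
    state = next(_script)
    out = {"task_id": task_id, "status": state}
    if state == "completed":
        out["result"] = "done"
    return out


final = asyncio.run(wait_for_delegation(fake_check, "t1"))
print(final)
```

The loop never re-delegates and never treats "queued" as a failure, which is the behaviour the docstring asks of the LLM.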
-403
@@ -1,403 +0,0 @@
"""Bridge between Molecule AI's RBAC + audit subsystem and the Microsoft Agent
Governance Toolkit (agent-os-kernel, released April 2, 2026).
Integration points
------------------
* ``check_permission`` → ``PolicyEvaluator.evaluate()``
Molecule AI's RBAC gate runs first; if RBAC allows the action the toolkit
evaluator is consulted according to ``policy_mode``.
* ``log_event`` → governance audit sink
Every permission decision (allow or deny) is written via
``tools.audit.log_event`` with extra governance metadata so the full
decision trail lands in Molecule AI's existing audit stream.
* OTEL traceparent flows through
``tools.telemetry.get_current_traceparent()`` is called inside ``emit()``
and the W3C traceparent string is attached to every audit record, giving
end-to-end distributed tracing across agent boundaries.
Graceful degradation
--------------------
If ``agent-os-kernel`` is not installed the module falls back to Molecule AI
RBAC alone. No exception propagates to the agent — governance is a
best-effort overlay, never a hard dependency.
Install::

    pip install agent-os-kernel

Minimal config.yaml snippet::

    governance:
      enabled: true
      toolkit: microsoft
      policy_mode: strict        # strict | permissive | audit
      policy_endpoint: https://your-tenant.governance.azure.com
      policy_file: policies/workspace.rego
      blocked_patterns:
        - ".*\\.exec$"
        - "shell\\."
      max_tool_calls_per_task: 50

NOTE: The agent-os-kernel package was released April 2, 2026 and is in
community preview. The API bindings in this module target v3.0.x of the
package (agent_os.policies.PolicyEvaluator). If the package API changes,
update _init_evaluator() accordingly.
"""
import logging
import os
from typing import Any, Optional
logger = logging.getLogger(__name__)
WORKSPACE_ID: str = os.environ.get("WORKSPACE_ID", "")
# Module-level singleton — set by initialize_governance() at startup
_adapter: Optional["GovernanceAdapter"] = None
class GovernanceAdapter:
"""Bridges Molecule AI RBAC + audit trail to the Microsoft Agent Governance Toolkit."""
def __init__(self, config: Any) -> None:
self._config = config
self._evaluator = None
self._toolkit_available: bool = False
async def initialize(self) -> None:
"""Async entry point: initialise evaluator and log outcome."""
self._init_evaluator()
if self._toolkit_available:
logger.info(
"GovernanceAdapter initialised — toolkit=%s mode=%s",
self._config.toolkit,
self._config.policy_mode,
)
else:
logger.warning(
"GovernanceAdapter initialised in RBAC-only mode "
"(agent-os-kernel not available or failed to load)."
)
def _init_evaluator(self) -> None:
"""Lazy-import and configure the PolicyEvaluator from agent-os-kernel.
All failures are caught and logged; the adapter simply runs without
the toolkit rather than crashing the workspace.
"""
try:
try:
from agent_os.policies import PolicyEvaluator # type: ignore[import]
except ImportError:
logger.warning(
"agent-os-kernel is not installed — graceful degradation active. "
"Governance will use Molecule AI RBAC only. "
"To enable the Microsoft Agent Governance Toolkit run: "
"pip install agent-os-kernel"
)
return
kwargs: dict[str, Any] = {
"policy_mode": self._config.policy_mode,
"max_tool_calls_per_task": self._config.max_tool_calls_per_task,
"blocked_patterns": self._config.blocked_patterns,
}
if self._config.policy_endpoint:
kwargs["endpoint"] = self._config.policy_endpoint
self._evaluator = PolicyEvaluator(**kwargs)
# Load a policy file if one is configured and exists on disk.
if self._config.policy_file:
policy_file = self._config.policy_file
if os.path.exists(policy_file):
ext = os.path.splitext(policy_file)[1].lower()
if ext == ".rego":
self._evaluator.load_rego(path=policy_file)
logger.info("Loaded Rego policy file: %s", policy_file)
elif ext in (".yaml", ".yml"):
self._evaluator.load_yaml(path=policy_file)
logger.info("Loaded YAML policy file: %s", policy_file)
elif ext == ".cedar":
self._evaluator.load_cedar(path=policy_file)
logger.info("Loaded Cedar policy file: %s", policy_file)
else:
logger.warning(
"Unrecognised policy file extension '%s' — skipping load.",
ext,
)
else:
logger.warning(
"policy_file '%s' does not exist — skipping load.",
policy_file,
)
self._toolkit_available = True
logger.info(
"agent-os-kernel PolicyEvaluator ready — policy_mode=%s",
self._config.policy_mode,
)
except Exception as exc: # noqa: BLE001
logger.warning(
"Failed to initialise agent-os-kernel PolicyEvaluator: %s; "
"graceful degradation active (RBAC only).",
exc,
)
def check_permission(
self,
action: str,
roles: list[str],
custom_permissions: dict | None = None,
context: dict | None = None,
) -> tuple[bool, str]:
"""Evaluate an action against Molecule AI RBAC and (optionally) the toolkit.
Returns
-------
tuple[bool, str]
``(allowed, reason)`` — reason is a short human-readable string
explaining the decision.
"""
from builtin_tools import audit # inline import to avoid circular dependencies
context = context or {}
# --- Step 1: Molecule AI RBAC gate (always runs) ---
rbac_allowed: bool = audit.check_permission(action, roles, custom_permissions)
if not rbac_allowed:
self.emit(
event_type="permission_check",
action=action,
resource=context.get("resource", ""),
outcome="denied",
actor=context.get("actor"),
policy_decision="rbac_deny",
roles=roles,
)
return False, f"RBAC denied action '{action}' for roles {roles}"
# --- Step 2: If toolkit unavailable or audit-only mode, return RBAC result ---
if not self._toolkit_available or self._config.policy_mode == "audit":
self.emit(
event_type="permission_check",
action=action,
resource=context.get("resource", ""),
outcome="allowed",
actor=context.get("actor"),
policy_decision="rbac_allowed",
roles=roles,
toolkit_mode=self._config.policy_mode,
)
return rbac_allowed, "rbac_allowed"
# --- Step 3: Toolkit evaluation ---
eval_context: dict[str, Any] = {
"action": action,
"resource": context.get("resource", ""),
"roles": roles,
"workspace_id": WORKSPACE_ID,
}
# Merge any extra context keys the caller supplied.
for key, value in context.items():
if key not in eval_context:
eval_context[key] = value
toolkit_allowed: bool = True
reason: str = ""
evaluator_name: str = "agent-os-kernel"
try:
decision = self._evaluator.evaluate(eval_context)
toolkit_allowed = getattr(decision, "allowed", True)
reason = getattr(decision, "reason", "")
evaluator_name = getattr(decision, "evaluator_name", "agent-os-kernel")
except Exception as exc: # noqa: BLE001
logger.warning(
"agent-os-kernel evaluation raised an exception: %s; "
"falling back to RBAC result to avoid blocking the agent.",
exc,
)
self.emit(
event_type="permission_check",
action=action,
resource=context.get("resource", ""),
outcome="allowed",
actor=context.get("actor"),
policy_decision="toolkit_evaluation_error",
toolkit_mode=self._config.policy_mode,
roles=roles,
)
return rbac_allowed, "toolkit_evaluation_error"
# --- Step 4: Combine results according to policy_mode ---
if self._config.policy_mode == "permissive":
# Toolkit denial is advisory only in permissive mode.
if not toolkit_allowed:
logger.warning(
"Governance toolkit denied action '%s' (reason=%s) but policy_mode "
"is 'permissive' — allowing and logging advisory denial.",
action,
reason,
)
final_allowed = rbac_allowed
else:
# strict: both gates must allow.
final_allowed = rbac_allowed and toolkit_allowed
outcome = "allowed" if final_allowed else "denied"
self.emit(
event_type="permission_check",
action=action,
resource=context.get("resource", ""),
outcome=outcome,
actor=context.get("actor"),
policy_decision=reason or outcome,
evaluator=evaluator_name,
toolkit_mode=self._config.policy_mode,
roles=roles,
)
return final_allowed, reason or "allowed"
def emit(
self,
event_type: str,
action: str,
resource: str,
outcome: str,
actor: str | None = None,
trace_id: str | None = None,
**extra: Any,
) -> str:
"""Write a governance-annotated audit event.
Pulls the current W3C traceparent from the active OTEL span so that
governance decisions are traceable across service boundaries.
Returns
-------
str
The ``trace_id`` produced by ``audit.log_event``.
"""
from builtin_tools import audit # inline import to avoid circular dependencies
from builtin_tools.telemetry import get_current_traceparent # inline import
traceparent: str | None = get_current_traceparent()
recorded_trace_id: str = audit.log_event(
event_type,
action,
resource,
outcome,
actor=actor,
trace_id=trace_id,
governance_toolkit=(
self._config.toolkit if self._toolkit_available else "disabled"
),
traceparent=traceparent or "",
**extra,
)
return recorded_trace_id
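The way `check_permission` combines the two gates can be condensed into a small truth function. This is a sketch of the decision logic only, under the assumption that the toolkit is available and evaluation did not raise; `combine_decisions` is a hypothetical name.

```python
def combine_decisions(rbac_allowed: bool, toolkit_allowed: bool, policy_mode: str) -> bool:
    """Final decision for the 'strict', 'permissive' and 'audit' policy modes."""
    if not rbac_allowed:
        # The RBAC gate always runs first and a denial is final.
        return False
    if policy_mode in ("audit", "permissive"):
        # audit: the toolkit is not consulted; permissive: its denial is advisory.
        return True
    # strict: both gates must allow.
    return toolkit_allowed


for mode in ("strict", "permissive", "audit"):
    print(mode, combine_decisions(True, False, mode))
```

Only strict mode lets a toolkit denial block an action that RBAC allowed; in every mode an RBAC denial wins.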
# ---------------------------------------------------------------------------
# Module-level functions
# ---------------------------------------------------------------------------
async def initialize_governance(config: Any) -> Optional[GovernanceAdapter]:
"""Initialize the module-level GovernanceAdapter singleton.
Called once at startup by main.py when governance.enabled is True.
Returns the adapter, or None if initialization fails.
"""
global _adapter
try:
adapter = GovernanceAdapter(config)
await adapter.initialize()
_adapter = adapter
logger.info(
"Governance singleton initialised — toolkit=%s mode=%s",
config.toolkit,
config.policy_mode,
)
return adapter
except Exception as exc: # noqa: BLE001
logger.warning(
"initialize_governance() failed: %s — governance disabled for this session.",
exc,
)
return None
def get_governance_adapter() -> Optional[GovernanceAdapter]:
    """Return the module-level GovernanceAdapter singleton (may be None)."""
    return _adapter
def check_permission_with_governance(
action: str,
roles: list[str],
custom_permissions: dict | None = None,
context: dict | None = None,
) -> tuple[bool, str]:
"""Convenience wrapper: use GovernanceAdapter when available, else RBAC only.
Parameters
----------
action:
The action name to evaluate (e.g. ``"memory.write"``).
roles:
The list of role names held by the requesting actor.
custom_permissions:
Optional custom role→action mapping to overlay on built-in roles.
context:
Optional extra context forwarded to the PolicyEvaluator.
Returns
-------
tuple[bool, str]
``(allowed, reason)``
"""
if _adapter is None:
from builtin_tools import audit # inline import to avoid circular dependencies
result: bool = audit.check_permission(action, roles, custom_permissions)
return result, "rbac_only"
return _adapter.check_permission(action, roles, custom_permissions, context)
# ---------------------------------------------------------------------------
# Private helper
# ---------------------------------------------------------------------------
def _emit_governance_event(
event_type: str,
action: str,
resource: str,
outcome: str,
actor: str | None = None,
trace_id: str | None = None,
**extra: Any,
) -> Optional[str]:
"""Emit a governance audit event via the singleton adapter if one is set.
Returns the trace_id produced by log_event, or None if no adapter is set.
"""
if _adapter is None:
return None
return _adapter.emit(
event_type,
action,
resource,
outcome,
actor=actor,
trace_id=trace_id,
**extra,
)
-561
@@ -1,561 +0,0 @@
"""Human-In-The-Loop (HITL) workflow primitives.
Generalizes the approval tool into reusable HITL building blocks that work
across all Molecule AI adapters.
Features
--------
@requires_approval
Decorator that gates *any* async callable (tool, method, standalone fn)
behind a human approval request. The decorated function only runs if
the request is granted. Roles in ``hitl.bypass_roles`` skip the gate.
pause_task / resume_task
LangChain tools for explicit pause/resume of in-flight tasks. An agent
calls ``pause_task(task_id, reason)`` to suspend itself; an external
signal (webhook, dashboard click, another agent) calls ``resume_task``
with the same task_id to wake it up.
Notification channels
---------------------
Configured under ``hitl:`` in ``config.yaml``:

    hitl:
      channels:
        - type: dashboard          # always active; uses platform approval API
        - type: slack
          webhook_url: https://hooks.slack.com/services/…
        - type: email
          smtp_host: smtp.example.com
          smtp_port: 587
          from: alerts@example.com
          to: ops@example.com
          username: alerts@example.com   # optional; password from SMTP_PASSWORD env
      default_timeout: 300         # seconds before an unanswered request times out
      bypass_roles: [admin]        # roles that skip the approval gate entirely

Environment variables
---------------------
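SMTP_PASSWORD   Password for SMTP authentication. Keeping it in the environment
                is preferred over storing it in the config file; a ``password``
                key in the email channel config, if present, takes precedence.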
"""
from __future__ import annotations
import asyncio
import functools
import logging
import os
import smtplib
from dataclasses import dataclass, field
from email.mime.text import MIMEText
from typing import Any, Callable
import httpx
from langchain_core.tools import tool
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
@dataclass
class HITLConfig:
    """HITL settings loaded from the ``hitl:`` block in config.yaml."""

    channels: list[dict] = field(default_factory=lambda: [{"type": "dashboard"}])
    default_timeout: float = 300.0
    bypass_roles: list[str] = field(default_factory=list)
def _load_hitl_config() -> HITLConfig:
    """Load HITL config from workspace config; fall back to safe defaults."""
    try:
        from config import load_config
        cfg = load_config()
        raw = getattr(cfg, "hitl", None)
        if raw is None:
            return HITLConfig()
        # getattr with a default is equivalent to the hasattr checks it replaces.
        return HITLConfig(
            channels=getattr(raw, "channels", [{"type": "dashboard"}]),
            default_timeout=float(getattr(raw, "default_timeout", 300)),
            bypass_roles=list(getattr(raw, "bypass_roles", [])),
        )
    except Exception:
        return HITLConfig()
# ---------------------------------------------------------------------------
# Pause / Resume registry
# ---------------------------------------------------------------------------
class _TaskPauseRegistry:
"""In-process registry mapping task_id → asyncio.Event + optional result.
Multiple coroutines awaiting the same task_id are all unblocked when
``resume()`` is called. Results survive until the awaiting coroutine
calls ``pop_result()``.
"""
def __init__(self) -> None:
self._events: dict[str, asyncio.Event] = {}
self._results: dict[str, dict] = {}
# #265: owner map — workspace_id that created each task.
# Empty string means "no owner / legacy" (bypasses ownership check).
self._owners: dict[str, str] = {}
def register(self, task_id: str, owner: str = "") -> asyncio.Event:
"""Create and store an Event for *task_id*. Returns the event.
Args:
task_id: Unique task identifier.
owner: Workspace ID that owns this task. When set, ``resume``
will reject callers from a different workspace.
"""
ev = asyncio.Event()
self._events[task_id] = ev
self._owners[task_id] = owner
return ev
def resume(self, task_id: str, result: dict | None = None, owner: str = "") -> bool:
"""Signal the Event for *task_id*. Returns False if not registered.
Args:
task_id: The identifier used in ``register``.
result: Optional result payload forwarded to the waiting coroutine.
owner: Caller's workspace ID. When both the stored owner and
*owner* are non-empty and they differ, the call is rejected
(returns False) — prevents cross-workspace prompt injection
(#265). Passing ``owner=""`` bypasses the check (used in
direct registry calls from tests and platform code).
"""
# #265 ownership check
stored_owner = self._owners.get(task_id, "")
if owner and stored_owner and owner != stored_owner:
logger.warning(
"HITL: resume rejected for task %s — caller workspace %r != owner %r",
task_id, owner, stored_owner,
)
return False
ev = self._events.get(task_id)
if ev is None:
return False
self._results[task_id] = result or {}
ev.set()
return True
def pop_result(self, task_id: str) -> dict:
"""Return and remove the stored result for *task_id*."""
return self._results.pop(task_id, {})
def cleanup(self, task_id: str) -> None:
"""Remove *task_id* from all dicts."""
self._events.pop(task_id, None)
self._results.pop(task_id, None)
self._owners.pop(task_id, None)
def list_paused(self) -> list[str]:
"""Return IDs of tasks whose events have not yet been set."""
return [tid for tid, ev in self._events.items() if not ev.is_set()]
# Global singleton — safe within one asyncio event loop / process
pause_registry = _TaskPauseRegistry()
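The pause/resume flow, including the #265 ownership check, can be exercised end to end with a trimmed copy of the registry. `PauseRegistry` below is a simplified stand-in for `_TaskPauseRegistry`, and the workspace IDs and `demo` coroutine are hypothetical.

```python
import asyncio


class PauseRegistry:
    """Simplified stand-in: task_id -> asyncio.Event, result and owner."""

    def __init__(self):
        self._events = {}
        self._results = {}
        self._owners = {}

    def register(self, task_id, owner=""):
        self._events[task_id] = asyncio.Event()
        self._owners[task_id] = owner
        return self._events[task_id]

    def resume(self, task_id, result=None, owner=""):
        stored = self._owners.get(task_id, "")
        if owner and stored and owner != stored:
            return False  # caller is from a different workspace
        ev = self._events.get(task_id)
        if ev is None:
            return False
        self._results[task_id] = result or {}
        ev.set()
        return True

    def pop_result(self, task_id):
        return self._results.pop(task_id, {})


async def demo():
    reg = PauseRegistry()
    ev = reg.register("task-1", owner="ws-a")
    # A resume attempt from another workspace is rejected and wakes nobody.
    rejected = reg.resume("task-1", {"approved": True}, owner="ws-b")
    # The owning workspace resumes shortly afterwards.
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, reg.resume, "task-1", {"approved": True}, "ws-a")
    await asyncio.wait_for(ev.wait(), timeout=1.0)
    return rejected, reg.pop_result("task-1")


rejected, payload = asyncio.run(demo())
print(rejected, payload)
```

The `asyncio.wait_for` timeout is there so the demo cannot hang if the resume never arrives; a real caller would more likely use the configured `default_timeout`.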
# ---------------------------------------------------------------------------
# Notification channels
# ---------------------------------------------------------------------------
async def _notify_channels(
action: str,
reason: str,
approval_id: str,
cfg: HITLConfig,
) -> None:
"""Fire-and-forget notifications to all configured channels.
Errors in individual channels are logged but never re-raised so that a
misconfigured Slack webhook cannot block the approval flow.
"""
platform_url = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
workspace_id = os.environ.get("WORKSPACE_ID", "")
for channel in cfg.channels:
ch_type = channel.get("type", "dashboard")
try:
if ch_type == "slack":
await _notify_slack(channel, action, reason, approval_id,
platform_url, workspace_id)
elif ch_type == "email":
await _notify_email(channel, action, reason, approval_id,
platform_url, workspace_id)
# "dashboard" is handled by the platform via the approval POST
except Exception as exc:
logger.warning("HITL: channel '%s' notification failed: %s", ch_type, exc)
async def _notify_slack(
cfg: dict,
action: str,
reason: str,
approval_id: str,
platform_url: str,
workspace_id: str,
) -> None:
webhook_url = cfg.get("webhook_url", "")
if not webhook_url:
return
approve_url = f"{platform_url}/workspaces/{workspace_id}/approvals/{approval_id}/approve"
deny_url = f"{platform_url}/workspaces/{workspace_id}/approvals/{approval_id}/deny"
payload = {
"text": f":warning: Approval required from workspace `{workspace_id}`",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": (
f"*Action:* {action}\n"
f"*Reason:* {reason}\n"
f"*Approval ID:* `{approval_id}`"
),
},
},
{
"type": "actions",
"elements": [
{
"type": "button",
"text": {"type": "plain_text", "text": "Approve"},
"style": "primary",
"url": approve_url,
},
{
"type": "button",
"text": {"type": "plain_text", "text": "Deny"},
"style": "danger",
"url": deny_url,
},
],
},
],
}
async with httpx.AsyncClient(timeout=10.0) as client:
await client.post(webhook_url, json=payload)
logger.info("HITL: Slack notification sent for approval %s", approval_id)
async def _notify_email(
cfg: dict,
action: str,
reason: str,
approval_id: str,
platform_url: str,
workspace_id: str,
) -> None:
smtp_host = cfg.get("smtp_host", "")
smtp_port = int(cfg.get("smtp_port", 587))
from_addr = cfg.get("from", "")
to_addr = cfg.get("to", "")
if not all([smtp_host, from_addr, to_addr]):
logger.warning("HITL: email channel missing smtp_host/from/to — skipping")
return
approve_url = f"{platform_url}/workspaces/{workspace_id}/approvals/{approval_id}/approve"
deny_url = f"{platform_url}/workspaces/{workspace_id}/approvals/{approval_id}/deny"
body = (
f"Approval required from workspace {workspace_id}\n\n"
f"Action : {action}\n"
f"Reason : {reason}\n"
f"ID : {approval_id}\n\n"
f"Approve: {approve_url}\n"
f"Deny : {deny_url}\n"
)
msg = MIMEText(body, "plain", "utf-8")
msg["Subject"] = f"[Molecule AI] Approval required: {action}"
msg["From"] = from_addr
msg["To"] = to_addr
username = cfg.get("username", "")
password = cfg.get("password", os.environ.get("SMTP_PASSWORD", ""))
def _send() -> None:
with smtplib.SMTP(smtp_host, smtp_port) as srv:
srv.ehlo()
srv.starttls()
if username and password:
srv.login(username, password)
srv.send_message(msg)
await asyncio.to_thread(_send)
logger.info("HITL: email notification sent for approval %s", approval_id)
# ---------------------------------------------------------------------------
# @requires_approval decorator
# ---------------------------------------------------------------------------
def requires_approval(
action_description: str = "",
reason_template: str = "",
bypass_roles: list[str] | None = None,
) -> Callable[[Callable], Callable]:
"""Decorator that gates an async callable behind a human approval request.
The wrapped function executes only when a human approves. Use this on
any tool or async helper that performs destructive or high-impact work.
Args:
action_description: Short label for the action shown to the approver.
Defaults to the function's ``name`` attribute or
``__name__``.
reason_template: ``str.format`` template for the reason line. Keyword
arguments of the decorated function are available,
e.g. ``"Delete table {table_name}"``. Positional arguments are
not substituted.
bypass_roles: Roles that skip the gate entirely. Overrides
``hitl.bypass_roles`` in config.yaml when given.
Returns:
A decorator; applying it to a function returns an async wrapper.
Usage::
@tool
@requires_approval("Wipe production DB", bypass_roles=["admin"])
async def drop_table(table_name: str) -> dict:
...
# Works with plain async functions too:
@requires_approval("Send customer email")
async def send_email(to: str, body: str) -> dict:
...
"""
def decorator(fn: Callable) -> Callable:
action = action_description or getattr(fn, "name", None) or fn.__name__
@functools.wraps(fn)
async def wrapper(*args: Any, **kwargs: Any) -> Any:
hitl_cfg = _load_hitl_config()
# --- Check bypass roles -----------------------------------------
active_bypass = bypass_roles if bypass_roles is not None else hitl_cfg.bypass_roles
if active_bypass:
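bypassed = False
try:
from builtin_tools.audit import get_workspace_roles
roles, _ = get_workspace_roles()
bypassed = any(r in active_bypass for r in roles)
except Exception:
pass # If RBAC check fails, proceed to approval gate
if bypassed:
# Call fn outside the try block so that its own exceptions
# propagate instead of being swallowed and re-routed to the gate.
logger.info(
"@requires_approval bypassed (role %s) for '%s'", roles, action
)
return await fn(*args, **kwargs)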
# --- Build reason string -----------------------------------------
if reason_template:
try:
reason = reason_template.format(**kwargs)
except (KeyError, IndexError):
reason = reason_template
else:
arg_parts = [f"{k}={str(v)[:60]}" for k, v in list(kwargs.items())[:3]]
reason = f"Args: {', '.join(arg_parts)}" if arg_parts else "Automated action"
# --- Fire non-dashboard notifications (async, non-blocking) ------
asyncio.create_task(
_notify_channels(action, reason, "pending", hitl_cfg)
)
# --- Request approval via approval tool --------------------------
try:
from builtin_tools.approval import request_approval
approval_result = await request_approval.ainvoke(
{"action": action, "reason": reason}
)
except Exception as exc:
logger.error("@requires_approval: approval call failed: %s", exc)
return {
"success": False,
"error": f"Approval gate error: {exc}",
}
if not approval_result.get("approved"):
# Art. 14 audit: log the denial outcome so the activity log
# contains evidence that the human oversight gate was exercised.
try:
from builtin_tools.audit import log_event
log_event(
event_type="hitl",
action="approve",
resource=action,
outcome="denied",
actor=approval_result.get("decided_by"),
approval_id=approval_result.get("approval_id"),
reason=reason,
)
except Exception:
pass
return {
"success": False,
"error": (
f"Action '{action}' not approved: "
f"{approval_result.get('message', approval_result.get('error', 'denied'))}"
),
"approval_id": approval_result.get("approval_id"),
}
# Art. 14 audit: log the approval grant before running the function.
try:
from builtin_tools.audit import log_event
log_event(
event_type="hitl",
action="approve",
resource=action,
outcome="granted",
actor=approval_result.get("decided_by"),
approval_id=approval_result.get("approval_id"),
reason=reason,
)
except Exception:
pass
# --- Approved — run the original function ------------------------
return await fn(*args, **kwargs)
return wrapper
return decorator
# ---------------------------------------------------------------------------
# Pause / Resume LangChain tools
# ---------------------------------------------------------------------------
@tool
async def pause_task(task_id: str, reason: str = "") -> dict:
"""Suspend the current task and wait for a resume signal.
The agent calls this to pause itself at a decision point. Execution
resumes when ``resume_task`` is called with the same task_id. If no resume
signal arrives within ``hitl.default_timeout`` seconds, the call returns
``resumed=False`` with a timeout error.
Args:
task_id: Unique identifier for this pause point (use the A2A task ID
or any stable string that the caller can reference later).
reason: Human-readable description of why the task is pausing.
"""
# #265: record workspace ownership on registration so resume_task can
# reject callers from a different workspace (cross-workspace prompt-injection
# prevention). External task_id is unchanged — only internal ownership
# metadata is added, so no tests or callers need to update their task IDs.
_ws = os.environ.get("WORKSPACE_ID", "")
try:
from builtin_tools.audit import log_event
log_event(
event_type="hitl",
action="pause",
resource=task_id,
outcome="paused",
trace_id=task_id,
reason=reason,
)
except Exception:
pass
event = pause_registry.register(task_id, owner=_ws)
timeout = _load_hitl_config().default_timeout
logger.info("HITL: task %s paused — %s", task_id, reason or "(no reason given)")
try:
await asyncio.wait_for(event.wait(), timeout=timeout)
result = pause_registry.pop_result(task_id)
logger.info("HITL: task %s resumed", task_id)
try:
from builtin_tools.audit import log_event
log_event(
event_type="hitl",
action="resume",
resource=task_id,
outcome="resumed",
trace_id=task_id,
)
except Exception:
pass
return {"resumed": True, "task_id": task_id, **result}
except asyncio.TimeoutError:
logger.warning("HITL: task %s timed out after %.0fs", task_id, timeout)
try:
from builtin_tools.audit import log_event
log_event(
event_type="hitl",
action="pause",
resource=task_id,
outcome="timeout",
trace_id=task_id,
timeout_seconds=timeout,
)
except Exception:
pass
return {
"resumed": False,
"task_id": task_id,
"error": f"Timed out after {timeout:.0f}s waiting for resume signal",
}
finally:
pause_registry.cleanup(task_id)
@tool
async def resume_task(task_id: str, message: str = "") -> dict:
"""Resume a previously paused task.
Signals the ``pause_task`` coroutine waiting on *task_id* to continue.
Safe to call even if the task has already resumed or timed out; in that
case, or when the caller's workspace does not own the task, the call
returns ``success=False``.
Args:
task_id: The identifier passed to ``pause_task``.
message: Optional message forwarded to the resumed task.
"""
# #265: pass caller's workspace ID so the registry can reject a resume
# from a different workspace (ownership check in _TaskPauseRegistry.resume).
_ws = os.environ.get("WORKSPACE_ID", "")
result_payload = {"message": message} if message else {}
success = pause_registry.resume(task_id, result_payload, owner=_ws)
if success:
logger.info("HITL: resume signal sent for task %s", task_id)
try:
from builtin_tools.audit import log_event
log_event(
event_type="hitl",
action="resume",
resource=task_id,
outcome="success",
trace_id=task_id,
message=message,
)
except Exception:
pass
return {"success": True, "task_id": task_id}
return {
"success": False,
"task_id": task_id,
"error": "Task not found or already resumed",
}
@tool
async def list_paused_tasks() -> dict:
"""List all tasks currently suspended and waiting for a resume signal."""
paused = pause_registry.list_paused()
return {"paused_tasks": paused, "count": len(paused)}
-470
@@ -1,470 +0,0 @@
"""HMA memory tools for agents.
Hierarchical Memory Architecture:
- LOCAL: private to this workspace, invisible to others
- TEAM: shared with parent + siblings (same team)
- GLOBAL: readable by all, writable by root workspaces only
RBAC enforcement
----------------
``commit_memory`` requires the ``"memory.write"`` action.
``recall_memory`` requires the ``"memory.read"`` action.
Roles are read from ``config.yaml`` under ``rbac.roles`` (default: operator).
Audit trail
-----------
Every memory operation appends a JSON Lines record to the audit log:
memory / memory.write / allowed — write permitted by RBAC
memory / memory.write / success — write committed successfully
memory / memory.write / failure — write failed (platform error)
memory / memory.read / allowed — read permitted by RBAC
memory / memory.read / success — search returned results
memory / memory.read / failure — search failed (platform error)
RBAC denials emit ``rbac / rbac.deny / denied`` events instead.
"""
import json
import os
import uuid
from types import SimpleNamespace
from typing import Any
from langchain_core.tools import tool
from builtin_tools.awareness_client import build_awareness_client
from builtin_tools.audit import check_permission, get_workspace_roles, log_event
from builtin_tools.security import _redact_secrets
from builtin_tools.telemetry import MEMORY_QUERY, MEMORY_SCOPE, WORKSPACE_ID_ATTR, get_tracer
try: # pragma: no cover - optional runtime dependency in lightweight test envs
import httpx # type: ignore
except ImportError: # pragma: no cover
httpx = SimpleNamespace(AsyncClient=None)
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
@tool
async def commit_memory(content: str, scope: str = "LOCAL") -> dict:
"""Store a fact in memory with a specific scope.
Args:
content: The fact or knowledge to remember.
scope: Memory scope — LOCAL (private), TEAM (shared with team), or GLOBAL (company-wide, root only).
"""
content = _redact_secrets(content)
trace_id = str(uuid.uuid4())
scope = scope.upper()
if scope not in ("LOCAL", "TEAM", "GLOBAL"):
return {"error": "scope must be LOCAL, TEAM, or GLOBAL"}
# --- RBAC check -----------------------------------------------------------
roles, custom_perms = get_workspace_roles()
if not check_permission("memory.write", roles, custom_perms):
log_event(
event_type="rbac",
action="rbac.deny",
resource=scope,
outcome="denied",
trace_id=trace_id,
attempted_action="memory.write",
roles=roles,
)
return {
"success": False,
"error": (
"RBAC: this workspace does not have the 'memory.write' permission. "
f"Current roles: {roles}"
),
}
log_event(
event_type="memory",
action="memory.write",
resource=scope,
outcome="allowed",
trace_id=trace_id,
memory_scope=scope,
content_length=len(content),
)
# ── OTEL: memory_write span ──────────────────────────────────────────────
tracer = get_tracer()
with tracer.start_as_current_span("memory_write") as mem_span:
mem_span.set_attribute(WORKSPACE_ID_ATTR, WORKSPACE_ID)
mem_span.set_attribute(MEMORY_SCOPE, scope)
mem_span.set_attribute("memory.content_length", len(content))
awareness_client = build_awareness_client()
if awareness_client is not None:
try:
result = await awareness_client.commit(content, scope)
except Exception as e:
log_event(
event_type="memory",
action="memory.write",
resource=scope,
outcome="failure",
trace_id=trace_id,
memory_scope=scope,
error=str(e),
)
try:
mem_span.record_exception(e)
except Exception:
pass
return {"success": False, "error": str(e)}
else:
# #215-class bug: platform now gates /workspaces/:id/memories behind
# workspace auth. Import auth_headers lazily (same pattern as the
# activity-log path below) so test environments that don't ship
# platform_auth still work.
try:
from platform_auth import auth_headers as _auth
_headers = _auth()
except Exception:
_headers = {}
async with httpx.AsyncClient(timeout=10.0) as client:
try:
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
json={"content": content, "scope": scope},
headers=_headers,
)
if resp.status_code == 201:
result = {"success": True, "id": resp.json().get("id"), "scope": scope}
else:
result = {"success": False, "error": resp.json().get("error", resp.text)}
except Exception as e:
log_event(
event_type="memory",
action="memory.write",
resource=scope,
outcome="failure",
trace_id=trace_id,
memory_scope=scope,
error=str(e),
)
try:
mem_span.record_exception(e)
except Exception:
pass
return {"success": False, "error": str(e)}
if result.get("success"):
mem_span.set_attribute("memory.id", result.get("id") or "")
mem_span.set_attribute("memory.success", True)
log_event(
event_type="memory",
action="memory.write",
resource=scope,
outcome="success",
trace_id=trace_id,
memory_scope=scope,
memory_id=result.get("id"),
)
# #125: surface memory writes in /activity so the Canvas
# "Agent Comms" tab shows what an agent chose to remember.
# Fire-and-forget — failure here must not poison the tool
# response since the memory write itself already succeeded.
await _record_memory_activity(scope, content, result.get("id"))
await _maybe_log_skill_promotion(content, scope, result)
else:
mem_span.set_attribute("memory.success", False)
log_event(
event_type="memory",
action="memory.write",
resource=scope,
outcome="failure",
trace_id=trace_id,
memory_scope=scope,
error=result.get("error"),
)
return result
@tool
async def recall_memory(query: str = "", scope: str = "") -> dict:
"""Search stored memories.
Args:
query: Text to search for (empty returns all).
scope: Filter by scope — LOCAL, TEAM, GLOBAL, or empty for all accessible.
"""
trace_id = str(uuid.uuid4())
scope = scope.upper()
if scope and scope not in ("LOCAL", "TEAM", "GLOBAL"):
return {"error": "scope must be LOCAL, TEAM, GLOBAL, or empty"}
# --- RBAC check -----------------------------------------------------------
roles, custom_perms = get_workspace_roles()
if not check_permission("memory.read", roles, custom_perms):
log_event(
event_type="rbac",
action="rbac.deny",
resource=scope or "all",
outcome="denied",
trace_id=trace_id,
attempted_action="memory.read",
roles=roles,
)
return {
"success": False,
"error": (
"RBAC: this workspace does not have the 'memory.read' permission. "
f"Current roles: {roles}"
),
}
log_event(
event_type="memory",
action="memory.read",
resource=scope or "all",
outcome="allowed",
trace_id=trace_id,
memory_scope=scope or "all",
query_length=len(query),
)
# ── OTEL: memory_read span ───────────────────────────────────────────────
tracer = get_tracer()
with tracer.start_as_current_span("memory_read") as mem_span:
mem_span.set_attribute(WORKSPACE_ID_ATTR, WORKSPACE_ID)
mem_span.set_attribute(MEMORY_SCOPE, scope or "all")
mem_span.set_attribute(MEMORY_QUERY, query[:256] if query else "")
awareness_client = build_awareness_client()
if awareness_client is not None:
try:
result = await awareness_client.search(query, scope)
mem_span.set_attribute("memory.result_count", result.get("count", 0))
mem_span.set_attribute("memory.success", result.get("success", False))
log_event(
event_type="memory",
action="memory.read",
resource=scope or "all",
outcome="success" if result.get("success") else "failure",
trace_id=trace_id,
memory_scope=scope or "all",
result_count=result.get("count", 0),
)
return result
except Exception as e:
log_event(
event_type="memory",
action="memory.read",
resource=scope or "all",
outcome="failure",
trace_id=trace_id,
memory_scope=scope or "all",
error=str(e),
)
try:
mem_span.record_exception(e)
except Exception:
pass
return {"success": False, "error": str(e)}
params = {}
if query:
params["q"] = query
if scope:
params["scope"] = scope.upper()
# #215-class bug (search path): same fix as commit_memory above —
# the platform gates GET /workspaces/:id/memories behind workspace
# auth, so without auth_headers() every search silently 401s and the
# agent thinks its backlog is empty (observed on Technical Researcher
# idle-loop pilot 2026-04-15).
try:
from platform_auth import auth_headers as _auth
_headers = _auth()
except Exception:
_headers = {}
async with httpx.AsyncClient(timeout=10.0) as client:
try:
resp = await client.get(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
params=params,
headers=_headers,
)
if resp.status_code == 200:
memories = resp.json()
mem_span.set_attribute("memory.result_count", len(memories))
mem_span.set_attribute("memory.success", True)
log_event(
event_type="memory",
action="memory.read",
resource=scope or "all",
outcome="success",
trace_id=trace_id,
memory_scope=scope or "all",
result_count=len(memories),
)
return {
"success": True,
"count": len(memories),
"memories": memories,
}
mem_span.set_attribute("memory.success", False)
log_event(
event_type="memory",
action="memory.read",
resource=scope or "all",
outcome="failure",
trace_id=trace_id,
memory_scope=scope or "all",
http_status=resp.status_code,
)
return {"success": False, "error": resp.json().get("error", resp.text)}
except Exception as e:
log_event(
event_type="memory",
action="memory.read",
resource=scope or "all",
outcome="failure",
trace_id=trace_id,
memory_scope=scope or "all",
error=str(e),
)
try:
mem_span.record_exception(e)
except Exception:
pass
return {"success": False, "error": str(e)}
def _parse_promotion_packet(content: str) -> dict[str, Any] | None:
"""Return a structured memory packet when content looks like promotion metadata."""
text = content.strip()
if not text.startswith("{"):
return None
try:
payload = json.loads(text)
except json.JSONDecodeError:
return None
if not isinstance(payload, dict): # pragma: no cover
return None
if not payload.get("promote_to_skill"):
return None
return payload
async def _record_memory_activity(scope: str, content: str, memory_id: str | None) -> None:
"""Surface a successful memory write as an activity row so the Canvas
"Agent Comms" tab can display what an agent chose to remember.
Fire-and-forget — never raises. #125.
The summary is intentionally short (scope tag + first 80 chars of
content with a ``…`` ellipsis when truncated) so the activity table
stays readable; full content lives in ``agent_memories``.
"""
workspace_id = WORKSPACE_ID.strip()
platform_url = PLATFORM_URL.strip().rstrip("/")
if not workspace_id or not platform_url:
return
preview = content.strip().replace("\n", " ")
if len(preview) > 80:
preview = preview[:80] + "…"
summary = f"[{scope}] {preview}"
# NOTE: target_id is a UUID column scoped to workspace_id references —
# cannot hold awareness/memory IDs (which are arbitrary strings).
# We embed the memory_id in the summary instead so it's still searchable.
if memory_id:
summary = f"{summary} (id={memory_id[:24]})"
payload: dict[str, Any] = {
"workspace_id": workspace_id,
"activity_type": "memory_write",
"summary": summary,
"status": "ok",
}
try:
try:
from platform_auth import auth_headers as _auth
_headers = _auth()
except Exception:
_headers = {}
async with httpx.AsyncClient(timeout=5.0) as client:
await client.post(
f"{platform_url}/workspaces/{workspace_id}/activity",
json=payload,
headers=_headers,
)
except Exception:
# Activity logging is purely observability — never poison the
# tool response on a failure here. We don't even log_event the
# failure since the memory write itself succeeded and that's
# what matters to the caller.
pass
async def _maybe_log_skill_promotion(content: str, scope: str, memory_result: dict) -> None:
"""Best-effort activity log for durable memory entries that should become skills."""
packet = _parse_promotion_packet(content)
if packet is None:
return
workspace_id = WORKSPACE_ID.strip()
platform_url = PLATFORM_URL.strip().rstrip("/")
if not workspace_id or not platform_url:
return
repetition_signal = packet.get("repetition_signal")
summary = (
packet.get("summary")
or packet.get("title")
or packet.get("what changed")
or "Repeatable workflow promoted to skill candidate"
)
metadata: dict[str, Any] = {
"source": "memory-curation",
"scope": scope,
"memory_id": memory_result.get("id"),
"promote_to_skill": True,
"repetition_signal": repetition_signal,
"memory_packet": packet,
}
payload = {
"activity_type": "skill_promotion",
"method": "memory/skill-promotion",
"summary": summary,
"status": "ok",
"source_id": workspace_id,
"request_body": packet,
"metadata": metadata,
}
try:
async with httpx.AsyncClient(timeout=5.0) as client:
await client.post(
f"{platform_url}/workspaces/{workspace_id}/activity",
json=payload,
)
await client.post(
f"{platform_url}/registry/heartbeat",
json={
"workspace_id": workspace_id,
"error_rate": 0,
"sample_error": "",
"active_tasks": 1,
"uptime_seconds": 0,
"current_task": f"Skill promotion: {summary}",
},
)
except Exception:
# Best-effort observability only. Memory commits must never fail because
# the promotion log could not be written.
return
-281
@@ -1,281 +0,0 @@
"""Code sandbox tool for safe code execution.
Executes code in an isolated environment. Three backends are supported:
subprocess (default)
Runs code locally via asyncio subprocess with a hard timeout.
Best for Tier 1/2 agents where run_code is lightly used and the
workspace container itself is the isolation boundary.
docker
Throwaway Docker-in-Docker container: network disabled, memory capped,
read-only filesystem. Requires Docker socket access inside the container.
Best for Tier 3 on-prem deployments.
e2b
Cloud-hosted microVM sandbox via E2B (https://e2b.dev).
No local Docker required — code runs in E2B's isolated cloud VMs.
Supports Python and JavaScript.
Requires:
- e2b-code-interpreter Python package (pinned in requirements.txt)
- E2B_API_KEY workspace secret (set via canvas Secrets panel or API)
Best for hosted/cloud Molecule AI deployments.
Backend is selected via the SANDBOX_BACKEND env var, which the provisioner
sets from config.yaml → sandbox.backend. Default: "subprocess".
"""
import asyncio
import logging
import os
import tempfile
from langchain_core.tools import tool
logger = logging.getLogger(__name__)
SANDBOX_BACKEND = os.environ.get("SANDBOX_BACKEND", "subprocess")
SANDBOX_TIMEOUT = int(os.environ.get("SANDBOX_TIMEOUT", "30"))
SANDBOX_MEMORY_LIMIT = os.environ.get("SANDBOX_MEMORY_LIMIT", "256m")
MAX_OUTPUT = 10_000
# E2B kernel names differ from internal language names.
_E2B_KERNEL_MAP = {
"python": "python3",
"javascript": "js",
"js": "js",
}
@tool
async def run_code(code: str, language: str = "python") -> dict:
"""Execute code in an isolated sandbox and return the output.
Args:
code: The code to execute.
language: Programming language — python, javascript, or shell.
The e2b backend supports python and javascript only.
"""
if SANDBOX_BACKEND == "docker":
return await _run_docker(code, language)
elif SANDBOX_BACKEND == "e2b":
return await _run_e2b(code, language)
else:
return await _run_subprocess(code, language)
async def _run_subprocess(code: str, language: str) -> dict:
"""Default backend: run code in a local subprocess with a hard timeout."""
cmd_map = {
"python": ["python3", "-c"],
"javascript": ["node", "-e"],
"shell": ["sh", "-c"],
"bash": ["bash", "-c"],
}
cmd_prefix = cmd_map.get(language)
if not cmd_prefix:
return {"error": f"Unsupported language: {language}", "exit_code": -1}
try:
proc = await asyncio.create_subprocess_exec(
*cmd_prefix, code,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=SANDBOX_TIMEOUT)
return {
"exit_code": proc.returncode,
"stdout": stdout.decode("utf-8", errors="replace")[:MAX_OUTPUT],
"stderr": stderr.decode("utf-8", errors="replace")[:MAX_OUTPUT],
"language": language,
"backend": "subprocess",
}
except asyncio.TimeoutError:
try:
proc.kill()
await proc.wait()
except ProcessLookupError:
pass
return {"error": f"Timeout after {SANDBOX_TIMEOUT}s", "exit_code": -1}
except Exception as e:
return {"error": str(e), "exit_code": -1}
async def _run_docker(code: str, language: str) -> dict:
"""Run code in a throwaway Docker container via mounted temp file."""
image_map = {
"python": ("python:3.11-slim", ["python3", "/sandbox/code.py"]),
"javascript": ("node:20-slim", ["node", "/sandbox/code.js"]),
"shell": ("alpine:3.18", ["sh", "/sandbox/code.sh"]),
"bash": ("alpine:3.18", ["sh", "/sandbox/code.sh"]),
}
entry = image_map.get(language)
if not entry:
return {"error": f"Unsupported language: {language}", "exit_code": -1}
image, run_cmd = entry
code_file = None
try:
# Write code to temp file — avoids shell metacharacter injection
ext = {"python": ".py", "javascript": ".js", "shell": ".sh", "bash": ".sh"}.get(language, ".txt")
fd, code_file = tempfile.mkstemp(suffix=ext, prefix="sandbox_")
with os.fdopen(fd, "w") as f:
f.write(code)
cmd = [
"docker", "run", "--rm",
"--network", "none",
"--memory", SANDBOX_MEMORY_LIMIT,
"--cpus", "0.5",
"--read-only",
"--tmpfs", "/tmp:size=32m",
"-v", f"{code_file}:/sandbox/code{ext}:ro",
image,
] + run_cmd
proc = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=SANDBOX_TIMEOUT)
return {
"exit_code": proc.returncode,
"stdout": stdout.decode("utf-8", errors="replace")[:MAX_OUTPUT],
"stderr": stderr.decode("utf-8", errors="replace")[:MAX_OUTPUT],
"language": language,
"backend": "docker",
"image": image,
}
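except asyncio.TimeoutError:
# Kill the docker CLI process so it does not linger after the timeout
# (mirrors the subprocess backend).
try:
proc.kill()
await proc.wait()
except ProcessLookupError:
pass
return {"error": f"Timeout after {SANDBOX_TIMEOUT}s", "exit_code": -1}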
except Exception as e:
return {"error": str(e), "exit_code": -1}
finally:
if code_file:
try:
os.unlink(code_file)
except OSError:
pass
async def _run_e2b(code: str, language: str) -> dict:
"""Run code in an E2B cloud microVM sandbox.
Requires the e2b-code-interpreter package and an E2B_API_KEY secret.
Each call creates a fresh sandbox, runs the code, and destroys the sandbox.
Sandbox lifetime is bounded by SANDBOX_TIMEOUT seconds.
Supported languages: python, javascript.
"""
# Import lazily so the package is only required when the e2b backend is
# actually configured — other backends work without it installed.
try:
from e2b_code_interpreter import Sandbox
except ImportError:
return {
"error": (
"e2b-code-interpreter is not installed. "
"Add it to requirements.txt or switch to the docker/subprocess backend."
),
"exit_code": -1,
}
api_key = os.environ.get("E2B_API_KEY")
if not api_key:
return {
"error": (
"E2B_API_KEY is not set. "
"Add it as a workspace secret via the canvas Secrets panel or platform API."
),
"exit_code": -1,
}
kernel = _E2B_KERNEL_MAP.get(language)
if kernel is None:
return {
"error": (
f"Language '{language}' is not supported by the e2b backend. "
"Supported: python, javascript."
),
"exit_code": -1,
}
sandbox = None
try:
# Create a fresh sandbox for this execution.
# timeout controls the sandbox lifetime in seconds.
sandbox = await asyncio.wait_for(
asyncio.get_running_loop().run_in_executor(
None,
lambda: Sandbox(api_key=api_key, timeout=SANDBOX_TIMEOUT),
),
timeout=SANDBOX_TIMEOUT,
)
# Execute code and collect results.
execution = await asyncio.wait_for(
asyncio.get_running_loop().run_in_executor(
None,
lambda: sandbox.run_code(code, language=kernel),
),
timeout=SANDBOX_TIMEOUT,
)
# E2B returns a list of Result objects; collect text/error output.
stdout_parts = []
stderr_parts = []
for result in execution.results:
# result.text is the primary output (stdout equivalent)
if hasattr(result, "text") and result.text:
stdout_parts.append(str(result.text))
# Some result types expose an error attribute
if hasattr(result, "error") and result.error:
stderr_parts.append(str(result.error))
# Logs are stored separately in execution.logs
if hasattr(execution, "logs"):
logs = execution.logs
if hasattr(logs, "stdout") and logs.stdout:
stdout_parts.extend(logs.stdout)
if hasattr(logs, "stderr") and logs.stderr:
stderr_parts.extend(logs.stderr)
combined_stdout = "".join(stdout_parts)[:MAX_OUTPUT]
combined_stderr = "".join(stderr_parts)[:MAX_OUTPUT]
# Treat any stderr output as a non-zero exit code (e2b doesn't expose
# a numeric exit code at the sandbox level).
exit_code = 1 if combined_stderr else 0
return {
"exit_code": exit_code,
"stdout": combined_stdout,
"stderr": combined_stderr,
"language": language,
"backend": "e2b",
}
except asyncio.TimeoutError:
logger.warning("E2B sandbox timed out after %ds", SANDBOX_TIMEOUT)
return {"error": f"Timeout after {SANDBOX_TIMEOUT}s", "exit_code": -1}
except Exception as e:
logger.exception("E2B sandbox error: %s", e)
return {"error": str(e), "exit_code": -1}
finally:
# Always destroy the sandbox to avoid leaking E2B credits.
if sandbox is not None:
try:
await asyncio.get_running_loop().run_in_executor(
None, sandbox.kill
)
except Exception:
pass # Best-effort cleanup
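The timeout handling shared by the subprocess and docker backends can be sketched as a standalone helper. This is a minimal sketch, assuming a Python-only runner that reuses the current interpreter via `sys.executable`. `run_python` is a hypothetical name, and the result shape loosely follows the dictionaries returned above.

```python
import asyncio
import sys

MAX_OUTPUT = 10_000  # same output cap as the module above


async def run_python(code: str, timeout: float = 30.0) -> dict:
    """Run `code` in a child interpreter and enforce a hard timeout."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Kill and reap the child so it does not outlive the call.
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # the child exited between the timeout and the kill
        await proc.wait()
        return {"error": f"Timeout after {timeout:g}s", "exit_code": -1}
    return {
        "exit_code": proc.returncode,
        "stdout": stdout.decode("utf-8", errors="replace")[:MAX_OUTPUT],
        "stderr": stderr.decode("utf-8", errors="replace")[:MAX_OUTPUT],
    }
```

A call such as `asyncio.run(run_python("print(6 * 7)"))` returns the captured output and exit code, while a script that sleeps past the timeout comes back with `exit_code` set to -1. Reaping the child with `proc.wait()` after `kill()` avoids leaving a zombie process behind.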
-120
@@ -1,120 +0,0 @@
"""Secret-scrubbing utilities for workspace runtime (#834 — C2).
Provides ``_redact_secrets()`` applied at every ``commit_memory`` call site
to prevent API keys and tokens from being persisted verbatim in the
memories table.
Design notes
------------
- **Allowlist of known prefixes** (``sk-``, ``ghp_``, etc.) covers the most
dangerous tokens because they are unambiguous.
- **Contextual pattern** covers generic high-entropy values that appear
immediately after assignment keywords (``key=``, ``token=``, ``secret=``,
``password=``, ``api_key=``). The keyword is preserved in the output so
log lines remain readable; only the value is redacted.
- **Idempotent**: the replacement token ``[REDACTED]`` does not match any
of the patterns, so calling ``_redact_secrets`` twice is safe.
- **No false-positive risk on normal prose**: all patterns require either
a well-known prefix (``AKIA``, ``ghp_``, ``sk-``) or both a keyword and
≥ 40 base64/alphanumeric chars — ordinary English words never match.
Relationship to ``compliance.redact_pii``
------------------------------------------
``redact_pii`` handles PII (emails, SSNs, credit cards) and uses typed
tokens ``[REDACTED:type]`` for SIEM indexing. ``_redact_secrets`` is
narrowly scoped to API credentials and uses the plain ``[REDACTED]`` token
because the exact secret type is not important at the storage layer —
what matters is that no credential value ever reaches the database.
"""
from __future__ import annotations
import re
from typing import List
# ---------------------------------------------------------------------------
# Replacement sentinel
# ---------------------------------------------------------------------------
#: Replacement token — deliberately plain so downstream readers do not need
#: to parse structured tokens. Does not match any scrub pattern (idempotent).
REDACTED: str = "[REDACTED]"
# ---------------------------------------------------------------------------
# Patterns
# ---------------------------------------------------------------------------
# Patterns that identify secret values by their well-known prefix.
# Ordered from most specific to least specific.
_BARE_PATTERNS: List[re.Pattern] = [
# OpenAI / Anthropic-style keys: sk-<20+ alnum/hyphen/underscore chars>
# Covers: sk-<key>, sk-ant-<key>, sk-proj-<key>, etc.
re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
# GitHub classic personal access token
re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
# GitHub server-to-server token
re.compile(r"\bghs_[A-Za-z0-9]{36}\b"),
# GitHub fine-grained personal access token
re.compile(r"\bgithub_pat_[A-Za-z0-9_]{82}\b"),
# AWS access key ID
re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]
# Contextual pattern: keyword= followed by a high-entropy value.
#
# Group 1 captures the keyword + equals sign so it is preserved in the
# replacement — "api_key=[REDACTED]" is more informative than "[REDACTED]".
#
# The value charset [A-Za-z0-9+/] covers base64 and common token alphabets.
# The minimum length of 40 chars prevents false-positives on short values.
_CONTEXTUAL_RE: re.Pattern = re.compile(
r"(?i)"
r"((?:api_key|key|token|secret|password)\s*=\s*)"
r"([A-Za-z0-9+/]{40,}={0,2})"
)
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def _redact_secrets(content: str) -> str:
"""Scrub known secret patterns from *content*, replacing with ``[REDACTED]``.
Parameters
----------
content:
Raw string to scrub — typically a ``commit_memory`` payload.
Returns
-------
str
Copy of *content* with secrets replaced. If no secrets are found,
the original string is returned unchanged. Calling this function
on already-redacted content is safe (idempotent).
Examples::
>>> _redact_secrets("token is sk-abc1234567890123456789012345")
'token is [REDACTED]'
>>> _redact_secrets("api_key=" + "A" * 45)
'api_key=[REDACTED]'
>>> _redact_secrets("The answer is 42.")
'The answer is 42.'
>>> _redact_secrets("[REDACTED]")
'[REDACTED]'
"""
result = content
# Apply prefix-based patterns first (most unambiguous)
for pattern in _BARE_PATTERNS:
result = pattern.sub(REDACTED, result)
# Apply contextual pattern — preserve keyword, replace only the value
result = _CONTEXTUAL_RE.sub(r"\1" + REDACTED, result)
return result
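The two-pass scrub above (prefix patterns first, then the keyword-preserving contextual pattern) can be sketched in isolation. This is a minimal illustration under stated assumptions: it uses a reduced pattern set, and the names `redact`, `BARE`, and `CONTEXTUAL` are hypothetical stand-ins rather than the module's own API.

```python
import re

REDACTED = "[REDACTED]"

# Hypothetical reduced pattern set: two of the prefix rules from the module above.
BARE = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]

# keyword= followed by 40+ base64-alphabet characters; group 1 keeps the keyword.
CONTEXTUAL = re.compile(
    r"(?i)((?:api_key|key|token|secret|password)\s*=\s*)([A-Za-z0-9+/]{40,}={0,2})"
)


def redact(text: str) -> str:
    """Apply the prefix patterns first, then the contextual pattern."""
    for pattern in BARE:
        text = pattern.sub(REDACTED, text)
    # A callable replacement sidesteps backslash-escape handling in the token.
    return CONTEXTUAL.sub(lambda m: m.group(1) + REDACTED, text)


# Made-up values: 48 base64-alphabet chars after a keyword, plus an AKIA-shaped ID.
sample = "password = " + "Zm9v" * 12 + " and AKIA" + "A" * 16
once = redact(sample)
twice = redact(once)  # a second pass should change nothing
```

Running the scrub twice is a cheap way to check the idempotency property: the replacement token contains brackets, which neither character class accepts.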
-344
@@ -1,344 +0,0 @@
"""Skill dependency security scanner — supply-chain risk management.
Scans a skill's ``requirements.txt`` for known CVEs before the skill is
loaded into the workspace. Two scanners are supported:
Snyk CLI — ``snyk test --file=requirements.txt --json``
Preferred; requires the ``snyk`` binary in PATH and
a SNYK_TOKEN env var for authenticated scans.
pip-audit — ``pip-audit -r requirements.txt --json``
Fallback; no authentication required.
The scanner is auto-selected: Snyk if available, pip-audit otherwise.
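If neither is present in PATH the scan is skipped, and the skip is logged and
audited; in ``block`` mode with ``fail_open_if_no_scanner=False`` a
``SkillSecurityError`` is raised instead.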
Scan mode (``security_scan.mode`` in config.yaml):
block — raise ``SkillSecurityError`` when critical/high CVEs are found;
the skill is *not* loaded.
warn — log a WARNING + audit event; the skill is loaded anyway.
off — skip scanning entirely; useful in air-gapped CI.
Audit trail
-----------
Every scan (pass or fail) is recorded via ``builtin_tools.audit.log_event`` with
``event_type="security_scan"``, enabling compliance reports to prove that
all loaded skills were checked before activation.
"""
from __future__ import annotations
import json
import logging
import shutil
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
from builtin_tools.audit import log_event
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Public exception
# ---------------------------------------------------------------------------
class SkillSecurityError(RuntimeError):
"""Raised when a skill fails security scanning in ``block`` mode.
The message contains the skill name, scanner used, and a summary of the
critical/high findings so operators can act on it immediately.
"""
# ---------------------------------------------------------------------------
# Data models
# ---------------------------------------------------------------------------
@dataclass
class CVEFinding:
"""A single vulnerability finding from a security scanner."""
vuln_id: str
"""CVE or advisory identifier, e.g. ``SNYK-PYTHON-REQUESTS-1234``."""
package: str
"""Affected package name."""
version: str
"""Installed version of the package."""
severity: str
"""One of: critical | high | medium | low | unknown."""
description: str
"""Short human-readable summary (≤ 200 chars)."""
@dataclass
class ScanResult:
"""Aggregated result of a single skill dependency scan."""
skill_name: str
scanner: str
"""Scanner used: ``"snyk"`` | ``"pip-audit"`` | ``"none"``."""
requirements_file: Optional[str]
"""Absolute path to the scanned requirements.txt, or ``None``."""
findings: list[CVEFinding] = field(default_factory=list)
scan_error: Optional[str] = None
"""Non-fatal scanner error (e.g. timeout); findings may be incomplete."""
@property
def critical_or_high(self) -> list[CVEFinding]:
return [f for f in self.findings if f.severity in ("critical", "high")]
@property
def has_critical_or_high(self) -> bool:
return bool(self.critical_or_high)
# ---------------------------------------------------------------------------
# Internal helpers
# ---------------------------------------------------------------------------
def _find_requirements(skill_path: Path) -> Optional[Path]:
"""Return the first ``requirements.txt`` found in the skill tree."""
for candidate in (
skill_path / "requirements.txt",
skill_path / "tools" / "requirements.txt",
):
if candidate.exists():
return candidate
return None
def _run_scanner(cmd: list[str], timeout: int = 120) -> tuple[str, Optional[str]]:
"""Run a scanner subprocess and return ``(stdout, error_or_None)``."""
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=timeout,
)
# Both Snyk and pip-audit exit 1 when vulns are found — not an error.
# Exit 2 from Snyk means a genuine scan failure.
if result.returncode == 2 and not result.stdout.strip():
return "", f"scanner exited 2: {result.stderr.strip()[:200]}"
return result.stdout, None
except subprocess.TimeoutExpired:
return "", f"scanner timed out after {timeout}s"
except FileNotFoundError as exc:
return "", str(exc)
except Exception as exc: # pylint: disable=broad-except
return "", str(exc)
def _parse_snyk(stdout: str) -> tuple[list[CVEFinding], Optional[str]]:
"""Parse ``snyk test --json`` output."""
if not stdout.strip():
return [], "empty snyk output"
try:
data = json.loads(stdout)
except json.JSONDecodeError as exc:
return [], f"snyk JSON parse error: {exc}"
vulns = data.get("vulnerabilities", [])
findings = [
CVEFinding(
vuln_id=v.get("id", "UNKNOWN"),
package=v.get("packageName", "?"),
version=v.get("version", "?"),
severity=v.get("severity", "unknown").lower(),
description=(v.get("title", "") or "")[:200],
)
for v in vulns
if isinstance(v, dict)
]
return findings, None
def _parse_pip_audit(stdout: str) -> tuple[list[CVEFinding], Optional[str]]:
"""Parse ``pip-audit --json`` output.
pip-audit does not always provide a CVSS severity level. When absent we
conservatively classify the finding as ``"high"`` so it stays in the
critical/high set that ``block`` and ``warn`` modes act on.
"""
if not stdout.strip():
return [], "empty pip-audit output"
try:
data = json.loads(stdout)
except json.JSONDecodeError as exc:
return [], f"pip-audit JSON parse error: {exc}"
# pip-audit ≥ 2.x wraps results in {"dependencies": [...]}
if isinstance(data, dict):
deps = data.get("dependencies", [])
else:
deps = data # older versions return a bare list
findings: list[CVEFinding] = []
for dep in deps:
if not isinstance(dep, dict):
continue
for vuln in dep.get("vulns", []):
# pip-audit usually omits severity; fall back to "high" so the finding counts.
sev = (vuln.get("severity") or "high").lower()
findings.append(
CVEFinding(
vuln_id=vuln.get("id", "UNKNOWN"),
package=dep.get("name", "?"),
version=dep.get("version", "?"),
severity=sev,
description=(vuln.get("description", "") or "")[:200],
)
)
return findings, None
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def scan_skill_dependencies(
skill_name: str,
skill_path: Path,
mode: str,
fail_open_if_no_scanner: bool = True,
) -> ScanResult:
"""Scan a skill's dependency file for known CVEs.
Args:
skill_name: Name of the skill (used in log messages and audit events).
skill_path: Absolute path to the skill's root directory.
mode: ``"block"`` | ``"warn"`` | ``"off"``
fail_open_if_no_scanner:
When *True* (default) skip scanning, with a log line and an audit event, if neither snyk nor
pip-audit is in PATH. When *False* and ``mode="block"``, raise
:class:`SkillSecurityError` so operators know the gate is absent.
Corresponds to ``security_scan.fail_open_if_no_scanner`` in
config.yaml. Closes #268.
Returns:
A :class:`ScanResult` describing what was found.
Raises:
:class:`SkillSecurityError`: When ``mode="block"`` and one or more
critical/high severity CVEs are found — OR when
``mode="block"`` and ``fail_open_if_no_scanner=False`` and no
scanner is available.
"""
if mode == "off":
return ScanResult(skill_name=skill_name, scanner="none", requirements_file=None)
req_file = _find_requirements(skill_path)
if req_file is None:
# No requirements file — nothing to scan; not a problem.
return ScanResult(skill_name=skill_name, scanner="none", requirements_file=None)
# ── Select scanner ────────────────────────────────────────────────────────
scanner_name: str
findings: list[CVEFinding]
scan_error: Optional[str]
if shutil.which("snyk"):
scanner_name = "snyk"
stdout, run_error = _run_scanner(
["snyk", "test", f"--file={req_file}", "--json"]
)
if run_error:
findings, scan_error = [], run_error
else:
findings, scan_error = _parse_snyk(stdout)
elif shutil.which("pip-audit"):
scanner_name = "pip-audit"
stdout, run_error = _run_scanner(
["pip-audit", "-r", str(req_file), "--json", "--progress-spinner=off"]
)
if run_error:
findings, scan_error = [], run_error
else:
findings, scan_error = _parse_pip_audit(stdout)
else:
logger.info(
"security_scan: no scanner (snyk, pip-audit) in PATH — skipping %s",
skill_name,
)
log_event(
event_type="security_scan",
action="skill.security_scan",
resource=skill_name,
outcome="skipped",
reason="no_scanner_in_path",
requirements_file=str(req_file),
mode=mode,
)
# #268: if fail_open_if_no_scanner=False and mode=block, the operator
# explicitly opted in to "fail closed" — raise so the missing scanner
# is visible rather than silently skipped.
if not fail_open_if_no_scanner and mode == "block":
raise SkillSecurityError(
f"Skill '{skill_name}' blocked: no scanner (snyk or pip-audit) "
f"found in PATH and fail_open_if_no_scanner=false"
)
return ScanResult(
skill_name=skill_name,
scanner="none",
requirements_file=str(req_file),
scan_error="No scanner (snyk or pip-audit) found in PATH",
)
result = ScanResult(
skill_name=skill_name,
scanner=scanner_name,
requirements_file=str(req_file),
findings=findings,
scan_error=scan_error,
)
# ── Log scan outcome to audit trail ──────────────────────────────────────
audit_outcome = "clean" if not result.has_critical_or_high else "vulnerable"
log_event(
event_type="security_scan",
action="skill.security_scan",
resource=skill_name,
outcome=audit_outcome,
scanner=scanner_name,
requirements_file=str(req_file),
total_findings=len(findings),
critical_or_high_count=len(result.critical_or_high),
scan_error=scan_error,
)
if scan_error:
logger.warning(
"security_scan: scanner error for skill '%s': %s", skill_name, scan_error
)
# ── Enforce mode ─────────────────────────────────────────────────────────
if result.has_critical_or_high:
summary = ", ".join(
f"{f.vuln_id}({f.severity}) in {f.package}@{f.version}"
for f in result.critical_or_high[:5]
)
if len(result.critical_or_high) > 5:
summary += f" … and {len(result.critical_or_high) - 5} more"
msg = (
f"Skill '{skill_name}' has {len(result.critical_or_high)} "
f"critical/high CVE(s) [{scanner_name}]: {summary}"
)
if mode == "block":
logger.error("Blocking skill load — %s", msg)
raise SkillSecurityError(msg)
# warn mode — continue loading, but make noise
logger.warning("Security warning — %s", msg)
return result
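The parse-then-enforce flow above can be sketched without invoking a real scanner. This is a simplified illustration, not the module's implementation: `Finding`, `parse_pip_audit`, and `should_block` are hypothetical names, and the sample JSON (package names, advisory ID) is made up to show the wrapped ``{"dependencies": [...]}`` shape.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Finding:
    vuln_id: str
    package: str
    severity: str


def parse_pip_audit(stdout: str) -> list[Finding]:
    """Accept both the wrapped dict shape and the older bare-list shape."""
    data = json.loads(stdout)
    deps = data.get("dependencies", []) if isinstance(data, dict) else data
    findings = []
    for dep in deps:
        for vuln in dep.get("vulns", []):
            # A missing severity defaults to "high" (the conservative choice).
            severity = (vuln.get("severity") or "high").lower()
            findings.append(
                Finding(vuln.get("id", "UNKNOWN"), dep.get("name", "?"), severity)
            )
    return findings


def should_block(findings: list[Finding], mode: str) -> bool:
    """Only block mode turns critical/high findings into a hard failure."""
    serious = [f for f in findings if f.severity in ("critical", "high")]
    return mode == "block" and bool(serious)


# Hypothetical scanner output; the advisory ID and versions are invented.
sample = json.dumps(
    {
        "dependencies": [
            {
                "name": "examplepkg",
                "version": "1.0.0",
                "vulns": [{"id": "EXAMPLE-0001", "fix_versions": ["1.0.1"]}],
            },
            {"name": "cleanpkg", "version": "2.0.0", "vulns": []},
        ]
    }
)
findings = parse_pip_audit(sample)
```

Keeping the gate (`should_block`) separate from parsing makes the ``warn`` path trivial: the same findings are produced, and only the caller's reaction differs.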
-418
@@ -1,418 +0,0 @@
"""OpenTelemetry (OTEL) instrumentation for the Molecule AI workspace runtime.
Architecture
------------
* One global ``TracerProvider`` is initialised at startup via ``setup_telemetry()``.
* Up to three exporters are wired in:
1. **OTLP/HTTP** — activated when ``OTEL_EXPORTER_OTLP_ENDPOINT`` is set.
Point this at any compatible collector (Jaeger, Tempo, Grafana OTEL, …).
2. **Langfuse OTLP bridge** — activated when the ``LANGFUSE_HOST``,
``LANGFUSE_PUBLIC_KEY`` and ``LANGFUSE_SECRET_KEY`` env vars are all present.
Langfuse ≥4 accepts OTLP/HTTP at ``<host>/api/public/otel``.
This is a *second* exporter alongside the existing Langfuse LangChain
callback handler in agent.py — both paths emit spans simultaneously.
3. **Console** (debug) — activated when ``OTEL_DEBUG=1``.
* **W3C TraceContext** propagation (``traceparent`` / ``tracestate``) is used for
cross-workspace context injection and extraction so A2A hops form a single
distributed trace.
* ``make_trace_middleware()`` returns an ASGI middleware that extracts incoming
trace context from HTTP headers and stores it in a ``ContextVar`` so the
A2A executor can access it to parent its spans correctly.
GenAI semantic conventions
--------------------------
Attribute constants for ``gen_ai.*`` follow OpenTelemetry GenAI SemConv 1.26.
Usage example
-------------
# main.py — call once at startup
from builtin_tools.telemetry import setup_telemetry, make_trace_middleware
setup_telemetry(service_name=workspace_id)
instrumented = make_trace_middleware(app.build())
# Any module
from builtin_tools.telemetry import get_tracer
tracer = get_tracer()
with tracer.start_as_current_span("my_span") as span:
span.set_attribute("key", "value")
# Outgoing HTTP — inject W3C headers
from builtin_tools.telemetry import inject_trace_headers
headers = inject_trace_headers({"Content-Type": "application/json"})
await client.post(url, headers=headers, ...)
# Incoming HTTP — extract context (done automatically by middleware)
from builtin_tools.telemetry import extract_trace_context
ctx = extract_trace_context(dict(request.headers))
"""
from __future__ import annotations
import base64
import logging
import os
from contextvars import ContextVar
from typing import Any, Optional
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# GenAI Semantic Convention attribute keys (OTel SemConv 1.26)
# https://opentelemetry.io/docs/specs/semconv/gen-ai/
# ---------------------------------------------------------------------------
GEN_AI_SYSTEM = "gen_ai.system"
GEN_AI_REQUEST_MODEL = "gen_ai.request.model"
GEN_AI_OPERATION_NAME = "gen_ai.operation.name"
GEN_AI_USAGE_INPUT_TOKENS = "gen_ai.usage.input_tokens"
GEN_AI_USAGE_OUTPUT_TOKENS = "gen_ai.usage.output_tokens"
GEN_AI_RESPONSE_FINISH_REASONS = "gen_ai.response.finish_reasons"
# ---------------------------------------------------------------------------
# Workspace / A2A attribute keys
# ---------------------------------------------------------------------------
WORKSPACE_ID_ATTR = "workspace.id"
A2A_SOURCE_WORKSPACE = "a2a.source_workspace_id"
A2A_TARGET_WORKSPACE = "a2a.target_workspace_id"
A2A_TASK_ID = "a2a.task_id"
MEMORY_SCOPE = "memory.scope"
MEMORY_QUERY = "memory.query"
# ---------------------------------------------------------------------------
# Module-level state
# ---------------------------------------------------------------------------
WORKSPACE_ID: str = os.environ.get("WORKSPACE_ID", "unknown")
_initialized: bool = False
_tracer: Any = None # opentelemetry.trace.Tracer | _NoopTracer
# ContextVar that carries incoming trace context from the ASGI middleware to
# the A2A executor. Using a ContextVar (rather than a global) is safe with
# asyncio because each task inherits a copy of the context at creation time.
_incoming_trace_context: ContextVar[Optional[Any]] = ContextVar(
"otel_incoming_trace_context", default=None
)
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def setup_telemetry(service_name: Optional[str] = None) -> None:
"""Initialise the global ``TracerProvider``. Safe to call multiple times.
Reads configuration from environment variables:
``OTEL_EXPORTER_OTLP_ENDPOINT``
Base URL of an OTLP-compatible collector (e.g. ``http://jaeger:4318``).
Spans are sent to ``<endpoint>/v1/traces``.
``LANGFUSE_HOST`` + ``LANGFUSE_PUBLIC_KEY`` + ``LANGFUSE_SECRET_KEY``
When all three are set, a second OTLP exporter is wired to Langfuse's
ingest endpoint using HTTP Basic auth.
``OTEL_DEBUG``
Set to ``1`` / ``true`` / ``yes`` to also print spans to stdout.
"""
global _initialized, _tracer
if _initialized:
return
try:
from opentelemetry import propagate, trace
from opentelemetry.baggage.propagation import W3CBaggagePropagator
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.sdk.resources import SERVICE_NAME as OTEL_SERVICE_NAME
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
except ImportError as exc:
logger.warning(
"OTEL: opentelemetry packages not installed — telemetry disabled. "
"Add opentelemetry-api, opentelemetry-sdk, "
"opentelemetry-exporter-otlp-proto-http to requirements.txt. "
"Error: %s",
exc,
)
return
svc = service_name or f"molecule-{WORKSPACE_ID}"
resource = Resource.create(
{
OTEL_SERVICE_NAME: svc,
"service.version": "1.0.0",
WORKSPACE_ID_ATTR: WORKSPACE_ID,
}
)
provider = TracerProvider(resource=resource)
# -- Exporter 1: Generic OTLP/HTTP ----------------------------------------
otlp_endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").rstrip("/")
if otlp_endpoint:
try:
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
exporter = OTLPSpanExporter(endpoint=f"{otlp_endpoint}/v1/traces")
provider.add_span_processor(BatchSpanProcessor(exporter))
logger.info("OTEL: OTLP/HTTP exporter → %s", otlp_endpoint)
except ImportError:
logger.warning(
"OTEL: OTEL_EXPORTER_OTLP_ENDPOINT is set but "
"opentelemetry-exporter-otlp-proto-http is not installed"
)
except Exception as exc:
logger.warning("OTEL: OTLP exporter init failed: %s", exc)
# -- Exporter 2: Langfuse OTLP bridge -------------------------------------
# Langfuse ≥4 accepts OTLP at <host>/api/public/otel (Basic auth).
lf_host = os.environ.get("LANGFUSE_HOST", "").rstrip("/")
lf_public = os.environ.get("LANGFUSE_PUBLIC_KEY", "")
lf_secret = os.environ.get("LANGFUSE_SECRET_KEY", "")
if lf_host and lf_public and lf_secret:
try:
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
lf_endpoint = f"{lf_host}/api/public/otel/v1/traces"
token = base64.b64encode(f"{lf_public}:{lf_secret}".encode()).decode()
lf_exporter = OTLPSpanExporter(
endpoint=lf_endpoint,
headers={"Authorization": f"Basic {token}"},
)
provider.add_span_processor(BatchSpanProcessor(lf_exporter))
logger.info("OTEL: Langfuse OTLP bridge → %s", lf_endpoint)
except ImportError:
logger.warning(
"OTEL: Langfuse env vars set but "
"opentelemetry-exporter-otlp-proto-http is not installed"
)
except Exception as exc:
logger.warning("OTEL: Langfuse OTLP bridge init failed: %s", exc)
# -- Exporter 3: Console (debug) ------------------------------------------
if os.environ.get("OTEL_DEBUG", "").lower() in ("1", "true", "yes"):
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
logger.info("OTEL: console debug exporter enabled")
# -- Register global provider + W3C propagators ---------------------------
trace.set_tracer_provider(provider)
propagate.set_global_textmap(
CompositePropagator(
[
TraceContextTextMapPropagator(),
W3CBaggagePropagator(),
]
)
)
_tracer = trace.get_tracer(
"molecule.workspace",
schema_url="https://opentelemetry.io/schemas/1.26.0",
)
_initialized = True
logger.info("OTEL: telemetry initialised for service '%s'", svc)
def get_tracer() -> Any:
"""Return the global ``Tracer``. Lazily calls ``setup_telemetry()`` if needed.
Returns a no-op tracer when the opentelemetry packages are not installed so
that instrumented code never raises ``ImportError``.
"""
global _tracer
if not _initialized:
setup_telemetry()
if _tracer is None:
# Packages unavailable — hand back a no-op implementation
try:
from opentelemetry import trace
return trace.get_tracer("molecule.noop")
except ImportError:
return _NoopTracer()
return _tracer
def inject_trace_headers(headers: dict) -> dict:
"""Inject W3C ``traceparent`` / ``tracestate`` into *headers* and return it.
Mutates the dict in-place so it can be used directly::
headers = inject_trace_headers({"Content-Type": "application/json"})
await client.post(url, headers=headers, ...)
"""
try:
from opentelemetry import propagate
propagate.inject(headers)
except Exception:
pass # Never let telemetry break the caller
return headers
def extract_trace_context(carrier: dict) -> Any:
"""Extract W3C trace context from a header mapping.
Returns an OpenTelemetry ``Context`` object suitable for::
tracer.start_as_current_span("name", context=ctx)
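Returns ``None`` when the packages are unavailable or extraction raises.
When no ``traceparent`` header is present, the propagator still returns a
``Context``, just one without a remote span context.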
"""
try:
from opentelemetry import propagate
return propagate.extract(carrier)
except Exception:
return None
def get_current_traceparent() -> Optional[str]:
"""Return the W3C ``traceparent`` string for the active span, or ``None``."""
try:
from opentelemetry import trace
span = trace.get_current_span()
ctx = span.get_span_context()
if not ctx.is_valid:
return None
trace_id = format(ctx.trace_id, "032x")
span_id = format(ctx.span_id, "016x")
flags = "01" if ctx.trace_flags else "00"
return f"00-{trace_id}-{span_id}-{flags}"
except Exception:
return None
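The ``traceparent`` string assembled above follows the W3C version-00 layout ``00-<trace_id>-<span_id>-<flags>``. A stdlib-only sketch of formatting and validating that layout is shown below; `format_traceparent` and `parse_traceparent` are hypothetical helpers for illustration and are not part of the OpenTelemetry API.

```python
import re

# version-traceid-spanid-flags, all lowercase hex (version 00 only).
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def format_traceparent(trace_id: int, span_id: int, sampled: bool) -> str:
    """Render integer IDs as a version-00 traceparent header value."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"


def parse_traceparent(value: str):
    """Return (trace_id, span_id, sampled), or None if the value is malformed."""
    match = TRACEPARENT_RE.fullmatch(value)
    if match is None:
        return None
    trace_id = int(match.group(1), 16)
    span_id = int(match.group(2), 16)
    if trace_id == 0 or span_id == 0:
        return None  # all-zero IDs are invalid
    sampled = bool(int(match.group(3), 16) & 0x01)
    return trace_id, span_id, sampled


header = format_traceparent(
    0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7, True
)
parsed = parse_traceparent(header)
```

Zero-padding matters here: a span ID with leading zero bytes must still render as 16 hex characters, which is why the format specs use ``032x`` and ``016x``.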
def make_trace_middleware(asgi_app: Any) -> Any:
"""Wrap an ASGI application with W3C trace-context extraction middleware.
The middleware reads ``traceparent`` / ``tracestate`` from every incoming
HTTP request and stores the extracted ``Context`` in the
``_incoming_trace_context`` ContextVar. The A2A executor reads that
ContextVar to parent its ``task_receive`` span correctly, forming an
unbroken distributed trace across workspace hops.
Usage::
built = app.build()
instrumented = make_trace_middleware(built)
uvicorn.Config(instrumented, ...)
"""
async def _middleware(scope: dict, receive: Any, send: Any) -> None: # type: ignore[override]
if scope.get("type") != "http":
await asgi_app(scope, receive, send)
return
# Decode byte-headers from the ASGI scope (latin-1 per HTTP/1.1 spec)
raw_headers: list[tuple[bytes, bytes]] = scope.get("headers", [])
str_headers: dict[str, str] = {
k.decode("latin-1"): v.decode("latin-1") for k, v in raw_headers
}
ctx = extract_trace_context(str_headers)
token = _incoming_trace_context.set(ctx)
try:
await asgi_app(scope, receive, send)
finally:
_incoming_trace_context.reset(token)
return _middleware
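The set/reset pattern used by the middleware above can be exercised without an ASGI server. The sketch below is a simplified stand-in: it stores the raw ``traceparent`` header string instead of an OpenTelemetry ``Context``, and `incoming`, `make_middleware`, and `demo_app` are hypothetical names.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for _incoming_trace_context.
incoming = ContextVar("incoming", default=None)


def make_middleware(app):
    async def middleware(scope, receive, send):
        if scope.get("type") != "http":
            await app(scope, receive, send)
            return
        # ASGI headers are (bytes, bytes) pairs; decode them as latin-1.
        headers = {
            k.decode("latin-1"): v.decode("latin-1")
            for k, v in scope.get("headers", [])
        }
        token = incoming.set(headers.get("traceparent"))
        try:
            await app(scope, receive, send)
        finally:
            incoming.reset(token)  # restore the previous value even on error

    return middleware


seen = []


async def demo_app(scope, receive, send):
    # The wrapped app observes the value set by the middleware.
    seen.append(incoming.get())


async def main():
    wrapped = make_middleware(demo_app)
    traceparent = b"00-" + b"ab" * 16 + b"-" + b"cd" * 8 + b"-01"
    scope = {"type": "http", "headers": [(b"traceparent", traceparent)]}
    await wrapped(scope, None, None)
    seen.append(incoming.get())  # back to the default after the request


asyncio.run(main())
```

Resetting with the token, rather than setting the variable back to ``None``, restores whatever value was in place before the request, which keeps nested or reused contexts correct.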
# ---------------------------------------------------------------------------
# Helpers for GenAI attributes
# ---------------------------------------------------------------------------
def gen_ai_system_from_model(model_str: str) -> str:
"""Map a ``provider:model`` string to a ``gen_ai.system`` value."""
if ":" not in model_str:
return "unknown"
provider = model_str.split(":", 1)[0].lower()
return {
"anthropic": "anthropic",
"openai": "openai",
"openrouter": "openrouter",
"groq": "groq",
"google_genai": "google",
"ollama": "ollama",
}.get(provider, provider)
def record_llm_token_usage(span: Any, result: dict) -> None:
"""Extract token counts from a LangGraph ainvoke result and set span attrs.
Handles both Anthropic (``usage``) and OpenAI (``token_usage``) metadata
shapes. Silently skips if metadata is absent.
"""
try:
messages = result.get("messages", [])
for msg in reversed(messages):
meta = getattr(msg, "response_metadata", {}) or {}
# Anthropic
usage = meta.get("usage", {})
if usage:
inp = usage.get("input_tokens") or usage.get("prompt_tokens")
out = usage.get("output_tokens") or usage.get("completion_tokens")
if inp is not None:
span.set_attribute(GEN_AI_USAGE_INPUT_TOKENS, int(inp))
if out is not None:
span.set_attribute(GEN_AI_USAGE_OUTPUT_TOKENS, int(out))
return
# OpenAI
token_usage = meta.get("token_usage", {})
if token_usage:
inp = token_usage.get("prompt_tokens")
out = token_usage.get("completion_tokens")
if inp is not None:
span.set_attribute(GEN_AI_USAGE_INPUT_TOKENS, int(inp))
if out is not None:
span.set_attribute(GEN_AI_USAGE_OUTPUT_TOKENS, int(out))
return
except Exception:
pass # Best-effort — never break the caller
# ---------------------------------------------------------------------------
# No-op fallbacks (used when opentelemetry packages are absent)
# ---------------------------------------------------------------------------
class _NoopSpan:
"""Transparent no-op span that satisfies the context-manager protocol."""
def set_attribute(self, key: str, value: Any) -> None: # noqa: ARG002
pass
def set_status(self, *args: Any, **kwargs: Any) -> None:
pass
def record_exception(self, exc: BaseException, *args: Any, **kwargs: Any) -> None:
pass
def add_event(self, name: str, *args: Any, **kwargs: Any) -> None:
pass
def __enter__(self) -> "_NoopSpan":
return self
def __exit__(self, *args: Any) -> None:
pass
class _NoopTracer:
"""Transparent no-op tracer returned when the SDK is unavailable."""
def start_as_current_span(self, name: str, *args: Any, **kwargs: Any) -> _NoopSpan: # noqa: ARG002
return _NoopSpan()
def start_span(self, name: str, *args: Any, **kwargs: Any) -> _NoopSpan: # noqa: ARG002
return _NoopSpan()
@@ -1,697 +0,0 @@
"""Temporal durable execution wrapper for Molecule AI A2A workspaces.
Architecture
------------
A co-located Temporal worker runs as an asyncio background task **inside the
same process** as the A2A server. This means worker activities share the same
memory space as the A2A handler, which lets us bridge non-serialisable objects
(LangGraph agent, EventQueue, RequestContext) through an in-process registry
without having to serialise them through Temporal's state store.
Workflow stages (names mirror the OTEL span names in a2a_executor.py):
task_receive → llm_call → task_complete
task_receive — durable checkpoint: task acknowledged, queued
llm_call — durable checkpoint: LLM execution + SSE streaming (retryable)
task_complete — durable checkpoint: execution finished, telemetry recorded
Crash-recovery behaviour
------------------------
If the process crashes while ``llm_call`` is running, Temporal retries the
activity on the restarted process. The in-process registry is empty after a
restart, so the activity detects a registry miss, logs a warning, and returns
an error result. The SSE client connection is already gone at that point so
no response can be delivered — but the task is permanently recorded in
Temporal's history and will not silently disappear.
Env vars
--------
TEMPORAL_HOST Temporal gRPC endpoint (default: ``localhost:7233``)
Set this to enable durable execution. Leave unset (or point
at an unreachable host) to run in direct-execution mode.
Dependencies (optional)
-----------------------
temporalio>=1.7.0
Add to requirements.txt to enable. The module loads and the wrapper class
works without the package installed — all Temporal paths return early with a
graceful fallback to direct execution.
"""
from __future__ import annotations
import asyncio
import dataclasses
import logging
import os
import uuid
from datetime import timedelta
from typing import Any, Optional
import httpx
logger = logging.getLogger(__name__)
def _platform_url() -> str:
"""Return the platform URL, defaulting to host.docker.internal.
The workspace runtime always runs inside a Docker container, so
``localhost`` refers to the container itself, not the platform host.
The platform API is only reachable via ``host.docker.internal`` from
within a workspace container, regardless of how the container was started.
"""
return os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
# ─────────────────────────────────────────────────────────────────────────────
# Constants
# ─────────────────────────────────────────────────────────────────────────────
_TASK_QUEUE = "molecule-agent-tasks"
_WORKFLOW_EXECUTION_TIMEOUT = timedelta(minutes=30)
_ACTIVITY_START_TO_CLOSE_TIMEOUT = timedelta(minutes=10)
# ─────────────────────────────────────────────────────────────────────────────
# Checkpoint persistence (non-fatal)
# ─────────────────────────────────────────────────────────────────────────────
async def _fetch_latest_checkpoint(workspace_id: str) -> Optional[dict]:
"""GET /workspaces/:id/checkpoints/latest — returns the most recently
completed step for this workspace, or None if no checkpoints exist yet.
Non-fatal: any HTTP error, network failure, or timeout returns None so
the calling code continues without a resume context. A 404 (no checkpoints)
is the expected response for a freshly provisioned workspace.
Args:
workspace_id: The workspace to query.
Reads:
PLATFORM_URL Platform base URL (default ``http://host.docker.internal:8080``).
"""
try:
from platform_auth import auth_headers as _auth_headers # type: ignore[import]
platform_url = _platform_url()
url = f"{platform_url}/workspaces/{workspace_id}/checkpoints/latest"
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(url, headers=_auth_headers())
if resp.status_code == 404:
return None
resp.raise_for_status()
return resp.json()
except Exception as exc:
logger.debug(
"Temporal: latest checkpoint fetch skipped workspace=%s: %s "
"(non-fatal — starting fresh context)",
workspace_id,
exc,
)
return None
async def _save_checkpoint(
workspace_id: str,
workflow_id: str,
step_name: str,
step_index: int,
payload: Optional[dict] = None,
) -> None:
"""POST a step checkpoint to the platform.
Non-fatal: any HTTP error, network failure, or timeout is logged as a
WARNING and swallowed so the calling activity always continues.
Checkpoint loss is survivable; aborting a workflow on a transient DB or
network blip is not.
Args:
workspace_id: The workspace whose token is used for auth.
workflow_id: Unique ID for this workflow execution (task_id).
step_name: Temporal activity stage name
(``task_receive`` / ``llm_call`` / ``task_complete``).
step_index: 0-based stage index matching the platform schema.
payload: Optional JSON-serialisable dict stored as JSONB.
Reads:
PLATFORM_URL Platform base URL (default ``http://host.docker.internal:8080``).
"""
try:
from platform_auth import auth_headers as _auth_headers # type: ignore[import]
platform_url = _platform_url()
url = f"{platform_url}/workspaces/{workspace_id}/checkpoints"
body: dict = {
"workflow_id": workflow_id,
"step_name": step_name,
"step_index": step_index,
}
if payload is not None:
body["payload"] = payload
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.post(url, json=body, headers=_auth_headers())
resp.raise_for_status()
logger.debug(
"Temporal: checkpoint saved workspace=%s wf=%s step=%s idx=%d",
workspace_id,
workflow_id,
step_name,
step_index,
)
except Exception as exc:
# Non-fatal: workflow continues regardless of checkpoint outcome.
logger.warning(
"Temporal: checkpoint failed workspace=%s wf=%s step=%s: %s "
"(non-fatal — workflow continues)",
workspace_id,
workflow_id,
step_name,
exc,
)
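The request body assembled in `_save_checkpoint` can be sketched as a pure function, which makes the optional `payload` handling easy to check in isolation. `build_checkpoint_body` is a hypothetical helper name used only for illustration; it is not part of the module.

```python
from typing import Optional


def build_checkpoint_body(
    workflow_id: str,
    step_name: str,
    step_index: int,
    payload: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: mirror the JSON body that _save_checkpoint POSTs."""
    body: dict = {
        "workflow_id": workflow_id,
        "step_name": step_name,
        "step_index": step_index,
    }
    # "payload" is only included when the caller supplied one.
    if payload is not None:
        body["payload"] = payload
    return body


with_payload = build_checkpoint_body("task-1", "llm_call", 1, {"success": True})
without_payload = build_checkpoint_body("task-1", "task_receive", 0)
```

Keeping the body construction separate from the HTTP call is one possible design; the original keeps both inline inside the `try` block so that any failure is caught by the same handler.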
# ─────────────────────────────────────────────────────────────────────────────
# Serialisable data models
# These are the only objects that cross the Temporal serialisation boundary.
# ─────────────────────────────────────────────────────────────────────────────
@dataclasses.dataclass
class AgentTaskInput:
"""Serialisable snapshot of an incoming A2A task.
All fields must be JSON-representable so that Temporal can persist them in
its workflow history (used for crash recovery and replay).
"""
task_id: str
context_id: str
user_input: str
model: str
workspace_id: str
history: list # [[role, content], ...] — tuples converted to lists
@dataclasses.dataclass
class LLMResult:
"""Serialisable execution result passed from ``llm_call`` to ``task_complete``."""
final_text: str
success: bool
error: str = ""
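A minimal sketch of why `history` is stored as lists rather than tuples. It uses the standard `json` module as a stand-in for the serialisation boundary (an assumption; Temporal's own data converter is not exercised here) and redeclares `AgentTaskInput` so the block is self-contained.

```python
import dataclasses
import json


@dataclasses.dataclass
class AgentTaskInput:
    task_id: str
    context_id: str
    user_input: str
    model: str
    workspace_id: str
    history: list


raw_history = [("user", "hello"), ("assistant", "hi")]
inp = AgentTaskInput(
    task_id="t-1",
    context_id="c-1",
    user_input="hello",
    model="anthropic:claude-opus-4-7",
    workspace_id="ws-1",
    # Tuples become lists up front, so the value survives a JSON round trip.
    history=[list(pair) for pair in raw_history],
)

encoded = json.dumps(dataclasses.asdict(inp))
restored = AgentTaskInput(**json.loads(encoded))
```

Had the tuples been kept, the round trip would still succeed, but `restored` would no longer compare equal to `inp` because JSON has no tuple type.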
# ─────────────────────────────────────────────────────────────────────────────
# In-process registry
#
# Maps task_id → {executor, context, event_queue, final_text}
# Activities look up non-serialisable objects here. The registry is
# populated by TemporalWorkflowWrapper.run() before the workflow starts and
# cleaned up in the finally block when the workflow completes.
# ─────────────────────────────────────────────────────────────────────────────
_task_registry: dict[str, dict[str, Any]] = {}
# ─────────────────────────────────────────────────────────────────────────────
# Temporal workflow + activities
# Loaded only when the temporalio package is installed. The surrounding
# try/except ensures the module imports cleanly without the package.
# ─────────────────────────────────────────────────────────────────────────────
_TEMPORAL_AVAILABLE = False
try:
from temporalio import activity, workflow
from temporalio.client import Client
from temporalio.worker import Worker
_TEMPORAL_AVAILABLE = True
# ── Activities ────────────────────────────────────────────────────────── #
@activity.defn(name="task_receive")
async def task_receive_activity(inp: AgentTaskInput) -> dict:
"""Durable checkpoint: task received and queued for LLM execution.
Mirrors the *task_receive* OTEL span opened in
``LangGraphA2AExecutor._core_execute()``. This activity is lightweight —
it validates that the in-process registry entry exists and logs receipt.
The actual A2A "working" signal (``updater.start_work()``) is emitted
inside ``_core_execute()`` so that SSE timing is preserved.
Saves a step checkpoint after completing. Checkpoint failure is
non-fatal — the activity returns normally regardless.
"""
logger.info(
"Temporal[task_receive] task_id=%s context_id=%s workspace=%s model=%s",
inp.task_id,
inp.context_id,
inp.workspace_id,
inp.model,
)
if inp.task_id not in _task_registry:
logger.warning(
"Temporal[task_receive] task_id=%s not found in registry "
"(crash recovery path — no SSE client connection available)",
inp.task_id,
)
try:
await _save_checkpoint(
inp.workspace_id, inp.task_id, "task_receive", 0,
{"task_id": inp.task_id, "status": "registry_miss"},
)
except Exception as _ckpt_exc: # pragma: no cover
logger.warning("task_receive checkpoint swallowed: %s", _ckpt_exc)
return {"task_id": inp.task_id, "status": "registry_miss"}
try:
await _save_checkpoint(
inp.workspace_id, inp.task_id, "task_receive", 0,
{"task_id": inp.task_id, "status": "received"},
)
except Exception as _ckpt_exc: # pragma: no cover
logger.warning("task_receive checkpoint swallowed: %s", _ckpt_exc)
return {"task_id": inp.task_id, "status": "received"}
@activity.defn(name="llm_call")
async def llm_call_activity(inp: AgentTaskInput) -> LLMResult:
"""Durable checkpoint: LLM execution with streaming to the event_queue.
Mirrors the *llm_call* OTEL span in ``LangGraphA2AExecutor._core_execute()``.
Calls ``executor._core_execute()`` which handles the full execution pipeline:
SSE streaming, OTEL sub-spans, final message emission, and heartbeat updates.
On crash recovery (registry miss): logs a warning and returns an
``LLMResult`` with ``success=False``. Because the activity returns normally,
Temporal records it as completed and does not retry it.
The original SSE client connection is gone after a crash, so no response
can be delivered, but the task is durably recorded in Temporal's history.
"""
logger.info("Temporal[llm_call] task_id=%s", inp.task_id)
entry = _task_registry.get(inp.task_id)
if entry is None:
msg = (
f"task_id={inp.task_id} not in registry — "
"process likely restarted; original SSE client connection is gone"
)
logger.warning("Temporal[llm_call] registry miss: %s", msg)
miss_result = LLMResult(final_text="", success=False, error=msg)
try:
await _save_checkpoint(
inp.workspace_id, inp.task_id, "llm_call", 1,
{"success": False, "error": msg},
)
except Exception as _ckpt_exc: # pragma: no cover
logger.warning("llm_call checkpoint swallowed: %s", _ckpt_exc)
return miss_result
try:
executor = entry["executor"]
context = entry["context"]
event_queue = entry["event_queue"]
# _core_execute() is the renamed body of the original execute().
# It handles: OTEL spans, SSE streaming, final message, heartbeat.
final_text = await executor._core_execute(context, event_queue)
# Cache for task_complete observability
entry["final_text"] = final_text or ""
result = LLMResult(final_text=final_text or "", success=True)
except Exception as exc:
logger.error(
"Temporal[llm_call] task_id=%s execution error: %s",
inp.task_id,
exc,
exc_info=True,
)
result = LLMResult(final_text="", success=False, error=str(exc))
try:
await _save_checkpoint(
inp.workspace_id, inp.task_id, "llm_call", 1,
{"success": result.success, "error": result.error or None},
)
except Exception as _ckpt_exc: # pragma: no cover
logger.warning("llm_call checkpoint swallowed: %s", _ckpt_exc)
return result
@activity.defn(name="task_complete")
async def task_complete_activity(result: LLMResult) -> None:
"""Durable checkpoint: task execution finished.
Mirrors the *task_complete* OTEL span in ``LangGraphA2AExecutor._core_execute()``.
This activity records the outcome for Temporal observability. The actual
OTEL task_complete span fires inside ``_core_execute()``; this activity
provides a durable, queryable record in Temporal's workflow history.
No step checkpoint is saved here: ``workspace_id`` and ``task_id`` are not
available in this activity (only the ``LLMResult`` is passed from
``llm_call``), and ``llm_call`` has already captured the final outcome.
"""
if result.success:
logger.info(
"Temporal[task_complete] success=True final_text_len=%d",
len(result.final_text),
)
else:
logger.warning(
"Temporal[task_complete] success=False error=%r",
result.error,
)
# ── Workflow ──────────────────────────────────────────────────────────── #
@workflow.defn
class MoleculeAIAgentWorkflow:
"""Durable Temporal workflow for Molecule AI A2A agent task execution.
Sequences three activities that mirror the OTEL span hierarchy in
``LangGraphA2AExecutor._core_execute()``:
task_receive → llm_call → task_complete
Each activity is a durable checkpoint: if the process crashes between
activities, Temporal resumes from the last completed checkpoint on
restart. If an activity fails (exception or timeout), Temporal can
retry it according to the configured retry policy.
"""
@workflow.run
async def run(self, inp: AgentTaskInput) -> LLMResult:
opts: dict[str, Any] = {
"start_to_close_timeout": _ACTIVITY_START_TO_CLOSE_TIMEOUT,
}
# Stage 1 — acknowledge receipt (lightweight checkpoint)
await workflow.execute_activity(task_receive_activity, inp, **opts)
# Stage 2 — LLM execution (main work; retryable on crash/timeout)
result: LLMResult = await workflow.execute_activity(
llm_call_activity, inp, **opts
)
# Stage 3 — record completion (lightweight checkpoint)
await workflow.execute_activity(task_complete_activity, result, **opts)
return result
except ImportError:
# temporalio not installed — the wrapper class below will gracefully fall
# back to direct execution for every call.
logger.debug(
"Temporal: temporalio package not installed — "
"durable execution disabled (add temporalio>=1.7.0 to requirements.txt)"
)
# ─────────────────────────────────────────────────────────────────────────────
# TemporalWorkflowWrapper
# ─────────────────────────────────────────────────────────────────────────────
class TemporalWorkflowWrapper:
"""Wraps ``LangGraphA2AExecutor.execute()`` with Temporal durable execution.
The wrapper intercepts each ``execute()`` call and routes it through a
``MoleculeAIAgentWorkflow`` Temporal workflow. If Temporal is unavailable
for any reason, execution falls back transparently to the direct path
(``executor._core_execute()``), so the A2A server never crashes due to
Temporal issues.
Lifecycle
---------
1. ``create_wrapper()`` — instantiate and register the global singleton.
2. ``await wrapper.start()`` — connect to Temporal, launch the background
worker. No-op (with a log warning) if Temporal is unreachable.
3. Normal operation — ``wrapper.run()`` is called from ``execute()``.
4. ``await wrapper.stop()`` — cancel the background worker task on shutdown.
Co-located worker pattern
-------------------------
The Temporal worker runs as an asyncio background task in the **same event
loop** as the A2A server. This means:
- No separate worker process to manage.
- Activities share the process's memory (registry access works).
- Worker and server share the same asyncio event loop.
Env vars
--------
``TEMPORAL_HOST`` Temporal gRPC address, e.g. ``localhost:7233`` or
``temporal.internal:7233``. Defaults to
``localhost:7233``. If Temporal is not reachable at
this address, the wrapper falls back to direct execution.
"""
def __init__(self) -> None:
self._host: str = os.environ.get("TEMPORAL_HOST", "localhost:7233")
self._client: Optional[Any] = None
self._worker: Optional[Any] = None
self._worker_task: Optional[asyncio.Task] = None # type: ignore[type-arg]
self._available: bool = False
# ── Lifecycle ─────────────────────────────────────────────────────────── #
async def start(self) -> None:
"""Connect to Temporal and start the co-located background worker.
Safe to call multiple times (idempotent after first success).
Never raises — logs a warning and returns on any failure.
"""
if not _TEMPORAL_AVAILABLE:
logger.info(
"Temporal: temporalio package not installed; "
"all tasks will use direct execution. "
"To enable durable execution: pip install 'temporalio>=1.7.0'"
)
return
if self._available:
return # already started
# Connect to the Temporal server
try:
self._client = await Client.connect(self._host) # type: ignore[name-defined]
logger.info("Temporal: connected to %s", self._host)
except Exception as exc:
logger.warning(
"Temporal: cannot connect to %s (%s) — "
"all tasks will use direct execution (no durable state)",
self._host,
exc,
)
return
# Start the worker as an asyncio background task
try:
self._worker = Worker( # type: ignore[name-defined]
self._client,
task_queue=_TASK_QUEUE,
workflows=[MoleculeAIAgentWorkflow], # type: ignore[name-defined]
activities=[
task_receive_activity, # type: ignore[name-defined]
llm_call_activity, # type: ignore[name-defined]
task_complete_activity, # type: ignore[name-defined]
],
)
self._worker_task = asyncio.create_task(
self._worker.run(),
name="temporal-worker",
)
self._available = True
logger.info(
"Temporal: co-located worker started on task queue '%s'",
_TASK_QUEUE,
)
except Exception as exc:
logger.warning(
"Temporal: worker initialisation failed (%s) — "
"falling back to direct execution",
exc,
)
async def stop(self) -> None:
"""Gracefully stop the Temporal worker background task."""
self._available = False
if self._worker_task and not self._worker_task.done():
self._worker_task.cancel()
try:
await self._worker_task
except (asyncio.CancelledError, Exception):
pass
logger.info("Temporal: worker stopped")
# ── Public API ────────────────────────────────────────────────────────── #
def is_available(self) -> bool:
"""Return ``True`` if Temporal is connected and the worker is running."""
return self._available
async def run(
self,
executor: Any,
context: Any,
event_queue: Any,
) -> None:
"""Route one A2A task execution through a Temporal durable workflow.
Steps
-----
1. Build a serialisable ``AgentTaskInput`` from the A2A request context.
2. Store non-serialisable state (executor, context, event_queue) in
the in-process ``_task_registry`` keyed by task_id.
3. Submit and await ``MoleculeAIAgentWorkflow`` on the Temporal server.
4. Clean up the registry entry (always, via ``finally``).
Falls back to ``executor._core_execute()`` if:
- Temporal is not available (``is_available()`` is False).
- Input extraction fails.
- The workflow raises any exception.
This guarantees that the A2A client always receives a response even
when Temporal is misconfigured or temporarily unreachable.
"""
if not self._available or self._client is None:
# Temporal unavailable — silent direct fallback
await executor._core_execute(context, event_queue)
return
task_id = getattr(context, "task_id", None) or str(uuid.uuid4())
context_id = getattr(context, "context_id", None) or str(uuid.uuid4())
# Build serialisable AgentTaskInput
try:
from adapters.shared_runtime import (
extract_history as _extract_history,
extract_message_text,
)
user_input = extract_message_text(context) or ""
raw_history = _extract_history(context)
# Convert (role, content) tuples → [role, content] lists (JSON-safe)
history: list = [list(pair) for pair in raw_history]
except Exception as exc:
logger.warning(
"Temporal: failed to extract serialisable task input (%s) — "
"falling back to direct execution",
exc,
)
await executor._core_execute(context, event_queue)
return
workspace_id_env = os.environ.get("WORKSPACE_ID", "unknown")
# Issue #837: query the latest checkpoint for this workspace.
# If a previous workflow crashed mid-step, inject the last known
# step into the history so the agent is aware of its prior state.
# Non-fatal: a missing or 404 response means starting fresh.
last_ckpt = await _fetch_latest_checkpoint(workspace_id_env)
if last_ckpt:
step_name = last_ckpt.get("step_name", "unknown")
workflow_id_ckpt = last_ckpt.get("workflow_id", "")
completed_at = last_ckpt.get("completed_at", "")
ckpt_note = (
f"[SYSTEM: This workspace was previously executing workflow "
f"'{workflow_id_ckpt}'. The last recorded step was '{step_name}' "
f"(completed at {completed_at}). If the current task is a "
f"continuation of that workflow, resume from this point. "
f"Otherwise ignore this context and start fresh.]"
)
# Prepend as a synthetic context entry so the agent sees it at the
# start of its history — before any user messages for this task.
history = [["system", ckpt_note]] + history
logger.info(
"Temporal: injecting checkpoint context task_id=%s last_step=%s wf=%s",
task_id,
step_name,
workflow_id_ckpt,
)
inp = AgentTaskInput(
task_id=task_id,
context_id=context_id,
user_input=user_input,
model=getattr(executor, "_model", "unknown"),
workspace_id=workspace_id_env,
history=history,
)
# Register non-serialisable in-process state for activities to access
_task_registry[task_id] = {
"executor": executor,
"context": context,
"event_queue": event_queue,
"final_text": "",
}
try:
logger.info(
"Temporal: starting workflow molecule-%s on queue '%s'",
task_id,
_TASK_QUEUE,
)
await self._client.execute_workflow(
MoleculeAIAgentWorkflow.run, # type: ignore[name-defined]
inp,
id=f"molecule-{task_id}",
task_queue=_TASK_QUEUE,
execution_timeout=_WORKFLOW_EXECUTION_TIMEOUT,
)
except Exception as exc:
logger.error(
"Temporal: workflow molecule-%s failed (%s) — "
"falling back to direct execution so client receives a response",
task_id,
exc,
exc_info=True,
)
# Direct fallback ensures the SSE client is never left hanging
await executor._core_execute(context, event_queue)
finally:
_task_registry.pop(task_id, None)
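The checkpoint-injection step in `run()` can be sketched as a standalone function. `inject_checkpoint_note` is a hypothetical name, the note text is abbreviated from the original, and the checkpoint dict keys are the ones read in the code above.

```python
from typing import Optional


def inject_checkpoint_note(history: list, last_ckpt: Optional[dict]) -> list:
    """Hypothetical helper: prepend a system note describing the last checkpoint."""
    if not last_ckpt:
        return history  # nothing recorded (or a 404): start fresh
    step_name = last_ckpt.get("step_name", "unknown")
    workflow_id = last_ckpt.get("workflow_id", "")
    completed_at = last_ckpt.get("completed_at", "")
    note = (
        f"[SYSTEM: This workspace was previously executing workflow "
        f"'{workflow_id}'. The last recorded step was '{step_name}' "
        f"(completed at {completed_at}).]"
    )
    # Build a new list so the caller's history is not mutated.
    return [["system", note]] + history


history = [["user", "continue the report"]]
fresh = inject_checkpoint_note(history, None)
resumed = inject_checkpoint_note(
    history, {"step_name": "llm_call", "workflow_id": "task-42"}
)
```

The note is placed first so the agent reads it before any user message for the current task, which matches the ordering used in `run()`.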
# ─────────────────────────────────────────────────────────────────────────────
# Module-level singleton helpers
# Used by a2a_executor.py and main.py
# ─────────────────────────────────────────────────────────────────────────────
_global_wrapper: Optional[TemporalWorkflowWrapper] = None


def get_wrapper() -> Optional[TemporalWorkflowWrapper]:
    """Return the global ``TemporalWorkflowWrapper``, or ``None`` if not set.

    Called from ``LangGraphA2AExecutor.execute()`` on every request.
    Returns ``None`` before ``create_wrapper()`` is called (direct-execution mode).
    """
    return _global_wrapper


def create_wrapper() -> TemporalWorkflowWrapper:
    """Create (or return the existing) global ``TemporalWorkflowWrapper``.

    Idempotent: safe to call multiple times. Call ``await wrapper.start()``
    after this to connect to Temporal and launch the background worker.

    Example (in main.py)::

        from builtin_tools.temporal_workflow import create_wrapper as create_temporal_wrapper

        temporal_wrapper = create_temporal_wrapper()
        await temporal_wrapper.start()  # connects + starts worker
        try:
            await server.serve()
        finally:
            await temporal_wrapper.stop()
    """
    global _global_wrapper
    if _global_wrapper is None:
        _global_wrapper = TemporalWorkflowWrapper()
    return _global_wrapper
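The singleton helpers can be exercised without Temporal by swapping in a placeholder class. In this sketch `Wrapper` is a stand-in for `TemporalWorkflowWrapper` (an assumption made so the block runs on its own); the two functions follow the same logic as the originals.

```python
from typing import Optional


class Wrapper:
    """Stand-in for TemporalWorkflowWrapper; no Temporal connection involved."""


_global_wrapper: Optional[Wrapper] = None


def get_wrapper() -> Optional[Wrapper]:
    # Returns None until create_wrapper() has been called once.
    return _global_wrapper


def create_wrapper() -> Wrapper:
    global _global_wrapper
    if _global_wrapper is None:
        _global_wrapper = Wrapper()
    return _global_wrapper


before = get_wrapper()
first = create_wrapper()
second = create_wrapper()
```

Because `get_wrapper()` returns `None` before creation, a caller can treat `None` as the direct-execution signal without importing anything Temporal-specific.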
-57
@@ -1,57 +0,0 @@
"""Helpers for building / mutating the workspace ``AgentCard``.
Kept as their own module so the behavior is unit-testable without booting
the whole runtime (``main.py`` is ``# pragma: no cover``).
"""
from __future__ import annotations
from typing import Iterable
from a2a.types import AgentCard, AgentSkill
def enrich_card_skills(card: AgentCard, loaded_skills: Iterable | None) -> bool:
    """Replace ``card.skills`` with rich metadata from the adapter's loaded
    skills, in place. Pairs with PR #2756: the card was built up front from
    static ``config.skills`` names so /.well-known/agent-card.json could
    serve before ``adapter.setup()`` finishes; this swaps in the richer
    descriptions/tags/examples that ``setup()``'s skill loader produces.

    Returns ``True`` on swap, ``False`` when the swap was skipped or
    failed. Failure cases:

    * ``loaded_skills`` is None / empty: caller didn't load any.
    * Any element doesn't expose ``.metadata.{id,name,description,tags,examples}``
      (a future adapter that doesn't follow the canonical shape).

    Failures DO NOT raise: a malformed ``loaded_skills`` shape would
    otherwise propagate to ``main.py``'s outer ``except Exception``,
    silently degrading an OK boot to the not-configured state. Static
    stubs from ``config.skills`` stay in place; setup() already
    succeeded, the agent works, only the card's skill enrichment is
    degraded. Operator sees a clear log line; tests assert this
    distinction.
    """
    if not loaded_skills:
        return False
    try:
        rich = [
            AgentSkill(
                id=skill.metadata.id,
                name=skill.metadata.name,
                description=skill.metadata.description,
                tags=skill.metadata.tags,
                examples=skill.metadata.examples,
            )
            for skill in loaded_skills
        ]
    except Exception as enrich_err:  # noqa: BLE001
        print(
            f"Warning: skill metadata enrichment failed (keeping static "
            f"stubs from config.skills): {type(enrich_err).__name__}: {enrich_err}",
            flush=True,
        )
        return False
    card.skills = rich
    return True
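A minimal sketch of the all-or-nothing swap, assuming stand-in types so it runs without the `a2a` package. `Skill`, `Card`, and `enrich` are hypothetical names that mirror `AgentSkill`, `AgentCard`, and `enrich_card_skills`; only the control flow is taken from the original.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class Skill:
    """Stand-in for a2a.types.AgentSkill (assumption: same five fields)."""
    id: str
    name: str
    description: str
    tags: list
    examples: list


@dataclass
class Card:
    """Stand-in for a2a.types.AgentCard; only ``skills`` matters here."""
    skills: list = field(default_factory=list)


def enrich(card: Card, loaded_skills) -> bool:
    """Same control flow as enrich_card_skills, using the stand-in types."""
    if not loaded_skills:
        return False
    try:
        rich = [
            Skill(
                id=s.metadata.id,
                name=s.metadata.name,
                description=s.metadata.description,
                tags=s.metadata.tags,
                examples=s.metadata.examples,
            )
            for s in loaded_skills
        ]
    except Exception:
        return False  # keep whatever the card already holds
    card.skills = rich
    return True


good = SimpleNamespace(metadata=SimpleNamespace(
    id="search", name="Search", description="Web search",
    tags=["web"], examples=["find docs"],
))
bad = SimpleNamespace(metadata=SimpleNamespace(id="broken"))  # missing fields

card = Card(skills=["stub"])
swapped = enrich(card, [good])
card_after_good = list(card.skills)
rejected = enrich(card, [good, bad])
```

One malformed element rejects the whole batch: the list is built completely before `card.skills` is assigned, so a failure partway through leaves the card untouched.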
-659
@@ -1,659 +0,0 @@
"""Load workspace configuration from config.yaml."""
import logging
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
import yaml
logger = logging.getLogger(__name__)
@dataclass
class RBACConfig:
"""Role-based access control settings for this workspace.
``roles`` declares what this workspace is *allowed* to do. Each role
name maps to a set of permitted actions. Built-in roles are defined in
``tools/audit.ROLE_PERMISSIONS``; custom roles can be added via
``allowed_actions``.
Built-in roles
--------------
admin All actions (delegate, approve, memory.read, memory.write)
operator Same as admin — standard agent role (default)
read-only memory.read only
no-delegation approve + memory.read + memory.write
no-approval delegate + memory.read + memory.write
memory-readonly memory.read only
Example config.yaml snippet::
rbac:
roles:
- operator
allowed_actions:
analyst:
- memory.read
- memory.write
"""
roles: list[str] = field(default_factory=lambda: ["operator"])
"""List of role names granted to this workspace."""
allowed_actions: dict[str, list[str]] = field(default_factory=dict)
"""Custom role → [action, ...] overrides. Takes precedence over built-ins."""
@dataclass
class HITLConfig:
"""Human-In-The-Loop settings loaded from the ``hitl:`` block in config.yaml.
Example config.yaml snippet::
hitl:
channels:
- type: dashboard # always active
- type: slack
webhook_url: https://hooks.slack.com/services/…
- type: email
smtp_host: smtp.example.com
from: alerts@example.com
to: ops@example.com
default_timeout: 300 # seconds
bypass_roles: [admin]
"""
channels: list[dict] = field(default_factory=lambda: [{"type": "dashboard"}])
default_timeout: float = 300.0
bypass_roles: list[str] = field(default_factory=list)
@dataclass
class DelegationConfig:
retry_attempts: int = 3
retry_delay: float = 5.0
timeout: float = 120.0
escalate: bool = True
@dataclass
class A2AConfig:
port: int = 8000
streaming: bool = True
push_notifications: bool = True
@dataclass
class SandboxConfig:
backend: str = "subprocess" # subprocess | docker
memory_limit: str = "256m"
timeout: int = 30
@dataclass
class RuntimeConfig:
"""Configuration for CLI-based agent runtimes (claude-code, codex, ollama, custom)."""
command: str = "" # e.g. "claude", "codex", "ollama" (model goes in model field)
args: list[str] = field(default_factory=list) # additional CLI args
required_env: list[str] = field(default_factory=list) # env vars required to run (e.g. ["CLAUDE_CODE_OAUTH_TOKEN"])
timeout: int = 0 # seconds (0 = no timeout — agents wait until done)
model: str = "" # model override for the CLI
provider: str = "" # explicit LLM provider (e.g., "anthropic", "openai",
# "minimax"). Falls back to the top-level resolved
# provider when empty. Adapters (hermes, claude-code,
# codex) prefer this over slug-parsing the model name.
# Per-model entries surfaced in the canvas Model dropdown. Each entry is a
# raw dict with at least ``id``; ``required_env`` is the per-model auth
# list (e.g. ``{"id": "MiniMax-M2.7", "required_env": ["MINIMAX_API_KEY"]}``).
# Preflight prefers an entry's ``required_env`` over the top-level
# ``required_env`` when the picked ``model`` matches an entry's ``id``
# (case-insensitive). The top-level list remains the fallback so single-
# model templates need not migrate. Surfaced 2026-05-02 after a user
# picked MiniMax in canvas, set MINIMAX_API_KEY, and still got booted
# into a CLAUDE_CODE_OAUTH_TOKEN preflight failure.
models: list[dict] = field(default_factory=list)
# Deprecated — use required_env + secrets API instead. Kept for backward compat.
auth_token_env: str = ""
auth_token_file: str = ""
@dataclass
class GovernanceConfig:
"""Microsoft Agent Governance Toolkit integration settings.
When ``enabled`` is True, Molecule AI's RBAC and audit trail are bridged
to the Agent Governance Toolkit (agent-os-kernel) for policy evaluation.
``toolkit`` is reserved for future extensibility — only ``"microsoft"``
is supported today.
``policy_mode`` controls enforcement:
strict RBAC *and* toolkit policy must both allow — strictest mode
permissive RBAC must allow; toolkit denials are logged but not enforced
audit RBAC only; toolkit evaluated and logged but never blocks
``policy_file`` is the path to a Rego (.rego), YAML (.yaml/.yml), or Cedar
(.cedar) policy file, loaded into the PolicyEvaluator at startup.
``blocked_patterns`` is a list of regex patterns that the toolkit will
always deny regardless of roles or policy.
"""
enabled: bool = False
toolkit: str = "microsoft"
policy_endpoint: str = ""
policy_mode: str = "audit" # strict | permissive | audit
policy_file: str = ""
blocked_patterns: list[str] = field(default_factory=list)
max_tool_calls_per_task: int = 50
@dataclass
class SecurityScanConfig:
"""Skill dependency security scanning settings.
``mode`` controls what happens when critical/high CVEs are found:
block — raise ``SkillSecurityError``; the skill is NOT loaded.
warn — emit a WARNING + audit event; the skill is loaded anyway (default).
off — skip scanning entirely (air-gapped or CI environments).
Scanners tried in order: Snyk CLI (requires ``SNYK_TOKEN``), then
pip-audit. If neither is available the scan is silently skipped.
Example config.yaml snippet::
security_scan: warn # shorthand string form
# or verbose form:
security_scan:
mode: block
"""
mode: str = "warn"
"""One of: block | warn | off."""
fail_open_if_no_scanner: bool = True
"""When True (default), silently skip scanning if no scanner (snyk/pip-audit)
is in PATH. When False and mode='block', raise SkillSecurityError so that
operators who require a CVE gate know the gate is absent. Closes #268."""
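The interaction between `mode` and `fail_open_if_no_scanner` can be read as a small decision table. The sketch below is hypothetical: `scan_decision` and its string results are illustrative names, and the real loader raises `SkillSecurityError` where this returns `"block"`. The docstring does not say what happens when no scanner is found, `fail_open_if_no_scanner` is False, and `mode` is `warn`; this sketch assumes the scan is skipped.

```python
def scan_decision(
    mode: str,
    scanner_found: bool,
    fail_open_if_no_scanner: bool,
    has_critical_cve: bool,
) -> str:
    """Hypothetical decision table; returns 'skip', 'load', 'warn', or 'block'."""
    if mode == "off":
        return "skip"  # scanning disabled entirely
    if not scanner_found:
        if not fail_open_if_no_scanner and mode == "block":
            return "block"  # operator required a CVE gate and it is absent
        return "skip"  # assumption: every other no-scanner case skips
    if not has_critical_cve:
        return "load"
    # Critical/high findings: block mode refuses, anything else warns and loads.
    return "block" if mode == "block" else "warn"


no_gate = scan_decision("block", scanner_found=False,
                        fail_open_if_no_scanner=False, has_critical_cve=False)
fail_open = scan_decision("block", scanner_found=False,
                          fail_open_if_no_scanner=True, has_critical_cve=True)
```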
@dataclass
class EventLogConfig:
"""Settings for the workspace event log (workspace/event_log.py).
The event log is an append-and-query buffer for runtime events
(turn started, tool invoked, peer message delivered, …) that the
canvas Activity tab and platform-side `/activity` endpoint read.
Defaults are tuned for a long-running workspace: 1-hour TTL and a
10k-entry cap together hold ~1 MB of events in memory at the
documented per-event size budget (~100 bytes payload).
Example config.yaml snippet::
observability:
event_log:
backend: memory # or "disabled" to opt out
ttl_seconds: 3600
max_entries: 10000
"""
backend: str = "memory"
"""``memory`` (default) buffers events in process RAM with the
bounds below; ``disabled`` returns a no-op log so the canvas
Activity tab is silent. Unknown values fall back to ``memory`` —
a typo should not crash boot or silently drop telemetry."""
ttl_seconds: int = 3600
"""How long an event survives before TTL eviction. 1 hour covers
a long agentic loop comfortably without leaking; operators
debugging a slow drift may temporarily widen this, but be aware
the bound is RAM, not disk."""
max_entries: int = 10_000
"""Hard cap on resident events. Together with ``ttl_seconds`` this
bounds memory: the FIFO eviction drops oldest first, so a query
cursor that falls behind sees a contiguous tail rather than a
gappy log."""
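A minimal sketch of how a TTL and a FIFO cap can bound memory together. This is not the implementation in `workspace/event_log.py`, which is not shown here; `BoundedEventLog` and its methods are hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
import time
from collections import deque
from typing import Optional


class BoundedEventLog:
    """Hypothetical in-memory buffer: FIFO cap plus TTL eviction."""

    def __init__(self, ttl_seconds: int = 3600, max_entries: int = 10_000) -> None:
        self._ttl = ttl_seconds
        # deque(maxlen=...) drops the oldest entry when a new one overflows it.
        self._events: deque = deque(maxlen=max_entries)

    def append(self, event: dict, now: Optional[float] = None) -> None:
        ts = time.monotonic() if now is None else now
        self._events.append((ts, event))

    def query(self, now: Optional[float] = None) -> list:
        ts = time.monotonic() if now is None else now
        # Entries are in arrival order, so expired ones sit at the left end.
        while self._events and ts - self._events[0][0] > self._ttl:
            self._events.popleft()
        return [event for _, event in self._events]


log = BoundedEventLog(ttl_seconds=10, max_entries=3)
for i in range(5):
    log.append({"n": i}, now=float(i))

after_cap = log.query(now=4.0)
after_ttl = log.query(now=13.5)

# Budget from the docstring: 10k entries at ~100 bytes of payload each.
worst_case_bytes = 10_000 * 100
```

Because both eviction paths remove entries only from the oldest end, whatever remains is always a contiguous tail of the stream, which is the property the `max_entries` docstring describes.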
@dataclass
class ObservabilityConfig:
"""Observability settings — heartbeat cadence, log verbosity, event log.
Hermes-style block: groups platform-runtime knobs that operators
typically tune together (cadence, verbosity, event-log retention)
into one declarative section instead of scattering them across env
vars and hard-coded constants. Adopting this shape unblocks
per-workspace tuning without a code change.
The ``event_log`` sub-block is schema-only in this PR (#119 PR-2);
consumer wiring (the canvas Activity tab + `/activity` endpoint
reading from the configured backend) lands in PR-3.
Example config.yaml snippet::
observability:
heartbeat_interval_seconds: 60
log_level: DEBUG
event_log:
backend: memory
ttl_seconds: 3600
max_entries: 10000
"""
heartbeat_interval_seconds: int = 30
"""Seconds between heartbeats sent to the platform. Default 30 matches
``workspace/heartbeat.py``'s long-standing constant. Lower values
reduce platform-side detection latency for crashed workspaces; higher
values reduce platform write load. Bounds: clamped to [5, 300] at
parse time — outside that range the workspace either floods the
platform or looks dead before the next beat."""
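The parse-time clamp described above could look like the following. `clamp_heartbeat_interval` is a hypothetical helper name; the parser that applies the clamp is not part of this excerpt, and only the bounds [5, 300] come from the docstring.

```python
def clamp_heartbeat_interval(value: int, low: int = 5, high: int = 300) -> int:
    """Hypothetical parse-time helper: force the interval into [low, high]."""
    return max(low, min(high, value))


too_low = clamp_heartbeat_interval(1)
in_range = clamp_heartbeat_interval(30)
too_high = clamp_heartbeat_interval(900)
```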
log_level: str = "INFO"
"""Python ``logging`` level for the workspace runtime. Accepts the
standard names (DEBUG, INFO, WARNING, ERROR, CRITICAL). Today the
runtime reads ``LOG_LEVEL`` env; PR-3 of the #119 stack switches to
this field with env still honored as an override for ops debugging."""
event_log: EventLogConfig = field(default_factory=EventLogConfig)
"""Event-log backend + retention bounds. See ``EventLogConfig``."""
@dataclass
class ComplianceConfig:
"""OWASP Top 10 for Agentic Applications compliance settings.
Default is ``mode: owasp_agentic`` + ``prompt_injection: detect``.
The detect mode logs injection attempts as audit events without
blocking the request — so there is no false-positive UX cost, only
a gain in visibility. Operators opt into stricter ``block`` mode per
workspace. To disable compliance entirely (not recommended), set
``mode: ""`` in config.yaml.
Before 2026-04-24, the default was ``mode: ""`` (fully off). A
review of the A2A inbound path showed that no shipped template set
``mode`` explicitly, so prompt-injection detection was silently
disabled for every live workspace despite the machinery existing.
Flipping the default to ``owasp_agentic`` with ``prompt_injection:
detect`` closes that gap with zero user-visible behavior change.
Example config.yaml snippet to opt OUT::
compliance:
mode: "" # disables all compliance checks
Example config.yaml snippet to tighten::
compliance:
mode: owasp_agentic # (default)
prompt_injection: block # (default: detect)
max_tool_calls_per_task: 30
max_task_duration_seconds: 180
"""
mode: str = "owasp_agentic"
"""Enable compliance mode. ``owasp_agentic`` (default) activates the
OA-01/OA-02/OA-03/OA-06 checks; ``""`` disables everything."""
prompt_injection: str = "detect"
"""``detect`` logs injection attempts (default, zero UX cost);
``block`` raises PromptInjectionError before the agent sees the
text. Operators can tighten to ``block`` per workspace."""
max_tool_calls_per_task: int = 50
"""Maximum number of tool invocations per task before ExcessiveAgencyError."""
max_task_duration_seconds: int = 300
"""Maximum wall-clock seconds per task before ExcessiveAgencyError."""
@dataclass
class WorkspaceConfig:
name: str = "Workspace"
description: str = ""
role: str = ""
"""Human-readable role label for this agent (e.g. 'Senior Code Reviewer').
Surfaced in AGENTS.md so peer agents can understand this workspace's purpose
without reading the full system prompt. Falls back to description when empty."""
version: str = "1.0.0"
tier: int = 1
model: str = "anthropic:claude-opus-4-7"
provider: str = ""
"""Explicit LLM provider slug (e.g., ``anthropic``, ``openai``, ``minimax``).
When empty, ``load_config`` derives it from the ``model`` slug prefix
(``anthropic:claude-opus-4-7`` → ``anthropic``; ``minimax/abab7-chat`` →
``minimax``; bare model names → ``""``). Set explicitly via the canvas
Provider dropdown or the ``LLM_PROVIDER`` env var when the model name
is provider-ambiguous (e.g., a custom alias) or when an adapter needs
a specific gateway distinct from the model namespace.
"""
runtime: str = "langgraph" # langgraph | claude-code | codex | ollama | custom
runtime_config: RuntimeConfig = field(default_factory=RuntimeConfig)
initial_prompt: str = ""
"""Auto-sent as the first A2A message after startup. Default empty = no auto-message.
Can be an inline string or a file reference (initial_prompt_file in yaml)."""
idle_prompt: str = ""
"""Auto-sent every `idle_interval_seconds` while the workspace has no active
task (heartbeat.active_tasks == 0). Default empty = no idle loop. This is
the reflection-on-completion / backlog-pull pattern from the Hermes/Letta
playbook: the workspace self-wakes when idle, runs a lightweight reflection
prompt, and either picks up queued work or stops. Cost scales with useful
activity (the prompt returns quickly if there's nothing to do). Can be
inline or a file reference via `idle_prompt_file`."""
idle_interval_seconds: int = 600
"""How often the idle loop checks in (seconds). Default 600 (10 min).
Ignored when idle_prompt is empty."""
skills: list[str] = field(default_factory=list)
plugins: list[str] = field(default_factory=list) # installed plugin names
tools: list[str] = field(default_factory=list)
prompt_files: list[str] = field(default_factory=list)
a2a: A2AConfig = field(default_factory=A2AConfig)
delegation: DelegationConfig = field(default_factory=DelegationConfig)
sandbox: SandboxConfig = field(default_factory=SandboxConfig)
rbac: RBACConfig = field(default_factory=RBACConfig)
hitl: HITLConfig = field(default_factory=HITLConfig)
governance: GovernanceConfig = field(default_factory=GovernanceConfig)
security_scan: SecurityScanConfig = field(default_factory=SecurityScanConfig)
compliance: ComplianceConfig = field(default_factory=ComplianceConfig)
observability: ObservabilityConfig = field(default_factory=ObservabilityConfig)
sub_workspaces: list[dict] = field(default_factory=list)
effort: str = ""
"""Claude output effort level for the agentic loop: low | medium | high | xhigh | max.
Empty string = not set (model default applies). xhigh is the Opus 4.7 recommended
default for long agentic tasks. Passed as ``output_config.effort`` by ClaudeSDKExecutor."""
task_budget: int = 0
"""Advisory total-token budget across the full agentic loop. 0 = not set.
Must be >= 20000 when non-zero (API minimum). When set, ClaudeSDKExecutor
automatically adds the ``task-budgets-2026-03-13`` beta header."""
def _derive_provider_from_model(model: str) -> str:
"""Extract the provider slug prefix from a model identifier.
Recognizes both ``provider:model`` (Anthropic / OpenAI / Google convention)
and ``provider/model`` (HuggingFace / Minimax convention). Returns ``""``
when the model has no recognizable separator — callers must treat empty
as "use adapter default routing", not as a hard failure.
"""
for sep in (":", "/"):
if sep in model:
return model.partition(sep)[0]
return ""
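The separator handling above can be exercised in isolation. The following is a minimal sketch that restates the helper as it appears in this diff under a standalone name; the first two sample ids come from the docstrings, while `my-custom-alias` is a made-up bare name used only for illustration.

```python
def derive_provider_from_model(model: str) -> str:
    """Standalone copy of the helper above, for illustration only."""
    # ":" is checked before "/", so a slug containing both splits on ":".
    for sep in (":", "/"):
        if sep in model:
            return model.partition(sep)[0]
    return ""


samples = [
    "anthropic:claude-opus-4-7",
    "minimax/abab7-chat",
    "my-custom-alias",  # hypothetical bare name, no separator
]
derived = {m: derive_provider_from_model(m) for m in samples}
for model_id, provider in derived.items():
    print(f"{model_id!r} -> {provider!r}")
```

An empty result is not an error here: per the docstring, callers are expected to read it as "use adapter default routing".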
_legacy_model_provider_warned = False
def _picked_model_from_env(default: str) -> str:
"""Resolve the operator-picked model id from env; newest name wins.
Precedence: ``MOLECULE_MODEL`` (canonical, unambiguous) → ``MODEL`` →
``MODEL_PROVIDER`` (legacy) → ``default`` (the YAML ``model:`` field).
``MODEL_PROVIDER`` is **misleadingly named**: it carries the picked
*model id*, never the LLM provider — the provider lives in
``LLM_PROVIDER`` / the YAML ``provider:`` field. The legacy path stays
so canvas Save+Restart, the workspace-server secret-mint path, and
persona env files that set it keep working, but if it's the *only* one
set we log a deprecation once — the misnomer keeps biting (e.g. setting
``MODEL_PROVIDER=claude-code`` expecting it to select the claude-code
*runtime* — it doesn't, ``runtime:`` does — after which the claude CLI
404s on ``--model claude-code``). Set ``MODEL``/``MOLECULE_MODEL`` to
an id from ``runtime_config.models[].id`` (e.g. ``opus``, ``sonnet``,
``claude-opus-4-7``, ``MiniMax-M2.7-highspeed``) instead.
"""
global _legacy_model_provider_warned
for name in ("MOLECULE_MODEL", "MODEL"):
v = (os.environ.get(name) or "").strip()
if v:
return v
legacy = (os.environ.get("MODEL_PROVIDER") or "").strip()
if legacy:
if not _legacy_model_provider_warned:
logger.warning(
"MODEL_PROVIDER=%r is deprecated and misleadingly named — it "
"sets the picked *model id*, not the LLM provider (that's "
"LLM_PROVIDER / the YAML `provider:` field). Set MODEL (or "
"MOLECULE_MODEL) to an id from runtime_config.models instead.",
legacy,
)
_legacy_model_provider_warned = True
return legacy
return default
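The precedence described in the docstring can be sketched as a pure function. This is an illustrative simplification, not the module's code: it takes the environment as a mapping parameter (an assumption made so the example runs without touching `os.environ`) and leaves out the one-time deprecation warning.

```python
from typing import Mapping


def picked_model(env: Mapping[str, str], default: str) -> str:
    """Return the first non-blank value in precedence order, else default."""
    # MODEL_PROVIDER is last because it is the legacy, misleadingly named variable.
    for name in ("MOLECULE_MODEL", "MODEL", "MODEL_PROVIDER"):
        value = (env.get(name) or "").strip()
        if value:
            return value
    return default


yaml_default = "anthropic:claude-opus-4-7"
canonical_wins = picked_model({"MOLECULE_MODEL": "opus", "MODEL": "sonnet"}, yaml_default)
legacy_only = picked_model({"MODEL_PROVIDER": "sonnet"}, yaml_default)
blank_ignored = picked_model({"MODEL": "   "}, yaml_default)
```

Whitespace-only values count as unset, which matches the `.strip()` calls in the function above.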
_EVENT_LOG_VALID_BACKENDS = {"memory", "disabled"}
def _parse_event_log(raw: object) -> "EventLogConfig":
"""Coerce the ``observability.event_log`` YAML block into EventLogConfig.
Lenient like the rest of this parser: a missing block, a non-dict
value, or a bad backend name resolves to defaults rather than
raising at boot. The event_log is observability infra — a typo in
one field should not crash the workspace before any event can fire.
Bounds (ttl_seconds, max_entries) clamp to positives so a 0/-1
misconfig doesn't disable the log silently; that's what
``backend: disabled`` is for.
"""
if not isinstance(raw, dict):
return EventLogConfig()
backend = str(raw.get("backend", "memory")).strip().lower()
if backend not in _EVENT_LOG_VALID_BACKENDS:
backend = "memory"
try:
ttl_seconds = int(raw.get("ttl_seconds", 3600))
except (TypeError, ValueError):
ttl_seconds = 3600
if ttl_seconds <= 0:
ttl_seconds = 3600
try:
max_entries = int(raw.get("max_entries", 10_000))
except (TypeError, ValueError):
max_entries = 10_000
if max_entries <= 0:
max_entries = 10_000
return EventLogConfig(
backend=backend, ttl_seconds=ttl_seconds, max_entries=max_entries
)
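To make the lenient behaviour concrete, here is a hedged sketch of the same parsing rules. `EventLogConfig` is defined elsewhere in the real module, so the dataclass below is a stand-in whose defaults are assumed to match the fallbacks used by the parser; `_positive_int` is a hypothetical helper introduced only to shorten the example.

```python
from dataclasses import dataclass


@dataclass
class EventLogConfig:
    """Stand-in with the three fields the parser sets; defaults are assumed."""
    backend: str = "memory"
    ttl_seconds: int = 3600
    max_entries: int = 10_000


VALID_BACKENDS = {"memory", "disabled"}


def _positive_int(value: object, default: int) -> int:
    """Hypothetical helper: non-numeric or non-positive input yields default."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        return default
    return n if n > 0 else default


def parse_event_log(raw: object) -> EventLogConfig:
    if not isinstance(raw, dict):
        return EventLogConfig()
    backend = str(raw.get("backend", "memory")).strip().lower()
    if backend not in VALID_BACKENDS:
        backend = "memory"
    return EventLogConfig(
        backend=backend,
        ttl_seconds=_positive_int(raw.get("ttl_seconds", 3600), 3600),
        max_entries=_positive_int(raw.get("max_entries", 10_000), 10_000),
    )


not_a_dict = parse_event_log("oops")
mixed = parse_event_log({"backend": " Disabled ", "ttl_seconds": 0, "max_entries": "500"})
unknown_backend = parse_event_log({"backend": "redis"})
```

Note that a non-positive bound is replaced by the default rather than clamped to the nearest valid value, which is what the code above does.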
def _clamp_heartbeat(value: object) -> int:
"""Coerce raw YAML/env input into the [5, 300]-second heartbeat band.
Outside that band the workspace either floods the platform with
sub-second beats or looks dead long before the next one — both
real failure modes seen on incidents, neither benign. Coerce here
so adapters and ``heartbeat.py`` can read the value without
re-validating.
"""
try:
n = int(value)
except (TypeError, ValueError):
return 30
return max(5, min(300, n))
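A short sketch of how the clamp treats typical inputs. The function body is copied from the diff under a standalone name; the sample values are illustrative.

```python
def clamp_heartbeat(value: object) -> int:
    """Coerce input into the [5, 300] band; unparseable input yields 30."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        return 30
    return max(5, min(300, n))


# Below the band, inside it, a numeric string, above the band, and two bad inputs.
inputs = [1, 30, "45", 900, None, "fast"]
clamped = [clamp_heartbeat(v) for v in inputs]
for raw_value, result in zip(inputs, clamped):
    print(f"{raw_value!r} -> {result}")
```

Out-of-range numbers are pulled to the nearest bound, while values that cannot be parsed at all fall back to 30 seconds.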
def load_config(config_path: Optional[str] = None) -> WorkspaceConfig:
"""Load config from WORKSPACE_CONFIG_PATH or the given path."""
if config_path is None:
config_path = os.environ.get("WORKSPACE_CONFIG_PATH", "/configs")
config_file = Path(config_path) / "config.yaml"
if not config_file.exists():
raise FileNotFoundError(f"Config file not found: {config_file}")
with open(config_file) as f:
raw = yaml.safe_load(f) or {}
# Operator-picked model from env (canvas / secret-mint / persona env),
# falling back to the YAML `model:` field. See _picked_model_from_env for
# the precedence (MOLECULE_MODEL > MODEL > legacy MODEL_PROVIDER).
model = _picked_model_from_env(raw.get("model", "anthropic:claude-opus-4-7"))
# Resolve top-level provider with this priority chain:
# 1. ``LLM_PROVIDER`` env var (canvas Save+Restart sets this so the
# operator's choice survives a CP-driven restart even though the
# regenerated /configs/config.yaml drops most user fields).
# 2. Explicit YAML ``provider:`` (an operator pinned it in the file).
# 3. Derive from the model slug prefix for backward compat:
# ``anthropic:claude-opus-4-7`` → ``anthropic``
# ``minimax/abab7-chat-preview`` → ``minimax``
# bare model names → ``""`` (signals "use adapter default")
# Empty after all three is fine — adapters that don't need an explicit
# provider (langgraph, claude-code-default, codex) keep their existing
# routing; adapters that do (hermes via derive-provider.sh) prefer this
# over slug-parsing the model name.
provider = (
os.environ.get("LLM_PROVIDER")
or raw.get("provider")
or _derive_provider_from_model(model)
)
runtime = raw.get("runtime", "langgraph")
runtime_raw = raw.get("runtime_config", {})
a2a_raw = raw.get("a2a", {})
delegation_raw = raw.get("delegation", {})
sandbox_raw = raw.get("sandbox", {})
rbac_raw = raw.get("rbac", {})
hitl_raw = raw.get("hitl", {})
governance_raw = raw.get("governance", {})
# security_scan accepts both shorthand string ("warn") and dict ({"mode": "warn"})
_ss_raw = raw.get("security_scan", {})
security_scan_raw = _ss_raw if isinstance(_ss_raw, dict) else {"mode": str(_ss_raw)}
compliance_raw = raw.get("compliance", {})
observability_raw = raw.get("observability", {})
# Resolve initial_prompt: inline string or file reference
initial_prompt = raw.get("initial_prompt", "")
initial_prompt_file = raw.get("initial_prompt_file", "")
if not initial_prompt and initial_prompt_file:
prompt_path = Path(config_path) / initial_prompt_file
if prompt_path.exists():
initial_prompt = prompt_path.read_text().strip()
# Resolve idle_prompt: same pattern as initial_prompt
idle_prompt = raw.get("idle_prompt", "")
idle_prompt_file = raw.get("idle_prompt_file", "")
if not idle_prompt and idle_prompt_file:
idle_path = Path(config_path) / idle_prompt_file
if idle_path.exists():
idle_prompt = idle_path.read_text().strip()
idle_interval_seconds = int(raw.get("idle_interval_seconds", 600))
return WorkspaceConfig(
name=raw.get("name", "Workspace"),
description=raw.get("description", ""),
role=raw.get("role", ""),
version=raw.get("version", "1.0.0"),
tier=int(raw.get("tier", 1)) if str(raw.get("tier", 1)).isdigit() else 1,
model=model,
provider=provider,
runtime=runtime,
initial_prompt=initial_prompt,
idle_prompt=idle_prompt,
idle_interval_seconds=idle_interval_seconds,
runtime_config=RuntimeConfig(
command=runtime_raw.get("command", ""),
args=runtime_raw.get("args", []),
required_env=runtime_raw.get("required_env", []),
timeout=runtime_raw.get("timeout", 0),
# Picked-model precedence (priority order):
# 1. operator-picked model from env — MOLECULE_MODEL > MODEL >
# (legacy) MODEL_PROVIDER, plumbed via canvas Save+Restart,
# workspace-server's secret-mint path, or the universal
# MODEL/MODEL_PROVIDER env from applyRuntimeModelEnv. The
# operator's canvas selection MUST win over the template's
# baked-in default; previously the template's
# `runtime_config.model: sonnet` always won and the picked
# MiniMax/GLM/etc model was silently dropped (Bug B,
# surfaced 2026-05-02 during E2E).
# 2. runtime_raw.model — explicit YAML override in the
# template's runtime_config.
# 3. top-level `model` (already env-resolved above). This is
# the SaaS restart case (CP regenerates a minimal
# config.yaml on every boot, dropping runtime_config.model).
# Centralising here means EVERY adapter gets the override for
# free — no per-adapter env-reading code required.
model=_picked_model_from_env(runtime_raw.get("model") or model),
# Same fallback shape as ``model`` above: an explicit
# ``runtime_config.provider`` wins; otherwise inherit the
# top-level resolved provider so adapters see a single
# consistent choice without each one re-implementing
# env/YAML/slug-prefix resolution.
provider=runtime_raw.get("provider") or provider,
# Per-model entries (canvas Model dropdown source). Pass through
# raw dicts so the schema can grow without a parser change. Only
# entries that are dicts are kept — a malformed YAML element
# (string, list, None) is silently dropped rather than raising,
# matching the rest of this parser's lenient defaults.
models=[m for m in (runtime_raw.get("models") or []) if isinstance(m, dict)],
# Deprecated fields — kept for backward compat
auth_token_env=runtime_raw.get("auth_token_env", ""),
auth_token_file=runtime_raw.get("auth_token_file", ""),
),
skills=raw.get("skills", []),
plugins=raw.get("plugins", []),
tools=raw.get("tools", []),
prompt_files=raw.get("prompt_files", []),
a2a=A2AConfig(
port=a2a_raw.get("port", 8000),
streaming=a2a_raw.get("streaming", True),
push_notifications=a2a_raw.get("push_notifications", True),
),
delegation=DelegationConfig(
retry_attempts=delegation_raw.get("retry_attempts", 3),
retry_delay=delegation_raw.get("retry_delay", 5.0),
timeout=delegation_raw.get("timeout", 120.0),
escalate=delegation_raw.get("escalate", True),
),
sandbox=SandboxConfig(
backend=sandbox_raw.get("backend", "subprocess"),
memory_limit=sandbox_raw.get("memory_limit", "256m"),
timeout=sandbox_raw.get("timeout", 30),
),
rbac=RBACConfig(
roles=rbac_raw.get("roles", ["operator"]),
allowed_actions=rbac_raw.get("allowed_actions", {}),
),
hitl=HITLConfig(
channels=hitl_raw.get("channels", [{"type": "dashboard"}]),
default_timeout=float(hitl_raw.get("default_timeout", 300)),
bypass_roles=hitl_raw.get("bypass_roles", []),
),
governance=GovernanceConfig(
enabled=governance_raw.get("enabled", False),
toolkit=governance_raw.get("toolkit", "microsoft"),
policy_endpoint=governance_raw.get("policy_endpoint", ""),
policy_mode=governance_raw.get("policy_mode", "audit"),
policy_file=governance_raw.get("policy_file", ""),
blocked_patterns=governance_raw.get("blocked_patterns", []),
max_tool_calls_per_task=governance_raw.get("max_tool_calls_per_task", 50),
),
security_scan=SecurityScanConfig(
mode=security_scan_raw.get("mode", "warn"),
fail_open_if_no_scanner=security_scan_raw.get("fail_open_if_no_scanner", True),
),
compliance=ComplianceConfig(
# Default must match ComplianceConfig.mode's dataclass default
# (see class docstring for rationale — 2026-04-24 flip).
mode=compliance_raw.get("mode", "owasp_agentic"),
prompt_injection=compliance_raw.get("prompt_injection", "detect"),
max_tool_calls_per_task=int(compliance_raw.get("max_tool_calls_per_task", 50)),
max_task_duration_seconds=int(compliance_raw.get("max_task_duration_seconds", 300)),
),
observability=ObservabilityConfig(
heartbeat_interval_seconds=_clamp_heartbeat(
observability_raw.get("heartbeat_interval_seconds", 30)
),
log_level=str(observability_raw.get("log_level", "INFO")).upper(),
event_log=_parse_event_log(observability_raw.get("event_log", {})),
),
sub_workspaces=raw.get("sub_workspaces", []),
effort=str(raw.get("effort", "")),
task_budget=int(raw.get("task_budget", 0)),
)
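The provider priority chain used by `load_config` can be sketched on its own. This is a simplified illustration, not the loader itself: the environment and parsed YAML are passed in as mappings (an assumption for testability), and `gateway-x` is a hypothetical provider name.

```python
from typing import Any, Mapping


def derive_provider(model: str) -> str:
    """Slug-prefix derivation, as in the helper earlier in this file."""
    for sep in (":", "/"):
        if sep in model:
            return model.partition(sep)[0]
    return ""


def resolve_provider(env: Mapping[str, str], raw: Mapping[str, Any], model: str) -> str:
    """LLM_PROVIDER env, then YAML provider, then the model slug prefix."""
    return env.get("LLM_PROVIDER") or raw.get("provider") or derive_provider(model)


def resolve_runtime_provider(raw: Mapping[str, Any], top_level: str) -> str:
    """An explicit runtime_config.provider wins; otherwise inherit the top level."""
    runtime_raw = raw.get("runtime_config", {})
    return runtime_raw.get("provider") or top_level


env_wins = resolve_provider(
    {"LLM_PROVIDER": "openai"}, {"provider": "anthropic"}, "anthropic:claude-opus-4-7"
)
yaml_wins = resolve_provider({}, {"provider": "minimax"}, "anthropic:claude-opus-4-7")
slug_derived = resolve_provider({}, {}, "minimax/abab7-chat-preview")
bare_model = resolve_provider({}, {}, "sonnet")
runtime_override = resolve_runtime_provider(
    {"runtime_config": {"provider": "gateway-x"}}, slug_derived  # hypothetical name
)
runtime_inherited = resolve_runtime_provider({}, slug_derived)
```

Because each step uses `or`, an empty string at one level falls through to the next, so an empty final result simply means no provider was stated or derivable.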
-61
@@ -1,61 +0,0 @@
"""Resolve the configs directory used by the workspace runtime.
The runtime persists per-workspace state to a single directory:
``.auth_token`` (platform_auth), ``.platform_inbound_secret``
(platform_inbound_auth), ``.mcp_inbox_cursor`` (inbox). Inside a
workspace EC2 container that directory is ``/configs`` — a tmpfs/EBS
mount owned by the agent user, populated by the provisioner before
runtime boot.
Outside a container — operators running ``molecule-mcp`` on a laptop
for the external-runtime path — ``/configs`` doesn't exist (or, if it
does, isn't writable by an unprivileged user). The default would
silently fail on the first heartbeat: ``.platform_inbound_secret``
write hits ``Read-only file system: '/configs'``, the heartbeat thread
logs and dies, the workspace flips offline within a minute. The
operator sees no actionable error.
This module is the single resolution point. Resolution order:
1. ``CONFIGS_DIR`` env var, if set — explicit operator override.
2. ``/configs`` — used iff the path exists AND is writable. This
preserves the in-container default for every existing deployment.
3. ``$HOME/.molecule-workspace`` — the non-container fallback,
created with mode 0700 so per-file 0600 perms aren't undermined
by a world-readable parent.
Not cached: callers (heartbeat thread, MCP tools) hit this at most a
few times per second; reading the env var + one ``stat()`` call is
cheap, and the existing call sites read ``os.environ`` live so tests
that monkeypatch ``CONFIGS_DIR`` between cases keep working.
Issue: Molecule-AI/molecule-core#2458.
"""
from __future__ import annotations
import os
from pathlib import Path
def resolve() -> Path:
    """Return the configs directory, creating the home fallback if needed."""
    explicit = os.environ.get("CONFIGS_DIR", "").strip()
    if explicit:
        path = Path(explicit)
        path.mkdir(parents=True, exist_ok=True)
        return path
    in_container = Path("/configs")
    if in_container.exists() and os.access(str(in_container), os.W_OK):
        return in_container
    home_path = Path.home() / ".molecule-workspace"
    home_path.mkdir(parents=True, exist_ok=True, mode=0o700)
    return home_path


def reset_cache() -> None:
    """No-op kept for API stability; this module is stateless. Tests
    that called reset_cache when the cached prototype was in tree
    keep working without modification."""
    return
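The three-step resolution order can be tried without a real `/configs` mount. The sketch below is a variant written for illustration: the environment, the container directory, and the home directory are parameters (assumptions, not part of the module's API) so the example can run inside a temporary directory.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping


def resolve_configs_dir(env: Mapping[str, str], container_dir: Path, home: Path) -> Path:
    """Explicit override, then a writable container dir, then the home fallback."""
    explicit = env.get("CONFIGS_DIR", "").strip()
    if explicit:
        path = Path(explicit)
        path.mkdir(parents=True, exist_ok=True)
        return path
    if container_dir.exists() and os.access(str(container_dir), os.W_OK):
        return container_dir
    fallback = home / ".molecule-workspace"
    # 0700 so per-file 0600 permissions are not undermined by the parent.
    fallback.mkdir(parents=True, exist_ok=True, mode=0o700)
    return fallback


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    absent = root / "configs"  # stands in for a missing /configs
    mounted = root / "mounted"  # stands in for a writable /configs
    mounted.mkdir()

    override = resolve_configs_dir({"CONFIGS_DIR": str(root / "explicit")}, absent, root)
    container_hit = resolve_configs_dir({}, mounted, root) == mounted
    fallback = resolve_configs_dir({}, absent, root)
    override_name = override.name
    fallback_name = fallback.name
    fallback_created = fallback.is_dir()
```

Passing the paths in keeps the example hermetic; the module itself reads `os.environ` and the fixed `/configs` path directly.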
-137
@@ -1,137 +0,0 @@
"""Memory consolidation loop.
When an agent is idle (no active tasks for a configurable period),
the consolidation loop wakes up and summarizes noisy local memory
entries into dense, high-value knowledge facts.
Similar to human sleep consolidation — raw scratchpad entries get
compressed into reusable knowledge.
"""
import asyncio
import logging
import os
import httpx
from platform_auth import auth_headers
logger = logging.getLogger(__name__)
if os.path.exists("/.dockerenv") or os.environ.get("DOCKER_VERSION"):
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
else:
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://localhost:8080")
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
if not _WORKSPACE_ID_raw:
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
WORKSPACE_ID = _WORKSPACE_ID_raw
CONSOLIDATION_INTERVAL = float(os.environ.get("CONSOLIDATION_INTERVAL", "300")) # 5 min
CONSOLIDATION_THRESHOLD = int(os.environ.get("CONSOLIDATION_THRESHOLD", "10")) # min memories before consolidating
class ConsolidationLoop:
"""Background loop that consolidates local memories when idle."""
def __init__(self, agent=None):
self.agent = agent
self._running = False
async def start(self):
"""Start the consolidation loop."""
self._running = True
logger.info("Memory consolidation loop started (interval=%ss, threshold=%d)",
CONSOLIDATION_INTERVAL, CONSOLIDATION_THRESHOLD)
while self._running:
await asyncio.sleep(CONSOLIDATION_INTERVAL)
if not self._running:
break
try:
await self._consolidate()
except Exception as e:
logger.warning("Consolidation error: %s", e)
async def _consolidate(self):
"""Check if consolidation is needed and run it."""
async with httpx.AsyncClient(timeout=10.0) as client:
# Fetch local memories
resp = await client.get(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
params={"scope": "LOCAL"},
headers=auth_headers(),
)
if resp.status_code != 200:
return
memories = resp.json()
if len(memories) < CONSOLIDATION_THRESHOLD:
return
logger.info("Consolidating %d local memories", len(memories))
# Build a summary of all local memories
contents = [m["content"] for m in memories]
summary_prompt = (
"Summarize the following workspace memories into 3-5 key facts. "
"Each fact should be a single, clear sentence capturing the most "
"important and reusable knowledge:\n\n"
+ "\n".join(f"- {c}" for c in contents)
)
# Use the agent to generate the summary if available
summary = ""
if self.agent:
try:
result = await self.agent.ainvoke(
{"messages": [("user", summary_prompt)]},
config={"configurable": {"thread_id": "consolidation"}},
)
messages = result.get("messages", [])
summary = ""
for msg in reversed(messages):
content = getattr(msg, "content", "")
if isinstance(content, str) and content.strip():
msg_type = getattr(msg, "type", "")
if msg_type != "human":
summary = content
break
if summary:
# Store consolidated summary as a TEAM memory — only delete originals if POST succeeds
resp = await client.post(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
json={"content": f"[Consolidated] {summary}", "scope": "TEAM"},
headers=auth_headers(),
)
if resp.status_code in (200, 201):
# Safe to delete originals — consolidated version is saved
for m in memories:
await client.delete(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories/{m['id']}",
headers=auth_headers(),
)
logger.info("Consolidated %d memories into team knowledge", len(memories))
else:
logger.warning("Consolidation POST failed (status %d) — keeping originals", resp.status_code)
except Exception as e:
logger.error(
"CONSOLIDATION: Agent summarization failed (rate limit? model error?): %s. "
"Falling back to simple concatenation.", e
)
# Fall through to concatenation below
# Fallback: concatenate without agent summarization
if not (self.agent and summary):
combined = " | ".join(contents[:20])
await client.post(
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
json={"content": f"[Consolidated] {combined}", "scope": "TEAM"},
headers=auth_headers(),
)
logger.info("Consolidated %d memories via concatenation fallback", len(memories))
def stop(self):
self._running = False
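The summary-extraction step inside `_consolidate` walks the agent's messages from newest to oldest and keeps the first non-empty string that did not come from the user. A minimal sketch of that selection, assuming the `break` belongs to the `msg_type != "human"` branch (the indentation is lost in this view); `FakeMessage` is a hypothetical stand-in for the agent's message objects.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeMessage:
    """Hypothetical stand-in exposing the two attributes the loop reads."""
    type: str
    content: Any


def last_agent_text(messages: list) -> str:
    """Newest non-human message whose content is a non-blank string, else ''."""
    for msg in reversed(messages):
        content = getattr(msg, "content", "")
        if isinstance(content, str) and content.strip():
            if getattr(msg, "type", "") != "human":
                return content
    return ""


history = [
    FakeMessage("human", "Summarize the following workspace memories..."),
    FakeMessage("ai", "Fact one. Fact two."),
    FakeMessage("ai", [{"type": "tool_use"}]),  # non-string content is skipped
    FakeMessage("ai", "   "),  # blank content is skipped
]
summary = last_agent_text(history)
no_summary = last_agent_text([FakeMessage("human", "hi")])
```

An empty result corresponds to the case where the loop above falls through to the concatenation fallback.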
-152
@@ -1,152 +0,0 @@
"""Coordinator pattern for team workspaces.
When a workspace is expanded into a team, the parent agent becomes a
coordinator that routes incoming tasks to the appropriate child workspace
based on the task content and children's capabilities.
The coordinator:
1. Fetches its children's Agent Cards (skills, capabilities)
2. Analyzes each incoming task to determine which child is best suited
3. Delegates to the chosen child via the delegation tool
4. Aggregates responses if a task requires multiple children
5. Falls back to handling the task itself if no child is appropriate
"""
import logging
import os
import httpx
from langchain_core.tools import tool
from shared_runtime import build_peer_section
from policies.routing import build_team_routing_payload
logger = logging.getLogger(__name__)
if os.path.exists("/.dockerenv") or os.environ.get("DOCKER_VERSION"):
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
else:
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://localhost:8080")
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
if not _WORKSPACE_ID_raw:
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
WORKSPACE_ID = _WORKSPACE_ID_raw
async def get_children() -> list[dict]:
    """Fetch this workspace's children from the platform."""
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.get(
                f"{PLATFORM_URL}/registry/{WORKSPACE_ID}/peers",
                headers={"X-Workspace-ID": WORKSPACE_ID},
            )
            if resp.status_code == 200:
                peers = resp.json()
                # Filter to only children (parent_id == our ID)
                return [p for p in peers if p.get("parent_id") == WORKSPACE_ID]
    except Exception as e:
        logger.warning("Failed to fetch children: %s", e)
    return []
def build_children_description(children: list[dict]) -> str:
"""Build a description of children's capabilities for the coordinator prompt."""
if not children:
return ""
team_section = build_peer_section(
children,
heading="## Your Team (sub-workspaces you coordinate)",
instruction=(
"Use the `delegate_task_async` tool to send tasks to the chosen member. "
"Only delegate to members listed above."
),
)
return "\n".join(
[
team_section,
"",
"### Coordination Rules — MANDATORY",
"1. You are a COORDINATOR. Your ONLY job is to delegate and synthesize. NEVER do the work yourself.",
"2. For EVERY task, use `delegate_task_async` to send it to the appropriate team member(s). "
"Do this BEFORE writing any analysis, code, or research yourself.",
"3. If a task spans multiple members, delegate to ALL of them in parallel and aggregate results.",
"4. If ALL members are offline/paused, tell the caller which members are unavailable. "
"Do NOT attempt the work yourself — you lack the specialist context.",
"5. If a delegation FAILS (error, timeout): try another member first. "
"Only provide your own brief summary if NO member can respond. Never forward raw errors.",
"6. Your response should be a SYNTHESIS of your team's work, not your own analysis.",
"7. Always respond in the same language the caller uses.",
]
)
@tool
async def route_task_to_team(
task: str,
preferred_member_id: str = "",
) -> dict:
"""Route a task to the most appropriate team member.
As the team coordinator, analyze the task and delegate to the best-suited
child workspace. If preferred_member_id is provided, delegate directly to
that member.
Args:
task: The task description to route.
preferred_member_id: Optional — directly delegate to this member.
"""
import time
from builtin_tools.delegation import delegate_task_async as delegate
# RFC #2251 V1.0 reproduction-harness instrumentation. Phase-tagged log
# lines correlate with scripts/measure-coordinator-task-bounds.sh's
# external timing trace, so an operator running the harness against
# staging can answer "what phase was the coordinator in at minute 7?".
# `grep rfc2251_phase` on the workspace's container logs is the query.
# Strip when V1.0 ships and the phase data lands in the structured
# heartbeat payload instead.
_phase_t0 = time.monotonic()
logger.info(
"rfc2251_phase=route_start task_chars=%d preferred_member_id=%s",
len(task), preferred_member_id or "none",
)
children = await get_children()
logger.info(
"rfc2251_phase=children_fetched count=%d elapsed_ms=%d",
len(children), int((time.monotonic() - _phase_t0) * 1000),
)
decision = build_team_routing_payload(
children,
task=task,
preferred_member_id=preferred_member_id,
)
logger.info(
"rfc2251_phase=routing_decided action=%s elapsed_ms=%d",
decision.get("action", "unknown"), int((time.monotonic() - _phase_t0) * 1000),
)
if decision.get("action") == "delegate_to_preferred_member":
# Async delegation — returns immediately with task_id
target = decision["preferred_member_id"]
logger.info(
"rfc2251_phase=delegate_invoked target=%s elapsed_ms=%d",
target, int((time.monotonic() - _phase_t0) * 1000),
)
result = await delegate.ainvoke(
{"workspace_id": target, "task": task}
)
logger.info(
"rfc2251_phase=delegate_returned target=%s task_id=%s elapsed_ms=%d",
target, result.get("task_id", "n/a"), int((time.monotonic() - _phase_t0) * 1000),
)
return result
logger.info(
"rfc2251_phase=route_returning_decision_only elapsed_ms=%d",
int((time.monotonic() - _phase_t0) * 1000),
)
return decision
-174
@@ -1,174 +0,0 @@
#!/bin/sh
# Drop privileges to the agent user before exec'ing molecule-runtime.
# claude-code refuses --dangerously-skip-permissions when running as
# root/sudo for safety. Without this entrypoint, every cron tick fails
# with `ProcessError: Command failed with exit code 1` and the agent
# logs `--dangerously-skip-permissions cannot be used with root/sudo
# privileges for security reasons`.
#
# Pattern matches the legacy monorepo workspace/entrypoint.sh:
# fix volume ownership as root, then re-exec via gosu as agent (uid 1000).
# --- RFC#523 Layer 2: tenant-workspace forbidden-env guard (task #146) ---
# Defense-in-depth. The provisioner (workspace-server) has a fail-closed
# abort at provision time (Layer 1, prepareProvisionContext), and the
# in-container env-build has a silent strip (forensic #145,
# provisioner.buildContainerEnv). This guard fires if either upstream
# layer is bypassed — e.g. someone runs this image standalone with
# `docker run -e GITEA_TOKEN=...`. Exit 1 with a clear message instead
# of running with an operator-scope credential in tenant scope.
#
# Key names are generic. The MOLECULE_OPERATOR_ prefix is the one
# molecule-AI-specific literal; this entrypoint lives inside the
# claude-code template that is internal-only (memory
# `feedback_open_source_templates_no_hardcoded_org_internals` — claude-
# code template is internal, separate-published templates must NOT carry
# org-specific literals). A fork can edit FORBIDDEN_KEYS /
# FORBIDDEN_PREFIXES for its own operator-scope names without touching
# the rest of the entrypoint.
#
# Skipped when MOLECULE_TENANT_GUARD_DISABLE=1 — for local-dev where the
# operator host IS the tenant host (e.g. running molecule-runtime on the
# operator box for debugging). NEVER set this in tenant containers.
if [ "${MOLECULE_TENANT_GUARD_DISABLE:-0}" != "1" ]; then
FORBIDDEN_KEYS="GITEA_TOKEN GITEA_PAT GITHUB_TOKEN GITHUB_PAT GH_TOKEN GITLAB_TOKEN GL_TOKEN BITBUCKET_TOKEN CP_ADMIN_API_TOKEN CP_ADMIN_TOKEN INFISICAL_OPERATOR_TOKEN INFISICAL_BOOTSTRAP_TOKEN RAILWAY_TOKEN RAILWAY_PERSONAL_API_TOKEN HETZNER_TOKEN HETZNER_API_TOKEN"
FORBIDDEN_PREFIXES="MOLECULE_OPERATOR_"
FOUND=""
for k in $FORBIDDEN_KEYS; do
# eval is safe here — $k is from a static whitespace-separated
# literal list above (no user input). POSIX sh has no
# associative arrays, hence the indirect-expansion via eval to
# test "is this var set" without caring about its value.
eval "v=\${$k+set}"
if [ "$v" = "set" ]; then
FOUND="$FOUND $k"
fi
done
for prefix in $FORBIDDEN_PREFIXES; do
# env | awk is the portable POSIX way to enumerate by prefix.
# busybox awk (alpine), gawk (debian), and BSD awk (macOS-test)
# all support index(). Doesn't depend on bash arrays / [[ =~ ]].
prefix_hits=$(env | awk -F= -v p="$prefix" 'index($1, p)==1 {print $1}')
if [ -n "$prefix_hits" ]; then
FOUND="$FOUND $prefix_hits"
fi
done
if [ -n "$FOUND" ]; then
echo "RFC#523 Layer 2: refusing to start tenant workspace — forbidden operator-scope env var(s) present:$FOUND" >&2
echo "These vars are operator-fleet scope and must not reach tenant workspaces." >&2
echo "Remove them from workspace_secrets / global_secrets / docker -e and retry." >&2
echo "If running this image standalone for local dev with intentional operator scope, set MOLECULE_TENANT_GUARD_DISABLE=1." >&2
exit 1
fi
fi
if [ "$(id -u)" = "0" ]; then
# Configs volume is created by Docker as root; agent needs write access
# for plugin installs, memory writes, .auth_token rotation, etc.
chown -R agent:agent /configs 2>/dev/null
# Strip CRLF from hook scripts — Windows Docker Desktop copies host files
# with CRLF line endings even when .gitattributes says eol=lf. The \r in
# the shebang line makes python3 try to open 'script.py\r' → ENOENT →
# claude-code swallows the hook error → "(no response generated)".
# This is the permanent fix — runs at every container start.
for f in /configs/.claude/hooks/*.sh /configs/.claude/hooks/*.py; do
[ -f "$f" ] && sed -i 's/\r$//' "$f"
done
# /workspace handling — only chown when the contents are root-owned
# (typical on Docker Desktop on Windows where host uid maps to 0).
# On Linux Docker with matching uids the recursive chown is skipped
# to keep startup fast.
chown agent:agent /workspace 2>/dev/null || true
if [ -d /workspace ]; then
first_entry=$(find /workspace -mindepth 1 -maxdepth 1 -print -quit 2>/dev/null)
if [ -n "$first_entry" ] && [ "$(stat -c '%u' "$first_entry" 2>/dev/null)" = "0" ]; then
chown -R agent:agent /workspace 2>/dev/null
fi
fi
# Claude Code session directory — mounted at /root/.claude/sessions by
# the platform provisioner. Symlink it into agent's home so the SDK
# finds it when running as agent. The provisioner's mount point is
# hardcoded to /root/.claude/sessions; we don't want to change the
# platform contract just for this template.
mkdir -p /home/agent/.claude
if [ -d /root/.claude/sessions ]; then
chown -R agent:agent /root/.claude /home/agent/.claude 2>/dev/null
ln -sfn /root/.claude/sessions /home/agent/.claude/sessions
fi
# --- Per-persona git identity (closes molecule-core#155) ---
# Without this, every team commit lands with an empty author and Gitea
# attributes the work to the founder PAT instead of the persona that
# actually authored it. Same fingerprint that got us suspended on GitHub
# 2026-05-06. GITEA_USER is injected by the provisioner from the
# workspace_secrets table; bot.moleculesai.app is the agent-only domain
# so commits are clearly distinguishable from human authors.
if [ -n "${GITEA_USER:-}" ]; then
git config --global user.name "${GITEA_USER}"
git config --global user.email "${GITEA_USER}@bot.moleculesai.app"
fi
# --- GitHub credential helper setup (issue #547 / #613) ---
# Configure git to use the molecule credential helper for github.com.
# This runs as root so the global gitconfig is written before we drop
# to agent. The helper fetches fresh GitHub App installation tokens
# from the platform API, with caching and env-var fallback.
#
# NOTE: post-suspension (2026-05-06), github.com/Molecule-AI is gone;
# the helper's platform endpoint also 500s (internal#187). The helper
# block is kept for legacy boxes that still have a working token chain;
# post-suspension provisioner injects GITEA_TOKEN directly so this
# path's failure is non-fatal. Full removal tracked under #171.
if [ -x /app/scripts/molecule-git-token-helper.sh ]; then
# Set credential helper for github.com only (not all hosts).
# The '!' prefix tells git to run the command as a shell command.
git config --global "credential.https://github.com.helper" \
"!/app/scripts/molecule-git-token-helper.sh"
# Disable other credential helpers for github.com to avoid conflicts.
git config --global "credential.https://github.com.useHttpPath" true
fi
# Move gitconfig to agent's home so it takes effect after gosu —
# done unconditionally so the per-persona identity survives the drop
# even when the github.com helper block is skipped.
if [ -f /root/.gitconfig ]; then
cp /root/.gitconfig /home/agent/.gitconfig
chown agent:agent /home/agent/.gitconfig
fi
# Create the token cache directory for the agent user.
mkdir -p /home/agent/.molecule-token-cache
chown agent:agent /home/agent/.molecule-token-cache
chmod 700 /home/agent/.molecule-token-cache
exec gosu agent "$0" "$@"
fi
# Now running as agent (uid 1000)
# --- Start background token refresh daemon (with respawn supervision) ---
# Keeps gh CLI and git credentials fresh across the 60-min token TTL.
# Wrapped in a respawn loop so a daemon crash doesn't silently leave the
# workspace stuck on an expired token. Runs in the background; entrypoint
# continues to exec molecule-runtime.
if [ -x /app/scripts/molecule-gh-token-refresh.sh ]; then
nohup bash -c '
while true; do
/app/scripts/molecule-gh-token-refresh.sh
rc=$?
echo "[molecule-gh-token-refresh] daemon exited rc=$rc — respawning in 30s" >&2
sleep 30
done
' > /home/agent/.gh-token-refresh.log 2>&1 &
fi
# --- Initial gh auth setup ---
# If GITHUB_TOKEN or GH_TOKEN is set (injected at provision time),
# authenticate gh CLI with it so it works immediately (before the first
# background refresh fires). The background daemon will replace this
# with a fresh token within ~60s of boot.
if [ -n "${GITHUB_TOKEN:-}" ]; then
echo "${GITHUB_TOKEN}" | gh auth login --hostname github.com --with-token 2>/dev/null || true
elif [ -n "${GH_TOKEN:-}" ]; then
echo "${GH_TOKEN}" | gh auth login --hostname github.com --with-token 2>/dev/null || true
fi
exec molecule-runtime "$@"
-249
@@ -1,249 +0,0 @@
"""Workspace event log — append-and-query buffer for runtime events.
Hermes-style declarative observability primitive. Adapter and platform
code emit semantic events (turn started, tool invoked, peer message
delivered) and external readers — the canvas Activity tab, A2A peers,
and the platform's `/workspaces/:id/activity` endpoint — query them
with a cursor.
Today's PR ships the in-memory backend only. Redis backend lands in
the follow-up that wires platform-side fan-out (#119 PR-3 follow-up).
The Protocol shape lets a future backend swap in without touching the
emitting sites.
Eviction is the load-bearing invariant: the workspace runtime is
long-lived, so an unbounded list would leak memory. Every append
prunes by both TTL and max_entries; readers that fall behind past
the eviction frontier see a contiguous tail without an error — the
cursor protocol only guarantees "events with id > since that are
still resident", not "every event ever appended". A reader that
needs at-least-once delivery must poll faster than the eviction TTL
and before more than max_entries newer events accumulate.
"""
from __future__ import annotations
import threading
import time
from collections import deque
from dataclasses import asdict, dataclass, field
from typing import Any, Deque, Iterable, Optional, Protocol
@dataclass(frozen=True)
class Event:
"""One immutable entry in the event log.
``id`` is a monotonic integer assigned at append time. It SURVIVES
eviction — the counter is never reset when an old event drops out
of the buffer, so a reader's cursor stays valid even if the event
it points to has aged out (the next query just returns the resident
tail). This is the contract that lets a slow reader reconnect
without resetting to id=0.
"""
id: int
timestamp: float
"""Seconds since the Unix epoch — the same shape as ``time.time()``
so callers can format with ``datetime.fromtimestamp`` without an
extra conversion. Float, not int, because event-bursts within the
same second need stable ordering for downstream merging."""
kind: str
"""Short tag categorising the event: ``turn.started``, ``tool.invoked``,
``peer.message.delivered``, etc. Convention is dotted snake_case so
the canvas can group by prefix without a parser."""
payload: dict = field(default_factory=dict)
"""Arbitrary JSON-serialisable dict. Keep small — the in-memory
backend holds every event in process RAM. Large blobs (file
contents, full transcripts) belong in the platform's blob store
with a reference here, not the value itself."""
def to_dict(self) -> dict:
"""Plain-dict shape for JSON serialisation in the API layer.
Wrapping ``dataclasses.asdict`` rather than relying on the
consumer to call it themselves means the wire format stays
owned by this module — a rename of ``kind`` to ``type`` (or
whatever the canvas eventually settles on) flips here, not in
every reader.
"""
return asdict(self)
class EventLogBackend(Protocol):
"""Backend Protocol — the swap point for memory ↔ redis ↔ disabled.
Implementations must be safe to call from multiple threads. The
workspace runtime appends from the heartbeat thread, the agent's
main loop, and any A2A executor concurrently; readers run on the
HTTP server thread. A backend that needs locking owns it.
"""
def append(self, kind: str, payload: Optional[dict] = None) -> Event:
"""Add an event and return the persisted record (with id assigned)."""
...
def query(self, since: Optional[int] = None, limit: Optional[int] = None) -> list[Event]:
"""Return events with ``id > since`` (or all resident if ``since`` is None).
Order is ascending by id. ``limit`` caps the returned slice;
if the resident tail is shorter than ``limit``, returns what
is available.
"""
...
def clear(self) -> None:
"""Drop all entries. Provided for test isolation, not for production callers."""
...
class InMemoryEventLog:
"""Bounded in-memory ring buffer with TTL eviction.
Two eviction triggers, both checked on every ``append`` (and on
``query`` for read-side freshness when older entries have aged
past the TTL but no append has happened to evict them):
- **TTL:** entries older than ``ttl_seconds`` are dropped.
- **max_entries:** when the deque exceeds ``max_entries``, oldest
drop until back at the cap.
Both bounds are advisory at construction — non-positive values
fall back to permissive defaults rather than disabling the log,
because a misconfigured value should not silently lose events.
To disable the log, use ``DisabledEventLog`` instead.
The id counter is monotonic across the entire process lifetime;
eviction does not reset it. A query with ``since=last_seen_id``
returns the resident tail past that cursor, which may be empty if
the reader is too far behind.
"""
_DEFAULT_TTL_SECONDS = 3600 # 1 hour — covers a long agentic loop without leaking
_DEFAULT_MAX_ENTRIES = 10_000 # ~1 MB at 100 bytes/event, safely under workspace RAM budget
def __init__(
self,
ttl_seconds: int = _DEFAULT_TTL_SECONDS,
max_entries: int = _DEFAULT_MAX_ENTRIES,
now: Optional[Any] = None,
) -> None:
self._ttl_seconds: int = ttl_seconds if ttl_seconds > 0 else self._DEFAULT_TTL_SECONDS
self._max_entries: int = max_entries if max_entries > 0 else self._DEFAULT_MAX_ENTRIES
# Injected clock for deterministic TTL tests. Production passes
# ``time.time``; tests pass a callable that returns a controlled value.
self._now = now if callable(now) else time.time
self._lock = threading.Lock()
self._next_id: int = 1
self._buf: Deque[Event] = deque()
def append(self, kind: str, payload: Optional[dict] = None) -> Event:
with self._lock:
event = Event(
id=self._next_id,
timestamp=self._now(),
kind=kind,
payload=dict(payload) if payload else {},
)
self._next_id += 1
self._buf.append(event)
self._evict_locked()
return event
def query(self, since: Optional[int] = None, limit: Optional[int] = None) -> list[Event]:
with self._lock:
# Read-side TTL sweep — covers the case where appends pause
# but a reader keeps polling. Without this, a stale tail
# would survive forever once writes stop.
self._evict_locked()
cutoff = since if since is not None else 0
tail: Iterable[Event] = (e for e in self._buf if e.id > cutoff)
if limit is not None and limit >= 0:
if limit == 0:
# Explicit empty-slice probe — used by pagination
# UIs to ask "are there any new events?" without
# paying for the data. Distinct from limit=None
# (no cap) — return empty rather than the first event.
return []
out: list[Event] = []
for e in tail:
out.append(e)
if len(out) >= limit:
break
return out
return list(tail)
def clear(self) -> None:
with self._lock:
self._buf.clear()
# NOTE: do NOT reset _next_id — the cursor contract is that
# ids are monotonic across the lifetime of the process, even
# across explicit clears (which only happen in tests).
def _evict_locked(self) -> None:
"""Caller MUST hold self._lock."""
if not self._buf:
return
cutoff = self._now() - self._ttl_seconds
while self._buf and self._buf[0].timestamp < cutoff:
self._buf.popleft()
# max_entries bound after TTL — a long buffer that fits the
# window can still be capped if the burst rate exceeded design.
while len(self._buf) > self._max_entries:
self._buf.popleft()
class DisabledEventLog:
"""No-op backend for ``backend: disabled``.
Append returns a synthetic event so callers that want the id
don't crash; query always returns empty. The synthetic event is
NOT cached anywhere — the contract for ``backend: disabled`` is
that no state is retained. Operators who pick this backend opt
out of the canvas Activity tab and the `/activity` endpoint.
"""
def __init__(self) -> None:
self._next_id: int = 1
self._lock = threading.Lock()
def append(self, kind: str, payload: Optional[dict] = None) -> Event:
# Single-shot id increment — keeps the returned event ids
# monotonic for callers that compare them, even though we
# never persist anything.
with self._lock:
event = Event(
id=self._next_id,
timestamp=time.time(),
kind=kind,
payload=dict(payload) if payload else {},
)
self._next_id += 1
return event
def query(self, since: Optional[int] = None, limit: Optional[int] = None) -> list[Event]:
return []
def clear(self) -> None:
return None
def create_event_log(
backend: str = "memory",
ttl_seconds: int = InMemoryEventLog._DEFAULT_TTL_SECONDS,
max_entries: int = InMemoryEventLog._DEFAULT_MAX_ENTRIES,
) -> EventLogBackend:
"""Factory — pick a backend by name from EventLogConfig.
Unknown backend strings fall back to ``memory`` rather than
raising at boot. A typo'd config value should degrade to the
safe default, not crash the workspace before any event can be
recorded. The redis backend lands in a follow-up; until then
``backend: redis`` also resolves to in-memory.
"""
name = (backend or "memory").strip().lower()
if name in ("disabled", "off", "none"):
return DisabledEventLog()
# memory is the default; redis falls through here until it's wired.
return InMemoryEventLog(ttl_seconds=ttl_seconds, max_entries=max_entries)
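The cursor contract described in the module above (ids keep increasing across eviction, and a query returns only the resident tail past the cursor) can be illustrated with a small standalone sketch. This is not the module's own code: `TinyEventLog` is a hypothetical single-threaded stand-in with no locking, the `turn.finished` kind is made up for illustration, and the clock is a plain list cell so the TTL can be exercised deterministically.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class TinyEventLog:
    """Hypothetical stand-in for a bounded, TTL-evicting event log."""

    def __init__(self, ttl_seconds: float, max_entries: int,
                 now: Callable[[], float]) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._now = now
        self._next_id = 1  # never reset, so cursors stay valid
        self._buf: Deque[Tuple[int, float, str]] = deque()

    def _evict(self) -> None:
        # Drop entries older than the TTL, then enforce the size cap.
        cutoff = self._now() - self._ttl
        while self._buf and self._buf[0][1] < cutoff:
            self._buf.popleft()
        while len(self._buf) > self._max:
            self._buf.popleft()

    def append(self, kind: str) -> int:
        event_id = self._next_id
        self._next_id += 1
        self._buf.append((event_id, self._now(), kind))
        self._evict()
        return event_id

    def query(self, since: Optional[int] = None) -> List[Tuple[int, str]]:
        self._evict()  # read-side sweep, as in the module above
        cutoff = since if since is not None else 0
        return [(i, k) for (i, _, k) in self._buf if i > cutoff]


clock = [0.0]  # mutable cell used as an injected clock
log = TinyEventLog(ttl_seconds=10, max_entries=3, now=lambda: clock[0])

for kind in ("turn.started", "tool.invoked", "turn.finished",
             "peer.message.delivered"):
    log.append(kind)

# Only the newest three events are resident; id 1 was evicted by the cap.
first_poll = log.query(since=None)
cursor = first_poll[-1][0]

# Much later, one new event arrives and the old ones age out via TTL.
clock[0] = 100.0
log.append("turn.started")
second_poll = log.query(since=cursor)
print(first_poll, second_poll)
```

The reader never sees id 1, and no error is raised for it. That is the "resident tail" guarantee: the cursor stays usable, but events evicted between polls are silently skipped.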
-96
@@ -1,96 +0,0 @@
"""WebSocket subscriber for platform events.
Subscribes to the platform WebSocket with X-Workspace-ID header
so the workspace only receives events about reachable peers.
Triggers system prompt rebuild on relevant peer changes.
"""
import asyncio
import json
import logging
import httpx
logger = logging.getLogger(__name__)
# Events that should trigger a system prompt rebuild
REBUILD_EVENTS = {
"WORKSPACE_ONLINE",
"WORKSPACE_OFFLINE",
"WORKSPACE_EXPANDED",
"WORKSPACE_COLLAPSED",
"WORKSPACE_REMOVED",
"AGENT_CARD_UPDATED",
}
class PlatformEventSubscriber:
"""Subscribes to platform WebSocket for peer events."""
def __init__(
self,
platform_url: str,
workspace_id: str,
on_peer_change=None,
):
self.ws_url = platform_url.replace("http://", "ws://").replace("https://", "wss://") + "/ws"
self.workspace_id = workspace_id
self.on_peer_change = on_peer_change
self._running = False
self._reconnect_delay = 1.0
async def start(self):
"""Connect to platform WebSocket with exponential backoff reconnect."""
self._running = True
while self._running:
try:
await self._connect()
except Exception as e:
if not self._running:
break
logger.warning("WebSocket disconnected: %s. Reconnecting in %.0fs...", e, self._reconnect_delay)
await asyncio.sleep(self._reconnect_delay)
self._reconnect_delay = min(self._reconnect_delay * 2, 30.0)
async def _connect(self):
"""Establish WebSocket connection and process events."""
try:
import websockets
except ImportError:
logger.warning("websockets package not installed, skipping event subscription")
self._running = False
return
# Fix D (Cycle 5): include bearer token in WebSocket upgrade so the
# server's new auth check can validate this agent connection.
# Graceful fallback for workspaces that have no token yet.
headers = {"X-Workspace-ID": self.workspace_id}
try:
from platform_auth import auth_headers as _auth_headers
headers.update(_auth_headers())
except Exception:
pass # No token available — connect unauthenticated (grandfathered)
logger.info("Connecting to platform WebSocket: %s", self.ws_url)
async with websockets.connect(self.ws_url, additional_headers=headers) as ws:
self._reconnect_delay = 1.0 # Reset on successful connect
logger.info("Platform WebSocket connected")
async for message in ws:
try:
event = json.loads(message)
event_type = event.get("event", "")
if event_type in REBUILD_EVENTS:
logger.info("Peer event: %s for workspace %s",
event_type, event.get("workspace_id", ""))
if self.on_peer_change:
await self.on_peer_change(event)
except json.JSONDecodeError:
continue
except Exception as e:
logger.warning("Error processing event: %s", e)
def stop(self):
self._running = False
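The reconnect policy in `PlatformEventSubscriber.start` sleeps for the current delay, then doubles it up to a 30-second cap, and `_connect` resets it to 1.0 after a successful connect. A minimal sketch of the resulting delay schedule follows, assuming no successful connect in between; `backoff_delays` and its parameter names are hypothetical and not part of the module.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0) -> Iterator[float]:
    """Yield successive reconnect delays: start at `initial`, multiply by
    `factor` after each attempt, and never exceed `cap`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


# First seven delays when every connection attempt fails.
schedule = list(islice(backoff_delays(), 7))
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With these values the subscriber reaches the cap on the sixth consecutive failure and then retries every 30 seconds.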
File diff suppressed because it is too large
-706
@@ -1,706 +0,0 @@
"""Heartbeat loop — alive signal + delegation status checker.
Every 30 seconds:
1. Send heartbeat to platform (alive signal with current_task, error_rate)
2. Check pending delegations — any results back?
3. Store completed delegation results for the agent to pick up
Resilient: recreates HTTP client on failure, auto-restarts on crash.
"""
import asyncio
import json
import logging
import os
import time
from pathlib import Path
import httpx
from platform_auth import auth_headers, refresh_cache, self_source_headers
def _runtime_state_payload() -> dict:
"""Build the {runtime_state, sample_error} portion of the heartbeat
body when SOME adapter executor has marked itself wedged. Returns
an empty dict when the runtime is healthy so the heartbeat payload
doesn't grow fields the platform doesn't need.
Source of truth is runtime_wedge (lives in molecule-runtime,
independent of any specific adapter). Pre task #87 this imported
from claude_sdk_executor — that worked because the executor was
bundled into molecule-runtime, but blocked moving it to the
claude-code template repo. The runtime_wedge module is now the
cross-cutting wedge-state holder; adapters mark/clear via it,
heartbeat reads it.
Imported lazily so a workspace whose runtime image somehow ships
without runtime_wedge (corrupt install, mid-rolling-deploy state)
keeps heartbeating — a missing import means "no wedge info; assume
healthy."
"""
try:
from runtime_wedge import is_wedged, wedge_reason
except Exception:
return {}
if not is_wedged():
return {}
return {
"runtime_state": "wedged",
# sample_error doubles as the human-readable banner text on the
# canvas's degraded card — keep it short and actionable.
"sample_error": wedge_reason(),
}
def _runtime_metadata_payload() -> dict:
"""Build the {runtime_metadata} portion of the heartbeat body —
adapter-declared capabilities + per-capability override values
(idle timeout, etc.). The platform reads this to route capabilities
to the right owner: native (adapter) vs fallback (platform).
Returns an empty dict if the adapter can't be loaded or introspected.
Heartbeat must NEVER fail because of capability discovery — observability
is more important than capability accuracy. The platform falls through
to its own defaults when fields are missing.
See project memory `project_runtime_native_pluggable.md` and
workspace/adapter_base.py:RuntimeCapabilities.
"""
try:
from adapters import get_adapter
# ADAPTER_MODULE wins over the runtime arg in get_adapter — pass
# an empty string to force the env-var path.
adapter_cls = get_adapter("")
adapter = adapter_cls()
caps = adapter.capabilities()
meta: dict = {"capabilities": caps.to_dict()}
idle = adapter.idle_timeout_override()
# Only include the override when it's a positive integer. None /
# zero / negative falls through to the platform's global default
# (env A2A_IDLE_TIMEOUT_SECONDS, default 5min) — that "absent
# field = use default" contract is what keeps the wire small.
if isinstance(idle, int) and idle > 0:
meta["idle_timeout_seconds"] = idle
return {"runtime_metadata": meta}
except Exception as e:
# debug-level: missing ADAPTER_MODULE in dev / test envs is normal
logger.debug("runtime_metadata: failed to read adapter caps: %s", e)
return {}
logger = logging.getLogger(__name__)
def _persist_inbound_secret_from_heartbeat(resp) -> None:
"""Persist ``platform_inbound_secret`` from a heartbeat response, if any.
The platform's heartbeat handler (workspace-server PR #2421) returns
the secret on every beat — mirrors /registry/register so a workspace
whose secret was lazy-healed on the platform side picks it up within
one heartbeat tick instead of requiring a runtime restart.
Without this delivery path the chat-upload code path's "secret was
just minted, will pick up on next heartbeat" 503 message is a lie
and the workspace stays 401-forever until the operator restarts the
runtime. Caught 2026-04-30 on the hongmingwang tenant — the
standalone wrapper (mcp_cli.py) got the same change in #2421 but
the in-container heartbeat (this file) was missed in the first
pass.
Failure is non-fatal: if the body isn't JSON, doesn't carry the
field, or the disk write fails, the next heartbeat retries. This
matches the cold-start register flow in main.py:319-323.
"""
try:
body = resp.json()
except Exception:
return
if not isinstance(body, dict):
return
secret = body.get("platform_inbound_secret")
if not secret:
return
try:
from platform_inbound_auth import save_inbound_secret
save_inbound_secret(secret)
except Exception as exc:
logger.warning(
"heartbeat: persist inbound secret failed: %s", exc
)
HEARTBEAT_INTERVAL = 30 # seconds — fallback default when no per-instance value is passed
MAX_CONSECUTIVE_FAILURES = 10
MAX_SEEN_DELEGATION_IDS = 200
SELF_MESSAGE_COOLDOWN = 60 # seconds — minimum between self-messages to prevent loops
# Shared path — adapter executors (in their template repos) read this
# same file via executor_helpers.read_delegation_results so heartbeat-
# delivered async delegation results land in the next agent turn.
DELEGATION_RESULTS_FILE = os.environ.get("DELEGATION_RESULTS_FILE", "/tmp/delegation_results.jsonl")
# Cursor file for tracking activity_log IDs processed from the a2a_receive path
# (delegations fired via tool_delegate_task → POST /workspaces/:id/a2a proxy, not
# POST /workspaces/:id/delegate). Persisted to disk so heartbeat restarts
# don't re-process the same rows.
_ACTIVITY_DELEGATION_CURSOR_FILE = os.environ.get(
"DELEGATION_ACTIVITY_CURSOR_FILE",
"/tmp/delegation_activity_cursor",
)
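`MAX_SEEN_DELEGATION_IDS` caps a set of already-processed IDs. A plain Python `set` has no ordering, so trimming it by slicing cannot reliably keep the newest entries. One possible alternative, sketched below under that assumption, is an insertion-ordered `dict` used as a set so that the oldest half is the one dropped. `SeenIds` and `max_size` are hypothetical names, not part of this module.

```python
from typing import Dict


class SeenIds:
    """Hypothetical insertion-ordered 'seen' set with oldest-first eviction."""

    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        # dict preserves insertion order (guaranteed since Python 3.7).
        self._ids: Dict[str, None] = {}

    def add(self, item: str) -> None:
        # Re-adding an existing ID keeps its original position.
        self._ids.setdefault(item, None)
        if len(self._ids) > self._max_size:
            # Drop the oldest entries, keeping the newest ones.
            keep = list(self._ids)[self._max_size // 2:]
            self._ids = dict.fromkeys(keep)

    def __contains__(self, item: str) -> bool:
        return item in self._ids

    def __len__(self) -> int:
        return len(self._ids)


seen = SeenIds(max_size=4)
for n in range(5):
    seen.add(f"d{n}")

print(sorted(f"d{n}" for n in range(5) if f"d{n}" in seen))
```

The slice offset mirrors the `MAX_SEEN_DELEGATION_IDS // 2` used by the heartbeat loop; only the container differs.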
class HeartbeatLoop:
def __init__(
self,
platform_url: str,
workspace_id: str,
interval_seconds: int = HEARTBEAT_INTERVAL,
):
self.platform_url = platform_url
self.workspace_id = workspace_id
# Per-instance interval — main.py threads ObservabilityConfig.
# heartbeat_interval_seconds (clamped to [5, 300] at parse time)
# in here so operators can tune cadence per-workspace via the
# `observability:` block in config.yaml. Defaults to the
# legacy module constant so callers that haven't been updated
# yet (and tests that construct HeartbeatLoop directly with the
# 2-arg signature) keep their existing 30s behavior.
self._interval_seconds = interval_seconds
self.start_time = time.time()
self.error_count = 0
self.request_count = 0
self.active_tasks = 0
self.current_task = ""
self.sample_error = ""
self._task = None
self._consecutive_failures = 0
self._seen_delegation_ids: set[str] = set()
self._last_self_message_time = 0.0
self._parent_name: str | None = None # Cached after first lookup
# Seen activity IDs for a2a_receive polling (delegations via POST /a2a proxy path).
# Loaded lazily from cursor file on first poll to avoid blocking startup.
self._seen_activity_ids: set[str] = set()
self._activity_cursor_loaded = False
@property
def error_rate(self) -> float:
if self.request_count == 0:
return 0.0
return self.error_count / self.request_count
def record_error(self, error: str):
self.error_count += 1
self.request_count += 1
self.sample_error = error
def record_success(self):
self.request_count += 1
def start(self):
self._task = asyncio.create_task(self._loop())
self._task.add_done_callback(self._on_done)
def _on_done(self, task):
if not task.cancelled() and task.exception():
logger.error("Heartbeat loop died: %s — restarting", task.exception())
self._task = asyncio.create_task(self._loop())
self._task.add_done_callback(self._on_done)
async def stop(self):
if self._task:
self._task.cancel()
try:
await self._task
except asyncio.CancelledError:
pass
async def _loop(self):
while True:
client = None
try:
client = httpx.AsyncClient(timeout=10.0)
while True:
# 1. Send heartbeat (Phase 30.1: include auth header if token known)
try:
body = {
"workspace_id": self.workspace_id,
"error_rate": self.error_rate,
"sample_error": self.sample_error,
"active_tasks": self.active_tasks,
"current_task": self.current_task,
"uptime_seconds": int(time.time() - self.start_time),
}
# Layer the runtime-wedge fields on top so a
# non-empty sample_error from the wedge wins
# over the (typically empty) heartbeat
# sample_error field. The platform reads
# runtime_state to flip status → degraded.
body.update(_runtime_state_payload())
body.update(_runtime_metadata_payload())
resp = await client.post(
f"{self.platform_url}/registry/heartbeat",
json=body,
headers=auth_headers(),
)
self.error_count = 0
self.request_count = 0
self._consecutive_failures = 0
# 2026-04-30: persist the platform_inbound_secret
# if the heartbeat response carries one. Mirrors
# the cold-start register flow in main.py:319-323
# and closes the recovery path for workspaces
# whose secret was lazy-healed on the platform
# side after register-time. Without this, the
# workspace stays 401-forever on chat upload
# until restart. See workspace-server PR #2421
# for the server-side delivery change.
_persist_inbound_secret_from_heartbeat(resp)
except Exception as e:
self._consecutive_failures += 1
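# Issue #1877: if heartbeat 401'd, re-read the token from disk
# and retry once. This handles the platform's token-rotation race
# where WriteFilesToContainer hasn't finished writing the new
# token before the runtime boots and caches the old value.
# NOTE: httpx raises HTTPStatusError only from
# Response.raise_for_status(), which the POST above never calls,
# so a plain 401 response does not reach this branch as written.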
is_401 = False
if isinstance(e, httpx.HTTPStatusError) and e.response.status_code == 401:
is_401 = True
if is_401:
logger.warning("Heartbeat 401 for %s — refreshing token cache and retrying once", self.workspace_id)
refresh_cache()
try:
retry_body = {
"workspace_id": self.workspace_id,
"error_rate": self.error_rate,
"sample_error": self.sample_error,
"active_tasks": self.active_tasks,
"current_task": self.current_task,
"uptime_seconds": int(time.time() - self.start_time),
}
retry_body.update(_runtime_state_payload())
retry_resp = await client.post(
f"{self.platform_url}/registry/heartbeat",
json=retry_body,
headers=auth_headers(),
)
self._consecutive_failures = 0
self.request_count += 1
_persist_inbound_secret_from_heartbeat(retry_resp)
except Exception:
# Retry also failed — fall through to the normal
# failure tracking below.
pass
if self._consecutive_failures <= 3 or self._consecutive_failures % MAX_CONSECUTIVE_FAILURES == 0:
logger.warning("Heartbeat failed (%d consecutive): %s", self._consecutive_failures, e)
if self._consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
logger.info("Heartbeat: recreating HTTP client after %d failures", self._consecutive_failures)
try:
await client.aclose()
except Exception:
pass
break
# 2. Check delegation status
try:
await self._check_delegations(client)
except Exception as e:
logger.debug("Delegation check failed: %s", e)
# 3. Check activity_logs for delegation results that arrived via
# the POST /a2a proxy path (tool_delegate_task → send_a2a_message).
# These are NOT written to the delegations table, so
# _check_delegations misses them. See issue #354.
try:
await self._check_activity_delegations(client)
except Exception as e:
logger.debug("Activity delegation check failed: %s", e)
await asyncio.sleep(self._interval_seconds)
except asyncio.CancelledError:
raise
except Exception as e:
logger.error(
"Heartbeat loop error: %s — retrying in %ds", e, self._interval_seconds
)
await asyncio.sleep(self._interval_seconds)
finally:
if client:
try:
await client.aclose()
except Exception:
pass
async def _check_delegations(self, client: httpx.AsyncClient):
"""Check for completed delegations and store results for the agent."""
try:
resp = await client.get(
f"{self.platform_url}/workspaces/{self.workspace_id}/delegations",
headers=auth_headers(),
)
if resp.status_code != 200:
return
delegations = resp.json()
if not isinstance(delegations, list):
return
new_results = []
for d in delegations:
did = d.get("delegation_id", "")
status = d.get("status", "")
if not did or did in self._seen_delegation_ids:
continue
if status in ("completed", "failed"):
# Fix B (Cycle 5): validate source_id before accepting delegation
# results. Only process delegations that THIS workspace created
# (source_id == self.workspace_id). Attacker-crafted delegation
# records with a foreign source_id cannot inject instructions.
source_id = d.get("source_id", "")
if source_id != self.workspace_id:
logger.warning(
"Heartbeat: skipping delegation %s — source_id %r does not "
"match this workspace %r; possible injection attempt",
did, source_id, self.workspace_id,
)
self._seen_delegation_ids.add(did) # mark seen so we don't warn again
continue
self._seen_delegation_ids.add(did)
new_results.append({
"delegation_id": did,
"target_id": d.get("target_id", ""),
"source_id": source_id,
"status": status,
"summary": d.get("summary", ""),
"response_preview": d.get("response_preview", ""),
"error": d.get("error", ""),
"timestamp": time.time(),
})
# Evict old seen IDs if over limit
if len(self._seen_delegation_ids) > MAX_SEEN_DELEGATION_IDS:
# Keep half. NOTE: sets are unordered, so the retained half is
# arbitrary, not necessarily the most recent IDs.
self._seen_delegation_ids = set(list(self._seen_delegation_ids)[MAX_SEEN_DELEGATION_IDS // 2:])
if new_results:
# Append to results file for context injection on next message
with open(DELEGATION_RESULTS_FILE, "a") as f:
for r in new_results:
f.write(json.dumps(r) + "\n")
logger.info("Heartbeat: %d new delegation results — triggering self-message", len(new_results))
# Build a summary message for the agent.
# Fix B (Cycle 5): do NOT embed raw response_preview text in
# user-role A2A messages — that is the prompt-injection vector.
# Instead reference only the delegation ID and status; the agent
# reads full content from DELEGATION_RESULTS_FILE which was
# written above from trusted platform data.
summary_lines = []
for r in new_results:
line = f"- [{r['status']}] Delegation {r['delegation_id'][:8]}: {r['summary'][:80]}"
if r.get("error"):
line += f"\n Error: {r['error'][:100]}"
summary_lines.append(line)
# Look up parent workspace (cached after first call)
if self._parent_name is None:
try:
parent_resp = await client.get(
f"{self.platform_url}/workspaces/{self.workspace_id}",
headers=auth_headers(),
)
if parent_resp.status_code == 200:
parent_id = parent_resp.json().get("parent_id", "")
if parent_id:
parent_info = await client.get(
f"{self.platform_url}/workspaces/{parent_id}",
headers=auth_headers(),
)
if parent_info.status_code == 200:
self._parent_name = parent_info.json().get("name", "")
if self._parent_name is None:
self._parent_name = "" # No parent — cache empty
except Exception:
pass # Will retry next cycle
parent_name = self._parent_name or ""
report_instruction = ""
if parent_name:
report_instruction = (
f"\n\nIMPORTANT: Report these results back to your parent '{parent_name}' "
f"by delegating a summary to them. Use delegate_task or delegate_task_async "
f"with a concise status report. Also use send_message_to_user to notify the user."
)
else:
report_instruction = (
"\n\nReport results using send_message_to_user to notify the user."
)
trigger_msg = (
"Delegation results are ready. Review them and take appropriate action:\n"
+ "\n".join(summary_lines)
+ report_instruction
)
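# Send A2A self-message to wake the agent, at most once per
# SELF_MESSAGE_COOLDOWN seconds to prevent loops. Results that land
# during the cooldown are already marked seen and written to
# DELEGATION_RESULTS_FILE, so they get no wake-up message of their
# own; the agent reads them from the file on its next turn.
now = time.time()
if now - self._last_self_message_time < SELF_MESSAGE_COOLDOWN:
logger.debug("Heartbeat: self-message cooldown (%ds) active, skipping wake-up", SELF_MESSAGE_COOLDOWN)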
else:
self._last_self_message_time = now
try:
# self_source_headers() adds X-Workspace-ID so the
# platform tags this row source=agent, not canvas
# — see platform_auth.py for the full rationale.
await client.post(
f"{self.platform_url}/workspaces/{self.workspace_id}/a2a",
json={
"method": "message/send",
"params": {
"message": {
"role": "user",
"parts": [{"type": "text", "text": trigger_msg}],
},
},
},
headers=self_source_headers(self.workspace_id),
timeout=120.0,
)
logger.info("Heartbeat: self-message sent to process delegation results")
except Exception as e:
logger.warning("Heartbeat: failed to send self-message: %s", e)
# Also push notification to user via canvas
for r in new_results:
try:
msg = f"Delegation {r['status']}: {r['summary'][:100]}"
if r.get("response_preview"):
msg += f"\nResult: {r['response_preview'][:200]}"
await client.post(
f"{self.platform_url}/workspaces/{self.workspace_id}/notify",
json={"message": msg, "type": "delegation_result"},
headers=auth_headers(),
)
except Exception:
pass
except Exception as e:
logger.debug("Delegation check error: %s", e)
async def _check_activity_delegations(self, client: httpx.AsyncClient):
"""Poll activity_logs for delegation results that arrived via the POST /a2a proxy path.
tool_delegate_task → send_a2a_message → POST /workspaces/:id/a2a (proxy)
logs to activity_logs but NOT the delegations table. _check_delegations
only checks the delegations table, so these results are invisible to the
heartbeat — the agent never wakes up to consume them (issue #354).
This method closes that gap: polls GET /workspaces/:id/activity?type=a2a_receive,
filters for rows from peer workspaces (source_id != "" and != self.workspace_id),
tracks seen IDs with a cursor file, and sends a self-message to wake the agent.
"""
try:
# Load cursor lazily on first call so startup is not blocked by disk I/O.
if not self._activity_cursor_loaded:
self._activity_cursor_loaded = True
try:
if os.path.exists(_ACTIVITY_DELEGATION_CURSOR_FILE):
cursor = Path(_ACTIVITY_DELEGATION_CURSOR_FILE).read_text().strip()
if cursor:
self._seen_activity_ids = set(cursor.split(","))
except Exception:
pass # Corrupt cursor — start fresh
params: dict[str, str] = {"type": "a2a_receive"}
resp = await client.get(
f"{self.platform_url}/workspaces/{self.workspace_id}/activity",
params=params,
headers=auth_headers(),
)
if resp.status_code != 200:
return
rows = resp.json()
if not isinstance(rows, list):
return
# Activity API returns newest-first; process in reverse order so
# we advance the cursor monotonically (oldest → newest).
rows = list(reversed(rows))
new_results: list[dict] = []
last_id: str | None = None
for row in rows:
if not isinstance(row, dict):
continue
activity_id = str(row.get("id", ""))
if not activity_id:
continue
last_id = activity_id
if activity_id in self._seen_activity_ids:
continue
# Filter: must have a non-empty source_id that is NOT this workspace
# (peer agent messages only; skip canvas-user messages and self-notify).
source_id = row.get("source_id") or ""
if not source_id or source_id == self.workspace_id:
continue
self._seen_activity_ids.add(activity_id)
summary = row.get("summary") or ""
# Extract response text from request_body if available.
# Shape mirrors inbox._extract_text: walk parts for "text" field.
response_text = summary
request_body = row.get("request_body")
if isinstance(request_body, dict):
params_obj = request_body.get("params")
if isinstance(params_obj, dict):
msg = params_obj.get("message")
if isinstance(msg, dict):
parts = msg.get("parts") or []
texts = []
for p in (parts if isinstance(parts, list) else []):
if isinstance(p, dict) and (p.get("kind") == "text" or p.get("type") == "text"):
t = p.get("text", "")
if t:
texts.append(t)
if texts:
response_text = " ".join(texts)
new_results.append({
"delegation_id": activity_id, # Use activity ID as pseudo-delegation ID
"target_id": source_id,
"source_id": self.workspace_id,
"status": "completed",
"summary": summary,
"response_preview": response_text[:4096],
"error": "",
"timestamp": time.time(),
})
if not new_results:
return
# Persist cursor so restarts don't re-process these rows.
if last_id:
try:
with open(_ACTIVITY_DELEGATION_CURSOR_FILE, "w") as f:
# Keep cursor as comma-joined IDs; truncate if over 100KB.
cursor_str = ",".join(sorted(self._seen_activity_ids))
if len(cursor_str) > 102_400:
# Evict oldest half when cursor file grows too large.
sorted_ids = sorted(self._seen_activity_ids)
self._seen_activity_ids = set(sorted_ids[len(sorted_ids) // 2:])
cursor_str = ",".join(sorted(self._seen_activity_ids))
f.write(cursor_str)
except Exception:
pass # Non-fatal; next cycle will retry
# Append to results file and trigger self-message (mirrors _check_delegations).
with open(DELEGATION_RESULTS_FILE, "a") as f:
for r in new_results:
f.write(json.dumps(r) + "\n")
logger.info(
"Heartbeat: %d new a2a_receive delegation results from activity_logs — "
"triggering self-message",
len(new_results),
)
# Build and send self-message to wake the agent.
summary_lines = []
for r in new_results:
line = f"- [completed] Peer response from {r['target_id'][:8]}: {r['summary'][:80] or '(no summary)'}"
if r.get("error"):
line += f"\n Error: {r['error'][:100]}"
summary_lines.append(line)
# Look up parent name (reuse cached value from _check_delegations if set).
if self._parent_name is None:
try:
parent_resp = await client.get(
f"{self.platform_url}/workspaces/{self.workspace_id}",
headers=auth_headers(),
)
if parent_resp.status_code == 200:
parent_id = parent_resp.json().get("parent_id", "")
if parent_id:
parent_info = await client.get(
f"{self.platform_url}/workspaces/{parent_id}",
headers=auth_headers(),
)
if parent_info.status_code == 200:
self._parent_name = parent_info.json().get("name", "")
if self._parent_name is None:
self._parent_name = ""
except Exception:
self._parent_name = ""
parent_name = self._parent_name or ""
report_instruction = ""
if parent_name:
report_instruction = (
f"\n\nIMPORTANT: Delegate a summary of these results to your parent "
f"'{parent_name}' using delegate_task. Also use send_message_to_user "
f"to notify the user."
)
else:
report_instruction = (
"\n\nReport results using send_message_to_user to notify the user."
)
trigger_msg = (
"Delegation results are ready (from a2a_receive via activity_logs). "
"Review them and take appropriate action:\n"
+ "\n".join(summary_lines)
+ report_instruction
)
now = time.time()
if now - self._last_self_message_time < SELF_MESSAGE_COOLDOWN:
logger.debug(
"Heartbeat: self-message cooldown active; "
"a2a_receive results will be retried next cycle"
)
else:
self._last_self_message_time = now
try:
await client.post(
f"{self.platform_url}/workspaces/{self.workspace_id}/a2a",
json={
"method": "message/send",
"params": {
"message": {
"role": "user",
"parts": [{"type": "text", "text": trigger_msg}],
},
},
},
headers=self_source_headers(self.workspace_id),
timeout=120.0,
)
logger.info("Heartbeat: a2a_receive self-message sent")
except Exception as e:
logger.warning("Heartbeat: failed to send a2a_receive self-message: %s", e)
# Also notify the user via canvas.
for r in new_results:
try:
msg = f"Delegation completed: {r['summary'][:100] or '(no summary)'}"
preview = r.get("response_preview", "")
if preview:
msg += f"\nResult: {preview[:200]}"
await client.post(
f"{self.platform_url}/workspaces/{self.workspace_id}/notify",
json={"message": msg, "type": "delegation_result"},
headers=auth_headers(),
)
except Exception:
pass
except Exception as e:
logger.debug("Activity delegation check error: %s", e)
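The loop earlier in this file that collects text parts from ``request_body`` can be isolated into a small helper. What follows is a minimal sketch, not the project's code: ``extract_part_text`` is a hypothetical name and the sample payload is invented for illustration. It shows the ``kind``/``type`` check grouped in parentheses so that the ``isinstance`` test guards both lookups and a non-dict part is skipped rather than raising ``AttributeError``.

```python
from typing import Any


def extract_part_text(parts: Any, fallback: str = "") -> str:
    """Join the text of A2A message parts, accepting either discriminator.

    Hypothetical helper for illustration only. Non-list input and
    non-dict parts are ignored instead of raising.
    """
    if not isinstance(parts, list):
        return fallback
    texts = []
    for part in parts:
        # Parenthesise the `or` so the dict check guards both lookups.
        if isinstance(part, dict) and (
            part.get("kind") == "text" or part.get("type") == "text"
        ):
            text = part.get("text", "")
            if text:
                texts.append(text)
    return " ".join(texts) if texts else fallback


# Invented sample payload: two text parts, one stray string, one file part.
sample_parts = [
    {"kind": "text", "text": "build finished"},
    "not-a-dict",
    {"type": "text", "text": "3 tests passed"},
    {"kind": "file", "uri": "workspace:report.txt"},
]
print(extract_part_text(sample_parts, fallback="(no summary)"))
```

Returning the fallback when no text part is found matches the behaviour of the original loop, which keeps ``summary`` as the response text in that case.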
-807
@@ -1,807 +0,0 @@
"""In-memory inbox + background poller for the standalone molecule-mcp path.
Purpose
-------
The universal MCP server (a2a_mcp_server.py) is OUTBOUND-ONLY by default —
it gives an MCP-aware agent the same A2A delegation, peer-discovery, and
memory tools that container-bound runtimes already have. There is no
inbound delivery path: when the canvas user types a message or a peer
sends an A2A request, the activity lands on the platform but the
standalone agent never sees it.
This module closes that gap WITHOUT requiring a tunnel or a public agent
URL. A daemon thread polls ``/workspaces/:id/activity?type=a2a_receive``
on the platform and stages new rows in an in-memory deque. Three new MCP
tools (``inbox_peek``, ``inbox_pop``, ``wait_for_message``) let the
agent observe the queue.
Why a poller (not push)
-----------------------
runtime=external workspaces have ``delivery_mode="poll"`` — the platform
records inbound A2A in ``activity_logs`` but does not call back to the
agent. A poller is the only inbound surface that works without the
operator exposing a public URL through a tunnel. 5s cadence matches
the molecule-mcp-claude-channel plugin's POLL_INTERVAL — it's already
proven on staging for the channel-based delivery path.
Cursor model
------------
``activity_logs.id`` is the cursor (server-assigned, monotonic). We
persist it to ``${CONFIGS_DIR}/.mcp_inbox_cursor`` so an agent restart
doesn't replay the last 10 minutes of inbound traffic and re-act on
already-handled messages. On 410 (cursor pruned) we drop back to
``since_secs=600`` for a bounded backlog and let the cursor advance
naturally from there.
Scope
-----
Standalone molecule-mcp ONLY. The in-container runtime has its own
push delivery (main.py + canvas WebSocket); we never want both
running at once or a single message would be delivered twice. The
caller (mcp_cli.main) gates activation explicitly via
``activate(state)``; in-container code that imports this module by
accident gets a no-op until activate is called.
"""
from __future__ import annotations
import json
import logging
import os
import threading
import time
from collections import deque
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable
import configs_dir
logger = logging.getLogger(__name__)
# Poll cadence. 5s mirrors the molecule-mcp-claude-channel plugin's
# proven default — fast enough that a canvas user typing "are you
# there?" gets picked up before they refresh, slow enough that 12
# requests/min won't trip rate limits or wake mobile devices.
POLL_INTERVAL_SECONDS = 5.0
# Initial backlog window for the first poll AND the recovery path
# after a stale-cursor 410. 10 minutes is enough to cover a brief
# crash/restart without flooding a long-idle workspace with hours of
# stale chat.
INITIAL_BACKLOG_SECONDS = 600
# Hard cap on the in-memory deque. The poller is bounded by the
# server's per-page limit (default 100) and the agent typically pops
# faster than the operator types, so an idle workspace shouldn't
# exceed a handful. The cap protects against runaway growth if the
# agent process stops calling pop.
MAX_QUEUED_MESSAGES = 200
@dataclass
class InboxMessage:
"""One inbound A2A message staged for the agent.
Mirrors the shape the agent sees via inbox_peek / wait_for_message.
Fields are derived from the activity_logs row by ``_from_activity``.
"""
activity_id: str
text: str
peer_id: str # empty string = canvas user; non-empty = peer workspace_id
method: str # JSON-RPC method ("message/send", "tasks/send", etc.)
created_at: str # RFC3339 timestamp from the activity row
# Which OF MY workspaces did this message arrive on. Only meaningful
# for the multi-workspace external agent (one process registered
# against multiple workspaces). Empty string = single-workspace
# path / pre-multi-workspace caller — back-compat with consumers
# that don't set it. Tools like send_message_to_user use this to
# know which workspace's identity to reply with.
arrival_workspace_id: str = ""
def to_dict(self) -> dict[str, Any]:
# Task #190 / #193 — Distinguish delegation-result rows from peer-agent
# messages. The platform's pushDelegationResultToInbox (RFC #2829 PR-2)
# writes activity_type='a2a_receive' with method='delegate_result' and
# source_id=our own workspace UUID, so the caller's inbox poller can
# surface delegation completions/failures via wait_for_message. But
# the default to_dict derives kind="peer_agent" purely from peer_id
# being non-empty — which makes a synchronous-delegation timeout, or
# a cross-workspace ProxyA2A failure, appear to the agent as a NEW
# peer_agent message from our own workspace UUID (#190 self-echo).
#
# Explicitly classify rows with method='delegate_result' as
# kind='delegation_result' regardless of peer_id, so:
# 1. wait_for_message gives the original caller a structured
# delegation result (not a fake peer instruction).
# 2. Agents reading the envelope don't mistake the row for a
# peer instructing them — preventing the #190 reply-via-
# delegate_task-to-self loop.
if self.method == "delegate_result":
kind = "delegation_result"
elif self.peer_id:
kind = "peer_agent"
else:
kind = "canvas_user"
d = {
"activity_id": self.activity_id,
"text": self.text,
"peer_id": self.peer_id,
"kind": kind,
"method": self.method,
"created_at": self.created_at,
}
# Only surface arrival_workspace_id when it's set, so single-
# workspace consumers don't see a new key in their existing
# output.
if self.arrival_workspace_id:
d["arrival_workspace_id"] = self.arrival_workspace_id
return d
@dataclass
class InboxState:
"""Thread-safe queue of pending inbound messages.
Producer: the poller thread(s), calling ``record(message)``. Consumers:
the MCP tool handlers, calling ``peek``, ``pop``, or ``wait``.
Synchronization is via a single ``threading.Lock`` (cheap — every
operation is O(n) over a small deque) plus an ``Event`` that wakes
``wait`` callers when a new message lands.
Cursors are per-workspace. Single-workspace operators construct with
``InboxState(cursor_path=...)`` (back-compat — the path becomes the
cursor file for the empty-string workspace_id key). Multi-workspace
operators construct with ``InboxState(cursor_paths={wsid: path,...})``
so each poller advances its own cursor independently — one
workspace's slow poll can't stall another's, and a 410 on one cursor
only resets that one.
"""
cursor_path: Path | None = None
"""Single-workspace cursor file. Sets ``cursor_paths[""]`` if
``cursor_paths`` not also supplied. Kept on the dataclass for
back-compat — existing callers pass ``cursor_path=`` positionally."""
cursor_paths: dict[str, Path] = field(default_factory=dict)
"""Per-workspace cursor files keyed by workspace_id. Multi-workspace
pollers each own their own row here."""
_queue: deque[InboxMessage] = field(default_factory=lambda: deque(maxlen=MAX_QUEUED_MESSAGES))
_lock: threading.Lock = field(default_factory=threading.Lock)
_arrival: threading.Event = field(default_factory=threading.Event)
_cursors: dict[str, str | None] = field(default_factory=dict)
_cursors_loaded: dict[str, bool] = field(default_factory=dict)
def __post_init__(self) -> None:
# Back-compat: single-workspace constructor passes
# cursor_path=Path(...). Promote it into the dict under the
# empty-string key so the lookup APIs are uniform.
if self.cursor_path is not None and "" not in self.cursor_paths:
self.cursor_paths[""] = self.cursor_path
def _path_for(self, workspace_id: str) -> Path | None:
"""Resolve the cursor path for a workspace_id key, or None."""
return self.cursor_paths.get(workspace_id or "")
def load_cursor(self, workspace_id: str = "") -> str | None:
"""Read the persisted cursor from disk. Cached after first call.
Missing/unreadable file → None (poller will fall back to the
initial-backlog window). We never raise: a corrupt cursor is
less bad than the inbox refusing to start.
``workspace_id=""`` is the single-workspace path, untouched.
"""
path = self._path_for(workspace_id)
with self._lock:
if self._cursors_loaded.get(workspace_id):
return self._cursors.get(workspace_id)
cursor: str | None = None
if path is not None:
try:
if path.is_file():
cursor = path.read_text().strip() or None
except OSError as exc:
logger.warning("inbox: failed to read cursor %s: %s", path, exc)
cursor = None
self._cursors[workspace_id] = cursor
self._cursors_loaded[workspace_id] = True
return cursor
def save_cursor(self, activity_id: str, workspace_id: str = "") -> None:
"""Persist the cursor. Best-effort — log + continue on failure.
Loss of the cursor on a write failure means an extra page of
backlog after restart, never a stuck poller. Silent-fail
would mask a permission misconfiguration on the operator's
configs dir; warn loudly so they can fix it.
"""
path = self._path_for(workspace_id)
with self._lock:
self._cursors[workspace_id] = activity_id
self._cursors_loaded[workspace_id] = True
if path is None:
return
try:
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
tmp.write_text(activity_id)
tmp.replace(path)
except OSError as exc:
logger.warning("inbox: failed to persist cursor to %s: %s", path, exc)
def reset_cursor(self, workspace_id: str = "") -> None:
"""Forget the cursor. Used after a 410 from the activity API."""
path = self._path_for(workspace_id)
with self._lock:
self._cursors[workspace_id] = None
self._cursors_loaded[workspace_id] = True
if path is None:
return
try:
if path.is_file():
path.unlink()
except OSError as exc:
logger.warning("inbox: failed to delete cursor %s: %s", path, exc)
def record(self, message: InboxMessage) -> None:
"""Append a message, wake any waiter, and fire the notification
callback (if registered) for push-UX-capable hosts.
Skips a row whose activity_id we've already queued — defensive
against the poller racing with the consumer + cursor save. The
dedupe short-circuits BEFORE the notification fires, so a
notification-capable host doesn't see duplicate push events on
backlog overlap.
"""
with self._lock:
for existing in self._queue:
if existing.activity_id == message.activity_id:
return
self._queue.append(message)
self._arrival.set()
# Fire notification AFTER releasing the lock so the callback
# is free to do anything (including calling back into inbox)
# without deadlock. Best-effort: a raising callback must not
# prevent the message from landing in the queue — observability
# is more important than push delivery.
cb = _NOTIFICATION_CALLBACK
if cb is not None:
try:
cb(message.to_dict())
except Exception:
logger.warning(
"inbox: notification callback raised", exc_info=True
)
def peek(self, limit: int = 10) -> list[InboxMessage]:
"""Return up to ``limit`` pending messages without removing them."""
if limit <= 0:
limit = 10
with self._lock:
return list(self._queue)[:limit]
def pop(self, activity_id: str) -> InboxMessage | None:
"""Remove a specific message. Idempotent; returns None if absent.
We require the caller to specify which message it handled
rather than auto-popping the head — preserves observability
when the agent reads several but only handles one.
"""
with self._lock:
for existing in list(self._queue):
if existing.activity_id == activity_id:
self._queue.remove(existing)
if not self._queue:
self._arrival.clear()
return existing
return None
def wait(self, timeout_secs: float) -> InboxMessage | None:
"""Block until a message is available or timeout elapses.
Returns the head message WITHOUT popping; the caller decides
whether to pop after acting on it. Same shape as Python's
Queue.get with timeout, but non-destructive so a peek-style
agent can still inspect with peek/pop.
"""
# Fast path: queue already has something.
with self._lock:
if self._queue:
return self._queue[0]
self._arrival.clear()
triggered = self._arrival.wait(timeout=max(0.0, timeout_secs))
if not triggered:
return None
with self._lock:
return self._queue[0] if self._queue else None
# ---------------------------------------------------------------------------
# Module singleton — set by mcp_cli before MCP server starts.
# ---------------------------------------------------------------------------
#
# In-container callers don't activate; the inbox tools detect the
# unset singleton and return an informational error rather than
# breaking the dispatch path.
_STATE: InboxState | None = None
# Notification bridge — set by the universal MCP server (a2a_mcp_server.py)
# at startup so that new inbox arrivals can be pushed to notification-
# capable hosts (Claude Code) as MCP `notifications/claude/channel`
# events. Kept module-level (rather than a method on InboxState) so the
# inbox doesn't need to know about MCP — a thin pluggable seam.
#
# Defaults to None: in-container runtimes that don't activate the inbox
# also don't push notifications, and tests start clean. The wheel's
# wiring is exercised by tests/test_a2a_mcp_server.py + the bridge
# tests below.
_NOTIFICATION_CALLBACK: Callable[[dict], None] | None = None
def set_notification_callback(cb: Callable[[dict], None] | None) -> None:
"""Register (or clear) the per-message notification callback.
The callback receives ``InboxMessage.to_dict()`` for each new
arrival — same shape ``inbox_peek`` returns to the agent, so a
bridge can build its MCP notification payload without re-deriving
fields.
Best-effort: a raising callback does NOT prevent the message from
landing in the queue (see ``InboxState.record``). Pass ``None`` to
clear (used by tests + the wheel's shutdown path).
"""
global _NOTIFICATION_CALLBACK
_NOTIFICATION_CALLBACK = cb
def activate(state: InboxState) -> None:
"""Register an InboxState as the singleton this module exposes.
Idempotent within a process: re-activating with the same state is
a no-op; activating with a DIFFERENT state replaces the singleton
+ logs at WARNING (the only legitimate caller is mcp_cli at
startup; double-activate usually means a test/runtime mix-up).
"""
global _STATE
if _STATE is state:
return
if _STATE is not None:
logger.warning("inbox: replacing existing singleton state")
_STATE = state
def get_state() -> InboxState | None:
"""Return the active InboxState, or None if the runtime never activated.
Tool implementations call this and surface a clear "(inbox not
enabled)" message to the agent when None — keeps the in-container
path's tool dispatch from raising on an inbox-tool call that the
agent shouldn't have made anyway.
"""
return _STATE
# ---------------------------------------------------------------------------
# Activity → InboxMessage adapter
# ---------------------------------------------------------------------------
#
# The platform's a2a_proxy logs request_body as the JSON-RPC envelope
# it forwarded to the workspace. Three shapes have been observed in
# the wild (verified against workspace-server's logA2ASuccess in
# a2a_proxy_helpers.go on 2026-04-29) — handle all three before
# falling back to summary so a peer message at least surfaces SOMETHING.
def _extract_text(request_body: Any, summary: str | None) -> str:
"""Pull the human-readable text out of an A2A activity row.
Mirrors molecule-mcp-claude-channel/server.ts:445 (extractText) so
canvas-user messages and peer-agent messages render identically
across both inbound channels.
"""
if not isinstance(request_body, dict):
return summary or "(empty A2A message)"
candidates: list[Any] = []
params = request_body.get("params") if isinstance(request_body.get("params"), dict) else None
if params:
message = params.get("message") if isinstance(params.get("message"), dict) else None
if message:
candidates.append(message.get("parts"))
candidates.append(params.get("parts"))
candidates.append(request_body.get("parts"))
# The A2A protocol's part discriminator field varies between SDK
# versions: a2a-sdk v0 uses ``type``, v1 uses ``kind``. The platform's
# activity_logs preserves whichever the original sender used, so we
# accept either. Verified live against a hosted SaaS workspace on
# 2026-04-30 — every canvas-user message arrived with ``kind`` and
# the type-only filter was silently falling through to summary.
for parts in candidates:
if isinstance(parts, list):
text = "".join(
p.get("text", "")
for p in parts
if isinstance(p, dict)
and (p.get("kind") == "text" or p.get("type") == "text")
)
if text:
return text
return summary or "(empty A2A message)"
def _is_self_notify_row(row: dict[str, Any]) -> bool:
"""Return True if ``row`` is the agent's own send_message_to_user
POST surfacing back through the activity API.
The shape (workspace-server handlers/activity.go, ``Notify`` writer):
method='notify' AND no peer (source_id is None or '')
Matched on both fields together so a future caller using
``method='notify'`` for a different purpose with a real peer_id
still passes through.
"""
if row.get("method") != "notify":
return False
source_id = row.get("source_id")
return source_id is None or source_id == ""
def _is_self_echo_row(row: dict[str, Any], workspace_id: str) -> bool:
"""Return True if ``row`` is a self-originated a2a_receive row.
Internal #469: when a workspace delegates to a target that never picks
up the task, ``tool_delegate_task`` calls ``report_activity`` which
POSTs to the platform with source_id set to the *sender's* workspace
UUID (mandated by spoof-defense in workspace-server's a2a_proxy). The
activity API exposes that row under type=a2a_receive, so the inbox
poller re-fetches it. Without this guard the row is surfaced as
kind='peer_agent' with the workspace's own identity as peer_id —
the workspace sees its own delegation-failure echoed back as if a
peer had delegated to it.
The guard mirrors the existing _is_self_notify_row pattern: both
skip rows that would otherwise create spurious inbound signal. The
long-term fix (making the platform write a distinct activity_type
for agent-outbound rows) is tracked separately; this guard stays
because it only excludes rows the agent never wants.
``workspace_id`` must be non-empty — an empty-string workspace_id
(single-workspace legacy path) can never match a UUID source_id, so
the predicate is always False there, which is safe.
RFC #2829 PR-2 note: rows with method="delegate_result" are excluded
from the self-echo guard even when source_id matches our workspace_id.
The platform may write a delegation-result row with source_id set to
our workspace_id (e.g. a self-delegation or edge case in the platform's
result-writing path). Such rows must reach the inbox so that
message_from_activity can surface them (to_dict classifies them as
kind='delegation_result') and the runtime receives the delegation
result. Silently filtering them as
self-echo would break delegation result delivery.
"""
if not workspace_id:
return False
return row.get("source_id") == workspace_id and row.get("method") != "delegate_result"
def message_from_activity(row: dict[str, Any]) -> InboxMessage:
"""Convert one /activity row into an InboxMessage.
Mutates ``row['request_body']`` in-place to swap any
``platform-pending:`` URIs to the locally-staged ``workspace:`` URIs
(see ``inbox_uploads.rewrite_request_body``) — by the time the
upstream chat message arrives via this path, the upload-receive row
that staged the bytes has already populated the URI cache (lower
activity_logs.id, processed earlier in the same poll batch). A
cache miss leaves the URI untouched; the agent surfaces an
unresolvable URI rather than the inbox silently dropping the part.
"""
request_body = row.get("request_body")
if isinstance(request_body, str):
# The Go handler returns request_body as json.RawMessage; httpx
# deserializes that to a dict already. But some legacy paths or
# mocked servers may return it as a string — handle defensively.
try:
request_body = json.loads(request_body)
except (TypeError, ValueError):
request_body = None
# Rewrite platform-pending: URIs → workspace: URIs in-place. Imported
# at call time to keep the import graph clean for the in-container
# path that doesn't use this module, and to avoid a circular import.
# The uploads module is small enough that re-importing per call is
# cheap, and the Python import cache makes it free after the first.
from inbox_uploads import rewrite_request_body
rewrite_request_body(request_body)
return InboxMessage(
activity_id=str(row.get("id", "")),
text=_extract_text(request_body, row.get("summary")),
peer_id=row.get("source_id") or "",
method=row.get("method") or "",
created_at=str(row.get("created_at", "")),
)
# ---------------------------------------------------------------------------
# Poller — daemon thread that fills the queue from the activity API
# ---------------------------------------------------------------------------
def _poll_once(
state: InboxState,
platform_url: str,
workspace_id: str,
headers: dict[str, str],
timeout_secs: float = 10.0,
) -> int:
"""One poll iteration. Returns number of new messages enqueued.
Idempotent and stateless apart from the InboxState passed in —
safe to call from tests with a stub state + a real httpx mock.
``workspace_id`` doubles as the cursor key on InboxState — pollers
for distinct workspaces get distinct cursors and don't trample each
other. For the single-workspace path the cursor key is the empty
string (per InboxState.__post_init__'s back-compat promotion of
``cursor_path``).
"""
import httpx
url = f"{platform_url}/workspaces/{workspace_id}/activity"
# Dual cursor key resolution: in single-workspace mode the cursor
# was historically stored under the "" key (back-compat). In
# multi-workspace mode each poller's cursor lives under its own
# workspace_id. Try the workspace-specific key first; if absent on
# this state, fall back to the legacy empty-string slot so existing
# InboxState-with-cursor_path-only constructors keep working.
cursor_key = workspace_id if workspace_id in state.cursor_paths else ""
params: dict[str, str] = {"type": "a2a_receive"}
cursor = state.load_cursor(cursor_key)
if cursor:
params["since_id"] = cursor
else:
params["since_secs"] = str(INITIAL_BACKLOG_SECONDS)
try:
with httpx.Client(timeout=timeout_secs) as client:
resp = client.get(url, params=params, headers=headers)
except Exception as exc: # noqa: BLE001
logger.warning("inbox poller: GET /activity failed: %s", exc)
return 0
if resp.status_code == 410:
# Cursor pruned — drop back to the backlog window. The next
# poll picks up wherever the activity API has rows now.
logger.info(
"inbox poller: cursor %s expired (410); resetting to since_secs=%d",
cursor,
INITIAL_BACKLOG_SECONDS,
)
state.reset_cursor(cursor_key)
return 0
if resp.status_code >= 400:
logger.warning(
"inbox poller: HTTP %d from /activity: %s",
resp.status_code,
(resp.text or "")[:200],
)
return 0
try:
rows = resp.json()
except ValueError as exc:
logger.warning("inbox poller: non-JSON response: %s", exc)
return 0
if not isinstance(rows, list):
return 0
# since_id mode returns ASC (oldest first). since_secs mode returns
# DESC; reverse so we record in chronological order and the cursor
# we save is the freshest row.
if cursor is None:
rows = list(reversed(rows))
# Imported lazily at use-site so a runtime that never sees an
# upload-receive row never imports the module. Cheap on the hot
# path because Python caches the import.
from inbox_uploads import is_chat_upload_row, BatchFetcher
new_count = 0
last_id: str | None = None
# ``batch_fetcher`` is lazy: a poll batch with no upload rows pays
# zero overhead. Once the first upload row appears we open one
# BatchFetcher and submit every subsequent upload row to its thread
# pool; before processing the FIRST non-upload row we drain the
# pool (wait_all) so the URI cache is hot when message rewriting
# runs. Without the barrier, the chat message that references the
# upload would arrive at the agent with the un-rewritten
# platform-pending: URI.
batch_fetcher: BatchFetcher | None = None
def _drain_uploads(bf: BatchFetcher | None) -> None:
if bf is None:
return
bf.wait_all()
bf.close()
for row in rows:
if not isinstance(row, dict):
continue
if is_chat_upload_row(row):
# Side-effect row from the platform's poll-mode chat-upload
# handler — fetch the bytes, stage to /workspace/.molecule/
# chat-uploads, ack. NOT enqueued as an InboxMessage; the
# agent will see the chat message that REFERENCES this
# upload via a separate (later) activity row, with the
# pending: URI rewritten to a workspace: URI by
# message_from_activity. We DO advance the cursor past
# this row so a permanent network outage on /content
# doesn't stall the cursor and block real chat traffic.
if batch_fetcher is None:
batch_fetcher = BatchFetcher(
platform_url=platform_url,
workspace_id=workspace_id,
headers=headers,
)
batch_fetcher.submit(row)
last_id = str(row.get("id", "")) or last_id
continue
# Non-upload row: drain any pending uploads first so the URI
# cache is populated before we run rewrite_request_body /
# message_from_activity on a row that may reference one.
if batch_fetcher is not None:
_drain_uploads(batch_fetcher)
batch_fetcher = None
if _is_self_notify_row(row):
# The workspace-server's `/notify` handler writes the agent's
# own send_message_to_user POSTs to activity_logs with
# activity_type='a2a_receive', method='notify', and no
# source_id, so the canvas chat-history loader can restore
# those bubbles after a page reload (handlers/activity.go,
# comment block at line 428). The activity API exposes that
# filter only on type, so the same row otherwise lands in
# this poll and gets pushed back to the agent — confirmed
# live 2026-05-01: agent observed its own outbound as an
# inbound `← molecule: Agent message: ...`. Filter here
# belt-and-braces; the long-term fix is upstream renaming
# the activity_type to `agent_outbound` (molecule-core
# #2469). Once that lands, this filter becomes redundant
# but stays in place because it only excludes rows we never
# want, so removing it would just be churn.
#
# NB: still call save_cursor for these rows below — we
# advance past them so the next poll doesn't keep re-seeing
# the same self-notify on every iteration.
last_id = str(row.get("id", "")) or last_id
continue
if _is_self_echo_row(row, workspace_id):
# Internal #469: tool_delegate_task writes its own a2a_receive
# row with source_id = this workspace's UUID (spoof-defense).
# The poll fetches it back as kind='peer_agent', making the
# workspace echo its own delegation-failure as an inbound from
# a phantom peer. Skip it — the real delegation-result path
# (delegate_result push) is separate and unaffected. Cursor
# still advances so the next poll doesn't re-see this row.
last_id = str(row.get("id", "")) or last_id
continue
message = message_from_activity(row)
if not message.activity_id:
continue
# Tag the message with the workspace it arrived on so the agent
# (and tools like send_message_to_user) can route the reply to
# the right tenant. Empty-string in single-workspace mode keeps
# to_dict()'s output shape unchanged for back-compat consumers.
message.arrival_workspace_id = workspace_id if cursor_key else ""
state.record(message)
last_id = message.activity_id
new_count += 1
# Drain any uploads still in flight if the batch ended with upload
# rows (no chat-message row to trigger the inline drain). Without
# this, a future poll that picks up the chat-message row first
# would race with the still-running fetches.
if batch_fetcher is not None:
_drain_uploads(batch_fetcher)
if last_id is not None:
state.save_cursor(last_id, cursor_key)
return new_count
def _poll_loop(
state: InboxState,
platform_url: str,
workspace_id: str,
interval: float = POLL_INTERVAL_SECONDS,
stop_event: threading.Event | None = None,
) -> None:
"""Daemon-thread body: poll forever until stop_event fires.
auth_headers(workspace_id) is rebuilt every iteration so a token
rotation via env var, .auth_token file, or per-workspace registry
is picked up without a restart. Cheap (a dict + an env read).
Multi-workspace pollers pass the workspace_id so the per-workspace
bearer token is selected from platform_auth's registry; single-
workspace pollers fall through to the legacy resolution path
(workspace_id arg is still passed but the registry lookup misses
and auth_headers falls back to the cached/file/env token).
"""
from platform_auth import auth_headers
while True:
try:
_poll_once(state, platform_url, workspace_id, auth_headers(workspace_id))
except Exception as exc: # noqa: BLE001
logger.warning("inbox poller: iteration crashed: %s", exc)
if stop_event is not None and stop_event.wait(interval):
return
if stop_event is None:
time.sleep(interval)
def start_poller_thread(
state: InboxState,
platform_url: str,
workspace_id: str,
interval: float = POLL_INTERVAL_SECONDS,
stop_event: threading.Event | None = None,
) -> threading.Thread:
"""Spawn the poller as a daemon thread. Returns the Thread handle.
daemon=True so the poller dies with the main process — same
rationale as mcp_cli's heartbeat thread (no leaks, no stale
workspace writes after the operator hits Ctrl-C).
Thread name embeds the workspace_id (truncated) so a multi-workspace
operator running ``ps -eL`` or eyeballing ``threading.enumerate()``
can tell which thread is which without reverse-engineering it from
crash tracebacks.
Pass ``stop_event`` to enable graceful shutdown — used by tests so
the daemon thread doesn't outlive the test that started it and race
with later tests' httpx patches. Production code passes None and
relies on the daemon flag for process-exit cleanup.
"""
name = "molecule-mcp-inbox-poller"
if workspace_id:
name = f"{name}-{workspace_id[:8]}"
t = threading.Thread(
target=_poll_loop,
args=(state, platform_url, workspace_id, interval, stop_event),
name=name,
daemon=True,
)
t.start()
return t
def default_cursor_path(workspace_id: str = "") -> Path:
    """Standard cursor location: ``<resolved configs dir>/.mcp_inbox_cursor``.

    Resolved via configs_dir so the cursor lives next to .auth_token
    + .platform_inbound_secret regardless of whether the runtime is
    in-container (/configs) or external (~/.molecule-workspace).

    Multi-workspace operators pass ``workspace_id`` to get a unique
    cursor file per workspace (``.mcp_inbox_cursor_<wsid_short>``) so
    pollers don't trample each other's cursors. Single-workspace
    operators omit the arg and keep the legacy filename, for back-compat
    with existing on-disk cursors.
    """
    base = configs_dir.resolve() / ".mcp_inbox_cursor"
    if workspace_id:
        # An 8-char prefix is enough to disambiguate two workspaces in
        # the same operator's setup (the first 32 bits of a UUID v4 give
        # about 4.3 billion possible values) without hash-bombing the
        # filename.
        return base.with_name(f".mcp_inbox_cursor_{workspace_id[:8]}")
    return base
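# Illustration only: a sketch of the per-workspace cursor naming
# described above. ``cursor_path`` and the ``/configs`` root are
# assumptions made for the example; the real function resolves the root
# through ``configs_dir``. The workspace ID below is made up.

```python
from pathlib import PurePosixPath


def cursor_path(configs_root: PurePosixPath, workspace_id: str = "") -> PurePosixPath:
    base = configs_root / ".mcp_inbox_cursor"
    if workspace_id:
        # Only the first 8 characters of the ID end up in the filename.
        return base.with_name(f".mcp_inbox_cursor_{workspace_id[:8]}")
    return base


root = PurePosixPath("/configs")
legacy = cursor_path(root)
scoped = cursor_path(root, "3f2a9c1e-7b4d-4e0a-9c55-1d2e3f4a5b6c")
```

# ``legacy`` keeps the original filename, while ``scoped`` gets the
# ``_3f2a9c1e`` suffix, so two pollers never share a cursor file.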
@@ -1,733 +0,0 @@
"""Poll-mode chat-upload fetcher + URI cache for the standalone path.
Companion to ``inbox.py``. When the workspace's inbox poller sees an
``activity_logs`` row with ``method='chat_upload_receive'`` (written by
the platform's ``uploadPollMode`` handler — workspace-server
``internal/handlers/chat_files.go``), this module:
1. Pulls the bytes from
``GET /workspaces/:id/pending-uploads/:file_id/content``.
2. Writes them to ``/workspace/.molecule/chat-uploads/<prefix>-<name>``
— same on-disk shape as the push-mode handler in
``internal_chat_uploads.py``, so anything downstream that already
resolves ``workspace:/workspace/.molecule/chat-uploads/...`` URIs
works unchanged.
3. POSTs ``/workspaces/:id/pending-uploads/:file_id/ack`` so Phase 3
sweep can clean up the platform-side ``pending_uploads`` row.
4. Records a ``platform-pending:<wsid>/<file_id> →
workspace:/workspace/.molecule/chat-uploads/...`` mapping in a
process-local cache so the chat message that arrives later
(referencing the platform-pending URI) gets rewritten before the
agent sees it.
URI rewrite ordering — the chat message containing the
``platform-pending:`` URI is logged by the platform AFTER the
``chat_upload_receive`` row, so the inbox poller sees the upload-receive
row first (lower activity_logs.id) and stages the bytes before the chat
message arrives in the same poll batch (or a later one). The URI cache
is therefore populated before the message_from_activity path needs it.
A miss (network race, restart with stale cursor) is handled by keeping
the original ``platform-pending:`` URI in the rewritten body — the agent
will see something it can't open, which is preferable to silently
dropping the URI.
Auth — same Bearer token the inbox poller uses (``platform_auth.auth_headers``).
Both endpoints are on the wsAuth-gated route, so this module can never
read another tenant's bytes even if a token is misrouted.
"""
from __future__ import annotations
import concurrent.futures
import logging
import mimetypes
import os
import re
import secrets as pysecrets
import threading
from collections import OrderedDict
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
# Must stay identical to internal_chat_uploads.CHAT_UPLOAD_DIR so the
# two paths keep emitting the same URI shape. The constant is duplicated
# here rather than imported from internal_chat_uploads to avoid pulling
# in starlette as a transitive dep (this module runs in the standalone
# MCP path, which doesn't ship the in-container HTTP server).
CHAT_UPLOAD_DIR = "/workspace/.molecule/chat-uploads"
# Per-file safety net. The platform enforces 100 MB on the staging side
# (workspace-server migration 20260519200000_pending_uploads_bump_size_cap
# + pendinguploads.MaxFileBytes — bumped from 25 MB per CTO directive
# 2026-05-19 to match push-mode mc#1588), but a buggy or hostile
# platform response shouldn't be able to fill the workspace's disk —
# refuse to write more than this even if the response claims a larger
# Content-Length.
MAX_FILE_BYTES = 100 * 1024 * 1024
# Network deadline for the GET. Tuned for a 100 MB transfer over a
# reasonable consumer link (~5 Mbps gives ~160s for the full payload),
# plus headroom for TLS + platform auth. Scaled up from the original
# 60s (sized for 25 MB) when the per-file cap moved to 100 MB — a fixed
# 60s would fire BEFORE a legitimate slow uplink finished streaming, the
# same wrong-reason failure mc#1588 fixed on the canvas side (forensic
# a99ab0a1 reno-stars). Aligned with platform httpClient.Timeout (1200s
# in chat_files.go after mc#1588) — laptop pull side gets a smaller
# value because it's downstream of a fully-staged row, not a live
# multipart parse.
DEFAULT_FETCH_TIMEOUT = 240.0
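# Illustration only: the arithmetic behind the 240s deadline, worked
# through. ``transfer_seconds`` is a hypothetical helper; the 5 Mbps
# link speed is the figure quoted in the comment above.

```python
def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Seconds needed to move size_bytes over a link of link_mbps megabits/s."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


decimal_100mb = transfer_seconds(100 * 1000 * 1000, 5)   # 100 MB, decimal units
binary_100mib = transfer_seconds(100 * 1024 * 1024, 5)   # the cap as coded (MiB)
headroom = 240.0 - binary_100mib
```

# The decimal figure is the ~160s quoted above; the cap as coded
# (100 * 1024 * 1024 bytes) needs a little under 168s, which still leaves
# over a minute of headroom inside the 240s deadline.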
# Concurrency cap for ``BatchFetcher``. Four workers is enough headroom
# for the realistic "user dragged 3-4 files into chat at once" case
# while bounding the platform's per-workspace fan-out. The cap matters
# because the platform's /content endpoint reads bytea from Postgres in
# a single round-trip per request — N workers = N concurrent DB reads
# of up to 100 MB each (post-mc#1588 cap), so a higher cap could pressure
# platform memory without much UX win (network bandwidth is the
# bottleneck once the bytes are buffered).
DEFAULT_BATCH_FETCH_WORKERS = 4
# Upper bound on how long ``BatchFetcher.wait_all`` blocks the inbox
# poll loop before giving up on still-in-flight fetches. Aligned with
# DEFAULT_FETCH_TIMEOUT so a single hung fetch can't stall the loop
# longer than its own deadline. A timeout fires only if a worker thread
# is stuck past the underlying httpx timeout — pathological case;
# normal completion is bounded by per-fetch timeout × ceil(N/W).
DEFAULT_BATCH_WAIT_TIMEOUT = DEFAULT_FETCH_TIMEOUT + 5.0
# Cap on the URI cache. A long-lived workspace handling thousands of
# uploads shouldn't grow without bound; an LRU cap of 1024 entries
# comfortably covers a typical conversation while keeping memory use
# bounded.
URI_CACHE_MAX_ENTRIES = 1024
# Same character class as internal_chat_uploads — kept duplicated rather
# than imported to avoid dragging starlette into the standalone path.
_UNSAFE_FILENAME_CHARS = re.compile(r"[^a-zA-Z0-9._\-]")
def sanitize_filename(name: str) -> str:
    """Reduce a user-supplied filename to a safe form.

    Mirrors ``internal_chat_uploads.sanitize_filename`` and the Go
    handler's ``SanitizeFilename``; three-way parity is pinned by
    ``workspace-server/internal/handlers/sanitize_filename_test.go`` and
    ``workspace/tests/test_internal_chat_uploads.py`` so the URI shape
    is identical regardless of which path handles the upload.
    """
    base = os.path.basename(name)
    base = base.replace(" ", "_")
    base = _UNSAFE_FILENAME_CHARS.sub("_", base)
    if len(base) > 100:
        ext = ""
        dot = base.rfind(".")
        if dot >= 0 and len(base) - dot <= 16:
            ext = base[dot:]
        base = base[: 100 - len(ext)] + ext
    if base in ("", ".", ".."):
        return "file"
    return base
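# Illustration only: the sanitizer above copied into a standalone
# snippet and run on a few inputs. The sample filenames are made up for
# the example.

```python
import os
import re

_UNSAFE_FILENAME_CHARS = re.compile(r"[^a-zA-Z0-9._\-]")


def sanitize_filename(name: str) -> str:
    base = os.path.basename(name)
    base = base.replace(" ", "_")
    base = _UNSAFE_FILENAME_CHARS.sub("_", base)
    if len(base) > 100:
        ext = ""
        dot = base.rfind(".")
        # Keep the extension only when it is short (dot included).
        if dot >= 0 and len(base) - dot <= 16:
            ext = base[dot:]
        base = base[: 100 - len(ext)] + ext
    if base in ("", ".", ".."):
        return "file"
    return base


spaced = sanitize_filename("my report (final).pdf")
traversal = sanitize_filename("../../etc/passwd")
dots = sanitize_filename("..")
long_name = sanitize_filename("a" * 150 + ".txt")
```

# Spaces and parentheses become underscores, directory components are
# dropped, ``..`` collapses to ``file``, and the long name is cut to
# 100 characters with its ``.txt`` extension preserved.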
# ---------------------------------------------------------------------------
# URI cache — maps platform-pending URIs to local workspace: URIs
# ---------------------------------------------------------------------------
class _URICache:
    """Thread-safe bounded LRU mapping of platform-pending → workspace URIs.

    Bounded so a workspace that runs for months and handles thousands of
    uploads doesn't accumulate entries forever. ``OrderedDict.move_to_end``
    promotes recently-used entries; eviction takes the oldest.

    The cache is intentionally per-process: there is no persistence
    across a workspace restart. A restart with a stale inbox cursor that
    re-polls an upload-receive row will re-fetch (the bytes are already
    on disk from the prior session; see ``stage_to_disk``'s O_EXCL
    handling) and re-register; a chat message that referenced the
    platform-pending URI BEFORE the restart and arrives AFTER would miss
    the rewrite and surface the platform-pending URI to the agent. That
    is preferable to a stale persisted mapping that points at a deleted
    file.
    """

    def __init__(self, max_entries: int = URI_CACHE_MAX_ENTRIES):
        self._max = max_entries
        self._lock = threading.Lock()
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, pending_uri: str) -> str | None:
        with self._lock:
            local = self._entries.get(pending_uri)
            if local is not None:
                self._entries.move_to_end(pending_uri)
            return local

    def set(self, pending_uri: str, local_uri: str) -> None:
        with self._lock:
            self._entries[pending_uri] = local_uri
            self._entries.move_to_end(pending_uri)
            while len(self._entries) > self._max:
                self._entries.popitem(last=False)

    def __len__(self) -> int:
        with self._lock:
            return len(self._entries)

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()
_cache = _URICache()
def get_cache() -> _URICache:
"""Expose the module-singleton cache for tests and the rewrite path."""
return _cache
def resolve_pending_uri(uri: str) -> str | None:
"""Return the local ``workspace:`` URI for a ``platform-pending:`` URI,
or None if not yet staged. Convenience for callers that want to
fall back to an on-demand fetch — pass the result through to
``executor_helpers.resolve_attachment_uri``.
"""
return _cache.get(uri)
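# Illustration only: a stripped-down sketch of the LRU behaviour the
# cache relies on, with the lock omitted for brevity. ``TinyLRU`` and
# the sample URIs are hypothetical; the cap of 2 is chosen just to make
# eviction visible.

```python
from collections import OrderedDict


class TinyLRU:
    def __init__(self, max_entries: int):
        self._max = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str):
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # a hit makes the entry "newest"
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop the oldest entry


cache = TinyLRU(max_entries=2)
cache.set("platform-pending:ws/a", "workspace:/a")
cache.set("platform-pending:ws/b", "workspace:/b")
cache.get("platform-pending:ws/a")                   # promotes "a"
cache.set("platform-pending:ws/c", "workspace:/c")   # evicts "b"

hit_a = cache.get("platform-pending:ws/a")
hit_b = cache.get("platform-pending:ws/b")
hit_c = cache.get("platform-pending:ws/c")
```

# Because the read of "a" promoted it, the insert of "c" evicts "b"
# rather than "a": recently used mappings survive longest.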
# ---------------------------------------------------------------------------
# On-disk staging
# ---------------------------------------------------------------------------
def _open_safe(path: str) -> int:
    """Open ``path`` for write with ``O_CREAT|O_EXCL|O_NOFOLLOW``.

    Same shape as ``internal_chat_uploads._open_safe``: refuses to
    follow a pre-existing symlink at the target and refuses to overwrite
    an existing regular file. The 16-byte random prefix makes a name
    collision astronomically unlikely, but defense-in-depth costs nothing.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    if hasattr(os, "O_NOFOLLOW"):
        flags |= os.O_NOFOLLOW
    return os.open(path, flags, 0o600)
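# Illustration only: a small demonstration that the O_CREAT|O_EXCL flag
# combination refuses to reuse an existing path. ``open_exclusive`` is a
# hypothetical stand-in for the helper above, run against a temporary
# directory rather than the real upload directory.

```python
import os
import tempfile


def open_exclusive(path: str) -> int:
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    if hasattr(os, "O_NOFOLLOW"):  # not available on every platform
        flags |= os.O_NOFOLLOW
    return os.open(path, flags, 0o600)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "upload.bin")

    fd = open_exclusive(target)
    os.write(fd, b"first")
    os.close(fd)

    try:
        open_exclusive(target)  # same path again
        second_open_refused = False
    except FileExistsError:
        second_open_refused = True

    with open(target, "rb") as f:
        surviving = f.read()
```

# The second open raises ``FileExistsError`` and the original bytes are
# left untouched, which is the "fail fast on collision" behaviour the
# docstring describes.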
def stage_to_disk(content: bytes, filename: str) -> str:
    """Write ``content`` under ``CHAT_UPLOAD_DIR`` and return the local URI.

    Returns ``workspace:/workspace/.molecule/chat-uploads/<prefix>-<sanitized>``.
    The 32-hex prefix makes the on-disk name unguessable to anything
    that didn't see the response, so even if a stale agent has a guess
    at the original filename it can't construct a URL to a sibling's
    upload.

    Raises:
        OSError: write failure (mkdir, open, or write). Caller is
            expected to log + skip; the activity row stays unacked so a
            future poll re-tries.
        ValueError: ``content`` exceeds ``MAX_FILE_BYTES``. A
            belt-and-braces pre-staging guard on top of the platform's
            staging-side cap.
    """
    if len(content) > MAX_FILE_BYTES:
        raise ValueError(
            f"content size {len(content)} exceeds workspace cap {MAX_FILE_BYTES}"
        )
    Path(CHAT_UPLOAD_DIR).mkdir(parents=True, exist_ok=True)
    sanitized = sanitize_filename(filename)
    prefix = pysecrets.token_hex(16)
    stored = f"{prefix}-{sanitized}"
    target = os.path.join(CHAT_UPLOAD_DIR, stored)
    fd = _open_safe(target)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
    except OSError:
        # Best-effort cleanup: a partial write leaves a stub file that
        # would otherwise mask a future retry's success.
        try:
            os.unlink(target)
        except OSError:
            pass
        raise
    return f"workspace:{CHAT_UPLOAD_DIR}/{stored}"
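# Illustration only: a sketch of the stored-name shape, showing why the
# docstrings speak of both a "16-byte" and a "32-hex" prefix.
# ``stored_uri`` is a hypothetical helper; it builds the URI string only
# and writes nothing to disk.

```python
import re
import secrets

CHAT_UPLOAD_DIR = "/workspace/.molecule/chat-uploads"


def stored_uri(sanitized_name: str) -> str:
    # token_hex(16) draws 16 random bytes and renders them as 32 hex chars.
    prefix = secrets.token_hex(16)
    return f"workspace:{CHAT_UPLOAD_DIR}/{prefix}-{sanitized_name}"


uri_a = stored_uri("report.pdf")
uri_b = stored_uri("report.pdf")
shape = re.compile(
    r"workspace:/workspace/\.molecule/chat-uploads/[0-9a-f]{32}-report\.pdf"
)
```

# Two uploads of the same filename get different prefixes, so they land
# at different paths and neither URI can be derived from the name alone.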
# ---------------------------------------------------------------------------
# Activity row → fetch/stage/ack flow
# ---------------------------------------------------------------------------
def _request_body_dict(row: dict[str, Any]) -> dict[str, Any] | None:
"""Coerce ``row['request_body']`` into a dict.
The /activity API returns request_body as JSON (already-deserialized
by httpx). Some legacy paths or mocked transports may emit a string;
handle defensively rather than raising.
"""
body = row.get("request_body")
if isinstance(body, dict):
return body
if isinstance(body, str):
import json
try:
decoded = json.loads(body)
except (TypeError, ValueError):
return None
return decoded if isinstance(decoded, dict) else None
return None
def is_chat_upload_row(row: dict[str, Any]) -> bool:
"""True if ``row`` is the platform's chat-upload-receive activity.
Used by the inbox poller to fork the row off the regular A2A
message handling path — this row is not a peer message; it's an
instruction to fetch + stage bytes. Match on ``method`` only;
``activity_type`` is already filtered to ``a2a_receive`` upstream.
"""
return row.get("method") == "chat_upload_receive"
def fetch_and_stage(
row: dict[str, Any],
*,
platform_url: str,
workspace_id: str,
headers: dict[str, str],
timeout_secs: float = DEFAULT_FETCH_TIMEOUT,
client: Any = None,
) -> str | None:
"""Fetch the row's bytes, stage them under chat-uploads, and ack.
Returns the local ``workspace:`` URI on success, or ``None`` if any
step failed (logged with enough detail to triage). Failure leaves
the platform-side row unacked, so a subsequent poll retries — the
activity row stays in the cursor's window because we DO advance the
cursor (the row is "handled" from the inbox's perspective even on
fetch failure; otherwise a permanent network outage would stall the
cursor and block real chat traffic).
On success, the URI cache is updated so a subsequent chat message
referencing the same ``platform-pending:`` URI is rewritten before
the agent sees it.
Pass ``client`` to reuse a shared ``httpx.Client`` for both GET and
POST ack (saves one TLS handshake per row vs. constructing one
per-call). ``BatchFetcher`` does this across an entire poll batch so
N concurrent fetches share one connection pool.
"""
body = _request_body_dict(row)
if body is None:
logger.warning(
"inbox_uploads: row %s missing request_body; cannot fetch",
row.get("id"),
)
return None
file_id = body.get("file_id")
if not isinstance(file_id, str) or not file_id:
logger.warning(
"inbox_uploads: row %s has no file_id in request_body",
row.get("id"),
)
return None
pending_uri = body.get("uri")
if not isinstance(pending_uri, str) or not pending_uri:
# Reconstruct what the platform would have written — defensive
# against a row whose uri field got truncated. Same shape as the
# Go handler's URI builder.
pending_uri = f"platform-pending:{workspace_id}/{file_id}"
filename = body.get("name") or "file"
if not isinstance(filename, str):
filename = "file"
# Caller-supplied client: reuse for both GET + POST ack. Otherwise
# build a one-shot client and close it on the way out. Lazy httpx
# import keeps the standalone MCP path's optional dep optional.
own_client = client is None
if own_client:
try:
import httpx # noqa: WPS433
except ImportError:
logger.error("inbox_uploads: httpx not installed; cannot fetch %s", file_id)
return None
client = httpx.Client(timeout=timeout_secs)
try:
return _fetch_and_stage_with_client(
client,
platform_url=platform_url,
workspace_id=workspace_id,
headers=headers,
file_id=file_id,
pending_uri=pending_uri,
filename=filename,
body=body,
)
finally:
if own_client:
try:
client.close()
except Exception: # noqa: BLE001 — close should never crash the caller
pass
def _fetch_and_stage_with_client(
client: Any,
*,
platform_url: str,
workspace_id: str,
headers: dict[str, str],
file_id: str,
pending_uri: str,
filename: str,
body: dict[str, Any],
) -> str | None:
"""Inner body of fetch_and_stage. Always uses the supplied client for
both GET and POST so the connection pool is shared across the call.
"""
content_url = f"{platform_url}/workspaces/{workspace_id}/pending-uploads/{file_id}/content"
ack_url = f"{platform_url}/workspaces/{workspace_id}/pending-uploads/{file_id}/ack"
try:
resp = client.get(content_url, headers=headers)
except Exception as exc: # noqa: BLE001
logger.warning("inbox_uploads: GET %s failed: %s", content_url, exc)
return None
if resp.status_code == 404:
# Row was swept or already acked by a previous poll race — nothing
# to fetch. Don't ack again; the platform's GC handles it. This is
# a soft-skip, not an error — log at INFO so triage isn't noisy.
logger.info(
"inbox_uploads: pending upload %s already gone (404); skipping",
file_id,
)
return None
if resp.status_code >= 400:
logger.warning(
"inbox_uploads: GET %s returned %d: %s",
content_url,
resp.status_code,
(resp.text or "")[:200],
)
return None
content = resp.content or b""
if len(content) > MAX_FILE_BYTES:
logger.warning(
"inbox_uploads: refusing to stage %s — size %d exceeds cap %d",
file_id,
len(content),
MAX_FILE_BYTES,
)
return None
# Mimetype precedence: platform's Content-Type header → request_body
# mimeType field → extension guess. Same precedence as the in-
# container ingest handler.
mime_header = resp.headers.get("content-type", "").split(";")[0].strip()
mime = (
mime_header
or (body.get("mimeType") if isinstance(body.get("mimeType"), str) else "")
or (mimetypes.guess_type(filename)[0] or "")
)
try:
local_uri = stage_to_disk(content, filename)
except (OSError, ValueError) as exc:
logger.error(
"inbox_uploads: failed to stage %s (%s) to disk: %s",
file_id,
filename,
exc,
)
return None
_cache.set(pending_uri, local_uri)
logger.info(
"inbox_uploads: staged file_id=%s name=%s size=%d mime=%s pending_uri=%s local_uri=%s",
file_id,
filename,
len(content),
mime,
pending_uri,
local_uri,
)
# Ack last so a write failure above leaves the row available for a
# retry on the next poll. A failed ack is logged but doesn't roll
# back the on-disk file — the platform's sweep will clean up
# eventually.
try:
ack_resp = client.post(ack_url, headers=headers)
if ack_resp.status_code >= 400:
logger.warning(
"inbox_uploads: ack %s returned %d: %s",
ack_url,
ack_resp.status_code,
(ack_resp.text or "")[:200],
)
except Exception as exc: # noqa: BLE001
logger.warning("inbox_uploads: POST %s failed: %s", ack_url, exc)
return local_uri
# ---------------------------------------------------------------------------
# BatchFetcher — concurrent fetch across a single poll batch
# ---------------------------------------------------------------------------
class BatchFetcher:
"""Fetch + stage + ack a batch of upload-receive rows concurrently.
Why this exists: the inbox poll loop used to call ``fetch_and_stage``
serially per row. With N upload rows in a batch (a user dragging
multiple files into chat at once), the loop blocked for
``N × per_fetch_latency`` before processing the chat message that
referenced them — a 4-file upload at 5s each = 20s of stall
before the agent saw the user's prompt. ``BatchFetcher`` runs the
fetches on a small thread pool (default 4 workers) so the stall is
bounded by ``ceil(N/W) × per_fetch_latency`` instead.
Connection reuse: one ``httpx.Client`` is shared across every fetch
in the batch. httpx clients carry a connection pool, so a second
fetch to the same platform host reuses the TCP+TLS handshake from
the first — measurable win when fetches happen back-to-back.
Correctness invariant the caller MUST preserve: the inbox loop is
expected to call ``wait_all()`` before processing the chat-message
activity row that REFERENCES one of these uploads. Without the
barrier, the URI cache is empty when ``rewrite_request_body`` runs
and the agent sees the un-rewritten ``platform-pending:`` URI. The
caller-side test ``test_poll_once_waits_for_uploads_before_messages``
pins this end-to-end.
Use as a context manager so the executor + client are torn down
even if the caller raises mid-batch.
"""
def __init__(
self,
*,
platform_url: str,
workspace_id: str,
headers: dict[str, str],
timeout_secs: float = DEFAULT_FETCH_TIMEOUT,
max_workers: int = DEFAULT_BATCH_FETCH_WORKERS,
client: Any = None,
):
self._platform_url = platform_url
self._workspace_id = workspace_id
self._headers = dict(headers) # copy so caller mutations don't leak in
self._timeout_secs = timeout_secs
# Caller can inject a client (tests do this); production callers
# let us build one. Track ownership so we only close ours.
self._own_client = client is None
if self._own_client:
try:
import httpx # noqa: WPS433
except ImportError:
# Match fetch_and_stage's behavior: log + degrade rather
# than raising at construction time. submit() will then
# return None for every row.
logger.error("inbox_uploads: httpx not installed; BatchFetcher inert")
self._client: Any = None
else:
self._client = httpx.Client(timeout=timeout_secs)
else:
self._client = client
self._executor = concurrent.futures.ThreadPoolExecutor(
max_workers=max_workers,
thread_name_prefix="upload-fetch",
)
self._futures: list[concurrent.futures.Future[Any]] = []
self._closed = False
# Flipped to True by wait_all when the timeout fires; close()
# reads this to decide between drain-and-wait vs cancel-queued.
self._timed_out = False
def submit(self, row: dict[str, Any]) -> concurrent.futures.Future[Any] | None:
"""Submit ``row`` for fetch + stage + ack. Non-blocking — the
worker thread runs ``fetch_and_stage`` with the shared client.
Returns the Future so a caller that wants per-row outcome can
await it; ``None`` if the BatchFetcher is in a degraded state
(httpx missing).
"""
if self._closed:
raise RuntimeError("BatchFetcher: submit after close")
if self._client is None:
return None
fut = self._executor.submit(
fetch_and_stage,
row,
platform_url=self._platform_url,
workspace_id=self._workspace_id,
headers=self._headers,
timeout_secs=self._timeout_secs,
client=self._client,
)
self._futures.append(fut)
return fut
def wait_all(self, timeout: float | None = DEFAULT_BATCH_WAIT_TIMEOUT) -> None:
"""Block until every submitted future completes (or times out).
Per-future exceptions are logged + swallowed — ``fetch_and_stage``
already converts every error path to ``return None``, so a real
exception propagating up to here is unexpected and we don't want
one bad fetch to abort the whole batch.
Timeouts are also logged + swallowed, and ``wait_all`` sets the
``self._timed_out`` flag so ``close`` can drop queued work instead
of paying its full latency. Without this hand-off,
``close()``'s ``shutdown(wait=True)`` would block on the leaked
workers and undo the user-facing timeout: the inbox poll loop
would stall indefinitely on a hung /content fetch.
"""
if not self._futures:
return
try:
done, not_done = concurrent.futures.wait(
self._futures,
timeout=timeout,
return_when=concurrent.futures.ALL_COMPLETED,
)
except Exception as exc: # noqa: BLE001 — concurrent.futures shouldn't raise here
logger.warning("inbox_uploads: BatchFetcher.wait_all crashed: %s", exc)
return
for fut in done:
exc = fut.exception()
if exc is not None:
logger.warning(
"inbox_uploads: BatchFetcher worker raised: %s", exc
)
if not_done:
logger.warning(
"inbox_uploads: BatchFetcher.wait_all left %d in-flight after %ss timeout",
len(not_done),
timeout,
)
# Mark these futures so close() knows to cancel-not-wait. We
# cancel queued-but-not-started ones immediately; futures
# already running can't be cancelled (Python's threading
# model), but close() will pass cancel_futures=True so any
# remaining queued items don't run.
for fut in not_done:
fut.cancel()
self._timed_out = True
def close(self) -> None:
"""Tear down the executor + (if owned) the httpx client.
Idempotent. After close, ``submit`` raises and the BatchFetcher
cannot be reused — construct a fresh one for the next poll.
If ``wait_all`` reported a timeout, shutdown skips the
``wait=True`` drain and instead asks the executor to drop queued
futures (``cancel_futures=True``). Currently-running workers
can't be interrupted by Python's threading model, but the poll
loop returns immediately rather than blocking on a hung fetch.
"""
if self._closed:
return
self._closed = True
timed_out = getattr(self, "_timed_out", False)
try:
if timed_out:
# cancel_futures landed in Python 3.9 — guarded for older
# interpreters via a TypeError fallback. Drop queued
# tasks; running ones will exit when their httpx call
# eventually returns or the daemon thread dies.
try:
self._executor.shutdown(wait=False, cancel_futures=True)
except TypeError:
self._executor.shutdown(wait=False)
else:
# Healthy path: wait for in-flight work so we don't
# interrupt a fetch mid-write.
self._executor.shutdown(wait=True)
except Exception as exc: # noqa: BLE001
logger.warning("inbox_uploads: executor shutdown error: %s", exc)
if self._own_client and self._client is not None:
try:
self._client.close()
except Exception as exc: # noqa: BLE001
logger.warning("inbox_uploads: client close error: %s", exc)
def __enter__(self) -> "BatchFetcher":
return self
def __exit__(self, exc_type, exc, tb) -> None:
self.close()
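# Illustration only: a sketch of the timeout hand-off described in
# ``wait_all`` and ``close``, using one worker so the second task is
# guaranteed to sit in the queue. ``hung_fetch`` and ``quick_fetch`` are
# hypothetical stand-ins for real fetches; ``cancel_futures`` assumes
# Python 3.9 or newer.

```python
import concurrent.futures
import threading

started = threading.Event()
release = threading.Event()


def hung_fetch() -> str:
    started.set()
    release.wait(timeout=5)  # simulates a fetch stuck on the network
    return "late"


def quick_fetch() -> str:
    return "done"


executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
running = executor.submit(hung_fetch)
queued = executor.submit(quick_fetch)  # waits behind the hung task
started.wait(timeout=5)

# Bounded wait: returns after 0.1s with both futures still unfinished.
done, not_done = concurrent.futures.wait([running, queued], timeout=0.1)

# A running future cannot be cancelled; a queued one can.
cancel_results = [running.cancel(), queued.cancel()]

# Do not block on the hung worker; drop anything still queued.
executor.shutdown(wait=False, cancel_futures=True)

release.set()  # let the simulated fetch finish so the thread exits
late_result = running.result(timeout=5)
```

# The caller regains control after the bounded wait even though one
# worker is still busy; only the queued task is actually prevented from
# running.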
# ---------------------------------------------------------------------------
# URI rewrite for incoming chat messages
# ---------------------------------------------------------------------------
#
# The chat message that references a staged upload arrives as a
# SEPARATE activity_log row, with parts of kind=file containing
# platform-pending: URIs in the file.uri field. Walk the structure
# in-place and rewrite to the local workspace: URI when the cache has it.
# Unknown URIs pass through unchanged — the agent gets to choose how
# to react (most runtimes log + ignore an unresolvable URI).
def _rewrite_part(part: Any) -> None:
    """Mutate a single A2A Part dict to swap platform-pending: URIs."""
    if not isinstance(part, dict):
        return
    file_obj = part.get("file")
    if not isinstance(file_obj, dict):
        return
    uri = file_obj.get("uri")
    if not isinstance(uri, str) or not uri.startswith("platform-pending:"):
        return
    rewritten = _cache.get(uri)
    if rewritten:
        file_obj["uri"] = rewritten


def rewrite_request_body(body: Any) -> None:
    """Mutate ``body`` in-place, replacing platform-pending: URIs with
    the cached local equivalents.

    Walks the same shapes ``inbox._extract_text`` accepts:
      - ``body['parts']``
      - ``body['params']['parts']``
      - ``body['params']['message']['parts']``

    No-op for shapes that don't match; the message simply passes
    through to the agent as-is.
    """
    if not isinstance(body, dict):
        return
    candidates: list[Any] = []
    params = body.get("params") if isinstance(body.get("params"), dict) else None
    if params:
        message = params.get("message") if isinstance(params.get("message"), dict) else None
        if message:
            candidates.append(message.get("parts"))
        candidates.append(params.get("parts"))
    candidates.append(body.get("parts"))
    for parts in candidates:
        if isinstance(parts, list):
            for part in parts:
                _rewrite_part(part)
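# Illustration only: a self-contained sketch of the in-place rewrite,
# with a plain dict standing in for the module's URI cache.
# ``rewrite_uris``, the sample message, and the local URI are all made
# up for the example.

```python
from typing import Any


def rewrite_uris(body: Any, mapping: dict) -> None:
    """Swap platform-pending: URIs in-place for any URI found in mapping."""
    if not isinstance(body, dict):
        return
    params = body.get("params") if isinstance(body.get("params"), dict) else {}
    message = params.get("message") if isinstance(params.get("message"), dict) else {}
    for parts in (message.get("parts"), params.get("parts"), body.get("parts")):
        if not isinstance(parts, list):
            continue
        for part in parts:
            file_obj = part.get("file") if isinstance(part, dict) else None
            if not isinstance(file_obj, dict):
                continue
            uri = file_obj.get("uri")
            if isinstance(uri, str) and uri.startswith("platform-pending:"):
                # Unknown URIs are left untouched rather than dropped.
                file_obj["uri"] = mapping.get(uri, uri)


body = {"params": {"message": {"parts": [
    {"kind": "text", "text": "see attached"},
    {"kind": "file", "file": {"uri": "platform-pending:ws-1/f-1", "name": "a.pdf"}},
    {"kind": "file", "file": {"uri": "platform-pending:ws-1/f-2", "name": "b.pdf"}},
]}}}
mapping = {
    "platform-pending:ws-1/f-1": "workspace:/workspace/.molecule/chat-uploads/abc-a.pdf",
}
rewrite_uris(body, mapping)
parts = body["params"]["message"]["parts"]
```

# The staged file's URI is replaced, while the second file, which has no
# cache entry, keeps its ``platform-pending:`` URI.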
@@ -1,51 +0,0 @@
"""Helpers for the workspace's one-shot initial_prompt.
Kept as a standalone module (no heavy imports like uvicorn) so the marker
logic is unit-testable without standing up the full workspace runtime.
Background: the workspace runtime supports an `initial_prompt` that runs once
on first boot (clone the repo, set git hooks, read CLAUDE.md, commit_memory).
A marker file `.initial_prompt_done` prevents the prompt from re-running on
subsequent boots.
Prior behaviour wrote the marker AFTER the prompt completed successfully. If
the prompt crashed mid-execution (e.g. ProcessError from a stale Claude
session), the marker was never written; every subsequent container boot
replayed the same failing prompt, cascading into "every message crashes until
an operator intervenes." See GitHub issue #71.
Fix (2026-04-12): write the marker BEFORE firing the prompt. If the prompt
fails, operators re-send it manually via chat — cheap and available — instead
of trapping the workspace in a crash loop.
"""
from __future__ import annotations
import os
def resolve_initial_prompt_marker(config_path: str) -> str:
    """Return the path where the `.initial_prompt_done` marker should live.

    Prefers ``<config_path>/.initial_prompt_done`` when the directory is
    writable; falls back to ``/workspace/.initial_prompt_done`` for containers
    where ``/configs`` is read-only.
    """
    if os.access(config_path, os.W_OK):
        return os.path.join(config_path, ".initial_prompt_done")
    return "/workspace/.initial_prompt_done"


def mark_initial_prompt_attempted(marker_path: str) -> bool:
    """Write the marker best-effort. Return True on success, False on I/O error.

    Called BEFORE the initial-prompt self-message is sent. If the attempt
    later fails, the marker is still present, so the next container boot
    does NOT replay the same failing prompt. Operators retry manually via
    the chat interface instead of relying on auto-replay.
    """
    try:
        with open(marker_path, "w") as f:
            f.write("attempted")
        return True
    except OSError:
        return False
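# Illustration only: a sketch of why writing the marker BEFORE the
# prompt breaks the crash loop. ``run_initial_prompt_once`` and
# ``crashing_prompt`` are hypothetical; the real runtime wires the
# marker helpers into its own boot sequence.

```python
import os
import tempfile


def run_initial_prompt_once(marker_path: str, send_prompt) -> str:
    if os.path.exists(marker_path):
        return "skipped"  # a previous boot already attempted the prompt
    with open(marker_path, "w") as f:  # marker first, prompt second
        f.write("attempted")
    try:
        send_prompt()
    except Exception:
        return "failed"
    return "ran"


def crashing_prompt() -> None:
    raise RuntimeError("stale session")


with tempfile.TemporaryDirectory() as tmp:
    marker = os.path.join(tmp, ".initial_prompt_done")
    first_boot = run_initial_prompt_once(marker, crashing_prompt)
    second_boot = run_initial_prompt_once(marker, crashing_prompt)
```

# The first boot fails but still leaves the marker behind, so the second
# boot skips the prompt instead of replaying the same crash.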
@@ -1,287 +0,0 @@
"""POST /internal/chat/uploads/ingest — workspace-side chat upload sink.
Replaces the Docker-exec / tar-copy path the platform-side workspace-server
used historically (see RFC #2312). The platform forwards the multipart
request to this handler with a Bearer header carrying the workspace's
inbound secret; this handler validates, writes each file under
``/workspace/.molecule/chat-uploads/<random>-<sanitized-name>``, and
returns the same ``ChatUploadedFile`` shape the platform Go handler
returned previously, so callers (canvas, molecli, A2A tools) see no
contract change.
Why no platform-side Docker-exec equivalent here:
The handler runs INSIDE the workspace container, which already has
direct filesystem access to /workspace. mkdir + open + write is
enough — no archive ceremony, no remote-exec round-trip, no
docker socket dependency. Same code path on local Docker and SaaS
EC2; the bug behind #2308 (platform's findContainer is nil in
SaaS) cannot exist here by construction.
Path safety:
sanitize_filename replaces every character outside [A-Za-z0-9._-]
(spaces included) with ``_``, maps ``""``/``"."``/``".."`` to
``"file"``, and caps length at 100 chars (preserving the extension
if it is ≤16 chars including the dot). Files are written with
O_CREAT|O_EXCL|O_NOFOLLOW so a pre-existing symlink at the target
cannot redirect the write to /etc/* or any sensitive location, and
a colliding name fails fast (the random prefix already makes
collisions astronomically unlikely, but defense-in-depth costs nothing).
Limits (matches the Go contract from chat_files.go):
- 100 MB total request body
- 100 MB per file
- filename truncated to 100 chars
Response shape:
{"files": [
{"uri": "workspace:/workspace/.molecule/chat-uploads/<id>-<name>",
"name": "<sanitized name>",
"mimeType": "<content-type or guessed>",
"size": <bytes>}
]}
"""
from __future__ import annotations
import logging
import mimetypes
import os
import re
import secrets as pysecrets
from pathlib import Path
from starlette.requests import Request
from starlette.responses import JSONResponse
from platform_inbound_auth import get_inbound_secret, inbound_authorized
logger = logging.getLogger(__name__)
# In-container destination — must match the platform-side Go constant
# `chatUploadDir` so the URI scheme stays identical and existing canvas
# / agent code that resolves "workspace:/workspace/.molecule/chat-uploads/*"
# keeps working unchanged.
CHAT_UPLOAD_DIR = "/workspace/.molecule/chat-uploads"
# Total-request body cap. multipart/form-data with multiple parts can
# add ~100 bytes of framing per file; the cap is the bytes hitting the
# socket, including framing.
#
# SERVER_MIRROR: keep aligned with workspace-server/internal/handlers/
# chat_files.go chatUploadMaxBytes AND canvas/src/components/tabs/chat/
# uploads.ts MAX_UPLOAD_BYTES. Three constants exist (platform Go +
# workspace Python + canvas TS) because each layer must enforce or
# pre-flight the cap on its own; an SSOT follow-up tracked in
# molecule-ai/internal would expose the cap via GET /uploads/limits.
CHAT_UPLOAD_MAX_BYTES = 100 * 1024 * 1024 # 100 MB
# Per-file cap. Aligned with the total at 100 MB so a single legitimate
# large file (e.g. a 70 MB PDF — reno-stars 2026-05-19 forensic
# a99ab0a1) succeeds end-to-end; batched small attachments still fit
# under the same ceiling.
CHAT_UPLOAD_MAX_FILE_BYTES = 100 * 1024 * 1024 # 100 MB
# Conservative {alnum, dot, underscore, dash} character class — anything
# outside gets rewritten so embedded paths, control chars, newlines,
# quotes, and shell metachars never reach the filesystem.
_UNSAFE_FILENAME_CHARS = re.compile(r"[^a-zA-Z0-9._\-]")
def sanitize_filename(name: str) -> str:
"""Reduce a user-supplied filename to a safe form.
Mirrors workspace-server/internal/handlers/chat_files.go::sanitizeFilename
so canvas-emitted URIs stay identical regardless of which path
handles the upload.
"""
base = os.path.basename(name)
base = base.replace(" ", "_")
base = _UNSAFE_FILENAME_CHARS.sub("_", base)
if len(base) > 100:
ext = ""
dot = base.rfind(".")
if dot >= 0 and len(base) - dot <= 16:
ext = base[dot:]
base = base[: 100 - len(ext)] + ext
if base in ("", ".", ".."):
return "file"
return base
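The sanitization rules above can be exercised in isolation. The following is a minimal sketch that restates the same steps as a standalone function; the constant name `MAX_LEN` is an illustrative assumption, and the results assume POSIX path semantics for `os.path.basename`.

```python
import os
import re

# Same character class as the module above: anything outside
# {alnum, dot, underscore, dash} is rewritten to "_".
_UNSAFE = re.compile(r"[^a-zA-Z0-9._\-]")
MAX_LEN = 100  # illustrative name for the 100-char limit


def sanitize_filename(name: str) -> str:
    base = os.path.basename(name)          # drop any directory components
    base = base.replace(" ", "_")
    base = _UNSAFE.sub("_", base)
    if len(base) > MAX_LEN:
        # Keep a short extension (at most 16 chars including the dot)
        # and truncate the stem so the total stays at MAX_LEN.
        ext = ""
        dot = base.rfind(".")
        if dot >= 0 and len(base) - dot <= 16:
            ext = base[dot:]
        base = base[: MAX_LEN - len(ext)] + ext
    if base in ("", ".", ".."):
        return "file"
    return base


if __name__ == "__main__":
    for raw in ("../../etc/passwd", "my report (final).pdf", "..", "a" * 200 + ".pdf"):
        print(repr(raw[:30]), "->", sanitize_filename(raw)[:40])
```

Directory components are removed before the character rewrite, so a traversal attempt collapses to its last segment, and an over-long name keeps its extension.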
def _open_safe(path: str) -> int:
"""Open `path` for write with O_CREAT|O_EXCL|O_NOFOLLOW.
Refuses to follow a pre-existing symlink at the target, and refuses
to overwrite an existing regular file. Both protections close the
same class of attack: a process inside the workspace container that
raced to create a symlink at the destination before the upload landed.
The random 16-byte prefix on the stored name makes the race
effectively impossible, but defense-in-depth costs nothing here.
"""
flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
# O_NOFOLLOW is POSIX; refuses to open if the final path component is a symlink.
if hasattr(os, "O_NOFOLLOW"):
flags |= os.O_NOFOLLOW
return os.open(path, flags, 0o600)
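To see what the flag combination buys, here is a small sketch that runs the same open call against a fresh path, an existing file, and a dangling symlink inside a temporary directory. The helper name `demo` and the file names are illustrative assumptions; the symlink step assumes a POSIX-like platform where `os.symlink` is permitted.

```python
from __future__ import annotations

import os
import tempfile


def open_safe(path: str) -> int:
    # Same flags as the helper above: create, never overwrite, never follow.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    if hasattr(os, "O_NOFOLLOW"):
        flags |= os.O_NOFOLLOW
    return os.open(path, flags, 0o600)


def demo() -> dict[str, str]:
    results: dict[str, str] = {}
    with tempfile.TemporaryDirectory() as d:
        fresh = os.path.join(d, "fresh.bin")
        with os.fdopen(open_safe(fresh), "wb") as f:
            f.write(b"hello")
        results["fresh"] = "created"

        # Second open of the same path must not overwrite.
        try:
            os.close(open_safe(fresh))
            results["existing"] = "opened"
        except FileExistsError:
            results["existing"] = "refused"

        # A symlink planted at the destination must not redirect the write.
        elsewhere = os.path.join(d, "elsewhere.bin")
        link = os.path.join(d, "link.bin")
        os.symlink(elsewhere, link)
        try:
            os.close(open_safe(link))
            results["symlink"] = "opened"
        except OSError:
            results["symlink"] = "refused"
        results["elsewhere_created"] = str(os.path.exists(elsewhere))
    return results


if __name__ == "__main__":
    print(demo())
```

The symlink case catches the broad `OSError` because the exact errno can differ between platforms; the point is only that the target of the link is never created.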
async def ingest_handler(request: Request) -> JSONResponse:
"""POST /internal/chat/uploads/ingest — Starlette route handler.
Auth: Bearer <platform_inbound_secret>; fail-closed when the secret
file is missing or empty.
Body: multipart/form-data with one or more `files` parts.
Returns 200 with the list of stored URIs on success, or one of:
401 unauthorized — bad / missing bearer
400 bad request — malformed multipart, no files field, etc.
413 payload too large — total body or per-file over cap
500 internal — disk write failed
"""
if not inbound_authorized(get_inbound_secret(), request.headers.get("Authorization", "")):
return JSONResponse({"error": "unauthorized"}, status_code=401)
# Total-body guard. Starlette won't enforce this for us; we read
# Content-Length first and reject early to avoid streaming a 5 GB
# request through the multipart parser only to bail at the end.
cl_str = request.headers.get("Content-Length", "")
if cl_str:
try:
cl = int(cl_str)
except ValueError:
cl = -1
if cl > CHAT_UPLOAD_MAX_BYTES:
return JSONResponse(
{"error": f"request body exceeds total limit ({CHAT_UPLOAD_MAX_BYTES // (1024*1024)} MB)"},
status_code=413,
)
try:
form = await request.form(max_files=64, max_fields=32)
except Exception as exc: # multipart parse error
# Surface the exception class name and str(exc) to the caller. Prior behavior returned
# only the opaque {"error": "failed to parse multipart form"}, which
# took ~25 min to root-cause in forensic a78762a0 (Hermes workspace
# PDF upload, 2026-05-19) — the underlying cause was a MISSING
# python-multipart dep, surfaced as an AssertionError from Starlette's
# parser. Surfacing exception class + detail in the 400 body would
# have cut that to ~10 min. Per feedback_surface_actionable_failure_
# reason_to_user (CTO 2026-05-17): user-facing failures MUST tell the
# user WHY. Top-level "error" key is preserved for backwards-compat
# with existing canvas / alert rules.
logger.warning(
"internal_chat_uploads: multipart parse failed: %s: %s",
type(exc).__name__, exc,
)
return JSONResponse(
{
"error": "failed to parse multipart form",
"exception": type(exc).__name__,
"detail": str(exc),
},
status_code=400,
)
# Starlette's FormData allows multiple values per key — `files` may
# appear multiple times for batched uploads. getlist returns them
# in order.
parts = form.getlist("files")
if not parts:
return JSONResponse({"error": "expected at least one 'files' field"}, status_code=400)
# Filter out non-file entries defensively. Starlette's UploadFile
# has a .filename attribute; plain string fields don't.
uploads = [p for p in parts if hasattr(p, "filename") and hasattr(p, "read")]
if not uploads:
return JSONResponse({"error": "expected at least one 'files' field"}, status_code=400)
# mkdir -p is idempotent. It runs on every call so a container restart
# that wipes /workspace/.molecule doesn't surprise us.
try:
Path(CHAT_UPLOAD_DIR).mkdir(parents=True, exist_ok=True)
except OSError as exc:
# Surface errno + path in the response so a fresh-tenant
# "failed to prepare uploads dir" 500 self-diagnoses without
# requiring SSM access to the workspace stderr. Prior incident
# 2026-05-01: hongming.moleculesai.app hit EACCES on the
# /workspace volume's `.molecule` subtree (root-owned race
# window between Docker volume create and entrypoint's chown,
# fixed via molecule-ai-workspace-template-claude-code#23).
# The errno + path are not security-sensitive — both are
# well-known to anyone with workspace access.
logger.error("internal_chat_uploads: mkdir %s failed: %s", CHAT_UPLOAD_DIR, exc)
return JSONResponse(
{
"error": "failed to prepare uploads dir",
"path": CHAT_UPLOAD_DIR,
"errno": exc.errno,
"detail": str(exc),
},
status_code=500,
)
response_files: list[dict] = []
total_bytes = 0
for upload in uploads:
# Read into memory with a hard cap. Files larger than the cap
# surface as 413; we don't truncate silently.
data = await upload.read(CHAT_UPLOAD_MAX_FILE_BYTES + 1)
if len(data) > CHAT_UPLOAD_MAX_FILE_BYTES:
return JSONResponse(
{"error": f"{upload.filename} exceeds per-file limit ({CHAT_UPLOAD_MAX_FILE_BYTES // (1024*1024)} MB)"},
status_code=413,
)
total_bytes += len(data)
if total_bytes > CHAT_UPLOAD_MAX_BYTES:
return JSONResponse(
{"error": f"total request body exceeds limit ({CHAT_UPLOAD_MAX_BYTES // (1024*1024)} MB)"},
status_code=413,
)
sanitized = sanitize_filename(upload.filename or "file")
# 16-byte random prefix → 32-hex-char + sanitized name. Same
# shape as the Go handler's `hex.EncodeToString(rand 16) + "-" + name`.
prefix = pysecrets.token_hex(16)
stored = f"{prefix}-{sanitized}"
target = os.path.join(CHAT_UPLOAD_DIR, stored)
try:
fd = _open_safe(target)
except FileExistsError:
# 32 hex chars → 128 bits of entropy, so a collision is
# astronomically unlikely. If we hit one anyway, surface as 500 rather
# than overwriting; the next retry will pick a fresh prefix.
logger.error("internal_chat_uploads: collision at %s — refusing overwrite", target)
return JSONResponse({"error": "internal collision; retry"}, status_code=500)
except OSError as exc:
logger.error("internal_chat_uploads: open %s failed: %s", target, exc)
return JSONResponse({"error": "failed to write file"}, status_code=500)
try:
with os.fdopen(fd, "wb") as f:
f.write(data)
except OSError as exc:
logger.error("internal_chat_uploads: write %s failed: %s", target, exc)
# Best-effort cleanup of the partial (possibly empty) file. unlink
# can fail if the file was removed or the dir was torn down
# concurrently; neither case warrants surfacing.
try:
os.unlink(target)
except OSError as unlink_exc:
logger.debug("internal_chat_uploads: unlink %s after write fail: %s", target, unlink_exc)
return JSONResponse({"error": "failed to write file"}, status_code=500)
# Mime type: prefer the part's Content-Type header, fall back to
# extension-based guess. Matches the Go handler's precedence.
mime_type = upload.headers.get("content-type") if hasattr(upload, "headers") else None
if not mime_type:
mime_type, _ = mimetypes.guess_type(sanitized)
response_files.append({
"uri": f"workspace:{CHAT_UPLOAD_DIR}/{stored}",
"name": sanitized,
"mimeType": mime_type or "",
"size": len(data),
})
return JSONResponse({"files": response_files}, status_code=200)
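The two size guards in the handler (the early Content-Length check and the running per-file / total accounting) can be sketched as pure functions, which makes the 413 decisions easy to test without a Starlette request. The function names `precheck_content_length` and `first_violation` are hypothetical and not part of the module; the logic is a restatement of the handler above, not a replacement for it.

```python
from __future__ import annotations

MAX_TOTAL = 100 * 1024 * 1024  # mirrors CHAT_UPLOAD_MAX_BYTES
MAX_FILE = 100 * 1024 * 1024   # mirrors CHAT_UPLOAD_MAX_FILE_BYTES


def precheck_content_length(header: str, max_total: int = MAX_TOTAL) -> bool:
    """Return True if the request may proceed to multipart parsing."""
    if not header:
        return True   # no header: only the per-file loop guards the size
    try:
        declared = int(header)
    except ValueError:
        return True   # unparseable value is treated as unknown, as in the handler
    return declared <= max_total


def first_violation(
    sizes: list[int], max_file: int = MAX_FILE, max_total: int = MAX_TOTAL
) -> str | None:
    """Return "per-file", "total", or None, checking in the handler's order."""
    running = 0
    for size in sizes:
        if size > max_file:
            return "per-file"
        running += size
        if running > max_total:
            return "total"
    return None


if __name__ == "__main__":
    print(precheck_content_length(str(MAX_TOTAL + 1)))         # rejected early
    print(first_violation([40, 40, 40], max_file=50, max_total=100))
```

Note that a request without a Content-Length header (for example, chunked transfer encoding) skips the early check entirely, so the in-loop accounting is the only guard in that case.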
@@ -1,134 +0,0 @@
"""GET /internal/file/read?path=<abs path> — workspace-side file read sink.
Companion to /internal/chat/uploads/ingest (RFC #2312 PR-B). Replaces the
docker-cp tar-stream extraction the platform-side workspace-server used
in chat_files.go::Download. Same path-safety contract as the legacy Go
handler:
* absolute path required
* must canonicalise to itself (no `..` segments, no double-slashes)
* must land under one of {/configs, /workspace, /home, /plugins}
* must be a regular file (not a directory, symlink, device, etc.)
Why a single broad "/internal/file/read" instead of a chat-specific path:
Today's chat_files.go::Download already accepts paths under any of the
four allowed roots — it's not strictly chat. Future PR-G/H will migrate
/files/* template-config reads to the same forward pattern; reusing
the same endpoint avoids three near-identical handlers (one per domain)
with duplicated path-safety logic.
Auth: Bearer <platform_inbound_secret>; fail-closed when missing.
Response shape (matches Go contract for byte-for-byte compatibility):
Content-Type: <mime.guess from extension or application/octet-stream>
Content-Length: <stat size>
Content-Disposition: attachment; filename="<basename>"; filename*=UTF-8''<encoded>
body: raw file bytes (binary-safe — no JSON wrapping)
"""
from __future__ import annotations
import logging
import mimetypes
import os
import urllib.parse
from pathlib import Path
from starlette.requests import Request
from starlette.responses import FileResponse, JSONResponse
from platform_inbound_auth import get_inbound_secret, inbound_authorized
logger = logging.getLogger(__name__)
# Mirror chat_files.go's allowedRoots set. A request whose `path` doesn't
# fall under one of these — by exact-match or prefix-with-trailing-slash
# — is rejected at the gate, regardless of how many `..` segments
# canonicalised away.
_ALLOWED_ROOTS = ("/configs", "/workspace", "/home", "/plugins")
def _content_disposition_attachment(name: str) -> str:
"""Mirror chat_files.go::contentDispositionAttachment.
CR, LF, and other control chars are dropped; backslash and double-quote
are escaped inside the quoted-string (RFC 6266 / RFC 5987). A
percent-encoded filename* is emitted as well so non-ASCII names
survive in clients that prefer the modern form.
"""
safe_q: list[str] = []
for ch in name:
if ch in ("\r", "\n"):
continue # would terminate the header
if ch in ('"', "\\"):
safe_q.append("\\")
safe_q.append(ch)
continue
if ord(ch) < 0x20 or ord(ch) == 0x7f:
continue # other control chars
safe_q.append(ch)
ascii_safe = "".join(safe_q)
encoded = urllib.parse.quote(name, safe="") # percent-encode everything except RFC 3986 unreserved chars
return f'attachment; filename="{ascii_safe}"; filename*=UTF-8\'\'{encoded}'
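A standalone sketch of the same header construction shows what the two filename forms look like for awkward names. The function below restates the logic above under a different name; nothing here is taken from the Go handler itself.

```python
import urllib.parse


def content_disposition_attachment(name: str) -> str:
    quoted: list[str] = []
    for ch in name:
        if ch in ("\r", "\n"):
            continue                      # would terminate the header
        if ch in ('"', "\\"):
            quoted.append("\\" + ch)      # escape inside the quoted-string
            continue
        if ord(ch) < 0x20 or ord(ch) == 0x7F:
            continue                      # other control chars
        quoted.append(ch)
    fallback = "".join(quoted)
    # Percent-encode every byte except RFC 3986 unreserved characters.
    encoded = urllib.parse.quote(name, safe="")
    return 'attachment; filename="' + fallback + "\"; filename*=UTF-8''" + encoded


if __name__ == "__main__":
    for raw in ('a"b.txt', "x\r\ny.txt", "résumé.pdf"):
        print(content_disposition_attachment(raw))
```

The quoted-string form keeps non-ASCII characters as they are, so only the `filename*` form is guaranteed to be pure ASCII.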
def _validate_path(path: str) -> tuple[bool, str]:
"""Return (ok, error_msg). Mirrors Go's chat_files.go::Download
validation in the same order so error shapes stay identical."""
if not path:
return False, "path query required"
if not os.path.isabs(path):
return False, "path must be absolute"
rooted = False
for root in _ALLOWED_ROOTS:
if path == root or path.startswith(root + "/"):
rooted = True
break
if not rooted:
return False, "path must be under /configs, /workspace, /home, or /plugins"
# Reject anything that canonicalises differently or contains ".."
# anywhere (a substring check, so a name like "a..b" is rejected too).
# Defence-in-depth on top of the prefix check.
if os.path.normpath(path) != path or ".." in path:
return False, "invalid path"
return True, ""
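The validation order is easiest to follow with concrete inputs. This sketch restates the checks as a standalone function and lists which rule each sample path trips; results assume POSIX semantics for `os.path`.

```python
from __future__ import annotations

import os

ALLOWED_ROOTS = ("/configs", "/workspace", "/home", "/plugins")
ROOT_ERROR = "path must be under /configs, /workspace, /home, or /plugins"


def validate_path(path: str) -> tuple[bool, str]:
    if not path:
        return False, "path query required"
    if not os.path.isabs(path):
        return False, "path must be absolute"
    # Exact match or prefix followed by "/" so "/workspacefoo" is not accepted.
    if not any(path == root or path.startswith(root + "/") for root in ALLOWED_ROOTS):
        return False, ROOT_ERROR
    # Canonical-form check plus a blunt substring check for "..".
    if os.path.normpath(path) != path or ".." in path:
        return False, "invalid path"
    return True, ""


if __name__ == "__main__":
    samples = [
        "/workspace/report.pdf",
        "workspace/report.pdf",
        "/etc/passwd",
        "/workspacefoo/x",
        "/workspace/../etc/passwd",
        "/workspace//x",
        "/workspace/a..b.txt",
    ]
    for p in samples:
        print(p, "->", validate_path(p))
```

The last sample is a legitimate file name that is still rejected, which is the cost of the substring check.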
async def file_read_handler(request: Request):
"""GET /internal/file/read — Starlette route handler."""
if not inbound_authorized(get_inbound_secret(), request.headers.get("Authorization", "")):
return JSONResponse({"error": "unauthorized"}, status_code=401)
path = request.query_params.get("path", "")
ok, err = _validate_path(path)
if not ok:
return JSONResponse({"error": err}, status_code=400)
# lstat (not stat) so a symlink at the path doesn't pretend to be the
# file it points at — we want to know "is this LITERALLY a regular
# file at the validated path." A symlink could redirect to /etc/*
# or another mount.
try:
st = os.lstat(path)
except FileNotFoundError:
return JSONResponse({"error": "file not found"}, status_code=404)
except OSError as exc:
logger.warning("internal_file_read: lstat %s failed: %s", path, exc)
return JSONResponse({"error": "stat failed"}, status_code=500)
import stat as _stat
if not _stat.S_ISREG(st.st_mode):
return JSONResponse({"error": "path is not a regular file"}, status_code=400)
name = os.path.basename(path)
mime_type, _ = mimetypes.guess_type(name)
if not mime_type:
mime_type = "application/octet-stream"
return FileResponse(
path,
media_type=mime_type,
headers={
"Content-Disposition": _content_disposition_attachment(name),
},
)
@@ -1,192 +0,0 @@
"""Pre-stop serialization for pause/resume — GH#1391.
Captures the agent's in-memory state just before the container exits so it
survives intentional pause and unplanned restart. All content is scrubbed
with lib.snapshot_scrub before being written to disk so that a snapshot blob
obtained by an attacker cannot recover API keys, tokens, or arbitrary sandbox
output (GH#823).
State captured
--------------
- ``workspace_id`` — identity for cross-container restore
- ``current_task`` — active task label from heartbeat (what the canvas sees)
- ``active_tasks`` — task count
- ``session_id`` — SDK session handle (Claude Code); key for full session
- ``transcript_lines`` — recent session log lines from the adapter
- ``uptime_seconds`` — how long this container has been running
- ``timestamp`` — when the snapshot was taken (ISO-8601)
Scrubbing
---------
Every string value passes through snapshot_scrub.scrub_content (via
_scrub_value) before being written. Sandbox-sourced list entries
(tool=run_code, source=sandbox, [sandbox_output]) are dropped wholesale.
Secrets matching the pattern library are replaced with [REDACTED:TYPE] markers.
Storage
-------
Snapshots are written to /configs/.agent_snapshot.json by default. The
config volume survives container restarts so the file is durable. The path
is also overridable via ``AGENT_SNAPSHOT_PATH`` for testing or custom layouts.
"""
from __future__ import annotations
import json
import logging
import os
from datetime import datetime, timezone
from typing import TYPE_CHECKING, Any
from .snapshot_scrub import scrub_snapshot
if TYPE_CHECKING:
from heartbeat import HeartbeatLoop
logger = logging.getLogger(__name__)
# Default snapshot path — on the config volume, survives container restarts.
DEFAULT_SNAPSHOT_PATH = os.environ.get(
"AGENT_SNAPSHOT_PATH",
"/configs/.agent_snapshot.json",
)
# How many transcript lines to capture in the snapshot (recent window).
MAX_TRANSCRIPT_LINES = 200
def build_snapshot(
heartbeat: "HeartbeatLoop | None",
adapter_state: dict[str, Any],
) -> dict[str, Any]:
"""Build a raw snapshot dict from live workspace state.
Args:
heartbeat: HeartbeatLoop instance; provides current_task, session_id, etc.
adapter_state: Arbitrary state dict from the adapter's pre_stop_state() hook.
Keys are free-form; all string values in nested dicts/lists are
scrubbed before writing.
Returns a raw (not yet scrubbed) snapshot dict.
"""
import time
raw: dict[str, Any] = {
"workspace_id": os.environ.get("WORKSPACE_ID", "unknown"),
"timestamp": datetime.now(timezone.utc).isoformat(),
# Defaults — heartbeat block below overwrites these when available:
"current_task": "",
"active_tasks": 0,
}
if heartbeat is not None:
raw["current_task"] = heartbeat.current_task or ""
raw["active_tasks"] = heartbeat.active_tasks
if hasattr(heartbeat, "start_time"):
raw["uptime_seconds"] = int(time.time() - heartbeat.start_time)
# session_id lives in the adapter but we also accept it via heartbeat
# for convenience (avoids requiring every adapter to pass it separately).
if not adapter_state.get("session_id"):
raw["session_id"] = getattr(heartbeat, "_session_id", None) or ""
# Adapter-supplied state (conversation history, reasoning traces, etc.)
raw["adapter"] = adapter_state
return raw
def _scrub_value(value: Any) -> Any:
"""Recursively scrub all secret patterns from a value.
- Strings: scrub_content() replaces patterns with [REDACTED:TYPE].
- Dicts: return a new dict with all values scrubbed recursively.
- Lists: drop entries that are sandbox content; scrub remaining items.
- Other: pass through unchanged.
"""
from .snapshot_scrub import is_sandbox_content, scrub_content
if isinstance(value, str):
return scrub_content(value)
if isinstance(value, dict):
return {k: _scrub_value(v) for k, v in value.items()}
if isinstance(value, list):
result = []
for item in value:
if isinstance(item, str) and is_sandbox_content(item):
continue # Drop sandbox entries wholesale
result.append(_scrub_value(item))
return result
return value
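The recursive walk can be tried on a small nested snapshot. In this sketch, `scrub_text` and `is_sandbox` are simplified stand-ins for `scrub_content` and `is_sandbox_content` (only the `sk-` pattern is included), so the output illustrates the traversal, not the full pattern library.

```python
from __future__ import annotations

import re
from typing import Any

# Stand-ins for the real scrubber: one pattern, same sandbox markers.
_SK_TOKEN = re.compile(r"sk-[A-Za-z0-9\-_]{16,}")
_SANDBOX_MARKERS = ("source=sandbox", "tool=run_code", "[sandbox_output]")


def scrub_text(text: str) -> str:
    return _SK_TOKEN.sub("[REDACTED:SK_TOKEN]", text)


def is_sandbox(text: str) -> bool:
    lower = text.lower()
    return any(marker in lower for marker in _SANDBOX_MARKERS)


def scrub_value(value: Any) -> Any:
    if isinstance(value, str):
        return scrub_text(value)
    if isinstance(value, dict):
        return {k: scrub_value(v) for k, v in value.items()}
    if isinstance(value, list):
        # Sandbox strings are dropped only when they are list items.
        return [
            scrub_value(item)
            for item in value
            if not (isinstance(item, str) and is_sandbox(item))
        ]
    return value  # ints, None, etc. pass through unchanged


if __name__ == "__main__":
    snapshot = {
        "current_task": "deploy",
        "active_tasks": 2,
        "adapter": {
            "transcript_lines": [
                "key is sk-abcdefghijklmnop1234",
                "[sandbox_output] anything",
                "plain line",
            ]
        },
    }
    print(scrub_value(snapshot))
```

One consequence of the structure is that a sandbox-marked string stored directly as a dict value is scrubbed but kept, since the wholesale drop applies only inside lists.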
def write_snapshot(
snapshot: dict[str, Any],
path: str | None = None,
) -> bool:
"""Scrub and write a snapshot to disk.
Args:
snapshot: Raw snapshot dict from build_snapshot().
path: Target file path (default: DEFAULT_SNAPSHOT_PATH).
Returns:
True if the snapshot was written successfully; False on any error.
Errors are logged but never raise — pre-stop serialization must be
best-effort to avoid blocking shutdown.
"""
target = path or DEFAULT_SNAPSHOT_PATH
try:
# Deep-scrub every string value in the snapshot to remove API keys,
# tokens, and arbitrary sandbox output before writing to disk.
scrubbed = _scrub_value(snapshot)
# Ensure parent directory exists.
parent = os.path.dirname(target)
if parent:
os.makedirs(parent, exist_ok=True)
with open(target, "w") as f:
json.dump(scrubbed, f, indent=2, default=str)
logger.info(
"Pre-stop snapshot written: %s (workspace=%s, task=%r, lines=%d)",
target,
scrubbed.get("workspace_id", "?"),
scrubbed.get("current_task", ""),
len(scrubbed.get("adapter", {}).get("transcript_lines", [])),
)
return True
except Exception as exc:
logger.warning("Pre-stop snapshot write failed (%s): %s", target, exc)
return False
def read_snapshot(
path: str | None = None,
) -> dict[str, Any] | None:
"""Read and return a previously-written snapshot, or None if absent/invalid."""
target = path or DEFAULT_SNAPSHOT_PATH
if not os.path.exists(target):
return None
try:
with open(target) as f:
return json.load(f)
except Exception as exc:
logger.debug("Snapshot read failed (%s): %s", target, exc)
return None
def delete_snapshot(path: str | None = None) -> None:
"""Remove a snapshot file. Idempotent — no error if absent."""
target = path or DEFAULT_SNAPSHOT_PATH
try:
os.remove(target)
logger.debug("Snapshot deleted: %s", target)
except FileNotFoundError:
pass
except Exception as exc:
logger.warning("Snapshot delete failed (%s): %s", target, exc)
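The write / read / delete lifecycle can be sketched against a temporary directory. This version deliberately omits scrubbing and logging to stay short, so it shows only the best-effort file handling; the helper `roundtrip` and the sample payload are illustrative assumptions.

```python
from __future__ import annotations

import json
import os
import tempfile
from typing import Any


def write_snapshot(snapshot: dict[str, Any], path: str) -> bool:
    # Best-effort: any failure returns False instead of raising.
    try:
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(path, "w") as f:
            json.dump(snapshot, f, indent=2, default=str)
        return True
    except Exception:
        return False


def read_snapshot(path: str) -> dict[str, Any] | None:
    if not os.path.exists(path):
        return None
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        return None


def delete_snapshot(path: str) -> None:
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # idempotent: deleting twice is fine


def roundtrip() -> tuple[Any, bool, Any, Any]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "nested", ".agent_snapshot.json")
        before = read_snapshot(path)
        ok = write_snapshot({"workspace_id": "ws-demo", "active_tasks": 1}, path)
        loaded = read_snapshot(path)
        delete_snapshot(path)
        delete_snapshot(path)
        after = read_snapshot(path)
    return before, ok, loaded, after


if __name__ == "__main__":
    print(roundtrip())
```

In tests, pointing `AGENT_SNAPSHOT_PATH` (or the `path` argument) at a temporary directory in the same way avoids touching `/configs`.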
@@ -1,125 +0,0 @@
"""Snapshot scrubbing — strip secrets and internal details from hibernation snapshots.
Issue #823 (sub of #799). Before the workspace runtime serializes a memory
snapshot for hibernation, every memory entry's content must pass through
this scrubber so an attacker who obtains a snapshot blob cannot recover:
- API keys (sk-ant-, sk-proj-, ghp_, etc.)
- Auth tokens (Bearer headers, OAuth tokens)
- Env-var assignments (ANTHROPIC_API_KEY=..., OPENAI_API_KEY=...)
- Arbitrary subprocess output from the sandbox tool (can be anything)
The scrubber is a pure function so it can be unit-tested independently.
"""
from __future__ import annotations
import re
from typing import Any
# Compiled once at import time — most-specific patterns first so that
# env-var assignments are caught before the generic sk-* or base64 sweeps
# swallow only part of the match.
_SECRET_PATTERNS: list[tuple[re.Pattern[str], str]] = [
# Env-var assignments: ANTHROPIC_API_KEY=sk-ant-... GITHUB_TOKEN=ghp_...
(re.compile(r"(?i)\b[A-Z][A-Z0-9_]*_API_KEY\s*=\s*\S+"), "API_KEY"),
(re.compile(r"(?i)\b[A-Z][A-Z0-9_]*_TOKEN\s*=\s*\S+"), "TOKEN"),
(re.compile(r"(?i)\b[A-Z][A-Z0-9_]*_SECRET\s*=\s*\S+"), "SECRET"),
# HTTP Bearer header values.
(re.compile(r"Bearer\s+\S+"), "BEARER_TOKEN"),
# OpenAI / Anthropic sk-... / sk-ant-... / sk-proj-... key format.
(re.compile(r"sk-[A-Za-z0-9\-_]{16,}"), "SK_TOKEN"),
# GitHub personal access tokens and installation tokens.
(re.compile(r"ghp_[A-Za-z0-9]{20,}"), "GITHUB_PAT"),
(re.compile(r"ghs_[A-Za-z0-9]{20,}"), "GITHUB_SERVER_TOKEN"),
(re.compile(r"github_pat_[A-Za-z0-9_]{60,}"), "GITHUB_PAT_V2"),
# AWS access key IDs.
(re.compile(r"\bAKIA[A-Z0-9]{16}\b"), "AWS_ACCESS_KEY"),
# Cloudflare API tokens.
(re.compile(r"\bcfut_[A-Za-z0-9]{32,}"), "CF_TOKEN"),
# Molecule partner API keys (Phase 34).
(re.compile(r"\bmol_pk_[A-Za-z0-9]{20,}"), "MOL_PK"),
# context7 tokens.
(re.compile(r"\bctx7_[A-Za-z0-9]+"), "CTX7_TOKEN"),
# High-entropy base64 blobs 33+ chars. Catches long opaque tokens that
# don't match any structured pattern above.
(re.compile(r"[A-Za-z0-9+/]{33,}={0,2}"), "BASE64_BLOB"),
]
# Substring markers that identify content from the run_code sandbox tool.
# Any memory entry tagged with this source is excluded wholesale from the
# snapshot — the arbitrary subprocess output cannot be safely scrubbed by
# pattern alone (a command as innocuous as `echo "innocent"` can still
# leak secrets through stderr or file handles).
_SANDBOX_TOOL_MARKERS = (
"source=sandbox",
"tool=run_code",
"[sandbox_output]",
)
def scrub_content(content: str) -> str:
"""Return `content` with secret patterns replaced by [REDACTED:LABEL] markers.
Idempotent — running scrub_content on already-scrubbed output is a no-op
because [REDACTED:...] doesn't match any of the patterns above.
"""
if not content:
return content
out = content
for pattern, label in _SECRET_PATTERNS:
out = pattern.sub(f"[REDACTED:{label}]", out)
return out
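A reduced version of the substitution loop makes the behaviour concrete. The sketch below copies four of the patterns from the list above, in the same relative order, and checks that a second pass changes nothing; it is not the full pattern library.

```python
import re

# Subset of the pattern list, kept in the same relative order.
_PATTERNS = [
    (re.compile(r"(?i)\b[A-Z][A-Z0-9_]*_API_KEY\s*=\s*\S+"), "API_KEY"),
    (re.compile(r"Bearer\s+\S+"), "BEARER_TOKEN"),
    (re.compile(r"sk-[A-Za-z0-9\-_]{16,}"), "SK_TOKEN"),
    (re.compile(r"[A-Za-z0-9+/]{33,}={0,2}"), "BASE64_BLOB"),
]


def scrub_content(content: str) -> str:
    if not content:
        return content
    out = content
    for pattern, label in _PATTERNS:
        out = pattern.sub(f"[REDACTED:{label}]", out)
    return out


if __name__ == "__main__":
    samples = [
        "ANTHROPIC_API_KEY=sk-ant-" + "x" * 24,
        "Authorization: Bearer abc.def.ghi",
        "token sk-" + "a" * 20,
        "blob " + "Q" * 40,
    ]
    for s in samples:
        once = scrub_content(s)
        print(once, "| stable:", scrub_content(once) == once)
```

The base64 sweep includes `/` in its character class, so a long path made only of letters, digits, and slashes would also be redacted; that is worth keeping in mind when reading scrubbed transcripts.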
def is_sandbox_content(content: str) -> bool:
"""Return True if `content` originates from the run_code sandbox tool.
Sandbox output can contain arbitrary subprocess stdout/stderr that may
include secrets the scrubber wouldn't recognize (e.g. printed via a
custom format). Entries matching this check should be excluded from
the snapshot entirely rather than scrubbed.
"""
if not content:
return False
lower = content.lower()
return any(marker in lower for marker in _SANDBOX_TOOL_MARKERS)
def scrub_memory_entry(entry: dict[str, Any]) -> dict[str, Any] | None:
"""Scrub a single memory entry for snapshot inclusion.
Returns a new dict with secrets redacted, or None if the entry must be
excluded entirely (sandbox-sourced content).
The input dict is treated as read-only — callers should use the returned
value and not mutate the original.
"""
content = entry.get("content", "")
if is_sandbox_content(content):
return None
scrubbed = dict(entry)
scrubbed["content"] = scrub_content(content)
return scrubbed
def scrub_snapshot(snapshot: dict[str, Any]) -> dict[str, Any]:
"""Scrub a full snapshot payload before serialization.
Walks the `memories` list, scrubs each entry's content, and drops
sandbox-sourced entries. Other snapshot fields (workspace metadata,
config, etc.) pass through unchanged — they are not expected to contain
user-supplied secret-bearing content.
Returns a new dict; the input is not mutated.
"""
out = dict(snapshot)
memories = snapshot.get("memories") or []
scrubbed_list = []
for entry in memories:
cleaned = scrub_memory_entry(entry)
if cleaned is not None:
scrubbed_list.append(cleaned)
out["memories"] = scrubbed_list
return out
@@ -1,819 +0,0 @@
"""Workspace runtime entry point.
Loads config -> discovers adapter -> setup -> create executor -> wrap in A2A -> register -> heartbeat.
"""
import asyncio
import json
import os
import socket
import httpx
import uvicorn
# KI-009 a2a-sdk v1 migration: A2AStarletteApplication removed; use Starlette route factory
from a2a.server.routes import create_agent_card_routes, create_jsonrpc_routes
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.tasks import InMemoryTaskStore
from a2a.types import AgentCard, AgentCapabilities, AgentSkill, AgentInterface
from starlette.applications import Starlette
from adapters import get_adapter, AdapterConfig
from agents_md import generate_agents_md
from config import load_config
from heartbeat import HeartbeatLoop
from preflight import run_preflight, render_preflight_report
from builtin_tools.awareness_client import get_awareness_config
import uuid as _uuid
from builtin_tools.telemetry import setup_telemetry, make_trace_middleware
from policies.namespaces import resolve_awareness_namespace
from initial_prompt import (
mark_initial_prompt_attempted,
resolve_initial_prompt_marker,
)
from platform_auth import auth_headers, self_source_headers
def get_machine_ip() -> str: # pragma: no cover
"""Get the machine's IP for A2A discovery."""
try:
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 80))
ip = s.getsockname()[0]
s.close()
return ip
except Exception:
return "127.0.0.1"
def _check_delegation_results_pending() -> bool:
"""Check if there are unconsumed delegation results waiting.
Reads ``DELEGATION_RESULTS_FILE``. Returns ``True`` if the file exists
and contains any non-whitespace content, meaning the idle loop should
skip this tick. Returns ``False`` if the file is absent, empty, or
whitespace-only.
The extracted form lets unit tests call this directly rather than mirroring
the logic (anti-pattern flagged as #401).
"""
from heartbeat import DELEGATION_RESULTS_FILE
try:
with open(DELEGATION_RESULTS_FILE) as rf:
rf.seek(0)
return bool(rf.read().strip())
except FileNotFoundError:
return False
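The three cases the docstring names (absent, whitespace-only, non-empty) can be checked with a temporary file. This sketch takes the path as a parameter instead of importing `DELEGATION_RESULTS_FILE`; the file name and the helper `demo` are illustrative assumptions.

```python
from __future__ import annotations

import os
import tempfile


def results_pending(path: str) -> bool:
    try:
        with open(path) as rf:
            return bool(rf.read().strip())
    except FileNotFoundError:
        return False


def demo() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "delegation_results.jsonl")  # hypothetical name
        absent = results_pending(path)
        with open(path, "w") as f:
            f.write("  \n\t\n")
        blank = results_pending(path)
        with open(path, "w") as f:
            f.write('{"task": "t1"}\n')
        filled = results_pending(path)
    return absent, blank, filled


if __name__ == "__main__":
    print(demo())
```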
# Re-exported from transcript_auth for the inline /transcript handler.
# Separate module keeps the security-critical gate import-light + unit-testable.
from transcript_auth import transcript_authorized as _transcript_authorized
async def main(): # pragma: no cover
workspace_id = os.environ.get("WORKSPACE_ID", "")
if not workspace_id:
raise SystemExit("FATAL: WORKSPACE_ID env var is not set. Aborting.")
config_path = os.environ.get("WORKSPACE_CONFIG_PATH", "/configs")
# Docker-aware default — host.docker.internal resolves the platform service
# from inside the Docker network mesh; falls back to localhost for local dev.
if os.path.exists("/.dockerenv") or os.environ.get("DOCKER_VERSION"):
platform_url = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
else:
platform_url = os.environ.get("PLATFORM_URL", "http://localhost:8080")
awareness_config = get_awareness_config()
# 0. Initialise OpenTelemetry (no-op if packages not installed)
setup_telemetry(service_name=workspace_id)
# 0a. Fix /workspace perms before any agent code runs. Docker ships
# named volumes as root:root 755 — without this the non-root agent
# user can't write files the user asked it to produce, and the
# "agent → file → user downloads" flow dead-ends at a bash "permission
# denied". Best-effort: no-ops silently if molecule-runtime itself
# isn't root (template's own start.sh should have handled it there).
from executor_helpers import ensure_workspace_writable
ensure_workspace_writable()
# 1. Load config
config = load_config(config_path)
port = config.a2a.port
preflight = run_preflight(config, config_path)
render_preflight_report(preflight)
# 1a. Generate AGENTS.md so peer agents and discovery tools can see this
# workspace's identity, role, endpoint, and capabilities immediately.
try:
generate_agents_md(config_path, "/workspace/AGENTS.md")
except Exception as _agents_md_err: # pragma: no cover
print(f"Warning: AGENTS.md generation failed (non-fatal): {_agents_md_err}")
if not preflight.ok:
raise SystemExit(1)
if awareness_config:
awareness_namespace = resolve_awareness_namespace(
workspace_id,
awareness_config.get("namespace", ""),
)
print(f"Awareness enabled for namespace: {awareness_namespace}")
# 1.5 Initialise governance adapter (no-op if disabled or package absent)
from builtin_tools.governance import initialize_governance
if config.governance.enabled:
await initialize_governance(config.governance)
print(f"Governance: Microsoft Agent Governance Toolkit enabled (mode={config.governance.policy_mode})")
else:
print("Governance: disabled (set governance.enabled: true in config.yaml to activate)")
# 2. Create heartbeat (passed to adapter for task tracking).
# interval is sourced from observability.heartbeat_interval_seconds
# in config.yaml — clamped to [5, 300] at parse time. Operators
# who want a faster crash-detection signal lower it; ones who want
# to reduce platform write load raise it.
heartbeat = HeartbeatLoop(
platform_url,
workspace_id,
interval_seconds=config.observability.heartbeat_interval_seconds,
)
# 3. Get adapter for this runtime
runtime = config.runtime or "langgraph"
adapter_cls = get_adapter(runtime) # Raises KeyError if unknown — no silent fallback
adapter = adapter_cls()
print(f"Runtime: {runtime} ({adapter.display_name()})")
# 3a. Wire pluggable event-log backend from config.observability.event_log.
# Default config.yaml sets backend=memory; operators set "disabled" to
# opt out without removing append-call sites from adapter code.
from event_log import create_event_log
adapter.event_log = create_event_log(
backend=config.observability.event_log.backend,
ttl_seconds=config.observability.event_log.ttl_seconds,
max_entries=config.observability.event_log.max_entries,
)
# 4. Build adapter config
adapter_config = AdapterConfig(
model=config.model,
system_prompt=None, # Adapter builds its own prompt
tools=config.skills, # Skill names from config.yaml
runtime_config=vars(config.runtime_config) if config.runtime_config else {},
config_path=config_path,
workspace_id=workspace_id,
prompt_files=config.prompt_files,
a2a_port=port,
heartbeat=heartbeat,
)
# 5. Build the AgentCard *before* adapter.setup() so /.well-known/agent-card.json
# is reachable as soon as uvicorn binds, regardless of whether the adapter
# has working LLM credentials. Decoupling readiness ("is the workspace up?")
# from configuration ("can it actually answer?") means a workspace with a
# missing/rotated key stays REACHABLE — canvas can render a clear
# "agent not configured" error instead of "stuck booting forever," and
# operators can deprovision/redeploy normally. Skills built from
# config.skills (static names from config.yaml) up front; richer metadata
# from the adapter's loaded_skills swaps in below if setup() succeeds.
machine_ip = os.environ.get("HOSTNAME", get_machine_ip())
workspace_url = f"http://{machine_ip}:{port}"
# v1: AgentCard.url removed; put url+protocol in supported_interfaces instead.
# v1: AgentCapabilities.inputModes/outputModes removed; move to AgentCard.default_*.
# v1: pushNotifications → push_notifications (Pydantic field name)
#
# AgentCard's protocol message uses `supported_interfaces` (plural,
# interfaces — see a2a-sdk types/a2a_pb2.pyi:189). The 0.3.x→1.0
# migration in #1974 originally used `supported_protocols`, which
# the protobuf doesn't expose at all — every workspace boot since
# then crashed with `ValueError: Protocol message AgentCard has no
# "supported_protocols" field`. The crash didn't surface in the
# publish-runtime smoke because the smoke only IMPORTS
# molecule_runtime.main, never CALLS the AgentCard constructor.
# Don't rename back.
agent_card = AgentCard(
name=config.name,
description=config.description or config.name,
version=config.version,
supported_interfaces=[
AgentInterface(protocol_binding="https://a2a.g/v1", url=workspace_url)
],
capabilities=AgentCapabilities(
streaming=config.a2a.streaming,
push_notifications=config.a2a.push_notifications,
# Note: state_transition_history (a 0.x capability flag) was
# removed in a2a-sdk 1.0. Per the SDK's own
# a2a/compat/v0_3/conversions.py: "No longer supported in
# v1.0". The capability is now universal — Task.history is
# always available and tasks/get accepts historyLength via
# apply_history_length(). Don't add this kwarg back.
),
# Static skill stubs from config.yaml; replaced with rich metadata
# below if adapter.setup() loads skills successfully.
skills=[
AgentSkill(id=name, name=name, description=name, tags=[], examples=[])
for name in (config.skills or [])
],
default_input_modes=["text/plain", "application/json"],
default_output_modes=["text/plain", "application/json"],
)
# 6. Setup adapter and create executor
# On failure: log + continue. The card route stays mounted (above);
# the JSON-RPC route below returns -32603 "agent not configured" until
# the operator fixes credentials and redeploys. Heartbeat keeps running
# so the platform sees the workspace as reachable-but-misconfigured
# rather than crash-looping.
adapter_ready = False
adapter_error: str | None = None
executor = None
try:
await adapter.setup(adapter_config)
executor = await adapter.create_executor(adapter_config)
# 6a. Boot-smoke short-circuit (issue #2275): if MOLECULE_SMOKE_MODE
# is set, exercise the executor's full import tree by calling
# execute() once with stub deps + a short timeout. Skips platform
# registration + uvicorn entirely. Returns process exit code.
from smoke_mode import is_smoke_mode, run_executor_smoke
if is_smoke_mode():
exit_code = await run_executor_smoke(executor)
if hasattr(heartbeat, "stop"):
try:
await heartbeat.stop()
except Exception: # noqa: BLE001
pass
raise SystemExit(exit_code)
# 6b. Restore from pre-stop snapshot if one exists (GH#1391).
# The snapshot is scrubbed before being written, so secrets are
# already redacted — restore_state must not re-expose them.
from lib.pre_stop import read_snapshot
snapshot = read_snapshot()
if snapshot:
try:
adapter.restore_state(snapshot)
print(
f"Pre-stop snapshot restored: task={snapshot.get('current_task', '')!r}, "
f"uptime={snapshot.get('uptime_seconds', 0)}s"
)
except Exception as restore_err:
print(f"Warning: snapshot restore failed (continuing): {restore_err}")
# 6c. Swap rich skill metadata into the card now that setup() loaded
# them. In-place mutation: a2a-sdk's create_agent_card_routes serialises
# the card on each request, so the route mounted below sees the update.
# Isolated via card_helpers.enrich_card_skills — a malformed
# loaded_skills shape (e.g., a future adapter that doesn't follow
# the .metadata convention) is logged + swallowed instead of
# propagating up to the outer except, where it would silently
# degrade an OK boot to the not-configured state.
from card_helpers import enrich_card_skills
enrich_card_skills(agent_card, getattr(adapter, "loaded_skills", None))
adapter_ready = True
except SystemExit:
# Smoke-mode exit signal — propagate untouched.
raise
except Exception as setup_err: # noqa: BLE001
adapter_error = f"{type(setup_err).__name__}: {setup_err}"
print(
f"WARNING: adapter.setup() failed — workspace will serve agent-card "
f"but JSON-RPC will return -32603 until configuration is fixed. "
f"Reason: {adapter_error}",
flush=True,
)
# Heartbeat keeps running so the platform marks the workspace as
# reachable-but-misconfigured. Operators can then redeploy with the
# correct env vars without having to chase a crash-loop.
# 6.5. Initialise Temporal durable execution wrapper (optional). Only
# meaningful when an executor exists; skipped on misconfigured boots.
if adapter_ready:
from builtin_tools.temporal_workflow import create_wrapper as _create_temporal_wrapper
temporal_wrapper = _create_temporal_wrapper()
await temporal_wrapper.start()
# 7. Wrap in A2A.
#
# Route assembly is in workspace/boot_routes.py so the contract —
# card always mounted, JSON-RPC route swaps based on adapter state
# (DefaultRequestHandler when executor is non-None, not_configured
# handler returning -32603 otherwise) — is unit-testable with
# Starlette's TestClient. main.py is `# pragma: no cover` so without
# this extraction a future refactor that re-coupled card + setup()
# would silently bypass PR #2756. tests/test_boot_routes.py pins
# the four-branch contract.
from boot_routes import build_routes
app = Starlette(routes=build_routes(agent_card, executor, adapter_error))
# 8. Register with platform
# When adapter.setup() failed, advertise via configuration_status so
# the platform/canvas can render "configured: false, reason: …" instead
# of a confused "ready but silent" state.
loaded_skills = getattr(adapter, "loaded_skills", None) or []
agent_card_dict = {
"name": config.name,
"description": config.description,
"version": config.version,
"url": workspace_url,
"skills": [
{
"id": s.metadata.id,
"name": s.metadata.name,
"description": s.metadata.description,
"tags": s.metadata.tags,
}
for s in loaded_skills
] if adapter_ready else [
{"id": n, "name": n, "description": n, "tags": []}
for n in (config.skills or [])
],
"capabilities": {
"streaming": config.a2a.streaming,
"pushNotifications": config.a2a.push_notifications,
},
"configuration_status": "ready" if adapter_ready else "not_configured",
**({"configuration_error": adapter_error} if adapter_error else {}),
}
async with httpx.AsyncClient(timeout=10.0) as client:
try:
resp = await client.post(
f"{platform_url}/registry/register",
json={
"id": workspace_id,
"url": workspace_url,
"agent_card": agent_card_dict,
},
headers=auth_headers(),
)
print(f"Registered with platform: {resp.status_code}")
# Phase 30.1 — capture the auth token issued at first register.
# The platform only mints one on first register per workspace,
# so a subsequent restart gets an empty auth_token and we
# keep using the on-disk copy from the original issuance.
if resp.status_code == 200:
try:
body = resp.json()
tok = body.get("auth_token")
if tok:
from platform_auth import save_token
save_token(tok)
print(f"Saved workspace auth token (prefix={tok[:8]}…)")
# RFC #2312 PR-F: persist platform_inbound_secret if the
# platform supplied one. Idempotent — writing the same
# value over an existing file is harmless. Required for
# SaaS where there's no persistent /configs volume; on
# Docker mode it overwrites the value the provisioner
# already wrote at workspace creation.
inbound = body.get("platform_inbound_secret")
if inbound:
from platform_inbound_auth import save_inbound_secret
save_inbound_secret(inbound)
print(f"Saved platform_inbound_secret (prefix={inbound[:8]}…)")
except Exception as parse_exc:
print(f"Warning: couldn't parse register response for token: {parse_exc}")
except Exception as e:
print(f"Warning: failed to register with platform: {e}")
# 9. Start heartbeat
heartbeat.start()
# 9b. Start skills hot-reload watcher (background task)
# When a skill file changes the watcher reloads the skill module and calls
# back into the adapter so the next A2A request uses the updated tools.
# Skipped on misconfigured boots — adapter has no executor / tool registry
# to swap into, so the agent rebuild path would hit an AttributeError on the None executor.
if adapter_ready and config.skills:
try:
from skill_loader.watcher import SkillsWatcher
def _on_skill_reload(updated_skill):
"""Rebuild the LangGraph agent when a skill changes in-place."""
if not hasattr(adapter, "loaded_skills"):
return
# Replace the matching skill in the adapter's skill list
adapter.loaded_skills = [
updated_skill if s.metadata.id == updated_skill.metadata.id else s
for s in adapter.loaded_skills
]
# Rebuild the agent's tool list from updated skills
if hasattr(adapter, "all_tools") and hasattr(adapter, "system_prompt"):
from builtin_tools.approval import request_approval
from builtin_tools.delegation import delegate_task, delegate_task_async, check_task_status
from builtin_tools.memory import commit_memory, recall_memory
from builtin_tools.sandbox import run_code
# Core platform tools mirror adapter_base.all_tools — must
# match the platform_tools registry names so docs and tools
# never drift.
base_tools = [
delegate_task, delegate_task_async, check_task_status,
request_approval, commit_memory, recall_memory, run_code,
]
skill_tools = []
for sk in adapter.loaded_skills:
skill_tools.extend(sk.tools)
adapter.all_tools = base_tools + skill_tools
# Rebuild compiled agent so next ainvoke picks up new tools
try:
from agent import create_agent
new_agent = create_agent(
config.model, adapter.all_tools, adapter.system_prompt
)
executor.agent = new_agent
print(f"Skills hot-reload: '{updated_skill.metadata.id}' reloaded — "
f"{len(updated_skill.tools)} tool(s)")
except Exception as rebuild_err:
print(f"Skills hot-reload: agent rebuild failed: {rebuild_err}")
skills_watcher = SkillsWatcher(
config_path=config_path,
skill_names=config.skills,
on_reload=_on_skill_reload,
current_runtime=runtime,
)
asyncio.create_task(skills_watcher.start())
print(f"Skills hot-reload enabled for: {config.skills}")
except Exception as e:
print(f"Warning: skills watcher could not start: {e}")
# 10. Run A2A server
print(f"Workspace {workspace_id} starting on port {port}")
# Wrap the ASGI app with W3C TraceContext extraction middleware so incoming
# A2A HTTP requests propagate their trace context into _incoming_trace_context.
# v1: Starlette app is constructed directly; no build() step needed
starlette_app = app
# Add /transcript route — exposes the most-recent agent session log
# (claude-code reads ~/.claude/projects/<cwd>/<session>.jsonl). Other
# runtimes return supported:false.
from starlette.responses import JSONResponse
from starlette.routing import Route
async def _transcript_handler(request):
# Require workspace bearer token — the same token issued at registration
# and stored in /configs/.auth_token. Any container on molecule-core-net
# could otherwise read the full session log. Closes #287.
#
# #328: fail CLOSED when the token file is unavailable. get_token()
# returns None during the bootstrap window (first register hasn't
# completed), if /configs/.auth_token was deleted, or on OSError.
# The old `if expected:` guard treated all three cases as "skip
# auth" — an unauthenticated container on the same Docker network
# could read the entire session log during that window. Deny
# instead. The platform's TranscriptHandler acquires the token
# during registration, so once the bootstrap completes it always
# has a valid credential to present.
from platform_auth import get_token
if not _transcript_authorized(get_token(), request.headers.get("Authorization", "")):
return JSONResponse({"error": "unauthorized"}, status_code=401)
try:
since = int(request.query_params.get("since", "0"))
limit = int(request.query_params.get("limit", "100"))
except (TypeError, ValueError):
return JSONResponse({"error": "since and limit must be integers"}, status_code=400)
# Isolate adapter call: misconfigured boots leave the adapter
# partially-initialised, and a future adapter override of
# transcript_lines might assume setup() ran. Surface a 503 with
# a clear reason instead of letting the exception propagate to
# Starlette's 500 handler — same pattern as the not-configured
# JSON-RPC route (PR #2756). BaseAdapter.transcript_lines's
# default returns {"supported": false} so today's 4 adapters
# never trigger this branch; this is the safety net.
try:
result = await adapter.transcript_lines(since=since, limit=limit)
except Exception as transcript_err: # noqa: BLE001
return JSONResponse(
{
"error": "transcript unavailable",
"detail": f"{type(transcript_err).__name__}: {transcript_err}",
},
status_code=503,
)
return JSONResponse(result)
starlette_app.add_route("/transcript", _transcript_handler, methods=["GET"])
# /internal/* — platform→workspace forward calls (RFC #2312). Auth
# is the per-workspace platform_inbound_secret in
# /configs/.platform_inbound_secret, distinct from the outbound
# workspace_auth_token used by /transcript above.
from internal_chat_uploads import ingest_handler as _internal_chat_uploads_ingest
starlette_app.add_route(
"/internal/chat/uploads/ingest",
_internal_chat_uploads_ingest,
methods=["POST"],
)
from internal_file_read import file_read_handler as _internal_file_read
starlette_app.add_route(
"/internal/file/read",
_internal_file_read,
methods=["GET"],
)
built_app = make_trace_middleware(starlette_app)
# uvicorn expects the level name in lowercase ("debug" / "info" /
# "warning" / "error" / "critical"). config.observability.log_level
# is uppercased at parse time (config.py.load_config) for the
# Python ``logging`` module's convention; lower it here so both
# consumers get the form they expect from one source of truth.
# An ``LOG_LEVEL`` env var still wins as an ops-side debugging
# override — set it on the workspace process to bypass YAML
# without a config edit + restart cycle.
uvicorn_log_level = os.environ.get("LOG_LEVEL", config.observability.log_level).lower()
server_config = uvicorn.Config(
built_app,
host="0.0.0.0",
port=port,
log_level=uvicorn_log_level,
)
server = uvicorn.Server(server_config)
# 10b. Schedule initial_prompt self-message after server is ready.
# Only runs on first boot — creates a marker file to prevent re-execution on restart.
# Skipped on misconfigured boots: the self-message would route through the
# platform back to /, hit the -32603 not-configured handler, and consume
# the marker for a fire that can't actually run. Wait until the operator
# fixes credentials and the workspace redeploys with adapter_ready=True.
initial_prompt_task = None
initial_prompt_marker = resolve_initial_prompt_marker(config_path)
if adapter_ready and config.initial_prompt and not os.path.exists(initial_prompt_marker):
# Write the marker UP FRONT (#71): if the prompt later crashes or
# times out, we do NOT replay on next boot — that created a
# ProcessError cascade where every message kept crashing. Operators
# can always re-send via chat. Log loudly if the marker write
# fails so the situation is visible.
if not mark_initial_prompt_attempted(initial_prompt_marker):
print(
f"Initial prompt: WARNING — could not write marker at "
f"{initial_prompt_marker}; this boot may replay if it crashes.",
flush=True,
)
async def _send_initial_prompt():
"""Wait for server to be ready, then send initial_prompt as self-message."""
# Wait for the A2A server to accept connections.
# Use the SDK's own constant for the well-known path so this
# probe and the route mounted by create_agent_card_routes()
# never drift apart. Pre-fix this hardcoded the pre-1.x
# well-known path string; a2a-sdk 1.x renamed it (the
# canonical value lives in a2a.utils.constants now), so
# the probe got 404 every attempt and fell through to
# "server not ready after 30s, skipping" even though the
# server was actually serving fine. Net effect: every
# workspace silently dropped its `initial_prompt`.
from a2a.utils.constants import AGENT_CARD_WELL_KNOWN_PATH
ready = False
for attempt in range(30):
await asyncio.sleep(1)
try:
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(f"http://127.0.0.1:{port}{AGENT_CARD_WELL_KNOWN_PATH}")
if resp.status_code == 200:
ready = True
break
except Exception:
continue
if not ready:
print("Initial prompt: server not ready after 30s, skipping", flush=True)
return
# Send initial prompt through the platform A2A proxy (not directly to self).
# The proxy logs an a2a_receive with source_id=NULL (canvas-style),
# broadcasts A2A_RESPONSE via WebSocket so the chat shows both the
# prompt (as user message) and the response (as agent message).
# Uses urllib in a thread to avoid asyncio/httpx streaming hangs.
import json as _json
import urllib.request
def _do_send_sync():
import time as _time
payload = _json.dumps({
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": f"initial-{_uuid.uuid4().hex[:8]}",
"parts": [{"kind": "text", "text": config.initial_prompt}],
},
},
}).encode()
# #220: include platform bearer token so the request isn't
# silently rejected once any workspace has a live token on
# file. Without this, initial_prompt 401s in multi-tenant
# mode exactly like /registry/register did in #215.
# X-Workspace-ID via self_source_headers() so the platform
# tags the row source=agent — without it the canvas's
# My Chat tab renders the initial_prompt as if the user
# had typed it. See platform_auth.py for the full
# explanation.
headers = {
"Content-Type": "application/json",
**self_source_headers(workspace_id),
}
# Retry with backoff — the platform proxy may not be able to
# reach us yet (container networking takes a moment to settle).
max_retries = 5
for attempt in range(max_retries):
try:
req = urllib.request.Request(
f"{platform_url}/workspaces/{workspace_id}/a2a",
data=payload,
headers=headers,
)
with urllib.request.urlopen(req, timeout=600) as resp:
resp.read()
print(f"Initial prompt: completed (status={resp.status})", flush=True)
break
except Exception as e:
if attempt < max_retries - 1:
delay = 2 ** attempt # 1, 2, 4, 8 seconds (the final attempt does not sleep)
print(f"Initial prompt: attempt {attempt + 1} failed ({e}), retrying in {delay}s...", flush=True)
_time.sleep(delay)
else:
print(f"Initial prompt: failed after {max_retries} attempts — {e}", flush=True)
return
# Marker was already written up front (#71). Nothing to do here.
print("Initial prompt: sending via platform proxy...", flush=True)
loop = asyncio.get_running_loop()
loop.run_in_executor(None, _do_send_sync)
initial_prompt_task = asyncio.create_task(_send_initial_prompt())
# 10c. Idle loop — reflection-on-completion / backlog-pull pattern.
# Fires config.idle_prompt every config.idle_interval_seconds while the
# workspace has no active task. This turns every role from "waits for cron"
# into "self-wakes when idle" — the Hermes/Letta shape from today's
# multi-framework survey (see docs/ecosystem-watch.md). Cost collapses to
# event-driven in practice: the idle check is local (no LLM call, just
# heartbeat.active_tasks==0), and the prompt only fires when there's
# actually nothing to do. Gated on idle_prompt being non-empty so existing
# workspaces upgrade opt-in — set idle_prompt in org.yaml defaults or
# per-workspace to enable.
idle_loop_task = None
# Skipped on misconfigured boots — the self-fire would route to the
# -32603 handler in a tight loop and consume cycles for no useful work.
if adapter_ready and config.idle_prompt:
# Idle-fire HTTP timeout. Kept tight relative to the fire cadence so a
# hung platform doesn't accumulate dangling requests — a fire that
# takes longer than the idle interval itself is almost certainly stuck.
IDLE_FIRE_TIMEOUT_SECONDS = max(60, min(300, config.idle_interval_seconds))
# Initial settle delay: the configured interval, capped at 60s, so a long
# interval doesn't stall the first fire. The loop then sleeps one full
# interval before its first idle check, so short intervals don't fire
# instantly on boot either.
IDLE_INITIAL_SETTLE_SECONDS = min(config.idle_interval_seconds, 60)
async def _run_idle_loop():
"""Self-sends config.idle_prompt periodically when the workspace is idle."""
await asyncio.sleep(IDLE_INITIAL_SETTLE_SECONDS)
import json as _json
from urllib import request as _urlreq, error as _urlerr
while True:
try:
await asyncio.sleep(config.idle_interval_seconds)
except asyncio.CancelledError:
return
# Local idle check — no platform API call, no LLM call.
# heartbeat.active_tasks == 0 means no in-flight work.
if heartbeat.active_tasks > 0:
continue
# Issue #381 fix: skip the idle prompt if there are unconsumed
# delegation results waiting. The heartbeat sends a self-message
# for every new result batch, so sending the idle prompt here would
# race: the agent would compose a stale tick BEFORE processing the
# results notification, producing repeated identical asks (peer sends
# correction, we respond with stale state, peer asks again).
# By skipping the idle prompt when results are pending, we let the
# heartbeat's own self-message wake the agent after results are
# written. The agent then sees the results in _prepare_prompt()
# and processes them before composing.
# Guard logic extracted to _check_delegation_results_pending() for
# direct unit-testing (#401 follow-up).
if _check_delegation_results_pending():
print(
"Idle loop: skipping — unconsumed delegation results pending "
"(heartbeat will notify agent)",
flush=True,
)
continue
# Self-post the idle prompt via the platform A2A proxy (same
# path as initial_prompt). The agent's own concurrency control
# rejects if the workspace becomes busy between this check and
# the post — that's the expected safety valve.
payload = _json.dumps({
"method": "message/send",
"params": {
"message": {
"role": "user",
"messageId": f"idle-{_uuid.uuid4().hex[:8]}",
"parts": [{"kind": "text", "text": config.idle_prompt}],
},
},
}).encode()
def _post_sync():
# Returns (status_code, error_type) so the caller logs the
# actual outcome instead of a bare "post failed" line.
# #220: include auth_headers() on every idle fire. Without
# this, the idle loop 401s in multi-tenant mode.
# self_source_headers() adds X-Workspace-ID so the
# platform classifies the idle fire as source=agent
# rather than user-typed canvas input.
headers = {
"Content-Type": "application/json",
**self_source_headers(workspace_id),
}
try:
req = _urlreq.Request(
f"{platform_url}/workspaces/{workspace_id}/a2a",
data=payload,
headers=headers,
)
with _urlreq.urlopen(req, timeout=IDLE_FIRE_TIMEOUT_SECONDS) as resp:
resp.read()
return resp.status, None
except _urlerr.HTTPError as e:
return e.code, type(e).__name__
except _urlerr.URLError as e:
return None, f"URLError: {e.reason}"
except Exception as e: # pragma: no cover — catch-all safety net
return None, type(e).__name__
print(
f"Idle loop: firing (active_tasks=0, interval={config.idle_interval_seconds}s, "
f"timeout={IDLE_FIRE_TIMEOUT_SECONDS}s)",
flush=True,
)
loop_ref = asyncio.get_running_loop()
def _log_result(future):
try:
status, err = future.result()
if err:
print(
f"Idle loop: post failed — status={status} err={err}",
flush=True,
)
else:
print(f"Idle loop: post ok status={status}", flush=True)
except Exception as e: # pragma: no cover
print(f"Idle loop: executor callback crashed — {e}", flush=True)
fut = loop_ref.run_in_executor(None, _post_sync)
fut.add_done_callback(_log_result)
idle_loop_task = asyncio.create_task(_run_idle_loop())
try:
await server.serve()
finally:
# 10d. Pre-stop serialization — GH#1391.
# Capture in-memory state before the container exits so it survives
# intentional pause and unplanned restart. All content is scrubbed
# via lib.snapshot_scrub before being written to the config volume.
try:
from lib.pre_stop import build_snapshot, write_snapshot
adapter_state = adapter.pre_stop_state() if adapter else {}
snapshot = build_snapshot(heartbeat, adapter_state)
write_snapshot(snapshot)
except Exception as pre_stop_err:
print(f"Warning: pre-stop serialization failed (continuing): {pre_stop_err}")
# Cancel initial prompt if still running
if initial_prompt_task and not initial_prompt_task.done():
initial_prompt_task.cancel()
# Cancel idle loop if running
if idle_loop_task and not idle_loop_task.done():
idle_loop_task.cancel()
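# Gracefully stop the Temporal worker background task on shutdown.
# temporal_wrapper is only created in step 6.5 when adapter_ready is true,
# so guard the stop call on misconfigured boots.
if adapter_ready:
await temporal_wrapper.stop()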
def main_sync(): # pragma: no cover
"""Synchronous entry point for the `molecule-runtime` console script.
Declared in scripts/build_runtime_package.py as the wheel's entry-point
target (`molecule-runtime = "molecule_runtime.main:main_sync"`). Removed
silently during the pre-monorepo consolidation, which broke every
workspace startup against 0.1.16/0.1.17/0.1.18 with `ImportError:
cannot import name 'main_sync'`. The .github/workflows/runtime-pin-compat.yml
smoke step is the regression gate.
"""
asyncio.run(main())
if __name__ == "__main__": # pragma: no cover
main_sync()
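
The timing rules used by the idle loop and the initial-prompt retry in the file above can be checked in isolation. The following is a minimal sketch, not code from the runtime: the helper names `idle_fire_timeout`, `idle_initial_settle`, and `retry_delays` are hypothetical and only restate the inline expressions (`max(60, min(300, ...))`, `min(..., 60)`, and `2 ** attempt`) as small functions.

```python
# Illustrative sketch only: these helper names are hypothetical and do not
# exist in the runtime. They restate the inline expressions from main.py.


def idle_fire_timeout(interval_seconds: int) -> int:
    """HTTP timeout for one idle fire: the interval clamped to [60, 300]."""
    return max(60, min(300, interval_seconds))


def idle_initial_settle(interval_seconds: int) -> int:
    """Delay before the idle loop starts: the interval, capped at 60s."""
    return min(interval_seconds, 60)


def retry_delays(max_retries: int) -> list[int]:
    """Sleeps taken between attempts: 2 ** attempt, none after the last one."""
    return [2 ** attempt for attempt in range(max_retries - 1)]


if __name__ == "__main__":
    for interval in (10, 120, 900):
        print(interval, idle_fire_timeout(interval), idle_initial_settle(interval))
    print(retry_delays(5))
```

With `max_retries = 5` there are five attempts but only four sleeps, because the last failed attempt logs and returns instead of sleeping.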
-220
@@ -1,220 +0,0 @@
"""Console-script entry point for the ``molecule-mcp`` universal MCP server.
Validates required environment BEFORE importing the heavy
``a2a_mcp_server`` module — that module triggers a ``RuntimeError`` at
import time when ``WORKSPACE_ID`` is unset (a2a_client.py:22), and
console-script entry-point shims surface it as an ugly traceback. This
wrapper catches the missing-env case early and prints actionable help
to stderr so an operator running ``molecule-mcp`` for the first time
gets the right pointer in the first 3 lines of output instead of a
20-line traceback.
Standalone-runtime contract: this wrapper is responsible for keeping
the workspace ALIVE on the platform side, not just exposing tools.
Concretely it:
1. Calls ``POST /registry/register`` once at startup (idempotent —
the upsert flips status awaiting_agent → online for an external
workspace whose token matches).
2. Spawns a daemon heartbeat thread that POSTs to
``POST /registry/heartbeat`` every 20s. Without continuous
heartbeats the platform's healthsweep flips the workspace back
to awaiting_agent (visible as OFFLINE in the canvas with a
"Restart" CTA) within 60-90s.
3. Runs the MCP stdio loop in the foreground.
Why threads + sync requests: the MCP stdio server is async. The
heartbeat work is fire-and-forget HTTP. A daemon thread is the
lowest-friction integration — no asyncio bridging, dies automatically
when the main process exits, and ``requests`` is already a transitive
dependency via ``a2a-sdk``.
In-container usage (``python -m molecule_runtime.a2a_mcp_server`` or
direct import) bypasses this wrapper — the workspace runtime has its
own heartbeat loop in ``heartbeat.py`` so we don't double-heartbeat.
Module layout (RFC #2873 iter 3 split):
* ``mcp_heartbeat`` — register POST + heartbeat loop + auth-failure
escalation + inbound-secret persistence.
* ``mcp_workspace_resolver`` — env validation, single + multi-workspace
resolution, operator-help printer, on-disk token-file read.
* ``mcp_inbox_pollers`` — activate the inbox singleton + spawn one
daemon poller per workspace.
This file keeps just ``main()`` plus thin re-exports of the private
symbols so existing tests' imports (``mcp_cli._build_agent_card``,
``mcp_cli._heartbeat_loop``, etc.) keep working without churn.
"""
from __future__ import annotations
import logging
import os
import sys
import configs_dir
import mcp_heartbeat
import mcp_inbox_pollers
import mcp_workspace_resolver
logger = logging.getLogger(__name__)
# Re-export public surface for back-compat with the pre-split callers
# and tests. The underscore-prefixed names mirror the names that
# existed in this module before the split — keeping them ensures
# `mcp_cli._build_agent_card`, `mcp_cli._heartbeat_loop`, etc.
# resolve identically to the new functions.
HEARTBEAT_INTERVAL_SECONDS = mcp_heartbeat.HEARTBEAT_INTERVAL_SECONDS
_HEARTBEAT_AUTH_LOUD_THRESHOLD = mcp_heartbeat.HEARTBEAT_AUTH_LOUD_THRESHOLD
_HEARTBEAT_AUTH_RELOG_INTERVAL = mcp_heartbeat.HEARTBEAT_AUTH_RELOG_INTERVAL
_build_agent_card = mcp_heartbeat.build_agent_card
_platform_register = mcp_heartbeat.platform_register
_heartbeat_loop = mcp_heartbeat.heartbeat_loop
_log_heartbeat_auth_failure = mcp_heartbeat.log_heartbeat_auth_failure
_persist_inbound_secret_from_heartbeat = mcp_heartbeat.persist_inbound_secret_from_heartbeat
_start_heartbeat_thread = mcp_heartbeat.start_heartbeat_thread
_resolve_workspaces = mcp_workspace_resolver.resolve_workspaces
_print_missing_env_help = mcp_workspace_resolver.print_missing_env_help
_read_token_file = mcp_workspace_resolver.read_token_file
_start_inbox_pollers = mcp_inbox_pollers.start_inbox_pollers
def main() -> None:
"""Entry point for the ``molecule-mcp`` console script.
Returns nothing — calls ``sys.exit`` on validation failure or on
normal completion of the underlying MCP server loop.
Two registration shapes:
* Single-workspace (legacy): ``WORKSPACE_ID`` + token env/file.
Unchanged behavior.
* Multi-workspace: ``MOLECULE_WORKSPACES`` JSON env var with N
``{"id": ..., "token": ...}`` entries. One register + heartbeat
+ inbox poller per entry; messages from any workspace land in
the same agent inbox tagged with ``arrival_workspace_id``.
Subcommand:
``molecule-mcp doctor`` runs an onboarding diagnostic against the
current shell environment + platform reachability and exits.
Closes Ryan's #2934 item 6.
"""
# Subcommand dispatch — must come BEFORE env-var validation so
# `molecule-mcp doctor` can run on a partially-configured shell
# and tell the operator what's missing. Argv shapes:
# molecule-mcp → run server (this function's main path)
# molecule-mcp doctor → run diagnostic, exit
# molecule-mcp --help → print usage, exit (no other
# flags are supported yet)
if len(sys.argv) > 1:
if sys.argv[1] in ("doctor", "--doctor"):
import mcp_doctor
sys.exit(mcp_doctor.run())
if sys.argv[1] in ("--help", "-h", "help"):
print(
"molecule-mcp — Molecule AI universal MCP server\n\n"
"Usage:\n"
" molecule-mcp Run the MCP stdio server (registers + heartbeats)\n"
" molecule-mcp doctor Run onboarding diagnostic + exit\n\n"
"Required env: PLATFORM_URL, WORKSPACE_ID (or MOLECULE_WORKSPACES),\n"
" MOLECULE_WORKSPACE_TOKEN (or MOLECULE_WORKSPACE_TOKEN_FILE)\n",
)
sys.exit(0)
if not os.environ.get("PLATFORM_URL", "").strip():
_print_missing_env_help(
["PLATFORM_URL"],
have_token_file=(configs_dir.resolve() / ".auth_token").is_file(),
)
sys.exit(2)
workspaces, errors = _resolve_workspaces()
if errors or not workspaces:
# Reuse the missing-env help printer for legacy WORKSPACE_ID +
# token shape, which is what most first-run operators hit. For
# MOLECULE_WORKSPACES errors, print directly so the JSON-shape
# message isn't mangled into the WORKSPACE_ID-style help.
if os.environ.get("MOLECULE_WORKSPACES", "").strip():
print("molecule-mcp: invalid MOLECULE_WORKSPACES:", file=sys.stderr)
for e in errors:
print(f" - {e}", file=sys.stderr)
else:
_print_missing_env_help(
errors or ["WORKSPACE_ID", "MOLECULE_WORKSPACE_TOKEN"],
have_token_file=(configs_dir.resolve() / ".auth_token").is_file(),
)
sys.exit(2)
platform_url = os.environ["PLATFORM_URL"].strip().rstrip("/")
# In multi-workspace mode the FIRST entry is treated as the
# "primary" — it gets exported to a2a_client.py's module-level
# WORKSPACE_ID (which gates a RuntimeError at import time) and is
# used by tools that don't yet take an explicit workspace_id. PR-2
# parameterizes those tools; for now this preserves existing
# outbound-tool behavior unchanged for single-workspace operators
# AND for the multi-workspace operator's first registered
# workspace.
primary_workspace_id, _primary_token = workspaces[0]
os.environ["WORKSPACE_ID"] = primary_workspace_id
# Configure logging so the operator sees register/heartbeat status
# without needing to set up logging themselves. WARNING by default
# keeps the steady-state quiet (only failures); MOLECULE_MCP_VERBOSE=1
# surfaces register-success + per-tick heartbeat info for debugging.
log_level = (
logging.INFO
if os.environ.get("MOLECULE_MCP_VERBOSE", "").strip()
else logging.WARNING
)
logging.basicConfig(level=log_level, format="[molecule-mcp] %(message)s")
# Populate the per-workspace token registry so heartbeat threads,
# the inbox poller, and (later) outbound tools resolve the right
# token for each workspace via ``platform_auth.auth_headers(wsid)``.
# Done BEFORE register/heartbeat thread spawn so a thread that
# races to fire its first request always sees its token.
try:
from platform_auth import register_workspace_token
for wsid, tok in workspaces:
register_workspace_token(wsid, tok)
except ImportError:
# Older installs that don't yet ship register_workspace_token —
# multi-workspace resolution silently degrades to the legacy
# single-token path; single-workspace operators see no change.
logger.debug("platform_auth.register_workspace_token unavailable; skipping registry populate")
# Standalone-mode register + heartbeat. Skipped via env var so an
# in-container caller (which has its own heartbeat loop) can reuse
# this entry point without double-heartbeating. The wheel's main
# console-script path always runs them; the
# MOLECULE_MCP_DISABLE_HEARTBEAT escape hatch exists for tests +
# the rare embedded use-case.
if not os.environ.get("MOLECULE_MCP_DISABLE_HEARTBEAT", "").strip():
for wsid, tok in workspaces:
_platform_register(platform_url, wsid, tok)
_start_heartbeat_thread(platform_url, wsid, tok)
# Inbox poller — the inbound side of the standalone path. Without
# this thread, the universal MCP server is OUTBOUND-ONLY: an agent
# can call delegate_task / send_message_to_user but never observe
# canvas-user or peer-agent messages. One poller per workspace; all
# of them write to the SAME shared inbox state so the agent's
# inbox_peek/pop/wait tools see a merged view (each message tagged
# with arrival_workspace_id so the agent can route the reply).
#
# Same disable pattern as heartbeat: in-container callers (with
# push delivery via canvas WebSocket) skip this to avoid duplicate
# delivery; tests use the env to keep imports cheap.
if not os.environ.get("MOLECULE_MCP_DISABLE_INBOX", "").strip():
_start_inbox_pollers(platform_url, [w[0] for w in workspaces])
# Env is valid — safe to import the heavy module now. Importing
# earlier would trigger a2a_client.py:22's module-level RuntimeError
# before our friendly help reaches the user.
from a2a_mcp_server import cli_main
cli_main()
if __name__ == "__main__": # pragma: no cover
main()
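The two `MOLECULE_MCP_DISABLE_*` guards above treat any non-blank value as "disabled". A minimal sketch of that rule follows; the helper name `flag_set` is hypothetical (the original inlines the test), and the dict argument exists only so the example runs without touching the real environment.

```python
import os


def flag_set(name: str, env=None) -> bool:
    """Return True when the variable holds any non-blank value.

    Hypothetical helper for illustration only; not part of the original module.
    """
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())


# Whitespace-only counts as unset. Note that "0" still disables, because the
# check looks at emptiness, not at what the value means.
print(flag_set("MOLECULE_MCP_DISABLE_HEARTBEAT", {"MOLECULE_MCP_DISABLE_HEARTBEAT": "  "}))
print(flag_set("MOLECULE_MCP_DISABLE_HEARTBEAT", {"MOLECULE_MCP_DISABLE_HEARTBEAT": "0"}))
```

Under this rule, `MOLECULE_MCP_DISABLE_INBOX=0` would skip the inbox pollers just as `=1` would, which is worth knowing when scripting test setups.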
-426
@@ -1,426 +0,0 @@
"""molecule-mcp doctor — diagnostic subcommand for first-run install.
Run via ``molecule-mcp doctor``. Prints a checklist of common
onboarding failure modes and concrete next-step suggestions for each
failed check.
Closes Ryan's #2934 item 6 ("Add a molecule-mcp doctor subcommand —
this single command would have saved me 30 of the 45 minutes").
Pairs with #2935 (Python>=3.11 callout, PATH guidance, TOKEN_FILE
support) — those fixed the snippet, this gives the operator a way to
self-diagnose when something still goes wrong.
Six checks, in operator-encounter order:
  1. Python version: wheel requires >=3.11 (pip says "no versions
     found" on older).
  2. Wheel install: molecule_runtime importable + version reported.
  3. PATH for molecule-mcp: pip user-site installs land at
     ~/Library/Python/3.X/bin, which isn't on PATH on a fresh macOS
     shell. Most common "claude mcp add can't find molecule-mcp"
     cause.
  4. Env vars: PLATFORM_URL set; WORKSPACE_ID (or MOLECULE_WORKSPACES)
     set; auth token resolvable (env or *_FILE or .auth_token).
  5. Platform health: GET ${PLATFORM_URL}/healthz returns 2xx.
     Catches DNS/firewall/wrong-scheme issues before the operator
     hits the real register call.
  6. Token auth: POST ${PLATFORM_URL}/registry/heartbeat with the
     resolved workspace_id+token returns 2xx. End-to-end auth
     verification. Uses heartbeat (idempotent timestamp update)
     instead of register (UPSERT, which would clobber agent_card
     metadata) so the doctor is safe to run against a live workspace.
Each check prints one of:
[OK] <one-line status>
[WARN] <one-line status> next: <fix suggestion>
[FAIL] <one-line status> next: <fix suggestion>
Exit 0 if all pass or only WARNs; exit 1 if any FAIL — so the
subcommand is scriptable from CI / install-checks too.
Out of scope for now (deferred follow-ups):
- Claude Code-specific checks (parse ~/.claude.json, verify each
MCP entry is plugin-sourced + dev-channels flag is set). That's
a separate Claude-Code-specific doctor and lives in the
claude-code-channel plugin, not the universal-MCP doctor.
- Automated remediation (running the suggested fix). Doctor is
a diagnostic tool — it tells the operator what's wrong + how
to fix it, doesn't apply changes.
"""
from __future__ import annotations
import importlib
import importlib.metadata
import os
import shutil
import sys
from typing import Optional
# urllib avoids a hard dep on a third-party HTTP client for the
# doctor: the real CLI pulls in httpx via mcp_heartbeat, but doctor
# should keep working even on a partial install where that dependency
# is missing (that itself is a finding worth surfacing).
from urllib import request as urllib_request
from urllib.error import URLError
# ANSI colors are friendly on TTYs; auto-disable on pipe / NO_COLOR
# for CI logs where the escape sequences clutter the diff.
def _color(name: str) -> str:
if not sys.stdout.isatty() or os.environ.get("NO_COLOR"):
return ""
return {
"green": "\033[32m",
"yellow": "\033[33m",
"red": "\033[31m",
"dim": "\033[2m",
"reset": "\033[0m",
}.get(name, "")
def _ok(label: str, msg: str) -> None:
print(f" {_color('green')}[OK]{_color('reset')} {label}: {msg}")
def _warn(label: str, msg: str, fix: str) -> None:
print(f" {_color('yellow')}[WARN]{_color('reset')} {label}: {msg}")
print(f" {_color('dim')}next:{_color('reset')} {fix}")
def _fail(label: str, msg: str, fix: str) -> None:
print(f" {_color('red')}[FAIL]{_color('reset')} {label}: {msg}")
print(f" {_color('dim')}next:{_color('reset')} {fix}")
# Each check returns a "ok" | "warn" | "fail" verdict so the caller
# can compute an exit code without re-walking the print stream.
Verdict = str # "ok" | "warn" | "fail"
def check_python_version() -> Verdict:
label = "Python version"
major, minor = sys.version_info[:2]
if (major, minor) >= (3, 11):
_ok(label, f"Python {major}.{minor} (wheel requires >=3.11)")
return "ok"
_fail(
label,
f"Python {major}.{minor} is below the wheel's >=3.11 floor",
"upgrade Python (brew install python@3.12 / apt install python3.12) "
"or run molecule-mcp via a 3.11+ venv.",
)
return "fail"
def check_wheel_install() -> Verdict:
label = "Wheel install"
try:
version = importlib.metadata.version("molecule-ai-workspace-runtime")
except importlib.metadata.PackageNotFoundError:
_fail(
label,
"molecule-ai-workspace-runtime not found in this interpreter's site-packages",
"pip install molecule-ai-workspace-runtime "
"(or pipx install molecule-ai-workspace-runtime to get the "
"binary on PATH automatically).",
)
return "fail"
try:
importlib.import_module("molecule_runtime.mcp_cli")
except ImportError as e:
_fail(
label,
f"package found ({version}) but `molecule_runtime.mcp_cli` won't import: {e}",
"reinstall the wheel (pip install --force-reinstall "
"molecule-ai-workspace-runtime); if it still fails, file "
"a bug with the traceback.",
)
return "fail"
_ok(label, f"molecule-ai-workspace-runtime=={version}")
return "ok"
def check_path_for_binary() -> Verdict:
label = "PATH for molecule-mcp"
found = shutil.which("molecule-mcp")
if found:
_ok(label, f"resolves to {found}")
return "ok"
# Not on PATH — work out where pip put it so the suggestion is
# actionable instead of generic.
user_base = os.environ.get("PYTHONUSERBASE")
if not user_base:
try:
import site
user_base = site.getuserbase()
except Exception:
user_base = None
hint = (
f"add `{user_base}/bin` to PATH"
if user_base
else "switch to `pipx install molecule-ai-workspace-runtime` so the "
"binary lands in pipx's managed bin/ on PATH"
)
_fail(
label,
"molecule-mcp not found on PATH",
f"{hint}, or invoke via `python -m molecule_runtime.mcp_cli` directly.",
)
return "fail"
def _resolve_token() -> tuple[Optional[str], Optional[str]]:
"""Return ``(token_value, source_label)`` if the operator's
environment exposes a token, else ``(None, None)``.
Single source of truth used by both ``check_env_vars()`` (which
only needs the source label) and ``check_token_auth()`` (which
needs the actual value to send a Bearer header). Keeping these
in one place means a future env-var addition only updates the
resolver — not two parallel readers that can drift.
"""
val = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
if val:
return val, "env MOLECULE_WORKSPACE_TOKEN"
file_var = os.environ.get("MOLECULE_WORKSPACE_TOKEN_FILE", "").strip()
if file_var:
if os.path.isfile(file_var):
try:
from pathlib import Path as _Path
return (
_Path(file_var).read_text().strip(),
f"file {file_var} (via MOLECULE_WORKSPACE_TOKEN_FILE)",
)
except OSError:
return None, None
return None, None
# Per-runtime container path used by the in-platform path; rarely
# set on external setups but check anyway so the message is
# accurate for both shapes.
try:
import configs_dir
candidate = configs_dir.resolve() / ".auth_token"
if candidate.is_file():
try:
return candidate.read_text().strip(), f"file {candidate}"
except OSError:
return None, None
except Exception:
pass
return None, None
def _resolve_token_summary() -> Optional[str]:
"""Return just the source label (no secret value). Convenience
wrapper around :func:`_resolve_token` for callers that don't
need the value itself.
"""
_, label = _resolve_token()
return label
def check_env_vars() -> Verdict:
label = "Env vars"
missing: list[str] = []
if not os.environ.get("PLATFORM_URL", "").strip():
missing.append("PLATFORM_URL")
if not os.environ.get("WORKSPACE_ID", "").strip() and not os.environ.get(
"MOLECULE_WORKSPACES", "",
).strip():
missing.append("WORKSPACE_ID (or MOLECULE_WORKSPACES)")
token_summary = _resolve_token_summary()
if not token_summary and not os.environ.get("MOLECULE_WORKSPACES", "").strip():
# MOLECULE_WORKSPACES is a JSON-array env that bundles its
# own per-workspace tokens — if it's set we trust the
# resolver to validate.
missing.append(
"MOLECULE_WORKSPACE_TOKEN (or MOLECULE_WORKSPACE_TOKEN_FILE, or "
"/configs/.auth_token)",
)
if missing:
_fail(
label,
f"unset: {', '.join(missing)}",
"see the canvas Connect-External-Agent modal — the snippet "
"exports all three. Use MOLECULE_WORKSPACE_TOKEN_FILE for the "
"token to keep secrets out of shell history.",
)
return "fail"
_ok(
label,
f"PLATFORM_URL + WORKSPACE_ID set; token from {token_summary or 'MOLECULE_WORKSPACES'}",
)
return "ok"
def _http_get(url: str, timeout: float = 5.0) -> tuple[Optional[int], Optional[str]]:
    """Best-effort GET that swallows transport errors and returns
    (status, error_message). Status is None when the request couldn't
    complete; error_message is None when the request returned 2xx.
    """
    try:
        # Origin header: staging tenants enforce same-origin via WAF;
        # /healthz tolerates either way but matching production headers
        # surfaces auth-style 401s correctly during the doctor run.
        req = urllib_request.Request(
            url,
            headers={"Origin": os.environ.get("PLATFORM_URL", "").rstrip("/")},
        )
        with urllib_request.urlopen(req, timeout=timeout) as resp:
            return resp.status, None
    except URLError as e:
        # urlopen raises HTTPError (a URLError subclass) for non-2xx
        # responses; it carries the HTTP status in ``code``. Plain
        # transport errors have no ``code``, so status stays None.
        return getattr(e, "code", None), str(e.reason if hasattr(e, "reason") else e)
    except Exception as e:
        return None, str(e)
def check_platform_health() -> Verdict:
    label = "Platform reachability"
    raw = os.environ.get("PLATFORM_URL", "").strip()
    base = raw.rstrip("/")
    if not base:
        _warn(label, "skipped (PLATFORM_URL unset; see Env vars)", "set PLATFORM_URL first")
        return "warn"
    if not base.startswith(("http://", "https://")):
        _fail(
            label,
            f"PLATFORM_URL missing scheme: {base!r}",
            "set PLATFORM_URL to include https://, e.g. "
            "PLATFORM_URL=https://your-tenant.staging.moleculesai.app",
        )
        return "fail"
    # Test the raw value: ``base`` has already had its trailing slash
    # stripped, so checking ``base`` here could never fire.
    if raw.endswith("/"):
        _warn(
            label,
            "PLATFORM_URL has trailing slash (will be stripped automatically)",
            "remove the trailing slash to match the snippet shape",
        )
    status, err = _http_get(f"{base}/healthz")
    if status is None:
        _fail(label, f"GET {base}/healthz failed: {err}", "check DNS + firewall + scheme")
        return "fail"
    if not (200 <= status < 300):
        _fail(label, f"GET {base}/healthz returned HTTP {status}", "verify the tenant subdomain is correct + provisioned")
        return "fail"
    _ok(label, f"GET {base}/healthz → {status}")
    return "ok"
def check_token_auth() -> Verdict:
"""Light auth check via POST /registry/heartbeat.
Why heartbeat and not register: register is an UPSERT — sending
it from doctor would clobber the workspace's actual agent_card
(name, description, version) until the real agent next calls
register. That's an invisible production-disruption: someone
runs ``molecule-mcp doctor`` against a live workspace and the
canvas briefly displays "doctor-probe" as the agent name.
Heartbeat only updates last_heartbeat_at (and clears
awaiting_agent if needed) — that's exactly what a normal
molecule-mcp boot does every 20s, so an extra heartbeat from
the doctor is indistinguishable from background traffic.
Skipped when env vars failed earlier so the operator isn't shown
a redundant 401.
"""
label = "Token auth"
base = os.environ.get("PLATFORM_URL", "").strip().rstrip("/")
workspace_id = os.environ.get("WORKSPACE_ID", "").strip()
token, source_label = _resolve_token()
if not (base and workspace_id and token):
_warn(label, "skipped (Env vars must pass first)", "fix Env vars, re-run")
return "warn"
import json
body = json.dumps({"id": workspace_id}).encode()
req = urllib_request.Request(
f"{base}/registry/heartbeat",
data=body,
method="POST",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
"Origin": base,
},
)
try:
with urllib_request.urlopen(req, timeout=8.0) as resp:
status = resp.status
except URLError as e:
# Pull HTTP code from HTTPError; transport errors don't have one.
status = getattr(e, "code", None)
err = str(e.reason if hasattr(e, "reason") else e)
if status is None:
_fail(label, f"POST {base}/registry/heartbeat failed: {err}", "check network")
return "fail"
except Exception as e:
_fail(label, f"POST heartbeat failed: {e}", "check network")
return "fail"
if status == 401:
_fail(
label,
"401 Unauthorized — token rejected",
"tokens are shown only once at workspace-create time; "
"re-create the workspace OR rotate via canvas Tokens tab.",
)
return "fail"
if status == 404:
_fail(
label,
f"404 — workspace_id {workspace_id} not found on {base}",
"verify WORKSPACE_ID matches a real workspace + the tenant "
"subdomain in PLATFORM_URL.",
)
return "fail"
if not (200 <= status < 300):
_fail(label, f"POST heartbeat returned HTTP {status}", "see platform logs")
return "fail"
_ok(label, f"POST {base}/registry/heartbeat → {status} (token from {source_label})")
return "ok"
# Back-compat alias: the previous name was check_register, but the
# implementation switched to a non-mutating heartbeat probe (see
# check_token_auth's docstring). Kept so external test suites or
# pinned-import scripts don't break on the rename.
check_register = check_token_auth
CHECKS = [
check_python_version,
check_wheel_install,
check_path_for_binary,
check_env_vars,
check_platform_health,
check_token_auth,
]
def run() -> int:
"""Run all checks and return a process exit code (0 ok, 1 if any fail)."""
print("molecule-mcp doctor — onboarding diagnostic")
print()
verdicts = []
for chk in CHECKS:
try:
verdicts.append(chk())
except Exception as e:
# A buggy check shouldn't kill the rest of the doctor run.
print(f" [BUG] {chk.__name__}: unexpected {type(e).__name__}: {e}")
verdicts.append("fail")
print()
fails = sum(1 for v in verdicts if v == "fail")
warns = sum(1 for v in verdicts if v == "warn")
if fails:
print(f"{fails} check(s) failed, {warns} warning(s). Fix the FAIL items above and re-run.")
return 1
if warns:
print(f"All required checks passed; {warns} warning(s) — review the next-step hints.")
return 0
print("All checks passed.")
return 0
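Both HTTP probes in the doctor rely on a detail of `urllib`: a non-2xx response is raised as `HTTPError`, which is a subclass of `URLError` and carries the status in `.code`, while a pure transport failure is a bare `URLError` with no code. A minimal sketch of telling the two apart, with no network access; the function name `status_and_reason` and the example URL are illustrative assumptions, not part of the original module.

```python
from typing import Optional
from urllib.error import HTTPError, URLError


def status_and_reason(exc: URLError) -> tuple[Optional[int], str]:
    """Split a urllib failure into (http_status, reason)."""
    # HTTPError carries .code; a bare URLError (DNS, refused, timeout) does not.
    return getattr(exc, "code", None), str(exc.reason)


# Construct the two failure shapes by hand instead of hitting a server.
http_err = HTTPError("https://example.invalid/healthz", 503, "Service Unavailable", None, None)
dns_err = URLError("Name or service not known")

print(status_and_reason(http_err))
print(status_and_reason(dns_err))
```

Catching `URLError` alone therefore covers both cases, as long as the handler reads `.code` before deciding that the request never completed.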
-325
@@ -1,325 +0,0 @@
"""Heartbeat + register thread for the standalone ``molecule-mcp`` wrapper.
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3) so the heartbeat /
register concern lives in its own module. The console-script entry
``mcp_cli:main`` still drives the spawn, but the loop body, auth-failure
escalation, and inbound-secret persistence now live here so they can be
read, tested, and replaced independently of the orchestrator.
Public surface:
* ``HEARTBEAT_INTERVAL_SECONDS`` — cadence constant.
* ``build_agent_card(workspace_id)`` — payload helper.
* ``platform_register(platform_url, workspace_id, token)`` — one-shot
POST /registry/register at startup.
* ``start_heartbeat_thread(platform_url, workspace_id, token)`` — spawn
the daemon thread.
"""
from __future__ import annotations
import logging
import os
import sys
import threading
import time
logger = logging.getLogger(__name__)
# Heartbeat cadence. Must be tighter than healthsweep's stale window
# (currently 60-90s — see registry/healthsweep.go) by a comfortable
# margin so a single missed heartbeat doesn't flip awaiting_agent.
# 20s gives the operator's network 3 attempts within the budget; long
# enough that it doesn't spam, short enough to recover quickly after
# laptop sleep.
HEARTBEAT_INTERVAL_SECONDS = 20.0
# After this many consecutive 401/403 heartbeats, escalate from
# WARNING to ERROR with re-onboard guidance. 3 ticks at 20s = ~1 minute
# of sustained auth failure: enough to rule out a transient platform
# blip but quick enough that an operator doesn't sit puzzled for 10
# minutes wondering why their MCP tools 401. Past the threshold, the
# ERROR is re-logged every HEARTBEAT_AUTH_RELOG_INTERVAL ticks
# (20 ticks, ~7 min) so a long-running session that missed the first
# ERROR still sees the message.
HEARTBEAT_AUTH_LOUD_THRESHOLD = 3
HEARTBEAT_AUTH_RELOG_INTERVAL = 20
def build_agent_card(workspace_id: str) -> dict:
"""Build the ``agent_card`` payload sent to /registry/register.
Three optional env vars override the defaults so an operator can
surface human-readable identity + capabilities to peers and the
canvas Skills tab without code changes:
* ``MOLECULE_AGENT_NAME`` — display name (defaults to
``molecule-mcp-{id[:8]}``). Surfaced in canvas workspace cards
and ``list_peers`` output.
* ``MOLECULE_AGENT_DESCRIPTION`` — one-liner about the agent's
purpose. Rendered in canvas Details + Skills tabs.
* ``MOLECULE_AGENT_SKILLS`` — comma-separated skill names
(e.g. ``research,code-review,memory-curation``). Each name is
expanded to a ``{"name": ...}`` skill object — the minimum
shape that satisfies both ``shared_runtime.summarize_peers``
(uses ``s["name"]``) and the canvas SkillsTab.tsx schema
(id falls back to name when omitted). Empty / whitespace
entries are dropped.
Defaults match the previous hardcoded behaviour exactly so this
is a strict superset — an operator who sets none of the env vars
sees no change.
"""
name = (os.environ.get("MOLECULE_AGENT_NAME") or "").strip()
if not name:
name = f"molecule-mcp-{workspace_id[:8]}"
description = (os.environ.get("MOLECULE_AGENT_DESCRIPTION") or "").strip()
skills_raw = (os.environ.get("MOLECULE_AGENT_SKILLS") or "").strip()
skills: list[dict] = []
if skills_raw:
for s in skills_raw.split(","):
label = s.strip()
if label:
skills.append({"name": label})
card: dict = {"name": name, "skills": skills}
if description:
card["description"] = description
return card
def platform_register(platform_url: str, workspace_id: str, token: str) -> None:
"""One-shot register at startup; fails fast on auth errors.
Lifts the workspace from ``awaiting_agent`` to ``online`` for
operators who never ran the curl-register snippet. Safe to call
repeatedly: the platform's register handler is an upsert that
just refreshes ``url``, ``agent_card``, and ``status``.
Failure model (post-review):
- 401 / 403 → ``sys.exit(3)`` immediately. The operator's
token is wrong; silently looping in a broken state would
make this hard to diagnose because the MCP tools would 401
on every call too. Hard-fail is the kindest option.
- Other 4xx/5xx → log a warning + continue. The heartbeat
thread will surface persistent failures; transient platform
blips shouldn't abort the MCP loop.
- Network / transport errors → log + continue. Same reasoning.
Origin header is required by the SaaS edge WAF; without it
/registry/register currently still works (it's on the WAF
allowlist), but the heartbeat path needs Origin and we want one
consistent header set across both calls.
"""
try:
import httpx
except ImportError:
# httpx is a transitive dep via a2a-sdk; if missing, the MCP
# server won't import either. Let the caller's later import
# surface the real error.
return
payload = {
"id": workspace_id,
"url": "",
"agent_card": build_agent_card(workspace_id),
"delivery_mode": "poll",
}
headers = {
"Authorization": f"Bearer {token}",
"Origin": platform_url,
"Content-Type": "application/json",
}
try:
with httpx.Client(timeout=10.0) as client:
resp = client.post(
f"{platform_url}/registry/register",
json=payload,
headers=headers,
)
if resp.status_code in (401, 403):
print(
f"molecule-mcp: register rejected with HTTP {resp.status_code}: "
f"the token in MOLECULE_WORKSPACE_TOKEN is invalid for workspace "
f"{workspace_id}. Regenerate from the canvas → Tokens tab.",
file=sys.stderr,
)
sys.exit(3)
if resp.status_code >= 400:
logger.warning(
"molecule-mcp: register POST returned HTTP %d: %s",
resp.status_code,
(resp.text or "")[:200],
)
else:
logger.info(
"molecule-mcp: registered workspace %s with platform",
workspace_id,
)
except SystemExit:
raise
except Exception as exc: # noqa: BLE001
logger.warning("molecule-mcp: register POST failed: %s", exc)
def heartbeat_loop(
platform_url: str,
workspace_id: str,
token: str,
interval: float = HEARTBEAT_INTERVAL_SECONDS,
) -> None:
"""Daemon thread body: POST /registry/heartbeat every ``interval``s.
Failures are logged (WARNING, escalating to ERROR on sustained auth
failures) and the loop continues. The thread exits when the main
process does (daemon=True). Each iteration rebuilds the payload +
headers, which is cheap, but ``token`` is captured once at thread
start: a token rotated via env var is NOT picked up until the
runtime restarts.
"""
try:
import httpx
except ImportError:
return
start_time = time.time()
consecutive_auth_failures = 0
while True:
body = {
"workspace_id": workspace_id,
"error_rate": 0.0,
"sample_error": "",
"active_tasks": 0,
"uptime_seconds": int(time.time() - start_time),
}
headers = {
"Authorization": f"Bearer {token}",
"Origin": platform_url,
"Content-Type": "application/json",
}
try:
with httpx.Client(timeout=10.0) as client:
resp = client.post(
f"{platform_url}/registry/heartbeat",
json=body,
headers=headers,
)
if resp.status_code in (401, 403):
consecutive_auth_failures += 1
log_heartbeat_auth_failure(
consecutive_auth_failures, workspace_id, resp.status_code,
)
elif resp.status_code >= 400:
# Non-auth HTTP error — log, but DO NOT touch the
# auth-failure counter (5xx blips, 429, etc. are
# transient and unrelated to token validity).
logger.warning(
"molecule-mcp: heartbeat HTTP %d: %s",
resp.status_code,
(resp.text or "")[:200],
)
else:
consecutive_auth_failures = 0
persist_inbound_secret_from_heartbeat(resp)
except Exception as exc: # noqa: BLE001
logger.warning("molecule-mcp: heartbeat failed: %s", exc)
time.sleep(interval)
def log_heartbeat_auth_failure(count: int, workspace_id: str, status_code: int) -> None:
"""Escalate consecutive heartbeat 401/403s from quiet WARNING to
actionable ERROR.
The operator's first sign of trouble shouldn't be "tools 401 with no
explanation" — that was the failure mode that motivated this code,
triggered by a workspace being deleted server-side and its tokens
revoked while the runtime kept heartbeating in silence.
Cadence:
* count < threshold: WARNING per tick (transient — could be a
platform blip, don't shout yet)
* count == threshold: ERROR with re-onboard instructions
(the first signal the operator can't miss)
* count > threshold and (count - threshold) % relog == 0: re-log
ERROR (so a session that started after the first ERROR still
sees the message scrolling past in their logs)
"""
if count < HEARTBEAT_AUTH_LOUD_THRESHOLD:
logger.warning(
"molecule-mcp: heartbeat HTTP %d (auth failure %d/%d) — "
"token may be revoked. Will retry; if persistent, regenerate "
"from canvas → Tokens.",
status_code, count, HEARTBEAT_AUTH_LOUD_THRESHOLD,
)
return
# At or past the threshold — this is the loud actionable error.
if count == HEARTBEAT_AUTH_LOUD_THRESHOLD or (
count - HEARTBEAT_AUTH_LOUD_THRESHOLD
) % HEARTBEAT_AUTH_RELOG_INTERVAL == 0:
logger.error(
"molecule-mcp: %d consecutive heartbeat auth failures (HTTP %d) — "
"the token in MOLECULE_WORKSPACE_TOKEN has been REVOKED, likely "
"because workspace %s was deleted server-side. The MCP server is "
"still running but every platform call will fail. Regenerate the "
"workspace + token from the canvas (Tokens tab), update your MCP "
"config, and restart your runtime.",
count, status_code, workspace_id,
)
def persist_inbound_secret_from_heartbeat(resp: object) -> None:
"""Persist ``platform_inbound_secret`` from a heartbeat response, if any.
The platform's heartbeat handler returns the secret on every beat
(mirroring /registry/register) so a workspace that lazy-healed the
secret on the platform side — typical recovery path for a workspace
whose row had a NULL ``platform_inbound_secret`` after a partial
bootstrap — picks it up within one heartbeat tick instead of
requiring a runtime restart.
Without this delivery path the chat-upload code path's "secret was
just minted, will pick up on next heartbeat" 503 message is a lie
and the workspace stays 401-forever until the operator restarts
the runtime. Caught 2026-04-30 on hongmingwang tenant.
Failure is non-fatal: if the body isn't JSON, doesn't carry the
field, or the disk write fails, the next heartbeat retries. This
matches the cold-start register flow in main.py:319-323.
"""
try:
body = resp.json()
except Exception: # noqa: BLE001
return
if not isinstance(body, dict):
return
secret = body.get("platform_inbound_secret")
if not secret:
return
try:
from platform_inbound_auth import save_inbound_secret
save_inbound_secret(secret)
except Exception as exc: # noqa: BLE001
logger.warning(
"molecule-mcp: persist inbound secret from heartbeat failed: %s", exc
)
def start_heartbeat_thread(
platform_url: str,
workspace_id: str,
token: str,
) -> threading.Thread:
"""Start the heartbeat daemon thread. Returns the Thread handle.
The MCP stdio loop runs in the foreground (asyncio); this thread
runs alongside it. ``daemon=True`` so when the operator hits
Ctrl-C / closes the runtime, the heartbeat dies with it instead
of leaking and writing to a stale workspace.
"""
t = threading.Thread(
target=heartbeat_loop,
args=(platform_url, workspace_id, token),
name="molecule-mcp-heartbeat",
daemon=True,
)
t.start()
return t
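The escalation cadence described in `log_heartbeat_auth_failure` can be checked in isolation. The sketch below restates only the decision rule; the names `LOUD_THRESHOLD`, `RELOG_INTERVAL`, and `log_level_for` are stand-ins introduced for illustration, with the constants mirroring the module's values.

```python
LOUD_THRESHOLD = 3   # mirrors HEARTBEAT_AUTH_LOUD_THRESHOLD
RELOG_INTERVAL = 20  # mirrors HEARTBEAT_AUTH_RELOG_INTERVAL


def log_level_for(count: int) -> str:
    """Return "warning", "error", or "silent" for the count-th consecutive auth failure."""
    if count < LOUD_THRESHOLD:
        return "warning"
    if count == LOUD_THRESHOLD or (count - LOUD_THRESHOLD) % RELOG_INTERVAL == 0:
        return "error"
    # Past the threshold and between re-log points nothing is emitted.
    return "silent"


error_ticks = [c for c in range(1, 50) if log_level_for(c) == "error"]
print(error_ticks)
```

With these defaults the ERROR fires on consecutive failures 3, 23, and 43; failures 1 and 2 log a WARNING, and the ticks in between are silent.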
-63
@@ -1,63 +0,0 @@
"""Inbox-poller spawn helpers for the standalone ``molecule-mcp`` wrapper.
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3). The poller is the
INBOUND side of the standalone path — without it, the universal MCP
server is outbound-only (can call ``delegate_task`` /
``send_message_to_user``, never observes canvas-user / peer-agent
messages).
Public surface:
* ``start_inbox_pollers(platform_url, workspace_ids)`` — activate the
inbox singleton and spawn one daemon poller per workspace.
"""
from __future__ import annotations
import logging
logger = logging.getLogger(__name__)
def start_inbox_pollers(platform_url: str, workspace_ids: list[str]) -> None:
"""Activate the inbox singleton + spawn one poller daemon thread per workspace.
Done lazily here (not at module import) because importing inbox
pulls in platform_auth, which only resolves cleanly AFTER env
validation succeeds. Activation is idempotent within a process,
so a stray double-call (e.g. test harness re-entering main) is
harmless.
The poller threads are daemon=True — die with the main process.
Single-workspace path: one poller, single cursor file at the legacy
location (``.mcp_inbox_cursor``). Cursor-key resolution falls back
to the empty string for back-compat with operators whose existing
on-disk cursor was written by the pre-multi-workspace code.
Multi-workspace path: N pollers, each with its own cursor file
keyed by ``workspace_id[:8]``. Cursors live next to each other in
configs_dir so an operator inspecting state sees all of them
together.
"""
    try:
        import inbox
    except ImportError as exc:
        logger.warning("molecule-mcp: inbox module unavailable: %s", exc)
        return
    if not workspace_ids:
        # Nothing to poll. Guard explicitly so the single-workspace
        # branch below never indexes into an empty list.
        logger.warning("molecule-mcp: no workspaces configured; inbox pollers not started")
        return
    if len(workspace_ids) == 1:
        # Back-compat exact: single-workspace mode reuses the legacy
        # cursor filename + cursor_path constructor arg, so an existing
        # operator's on-disk state isn't invalidated by upgrade.
        wsid = workspace_ids[0]
        state = inbox.InboxState(cursor_path=inbox.default_cursor_path())
        inbox.activate(state)
        inbox.start_poller_thread(state, platform_url, wsid)
        return
    # Multi-workspace: per-workspace cursor file, one shared queue.
    cursor_paths = {wsid: inbox.default_cursor_path(wsid) for wsid in workspace_ids}
    state = inbox.InboxState(cursor_paths=cursor_paths)
    inbox.activate(state)
    for wsid in workspace_ids:
        inbox.start_poller_thread(state, platform_url, wsid)
-240
@@ -1,240 +0,0 @@
"""Env validation + workspace resolution for the standalone ``molecule-mcp``.
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3). Deals with the two
shapes ``molecule-mcp`` accepts:
* Single-workspace legacy shape: ``WORKSPACE_ID`` + token from
``MOLECULE_WORKSPACE_TOKEN``, ``MOLECULE_WORKSPACE_TOKEN_FILE``, or
``${CONFIGS_DIR}/.auth_token``.
* Multi-workspace JSON shape: ``MOLECULE_WORKSPACES`` env var carries a
JSON array of ``{"id": ..., "token": ...}`` entries.
Public surface:
* ``resolve_workspaces()`` → ``(workspaces, errors)``.
* ``read_token_file()`` → token text or ``""``.
* ``print_missing_env_help(missing, have_token_file)`` — operator-help
printer.
"""
from __future__ import annotations
import json
import os
import sys
import configs_dir
def resolve_workspaces() -> tuple[list[tuple[str, str]], list[str]]:
"""Return the list of ``(workspace_id, token)`` pairs to register.
Resolution order:
1. ``MOLECULE_WORKSPACES`` env var — JSON array of
``{"id": "...", "token": "..."}`` objects. Activates the
multi-workspace external-agent path (one process registered into
N workspaces). When set, ``WORKSPACE_ID`` / ``MOLECULE_WORKSPACE_TOKEN``
are IGNORED — the JSON is the source of truth.
2. Single-workspace fallback — ``WORKSPACE_ID`` env var + token
resolved in this order:
a. ``MOLECULE_WORKSPACE_TOKEN`` (inline env — convenient but
leaks into shell history + plaintext MCP-host config).
b. ``MOLECULE_WORKSPACE_TOKEN_FILE`` (path to a file holding
the token — operator can keep it 0600 in their home dir;
survives shell-history scrubs).
c. ``${CONFIGS_DIR}/.auth_token`` (in-container runtimes —
the platform writes this on provision).
Returns ``(workspaces, errors)``:
* ``workspaces``: list of ``(workspace_id, token)`` — non-empty
on the happy path.
* ``errors``: human-readable strings describing what's missing /
malformed. ``main()`` surfaces these with the same shape as
``print_missing_env_help`` so the operator's first run gives
actionable output.
Why JSON env (not file): ergonomic for Claude Code MCP config (one
string in ``mcpServers.molecule.env`` instead of a sidecar file)
and for CI / launchers. A separate config-file path can be added
later without breaking this.
"""
raw = os.environ.get("MOLECULE_WORKSPACES", "").strip()
if raw:
try:
parsed = json.loads(raw)
except json.JSONDecodeError as exc:
return [], [
f"MOLECULE_WORKSPACES is not valid JSON ({exc.msg} at pos "
f"{exc.pos}). Expected: '[{{\"id\":\"<wsid>\",\"token\":"
f"\"<tok>\"}},{{...}}]'"
]
if not isinstance(parsed, list) or not parsed:
return [], [
"MOLECULE_WORKSPACES must be a non-empty JSON array of "
"{\"id\":\"...\",\"token\":\"...\"} objects"
]
out: list[tuple[str, str]] = []
seen: set[str] = set()
errors: list[str] = []
for i, entry in enumerate(parsed):
if not isinstance(entry, dict):
errors.append(
f"MOLECULE_WORKSPACES[{i}] is not an object — got {type(entry).__name__}"
)
continue
wsid = str(entry.get("id", "")).strip()
tok = str(entry.get("token", "")).strip()
if not wsid or not tok:
errors.append(
f"MOLECULE_WORKSPACES[{i}] missing 'id' or 'token'"
)
continue
if wsid in seen:
errors.append(
f"MOLECULE_WORKSPACES[{i}] duplicate workspace id {wsid!r}"
)
continue
seen.add(wsid)
out.append((wsid, tok))
if errors:
return [], errors
return out, []
# Single-workspace back-compat path.
wsid = os.environ.get("WORKSPACE_ID", "").strip()
if not wsid:
return [], ["WORKSPACE_ID (or MOLECULE_WORKSPACES) is required"]
# Token resolution order (#2934): inline env → file path → CONFIGS_DIR
# default. The file-path option exists so operators can keep the
# bearer out of shell history and out of MCP-host config plaintext
# (e.g. ~/.claude.json) — set MOLECULE_WORKSPACE_TOKEN_FILE to a
# 0600 file containing the token. The CONFIGS_DIR/.auth_token
# fallback predates this and stays for in-container runtimes.
tok = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
if not tok:
tok, tf_err = _read_token_from_file_env()
if tf_err:
# Operator explicitly pointed TOKEN_FILE somewhere — surface
# the SPECIFIC failure (path doesn't exist, isn't readable,
# or holds a blank file) instead of falling through to the
# generic "set one of these three vars" message. Otherwise
# they get exactly the silent failure mode #2934 flagged
# ("a new user has no chance"). Skip the CONFIGS_DIR
# fallback in this case — the operator's intent is clearly
# to use the file path; deferring to a different source
# would mask their config error.
return [], [tf_err]
if not tok:
tok = read_token_file()
if not tok:
return [], [
"MOLECULE_WORKSPACE_TOKEN, MOLECULE_WORKSPACE_TOKEN_FILE, or "
"CONFIGS_DIR/.auth_token is required"
]
return [(wsid, tok)], []
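To make the `MOLECULE_WORKSPACES` shape concrete, here is a condensed, self-contained restatement of the JSON branch above, run against two sample values. The function name `parse_workspaces`, the shortened error strings, and the sample ids and tokens are illustrative assumptions; the validation rules follow the original.

```python
import json


def parse_workspaces(raw: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Simplified restatement of the MOLECULE_WORKSPACES branch (illustrative only)."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON ({exc.msg} at pos {exc.pos})"]
    if not isinstance(parsed, list) or not parsed:
        return [], ["must be a non-empty JSON array"]
    pairs: list[tuple[str, str]] = []
    seen: set[str] = set()
    errors: list[str] = []
    for i, entry in enumerate(parsed):
        if not isinstance(entry, dict):
            errors.append(f"[{i}] is not an object")
            continue
        wsid = str(entry.get("id", "")).strip()
        tok = str(entry.get("token", "")).strip()
        if not wsid or not tok:
            errors.append(f"[{i}] missing 'id' or 'token'")
        elif wsid in seen:
            errors.append(f"[{i}] duplicate workspace id {wsid!r}")
        else:
            seen.add(wsid)
            pairs.append((wsid, tok))
    # Any error rejects the whole value, so a partial registration never happens.
    return ([], errors) if errors else (pairs, [])


good = '[{"id": "ws-a", "token": "tok-a"}, {"id": "ws-b", "token": "tok-b"}]'
dup = '[{"id": "ws-a", "token": "tok-a"}, {"id": "ws-a", "token": "tok-x"}]'
print(parse_workspaces(good))
print(parse_workspaces(dup))
```

The all-or-nothing return is the notable design choice: one malformed entry yields an empty workspace list plus every error found, so the operator can fix the whole value in one pass.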
def _read_token_from_file_env() -> tuple[str, str]:
"""Read the token from the file path in MOLECULE_WORKSPACE_TOKEN_FILE.
Returns ``(token, error)``:
* env var unset/blank → ``("", "")`` — caller falls through silently
to the next source; the operator didn't ask for this path.
* file open/read fails (missing, permission denied, decode error)
→ ``("", "<specific error>")`` — caller surfaces it directly.
The operator EXPLICITLY pointed at this path, so a generic
fallthrough error would mask their config bug (#2934).
* file is blank → ``("", "<blank file error>")`` — same reasoning.
* file read returns junk with internal whitespace/newlines (e.g.
a CSV cell, accidental multi-token paste) → ``("", "<error>")``
rather than concatenating into a malformed bearer that 401s
against the platform with no context.
* happy path → ``("<token>", "")``.
"""
path = os.environ.get("MOLECULE_WORKSPACE_TOKEN_FILE", "").strip()
if not path:
return "", ""
try:
with open(path, encoding="utf-8") as fh:
raw = fh.read()
except FileNotFoundError:
return "", (
f"MOLECULE_WORKSPACE_TOKEN_FILE points to {path!r} which "
f"does not exist"
)
except PermissionError:
return "", (
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} is not readable "
f"(permission denied)"
)
except OSError as exc:
return "", (
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} could not be read: "
f"{exc}"
)
except UnicodeDecodeError:
return "", (
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} is not valid UTF-8"
)
tok = raw.strip()
if not tok:
return "", (
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} is empty"
)
# Reject tokens with internal whitespace — a CSV cell or accidental
# multi-token paste would otherwise become a malformed bearer that
# 401s against the platform with no diagnostic.
if any(ch.isspace() for ch in tok):
return "", (
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} contains internal "
f"whitespace — expected a single token"
)
return tok, ""
def print_missing_env_help(missing: list[str], have_token_file: bool) -> None:
print("molecule-mcp: missing required environment.\n", file=sys.stderr)
print("Set the following before running molecule-mcp:", file=sys.stderr)
print(" WORKSPACE_ID — your workspace UUID (from canvas)", file=sys.stderr)
print(
" PLATFORM_URL — base URL of your Molecule platform "
"(e.g. https://your-tenant.staging.moleculesai.app)",
file=sys.stderr,
)
if not have_token_file:
print(
" MOLECULE_WORKSPACE_TOKEN — bearer token for this workspace "
"(canvas → Tokens tab)",
file=sys.stderr,
)
print(
" OR set MOLECULE_WORKSPACE_TOKEN_FILE"
" to a path that holds the token",
file=sys.stderr,
)
print(
" (keeps the secret out of shell"
" history and MCP-host config plaintext)",
file=sys.stderr,
)
print("", file=sys.stderr)
print(f"Currently missing: {', '.join(missing)}", file=sys.stderr)
def read_token_file() -> str:
    """Read the token from the resolved configs dir's ``.auth_token``, if
    present.

    Mirrors the location resolution of platform_auth._token_file without
    importing that module. The import would trigger a2a_client's
    WORKSPACE_ID guard, which is harmless after env validation, but
    inlining a short file read is cheaper than pulling in the whole
    stack just for the path.
    """
    path = configs_dir.resolve() / ".auth_token"
    if not path.is_file():
        return ""
    try:
        # Explicit UTF-8 matches _read_token_from_file_env; a file that is
        # unreadable or not valid UTF-8 is treated as "no token".
        return path.read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):
        return ""
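The token-file rules described above (owner-only file, exactly one token, no internal whitespace) can be sketched in isolation. This is a minimal illustration, not the module's code: `write_token_file` and `validate_token_text` are hypothetical helper names, and the error strings are placeholders.

```python
import os


def write_token_file(path: str, token: str) -> None:
    """Create the token file with mode 0600 so only the owner can read it."""
    # os.open applies the mode at creation time, avoiding a window in which
    # the file exists with broader permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(token)


def validate_token_text(raw: str) -> tuple[str, str]:
    """Return (token, error); exactly one of the two is non-empty."""
    tok = raw.strip()
    if not tok:
        return "", "token file is empty"
    # A CSV cell or a multi-token paste would otherwise become a malformed
    # bearer, so reject anything with whitespace inside the token.
    if any(ch.isspace() for ch in tok):
        return "", "token file contains internal whitespace"
    return tok, ""
```

An operator would then export `MOLECULE_WORKSPACE_TOKEN_FILE` pointing at the written path instead of placing the token in `MOLECULE_WORKSPACE_TOKEN`.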
@@ -1,71 +0,0 @@
#!/usr/bin/env python3
"""Update workspace task status on the canvas.
Usage (from any script, cron job, or shell inside the container):
# Set current task (shows on canvas card)
python3 -m molecule_runtime.molecule_ai_status "Running weekly SEO audit..."
# Clear task (removes banner from canvas)
python3 -m molecule_runtime.molecule_ai_status ""
The status appears as an amber banner on the workspace card in the canvas,
visible to the project owner in real-time.
"""
import os
import sys
import httpx
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
if not _WORKSPACE_ID_raw:
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
WORKSPACE_ID = _WORKSPACE_ID_raw
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
def set_status(task: str):
    """Push current_task to platform via heartbeat."""
    try:
        try:
            from platform_auth import auth_headers as _auth
            _headers = _auth()
        except Exception:
            # Best effort: fall back to an unauthenticated request.
            _headers = {}
        httpx.post(
            f"{PLATFORM_URL}/registry/heartbeat",
            json={
                "workspace_id": WORKSPACE_ID,
                "current_task": task,
                "active_tasks": 1 if task else 0,
                "error_rate": 0,
                "sample_error": "",
                "uptime_seconds": 0,
            },
            headers=_headers,
            timeout=5.0,
        )
        if task:
            # Also log as activity for traceability. Send the same auth
            # headers as the heartbeat so both requests are authenticated.
            httpx.post(
                f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/activity",
                json={
                    "activity_type": "task_update",
                    "source_id": WORKSPACE_ID,
                    "summary": task,
                    "status": "ok",
                },
                headers=_headers,
                timeout=5.0,
            )
    except Exception as e:
        print(f"molecule_ai_status: failed to update: {e}", file=sys.stderr)
if __name__ == "__main__": # pragma: no cover
if len(sys.argv) < 2:
print("Usage: python3 -m molecule_runtime.molecule_ai_status 'task description'")
print(" python3 -m molecule_runtime.molecule_ai_status '' # clear")
sys.exit(1)
set_status(sys.argv[1])
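Because the module above raises at import time when `WORKSPACE_ID` is unset, the heartbeat body is awkward to unit-test in place. One possible approach, shown here as a sketch, is to build the payload in a pure function; `build_heartbeat_payload` is a hypothetical name and is not part of the deleted module.

```python
def build_heartbeat_payload(workspace_id: str, task: str) -> dict:
    """Build the JSON body sent to /registry/heartbeat.

    An empty task string clears the banner, so active_tasks drops to 0.
    """
    return {
        "workspace_id": workspace_id,
        "current_task": task,
        "active_tasks": 1 if task else 0,
        # The status CLI has no metrics of its own, so these stay at
        # neutral values, mirroring the literal payload in set_status().
        "error_rate": 0,
        "sample_error": "",
        "uptime_seconds": 0,
    }
```

The network call then only has to pass the returned dict as `json=...`, and the set/clear behavior can be asserted without a platform to talk to.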
@@ -1,24 +0,0 @@
"""molecule_audit — HMAC-SHA256-chained immutable agent event log.
EU AI Act Annex III compliance (Art. 12/13 record-keeping, Art. 17 quality
management) for high-risk AI systems.
Quick start
-----------
from molecule_audit.hooks import LedgerHooks
with LedgerHooks(session_id=task_id) as hooks:
hooks.on_task_start(input_text=user_prompt)
# ... call LLM / tools ...
hooks.on_llm_call(model="hermes-3", output_text=reply)
hooks.on_task_end(output_text=result)
Verify a chain
--------------
python -m molecule_audit.verify --agent-id <id>
"""
from .ledger import AuditEvent, append_event, get_engine, verify_chain
from .hooks import LedgerHooks
__all__ = ["AuditEvent", "append_event", "get_engine", "verify_chain", "LedgerHooks"]
@@ -1,244 +0,0 @@
"""molecule_audit.hooks — Pipeline hook registrations for the audit ledger.
Registers audit events at four EU AI Act Art. 12 pipeline checkpoints:
task_start — an A2A task begins execution
llm_call — a model inference call is made (records model name)
tool_call — a tool/function is invoked (records tool name in model_used)
task_end — a task completes (success or failure)
Usage
-----
The recommended pattern is to create a LedgerHooks instance at the start of
each task and use it as a context manager:
from molecule_audit.hooks import LedgerHooks
with LedgerHooks(session_id=task_id, agent_id=agent_id) as hooks:
hooks.on_task_start(input_text=user_prompt)
response = call_llm(model="hermes-4", prompt=user_prompt)
hooks.on_llm_call(model="hermes-4", input_text=user_prompt,
output_text=response)
result = run_tool("search", query=user_prompt)
hooks.on_tool_call("search", input_data=user_prompt, output_data=result)
hooks.on_task_end(output_text=result)
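All hook methods route ledger writes through _safe_append, which swallows
exceptions so that audit failures never block the agent pipeline. Failures
are logged at WARNING level. Input hashing runs before that guard, so
on_tool_call can still raise (for example TypeError) when given data that
is not JSON-serializable.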
Privacy note
------------
Raw input/output text is never persisted. All on_* methods take plaintext
for convenience and immediately hash it with SHA-256 via hash_content().
Only the hex digest is stored in the ledger.
"""
from __future__ import annotations
import json
import logging
import os
from typing import Any
from .ledger import append_event, get_session_factory, hash_content
logger = logging.getLogger(__name__)
# Default agent identity — set by the platform when launching a workspace container.
_DEFAULT_AGENT_ID: str = os.environ.get("WORKSPACE_ID", "unknown-agent")
class LedgerHooks:
"""Lifecycle hooks that write signed events to the audit ledger.
Parameters
----------
session_id: Task / conversation ID (gen_ai.conversation.id).
Required — must be unique per agent session.
agent_id: Identity of this agent.
Defaults to the WORKSPACE_ID env var.
db_url: SQLAlchemy URL override — useful in tests to point at
an in-memory SQLite DB (``"sqlite:///:memory:"``).
human_oversight_flag: Default oversight flag written on task_start / task_end.
Can be overridden per call.
"""
def __init__(
self,
session_id: str,
agent_id: str | None = None,
db_url: str | None = None,
human_oversight_flag: bool = False,
) -> None:
self.agent_id: str = agent_id or _DEFAULT_AGENT_ID
self.session_id: str = session_id
self._db_url: str | None = db_url
self._default_human_oversight: bool = human_oversight_flag
self._session = None
# ------------------------------------------------------------------
# Session management
# ------------------------------------------------------------------
def _open_session(self):
"""Return a lazily-opened SQLAlchemy session (cached for this instance)."""
if self._session is None:
factory = get_session_factory(self._db_url)
self._session = factory()
return self._session
def close(self) -> None:
"""Release the underlying SQLAlchemy session."""
if self._session is not None:
self._session.close()
self._session = None
def __enter__(self) -> "LedgerHooks":
return self
def __exit__(self, exc_type, exc_val, exc_tb) -> None:
self.close()
# ------------------------------------------------------------------
# Four pipeline hook points (EU AI Act Art. 12)
# ------------------------------------------------------------------
def on_task_start(
self,
input_text: str | None = None,
human_oversight_flag: bool | None = None,
risk_flag: bool = False,
) -> None:
"""Log ``operation=task_start`` when an agent task begins.
Parameters
----------
input_text: Raw user / caller input (hashed before storage).
human_oversight_flag: Override the instance-level default.
risk_flag: Set True when the input triggers a risk condition.
"""
self._safe_append(
operation="task_start",
input_hash=hash_content(input_text),
human_oversight_flag=(
human_oversight_flag
if human_oversight_flag is not None
else self._default_human_oversight
),
risk_flag=risk_flag,
)
def on_llm_call(
self,
model: str,
input_text: str | None = None,
output_text: str | None = None,
risk_flag: bool = False,
) -> None:
"""Log ``operation=llm_call`` when a model inference call is made.
Parameters
----------
model: Model identifier (e.g. ``"hermes-4-405b"``).
input_text: Prompt / messages sent to the model (hashed).
output_text: Model response text (hashed).
risk_flag: Set True when the response triggers a risk condition.
"""
self._safe_append(
operation="llm_call",
input_hash=hash_content(input_text),
output_hash=hash_content(output_text),
model_used=model,
risk_flag=risk_flag,
)
def on_tool_call(
self,
tool_name: str,
input_data: Any = None,
output_data: Any = None,
risk_flag: bool = False,
) -> None:
"""Log ``operation=tool_call`` when a tool/function is invoked.
Parameters
----------
tool_name: Name of the tool or function (stored in ``model_used``).
input_data: Tool input — str, bytes, or JSON-serializable object (hashed).
output_data: Tool output — same type options (hashed).
risk_flag: Set True when the tool result triggers a risk condition.
"""
self._safe_append(
operation="tool_call",
input_hash=hash_content(_to_bytes(input_data)),
output_hash=hash_content(_to_bytes(output_data)),
model_used=tool_name,
risk_flag=risk_flag,
)
def on_task_end(
self,
output_text: str | None = None,
human_oversight_flag: bool | None = None,
risk_flag: bool = False,
) -> None:
"""Log ``operation=task_end`` when a task completes.
Parameters
----------
output_text: Final task output / result (hashed before storage).
human_oversight_flag: Override the instance-level default.
risk_flag: Set True when the final result triggers a risk condition.
"""
self._safe_append(
operation="task_end",
output_hash=hash_content(output_text),
human_oversight_flag=(
human_oversight_flag
if human_oversight_flag is not None
else self._default_human_oversight
),
risk_flag=risk_flag,
)
# ------------------------------------------------------------------
# Internal helpers
# ------------------------------------------------------------------
def _safe_append(self, **kwargs) -> None:
"""Append an audit event, swallowing all exceptions.
Audit failures must never block the agent pipeline. All errors are
logged at WARNING level so operators can detect gaps in the log.
"""
try:
append_event(
agent_id=self.agent_id,
session_id=self.session_id,
db_session=self._open_session(),
**kwargs,
)
except Exception as exc:
logger.warning(
"audit: failed to append event "
"(agent=%s session=%s op=%s): %s",
self.agent_id,
self.session_id,
kwargs.get("operation", "?"),
exc,
)
# ---------------------------------------------------------------------------
# Private helpers
# ---------------------------------------------------------------------------
def _to_bytes(value: Any) -> bytes | None:
"""Convert a value to bytes for hashing; returns None for None."""
if value is None:
return None
if isinstance(value, bytes):
return value
if isinstance(value, str):
return value.encode("utf-8")
# JSON-serializable objects (dicts, lists, etc.)
return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
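The privacy note and `_to_bytes` above rely on canonical serialisation: two structurally equal tool inputs must produce the same digest regardless of key order. A self-contained sketch of that idea follows; `to_bytes` and `fingerprint` are illustrative names that approximate, but are not, the module's helpers.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def to_bytes(value: Any) -> bytes | None:
    """Convert a value to bytes for hashing; None stays None."""
    if value is None:
        return None
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    # Sorted keys and compact separators make the encoding independent of
    # dict insertion order and of whitespace choices.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def fingerprint(value: Any) -> str | None:
    """Return the SHA-256 hex digest of the value, or None for None."""
    data = to_bytes(value)
    return None if data is None else hashlib.sha256(data).hexdigest()
```

Only the digest would be stored, so the ledger can show that a particular input was processed without retaining the input itself.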
@@ -1,434 +0,0 @@
"""molecule_audit.ledger — HMAC-SHA256-chained SQLAlchemy audit event log.
EU AI Act Annex III compliance (Art. 12/13 record-keeping, Art. 17 quality
management system) for high-risk AI systems.
HMAC chain design (EDDI pattern, PBKDF2 + SHA-256)
----------------------------------------------------
Key derivation:
key = PBKDF2HMAC(
algorithm=SHA-256,
password=AUDIT_LEDGER_SALT, # from env — the shared secret
salt=b"molecule-audit-ledger-v1", # fixed domain separator
iterations=210_000,
length=32,
)
Canonical JSON (for HMAC input):
json.dumps(row_dict_without_hmac_field, sort_keys=True, separators=(",", ":"))
Timestamp is serialised as RFC-3339 seconds-precision with Z suffix
(e.g. "2026-04-17T12:34:56Z") so the format matches Go's time.Time.UTC().
Per-row HMAC:
hmac_hex = HMAC-SHA256(key, canonical_json.encode()).hexdigest()
Chain linkage:
prev_hmac = hmac field of the immediately prior row for this agent_id
(None / NULL for the first row of each agent)
Tamper-evidence: any row modification breaks all subsequent HMACs for that
agent_id.
Environment variables
---------------------
AUDIT_LEDGER_SALT REQUIRED. Secret salt used as PBKDF2 password.
Raises RuntimeError at first key-derivation call if unset.
AUDIT_LEDGER_DB Path to SQLite file.
Default: /var/log/molecule/audit_ledger.db
Override with a full SQLAlchemy URL (sqlite:///..., postgresql://...)
for non-SQLite backends.
"""
from __future__ import annotations
import hashlib
import hmac as _hmac_mod
import json
import logging
import os
from datetime import datetime, timezone
from typing import Optional
from uuid import uuid4
from sqlalchemy import Boolean, Column, DateTime, String, create_engine
from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
AUDIT_LEDGER_DB: str = os.environ.get(
"AUDIT_LEDGER_DB", "/var/log/molecule/audit_ledger.db"
)
# PBKDF2 parameters (must never change once events are written — all existing
# HMACs become unverifiable if parameters change).
_PBKDF2_SALT: bytes = b"molecule-audit-ledger-v1" # fixed domain separator
_PBKDF2_ITERATIONS: int = 210_000
_PBKDF2_DKLEN: int = 32
# Cached derived key (reset to None in tests when AUDIT_LEDGER_SALT changes).
_hmac_key: Optional[bytes] = None
# ---------------------------------------------------------------------------
# PBKDF2 key derivation
# ---------------------------------------------------------------------------
def _get_hmac_key() -> bytes:
"""Return (and cache) the 32-byte HMAC key derived from AUDIT_LEDGER_SALT.
Reads AUDIT_LEDGER_SALT exclusively from the environment — never from a
module-level attribute — so the secret is not exposed in the module
namespace. Raises RuntimeError if the env var is not set.
"""
global _hmac_key
if _hmac_key is None:
salt = os.environ.get("AUDIT_LEDGER_SALT", "")
if not salt:
raise RuntimeError(
"AUDIT_LEDGER_SALT environment variable is required but not set. "
"Generate a random 32-byte hex string and export it before "
"starting the agent: "
"export AUDIT_LEDGER_SALT=$(python3 -c "
"\"import secrets; print(secrets.token_hex(32))\")"
)
_hmac_key = hashlib.pbkdf2_hmac(
"sha256",
password=salt.encode("utf-8"),
salt=_PBKDF2_SALT,
iterations=_PBKDF2_ITERATIONS,
dklen=_PBKDF2_DKLEN,
)
return _hmac_key
def reset_hmac_key_cache() -> None:
"""Reset the cached HMAC key — call after changing AUDIT_LEDGER_SALT env var in tests."""
global _hmac_key
_hmac_key = None
# ---------------------------------------------------------------------------
# Canonical JSON helpers
# ---------------------------------------------------------------------------
def _ts_to_canonical(ts: datetime | None) -> str | None:
"""Format a datetime as RFC-3339 seconds-precision Z-suffixed string.
Strips microseconds and converts to UTC so the format is identical to
Go's ``time.Time.UTC().Format("2006-01-02T15:04:05Z")``.
"""
if ts is None:
return None
if ts.tzinfo is not None:
ts = ts.astimezone(timezone.utc)
return ts.strftime("%Y-%m-%dT%H:%M:%SZ")
def _to_canonical_dict(ev: "AuditEvent") -> dict:
"""Return the dict used as HMAC input — excludes the hmac field itself."""
return {
"agent_id": ev.agent_id,
"human_oversight_flag": ev.human_oversight_flag,
"id": ev.id,
"input_hash": ev.input_hash,
"model_used": ev.model_used,
"operation": ev.operation,
"output_hash": ev.output_hash,
"prev_hmac": ev.prev_hmac,
"risk_flag": ev.risk_flag,
"session_id": ev.session_id,
"timestamp": _ts_to_canonical(ev.timestamp),
}
def _compute_event_hmac(ev: "AuditEvent") -> str:
"""Compute HMAC-SHA256 hex digest of ev's canonical JSON.
Keys are sorted alphabetically (matching Python json.dumps sort_keys=True
and Go encoding/json.Marshal on a map). Separators are compact (no spaces)
so the output matches Go's json.Marshal.
"""
canonical = _to_canonical_dict(ev)
payload = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
key = _get_hmac_key()
return _hmac_mod.new(key, payload, "sha256").hexdigest()
# ---------------------------------------------------------------------------
# Content hashing helper (privacy-preserving)
# ---------------------------------------------------------------------------
def hash_content(content: str | bytes | None) -> str | None:
    """Return SHA-256 hex digest of content, or None if content is None.

    An empty string or empty bytes value is hashed like any other input;
    only None is passed through.

    Use this to record *that* specific content was processed without persisting
    the raw content itself (satisfies EU AI Act data-minimisation principles).
    """
    if content is None:
        return None
    if isinstance(content, str):
        content = content.encode("utf-8")
    return hashlib.sha256(content).hexdigest()
# ---------------------------------------------------------------------------
# SQLAlchemy model
# ---------------------------------------------------------------------------
class Base(DeclarativeBase):
pass
class AuditEvent(Base):
"""Append-only HMAC-chained audit event.
12 fields: 6 legally mandatory under EU AI Act Art. 12/13, plus 4 strongly
recommended, plus the 2-field HMAC chain (prev_hmac, hmac).
"""
__tablename__ = "audit_events"
# Identity
id = Column(String, primary_key=True, default=lambda: str(uuid4()))
timestamp = Column(
DateTime(timezone=True),
nullable=False,
default=lambda: datetime.now(timezone.utc),
)
# EU AI Act Art. 12 mandatory fields
agent_id = Column(String, nullable=False)
session_id = Column(String, nullable=False) # gen_ai.conversation.id
operation = Column(String, nullable=False) # task_start|llm_call|tool_call|task_end
# Privacy-preserving content fingerprints
input_hash = Column(String, nullable=True) # SHA-256 of input text
output_hash = Column(String, nullable=True) # SHA-256 of output text
# EU AI Act Art. 13 transparency fields
model_used = Column(String, nullable=True) # gen_ai.request.model (or tool name)
# Oversight flags (Art. 14 human oversight)
human_oversight_flag = Column(Boolean, nullable=False, default=False)
risk_flag = Column(Boolean, nullable=False, default=False)
# HMAC chain
prev_hmac = Column(String, nullable=True) # hmac of previous row for this agent_id
hmac = Column(String, nullable=False) # HMAC of this row's canonical JSON
def to_dict(self) -> dict:
"""Return a full dict suitable for API responses (ISO 8601 timestamp)."""
return {
"id": self.id,
"timestamp": self.timestamp.isoformat() if self.timestamp else None,
"agent_id": self.agent_id,
"session_id": self.session_id,
"operation": self.operation,
"input_hash": self.input_hash,
"output_hash": self.output_hash,
"model_used": self.model_used,
"human_oversight_flag": self.human_oversight_flag,
"risk_flag": self.risk_flag,
"prev_hmac": self.prev_hmac,
"hmac": self.hmac,
}
def __repr__(self) -> str:
return (
f"<AuditEvent id={self.id!r} agent_id={self.agent_id!r} "
f"op={self.operation!r} ts={self.timestamp!r}>"
)
# ---------------------------------------------------------------------------
# Engine / session factory
# ---------------------------------------------------------------------------
_engine = None
_SessionFactory = None
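def get_engine(db_url: str | None = None):
    """Return (and cache) the SQLAlchemy engine.

    Creates the ``audit_events`` table if it does not already exist.
    The engine is cached on first use, so ``db_url`` is ignored on later
    calls until ``reset_engine_cache()`` is called.
    """
    global _engine
    if _engine is None:
        url = db_url or _db_url_from_env()
        if url.startswith("sqlite:///"):
            _ensure_sqlite_parent(url)
        # Match on the URL scheme rather than a substring, so a non-SQLite
        # URL that happens to contain "sqlite" is not given a SQLite-only
        # connect argument.
        connect_args = (
            {"check_same_thread": False} if url.startswith("sqlite") else {}
        )
        _engine = create_engine(url, connect_args=connect_args)
        Base.metadata.create_all(_engine)
    return _engine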
def _db_url_from_env() -> str:
"""Build the DB URL from environment variables."""
db = AUDIT_LEDGER_DB
if db.startswith(("sqlite://", "postgresql://", "postgres://")):
return db
return f"sqlite:///{db}"
def _ensure_sqlite_parent(url: str) -> None:
"""Create the parent directory for a sqlite:///path URL if needed."""
path = url[len("sqlite:///"):]
if path and path != ":memory:":
os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
def get_session_factory(db_url: str | None = None):
"""Return (and cache) a SQLAlchemy sessionmaker bound to the engine."""
global _SessionFactory
if _SessionFactory is None:
_SessionFactory = sessionmaker(bind=get_engine(db_url))
return _SessionFactory
def reset_engine_cache() -> None:
"""Reset the cached engine and session factory — for tests only."""
global _engine, _SessionFactory
_engine = None
_SessionFactory = None
# ---------------------------------------------------------------------------
# Core write API
# ---------------------------------------------------------------------------
def _prev_hmac_for_agent(agent_id: str, session: Session) -> str | None:
"""Return the hmac of the most recent event for agent_id (None if none)."""
last = (
session.query(AuditEvent)
.filter(AuditEvent.agent_id == agent_id)
.order_by(AuditEvent.timestamp.desc(), AuditEvent.id.desc())
.first()
)
return last.hmac if last else None
def append_event(
    agent_id: str,
    session_id: str,
    operation: str,
    *,
    input_hash: str | None = None,
    output_hash: str | None = None,
    model_used: str | None = None,
    human_oversight_flag: bool = False,
    risk_flag: bool = False,
    db_session: Session | None = None,
    db_url: str | None = None,
) -> AuditEvent:
    """Append one signed, chained event to the ledger and return it.

    Derives the HMAC key from AUDIT_LEDGER_SALT (raises RuntimeError if unset),
    looks up the previous row's HMAC to form the chain link, signs the new row,
    and writes it to the database.

    Parameters
    ----------
    agent_id:             Identity of the agent (typically WORKSPACE_ID).
    session_id:           Task / conversation ID (gen_ai.conversation.id).
    operation:            One of: task_start, llm_call, tool_call, task_end.
    input_hash:           SHA-256 of the input (use hash_content()).
    output_hash:          SHA-256 of the output.
    model_used:           Model name (for llm_call) or tool name (for tool_call).
    human_oversight_flag: True if human review was required / triggered.
    risk_flag:            True if a risk condition was detected.
    db_session:           Pre-opened Session (created + closed internally if None).
    db_url:               SQLAlchemy URL override (used if db_session is None).
    """
    own_session = db_session is None
    if own_session:
        factory = get_session_factory(db_url)
        db_session = factory()
    try:
        prev_hmac = _prev_hmac_for_agent(agent_id, db_session)
        event = AuditEvent(
            id=str(uuid4()),
            timestamp=datetime.now(timezone.utc),
            agent_id=agent_id,
            session_id=session_id,
            operation=operation,
            input_hash=input_hash,
            output_hash=output_hash,
            model_used=model_used,
            human_oversight_flag=human_oversight_flag,
            risk_flag=risk_flag,
            prev_hmac=prev_hmac,
            hmac="",  # placeholder, replaced below once all fields are set
        )
        # Compute the real HMAC now that all fields are populated.
        event.hmac = _compute_event_hmac(event)
        db_session.add(event)
        db_session.commit()
        db_session.refresh(event)
        return event
    except Exception:
        # Roll back for caller-supplied sessions too. This function already
        # commits on the caller's session, and a session left in a failed
        # transaction would reject every later append (LedgerHooks reuses
        # one session for the whole task).
        db_session.rollback()
        raise
    finally:
        if own_session:
            db_session.close()
# ---------------------------------------------------------------------------
# Verification
# ---------------------------------------------------------------------------
def verify_chain(agent_id: str, db_session: Session) -> bool:
"""Return True if the entire HMAC chain for agent_id is intact.
Iterates all events for agent_id in chronological order and checks:
1. Each row's stored hmac matches the freshly-computed HMAC.
2. Each row's prev_hmac equals the prior row's hmac (None for first row).
Returns False (and logs a warning) at the first broken link.
Returns True vacuously when there are no events.
"""
events = (
db_session.query(AuditEvent)
.filter(AuditEvent.agent_id == agent_id)
.order_by(AuditEvent.timestamp.asc(), AuditEvent.id.asc())
.all()
)
expected_prev: str | None = None
for ev in events:
expected_hmac = _compute_event_hmac(ev)
if not _hmac_mod.compare_digest(ev.hmac, expected_hmac):
logger.warning(
"audit: HMAC mismatch at event %s (agent=%s): "
"stored=%r computed=%r",
ev.id,
agent_id,
ev.hmac,
expected_hmac,
)
return False
if not _hmac_mod.compare_digest(ev.prev_hmac or "", expected_prev or ""):
logger.warning(
"audit: chain break at event %s (agent=%s): "
"stored prev_hmac=%r expected=%r",
ev.id,
agent_id,
ev.prev_hmac,
expected_prev,
)
return False
expected_prev = ev.hmac
return True
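The chain design described in the ledger module docstring (PBKDF2-derived key, canonical JSON, per-row HMAC, `prev_hmac` linkage) can be illustrated with the standard library alone. The sketch below is a simplified model, not the ledger's implementation: rows are plain dicts in a list rather than SQLAlchemy objects, the `seq` field is invented for the example, and the secret is a throwaway value.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def derive_key(secret: str) -> bytes:
    """Derive a 32-byte HMAC key; the fixed salt acts as a domain separator."""
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode("utf-8"), b"molecule-audit-ledger-v1", 210_000, dklen=32
    )


def sign(key: bytes, row: dict) -> str:
    """HMAC-SHA256 over the canonical JSON of the row, excluding its own hmac."""
    body = {k: v for k, v in row.items() if k != "hmac"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, "sha256").hexdigest()


def append(chain: list[dict], key: bytes, operation: str) -> None:
    """Link the new row to the previous row's hmac, then sign it."""
    row = {
        "seq": len(chain),
        "operation": operation,
        "prev_hmac": chain[-1]["hmac"] if chain else None,
    }
    row["hmac"] = sign(key, row)
    chain.append(row)


def verify(chain: list[dict], key: bytes) -> bool:
    """Check every row's signature and its link to the preceding row."""
    expected_prev = None
    for row in chain:
        if not hmac.compare_digest(row["hmac"], sign(key, row)):
            return False
        if row["prev_hmac"] != expected_prev:
            return False
        expected_prev = row["hmac"]
    return True
```

Because `prev_hmac` is part of each signed payload, editing a row invalidates that row's signature, and deleting a row breaks the link check on its successor.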
@@ -1,136 +0,0 @@
"""molecule_audit.verify — CLI to verify an agent's HMAC chain integrity.
Usage
-----
python -m molecule_audit.verify --agent-id <id> [--db <url>]
Options
-------
--agent-id Agent ID whose chain to verify (required).
--db SQLAlchemy DB URL override.
Defaults to AUDIT_LEDGER_DB env var or /var/log/molecule/audit_ledger.db.
Exit codes
----------
0 Chain is valid (or no events found for this agent).
1 Chain is broken — tampered or corrupted row(s) detected.
2 Configuration error (e.g. AUDIT_LEDGER_SALT not set).
3 Database error (e.g. file not found, connection refused).
Example
-------
export AUDIT_LEDGER_SALT=<your-secret>
export AUDIT_LEDGER_DB=/var/log/molecule/audit_ledger.db
python -m molecule_audit.verify --agent-id my-workspace-id
# CHAIN VALID (42 events)
"""
from __future__ import annotations
import argparse
import hmac as _hmac_mod
import sys
def main(argv=None) -> None:
    parser = argparse.ArgumentParser(
        prog="python -m molecule_audit.verify",
        description=(
            "Verify the HMAC chain integrity for a given agent's audit log. "
            "Exit 0 = valid, 1 = broken, 2 = config error, 3 = DB error."
        ),
    )
    parser.add_argument(
        "--agent-id",
        required=True,
        metavar="AGENT_ID",
        help="Agent workspace ID to verify.",
    )
    parser.add_argument(
        "--db",
        default=None,
        metavar="URL",
        help=(
            "SQLAlchemy DB URL (e.g. sqlite:///path.db or "
            "postgresql://user:pass@host/db). "
            "Defaults to AUDIT_LEDGER_DB env var."
        ),
    )
    args = parser.parse_args(argv)

    # Defer the import so import-time failures produce clean output. A missing
    # AUDIT_LEDGER_SALT is not raised here: it surfaces at the first key
    # derivation, inside the verification block below.
    try:
        from molecule_audit.ledger import (
            AuditEvent,
            _compute_event_hmac,
            get_session_factory,
            verify_chain,
        )
    except RuntimeError as exc:
        print(f"ERROR: {exc}", file=sys.stderr)
        sys.exit(2)

    try:
        factory = get_session_factory(args.db)
        session = factory()
    except Exception as exc:
        print(f"ERROR: could not open database: {exc}", file=sys.stderr)
        sys.exit(3)

    try:
        from sqlalchemy import asc

        n_events = (
            session.query(AuditEvent)
            .filter(AuditEvent.agent_id == args.agent_id)
            .count()
        )
        if n_events == 0:
            print(f"No audit events found for agent_id={args.agent_id!r}")
            sys.exit(0)
        valid = verify_chain(args.agent_id, session)
        if valid:
            print(f"CHAIN VALID ({n_events} events)")
            sys.exit(0)
        else:
            # Walk the chain manually to report the exact broken event.
            events = (
                session.query(AuditEvent)
                .filter(AuditEvent.agent_id == args.agent_id)
                .order_by(asc(AuditEvent.timestamp), asc(AuditEvent.id))
                .all()
            )
            expected_prev = None
            for ev in events:
                expected_hmac = _compute_event_hmac(ev)
                if not _hmac_mod.compare_digest(ev.hmac, expected_hmac):
                    print(
                        f"CHAIN BROKEN at event {ev.id} "
                        f"(HMAC mismatch: stored={ev.hmac[:12]}... "
                        f"computed={expected_hmac[:12]}...)"
                    )
                    sys.exit(1)
                if not _hmac_mod.compare_digest(ev.prev_hmac or "", expected_prev or ""):
                    print(
                        f"CHAIN BROKEN at event {ev.id} "
                        f"(prev_hmac mismatch: stored={ev.prev_hmac} "
                        f"expected={expected_prev})"
                    )
                    sys.exit(1)
                expected_prev = ev.hmac
            # verify_chain said broken but we couldn't find the exact event
            print("CHAIN BROKEN (position unknown; run with DEBUG logging)")
            sys.exit(1)
    except RuntimeError as exc:
        # _get_hmac_key() raises RuntimeError when AUDIT_LEDGER_SALT is unset.
        # Report it as a configuration error (exit 2), as documented, instead
        # of letting it fall through to the database-error branch.
        print(f"ERROR: {exc}", file=sys.stderr)
        sys.exit(2)
    except Exception as exc:
        print(f"ERROR: verification failed: {exc}", file=sys.stderr)
        sys.exit(3)
    finally:
        session.close()
if __name__ == "__main__":
main()
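The exit-code contract documented for the CLI (0 valid, 1 broken, 2 configuration error, 3 database error) can be expressed as a small classifier that is testable without a database. This is a sketch under one assumption taken from the ledger module: configuration problems surface as `RuntimeError`. `run_verification` and the constant names are hypothetical.

```python
import sys
from typing import Callable

EXIT_VALID = 0
EXIT_BROKEN = 1
EXIT_CONFIG = 2
EXIT_DB = 3


def run_verification(check: Callable[[], bool]) -> int:
    """Run a chain check and map its outcome to a documented exit code."""
    try:
        valid = check()
    except RuntimeError as exc:
        # Assumed to mean a configuration problem, e.g. a missing salt.
        print(f"ERROR: {exc}", file=sys.stderr)
        return EXIT_CONFIG
    except Exception as exc:
        # Anything else is treated as a database or I/O failure.
        print(f"ERROR: verification failed: {exc}", file=sys.stderr)
        return EXIT_DB
    return EXIT_VALID if valid else EXIT_BROKEN
```

Returning the code rather than calling `sys.exit` inside the helper keeps the mapping easy to assert on; the entry point can then call `sys.exit(run_verification(...))`.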
@@ -1,69 +0,0 @@
"""Build a JSON-RPC handler that returns ``-32603 "agent not configured"``.
Used by the workspace runtime when ``adapter.setup()`` fails (most often
because an LLM credential is missing or rotated). Lets ``/.well-known/agent-card.json``
keep serving 200 — the workspace stays REACHABLE for canvas/operator
introspection — while message-send requests get a clear, immediate
error instead of silently timing out.
Kept as its own module so the behavior is unit-testable without booting
the whole runtime (main.py is ``# pragma: no cover``).
"""
from __future__ import annotations
from typing import Awaitable, Callable
from starlette.requests import Request
from starlette.responses import JSONResponse
from secret_redactor import redact_secrets
def make_not_configured_handler(
reason: str | None,
) -> Callable[[Request], Awaitable[JSONResponse]]:
"""Return a Starlette POST handler that always 503s with JSON-RPC -32603.
``reason`` is surfaced in the JSON-RPC ``error.data`` field so canvas
can render "agent not configured: <reason>" to the user. Pass the
stringified ``adapter.setup()`` exception. ``None`` falls back to a
generic "adapter.setup() failed".
Secret redaction (issue molecule-core#2760): ``reason`` is run
through ``secret_redactor.redact_secrets`` once, when the handler
is built. If a future adapter author writes ``raise
RuntimeError(f"auth failed for {token}")``, the token is replaced
with ``<redacted-secret>`` BEFORE it lands in the response —
closes the structural leak path PR #2756 introduced. Per-request
hot path stays unchanged (one cached string, no re-redaction).
The handler echoes the request's JSON-RPC ``id`` when present so a
well-behaved JSON-RPC client can correlate the error to its request.
Malformed bodies (non-JSON, missing id) get ``id: null`` per spec.
"""
# Redact at handler-build time, not per-request, so the hot path
# stays a constant lookup. The fallback string can't carry secrets
# but we still pass it through redact_secrets() so a future change
# to the fallback can't accidentally introduce a leak.
fallback = redact_secrets(reason or "adapter.setup() failed")
async def _handler(request: Request) -> JSONResponse:
try:
body = await request.json()
except Exception: # noqa: BLE001
body = {}
return JSONResponse(
{
"jsonrpc": "2.0",
"id": body.get("id") if isinstance(body, dict) else None,
"error": {
"code": -32603,
"message": "Internal error: agent not configured",
"data": fallback,
},
},
status_code=503,
)
return _handler
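The response envelope built by the handler above can be sketched as a pure function, which makes the id-echo rule easy to check without Starlette. `not_configured_error` is a hypothetical name, and secret redaction is deliberately left out of the sketch; the real handler redacts `reason` once at build time.

```python
from __future__ import annotations

from typing import Any


def not_configured_error(body: Any, reason: str | None) -> dict:
    """Build the JSON-RPC 2.0 error object for an unconfigured agent."""
    return {
        "jsonrpc": "2.0",
        # Echo the request id only when the body is a JSON object; arrays,
        # scalars, and unparseable bodies get a null id.
        "id": body.get("id") if isinstance(body, dict) else None,
        "error": {
            "code": -32603,
            "message": "Internal error: agent not configured",
            "data": reason or "adapter.setup() failed",
        },
    }
```

A client that sent `{"jsonrpc": "2.0", "id": 7, ...}` gets `id: 7` back and can correlate the error with its request, while a malformed body still receives a well-formed error object.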
@@ -1,265 +0,0 @@
"""Workspace auth-token store (Phase 30.1).
Single source of truth for this workspace's authentication token. The
token is issued by the platform on the first successful
``POST /registry/register`` call and travels with every subsequent
heartbeat / update-card / (later) secrets-pull / A2A request.
The token is persisted to ``<configs>/.auth_token`` so it survives
restarts — we only expect to receive it once from the platform, since
``/registry/register`` no-ops token issuance for workspaces that already
have one on file.
Storage:
${CONFIGS_DIR}/.auth_token # 0600, one line, no trailing newline
Callers interact with three functions:
:func:`get_token` — returns the cached token or None
:func:`save_token` — persists a freshly-issued token
:func:`auth_headers`— builds the Authorization header dict for httpx
"""
from __future__ import annotations
import logging
import os
import threading
from pathlib import Path
import configs_dir
logger = logging.getLogger(__name__)
# In-process cache so we don't hit disk on every heartbeat. The heartbeat
# loop fires on a short interval and reading a tiny file 10x per minute
# is wasteful. The file is the durable copy; this var is the hot path.
_cached_token: str | None = None
# Per-workspace token registry — populated by mcp_cli when the operator
# runs a multi-workspace external agent (MOLECULE_WORKSPACES env var).
# Keyed by workspace_id, value is the bearer token issued by that
# workspace's tenant. Distinct from `_cached_token` (which is the
# single-workspace path's token); the two coexist so single-workspace
# back-compat is preserved exactly.
#
# Lock guards mutations from the registration phase (one writer per
# workspace, but the writers run in main(), not in heartbeat threads).
# Reads are lock-free for the hot path; the dict is finalized before
# any heartbeat / poller thread starts.
_WORKSPACE_TOKENS: dict[str, str] = {}
_WORKSPACE_TOKENS_LOCK = threading.Lock()
def _token_file() -> Path:
"""Path to the on-disk token file. Resolved via configs_dir so
in-container (/configs) and external-runtime (~/.molecule-workspace)
operators land on a writable location automatically. Explicit
CONFIGS_DIR env var still wins."""
return configs_dir.resolve() / ".auth_token"
def get_token() -> str | None:
"""Return the cached token, reading it from disk on first call.
Resolution order:
1. In-process cache (hot path)
2. ``${CONFIGS_DIR}/.auth_token`` file (in-container default —
the platform writes this on provision and rotates it on
restart)
3. ``MOLECULE_WORKSPACE_TOKEN`` env var (external-runtime path —
operators running the universal MCP server outside a
container have no /configs volume to populate, so they pass
the token via env)
File-first preserves in-container behavior unchanged: containers
always have /configs/.auth_token on disk, env-var fallback only
fires when there's no file. This is additive — no existing caller
sees a behavior change.
"""
global _cached_token
if _cached_token is not None:
return _cached_token
path = _token_file()
if path.exists():
try:
tok = path.read_text().strip()
except OSError as exc:
logger.warning("platform_auth: failed to read %s: %s", path, exc)
tok = ""
if tok:
_cached_token = tok
return tok
# File missing or empty — fall back to env (external-runtime path).
env_tok = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
if env_tok:
_cached_token = env_tok
return env_tok
return None
def save_token(token: str) -> None:
"""Persist a newly-issued token. Creates the file with 0600 mode atomically.
Uses ``os.open(O_CREAT, 0o600)`` so the file is never world-readable,
even transiently. The previous ``write_text()`` + ``chmod()`` approach
had a TOCTOU window where a concurrent reader could access the token
between the two syscalls (M4 — flagged in security audit cycle 10).
Idempotent — if an identical token is already on disk we skip the
write so we don't churn the file's mtime or trigger spurious
filesystem watchers."""
global _cached_token
token = token.strip()
if not token:
raise ValueError("platform_auth: refusing to save empty token")
if get_token() == token:
return
path = _token_file()
path.parent.mkdir(parents=True, exist_ok=True)
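# O_CREAT | O_WRONLY | O_TRUNC with mode=0o600 creates the file with
# restricted permissions in a single syscall, eliminating the TOCTOU
# window. The mode applies only at creation (and is masked by the
# umask); truncating an existing file leaves its permissions as they are.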
fd = os.open(str(path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
try:
os.write(fd, token.encode())
finally:
os.close(fd)
_cached_token = token
def register_workspace_token(workspace_id: str, token: str) -> None:
"""Register a per-workspace bearer token in the multi-workspace registry.
Called by ``mcp_cli`` once per entry in the ``MOLECULE_WORKSPACES``
env var so per-workspace heartbeat / poller threads can resolve their
own auth via ``auth_headers(workspace_id=...)`` without each thread
closing over a token literal.
Idempotent: re-registering the same workspace_id with the same token
is a no-op; with a different token it overwrites and logs at INFO
(the legitimate case is operator token rotation between restarts).
"""
workspace_id = (workspace_id or "").strip()
token = (token or "").strip()
if not workspace_id or not token:
return
with _WORKSPACE_TOKENS_LOCK:
prior = _WORKSPACE_TOKENS.get(workspace_id)
if prior == token:
return
if prior is not None:
logger.info(
"platform_auth: workspace_id %s token rotated", workspace_id,
)
_WORKSPACE_TOKENS[workspace_id] = token
def get_workspace_token(workspace_id: str) -> str | None:
"""Return the per-workspace token from the registry, or None.
Lookup is lock-free: writes happen in main() before threads start,
reads are stable thereafter.
"""
return _WORKSPACE_TOKENS.get((workspace_id or "").strip())
def list_registered_workspaces() -> list[str]:
"""Return the workspace IDs currently in the per-workspace registry.
Empty list when no multi-workspace registration has happened (i.e.
single-workspace operators using the legacy WORKSPACE_ID env path —
those callers should fall back to the module-level WORKSPACE_ID).
Used by ``a2a_tools.tool_list_peers`` to aggregate peers across all
workspaces an external agent has registered against, so a
multi-workspace operator can see the full peer surface in one call
instead of having to query each workspace separately.
"""
with _WORKSPACE_TOKENS_LOCK:
return list(_WORKSPACE_TOKENS.keys())
def auth_headers(workspace_id: str | None = None) -> dict[str, str]:
"""Return a header dict to merge into httpx calls. Empty if no token
is available yet — callers send the request as-is and the platform's
heartbeat handler grandfathers pre-token workspaces through until
their next /registry/register issues one.
Always sets ``Origin`` to ``PLATFORM_URL`` when that env var is set.
On hosted SaaS deployments the tenant's edge WAF requires a same-
origin header — without it ``/workspaces/*`` and ``/registry/*/peers``
requests get silently rewritten to the canvas Next.js app, which has
no such routes and returns an empty 404. Inside-container calls are
unaffected (Docker-internal PLATFORM_URLs aren't behind the WAF).
Discovered while smoke-testing the molecule-mcp external-runtime
path against a live tenant — every tool call returned "not found"
because the WAF was eating them.
Token resolution order:
1. ``workspace_id`` arg → per-workspace registry
(multi-workspace external agent — set by mcp_cli)
2. Single-workspace cache + .auth_token file + env var
(pre-existing path; back-compat unchanged)
Single-workspace operators see no behavior change: ``auth_headers()``
with no arg routes through the legacy resolution path exactly as
before. Multi-workspace operators pass ``workspace_id`` so each
thread (heartbeat, poller, send_message_to_user) authenticates
against the correct workspace.
"""
headers: dict[str, str] = {}
platform_url = os.environ.get("PLATFORM_URL", "").strip()
if platform_url:
headers["Origin"] = platform_url
tok: str | None = None
if workspace_id:
tok = get_workspace_token(workspace_id)
if tok is None:
tok = get_token()
if tok:
headers["Authorization"] = f"Bearer {tok}"
return headers
def self_source_headers(workspace_id: str) -> dict[str, str]:
"""Return auth headers PLUS X-Workspace-ID identifying this workspace
as the source of the request.
Use this for any POST the workspace's own runtime fires against the
platform's A2A endpoints — heartbeat self-messages, initial_prompt,
idle-loop fires, peer-to-peer A2A from runtime tools. Without the
X-Workspace-ID header the platform's a2a_receive logger writes
source_id=NULL, which the canvas's My Chat tab interprets as a
user-typed message and renders the internal prompt to the user.
See workspace-server/internal/handlers/a2a_proxy.go:184 for the
server-side classification rule.
Centralised here so adding a new system header (e.g. a per-fire
correlation ID) only touches one place — and so that any
workspace→A2A POST that doesn't use this helper stands out in
review as a probable bug."""
# Pass workspace_id through to auth_headers so the bearer token
# comes from the per-workspace registry when set — otherwise a
# multi-workspace operator's source-tagged POST authenticates with
# the legacy single token (or none) and the platform rejects with
# 401, or worse silently logs the wrong source.
return {**auth_headers(workspace_id), "X-Workspace-ID": workspace_id}
def clear_cache() -> None:
"""Reset the in-memory cache. Used by tests that write fresh token
files between cases."""
global _cached_token
_cached_token = None
with _WORKSPACE_TOKENS_LOCK:
_WORKSPACE_TOKENS.clear()
def refresh_cache() -> str | None:
"""Force re-read of the token from disk, discarding the in-process cache.
Use this when a 401 response suggests the cached token is stale —
e.g. after the platform rotates tokens during a restart (issue #1877).
Returns the (new) token value or None if not found/error."""
global _cached_token
_cached_token = None
return get_token()
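The permission handling in `save_token` can be tried in isolation. The following is a minimal sketch assuming a POSIX system; `write_private_file` and `file_mode` are hypothetical helper names introduced here for illustration and are not part of `platform_auth`.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private_file(path: Path, text: str) -> None:
    """Hypothetical helper: write text to a file that is created with mode 0600."""
    # The mode argument is applied (minus the umask) only if the call
    # creates the file; an existing file keeps its current permissions.
    fd = os.open(str(path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, text.encode())
    finally:
        os.close(fd)


def file_mode(path: Path) -> int:
    """Return only the permission bits of path."""
    return stat.S_IMODE(path.stat().st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / ".auth_token"
        write_private_file(target, "example-token")
        print(oct(file_mode(target)), target.read_text())
```

Because the mode only takes effect on creation, a caller that must guarantee 0600 on a pre-existing file would need an additional permission check; the sketch does not attempt that.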
-145
@@ -1,145 +0,0 @@
"""Auth gate for the /internal/* Starlette routes.
The platform calls into the workspace's HTTP server using a per-workspace
shared secret minted at provision time and stored in
``/configs/.platform_inbound_secret`` (see migration 044 + RFC #2312).
The workspace validates by string-equality against the file content —
the platform side stores the same plaintext in ``workspaces
.platform_inbound_secret`` and reads it back on every forward call.
Asymmetric to ``platform_auth.py``:
platform_auth.py platform_inbound_auth.py
──────────────── ────────────────────────
workspace → platform platform → workspace
/configs/.auth_token /configs/.platform_inbound_secret
workspace presents bearer workspace validates bearer
Fail-closed semantics (mirrors transcript_auth.py): if the secret file is
missing, empty, or unreadable, every request is rejected. The platform
will surface this as a structural error rather than silently sending
unauthenticated requests through.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
import configs_dir
logger = logging.getLogger(__name__)
# In-process cache so we don't hit disk on every forward call. Same
# pattern as platform_auth._cached_token. The file is the durable copy;
# this var is the hot path.
_cached_secret: str | None = None
def _secret_file() -> Path:
"""Path to the on-disk inbound-secret file. Resolved via configs_dir
— /configs in-container, ~/.molecule-workspace for external-runtime
operators. Explicit CONFIGS_DIR env var wins."""
return configs_dir.resolve() / ".platform_inbound_secret"
def get_inbound_secret() -> str | None:
"""Return the cached inbound secret, reading from disk on first call.
Returns None if the file is missing, empty, or unreadable. Callers
MUST treat None as an auth failure (fail-closed) — never substitute
a default or skip-auth-on-missing semantics.
"""
global _cached_secret
if _cached_secret is not None:
return _cached_secret
path = _secret_file()
if not path.exists():
return None
try:
secret = path.read_text().strip()
except OSError as exc:
logger.warning("platform_inbound_auth: read %s failed: %s", path, exc)
return None
if not secret:
return None
_cached_secret = secret
return secret
def reset_cache() -> None:
"""Drop the in-process cache. Used by tests + the rare runtime-side
path that needs to re-read after the file is overwritten (e.g. a
rotation flow lands in the future)."""
global _cached_secret
_cached_secret = None
def save_inbound_secret(secret: str) -> None:
"""Persist a freshly-received platform_inbound_secret to disk.
Called from the /registry/register response handler when the platform
returns a `platform_inbound_secret` field. Like platform_auth.save_token,
it writes a 0600 file in CONFIGS_DIR; unlike save_token, it writes via
tmp + rename so a concurrent reader never sees a partial file.
Idempotent: writing the same value over an existing file is a no-op
from the workspace's perspective. Resets the in-process cache so the
next get_inbound_secret() returns the freshly-written value (matters
when a future rotation flow lands and the platform sends a different
secret on a subsequent register call).
"""
global _cached_secret
if not secret:
return
path = _secret_file()
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
try:
# Open with 0600 from the start so a concurrent reader can never
# see a 0644-default fd before the chmod. mode= is honored by
# os.open underneath; pathlib.write_text does not expose it.
fd = os.open(str(tmp), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w") as f:
f.write(secret)
os.replace(str(tmp), str(path))
# Race-safe in-process cache update: clear first, then let next
# caller re-read disk. Avoids the "stored new, cache still has
# old" window if get_inbound_secret races with this write.
_cached_secret = None
except OSError as exc:
logger.warning("platform_inbound_auth: save %s failed: %s", path, exc)
# Best-effort cleanup of the tmp file.
try:
os.unlink(str(tmp))
except OSError as cleanup_exc:
logger.debug("platform_inbound_auth: unlink tmp %s failed: %s", tmp, cleanup_exc)
def inbound_authorized(expected_secret: str | None, auth_header: str) -> bool:
"""Return True iff a /internal/* request should be served.
Args:
expected_secret: the workspace's stored inbound secret, or None
if /configs/.platform_inbound_secret is absent / empty /
unreadable.
auth_header: raw Authorization request header value.
Behavior:
- None / empty expected → fail closed. A missing secret file
is an auth failure, not a bypass.
- Non-empty expected → strict string-equality against
"Bearer <secret>". Bearer prefix is case-sensitive (matches
the platform's wsauth.BearerTokenFromHeader contract).
Constant-time comparison is used to avoid leaking the secret one
byte at a time via timing analysis on a network-reachable endpoint.
"""
if not expected_secret:
return False
expected = f"Bearer {expected_secret}"
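# hmac.compare_digest is the stdlib constant-time compare. It accepts
# str arguments only when both are ASCII, so compare UTF-8 bytes: a
# non-ASCII header then returns False instead of raising TypeError.
# A length mismatch may reveal the lengths through timing, but never
# the secret's value.
import hmac
return hmac.compare_digest(auth_header.encode("utf-8"), expected.encode("utf-8"))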
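The fail-closed rule and the constant-time comparison can be exercised on their own. This is a minimal sketch, not the module's code: `bearer_matches` is a hypothetical name, and comparing encoded bytes is a choice made here so that non-ASCII input is rejected without an exception.

```python
from __future__ import annotations

import hmac


def bearer_matches(expected_secret: str | None, auth_header: str) -> bool:
    """Hypothetical helper: constant-time check of an Authorization header."""
    if not expected_secret:
        # Fail closed: with no stored secret, no request is authorized.
        return False
    expected = f"Bearer {expected_secret}".encode("utf-8")
    # compare_digest raises TypeError for non-ASCII str arguments, so both
    # sides are compared as bytes.
    return hmac.compare_digest(auth_header.encode("utf-8"), expected)
```

A missing or empty secret is rejected before any comparison runs, so an empty `Authorization` header can never match an empty expected value.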
-107
@@ -1,107 +0,0 @@
# Platform tool registry
Single source of truth for every tool the platform exposes to agents
(A2A delegation, hierarchical memory, broadcast, introspection).
## Why this exists
Pre-#2240, three places independently declared each tool:
1. **MCP server** (`workspace/a2a_mcp_server.py`) — the `TOOLS` JSON list
2. **LangChain `@tool` wrappers** (`workspace/builtin_tools/{delegation,memory}.py`)
3. **Agent-facing system-prompt docs** (`workspace/executor_helpers.py`)
Adding a tool to one and forgetting the others happened repeatedly. The
canonical case: `send_message_to_user` was registered in MCP TOOLS but
the executor_helpers doc string never mentioned it, so agents saw the
tool as available but had no usage guidance — a silent capability
regression.
## What the registry does
`registry.py` defines each tool ONCE as a frozen `ToolSpec`:
```python
ToolSpec(
    name="delegate_task",
    short="Delegate a task to a peer workspace via A2A and WAIT for the response.",
    when_to_use="Use for QUICK questions and small sub-tasks where you can afford to wait inline...",
    input_schema={...},  # JSON Schema, consumed by MCP server
    impl=tool_delegate_task,  # the actual coroutine
    section="a2a",  # which prompt section it belongs to
)
```
Adapters consume specs; no hardcoded names anywhere else:
- **MCP server** builds its `TOOLS` list from `_PLATFORM_TOOL_SPECS` at import time
- **LangChain `@tool` wrappers** read `name=spec.name` from the registry
- **Doc generator** (`executor_helpers._render_section()`) produces the
system-prompt block from `spec.short` (bullet) + `spec.when_to_use`
(heading + paragraph)
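The "define once, derive everywhere" idea can be sketched in a few lines. Everything below is illustrative: `MiniToolSpec` is a hypothetical, trimmed-down stand-in for `ToolSpec` (no `impl`), and `build_mcp_tools` / `render_section` are assumed shapes, not the real `a2a_mcp_server` or `executor_helpers` code. The dict keys and the rendered layout are assumptions as well.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MiniToolSpec:
    """Hypothetical stand-in for the registry's ToolSpec."""

    name: str
    short: str
    when_to_use: str
    input_schema: dict[str, Any]
    section: str


SPECS = [
    MiniToolSpec(
        name="list_peers",
        short="List the workspaces this agent can communicate with.",
        when_to_use="Call this first when you don't know the target's ID.",
        input_schema={"type": "object", "properties": {}},
        section="a2a",
    ),
]


def build_mcp_tools(specs: list[MiniToolSpec]) -> list[dict[str, Any]]:
    """Derive tool entries from specs; no name or description is hardcoded."""
    return [
        {"name": s.name, "description": s.short, "inputSchema": s.input_schema}
        for s in specs
    ]


def render_section(specs: list[MiniToolSpec], section: str) -> str:
    """Render one bullet per tool from `short`, then a heading and paragraph."""
    chosen = [s for s in specs if s.section == section]
    bullets = [f"- {s.name}: {s.short}" for s in chosen]
    details = [f"### {s.name}\n{s.when_to_use}" for s in chosen]
    return "\n".join(bullets + details)
```

Since both functions read the same spec objects, renaming a tool in one place changes the tool list and the prompt text together.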
## CLI subprocess block — special case
Non-MCP runtimes (ollama, custom subprocess adapters) use a separate
hand-maintained block in `executor_helpers._A2A_INSTRUCTIONS_CLI` because
the CLI subcommand vocabulary (`peers`, `delegate`, `status`, `info`)
differs from the MCP tool names (`list_peers`, `delegate_task`, etc.).
Auto-generation would lose the readable invocation syntax.
Alignment is enforced via `_CLI_A2A_COMMAND_KEYWORDS` (in
`executor_helpers.py`): every a2a-section spec must be keyed there with
either a CLI subcommand keyword OR an explicit `None` if the tool is
intentionally not exposed via subprocess (e.g.
`send_message_to_user` because its structured `attachments` field
doesn't survive positional-arg shell invocation).
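The alignment rule amounts to two small checks. The sketch below is an assumption-laden illustration: the tool list, the mapping values, and `CLI_BLOCK` are made-up samples (only `peers` vs `list_peers` and the `None` entry for `send_message_to_user` come from the text above), and `unmapped_tools` / `missing_keywords` are hypothetical helper names rather than the real test functions.

```python
from __future__ import annotations

# Sample a2a tool names; the real list comes from the registry.
A2A_TOOL_NAMES = ["list_peers", "delegate_task", "send_message_to_user"]

# Sample mapping: a CLI keyword, or None when the tool is intentionally
# not exposed through the subprocess interface.
CLI_KEYWORDS: dict[str, str | None] = {
    "list_peers": "peers",
    "delegate_task": "delegate",  # assumed pairing, for illustration
    "send_message_to_user": None,
}

# Stand-in for the hand-maintained CLI instruction block.
CLI_BLOCK = "Run `peers` to list peers and `delegate` to hand off a task."


def unmapped_tools(names: list[str], mapping: dict[str, str | None]) -> list[str]:
    """Tools that have no mapping decision at all (neither keyword nor None)."""
    return [name for name in names if name not in mapping]


def missing_keywords(mapping: dict[str, str | None], block: str) -> list[str]:
    """Keywords present in the mapping but absent from the doc block."""
    return [kw for kw in mapping.values() if kw is not None and kw not in block]
```

An explicit `None` counts as a decision, so it passes the first check and is skipped by the second; only a tool with no key at all is flagged.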
## Tests that catch drift
`workspace/tests/test_platform_tools.py`:
| Test | What it catches |
|---|---|
| `test_mcp_server_registers_every_registry_tool` | MCP TOOLS list out of sync with registry |
| `test_mcp_tool_descriptions_match_registry_short` | hand-edited MCP description that drifted |
| `test_mcp_tool_input_schemas_match_registry` | schema duplicated in server file |
| `test_a2a_instructions_text_includes_every_a2a_tool` | doc generator missed a tool |
| `test_old_pre_rename_names_not_present_in_docs` | stale name leaked back in |
| `test_a2a_mcp_instructions_match_snapshot` | rendered shape (bullet ordering, headings, footers) drifted |
| `test_a2a_cli_instructions_match_snapshot` | CLI block edited in a way that changes shape |
| `test_hma_instructions_match_snapshot` | HMA section drifted |
| `test_cli_keyword_mapping_covers_every_a2a_tool` | tool added to registry without a CLI mapping decision |
| `test_cli_keyword_substrings_appear_in_cli_block` | CLI keyword in the mapping but missing from the doc block |
The snapshot files at `workspace/tests/snapshots/*.txt` are LF-pinned
in `.gitattributes` so a Windows contributor with `core.autocrlf=true`
doesn't get mysterious test failures.
## Adding a new tool
1. Append a `ToolSpec(...)` to `TOOLS` in `registry.py`.
2. Add the LangChain `@tool` wrapper in `workspace/builtin_tools/`
(the wrapper body just calls `spec.impl`).
3. Update `_CLI_A2A_COMMAND_KEYWORDS` in `executor_helpers.py` — set the
value to the CLI subcommand keyword, or to `None` if the tool isn't
exposed via the subprocess interface.
4. Regenerate snapshots — see the comment block at the top of
`workspace/tests/test_platform_tools.py` for the one-liner.
5. Run `pytest workspace/tests/test_platform_tools.py --no-cov`.
## Renaming a tool
Edit `name` in `registry.py` only. Then:
1. The MCP TOOLS list rebuilds automatically.
2. The doc generator regenerates automatically (snapshots will fail
the diff — regenerate them).
3. Search `workspace/` for the old literal in case a non-adapter
consumer (tests, plugin code) hardcoded the old name; update those.
4. Update any `_CLI_A2A_COMMAND_KEYWORDS` key + the literal substring
in `_A2A_INSTRUCTIONS_CLI` if applicable.
## Removing a tool
Delete the `ToolSpec` and the `_CLI_A2A_COMMAND_KEYWORDS` key. Adapters
and doc generators stop registering it automatically; the structural
tests prevent stale references from surviving.
-13
@@ -1,13 +0,0 @@
"""Platform tools — single source of truth for tool naming and docs.
The platform owns A2A and persistent-memory tooling (cross-cutting
runtime concerns per project memory project_runtime_native_pluggable.md).
Tools are defined ONCE in `registry.py`. Every adapter — MCP server,
LangChain wrapper, any future SDK integration — consumes the specs to
register the tool in its native format. Doc generators (system-prompt
injection, canvas help, future doc sites) read from the same place.
Adding a tool: append a ToolSpec to TOOLS in registry.py. Every
adapter picks it up automatically; structural tests fail if any side
drifts from the registry.
"""
-737
@@ -1,737 +0,0 @@
"""Canonical registry of platform tool specs.
Every tool the platform offers to agents (A2A delegation, persistent
memory, broadcast, introspection) is defined ONCE in TOOLS below.
Adapters consume these specs to register the tool in their native
runtime format:
- a2a_mcp_server.py iterates `TOOLS` to build the MCP TOOLS list +
dispatches calls to spec.impl. No tool name or description is
hardcoded there.
- builtin_tools/{delegation,memory}.py define LangChain `@tool`
wrappers using `name=` from the spec; the wrapper body just
calls spec.impl.
- executor_helpers.get_a2a_instructions(mcp=True) /
get_hma_instructions() GENERATE the system-prompt doc string from
`TOOLS` — no hand-maintained instruction text for MCP-capable
runtimes.
- executor_helpers._A2A_INSTRUCTIONS_CLI is a SEPARATE hand-maintained
block for CLI subprocess runtimes (ollama and any other adapter
that drives a2a via `python3 -m molecule_runtime.a2a_cli ...`). It
uses different command-shape names than the registry tool names
(e.g. `peers` vs `list_peers`), so it cannot be auto-generated
from JSON-schema specs without losing the readable invocation
syntax. Its tool-coverage alignment with the registry is enforced
by the `_CLI_A2A_COMMAND_KEYWORDS` mapping in executor_helpers.py
and the alignment tests in test_platform_tools.py — adding a new
a2a tool here will fail those tests until the mapping is updated.
Adding a new tool: append a ToolSpec to `TOOLS` below, then update
`_CLI_A2A_COMMAND_KEYWORDS` in executor_helpers.py (set the value to
the CLI subcommand keyword, or to `None` if the tool isn't exposed via
the CLI subprocess interface). The structural alignment tests in
workspace/tests/test_platform_tools.py fail otherwise.
Renaming a tool: change `name` here. Search workspace/ for the old
literal in case any non-adapter consumer (tests, plugin code) hard-coded
it; update those manually. The grep is the audit, the test is the gate.
Removing a tool: delete the entry AND its `_CLI_A2A_COMMAND_KEYWORDS`
key. Adapters stop registering it automatically; doc generators stop
mentioning it.
"""
from __future__ import annotations
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any, Literal
from a2a_tools import (
tool_broadcast_message,
tool_chat_history,
tool_check_task_status,
tool_commit_memory,
tool_delegate_task,
tool_delegate_task_async,
tool_get_runtime_identity,
tool_get_workspace_info,
tool_inbox_peek,
tool_inbox_pop,
tool_list_peers,
tool_recall_memory,
tool_send_message_to_user,
tool_update_agent_card,
tool_wait_for_message,
)
# Section name maps to the heading in the agent-facing system prompt.
# Adding a new section: add a constant + create a corresponding
# generator in executor_helpers (or generalize get_*_instructions).
A2A_SECTION = "a2a"
MEMORY_SECTION = "memory"
Section = Literal["a2a", "memory"]
@dataclass(frozen=True)
class ToolSpec:
"""Runtime-agnostic definition of one platform tool.
Each adapter (MCP, LangChain, future SDK) consumes the same spec.
Doc generators consume the same spec. There is no other source
of truth for tool naming or description.
"""
name: str
"""The exact name agents see. MUST match every adapter's
registered name and the literal that appears in agent-facing
instruction docs. Structural test enforces this."""
short: str
"""One-line description. Used as the MCP `description` field
AND as the bullet line in agent-facing instruction docs."""
when_to_use: str
"""Two-to-three-sentence agent-facing usage guidance — when
to call this tool, what it returns, what NOT to confuse it
with. Concatenated into the system prompt below the tool list."""
input_schema: dict[str, Any]
"""JSON Schema for the tool's input parameters. Consumed
directly by the MCP server. LangChain derives its schema from
Python type annotations on the @tool function — alignment is
pinned by the structural test."""
impl: Callable[..., Awaitable[str]]
"""The actual coroutine. Both adapters call this; only the
wrapping differs."""
section: Section
"""Which agent-prompt section this tool belongs to (controls
which instruction generator emits it)."""
# ---------------------------------------------------------------------------
# A2A — inter-agent communication & broadcast
# ---------------------------------------------------------------------------
_DELEGATE_TASK = ToolSpec(
name="delegate_task",
short=(
"Delegate a task to a peer workspace via A2A and WAIT for the "
"response (synchronous)."
),
when_to_use=(
"Use for QUICK questions and small sub-tasks where you can "
"afford to wait inline. Returns the peer's response text "
"directly. For longer-running work (research, multi-minute "
"jobs) use delegate_task_async + check_task_status instead "
"so you don't hold this workspace busy waiting."
),
input_schema={
"type": "object",
"properties": {
"workspace_id": {
"type": "string",
"description": "Target workspace ID (from list_peers).",
},
"task": {
"type": "string",
"description": "Task description to send to the peer.",
},
"source_workspace_id": {
"type": "string",
"description": (
"Optional. The registered workspace this delegation "
"originates from when the agent is registered to "
"multiple workspaces (MOLECULE_WORKSPACES). Auto-"
"routes via the peer→source cache when omitted; "
"single-workspace operators can ignore it."
),
},
},
"required": ["workspace_id", "task"],
},
impl=tool_delegate_task,
section=A2A_SECTION,
)
_DELEGATE_TASK_ASYNC = ToolSpec(
name="delegate_task_async",
short=(
"Send a task to a peer and return immediately with a task_id "
"(non-blocking)."
),
when_to_use=(
"Use for long-running work where you want to keep doing other "
"things while the peer processes. Poll with check_task_status "
"to retrieve the result. The platform's A2A queue handles "
"delivery + retries; the peer works independently."
),
input_schema={
"type": "object",
"properties": {
"workspace_id": {
"type": "string",
"description": "Target workspace ID (from list_peers).",
},
"task": {
"type": "string",
"description": "Task description to send to the peer.",
},
"source_workspace_id": {
"type": "string",
"description": (
"Optional. The registered workspace this delegation "
"originates from. Auto-routes via the peer→source "
"cache when omitted."
),
},
},
"required": ["workspace_id", "task"],
},
impl=tool_delegate_task_async,
section=A2A_SECTION,
)
_CHECK_TASK_STATUS = ToolSpec(
name="check_task_status",
short=(
"Poll the status of a task started with delegate_task_async; "
"returns result when done."
),
when_to_use=(
"Statuses: pending/in_progress (peer still working — wait), "
"queued (peer is busy with a prior task — DO NOT retry, the "
"platform stitches the response when it finishes), completed "
"(result available), failed (real error — fall back to a "
"different peer or handle it yourself)."
),
input_schema={
"type": "object",
"properties": {
"workspace_id": {
"type": "string",
"description": "Workspace ID the task was sent to.",
},
"task_id": {
"type": "string",
"description": "task_id returned by delegate_task_async.",
},
"source_workspace_id": {
"type": "string",
"description": (
"Optional. Which registered workspace's delegation "
"log to query. Defaults to this workspace."
),
},
},
"required": ["workspace_id", "task_id"],
},
impl=tool_check_task_status,
section=A2A_SECTION,
)
_LIST_PEERS = ToolSpec(
name="list_peers",
short=(
"List the workspaces this agent can communicate with — name, "
"ID, status, role for each."
),
when_to_use=(
"Call this first when you need to delegate but don't know the "
"target's ID. Access control is enforced — you only see "
"siblings, parent, and direct children. With "
"MOLECULE_WORKSPACES set, peers from every registered workspace "
"are aggregated and tagged with their source."
),
input_schema={
"type": "object",
"properties": {
"source_workspace_id": {
"type": "string",
"description": (
"Optional. Restrict to peers of this one registered "
"workspace. Omit to aggregate across all workspaces "
"an external agent has registered against."
),
},
},
},
impl=tool_list_peers,
section=A2A_SECTION,
)
_GET_WORKSPACE_INFO = ToolSpec(
name="get_workspace_info",
short="Get this workspace's own info — ID, name, role, tier, parent, status.",
when_to_use=(
"Use to introspect your own identity (e.g. before reporting "
"back to the user, or to determine whether you're a tier-0 "
"root that can write GLOBAL memory)."
),
input_schema={
"type": "object",
"properties": {
"source_workspace_id": {
"type": "string",
"description": (
"Optional. In multi-workspace mode (this agent registered "
"in N workspaces), introspect the named workspace instead "
"of the primary one. Single-workspace agents omit this."
),
},
},
},
impl=tool_get_workspace_info,
section=A2A_SECTION,
)
_GET_RUNTIME_IDENTITY = ToolSpec(
name="get_runtime_identity",
short=(
"Return this runtime's identity — model, model_provider, tier, "
"workspace_id, runtime template. Reads from process env; no HTTP call."
),
when_to_use=(
"Use this to answer 'what model am I?' truthfully instead of "
"guessing from a stale system prompt — the operator may have "
"routed you to a different model via persona env between boots. "
"Always permitted by RBAC: even read-only agents may know what "
"model they are. Distinct from get_workspace_info — that one "
"calls the platform for ID/role/tier/parent (workspace metadata); "
"this one returns the live process env (MODEL, MODEL_PROVIDER, "
"MOLECULE_MODEL, ANTHROPIC_BASE_URL, TIER, WORKSPACE_ID, "
"ADAPTER_MODULE)."
),
input_schema={"type": "object", "properties": {}},
impl=tool_get_runtime_identity,
section=A2A_SECTION,
)
_UPDATE_AGENT_CARD = ToolSpec(
name="update_agent_card",
short=(
"Replace this workspace's agent_card on the platform. The "
"platform validates required fields and broadcasts an "
"agent_card_updated event so the canvas reflects the change live."
),
when_to_use=(
"Use when the workspace's capabilities, skills, description, or "
"name change and the canvas display needs to follow. The "
"platform stores the new card and pushes an "
"``agent_card_updated`` event to subscribers. Gated behind the "
"``memory.write`` RBAC capability — read-only roles cannot "
"rewrite the card. Tier-1+ owners always have this capability."
),
input_schema={
"type": "object",
"properties": {
"card": {
"type": "object",
"description": (
"The new agent_card object (name, version, "
"description, skills, etc). Server-side validation "
"rejects payloads missing required fields."
),
},
},
"required": ["card"],
},
impl=tool_update_agent_card,
section=A2A_SECTION,
)
_BROADCAST_MESSAGE = ToolSpec(
name="broadcast_message",
short=(
"Send a message to ALL agent workspaces in the org simultaneously. "
"Requires broadcast_enabled=true on this workspace (set by user/admin)."
),
when_to_use=(
"Use for urgent, org-wide signals: critical status changes, emergency "
"stop instructions, coordinated task announcements. Every non-removed "
"workspace receives the message in its activity log (poll-mode agents "
"see it on their next poll; push-mode canvases get a real-time banner). "
"This tool returns an error if broadcast_enabled is false — a user or "
"admin must enable it via the workspace abilities settings first."
),
input_schema={
"type": "object",
"properties": {
"message": {
"type": "string",
"description": (
"The broadcast text. Keep it concise — every agent in the "
"org receives this in their activity feed."
),
},
"workspace_id": {
"type": "string",
"description": (
"Optional. Multi-workspace mode: the registered workspace "
"to broadcast from. Single-workspace agents omit this."
),
},
},
"required": ["message"],
},
impl=tool_broadcast_message,
section=A2A_SECTION,
)
_SEND_MESSAGE_TO_USER = ToolSpec(
name="send_message_to_user",
short=(
"Send a message directly to the user's canvas chat — pushed instantly "
"via WebSocket. Use this to: (1) acknowledge a task immediately ('Got "
"it, I'll start working on this'), (2) send interim progress updates "
"while doing long work, (3) deliver follow-up results after delegation "
"completes, (4) attach files (zip, pdf, csv, image) for the user to "
"download via the `attachments` field (NEVER paste file URLs in "
"`message`). The message appears in the user's chat as if you're "
"proactively reaching out."
),
when_to_use=(
"Use proactively across the lifecycle of a task — early to "
"acknowledge, mid-flight to update, late to deliver. Never paste "
"file URLs in the message body — always pass absolute paths in "
"`attachments` so the platform serves them as download chips "
"(works on SaaS where external file hosts are unreachable)."
),
input_schema={
"type": "object",
"properties": {
"message": {
"type": "string",
# The "no URLs in message text" rule is the single biggest
# cause of bad chat UX: agents drop catbox.moe / file://
# / temporary upload-host links into the prose, the
# canvas renders them as plain markdown links the user
# can't preview, and SaaS deployments often can't even
# reach those external hosts. Every download MUST go
# through the structured `attachments` field below.
"description": (
"Caption text for the chat bubble. Required even when sending "
"attachments — set to a short label like 'Here's the build:' "
"or 'Done — see attached.'\n\n"
"DO NOT paste file URLs, download links, or container paths in "
"this string. Files MUST go through the `attachments` field, "
"which renders as a clickable download chip and works on SaaS "
"deployments where external file-host URLs (catbox.moe, file://, "
"etc.) are unreachable from the user's browser."
),
},
"attachments": {
"type": "array",
"description": (
"REQUIRED for any file delivery. Pass absolute file paths inside "
"THIS container (e.g. ['/tmp/build.zip', '/workspace/report.pdf']) "
"— the platform uploads each file and returns a download chip "
"with the file's icon + name + size in the user's chat. The chip "
"works in SaaS deployments because the URL is platform-served, "
"not an external host.\n\n"
"USE THIS instead of: pasting URLs in `message`, base64-encoding "
"in the body, or telling the user to look at a path on disk. "
"If the file isn't already on disk, write it first (Bash, Write "
"tool, etc.) then pass its path here. 25 MB per file cap."
),
"items": {"type": "string"},
},
"workspace_id": {
"type": "string",
"description": (
"Optional. Set ONLY when this agent is registered in MULTIPLE "
"workspaces (external multi-workspace MCP path) — pass the "
"`arrival_workspace_id` of the inbound message you're replying "
"to so the user sees the reply in the same canvas they typed in. "
"Single-workspace agents omit this; the message routes to the "
"only registered workspace."
),
},
},
"required": ["message"],
},
impl=tool_send_message_to_user,
section=A2A_SECTION,
)
# ---------------------------------------------------------------------------
# Inbox — inbound delivery for the standalone molecule-mcp path.
#
# These tools observe a poller-fed in-memory queue (see workspace/inbox.py).
# They are universally registered so docs + adapters stay aligned, but
# they only return real data in the standalone molecule-mcp runtime;
# in-container runtimes return an informational "not enabled" message
# because their delivery loop is push-based via the canvas WebSocket.
# ---------------------------------------------------------------------------
_WAIT_FOR_MESSAGE = ToolSpec(
name="wait_for_message",
short=(
"Block until the next inbound message (canvas user OR peer "
"agent) arrives, or until ``timeout_secs`` elapses."
),
when_to_use=(
"Standalone-runtime ONLY (molecule-mcp wrapper). After "
"you reply, call this to wait for the next message — forms "
"the loop ``wait_for_message → respond → wait_for_message``. "
"Returns the head message non-destructively; call inbox_pop "
"with the activity_id once you've handled it. In-container "
"runtimes receive messages via push and should not call this."
),
input_schema={
"type": "object",
"properties": {
"timeout_secs": {
"type": "number",
"description": (
"Max seconds to block. Capped at 300. "
"Default 60."
),
},
},
},
impl=tool_wait_for_message,
section=A2A_SECTION,
)
_INBOX_PEEK = ToolSpec(
name="inbox_peek",
short="List pending inbound messages without removing them.",
when_to_use=(
"Standalone-runtime ONLY. Use to inspect what's queued "
"before deciding which to handle. Non-destructive — pair "
"with inbox_pop to consume after replying."
),
input_schema={
"type": "object",
"properties": {
"limit": {
"type": "integer",
"description": "Max messages to return. Default 10.",
},
},
},
impl=tool_inbox_peek,
section=A2A_SECTION,
)
_CHAT_HISTORY = ToolSpec(
name="chat_history",
short="Fetch the prior conversation with one peer (both sides, chronological).",
when_to_use=(
"Call this when a peer_agent push lands and you need context "
"from prior turns with that workspace — e.g. \"what task did "
"this peer assign me last hour?\" or \"what did I tell them?\". "
"Both sides of the conversation appear in chronological order, "
"so the agent reads the log top-down. Cheaper than re-deriving "
"context from memory because the platform already audits every "
"A2A turn into activity_logs. Pair with `agent_card_url` from "
"the channel envelope when you also need the peer's "
"capabilities."
),
input_schema={
"type": "object",
"properties": {
"peer_id": {
"type": "string",
"description": (
"The peer workspace's UUID — same value you got "
"as `peer_id` on the inbound push, or as "
"`workspace_id` from `list_peers`."
),
},
"limit": {
"type": "integer",
"description": (
"Max rows to return (default 20, capped at 500). "
"The default covers \"most recent context\" without "
"flooding the conversation window."
),
},
"before_ts": {
"type": "string",
"description": (
"Optional RFC3339 timestamp, passed through to the "
"server for paging backward through long histories. "
"Use the oldest `created_at` from a previous response."
),
},
"source_workspace_id": {
"type": "string",
"description": (
"Optional. Multi-workspace mode: query the named "
"workspace's activity log instead of the primary one. "
"Auto-routes via the peer-discovery cache when unset."
),
},
},
"required": ["peer_id"],
},
impl=tool_chat_history,
section=A2A_SECTION,
)
_INBOX_POP = ToolSpec(
name="inbox_pop",
short="Remove a handled message from the inbox queue by activity_id.",
when_to_use=(
"Standalone-runtime ONLY. Call after you've replied to a "
"message returned from wait_for_message or inbox_peek to "
"drop it from the queue. Idempotent — popping a missing "
"id reports removed=false without erroring."
),
input_schema={
"type": "object",
"properties": {
"activity_id": {
"type": "string",
"description": (
"activity_id of the message to remove (from "
"inbox_peek / wait_for_message output)."
),
},
},
"required": ["activity_id"],
},
impl=tool_inbox_pop,
section=A2A_SECTION,
)
# ---------------------------------------------------------------------------
# HMA — hierarchical persistent memory
# ---------------------------------------------------------------------------
_COMMIT_MEMORY = ToolSpec(
name="commit_memory",
short="Save a fact to persistent memory; survives across sessions and restarts.",
when_to_use=(
"Scopes: LOCAL (private to you, default), TEAM (shared with "
"parent + siblings), GLOBAL (entire org — only tier-0 root "
"workspaces can write). Commit decisions, learned facts, and "
"completed-task summaries so future sessions and teammates "
"can recall them."
),
input_schema={
"type": "object",
"properties": {
"content": {
"type": "string",
"description": "What to remember — be specific.",
},
"scope": {
"type": "string",
"enum": ["LOCAL", "TEAM", "GLOBAL"],
"description": "Memory scope (default LOCAL).",
},
"source_workspace_id": {
"type": "string",
"description": (
"Optional. Multi-workspace mode: commit the memory "
"into the named workspace's namespace instead of "
"the primary one. Pair with the inbound message's "
"`arrival_workspace_id` so memories stay in the "
"tenant they were derived from."
),
},
},
"required": ["content"],
},
impl=tool_commit_memory,
section=MEMORY_SECTION,
)
_RECALL_MEMORY = ToolSpec(
name="recall_memory",
short="Search persistent memory; returns matching LOCAL + TEAM + GLOBAL rows.",
when_to_use=(
"Call at the start of new work and when picking up something "
"you may have done before. Empty query returns ALL accessible "
"memories — cheap and avoids missing rows that don't match a "
"narrow keyword. Memory is automatically recalled at session "
"start; use this to refresh mid-session."
),
input_schema={
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query (empty returns all).",
},
"scope": {
"type": "string",
"enum": ["LOCAL", "TEAM", "GLOBAL", ""],
"description": "Filter by scope (empty = all accessible).",
},
"source_workspace_id": {
"type": "string",
"description": (
"Optional. Multi-workspace mode: search the named "
"workspace's memories instead of the primary one. "
"Pair with the inbound message's "
"`arrival_workspace_id` to recall context for the "
"right tenant."
),
},
},
},
impl=tool_recall_memory,
section=MEMORY_SECTION,
)
# ---------------------------------------------------------------------------
# Public registry. Keep entries grouped by section, in a fixed order,
# for stable adapter listings + diff-friendly review.
# ---------------------------------------------------------------------------
TOOLS: list[ToolSpec] = [
    # A2A
    _DELEGATE_TASK,
    _DELEGATE_TASK_ASYNC,
    _CHECK_TASK_STATUS,
    _LIST_PEERS,
    _GET_WORKSPACE_INFO,
    _GET_RUNTIME_IDENTITY,
    _UPDATE_AGENT_CARD,
    _BROADCAST_MESSAGE,
    _SEND_MESSAGE_TO_USER,
    # Inbox (standalone-only; in-container returns informational error)
    _WAIT_FOR_MESSAGE,
    _INBOX_PEEK,
    _INBOX_POP,
    _CHAT_HISTORY,
    # HMA
    _COMMIT_MEMORY,
    _RECALL_MEMORY,
]


def a2a_tools() -> list[ToolSpec]:
    """All A2A-section tools, in registration order."""
    return [t for t in TOOLS if t.section == A2A_SECTION]


def memory_tools() -> list[ToolSpec]:
    """All memory-section tools, in registration order."""
    return [t for t in TOOLS if t.section == MEMORY_SECTION]


def by_name(name: str) -> ToolSpec:
    """Look up a spec by its canonical name. Raises KeyError if absent."""
    for t in TOOLS:
        if t.name == name:
            return t
    raise KeyError(f"no platform tool named {name!r}")


def tool_names() -> list[str]:
    """Canonical names in registration order."""
    return [t.name for t in TOOLS]
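
The registry helpers above can be exercised in isolation. The following is a minimal sketch, not the module's actual code: the `ToolSpec` dataclass is a stand-in that carries only the fields visible in this diff, the section constant values and the `_noop` impl are placeholders, and `missing_required` is a hypothetical helper that shows one way an adapter might use the `required` list in `input_schema` before dispatching a call.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Assumed values: the real section constants are defined outside this diff.
A2A_SECTION = "a2a"
MEMORY_SECTION = "memory"


@dataclass(frozen=True)
class ToolSpec:
    """Stand-in with only the fields the registry in this diff uses."""

    name: str
    short: str
    when_to_use: str
    input_schema: dict[str, Any]
    impl: Callable[..., Any]
    section: str


def _noop(**kwargs: Any) -> dict[str, Any]:
    """Placeholder impl: echoes its arguments instead of calling the platform."""
    return kwargs


# Two trimmed-down entries modelled on the specs in this diff.
TOOLS: list[ToolSpec] = [
    ToolSpec(
        name="inbox_pop",
        short="Remove a handled message from the inbox queue by activity_id.",
        when_to_use="Standalone-runtime ONLY.",
        input_schema={
            "type": "object",
            "properties": {"activity_id": {"type": "string"}},
            "required": ["activity_id"],
        },
        impl=_noop,
        section=A2A_SECTION,
    ),
    ToolSpec(
        name="recall_memory",
        short="Search persistent memory.",
        when_to_use="Call at the start of new work.",
        input_schema={
            "type": "object",
            "properties": {"query": {"type": "string"}},
        },
        impl=_noop,
        section=MEMORY_SECTION,
    ),
]


def by_name(name: str) -> ToolSpec:
    """Same linear lookup as the registry helper in this diff."""
    for t in TOOLS:
        if t.name == name:
            return t
    raise KeyError(f"no platform tool named {name!r}")


def missing_required(spec: ToolSpec, args: dict[str, Any]) -> list[str]:
    """Hypothetical helper: required schema keys that are absent from args."""
    return [k for k in spec.input_schema.get("required", []) if k not in args]


if __name__ == "__main__":
    spec = by_name("inbox_pop")
    print(missing_required(spec, {}))
    print(spec.impl(activity_id="abc"))
```

Specs that omit `required`, such as `recall_memory` here, yield an empty list, so a caller can apply the same check to every tool without special-casing.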
