chore(runtime): delete core workspace copy
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
E2E Chat / detect-changes (pull_request) Successful in 13s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Failing after 1m43s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m36s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m29s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m10s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 3s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m23s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 5m13s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m12s
gate-check-v3 / gate-check (pull_request) Successful in 9s
security-review / approved (pull_request) Failing after 5s
qa-review / approved (pull_request) Failing after 7s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 5s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m34s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m15s
CI / Canvas (Next.js) (pull_request) Successful in 6m19s
CI / all-required (pull_request) Successful in 5m46s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 15s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m11s
E2E Chat / E2E Chat (pull_request) Failing after 6m36s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
E2E Chat / detect-changes (pull_request) Successful in 13s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Failing after 1m43s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m36s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m29s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m10s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 3s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m23s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 5m13s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m12s
gate-check-v3 / gate-check (pull_request) Successful in 9s
security-review / approved (pull_request) Failing after 5s
qa-review / approved (pull_request) Failing after 7s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 5s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m34s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m15s
CI / Canvas (Next.js) (pull_request) Successful in 6m19s
CI / all-required (pull_request) Successful in 5m46s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 15s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m11s
E2E Chat / E2E Chat (pull_request) Failing after 6m36s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
This commit is contained in:
@@ -1,60 +0,0 @@
|
||||
name: cascade-list-drift-gate
|
||||
|
||||
# Ported from .github/workflows/cascade-list-drift-gate.yml on 2026-05-11
|
||||
# per RFC internal#219 §1 sweep.
|
||||
#
|
||||
# Differences from the GitHub version:
|
||||
# - on.paths reference .gitea/workflows/publish-runtime.yml (the active
|
||||
# Gitea workflow file) instead of .github/workflows/publish-runtime.yml
|
||||
# (which Category A of this sweep deletes).
|
||||
# - Explicit `WORKFLOW=` arg passed to the drift script so it audits the
|
||||
# .gitea/ workflow (the script's default is still .github/... which
|
||||
# will not exist post-Cat-A).
|
||||
# - Workflow-level env.GITHUB_SERVER_URL set per
|
||||
# feedback_act_runner_github_server_url.
|
||||
# - `continue-on-error: true` on the job (RFC §1 contract — surface
|
||||
# defects without blocking; follow-up PR flips after triage).
|
||||
#
|
||||
# Structural gate: TEMPLATES list in publish-runtime.yml must match
|
||||
# manifest.json's workspace_templates exactly. Closes the recurrence
|
||||
# path of PR #2556 (the data fix) and is the first concrete deliverable
|
||||
# of RFC #388 PR-3.
|
||||
#
|
||||
# Triggers narrowly to keep CI quiet: only on PRs that actually change
|
||||
# one of the two files. The path-filtered split + always-emit-result
|
||||
# pattern (memory: "Required check names need a job that always runs")
|
||||
# is unnecessary here because the workflow IS the check name and PR
|
||||
# branch protection should require it directly. Future-proof: if this
|
||||
# becomes a required check, add a no-op aggregator with always() so the
|
||||
# name still emits when paths don't match.
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
branches: [staging, main]
|
||||
paths:
|
||||
- manifest.json
|
||||
- .gitea/workflows/publish-runtime.yml
|
||||
- scripts/check-cascade-list-vs-manifest.sh
|
||||
|
||||
env:
|
||||
GITHUB_SERVER_URL: https://git.moleculesai.app
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
# bp-exempt: drift visibility gate; CI / all-required remains the required aggregate.
|
||||
check:
|
||||
runs-on: ubuntu-latest
|
||||
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
|
||||
# the PR. Follow-up PR flips this off after surfaced defects are
|
||||
# triaged.
|
||||
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
|
||||
continue-on-error: true
|
||||
steps:
|
||||
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
|
||||
- name: Check cascade list matches manifest
|
||||
# Pass the .gitea/ workflow path explicitly — the script's
|
||||
# default still points at .github/... which Category A of this
|
||||
# sweep removes.
|
||||
run: bash scripts/check-cascade-list-vs-manifest.sh manifest.json .gitea/workflows/publish-runtime.yml
|
||||
@@ -1,225 +0,0 @@
|
||||
name: MCP Stdio Transport Regression
|
||||
|
||||
# Regression test for molecule-ai-workspace-runtime#61:
|
||||
# asyncio.connect_read_pipe / connect_write_pipe fail with
|
||||
# ValueError: "Pipe transport is only for pipes, sockets and character devices"
|
||||
# when stdout is a regular file (openclaw capture, CI tee, debugging).
|
||||
#
|
||||
# This workflow reproduces the exact failure mode and verifies the
|
||||
# fallback to direct buffer I/O works. It runs on every PR that
|
||||
# touches the MCP server or this workflow, plus nightly cron.
|
||||
#
|
||||
# Why a separate workflow (not folded into ci.yml python-lint):
|
||||
# - The test needs to spawn the MCP server with stdout redirected
|
||||
# to a regular file (not a TTY/pipe), which conflicts with
|
||||
# pytest's own capture mechanism.
|
||||
# - It exercises the actual process spawn path (python a2a_mcp_server.py)
|
||||
# not just unit-test mocks — closer to the real openclaw integration.
|
||||
# - A dedicated workflow surfaces stdio-specific regressions without
|
||||
# coupling to the broader Python test suite's coverage gate.
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
branches: [main, staging]
|
||||
paths:
|
||||
- 'workspace/a2a_mcp_server.py'
|
||||
- 'workspace/mcp_cli.py'
|
||||
- 'workspace/tests/test_a2a_mcp_server.py'
|
||||
- '.gitea/workflows/ci-mcp-stdio-transport.yml'
|
||||
push:
|
||||
branches: [main, staging]
|
||||
paths:
|
||||
- 'workspace/a2a_mcp_server.py'
|
||||
- 'workspace/mcp_cli.py'
|
||||
- 'workspace/tests/test_a2a_mcp_server.py'
|
||||
- '.gitea/workflows/ci-mcp-stdio-transport.yml'
|
||||
schedule:
|
||||
# Nightly at 04:00 UTC — catches drift from dependency updates
|
||||
# (e.g. asyncio behavior changes in new Python patch releases).
|
||||
- cron: '0 4 * * *'
|
||||
|
||||
concurrency:
|
||||
group: mcp-stdio-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
GITHUB_SERVER_URL: https://git.moleculesai.app
|
||||
|
||||
jobs:
|
||||
# bp-exempt: regression canary for runtime#61; not a merge gate — informational only until promoted to required.
|
||||
# mc#774: continue-on-error mask — new workflow, flip to false once it's green on ≥3 consecutive main runs.
|
||||
mcp-stdio-regular-file:
|
||||
name: MCP stdio with regular-file stdout
|
||||
runs-on: ubuntu-latest
|
||||
continue-on-error: true # mc#774
|
||||
timeout-minutes: 5
|
||||
env:
|
||||
WORKSPACE_ID: "00000000-0000-0000-0000-000000000001"
|
||||
defaults:
|
||||
run:
|
||||
working-directory: workspace
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
|
||||
with:
|
||||
python-version: '3.11'
|
||||
cache: pip
|
||||
cache-dependency-path: workspace/requirements.txt
|
||||
- run: pip install -r requirements.txt pytest pytest-asyncio pytest-cov
|
||||
|
||||
- name: Reproduce runtime#61 — stdout as regular file
|
||||
run: |
|
||||
set -euo pipefail
|
||||
echo "=== Reproducing molecule-ai-workspace-runtime#61 ==="
|
||||
echo ""
|
||||
echo "Before the fix, this command would fail with:"
|
||||
echo ' ValueError: Pipe transport is only for pipes, sockets and character devices'
|
||||
echo ""
|
||||
|
||||
# Spawn the MCP server with stdout redirected to a regular file.
|
||||
# This is exactly what openclaw does when capturing MCP output.
|
||||
OUTPUT=$(mktemp)
|
||||
trap 'rm -f "$OUTPUT"' EXIT
|
||||
|
||||
# Send initialize request, then tools/list, then exit
|
||||
{
|
||||
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}'
|
||||
echo '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
|
||||
} | python a2a_mcp_server.py > "$OUTPUT" 2>&1 || {
|
||||
RC=$?
|
||||
echo "FAIL: MCP server exited with code $RC"
|
||||
echo "--- stdout+stderr ---"
|
||||
cat "$OUTPUT"
|
||||
exit 1
|
||||
}
|
||||
|
||||
echo "PASS: MCP server handled regular-file stdout without crashing"
|
||||
echo ""
|
||||
echo "--- Output (first 20 lines) ---"
|
||||
head -20 "$OUTPUT"
|
||||
echo ""
|
||||
|
||||
# Verify we got valid JSON-RPC responses
|
||||
if grep -q '"result"' "$OUTPUT"; then
|
||||
echo "PASS: JSON-RPC responses found in output"
|
||||
else
|
||||
echo "FAIL: No JSON-RPC responses in output"
|
||||
cat "$OUTPUT"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
- name: Reproduce runtime#61 — stdin from regular file
|
||||
run: |
|
||||
set -euo pipefail
|
||||
echo "=== stdin as regular file (CI tee / capture pattern) ==="
|
||||
|
||||
INPUT=$(mktemp)
|
||||
OUTPUT=$(mktemp)
|
||||
trap 'rm -f "$INPUT" "$OUTPUT"' EXIT
|
||||
|
||||
cat > "$INPUT" <<'EOF'
|
||||
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
|
||||
{"jsonrpc":"2.0","id":2,"method":"tools/list"}
|
||||
EOF
|
||||
|
||||
python a2a_mcp_server.py < "$INPUT" > "$OUTPUT" 2>&1 || {
|
||||
RC=$?
|
||||
echo "FAIL: MCP server exited with code $RC"
|
||||
cat "$OUTPUT"
|
||||
exit 1
|
||||
}
|
||||
|
||||
echo "PASS: MCP server handled regular-file stdin without crashing"
|
||||
|
||||
if grep -q '"result"' "$OUTPUT"; then
|
||||
echo "PASS: JSON-RPC responses found in output"
|
||||
else
|
||||
echo "FAIL: No JSON-RPC responses in output"
|
||||
cat "$OUTPUT"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
- name: Verify warning is emitted for non-pipe stdio
|
||||
run: |
|
||||
set -euo pipefail
|
||||
echo "=== Verify diagnostic warning ==="
|
||||
|
||||
OUTPUT=$(mktemp)
|
||||
trap 'rm -f "$OUTPUT"' EXIT
|
||||
|
||||
{
|
||||
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}'
|
||||
} | python a2a_mcp_server.py > "$OUTPUT" 2>&1
|
||||
|
||||
# The warning should mention "not a pipe" for operator visibility
|
||||
if grep -qi "not a pipe" "$OUTPUT"; then
|
||||
echo "PASS: Diagnostic warning emitted for non-pipe stdio"
|
||||
else
|
||||
echo "NOTE: No warning in output (may be suppressed by log level)"
|
||||
fi
|
||||
|
||||
- name: Reproduce openclaw failure — pipe held OPEN, no EOF
|
||||
run: |
|
||||
set -euo pipefail
|
||||
echo "=== keep-stdin-open pipe (the real openclaw / Claude Code case) ==="
|
||||
echo ""
|
||||
echo "Before the readline() fix this HANGS: main() did"
|
||||
echo " stdin.read(65536) -> on a pipe, blocks until 64KB OR EOF."
|
||||
echo "An MCP client sends one ~150B initialize and keeps stdin"
|
||||
echo "open waiting for the response, so the server never parsed"
|
||||
echo "the request and the client timed out (openclaw: 'MCP error"
|
||||
echo "-32000: Connection closed'). The earlier regular-file /"
|
||||
echo "heredoc-pipe steps PASSED through this bug because a file"
|
||||
echo "(or a closing heredoc) yields EOF immediately."
|
||||
echo ""
|
||||
|
||||
# Drive the server through a real pipe that stays OPEN: write
|
||||
# one initialize, do NOT close stdin, and require a response
|
||||
# within a hard timeout. read(65536) -> no output -> timeout
|
||||
# kills it -> FAIL. readline() -> immediate response -> PASS.
|
||||
python - <<'PYEOF'
|
||||
import json, subprocess, sys, time, select
|
||||
|
||||
proc = subprocess.Popen(
|
||||
[sys.executable, "a2a_mcp_server.py"],
|
||||
stdin=subprocess.PIPE, stdout=subprocess.PIPE,
|
||||
stderr=subprocess.STDOUT,
|
||||
env={**__import__("os").environ},
|
||||
)
|
||||
req = json.dumps({
|
||||
"jsonrpc": "2.0", "id": 1, "method": "initialize",
|
||||
"params": {"protocolVersion": "2024-11-05",
|
||||
"capabilities": {},
|
||||
"clientInfo": {"name": "keepopen", "version": "1"}},
|
||||
}) + "\n"
|
||||
proc.stdin.write(req.encode())
|
||||
proc.stdin.flush()
|
||||
# Deliberately DO NOT close proc.stdin — mirror a live MCP client.
|
||||
|
||||
deadline = time.time() + 15
|
||||
line = b""
|
||||
while time.time() < deadline:
|
||||
r, _, _ = select.select([proc.stdout], [], [], 1)
|
||||
if r:
|
||||
line = proc.stdout.readline()
|
||||
if line:
|
||||
break
|
||||
proc.kill()
|
||||
|
||||
if not line:
|
||||
print("FAIL: no response within 15s on an open pipe — "
|
||||
"stdin.read(65536) regression is back")
|
||||
sys.exit(1)
|
||||
resp = json.loads(line.decode())
|
||||
assert resp.get("id") == 1 and "result" in resp, \
|
||||
f"unexpected response: {line[:200]!r}"
|
||||
assert resp["result"]["serverInfo"]["name"] == "molecule", \
|
||||
f"wrong serverInfo: {line[:200]!r}"
|
||||
print("PASS: server answered initialize on a still-open pipe")
|
||||
PYEOF
|
||||
|
||||
- name: Run unit tests for stdio transport
|
||||
run: |
|
||||
set -euo pipefail
|
||||
echo "=== Running stdio transport unit tests ==="
|
||||
python -m pytest tests/test_a2a_mcp_server.py::TestStdioPipeAssertion tests/test_a2a_mcp_server.py::TestStdioKeepOpenPipe -v --no-cov
|
||||
+15
-70
@@ -456,84 +456,29 @@ jobs:
|
||||
cat /tmp/deploy-reminder.md >> "$GITHUB_STEP_SUMMARY"
|
||||
|
||||
# Python Lint & Test — required check, always runs.
|
||||
# Runtime Python moved to molecule-ai-workspace-runtime. Keep this context as
|
||||
# a guard so branch protection still catches attempts to reintroduce an
|
||||
# editable runtime copy under molecule-core/workspace/.
|
||||
python-lint:
|
||||
name: Python Lint & Test
|
||||
runs-on: ubuntu-latest
|
||||
# Phase 4 (RFC #219 §1): confirmed green on main 2026-05-12.
|
||||
continue-on-error: false
|
||||
env:
|
||||
WORKSPACE_ID: test
|
||||
defaults:
|
||||
run:
|
||||
working-directory: workspace
|
||||
steps:
|
||||
- if: false
|
||||
working-directory: .
|
||||
run: echo "No workspace/** changes — skipping real lint+test; this job always runs to satisfy the required-check name on branch protection."
|
||||
- if: always()
|
||||
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
- if: always()
|
||||
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
|
||||
with:
|
||||
python-version: '3.11'
|
||||
cache: pip
|
||||
cache-dependency-path: workspace/requirements.txt
|
||||
- if: always()
|
||||
run: pip install -r requirements.txt pytest pytest-asyncio pytest-cov sqlalchemy>=2.0.0
|
||||
# Coverage flags + fail-under floor moved into workspace/pytest.ini
|
||||
# (issue #1817) so local `pytest` and CI use identical config.
|
||||
- if: always()
|
||||
run: python -m pytest --tb=short
|
||||
|
||||
- if: always()
|
||||
name: Per-file critical-path coverage (MCP / inbox / auth)
|
||||
# MCP-critical Python files have a per-file floor on top of the
|
||||
# 86% total floor in pytest.ini. See issue #2790 for full rationale.
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
- name: Runtime SSOT guard
|
||||
run: |
|
||||
set -e
|
||||
PER_FILE_FLOOR=75
|
||||
CRITICAL_FILES=(
|
||||
"a2a_mcp_server.py"
|
||||
"mcp_cli.py"
|
||||
"a2a_tools.py"
|
||||
"a2a_tools_inbox.py"
|
||||
"inbox.py"
|
||||
"platform_auth.py"
|
||||
)
|
||||
|
||||
# pytest already wrote .coverage; emit a JSON view scoped to
|
||||
# the critical files so jq/python can read the per-file pct
|
||||
# without parsing tabular text.
|
||||
INCLUDES=$(printf '*%s,' "${CRITICAL_FILES[@]}")
|
||||
INCLUDES="${INCLUDES%,}"
|
||||
python -m coverage json -o /tmp/critical-cov.json --include="$INCLUDES"
|
||||
|
||||
FAILED=0
|
||||
for f in "${CRITICAL_FILES[@]}"; do
|
||||
pct=$(jq -r --arg f "$f" '.files | to_entries | map(select(.key == $f)) | .[0].value.summary.percent_covered // "MISSING"' /tmp/critical-cov.json)
|
||||
if [ "$pct" = "MISSING" ]; then
|
||||
echo "::error file=workspace/$f::No coverage data — file may have moved or test exclusion mis-set."
|
||||
FAILED=$((FAILED+1))
|
||||
continue
|
||||
fi
|
||||
echo "$f: ${pct}%"
|
||||
if awk "BEGIN{exit !($pct < $PER_FILE_FLOOR)}"; then
|
||||
echo "::error file=workspace/$f::${pct}% < ${PER_FILE_FLOOR}% per-file floor (MCP critical path). See COVERAGE_FLOOR.md."
|
||||
FAILED=$((FAILED+1))
|
||||
fi
|
||||
done
|
||||
|
||||
if [ "$FAILED" -gt 0 ]; then
|
||||
echo ""
|
||||
echo "$FAILED MCP critical-path file(s) below the ${PER_FILE_FLOOR}% per-file floor."
|
||||
echo "These paths handle multi-tenant routing, auth tokens, and inbox dispatch."
|
||||
echo "A coverage drop here is the same risk shape as Go-side tokens/secrets files"
|
||||
echo "dropping below 10% (see COVERAGE_FLOOR.md). Either:"
|
||||
echo " (a) add tests to raise coverage back above ${PER_FILE_FLOOR}%, or"
|
||||
echo " (b) if this is unavoidable historical debt, file an issue and propose"
|
||||
echo " adjusting the floor with rationale in COVERAGE_FLOOR.md."
|
||||
set -eu
|
||||
if [ -d workspace ]; then
|
||||
echo "::error file=workspace::Runtime source must live in molecule-ai-workspace-runtime, not molecule-core/workspace."
|
||||
exit 1
|
||||
fi
|
||||
for f in scripts/build_runtime_package.py scripts/test_build_runtime_package.py; do
|
||||
if [ -e "$f" ]; then
|
||||
echo "::error file=$f::Legacy build-from-workspace packaging script must not be restored."
|
||||
exit 1
|
||||
fi
|
||||
done
|
||||
echo "Runtime SSOT guard passed; core consumes the standalone runtime package."
|
||||
|
||||
all-required:
|
||||
# Aggregator sentinel — RFC internal#219 §2 (Phase 4 — closes internal#286).
|
||||
|
||||
@@ -86,8 +86,6 @@ on:
|
||||
- 'workspace-server/internal/middleware/**'
|
||||
- 'workspace-server/internal/handlers/registry.go'
|
||||
- 'workspace-server/internal/handlers/workspace.go'
|
||||
- 'workspace/a2a_mcp_server.py'
|
||||
- 'workspace/platform_tools/registry.py'
|
||||
- 'tests/e2e/test_peer_visibility_mcp_staging.sh'
|
||||
- 'tests/e2e/test_peer_visibility_mcp_local.sh'
|
||||
- 'tests/e2e/lib/peer_visibility_assert.sh'
|
||||
@@ -100,8 +98,6 @@ on:
|
||||
- 'workspace-server/internal/middleware/**'
|
||||
- 'workspace-server/internal/handlers/registry.go'
|
||||
- 'workspace-server/internal/handlers/workspace.go'
|
||||
- 'workspace/a2a_mcp_server.py'
|
||||
- 'workspace/platform_tools/registry.py'
|
||||
- 'tests/e2e/test_peer_visibility_mcp_staging.sh'
|
||||
- 'tests/e2e/test_peer_visibility_mcp_local.sh'
|
||||
- 'tests/e2e/lib/peer_visibility_assert.sh'
|
||||
|
||||
@@ -1,177 +0,0 @@
|
||||
name: publish-runtime-autobump
|
||||
|
||||
# Auto-bump-on-workspace-edit half of the publish pipeline.
|
||||
#
|
||||
# Why this file exists (issue #351):
|
||||
# Gitea Actions does not correctly disambiguate `paths:` from `tags:`
|
||||
# when both are bundled under a single `on.push` key. The result is
|
||||
# that tag pushes get filtered out and `publish-runtime.yml` never
|
||||
# fires — `action_run` rows: 0. This was unnoticed pre-2026-05-11
|
||||
# because PYPI_TOKEN was absent (publishes would have failed anyway).
|
||||
#
|
||||
# Split design:
|
||||
# - publish-runtime.yml : on.push.tags only (the publisher)
|
||||
# - publish-runtime-autobump.yml: on.push.branches+paths (this file — the version-bumper)
|
||||
#
|
||||
# This file computes the next version from PyPI's latest, pushes a
|
||||
# `runtime-v$VERSION` tag, and exits. The tag push then triggers
|
||||
# publish-runtime.yml via its tags-only trigger.
|
||||
#
|
||||
# Concurrency: shares the `publish-runtime` group with publish-runtime.yml
|
||||
# so concurrent workspace pushes serialize at the bump step. Without
|
||||
# this, two pushes minutes apart could both read PyPI latest=0.1.129
|
||||
# and try to tag 0.1.130 simultaneously, only one of which would land.
|
||||
|
||||
on:
|
||||
# Run on PR pushes to post a success status so Gitea can merge the PR.
|
||||
# All steps use continue-on-error: true so operational failures
|
||||
# (PyPI unreachable, DISPATCH_TOKEN missing) do not block merge.
|
||||
pull_request:
|
||||
paths:
|
||||
- "workspace/**"
|
||||
# mc#1578 / a05add29 cure: build_runtime_package.py owns PYPROJECT_TEMPLATE
|
||||
# (deps, classifiers, project metadata). A change there is publish-affecting
|
||||
# even when workspace/** is untouched, so the autobump must fire to claim
|
||||
# the next runtime-v$VERSION tag. Without this, manual tagging races PyPI
|
||||
# (e.g. runtime-v0.1.18 collided with the 2026-04-27 PyPI 0.1.18 publish,
|
||||
# blocking the python-multipart pin from reaching prod).
|
||||
- "scripts/build_runtime_package.py"
|
||||
- "scripts/test_build_runtime_package.py"
|
||||
# Bump-and-tag on main/staging push (the actual operational trigger).
|
||||
push:
|
||||
branches:
|
||||
- main
|
||||
- staging
|
||||
paths:
|
||||
- "workspace/**"
|
||||
- "scripts/build_runtime_package.py"
|
||||
- "scripts/test_build_runtime_package.py"
|
||||
# Manual dispatch — useful when Gitea Actions API (/actions/*) is
|
||||
# unreachable (e.g. act_runner 404 on Gitea 1.22.6) and we cannot
|
||||
# re-trigger via curl.
|
||||
workflow_dispatch:
|
||||
|
||||
permissions:
|
||||
contents: write # required to push tags back
|
||||
|
||||
concurrency:
|
||||
group: publish-runtime
|
||||
cancel-in-progress: false
|
||||
|
||||
jobs:
|
||||
# PR-validation path: always succeeds so Gitea can merge workflow-only PRs.
|
||||
# Operational failures (PyPI unreachable, missing DISPATCH_TOKEN) are
|
||||
# surfaced via continue-on-error: true rather than blocking the merge.
|
||||
# The actual bump work happens on the main/staging push after merge.
|
||||
# bp-exempt: advisory validation for runtime publication; not a branch-protection gate.
|
||||
pr-validate:
|
||||
runs-on: ubuntu-latest
|
||||
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
|
||||
continue-on-error: true # do not block PR merge on operational failures
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
with:
|
||||
fetch-depth: 1
|
||||
|
||||
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
|
||||
with:
|
||||
python-version: "3.11"
|
||||
|
||||
- name: Validate PyPI connectivity (best-effort)
|
||||
run: |
|
||||
set -eu
|
||||
echo "=== Checking PyPI accessibility ==="
|
||||
LATEST=$(curl -fsS --retry 3 --max-time 10 \
|
||||
https://pypi.org/pypi/molecule-ai-workspace-runtime/json \
|
||||
| python -c "import sys,json; print(json.load(sys.stdin)['info']['version'])" \
|
||||
|| echo "PyPI unreachable (non-blocking for PR validation)")
|
||||
echo "Latest: ${LATEST:-unknown}"
|
||||
|
||||
# Actual bump-and-tag: runs on main/staging pushes, posts real success/failure.
|
||||
# No continue-on-error — operational failures here trip the main-red
|
||||
# watchdog, which is the desired signal for infrastructure degradation.
|
||||
# bp-exempt: post-merge tag publication side effect; CI / all-required gates source changes.
|
||||
bump-and-tag:
|
||||
runs-on: ubuntu-latest
|
||||
# Only fire on push events (main/staging after PR merge). Pull_request
|
||||
# events are handled by pr-validate above; we do NOT bump on every
|
||||
# push-synchronize because that would race with the PR head.
|
||||
#
|
||||
# NOTE: the prior condition `github.event.pull_request.base.ref == ''`
|
||||
# was broken — on a PR-merge push in Gitea Actions, the pull_request
|
||||
# context is still attached (base.ref='main'), so the condition always
|
||||
# evaluated to false and bump-and-tag was permanently skipped.
|
||||
if: github.event_name == 'push'
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
with:
|
||||
fetch-depth: 1
|
||||
|
||||
- name: Fetch tags for collision check
|
||||
run: git fetch origin --tags --depth=1
|
||||
|
||||
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
|
||||
with:
|
||||
python-version: "3.11"
|
||||
|
||||
- name: Compute next version from PyPI latest and existing tags
|
||||
id: bump
|
||||
run: |
|
||||
set -eu
|
||||
LATEST=$(curl -fsS --retry 3 https://pypi.org/pypi/molecule-ai-workspace-runtime/json \
|
||||
| python -c "import sys,json; print(json.load(sys.stdin)['info']['version'])")
|
||||
MAJOR=$(echo "$LATEST" | cut -d. -f1)
|
||||
MINOR=$(echo "$LATEST" | cut -d. -f2)
|
||||
TAG_LATEST=$(git tag --list "runtime-v${MAJOR}.${MINOR}.*" \
|
||||
| sed -E 's/^runtime-v//' \
|
||||
| grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' \
|
||||
| sort -V \
|
||||
| tail -1 || true)
|
||||
VERSION=$(PYPI_LATEST="$LATEST" TAG_LATEST="$TAG_LATEST" python - <<'PY'
|
||||
import os
|
||||
|
||||
def parse(v):
|
||||
return tuple(int(part) for part in v.split("."))
|
||||
|
||||
pypi = os.environ["PYPI_LATEST"]
|
||||
tag = os.environ.get("TAG_LATEST") or pypi
|
||||
base = max(parse(pypi), parse(tag))
|
||||
print(f"{base[0]}.{base[1]}.{base[2] + 1}")
|
||||
PY
|
||||
)
|
||||
echo "PyPI latest=$LATEST, latest runtime tag=${TAG_LATEST:-none} -> next=$VERSION"
|
||||
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+$'; then
|
||||
echo "::error::computed version $VERSION does not match PEP 440 X.Y.Z"
|
||||
exit 1
|
||||
fi
|
||||
if git tag --list | grep -qx "runtime-v$VERSION"; then
|
||||
echo "::error::tag runtime-v$VERSION already exists in this repo. Manual intervention required (PyPI and Gitea tag history are out of sync)."
|
||||
exit 1
|
||||
fi
|
||||
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Push runtime-v$VERSION tag
|
||||
env:
|
||||
DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
|
||||
VERSION: ${{ steps.bump.outputs.version }}
|
||||
GITEA_URL: https://git.moleculesai.app
|
||||
run: |
|
||||
set -eu
|
||||
if [ -z "$DISPATCH_TOKEN" ]; then
|
||||
echo "::error::DISPATCH_TOKEN secret is not set — needed to push the tag back to molecule-core."
|
||||
exit 1
|
||||
fi
|
||||
git config user.name "publish-runtime autobump"
|
||||
git config user.email "publish-runtime@moleculesai.app"
|
||||
git tag -a "runtime-v$VERSION" \
|
||||
-m "Auto-bump on workspace/** edit on $GITHUB_REF" \
|
||||
-m "Triggered by: $GITHUB_REF @ $GITHUB_SHA" \
|
||||
-m "publish-runtime.yml will pick up this tag and upload to PyPI"
|
||||
# Push via DISPATCH_TOKEN (a Gitea PAT). Using the bot identity
|
||||
# ensures the resulting tag-push event is dispatched to
|
||||
# publish-runtime.yml; act_runner's default GITHUB_TOKEN cannot
|
||||
# trigger downstream workflows.
|
||||
git remote set-url origin "${GITEA_URL#https://}"
|
||||
git remote set-url origin "https://x-access-token:${DISPATCH_TOKEN}@${GITEA_URL#https://}/molecule-ai/molecule-core.git"
|
||||
git push origin "runtime-v$VERSION"
|
||||
echo "✓ pushed runtime-v$VERSION — publish-runtime.yml should fire next"
|
||||
@@ -1,437 +0,0 @@
|
||||
name: publish-runtime
|
||||
|
||||
# Gitea Actions port of .github/workflows/publish-runtime.yml.
|
||||
#
|
||||
# Ported 2026-05-10 (issue #206). Key differences from the GitHub version:
|
||||
# - Gitea Actions reads .gitea/workflows/, not .github/workflows/
|
||||
# - Dropped `environment: pypi-publish` — Gitea Actions does not support
|
||||
# named environments or OIDC trusted publishers
|
||||
# - Replaced `pypa/gh-action-pypi-publish@release/v1` (OIDC) with
|
||||
# `twine upload` using PYPI_TOKEN secret — same mechanism as a local
|
||||
# `python -m twine upload` with a PyPI token
|
||||
# - Replaced `github.ref_name` (GitHub-only) with `${GITHUB_REF#refs/tags/}`
|
||||
# — Gitea Actions exposes github.ref (the full ref) but not ref_name
|
||||
# - Dropped `merge_group` trigger (Gitea has no merge queue)
|
||||
#
|
||||
# 2026-05-10 (issue #348): originally restored `staging`/`main` branch +
|
||||
# `workspace/**` path-filter trigger in PR #349.
|
||||
#
|
||||
# 2026-05-11 (issue #351): REVERTED the branches+paths trigger from THIS
|
||||
# file. Bundling `paths` with `tags` under a single `on.push` key caused
|
||||
# Gitea Actions to never dispatch the workflow for tag-push events (0
|
||||
# runs in `action_run` for workflow_id='publish-runtime.yml' since the
|
||||
# port, including the runtime-v1.0.0 tag — which is why PyPI is still at
|
||||
# 0.1.129 despite a v1.0.0 Gitea tag existing).
|
||||
#
|
||||
# The auto-bump-on-workspace-edit trigger now lives in
|
||||
# `.gitea/workflows/publish-runtime-autobump.yml`. That file computes the
|
||||
# next version from PyPI's latest and pushes a `runtime-v$VERSION` tag,
|
||||
# which THIS file then picks up via the tags-only trigger below.
|
||||
#
|
||||
# This decoupling means Gitea's path-vs-tag evaluator never has to
|
||||
# disambiguate — each file has a single unambiguous trigger shape.
|
||||
#
|
||||
# PyPI publishing: requires PYPI_TOKEN repository secret (or org-level secret).
|
||||
# Set via: repo Settings → Actions → Variables and Secrets → New Secret.
|
||||
# The token should be a PyPI API token scoped to molecule-ai-workspace-runtime.
|
||||
#
|
||||
# The DISPATCH_TOKEN cascade (git push to template repos) is unchanged —
|
||||
# it uses the Gitea API directly and was already Gitea-compatible.
|
||||
|
||||
on:
|
||||
push:
|
||||
tags:
|
||||
- "runtime-v*"
|
||||
workflow_dispatch:
|
||||
# 2026-05-11 (root cause of #351 / 0 runs ever):
|
||||
# Gitea 1.22.6's workflow parser rejects `workflow_dispatch.inputs.version`
|
||||
# with "unknown on type" — it mis-treats the inputs sub-keys as top-level
|
||||
# `on:` event types. Log line:
|
||||
# actions/workflows.go:DetectWorkflows() [W] ignore invalid workflow
|
||||
# "publish-runtime.yml": unknown on type: map["version": {...}]
|
||||
# That `[W] ignore invalid workflow` is silent UX — the workflow never
|
||||
# registers, so it never fires for ANY event (push.tags included).
|
||||
# Removing the inputs block restores parsing. Manual dispatch from the
|
||||
# Gitea UI now triggers the PyPI auto-bump fallback in `Derive version`
|
||||
# below (no `inputs.version` to read).
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
# Serialize publishes so two concurrent tag pushes don't both compute
|
||||
# "latest+1" and race on PyPI upload. The second one waits.
|
||||
concurrency:
|
||||
group: publish-runtime
|
||||
cancel-in-progress: false
|
||||
|
||||
jobs:
|
||||
publish:
|
||||
# Dedicated publish/release lane (internal#462 / #394 / #399). Ship
|
||||
# path (on: push tag runtime-v*) — reserved capacity, never FIFO
|
||||
# behind PR-CI. `publish` resolves only to molecule-runner-publish-*.
|
||||
runs-on: publish
|
||||
outputs:
|
||||
version: ${{ steps.version.outputs.version }}
|
||||
wheel_sha256: ${{ steps.wheel_hash.outputs.wheel_sha256 }}
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
|
||||
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
|
||||
with:
|
||||
python-version: "3.11"
|
||||
cache: pip
|
||||
|
||||
- name: Derive version (tag or PyPI auto-bump)
|
||||
id: version
|
||||
run: |
|
||||
if echo "$GITHUB_REF" | grep -q "^refs/tags/runtime-v"; then
|
||||
# Tag is `runtime-vX.Y.Z` — strip the prefix.
|
||||
VERSION="${GITHUB_REF#refs/tags/runtime-v}"
|
||||
else
|
||||
# workflow_dispatch path (no inputs supported on Gitea 1.22.6) or
|
||||
# any other non-tag trigger: derive from PyPI latest + patch bump.
|
||||
LATEST=$(curl -fsS --retry 3 https://pypi.org/pypi/molecule-ai-workspace-runtime/json \
|
||||
| python -c "import sys,json; print(json.load(sys.stdin)['info']['version'])")
|
||||
MAJOR=$(echo "$LATEST" | cut -d. -f1)
|
||||
MINOR=$(echo "$LATEST" | cut -d. -f2)
|
||||
PATCH=$(echo "$LATEST" | cut -d. -f3)
|
||||
VERSION="${MAJOR}.${MINOR}.$((PATCH+1))"
|
||||
echo "Auto-bumped from PyPI latest $LATEST -> $VERSION"
|
||||
fi
|
||||
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+(\.dev[0-9]+|rc[0-9]+|a[0-9]+|b[0-9]+|\.post[0-9]+)?$'; then
|
||||
echo "::error::version $VERSION does not match PEP 440"
|
||||
exit 1
|
||||
fi
|
||||
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
|
||||
echo "Publishing molecule-ai-workspace-runtime $VERSION"
|
||||
|
||||
- name: Install build tooling
|
||||
run: pip install build twine
|
||||
|
||||
- name: Build package from workspace/
|
||||
run: |
|
||||
python scripts/build_runtime_package.py \
|
||||
--version "${{ steps.version.outputs.version }}" \
|
||||
--out "${{ runner.temp }}/runtime-build"
|
||||
|
||||
- name: Build wheel + sdist
|
||||
working-directory: ${{ runner.temp }}/runtime-build
|
||||
run: python -m build
|
||||
|
||||
- name: Capture wheel SHA256 for cascade content-verification
|
||||
id: wheel_hash
|
||||
working-directory: ${{ runner.temp }}/runtime-build
|
||||
run: |
|
||||
set -eu
|
||||
WHEEL=$(ls dist/*.whl 2>/dev/null | head -1)
|
||||
if [ -z "$WHEEL" ]; then
|
||||
echo "::error::No .whl in dist/ — \`python -m build\` must have failed silently"
|
||||
exit 1
|
||||
fi
|
||||
HASH=$(sha256sum "$WHEEL" | awk '{print $1}')
|
||||
echo "wheel_sha256=${HASH}" >> "$GITHUB_OUTPUT"
|
||||
echo "Local wheel SHA256 (pre-upload): ${HASH}"
|
||||
echo "Wheel filename: $(basename "$WHEEL")"
|
||||
|
||||
- name: Verify package contents (sanity)
|
||||
working-directory: ${{ runner.temp }}/runtime-build
|
||||
run: |
|
||||
python -m twine check dist/*
|
||||
python -m venv /tmp/smoke
|
||||
/tmp/smoke/bin/pip install --quiet dist/*.whl
|
||||
/tmp/smoke/bin/python "$GITHUB_WORKSPACE/scripts/wheel_smoke.py"
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────
|
||||
# RFC#596 (2026-05-19): Gitea PyPI registry as PRIMARY, PyPI as
|
||||
# best-effort fallback. Eliminates the SPOF that caused the
|
||||
# 2026-05-19 P0 (PyPI abuse-block #593 + Railway outage #595).
|
||||
#
|
||||
# Order is inverted intentionally:
|
||||
# 1. Gitea FIRST — must succeed (our internal SSOT).
|
||||
# 2. PyPI SECOND — best-effort, non-fatal on failure (courtesy
|
||||
# mirror; our consumers don't depend on it after Phase 4
|
||||
# template Dockerfile updates).
|
||||
#
|
||||
# Endpoint shape (verified live in RFC#596 Phase 5):
|
||||
# POST https://git.moleculesai.app/api/packages/molecule-ai/pypi/
|
||||
# HTTP Basic auth: username = gitea username, password = PAT with
|
||||
# `write:package` scope. Returns 201 Created on success.
|
||||
# ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
- name: Publish to Gitea PyPI registry (PRIMARY)
|
||||
id: gitea_publish
|
||||
working-directory: ${{ runner.temp }}/runtime-build
|
||||
env:
|
||||
# MOLECULE_PYPI_GITEA_PUBLISHER_USER: Gitea username for the publisher
|
||||
# persona (must own a token with `write:package` scope).
|
||||
# Provisioned in RFC#596 Phase 3 (operator-config PR).
|
||||
# NOTE: secret name MUST NOT start with `GITEA_` or `GITHUB_` —
|
||||
# Gitea 1.22.6 reserves those prefixes for built-in env vars and
|
||||
# rejects repo-secret PUT with HTTP 400 / "invalid secret name".
|
||||
# Empirically reproduced 2026-05-19 against
|
||||
# `/repos/molecule-ai/molecule-core/actions/secrets/GITEA_*`.
|
||||
MOLECULE_PYPI_GITEA_PUBLISHER_USER: ${{ secrets.MOLECULE_PYPI_GITEA_PUBLISHER_USER }}
|
||||
# MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN: PAT for the publisher persona,
|
||||
# `write:package` scope on molecule-ai org.
|
||||
# Synced from Infisical /ci/gitea-pypi-publisher (RFC#596 Phase 3).
|
||||
MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN: ${{ secrets.MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN }}
|
||||
run: |
|
||||
set -eu
|
||||
if [ -z "${MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN:-}" ] || [ -z "${MOLECULE_PYPI_GITEA_PUBLISHER_USER:-}" ]; then
|
||||
echo "::error::MOLECULE_PYPI_GITEA_PUBLISHER_USER / MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN secrets are not set."
|
||||
echo "::error::Provision them via the RFC#596 Phase 3 operator-config sync script."
|
||||
echo "::error::Gitea is the PRIMARY index per RFC#596 — publish job aborts here, NOT after PyPI."
|
||||
exit 1
|
||||
fi
|
||||
python -m twine upload \
|
||||
--verbose \
|
||||
--repository-url "https://git.moleculesai.app/api/packages/molecule-ai/pypi/" \
|
||||
--username "$MOLECULE_PYPI_GITEA_PUBLISHER_USER" \
|
||||
--password "$MOLECULE_PYPI_GITEA_PUBLISHER_TOKEN" \
|
||||
dist/*
|
||||
echo "gitea_status=success" >> "$GITHUB_OUTPUT"
|
||||
echo "gitea_url=https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/molecule-ai-workspace-runtime" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Publish to PyPI (FALLBACK, best-effort)
|
||||
id: pypi_publish
|
||||
# working-directory matches the preceding Build/Verify steps. Without
|
||||
# this, twine runs from the default workspace checkout dir where
|
||||
# `dist/` doesn't exist and fails with:
|
||||
# ERROR InvalidDistribution: Cannot find file (or expand pattern): 'dist/*'
|
||||
# Caught on the first-ever successful dispatch of this workflow
|
||||
# (run 5097, 2026-05-11 02:08Z) — every other step in the publish
|
||||
# job already had this working-directory; Publish was missing it.
|
||||
#
|
||||
# RFC#596: this step is `continue-on-error: true` because PyPI is
|
||||
# NO LONGER the primary index. PyPI 403/timeout/abuse-block does
|
||||
# NOT block the publish — Gitea already has the wheel.
|
||||
continue-on-error: true
|
||||
working-directory: ${{ runner.temp }}/runtime-build
|
||||
env:
|
||||
# PYPI_TOKEN: repository secret scoped to molecule-ai-workspace-runtime.
|
||||
# Set via: Settings → Actions → Variables and Secrets → New Secret.
|
||||
# Format: pypi-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
|
||||
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
|
||||
run: |
|
||||
if [ -z "$PYPI_TOKEN" ]; then
|
||||
echo "::warning::PYPI_TOKEN secret is not set — skipping PyPI mirror publish (non-fatal per RFC#596)."
|
||||
echo "pypi_status=skipped_no_token" >> "$GITHUB_OUTPUT"
|
||||
exit 0
|
||||
fi
|
||||
if python -m twine upload \
|
||||
--verbose \
|
||||
--repository pypi \
|
||||
--username __token__ \
|
||||
--password "$PYPI_TOKEN" \
|
||||
dist/*; then
|
||||
echo "pypi_status=success" >> "$GITHUB_OUTPUT"
|
||||
else
|
||||
rc=$?
|
||||
echo "::warning::PyPI mirror publish failed (exit $rc). Non-fatal per RFC#596 — Gitea has the wheel."
|
||||
echo "pypi_status=failed_exit_$rc" >> "$GITHUB_OUTPUT"
|
||||
fi
|
||||
echo "pypi_url=https://pypi.org/project/molecule-ai-workspace-runtime/${{ steps.version.outputs.version }}/" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Publish job summary (Gitea + PyPI status)
|
||||
if: always()
|
||||
run: |
|
||||
{
|
||||
echo "## publish-runtime $(date -u +%FT%TZ)"
|
||||
echo
|
||||
echo "**Version:** \`${{ steps.version.outputs.version }}\`"
|
||||
echo "**Wheel SHA256:** \`${{ steps.wheel_hash.outputs.wheel_sha256 }}\`"
|
||||
echo
|
||||
echo "### Indexes"
|
||||
echo
|
||||
echo "| Index | Status | URL |"
|
||||
echo "|---------|-------------------------------------------------|-----|"
|
||||
echo "| Gitea (PRIMARY) | ${{ steps.gitea_publish.outputs.gitea_status || 'failed' }} | ${{ steps.gitea_publish.outputs.gitea_url || '—' }} |"
|
||||
echo "| PyPI (fallback) | ${{ steps.pypi_publish.outputs.pypi_status || 'failed' }} | ${{ steps.pypi_publish.outputs.pypi_url || '—' }} |"
|
||||
echo
|
||||
echo "Per RFC#596: Gitea is the contract. PyPI is best-effort."
|
||||
} >> "$GITHUB_STEP_SUMMARY"
|
||||
|
||||
cascade:
|
||||
needs: publish
|
||||
# Publish/release lane (internal#462) — downstream of the runtime
|
||||
# publish ship job; keep it on the reserved lane too.
|
||||
runs-on: publish
|
||||
steps:
|
||||
- name: Wait for PyPI to propagate the new version
|
||||
env:
|
||||
RUNTIME_VERSION: ${{ needs.publish.outputs.version }}
|
||||
EXPECTED_SHA256: ${{ needs.publish.outputs.wheel_sha256 }}
|
||||
run: |
|
||||
set -eu
|
||||
if [ -z "$EXPECTED_SHA256" ]; then
|
||||
echo "::error::publish job did not expose wheel_sha256 — cannot verify wheel content. Refusing to fan out cascade."
|
||||
exit 1
|
||||
fi
|
||||
# NOTE (RFC#596 follow-up): this propagation probe still resolves
|
||||
# against PyPI's default index. After RFC#596 Phase 4 lands and
|
||||
# consumers pull from Gitea first, this probe should be rewritten
|
||||
# to verify the Gitea simple/ endpoint serves the new wheel
|
||||
# (PyPI may be best-effort-failed and the cascade should still
|
||||
# fan out, since templates will pull from Gitea). Tracked in #596.
|
||||
python -m venv /tmp/propagation-probe
|
||||
PROBE=/tmp/propagation-probe/bin
|
||||
$PROBE/pip install --upgrade --quiet pip
|
||||
for i in $(seq 1 30); do
|
||||
if $PROBE/pip install \
|
||||
--quiet \
|
||||
--no-cache-dir \
|
||||
--force-reinstall \
|
||||
--no-deps \
|
||||
"molecule-ai-workspace-runtime==${RUNTIME_VERSION}" \
|
||||
>/dev/null 2>&1; then
|
||||
INSTALLED=$($PROBE/pip show molecule-ai-workspace-runtime 2>/dev/null \
|
||||
| awk -F': ' '/^Version:/{print $2}')
|
||||
if [ "$INSTALLED" = "$RUNTIME_VERSION" ]; then
|
||||
echo "✓ PyPI resolved $RUNTIME_VERSION (install check)"
|
||||
break
|
||||
fi
|
||||
fi
|
||||
if [ $i -eq 30 ]; then
|
||||
echo "::error::pip install --no-cache-dir molecule-ai-workspace-runtime==${RUNTIME_VERSION} never resolved within ~5 min."
|
||||
echo "::error::Refusing to fan out cascade against a potentially stale PyPI index."
|
||||
exit 1
|
||||
fi
|
||||
echo " [$i/30] waiting for PyPI to propagate ${RUNTIME_VERSION}..."
|
||||
sleep 4
|
||||
done
|
||||
|
||||
# Stage (b): download wheel + SHA256 compare against what we built.
|
||||
# Catches Fastly stale-content serving old bytes under a new version URL.
|
||||
#
|
||||
# Caught run 5196 (first-ever successful publish, 2026-05-11): the
|
||||
# previous one-liner `HASH=$(pip download ... && sha256sum ...)`
|
||||
# captured pip's stdout (`Collecting molecule-ai-workspace-runtime
|
||||
# ==X.Y.Z`) into HASH, then the SHA comparison failed against the
|
||||
# leaked `Collecting...` string. `2>/dev/null` silences stderr but
|
||||
# NOT stdout; pip writes its progress to stdout by default.
|
||||
# Fix: split into two steps, silence pip's stdout explicitly, capture
|
||||
# only sha256sum's output into HASH.
|
||||
python -m pip download \
|
||||
--no-deps \
|
||||
--no-cache-dir \
|
||||
--dest /tmp/wheel-probe \
|
||||
--quiet \
|
||||
"molecule-ai-workspace-runtime==${RUNTIME_VERSION}" \
|
||||
>/dev/null 2>&1
|
||||
HASH=$(sha256sum /tmp/wheel-probe/*.whl | awk '{print $1}')
|
||||
if [ "$HASH" != "$EXPECTED_SHA256" ]; then
|
||||
echo "::error::PyPI propagated $RUNTIME_VERSION but wheel content SHA256 mismatch."
|
||||
echo "::error::Expected: $EXPECTED_SHA256"
|
||||
echo "::error::Got: $HASH"
|
||||
echo "::error::Fastly may be serving stale content. Refusing to fan out cascade."
|
||||
exit 1
|
||||
fi
|
||||
echo "✓ PyPI CDN verified (SHA256 match)"
|
||||
|
||||
- name: Fan out via push to .runtime-version
|
||||
env:
|
||||
# Gitea PAT with write:repository scope on the 8 cascade-active
|
||||
# template repos. Used for git push to each template repo's main
|
||||
# branch, which trips their `on: push: branches: [main]` trigger
|
||||
# on publish-image.yml.
|
||||
DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
|
||||
RUNTIME_VERSION: ${{ needs.publish.outputs.version }}
|
||||
run: |
|
||||
set +e # don't abort on a single repo failure — collect them all
|
||||
|
||||
if [ -z "$DISPATCH_TOKEN" ]; then
|
||||
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
|
||||
echo "::warning::DISPATCH_TOKEN secret not set — skipping cascade."
|
||||
echo "::warning::set it at Settings → Actions → Variables and Secrets → New Secret."
|
||||
exit 0
|
||||
fi
|
||||
echo "::error::DISPATCH_TOKEN secret missing — cascade cannot fan out."
|
||||
echo "::error::PyPI was published, but the 8 template repos will NOT pick up the new version."
|
||||
exit 1
|
||||
fi
|
||||
VERSION="$RUNTIME_VERSION"
|
||||
if [ -z "$VERSION" ]; then
|
||||
echo "::error::publish job did not expose a version output"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
GITEA_URL="${GITEA_URL:-https://git.moleculesai.app}"
|
||||
# Keep in lockstep with manifest.json workspace_templates (suffix-stripped).
|
||||
# Guarded by scripts/check-cascade-list-vs-manifest.sh (cascade-list-drift-gate).
|
||||
# 2026-05-19: pruned crewai/deepagents/gemini-cli — not in manifest.
|
||||
TEMPLATES="claude-code hermes openclaw codex langgraph autogen"
|
||||
FAILED=""
|
||||
SKIPPED=""
|
||||
|
||||
git config --global user.name "publish-runtime cascade"
|
||||
git config --global user.email "publish-runtime@moleculesai.app"
|
||||
|
||||
WORKDIR="$(mktemp -d)"
|
||||
for tpl in $TEMPLATES; do
|
||||
REPO="molecule-ai/molecule-ai-workspace-template-$tpl"
|
||||
CLONE="$WORKDIR/$tpl"
|
||||
|
||||
HTTP=$(curl -sS -o /dev/null -w "%{http_code}" \
|
||||
-H "Authorization: token $DISPATCH_TOKEN" \
|
||||
"$GITEA_URL/api/v1/repos/$REPO/contents/.github/workflows/publish-image.yml")
|
||||
if [ "$HTTP" = "404" ]; then
|
||||
echo "↷ $tpl has no publish-image.yml — soft-skip"
|
||||
SKIPPED="$SKIPPED $tpl"
|
||||
continue
|
||||
fi
|
||||
|
||||
attempt=0
|
||||
success=false
|
||||
while [ $attempt -lt 3 ]; do
|
||||
attempt=$((attempt + 1))
|
||||
rm -rf "$CLONE"
|
||||
if ! git clone --depth=1 \
|
||||
"https://x-access-token:${DISPATCH_TOKEN}@${GITEA_URL#https://}/$REPO.git" \
|
||||
"$CLONE" >/tmp/clone.log 2>&1; then
|
||||
echo "::warning::clone $tpl attempt $attempt failed: $(tail -n3 /tmp/clone.log)"
|
||||
sleep 2
|
||||
continue
|
||||
fi
|
||||
|
||||
cd "$CLONE"
|
||||
echo "$VERSION" > .runtime-version
|
||||
|
||||
if git diff --quiet -- .runtime-version; then
|
||||
echo "✓ $tpl already at $VERSION — no commit needed"
|
||||
success=true
|
||||
cd - >/dev/null
|
||||
break
|
||||
fi
|
||||
|
||||
git add .runtime-version
|
||||
git commit -m "chore: pin runtime to $VERSION (publish-runtime cascade)" \
|
||||
-m "Co-Authored-By: publish-runtime cascade <publish-runtime@moleculesai.app>" \
|
||||
>/dev/null
|
||||
|
||||
if git push origin HEAD:main >/tmp/push.log 2>&1; then
|
||||
echo "✓ $tpl pushed $VERSION on attempt $attempt"
|
||||
success=true
|
||||
cd - >/dev/null
|
||||
break
|
||||
fi
|
||||
|
||||
echo "::warning::push $tpl attempt $attempt failed, pull-rebasing"
|
||||
git pull --rebase origin main >/tmp/rebase.log 2>&1 || true
|
||||
cd - >/dev/null
|
||||
done
|
||||
|
||||
if [ "$success" != "true" ]; then
|
||||
FAILED="$FAILED $tpl"
|
||||
fi
|
||||
done
|
||||
rm -rf "$WORKDIR"
|
||||
|
||||
if [ -n "$FAILED" ]; then
|
||||
echo "::error::Cascade incomplete after 3 retries each. Failed:$FAILED"
|
||||
exit 1
|
||||
fi
|
||||
if [ -n "$SKIPPED" ]; then
|
||||
echo "Cascade complete: pinned $VERSION. Soft-skipped (no publish-image.yml):$SKIPPED"
|
||||
else
|
||||
echo "Cascade complete: $VERSION pinned across all manifest workspace_templates."
|
||||
fi
|
||||
@@ -1,101 +0,0 @@
|
||||
name: Runtime Pin Compatibility
|
||||
|
||||
# Ported from .github/workflows/runtime-pin-compat.yml on 2026-05-11 per
|
||||
# RFC internal#219 §1 sweep.
|
||||
#
|
||||
# Differences from the GitHub version:
|
||||
# - Dropped `merge_group:` (no Gitea merge queue) and
|
||||
# `workflow_dispatch:` (no inputs, but the trigger itself is
|
||||
# parser-rejected when inputs are absent in some Gitea 1.22.x
|
||||
# builds; safest to drop entirely — manual runs go via cron-trigger
|
||||
# bump or push-with-paths-filter).
|
||||
# - on.paths references .gitea/workflows/runtime-pin-compat.yml (this
|
||||
# file) instead of the .github/ one.
|
||||
# - Workflow-level env.GITHUB_SERVER_URL set.
|
||||
# - `continue-on-error: true` on the job (RFC §1 contract).
|
||||
#
|
||||
# CI gate that prevents the 5-hour staging outage from 2026-04-24 from
|
||||
# recurring (controlplane#253). The original failure mode:
|
||||
# 1. molecule-ai-workspace-runtime 0.1.13 declared `a2a-sdk<1.0` in its
|
||||
# requires_dist metadata (incorrect — it actually imports
|
||||
# a2a.server.routes which only exists in a2a-sdk 1.0+)
|
||||
# 2. `pip install molecule-ai-workspace-runtime` resolved cleanly
|
||||
# 3. `from molecule_runtime.main import main_sync` raised ImportError
|
||||
# 4. Every tenant workspace crashed; the canary tenant caught it but
|
||||
# only after 5 hours of degraded staging
|
||||
#
|
||||
# This workflow installs the CURRENTLY PUBLISHED runtime from PyPI on
|
||||
# top of `workspace/requirements.txt` and smoke-imports. Catches:
|
||||
# - Upstream PyPI yanks
|
||||
# - Bad re-releases of molecule-ai-workspace-runtime
|
||||
# - Already-shipped wheels that stop importing because a transitive
|
||||
# dep moved underneath
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main, staging]
|
||||
paths:
|
||||
# Narrow filter: pypi-latest is sensitive only to changes that
|
||||
# affect what we're INSTALLING (requirements.txt) or WHAT THE
|
||||
# CHECK ITSELF DOES (this workflow file). Edits to workspace/
|
||||
# source code don't change what's on PyPI right now, so they
|
||||
# don't change this gate's verdict.
|
||||
- 'workspace/requirements.txt'
|
||||
- '.gitea/workflows/runtime-pin-compat.yml'
|
||||
pull_request:
|
||||
branches: [main, staging]
|
||||
paths:
|
||||
- 'workspace/requirements.txt'
|
||||
- '.gitea/workflows/runtime-pin-compat.yml'
|
||||
# Daily catch for upstream PyPI publishes that break the pin combo
|
||||
# without any change in our repo (e.g. someone re-yanks an a2a-sdk
|
||||
# release or molecule-ai-workspace-runtime publishes a bad bump).
|
||||
schedule:
|
||||
- cron: '0 13 * * *' # 06:00 PT
|
||||
|
||||
env:
|
||||
GITHUB_SERVER_URL: https://git.moleculesai.app
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
pypi-latest-install:
|
||||
name: PyPI-latest install + import smoke
|
||||
runs-on: ubuntu-latest
|
||||
# Phase 3 (RFC #219 §1): surface broken workflows without blocking
|
||||
# the PR. Follow-up PR flips this off after surfaced defects are
|
||||
# triaged.
|
||||
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
|
||||
continue-on-error: true
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
|
||||
with:
|
||||
python-version: '3.11'
|
||||
cache: pip
|
||||
cache-dependency-path: workspace/requirements.txt
|
||||
- name: Install runtime + workspace requirements
|
||||
# Install order is load-bearing: install the runtime FIRST so pip
|
||||
# honors whatever a2a-sdk constraint the runtime metadata declares
|
||||
# (this is the surface that broke in 2026-04-24 — runtime declared
|
||||
# `a2a-sdk<1.0` but actually needed >=1.0). The follow-up install
|
||||
# of workspace/requirements.txt then upgrades a2a-sdk to the
|
||||
# constraint our runtime image actually pins. The import smoke
|
||||
# below verifies the upgraded combination is consistent.
|
||||
run: |
|
||||
python -m venv /tmp/venv
|
||||
/tmp/venv/bin/pip install --upgrade pip
|
||||
/tmp/venv/bin/pip install molecule-ai-workspace-runtime
|
||||
/tmp/venv/bin/pip install -r workspace/requirements.txt
|
||||
/tmp/venv/bin/pip show molecule-ai-workspace-runtime a2a-sdk \
|
||||
| grep -E '^(Name|Version):'
|
||||
- name: Smoke import — fail if metadata declares deps that don't satisfy real imports
|
||||
# WORKSPACE_ID is validated at import time by platform_auth.py — EC2
|
||||
# user-data sets it from the cloud-init template; set a placeholder
|
||||
# here so the import smoke doesn't trip on the env-var guard.
|
||||
env:
|
||||
WORKSPACE_ID: 00000000-0000-0000-0000-000000000001
|
||||
run: |
|
||||
/tmp/venv/bin/python -c "from molecule_runtime.main import main_sync; print('runtime imports OK')"
|
||||
@@ -1,150 +0,0 @@
|
||||
name: Runtime PR-Built Compatibility
|
||||
|
||||
# Ported from .github/workflows/runtime-prbuild-compat.yml on 2026-05-11
|
||||
# per RFC internal#219 §1 sweep.
|
||||
#
|
||||
# Differences from the GitHub version:
|
||||
# - Dropped `merge_group:` (no Gitea merge queue) and `workflow_dispatch:`
|
||||
# (Gitea 1.22.6 parser-rejects workflow_dispatch with inputs and is
|
||||
# finicky without them).
|
||||
# - `dorny/paths-filter@v4` replaced with inline `git diff` (per PR#372
|
||||
# pattern for ci.yml port).
|
||||
# - on.paths references .gitea/workflows/runtime-prbuild-compat.yml.
|
||||
# - Workflow-level env.GITHUB_SERVER_URL set.
|
||||
# - `continue-on-error: true` on every job (RFC §1 contract).
|
||||
#
|
||||
# Companion to `runtime-pin-compat.yml`. That workflow tests what's
|
||||
# CURRENTLY PUBLISHED on PyPI; this workflow tests what WOULD BE
|
||||
# PUBLISHED if THIS PR merges.
|
||||
#
|
||||
# Why two workflows: the chicken-and-egg #128 fix added a "PR-built
|
||||
# wheel" job to the original runtime-pin-compat.yml, but both jobs
|
||||
# shared a `paths:` filter that was the union of their needs
|
||||
# (`workspace/**`). That meant the PyPI-latest job ran on every doc
|
||||
# edit even though the upstream PyPI artifact can't change with our
|
||||
# workspace/ source. Splitting the two means each gets a narrow
|
||||
# `paths:` filter that matches the inputs it actually depends on.
|
||||
#
|
||||
# Catches the failure mode where a PR adds an import requiring a newer
|
||||
# SDK than `workspace/requirements.txt` pins:
|
||||
# 1. Pip resolves the existing PyPI wheel + the old SDK pin -> smoke
|
||||
# passes (it imports the OLD main.py from the wheel, not the PR's
|
||||
# new main.py).
|
||||
# 2. Merge -> publish-runtime.yml ships a wheel WITH the new import.
|
||||
# 3. Tenant images redeploy -> all crash on first boot with ImportError.
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main, staging]
|
||||
pull_request:
|
||||
branches: [main, staging]
|
||||
|
||||
env:
|
||||
GITHUB_SERVER_URL: https://git.moleculesai.app
|
||||
|
||||
concurrency:
|
||||
# event_name + sha keeps PR sync and the subsequent staging push on the
|
||||
# same SHA from cancelling each other (per feedback_concurrency_group_per_sha).
|
||||
group: ${{ github.workflow }}-${{ github.event_name }}-${{ github.event.pull_request.head.sha || github.sha }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
detect-changes:
|
||||
runs-on: ubuntu-latest
|
||||
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
|
||||
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
|
||||
continue-on-error: true
|
||||
outputs:
|
||||
wheel: ${{ steps.decide.outputs.wheel }}
|
||||
steps:
|
||||
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
with:
|
||||
fetch-depth: 0
|
||||
- id: decide
|
||||
run: |
|
||||
# Inline replacement for dorny/paths-filter — same pattern
|
||||
# PR#372's ci.yml port used. Diffs against the PR base or the
|
||||
# previous push SHA, then matches against the wheel-relevant
|
||||
# path set.
|
||||
#
|
||||
# NOTE: Gitea Actions does not expose github.event.before as a
|
||||
# shell environment variable. The ${{ github.event.before }} template
|
||||
# expression works inside YAML run: blocks but is evaluated to an
|
||||
# empty string for push events, making the ${VAR:-fallback} always
|
||||
# use the fallback. Use GITHUB_EVENT_BEFORE instead — it IS set in
|
||||
# the runner's shell environment for push events.
|
||||
BASE=""
|
||||
if [ "${{ github.event_name }}" = "pull_request" ]; then
|
||||
BASE="${{ github.event.pull_request.base.sha }}"
|
||||
elif [ -n "$GITHUB_EVENT_BEFORE" ]; then
|
||||
BASE="$GITHUB_EVENT_BEFORE"
|
||||
fi
|
||||
if [ -z "$BASE" ] || echo "$BASE" | grep -qE '^0+$'; then
|
||||
# New branch or no previous SHA: treat as wheel-relevant.
|
||||
echo "wheel=true" >> "$GITHUB_OUTPUT"
|
||||
exit 0
|
||||
fi
|
||||
if ! timeout 30 git cat-file -e "$BASE" 2>/dev/null; then
|
||||
git fetch --depth=1 origin "$BASE" 2>/dev/null || true
|
||||
fi
|
||||
if ! timeout 30 git cat-file -e "$BASE" 2>/dev/null; then
|
||||
echo "wheel=true" >> "$GITHUB_OUTPUT"
|
||||
exit 0
|
||||
fi
|
||||
CHANGED=$(git diff --name-only "$BASE" HEAD)
|
||||
if echo "$CHANGED" | grep -qE '^(workspace/|scripts/build_runtime_package\.py$|scripts/wheel_smoke\.py$|\.gitea/workflows/runtime-prbuild-compat\.yml$)'; then
|
||||
echo "wheel=true" >> "$GITHUB_OUTPUT"
|
||||
else
|
||||
echo "wheel=false" >> "$GITHUB_OUTPUT"
|
||||
fi
|
||||
|
||||
# ONE job (no job-level `if:`) that always runs and reports under the
|
||||
# required-check name `PR-built wheel + import smoke`. Real work is
|
||||
# gated per-step on `needs.detect-changes.outputs.wheel`.
|
||||
local-build-install:
|
||||
needs: detect-changes
|
||||
name: PR-built wheel + import smoke
|
||||
runs-on: ubuntu-latest
|
||||
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
|
||||
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
|
||||
continue-on-error: true
|
||||
steps:
|
||||
- name: No-op pass (paths filter excluded this commit)
|
||||
if: needs.detect-changes.outputs.wheel != 'true'
|
||||
run: |
|
||||
echo "No workspace/ / scripts/{build_runtime_package,wheel_smoke}.py / workflow changes — wheel gate satisfied without rebuilding."
|
||||
echo "::notice::PR-built wheel + import smoke no-op pass (paths filter excluded this commit)."
|
||||
- if: needs.detect-changes.outputs.wheel == 'true'
|
||||
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
- if: needs.detect-changes.outputs.wheel == 'true'
|
||||
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
|
||||
with:
|
||||
python-version: '3.11'
|
||||
cache: pip
|
||||
cache-dependency-path: workspace/requirements.txt
|
||||
- name: Install build tooling
|
||||
if: needs.detect-changes.outputs.wheel == 'true'
|
||||
run: pip install build
|
||||
- name: Build wheel from PR source (mirrors publish-runtime.yml)
|
||||
if: needs.detect-changes.outputs.wheel == 'true'
|
||||
# Use a fixed test version so the wheel filename is predictable.
|
||||
# Doesn't reach PyPI — this build is local-only for the smoke.
|
||||
run: |
|
||||
python scripts/build_runtime_package.py \
|
||||
--version "0.0.0.dev0+pin-compat" \
|
||||
--out /tmp/runtime-build
|
||||
cd /tmp/runtime-build && python -m build
|
||||
- name: Install built wheel + workspace requirements
|
||||
if: needs.detect-changes.outputs.wheel == 'true'
|
||||
run: |
|
||||
python -m venv /tmp/venv-built
|
||||
/tmp/venv-built/bin/pip install --upgrade pip
|
||||
/tmp/venv-built/bin/pip install /tmp/runtime-build/dist/*.whl
|
||||
/tmp/venv-built/bin/pip install -r workspace/requirements.txt
|
||||
/tmp/venv-built/bin/pip show molecule-ai-workspace-runtime a2a-sdk \
|
||||
| grep -E '^(Name|Version):'
|
||||
- name: Smoke import the PR-built wheel
|
||||
if: needs.detect-changes.outputs.wheel == 'true'
|
||||
# Same script publish-runtime.yml runs against the to-be-PyPI wheel.
|
||||
run: |
|
||||
/tmp/venv-built/bin/python "$GITHUB_WORKSPACE/scripts/wheel_smoke.py"
|
||||
@@ -58,14 +58,20 @@ jobs:
|
||||
python-version: '3.11'
|
||||
- name: Install .gitea script test dependencies
|
||||
run: python -m pip install --quiet 'pytest==9.0.2' 'PyYAML==6.0.2'
|
||||
- name: Run scripts/ unittests (build_runtime_package, ...)
|
||||
# Top-level scripts/ tests live alongside their target file
|
||||
# (e.g. scripts/test_build_runtime_package.py exercises
|
||||
# scripts/build_runtime_package.py). discover from scripts/
|
||||
# picks up only top-level test_*.py because scripts/ops/ has
|
||||
# no __init__.py — that's intentional, so we run two passes.
|
||||
- name: Run scripts/ unittests, if any
|
||||
# Top-level scripts/ tests live alongside their target file. The
|
||||
# runtime packaging tests moved to molecule-ai-workspace-runtime, so
|
||||
# this pass may legitimately find no tests.
|
||||
working-directory: scripts
|
||||
run: python -m unittest discover -t . -p 'test_*.py' -v
|
||||
run: |
|
||||
set +e
|
||||
python -m unittest discover -t . -p 'test_*.py' -v
|
||||
rc=$?
|
||||
if [ "$rc" -eq 5 ]; then
|
||||
echo "No top-level scripts/ unittest files found; skipping."
|
||||
exit 0
|
||||
fi
|
||||
exit "$rc"
|
||||
- name: Run scripts/ops/ unittests (sweep_cf_decide, ...)
|
||||
working-directory: scripts/ops
|
||||
run: python -m unittest discover -p 'test_*.py' -v
|
||||
|
||||
@@ -163,11 +163,11 @@ Most agent systems stop at "a smart runtime." Molecule AI pushes further: it giv
|
||||
|
||||
| Core mechanism | Molecule AI module(s) | Why it matters |
|
||||
|---|---|---|
|
||||
| **Durable memory that survives sessions** | `workspace/builtin_tools/memory.py`, `workspace/builtin_tools/awareness_client.py`, `workspace-server/internal/handlers/memories.go` | Memory is not just durable, it is **workspace-scoped** and can route into awareness namespaces tied to the org structure |
|
||||
| **Durable memory that survives sessions** | `molecule-ai-workspace-runtime/molecule_runtime/builtin_tools/`, `workspace-server/internal/handlers/memories.go` | Memory is not just durable, it is **workspace-scoped** and can route into awareness namespaces tied to the org structure |
|
||||
| **Cross-session recall** | `workspace-server/internal/handlers/activity.go` (`/workspaces/:id/session-search`) | Recall spans both activity history and memory rows, so the system can search what happened and what was learned without inventing a separate hidden store |
|
||||
| **Skills built from experience** | `workspace/builtin_tools/memory.py` (`_maybe_log_skill_promotion`) | Promotion from memory into a skill candidate is surfaced as an explicit platform activity, not a silent internal side effect |
|
||||
| **Skill improvement during use** | `workspace/skill_loader/watcher.py`, `workspace/skill_loader/loader.py`, `workspace/main.py` | Skills hot-reload into the live runtime, so improvements become available on the next A2A task without restarting the workspace |
|
||||
| **Persistent skill lifecycle** | `workspace-server/cmd/cli/cmd_agent_skill.go`, `workspace/plugins.py` | Skills are not just generated once; they can be audited, installed, published, shared, mounted by plugins, and governed as reusable operational assets |
|
||||
| **Skills built from experience** | `molecule-ai-workspace-runtime/molecule_runtime/builtin_tools/memory.py` (`_maybe_log_skill_promotion`) | Promotion from memory into a skill candidate is surfaced as an explicit platform activity, not a silent internal side effect |
|
||||
| **Skill improvement during use** | `molecule-ai-workspace-runtime/molecule_runtime/skill_loader/`, `molecule-ai-workspace-runtime/molecule_runtime/main.py` | Skills hot-reload into the live runtime, so improvements become available on the next A2A task without restarting the workspace |
|
||||
| **Persistent skill lifecycle** | `workspace-server/cmd/cli/cmd_agent_skill.go`, `molecule-ai-workspace-runtime/molecule_runtime/plugins.py` | Skills are not just generated once; they can be audited, installed, published, shared, mounted by plugins, and governed as reusable operational assets |
|
||||
|
||||
### Why this matters in Molecule AI
|
||||
|
||||
@@ -208,7 +208,7 @@ The result is not just “an agent that learns.” It is **an organization that
|
||||
|
||||
### Runtime
|
||||
|
||||
- unified `workspace/` image; thin AMI in production (us-east-2)
|
||||
- standalone workspace-template images that install `molecule-ai-workspace-runtime` from the Gitea package registry; thin AMI in production (us-east-2)
|
||||
- adapter-driven execution across **8 runtimes** (Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw)
|
||||
- Agent Card registration
|
||||
- awareness-backed memory integration; **Memory v2 backed by pgvector** for semantic recall
|
||||
|
||||
@@ -17,7 +17,7 @@ Canvas (Next.js :3000) ←WebSocket→ Platform (Go :8080) ←HTTP→ Postgres +
|
||||
|
||||
- **Workspace Server** (`workspace-server/`): Go/Gin control plane — workspace CRUD, registry, discovery, WebSocket hub, liveness monitoring.
|
||||
- **Canvas** (`canvas/`): Next.js 15 + React Flow (@xyflow/react v12) + Zustand + Tailwind — visual workspace graph.
|
||||
- **Workspace Runtime** (`workspace/`): Shared runtime published as [`molecule-ai-workspace-runtime`](https://pypi.org/project/molecule-ai-workspace-runtime/) on PyPI. Supports LangGraph, Claude Code, OpenClaw, DeepAgents, CrewAI, AutoGen. Each adapter lives in its own standalone template repo (e.g. `molecule-ai-workspace-template-claude-code`). See `docs/workspace-runtime-package.md` for the full picture.
|
||||
- **Workspace Runtime**: Shared runtime published from [`molecule-ai-workspace-runtime`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime) to the Molecule AI Gitea package registry. Supports LangGraph, Claude Code, OpenClaw, Hermes, Codex, and AutoGen. Each adapter lives in its own standalone template repo (e.g. `molecule-ai-workspace-template-claude-code`). See `docs/workspace-runtime-package.md` for the full picture.
|
||||
- **molecli** (`workspace-server/cmd/cli/`): Go TUI dashboard (Bubbletea + Lipgloss) — real-time workspace monitoring, event log, health overview, delete/filter operations.
|
||||
|
||||
## Key Architectural Patterns
|
||||
|
||||
@@ -1,304 +1,44 @@
|
||||
# Workspace Runtime PyPI Package
|
||||
# Workspace Runtime Package
|
||||
|
||||
## Requires Python >= 3.11
|
||||
`molecule-ai-workspace-runtime` is the shared Python runtime consumed by
|
||||
workspace template images and by external MCP integrations.
|
||||
|
||||
The wheel pins `requires_python>=3.11`. On Python 3.10 or older, `pip install
|
||||
molecule-ai-workspace-runtime` fails with `Could not find a version that
|
||||
satisfies the requirement (from versions: none)` — the pin filters the only
|
||||
available artifact before pip even attempts install. Upgrade the interpreter
|
||||
(`brew install python@3.12` / `apt install python3.12` / etc.) or use a
|
||||
3.11+ venv.
|
||||
## Source Of Truth
|
||||
|
||||
## Overview
|
||||
The source of truth is the standalone Gitea repo:
|
||||
|
||||
The shared workspace runtime infrastructure has **one editable source** and
|
||||
**one published artifact**:
|
||||
|
||||
1. **Source of truth (monorepo, editable):** `workspace/` — every runtime
|
||||
change lands here. Edit it like any other monorepo code.
|
||||
2. **Published artifact (PyPI, generated):** [`molecule-ai-workspace-runtime`](https://pypi.org/project/molecule-ai-workspace-runtime/)
|
||||
— produced by `.github/workflows/publish-runtime.yml` on every
|
||||
`runtime-vX.Y.Z` tag push. Do NOT edit this independently — it gets
|
||||
overwritten on every publish.
|
||||
|
||||
The legacy sibling repo `molecule-ai-workspace-runtime` (the GitHub repo, as
|
||||
distinct from the PyPI package) is no longer the source-of-truth and should
|
||||
be treated as a publish artifact only. It can be archived or used as a
|
||||
read-only mirror.
|
||||
|
||||
## Where to make changes
|
||||
|
||||
**All runtime edits land in `molecule-monorepo/workspace/`. Period.**
|
||||
|
||||
The GitHub repo `Molecule-AI/molecule-ai-workspace-runtime` is **mirror-only**.
|
||||
It exists so external consumers (template repos, downstream operators) have a
|
||||
git-cloneable artifact that mirrors the PyPI wheel — nothing more.
|
||||
|
||||
- **Direct PRs against `molecule-ai-workspace-runtime` are auto-rejected by
|
||||
the `mirror-guard` CI check.** The check fails any push that did not come
|
||||
from the publish pipeline. There is no opt-out — file the change against
|
||||
`molecule-monorepo/workspace/` instead.
|
||||
- **The mirror + the PyPI wheel both auto-regenerate on every push to
|
||||
`staging`** via `.github/workflows/publish-runtime.yml` (which calls
|
||||
`scripts/build_runtime_package.py`, builds wheel + sdist, smoke-imports,
|
||||
uploads to PyPI via Trusted Publisher, and force-pushes the rewritten tree
|
||||
to the mirror repo). You never touch the mirror by hand.
|
||||
|
||||
If you have an old local clone of the mirror and try to push a fix to it
|
||||
directly, expect a CI failure with a message pointing you here. Re-open the
|
||||
change against `molecule-monorepo/workspace/` and let the publish workflow
|
||||
do the rest.
|
||||
|
||||
## Why this shape
|
||||
|
||||
The 8 workspace template repos (claude-code, langgraph, hermes, etc.) each
|
||||
build their own Docker image and `pip install molecule-ai-workspace-runtime`
|
||||
from PyPI. PyPI is the right distribution channel — semver, reproducible
|
||||
builds, no submodule dance per-repo. But the runtime ALSO needs to evolve
|
||||
in lock-step with the platform's wire protocol (queue shape, A2A metadata,
|
||||
event payloads). Shipping cross-cutting protocol changes as separate
|
||||
runtime + platform PRs in two repos creates ordering pain and broken
|
||||
intermediate states.
|
||||
|
||||
The monorepo + auto-publish split gives both: edit cross-cutting changes
|
||||
in one PR, publish the runtime artifact via a tag.
|
||||
|
||||
## What's in the package
|
||||
|
||||
Everything in `workspace/*.py` plus the `adapters/`, `builtin_tools/`,
|
||||
`plugins_registry/`, `policies/`, `skill_loader/` subpackages. Build
|
||||
artifacts (`Dockerfile`, `*.sh`, `pytest.ini`, `requirements.txt`) are
|
||||
excluded.
|
||||
|
||||
The build script rewrites bare imports so the published package is a
|
||||
proper Python namespace:
|
||||
|
||||
```
|
||||
# In monorepo workspace/:
|
||||
from a2a_client import discover_peer
|
||||
from builtin_tools.memory import store
|
||||
|
||||
# In published molecule_runtime/ (auto-rewritten at publish time):
|
||||
from molecule_runtime.a2a_client import discover_peer
|
||||
from molecule_runtime.builtin_tools.memory import store
|
||||
```text
|
||||
https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime
|
||||
```
|
||||
|
||||
The closed allowlist of rewritten module names lives in
|
||||
`scripts/build_runtime_package.py` (`TOP_LEVEL_MODULES` + `SUBPACKAGES`).
|
||||
Add a new top-level module to workspace/? Add it to the allowlist in the
|
||||
same PR.
|
||||
Do not add runtime source back under `molecule-core/workspace/`. The core repo
|
||||
owns the platform server, canvas, provisioning, and tests around the installed
|
||||
runtime package.
|
||||
|
||||
## Adapter repos
|
||||
## Package Registry
|
||||
|
||||
Each of the 8 adapter template repos contains:
|
||||
- `adapter.py` — runtime-specific `Adapter` class
|
||||
- `requirements.txt` — `molecule-ai-workspace-runtime>=0.1.X` + adapter deps
|
||||
- `Dockerfile` — standalone image with `ENV ADAPTER_MODULE=adapter` and
|
||||
`ENTRYPOINT ["molecule-runtime"]`
|
||||
The runtime package is published to the Molecule AI Gitea package registry:
|
||||
|
||||
| Adapter | Repo |
|
||||
|---------|------|
|
||||
| claude-code | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-claude-code |
|
||||
| langgraph | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-langgraph |
|
||||
| crewai | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-crewai |
|
||||
| autogen | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-autogen |
|
||||
| deepagents | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-deepagents |
|
||||
| hermes | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-hermes |
|
||||
| gemini-cli | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-gemini-cli |
|
||||
| openclaw | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-openclaw |
|
||||
|
||||
## Adapter discovery (ADAPTER_MODULE)
|
||||
|
||||
Standalone adapter repos set `ENV ADAPTER_MODULE=adapter` in their
|
||||
Dockerfile. The runtime's `get_adapter()` checks this env var first:
|
||||
|
||||
```python
|
||||
# In molecule_runtime/adapters/__init__.py
|
||||
def get_adapter(runtime: str) -> type[BaseAdapter]:
|
||||
adapter_module = os.environ.get("ADAPTER_MODULE")
|
||||
if adapter_module:
|
||||
mod = importlib.import_module(adapter_module)
|
||||
return getattr(mod, "Adapter")
|
||||
raise KeyError(...)
|
||||
```text
|
||||
https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/
|
||||
```
|
||||
|
||||
## Publishing a new version
|
||||
PyPI is intentionally not part of the critical path. Template Dockerfiles,
|
||||
external-runtime snippets, and CI install checks should use the Gitea registry.
|
||||
|
||||
```bash
|
||||
# From any local checkout of monorepo, after merging your runtime change:
|
||||
git tag runtime-v0.1.6
|
||||
git push origin runtime-v0.1.6
|
||||
```
|
||||
## Release Flow
|
||||
|
||||
The `publish-runtime` workflow takes over — checks out the tag, runs
|
||||
`scripts/build_runtime_package.py --version 0.1.6`, builds wheel + sdist,
|
||||
runs a smoke import to catch broken rewrites, and uploads to PyPI via
|
||||
the PyPA Trusted Publisher action (OIDC). No static API token is stored
|
||||
in this repo — PyPI verifies the workflow's OIDC claim against the
|
||||
trusted-publisher config registered for `molecule-ai-workspace-runtime`.
|
||||
1. Land a reviewed PR in `molecule-ai-workspace-runtime`.
|
||||
2. Bump `version =` in that repo's `pyproject.toml`.
|
||||
3. Tag `runtime-vX.Y.Z` on the runtime repo.
|
||||
4. The runtime repo's `publish-runtime` workflow builds the wheel and sdist,
|
||||
publishes to the Gitea registry, verifies install from that registry, then
|
||||
cascades `.runtime-version` pins to workspace template repos.
|
||||
|
||||
For dev/test releases without tagging, dispatch the workflow manually
|
||||
with an explicit version (e.g. `0.1.6.dev1` — PEP 440 dev/rc/post forms
|
||||
are accepted).
|
||||
## Core Repo Contract
|
||||
|
||||
After publish, the 8 template repos pick up the new version on their
|
||||
next `:latest` rebuild. To force-pull immediately, bump the pin in each
|
||||
template's `requirements.txt`.
|
||||
`molecule-core` must not ship editable runtime code. Its responsibilities are:
|
||||
|
||||
## End-to-end CD chain
|
||||
|
||||
The full chain from monorepo merge → workspace containers running new code:
|
||||
|
||||
```
|
||||
1. Merge PR with workspace/ changes to main
|
||||
↓
|
||||
2. .github/workflows/auto-tag-runtime.yml fires
|
||||
↓ reads PR labels (release:major/minor) or defaults to patch
|
||||
↓ pushes runtime-vX.Y.Z tag
|
||||
↓
|
||||
3. .github/workflows/publish-runtime.yml fires (on the tag)
|
||||
↓ builds wheel via scripts/build_runtime_package.py
|
||||
↓ smoke-imports the wheel
|
||||
↓ uploads to PyPI
|
||||
↓ cascade job fires repository_dispatch (event-type: runtime-published)
|
||||
↓ to all 8 workspace-template-* repos
|
||||
↓
|
||||
4. Each template's publish-image.yml fires (on repository_dispatch)
|
||||
↓ rebuilds Dockerfile (which pip-installs the new PyPI version)
|
||||
↓ pushes ghcr.io/molecule-ai/workspace-template-<runtime>:latest
|
||||
↓
|
||||
5. Production hosts run scripts/refresh-workspace-images.sh
|
||||
OR an operator hits POST /admin/workspace-images/refresh on the platform
|
||||
↓ docker pull all 8 :latest tags
|
||||
↓ remove + force-recreate any running ws-* containers using a refreshed image
|
||||
↓ canvas re-provisions the workspaces on next interaction
|
||||
```
|
||||
|
||||
Steps 1-4 are fully automated. Step 5 is one-click: a single curl or shell
|
||||
command. SaaS deployments typically wire step 5 into their normal deploy
|
||||
pipeline (every release pulls fresh images on every host); local dev fires
|
||||
it manually after a runtime release lands.
|
||||
|
||||
### Auth
|
||||
|
||||
PyPI publishing uses **Trusted Publisher (OIDC)** — no static token in the
|
||||
monorepo. The trusted-publisher config on PyPI binds the
|
||||
`molecule-ai-workspace-runtime` project to this repo's
|
||||
`publish-runtime.yml` workflow + `pypi-publish` environment. Rotation is
|
||||
moot: there is no shared secret to rotate.
|
||||
|
||||
### Required secrets
|
||||
|
||||
| Secret | Where | Why |
|
||||
|---|---|---|
|
||||
| `TEMPLATE_DISPATCH_TOKEN` | molecule-core repo | Fine-grained PAT with `actions:write` on the 8 template repos. Without it the `cascade` job warns and exits clean — PyPI still publishes; templates just don't auto-rebuild. |
|
||||
|
||||
### Step 5 specifics
|
||||
|
||||
**Local dev (compose stack):**
|
||||
```bash
|
||||
bash scripts/refresh-workspace-images.sh # all runtimes
|
||||
bash scripts/refresh-workspace-images.sh --runtime claude-code
|
||||
bash scripts/refresh-workspace-images.sh --no-recreate # pull only, leave containers
|
||||
```
|
||||
|
||||
**Via platform admin endpoint (any deploy):**
|
||||
```bash
|
||||
curl -X POST "$PLATFORM/admin/workspace-images/refresh"
|
||||
curl -X POST "$PLATFORM/admin/workspace-images/refresh?runtime=claude-code"
|
||||
curl -X POST "$PLATFORM/admin/workspace-images/refresh?recreate=false"
|
||||
```
|
||||
|
||||
The endpoint pulls + recreates from inside the platform container, so it
|
||||
needs Docker socket access (the compose stack mounts
|
||||
`/var/run/docker.sock` already) AND GHCR auth on the host's docker config
|
||||
(`docker login ghcr.io` once per host). On a fresh host without GHCR auth,
|
||||
the pull step warns per runtime and the response surfaces the failures.
|
||||
|
||||
**Fully hands-off (opt-in image auto-refresh):**
|
||||
|
||||
Set `IMAGE_AUTO_REFRESH=true` on the platform process. A watcher polls
|
||||
GHCR every 5 minutes for digest changes on each `workspace-template-*:latest`
|
||||
tag and invokes the same refresh logic the admin endpoint exposes —
|
||||
no operator action required between "runtime PR merged" and
|
||||
"containers running new code". Disabled by default because SaaS deploy
|
||||
pipelines that already pull on every release would do redundant work.
|
||||
|
||||
Optional companion env (same as the admin endpoint):
|
||||
|
||||
- `GHCR_USER` + `GHCR_TOKEN` — required for private template images;
|
||||
unused for the current public set, but harmless if set.
|
||||
|
||||
## Local dev (build the package without publishing)
|
||||
|
||||
```bash
|
||||
python3 scripts/build_runtime_package.py --version 0.1.0-local --out /tmp/runtime-build
|
||||
cd /tmp/runtime-build
|
||||
python -m build # produces dist/*.whl + dist/*.tar.gz
|
||||
pip install dist/*.whl # install into a venv to test locally
|
||||
```
|
||||
|
||||
This is the same pipeline CI runs. Use it to validate import-rewrite
|
||||
correctness before pushing a `runtime-v*` tag.
|
||||
|
||||
## Writing a new adapter
|
||||
|
||||
Use the GitHub template repo
|
||||
[`molecule-ai/molecule-ai-workspace-template-starter`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-starter) (note: the starter repo did not survive the 2026-05-06 GitHub-org-suspension migration; recreation tracked at internal#41)
|
||||
— it ships with the canonical Dockerfile + adapter.py skeleton + config.yaml
|
||||
schema + the `repository_dispatch: [runtime-published]` cascade receiver
|
||||
already wired up. No follow-up setup PR required.
|
||||
|
||||
```bash
|
||||
# Replace <runtime> with your runtime slug (lowercase, hyphenated).
|
||||
gh repo create Molecule-AI/molecule-ai-workspace-template-<runtime> \
|
||||
--template Molecule-AI/molecule-ai-workspace-template-starter \
|
||||
--public \
|
||||
--description "Molecule AI workspace template: <runtime>"
|
||||
|
||||
git clone https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-<runtime>.git
|
||||
cd molecule-ai-workspace-template-<runtime>
|
||||
```
|
||||
|
||||
Then fill in the `TODO` markers in:
|
||||
|
||||
| File | What to fill in |
|
||||
|---|---|
|
||||
| `adapter.py` | Rename class to `<Runtime>Adapter`. Fill in `name()`, `display_name()`, `description()`, `get_config_schema()`. Implement `setup()` and `create_executor()`. |
|
||||
| `requirements.txt` | Add your runtime's pip dependencies (e.g. `langgraph`, `crewai`, `claude-agent-sdk`). |
|
||||
| `Dockerfile` | Add runtime-specific apt deps (most runtimes don't need any). Replace ENTRYPOINT only if you need custom boot logic. |
|
||||
| `config.yaml` | Update top-level `name`/`runtime`/`description`. Add the models your runtime supports to `models[]`. |
|
||||
| `system-prompt.md` | Default agent prompt. |
|
||||
|
||||
After `git push`:
|
||||
|
||||
1. The template's `publish-image.yml` builds + pushes
|
||||
`ghcr.io/molecule-ai/workspace-template-<runtime>:latest` automatically.
|
||||
2. The next `runtime-vX.Y.Z` tag on `molecule-core` cascades a
|
||||
`repository_dispatch` event into your new template, rebuilding the image
|
||||
against the latest runtime — no setup PR required.
|
||||
3. Register the runtime name in the platform's `RuntimeImages` map (in
|
||||
`workspace-server/internal/provisioner/provisioner.go`) so it's
|
||||
selectable in the canvas.
|
||||
|
||||
## When the starter itself needs to evolve
|
||||
|
||||
If the canonical shape changes (e.g. `config.yaml` schema gets a new field,
|
||||
the `BaseAdapter` interface adds a method, the reusable CI workflow
|
||||
signature changes), update the
|
||||
[starter](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-starter) (recreation pending — see note above)
|
||||
**first**. Existing templates can either migrate at their own pace or be
|
||||
touched in a coordinated cleanup PR. Either way, future templates pick up
|
||||
the new shape from day one.
|
||||
|
||||
## Migration note
|
||||
|
||||
Prior to this workflow, the runtime was duplicated across monorepo
|
||||
`workspace/` AND a sibling repo `molecule-ai-workspace-runtime`, with no
|
||||
sync mechanism. That caused 30+ files to drift between the two trees and
|
||||
tonight's chat-leak / queued-classification fixes existed only in the
|
||||
monorepo copy until manually ported.
|
||||
|
||||
If you have an old local checkout of `molecule-ai-workspace-runtime`, treat
|
||||
it as outdated. The monorepo `workspace/` is now authoritative; the PyPI
|
||||
artifact is rebuilt from it on every `runtime-v*` tag.
|
||||
- Test platform behavior against the installed runtime contract.
|
||||
- Keep MCP/registry/TenantGuard behavior compatible with the runtime package.
|
||||
- Fail CI if `workspace/` or legacy build-from-workspace scripts are restored.
|
||||
|
||||
@@ -1,542 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Build the molecule-ai-workspace-runtime PyPI package from monorepo workspace/.
|
||||
|
||||
Monorepo workspace/ is the single source-of-truth for runtime code. The PyPI
|
||||
package is a publish-time mirror produced by this script, NOT a parallel
|
||||
editable copy. Anyone editing the runtime should edit workspace/, never the
|
||||
sibling molecule-ai-workspace-runtime repo.
|
||||
|
||||
What this does
|
||||
--------------
|
||||
1. Copies workspace/ source into build/molecule_runtime/ (note the rename:
|
||||
bare modules become a real Python package).
|
||||
2. Rewrites top-level imports so e.g. `from a2a_client import X` becomes
|
||||
`from molecule_runtime.a2a_client import X`. The rewrite is regex-based
|
||||
on a closed allowlist of modules — third-party imports like `from a2a.X`
|
||||
(the a2a-sdk package) are left alone because the regex is anchored on
|
||||
exact module names.
|
||||
3. Writes a pyproject.toml with the requested version + the README + the
|
||||
py.typed marker.
|
||||
4. Leaves the build dir ready for `python -m build` to produce a wheel/sdist.
|
||||
|
||||
Usage
|
||||
-----
|
||||
scripts/build_runtime_package.py --version 0.1.6 --out /tmp/runtime-build
|
||||
cd /tmp/runtime-build && python -m build
|
||||
python -m twine upload dist/*
|
||||
|
||||
The publish workflow (.github/workflows/publish-runtime.yml) drives this
|
||||
on every `runtime-v*` tag push.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import re
|
||||
import shutil
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# Top-level Python modules in workspace/ that become molecule_runtime.X.
|
||||
# Anything imported as `from <name> import` or `import <name>` (where <name>
|
||||
# matches one of these) gets rewritten to use the package prefix.
|
||||
#
|
||||
# Closed list (not "every .py we copy") because a typo in workspace/ would
|
||||
# otherwise leak into a wrong rewrite. The set is asserted against
|
||||
# `workspace/*.py` at build time — if the disk contents drift from this
|
||||
# list (new module added, old one removed), the build fails loud instead
|
||||
# of silently shipping unrewritten imports. That gap caused 0.1.16 to
|
||||
# ship `from transcript_auth import ...` (unrewritten — module added
|
||||
# without updating this set), which broke every workspace startup with
|
||||
# `ModuleNotFoundError: No module named 'transcript_auth'`.
|
||||
TOP_LEVEL_MODULES = {
|
||||
"_sanitize_a2a",
|
||||
"a2a_cli",
|
||||
"a2a_client",
|
||||
"a2a_executor",
|
||||
"a2a_mcp_server",
|
||||
"a2a_response",
|
||||
"a2a_tools",
|
||||
"a2a_tools_delegation",
|
||||
"a2a_tools_identity",
|
||||
"a2a_tools_inbox",
|
||||
"a2a_tools_memory",
|
||||
"a2a_tools_messaging",
|
||||
"a2a_tools_rbac",
|
||||
"adapter_base",
|
||||
"agent",
|
||||
"agents_md",
|
||||
"boot_routes",
|
||||
"card_helpers",
|
||||
"config",
|
||||
"configs_dir",
|
||||
"consolidation",
|
||||
"coordinator",
|
||||
"event_log",
|
||||
"events",
|
||||
"executor_helpers",
|
||||
"heartbeat",
|
||||
"inbox",
|
||||
"inbox_uploads",
|
||||
"initial_prompt",
|
||||
"internal_chat_uploads",
|
||||
"internal_file_read",
|
||||
"main",
|
||||
"mcp_cli",
|
||||
"mcp_doctor",
|
||||
"mcp_heartbeat",
|
||||
"mcp_inbox_pollers",
|
||||
"mcp_workspace_resolver",
|
||||
"molecule_ai_status",
|
||||
"not_configured_handler",
|
||||
"platform_auth",
|
||||
"platform_inbound_auth",
|
||||
"plugins",
|
||||
"preflight",
|
||||
"prompt",
|
||||
"runtime_wedge",
|
||||
"secret_redactor",
|
||||
"shared_runtime",
|
||||
"smoke_mode",
|
||||
"transcript_auth",
|
||||
"watcher",
|
||||
}
|
||||
|
||||
# Subdirectory packages — these are already real packages (they have or will
|
||||
# have __init__.py) so the rewrite is `from <pkg>` → `from molecule_runtime.<pkg>`.
|
||||
SUBPACKAGES = {
|
||||
"adapters",
|
||||
"builtin_tools",
|
||||
"lib",
|
||||
"platform_tools",
|
||||
"plugins_registry",
|
||||
"policies",
|
||||
"skill_loader",
|
||||
}
|
||||
|
||||
# Files in workspace/ NOT included in the published package. These are
|
||||
# build artifacts, dev scripts, or monorepo-only scaffolding.
|
||||
EXCLUDE_FILES = {
|
||||
"Dockerfile",
|
||||
"build-all.sh",
|
||||
"rebuild-runtime-images.sh",
|
||||
"entrypoint.sh",
|
||||
"pytest.ini",
|
||||
"requirements.txt",
|
||||
# Note: adapter_base.py, agents_md.py, hermes_executor.py, shared_runtime.py
|
||||
# are kept (referenced by adapters/__init__.py and other modules); they get
|
||||
# their imports rewritten via TOP_LEVEL_MODULES. Excluding them broke the
|
||||
# smoke-test install with `ModuleNotFoundError: adapter_base`.
|
||||
}
|
||||
|
||||
EXCLUDE_DIRS = {
|
||||
"__pycache__",
|
||||
"tests",
|
||||
"molecule_audit", # only used by tests; not on production import path
|
||||
"scripts",
|
||||
}
|
||||
|
||||
|
||||
def build_import_rewriter() -> re.Pattern:
|
||||
"""Compile a single regex matching all import statements that need
|
||||
rewriting. The match groups capture the keyword + module name so the
|
||||
replacement preserves whitespace and trailing punctuation.
|
||||
|
||||
Modules included: TOP_LEVEL_MODULES ∪ SUBPACKAGES.
|
||||
|
||||
The negative-lookahead on `\\.` in the suffix prevents matching
|
||||
`from a2a.server.X import Y` against bare `a2a` (which isn't in our
|
||||
set, but the principle matters for any future short module name that
|
||||
happens to be a prefix of a real package name).
|
||||
"""
|
||||
names = sorted(TOP_LEVEL_MODULES | SUBPACKAGES)
|
||||
alt = "|".join(re.escape(n) for n in names)
|
||||
# Matches:
|
||||
# from <name>(\.|\s|import)
|
||||
# import <name>(\s|$|,)
|
||||
# And captures the keyword + name so we can re-emit with prefix.
|
||||
pattern = (
|
||||
r"(?m)^(?P<indent>\s*)" # leading whitespace (preserved)
|
||||
r"(?P<kw>from|import)\s+" # 'from' or 'import'
|
||||
r"(?P<mod>" + alt + r")" # the module name
|
||||
r"(?P<rest>[\s.,]|$)" # what follows: '.subpath', ' import …', ',', whitespace, EOL
|
||||
)
|
||||
return re.compile(pattern)
|
||||
|
||||
|
||||
def rewrite_imports(text: str, regex: re.Pattern) -> str:
|
||||
"""Replace bare imports with package-prefixed ones.
|
||||
|
||||
`import X` → `import molecule_runtime.X as X` (preserve binding)
|
||||
`from X import Y` → `from molecule_runtime.X import Y`
|
||||
`from X.sub import Y` → `from molecule_runtime.X.sub import Y`
|
||||
|
||||
Rejects `import X as Y` because the rewrite would produce
|
||||
`import molecule_runtime.X as X as Y`, a syntax error. The PR #2433
|
||||
incident shipped this exact pattern past `Python Lint & Test` (which
|
||||
runs against pre-rewrite source) but blew up the wheel-smoke gate.
|
||||
Detecting it here turns the silent build failure into a build-time
|
||||
error with a clear path: use `from X import …` or plain `import X`.
|
||||
"""
|
||||
def repl(m: re.Match) -> str:
|
||||
indent, kw, mod, rest = m.group("indent"), m.group("kw"), m.group("mod"), m.group("rest")
|
||||
if kw == "from":
|
||||
# `from X` or `from X.sub` — always safe to prefix.
|
||||
return f"{indent}from molecule_runtime.{mod}{rest}"
|
||||
# `import X` — preserve the binding name `X` (callers do `X.foo`)
|
||||
# by aliasing. `import X.sub` is uncommon for our modules and would
|
||||
# need a different binding form, but isn't used in workspace/ today.
|
||||
if rest.startswith("."):
|
||||
# `import X.sub` — rewrite as `import molecule_runtime.X.sub` and
|
||||
# leave the trailing dot pattern intact for the rest of the line.
|
||||
return f"{indent}import molecule_runtime.{mod}{rest}"
|
||||
# Detect `import X as Y` — the regex's `rest` group captures only
|
||||
# the immediate following char (whitespace, comma, or EOL), so we
|
||||
# have to peek at the surrounding line context. The match start is
|
||||
# at the line's `import` keyword; everything after the matched
|
||||
# name on the same line is what the source author wrote.
|
||||
line_start = text.rfind("\n", 0, m.start()) + 1
|
||||
line_end = text.find("\n", m.end())
|
||||
if line_end == -1:
|
||||
line_end = len(text)
|
||||
line_after = text[m.end() - len(rest):line_end]
|
||||
# Strip comments from consideration so `import X # noqa` doesn't trip.
|
||||
line_after_no_comment = line_after.split("#", 1)[0]
|
||||
if re.search(r"^\s*as\s+\w+", line_after_no_comment):
|
||||
raise ValueError(
|
||||
f"rewrite_imports: cannot rewrite 'import {mod} as <alias>' on a "
|
||||
f"workspace module — the regex would produce "
|
||||
f"'import molecule_runtime.{mod} as {mod} as <alias>', invalid syntax. "
|
||||
f"Use 'from {mod} import …' or plain 'import {mod}' instead. "
|
||||
f"Offending line: {text[line_start:line_end]!r}"
|
||||
)
|
||||
# Plain `import X` — alias preserves the local name.
|
||||
return f"{indent}import molecule_runtime.{mod} as {mod}{rest}"
|
||||
return regex.sub(repl, text)
|
||||
|
||||
|
||||
def copy_tree_filtered(src: Path, dst: Path) -> list[Path]:
|
||||
"""Copy src/ → dst/ skipping EXCLUDE_FILES + EXCLUDE_DIRS. Returns the
|
||||
list of .py files copied so the caller can run the import rewrite over
|
||||
them in one pass."""
|
||||
py_files: list[Path] = []
|
||||
if dst.exists():
|
||||
shutil.rmtree(dst)
|
||||
dst.mkdir(parents=True)
|
||||
for entry in src.iterdir():
|
||||
if entry.is_dir():
|
||||
if entry.name in EXCLUDE_DIRS:
|
||||
continue
|
||||
sub_py = copy_tree_filtered(entry, dst / entry.name)
|
||||
py_files.extend(sub_py)
|
||||
else:
|
||||
if entry.name in EXCLUDE_FILES:
|
||||
continue
|
||||
shutil.copy2(entry, dst / entry.name)
|
||||
if entry.suffix == ".py":
|
||||
py_files.append(dst / entry.name)
|
||||
return py_files
|
||||
|
||||
|
||||
PYPROJECT_TEMPLATE = """\
|
||||
[build-system]
|
||||
requires = ["setuptools>=68.0", "wheel"]
|
||||
build-backend = "setuptools.build_meta"
|
||||
|
||||
[project]
|
||||
name = "molecule-ai-workspace-runtime"
|
||||
version = "{version}"
|
||||
description = "Molecule AI workspace runtime — shared infrastructure for all agent adapters"
|
||||
requires-python = ">=3.11"
|
||||
license = {{text = "BSL-1.1"}}
|
||||
readme = "README.md"
|
||||
dependencies = [
|
||||
"a2a-sdk[http-server]>=1.0.0,<2.0",
|
||||
"httpx>=0.27.0",
|
||||
"uvicorn>=0.30.0",
|
||||
"starlette>=0.38.0",
|
||||
"websockets>=12.0",
|
||||
# multipart/form-data parser — required for Starlette's Request.form() on
|
||||
# /internal/chat/uploads/ingest. Without it, Starlette raises AssertionError
|
||||
# when parsing multipart bodies, which the chat-upload handler surfaces as
|
||||
# an opaque 400. Mirrors the canonical pin in workspace/requirements.txt;
|
||||
# >=0.0.27 avoids CVE-2024-53981 (DoS via malformed boundary).
|
||||
# Forensic a78762a0 (2026-05-19): Hermes PDF upload 400 root cause.
|
||||
"python-multipart>=0.0.27",
|
||||
"pyyaml>=6.0",
|
||||
"langchain-core>=0.3.0",
|
||||
"opentelemetry-api>=1.24.0",
|
||||
"opentelemetry-sdk>=1.24.0",
|
||||
"opentelemetry-exporter-otlp-proto-http>=1.24.0",
|
||||
"temporalio>=1.7.0",
|
||||
]
|
||||
|
||||
[project.scripts]
|
||||
molecule-runtime = "molecule_runtime.main:main_sync"
|
||||
molecule-mcp = "molecule_runtime.mcp_cli:main"
|
||||
|
||||
[tool.setuptools.packages.find]
|
||||
where = ["."]
|
||||
include = ["molecule_runtime*", "plugins_registry*"]
|
||||
|
||||
[tool.setuptools.package-data]
|
||||
"molecule_runtime" = ["py.typed"]
|
||||
"plugins_registry" = ["py.typed"]
|
||||
"""
|
||||
|
||||
|
||||
README_TEMPLATE = """\
|
||||
# molecule-ai-workspace-runtime
|
||||
|
||||
Shared workspace runtime for [Molecule AI](https://git.moleculesai.app/molecule-ai/molecule-core)
|
||||
agent adapters. Installed by every workspace template image
|
||||
(`workspace-template-claude-code`, `-langgraph`, `-hermes`, etc.) to provide
|
||||
A2A delegation, heartbeat, memory, plugin loading, and skill management.
|
||||
|
||||
This package is **published from the molecule-core monorepo `workspace/`
|
||||
directory** by the `publish-runtime` GitHub Actions workflow on every
|
||||
`runtime-v*` tag push. **Do not edit this package directly** — edit
|
||||
`workspace/` in the monorepo.
|
||||
|
||||
## External-runtime MCP server (`molecule-mcp`)
|
||||
|
||||
Operators running an agent outside the platform's container fleet
|
||||
(any runtime that supports MCP stdio — Claude Code, hermes, codex,
|
||||
etc.) can install this wheel and run the universal MCP server
|
||||
locally.
|
||||
|
||||
### Requirements
|
||||
|
||||
* **Python ≥3.11.** The wheel sets `requires-python = ">=3.11"`. On
|
||||
older interpreters `pip install` returns the cryptic
|
||||
`Could not find a version that satisfies the requirement` — that
|
||||
message is pip filtering this wheel out, NOT the package missing
|
||||
from PyPI. Upgrade with `brew install python@3.12` /
|
||||
`apt install python3.12` / `pyenv install 3.12` first.
|
||||
* **`pipx` recommended over `pip`.** `pipx install` puts
|
||||
`molecule-mcp` on PATH automatically and isolates the runtime's
|
||||
deps from your system Python. Plain `pip install --user` works
|
||||
but the binary lands in `~/.local/bin` (Linux) or
|
||||
`~/Library/Python/3.X/bin` (macOS) which is often not on PATH on
|
||||
a fresh shell — `claude mcp add molecule-<workspace-slug> -- molecule-mcp`
|
||||
then fails with "command not found" at first use.
|
||||
|
||||
* **Server name in `claude mcp add` is workspace-specific.** The
|
||||
Canvas "Add to Claude Code" snippet stamps a unique slug
|
||||
(`molecule-<workspace-name>`) so a single Claude Code session can
|
||||
talk to N molecule workspaces concurrently — `claude mcp add` keys
|
||||
entries by name in `~/.claude.json`, so re-running with a bare
|
||||
`molecule` name silently overwrites the prior workspace's entry.
|
||||
See [molecule-core#1535](https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1535)
|
||||
for the canonical generator.
|
||||
|
||||
### Install
|
||||
|
||||
```sh
|
||||
# Recommended:
|
||||
pipx install molecule-ai-workspace-runtime
|
||||
|
||||
# Alternative (manage PATH yourself):
|
||||
pip install --user molecule-ai-workspace-runtime
|
||||
```
|
||||
|
||||
### Run
|
||||
|
||||
```sh
|
||||
WORKSPACE_ID=<uuid> \\
|
||||
PLATFORM_URL=https://<tenant>.staging.moleculesai.app \\
|
||||
MOLECULE_WORKSPACE_TOKEN=<bearer> \\
|
||||
molecule-mcp
|
||||
```
|
||||
|
||||
That exposes the same 8 platform tools (`delegate_task`, `list_peers`,
|
||||
`send_message_to_user`, `commit_memory`, etc.) that container-bound
|
||||
runtimes already get via the workspace's auto-spawned MCP. Register
|
||||
the binary in your agent's MCP config — use a workspace-specific
|
||||
server name so multi-workspace setups don't collide (e.g. Claude Code:
|
||||
`claude mcp add molecule-<workspace-slug> -- molecule-mcp` with the env
|
||||
above; the Canvas modal stamps the right slug for you).
|
||||
|
||||
### Keeping the token out of shell history
|
||||
|
||||
Inline `MOLECULE_WORKSPACE_TOKEN=<bearer>` ends up in `~/.zsh_history`
|
||||
and (when registered via `claude mcp add`) plaintext in
|
||||
`~/.claude.json`. To avoid that, write the token to a 0600 file and
|
||||
point `MOLECULE_WORKSPACE_TOKEN_FILE` at it:
|
||||
|
||||
```sh
|
||||
umask 077
|
||||
printf '%s' "<bearer>" > ~/.config/molecule/token
|
||||
WORKSPACE_ID=<uuid> \\
|
||||
PLATFORM_URL=https://<tenant>.staging.moleculesai.app \\
|
||||
MOLECULE_WORKSPACE_TOKEN_FILE=$HOME/.config/molecule/token \\
|
||||
molecule-mcp
|
||||
```
|
||||
|
||||
Token resolution order: `MOLECULE_WORKSPACE_TOKEN` (inline env) →
|
||||
`MOLECULE_WORKSPACE_TOKEN_FILE` (path) → `${CONFIGS_DIR}/.auth_token`
|
||||
(in-container default).
|
||||
|
||||
The token comes from the canvas → Tokens tab. Restarting an external
|
||||
workspace from the canvas no longer revokes the token (PR #2412), so
|
||||
operator tokens persist across status nudges.
|
||||
|
||||
### Push vs poll delivery (Claude Code specifics)
|
||||
|
||||
By default the inbox runs in **poll mode** — every turn the agent
|
||||
calls `wait_for_message`, which blocks up to ~60s on
|
||||
`/activity?since_id=…`. Real-time push delivery is also supported,
|
||||
but on Claude Code it requires THREE conditions, ALL of which must
|
||||
hold:
|
||||
|
||||
1. **The MCP server declares `experimental.claude/channel`** — this
|
||||
wheel does (see `_build_initialize_result`). Nothing for you to
|
||||
do.
|
||||
2. **Claude Code installs the server as a marketplace plugin** — a
|
||||
plain `claude mcp add molecule-<workspace-slug> -- molecule-mcp`
|
||||
produces a non-plugin-sourced server, which Claude Code rejects with
|
||||
`channel_enable requires a marketplace plugin`. Until the
|
||||
official `moleculesai/claude-code-plugin` marketplace lands
|
||||
(tracking [#2936](https://git.moleculesai.app/molecule-ai/molecule-core/issues/2936)),
|
||||
operators who want push must scaffold their own local marketplace
|
||||
under
|
||||
`~/.claude/marketplaces/molecule-local/` containing a
|
||||
`marketplace.json` + `plugin.json` that points at this wheel.
|
||||
3. **Claude Code is launched with the dev-channels flag** — pass
|
||||
`--dangerously-load-development-channels plugin:molecule@<marketplace>`
|
||||
on the `claude` invocation. Without this flag the channel
|
||||
capability is silently ignored.
|
||||
|
||||
Symptom of any condition failing: messages arrive but only via the
|
||||
poll path (every ~1–60s), not real-time. There's currently no
|
||||
diagnostic surfaced — `molecule-mcp doctor` (tracking
|
||||
[#2937](https://git.moleculesai.app/molecule-ai/molecule-core/issues/2937)) is
|
||||
planned.
|
||||
|
||||
If you don't need real-time push, the default poll path works
|
||||
universally with no extra setup; both modes converge on the same
|
||||
`inbox_pop` ack so messages never duplicate.
|
||||
|
||||
See [`docs/workspace-runtime-package.md`](https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/docs/workspace-runtime-package.md)
|
||||
for the publish flow and architecture.
|
||||
"""
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(description=__doc__)
|
||||
parser.add_argument("--version", required=True, help="Package version, e.g. 0.1.6")
|
||||
parser.add_argument("--out", required=True, type=Path, help="Build output directory (will be wiped)")
|
||||
parser.add_argument("--source", type=Path, default=Path(__file__).resolve().parent.parent / "workspace",
|
||||
help="Path to monorepo workspace/ directory (default: ../workspace from this script)")
|
||||
args = parser.parse_args()
|
||||
|
||||
src = args.source.resolve()
|
||||
out = args.out.resolve()
|
||||
if not src.is_dir():
|
||||
print(f"error: source not a directory: {src}", file=sys.stderr)
|
||||
return 2
|
||||
|
||||
# Drift gate: assert TOP_LEVEL_MODULES matches workspace/*.py.
|
||||
# Without this, a new top-level module added to workspace/ ships
|
||||
# with unrewritten `from <name> import` statements that explode at
|
||||
# runtime with ModuleNotFoundError. (See 0.1.16 transcript_auth
|
||||
# incident — closed list silently went stale.)
|
||||
on_disk_modules = {
|
||||
f.stem for f in src.glob("*.py")
|
||||
if f.stem not in {"__init__", "conftest"}
|
||||
}
|
||||
missing = on_disk_modules - TOP_LEVEL_MODULES
|
||||
stale = TOP_LEVEL_MODULES - on_disk_modules
|
||||
if missing or stale:
|
||||
print("error: TOP_LEVEL_MODULES drifted from workspace/*.py contents:", file=sys.stderr)
|
||||
if missing:
|
||||
print(f" in workspace/ but NOT in TOP_LEVEL_MODULES (will ship un-rewritten): {sorted(missing)}", file=sys.stderr)
|
||||
if stale:
|
||||
print(f" in TOP_LEVEL_MODULES but NOT in workspace/ (no-op, but misleading): {sorted(stale)}", file=sys.stderr)
|
||||
print(" Edit scripts/build_runtime_package.py:TOP_LEVEL_MODULES to match.", file=sys.stderr)
|
||||
return 3
|
||||
|
||||
# Same drift gate for SUBPACKAGES — catches the inverse class of
|
||||
# bug where a workspace/ subdirectory is referenced by main.py
|
||||
# (`from lib.pre_stop import ...`) but is either missing from
|
||||
# SUBPACKAGES (so the rewriter doesn't qualify the import) or
|
||||
# accidentally listed in EXCLUDE_DIRS (so the directory itself
|
||||
# isn't shipped). 0.1.16-0.1.19 had `lib` in EXCLUDE_DIRS while
|
||||
# main.py imported from it — `ModuleNotFoundError: No module
|
||||
# named 'lib'` at every workspace startup.
|
||||
on_disk_subpkgs = {
|
||||
d.name for d in src.iterdir()
|
||||
if d.is_dir()
|
||||
and d.name not in EXCLUDE_DIRS
|
||||
and d.name not in {"__pycache__"}
|
||||
and (d / "__init__.py").exists()
|
||||
}
|
||||
sub_missing = on_disk_subpkgs - SUBPACKAGES
|
||||
sub_stale = SUBPACKAGES - on_disk_subpkgs
|
||||
if sub_missing or sub_stale:
|
||||
print("error: SUBPACKAGES drifted from workspace/ subdirectories:", file=sys.stderr)
|
||||
if sub_missing:
|
||||
print(f" in workspace/ but NOT in SUBPACKAGES (will ship un-rewritten or be excluded): {sorted(sub_missing)}", file=sys.stderr)
|
||||
if sub_stale:
|
||||
print(f" in SUBPACKAGES but NOT in workspace/ (no-op, but misleading): {sorted(sub_stale)}", file=sys.stderr)
|
||||
print(" Edit scripts/build_runtime_package.py:SUBPACKAGES + EXCLUDE_DIRS to match.", file=sys.stderr)
|
||||
return 3
|
||||
|
||||
pkg_dir = out / "molecule_runtime"
|
||||
print(f"[build] source: {src}")
|
||||
print(f"[build] output: {out}")
|
||||
print(f"[build] package: {pkg_dir}")
|
||||
|
||||
if out.exists():
|
||||
shutil.rmtree(out)
|
||||
out.mkdir(parents=True)
|
||||
|
||||
py_files = copy_tree_filtered(src, pkg_dir)
|
||||
print(f"[build] copied {len(py_files)} .py files")
|
||||
|
||||
# Install plugins_registry/ at the wheel TOP LEVEL so that plugin adapter
|
||||
# code (workspace-template-*) can use bare `from plugins_registry import ...`.
|
||||
# The molecule-runtime package (molecule_runtime/) also ships it at
|
||||
# molecule_runtime/plugins_registry/ (satisfies the rewritten
|
||||
# `from molecule_runtime.plugins_registry import ...` in adapter_base.py).
|
||||
# Both copies coexist: they serve different import namespaces.
|
||||
plugins_src = src / "plugins_registry"
|
||||
plugins_dst = out / "plugins_registry"
|
||||
if plugins_src.is_dir():
|
||||
shutil.copytree(plugins_src, plugins_dst)
|
||||
print(f"[build] installed plugins_registry/ at top level (bare-import shim)")
|
||||
|
||||
# Ensure top-level package marker exists. workspace/ doesn't have one
|
||||
# (it's not a package in monorepo), but the published artifact must.
|
||||
init = pkg_dir / "__init__.py"
|
||||
if not init.exists():
|
||||
init.write_text('"""Molecule AI workspace runtime."""\n')
|
||||
|
||||
# Touch py.typed so type-checkers in adapter consumers see the package
|
||||
# as typed. Empty file is the convention.
|
||||
(pkg_dir / "py.typed").touch()
|
||||
|
||||
# Rewrite imports in every .py file we copied + the new __init__.py.
|
||||
regex = build_import_rewriter()
|
||||
rewrites = 0
|
||||
for f in [*py_files, init]:
|
||||
original = f.read_text()
|
||||
rewritten = rewrite_imports(original, regex)
|
||||
if rewritten != original:
|
||||
f.write_text(rewritten)
|
||||
rewrites += 1
|
||||
print(f"[build] rewrote imports in {rewrites} files")
|
||||
|
||||
# Emit pyproject.toml + README at build root.
|
||||
(out / "pyproject.toml").write_text(PYPROJECT_TEMPLATE.format(version=args.version))
|
||||
(out / "README.md").write_text(README_TEMPLATE)
|
||||
|
||||
print(f"[build] done. To publish:")
|
||||
print(f" cd {out}")
|
||||
print(f" python -m build")
|
||||
print(f" python -m twine upload dist/*")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
@@ -1,95 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# check-cascade-list-vs-manifest.sh — structural drift gate for the
|
||||
# publish-runtime cascade list vs manifest.json workspace_templates.
|
||||
#
|
||||
# WHY: PR #2536 pruned the manifest to 4 supported runtimes; PR #2556
|
||||
# realigned the cascade list to match. The underlying drift hazard
|
||||
# (cascade-list ≠ manifest) was unguarded — the data fix didn't prevent
|
||||
# recurrence. This script is the structural gate that does.
|
||||
#
|
||||
# Behavior-based per project pattern: derives the expected set from
|
||||
# manifest.json and the actual set from the workflow YAML, fails on
|
||||
# any divergence in either direction.
|
||||
#
|
||||
# missing-from-cascade → templates in manifest that publish-runtime.yml
|
||||
# won't auto-rebuild on a new wheel publish
|
||||
# (the codex-stuck-on-stale-runtime bug class)
|
||||
# extra-in-cascade → cascade dispatches to deprecated templates
|
||||
# (the wasted-API-calls + dead-CI-noise class)
|
||||
#
|
||||
# Suffix mapping: manifest names map to GHCR repos via
|
||||
# {name without -default suffix} → molecule-ai-workspace-template-<suffix>
|
||||
# That's the same map publish-runtime.yml's TEMPLATES variable iterates.
|
||||
#
|
||||
# Exit:
|
||||
# 0 cascade matches manifest exactly
|
||||
# 1 drift detected (script prints the diff)
|
||||
# 2 bad usage / missing inputs
|
||||
|
||||
set -eu
|
||||
|
||||
MANIFEST="${1:-manifest.json}"
|
||||
WORKFLOW="${2:-.github/workflows/publish-runtime.yml}"
|
||||
|
||||
if [ ! -f "$MANIFEST" ]; then
|
||||
echo "::error::manifest not found: $MANIFEST" >&2
|
||||
exit 2
|
||||
fi
|
||||
if [ ! -f "$WORKFLOW" ]; then
|
||||
echo "::error::workflow not found: $WORKFLOW" >&2
|
||||
exit 2
|
||||
fi
|
||||
|
||||
# Expected cascade entries: manifest workspace_templates → suffix-only
|
||||
# (strip -default tail, e.g. claude-code-default → claude-code, since
|
||||
# publish-runtime.yml's TEMPLATES uses suffixes that match the
|
||||
# molecule-ai-workspace-template-<suffix> repo naming).
|
||||
EXPECTED=$(jq -r '.workspace_templates[].name' "$MANIFEST" \
|
||||
| sed 's/-default$//' \
|
||||
| sort -u)
|
||||
|
||||
# Actual cascade entries: extract from the TEMPLATES="…" line. We look
|
||||
# for the line, pull the contents between the quotes, and split into
|
||||
# one-per-line. Single source of truth in the workflow itself, no
|
||||
# parallel registry needed.
|
||||
#
|
||||
# Why not \s in the regex: BSD sed (macOS) doesn't recognize \s as
|
||||
# whitespace — treats it as literal `s`. POSIX [[:space:]] works on
|
||||
# both BSD and GNU sed. Same hazard nuked the original draft of this
|
||||
# script: \s* matched empty-prefix-of-literal-s, then the leading
|
||||
# whitespace stayed in the captured group.
|
||||
ACTUAL=$(grep -E '[[:space:]]*TEMPLATES="' "$WORKFLOW" \
|
||||
| head -1 \
|
||||
| sed -E 's/^[[:space:]]*TEMPLATES="([^"]*)".*$/\1/' \
|
||||
| tr ' ' '\n' \
|
||||
| grep -v '^$' \
|
||||
| sort -u)
|
||||
|
||||
if [ -z "$ACTUAL" ]; then
|
||||
echo "::error::could not extract TEMPLATES=\"…\" from $WORKFLOW — has the variable name or quoting changed?" >&2
|
||||
exit 2
|
||||
fi
|
||||
|
||||
MISSING=$(comm -23 <(printf '%s\n' "$EXPECTED") <(printf '%s\n' "$ACTUAL"))
|
||||
EXTRA=$(comm -13 <(printf '%s\n' "$EXPECTED") <(printf '%s\n' "$ACTUAL"))
|
||||
|
||||
if [ -z "$MISSING" ] && [ -z "$EXTRA" ]; then
|
||||
echo "✓ cascade list matches manifest workspace_templates ($(echo "$EXPECTED" | wc -l | tr -d ' ') entries)"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
echo "::error::cascade list drift detected between $MANIFEST and $WORKFLOW" >&2
|
||||
echo "" >&2
|
||||
if [ -n "$MISSING" ]; then
|
||||
echo " Templates in manifest but MISSING from cascade (won't auto-rebuild on wheel publish):" >&2
|
||||
echo "$MISSING" | sed 's/^/ - /' >&2
|
||||
echo "" >&2
|
||||
fi
|
||||
if [ -n "$EXTRA" ]; then
|
||||
echo " Templates in cascade but NOT in manifest (deprecated, wasting dispatch calls):" >&2
|
||||
echo "$EXTRA" | sed 's/^/ - /' >&2
|
||||
echo "" >&2
|
||||
fi
|
||||
echo " Fix: edit the TEMPLATES=\"…\" line in $WORKFLOW so the set matches" >&2
|
||||
echo " manifest.json's workspace_templates (suffix-stripped). See PR #2556 for context." >&2
|
||||
exit 1
|
||||
@@ -1,201 +0,0 @@
|
||||
"""Tests for scripts/build_runtime_package.py — the wheel-build import rewriter.
|
||||
|
||||
Run locally: ``python3 -m unittest scripts/test_build_runtime_package.py -v``
|
||||
|
||||
Why this exists: PR #2433 shipped ``import inbox as _inbox_module`` inside
|
||||
the workspace runtime, and the rewriter expanded it to
|
||||
``import molecule_runtime.inbox as inbox as _inbox_module`` — invalid
|
||||
Python. The wheel-smoke gate caught it post-merge but couldn't block
|
||||
the merge (not a required check yet — see PR #2439). PR #2436 added a
|
||||
build-time gate that raises ``ValueError`` on this pattern; this file
|
||||
locks the rewriter's documented contract under unit test so the gate
|
||||
itself can't silently regress.
|
||||
|
||||
Coverage:
|
||||
- ``import X`` → ``import molecule_runtime.X as X``
|
||||
- ``import X.sub`` → ``import molecule_runtime.X.sub``
|
||||
- ``import X`` + trailing comment is preserved
|
||||
- ``from X import Y`` → ``from molecule_runtime.X import Y``
|
||||
- ``from X.sub import Y`` → ``from molecule_runtime.X.sub import Y``
|
||||
- ``from X import Y, Z`` → ``from molecule_runtime.X import Y, Z``
|
||||
- ``import X as Y`` → raises ValueError (the rewriter would
|
||||
produce ``import molecule_runtime.X as X as Y``, syntax error)
|
||||
- non-allowlist module names → not rewritten (regex anchors on the closed set)
|
||||
- Indented imports (inside def/class) keep their indentation.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import sys
|
||||
import unittest
|
||||
|
||||
# scripts/build_runtime_package.py lives at scripts/ — add scripts/ to sys.path
|
||||
# so the import works whether unittest is invoked from repo root or scripts/.
|
||||
HERE = os.path.dirname(os.path.abspath(__file__))
|
||||
if HERE not in sys.path:
|
||||
sys.path.insert(0, HERE)
|
||||
|
||||
import build_runtime_package as M # noqa: E402
|
||||
|
||||
|
||||
def rewrite(text: str) -> str:
|
||||
"""Run the rewriter end-to-end so the test exercises the same path
|
||||
used by the wheel build (regex compile + substitution)."""
|
||||
regex = M.build_import_rewriter()
|
||||
return M.rewrite_imports(text, regex)
|
||||
|
||||
|
||||
class TestBareImportRewriting(unittest.TestCase):
|
||||
def test_plain_import_aliases_to_preserve_binding(self):
|
||||
self.assertEqual(
|
||||
rewrite("import inbox\n"),
|
||||
"import molecule_runtime.inbox as inbox\n",
|
||||
)
|
||||
|
||||
def test_plain_import_with_trailing_comment_is_preserved(self):
|
||||
# Real-world shape from a2a_mcp_server.py — the comment must
|
||||
# survive the rewrite without losing its leading-space buffer.
|
||||
self.assertEqual(
|
||||
rewrite("import inbox # noqa: E402\n"),
|
||||
"import molecule_runtime.inbox as inbox # noqa: E402\n",
|
||||
)
|
||||
|
||||
def test_import_dotted_keeps_dotted_form(self):
|
||||
# `import X.sub` is rare for our modules but the rewriter must
|
||||
# not double-alias — we want `import molecule_runtime.X.sub`,
|
||||
# not `import molecule_runtime.X.sub as X.sub` (invalid).
|
||||
self.assertEqual(
|
||||
rewrite("import platform_tools.registry\n"),
|
||||
"import molecule_runtime.platform_tools.registry\n",
|
||||
)
|
||||
|
||||
def test_indented_import_preserves_indentation(self):
|
||||
src = "def foo():\n import inbox\n return inbox.x\n"
|
||||
out = rewrite(src)
|
||||
self.assertIn(" import molecule_runtime.inbox as inbox\n", out)
|
||||
|
||||
|
||||
class TestFromImportRewriting(unittest.TestCase):
|
||||
def test_from_module_import_simple(self):
|
||||
self.assertEqual(
|
||||
rewrite("from inbox import InboxState\n"),
|
||||
"from molecule_runtime.inbox import InboxState\n",
|
||||
)
|
||||
|
||||
def test_from_dotted_import(self):
|
||||
self.assertEqual(
|
||||
rewrite("from platform_tools.registry import TOOLS\n"),
|
||||
"from molecule_runtime.platform_tools.registry import TOOLS\n",
|
||||
)
|
||||
|
||||
def test_from_import_multiple_symbols(self):
|
||||
# Multi-import statement — the rewriter only touches the module
|
||||
# prefix, not the names being imported.
|
||||
self.assertEqual(
|
||||
rewrite("from a2a_tools import (foo, bar, baz)\n"),
|
||||
"from molecule_runtime.a2a_tools import (foo, bar, baz)\n",
|
||||
)
|
||||
|
||||
def test_from_import_block_form(self):
|
||||
src = (
|
||||
"from a2a_tools import (\n"
|
||||
" tool_check_task_status,\n"
|
||||
" tool_commit_memory,\n"
|
||||
")\n"
|
||||
)
|
||||
out = rewrite(src)
|
||||
self.assertIn("from molecule_runtime.a2a_tools import (\n", out)
|
||||
# Trailing names + closer are unchanged.
|
||||
self.assertIn(" tool_check_task_status,\n", out)
|
||||
self.assertIn(")\n", out)
|
||||
|
||||
|
||||
class TestImportAsAliasRejection(unittest.TestCase):
|
||||
"""The key regression class — the failure mode that shipped in PR #2433."""
|
||||
|
||||
def test_import_as_alias_raises_value_error(self):
|
||||
with self.assertRaises(ValueError) as ctx:
|
||||
rewrite("import inbox as _inbox_module\n")
|
||||
msg = str(ctx.exception)
|
||||
# Error must name the offending module + suggest the fix.
|
||||
self.assertIn("inbox", msg)
|
||||
self.assertIn("as <alias>", msg)
|
||||
self.assertIn("from", msg) # suggests `from X import …`
|
||||
|
||||
def test_import_as_alias_indented_still_rejected(self):
|
||||
# Indented (inside def/class) — same hazard, same rejection.
|
||||
with self.assertRaises(ValueError):
|
||||
rewrite("def foo():\n import inbox as _x\n")
|
||||
|
||||
def test_import_as_alias_with_trailing_comment_still_rejected(self):
|
||||
with self.assertRaises(ValueError):
|
||||
rewrite("import inbox as _x # comment\n")
|
||||
|
||||
def test_plain_import_with_as_in_comment_does_not_trip(self):
|
||||
# The detection strips comments before pattern-matching, so a
|
||||
# comment containing "as foo" must NOT trigger the rejection.
|
||||
self.assertEqual(
|
||||
rewrite("import inbox # rewriter produces alias as inbox\n"),
|
||||
"import molecule_runtime.inbox as inbox # rewriter produces alias as inbox\n",
|
||||
)
|
||||
|
||||
def test_import_followed_by_comma_is_not_an_alias(self):
|
||||
# `import inbox, os` — comma is not `as`, must not be rejected.
|
||||
# Our regex captures `inbox` then `,` — only `inbox` gets prefixed.
|
||||
# `os` is not in TOP_LEVEL_MODULES so it's left alone.
|
||||
out = rewrite("import inbox, os\n")
|
||||
# The first module is rewritten; the second (non-allowlist) is not.
|
||||
self.assertIn("import molecule_runtime.inbox as inbox", out)
|
||||
|
||||
|
||||
class TestOutsideAllowlistModules(unittest.TestCase):
|
||||
def test_third_party_imports_unchanged(self):
|
||||
# `httpx`, `os`, `re` etc. are not in TOP_LEVEL_MODULES — the
|
||||
# regex must not match them. This is the closed-list invariant
|
||||
# that prevents accidental rewrites of stdlib / third-party.
|
||||
src = "import httpx\nimport os\nfrom re import match\n"
|
||||
self.assertEqual(rewrite(src), src)
|
||||
|
||||
def test_short_name_collision_avoided(self):
|
||||
# `from a2a.server.X import Y` must not match the bare `a2a`
|
||||
# prefix — `a2a` isn't in our allowlist (we allow `a2a_tools`,
|
||||
# `a2a_client`, etc., but not bare `a2a`). Belt-and-suspenders.
|
||||
src = "from a2a.server.routes import create_agent_card_routes\n"
|
||||
self.assertEqual(rewrite(src), src)
|
||||
|
||||
|
||||
class TestEndToEndShape(unittest.TestCase):
|
||||
"""Reproduces the PR #2433 → #2436 incident shape."""
|
||||
|
||||
def test_pr_2433_pattern_now_rejected(self):
|
||||
# The exact line PR #2433 added (inside main()), which produced
|
||||
# `import molecule_runtime.inbox as inbox as _inbox_module` —
|
||||
# invalid syntax in the published wheel.
|
||||
with self.assertRaises(ValueError) as ctx:
|
||||
rewrite(
|
||||
" import inbox as _inbox_module\n"
|
||||
" _inbox_module.set_notification_callback(_on_inbox_message)\n"
|
||||
)
|
||||
# Error message includes the offending line so the operator
|
||||
# knows exactly where to fix.
|
||||
self.assertIn("inbox", str(ctx.exception))
|
||||
|
||||
def test_pr_2436_fix_pattern_works(self):
|
||||
# The fix-forward shape (#2436): top-level `import inbox`,
|
||||
# bridge wired in main() via `inbox.set_notification_callback`.
|
||||
src = (
|
||||
"import inbox\n"
|
||||
"\n"
|
||||
"def main():\n"
|
||||
" inbox.set_notification_callback(cb)\n"
|
||||
)
|
||||
out = rewrite(src)
|
||||
self.assertIn("import molecule_runtime.inbox as inbox\n", out)
|
||||
# The callable reference inside main() is left alone — only
|
||||
# imports get rewritten, not arbitrary `inbox.foo` callsites
|
||||
# (those resolve via the module binding the rewrite preserves).
|
||||
self.assertIn(" inbox.set_notification_callback(cb)\n", out)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
+1
-1
@@ -9,7 +9,7 @@ This repo uses the standard monorepo testing convention: **unit tests live with
|
||||
| Go unit + integration (platform, CLI, handlers) | `workspace-server/**/*_test.go` — run with `cd workspace-server && go test -race ./...` |
|
||||
| TypeScript unit (canvas components, hooks, store) | `canvas/src/**/__tests__/` — run with `cd canvas && npm test -- --run` |
|
||||
| TypeScript unit (MCP server handlers) | `mcp-server/src/__tests__/` — run with `cd mcp-server && npx jest` |
|
||||
| Python unit (workspace runtime, adapters) | `workspace/tests/` — run with `cd workspace && python3 -m pytest` |
|
||||
| Python unit (workspace runtime, adapters) | `molecule-ai-workspace-runtime/tests/` in the standalone runtime repo |
|
||||
| Python unit (SDK: plugin + remote agent) | `sdk/python/tests/` — run with `cd sdk/python && python3 -m pytest` |
|
||||
| **Cross-component E2E** (spans platform + runtime + HTTP) | `tests/e2e/` ← **you are here** |
|
||||
|
||||
|
||||
@@ -283,7 +283,7 @@ claude --dangerously-load-development-channels \
|
||||
|
||||
// externalUniversalMcpTemplate — runtime-agnostic standalone path.
|
||||
// Ships as the `molecule-mcp` console script in the
|
||||
// molecule-ai-workspace-runtime PyPI wheel (workspace/mcp_cli.py).
|
||||
// molecule-ai-workspace-runtime wheel published to the Gitea package registry.
|
||||
// Any MCP-aware runtime (Claude Code, hermes, codex, third-party)
|
||||
// registers it once and gets the same 8 universal tools that
|
||||
// container-bound runtimes use today: delegate_task, list_peers,
|
||||
@@ -322,7 +322,7 @@ const externalUniversalMcpTemplate = `# Universal MCP — standalone register +
|
||||
|
||||
# 1. Install the workspace runtime wheel (once per machine — safe to
|
||||
# re-run; subsequent workspaces share the same wheel):
|
||||
pip install molecule-ai-workspace-runtime
|
||||
pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ molecule-ai-workspace-runtime
|
||||
|
||||
# 2. Wire molecule-mcp into your agent's MCP config. Claude Code:
|
||||
# NOTE the server name is workspace-specific ("{{MCP_SERVER_NAME}}") so
|
||||
@@ -344,7 +344,7 @@ claude mcp add {{MCP_SERVER_NAME}} -s user -- env \
|
||||
# needed when calling tools through the MCP server.
|
||||
|
||||
# Need help?
|
||||
# Where to install: https://pypi.org/project/molecule-ai-workspace-runtime/
|
||||
# Where to install: https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/molecule-ai-workspace-runtime/
|
||||
# Documentation: https://doc.moleculesai.app/docs/guides/mcp-server-setup
|
||||
# Common errors:
|
||||
# • "Tools not appearing in your agent" — run ` + "`claude mcp list`" + ` (or
|
||||
@@ -359,8 +359,8 @@ claude mcp add {{MCP_SERVER_NAME}} -s user -- env \
|
||||
`
|
||||
|
||||
// externalPythonTemplate uses molecule-sdk-python's RemoteAgentClient +
|
||||
// A2AServer (PR #13 in that repo). Until the SDK cuts a v0.y release
|
||||
// to PyPI the snippet pins git+main.
|
||||
// A2AServer. Until the SDK is published to the Gitea package registry the
|
||||
// snippet pins git+main.
|
||||
const externalPythonTemplate = `# pip install 'git+https://git.moleculesai.app/molecule-ai/molecule-sdk-python.git@main'
|
||||
|
||||
import asyncio
|
||||
@@ -396,7 +396,7 @@ if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
|
||||
# Need help?
|
||||
# Where to install: https://pypi.org/project/molecule-ai-workspace-runtime/
|
||||
# Where to install: https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/molecule-ai-workspace-runtime/
|
||||
# Documentation: https://doc.moleculesai.app/docs/guides/external-agent-registration
|
||||
# Common errors:
|
||||
# • 401 from /heartbeat — AUTH_TOKEN expired or wrong workspace_id.
|
||||
@@ -445,7 +445,7 @@ const externalHermesChannelTemplate = `# Hermes channel — bridges this workspa
|
||||
# also supported via the plugin's dual-mode fallback.
|
||||
#
|
||||
# 1. Install the runtime + plugin:
|
||||
pip install molecule-ai-workspace-runtime
|
||||
pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ molecule-ai-workspace-runtime
|
||||
pip install 'git+https://git.moleculesai.app/molecule-ai/hermes-channel-molecule.git'
|
||||
|
||||
# 2. Export the workspace credentials:
|
||||
@@ -528,7 +528,7 @@ const externalCodexTemplate = `# Codex external setup — outbound tools (MCP) +
|
||||
|
||||
# 1. Install codex CLI, the workspace runtime, and the bridge daemon:
|
||||
npm install -g @openai/codex@latest
|
||||
pip install molecule-ai-workspace-runtime
|
||||
pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ molecule-ai-workspace-runtime
|
||||
pip install codex-channel-molecule
|
||||
|
||||
# 2. Wire the molecule MCP server into codex's config.toml — this is
|
||||
@@ -620,7 +620,7 @@ const externalKimiTemplate = `# Kimi CLI external setup — register + heartbeat
|
||||
# No public URL needed; runs behind NAT in poll mode.
|
||||
|
||||
# 1. Install the workspace runtime wheel (provides HTTP client):
|
||||
pip install molecule-ai-workspace-runtime
|
||||
pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ molecule-ai-workspace-runtime
|
||||
|
||||
# 2. Save credentials and the bridge script:
|
||||
mkdir -p ~/.molecule-ai/kimi-{{MCP_SERVER_NAME}}
|
||||
@@ -779,7 +779,7 @@ const externalOpenClawTemplate = `# OpenClaw MCP config — outbound tool path.
|
||||
# (register-on-startup + 20s heartbeat). Older versions only ship
|
||||
# a2a_mcp_server which does not heartbeat.
|
||||
npm install -g openclaw@latest
|
||||
pip install "molecule-ai-workspace-runtime>=0.1.999"
|
||||
pip install --index-url https://git.moleculesai.app/api/packages/molecule-ai/pypi/simple/ "molecule-ai-workspace-runtime>=0.1.999"
|
||||
|
||||
# 2. Onboard openclaw against your model provider (one-time setup).
|
||||
# --non-interactive needs an explicit --provider + --model so it
|
||||
|
||||
@@ -1,13 +0,0 @@
|
||||
# coverage.py config — consumed by `pytest --cov` via the pytest-cov
|
||||
# plugin. Lives here (not in pytest.ini) because coverage.py only reads
|
||||
# .coveragerc / setup.cfg / tox.ini / pyproject.toml — the [coverage:*]
|
||||
# sections in pytest.ini are silently ignored. See issue #1817.
|
||||
[run]
|
||||
omit =
|
||||
*/tests/*
|
||||
*/__init__.py
|
||||
plugins_registry/*
|
||||
|
||||
[report]
|
||||
# Skip files at 100% in the term-missing output to keep CI logs readable.
|
||||
skip_covered = True
|
||||
@@ -1,104 +0,0 @@
|
||||
FROM python:3.11-slim@sha256:e78299e55776ca065dcb769f80161f48465ad352014240eb5fe4712e22505e9b
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
# Install Node.js, git, gh CLI in a single layer to minimize image size
|
||||
RUN apt-get update && \
|
||||
apt-get install -y --no-install-recommends curl git ca-certificates && \
|
||||
# Node.js 22
|
||||
curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
|
||||
apt-get install -y --no-install-recommends nodejs && \
|
||||
# GitHub CLI
|
||||
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \
|
||||
| dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg && \
|
||||
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
|
||||
> /etc/apt/sources.list.d/github-cli.list && \
|
||||
apt-get update && apt-get install -y --no-install-recommends gh && \
|
||||
# Cleanup apt caches and temp files
|
||||
apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false && \
|
||||
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
|
||||
|
||||
# Create non-root user (claude --dangerously-skip-permissions refuses root)
|
||||
RUN useradd -m -s /bin/bash agent
|
||||
|
||||
# Install base Python dependencies (A2A SDK + HTTP only)
|
||||
COPY requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
# Copy runtime code (adapters/ has been removed — adapters now live in standalone
|
||||
# template repos and install molecule-ai-workspace-runtime from PyPI)
|
||||
COPY *.py ./
|
||||
COPY entrypoint.sh ./
|
||||
COPY skill_loader/ ./skill_loader/
|
||||
COPY builtin_tools/ ./builtin_tools/
|
||||
COPY plugins_registry/ ./plugins_registry/
|
||||
COPY policies/ ./policies/
|
||||
|
||||
# Create CLI aliases
|
||||
RUN ln -s /app/a2a_cli.py /usr/local/bin/a2a && chmod +x /app/a2a_cli.py /app/a2a_mcp_server.py && \
|
||||
ln -s /app/molecule_ai_status.py /usr/local/bin/molecule-monorepo-status && chmod +x /app/molecule_ai_status.py
|
||||
|
||||
# gh wrapper — auto-prefixes PR / issue titles with the agent role + appends
|
||||
# a body footer. Every agent in the template shares one GitHub PAT so plain
|
||||
# `gh pr list` can't distinguish workspaces; the wrapper reads GIT_AUTHOR_NAME
|
||||
# (set by the platform provisioner, "Molecule AI <Role>") and rewrites the
|
||||
# title/body accordingly. Fails open when the env is missing. Anything that
|
||||
# isn't `gh pr create` or `gh issue create` passes through untouched.
|
||||
# /usr/local/bin is earlier in PATH than /usr/bin/gh so this shadows the
|
||||
# real binary without renaming it.
|
||||
COPY scripts/gh-wrapper.sh /usr/local/bin/gh
|
||||
RUN chmod +x /usr/local/bin/gh
|
||||
|
||||
# Copy the git credential helper so entrypoint.sh can register it at boot.
|
||||
# molecule-git-token-helper.sh fetches a fresh GitHub App installation token
|
||||
# from the platform on every git push/fetch, preventing stale-token failures
|
||||
# after the ~60 min GitHub App token TTL (issue #613 / #547).
|
||||
COPY scripts/molecule-git-token-helper.sh ./scripts/
|
||||
RUN chmod +x ./scripts/molecule-git-token-helper.sh
|
||||
|
||||
# Copy the background token refresh daemon. Runs as a background process
|
||||
# started by entrypoint.sh — refreshes gh CLI auth and the credential
|
||||
# helper cache every 45 min so tokens never expire mid-operation.
|
||||
COPY scripts/molecule-gh-token-refresh.sh ./scripts/
|
||||
RUN chmod +x ./scripts/molecule-gh-token-refresh.sh
|
||||
|
||||
# Generic GIT_ASKPASS helper. Reads HTTPS Basic-Auth credentials from env
|
||||
# vars (GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD, with GITEA_USER / GITEA_TOKEN
|
||||
# as fallback) and emits them on the git credential-prompt protocol so
|
||||
# container-side `git` can authenticate to any private HTTPS remote
|
||||
# without on-disk .gitconfig / .git-credentials mutation. The platform
|
||||
# provisioner sets GIT_ASKPASS=/usr/local/bin/molecule-askpass via
|
||||
# applyAgentGitIdentity (workspace-server/internal/handlers/agent_git_identity.go).
|
||||
# Filename is the only project-specific marker; the script body contains
|
||||
# no vendor literals and is identical to the script shipped in each
|
||||
# open-source workspace template (scripts/git-askpass.sh).
|
||||
COPY scripts/molecule-askpass /usr/local/bin/molecule-askpass
|
||||
RUN chmod +x /usr/local/bin/molecule-askpass
|
||||
|
||||
# Dirs and permissions
|
||||
RUN mkdir -p /workspace /plugins /home/agent/.claude /home/agent/.config /home/agent/.local \
|
||||
/home/agent/.molecule-token-cache && \
|
||||
chown -R agent:agent /app /home/agent /workspace
|
||||
|
||||
# Install gosu for clean root → agent user handoff in entrypoint.
|
||||
# The entrypoint starts as root to fix volume ownership, then exec's
|
||||
# as the agent user so Claude Code's --dangerously-skip-permissions works.
|
||||
RUN apt-get update && apt-get install -y --no-install-recommends gosu && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
|
||||
VOLUME /configs
|
||||
VOLUME /workspace
|
||||
|
||||
EXPOSE 8000
|
||||
|
||||
# HEALTHCHECK: probe the A2A agent-card endpoint so orchestrators and
|
||||
# container runtimes can detect a live, responsive workspace agent.
|
||||
# Uses curl (present in python:3.11-slim base) against the uvicorn server.
|
||||
# PORT is injected at runtime via the molecule-runtime entrypoint; the
|
||||
# default matches EXPOSE.
|
||||
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
|
||||
CMD curl -sf http://localhost:${PORT:-8000}/agent/card >/dev/null || exit 1
|
||||
|
||||
RUN chmod +x /app/entrypoint.sh
|
||||
# Start as root — entrypoint fixes volume permissions then drops to agent
|
||||
CMD ["./entrypoint.sh"]
|
||||
@@ -1 +0,0 @@
|
||||
# trigger autobump for python-multipart pin (PDF P0 cure)
|
||||
@@ -1,105 +0,0 @@
|
||||
"""OFFSEC-003: A2A peer-result sanitization — shared across delegation tools.
|
||||
|
||||
This module is intentionally a LEAF (no imports from the molecule-runtime
|
||||
package) to avoid circular dependency cycles. Both ``a2a_tools_delegation``
|
||||
and ``a2a_tools`` can import from here without creating import loops.
|
||||
|
||||
Trust-boundary design (OFFSEC-003):
|
||||
A2A peer responses are untrusted third-party content. Before passing
|
||||
them to the agent context, they MUST be wrapped in a trust-boundary
|
||||
marker pair so the calling agent knows the content is external.
|
||||
|
||||
Boundary markers:
|
||||
- _A2A_BOUNDARY_START = "[A2A_RESULT_FROM_PEER]"
|
||||
- _A2A_BOUNDARY_END = "[/A2A_RESULT_FROM_PEER]"
|
||||
|
||||
The boundary is the PRIMARY security control. A peer that sends
|
||||
"[A2A_RESULT_FROM_PEER]evil[/A2A_RESULT_FROM_PEER]safe" can make "safe"
|
||||
appear inside the trusted context unless the markers themselves are
|
||||
escaped before wrapping — see _escape_boundary_markers() below.
|
||||
|
||||
Defense-in-depth (secondary):
|
||||
Known prompt-injection control-words are also escaped so that even
|
||||
if a calling agent ignores the boundary marker, embedded attack
|
||||
patterns (SYSTEM:, OVERRIDE:, etc.) lose their special meaning.
|
||||
This is not a complete injection sanitizer — do not rely on it as
|
||||
the primary control.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
|
||||
# ── Trust-boundary markers ────────────────────────────────────────────────────
|
||||
|
||||
_A2A_BOUNDARY_START = "[A2A_RESULT_FROM_PEER]"
|
||||
_A2A_BOUNDARY_END = "[/A2A_RESULT_FROM_PEER]"
|
||||
|
||||
# ── Boundary-marker escaping ─────────────────────────────────────────────────
|
||||
# A peer that sends "[/A2A_RESULT_FROM_PEER]evil" can make "evil" appear
|
||||
# inside the trusted zone. Escape BOTH boundary markers in the raw text
|
||||
# before wrapping so they can never close the boundary early.
|
||||
# We use "[/ " as the escape prefix — visually distinct from the real marker.
|
||||
_A2A_BOUNDARY_START_ESCAPED = "[/ A2A_RESULT_FROM_PEER]"
|
||||
_A2A_BOUNDARY_END_ESCAPED = "[/ /A2A_RESULT_FROM_PEER]"
|
||||
|
||||
|
||||
def _escape_boundary_markers(text: str) -> str:
|
||||
"""Escape boundary markers inside the raw peer text before wrapping.
|
||||
|
||||
Replaces any occurrence of the boundary start/end markers with a
|
||||
visually-similar escaped form so a malicious peer can never close
|
||||
the boundary early or inject a fake opener.
|
||||
"""
|
||||
return (
|
||||
text.replace(_A2A_BOUNDARY_START, _A2A_BOUNDARY_START_ESCAPED)
|
||||
.replace(_A2A_BOUNDARY_END, _A2A_BOUNDARY_END_ESCAPED)
|
||||
)
|
||||
|
||||
|
||||
# ── Defense-in-depth: injection pattern escaping ───────────────────────────────
|
||||
# These patterns cover common prompt-injection phrasings. They are NOT a
|
||||
# complete sanitizer — see module docstring. The boundary marker is the
|
||||
# primary control; these are purely defense-in-depth.
|
||||
|
||||
_INJECTION_PATTERNS = [
|
||||
# Single-word patterns: anchor to word boundary so they don't match
|
||||
# inside other words (e.g. "SYSTEM" in "mySYSTEMatic").
|
||||
# Single-word patterns: anchor to word boundary so they don't match
|
||||
# inside other words (e.g. "SYSTEM" in "mySYSTEMatic").
|
||||
(re.compile(r"(^|[^\w])SYSTEM\b", re.IGNORECASE), r"\1[ESCAPED_SYSTEM]"),
|
||||
(re.compile(r"(^|[^\w])OVERRIDE\b", re.IGNORECASE), r"\1[ESCAPED_OVERRIDE]"),
|
||||
# "INSTRUCTIONS" may appear at the start of a string or after a newline.
|
||||
(re.compile(r"(^|\n)INSTRUCTIONS?\b", re.IGNORECASE), " [ESCAPED_INSTRUCTIONS]"),
|
||||
(re.compile(r"(^|[^\w])IGNORE\s+ALL\b", re.IGNORECASE), r"\1[ESCAPED_IGNORE_ALL]"),
|
||||
(re.compile(r"(^|[^\w])YOU\s+ARE\s+NOW\b", re.IGNORECASE), r"\1[ESCAPED_YOU_ARE_NOW]"),
|
||||
]
|
||||
|
||||
|
||||
def sanitize_a2a_result(text: str) -> str:
|
||||
"""Sanitize untrusted text from an A2A peer (OFFSEC-003).
|
||||
|
||||
Order of operations:
|
||||
1. Escape boundary markers in the raw text (prevents injection).
|
||||
2. Escape known injection patterns (defense-in-depth).
|
||||
|
||||
Returns the input unchanged if it is empty/None.
|
||||
|
||||
Note: this function does NOT add boundary wrappers — callers that need
|
||||
to establish a trust boundary should wrap the sanitized result with
|
||||
``[A2A_RESULT_FROM_PEER]\\n{sanitized}\\n[/A2A_RESULT_FROM_PEER]``.
|
||||
See ``a2a_tools_delegation.py:tool_delegate_task`` for the canonical
|
||||
wrapping pattern.
|
||||
"""
|
||||
if not text:
|
||||
return text
|
||||
|
||||
# 1. Escape boundary markers so a malicious peer cannot break the
|
||||
# trust boundary from inside their response.
|
||||
escaped = _escape_boundary_markers(text)
|
||||
|
||||
# 2. Escape known injection control-words (defense-in-depth only).
|
||||
for pattern, replacement in _INJECTION_PATTERNS:
|
||||
escaped = pattern.sub(replacement, escaped)
|
||||
|
||||
return escaped
|
||||
@@ -1,251 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""A2A CLI — command-line tools for inter-workspace communication.
|
||||
|
||||
Supports both synchronous and asynchronous delegation:
|
||||
a2a delegate <id> <task> — Send task, wait for response (sync)
|
||||
a2a delegate --async <id> <task> — Send task, return task ID immediately
|
||||
a2a status <task_id> — Check task status / get result
|
||||
a2a peers — List available peers
|
||||
a2a info — Show this workspace's info
|
||||
|
||||
Environment variables:
|
||||
WORKSPACE_ID — this workspace's ID
|
||||
PLATFORM_URL — platform API base URL
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import uuid
|
||||
|
||||
import httpx
|
||||
|
||||
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
|
||||
if not _WORKSPACE_ID_raw:
|
||||
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
|
||||
WORKSPACE_ID = _WORKSPACE_ID_raw
|
||||
# Platform URL: always host.docker.internal inside containers. The platform API
|
||||
# is only reachable via the Docker network mesh from inside a workspace
|
||||
# container regardless of the runtime environment (Docker/host).
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
|
||||
|
||||
async def discover(target_id: str) -> dict | None:
|
||||
"""Discover a peer workspace's URL."""
|
||||
async with httpx.AsyncClient(timeout=30.0) as client:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/registry/discover/{target_id}",
|
||||
headers={"X-Workspace-ID": WORKSPACE_ID},
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
return resp.json()
|
||||
return None
|
||||
|
||||
|
||||
async def delegate(target_id: str, task: str, async_mode: bool = False):
|
||||
"""Delegate a task to another workspace."""
|
||||
peer = await discover(target_id)
|
||||
if not peer:
|
||||
print(f"Error: cannot reach workspace {target_id} (access denied or offline)", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
target_url = peer.get("url", "")
|
||||
if not target_url:
|
||||
print(f"Error: workspace {target_id} has no URL", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
task_id = str(uuid.uuid4())
|
||||
|
||||
if async_mode:
|
||||
# Async: send and return immediately, don't wait for response
|
||||
# Use a background task that fires and forgets
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
# Send with a short timeout — just confirm receipt
|
||||
resp = await client.post(
|
||||
target_url,
|
||||
json={
|
||||
"jsonrpc": "2.0",
|
||||
"id": task_id,
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"messageId": str(uuid.uuid4()),
|
||||
"parts": [{"kind": "text", "text": task}],
|
||||
}
|
||||
},
|
||||
},
|
||||
)
|
||||
# Even if we timeout, the task is queued on the target
|
||||
print(json.dumps({
|
||||
"task_id": task_id,
|
||||
"target": target_id,
|
||||
"status": "submitted",
|
||||
"target_url": target_url,
|
||||
}))
|
||||
except httpx.TimeoutException:
|
||||
# Request was sent but we didn't get confirmation — task may or may not have been received
|
||||
print(json.dumps({
|
||||
"task_id": task_id,
|
||||
"target": target_id,
|
||||
"status": "uncertain",
|
||||
"note": "Request sent but response timed out — delivery unconfirmed. Use 'a2a status' to check.",
|
||||
}), file=sys.stderr)
|
||||
return
|
||||
|
||||
# Sync: wait for full response with retry on rate limit
|
||||
max_retries = 3
|
||||
for attempt in range(max_retries):
|
||||
async with httpx.AsyncClient(timeout=300.0) as client:
|
||||
try:
|
||||
resp = await client.post(
|
||||
target_url,
|
||||
json={
|
||||
"jsonrpc": "2.0",
|
||||
"id": task_id,
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"messageId": str(uuid.uuid4()),
|
||||
"parts": [{"kind": "text", "text": task}],
|
||||
}
|
||||
},
|
||||
},
|
||||
)
|
||||
try:
|
||||
data = resp.json()
|
||||
except Exception:
|
||||
print(f"Error: invalid JSON response (status {resp.status_code})", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
if "result" in data:
|
||||
parts = data["result"].get("parts", [])
|
||||
text = parts[0].get("text", "") if parts else ""
|
||||
if text and text != "(no response generated)":
|
||||
print(text)
|
||||
return
|
||||
# Empty or no-response — might be rate limited, retry
|
||||
if attempt < max_retries - 1:
|
||||
delay = 5 * (2 ** attempt)
|
||||
print(f"(empty response, retrying in {delay}s...)", file=sys.stderr)
|
||||
await asyncio.sleep(delay)
|
||||
continue
|
||||
print(text or "(no response after retries)")
|
||||
elif "error" in data:
|
||||
error_msg = data['error'].get('message', 'unknown')
|
||||
if ("rate" in error_msg.lower() or "overloaded" in error_msg.lower()) and attempt < max_retries - 1:
|
||||
delay = 5 * (2 ** attempt)
|
||||
print(f"(rate limited, retrying in {delay}s...)", file=sys.stderr)
|
||||
await asyncio.sleep(delay)
|
||||
continue
|
||||
print(f"Error: {error_msg}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
return
|
||||
except httpx.TimeoutException:
|
||||
if attempt < max_retries - 1:
|
||||
delay = 5 * (2 ** attempt)
|
||||
print(f"(timeout, retrying in {delay}s...)", file=sys.stderr)
|
||||
await asyncio.sleep(delay)
|
||||
continue
|
||||
print("Error: request timed out after retries", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
async def check_status(target_id: str, task_id: str):
|
||||
"""Check the status of an async task."""
|
||||
peer = await discover(target_id)
|
||||
if not peer:
|
||||
print(f"Error: cannot reach workspace {target_id}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
target_url = peer.get("url", "")
|
||||
async with httpx.AsyncClient(timeout=30.0) as client:
|
||||
resp = await client.post(
|
||||
target_url,
|
||||
json={
|
||||
"jsonrpc": "2.0",
|
||||
"id": str(uuid.uuid4()),
|
||||
"method": "tasks/get",
|
||||
"params": {"id": task_id},
|
||||
},
|
||||
)
|
||||
data = resp.json()
|
||||
if "result" in data:
|
||||
task = data["result"]
|
||||
status = task.get("status", {}).get("state", "unknown")
|
||||
print(f"Status: {status}")
|
||||
if status == "completed":
|
||||
artifacts = task.get("artifacts", [])
|
||||
for a in artifacts:
|
||||
for p in a.get("parts", []):
|
||||
if p.get("text"):
|
||||
print(p["text"])
|
||||
elif "error" in data:
|
||||
print(f"Error: {data['error'].get('message', 'unknown')}")
|
||||
|
||||
|
||||
async def peers():
|
||||
"""List available peers."""
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.get(f"{PLATFORM_URL}/registry/{WORKSPACE_ID}/peers")
|
||||
if resp.status_code != 200:
|
||||
print("Error: could not fetch peers", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
for p in resp.json():
|
||||
status = p.get("status", "?")
|
||||
role = p.get("role", "")
|
||||
print(f"{p['id']} {p['name']:30s} {status:10s} {role}")
|
||||
|
||||
|
||||
async def info():
|
||||
"""Get this workspace's info."""
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.get(f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}")
|
||||
if resp.status_code == 200:
|
||||
d = resp.json()
|
||||
print(f"ID: {d['id']}")
|
||||
print(f"Name: {d['name']}")
|
||||
print(f"Role: {d.get('role', '')}")
|
||||
print(f"Tier: {d['tier']}")
|
||||
print(f"Status: {d['status']}")
|
||||
print(f"Parent: {d.get('parent_id', '(root)')}")
|
||||
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: a2a <command> [args]")
|
||||
print("Commands:")
|
||||
print(" delegate <workspace_id> <task> — Send task, wait for response")
|
||||
print(" delegate --async <workspace_id> <task> — Send task, return immediately")
|
||||
print(" status <workspace_id> <task_id> — Check async task status")
|
||||
print(" peers — List available peers")
|
||||
print(" info — Show workspace info")
|
||||
sys.exit(1)
|
||||
|
||||
cmd = sys.argv[1]
|
||||
|
||||
if cmd == "delegate":
|
||||
async_mode = "--async" in sys.argv
|
||||
args = [a for a in sys.argv[2:] if a != "--async"]
|
||||
if len(args) < 2:
|
||||
print("Usage: a2a delegate [--async] <workspace_id> <task>", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
asyncio.run(delegate(args[0], " ".join(args[1:]), async_mode))
|
||||
elif cmd == "status":
|
||||
if len(sys.argv) < 4:
|
||||
print("Usage: a2a status <workspace_id> <task_id>", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
asyncio.run(check_status(sys.argv[2], sys.argv[3]))
|
||||
elif cmd == "peers":
|
||||
asyncio.run(peers())
|
||||
elif cmd == "info":
|
||||
asyncio.run(info())
|
||||
else:
|
||||
print(f"Unknown command: {cmd}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
if __name__ == "__main__": # pragma: no cover
|
||||
main()
|
||||
@@ -1,803 +0,0 @@
|
||||
"""A2A protocol client — peer discovery, messaging, and workspace info.
|
||||
|
||||
Shared constants (WORKSPACE_ID, PLATFORM_URL) live here so that
|
||||
a2a_tools and a2a_mcp_server can import them from a single place.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import random
|
||||
import re
|
||||
import threading
|
||||
import time
|
||||
import uuid
|
||||
from collections import OrderedDict
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
|
||||
import httpx
|
||||
|
||||
import a2a_response
|
||||
from platform_auth import auth_headers, self_source_headers
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
|
||||
if not _WORKSPACE_ID_raw:
|
||||
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
|
||||
WORKSPACE_ID = _WORKSPACE_ID_raw
|
||||
# Platform URL: always host.docker.internal inside containers. The platform API
|
||||
# is only reachable via the Docker network mesh from inside a workspace
|
||||
# container regardless of the runtime environment (Docker/host).
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
|
||||
# Cache workspace ID → name mappings (populated by list_peers calls)
|
||||
_peer_names: dict[str, str] = {}
|
||||
|
||||
# Cache: peer workspace_id → the source workspace_id whose registry
|
||||
# returned that peer. Populated by ``a2a_tools.tool_list_peers`` whenever
|
||||
# it queries a specific workspace's peers — so a later
|
||||
# ``tool_delegate_task(target)`` can auto-route through the correct
|
||||
# source workspace without the agent having to specify
|
||||
# ``source_workspace_id`` explicitly.
|
||||
#
|
||||
# Single-workspace mode: dict stays empty, all delegations fall through
|
||||
# to the module-level WORKSPACE_ID (existing behavior).
|
||||
#
|
||||
# Multi-workspace mode: as the agent calls list_peers, this map is
|
||||
# populated with each peer's source. Subsequent delegate_task calls
|
||||
# auto-route. If a peer is registered under multiple sources (rare —
|
||||
# e.g. an org-wide capability) the LAST observed source wins; the agent
|
||||
# can override by passing ``source_workspace_id`` explicitly.
|
||||
_peer_to_source: dict[str, str] = {}
|
||||
|
||||
# Cache workspace ID → full peer record (id, name, role, status, url, ...).
|
||||
# Populated by tool_list_peers and by the lazy registry lookup in
|
||||
# enrich_peer_metadata. The notification-callback path (channel envelope
|
||||
# enrichment) reads this cache on every inbound peer_agent push, so the
|
||||
# read shape stays a dict-like ``__getitem__`` lookup; entries carry
|
||||
# their fetched-at timestamp so TTL eviction is in-line with the
|
||||
# lookup. ``None`` as the record is the negative-cache sentinel:
|
||||
# registry failure is cached for one TTL window so we don't re-fire
|
||||
# the 2s-bounded GET on every push from a flaky peer.
|
||||
#
|
||||
# OrderedDict + maxsize bound (#2482): pre-fix this was an unbounded
|
||||
# ``dict``, so a workspace receiving from N distinct peers across its
|
||||
# lifetime accumulated ~100 bytes/entry × N indefinitely. At 10K peers
|
||||
# that's ~1 MB; at 100K (a chatty platform-wide router) ~10 MB; not
|
||||
# crash-class but unbounded. The LRU bound caps memory + the TTL caps
|
||||
# per-entry staleness — both gates are needed because a runaway poller
|
||||
# touching N new peer_ids per push could grow within a single TTL
|
||||
# window.
|
||||
#
|
||||
# All reads / writes go through ``_peer_metadata_get`` /
|
||||
# ``_peer_metadata_set`` so the LRU move-to-end + size-trim invariants
|
||||
# stay co-located. Direct mutation is allowed only in test fixtures
|
||||
# (clearing for isolation); production code path uses the helpers.
|
||||
_PEER_METADATA_MAXSIZE = 1024
|
||||
_peer_metadata: "OrderedDict[str, tuple[float, dict | None]]" = OrderedDict()
|
||||
_peer_metadata_lock = threading.Lock()
|
||||
|
||||
# How long an entry in ``_peer_metadata`` is treated as fresh. 5 minutes
|
||||
# is the same window we use for delegation routing — long enough that a
|
||||
# busy agent receiving repeated pushes from one peer doesn't hit the
|
||||
# registry on every push, short enough that role/name renames propagate
|
||||
# within a single agent session.
|
||||
_PEER_METADATA_TTL_SECONDS = 300.0
|
||||
|
||||
|
||||
def _peer_metadata_get(canon: str) -> tuple[float, dict | None] | None:
|
||||
"""Read with LRU touch — moves the entry to the most-recently-used
|
||||
position so steady-state pushes from a busy peer don't get evicted
|
||||
by a cold-start burst from new peers. Returns the raw tuple shape
|
||||
callers expect; TTL eviction stays at the call site.
|
||||
"""
|
||||
with _peer_metadata_lock:
|
||||
entry = _peer_metadata.get(canon)
|
||||
if entry is not None:
|
||||
_peer_metadata.move_to_end(canon)
|
||||
return entry
|
||||
|
||||
|
||||
def _peer_metadata_set(canon: str, value: tuple[float, dict | None]) -> None:
|
||||
"""Write + evict-if-over-maxsize. The eviction is in-process and
|
||||
cheap (popitem(last=False) on an OrderedDict is O(1)). Holding the
|
||||
lock across the trim keeps the size invariant stable under concurrent
|
||||
writes from background enrichment workers.
|
||||
"""
|
||||
with _peer_metadata_lock:
|
||||
_peer_metadata[canon] = value
|
||||
_peer_metadata.move_to_end(canon)
|
||||
# Trim the oldest entries until at-or-below maxsize. The bound
|
||||
# is a soft cap — a single overrun (set called when at maxsize)
|
||||
# evicts the LRU entry before returning, never letting size
|
||||
# exceed maxsize.
|
||||
while len(_peer_metadata) > _PEER_METADATA_MAXSIZE:
|
||||
_peer_metadata.popitem(last=False)
|
||||
|
||||
|
||||
# Background-fetch executor for enrich_peer_metadata_nonblocking (#2484).
|
||||
# A small pool — peers are highly TTL-cached, so the steady-state load
|
||||
# is "one fetch per peer per 5 minutes." Two workers handle the cold-
|
||||
# start burst when an agent starts receiving pushes from a new peer for
|
||||
# the first time without backing up the inbox poller. Daemon threads:
|
||||
# the executor must NOT block process exit if the inbox shuts down.
|
||||
_enrich_executor: ThreadPoolExecutor | None = None
|
||||
_enrich_executor_lock = threading.Lock()
|
||||
|
||||
# In-flight peer IDs — guards against a single peer's repeated pushes
|
||||
# scheduling N concurrent registry fetches before the first one fills
|
||||
# the cache. Set membership is "a worker is currently fetching this
|
||||
# peer; subsequent calls should NOT schedule another."
|
||||
_enrich_in_flight: set[str] = set()
|
||||
_enrich_in_flight_lock = threading.Lock()
|
||||
|
||||
|
||||
def _get_enrich_executor() -> ThreadPoolExecutor:
|
||||
"""Lazy-init the enrichment worker pool. Lazy because most test
|
||||
fixtures and short-lived CLI invocations don't need it; only the
|
||||
long-running molecule-mcp / inbox-poller path actually schedules
|
||||
background fetches.
|
||||
"""
|
||||
global _enrich_executor
|
||||
if _enrich_executor is not None:
|
||||
return _enrich_executor
|
||||
with _enrich_executor_lock:
|
||||
if _enrich_executor is None:
|
||||
_enrich_executor = ThreadPoolExecutor(
|
||||
max_workers=2,
|
||||
thread_name_prefix="enrich-peer",
|
||||
)
|
||||
return _enrich_executor
|
||||
|
||||
|
||||
def enrich_peer_metadata_nonblocking(
|
||||
peer_id: str,
|
||||
source_workspace_id: str | None = None,
|
||||
) -> dict | None:
|
||||
"""Cache-first variant of ``enrich_peer_metadata`` — returns
|
||||
immediately without blocking on a registry GET.
|
||||
|
||||
Behavior:
|
||||
- Cache hit (fresh): return the cached record.
|
||||
- Cache miss or TTL expired: schedule a background fetch via the
|
||||
worker pool, return ``None`` (caller renders bare peer_id).
|
||||
The next push for this peer hits the warm cache and gets the
|
||||
full record.
|
||||
|
||||
Why this exists (#2484): the inbox poller's notification callback
|
||||
in molecule-mcp called the synchronous ``enrich_peer_metadata`` on
|
||||
every push, blocking the poller for up to 2s × N uncached peers
|
||||
per batch. Push-delivery latency was gated on registry latency —
|
||||
the exact thing the negative-cache patch in PR #2471 was supposed
|
||||
to avoid amplifying. Moving the fetch off the poller thread means
|
||||
push delivery is bounded by the inbox poll interval, never by
|
||||
registry RTT.
|
||||
|
||||
Trade-off: the FIRST push from a new peer arrives metadata-light
|
||||
(no name/role). The MCP host renders the bare peer_id. Subsequent
|
||||
pushes (within the 5-min TTL) hit the warm cache and get the full
|
||||
record. Acceptable because:
|
||||
- Channel-envelope enrichment is a UX nicety, not a correctness
|
||||
invariant.
|
||||
- The cold-cache window per peer is bounded to one push.
|
||||
- The TTL is long enough that an active conversation never
|
||||
re-enters the cold state.
|
||||
"""
|
||||
canon = _validate_peer_id(peer_id)
|
||||
if canon is None:
|
||||
return None
|
||||
# Cache hit (fresh): return without blocking on a registry GET.
|
||||
# This is the hot path for active peer conversations — avoids
|
||||
# spawning a background thread for every push from a known peer.
|
||||
current = time.monotonic()
|
||||
cached = _peer_metadata_get(canon)
|
||||
if cached is not None:
|
||||
fetched_at, record = cached
|
||||
if current - fetched_at < _PEER_METADATA_TTL_SECONDS:
|
||||
return record
|
||||
# Cache miss or TTL expired: schedule background fetch unless one is
|
||||
# already in flight for this peer. The in-flight set keeps a flurry
|
||||
# of pushes from one peer (e.g., a chatty agent) from spawning N
|
||||
# parallel GETs.
|
||||
with _enrich_in_flight_lock:
|
||||
if canon in _enrich_in_flight:
|
||||
return None
|
||||
_enrich_in_flight.add(canon)
|
||||
try:
|
||||
_get_enrich_executor().submit(
|
||||
_enrich_peer_metadata_worker, canon, source_workspace_id
|
||||
)
|
||||
except RuntimeError:
|
||||
# Executor was shut down (process exit path) — drop the request,
|
||||
# let the caller render bare peer_id.
|
||||
with _enrich_in_flight_lock:
|
||||
_enrich_in_flight.discard(canon)
|
||||
return None
|
||||
|
||||
|
||||
def _enrich_peer_metadata_worker(
|
||||
canon: str, source_workspace_id: str | None
|
||||
) -> None:
|
||||
"""Background-thread body for ``enrich_peer_metadata_nonblocking``.
|
||||
Runs the same fetch logic as the synchronous helper but discards
|
||||
the return value — the cache write is the only output anyone
|
||||
needs. Always clears the in-flight marker so a future cache miss
|
||||
can retry.
|
||||
"""
|
||||
try:
|
||||
enrich_peer_metadata(canon, source_workspace_id)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
# Background workers must not crash the executor — log and
|
||||
# move on. The negative-cache path inside enrich_peer_metadata
|
||||
# already records failures, so a re-attempt is rate-limited
|
||||
# by TTL.
|
||||
logger.debug("_enrich_peer_metadata_worker: %s failed: %s", canon, exc)
|
||||
finally:
|
||||
with _enrich_in_flight_lock:
|
||||
_enrich_in_flight.discard(canon)
|
||||
|
||||
|
||||
def _wait_for_enrichment_inflight_for_testing(timeout: float = 2.0) -> None:
|
||||
"""Block until all in-flight enrichment workers have completed.
|
||||
|
||||
Test-only helper. Production code never has a reason to wait — the
|
||||
point of the nonblocking path is that callers don't care when the
|
||||
cache fills. Tests that want to assert "after the worker runs, the
|
||||
cache has the record" use this to synchronise without sleeping.
|
||||
|
||||
Polls ``_enrich_in_flight`` rather than holding a Condition because
|
||||
the worker pool is already serializing through ``_enrich_in_flight_lock``;
|
||||
poll keeps the production hot path lock-free.
|
||||
"""
|
||||
deadline = time.monotonic() + timeout
|
||||
while time.monotonic() < deadline:
|
||||
with _enrich_in_flight_lock:
|
||||
if not _enrich_in_flight:
|
||||
return
|
||||
time.sleep(0.01)
|
||||
|
||||
|
||||
def _peer_in_flight_clear_for_testing() -> None:
|
||||
"""Clear the in-flight enrichment set. Test-only helper."""
|
||||
with _enrich_in_flight_lock:
|
||||
_enrich_in_flight.clear()
|
||||
|
||||
|
||||
def enrich_peer_metadata(
|
||||
peer_id: str,
|
||||
source_workspace_id: str | None = None,
|
||||
*,
|
||||
now: float | None = None,
|
||||
) -> dict | None:
|
||||
"""Return cached or freshly-fetched metadata for ``peer_id``.
|
||||
|
||||
Sync helper — safe to call from the inbox poller's notification
|
||||
callback thread (which is not async). Hits the in-process cache
|
||||
first; on miss or TTL expiry, GETs ``/registry/discover/<peer_id>``
|
||||
synchronously with a tight timeout. Returns None on validation
|
||||
failure, network failure, or non-200 response so callers can
|
||||
degrade gracefully (the channel envelope falls back to the raw
|
||||
``peer_id`` instead of crashing the push path).
|
||||
|
||||
Negative caching: failure outcomes (4xx/5xx/non-JSON/network
|
||||
exception) are stored as ``(now, None)`` and treated as
|
||||
fresh-but-empty for the TTL window. Without this, a peer with a
|
||||
flaky/missing registry record would re-fire the 2s-bounded GET on
|
||||
EVERY push — turning the cache into a no-op for the exact failure
|
||||
scenarios it most needs to defend against.
|
||||
|
||||
The fetched dict is stored as-is, so callers can read whatever
|
||||
fields the platform exposes (currently: ``id``, ``name``, ``role``,
|
||||
``status``, ``url``). New fields surface automatically without a
|
||||
code change here.
|
||||
"""
|
||||
canon = _validate_peer_id(peer_id)
|
||||
if canon is None:
|
||||
return None
|
||||
|
||||
current = now if now is not None else time.monotonic()
|
||||
cached = _peer_metadata_get(canon)
|
||||
if cached is not None:
|
||||
fetched_at, record = cached
|
||||
if current - fetched_at < _PEER_METADATA_TTL_SECONDS:
|
||||
# Fresh entry — return whatever's there. ``None`` is the
|
||||
# negative-cache sentinel: caller treats absence of fields
|
||||
# the same as a registry miss, which is the desired UX.
|
||||
return record
|
||||
|
||||
src = (source_workspace_id or "").strip() or WORKSPACE_ID
|
||||
url = f"{PLATFORM_URL}/registry/discover/{canon}"
|
||||
try:
|
||||
with httpx.Client(timeout=2.0) as client:
|
||||
resp = client.get(url, headers={"X-Workspace-ID": src, **auth_headers(src)})
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.debug("enrich_peer_metadata: GET %s failed: %s", url, exc)
|
||||
_peer_metadata_set(canon, (current, None))
|
||||
return None
|
||||
|
||||
if resp.status_code != 200:
|
||||
logger.debug(
|
||||
"enrich_peer_metadata: %s returned HTTP %d", url, resp.status_code
|
||||
)
|
||||
_peer_metadata_set(canon, (current, None))
|
||||
return None
|
||||
|
||||
try:
|
||||
data = resp.json()
|
||||
except Exception: # noqa: BLE001
|
||||
_peer_metadata_set(canon, (current, None))
|
||||
return None
|
||||
if not isinstance(data, dict):
|
||||
_peer_metadata_set(canon, (current, None))
|
||||
return None
|
||||
|
||||
_peer_metadata_set(canon, (current, data))
|
||||
if name := data.get("name"):
|
||||
_peer_names[canon] = name
|
||||
return data
|
||||
|
||||
|
||||
def _agent_card_url_for(peer_id: str) -> str:
|
||||
"""Construct the platform-side agent-card URL for ``peer_id``.
|
||||
|
||||
Returns the empty string when ``peer_id`` is not a UUID — same
|
||||
trust-boundary rationale as ``discover_peer``: never interpolate
|
||||
path-traversal characters into a URL. An invalid id reflected back
|
||||
to the receiving agent as ``…/registry/discover/../../foo`` is a
|
||||
foothold we close at construction time.
|
||||
|
||||
Uses the registry's discovery path so the agent receiving a push
|
||||
can hit a single endpoint to enumerate the sender's capabilities
|
||||
+ role + URL. Same shape every workspace exposes regardless of
|
||||
runtime — claude-code, hermes, langchain wrappers all register
|
||||
through ``/registry/register`` and surface through ``/registry/discover``.
|
||||
"""
|
||||
safe_id = _validate_peer_id(peer_id)
|
||||
if safe_id is None:
|
||||
return ""
|
||||
return f"{PLATFORM_URL}/registry/discover/{safe_id}"
|
||||
|
||||
# Sentinel prefix for errors originating from send_a2a_message / child agents.
|
||||
# Used by delegate_task to distinguish real errors from normal response text.
|
||||
_A2A_ERROR_PREFIX = "[A2A_ERROR] "
|
||||
|
||||
# Sentinel prefix for queued-for-poll-mode-peer outcomes (#2967).
|
||||
# When the target workspace is registered as delivery_mode=poll (no
|
||||
# public URL — typical for external molecule-mcp standalone runtimes),
|
||||
# the platform's a2a_proxy.go:402 short-circuit returns a synthetic
|
||||
# {"status":"queued","delivery_mode":"poll","method":"..."} envelope
|
||||
# instead of dispatching over HTTP. The message IS delivered (written
|
||||
# to the platform's inbox queue); there's just no synchronous reply
|
||||
# to relay. Pre-#2967 the client treated this as "unexpected response
|
||||
# shape" → caller saw DELEGATION FAILED → retried → recipient saw
|
||||
# duplicates. The Queued prefix lets callers branch on this outcome
|
||||
# explicitly: "delivered async, no synchronous reply expected" is
|
||||
# different from both success-with-text and failure.
|
||||
_A2A_QUEUED_PREFIX = "[A2A_QUEUED] "
|
||||
|
||||
# Workspace IDs are UUIDs everywhere we generate them (platform's
|
||||
# workspaces.id column, /registry/discover/:id route param, etc.) but
|
||||
# the agent-facing tool surface receives them as free-form strings via
|
||||
# tool args. ``_validate_peer_id`` enforces UUID-shape at the
|
||||
# trust boundary so we never interpolate `..` or `/` into a URL path,
|
||||
# never silently coerce malformed input into a 404, and surface a
|
||||
# clear error to the agent rather than letting an HTTP 4xx bubble up
|
||||
# from the platform with a generic error message.
|
||||
#
|
||||
# Lenient on case + whitespace because real-world peer-id strings
|
||||
# come from list_peers/discover_peer responses (canonical lowercase)
|
||||
# or hand-typed agent input (mixed-case acceptable). Strict on
|
||||
# everything else.
|
||||
_UUID_RE = re.compile(
|
||||
r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
|
||||
)
|
||||
|
||||
|
||||
def _validate_peer_id(peer_id: str) -> str | None:
|
||||
"""Return the canonicalised peer_id if valid, else None.
|
||||
|
||||
Returning None instead of raising so callers in tool surfaces can
|
||||
convert to a friendly agent-facing string ("workspace_id is not a
|
||||
valid UUID") rather than crashing with a stack trace.
|
||||
"""
|
||||
if not isinstance(peer_id, str):
|
||||
return None
|
||||
pid = peer_id.strip()
|
||||
if not _UUID_RE.match(pid):
|
||||
return None
|
||||
return pid.lower()
|
||||
|
||||
|
||||
async def discover_peer(target_id: str, source_workspace_id: str | None = None) -> dict | None:
|
||||
"""Discover a peer workspace's URL via the platform registry.
|
||||
|
||||
Validates ``target_id`` is a UUID before constructing the URL — a
|
||||
malformed id can't reach the platform handler now, which both
|
||||
short-circuits an avoidable round-trip AND ensures we never
|
||||
interpolate path-traversal characters into the URL.
|
||||
|
||||
``source_workspace_id`` selects which registered workspace asks the
|
||||
question — both the X-Workspace-ID header AND the Authorization
|
||||
bearer token must come from the same workspace, otherwise the
|
||||
platform's TenantGuard rejects the request. Defaults to the
|
||||
module-level WORKSPACE_ID for back-compat with single-workspace
|
||||
callers.
|
||||
"""
|
||||
safe_id = _validate_peer_id(target_id)
|
||||
if safe_id is None:
|
||||
return None
|
||||
src = (source_workspace_id or "").strip() or WORKSPACE_ID
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/registry/discover/{safe_id}",
|
||||
headers={"X-Workspace-ID": src, **auth_headers(src)},
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
return resp.json()
|
||||
return None
|
||||
except Exception as e:
|
||||
logger.error(f"Discovery failed for {target_id}: {e}")
|
||||
return None
|
||||
|
||||
|
||||
# httpx exception classes that indicate a transient transport-layer
|
||||
# failure worth retrying — the request never produced an application
|
||||
# response, so a fresh attempt has a real chance of succeeding. Any
|
||||
# error not in this tuple is treated as deterministic (HTTP-status,
|
||||
# JSON parse, runtime-returned JSON-RPC error, etc.) and surfaced to
|
||||
# the caller on the first try.
|
||||
#
|
||||
# Why each one belongs here:
|
||||
# - ConnectError / ConnectTimeout: peer's listening socket wasn't
|
||||
# ready (mid-restart, not yet bound). Fast failure, fast recovery.
|
||||
# - RemoteProtocolError: peer closed the TCP connection without
|
||||
# writing a response — observed on 2026-04-27 when a peer's prior
|
||||
# in-flight Claude SDK session aborted and the new request's
|
||||
# connection was reset mid-handler.
|
||||
# - ReadError / WriteError: TCP read/write socket error mid-flight,
|
||||
# typically a network blip on the Docker bridge or a peer worker
|
||||
# crash.
|
||||
# - ReadTimeout: peer didn't write ANY response bytes within the
|
||||
# 300s read budget. Distinct from "peer is slow but progressing"
|
||||
# (which httpx surfaces as a successful read with chunked bytes).
|
||||
# Retry budget caps the worst case — see _DELEGATE_TOTAL_BUDGET_S.
|
||||
_TRANSIENT_HTTP_ERRORS: tuple[type[Exception], ...] = (
|
||||
httpx.ConnectError,
|
||||
httpx.ConnectTimeout,
|
||||
httpx.ReadError,
|
||||
httpx.WriteError,
|
||||
httpx.RemoteProtocolError,
|
||||
httpx.ReadTimeout,
|
||||
)
|
||||
|
||||
# Retry budget. Up to 5 attempts (1 initial + 4 retries) with
|
||||
# exponential backoff (1, 2, 4, 8 seconds), each backoff jittered ±25%
|
||||
# to prevent synchronized retry storms across siblings if a peer flaps.
|
||||
# _DELEGATE_TOTAL_BUDGET_S caps cumulative wall-clock so a string of
|
||||
# ReadTimeouts can't make the caller wait 25 minutes — once the
|
||||
# deadline elapses we stop retrying even if attempts remain. 600s = 10
|
||||
# minutes is the agreed worst case the caller can tolerate before
|
||||
# falling back to "peer unavailable" handling in tool_delegate_task.
|
||||
_DELEGATE_MAX_ATTEMPTS = 5
|
||||
_DELEGATE_BACKOFF_BASE_S = 1.0
|
||||
_DELEGATE_BACKOFF_CAP_S = 16.0
|
||||
_DELEGATE_TOTAL_BUDGET_S = 600.0
|
||||
|
||||
|
||||
def _delegate_backoff_seconds(attempt_zero_indexed: int) -> float:
|
||||
"""Return the (jittered) backoff delay before retrying after the
|
||||
given attempt index (0 = backoff before retry #1).
|
||||
|
||||
Pure function so the schedule is unit-testable without monkey-
|
||||
patching asyncio.sleep. Jitter is symmetric ±25% on top of the
|
||||
capped exponential — enough to break sync across simultaneous
|
||||
callers without making the schedule unpredictable.
|
||||
"""
|
||||
base = min(_DELEGATE_BACKOFF_BASE_S * (2 ** attempt_zero_indexed), _DELEGATE_BACKOFF_CAP_S)
|
||||
jitter = base * (0.5 * random.random() - 0.25)
|
||||
return max(0.0, base + jitter)
|
||||
|
||||
|
||||
def _format_a2a_error(exc: BaseException, target_url: str) -> str:
|
||||
"""Format an httpx exception as an [A2A_ERROR] string.
|
||||
|
||||
Some httpx exceptions stringify to empty (RemoteProtocolError,
|
||||
ConnectionReset variants) — the canvas would then render
|
||||
"[A2A_ERROR] " with no detail and the operator has no signal to
|
||||
act on. Always include the exception class name and the target
|
||||
URL so the activity log + Agent Comms panel have actionable
|
||||
information without a trip through container logs.
|
||||
"""
|
||||
msg = str(exc).strip()
|
||||
type_name = type(exc).__name__
|
||||
if not msg:
|
||||
detail = f"{type_name} (no message — likely connection reset or silent timeout)"
|
||||
elif msg.startswith(f"{type_name}:") or msg.startswith(f"{type_name} "):
|
||||
# Already prefixed with the type — don't double-prefix.
|
||||
# Prefix-anchored check (not substring) so a message that
|
||||
# happens to mention some OTHER class name mid-string
|
||||
# (e.g. "got OSError on read") doesn't suppress our own
|
||||
# type prefix and lose the diagnostic signal.
|
||||
detail = msg
|
||||
else:
|
||||
detail = f"{type_name}: {msg}"
|
||||
return f"{_A2A_ERROR_PREFIX}{detail} [target={target_url}]"
|
||||
|
||||
|
||||
async def send_a2a_message(peer_id: str, message: str, source_workspace_id: str | None = None) -> str:
|
||||
"""Send an A2A ``message/send`` to a peer workspace via the platform proxy.
|
||||
|
||||
The target URL is constructed internally as
|
||||
``${PLATFORM_URL}/workspaces/{peer_id}/a2a``. Going through the
|
||||
platform's A2A proxy is the only path that works for both
|
||||
in-container and external runtimes — see
|
||||
a2a_tools.tool_delegate_task for the rationale.
|
||||
|
||||
``source_workspace_id`` is the SENDING workspace — drives both the
|
||||
X-Workspace-ID source-tagging header and the bearer token. Defaults
|
||||
to the module-level WORKSPACE_ID for back-compat. Multi-workspace
|
||||
operators pass it explicitly so each registered workspace's peers
|
||||
are reached via their own auth chain.
|
||||
|
||||
Auto-retries up to _DELEGATE_MAX_ATTEMPTS times on transient
|
||||
transport-layer errors (RemoteProtocolError, ConnectError,
|
||||
ReadTimeout, etc.) with exponential-backoff + jitter, capped by
|
||||
_DELEGATE_TOTAL_BUDGET_S. Application-level failures (HTTP 4xx,
|
||||
JSON-RPC error response, malformed JSON) are NOT retried — they
|
||||
indicate a deterministic problem retry won't fix.
|
||||
"""
|
||||
safe_id = _validate_peer_id(peer_id)
|
||||
if safe_id is None:
|
||||
return f"{_A2A_ERROR_PREFIX}invalid peer_id (expected UUID): {peer_id!r}"
|
||||
src = (source_workspace_id or "").strip() or WORKSPACE_ID
|
||||
target_url = f"{PLATFORM_URL}/workspaces/{safe_id}/a2a"
|
||||
|
||||
# Fix F (Cycle 5 / H2 — flagged 5 consecutive audits): timeout=None allowed
|
||||
# a hung upstream to block the agent indefinitely. Use a generous but bounded
|
||||
# timeout: 30s connect + 300s read (long enough for slow LLM responses).
|
||||
timeout_cfg = httpx.Timeout(connect=30.0, read=300.0, write=30.0, pool=30.0)
|
||||
deadline = time.monotonic() + _DELEGATE_TOTAL_BUDGET_S
|
||||
last_exc: BaseException | None = None
|
||||
|
||||
for attempt in range(_DELEGATE_MAX_ATTEMPTS):
|
||||
async with httpx.AsyncClient(timeout=timeout_cfg) as client:
|
||||
try:
|
||||
# self_source_headers() includes X-Workspace-ID so the
|
||||
# platform's a2a_receive logger records source_id =
|
||||
# WORKSPACE_ID. Otherwise peer-A2A messages — including
|
||||
# the case where target_url resolves to this workspace's
|
||||
# own /a2a — get logged with source_id=NULL and surface
|
||||
# in the recipient's My Chat tab as user-typed input.
|
||||
resp = await client.post(
|
||||
target_url,
|
||||
headers=self_source_headers(src),
|
||||
json={
|
||||
"jsonrpc": "2.0",
|
||||
"id": str(uuid.uuid4()),
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"messageId": str(uuid.uuid4()),
|
||||
"parts": [{"kind": "text", "text": message}],
|
||||
}
|
||||
},
|
||||
},
|
||||
)
|
||||
data = resp.json()
|
||||
# Dispatch via the SSOT response model (a2a_response.py).
|
||||
# All shape detection lives in one place — the parser
|
||||
# never raises and routes unknown shapes to Malformed
|
||||
# so a future server-side change is loud, not silent.
|
||||
variant = a2a_response.parse(data)
|
||||
if isinstance(variant, a2a_response.Result):
|
||||
# Match legacy semantics:
|
||||
# parts non-empty + first part has no text → ""
|
||||
# parts empty → "(no response)"
|
||||
# Differentiation matters for callers that assert
|
||||
# on the empty-string case (test_a2a_client).
|
||||
if variant.parts:
|
||||
text = variant.text
|
||||
else:
|
||||
text = "(no response)"
|
||||
# Tag child-reported errors so the caller can
|
||||
# detect them reliably — agent-side bug surfaces
|
||||
# text like "Agent error: <traceback>" inside a
|
||||
# JSON-RPC success envelope.
|
||||
if text.startswith("Agent error:"):
|
||||
return f"{_A2A_ERROR_PREFIX}{text}"
|
||||
return text
|
||||
if isinstance(variant, a2a_response.Queued):
|
||||
# Poll-mode peer — message accepted into the inbox
|
||||
# queue, target agent will fetch via poll. NOT a
|
||||
# failure. Return the queued sentinel so callers
|
||||
# (delegate_task etc.) can render the outcome
|
||||
# accurately instead of treating it as an error.
|
||||
logger.info(
|
||||
"send_a2a_message: queued for poll-mode peer (target=%s method=%s)",
|
||||
target_url,
|
||||
variant.method,
|
||||
)
|
||||
return f"{_A2A_QUEUED_PREFIX}target={safe_id} method={variant.method}"
|
||||
if isinstance(variant, a2a_response.Error):
|
||||
msg = variant.message
|
||||
code = variant.code
|
||||
if msg and code is not None:
|
||||
detail = f"{msg} (code={code})"
|
||||
elif msg:
|
||||
detail = msg
|
||||
elif code is not None:
|
||||
detail = f"JSON-RPC error with no message (code={code})"
|
||||
else:
|
||||
detail = "JSON-RPC error with no message"
|
||||
if variant.restarting:
|
||||
# Surface platform-restart-in-progress
|
||||
# explicitly — caller (UI / delegating agent)
|
||||
# can render a softer "agent is restarting"
|
||||
# message rather than a generic failure.
|
||||
retry = (
|
||||
f", retry_after={variant.retry_after}s"
|
||||
if variant.retry_after is not None
|
||||
else ""
|
||||
)
|
||||
detail = f"{detail} (restarting{retry})"
|
||||
return f"{_A2A_ERROR_PREFIX}{detail} [target={target_url}]"
|
||||
# Malformed — log loud + surface as error so the
|
||||
# operator notices a server change. SSOT refactor
|
||||
# subsumes the inline "queued" check that landed in
|
||||
# the #2972 hotfix; that branch is now the typed
|
||||
# Queued variant above.
|
||||
logger.warning(
|
||||
"send_a2a_message: malformed response (target=%s body=%.200s)",
|
||||
target_url,
|
||||
str(variant.raw),
|
||||
)
|
||||
return (
|
||||
f"{_A2A_ERROR_PREFIX}unexpected response shape "
|
||||
f"(no result, error, or queued envelope): "
|
||||
f"{str(variant.raw)[:200]} [target={target_url}]"
|
||||
)
|
||||
except _TRANSIENT_HTTP_ERRORS as e:
|
||||
last_exc = e
|
||||
attempts_remaining = _DELEGATE_MAX_ATTEMPTS - (attempt + 1)
|
||||
if attempts_remaining <= 0 or time.monotonic() >= deadline:
|
||||
# Out of attempts OR out of total budget — surface
|
||||
# the last error to the caller.
|
||||
break
|
||||
delay = _delegate_backoff_seconds(attempt)
|
||||
# Don't sleep past the deadline — clamp.
|
||||
remaining = deadline - time.monotonic()
|
||||
if delay > remaining:
|
||||
delay = max(0.0, remaining)
|
||||
logger.warning(
|
||||
"send_a2a_message: transient %s on attempt %d/%d, retrying in %.1fs (target=%s)",
|
||||
type(e).__name__,
|
||||
attempt + 1,
|
||||
_DELEGATE_MAX_ATTEMPTS,
|
||||
delay,
|
||||
target_url,
|
||||
)
|
||||
await asyncio.sleep(delay)
|
||||
continue
|
||||
except Exception as e:
|
||||
# Non-transient (HTTP-status, JSON parse, etc.) — don't retry.
|
||||
return _format_a2a_error(e, target_url)
|
||||
# Retries exhausted (or budget elapsed). last_exc must be set
|
||||
# because we only break out of the loop after assigning it.
|
||||
assert last_exc is not None # noqa: S101
|
||||
return _format_a2a_error(last_exc, target_url)
|
||||
|
||||
|
||||
async def get_peers_with_diagnostic(source_workspace_id: str | None = None) -> tuple[list[dict], str | None]:
|
||||
"""Get this workspace's peers, returning (peers, diagnostic).
|
||||
|
||||
diagnostic is None when the call succeeded (status 200, even if the list
|
||||
is empty). When peers is [] for a non-trivial reason (auth failure,
|
||||
workspace-id missing from registry, platform error, network error),
|
||||
diagnostic is a short human-readable string explaining what went wrong
|
||||
so callers can surface it instead of "may be isolated" — see #2397.
|
||||
|
||||
``source_workspace_id`` selects which registered workspace's peers to
|
||||
enumerate; defaults to the module-level WORKSPACE_ID for
|
||||
single-workspace back-compat. Multi-workspace operators iterate over
|
||||
each registered workspace separately so each set of peers is fetched
|
||||
with the correct auth.
|
||||
|
||||
The legacy get_peers() shim below preserves the bare-list contract for
|
||||
non-tool callers.
|
||||
"""
|
||||
src = (source_workspace_id or "").strip() or WORKSPACE_ID
|
||||
url = f"{PLATFORM_URL}/registry/{src}/peers"
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
resp = await client.get(
|
||||
url,
|
||||
headers={"X-Workspace-ID": src, **auth_headers(src)},
|
||||
)
|
||||
except Exception as e:
|
||||
return [], f"Cannot reach platform at {PLATFORM_URL}: {e}"
|
||||
|
||||
if resp.status_code == 200:
|
||||
try:
|
||||
data = resp.json()
|
||||
except Exception as e:
|
||||
return [], f"Platform returned 200 but body was not JSON: {e}"
|
||||
if not isinstance(data, list):
|
||||
return [], f"Platform returned 200 but body was not a list: {type(data).__name__}"
|
||||
return data, None
|
||||
|
||||
if resp.status_code in (401, 403):
|
||||
return [], (
|
||||
f"Authentication to platform failed (HTTP {resp.status_code}). "
|
||||
"The workspace bearer token may be invalid — restarting the workspace usually re-mints it."
|
||||
)
|
||||
if resp.status_code == 404:
|
||||
return [], (
|
||||
f"Workspace ID {WORKSPACE_ID} is not registered with the platform (HTTP 404). "
|
||||
"Re-registration via the platform's /registry/register endpoint is needed."
|
||||
)
|
||||
if 500 <= resp.status_code < 600:
|
||||
return [], f"Platform error: HTTP {resp.status_code}."
|
||||
return [], f"Unexpected platform response: HTTP {resp.status_code}."
|
||||
|
||||
|
||||
async def get_peers() -> list[dict]:
|
||||
"""Get this workspace's peers from the platform registry.
|
||||
|
||||
Bare-list shim over get_peers_with_diagnostic() — discards the diagnostic
|
||||
so callers that don't care about the failure reason (e.g. system-prompt
|
||||
bootstrap formatters) get the same shape they always had.
|
||||
"""
|
||||
peers, _ = await get_peers_with_diagnostic()
|
||||
return peers
|
||||
|
||||
|
||||
async def get_workspace_info(source_workspace_id: str | None = None) -> dict:
|
||||
"""Get this workspace's info from the platform.
|
||||
|
||||
``source_workspace_id`` selects which registered workspace to
|
||||
introspect when the agent is registered into multiple workspaces
|
||||
(multi-workspace mode). Unset → defaults to the module-level
|
||||
WORKSPACE_ID — single-workspace operators see no behaviour change.
|
||||
|
||||
Distinguishes three failure shapes so callers can handle them
|
||||
distinctly (#2429):
|
||||
- 410 Gone → workspace was deleted; re-onboard required
|
||||
- 404 / other → workspace never existed (or transient)
|
||||
- exception → network / auth failure
|
||||
"""
|
||||
src = source_workspace_id or WORKSPACE_ID
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{src}",
|
||||
headers=auth_headers(src),
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
return resp.json()
|
||||
if resp.status_code == 410:
|
||||
# #2429: platform returns 410 when status='removed'.
|
||||
# Surface "removed" + the actionable hint so callers
|
||||
# can prompt re-onboard instead of falling through to
|
||||
# "not found" — which made the 2026-04-30 incident
|
||||
# impossible to diagnose ("workspace not found" with
|
||||
# a workspace_id we KNEW we'd just registered).
|
||||
try:
|
||||
body = resp.json()
|
||||
except Exception:
|
||||
body = {}
|
||||
return {
|
||||
"error": "removed",
|
||||
"id": body.get("id", src),
|
||||
"removed_at": body.get("removed_at"),
|
||||
"hint": body.get(
|
||||
"hint",
|
||||
"Workspace was deleted on the platform. "
|
||||
"Regenerate workspace + token from the canvas → Tokens tab.",
|
||||
),
|
||||
}
|
||||
return {"error": "not found"}
|
||||
except Exception as e:
|
||||
return {"error": str(e)}
|
||||
@@ -1,567 +0,0 @@
|
||||
"""Bridge between LangGraph agent and A2A protocol, with SSE streaming support.
|
||||
|
||||
SSE streaming architecture
|
||||
--------------------------
|
||||
The A2A SDK (``DefaultRequestHandler`` + ``EventQueue``) owns the SSE transport
|
||||
layer. This executor's job is to push the right event types into the queue as
|
||||
work progresses:
|
||||
|
||||
1. ``TaskStatusUpdateEvent(state=working)`` — immediately signals start
|
||||
2. ``TaskArtifactUpdateEvent(chunk, append=…)`` — one per LLM text token
|
||||
3. ``Message(final_text)`` — terminal event
|
||||
|
||||
Client compatibility
|
||||
--------------------
|
||||
*Non-streaming* (``message/send``):
|
||||
``ResultAggregator.consume_all()`` processes status/artifact events
|
||||
(updating the task in the store) and returns the final ``Message``
|
||||
immediately — backward-compatible with ``a2a_client.py`` which reads
|
||||
``data["result"]["parts"][0]["text"]``.
|
||||
|
||||
*Streaming* (``message/stream``):
|
||||
``consume_and_emit()`` yields every event above as SSE, letting the client
|
||||
render tokens in real time.
|
||||
|
||||
LangGraph integration
|
||||
---------------------
|
||||
Uses ``agent.astream_events(version="v2")`` to receive ``on_chat_model_stream``
|
||||
events with ``AIMessageChunk`` payloads. Text is extracted from both plain
|
||||
strings (OpenAI / Groq) and Anthropic-style content-block lists. Non-text
|
||||
content (tool_use, etc.) is silently skipped. A fresh ``artifact_id`` is
|
||||
generated for each new LLM ``run_id`` so tool-call cycles are grouped cleanly.
|
||||
"""
|
||||
|
||||
import functools
|
||||
import logging
|
||||
import os
|
||||
import uuid
|
||||
|
||||
from a2a.server.agent_execution import AgentExecutor, RequestContext
|
||||
from a2a.server.events import EventQueue
|
||||
from a2a.server.tasks import TaskUpdater
|
||||
from a2a.types import Part
|
||||
# KI-009: a2a-sdk v1 renames a2a.utils → a2a.helpers; TextPart removed (Part takes text= directly)
|
||||
from a2a.helpers import new_text_message
|
||||
from shared_runtime import (
|
||||
extract_history as _extract_history,
|
||||
extract_message_text,
|
||||
brief_task,
|
||||
set_current_task,
|
||||
)
|
||||
from executor_helpers import (
|
||||
collect_outbound_files,
|
||||
extract_attached_files,
|
||||
read_delegation_results,
|
||||
sanitize_agent_error,
|
||||
)
|
||||
from builtin_tools.telemetry import (
|
||||
A2A_TASK_ID,
|
||||
GEN_AI_OPERATION_NAME,
|
||||
GEN_AI_REQUEST_MODEL,
|
||||
GEN_AI_SYSTEM,
|
||||
WORKSPACE_ID_ATTR,
|
||||
_incoming_trace_context,
|
||||
gen_ai_system_from_model,
|
||||
get_tracer,
|
||||
record_llm_token_usage,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "unknown")
|
||||
|
||||
# LangGraph ReAct cycle budget per turn. Library default is 25; 500 covers
|
||||
# PM fan-outs (plan → 6 delegations → 6 awaits → 6 results → synthesize ≈
|
||||
# 30+ steps even before retries). Overridable via LANGGRAPH_RECURSION_LIMIT.
|
||||
DEFAULT_RECURSION_LIMIT = 500
|
||||
|
||||
|
||||
def _parse_recursion_limit() -> int:
|
||||
"""Read LANGGRAPH_RECURSION_LIMIT; fall back to DEFAULT_RECURSION_LIMIT
|
||||
with a WARNING log on any unparseable or non-positive value."""
|
||||
raw = os.environ.get("LANGGRAPH_RECURSION_LIMIT", "")
|
||||
if not raw:
|
||||
return DEFAULT_RECURSION_LIMIT
|
||||
try:
|
||||
n = int(raw)
|
||||
except ValueError:
|
||||
logger.warning(
|
||||
"LANGGRAPH_RECURSION_LIMIT=%r is not an integer; using default %d",
|
||||
raw, DEFAULT_RECURSION_LIMIT,
|
||||
)
|
||||
return DEFAULT_RECURSION_LIMIT
|
||||
if n <= 0:
|
||||
logger.warning(
|
||||
"LANGGRAPH_RECURSION_LIMIT=%d is not positive; using default %d",
|
||||
n, DEFAULT_RECURSION_LIMIT,
|
||||
)
|
||||
return DEFAULT_RECURSION_LIMIT
|
||||
return n
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Compliance (OWASP Top 10 for Agentic Apps) — optional, lazy-loaded
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
try:
|
||||
from builtin_tools.compliance import (
|
||||
AgencyTracker,
|
||||
ExcessiveAgencyError,
|
||||
PromptInjectionError,
|
||||
redact_pii as _redact_pii,
|
||||
sanitize_input as _sanitize_input,
|
||||
)
|
||||
_COMPLIANCE_AVAILABLE = True
|
||||
except ImportError: # pragma: no cover
|
||||
_COMPLIANCE_AVAILABLE = False
|
||||
|
||||
|
||||
@functools.lru_cache(maxsize=1)
|
||||
def _get_compliance_cfg():
|
||||
"""Return ComplianceConfig or None (cached for process lifetime)."""
|
||||
try:
|
||||
from config import load_config
|
||||
return load_config().compliance
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def _extract_chunk_text(content) -> list[str]:
|
||||
"""Extract text strings from an LLM streaming chunk's content field.
|
||||
|
||||
Handles both provider content styles:
|
||||
- OpenAI / Groq: ``content`` is a plain ``str`` (empty for tool-call chunks).
|
||||
- Anthropic: ``content`` is a list of typed blocks, e.g.
|
||||
``[{"type": "text", "text": "Hello"}, {"type": "tool_use", ...}]``
|
||||
|
||||
Only ``"text"`` blocks are returned; ``tool_use``, ``tool_result``, and
|
||||
other non-text blocks are filtered out so raw tool JSON never appears in
|
||||
the SSE stream.
|
||||
|
||||
Args:
|
||||
content: ``chunk.content`` value from an ``on_chat_model_stream`` event.
|
||||
|
||||
Returns:
|
||||
List of non-empty text strings.
|
||||
"""
|
||||
if isinstance(content, str):
|
||||
return [content] if content else []
|
||||
if isinstance(content, list):
|
||||
texts: list[str] = []
|
||||
for block in content:
|
||||
if isinstance(block, dict) and block.get("type") == "text":
|
||||
text = block.get("text", "")
|
||||
if text:
|
||||
texts.append(text)
|
||||
elif isinstance(block, str) and block:
|
||||
texts.append(block)
|
||||
return texts
|
||||
return []
|
||||
|
||||
|
||||
class LangGraphA2AExecutor(AgentExecutor):
|
||||
"""Bridges LangGraph agent to A2A event model with SSE streaming support.
|
||||
|
||||
Always uses ``agent.astream_events()`` so that:
|
||||
- Streaming clients (``message/stream``) receive token-level SSE events.
|
||||
- Non-streaming clients (``message/send``) receive the final ``Message``
|
||||
collected from the same stream — no duplicate LLM call, full compat.
|
||||
"""
|
||||
|
||||
def __init__(self, agent, heartbeat=None, model: str = "unknown"):
|
||||
self.agent = agent # Compiled LangGraph graph (create_react_agent output)
|
||||
self._heartbeat = heartbeat
|
||||
self._model = model # e.g. "anthropic:claude-sonnet-4-6"
|
||||
|
||||
async def execute(self, context: RequestContext, event_queue: EventQueue) -> None:
|
||||
"""Execute a task from an A2A request with SSE streaming.
|
||||
|
||||
Routes through the Temporal durable workflow when a global
|
||||
``TemporalWorkflowWrapper`` is initialised and connected to Temporal;
|
||||
otherwise falls back to ``_core_execute()`` (direct path).
|
||||
|
||||
Event emission sequence:
|
||||
1. TaskStatusUpdateEvent(working) — immediate start signal
|
||||
2. TaskArtifactUpdateEvent chunks — token-by-token via astream_events
|
||||
3. Message(final_text) — terminal; non-streaming clients
|
||||
return on this; streaming clients
|
||||
also receive it as the last SSE event.
|
||||
"""
|
||||
# ── Optional Temporal durable execution wrapper ──────────────────────
|
||||
# When a TemporalWorkflowWrapper is active this routes execution through
|
||||
# a MoleculeAIAgentWorkflow (task_receive → llm_call → task_complete).
|
||||
# Falls back silently to _core_execute() on any error or if Temporal
|
||||
# is unavailable, so the client always receives a response.
|
||||
try:
|
||||
from builtin_tools.temporal_workflow import get_wrapper as _get_temporal_wrapper
|
||||
|
||||
_tw = _get_temporal_wrapper()
|
||||
if _tw is not None and _tw.is_available():
|
||||
return await _tw.run(self, context, event_queue)
|
||||
except Exception:
|
||||
pass # Never let the wrapper path crash the executor
|
||||
|
||||
await self._core_execute(context, event_queue)
|
||||
|
||||
async def _core_execute(self, context: RequestContext, event_queue: EventQueue) -> str:
|
||||
"""Core execution pipeline — called directly or from a Temporal activity.
|
||||
|
||||
This is the original ``execute()`` body, extracted so that the Temporal
|
||||
``llm_call`` activity can invoke it without re-entering the wrapper
|
||||
check and causing infinite recursion.
|
||||
|
||||
Returns the final response text (empty string on empty input or error).
|
||||
|
||||
Event emission sequence:
|
||||
1. TaskStatusUpdateEvent(working) — immediate start signal
|
||||
2. TaskArtifactUpdateEvent chunks — token-by-token via astream_events
|
||||
3. Message(final_text) — terminal event
|
||||
"""
|
||||
user_input = extract_message_text(context)
|
||||
# Inject delegation results from prior turns. Heartbeat writes
|
||||
# completed delegation rows to DELEGATION_RESULTS_FILE and sends
|
||||
# a self-message to wake the agent; this consumes the file and
|
||||
# surfaces the results as context so the agent can act on them
|
||||
# without needing an explicit check_task_status call.
|
||||
# Results are prepended so they are visible even when the
|
||||
# self-message text is overwritten by a subsequent user message.
|
||||
pending_results = read_delegation_results()
|
||||
if pending_results:
|
||||
logger.info("A2A execute: injecting %d delegation result(s)", pending_results.count("\n") + 1)
|
||||
user_input = f"[Delegation results available]\n{pending_results}\n\n{user_input}"
|
||||
# Pull attached files from A2A message parts (kind: "file") and
|
||||
# append a manifest to the prompt so the agent knows they exist.
|
||||
# LangGraph tools (filesystem, bash, skills) can then open the
|
||||
# files by path — without this the agent silently ignores the
|
||||
# attachments and replies "I'm not sure what you're referring to".
|
||||
_attached_files = extract_attached_files(getattr(context, "message", None))
|
||||
if _attached_files:
|
||||
_manifest = "\n\nAttached files:\n" + "\n".join(
|
||||
f"- {f['name']} ({f['mime_type'] or 'unknown type'}) at {f['path']}"
|
||||
for f in _attached_files
|
||||
)
|
||||
user_input = (user_input + _manifest) if user_input else _manifest.lstrip()
|
||||
if not user_input:
|
||||
parts = getattr(getattr(context, "message", None), "parts", None)
|
||||
logger.warning("A2A execute: no text content in message parts: %s", parts)
|
||||
await event_queue.enqueue_event(
|
||||
new_text_message("Error: message contained no text content.")
|
||||
)
|
||||
return ""
|
||||
|
||||
# ── OA-01: Prompt injection check (OWASP Agentic Top 10) ────────────
|
||||
_compliance_cfg = _get_compliance_cfg() if _COMPLIANCE_AVAILABLE else None
|
||||
if _COMPLIANCE_AVAILABLE and _compliance_cfg and _compliance_cfg.mode == "owasp_agentic":
|
||||
try:
|
||||
user_input = _sanitize_input(
|
||||
user_input,
|
||||
prompt_injection_mode=_compliance_cfg.prompt_injection,
|
||||
context_id=context.context_id or "",
|
||||
)
|
||||
except PromptInjectionError as exc:
|
||||
await event_queue.enqueue_event(
|
||||
new_text_message(f"Request blocked: {exc}")
|
||||
)
|
||||
return ""
|
||||
|
||||
logger.info("A2A execute: user_input=%s", user_input[:200])
|
||||
|
||||
# ── OTEL: task_receive span ──────────────────────────────────────────
|
||||
parent_ctx = _incoming_trace_context.get()
|
||||
tracer = get_tracer()
|
||||
|
||||
_result: str = "" # captured inside the span for return after it closes
|
||||
|
||||
with tracer.start_as_current_span("task_receive", context=parent_ctx) as task_span:
|
||||
task_span.set_attribute(WORKSPACE_ID_ATTR, _WORKSPACE_ID)
|
||||
task_span.set_attribute(A2A_TASK_ID, context.context_id or "")
|
||||
task_span.set_attribute("a2a.input_preview", user_input[:256])
|
||||
|
||||
# Resolve IDs — the RequestContextBuilder always sets them, but
|
||||
# we generate fallbacks for safety (e.g. in unit tests).
|
||||
task_id = context.task_id or str(uuid.uuid4())
|
||||
context_id = context.context_id or str(uuid.uuid4())
|
||||
|
||||
# A2A v1 contract (a2a-sdk ≥ 1.0): enqueue a Task event before any
|
||||
# TaskStatusUpdateEvent. The framework only auto-creates the Task
|
||||
# on continuation messages (existing task_id resolves via
|
||||
# task_manager.get_task()). For fresh requests get_task() returns
|
||||
# None and the SDK rejects the first status update with
|
||||
# InvalidAgentResponseError("Agent should enqueue Task before
|
||||
# TaskStatusUpdateEvent event") — see a2a/server/agent_execution/
|
||||
# active_task.py for the validation site. PR #2170 migrated the
|
||||
# surface to v1 but missed this contract; the synth-E2E gate
|
||||
# surfaced it on every run after staging deploy.
|
||||
if getattr(context, "current_task", None) is None:
|
||||
from a2a.types import Task, TaskState, TaskStatus
|
||||
await event_queue.enqueue_event(
|
||||
Task(
|
||||
id=task_id,
|
||||
context_id=context_id,
|
||||
status=TaskStatus(state=TaskState.TASK_STATE_SUBMITTED),
|
||||
)
|
||||
)
|
||||
|
||||
updater = TaskUpdater(event_queue, task_id, context_id)
|
||||
|
||||
try:
|
||||
# set_current_task INSIDE the try so active_tasks is always
|
||||
# decremented by the finally block even if CancelledError hits
|
||||
# during the heartbeat HTTP push. Moving it outside the try
|
||||
# created a window where cancellation left active_tasks stuck
|
||||
# at 1, permanently blocking queue drain. (#2026)
|
||||
await set_current_task(self._heartbeat, brief_task(user_input))
|
||||
messages = _extract_history(context)
|
||||
if messages:
|
||||
logger.info("A2A execute: injecting %d history messages", len(messages))
|
||||
messages.append(("human", user_input))
|
||||
|
||||
# Recursion limit: see DEFAULT_RECURSION_LIMIT and
|
||||
# _parse_recursion_limit() at module top. Re-read on every
|
||||
# call so the env var can be hot-changed between requests.
|
||||
recursion_limit = _parse_recursion_limit()
|
||||
run_config = {
|
||||
"configurable": {"thread_id": context_id},
|
||||
"run_name": f"a2a-{context_id[:8]}",
|
||||
"recursion_limit": recursion_limit,
|
||||
}
|
||||
|
||||
# ── OTEL: llm_call span ──────────────────────────────────────
|
||||
with tracer.start_as_current_span("llm_call") as llm_span:
|
||||
llm_span.set_attribute(GEN_AI_OPERATION_NAME, "chat")
|
||||
llm_span.set_attribute(GEN_AI_SYSTEM, gen_ai_system_from_model(self._model))
|
||||
llm_span.set_attribute(GEN_AI_REQUEST_MODEL, self._model)
|
||||
llm_span.set_attribute(WORKSPACE_ID_ATTR, _WORKSPACE_ID)
|
||||
|
||||
# ── Step 1: signal "working" to streaming clients ─────────
|
||||
await updater.start_work()
|
||||
|
||||
# ── Step 2: stream tokens via LangGraph astream_events ────
|
||||
# Each "on_chat_model_stream" event carries an AIMessageChunk.
|
||||
# We emit one TaskArtifactUpdateEvent per text chunk so SSE
|
||||
# clients can render tokens in real time.
|
||||
# artifact_id resets on each new LLM run_id so agent→tool→agent
|
||||
# cycles each get their own artifact slot.
|
||||
|
||||
artifact_id = str(uuid.uuid4())
|
||||
has_streamed = False # True after first chunk for current artifact
|
||||
current_run_id = None # Detects new LLM call in a ReAct cycle
|
||||
accumulated: list[str] = [] # All text for the final Message
|
||||
last_ai_message = None # Saved for token-usage telemetry
|
||||
|
||||
# ── OA-03: Excessive agency tracker ──────────────────────
|
||||
_agency = (
|
||||
AgencyTracker(
|
||||
max_tool_calls=_compliance_cfg.max_tool_calls_per_task,
|
||||
max_duration_seconds=float(_compliance_cfg.max_task_duration_seconds),
|
||||
)
|
||||
if _COMPLIANCE_AVAILABLE and _compliance_cfg and _compliance_cfg.mode == "owasp_agentic"
|
||||
else None
|
||||
)
|
||||
|
||||
# ── Tool trace: collect every tool invocation for
|
||||
# platform-level observability ────────────────────
|
||||
# Keyed by run_id so parallel tool calls (LangGraph
|
||||
# supports them) pair start→end correctly. Capped at
|
||||
# MAX_TOOL_TRACE entries to prevent runaway loops from
|
||||
# ballooning the JSONB payload.
|
||||
MAX_TOOL_TRACE = 200
|
||||
tool_trace: list[dict] = []
|
||||
tool_trace_by_run: dict[str, dict] = {}
|
||||
|
||||
async for event in self.agent.astream_events(
|
||||
{"messages": messages},
|
||||
config=run_config,
|
||||
version="v2",
|
||||
):
|
||||
kind = event.get("event", "")
|
||||
|
||||
if kind == "on_chat_model_stream":
|
||||
run_id = event.get("run_id", "")
|
||||
if run_id and run_id != current_run_id:
|
||||
# New LLM run started — fresh artifact slot
|
||||
current_run_id = run_id
|
||||
artifact_id = str(uuid.uuid4())
|
||||
has_streamed = False
|
||||
|
||||
chunk = event.get("data", {}).get("chunk")
|
||||
if chunk is not None:
|
||||
texts = _extract_chunk_text(chunk.content)
|
||||
for text in texts:
|
||||
await updater.add_artifact(
|
||||
parts=[Part(text=text)], # v1: TextPart removed, Part takes text= directly
|
||||
artifact_id=artifact_id,
|
||||
append=has_streamed, # False=first, True=append
|
||||
last_chunk=False,
|
||||
)
|
||||
has_streamed = True
|
||||
accumulated.append(text)
|
||||
|
||||
elif kind == "on_tool_start":
|
||||
tool_name = event.get("name", "?")
|
||||
tool_input = event.get("data", {}).get("input", "")
|
||||
tool_run_id = event.get("run_id", "")
|
||||
logger.debug("SSE: tool start — %s", tool_name)
|
||||
if len(tool_trace) < MAX_TOOL_TRACE:
|
||||
entry = {
|
||||
"tool": tool_name,
|
||||
"input": str(tool_input)[:500] if tool_input else "",
|
||||
}
|
||||
tool_trace.append(entry)
|
||||
if tool_run_id:
|
||||
tool_trace_by_run[tool_run_id] = entry
|
||||
if _agency is not None:
|
||||
_agency.on_tool_call(
|
||||
tool_name=tool_name,
|
||||
context_id=context_id,
|
||||
)
|
||||
|
||||
elif kind == "on_tool_end":
|
||||
tool_end_name = event.get("name", "?")
|
||||
tool_output = event.get("data", {}).get("output", "")
|
||||
tool_run_id = event.get("run_id", "")
|
||||
logger.debug("SSE: tool end — %s", tool_end_name)
|
||||
# Pair via run_id so parallel tool calls don't clobber each other.
|
||||
entry = tool_trace_by_run.get(tool_run_id) if tool_run_id else None
|
||||
if entry is not None:
|
||||
entry["output_preview"] = str(tool_output)[:300] if tool_output else ""
|
||||
|
||||
elif kind == "on_chat_model_end":
|
||||
# Capture the last completed AIMessage for token telemetry
|
||||
output = event.get("data", {}).get("output")
|
||||
if output is not None:
|
||||
last_ai_message = output
|
||||
|
||||
# Record token usage from the last completed LLM call
|
||||
if last_ai_message is not None:
|
||||
record_llm_token_usage(llm_span, {"messages": [last_ai_message]})
|
||||
|
||||
# Build final text from all accumulated streaming tokens
|
||||
final_text = "".join(accumulated).strip() or "(no response generated)"
|
||||
logger.info("A2A execute: response length=%d chars", len(final_text))
|
||||
|
||||
# ── OA-02 / OA-06: Output PII redaction ──────────────────────
|
||||
if _COMPLIANCE_AVAILABLE and _compliance_cfg and _compliance_cfg.mode == "owasp_agentic":
|
||||
final_text, _pii_types = _redact_pii(final_text)
|
||||
if _pii_types:
|
||||
from builtin_tools.audit import log_event as _audit_log
|
||||
_audit_log(
|
||||
event_type="compliance",
|
||||
action="pii.redact",
|
||||
resource="task_output",
|
||||
outcome="redacted",
|
||||
pii_types=_pii_types,
|
||||
context_id=context_id,
|
||||
)
|
||||
|
||||
# ── OTEL: task_complete span ─────────────────────────────────
|
||||
with tracer.start_as_current_span("task_complete") as done_span:
|
||||
done_span.set_attribute(WORKSPACE_ID_ATTR, _WORKSPACE_ID)
|
||||
done_span.set_attribute(A2A_TASK_ID, context_id)
|
||||
done_span.set_attribute("task.has_response", bool(accumulated))
|
||||
done_span.set_attribute("task.response_length", len(final_text))
|
||||
|
||||
# ── Step 3: emit final Message ────────────────────────────────
|
||||
# Non-streaming: ResultAggregator.consume_all() returns this
|
||||
# immediately as the response (a2a_client.py reads .parts[0].text).
|
||||
# Streaming: yielded as the last SSE event in the stream.
|
||||
#
|
||||
# If the reply mentions /workspace/... paths, stage each one
|
||||
# and emit as FileParts alongside the text so the canvas can
|
||||
# render a download button. Same contract the hermes executor
|
||||
# uses — every runtime going through this code path (langgraph,
|
||||
# deepagents, future ReAct variants) inherits it.
|
||||
_outbound = collect_outbound_files(final_text)
|
||||
if _outbound:
|
||||
# NOTE: do NOT re-import `Part` here. It is already imported
|
||||
# at module scope (line 42). A function-scope `from a2a.types
|
||||
# import ... Part ...` would mark `Part` as a local name
|
||||
# throughout this function under Python's scoping rules,
|
||||
# making the earlier `Part(text=text)` call (line ~358, inside
|
||||
# the astream_events loop) raise UnboundLocalError because
|
||||
# the local binding is not yet in scope at that point.
|
||||
#
|
||||
# a2a-sdk 1.x flattened the Part shape: 0.x used
|
||||
# `Part(root=TextPart(text=...))` / `Part(root=FilePart(file=
|
||||
# FileWithUri(uri=..., name=..., mimeType=...)))` (Pydantic
|
||||
# discriminated-union style). 1.x's Part is a single proto
|
||||
# message with flat fields: text, url, filename, media_type,
|
||||
# raw, data, metadata. TextPart/FilePart/FileWithUri were
|
||||
# removed. Same for Message: messageId/taskId/contextId
|
||||
# camelCase became message_id/task_id/context_id.
|
||||
from a2a.types import Message, Role
|
||||
_parts: list[Part] = [Part(text=final_text)] if final_text else []
|
||||
for f in _outbound:
|
||||
_parts.append(Part(
|
||||
url="workspace:" + f["path"],
|
||||
filename=f["name"],
|
||||
media_type=f["mime_type"],
|
||||
))
|
||||
msg = Message(
|
||||
message_id=uuid.uuid4().hex,
|
||||
# 1.x Role is a protobuf enum: ROLE_UNSPECIFIED,
|
||||
# ROLE_USER, ROLE_AGENT. Old `Role.agent` (Pydantic
|
||||
# lowercase enum) doesn't exist anymore.
|
||||
role=Role.ROLE_AGENT,
|
||||
parts=_parts,
|
||||
task_id=task_id,
|
||||
context_id=context_id,
|
||||
)
|
||||
else:
|
||||
msg = new_text_message(final_text, task_id=task_id, context_id=context_id)
|
||||
# Attach tool_trace via metadata when supported. Guarded with
|
||||
# hasattr because some test mocks return a plain string here.
|
||||
if tool_trace and hasattr(msg, "metadata"):
|
||||
try:
|
||||
msg.metadata = {"tool_trace": tool_trace}
|
||||
except (AttributeError, TypeError):
|
||||
# `new_text_message()` returns a plain string in
|
||||
# MagicMock paths in tests, where assignment to
|
||||
# .metadata raises despite hasattr being true (the
|
||||
# mock has the attribute as a property). Suppression
|
||||
# is intentional — production Message objects always
|
||||
# accept the assignment. See #1787 + commit dcbcf19
|
||||
# for the original test-mock motivation.
|
||||
logger.debug("metadata attach skipped (non-Message return from new_text_message)")
|
||||
# A2A v1 (a2a-sdk ≥ 1.0): once Task is enqueued (above, PR #2558),
|
||||
# the executor is in task mode and raw Message enqueues are
|
||||
# rejected with InvalidAgentResponseError("Received Message
|
||||
# object in task mode. Use TaskStatusUpdateEvent or
|
||||
# TaskArtifactUpdateEvent instead."). updater.complete()
|
||||
# wraps the Message in a terminal TaskStatusUpdateEvent
|
||||
# (state=COMPLETED, final=True) which both streaming and
|
||||
# non-streaming clients accept.
|
||||
await updater.complete(message=msg)
|
||||
_result = final_text
|
||||
|
||||
except Exception as e:
|
||||
logger.error("A2A execute error: %s", e, exc_info=True)
|
||||
try:
|
||||
task_span.record_exception(e)
|
||||
from opentelemetry.trace import StatusCode
|
||||
task_span.set_status(StatusCode.ERROR, str(e))
|
||||
except Exception:
|
||||
pass
|
||||
# A2A v1: in task mode, terminal errors must publish a
|
||||
# FAILED TaskStatusUpdateEvent (carrying the error Message)
|
||||
# rather than a raw Message enqueue. updater.failed() does
|
||||
# exactly this — both streaming and non-streaming clients
|
||||
# receive the error and stop polling.
|
||||
await updater.failed(
|
||||
message=new_text_message(
|
||||
sanitize_agent_error(exc=e), task_id=task_id, context_id=context_id
|
||||
)
|
||||
)
|
||||
finally:
|
||||
await set_current_task(self._heartbeat, "")
|
||||
|
||||
return _result
|
||||
|
||||
async def cancel(self, context: RequestContext, event_queue: EventQueue) -> None:
|
||||
"""Cancel a running task — emits canceled state to comply with A2A protocol."""
|
||||
from a2a.types import TaskStatus, TaskState, TaskStatusUpdateEvent
|
||||
await event_queue.enqueue_event(
|
||||
TaskStatusUpdateEvent(
|
||||
status=TaskStatus(state=TaskState.TASK_STATE_CANCELED), # v1: TaskState uses SCREAMING_SNAKE_CASE
|
||||
final=True,
|
||||
)
|
||||
)
|
||||
File diff suppressed because it is too large
Load Diff
@@ -1,263 +0,0 @@
|
||||
"""Single source of truth for A2A ``/workspaces/<id>/a2a`` response shapes.
|
||||
|
||||
The workspace-server proxy at
|
||||
``workspace-server/internal/handlers/a2a_proxy.go`` (the canonical
|
||||
emitter) returns one of the following shapes for a single A2A call:
|
||||
|
||||
* **JSON-RPC success** —
|
||||
``{"jsonrpc": "2.0", "result": {...}, "id": "..."}``
|
||||
The agent's reply, passed through unchanged.
|
||||
|
||||
* **JSON-RPC error** —
|
||||
``{"jsonrpc": "2.0", "error": {"message": "...", "code": ...}, "id": "..."}``
|
||||
The agent reported a structured error.
|
||||
|
||||
* **Poll-queued** (synthesized at proxy, RFC #2339 PR 2 — see
|
||||
``a2a_proxy.go:402-406``) —
|
||||
``{"status": "queued", "delivery_mode": "poll", "method": "..."}``
|
||||
The target is a poll-mode workspace (no public URL); the message
|
||||
was written to the platform's inbox queue. The target agent will
|
||||
fetch it via ``GET /activity?since_id=`` polling. NOT a failure —
|
||||
delivery succeeded, there's just no synchronous reply to relay.
|
||||
|
||||
* **Platform error** — ``{"error": "...", "restarting": true?, "retry_after": int?}``
|
||||
HTTP-level failure synthesized by the proxy when the agent is
|
||||
unreachable, the container is restarting, or some other infrastructure
|
||||
failure happened. ``restarting=true`` flags the platform-initiated
|
||||
container-restart path.
|
||||
|
||||
* **Malformed** — anything else. Surfaced explicitly so a future server
|
||||
change is loud rather than silent.
|
||||
|
||||
The ``parse(data)`` function classifies a pre-decoded JSON body into a
|
||||
typed variant. Callers ``match`` on the variant and never re-implement
|
||||
shape detection — that's the SSOT discipline.
|
||||
|
||||
# SSOT contract
|
||||
|
||||
This file is the Python half. The Go server emits these shapes today
|
||||
via inline ``gin.H{...}`` literals. A future PR can introduce a Go
|
||||
mirror (e.g. ``workspace-server/internal/models/a2a_response.go``)
|
||||
with a typed marshaller — until then, **any change to the wire shape
|
||||
must be reflected here** and gated by ``test_a2a_response.py``'s
|
||||
fixture corpus. The corpus exists specifically so a one-sided edit
|
||||
breaks CI.
|
||||
|
||||
# Why a typed model (vs. dict-key sniffing at every site)
|
||||
|
||||
The pre-2967 client at ``a2a_client.py:567-587`` sniffed for ``result``
|
||||
or ``error`` keys inline and treated everything else as malformed —
|
||||
which silently broke poll-mode peers (the queued envelope has neither
|
||||
key). Inline sniffing per call site multiplies the surface area where
|
||||
a new shape gets misclassified. A single typed parser with an
|
||||
explicit ``Malformed`` escape hatch makes shape additions a
|
||||
one-line change here + a fixture entry in the test corpus, instead of
|
||||
a hunt through every parsing site in the runtime.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import dataclasses
|
||||
import logging
|
||||
from typing import Any, Optional, Union
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclasses.dataclass(frozen=True)
|
||||
class Result:
|
||||
"""JSON-RPC success — agent's reply available synchronously.
|
||||
|
||||
``text`` is the convenience extraction from ``parts[0].text`` (the
|
||||
A2A multipart shape). ``parts`` is the full list, available for
|
||||
callers that need richer rendering (multiple parts, non-text parts).
|
||||
``raw_result`` preserves the unparsed ``result`` field for any
|
||||
caller that needs it (e.g. activity-row response_body audit).
|
||||
"""
|
||||
|
||||
text: str
|
||||
parts: list[dict[str, Any]] = dataclasses.field(default_factory=list)
|
||||
raw_result: Optional[dict[str, Any]] = None
|
||||
|
||||
|
||||
@dataclasses.dataclass(frozen=True)
|
||||
class Error:
|
||||
"""JSON-RPC error or platform-level error response.
|
||||
|
||||
``code`` is the JSON-RPC integer code when present, else None.
|
||||
``restarting`` / ``retry_after`` are platform-restart-in-progress
|
||||
metadata: when both are set, the caller knows the container is
|
||||
being recycled and may surface a softer error to the user.
|
||||
"""
|
||||
|
||||
message: str
|
||||
code: Optional[int] = None
|
||||
restarting: bool = False
|
||||
retry_after: Optional[int] = None
|
||||
|
||||
|
||||
@dataclasses.dataclass(frozen=True)
|
||||
class Queued:
|
||||
"""Platform poll-mode short-circuit — message accepted, peer will pick up async.
|
||||
|
||||
Returned when the target workspace is registered as
|
||||
``delivery_mode=poll`` (no public URL — typical for external
|
||||
standalone ``molecule-mcp`` runtimes). The message was written to
|
||||
the platform's inbox queue; the target agent will fetch it via
|
||||
``GET /activity?since_id=`` polling.
|
||||
|
||||
NOT a failure. Callers that expect a synchronous reply (the agent's
|
||||
response text) won't get one here — they should either:
|
||||
|
||||
* Tolerate the absence of a reply (fire-and-forget semantics).
|
||||
* Fall back to the durable ``/workspaces/:id/delegate`` +
|
||||
``/delegations`` polling path (see ``a2a_tools_delegation``'s
|
||||
``_delegate_sync_via_polling``), which writes the same A2A
|
||||
request through the platform's executeDelegation goroutine
|
||||
and lets the caller poll for the result row.
|
||||
|
||||
``method`` echoes the request method (``message/send``, ``notify``,
|
||||
etc.) so callers can correlate.
|
||||
"""
|
||||
|
||||
method: str
|
||||
delivery_mode: str = "poll"
|
||||
|
||||
|
||||
@dataclasses.dataclass(frozen=True)
|
||||
class Malformed:
|
||||
"""Server returned a body the parser can't classify.
|
||||
|
||||
Carries the raw decoded payload for diagnostic logging. Callers
|
||||
typically render this as an error to the user (see
|
||||
``send_a2a_message``) — but the Malformed variant is a separate
|
||||
type so logging / metrics can distinguish it from genuine
|
||||
JSON-RPC ``Error`` responses.
|
||||
"""
|
||||
|
||||
raw: Any # whatever the server returned: dict / list / str / number / etc.
|
||||
|
||||
|
||||
Variant = Union[Result, Error, Queued, Malformed]
|
||||
|
||||
|
||||
# Field-name constants — the wire vocabulary. Single source of truth;
|
||||
# the parser references these by name so a change here is a
|
||||
# one-line edit instead of a hunt through string literals.
|
||||
_KEY_RESULT = "result"
|
||||
_KEY_ERROR = "error"
|
||||
_KEY_STATUS = "status"
|
||||
_KEY_DELIVERY_MODE = "delivery_mode"
|
||||
_KEY_METHOD = "method"
|
||||
_KEY_RESTARTING = "restarting"
|
||||
_KEY_RETRY_AFTER = "retry_after"
|
||||
|
||||
_STATUS_QUEUED = "queued"
|
||||
_DELIVERY_MODE_POLL = "poll"
|
||||
|
||||
|
||||
def parse(data: Any) -> Variant:
|
||||
"""Classify a pre-decoded ``/a2a`` JSON response into a typed variant.
|
||||
|
||||
Never raises. Every branch is total: any input that doesn't match a
|
||||
known shape routes to ``Malformed`` so the caller can decide how
|
||||
to surface it.
|
||||
|
||||
The order of checks matters:
|
||||
|
||||
1. Non-dict input → Malformed (server contract is dict-shaped).
|
||||
2. Poll-queued envelope is checked BEFORE result/error because a
|
||||
server bug that sets both ``status=queued`` and ``result``
|
||||
should be loud, not silently treated as Result.
|
||||
3. ``result`` → Result (the JSON-RPC success path).
|
||||
4. ``error`` → Error (JSON-RPC error or platform error).
|
||||
5. Anything else → Malformed.
|
||||
"""
|
||||
if not isinstance(data, dict):
|
||||
logger.warning(
|
||||
"a2a_response.parse: non-dict body — got %s",
|
||||
type(data).__name__,
|
||||
)
|
||||
return Malformed(raw=data)
|
||||
|
||||
# Push-mode queue envelope — returned when a push-mode workspace
|
||||
# (one with a public URL) is at capacity. The platform queues the
|
||||
# request and returns {"queued": true, "message": "...", "queue_id": "..."}.
|
||||
# Unlike the poll-mode envelope (status=queued + delivery_mode=poll),
|
||||
# this shape has no delivery_mode key — it's distinguishable by
|
||||
# data.get("queued") is True alone. Checked before poll-mode so the
|
||||
# two cases are mutually exclusive even if a buggy server sends both.
|
||||
if data.get("queued") is True:
|
||||
method_raw = data.get(_KEY_METHOD)
|
||||
method = str(method_raw) if method_raw is not None else "message/send"
|
||||
logger.info(
|
||||
"a2a_response.parse: queued for busy push-mode peer (method=%s, queue_id=%s)",
|
||||
method,
|
||||
data.get("queue_id", "?"),
|
||||
)
|
||||
return Queued(method=method, delivery_mode="push")
|
||||
|
||||
# Poll-queued envelope. Both keys must be present — the workspace
|
||||
# server sets them together; if only one is present the body is
|
||||
# ambiguous and we route to Malformed for visibility.
|
||||
if (
|
||||
data.get(_KEY_STATUS) == _STATUS_QUEUED
|
||||
and data.get(_KEY_DELIVERY_MODE) == _DELIVERY_MODE_POLL
|
||||
):
|
||||
method_raw = data.get(_KEY_METHOD)
|
||||
method = str(method_raw) if method_raw is not None else "unknown"
|
||||
logger.info(
|
||||
"a2a_response.parse: queued for poll-mode peer (method=%s)",
|
||||
method,
|
||||
)
|
||||
return Queued(method=method)
|
||||
|
||||
# JSON-RPC success.
|
||||
if _KEY_RESULT in data:
|
||||
result = data[_KEY_RESULT]
|
||||
if isinstance(result, dict):
|
||||
parts_raw = result.get("parts")
|
||||
parts = parts_raw if isinstance(parts_raw, list) else []
|
||||
text = ""
|
||||
if parts:
|
||||
first = parts[0]
|
||||
if isinstance(first, dict):
|
||||
text_raw = first.get("text")
|
||||
text = str(text_raw) if text_raw is not None else ""
|
||||
return Result(text=text, parts=parts, raw_result=result)
|
||||
# ``result`` present but not a dict — unusual but not an error;
|
||||
# surface as a Result with the value rendered to text.
|
||||
return Result(text=str(result), parts=[], raw_result=None)
|
||||
|
||||
# JSON-RPC error or platform error.
|
||||
if _KEY_ERROR in data:
|
||||
err_raw = data[_KEY_ERROR]
|
||||
message = ""
|
||||
code: Optional[int] = None
|
||||
if isinstance(err_raw, dict):
|
||||
msg_raw = err_raw.get("message")
|
||||
if msg_raw is not None:
|
||||
message = str(msg_raw).strip()
|
||||
code_raw = err_raw.get("code")
|
||||
if isinstance(code_raw, int):
|
||||
code = code_raw
|
||||
elif isinstance(err_raw, str):
|
||||
message = err_raw.strip()
|
||||
else:
|
||||
message = str(err_raw)
|
||||
|
||||
restarting = bool(data.get(_KEY_RESTARTING, False))
|
||||
retry_after_raw = data.get(_KEY_RETRY_AFTER)
|
||||
retry_after = retry_after_raw if isinstance(retry_after_raw, int) else None
|
||||
|
||||
return Error(
|
||||
message=message,
|
||||
code=code,
|
||||
restarting=restarting,
|
||||
retry_after=retry_after,
|
||||
)
|
||||
|
||||
logger.warning(
|
||||
"a2a_response.parse: unrecognized shape — keys=%s",
|
||||
sorted(data.keys()),
|
||||
)
|
||||
return Malformed(raw=data)
|
||||
@@ -1,181 +0,0 @@
|
||||
"""A2A MCP tool implementations — the body of each tool handler.
|
||||
|
||||
Imports shared client functions and constants from a2a_client.
|
||||
"""
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import mimetypes
|
||||
import os
|
||||
import uuid
|
||||
|
||||
import httpx
|
||||
|
||||
from a2a_client import (
|
||||
PLATFORM_URL,
|
||||
WORKSPACE_ID,
|
||||
_A2A_ERROR_PREFIX,
|
||||
_peer_names,
|
||||
_peer_to_source,
|
||||
discover_peer,
|
||||
get_peers,
|
||||
get_peers_with_diagnostic,
|
||||
get_workspace_info,
|
||||
send_a2a_message,
|
||||
)
|
||||
from builtin_tools.security import _redact_secrets
|
||||
from platform_auth import list_registered_workspaces
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# RBAC + auth helpers — extracted to a2a_tools_rbac (RFC #2873 iter 4a).
|
||||
# Re-exported here under the legacy underscore names so existing tests'
|
||||
# patch("a2a_tools._check_memory_write_permission", …) and call sites
|
||||
# inside this module that resolve bare names against the module-level
|
||||
# namespace continue to work unchanged.
|
||||
# ---------------------------------------------------------------------------
|
||||
from a2a_tools_rbac import ( # noqa: E402 (import after the from-a2a_client block)
|
||||
_auth_headers_for_heartbeat,
|
||||
_check_memory_read_permission,
|
||||
_check_memory_write_permission,
|
||||
_get_workspace_tier,
|
||||
_is_root_workspace,
|
||||
_ROLE_PERMISSIONS,
|
||||
)
|
||||
|
||||
|
||||
# Per-field caps on the heartbeat / activity payload. Borrowed from
|
||||
# hermes-agent's design discipline: cap ONCE in the helper, not at every
|
||||
# call site, so a future caller adding error_detail can't accidentally
|
||||
# DoS activity_logs by pasting a 4MB stack trace + base64 image.
|
||||
#
|
||||
# Why these specific limits:
|
||||
# - error_detail (4096): hermes' value. Long enough for a multi-frame
|
||||
# stack trace, short enough that 100 errors in 5min is < 500KB total.
|
||||
# - summary (256): summary is a one-liner shown in the canvas card +
|
||||
# activity row. 256 covers UTF-8 emoji + a sentence.
|
||||
# - response_text (NOT capped): this is the agent's actual reply
|
||||
# content. Capping would silently truncate user-visible output.
|
||||
_MAX_ERROR_DETAIL_CHARS = 4096
|
||||
_MAX_SUMMARY_CHARS = 256
|
||||
|
||||
|
||||
async def report_activity(
|
||||
activity_type: str, target_id: str = "", summary: str = "", status: str = "ok",
|
||||
task_text: str = "", response_text: str = "", error_detail: str = "",
|
||||
):
|
||||
"""Report activity to the platform for live progress tracking."""
|
||||
# Defensive caps in the helper itself so every caller benefits — see
|
||||
# _MAX_ERROR_DETAIL_CHARS / _MAX_SUMMARY_CHARS comments above.
|
||||
if error_detail and len(error_detail) > _MAX_ERROR_DETAIL_CHARS:
|
||||
error_detail = error_detail[:_MAX_ERROR_DETAIL_CHARS]
|
||||
if summary and len(summary) > _MAX_SUMMARY_CHARS:
|
||||
summary = summary[:_MAX_SUMMARY_CHARS]
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
payload: dict = {
|
||||
"activity_type": activity_type,
|
||||
"source_id": WORKSPACE_ID,
|
||||
"target_id": target_id,
|
||||
"method": "message/send",
|
||||
"summary": summary,
|
||||
"status": status,
|
||||
}
|
||||
if task_text:
|
||||
payload["request_body"] = {"task": task_text}
|
||||
if response_text:
|
||||
payload["response_body"] = {"result": response_text}
|
||||
if error_detail:
|
||||
# error_detail is a top-level activity row column on the
|
||||
# platform (handlers/activity.go). Surfacing the cleaned
|
||||
# exception string here lets the Activity tab render a
|
||||
# red error chip + the cause without forcing the user
|
||||
# to scroll into the raw response_body JSON.
|
||||
payload["error_detail"] = error_detail
|
||||
await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/activity",
|
||||
json=payload,
|
||||
headers=_auth_headers_for_heartbeat(),
|
||||
)
|
||||
# Also push current_task via heartbeat for canvas card display
|
||||
if summary:
|
||||
await client.post(
|
||||
f"{PLATFORM_URL}/registry/heartbeat",
|
||||
json={
|
||||
"workspace_id": WORKSPACE_ID,
|
||||
"current_task": summary,
|
||||
"active_tasks": 1,
|
||||
"error_rate": 0,
|
||||
"sample_error": "",
|
||||
"uptime_seconds": 0,
|
||||
},
|
||||
headers=_auth_headers_for_heartbeat(),
|
||||
)
|
||||
except Exception:
|
||||
pass # Best-effort — don't block delegation on activity reporting
|
||||
|
||||
|
||||
# Delegation tool handlers — extracted to a2a_tools_delegation
|
||||
# (RFC #2873 iter 4b). Re-imported here so call sites + tests that
|
||||
# reference ``a2a_tools.tool_delegate_task`` /
|
||||
# ``a2a_tools._delegate_sync_via_polling`` keep resolving identically.
|
||||
from a2a_tools_delegation import ( # noqa: E402 (import after the from-a2a_client block)
|
||||
_SYNC_POLL_BUDGET_S,
|
||||
_SYNC_POLL_INTERVAL_S,
|
||||
_delegate_sync_via_polling,
|
||||
tool_check_task_status,
|
||||
tool_delegate_task,
|
||||
tool_delegate_task_async,
|
||||
)
|
||||
|
||||
|
||||
# Messaging tool handlers — extracted to a2a_tools_messaging
|
||||
# (RFC #2873 iter 4d). Re-imported here so call sites + tests that
|
||||
# reference ``a2a_tools.tool_send_message_to_user`` /
|
||||
# ``tool_list_peers`` / ``tool_get_workspace_info`` /
|
||||
# ``tool_chat_history`` / ``_upload_chat_files`` keep resolving
|
||||
# identically.
|
||||
from a2a_tools_messaging import ( # noqa: E402 (import after the top-of-module imports)
|
||||
_upload_chat_files,
|
||||
tool_broadcast_message,
|
||||
tool_chat_history,
|
||||
tool_get_workspace_info,
|
||||
tool_list_peers,
|
||||
tool_send_message_to_user,
|
||||
)
|
||||
|
||||
|
||||
# Memory tool handlers — extracted to a2a_tools_memory (RFC #2873 iter 4c).
|
||||
# Re-imported here so call sites + tests that reference
|
||||
# ``a2a_tools.tool_commit_memory`` / ``tool_recall_memory`` keep
|
||||
# resolving identically.
|
||||
from a2a_tools_memory import ( # noqa: E402 (import after the top-of-module imports)
|
||||
tool_commit_memory,
|
||||
tool_recall_memory,
|
||||
)
|
||||
|
||||
|
||||
# Inbox tool handlers — extracted to a2a_tools_inbox (RFC #2873 iter 4e).
|
||||
# Re-imported here so call sites + tests that reference
|
||||
# ``a2a_tools.tool_inbox_peek`` / ``tool_inbox_pop`` / ``tool_wait_for_message``
|
||||
# / ``_enrich_inbound_for_agent`` / ``_INBOX_NOT_ENABLED_MSG`` keep
|
||||
# resolving identically.
|
||||
from a2a_tools_inbox import ( # noqa: E402 (import after the top-of-module imports)
|
||||
_INBOX_NOT_ENABLED_MSG,
|
||||
_enrich_inbound_for_agent,
|
||||
tool_inbox_peek,
|
||||
tool_inbox_pop,
|
||||
tool_wait_for_message,
|
||||
)
|
||||
|
||||
|
||||
# Identity tool handlers — extracted to a2a_tools_identity. Ports the
|
||||
# two T4-tier MCP tools (``tool_get_runtime_identity`` +
|
||||
# ``tool_update_agent_card``) from molecule-ai-workspace-runtime PR#17.
|
||||
# That repo is mirror-only (reference_runtime_repo_is_mirror_only);
|
||||
# this is the canonical edit point, and the wheel mirror is
|
||||
# regenerated by publish-runtime.yml on merge.
|
||||
from a2a_tools_identity import ( # noqa: E402 (import after the top-of-module imports)
|
||||
tool_get_runtime_identity,
|
||||
tool_update_agent_card,
|
||||
)
|
||||
@@ -1,459 +0,0 @@
|
||||
"""Delegation tool handlers — single-concern slice of the a2a_tools surface.
|
||||
|
||||
Extracted from ``a2a_tools.py`` (RFC #2873 iter 4b). Owns the three
|
||||
delegation MCP tools + the RFC #2829 PR-5 sync-via-polling helper they
|
||||
share.
|
||||
|
||||
Public surface:
|
||||
|
||||
* ``tool_delegate_task`` — synchronous delegation, waits for response.
|
||||
* ``tool_delegate_task_async`` — fire-and-forget delegation; returns
|
||||
``{delegation_id, ...}``.
|
||||
* ``tool_check_task_status`` — poll the platform's ``/delegations`` log.
|
||||
|
||||
Internal:
|
||||
|
||||
* ``_delegate_sync_via_polling`` — durable async + poll for terminal
|
||||
status (RFC #2829 PR-5 cutover path; toggled by
|
||||
``DELEGATION_SYNC_VIA_INBOX=1``).
|
||||
* ``_SYNC_POLL_INTERVAL_S`` / ``_SYNC_POLL_BUDGET_S`` constants.
|
||||
|
||||
Circular-import note: this module calls ``report_activity`` from
|
||||
``a2a_tools`` to emit activity rows around the delegate dispatch.
|
||||
``a2a_tools`` imports the public symbols here at module-load time,
|
||||
so we use a LAZY import for ``report_activity`` inside the function
|
||||
that needs it. Without the lazy hop Python raises an ImportError
|
||||
on first ``a2a_tools`` import.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
|
||||
import httpx
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
from a2a_client import (
|
||||
PLATFORM_URL,
|
||||
WORKSPACE_ID,
|
||||
_A2A_ERROR_PREFIX,
|
||||
_A2A_QUEUED_PREFIX,
|
||||
_peer_names,
|
||||
_peer_to_source,
|
||||
discover_peer,
|
||||
send_a2a_message,
|
||||
)
|
||||
from a2a_tools_rbac import auth_headers_for_heartbeat as _auth_headers_for_heartbeat
|
||||
from _sanitize_a2a import (
|
||||
_A2A_BOUNDARY_END,
|
||||
_A2A_BOUNDARY_END_ESCAPED,
|
||||
_A2A_BOUNDARY_START,
|
||||
_A2A_BOUNDARY_START_ESCAPED,
|
||||
sanitize_a2a_result,
|
||||
) # noqa: E402
|
||||
|
||||
|
||||
# RFC #2829 PR-5 cutover constants. The poll cadence + timeout are
|
||||
# intentionally generous: 3s gives the platform's executeDelegation
|
||||
# goroutine room to dispatch + the callee to respond + the result to
|
||||
# write to activity_logs without thrashing the platform with rapid
|
||||
# polls; the budget matches the legacy DELEGATION_TIMEOUT (300s) so
|
||||
# operators don't see behavior change beyond "no more 600s timeouts".
|
||||
_SYNC_POLL_INTERVAL_S = 3.0
|
||||
_SYNC_POLL_BUDGET_S = float(os.environ.get("DELEGATION_TIMEOUT", "300.0"))
|
||||
|
||||
|
||||
async def _delegate_sync_via_polling(
|
||||
workspace_id: str,
|
||||
task: str,
|
||||
src: str,
|
||||
) -> str:
|
||||
"""RFC #2829 PR-5: durable async delegation + poll for terminal status.
|
||||
|
||||
Sidesteps the platform proxy's blocking `message/send` HTTP path that
|
||||
hits a hard 600s ceiling. Instead:
|
||||
|
||||
1. POST /workspaces/<src>/delegate (async, returns 202 + delegation_id)
|
||||
— platform's executeDelegation goroutine handles A2A dispatch in
|
||||
the background. No client-side timeout dependency on the platform
|
||||
holding a connection open.
|
||||
2. Poll GET /workspaces/<src>/delegations every 3s for a row with
|
||||
matching delegation_id reaching terminal status (completed/failed).
|
||||
3. Return the response_preview text on completed; surface error_detail
|
||||
on failed (with the same _A2A_ERROR_PREFIX wrapping the legacy
|
||||
path uses, so caller error-detection logic is unchanged).
|
||||
|
||||
Both /delegate and /delegations are existing endpoints — this helper
|
||||
just composes them into a polling synchronous facade. The result is
|
||||
available the moment the platform writes the terminal status row;
|
||||
no extra latency vs. the legacy proxy-blocked path on fast cases.
|
||||
"""
|
||||
import asyncio
|
||||
import time
|
||||
|
||||
idem_key = hashlib.sha256(f"{src}:{workspace_id}:{task}".encode()).hexdigest()[:32]
|
||||
|
||||
# 1. Dispatch via /delegate (the async, durable path).
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{src}/delegate",
|
||||
json={
|
||||
"target_id": workspace_id,
|
||||
"task": task,
|
||||
"idempotency_key": idem_key,
|
||||
},
|
||||
headers=_auth_headers_for_heartbeat(src),
|
||||
)
|
||||
except Exception as e: # pylint: disable=broad-except
|
||||
return f"{_A2A_ERROR_PREFIX}delegate dispatch failed: {e}"
|
||||
|
||||
if resp.status_code != 202 and resp.status_code != 200:
|
||||
return f"{_A2A_ERROR_PREFIX}delegate dispatch failed: HTTP {resp.status_code} {resp.text[:200]}"
|
||||
|
||||
try:
|
||||
dispatch = resp.json()
|
||||
except Exception as e: # pylint: disable=broad-except
|
||||
return f"{_A2A_ERROR_PREFIX}delegate dispatch returned non-JSON: {e}"
|
||||
|
||||
delegation_id = dispatch.get("delegation_id", "")
|
||||
if not delegation_id:
|
||||
return f"{_A2A_ERROR_PREFIX}delegate dispatch missing delegation_id: {dispatch}"
|
||||
|
||||
# 2. Poll for terminal status with a deadline. Each poll is a cheap
|
||||
# /delegations GET — bounded by the platform's existing rate limit.
|
||||
deadline = time.monotonic() + _SYNC_POLL_BUDGET_S
|
||||
last_status = "unknown"
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
poll = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{src}/delegations",
|
||||
headers=_auth_headers_for_heartbeat(src),
|
||||
)
|
||||
except Exception as e: # pylint: disable=broad-except
|
||||
# Transient — keep polling. The platform IS holding the
|
||||
# delegation row; we just lost a network request.
|
||||
last_status = f"poll-error: {e}"
|
||||
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
|
||||
continue
|
||||
|
||||
if poll.status_code != 200:
|
||||
last_status = f"poll HTTP {poll.status_code}"
|
||||
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
|
||||
continue
|
||||
|
||||
try:
|
||||
rows = poll.json()
|
||||
except Exception as e: # pylint: disable=broad-except
|
||||
last_status = f"poll non-JSON: {e}"
|
||||
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
|
||||
continue
|
||||
|
||||
# /delegations returns a flat list of delegation events. Filter to
|
||||
# our delegation_id; pick the first terminal one. The list may
|
||||
# have multiple rows per delegation_id (one for the original
|
||||
# dispatch, one per status update); we want the latest terminal.
|
||||
if not isinstance(rows, list):
|
||||
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
|
||||
continue
|
||||
terminal = None
|
||||
for r in rows:
|
||||
if not isinstance(r, dict):
|
||||
continue
|
||||
if r.get("delegation_id") != delegation_id:
|
||||
continue
|
||||
status = (r.get("status") or "").lower()
|
||||
last_status = status
|
||||
if status in ("completed", "failed"):
|
||||
terminal = r
|
||||
break
|
||||
if terminal:
|
||||
if (terminal.get("status") or "").lower() == "completed":
|
||||
# OFFSEC-003: sanitize response_preview before returning so
|
||||
# boundary markers injected by a malicious peer cannot escape
|
||||
# the trust boundary.
|
||||
return sanitize_a2a_result(terminal.get("response_preview") or "")
|
||||
# OFFSEC-003: sanitize error_detail / summary before wrapping with
|
||||
# the _A2A_ERROR_PREFIX sentinel so injected markers cannot appear
|
||||
# inside the trusted error block returned to the agent.
|
||||
err_raw = (
|
||||
terminal.get("error_detail")
|
||||
or terminal.get("summary")
|
||||
or "delegation failed"
|
||||
)
|
||||
err = sanitize_a2a_result(err_raw)
|
||||
return f"{_A2A_ERROR_PREFIX}{err}"
|
||||
|
||||
await asyncio.sleep(_SYNC_POLL_INTERVAL_S)
|
||||
|
||||
# Budget exhausted — the platform's row is still in flight (or queued).
|
||||
# Surface as an error so the caller can decide to retry or fall back;
|
||||
# the platform DOES still have the durable row, so the work isn't
|
||||
# lost — it'll complete eventually and a future check_task_status
|
||||
# will surface the result.
|
||||
return (
|
||||
f"{_A2A_ERROR_PREFIX}polling timeout after {_SYNC_POLL_BUDGET_S}s "
|
||||
f"(delegation_id={delegation_id}, last_status={last_status}); "
|
||||
f"the platform is still working on it — call check_task_status('{delegation_id}') to retrieve later"
|
||||
)
|
||||
|
||||
|
||||
async def tool_delegate_task(
|
||||
workspace_id: str,
|
||||
task: str,
|
||||
source_workspace_id: str | None = None,
|
||||
) -> str:
|
||||
"""Delegate a task to another workspace via A2A (synchronous — waits for response).
|
||||
|
||||
``source_workspace_id`` selects which registered workspace this
|
||||
delegation originates from — drives auth + the X-Workspace-ID source
|
||||
header so the platform's a2a_proxy logs the correct sender. Single-
|
||||
workspace operators leave it None and routing falls back to the
|
||||
module-level WORKSPACE_ID.
|
||||
"""
|
||||
if not workspace_id or not task:
|
||||
return "Error: workspace_id and task are required"
|
||||
|
||||
# Self-delegation guard: delegating to your own workspace ID deadlocks —
|
||||
# the sending turn holds _run_lock while the receive handler waits for the
|
||||
# same lock, the request 30s-times-out, and the whole cycle is wasted.
|
||||
# Reject immediately with an actionable message. (effective_src mirrors the
|
||||
# `src or WORKSPACE_ID` resolution used below for routing.)
|
||||
effective_src = source_workspace_id or _peer_to_source.get(workspace_id) or WORKSPACE_ID
|
||||
if workspace_id and workspace_id == effective_src:
|
||||
return (
|
||||
"Error: cannot delegate_task to your own workspace — self-delegation "
|
||||
"deadlocks _run_lock (your sending turn holds it, the receive handler "
|
||||
"waits for it, the request times out). There is no peer who is also you: "
|
||||
"just do the work yourself, or call commit_memory / send_message_to_user directly."
|
||||
)
|
||||
|
||||
# Auto-route: if source not specified, look up which registered
|
||||
# workspace last saw this peer (populated by tool_list_peers). Falls
|
||||
# back to the legacy WORKSPACE_ID for single-workspace operators.
|
||||
src = source_workspace_id or _peer_to_source.get(workspace_id) or None
|
||||
|
||||
# Discover the target. discover_peer is the access-control gate +
|
||||
# name/status lookup. The peer's reported ``url`` field is NOT used
|
||||
# for routing — see send_a2a_message, which constructs the URL via
|
||||
# the platform's A2A proxy.
|
||||
peer = await discover_peer(workspace_id, source_workspace_id=src)
|
||||
if not peer:
|
||||
return f"Error: workspace {workspace_id} not found or not accessible (check access control)"
|
||||
|
||||
if (peer.get("status") or "").lower() == "offline":
|
||||
return f"Error: workspace {workspace_id} is offline"
|
||||
|
||||
# Lazy import: a2a_tools imports this module at top-level, so a
|
||||
# top-level import of report_activity from a2a_tools would create a
|
||||
# circular dependency at first-import time. Lazy resolution inside
|
||||
# the function body breaks the cycle without forcing a ground-up
|
||||
# restructure of the activity-reporting layer.
|
||||
from a2a_tools import report_activity
|
||||
|
||||
# Report delegation start — include the task text for traceability
|
||||
peer_name = peer.get("name") or _peer_names.get(workspace_id) or workspace_id[:8]
|
||||
_peer_names[workspace_id] = peer_name # cache for future use
|
||||
# Brief summary for canvas display — just the delegation target
|
||||
await report_activity("a2a_send", workspace_id, f"Delegating to {peer_name}", task_text=task)
|
||||
|
||||
# RFC #2829 PR-5: agent-side cutover. When DELEGATION_SYNC_VIA_INBOX=1,
|
||||
# use the platform's durable async delegation API (POST /delegate +
|
||||
# poll /delegations) instead of the proxy-blocked message/send path.
|
||||
# This sidesteps the 600s message/send timeout class that broke
|
||||
# iteration-14/90-style long-running delegations on 2026-05-05.
|
||||
#
|
||||
# Default off — staging-canary first, flip default after PR-2's
|
||||
# result-push flag (DELEGATION_RESULT_INBOX_PUSH) has been on for
|
||||
# ≥1 week without incident.
|
||||
if os.environ.get("DELEGATION_SYNC_VIA_INBOX") == "1":
|
||||
result = await _delegate_sync_via_polling(workspace_id, task, src or WORKSPACE_ID)
|
||||
else:
|
||||
# send_a2a_message routes through ${PLATFORM_URL}/workspaces/{id}/a2a
|
||||
# (the platform proxy) so the same code works for in-container and
|
||||
# external (standalone molecule-mcp) callers.
|
||||
result = await send_a2a_message(workspace_id, task, source_workspace_id=src)
|
||||
# #2967: when the target is a poll-mode peer, the platform's
|
||||
# a2a_proxy short-circuits and returns a queued envelope —
|
||||
# send_a2a_message surfaces that as the _A2A_QUEUED_PREFIX
|
||||
# sentinel. The synchronous proxy path can't deliver a reply
|
||||
# because the target has no public URL; fall back to the
|
||||
# durable /delegate + /delegations polling path which DOES
|
||||
# work for poll-mode peers (the executeDelegation goroutine
|
||||
# writes to the inbox queue and the result row arrives when
|
||||
# the target picks it up + replies).
|
||||
#
|
||||
# This is what makes external-runtime-to-external-runtime
|
||||
# A2A actually deliver synchronous replies — without the
|
||||
# fallback the calling agent sees the queued sentinel as
|
||||
# success-with-no-text and never gets the peer's response.
|
||||
if result.startswith(_A2A_QUEUED_PREFIX):
|
||||
logger.info(
|
||||
"tool_delegate_task: target=%s is poll-mode; "
|
||||
"falling back from message/send to /delegate-poll path",
|
||||
workspace_id,
|
||||
)
|
||||
result = await _delegate_sync_via_polling(
|
||||
workspace_id, task, src or WORKSPACE_ID,
|
||||
)
|
||||
|
||||
# Detect delegation failures — wrap them clearly so the calling agent
|
||||
# can decide to retry, use another peer, or handle the task itself.
|
||||
is_error = result.startswith(_A2A_ERROR_PREFIX)
|
||||
# Strip the sentinel prefix so error_detail is the human-readable
|
||||
# cause directly. The Activity tab's red error chip surfaces this
|
||||
# without the user having to scroll into the raw response JSON.
|
||||
#
|
||||
# Cap at 4096 chars before sending — the platform's
|
||||
# activity_logs.error_detail column is unbounded TEXT and a
|
||||
# malicious or buggy peer could otherwise stream an arbitrarily
|
||||
# large error message into the caller's activity log. 4096 is
|
||||
# comfortably above any real exception traceback we've seen and
|
||||
# well below an obvious-DoS threshold.
|
||||
error_detail = result[len(_A2A_ERROR_PREFIX):].strip()[:4096] if is_error else ""
|
||||
await report_activity(
|
||||
"a2a_receive", workspace_id,
|
||||
f"{peer_name} responded ({len(result)} chars)" if not is_error else f"{peer_name} failed: {error_detail[:120]}",
|
||||
task_text=task, response_text=result,
|
||||
status="error" if is_error else "ok",
|
||||
error_detail=error_detail,
|
||||
)
|
||||
if is_error:
|
||||
return (
|
||||
f"DELEGATION FAILED to {peer_name}: {result}\n"
|
||||
f"You should either: (1) try a different peer, (2) handle this task yourself, "
|
||||
f"or (3) inform the user that {peer_name} is unavailable and provide your best answer."
|
||||
)
|
||||
# OFFSEC-003: escape boundary markers in peer text, then wrap in boundary
|
||||
# markers so the agent can distinguish trusted (own output) from untrusted
|
||||
# (peer-supplied) content. Explicit wrapping here rather than inside
|
||||
# sanitize_a2a_result preserves a clean separation of concerns.
|
||||
#
|
||||
# Truncate at the closer BEFORE sanitizing so the raw closer (which gets
|
||||
# lost during escaping) is removed from the content. After truncation,
|
||||
# sanitize the remaining text and wrap with escaped boundary markers.
|
||||
if _A2A_BOUNDARY_END in result:
|
||||
result = result[:result.index(_A2A_BOUNDARY_END)]
|
||||
escaped = sanitize_a2a_result(result)
|
||||
return (
|
||||
f"{_A2A_BOUNDARY_START_ESCAPED}\n"
|
||||
f"{escaped}\n"
|
||||
f"{_A2A_BOUNDARY_END_ESCAPED}"
|
||||
)
|
||||
|
||||
|
||||
async def tool_delegate_task_async(
|
||||
workspace_id: str,
|
||||
task: str,
|
||||
source_workspace_id: str | None = None,
|
||||
) -> str:
|
||||
"""Delegate a task via the platform's async delegation API (fire-and-forget).
|
||||
|
||||
Uses POST /workspaces/:id/delegate which runs the A2A request in the background.
|
||||
Results are tracked in the platform DB and broadcast via WebSocket.
|
||||
Use check_task_status to poll for results.
|
||||
|
||||
``source_workspace_id`` selects the sending workspace (which one of
|
||||
this agent's registered workspaces gets logged as the originator);
|
||||
auto-routes via the peer→source cache when omitted.
|
||||
"""
|
||||
if not workspace_id or not task:
|
||||
return "Error: workspace_id and task are required"
|
||||
|
||||
src = source_workspace_id or _peer_to_source.get(workspace_id) or WORKSPACE_ID
|
||||
|
||||
# Self-delegation guard: even on the async path, queuing a task to your own
|
||||
# workspace just makes you re-process your own dispatch — never useful, and
|
||||
# on the sync path it deadlocks (see tool_delegate_task). Reject early.
|
||||
if workspace_id and workspace_id == src:
|
||||
return (
|
||||
"Error: cannot delegate_task_async to your own workspace — there is no "
|
||||
"peer who is also you. Do the work yourself, or call commit_memory / "
|
||||
"send_message_to_user directly."
|
||||
)
|
||||
|
||||
# Idempotency key: SHA-256 of (source, target, task) so that a
|
||||
# restarted agent firing the same delegation gets the same key and
|
||||
# the platform returns the existing delegation_id instead of
|
||||
# creating a duplicate. Fixes #1456. Source is in the key so the
|
||||
# SAME task delegated from two different registered workspaces
|
||||
# produces two distinct delegations (the right behavior — one per
|
||||
# tenant audit trail).
|
||||
idem_key = hashlib.sha256(f"{src}:{workspace_id}:{task}".encode()).hexdigest()[:32]
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{src}/delegate",
|
||||
json={"target_id": workspace_id, "task": task, "idempotency_key": idem_key},
|
||||
headers=_auth_headers_for_heartbeat(src),
|
||||
)
|
||||
if resp.status_code == 202:
|
||||
data = resp.json()
|
||||
return json.dumps({
|
||||
"delegation_id": data.get("delegation_id", ""),
|
||||
"workspace_id": workspace_id,
|
||||
"status": "delegated",
|
||||
"note": "Task delegated. The platform runs it in the background. Use check_task_status to poll for results.",
|
||||
})
|
||||
else:
|
||||
return f"Error: delegation failed with status {resp.status_code}: {resp.text[:200]}"
|
||||
except Exception as e:
|
||||
return f"Error: delegation failed — {e}"
|
||||
|
||||
|
||||
async def tool_check_task_status(
|
||||
workspace_id: str,
|
||||
task_id: str,
|
||||
source_workspace_id: str | None = None,
|
||||
) -> str:
|
||||
"""Check delegations for this workspace via the platform API.
|
||||
|
||||
Args:
|
||||
workspace_id: Ignored (kept for backward compat). Checks
|
||||
``source_workspace_id``'s delegations (the workspace that
|
||||
FIRED the delegations), not the target's.
|
||||
task_id: Optional delegation_id to filter. If empty, returns all recent delegations.
|
||||
source_workspace_id: Which registered workspace's delegation log
|
||||
to query. Defaults to the module-level WORKSPACE_ID.
|
||||
"""
|
||||
src = source_workspace_id or WORKSPACE_ID
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{src}/delegations",
|
||||
headers=_auth_headers_for_heartbeat(src),
|
||||
)
|
||||
if resp.status_code != 200:
|
||||
return f"Error: failed to check delegations ({resp.status_code})"
|
||||
delegations = resp.json()
|
||||
if task_id:
|
||||
# Filter by delegation_id
|
||||
matching = [d for d in delegations if d.get("delegation_id") == task_id]
|
||||
if matching:
|
||||
# OFFSEC-003: sanitize peer-supplied fields
|
||||
d = matching[0]
|
||||
d["summary"] = sanitize_a2a_result(d.get("summary", ""))
|
||||
d["response_preview"] = sanitize_a2a_result(d.get("response_preview", ""))
|
||||
return json.dumps(d)
|
||||
return json.dumps({"status": "not_found", "delegation_id": task_id})
|
||||
# Return all recent delegations
|
||||
summary = []
|
||||
for d in delegations[:10]:
|
||||
preview = d.get("response_preview", "")
|
||||
if preview:
|
||||
preview = sanitize_a2a_result(preview)
|
||||
summary.append({
|
||||
"delegation_id": d.get("delegation_id", ""),
|
||||
"target_id": d.get("target_id", ""),
|
||||
"status": d.get("status", ""),
|
||||
"summary": sanitize_a2a_result(d.get("summary", "")),
|
||||
"response_preview": preview,
|
||||
})
|
||||
return json.dumps({"delegations": summary, "count": len(delegations)})
|
||||
except Exception as e:
|
||||
return f"Error checking delegations: {e}"
|
||||
@@ -1,187 +0,0 @@
|
||||
"""Identity tool handlers — single-concern slice of the a2a_tools surface.
|
||||
|
||||
Owns the two MCP tools that close the T4-tier workspace owner-permission
|
||||
gaps reported via the canvas:
|
||||
|
||||
* ``tool_get_runtime_identity`` — env-only; returns model, model_provider,
|
||||
molecule_model, anthropic_base_url, tier, workspace_id, runtime
|
||||
(ADAPTER_MODULE). No HTTP call. Always permitted by RBAC — even
|
||||
read-only agents may know what model they are.
|
||||
|
||||
* ``tool_update_agent_card`` — POSTs the card to ``/registry/update-card``
|
||||
with the workspace's own bearer (same auth path as ``tool_commit_memory``
|
||||
via ``a2a_tools_rbac.auth_headers_for_heartbeat``). The platform
|
||||
replaces the stored card and broadcasts an ``agent_card_updated``
|
||||
event so the canvas reflects the new card live. Gated on
|
||||
``memory.write`` capability via the existing RBAC permission map so
|
||||
read-only roles can't silently rewrite the platform card.
|
||||
|
||||
Both originated as a port of molecule-ai-workspace-runtime PR#17
|
||||
(``feat(mcp): add update_agent_card + get_runtime_identity tools``).
|
||||
The mirror-only PR#17 was closed without merge per
|
||||
``reference_runtime_repo_is_mirror_only``; the canonical edit point is
|
||||
this monorepo at ``workspace/`` and the wheel mirror is regenerated
|
||||
automatically by the publish-runtime workflow.
|
||||
|
||||
Imports the auth-header primitive from ``a2a_tools_rbac`` (iter 4a) —
|
||||
NOT from ``a2a_tools`` — to avoid a circular import with the
|
||||
kitchen-sink re-export module.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
from typing import Any
|
||||
|
||||
import httpx
|
||||
|
||||
from a2a_client import PLATFORM_URL
|
||||
from a2a_tools_rbac import (
|
||||
auth_headers_for_heartbeat as _auth_headers_for_heartbeat,
|
||||
check_memory_write_permission as _check_memory_write_permission,
|
||||
)
|
||||
|
||||
|
||||
def _runtime_identity_payload() -> dict[str, Any]:
|
||||
"""Build the identity dict — env-only, no I/O.
|
||||
|
||||
Factored out from ``tool_get_runtime_identity`` so tests can assert
|
||||
against the exact key set without re-parsing JSON. The MCP tool
|
||||
handler ``tool_get_runtime_identity`` is the only public caller in
|
||||
production; tests call this helper directly.
|
||||
"""
|
||||
return {
|
||||
"model": os.environ.get("MODEL", ""),
|
||||
"model_provider": os.environ.get("MODEL_PROVIDER", ""),
|
||||
"molecule_model": os.environ.get("MOLECULE_MODEL", ""),
|
||||
"anthropic_base_url": os.environ.get("ANTHROPIC_BASE_URL", ""),
|
||||
"tier": os.environ.get("TIER", ""),
|
||||
"workspace_id": os.environ.get("WORKSPACE_ID", ""),
|
||||
# Adapter module is the closest thing the runtime has to a
|
||||
# "template slug" — e.g. "adapter" for claude-code-default,
|
||||
# "hermes" for hermes-template, etc. Picked from
|
||||
# $ADAPTER_MODULE env baked by each template's Dockerfile.
|
||||
"runtime": os.environ.get("ADAPTER_MODULE", ""),
|
||||
}
|
||||
|
||||
|
||||
async def tool_get_runtime_identity() -> str:
|
||||
"""Return this runtime's identity — model, provider, tier, IDs.
|
||||
|
||||
Env-only; no HTTP call. Useful so the agent can answer "what model
|
||||
am I?" correctly instead of guessing from a stale system prompt
|
||||
that the operator may have changed between boots.
|
||||
|
||||
Returns the identity as a JSON-encoded string (the dispatch contract
|
||||
every MCP tool in this module follows). Tests that want to assert
|
||||
individual fields can call ``_runtime_identity_payload()`` directly,
|
||||
or ``json.loads`` the return value.
|
||||
|
||||
Always permitted by RBAC — there is no sensitive information here
|
||||
that isn't already available to the process via ``os.environ``.
|
||||
The point of the tool is to surface those env values to the agent
|
||||
layer in a stable, documented shape rather than expecting every
|
||||
agent runtime to know to ``echo $MODEL``.
|
||||
"""
|
||||
return json.dumps(_runtime_identity_payload(), indent=2)
|
||||
|
||||
|
||||
async def tool_update_agent_card(card: Any) -> str:
|
||||
"""Update this workspace's agent_card on the platform.
|
||||
|
||||
POSTs the provided card to ``/registry/update-card`` with the
|
||||
workspace's own bearer token (same auth path as ``tool_commit_memory``
|
||||
and ``tool_get_workspace_info``). The platform validates required
|
||||
fields server-side, replaces the stored card, and broadcasts an
|
||||
``agent_card_updated`` event so the canvas updates live.
|
||||
|
||||
Args:
|
||||
card: A JSON-serialisable object (typically a dict) holding the
|
||||
new card. The platform validates required fields server-side.
|
||||
|
||||
Returns:
|
||||
JSON-encoded string. Body:
|
||||
- ``{"success": true, "status": "updated"}`` on success;
|
||||
- ``{"success": false, "error": "<msg>", "status_code": <int>}``
|
||||
on platform error;
|
||||
- ``{"success": false, "error": "<reason>"}`` on local validation
|
||||
(non-dict card, missing WORKSPACE_ID, network error).
|
||||
|
||||
Permission gate: this tool requires the ``memory.write`` RBAC
|
||||
capability — same gate as ``tool_commit_memory``. The check runs
|
||||
inline rather than at the dispatcher layer to keep ``a2a_mcp_server``
|
||||
permission-agnostic (the gate sits with the implementation, not the
|
||||
transport). Read-only roles get a clear error string back instead
|
||||
of a 403 from the platform.
|
||||
|
||||
We re-check ``isinstance(card, dict)`` here defensively rather than
|
||||
trust the MCP schema validator alone — the schema only constrains
|
||||
the transport, not the in-process call surface used by tests and
|
||||
sibling modules.
|
||||
"""
|
||||
payload = await _update_agent_card_impl(card)
|
||||
return json.dumps(payload, indent=2)
|
||||
|
||||
|
||||
async def _update_agent_card_impl(card: Any) -> dict[str, Any]:
|
||||
"""Dict-returning core of ``tool_update_agent_card``.
|
||||
|
||||
Split out so tests can assert against the raw dict shape (status
|
||||
codes, error messages) without re-parsing JSON on every assertion.
|
||||
The string-returning ``tool_update_agent_card`` is a thin wrapper
|
||||
invoked by the MCP dispatcher.
|
||||
"""
|
||||
# RBAC: require memory.write permission. Same gate as
|
||||
# tool_commit_memory (the agent already needs this capability to
|
||||
# persist anything outbound). Read-only roles can still call
|
||||
# get_runtime_identity / get_workspace_info to introspect — those
|
||||
# are env-only / read-only and have no inline gate.
|
||||
if not _check_memory_write_permission():
|
||||
return {
|
||||
"success": False,
|
||||
"error": (
|
||||
"RBAC — this workspace does not have the 'memory.write' "
|
||||
"permission required to update the agent_card."
|
||||
),
|
||||
}
|
||||
if not isinstance(card, dict):
|
||||
return {
|
||||
"success": False,
|
||||
"error": "card must be a JSON object (dict)",
|
||||
}
|
||||
ws_id = os.environ.get("WORKSPACE_ID", "")
|
||||
if not ws_id:
|
||||
return {
|
||||
"success": False,
|
||||
"error": "WORKSPACE_ID env not set; cannot identify caller",
|
||||
}
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/registry/update-card",
|
||||
json={"workspace_id": ws_id, "agent_card": card},
|
||||
headers=_auth_headers_for_heartbeat(),
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
body: dict[str, Any] = {}
|
||||
try:
|
||||
body = resp.json()
|
||||
except Exception:
|
||||
pass
|
||||
return {
|
||||
"success": True,
|
||||
"status": body.get("status", "updated"),
|
||||
}
|
||||
# Non-200 — surface what the platform returned.
|
||||
error_msg = ""
|
||||
try:
|
||||
error_msg = resp.json().get("error", "") or resp.text
|
||||
except Exception:
|
||||
error_msg = resp.text
|
||||
return {
|
||||
"success": False,
|
||||
"status_code": resp.status_code,
|
||||
"error": error_msg,
|
||||
}
|
||||
except Exception as e:
|
||||
return {"success": False, "error": f"network error: {e}"}
|
||||
@@ -1,140 +0,0 @@
|
||||
"""Inbox tool handlers — single-concern slice of the a2a_tools surface.
|
||||
|
||||
Standalone-runtime path for inbound-message delivery (push-mode runtimes
|
||||
get messages via the channel-tag synthesis in a2a_mcp_server). The
|
||||
``InboxState`` singleton is set by ``mcp_cli`` before the MCP server
|
||||
starts; in-container runtimes never call ``inbox.activate(...)`` so
|
||||
``inbox.get_state()`` returns None and these tools surface an
|
||||
informational error instead of raising.
|
||||
|
||||
When-to-use guidance for agents (mirrored in
|
||||
``platform_tools/registry.py``):
|
||||
- ``wait_for_message``: block until a new inbound message arrives, then
|
||||
decide what to do with it; forms the loop ``wait → respond → wait``.
|
||||
- ``inbox_peek``: inspect the queue non-destructively.
|
||||
- ``inbox_pop``: remove a handled message by activity_id.
|
||||
|
||||
Extracted from ``a2a_tools.py`` in RFC #2873 iter 4e so the kitchen-sink
|
||||
module shrinks to a back-compat shim. The extraction also makes the
|
||||
``_enrich_inbound_for_agent`` helper unit-testable in isolation —
|
||||
previously it was buried in ``a2a_tools`` and only exercised through
|
||||
the inbox wrappers, leaving its peer-id-empty / cache-miss / registry-
|
||||
unavailable branches under-covered.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
|
||||
|
||||
# Surfaced when the inbox subsystem is not initialised. Returned by the
|
||||
# three inbox tool wrappers below so the agent gets a clear "this
|
||||
# runtime delivers via push" message instead of a NameError.
|
||||
_INBOX_NOT_ENABLED_MSG = (
|
||||
"Error: inbox polling is not enabled in this runtime. The standalone "
|
||||
"molecule-mcp wrapper activates it; in-container runtimes receive "
|
||||
"messages via push delivery and do not need these tools."
|
||||
)
|
||||
|
||||
|
||||
def _enrich_inbound_for_agent(d: dict) -> dict:
|
||||
"""Add peer_name / peer_role / agent_card_url to a poll-path message.
|
||||
|
||||
The PUSH path (a2a_mcp_server._build_channel_notification) already
|
||||
enriches the meta dict with these fields, so a Claude Code host
|
||||
with channel-push sees them. The POLL path goes through
|
||||
InboxMessage.to_dict, which is intentionally identity-free (the
|
||||
storage layer doesn't know about the registry cache). Without this
|
||||
helper, every non-Claude-Code MCP client that uses inbox_peek /
|
||||
wait_for_message gets a plain message and the receiving agent
|
||||
can't tell who's writing — breaking the contract documented in
|
||||
a2a_mcp_server.py:303-345 ("In both paths the same fields apply").
|
||||
|
||||
Cache-first non-blocking enrichment (same shape as push): on cache
|
||||
miss the helper returns the bare message; the next call within the
|
||||
5-min TTL hits the warm cache. Failure to enrich is non-fatal —
|
||||
the agent still gets text + peer_id + kind + activity_id, just
|
||||
without the friendly identity.
|
||||
"""
|
||||
peer_id = d.get("peer_id") or ""
|
||||
if not peer_id:
|
||||
# canvas_user — no peer to enrich; helper returns the plain
|
||||
# message unchanged so the canvas reply path still works.
|
||||
return d
|
||||
try:
|
||||
from a2a_client import ( # local import — avoid module-load cycle
|
||||
_agent_card_url_for,
|
||||
enrich_peer_metadata_nonblocking,
|
||||
)
|
||||
except Exception: # noqa: BLE001
|
||||
# If a2a_client is unavailable (test harness, partial install),
|
||||
# degrade gracefully — agent still gets the bare envelope.
|
||||
return d
|
||||
record = enrich_peer_metadata_nonblocking(peer_id)
|
||||
if record is not None:
|
||||
if name := record.get("name"):
|
||||
d["peer_name"] = name
|
||||
if role := record.get("role"):
|
||||
d["peer_role"] = role
|
||||
# agent_card_url is constructable from peer_id alone — surface it
|
||||
# even when registry enrichment misses, so the receiving agent has
|
||||
# a single endpoint to hit for the peer's full capability list.
|
||||
d["agent_card_url"] = _agent_card_url_for(peer_id)
|
||||
return d
|
||||
|
||||
|
||||
async def tool_inbox_peek(limit: int = 10) -> str:
|
||||
"""Return up to ``limit`` pending inbound messages without removing them."""
|
||||
import inbox # local import — avoids a circular dep at module load
|
||||
|
||||
state = inbox.get_state()
|
||||
if state is None:
|
||||
return _INBOX_NOT_ENABLED_MSG
|
||||
messages = state.peek(limit=limit if isinstance(limit, int) else 10)
|
||||
return json.dumps([_enrich_inbound_for_agent(m.to_dict()) for m in messages])
|
||||
|
||||
|
||||
async def tool_inbox_pop(activity_id: str) -> str:
|
||||
"""Remove a message from the inbox queue by activity_id."""
|
||||
import inbox
|
||||
|
||||
state = inbox.get_state()
|
||||
if state is None:
|
||||
return _INBOX_NOT_ENABLED_MSG
|
||||
if not isinstance(activity_id, str) or not activity_id:
|
||||
return "Error: activity_id is required."
|
||||
removed = state.pop(activity_id)
|
||||
if removed is None:
|
||||
return json.dumps({"removed": False, "activity_id": activity_id})
|
||||
return json.dumps({"removed": True, "activity_id": activity_id})
|
||||
|
||||
|
||||
async def tool_wait_for_message(timeout_secs: float = 60.0) -> str:
|
||||
"""Block until a new message arrives or ``timeout_secs`` elapses.
|
||||
|
||||
Returns the head message non-destructively; the agent decides
|
||||
whether to ``inbox_pop`` it after acting.
|
||||
"""
|
||||
import inbox
|
||||
|
||||
state = inbox.get_state()
|
||||
if state is None:
|
||||
return _INBOX_NOT_ENABLED_MSG
|
||||
|
||||
try:
|
||||
timeout = float(timeout_secs)
|
||||
except (TypeError, ValueError):
|
||||
timeout = 60.0
|
||||
# Cap at 300s — Claude Code's default tool timeout is ~10min, and
|
||||
# blocking longer than 5min wastes the prompt cache window for
|
||||
# nothing useful. Operators who want longer can call repeatedly.
|
||||
timeout = max(0.0, min(timeout, 300.0))
|
||||
|
||||
# The threading.Event-based wait would block the asyncio loop.
|
||||
# Run it on the default executor so the MCP server can keep
|
||||
# processing other JSON-RPC requests while we sleep.
|
||||
loop = asyncio.get_running_loop()
|
||||
message = await loop.run_in_executor(None, state.wait, timeout)
|
||||
if message is None:
|
||||
return json.dumps({"timeout": True, "timeout_secs": timeout})
|
||||
return json.dumps(_enrich_inbound_for_agent(message.to_dict()))
|
||||
@@ -1,141 +0,0 @@
|
||||
"""Memory tool handlers — single-concern slice of the a2a_tools surface.
|
||||
|
||||
Extracted from ``a2a_tools.py`` (RFC #2873 iter 4c). Owns the two
|
||||
agent-memory MCP tools:
|
||||
|
||||
* ``tool_commit_memory`` — write to the workspace's persistent memory.
|
||||
* ``tool_recall_memory`` — search the workspace's persistent memory.
|
||||
|
||||
Both go through the platform's ``/workspaces/:id/memories`` endpoint;
|
||||
the platform is the source of truth for namespace isolation + audit
|
||||
trail. Local responsibility here is RBAC enforcement BEFORE hitting
|
||||
the network so a denied operation surfaces a clear in-band error
|
||||
instead of an opaque platform 403.
|
||||
|
||||
Imports the RBAC primitives from ``a2a_tools_rbac`` (iter 4a).
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
|
||||
import httpx
|
||||
|
||||
from a2a_client import PLATFORM_URL, WORKSPACE_ID
|
||||
from a2a_tools_rbac import (
|
||||
auth_headers_for_heartbeat as _auth_headers_for_heartbeat,
|
||||
check_memory_read_permission as _check_memory_read_permission,
|
||||
check_memory_write_permission as _check_memory_write_permission,
|
||||
is_root_workspace as _is_root_workspace,
|
||||
)
|
||||
from builtin_tools.security import _redact_secrets
|
||||
|
||||
|
||||
async def tool_commit_memory(
|
||||
content: str,
|
||||
scope: str = "LOCAL",
|
||||
source_workspace_id: str | None = None,
|
||||
) -> str:
|
||||
"""Save important information to persistent memory.
|
||||
|
||||
GLOBAL scope is writable only by root workspaces (tier == 0).
|
||||
RBAC memory.write permission is required for all scope levels.
|
||||
The source workspace_id is embedded in every record so the platform
|
||||
can enforce cross-workspace isolation and audit trail.
|
||||
|
||||
``source_workspace_id`` selects which registered workspace this
|
||||
memory belongs to when the agent is registered into multiple
|
||||
workspaces (PR-1 / multi-workspace mode). When unset, falls back
|
||||
to the module-level WORKSPACE_ID — single-workspace operators see
|
||||
no behaviour change.
|
||||
"""
|
||||
if not content:
|
||||
return "Error: content is required"
|
||||
content = _redact_secrets(content)
|
||||
scope = scope.upper()
|
||||
if scope not in ("LOCAL", "TEAM", "GLOBAL"):
|
||||
scope = "LOCAL"
|
||||
|
||||
# RBAC: require memory.write permission (mirrors builtin_tools/memory.py)
|
||||
if not _check_memory_write_permission():
|
||||
return (
|
||||
"Error: RBAC — this workspace does not have the 'memory.write' "
|
||||
"permission for this operation."
|
||||
)
|
||||
|
||||
# Scope enforcement: only root workspaces (tier 0) can write GLOBAL memory.
|
||||
# This prevents tenant workspaces from poisoning org-wide memory (GH#1610).
|
||||
if scope == "GLOBAL" and not _is_root_workspace():
|
||||
return (
|
||||
"Error: RBAC — only root workspaces (tier 0) can write to GLOBAL scope. "
|
||||
"Non-root workspaces may use LOCAL or TEAM scope."
|
||||
)
|
||||
|
||||
src = source_workspace_id or WORKSPACE_ID
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{src}/memories",
|
||||
json={
|
||||
"content": content,
|
||||
"scope": scope,
|
||||
# Embed source workspace so the platform can namespace-isolate
|
||||
# and audit cross-workspace writes (GH#1610 fix).
|
||||
"workspace_id": src,
|
||||
},
|
||||
headers=_auth_headers_for_heartbeat(src),
|
||||
)
|
||||
data = resp.json()
|
||||
if resp.status_code in (200, 201):
|
||||
return json.dumps({"success": True, "id": data.get("id"), "scope": scope})
|
||||
return f"Error: {data.get('error', resp.text)}"
|
||||
except Exception as e:
|
||||
return f"Error saving memory: {e}"
|
||||
|
||||
|
||||
async def tool_recall_memory(
|
||||
query: str = "",
|
||||
scope: str = "",
|
||||
source_workspace_id: str | None = None,
|
||||
) -> str:
|
||||
"""Search persistent memory for previously saved information.
|
||||
|
||||
RBAC memory.read permission is required (mirrors builtin_tools/memory.py).
|
||||
The workspace_id is sent as a query parameter so the platform can
|
||||
cross-validate it against the auth token and defend against any future
|
||||
path traversal / cross-tenant read bugs in the platform itself.
|
||||
|
||||
``source_workspace_id`` selects which registered workspace's memories
|
||||
to search when the agent is registered into multiple workspaces.
|
||||
Unset → defaults to the module-level WORKSPACE_ID.
|
||||
"""
|
||||
# RBAC: require memory.read permission (mirrors builtin_tools/memory.py)
|
||||
if not _check_memory_read_permission():
|
||||
return (
|
||||
"Error: RBAC — this workspace does not have the 'memory.read' "
|
||||
"permission for this operation."
|
||||
)
|
||||
|
||||
src = source_workspace_id or WORKSPACE_ID
|
||||
params: dict[str, str] = {"workspace_id": src}
|
||||
if query:
|
||||
params["q"] = query
|
||||
if scope:
|
||||
params["scope"] = scope.upper()
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{src}/memories",
|
||||
params=params,
|
||||
headers=_auth_headers_for_heartbeat(src),
|
||||
)
|
||||
data = resp.json()
|
||||
if isinstance(data, list):
|
||||
if not data:
|
||||
return "No memories found."
|
||||
lines = []
|
||||
for m in data:
|
||||
lines.append(f"[{m.get('scope', '?')}] {m.get('content', '')}")
|
||||
return "\n".join(lines)
|
||||
return json.dumps(data)
|
||||
except Exception as e:
|
||||
return f"Error recalling memory: {e}"
|
||||
@@ -1,382 +0,0 @@
|
||||
"""Messaging tool handlers — single-concern slice of the a2a_tools surface.
|
||||
|
||||
Extracted from ``a2a_tools.py`` (RFC #2873 iter 4d). Owns the four
|
||||
human-and-peer messaging MCP tools + the chat-upload helper they share:
|
||||
|
||||
* ``tool_send_message_to_user`` — push a canvas-chat message via the
|
||||
platform's ``/notify`` endpoint.
|
||||
* ``tool_list_peers`` — discover peers across one or many registered
|
||||
workspaces, with side-effect of populating ``_peer_to_source`` for
|
||||
delegate-task auto-routing.
|
||||
* ``tool_get_workspace_info`` — JSON-encode the workspace's own info.
|
||||
* ``tool_chat_history`` — fetch prior conversation rows with a peer.
|
||||
* ``_upload_chat_files`` — internal helper for the message-attachments
|
||||
code path; routes local file paths through the platform's
|
||||
``/chat/uploads`` so the canvas can render them as download chips.
|
||||
|
||||
Imports the auth-header primitive from ``a2a_tools_rbac`` (iter 4a).
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import mimetypes
|
||||
import os
|
||||
|
||||
import httpx
|
||||
|
||||
from a2a_client import (
|
||||
PLATFORM_URL,
|
||||
WORKSPACE_ID,
|
||||
_peer_names,
|
||||
_peer_to_source,
|
||||
get_peers_with_diagnostic,
|
||||
get_workspace_info,
|
||||
)
|
||||
from a2a_tools_rbac import auth_headers_for_heartbeat as _auth_headers_for_heartbeat
|
||||
from platform_auth import list_registered_workspaces
|
||||
|
||||
|
||||
async def _upload_chat_files(
|
||||
client: httpx.AsyncClient,
|
||||
paths: list[str],
|
||||
workspace_id: str | None = None,
|
||||
) -> tuple[list[dict], str | None]:
|
||||
"""Upload local file paths through /workspaces/<self>/chat/uploads.
|
||||
|
||||
The platform stages each upload under /workspace/.molecule/chat-uploads
|
||||
(an "allowed root" the canvas knows how to render via the Download
|
||||
endpoint) and returns metadata the broadcast payload references.
|
||||
|
||||
Why we route through upload instead of just passing the agent's path:
|
||||
the canvas's allowed-root list is /configs, /workspace, /home, /plugins
|
||||
— files at /tmp or /root would be unreachable. Uploading copies the
|
||||
bytes into an allowed root regardless of where the agent wrote them.
|
||||
|
||||
Returns (attachments, error). On any failure the caller should NOT
|
||||
fire the notify — partial-attach would surface a half-rendered chip.
|
||||
"""
|
||||
if not paths:
|
||||
return [], None
|
||||
files_payload: list[tuple[str, tuple[str, bytes, str]]] = []
|
||||
for p in paths:
|
||||
if not isinstance(p, str) or not p:
|
||||
return [], f"Error: invalid attachment path {p!r}"
|
||||
if not os.path.isfile(p):
|
||||
return [], f"Error: attachment not found: {p}"
|
||||
try:
|
||||
with open(p, "rb") as fh:
|
||||
data = fh.read()
|
||||
except OSError as e:
|
||||
return [], f"Error reading {p}: {e}"
|
||||
# Sniff mime from filename so the canvas can pick the right
|
||||
# icon / preview / inline-image renderer. Pre-fix this was
|
||||
# hardcoded application/octet-stream and chat_files.go's
|
||||
# Upload trusts whatever Content-Type the multipart part
|
||||
# carries — `mt := fh.Header.Get("Content-Type")` only falls
|
||||
# back to extension-sniffing when the header is empty. So a
|
||||
# hardcoded octet-stream meant every attachment lost its
|
||||
# real type forever, breaking the canvas chip's icon logic.
|
||||
mime_type, _ = mimetypes.guess_type(p)
|
||||
if not mime_type:
|
||||
mime_type = "application/octet-stream"
|
||||
files_payload.append(("files", (os.path.basename(p), data, mime_type)))
|
||||
target_workspace_id = (workspace_id or "").strip() or WORKSPACE_ID
|
||||
try:
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{target_workspace_id}/chat/uploads",
|
||||
files=files_payload,
|
||||
headers=_auth_headers_for_heartbeat(target_workspace_id),
|
||||
)
|
||||
except Exception as e:
|
||||
return [], f"Error uploading attachments: {e}"
|
||||
if resp.status_code != 200:
|
||||
return [], f"Error: chat/uploads returned {resp.status_code}: {resp.text[:200]}"
|
||||
try:
|
||||
body = resp.json()
|
||||
except Exception as e:
|
||||
return [], f"Error parsing upload response: {e}"
|
||||
uploaded = body.get("files") or []
|
||||
if not isinstance(uploaded, list) or len(uploaded) != len(paths):
|
||||
return [], f"Error: upload returned {len(uploaded) if isinstance(uploaded, list) else 'invalid'} entries for {len(paths)} files"
|
||||
return uploaded, None
|
||||
|
||||
|
||||
async def tool_broadcast_message(
|
||||
message: str,
|
||||
workspace_id: str | None = None,
|
||||
) -> str:
|
||||
"""Send a broadcast message to ALL agent workspaces in the org.
|
||||
|
||||
Requires the workspace to have broadcast_enabled=true (set by a user or
|
||||
admin via PATCH /workspaces/:id/abilities). Use for urgent org-wide
|
||||
signals — status changes, critical alerts, coordination instructions.
|
||||
Every non-removed workspace receives the message in its activity log so
|
||||
poll-mode agents pick it up, and push-mode canvases get a real-time
|
||||
BROADCAST_MESSAGE WebSocket event.
|
||||
|
||||
Args:
|
||||
message: The broadcast text. Keep it concise — all agents receive
|
||||
this, so avoid lengthy prose that floods every context.
|
||||
workspace_id: Optional. Which registered workspace to send the
|
||||
broadcast from. Single-workspace agents omit this.
|
||||
"""
|
||||
if not message:
|
||||
return "Error: message is required"
|
||||
target_workspace_id = (workspace_id or "").strip() or WORKSPACE_ID
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=30.0) as client:
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{target_workspace_id}/broadcast",
|
||||
json={"message": message},
|
||||
headers=_auth_headers_for_heartbeat(target_workspace_id),
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
data = resp.json()
|
||||
delivered = data.get("delivered", "?")
|
||||
return f"Broadcast sent to {delivered} workspace(s)"
|
||||
if resp.status_code == 403:
|
||||
try:
|
||||
hint = resp.json().get("hint", "")
|
||||
except Exception:
|
||||
hint = ""
|
||||
return f"Error: broadcast ability not enabled.{(' ' + hint) if hint else ''}"
|
||||
return f"Error: platform returned {resp.status_code}"
|
||||
except Exception as e:
|
||||
return f"Error sending broadcast: {e}"
|
||||
|
||||
|
||||
async def tool_send_message_to_user(
|
||||
message: str,
|
||||
attachments: list[str] | None = None,
|
||||
workspace_id: str | None = None,
|
||||
) -> str:
|
||||
"""Send a message directly to the user's canvas chat via WebSocket.
|
||||
|
||||
Args:
|
||||
message: The text to display in the user's chat. Required even
|
||||
when sending attachments — set to a short caption like
|
||||
"Here's the build output:" or "Done — see attached."
|
||||
attachments: Optional list of absolute file paths inside this
|
||||
container. Each is uploaded to the platform and rendered
|
||||
in the canvas as a clickable download chip. Use this
|
||||
instead of pasting paths in the message text — paths
|
||||
render as plain text and the user can't click them.
|
||||
Examples:
|
||||
attachments=["/tmp/build-output.zip"]
|
||||
attachments=["/workspace/report.pdf", "/workspace/data.csv"]
|
||||
workspace_id: Optional. When the agent is registered in MULTIPLE
|
||||
workspaces (external multi-workspace MCP path), this
|
||||
selects which workspace's chat to deliver the message to —
|
||||
should match the ``arrival_workspace_id`` of the inbound
|
||||
message you're replying to so the user sees the reply in
|
||||
the same canvas they typed in. Single-workspace agents
|
||||
omit this; the message routes to the only registered
|
||||
workspace.
|
||||
"""
|
||||
if not message:
|
||||
return "Error: message is required"
|
||||
target_workspace_id = (workspace_id or "").strip() or WORKSPACE_ID
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=60.0) as client:
|
||||
uploaded, upload_err = await _upload_chat_files(
|
||||
client, attachments or [], workspace_id=target_workspace_id,
|
||||
)
|
||||
if upload_err:
|
||||
return upload_err
|
||||
payload: dict = {"message": message}
|
||||
if uploaded:
|
||||
payload["attachments"] = uploaded
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{target_workspace_id}/notify",
|
||||
json=payload,
|
||||
headers=_auth_headers_for_heartbeat(target_workspace_id),
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
if uploaded:
|
||||
return f"Message sent to user with {len(uploaded)} attachment(s)"
|
||||
return "Message sent to user"
|
||||
if resp.status_code == 403:
|
||||
try:
|
||||
body = resp.json()
|
||||
if body.get("error") == "talk_to_user_disabled":
|
||||
hint = body.get("hint", "")
|
||||
return (
|
||||
"Error: this workspace is not allowed to send messages "
|
||||
"directly to the user (talk_to_user is disabled). "
|
||||
+ (hint + " " if hint else "")
|
||||
+ "Use delegate_task to forward your update to a parent "
|
||||
"or supervisor workspace that can reach the user."
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
return f"Error: platform returned {resp.status_code}"
|
||||
except Exception as e:
|
||||
return f"Error sending message: {e}"
|
||||
|
||||
|
||||
async def tool_list_peers(source_workspace_id: str | None = None) -> str:
|
||||
"""List all workspaces this agent can communicate with.
|
||||
|
||||
Behavior:
|
||||
- ``source_workspace_id`` set → list peers of that one workspace.
|
||||
- Unset, single-workspace mode → list peers of WORKSPACE_ID
|
||||
(the legacy path, unchanged).
|
||||
- Unset, multi-workspace mode (MOLECULE_WORKSPACES populated) →
|
||||
aggregate across every registered workspace, prefixing each
|
||||
peer with its source so the agent / user can see the full peer
|
||||
surface in one call.
|
||||
|
||||
Side-effect: populates ``_peer_to_source`` so subsequent
|
||||
``tool_delegate_task(target)`` auto-routes through the correct
|
||||
sending workspace without the agent needing ``source_workspace_id``.
|
||||
"""
|
||||
sources: list[str]
|
||||
aggregate = False
|
||||
if source_workspace_id:
|
||||
sources = [source_workspace_id]
|
||||
else:
|
||||
registered = list_registered_workspaces()
|
||||
if len(registered) > 1:
|
||||
sources = registered
|
||||
aggregate = True
|
||||
else:
|
||||
sources = [WORKSPACE_ID]
|
||||
|
||||
all_peers: list[tuple[str, dict]] = [] # (source, peer_record)
|
||||
diagnostics: list[tuple[str, str]] = [] # (source, diagnostic)
|
||||
for src in sources:
|
||||
peers, diagnostic = await get_peers_with_diagnostic(source_workspace_id=src)
|
||||
if peers:
|
||||
for p in peers:
|
||||
all_peers.append((src, p))
|
||||
elif diagnostic is not None:
|
||||
diagnostics.append((src, diagnostic))
|
||||
|
||||
if not all_peers:
|
||||
if diagnostics:
|
||||
joined = "; ".join(f"[{src[:8]}] {d}" for src, d in diagnostics)
|
||||
return f"No peers found. {joined}"
|
||||
return (
|
||||
"You have no peers in the platform registry. "
|
||||
"(No parent, no children, no siblings registered.)"
|
||||
)
|
||||
|
||||
lines = []
|
||||
for src, p in all_peers:
|
||||
status = p.get("status", "unknown")
|
||||
role = p.get("role", "")
|
||||
peer_id = p["id"]
|
||||
# Cache name for use in delegate_task
|
||||
_peer_names[peer_id] = p["name"]
|
||||
# Cache the source workspace so tool_delegate_task auto-routes
|
||||
_peer_to_source[peer_id] = src
|
||||
if aggregate:
|
||||
lines.append(
|
||||
f"- {p['name']} (ID: {peer_id}, status: {status}, role: {role}, via: {src[:8]})"
|
||||
)
|
||||
else:
|
||||
lines.append(f"- {p['name']} (ID: {peer_id}, status: {status}, role: {role})")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
async def tool_get_workspace_info(source_workspace_id: str | None = None) -> str:
|
||||
"""Get this workspace's own info.
|
||||
|
||||
``source_workspace_id`` selects which registered workspace to
|
||||
introspect when the agent is registered into multiple workspaces.
|
||||
Unset → falls back to module-level WORKSPACE_ID.
|
||||
"""
|
||||
info = await get_workspace_info(source_workspace_id=source_workspace_id)
|
||||
return json.dumps(info, indent=2)
|
||||
|
||||
|
||||
async def tool_chat_history(
|
||||
peer_id: str,
|
||||
limit: int = 20,
|
||||
before_ts: str = "",
|
||||
source_workspace_id: str | None = None,
|
||||
) -> str:
|
||||
"""Fetch the prior conversation with one peer.
|
||||
|
||||
Hits ``/workspaces/<self>/activity?peer_id=<peer>&limit=<N>``
|
||||
against the workspace-server, which returns activity rows where
|
||||
the peer is either the sender (``source_id=peer`` — they sent us
|
||||
the message) or the recipient (``target_id=peer`` — we sent to
|
||||
them) of an A2A turn — both sides of the conversation in
|
||||
chronological order.
|
||||
|
||||
Args:
|
||||
peer_id: The other workspace's UUID. Same value the agent
|
||||
sees as ``peer_id`` on a peer_agent push or ``workspace_id``
|
||||
on a delegate_task call.
|
||||
limit: Maximum rows to return; capped server-side at 500. The
|
||||
default of 20 covers "most recent context for this peer"
|
||||
without flooding the agent's context window.
|
||||
before_ts: Optional RFC3339 timestamp; only rows strictly
|
||||
older are returned. Used to page backward through long
|
||||
histories — pass the oldest ``ts`` from the previous
|
||||
response. Empty (default) returns the most recent ``limit``
|
||||
rows.
|
||||
source_workspace_id: Which registered workspace's activity log
|
||||
to query. Auto-routes via ``_peer_to_source`` cache when
|
||||
unset (the workspace this peer was discovered through);
|
||||
falls back to module-level WORKSPACE_ID for single-workspace
|
||||
operators.
|
||||
|
||||
Returns a JSON-encoded list of activity rows (or an error string
|
||||
starting with ``Error:`` so the agent can branch). Each row carries
|
||||
``activity_type``, ``source_id``, ``target_id``, ``method``,
|
||||
``summary``, ``request_body``, ``response_body``, ``status``,
|
||||
``created_at`` — same shape ``inbox_peek`` and the canvas chat
|
||||
loader already see.
|
||||
"""
|
||||
if not peer_id or not isinstance(peer_id, str):
|
||||
return "Error: peer_id is required"
|
||||
if not isinstance(limit, int) or limit <= 0:
|
||||
limit = 20
|
||||
if limit > 500:
|
||||
limit = 500
|
||||
|
||||
src = source_workspace_id or _peer_to_source.get(peer_id) or WORKSPACE_ID
|
||||
|
||||
params: dict[str, str] = {
|
||||
"peer_id": peer_id,
|
||||
"limit": str(limit),
|
||||
}
|
||||
# Forward verbatim — the server route validates as RFC3339 at the
|
||||
# trust boundary and translates into a `created_at < $X` clause.
|
||||
if before_ts:
|
||||
params["before_ts"] = before_ts
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{src}/activity",
|
||||
params=params,
|
||||
headers=_auth_headers_for_heartbeat(src),
|
||||
)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
return f"Error: chat_history request failed: {exc}"
|
||||
|
||||
if resp.status_code == 400:
|
||||
# Trust-boundary rejection (malformed peer_id, etc.) — surface
|
||||
# the server's reason verbatim so the agent can correct itself.
|
||||
try:
|
||||
err = resp.json().get("error", "bad request")
|
||||
except Exception: # noqa: BLE001
|
||||
err = "bad request"
|
||||
return f"Error: {err}"
|
||||
if resp.status_code >= 400:
|
||||
return f"Error: chat_history returned HTTP {resp.status_code}"
|
||||
|
||||
try:
|
||||
rows = resp.json()
|
||||
except Exception: # noqa: BLE001
|
||||
return "Error: chat_history response was not JSON"
|
||||
if not isinstance(rows, list):
|
||||
return "Error: chat_history response was not a list"
|
||||
|
||||
# Server returns DESC (most recent first); reverse to chronological
|
||||
# so the agent reads the conversation top-down like a chat log.
|
||||
rows.reverse()
|
||||
return json.dumps(rows)
|
||||
@@ -1,138 +0,0 @@
|
||||
"""RBAC + auth-header helpers shared by all a2a_tools tool handlers.
|
||||
|
||||
Extracted from ``a2a_tools.py`` (RFC #2873 iter 4a). Centralises the
|
||||
"what can this workspace do" + "how do I prove it on a platform call"
|
||||
concerns into a single module so:
|
||||
|
||||
* Future tools added under ``a2a_tools/`` see one obvious helper to
|
||||
call instead of re-implementing the role/tier check.
|
||||
* The role-permission table is in ONE place — adding a new role
|
||||
or capability touches one file, not every tool that gates on it.
|
||||
* Tests targeting these helpers don't have to import the whole
|
||||
991-LOC ``a2a_tools`` surface.
|
||||
|
||||
Public surface:
|
||||
|
||||
* ``ROLE_PERMISSIONS`` — canonical role → action set table.
|
||||
* ``get_workspace_tier()`` — config-resolved tier (0 = root).
|
||||
* ``check_memory_write_permission()`` — boolean.
|
||||
* ``check_memory_read_permission()`` — boolean.
|
||||
* ``is_root_workspace()`` — boolean (tier == 0).
|
||||
* ``auth_headers_for_heartbeat(workspace_id=None)`` — auth-header dict
|
||||
with the multi-workspace registry lookup; tolerates ``platform_auth``
|
||||
missing on older installs (returns ``{}``).
|
||||
|
||||
Underscore-prefixed back-compat aliases (``_ROLE_PERMISSIONS``,
|
||||
``_check_memory_write_permission``, etc.) match the names previously
|
||||
exposed in ``a2a_tools`` so existing tests'
|
||||
``patch("a2a_tools._foo", ...)`` continue to work via the re-exports
|
||||
in ``a2a_tools.py``.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
|
||||
|
||||
# Mirror ``builtin_tools/audit.py`` for a2a_tools isolation. Listed as a
|
||||
# module-level constant rather than computed lazily so the table is
|
||||
# discoverable in static analysis + ``grep``.
|
||||
ROLE_PERMISSIONS: dict[str, set[str]] = {
|
||||
"admin": {"delegate", "approve", "memory.read", "memory.write"},
|
||||
"operator": {"delegate", "approve", "memory.read", "memory.write"},
|
||||
"read-only": {"memory.read"},
|
||||
"no-delegation": {"approve", "memory.read", "memory.write"},
|
||||
"no-approval": {"delegate", "memory.read", "memory.write"},
|
||||
"memory-readonly": {"memory.read"},
|
||||
}
|
||||
|
||||
|
||||
def get_workspace_tier() -> int:
|
||||
"""Return the workspace tier from config (0 = root, 1+ = tenant)."""
|
||||
try:
|
||||
from config import load_config
|
||||
|
||||
cfg = load_config()
|
||||
return getattr(cfg, "tier", 1)
|
||||
except Exception:
|
||||
return int(os.environ.get("WORKSPACE_TIER", 1))
|
||||
|
||||
|
||||
def _resolve_role_state() -> tuple[list[str], dict]:
|
||||
"""Return (roles, allowed_actions) from config.
|
||||
|
||||
Fail-closed: if config is unavailable, fall back to an "operator"
|
||||
default with no per-role overrides. Operator has memory.read +
|
||||
memory.write but not the elevated approve/delegate over GLOBAL
|
||||
scope, so a config outage doesn't grant unexpected privileges.
|
||||
"""
|
||||
try:
|
||||
from config import load_config
|
||||
|
||||
cfg = load_config()
|
||||
roles = list(getattr(cfg, "rbac", None).roles or ["operator"])
|
||||
allowed = dict(getattr(cfg, "rbac", None).allowed_actions or {})
|
||||
return roles, allowed
|
||||
except Exception:
|
||||
return ["operator"], {}
|
||||
|
||||
|
||||
def check_memory_write_permission() -> bool:
|
||||
"""Return True if this workspace's RBAC roles grant memory.write."""
|
||||
roles, allowed = _resolve_role_state()
|
||||
for role in roles:
|
||||
if role == "admin":
|
||||
return True
|
||||
if role in allowed:
|
||||
if "memory.write" in allowed[role]:
|
||||
return True
|
||||
elif role in ROLE_PERMISSIONS and "memory.write" in ROLE_PERMISSIONS[role]:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def check_memory_read_permission() -> bool:
|
||||
"""Return True if this workspace's RBAC roles grant memory.read."""
|
||||
roles, allowed = _resolve_role_state()
|
||||
for role in roles:
|
||||
if role == "admin":
|
||||
return True
|
||||
if role in allowed:
|
||||
if "memory.read" in allowed[role]:
|
||||
return True
|
||||
elif role in ROLE_PERMISSIONS and "memory.read" in ROLE_PERMISSIONS[role]:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def is_root_workspace() -> bool:
|
||||
"""Return True if this workspace is tier 0 (root/root-org)."""
|
||||
return get_workspace_tier() == 0
|
||||
|
||||
|
||||
def auth_headers_for_heartbeat(workspace_id: str | None = None) -> dict[str, str]:
|
||||
"""Return Phase 30.1 auth headers; tolerate platform_auth being absent
|
||||
in older installs (e.g. during rolling upgrade).
|
||||
|
||||
``workspace_id`` selects the per-workspace token from the multi-
|
||||
workspace registry when set (PR-1: external agent registered in
|
||||
multiple workspaces). With no arg the legacy single-token path is
|
||||
unchanged.
|
||||
"""
|
||||
try:
|
||||
from platform_auth import auth_headers
|
||||
return auth_headers(workspace_id) if workspace_id else auth_headers()
|
||||
except Exception:
|
||||
return {}
|
||||
|
||||
|
||||
# ============== Back-compat aliases for the previous a2a_tools names ==============
|
||||
# Tests + downstream call sites refer to the pre-extract names; aliasing
|
||||
# keeps both forms valid. The new public names (no underscore prefix)
|
||||
# are preferred for new code.
|
||||
|
||||
_ROLE_PERMISSIONS = ROLE_PERMISSIONS
|
||||
_get_workspace_tier = get_workspace_tier
|
||||
_check_memory_write_permission = check_memory_write_permission
|
||||
_check_memory_read_permission = check_memory_read_permission
|
||||
_is_root_workspace = is_root_workspace
|
||||
_auth_headers_for_heartbeat = auth_headers_for_heartbeat
|
||||
@@ -1,597 +0,0 @@
|
||||
"""Base adapter interface for agent infrastructure providers."""
|
||||
|
||||
import logging
|
||||
import os
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Mapping
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Provider routing — type alias + resolver used by individual adapters.
|
||||
# Each adapter defines its own ProviderRegistry with the providers it accepts.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Maps prefix → (ordered_auth_env_vars, default_base_url).
|
||||
ProviderRegistry = dict[str, tuple[tuple[str, ...], str]]
|
||||
|
||||
|
||||
def resolve_provider_routing(
|
||||
model_str: str,
|
||||
env: Mapping[str, str],
|
||||
*,
|
||||
registry: ProviderRegistry,
|
||||
runtime_config: dict[str, Any] | None = None,
|
||||
) -> tuple[str, str, str]:
|
||||
"""Resolve a ``provider:model`` string to ``(api_key, base_url, bare_model_id)``.
|
||||
|
||||
URL precedence (highest to lowest):
|
||||
1. ``<PREFIX>_BASE_URL`` env var
|
||||
2. ``runtime_config["provider_url"]``
|
||||
3. registry default for the prefix
|
||||
|
||||
Unknown prefixes fall back to OPENAI_API_KEY + api.openai.com.
|
||||
Raises RuntimeError when no API key env var is set for the prefix.
|
||||
"""
|
||||
if ":" in model_str:
|
||||
prefix, model_id = model_str.split(":", 1)
|
||||
else:
|
||||
prefix, model_id = "openai", model_str
|
||||
|
||||
env_vars, default_url = registry.get(
|
||||
prefix, (("OPENAI_API_KEY",), "https://api.openai.com/v1")
|
||||
)
|
||||
api_key = next((env[v] for v in env_vars if env.get(v)), "")
|
||||
if not api_key:
|
||||
raise RuntimeError(
|
||||
f"No API key found for provider {prefix!r} "
|
||||
f"(checked: {', '.join(env_vars)}). Set one in workspace secrets."
|
||||
)
|
||||
|
||||
env_url = env.get(f"{prefix.upper()}_BASE_URL", "")
|
||||
config_url = (runtime_config or {}).get("provider_url", "")
|
||||
base_url = env_url or config_url or default_url
|
||||
|
||||
return api_key, base_url, model_id
|
||||
|
||||
from a2a.server.agent_execution import AgentExecutor
|
||||
|
||||
from event_log import DisabledEventLog, EventLogBackend
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Shared no-op default for adapter.event_log. Safe to share across
|
||||
# adapters because every DisabledEventLog method is a pure no-op with
|
||||
# no per-instance state.
|
||||
_DISABLED_EVENT_LOG: EventLogBackend = DisabledEventLog()
|
||||
|
||||
|
||||
@dataclass
|
||||
class SetupResult:
|
||||
"""Result from the shared _common_setup() pipeline."""
|
||||
system_prompt: str
|
||||
loaded_skills: list # LoadedSkill instances
|
||||
langchain_tools: list # LangChain BaseTool instances
|
||||
is_coordinator: bool
|
||||
children: list # child workspace dicts
|
||||
|
||||
|
||||
@dataclass
|
||||
class AdapterConfig:
|
||||
"""Standardized config passed to every adapter."""
|
||||
model: str # e.g. "anthropic:claude-sonnet-4-6" or "openrouter:google/gemini-2.5-flash"
|
||||
system_prompt: str | None = None # Assembled system prompt text
|
||||
tools: list[str] = field(default_factory=list) # Tool names from config.yaml
|
||||
runtime_config: dict[str, Any] = field(default_factory=dict) # Raw runtime_config block
|
||||
config_path: str = "/configs" # Path to configs directory
|
||||
workspace_id: str = "" # Workspace identifier
|
||||
prompt_files: list[str] = field(default_factory=list) # Ordered prompt file names
|
||||
a2a_port: int = 8000 # Port for A2A server
|
||||
heartbeat: Any = None # HeartbeatLoop instance
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RuntimeCapabilities:
|
||||
"""Adapter-declared ownership of cross-cutting platform capabilities.
|
||||
|
||||
The platform provides FALLBACK implementations of heartbeat, cron,
|
||||
durable session, etc. When a runtime SDK provides one of these
|
||||
natively (e.g. claude-code's streaming session model, hermes-agent's
|
||||
sidecar lifecycle), the adapter sets the corresponding flag to True.
|
||||
The platform reads these flags and skips its fallback for that
|
||||
capability — the adapter is responsible instead.
|
||||
|
||||
Observability is NEVER skipped: A2A protocol, activity_logs, and the
|
||||
broadcaster always run regardless of who owns the capability. These
|
||||
flags only switch WHO IMPLEMENTS the behavior, not whether the
|
||||
platform sees it.
|
||||
|
||||
All defaults are False so introducing this dataclass is a no-op:
|
||||
every existing adapter inherits BaseAdapter.capabilities() which
|
||||
returns RuntimeCapabilities() with everything off, matching today's
|
||||
"platform does it all" behavior. Each capability gets a platform-
|
||||
side consumer in a follow-up PR; this class is the foundation.
|
||||
|
||||
See project memory `project_runtime_native_pluggable.md` for the
|
||||
architecture principle these flags encode.
|
||||
"""
|
||||
# Heartbeat — adapter sends its own keep-alive signal to the platform's
|
||||
# broadcaster instead of relying on workspace/heartbeat.py's 30s loop.
|
||||
# Set True when the SDK already maintains a long-lived session that
|
||||
# produces natural progress events (e.g. claude-code streaming).
|
||||
provides_native_heartbeat: bool = False
|
||||
|
||||
# Cron / schedule — adapter handles scheduled triggers internally
|
||||
# (Temporal workflows, Durable Functions, sidecar daemons). Platform
|
||||
# scheduler skips polling workspace_schedules for this workspace,
|
||||
# avoiding double-fire on restart.
|
||||
provides_native_scheduler: bool = False
|
||||
|
||||
# Durable session — adapter persists in-flight session state across
|
||||
# restarts and exposes it via pre_stop_state/restore_state. When True,
|
||||
# the platform's a2a_queue does not need to enqueue mid-session
|
||||
# requests; the adapter handles QUEUED-state on its own.
|
||||
provides_native_session: bool = False
|
||||
|
||||
# Status lifecycle — adapter reports its own ready/degraded/failed
|
||||
# state (e.g. via heartbeat metadata). Platform respects the adapter
|
||||
# report instead of inferring status from heartbeat error rate.
|
||||
provides_native_status_mgmt: bool = False
|
||||
|
||||
# Retry — adapter handles transient errors (rate limits, 5xx) with
|
||||
# its own backoff. Platform stops re-dispatching A2A requests that
|
||||
# the adapter explicitly marked as "retrying internally".
|
||||
provides_native_retry: bool = False
|
||||
|
||||
# Activity log decoration — adapter contributes runtime-specific
|
||||
# fields (model, token_count, latency breakdown) into activity_log
|
||||
# rows alongside the platform-defined columns.
|
||||
provides_activity_decoration: bool = False
|
||||
|
||||
# Channel dispatch — adapter sends to external channels (Slack,
|
||||
# Lark, etc.) directly instead of routing through platform channels
|
||||
# manager. Used when the SDK has built-in channel integrations.
|
||||
provides_channel_dispatch: bool = False
|
||||
|
||||
def to_dict(self) -> dict[str, bool]:
|
||||
"""Serializable shape for the heartbeat payload + /capabilities
|
||||
endpoint. Plain dict avoids leaking dataclass internals to Go."""
|
||||
return {
|
||||
"heartbeat": self.provides_native_heartbeat,
|
||||
"scheduler": self.provides_native_scheduler,
|
||||
"session": self.provides_native_session,
|
||||
"status_mgmt": self.provides_native_status_mgmt,
|
||||
"retry": self.provides_native_retry,
|
||||
"activity_decoration": self.provides_activity_decoration,
|
||||
"channel_dispatch": self.provides_channel_dispatch,
|
||||
}
|
||||
|
||||
|
||||
class BaseAdapter(ABC):
|
||||
"""Interface every agent infrastructure adapter must implement.
|
||||
|
||||
To add a new agent infra:
|
||||
1. Create a standalone template repo (molecule-ai-workspace-template-<infra>)
|
||||
2. Implement adapter.py with a class extending BaseAdapter
|
||||
3. Add requirements.txt with your infra's dependencies + molecule-runtime
|
||||
4. Set ADAPTER_MODULE in the Dockerfile to your adapter module path
|
||||
|
||||
Cross-cutting capabilities your adapter can opt into:
|
||||
- capabilities() — declare native ownership of heartbeat, scheduler,
|
||||
session, status mgmt, etc. (see RuntimeCapabilities above)
|
||||
- idle_timeout_override() — extend the platform's per-dispatch
|
||||
silence window for SDKs with long synth turns
|
||||
- runtime_wedge.mark_wedged() / clear_wedge() — flip the workspace
|
||||
to `degraded` + auto-recover when your SDK hits a non-recoverable
|
||||
error class. Import directly from `runtime_wedge`; the heartbeat
|
||||
forwards the state to the platform automatically. See the
|
||||
runtime_wedge module docstring for the integration recipe.
|
||||
"""
|
||||
|
||||
@staticmethod
|
||||
@abstractmethod
|
||||
def name() -> str: # pragma: no cover
|
||||
"""Return the runtime identifier (e.g. 'langgraph', 'crewai').
|
||||
This must match the 'runtime' field in config.yaml."""
|
||||
...
|
||||
|
||||
@staticmethod
|
||||
@abstractmethod
|
||||
def display_name() -> str: # pragma: no cover
|
||||
"""Human-readable name for UI display."""
|
||||
...
|
||||
|
||||
@staticmethod
|
||||
@abstractmethod
|
||||
def description() -> str: # pragma: no cover
|
||||
"""Short description of what this adapter provides."""
|
||||
...
|
||||
|
||||
@staticmethod
|
||||
def get_config_schema() -> dict:
|
||||
"""Return JSON Schema for runtime_config fields this adapter supports.
|
||||
Used by the Config tab UI to render the right form fields.
|
||||
Override in subclasses for adapter-specific settings."""
|
||||
return {}
|
||||
|
||||
def capabilities(self) -> "RuntimeCapabilities":
|
||||
"""Declare which cross-cutting capabilities this adapter owns
|
||||
natively vs delegates to platform fallback.
|
||||
|
||||
Default returns RuntimeCapabilities() — every flag False, meaning
|
||||
the platform owns everything (today's behavior). Adapters override
|
||||
to declare native ownership; e.g. claude-code's adapter returns
|
||||
RuntimeCapabilities(provides_native_heartbeat=True,
|
||||
provides_native_session=True).
|
||||
|
||||
Subsequent platform-side consumers (idle-timeout override,
|
||||
scheduler skip, etc.) read this and route accordingly. See
|
||||
project memory `project_runtime_native_pluggable.md`."""
|
||||
return RuntimeCapabilities()
|
||||
|
||||
def idle_timeout_override(self) -> int | None:
|
||||
"""Per-A2A-dispatch silence window override, in SECONDS.
|
||||
|
||||
Return None to use the platform default (env var
|
||||
A2A_IDLE_TIMEOUT_SECONDS, falling back to 5 minutes — see
|
||||
a2a_proxy.go:defaultIdleTimeoutDuration). Override when this
|
||||
runtime's SDK can legitimately go silent longer than the
|
||||
default before the dispatch should be considered wedged.
|
||||
|
||||
Why this is per-adapter, not just env: the env value is a
|
||||
cluster-wide knob set by ops. Different SDKs have different
|
||||
latency profiles — claude-code synthesis on Opus + tool use
|
||||
legitimately runs 8-10 min between broadcasts; hermes synth
|
||||
with custom providers can be even slower. Hardcoding 5min for
|
||||
everyone either cancels real work (claude-code synth) or
|
||||
leaves wedged runtimes (langgraph) hanging too long.
|
||||
|
||||
Platform reads this from the heartbeat payload and stashes
|
||||
it per-workspace; dispatchA2A consults it before applying the
|
||||
idle timer. None / unset / zero falls through to the global
|
||||
default — same behavior as before this hook landed."""
|
||||
return None
|
||||
|
||||
@property
|
||||
def event_log(self) -> EventLogBackend:
|
||||
"""Pluggable in-process event-log backend.
|
||||
|
||||
Adapters MAY call ``self.event_log.append(kind=..., payload=...)``
|
||||
to record runtime-internal events (tool dispatch, skill load,
|
||||
executor errors, peer-handoff). Readers query the buffer via
|
||||
the platform's ``/workspaces/:id/activity`` endpoint with a
|
||||
cursor — see ``event_log.py`` for the protocol.
|
||||
|
||||
Default: shared ``DisabledEventLog`` no-op, so adapters that
|
||||
never set this still link cleanly. ``main.py`` overrides at boot
|
||||
from the ``observability.event_log`` config block."""
|
||||
return getattr(self, "_event_log", None) or _DISABLED_EVENT_LOG
|
||||
|
||||
@event_log.setter
|
||||
def event_log(self, backend: EventLogBackend) -> None:
|
||||
self._event_log = backend
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Plugin install hooks
|
||||
# ------------------------------------------------------------------
|
||||
# New pipeline: each plugin ships per-runtime adaptors resolved via
|
||||
# `plugins_registry.resolve()`. Adapters expose hooks below that
|
||||
# adaptors call to wire plugin content into the runtime.
|
||||
#
|
||||
# Default implementations are filesystem-only (write to /configs,
|
||||
# append to CLAUDE.md). Runtimes with a dynamic tool registry
|
||||
# (e.g. DeepAgents sub-agents) override the hooks to also register
|
||||
# in-process state.
|
||||
|
||||
def memory_filename(self) -> str:
|
||||
"""File under /configs that the runtime treats as long-lived memory.
|
||||
|
||||
Both Claude Code and DeepAgents read CLAUDE.md natively, so this is
|
||||
the sensible default. Override only if a runtime expects a different
|
||||
filename.
|
||||
"""
|
||||
return "CLAUDE.md"
|
||||
|
||||
def register_tool_hook(self, name: str, fn) -> None:
|
||||
"""Default no-op. Override on runtimes with a dynamic tool registry.
|
||||
|
||||
Runtimes that pick tools up at startup via filesystem scan (Claude
|
||||
Code reads /configs/skills, LangGraph globs **/*.py) don't need to
|
||||
do anything here — the adaptor's file-write step is enough.
|
||||
"""
|
||||
return None
|
||||
|
||||
async def transcript_lines(self, since: int = 0, limit: int = 100) -> dict:
|
||||
"""Return live transcript entries for the most-recent agent session.
|
||||
|
||||
Default implementation returns ``supported: False`` for runtimes
|
||||
that don't expose a per-session log on disk. Override in subclasses
|
||||
that DO (Claude Code reads ``~/.claude/projects/<cwd>/<session>.jsonl``).
|
||||
|
||||
This is the "look over the agent's shoulder" feature — lets canvas /
|
||||
operators see live tool calls + AI thinking instead of waiting for
|
||||
the high-level activity log to flush.
|
||||
|
||||
Args:
|
||||
since: line offset to skip — caller's last cursor (0 = from start)
|
||||
limit: max lines to return (caller-side cap, default 100, max 1000)
|
||||
|
||||
Returns:
|
||||
``{runtime, supported, lines, cursor, more, source}`` where
|
||||
``cursor`` is the new offset to pass on the next poll, ``more``
|
||||
is True if additional lines remain past ``limit``, and ``source``
|
||||
is the file path lines were read from (useful for debugging).
|
||||
"""
|
||||
return {
|
||||
"runtime": self.name(),
|
||||
"supported": False,
|
||||
"lines": [],
|
||||
"cursor": since,
|
||||
"more": False,
|
||||
"source": None,
|
||||
}
|
||||
|
||||
def pre_stop_state(self) -> dict:
|
||||
"""Capture in-memory state for pause/resume serialization.
|
||||
|
||||
Called by main.py's shutdown handler just before the container exits.
|
||||
Returns a dict that will be scrubbed (via lib.snapshot_scrub) and
|
||||
written to /configs/.agent_snapshot.json.
|
||||
|
||||
Default implementation:
|
||||
1. Attempts to read ``self._executor._session_id`` (set by
|
||||
create_executor) and includes it as ``session_id``.
|
||||
2. Includes up to 200 recent transcript lines via transcript_lines().
|
||||
|
||||
Override in adapters that hold additional in-memory state that
|
||||
should survive a container stop.
|
||||
|
||||
Returns:
|
||||
A JSON-serializable dict. All string values are scrubbed before
|
||||
persisting, so it is safe to include raw content from the
|
||||
agent's context.
|
||||
"""
|
||||
from lib.pre_stop import MAX_TRANSCRIPT_LINES
|
||||
|
||||
state: dict = {}
|
||||
|
||||
# Session handle — critical for resuming the Claude Code session.
|
||||
executor = getattr(self, "_executor", None)
|
||||
if executor is not None:
|
||||
session_id = getattr(executor, "_session_id", None)
|
||||
if session_id:
|
||||
state["session_id"] = session_id
|
||||
|
||||
# Recent conversation log — captures where the agent left off.
|
||||
# transcript_lines() may be async; call it synchronously if possible,
|
||||
# otherwise let async adapters override pre_stop_state entirely.
|
||||
try:
|
||||
import inspect as _inspect
|
||||
transcript_fn = self.transcript_lines
|
||||
if _inspect.iscoroutinefunction(transcript_fn):
|
||||
# Async adapter — override pre_stop_state() for transcript access.
|
||||
# The base impl still captures session_id above.
|
||||
pass
|
||||
else:
|
||||
transcript = transcript_fn(since=0, limit=MAX_TRANSCRIPT_LINES)
|
||||
if transcript.get("supported"):
|
||||
state["transcript_lines"] = transcript.get("lines", [])
|
||||
except Exception:
|
||||
# Best-effort: never let transcript capture failure block serialization.
|
||||
pass
|
||||
|
||||
return state
|
||||
|
||||
def restore_state(self, snapshot: dict) -> None:
|
||||
"""Restore in-memory state from a pause/resume snapshot.
|
||||
|
||||
Called by main.py on first boot when /configs/.agent_snapshot.json
|
||||
exists. Gives the adapter a chance to restore session handles,
|
||||
conversation context, or any other in-memory state before the A2A
|
||||
server starts accepting requests.
|
||||
|
||||
Default implementation stores ``snapshot["session_id"]`` and
|
||||
``snapshot["transcript_lines"]`` as ``self._snapshot_session_id``
|
||||
and ``self._snapshot_transcript`` so that ``create_executor()`` or
|
||||
the executor itself can pick them up.
|
||||
|
||||
Args:
|
||||
snapshot: The scrubbed snapshot dict previously written by
|
||||
pre_stop_state(). All secrets have already been redacted.
|
||||
"""
|
||||
self._snapshot_session_id: str | None = snapshot.get("session_id")
|
||||
self._snapshot_transcript: list | None = snapshot.get("transcript_lines")
|
||||
|
||||
def register_subagent_hook(self, name: str, spec: dict) -> None:
|
||||
"""Default no-op. DeepAgents overrides to register a sub-agent."""
|
||||
return None
|
||||
|
||||
def append_to_memory_hook(self, config: AdapterConfig, filename: str, content: str) -> None:
|
||||
"""Append text to /configs/<filename> if the marker isn't already present.
|
||||
|
||||
Idempotent: looks for the first line of `content` as a marker so a
|
||||
re-install doesn't duplicate the block. Adaptors should pass content
|
||||
beginning with a unique header (e.g. ``# Plugin: molecule-dev-conventions``).
|
||||
"""
|
||||
import os
|
||||
target = os.path.join(config.config_path, filename)
|
||||
marker = content.splitlines()[0].strip() if content else ""
|
||||
existing = ""
|
||||
if os.path.exists(target):
|
||||
with open(target) as f:
|
||||
existing = f.read()
|
||||
if marker and marker in existing:
|
||||
logger.info("append_to_memory: %s already contains %r — skipping", filename, marker)
|
||||
return
|
||||
os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
|
||||
with open(target, "a") as f:
|
||||
if existing and not existing.endswith("\n"):
|
||||
f.write("\n")
|
||||
f.write(content if content.endswith("\n") else content + "\n")
|
||||
logger.info("append_to_memory: appended %d chars to %s", len(content), filename)
|
||||
|
||||
async def install_plugins_via_registry(
|
||||
self,
|
||||
config: AdapterConfig,
|
||||
plugins,
|
||||
) -> list:
|
||||
"""Drive the new per-runtime adaptor pipeline for every loaded plugin.
|
||||
|
||||
For each plugin in `plugins.plugins`, resolve the adaptor for this
|
||||
runtime (via :func:`plugins_registry.resolve`) and invoke
|
||||
``install(ctx)``. Returns the list of :class:`InstallResult` so
|
||||
callers can surface warnings (e.g. raw-drop fallback hits).
|
||||
|
||||
Adapters whose runtime supports the new pipeline call this from
|
||||
``setup()`` instead of the legacy ``inject_plugins()``.
|
||||
"""
|
||||
from pathlib import Path
|
||||
from plugins_registry import InstallContext, resolve
|
||||
|
||||
results = []
|
||||
runtime = self.name().replace("-", "_") # e.g. "claude-code" -> "claude_code"
|
||||
|
||||
for plugin in plugins.plugins:
|
||||
adaptor, source = resolve(plugin.name, runtime, Path(plugin.path))
|
||||
ctx = InstallContext(
|
||||
configs_dir=Path(config.config_path),
|
||||
workspace_id=config.workspace_id,
|
||||
runtime=runtime,
|
||||
plugin_root=Path(plugin.path),
|
||||
memory_filename=self.memory_filename(),
|
||||
register_tool=self.register_tool_hook,
|
||||
register_subagent=self.register_subagent_hook,
|
||||
append_to_memory=lambda fn, c, _cfg=config: self.append_to_memory_hook(_cfg, fn, c),
|
||||
)
|
||||
try:
|
||||
result = await adaptor.install(ctx)
|
||||
results.append(result)
|
||||
logger.info(
|
||||
"Plugin %s installed via %s adaptor (warnings: %d)",
|
||||
plugin.name, source, len(result.warnings),
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.exception("Plugin %s install via %s failed: %s", plugin.name, source, exc)
|
||||
|
||||
return results
|
||||
|
||||
async def inject_plugins(self, config: AdapterConfig, plugins) -> None:
|
||||
"""Legacy hook — kept for backwards compatibility during migration.
|
||||
|
||||
Default: drive the new per-runtime adaptor pipeline. Adapters not yet
|
||||
migrated may still override this with their own logic.
|
||||
"""
|
||||
await self.install_plugins_via_registry(config, plugins)
|
||||
|
||||
async def _common_setup(self, config: AdapterConfig) -> SetupResult:
|
||||
"""Shared setup pipeline — loads plugins, skills, tools, coordinator, and builds system prompt.
|
||||
|
||||
All adapters can call this to get the full platform feature set.
|
||||
Returns a SetupResult with LangChain BaseTool instances that adapters
|
||||
convert to their native format if needed.
|
||||
"""
|
||||
from plugins import load_plugins
|
||||
from skill_loader.loader import load_skills
|
||||
from coordinator import get_children, build_children_description
|
||||
from prompt import build_system_prompt, get_peer_capabilities, get_platform_instructions
|
||||
from builtin_tools.approval import request_approval
|
||||
from builtin_tools.delegation import delegate_task, delegate_task_async, check_task_status
|
||||
from builtin_tools.memory import commit_memory, recall_memory
|
||||
from builtin_tools.sandbox import run_code
|
||||
|
||||
platform_url = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
|
||||
# Load plugins from per-workspace dir first, then shared fallback
|
||||
workspace_plugins_dir = os.path.join(config.config_path, "plugins")
|
||||
plugins = load_plugins(
|
||||
workspace_plugins_dir=workspace_plugins_dir,
|
||||
shared_plugins_dir=os.environ.get("PLUGINS_DIR", "/plugins"),
|
||||
)
|
||||
await self.inject_plugins(config, plugins)
|
||||
if plugins.plugin_names:
|
||||
logger.info(f"Plugins: {', '.join(plugins.plugin_names)}")
|
||||
|
||||
# Load skills (workspace + plugin skills, deduped). Pass the runtime
|
||||
# name so SKILL.md frontmatter `runtime: [...]` can opt skills out
|
||||
# of incompatible adapters (hermes won't load claude-code-only
|
||||
# skills, etc.).
|
||||
runtime_name = type(self).name()
|
||||
loaded_skills = load_skills(config.config_path, config.tools, current_runtime=runtime_name)
|
||||
seen_skill_ids = {s.metadata.id for s in loaded_skills}
|
||||
for plugin_skills_dir in plugins.skill_dirs:
|
||||
plugin_skill_names = [
|
||||
d for d in os.listdir(plugin_skills_dir)
|
||||
if os.path.isdir(os.path.join(plugin_skills_dir, d))
|
||||
]
|
||||
for skill in load_skills(plugin_skills_dir, plugin_skill_names, current_runtime=runtime_name):
|
||||
if skill.metadata.id not in seen_skill_ids:
|
||||
loaded_skills.append(skill)
|
||||
seen_skill_ids.add(skill.metadata.id)
|
||||
logger.info(f"Loaded {len(loaded_skills)} skills: {[s.metadata.id for s in loaded_skills]}")
|
||||
|
||||
# Core platform tools — names mirror the platform_tools registry,
|
||||
# so the names referenced in get_a2a_instructions/get_hma_instructions
|
||||
# are guaranteed to exist as @tool symbols here. The structural
|
||||
# alignment test in tests/test_platform_tools.py pins this.
|
||||
all_tools = [
|
||||
delegate_task, delegate_task_async, check_task_status,
|
||||
request_approval, commit_memory, recall_memory, run_code,
|
||||
]
|
||||
for skill in loaded_skills:
|
||||
all_tools.extend(skill.tools)
|
||||
|
||||
# Coordinator mode: detect children and add routing tool
|
||||
children = await get_children()
|
||||
is_coordinator = len(children) > 0
|
||||
if is_coordinator:
|
||||
from coordinator import route_task_to_team
|
||||
logger.info(f"Coordinator mode: {len(children)} children")
|
||||
all_tools.append(route_task_to_team)
|
||||
|
||||
# Build system prompt with all context. Parent→child knowledge sharing
|
||||
# was previously handled by `shared_context` (parent's config.yaml file
|
||||
# paths injected into the child's prompt at boot). That path was removed
|
||||
# — agents now pull team-scoped knowledge via memory v2's team:<id>
|
||||
# namespace (recall_memory) on demand instead of paying for it on every
|
||||
# boot regardless of need. See RFC #2789 for the future shared-file
|
||||
# storage that complements this for large blob-shaped artefacts.
|
||||
peers = await get_peer_capabilities(platform_url, config.workspace_id)
|
||||
platform_instructions = await get_platform_instructions(platform_url, config.workspace_id)
|
||||
coordinator_prompt = build_children_description(children) if is_coordinator else ""
|
||||
extra_prompts = list(plugins.prompt_fragments)
|
||||
if coordinator_prompt:
|
||||
extra_prompts.append(coordinator_prompt)
|
||||
|
||||
system_prompt = build_system_prompt(
|
||||
config.config_path, config.workspace_id, loaded_skills, peers,
|
||||
prompt_files=config.prompt_files,
|
||||
plugin_rules=plugins.rules,
|
||||
plugin_prompts=extra_prompts,
|
||||
platform_instructions=platform_instructions,
|
||||
)
|
||||
|
||||
return SetupResult(
|
||||
system_prompt=system_prompt,
|
||||
loaded_skills=loaded_skills,
|
||||
langchain_tools=all_tools,
|
||||
is_coordinator=is_coordinator,
|
||||
children=children,
|
||||
)
|
||||
|
||||
@abstractmethod
|
||||
async def setup(self, config: AdapterConfig) -> None:
|
||||
"""One-time setup: validate config, prepare internal state.
|
||||
Called after deps are installed but before create_executor().
|
||||
Raise RuntimeError if setup fails (missing deps, bad config, etc.)."""
|
||||
... # pragma: no cover
|
||||
|
||||
@abstractmethod
|
||||
async def create_executor(self, config: AdapterConfig) -> AgentExecutor:
|
||||
"""Create and return an AgentExecutor ready for A2A integration.
|
||||
The returned executor's execute() method will be called by the
|
||||
A2A server's DefaultRequestHandler.
|
||||
|
||||
Subclasses should also store the returned executor as ``self._executor``
|
||||
so ``pre_stop_state()`` can access it for serialization.
|
||||
"""
|
||||
... # pragma: no cover
|
||||
@@ -1,22 +0,0 @@
|
||||
"""Adapter registry shim.
|
||||
|
||||
Adapters extracted to standalone repos (molecule-ai-workspace-template-*).
|
||||
ADAPTER_MODULE env var is the primary discovery mechanism in production.
|
||||
This shim provides backward-compatible imports for local dev + tests.
|
||||
"""
|
||||
import importlib
|
||||
import os
|
||||
import logging
|
||||
from adapter_base import BaseAdapter, AdapterConfig
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def get_adapter(runtime: str) -> type[BaseAdapter]:
|
||||
adapter_module = os.environ.get("ADAPTER_MODULE")
|
||||
if adapter_module:
|
||||
mod = importlib.import_module(adapter_module)
|
||||
return getattr(mod, "Adapter")
|
||||
raise KeyError(
|
||||
f"No ADAPTER_MODULE set for runtime '{runtime}'. "
|
||||
"Adapters now live in standalone template repos."
|
||||
)
|
||||
@@ -1,2 +0,0 @@
|
||||
"""Re-export from adapter_base for backward compat."""
|
||||
from adapter_base import * # noqa: F401,F403
|
||||
@@ -1,130 +0,0 @@
|
||||
# Google ADK Adapter
|
||||
|
||||
Molecule AI workspace adapter for [Google Agent Development Kit (ADK)](https://github.com/google/adk-python) — Google's official multi-agent Python SDK (~19k ⭐, Apache-2.0).
|
||||
|
||||
## Overview
|
||||
|
||||
This adapter bridges the A2A protocol used by the Molecule AI platform to Google ADK's runner/session model. Agents are backed by Google Gemini models via AI Studio or Vertex AI. Each workspace gets an `LlmAgent` wrapped in a `Runner` with an `InMemorySessionService`; sessions are tied to A2A task context IDs for stable, isolated per-conversation state.
|
||||
|
||||
**Runtime key:** `google-adk`
|
||||
|
||||
## Installation
|
||||
|
||||
The adapter dependencies are installed automatically by `entrypoint.sh` from this directory's `requirements.txt`:
|
||||
|
||||
```bash
|
||||
pip install -r adapters/google-adk/requirements.txt
|
||||
```
|
||||
|
||||
You'll also need a Google API key (AI Studio) or Vertex AI credentials.
|
||||
|
||||
## Configuration
|
||||
|
||||
### `config.yaml`
|
||||
|
||||
```yaml
|
||||
runtime: google-adk
|
||||
model: google:gemini-2.0-flash # or gemini-1.5-pro, gemini-2.5-flash, etc.
|
||||
runtime_config:
|
||||
agent_name: my-agent # optional, default: molecule-adk-agent
|
||||
max_output_tokens: 8192 # optional, default: 8192
|
||||
temperature: 1.0 # optional, default: 1.0
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
| Variable | Required | Description |
|
||||
|----------|----------|-------------|
|
||||
| `GOOGLE_API_KEY` | Yes (unless Vertex AI) | Google AI Studio API key |
|
||||
| `GOOGLE_GENAI_USE_VERTEXAI` | No | Set to `"1"` to use Vertex AI instead of AI Studio |
|
||||
| `GOOGLE_CLOUD_PROJECT` | When using Vertex AI | GCP project ID |
|
||||
| `GOOGLE_CLOUD_LOCATION` | When using Vertex AI | GCP region, e.g. `"us-central1"` |
|
||||
|
||||
## Usage Example
|
||||
|
||||
```python
|
||||
import asyncio
|
||||
from adapter_base import AdapterConfig
|
||||
from adapters.google_adk.adapter import GoogleADKAdapter
|
||||
|
||||
async def main():
|
||||
config = AdapterConfig(
|
||||
model="google:gemini-2.0-flash",
|
||||
system_prompt="You are a helpful assistant.",
|
||||
runtime_config={
|
||||
"agent_name": "demo-agent",
|
||||
"max_output_tokens": 1024,
|
||||
"temperature": 0.7,
|
||||
},
|
||||
workspace_id="ws-demo",
|
||||
)
|
||||
|
||||
adapter = GoogleADKAdapter()
|
||||
await adapter.setup(config) # validates keys, loads plugins/skills
|
||||
|
||||
executor = await adapter.create_executor(config) # returns GoogleADKA2AExecutor
|
||||
# executor.execute(context, event_queue) is called by the A2A server per turn
|
||||
print(f"Adapter: {adapter.display_name()} — model {config.model}")
|
||||
|
||||
asyncio.run(main())
|
||||
```
|
||||
|
||||
### Running via A2A
|
||||
|
||||
Once the workspace is provisioned, send A2A messages as normal:
|
||||
|
||||
```bash
|
||||
curl -X POST http://localhost:8000 \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"parts": [{"kind": "text", "text": "What is 2 + 2?"}]
|
||||
}
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
## Supported Models
|
||||
|
||||
Any model supported by Google ADK and available through your credential path:
|
||||
|
||||
| Model | Notes |
|
||||
|-------|-------|
|
||||
| `gemini-2.0-flash` | Recommended — fast, cost-effective |
|
||||
| `gemini-2.5-flash` | Latest preview, strong reasoning |
|
||||
| `gemini-1.5-pro` | Higher capability, higher latency |
|
||||
| `gemini-1.5-flash` | Fast, lower cost |
|
||||
|
||||
Use the `google:` prefix in `config.yaml` — the adapter strips it before passing the model name to ADK.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
A2A Request
|
||||
│
|
||||
▼
|
||||
GoogleADKA2AExecutor.execute()
|
||||
│
|
||||
├── extract_message_text() ← shared_runtime helper
|
||||
├── _ensure_session() ← create/reuse InMemorySessionService session
|
||||
├── _build_content() ← wrap text in google.genai.types.Content
|
||||
│
|
||||
▼
|
||||
runner.run_async(session_id, user_id, new_message)
|
||||
│
|
||||
▼
|
||||
ADK Event stream → filter is_final_response() → extract text
|
||||
│
|
||||
▼
|
||||
event_queue.enqueue_event(new_agent_text_message(reply))
|
||||
│
|
||||
▼
|
||||
A2A Response
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
Apache-2.0 — same as [google/adk-python](https://github.com/google/adk-python).
|
||||
@@ -1,408 +0,0 @@
|
||||
"""Google ADK adapter for Molecule AI workspace runtime.
|
||||
|
||||
Wraps Google's Agent Development Kit (google-adk v1.x) as a Molecule AI
|
||||
WorkspaceAdapter, bridging the A2A protocol to Google ADK's runner/session
|
||||
model.
|
||||
|
||||
Google ADK concepts used
|
||||
------------------------
|
||||
- ``google.adk.agents.LlmAgent`` — An LLM-backed agent with instructions and
|
||||
optional tools. Declared with ``model``, ``name``, and ``instruction``.
|
||||
- ``google.adk.runners.Runner`` — Drives one or more agents inside a session;
|
||||
``run_async()`` streams ``Event`` objects, including the final response text.
|
||||
- ``google.adk.sessions.InMemorySessionService`` — Manages session state in
|
||||
memory. Each ``Runner`` owns a single ``InMemorySessionService`` instance.
|
||||
|
||||
Runtime-config keys (all optional)
|
||||
------------------------------------
|
||||
``max_output_tokens`` — int, default 8192. Forwarded to the ADK ``GenerateContentConfig``.
|
||||
``temperature`` — float, default 1.0.
|
||||
``agent_name`` — str, default ``"molecule-adk-agent"``.
|
||||
|
||||
Environment variables
|
||||
---------------------
|
||||
``GOOGLE_API_KEY`` — Google AI Studio key (required for ``gemini-*`` models).
|
||||
``GOOGLE_GENAI_USE_VERTEXAI`` — set to ``"1"`` to use Vertex AI instead of AI
|
||||
Studio. In that case supply
|
||||
``GOOGLE_CLOUD_PROJECT`` and
|
||||
``GOOGLE_CLOUD_LOCATION`` as well.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from a2a.server.agent_execution import AgentExecutor, RequestContext
|
||||
from a2a.server.events import EventQueue
|
||||
from a2a.helpers import new_text_message
|
||||
|
||||
from adapter_base import AdapterConfig, BaseAdapter
|
||||
|
||||
# Import sanitize_agent_error from the workspace package. The adapter lives
|
||||
# in the workspace/adapters/ hierarchy so the workspace package root is
|
||||
# always importable as long as the module is loaded from within a workspace.
|
||||
# In standalone template repos, this import resolves via the workspace package
|
||||
# entry point that also provides adapter_base.
|
||||
try:
|
||||
from executor_helpers import sanitize_agent_error # type: ignore[attr-defined]
|
||||
except ImportError: # pragma: no cover
|
||||
sanitize_agent_error = None # fallback: below handler falls back to class-name only
|
||||
|
||||
if TYPE_CHECKING:
|
||||
pass
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_DEFAULT_AGENT_NAME = "molecule-adk-agent"
|
||||
_DEFAULT_MAX_OUTPUT_TOKENS = 8192
|
||||
_DEFAULT_TEMPERATURE = 1.0
|
||||
_NO_TEXT_MSG = "Error: message contained no text content."
|
||||
_NO_RESPONSE_MSG = "(no response generated)"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class GoogleADKA2AExecutor(AgentExecutor):
|
||||
"""A2A executor backed by a Google ADK ``Runner``.
|
||||
|
||||
Each executor instance owns a single ``Runner`` and ``InMemorySessionService``.
|
||||
Sessions are created on first use and reused across subsequent turns
|
||||
(the session_id is derived from the A2A context_id so each task gets a
|
||||
stable, isolated session).
|
||||
|
||||
Parameters
|
||||
----------
|
||||
model:
|
||||
ADK model identifier, e.g. ``"gemini-2.0-flash"`` or
|
||||
``"gemini-1.5-pro"``.
|
||||
system_prompt:
|
||||
Optional instruction prepended to every conversation. Passed to
|
||||
``LlmAgent(instruction=...)``.
|
||||
agent_name:
|
||||
Internal ADK agent name. Defaults to ``_DEFAULT_AGENT_NAME``.
|
||||
max_output_tokens:
|
||||
Token cap forwarded to ``GenerateContentConfig``.
|
||||
temperature:
|
||||
Sampling temperature forwarded to ``GenerateContentConfig``.
|
||||
heartbeat:
|
||||
Optional ``HeartbeatLoop`` instance (unused directly but stored for
|
||||
future heartbeat integration).
|
||||
_runner:
|
||||
Inject a pre-built ``Runner`` — for testing only. When provided,
|
||||
the real ADK ``Runner`` is never constructed.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
model: str,
|
||||
system_prompt: str | None = None,
|
||||
agent_name: str = _DEFAULT_AGENT_NAME,
|
||||
max_output_tokens: int = _DEFAULT_MAX_OUTPUT_TOKENS,
|
||||
temperature: float = _DEFAULT_TEMPERATURE,
|
||||
heartbeat: Any = None,
|
||||
_runner: Any = None,
|
||||
) -> None:
|
||||
self.model = model
|
||||
self.system_prompt = system_prompt
|
||||
self.agent_name = agent_name
|
||||
self.max_output_tokens = max_output_tokens
|
||||
self.temperature = temperature
|
||||
self._heartbeat = heartbeat
|
||||
self._sessions_created: set[str] = set()
|
||||
|
||||
if _runner is not None:
|
||||
# Test injection — skip building the real ADK objects.
|
||||
self._runner = _runner
|
||||
else:
|
||||
self._runner = self._build_runner()
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Internal helpers
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _build_runner(self) -> Any: # pragma: no cover — requires real ADK
|
||||
"""Construct a Google ADK ``Runner`` with an ``LlmAgent``.
|
||||
|
||||
Lazy-imports ``google.adk`` so the rest of the workspace runtime
|
||||
doesn't pull in google-adk on startup (it's only needed when this
|
||||
executor is actually instantiated by ``GoogleADKAdapter.create_executor``).
|
||||
"""
|
||||
from google.adk.agents import LlmAgent
|
||||
from google.adk.runners import Runner
|
||||
from google.adk.sessions import InMemorySessionService
|
||||
|
||||
agent = LlmAgent(
|
||||
name=self.agent_name,
|
||||
model=self.model,
|
||||
instruction=self.system_prompt or "",
|
||||
)
|
||||
|
||||
session_service = InMemorySessionService()
|
||||
runner = Runner(
|
||||
agent=agent,
|
||||
app_name=self.agent_name,
|
||||
session_service=session_service,
|
||||
)
|
||||
return runner
|
||||
|
||||
async def _ensure_session(self, session_id: str, user_id: str) -> None:
|
||||
"""Create a session in the service if it doesn't exist yet."""
|
||||
if session_id in self._sessions_created:
|
||||
return
|
||||
session_service = self._runner.session_service
|
||||
existing = await session_service.get_session(
|
||||
app_name=self.agent_name,
|
||||
user_id=user_id,
|
||||
session_id=session_id,
|
||||
)
|
||||
if existing is None:
|
||||
await session_service.create_session(
|
||||
app_name=self.agent_name,
|
||||
user_id=user_id,
|
||||
session_id=session_id,
|
||||
)
|
||||
self._sessions_created.add(session_id)
|
||||
|
||||
def _extract_text(self, context: RequestContext) -> str:
|
||||
"""Pull plain text out of the A2A message parts."""
|
||||
from shared_runtime import extract_message_text
|
||||
return extract_message_text(context)
|
||||
|
||||
def _build_content(self, user_text: str) -> Any:
|
||||
"""Wrap user text in an ADK-compatible ``Content`` object."""
|
||||
from google.genai.types import Content, Part
|
||||
return Content(role="user", parts=[Part(text=user_text)])
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# AgentExecutor interface
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
async def execute(self, context: RequestContext, event_queue: EventQueue) -> None:
|
||||
"""Run a single ADK turn and enqueue the reply as an A2A Message.
|
||||
|
||||
Sequence:
|
||||
1. Extract user text from A2A message parts.
|
||||
2. Ensure an ADK session exists for this context_id.
|
||||
3. Call ``runner.run_async()`` and collect all response events.
|
||||
4. Concatenate final-response text; fall back to ``_NO_RESPONSE_MSG``
|
||||
when the model produces no output.
|
||||
5. Enqueue the reply via ``event_queue``.
|
||||
"""
|
||||
user_text = self._extract_text(context)
|
||||
if not user_text:
|
||||
parts = getattr(getattr(context, "message", None), "parts", None)
|
||||
logger.warning("GoogleADKA2AExecutor: no text in message parts: %s", parts)
|
||||
await event_queue.enqueue_event(new_text_message(_NO_TEXT_MSG))
|
||||
return
|
||||
|
||||
session_id = getattr(context, "context_id", None) or "default-session"
|
||||
user_id = "molecule-user"
|
||||
|
||||
try:
|
||||
await self._ensure_session(session_id, user_id)
|
||||
|
||||
content = self._build_content(user_text)
|
||||
response_parts: list[str] = []
|
||||
|
||||
async for event in self._runner.run_async(
|
||||
session_id=session_id,
|
||||
user_id=user_id,
|
||||
new_message=content,
|
||||
):
|
||||
# Collect text from final-response events
|
||||
if not getattr(event, "is_final_response", lambda: False)():
|
||||
continue
|
||||
candidate_response = getattr(event, "response", None)
|
||||
if candidate_response is None:
|
||||
continue
|
||||
for part in getattr(
|
||||
getattr(candidate_response, "content", None) or MissingContent(),
|
||||
"parts", []
|
||||
):
|
||||
text = getattr(part, "text", None)
|
||||
if text:
|
||||
response_parts.append(text)
|
||||
|
||||
final_text = "".join(response_parts).strip() or _NO_RESPONSE_MSG
|
||||
await event_queue.enqueue_event(new_text_message(final_text))
|
||||
|
||||
except Exception as exc:
|
||||
logger.error(
|
||||
"GoogleADKA2AExecutor: execution error [model=%s]: %s",
|
||||
self.model,
|
||||
type(exc).__name__,
|
||||
exc_info=True,
|
||||
)
|
||||
# Include exception detail (first ~1 KB) in the A2A error response so
|
||||
# callers get actionable context without needing workspace log access.
|
||||
# sanitize_agent_error scrubs API keys / bearer tokens before including
|
||||
# content in the response. Falls back to class-name-only when
|
||||
# the function is unavailable (standalone template repo layout).
|
||||
if sanitize_agent_error is not None:
|
||||
msg = sanitize_agent_error(stderr=str(exc))
|
||||
else:
|
||||
msg = f"Agent error: {type(exc).__name__}"
|
||||
await event_queue.enqueue_event(new_text_message(msg))
|
||||
|
||||
async def cancel(self, context: RequestContext, event_queue: EventQueue) -> None:
|
||||
"""Cancel a running task — emits canceled state per A2A protocol."""
|
||||
from a2a.types import TaskState, TaskStatus, TaskStatusUpdateEvent
|
||||
|
||||
await event_queue.enqueue_event(
|
||||
TaskStatusUpdateEvent(
|
||||
status=TaskStatus(state=TaskState.TASK_STATE_CANCELED),
|
||||
final=True,
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
class MissingContent:
|
||||
"""Sentinel to avoid AttributeError when response.content is None."""
|
||||
parts: list = []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKAdapter
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class GoogleADKAdapter(BaseAdapter):
|
||||
"""Molecule AI workspace adapter for Google ADK (google-adk v1.x).
|
||||
|
||||
Implements the full ``BaseAdapter`` lifecycle:
|
||||
- ``setup()`` — validates config and runs ``_common_setup()``.
|
||||
- ``create_executor()`` — returns a ``GoogleADKA2AExecutor`` configured
|
||||
from ``AdapterConfig``.
|
||||
"""
|
||||
|
||||
# Stored by setup(); consumed by create_executor()
|
||||
_setup_result: Any = None
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Identity
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@staticmethod
|
||||
def name() -> str:
|
||||
"""Runtime identifier — matches the ``runtime`` field in config.yaml."""
|
||||
return "google-adk"
|
||||
|
||||
@staticmethod
|
||||
def display_name() -> str:
|
||||
"""Human-readable name shown in the Molecule AI UI."""
|
||||
return "Google ADK"
|
||||
|
||||
@staticmethod
|
||||
def description() -> str:
|
||||
"""Short description of this adapter's capabilities."""
|
||||
return (
|
||||
"Google Agent Development Kit (ADK) adapter. "
|
||||
"Runs LLM agents via Google Gemini models using the official "
|
||||
"google-adk Python SDK (Apache-2.0)."
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def get_config_schema() -> dict:
|
||||
"""JSON Schema for runtime_config fields rendered in the Config tab."""
|
||||
return {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"agent_name": {
|
||||
"type": "string",
|
||||
"default": _DEFAULT_AGENT_NAME,
|
||||
"description": "Internal ADK agent name",
|
||||
},
|
||||
"max_output_tokens": {
|
||||
"type": "integer",
|
||||
"default": _DEFAULT_MAX_OUTPUT_TOKENS,
|
||||
"description": "Maximum output tokens for the Gemini model",
|
||||
},
|
||||
"temperature": {
|
||||
"type": "number",
|
||||
"default": _DEFAULT_TEMPERATURE,
|
||||
"minimum": 0.0,
|
||||
"maximum": 2.0,
|
||||
"description": "Sampling temperature",
|
||||
},
|
||||
},
|
||||
"additionalProperties": False,
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Lifecycle
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
async def setup(self, config: AdapterConfig) -> None:
|
||||
"""Validate config and run the shared platform setup pipeline.
|
||||
|
||||
Raises ``RuntimeError`` if the required API key is not set and
|
||||
Vertex AI mode is not active.
|
||||
|
||||
Args:
|
||||
config: ``AdapterConfig`` populated by the workspace runtime.
|
||||
"""
|
||||
use_vertex = os.environ.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip() in ("1", "true", "True")
|
||||
api_key = os.environ.get("GOOGLE_API_KEY", "").strip()
|
||||
|
||||
if not use_vertex and not api_key:
|
||||
raise RuntimeError(
|
||||
"GoogleADKAdapter requires GOOGLE_API_KEY (for AI Studio) or "
|
||||
"GOOGLE_GENAI_USE_VERTEXAI=1 with GOOGLE_CLOUD_PROJECT set."
|
||||
)
|
||||
|
||||
logger.info(
|
||||
"GoogleADKAdapter.setup: model=%s vertex=%s", config.model, use_vertex
|
||||
)
|
||||
|
||||
self._setup_result = await self._common_setup(config)
|
||||
|
||||
async def create_executor(self, config: AdapterConfig) -> GoogleADKA2AExecutor:
|
||||
"""Build and return a ``GoogleADKA2AExecutor`` for A2A integration.
|
||||
|
||||
Uses the system prompt assembled by ``_common_setup()`` in ``setup()``.
|
||||
Runtime-config keys ``agent_name``, ``max_output_tokens``, and
|
||||
``temperature`` are respected when present.
|
||||
|
||||
Args:
|
||||
config: ``AdapterConfig`` populated by the workspace runtime.
|
||||
|
||||
Returns:
|
||||
A ready-to-use ``GoogleADKA2AExecutor`` instance.
|
||||
"""
|
||||
rc = config.runtime_config or {}
|
||||
|
||||
# Strip provider prefix from model, e.g. "google:gemini-2.0-flash" → "gemini-2.0-flash"
|
||||
model = config.model
|
||||
if ":" in model:
|
||||
model = model.split(":", 1)[1]
|
||||
|
||||
system_prompt = (
|
||||
self._setup_result.system_prompt
|
||||
if self._setup_result is not None
|
||||
else config.system_prompt or ""
|
||||
)
|
||||
|
||||
return GoogleADKA2AExecutor(
|
||||
model=model,
|
||||
system_prompt=system_prompt,
|
||||
agent_name=rc.get("agent_name", _DEFAULT_AGENT_NAME),
|
||||
max_output_tokens=int(rc.get("max_output_tokens", _DEFAULT_MAX_OUTPUT_TOKENS)),
|
||||
temperature=float(rc.get("temperature", _DEFAULT_TEMPERATURE)),
|
||||
heartbeat=config.heartbeat,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Module-level alias required by the adapter autodiscovery loader
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
Adapter = GoogleADKAdapter
|
||||
@@ -1,7 +0,0 @@
|
||||
# Google ADK adapter dependencies
|
||||
# Pin to the latest stable release — update when a new version is verified.
|
||||
google-adk==1.30.0
|
||||
|
||||
# google-adk transitively requires google-genai; pin explicitly for
|
||||
# reproducibility (same pinning convention as other adapter requirements.txt).
|
||||
google-genai>=1.16.0
|
||||
@@ -1,993 +0,0 @@
|
||||
"""Unit tests for adapters/google-adk/adapter.py.
|
||||
|
||||
Coverage targets (100%)
|
||||
-----------------------
|
||||
- Module constants: _DEFAULT_AGENT_NAME, _DEFAULT_MAX_OUTPUT_TOKENS, etc.
|
||||
- MissingContent sentinel class
|
||||
- GoogleADKA2AExecutor.__init__ — field assignment + runner injection
|
||||
- GoogleADKA2AExecutor._extract_text
|
||||
- GoogleADKA2AExecutor._build_content
|
||||
- GoogleADKA2AExecutor._ensure_session — first call (create), subsequent call (skip)
|
||||
- GoogleADKA2AExecutor.execute — happy path, empty input, API error,
|
||||
no final_response events, partial text
|
||||
- GoogleADKA2AExecutor.cancel — TaskStatusUpdateEvent emitted
|
||||
- GoogleADKAdapter.name / display_name / description / get_config_schema
|
||||
- GoogleADKAdapter.setup — success, missing key, vertex override
|
||||
- GoogleADKAdapter.create_executor — model stripping, defaults, rc overrides
|
||||
- Adapter alias
|
||||
|
||||
All google-adk, google-genai, and shared_runtime calls are mocked.
|
||||
No live API calls are made.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
from types import ModuleType
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stub heavy external modules BEFORE the adapter is imported.
|
||||
# conftest.py already stubs: a2a, builtin_tools, langchain_core.
|
||||
# We need to additionally stub: google.adk, google.genai, shared_runtime.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _make_a2a_stubs() -> None:
|
||||
"""Register minimal a2a SDK stubs in sys.modules.
|
||||
|
||||
Mirrors what workspace/tests/conftest.py does; needed because
|
||||
this test file lives outside the ``tests/`` directory and conftest.py
|
||||
is not automatically loaded for it.
|
||||
"""
|
||||
if "a2a" in sys.modules:
|
||||
# Already mocked by conftest — just ensure new_agent_text_message is passthrough
|
||||
a2a_utils = sys.modules.get("a2a.utils")
|
||||
if a2a_utils and callable(getattr(a2a_utils, "new_agent_text_message", None)):
|
||||
a2a_utils.new_agent_text_message = lambda text, **kwargs: text
|
||||
return
|
||||
|
||||
agent_execution_mod = ModuleType("a2a.server.agent_execution")
|
||||
|
||||
class AgentExecutor:
|
||||
pass
|
||||
|
||||
class RequestContext:
|
||||
pass
|
||||
|
||||
agent_execution_mod.AgentExecutor = AgentExecutor
|
||||
agent_execution_mod.RequestContext = RequestContext
|
||||
|
||||
events_mod = ModuleType("a2a.server.events")
|
||||
|
||||
class EventQueue:
|
||||
pass
|
||||
|
||||
events_mod.EventQueue = EventQueue
|
||||
|
||||
tasks_mod = ModuleType("a2a.server.tasks")
|
||||
types_mod = ModuleType("a2a.types")
|
||||
|
||||
class Part:
|
||||
# v1: Part takes text= directly; root= retained for compat during transition
|
||||
def __init__(self, text=None, root=None, **kwargs):
|
||||
self.text = text
|
||||
|
||||
types_mod.Part = Part
|
||||
|
||||
# a2a.helpers (v1: moved from a2a.utils)
|
||||
helpers_mod = ModuleType("a2a.helpers")
|
||||
# Passthrough so tests can assert on the plain text string, matching the
|
||||
# hermes_executor test convention from conftest.py.
|
||||
helpers_mod.new_agent_text_message = lambda text, **kwargs: text
|
||||
|
||||
a2a_mod = ModuleType("a2a")
|
||||
a2a_server_mod = ModuleType("a2a.server")
|
||||
|
||||
sys.modules["a2a"] = a2a_mod
|
||||
sys.modules["a2a.server"] = a2a_server_mod
|
||||
sys.modules["a2a.server.agent_execution"] = agent_execution_mod
|
||||
sys.modules["a2a.server.events"] = events_mod
|
||||
sys.modules["a2a.server.tasks"] = tasks_mod
|
||||
sys.modules["a2a.types"] = types_mod
|
||||
sys.modules["a2a.helpers"] = helpers_mod
|
||||
|
||||
|
||||
def _make_google_adk_stubs() -> None:
|
||||
"""Register minimal google.adk and google.genai stubs in sys.modules."""
|
||||
# google (top-level namespace package)
|
||||
google_mod = sys.modules.get("google") or ModuleType("google")
|
||||
google_mod.__path__ = []
|
||||
sys.modules.setdefault("google", google_mod)
|
||||
|
||||
# google.genai
|
||||
google_genai_mod = ModuleType("google.genai")
|
||||
google_genai_mod.__path__ = []
|
||||
|
||||
google_genai_types_mod = ModuleType("google.genai.types")
|
||||
|
||||
class _Content:
|
||||
def __init__(self, role="user", parts=None):
|
||||
self.role = role
|
||||
self.parts = parts or []
|
||||
|
||||
class _Part:
|
||||
def __init__(self, text=""):
|
||||
self.text = text
|
||||
|
||||
google_genai_types_mod.Content = _Content
|
||||
google_genai_types_mod.Part = _Part
|
||||
|
||||
sys.modules["google.genai"] = google_genai_mod
|
||||
sys.modules["google.genai.types"] = google_genai_types_mod
|
||||
|
||||
# google.adk
|
||||
google_adk_mod = ModuleType("google.adk")
|
||||
google_adk_mod.__path__ = []
|
||||
|
||||
# google.adk.agents
|
||||
google_adk_agents_mod = ModuleType("google.adk.agents")
|
||||
|
||||
class _LlmAgent:
|
||||
def __init__(self, name="", model="", instruction="", tools=None):
|
||||
self.name = name
|
||||
self.model = model
|
||||
self.instruction = instruction
|
||||
self.tools = tools or []
|
||||
|
||||
google_adk_agents_mod.LlmAgent = _LlmAgent
|
||||
|
||||
# google.adk.runners
|
||||
google_adk_runners_mod = ModuleType("google.adk.runners")
|
||||
|
||||
class _Runner:
|
||||
def __init__(self, agent=None, app_name="", session_service=None):
|
||||
self.agent = agent
|
||||
self.app_name = app_name
|
||||
self.session_service = session_service
|
||||
|
||||
async def run_async(self, session_id, user_id, new_message):
|
||||
# Stub — tests override this via mock runner
|
||||
return
|
||||
yield # make it an async generator
|
||||
|
||||
google_adk_runners_mod.Runner = _Runner
|
||||
|
||||
# google.adk.sessions
|
||||
google_adk_sessions_mod = ModuleType("google.adk.sessions")
|
||||
|
||||
class _InMemorySessionService:
|
||||
def __init__(self):
|
||||
self._sessions: dict = {}
|
||||
|
||||
async def get_session(self, app_name, user_id, session_id):
|
||||
return self._sessions.get((app_name, user_id, session_id))
|
||||
|
||||
async def create_session(self, app_name, user_id, session_id):
|
||||
self._sessions[(app_name, user_id, session_id)] = {"id": session_id}
|
||||
return self._sessions[(app_name, user_id, session_id)]
|
||||
|
||||
google_adk_sessions_mod.InMemorySessionService = _InMemorySessionService
|
||||
|
||||
sys.modules["google.adk"] = google_adk_mod
|
||||
sys.modules["google.adk.agents"] = google_adk_agents_mod
|
||||
sys.modules["google.adk.runners"] = google_adk_runners_mod
|
||||
sys.modules["google.adk.sessions"] = google_adk_sessions_mod
|
||||
|
||||
|
||||
def _make_shared_runtime_stub() -> None:
|
||||
"""Register shared_runtime stub with extract_message_text."""
|
||||
if "shared_runtime" not in sys.modules:
|
||||
mod = ModuleType("shared_runtime")
|
||||
|
||||
def _extract_message_text(ctx) -> str:
|
||||
parts = getattr(getattr(ctx, "message", None), "parts", None)
|
||||
if parts is None:
|
||||
parts = ctx
|
||||
texts = []
|
||||
for p in parts or []:
|
||||
t = getattr(p, "text", None) or getattr(
|
||||
getattr(p, "root", None), "text", None
|
||||
) or ""
|
||||
if t:
|
||||
texts.append(t)
|
||||
return " ".join(texts).strip()
|
||||
|
||||
mod.extract_message_text = _extract_message_text
|
||||
sys.modules["shared_runtime"] = mod
|
||||
|
||||
|
||||
def _make_adapter_base_stub() -> None:
|
||||
"""Register adapter_base stub in sys.modules."""
|
||||
if "adapter_base" not in sys.modules:
|
||||
mod = ModuleType("adapter_base")
|
||||
from dataclasses import dataclass, field
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
@dataclass
|
||||
class AdapterConfig:
|
||||
model: str = "google:gemini-2.0-flash"
|
||||
system_prompt: str | None = None
|
||||
tools: list = field(default_factory=list)
|
||||
runtime_config: dict = field(default_factory=dict)
|
||||
config_path: str = "/configs"
|
||||
workspace_id: str = ""
|
||||
prompt_files: list = field(default_factory=list)
|
||||
a2a_port: int = 8000
|
||||
heartbeat: object = None
|
||||
|
||||
class BaseAdapter(ABC):
|
||||
@staticmethod
|
||||
@abstractmethod
|
||||
def name() -> str: ... # pragma: no cover
|
||||
|
||||
@staticmethod
|
||||
@abstractmethod
|
||||
def display_name() -> str: ... # pragma: no cover
|
||||
|
||||
@staticmethod
|
||||
@abstractmethod
|
||||
def description() -> str: ... # pragma: no cover
|
||||
|
||||
@staticmethod
|
||||
def get_config_schema() -> dict:
|
||||
return {}
|
||||
|
||||
def memory_filename(self) -> str:
|
||||
return "CLAUDE.md"
|
||||
|
||||
def register_tool_hook(self, name, fn): return None # noqa
|
||||
|
||||
async def transcript_lines(self, since=0, limit=100): return {"supported": False} # noqa
|
||||
|
||||
def register_subagent_hook(self, name, spec): return None # noqa
|
||||
|
||||
def append_to_memory_hook(self, config, filename, content): pass # noqa
|
||||
|
||||
async def install_plugins_via_registry(self, config, plugins): return [] # noqa
|
||||
|
||||
async def inject_plugins(self, config, plugins):
|
||||
await self.install_plugins_via_registry(config, plugins)
|
||||
|
||||
async def _common_setup(self, config):
|
||||
from types import SimpleNamespace
|
||||
return SimpleNamespace(
|
||||
system_prompt="mocked system prompt",
|
||||
loaded_skills=[],
|
||||
langchain_tools=[],
|
||||
is_coordinator=False,
|
||||
children=[],
|
||||
)
|
||||
|
||||
@abstractmethod
|
||||
async def setup(self, config) -> None: ... # pragma: no cover
|
||||
|
||||
@abstractmethod
|
||||
async def create_executor(self, config): ... # pragma: no cover
|
||||
|
||||
mod.AdapterConfig = AdapterConfig
|
||||
mod.BaseAdapter = BaseAdapter
|
||||
mod.SetupResult = None
|
||||
sys.modules["adapter_base"] = mod
|
||||
|
||||
|
||||
# Install all stubs before importing the module under test
|
||||
# Order matters: a2a must be stubbed before adapter.py is imported so that
|
||||
# `from a2a.utils import new_agent_text_message` resolves to the passthrough.
|
||||
_make_a2a_stubs()
|
||||
_make_google_adk_stubs()
|
||||
_make_shared_runtime_stub()
|
||||
_make_adapter_base_stub()
|
||||
|
||||
# Now safe to import the adapter
|
||||
import sys as _sys
|
||||
import os as _os
|
||||
_adapter_dir = _os.path.dirname(_os.path.abspath(__file__))
|
||||
if _adapter_dir not in _sys.path:
|
||||
_sys.path.insert(0, _adapter_dir)
|
||||
|
||||
from adapter import ( # noqa: E402
|
||||
Adapter,
|
||||
GoogleADKA2AExecutor,
|
||||
GoogleADKAdapter,
|
||||
MissingContent,
|
||||
_DEFAULT_AGENT_NAME,
|
||||
_DEFAULT_MAX_OUTPUT_TOKENS,
|
||||
_DEFAULT_TEMPERATURE,
|
||||
_NO_RESPONSE_MSG,
|
||||
_NO_TEXT_MSG,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Fixtures and helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _make_context(text: str, context_id: str = "ctx-test") -> MagicMock:
|
||||
"""Return a mock RequestContext with the given text in message.parts."""
|
||||
part = MagicMock()
|
||||
part.text = text
|
||||
ctx = MagicMock()
|
||||
ctx.message.parts = [part]
|
||||
ctx.context_id = context_id
|
||||
return ctx
|
||||
|
||||
|
||||
def _make_empty_context() -> MagicMock:
|
||||
"""Return a context whose message parts contain no text."""
|
||||
part = MagicMock(spec=[])
|
||||
part.root = MagicMock(spec=[])
|
||||
ctx = MagicMock()
|
||||
ctx.message.parts = [part]
|
||||
ctx.context_id = "ctx-empty"
|
||||
return ctx
|
||||
|
||||
|
||||
def _make_event(is_final: bool, text: str | None = None) -> MagicMock:
|
||||
"""Build a mock ADK Event that optionally is a final response."""
|
||||
event = MagicMock()
|
||||
event.is_final_response = MagicMock(return_value=is_final)
|
||||
if text is not None:
|
||||
part = MagicMock()
|
||||
part.text = text
|
||||
event.response = MagicMock()
|
||||
event.response.content = MagicMock()
|
||||
event.response.content.parts = [part]
|
||||
else:
|
||||
event.response = None
|
||||
return event
|
||||
|
||||
|
||||
async def _async_gen(*events):
|
||||
"""Yield events one by one as an async generator."""
|
||||
for e in events:
|
||||
yield e
|
||||
|
||||
|
||||
def _make_runner(events=None) -> MagicMock:
|
||||
"""Return a mock Runner whose run_async yields the given events."""
|
||||
runner = MagicMock()
|
||||
runner.session_service = AsyncMock()
|
||||
runner.session_service.get_session = AsyncMock(return_value=None)
|
||||
runner.session_service.create_session = AsyncMock(return_value={"id": "s1"})
|
||||
evts = events or []
|
||||
runner.run_async = MagicMock(return_value=_async_gen(*evts))
|
||||
return runner
|
||||
|
||||
|
||||
def _make_executor(
|
||||
model: str = "gemini-2.0-flash",
|
||||
system_prompt: str | None = "You are helpful.",
|
||||
runner: MagicMock | None = None,
|
||||
) -> GoogleADKA2AExecutor:
|
||||
"""Create a GoogleADKA2AExecutor with an injected mock runner."""
|
||||
return GoogleADKA2AExecutor(
|
||||
model=model,
|
||||
system_prompt=system_prompt,
|
||||
_runner=runner or _make_runner(),
|
||||
)
|
||||
|
||||
|
||||
def _make_adapter_config(**kwargs) -> object:
|
||||
"""Return an AdapterConfig with sensible defaults."""
|
||||
from adapter_base import AdapterConfig
|
||||
defaults = dict(
|
||||
model="google:gemini-2.0-flash",
|
||||
system_prompt="Test prompt.",
|
||||
runtime_config={},
|
||||
workspace_id="ws-test",
|
||||
)
|
||||
defaults.update(kwargs)
|
||||
return AdapterConfig(**defaults)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_default_agent_name():
|
||||
assert _DEFAULT_AGENT_NAME == "molecule-adk-agent"
|
||||
|
||||
|
||||
def test_default_max_output_tokens():
|
||||
assert _DEFAULT_MAX_OUTPUT_TOKENS == 8192
|
||||
|
||||
|
||||
def test_default_temperature():
|
||||
assert _DEFAULT_TEMPERATURE == 1.0
|
||||
|
||||
|
||||
def test_no_text_msg_constant():
|
||||
assert "no text" in _NO_TEXT_MSG.lower()
|
||||
|
||||
|
||||
def test_no_response_msg_constant():
|
||||
assert "no response" in _NO_RESPONSE_MSG.lower()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# MissingContent sentinel
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_missing_content_has_empty_parts():
|
||||
mc = MissingContent()
|
||||
assert mc.parts == []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor — construction
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_constructor_stores_fields():
|
||||
runner = _make_runner()
|
||||
executor = GoogleADKA2AExecutor(
|
||||
model="gemini-1.5-pro",
|
||||
system_prompt="Hello",
|
||||
agent_name="my-agent",
|
||||
max_output_tokens=4096,
|
||||
temperature=0.5,
|
||||
_runner=runner,
|
||||
)
|
||||
assert executor.model == "gemini-1.5-pro"
|
||||
assert executor.system_prompt == "Hello"
|
||||
assert executor.agent_name == "my-agent"
|
||||
assert executor.max_output_tokens == 4096
|
||||
assert executor.temperature == 0.5
|
||||
assert executor._runner is runner
|
||||
assert executor._sessions_created == set()
|
||||
|
||||
|
||||
def test_constructor_defaults():
|
||||
executor = GoogleADKA2AExecutor(model="gemini-2.0-flash", _runner=_make_runner())
|
||||
assert executor.system_prompt is None
|
||||
assert executor.agent_name == _DEFAULT_AGENT_NAME
|
||||
assert executor.max_output_tokens == _DEFAULT_MAX_OUTPUT_TOKENS
|
||||
assert executor.temperature == _DEFAULT_TEMPERATURE
|
||||
assert executor._heartbeat is None
|
||||
|
||||
|
||||
def test_constructor_uses_injected_runner():
|
||||
stub = MagicMock()
|
||||
stub.session_service = MagicMock()
|
||||
executor = GoogleADKA2AExecutor(model="gemini-2.0-flash", _runner=stub)
|
||||
assert executor._runner is stub
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor — _extract_text
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_extract_text_returns_message_text():
|
||||
executor = _make_executor()
|
||||
ctx = _make_context("Hello world")
|
||||
result = executor._extract_text(ctx)
|
||||
assert result == "Hello world"
|
||||
|
||||
|
||||
def test_extract_text_empty_context():
|
||||
executor = _make_executor()
|
||||
ctx = _make_empty_context()
|
||||
result = executor._extract_text(ctx)
|
||||
assert result == ""
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor — _build_content
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_build_content_creates_content_object():
|
||||
executor = _make_executor()
|
||||
content = executor._build_content("test message")
|
||||
assert content.role == "user"
|
||||
assert len(content.parts) == 1
|
||||
assert content.parts[0].text == "test message"
|
||||
|
||||
|
||||
def test_build_content_empty_string():
|
||||
executor = _make_executor()
|
||||
content = executor._build_content("")
|
||||
assert content.parts[0].text == ""
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor — _ensure_session
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ensure_session_creates_when_not_exists():
|
||||
runner = _make_runner()
|
||||
runner.session_service.get_session = AsyncMock(return_value=None)
|
||||
executor = GoogleADKA2AExecutor(
|
||||
model="gemini-2.0-flash", agent_name="test-agent", _runner=runner
|
||||
)
|
||||
await executor._ensure_session("session-1", "user-1")
|
||||
runner.session_service.create_session.assert_called_once_with(
|
||||
app_name="test-agent",
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
)
|
||||
assert "session-1" in executor._sessions_created
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ensure_session_skips_if_already_tracked():
|
||||
runner = _make_runner()
|
||||
executor = GoogleADKA2AExecutor(
|
||||
model="gemini-2.0-flash", _runner=runner
|
||||
)
|
||||
executor._sessions_created.add("session-x")
|
||||
await executor._ensure_session("session-x", "user-1")
|
||||
# Neither get_session nor create_session should be called
|
||||
runner.session_service.get_session.assert_not_called()
|
||||
runner.session_service.create_session.assert_not_called()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ensure_session_skips_create_when_existing():
|
||||
runner = _make_runner()
|
||||
runner.session_service.get_session = AsyncMock(return_value={"id": "s1"})
|
||||
executor = GoogleADKA2AExecutor(
|
||||
model="gemini-2.0-flash", agent_name="test-agent", _runner=runner
|
||||
)
|
||||
await executor._ensure_session("session-existing", "user-1")
|
||||
runner.session_service.create_session.assert_not_called()
|
||||
assert "session-existing" in executor._sessions_created
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor — execute: happy path
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_returns_response_text():
|
||||
event = _make_event(is_final=True, text="The answer is 42.")
|
||||
runner = _make_runner(events=[event])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("What is 6×7?")
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
eq.enqueue_event.assert_called_once_with("The answer is 42.")
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_concatenates_multiple_final_parts():
|
||||
part1 = MagicMock()
|
||||
part1.text = "Hello "
|
||||
part2 = MagicMock()
|
||||
part2.text = "world"
|
||||
event = MagicMock()
|
||||
event.is_final_response = MagicMock(return_value=True)
|
||||
event.response = MagicMock()
|
||||
event.response.content = MagicMock()
|
||||
event.response.content.parts = [part1, part2]
|
||||
|
||||
runner = _make_runner(events=[event])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("Hi")
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
eq.enqueue_event.assert_called_once_with("Hello world")
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_skips_non_final_events():
|
||||
non_final = _make_event(is_final=False, text="intermediate")
|
||||
final = _make_event(is_final=True, text="final answer")
|
||||
runner = _make_runner(events=[non_final, final])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("question")
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
enqueued = eq.enqueue_event.call_args[0][0]
|
||||
assert enqueued == "final answer"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_fallback_when_no_final_response_events():
|
||||
non_final = _make_event(is_final=False)
|
||||
runner = _make_runner(events=[non_final])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("hello")
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
eq.enqueue_event.assert_called_once_with(_NO_RESPONSE_MSG)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_fallback_when_response_is_none():
|
||||
event = MagicMock()
|
||||
event.is_final_response = MagicMock(return_value=True)
|
||||
event.response = None # no response object
|
||||
|
||||
runner = _make_runner(events=[event])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("ping")
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
eq.enqueue_event.assert_called_once_with(_NO_RESPONSE_MSG)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_fallback_when_parts_have_no_text():
|
||||
part = MagicMock()
|
||||
part.text = None # no text on the part
|
||||
event = MagicMock()
|
||||
event.is_final_response = MagicMock(return_value=True)
|
||||
event.response = MagicMock()
|
||||
event.response.content = MagicMock()
|
||||
event.response.content.parts = [part]
|
||||
|
||||
runner = _make_runner(events=[event])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("ping")
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
eq.enqueue_event.assert_called_once_with(_NO_RESPONSE_MSG)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_fallback_when_response_content_is_none():
|
||||
event = MagicMock()
|
||||
event.is_final_response = MagicMock(return_value=True)
|
||||
event.response = MagicMock()
|
||||
event.response.content = None # content is None → MissingContent sentinel
|
||||
|
||||
runner = _make_runner(events=[event])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("ping")
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
eq.enqueue_event.assert_called_once_with(_NO_RESPONSE_MSG)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_uses_context_id_as_session_id():
|
||||
event = _make_event(is_final=True, text="ok")
|
||||
runner = _make_runner(events=[event])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("hello", context_id="ctx-abc-123")
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
runner.run_async.assert_called_once()
|
||||
call_kwargs = runner.run_async.call_args[1]
|
||||
assert call_kwargs["session_id"] == "ctx-abc-123"
|
||||
assert call_kwargs["user_id"] == "molecule-user"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_falls_back_to_default_session_id_when_context_id_is_none():
|
||||
event = _make_event(is_final=True, text="ok")
|
||||
runner = _make_runner(events=[event])
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_context("hello")
|
||||
ctx.context_id = None # override
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
call_kwargs = runner.run_async.call_args[1]
|
||||
assert call_kwargs["session_id"] == "default-session"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor — execute: empty input
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_empty_input_returns_error():
|
||||
runner = _make_runner()
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
ctx = _make_empty_context()
|
||||
eq = AsyncMock()
|
||||
await executor.execute(ctx, eq)
|
||||
|
||||
eq.enqueue_event.assert_called_once_with(_NO_TEXT_MSG)
|
||||
runner.run_async.assert_not_called()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor — execute: error handling
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_api_error_returns_sanitized_message():
|
||||
runner = _make_runner()
|
||||
|
||||
class _FakeAPIError(Exception):
|
||||
pass
|
||||
|
||||
async def _raise(*args, **kwargs):
|
||||
raise _FakeAPIError("api_key=secret token_limit_exceeded")
|
||||
yield # make it an async generator
|
||||
|
||||
runner.run_async = MagicMock(return_value=_raise())
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
eq = AsyncMock()
|
||||
await executor.execute(_make_context("hello"), eq)
|
||||
|
||||
enqueued = eq.enqueue_event.call_args[0][0]
|
||||
assert enqueued == "Agent error: _FakeAPIError"
|
||||
assert "secret" not in enqueued
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_api_error_is_logged(caplog):
|
||||
import logging
|
||||
|
||||
runner = _make_runner()
|
||||
|
||||
async def _raise(*args, **kwargs):
|
||||
raise ValueError("bad request")
|
||||
yield # make it an async generator
|
||||
|
||||
runner.run_async = MagicMock(return_value=_raise())
|
||||
executor = _make_executor(runner=runner)
|
||||
|
||||
with caplog.at_level(logging.ERROR, logger="adapter"):
|
||||
await executor.execute(_make_context("hello"), AsyncMock())
|
||||
|
||||
assert any("execution error" in r.message.lower() for r in caplog.records)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKA2AExecutor — cancel
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cancel_emits_canceled_event():
|
||||
executor = _make_executor()
|
||||
|
||||
import a2a.types as a2a_types
|
||||
|
||||
class _TaskState:
|
||||
canceled = "canceled"
|
||||
|
||||
class _TaskStatus:
|
||||
def __init__(self, state):
|
||||
self.state = state
|
||||
|
||||
class _TaskStatusUpdateEvent:
|
||||
def __init__(self, status, final):
|
||||
self.status = status
|
||||
self.final = final
|
||||
|
||||
a2a_types.TaskState = _TaskState
|
||||
a2a_types.TaskStatus = _TaskStatus
|
||||
a2a_types.TaskStatusUpdateEvent = _TaskStatusUpdateEvent
|
||||
|
||||
eq = AsyncMock()
|
||||
ctx = MagicMock()
|
||||
await executor.cancel(ctx, eq)
|
||||
|
||||
eq.enqueue_event.assert_called_once()
|
||||
event = eq.enqueue_event.call_args[0][0]
|
||||
assert isinstance(event, _TaskStatusUpdateEvent)
|
||||
assert event.status.state == "canceled"
|
||||
assert event.final is True
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKAdapter — identity methods
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_adapter_name():
|
||||
assert GoogleADKAdapter.name() == "google-adk"
|
||||
|
||||
|
||||
def test_adapter_display_name():
|
||||
assert "Google ADK" in GoogleADKAdapter.display_name()
|
||||
|
||||
|
||||
def test_adapter_description():
|
||||
desc = GoogleADKAdapter.description()
|
||||
assert "ADK" in desc or "Google" in desc
|
||||
|
||||
|
||||
def test_adapter_get_config_schema():
|
||||
schema = GoogleADKAdapter.get_config_schema()
|
||||
assert schema["type"] == "object"
|
||||
assert "agent_name" in schema["properties"]
|
||||
assert "max_output_tokens" in schema["properties"]
|
||||
assert "temperature" in schema["properties"]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKAdapter — setup
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_setup_succeeds_with_api_key(monkeypatch):
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "fake-api-key")
|
||||
monkeypatch.delenv("GOOGLE_GENAI_USE_VERTEXAI", raising=False)
|
||||
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config()
|
||||
|
||||
await adapter.setup(config)
|
||||
|
||||
assert adapter._setup_result is not None
|
||||
assert adapter._setup_result.system_prompt == "mocked system prompt"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_setup_succeeds_with_vertex_ai(monkeypatch):
|
||||
monkeypatch.delenv("GOOGLE_API_KEY", raising=False)
|
||||
monkeypatch.setenv("GOOGLE_GENAI_USE_VERTEXAI", "1")
|
||||
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config()
|
||||
|
||||
await adapter.setup(config)
|
||||
|
||||
assert adapter._setup_result is not None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_setup_succeeds_with_vertex_ai_true_string(monkeypatch):
|
||||
monkeypatch.delenv("GOOGLE_API_KEY", raising=False)
|
||||
monkeypatch.setenv("GOOGLE_GENAI_USE_VERTEXAI", "True")
|
||||
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config()
|
||||
|
||||
await adapter.setup(config)
|
||||
assert adapter._setup_result is not None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_setup_raises_without_credentials(monkeypatch):
|
||||
monkeypatch.delenv("GOOGLE_API_KEY", raising=False)
|
||||
monkeypatch.delenv("GOOGLE_GENAI_USE_VERTEXAI", raising=False)
|
||||
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config()
|
||||
|
||||
with pytest.raises(RuntimeError, match="GOOGLE_API_KEY"):
|
||||
await adapter.setup(config)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GoogleADKAdapter — create_executor
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_executor_strips_google_prefix(monkeypatch):
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "key")
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config(model="google:gemini-2.0-flash")
|
||||
await adapter.setup(config)
|
||||
|
||||
executor = await adapter.create_executor(config)
|
||||
assert executor.model == "gemini-2.0-flash"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_executor_no_prefix_passthrough(monkeypatch):
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "key")
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config(model="gemini-1.5-pro")
|
||||
await adapter.setup(config)
|
||||
|
||||
executor = await adapter.create_executor(config)
|
||||
assert executor.model == "gemini-1.5-pro"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_executor_uses_setup_system_prompt(monkeypatch):
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "key")
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config()
|
||||
await adapter.setup(config)
|
||||
|
||||
executor = await adapter.create_executor(config)
|
||||
assert executor.system_prompt == "mocked system prompt"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_executor_runtime_config_overrides(monkeypatch):
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "key")
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config(
|
||||
runtime_config={
|
||||
"agent_name": "custom-agent",
|
||||
"max_output_tokens": 512,
|
||||
"temperature": 0.3,
|
||||
}
|
||||
)
|
||||
await adapter.setup(config)
|
||||
|
||||
executor = await adapter.create_executor(config)
|
||||
assert executor.agent_name == "custom-agent"
|
||||
assert executor.max_output_tokens == 512
|
||||
assert executor.temperature == 0.3
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_executor_defaults_without_runtime_config(monkeypatch):
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "key")
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config(runtime_config={})
|
||||
await adapter.setup(config)
|
||||
|
||||
executor = await adapter.create_executor(config)
|
||||
assert executor.agent_name == _DEFAULT_AGENT_NAME
|
||||
assert executor.max_output_tokens == _DEFAULT_MAX_OUTPUT_TOKENS
|
||||
assert executor.temperature == _DEFAULT_TEMPERATURE
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_executor_without_setup_uses_config_system_prompt(monkeypatch):
|
||||
"""create_executor without prior setup falls back to config.system_prompt."""
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "key")
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config(system_prompt="fallback prompt")
|
||||
# Intentionally skip setup() — _setup_result remains None
|
||||
|
||||
executor = await adapter.create_executor(config)
|
||||
assert executor.system_prompt == "fallback prompt"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_executor_without_setup_no_system_prompt(monkeypatch):
|
||||
"""create_executor without setup and no system_prompt → empty string."""
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "key")
|
||||
adapter = GoogleADKAdapter()
|
||||
config = _make_adapter_config(system_prompt=None)
|
||||
# Skip setup()
|
||||
|
||||
executor = await adapter.create_executor(config)
|
||||
assert executor.system_prompt == ""
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_executor_heartbeat_passed(monkeypatch):
|
||||
monkeypatch.setenv("GOOGLE_API_KEY", "key")
|
||||
adapter = GoogleADKAdapter()
|
||||
heartbeat = MagicMock()
|
||||
config = _make_adapter_config(heartbeat=heartbeat)
|
||||
await adapter.setup(config)
|
||||
|
||||
executor = await adapter.create_executor(config)
|
||||
assert executor._heartbeat is heartbeat
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Adapter alias
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_adapter_alias_is_google_adk_adapter():
|
||||
assert Adapter is GoogleADKAdapter
|
||||
@@ -1,2 +0,0 @@
|
||||
"""Re-export from shared_runtime for backward compat."""
|
||||
from shared_runtime import * # noqa: F401,F403
|
||||
@@ -1,32 +0,0 @@
|
||||
"""Smolagents adapter for Molecule AI workspace runtime.
|
||||
|
||||
Provides env sanitization and safe executor/messaging primitives for use
|
||||
with HuggingFace's smolagents library.
|
||||
|
||||
Two env-sanitization strategies are available:
|
||||
|
||||
* **Allowlist** (recommended) — :mod:`adapters.smolagents.env_sanitize`:
|
||||
only explicitly-safe variables pass through. Stricter but requires keeping
|
||||
the allowlist up-to-date as new safe vars are needed.
|
||||
|
||||
* **Denylist** (simple) — :mod:`adapters.smolagents.safe_env`:
|
||||
well-known secret names plus ``*_API_KEY`` / ``*_TOKEN`` suffix patterns
|
||||
are stripped. Easier to start with; less exhaustive.
|
||||
|
||||
Quick start::
|
||||
|
||||
# Allowlist approach (stricter)
|
||||
from adapters.smolagents.env_sanitize import make_safe_env, SafeLocalPythonExecutor
|
||||
|
||||
# Denylist approach (simpler)
|
||||
from adapters.smolagents.safe_env import make_safe_env
|
||||
|
||||
# Safe messaging
|
||||
from adapters.smolagents.send_message_wrapper import safe_send_message
|
||||
"""
|
||||
|
||||
# Re-export the allowlist-based make_safe_env as the default (most secure).
|
||||
from adapters.smolagents.env_sanitize import SafeLocalPythonExecutor, make_safe_env
|
||||
from adapters.smolagents.send_message_wrapper import safe_send_message
|
||||
|
||||
__all__ = ["make_safe_env", "SafeLocalPythonExecutor", "safe_send_message"]
|
||||
@@ -1,226 +0,0 @@
|
||||
"""Allowlist-based environment sanitization for smolagents (#826 — C3 CRITICAL).
|
||||
|
||||
Security model
|
||||
--------------
|
||||
We use an **allowlist** (not a denylist) — only variables explicitly
|
||||
enumerated as safe are passed through to agent-executed code. Any key not
|
||||
on the list is silently dropped.
|
||||
|
||||
This is intentionally strict: adding a new safe variable is a deliberate
|
||||
engineering act that surfaces in code review, rather than hoping a regex
|
||||
denylist catches every new secret name.
|
||||
|
||||
Thread safety
|
||||
-------------
|
||||
``SafeLocalPythonExecutor.__call__`` mutates ``os.environ`` temporarily.
|
||||
``_ENV_PATCH_LOCK`` serialises concurrent calls so simultaneous executions
|
||||
do not see each other's env patches.
|
||||
|
||||
Extending the allowlist
|
||||
-----------------------
|
||||
Set ``SMOLAGENTS_ENV_EXTRA_ALLOWLIST`` to a comma-separated list of
|
||||
additional uppercase env var names that should be passed through. This is
|
||||
intended for workspace-specific non-secret variables (e.g. ``WORKSPACE_ID``
|
||||
that you know are safe):
|
||||
|
||||
SMOLAGENTS_ENV_EXTRA_ALLOWLIST="MY_COMPANY_ENV,REGION"
|
||||
|
||||
Never add secret names here — use workspace secrets injection instead.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import threading
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Allowlist configuration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Core safe env variables — non-secret system and runtime variables that
|
||||
# agent code may legitimately need (e.g. PATH for subprocess-free tools,
|
||||
# PYTHONPATH for module resolution, TZ for datetime ops).
|
||||
_SAFE_ENV_ALLOWLIST: frozenset = frozenset(
|
||||
[
|
||||
# Shell / system fundamentals
|
||||
"PATH",
|
||||
"HOME",
|
||||
"USER",
|
||||
"LOGNAME",
|
||||
"SHELL",
|
||||
"TERM",
|
||||
"TZ",
|
||||
"TMPDIR",
|
||||
"TEMP",
|
||||
"TMP",
|
||||
# Language / locale
|
||||
"LANG",
|
||||
"LANGUAGE",
|
||||
"LC_ALL",
|
||||
"LC_CTYPE",
|
||||
"LC_MESSAGES",
|
||||
"LC_NUMERIC",
|
||||
"LC_TIME",
|
||||
# Python runtime
|
||||
"PYTHONPATH",
|
||||
"PYTHONHOME",
|
||||
"PYTHONDONTWRITEBYTECODE",
|
||||
"PYTHONUNBUFFERED",
|
||||
"PYTHONIOENCODING",
|
||||
# Molecule workspace non-secret identity vars
|
||||
"WORKSPACE_ID",
|
||||
"WORKSPACE_NAME",
|
||||
"PLATFORM_URL",
|
||||
]
|
||||
)
|
||||
|
||||
# Imports permanently excluded from the executor's authorized list.
|
||||
# These are well-known sandbox-escape vectors.
|
||||
_BANNED_IMPORTS: frozenset = frozenset(
|
||||
["subprocess", "socket", "ctypes", "importlib", "importlib.util"]
|
||||
)
|
||||
|
||||
# Baseline imports every SafeLocalPythonExecutor allows — pure-computation
|
||||
# modules with no I/O escape surface.
|
||||
_BASELINE_SAFE_IMPORTS: List[str] = [
|
||||
"math",
|
||||
"json",
|
||||
"re",
|
||||
"datetime",
|
||||
"collections",
|
||||
"itertools",
|
||||
"functools",
|
||||
"typing",
|
||||
"string",
|
||||
"textwrap",
|
||||
"decimal",
|
||||
"fractions",
|
||||
"statistics",
|
||||
"random",
|
||||
"hashlib",
|
||||
"base64",
|
||||
"urllib.parse",
|
||||
"copy",
|
||||
"dataclasses",
|
||||
"enum",
|
||||
"abc",
|
||||
"io",
|
||||
]
|
||||
|
||||
# Thread lock for env patching
|
||||
_ENV_PATCH_LOCK = threading.Lock()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def make_safe_env(
|
||||
extra_allowed: Optional[List[str]] = None,
|
||||
) -> Dict[str, str]:
|
||||
"""Return a *copy* of the environment containing only allowlisted keys.
|
||||
|
||||
``os.environ`` is **never mutated** by this function.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
extra_allowed:
|
||||
Additional variable names to include beyond the built-in allowlist.
|
||||
Also merged with the ``SMOLAGENTS_ENV_EXTRA_ALLOWLIST`` env var.
|
||||
|
||||
Returns
|
||||
-------
|
||||
dict
|
||||
A copy of ``os.environ`` filtered to allowlisted keys only.
|
||||
Keys not on the list are silently dropped.
|
||||
"""
|
||||
allowed = set(_SAFE_ENV_ALLOWLIST)
|
||||
|
||||
# Merge caller-provided extras
|
||||
if extra_allowed:
|
||||
allowed.update(k.upper() for k in extra_allowed)
|
||||
|
||||
# Merge env-var-configured extras
|
||||
env_extra = os.environ.get("SMOLAGENTS_ENV_EXTRA_ALLOWLIST", "")
|
||||
if env_extra:
|
||||
for key in env_extra.split(","):
|
||||
key = key.strip().upper()
|
||||
if key:
|
||||
allowed.add(key)
|
||||
|
||||
return {k: v for k, v in os.environ.items() if k in allowed}
|
||||
|
||||
|
||||
class SafeLocalPythonExecutor:
|
||||
"""Allowlist-gated wrapper around smolagents ``LocalPythonExecutor``.
|
||||
|
||||
Guarantees that agent-generated code cannot read secret environment
|
||||
variables (``ANTHROPIC_API_KEY``, ``GH_TOKEN``, ``DATABASE_URL``, etc.)
|
||||
because they are absent from ``os.environ`` during execution.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
additional_imports:
|
||||
Extra module names to allow beyond ``_BASELINE_SAFE_IMPORTS``.
|
||||
``_BANNED_IMPORTS`` takes precedence — listed names are silently
|
||||
removed.
|
||||
extra_allowed_env:
|
||||
Extra variable names to pass through beyond the core allowlist.
|
||||
_inner:
|
||||
Inject a mock ``LocalPythonExecutor`` for tests. When ``None``,
|
||||
the real smolagents executor is constructed lazily.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
additional_imports: Optional[List[str]] = None,
|
||||
extra_allowed_env: Optional[List[str]] = None,
|
||||
*,
|
||||
_inner: Any = None,
|
||||
) -> None:
|
||||
# Compute final import list (baseline + extras − banned)
|
||||
combined = list(_BASELINE_SAFE_IMPORTS)
|
||||
if additional_imports:
|
||||
for imp in additional_imports:
|
||||
if imp not in _BANNED_IMPORTS:
|
||||
combined.append(imp)
|
||||
|
||||
self._authorized_imports: List[str] = combined
|
||||
self._extra_allowed_env: Optional[List[str]] = extra_allowed_env
|
||||
self._inner = _inner # may be None until first call
|
||||
|
||||
def _get_inner(self) -> Any:
|
||||
"""Lazy-construct the real executor on first use (avoids import errors in tests)."""
|
||||
if self._inner is None:
|
||||
from smolagents import LocalPythonExecutor # type: ignore[import]
|
||||
|
||||
self._inner = LocalPythonExecutor(
|
||||
additional_authorized_imports=self._authorized_imports
|
||||
)
|
||||
return self._inner
|
||||
|
||||
def __call__(self, code: str, *args: Any, **kwargs: Any) -> Any:
|
||||
"""Execute ``code`` with only allowlisted env vars visible.
|
||||
|
||||
All keys not on the allowlist are removed from ``os.environ`` for
|
||||
the duration of execution and restored afterward, even on exception.
|
||||
The lock ensures thread safety across concurrent calls.
|
||||
"""
|
||||
safe_env = make_safe_env(self._extra_allowed_env)
|
||||
inner = self._get_inner()
|
||||
|
||||
with _ENV_PATCH_LOCK:
|
||||
# Snapshot full current env
|
||||
original_env = dict(os.environ)
|
||||
# Remove everything not in the safe set
|
||||
keys_to_remove = [k for k in os.environ if k not in safe_env]
|
||||
for k in keys_to_remove:
|
||||
del os.environ[k]
|
||||
try:
|
||||
return inner(code, *args, **kwargs)
|
||||
finally:
|
||||
# Always restore
|
||||
os.environ.clear()
|
||||
os.environ.update(original_env)
|
||||
@@ -1,61 +0,0 @@
|
||||
"""Denylist-based environment sanitization for smolagents (issue #826 — C3 CRITICAL).
|
||||
|
||||
This module provides a simple denylist approach: well-known secret variable
|
||||
names plus ``*_API_KEY`` and ``*_TOKEN`` suffix patterns are stripped before
|
||||
env is passed to agent-executed code.
|
||||
|
||||
For a stricter allowlist-based alternative that only passes explicitly-safe
|
||||
variables through, see :mod:`adapters.smolagents.env_sanitize`.
|
||||
|
||||
Usage::
|
||||
|
||||
from adapters.smolagents.safe_env import make_safe_env
|
||||
|
||||
executor = LocalPythonExecutor(...)
|
||||
# Pass only the sanitised env to the subprocess / exec context:
|
||||
safe = make_safe_env()
|
||||
"""
|
||||
|
||||
import copy
|
||||
import os
|
||||
|
||||
# Named API keys and tokens known to be used by smolagents / LLM clients.
|
||||
# These are removed regardless of the suffix-pattern below.
|
||||
SMOLAGENTS_ENV_DENYLIST: frozenset = frozenset(
|
||||
{
|
||||
"OPENAI_API_KEY",
|
||||
"ANTHROPIC_API_KEY",
|
||||
"GROQ_API_KEY",
|
||||
"CEREBRAS_API_KEY",
|
||||
"QIANFAN_API_KEY",
|
||||
"LANGFUSE_SECRET_KEY",
|
||||
"LANGFUSE_PUBLIC_KEY",
|
||||
"HF_TOKEN",
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
def make_safe_env() -> dict:
|
||||
"""Return a sanitised copy of ``os.environ`` with secrets removed.
|
||||
|
||||
Removes any key that:
|
||||
- Is in :data:`SMOLAGENTS_ENV_DENYLIST`, OR
|
||||
- Ends with ``_API_KEY``, OR
|
||||
- Ends with ``_TOKEN``
|
||||
|
||||
``os.environ`` is **never mutated** — a fresh ``dict`` copy is returned.
|
||||
|
||||
Returns
|
||||
-------
|
||||
dict
|
||||
A copy of the current environment with secret keys removed.
|
||||
"""
|
||||
env = copy.copy(dict(os.environ))
|
||||
for key in list(env.keys()):
|
||||
if (
|
||||
key in SMOLAGENTS_ENV_DENYLIST
|
||||
or key.endswith("_API_KEY")
|
||||
or key.endswith("_TOKEN")
|
||||
):
|
||||
del env[key]
|
||||
return env
|
||||
@@ -1,71 +0,0 @@
|
||||
"""Safe send_message wrapper for smolagents (issue #827 — C1 HIGH).
|
||||
|
||||
Prevents social-engineering attacks where agent-generated content could
|
||||
impersonate platform messages, inject HTML, or flood the user chat.
|
||||
|
||||
Guarantees
|
||||
----------
|
||||
1. Every message is prefixed with ``[smolagents]`` so recipients can
|
||||
attribute it to the agent and cannot be mistaken for platform UI.
|
||||
2. Truncated to 2000 characters to prevent log/UI floods.
|
||||
3. HTML entities (``<``, ``>``, ``&``, ``"``, ``'``) are escaped so
|
||||
rendered UIs that interpret HTML cannot be injected into.
|
||||
|
||||
Usage::
|
||||
|
||||
from adapters.smolagents.send_message_wrapper import safe_send_message
|
||||
|
||||
safe_send_message("Hello world", send_fn=platform_client.send)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import html
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Maximum character length for the *user-visible* portion of the message
|
||||
# (label prefix does not count toward this cap).
|
||||
_MAX_TEXT_LEN: int = 2000
|
||||
|
||||
# Label prepended to every outbound message.
|
||||
_LABEL: str = "[smolagents]"
|
||||
|
||||
|
||||
def safe_send_message(text: str, send_fn) -> None:
|
||||
"""Sanitise *text* and deliver it via *send_fn*.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
text:
|
||||
The raw message text produced by the agent.
|
||||
send_fn:
|
||||
Callable that delivers the message (e.g. ``platform_client.send``
|
||||
or a WebSocket broadcast function). Called with the final,
|
||||
sanitised string as its sole positional argument.
|
||||
|
||||
Side effects
|
||||
------------
|
||||
- Logs a warning when truncation occurs.
|
||||
- Logs a debug entry with the final payload length.
|
||||
"""
|
||||
if not isinstance(text, str):
|
||||
text = str(text)
|
||||
|
||||
# Strip HTML entities to prevent injection into rendered UIs.
|
||||
sanitised = html.escape(text, quote=True)
|
||||
|
||||
# Truncate to cap (before adding label so cap applies to content).
|
||||
if len(sanitised) > _MAX_TEXT_LEN:
|
||||
logger.warning(
|
||||
"safe_send_message: truncating message from %d to %d chars",
|
||||
len(sanitised),
|
||||
_MAX_TEXT_LEN,
|
||||
)
|
||||
sanitised = sanitised[:_MAX_TEXT_LEN]
|
||||
|
||||
payload = f"{_LABEL} {sanitised}"
|
||||
|
||||
logger.debug("safe_send_message: delivering %d-char payload", len(payload))
|
||||
send_fn(payload)
|
||||
@@ -1,133 +0,0 @@
|
||||
"""Create the Deep Agent with model + skills + tools."""
|
||||
|
||||
import os
|
||||
import logging
|
||||
|
||||
from langgraph.prebuilt import create_react_agent
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def create_agent(model_str: str, tools: list, system_prompt: str):
|
||||
"""Create a LangGraph ReAct agent.
|
||||
|
||||
Args:
|
||||
model_str: LangChain-compatible model string (e.g., 'anthropic:claude-sonnet-4-6')
|
||||
tools: List of tool functions
|
||||
system_prompt: The system prompt for the agent
|
||||
"""
|
||||
# Parse provider:model format
|
||||
if ":" in model_str:
|
||||
provider, model_name = model_str.split(":", 1)
|
||||
else:
|
||||
provider = "anthropic"
|
||||
model_name = model_str
|
||||
|
||||
# Import the provider package
|
||||
try:
|
||||
if provider in ("anthropic",):
|
||||
from langchain_anthropic import ChatAnthropic as LLMClass
|
||||
elif provider in ("openai", "openrouter", "groq", "cerebras", "qianfan"):
|
||||
from langchain_openai import ChatOpenAI as LLMClass
|
||||
elif provider == "google_genai":
|
||||
from langchain_google_genai import ChatGoogleGenerativeAI as LLMClass
|
||||
elif provider == "ollama":
|
||||
from langchain_ollama import ChatOllama as LLMClass
|
||||
else:
|
||||
raise ValueError(f"Unsupported model provider: {provider}")
|
||||
except ImportError as e:
|
||||
pkg = "langchain-openai" if provider == "openrouter" else f"langchain-{provider}"
|
||||
raise ImportError(f"Provider '{provider}' requires package '{pkg}'. Install: pip install {pkg}") from e
|
||||
|
||||
# Instantiate the LLM
|
||||
if provider == "anthropic":
|
||||
llm_kwargs = {"model": model_name}
|
||||
anthropic_base_url = os.environ.get("ANTHROPIC_BASE_URL", "")
|
||||
if anthropic_base_url:
|
||||
llm_kwargs["anthropic_api_url"] = anthropic_base_url
|
||||
llm = LLMClass(**llm_kwargs)
|
||||
elif provider == "openrouter":
|
||||
api_key = os.environ.get("OPENROUTER_API_KEY", os.environ.get("OPENAI_API_KEY", ""))
|
||||
max_tokens = int(os.environ.get("MAX_TOKENS", "2048"))
|
||||
llm = LLMClass(
|
||||
model=model_name,
|
||||
openai_api_key=api_key,
|
||||
openai_api_base="https://openrouter.ai/api/v1",
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
elif provider == "groq":
|
||||
api_key = os.environ.get("GROQ_API_KEY", "")
|
||||
llm = LLMClass(
|
||||
model=model_name,
|
||||
openai_api_key=api_key,
|
||||
openai_api_base="https://api.groq.com/openai/v1",
|
||||
)
|
||||
elif provider == "cerebras":
|
||||
api_key = os.environ.get("CEREBRAS_API_KEY", "")
|
||||
llm = LLMClass(
|
||||
model=model_name,
|
||||
openai_api_key=api_key,
|
||||
openai_api_base="https://api.cerebras.ai/v1",
|
||||
)
|
||||
elif provider == "qianfan":
|
||||
api_key = os.environ.get("QIANFAN_API_KEY", os.environ.get("AISTUDIO_API_KEY", ""))
|
||||
llm = LLMClass(
|
||||
model=model_name,
|
||||
openai_api_key=api_key,
|
||||
openai_api_base="https://qianfan.baidubce.com/v2",
|
||||
)
|
||||
elif provider == "openai":
|
||||
llm_kwargs = {"model": model_name}
|
||||
openai_base_url = os.environ.get("OPENAI_BASE_URL", "")
|
||||
if openai_base_url:
|
||||
llm_kwargs["openai_api_base"] = openai_base_url
|
||||
llm = LLMClass(**llm_kwargs)
|
||||
else:
|
||||
llm = LLMClass(model=model_name)
|
||||
|
||||
# Auto-inject Langfuse tracing if env vars are present
|
||||
callbacks = _setup_langfuse()
|
||||
if callbacks:
|
||||
llm.callbacks = callbacks
|
||||
|
||||
agent = create_react_agent(
|
||||
model=llm,
|
||||
tools=tools,
|
||||
prompt=system_prompt,
|
||||
)
|
||||
|
||||
return agent
|
||||
|
||||
|
||||
def _setup_langfuse():
|
||||
"""Set up Langfuse tracing if LANGFUSE_* env vars are present.
|
||||
|
||||
Returns list of callbacks to pass to agent invocations, or empty list.
|
||||
"""
|
||||
langfuse_host = os.environ.get("LANGFUSE_HOST")
|
||||
langfuse_public = os.environ.get("LANGFUSE_PUBLIC_KEY")
|
||||
langfuse_secret = os.environ.get("LANGFUSE_SECRET_KEY")
|
||||
|
||||
if not (langfuse_host and langfuse_public and langfuse_secret):
|
||||
return []
|
||||
|
||||
try:
|
||||
from langfuse.callback import CallbackHandler
|
||||
|
||||
handler = CallbackHandler(
|
||||
host=langfuse_host,
|
||||
public_key=langfuse_public,
|
||||
secret_key=langfuse_secret,
|
||||
)
|
||||
logger.info("Langfuse tracing enabled: %s", langfuse_host)
|
||||
|
||||
# Also set LANGSMITH_TRACING for LangGraph native integration
|
||||
os.environ.setdefault("LANGSMITH_TRACING", "true")
|
||||
|
||||
return [handler]
|
||||
except ImportError:
|
||||
logger.warning("Langfuse env vars set but langfuse package not installed")
|
||||
return []
|
||||
except Exception as e:
|
||||
logger.warning("Langfuse setup failed: %s", e)
|
||||
return []
|
||||
@@ -1,74 +0,0 @@
|
||||
"""AGENTS.md auto-generation for Molecule AI workspaces.
|
||||
|
||||
Implements the AAIF / Linux Foundation AGENTS.md standard so that peer agents
|
||||
and orchestration tools can discover this workspace's identity, role, A2A
|
||||
endpoint, and available tools without reading the full system prompt.
|
||||
|
||||
Usage::
|
||||
|
||||
from agents_md import generate_agents_md
|
||||
|
||||
generate_agents_md(config_dir="/configs", output_path="/workspace/AGENTS.md")
|
||||
|
||||
The function is called automatically at container startup (see main.py).
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def generate_agents_md(config_dir: str, output_path: str) -> None:
|
||||
"""Generate (or regenerate) AGENTS.md from the workspace config.yaml.
|
||||
|
||||
Always overwrites ``output_path`` — no stale-file guard. Re-calling
|
||||
after editing config.yaml produces a fresh file reflecting the changes.
|
||||
|
||||
Args:
|
||||
config_dir: Directory containing config.yaml (same convention as
|
||||
``load_config`` in config.py).
|
||||
output_path: Absolute path where AGENTS.md will be written.
|
||||
The parent directory is expected to exist.
|
||||
"""
|
||||
from config import load_config
|
||||
|
||||
cfg = load_config(config_dir)
|
||||
|
||||
# ── A2A Endpoint ─────────────────────────────────────────────────────────
|
||||
# AGENT_URL env var takes priority (production deployments behind a proxy).
|
||||
# Otherwise derive from the configured a2a.port (default 8000).
|
||||
endpoint = os.environ.get("AGENT_URL") or f"http://localhost:{cfg.a2a.port}/a2a"
|
||||
|
||||
# ── Role ─────────────────────────────────────────────────────────────────
|
||||
# Fall back to description when the role field is absent so legacy
|
||||
# config.yaml files (without a role key) still produce meaningful output.
|
||||
role = cfg.role if cfg.role else cfg.description
|
||||
|
||||
# ── MCP Tools ────────────────────────────────────────────────────────────
|
||||
# tools (skill names) + plugins (installed plugin names) form the combined
|
||||
# capability surface visible to peer agents.
|
||||
all_tools = list(cfg.tools) + list(cfg.plugins)
|
||||
if all_tools:
|
||||
tools_section = "\n".join(f"- {t}" for t in all_tools)
|
||||
else:
|
||||
tools_section = "None"
|
||||
|
||||
content = (
|
||||
f"# {cfg.name}\n"
|
||||
f"\n"
|
||||
f"**Role:** {role}\n"
|
||||
f"\n"
|
||||
f"## Description\n"
|
||||
f"{cfg.description}\n"
|
||||
f"\n"
|
||||
f"## A2A Endpoint\n"
|
||||
f"{endpoint}\n"
|
||||
f"\n"
|
||||
f"## MCP Tools\n"
|
||||
f"{tools_section}\n"
|
||||
)
|
||||
|
||||
Path(output_path).write_text(content, encoding="utf-8")
|
||||
logger.info("Generated AGENTS.md at %s for workspace %r", output_path, cfg.name)
|
||||
@@ -1,31 +0,0 @@
|
||||
# Publish-runtime pipeline verification — 2026-05-11
|
||||
|
||||
Marker file for the canonical end-to-end pipeline verification after
|
||||
`publish-runtime-bot` provisioning (internal#327) + stale-tag drift
|
||||
resolution (`runtime-v0.1.131` deleted from main).
|
||||
|
||||
## Purpose
|
||||
|
||||
Triggers `workspace/**` path filter on `publish-runtime-autobump.yml`,
|
||||
exercising the full pipeline:
|
||||
|
||||
1. `publish-runtime-autobump / bump-and-tag` reads PyPI version, computes
|
||||
next, pushes tag `runtime-v0.1.131` (or higher) using new bot scope.
|
||||
2. `publish-runtime.yml` fires on tag, builds + publishes to PyPI.
|
||||
3. Cascade autobump: 9 template repos get their `.runtime-version`
|
||||
pinned to the new version.
|
||||
|
||||
## Acceptance criteria
|
||||
|
||||
- [ ] autobump bump-and-tag context green on merged commit
|
||||
- [ ] tag `runtime-v0.1.131` (or computed next) exists on molecule-core
|
||||
- [ ] publish-runtime.yml run green
|
||||
- [ ] PyPI molecule-ai-workspace-runtime updated from 0.1.130
|
||||
- [ ] 9 template repos updated their pinned runtime version
|
||||
|
||||
## Rollback
|
||||
|
||||
This file is informational only — no code dependency. Safe to delete
|
||||
in any future PR once pipeline is proven stable.
|
||||
|
||||
— core-devops (per Hongming "long-term proper robust" directive 2026-05-11 19:48-19:50Z)
|
||||
@@ -1,84 +0,0 @@
|
||||
"""Build the Starlette routes for a workspace from its (card, adapter
|
||||
state) pair.
|
||||
|
||||
Pairs with PR #2756, which decoupled ``/.well-known/agent-card.json`` from
|
||||
``adapter.setup()`` failure. main.py was the only consumer and was
|
||||
``# pragma: no cover`` — so the wiring (card-route mounted unconditionally,
|
||||
JSON-RPC route swapped between DefaultRequestHandler and the
|
||||
not-configured handler based on ``adapter_ready``) had no pytest coverage.
|
||||
|
||||
A future refactor that re-couples the two would silently bypass PR #2756
|
||||
and shipped the original "stuck booting forever" UX again. That gap is
|
||||
what closes here: extract the route-assembly into a pure function whose
|
||||
behaviour is unit-testable with Starlette's TestClient, and have main.py
|
||||
call it. Issue molecule-core#2761.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any
|
||||
|
||||
from starlette.routing import Route
|
||||
|
||||
from not_configured_handler import make_not_configured_handler
|
||||
|
||||
# Heavy a2a-sdk imports are lazy: deferred to inside build_routes so
|
||||
# tests that exercise only the not-configured branch (no executor) don't
|
||||
# need a2a.server.request_handlers / routes stubbed in their conftest.
|
||||
# Production boot pays the import cost once, on workspace startup.
|
||||
|
||||
|
||||
def build_routes(
|
||||
agent_card: Any,
|
||||
executor: Any | None,
|
||||
adapter_error: str | None,
|
||||
) -> list:
|
||||
"""Return the list of Starlette routes for this workspace.
|
||||
|
||||
Always mounts ``/.well-known/agent-card.json`` from ``agent_card``.
|
||||
|
||||
JSON-RPC route at ``/`` swaps based on adapter state:
|
||||
|
||||
* ``executor`` is non-None → ``DefaultRequestHandler`` with the
|
||||
executor (production happy-path).
|
||||
* ``executor`` is None → ``not_configured_handler`` returning JSON-RPC
|
||||
``-32603`` with ``adapter_error`` in ``error.data``. The
|
||||
workspace stays REACHABLE (operator can introspect, deprovision,
|
||||
redeploy with corrected env) instead of crash-looping invisibly.
|
||||
|
||||
The two branches are mutually exclusive — caller passes one or the
|
||||
other, never both. Test coverage at ``tests/test_boot_routes.py``
|
||||
pins the contract.
|
||||
"""
|
||||
from a2a.server.routes import create_agent_card_routes
|
||||
|
||||
routes: list = []
|
||||
routes.extend(create_agent_card_routes(agent_card))
|
||||
|
||||
if executor is not None:
|
||||
from a2a.server.request_handlers import DefaultRequestHandler
|
||||
from a2a.server.routes import create_jsonrpc_routes
|
||||
from a2a.server.tasks import InMemoryTaskStore
|
||||
|
||||
handler = DefaultRequestHandler(
|
||||
agent_executor=executor,
|
||||
task_store=InMemoryTaskStore(),
|
||||
agent_card=agent_card,
|
||||
)
|
||||
# enable_v0_3_compat=True is the JSON-RPC wire-compat path: clients
|
||||
# using v0.3-shaped payloads (`"role": "user"` lowercase + camelCase
|
||||
# Pydantic field names) can talk to us without re-deploying.
|
||||
# Outbound payloads must also use v0.3 shape — see main.py's
|
||||
# original comment block for the full a2a-sdk 1.x migration note.
|
||||
routes.extend(
|
||||
create_jsonrpc_routes(
|
||||
request_handler=handler,
|
||||
rpc_url="/",
|
||||
enable_v0_3_compat=True,
|
||||
)
|
||||
)
|
||||
else:
|
||||
routes.append(
|
||||
Route("/", make_not_configured_handler(adapter_error), methods=["POST"])
|
||||
)
|
||||
|
||||
return routes
|
||||
@@ -1,37 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# build-all.sh — Rebuild base image and optionally adapter images.
|
||||
#
|
||||
# NOTE: Adapters have been extracted to standalone template repos:
|
||||
# https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-<runtime>
|
||||
#
|
||||
# This script now only builds the base image from workspace/Dockerfile.
|
||||
# Each adapter repo has its own Dockerfile that installs molecule-ai-workspace-runtime
|
||||
# from PyPI and the adapter-specific deps.
|
||||
#
|
||||
# Usage:
|
||||
# bash workspace/build-all.sh # Build base image only
|
||||
#
|
||||
# Standalone adapter repos still reference the legacy base image for local dev
|
||||
# (e.g. FROM workspace-template:base). To build those locally, clone the adapter
|
||||
# repo and run `docker build -t workspace-template:<runtime> .` from its root.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
cd "$SCRIPT_DIR"
|
||||
|
||||
GREEN='\033[0;32m'
|
||||
RED='\033[0;31m'
|
||||
NC='\033[0m'
|
||||
|
||||
log() { echo -e "${GREEN}[build]${NC} $1" >&2; }
|
||||
err() { echo -e "${RED}[error]${NC} $1" >&2; }
|
||||
|
||||
# Build base image
|
||||
log "Building workspace-template:base ..."
|
||||
if ! docker build -t workspace-template:base -f Dockerfile . ; then
|
||||
err "Base image build failed"
|
||||
exit 1
|
||||
fi
|
||||
log "Base image built"
|
||||
log "Done. Adapters are in standalone template repos — see docs/workspace-runtime-package.md"
|
||||
@@ -1,139 +0,0 @@
|
||||
"""A2A communication tools — framework-agnostic delegation and peer discovery.
|
||||
|
||||
These are plain async functions that any adapter can wrap in its native tool format.
|
||||
The LangChain @tool versions are in tools/delegation.py.
|
||||
"""
|
||||
|
||||
import os
|
||||
import uuid
|
||||
|
||||
import httpx
|
||||
|
||||
# OFFSEC-003: peer-controlled text MUST be wrapped with sanitize_a2a_result
|
||||
# before being returned to the LLM. This module's delegate_task() is one of
|
||||
# the trust-boundary entry points where peer output crosses into our agent's
|
||||
# context — same surface as a2a_tools_delegation.py:325 (fixed via #492).
|
||||
# Issue #537.
|
||||
from _sanitize_a2a import sanitize_a2a_result
|
||||
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
|
||||
|
||||
|
||||
async def list_peers() -> list[dict]:
|
||||
"""Get this workspace's peers from the platform registry."""
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
resp = await client.get(f"{PLATFORM_URL}/registry/{WORKSPACE_ID}/peers")
|
||||
if resp.status_code == 200:
|
||||
return resp.json()
|
||||
return []
|
||||
except Exception:
|
||||
return []
|
||||
|
||||
|
||||
async def delegate_task(workspace_id: str, task: str) -> str:
|
||||
"""Send a task to a peer workspace via A2A and return the response text."""
|
||||
# Task #190 / #193 — Self-delegation guard. Without this, a workspace
|
||||
# delegating to its own UUID round-trips through the platform proxy back
|
||||
# into the sender; the synchronous handler waits on the same lock the
|
||||
# caller holds, the request times out, and the platform writes an
|
||||
# a2a_receive activity row with source_id=our own workspace UUID. The
|
||||
# inbox poller then surfaces that row as kind="peer_agent" and the agent
|
||||
# sees the timeout echoed back as a peer instructing it (#190).
|
||||
#
|
||||
# The sibling guards live in:
|
||||
# - workspace-server/internal/handlers/delegation.go (Go API gate)
|
||||
# - workspace/a2a_tools_delegation.py (MCP path guard)
|
||||
# This module is the framework-agnostic adapter surface used by adapters
|
||||
# that don't go through a2a_tools_delegation.py — it needs its own guard.
|
||||
if WORKSPACE_ID and workspace_id == WORKSPACE_ID:
|
||||
return (
|
||||
"Error: self-delegation rejected (cannot delegate_task to your own "
|
||||
"workspace). There is no peer who is also you — the platform proxy "
|
||||
"would deadlock and the timeout would echo back as a peer_agent "
|
||||
"message from yourself (#190). Do the work directly, or use "
|
||||
"commit_memory / send_message_to_user instead."
|
||||
)
|
||||
|
||||
async with httpx.AsyncClient(timeout=120.0) as client:
|
||||
# Discover target URL
|
||||
try:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/registry/discover/{workspace_id}",
|
||||
headers={"X-Workspace-ID": WORKSPACE_ID},
|
||||
)
|
||||
if resp.status_code != 200:
|
||||
return f"Error: cannot reach workspace {workspace_id} (status {resp.status_code})"
|
||||
target_url = resp.json().get("url", "")
|
||||
if not target_url:
|
||||
return f"Error: workspace {workspace_id} has no URL"
|
||||
except Exception as e:
|
||||
return f"Error discovering workspace: {e}"
|
||||
|
||||
# Send A2A message. X-Workspace-ID identifies us as the source —
|
||||
# without it the platform's a2a_receive logger writes
|
||||
# source_id=NULL and the recipient's My Chat tab renders the
|
||||
# delegation as if a human user typed it. Same hazard fixed
|
||||
# in heartbeat.py / a2a_client.py / main.py initial+idle flows.
|
||||
try:
|
||||
a2a_resp = await client.post(
|
||||
target_url,
|
||||
headers={"X-Workspace-ID": WORKSPACE_ID},
|
||||
json={
|
||||
"jsonrpc": "2.0",
|
||||
"id": str(uuid.uuid4()),
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"messageId": str(uuid.uuid4()),
|
||||
"parts": [{"kind": "text", "text": task}],
|
||||
},
|
||||
},
|
||||
},
|
||||
)
|
||||
data = a2a_resp.json()
|
||||
if "result" in data:
|
||||
result = data["result"]
|
||||
parts = result.get("parts", []) if isinstance(result, dict) else []
|
||||
if parts and isinstance(parts[0], dict):
|
||||
# OFFSEC-003: wrap peer-controlled text before returning
|
||||
# to LLM context. Issue #537.
|
||||
return sanitize_a2a_result(parts[0].get("text", "(no text)"))
|
||||
# Empty parts list (e.g. {"parts": []}) should return str(result),
|
||||
# not "(no text)" — preserves pre-fix behavior (#279 regression fix).
|
||||
if isinstance(result, dict) and result.get("parts") == []:
|
||||
return sanitize_a2a_result(str(result))
|
||||
return sanitize_a2a_result(str(result) if isinstance(result, str) else "(no text)")
|
||||
elif "error" in data:
|
||||
err = data["error"]
|
||||
# Handle both string-form errors ("error": "some string")
|
||||
# and object-form errors ("error": {"message": "...", "code": ...}).
|
||||
msg = ""
|
||||
if isinstance(err, dict):
|
||||
msg = err.get("message", "")
|
||||
elif isinstance(err, str):
|
||||
msg = err
|
||||
else:
|
||||
msg = str(err)
|
||||
# OFFSEC-003: peer-controlled error message; wrap before return.
|
||||
return sanitize_a2a_result(f"Error: {msg}")
|
||||
return sanitize_a2a_result(str(data))
|
||||
except Exception as e:
|
||||
return f"Error sending A2A message: {e}"
|
||||
|
||||
|
||||
async def get_peers_summary() -> str:
|
||||
"""Return a formatted string of available peers for system prompts."""
|
||||
peers = await list_peers()
|
||||
if not peers:
|
||||
return "No peers available."
|
||||
lines = []
|
||||
for p in peers:
|
||||
name = p.get("name", "Unknown")
|
||||
pid = p.get("id", "")
|
||||
role = p.get("role", "")
|
||||
status = p.get("status", "")
|
||||
lines.append(f"- {name} (ID: {pid}) — {role} [{status}]")
|
||||
return "Available peers:\n" + "\n".join(lines)
|
||||
@@ -1,320 +0,0 @@
|
||||
"""Approval tool for human-in-the-loop workflows.
|
||||
|
||||
When an agent encounters a destructive, expensive, or unauthorized action,
|
||||
it calls request_approval() which creates a request and waits for a decision.
|
||||
|
||||
## Notification strategy
|
||||
|
||||
By default this module uses a **WebSocket subscription** (APPROVAL_USE_WEBSOCKET=true
|
||||
or when the ``websockets`` package is installed). The platform pushes an
|
||||
``APPROVAL_DECIDED`` event to the workspace WebSocket as soon as a human
|
||||
clicks Approve / Deny on the canvas — no polling required, instant delivery.
|
||||
|
||||
If WebSocket is unavailable (env var opt-out or import error) the module
|
||||
falls back to a **polling loop** so existing deployments without WebSocket
|
||||
support continue to work without any config change.
|
||||
|
||||
RBAC enforcement
|
||||
----------------
|
||||
The calling workspace must hold a role that grants the ``"approve"`` action.
|
||||
Roles are read from ``config.yaml`` under ``rbac.roles`` (default: operator).
|
||||
|
||||
Audit trail
|
||||
-----------
|
||||
Every approval lifecycle emits structured JSON Lines records:
|
||||
|
||||
1. ``approval / approve / requested`` — request submitted to platform
|
||||
2. ``approval / approve / granted`` — human approved (actor = decided_by)
|
||||
3. ``approval / approve / denied`` — human denied (actor = decided_by)
|
||||
4. ``approval / approve / timeout`` — no decision within APPROVAL_TIMEOUT
|
||||
|
||||
RBAC denials emit an ``rbac / rbac.deny / denied`` event instead.
|
||||
|
||||
Environment variables
|
||||
---------------------
|
||||
PLATFORM_URL Platform base URL (default: http://platform:8080)
|
||||
WORKSPACE_ID This workspace's ID (default: "")
|
||||
APPROVAL_TIMEOUT Max wait in seconds (default: 300)
|
||||
APPROVAL_POLL_INTERVAL Polling interval in seconds (default: 5, polling path only)
|
||||
APPROVAL_USE_WEBSOCKET "true" to force WS, "false"
|
||||
to force polling (default: auto-detect)
|
||||
AUDIT_LOG_PATH Path for JSON Lines audit log (default: /var/log/molecule/audit.jsonl)
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import uuid
|
||||
|
||||
import httpx
|
||||
from langchain_core.tools import tool
|
||||
|
||||
from builtin_tools.audit import check_permission, get_workspace_roles, log_event
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
|
||||
APPROVAL_POLL_INTERVAL = float(os.environ.get("APPROVAL_POLL_INTERVAL", "5"))
|
||||
APPROVAL_TIMEOUT = float(os.environ.get("APPROVAL_TIMEOUT", "300"))
|
||||
|
||||
# Auto-detect WebSocket support; can be overridden with env var
|
||||
_ws_env = os.environ.get("APPROVAL_USE_WEBSOCKET", "").lower()
|
||||
if _ws_env == "false":
|
||||
_USE_WEBSOCKET_DEFAULT = False
|
||||
elif _ws_env == "true":
|
||||
_USE_WEBSOCKET_DEFAULT = True
|
||||
else:
|
||||
try:
|
||||
import websockets as _ws_probe # noqa: F401
|
||||
_USE_WEBSOCKET_DEFAULT = True
|
||||
except ImportError:
|
||||
_USE_WEBSOCKET_DEFAULT = False
|
||||
|
||||
# Module-level reference so tests can monkeypatch it
|
||||
try:
|
||||
import websockets
|
||||
except ImportError:
|
||||
websockets = None # type: ignore[assignment]
|
||||
|
||||
# Expose for test introspection
|
||||
APPROVAL_USE_WEBSOCKET = _USE_WEBSOCKET_DEFAULT
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Internal helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
async def _create_approval_request(action: str, reason: str) -> dict:
|
||||
"""POST to the platform to create an approval request.
|
||||
|
||||
Returns {"approval_id": str} on success or {"error": str} on failure.
|
||||
"""
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/approvals",
|
||||
json={"action": action, "reason": reason},
|
||||
)
|
||||
if resp.status_code != 201:
|
||||
return {"error": f"Failed to create request: {resp.status_code}"}
|
||||
try:
|
||||
approval_id = resp.json().get("approval_id")
|
||||
except (ValueError, Exception):
|
||||
return {"error": f"Platform returned invalid JSON (status {resp.status_code})"}
|
||||
logger.info("Approval requested: %s (id=%s)", action, approval_id)
|
||||
return {"approval_id": approval_id}
|
||||
except Exception as e:
|
||||
return {"error": f"Failed to request approval: {e}"}
|
||||
|
||||
|
||||
async def _wait_websocket(approval_id: str, timeout: float) -> dict:
|
||||
"""Subscribe to the platform WebSocket and wait for APPROVAL_DECIDED event.
|
||||
|
||||
Returns the decision dict or raises asyncio.TimeoutError on expiry.
|
||||
"""
|
||||
ws_url = (
|
||||
PLATFORM_URL.replace("http://", "ws://").replace("https://", "wss://")
|
||||
+ "/ws"
|
||||
)
|
||||
headers = {"X-Workspace-ID": WORKSPACE_ID}
|
||||
|
||||
logger.debug("Approval %s: waiting via WebSocket %s", approval_id, ws_url)
|
||||
|
||||
async with websockets.connect(ws_url, additional_headers=headers) as ws:
|
||||
async for raw_message in ws:
|
||||
try:
|
||||
event = json.loads(raw_message)
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
|
||||
if event.get("event") != "APPROVAL_DECIDED":
|
||||
continue
|
||||
if event.get("approval_id") != approval_id:
|
||||
continue
|
||||
|
||||
status = event.get("status")
|
||||
decided_by = event.get("decided_by", "")
|
||||
logger.info("Approval %s decided via WebSocket: %s by %s",
|
||||
approval_id, status, decided_by)
|
||||
|
||||
if status == "approved":
|
||||
return {
|
||||
"approved": True,
|
||||
"approval_id": approval_id,
|
||||
"decided_by": decided_by,
|
||||
}
|
||||
else:
|
||||
return {
|
||||
"approved": False,
|
||||
"approval_id": approval_id,
|
||||
"decided_by": decided_by,
|
||||
"message": "Denied by human",
|
||||
}
|
||||
|
||||
|
||||
async def _wait_polling(approval_id: str, timeout: float) -> dict:
|
||||
"""Legacy polling loop — checks platform REST endpoint every APPROVAL_POLL_INTERVAL seconds."""
|
||||
elapsed = 0.0
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
while elapsed < timeout:
|
||||
await asyncio.sleep(APPROVAL_POLL_INTERVAL)
|
||||
elapsed += APPROVAL_POLL_INTERVAL
|
||||
try:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/approvals",
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
for a in resp.json():
|
||||
if a.get("id") == approval_id:
|
||||
status = a.get("status")
|
||||
if status == "approved":
|
||||
logger.info("Approval granted (poll): %s", approval_id)
|
||||
return {
|
||||
"approved": True,
|
||||
"approval_id": approval_id,
|
||||
"decided_by": a.get("decided_by"),
|
||||
}
|
||||
elif status == "denied":
|
||||
logger.info("Approval denied (poll): %s", approval_id)
|
||||
return {
|
||||
"approved": False,
|
||||
"approval_id": approval_id,
|
||||
"decided_by": a.get("decided_by"),
|
||||
"message": "Denied by human",
|
||||
}
|
||||
except Exception:
|
||||
pass # transient error — keep retrying
|
||||
|
||||
raise asyncio.TimeoutError()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public tool
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@tool
|
||||
async def request_approval(
|
||||
action: str,
|
||||
reason: str,
|
||||
) -> dict:
|
||||
"""Request human approval before proceeding with a sensitive action.
|
||||
|
||||
Use this when you're about to do something destructive, expensive,
|
||||
or outside your normal authority. The request is sent to the canvas
|
||||
where a human can approve or deny it.
|
||||
|
||||
Args:
|
||||
action: Short description of what you want to do
|
||||
reason: Why this action is necessary
|
||||
"""
|
||||
# One trace_id links every audit event for this approval lifecycle.
|
||||
trace_id = str(uuid.uuid4())
|
||||
|
||||
# --- RBAC check -----------------------------------------------------------
|
||||
roles, custom_perms = get_workspace_roles()
|
||||
if not check_permission("approve", roles, custom_perms):
|
||||
log_event(
|
||||
event_type="rbac",
|
||||
action="rbac.deny",
|
||||
resource=action,
|
||||
outcome="denied",
|
||||
trace_id=trace_id,
|
||||
attempted_action="approve",
|
||||
roles=roles,
|
||||
)
|
||||
return {
|
||||
"approved": False,
|
||||
"error": (
|
||||
"RBAC: this workspace does not have the 'approve' permission. "
|
||||
f"Current roles: {roles}"
|
||||
),
|
||||
}
|
||||
|
||||
# Step 1: Create the approval request
|
||||
creation = await _create_approval_request(action, reason)
|
||||
if "error" in creation:
|
||||
log_event(
|
||||
event_type="approval",
|
||||
action="approve",
|
||||
resource=action,
|
||||
outcome="failure",
|
||||
trace_id=trace_id,
|
||||
reason="submit_failed",
|
||||
error=creation["error"],
|
||||
)
|
||||
return {"approved": False, "error": creation["error"]}
|
||||
|
||||
approval_id = creation["approval_id"]
|
||||
log_event(
|
||||
event_type="approval",
|
||||
action="approve",
|
||||
resource=action,
|
||||
outcome="requested",
|
||||
trace_id=trace_id,
|
||||
approval_id=approval_id,
|
||||
reason_text=reason,
|
||||
)
|
||||
|
||||
timeout = float(os.environ.get("APPROVAL_TIMEOUT", str(APPROVAL_TIMEOUT)))
|
||||
|
||||
# Step 2: Wait for decision — WebSocket preferred, polling as fallback
|
||||
use_ws = APPROVAL_USE_WEBSOCKET and websockets is not None
|
||||
|
||||
try:
|
||||
if use_ws:
|
||||
try:
|
||||
result = await asyncio.wait_for(
|
||||
_wait_websocket(approval_id, timeout),
|
||||
timeout=timeout,
|
||||
)
|
||||
except Exception as ws_err:
|
||||
# WebSocket failed (connection error, etc.) — fall through to polling
|
||||
logger.warning(
|
||||
"WebSocket approval wait failed (%s), falling back to polling",
|
||||
ws_err,
|
||||
)
|
||||
result = await asyncio.wait_for(
|
||||
_wait_polling(approval_id, timeout),
|
||||
timeout=timeout + APPROVAL_POLL_INTERVAL,
|
||||
)
|
||||
else:
|
||||
# Polling path (primary when WS disabled)
|
||||
result = await asyncio.wait_for(
|
||||
_wait_polling(approval_id, timeout),
|
||||
timeout=timeout + APPROVAL_POLL_INTERVAL, # slight grace period
|
||||
)
|
||||
|
||||
# Log the human decision
|
||||
decided_by = result.get("decided_by")
|
||||
outcome = "granted" if result.get("approved") else "denied"
|
||||
log_event(
|
||||
event_type="approval",
|
||||
action="approve",
|
||||
resource=action,
|
||||
outcome=outcome,
|
||||
# Record the human identity as actor when available
|
||||
actor=decided_by or WORKSPACE_ID,
|
||||
trace_id=trace_id,
|
||||
approval_id=approval_id,
|
||||
decided_by=decided_by,
|
||||
)
|
||||
return result
|
||||
|
||||
except asyncio.TimeoutError:
|
||||
logger.warning("Approval timed out after %.0fs: %s", timeout, approval_id)
|
||||
log_event(
|
||||
event_type="approval",
|
||||
action="approve",
|
||||
resource=action,
|
||||
outcome="timeout",
|
||||
trace_id=trace_id,
|
||||
approval_id=approval_id,
|
||||
timeout_seconds=timeout,
|
||||
)
|
||||
return {
|
||||
"approved": False,
|
||||
"approval_id": approval_id,
|
||||
"error": f"Timed out after {timeout}s waiting for human decision",
|
||||
}
|
||||
@@ -1,274 +0,0 @@
|
||||
"""Immutable append-only audit log for EU AI Act compliance.
|
||||
|
||||
Fulfils Article 12 (record-keeping), Article 13 (transparency), and
|
||||
Article 17 (quality-management system) requirements for high-risk AI systems.
|
||||
|
||||
Log format: JSON Lines (one UTF-8 JSON object per line), suitable for direct
|
||||
ingestion by any SIEM (Splunk, Elastic, Datadog, etc.).
|
||||
|
||||
Required event fields
|
||||
---------------------
|
||||
timestamp ISO 8601 UTC datetime with timezone offset
|
||||
event_type Coarse category: "delegation", "approval", "memory", "rbac"
|
||||
workspace_id Workspace that generated this event
|
||||
actor Entity that triggered the action; defaults to workspace_id for
|
||||
automated events, or the human identity for approval decisions
|
||||
action Verb describing what was attempted:
|
||||
delegate | approve | memory.read | memory.write | rbac.deny
|
||||
resource Object of the action: target workspace ID, memory scope,
|
||||
approval action string, etc.
|
||||
outcome One of: allowed | denied | success | failure | timeout |
|
||||
requested | granted
|
||||
trace_id UUID v4 correlating related events across workspaces
|
||||
|
||||
The log file is opened in append mode ("a") on every write — it is NEVER
|
||||
truncated, rewritten, or deleted by this module. Rotate externally using
|
||||
logrotate (with ``copytruncate`` disabled) or ship to a SIEM before rotating.
|
||||
|
||||
Configuration
|
||||
-------------
|
||||
AUDIT_LOG_PATH env var — full path to the JSONL file
|
||||
default: /var/log/molecule/audit.jsonl
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import functools
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import threading
|
||||
import uuid
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
if TYPE_CHECKING:
|
||||
pass # avoid circular import at runtime
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Configuration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
AUDIT_LOG_PATH: str = os.environ.get(
|
||||
"AUDIT_LOG_PATH", "/var/log/molecule/audit.jsonl"
|
||||
)
|
||||
WORKSPACE_ID: str = os.environ.get("WORKSPACE_ID", "")
|
||||
|
||||
# Protects the open() + write() sequence; prevents interleaved JSON lines
|
||||
# when multiple async tasks run in the same event-loop thread.
|
||||
_write_lock = threading.Lock()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Built-in role → permitted-action mappings
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Maps each built-in role name to the set of actions it grants.
|
||||
#: Custom roles can be added in config.yaml under ``rbac.allowed_actions``.
|
||||
ROLE_PERMISSIONS: dict[str, set[str]] = {
|
||||
# Full access — shortcircuits all other checks
|
||||
"admin": {"delegate", "approve", "memory.read", "memory.write"},
|
||||
# Standard agent role
|
||||
"operator": {"delegate", "approve", "memory.read", "memory.write"},
|
||||
# Read-only observer — no writes, no delegation, no approvals
|
||||
"read-only": {"memory.read"},
|
||||
# Can approve and write memory, but cannot delegate
|
||||
"no-delegation": {"approve", "memory.read", "memory.write"},
|
||||
# Can delegate and write memory, but cannot invoke approval gate
|
||||
"no-approval": {"delegate", "memory.read", "memory.write"},
|
||||
# Memory reads only (useful for analytic sidecars)
|
||||
"memory-readonly": {"memory.read"},
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Config loader (lazy, cached per process)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@functools.lru_cache(maxsize=1)
|
||||
def _load_workspace_config():
|
||||
"""Return the WorkspaceConfig or None if it cannot be loaded."""
|
||||
try:
|
||||
from config import load_config # local import avoids circular deps
|
||||
return load_config()
|
||||
except Exception as exc:
|
||||
logger.warning("audit: could not load workspace config for RBAC: %s", exc)
|
||||
return None
|
||||
|
||||
|
||||
def get_workspace_roles() -> tuple[list[str], dict[str, list[str]]]:
|
||||
"""Return ``(roles, custom_permissions)`` from the workspace config.
|
||||
|
||||
Falls back to ``["operator"]`` / ``{}`` when the config is unavailable so
|
||||
that agents remain functional in degraded environments.
|
||||
"""
|
||||
cfg = _load_workspace_config()
|
||||
if cfg is None:
|
||||
return ["operator"], {}
|
||||
return list(cfg.rbac.roles), dict(cfg.rbac.allowed_actions)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# RBAC helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def check_permission(
|
||||
action: str,
|
||||
roles: list[str],
|
||||
custom_permissions: dict[str, list[str]] | None = None,
|
||||
) -> bool:
|
||||
"""Return True if *any* of ``roles`` grants ``action``.
|
||||
|
||||
Evaluation order
|
||||
~~~~~~~~~~~~~~~~
|
||||
1. ``"admin"`` shortcircuits — always grants everything.
|
||||
2. Custom role definitions (from ``rbac.allowed_actions`` in config.yaml).
|
||||
3. Built-in :data:`ROLE_PERMISSIONS` table.
|
||||
|
||||
When a role appears in *custom_permissions* its built-in definition is
|
||||
**ignored** — the custom list is the complete permission set for that role.
|
||||
|
||||
Args:
|
||||
action: Action to authorise, e.g. ``"delegate"``.
|
||||
roles: Roles assigned to the calling workspace.
|
||||
custom_permissions: Optional ``{role: [action, ...]}`` mapping loaded
|
||||
from ``WorkspaceConfig.rbac.allowed_actions``.
|
||||
|
||||
Returns:
|
||||
``True`` if the action is permitted, ``False`` otherwise.
|
||||
|
||||
Examples::
|
||||
|
||||
>>> check_permission("delegate", ["operator"])
|
||||
True
|
||||
>>> check_permission("delegate", ["read-only"])
|
||||
False
|
||||
>>> check_permission("deploy", ["developer"], {"developer": ["deploy"]})
|
||||
True
|
||||
"""
|
||||
for role in roles:
|
||||
if role == "admin":
|
||||
return True
|
||||
if custom_permissions and role in custom_permissions:
|
||||
# Custom entry is definitive for this role
|
||||
if action in custom_permissions[role]:
|
||||
return True
|
||||
continue # Don't fall through to built-ins for custom roles
|
||||
if role in ROLE_PERMISSIONS and action in ROLE_PERMISSIONS[role]:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public audit API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def log_event(
|
||||
event_type: str,
|
||||
action: str,
|
||||
resource: str,
|
||||
outcome: str,
|
||||
actor: str | None = None,
|
||||
trace_id: str | None = None,
|
||||
**extra: Any,
|
||||
) -> str:
|
||||
"""Append one audit event to the immutable JSON Lines log.
|
||||
|
||||
Args:
|
||||
event_type: Coarse category — ``"delegation"``, ``"approval"``,
|
||||
``"memory"``, or ``"rbac"``.
|
||||
action: Verb — ``"delegate"``, ``"approve"``, ``"memory.write"``,
|
||||
``"memory.read"``, ``"rbac.deny"``.
|
||||
resource: Object of the action — target workspace ID, memory scope,
|
||||
approval action string, etc.
|
||||
outcome: Terminal state — one of ``"allowed"``, ``"denied"``,
|
||||
``"success"``, ``"failure"``, ``"timeout"``,
|
||||
``"requested"``, ``"granted"``.
|
||||
actor: Identity that triggered the event. Defaults to
|
||||
``WORKSPACE_ID`` (the running workspace) for automated
|
||||
events. Pass ``decided_by`` for human approval decisions.
|
||||
trace_id: Caller-supplied UUID v4 for cross-event correlation.
|
||||
A fresh UUID is generated when omitted.
|
||||
**extra: Additional key-value pairs appended verbatim to the JSON
|
||||
object (e.g. ``target_workspace_id``, ``memory_scope``,
|
||||
``attempt``). Built-in keys cannot be overridden.
|
||||
|
||||
Returns:
|
||||
The ``trace_id`` used for this event, enabling callers to chain
|
||||
related events under a single correlation identifier.
|
||||
|
||||
Example::
|
||||
|
||||
trace = log_event(
|
||||
event_type="delegation",
|
||||
action="delegate",
|
||||
resource="billing-agent",
|
||||
outcome="success",
|
||||
target_workspace_id="billing-agent",
|
||||
attempt=1,
|
||||
)
|
||||
"""
|
||||
if trace_id is None:
|
||||
trace_id = str(uuid.uuid4())
|
||||
|
||||
event: dict[str, Any] = {
|
||||
"timestamp": datetime.now(timezone.utc).isoformat(),
|
||||
"event_type": event_type,
|
||||
"workspace_id": WORKSPACE_ID,
|
||||
"actor": actor if actor is not None else WORKSPACE_ID,
|
||||
"action": action,
|
||||
"resource": resource,
|
||||
"outcome": outcome,
|
||||
"trace_id": trace_id,
|
||||
}
|
||||
|
||||
# Merge extra fields — built-in keys are not overridable
|
||||
for key, value in extra.items():
|
||||
if key not in event:
|
||||
event[key] = value
|
||||
|
||||
_write_event(event)
|
||||
return trace_id
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Internal writer
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _ensure_log_dir(path: str) -> None:
|
||||
"""Create the parent directory for *path* if it does not already exist."""
|
||||
Path(path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
|
||||
def _write_event(event: dict[str, Any]) -> None:
|
||||
"""Serialise *event* as a JSON line and fsync-append it to the log file.
|
||||
|
||||
The write is atomic with respect to other threads in this process: the
|
||||
lock ensures that no two JSON objects are interleaved on the same line.
|
||||
|
||||
Failures are emitted to the standard Python logger at WARNING level but
|
||||
are **never** re-raised — the application must not crash because audit
|
||||
logging is temporarily unavailable (e.g. disk full, permission error).
|
||||
In production, consider wiring an alert on WARNING messages from this
|
||||
module so that missing audit records are detected quickly.
|
||||
"""
|
||||
try:
|
||||
log_path = AUDIT_LOG_PATH
|
||||
_ensure_log_dir(log_path)
|
||||
line = json.dumps(event, default=str, ensure_ascii=False) + "\n"
|
||||
with _write_lock:
|
||||
with open(log_path, "a", encoding="utf-8") as fh:
|
||||
fh.write(line)
|
||||
fh.flush()
|
||||
os.fsync(fh.fileno())
|
||||
except Exception as exc: # pylint: disable=broad-except
|
||||
logger.warning(
|
||||
"Audit log write failed — event NOT persisted "
|
||||
"(trace_id=%s, action=%s): %s",
|
||||
event.get("trace_id", "?"),
|
||||
event.get("action", "?"),
|
||||
exc,
|
||||
)
|
||||
@@ -1,122 +0,0 @@
|
||||
"""Workspace-scoped awareness backend wrapper.
|
||||
|
||||
The agent-facing memory tools keep their existing signatures and delegate
|
||||
to this helper when workspace awareness is configured.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import sys
|
||||
from types import SimpleNamespace
|
||||
from typing import Any
|
||||
|
||||
from policies.namespaces import resolve_awareness_namespace
|
||||
|
||||
try: # pragma: no cover - optional runtime dependency in lightweight test envs
|
||||
import httpx # type: ignore
|
||||
except ImportError: # pragma: no cover
|
||||
httpx = SimpleNamespace(AsyncClient=None)
|
||||
|
||||
|
||||
DEFAULT_AWARENESS_TIMEOUT = 10.0
|
||||
|
||||
|
||||
def get_awareness_config() -> dict[str, str] | None:
|
||||
"""Return awareness connection settings if the workspace is configured."""
|
||||
base_url = os.environ.get("AWARENESS_URL", "").rstrip("/")
|
||||
workspace_id = os.environ.get("WORKSPACE_ID", "")
|
||||
configured_namespace = os.environ.get("AWARENESS_NAMESPACE", "")
|
||||
if not base_url:
|
||||
return None
|
||||
if not workspace_id and not configured_namespace:
|
||||
return None
|
||||
namespace = resolve_awareness_namespace(workspace_id, configured_namespace)
|
||||
return {
|
||||
"base_url": base_url,
|
||||
"namespace": namespace,
|
||||
}
|
||||
|
||||
|
||||
class AwarenessClient:
|
||||
"""Small HTTP client for workspace-scoped awareness memory operations."""
|
||||
|
||||
def __init__(self, base_url: str, namespace: str, timeout: float = DEFAULT_AWARENESS_TIMEOUT):
|
||||
self.base_url = base_url.rstrip("/")
|
||||
self.namespace = namespace
|
||||
self.timeout = timeout
|
||||
|
||||
def _memories_url(self) -> str:
|
||||
# Keep the awareness path isolated in one helper so the contract can
|
||||
# be adjusted later without touching the agent-facing tools.
|
||||
return f"{self.base_url}/api/v1/namespaces/{self.namespace}/memories"
|
||||
|
||||
async def commit(self, content: str, scope: str) -> dict[str, Any]:
|
||||
client_cls = _resolve_async_client()
|
||||
async with client_cls(timeout=self.timeout) as client:
|
||||
resp = await client.post(
|
||||
self._memories_url(),
|
||||
json={"content": content, "scope": scope},
|
||||
)
|
||||
return _parse_commit_response(resp, scope)
|
||||
|
||||
async def search(self, query: str = "", scope: str = "") -> dict[str, Any]:
|
||||
params: dict[str, str] = {}
|
||||
if query:
|
||||
params["q"] = query
|
||||
if scope:
|
||||
params["scope"] = scope
|
||||
|
||||
client_cls = _resolve_async_client()
|
||||
async with client_cls(timeout=self.timeout) as client:
|
||||
resp = await client.get(self._memories_url(), params=params)
|
||||
return _parse_search_response(resp)
|
||||
|
||||
|
||||
def build_awareness_client() -> AwarenessClient | None:
|
||||
"""Create an awareness client from the current workspace environment."""
|
||||
config = get_awareness_config()
|
||||
if not config:
|
||||
return None
|
||||
return AwarenessClient(config["base_url"], config["namespace"])
|
||||
|
||||
|
||||
def _parse_commit_response(resp: httpx.Response, scope: str) -> dict[str, Any]:
|
||||
data = _safe_json(resp)
|
||||
if resp.status_code in (200, 201):
|
||||
return {"success": True, "id": data.get("id"), "scope": scope}
|
||||
return {"success": False, "error": data.get("error", resp.text)}
|
||||
|
||||
|
||||
def _parse_search_response(resp: httpx.Response) -> dict[str, Any]:
|
||||
data = _safe_json(resp)
|
||||
if resp.status_code == 200:
|
||||
memories = data if isinstance(data, list) else data.get("memories", [])
|
||||
return {
|
||||
"success": True,
|
||||
"count": len(memories),
|
||||
"memories": memories,
|
||||
}
|
||||
return {"success": False, "error": data.get("error", resp.text)}
|
||||
|
||||
|
||||
def _safe_json(resp: httpx.Response) -> dict[str, Any] | list[Any]:
|
||||
try:
|
||||
return resp.json()
|
||||
except ValueError:
|
||||
return {"error": resp.text}
|
||||
|
||||
|
||||
def _resolve_async_client():
|
||||
client_cls = getattr(httpx, "AsyncClient", None)
|
||||
if client_cls is not None:
|
||||
return client_cls
|
||||
|
||||
memory_module = sys.modules.get("builtin_tools.memory")
|
||||
if memory_module is not None:
|
||||
memory_httpx = getattr(memory_module, "httpx", None)
|
||||
client_cls = getattr(memory_httpx, "AsyncClient", None)
|
||||
if client_cls is not None:
|
||||
return client_cls
|
||||
|
||||
raise RuntimeError("httpx.AsyncClient is unavailable")
|
||||
@@ -1,359 +0,0 @@
|
||||
"""OWASP Top 10 for Agentic Applications compliance enforcement (Dec 2025).
|
||||
|
||||
Enable via config.yaml::
|
||||
|
||||
compliance:
|
||||
mode: owasp_agentic
|
||||
prompt_injection: detect # detect | block
|
||||
max_tool_calls_per_task: 50
|
||||
max_task_duration_seconds: 300
|
||||
|
||||
When ``mode`` is absent or empty, this module is a no-op — no overhead, no
|
||||
behaviour change. This makes it safe to import unconditionally.
|
||||
|
||||
Coverage
|
||||
--------
|
||||
|
||||
OA-01 Prompt Injection (``sanitize_input``)
|
||||
Scans user-supplied text for instruction-override patterns, role-hijacking
|
||||
attempts, system-prompt delimiter injection, and known jailbreak keywords.
|
||||
|
||||
- ``detect`` (default): log an audit event, return the original text so
|
||||
the agent still processes the input. Operators are alerted without
|
||||
breaking legitimate use-cases that happen to contain trigger words.
|
||||
|
||||
- ``block``: raise ``PromptInjectionError`` before the agent sees the text.
|
||||
|
||||
OA-03 Excessive Agency (``check_agency_limits``)
|
||||
Tracks the number of tool calls and wall-clock time elapsed per task.
|
||||
When a limit is exceeded, ``ExcessiveAgencyError`` is raised. The caller
|
||||
(``a2a_executor.py``) catches it and terminates the task gracefully.
|
||||
|
||||
OA-02 / OA-06 Insecure Output / Sensitive Data Exposure (``redact_pii``)
|
||||
Scans agent output for credit-card numbers, SSNs, API keys, AWS access
|
||||
keys, and e-mail addresses. Detected values are replaced with
|
||||
``[REDACTED:<type>]`` tokens before the response reaches the caller.
|
||||
An audit event records the PII types found (not the values themselves).
|
||||
|
||||
Note on streaming: ``redact_pii`` is applied to the *final accumulated
|
||||
text* before the terminal ``Message`` event is emitted. Token-by-token
|
||||
SSE artifacts that have already been sent to streaming clients are not
|
||||
retroactively redacted. For full streaming redaction, integrate
|
||||
``redact_pii`` at the ``TaskArtifactUpdateEvent`` level.
|
||||
|
||||
Compliance posture report (``get_compliance_posture``)
|
||||
Returns the current effective compliance configuration as a plain ``dict``
|
||||
suitable for a health or audit endpoint, letting operators verify that the
|
||||
correct settings are active without reading config files.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import re
|
||||
import time
|
||||
import uuid
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
from builtin_tools.audit import log_event
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public exceptions
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class PromptInjectionError(ValueError):
|
||||
"""Raised when prompt injection is detected and ``prompt_injection=block``."""
|
||||
|
||||
|
||||
class ExcessiveAgencyError(RuntimeError):
|
||||
"""Raised when the tool-call count or task-duration limit is exceeded."""
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# OA-01 — Prompt Injection detection
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Compiled patterns matched against normalised (lowercased + collapsed) input.
|
||||
#: Add workspace-specific patterns in config if needed.
|
||||
_INJECTION_PATTERNS: list[tuple[re.Pattern[str], str]] = [
|
||||
# Instruction override
|
||||
(re.compile(r"ignore\s+(all\s+)?previous\s+instructions?", re.I), "instruction_override"),
|
||||
(re.compile(r"disregard\s+(all\s+)?previous", re.I), "instruction_override"),
|
||||
(re.compile(r"forget\s+(all\s+)?previous", re.I), "instruction_override"),
|
||||
(re.compile(r"override\s+(your\s+)?(instructions?|guidelines?|rules?)", re.I), "instruction_override"),
|
||||
# Role hijacking
|
||||
(re.compile(r"you\s+are\s+now\s+\w", re.I), "role_hijack"),
|
||||
(re.compile(r"act\s+as\s+(a\s+)?(new\s+|different\s+|unrestricted\s+)", re.I), "role_hijack"),
|
||||
(re.compile(r"roleplay\s+as", re.I), "role_hijack"),
|
||||
(re.compile(r"pretend\s+(you\s+are|to\s+be)\b", re.I), "role_hijack"),
|
||||
(re.compile(r"from\s+now\s+on\s+(you\s+are|act\s+as)", re.I), "role_hijack"),
|
||||
# System-prompt delimiter injection (LLM-specific tokens)
|
||||
(re.compile(r"<\|?\s*(system|im_start|im_end|endoftext)\s*\|?>", re.I), "delimiter_injection"),
|
||||
(re.compile(r"\[INST\]|\[/INST\]|\[\[SYS\]\]|\[\[/SYS\]\]", re.I), "delimiter_injection"),
|
||||
(re.compile(r"<</SYS>>|<<SYS>>", re.I), "delimiter_injection"),
|
||||
# DAN / jailbreak keywords
|
||||
(re.compile(r"\bDAN\b.{0,30}(mode|now|enabled|activated)", re.I), "jailbreak"),
|
||||
(re.compile(r"do\s+anything\s+now", re.I), "jailbreak"),
|
||||
(re.compile(r"\bjailbreak\b", re.I), "jailbreak"),
|
||||
(re.compile(r"developer\s+mode\s+(enabled|on)", re.I), "jailbreak"),
|
||||
# Prompt exfiltration
|
||||
(re.compile(r"(repeat|print|output|show|reveal|display)\s+(your\s+)?(system\s+prompt|initial\s+instructions?)", re.I), "prompt_exfiltration"),
|
||||
(re.compile(r"what\s+(are\s+)?your\s+(instructions?|system\s+prompt)", re.I), "prompt_exfiltration"),
|
||||
]
|
||||
|
||||
|
||||
def detect_prompt_injection(text: str) -> list[tuple[str, str]]:
|
||||
"""Return a list of ``(pattern_description, category)`` for each match.
|
||||
|
||||
Args:
|
||||
text: Raw user input to scan.
|
||||
|
||||
Returns:
|
||||
List of ``(matched_pattern, category)`` tuples; empty means clean.
|
||||
"""
|
||||
matches: list[tuple[str, str]] = []
|
||||
for pattern, category in _INJECTION_PATTERNS:
|
||||
m = pattern.search(text)
|
||||
if m:
|
||||
matches.append((m.group(0)[:80], category))
|
||||
return matches
|
||||
|
||||
|
||||
def sanitize_input(
|
||||
text: str,
|
||||
*,
|
||||
prompt_injection_mode: str = "detect",
|
||||
context_id: str = "",
|
||||
) -> str:
|
||||
"""Check *text* for prompt injection and enforce the configured response.
|
||||
|
||||
Args:
|
||||
text: User-supplied input to the agent.
|
||||
prompt_injection_mode: ``"detect"`` or ``"block"``.
|
||||
context_id: Task/context identifier for audit correlation.
|
||||
|
||||
Returns:
|
||||
The original *text* unchanged (``detect`` mode always returns input).
|
||||
|
||||
Raises:
|
||||
:class:`PromptInjectionError`: only when ``prompt_injection_mode="block"``
|
||||
and at least one injection pattern is matched.
|
||||
"""
|
||||
matches = detect_prompt_injection(text)
|
||||
if not matches:
|
||||
return text
|
||||
|
||||
categories = list({cat for _, cat in matches})
|
||||
trace_id = str(uuid.uuid4())
|
||||
|
||||
log_event(
|
||||
event_type="compliance",
|
||||
action="prompt_injection.detect",
|
||||
resource="user_input",
|
||||
outcome="detected" if prompt_injection_mode == "detect" else "blocked",
|
||||
trace_id=trace_id,
|
||||
context_id=context_id,
|
||||
categories=categories,
|
||||
match_count=len(matches),
|
||||
# Log category + truncated match, never the full raw text (OA-06)
|
||||
matches=[{"category": cat, "snippet": snippet} for snippet, cat in matches[:5]],
|
||||
)
|
||||
|
||||
if prompt_injection_mode == "block":
|
||||
raise PromptInjectionError(
|
||||
f"Prompt injection detected ({', '.join(categories)}). "
|
||||
"Request blocked by compliance policy."
|
||||
)
|
||||
|
||||
# detect mode — log and continue
|
||||
logger.warning(
|
||||
"Prompt injection patterns detected (context_id=%s, categories=%s) — "
|
||||
"passing to agent in detect mode",
|
||||
context_id,
|
||||
categories,
|
||||
)
|
||||
return text
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# OA-03 — Excessive Agency
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@dataclass
|
||||
class AgencyTracker:
|
||||
"""Per-task mutable state for excessive-agency enforcement.
|
||||
|
||||
Instantiate once per ``execute()`` call and pass to
|
||||
:func:`check_agency_limits` at each tool-start event.
|
||||
"""
|
||||
|
||||
max_tool_calls: int = 50
|
||||
max_duration_seconds: float = 300.0
|
||||
tool_call_count: int = field(default=0, init=False)
|
||||
start_time: float = field(default_factory=time.monotonic, init=False)
|
||||
|
||||
def on_tool_call(self, tool_name: str = "", context_id: str = "") -> None:
|
||||
"""Increment counter and enforce limits.
|
||||
|
||||
Raises:
|
||||
:class:`ExcessiveAgencyError`: if either limit is exceeded.
|
||||
"""
|
||||
self.tool_call_count += 1
|
||||
elapsed = time.monotonic() - self.start_time
|
||||
|
||||
if self.tool_call_count > self.max_tool_calls:
|
||||
log_event(
|
||||
event_type="compliance",
|
||||
action="excessive_agency.tool_limit",
|
||||
resource=tool_name or "unknown_tool",
|
||||
outcome="blocked",
|
||||
context_id=context_id,
|
||||
tool_call_count=self.tool_call_count,
|
||||
limit=self.max_tool_calls,
|
||||
elapsed_seconds=round(elapsed, 2),
|
||||
)
|
||||
raise ExcessiveAgencyError(
|
||||
f"Tool call limit exceeded: {self.tool_call_count} calls > "
|
||||
f"max {self.max_tool_calls} per task"
|
||||
)
|
||||
|
||||
if elapsed > self.max_duration_seconds:
|
||||
log_event(
|
||||
event_type="compliance",
|
||||
action="excessive_agency.duration_limit",
|
||||
resource=tool_name or "unknown_tool",
|
||||
outcome="blocked",
|
||||
context_id=context_id,
|
||||
tool_call_count=self.tool_call_count,
|
||||
elapsed_seconds=round(elapsed, 2),
|
||||
limit_seconds=self.max_duration_seconds,
|
||||
)
|
||||
raise ExcessiveAgencyError(
|
||||
f"Task duration limit exceeded: {elapsed:.0f}s > "
|
||||
f"max {self.max_duration_seconds:.0f}s per task"
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# OA-02 / OA-06 — PII redaction
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: ``(compiled_pattern, replacement_token)`` pairs applied in order.
|
||||
#: The replacement tokens are SIEM-friendly: ``[REDACTED:type]``.
|
||||
_PII_PATTERNS: list[tuple[re.Pattern[str], str]] = [
|
||||
# Formatted credit cards: XXXX-XXXX-XXXX-XXXX or XXXX XXXX XXXX XXXX
|
||||
(re.compile(r"\b\d{4}[\s\-]\d{4}[\s\-]\d{4}[\s\-]\d{4}\b"), "[REDACTED:credit_card]"),
|
||||
# US Social Security Numbers: XXX-XX-XXXX
|
||||
(re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED:ssn]"),
|
||||
# OpenAI-style keys: sk-... (≥ 32 chars after prefix)
|
||||
(re.compile(r"\bsk-[A-Za-z0-9_\-]{32,}\b"), "[REDACTED:api_key]"),
|
||||
# Generic API/secret keys with common prefixes
|
||||
(re.compile(r"\b(?:sk|pk|api|secret|token|auth)[-_][A-Za-z0-9_\-]{20,}\b", re.I), "[REDACTED:api_key]"),
|
||||
# AWS Access Key IDs
|
||||
(re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED:aws_key]"),
|
||||
# GitHub personal access tokens — classic format (36-char alphanumeric suffix)
|
||||
(re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "[REDACTED:github_token]"),
|
||||
# GitHub personal access tokens — fine-grained format (82-char alphanumeric+underscore suffix)
|
||||
(re.compile(r"\bgithub_pat_[A-Za-z0-9_]{82}\b"), "[REDACTED:github_token]"),
|
||||
# Email addresses
|
||||
(re.compile(r"\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b"), "[REDACTED:email]"),
|
||||
]
|
||||
|
||||
|
||||
def redact_pii(text: str) -> tuple[str, list[str]]:
|
||||
"""Redact PII from *text* and return ``(redacted_text, pii_types_found)``.
|
||||
|
||||
Each unique PII type is reported at most once in ``pii_types_found``.
|
||||
The replacement tokens (``[REDACTED:type]``) are SIEM-indexable and
|
||||
preserve the structural context of the output while hiding sensitive data.
|
||||
|
||||
Args:
|
||||
text: Agent output text to scan.
|
||||
|
||||
Returns:
|
||||
Tuple of ``(redacted_text, list_of_pii_type_strings)``. The list is
|
||||
empty when no PII is detected (the common case).
|
||||
|
||||
Examples::
|
||||
|
||||
>>> redacted, types = redact_pii("Call me at test@example.com sk-abc123...")
|
||||
>>> "email" in types
|
||||
True
|
||||
>>> "[REDACTED:email]" in redacted
|
||||
True
|
||||
"""
|
||||
found: list[str] = []
|
||||
result = text
|
||||
for pattern, replacement in _PII_PATTERNS:
|
||||
new_result = pattern.sub(replacement, result)
|
||||
if new_result != result:
|
||||
# Extract type from "[REDACTED:type]"
|
||||
pii_type = replacement[len("[REDACTED:"):-1]
|
||||
if pii_type not in found:
|
||||
found.append(pii_type)
|
||||
result = new_result
|
||||
return result, found
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Compliance posture report
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def get_compliance_posture() -> dict[str, Any]:
|
||||
"""Return the current compliance configuration as a serialisable dict.
|
||||
|
||||
Loads ``WorkspaceConfig`` lazily (cached) and returns a snapshot of the
|
||||
active compliance settings. Safe to call from a health endpoint.
|
||||
|
||||
Returns a dict with these keys::
|
||||
|
||||
{
|
||||
"compliance_mode": "owasp_agentic" | "",
|
||||
"enabled": true | false,
|
||||
"prompt_injection": "detect" | "block",
|
||||
"max_tool_calls_per_task": 50,
|
||||
"max_task_duration_seconds": 300,
|
||||
"pii_redaction_enabled": true,
|
||||
"security_scan_mode": "warn" | "block" | "off",
|
||||
"rbac_roles": ["operator"],
|
||||
}
|
||||
"""
|
||||
try:
|
||||
from builtin_tools.audit import _load_workspace_config
|
||||
cfg = _load_workspace_config()
|
||||
except Exception:
|
||||
cfg = None
|
||||
|
||||
if cfg is None:
|
||||
return {
|
||||
"compliance_mode": "",
|
||||
"enabled": False,
|
||||
"prompt_injection": "detect",
|
||||
"max_tool_calls_per_task": 50,
|
||||
"max_task_duration_seconds": 300,
|
||||
"pii_redaction_enabled": False,
|
||||
"security_scan_mode": "warn",
|
||||
"rbac_roles": [],
|
||||
"note": "config unavailable",
|
||||
}
|
||||
|
||||
c = cfg.compliance
|
||||
enabled = c.mode == "owasp_agentic"
|
||||
return {
|
||||
"compliance_mode": c.mode,
|
||||
"enabled": enabled,
|
||||
"prompt_injection": c.prompt_injection,
|
||||
"max_tool_calls_per_task": c.max_tool_calls_per_task,
|
||||
"max_task_duration_seconds": c.max_task_duration_seconds,
|
||||
# PII redaction is active whenever compliance mode is on
|
||||
"pii_redaction_enabled": enabled,
|
||||
"security_scan_mode": cfg.security_scan.mode,
|
||||
"rbac_roles": list(cfg.rbac.roles),
|
||||
}
|
||||
@@ -1,550 +0,0 @@
|
||||
"""Async delegation tool for sending tasks to peer workspaces via A2A.
|
||||
|
||||
Delegations are non-blocking: the tool fires the A2A request in the background
|
||||
and returns immediately with a task_id. The agent can check status anytime via
|
||||
check_task_status, or just continue working and check later.
|
||||
|
||||
When the delegate responds, the result is stored and the agent is notified
|
||||
via a status update.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import uuid
|
||||
from dataclasses import dataclass, field
|
||||
from enum import Enum
|
||||
from typing import Optional
|
||||
|
||||
import httpx
|
||||
from langchain_core.tools import tool
|
||||
|
||||
from builtin_tools.audit import check_permission, get_workspace_roles, log_event
|
||||
from builtin_tools.telemetry import (
|
||||
A2A_SOURCE_WORKSPACE,
|
||||
A2A_TARGET_WORKSPACE,
|
||||
A2A_TASK_ID,
|
||||
WORKSPACE_ID_ATTR,
|
||||
get_current_traceparent,
|
||||
get_tracer,
|
||||
inject_trace_headers,
|
||||
)
|
||||
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
|
||||
DELEGATION_RETRY_ATTEMPTS = int(os.environ.get("DELEGATION_RETRY_ATTEMPTS", "3"))
|
||||
DELEGATION_RETRY_DELAY = float(os.environ.get("DELEGATION_RETRY_DELAY", "5.0"))
|
||||
DELEGATION_TIMEOUT = float(os.environ.get("DELEGATION_TIMEOUT", "300.0"))
|
||||
|
||||
|
||||
class DelegationStatus(str, Enum):
|
||||
PENDING = "pending"
|
||||
IN_PROGRESS = "in_progress"
|
||||
# QUEUED: peer's a2a-proxy returned HTTP 202 + {queued: true}, meaning
|
||||
# the peer is mid-task and the request was placed in a drain queue.
|
||||
# The reply will arrive via the platform's stitch path when the
|
||||
# peer finishes its current work. The LLM should WAIT, not retry,
|
||||
# and definitely not fall back to doing the work itself — see the
|
||||
# check_task_status docstring for the prompt-side guidance.
|
||||
QUEUED = "queued"
|
||||
COMPLETED = "completed"
|
||||
FAILED = "failed"
|
||||
|
||||
|
||||
@dataclass
|
||||
class DelegationTask:
|
||||
task_id: str
|
||||
workspace_id: str
|
||||
task_description: str
|
||||
status: DelegationStatus = DelegationStatus.PENDING
|
||||
result: Optional[str] = None
|
||||
error: Optional[str] = None
|
||||
|
||||
|
||||
# In-memory store of delegation tasks for this workspace
|
||||
_delegations: dict[str, DelegationTask] = {}
|
||||
_background_tasks: set[asyncio.Task] = set()
|
||||
MAX_DELEGATION_HISTORY = 100
|
||||
logger = __import__("logging").getLogger(__name__)
|
||||
|
||||
|
||||
def _evict_old_delegations():
|
||||
"""Remove completed/failed delegations when store exceeds MAX_DELEGATION_HISTORY."""
|
||||
if len(_delegations) <= MAX_DELEGATION_HISTORY:
|
||||
return
|
||||
# Evict oldest completed/failed first
|
||||
removable = [
|
||||
tid for tid, d in _delegations.items()
|
||||
if d.status in (DelegationStatus.COMPLETED, DelegationStatus.FAILED)
|
||||
]
|
||||
for tid in removable[:len(_delegations) - MAX_DELEGATION_HISTORY]:
|
||||
del _delegations[tid]
|
||||
|
||||
|
||||
def _on_task_done(task: asyncio.Task):
|
||||
"""Callback for background tasks — log unhandled exceptions."""
|
||||
_background_tasks.discard(task)
|
||||
if not task.cancelled() and task.exception():
|
||||
logger.error("Delegation background task failed: %s", task.exception())
|
||||
|
||||
|
||||
async def _notify_completion(task_id: str, target_workspace_id: str, status: str):
|
||||
"""Push notification to platform when delegation completes/fails."""
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10) as client:
|
||||
await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/notify",
|
||||
json={
|
||||
"type": "delegation_complete",
|
||||
"task_id": task_id,
|
||||
"target_workspace_id": target_workspace_id,
|
||||
"status": status,
|
||||
},
|
||||
)
|
||||
except Exception as e:
|
||||
logger.debug("Delegation notify failed (best-effort): %s", e)
|
||||
|
||||
|
||||
async def _record_delegation_on_platform(task_id: str, target_workspace_id: str, task: str):
|
||||
"""Register the delegation in the platform's activity_logs (#64 fix).
|
||||
|
||||
Best-effort POST to /workspaces/<self>/delegations/record. The agent still
|
||||
fires A2A directly for speed + OTEL propagation, but the platform's
|
||||
GET /delegations endpoint now mirrors the same set an agent's local
|
||||
check_task_status sees.
|
||||
"""
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10) as client:
|
||||
await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/delegations/record",
|
||||
json={
|
||||
"target_id": target_workspace_id,
|
||||
"task": task,
|
||||
"delegation_id": task_id,
|
||||
},
|
||||
)
|
||||
except Exception as e:
|
||||
logger.debug("Delegation record failed (best-effort): %s", e)
|
||||
|
||||
|
||||
async def _refresh_queued_from_platform(task_id: str) -> bool:
|
||||
"""Lazy-refresh a QUEUED delegation's local state from the platform.
|
||||
|
||||
Called by check_task_status when local status is QUEUED. The
|
||||
platform's drain stitch (a2a_queue.go) updates the delegate_result
|
||||
activity_logs row when a queued delegation eventually completes,
|
||||
but it has no callback to this runtime — without this lazy refresh,
|
||||
the LLM polling check_task_status would see "queued" forever
|
||||
even after the platform has the result.
|
||||
|
||||
Returns True if the local delegation was updated to a terminal state
|
||||
(completed/failed), False otherwise. Best-effort — network/parse
|
||||
errors leave the local state untouched and let the next call retry.
|
||||
"""
|
||||
delegation = _delegations.get(task_id)
|
||||
if not delegation:
|
||||
return False
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10) as client:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/delegations",
|
||||
headers={},
|
||||
)
|
||||
if resp.status_code != 200:
|
||||
return False
|
||||
entries = resp.json()
|
||||
if not isinstance(entries, list):
|
||||
return False
|
||||
except Exception as e:
|
||||
logger.debug("refresh queued delegation %s: %s", task_id, e)
|
||||
return False
|
||||
# Find the latest delegate_result row matching our task_id.
|
||||
# Platform list is newest-first; the first match is the freshest.
|
||||
for entry in entries:
|
||||
if entry.get("delegation_id") != task_id:
|
||||
continue
|
||||
if entry.get("type") != "delegation":
|
||||
continue
|
||||
# Only delegate_result rows carry the eventual outcome; the
|
||||
# initial 'delegate' row stays at status='pending' even after
|
||||
# the result lands. Filtering on summary text is brittle, but
|
||||
# the rows from the LIST endpoint don't include `method`. The
|
||||
# `delegate_result` rows are the ones with `error` (failure)
|
||||
# or `response_preview` (success) populated — pick those.
|
||||
status = entry.get("status", "")
|
||||
if status == "completed":
|
||||
delegation.status = DelegationStatus.COMPLETED
|
||||
delegation.result = entry.get("response_preview", "")
|
||||
await _notify_completion(task_id, delegation.workspace_id, "completed")
|
||||
return True
|
||||
if status == "failed":
|
||||
delegation.status = DelegationStatus.FAILED
|
||||
delegation.error = entry.get("error", "")
|
||||
await _notify_completion(task_id, delegation.workspace_id, "failed")
|
||||
return True
|
||||
# status == "queued" / "pending" / "dispatched": platform hasn't
|
||||
# resolved yet; leave local state unchanged so the next poll
|
||||
# retries. Don't break — keep scanning in case there's a newer
|
||||
# entry for the same task_id (possible if the same delegation
|
||||
# was retried).
|
||||
return False
|
||||
|
||||
|
||||
async def _update_delegation_on_platform(task_id: str, status: str, error: str = "", response_preview: str = ""):
|
||||
"""Mirror status changes to the platform's activity_logs (#64 fix).
|
||||
|
||||
Paired with _record_delegation_on_platform — fires on completion/failure
|
||||
so the platform view stays in sync with the agent's local dict.
|
||||
"""
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10) as client:
|
||||
await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/delegations/{task_id}/update",
|
||||
json={
|
||||
"status": status,
|
||||
"error": error,
|
||||
"response_preview": response_preview[:500],
|
||||
},
|
||||
)
|
||||
except Exception as e:
|
||||
logger.debug("Delegation update failed (best-effort): %s", e)
|
||||
|
||||
|
||||
async def _execute_delegation(task_id: str, workspace_id: str, task: str):
|
||||
"""Background coroutine that sends the A2A request and stores the result."""
|
||||
delegation = _delegations[task_id]
|
||||
delegation.status = DelegationStatus.IN_PROGRESS
|
||||
|
||||
# #64: register on the platform so GET /workspaces/<self>/delegations
|
||||
# sees the same set as check_task_status. Best-effort — platform
|
||||
# unreachability must not block the actual A2A delegation.
|
||||
await _record_delegation_on_platform(task_id, workspace_id, task)
|
||||
|
||||
tracer = get_tracer()
|
||||
with tracer.start_as_current_span("task_delegate") as delegate_span:
|
||||
delegate_span.set_attribute(WORKSPACE_ID_ATTR, WORKSPACE_ID)
|
||||
delegate_span.set_attribute(A2A_SOURCE_WORKSPACE, WORKSPACE_ID)
|
||||
delegate_span.set_attribute(A2A_TARGET_WORKSPACE, workspace_id)
|
||||
delegate_span.set_attribute(A2A_TASK_ID, task_id)
|
||||
|
||||
async with httpx.AsyncClient(timeout=DELEGATION_TIMEOUT) as client:
|
||||
# Discover target URL
|
||||
try:
|
||||
discover_resp = await client.get(
|
||||
f"{PLATFORM_URL}/registry/discover/{workspace_id}",
|
||||
headers={"X-Workspace-ID": WORKSPACE_ID},
|
||||
)
|
||||
if discover_resp.status_code != 200:
|
||||
delegation.status = DelegationStatus.FAILED
|
||||
delegation.error = f"Discovery failed: HTTP {discover_resp.status_code}"
|
||||
log_event(event_type="delegation", action="delegate", resource=workspace_id,
|
||||
outcome="failure", trace_id=task_id, reason="discovery_error")
|
||||
return
|
||||
|
||||
target_url = discover_resp.json().get("url")
|
||||
if not target_url:
|
||||
delegation.status = DelegationStatus.FAILED
|
||||
delegation.error = "No URL for workspace"
|
||||
return
|
||||
except Exception as e:
|
||||
delegation.status = DelegationStatus.FAILED
|
||||
delegation.error = f"Discovery error: {e}"
|
||||
return
|
||||
|
||||
# Send A2A with retry
|
||||
outgoing_headers = inject_trace_headers({
|
||||
"Content-Type": "application/json",
|
||||
"X-Workspace-ID": WORKSPACE_ID,
|
||||
})
|
||||
traceparent = get_current_traceparent()
|
||||
|
||||
last_error = None
|
||||
for attempt in range(DELEGATION_RETRY_ATTEMPTS):
|
||||
try:
|
||||
a2a_resp = await client.post(
|
||||
target_url,
|
||||
headers=outgoing_headers,
|
||||
json={
|
||||
"jsonrpc": "2.0",
|
||||
"method": "message/send",
|
||||
"id": f"delegation-{task_id}-{attempt}",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"parts": [{"kind": "text", "text": task}],
|
||||
"messageId": f"msg-{task_id}-{attempt}",
|
||||
},
|
||||
"metadata": {
|
||||
"parent_task_id": task_id,
|
||||
"source_workspace_id": WORKSPACE_ID,
|
||||
"traceparent": traceparent,
|
||||
},
|
||||
},
|
||||
},
|
||||
)
|
||||
|
||||
# HTTP 202 + {queued: true} = peer's a2a-proxy
|
||||
# accepted the request but the peer's runtime is
|
||||
# mid-task. Platform-side drain will deliver the
|
||||
# reply asynchronously. Mark QUEUED locally so
|
||||
# check_task_status can surface that state
|
||||
# to the LLM with explicit "wait, don't bypass"
|
||||
# guidance. Do NOT mark FAILED — the request is
|
||||
# alive in the platform's queue, not lost.
|
||||
#
|
||||
# Without this branch, the loop falls through, the
|
||||
# `if "error" in result` line below references an
|
||||
# unbound `result`, and the eventual FAILED status
|
||||
# leads the LLM to conclude the peer is permanently
|
||||
# unavailable — at which point it does the delegated
|
||||
# work itself, defeating the whole orchestration.
|
||||
if a2a_resp.status_code == 202:
|
||||
try:
|
||||
queued_body = a2a_resp.json()
|
||||
except Exception:
|
||||
queued_body = {}
|
||||
if queued_body.get("queued") is True:
|
||||
delegation.status = DelegationStatus.QUEUED
|
||||
log_event(
|
||||
event_type="delegation", action="delegate",
|
||||
resource=workspace_id, outcome="queued",
|
||||
trace_id=task_id, attempt=attempt + 1,
|
||||
)
|
||||
await _notify_completion(task_id, workspace_id, "queued")
|
||||
await _update_delegation_on_platform(
|
||||
task_id, "queued", "", "",
|
||||
)
|
||||
return
|
||||
|
||||
if a2a_resp.status_code == 200:
|
||||
try:
|
||||
result = a2a_resp.json()
|
||||
except Exception:
|
||||
delegation.status = DelegationStatus.FAILED
|
||||
delegation.error = "Invalid JSON response"
|
||||
return
|
||||
|
||||
if "result" in result:
|
||||
task_result = result["result"]
|
||||
artifacts = task_result.get("artifacts", [])
|
||||
texts = []
|
||||
for artifact in artifacts:
|
||||
for part in artifact.get("parts", []):
|
||||
if part.get("kind") == "text":
|
||||
texts.append(part["text"])
|
||||
# Also check top-level parts
|
||||
for part in task_result.get("parts", []):
|
||||
if part.get("kind") == "text":
|
||||
texts.append(part["text"])
|
||||
|
||||
delegation.status = DelegationStatus.COMPLETED
|
||||
delegation.result = "\n".join(texts) if texts else str(task_result)
|
||||
log_event(event_type="delegation", action="delegate", resource=workspace_id,
|
||||
outcome="success", trace_id=task_id, attempt=attempt + 1)
|
||||
await _notify_completion(task_id, workspace_id, "completed")
|
||||
# #64: mirror to platform activity_logs so
|
||||
# GET /delegations shows the completion state.
|
||||
await _update_delegation_on_platform(
|
||||
task_id, "completed", "",
|
||||
delegation.result or "",
|
||||
)
|
||||
return
|
||||
|
||||
if "error" in result:
|
||||
last_error = result["error"].get("message", str(result["error"]))
|
||||
break
|
||||
|
||||
except (httpx.ConnectError, httpx.TimeoutException) as e:
|
||||
last_error = str(e)
|
||||
if attempt < DELEGATION_RETRY_ATTEMPTS - 1:
|
||||
await asyncio.sleep(DELEGATION_RETRY_DELAY * (attempt + 1))
|
||||
continue
|
||||
|
||||
delegation.status = DelegationStatus.FAILED
|
||||
delegation.error = str(last_error)
|
||||
log_event(event_type="delegation", action="delegate", resource=workspace_id,
|
||||
outcome="failure", trace_id=task_id, last_error=str(last_error))
|
||||
await _notify_completion(task_id, workspace_id, "failed")
|
||||
# #64: mirror failure to platform activity_logs.
|
||||
await _update_delegation_on_platform(
|
||||
task_id, "failed", str(last_error), "",
|
||||
)
|
||||
|
||||
|
||||
@tool
|
||||
async def delegate_task(
|
||||
workspace_id: str,
|
||||
task: str,
|
||||
) -> str:
|
||||
"""Delegate a task to a peer workspace via A2A and WAIT for the response.
|
||||
|
||||
Synchronous variant — blocks until the peer replies (or the platform's
|
||||
A2A round-trip times out). Use this for QUICK questions and small
|
||||
sub-tasks where you can afford to wait inline.
|
||||
|
||||
For longer-running work (research, multi-minute jobs) use
|
||||
delegate_task_async + check_task_status instead so you don't hold
|
||||
this workspace busy waiting.
|
||||
|
||||
Tool name + description are sourced from the platform_tools registry —
|
||||
a single ToolSpec drives MCP, LangChain, and system-prompt docs.
|
||||
"""
|
||||
from a2a_tools import tool_delegate_task
|
||||
return await tool_delegate_task(workspace_id, task)
|
||||
|
||||
|
||||
@tool
|
||||
async def delegate_task_async(
|
||||
workspace_id: str,
|
||||
task: str,
|
||||
) -> dict:
|
||||
"""Delegate a task to a peer workspace via A2A protocol (non-blocking).
|
||||
|
||||
Sends the task in the background and returns immediately with a task_id.
|
||||
Use check_task_status to poll for the result, or continue working
|
||||
and check later. The delegate works independently.
|
||||
|
||||
Args:
|
||||
workspace_id: The ID of the target workspace to delegate to.
|
||||
task: The task description to send to the peer.
|
||||
|
||||
Returns:
|
||||
A dict with task_id and status="delegated". Use check_task_status(task_id) to get results.
|
||||
"""
|
||||
task_id = str(uuid.uuid4())
|
||||
|
||||
# Task #190 / #193 — Self-delegation guard (async path). Even on the
|
||||
# async path that returns a task_id immediately, _execute_delegation
|
||||
# eventually fires the A2A POST back to our own URL, which times out
|
||||
# against our own held run lock, gets recorded with source_id=our
|
||||
# workspace UUID, and surfaces in the inbox as a peer_agent message
|
||||
# from ourselves (#190). Reject before scheduling the background task
|
||||
# so no peer_agent echo can be generated. Sibling guards:
|
||||
# - workspace-server/internal/handlers/delegation.go (Go API gate)
|
||||
# - workspace/a2a_tools_delegation.py (MCP sync + async paths)
|
||||
# - workspace/builtin_tools/a2a_tools.py (framework-agnostic sync)
|
||||
if WORKSPACE_ID and workspace_id == WORKSPACE_ID:
|
||||
log_event(event_type="delegation", action="delegate", resource=workspace_id,
|
||||
outcome="rejected_self_delegation", trace_id=task_id)
|
||||
return {
|
||||
"success": False,
|
||||
"error": (
|
||||
"self-delegation rejected: cannot delegate_task_async to your "
|
||||
"own workspace (would time out and echo back as a peer_agent "
|
||||
"message from yourself — #190)"
|
||||
),
|
||||
}
|
||||
|
||||
# RBAC check
|
||||
roles, custom_perms = get_workspace_roles()
|
||||
if not check_permission("delegate", roles, custom_perms):
|
||||
log_event(event_type="rbac", action="rbac.deny", resource=workspace_id,
|
||||
outcome="denied", trace_id=task_id, attempted_action="delegate", roles=roles)
|
||||
return {"success": False, "error": f"RBAC: no 'delegate' permission. Roles: {roles}"}
|
||||
|
||||
log_event(event_type="delegation", action="delegate", resource=workspace_id,
|
||||
outcome="dispatched", trace_id=task_id, task_preview=task[:200])
|
||||
|
||||
# Store the delegation and launch background task
|
||||
delegation = DelegationTask(
|
||||
task_id=task_id,
|
||||
workspace_id=workspace_id,
|
||||
task_description=task[:200],
|
||||
)
|
||||
_delegations[task_id] = delegation
|
||||
_evict_old_delegations()
|
||||
|
||||
bg_task = asyncio.create_task(_execute_delegation(task_id, workspace_id, task))
|
||||
_background_tasks.add(bg_task)
|
||||
bg_task.add_done_callback(_on_task_done)
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"task_id": task_id,
|
||||
"status": "delegated",
|
||||
"message": f"Task delegated to {workspace_id}. Use check_task_status('{task_id}') to get the result when ready.",
|
||||
}
|
||||
|
||||
|
||||
@tool
|
||||
async def check_task_status(
|
||||
task_id: str = "",
|
||||
) -> dict:
|
||||
"""Check the status of a delegated task, or list all active delegations.
|
||||
|
||||
Status semantics — IMPORTANT:
|
||||
|
||||
- "pending" / "in_progress" → peer is actively working. Wait and check again.
|
||||
- "queued" → peer's a2a-proxy accepted the call but the peer is
|
||||
processing a prior task. The reply WILL arrive — the platform's
|
||||
drain re-dispatches when the peer is free. This tool transparently
|
||||
polls the platform for the eventual outcome on each call, so
|
||||
keep polling check_task_status periodically and you'll see
|
||||
the status flip to "completed" / "failed" automatically.
|
||||
Do NOT retry the delegation. Do NOT do the work yourself.
|
||||
Acknowledge to the user that the peer is busy and will reply,
|
||||
then continue with other delegations or check back later.
|
||||
- "completed" → result is in the `result` field.
|
||||
- "failed" → real failure (network, peer crashed, etc.). The
|
||||
`error` field has the cause. Only fall back to doing the work
|
||||
yourself if status is "failed", never if status is "queued".
|
||||
|
||||
Args:
|
||||
task_id: The task_id returned by delegate_task_async. If empty, lists all delegations.
|
||||
|
||||
Returns:
|
||||
Status and result (if completed) of the delegation.
|
||||
"""
|
||||
if not task_id:
|
||||
# List all delegations
|
||||
summary = []
|
||||
for tid, d in _delegations.items():
|
||||
entry = {
|
||||
"task_id": tid,
|
||||
"workspace_id": d.workspace_id,
|
||||
"status": d.status.value,
|
||||
"task": d.task_description,
|
||||
}
|
||||
if d.status == DelegationStatus.COMPLETED:
|
||||
entry["result_preview"] = (d.result or "")[:200]
|
||||
if d.status == DelegationStatus.FAILED:
|
||||
entry["error"] = d.error
|
||||
summary.append(entry)
|
||||
return {"delegations": summary, "count": len(summary)}
|
||||
|
||||
delegation = _delegations.get(task_id)
|
||||
if not delegation:
|
||||
return {"error": f"No delegation found with task_id {task_id}"}
|
||||
|
||||
# Lazy refresh for QUEUED entries: the platform's drain stitch
|
||||
# updates its activity_logs row when the queued delegation
|
||||
# eventually completes, but doesn't push back to this runtime.
|
||||
# Without this refresh, the LLM polling here would see "queued"
|
||||
# forever even after the result is available — exactly the bug
|
||||
# the upstream director-bypass docstring guidance warned against.
|
||||
if delegation.status == DelegationStatus.QUEUED:
|
||||
await _refresh_queued_from_platform(task_id)
|
||||
# delegation is the same dict entry — _refresh mutates in-place.
|
||||
|
||||
result = {
|
||||
"task_id": task_id,
|
||||
"workspace_id": delegation.workspace_id,
|
||||
"status": delegation.status.value,
|
||||
"task": delegation.task_description,
|
||||
}
|
||||
|
||||
if delegation.status == DelegationStatus.COMPLETED:
|
||||
result["result"] = delegation.result
|
||||
elif delegation.status == DelegationStatus.FAILED:
|
||||
result["error"] = delegation.error
|
||||
|
||||
# RFC #2251 V1.0 reproduction-harness instrumentation. Every poll of
|
||||
# check_task_status emits a phase=check_status line so the harness
|
||||
# operator can tell whether a coordinator stuck for 8 minutes was
|
||||
# polling-children-the-whole-time vs synthesizing-after-children-done.
|
||||
# `grep rfc2251_phase=check_status` in the workspace's container log
|
||||
# gives the polling pattern. Strip when V1.0 ships.
|
||||
logger.info(
|
||||
"rfc2251_phase=check_status task_id=%s peer=%s status=%s",
|
||||
task_id, delegation.workspace_id, delegation.status.value,
|
||||
)
|
||||
return result
|
||||
@@ -1,403 +0,0 @@
|
||||
"""Bridge between Molecule AI's RBAC + audit subsystem and the Microsoft Agent
|
||||
Governance Toolkit (agent-os-kernel, released April 2, 2026).
|
||||
|
||||
Integration points
|
||||
------------------
|
||||
* ``check_permission`` → ``PolicyEvaluator.evaluate()``
|
||||
Molecule AI's RBAC gate runs first; if RBAC allows the action the toolkit
|
||||
evaluator is consulted according to ``policy_mode``.
|
||||
|
||||
* ``log_event`` → governance audit sink
|
||||
Every permission decision (allow or deny) is written via
|
||||
``tools.audit.log_event`` with extra governance metadata so the full
|
||||
decision trail lands in Molecule AI's existing audit stream.
|
||||
|
||||
* OTEL traceparent flows through
|
||||
``tools.telemetry.get_current_traceparent()`` is called inside ``emit()``
|
||||
and the W3C traceparent string is attached to every audit record, giving
|
||||
end-to-end distributed tracing across agent boundaries.
|
||||
|
||||
Graceful degradation
|
||||
--------------------
|
||||
If ``agent-os-kernel`` is not installed the module falls back to Molecule AI
|
||||
RBAC alone. No exception propagates to the agent — governance is a
|
||||
best-effort overlay, never a hard dependency.
|
||||
|
||||
Install::
|
||||
|
||||
pip install agent-os-kernel
|
||||
|
||||
Minimal config.yaml snippet::
|
||||
|
||||
governance:
|
||||
enabled: true
|
||||
toolkit: microsoft
|
||||
policy_mode: strict # strict | permissive | audit
|
||||
policy_endpoint: https://your-tenant.governance.azure.com
|
||||
policy_file: policies/workspace.rego
|
||||
blocked_patterns:
|
||||
- ".*\\.exec$"
|
||||
- "shell\\."
|
||||
max_tool_calls_per_task: 50
|
||||
|
||||
NOTE: The agent-os-kernel package was released April 2, 2026 and is in
|
||||
community preview. The API bindings in this module target v3.0.x of the
|
||||
package (agent_os.policies.PolicyEvaluator). If the package API changes,
|
||||
update _init_evaluator() accordingly.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
from typing import Any, Optional
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
WORKSPACE_ID: str = os.environ.get("WORKSPACE_ID", "")
|
||||
|
||||
# Module-level singleton — set by initialize_governance() at startup
|
||||
_adapter: Optional["GovernanceAdapter"] = None
|
||||
|
||||
|
||||
class GovernanceAdapter:
|
||||
"""Bridges Molecule AI RBAC + audit trail to the Microsoft Agent Governance Toolkit."""
|
||||
|
||||
def __init__(self, config: Any) -> None:
|
||||
self._config = config
|
||||
self._evaluator = None
|
||||
self._toolkit_available: bool = False
|
||||
|
||||
async def initialize(self) -> None:
|
||||
"""Async entry point: initialise evaluator and log outcome."""
|
||||
self._init_evaluator()
|
||||
if self._toolkit_available:
|
||||
logger.info(
|
||||
"GovernanceAdapter initialised — toolkit=%s mode=%s",
|
||||
self._config.toolkit,
|
||||
self._config.policy_mode,
|
||||
)
|
||||
else:
|
||||
logger.warning(
|
||||
"GovernanceAdapter initialised in RBAC-only mode "
|
||||
"(agent-os-kernel not available or failed to load)."
|
||||
)
|
||||
|
||||
def _init_evaluator(self) -> None:
|
||||
"""Lazy-import and configure the PolicyEvaluator from agent-os-kernel.
|
||||
|
||||
All failures are caught and logged; the adapter simply runs without
|
||||
the toolkit rather than crashing the workspace.
|
||||
"""
|
||||
try:
|
||||
try:
|
||||
from agent_os.policies import PolicyEvaluator # type: ignore[import]
|
||||
except ImportError:
|
||||
logger.warning(
|
||||
"agent-os-kernel is not installed — graceful degradation active. "
|
||||
"Governance will use Molecule AI RBAC only. "
|
||||
"To enable the Microsoft Agent Governance Toolkit run: "
|
||||
"pip install agent-os-kernel"
|
||||
)
|
||||
return
|
||||
|
||||
kwargs: dict[str, Any] = {
|
||||
"policy_mode": self._config.policy_mode,
|
||||
"max_tool_calls_per_task": self._config.max_tool_calls_per_task,
|
||||
"blocked_patterns": self._config.blocked_patterns,
|
||||
}
|
||||
if self._config.policy_endpoint:
|
||||
kwargs["endpoint"] = self._config.policy_endpoint
|
||||
|
||||
self._evaluator = PolicyEvaluator(**kwargs)
|
||||
|
||||
# Load a policy file if one is configured and exists on disk.
|
||||
if self._config.policy_file:
|
||||
policy_file = self._config.policy_file
|
||||
if os.path.exists(policy_file):
|
||||
ext = os.path.splitext(policy_file)[1].lower()
|
||||
if ext == ".rego":
|
||||
self._evaluator.load_rego(path=policy_file)
|
||||
logger.info("Loaded Rego policy file: %s", policy_file)
|
||||
elif ext in (".yaml", ".yml"):
|
||||
self._evaluator.load_yaml(path=policy_file)
|
||||
logger.info("Loaded YAML policy file: %s", policy_file)
|
||||
elif ext == ".cedar":
|
||||
self._evaluator.load_cedar(path=policy_file)
|
||||
logger.info("Loaded Cedar policy file: %s", policy_file)
|
||||
else:
|
||||
logger.warning(
|
||||
"Unrecognised policy file extension '%s' — skipping load.",
|
||||
ext,
|
||||
)
|
||||
else:
|
||||
logger.warning(
|
||||
"policy_file '%s' does not exist — skipping load.",
|
||||
policy_file,
|
||||
)
|
||||
|
||||
self._toolkit_available = True
|
||||
logger.info(
|
||||
"agent-os-kernel PolicyEvaluator ready — policy_mode=%s",
|
||||
self._config.policy_mode,
|
||||
)
|
||||
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning(
|
||||
"Failed to initialise agent-os-kernel PolicyEvaluator: %s — "
|
||||
"graceful degradation active (RBAC only).",
|
||||
exc,
|
||||
)
|
||||
|
||||
def check_permission(
|
||||
self,
|
||||
action: str,
|
||||
roles: list[str],
|
||||
custom_permissions: dict | None = None,
|
||||
context: dict | None = None,
|
||||
) -> tuple[bool, str]:
|
||||
"""Evaluate an action against Molecule AI RBAC and (optionally) the toolkit.
|
||||
|
||||
Returns
|
||||
-------
|
||||
tuple[bool, str]
|
||||
``(allowed, reason)`` — reason is a short human-readable string
|
||||
explaining the decision.
|
||||
"""
|
||||
from builtin_tools import audit # inline import to avoid circular dependencies
|
||||
|
||||
context = context or {}
|
||||
|
||||
# --- Step 1: Molecule AI RBAC gate (always runs) ---
|
||||
rbac_allowed: bool = audit.check_permission(action, roles, custom_permissions)
|
||||
|
||||
if not rbac_allowed:
|
||||
self.emit(
|
||||
event_type="permission_check",
|
||||
action=action,
|
||||
resource=context.get("resource", ""),
|
||||
outcome="denied",
|
||||
actor=context.get("actor"),
|
||||
policy_decision="rbac_deny",
|
||||
roles=roles,
|
||||
)
|
||||
return False, f"RBAC denied action '{action}' for roles {roles}"
|
||||
|
||||
# --- Step 2: If toolkit unavailable or audit-only mode, return RBAC result ---
|
||||
if not self._toolkit_available or self._config.policy_mode == "audit":
|
||||
self.emit(
|
||||
event_type="permission_check",
|
||||
action=action,
|
||||
resource=context.get("resource", ""),
|
||||
outcome="allowed",
|
||||
actor=context.get("actor"),
|
||||
policy_decision="rbac_allowed",
|
||||
roles=roles,
|
||||
toolkit_mode=self._config.policy_mode,
|
||||
)
|
||||
return rbac_allowed, "rbac_allowed"
|
||||
|
||||
# --- Step 3: Toolkit evaluation ---
|
||||
eval_context: dict[str, Any] = {
|
||||
"action": action,
|
||||
"resource": context.get("resource", ""),
|
||||
"roles": roles,
|
||||
"workspace_id": WORKSPACE_ID,
|
||||
}
|
||||
# Merge any extra context keys the caller supplied.
|
||||
for key, value in context.items():
|
||||
if key not in eval_context:
|
||||
eval_context[key] = value
|
||||
|
||||
toolkit_allowed: bool = True
|
||||
reason: str = ""
|
||||
evaluator_name: str = "agent-os-kernel"
|
||||
|
||||
try:
|
||||
decision = self._evaluator.evaluate(eval_context)
|
||||
toolkit_allowed = getattr(decision, "allowed", True)
|
||||
reason = getattr(decision, "reason", "")
|
||||
evaluator_name = getattr(decision, "evaluator_name", "agent-os-kernel")
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning(
|
||||
"agent-os-kernel evaluation raised an exception: %s — "
|
||||
"falling back to RBAC result to avoid blocking the agent.",
|
||||
exc,
|
||||
)
|
||||
self.emit(
|
||||
event_type="permission_check",
|
||||
action=action,
|
||||
resource=context.get("resource", ""),
|
||||
outcome="allowed",
|
||||
actor=context.get("actor"),
|
||||
policy_decision="toolkit_evaluation_error",
|
||||
toolkit_mode=self._config.policy_mode,
|
||||
roles=roles,
|
||||
)
|
||||
return rbac_allowed, "toolkit_evaluation_error"
|
||||
|
||||
# --- Step 4: Combine results according to policy_mode ---
|
||||
if self._config.policy_mode == "permissive":
|
||||
# Toolkit denial is advisory only in permissive mode.
|
||||
if not toolkit_allowed:
|
||||
logger.warning(
|
||||
"Governance toolkit denied action '%s' (reason=%s) but policy_mode "
|
||||
"is 'permissive' — allowing and logging advisory denial.",
|
||||
action,
|
||||
reason,
|
||||
)
|
||||
final_allowed = rbac_allowed
|
||||
else:
|
||||
# strict: both gates must allow.
|
||||
final_allowed = rbac_allowed and toolkit_allowed
|
||||
|
||||
outcome = "allowed" if final_allowed else "denied"
|
||||
self.emit(
|
||||
event_type="permission_check",
|
||||
action=action,
|
||||
resource=context.get("resource", ""),
|
||||
outcome=outcome,
|
||||
actor=context.get("actor"),
|
||||
policy_decision=reason or outcome,
|
||||
evaluator=evaluator_name,
|
||||
toolkit_mode=self._config.policy_mode,
|
||||
roles=roles,
|
||||
)
|
||||
return final_allowed, reason or "allowed"
|
||||
|
||||
def emit(
|
||||
self,
|
||||
event_type: str,
|
||||
action: str,
|
||||
resource: str,
|
||||
outcome: str,
|
||||
actor: str | None = None,
|
||||
trace_id: str | None = None,
|
||||
**extra: Any,
|
||||
) -> str:
|
||||
"""Write a governance-annotated audit event.
|
||||
|
||||
Pulls the current W3C traceparent from the active OTEL span so that
|
||||
governance decisions are traceable across service boundaries.
|
||||
|
||||
Returns
|
||||
-------
|
||||
str
|
||||
The ``trace_id`` produced by ``audit.log_event``.
|
||||
"""
|
||||
from builtin_tools import audit # inline import to avoid circular dependencies
|
||||
from builtin_tools.telemetry import get_current_traceparent # inline import
|
||||
|
||||
traceparent: str | None = get_current_traceparent()
|
||||
|
||||
recorded_trace_id: str = audit.log_event(
|
||||
event_type,
|
||||
action,
|
||||
resource,
|
||||
outcome,
|
||||
actor=actor,
|
||||
trace_id=trace_id,
|
||||
governance_toolkit=(
|
||||
self._config.toolkit if self._toolkit_available else "disabled"
|
||||
),
|
||||
traceparent=traceparent or "",
|
||||
**extra,
|
||||
)
|
||||
return recorded_trace_id
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Module-level functions
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
async def initialize_governance(config: Any) -> Optional[GovernanceAdapter]:
|
||||
"""Initialize the module-level GovernanceAdapter singleton.
|
||||
|
||||
Called once at startup by main.py when governance.enabled is True.
|
||||
Returns the adapter, or None if initialization fails.
|
||||
"""
|
||||
global _adapter
|
||||
|
||||
try:
|
||||
adapter = GovernanceAdapter(config)
|
||||
await adapter.initialize()
|
||||
_adapter = adapter
|
||||
logger.info(
|
||||
"Governance singleton initialised — toolkit=%s mode=%s",
|
||||
config.toolkit,
|
||||
config.policy_mode,
|
||||
)
|
||||
return adapter
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning(
|
||||
"initialize_governance() failed: %s — governance disabled for this session.",
|
||||
exc,
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
def get_governance_adapter() -> Optional[GovernanceAdapter]:
|
||||
"""Return the module-level GovernanceAdapter singleton (may be None)."""
|
||||
return _adapter
|
||||
|
||||
|
||||
def check_permission_with_governance(
|
||||
action: str,
|
||||
roles: list[str],
|
||||
custom_permissions: dict | None = None,
|
||||
context: dict | None = None,
|
||||
) -> tuple[bool, str]:
|
||||
"""Convenience wrapper: use GovernanceAdapter when available, else RBAC only.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
action:
|
||||
The action name to evaluate (e.g. ``"memory.write"``).
|
||||
roles:
|
||||
The list of role names held by the requesting actor.
|
||||
custom_permissions:
|
||||
Optional custom role→action mapping to overlay on built-in roles.
|
||||
context:
|
||||
Optional extra context forwarded to the PolicyEvaluator.
|
||||
|
||||
Returns
|
||||
-------
|
||||
tuple[bool, str]
|
||||
``(allowed, reason)``
|
||||
"""
|
||||
if _adapter is None:
|
||||
from builtin_tools import audit # inline import to avoid circular dependencies
|
||||
|
||||
result: bool = audit.check_permission(action, roles, custom_permissions)
|
||||
return result, "rbac_only"
|
||||
|
||||
return _adapter.check_permission(action, roles, custom_permissions, context)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Private helper
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _emit_governance_event(
|
||||
event_type: str,
|
||||
action: str,
|
||||
resource: str,
|
||||
outcome: str,
|
||||
actor: str | None = None,
|
||||
trace_id: str | None = None,
|
||||
**extra: Any,
|
||||
) -> Optional[str]:
|
||||
"""Emit a governance audit event via the singleton adapter if one is set.
|
||||
|
||||
Returns the trace_id produced by log_event, or None if no adapter is set.
|
||||
"""
|
||||
if _adapter is None:
|
||||
return None
|
||||
return _adapter.emit(
|
||||
event_type,
|
||||
action,
|
||||
resource,
|
||||
outcome,
|
||||
actor=actor,
|
||||
trace_id=trace_id,
|
||||
**extra,
|
||||
)
|
||||
@@ -1,561 +0,0 @@
|
||||
"""Human-In-The-Loop (HITL) workflow primitives.
|
||||
|
||||
Generalizes the approval tool into reusable HITL building blocks that work
|
||||
across all Molecule AI adapters.
|
||||
|
||||
Features
|
||||
--------
|
||||
@requires_approval
|
||||
Decorator that gates *any* async callable (tool, method, standalone fn)
|
||||
behind a human approval request. The decorated function only runs if
|
||||
the request is granted. Roles in ``hitl.bypass_roles`` skip the gate.
|
||||
|
||||
pause_task / resume_task
|
||||
LangChain tools for explicit pause/resume of in-flight tasks. An agent
|
||||
calls ``pause_task(task_id, reason)`` to suspend itself; an external
|
||||
signal (webhook, dashboard click, another agent) calls ``resume_task``
|
||||
with the same task_id to wake it up.
|
||||
|
||||
Notification channels
|
||||
---------------------
|
||||
Configured under ``hitl:`` in ``config.yaml``:
|
||||
|
||||
hitl:
|
||||
channels:
|
||||
- type: dashboard # always active; uses platform approval API
|
||||
- type: slack
|
||||
webhook_url: https://hooks.slack.com/services/…
|
||||
- type: email
|
||||
smtp_host: smtp.example.com
|
||||
smtp_port: 587
|
||||
from: alerts@example.com
|
||||
to: ops@example.com
|
||||
username: alerts@example.com # optional; password from SMTP_PASSWORD env
|
||||
default_timeout: 300 # seconds before an unanswered request times out
|
||||
bypass_roles: [admin] # roles that skip the approval gate entirely
|
||||
|
||||
Environment variables
|
||||
---------------------
|
||||
SMTP_PASSWORD Password for SMTP authentication (preferred over config file)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import functools
|
||||
import logging
|
||||
import os
|
||||
import smtplib
|
||||
from dataclasses import dataclass, field
|
||||
from email.mime.text import MIMEText
|
||||
from typing import Any, Callable
|
||||
|
||||
import httpx
|
||||
from langchain_core.tools import tool
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Config
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@dataclass
|
||||
class HITLConfig:
|
||||
"""HITL settings loaded from the ``hitl:`` block in config.yaml."""
|
||||
channels: list[dict] = field(default_factory=lambda: [{"type": "dashboard"}])
|
||||
default_timeout: float = 300.0
|
||||
bypass_roles: list[str] = field(default_factory=list)
|
||||
|
||||
|
||||
def _load_hitl_config() -> HITLConfig:
|
||||
"""Load HITL config from workspace config; fall back to safe defaults."""
|
||||
try:
|
||||
from config import load_config
|
||||
cfg = load_config()
|
||||
raw = getattr(cfg, "hitl", None)
|
||||
if raw is None:
|
||||
return HITLConfig()
|
||||
return HITLConfig(
|
||||
channels=raw.channels if hasattr(raw, "channels") else [{"type": "dashboard"}],
|
||||
default_timeout=float(raw.default_timeout if hasattr(raw, "default_timeout") else 300),
|
||||
bypass_roles=list(raw.bypass_roles if hasattr(raw, "bypass_roles") else []),
|
||||
)
|
||||
except Exception:
|
||||
return HITLConfig()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Pause / Resume registry
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class _TaskPauseRegistry:
|
||||
"""In-process registry mapping task_id → asyncio.Event + optional result.
|
||||
|
||||
Multiple coroutines awaiting the same task_id are all unblocked when
|
||||
``resume()`` is called. Results survive until the awaiting coroutine
|
||||
calls ``pop_result()``.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._events: dict[str, asyncio.Event] = {}
|
||||
self._results: dict[str, dict] = {}
|
||||
# #265: owner map — workspace_id that created each task.
|
||||
# Empty string means "no owner / legacy" (bypasses ownership check).
|
||||
self._owners: dict[str, str] = {}
|
||||
|
||||
def register(self, task_id: str, owner: str = "") -> asyncio.Event:
|
||||
"""Create and store an Event for *task_id*. Returns the event.
|
||||
|
||||
Args:
|
||||
task_id: Unique task identifier.
|
||||
owner: Workspace ID that owns this task. When set, ``resume``
|
||||
will reject callers from a different workspace.
|
||||
"""
|
||||
ev = asyncio.Event()
|
||||
self._events[task_id] = ev
|
||||
self._owners[task_id] = owner
|
||||
return ev
|
||||
|
||||
def resume(self, task_id: str, result: dict | None = None, owner: str = "") -> bool:
|
||||
"""Signal the Event for *task_id*. Returns False if not registered.
|
||||
|
||||
Args:
|
||||
task_id: The identifier used in ``register``.
|
||||
result: Optional result payload forwarded to the waiting coroutine.
|
||||
owner: Caller's workspace ID. When both the stored owner and
|
||||
*owner* are non-empty and they differ, the call is rejected
|
||||
(returns False) — prevents cross-workspace prompt injection
|
||||
(#265). Passing ``owner=""`` bypasses the check (used in
|
||||
direct registry calls from tests and platform code).
|
||||
"""
|
||||
# #265 ownership check
|
||||
stored_owner = self._owners.get(task_id, "")
|
||||
if owner and stored_owner and owner != stored_owner:
|
||||
logger.warning(
|
||||
"HITL: resume rejected for task %s — caller workspace %r != owner %r",
|
||||
task_id, owner, stored_owner,
|
||||
)
|
||||
return False
|
||||
ev = self._events.get(task_id)
|
||||
if ev is None:
|
||||
return False
|
||||
self._results[task_id] = result or {}
|
||||
ev.set()
|
||||
return True
|
||||
|
||||
def pop_result(self, task_id: str) -> dict:
|
||||
"""Return and remove the stored result for *task_id*."""
|
||||
return self._results.pop(task_id, {})
|
||||
|
||||
def cleanup(self, task_id: str) -> None:
|
||||
"""Remove *task_id* from all dicts."""
|
||||
self._events.pop(task_id, None)
|
||||
self._results.pop(task_id, None)
|
||||
self._owners.pop(task_id, None)
|
||||
|
||||
def list_paused(self) -> list[str]:
|
||||
"""Return IDs of tasks whose events have not yet been set."""
|
||||
return [tid for tid, ev in self._events.items() if not ev.is_set()]
|
||||
|
||||
|
||||
# Global singleton — safe within one asyncio event loop / process
|
||||
pause_registry = _TaskPauseRegistry()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Notification channels
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
async def _notify_channels(
|
||||
action: str,
|
||||
reason: str,
|
||||
approval_id: str,
|
||||
cfg: HITLConfig,
|
||||
) -> None:
|
||||
"""Fire-and-forget notifications to all configured channels.
|
||||
|
||||
Errors in individual channels are logged but never re-raised so that a
|
||||
misconfigured Slack webhook cannot block the approval flow.
|
||||
"""
|
||||
platform_url = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
workspace_id = os.environ.get("WORKSPACE_ID", "")
|
||||
|
||||
for channel in cfg.channels:
|
||||
ch_type = channel.get("type", "dashboard")
|
||||
try:
|
||||
if ch_type == "slack":
|
||||
await _notify_slack(channel, action, reason, approval_id,
|
||||
platform_url, workspace_id)
|
||||
elif ch_type == "email":
|
||||
await _notify_email(channel, action, reason, approval_id,
|
||||
platform_url, workspace_id)
|
||||
# "dashboard" is handled by the platform via the approval POST
|
||||
except Exception as exc:
|
||||
logger.warning("HITL: channel '%s' notification failed: %s", ch_type, exc)
|
||||
|
||||
|
||||
async def _notify_slack(
|
||||
cfg: dict,
|
||||
action: str,
|
||||
reason: str,
|
||||
approval_id: str,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
) -> None:
|
||||
webhook_url = cfg.get("webhook_url", "")
|
||||
if not webhook_url:
|
||||
return
|
||||
|
||||
approve_url = f"{platform_url}/workspaces/{workspace_id}/approvals/{approval_id}/approve"
|
||||
deny_url = f"{platform_url}/workspaces/{workspace_id}/approvals/{approval_id}/deny"
|
||||
|
||||
payload = {
|
||||
"text": f":warning: Approval required from workspace `{workspace_id}`",
|
||||
"blocks": [
|
||||
{
|
||||
"type": "section",
|
||||
"text": {
|
||||
"type": "mrkdwn",
|
||||
"text": (
|
||||
f"*Action:* {action}\n"
|
||||
f"*Reason:* {reason}\n"
|
||||
f"*Approval ID:* `{approval_id}`"
|
||||
),
|
||||
},
|
||||
},
|
||||
{
|
||||
"type": "actions",
|
||||
"elements": [
|
||||
{
|
||||
"type": "button",
|
||||
"text": {"type": "plain_text", "text": "Approve"},
|
||||
"style": "primary",
|
||||
"url": approve_url,
|
||||
},
|
||||
{
|
||||
"type": "button",
|
||||
"text": {"type": "plain_text", "text": "Deny"},
|
||||
"style": "danger",
|
||||
"url": deny_url,
|
||||
},
|
||||
],
|
||||
},
|
||||
],
|
||||
}
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
await client.post(webhook_url, json=payload)
|
||||
logger.info("HITL: Slack notification sent for approval %s", approval_id)
|
||||
|
||||
|
||||
async def _notify_email(
|
||||
cfg: dict,
|
||||
action: str,
|
||||
reason: str,
|
||||
approval_id: str,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
) -> None:
|
||||
smtp_host = cfg.get("smtp_host", "")
|
||||
smtp_port = int(cfg.get("smtp_port", 587))
|
||||
from_addr = cfg.get("from", "")
|
||||
to_addr = cfg.get("to", "")
|
||||
|
||||
if not all([smtp_host, from_addr, to_addr]):
|
||||
logger.warning("HITL: email channel missing smtp_host/from/to — skipping")
|
||||
return
|
||||
|
||||
approve_url = f"{platform_url}/workspaces/{workspace_id}/approvals/{approval_id}/approve"
|
||||
deny_url = f"{platform_url}/workspaces/{workspace_id}/approvals/{approval_id}/deny"
|
||||
|
||||
body = (
|
||||
f"Approval required from workspace {workspace_id}\n\n"
|
||||
f"Action : {action}\n"
|
||||
f"Reason : {reason}\n"
|
||||
f"ID : {approval_id}\n\n"
|
||||
f"Approve: {approve_url}\n"
|
||||
f"Deny : {deny_url}\n"
|
||||
)
|
||||
|
||||
msg = MIMEText(body, "plain", "utf-8")
|
||||
msg["Subject"] = f"[Molecule AI] Approval required: {action}"
|
||||
msg["From"] = from_addr
|
||||
msg["To"] = to_addr
|
||||
|
||||
username = cfg.get("username", "")
|
||||
password = cfg.get("password", os.environ.get("SMTP_PASSWORD", ""))
|
||||
|
||||
def _send() -> None:
|
||||
with smtplib.SMTP(smtp_host, smtp_port) as srv:
|
||||
srv.ehlo()
|
||||
srv.starttls()
|
||||
if username and password:
|
||||
srv.login(username, password)
|
||||
srv.send_message(msg)
|
||||
|
||||
await asyncio.to_thread(_send)
|
||||
logger.info("HITL: email notification sent for approval %s", approval_id)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# @requires_approval decorator
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def requires_approval(
|
||||
action_description: str = "",
|
||||
reason_template: str = "",
|
||||
bypass_roles: list[str] | None = None,
|
||||
) -> Callable[[Callable], Callable]:
|
||||
"""Decorator that gates an async callable behind a human approval request.
|
||||
|
||||
The wrapped function executes only when a human approves. Use this on
|
||||
any tool or async helper that performs destructive or high-impact work.
|
||||
|
||||
Args:
|
||||
action_description: Short label for the action shown to the approver.
|
||||
Defaults to the function's ``name`` attribute or
|
||||
``__name__``.
|
||||
reason_template: f-string template for the reason line. Keyword
|
||||
arguments of the decorated function are available,
|
||||
e.g. ``"Delete table {table_name}"``).
|
||||
bypass_roles: Roles that skip the gate entirely. Overrides
|
||||
``hitl.bypass_roles`` in config.yaml when given.
|
||||
|
||||
Returns:
|
||||
A decorator; applying it to a function returns an async wrapper.
|
||||
|
||||
Usage::
|
||||
|
||||
@tool
|
||||
@requires_approval("Wipe production DB", bypass_roles=["admin"])
|
||||
async def drop_table(table_name: str) -> dict:
|
||||
...
|
||||
|
||||
# Works with plain async functions too:
|
||||
@requires_approval("Send customer email")
|
||||
async def send_email(to: str, body: str) -> dict:
|
||||
...
|
||||
"""
|
||||
def decorator(fn: Callable) -> Callable:
|
||||
action = action_description or getattr(fn, "name", None) or fn.__name__
|
||||
|
||||
@functools.wraps(fn)
|
||||
async def wrapper(*args: Any, **kwargs: Any) -> Any:
|
||||
hitl_cfg = _load_hitl_config()
|
||||
|
||||
# --- Check bypass roles -----------------------------------------
|
||||
active_bypass = bypass_roles if bypass_roles is not None else hitl_cfg.bypass_roles
|
||||
if active_bypass:
|
||||
try:
|
||||
from builtin_tools.audit import get_workspace_roles
|
||||
roles, _ = get_workspace_roles()
|
||||
if any(r in active_bypass for r in roles):
|
||||
logger.info(
|
||||
"@requires_approval bypassed (role %s) for '%s'", roles, action
|
||||
)
|
||||
return await fn(*args, **kwargs)
|
||||
except Exception:
|
||||
pass # If RBAC check fails, proceed to approval gate
|
||||
|
||||
# --- Build reason string -----------------------------------------
|
||||
if reason_template:
|
||||
try:
|
||||
reason = reason_template.format(**kwargs)
|
||||
except (KeyError, IndexError):
|
||||
reason = reason_template
|
||||
else:
|
||||
arg_parts = [f"{k}={str(v)[:60]}" for k, v in list(kwargs.items())[:3]]
|
||||
reason = f"Args: {', '.join(arg_parts)}" if arg_parts else "Automated action"
|
||||
|
||||
# --- Fire non-dashboard notifications (async, non-blocking) ------
|
||||
asyncio.create_task(
|
||||
_notify_channels(action, reason, "pending", hitl_cfg)
|
||||
)
|
||||
|
||||
# --- Request approval via approval tool --------------------------
|
||||
try:
|
||||
from builtin_tools.approval import request_approval
|
||||
approval_result = await request_approval.ainvoke(
|
||||
{"action": action, "reason": reason}
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.error("@requires_approval: approval call failed: %s", exc)
|
||||
return {
|
||||
"success": False,
|
||||
"error": f"Approval gate error: {exc}",
|
||||
}
|
||||
|
||||
if not approval_result.get("approved"):
|
||||
# Art. 14 audit: log the denial outcome so the activity log
|
||||
# contains evidence that the human oversight gate was exercised.
|
||||
try:
|
||||
from builtin_tools.audit import log_event
|
||||
log_event(
|
||||
event_type="hitl",
|
||||
action="approve",
|
||||
resource=action,
|
||||
outcome="denied",
|
||||
actor=approval_result.get("decided_by"),
|
||||
approval_id=approval_result.get("approval_id"),
|
||||
reason=reason,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
return {
|
||||
"success": False,
|
||||
"error": (
|
||||
f"Action '{action}' not approved: "
|
||||
f"{approval_result.get('message', approval_result.get('error', 'denied'))}"
|
||||
),
|
||||
"approval_id": approval_result.get("approval_id"),
|
||||
}
|
||||
|
||||
# Art. 14 audit: log the approval grant before running the function.
|
||||
try:
|
||||
from builtin_tools.audit import log_event
|
||||
log_event(
|
||||
event_type="hitl",
|
||||
action="approve",
|
||||
resource=action,
|
||||
outcome="granted",
|
||||
actor=approval_result.get("decided_by"),
|
||||
approval_id=approval_result.get("approval_id"),
|
||||
reason=reason,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# --- Approved — run the original function ------------------------
|
||||
return await fn(*args, **kwargs)
|
||||
|
||||
return wrapper
|
||||
|
||||
return decorator
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Pause / Resume LangChain tools
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@tool
|
||||
async def pause_task(task_id: str, reason: str = "") -> dict:
|
||||
"""Suspend the current task and wait for a resume signal.
|
||||
|
||||
The agent calls this to pause itself at a decision point. Execution
|
||||
resumes when ``resume_task`` is called with the same task_id, or after
|
||||
the configured ``hitl.default_timeout`` seconds.
|
||||
|
||||
Args:
|
||||
task_id: Unique identifier for this pause point (use the A2A task ID
|
||||
or any stable string that the caller can reference later).
|
||||
reason: Human-readable description of why the task is pausing.
|
||||
"""
|
||||
# #265: record workspace ownership on registration so resume_task can
|
||||
# reject callers from a different workspace (cross-workspace prompt-injection
|
||||
# prevention). External task_id is unchanged — only internal ownership
|
||||
# metadata is added, so no tests or callers need to update their task IDs.
|
||||
_ws = os.environ.get("WORKSPACE_ID", "")
|
||||
|
||||
try:
|
||||
from builtin_tools.audit import log_event
|
||||
log_event(
|
||||
event_type="hitl",
|
||||
action="pause",
|
||||
resource=task_id,
|
||||
outcome="paused",
|
||||
trace_id=task_id,
|
||||
reason=reason,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
event = pause_registry.register(task_id, owner=_ws)
|
||||
timeout = _load_hitl_config().default_timeout
|
||||
logger.info("HITL: task %s paused — %s", task_id, reason or "(no reason given)")
|
||||
|
||||
try:
|
||||
await asyncio.wait_for(event.wait(), timeout=timeout)
|
||||
result = pause_registry.pop_result(task_id)
|
||||
logger.info("HITL: task %s resumed", task_id)
|
||||
try:
|
||||
from builtin_tools.audit import log_event
|
||||
log_event(
|
||||
event_type="hitl",
|
||||
action="resume",
|
||||
resource=task_id,
|
||||
outcome="resumed",
|
||||
trace_id=task_id,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
return {"resumed": True, "task_id": task_id, **result}
|
||||
|
||||
except asyncio.TimeoutError:
|
||||
logger.warning("HITL: task %s timed out after %.0fs", task_id, timeout)
|
||||
try:
|
||||
from builtin_tools.audit import log_event
|
||||
log_event(
|
||||
event_type="hitl",
|
||||
action="pause",
|
||||
resource=task_id,
|
||||
outcome="timeout",
|
||||
trace_id=task_id,
|
||||
timeout_seconds=timeout,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
return {
|
||||
"resumed": False,
|
||||
"task_id": task_id,
|
||||
"error": f"Timed out after {timeout:.0f}s waiting for resume signal",
|
||||
}
|
||||
finally:
|
||||
pause_registry.cleanup(task_id)
|
||||
|
||||
|
||||
@tool
|
||||
async def resume_task(task_id: str, message: str = "") -> dict:
|
||||
"""Resume a previously paused task.
|
||||
|
||||
Signals the ``pause_task`` coroutine waiting on *task_id* to continue.
|
||||
Safe to call even if the task has already resumed or timed out (returns
|
||||
success=False in that case).
|
||||
|
||||
Args:
|
||||
task_id: The identifier passed to ``pause_task``.
|
||||
message: Optional message forwarded to the resumed task.
|
||||
"""
|
||||
# #265: pass caller's workspace ID so the registry can reject a resume
|
||||
# from a different workspace (ownership check in _TaskPauseRegistry.resume).
|
||||
_ws = os.environ.get("WORKSPACE_ID", "")
|
||||
|
||||
result_payload = {"message": message} if message else {}
|
||||
success = pause_registry.resume(task_id, result_payload, owner=_ws)
|
||||
|
||||
if success:
|
||||
logger.info("HITL: resume signal sent for task %s", task_id)
|
||||
try:
|
||||
from builtin_tools.audit import log_event
|
||||
log_event(
|
||||
event_type="hitl",
|
||||
action="resume",
|
||||
resource=task_id,
|
||||
outcome="success",
|
||||
trace_id=task_id,
|
||||
message=message,
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
return {"success": True, "task_id": task_id}
|
||||
|
||||
return {
|
||||
"success": False,
|
||||
"task_id": task_id,
|
||||
"error": "Task not found or already resumed",
|
||||
}
|
||||
|
||||
|
||||
@tool
|
||||
async def list_paused_tasks() -> dict:
|
||||
"""List all tasks currently suspended and waiting for a resume signal."""
|
||||
paused = pause_registry.list_paused()
|
||||
return {"paused_tasks": paused, "count": len(paused)}
|
||||
@@ -1,470 +0,0 @@
|
||||
"""HMA memory tools for agents.
|
||||
|
||||
Hierarchical Memory Architecture:
|
||||
- LOCAL: private to this workspace, invisible to others
|
||||
- TEAM: shared with parent + siblings (same team)
|
||||
- GLOBAL: readable by all, writable by root workspaces only
|
||||
|
||||
RBAC enforcement
|
||||
----------------
|
||||
``commit_memory`` requires the ``"memory.write"`` action.
|
||||
``recall_memory`` requires the ``"memory.read"`` action.
|
||||
Roles are read from ``config.yaml`` under ``rbac.roles`` (default: operator).
|
||||
|
||||
Audit trail
|
||||
-----------
|
||||
Every memory operation appends a JSON Lines record to the audit log:
|
||||
|
||||
memory / memory.write / allowed — write permitted by RBAC
|
||||
memory / memory.write / success — write committed successfully
|
||||
memory / memory.write / failure — write failed (platform error)
|
||||
memory / memory.read / allowed — read permitted by RBAC
|
||||
memory / memory.read / success — search returned results
|
||||
memory / memory.read / failure — search failed (platform error)
|
||||
|
||||
RBAC denials emit ``rbac / rbac.deny / denied`` events instead.
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import uuid
|
||||
from types import SimpleNamespace
|
||||
from typing import Any
|
||||
|
||||
from langchain_core.tools import tool
|
||||
from builtin_tools.awareness_client import build_awareness_client
|
||||
from builtin_tools.audit import check_permission, get_workspace_roles, log_event
|
||||
from builtin_tools.security import _redact_secrets
|
||||
from builtin_tools.telemetry import MEMORY_QUERY, MEMORY_SCOPE, WORKSPACE_ID_ATTR, get_tracer
|
||||
|
||||
try: # pragma: no cover - optional runtime dependency in lightweight test envs
|
||||
import httpx # type: ignore
|
||||
except ImportError: # pragma: no cover
|
||||
httpx = SimpleNamespace(AsyncClient=None)
|
||||
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "")
|
||||
|
||||
|
||||
@tool
|
||||
async def commit_memory(content: str, scope: str = "LOCAL") -> dict:
|
||||
"""Store a fact in memory with a specific scope.
|
||||
|
||||
Args:
|
||||
content: The fact or knowledge to remember.
|
||||
scope: Memory scope — LOCAL (private), TEAM (shared with team), or GLOBAL (company-wide, root only).
|
||||
"""
|
||||
content = _redact_secrets(content)
|
||||
trace_id = str(uuid.uuid4())
|
||||
scope = scope.upper()
|
||||
if scope not in ("LOCAL", "TEAM", "GLOBAL"):
|
||||
return {"error": "scope must be LOCAL, TEAM, or GLOBAL"}
|
||||
|
||||
# --- RBAC check -----------------------------------------------------------
|
||||
roles, custom_perms = get_workspace_roles()
|
||||
if not check_permission("memory.write", roles, custom_perms):
|
||||
log_event(
|
||||
event_type="rbac",
|
||||
action="rbac.deny",
|
||||
resource=scope,
|
||||
outcome="denied",
|
||||
trace_id=trace_id,
|
||||
attempted_action="memory.write",
|
||||
roles=roles,
|
||||
)
|
||||
return {
|
||||
"success": False,
|
||||
"error": (
|
||||
"RBAC: this workspace does not have the 'memory.write' permission. "
|
||||
f"Current roles: {roles}"
|
||||
),
|
||||
}
|
||||
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.write",
|
||||
resource=scope,
|
||||
outcome="allowed",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope,
|
||||
content_length=len(content),
|
||||
)
|
||||
|
||||
# ── OTEL: memory_write span ──────────────────────────────────────────────
|
||||
tracer = get_tracer()
|
||||
|
||||
with tracer.start_as_current_span("memory_write") as mem_span:
|
||||
mem_span.set_attribute(WORKSPACE_ID_ATTR, WORKSPACE_ID)
|
||||
mem_span.set_attribute(MEMORY_SCOPE, scope)
|
||||
mem_span.set_attribute("memory.content_length", len(content))
|
||||
|
||||
awareness_client = build_awareness_client()
|
||||
if awareness_client is not None:
|
||||
try:
|
||||
result = await awareness_client.commit(content, scope)
|
||||
except Exception as e:
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.write",
|
||||
resource=scope,
|
||||
outcome="failure",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope,
|
||||
error=str(e),
|
||||
)
|
||||
try:
|
||||
mem_span.record_exception(e)
|
||||
except Exception:
|
||||
pass
|
||||
return {"success": False, "error": str(e)}
|
||||
else:
|
||||
# #215-class bug: platform now gates /workspaces/:id/memories behind
|
||||
# workspace auth. Import auth_headers lazily (same pattern as the
|
||||
# activity-log path below) so test environments that don't ship
|
||||
# platform_auth still work.
|
||||
try:
|
||||
from platform_auth import auth_headers as _auth
|
||||
_headers = _auth()
|
||||
except Exception:
|
||||
_headers = {}
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
|
||||
json={"content": content, "scope": scope},
|
||||
headers=_headers,
|
||||
)
|
||||
if resp.status_code == 201:
|
||||
result = {"success": True, "id": resp.json().get("id"), "scope": scope}
|
||||
else:
|
||||
result = {"success": False, "error": resp.json().get("error", resp.text)}
|
||||
except Exception as e:
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.write",
|
||||
resource=scope,
|
||||
outcome="failure",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope,
|
||||
error=str(e),
|
||||
)
|
||||
try:
|
||||
mem_span.record_exception(e)
|
||||
except Exception:
|
||||
pass
|
||||
return {"success": False, "error": str(e)}
|
||||
|
||||
if result.get("success"):
|
||||
mem_span.set_attribute("memory.id", result.get("id") or "")
|
||||
mem_span.set_attribute("memory.success", True)
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.write",
|
||||
resource=scope,
|
||||
outcome="success",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope,
|
||||
memory_id=result.get("id"),
|
||||
)
|
||||
# #125: surface memory writes in /activity so the Canvas
|
||||
# "Agent Comms" tab shows what an agent chose to remember.
|
||||
# Fire-and-forget — failure here must not poison the tool
|
||||
# response since the memory write itself already succeeded.
|
||||
await _record_memory_activity(scope, content, result.get("id"))
|
||||
await _maybe_log_skill_promotion(content, scope, result)
|
||||
else:
|
||||
mem_span.set_attribute("memory.success", False)
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.write",
|
||||
resource=scope,
|
||||
outcome="failure",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope,
|
||||
error=result.get("error"),
|
||||
)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
@tool
|
||||
async def recall_memory(query: str = "", scope: str = "") -> dict:
|
||||
"""Search stored memories.
|
||||
|
||||
Args:
|
||||
query: Text to search for (empty returns all).
|
||||
scope: Filter by scope — LOCAL, TEAM, GLOBAL, or empty for all accessible.
|
||||
"""
|
||||
trace_id = str(uuid.uuid4())
|
||||
scope = scope.upper()
|
||||
if scope and scope not in ("LOCAL", "TEAM", "GLOBAL"):
|
||||
return {"error": "scope must be LOCAL, TEAM, GLOBAL, or empty"}
|
||||
|
||||
# --- RBAC check -----------------------------------------------------------
|
||||
roles, custom_perms = get_workspace_roles()
|
||||
if not check_permission("memory.read", roles, custom_perms):
|
||||
log_event(
|
||||
event_type="rbac",
|
||||
action="rbac.deny",
|
||||
resource=scope or "all",
|
||||
outcome="denied",
|
||||
trace_id=trace_id,
|
||||
attempted_action="memory.read",
|
||||
roles=roles,
|
||||
)
|
||||
return {
|
||||
"success": False,
|
||||
"error": (
|
||||
"RBAC: this workspace does not have the 'memory.read' permission. "
|
||||
f"Current roles: {roles}"
|
||||
),
|
||||
}
|
||||
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.read",
|
||||
resource=scope or "all",
|
||||
outcome="allowed",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope or "all",
|
||||
query_length=len(query),
|
||||
)
|
||||
|
||||
# ── OTEL: memory_read span ───────────────────────────────────────────────
|
||||
tracer = get_tracer()
|
||||
|
||||
with tracer.start_as_current_span("memory_read") as mem_span:
|
||||
mem_span.set_attribute(WORKSPACE_ID_ATTR, WORKSPACE_ID)
|
||||
mem_span.set_attribute(MEMORY_SCOPE, scope or "all")
|
||||
mem_span.set_attribute(MEMORY_QUERY, query[:256] if query else "")
|
||||
|
||||
awareness_client = build_awareness_client()
|
||||
if awareness_client is not None:
|
||||
try:
|
||||
result = await awareness_client.search(query, scope)
|
||||
mem_span.set_attribute("memory.result_count", result.get("count", 0))
|
||||
mem_span.set_attribute("memory.success", result.get("success", False))
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.read",
|
||||
resource=scope or "all",
|
||||
outcome="success" if result.get("success") else "failure",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope or "all",
|
||||
result_count=result.get("count", 0),
|
||||
)
|
||||
return result
|
||||
except Exception as e:
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.read",
|
||||
resource=scope or "all",
|
||||
outcome="failure",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope or "all",
|
||||
error=str(e),
|
||||
)
|
||||
try:
|
||||
mem_span.record_exception(e)
|
||||
except Exception:
|
||||
pass
|
||||
return {"success": False, "error": str(e)}
|
||||
|
||||
params = {}
|
||||
if query:
|
||||
params["q"] = query
|
||||
if scope:
|
||||
params["scope"] = scope.upper()
|
||||
|
||||
# #215-class bug (search path): same fix as commit_memory above —
|
||||
# the platform gates GET /workspaces/:id/memories behind workspace
|
||||
# auth, so without auth_headers() every search silently 401s and the
|
||||
# agent thinks its backlog is empty (observed on Technical Researcher
|
||||
# idle-loop pilot 2026-04-15).
|
||||
try:
|
||||
from platform_auth import auth_headers as _auth
|
||||
_headers = _auth()
|
||||
except Exception:
|
||||
_headers = {}
|
||||
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
|
||||
params=params,
|
||||
headers=_headers,
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
memories = resp.json()
|
||||
mem_span.set_attribute("memory.result_count", len(memories))
|
||||
mem_span.set_attribute("memory.success", True)
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.read",
|
||||
resource=scope or "all",
|
||||
outcome="success",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope or "all",
|
||||
result_count=len(memories),
|
||||
)
|
||||
return {
|
||||
"success": True,
|
||||
"count": len(memories),
|
||||
"memories": memories,
|
||||
}
|
||||
mem_span.set_attribute("memory.success", False)
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.read",
|
||||
resource=scope or "all",
|
||||
outcome="failure",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope or "all",
|
||||
http_status=resp.status_code,
|
||||
)
|
||||
return {"success": False, "error": resp.json().get("error", resp.text)}
|
||||
except Exception as e:
|
||||
log_event(
|
||||
event_type="memory",
|
||||
action="memory.read",
|
||||
resource=scope or "all",
|
||||
outcome="failure",
|
||||
trace_id=trace_id,
|
||||
memory_scope=scope or "all",
|
||||
error=str(e),
|
||||
)
|
||||
try:
|
||||
mem_span.record_exception(e)
|
||||
except Exception:
|
||||
pass
|
||||
return {"success": False, "error": str(e)}
|
||||
|
||||
|
||||
def _parse_promotion_packet(content: str) -> dict[str, Any] | None:
|
||||
"""Return a structured memory packet when content looks like promotion metadata."""
|
||||
text = content.strip()
|
||||
if not text.startswith("{"):
|
||||
return None
|
||||
|
||||
try:
|
||||
payload = json.loads(text)
|
||||
except json.JSONDecodeError:
|
||||
return None
|
||||
|
||||
if not isinstance(payload, dict): # pragma: no cover
|
||||
return None
|
||||
if not payload.get("promote_to_skill"):
|
||||
return None
|
||||
|
||||
return payload
|
||||
|
||||
|
||||
async def _record_memory_activity(scope: str, content: str, memory_id: str | None) -> None:
|
||||
"""Surface a successful memory write as an activity row so the Canvas
|
||||
"Agent Comms" tab can display what an agent chose to remember.
|
||||
Fire-and-forget — never raises. #125.
|
||||
|
||||
The summary is intentionally short (scope tag + first 80 chars of
|
||||
content with a ``…`` ellipsis when truncated) so the activity table
|
||||
stays readable; full content lives in ``agent_memories``.
|
||||
"""
|
||||
workspace_id = WORKSPACE_ID.strip()
|
||||
platform_url = PLATFORM_URL.strip().rstrip("/")
|
||||
if not workspace_id or not platform_url:
|
||||
return
|
||||
|
||||
preview = content.strip().replace("\n", " ")
|
||||
if len(preview) > 80:
|
||||
preview = preview[:80] + "…"
|
||||
summary = f"[{scope}] {preview}"
|
||||
|
||||
# NOTE: target_id is a UUID column scoped to workspace_id references —
|
||||
# cannot hold awareness/memory IDs (which are arbitrary strings).
|
||||
# We embed the memory_id in the summary instead so it's still searchable.
|
||||
if memory_id:
|
||||
summary = f"{summary} (id={memory_id[:24]})"
|
||||
payload: dict[str, Any] = {
|
||||
"workspace_id": workspace_id,
|
||||
"activity_type": "memory_write",
|
||||
"summary": summary,
|
||||
"status": "ok",
|
||||
}
|
||||
|
||||
try:
|
||||
try:
|
||||
from platform_auth import auth_headers as _auth
|
||||
_headers = _auth()
|
||||
except Exception:
|
||||
_headers = {}
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
await client.post(
|
||||
f"{platform_url}/workspaces/{workspace_id}/activity",
|
||||
json=payload,
|
||||
headers=_headers,
|
||||
)
|
||||
except Exception:
|
||||
# Activity logging is purely observability — never poison the
|
||||
# tool response on a failure here. We don't even log_event the
|
||||
# failure since the memory write itself succeeded and that's
|
||||
# what matters to the caller.
|
||||
pass
|
||||
|
||||
|
||||
async def _maybe_log_skill_promotion(content: str, scope: str, memory_result: dict) -> None:
|
||||
"""Best-effort activity log for durable memory entries that should become skills."""
|
||||
packet = _parse_promotion_packet(content)
|
||||
if packet is None:
|
||||
return
|
||||
|
||||
workspace_id = WORKSPACE_ID.strip()
|
||||
platform_url = PLATFORM_URL.strip().rstrip("/")
|
||||
if not workspace_id or not platform_url:
|
||||
return
|
||||
|
||||
repetition_signal = packet.get("repetition_signal")
|
||||
summary = (
|
||||
packet.get("summary")
|
||||
or packet.get("title")
|
||||
or packet.get("what changed")
|
||||
or "Repeatable workflow promoted to skill candidate"
|
||||
)
|
||||
metadata: dict[str, Any] = {
|
||||
"source": "memory-curation",
|
||||
"scope": scope,
|
||||
"memory_id": memory_result.get("id"),
|
||||
"promote_to_skill": True,
|
||||
"repetition_signal": repetition_signal,
|
||||
"memory_packet": packet,
|
||||
}
|
||||
|
||||
payload = {
|
||||
"activity_type": "skill_promotion",
|
||||
"method": "memory/skill-promotion",
|
||||
"summary": summary,
|
||||
"status": "ok",
|
||||
"source_id": workspace_id,
|
||||
"request_body": packet,
|
||||
"metadata": metadata,
|
||||
}
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
await client.post(
|
||||
f"{platform_url}/workspaces/{workspace_id}/activity",
|
||||
json=payload,
|
||||
)
|
||||
await client.post(
|
||||
f"{platform_url}/registry/heartbeat",
|
||||
json={
|
||||
"workspace_id": workspace_id,
|
||||
"error_rate": 0,
|
||||
"sample_error": "",
|
||||
"active_tasks": 1,
|
||||
"uptime_seconds": 0,
|
||||
"current_task": f"Skill promotion: {summary}",
|
||||
},
|
||||
)
|
||||
except Exception:
|
||||
# Best-effort observability only. Memory commits must never fail because
|
||||
# the promotion log could not be written.
|
||||
return
|
||||
@@ -1,281 +0,0 @@
|
||||
"""Code sandbox tool for safe code execution.
|
||||
|
||||
Executes code in an isolated environment. Three backends are supported:
|
||||
|
||||
subprocess (default)
|
||||
Runs code locally via asyncio subprocess with a hard timeout.
|
||||
Best for Tier 1/2 agents where run_code is lightly used and the
|
||||
workspace container itself is the isolation boundary.
|
||||
|
||||
docker
|
||||
Throwaway Docker-in-Docker container: network disabled, memory capped,
|
||||
read-only filesystem. Requires Docker socket access inside the container.
|
||||
Best for Tier 3 on-prem deployments.
|
||||
|
||||
e2b
|
||||
Cloud-hosted microVM sandbox via E2B (https://e2b.dev).
|
||||
No local Docker required — code runs in E2B's isolated cloud VMs.
|
||||
Supports Python and JavaScript.
|
||||
Requires:
|
||||
- e2b-code-interpreter Python package (pinned in requirements.txt)
|
||||
- E2B_API_KEY workspace secret (set via canvas Secrets panel or API)
|
||||
Best for hosted/cloud Molecule AI deployments.
|
||||
|
||||
Backend is selected via the SANDBOX_BACKEND env var, which the provisioner
|
||||
sets from config.yaml → sandbox.backend. Default: "subprocess".
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import tempfile
|
||||
|
||||
from langchain_core.tools import tool
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
SANDBOX_BACKEND = os.environ.get("SANDBOX_BACKEND", "subprocess")
|
||||
SANDBOX_TIMEOUT = int(os.environ.get("SANDBOX_TIMEOUT", "30"))
|
||||
SANDBOX_MEMORY_LIMIT = os.environ.get("SANDBOX_MEMORY_LIMIT", "256m")
|
||||
MAX_OUTPUT = 10_000
|
||||
|
||||
# E2B kernel names differ from internal language names.
|
||||
_E2B_KERNEL_MAP = {
|
||||
"python": "python3",
|
||||
"javascript": "js",
|
||||
"js": "js",
|
||||
}
|
||||
|
||||
|
||||
@tool
|
||||
async def run_code(code: str, language: str = "python") -> dict:
|
||||
"""Execute code in an isolated sandbox and return the output.
|
||||
|
||||
Args:
|
||||
code: The code to execute.
|
||||
language: Programming language — python, javascript, or shell.
|
||||
The e2b backend supports python and javascript only.
|
||||
"""
|
||||
if SANDBOX_BACKEND == "docker":
|
||||
return await _run_docker(code, language)
|
||||
elif SANDBOX_BACKEND == "e2b":
|
||||
return await _run_e2b(code, language)
|
||||
else:
|
||||
return await _run_subprocess(code, language)
|
||||
|
||||
|
||||
async def _run_subprocess(code: str, language: str) -> dict:
|
||||
"""Fallback: run code in a subprocess with timeout."""
|
||||
cmd_map = {
|
||||
"python": ["python3", "-c"],
|
||||
"javascript": ["node", "-e"],
|
||||
"shell": ["sh", "-c"],
|
||||
"bash": ["bash", "-c"],
|
||||
}
|
||||
|
||||
cmd_prefix = cmd_map.get(language)
|
||||
if not cmd_prefix:
|
||||
return {"error": f"Unsupported language: {language}", "exit_code": -1}
|
||||
|
||||
try:
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
*cmd_prefix, code,
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE,
|
||||
)
|
||||
|
||||
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=SANDBOX_TIMEOUT)
|
||||
|
||||
return {
|
||||
"exit_code": proc.returncode,
|
||||
"stdout": stdout.decode("utf-8", errors="replace")[:MAX_OUTPUT],
|
||||
"stderr": stderr.decode("utf-8", errors="replace")[:MAX_OUTPUT],
|
||||
"language": language,
|
||||
"backend": "subprocess",
|
||||
}
|
||||
except asyncio.TimeoutError:
|
||||
try:
|
||||
proc.kill()
|
||||
await proc.wait()
|
||||
except ProcessLookupError:
|
||||
pass
|
||||
return {"error": f"Timeout after {SANDBOX_TIMEOUT}s", "exit_code": -1}
|
||||
except Exception as e:
|
||||
return {"error": str(e), "exit_code": -1}
|
||||
|
||||
|
||||
async def _run_docker(code: str, language: str) -> dict:
|
||||
"""Run code in a throwaway Docker container via mounted temp file."""
|
||||
image_map = {
|
||||
"python": ("python:3.11-slim", ["python3", "/sandbox/code.py"]),
|
||||
"javascript": ("node:20-slim", ["node", "/sandbox/code.js"]),
|
||||
"shell": ("alpine:3.18", ["sh", "/sandbox/code.sh"]),
|
||||
"bash": ("alpine:3.18", ["sh", "/sandbox/code.sh"]),
|
||||
}
|
||||
|
||||
entry = image_map.get(language)
|
||||
if not entry:
|
||||
return {"error": f"Unsupported language: {language}", "exit_code": -1}
|
||||
|
||||
image, run_cmd = entry
|
||||
code_file = None
|
||||
|
||||
try:
|
||||
# Write code to temp file — avoids shell metacharacter injection
|
||||
ext = {"python": ".py", "javascript": ".js", "shell": ".sh", "bash": ".sh"}.get(language, ".txt")
|
||||
fd, code_file = tempfile.mkstemp(suffix=ext, prefix="sandbox_")
|
||||
with os.fdopen(fd, "w") as f:
|
||||
f.write(code)
|
||||
|
||||
cmd = [
|
||||
"docker", "run", "--rm",
|
||||
"--network", "none",
|
||||
"--memory", SANDBOX_MEMORY_LIMIT,
|
||||
"--cpus", "0.5",
|
||||
"--read-only",
|
||||
"--tmpfs", "/tmp:size=32m",
|
||||
"-v", f"{code_file}:/sandbox/code{ext}:ro",
|
||||
image,
|
||||
] + run_cmd
|
||||
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
*cmd,
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE,
|
||||
)
|
||||
|
||||
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=SANDBOX_TIMEOUT)
|
||||
|
||||
return {
|
||||
"exit_code": proc.returncode,
|
||||
"stdout": stdout.decode("utf-8", errors="replace")[:MAX_OUTPUT],
|
||||
"stderr": stderr.decode("utf-8", errors="replace")[:MAX_OUTPUT],
|
||||
"language": language,
|
||||
"backend": "docker",
|
||||
"image": image,
|
||||
}
|
||||
except asyncio.TimeoutError:
|
||||
return {"error": f"Timeout after {SANDBOX_TIMEOUT}s", "exit_code": -1}
|
||||
except Exception as e:
|
||||
return {"error": str(e), "exit_code": -1}
|
||||
finally:
|
||||
if code_file:
|
||||
try:
|
||||
os.unlink(code_file)
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
|
||||
async def _run_e2b(code: str, language: str) -> dict:
|
||||
"""Run code in an E2B cloud microVM sandbox.
|
||||
|
||||
Requires the e2b-code-interpreter package and an E2B_API_KEY secret.
|
||||
Each call creates a fresh sandbox, runs the code, and destroys the sandbox.
|
||||
Sandbox lifetime is bounded by SANDBOX_TIMEOUT seconds.
|
||||
|
||||
Supported languages: python, javascript.
|
||||
"""
|
||||
# Import lazily so the package is only required when the e2b backend is
|
||||
# actually configured — other backends work without it installed.
|
||||
try:
|
||||
from e2b_code_interpreter import Sandbox
|
||||
except ImportError:
|
||||
return {
|
||||
"error": (
|
||||
"e2b-code-interpreter is not installed. "
|
||||
"Add it to requirements.txt or switch to the docker/subprocess backend."
|
||||
),
|
||||
"exit_code": -1,
|
||||
}
|
||||
|
||||
api_key = os.environ.get("E2B_API_KEY")
|
||||
if not api_key:
|
||||
return {
|
||||
"error": (
|
||||
"E2B_API_KEY is not set. "
|
||||
"Add it as a workspace secret via the canvas Secrets panel or platform API."
|
||||
),
|
||||
"exit_code": -1,
|
||||
}
|
||||
|
||||
kernel = _E2B_KERNEL_MAP.get(language)
|
||||
if kernel is None:
|
||||
return {
|
||||
"error": (
|
||||
f"Language '{language}' is not supported by the e2b backend. "
|
||||
"Supported: python, javascript."
|
||||
),
|
||||
"exit_code": -1,
|
||||
}
|
||||
|
||||
sandbox = None
|
||||
try:
|
||||
# Create a fresh sandbox for this execution.
|
||||
# timeout controls the sandbox lifetime in seconds.
|
||||
sandbox = await asyncio.wait_for(
|
||||
asyncio.get_running_loop().run_in_executor(
|
||||
None,
|
||||
lambda: Sandbox(api_key=api_key, timeout=SANDBOX_TIMEOUT),
|
||||
),
|
||||
timeout=SANDBOX_TIMEOUT,
|
||||
)
|
||||
|
||||
# Execute code and collect results.
|
||||
execution = await asyncio.wait_for(
|
||||
asyncio.get_running_loop().run_in_executor(
|
||||
None,
|
||||
lambda: sandbox.run_code(code, language=kernel),
|
||||
),
|
||||
timeout=SANDBOX_TIMEOUT,
|
||||
)
|
||||
|
||||
# E2B returns a list of Result objects; collect text/error output.
|
||||
stdout_parts = []
|
||||
stderr_parts = []
|
||||
|
||||
for result in execution.results:
|
||||
# result.text is the primary output (stdout equivalent)
|
||||
if hasattr(result, "text") and result.text:
|
||||
stdout_parts.append(str(result.text))
|
||||
# Some result types expose an error attribute
|
||||
if hasattr(result, "error") and result.error:
|
||||
stderr_parts.append(str(result.error))
|
||||
|
||||
# Logs are stored separately in execution.logs
|
||||
if hasattr(execution, "logs"):
|
||||
logs = execution.logs
|
||||
if hasattr(logs, "stdout") and logs.stdout:
|
||||
stdout_parts.extend(logs.stdout)
|
||||
if hasattr(logs, "stderr") and logs.stderr:
|
||||
stderr_parts.extend(logs.stderr)
|
||||
|
||||
combined_stdout = "".join(stdout_parts)[:MAX_OUTPUT]
|
||||
combined_stderr = "".join(stderr_parts)[:MAX_OUTPUT]
|
||||
|
||||
# Treat any stderr output as a non-zero exit code (e2b doesn't expose
|
||||
# a numeric exit code at the sandbox level).
|
||||
exit_code = 1 if combined_stderr else 0
|
||||
|
||||
return {
|
||||
"exit_code": exit_code,
|
||||
"stdout": combined_stdout,
|
||||
"stderr": combined_stderr,
|
||||
"language": language,
|
||||
"backend": "e2b",
|
||||
}
|
||||
|
||||
except asyncio.TimeoutError:
|
||||
logger.warning("E2B sandbox timed out after %ds", SANDBOX_TIMEOUT)
|
||||
return {"error": f"Timeout after {SANDBOX_TIMEOUT}s", "exit_code": -1}
|
||||
except Exception as e:
|
||||
logger.exception("E2B sandbox error: %s", e)
|
||||
return {"error": str(e), "exit_code": -1}
|
||||
finally:
|
||||
# Always destroy the sandbox to avoid leaking E2B credits.
|
||||
if sandbox is not None:
|
||||
try:
|
||||
await asyncio.get_running_loop().run_in_executor(
|
||||
None, sandbox.kill
|
||||
)
|
||||
except Exception:
|
||||
pass # Best-effort cleanup
|
||||
@@ -1,120 +0,0 @@
|
||||
"""Secret-scrubbing utilities for workspace runtime (#834 — C2).
|
||||
|
||||
Provides ``_redact_secrets()`` applied at every ``commit_memory`` call site
|
||||
to prevent API keys and tokens from being persisted verbatim in the
|
||||
memories table.
|
||||
|
||||
Design notes
|
||||
------------
|
||||
- **Allowlist of known prefixes** (``sk-``, ``ghp_``, etc.) cover the most
|
||||
dangerous tokens because they are unambiguous.
|
||||
- **Contextual pattern** covers generic high-entropy values that appear
|
||||
immediately after assignment keywords (``key=``, ``token=``, ``secret=``,
|
||||
``password=``, ``api_key=``). The keyword is preserved in the output so
|
||||
log lines remain readable; only the value is redacted.
|
||||
- **Idempotent**: the replacement token ``[REDACTED]`` does not match any
|
||||
of the patterns, so calling ``_redact_secrets`` twice is safe.
|
||||
- **No false-positive risk on normal prose**: all patterns require either
|
||||
a well-known prefix (``AKIA``, ``ghp_``, ``sk-``) or both a keyword and
|
||||
≥ 40 base64/alphanumeric chars — ordinary English words never match.
|
||||
|
||||
Relationship to ``compliance.redact_pii``
|
||||
------------------------------------------
|
||||
``redact_pii`` handles PII (emails, SSNs, credit cards) and uses typed
|
||||
tokens ``[REDACTED:type]`` for SIEM indexing. ``_redact_secrets`` is
|
||||
narrowly scoped to API credentials and uses the plain ``[REDACTED]`` token
|
||||
because the exact secret type is not important at the storage layer —
|
||||
what matters is that no credential value ever reaches the database.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from typing import List
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Replacement sentinel
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
#: Replacement token — deliberately plain so downstream readers do not need
|
||||
#: to parse structured tokens. Does not match any scrub pattern (idempotent).
|
||||
REDACTED: str = "[REDACTED]"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Patterns
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Patterns that identify secret values by their well-known prefix.
|
||||
# Ordered from most specific to least specific.
|
||||
_BARE_PATTERNS: List[re.Pattern] = [
|
||||
# OpenAI / Anthropic-style keys: sk-<20+ alnum/hyphen/underscore chars>
|
||||
# Covers: sk-<key>, sk-ant-<key>, sk-proj-<key>, etc.
|
||||
re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
|
||||
# GitHub classic personal access token
|
||||
re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
|
||||
# GitHub server-to-server token
|
||||
re.compile(r"\bghs_[A-Za-z0-9]{36}\b"),
|
||||
# GitHub fine-grained personal access token
|
||||
re.compile(r"\bgithub_pat_[A-Za-z0-9_]{82}\b"),
|
||||
# AWS access key ID
|
||||
re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
|
||||
]
|
||||
|
||||
# Contextual pattern: keyword= followed by a high-entropy value.
|
||||
#
|
||||
# Group 1 captures the keyword + equals sign so it is preserved in the
|
||||
# replacement — "api_key=[REDACTED]" is more informative than "[REDACTED]".
|
||||
#
|
||||
# The value charset [A-Za-z0-9+/] covers base64 and common token alphabets.
|
||||
# The minimum length of 40 chars prevents false-positives on short values.
|
||||
_CONTEXTUAL_RE: re.Pattern = re.compile(
|
||||
r"(?i)"
|
||||
r"((?:api_key|key|token|secret|password)\s*=\s*)"
|
||||
r"([A-Za-z0-9+/]{40,}={0,2})"
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _redact_secrets(content: str) -> str:
|
||||
"""Scrub known secret patterns from *content*, replacing with ``[REDACTED]``.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
content:
|
||||
Raw string to scrub — typically a ``commit_memory`` payload.
|
||||
|
||||
Returns
|
||||
-------
|
||||
str
|
||||
Copy of *content* with secrets replaced. If no secrets are found,
|
||||
the original string is returned unchanged. Calling this function
|
||||
on already-redacted content is safe (idempotent).
|
||||
|
||||
Examples::
|
||||
|
||||
>>> _redact_secrets("token is sk-abc1234567890123456789012345")
|
||||
'token is [REDACTED]'
|
||||
|
||||
>>> _redact_secrets("api_key=" + "A" * 45)
|
||||
'api_key=[REDACTED]'
|
||||
|
||||
>>> _redact_secrets("The answer is 42.")
|
||||
'The answer is 42.'
|
||||
|
||||
>>> _redact_secrets("[REDACTED]")
|
||||
'[REDACTED]'
|
||||
"""
|
||||
result = content
|
||||
|
||||
# Apply prefix-based patterns first (most unambiguous)
|
||||
for pattern in _BARE_PATTERNS:
|
||||
result = pattern.sub(REDACTED, result)
|
||||
|
||||
# Apply contextual pattern — preserve keyword, replace only the value
|
||||
result = _CONTEXTUAL_RE.sub(r"\1" + REDACTED, result)
|
||||
|
||||
return result
|
||||
@@ -1,344 +0,0 @@
|
||||
"""Skill dependency security scanner — supply-chain risk management.
|
||||
|
||||
Scans a skill's ``requirements.txt`` for known CVEs before the skill is
|
||||
loaded into the workspace. Two scanners are supported:
|
||||
|
||||
Snyk CLI — ``snyk test --file=requirements.txt --json``
|
||||
Preferred; requires the ``snyk`` binary in PATH and
|
||||
a SNYK_TOKEN env var for authenticated scans.
|
||||
|
||||
pip-audit — ``pip-audit -r requirements.txt --json``
|
||||
Fallback; no authentication required.
|
||||
|
||||
The scanner is auto-selected: Snyk if available, pip-audit otherwise.
|
||||
If neither is present in PATH the scan is silently skipped with a log line.
|
||||
|
||||
Scan mode (``security_scan.mode`` in config.yaml):
|
||||
|
||||
block — raise ``SkillSecurityError`` when critical/high CVEs are found;
|
||||
the skill is *not* loaded.
|
||||
warn — log a WARNING + audit event; the skill is loaded anyway.
|
||||
off — skip scanning entirely; useful in air-gapped CI.
|
||||
|
||||
Audit trail
|
||||
-----------
|
||||
Every scan (pass or fail) is recorded via ``tools.audit.log_event`` with
|
||||
``event_type="security_scan"``, enabling compliance reports to prove that
|
||||
all loaded skills were checked before activation.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import shutil
|
||||
import subprocess
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from builtin_tools.audit import log_event
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public exception
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class SkillSecurityError(RuntimeError):
|
||||
"""Raised when a skill fails security scanning in ``block`` mode.
|
||||
|
||||
The message contains the skill name, scanner used, and a summary of the
|
||||
critical/high findings so operators can act on it immediately.
|
||||
"""
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Data models
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@dataclass
|
||||
class CVEFinding:
|
||||
"""A single vulnerability finding from a security scanner."""
|
||||
|
||||
vuln_id: str
|
||||
"""CVE or advisory identifier, e.g. ``SNYK-PYTHON-REQUESTS-1234``."""
|
||||
package: str
|
||||
"""Affected package name."""
|
||||
version: str
|
||||
"""Installed version of the package."""
|
||||
severity: str
|
||||
"""One of: critical | high | medium | low | unknown."""
|
||||
description: str
|
||||
"""Short human-readable summary (≤ 200 chars)."""
|
||||
|
||||
|
||||
@dataclass
|
||||
class ScanResult:
|
||||
"""Aggregated result of a single skill dependency scan."""
|
||||
|
||||
skill_name: str
|
||||
scanner: str
|
||||
"""Scanner used: ``"snyk"`` | ``"pip-audit"`` | ``"none"``."""
|
||||
requirements_file: Optional[str]
|
||||
"""Absolute path to the scanned requirements.txt, or ``None``."""
|
||||
findings: list[CVEFinding] = field(default_factory=list)
|
||||
scan_error: Optional[str] = None
|
||||
"""Non-fatal scanner error (e.g. timeout); findings may be incomplete."""
|
||||
|
||||
@property
|
||||
def critical_or_high(self) -> list[CVEFinding]:
|
||||
return [f for f in self.findings if f.severity in ("critical", "high")]
|
||||
|
||||
@property
|
||||
def has_critical_or_high(self) -> bool:
|
||||
return bool(self.critical_or_high)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Internal helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _find_requirements(skill_path: Path) -> Optional[Path]:
|
||||
"""Return the first ``requirements.txt`` found in the skill tree."""
|
||||
for candidate in (
|
||||
skill_path / "requirements.txt",
|
||||
skill_path / "tools" / "requirements.txt",
|
||||
):
|
||||
if candidate.exists():
|
||||
return candidate
|
||||
return None
|
||||
|
||||
|
||||
def _run_scanner(cmd: list[str], timeout: int = 120) -> tuple[str, Optional[str]]:
|
||||
"""Run a scanner subprocess and return ``(stdout, error_or_None)``."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
)
|
||||
# Both Snyk and pip-audit exit 1 when vulns are found — not an error.
|
||||
# Exit 2 from Snyk means a genuine scan failure.
|
||||
if result.returncode == 2 and not result.stdout.strip():
|
||||
return "", f"scanner exited 2: {result.stderr.strip()[:200]}"
|
||||
return result.stdout, None
|
||||
except subprocess.TimeoutExpired:
|
||||
return "", f"scanner timed out after {timeout}s"
|
||||
except FileNotFoundError as exc:
|
||||
return "", str(exc)
|
||||
except Exception as exc: # pylint: disable=broad-except
|
||||
return "", str(exc)
|
||||
|
||||
|
||||
def _parse_snyk(stdout: str) -> tuple[list[CVEFinding], Optional[str]]:
|
||||
"""Parse ``snyk test --json`` output."""
|
||||
if not stdout.strip():
|
||||
return [], "empty snyk output"
|
||||
try:
|
||||
data = json.loads(stdout)
|
||||
except json.JSONDecodeError as exc:
|
||||
return [], f"snyk JSON parse error: {exc}"
|
||||
|
||||
vulns = data.get("vulnerabilities", [])
|
||||
findings = [
|
||||
CVEFinding(
|
||||
vuln_id=v.get("id", "UNKNOWN"),
|
||||
package=v.get("packageName", "?"),
|
||||
version=v.get("version", "?"),
|
||||
severity=v.get("severity", "unknown").lower(),
|
||||
description=(v.get("title", "") or "")[:200],
|
||||
)
|
||||
for v in vulns
|
||||
if isinstance(v, dict)
|
||||
]
|
||||
return findings, None
|
||||
|
||||
|
||||
def _parse_pip_audit(stdout: str) -> tuple[list[CVEFinding], Optional[str]]:
|
||||
"""Parse ``pip-audit --json`` output.
|
||||
|
||||
pip-audit does not always provide a CVSS severity level. When absent we
|
||||
conservatively classify the finding as ``"high"`` so it is not silently
|
||||
ignored in ``warn`` mode.
|
||||
"""
|
||||
if not stdout.strip():
|
||||
return [], "empty pip-audit output"
|
||||
try:
|
||||
data = json.loads(stdout)
|
||||
except json.JSONDecodeError as exc:
|
||||
return [], f"pip-audit JSON parse error: {exc}"
|
||||
|
||||
# pip-audit ≥ 2.x wraps results in {"dependencies": [...]}
|
||||
if isinstance(data, dict):
|
||||
deps = data.get("dependencies", [])
|
||||
else:
|
||||
deps = data # older versions return a bare list
|
||||
|
||||
findings: list[CVEFinding] = []
|
||||
for dep in deps:
|
||||
if not isinstance(dep, dict):
|
||||
continue
|
||||
for vuln in dep.get("vulns", []):
|
||||
sev_raw = vuln.get("fix_versions") and "high" # pip-audit lacks severity
|
||||
sev = (vuln.get("severity") or sev_raw or "high").lower()
|
||||
findings.append(
|
||||
CVEFinding(
|
||||
vuln_id=vuln.get("id", "UNKNOWN"),
|
||||
package=dep.get("name", "?"),
|
||||
version=dep.get("version", "?"),
|
||||
severity=sev,
|
||||
description=(vuln.get("description", "") or "")[:200],
|
||||
)
|
||||
)
|
||||
return findings, None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def scan_skill_dependencies(
|
||||
skill_name: str,
|
||||
skill_path: Path,
|
||||
mode: str,
|
||||
fail_open_if_no_scanner: bool = True,
|
||||
) -> ScanResult:
|
||||
"""Scan a skill's dependency file for known CVEs.
|
||||
|
||||
Args:
|
||||
skill_name: Name of the skill (used in log messages and audit events).
|
||||
skill_path: Absolute path to the skill's root directory.
|
||||
mode: ``"block"`` | ``"warn"`` | ``"off"``
|
||||
fail_open_if_no_scanner:
|
||||
When *True* (default) silently skip scanning if neither snyk nor
|
||||
pip-audit is in PATH. When *False* and ``mode="block"``, raise
|
||||
:class:`SkillSecurityError` so operators know the gate is absent.
|
||||
Corresponds to ``security_scan.fail_open_if_no_scanner`` in
|
||||
config.yaml. Closes #268.
|
||||
|
||||
Returns:
|
||||
A :class:`ScanResult` describing what was found.
|
||||
|
||||
Raises:
|
||||
:class:`SkillSecurityError`: When ``mode="block"`` and one or more
|
||||
critical/high severity CVEs are found — OR when
|
||||
``mode="block"`` and ``fail_open_if_no_scanner=False`` and no
|
||||
scanner is available.
|
||||
"""
|
||||
if mode == "off":
|
||||
return ScanResult(skill_name=skill_name, scanner="none", requirements_file=None)
|
||||
|
||||
req_file = _find_requirements(skill_path)
|
||||
if req_file is None:
|
||||
# No requirements file — nothing to scan; not a problem.
|
||||
return ScanResult(skill_name=skill_name, scanner="none", requirements_file=None)
|
||||
|
||||
# ── Select scanner ────────────────────────────────────────────────────────
|
||||
scanner_name: str
|
||||
findings: list[CVEFinding]
|
||||
scan_error: Optional[str]
|
||||
|
||||
if shutil.which("snyk"):
|
||||
scanner_name = "snyk"
|
||||
stdout, run_error = _run_scanner(
|
||||
["snyk", "test", f"--file={req_file}", "--json"]
|
||||
)
|
||||
if run_error:
|
||||
findings, scan_error = [], run_error
|
||||
else:
|
||||
findings, scan_error = _parse_snyk(stdout)
|
||||
|
||||
elif shutil.which("pip-audit"):
|
||||
scanner_name = "pip-audit"
|
||||
stdout, run_error = _run_scanner(
|
||||
["pip-audit", "-r", str(req_file), "--json", "--progress-spinner=off"]
|
||||
)
|
||||
if run_error:
|
||||
findings, scan_error = [], run_error
|
||||
else:
|
||||
findings, scan_error = _parse_pip_audit(stdout)
|
||||
|
||||
else:
|
||||
logger.info(
|
||||
"security_scan: no scanner (snyk, pip-audit) in PATH — skipping %s",
|
||||
skill_name,
|
||||
)
|
||||
log_event(
|
||||
event_type="security_scan",
|
||||
action="skill.security_scan",
|
||||
resource=skill_name,
|
||||
outcome="skipped",
|
||||
reason="no_scanner_in_path",
|
||||
requirements_file=str(req_file),
|
||||
mode=mode,
|
||||
)
|
||||
# #268: if fail_open_if_no_scanner=False and mode=block, the operator
|
||||
# explicitly opted in to "fail closed" — raise so the missing scanner
|
||||
# is visible rather than silently skipped.
|
||||
if not fail_open_if_no_scanner and mode == "block":
|
||||
raise SkillSecurityError(
|
||||
f"Skill '{skill_name}' blocked: no scanner (snyk or pip-audit) "
|
||||
f"found in PATH and fail_open_if_no_scanner=false"
|
||||
)
|
||||
return ScanResult(
|
||||
skill_name=skill_name,
|
||||
scanner="none",
|
||||
requirements_file=str(req_file),
|
||||
scan_error="No scanner (snyk or pip-audit) found in PATH",
|
||||
)
|
||||
|
||||
result = ScanResult(
|
||||
skill_name=skill_name,
|
||||
scanner=scanner_name,
|
||||
requirements_file=str(req_file),
|
||||
findings=findings,
|
||||
scan_error=scan_error,
|
||||
)
|
||||
|
||||
# ── Log scan outcome to audit trail ──────────────────────────────────────
|
||||
audit_outcome = "clean" if not result.has_critical_or_high else "vulnerable"
|
||||
log_event(
|
||||
event_type="security_scan",
|
||||
action="skill.security_scan",
|
||||
resource=skill_name,
|
||||
outcome=audit_outcome,
|
||||
scanner=scanner_name,
|
||||
requirements_file=str(req_file),
|
||||
total_findings=len(findings),
|
||||
critical_or_high_count=len(result.critical_or_high),
|
||||
scan_error=scan_error,
|
||||
)
|
||||
|
||||
if scan_error:
|
||||
logger.warning(
|
||||
"security_scan: scanner error for skill '%s': %s", skill_name, scan_error
|
||||
)
|
||||
|
||||
# ── Enforce mode ─────────────────────────────────────────────────────────
|
||||
if result.has_critical_or_high:
|
||||
summary = ", ".join(
|
||||
f"{f.vuln_id}({f.severity}) in {f.package}@{f.version}"
|
||||
for f in result.critical_or_high[:5]
|
||||
)
|
||||
if len(result.critical_or_high) > 5:
|
||||
summary += f" … and {len(result.critical_or_high) - 5} more"
|
||||
|
||||
msg = (
|
||||
f"Skill '{skill_name}' has {len(result.critical_or_high)} "
|
||||
f"critical/high CVE(s) [{scanner_name}]: {summary}"
|
||||
)
|
||||
|
||||
if mode == "block":
|
||||
logger.error("Blocking skill load — %s", msg)
|
||||
raise SkillSecurityError(msg)
|
||||
|
||||
# warn mode — continue loading, but make noise
|
||||
logger.warning("Security warning — %s", msg)
|
||||
|
||||
return result
|
||||
@@ -1,418 +0,0 @@
|
||||
"""OpenTelemetry (OTEL) instrumentation for the Molecule AI workspace runtime.
|
||||
|
||||
Architecture
|
||||
------------
|
||||
* One global ``TracerProvider`` is initialised at startup via ``setup_telemetry()``.
|
||||
* Up to three exporters are wired in:
|
||||
1. **OTLP/HTTP** — activated when ``OTEL_EXPORTER_OTLP_ENDPOINT`` is set.
|
||||
Point this at any compatible collector (Jaeger, Tempo, Grafana OTEL, …).
|
||||
2. **Langfuse OTLP bridge** — activated when the ``LANGFUSE_HOST``,
|
||||
``LANGFUSE_PUBLIC_KEY`` and ``LANGFUSE_SECRET_KEY`` env vars are all present.
|
||||
Langfuse ≥4 accepts OTLP/HTTP at ``<host>/api/public/otel``.
|
||||
This is a *second* exporter alongside the existing Langfuse LangChain
|
||||
callback handler in agent.py — both paths emit spans simultaneously.
|
||||
3. **Console** (debug) — activated when ``OTEL_DEBUG=1``.
|
||||
|
||||
* **W3C TraceContext** propagation (``traceparent`` / ``tracestate``) is used for
|
||||
cross-workspace context injection and extraction so A2A hops form a single
|
||||
distributed trace.
|
||||
|
||||
* ``make_trace_middleware()`` returns an ASGI middleware that extracts incoming
|
||||
trace context from HTTP headers and stores it in a ``ContextVar`` so the
|
||||
A2A executor can access it to parent its spans correctly.
|
||||
|
||||
GenAI semantic conventions
|
||||
--------------------------
|
||||
Attribute constants for ``gen_ai.*`` follow OpenTelemetry GenAI SemConv 1.26.
|
||||
|
||||
Usage example
|
||||
-------------
|
||||
# main.py — call once at startup
|
||||
from builtin_tools.telemetry import setup_telemetry, make_trace_middleware
|
||||
setup_telemetry(service_name=workspace_id)
|
||||
instrumented = make_trace_middleware(app.build())
|
||||
|
||||
# Any module
|
||||
from builtin_tools.telemetry import get_tracer
|
||||
tracer = get_tracer()
|
||||
with tracer.start_as_current_span("my_span") as span:
|
||||
span.set_attribute("key", "value")
|
||||
|
||||
# Outgoing HTTP — inject W3C headers
|
||||
from builtin_tools.telemetry import inject_trace_headers
|
||||
headers = inject_trace_headers({"Content-Type": "application/json"})
|
||||
await client.post(url, headers=headers, ...)
|
||||
|
||||
# Incoming HTTP — extract context (done automatically by middleware)
|
||||
from builtin_tools.telemetry import extract_trace_context
|
||||
ctx = extract_trace_context(dict(request.headers))
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import base64
|
||||
import logging
|
||||
import os
|
||||
from contextvars import ContextVar
|
||||
from typing import Any, Optional
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# GenAI Semantic Convention attribute keys (OTel SemConv 1.26)
|
||||
# https://opentelemetry.io/docs/specs/semconv/gen-ai/
|
||||
# ---------------------------------------------------------------------------
|
||||
GEN_AI_SYSTEM = "gen_ai.system"
|
||||
GEN_AI_REQUEST_MODEL = "gen_ai.request.model"
|
||||
GEN_AI_OPERATION_NAME = "gen_ai.operation.name"
|
||||
GEN_AI_USAGE_INPUT_TOKENS = "gen_ai.usage.input_tokens"
|
||||
GEN_AI_USAGE_OUTPUT_TOKENS = "gen_ai.usage.output_tokens"
|
||||
GEN_AI_RESPONSE_FINISH_REASONS = "gen_ai.response.finish_reasons"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Workspace / A2A attribute keys
|
||||
# ---------------------------------------------------------------------------
|
||||
WORKSPACE_ID_ATTR = "workspace.id"
|
||||
A2A_SOURCE_WORKSPACE = "a2a.source_workspace_id"
|
||||
A2A_TARGET_WORKSPACE = "a2a.target_workspace_id"
|
||||
A2A_TASK_ID = "a2a.task_id"
|
||||
MEMORY_SCOPE = "memory.scope"
|
||||
MEMORY_QUERY = "memory.query"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Module-level state
|
||||
# ---------------------------------------------------------------------------
|
||||
WORKSPACE_ID: str = os.environ.get("WORKSPACE_ID", "unknown")
|
||||
|
||||
_initialized: bool = False
|
||||
_tracer: Any = None # opentelemetry.trace.Tracer | _NoopTracer
|
||||
|
||||
# ContextVar that carries incoming trace context from the ASGI middleware to
|
||||
# the A2A executor. Using a ContextVar (rather than a global) is safe with
|
||||
# asyncio because each task inherits a copy of the context at creation time.
|
||||
_incoming_trace_context: ContextVar[Optional[Any]] = ContextVar(
|
||||
"otel_incoming_trace_context", default=None
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def setup_telemetry(service_name: Optional[str] = None) -> None:
|
||||
"""Initialise the global ``TracerProvider``. Safe to call multiple times.
|
||||
|
||||
Reads configuration from environment variables:
|
||||
|
||||
``OTEL_EXPORTER_OTLP_ENDPOINT``
|
||||
Base URL of an OTLP-compatible collector (e.g. ``http://jaeger:4318``).
|
||||
Spans are sent to ``<endpoint>/v1/traces``.
|
||||
|
||||
``LANGFUSE_HOST`` + ``LANGFUSE_PUBLIC_KEY`` + ``LANGFUSE_SECRET_KEY``
|
||||
When all three are set, a second OTLP exporter is wired to Langfuse's
|
||||
ingest endpoint using HTTP Basic auth.
|
||||
|
||||
``OTEL_DEBUG``
|
||||
Set to ``1`` / ``true`` to also print spans to stdout.
|
||||
"""
|
||||
global _initialized, _tracer
|
||||
|
||||
if _initialized:
|
||||
return
|
||||
|
||||
try:
|
||||
from opentelemetry import propagate, trace
|
||||
from opentelemetry.baggage.propagation import W3CBaggagePropagator
|
||||
from opentelemetry.propagators.composite import CompositePropagator
|
||||
from opentelemetry.sdk.resources import SERVICE_NAME as OTEL_SERVICE_NAME
|
||||
from opentelemetry.sdk.resources import Resource
|
||||
from opentelemetry.sdk.trace import TracerProvider
|
||||
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
|
||||
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
|
||||
except ImportError as exc:
|
||||
logger.warning(
|
||||
"OTEL: opentelemetry packages not installed — telemetry disabled. "
|
||||
"Add opentelemetry-api, opentelemetry-sdk, "
|
||||
"opentelemetry-exporter-otlp-proto-http to requirements.txt. "
|
||||
"Error: %s",
|
||||
exc,
|
||||
)
|
||||
return
|
||||
|
||||
svc = service_name or f"molecule-{WORKSPACE_ID}"
|
||||
|
||||
resource = Resource.create(
|
||||
{
|
||||
OTEL_SERVICE_NAME: svc,
|
||||
"service.version": "1.0.0",
|
||||
WORKSPACE_ID_ATTR: WORKSPACE_ID,
|
||||
}
|
||||
)
|
||||
|
||||
provider = TracerProvider(resource=resource)
|
||||
|
||||
# -- Exporter 1: Generic OTLP/HTTP ----------------------------------------
|
||||
otlp_endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").rstrip("/")
|
||||
if otlp_endpoint:
|
||||
try:
|
||||
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
|
||||
|
||||
exporter = OTLPSpanExporter(endpoint=f"{otlp_endpoint}/v1/traces")
|
||||
provider.add_span_processor(BatchSpanProcessor(exporter))
|
||||
logger.info("OTEL: OTLP/HTTP exporter → %s", otlp_endpoint)
|
||||
except ImportError:
|
||||
logger.warning(
|
||||
"OTEL: OTEL_EXPORTER_OTLP_ENDPOINT is set but "
|
||||
"opentelemetry-exporter-otlp-proto-http is not installed"
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("OTEL: OTLP exporter init failed: %s", exc)
|
||||
|
||||
# -- Exporter 2: Langfuse OTLP bridge -------------------------------------
|
||||
# Langfuse ≥4 accepts OTLP at <host>/api/public/otel (Basic auth).
|
||||
lf_host = os.environ.get("LANGFUSE_HOST", "").rstrip("/")
|
||||
lf_public = os.environ.get("LANGFUSE_PUBLIC_KEY", "")
|
||||
lf_secret = os.environ.get("LANGFUSE_SECRET_KEY", "")
|
||||
|
||||
if lf_host and lf_public and lf_secret:
|
||||
try:
|
||||
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
|
||||
|
||||
lf_endpoint = f"{lf_host}/api/public/otel/v1/traces"
|
||||
token = base64.b64encode(f"{lf_public}:{lf_secret}".encode()).decode()
|
||||
lf_exporter = OTLPSpanExporter(
|
||||
endpoint=lf_endpoint,
|
||||
headers={"Authorization": f"Basic {token}"},
|
||||
)
|
||||
provider.add_span_processor(BatchSpanProcessor(lf_exporter))
|
||||
logger.info("OTEL: Langfuse OTLP bridge → %s", lf_endpoint)
|
||||
except ImportError:
|
||||
logger.warning(
|
||||
"OTEL: Langfuse env vars set but "
|
||||
"opentelemetry-exporter-otlp-proto-http is not installed"
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("OTEL: Langfuse OTLP bridge init failed: %s", exc)
|
||||
|
||||
# -- Exporter 3: Console (debug) ------------------------------------------
|
||||
if os.environ.get("OTEL_DEBUG", "").lower() in ("1", "true", "yes"):
|
||||
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
|
||||
logger.info("OTEL: console debug exporter enabled")
|
||||
|
||||
# -- Register global provider + W3C propagators ---------------------------
|
||||
trace.set_tracer_provider(provider)
|
||||
propagate.set_global_textmap(
|
||||
CompositePropagator(
|
||||
[
|
||||
TraceContextTextMapPropagator(),
|
||||
W3CBaggagePropagator(),
|
||||
]
|
||||
)
|
||||
)
|
||||
|
||||
_tracer = trace.get_tracer(
|
||||
"molecule.workspace",
|
||||
schema_url="https://opentelemetry.io/schemas/1.26.0",
|
||||
)
|
||||
_initialized = True
|
||||
logger.info("OTEL: telemetry initialised for service '%s'", svc)
|
||||
|
||||
|
||||
def get_tracer() -> Any:
|
||||
"""Return the global ``Tracer``. Lazily calls ``setup_telemetry()`` if needed.
|
||||
|
||||
Returns a no-op tracer when the opentelemetry packages are not installed so
|
||||
that instrumented code never raises ``ImportError``.
|
||||
"""
|
||||
global _tracer
|
||||
|
||||
if not _initialized:
|
||||
setup_telemetry()
|
||||
|
||||
if _tracer is None:
|
||||
# Packages unavailable — hand back a no-op implementation
|
||||
try:
|
||||
from opentelemetry import trace
|
||||
|
||||
return trace.get_tracer("molecule.noop")
|
||||
except ImportError:
|
||||
return _NoopTracer()
|
||||
|
||||
return _tracer
|
||||
|
||||
|
||||
def inject_trace_headers(headers: dict) -> dict:
|
||||
"""Inject W3C ``traceparent`` / ``tracestate`` into *headers* and return it.
|
||||
|
||||
Mutates the dict in-place so it can be used directly::
|
||||
|
||||
headers = inject_trace_headers({"Content-Type": "application/json"})
|
||||
await client.post(url, headers=headers, ...)
|
||||
"""
|
||||
try:
|
||||
from opentelemetry import propagate
|
||||
|
||||
propagate.inject(headers)
|
||||
except Exception:
|
||||
pass # Never let telemetry break the caller
|
||||
return headers
|
||||
|
||||
|
||||
def extract_trace_context(carrier: dict) -> Any:
|
||||
"""Extract W3C trace context from a header mapping.
|
||||
|
||||
Returns an OpenTelemetry ``Context`` object suitable for::
|
||||
|
||||
tracer.start_as_current_span("name", context=ctx)
|
||||
|
||||
Returns ``None`` when packages are unavailable or no context is present.
|
||||
"""
|
||||
try:
|
||||
from opentelemetry import propagate
|
||||
|
||||
return propagate.extract(carrier)
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def get_current_traceparent() -> Optional[str]:
|
||||
"""Return the W3C ``traceparent`` string for the active span, or ``None``."""
|
||||
try:
|
||||
from opentelemetry import trace
|
||||
|
||||
span = trace.get_current_span()
|
||||
ctx = span.get_span_context()
|
||||
if not ctx.is_valid:
|
||||
return None
|
||||
trace_id = format(ctx.trace_id, "032x")
|
||||
span_id = format(ctx.span_id, "016x")
|
||||
flags = "01" if ctx.trace_flags else "00"
|
||||
return f"00-{trace_id}-{span_id}-{flags}"
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def make_trace_middleware(asgi_app: Any) -> Any:
|
||||
"""Wrap an ASGI application with W3C trace-context extraction middleware.
|
||||
|
||||
The middleware reads ``traceparent`` / ``tracestate`` from every incoming
|
||||
HTTP request and stores the extracted ``Context`` in the
|
||||
``_incoming_trace_context`` ContextVar. The A2A executor reads that
|
||||
ContextVar to parent its ``task_receive`` span correctly, forming an
|
||||
unbroken distributed trace across workspace hops.
|
||||
|
||||
Usage::
|
||||
|
||||
built = app.build()
|
||||
instrumented = make_trace_middleware(built)
|
||||
uvicorn.Config(instrumented, ...)
|
||||
"""
|
||||
|
||||
async def _middleware(scope: dict, receive: Any, send: Any) -> None: # type: ignore[override]
|
||||
if scope.get("type") != "http":
|
||||
await asgi_app(scope, receive, send)
|
||||
return
|
||||
|
||||
# Decode byte-headers from the ASGI scope (latin-1 per HTTP/1.1 spec)
|
||||
raw_headers: list[tuple[bytes, bytes]] = scope.get("headers", [])
|
||||
str_headers: dict[str, str] = {
|
||||
k.decode("latin-1"): v.decode("latin-1") for k, v in raw_headers
|
||||
}
|
||||
|
||||
ctx = extract_trace_context(str_headers)
|
||||
token = _incoming_trace_context.set(ctx)
|
||||
try:
|
||||
await asgi_app(scope, receive, send)
|
||||
finally:
|
||||
_incoming_trace_context.reset(token)
|
||||
|
||||
return _middleware
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers for GenAI attributes
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def gen_ai_system_from_model(model_str: str) -> str:
|
||||
"""Map a ``provider:model`` string to a ``gen_ai.system`` value."""
|
||||
if ":" not in model_str:
|
||||
return "unknown"
|
||||
provider = model_str.split(":", 1)[0].lower()
|
||||
return {
|
||||
"anthropic": "anthropic",
|
||||
"openai": "openai",
|
||||
"openrouter": "openrouter",
|
||||
"groq": "groq",
|
||||
"google_genai": "google",
|
||||
"ollama": "ollama",
|
||||
}.get(provider, provider)
|
||||
|
||||
|
||||
def record_llm_token_usage(span: Any, result: dict) -> None:
|
||||
"""Extract token counts from a LangGraph ainvoke result and set span attrs.
|
||||
|
||||
Handles both Anthropic (``usage``) and OpenAI (``token_usage``) metadata
|
||||
shapes. Silently skips if metadata is absent.
|
||||
"""
|
||||
try:
|
||||
messages = result.get("messages", [])
|
||||
for msg in reversed(messages):
|
||||
meta = getattr(msg, "response_metadata", {}) or {}
|
||||
# Anthropic
|
||||
usage = meta.get("usage", {})
|
||||
if usage:
|
||||
inp = usage.get("input_tokens") or usage.get("prompt_tokens")
|
||||
out = usage.get("output_tokens") or usage.get("completion_tokens")
|
||||
if inp is not None:
|
||||
span.set_attribute(GEN_AI_USAGE_INPUT_TOKENS, int(inp))
|
||||
if out is not None:
|
||||
span.set_attribute(GEN_AI_USAGE_OUTPUT_TOKENS, int(out))
|
||||
return
|
||||
# OpenAI
|
||||
token_usage = meta.get("token_usage", {})
|
||||
if token_usage:
|
||||
inp = token_usage.get("prompt_tokens")
|
||||
out = token_usage.get("completion_tokens")
|
||||
if inp is not None:
|
||||
span.set_attribute(GEN_AI_USAGE_INPUT_TOKENS, int(inp))
|
||||
if out is not None:
|
||||
span.set_attribute(GEN_AI_USAGE_OUTPUT_TOKENS, int(out))
|
||||
return
|
||||
except Exception:
|
||||
pass # Best-effort — never break the caller
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# No-op fallbacks (used when opentelemetry packages are absent)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class _NoopSpan:
|
||||
"""Transparent no-op span that satisfies the context-manager protocol."""
|
||||
|
||||
def set_attribute(self, key: str, value: Any) -> None: # noqa: ARG002
|
||||
pass
|
||||
|
||||
def set_status(self, *args: Any, **kwargs: Any) -> None:
|
||||
pass
|
||||
|
||||
def record_exception(self, exc: BaseException, *args: Any, **kwargs: Any) -> None:
|
||||
pass
|
||||
|
||||
def add_event(self, name: str, *args: Any, **kwargs: Any) -> None:
|
||||
pass
|
||||
|
||||
def __enter__(self) -> "_NoopSpan":
|
||||
return self
|
||||
|
||||
def __exit__(self, *args: Any) -> None:
|
||||
pass
|
||||
|
||||
|
||||
class _NoopTracer:
|
||||
"""Transparent no-op tracer returned when the SDK is unavailable."""
|
||||
|
||||
def start_as_current_span(self, name: str, *args: Any, **kwargs: Any) -> _NoopSpan: # noqa: ARG002
|
||||
return _NoopSpan()
|
||||
|
||||
def start_span(self, name: str, *args: Any, **kwargs: Any) -> _NoopSpan: # noqa: ARG002
|
||||
return _NoopSpan()
|
||||
@@ -1,697 +0,0 @@
|
||||
"""Temporal durable execution wrapper for Molecule AI A2A workspaces.
|
||||
|
||||
Architecture
|
||||
-----------
|
||||
A co-located Temporal worker runs as an asyncio background task **inside the
|
||||
same process** as the A2A server. This means worker activities share the same
|
||||
memory space as the A2A handler, which lets us bridge non-serialisable objects
|
||||
(LangGraph agent, EventQueue, RequestContext) through an in-process registry
|
||||
without having to serialise them through Temporal's state store.
|
||||
|
||||
Workflow stages (names mirror the OTEL span names in a2a_executor.py):
|
||||
|
||||
task_receive → llm_call → task_complete
|
||||
|
||||
task_receive — durable checkpoint: task acknowledged, queued
|
||||
llm_call — durable checkpoint: LLM execution + SSE streaming (retryable)
|
||||
task_complete — durable checkpoint: execution finished, telemetry recorded
|
||||
|
||||
Crash-recovery behaviour
|
||||
------------------------
|
||||
If the process crashes while ``llm_call`` is running, Temporal retries the
|
||||
activity on the restarted process. The in-process registry is empty after a
|
||||
restart, so the activity detects a registry miss, logs a warning, and returns
|
||||
an error result. The SSE client connection is already gone at that point so
|
||||
no response can be delivered — but the task is permanently recorded in
|
||||
Temporal's history and will not silently disappear.
|
||||
|
||||
Env vars
|
||||
--------
|
||||
TEMPORAL_HOST Temporal gRPC endpoint (default: ``localhost:7233``)
|
||||
Set this to enable durable execution. Leave unset (or point
|
||||
at an unreachable host) to run in direct-execution mode.
|
||||
|
||||
Dependencies (optional)
|
||||
-----------
|
||||
temporalio>=1.7.0
|
||||
|
||||
Add to requirements.txt to enable. The module loads and the wrapper class
|
||||
works without the package installed — all Temporal paths return early with a
|
||||
graceful fallback to direct execution.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import dataclasses
|
||||
import logging
|
||||
import os
|
||||
import uuid
|
||||
from datetime import timedelta
|
||||
from typing import Any, Optional
|
||||
|
||||
import httpx
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _platform_url() -> str:
|
||||
"""Return the platform URL, defaulting to host.docker.internal.
|
||||
|
||||
The workspace runtime always runs inside a Docker container, so
|
||||
``localhost`` refers to the container itself, not the platform host.
|
||||
The platform API is only reachable via ``host.docker.internal`` from
|
||||
within a workspace container, regardless of how the container was started.
|
||||
"""
|
||||
return os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# Constants
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
_TASK_QUEUE = "molecule-agent-tasks"
|
||||
_WORKFLOW_EXECUTION_TIMEOUT = timedelta(minutes=30)
|
||||
_ACTIVITY_START_TO_CLOSE_TIMEOUT = timedelta(minutes=10)
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# Checkpoint persistence (non-fatal)
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
async def _fetch_latest_checkpoint(workspace_id: str) -> Optional[dict]:
|
||||
"""GET /workspaces/:id/checkpoints/latest — returns the most recently
|
||||
completed step for this workspace, or None if no checkpoints exist yet.
|
||||
|
||||
Non-fatal: any HTTP error, network failure, or timeout returns None so
|
||||
the calling code continues without a resume context. A 404 (no checkpoints)
|
||||
is the expected response for a freshly provisioned workspace.
|
||||
|
||||
Args:
|
||||
workspace_id: The workspace to query.
|
||||
|
||||
Reads:
|
||||
PLATFORM_URL Platform base URL (default ``http://host.docker.internal:8080``).
|
||||
"""
|
||||
try:
|
||||
from platform_auth import auth_headers as _auth_headers # type: ignore[import]
|
||||
|
||||
platform_url = _platform_url()
|
||||
url = f"{platform_url}/workspaces/{workspace_id}/checkpoints/latest"
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
resp = await client.get(url, headers=_auth_headers())
|
||||
if resp.status_code == 404:
|
||||
return None
|
||||
resp.raise_for_status()
|
||||
return resp.json()
|
||||
except Exception as exc:
|
||||
logger.debug(
|
||||
"Temporal: latest checkpoint fetch skipped workspace=%s: %s "
|
||||
"(non-fatal — starting fresh context)",
|
||||
workspace_id,
|
||||
exc,
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
async def _save_checkpoint(
|
||||
workspace_id: str,
|
||||
workflow_id: str,
|
||||
step_name: str,
|
||||
step_index: int,
|
||||
payload: Optional[dict] = None,
|
||||
) -> None:
|
||||
"""POST a step checkpoint to the platform.
|
||||
|
||||
Non-fatal: any HTTP error, network failure, or timeout is logged as a
|
||||
WARNING and silently swallowed so the calling activity always continues.
|
||||
Checkpoint loss is survivable; aborting a workflow on a transient DB or
|
||||
network blip is not.
|
||||
|
||||
Args:
|
||||
workspace_id: The workspace whose token is used for auth.
|
||||
workflow_id: Unique ID for this workflow execution (task_id).
|
||||
step_name: Temporal activity stage name
|
||||
(``task_receive`` / ``llm_call`` / ``task_complete``).
|
||||
step_index: 0-based stage index matching the platform schema.
|
||||
payload: Optional JSON-serialisable dict stored as JSONB.
|
||||
|
||||
Reads:
|
||||
PLATFORM_URL Platform base URL (default ``http://host.docker.internal:8080``).
|
||||
"""
|
||||
try:
|
||||
from platform_auth import auth_headers as _auth_headers # type: ignore[import]
|
||||
|
||||
platform_url = _platform_url()
|
||||
url = f"{platform_url}/workspaces/{workspace_id}/checkpoints"
|
||||
body: dict = {
|
||||
"workflow_id": workflow_id,
|
||||
"step_name": step_name,
|
||||
"step_index": step_index,
|
||||
}
|
||||
if payload is not None:
|
||||
body["payload"] = payload
|
||||
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
resp = await client.post(url, json=body, headers=_auth_headers())
|
||||
resp.raise_for_status()
|
||||
|
||||
logger.debug(
|
||||
"Temporal: checkpoint saved workspace=%s wf=%s step=%s idx=%d",
|
||||
workspace_id,
|
||||
workflow_id,
|
||||
step_name,
|
||||
step_index,
|
||||
)
|
||||
except Exception as exc:
|
||||
# Non-fatal: workflow continues regardless of checkpoint outcome.
|
||||
logger.warning(
|
||||
"Temporal: checkpoint failed workspace=%s wf=%s step=%s: %s "
|
||||
"(non-fatal — workflow continues)",
|
||||
workspace_id,
|
||||
workflow_id,
|
||||
step_name,
|
||||
exc,
|
||||
)
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# Serialisable data models
|
||||
# These are the only objects that cross the Temporal serialisation boundary.
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@dataclasses.dataclass
|
||||
class AgentTaskInput:
|
||||
"""Serialisable snapshot of an incoming A2A task.
|
||||
|
||||
All fields must be JSON-representable so that Temporal can persist them in
|
||||
its workflow history (used for crash recovery and replay).
|
||||
"""
|
||||
|
||||
task_id: str
|
||||
context_id: str
|
||||
user_input: str
|
||||
model: str
|
||||
workspace_id: str
|
||||
history: list # [[role, content], ...] — tuples converted to lists
|
||||
|
||||
|
||||
@dataclasses.dataclass
|
||||
class LLMResult:
|
||||
"""Serialisable execution result passed from ``llm_call`` to ``task_complete``."""
|
||||
|
||||
final_text: str
|
||||
success: bool
|
||||
error: str = ""
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# In-process registry
|
||||
#
|
||||
# Maps task_id → {executor, context, event_queue, final_text}
|
||||
# Activities look up non-serialisable objects here. The registry is
|
||||
# populated by TemporalWorkflowWrapper.run() before the workflow starts and
|
||||
# cleaned up in the finally block when the workflow completes.
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
_task_registry: dict[str, dict[str, Any]] = {}
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# Temporal workflow + activities
|
||||
# Loaded only when the temporalio package is installed. The surrounding
|
||||
# try/except ensures the module imports cleanly without the package.
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
_TEMPORAL_AVAILABLE = False
|
||||
|
||||
try:
|
||||
from temporalio import activity, workflow
|
||||
from temporalio.client import Client
|
||||
from temporalio.worker import Worker
|
||||
|
||||
_TEMPORAL_AVAILABLE = True
|
||||
|
||||
# ── Activities ────────────────────────────────────────────────────────── #
|
||||
|
||||
@activity.defn(name="task_receive")
|
||||
async def task_receive_activity(inp: AgentTaskInput) -> dict:
|
||||
"""Durable checkpoint: task received and queued for LLM execution.
|
||||
|
||||
Mirrors the *task_receive* OTEL span opened in
|
||||
``LangGraphA2AExecutor._core_execute()``. This activity is lightweight —
|
||||
it validates that the in-process registry entry exists and logs receipt.
|
||||
The actual A2A "working" signal (``updater.start_work()``) is emitted
|
||||
inside ``_core_execute()`` so that SSE timing is preserved.
|
||||
|
||||
Saves a step checkpoint after completing. Checkpoint failure is
|
||||
non-fatal — the activity returns normally regardless.
|
||||
"""
|
||||
logger.info(
|
||||
"Temporal[task_receive] task_id=%s context_id=%s workspace=%s model=%s",
|
||||
inp.task_id,
|
||||
inp.context_id,
|
||||
inp.workspace_id,
|
||||
inp.model,
|
||||
)
|
||||
if inp.task_id not in _task_registry:
|
||||
logger.warning(
|
||||
"Temporal[task_receive] task_id=%s not found in registry "
|
||||
"(crash recovery path — no SSE client connection available)",
|
||||
inp.task_id,
|
||||
)
|
||||
try:
|
||||
await _save_checkpoint(
|
||||
inp.workspace_id, inp.task_id, "task_receive", 0,
|
||||
{"task_id": inp.task_id, "status": "registry_miss"},
|
||||
)
|
||||
except Exception as _ckpt_exc: # pragma: no cover
|
||||
logger.warning("task_receive checkpoint swallowed: %s", _ckpt_exc)
|
||||
return {"task_id": inp.task_id, "status": "registry_miss"}
|
||||
|
||||
try:
|
||||
await _save_checkpoint(
|
||||
inp.workspace_id, inp.task_id, "task_receive", 0,
|
||||
{"task_id": inp.task_id, "status": "received"},
|
||||
)
|
||||
except Exception as _ckpt_exc: # pragma: no cover
|
||||
logger.warning("task_receive checkpoint swallowed: %s", _ckpt_exc)
|
||||
return {"task_id": inp.task_id, "status": "received"}
|
||||
|
||||
@activity.defn(name="llm_call")
|
||||
async def llm_call_activity(inp: AgentTaskInput) -> LLMResult:
|
||||
"""Durable checkpoint: LLM execution with streaming to the event_queue.
|
||||
|
||||
Mirrors the *llm_call* OTEL span in ``LangGraphA2AExecutor._core_execute()``.
|
||||
Calls ``executor._core_execute()`` which handles the full execution pipeline:
|
||||
SSE streaming, OTEL sub-spans, final message emission, and heartbeat updates.
|
||||
|
||||
On crash recovery (empty registry): logs a warning and returns an error
|
||||
result. Temporal records the failure and will retry if configured to do so.
|
||||
The original SSE client connection is gone after a crash, so no response
|
||||
can be delivered, but the task is durably recorded in Temporal's history.
|
||||
"""
|
||||
logger.info("Temporal[llm_call] task_id=%s", inp.task_id)
|
||||
|
||||
entry = _task_registry.get(inp.task_id)
|
||||
if entry is None:
|
||||
msg = (
|
||||
f"task_id={inp.task_id} not in registry — "
|
||||
"process likely restarted; original SSE client connection is gone"
|
||||
)
|
||||
logger.warning("Temporal[llm_call] registry miss: %s", msg)
|
||||
miss_result = LLMResult(final_text="", success=False, error=msg)
|
||||
try:
|
||||
await _save_checkpoint(
|
||||
inp.workspace_id, inp.task_id, "llm_call", 1,
|
||||
{"success": False, "error": msg},
|
||||
)
|
||||
except Exception as _ckpt_exc: # pragma: no cover
|
||||
logger.warning("llm_call checkpoint swallowed: %s", _ckpt_exc)
|
||||
return miss_result
|
||||
|
||||
try:
|
||||
executor = entry["executor"]
|
||||
context = entry["context"]
|
||||
event_queue = entry["event_queue"]
|
||||
|
||||
# _core_execute() is the renamed body of the original execute().
|
||||
# It handles: OTEL spans, SSE streaming, final message, heartbeat.
|
||||
final_text = await executor._core_execute(context, event_queue)
|
||||
|
||||
# Cache for task_complete observability
|
||||
entry["final_text"] = final_text or ""
|
||||
result = LLMResult(final_text=final_text or "", success=True)
|
||||
|
||||
except Exception as exc:
|
||||
logger.error(
|
||||
"Temporal[llm_call] task_id=%s execution error: %s",
|
||||
inp.task_id,
|
||||
exc,
|
||||
exc_info=True,
|
||||
)
|
||||
result = LLMResult(final_text="", success=False, error=str(exc))
|
||||
|
||||
try:
|
||||
await _save_checkpoint(
|
||||
inp.workspace_id, inp.task_id, "llm_call", 1,
|
||||
{"success": result.success, "error": result.error or None},
|
||||
)
|
||||
except Exception as _ckpt_exc: # pragma: no cover
|
||||
logger.warning("llm_call checkpoint swallowed: %s", _ckpt_exc)
|
||||
return result
|
||||
|
||||
@activity.defn(name="task_complete")
|
||||
async def task_complete_activity(result: LLMResult) -> None:
|
||||
"""Durable checkpoint: task execution finished.
|
||||
|
||||
Mirrors the *task_complete* OTEL span in ``LangGraphA2AExecutor._core_execute()``.
|
||||
This activity records the outcome for Temporal observability. The actual
|
||||
OTEL task_complete span fires inside ``_core_execute()``; this activity
|
||||
provides a durable, queryable record in Temporal's workflow history.
|
||||
|
||||
Saves a step checkpoint. Checkpoint failure is non-fatal.
|
||||
The ``workspace_id`` and ``task_id`` are not available in this activity
|
||||
(only the ``LLMResult`` is passed from ``llm_call``), so the checkpoint
|
||||
is skipped here — ``llm_call`` already captured the final outcome.
|
||||
"""
|
||||
if result.success:
|
||||
logger.info(
|
||||
"Temporal[task_complete] success=True final_text_len=%d",
|
||||
len(result.final_text),
|
||||
)
|
||||
else:
|
||||
logger.warning(
|
||||
"Temporal[task_complete] success=False error=%r",
|
||||
result.error,
|
||||
)
|
||||
|
||||
# ── Workflow ──────────────────────────────────────────────────────────── #
|
||||
|
||||
@workflow.defn
|
||||
class MoleculeAIAgentWorkflow:
|
||||
"""Durable Temporal workflow for Molecule AI A2A agent task execution.
|
||||
|
||||
Sequences three activities that mirror the OTEL span hierarchy in
|
||||
``LangGraphA2AExecutor._core_execute()``:
|
||||
|
||||
task_receive → llm_call → task_complete
|
||||
|
||||
Each activity is a durable checkpoint: if the process crashes between
|
||||
activities, Temporal resumes from the last completed checkpoint on
|
||||
restart. If an activity fails (exception or timeout), Temporal can
|
||||
retry it according to the configured retry policy.
|
||||
"""
|
||||
|
||||
@workflow.run
|
||||
async def run(self, inp: AgentTaskInput) -> LLMResult:
|
||||
opts: dict[str, Any] = {
|
||||
"start_to_close_timeout": _ACTIVITY_START_TO_CLOSE_TIMEOUT,
|
||||
}
|
||||
|
||||
# Stage 1 — acknowledge receipt (lightweight checkpoint)
|
||||
await workflow.execute_activity(task_receive_activity, inp, **opts)
|
||||
|
||||
# Stage 2 — LLM execution (main work; retryable on crash/timeout)
|
||||
result: LLMResult = await workflow.execute_activity(
|
||||
llm_call_activity, inp, **opts
|
||||
)
|
||||
|
||||
# Stage 3 — record completion (lightweight checkpoint)
|
||||
await workflow.execute_activity(task_complete_activity, result, **opts)
|
||||
|
||||
return result
|
||||
|
||||
except ImportError:
|
||||
# temporalio not installed — the wrapper class below will gracefully fall
|
||||
# back to direct execution for every call.
|
||||
logger.debug(
|
||||
"Temporal: temporalio package not installed — "
|
||||
"durable execution disabled (add temporalio>=1.7.0 to requirements.txt)"
|
||||
)
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# TemporalWorkflowWrapper
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TemporalWorkflowWrapper:
|
||||
"""Wraps ``LangGraphA2AExecutor.execute()`` with Temporal durable execution.
|
||||
|
||||
The wrapper intercepts each ``execute()`` call and routes it through a
|
||||
``MoleculeAIAgentWorkflow`` Temporal workflow. If Temporal is unavailable
|
||||
for any reason, execution falls back transparently to the direct path
|
||||
(``executor._core_execute()``), so the A2A server never crashes due to
|
||||
Temporal issues.
|
||||
|
||||
Lifecycle
|
||||
---------
|
||||
1. ``create_wrapper()`` — instantiate and register the global singleton.
|
||||
2. ``await wrapper.start()`` — connect to Temporal, launch the background
|
||||
worker. No-op (with a log warning) if Temporal is unreachable.
|
||||
3. Normal operation — ``wrapper.run()`` is called from ``execute()``.
|
||||
4. ``await wrapper.stop()`` — cancel the background worker task on shutdown.
|
||||
|
||||
Co-located worker pattern
|
||||
-------------------------
|
||||
The Temporal worker runs as an asyncio background task in the **same event
|
||||
loop** as the A2A server. This means:
|
||||
- No separate worker process to manage.
|
||||
- Activities share the process's memory (registry access works).
|
||||
- Worker and server share the same asyncio event loop.
|
||||
|
||||
Env vars
|
||||
--------
|
||||
``TEMPORAL_HOST`` Temporal gRPC address, e.g. ``localhost:7233`` or
|
||||
``temporal.internal:7233``. Defaults to
|
||||
``localhost:7233``. If Temporal is not reachable at
|
||||
this address, the wrapper falls back to direct execution.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._host: str = os.environ.get("TEMPORAL_HOST", "localhost:7233")
|
||||
self._client: Optional[Any] = None
|
||||
self._worker: Optional[Any] = None
|
||||
self._worker_task: Optional[asyncio.Task] = None # type: ignore[type-arg]
|
||||
self._available: bool = False
|
||||
|
||||
# ── Lifecycle ─────────────────────────────────────────────────────────── #
|
||||
|
||||
async def start(self) -> None:
|
||||
"""Connect to Temporal and start the co-located background worker.
|
||||
|
||||
Safe to call multiple times (idempotent after first success).
|
||||
Never raises — logs a warning and returns on any failure.
|
||||
"""
|
||||
if not _TEMPORAL_AVAILABLE:
|
||||
logger.info(
|
||||
"Temporal: temporalio package not installed — "
|
||||
"all tasks will use direct execution. "
|
||||
"To enable durable execution: pip install temporalio>=1.7.0"
|
||||
)
|
||||
return
|
||||
|
||||
if self._available:
|
||||
return # already started
|
||||
|
||||
# Connect to the Temporal server
|
||||
try:
|
||||
self._client = await Client.connect(self._host) # type: ignore[name-defined]
|
||||
logger.info("Temporal: connected to %s", self._host)
|
||||
except Exception as exc:
|
||||
logger.warning(
|
||||
"Temporal: cannot connect to %s (%s) — "
|
||||
"all tasks will use direct execution (no durable state)",
|
||||
self._host,
|
||||
exc,
|
||||
)
|
||||
return
|
||||
|
||||
# Start the worker as an asyncio background task
|
||||
try:
|
||||
self._worker = Worker( # type: ignore[name-defined]
|
||||
self._client,
|
||||
task_queue=_TASK_QUEUE,
|
||||
workflows=[MoleculeAIAgentWorkflow], # type: ignore[name-defined]
|
||||
activities=[
|
||||
task_receive_activity, # type: ignore[name-defined]
|
||||
llm_call_activity, # type: ignore[name-defined]
|
||||
task_complete_activity, # type: ignore[name-defined]
|
||||
],
|
||||
)
|
||||
self._worker_task = asyncio.create_task(
|
||||
self._worker.run(),
|
||||
name="temporal-worker",
|
||||
)
|
||||
self._available = True
|
||||
logger.info(
|
||||
"Temporal: co-located worker started on task queue '%s'",
|
||||
_TASK_QUEUE,
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning(
|
||||
"Temporal: worker initialisation failed (%s) — "
|
||||
"falling back to direct execution",
|
||||
exc,
|
||||
)
|
||||
|
||||
async def stop(self) -> None:
|
||||
"""Gracefully stop the Temporal worker background task."""
|
||||
self._available = False
|
||||
if self._worker_task and not self._worker_task.done():
|
||||
self._worker_task.cancel()
|
||||
try:
|
||||
await self._worker_task
|
||||
except (asyncio.CancelledError, Exception):
|
||||
pass
|
||||
logger.info("Temporal: worker stopped")
|
||||
|
||||
# ── Public API ────────────────────────────────────────────────────────── #
|
||||
|
||||
def is_available(self) -> bool:
|
||||
"""Return ``True`` if Temporal is connected and the worker is running."""
|
||||
return self._available
|
||||
|
||||
async def run(
|
||||
self,
|
||||
executor: Any,
|
||||
context: Any,
|
||||
event_queue: Any,
|
||||
) -> None:
|
||||
"""Route one A2A task execution through a Temporal durable workflow.
|
||||
|
||||
Steps
|
||||
-----
|
||||
1. Build a serialisable ``AgentTaskInput`` from the A2A request context.
|
||||
2. Store non-serialisable state (executor, context, event_queue) in
|
||||
the in-process ``_task_registry`` keyed by task_id.
|
||||
3. Submit and await ``MoleculeAIAgentWorkflow`` on the Temporal server.
|
||||
4. Clean up the registry entry (always, via ``finally``).
|
||||
|
||||
Falls back to ``executor._core_execute()`` if:
|
||||
- Temporal is not available (``is_available()`` is False).
|
||||
- Input extraction fails.
|
||||
- The workflow raises any exception.
|
||||
|
||||
This guarantees that the A2A client always receives a response even
|
||||
when Temporal is misconfigured or temporarily unreachable.
|
||||
"""
|
||||
if not self._available or self._client is None:
|
||||
# Temporal unavailable — silent direct fallback
|
||||
await executor._core_execute(context, event_queue)
|
||||
return
|
||||
|
||||
task_id = getattr(context, "task_id", None) or str(uuid.uuid4())
|
||||
context_id = getattr(context, "context_id", None) or str(uuid.uuid4())
|
||||
|
||||
# Build serialisable AgentTaskInput
|
||||
try:
|
||||
from adapters.shared_runtime import (
|
||||
extract_history as _extract_history,
|
||||
extract_message_text,
|
||||
)
|
||||
|
||||
user_input = extract_message_text(context) or ""
|
||||
raw_history = _extract_history(context)
|
||||
# Convert (role, content) tuples → [role, content] lists (JSON-safe)
|
||||
history: list = [list(pair) for pair in raw_history]
|
||||
except Exception as exc:
|
||||
logger.warning(
|
||||
"Temporal: failed to extract serialisable task input (%s) — "
|
||||
"falling back to direct execution",
|
||||
exc,
|
||||
)
|
||||
await executor._core_execute(context, event_queue)
|
||||
return
|
||||
|
||||
workspace_id_env = os.environ.get("WORKSPACE_ID", "unknown")
|
||||
|
||||
# Issue #837: query the latest checkpoint for this workspace.
|
||||
# If a previous workflow crashed mid-step, inject the last known
|
||||
# step into the history so the agent is aware of its prior state.
|
||||
# Non-fatal: a missing or 404 response means starting fresh.
|
||||
last_ckpt = await _fetch_latest_checkpoint(workspace_id_env)
|
||||
if last_ckpt:
|
||||
step_name = last_ckpt.get("step_name", "unknown")
|
||||
workflow_id_ckpt = last_ckpt.get("workflow_id", "")
|
||||
completed_at = last_ckpt.get("completed_at", "")
|
||||
ckpt_note = (
|
||||
f"[SYSTEM: This workspace was previously executing workflow "
|
||||
f"'{workflow_id_ckpt}'. The last recorded step was '{step_name}' "
|
||||
f"(completed at {completed_at}). If the current task is a "
|
||||
f"continuation of that workflow, resume from this point. "
|
||||
f"Otherwise ignore this context and start fresh.]"
|
||||
)
|
||||
# Prepend as a synthetic context entry so the agent sees it at the
|
||||
# start of its history — before any user messages for this task.
|
||||
history = [["system", ckpt_note]] + history
|
||||
logger.info(
|
||||
"Temporal: injecting checkpoint context task_id=%s last_step=%s wf=%s",
|
||||
task_id,
|
||||
step_name,
|
||||
workflow_id_ckpt,
|
||||
)
|
||||
|
||||
inp = AgentTaskInput(
|
||||
task_id=task_id,
|
||||
context_id=context_id,
|
||||
user_input=user_input,
|
||||
model=getattr(executor, "_model", "unknown"),
|
||||
workspace_id=workspace_id_env,
|
||||
history=history,
|
||||
)
|
||||
|
||||
# Register non-serialisable in-process state for activities to access
|
||||
_task_registry[task_id] = {
|
||||
"executor": executor,
|
||||
"context": context,
|
||||
"event_queue": event_queue,
|
||||
"final_text": "",
|
||||
}
|
||||
|
||||
try:
|
||||
logger.info(
|
||||
"Temporal: starting workflow molecule-%s on queue '%s'",
|
||||
task_id,
|
||||
_TASK_QUEUE,
|
||||
)
|
||||
await self._client.execute_workflow(
|
||||
MoleculeAIAgentWorkflow.run, # type: ignore[name-defined]
|
||||
inp,
|
||||
id=f"molecule-{task_id}",
|
||||
task_queue=_TASK_QUEUE,
|
||||
execution_timeout=_WORKFLOW_EXECUTION_TIMEOUT,
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.error(
|
||||
"Temporal: workflow molecule-%s failed (%s) — "
|
||||
"falling back to direct execution so client receives a response",
|
||||
task_id,
|
||||
exc,
|
||||
exc_info=True,
|
||||
)
|
||||
# Direct fallback ensures the SSE client is never left hanging
|
||||
await executor._core_execute(context, event_queue)
|
||||
finally:
|
||||
_task_registry.pop(task_id, None)
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# Module-level singleton helpers
|
||||
# Used by a2a_executor.py and main.py
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
_global_wrapper: Optional[TemporalWorkflowWrapper] = None
|
||||
|
||||
|
||||
def get_wrapper() -> Optional[TemporalWorkflowWrapper]:
|
||||
"""Return the global ``TemporalWorkflowWrapper``, or ``None`` if not set.
|
||||
|
||||
Called from ``LangGraphA2AExecutor.execute()`` on every request.
|
||||
Returns ``None`` before ``create_wrapper()`` is called (direct-execution mode).
|
||||
"""
|
||||
return _global_wrapper
|
||||
|
||||
|
||||
def create_wrapper() -> TemporalWorkflowWrapper:
|
||||
"""Create (or return the existing) global ``TemporalWorkflowWrapper``.
|
||||
|
||||
Idempotent — safe to call multiple times. Call ``await wrapper.start()``
|
||||
after this to connect to Temporal and launch the background worker.
|
||||
|
||||
Example (in main.py)::
|
||||
|
||||
from builtin_tools.temporal_workflow import create_wrapper as create_temporal_wrapper
|
||||
temporal_wrapper = create_temporal_wrapper()
|
||||
await temporal_wrapper.start() # connects + starts worker
|
||||
try:
|
||||
await server.serve()
|
||||
finally:
|
||||
await temporal_wrapper.stop()
|
||||
"""
|
||||
global _global_wrapper
|
||||
if _global_wrapper is None:
|
||||
_global_wrapper = TemporalWorkflowWrapper()
|
||||
return _global_wrapper
|
||||
@@ -1,57 +0,0 @@
|
||||
"""Helpers for building / mutating the workspace ``AgentCard``.
|
||||
|
||||
Kept as their own module so the behavior is unit-testable without booting
|
||||
the whole runtime (``main.py`` is ``# pragma: no cover``).
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Iterable
|
||||
|
||||
from a2a.types import AgentCard, AgentSkill
|
||||
|
||||
|
||||
def enrich_card_skills(card: AgentCard, loaded_skills: Iterable | None) -> bool:
|
||||
"""Replace ``card.skills`` with rich metadata from the adapter's loaded
|
||||
skills, in place. Pairs with PR #2756: the card was built up front from
|
||||
static ``config.skills`` names so /.well-known/agent-card.json could
|
||||
serve before ``adapter.setup()`` finishes; this swaps in the richer
|
||||
descriptions/tags/examples that ``setup()``'s skill loader produces.
|
||||
|
||||
Returns ``True`` on swap, ``False`` when the swap was skipped or
|
||||
failed. Failure cases:
|
||||
* ``loaded_skills`` is None / empty — caller didn't load any.
|
||||
* Any element doesn't expose ``.metadata.{id,name,description,tags,examples}``
|
||||
(a future adapter that doesn't follow the canonical shape).
|
||||
|
||||
Failures DO NOT raise — a malformed ``loaded_skills`` shape would
|
||||
otherwise propagate to ``main.py``'s outer ``except Exception``,
|
||||
silently degrading an OK boot to the not-configured state. Static
|
||||
stubs from ``config.skills`` stay in place; setup() already
|
||||
succeeded, the agent works, only the card's skill enrichment is
|
||||
degraded. Operator sees a clear log line; tests assert this
|
||||
distinction.
|
||||
"""
|
||||
if not loaded_skills:
|
||||
return False
|
||||
|
||||
try:
|
||||
rich = [
|
||||
AgentSkill(
|
||||
id=skill.metadata.id,
|
||||
name=skill.metadata.name,
|
||||
description=skill.metadata.description,
|
||||
tags=skill.metadata.tags,
|
||||
examples=skill.metadata.examples,
|
||||
)
|
||||
for skill in loaded_skills
|
||||
]
|
||||
except Exception as enrich_err: # noqa: BLE001
|
||||
print(
|
||||
f"Warning: skill metadata enrichment failed (keeping static "
|
||||
f"stubs from config.skills): {type(enrich_err).__name__}: {enrich_err}",
|
||||
flush=True,
|
||||
)
|
||||
return False
|
||||
|
||||
card.skills = rich
|
||||
return True
|
||||
@@ -1,659 +0,0 @@
|
||||
"""Load workspace configuration from config.yaml."""
|
||||
|
||||
import logging
|
||||
import os
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import yaml
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class RBACConfig:
|
||||
"""Role-based access control settings for this workspace.
|
||||
|
||||
``roles`` declares what this workspace is *allowed* to do. Each role
|
||||
name maps to a set of permitted actions. Built-in roles are defined in
|
||||
``tools/audit.ROLE_PERMISSIONS``; custom roles can be added via
|
||||
``allowed_actions``.
|
||||
|
||||
Built-in roles
|
||||
--------------
|
||||
admin All actions (delegate, approve, memory.read, memory.write)
|
||||
operator Same as admin — standard agent role (default)
|
||||
read-only memory.read only
|
||||
no-delegation approve + memory.read + memory.write
|
||||
no-approval delegate + memory.read + memory.write
|
||||
memory-readonly memory.read only
|
||||
|
||||
Example config.yaml snippet::
|
||||
|
||||
rbac:
|
||||
roles:
|
||||
- operator
|
||||
allowed_actions:
|
||||
analyst:
|
||||
- memory.read
|
||||
- memory.write
|
||||
"""
|
||||
|
||||
roles: list[str] = field(default_factory=lambda: ["operator"])
|
||||
"""List of role names granted to this workspace."""
|
||||
|
||||
allowed_actions: dict[str, list[str]] = field(default_factory=dict)
|
||||
"""Custom role → [action, ...] overrides. Takes precedence over built-ins."""
|
||||
|
||||
|
||||
@dataclass
|
||||
class HITLConfig:
|
||||
"""Human-In-The-Loop settings loaded from the ``hitl:`` block in config.yaml.
|
||||
|
||||
Example config.yaml snippet::
|
||||
|
||||
hitl:
|
||||
channels:
|
||||
- type: dashboard # always active
|
||||
- type: slack
|
||||
webhook_url: https://hooks.slack.com/services/…
|
||||
- type: email
|
||||
smtp_host: smtp.example.com
|
||||
from: alerts@example.com
|
||||
to: ops@example.com
|
||||
default_timeout: 300 # seconds
|
||||
bypass_roles: [admin]
|
||||
"""
|
||||
channels: list[dict] = field(default_factory=lambda: [{"type": "dashboard"}])
|
||||
default_timeout: float = 300.0
|
||||
bypass_roles: list[str] = field(default_factory=list)
|
||||
|
||||
|
||||
@dataclass
|
||||
class DelegationConfig:
|
||||
retry_attempts: int = 3
|
||||
retry_delay: float = 5.0
|
||||
timeout: float = 120.0
|
||||
escalate: bool = True
|
||||
|
||||
|
||||
@dataclass
|
||||
class A2AConfig:
|
||||
port: int = 8000
|
||||
streaming: bool = True
|
||||
push_notifications: bool = True
|
||||
|
||||
|
||||
@dataclass
|
||||
class SandboxConfig:
|
||||
backend: str = "subprocess" # subprocess | docker
|
||||
memory_limit: str = "256m"
|
||||
timeout: int = 30
|
||||
|
||||
@dataclass
|
||||
class RuntimeConfig:
|
||||
"""Configuration for CLI-based agent runtimes (claude-code, codex, ollama, custom)."""
|
||||
command: str = "" # e.g. "claude", "codex", "ollama" (model goes in model field)
|
||||
args: list[str] = field(default_factory=list) # additional CLI args
|
||||
required_env: list[str] = field(default_factory=list) # env vars required to run (e.g. ["CLAUDE_CODE_OAUTH_TOKEN"])
|
||||
timeout: int = 0 # seconds (0 = no timeout — agents wait until done)
|
||||
model: str = "" # model override for the CLI
|
||||
provider: str = "" # explicit LLM provider (e.g., "anthropic", "openai",
|
||||
# "minimax"). Falls back to the top-level resolved
|
||||
# provider when empty. Adapters (hermes, claude-code,
|
||||
# codex) prefer this over slug-parsing the model name.
|
||||
# Per-model entries surfaced in the canvas Model dropdown. Each entry is a
|
||||
# raw dict with at least ``id``; ``required_env`` is the per-model auth
|
||||
# list (e.g. ``{"id": "MiniMax-M2.7", "required_env": ["MINIMAX_API_KEY"]}``).
|
||||
# Preflight prefers an entry's ``required_env`` over the top-level
|
||||
# ``required_env`` when the picked ``model`` matches an entry's ``id``
|
||||
# (case-insensitive). The top-level list remains the fallback so single-
|
||||
# model templates need not migrate. Surfaced 2026-05-02 after a user
|
||||
# picked MiniMax in canvas, set MINIMAX_API_KEY, and still got booted
|
||||
# into a CLAUDE_CODE_OAUTH_TOKEN preflight failure.
|
||||
models: list[dict] = field(default_factory=list)
|
||||
# Deprecated — use required_env + secrets API instead. Kept for backward compat.
|
||||
auth_token_env: str = ""
|
||||
auth_token_file: str = ""
|
||||
|
||||
|
||||
@dataclass
|
||||
class GovernanceConfig:
|
||||
"""Microsoft Agent Governance Toolkit integration settings.
|
||||
|
||||
When ``enabled`` is True, Molecule AI's RBAC and audit trail are bridged
|
||||
to the Agent Governance Toolkit (agent-os-kernel) for policy evaluation.
|
||||
|
||||
``toolkit`` is reserved for future extensibility — only ``"microsoft"``
|
||||
is supported today.
|
||||
|
||||
``policy_mode`` controls enforcement:
|
||||
strict RBAC *and* toolkit policy must both allow — strictest mode
|
||||
permissive RBAC must allow; toolkit denials are logged but not enforced
|
||||
audit RBAC only; toolkit evaluated and logged but never blocks
|
||||
|
||||
``policy_file`` path to a Rego (.rego), YAML (.yaml/.yml), or Cedar
|
||||
(.cedar) policy file, loaded into the PolicyEvaluator at startup.
|
||||
|
||||
``blocked_patterns`` is a list of regex patterns that the toolkit will
|
||||
always deny regardless of roles or policy.
|
||||
"""
|
||||
|
||||
enabled: bool = False
|
||||
toolkit: str = "microsoft"
|
||||
policy_endpoint: str = ""
|
||||
policy_mode: str = "audit" # strict | permissive | audit
|
||||
policy_file: str = ""
|
||||
blocked_patterns: list[str] = field(default_factory=list)
|
||||
max_tool_calls_per_task: int = 50
|
||||
|
||||
|
||||
@dataclass
|
||||
class SecurityScanConfig:
|
||||
"""Skill dependency security scanning settings.
|
||||
|
||||
``mode`` controls what happens when critical/high CVEs are found:
|
||||
|
||||
block — raise ``SkillSecurityError``; the skill is NOT loaded.
|
||||
warn — emit a WARNING + audit event; the skill is loaded anyway (default).
|
||||
off — skip scanning entirely (air-gapped or CI environments).
|
||||
|
||||
Scanners tried in order: Snyk CLI (requires ``SNYK_TOKEN``), then
|
||||
pip-audit. If neither is available the scan is silently skipped.
|
||||
|
||||
Example config.yaml snippet::
|
||||
|
||||
security_scan: warn # shorthand string form
|
||||
# or verbose form:
|
||||
security_scan:
|
||||
mode: block
|
||||
"""
|
||||
|
||||
mode: str = "warn"
|
||||
"""One of: block | warn | off."""
|
||||
|
||||
fail_open_if_no_scanner: bool = True
|
||||
"""When True (default), silently skip scanning if no scanner (snyk/pip-audit)
|
||||
is in PATH. When False and mode='block', raise SkillSecurityError so that
|
||||
operators who require a CVE gate know the gate is absent. Closes #268."""
|
||||
|
||||
|
||||
@dataclass
|
||||
class EventLogConfig:
|
||||
"""Settings for the workspace event log (workspace/event_log.py).
|
||||
|
||||
The event log is an append-and-query buffer for runtime events
|
||||
(turn started, tool invoked, peer message delivered, …) that the
|
||||
canvas Activity tab and platform-side `/activity` endpoint read.
|
||||
Defaults are tuned for a long-running workspace: 1-hour TTL and a
|
||||
10k-entry cap together hold ~1 MB of events in memory at the
|
||||
documented per-event size budget (~100 bytes payload).
|
||||
|
||||
Example config.yaml snippet::
|
||||
|
||||
observability:
|
||||
event_log:
|
||||
backend: memory # or "disabled" to opt out
|
||||
ttl_seconds: 3600
|
||||
max_entries: 10000
|
||||
"""
|
||||
|
||||
backend: str = "memory"
|
||||
"""``memory`` (default) buffers events in process RAM with the
|
||||
bounds below; ``disabled`` returns a no-op log so the canvas
|
||||
Activity tab is silent. Unknown values fall back to ``memory`` —
|
||||
a typo should not crash boot or silently drop telemetry."""
|
||||
|
||||
ttl_seconds: int = 3600
|
||||
"""How long an event survives before TTL eviction. 1 hour covers
|
||||
a long agentic loop comfortably without leaking; operators
|
||||
debugging a slow drift may temporarily widen this, but be aware
|
||||
the bound is RAM, not disk."""
|
||||
|
||||
max_entries: int = 10_000
|
||||
"""Hard cap on resident events. Together with ``ttl_seconds`` this
|
||||
bounds memory: the FIFO eviction drops oldest first, so a query
|
||||
cursor that falls behind sees a contiguous tail rather than a
|
||||
gappy log."""
|
||||
|
||||
|
||||
@dataclass
|
||||
class ObservabilityConfig:
|
||||
"""Observability settings — heartbeat cadence, log verbosity, event log.
|
||||
|
||||
Hermes-style block: groups platform-runtime knobs that operators
|
||||
typically tune together (cadence, verbosity, event-log retention)
|
||||
into one declarative section instead of scattering them across env
|
||||
vars and hard-coded constants. Adopting this shape unblocks
|
||||
per-workspace tuning without a code change.
|
||||
|
||||
The ``event_log`` sub-block is schema-only in this PR (#119 PR-2);
|
||||
consumer wiring (the canvas Activity tab + `/activity` endpoint
|
||||
reading from the configured backend) lands in PR-3.
|
||||
|
||||
Example config.yaml snippet::
|
||||
|
||||
observability:
|
||||
heartbeat_interval_seconds: 60
|
||||
log_level: DEBUG
|
||||
event_log:
|
||||
backend: memory
|
||||
ttl_seconds: 3600
|
||||
max_entries: 10000
|
||||
"""
|
||||
|
||||
heartbeat_interval_seconds: int = 30
|
||||
"""Seconds between heartbeats sent to the platform. Default 30 matches
|
||||
``workspace/heartbeat.py``'s long-standing constant. Lower values
|
||||
reduce platform-side detection latency for crashed workspaces; higher
|
||||
values reduce platform write load. Bounds: clamped to [5, 300] at
|
||||
parse time — outside that range the workspace either floods the
|
||||
platform or looks dead before the next beat."""
|
||||
|
||||
log_level: str = "INFO"
|
||||
"""Python ``logging`` level for the workspace runtime. Accepts the
|
||||
standard names (DEBUG, INFO, WARNING, ERROR, CRITICAL). Today the
|
||||
runtime reads ``LOG_LEVEL`` env; PR-3 of the #119 stack switches to
|
||||
this field with env still honored as an override for ops debugging."""
|
||||
|
||||
event_log: EventLogConfig = field(default_factory=EventLogConfig)
|
||||
"""Event-log backend + retention bounds. See ``EventLogConfig``."""
|
||||
|
||||
|
||||
@dataclass
|
||||
class ComplianceConfig:
|
||||
"""OWASP Top 10 for Agentic Applications compliance settings.
|
||||
|
||||
Default is ``mode: owasp_agentic`` + ``prompt_injection: detect``.
|
||||
The detect mode logs injection attempts as audit events without
|
||||
blocking the request — so there is no false-positive UX cost, only
|
||||
a gain in visibility. Operators opt into stricter ``block`` mode per
|
||||
workspace. To disable compliance entirely (not recommended), set
|
||||
``mode: ""`` in config.yaml.
|
||||
|
||||
Before 2026-04-24, the default was ``mode: ""`` (fully off). A
|
||||
review of the A2A inbound path showed that no shipped template set
|
||||
``mode`` explicitly, so prompt-injection detection was silently
|
||||
disabled for every live workspace despite the machinery existing.
|
||||
Flipping the default to ``owasp_agentic`` with ``prompt_injection:
|
||||
detect`` closes that gap with zero user-visible behavior change.
|
||||
|
||||
Example config.yaml snippet to opt OUT::
|
||||
|
||||
compliance:
|
||||
mode: "" # disables all compliance checks
|
||||
|
||||
Example config.yaml snippet to tighten::
|
||||
|
||||
compliance:
|
||||
mode: owasp_agentic # (default)
|
||||
prompt_injection: block # (default: detect)
|
||||
max_tool_calls_per_task: 30
|
||||
max_task_duration_seconds: 180
|
||||
"""
|
||||
|
||||
mode: str = "owasp_agentic"
|
||||
"""Enable compliance mode. ``owasp_agentic`` (default) activates the
|
||||
OA-01/OA-02/OA-03/OA-06 checks; ``""`` disables everything."""
|
||||
|
||||
prompt_injection: str = "detect"
|
||||
"""``detect`` logs injection attempts (default, zero UX cost);
|
||||
``block`` raises PromptInjectionError before the agent sees the
|
||||
text. Operators can tighten to ``block`` per workspace."""
|
||||
|
||||
max_tool_calls_per_task: int = 50
|
||||
"""Maximum number of tool invocations per task before ExcessiveAgencyError."""
|
||||
|
||||
max_task_duration_seconds: int = 300
|
||||
"""Maximum wall-clock seconds per task before ExcessiveAgencyError."""
|
||||
|
||||
|
||||
@dataclass
|
||||
class WorkspaceConfig:
|
||||
name: str = "Workspace"
|
||||
description: str = ""
|
||||
role: str = ""
|
||||
"""Human-readable role label for this agent (e.g. 'Senior Code Reviewer').
|
||||
Surfaced in AGENTS.md so peer agents can understand this workspace's purpose
|
||||
without reading the full system prompt. Falls back to description when empty."""
|
||||
version: str = "1.0.0"
|
||||
tier: int = 1
|
||||
model: str = "anthropic:claude-opus-4-7"
|
||||
provider: str = ""
|
||||
"""Explicit LLM provider slug (e.g., ``anthropic``, ``openai``, ``minimax``).
|
||||
|
||||
When empty, ``load_config`` derives it from the ``model`` slug prefix
|
||||
(``anthropic:claude-opus-4-7`` → ``anthropic``; ``minimax/abab7-chat`` →
|
||||
``minimax``; bare model names → ``""``). Set explicitly via the canvas
|
||||
Provider dropdown or the ``LLM_PROVIDER`` env var when the model name
|
||||
is provider-ambiguous (e.g., a custom alias) or when an adapter needs
|
||||
a specific gateway distinct from the model namespace.
|
||||
"""
|
||||
runtime: str = "langgraph" # langgraph | claude-code | codex | ollama | custom
|
||||
runtime_config: RuntimeConfig = field(default_factory=RuntimeConfig)
|
||||
initial_prompt: str = ""
|
||||
"""Auto-sent as the first A2A message after startup. Default empty = no auto-message.
|
||||
Can be an inline string or a file reference (initial_prompt_file in yaml)."""
|
||||
idle_prompt: str = ""
|
||||
"""Auto-sent every `idle_interval_seconds` while the workspace has no active
|
||||
task (heartbeat.active_tasks == 0). Default empty = no idle loop. This is
|
||||
the reflection-on-completion / backlog-pull pattern from the Hermes/Letta
|
||||
playbook: the workspace self-wakes when idle, runs a lightweight reflection
|
||||
prompt, and either picks up queued work or stops. Cost scales with useful
|
||||
activity (the prompt returns quickly if there's nothing to do). Can be
|
||||
inline or a file reference via `idle_prompt_file`."""
|
||||
idle_interval_seconds: int = 600
|
||||
"""How often the idle loop checks in (seconds). Default 600 (10 min).
|
||||
Ignored when idle_prompt is empty."""
|
||||
skills: list[str] = field(default_factory=list)
|
||||
plugins: list[str] = field(default_factory=list) # installed plugin names
|
||||
tools: list[str] = field(default_factory=list)
|
||||
prompt_files: list[str] = field(default_factory=list)
|
||||
a2a: A2AConfig = field(default_factory=A2AConfig)
|
||||
delegation: DelegationConfig = field(default_factory=DelegationConfig)
|
||||
sandbox: SandboxConfig = field(default_factory=SandboxConfig)
|
||||
rbac: RBACConfig = field(default_factory=RBACConfig)
|
||||
hitl: HITLConfig = field(default_factory=HITLConfig)
|
||||
governance: GovernanceConfig = field(default_factory=GovernanceConfig)
|
||||
security_scan: SecurityScanConfig = field(default_factory=SecurityScanConfig)
|
||||
compliance: ComplianceConfig = field(default_factory=ComplianceConfig)
|
||||
observability: ObservabilityConfig = field(default_factory=ObservabilityConfig)
|
||||
sub_workspaces: list[dict] = field(default_factory=list)
|
||||
effort: str = ""
|
||||
"""Claude output effort level for the agentic loop: low | medium | high | xhigh | max.
|
||||
Empty string = not set (model default applies). xhigh is the Opus 4.7 recommended
|
||||
default for long agentic tasks. Passed as ``output_config.effort`` by ClaudeSDKExecutor."""
|
||||
task_budget: int = 0
|
||||
"""Advisory total-token budget across the full agentic loop. 0 = not set.
|
||||
Must be >= 20000 when non-zero (API minimum). When set, ClaudeSDKExecutor
|
||||
automatically adds the ``task-budgets-2026-03-13`` beta header."""
|
||||
|
||||
|
||||
def _derive_provider_from_model(model: str) -> str:
|
||||
"""Extract the provider slug prefix from a model identifier.
|
||||
|
||||
Recognizes both ``provider:model`` (Anthropic / OpenAI / Google convention)
|
||||
and ``provider/model`` (HuggingFace / Minimax convention). Returns ``""``
|
||||
when the model has no recognizable separator — callers must treat empty
|
||||
as "use adapter default routing", not as a hard failure.
|
||||
"""
|
||||
for sep in (":", "/"):
|
||||
if sep in model:
|
||||
return model.partition(sep)[0]
|
||||
return ""
|
||||
|
||||
|
||||
_legacy_model_provider_warned = False
|
||||
|
||||
|
||||
def _picked_model_from_env(default: str) -> str:
|
||||
"""Resolve the operator-picked model id from env; newest name wins.
|
||||
|
||||
Precedence: ``MOLECULE_MODEL`` (canonical, unambiguous) → ``MODEL`` →
|
||||
``MODEL_PROVIDER`` (legacy) → ``default`` (the YAML ``model:`` field).
|
||||
|
||||
``MODEL_PROVIDER`` is **misleadingly named**: it carries the picked
|
||||
*model id*, never the LLM provider — the provider lives in
|
||||
``LLM_PROVIDER`` / the YAML ``provider:`` field. The legacy path stays
|
||||
so canvas Save+Restart, the workspace-server secret-mint path, and
|
||||
persona env files that set it keep working, but if it's the *only* one
|
||||
set we log a deprecation once — the misnomer keeps biting (e.g. setting
|
||||
``MODEL_PROVIDER=claude-code`` expecting it to select the claude-code
|
||||
*runtime* — it doesn't, ``runtime:`` does — after which the claude CLI
|
||||
404s on ``--model claude-code``). Set ``MODEL``/``MOLECULE_MODEL`` to
|
||||
an id from ``runtime_config.models[].id`` (e.g. ``opus``, ``sonnet``,
|
||||
``claude-opus-4-7``, ``MiniMax-M2.7-highspeed``) instead.
|
||||
"""
|
||||
global _legacy_model_provider_warned
|
||||
for name in ("MOLECULE_MODEL", "MODEL"):
|
||||
v = (os.environ.get(name) or "").strip()
|
||||
if v:
|
||||
return v
|
||||
legacy = (os.environ.get("MODEL_PROVIDER") or "").strip()
|
||||
if legacy:
|
||||
if not _legacy_model_provider_warned:
|
||||
logger.warning(
|
||||
"MODEL_PROVIDER=%r is deprecated and misleadingly named — it "
|
||||
"sets the picked *model id*, not the LLM provider (that's "
|
||||
"LLM_PROVIDER / the YAML `provider:` field). Set MODEL (or "
|
||||
"MOLECULE_MODEL) to an id from runtime_config.models instead.",
|
||||
legacy,
|
||||
)
|
||||
_legacy_model_provider_warned = True
|
||||
return legacy
|
||||
return default
|
||||
|
||||
|
||||
_EVENT_LOG_VALID_BACKENDS = {"memory", "disabled"}
|
||||
|
||||
|
||||
def _parse_event_log(raw: object) -> "EventLogConfig":
|
||||
"""Coerce the ``observability.event_log`` YAML block into EventLogConfig.
|
||||
|
||||
Lenient like the rest of this parser: a missing block, a non-dict
|
||||
value, or a bad backend name resolves to defaults rather than
|
||||
raising at boot. The event_log is observability infra — a typo in
|
||||
one field should not crash the workspace before any event can fire.
|
||||
Bounds (ttl_seconds, max_entries) clamp to positives so a 0/-1
|
||||
misconfig doesn't disable the log silently; that's what
|
||||
``backend: disabled`` is for.
|
||||
"""
|
||||
if not isinstance(raw, dict):
|
||||
return EventLogConfig()
|
||||
backend = str(raw.get("backend", "memory")).strip().lower()
|
||||
if backend not in _EVENT_LOG_VALID_BACKENDS:
|
||||
backend = "memory"
|
||||
try:
|
||||
ttl_seconds = int(raw.get("ttl_seconds", 3600))
|
||||
except (TypeError, ValueError):
|
||||
ttl_seconds = 3600
|
||||
if ttl_seconds <= 0:
|
||||
ttl_seconds = 3600
|
||||
try:
|
||||
max_entries = int(raw.get("max_entries", 10_000))
|
||||
except (TypeError, ValueError):
|
||||
max_entries = 10_000
|
||||
if max_entries <= 0:
|
||||
max_entries = 10_000
|
||||
return EventLogConfig(
|
||||
backend=backend, ttl_seconds=ttl_seconds, max_entries=max_entries
|
||||
)
|
||||
|
||||
|
||||
def _clamp_heartbeat(value: object) -> int:
|
||||
"""Coerce raw YAML/env input into the [5, 300]-second heartbeat band.
|
||||
|
||||
Outside that band the workspace either floods the platform with
|
||||
sub-second beats or looks dead long before the next one — both
|
||||
real failure modes seen on incidents, neither benign. Coerce here
|
||||
so adapters and ``heartbeat.py`` can read the value without
|
||||
re-validating.
|
||||
"""
|
||||
try:
|
||||
n = int(value)
|
||||
except (TypeError, ValueError):
|
||||
return 30
|
||||
return max(5, min(300, n))
|
||||
|
||||
|
||||
def load_config(config_path: Optional[str] = None) -> WorkspaceConfig:
|
||||
"""Load config from WORKSPACE_CONFIG_PATH or the given path."""
|
||||
if config_path is None:
|
||||
config_path = os.environ.get("WORKSPACE_CONFIG_PATH", "/configs")
|
||||
|
||||
config_file = Path(config_path) / "config.yaml"
|
||||
if not config_file.exists():
|
||||
raise FileNotFoundError(f"Config file not found: {config_file}")
|
||||
|
||||
with open(config_file) as f:
|
||||
raw = yaml.safe_load(f) or {}
|
||||
|
||||
# Operator-picked model from env (canvas / secret-mint / persona env),
|
||||
# falling back to the YAML `model:` field. See _picked_model_from_env for
|
||||
# the precedence (MOLECULE_MODEL > MODEL > legacy MODEL_PROVIDER).
|
||||
model = _picked_model_from_env(raw.get("model", "anthropic:claude-opus-4-7"))
|
||||
|
||||
# Resolve top-level provider with this priority chain:
|
||||
# 1. ``LLM_PROVIDER`` env var (canvas Save+Restart sets this so the
|
||||
# operator's choice survives a CP-driven restart even though the
|
||||
# regenerated /configs/config.yaml drops most user fields).
|
||||
# 2. Explicit YAML ``provider:`` (an operator pinned it in the file).
|
||||
# 3. Derive from the model slug prefix for backward compat:
|
||||
# ``anthropic:claude-opus-4-7`` → ``anthropic``
|
||||
# ``minimax/abab7-chat-preview`` → ``minimax``
|
||||
# bare model names → ``""`` (signals "use adapter default")
|
||||
# Empty after all three is fine — adapters that don't need an explicit
|
||||
# provider (langgraph, claude-code-default, codex) keep their existing
|
||||
# routing; adapters that do (hermes via derive-provider.sh) prefer this
|
||||
# over slug-parsing the model name.
|
||||
provider = (
|
||||
os.environ.get("LLM_PROVIDER")
|
||||
or raw.get("provider")
|
||||
or _derive_provider_from_model(model)
|
||||
)
|
||||
|
||||
runtime = raw.get("runtime", "langgraph")
|
||||
runtime_raw = raw.get("runtime_config", {})
|
||||
|
||||
a2a_raw = raw.get("a2a", {})
|
||||
delegation_raw = raw.get("delegation", {})
|
||||
sandbox_raw = raw.get("sandbox", {})
|
||||
rbac_raw = raw.get("rbac", {})
|
||||
hitl_raw = raw.get("hitl", {})
|
||||
governance_raw = raw.get("governance", {})
|
||||
# security_scan accepts both shorthand string ("warn") and dict ({"mode": "warn"})
|
||||
_ss_raw = raw.get("security_scan", {})
|
||||
security_scan_raw = _ss_raw if isinstance(_ss_raw, dict) else {"mode": str(_ss_raw)}
|
||||
compliance_raw = raw.get("compliance", {})
|
||||
observability_raw = raw.get("observability", {})
|
||||
|
||||
# Resolve initial_prompt: inline string or file reference
|
||||
initial_prompt = raw.get("initial_prompt", "")
|
||||
initial_prompt_file = raw.get("initial_prompt_file", "")
|
||||
if not initial_prompt and initial_prompt_file:
|
||||
prompt_path = Path(config_path) / initial_prompt_file
|
||||
if prompt_path.exists():
|
||||
initial_prompt = prompt_path.read_text().strip()
|
||||
|
||||
# Resolve idle_prompt: same pattern as initial_prompt
|
||||
idle_prompt = raw.get("idle_prompt", "")
|
||||
idle_prompt_file = raw.get("idle_prompt_file", "")
|
||||
if not idle_prompt and idle_prompt_file:
|
||||
idle_path = Path(config_path) / idle_prompt_file
|
||||
if idle_path.exists():
|
||||
idle_prompt = idle_path.read_text().strip()
|
||||
idle_interval_seconds = int(raw.get("idle_interval_seconds", 600))
|
||||
|
||||
return WorkspaceConfig(
|
||||
name=raw.get("name", "Workspace"),
|
||||
description=raw.get("description", ""),
|
||||
role=raw.get("role", ""),
|
||||
version=raw.get("version", "1.0.0"),
|
||||
tier=int(raw.get("tier", 1)) if str(raw.get("tier", 1)).isdigit() else 1,
|
||||
model=model,
|
||||
provider=provider,
|
||||
runtime=runtime,
|
||||
initial_prompt=initial_prompt,
|
||||
idle_prompt=idle_prompt,
|
||||
idle_interval_seconds=idle_interval_seconds,
|
||||
runtime_config=RuntimeConfig(
|
||||
command=runtime_raw.get("command", ""),
|
||||
args=runtime_raw.get("args", []),
|
||||
required_env=runtime_raw.get("required_env", []),
|
||||
timeout=runtime_raw.get("timeout", 0),
|
||||
# Picked-model precedence (priority order):
|
||||
# 1. operator-picked model from env — MOLECULE_MODEL > MODEL >
|
||||
# (legacy) MODEL_PROVIDER, plumbed via canvas Save+Restart,
|
||||
# workspace-server's secret-mint path, or the universal
|
||||
# MODEL/MODEL_PROVIDER env from applyRuntimeModelEnv. The
|
||||
# operator's canvas selection MUST win over the template's
|
||||
# baked-in default; previously the template's
|
||||
# `runtime_config.model: sonnet` always won and the picked
|
||||
# MiniMax/GLM/etc model was silently dropped (Bug B,
|
||||
# surfaced 2026-05-02 during E2E).
|
||||
# 2. runtime_raw.model — explicit YAML override in the
|
||||
# template's runtime_config.
|
||||
# 3. top-level `model` (already env-resolved above). This is
|
||||
# the SaaS restart case (CP regenerates a minimal
|
||||
# config.yaml on every boot, dropping runtime_config.model).
|
||||
# Centralising here means EVERY adapter gets the override for
|
||||
# free — no per-adapter env-reading code required.
|
||||
model=_picked_model_from_env(runtime_raw.get("model") or model),
|
||||
# Same fallback shape as ``model`` above: an explicit
|
||||
# ``runtime_config.provider`` wins; otherwise inherit the
|
||||
# top-level resolved provider so adapters see a single
|
||||
# consistent choice without each one re-implementing
|
||||
# env/YAML/slug-prefix resolution.
|
||||
provider=runtime_raw.get("provider") or provider,
|
||||
# Per-model entries (canvas Model dropdown source). Pass through
|
||||
# raw dicts so the schema can grow without a parser change. Only
|
||||
# entries that are dicts are kept — a malformed YAML element
|
||||
# (string, list, None) is silently dropped rather than raising,
|
||||
# matching the rest of this parser's lenient defaults.
|
||||
models=[m for m in (runtime_raw.get("models") or []) if isinstance(m, dict)],
|
||||
# Deprecated fields — kept for backward compat
|
||||
auth_token_env=runtime_raw.get("auth_token_env", ""),
|
||||
auth_token_file=runtime_raw.get("auth_token_file", ""),
|
||||
),
|
||||
skills=raw.get("skills", []),
|
||||
plugins=raw.get("plugins", []),
|
||||
tools=raw.get("tools", []),
|
||||
prompt_files=raw.get("prompt_files", []),
|
||||
a2a=A2AConfig(
|
||||
port=a2a_raw.get("port", 8000),
|
||||
streaming=a2a_raw.get("streaming", True),
|
||||
push_notifications=a2a_raw.get("push_notifications", True),
|
||||
),
|
||||
delegation=DelegationConfig(
|
||||
retry_attempts=delegation_raw.get("retry_attempts", 3),
|
||||
retry_delay=delegation_raw.get("retry_delay", 5.0),
|
||||
timeout=delegation_raw.get("timeout", 120.0),
|
||||
escalate=delegation_raw.get("escalate", True),
|
||||
),
|
||||
sandbox=SandboxConfig(
|
||||
backend=sandbox_raw.get("backend", "subprocess"),
|
||||
memory_limit=sandbox_raw.get("memory_limit", "256m"),
|
||||
timeout=sandbox_raw.get("timeout", 30),
|
||||
),
|
||||
rbac=RBACConfig(
|
||||
roles=rbac_raw.get("roles", ["operator"]),
|
||||
allowed_actions=rbac_raw.get("allowed_actions", {}),
|
||||
),
|
||||
hitl=HITLConfig(
|
||||
channels=hitl_raw.get("channels", [{"type": "dashboard"}]),
|
||||
default_timeout=float(hitl_raw.get("default_timeout", 300)),
|
||||
bypass_roles=hitl_raw.get("bypass_roles", []),
|
||||
),
|
||||
governance=GovernanceConfig(
|
||||
enabled=governance_raw.get("enabled", False),
|
||||
toolkit=governance_raw.get("toolkit", "microsoft"),
|
||||
policy_endpoint=governance_raw.get("policy_endpoint", ""),
|
||||
policy_mode=governance_raw.get("policy_mode", "audit"),
|
||||
policy_file=governance_raw.get("policy_file", ""),
|
||||
blocked_patterns=governance_raw.get("blocked_patterns", []),
|
||||
max_tool_calls_per_task=governance_raw.get("max_tool_calls_per_task", 50),
|
||||
),
|
||||
security_scan=SecurityScanConfig(
|
||||
mode=security_scan_raw.get("mode", "warn"),
|
||||
fail_open_if_no_scanner=security_scan_raw.get("fail_open_if_no_scanner", True),
|
||||
),
|
||||
compliance=ComplianceConfig(
|
||||
# Default must match ComplianceConfig.mode's dataclass default
|
||||
# (see class docstring for rationale — 2026-04-24 flip).
|
||||
mode=compliance_raw.get("mode", "owasp_agentic"),
|
||||
prompt_injection=compliance_raw.get("prompt_injection", "detect"),
|
||||
max_tool_calls_per_task=int(compliance_raw.get("max_tool_calls_per_task", 50)),
|
||||
max_task_duration_seconds=int(compliance_raw.get("max_task_duration_seconds", 300)),
|
||||
),
|
||||
observability=ObservabilityConfig(
|
||||
heartbeat_interval_seconds=_clamp_heartbeat(
|
||||
observability_raw.get("heartbeat_interval_seconds", 30)
|
||||
),
|
||||
log_level=str(observability_raw.get("log_level", "INFO")).upper(),
|
||||
event_log=_parse_event_log(observability_raw.get("event_log", {})),
|
||||
),
|
||||
sub_workspaces=raw.get("sub_workspaces", []),
|
||||
effort=str(raw.get("effort", "")),
|
||||
task_budget=int(raw.get("task_budget", 0)),
|
||||
)
|
||||
@@ -1,61 +0,0 @@
|
||||
"""Resolve the configs directory used by the workspace runtime.
|
||||
|
||||
The runtime persists per-workspace state to a single directory:
|
||||
``.auth_token`` (platform_auth), ``.platform_inbound_secret``
|
||||
(platform_inbound_auth), ``.mcp_inbox_cursor`` (inbox). Inside a
|
||||
workspace EC2 container that directory is ``/configs`` — a tmpfs/EBS
|
||||
mount owned by the agent user, populated by the provisioner before
|
||||
runtime boot.
|
||||
|
||||
Outside a container — operators running ``molecule-mcp`` on a laptop
|
||||
for the external-runtime path — ``/configs`` doesn't exist (or, if it
|
||||
does, isn't writable by an unprivileged user). The default would
|
||||
silently fail on the first heartbeat: ``.platform_inbound_secret``
|
||||
write hits ``Read-only file system: '/configs'``, the heartbeat thread
|
||||
logs and dies, the workspace flips offline within a minute. The
|
||||
operator sees no actionable error.
|
||||
|
||||
This module is the single resolution point. Resolution order:
|
||||
|
||||
1. ``CONFIGS_DIR`` env var, if set — explicit operator override.
|
||||
2. ``/configs`` — used iff the path exists AND is writable. This
|
||||
preserves the in-container default for every existing deployment.
|
||||
3. ``$HOME/.molecule-workspace`` — the non-container fallback,
|
||||
created with mode 0700 so per-file 0600 perms aren't undermined
|
||||
by a world-readable parent.
|
||||
|
||||
Not cached: callers (heartbeat thread, MCP tools) hit this at most a
|
||||
few times per second; reading the env var + one ``stat()`` call is
|
||||
cheap, and the existing call sites read ``os.environ`` live so tests
|
||||
that monkeypatch ``CONFIGS_DIR`` between cases keep working.
|
||||
|
||||
Issue: Molecule-AI/molecule-core#2458.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def resolve() -> Path:
|
||||
"""Return the configs directory, creating the home fallback if needed."""
|
||||
explicit = os.environ.get("CONFIGS_DIR", "").strip()
|
||||
if explicit:
|
||||
path = Path(explicit)
|
||||
path.mkdir(parents=True, exist_ok=True)
|
||||
return path
|
||||
|
||||
in_container = Path("/configs")
|
||||
if in_container.exists() and os.access(str(in_container), os.W_OK):
|
||||
return in_container
|
||||
|
||||
home_path = Path.home() / ".molecule-workspace"
|
||||
home_path.mkdir(parents=True, exist_ok=True, mode=0o700)
|
||||
return home_path
|
||||
|
||||
|
||||
def reset_cache() -> None:
|
||||
"""No-op kept for API stability; this module is stateless. Tests
|
||||
that called reset_cache when the cached prototype was in tree
|
||||
keep working without modification."""
|
||||
return
|
||||
@@ -1,137 +0,0 @@
|
||||
"""Memory consolidation loop.
|
||||
|
||||
When an agent is idle (no active tasks for a configurable period),
|
||||
the consolidation loop wakes up and summarizes noisy local memory
|
||||
entries into dense, high-value knowledge facts.
|
||||
|
||||
Similar to human sleep consolidation — raw scratchpad entries get
|
||||
compressed into reusable knowledge.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
|
||||
import httpx
|
||||
|
||||
from platform_auth import auth_headers
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
if os.path.exists("/.dockerenv") or os.environ.get("DOCKER_VERSION"):
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
else:
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://localhost:8080")
|
||||
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
|
||||
if not _WORKSPACE_ID_raw:
|
||||
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
|
||||
WORKSPACE_ID = _WORKSPACE_ID_raw
|
||||
CONSOLIDATION_INTERVAL = float(os.environ.get("CONSOLIDATION_INTERVAL", "300")) # 5 min
|
||||
CONSOLIDATION_THRESHOLD = int(os.environ.get("CONSOLIDATION_THRESHOLD", "10")) # min memories before consolidating
|
||||
|
||||
|
||||
class ConsolidationLoop:
|
||||
"""Background loop that consolidates local memories when idle."""
|
||||
|
||||
def __init__(self, agent=None):
|
||||
self.agent = agent
|
||||
self._running = False
|
||||
|
||||
async def start(self):
|
||||
"""Start the consolidation loop."""
|
||||
self._running = True
|
||||
logger.info("Memory consolidation loop started (interval=%ss, threshold=%d)",
|
||||
CONSOLIDATION_INTERVAL, CONSOLIDATION_THRESHOLD)
|
||||
|
||||
while self._running:
|
||||
await asyncio.sleep(CONSOLIDATION_INTERVAL)
|
||||
|
||||
if not self._running:
|
||||
break
|
||||
|
||||
try:
|
||||
await self._consolidate()
|
||||
except Exception as e:
|
||||
logger.warning("Consolidation error: %s", e)
|
||||
|
||||
async def _consolidate(self):
|
||||
"""Check if consolidation is needed and run it."""
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
# Fetch local memories
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
|
||||
params={"scope": "LOCAL"},
|
||||
headers=auth_headers(),
|
||||
)
|
||||
if resp.status_code != 200:
|
||||
return
|
||||
|
||||
memories = resp.json()
|
||||
if len(memories) < CONSOLIDATION_THRESHOLD:
|
||||
return
|
||||
|
||||
logger.info("Consolidating %d local memories", len(memories))
|
||||
|
||||
# Build a summary of all local memories
|
||||
contents = [m["content"] for m in memories]
|
||||
summary_prompt = (
|
||||
"Summarize the following workspace memories into 3-5 key facts. "
|
||||
"Each fact should be a single, clear sentence capturing the most "
|
||||
"important and reusable knowledge:\n\n"
|
||||
+ "\n".join(f"- {c}" for c in contents)
|
||||
)
|
||||
|
||||
# Use the agent to generate the summary if available
|
||||
summary = ""
|
||||
if self.agent:
|
||||
try:
|
||||
result = await self.agent.ainvoke(
|
||||
{"messages": [("user", summary_prompt)]},
|
||||
config={"configurable": {"thread_id": "consolidation"}},
|
||||
)
|
||||
messages = result.get("messages", [])
|
||||
summary = ""
|
||||
for msg in reversed(messages):
|
||||
content = getattr(msg, "content", "")
|
||||
if isinstance(content, str) and content.strip():
|
||||
msg_type = getattr(msg, "type", "")
|
||||
if msg_type != "human":
|
||||
summary = content
|
||||
break
|
||||
|
||||
if summary:
|
||||
# Store consolidated summary as a TEAM memory — only delete originals if POST succeeds
|
||||
resp = await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
|
||||
json={"content": f"[Consolidated] {summary}", "scope": "TEAM"},
|
||||
headers=auth_headers(),
|
||||
)
|
||||
if resp.status_code in (200, 201):
|
||||
# Safe to delete originals — consolidated version is saved
|
||||
for m in memories:
|
||||
await client.delete(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories/{m['id']}",
|
||||
headers=auth_headers(),
|
||||
)
|
||||
logger.info("Consolidated %d memories into team knowledge", len(memories))
|
||||
else:
|
||||
logger.warning("Consolidation POST failed (status %d) — keeping originals", resp.status_code)
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
"CONSOLIDATION: Agent summarization failed (rate limit? model error?): %s. "
|
||||
"Falling back to simple concatenation.", e
|
||||
)
|
||||
# Fall through to concatenation below
|
||||
|
||||
# Fallback: concatenate without agent summarization
|
||||
if not (self.agent and summary):
|
||||
combined = " | ".join(contents[:20])
|
||||
await client.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/memories",
|
||||
json={"content": f"[Consolidated] {combined}", "scope": "TEAM"},
|
||||
headers=auth_headers(),
|
||||
)
|
||||
logger.info("Consolidated %d memories via concatenation fallback", len(memories))
|
||||
|
||||
def stop(self):
|
||||
self._running = False
|
||||
@@ -1,152 +0,0 @@
|
||||
"""Coordinator pattern for team workspaces.
|
||||
|
||||
When a workspace is expanded into a team, the parent agent becomes a
|
||||
coordinator that routes incoming tasks to the appropriate child workspace
|
||||
based on the task content and children's capabilities.
|
||||
|
||||
The coordinator:
|
||||
1. Fetches its children's Agent Cards (skills, capabilities)
|
||||
2. Analyzes each incoming task to determine which child is best suited
|
||||
3. Delegates to the chosen child via the delegation tool
|
||||
4. Aggregates responses if a task requires multiple children
|
||||
5. Falls back to handling the task itself if no child is appropriate
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
|
||||
import httpx
|
||||
from langchain_core.tools import tool
|
||||
from shared_runtime import build_peer_section
|
||||
from policies.routing import build_team_routing_payload
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
if os.path.exists("/.dockerenv") or os.environ.get("DOCKER_VERSION"):
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
else:
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://localhost:8080")
|
||||
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
|
||||
if not _WORKSPACE_ID_raw:
|
||||
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
|
||||
WORKSPACE_ID = _WORKSPACE_ID_raw
|
||||
|
||||
|
||||
async def get_children() -> list[dict]:
|
||||
"""Fetch this workspace's children from the platform."""
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
resp = await client.get(
|
||||
f"{PLATFORM_URL}/registry/{WORKSPACE_ID}/peers",
|
||||
headers={"X-Workspace-ID": WORKSPACE_ID},
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
peers = resp.json()
|
||||
# Filter to only children (parent_id == our ID)
|
||||
return [p for p in peers if p.get("parent_id") == WORKSPACE_ID]
|
||||
except Exception as e:
|
||||
logger.warning("Failed to fetch children: %s", e)
|
||||
return []
|
||||
|
||||
|
||||
def build_children_description(children: list[dict]) -> str:
|
||||
"""Build a description of children's capabilities for the coordinator prompt."""
|
||||
if not children:
|
||||
return ""
|
||||
|
||||
team_section = build_peer_section(
|
||||
children,
|
||||
heading="## Your Team (sub-workspaces you coordinate)",
|
||||
instruction=(
|
||||
"Use the `delegate_task_async` tool to send tasks to the chosen member. "
|
||||
"Only delegate to members listed above."
|
||||
),
|
||||
)
|
||||
|
||||
return "\n".join(
|
||||
[
|
||||
team_section,
|
||||
"",
|
||||
"### Coordination Rules — MANDATORY",
|
||||
"1. You are a COORDINATOR. Your ONLY job is to delegate and synthesize. NEVER do the work yourself.",
|
||||
"2. For EVERY task, use `delegate_task_async` to send it to the appropriate team member(s). "
|
||||
"Do this BEFORE writing any analysis, code, or research yourself.",
|
||||
"3. If a task spans multiple members, delegate to ALL of them in parallel and aggregate results.",
|
||||
"4. If ALL members are offline/paused, tell the caller which members are unavailable. "
|
||||
"Do NOT attempt the work yourself — you lack the specialist context.",
|
||||
"5. If a delegation FAILS (error, timeout): try another member first. "
|
||||
"Only provide your own brief summary if NO member can respond. Never forward raw errors.",
|
||||
"6. Your response should be a SYNTHESIS of your team's work, not your own analysis.",
|
||||
"7. Always respond in the same language the caller uses.",
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
@tool
|
||||
async def route_task_to_team(
|
||||
task: str,
|
||||
preferred_member_id: str = "",
|
||||
) -> dict:
|
||||
"""Route a task to the most appropriate team member.
|
||||
|
||||
As the team coordinator, analyze the task and delegate to the best-suited
|
||||
child workspace. If preferred_member_id is provided, delegate directly to
|
||||
that member.
|
||||
|
||||
Args:
|
||||
task: The task description to route.
|
||||
preferred_member_id: Optional — directly delegate to this member.
|
||||
"""
|
||||
import time
|
||||
from builtin_tools.delegation import delegate_task_async as delegate
|
||||
|
||||
# RFC #2251 V1.0 reproduction-harness instrumentation. Phase-tagged log
|
||||
# lines correlate with scripts/measure-coordinator-task-bounds.sh's
|
||||
# external timing trace, so an operator running the harness against
|
||||
# staging can answer "what phase was the coordinator in at minute 7?".
|
||||
# `grep rfc2251_phase` on the workspace's container logs is the query.
|
||||
# Strip when V1.0 ships and the phase data lands in the structured
|
||||
# heartbeat payload instead.
|
||||
_phase_t0 = time.monotonic()
|
||||
logger.info(
|
||||
"rfc2251_phase=route_start task_chars=%d preferred_member_id=%s",
|
||||
len(task), preferred_member_id or "none",
|
||||
)
|
||||
|
||||
children = await get_children()
|
||||
logger.info(
|
||||
"rfc2251_phase=children_fetched count=%d elapsed_ms=%d",
|
||||
len(children), int((time.monotonic() - _phase_t0) * 1000),
|
||||
)
|
||||
|
||||
decision = build_team_routing_payload(
|
||||
children,
|
||||
task=task,
|
||||
preferred_member_id=preferred_member_id,
|
||||
)
|
||||
logger.info(
|
||||
"rfc2251_phase=routing_decided action=%s elapsed_ms=%d",
|
||||
decision.get("action", "unknown"), int((time.monotonic() - _phase_t0) * 1000),
|
||||
)
|
||||
|
||||
if decision.get("action") == "delegate_to_preferred_member":
|
||||
# Async delegation — returns immediately with task_id
|
||||
target = decision["preferred_member_id"]
|
||||
logger.info(
|
||||
"rfc2251_phase=delegate_invoked target=%s elapsed_ms=%d",
|
||||
target, int((time.monotonic() - _phase_t0) * 1000),
|
||||
)
|
||||
result = await delegate.ainvoke(
|
||||
{"workspace_id": target, "task": task}
|
||||
)
|
||||
logger.info(
|
||||
"rfc2251_phase=delegate_returned target=%s task_id=%s elapsed_ms=%d",
|
||||
target, result.get("task_id", "n/a"), int((time.monotonic() - _phase_t0) * 1000),
|
||||
)
|
||||
return result
|
||||
|
||||
logger.info(
|
||||
"rfc2251_phase=route_returning_decision_only elapsed_ms=%d",
|
||||
int((time.monotonic() - _phase_t0) * 1000),
|
||||
)
|
||||
return decision
|
||||
@@ -1,174 +0,0 @@
|
||||
#!/bin/sh
|
||||
# Drop privileges to the agent user before exec'ing molecule-runtime.
|
||||
# claude-code refuses --dangerously-skip-permissions when running as
|
||||
# root/sudo for safety. Without this entrypoint, every cron tick fails
|
||||
# with `ProcessError: Command failed with exit code 1` and the agent
|
||||
# logs `--dangerously-skip-permissions cannot be used with root/sudo
|
||||
# privileges for security reasons`.
|
||||
#
|
||||
# Pattern matches the legacy monorepo workspace/entrypoint.sh:
|
||||
# fix volume ownership as root, then re-exec via gosu as agent (uid 1000).
|
||||
|
||||
# --- RFC#523 Layer 2: tenant-workspace forbidden-env guard (task #146) ---
|
||||
# Defense-in-depth. The provisioner (workspace-server) has a fail-closed
|
||||
# abort at provision time (Layer 1, prepareProvisionContext), and the
|
||||
# in-container env-build has a silent strip (forensic #145,
|
||||
# provisioner.buildContainerEnv). This guard fires if either upstream
|
||||
# layer is bypassed — e.g. someone runs this image standalone with
|
||||
# `docker run -e GITEA_TOKEN=...`. Exit 1 with a clear message instead
|
||||
# of running with an operator-scope credential in tenant scope.
|
||||
#
|
||||
# Key names are generic. The MOLECULE_OPERATOR_ prefix is the one
|
||||
# molecule-AI-specific literal; this entrypoint lives inside the
|
||||
# claude-code template that is internal-only (memory
|
||||
# `feedback_open_source_templates_no_hardcoded_org_internals` — claude-
|
||||
# code template is internal, separate-published templates must NOT carry
|
||||
# org-specific literals). A fork can edit FORBIDDEN_KEYS /
|
||||
# FORBIDDEN_PREFIXES for its own operator-scope names without touching
|
||||
# the rest of the entrypoint.
|
||||
#
|
||||
# Skipped when MOLECULE_TENANT_GUARD_DISABLE=1 — for local-dev where the
|
||||
# operator host IS the tenant host (e.g. running molecule-runtime on the
|
||||
# operator box for debugging). NEVER set this in tenant containers.
|
||||
if [ "${MOLECULE_TENANT_GUARD_DISABLE:-0}" != "1" ]; then
|
||||
FORBIDDEN_KEYS="GITEA_TOKEN GITEA_PAT GITHUB_TOKEN GITHUB_PAT GH_TOKEN GITLAB_TOKEN GL_TOKEN BITBUCKET_TOKEN CP_ADMIN_API_TOKEN CP_ADMIN_TOKEN INFISICAL_OPERATOR_TOKEN INFISICAL_BOOTSTRAP_TOKEN RAILWAY_TOKEN RAILWAY_PERSONAL_API_TOKEN HETZNER_TOKEN HETZNER_API_TOKEN"
|
||||
FORBIDDEN_PREFIXES="MOLECULE_OPERATOR_"
|
||||
FOUND=""
|
||||
for k in $FORBIDDEN_KEYS; do
|
||||
# eval is safe here — $k is from a static whitespace-separated
|
||||
# literal list above (no user input). POSIX sh has no
|
||||
# associative arrays, hence the indirect-expansion via eval to
|
||||
# test "is this var set" without caring about its value.
|
||||
eval "v=\${$k+set}"
|
||||
if [ "$v" = "set" ]; then
|
||||
FOUND="$FOUND $k"
|
||||
fi
|
||||
done
|
||||
for prefix in $FORBIDDEN_PREFIXES; do
|
||||
# env | awk is the portable POSIX way to enumerate by prefix.
|
||||
# busybox awk (alpine), gawk (debian), and BSD awk (macOS-test)
|
||||
# all support index(). Doesn't depend on bash arrays / [[ =~ ]].
|
||||
prefix_hits=$(env | awk -F= -v p="$prefix" 'index($1, p)==1 {print $1}')
|
||||
if [ -n "$prefix_hits" ]; then
|
||||
FOUND="$FOUND $prefix_hits"
|
||||
fi
|
||||
done
|
||||
if [ -n "$FOUND" ]; then
|
||||
echo "RFC#523 Layer 2: refusing to start tenant workspace — forbidden operator-scope env var(s) present:$FOUND" >&2
|
||||
echo "These vars are operator-fleet scope and must not reach tenant workspaces." >&2
|
||||
echo "Remove them from workspace_secrets / global_secrets / docker -e and retry." >&2
|
||||
echo "If running this image standalone for local dev with intentional operator scope, set MOLECULE_TENANT_GUARD_DISABLE=1." >&2
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ "$(id -u)" = "0" ]; then
|
||||
# Configs volume is created by Docker as root; agent needs write access
|
||||
# for plugin installs, memory writes, .auth_token rotation, etc.
|
||||
chown -R agent:agent /configs 2>/dev/null
|
||||
# Strip CRLF from hook scripts — Windows Docker Desktop copies host files
|
||||
# with CRLF line endings even when .gitattributes says eol=lf. The \r in
|
||||
# the shebang line makes python3 try to open 'script.py\r' → ENOENT →
|
||||
# claude-code swallows the hook error → "(no response generated)".
|
||||
# This is the permanent fix — runs at every container start.
|
||||
for f in /configs/.claude/hooks/*.sh /configs/.claude/hooks/*.py; do
|
||||
[ -f "$f" ] && sed -i 's/\r$//' "$f"
|
||||
done
|
||||
# /workspace handling — only chown when the contents are root-owned
|
||||
# (typical on Docker Desktop on Windows where host uid maps to 0).
|
||||
# On Linux Docker with matching uids the recursive chown is skipped
|
||||
# to keep startup fast.
|
||||
chown agent:agent /workspace 2>/dev/null || true
|
||||
if [ -d /workspace ]; then
|
||||
first_entry=$(find /workspace -mindepth 1 -maxdepth 1 -print -quit 2>/dev/null)
|
||||
if [ -n "$first_entry" ] && [ "$(stat -c '%u' "$first_entry" 2>/dev/null)" = "0" ]; then
|
||||
chown -R agent:agent /workspace 2>/dev/null
|
||||
fi
|
||||
fi
|
||||
# Claude Code session directory — mounted at /root/.claude/sessions by
|
||||
# the platform provisioner. Symlink it into agent's home so the SDK
|
||||
# finds it when running as agent. The provisioner's mount point is
|
||||
# hardcoded to /root/.claude/sessions; we don't want to change the
|
||||
# platform contract just for this template.
|
||||
mkdir -p /home/agent/.claude
|
||||
if [ -d /root/.claude/sessions ]; then
|
||||
chown -R agent:agent /root/.claude /home/agent/.claude 2>/dev/null
|
||||
ln -sfn /root/.claude/sessions /home/agent/.claude/sessions
|
||||
fi
|
||||
|
||||
# --- Per-persona git identity (closes molecule-core#155) ---
|
||||
# Without this, every team commit lands with an empty author and Gitea
|
||||
# attributes the work to the founder PAT instead of the persona that
|
||||
# actually authored it. Same fingerprint that got us suspended on GitHub
|
||||
# 2026-05-06. GITEA_USER is injected by the provisioner from the
|
||||
# workspace_secrets table; bot.moleculesai.app is the agent-only domain
|
||||
# so commits are clearly distinguishable from human authors.
|
||||
if [ -n "${GITEA_USER:-}" ]; then
|
||||
git config --global user.name "${GITEA_USER}"
|
||||
git config --global user.email "${GITEA_USER}@bot.moleculesai.app"
|
||||
fi
|
||||
|
||||
# --- GitHub credential helper setup (issue #547 / #613) ---
|
||||
# Configure git to use the molecule credential helper for github.com.
|
||||
# This runs as root so the global gitconfig is written before we drop
|
||||
# to agent. The helper fetches fresh GitHub App installation tokens
|
||||
# from the platform API, with caching and env-var fallback.
|
||||
#
|
||||
# NOTE: post-suspension (2026-05-06), github.com/Molecule-AI is gone;
|
||||
# the helper's platform endpoint also 500s (internal#187). The helper
|
||||
# block is kept for legacy boxes that still have a working token chain;
|
||||
# post-suspension provisioner injects GITEA_TOKEN directly so this
|
||||
# path's failure is non-fatal. Full removal tracked under #171.
|
||||
if [ -x /app/scripts/molecule-git-token-helper.sh ]; then
|
||||
# Set credential helper for github.com only (not all hosts).
|
||||
# The '!' prefix tells git to run the command as a shell command.
|
||||
git config --global "credential.https://github.com.helper" \
|
||||
"!/app/scripts/molecule-git-token-helper.sh"
|
||||
# Disable other credential helpers for github.com to avoid conflicts.
|
||||
git config --global "credential.https://github.com.useHttpPath" true
|
||||
fi
|
||||
# Move gitconfig to agent's home so it takes effect after gosu —
|
||||
# done unconditionally so the per-persona identity survives the drop
|
||||
# even when the github.com helper block is skipped.
|
||||
if [ -f /root/.gitconfig ]; then
|
||||
cp /root/.gitconfig /home/agent/.gitconfig
|
||||
chown agent:agent /home/agent/.gitconfig
|
||||
fi
|
||||
# Create the token cache directory for the agent user.
|
||||
mkdir -p /home/agent/.molecule-token-cache
|
||||
chown agent:agent /home/agent/.molecule-token-cache
|
||||
chmod 700 /home/agent/.molecule-token-cache
|
||||
|
||||
exec gosu agent "$0" "$@"
|
||||
fi
|
||||
|
||||
# Now running as agent (uid 1000)
|
||||
|
||||
# --- Start background token refresh daemon (with respawn supervision) ---
|
||||
# Keeps gh CLI and git credentials fresh across the 60-min token TTL.
|
||||
# Wrapped in a respawn loop so a daemon crash doesn't silently leave the
|
||||
# workspace stuck on an expired token. Runs in the background; entrypoint
|
||||
# continues to exec molecule-runtime.
|
||||
if [ -x /app/scripts/molecule-gh-token-refresh.sh ]; then
|
||||
nohup bash -c '
|
||||
while true; do
|
||||
/app/scripts/molecule-gh-token-refresh.sh
|
||||
rc=$?
|
||||
echo "[molecule-gh-token-refresh] daemon exited rc=$rc — respawning in 30s" >&2
|
||||
sleep 30
|
||||
done
|
||||
' > /home/agent/.gh-token-refresh.log 2>&1 &
|
||||
fi
|
||||
|
||||
# --- Initial gh auth setup ---
|
||||
# If GITHUB_TOKEN or GH_TOKEN is set (injected at provision time),
|
||||
# authenticate gh CLI with it so it works immediately (before the first
|
||||
# background refresh fires). The background daemon will replace this
|
||||
# with a fresh token within ~60s of boot.
|
||||
if [ -n "${GITHUB_TOKEN:-}" ]; then
|
||||
echo "${GITHUB_TOKEN}" | gh auth login --hostname github.com --with-token 2>/dev/null || true
|
||||
elif [ -n "${GH_TOKEN:-}" ]; then
|
||||
echo "${GH_TOKEN}" | gh auth login --hostname github.com --with-token 2>/dev/null || true
|
||||
fi
|
||||
|
||||
exec molecule-runtime "$@"
|
||||
@@ -1,249 +0,0 @@
|
||||
"""Workspace event log — append-and-query buffer for runtime events.
|
||||
|
||||
Hermes-style declarative observability primitive. Adapter and platform
|
||||
code emit semantic events (turn started, tool invoked, peer message
|
||||
delivered) and external readers — the canvas Activity tab, A2A peers,
|
||||
and the platform's `/workspaces/:id/activity` endpoint — query them
|
||||
with a cursor.
|
||||
|
||||
Today's PR ships the in-memory backend only. Redis backend lands in
|
||||
the follow-up that wires platform-side fan-out (#119 PR-3 follow-up).
|
||||
The Protocol shape lets a future backend swap in without touching the
|
||||
emitting sites.
|
||||
|
||||
Eviction is the load-bearing invariant: the workspace runtime is
|
||||
long-lived, so an unbounded list would leak memory. Every append
|
||||
prunes by both TTL and max_entries; readers that fall behind past
|
||||
the eviction frontier see a contiguous tail without an error — the
|
||||
cursor protocol only guarantees "events with id > since that are
|
||||
still resident", not "every event ever appended". A reader that
|
||||
needs at-least-once delivery must poll faster than the eviction TTL.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import threading
|
||||
import time
|
||||
from collections import deque
|
||||
from dataclasses import asdict, dataclass, field
|
||||
from typing import Any, Deque, Iterable, Optional, Protocol
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class Event:
|
||||
"""One immutable entry in the event log.
|
||||
|
||||
``id`` is a monotonic integer assigned at append time. It SURVIVES
|
||||
eviction — the counter is never reset when an old event drops out
|
||||
of the buffer, so a reader's cursor stays valid even if the event
|
||||
it points to has aged out (the next query just returns the resident
|
||||
tail). This is the contract that lets a slow reader reconnect
|
||||
without resetting to id=0.
|
||||
"""
|
||||
|
||||
id: int
|
||||
timestamp: float
|
||||
"""Seconds since the Unix epoch — the same shape as ``time.time()``
|
||||
so callers can format with ``datetime.fromtimestamp`` without an
|
||||
extra conversion. Float, not int, because event-bursts within the
|
||||
same second need stable ordering for downstream merging."""
|
||||
|
||||
kind: str
|
||||
"""Short tag categorising the event: ``turn.started``, ``tool.invoked``,
|
||||
``peer.message.delivered``, etc. Convention is dotted snake_case so
|
||||
the canvas can group by prefix without a parser."""
|
||||
|
||||
payload: dict = field(default_factory=dict)
|
||||
"""Arbitrary JSON-serialisable dict. Keep small — the in-memory
|
||||
backend holds every event in process RAM. Large blobs (file
|
||||
contents, full transcripts) belong in the platform's blob store
|
||||
with a reference here, not the value itself."""
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
"""Plain-dict shape for JSON serialisation in the API layer.
|
||||
|
||||
Wrapping ``dataclasses.asdict`` rather than relying on the
|
||||
consumer to call it themselves means the wire format stays
|
||||
owned by this module — a rename of ``kind`` to ``type`` (or
|
||||
whatever the canvas eventually settles on) flips here, not in
|
||||
every reader.
|
||||
"""
|
||||
return asdict(self)
|
||||
|
||||
|
||||
class EventLogBackend(Protocol):
|
||||
"""Backend Protocol — the swap point for memory ↔ redis ↔ disabled.
|
||||
|
||||
Implementations must be safe to call from multiple threads. The
|
||||
workspace runtime appends from the heartbeat thread, the agent's
|
||||
main loop, and any A2A executor concurrently; readers run on the
|
||||
HTTP server thread. A backend that needs locking owns it.
|
||||
"""
|
||||
|
||||
def append(self, kind: str, payload: Optional[dict] = None) -> Event:
|
||||
"""Add an event and return the persisted record (with id assigned)."""
|
||||
...
|
||||
|
||||
def query(self, since: Optional[int] = None, limit: Optional[int] = None) -> list[Event]:
|
||||
"""Return events with ``id > since`` (or all resident if ``since`` is None).
|
||||
|
||||
Order is ascending by id. ``limit`` caps the returned slice;
|
||||
if the resident tail is shorter than ``limit``, returns what
|
||||
is available.
|
||||
"""
|
||||
...
|
||||
|
||||
def clear(self) -> None:
|
||||
"""Drop all entries. Provided for test isolation, not for production callers."""
|
||||
...
|
||||
|
||||
|
||||
class InMemoryEventLog:
|
||||
"""Bounded in-memory ring buffer with TTL eviction.
|
||||
|
||||
Two eviction triggers, both checked on every ``append`` (and on
|
||||
``query`` for read-side freshness when older entries have aged
|
||||
past the TTL but no append has happened to evict them):
|
||||
|
||||
- **TTL:** entries older than ``ttl_seconds`` are dropped.
|
||||
- **max_entries:** when the deque exceeds ``max_entries``, oldest
|
||||
drop until back at the cap.
|
||||
|
||||
Both bounds are advisory at construction — non-positive values
|
||||
fall back to permissive defaults rather than disabling the log,
|
||||
because a misconfigured value should not silently lose events.
|
||||
To disable the log, use ``DisabledEventLog`` instead.
|
||||
|
||||
The id counter is monotonic across the entire process lifetime;
|
||||
eviction does not reset it. A query with ``since=last_seen_id``
|
||||
returns the resident tail past that cursor, which may be empty if
|
||||
the reader is too far behind.
|
||||
"""
|
||||
|
||||
_DEFAULT_TTL_SECONDS = 3600 # 1 hour — covers a long agentic loop without leaking
|
||||
_DEFAULT_MAX_ENTRIES = 10_000 # ~1 MB at 100 bytes/event, safely under workspace RAM budget
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
ttl_seconds: int = _DEFAULT_TTL_SECONDS,
|
||||
max_entries: int = _DEFAULT_MAX_ENTRIES,
|
||||
now: Optional[Any] = None,
|
||||
) -> None:
|
||||
self._ttl_seconds: int = ttl_seconds if ttl_seconds > 0 else self._DEFAULT_TTL_SECONDS
|
||||
self._max_entries: int = max_entries if max_entries > 0 else self._DEFAULT_MAX_ENTRIES
|
||||
# Injected clock for deterministic TTL tests. Production passes
|
||||
# ``time.time``; tests pass a callable that returns a controlled value.
|
||||
self._now = now if callable(now) else time.time
|
||||
self._lock = threading.Lock()
|
||||
self._next_id: int = 1
|
||||
self._buf: Deque[Event] = deque()
|
||||
|
||||
def append(self, kind: str, payload: Optional[dict] = None) -> Event:
|
||||
with self._lock:
|
||||
event = Event(
|
||||
id=self._next_id,
|
||||
timestamp=self._now(),
|
||||
kind=kind,
|
||||
payload=dict(payload) if payload else {},
|
||||
)
|
||||
self._next_id += 1
|
||||
self._buf.append(event)
|
||||
self._evict_locked()
|
||||
return event
|
||||
|
||||
def query(self, since: Optional[int] = None, limit: Optional[int] = None) -> list[Event]:
|
||||
with self._lock:
|
||||
# Read-side TTL sweep — covers the case where appends pause
|
||||
# but a reader keeps polling. Without this, a stale tail
|
||||
# would survive forever once writes stop.
|
||||
self._evict_locked()
|
||||
cutoff = since if since is not None else 0
|
||||
tail: Iterable[Event] = (e for e in self._buf if e.id > cutoff)
|
||||
if limit is not None and limit >= 0:
|
||||
if limit == 0:
|
||||
# Explicit empty-slice probe — used by pagination
|
||||
# UIs to ask "are there any new events?" without
|
||||
# paying for the data. Distinct from limit=None
|
||||
# (no cap) — return empty rather than the first event.
|
||||
return []
|
||||
out: list[Event] = []
|
||||
for e in tail:
|
||||
out.append(e)
|
||||
if len(out) >= limit:
|
||||
break
|
||||
return out
|
||||
return list(tail)
|
||||
|
||||
def clear(self) -> None:
|
||||
with self._lock:
|
||||
self._buf.clear()
|
||||
# NOTE: do NOT reset _next_id — the cursor contract is that
|
||||
# ids are monotonic across the lifetime of the process, even
|
||||
# across explicit clears (which only happen in tests).
|
||||
|
||||
def _evict_locked(self) -> None:
|
||||
"""Caller MUST hold self._lock."""
|
||||
if not self._buf:
|
||||
return
|
||||
cutoff = self._now() - self._ttl_seconds
|
||||
while self._buf and self._buf[0].timestamp < cutoff:
|
||||
self._buf.popleft()
|
||||
# max_entries bound after TTL — a long buffer that fits the
|
||||
# window can still be capped if the burst rate exceeded design.
|
||||
while len(self._buf) > self._max_entries:
|
||||
self._buf.popleft()
|
||||
|
||||
|
||||
class DisabledEventLog:
|
||||
"""No-op backend for ``backend: disabled``.
|
||||
|
||||
Append returns a synthetic event so callers that want the id
|
||||
don't crash; query always returns empty. The synthetic event is
|
||||
NOT cached anywhere — the contract for ``backend: disabled`` is
|
||||
that no state is retained. Operators who pick this backend opt
|
||||
out of the canvas Activity tab and the `/activity` endpoint.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._next_id: int = 1
|
||||
self._lock = threading.Lock()
|
||||
|
||||
def append(self, kind: str, payload: Optional[dict] = None) -> Event:
|
||||
# Single-shot id increment — keeps the returned event ids
|
||||
# monotonic for callers that compare them, even though we
|
||||
# never persist anything.
|
||||
with self._lock:
|
||||
event = Event(
|
||||
id=self._next_id,
|
||||
timestamp=time.time(),
|
||||
kind=kind,
|
||||
payload=dict(payload) if payload else {},
|
||||
)
|
||||
self._next_id += 1
|
||||
return event
|
||||
|
||||
def query(self, since: Optional[int] = None, limit: Optional[int] = None) -> list[Event]:
|
||||
return []
|
||||
|
||||
def clear(self) -> None:
|
||||
return None
|
||||
|
||||
|
||||
def create_event_log(
|
||||
backend: str = "memory",
|
||||
ttl_seconds: int = InMemoryEventLog._DEFAULT_TTL_SECONDS,
|
||||
max_entries: int = InMemoryEventLog._DEFAULT_MAX_ENTRIES,
|
||||
) -> EventLogBackend:
|
||||
"""Factory — pick a backend by name from EventLogConfig.
|
||||
|
||||
Unknown backend strings fall back to ``memory`` rather than
|
||||
raising at boot. A typo'd config value should degrade to the
|
||||
safe default, not crash the workspace before any event can be
|
||||
recorded. The redis backend lands in a follow-up; until then
|
||||
``backend: redis`` also resolves to in-memory.
|
||||
"""
|
||||
name = (backend or "memory").strip().lower()
|
||||
if name in ("disabled", "off", "none"):
|
||||
return DisabledEventLog()
|
||||
# memory is the default; redis falls through here until it's wired.
|
||||
return InMemoryEventLog(ttl_seconds=ttl_seconds, max_entries=max_entries)
|
||||
@@ -1,96 +0,0 @@
|
||||
"""WebSocket subscriber for platform events.
|
||||
|
||||
Subscribes to the platform WebSocket with X-Workspace-ID header
|
||||
so the workspace only receives events about reachable peers.
|
||||
Triggers system prompt rebuild on relevant peer changes.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
|
||||
import httpx
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Events that should trigger a system prompt rebuild
|
||||
REBUILD_EVENTS = {
|
||||
"WORKSPACE_ONLINE",
|
||||
"WORKSPACE_OFFLINE",
|
||||
"WORKSPACE_EXPANDED",
|
||||
"WORKSPACE_COLLAPSED",
|
||||
"WORKSPACE_REMOVED",
|
||||
"AGENT_CARD_UPDATED",
|
||||
}
|
||||
|
||||
|
||||
class PlatformEventSubscriber:
|
||||
"""Subscribes to platform WebSocket for peer events."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
on_peer_change=None,
|
||||
):
|
||||
self.ws_url = platform_url.replace("http://", "ws://").replace("https://", "wss://") + "/ws"
|
||||
self.workspace_id = workspace_id
|
||||
self.on_peer_change = on_peer_change
|
||||
self._running = False
|
||||
self._reconnect_delay = 1.0
|
||||
|
||||
async def start(self):
|
||||
"""Connect to platform WebSocket with exponential backoff reconnect."""
|
||||
self._running = True
|
||||
|
||||
while self._running:
|
||||
try:
|
||||
await self._connect()
|
||||
except Exception as e:
|
||||
if not self._running:
|
||||
break
|
||||
logger.warning("WebSocket disconnected: %s. Reconnecting in %.0fs...", e, self._reconnect_delay)
|
||||
await asyncio.sleep(self._reconnect_delay)
|
||||
self._reconnect_delay = min(self._reconnect_delay * 2, 30.0)
|
||||
|
||||
async def _connect(self):
|
||||
"""Establish WebSocket connection and process events."""
|
||||
try:
|
||||
import websockets
|
||||
except ImportError:
|
||||
logger.warning("websockets package not installed, skipping event subscription")
|
||||
self._running = False
|
||||
return
|
||||
|
||||
# Fix D (Cycle 5): include bearer token in WebSocket upgrade so the
|
||||
# server's new auth check can validate this agent connection.
|
||||
# Graceful fallback for workspaces that have no token yet.
|
||||
headers = {"X-Workspace-ID": self.workspace_id}
|
||||
try:
|
||||
from platform_auth import auth_headers as _auth_headers
|
||||
headers.update(_auth_headers())
|
||||
except Exception:
|
||||
pass # No token available — connect unauthenticated (grandfathered)
|
||||
logger.info("Connecting to platform WebSocket: %s", self.ws_url)
|
||||
|
||||
async with websockets.connect(self.ws_url, additional_headers=headers) as ws:
|
||||
self._reconnect_delay = 1.0 # Reset on successful connect
|
||||
logger.info("Platform WebSocket connected")
|
||||
|
||||
async for message in ws:
|
||||
try:
|
||||
event = json.loads(message)
|
||||
event_type = event.get("event", "")
|
||||
|
||||
if event_type in REBUILD_EVENTS:
|
||||
logger.info("Peer event: %s for workspace %s",
|
||||
event_type, event.get("workspace_id", ""))
|
||||
if self.on_peer_change:
|
||||
await self.on_peer_change(event)
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
except Exception as e:
|
||||
logger.warning("Error processing event: %s", e)
|
||||
|
||||
def stop(self):
|
||||
self._running = False
|
||||
File diff suppressed because it is too large
Load Diff
@@ -1,706 +0,0 @@
|
||||
"""Heartbeat loop — alive signal + delegation status checker.
|
||||
|
||||
Every 30 seconds:
|
||||
1. Send heartbeat to platform (alive signal with current_task, error_rate)
|
||||
2. Check pending delegations — any results back?
|
||||
3. Store completed delegation results for the agent to pick up
|
||||
|
||||
Resilient: recreates HTTP client on failure, auto-restarts on crash.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
import httpx
|
||||
|
||||
from platform_auth import auth_headers, refresh_cache, self_source_headers
|
||||
|
||||
|
||||
def _runtime_state_payload() -> dict:
|
||||
"""Build the {runtime_state, sample_error} portion of the heartbeat
|
||||
body when SOME adapter executor has marked itself wedged. Returns
|
||||
an empty dict when the runtime is healthy so the heartbeat payload
|
||||
doesn't grow fields the platform doesn't need.
|
||||
|
||||
Source of truth is runtime_wedge (lives in molecule-runtime,
|
||||
independent of any specific adapter). Pre task #87 this imported
|
||||
from claude_sdk_executor — that worked because the executor was
|
||||
bundled into molecule-runtime, but blocked moving it to the
|
||||
claude-code template repo. The runtime_wedge module is now the
|
||||
cross-cutting wedge-state holder; adapters mark/clear via it,
|
||||
heartbeat reads it.
|
||||
|
||||
Imported lazily so a workspace whose runtime image somehow ships
|
||||
without runtime_wedge (corrupt install, mid-rolling-deploy state)
|
||||
keeps heartbeating — a missing import means "no wedge info; assume
|
||||
healthy."
|
||||
"""
|
||||
try:
|
||||
from runtime_wedge import is_wedged, wedge_reason
|
||||
except Exception:
|
||||
return {}
|
||||
if not is_wedged():
|
||||
return {}
|
||||
return {
|
||||
"runtime_state": "wedged",
|
||||
# sample_error doubles as the human-readable banner text on the
|
||||
# canvas's degraded card — keep it short and actionable.
|
||||
"sample_error": wedge_reason(),
|
||||
}
|
||||
|
||||
|
||||
def _runtime_metadata_payload() -> dict:
|
||||
"""Build the {runtime_metadata} portion of the heartbeat body —
|
||||
adapter-declared capabilities + per-capability override values
|
||||
(idle timeout, etc.). The platform reads this to route capabilities
|
||||
to the right owner: native (adapter) vs fallback (platform).
|
||||
|
||||
Returns an empty dict if the adapter can't be loaded or introspected.
|
||||
Heartbeat must NEVER fail because of capability discovery — observability
|
||||
is more important than capability accuracy. The platform falls through
|
||||
to its own defaults when fields are missing.
|
||||
|
||||
See project memory `project_runtime_native_pluggable.md` and
|
||||
workspace/adapter_base.py:RuntimeCapabilities.
|
||||
"""
|
||||
try:
|
||||
from adapters import get_adapter
|
||||
# ADAPTER_MODULE wins over the runtime arg in get_adapter — pass
|
||||
# an empty string to force the env-var path.
|
||||
adapter_cls = get_adapter("")
|
||||
adapter = adapter_cls()
|
||||
caps = adapter.capabilities()
|
||||
meta: dict = {"capabilities": caps.to_dict()}
|
||||
idle = adapter.idle_timeout_override()
|
||||
# Only include the override when it's a positive integer. None /
|
||||
# zero / negative falls through to the platform's global default
|
||||
# (env A2A_IDLE_TIMEOUT_SECONDS, default 5min) — that "absent
|
||||
# field = use default" contract is what keeps the wire small.
|
||||
if isinstance(idle, int) and idle > 0:
|
||||
meta["idle_timeout_seconds"] = idle
|
||||
return {"runtime_metadata": meta}
|
||||
except Exception as e:
|
||||
# debug-level: missing ADAPTER_MODULE in dev / test envs is normal
|
||||
logger.debug("runtime_metadata: failed to read adapter caps: %s", e)
|
||||
return {}
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _persist_inbound_secret_from_heartbeat(resp) -> None:
|
||||
"""Persist ``platform_inbound_secret`` from a heartbeat response, if any.
|
||||
|
||||
The platform's heartbeat handler (workspace-server PR #2421) returns
|
||||
the secret on every beat — mirrors /registry/register so a workspace
|
||||
whose secret was lazy-healed on the platform side picks it up within
|
||||
one heartbeat tick instead of requiring a runtime restart.
|
||||
|
||||
Without this delivery path the chat-upload code path's "secret was
|
||||
just minted, will pick up on next heartbeat" 503 message is a lie
|
||||
and the workspace stays 401-forever until the operator restarts the
|
||||
runtime. Caught 2026-04-30 on the hongmingwang tenant — the
|
||||
standalone wrapper (mcp_cli.py) got the same change in #2421 but
|
||||
the in-container heartbeat (this file) was missed in the first
|
||||
pass.
|
||||
|
||||
Failure is non-fatal: if the body isn't JSON, doesn't carry the
|
||||
field, or the disk write fails, the next heartbeat retries. This
|
||||
matches the cold-start register flow in main.py:319-323.
|
||||
"""
|
||||
try:
|
||||
body = resp.json()
|
||||
except Exception:
|
||||
return
|
||||
if not isinstance(body, dict):
|
||||
return
|
||||
secret = body.get("platform_inbound_secret")
|
||||
if not secret:
|
||||
return
|
||||
try:
|
||||
from platform_inbound_auth import save_inbound_secret
|
||||
|
||||
save_inbound_secret(secret)
|
||||
except Exception as exc:
|
||||
logger.warning(
|
||||
"heartbeat: persist inbound secret failed: %s", exc
|
||||
)
|
||||
|
||||
|
||||
HEARTBEAT_INTERVAL = 30 # seconds — fallback default when no per-instance value is passed
|
||||
MAX_CONSECUTIVE_FAILURES = 10
|
||||
MAX_SEEN_DELEGATION_IDS = 200
|
||||
SELF_MESSAGE_COOLDOWN = 60 # seconds — minimum between self-messages to prevent loops
|
||||
# Shared path — adapter executors (in their template repos) read this
|
||||
# same file via executor_helpers.read_delegation_results so heartbeat-
|
||||
# delivered async delegation results land in the next agent turn.
|
||||
DELEGATION_RESULTS_FILE = os.environ.get("DELEGATION_RESULTS_FILE", "/tmp/delegation_results.jsonl")
|
||||
# Cursor file for tracking activity_log IDs processed from the a2a_receive path
|
||||
# (delegations fired via tool_delegate_task → POST /workspaces/:id/a2a proxy, not
|
||||
# POST /workspaces/:id/delegate). Persisted to disk so heartbeat restarts
|
||||
# don't re-process the same rows.
|
||||
_ACTIVITY_DELEGATION_CURSOR_FILE = os.environ.get(
|
||||
"DELEGATION_ACTIVITY_CURSOR_FILE",
|
||||
"/tmp/delegation_activity_cursor",
|
||||
)
|
||||
|
||||
|
||||
class HeartbeatLoop:
|
||||
def __init__(
|
||||
self,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
interval_seconds: int = HEARTBEAT_INTERVAL,
|
||||
):
|
||||
self.platform_url = platform_url
|
||||
self.workspace_id = workspace_id
|
||||
# Per-instance interval — main.py threads ObservabilityConfig.
|
||||
# heartbeat_interval_seconds (clamped to [5, 300] at parse time)
|
||||
# in here so operators can tune cadence per-workspace via the
|
||||
# `observability:` block in config.yaml. Defaults to the
|
||||
# legacy module constant so callers that haven't been updated
|
||||
# yet (and tests that construct HeartbeatLoop directly with the
|
||||
# 2-arg signature) keep their existing 30s behavior.
|
||||
self._interval_seconds = interval_seconds
|
||||
self.start_time = time.time()
|
||||
self.error_count = 0
|
||||
self.request_count = 0
|
||||
self.active_tasks = 0
|
||||
self.current_task = ""
|
||||
self.sample_error = ""
|
||||
self._task = None
|
||||
self._consecutive_failures = 0
|
||||
self._seen_delegation_ids: set[str] = set()
|
||||
self._last_self_message_time = 0.0
|
||||
self._parent_name: str | None = None # Cached after first lookup
|
||||
# Seen activity IDs for a2a_receive polling (delegations via POST /a2a proxy path).
|
||||
# Loaded lazily from cursor file on first poll to avoid blocking startup.
|
||||
self._seen_activity_ids: set[str] = set()
|
||||
self._activity_cursor_loaded = False
|
||||
|
||||
@property
|
||||
def error_rate(self) -> float:
|
||||
if self.request_count == 0:
|
||||
return 0.0
|
||||
return self.error_count / self.request_count
|
||||
|
||||
def record_error(self, error: str):
|
||||
self.error_count += 1
|
||||
self.request_count += 1
|
||||
self.sample_error = error
|
||||
|
||||
def record_success(self):
|
||||
self.request_count += 1
|
||||
|
||||
def start(self):
|
||||
self._task = asyncio.create_task(self._loop())
|
||||
self._task.add_done_callback(self._on_done)
|
||||
|
||||
def _on_done(self, task):
|
||||
if not task.cancelled() and task.exception():
|
||||
logger.error("Heartbeat loop died: %s — restarting", task.exception())
|
||||
self._task = asyncio.create_task(self._loop())
|
||||
self._task.add_done_callback(self._on_done)
|
||||
|
||||
async def stop(self):
|
||||
if self._task:
|
||||
self._task.cancel()
|
||||
try:
|
||||
await self._task
|
||||
except asyncio.CancelledError:
|
||||
pass
|
||||
|
||||
async def _loop(self):
|
||||
while True:
|
||||
client = None
|
||||
try:
|
||||
client = httpx.AsyncClient(timeout=10.0)
|
||||
while True:
|
||||
# 1. Send heartbeat (Phase 30.1: include auth header if token known)
|
||||
try:
|
||||
body = {
|
||||
"workspace_id": self.workspace_id,
|
||||
"error_rate": self.error_rate,
|
||||
"sample_error": self.sample_error,
|
||||
"active_tasks": self.active_tasks,
|
||||
"current_task": self.current_task,
|
||||
"uptime_seconds": int(time.time() - self.start_time),
|
||||
}
|
||||
# Layer the runtime-wedge fields on top so a
|
||||
# non-empty sample_error from the wedge wins
|
||||
# over the (typically empty) heartbeat
|
||||
# sample_error field. The platform reads
|
||||
# runtime_state to flip status → degraded.
|
||||
body.update(_runtime_state_payload())
|
||||
body.update(_runtime_metadata_payload())
|
||||
resp = await client.post(
|
||||
f"{self.platform_url}/registry/heartbeat",
|
||||
json=body,
|
||||
headers=auth_headers(),
|
||||
)
|
||||
self.error_count = 0
|
||||
self.request_count = 0
|
||||
self._consecutive_failures = 0
|
||||
# 2026-04-30: persist the platform_inbound_secret
|
||||
# if the heartbeat response carries one. Mirrors
|
||||
# the cold-start register flow in main.py:319-323
|
||||
# and closes the recovery path for workspaces
|
||||
# whose secret was lazy-healed on the platform
|
||||
# side after register-time. Without this, the
|
||||
# workspace stays 401-forever on chat upload
|
||||
# until restart. See workspace-server PR #2421
|
||||
# for the server-side delivery change.
|
||||
_persist_inbound_secret_from_heartbeat(resp)
|
||||
except Exception as e:
|
||||
self._consecutive_failures += 1
|
||||
# Issue #1877: if heartbeat 401'd, re-read the token from disk
|
||||
# and retry once. This handles the platform's token-rotation race
|
||||
# where WriteFilesToContainer hasn't finished writing the new
|
||||
# token before the runtime boots and caches the old value.
|
||||
is_401 = False
|
||||
if isinstance(e, httpx.HTTPStatusError) and e.response.status_code == 401:
|
||||
is_401 = True
|
||||
if is_401:
|
||||
logger.warning("Heartbeat 401 for %s — refreshing token cache and retrying once", self.workspace_id)
|
||||
refresh_cache()
|
||||
try:
|
||||
retry_body = {
|
||||
"workspace_id": self.workspace_id,
|
||||
"error_rate": self.error_rate,
|
||||
"sample_error": self.sample_error,
|
||||
"active_tasks": self.active_tasks,
|
||||
"current_task": self.current_task,
|
||||
"uptime_seconds": int(time.time() - self.start_time),
|
||||
}
|
||||
retry_body.update(_runtime_state_payload())
|
||||
retry_resp = await client.post(
|
||||
f"{self.platform_url}/registry/heartbeat",
|
||||
json=retry_body,
|
||||
headers=auth_headers(),
|
||||
)
|
||||
self._consecutive_failures = 0
|
||||
self.request_count += 1
|
||||
_persist_inbound_secret_from_heartbeat(retry_resp)
|
||||
except Exception:
|
||||
# Retry also failed — fall through to the normal
|
||||
# failure tracking below.
|
||||
pass
|
||||
if self._consecutive_failures <= 3 or self._consecutive_failures % MAX_CONSECUTIVE_FAILURES == 0:
|
||||
logger.warning("Heartbeat failed (%d consecutive): %s", self._consecutive_failures, e)
|
||||
if self._consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
|
||||
logger.info("Heartbeat: recreating HTTP client after %d failures", self._consecutive_failures)
|
||||
try:
|
||||
await client.aclose()
|
||||
except Exception:
|
||||
pass
|
||||
break
|
||||
|
||||
# 2. Check delegation status
|
||||
try:
|
||||
await self._check_delegations(client)
|
||||
except Exception as e:
|
||||
logger.debug("Delegation check failed: %s", e)
|
||||
|
||||
# 3. Check activity_logs for delegation results that arrived via
|
||||
# the POST /a2a proxy path (tool_delegate_task → send_a2a_message).
|
||||
# These are NOT written to the delegations table, so
|
||||
# _check_delegations misses them. See issue #354.
|
||||
try:
|
||||
await self._check_activity_delegations(client)
|
||||
except Exception as e:
|
||||
logger.debug("Activity delegation check failed: %s", e)
|
||||
|
||||
await asyncio.sleep(self._interval_seconds)
|
||||
|
||||
except asyncio.CancelledError:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
"Heartbeat loop error: %s — retrying in %ds", e, self._interval_seconds
|
||||
)
|
||||
await asyncio.sleep(self._interval_seconds)
|
||||
finally:
|
||||
if client:
|
||||
try:
|
||||
await client.aclose()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
async def _check_delegations(self, client: httpx.AsyncClient):
|
||||
"""Check for completed delegations and store results for the agent."""
|
||||
try:
|
||||
resp = await client.get(
|
||||
f"{self.platform_url}/workspaces/{self.workspace_id}/delegations",
|
||||
headers=auth_headers(),
|
||||
)
|
||||
if resp.status_code != 200:
|
||||
return
|
||||
|
||||
delegations = resp.json()
|
||||
if not isinstance(delegations, list):
|
||||
return
|
||||
|
||||
new_results = []
|
||||
for d in delegations:
|
||||
did = d.get("delegation_id", "")
|
||||
status = d.get("status", "")
|
||||
|
||||
if not did or did in self._seen_delegation_ids:
|
||||
continue
|
||||
|
||||
if status in ("completed", "failed"):
|
||||
# Fix B (Cycle 5): validate source_id before accepting delegation
|
||||
# results. Only process delegations that THIS workspace created
|
||||
# (source_id == self.workspace_id). Attacker-crafted delegation
|
||||
# records with a foreign source_id cannot inject instructions.
|
||||
source_id = d.get("source_id", "")
|
||||
if source_id != self.workspace_id:
|
||||
logger.warning(
|
||||
"Heartbeat: skipping delegation %s — source_id %r does not "
|
||||
"match this workspace %r; possible injection attempt",
|
||||
did, source_id, self.workspace_id,
|
||||
)
|
||||
self._seen_delegation_ids.add(did) # mark seen so we don't warn again
|
||||
continue
|
||||
|
||||
self._seen_delegation_ids.add(did)
|
||||
new_results.append({
|
||||
"delegation_id": did,
|
||||
"target_id": d.get("target_id", ""),
|
||||
"source_id": source_id,
|
||||
"status": status,
|
||||
"summary": d.get("summary", ""),
|
||||
"response_preview": d.get("response_preview", ""),
|
||||
"error": d.get("error", ""),
|
||||
"timestamp": time.time(),
|
||||
})
|
||||
|
||||
# Evict old seen IDs if over limit
|
||||
if len(self._seen_delegation_ids) > MAX_SEEN_DELEGATION_IDS:
|
||||
# Keep most recent half
|
||||
self._seen_delegation_ids = set(list(self._seen_delegation_ids)[MAX_SEEN_DELEGATION_IDS // 2:])
|
||||
|
||||
if new_results:
|
||||
# Append to results file for context injection on next message
|
||||
with open(DELEGATION_RESULTS_FILE, "a") as f:
|
||||
for r in new_results:
|
||||
f.write(json.dumps(r) + "\n")
|
||||
logger.info("Heartbeat: %d new delegation results — triggering self-message", len(new_results))
|
||||
|
||||
# Build a summary message for the agent.
|
||||
# Fix B (Cycle 5): do NOT embed raw response_preview text in
|
||||
# user-role A2A messages — that is the prompt-injection vector.
|
||||
# Instead reference only the delegation ID and status; the agent
|
||||
# reads full content from DELEGATION_RESULTS_FILE which was
|
||||
# written above from trusted platform data.
|
||||
summary_lines = []
|
||||
for r in new_results:
|
||||
line = f"- [{r['status']}] Delegation {r['delegation_id'][:8]}: {r['summary'][:80]}"
|
||||
if r.get("error"):
|
||||
line += f"\n Error: {r['error'][:100]}"
|
||||
summary_lines.append(line)
|
||||
|
||||
# Look up parent workspace (cached after first call)
|
||||
if self._parent_name is None:
|
||||
try:
|
||||
parent_resp = await client.get(
|
||||
f"{self.platform_url}/workspaces/{self.workspace_id}",
|
||||
headers=auth_headers(),
|
||||
)
|
||||
if parent_resp.status_code == 200:
|
||||
parent_id = parent_resp.json().get("parent_id", "")
|
||||
if parent_id:
|
||||
parent_info = await client.get(
|
||||
f"{self.platform_url}/workspaces/{parent_id}",
|
||||
headers=auth_headers(),
|
||||
)
|
||||
if parent_info.status_code == 200:
|
||||
self._parent_name = parent_info.json().get("name", "")
|
||||
if self._parent_name is None:
|
||||
self._parent_name = "" # No parent — cache empty
|
||||
except Exception:
|
||||
pass # Will retry next cycle
|
||||
parent_name = self._parent_name or ""
|
||||
|
||||
report_instruction = ""
|
||||
if parent_name:
|
||||
report_instruction = (
|
||||
f"\n\nIMPORTANT: Report these results back to your parent '{parent_name}' "
|
||||
f"by delegating a summary to them. Use delegate_task or delegate_task_async "
|
||||
f"with a concise status report. Also use send_message_to_user to notify the user."
|
||||
)
|
||||
else:
|
||||
report_instruction = (
|
||||
"\n\nReport results using send_message_to_user to notify the user."
|
||||
)
|
||||
|
||||
trigger_msg = (
|
||||
"Delegation results are ready. Review them and take appropriate action:\n"
|
||||
+ "\n".join(summary_lines)
|
||||
+ report_instruction
|
||||
)
|
||||
|
||||
# Send A2A self-message to wake the agent.
|
||||
# Minimum 60s between self-messages to avoid spam, but always send
|
||||
# when there are genuinely NEW results to process.
|
||||
now = time.time()
|
||||
if now - self._last_self_message_time < SELF_MESSAGE_COOLDOWN:
|
||||
logger.debug("Heartbeat: self-message cooldown (60s), will retry next cycle")
|
||||
else:
|
||||
self._last_self_message_time = now
|
||||
try:
|
||||
# self_source_headers() adds X-Workspace-ID so the
|
||||
# platform tags this row source=agent, not canvas
|
||||
# — see platform_auth.py for the full rationale.
|
||||
await client.post(
|
||||
f"{self.platform_url}/workspaces/{self.workspace_id}/a2a",
|
||||
json={
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"parts": [{"type": "text", "text": trigger_msg}],
|
||||
},
|
||||
},
|
||||
},
|
||||
headers=self_source_headers(self.workspace_id),
|
||||
timeout=120.0,
|
||||
)
|
||||
logger.info("Heartbeat: self-message sent to process delegation results")
|
||||
except Exception as e:
|
||||
logger.warning("Heartbeat: failed to send self-message: %s", e)
|
||||
|
||||
# Also push notification to user via canvas
|
||||
for r in new_results:
|
||||
try:
|
||||
msg = f"Delegation {r['status']}: {r['summary'][:100]}"
|
||||
if r.get("response_preview"):
|
||||
msg += f"\nResult: {r['response_preview'][:200]}"
|
||||
await client.post(
|
||||
f"{self.platform_url}/workspaces/{self.workspace_id}/notify",
|
||||
json={"message": msg, "type": "delegation_result"},
|
||||
headers=auth_headers(),
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
except Exception as e:
|
||||
logger.debug("Delegation check error: %s", e)
|
||||
|
||||
async def _check_activity_delegations(self, client: httpx.AsyncClient):
|
||||
"""Poll activity_logs for delegation results that arrived via the POST /a2a proxy path.
|
||||
|
||||
tool_delegate_task → send_a2a_message → POST /workspaces/:id/a2a (proxy)
|
||||
logs to activity_logs but NOT the delegations table. _check_delegations
|
||||
only checks the delegations table, so these results are invisible to the
|
||||
heartbeat — the agent never wakes up to consume them (issue #354).
|
||||
|
||||
This method closes that gap: polls GET /workspaces/:id/activity?type=a2a_receive,
|
||||
filters for rows from peer workspaces (source_id != "" and != self.workspace_id),
|
||||
tracks seen IDs with a cursor file, and sends a self-message to wake the agent.
|
||||
"""
|
||||
try:
|
||||
# Load cursor lazily on first call so startup is not blocked by disk I/O.
|
||||
if not self._activity_cursor_loaded:
|
||||
self._activity_cursor_loaded = True
|
||||
try:
|
||||
if os.path.exists(_ACTIVITY_DELEGATION_CURSOR_FILE):
|
||||
cursor = open(_ACTIVITY_DELEGATION_CURSOR_FILE).read().strip()
|
||||
if cursor:
|
||||
self._seen_activity_ids = set(cursor.split(","))
|
||||
except Exception:
|
||||
pass # Corrupt cursor — start fresh
|
||||
|
||||
params: dict[str, str] = {"type": "a2a_receive"}
|
||||
resp = await client.get(
|
||||
f"{self.platform_url}/workspaces/{self.workspace_id}/activity",
|
||||
params=params,
|
||||
headers=auth_headers(),
|
||||
)
|
||||
if resp.status_code != 200:
|
||||
return
|
||||
|
||||
rows = resp.json()
|
||||
if not isinstance(rows, list):
|
||||
return
|
||||
|
||||
# Activity API returns newest-first; process in reverse order so
|
||||
# we advance the cursor monotonically (oldest → newest).
|
||||
rows = list(reversed(rows))
|
||||
|
||||
new_results: list[dict] = []
|
||||
last_id: str | None = None
|
||||
for row in rows:
|
||||
if not isinstance(row, dict):
|
||||
continue
|
||||
activity_id = str(row.get("id", ""))
|
||||
if not activity_id:
|
||||
continue
|
||||
last_id = activity_id
|
||||
|
||||
if activity_id in self._seen_activity_ids:
|
||||
continue
|
||||
|
||||
# Filter: must have a non-empty source_id that is NOT this workspace
|
||||
# (peer agent messages only; skip canvas-user messages and self-notify).
|
||||
source_id = row.get("source_id") or ""
|
||||
if not source_id or source_id == self.workspace_id:
|
||||
continue
|
||||
|
||||
self._seen_activity_ids.add(activity_id)
|
||||
summary = row.get("summary") or ""
|
||||
# Extract response text from request_body if available.
|
||||
# Shape mirrors inbox._extract_text: walk parts for "text" field.
|
||||
response_text = summary
|
||||
request_body = row.get("request_body")
|
||||
if isinstance(request_body, dict):
|
||||
params_obj = request_body.get("params")
|
||||
if isinstance(params_obj, dict):
|
||||
msg = params_obj.get("message")
|
||||
if isinstance(msg, dict):
|
||||
parts = msg.get("parts") or []
|
||||
texts = []
|
||||
for p in (parts if isinstance(parts, list) else []):
|
||||
if isinstance(p, dict) and p.get("kind") == "text" or p.get("type") == "text":
|
||||
t = p.get("text", "")
|
||||
if t:
|
||||
texts.append(t)
|
||||
if texts:
|
||||
response_text = " ".join(texts)
|
||||
|
||||
new_results.append({
|
||||
"delegation_id": activity_id, # Use activity ID as pseudo-delegation ID
|
||||
"target_id": source_id,
|
||||
"source_id": self.workspace_id,
|
||||
"status": "completed",
|
||||
"summary": summary,
|
||||
"response_preview": response_text[:4096],
|
||||
"error": "",
|
||||
"timestamp": time.time(),
|
||||
})
|
||||
|
||||
if not new_results:
|
||||
return
|
||||
|
||||
# Persist cursor so restarts don't re-process these rows.
|
||||
if last_id:
|
||||
try:
|
||||
with open(_ACTIVITY_DELEGATION_CURSOR_FILE, "w") as f:
|
||||
# Keep cursor as comma-joined IDs; truncate if over 100KB.
|
||||
cursor_str = ",".join(sorted(self._seen_activity_ids))
|
||||
if len(cursor_str) > 102_400:
|
||||
# Evict oldest half when cursor file grows too large.
|
||||
sorted_ids = sorted(self._seen_activity_ids)
|
||||
self._seen_activity_ids = set(sorted_ids[len(sorted_ids) // 2:])
|
||||
cursor_str = ",".join(sorted(self._seen_activity_ids))
|
||||
f.write(cursor_str)
|
||||
except Exception:
|
||||
pass # Non-fatal; next cycle will retry
|
||||
|
||||
# Append to results file and trigger self-message (mirrors _check_delegations).
|
||||
with open(DELEGATION_RESULTS_FILE, "a") as f:
|
||||
for r in new_results:
|
||||
f.write(json.dumps(r) + "\n")
|
||||
logger.info(
|
||||
"Heartbeat: %d new a2a_receive delegation results from activity_logs — "
|
||||
"triggering self-message",
|
||||
len(new_results),
|
||||
)
|
||||
|
||||
# Build and send self-message to wake the agent.
|
||||
summary_lines = []
|
||||
for r in new_results:
|
||||
line = f"- [completed] Peer response from {r['target_id'][:8]}: {r['summary'][:80] or '(no summary)'}"
|
||||
if r.get("error"):
|
||||
line += f"\n Error: {r['error'][:100]}"
|
||||
summary_lines.append(line)
|
||||
|
||||
# Look up parent name (reuse cached value from _check_delegations if set).
|
||||
if self._parent_name is None:
|
||||
try:
|
||||
parent_resp = await client.get(
|
||||
f"{self.platform_url}/workspaces/{self.workspace_id}",
|
||||
headers=auth_headers(),
|
||||
)
|
||||
if parent_resp.status_code == 200:
|
||||
parent_id = parent_resp.json().get("parent_id", "")
|
||||
if parent_id:
|
||||
parent_info = await client.get(
|
||||
f"{self.platform_url}/workspaces/{parent_id}",
|
||||
headers=auth_headers(),
|
||||
)
|
||||
if parent_info.status_code == 200:
|
||||
self._parent_name = parent_info.json().get("name", "")
|
||||
if self._parent_name is None:
|
||||
self._parent_name = ""
|
||||
except Exception:
|
||||
self._parent_name = ""
|
||||
parent_name = self._parent_name or ""
|
||||
|
||||
report_instruction = ""
|
||||
if parent_name:
|
||||
report_instruction = (
|
||||
f"\n\nIMPORTANT: Delegate a summary of these results to your parent "
|
||||
f"'{parent_name}' using delegate_task. Also use send_message_to_user "
|
||||
f"to notify the user."
|
||||
)
|
||||
else:
|
||||
report_instruction = (
|
||||
"\n\nReport results using send_message_to_user to notify the user."
|
||||
)
|
||||
|
||||
trigger_msg = (
|
||||
"Delegation results are ready (from a2a_receive via activity_logs). "
|
||||
"Review them and take appropriate action:\n"
|
||||
+ "\n".join(summary_lines)
|
||||
+ report_instruction
|
||||
)
|
||||
|
||||
now = time.time()
|
||||
if now - self._last_self_message_time < SELF_MESSAGE_COOLDOWN:
|
||||
logger.debug(
|
||||
"Heartbeat: self-message cooldown active; "
|
||||
"a2a_receive results will be retried next cycle"
|
||||
)
|
||||
else:
|
||||
self._last_self_message_time = now
|
||||
try:
|
||||
await client.post(
|
||||
f"{self.platform_url}/workspaces/{self.workspace_id}/a2a",
|
||||
json={
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"parts": [{"type": "text", "text": trigger_msg}],
|
||||
},
|
||||
},
|
||||
},
|
||||
headers=self_source_headers(self.workspace_id),
|
||||
timeout=120.0,
|
||||
)
|
||||
logger.info("Heartbeat: a2a_receive self-message sent")
|
||||
except Exception as e:
|
||||
logger.warning("Heartbeat: failed to send a2a_receive self-message: %s", e)
|
||||
|
||||
# Also notify the user via canvas.
|
||||
for r in new_results:
|
||||
try:
|
||||
msg = f"Delegation completed: {r['summary'][:100] or '(no summary)'}"
|
||||
preview = r.get("response_preview", "")
|
||||
if preview:
|
||||
msg += f"\nResult: {preview[:200]}"
|
||||
await client.post(
|
||||
f"{self.platform_url}/workspaces/{self.workspace_id}/notify",
|
||||
json={"message": msg, "type": "delegation_result"},
|
||||
headers=auth_headers(),
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
except Exception as e:
|
||||
logger.debug("Activity delegation check error: %s", e)
|
||||
@@ -1,807 +0,0 @@
|
||||
"""In-memory inbox + background poller for the standalone molecule-mcp path.
|
||||
|
||||
Purpose
|
||||
-------
|
||||
The universal MCP server (a2a_mcp_server.py) is OUTBOUND-ONLY by default —
|
||||
it gives an MCP-aware agent the same A2A delegation, peer-discovery, and
|
||||
memory tools that container-bound runtimes already have. There is no
|
||||
inbound delivery path: when the canvas user types a message or a peer
|
||||
sends an A2A request, the activity lands on the platform but the
|
||||
standalone agent never sees it.
|
||||
|
||||
This module closes that gap WITHOUT requiring a tunnel or a public agent
|
||||
URL. A daemon thread polls ``/workspaces/:id/activity?type=a2a_receive``
|
||||
on the platform and stages new rows in an in-memory deque. Three new MCP
|
||||
tools (``inbox_peek``, ``inbox_pop``, ``wait_for_message``) let the
|
||||
agent observe the queue.
|
||||
|
||||
Why a poller (not push)
|
||||
-----------------------
|
||||
runtime=external workspaces have ``delivery_mode="poll"`` — the platform
|
||||
records inbound A2A in ``activity_logs`` but does not call back to the
|
||||
agent. A poller is the only inbound surface that works without the
|
||||
operator exposing a public URL through a tunnel. 5s cadence matches
|
||||
the molecule-mcp-claude-channel plugin's POLL_INTERVAL — it's already
|
||||
proven on staging for the channel-based delivery path.
|
||||
|
||||
Cursor model
|
||||
------------
|
||||
``activity_logs.id`` is the cursor (server-assigned, monotonic). We
|
||||
persist it to ``${CONFIGS_DIR}/.mcp_inbox_cursor`` so an agent restart
|
||||
doesn't replay the last 10 minutes of inbound traffic and re-act on
|
||||
already-handled messages. On 410 (cursor pruned) we drop back to
|
||||
``since_secs=600`` for a bounded backlog and let the cursor advance
|
||||
naturally from there.
|
||||
|
||||
Scope
|
||||
-----
|
||||
Standalone molecule-mcp ONLY. The in-container runtime has its own
|
||||
push delivery (main.py + canvas WebSocket); we never want both
|
||||
running at once or a single message would be delivered twice. The
|
||||
caller (mcp_cli.main) gates activation explicitly via
|
||||
``activate(state)``; in-container code that imports this module by
|
||||
accident gets a no-op until activate is called.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import threading
|
||||
import time
|
||||
from collections import deque
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Any, Callable
|
||||
|
||||
import configs_dir
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Poll cadence. 5s mirrors the molecule-mcp-claude-channel plugin's
|
||||
# proven default — fast enough that a canvas user typing "are you
|
||||
# there?" gets picked up before they refresh, slow enough that 12
|
||||
# requests/min won't trip rate limits or wake mobile devices.
|
||||
POLL_INTERVAL_SECONDS = 5.0
|
||||
|
||||
# Initial backlog window for the first poll AND the recovery path
|
||||
# after a stale-cursor 410. 10 minutes is enough to cover a brief
|
||||
# crash/restart without flooding a long-idle workspace with hours of
|
||||
# stale chat.
|
||||
INITIAL_BACKLOG_SECONDS = 600
|
||||
|
||||
# Hard cap on the in-memory deque. The poller is bounded by the
|
||||
# server's per-page limit (default 100) and the agent typically pops
|
||||
# faster than the operator types, so an idle workspace shouldn't
|
||||
# exceed a handful. The cap protects against runaway growth if the
|
||||
# agent process stops calling pop.
|
||||
MAX_QUEUED_MESSAGES = 200
|
||||
|
||||
|
||||
@dataclass
|
||||
class InboxMessage:
|
||||
"""One inbound A2A message staged for the agent.
|
||||
|
||||
Mirrors the shape the agent sees via inbox_peek / wait_for_message.
|
||||
Fields are derived from the activity_logs row by ``_from_activity``.
|
||||
"""
|
||||
|
||||
activity_id: str
|
||||
text: str
|
||||
peer_id: str # empty string = canvas user; non-empty = peer workspace_id
|
||||
method: str # JSON-RPC method ("message/send", "tasks/send", etc.)
|
||||
created_at: str # RFC3339 timestamp from the activity row
|
||||
|
||||
# Which OF MY workspaces did this message arrive on. Only meaningful
|
||||
# for the multi-workspace external agent (one process registered
|
||||
# against multiple workspaces). Empty string = single-workspace
|
||||
# path / pre-multi-workspace caller — back-compat with consumers
|
||||
# that don't set it. Tools like send_message_to_user use this to
|
||||
# know which workspace's identity to reply with.
|
||||
arrival_workspace_id: str = ""
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
# Task #190 / #193 — Distinguish delegation-result rows from peer-agent
|
||||
# messages. The platform's pushDelegationResultToInbox (RFC #2829 PR-2)
|
||||
# writes activity_type='a2a_receive' with method='delegate_result' and
|
||||
# source_id=our own workspace UUID, so the caller's inbox poller can
|
||||
# surface delegation completions/failures via wait_for_message. But
|
||||
# the default to_dict derives kind="peer_agent" purely from peer_id
|
||||
# being non-empty — which makes a synchronous-delegation timeout, or
|
||||
# a cross-workspace ProxyA2A failure, appear to the agent as a NEW
|
||||
# peer_agent message from our own workspace UUID (#190 self-echo).
|
||||
#
|
||||
# Explicitly classify rows with method='delegate_result' as
|
||||
# kind='delegation_result' regardless of peer_id, so:
|
||||
# 1. wait_for_message gives the original caller a structured
|
||||
# delegation result (not a fake peer instruction).
|
||||
# 2. Agents reading the envelope don't mistake the row for a
|
||||
# peer instructing them — preventing the #190 reply-via-
|
||||
# delegate_task-to-self loop.
|
||||
if self.method == "delegate_result":
|
||||
kind = "delegation_result"
|
||||
elif self.peer_id:
|
||||
kind = "peer_agent"
|
||||
else:
|
||||
kind = "canvas_user"
|
||||
d = {
|
||||
"activity_id": self.activity_id,
|
||||
"text": self.text,
|
||||
"peer_id": self.peer_id,
|
||||
"kind": kind,
|
||||
"method": self.method,
|
||||
"created_at": self.created_at,
|
||||
}
|
||||
# Only surface arrival_workspace_id when it's set, so single-
|
||||
# workspace consumers don't see a new key in their existing
|
||||
# output.
|
||||
if self.arrival_workspace_id:
|
||||
d["arrival_workspace_id"] = self.arrival_workspace_id
|
||||
return d
|
||||
|
||||
|
||||
@dataclass
|
||||
class InboxState:
|
||||
"""Thread-safe queue of pending inbound messages.
|
||||
|
||||
Producer: the poller thread(s), calling ``record(message)``. Consumers:
|
||||
the MCP tool handlers, calling ``peek``, ``pop``, or ``wait``.
|
||||
Synchronization is via a single ``threading.Lock`` (cheap — every
|
||||
operation is O(n) over a small deque) plus an ``Event`` that wakes
|
||||
``wait`` callers when a new message lands.
|
||||
|
||||
Cursors are per-workspace. Single-workspace operators construct with
|
||||
``InboxState(cursor_path=...)`` (back-compat — the path becomes the
|
||||
cursor file for the empty-string workspace_id key). Multi-workspace
|
||||
operators construct with ``InboxState(cursor_paths={wsid: path,...})``
|
||||
so each poller advances its own cursor independently — one
|
||||
workspace's slow poll can't stall another's, and a 410 on one cursor
|
||||
only resets that one.
|
||||
"""
|
||||
|
||||
cursor_path: Path | None = None
|
||||
"""Single-workspace cursor file. Sets ``cursor_paths[""]`` if
|
||||
``cursor_paths`` not also supplied. Kept on the dataclass for
|
||||
back-compat — existing callers pass ``cursor_path=`` positionally."""
|
||||
|
||||
cursor_paths: dict[str, Path] = field(default_factory=dict)
|
||||
"""Per-workspace cursor files keyed by workspace_id. Multi-workspace
|
||||
pollers each own their own row here."""
|
||||
|
||||
_queue: deque[InboxMessage] = field(default_factory=lambda: deque(maxlen=MAX_QUEUED_MESSAGES))
|
||||
_lock: threading.Lock = field(default_factory=threading.Lock)
|
||||
_arrival: threading.Event = field(default_factory=threading.Event)
|
||||
_cursors: dict[str, str | None] = field(default_factory=dict)
|
||||
_cursors_loaded: dict[str, bool] = field(default_factory=dict)
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
# Back-compat: single-workspace constructor passes
|
||||
# cursor_path=Path(...). Promote it into the dict under the
|
||||
# empty-string key so the lookup APIs are uniform.
|
||||
if self.cursor_path is not None and "" not in self.cursor_paths:
|
||||
self.cursor_paths[""] = self.cursor_path
|
||||
|
||||
def _path_for(self, workspace_id: str) -> Path | None:
|
||||
"""Resolve the cursor path for a workspace_id key, or None."""
|
||||
return self.cursor_paths.get(workspace_id or "")
|
||||
|
||||
def load_cursor(self, workspace_id: str = "") -> str | None:
|
||||
"""Read the persisted cursor from disk. Cached after first call.
|
||||
|
||||
Missing/unreadable file → None (poller will fall back to the
|
||||
initial-backlog window). We never raise: a corrupt cursor is
|
||||
less bad than the inbox refusing to start.
|
||||
|
||||
``workspace_id=""`` is the single-workspace path, untouched.
|
||||
"""
|
||||
path = self._path_for(workspace_id)
|
||||
with self._lock:
|
||||
if self._cursors_loaded.get(workspace_id):
|
||||
return self._cursors.get(workspace_id)
|
||||
cursor: str | None = None
|
||||
if path is not None:
|
||||
try:
|
||||
if path.is_file():
|
||||
cursor = path.read_text().strip() or None
|
||||
except OSError as exc:
|
||||
logger.warning("inbox: failed to read cursor %s: %s", path, exc)
|
||||
cursor = None
|
||||
self._cursors[workspace_id] = cursor
|
||||
self._cursors_loaded[workspace_id] = True
|
||||
return cursor
|
||||
|
||||
def save_cursor(self, activity_id: str, workspace_id: str = "") -> None:
|
||||
"""Persist the cursor. Best-effort — log + continue on failure.
|
||||
|
||||
Loss of the cursor on a write failure means an extra page of
|
||||
backlog after restart, never a stuck poller. Silent-fail
|
||||
would mask a permission misconfiguration on the operator's
|
||||
configs dir; warn loudly so they can fix it.
|
||||
"""
|
||||
path = self._path_for(workspace_id)
|
||||
with self._lock:
|
||||
self._cursors[workspace_id] = activity_id
|
||||
self._cursors_loaded[workspace_id] = True
|
||||
if path is None:
|
||||
return
|
||||
try:
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
tmp = path.with_suffix(path.suffix + ".tmp")
|
||||
tmp.write_text(activity_id)
|
||||
tmp.replace(path)
|
||||
except OSError as exc:
|
||||
logger.warning("inbox: failed to persist cursor to %s: %s", path, exc)
|
||||
|
||||
def reset_cursor(self, workspace_id: str = "") -> None:
|
||||
"""Forget the cursor. Used after a 410 from the activity API."""
|
||||
path = self._path_for(workspace_id)
|
||||
with self._lock:
|
||||
self._cursors[workspace_id] = None
|
||||
self._cursors_loaded[workspace_id] = True
|
||||
if path is None:
|
||||
return
|
||||
try:
|
||||
if path.is_file():
|
||||
path.unlink()
|
||||
except OSError as exc:
|
||||
logger.warning("inbox: failed to delete cursor %s: %s", path, exc)
|
||||
|
||||
def record(self, message: InboxMessage) -> None:
|
||||
"""Append a message, wake any waiter, and fire the notification
|
||||
callback (if registered) for push-UX-capable hosts.
|
||||
|
||||
Skips a row whose activity_id we've already queued — defensive
|
||||
against the poller racing with the consumer + cursor save. The
|
||||
dedupe short-circuits BEFORE the notification fires, so a
|
||||
notification-capable host doesn't see duplicate push events on
|
||||
backlog overlap.
|
||||
"""
|
||||
with self._lock:
|
||||
for existing in self._queue:
|
||||
if existing.activity_id == message.activity_id:
|
||||
return
|
||||
self._queue.append(message)
|
||||
self._arrival.set()
|
||||
# Fire notification AFTER releasing the lock so the callback
|
||||
# is free to do anything (including calling back into inbox)
|
||||
# without deadlock. Best-effort: a raising callback must not
|
||||
# prevent the message from landing in the queue — observability
|
||||
# is more important than push delivery.
|
||||
cb = _NOTIFICATION_CALLBACK
|
||||
if cb is not None:
|
||||
try:
|
||||
cb(message.to_dict())
|
||||
except Exception:
|
||||
logger.warning(
|
||||
"inbox: notification callback raised", exc_info=True
|
||||
)
|
||||
|
||||
def peek(self, limit: int = 10) -> list[InboxMessage]:
|
||||
"""Return up to ``limit`` pending messages without removing them."""
|
||||
if limit <= 0:
|
||||
limit = 10
|
||||
with self._lock:
|
||||
return list(self._queue)[:limit]
|
||||
|
||||
def pop(self, activity_id: str) -> InboxMessage | None:
|
||||
"""Remove a specific message. Idempotent; returns None if absent.
|
||||
|
||||
We require the caller to specify which message it handled
|
||||
rather than auto-popping the head — preserves observability
|
||||
when the agent reads several but only handles one.
|
||||
"""
|
||||
with self._lock:
|
||||
for existing in list(self._queue):
|
||||
if existing.activity_id == activity_id:
|
||||
self._queue.remove(existing)
|
||||
if not self._queue:
|
||||
self._arrival.clear()
|
||||
return existing
|
||||
return None
|
||||
|
||||
def wait(self, timeout_secs: float) -> InboxMessage | None:
|
||||
"""Block until a message is available or timeout elapses.
|
||||
|
||||
Returns the head message WITHOUT popping; the caller decides
|
||||
whether to pop after acting on it. Same shape as Python's
|
||||
Queue.get with timeout, but non-destructive so a peek-style
|
||||
agent can still inspect with peek/pop.
|
||||
"""
|
||||
# Fast path: queue already has something.
|
||||
with self._lock:
|
||||
if self._queue:
|
||||
return self._queue[0]
|
||||
self._arrival.clear()
|
||||
|
||||
triggered = self._arrival.wait(timeout=max(0.0, timeout_secs))
|
||||
if not triggered:
|
||||
return None
|
||||
with self._lock:
|
||||
return self._queue[0] if self._queue else None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Module singleton — set by mcp_cli before MCP server starts.
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# In-container callers don't activate; the inbox tools detect the
|
||||
# unset singleton and return an informational error rather than
|
||||
# breaking the dispatch path.
|
||||
|
||||
_STATE: InboxState | None = None
|
||||
|
||||
|
||||
# Notification bridge — set by the universal MCP server (a2a_mcp_server.py)
|
||||
# at startup so that new inbox arrivals can be pushed to notification-
|
||||
# capable hosts (Claude Code) as MCP `notifications/claude/channel`
|
||||
# events. Kept module-level (rather than a method on InboxState) so the
|
||||
# inbox doesn't need to know about MCP — a thin pluggable seam.
|
||||
#
|
||||
# Defaults to None: in-container runtimes that don't activate the inbox
|
||||
# also don't push notifications, and tests start clean. The wheel's
|
||||
# wiring is exercised by tests/test_a2a_mcp_server.py + the bridge
|
||||
# tests below.
|
||||
_NOTIFICATION_CALLBACK: Callable[[dict], None] | None = None
|
||||
|
||||
|
||||
def set_notification_callback(cb: Callable[[dict], None] | None) -> None:
|
||||
"""Register (or clear) the per-message notification callback.
|
||||
|
||||
The callback receives ``InboxMessage.to_dict()`` for each new
|
||||
arrival — same shape ``inbox_peek`` returns to the agent, so a
|
||||
bridge can build its MCP notification payload without re-deriving
|
||||
fields.
|
||||
|
||||
Best-effort: a raising callback does NOT prevent the message from
|
||||
landing in the queue (see ``InboxState.record``). Pass ``None`` to
|
||||
clear (used by tests + the wheel's shutdown path).
|
||||
"""
|
||||
global _NOTIFICATION_CALLBACK
|
||||
_NOTIFICATION_CALLBACK = cb
|
||||
|
||||
|
||||
def activate(state: InboxState) -> None:
|
||||
"""Register an InboxState as the singleton this module exposes.
|
||||
|
||||
Idempotent within a process: re-activating with the same state is
|
||||
a no-op; activating with a DIFFERENT state replaces the singleton
|
||||
+ logs at WARNING (the only legitimate caller is mcp_cli at
|
||||
startup; double-activate usually means a test/runtime mix-up).
|
||||
"""
|
||||
global _STATE
|
||||
if _STATE is state:
|
||||
return
|
||||
if _STATE is not None:
|
||||
logger.warning("inbox: replacing existing singleton state")
|
||||
_STATE = state
|
||||
|
||||
|
||||
def get_state() -> InboxState | None:
|
||||
"""Return the active InboxState, or None if the runtime never activated.
|
||||
|
||||
Tool implementations call this and surface a clear "(inbox not
|
||||
enabled)" message to the agent when None — keeps the in-container
|
||||
path's tool dispatch from raising on an inbox-tool call that the
|
||||
agent shouldn't have made anyway.
|
||||
"""
|
||||
return _STATE
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Activity → InboxMessage adapter
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# The platform's a2a_proxy logs request_body as the JSON-RPC envelope
|
||||
# it forwarded to the workspace. Three shapes have been observed in
|
||||
# the wild (verified against workspace-server's logA2ASuccess in
|
||||
# a2a_proxy_helpers.go on 2026-04-29) — handle all three before
|
||||
# falling back to summary so a peer message at least surfaces SOMETHING.
|
||||
|
||||
|
||||
def _extract_text(request_body: Any, summary: str | None) -> str:
|
||||
"""Pull the human-readable text out of an A2A activity row.
|
||||
|
||||
Mirrors molecule-mcp-claude-channel/server.ts:445 (extractText) so
|
||||
canvas-user messages and peer-agent messages render identically
|
||||
across both inbound channels.
|
||||
"""
|
||||
if not isinstance(request_body, dict):
|
||||
return summary or "(empty A2A message)"
|
||||
|
||||
candidates: list[Any] = []
|
||||
params = request_body.get("params") if isinstance(request_body.get("params"), dict) else None
|
||||
if params:
|
||||
message = params.get("message") if isinstance(params.get("message"), dict) else None
|
||||
if message:
|
||||
candidates.append(message.get("parts"))
|
||||
candidates.append(params.get("parts"))
|
||||
candidates.append(request_body.get("parts"))
|
||||
|
||||
# The A2A protocol's part discriminator field varies between SDK
|
||||
# versions: a2a-sdk v0 uses ``type``, v1 uses ``kind``. The platform's
|
||||
# activity_logs preserves whichever the original sender used, so we
|
||||
# accept either. Verified live against a hosted SaaS workspace on
|
||||
# 2026-04-30 — every canvas-user message arrived with ``kind`` and
|
||||
# the type-only filter was silently falling through to summary.
|
||||
for parts in candidates:
|
||||
if isinstance(parts, list):
|
||||
text = "".join(
|
||||
p.get("text", "")
|
||||
for p in parts
|
||||
if isinstance(p, dict)
|
||||
and (p.get("kind") == "text" or p.get("type") == "text")
|
||||
)
|
||||
if text:
|
||||
return text
|
||||
return summary or "(empty A2A message)"
|
||||
|
||||
|
||||
def _is_self_notify_row(row: dict[str, Any]) -> bool:
|
||||
"""Return True if ``row`` is the agent's own send_message_to_user
|
||||
POST surfacing back through the activity API.
|
||||
|
||||
The shape (workspace-server handlers/activity.go, ``Notify`` writer):
|
||||
method='notify' AND no peer (source_id is None or '')
|
||||
|
||||
Matched on both fields together so a future caller using
|
||||
``method='notify'`` for a different purpose with a real peer_id
|
||||
still passes through.
|
||||
"""
|
||||
if row.get("method") != "notify":
|
||||
return False
|
||||
source_id = row.get("source_id")
|
||||
return source_id is None or source_id == ""
|
||||
|
||||
|
||||
def _is_self_echo_row(row: dict[str, Any], workspace_id: str) -> bool:
|
||||
"""Return True if ``row`` is a self-originated a2a_receive row.
|
||||
|
||||
Internal #469: when a workspace delegates to a target that never picks
|
||||
up the task, ``tool_delegate_task`` calls ``report_activity`` which
|
||||
POSTs to the platform with source_id set to the *sender's* workspace
|
||||
UUID (mandated by spoof-defense in workspace-server's a2a_proxy). The
|
||||
activity API exposes that row under type=a2a_receive, so the inbox
|
||||
poller re-fetches it. Without this guard the row is surfaced as
|
||||
kind='peer_agent' with the workspace's own identity as peer_id —
|
||||
the workspace sees its own delegation-failure echoed back as if a
|
||||
peer had delegated to it.
|
||||
|
||||
The guard mirrors the existing _is_self_notify_row pattern: both
|
||||
skip rows that would otherwise create spurious inbound signal. The
|
||||
long-term fix (making the platform write a distinct activity_type
|
||||
for agent-outbound rows) is tracked separately; this guard stays
|
||||
because it only excludes rows the agent never wants.
|
||||
|
||||
``workspace_id`` must be non-empty — an empty-string workspace_id
|
||||
(single-workspace legacy path) can never match a UUID source_id, so
|
||||
the predicate is always False there, which is safe.
|
||||
|
||||
RFC #2829 PR-2 note: rows with method="delegate_result" are excluded
|
||||
from the self-echo guard even when source_id matches our workspace_id.
|
||||
The platform may write a delegation-result row with source_id set to
|
||||
our workspace_id (e.g. a self-delegation or edge case in the platform's
|
||||
result-writing path). Such rows must reach the inbox so that
|
||||
message_from_activity can surface them as peer_agent inbound and the
|
||||
runtime receives the delegation result. Silently filtering them as
|
||||
self-echo would break delegation result delivery.
|
||||
"""
|
||||
if not workspace_id:
|
||||
return False
|
||||
return row.get("source_id") == workspace_id and row.get("method") != "delegate_result"
|
||||
|
||||
|
||||
def message_from_activity(row: dict[str, Any]) -> InboxMessage:
|
||||
"""Convert one /activity row into an InboxMessage.
|
||||
|
||||
Mutates ``row['request_body']`` in-place to swap any
|
||||
``platform-pending:`` URIs to the locally-staged ``workspace:`` URIs
|
||||
(see ``inbox_uploads.rewrite_request_body``) — by the time the
|
||||
upstream chat message arrives via this path, the upload-receive row
|
||||
that staged the bytes has already populated the URI cache (lower
|
||||
activity_logs.id, processed earlier in the same poll batch). A
|
||||
cache miss leaves the URI untouched; the agent surfaces an
|
||||
unresolvable URI rather than the inbox silently dropping the part.
|
||||
"""
|
||||
request_body = row.get("request_body")
|
||||
if isinstance(request_body, str):
|
||||
# The Go handler returns request_body as json.RawMessage; httpx
|
||||
# deserializes that to a dict already. But some legacy paths or
|
||||
# mocked servers may return it as a string — handle defensively.
|
||||
try:
|
||||
request_body = json.loads(request_body)
|
||||
except (TypeError, ValueError):
|
||||
request_body = None
|
||||
|
||||
# Rewrite platform-pending: URIs → workspace: URIs in-place. Imported
|
||||
# at call time to keep the import graph clean for the in-container
|
||||
# path that doesn't use this module (also avoids a circular: the
|
||||
# uploads module is small enough that re-importing per call is
|
||||
# cheap, and the Python import cache makes it free after the first).
|
||||
from inbox_uploads import rewrite_request_body
|
||||
rewrite_request_body(request_body)
|
||||
|
||||
return InboxMessage(
|
||||
activity_id=str(row.get("id", "")),
|
||||
text=_extract_text(request_body, row.get("summary")),
|
||||
peer_id=row.get("source_id") or "",
|
||||
method=row.get("method") or "",
|
||||
created_at=str(row.get("created_at", "")),
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Poller — daemon thread that fills the queue from the activity API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _poll_once(
|
||||
state: InboxState,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
headers: dict[str, str],
|
||||
timeout_secs: float = 10.0,
|
||||
) -> int:
|
||||
"""One poll iteration. Returns number of new messages enqueued.
|
||||
|
||||
Idempotent and stateless apart from the InboxState passed in —
|
||||
safe to call from tests with a stub state + a real httpx mock.
|
||||
|
||||
``workspace_id`` doubles as the cursor key on InboxState — pollers
|
||||
for distinct workspaces get distinct cursors and don't trample each
|
||||
other. For the single-workspace path the cursor key is the empty
|
||||
string (per InboxState.__post_init__'s back-compat promotion of
|
||||
``cursor_path``).
|
||||
"""
|
||||
import httpx
|
||||
|
||||
url = f"{platform_url}/workspaces/{workspace_id}/activity"
|
||||
# Dual cursor key resolution: in single-workspace mode the cursor
|
||||
# was historically stored under the "" key (back-compat). In
|
||||
# multi-workspace mode each poller's cursor lives under its own
|
||||
# workspace_id. Try the workspace-specific key first; if absent on
|
||||
# this state, fall back to the legacy empty-string slot so existing
|
||||
# InboxState-with-cursor_path-only constructors keep working.
|
||||
cursor_key = workspace_id if workspace_id in state.cursor_paths else ""
|
||||
params: dict[str, str] = {"type": "a2a_receive"}
|
||||
cursor = state.load_cursor(cursor_key)
|
||||
if cursor:
|
||||
params["since_id"] = cursor
|
||||
else:
|
||||
params["since_secs"] = str(INITIAL_BACKLOG_SECONDS)
|
||||
|
||||
try:
|
||||
with httpx.Client(timeout=timeout_secs) as client:
|
||||
resp = client.get(url, params=params, headers=headers)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("inbox poller: GET /activity failed: %s", exc)
|
||||
return 0
|
||||
|
||||
if resp.status_code == 410:
|
||||
# Cursor pruned — drop back to the backlog window. The next
|
||||
# poll picks up wherever the activity API has rows now.
|
||||
logger.info(
|
||||
"inbox poller: cursor %s expired (410); resetting to since_secs=%d",
|
||||
cursor,
|
||||
INITIAL_BACKLOG_SECONDS,
|
||||
)
|
||||
state.reset_cursor(cursor_key)
|
||||
return 0
|
||||
|
||||
if resp.status_code >= 400:
|
||||
logger.warning(
|
||||
"inbox poller: HTTP %d from /activity: %s",
|
||||
resp.status_code,
|
||||
(resp.text or "")[:200],
|
||||
)
|
||||
return 0
|
||||
|
||||
try:
|
||||
rows = resp.json()
|
||||
except ValueError as exc:
|
||||
logger.warning("inbox poller: non-JSON response: %s", exc)
|
||||
return 0
|
||||
if not isinstance(rows, list):
|
||||
return 0
|
||||
|
||||
# since_id mode returns ASC (oldest first). since_secs mode returns
|
||||
# DESC; reverse so we record in chronological order and the cursor
|
||||
# we save is the freshest row.
|
||||
if cursor is None:
|
||||
rows = list(reversed(rows))
|
||||
|
||||
# Imported lazily at use-site so a runtime that never sees an
|
||||
# upload-receive row never imports the module. Cheap on the hot
|
||||
# path because Python caches the import.
|
||||
from inbox_uploads import is_chat_upload_row, BatchFetcher
|
||||
|
||||
new_count = 0
|
||||
last_id: str | None = None
|
||||
# ``batch_fetcher`` is lazy: a poll batch with no upload rows pays
|
||||
# zero overhead. Once the first upload row appears we open one
|
||||
# BatchFetcher and submit every subsequent upload row to its thread
|
||||
# pool; before processing the FIRST non-upload row we drain the
|
||||
# pool (wait_all) so the URI cache is hot when message rewriting
|
||||
# runs. Without the barrier, the chat message that references the
|
||||
# upload would arrive at the agent with the un-rewritten
|
||||
# platform-pending: URI.
|
||||
batch_fetcher: BatchFetcher | None = None
|
||||
|
||||
def _drain_uploads(bf: BatchFetcher | None) -> None:
|
||||
if bf is None:
|
||||
return
|
||||
bf.wait_all()
|
||||
bf.close()
|
||||
|
||||
for row in rows:
|
||||
if not isinstance(row, dict):
|
||||
continue
|
||||
if is_chat_upload_row(row):
|
||||
# Side-effect row from the platform's poll-mode chat-upload
|
||||
# handler — fetch the bytes, stage to /workspace/.molecule/
|
||||
# chat-uploads, ack. NOT enqueued as an InboxMessage; the
|
||||
# agent will see the chat message that REFERENCES this
|
||||
# upload via a separate (later) activity row, with the
|
||||
# pending: URI rewritten to a workspace: URI by
|
||||
# message_from_activity. We DO advance the cursor past
|
||||
# this row so a permanent network outage on /content
|
||||
# doesn't stall the cursor and block real chat traffic.
|
||||
if batch_fetcher is None:
|
||||
batch_fetcher = BatchFetcher(
|
||||
platform_url=platform_url,
|
||||
workspace_id=workspace_id,
|
||||
headers=headers,
|
||||
)
|
||||
batch_fetcher.submit(row)
|
||||
last_id = str(row.get("id", "")) or last_id
|
||||
continue
|
||||
# Non-upload row: drain any pending uploads first so the URI
|
||||
# cache is populated before we run rewrite_request_body /
|
||||
# message_from_activity on a row that may reference one.
|
||||
if batch_fetcher is not None:
|
||||
_drain_uploads(batch_fetcher)
|
||||
batch_fetcher = None
|
||||
if _is_self_notify_row(row):
|
||||
# The workspace-server's `/notify` handler writes the agent's
|
||||
# own send_message_to_user POSTs to activity_logs with
|
||||
# activity_type='a2a_receive', method='notify', and no
|
||||
# source_id, so the canvas chat-history loader can restore
|
||||
# those bubbles after a page reload (handlers/activity.go,
|
||||
# comment block at line 428). The activity API exposes that
|
||||
# filter only on type, so the same row otherwise lands in
|
||||
# this poll and gets pushed back to the agent — confirmed
|
||||
# live 2026-05-01: agent observed its own outbound as an
|
||||
# inbound `← molecule: Agent message: ...`. Filter here
|
||||
# belt-and-braces; the long-term fix is upstream renaming
|
||||
# the activity_type to `agent_outbound` (molecule-core
|
||||
# #2469). Once that lands, this filter becomes redundant
|
||||
# but stays in place because it only excludes rows we never
|
||||
# want, so removing it would just be churn.
|
||||
#
|
||||
# NB: still call save_cursor for these rows below — we
|
||||
# advance past them so the next poll doesn't keep re-seeing
|
||||
# the same self-notify on every iteration.
|
||||
last_id = str(row.get("id", "")) or last_id
|
||||
continue
|
||||
if _is_self_echo_row(row, workspace_id):
|
||||
# Internal #469: tool_delegate_task writes its own a2a_receive
|
||||
# row with source_id = this workspace's UUID (spoof-defense).
|
||||
# The poll fetches it back as kind='peer_agent', making the
|
||||
# workspace echo its own delegation-failure as an inbound from
|
||||
# a phantom peer. Skip it — the real delegation-result path
|
||||
# (delegate_result push) is separate and unaffected. Cursor
|
||||
# still advances so the next poll doesn't re-seen this row.
|
||||
last_id = str(row.get("id", "")) or last_id
|
||||
continue
|
||||
message = message_from_activity(row)
|
||||
if not message.activity_id:
|
||||
continue
|
||||
# Tag the message with the workspace it arrived on so the agent
|
||||
# (and tools like send_message_to_user) can route the reply to
|
||||
# the right tenant. Empty-string in single-workspace mode keeps
|
||||
# to_dict()'s output shape unchanged for back-compat consumers.
|
||||
message.arrival_workspace_id = workspace_id if cursor_key else ""
|
||||
state.record(message)
|
||||
last_id = message.activity_id
|
||||
new_count += 1
|
||||
|
||||
# Drain any uploads still in flight if the batch ended with upload
|
||||
# rows (no chat-message row to trigger the inline drain). Without
|
||||
# this, a future poll that picks up the chat-message row first
|
||||
# would race with the still-running fetches.
|
||||
if batch_fetcher is not None:
|
||||
_drain_uploads(batch_fetcher)
|
||||
|
||||
if last_id is not None:
|
||||
state.save_cursor(last_id, cursor_key)
|
||||
return new_count
|
||||
|
||||
|
||||
def _poll_loop(
|
||||
state: InboxState,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
interval: float = POLL_INTERVAL_SECONDS,
|
||||
stop_event: threading.Event | None = None,
|
||||
) -> None:
|
||||
"""Daemon-thread body: poll forever until stop_event fires.
|
||||
|
||||
auth_headers(workspace_id) is rebuilt every iteration so a token
|
||||
rotation via env var, .auth_token file, or per-workspace registry
|
||||
is picked up without a restart. Cheap (a dict + an env read).
|
||||
|
||||
Multi-workspace pollers pass the workspace_id so the per-workspace
|
||||
bearer token is selected from platform_auth's registry; single-
|
||||
workspace pollers fall through to the legacy resolution path
|
||||
(workspace_id arg is still passed but the registry lookup misses
|
||||
and auth_headers falls back to the cached/file/env token).
|
||||
"""
|
||||
from platform_auth import auth_headers
|
||||
|
||||
while True:
|
||||
try:
|
||||
_poll_once(state, platform_url, workspace_id, auth_headers(workspace_id))
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("inbox poller: iteration crashed: %s", exc)
|
||||
if stop_event is not None and stop_event.wait(interval):
|
||||
return
|
||||
if stop_event is None:
|
||||
time.sleep(interval)
|
||||
|
||||
|
||||
def start_poller_thread(
|
||||
state: InboxState,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
interval: float = POLL_INTERVAL_SECONDS,
|
||||
stop_event: threading.Event | None = None,
|
||||
) -> threading.Thread:
|
||||
"""Spawn the poller as a daemon thread. Returns the Thread handle.
|
||||
|
||||
daemon=True so the poller dies with the main process — same
|
||||
rationale as mcp_cli's heartbeat thread (no leaks, no stale
|
||||
workspace writes after the operator hits Ctrl-C).
|
||||
|
||||
Thread name embeds the workspace_id (truncated) so a multi-workspace
|
||||
operator running ``ps -eL`` or eyeballing ``threading.enumerate()``
|
||||
can tell which thread is which without reverse-engineering it from
|
||||
crash tracebacks.
|
||||
|
||||
Pass ``stop_event`` to enable graceful shutdown — used by tests so
|
||||
the daemon thread doesn't outlive the test that started it and race
|
||||
with later tests' httpx patches. Production code passes None and
|
||||
relies on the daemon flag for process-exit cleanup.
|
||||
"""
|
||||
name = "molecule-mcp-inbox-poller"
|
||||
if workspace_id:
|
||||
name = f"{name}-{workspace_id[:8]}"
|
||||
t = threading.Thread(
|
||||
target=_poll_loop,
|
||||
args=(state, platform_url, workspace_id, interval, stop_event),
|
||||
name=name,
|
||||
daemon=True,
|
||||
)
|
||||
t.start()
|
||||
return t
|
||||
|
||||
|
||||
def default_cursor_path(workspace_id: str = "") -> Path:
|
||||
"""Standard cursor location: ``<resolved configs dir>/.mcp_inbox_cursor``.
|
||||
|
||||
Resolved via configs_dir so the cursor lives next to .auth_token
|
||||
+ .platform_inbound_secret regardless of whether the runtime is
|
||||
in-container (/configs) or external (~/.molecule-workspace).
|
||||
|
||||
Multi-workspace operators pass ``workspace_id`` to get a unique
|
||||
cursor file per workspace (``.mcp_inbox_cursor_<wsid_short>``) so
|
||||
pollers don't trample each other's cursors. Single-workspace
|
||||
operators omit the arg and keep the legacy filename — back-compat
|
||||
with existing on-disk cursors.
|
||||
"""
|
||||
base = configs_dir.resolve() / ".mcp_inbox_cursor"
|
||||
if workspace_id:
|
||||
# 8-char prefix is enough to disambiguate two workspaces in the
|
||||
# same operator's setup (UUID v4 first 32 bits ≈ 4 billion of
|
||||
# entropy) without hash-bombing the filename.
|
||||
return base.with_name(f".mcp_inbox_cursor_{workspace_id[:8]}")
|
||||
return base
|
||||
@@ -1,733 +0,0 @@
|
||||
"""Poll-mode chat-upload fetcher + URI cache for the standalone path.
|
||||
|
||||
Companion to ``inbox.py``. When the workspace's inbox poller sees an
|
||||
``activity_logs`` row with ``method='chat_upload_receive'`` (written by
|
||||
the platform's ``uploadPollMode`` handler — workspace-server
|
||||
``internal/handlers/chat_files.go``), this module:
|
||||
|
||||
1. Pulls the bytes from
|
||||
``GET /workspaces/:id/pending-uploads/:file_id/content``.
|
||||
2. Writes them to ``/workspace/.molecule/chat-uploads/<prefix>-<name>``
|
||||
— same on-disk shape as the push-mode handler in
|
||||
``internal_chat_uploads.py``, so anything downstream that already
|
||||
resolves ``workspace:/workspace/.molecule/chat-uploads/...`` URIs
|
||||
works unchanged.
|
||||
3. POSTs ``/workspaces/:id/pending-uploads/:file_id/ack`` so Phase 3
|
||||
sweep can clean up the platform-side ``pending_uploads`` row.
|
||||
4. Records a ``platform-pending:<wsid>/<file_id> →
|
||||
workspace:/workspace/.molecule/chat-uploads/...`` mapping in a
|
||||
process-local cache so the chat message that arrives later
|
||||
(referencing the platform-pending URI) gets rewritten before the
|
||||
agent sees it.
|
||||
|
||||
URI rewrite ordering — the chat message containing the
|
||||
``platform-pending:`` URI is logged by the platform AFTER the
|
||||
``chat_upload_receive`` row, so the inbox poller sees the upload-receive
|
||||
row first (lower activity_logs.id) and stages the bytes before the chat
|
||||
message arrives in the same poll batch (or a later one). The URI cache
|
||||
is therefore populated before the message_from_activity path needs it.
|
||||
A miss (network race, restart with stale cursor) is handled by keeping
|
||||
the original ``platform-pending:`` URI in the rewritten body — the agent
|
||||
will see something it can't open, which is preferable to silently
|
||||
dropping the URI.
|
||||
|
||||
Auth — same Bearer token the inbox poller uses (``platform_auth.auth_headers``).
|
||||
Both endpoints are on the wsAuth-gated route, so this module can never
|
||||
read another tenant's bytes even if a token is misrouted.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import concurrent.futures
|
||||
import logging
|
||||
import mimetypes
|
||||
import os
|
||||
import re
|
||||
import secrets as pysecrets
|
||||
import threading
|
||||
from collections import OrderedDict
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Same on-disk root as internal_chat_uploads.CHAT_UPLOAD_DIR — keeping
|
||||
# these decoupled would let drift sneak in. Imported here rather than
|
||||
# from internal_chat_uploads to avoid pulling in starlette as a
|
||||
# transitive dep (this module runs in the standalone MCP path which
|
||||
# doesn't ship the in-container HTTP server).
|
||||
CHAT_UPLOAD_DIR = "/workspace/.molecule/chat-uploads"
|
||||
|
||||
# Per-file safety net. The platform enforces 100 MB on the staging side
|
||||
# (workspace-server migration 20260519200000_pending_uploads_bump_size_cap
|
||||
# + pendinguploads.MaxFileBytes — bumped from 25 MB per CTO directive
|
||||
# 2026-05-19 to match push-mode mc#1588), but a buggy or hostile
|
||||
# platform response shouldn't be able to fill the workspace's disk —
|
||||
# refuse to write more than this even if the response claims a larger
|
||||
# Content-Length.
|
||||
MAX_FILE_BYTES = 100 * 1024 * 1024
|
||||
|
||||
# Network deadline for the GET. Tuned for a 100 MB transfer over a
|
||||
# reasonable consumer link (~5 Mbps gives ~160s for the full payload),
|
||||
# plus headroom for TLS + platform auth. Scaled up from the original
|
||||
# 60s (sized for 25 MB) when the per-file cap moved to 100 MB — a fixed
|
||||
# 60s would fire BEFORE a legitimate slow uplink finished streaming, the
|
||||
# same wrong-reason failure mc#1588 fixed on the canvas side (forensic
|
||||
# a99ab0a1 reno-stars). Aligned with platform httpClient.Timeout (1200s
|
||||
# in chat_files.go after mc#1588) — laptop pull side gets a smaller
|
||||
# value because it's downstream of a fully-staged row, not a live
|
||||
# multipart parse.
|
||||
DEFAULT_FETCH_TIMEOUT = 240.0
|
||||
|
||||
# Concurrency cap for ``BatchFetcher``. Four workers is enough headroom
|
||||
# for the realistic "user dragged 3-4 files into chat at once" case
|
||||
# while bounding the platform's per-workspace fan-out. The cap matters
|
||||
# because the platform's /content endpoint reads bytea from Postgres in
|
||||
# a single round-trip per request — N workers = N concurrent DB reads
|
||||
# of up to 100 MB each (post-mc#1588 cap), so a higher cap could pressure
|
||||
# platform memory without much UX win (network bandwidth is the
|
||||
# bottleneck once the bytes are buffered).
|
||||
DEFAULT_BATCH_FETCH_WORKERS = 4
|
||||
|
||||
# Upper bound on how long ``BatchFetcher.wait_all`` blocks the inbox
|
||||
# poll loop before giving up on still-in-flight fetches. Aligned with
|
||||
# DEFAULT_FETCH_TIMEOUT so a single hung fetch can't stall the loop
|
||||
# longer than its own deadline. A timeout fires only if a worker thread
|
||||
# is stuck past the underlying httpx timeout — pathological case;
|
||||
# normal completion is bounded by per-fetch timeout × ceil(N/W).
|
||||
DEFAULT_BATCH_WAIT_TIMEOUT = DEFAULT_FETCH_TIMEOUT + 5.0
|
||||
|
||||
# Cap on the URI cache. A long-lived workspace handling thousands of
|
||||
# uploads shouldn't grow without bound; an LRU cap of 1024 keeps the
|
||||
# entries-needed-for-a-typical-conversation well within memory.
|
||||
URI_CACHE_MAX_ENTRIES = 1024
|
||||
|
||||
# Same character class as internal_chat_uploads — kept duplicated rather
|
||||
# than imported to avoid dragging starlette into the standalone path.
|
||||
_UNSAFE_FILENAME_CHARS = re.compile(r"[^a-zA-Z0-9._\-]")
|
||||
|
||||
|
||||
def sanitize_filename(name: str) -> str:
|
||||
"""Reduce a user-supplied filename to a safe form.
|
||||
|
||||
Mirrors ``internal_chat_uploads.sanitize_filename`` and the Go
|
||||
handler's ``SanitizeFilename`` — three-way parity is pinned by
|
||||
``workspace-server/internal/handlers/sanitize_filename_test.go`` and
|
||||
``workspace/tests/test_internal_chat_uploads.py`` so the URI shape
|
||||
is identical regardless of which path handles the upload.
|
||||
"""
|
||||
base = os.path.basename(name)
|
||||
base = base.replace(" ", "_")
|
||||
base = _UNSAFE_FILENAME_CHARS.sub("_", base)
|
||||
if len(base) > 100:
|
||||
ext = ""
|
||||
dot = base.rfind(".")
|
||||
if dot >= 0 and len(base) - dot <= 16:
|
||||
ext = base[dot:]
|
||||
base = base[: 100 - len(ext)] + ext
|
||||
if base in ("", ".", ".."):
|
||||
return "file"
|
||||
return base
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# URI cache — maps platform-pending URIs to local workspace: URIs
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class _URICache:
|
||||
"""Thread-safe bounded LRU mapping of platform-pending → workspace URIs.
|
||||
|
||||
Bounded so a workspace that runs for months and handles thousands of
|
||||
uploads doesn't accumulate entries forever. ``OrderedDict.move_to_end``
|
||||
promotes recently-used entries; eviction takes the oldest.
|
||||
|
||||
The cache is intentionally per-process — there is no persistence
|
||||
across a workspace restart. A restart with a stale inbox cursor that
|
||||
re-poll an upload-receive row will re-fetch (the bytes are already
|
||||
on disk from the prior session — see ``stage_to_disk``'s O_EXCL
|
||||
handling) and re-register; a chat message that referenced the
|
||||
platform-pending URI BEFORE the restart and arrives AFTER would miss
|
||||
the rewrite and surface the platform-pending URI to the agent. That
|
||||
is preferable to a stale persisted mapping that points at a deleted
|
||||
file.
|
||||
"""
|
||||
|
||||
def __init__(self, max_entries: int = URI_CACHE_MAX_ENTRIES):
|
||||
self._max = max_entries
|
||||
self._lock = threading.Lock()
|
||||
self._entries: "OrderedDict[str, str]" = OrderedDict()
|
||||
|
||||
def get(self, pending_uri: str) -> str | None:
|
||||
with self._lock:
|
||||
local = self._entries.get(pending_uri)
|
||||
if local is not None:
|
||||
self._entries.move_to_end(pending_uri)
|
||||
return local
|
||||
|
||||
def set(self, pending_uri: str, local_uri: str) -> None:
|
||||
with self._lock:
|
||||
self._entries[pending_uri] = local_uri
|
||||
self._entries.move_to_end(pending_uri)
|
||||
while len(self._entries) > self._max:
|
||||
self._entries.popitem(last=False)
|
||||
|
||||
def __len__(self) -> int:
|
||||
with self._lock:
|
||||
return len(self._entries)
|
||||
|
||||
def clear(self) -> None:
|
||||
with self._lock:
|
||||
self._entries.clear()
|
||||
|
||||
|
||||
_cache = _URICache()
|
||||
|
||||
|
||||
def get_cache() -> _URICache:
|
||||
"""Expose the module-singleton cache for tests and the rewrite path."""
|
||||
return _cache
|
||||
|
||||
|
||||
def resolve_pending_uri(uri: str) -> str | None:
|
||||
"""Return the local ``workspace:`` URI for a ``platform-pending:`` URI,
|
||||
or None if not yet staged. Convenience for callers that want to
|
||||
fall back to an on-demand fetch — pass the result through to
|
||||
``executor_helpers.resolve_attachment_uri``.
|
||||
"""
|
||||
return _cache.get(uri)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# On-disk staging
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _open_safe(path: str) -> int:
|
||||
"""Open ``path`` for write with ``O_CREAT|O_EXCL|O_NOFOLLOW``.
|
||||
|
||||
Same shape as ``internal_chat_uploads._open_safe`` — refuses to
|
||||
follow a pre-existing symlink at the target and refuses to overwrite
|
||||
an existing regular file. The 16-byte random prefix makes a name
|
||||
collision astronomical, but defense-in-depth costs nothing.
|
||||
"""
|
||||
flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
|
||||
if hasattr(os, "O_NOFOLLOW"):
|
||||
flags |= os.O_NOFOLLOW
|
||||
return os.open(path, flags, 0o600)
|
||||
|
||||
|
||||
def stage_to_disk(content: bytes, filename: str) -> str:
|
||||
"""Write ``content`` under ``CHAT_UPLOAD_DIR`` and return the local URI.
|
||||
|
||||
Returns ``workspace:/workspace/.molecule/chat-uploads/<prefix>-<sanitized>``.
|
||||
The 32-hex prefix makes the on-disk name unguessable to anything
|
||||
that didn't see the response, so even if a stale agent has a guess
|
||||
at the original filename it can't construct a URL to a sibling's
|
||||
upload.
|
||||
|
||||
Raises:
|
||||
OSError: write failure (mkdir, open, or write). Caller is
|
||||
expected to log + skip; the activity row stays unacked so a
|
||||
future poll re-tries.
|
||||
ValueError: ``content`` exceeds ``MAX_FILE_BYTES``. Pre-staging
|
||||
guard belt-and-braces above the platform's same-side cap.
|
||||
"""
|
||||
if len(content) > MAX_FILE_BYTES:
|
||||
raise ValueError(
|
||||
f"content size {len(content)} exceeds workspace cap {MAX_FILE_BYTES}"
|
||||
)
|
||||
|
||||
Path(CHAT_UPLOAD_DIR).mkdir(parents=True, exist_ok=True)
|
||||
|
||||
sanitized = sanitize_filename(filename)
|
||||
prefix = pysecrets.token_hex(16)
|
||||
stored = f"{prefix}-{sanitized}"
|
||||
target = os.path.join(CHAT_UPLOAD_DIR, stored)
|
||||
|
||||
fd = _open_safe(target)
|
||||
try:
|
||||
with os.fdopen(fd, "wb") as f:
|
||||
f.write(content)
|
||||
except OSError:
|
||||
# Best-effort cleanup — partial writes leave a stub file that
|
||||
# would mask a future retry's success otherwise.
|
||||
try:
|
||||
os.unlink(target)
|
||||
except OSError:
|
||||
pass
|
||||
raise
|
||||
|
||||
return f"workspace:{CHAT_UPLOAD_DIR}/{stored}"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Activity row → fetch/stage/ack flow
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _request_body_dict(row: dict[str, Any]) -> dict[str, Any] | None:
|
||||
"""Coerce ``row['request_body']`` into a dict.
|
||||
|
||||
The /activity API returns request_body as JSON (already-deserialized
|
||||
by httpx). Some legacy paths or mocked transports may emit a string;
|
||||
handle defensively rather than raising.
|
||||
"""
|
||||
body = row.get("request_body")
|
||||
if isinstance(body, dict):
|
||||
return body
|
||||
if isinstance(body, str):
|
||||
import json
|
||||
try:
|
||||
decoded = json.loads(body)
|
||||
except (TypeError, ValueError):
|
||||
return None
|
||||
return decoded if isinstance(decoded, dict) else None
|
||||
return None
|
||||
|
||||
|
||||
def is_chat_upload_row(row: dict[str, Any]) -> bool:
|
||||
"""True if ``row`` is the platform's chat-upload-receive activity.
|
||||
|
||||
Used by the inbox poller to fork the row off the regular A2A
|
||||
message handling path — this row is not a peer message; it's an
|
||||
instruction to fetch + stage bytes. Match on ``method`` only;
|
||||
``activity_type`` is already filtered to ``a2a_receive`` upstream.
|
||||
"""
|
||||
return row.get("method") == "chat_upload_receive"
|
||||
|
||||
|
||||
def fetch_and_stage(
|
||||
row: dict[str, Any],
|
||||
*,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
headers: dict[str, str],
|
||||
timeout_secs: float = DEFAULT_FETCH_TIMEOUT,
|
||||
client: Any = None,
|
||||
) -> str | None:
|
||||
"""Fetch the row's bytes, stage them under chat-uploads, and ack.
|
||||
|
||||
Returns the local ``workspace:`` URI on success, or ``None`` if any
|
||||
step failed (logged with enough detail to triage). Failure leaves
|
||||
the platform-side row unacked, so a subsequent poll retries — the
|
||||
activity row stays in the cursor's window because we DO advance the
|
||||
cursor (the row is "handled" from the inbox's perspective even on
|
||||
fetch failure; otherwise a permanent network outage would stall the
|
||||
cursor and block real chat traffic).
|
||||
|
||||
On success, the URI cache is updated so a subsequent chat message
|
||||
referencing the same ``platform-pending:`` URI is rewritten before
|
||||
the agent sees it.
|
||||
|
||||
Pass ``client`` to reuse a shared ``httpx.Client`` for both GET and
|
||||
POST ack (saves one TLS handshake per row vs. constructing one
|
||||
per-call). ``BatchFetcher`` does this across an entire poll batch so
|
||||
N concurrent fetches share one connection pool.
|
||||
"""
|
||||
body = _request_body_dict(row)
|
||||
if body is None:
|
||||
logger.warning(
|
||||
"inbox_uploads: row %s missing request_body; cannot fetch",
|
||||
row.get("id"),
|
||||
)
|
||||
return None
|
||||
|
||||
file_id = body.get("file_id")
|
||||
if not isinstance(file_id, str) or not file_id:
|
||||
logger.warning(
|
||||
"inbox_uploads: row %s has no file_id in request_body",
|
||||
row.get("id"),
|
||||
)
|
||||
return None
|
||||
|
||||
pending_uri = body.get("uri")
|
||||
if not isinstance(pending_uri, str) or not pending_uri:
|
||||
# Reconstruct what the platform would have written — defensive
|
||||
# against a row whose uri field got truncated. Same shape as the
|
||||
# Go handler's URI builder.
|
||||
pending_uri = f"platform-pending:{workspace_id}/{file_id}"
|
||||
|
||||
filename = body.get("name") or "file"
|
||||
if not isinstance(filename, str):
|
||||
filename = "file"
|
||||
|
||||
# Caller-supplied client: reuse for both GET + POST ack. Otherwise
|
||||
# build a one-shot client and close it on the way out. Lazy httpx
|
||||
# import keeps the standalone MCP path's optional dep optional.
|
||||
own_client = client is None
|
||||
if own_client:
|
||||
try:
|
||||
import httpx # noqa: WPS433
|
||||
except ImportError:
|
||||
logger.error("inbox_uploads: httpx not installed; cannot fetch %s", file_id)
|
||||
return None
|
||||
client = httpx.Client(timeout=timeout_secs)
|
||||
|
||||
try:
|
||||
return _fetch_and_stage_with_client(
|
||||
client,
|
||||
platform_url=platform_url,
|
||||
workspace_id=workspace_id,
|
||||
headers=headers,
|
||||
file_id=file_id,
|
||||
pending_uri=pending_uri,
|
||||
filename=filename,
|
||||
body=body,
|
||||
)
|
||||
finally:
|
||||
if own_client:
|
||||
try:
|
||||
client.close()
|
||||
except Exception: # noqa: BLE001 — close should never crash the caller
|
||||
pass
|
||||
|
||||
|
||||
def _fetch_and_stage_with_client(
|
||||
client: Any,
|
||||
*,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
headers: dict[str, str],
|
||||
file_id: str,
|
||||
pending_uri: str,
|
||||
filename: str,
|
||||
body: dict[str, Any],
|
||||
) -> str | None:
|
||||
"""Inner body of fetch_and_stage. Always uses the supplied client for
|
||||
both GET and POST so the connection pool is shared across the call.
|
||||
"""
|
||||
content_url = f"{platform_url}/workspaces/{workspace_id}/pending-uploads/{file_id}/content"
|
||||
ack_url = f"{platform_url}/workspaces/{workspace_id}/pending-uploads/{file_id}/ack"
|
||||
|
||||
try:
|
||||
resp = client.get(content_url, headers=headers)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("inbox_uploads: GET %s failed: %s", content_url, exc)
|
||||
return None
|
||||
|
||||
if resp.status_code == 404:
|
||||
# Row was swept or already acked by a previous poll race — nothing
|
||||
# to fetch. Don't ack again; the platform's GC handles it. This is
|
||||
# a soft-skip, not an error — log at INFO so triage isn't noisy.
|
||||
logger.info(
|
||||
"inbox_uploads: pending upload %s already gone (404); skipping",
|
||||
file_id,
|
||||
)
|
||||
return None
|
||||
if resp.status_code >= 400:
|
||||
logger.warning(
|
||||
"inbox_uploads: GET %s returned %d: %s",
|
||||
content_url,
|
||||
resp.status_code,
|
||||
(resp.text or "")[:200],
|
||||
)
|
||||
return None
|
||||
|
||||
content = resp.content or b""
|
||||
if len(content) > MAX_FILE_BYTES:
|
||||
logger.warning(
|
||||
"inbox_uploads: refusing to stage %s — size %d exceeds cap %d",
|
||||
file_id,
|
||||
len(content),
|
||||
MAX_FILE_BYTES,
|
||||
)
|
||||
return None
|
||||
|
||||
# Mimetype precedence: platform's Content-Type header → request_body
|
||||
# mimeType field → extension guess. Same precedence as the in-
|
||||
# container ingest handler.
|
||||
mime_header = resp.headers.get("content-type", "").split(";")[0].strip()
|
||||
mime = (
|
||||
mime_header
|
||||
or (body.get("mimeType") if isinstance(body.get("mimeType"), str) else "")
|
||||
or (mimetypes.guess_type(filename)[0] or "")
|
||||
)
|
||||
|
||||
try:
|
||||
local_uri = stage_to_disk(content, filename)
|
||||
except (OSError, ValueError) as exc:
|
||||
logger.error(
|
||||
"inbox_uploads: failed to stage %s (%s) to disk: %s",
|
||||
file_id,
|
||||
filename,
|
||||
exc,
|
||||
)
|
||||
return None
|
||||
|
||||
_cache.set(pending_uri, local_uri)
|
||||
logger.info(
|
||||
"inbox_uploads: staged file_id=%s name=%s size=%d mime=%s pending_uri=%s local_uri=%s",
|
||||
file_id,
|
||||
filename,
|
||||
len(content),
|
||||
mime,
|
||||
pending_uri,
|
||||
local_uri,
|
||||
)
|
||||
|
||||
# Ack last so a write failure above leaves the row available for a
|
||||
# retry on the next poll. A failed ack is logged but doesn't roll
|
||||
# back the on-disk file — the platform's sweep will clean up
|
||||
# eventually.
|
||||
try:
|
||||
ack_resp = client.post(ack_url, headers=headers)
|
||||
if ack_resp.status_code >= 400:
|
||||
logger.warning(
|
||||
"inbox_uploads: ack %s returned %d: %s",
|
||||
ack_url,
|
||||
ack_resp.status_code,
|
||||
(ack_resp.text or "")[:200],
|
||||
)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("inbox_uploads: POST %s failed: %s", ack_url, exc)
|
||||
|
||||
return local_uri
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# BatchFetcher — concurrent fetch across a single poll batch
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class BatchFetcher:
|
||||
"""Fetch + stage + ack a batch of upload-receive rows concurrently.
|
||||
|
||||
Why this exists: the inbox poll loop used to call ``fetch_and_stage``
|
||||
serially per row. With N upload rows in a batch (a user dragging
|
||||
multiple files into chat at once), the loop blocked for
|
||||
``N × per_fetch_latency`` before processing the chat message that
|
||||
referenced them — a 4-file upload at 5s each = 20s of stall
|
||||
before the agent saw the user's prompt. ``BatchFetcher`` runs the
|
||||
fetches on a small thread pool (default 4 workers) so the stall is
|
||||
bounded by ``ceil(N/W) × per_fetch_latency`` instead.
|
||||
|
||||
Connection reuse: one ``httpx.Client`` is shared across every fetch
|
||||
in the batch. httpx clients carry a connection pool, so a second
|
||||
fetch to the same platform host reuses the TCP+TLS handshake from
|
||||
the first — measurable win when fetches happen back-to-back.
|
||||
|
||||
Correctness invariant the caller MUST preserve: the inbox loop is
|
||||
expected to call ``wait_all()`` before processing the chat-message
|
||||
activity row that REFERENCES one of these uploads. Without the
|
||||
barrier, the URI cache is empty when ``rewrite_request_body`` runs
|
||||
and the agent sees the un-rewritten ``platform-pending:`` URI. The
|
||||
caller-side test ``test_poll_once_waits_for_uploads_before_messages``
|
||||
pins this end-to-end.
|
||||
|
||||
Use as a context manager so the executor + client are torn down
|
||||
even if the caller raises mid-batch.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
headers: dict[str, str],
|
||||
timeout_secs: float = DEFAULT_FETCH_TIMEOUT,
|
||||
max_workers: int = DEFAULT_BATCH_FETCH_WORKERS,
|
||||
client: Any = None,
|
||||
):
|
||||
self._platform_url = platform_url
|
||||
self._workspace_id = workspace_id
|
||||
self._headers = dict(headers) # copy so caller mutations don't leak in
|
||||
self._timeout_secs = timeout_secs
|
||||
|
||||
# Caller can inject a client (tests do this); production callers
|
||||
# let us build one. Track ownership so we only close ours.
|
||||
self._own_client = client is None
|
||||
if self._own_client:
|
||||
try:
|
||||
import httpx # noqa: WPS433
|
||||
except ImportError:
|
||||
# Match fetch_and_stage's behavior: log + degrade rather
|
||||
# than raising at construction time. submit() will then
|
||||
# return None for every row.
|
||||
logger.error("inbox_uploads: httpx not installed; BatchFetcher inert")
|
||||
self._client: Any = None
|
||||
else:
|
||||
self._client = httpx.Client(timeout=timeout_secs)
|
||||
else:
|
||||
self._client = client
|
||||
|
||||
self._executor = concurrent.futures.ThreadPoolExecutor(
|
||||
max_workers=max_workers,
|
||||
thread_name_prefix="upload-fetch",
|
||||
)
|
||||
self._futures: list[concurrent.futures.Future[Any]] = []
|
||||
self._closed = False
|
||||
# Flipped to True by wait_all when the timeout fires; close()
|
||||
# reads this to decide between drain-and-wait vs cancel-queued.
|
||||
self._timed_out = False
|
||||
|
||||
def submit(self, row: dict[str, Any]) -> concurrent.futures.Future[Any] | None:
|
||||
"""Submit ``row`` for fetch + stage + ack. Non-blocking — the
|
||||
worker thread runs ``fetch_and_stage`` with the shared client.
|
||||
|
||||
Returns the Future so a caller that wants per-row outcome can
|
||||
await it; ``None`` if the BatchFetcher is in a degraded state
|
||||
(httpx missing).
|
||||
"""
|
||||
if self._closed:
|
||||
raise RuntimeError("BatchFetcher: submit after close")
|
||||
if self._client is None:
|
||||
return None
|
||||
fut = self._executor.submit(
|
||||
fetch_and_stage,
|
||||
row,
|
||||
platform_url=self._platform_url,
|
||||
workspace_id=self._workspace_id,
|
||||
headers=self._headers,
|
||||
timeout_secs=self._timeout_secs,
|
||||
client=self._client,
|
||||
)
|
||||
self._futures.append(fut)
|
||||
return fut
|
||||
|
||||
def wait_all(self, timeout: float | None = DEFAULT_BATCH_WAIT_TIMEOUT) -> None:
|
||||
"""Block until every submitted future completes (or times out).
|
||||
|
||||
Per-future exceptions are logged + swallowed — ``fetch_and_stage``
|
||||
already converts every error path to ``return None``, so a real
|
||||
exception propagating up to here is unexpected and we don't want
|
||||
one bad fetch to abort the whole batch.
|
||||
|
||||
Timeouts are also logged + swallowed AND record the timed-out
|
||||
futures on ``self._timed_out`` so ``close`` can cancel them
|
||||
without paying their full latency. Without this hand-off,
|
||||
``close()``'s ``shutdown(wait=True)`` would block on the leaked
|
||||
workers and undo the user-facing timeout — the inbox poll loop
|
||||
would stall indefinitely on a hung /content fetch.
|
||||
"""
|
||||
if not self._futures:
|
||||
return
|
||||
try:
|
||||
done, not_done = concurrent.futures.wait(
|
||||
self._futures,
|
||||
timeout=timeout,
|
||||
return_when=concurrent.futures.ALL_COMPLETED,
|
||||
)
|
||||
except Exception as exc: # noqa: BLE001 — concurrent.futures shouldn't raise here
|
||||
logger.warning("inbox_uploads: BatchFetcher.wait_all crashed: %s", exc)
|
||||
return
|
||||
for fut in done:
|
||||
exc = fut.exception()
|
||||
if exc is not None:
|
||||
logger.warning(
|
||||
"inbox_uploads: BatchFetcher worker raised: %s", exc
|
||||
)
|
||||
if not_done:
|
||||
logger.warning(
|
||||
"inbox_uploads: BatchFetcher.wait_all left %d in-flight after %ss timeout",
|
||||
len(not_done),
|
||||
timeout,
|
||||
)
|
||||
# Mark these futures so close() knows to cancel-not-wait. We
|
||||
# cancel queued-but-not-started ones immediately; futures
|
||||
# already running can't be cancelled (Python's threading
|
||||
# model), but close() will pass cancel_futures=True so any
|
||||
# remaining queued items don't run.
|
||||
for fut in not_done:
|
||||
fut.cancel()
|
||||
self._timed_out = True
|
||||
|
||||
def close(self) -> None:
|
||||
"""Tear down the executor + (if owned) the httpx client.
|
||||
|
||||
Idempotent. After close, ``submit`` raises and the BatchFetcher
|
||||
cannot be reused — construct a fresh one for the next poll.
|
||||
|
||||
If ``wait_all`` reported a timeout, shutdown skips the
|
||||
``wait=True`` drain and instead asks the executor to drop queued
|
||||
futures (``cancel_futures=True``). Currently-running workers
|
||||
can't be interrupted by Python's threading model, but the poll
|
||||
loop returns immediately rather than blocking on a hung fetch.
|
||||
"""
|
||||
if self._closed:
|
||||
return
|
||||
self._closed = True
|
||||
timed_out = getattr(self, "_timed_out", False)
|
||||
try:
|
||||
if timed_out:
|
||||
# cancel_futures landed in Python 3.9 — guarded for older
|
||||
# interpreters via a TypeError fallback. Drop queued
|
||||
# tasks; running ones will exit when their httpx call
|
||||
# eventually returns or the daemon thread dies.
|
||||
try:
|
||||
self._executor.shutdown(wait=False, cancel_futures=True)
|
||||
except TypeError:
|
||||
self._executor.shutdown(wait=False)
|
||||
else:
|
||||
# Healthy path: wait for in-flight work so we don't
|
||||
# interrupt a fetch mid-write.
|
||||
self._executor.shutdown(wait=True)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("inbox_uploads: executor shutdown error: %s", exc)
|
||||
if self._own_client and self._client is not None:
|
||||
try:
|
||||
self._client.close()
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("inbox_uploads: client close error: %s", exc)
|
||||
|
||||
def __enter__(self) -> "BatchFetcher":
|
||||
return self
|
||||
|
||||
def __exit__(self, exc_type, exc, tb) -> None:
|
||||
self.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# URI rewrite for incoming chat messages
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# The chat message that references a staged upload arrives as a
|
||||
# SEPARATE activity_log row, with parts of kind=file containing
|
||||
# platform-pending: URIs in the file.uri field. Walk the structure
|
||||
# in-place and rewrite to the local workspace: URI when the cache has it.
|
||||
# Unknown URIs pass through unchanged — the agent gets to choose how
|
||||
# to react (most runtimes log + ignore an unresolvable URI).
|
||||
|
||||
|
||||
def _rewrite_part(part: Any) -> None:
|
||||
"""Mutate a single A2A Part dict to swap platform-pending: URIs."""
|
||||
if not isinstance(part, dict):
|
||||
return
|
||||
file_obj = part.get("file")
|
||||
if not isinstance(file_obj, dict):
|
||||
return
|
||||
uri = file_obj.get("uri")
|
||||
if not isinstance(uri, str) or not uri.startswith("platform-pending:"):
|
||||
return
|
||||
rewritten = _cache.get(uri)
|
||||
if rewritten:
|
||||
file_obj["uri"] = rewritten
|
||||
|
||||
|
||||
def rewrite_request_body(body: Any) -> None:
|
||||
"""Mutate ``body`` in-place, replacing platform-pending: URIs with
|
||||
the cached local equivalents.
|
||||
|
||||
Walks the same shapes ``inbox._extract_text`` accepts:
|
||||
|
||||
- ``body['parts']``
|
||||
- ``body['params']['parts']``
|
||||
- ``body['params']['message']['parts']``
|
||||
|
||||
No-op for shapes that don't match — the message simply passes
|
||||
through to the agent as-is.
|
||||
"""
|
||||
if not isinstance(body, dict):
|
||||
return
|
||||
candidates: list[Any] = []
|
||||
params = body.get("params") if isinstance(body.get("params"), dict) else None
|
||||
if params:
|
||||
message = params.get("message") if isinstance(params.get("message"), dict) else None
|
||||
if message:
|
||||
candidates.append(message.get("parts"))
|
||||
candidates.append(params.get("parts"))
|
||||
candidates.append(body.get("parts"))
|
||||
|
||||
for parts in candidates:
|
||||
if isinstance(parts, list):
|
||||
for part in parts:
|
||||
_rewrite_part(part)
|
||||
@@ -1,51 +0,0 @@
|
||||
"""Helpers for the workspace's one-shot initial_prompt.
|
||||
|
||||
Kept as a standalone module (no heavy imports like uvicorn) so the marker
|
||||
logic is unit-testable without standing up the full workspace runtime.
|
||||
|
||||
Background: the workspace runtime supports an `initial_prompt` that runs once
|
||||
on first boot (clone the repo, set git hooks, read CLAUDE.md, commit_memory).
|
||||
A marker file `.initial_prompt_done` prevents the prompt from re-running on
|
||||
subsequent boots.
|
||||
|
||||
Prior behaviour wrote the marker AFTER the prompt completed successfully. If
|
||||
the prompt crashed mid-execution (e.g. ProcessError from a stale Claude
|
||||
session), the marker was never written; every subsequent container boot
|
||||
replayed the same failing prompt, cascading into "every message crashes until
|
||||
an operator intervenes." See GitHub issue #71.
|
||||
|
||||
Fix (2026-04-12): write the marker BEFORE firing the prompt. If the prompt
|
||||
fails, operators re-send it manually via chat — cheap and available — instead
|
||||
of trapping the workspace in a crash loop.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
|
||||
|
||||
def resolve_initial_prompt_marker(config_path: str) -> str:
|
||||
"""Return the path where the `.initial_prompt_done` marker should live.
|
||||
|
||||
Prefers ``<config_path>/.initial_prompt_done`` when the directory is
|
||||
writable; falls back to ``/workspace/.initial_prompt_done`` for containers
|
||||
where ``/configs`` is read-only.
|
||||
"""
|
||||
if os.access(config_path, os.W_OK):
|
||||
return os.path.join(config_path, ".initial_prompt_done")
|
||||
return "/workspace/.initial_prompt_done"
|
||||
|
||||
|
||||
def mark_initial_prompt_attempted(marker_path: str) -> bool:
|
||||
"""Write the marker best-effort. Return True on success, False on I/O error.
|
||||
|
||||
Called BEFORE the initial-prompt self-message is sent. If the attempt
|
||||
later fails, the marker is still present — so the next container boot
|
||||
does NOT replay the same failing prompt. Operators retry manually via
|
||||
the chat interface instead of relying on auto-replay.
|
||||
"""
|
||||
try:
|
||||
with open(marker_path, "w") as f:
|
||||
f.write("attempted")
|
||||
return True
|
||||
except OSError:
|
||||
return False
|
||||
@@ -1,287 +0,0 @@
|
||||
"""POST /internal/chat/uploads/ingest — workspace-side chat upload sink.
|
||||
|
||||
Replaces the Docker-exec / tar-copy path the platform-side workspace-server
|
||||
used historically (see RFC #2312). The platform forwards the multipart
|
||||
request to this handler with a Bearer header carrying the workspace's
|
||||
inbound secret; this handler validates, writes each file under
|
||||
``/workspace/.molecule/chat-uploads/<random>-<sanitized-name>``, and
|
||||
returns the same ``ChatUploadedFile`` shape the platform Go handler
|
||||
returned previously, so callers (canvas, molecli, A2A tools) see no
|
||||
contract change.
|
||||
|
||||
Why no platform-side Docker-exec equivalent here:
|
||||
The handler runs INSIDE the workspace container, which already has
|
||||
direct filesystem access to /workspace. mkdir + open + write is
|
||||
enough — no archive ceremony, no remote-exec round-trip, no
|
||||
docker socket dependency. Same code path on local Docker and SaaS
|
||||
EC2; the bug behind #2308 (platform's findContainer is nil in
|
||||
SaaS) cannot exist here by construction.
|
||||
|
||||
Path safety:
|
||||
sanitize_filename strips everything outside [A-Za-z0-9._-], collapses
|
||||
spaces, refuses ``""``/`"."`/`".."`, and caps length at 100 chars
|
||||
(preserving extension if ≤16 chars). Files are written with
|
||||
O_CREAT|O_EXCL|O_NOFOLLOW so a pre-existing symlink at the target
|
||||
cannot redirect the write to /etc/* or any sensitive location, and
|
||||
a colliding name fails fast (the random prefix already makes
|
||||
collisions astronomical, but defense-in-depth costs nothing).
|
||||
|
||||
Limits (matches the Go contract from chat_files.go):
|
||||
- 100 MB total request body
|
||||
- 100 MB per file
|
||||
- filename truncated to 100 chars
|
||||
|
||||
Response shape:
|
||||
{"files": [
|
||||
{"uri": "workspace:/workspace/.molecule/chat-uploads/<id>-<name>",
|
||||
"name": "<sanitized name>",
|
||||
"mimeType": "<content-type or guessed>",
|
||||
"size": <bytes>}
|
||||
]}
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import mimetypes
|
||||
import os
|
||||
import re
|
||||
import secrets as pysecrets
|
||||
from pathlib import Path
|
||||
|
||||
from starlette.requests import Request
|
||||
from starlette.responses import JSONResponse
|
||||
|
||||
from platform_inbound_auth import get_inbound_secret, inbound_authorized
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# In-container destination — must match the platform-side Go constant
|
||||
# `chatUploadDir` so the URI scheme stays identical and existing canvas
|
||||
# / agent code that resolves "workspace:/workspace/.molecule/chat-uploads/*"
|
||||
# keeps working unchanged.
|
||||
CHAT_UPLOAD_DIR = "/workspace/.molecule/chat-uploads"
|
||||
|
||||
# Total-request body cap. multipart/form-data with multiple parts can
|
||||
# add ~100 bytes of framing per file; the cap is the bytes hitting the
|
||||
# socket, including framing.
|
||||
#
|
||||
# SERVER_MIRROR: keep aligned with workspace-server/internal/handlers/
|
||||
# chat_files.go chatUploadMaxBytes AND canvas/src/components/tabs/chat/
|
||||
# uploads.ts MAX_UPLOAD_BYTES. Three constants exist (platform Go +
|
||||
# workspace Python + canvas TS) because each layer must enforce or
|
||||
# pre-flight the cap on its own; an SSOT follow-up tracked in
|
||||
# molecule-ai/internal would expose the cap via GET /uploads/limits.
|
||||
CHAT_UPLOAD_MAX_BYTES = 100 * 1024 * 1024 # 100 MB
|
||||
|
||||
# Per-file cap. Aligned with the total at 100 MB so a single legitimate
|
||||
# large file (e.g. a 70 MB PDF — reno-stars 2026-05-19 forensic
|
||||
# a99ab0a1) succeeds end-to-end; batched small attachments still fit
|
||||
# under the same ceiling.
|
||||
CHAT_UPLOAD_MAX_FILE_BYTES = 100 * 1024 * 1024 # 100 MB
|
||||
|
||||
# Conservative {alnum, dot, underscore, dash} character class — anything
|
||||
# outside gets rewritten so embedded paths, control chars, newlines,
|
||||
# quotes, and shell metachars never reach the filesystem.
|
||||
_UNSAFE_FILENAME_CHARS = re.compile(r"[^a-zA-Z0-9._\-]")
|
||||
|
||||
|
||||
def sanitize_filename(name: str) -> str:
|
||||
"""Reduce a user-supplied filename to a safe form.
|
||||
|
||||
Mirrors workspace-server/internal/handlers/chat_files.go::sanitizeFilename
|
||||
so canvas-emitted URIs stay identical regardless of which path
|
||||
handles the upload.
|
||||
"""
|
||||
base = os.path.basename(name)
|
||||
base = base.replace(" ", "_")
|
||||
base = _UNSAFE_FILENAME_CHARS.sub("_", base)
|
||||
if len(base) > 100:
|
||||
ext = ""
|
||||
dot = base.rfind(".")
|
||||
if dot >= 0 and len(base) - dot <= 16:
|
||||
ext = base[dot:]
|
||||
base = base[: 100 - len(ext)] + ext
|
||||
if base in ("", ".", ".."):
|
||||
return "file"
|
||||
return base
|
||||
|
||||
|
||||
def _open_safe(path: str) -> int:
|
||||
"""Open `path` for write with O_CREAT|O_EXCL|O_NOFOLLOW.
|
||||
|
||||
Refuses to follow a pre-existing symlink at the target, and refuses
|
||||
to overwrite an existing regular file. Both protections close the
|
||||
same class of attack: a process inside the workspace container that
|
||||
raced to create a symlink at the destination before the upload landed.
|
||||
The random 16-byte prefix on the stored name makes the race
|
||||
effectively impossible, but defense-in-depth costs nothing here.
|
||||
"""
|
||||
flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
|
||||
# O_NOFOLLOW is POSIX; refuses to open if the path is a symlink.
|
||||
if hasattr(os, "O_NOFOLLOW"):
|
||||
flags |= os.O_NOFOLLOW
|
||||
return os.open(path, flags, 0o600)
|
||||
|
||||
|
||||
async def ingest_handler(request: Request) -> JSONResponse:
|
||||
"""POST /internal/chat/uploads/ingest — Starlette route handler.
|
||||
|
||||
Auth: Bearer <platform_inbound_secret>; fail-closed when the secret
|
||||
file is missing or empty.
|
||||
|
||||
Body: multipart/form-data with one or more `files` parts.
|
||||
|
||||
Returns 200 with the list of stored URIs on success, or one of:
|
||||
401 unauthorized — bad / missing bearer
|
||||
400 bad request — malformed multipart, no files field, etc.
|
||||
413 payload too large — total body or per-file over cap
|
||||
500 internal — disk write failed
|
||||
"""
|
||||
if not inbound_authorized(get_inbound_secret(), request.headers.get("Authorization", "")):
|
||||
return JSONResponse({"error": "unauthorized"}, status_code=401)
|
||||
|
||||
# Total-body guard. Starlette won't enforce this for us; we read
|
||||
# Content-Length first and reject early to avoid streaming a 5 GB
|
||||
# request through the multipart parser only to bail at the end.
|
||||
cl_str = request.headers.get("Content-Length", "")
|
||||
if cl_str:
|
||||
try:
|
||||
cl = int(cl_str)
|
||||
except ValueError:
|
||||
cl = -1
|
||||
if cl > CHAT_UPLOAD_MAX_BYTES:
|
||||
return JSONResponse(
|
||||
{"error": f"request body exceeds total limit ({CHAT_UPLOAD_MAX_BYTES // (1024*1024)} MB)"},
|
||||
status_code=413,
|
||||
)
|
||||
|
||||
try:
|
||||
form = await request.form(max_files=64, max_fields=32)
|
||||
except Exception as exc: # multipart parse error
|
||||
# Surface exc.class + str(exc) to the caller. Prior behavior returned
|
||||
# only the opaque {"error": "failed to parse multipart form"}, which
|
||||
# took ~25 min to root-cause in forensic a78762a0 (Hermes workspace
|
||||
# PDF upload, 2026-05-19) — the underlying cause was a MISSING
|
||||
# python-multipart dep, surfaced as an AssertionError from Starlette's
|
||||
# parser. Surfacing exception class + detail in the 400 body would
|
||||
# have cut that to ~10 min. Per feedback_surface_actionable_failure_
|
||||
# reason_to_user (CTO 2026-05-17): user-facing failures MUST tell the
|
||||
# user WHY. Top-level "error" key is preserved for backwards-compat
|
||||
# with existing canvas / alert rules.
|
||||
logger.warning(
|
||||
"internal_chat_uploads: multipart parse failed: %s: %s",
|
||||
type(exc).__name__, exc,
|
||||
)
|
||||
return JSONResponse(
|
||||
{
|
||||
"error": "failed to parse multipart form",
|
||||
"exception": type(exc).__name__,
|
||||
"detail": str(exc),
|
||||
},
|
||||
status_code=400,
|
||||
)
|
||||
|
||||
# Starlette's FormData allows multiple values per key — `files` may
|
||||
# appear multiple times for batched uploads. getlist returns them
|
||||
# in order.
|
||||
parts = form.getlist("files")
|
||||
if not parts:
|
||||
return JSONResponse({"error": "expected at least one 'files' field"}, status_code=400)
|
||||
|
||||
# Filter out non-file entries defensively. Starlette's UploadFile
|
||||
# has a .filename attribute; plain string fields don't.
|
||||
uploads = [p for p in parts if hasattr(p, "filename") and hasattr(p, "read")]
|
||||
if not uploads:
|
||||
return JSONResponse({"error": "expected at least one 'files' field"}, status_code=400)
|
||||
|
||||
# mkdir -p is idempotent. Fired every call so a container restart
|
||||
# that wipes /workspace/.molecule doesn't surprise us.
|
||||
try:
|
||||
Path(CHAT_UPLOAD_DIR).mkdir(parents=True, exist_ok=True)
|
||||
except OSError as exc:
|
||||
# Surface errno + path in the response so a fresh-tenant
|
||||
# "failed to prepare uploads dir" 500 self-diagnoses without
|
||||
# requiring SSM access to the workspace stderr. Prior incident
|
||||
# 2026-05-01: hongming.moleculesai.app hit EACCES on the
|
||||
# /workspace volume's `.molecule` subtree (root-owned race
|
||||
# window between Docker volume create and entrypoint's chown,
|
||||
# fixed via molecule-ai-workspace-template-claude-code#23).
|
||||
# The errno + path are not security-sensitive — both are
|
||||
# well-known to anyone with workspace access.
|
||||
logger.error("internal_chat_uploads: mkdir %s failed: %s", CHAT_UPLOAD_DIR, exc)
|
||||
return JSONResponse(
|
||||
{
|
||||
"error": "failed to prepare uploads dir",
|
||||
"path": CHAT_UPLOAD_DIR,
|
||||
"errno": exc.errno,
|
||||
"detail": str(exc),
|
||||
},
|
||||
status_code=500,
|
||||
)
|
||||
|
||||
response_files: list[dict] = []
|
||||
total_bytes = 0
|
||||
for upload in uploads:
|
||||
# Read into memory with a hard cap. Files larger than the cap
|
||||
# surface as 413; we don't truncate silently.
|
||||
data = await upload.read(CHAT_UPLOAD_MAX_FILE_BYTES + 1)
|
||||
if len(data) > CHAT_UPLOAD_MAX_FILE_BYTES:
|
||||
return JSONResponse(
|
||||
{"error": f"{upload.filename} exceeds per-file limit ({CHAT_UPLOAD_MAX_FILE_BYTES // (1024*1024)} MB)"},
|
||||
status_code=413,
|
||||
)
|
||||
total_bytes += len(data)
|
||||
if total_bytes > CHAT_UPLOAD_MAX_BYTES:
|
||||
return JSONResponse(
|
||||
{"error": f"total request body exceeds limit ({CHAT_UPLOAD_MAX_BYTES // (1024*1024)} MB)"},
|
||||
status_code=413,
|
||||
)
|
||||
|
||||
sanitized = sanitize_filename(upload.filename or "file")
|
||||
# 16-byte random prefix → 32-hex-char + sanitized name. Same
|
||||
# shape as the Go handler's `hex.EncodeToString(rand 16) + "-" + name`.
|
||||
prefix = pysecrets.token_hex(16)
|
||||
stored = f"{prefix}-{sanitized}"
|
||||
target = os.path.join(CHAT_UPLOAD_DIR, stored)
|
||||
|
||||
try:
|
||||
fd = _open_safe(target)
|
||||
except FileExistsError:
|
||||
# 32 hex chars of entropy → 128 bits → re-collision is
|
||||
# astronomical. If we hit it anyway, surface as 500 rather
|
||||
# than overwriting; the next retry will pick a fresh prefix.
|
||||
logger.error("internal_chat_uploads: collision at %s — refusing overwrite", target)
|
||||
return JSONResponse({"error": "internal collision; retry"}, status_code=500)
|
||||
except OSError as exc:
|
||||
logger.error("internal_chat_uploads: open %s failed: %s", target, exc)
|
||||
return JSONResponse({"error": "failed to write file"}, status_code=500)
|
||||
|
||||
try:
|
||||
with os.fdopen(fd, "wb") as f:
|
||||
f.write(data)
|
||||
except OSError as exc:
|
||||
logger.error("internal_chat_uploads: write %s failed: %s", target, exc)
|
||||
# Best-effort cleanup of the partial file. unlink can fail
|
||||
# if the file was never created (open succeeded but write
|
||||
# failed before any bytes hit disk) or if the dir was
|
||||
# concurrently torn down — neither case warrants surfacing.
|
||||
try:
|
||||
os.unlink(target)
|
||||
except OSError as unlink_exc:
|
||||
logger.debug("internal_chat_uploads: unlink %s after write fail: %s", target, unlink_exc)
|
||||
return JSONResponse({"error": "failed to write file"}, status_code=500)
|
||||
|
||||
# Mime type: prefer the part's Content-Type header, fall back to
|
||||
# extension-based guess. matches the Go handler's precedence.
|
||||
mime_type = upload.headers.get("content-type") if hasattr(upload, "headers") else None
|
||||
if not mime_type:
|
||||
mime_type, _ = mimetypes.guess_type(sanitized)
|
||||
|
||||
response_files.append({
|
||||
"uri": f"workspace:{CHAT_UPLOAD_DIR}/{stored}",
|
||||
"name": sanitized,
|
||||
"mimeType": mime_type or "",
|
||||
"size": len(data),
|
||||
})
|
||||
|
||||
return JSONResponse({"files": response_files}, status_code=200)
|
||||
@@ -1,134 +0,0 @@
|
||||
"""GET /internal/file/read?path=<abs path> — workspace-side file read sink.
|
||||
|
||||
Companion to /internal/chat/uploads/ingest (RFC #2312 PR-B). Replaces the
|
||||
docker-cp tar-stream extraction the platform-side workspace-server used
|
||||
in chat_files.go::Download. Same path-safety contract as the legacy Go
|
||||
handler:
|
||||
|
||||
* absolute path required
|
||||
* must canonicalise to itself (no `..` segments, no double-slashes)
|
||||
* must land under one of {/configs, /workspace, /home, /plugins}
|
||||
* must be a regular file (not a directory, symlink, device, etc.)
|
||||
|
||||
Why a single broad "/internal/file/read" instead of a chat-specific path:
|
||||
|
||||
Today's chat_files.go::Download already accepts paths under any of the
|
||||
four allowed roots — it's not strictly chat. Future PR-G/H will migrate
|
||||
/files/* template-config reads to the same forward pattern; reusing
|
||||
the same endpoint avoids three near-identical handlers (one per domain)
|
||||
with duplicated path-safety logic.
|
||||
|
||||
Auth: Bearer <platform_inbound_secret>; fail-closed when missing.
|
||||
|
||||
Response shape (matches Go contract for byte-for-byte compatibility):
|
||||
|
||||
Content-Type: <mime.guess from extension or application/octet-stream>
|
||||
Content-Length: <stat size>
|
||||
Content-Disposition: attachment; filename="<basename>"; filename*=UTF-8''<encoded>
|
||||
body: raw file bytes (binary-safe — no JSON wrapping)
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import mimetypes
|
||||
import os
|
||||
import urllib.parse
|
||||
from pathlib import Path
|
||||
|
||||
from starlette.requests import Request
|
||||
from starlette.responses import FileResponse, JSONResponse
|
||||
|
||||
from platform_inbound_auth import get_inbound_secret, inbound_authorized
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Mirror chat_files.go's allowedRoots set. A request whose `path` doesn't
|
||||
# fall under one of these — by exact-match or prefix-with-trailing-slash
|
||||
# — is rejected at the gate, regardless of how many `..` segments
|
||||
# canonicalised away.
|
||||
_ALLOWED_ROOTS = ("/configs", "/workspace", "/home", "/plugins")
|
||||
|
||||
|
||||
def _content_disposition_attachment(name: str) -> str:
|
||||
"""Mirror chat_files.go::contentDispositionAttachment.
|
||||
|
||||
Quotes, CR, and LF stripped/escaped per RFC 6266 / RFC 5987.
|
||||
Drop control chars, escape backslash and double-quote in the
|
||||
quoted-string. Emit percent-encoded filename* so non-ASCII names
|
||||
survive in clients that prefer the modern form.
|
||||
"""
|
||||
safe_q: list[str] = []
|
||||
for ch in name:
|
||||
if ch in ("\r", "\n"):
|
||||
continue # would terminate the header
|
||||
if ch in ('"', "\\"):
|
||||
safe_q.append("\\")
|
||||
safe_q.append(ch)
|
||||
continue
|
||||
if ord(ch) < 0x20 or ord(ch) == 0x7f:
|
||||
continue # other control chars
|
||||
safe_q.append(ch)
|
||||
ascii_safe = "".join(safe_q)
|
||||
encoded = urllib.parse.quote(name, safe="") # full RFC 3986 unreserved-only
|
||||
return f'attachment; filename="{ascii_safe}"; filename*=UTF-8\'\'{encoded}'
|
||||
|
||||
|
||||
def _validate_path(path: str) -> tuple[bool, str]:
|
||||
"""Return (ok, error_msg). Mirrors Go's chat_files.go::Download
|
||||
validation in the same order so error shapes stay identical."""
|
||||
if not path:
|
||||
return False, "path query required"
|
||||
if not os.path.isabs(path):
|
||||
return False, "path must be absolute"
|
||||
rooted = False
|
||||
for root in _ALLOWED_ROOTS:
|
||||
if path == root or path.startswith(root + "/"):
|
||||
rooted = True
|
||||
break
|
||||
if not rooted:
|
||||
return False, "path must be under /configs, /workspace, /home, or /plugins"
|
||||
# Reject anything that canonicalises differently or contains a
|
||||
# traversal segment. Defence-in-depth on top of the prefix check.
|
||||
if os.path.normpath(path) != path or ".." in path:
|
||||
return False, "invalid path"
|
||||
return True, ""
|
||||
|
||||
|
||||
async def file_read_handler(request: Request):
|
||||
"""GET /internal/file/read — Starlette route handler."""
|
||||
if not inbound_authorized(get_inbound_secret(), request.headers.get("Authorization", "")):
|
||||
return JSONResponse({"error": "unauthorized"}, status_code=401)
|
||||
|
||||
path = request.query_params.get("path", "")
|
||||
ok, err = _validate_path(path)
|
||||
if not ok:
|
||||
return JSONResponse({"error": err}, status_code=400)
|
||||
|
||||
# lstat (not stat) so a symlink at the path doesn't pretend to be the
|
||||
# file it points at — we want to know "is this LITERALLY a regular
|
||||
# file at the validated path." A symlink could redirect to /etc/*
|
||||
# or another mount.
|
||||
try:
|
||||
st = os.lstat(path)
|
||||
except FileNotFoundError:
|
||||
return JSONResponse({"error": "file not found"}, status_code=404)
|
||||
except OSError as exc:
|
||||
logger.warning("internal_file_read: lstat %s failed: %s", path, exc)
|
||||
return JSONResponse({"error": "stat failed"}, status_code=500)
|
||||
|
||||
import stat as _stat
|
||||
if not _stat.S_ISREG(st.st_mode):
|
||||
return JSONResponse({"error": "path is not a regular file"}, status_code=400)
|
||||
|
||||
name = os.path.basename(path)
|
||||
mime_type, _ = mimetypes.guess_type(name)
|
||||
if not mime_type:
|
||||
mime_type = "application/octet-stream"
|
||||
|
||||
return FileResponse(
|
||||
path,
|
||||
media_type=mime_type,
|
||||
headers={
|
||||
"Content-Disposition": _content_disposition_attachment(name),
|
||||
},
|
||||
)
|
||||
@@ -1,192 +0,0 @@
|
||||
"""Pre-stop serialization for pause/resume — GH#1391.
|
||||
|
||||
Captures the agent's in-memory state just before the container exits so it
|
||||
survives intentional pause and unplanned restart. All content is scrubbed
|
||||
with lib.snapshot_scrub before being written to disk so that a snapshot blob
|
||||
obtained by an attacker cannot recover API keys, tokens, or arbitrary sandbox
|
||||
output (GH#823).
|
||||
|
||||
State captured
|
||||
--------------
|
||||
- ``workspace_id`` — identity for cross-container restore
|
||||
- ``current_task`` — active task label from heartbeat (what the canvas sees)
|
||||
- ``active_tasks`` — task count
|
||||
- ``session_id`` — SDK session handle (Claude Code); key for full session
|
||||
- ``transcript_lines`` — recent session log lines from the adapter
|
||||
- ``uptime_seconds`` — how long this container has been running
|
||||
- ``timestamp`` — when the snapshot was taken (ISO-8601)
|
||||
|
||||
Scrubbing
|
||||
---------
|
||||
Every text field passes through scrub_snapshot before being written.
|
||||
Sandbox-sourced content (tool=run_code, source=sandbox, [sandbox_output]) is
|
||||
dropped wholesale. Secrets matching the pattern library are replaced with
|
||||
[REDACTED:TYPE] markers.
|
||||
|
||||
Storage
|
||||
-------
|
||||
Snapshots are written to /configs/.agent_snapshot.json by default. The
|
||||
config volume survives container restarts so the file is durable. The path
|
||||
is also overridable via ``AGENT_SNAPSHOT_PATH`` for testing or custom layouts.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
from datetime import datetime, timezone
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from .snapshot_scrub import scrub_snapshot
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from heartbeat import HeartbeatLoop
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Default snapshot path — on the config volume, survives container restarts.
|
||||
DEFAULT_SNAPSHOT_PATH = os.environ.get(
|
||||
"AGENT_SNAPSHOT_PATH",
|
||||
"/configs/.agent_snapshot.json",
|
||||
)
|
||||
|
||||
# How many transcript lines to capture in the snapshot (recent window).
|
||||
MAX_TRANSCRIPT_LINES = 200
|
||||
|
||||
|
||||
def build_snapshot(
|
||||
heartbeat: "HeartbeatLoop | None",
|
||||
adapter_state: dict[str, Any],
|
||||
) -> dict[str, Any]:
|
||||
"""Build a raw snapshot dict from live workspace state.
|
||||
|
||||
Args:
|
||||
heartbeat: HeartbeatLoop instance; provides current_task, session_id, etc.
|
||||
adapter_state: Arbitrary state dict from the adapter's pre_stop_state() hook.
|
||||
Keys are free-form; all string values in nested dicts/lists are
|
||||
scrubbed before writing.
|
||||
|
||||
Returns a raw (not yet scrubbed) snapshot dict.
|
||||
"""
|
||||
import time
|
||||
|
||||
raw: dict[str, Any] = {
|
||||
"workspace_id": os.environ.get("WORKSPACE_ID", "unknown"),
|
||||
"timestamp": datetime.now(timezone.utc).isoformat(),
|
||||
# Defaults — heartbeat block below overwrites these when available:
|
||||
"current_task": "",
|
||||
"active_tasks": 0,
|
||||
}
|
||||
|
||||
if heartbeat is not None:
|
||||
raw["current_task"] = heartbeat.current_task or ""
|
||||
raw["active_tasks"] = heartbeat.active_tasks
|
||||
if hasattr(heartbeat, "start_time"):
|
||||
raw["uptime_seconds"] = int(time.time() - heartbeat.start_time)
|
||||
# session_id lives in the adapter but we also accept it via heartbeat
|
||||
# for convenience (avoids requiring every adapter to pass it separately).
|
||||
if not adapter_state.get("session_id"):
|
||||
raw["session_id"] = getattr(heartbeat, "_session_id", None) or ""
|
||||
|
||||
# Adapter-supplied state (conversation history, reasoning traces, etc.)
|
||||
raw["adapter"] = adapter_state
|
||||
|
||||
return raw
|
||||
|
||||
|
||||
def _scrub_value(value: Any) -> Any:
|
||||
"""Recursively scrub all secret patterns from a value.
|
||||
|
||||
- Strings: scrub_content() replaces patterns with [REDACTED:TYPE].
|
||||
- Dicts: return a new dict with all values scrubbed recursively.
|
||||
- Lists: drop entries that are sandbox content; scrub remaining items.
|
||||
- Other: pass through unchanged.
|
||||
"""
|
||||
from .snapshot_scrub import is_sandbox_content, scrub_content
|
||||
|
||||
if isinstance(value, str):
|
||||
return scrub_content(value)
|
||||
if isinstance(value, dict):
|
||||
return {k: _scrub_value(v) for k, v in value.items()}
|
||||
if isinstance(value, list):
|
||||
result = []
|
||||
for item in value:
|
||||
if isinstance(item, str) and is_sandbox_content(item):
|
||||
continue # Drop sandbox entries wholesale
|
||||
result.append(_scrub_value(item))
|
||||
return result
|
||||
return value
|
||||
|
||||
|
||||
def write_snapshot(
|
||||
snapshot: dict[str, Any],
|
||||
path: str | None = None,
|
||||
) -> bool:
|
||||
"""Scrub and write a snapshot to disk.
|
||||
|
||||
Args:
|
||||
snapshot: Raw snapshot dict from build_snapshot().
|
||||
path: Target file path (default: DEFAULT_SNAPSHOT_PATH).
|
||||
|
||||
Returns:
|
||||
True if the snapshot was written successfully; False on any error.
|
||||
Errors are logged but never raise — pre-stop serialization must be
|
||||
best-effort to avoid blocking shutdown.
|
||||
"""
|
||||
target = path or DEFAULT_SNAPSHOT_PATH
|
||||
|
||||
try:
|
||||
# Deep-scrub every string value in the snapshot to remove API keys,
|
||||
# tokens, and arbitrary sandbox output before writing to disk.
|
||||
scrubbed = _scrub_value(snapshot)
|
||||
|
||||
# Ensure parent directory exists.
|
||||
parent = os.path.dirname(target)
|
||||
if parent:
|
||||
os.makedirs(parent, exist_ok=True)
|
||||
|
||||
with open(target, "w") as f:
|
||||
json.dump(scrubbed, f, indent=2, default=str)
|
||||
|
||||
logger.info(
|
||||
"Pre-stop snapshot written: %s (workspace=%s, task=%r, lines=%d)",
|
||||
target,
|
||||
scrubbed.get("workspace_id", "?"),
|
||||
scrubbed.get("current_task", ""),
|
||||
len(scrubbed.get("adapter", {}).get("transcript_lines", [])),
|
||||
)
|
||||
return True
|
||||
|
||||
except Exception as exc:
|
||||
logger.warning("Pre-stop snapshot write failed (%s): %s", target, exc)
|
||||
return False
|
||||
|
||||
|
||||
def read_snapshot(
|
||||
path: str | None = None,
|
||||
) -> dict[str, Any] | None:
|
||||
"""Read and return a previously-written snapshot, or None if absent/invalid."""
|
||||
target = path or DEFAULT_SNAPSHOT_PATH
|
||||
|
||||
if not os.path.exists(target):
|
||||
return None
|
||||
|
||||
try:
|
||||
with open(target) as f:
|
||||
return json.load(f)
|
||||
except Exception as exc:
|
||||
logger.debug("Snapshot read failed (%s): %s", target, exc)
|
||||
return None
|
||||
|
||||
|
||||
def delete_snapshot(path: str | None = None) -> None:
|
||||
"""Remove a snapshot file. Idempotent — no error if absent."""
|
||||
target = path or DEFAULT_SNAPSHOT_PATH
|
||||
try:
|
||||
os.remove(target)
|
||||
logger.debug("Snapshot deleted: %s", target)
|
||||
except FileNotFoundError:
|
||||
pass
|
||||
except Exception as exc:
|
||||
logger.warning("Snapshot delete failed (%s): %s", target, exc)
|
||||
@@ -1,125 +0,0 @@
|
||||
"""Snapshot scrubbing — strip secrets and internal details from hibernation snapshots.
|
||||
|
||||
Issue #823 (sub of #799). Before the workspace runtime serializes a memory
|
||||
snapshot for hibernation, every memory entry's content must pass through
|
||||
this scrubber so an attacker who obtains a snapshot blob cannot recover:
|
||||
|
||||
- API keys (sk-ant-, sk-proj-, ghp_, etc.)
|
||||
- Auth tokens (Bearer headers, OAuth tokens)
|
||||
- Env-var assignments (ANTHROPIC_API_KEY=..., OPENAI_API_KEY=...)
|
||||
- Arbitrary subprocess output from the sandbox tool (can be anything)
|
||||
|
||||
The scrubber is a pure function so it can be unit-tested independently.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from typing import Any
|
||||
|
||||
|
||||
# Compiled once at import time — most-specific patterns first so that
|
||||
# env-var assignments are caught before the generic sk-* or base64 sweeps
|
||||
# swallow only part of the match.
|
||||
_SECRET_PATTERNS: list[tuple[re.Pattern[str], str]] = [
|
||||
# Env-var assignments: ANTHROPIC_API_KEY=sk-ant-... GITHUB_TOKEN=ghp_...
|
||||
(re.compile(r"(?i)\b[A-Z][A-Z0-9_]*_API_KEY\s*=\s*\S+"), "API_KEY"),
|
||||
(re.compile(r"(?i)\b[A-Z][A-Z0-9_]*_TOKEN\s*=\s*\S+"), "TOKEN"),
|
||||
(re.compile(r"(?i)\b[A-Z][A-Z0-9_]*_SECRET\s*=\s*\S+"), "SECRET"),
|
||||
# HTTP Bearer header values.
|
||||
(re.compile(r"Bearer\s+\S+"), "BEARER_TOKEN"),
|
||||
# OpenAI / Anthropic sk-... / sk-ant-... / sk-proj-... key format.
|
||||
(re.compile(r"sk-[A-Za-z0-9\-_]{16,}"), "SK_TOKEN"),
|
||||
# GitHub personal access tokens and installation tokens.
|
||||
(re.compile(r"ghp_[A-Za-z0-9]{20,}"), "GITHUB_PAT"),
|
||||
(re.compile(r"ghs_[A-Za-z0-9]{20,}"), "GITHUB_SERVER_TOKEN"),
|
||||
(re.compile(r"github_pat_[A-Za-z0-9_]{60,}"), "GITHUB_PAT_V2"),
|
||||
# AWS access key IDs.
|
||||
(re.compile(r"\bAKIA[A-Z0-9]{16}\b"), "AWS_ACCESS_KEY"),
|
||||
# Cloudflare API tokens.
|
||||
(re.compile(r"\bcfut_[A-Za-z0-9]{32,}"), "CF_TOKEN"),
|
||||
# Molecule partner API keys (Phase 34).
|
||||
(re.compile(r"\bmol_pk_[A-Za-z0-9]{20,}"), "MOL_PK"),
|
||||
# context7 tokens.
|
||||
(re.compile(r"\bctx7_[A-Za-z0-9]+"), "CTX7_TOKEN"),
|
||||
# High-entropy base64 blobs 33+ chars. Catches long opaque tokens that
|
||||
# don't match any structured pattern above.
|
||||
(re.compile(r"[A-Za-z0-9+/]{33,}={0,2}"), "BASE64_BLOB"),
|
||||
]
|
||||
|
||||
|
||||
# Substring markers that identify content from the run_code sandbox tool.
|
||||
# Any memory entry tagged with this source is excluded wholesale from the
|
||||
# snapshot — the arbitrary subprocess output cannot be safely scrubbed by
|
||||
# pattern alone (attacker could print `echo "innocent"` but have hidden
|
||||
# secrets in stderr or file handles).
|
||||
_SANDBOX_TOOL_MARKERS = (
|
||||
"source=sandbox",
|
||||
"tool=run_code",
|
||||
"[sandbox_output]",
|
||||
)
|
||||
|
||||
|
||||
def scrub_content(content: str) -> str:
|
||||
"""Return `content` with secret patterns replaced by [REDACTED:LABEL] markers.
|
||||
|
||||
Idempotent — running scrub_content on already-scrubbed output is a no-op
|
||||
because [REDACTED:...] doesn't match any of the patterns above.
|
||||
"""
|
||||
if not content:
|
||||
return content
|
||||
out = content
|
||||
for pattern, label in _SECRET_PATTERNS:
|
||||
out = pattern.sub(f"[REDACTED:{label}]", out)
|
||||
return out
|
||||
|
||||
|
||||
def is_sandbox_content(content: str) -> bool:
|
||||
"""Return True if `content` originates from the run_code sandbox tool.
|
||||
|
||||
Sandbox output can contain arbitrary subprocess stdout/stderr that may
|
||||
include secrets the scrubber wouldn't recognize (e.g. printed via a
|
||||
custom format). Entries matching this check should be excluded from
|
||||
the snapshot entirely rather than scrubbed.
|
||||
"""
|
||||
if not content:
|
||||
return False
|
||||
lower = content.lower()
|
||||
return any(marker in lower for marker in _SANDBOX_TOOL_MARKERS)
|
||||
|
||||
|
||||
def scrub_memory_entry(entry: dict[str, Any]) -> dict[str, Any] | None:
|
||||
"""Scrub a single memory entry for snapshot inclusion.
|
||||
|
||||
Returns a new dict with secrets redacted, or None if the entry must be
|
||||
excluded entirely (sandbox-sourced content).
|
||||
|
||||
The input dict is treated as read-only — callers should use the returned
|
||||
value and not mutate the original.
|
||||
"""
|
||||
content = entry.get("content", "")
|
||||
if is_sandbox_content(content):
|
||||
return None
|
||||
scrubbed = dict(entry)
|
||||
scrubbed["content"] = scrub_content(content)
|
||||
return scrubbed
|
||||
|
||||
|
||||
def scrub_snapshot(snapshot: dict[str, Any]) -> dict[str, Any]:
|
||||
"""Scrub a full snapshot payload before serialization.
|
||||
|
||||
Walks the `memories` list, scrubs each entry's content, and drops
|
||||
sandbox-sourced entries. Other snapshot fields (workspace metadata,
|
||||
config, etc.) pass through unchanged — they are not expected to contain
|
||||
user-supplied secret-bearing content.
|
||||
|
||||
Returns a new dict; the input is not mutated.
|
||||
"""
|
||||
out = dict(snapshot)
|
||||
memories = snapshot.get("memories") or []
|
||||
scrubbed_list = []
|
||||
for entry in memories:
|
||||
cleaned = scrub_memory_entry(entry)
|
||||
if cleaned is not None:
|
||||
scrubbed_list.append(cleaned)
|
||||
out["memories"] = scrubbed_list
|
||||
return out
|
||||
@@ -1,819 +0,0 @@
|
||||
"""Workspace runtime entry point.
|
||||
|
||||
Loads config -> discovers adapter -> setup -> create executor -> wrap in A2A -> register -> heartbeat.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import os
|
||||
import socket
|
||||
|
||||
import httpx
|
||||
import uvicorn
|
||||
# KI-009 a2a-sdk v1 migration: A2AStarletteApplication removed; use Starlette route factory
|
||||
from a2a.server.routes import create_agent_card_routes, create_jsonrpc_routes
|
||||
from a2a.server.request_handlers import DefaultRequestHandler
|
||||
from a2a.server.tasks import InMemoryTaskStore
|
||||
from a2a.types import AgentCard, AgentCapabilities, AgentSkill, AgentInterface
|
||||
from starlette.applications import Starlette
|
||||
|
||||
from adapters import get_adapter, AdapterConfig
|
||||
from agents_md import generate_agents_md
|
||||
from config import load_config
|
||||
from heartbeat import HeartbeatLoop
|
||||
from preflight import run_preflight, render_preflight_report
|
||||
from builtin_tools.awareness_client import get_awareness_config
|
||||
import uuid as _uuid
|
||||
|
||||
from builtin_tools.telemetry import setup_telemetry, make_trace_middleware
|
||||
from policies.namespaces import resolve_awareness_namespace
|
||||
|
||||
|
||||
from initial_prompt import (
|
||||
mark_initial_prompt_attempted,
|
||||
resolve_initial_prompt_marker,
|
||||
)
|
||||
from platform_auth import auth_headers, self_source_headers
|
||||
|
||||
|
||||
def get_machine_ip() -> str: # pragma: no cover
|
||||
"""Get the machine's IP for A2A discovery."""
|
||||
try:
|
||||
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
|
||||
s.connect(("8.8.8.8", 80))
|
||||
ip = s.getsockname()[0]
|
||||
s.close()
|
||||
return ip
|
||||
except Exception:
|
||||
return "127.0.0.1"
|
||||
|
||||
|
||||
def _check_delegation_results_pending() -> bool:
|
||||
"""Check if there are unconsumed delegation results waiting.
|
||||
|
||||
Reads ``DELEGATION_RESULTS_FILE``. Returns ``True`` if the file
|
||||
exists and contains non-whitespace content (after stripping) — meaning
|
||||
the idle loop should skip this tick. Returns ``False`` if the file is
|
||||
absent, empty, or contains only whitespace.
|
||||
|
||||
The extracted form lets unit tests call this directly rather than mirroring
|
||||
the logic (anti-pattern flagged as #401).
|
||||
"""
|
||||
from heartbeat import DELEGATION_RESULTS_FILE
|
||||
|
||||
try:
|
||||
with open(DELEGATION_RESULTS_FILE) as rf:
|
||||
rf.seek(0)
|
||||
return bool(rf.read().strip())
|
||||
except FileNotFoundError:
|
||||
return False
|
||||
|
||||
|
||||
# Re-exported from transcript_auth for the inline /transcript handler.
|
||||
# Separate module keeps the security-critical gate import-light + unit-testable.
|
||||
from transcript_auth import transcript_authorized as _transcript_authorized
|
||||
|
||||
|
||||
async def main(): # pragma: no cover
|
||||
workspace_id = os.environ.get("WORKSPACE_ID", "")
|
||||
if not workspace_id:
|
||||
raise SystemExit("FATAL: WORKSPACE_ID env var is not set. Aborting.")
|
||||
config_path = os.environ.get("WORKSPACE_CONFIG_PATH", "/configs")
|
||||
# Docker-aware default — host.docker.internal resolves the platform service
|
||||
# from inside the Docker network mesh; falls back to localhost for local dev.
|
||||
if os.path.exists("/.dockerenv") or os.environ.get("DOCKER_VERSION"):
|
||||
platform_url = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
else:
|
||||
platform_url = os.environ.get("PLATFORM_URL", "http://localhost:8080")
|
||||
awareness_config = get_awareness_config()
|
||||
|
||||
# 0. Initialise OpenTelemetry (no-op if packages not installed)
|
||||
setup_telemetry(service_name=workspace_id)
|
||||
|
||||
# 0a. Fix /workspace perms before any agent code runs. Docker ships
|
||||
# named volumes as root:root 755 — without this the non-root agent
|
||||
# user can't write files the user asked it to produce, and the
|
||||
# "agent → file → user downloads" flow dead-ends at a bash "permission
|
||||
# denied". Best-effort: no-ops silently if molecule-runtime itself
|
||||
# isn't root (template's own start.sh should have handled it there).
|
||||
from executor_helpers import ensure_workspace_writable
|
||||
ensure_workspace_writable()
|
||||
|
||||
# 1. Load config
|
||||
config = load_config(config_path)
|
||||
port = config.a2a.port
|
||||
preflight = run_preflight(config, config_path)
|
||||
render_preflight_report(preflight)
|
||||
|
||||
# 1a. Generate AGENTS.md so peer agents and discovery tools can see this
|
||||
# workspace's identity, role, endpoint, and capabilities immediately.
|
||||
try:
|
||||
generate_agents_md(config_path, "/workspace/AGENTS.md")
|
||||
except Exception as _agents_md_err: # pragma: no cover
|
||||
print(f"Warning: AGENTS.md generation failed (non-fatal): {_agents_md_err}")
|
||||
if not preflight.ok:
|
||||
raise SystemExit(1)
|
||||
if awareness_config:
|
||||
awareness_namespace = resolve_awareness_namespace(
|
||||
workspace_id,
|
||||
awareness_config.get("namespace", ""),
|
||||
)
|
||||
print(f"Awareness enabled for namespace: {awareness_namespace}")
|
||||
|
||||
# 1.5 Initialise governance adapter (no-op if disabled or package absent)
|
||||
from builtin_tools.governance import initialize_governance
|
||||
if config.governance.enabled:
|
||||
await initialize_governance(config.governance)
|
||||
print(f"Governance: Microsoft Agent Governance Toolkit enabled (mode={config.governance.policy_mode})")
|
||||
else:
|
||||
print("Governance: disabled (set governance.enabled: true in config.yaml to activate)")
|
||||
|
||||
# 2. Create heartbeat (passed to adapter for task tracking).
|
||||
# interval is sourced from observability.heartbeat_interval_seconds
|
||||
# in config.yaml — clamped to [5, 300] at parse time. Operators
|
||||
# who want a faster crash-detection signal lower it; ones who want
|
||||
# to reduce platform write load raise it.
|
||||
heartbeat = HeartbeatLoop(
|
||||
platform_url,
|
||||
workspace_id,
|
||||
interval_seconds=config.observability.heartbeat_interval_seconds,
|
||||
)
|
||||
|
||||
# 3. Get adapter for this runtime
|
||||
runtime = config.runtime or "langgraph"
|
||||
adapter_cls = get_adapter(runtime) # Raises KeyError if unknown — no silent fallback
|
||||
|
||||
adapter = adapter_cls()
|
||||
print(f"Runtime: {runtime} ({adapter.display_name()})")
|
||||
|
||||
# 3a. Wire pluggable event-log backend from config.observability.event_log.
|
||||
# Default config.yaml sets backend=memory; operators set "disabled" to
|
||||
# opt out without removing append-call sites from adapter code.
|
||||
from event_log import create_event_log
|
||||
adapter.event_log = create_event_log(
|
||||
backend=config.observability.event_log.backend,
|
||||
ttl_seconds=config.observability.event_log.ttl_seconds,
|
||||
max_entries=config.observability.event_log.max_entries,
|
||||
)
|
||||
|
||||
# 4. Build adapter config
|
||||
adapter_config = AdapterConfig(
|
||||
model=config.model,
|
||||
system_prompt=None, # Adapter builds its own prompt
|
||||
tools=config.skills, # Skill names from config.yaml
|
||||
runtime_config=vars(config.runtime_config) if config.runtime_config else {},
|
||||
config_path=config_path,
|
||||
workspace_id=workspace_id,
|
||||
prompt_files=config.prompt_files,
|
||||
a2a_port=port,
|
||||
heartbeat=heartbeat,
|
||||
)
|
||||
|
||||
# 5. Build the AgentCard *before* adapter.setup() so /.well-known/agent-card.json
|
||||
# is reachable as soon as uvicorn binds, regardless of whether the adapter
|
||||
# has working LLM credentials. Decoupling readiness ("is the workspace up?")
|
||||
# from configuration ("can it actually answer?") means a workspace with a
|
||||
# missing/rotated key stays REACHABLE — canvas can render a clear
|
||||
# "agent not configured" error instead of "stuck booting forever," and
|
||||
# operators can deprovision/redeploy normally. Skills built from
|
||||
# config.skills (static names from config.yaml) up front; richer metadata
|
||||
# from the adapter's loaded_skills swaps in below if setup() succeeds.
|
||||
machine_ip = os.environ.get("HOSTNAME", get_machine_ip())
|
||||
workspace_url = f"http://{machine_ip}:{port}"
|
||||
|
||||
# v1: AgentCard.url removed; put url+protocol in supported_interfaces instead.
|
||||
# v1: AgentCapabilities.inputModes/outputModes removed; move to AgentCard.default_*.
|
||||
# v1: pushNotifications → push_notifications (Pydantic field name)
|
||||
#
|
||||
# AgentCard's protocol message uses `supported_interfaces` (plural,
|
||||
# interfaces — see a2a-sdk types/a2a_pb2.pyi:189). The 0.3.x→1.0
|
||||
# migration in #1974 originally used `supported_protocols`, which
|
||||
# the protobuf doesn't expose at all — every workspace boot since
|
||||
# then crashed with `ValueError: Protocol message AgentCard has no
|
||||
# "supported_protocols" field`. The crash didn't surface in the
|
||||
# publish-runtime smoke because the smoke only IMPORTS
|
||||
# molecule_runtime.main, never CALLS the AgentCard constructor.
|
||||
# Don't rename back.
|
||||
agent_card = AgentCard(
|
||||
name=config.name,
|
||||
description=config.description or config.name,
|
||||
version=config.version,
|
||||
supported_interfaces=[
|
||||
AgentInterface(protocol_binding="https://a2a.g/v1", url=workspace_url)
|
||||
],
|
||||
capabilities=AgentCapabilities(
|
||||
streaming=config.a2a.streaming,
|
||||
push_notifications=config.a2a.push_notifications,
|
||||
# Note: state_transition_history (a 0.x capability flag) was
|
||||
# removed in a2a-sdk 1.0. Per the SDK's own
|
||||
# a2a/compat/v0_3/conversions.py: "No longer supported in
|
||||
# v1.0". The capability is now universal — Task.history is
|
||||
# always available and tasks/get accepts historyLength via
|
||||
# apply_history_length(). Don't add this kwarg back.
|
||||
),
|
||||
# Static skill stubs from config.yaml; replaced with rich metadata
|
||||
# below if adapter.setup() loads skills successfully.
|
||||
skills=[
|
||||
AgentSkill(id=name, name=name, description=name, tags=[], examples=[])
|
||||
for name in (config.skills or [])
|
||||
],
|
||||
default_input_modes=["text/plain", "application/json"],
|
||||
default_output_modes=["text/plain", "application/json"],
|
||||
)
|
||||
|
||||
# 6. Setup adapter and create executor
|
||||
# On failure: log + continue. The card route stays mounted (above);
|
||||
# the JSON-RPC route below returns -32603 "agent not configured" until
|
||||
# the operator fixes credentials and redeploys. Heartbeat keeps running
|
||||
# so the platform sees the workspace as reachable-but-misconfigured
|
||||
# rather than crash-looping.
|
||||
adapter_ready = False
|
||||
adapter_error: str | None = None
|
||||
executor = None
|
||||
try:
|
||||
await adapter.setup(adapter_config)
|
||||
executor = await adapter.create_executor(adapter_config)
|
||||
|
||||
# 6a. Boot-smoke short-circuit (issue #2275): if MOLECULE_SMOKE_MODE
|
||||
# is set, exercise the executor's full import tree by calling
|
||||
# execute() once with stub deps + a short timeout. Skips platform
|
||||
# registration + uvicorn entirely. Returns process exit code.
|
||||
from smoke_mode import is_smoke_mode, run_executor_smoke
|
||||
if is_smoke_mode():
|
||||
exit_code = await run_executor_smoke(executor)
|
||||
if hasattr(heartbeat, "stop"):
|
||||
try:
|
||||
await heartbeat.stop()
|
||||
except Exception: # noqa: BLE001
|
||||
pass
|
||||
raise SystemExit(exit_code)
|
||||
|
||||
# 6b. Restore from pre-stop snapshot if one exists (GH#1391).
|
||||
# The snapshot is scrubbed before being written, so secrets are
|
||||
# already redacted — restore_state must not re-expose them.
|
||||
from lib.pre_stop import read_snapshot
|
||||
snapshot = read_snapshot()
|
||||
if snapshot:
|
||||
try:
|
||||
adapter.restore_state(snapshot)
|
||||
print(
|
||||
f"Pre-stop snapshot restored: task={snapshot.get('current_task', '')!r}, "
|
||||
f"uptime={snapshot.get('uptime_seconds', 0)}s"
|
||||
)
|
||||
except Exception as restore_err:
|
||||
print(f"Warning: snapshot restore failed (continuing): {restore_err}")
|
||||
|
||||
# 6c. Swap rich skill metadata into the card now that setup() loaded
|
||||
# them. In-place mutation: a2a-sdk's create_agent_card_routes serialises
|
||||
# the card on each request, so the route mounted below sees the update.
|
||||
# Isolated via card_helpers.enrich_card_skills — a malformed
|
||||
# loaded_skills shape (e.g., a future adapter that doesn't follow
|
||||
# the .metadata convention) is logged + swallowed instead of
|
||||
# propagating up to the outer except, where it would silently
|
||||
# degrade an OK boot to the not-configured state.
|
||||
from card_helpers import enrich_card_skills
|
||||
enrich_card_skills(agent_card, getattr(adapter, "loaded_skills", None))
|
||||
adapter_ready = True
|
||||
except SystemExit:
|
||||
# Smoke-mode exit signal — propagate untouched.
|
||||
raise
|
||||
except Exception as setup_err: # noqa: BLE001
|
||||
adapter_error = f"{type(setup_err).__name__}: {setup_err}"
|
||||
print(
|
||||
f"WARNING: adapter.setup() failed — workspace will serve agent-card "
|
||||
f"but JSON-RPC will return -32603 until configuration is fixed. "
|
||||
f"Reason: {adapter_error}",
|
||||
flush=True,
|
||||
)
|
||||
# Heartbeat keeps running so the platform marks the workspace as
|
||||
# reachable-but-misconfigured. Operators can then redeploy with the
|
||||
# correct env vars without having to chase a crash-loop.
|
||||
|
||||
# 6.5. Initialise Temporal durable execution wrapper (optional). Only
|
||||
# meaningful when an executor exists; skipped on misconfigured boots.
|
||||
if adapter_ready:
|
||||
from builtin_tools.temporal_workflow import create_wrapper as _create_temporal_wrapper
|
||||
temporal_wrapper = _create_temporal_wrapper()
|
||||
await temporal_wrapper.start()
|
||||
|
||||
# 7. Wrap in A2A.
|
||||
#
|
||||
# Route assembly is in workspace/boot_routes.py so the contract —
|
||||
# card always mounted, JSON-RPC route swaps based on adapter state
|
||||
# (DefaultRequestHandler when executor is non-None, not_configured
|
||||
# handler returning -32603 otherwise) — is unit-testable with
|
||||
# Starlette's TestClient. main.py is `# pragma: no cover` so without
|
||||
# this extraction a future refactor that re-coupled card + setup()
|
||||
# would silently bypass PR #2756. tests/test_boot_routes.py pins
|
||||
# the four-branch contract.
|
||||
from boot_routes import build_routes
|
||||
app = Starlette(routes=build_routes(agent_card, executor, adapter_error))
|
||||
|
||||
# 8. Register with platform
|
||||
# When adapter.setup() failed, advertise via configuration_status so
|
||||
# the platform/canvas can render "configured: false, reason: …" instead
|
||||
# of a confused "ready but silent" state.
|
||||
loaded_skills = getattr(adapter, "loaded_skills", None) or []
|
||||
agent_card_dict = {
|
||||
"name": config.name,
|
||||
"description": config.description,
|
||||
"version": config.version,
|
||||
"url": workspace_url,
|
||||
"skills": [
|
||||
{
|
||||
"id": s.metadata.id,
|
||||
"name": s.metadata.name,
|
||||
"description": s.metadata.description,
|
||||
"tags": s.metadata.tags,
|
||||
}
|
||||
for s in loaded_skills
|
||||
] if adapter_ready else [
|
||||
{"id": n, "name": n, "description": n, "tags": []}
|
||||
for n in (config.skills or [])
|
||||
],
|
||||
"capabilities": {
|
||||
"streaming": config.a2a.streaming,
|
||||
"pushNotifications": config.a2a.push_notifications,
|
||||
},
|
||||
"configuration_status": "ready" if adapter_ready else "not_configured",
|
||||
**({"configuration_error": adapter_error} if adapter_error else {}),
|
||||
}
|
||||
|
||||
async with httpx.AsyncClient(timeout=10.0) as client:
|
||||
try:
|
||||
resp = await client.post(
|
||||
f"{platform_url}/registry/register",
|
||||
json={
|
||||
"id": workspace_id,
|
||||
"url": workspace_url,
|
||||
"agent_card": agent_card_dict,
|
||||
},
|
||||
headers=auth_headers(),
|
||||
)
|
||||
print(f"Registered with platform: {resp.status_code}")
|
||||
# Phase 30.1 — capture the auth token issued at first register.
|
||||
# The platform only mints one on first register per workspace,
|
||||
# so a subsequent restart gets an empty auth_token and we
|
||||
# keep using the on-disk copy from the original issuance.
|
||||
if resp.status_code == 200:
|
||||
try:
|
||||
body = resp.json()
|
||||
tok = body.get("auth_token")
|
||||
if tok:
|
||||
from platform_auth import save_token
|
||||
save_token(tok)
|
||||
print(f"Saved workspace auth token (prefix={tok[:8]}…)")
|
||||
# RFC #2312 PR-F: persist platform_inbound_secret if the
|
||||
# platform supplied one. Idempotent — writing the same
|
||||
# value over an existing file is harmless. Required for
|
||||
# SaaS where there's no persistent /configs volume; on
|
||||
# Docker mode it overwrites the value the provisioner
|
||||
# already wrote at workspace creation.
|
||||
inbound = body.get("platform_inbound_secret")
|
||||
if inbound:
|
||||
from platform_inbound_auth import save_inbound_secret
|
||||
save_inbound_secret(inbound)
|
||||
print(f"Saved platform_inbound_secret (prefix={inbound[:8]}…)")
|
||||
except Exception as parse_exc:
|
||||
print(f"Warning: couldn't parse register response for token: {parse_exc}")
|
||||
except Exception as e:
|
||||
print(f"Warning: failed to register with platform: {e}")
|
||||
|
||||
# 9. Start heartbeat
|
||||
heartbeat.start()
|
||||
|
||||
# 9b. Start skills hot-reload watcher (background task)
|
||||
# When a skill file changes the watcher reloads the skill module and calls
|
||||
# back into the adapter so the next A2A request uses the updated tools.
|
||||
# Skipped on misconfigured boots — adapter has no executor / tool registry
|
||||
# to swap into, so reloading skills would NPE on the agent rebuild path.
|
||||
if adapter_ready and config.skills:
|
||||
try:
|
||||
from skill_loader.watcher import SkillsWatcher
|
||||
|
||||
def _on_skill_reload(updated_skill):
|
||||
"""Rebuild the LangGraph agent when a skill changes in-place."""
|
||||
if not hasattr(adapter, "loaded_skills"):
|
||||
return
|
||||
# Replace the matching skill in the adapter's skill list
|
||||
adapter.loaded_skills = [
|
||||
updated_skill if s.metadata.id == updated_skill.metadata.id else s
|
||||
for s in adapter.loaded_skills
|
||||
]
|
||||
# Rebuild the agent's tool list from updated skills
|
||||
if hasattr(adapter, "all_tools") and hasattr(adapter, "system_prompt"):
|
||||
from builtin_tools.approval import request_approval
|
||||
from builtin_tools.delegation import delegate_task, delegate_task_async, check_task_status
|
||||
from builtin_tools.memory import commit_memory, recall_memory
|
||||
from builtin_tools.sandbox import run_code
|
||||
# Core platform tools mirror adapter_base.all_tools — must
|
||||
# match the platform_tools registry names so docs and tools
|
||||
# never drift.
|
||||
base_tools = [
|
||||
delegate_task, delegate_task_async, check_task_status,
|
||||
request_approval, commit_memory, recall_memory, run_code,
|
||||
]
|
||||
skill_tools = []
|
||||
for sk in adapter.loaded_skills:
|
||||
skill_tools.extend(sk.tools)
|
||||
adapter.all_tools = base_tools + skill_tools
|
||||
# Rebuild compiled agent so next ainvoke picks up new tools
|
||||
try:
|
||||
from agent import create_agent
|
||||
new_agent = create_agent(
|
||||
config.model, adapter.all_tools, adapter.system_prompt
|
||||
)
|
||||
executor.agent = new_agent
|
||||
print(f"Skills hot-reload: '{updated_skill.metadata.id}' reloaded — "
|
||||
f"{len(updated_skill.tools)} tool(s)")
|
||||
except Exception as rebuild_err:
|
||||
print(f"Skills hot-reload: agent rebuild failed: {rebuild_err}")
|
||||
|
||||
skills_watcher = SkillsWatcher(
|
||||
config_path=config_path,
|
||||
skill_names=config.skills,
|
||||
on_reload=_on_skill_reload,
|
||||
current_runtime=runtime,
|
||||
)
|
||||
asyncio.create_task(skills_watcher.start())
|
||||
print(f"Skills hot-reload enabled for: {config.skills}")
|
||||
except Exception as e:
|
||||
print(f"Warning: skills watcher could not start: {e}")
|
||||
|
||||
# 10. Run A2A server
|
||||
print(f"Workspace {workspace_id} starting on port {port}")
|
||||
# Wrap the ASGI app with W3C TraceContext extraction middleware so incoming
|
||||
# A2A HTTP requests propagate their trace context into _incoming_trace_context.
|
||||
# v1: Starlette app is constructed directly; no build() step needed
|
||||
starlette_app = app
|
||||
|
||||
# Add /transcript route — exposes the most-recent agent session log
|
||||
# (claude-code reads ~/.claude/projects/<cwd>/<session>.jsonl). Other
|
||||
# runtimes return supported:false.
|
||||
from starlette.responses import JSONResponse
|
||||
from starlette.routing import Route
|
||||
|
||||
async def _transcript_handler(request):
|
||||
# Require workspace bearer token — the same token issued at registration
|
||||
# and stored in /configs/.auth_token. Any container on molecule-core-net
|
||||
# could otherwise read the full session log. Closes #287.
|
||||
#
|
||||
# #328: fail CLOSED when the token file is unavailable. get_token()
|
||||
# returns None during the bootstrap window (first register hasn't
|
||||
# completed), if /configs/.auth_token was deleted, or on OSError.
|
||||
# The old `if expected:` guard treated all three cases as "skip
|
||||
# auth" — an unauthenticated container on the same Docker network
|
||||
# could read the entire session log during that window. Deny
|
||||
# instead. The platform's TranscriptHandler acquires the token
|
||||
# during registration, so once the bootstrap completes it always
|
||||
# has a valid credential to present.
|
||||
from platform_auth import get_token
|
||||
if not _transcript_authorized(get_token(), request.headers.get("Authorization", "")):
|
||||
return JSONResponse({"error": "unauthorized"}, status_code=401)
|
||||
try:
|
||||
since = int(request.query_params.get("since", "0"))
|
||||
limit = int(request.query_params.get("limit", "100"))
|
||||
except (TypeError, ValueError):
|
||||
return JSONResponse({"error": "since and limit must be integers"}, status_code=400)
|
||||
# Isolate adapter call: misconfigured boots leave the adapter
|
||||
# partially-initialised, and a future adapter override of
|
||||
# transcript_lines might assume setup() ran. Surface a 503 with
|
||||
# a clear reason instead of letting the exception propagate to
|
||||
# Starlette's 500 handler — same pattern as the not-configured
|
||||
# JSON-RPC route (PR #2756). BaseAdapter.transcript_lines's
|
||||
# default returns {"supported": false} so today's 4 adapters
|
||||
# never trigger this branch; this is the safety net.
|
||||
try:
|
||||
result = await adapter.transcript_lines(since=since, limit=limit)
|
||||
except Exception as transcript_err: # noqa: BLE001
|
||||
return JSONResponse(
|
||||
{
|
||||
"error": "transcript unavailable",
|
||||
"detail": f"{type(transcript_err).__name__}: {transcript_err}",
|
||||
},
|
||||
status_code=503,
|
||||
)
|
||||
return JSONResponse(result)
|
||||
|
||||
starlette_app.add_route("/transcript", _transcript_handler, methods=["GET"])
|
||||
|
||||
# /internal/* — platform→workspace forward calls (RFC #2312). Auth
|
||||
# is the per-workspace platform_inbound_secret in
|
||||
# /configs/.platform_inbound_secret, distinct from the outbound
|
||||
# workspace_auth_token used by /transcript above.
|
||||
from internal_chat_uploads import ingest_handler as _internal_chat_uploads_ingest
|
||||
starlette_app.add_route(
|
||||
"/internal/chat/uploads/ingest",
|
||||
_internal_chat_uploads_ingest,
|
||||
methods=["POST"],
|
||||
)
|
||||
from internal_file_read import file_read_handler as _internal_file_read
|
||||
starlette_app.add_route(
|
||||
"/internal/file/read",
|
||||
_internal_file_read,
|
||||
methods=["GET"],
|
||||
)
|
||||
|
||||
built_app = make_trace_middleware(starlette_app)
|
||||
|
||||
# uvicorn expects the level name in lowercase ("debug" / "info" /
|
||||
# "warning" / "error" / "critical"). config.observability.log_level
|
||||
# is uppercased at parse time (config.py.load_config) for the
|
||||
# Python ``logging`` module's convention; lower it here so both
|
||||
# consumers get the form they expect from one source of truth.
|
||||
# An ``LOG_LEVEL`` env var still wins as an ops-side debugging
|
||||
# override — set it on the workspace process to bypass YAML
|
||||
# without a config edit + restart cycle.
|
||||
uvicorn_log_level = os.environ.get("LOG_LEVEL", config.observability.log_level).lower()
|
||||
server_config = uvicorn.Config(
|
||||
built_app,
|
||||
host="0.0.0.0",
|
||||
port=port,
|
||||
log_level=uvicorn_log_level,
|
||||
)
|
||||
server = uvicorn.Server(server_config)
|
||||
|
||||
# 10b. Schedule initial_prompt self-message after server is ready.
|
||||
# Only runs on first boot — creates a marker file to prevent re-execution on restart.
|
||||
# Skipped on misconfigured boots: the self-message would route through the
|
||||
# platform back to /, hit the -32603 not-configured handler, and consume
|
||||
# the marker for a fire that can't actually run. Wait until the operator
|
||||
# fixes credentials and the workspace redeploys with adapter_ready=True.
|
||||
initial_prompt_task = None
|
||||
initial_prompt_marker = resolve_initial_prompt_marker(config_path)
|
||||
if adapter_ready and config.initial_prompt and not os.path.exists(initial_prompt_marker):
|
||||
# Write the marker UP FRONT (#71): if the prompt later crashes or
|
||||
# times out, we do NOT replay on next boot — that created a
|
||||
# ProcessError cascade where every message kept crashing. Operators
|
||||
# can always re-send via chat. Log loudly if the marker write
|
||||
# fails so the situation is visible.
|
||||
if not mark_initial_prompt_attempted(initial_prompt_marker):
|
||||
print(
|
||||
f"Initial prompt: WARNING — could not write marker at "
|
||||
f"{initial_prompt_marker}; this boot may replay if it crashes.",
|
||||
flush=True,
|
||||
)
|
||||
async def _send_initial_prompt():
|
||||
"""Wait for server to be ready, then send initial_prompt as self-message."""
|
||||
# Wait for the A2A server to accept connections.
|
||||
# Use the SDK's own constant for the well-known path so this
|
||||
# probe and the route mounted by create_agent_card_routes()
|
||||
# never drift apart. Pre-fix this hardcoded the pre-1.x
|
||||
# well-known path string; a2a-sdk 1.x renamed it (the
|
||||
# canonical value lives in a2a.utils.constants now), so
|
||||
# the probe got 404 every attempt and fell through to
|
||||
# "server not ready after 30s, skipping" even though the
|
||||
# server was actually serving fine. Net effect: every
|
||||
# workspace silently dropped its `initial_prompt`.
|
||||
from a2a.utils.constants import AGENT_CARD_WELL_KNOWN_PATH
|
||||
ready = False
|
||||
for attempt in range(30):
|
||||
await asyncio.sleep(1)
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
resp = await client.get(f"http://127.0.0.1:{port}{AGENT_CARD_WELL_KNOWN_PATH}")
|
||||
if resp.status_code == 200:
|
||||
ready = True
|
||||
break
|
||||
except Exception:
|
||||
continue
|
||||
|
||||
if not ready:
|
||||
print("Initial prompt: server not ready after 30s, skipping", flush=True)
|
||||
return
|
||||
|
||||
# Send initial prompt through the platform A2A proxy (not directly to self).
|
||||
# The proxy logs an a2a_receive with source_id=NULL (canvas-style),
|
||||
# broadcasts A2A_RESPONSE via WebSocket so the chat shows both the
|
||||
# prompt (as user message) and the response (as agent message).
|
||||
# Uses urllib in a thread to avoid asyncio/httpx streaming hangs.
|
||||
import json as _json
|
||||
import urllib.request
|
||||
|
||||
def _do_send_sync():
|
||||
import time as _time
|
||||
payload = _json.dumps({
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"messageId": f"initial-{_uuid.uuid4().hex[:8]}",
|
||||
"parts": [{"kind": "text", "text": config.initial_prompt}],
|
||||
},
|
||||
},
|
||||
}).encode()
|
||||
|
||||
# #220: include platform bearer token so the request isn't
|
||||
# silently rejected once any workspace has a live token on
|
||||
# file. Without this, initial_prompt 401s in multi-tenant
|
||||
# mode exactly like /registry/register did in #215.
|
||||
# X-Workspace-ID via self_source_headers() so the platform
|
||||
# tags the row source=agent — without it the canvas's
|
||||
# My Chat tab renders the initial_prompt as if the user
|
||||
# had typed it. See platform_auth.py for the full
|
||||
# explanation.
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
**self_source_headers(workspace_id),
|
||||
}
|
||||
|
||||
# Retry with backoff — the platform proxy may not be able to
|
||||
# reach us yet (container networking takes a moment to settle).
|
||||
max_retries = 5
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
req = urllib.request.Request(
|
||||
f"{platform_url}/workspaces/{workspace_id}/a2a",
|
||||
data=payload,
|
||||
headers=headers,
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=600) as resp:
|
||||
resp.read()
|
||||
print(f"Initial prompt: completed (status={resp.status})", flush=True)
|
||||
break
|
||||
except Exception as e:
|
||||
if attempt < max_retries - 1:
|
||||
delay = 2 ** attempt # 1, 2, 4, 8, 16 seconds
|
||||
print(f"Initial prompt: attempt {attempt + 1} failed ({e}), retrying in {delay}s...", flush=True)
|
||||
_time.sleep(delay)
|
||||
else:
|
||||
print(f"Initial prompt: failed after {max_retries} attempts — {e}", flush=True)
|
||||
return
|
||||
|
||||
# Marker was already written up front (#71). Nothing to do here.
|
||||
|
||||
print("Initial prompt: sending via platform proxy...", flush=True)
|
||||
loop = asyncio.get_event_loop()
|
||||
loop.run_in_executor(None, _do_send_sync)
|
||||
|
||||
initial_prompt_task = asyncio.create_task(_send_initial_prompt())
|
||||
|
||||
# 10c. Idle loop — reflection-on-completion / backlog-pull pattern.
|
||||
# Fires config.idle_prompt every config.idle_interval_seconds while the
|
||||
# workspace has no active task. This turns every role from "waits for cron"
|
||||
# into "self-wakes when idle" — the Hermes/Letta shape from today's
|
||||
# multi-framework survey (see docs/ecosystem-watch.md). Cost collapses to
|
||||
# event-driven in practice: the idle check is local (no LLM call, just
|
||||
# heartbeat.active_tasks==0), and the prompt only fires when there's
|
||||
# actually nothing to do. Gated on idle_prompt being non-empty so existing
|
||||
# workspaces upgrade opt-in — set idle_prompt in org.yaml defaults or
|
||||
# per-workspace to enable.
|
||||
idle_loop_task = None
|
||||
# Skipped on misconfigured boots — the self-fire would route to the
|
||||
# -32603 handler in a tight loop and consume cycles for no useful work.
|
||||
if adapter_ready and config.idle_prompt:
|
||||
# Idle-fire HTTP timeout. Kept tight relative to the fire cadence so a
|
||||
# hung platform doesn't accumulate dangling requests — a fire that
|
||||
# takes longer than the idle interval itself is almost certainly stuck.
|
||||
IDLE_FIRE_TIMEOUT_SECONDS = max(60, min(300, config.idle_interval_seconds))
|
||||
# Initial settle delay — never longer than 60s so cold-start races
|
||||
# don't stall the first fire, and never shorter than the configured
|
||||
# interval (short intervals shouldn't fire instantly on boot either).
|
||||
IDLE_INITIAL_SETTLE_SECONDS = min(config.idle_interval_seconds, 60)
|
||||
|
||||
async def _run_idle_loop():
|
||||
"""Self-sends config.idle_prompt periodically when the workspace is idle."""
|
||||
await asyncio.sleep(IDLE_INITIAL_SETTLE_SECONDS)
|
||||
|
||||
import json as _json
|
||||
from urllib import request as _urlreq, error as _urlerr
|
||||
|
||||
while True:
|
||||
try:
|
||||
await asyncio.sleep(config.idle_interval_seconds)
|
||||
except asyncio.CancelledError:
|
||||
return
|
||||
|
||||
# Local idle check — no platform API call, no LLM call.
|
||||
# heartbeat.active_tasks == 0 means no in-flight work.
|
||||
if heartbeat.active_tasks > 0:
|
||||
continue
|
||||
|
||||
# Issue #381 fix: skip the idle prompt if there are unconsumed
|
||||
# delegation results waiting. The heartbeat sends a self-message
|
||||
# for every new result batch, so sending the idle prompt here would
|
||||
# race: the agent would compose a stale tick BEFORE processing the
|
||||
# results notification, producing repeated identical asks (peer sends
|
||||
# correction, we respond with stale state, peer asks again).
|
||||
# By skipping the idle prompt when results are pending, we let the
|
||||
# heartbeat's own self-message wake the agent after results are
|
||||
# written. The agent then sees the results in _prepare_prompt()
|
||||
# and processes them before composing.
|
||||
# Guard logic extracted to _check_delegation_results_pending() for
|
||||
# direct unit-testing (#401 follow-up).
|
||||
if _check_delegation_results_pending():
|
||||
print(
|
||||
"Idle loop: skipping — unconsumed delegation results pending "
|
||||
"(heartbeat will notify agent)",
|
||||
flush=True,
|
||||
)
|
||||
continue
|
||||
|
||||
# Self-post the idle prompt via the platform A2A proxy (same
|
||||
# path as initial_prompt). The agent's own concurrency control
|
||||
# rejects if the workspace becomes busy between this check and
|
||||
# the post — that's the expected safety valve.
|
||||
payload = _json.dumps({
|
||||
"method": "message/send",
|
||||
"params": {
|
||||
"message": {
|
||||
"role": "user",
|
||||
"messageId": f"idle-{_uuid.uuid4().hex[:8]}",
|
||||
"parts": [{"kind": "text", "text": config.idle_prompt}],
|
||||
},
|
||||
},
|
||||
}).encode()
|
||||
|
||||
def _post_sync():
|
||||
# Returns (status_code, error_type) so the caller logs the
|
||||
# actual outcome instead of a bare "post failed" line.
|
||||
# #220: include auth_headers() on every idle fire. Without
|
||||
# this, the idle loop 401s in multi-tenant mode.
|
||||
# self_source_headers() adds X-Workspace-ID so the
|
||||
# platform classifies the idle fire as source=agent
|
||||
# rather than user-typed canvas input.
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
**self_source_headers(workspace_id),
|
||||
}
|
||||
try:
|
||||
req = _urlreq.Request(
|
||||
f"{platform_url}/workspaces/{workspace_id}/a2a",
|
||||
data=payload,
|
||||
headers=headers,
|
||||
)
|
||||
with _urlreq.urlopen(req, timeout=IDLE_FIRE_TIMEOUT_SECONDS) as resp:
|
||||
resp.read()
|
||||
return resp.status, None
|
||||
except _urlerr.HTTPError as e:
|
||||
return e.code, type(e).__name__
|
||||
except _urlerr.URLError as e:
|
||||
return None, f"URLError: {e.reason}"
|
||||
except Exception as e: # pragma: no cover — catch-all safety net
|
||||
return None, type(e).__name__
|
||||
|
||||
print(
|
||||
f"Idle loop: firing (active_tasks=0, interval={config.idle_interval_seconds}s, "
|
||||
f"timeout={IDLE_FIRE_TIMEOUT_SECONDS}s)",
|
||||
flush=True,
|
||||
)
|
||||
loop_ref = asyncio.get_running_loop()
|
||||
|
||||
def _log_result(future):
|
||||
try:
|
||||
status, err = future.result()
|
||||
if err:
|
||||
print(
|
||||
f"Idle loop: post failed — status={status} err={err}",
|
||||
flush=True,
|
||||
)
|
||||
else:
|
||||
print(f"Idle loop: post ok status={status}", flush=True)
|
||||
except Exception as e: # pragma: no cover
|
||||
print(f"Idle loop: executor callback crashed — {e}", flush=True)
|
||||
|
||||
fut = loop_ref.run_in_executor(None, _post_sync)
|
||||
fut.add_done_callback(_log_result)
|
||||
|
||||
idle_loop_task = asyncio.create_task(_run_idle_loop())
|
||||
|
||||
try:
|
||||
await server.serve()
|
||||
finally:
|
||||
# 10d. Pre-stop serialization — GH#1391.
|
||||
# Capture in-memory state before the container exits so it survives
|
||||
# intentional pause and unplanned restart. All content is scrubbed
|
||||
# via lib.snapshot_scrub before being written to the config volume.
|
||||
try:
|
||||
from lib.pre_stop import build_snapshot, write_snapshot
|
||||
adapter_state = adapter.pre_stop_state() if adapter else {}
|
||||
snapshot = build_snapshot(heartbeat, adapter_state)
|
||||
write_snapshot(snapshot)
|
||||
except Exception as pre_stop_err:
|
||||
print(f"Warning: pre-stop serialization failed (continuing): {pre_stop_err}")
|
||||
|
||||
# Cancel initial prompt if still running
|
||||
if initial_prompt_task and not initial_prompt_task.done():
|
||||
initial_prompt_task.cancel()
|
||||
# Cancel idle loop if running
|
||||
if idle_loop_task and not idle_loop_task.done():
|
||||
idle_loop_task.cancel()
|
||||
# Gracefully stop the Temporal worker background task on shutdown
|
||||
await temporal_wrapper.stop()
|
||||
|
||||
|
||||
def main_sync(): # pragma: no cover
|
||||
"""Synchronous entry point for the `molecule-runtime` console script.
|
||||
|
||||
Declared in scripts/build_runtime_package.py as the wheel's entry-point
|
||||
target (`molecule-runtime = "molecule_runtime.main:main_sync"`). Removed
|
||||
silently during the pre-monorepo consolidation, which broke every
|
||||
workspace startup against 0.1.16/0.1.17/0.1.18 with `ImportError:
|
||||
cannot import name 'main_sync'`. The .github/workflows/runtime-pin-compat.yml
|
||||
smoke step is the regression gate.
|
||||
"""
|
||||
asyncio.run(main())
|
||||
|
||||
|
||||
if __name__ == "__main__": # pragma: no cover
|
||||
main_sync()
|
||||
@@ -1,220 +0,0 @@
|
||||
"""Console-script entry point for the ``molecule-mcp`` universal MCP server.
|
||||
|
||||
Validates required environment BEFORE importing the heavy
|
||||
``a2a_mcp_server`` module — that module triggers a ``RuntimeError`` at
|
||||
import time when ``WORKSPACE_ID`` is unset (a2a_client.py:22), and
|
||||
console-script entry-point shims surface it as an ugly traceback. This
|
||||
wrapper catches the missing-env case early and prints actionable help
|
||||
to stderr so an operator running ``molecule-mcp`` for the first time
|
||||
gets the right pointer in the first 3 lines of output instead of a
|
||||
20-line traceback.
|
||||
|
||||
Standalone-runtime contract: this wrapper is responsible for keeping
|
||||
the workspace ALIVE on the platform side, not just exposing tools.
|
||||
Concretely it:
|
||||
1. Calls ``POST /registry/register`` once at startup (idempotent —
|
||||
the upsert flips status awaiting_agent → online for an external
|
||||
workspace whose token matches).
|
||||
2. Spawns a daemon heartbeat thread that POSTs to
|
||||
``POST /registry/heartbeat`` every 20s. Without continuous
|
||||
heartbeats the platform's healthsweep flips the workspace back
|
||||
to awaiting_agent (visible as OFFLINE in the canvas with a
|
||||
"Restart" CTA) within 60-90s.
|
||||
3. Runs the MCP stdio loop in the foreground.
|
||||
|
||||
Why threads + sync requests: the MCP stdio server is async. The
|
||||
heartbeat work is fire-and-forget HTTP. A daemon thread is the
|
||||
lowest-friction integration — no asyncio bridging, dies automatically
|
||||
when the main process exits, and ``requests`` is already a transitive
|
||||
dependency via ``a2a-sdk``.
|
||||
|
||||
In-container usage (``python -m molecule_runtime.a2a_mcp_server`` or
|
||||
direct import) bypasses this wrapper — the workspace runtime has its
|
||||
own heartbeat loop in ``heartbeat.py`` so we don't double-heartbeat.
|
||||
|
||||
Module layout (RFC #2873 iter 3 split):
|
||||
* ``mcp_heartbeat`` — register POST + heartbeat loop + auth-failure
|
||||
escalation + inbound-secret persistence.
|
||||
* ``mcp_workspace_resolver`` — env validation, single + multi-workspace
|
||||
resolution, operator-help printer, on-disk token-file read.
|
||||
* ``mcp_inbox_pollers`` — activate the inbox singleton + spawn one
|
||||
daemon poller per workspace.
|
||||
|
||||
This file keeps just ``main()`` plus thin re-exports of the private
|
||||
symbols so existing tests' imports (``mcp_cli._build_agent_card``,
|
||||
``mcp_cli._heartbeat_loop``, etc.) keep working without churn.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
|
||||
import configs_dir
|
||||
import mcp_heartbeat
|
||||
import mcp_inbox_pollers
|
||||
import mcp_workspace_resolver
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Re-export public surface for back-compat with the pre-split callers
|
||||
# and tests. The underscore-prefixed names mirror the names that
|
||||
# existed in this module before the split — keeping them ensures
|
||||
# `mcp_cli._build_agent_card`, `mcp_cli._heartbeat_loop`, etc.
|
||||
# resolve identically to the new functions.
|
||||
HEARTBEAT_INTERVAL_SECONDS = mcp_heartbeat.HEARTBEAT_INTERVAL_SECONDS
|
||||
_HEARTBEAT_AUTH_LOUD_THRESHOLD = mcp_heartbeat.HEARTBEAT_AUTH_LOUD_THRESHOLD
|
||||
_HEARTBEAT_AUTH_RELOG_INTERVAL = mcp_heartbeat.HEARTBEAT_AUTH_RELOG_INTERVAL
|
||||
|
||||
_build_agent_card = mcp_heartbeat.build_agent_card
|
||||
_platform_register = mcp_heartbeat.platform_register
|
||||
_heartbeat_loop = mcp_heartbeat.heartbeat_loop
|
||||
_log_heartbeat_auth_failure = mcp_heartbeat.log_heartbeat_auth_failure
|
||||
_persist_inbound_secret_from_heartbeat = mcp_heartbeat.persist_inbound_secret_from_heartbeat
|
||||
_start_heartbeat_thread = mcp_heartbeat.start_heartbeat_thread
|
||||
|
||||
_resolve_workspaces = mcp_workspace_resolver.resolve_workspaces
|
||||
_print_missing_env_help = mcp_workspace_resolver.print_missing_env_help
|
||||
_read_token_file = mcp_workspace_resolver.read_token_file
|
||||
|
||||
_start_inbox_pollers = mcp_inbox_pollers.start_inbox_pollers
|
||||
|
||||
|
||||
def main() -> None:
|
||||
"""Entry point for the ``molecule-mcp`` console script.
|
||||
|
||||
Returns nothing — calls ``sys.exit`` on validation failure or on
|
||||
normal completion of the underlying MCP server loop.
|
||||
|
||||
Two registration shapes:
|
||||
* Single-workspace (legacy): ``WORKSPACE_ID`` + token env/file.
|
||||
Unchanged behavior.
|
||||
* Multi-workspace: ``MOLECULE_WORKSPACES`` JSON env var with N
|
||||
``{"id": ..., "token": ...}`` entries. One register + heartbeat
|
||||
+ inbox poller per entry; messages from any workspace land in
|
||||
the same agent inbox tagged with ``arrival_workspace_id``.
|
||||
|
||||
Subcommand:
|
||||
``molecule-mcp doctor`` runs an onboarding diagnostic against the
|
||||
current shell environment + platform reachability and exits.
|
||||
Closes Ryan's #2934 item 6.
|
||||
"""
|
||||
# Subcommand dispatch — must come BEFORE env-var validation so
|
||||
# `molecule-mcp doctor` can run on a partially-configured shell
|
||||
# and tell the operator what's missing. Argv shapes:
|
||||
# molecule-mcp → run server (this function's main path)
|
||||
# molecule-mcp doctor → run diagnostic, exit
|
||||
# molecule-mcp --help → defer to doctor for now (no other
|
||||
# flags are supported yet)
|
||||
if len(sys.argv) > 1:
|
||||
if sys.argv[1] in ("doctor", "--doctor"):
|
||||
import mcp_doctor
|
||||
sys.exit(mcp_doctor.run())
|
||||
if sys.argv[1] in ("--help", "-h", "help"):
|
||||
print(
|
||||
"molecule-mcp — Molecule AI universal MCP server\n\n"
|
||||
"Usage:\n"
|
||||
" molecule-mcp Run the MCP stdio server (registers + heartbeats)\n"
|
||||
" molecule-mcp doctor Run onboarding diagnostic + exit\n\n"
|
||||
"Required env: PLATFORM_URL, WORKSPACE_ID (or MOLECULE_WORKSPACES),\n"
|
||||
" MOLECULE_WORKSPACE_TOKEN (or MOLECULE_WORKSPACE_TOKEN_FILE)\n",
|
||||
)
|
||||
sys.exit(0)
|
||||
|
||||
if not os.environ.get("PLATFORM_URL", "").strip():
|
||||
_print_missing_env_help(
|
||||
["PLATFORM_URL"],
|
||||
have_token_file=(configs_dir.resolve() / ".auth_token").is_file(),
|
||||
)
|
||||
sys.exit(2)
|
||||
|
||||
workspaces, errors = _resolve_workspaces()
|
||||
if errors or not workspaces:
|
||||
# Reuse the missing-env help printer for legacy WORKSPACE_ID +
|
||||
# token shape, which is what most first-run operators hit. For
|
||||
# MOLECULE_WORKSPACES errors, print directly so the JSON-shape
|
||||
# message isn't mangled into the WORKSPACE_ID-style help.
|
||||
if os.environ.get("MOLECULE_WORKSPACES", "").strip():
|
||||
print("molecule-mcp: invalid MOLECULE_WORKSPACES:", file=sys.stderr)
|
||||
for e in errors:
|
||||
print(f" - {e}", file=sys.stderr)
|
||||
else:
|
||||
_print_missing_env_help(
|
||||
errors or ["WORKSPACE_ID", "MOLECULE_WORKSPACE_TOKEN"],
|
||||
have_token_file=(configs_dir.resolve() / ".auth_token").is_file(),
|
||||
)
|
||||
sys.exit(2)
|
||||
|
||||
platform_url = os.environ["PLATFORM_URL"].strip().rstrip("/")
|
||||
|
||||
# In multi-workspace mode the FIRST entry is treated as the
|
||||
# "primary" — it gets exported to a2a_client.py's module-level
|
||||
# WORKSPACE_ID (which gates a RuntimeError at import time) and is
|
||||
# used by tools that don't yet take an explicit workspace_id. PR-2
|
||||
# parameterizes those tools; for now this preserves existing
|
||||
# outbound-tool behavior unchanged for single-workspace operators
|
||||
# AND for the multi-workspace operator's first registered
|
||||
# workspace.
|
||||
primary_workspace_id, _primary_token = workspaces[0]
|
||||
os.environ["WORKSPACE_ID"] = primary_workspace_id
|
||||
|
||||
# Configure logging so the operator sees register/heartbeat status
|
||||
# without needing to set up logging themselves. WARNING by default
|
||||
# keeps the steady-state quiet (only failures); MOLECULE_MCP_VERBOSE=1
|
||||
# surfaces register-success + per-tick heartbeat info for debugging.
|
||||
log_level = (
|
||||
logging.INFO
|
||||
if os.environ.get("MOLECULE_MCP_VERBOSE", "").strip()
|
||||
else logging.WARNING
|
||||
)
|
||||
logging.basicConfig(level=log_level, format="[molecule-mcp] %(message)s")
|
||||
|
||||
# Populate the per-workspace token registry so heartbeat threads,
|
||||
# the inbox poller, and (later) outbound tools resolve the right
|
||||
# token for each workspace via ``platform_auth.auth_headers(wsid)``.
|
||||
# Done BEFORE register/heartbeat thread spawn so a thread that
|
||||
# races to fire its first request always sees its token.
|
||||
try:
|
||||
from platform_auth import register_workspace_token
|
||||
for wsid, tok in workspaces:
|
||||
register_workspace_token(wsid, tok)
|
||||
except ImportError:
|
||||
# Older installs that don't yet ship register_workspace_token —
|
||||
# multi-workspace resolution silently degrades to the legacy
|
||||
# single-token path; single-workspace operators see no change.
|
||||
logger.debug("platform_auth.register_workspace_token unavailable; skipping registry populate")
|
||||
|
||||
# Standalone-mode register + heartbeat. Skipped via env var so an
|
||||
# in-container caller (which has its own heartbeat loop) can reuse
|
||||
# this entry point without double-heartbeating. The wheel's main
|
||||
# console-script path always runs them; the
|
||||
# MOLECULE_MCP_DISABLE_HEARTBEAT escape hatch exists for tests +
|
||||
# the rare embedded use-case.
|
||||
if not os.environ.get("MOLECULE_MCP_DISABLE_HEARTBEAT", "").strip():
|
||||
for wsid, tok in workspaces:
|
||||
_platform_register(platform_url, wsid, tok)
|
||||
_start_heartbeat_thread(platform_url, wsid, tok)
|
||||
|
||||
# Inbox poller — the inbound side of the standalone path. Without
|
||||
# this thread, the universal MCP server is OUTBOUND-ONLY: an agent
|
||||
# can call delegate_task / send_message_to_user but never observe
|
||||
# canvas-user or peer-agent messages. One poller per workspace; all
|
||||
# of them write to the SAME shared inbox state so the agent's
|
||||
# inbox_peek/pop/wait tools see a merged view (each message tagged
|
||||
# with arrival_workspace_id so the agent can route the reply).
|
||||
#
|
||||
# Same disable pattern as heartbeat: in-container callers (with
|
||||
# push delivery via canvas WebSocket) skip this to avoid duplicate
|
||||
# delivery; tests use the env to keep imports cheap.
|
||||
if not os.environ.get("MOLECULE_MCP_DISABLE_INBOX", "").strip():
|
||||
_start_inbox_pollers(platform_url, [w[0] for w in workspaces])
|
||||
|
||||
# Env is valid — safe to import the heavy module now. Importing
|
||||
# earlier would trigger a2a_client.py:22's module-level RuntimeError
|
||||
# before our friendly help reaches the user.
|
||||
from a2a_mcp_server import cli_main
|
||||
cli_main()
|
||||
|
||||
|
||||
if __name__ == "__main__": # pragma: no cover
|
||||
main()
|
||||
@@ -1,426 +0,0 @@
|
||||
"""molecule-mcp doctor — diagnostic subcommand for first-run install.
|
||||
|
||||
Run via ``molecule-mcp doctor``. Prints a checklist of common
|
||||
onboarding failure modes and concrete next-step suggestions for each
|
||||
failed check.
|
||||
|
||||
Closes Ryan's #2934 item 6 ("Add a molecule-mcp doctor subcommand —
|
||||
this single command would have saved me 30 of the 45 minutes").
|
||||
Pairs with #2935 (Python>=3.11 callout, PATH guidance, TOKEN_FILE
|
||||
support) — those fixed the snippet, this gives the operator a way to
|
||||
self-diagnose when something still goes wrong.
|
||||
|
||||
Six checks, in operator-encounter order:
|
||||
|
||||
1. Python version — wheel requires >=3.11 (pip says
|
||||
"no versions found" on older).
|
||||
2. Wheel install — molecule_runtime importable + version reported.
|
||||
3. PATH for molecule-mcp — pip user-site installs land at
|
||||
~/Library/Python/3.X/bin which isn't on
|
||||
PATH on a fresh macOS shell. Most common
|
||||
"claude mcp add can't find molecule-mcp"
|
||||
cause.
|
||||
4. Env vars — PLATFORM_URL set + reachable;
|
||||
WORKSPACE_ID set; auth token resolvable
|
||||
(env or *_FILE or .auth_token).
|
||||
5. Platform health — GET ${PLATFORM_URL}/healthz returns 2xx.
|
||||
Catches DNS/firewall/wrong-scheme issues
|
||||
before the operator hits the real
|
||||
register call.
|
||||
6. Token auth — POST ${PLATFORM_URL}/registry/heartbeat
|
||||
with the resolved workspace_id+token
|
||||
returns 2xx. End-to-end auth verification.
|
||||
Uses heartbeat (idempotent timestamp
|
||||
update) instead of register (UPSERT —
|
||||
would clobber agent_card metadata) so
|
||||
the doctor is safe to run against a
|
||||
live workspace.
|
||||
|
||||
Each check prints one of:
|
||||
[OK] <one-line status>
|
||||
[WARN] <one-line status> next: <fix suggestion>
|
||||
[FAIL] <one-line status> next: <fix suggestion>
|
||||
|
||||
Exit 0 if all pass or only WARNs; exit 1 if any FAIL — so the
|
||||
subcommand is scriptable from CI / install-checks too.
|
||||
|
||||
Out of scope for now (deferred follow-ups):
|
||||
- Claude Code-specific checks (parse ~/.claude.json, verify each
|
||||
MCP entry is plugin-sourced + dev-channels flag is set). That's
|
||||
a separate Claude-Code-specific doctor and lives in the
|
||||
claude-code-channel plugin, not the universal-MCP doctor.
|
||||
- Automated remediation (running the suggested fix). Doctor is
|
||||
a diagnostic tool — it tells the operator what's wrong + how
|
||||
to fix it, doesn't apply changes.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import importlib
|
||||
import importlib.metadata
|
||||
import os
|
||||
import shutil
|
||||
import sys
|
||||
from typing import Optional
|
||||
|
||||
# urllib avoids a hard dep on `requests` for the doctor — the real
|
||||
# CLI already imports requests via mcp_heartbeat, but doctor should
|
||||
# keep working even on a partial install where requests is missing
|
||||
# (that itself is a finding worth surfacing).
|
||||
from urllib import request as urllib_request
|
||||
from urllib.error import URLError
|
||||
|
||||
|
||||
# ANSI colors are friendly on TTYs; auto-disable on pipe / NO_COLOR
|
||||
# for CI logs where the escape sequences clutter the diff.
|
||||
def _color(name: str) -> str:
|
||||
if not sys.stdout.isatty() or os.environ.get("NO_COLOR"):
|
||||
return ""
|
||||
return {
|
||||
"green": "\033[32m",
|
||||
"yellow": "\033[33m",
|
||||
"red": "\033[31m",
|
||||
"dim": "\033[2m",
|
||||
"reset": "\033[0m",
|
||||
}.get(name, "")
|
||||
|
||||
|
||||
def _ok(label: str, msg: str) -> None:
|
||||
print(f" {_color('green')}[OK]{_color('reset')} {label}: {msg}")
|
||||
|
||||
|
||||
def _warn(label: str, msg: str, fix: str) -> None:
|
||||
print(f" {_color('yellow')}[WARN]{_color('reset')} {label}: {msg}")
|
||||
print(f" {_color('dim')}next:{_color('reset')} {fix}")
|
||||
|
||||
|
||||
def _fail(label: str, msg: str, fix: str) -> None:
|
||||
print(f" {_color('red')}[FAIL]{_color('reset')} {label}: {msg}")
|
||||
print(f" {_color('dim')}next:{_color('reset')} {fix}")
|
||||
|
||||
|
||||
# Each check returns a "ok" | "warn" | "fail" verdict so the caller
|
||||
# can compute an exit code without re-walking the print stream.
|
||||
Verdict = str # "ok" | "warn" | "fail"
|
||||
|
||||
|
||||
def check_python_version() -> Verdict:
|
||||
label = "Python version"
|
||||
major, minor = sys.version_info[:2]
|
||||
if (major, minor) >= (3, 11):
|
||||
_ok(label, f"Python {major}.{minor} (wheel requires >=3.11)")
|
||||
return "ok"
|
||||
_fail(
|
||||
label,
|
||||
f"Python {major}.{minor} is below the wheel's >=3.11 floor",
|
||||
"upgrade Python (brew install python@3.12 / apt install python3.12) "
|
||||
"or run molecule-mcp via a 3.11+ venv.",
|
||||
)
|
||||
return "fail"
|
||||
|
||||
|
||||
def check_wheel_install() -> Verdict:
|
||||
label = "Wheel install"
|
||||
try:
|
||||
version = importlib.metadata.version("molecule-ai-workspace-runtime")
|
||||
except importlib.metadata.PackageNotFoundError:
|
||||
_fail(
|
||||
label,
|
||||
"molecule-ai-workspace-runtime not found in this interpreter's site-packages",
|
||||
"pip install molecule-ai-workspace-runtime "
|
||||
"(or pipx install molecule-ai-workspace-runtime to get the "
|
||||
"binary on PATH automatically).",
|
||||
)
|
||||
return "fail"
|
||||
try:
|
||||
importlib.import_module("molecule_runtime.mcp_cli")
|
||||
except ImportError as e:
|
||||
_fail(
|
||||
label,
|
||||
f"package found ({version}) but `molecule_runtime.mcp_cli` won't import: {e}",
|
||||
"reinstall the wheel (pip install --force-reinstall "
|
||||
"molecule-ai-workspace-runtime); if it still fails, file "
|
||||
"a bug with the traceback.",
|
||||
)
|
||||
return "fail"
|
||||
_ok(label, f"molecule-ai-workspace-runtime=={version}")
|
||||
return "ok"
|
||||
|
||||
|
||||
def check_path_for_binary() -> Verdict:
|
||||
label = "PATH for molecule-mcp"
|
||||
found = shutil.which("molecule-mcp")
|
||||
if found:
|
||||
_ok(label, f"resolves to {found}")
|
||||
return "ok"
|
||||
# Not on PATH — work out where pip put it so the suggestion is
|
||||
# actionable instead of generic.
|
||||
user_base = os.environ.get("PYTHONUSERBASE")
|
||||
if not user_base:
|
||||
try:
|
||||
import site
|
||||
user_base = site.getuserbase()
|
||||
except Exception:
|
||||
user_base = None
|
||||
hint = (
|
||||
f"add `{user_base}/bin` to PATH"
|
||||
if user_base
|
||||
else "switch to `pipx install molecule-ai-workspace-runtime` so the "
|
||||
"binary lands in pipx's managed bin/ on PATH"
|
||||
)
|
||||
_fail(
|
||||
label,
|
||||
"molecule-mcp not found on PATH",
|
||||
f"{hint}, or invoke via `python -m molecule_runtime.mcp_cli` directly.",
|
||||
)
|
||||
return "fail"
|
||||
|
||||
|
||||
def _resolve_token() -> tuple[Optional[str], Optional[str]]:
|
||||
"""Return ``(token_value, source_label)`` if the operator's
|
||||
environment exposes a token, else ``(None, None)``.
|
||||
|
||||
Single source of truth used by both ``check_env_vars()`` (which
|
||||
only needs the source label) and ``check_register()`` (which
|
||||
needs the actual value to send a Bearer header). Keeping these
|
||||
in one place means a future env-var addition only updates the
|
||||
resolver — not two parallel readers that can drift.
|
||||
"""
|
||||
val = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
|
||||
if val:
|
||||
return val, "env MOLECULE_WORKSPACE_TOKEN"
|
||||
file_var = os.environ.get("MOLECULE_WORKSPACE_TOKEN_FILE", "").strip()
|
||||
if file_var:
|
||||
if os.path.isfile(file_var):
|
||||
try:
|
||||
from pathlib import Path as _Path
|
||||
return (
|
||||
_Path(file_var).read_text().strip(),
|
||||
f"file {file_var} (via MOLECULE_WORKSPACE_TOKEN_FILE)",
|
||||
)
|
||||
except OSError:
|
||||
return None, None
|
||||
return None, None
|
||||
# Per-runtime container path used by the in-platform path; rarely
|
||||
# set on external setups but check anyway so the message is
|
||||
# accurate for both shapes.
|
||||
try:
|
||||
import configs_dir
|
||||
candidate = configs_dir.resolve() / ".auth_token"
|
||||
if candidate.is_file():
|
||||
try:
|
||||
return candidate.read_text().strip(), f"file {candidate}"
|
||||
except OSError:
|
||||
return None, None
|
||||
except Exception:
|
||||
pass
|
||||
return None, None
|
||||
|
||||
|
||||
def _resolve_token_summary() -> Optional[str]:
|
||||
"""Return just the source label (no secret value). Convenience
|
||||
wrapper around :func:`_resolve_token` for callers that don't
|
||||
need the value itself.
|
||||
"""
|
||||
_, label = _resolve_token()
|
||||
return label
|
||||
|
||||
|
||||
def check_env_vars() -> Verdict:
|
||||
label = "Env vars"
|
||||
missing: list[str] = []
|
||||
if not os.environ.get("PLATFORM_URL", "").strip():
|
||||
missing.append("PLATFORM_URL")
|
||||
if not os.environ.get("WORKSPACE_ID", "").strip() and not os.environ.get(
|
||||
"MOLECULE_WORKSPACES", "",
|
||||
).strip():
|
||||
missing.append("WORKSPACE_ID (or MOLECULE_WORKSPACES)")
|
||||
token_summary = _resolve_token_summary()
|
||||
if not token_summary and not os.environ.get("MOLECULE_WORKSPACES", "").strip():
|
||||
# MOLECULE_WORKSPACES is a JSON-array env that bundles its
|
||||
# own per-workspace tokens — if it's set we trust the
|
||||
# resolver to validate.
|
||||
missing.append(
|
||||
"MOLECULE_WORKSPACE_TOKEN (or MOLECULE_WORKSPACE_TOKEN_FILE, or "
|
||||
"/configs/.auth_token)",
|
||||
)
|
||||
if missing:
|
||||
_fail(
|
||||
label,
|
||||
f"unset: {', '.join(missing)}",
|
||||
"see the canvas Connect-External-Agent modal — the snippet "
|
||||
"exports all three. Use MOLECULE_WORKSPACE_TOKEN_FILE for the "
|
||||
"token to keep secrets out of shell history.",
|
||||
)
|
||||
return "fail"
|
||||
_ok(
|
||||
label,
|
||||
f"PLATFORM_URL + WORKSPACE_ID set; token from {token_summary or 'MOLECULE_WORKSPACES'}",
|
||||
)
|
||||
return "ok"
|
||||
|
||||
|
||||
def _http_get(url: str, timeout: float = 5.0) -> tuple[Optional[int], Optional[str]]:
|
||||
"""Best-effort GET that swallows transport errors and returns
|
||||
(status, error_message). Status is None when the request couldn't
|
||||
complete; error_message is None when the request returned 2xx.
|
||||
"""
|
||||
try:
|
||||
# Origin header — staging tenants enforce same-origin via WAF;
|
||||
# /healthz tolerates either way but matching production headers
|
||||
# surfaces auth-style 401s correctly during the doctor run.
|
||||
req = urllib_request.Request(
|
||||
url,
|
||||
headers={"Origin": os.environ.get("PLATFORM_URL", "").rstrip("/")},
|
||||
)
|
||||
with urllib_request.urlopen(req, timeout=timeout) as resp:
|
||||
return resp.status, None
|
||||
except URLError as e:
|
||||
return None, str(e.reason if hasattr(e, "reason") else e)
|
||||
except Exception as e:
|
||||
return None, str(e)
|
||||
|
||||
|
||||
def check_platform_health() -> Verdict:
|
||||
label = "Platform reachability"
|
||||
base = os.environ.get("PLATFORM_URL", "").strip().rstrip("/")
|
||||
if not base:
|
||||
_warn(label, "skipped (PLATFORM_URL unset — see Env vars)", "set PLATFORM_URL first")
|
||||
return "warn"
|
||||
if not base.startswith(("http://", "https://")):
|
||||
_fail(
|
||||
label,
|
||||
f"PLATFORM_URL missing scheme: {base!r}",
|
||||
"set PLATFORM_URL to include https:// — e.g. "
|
||||
"PLATFORM_URL=https://your-tenant.staging.moleculesai.app",
|
||||
)
|
||||
return "fail"
|
||||
if base.endswith("/"):
|
||||
_warn(
|
||||
label,
|
||||
"PLATFORM_URL has trailing slash (will be stripped automatically)",
|
||||
"remove the trailing slash to match the snippet shape",
|
||||
)
|
||||
status, err = _http_get(f"{base}/healthz")
|
||||
if status is None:
|
||||
_fail(label, f"GET {base}/healthz failed: {err}", "check DNS + firewall + scheme")
|
||||
return "fail"
|
||||
if not (200 <= status < 300):
|
||||
_fail(label, f"GET {base}/healthz returned HTTP {status}", "verify the tenant subdomain is correct + provisioned")
|
||||
return "fail"
|
||||
_ok(label, f"GET {base}/healthz → {status}")
|
||||
return "ok"
|
||||
|
||||
|
||||
def check_token_auth() -> Verdict:
|
||||
"""Light auth check via POST /registry/heartbeat.
|
||||
|
||||
Why heartbeat and not register: register is an UPSERT — sending
|
||||
it from doctor would clobber the workspace's actual agent_card
|
||||
(name, description, version) until the real agent next calls
|
||||
register. That's an invisible production-disruption: someone
|
||||
runs ``molecule-mcp doctor`` against a live workspace and the
|
||||
canvas briefly displays "doctor-probe" as the agent name.
|
||||
|
||||
Heartbeat only updates last_heartbeat_at (and clears
|
||||
awaiting_agent if needed) — that's exactly what a normal
|
||||
molecule-mcp boot does every 20s, so an extra heartbeat from
|
||||
the doctor is indistinguishable from background traffic.
|
||||
|
||||
Skipped when env vars failed earlier so the operator isn't shown
|
||||
a redundant 401.
|
||||
"""
|
||||
label = "Token auth"
|
||||
base = os.environ.get("PLATFORM_URL", "").strip().rstrip("/")
|
||||
workspace_id = os.environ.get("WORKSPACE_ID", "").strip()
|
||||
token, source_label = _resolve_token()
|
||||
if not (base and workspace_id and token):
|
||||
_warn(label, "skipped (Env vars must pass first)", "fix Env vars, re-run")
|
||||
return "warn"
|
||||
import json
|
||||
body = json.dumps({"id": workspace_id}).encode()
|
||||
req = urllib_request.Request(
|
||||
f"{base}/registry/heartbeat",
|
||||
data=body,
|
||||
method="POST",
|
||||
headers={
|
||||
"Authorization": f"Bearer {token}",
|
||||
"Content-Type": "application/json",
|
||||
"Origin": base,
|
||||
},
|
||||
)
|
||||
try:
|
||||
with urllib_request.urlopen(req, timeout=8.0) as resp:
|
||||
status = resp.status
|
||||
except URLError as e:
|
||||
# Pull HTTP code from HTTPError; transport errors don't have one.
|
||||
status = getattr(e, "code", None)
|
||||
err = str(e.reason if hasattr(e, "reason") else e)
|
||||
if status is None:
|
||||
_fail(label, f"POST {base}/registry/heartbeat failed: {err}", "check network")
|
||||
return "fail"
|
||||
except Exception as e:
|
||||
_fail(label, f"POST heartbeat failed: {e}", "check network")
|
||||
return "fail"
|
||||
if status == 401:
|
||||
_fail(
|
||||
label,
|
||||
"401 Unauthorized — token rejected",
|
||||
"tokens are shown only once at workspace-create time; "
|
||||
"re-create the workspace OR rotate via canvas Tokens tab.",
|
||||
)
|
||||
return "fail"
|
||||
if status == 404:
|
||||
_fail(
|
||||
label,
|
||||
f"404 — workspace_id {workspace_id} not found on {base}",
|
||||
"verify WORKSPACE_ID matches a real workspace + the tenant "
|
||||
"subdomain in PLATFORM_URL.",
|
||||
)
|
||||
return "fail"
|
||||
if not (200 <= status < 300):
|
||||
_fail(label, f"POST heartbeat returned HTTP {status}", "see platform logs")
|
||||
return "fail"
|
||||
_ok(label, f"POST {base}/registry/heartbeat → {status} (token from {source_label})")
|
||||
return "ok"
|
||||
|
||||
|
||||
# Back-compat alias: the previous name was check_register, but the
|
||||
# implementation switched to a non-mutating heartbeat probe (see
|
||||
# check_token_auth's docstring). Kept so external test suites or
|
||||
# pinned-import scripts don't break on the rename.
|
||||
check_register = check_token_auth
|
||||
|
||||
|
||||
CHECKS = [
|
||||
check_python_version,
|
||||
check_wheel_install,
|
||||
check_path_for_binary,
|
||||
check_env_vars,
|
||||
check_platform_health,
|
||||
check_token_auth,
|
||||
]
|
||||
|
||||
|
||||
def run() -> int:
|
||||
"""Run all checks and return a process exit code (0 ok, 1 if any fail)."""
|
||||
print("molecule-mcp doctor — onboarding diagnostic")
|
||||
print()
|
||||
verdicts = []
|
||||
for chk in CHECKS:
|
||||
try:
|
||||
verdicts.append(chk())
|
||||
except Exception as e:
|
||||
# A buggy check shouldn't kill the rest of the doctor run.
|
||||
print(f" [BUG] {chk.__name__}: unexpected {type(e).__name__}: {e}")
|
||||
verdicts.append("fail")
|
||||
print()
|
||||
fails = sum(1 for v in verdicts if v == "fail")
|
||||
warns = sum(1 for v in verdicts if v == "warn")
|
||||
if fails:
|
||||
print(f"{fails} check(s) failed, {warns} warning(s). Fix the FAIL items above and re-run.")
|
||||
return 1
|
||||
if warns:
|
||||
print(f"All required checks passed; {warns} warning(s) — review the next-step hints.")
|
||||
return 0
|
||||
print("All checks passed.")
|
||||
return 0
|
||||
@@ -1,325 +0,0 @@
|
||||
"""Heartbeat + register thread for the standalone ``molecule-mcp`` wrapper.
|
||||
|
||||
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3) so the heartbeat /
|
||||
register concern lives in its own module. The console-script entry
|
||||
``mcp_cli:main`` still drives the spawn, but the loop body, auth-failure
|
||||
escalation, and inbound-secret persistence now live here so they can be
|
||||
read, tested, and replaced independently of the orchestrator.
|
||||
|
||||
Public surface:
|
||||
|
||||
* ``HEARTBEAT_INTERVAL_SECONDS`` — cadence constant.
|
||||
* ``build_agent_card(workspace_id)`` — payload helper.
|
||||
* ``platform_register(platform_url, workspace_id, token)`` — one-shot
|
||||
POST /registry/register at startup.
|
||||
* ``start_heartbeat_thread(platform_url, workspace_id, token)`` — spawn
|
||||
the daemon thread.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import threading
|
||||
import time
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Heartbeat cadence. Must be tighter than healthsweep's stale window
|
||||
# (currently 60-90s — see registry/healthsweep.go) by a comfortable
|
||||
# margin so a single missed heartbeat doesn't flip awaiting_agent.
|
||||
# 20s gives the operator's network 3 attempts within the budget; long
|
||||
# enough that it doesn't spam, short enough to recover quickly after
|
||||
# laptop sleep.
|
||||
HEARTBEAT_INTERVAL_SECONDS = 20.0
|
||||
|
||||
# After this many consecutive 401/403 heartbeats, escalate from
|
||||
# WARNING to ERROR with re-onboard guidance. 3 ticks at 20s = ~1 minute
|
||||
# of sustained auth failure — enough to rule out a transient platform
|
||||
# blip but quick enough that an operator doesn't sit puzzled for 10
|
||||
# minutes wondering why their MCP tools 401. Same threshold used for
|
||||
# repeat-logging at 20-tick (~7 min) intervals so a long-running
|
||||
# session that missed the first ERROR still sees the message.
|
||||
HEARTBEAT_AUTH_LOUD_THRESHOLD = 3
|
||||
HEARTBEAT_AUTH_RELOG_INTERVAL = 20
|
||||
|
||||
|
||||
def build_agent_card(workspace_id: str) -> dict:
|
||||
"""Build the ``agent_card`` payload sent to /registry/register.
|
||||
|
||||
Three optional env vars override the defaults so an operator can
|
||||
surface human-readable identity + capabilities to peers and the
|
||||
canvas Skills tab without code changes:
|
||||
|
||||
* ``MOLECULE_AGENT_NAME`` — display name (defaults to
|
||||
``molecule-mcp-{id[:8]}``). Surfaced in canvas workspace cards
|
||||
and ``list_peers`` output.
|
||||
* ``MOLECULE_AGENT_DESCRIPTION`` — one-liner about the agent's
|
||||
purpose. Rendered in canvas Details + Skills tabs.
|
||||
* ``MOLECULE_AGENT_SKILLS`` — comma-separated skill names
|
||||
(e.g. ``research,code-review,memory-curation``). Each name is
|
||||
expanded to a ``{"name": ...}`` skill object — the minimum
|
||||
shape that satisfies both ``shared_runtime.summarize_peers``
|
||||
(uses ``s["name"]``) and the canvas SkillsTab.tsx schema
|
||||
(id falls back to name when omitted). Empty / whitespace
|
||||
entries are dropped.
|
||||
|
||||
Defaults match the previous hardcoded behaviour exactly so this
|
||||
is a strict superset — an operator who sets none of the env vars
|
||||
sees no change.
|
||||
"""
|
||||
name = (os.environ.get("MOLECULE_AGENT_NAME") or "").strip()
|
||||
if not name:
|
||||
name = f"molecule-mcp-{workspace_id[:8]}"
|
||||
|
||||
description = (os.environ.get("MOLECULE_AGENT_DESCRIPTION") or "").strip()
|
||||
|
||||
skills_raw = (os.environ.get("MOLECULE_AGENT_SKILLS") or "").strip()
|
||||
skills: list[dict] = []
|
||||
if skills_raw:
|
||||
for s in skills_raw.split(","):
|
||||
label = s.strip()
|
||||
if label:
|
||||
skills.append({"name": label})
|
||||
|
||||
card: dict = {"name": name, "skills": skills}
|
||||
if description:
|
||||
card["description"] = description
|
||||
return card
|
||||
|
||||
|
||||
def platform_register(platform_url: str, workspace_id: str, token: str) -> None:
|
||||
"""One-shot register at startup; fails fast on auth errors.
|
||||
|
||||
Lifts the workspace from ``awaiting_agent`` to ``online`` for
|
||||
operators who never ran the curl-register snippet. Safe to call
|
||||
repeatedly: the platform's register handler is an upsert that
|
||||
just refreshes ``url``, ``agent_card``, and ``status``.
|
||||
|
||||
Failure model (post-review):
|
||||
- 401 / 403 → ``sys.exit(3)`` immediately. The operator's
|
||||
token is wrong; silently looping in a broken state would
|
||||
make this hard to diagnose because the MCP tools would 401
|
||||
on every call too. Hard-fail is the kindest option.
|
||||
- Other 4xx/5xx → log a warning + continue. The heartbeat
|
||||
thread will surface persistent failures; transient platform
|
||||
blips shouldn't abort the MCP loop.
|
||||
- Network / transport errors → log + continue. Same reasoning.
|
||||
|
||||
Origin header is required by the SaaS edge WAF; without it
|
||||
/registry/register currently still works (it's on the WAF
|
||||
allowlist), but the heartbeat path needs Origin and we want one
|
||||
consistent header set across both calls.
|
||||
"""
|
||||
try:
|
||||
import httpx
|
||||
except ImportError:
|
||||
# httpx is a transitive dep via a2a-sdk; if missing, the MCP
|
||||
# server won't import either. Let the caller's later import
|
||||
# surface the real error.
|
||||
return
|
||||
|
||||
payload = {
|
||||
"id": workspace_id,
|
||||
"url": "",
|
||||
"agent_card": build_agent_card(workspace_id),
|
||||
"delivery_mode": "poll",
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {token}",
|
||||
"Origin": platform_url,
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
try:
|
||||
with httpx.Client(timeout=10.0) as client:
|
||||
resp = client.post(
|
||||
f"{platform_url}/registry/register",
|
||||
json=payload,
|
||||
headers=headers,
|
||||
)
|
||||
if resp.status_code in (401, 403):
|
||||
print(
|
||||
f"molecule-mcp: register rejected with HTTP {resp.status_code} — "
|
||||
f"the token in MOLECULE_WORKSPACE_TOKEN is invalid for workspace "
|
||||
f"{workspace_id}. Regenerate from the canvas → Tokens tab.",
|
||||
file=sys.stderr,
|
||||
)
|
||||
sys.exit(3)
|
||||
if resp.status_code >= 400:
|
||||
logger.warning(
|
||||
"molecule-mcp: register POST returned HTTP %d: %s",
|
||||
resp.status_code,
|
||||
(resp.text or "")[:200],
|
||||
)
|
||||
else:
|
||||
logger.info(
|
||||
"molecule-mcp: registered workspace %s with platform",
|
||||
workspace_id,
|
||||
)
|
||||
except SystemExit:
|
||||
raise
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("molecule-mcp: register POST failed: %s", exc)
|
||||
|
||||
|
||||
def heartbeat_loop(
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
token: str,
|
||||
interval: float = HEARTBEAT_INTERVAL_SECONDS,
|
||||
) -> None:
|
||||
"""Daemon thread body: POST /registry/heartbeat every ``interval``s.
|
||||
|
||||
Failures are logged at WARNING and the loop continues. The thread
|
||||
exits when the main process does (daemon=True). Each iteration
|
||||
rebuilds the payload + headers — cheap and ensures token rotation
|
||||
via env var (rare but possible) is picked up on the next tick.
|
||||
"""
|
||||
try:
|
||||
import httpx
|
||||
except ImportError:
|
||||
return
|
||||
|
||||
start_time = time.time()
|
||||
consecutive_auth_failures = 0
|
||||
while True:
|
||||
body = {
|
||||
"workspace_id": workspace_id,
|
||||
"error_rate": 0.0,
|
||||
"sample_error": "",
|
||||
"active_tasks": 0,
|
||||
"uptime_seconds": int(time.time() - start_time),
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {token}",
|
||||
"Origin": platform_url,
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
try:
|
||||
with httpx.Client(timeout=10.0) as client:
|
||||
resp = client.post(
|
||||
f"{platform_url}/registry/heartbeat",
|
||||
json=body,
|
||||
headers=headers,
|
||||
)
|
||||
if resp.status_code in (401, 403):
|
||||
consecutive_auth_failures += 1
|
||||
log_heartbeat_auth_failure(
|
||||
consecutive_auth_failures, workspace_id, resp.status_code,
|
||||
)
|
||||
elif resp.status_code >= 400:
|
||||
# Non-auth HTTP error — log, but DO NOT touch the
|
||||
# auth-failure counter (5xx blips, 429, etc. are
|
||||
# transient and unrelated to token validity).
|
||||
logger.warning(
|
||||
"molecule-mcp: heartbeat HTTP %d: %s",
|
||||
resp.status_code,
|
||||
(resp.text or "")[:200],
|
||||
)
|
||||
else:
|
||||
consecutive_auth_failures = 0
|
||||
persist_inbound_secret_from_heartbeat(resp)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning("molecule-mcp: heartbeat failed: %s", exc)
|
||||
time.sleep(interval)
|
||||
|
||||
|
||||
def log_heartbeat_auth_failure(count: int, workspace_id: str, status_code: int) -> None:
|
||||
"""Escalate consecutive heartbeat 401/403s from quiet WARNING to
|
||||
actionable ERROR.
|
||||
|
||||
The operator's first sign of trouble shouldn't be "tools 401 with no
|
||||
explanation" — that was the failure mode that motivated this code,
|
||||
triggered by a workspace being deleted server-side and its tokens
|
||||
revoked while the runtime kept heartbeating in silence.
|
||||
|
||||
Cadence:
|
||||
* count < threshold: WARNING per tick (transient — could be a
|
||||
platform blip, don't shout yet)
|
||||
* count == threshold: ERROR with re-onboard instructions
|
||||
(the first signal the operator can't miss)
|
||||
* count > threshold and (count - threshold) % relog == 0: re-log
|
||||
ERROR (so a session that started after the first ERROR still
|
||||
sees the message scrolling past in their logs)
|
||||
"""
|
||||
if count < HEARTBEAT_AUTH_LOUD_THRESHOLD:
|
||||
logger.warning(
|
||||
"molecule-mcp: heartbeat HTTP %d (auth failure %d/%d) — "
|
||||
"token may be revoked. Will retry; if persistent, regenerate "
|
||||
"from canvas → Tokens.",
|
||||
status_code, count, HEARTBEAT_AUTH_LOUD_THRESHOLD,
|
||||
)
|
||||
return
|
||||
# At or past the threshold — this is the loud actionable error.
|
||||
if count == HEARTBEAT_AUTH_LOUD_THRESHOLD or (
|
||||
count - HEARTBEAT_AUTH_LOUD_THRESHOLD
|
||||
) % HEARTBEAT_AUTH_RELOG_INTERVAL == 0:
|
||||
logger.error(
|
||||
"molecule-mcp: %d consecutive heartbeat auth failures (HTTP %d) — "
|
||||
"the token in MOLECULE_WORKSPACE_TOKEN has been REVOKED, likely "
|
||||
"because workspace %s was deleted server-side. The MCP server is "
|
||||
"still running but every platform call will fail. Regenerate the "
|
||||
"workspace + token from the canvas (Tokens tab), update your MCP "
|
||||
"config, and restart your runtime.",
|
||||
count, status_code, workspace_id,
|
||||
)
|
||||
|
||||
|
||||
def persist_inbound_secret_from_heartbeat(resp: object) -> None:
|
||||
"""Persist ``platform_inbound_secret`` from a heartbeat response, if any.
|
||||
|
||||
The platform's heartbeat handler returns the secret on every beat
|
||||
(mirroring /registry/register) so a workspace that lazy-healed the
|
||||
secret on the platform side — typical recovery path for a workspace
|
||||
whose row had a NULL ``platform_inbound_secret`` after a partial
|
||||
bootstrap — picks it up within one heartbeat tick instead of
|
||||
requiring a runtime restart.
|
||||
|
||||
Without this delivery path the chat-upload code path's "secret was
|
||||
just minted, will pick up on next heartbeat" 503 message is a lie
|
||||
and the workspace stays 401-forever until the operator restarts
|
||||
the runtime. Caught 2026-04-30 on hongmingwang tenant.
|
||||
|
||||
Failure is non-fatal: if the body isn't JSON, doesn't carry the
|
||||
field, or the disk write fails, the next heartbeat retries. This
|
||||
matches the cold-start register flow in main.py:319-323.
|
||||
"""
|
||||
try:
|
||||
body = resp.json()
|
||||
except Exception: # noqa: BLE001
|
||||
return
|
||||
if not isinstance(body, dict):
|
||||
return
|
||||
secret = body.get("platform_inbound_secret")
|
||||
if not secret:
|
||||
return
|
||||
try:
|
||||
from platform_inbound_auth import save_inbound_secret
|
||||
|
||||
save_inbound_secret(secret)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
logger.warning(
|
||||
"molecule-mcp: persist inbound secret from heartbeat failed: %s", exc
|
||||
)
|
||||
|
||||
|
||||
def start_heartbeat_thread(
|
||||
platform_url: str,
|
||||
workspace_id: str,
|
||||
token: str,
|
||||
) -> threading.Thread:
|
||||
"""Start the heartbeat daemon thread. Returns the Thread handle.
|
||||
|
||||
The MCP stdio loop runs in the foreground (asyncio); this thread
|
||||
runs alongside it. ``daemon=True`` so when the operator hits
|
||||
Ctrl-C / closes the runtime, the heartbeat dies with it instead
|
||||
of leaking and writing to a stale workspace.
|
||||
"""
|
||||
t = threading.Thread(
|
||||
target=heartbeat_loop,
|
||||
args=(platform_url, workspace_id, token),
|
||||
name="molecule-mcp-heartbeat",
|
||||
daemon=True,
|
||||
)
|
||||
t.start()
|
||||
return t
|
||||
@@ -1,63 +0,0 @@
|
||||
"""Inbox-poller spawn helpers for the standalone ``molecule-mcp`` wrapper.
|
||||
|
||||
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3). The poller is the
|
||||
INBOUND side of the standalone path — without it, the universal MCP
|
||||
server is outbound-only (can call ``delegate_task`` /
|
||||
``send_message_to_user``, never observes canvas-user / peer-agent
|
||||
messages).
|
||||
|
||||
Public surface:
|
||||
|
||||
* ``start_inbox_pollers(platform_url, workspace_ids)`` — activate the
|
||||
inbox singleton and spawn one daemon poller per workspace.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def start_inbox_pollers(platform_url: str, workspace_ids: list[str]) -> None:
|
||||
"""Activate the inbox singleton + spawn one poller daemon thread per workspace.
|
||||
|
||||
Done lazily here (not at module import) because importing inbox
|
||||
pulls in platform_auth, which only resolves cleanly AFTER env
|
||||
validation succeeds. Activation is idempotent within a process,
|
||||
so a stray double-call (e.g. test harness re-entering main) is
|
||||
harmless.
|
||||
|
||||
The poller threads are daemon=True — die with the main process.
|
||||
|
||||
Single-workspace path: one poller, single cursor file at the legacy
|
||||
location (``.mcp_inbox_cursor``). Cursor-key resolution falls back
|
||||
to the empty string for back-compat with operators whose existing
|
||||
on-disk cursor was written by the pre-multi-workspace code.
|
||||
|
||||
Multi-workspace path: N pollers, each with its own cursor file
|
||||
keyed by ``workspace_id[:8]``. Cursors live next to each other in
|
||||
configs_dir so an operator inspecting state sees all of them
|
||||
together.
|
||||
"""
|
||||
try:
|
||||
import inbox
|
||||
except ImportError as exc:
|
||||
logger.warning("molecule-mcp: inbox module unavailable: %s", exc)
|
||||
return
|
||||
|
||||
if len(workspace_ids) <= 1:
|
||||
# Back-compat exact: single-workspace mode reuses the legacy
|
||||
# cursor filename + cursor_path constructor arg, so an existing
|
||||
# operator's on-disk state isn't invalidated by upgrade.
|
||||
wsid = workspace_ids[0]
|
||||
state = inbox.InboxState(cursor_path=inbox.default_cursor_path())
|
||||
inbox.activate(state)
|
||||
inbox.start_poller_thread(state, platform_url, wsid)
|
||||
return
|
||||
|
||||
# Multi-workspace: per-workspace cursor file, one shared queue.
|
||||
cursor_paths = {wsid: inbox.default_cursor_path(wsid) for wsid in workspace_ids}
|
||||
state = inbox.InboxState(cursor_paths=cursor_paths)
|
||||
inbox.activate(state)
|
||||
for wsid in workspace_ids:
|
||||
inbox.start_poller_thread(state, platform_url, wsid)
|
||||
@@ -1,240 +0,0 @@
|
||||
"""Env validation + workspace resolution for the standalone ``molecule-mcp``.
|
||||
|
||||
Extracted from ``mcp_cli.py`` (RFC #2873 iter 3). Deals with the two
|
||||
shapes ``molecule-mcp`` accepts:
|
||||
|
||||
* Single-workspace legacy shape: ``WORKSPACE_ID`` + token from
|
||||
``MOLECULE_WORKSPACE_TOKEN`` or ``${CONFIGS_DIR}/.auth_token``.
|
||||
* Multi-workspace JSON shape: ``MOLECULE_WORKSPACES`` env var carries a
|
||||
JSON array of ``{"id": ..., "token": ...}`` entries.
|
||||
|
||||
Public surface:
|
||||
|
||||
* ``resolve_workspaces()`` → ``(workspaces, errors)``.
|
||||
* ``read_token_file()`` → token text or ``""``.
|
||||
* ``print_missing_env_help(missing, have_token_file)`` — operator-help
|
||||
printer.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
|
||||
import configs_dir
|
||||
|
||||
|
||||
def resolve_workspaces() -> tuple[list[tuple[str, str]], list[str]]:
|
||||
"""Return the list of ``(workspace_id, token)`` pairs to register.
|
||||
|
||||
Resolution order:
|
||||
|
||||
1. ``MOLECULE_WORKSPACES`` env var — JSON array of
|
||||
``{"id": "...", "token": "..."}`` objects. Activates the
|
||||
multi-workspace external-agent path (one process registered into
|
||||
N workspaces). When set, ``WORKSPACE_ID`` / ``MOLECULE_WORKSPACE_TOKEN``
|
||||
are IGNORED — the JSON is the source of truth.
|
||||
|
||||
2. Single-workspace fallback — ``WORKSPACE_ID`` env var + token
|
||||
resolved in this order:
|
||||
a. ``MOLECULE_WORKSPACE_TOKEN`` (inline env — convenient but
|
||||
leaks into shell history + plaintext MCP-host config).
|
||||
b. ``MOLECULE_WORKSPACE_TOKEN_FILE`` (path to a file holding
|
||||
the token — operator can keep it 0600 in their home dir;
|
||||
survives shell-history scrubs).
|
||||
c. ``${CONFIGS_DIR}/.auth_token`` (in-container runtimes —
|
||||
the platform writes this on provision).
|
||||
|
||||
Returns ``(workspaces, errors)``:
|
||||
* ``workspaces``: list of ``(workspace_id, token)`` — non-empty
|
||||
on the happy path.
|
||||
* ``errors``: human-readable strings describing what's missing /
|
||||
malformed. ``main()`` surfaces these with the same shape as
|
||||
``print_missing_env_help`` so the operator's first run gives
|
||||
actionable output.
|
||||
|
||||
Why JSON env (not file): ergonomic for Claude Code MCP config (one
|
||||
string in ``mcpServers.molecule.env`` instead of a sidecar file)
|
||||
and for CI / launchers. A separate config-file path can be added
|
||||
later without breaking this.
|
||||
"""
|
||||
raw = os.environ.get("MOLECULE_WORKSPACES", "").strip()
|
||||
if raw:
|
||||
try:
|
||||
parsed = json.loads(raw)
|
||||
except json.JSONDecodeError as exc:
|
||||
return [], [
|
||||
f"MOLECULE_WORKSPACES is not valid JSON ({exc.msg} at pos "
|
||||
f"{exc.pos}). Expected: '[{{\"id\":\"<wsid>\",\"token\":"
|
||||
f"\"<tok>\"}},{{...}}]'"
|
||||
]
|
||||
if not isinstance(parsed, list) or not parsed:
|
||||
return [], [
|
||||
"MOLECULE_WORKSPACES must be a non-empty JSON array of "
|
||||
"{\"id\":\"...\",\"token\":\"...\"} objects"
|
||||
]
|
||||
out: list[tuple[str, str]] = []
|
||||
seen: set[str] = set()
|
||||
errors: list[str] = []
|
||||
for i, entry in enumerate(parsed):
|
||||
if not isinstance(entry, dict):
|
||||
errors.append(
|
||||
f"MOLECULE_WORKSPACES[{i}] is not an object — got {type(entry).__name__}"
|
||||
)
|
||||
continue
|
||||
wsid = str(entry.get("id", "")).strip()
|
||||
tok = str(entry.get("token", "")).strip()
|
||||
if not wsid or not tok:
|
||||
errors.append(
|
||||
f"MOLECULE_WORKSPACES[{i}] missing 'id' or 'token'"
|
||||
)
|
||||
continue
|
||||
if wsid in seen:
|
||||
errors.append(
|
||||
f"MOLECULE_WORKSPACES[{i}] duplicate workspace id {wsid!r}"
|
||||
)
|
||||
continue
|
||||
seen.add(wsid)
|
||||
out.append((wsid, tok))
|
||||
if errors:
|
||||
return [], errors
|
||||
return out, []
|
||||
|
||||
# Single-workspace back-compat path.
|
||||
wsid = os.environ.get("WORKSPACE_ID", "").strip()
|
||||
if not wsid:
|
||||
return [], ["WORKSPACE_ID (or MOLECULE_WORKSPACES) is required"]
|
||||
# Token resolution order (#2934): inline env → file path → CONFIGS_DIR
|
||||
# default. The file-path option exists so operators can keep the
|
||||
# bearer out of shell history and out of MCP-host config plaintext
|
||||
# (e.g. ~/.claude.json) — set MOLECULE_WORKSPACE_TOKEN_FILE to a
|
||||
# 0600 file containing the token. The CONFIGS_DIR/.auth_token
|
||||
# fallback predates this and stays for in-container runtimes.
|
||||
tok = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
|
||||
if not tok:
|
||||
tok, tf_err = _read_token_from_file_env()
|
||||
if tf_err:
|
||||
# Operator explicitly pointed TOKEN_FILE somewhere — surface
|
||||
# the SPECIFIC failure (path doesn't exist, isn't readable,
|
||||
# or holds a blank file) instead of falling through to the
|
||||
# generic "set one of these three vars" message. Otherwise
|
||||
# they get exactly the silent failure mode #2934 flagged
|
||||
# ("a new user has no chance"). Skip the CONFIGS_DIR
|
||||
# fallback in this case — the operator's intent is clearly
|
||||
# to use the file path; deferring to a different source
|
||||
# would mask their config error.
|
||||
return [], [tf_err]
|
||||
if not tok:
|
||||
tok = read_token_file()
|
||||
if not tok:
|
||||
return [], [
|
||||
"MOLECULE_WORKSPACE_TOKEN, MOLECULE_WORKSPACE_TOKEN_FILE, or "
|
||||
"CONFIGS_DIR/.auth_token is required"
|
||||
]
|
||||
return [(wsid, tok)], []
|
||||
|
||||
|
||||
def _read_token_from_file_env() -> tuple[str, str]:
|
||||
"""Read the token from the file path in MOLECULE_WORKSPACE_TOKEN_FILE.
|
||||
|
||||
Returns ``(token, error)``:
|
||||
* env var unset/blank → ``("", "")`` — caller falls through silently
|
||||
to the next source; the operator didn't ask for this path.
|
||||
* file open/read fails (missing, permission denied, decode error)
|
||||
→ ``("", "<specific error>")`` — caller surfaces it directly.
|
||||
The operator EXPLICITLY pointed at this path, so a generic
|
||||
fallthrough error would mask their config bug (#2934).
|
||||
* file is blank → ``("", "<blank file error>")`` — same reasoning.
|
||||
* file read returns junk with internal whitespace/newlines (e.g.
|
||||
a CSV cell, accidental multi-token paste) → ``("", "<error>")``
|
||||
rather than concatenating into a malformed bearer that 401s
|
||||
against the platform with no context.
|
||||
* happy path → ``("<token>", "")``.
|
||||
"""
|
||||
path = os.environ.get("MOLECULE_WORKSPACE_TOKEN_FILE", "").strip()
|
||||
if not path:
|
||||
return "", ""
|
||||
try:
|
||||
with open(path, encoding="utf-8") as fh:
|
||||
raw = fh.read()
|
||||
except FileNotFoundError:
|
||||
return "", (
|
||||
f"MOLECULE_WORKSPACE_TOKEN_FILE points to {path!r} which "
|
||||
f"does not exist"
|
||||
)
|
||||
except PermissionError:
|
||||
return "", (
|
||||
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} is not readable "
|
||||
f"(permission denied)"
|
||||
)
|
||||
except OSError as exc:
|
||||
return "", (
|
||||
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} could not be read: "
|
||||
f"{exc}"
|
||||
)
|
||||
except UnicodeDecodeError:
|
||||
return "", (
|
||||
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} is not valid UTF-8"
|
||||
)
|
||||
tok = raw.strip()
|
||||
if not tok:
|
||||
return "", (
|
||||
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} is empty"
|
||||
)
|
||||
# Reject tokens with internal whitespace — a CSV cell or accidental
|
||||
# multi-token paste would otherwise become a malformed bearer that
|
||||
# 401s against the platform with no diagnostic.
|
||||
if any(ch.isspace() for ch in tok):
|
||||
return "", (
|
||||
f"MOLECULE_WORKSPACE_TOKEN_FILE={path!r} contains internal "
|
||||
f"whitespace — expected a single token"
|
||||
)
|
||||
return tok, ""
|
||||
|
||||
|
||||
def print_missing_env_help(missing: list[str], have_token_file: bool) -> None:
|
||||
print("molecule-mcp: missing required environment.\n", file=sys.stderr)
|
||||
print("Set the following before running molecule-mcp:", file=sys.stderr)
|
||||
print(" WORKSPACE_ID — your workspace UUID (from canvas)", file=sys.stderr)
|
||||
print(
|
||||
" PLATFORM_URL — base URL of your Molecule platform "
|
||||
"(e.g. https://your-tenant.staging.moleculesai.app)",
|
||||
file=sys.stderr,
|
||||
)
|
||||
if not have_token_file:
|
||||
print(
|
||||
" MOLECULE_WORKSPACE_TOKEN — bearer token for this workspace "
|
||||
"(canvas → Tokens tab)",
|
||||
file=sys.stderr,
|
||||
)
|
||||
print(
|
||||
" OR set MOLECULE_WORKSPACE_TOKEN_FILE"
|
||||
" to a path that holds the token",
|
||||
file=sys.stderr,
|
||||
)
|
||||
print(
|
||||
" (keeps the secret out of shell"
|
||||
" history and MCP-host config plaintext)",
|
||||
file=sys.stderr,
|
||||
)
|
||||
print("", file=sys.stderr)
|
||||
print(f"Currently missing: {', '.join(missing)}", file=sys.stderr)
|
||||
|
||||
|
||||
def read_token_file() -> str:
|
||||
"""Read the token from the resolved configs dir's ``.auth_token`` if
|
||||
present.
|
||||
|
||||
Mirrors platform_auth._token_file's location resolution but without
|
||||
importing the heavy module here (that import triggers a2a_client's
|
||||
WORKSPACE_ID guard which is fine after env validation, but cheaper
|
||||
to inline a 4-line file read than pull in the whole stack just for
|
||||
the path).
|
||||
"""
|
||||
path = configs_dir.resolve() / ".auth_token"
|
||||
if not path.is_file():
|
||||
return ""
|
||||
try:
|
||||
return path.read_text().strip()
|
||||
except OSError:
|
||||
return ""
|
||||
@@ -1,71 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Update workspace task status on the canvas.
|
||||
|
||||
Usage (from any script, cron job, or shell inside the container):
|
||||
|
||||
# Set current task (shows on canvas card)
|
||||
python3 -m molecule_runtime.molecule_ai_status "Running weekly SEO audit..."
|
||||
|
||||
# Clear task (removes banner from canvas)
|
||||
python3 -m molecule_runtime.molecule_ai_status ""
|
||||
|
||||
The status appears as an amber banner on the workspace card in the canvas,
|
||||
visible to the project owner in real-time.
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
import httpx
|
||||
|
||||
_WORKSPACE_ID_raw = os.environ.get("WORKSPACE_ID")
|
||||
if not _WORKSPACE_ID_raw:
|
||||
raise RuntimeError("WORKSPACE_ID environment variable is required but not set")
|
||||
WORKSPACE_ID = _WORKSPACE_ID_raw
|
||||
PLATFORM_URL = os.environ.get("PLATFORM_URL", "http://host.docker.internal:8080")
|
||||
|
||||
|
||||
def set_status(task: str):
|
||||
"""Push current_task to platform via heartbeat."""
|
||||
try:
|
||||
try:
|
||||
from platform_auth import auth_headers as _auth
|
||||
_headers = _auth()
|
||||
except Exception:
|
||||
_headers = {}
|
||||
httpx.post(
|
||||
f"{PLATFORM_URL}/registry/heartbeat",
|
||||
json={
|
||||
"workspace_id": WORKSPACE_ID,
|
||||
"current_task": task,
|
||||
"active_tasks": 1 if task else 0,
|
||||
"error_rate": 0,
|
||||
"sample_error": "",
|
||||
"uptime_seconds": 0,
|
||||
},
|
||||
headers=_headers,
|
||||
timeout=5.0,
|
||||
)
|
||||
if task:
|
||||
# Also log as activity for traceability
|
||||
httpx.post(
|
||||
f"{PLATFORM_URL}/workspaces/{WORKSPACE_ID}/activity",
|
||||
json={
|
||||
"activity_type": "task_update",
|
||||
"source_id": WORKSPACE_ID,
|
||||
"summary": task,
|
||||
"status": "ok",
|
||||
},
|
||||
timeout=5.0,
|
||||
)
|
||||
except Exception as e:
|
||||
print(f"molecule_ai_status: failed to update: {e}", file=sys.stderr)
|
||||
|
||||
|
||||
if __name__ == "__main__": # pragma: no cover
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: python3 -m molecule_runtime.molecule_ai_status 'task description'")
|
||||
print(" python3 -m molecule_runtime.molecule_ai_status '' # clear")
|
||||
sys.exit(1)
|
||||
|
||||
set_status(sys.argv[1])
|
||||
@@ -1,24 +0,0 @@
|
||||
"""molecule_audit — HMAC-SHA256-chained immutable agent event log.
|
||||
|
||||
EU AI Act Annex III compliance (Art. 12/13 record-keeping, Art. 17 quality
|
||||
management) for high-risk AI systems.
|
||||
|
||||
Quick start
|
||||
-----------
|
||||
from molecule_audit.hooks import LedgerHooks
|
||||
|
||||
with LedgerHooks(session_id=task_id) as hooks:
|
||||
hooks.on_task_start(input_text=user_prompt)
|
||||
# ... call LLM / tools ...
|
||||
hooks.on_llm_call(model="hermes-3", output_text=reply)
|
||||
hooks.on_task_end(output_text=result)
|
||||
|
||||
Verify a chain
|
||||
--------------
|
||||
python -m molecule_audit.verify --agent-id <id>
|
||||
"""
|
||||
|
||||
from .ledger import AuditEvent, append_event, get_engine, verify_chain
|
||||
from .hooks import LedgerHooks
|
||||
|
||||
__all__ = ["AuditEvent", "append_event", "get_engine", "verify_chain", "LedgerHooks"]
|
||||
@@ -1,244 +0,0 @@
|
||||
"""molecule_audit.hooks — Pipeline hook registrations for the audit ledger.
|
||||
|
||||
Registers audit events at four EU AI Act Art. 12 pipeline checkpoints:
|
||||
task_start — an A2A task begins execution
|
||||
llm_call — a model inference call is made (records model name)
|
||||
tool_call — a tool/function is invoked (records tool name in model_used)
|
||||
task_end — a task completes (success or failure)
|
||||
|
||||
Usage
|
||||
-----
|
||||
The recommended pattern is to create a LedgerHooks instance at the start of
|
||||
each task and use it as a context manager:
|
||||
|
||||
from molecule_audit.hooks import LedgerHooks
|
||||
|
||||
with LedgerHooks(session_id=task_id, agent_id=agent_id) as hooks:
|
||||
hooks.on_task_start(input_text=user_prompt)
|
||||
response = call_llm(model="hermes-4", prompt=user_prompt)
|
||||
hooks.on_llm_call(model="hermes-4", input_text=user_prompt,
|
||||
output_text=response)
|
||||
result = run_tool("search", query=user_prompt)
|
||||
hooks.on_tool_call("search", input_data=user_prompt, output_data=result)
|
||||
hooks.on_task_end(output_text=result)
|
||||
|
||||
All hook methods swallow exceptions so that audit failures never block the
|
||||
agent pipeline. Failures are emitted at WARNING level.
|
||||
|
||||
Privacy note
|
||||
------------
|
||||
Raw input/output text is never persisted. All on_* methods take plaintext
|
||||
for convenience and immediately hash it with SHA-256 via hash_content().
|
||||
Only the hex digest is stored in the ledger.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
from typing import Any
|
||||
|
||||
from .ledger import append_event, get_session_factory, hash_content
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Default agent identity — set by the platform when launching a workspace container.
|
||||
_DEFAULT_AGENT_ID: str = os.environ.get("WORKSPACE_ID", "unknown-agent")
|
||||
|
||||
|
||||
class LedgerHooks:
|
||||
"""Lifecycle hooks that write signed events to the audit ledger.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
session_id: Task / conversation ID (gen_ai.conversation.id).
|
||||
Required — must be unique per agent session.
|
||||
agent_id: Identity of this agent.
|
||||
Defaults to the WORKSPACE_ID env var.
|
||||
db_url: SQLAlchemy URL override — useful in tests to point at
|
||||
an in-memory SQLite DB (``"sqlite:///:memory:"``).
|
||||
human_oversight_flag: Default oversight flag written on task_start / task_end.
|
||||
Can be overridden per call.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
session_id: str,
|
||||
agent_id: str | None = None,
|
||||
db_url: str | None = None,
|
||||
human_oversight_flag: bool = False,
|
||||
) -> None:
|
||||
self.agent_id: str = agent_id or _DEFAULT_AGENT_ID
|
||||
self.session_id: str = session_id
|
||||
self._db_url: str | None = db_url
|
||||
self._default_human_oversight: bool = human_oversight_flag
|
||||
self._session = None
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Session management
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _open_session(self):
|
||||
"""Return a lazily-opened SQLAlchemy session (cached for this instance)."""
|
||||
if self._session is None:
|
||||
factory = get_session_factory(self._db_url)
|
||||
self._session = factory()
|
||||
return self._session
|
||||
|
||||
def close(self) -> None:
|
||||
"""Release the underlying SQLAlchemy session."""
|
||||
if self._session is not None:
|
||||
self._session.close()
|
||||
self._session = None
|
||||
|
||||
def __enter__(self) -> "LedgerHooks":
|
||||
return self
|
||||
|
||||
def __exit__(self, exc_type, exc_val, exc_tb) -> None:
|
||||
self.close()
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Four pipeline hook points (EU AI Act Art. 12)
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def on_task_start(
|
||||
self,
|
||||
input_text: str | None = None,
|
||||
human_oversight_flag: bool | None = None,
|
||||
risk_flag: bool = False,
|
||||
) -> None:
|
||||
"""Log ``operation=task_start`` when an agent task begins.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
input_text: Raw user / caller input (hashed before storage).
|
||||
human_oversight_flag: Override the instance-level default.
|
||||
risk_flag: Set True when the input triggers a risk condition.
|
||||
"""
|
||||
self._safe_append(
|
||||
operation="task_start",
|
||||
input_hash=hash_content(input_text),
|
||||
human_oversight_flag=(
|
||||
human_oversight_flag
|
||||
if human_oversight_flag is not None
|
||||
else self._default_human_oversight
|
||||
),
|
||||
risk_flag=risk_flag,
|
||||
)
|
||||
|
||||
def on_llm_call(
|
||||
self,
|
||||
model: str,
|
||||
input_text: str | None = None,
|
||||
output_text: str | None = None,
|
||||
risk_flag: bool = False,
|
||||
) -> None:
|
||||
"""Log ``operation=llm_call`` when a model inference call is made.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
model: Model identifier (e.g. ``"hermes-4-405b"``).
|
||||
input_text: Prompt / messages sent to the model (hashed).
|
||||
output_text: Model response text (hashed).
|
||||
risk_flag: Set True when the response triggers a risk condition.
|
||||
"""
|
||||
self._safe_append(
|
||||
operation="llm_call",
|
||||
input_hash=hash_content(input_text),
|
||||
output_hash=hash_content(output_text),
|
||||
model_used=model,
|
||||
risk_flag=risk_flag,
|
||||
)
|
||||
|
||||
def on_tool_call(
|
||||
self,
|
||||
tool_name: str,
|
||||
input_data: Any = None,
|
||||
output_data: Any = None,
|
||||
risk_flag: bool = False,
|
||||
) -> None:
|
||||
"""Log ``operation=tool_call`` when a tool/function is invoked.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
tool_name: Name of the tool or function (stored in ``model_used``).
|
||||
input_data: Tool input — str, bytes, or JSON-serializable object (hashed).
|
||||
output_data: Tool output — same type options (hashed).
|
||||
risk_flag: Set True when the tool result triggers a risk condition.
|
||||
"""
|
||||
self._safe_append(
|
||||
operation="tool_call",
|
||||
input_hash=hash_content(_to_bytes(input_data)),
|
||||
output_hash=hash_content(_to_bytes(output_data)),
|
||||
model_used=tool_name,
|
||||
risk_flag=risk_flag,
|
||||
)
|
||||
|
||||
def on_task_end(
|
||||
self,
|
||||
output_text: str | None = None,
|
||||
human_oversight_flag: bool | None = None,
|
||||
risk_flag: bool = False,
|
||||
) -> None:
|
||||
"""Log ``operation=task_end`` when a task completes.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
output_text: Final task output / result (hashed before storage).
|
||||
human_oversight_flag: Override the instance-level default.
|
||||
risk_flag: Set True when the final result triggers a risk condition.
|
||||
"""
|
||||
self._safe_append(
|
||||
operation="task_end",
|
||||
output_hash=hash_content(output_text),
|
||||
human_oversight_flag=(
|
||||
human_oversight_flag
|
||||
if human_oversight_flag is not None
|
||||
else self._default_human_oversight
|
||||
),
|
||||
risk_flag=risk_flag,
|
||||
)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Internal helpers
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _safe_append(self, **kwargs) -> None:
|
||||
"""Append an audit event, swallowing all exceptions.
|
||||
|
||||
Audit failures must never block the agent pipeline. All errors are
|
||||
logged at WARNING level so operators can detect gaps in the log.
|
||||
"""
|
||||
try:
|
||||
append_event(
|
||||
agent_id=self.agent_id,
|
||||
session_id=self.session_id,
|
||||
db_session=self._open_session(),
|
||||
**kwargs,
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning(
|
||||
"audit: failed to append event "
|
||||
"(agent=%s session=%s op=%s): %s",
|
||||
self.agent_id,
|
||||
self.session_id,
|
||||
kwargs.get("operation", "?"),
|
||||
exc,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Private helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _to_bytes(value: Any) -> bytes | None:
|
||||
"""Convert a value to bytes for hashing; returns None for None."""
|
||||
if value is None:
|
||||
return None
|
||||
if isinstance(value, bytes):
|
||||
return value
|
||||
if isinstance(value, str):
|
||||
return value.encode("utf-8")
|
||||
# JSON-serializable objects (dicts, lists, etc.)
|
||||
return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
|
||||
@@ -1,434 +0,0 @@
|
||||
"""molecule_audit.ledger — HMAC-SHA256-chained SQLAlchemy audit event log.
|
||||
|
||||
EU AI Act Annex III compliance (Art. 12/13 record-keeping, Art. 17 quality
|
||||
management system) for high-risk AI systems.
|
||||
|
||||
HMAC chain design (EDDI pattern, PBKDF2 + SHA-256)
|
||||
----------------------------------------------------
|
||||
Key derivation:
|
||||
key = PBKDF2HMAC(
|
||||
algorithm=SHA-256,
|
||||
password=AUDIT_LEDGER_SALT, # from env — the shared secret
|
||||
salt=b"molecule-audit-ledger-v1", # fixed domain separator
|
||||
iterations=210_000,
|
||||
length=32,
|
||||
)
|
||||
|
||||
Canonical JSON (for HMAC input):
|
||||
json.dumps(row_dict_without_hmac_field, sort_keys=True, separators=(",", ":"))
|
||||
Timestamp is serialised as RFC-3339 seconds-precision with Z suffix
|
||||
(e.g. "2026-04-17T12:34:56Z") so the format matches Go's time.Time.UTC().
|
||||
|
||||
Per-row HMAC:
|
||||
hmac_hex = HMAC-SHA256(key, canonical_json.encode()).hexdigest()
|
||||
|
||||
Chain linkage:
|
||||
prev_hmac = hmac field of the immediately prior row for this agent_id
|
||||
(None / NULL for the first row of each agent)
|
||||
|
||||
Tamper-evidence: any row modification breaks all subsequent HMACs for that
|
||||
agent_id.
|
||||
|
||||
Environment variables
|
||||
---------------------
|
||||
AUDIT_LEDGER_SALT REQUIRED. Secret salt used as PBKDF2 password.
|
||||
Raises RuntimeError at first key-derivation call if unset.
|
||||
AUDIT_LEDGER_DB Path to SQLite file.
|
||||
Default: /var/log/molecule/audit_ledger.db
|
||||
Override with a full SQLAlchemy URL (sqlite:///..., postgresql://...)
|
||||
for non-SQLite backends.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
import hmac as _hmac_mod
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
from datetime import datetime, timezone
|
||||
from typing import Optional
|
||||
from uuid import uuid4
|
||||
|
||||
from sqlalchemy import Boolean, Column, DateTime, String, create_engine
|
||||
from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Configuration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
AUDIT_LEDGER_DB: str = os.environ.get(
|
||||
"AUDIT_LEDGER_DB", "/var/log/molecule/audit_ledger.db"
|
||||
)
|
||||
|
||||
# PBKDF2 parameters (must never change once events are written — all existing
|
||||
# HMACs become unverifiable if parameters change).
|
||||
_PBKDF2_SALT: bytes = b"molecule-audit-ledger-v1" # fixed domain separator
|
||||
_PBKDF2_ITERATIONS: int = 210_000
|
||||
_PBKDF2_DKLEN: int = 32
|
||||
|
||||
# Cached derived key (reset to None in tests when AUDIT_LEDGER_SALT changes).
|
||||
_hmac_key: Optional[bytes] = None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PBKDF2 key derivation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _get_hmac_key() -> bytes:
|
||||
"""Return (and cache) the 32-byte HMAC key derived from AUDIT_LEDGER_SALT.
|
||||
|
||||
Reads AUDIT_LEDGER_SALT exclusively from the environment — never from a
|
||||
module-level attribute — so the secret is not exposed in the module
|
||||
namespace. Raises RuntimeError if the env var is not set.
|
||||
"""
|
||||
global _hmac_key
|
||||
if _hmac_key is None:
|
||||
salt = os.environ.get("AUDIT_LEDGER_SALT", "")
|
||||
if not salt:
|
||||
raise RuntimeError(
|
||||
"AUDIT_LEDGER_SALT environment variable is required but not set. "
|
||||
"Generate a random 32-byte hex string and export it before "
|
||||
"starting the agent: "
|
||||
"export AUDIT_LEDGER_SALT=$(python3 -c "
|
||||
"\"import secrets; print(secrets.token_hex(32))\")"
|
||||
)
|
||||
_hmac_key = hashlib.pbkdf2_hmac(
|
||||
"sha256",
|
||||
password=salt.encode("utf-8"),
|
||||
salt=_PBKDF2_SALT,
|
||||
iterations=_PBKDF2_ITERATIONS,
|
||||
dklen=_PBKDF2_DKLEN,
|
||||
)
|
||||
return _hmac_key
|
||||
|
||||
|
||||
def reset_hmac_key_cache() -> None:
|
||||
"""Reset the cached HMAC key — call after changing AUDIT_LEDGER_SALT env var in tests."""
|
||||
global _hmac_key
|
||||
_hmac_key = None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Canonical JSON helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _ts_to_canonical(ts: datetime | None) -> str | None:
|
||||
"""Format a datetime as RFC-3339 seconds-precision Z-suffixed string.
|
||||
|
||||
Strips microseconds and converts to UTC so the format is identical to
|
||||
Go's ``time.Time.UTC().Format("2006-01-02T15:04:05Z")``.
|
||||
"""
|
||||
if ts is None:
|
||||
return None
|
||||
if ts.tzinfo is not None:
|
||||
ts = ts.astimezone(timezone.utc)
|
||||
return ts.strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||
|
||||
|
||||
def _to_canonical_dict(ev: "AuditEvent") -> dict:
|
||||
"""Return the dict used as HMAC input — excludes the hmac field itself."""
|
||||
return {
|
||||
"agent_id": ev.agent_id,
|
||||
"human_oversight_flag": ev.human_oversight_flag,
|
||||
"id": ev.id,
|
||||
"input_hash": ev.input_hash,
|
||||
"model_used": ev.model_used,
|
||||
"operation": ev.operation,
|
||||
"output_hash": ev.output_hash,
|
||||
"prev_hmac": ev.prev_hmac,
|
||||
"risk_flag": ev.risk_flag,
|
||||
"session_id": ev.session_id,
|
||||
"timestamp": _ts_to_canonical(ev.timestamp),
|
||||
}
|
||||
|
||||
|
||||
def _compute_event_hmac(ev: "AuditEvent") -> str:
|
||||
"""Compute HMAC-SHA256 hex digest of ev's canonical JSON.
|
||||
|
||||
Keys are sorted alphabetically (matching Python json.dumps sort_keys=True
|
||||
and Go encoding/json.Marshal on a map). Separators are compact (no spaces)
|
||||
so the output matches Go's json.Marshal.
|
||||
"""
|
||||
canonical = _to_canonical_dict(ev)
|
||||
payload = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
|
||||
key = _get_hmac_key()
|
||||
return _hmac_mod.new(key, payload, "sha256").hexdigest()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Content hashing helper (privacy-preserving)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def hash_content(content: str | bytes | None) -> str | None:
|
||||
"""Return SHA-256 hex digest of content, or None if content is falsy.
|
||||
|
||||
Use this to record *that* specific content was processed without persisting
|
||||
the raw content itself (satisfies EU AI Act data-minimisation principles).
|
||||
"""
|
||||
if content is None:
|
||||
return None
|
||||
if isinstance(content, str):
|
||||
content = content.encode("utf-8")
|
||||
return hashlib.sha256(content).hexdigest()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# SQLAlchemy model
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class Base(DeclarativeBase):
|
||||
pass
|
||||
|
||||
|
||||
class AuditEvent(Base):
|
||||
"""Append-only HMAC-chained audit event.
|
||||
|
||||
12 fields: 6 legally mandatory under EU AI Act Art. 12/13, plus 4 strongly
|
||||
recommended, plus the 2-field HMAC chain (prev_hmac, hmac).
|
||||
"""
|
||||
|
||||
__tablename__ = "audit_events"
|
||||
|
||||
# Identity
|
||||
id = Column(String, primary_key=True, default=lambda: str(uuid4()))
|
||||
timestamp = Column(
|
||||
DateTime(timezone=True),
|
||||
nullable=False,
|
||||
default=lambda: datetime.now(timezone.utc),
|
||||
)
|
||||
|
||||
# EU AI Act Art. 12 mandatory fields
|
||||
agent_id = Column(String, nullable=False)
|
||||
session_id = Column(String, nullable=False) # gen_ai.conversation.id
|
||||
operation = Column(String, nullable=False) # task_start|llm_call|tool_call|task_end
|
||||
|
||||
# Privacy-preserving content fingerprints
|
||||
input_hash = Column(String, nullable=True) # SHA-256 of input text
|
||||
output_hash = Column(String, nullable=True) # SHA-256 of output text
|
||||
|
||||
# EU AI Act Art. 13 transparency fields
|
||||
model_used = Column(String, nullable=True) # gen_ai.request.model (or tool name)
|
||||
|
||||
# Oversight flags (Art. 14 human oversight)
|
||||
human_oversight_flag = Column(Boolean, nullable=False, default=False)
|
||||
risk_flag = Column(Boolean, nullable=False, default=False)
|
||||
|
||||
# HMAC chain
|
||||
prev_hmac = Column(String, nullable=True) # hmac of previous row for this agent_id
|
||||
hmac = Column(String, nullable=False) # HMAC of this row's canonical JSON
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
"""Return a full dict suitable for API responses (ISO 8601 timestamp)."""
|
||||
return {
|
||||
"id": self.id,
|
||||
"timestamp": self.timestamp.isoformat() if self.timestamp else None,
|
||||
"agent_id": self.agent_id,
|
||||
"session_id": self.session_id,
|
||||
"operation": self.operation,
|
||||
"input_hash": self.input_hash,
|
||||
"output_hash": self.output_hash,
|
||||
"model_used": self.model_used,
|
||||
"human_oversight_flag": self.human_oversight_flag,
|
||||
"risk_flag": self.risk_flag,
|
||||
"prev_hmac": self.prev_hmac,
|
||||
"hmac": self.hmac,
|
||||
}
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return (
|
||||
f"<AuditEvent id={self.id!r} agent_id={self.agent_id!r} "
|
||||
f"op={self.operation!r} ts={self.timestamp!r}>"
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Engine / session factory
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_engine = None
|
||||
_SessionFactory = None
|
||||
|
||||
|
||||
def get_engine(db_url: str | None = None):
|
||||
"""Return (and cache) the SQLAlchemy engine.
|
||||
|
||||
Creates the ``audit_events`` table if it does not already exist.
|
||||
"""
|
||||
global _engine
|
||||
if _engine is None:
|
||||
url = db_url or _db_url_from_env()
|
||||
if url.startswith("sqlite:///"):
|
||||
_ensure_sqlite_parent(url)
|
||||
connect_args = {"check_same_thread": False} if "sqlite" in url else {}
|
||||
_engine = create_engine(url, connect_args=connect_args)
|
||||
Base.metadata.create_all(_engine)
|
||||
return _engine
|
||||
|
||||
|
||||
def _db_url_from_env() -> str:
|
||||
"""Build the DB URL from environment variables."""
|
||||
db = AUDIT_LEDGER_DB
|
||||
if db.startswith(("sqlite://", "postgresql://", "postgres://")):
|
||||
return db
|
||||
return f"sqlite:///{db}"
|
||||
|
||||
|
||||
def _ensure_sqlite_parent(url: str) -> None:
|
||||
"""Create the parent directory for a sqlite:///path URL if needed."""
|
||||
path = url[len("sqlite:///"):]
|
||||
if path and path != ":memory:":
|
||||
os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
|
||||
|
||||
|
||||
def get_session_factory(db_url: str | None = None):
|
||||
"""Return (and cache) a SQLAlchemy sessionmaker bound to the engine."""
|
||||
global _SessionFactory
|
||||
if _SessionFactory is None:
|
||||
_SessionFactory = sessionmaker(bind=get_engine(db_url))
|
||||
return _SessionFactory
|
||||
|
||||
|
||||
def reset_engine_cache() -> None:
|
||||
"""Reset the cached engine and session factory — for tests only."""
|
||||
global _engine, _SessionFactory
|
||||
_engine = None
|
||||
_SessionFactory = None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Core write API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _prev_hmac_for_agent(agent_id: str, session: Session) -> str | None:
|
||||
"""Return the hmac of the most recent event for agent_id (None if none)."""
|
||||
last = (
|
||||
session.query(AuditEvent)
|
||||
.filter(AuditEvent.agent_id == agent_id)
|
||||
.order_by(AuditEvent.timestamp.desc(), AuditEvent.id.desc())
|
||||
.first()
|
||||
)
|
||||
return last.hmac if last else None
|
||||
|
||||
|
||||
def append_event(
|
||||
agent_id: str,
|
||||
session_id: str,
|
||||
operation: str,
|
||||
*,
|
||||
input_hash: str | None = None,
|
||||
output_hash: str | None = None,
|
||||
model_used: str | None = None,
|
||||
human_oversight_flag: bool = False,
|
||||
risk_flag: bool = False,
|
||||
db_session: Session | None = None,
|
||||
db_url: str | None = None,
|
||||
) -> AuditEvent:
|
||||
"""Append one signed, chained event to the ledger and return it.
|
||||
|
||||
Derives the HMAC key from AUDIT_LEDGER_SALT (raises RuntimeError if unset),
|
||||
looks up the previous row's HMAC to form the chain link, signs the new row,
|
||||
and writes it to the database.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
agent_id: Identity of the agent (typically WORKSPACE_ID).
|
||||
session_id: Task / conversation ID (gen_ai.conversation.id).
|
||||
operation: One of: task_start, llm_call, tool_call, task_end.
|
||||
input_hash: SHA-256 of the input (use hash_content()).
|
||||
output_hash: SHA-256 of the output.
|
||||
model_used: Model name (for llm_call) or tool name (for tool_call).
|
||||
human_oversight_flag: True if human review was required / triggered.
|
||||
risk_flag: True if a risk condition was detected.
|
||||
db_session: Pre-opened Session (created + closed internally if None).
|
||||
db_url: SQLAlchemy URL override (used if session is None).
|
||||
"""
|
||||
own_session = db_session is None
|
||||
if own_session:
|
||||
factory = get_session_factory(db_url)
|
||||
db_session = factory()
|
||||
|
||||
try:
|
||||
prev_hmac = _prev_hmac_for_agent(agent_id, db_session)
|
||||
|
||||
event = AuditEvent(
|
||||
id=str(uuid4()),
|
||||
timestamp=datetime.now(timezone.utc),
|
||||
agent_id=agent_id,
|
||||
session_id=session_id,
|
||||
operation=operation,
|
||||
input_hash=input_hash,
|
||||
output_hash=output_hash,
|
||||
model_used=model_used,
|
||||
human_oversight_flag=human_oversight_flag,
|
||||
risk_flag=risk_flag,
|
||||
prev_hmac=prev_hmac,
|
||||
hmac="", # placeholder — replaced below after ID/timestamp are set
|
||||
)
|
||||
|
||||
# Compute the real HMAC now that all fields are populated.
|
||||
event.hmac = _compute_event_hmac(event)
|
||||
|
||||
db_session.add(event)
|
||||
db_session.commit()
|
||||
db_session.refresh(event)
|
||||
return event
|
||||
|
||||
except Exception:
|
||||
if own_session:
|
||||
db_session.rollback()
|
||||
raise
|
||||
finally:
|
||||
if own_session:
|
||||
db_session.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Verification
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def verify_chain(agent_id: str, db_session: Session) -> bool:
|
||||
"""Return True if the entire HMAC chain for agent_id is intact.
|
||||
|
||||
Iterates all events for agent_id in chronological order and checks:
|
||||
1. Each row's stored hmac matches the freshly-computed HMAC.
|
||||
2. Each row's prev_hmac equals the prior row's hmac (None for first row).
|
||||
|
||||
Returns False (and logs a warning) at the first broken link.
|
||||
Returns True vacuously when there are no events.
|
||||
"""
|
||||
events = (
|
||||
db_session.query(AuditEvent)
|
||||
.filter(AuditEvent.agent_id == agent_id)
|
||||
.order_by(AuditEvent.timestamp.asc(), AuditEvent.id.asc())
|
||||
.all()
|
||||
)
|
||||
|
||||
expected_prev: str | None = None
|
||||
for ev in events:
|
||||
expected_hmac = _compute_event_hmac(ev)
|
||||
if not _hmac_mod.compare_digest(ev.hmac, expected_hmac):
|
||||
logger.warning(
|
||||
"audit: HMAC mismatch at event %s (agent=%s): "
|
||||
"stored=%r computed=%r",
|
||||
ev.id,
|
||||
agent_id,
|
||||
ev.hmac,
|
||||
expected_hmac,
|
||||
)
|
||||
return False
|
||||
if not _hmac_mod.compare_digest(ev.prev_hmac or "", expected_prev or ""):
|
||||
logger.warning(
|
||||
"audit: chain break at event %s (agent=%s): "
|
||||
"stored prev_hmac=%r expected=%r",
|
||||
ev.id,
|
||||
agent_id,
|
||||
ev.prev_hmac,
|
||||
expected_prev,
|
||||
)
|
||||
return False
|
||||
expected_prev = ev.hmac
|
||||
|
||||
return True
|
||||
@@ -1,136 +0,0 @@
|
||||
"""molecule_audit.verify — CLI to verify an agent's HMAC chain integrity.
|
||||
|
||||
Usage
|
||||
-----
|
||||
python -m molecule_audit.verify --agent-id <id> [--db <url>]
|
||||
|
||||
Options
|
||||
-------
|
||||
--agent-id Agent ID whose chain to verify (required).
|
||||
--db SQLAlchemy DB URL override.
|
||||
Defaults to AUDIT_LEDGER_DB env var or /var/log/molecule/audit_ledger.db.
|
||||
|
||||
Exit codes
|
||||
----------
|
||||
0 Chain is valid (or no events found for this agent).
|
||||
1 Chain is broken — tampered or corrupted row(s) detected.
|
||||
2 Configuration error (e.g. AUDIT_LEDGER_SALT not set).
|
||||
3 Database error (e.g. file not found, connection refused).
|
||||
|
||||
Example
|
||||
-------
|
||||
export AUDIT_LEDGER_SALT=<your-secret>
|
||||
export AUDIT_LEDGER_DB=/var/log/molecule/audit_ledger.db
|
||||
python -m molecule_audit.verify --agent-id my-workspace-id
|
||||
# CHAIN VALID (42 events)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import hmac as _hmac_mod
|
||||
import sys
|
||||
|
||||
|
||||
def main(argv=None) -> None:
|
||||
parser = argparse.ArgumentParser(
|
||||
prog="python -m molecule_audit.verify",
|
||||
description=(
|
||||
"Verify the HMAC chain integrity for a given agent's audit log. "
|
||||
"Exit 0 = valid, 1 = broken, 2 = config error, 3 = DB error."
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--agent-id",
|
||||
required=True,
|
||||
metavar="AGENT_ID",
|
||||
help="Agent workspace ID to verify.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--db",
|
||||
default=None,
|
||||
metavar="URL",
|
||||
help=(
|
||||
"SQLAlchemy DB URL (e.g. sqlite:///path.db or "
|
||||
"postgresql://user:pass@host/db). "
|
||||
"Defaults to AUDIT_LEDGER_DB env var."
|
||||
),
|
||||
)
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
# Defer imports so errors in configuration (missing SALT) produce clean output.
|
||||
try:
|
||||
from molecule_audit.ledger import (
|
||||
AuditEvent,
|
||||
_compute_event_hmac,
|
||||
get_session_factory,
|
||||
verify_chain,
|
||||
)
|
||||
except RuntimeError as exc:
|
||||
print(f"ERROR: {exc}", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
try:
|
||||
factory = get_session_factory(args.db)
|
||||
session = factory()
|
||||
except Exception as exc:
|
||||
print(f"ERROR: could not open database: {exc}", file=sys.stderr)
|
||||
sys.exit(3)
|
||||
|
||||
try:
|
||||
from sqlalchemy import asc
|
||||
|
||||
n_events = (
|
||||
session.query(AuditEvent)
|
||||
.filter(AuditEvent.agent_id == args.agent_id)
|
||||
.count()
|
||||
)
|
||||
|
||||
if n_events == 0:
|
||||
print(f"No audit events found for agent_id={args.agent_id!r}")
|
||||
sys.exit(0)
|
||||
|
||||
valid = verify_chain(args.agent_id, session)
|
||||
|
||||
if valid:
|
||||
print(f"CHAIN VALID ({n_events} events)")
|
||||
sys.exit(0)
|
||||
else:
|
||||
# Walk the chain manually to report the exact broken event.
|
||||
events = (
|
||||
session.query(AuditEvent)
|
||||
.filter(AuditEvent.agent_id == args.agent_id)
|
||||
.order_by(asc(AuditEvent.timestamp), asc(AuditEvent.id))
|
||||
.all()
|
||||
)
|
||||
expected_prev = None
|
||||
for ev in events:
|
||||
expected_hmac = _compute_event_hmac(ev)
|
||||
if not _hmac_mod.compare_digest(ev.hmac, expected_hmac):
|
||||
print(
|
||||
f"CHAIN BROKEN at event {ev.id} "
|
||||
f"(HMAC mismatch: stored={ev.hmac[:12]}... "
|
||||
f"computed={expected_hmac[:12]}...)"
|
||||
)
|
||||
sys.exit(1)
|
||||
if not _hmac_mod.compare_digest(ev.prev_hmac or "", expected_prev or ""):
|
||||
print(
|
||||
f"CHAIN BROKEN at event {ev.id} "
|
||||
f"(prev_hmac mismatch: stored={ev.prev_hmac} "
|
||||
f"expected={expected_prev})"
|
||||
)
|
||||
sys.exit(1)
|
||||
expected_prev = ev.hmac
|
||||
# verify_chain said broken but we couldn't find the exact event
|
||||
print(f"CHAIN BROKEN (position unknown; run with DEBUG logging)")
|
||||
sys.exit(1)
|
||||
|
||||
except Exception as exc:
|
||||
print(f"ERROR: verification failed: {exc}", file=sys.stderr)
|
||||
sys.exit(3)
|
||||
finally:
|
||||
session.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@@ -1,69 +0,0 @@
|
||||
"""Build a JSON-RPC handler that returns ``-32603 "agent not configured"``.
|
||||
|
||||
Used by the workspace runtime when ``adapter.setup()`` fails (most often
|
||||
because an LLM credential is missing or rotated). Lets ``/.well-known/agent-card.json``
|
||||
keep serving 200 — the workspace stays REACHABLE for canvas/operator
|
||||
introspection — while message-send requests get a clear, immediate
|
||||
error instead of silently timing out.
|
||||
|
||||
Kept as its own module so the behavior is unit-testable without booting
|
||||
the whole runtime (main.py is ``# pragma: no cover``).
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Awaitable, Callable
|
||||
|
||||
from starlette.requests import Request
|
||||
from starlette.responses import JSONResponse
|
||||
|
||||
from secret_redactor import redact_secrets
|
||||
|
||||
|
||||
def make_not_configured_handler(
|
||||
reason: str | None,
|
||||
) -> Callable[[Request], Awaitable[JSONResponse]]:
|
||||
"""Return a Starlette POST handler that always 503s with JSON-RPC -32603.
|
||||
|
||||
``reason`` is surfaced in the JSON-RPC ``error.data`` field so canvas
|
||||
can render "agent not configured: <reason>" to the user. Pass the
|
||||
stringified ``adapter.setup()`` exception. ``None`` falls back to a
|
||||
generic "adapter.setup() failed".
|
||||
|
||||
Secret redaction (issue molecule-core#2760): ``reason`` is run
|
||||
through ``secret_redactor.redact_secrets`` once, when the handler
|
||||
is built. If a future adapter author writes ``raise
|
||||
RuntimeError(f"auth failed for {token}")``, the token is replaced
|
||||
with ``<redacted-secret>`` BEFORE it lands in the response —
|
||||
closes the structural leak path PR #2756 introduced. Per-request
|
||||
hot path stays unchanged (one cached string, no re-redaction).
|
||||
|
||||
The handler echoes the request's JSON-RPC ``id`` when present so a
|
||||
well-behaved JSON-RPC client can correlate the error to its request.
|
||||
Malformed bodies (non-JSON, missing id) get ``id: null`` per spec.
|
||||
"""
|
||||
|
||||
# Redact at handler-build time, not per-request, so the hot path
|
||||
# stays a constant lookup. The fallback string can't carry secrets
|
||||
# but we still pass it through redact_secrets() so a future change
|
||||
# to the fallback can't accidentally introduce a leak.
|
||||
fallback = redact_secrets(reason or "adapter.setup() failed")
|
||||
|
||||
async def _handler(request: Request) -> JSONResponse:
|
||||
try:
|
||||
body = await request.json()
|
||||
except Exception: # noqa: BLE001
|
||||
body = {}
|
||||
return JSONResponse(
|
||||
{
|
||||
"jsonrpc": "2.0",
|
||||
"id": body.get("id") if isinstance(body, dict) else None,
|
||||
"error": {
|
||||
"code": -32603,
|
||||
"message": "Internal error: agent not configured",
|
||||
"data": fallback,
|
||||
},
|
||||
},
|
||||
status_code=503,
|
||||
)
|
||||
|
||||
return _handler
|
||||
@@ -1,265 +0,0 @@
|
||||
"""Workspace auth-token store (Phase 30.1).
|
||||
|
||||
Single source of truth for this workspace's authentication token. The
|
||||
token is issued by the platform on the first successful
|
||||
``POST /registry/register`` call and travels with every subsequent
|
||||
heartbeat / update-card / (later) secrets-pull / A2A request.
|
||||
|
||||
The token is persisted to ``<configs>/.auth_token`` so it survives
|
||||
restarts — we only expect to receive it once from the platform, since
|
||||
``/registry/register`` no-ops token issuance for workspaces that already
|
||||
have one on file.
|
||||
|
||||
Storage:
|
||||
${CONFIGS_DIR}/.auth_token # 0600, one line, no trailing newline
|
||||
|
||||
Callers interact with three functions:
|
||||
:func:`get_token` — returns the cached token or None
|
||||
:func:`save_token` — persists a freshly-issued token
|
||||
:func:`auth_headers`— builds the Authorization header dict for httpx
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import threading
|
||||
from pathlib import Path
|
||||
|
||||
import configs_dir
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# In-process cache so we don't hit disk on every heartbeat. The heartbeat
|
||||
# loop fires on a short interval and reading a tiny file 10x per minute
|
||||
# is wasteful. The file is the durable copy; this var is the hot path.
|
||||
_cached_token: str | None = None
|
||||
|
||||
# Per-workspace token registry — populated by mcp_cli when the operator
|
||||
# runs a multi-workspace external agent (MOLECULE_WORKSPACES env var).
|
||||
# Keyed by workspace_id, value is the bearer token issued by that
|
||||
# workspace's tenant. Distinct from `_cached_token` (which is the
|
||||
# single-workspace path's token); the two coexist so single-workspace
|
||||
# back-compat is preserved exactly.
|
||||
#
|
||||
# Lock guards mutations from the registration phase (one writer per
|
||||
# workspace, but the writers run in main(), not in heartbeat threads).
|
||||
# Reads are lock-free for the hot path; the dict is finalized before
|
||||
# any heartbeat / poller thread starts.
|
||||
_WORKSPACE_TOKENS: dict[str, str] = {}
|
||||
_WORKSPACE_TOKENS_LOCK = threading.Lock()
|
||||
|
||||
|
||||
def _token_file() -> Path:
|
||||
"""Path to the on-disk token file. Resolved via configs_dir so
|
||||
in-container (/configs) and external-runtime (~/.molecule-workspace)
|
||||
operators land on a writable location automatically. Explicit
|
||||
CONFIGS_DIR env var still wins."""
|
||||
return configs_dir.resolve() / ".auth_token"
|
||||
|
||||
|
||||
def get_token() -> str | None:
|
||||
"""Return the cached token, reading it from disk on first call.
|
||||
|
||||
Resolution order:
|
||||
1. In-process cache (hot path)
|
||||
2. ``${CONFIGS_DIR}/.auth_token`` file (in-container default —
|
||||
the platform writes this on provision and rotates it on
|
||||
restart)
|
||||
3. ``MOLECULE_WORKSPACE_TOKEN`` env var (external-runtime path —
|
||||
operators running the universal MCP server outside a
|
||||
container have no /configs volume to populate, so they pass
|
||||
the token via env)
|
||||
|
||||
File-first preserves in-container behavior unchanged: containers
|
||||
always have /configs/.auth_token on disk, env-var fallback only
|
||||
fires when there's no file. This is additive — no existing caller
|
||||
sees a behavior change.
|
||||
"""
|
||||
global _cached_token
|
||||
if _cached_token is not None:
|
||||
return _cached_token
|
||||
path = _token_file()
|
||||
if path.exists():
|
||||
try:
|
||||
tok = path.read_text().strip()
|
||||
except OSError as exc:
|
||||
logger.warning("platform_auth: failed to read %s: %s", path, exc)
|
||||
tok = ""
|
||||
if tok:
|
||||
_cached_token = tok
|
||||
return tok
|
||||
# File missing or empty — fall back to env (external-runtime path).
|
||||
env_tok = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
|
||||
if env_tok:
|
||||
_cached_token = env_tok
|
||||
return env_tok
|
||||
return None
|
||||
|
||||
|
||||
def save_token(token: str) -> None:
|
||||
"""Persist a newly-issued token. Creates the file with 0600 mode atomically.
|
||||
|
||||
Uses ``os.open(O_CREAT, 0o600)`` so the file is never world-readable,
|
||||
even transiently. The previous ``write_text()`` + ``chmod()`` approach
|
||||
had a TOCTOU window where a concurrent reader could access the token
|
||||
between the two syscalls (M4 — flagged in security audit cycle 10).
|
||||
|
||||
Idempotent — if an identical token is already on disk we skip the
|
||||
write so we don't churn the file's mtime or trigger spurious
|
||||
filesystem watchers."""
|
||||
global _cached_token
|
||||
token = token.strip()
|
||||
if not token:
|
||||
raise ValueError("platform_auth: refusing to save empty token")
|
||||
if get_token() == token:
|
||||
return
|
||||
path = _token_file()
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
# O_CREAT | O_WRONLY | O_TRUNC with mode=0o600 atomically creates (or
|
||||
# truncates) the file with restricted permissions in a single syscall,
|
||||
# eliminating the TOCTOU window.
|
||||
fd = os.open(str(path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
|
||||
try:
|
||||
os.write(fd, token.encode())
|
||||
finally:
|
||||
os.close(fd)
|
||||
_cached_token = token
|
||||
|
||||
|
||||
def register_workspace_token(workspace_id: str, token: str) -> None:
|
||||
"""Register a per-workspace bearer token in the multi-workspace registry.
|
||||
|
||||
Called by ``mcp_cli`` once per entry in the ``MOLECULE_WORKSPACES``
|
||||
env var so per-workspace heartbeat / poller threads can resolve their
|
||||
own auth via ``auth_headers(workspace_id=...)`` without each thread
|
||||
closing over a token literal.
|
||||
|
||||
Idempotent: re-registering the same workspace_id with the same token
|
||||
is a no-op; with a different token it overwrites and logs at INFO
|
||||
(the legitimate case is operator token rotation between restarts).
|
||||
"""
|
||||
workspace_id = (workspace_id or "").strip()
|
||||
token = (token or "").strip()
|
||||
if not workspace_id or not token:
|
||||
return
|
||||
with _WORKSPACE_TOKENS_LOCK:
|
||||
prior = _WORKSPACE_TOKENS.get(workspace_id)
|
||||
if prior == token:
|
||||
return
|
||||
if prior is not None:
|
||||
logger.info(
|
||||
"platform_auth: workspace_id %s token rotated", workspace_id,
|
||||
)
|
||||
_WORKSPACE_TOKENS[workspace_id] = token
|
||||
|
||||
|
||||
def get_workspace_token(workspace_id: str) -> str | None:
|
||||
"""Return the per-workspace token from the registry, or None.
|
||||
|
||||
Lookup is lock-free: writes happen in main() before threads start,
|
||||
reads are stable thereafter.
|
||||
"""
|
||||
return _WORKSPACE_TOKENS.get((workspace_id or "").strip())
|
||||
|
||||
|
||||
def list_registered_workspaces() -> list[str]:
|
||||
"""Return the workspace IDs currently in the per-workspace registry.
|
||||
|
||||
Empty list when no multi-workspace registration has happened (i.e.
|
||||
single-workspace operators using the legacy WORKSPACE_ID env path —
|
||||
those callers should fall back to the module-level WORKSPACE_ID).
|
||||
|
||||
Used by ``a2a_tools.tool_list_peers`` to aggregate peers across all
|
||||
workspaces an external agent has registered against, so a
|
||||
multi-workspace operator can see the full peer surface in one call
|
||||
instead of having to query each workspace separately.
|
||||
"""
|
||||
with _WORKSPACE_TOKENS_LOCK:
|
||||
return list(_WORKSPACE_TOKENS.keys())
|
||||
|
||||
|
||||
def auth_headers(workspace_id: str | None = None) -> dict[str, str]:
|
||||
"""Return a header dict to merge into httpx calls. Empty if no token
|
||||
is available yet — callers send the request as-is and the platform's
|
||||
heartbeat handler grandfathers pre-token workspaces through until
|
||||
their next /registry/register issues one.
|
||||
|
||||
Always sets ``Origin`` to ``PLATFORM_URL`` when that env var is set.
|
||||
On hosted SaaS deployments the tenant's edge WAF requires a same-
|
||||
origin header — without it ``/workspaces/*`` and ``/registry/*/peers``
|
||||
requests get silently rewritten to the canvas Next.js app, which has
|
||||
no such routes and returns an empty 404. Inside-container calls are
|
||||
unaffected (Docker-internal PLATFORM_URLs aren't behind the WAF).
|
||||
Discovered while smoke-testing the molecule-mcp external-runtime
|
||||
path against a live tenant — every tool call returned "not found"
|
||||
because the WAF was eating them.
|
||||
|
||||
Token resolution order:
|
||||
1. ``workspace_id`` arg → per-workspace registry
|
||||
(multi-workspace external agent — set by mcp_cli)
|
||||
2. Single-workspace cache + .auth_token file + env var
|
||||
(pre-existing path; back-compat unchanged)
|
||||
|
||||
Single-workspace operators see no behavior change: ``auth_headers()``
|
||||
with no arg routes through the legacy resolution path exactly as
|
||||
before. Multi-workspace operators pass ``workspace_id`` so each
|
||||
thread (heartbeat, poller, send_message_to_user) authenticates
|
||||
against the correct workspace.
|
||||
"""
|
||||
headers: dict[str, str] = {}
|
||||
platform_url = os.environ.get("PLATFORM_URL", "").strip()
|
||||
if platform_url:
|
||||
headers["Origin"] = platform_url
|
||||
tok: str | None = None
|
||||
if workspace_id:
|
||||
tok = get_workspace_token(workspace_id)
|
||||
if tok is None:
|
||||
tok = get_token()
|
||||
if tok:
|
||||
headers["Authorization"] = f"Bearer {tok}"
|
||||
return headers
|
||||
|
||||
|
||||
def self_source_headers(workspace_id: str) -> dict[str, str]:
|
||||
"""Return auth headers PLUS X-Workspace-ID identifying this workspace
|
||||
as the source of the request.
|
||||
|
||||
Use this for any POST the workspace's own runtime fires against the
|
||||
platform's A2A endpoints — heartbeat self-messages, initial_prompt,
|
||||
idle-loop fires, peer-to-peer A2A from runtime tools. Without the
|
||||
X-Workspace-ID header the platform's a2a_receive logger writes
|
||||
source_id=NULL, which the canvas's My Chat tab interprets as a
|
||||
user-typed message and renders the internal prompt to the user.
|
||||
See workspace-server/internal/handlers/a2a_proxy.go:184 for the
|
||||
server-side classification rule.
|
||||
|
||||
Centralised here so adding a new system header (e.g. a per-fire
|
||||
correlation ID) only touches one place — and so that any
|
||||
workspace→A2A POST that doesn't use this helper stands out in
|
||||
review as a probable bug."""
|
||||
# Pass workspace_id through to auth_headers so the bearer token
|
||||
# comes from the per-workspace registry when set — otherwise a
|
||||
# multi-workspace operator's source-tagged POST authenticates with
|
||||
# the legacy single token (or none) and the platform rejects with
|
||||
# 401, or worse silently logs the wrong source.
|
||||
return {**auth_headers(workspace_id), "X-Workspace-ID": workspace_id}
|
||||
|
||||
|
||||
def clear_cache() -> None:
|
||||
"""Reset the in-memory cache. Used by tests that write fresh token
|
||||
files between cases."""
|
||||
global _cached_token
|
||||
_cached_token = None
|
||||
with _WORKSPACE_TOKENS_LOCK:
|
||||
_WORKSPACE_TOKENS.clear()
|
||||
|
||||
|
||||
def refresh_cache() -> str | None:
|
||||
"""Force re-read of the token from disk, discarding the in-process cache.
|
||||
|
||||
Use this when a 401 response suggests the cached token is stale —
|
||||
e.g. after the platform rotates tokens during a restart (issue #1877).
|
||||
Returns the (new) token value or None if not found/error."""
|
||||
global _cached_token
|
||||
_cached_token = None
|
||||
return get_token()
|
||||
@@ -1,145 +0,0 @@
|
||||
"""Auth gate for the /internal/* Starlette routes.
|
||||
|
||||
The platform calls into the workspace's HTTP server using a per-workspace
|
||||
shared secret minted at provision time and stored in
|
||||
``/configs/.platform_inbound_secret`` (see migration 044 + RFC #2312).
|
||||
The workspace validates by string-equality against the file content —
|
||||
the platform side stores the same plaintext in ``workspaces
|
||||
.platform_inbound_secret`` and reads it back on every forward call.
|
||||
|
||||
Asymmetric to ``platform_auth.py``:
|
||||
|
||||
platform_auth.py platform_inbound_auth.py
|
||||
──────────────── ────────────────────────
|
||||
workspace → platform platform → workspace
|
||||
/configs/.auth_token /configs/.platform_inbound_secret
|
||||
workspace presents bearer workspace validates bearer
|
||||
|
||||
Fail-closed semantics (mirrors transcript_auth.py): if the secret file is
|
||||
missing, empty, or unreadable, every request is rejected. The platform
|
||||
will surface this as a structural error rather than silently sending
|
||||
unauthenticated requests through.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
import configs_dir
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# In-process cache so we don't hit disk on every forward call. Same
|
||||
# pattern as platform_auth._cached_token. The file is the durable copy;
|
||||
# this var is the hot path.
|
||||
_cached_secret: str | None = None
|
||||
|
||||
|
||||
def _secret_file() -> Path:
|
||||
"""Path to the on-disk inbound-secret file. Resolved via configs_dir
|
||||
— /configs in-container, ~/.molecule-workspace for external-runtime
|
||||
operators. Explicit CONFIGS_DIR env var wins."""
|
||||
return configs_dir.resolve() / ".platform_inbound_secret"
|
||||
|
||||
|
||||
def get_inbound_secret() -> str | None:
|
||||
"""Return the cached inbound secret, reading from disk on first call.
|
||||
|
||||
Returns None if the file is missing, empty, or unreadable. Callers
|
||||
MUST treat None as an auth failure (fail-closed) — never substitute
|
||||
a default or skip-auth-on-missing semantics.
|
||||
"""
|
||||
global _cached_secret
|
||||
if _cached_secret is not None:
|
||||
return _cached_secret
|
||||
path = _secret_file()
|
||||
if not path.exists():
|
||||
return None
|
||||
try:
|
||||
secret = path.read_text().strip()
|
||||
except OSError as exc:
|
||||
logger.warning("platform_inbound_auth: read %s failed: %s", path, exc)
|
||||
return None
|
||||
if not secret:
|
||||
return None
|
||||
_cached_secret = secret
|
||||
return secret
|
||||
|
||||
|
||||
def reset_cache() -> None:
|
||||
"""Drop the in-process cache. Used by tests + the rare runtime-side
|
||||
path that needs to re-read after the file is overwritten (e.g. a
|
||||
rotation flow lands in the future)."""
|
||||
global _cached_secret
|
||||
_cached_secret = None
|
||||
|
||||
|
||||
def save_inbound_secret(secret: str) -> None:
|
||||
"""Persist a freshly-received platform_inbound_secret to disk.
|
||||
|
||||
Called from the /registry/register response handler when the platform
|
||||
returns a `platform_inbound_secret` field. Mirrors platform_auth.save_token's
|
||||
pattern: 0600 file in CONFIGS_DIR, atomic write via tmp + rename so a
|
||||
concurrent reader never sees a partial file.
|
||||
|
||||
Idempotent: writing the same value over an existing file is a no-op
|
||||
from the workspace's perspective. Resets the in-process cache so the
|
||||
next get_inbound_secret() returns the freshly-written value (matters
|
||||
when a future rotation flow lands and the platform sends a different
|
||||
secret on a subsequent register call).
|
||||
"""
|
||||
global _cached_secret
|
||||
if not secret:
|
||||
return
|
||||
path = _secret_file()
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
tmp = path.with_suffix(path.suffix + ".tmp")
|
||||
try:
|
||||
# Open with 0600 from the start so a concurrent reader can never
|
||||
# see a 0644-default fd before the chmod. mode= is honored by
|
||||
# os.open underneath; pathlib.write_text does not expose it.
|
||||
fd = os.open(str(tmp), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
|
||||
with os.fdopen(fd, "w") as f:
|
||||
f.write(secret)
|
||||
os.replace(str(tmp), str(path))
|
||||
# Race-safe in-process cache update: clear first, then let next
|
||||
# caller re-read disk. Avoids the "stored new, cache still has
|
||||
# old" window if get_inbound_secret races with this write.
|
||||
_cached_secret = None
|
||||
except OSError as exc:
|
||||
logger.warning("platform_inbound_auth: save %s failed: %s", path, exc)
|
||||
# Best-effort cleanup of the tmp file.
|
||||
try:
|
||||
os.unlink(str(tmp))
|
||||
except OSError as cleanup_exc:
|
||||
logger.debug("platform_inbound_auth: unlink tmp %s failed: %s", tmp, cleanup_exc)
|
||||
|
||||
|
||||
def inbound_authorized(expected_secret: str | None, auth_header: str) -> bool:
|
||||
"""Return True iff a /internal/* request should be served.
|
||||
|
||||
Args:
|
||||
expected_secret: the workspace's stored inbound secret, or None
|
||||
if /configs/.platform_inbound_secret is absent / empty /
|
||||
unreadable.
|
||||
auth_header: raw Authorization request header value.
|
||||
|
||||
Behavior:
|
||||
- None / empty expected → fail closed. A missing secret file
|
||||
is an auth failure, not a bypass.
|
||||
- Non-empty expected → strict string-equality against
|
||||
"Bearer <secret>". Bearer prefix is case-sensitive (matches
|
||||
the platform's wsauth.BearerTokenFromHeader contract).
|
||||
|
||||
Constant-time comparison is used to avoid leaking the secret one
|
||||
byte at a time via timing analysis on a network-reachable endpoint.
|
||||
"""
|
||||
if not expected_secret:
|
||||
return False
|
||||
expected = f"Bearer {expected_secret}"
|
||||
# hmac.compare_digest is the stdlib constant-time string compare.
|
||||
# Length mismatch is documented to short-circuit safely (returns
|
||||
# False without leaking length-difference timing).
|
||||
import hmac
|
||||
return hmac.compare_digest(auth_header, expected)
|
||||
@@ -1,107 +0,0 @@
|
||||
# Platform tool registry
|
||||
|
||||
Single source of truth for every tool the platform exposes to agents
|
||||
(A2A delegation, hierarchical memory, broadcast, introspection).
|
||||
|
||||
## Why this exists
|
||||
|
||||
Pre-#2240, three places independently declared each tool:
|
||||
|
||||
1. **MCP server** (`workspace/a2a_mcp_server.py`) — the `TOOLS` JSON list
|
||||
2. **LangChain `@tool` wrappers** (`workspace/builtin_tools/{delegation,memory}.py`)
|
||||
3. **Agent-facing system-prompt docs** (`workspace/executor_helpers.py`)
|
||||
|
||||
Adding a tool to one and forgetting the others happened repeatedly. The
|
||||
canonical case: `send_message_to_user` was registered in MCP TOOLS but
|
||||
the executor_helpers doc string never mentioned it, so agents saw the
|
||||
tool as available but had no usage guidance — a silent capability
|
||||
regression.
|
||||
|
||||
## What the registry does
|
||||
|
||||
`registry.py` defines each tool ONCE as a frozen `ToolSpec`:
|
||||
|
||||
```python
|
||||
ToolSpec(
|
||||
name="delegate_task",
|
||||
short="Delegate a task to a peer workspace via A2A and WAIT for the response.",
|
||||
when_to_use="Use for QUICK questions and small sub-tasks where you can afford to wait inline...",
|
||||
input_schema={...}, # JSON Schema, consumed by MCP server
|
||||
impl=tool_delegate_task, # the actual coroutine
|
||||
section="a2a", # which prompt section it belongs to
|
||||
)
|
||||
```
|
||||
|
||||
Adapters consume specs; no hardcoded names anywhere else:
|
||||
|
||||
- **MCP server** builds its `TOOLS` list from `_PLATFORM_TOOL_SPECS` at import time
|
||||
- **LangChain `@tool` wrappers** read `name=spec.name` from the registry
|
||||
- **Doc generator** (`executor_helpers._render_section()`) produces the
|
||||
system-prompt block from `spec.short` (bullet) + `spec.when_to_use`
|
||||
(heading + paragraph)
|
||||
|
||||
## CLI subprocess block — special case
|
||||
|
||||
Non-MCP runtimes (ollama, custom subprocess adapters) use a separate
|
||||
hand-maintained block in `executor_helpers._A2A_INSTRUCTIONS_CLI` because
|
||||
the CLI subcommand vocabulary (`peers`, `delegate`, `status`, `info`)
|
||||
differs from the MCP tool names (`list_peers`, `delegate_task`, etc.).
|
||||
Auto-generation would lose the readable invocation syntax.
|
||||
|
||||
Alignment is enforced via `_CLI_A2A_COMMAND_KEYWORDS` (in
|
||||
`executor_helpers.py`): every a2a-section spec must be keyed there with
|
||||
either a CLI subcommand keyword OR an explicit `None` if the tool is
|
||||
intentionally not exposed via subprocess (e.g.
|
||||
`send_message_to_user` because its structured `attachments` field
|
||||
doesn't survive positional-arg shell invocation).
|
||||
|
||||
## Tests that catch drift
|
||||
|
||||
`workspace/tests/test_platform_tools.py`:
|
||||
|
||||
| Test | What it catches |
|
||||
|---|---|
|
||||
| `test_mcp_server_registers_every_registry_tool` | MCP TOOLS list out of sync with registry |
|
||||
| `test_mcp_tool_descriptions_match_registry_short` | hand-edited MCP description that drifted |
|
||||
| `test_mcp_tool_input_schemas_match_registry` | schema duplicated in server file |
|
||||
| `test_a2a_instructions_text_includes_every_a2a_tool` | doc generator missed a tool |
|
||||
| `test_old_pre_rename_names_not_present_in_docs` | stale name leaked back in |
|
||||
| `test_a2a_mcp_instructions_match_snapshot` | rendered shape (bullet ordering, headings, footers) drifted |
|
||||
| `test_a2a_cli_instructions_match_snapshot` | CLI block edited in a way that changes shape |
|
||||
| `test_hma_instructions_match_snapshot` | HMA section drifted |
|
||||
| `test_cli_keyword_mapping_covers_every_a2a_tool` | tool added to registry without a CLI mapping decision |
|
||||
| `test_cli_keyword_substrings_appear_in_cli_block` | CLI keyword in the mapping but missing from the doc block |
|
||||
|
||||
The snapshot files at `workspace/tests/snapshots/*.txt` are LF-pinned
|
||||
in `.gitattributes` so a Windows contributor with `core.autocrlf=true`
|
||||
doesn't get mysterious test failures.
|
||||
|
||||
## Adding a new tool
|
||||
|
||||
1. Append a `ToolSpec(...)` to `TOOLS` in `registry.py`.
|
||||
2. Add the LangChain `@tool` wrapper in `workspace/builtin_tools/`
|
||||
(the wrapper body just calls `spec.impl`).
|
||||
3. Update `_CLI_A2A_COMMAND_KEYWORDS` in `executor_helpers.py` — set the
|
||||
value to the CLI subcommand keyword, or to `None` if the tool isn't
|
||||
exposed via the subprocess interface.
|
||||
4. Regenerate snapshots — see the comment block at the top of
|
||||
`workspace/tests/test_platform_tools.py` for the one-liner.
|
||||
5. Run `pytest workspace/tests/test_platform_tools.py --no-cov`.
|
||||
|
||||
## Renaming a tool
|
||||
|
||||
Edit `name` in `registry.py` only. Then:
|
||||
|
||||
1. The MCP TOOLS list rebuilds automatically.
|
||||
2. The doc generator regenerates automatically (snapshots will fail
|
||||
the diff — regenerate them).
|
||||
3. Search `workspace/` for the old literal in case a non-adapter
|
||||
consumer (tests, plugin code) hardcoded the old name; update those.
|
||||
4. Update any `_CLI_A2A_COMMAND_KEYWORDS` key + the literal substring
|
||||
in `_A2A_INSTRUCTIONS_CLI` if applicable.
|
||||
|
||||
## Removing a tool
|
||||
|
||||
Delete the `ToolSpec` and the `_CLI_A2A_COMMAND_KEYWORDS` key. Adapters
|
||||
and doc generators stop registering it automatically; the structural
|
||||
tests prevent stale references from surviving.
|
||||
@@ -1,13 +0,0 @@
|
||||
"""Platform tools — single source of truth for tool naming and docs.
|
||||
|
||||
The platform owns A2A and persistent-memory tooling (cross-cutting
|
||||
runtime concerns per project memory project_runtime_native_pluggable.md).
|
||||
Tools are defined ONCE in `registry.py`. Every adapter — MCP server,
|
||||
LangChain wrapper, any future SDK integration — consumes the specs to
|
||||
register the tool in its native format. Doc generators (system-prompt
|
||||
injection, canvas help, future doc sites) read from the same place.
|
||||
|
||||
Adding a tool: append a ToolSpec to TOOLS in registry.py. Every
|
||||
adapter picks it up automatically; structural tests fail if any side
|
||||
drifts from the registry.
|
||||
"""
|
||||
@@ -1,737 +0,0 @@
|
||||
"""Canonical registry of platform tool specs.
|
||||
|
||||
Every tool the platform offers to agents (A2A delegation, persistent
|
||||
memory, broadcast, introspection) is defined ONCE in TOOLS below.
|
||||
Adapters consume these specs to register the tool in their native
|
||||
runtime format:
|
||||
|
||||
- a2a_mcp_server.py iterates `TOOLS` to build the MCP TOOLS list +
|
||||
dispatches calls to spec.impl. No tool name or description is
|
||||
hardcoded there.
|
||||
|
||||
- builtin_tools/{delegation,memory}.py define LangChain `@tool`
|
||||
wrappers using `name=` from the spec; the wrapper body just
|
||||
calls spec.impl.
|
||||
|
||||
- executor_helpers.get_a2a_instructions(mcp=True) /
|
||||
get_hma_instructions() GENERATE the system-prompt doc string from
|
||||
`TOOLS` — no hand-maintained instruction text for MCP-capable
|
||||
runtimes.
|
||||
|
||||
- executor_helpers._A2A_INSTRUCTIONS_CLI is a SEPARATE hand-maintained
|
||||
block for CLI subprocess runtimes (ollama and any other adapter
|
||||
that drives a2a via `python3 -m molecule_runtime.a2a_cli ...`). It
|
||||
uses different command-shape names than the registry tool names
|
||||
(e.g. `peers` vs `list_peers`), so it cannot be auto-generated
|
||||
from JSON-schema specs without losing the readable invocation
|
||||
syntax. Its tool-coverage alignment with the registry is enforced
|
||||
by the `_CLI_A2A_COMMAND_KEYWORDS` mapping in executor_helpers.py
|
||||
and the alignment tests in test_platform_tools.py — adding a new
|
||||
a2a tool here will fail those tests until the mapping is updated.
|
||||
|
||||
Adding a new tool: append a ToolSpec to `TOOLS` below, then update
|
||||
`_CLI_A2A_COMMAND_KEYWORDS` in executor_helpers.py (set the value to
|
||||
the CLI subcommand keyword, or to `None` if the tool isn't exposed via
|
||||
the CLI subprocess interface). The structural alignment tests in
|
||||
workspace/tests/test_platform_tools.py fail otherwise.
|
||||
|
||||
Renaming a tool: change `name` here. Search workspace/ for the old
|
||||
literal in case any non-adapter consumer (tests, plugin code) hard-coded
|
||||
it; update those manually. The grep is the audit, the test is the gate.
|
||||
|
||||
Removing a tool: delete the entry AND its `_CLI_A2A_COMMAND_KEYWORDS`
|
||||
key. Adapters stop registering it automatically; doc generators stop
|
||||
mentioning it.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Awaitable, Callable
|
||||
from dataclasses import dataclass
|
||||
from typing import Any, Literal
|
||||
|
||||
from a2a_tools import (
|
||||
tool_broadcast_message,
|
||||
tool_chat_history,
|
||||
tool_check_task_status,
|
||||
tool_commit_memory,
|
||||
tool_delegate_task,
|
||||
tool_delegate_task_async,
|
||||
tool_get_runtime_identity,
|
||||
tool_get_workspace_info,
|
||||
tool_inbox_peek,
|
||||
tool_inbox_pop,
|
||||
tool_list_peers,
|
||||
tool_recall_memory,
|
||||
tool_send_message_to_user,
|
||||
tool_update_agent_card,
|
||||
tool_wait_for_message,
|
||||
)
|
||||
|
||||
# Section name maps to the heading in the agent-facing system prompt.
|
||||
# Adding a new section: add a constant + create a corresponding
|
||||
# generator in executor_helpers (or generalize get_*_instructions).
|
||||
A2A_SECTION = "a2a"
|
||||
MEMORY_SECTION = "memory"
|
||||
|
||||
Section = Literal["a2a", "memory"]
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ToolSpec:
|
||||
"""Runtime-agnostic definition of one platform tool.
|
||||
|
||||
Each adapter (MCP, LangChain, future SDK) consumes the same spec.
|
||||
Doc generators consume the same spec. There is no other source
|
||||
of truth for tool naming or description.
|
||||
"""
|
||||
|
||||
name: str
|
||||
"""The exact name agents see. MUST match every adapter's
|
||||
registered name and the literal that appears in agent-facing
|
||||
instruction docs. Structural test enforces this."""
|
||||
|
||||
short: str
|
||||
"""One-line description. Used as the MCP `description` field
|
||||
AND as the bullet line in agent-facing instruction docs."""
|
||||
|
||||
when_to_use: str
|
||||
"""Two-to-three-sentence agent-facing usage guidance — when
|
||||
to call this tool, what it returns, what NOT to confuse it
|
||||
with. Concatenated into the system prompt below the tool list."""
|
||||
|
||||
input_schema: dict[str, Any]
|
||||
"""JSON Schema for the tool's input parameters. Consumed
|
||||
directly by the MCP server. LangChain derives its schema from
|
||||
Python type annotations on the @tool function — alignment is
|
||||
pinned by the structural test."""
|
||||
|
||||
impl: Callable[..., Awaitable[str]]
|
||||
"""The actual coroutine. Both adapters call this; only the
|
||||
wrapping differs."""
|
||||
|
||||
section: Section
|
||||
"""Which agent-prompt section this tool belongs to (controls
|
||||
which instruction generator emits it)."""
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# A2A — inter-agent communication & broadcast
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_DELEGATE_TASK = ToolSpec(
|
||||
name="delegate_task",
|
||||
short=(
|
||||
"Delegate a task to a peer workspace via A2A and WAIT for the "
|
||||
"response (synchronous)."
|
||||
),
|
||||
when_to_use=(
|
||||
"Use for QUICK questions and small sub-tasks where you can "
|
||||
"afford to wait inline. Returns the peer's response text "
|
||||
"directly. For longer-running work (research, multi-minute "
|
||||
"jobs) use delegate_task_async + check_task_status instead "
|
||||
"so you don't hold this workspace busy waiting."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"workspace_id": {
|
||||
"type": "string",
|
||||
"description": "Target workspace ID (from list_peers).",
|
||||
},
|
||||
"task": {
|
||||
"type": "string",
|
||||
"description": "Task description to send to the peer.",
|
||||
},
|
||||
"source_workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. The registered workspace this delegation "
|
||||
"originates from when the agent is registered to "
|
||||
"multiple workspaces (MOLECULE_WORKSPACES). Auto-"
|
||||
"routes via the peer→source cache when omitted; "
|
||||
"single-workspace operators can ignore it."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["workspace_id", "task"],
|
||||
},
|
||||
impl=tool_delegate_task,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_DELEGATE_TASK_ASYNC = ToolSpec(
|
||||
name="delegate_task_async",
|
||||
short=(
|
||||
"Send a task to a peer and return immediately with a task_id "
|
||||
"(non-blocking)."
|
||||
),
|
||||
when_to_use=(
|
||||
"Use for long-running work where you want to keep doing other "
|
||||
"things while the peer processes. Poll with check_task_status "
|
||||
"to retrieve the result. The platform's A2A queue handles "
|
||||
"delivery + retries; the peer works independently."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"workspace_id": {
|
||||
"type": "string",
|
||||
"description": "Target workspace ID (from list_peers).",
|
||||
},
|
||||
"task": {
|
||||
"type": "string",
|
||||
"description": "Task description to send to the peer.",
|
||||
},
|
||||
"source_workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. The registered workspace this delegation "
|
||||
"originates from. Auto-routes via the peer→source "
|
||||
"cache when omitted."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["workspace_id", "task"],
|
||||
},
|
||||
impl=tool_delegate_task_async,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_CHECK_TASK_STATUS = ToolSpec(
|
||||
name="check_task_status",
|
||||
short=(
|
||||
"Poll the status of a task started with delegate_task_async; "
|
||||
"returns result when done."
|
||||
),
|
||||
when_to_use=(
|
||||
"Statuses: pending/in_progress (peer still working — wait), "
|
||||
"queued (peer is busy with a prior task — DO NOT retry, the "
|
||||
"platform stitches the response when it finishes), completed "
|
||||
"(result available), failed (real error — fall back to a "
|
||||
"different peer or handle it yourself)."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"workspace_id": {
|
||||
"type": "string",
|
||||
"description": "Workspace ID the task was sent to.",
|
||||
},
|
||||
"task_id": {
|
||||
"type": "string",
|
||||
"description": "task_id returned by delegate_task_async.",
|
||||
},
|
||||
"source_workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. Which registered workspace's delegation "
|
||||
"log to query. Defaults to this workspace."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["workspace_id", "task_id"],
|
||||
},
|
||||
impl=tool_check_task_status,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_LIST_PEERS = ToolSpec(
|
||||
name="list_peers",
|
||||
short=(
|
||||
"List the workspaces this agent can communicate with — name, "
|
||||
"ID, status, role for each."
|
||||
),
|
||||
when_to_use=(
|
||||
"Call this first when you need to delegate but don't know the "
|
||||
"target's ID. Access control is enforced — you only see "
|
||||
"siblings, parent, and direct children. With "
|
||||
"MOLECULE_WORKSPACES set, peers from every registered workspace "
|
||||
"are aggregated and tagged with their source."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"source_workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. Restrict to peers of this one registered "
|
||||
"workspace. Omit to aggregate across all workspaces "
|
||||
"an external agent has registered against."
|
||||
),
|
||||
},
|
||||
},
|
||||
},
|
||||
impl=tool_list_peers,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_GET_WORKSPACE_INFO = ToolSpec(
|
||||
name="get_workspace_info",
|
||||
short="Get this workspace's own info — ID, name, role, tier, parent, status.",
|
||||
when_to_use=(
|
||||
"Use to introspect your own identity (e.g. before reporting "
|
||||
"back to the user, or to determine whether you're a tier-0 "
|
||||
"root that can write GLOBAL memory)."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"source_workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. In multi-workspace mode (this agent registered "
|
||||
"in N workspaces), introspect the named workspace instead "
|
||||
"of the primary one. Single-workspace agents omit this."
|
||||
),
|
||||
},
|
||||
},
|
||||
},
|
||||
impl=tool_get_workspace_info,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_GET_RUNTIME_IDENTITY = ToolSpec(
|
||||
name="get_runtime_identity",
|
||||
short=(
|
||||
"Return this runtime's identity — model, model_provider, tier, "
|
||||
"workspace_id, runtime template. Reads from process env; no HTTP call."
|
||||
),
|
||||
when_to_use=(
|
||||
"Use this to answer 'what model am I?' truthfully instead of "
|
||||
"guessing from a stale system prompt — the operator may have "
|
||||
"routed you to a different model via persona env between boots. "
|
||||
"Always permitted by RBAC: even read-only agents may know what "
|
||||
"model they are. Distinct from get_workspace_info — that one "
|
||||
"calls the platform for ID/role/tier/parent (workspace metadata); "
|
||||
"this one returns the live process env (MODEL, MODEL_PROVIDER, "
|
||||
"MOLECULE_MODEL, ANTHROPIC_BASE_URL, TIER, WORKSPACE_ID, "
|
||||
"ADAPTER_MODULE)."
|
||||
),
|
||||
input_schema={"type": "object", "properties": {}},
|
||||
impl=tool_get_runtime_identity,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_UPDATE_AGENT_CARD = ToolSpec(
|
||||
name="update_agent_card",
|
||||
short=(
|
||||
"Replace this workspace's agent_card on the platform. The "
|
||||
"platform validates required fields and broadcasts an "
|
||||
"agent_card_updated event so the canvas reflects the change live."
|
||||
),
|
||||
when_to_use=(
|
||||
"Use when the workspace's capabilities, skills, description, or "
|
||||
"name change and the canvas display needs to follow. The "
|
||||
"platform stores the new card and pushes an "
|
||||
"``agent_card_updated`` event to subscribers. Gated behind the "
|
||||
"``memory.write`` RBAC capability — read-only roles cannot "
|
||||
"rewrite the card. Tier-1+ owners always have this capability."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"card": {
|
||||
"type": "object",
|
||||
"description": (
|
||||
"The new agent_card object (name, version, "
|
||||
"description, skills, etc). Server-side validation "
|
||||
"rejects payloads missing required fields."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["card"],
|
||||
},
|
||||
impl=tool_update_agent_card,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_BROADCAST_MESSAGE = ToolSpec(
|
||||
name="broadcast_message",
|
||||
short=(
|
||||
"Send a message to ALL agent workspaces in the org simultaneously. "
|
||||
"Requires broadcast_enabled=true on this workspace (set by user/admin)."
|
||||
),
|
||||
when_to_use=(
|
||||
"Use for urgent, org-wide signals: critical status changes, emergency "
|
||||
"stop instructions, coordinated task announcements. Every non-removed "
|
||||
"workspace receives the message in its activity log (poll-mode agents "
|
||||
"see it on their next poll; push-mode canvases get a real-time banner). "
|
||||
"This tool returns an error if broadcast_enabled is false — a user or "
|
||||
"admin must enable it via the workspace abilities settings first."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"message": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"The broadcast text. Keep it concise — every agent in the "
|
||||
"org receives this in their activity feed."
|
||||
),
|
||||
},
|
||||
"workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. Multi-workspace mode: the registered workspace "
|
||||
"to broadcast from. Single-workspace agents omit this."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["message"],
|
||||
},
|
||||
impl=tool_broadcast_message,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_SEND_MESSAGE_TO_USER = ToolSpec(
|
||||
name="send_message_to_user",
|
||||
short=(
|
||||
"Send a message directly to the user's canvas chat — pushed instantly "
|
||||
"via WebSocket. Use this to: (1) acknowledge a task immediately ('Got "
|
||||
"it, I'll start working on this'), (2) send interim progress updates "
|
||||
"while doing long work, (3) deliver follow-up results after delegation "
|
||||
"completes, (4) attach files (zip, pdf, csv, image) for the user to "
|
||||
"download via the `attachments` field (NEVER paste file URLs in "
|
||||
"`message`). The message appears in the user's chat as if you're "
|
||||
"proactively reaching out."
|
||||
),
|
||||
when_to_use=(
|
||||
"Use proactively across the lifecycle of a task — early to "
|
||||
"acknowledge, mid-flight to update, late to deliver. Never paste "
|
||||
"file URLs in the message body — always pass absolute paths in "
|
||||
"`attachments` so the platform serves them as download chips "
|
||||
"(works on SaaS where external file hosts are unreachable)."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"message": {
|
||||
"type": "string",
|
||||
# The "no URLs in message text" rule is the single biggest
|
||||
# cause of bad chat UX: agents drop catbox.moe / file://
|
||||
# / temporary upload-host links into the prose, the
|
||||
# canvas renders them as plain markdown links the user
|
||||
# can't preview, and SaaS deployments often can't even
|
||||
# reach those external hosts. Every download MUST go
|
||||
# through the structured `attachments` field below.
|
||||
"description": (
|
||||
"Caption text for the chat bubble. Required even when sending "
|
||||
"attachments — set to a short label like 'Here's the build:' "
|
||||
"or 'Done — see attached.'\n\n"
|
||||
"DO NOT paste file URLs, download links, or container paths in "
|
||||
"this string. Files MUST go through the `attachments` field, "
|
||||
"which renders as a clickable download chip and works on SaaS "
|
||||
"deployments where external file-host URLs (catbox.moe, file://, "
|
||||
"etc.) are unreachable from the user's browser."
|
||||
),
|
||||
},
|
||||
"attachments": {
|
||||
"type": "array",
|
||||
"description": (
|
||||
"REQUIRED for any file delivery. Pass absolute file paths inside "
|
||||
"THIS container (e.g. ['/tmp/build.zip', '/workspace/report.pdf']) "
|
||||
"— the platform uploads each file and returns a download chip "
|
||||
"with the file's icon + name + size in the user's chat. The chip "
|
||||
"works in SaaS deployments because the URL is platform-served, "
|
||||
"not an external host.\n\n"
|
||||
"USE THIS instead of: pasting URLs in `message`, base64-encoding "
|
||||
"in the body, or telling the user to look at a path on disk. "
|
||||
"If the file isn't already on disk, write it first (Bash, Write "
|
||||
"tool, etc.) then pass its path here. 25 MB per file cap."
|
||||
),
|
||||
"items": {"type": "string"},
|
||||
},
|
||||
"workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. Set ONLY when this agent is registered in MULTIPLE "
|
||||
"workspaces (external multi-workspace MCP path) — pass the "
|
||||
"`arrival_workspace_id` of the inbound message you're replying "
|
||||
"to so the user sees the reply in the same canvas they typed in. "
|
||||
"Single-workspace agents omit this; the message routes to the "
|
||||
"only registered workspace."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["message"],
|
||||
},
|
||||
impl=tool_send_message_to_user,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Inbox — inbound delivery for the standalone molecule-mcp path.
|
||||
#
|
||||
# These tools observe a poller-fed in-memory queue (see workspace/inbox.py).
|
||||
# They are universally registered so docs + adapters stay aligned, but
|
||||
# they only return real data in the standalone molecule-mcp runtime;
|
||||
# in-container runtimes return an informational "not enabled" message
|
||||
# because their delivery loop is push-based via the canvas WebSocket.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_WAIT_FOR_MESSAGE = ToolSpec(
|
||||
name="wait_for_message",
|
||||
short=(
|
||||
"Block until the next inbound message (canvas user OR peer "
|
||||
"agent) arrives, or until ``timeout_secs`` elapses."
|
||||
),
|
||||
when_to_use=(
|
||||
"Standalone-runtime ONLY (molecule-mcp wrapper). After "
|
||||
"you reply, call this to wait for the next message — forms "
|
||||
"the loop ``wait_for_message → respond → wait_for_message``. "
|
||||
"Returns the head message non-destructively; call inbox_pop "
|
||||
"with the activity_id once you've handled it. In-container "
|
||||
"runtimes receive messages via push and should not call this."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"timeout_secs": {
|
||||
"type": "number",
|
||||
"description": (
|
||||
"Max seconds to block. Capped at 300. "
|
||||
"Default 60."
|
||||
),
|
||||
},
|
||||
},
|
||||
},
|
||||
impl=tool_wait_for_message,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_INBOX_PEEK = ToolSpec(
|
||||
name="inbox_peek",
|
||||
short="List pending inbound messages without removing them.",
|
||||
when_to_use=(
|
||||
"Standalone-runtime ONLY. Use to inspect what's queued "
|
||||
"before deciding which to handle. Non-destructive — pair "
|
||||
"with inbox_pop to consume after replying."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"limit": {
|
||||
"type": "integer",
|
||||
"description": "Max messages to return. Default 10.",
|
||||
},
|
||||
},
|
||||
},
|
||||
impl=tool_inbox_peek,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_CHAT_HISTORY = ToolSpec(
|
||||
name="chat_history",
|
||||
short="Fetch the prior conversation with one peer (both sides, chronological).",
|
||||
when_to_use=(
|
||||
"Call this when a peer_agent push lands and you need context "
|
||||
"from prior turns with that workspace — e.g. \"what task did "
|
||||
"this peer assign me last hour?\" or \"what did I tell them?\". "
|
||||
"Both sides of the conversation appear in chronological order, "
|
||||
"so the agent reads the log top-down. Cheaper than re-deriving "
|
||||
"context from memory because the platform already audits every "
|
||||
"A2A turn into activity_logs. Pair with `agent_card_url` from "
|
||||
"the channel envelope when you also need the peer's "
|
||||
"capabilities."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"peer_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"The peer workspace's UUID — same value you got "
|
||||
"as `peer_id` on the inbound push, or as "
|
||||
"`workspace_id` from `list_peers`."
|
||||
),
|
||||
},
|
||||
"limit": {
|
||||
"type": "integer",
|
||||
"description": (
|
||||
"Max rows to return (default 20, capped at 500). "
|
||||
"Default 20 covers \"most recent context\" without "
|
||||
"flooding the conversation window."
|
||||
),
|
||||
},
|
||||
"before_ts": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional RFC3339 timestamp; passes through to the "
|
||||
"server for paging backward through long histories. "
|
||||
"Use the oldest `created_at` from a previous response."
|
||||
),
|
||||
},
|
||||
"source_workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. Multi-workspace mode: query the named "
|
||||
"workspace's activity log instead of the primary one. "
|
||||
"Auto-routes via the peer-discovery cache when unset."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["peer_id"],
|
||||
},
|
||||
impl=tool_chat_history,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
_INBOX_POP = ToolSpec(
|
||||
name="inbox_pop",
|
||||
short="Remove a handled message from the inbox queue by activity_id.",
|
||||
when_to_use=(
|
||||
"Standalone-runtime ONLY. Call after you've replied to a "
|
||||
"message returned from wait_for_message or inbox_peek to "
|
||||
"drop it from the queue. Idempotent — popping a missing "
|
||||
"id reports removed=false without erroring."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"activity_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"activity_id of the message to remove (from "
|
||||
"inbox_peek / wait_for_message output)."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["activity_id"],
|
||||
},
|
||||
impl=tool_inbox_pop,
|
||||
section=A2A_SECTION,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# HMA — hierarchical persistent memory
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_COMMIT_MEMORY = ToolSpec(
|
||||
name="commit_memory",
|
||||
short="Save a fact to persistent memory; survives across sessions and restarts.",
|
||||
when_to_use=(
|
||||
"Scopes: LOCAL (private to you, default), TEAM (shared with "
|
||||
"parent + siblings), GLOBAL (entire org — only tier-0 root "
|
||||
"workspaces can write). Commit decisions, learned facts, and "
|
||||
"completed-task summaries so future sessions and teammates "
|
||||
"can recall them."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"content": {
|
||||
"type": "string",
|
||||
"description": "What to remember — be specific.",
|
||||
},
|
||||
"scope": {
|
||||
"type": "string",
|
||||
"enum": ["LOCAL", "TEAM", "GLOBAL"],
|
||||
"description": "Memory scope (default LOCAL).",
|
||||
},
|
||||
"source_workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. Multi-workspace mode: commit the memory "
|
||||
"into the named workspace's namespace instead of "
|
||||
"the primary one. Pair with the inbound message's "
|
||||
"`arrival_workspace_id` so memories stay in the "
|
||||
"tenant they were derived from."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["content"],
|
||||
},
|
||||
impl=tool_commit_memory,
|
||||
section=MEMORY_SECTION,
|
||||
)
|
||||
|
||||
_RECALL_MEMORY = ToolSpec(
|
||||
name="recall_memory",
|
||||
short="Search persistent memory; returns matching LOCAL + TEAM + GLOBAL rows.",
|
||||
when_to_use=(
|
||||
"Call at the start of new work and when picking up something "
|
||||
"you may have done before. Empty query returns ALL accessible "
|
||||
"memories — cheap and avoids missing rows that don't match a "
|
||||
"narrow keyword. Memory is automatically recalled at session "
|
||||
"start; use this to refresh mid-session."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"query": {
|
||||
"type": "string",
|
||||
"description": "Search query (empty returns all).",
|
||||
},
|
||||
"scope": {
|
||||
"type": "string",
|
||||
"enum": ["LOCAL", "TEAM", "GLOBAL", ""],
|
||||
"description": "Filter by scope (empty = all accessible).",
|
||||
},
|
||||
"source_workspace_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Optional. Multi-workspace mode: search the named "
|
||||
"workspace's memories instead of the primary one. "
|
||||
"Pair with the inbound message's "
|
||||
"`arrival_workspace_id` to recall context for the "
|
||||
"right tenant."
|
||||
),
|
||||
},
|
||||
},
|
||||
},
|
||||
impl=tool_recall_memory,
|
||||
section=MEMORY_SECTION,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public registry. Keep alphabetically grouped by section for stable
|
||||
# adapter listings + diff-friendly review.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
TOOLS: list[ToolSpec] = [
|
||||
# A2A
|
||||
_DELEGATE_TASK,
|
||||
_DELEGATE_TASK_ASYNC,
|
||||
_CHECK_TASK_STATUS,
|
||||
_LIST_PEERS,
|
||||
_GET_WORKSPACE_INFO,
|
||||
_GET_RUNTIME_IDENTITY,
|
||||
_UPDATE_AGENT_CARD,
|
||||
_BROADCAST_MESSAGE,
|
||||
_SEND_MESSAGE_TO_USER,
|
||||
# Inbox (standalone-only; in-container returns informational error)
|
||||
_WAIT_FOR_MESSAGE,
|
||||
_INBOX_PEEK,
|
||||
_INBOX_POP,
|
||||
_CHAT_HISTORY,
|
||||
# HMA
|
||||
_COMMIT_MEMORY,
|
||||
_RECALL_MEMORY,
|
||||
]
|
||||
|
||||
|
||||
def a2a_tools() -> list[ToolSpec]:
|
||||
"""All A2A-section tools, in registration order."""
|
||||
return [t for t in TOOLS if t.section == A2A_SECTION]
|
||||
|
||||
|
||||
def memory_tools() -> list[ToolSpec]:
|
||||
"""All memory-section tools, in registration order."""
|
||||
return [t for t in TOOLS if t.section == MEMORY_SECTION]
|
||||
|
||||
|
||||
def by_name(name: str) -> ToolSpec:
|
||||
"""Look up a spec by its canonical name. Raises KeyError if absent."""
|
||||
for t in TOOLS:
|
||||
if t.name == name:
|
||||
return t
|
||||
raise KeyError(f"no platform tool named {name!r}")
|
||||
|
||||
|
||||
def tool_names() -> list[str]:
|
||||
"""Canonical names in registration order."""
|
||||
return [t.name for t in TOOLS]
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user