Hongming Wang 63ac99788b fix(runtime): isolate card-skill enrichment + transcript handler from adapter shape mismatch

PR #2756 added a try/except around adapter.setup() so a missing LLM key doesn't crash the workspace boot. Two paths that now run AFTER setup succeeds were not similarly isolated, leaving small but real coupling risks for future adapter authors.

1. Skill metadata enrichment swap (main.py:248-259). When adapter.setup() returns, main.py reads adapter.loaded_skills and replaces the static stubs in agent_card.skills with rich metadata (description, tags, examples). The list comprehension assumes each element exposes .metadata.{id,name,description,tags,examples}. A future adapter that returns a non-canonical shape would raise AttributeError, propagate to the outer except, capture as adapter_error, and silently degrade an OK boot to the not-configured state, even though setup() actually succeeded. Extract to card_helpers.enrich_card_skills(card, loaded_skills) → bool. Helper swallows enrichment failures, logs the cause, returns False, leaves the static stubs in place. setup() success path continues unchanged. 6 unit tests cover: None input, empty list, canonical happy path, missing .metadata attr, partial .metadata (missing one canonical field), atomic-failure-no-partial-swap.

2. /transcript handler (main.py:513). Calls await adapter.transcript_lines(...) without try/except. BaseAdapter's default returns {"supported": false} so today's 4 adapters never trigger this, but a future adapter override that assumes setup() ran would surface as a 500 from Starlette's default error handler instead of a useful 503 with the exception class + message. Inline try/except returns 503 with the reason, matching the not-configured JSON-RPC handler's pattern.

Both changes match the architectural principle the PR #2756 chain established: availability (workspace reachable) is decoupled from configuration / adapter behavior. Operators see useful errors instead of silent degradation; future adapter authors can't accidentally break tenant readiness with a shape mismatch.

Adds:
- workspace/card_helpers.py (~50 lines, 100% covered)
- workspace/tests/test_card_helpers.py (6 tests)
- AgentCard/AgentSkill/AgentCapabilities/AgentInterface stubs to workspace/tests/conftest.py so future card-related tests work under the existing a2a-mock infrastructure
- card_helpers in TOP_LEVEL_MODULES (drift gate would have caught it)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-04 14:15:27 -07:00
demo-freeze-snapshots	ops: demo-day freeze + rollback runbook	2026-05-01 12:04:30 -07:00
ops	feat(ops): add sweep-aws-secrets janitor — orphan tenant bootstrap secrets	2026-05-03 02:38:08 -07:00
build_runtime_package.py	fix(runtime): isolate card-skill enrichment + transcript handler from adapter shape mismatch	2026-05-04 14:15:27 -07:00
build-images.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
bundle-compile.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
canary-smoke.sh	feat(canary): smoke harness + GHA verification workflow (Phase 2)	2026-04-19 03:30:19 -07:00
check-cascade-list-vs-manifest.sh	feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3)	2026-05-03 03:52:39 -07:00
cleanup-rogue-workspaces.sh	fix(provisioner): stop rogue config-missing restart loop (#17)	2026-04-14 07:32:58 -07:00
clone-manifest.sh	fix(quickstart): wire up template/plugin registry via manifest.json	2026-04-23 14:55:34 -07:00
demo-day-runbook.md	ops: demo-day freeze + rollback runbook	2026-05-01 12:04:30 -07:00
demo-freeze.sh	ops: demo-day freeze + rollback runbook	2026-05-01 12:04:30 -07:00
demo-thaw.sh	ops: demo-day freeze + rollback runbook	2026-05-01 12:04:30 -07:00
dev-start.sh	fix(dev-start): detect missing Go and fall back to docker-compose platform	2026-04-29 20:04:37 -07:00
import-agent.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
lockdown-tenant-sg.sh	feat(security): Phase 35.1 — SG lockdown script for tenant EC2 instances	2026-04-18 12:01:41 -07:00
measure-coordinator-task-bounds-runner.sh	fix(harness-runner): switch from non-existent /heartbeat-history to /activity	2026-04-28 23:12:51 -07:00
measure-coordinator-task-bounds.sh	docs: registry pattern + harness scripts READMEs	2026-04-28 22:19:40 -07:00
nuke-and-rebuild.sh	fix(scripts): nuke-and-rebuild self-bootstraps templates; add E2E test	2026-04-26 14:37:04 -07:00
post-rebuild-setup.sh	security: remove hardcoded API keys from post-rebuild-setup.sh	2026-04-20 13:02:52 -07:00
README.md	docs(scripts): rename /heartbeat-history → /activity in README	2026-04-29 02:23:00 -07:00
refresh-workspace-images.sh	feat(platform/admin): /admin/workspace-images/refresh + Docker SDK + GHCR auth	2026-04-26 10:17:21 -07:00
rollback-latest.sh	fix(scripts): correct platform dir path + add ROOT isolation (shellcheck clean)	2026-04-22 15:42:24 +00:00
test_build_runtime_package.py	chore: rewriter unit tests + drop misleading noqa on `import inbox`	2026-04-30 20:45:32 -07:00
test-a2a-cross-runtime.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
test-all-adapters.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
test-all-runtimes-a2a-e2e.sh	test(e2e): wire SaaS auth headers (TENANT_ADMIN_TOKEN + TENANT_ORG_ID)	2026-05-02 04:36:23 -07:00
test-all.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
test-cross-agent-chat.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
test-hermes-plugin-e2e.sh	test(e2e): unified A2A round-trip parity harness across all 4 runtimes	2026-05-02 04:36:23 -07:00
test-nuke-and-rebuild.sh	fix(scripts): nuke-and-rebuild self-bootstraps templates; add E2E test	2026-04-26 14:37:04 -07:00
test-team-e2e.sh	initial commit — Molecule AI platform	2026-04-13 11:55:37 -07:00
wheel_smoke.py	feat(mcp): notifications/claude/channel for push-feel inbox UX	2026-04-30 20:10:01 -07:00

README.md

scripts/

Operational and one-off scripts for molecule-core. Most are self-documenting — see the header comments in each file.

RFC #2251 coordinator task-bound harnesses

There are three related scripts; pick the right one:

Script	Purpose	Targets
`measure-coordinator-task-bounds.sh`	Canonical v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via `claude-code-default` + `langgraph` templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + activity trace.	OSS-shape platform — localhost or any `/workspaces`-shaped endpoint. Has tenant/admin-token guards for non-localhost runs.
`measure-coordinator-task-bounds-runner.sh`	Generalised runner for the same measurement contract but with arbitrary template + secret + model combinations (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness.	Same as above (local or SaaS via `MODE=saas`).
`measure-coordinator-task-bounds.sh` (in molecule-controlplane)	Production-shape variant that bootstraps a real staging tenant via `POST /cp/admin/orgs`, then runs the same measurement against `<slug>.staging.moleculesai.app`.	Staging controlplane only — refuses to run against production.

See reference_harness_pair_pattern (auto-memory) for when to use which and the cross-repo design rationale.

Common safety pattern across all three

- A cleanup trap on `EXIT`/`INT`/`TERM` auto-deletes provisioned resources.
- `DRY_RUN=1` prints the plan and auth fingerprint, then exits before any state mutation. Run this before pointing at staging or any shared infrastructure.
- A non-target guard refuses arbitrary endpoints: the controlplane variant is locked to `staging-api.moleculesai.app`, and the OSS variant requires explicit auth + tenant scoping for any non-localhost `PLATFORM`.
- Cleanup failures emit `cleanup_*_failed` events with remediation hints, and no `curl` call is silenced. An `ADMIN_TOKEN` that expires mid-run surfaces as a structured event rather than a silent leak.
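
The safety pattern above can be sketched in a few shell functions. This is a minimal illustration, not the harnesses' actual code: the function names (`emit_event`, `guard_target`, `delete_workspace`, `cleanup`, `run_harness`), the event JSON layout, the default port, and the `DELETE /workspaces/<id>` route are all assumptions made for the example.

```shell
# Hypothetical sketch of the shared safety pattern. Function names, the
# event format, the default port, and the DELETE route are assumptions.

PLATFORM="${PLATFORM:-http://localhost:8080}"
CREATED_IDS=""   # space-separated IDs of workspaces provisioned by this run

# Emit one structured event per line so failures are never silent.
emit_event() {
  printf '{"event":"%s","detail":"%s"}\n' "$1" "$2"
}

# Non-target guard: localhost always passes; any other target needs
# explicit auth and tenant scoping.
guard_target() {
  case "$1" in
    http://localhost|http://localhost[:/]*) return 0 ;;
    http://127.0.0.1|http://127.0.0.1[:/]*) return 0 ;;
  esac
  [ -n "${ADMIN_TOKEN:-}" ] && [ -n "${TENANT_ORG_ID:-}" ]
}

# Assumed deletion route. -f makes curl fail on HTTP errors, -sS hides the
# progress meter but still prints error messages.
delete_workspace() {
  curl -fsS -X DELETE -H "Authorization: Bearer ${ADMIN_TOKEN:-}" \
    "$PLATFORM/workspaces/$1" >/dev/null
}

# Trap body: attempt every deletion and report each failure with a hint.
cleanup() {
  for id in $CREATED_IDS; do
    delete_workspace "$id" ||
      emit_event cleanup_workspace_failed "id=$id; retry with cleanup-rogue-workspaces.sh"
  done
  CREATED_IDS=""
}

run_harness() {
  guard_target "$PLATFORM" || { emit_event target_refused "$PLATFORM"; return 1; }
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # A real harness would also print an auth fingerprint here.
    emit_event dry_run_plan "target=$PLATFORM"
    return 0   # stop before any state mutation
  fi
  trap cleanup EXIT
  trap 'exit 130' INT    # exiting from the signal trap runs the EXIT trap
  trap 'exit 143' TERM
  # ... provision workspaces here, appending each ID to CREATED_IDS ...
}
```

With these definitions loaded, `DRY_RUN=1 run_harness` prints a single `dry_run_plan` event and returns without installing the trap or touching the platform. Routing deletion through one function keeps the failure path easy to exercise without a network.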

Activity trace caveat

If `activity_trace.raw == "<endpoint_unavailable>"`, the per-workspace `/activity` endpoint isn't wired on the target build, so the bound measurement is INCONCLUSIVE on the platform-ceiling question. Either wire the endpoint or substitute the equivalent Datadog query. `/activity` accepts a `since_secs` query parameter; see the endpoint handler for the supported range.
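
A post-run check for this caveat might look like the sketch below. Both helper names are hypothetical, and the URL layout (`/workspaces/<id>/activity`) is an assumption about a `/workspaces`-shaped platform rather than a documented route; only the `<endpoint_unavailable>` sentinel and the `since_secs` parameter come from the text above.

```shell
# Hypothetical helpers; the URL layout is an assumption, not a documented route.

# Build the per-workspace activity URL for a lookback window in seconds.
# Arguments: platform base URL, workspace ID, since_secs value.
activity_url() {
  printf '%s/workspaces/%s/activity?since_secs=%s\n' "$1" "$2" "$3"
}

# Classify a run from the value of activity_trace.raw.
classify_activity_trace() {
  if [ "$1" = "<endpoint_unavailable>" ]; then
    echo "INCONCLUSIVE"      # endpoint not wired: no platform-ceiling verdict
  else
    echo "TRACE_AVAILABLE"   # a trace exists; the measurement can be read
  fi
}
```

For example, `classify_activity_trace "<endpoint_unavailable>"` prints `INCONCLUSIVE`, which a wrapper script could use to mark the result as needing the Datadog fallback.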

Other scripts

- `cleanup-rogue-workspaces.sh`: emergency teardown for leaked workspaces. Prompts for confirmation. Use it after a harness run whose cleanup trap failed (see `cleanup_*_failed` events).
- `canary-smoke.sh`: quick smoke test for canary releases.
- `dev-start.sh`: local-dev platform bring-up.

The rest are self-documenting in their header comments.