forked from molecule-ai/molecule-core
PR #2756 added a try/except around adapter.setup() so a missing LLM key doesn't crash the workspace boot. Two paths that now run AFTER setup succeeds were not similarly isolated, leaving small but real coupling risks for future adapter authors. 1. **Skill metadata enrichment swap (main.py:248-259).** When adapter.setup() returns, main.py reads adapter.loaded_skills and replaces the static stubs in agent_card.skills with rich metadata (description, tags, examples). The list comprehension assumes each element exposes .metadata.{id,name,description,tags,examples}. A future adapter that returns a non-canonical shape would raise AttributeError, propagate to the outer except, capture as adapter_error, and silently degrade an OK boot to the not-configured state — even though setup() actually succeeded. Extract to card_helpers.enrich_card_skills(card, loaded_skills) → bool. Helper swallows enrichment failures, logs the cause, returns False, leaves the static stubs in place. setup() success path continues unchanged. 6 unit tests cover: None input, empty list, canonical happy path, missing .metadata attr, partial .metadata (missing one canonical field), atomic-failure-no-partial-swap. 2. **/transcript handler (main.py:513).** Calls await adapter.transcript_lines(...) without try/except. BaseAdapter's default returns {"supported": false} so today's 4 adapters never trigger this — but a future adapter override that assumes setup() ran would surface as a 500 from Starlette's default error handler instead of a useful 503 with the exception class + message. Inline try/except returns 503 with the reason, matching the not-configured JSON-RPC handler's pattern. Both changes match the architectural principle the PR #2756 chain established: availability (workspace reachable) is decoupled from configuration / adapter behavior. Operators see useful errors instead of silent degradation; future adapter authors can't accidentally break tenant readiness with a shape mismatch. Adds: - workspace/card_helpers.py (~50 lines, 100% covered) - workspace/tests/test_card_helpers.py (6 tests) - AgentCard/AgentSkill/AgentCapabilities/AgentInterface stubs to workspace/tests/conftest.py so future card-related tests work under the existing a2a-mock infrastructure - card_helpers in TOP_LEVEL_MODULES (drift gate would have caught it) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| demo-freeze-snapshots | ||
| ops | ||
| build_runtime_package.py | ||
| build-images.sh | ||
| bundle-compile.sh | ||
| canary-smoke.sh | ||
| check-cascade-list-vs-manifest.sh | ||
| cleanup-rogue-workspaces.sh | ||
| clone-manifest.sh | ||
| demo-day-runbook.md | ||
| demo-freeze.sh | ||
| demo-thaw.sh | ||
| dev-start.sh | ||
| import-agent.sh | ||
| lockdown-tenant-sg.sh | ||
| measure-coordinator-task-bounds-runner.sh | ||
| measure-coordinator-task-bounds.sh | ||
| nuke-and-rebuild.sh | ||
| post-rebuild-setup.sh | ||
| README.md | ||
| refresh-workspace-images.sh | ||
| rollback-latest.sh | ||
| test_build_runtime_package.py | ||
| test-a2a-cross-runtime.sh | ||
| test-all-adapters.sh | ||
| test-all-runtimes-a2a-e2e.sh | ||
| test-all.sh | ||
| test-cross-agent-chat.sh | ||
| test-hermes-plugin-e2e.sh | ||
| test-nuke-and-rebuild.sh | ||
| test-team-e2e.sh | ||
| wheel_smoke.py | ||
scripts/
Operational and one-off scripts for molecule-core. Most are self-documenting — see the header comments in each file.
RFC #2251 coordinator task-bound harnesses
There are three related scripts; pick the right one:
| Script | Purpose | Targets |
|---|---|---|
measure-coordinator-task-bounds.sh |
Canonical v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via claude-code-default + langgraph templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + activity trace. |
OSS-shape platform — localhost or any /workspaces-shaped endpoint. Has tenant/admin-token guards for non-localhost runs. |
measure-coordinator-task-bounds-runner.sh |
Generalised runner for the same measurement contract but with arbitrary template + secret + model combinations (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. | Same as above (local or SaaS via MODE=saas). |
measure-coordinator-task-bounds.sh (in molecule-controlplane) |
Production-shape variant that bootstraps a real staging tenant via POST /cp/admin/orgs, then runs the same measurement against <slug>.staging.moleculesai.app. |
Staging controlplane only — refuses to run against production. |
See reference_harness_pair_pattern (auto-memory) for when to use which
and the cross-repo design rationale.
Common safety pattern across all three
- Cleanup trap on EXIT/INT/TERM auto-deletes provisioned resources.
DRY_RUN=1prints plan + auth fingerprint, exits before any state mutation. Run this before pointing at staging or any shared infrastructure.- Non-target guard refuses arbitrary endpoints (the controlplane
variant is locked to
staging-api.moleculesai.app; the OSS variant requires explicit auth + tenant scoping for non-localhost PLATFORM). - Cleanup failures emit
cleanup_*_failedevents with remediation hints; no silenced curl. ADMIN_TOKEN expiring mid-run surfaces as a structured event rather than a silent leak.
Activity trace caveat
If activity_trace.raw == "<endpoint_unavailable>", the per-workspace
/activity endpoint isn't wired on the target build — the bound
measurement is INCONCLUSIVE on the platform-ceiling question. Either
wire the endpoint or replace with the equivalent Datadog query. Note
that /activity accepts a since_secs query parameter; see the
endpoint handler for the supported range.
Other scripts
cleanup-rogue-workspaces.sh— emergency teardown for leaked workspaces. Prompts for confirmation. Pair with the harnesses if a cleanup trap fails (seecleanup_*_failedevents).canary-smoke.sh— quick smoke test for canary releases.dev-start.sh— local-dev platform bring-up.
The rest are self-documenting in their header comments.