Compare commits: `main...docs/claud` (2 commits)
| Author | SHA1 | Date |
|---|---|---|
| | a3ad08e8ab | |
| | 97fbdfb74f | |
@ -1 +0,0 @@
|
||||
ci-auth-test3-1778443616
|
||||
@ -1 +0,0 @@
|
||||
ci-auth-test4-1778443835
|
||||
@ -19,4 +19,4 @@ on:
|
||||
|
||||
jobs:
|
||||
secret-scan:
|
||||
uses: molecule-ai/molecule-core/.gitea/workflows/secret-scan.yml@staging
|
||||
uses: Molecule-AI/molecule-core/.github/workflows/secret-scan.yml@staging
|
||||
@ -274,28 +274,6 @@ Each workspace exposes an A2A server, builds an Agent Card, and registers with t
|
||||
|
||||
But the long-term collaboration model remains direct workspace-to-workspace communication via A2A.
|
||||
|
||||
## Known Limitations
|
||||
|
||||
### Playwright / browser system libs are not installed
|
||||
|
||||
The base `molecule-ai-workspace-runtime` image is built on `python:3.11-slim` with Node.js 22, git, and `gh` — about 500 MB. It deliberately **does not** include the system libraries Chromium needs (`libnss3`, `libatk-bridge2.0-0`, `libxkbcommon0`, `libcups2`, `libdrm2`, `libxcomposite1`, `libxdamage1`, `libxrandr2`, `libgbm1`, `libpango-1.0-0`, `libasound2`, etc.). Adding them would inflate the image by ~200–250 MB (~40%) for every workspace, even though only frontend / QA workspaces ever launch a browser.
|
||||
|
||||
Practical consequences:
|
||||
|
||||
- `npx playwright test` (and any other Chromium-driven E2E tooling) **will fail at browser launch** when run from inside an in-container workspace agent.
|
||||
- The failure surfaces as missing-shared-object messages such as `error while loading shared libraries: libnss3.so` or `Host system is missing dependencies to run browsers`.
|
||||
- Unit and integration tests (Vitest, Jest, etc.) that don't spawn a real browser are unaffected.
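
A workspace script can probe for the shared libraries before invoking Playwright, so the launch failure described above turns into an explicit skip. The following is a minimal sketch, not part of the runtime: the helper name `missing_browser_libs` and the short library list are hypothetical, and `ctypes.util.find_library` only approximates what Chromium's dynamic loader does.

```python
from ctypes.util import find_library
from typing import Callable, Iterable, List, Optional

# Illustrative subset of the libraries named above, in the short form that
# find_library expects (no "lib" prefix, no ".so" suffix). Assumption: this
# subset is enough to signal that the browser dependency set is absent.
CHROMIUM_LIBS = ("nss3", "xkbcommon", "gbm", "asound")


def missing_browser_libs(
    names: Iterable[str] = CHROMIUM_LIBS,
    finder: Callable[[str], Optional[str]] = find_library,
) -> List[str]:
    """Return the library names that the finder could not locate."""
    return [name for name in names if finder(name) is None]


if __name__ == "__main__":
    missing = missing_browser_libs()
    if missing:
        print("Skipping browser E2E; missing system libs:", ", ".join(missing))
    else:
        print("Browser system libs found; Playwright may be able to launch.")
```

The `finder` parameter exists only so the check can be exercised without touching the host system; in normal use the default is sufficient.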
|
||||
|
||||
Recommended workflow:
|
||||
|
||||
1. **Run E2E in CI**, not in-container. The Gitea Actions self-hosted runner (and GitHub Actions runners used by mirror repos) has the full Playwright dep set installed and is the supported surface for E2E. Push a branch, let CI run the suite.
|
||||
2. **Local debugging** of a single failing spec is best done on a developer laptop with `npx playwright install-deps` run once.
|
||||
3. **In-container iteration** on test logic itself is fine — write specs, lint them, type-check them — just don't expect `playwright test` to actually launch a browser.
|
||||
|
||||
If a particular workspace role genuinely needs in-container E2E (a dedicated QA template, for instance), the right place to layer Playwright deps is in a **role-specific adapter template image** that does `FROM molecule-ai-workspace-runtime:<tag>` and adds `RUN npx playwright install-deps`. Open a request against `molecule-ai-workspace-runtime` if you need this template stamped.
|
||||
|
||||
Tracking issue: [molecule-ai/molecule-app#7](https://git.moleculesai.app/molecule-ai/molecule-app/issues/7).
|
||||
|
||||
## Related Docs
|
||||
|
||||
- [Agent Runtime Adapters](./cli-runtime.md)
|
||||
|
||||
@ -7,45 +7,6 @@ All notable changes to the Molecule AI platform are documented here.
|
||||
Entries are published daily at 23:50 UTC.
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-10
|
||||
|
||||
### ✨ New features
|
||||
|
||||
- **MCP HTTP/SSE transport for Hermes**: `a2a_mcp_server.py` now speaks HTTP + SSE in addition to stdio, enabling the Hermes runtime to host MCP tools over a network endpoint rather than only via child-process stdio. (`molecule-ai-workspace-runtime` [#5](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/pull/5))
|
||||
- **molecule-sdk-python**: `RemoteAgentClient` now accepts `org_id` and `origin` kwargs in its constructor, enabling org-scoped registration and origin tracking from the first handshake. (`molecule-sdk-python` [#7](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/7))
|
||||
- **molecule-sdk-python**: `fetch_inbound()` now supports `peer_id` and `before_ts` filter params for targeted message retrieval — useful for polling a specific peer's pending tasks. (`molecule-sdk-python` [#6](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/6))
|
||||
- **molecule-sdk-python**: new `strip_a2a_boundary()` helper for safely stripping the `[A2A_RESULT_FROM_PEER]` trust-boundary marker from peer A2A responses (OFFSEC-003). Works correctly on both pre- and post-OFFSEC-003 responses. (`molecule-sdk-python` [#8](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/8))
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **molecule-app**: WCAG 2.4.7 focus-visible rings added to all customer-facing buttons (`ThemeToggle`, `Track-issue Link`, and general CTA buttons) — keyboard and assistive-technology users now see a visible focus indicator on every interactive element. (`molecule-app` [#5](https://git.moleculesai.app/molecule-ai/molecule-app/pull/5), [#9](https://git.moleculesai.app/molecule-ai/molecule-app/pull/9), [#10](https://git.moleculesai.app/molecule-ai/molecule-app/pull/10))
|
||||
- **status.moleculesai.app aggregator**: the status page's probe result aggregator was rewritten to correctly compute composite uptime across all monitored endpoints — resolving false-down alerts caused by a data-structure bug in the previous implementation. (`molecule-ai-status` [#10](https://git.moleculesai.app/molecule-ai/molecule-ai-status/pull/10))
|
||||
- **molecule-sdk-python**: `InboundMessage` now surfaces `peer_name`, `peer_role`, and `agent_card_url` fields, enabling callers to attribute and inspect inbound A2A messages without a separate registry lookup. (`molecule-sdk-python` [#5](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/5))
|
||||
- **molecule-cli**: CI test workflow added — `molecule ci test` now runs a reproducible test suite against any workspace template. (`molecule-cli` [#3](https://git.moleculesai.app/molecule-ai/molecule-cli/pull/3))
|
||||
- **molecule-ai-workspace-runtime**: `a2a-sdk` dependency pinned to `>=1.0.0` to match the actual code — eliminates a version mismatch that caused `AttributeError` on newer SDK builds. (`molecule-ai-workspace-runtime` [#4](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/pull/4))
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **molecule-sdk-python**: README API surface additions covering the Phase 30.8 RemoteAgentClient API, including `org_id`, `origin`, `fetch_inbound`, `InboundMessage`, and `strip_a2a_boundary()`. (`molecule-sdk-python` [#4](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/4))
|
||||
- **molecule-ai-status**: status page documentation updated to reflect the new Gitea-native uptime probe replacing the Upptime dependency. (`molecule-ai-status` [#4](https://git.moleculesai.app/molecule-ai/molecule-ai-status/pull/4))
|
||||
- **molecule-sdk-python**: `pytest-asyncio` documented as an optional test dependency in `CLAUDE.md`. (`molecule-sdk-python` [#3](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/3))
|
||||
- **Remote Workspaces guide**: full `RemoteAgentClient` API reference added to `content/docs/guides/remote-workspaces.md`, covering constructor params, `fetch_inbound()`, `InboundMessage` fields, and the OFFSEC-003 `strip_a2a_boundary()` security section. (`docs` [#13](https://git.moleculesai.app/molecule-ai/docs/pull/13))
|
||||
- **status.moleculesai.app**: status page aggregator fix documented in the changelog. (`docs` [#14](https://git.moleculesai.app/molecule-ai/docs/pull/14))
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
- **CI migration wave**: 22 repos migrated CI workflows from `.github/workflows/` to `.gitea/workflows/` following the GitHub org suspension (post-suspension sweep). Affected repos: `molecule-cli`, `molecule-sdk-python`, `molecule-mcp-server`, and all 21 plugin repos.
|
||||
- **Plugin hygiene**: 20 plugin repos received `.gitignore` Python-ignores (`__pycache__/`, `*.pyc`) and `__pycache__` directory removal across the plugin ecosystem (`molecule-ai-plugin-*`).
|
||||
- **Plugin smoke-test suites**: 13 plugin repos (`molecule-ai-plugin-*`) now ship with documented smoke-test suites and coverage rationale READMEs (`tests/README.md`), adding test counts ranging from 21 to 26 tests per plugin.
|
||||
- **Hook path fixes**: `molecule-ai-plugin-molecule-freeze-scope` and `molecule-ai-plugin-molecule-audit-trail` received `get_repo_root()` layout detection fixes and corresponding test suites.
|
||||
- **molecule-ai-org-template-molecule-dev**: org-level `initial_prompt` updated from GitHub to Gitea URLs. (`molecule-ai-org-template-molecule-dev` [#8](https://git.moleculesai.app/molecule-ai/molecule-ai-org-template-molecule-dev/pull/8))
|
||||
- **molecule-ai-workspace-template-claude-code**: adapter alias-map now correctly maps `yaml_provider` for runtime-wheel defaults. (`molecule-ai-workspace-template-claude-code` [#12](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-claude-code/pull/12))
|
||||
- **molecule-ai-plugin-molecule-careful-bash**: token exfiltration pattern block (OFFSEC-002) now documented in `known-issues.md`. (`molecule-ai-plugin-molecule-careful-bash` [#3](https://git.moleculesai.app/molecule-ai/molecule-ai-plugin-molecule-careful-bash/pull/3))
|
||||
- **molecule-ci**: 7 reusable workflows ported to `.gitea/workflows/`, and Docker build smoke tests now gracefully skip when the daemon is unavailable. (`molecule-ci` [#6](https://git.moleculesai.app/molecule-ai/molecule-ci/pull/6), [#7](https://git.moleculesai.app/molecule-ai/molecule-ci/pull/7))
|
||||
|
||||
---
|
||||
|
||||
## 2026-04-23
|
||||
|
||||
### ✨ New features
|
||||
@ -114,120 +75,6 @@ Entries are published daily at 23:50 UTC.
|
||||
---
|
||||
## 2026-05-10
|
||||
|
||||
### ✨ New features
|
||||
|
||||
- **A2A priority queue — Phase 1**: task dispatch now supports a `priority` field (`low` / `normal` / `high` / `urgent`). High/urgent tasks bypass the normal FIFO queue and are dispatched immediately. (`molecule-core` [#225](https://git.moleculesai.app/molecule-ai/molecule-core/pull/225))
|
||||
- **Plugin drift detector + queue + admin apply endpoint**: a new plugin drift detection system monitors loaded plugins against their pinned SHAs and surfaces drift via a queue; admins can review and apply corrections via a new `/admin/plugin-apply` endpoint. (`molecule-core` [#204](https://git.moleculesai.app/molecule-ai/molecule-core/pull/204))
|
||||
- **workspace-server pre-restart A2A drain signal**: the workspace-server now sends a pre-restart A2A drain signal before restarting, allowing peer workspaces to gracefully drain pending tasks instead of timing out. (`molecule-core` [#207](https://git.moleculesai.app/molecule-ai/molecule-core/pull/207))
|
||||
- **Admin auth runbook**: new `admin-auth.md` runbook documents the test-token route lockdown and `AdminAuth` middleware behaviour for operators. (`molecule-core` [#220](https://git.moleculesai.app/molecule-ai/molecule-core/pull/220))
|
||||
- **Static `.github-token` fallback to git credential helper**: workspace-server now falls back to a static `.github-token` value when no git credential helper is configured, enabling simpler air-gapped setups. (`molecule-core` [#219](https://git.moleculesai.app/molecule-ai/molecule-core/pull/219))
|
||||
- **Keyboard shortcuts in Toolbar help dialog**: all keyboard shortcuts are now documented in a Toolbar help dialog accessible from the canvas top bar. (`molecule-core` [#244](https://git.moleculesai.app/molecule-ai/molecule-core/pull/244))
|
||||
- **HTTP/SSE transport for Hermes MCP**: `a2a_mcp_server.py` now exposes `--transport=http --port=<N>` for Hermes workspaces that prefer HTTP + SSE over stdio. Endpoints: `POST /mcp` (JSON-RPC), `GET /mcp/stream` (SSE), `GET /health`. (`molecule-ai-workspace-runtime` [#5](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/pull/5))
|
||||
- **RemoteAgentClient `org_id` and `origin` kwargs**: `RemoteAgentClient` now accepts `org_id` (injected as `X-Molecule-Org-Id` header) and `origin` (injected as `Origin` header for request tracing) as constructor kwargs. Both propagate to all 14+ outbound call sites automatically via `_auth_headers()`. (`molecule-sdk-python` [#7](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/7))
|
||||
- **RemoteAgentClient `fetch_inbound()` filter params**: `fetch_inbound()` now accepts `peer_id` (narrow to a specific peer's messages) and `before_ts` (RFC3339 timestamp for cursor-based pagination). Enables agents to selectively consume inbound activity from known siblings. (`molecule-sdk-python` [#6](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/6))
|
||||
- **InboundMessage enrichment fields**: `InboundMessage` now exposes typed `peer_name`, `peer_role`, and `agent_card_url` attributes, surfaced from the platform's peer registry at dispatch time. Previously these were only accessible via the raw channel envelope. (`molecule-sdk-python` [#5](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/5))
|
||||
- **`strip_a2a_boundary()` — OFFSEC-003 trust-boundary SDK helper**: `molecule-sdk-python` now exports `strip_a2a_boundary(text)` to strip `[A2A_RESULT_FROM_PEER]...[/A2A_RESULT_FROM_PEER]` wrappers from peer-generated content. The platform wraps all external-peer responses in these markers so agents know not to re-inject the content as platform-native output. Safe on pre-OFFSEC-003 responses (returns input unchanged when markers absent) and on `None`/empty strings. (`molecule-sdk-python` [#8](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/pull/8))
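
The dispatch rule in the A2A priority queue entry above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the `molecule-core` implementation: the class `PriorityDispatcher` and its methods are hypothetical, and "bypass the FIFO queue" is modelled here as a separate fast lane that is always drained first.

```python
from collections import deque
from typing import Deque, Optional

FAST_LANE = {"high", "urgent"}        # priorities that skip the FIFO queue
VALID_PRIORITIES = {"low", "normal"} | FAST_LANE


class PriorityDispatcher:
    """Toy dispatcher: fast-lane tasks are always handed out before FIFO tasks."""

    def __init__(self) -> None:
        self._fast: Deque[str] = deque()
        self._fifo: Deque[str] = deque()

    def submit(self, task_id: str, priority: str = "normal") -> None:
        if priority not in VALID_PRIORITIES:
            raise ValueError(f"unknown priority: {priority!r}")
        lane = self._fast if priority in FAST_LANE else self._fifo
        lane.append(task_id)

    def next_task(self) -> Optional[str]:
        """Return the next task ID, or None when both lanes are empty."""
        if self._fast:
            return self._fast.popleft()
        if self._fifo:
            return self._fifo.popleft()
        return None
```

In this model `low` and `normal` share one FIFO lane; whether the platform orders them differently is not stated in the entry.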
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **Canvas accessibility — WCAG 2.4.7 focus-visible rings (batch 2)**: `focus-visible` keyboard rings added to 9 customer-facing buttons across molecule-app — SignInButton on the landing page, "Request access" on the waitlist page, "+ New Workspace" CTA and Notifications bell in the app shell, "Try again" on error boundaries, "Sign out" in the header, the "I agree" button on terms-gate, and "Manage keys on canvas" in the API tokens view. ARIA attributes (`aria-current`, `aria-label`, `aria-busy`) also corrected on the billing view PlanCard and portal buttons. All rings use semantic color tokens — no hardcoded hex colors. (`molecule-app` [#5](https://git.moleculesai.app/molecule-ai/molecule-app/pull/5))
|
||||
- **Canvas accessibility — WCAG 2.4.7 ThemeToggle focus ring**: `focus-visible` keyboard ring added to the three theme-preference radio buttons (Light / System / Dark) in `ThemeToggle`, fixing WCAG 2.4.7 for the theme switcher. (`molecule-app` [#10](https://git.moleculesai.app/molecule-ai/molecule-app/pull/10))
|
||||
- **Canvas accessibility — WCAG 2.4.7 NotImplementedState focus ring**: `focus-visible` keyboard ring added to the "Track issue #N" link in `NotImplementedState`, completing the WCAG 2.4.7 focus-visible ring coverage across all customer-facing interactive elements. (`molecule-app` [#9](https://git.moleculesai.app/molecule-ai/molecule-app/pull/9))
|
||||
- **SSRF validation before writing external workspace URL**: the workspace handler now validates URLs against SSRF allowlists before writing external workspace configurations. (`molecule-core` [#221](https://git.moleculesai.app/molecule-ai/molecule-core/pull/221))
|
||||
- **Dockerfile tenant chown /org-templates**: `/org-templates` directory now correctly chowned to the canvas user to fix `EACCES` on `mkdir` for external resolvers. (`molecule-core` [#223](https://git.moleculesai.app/molecule-ai/molecule-core/pull/223))
|
||||
- **CI `ghcr` → `ECR` migration + POST route smoke tests**: canary-verify workflow migrated from GHCR to ECR; new POST route smoke tests added for deployment verification. (`molecule-core` [#217](https://git.moleculesai.app/molecule-ai/molecule-core/pull/217))
|
||||
- **CI `dorny/paths-filter` → shell-based git diff**: replaced `dorny/paths-filter` with shell-based git diff for Gitea Actions compatibility. (`molecule-core` [#208](https://git.moleculesai.app/molecule-ai/molecule-core/pull/208))
|
||||
- **SOP tier-check clause splitter strips newlines**: the SOP tier-check script's clause splitter now correctly preserves newlines, fixing every `tier:low` PR CI failure. (`molecule-core` [#243](https://git.moleculesai.app/molecule-ai/molecule-core/pull/243))
|
||||
- **SOP tier-check APPROVER_TEAMS pattern matching**: outer quotes removed from case patterns in `APPROVER_TEAMS` matching logic, fixing approval team resolution. (`molecule-core` [#231](https://git.moleculesai.app/molecule-ai/molecule-core/pull/231))
|
||||
- **CI port `publish-workspace-server-image.yml` to `.gitea/workflows/`**: `publish-workspace-server-image.yml` migrated from `.github/workflows/` to `.gitea/workflows/` for Gitea Actions parity. (`molecule-core` [#237](https://git.moleculesai.app/molecule-ai/molecule-core/pull/237))
|
||||
- **CI port `publish-runtime.yml` to `.gitea/workflows/`**: `publish-runtime.yml` migrated from `.github/workflows/` to `.gitea/workflows/` for Gitea Actions parity. (`molecule-core` [#211](https://git.moleculesai.app/molecule-ai/molecule-core/pull/211))
|
||||
- **Docker base image digests pinned**: base image digests pinned in all Dockerfiles to ensure reproducible builds and prevent unexpected base image updates. (`molecule-core` [#199](https://git.moleculesai.app/molecule-ai/molecule-core/pull/199))
|
||||
- **KeyboardShortcutsDialog corrected**: keyboard shortcuts dialog text corrected and min-clamp test expectations fixed. (`molecule-core` [#200](https://git.moleculesai.app/molecule-ai/molecule-core/pull/200))
|
||||
- **`MODEL_PROVIDER` env var deprecated**: the `MODEL_PROVIDER` env var was misnamed — it carried the model ID (e.g. `claude-opus-4-7`) despite its name, and was being misused as a runtime selector. The runtime now accepts `MODEL` and `MOLECULE_MODEL` as the canonical env vars for model selection. `MODEL_PROVIDER` still works but emits a deprecation warning. (`molecule-core` [#280](https://git.moleculesai.app/molecule-ai/molecule-core/pull/280))
|
||||
- **`delegate_task` self-delegation guard**: calling `delegate_task` with your own workspace ID now returns an early actionable error instead of deadlocking the task lock. Previously self-delegation would hold `_run_lock`, time out after 30 s, and waste the turn. (`molecule-core` [#291](https://git.moleculesai.app/molecule-ai/molecule-core/pull/291))
|
||||
- **status.moleculesai.app false "down" reports fixed**: the custom uptime-probe binary correctly writes raw JSONL results but the aggregator step — which renders `history/<slug>.yml` and `history/summary.json` in Upptime format — was not migrated when the probe moved from Upptime to the custom binary post-2026-05-06. The missing aggregator caused `status.moleculesai.app` to show false-positive outages for Canvas and other endpoints. Resolved by adding the probe result aggregator. (`molecule-ai-status` [#10](https://git.moleculesai.app/molecule-ai/molecule-ai-status/pull/10))
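
The `MODEL_PROVIDER` deprecation entry above describes a fallback that can be sketched as follows. This is an illustration only: the function name `resolve_model` is hypothetical, and the precedence between `MODEL` and `MOLECULE_MODEL` is an assumption, since the entry does not say which one wins when both are set.

```python
import os
import warnings
from typing import Mapping, Optional


def resolve_model(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the model ID from the environment, preferring the canonical names."""
    # Assumed precedence: MODEL first, then MOLECULE_MODEL.
    for name in ("MODEL", "MOLECULE_MODEL"):
        value = env.get(name)
        if value:
            return value

    # Legacy variable still works, but callers are nudged to migrate.
    legacy = env.get("MODEL_PROVIDER")
    if legacy:
        warnings.warn(
            "MODEL_PROVIDER is deprecated; set MODEL or MOLECULE_MODEL instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy
    return None
```

Passing the environment as a mapping keeps the lookup easy to test without mutating `os.environ`.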
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **Canvas known issues section cleaned up**: duplicate entries removed from known issues; pre-commit action link fixed. (`molecule-core` [#202](https://git.moleculesai.app/molecule-ai/molecule-core/pull/202))
|
||||
- **Canvas controls section corrected**: Canvas Controls section corrected to reflect current keyboard navigation and MiniMap state. (`molecule-core` [#201](https://git.moleculesai.app/molecule-ai/molecule-core/pull/201))
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
- **SOP tier-check AND-composition of required team approvals per tier**: tier-check now enforces AND-composition of required team approvals per tier (`tier:high`). (`molecule-core` [#225](https://git.moleculesai.app/molecule-ai/molecule-core/pull/225))
|
||||
- **Canvas structural tests for TIER_CONFIG and COMM_TYPE_LABELS**: structural tests added for canvas TIER_CONFIG and COMM_TYPE_LABELS constants. (`molecule-core` [#245](https://git.moleculesai.app/molecule-ai/molecule-core/pull/245))
|
||||
|
||||
|
||||
## 2026-05-09
|
||||
|
||||
### ✨ New features
|
||||
|
||||
- **Keyboard-accessible canvas node resize**: Cmd/Ctrl+Arrow keys now resize canvas nodes in the topology view, satisfying WCAG AA keyboard navigation requirements. (`molecule-core` [#192](https://git.moleculesai.app/molecule-ai/molecule-core/pull/192))
|
||||
- **Keyboard-accessible edge anchors**: Enter/Space on an edge now selects the anchor for keyboard-based topology editing. (`molecule-core` [#190](https://git.moleculesai.app/molecule-ai/molecule-core/pull/190))
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **Handlers auto-restart workspace after file write/delete/replace**: file mutations via the Canvas editor now correctly trigger workspace restart, ensuring the agent picks up the new file state without manual intervention. (`molecule-core` [#188](https://git.moleculesai.app/molecule-ai/molecule-core/pull/188))
|
||||
- **CI `gh api` → Gitea API migration**: all GitHub Actions `gh api` calls replaced with Gitea-compatible alternatives — CI now runs cleanly in Gitea Actions without GitHub dependency. (`molecule-core` [#191](https://git.moleculesai.app/molecule-ai/molecule-core/pull/191))
|
||||
- **WCAG AA contrast fix + KeyboardShortcutsDialog improvements**: toolbar contrast ratios corrected for WCAG AA compliance; keyboard shortcuts dialog now scrolls properly on small viewports. (`molecule-core` [#198](https://git.moleculesai.app/molecule-ai/molecule-core/pull/198))
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **Canvas accessibility audit — all gaps now closed**: the accessibility audit doc updated to reflect fully closed status. (`molecule-core` [#197](https://git.moleculesai.app/molecule-ai/molecule-core/pull/197))
|
||||
- **Canvas controls section corrected**: keyboard accessibility and MiniMap presence now correctly documented. (`molecule-core` [#201](https://git.moleculesai.app/molecule-ai/molecule-core/pull/201))
|
||||
- **Stale audit doc text fixed**: stale text from PR #182 corrected in canvas audit documentation. (`molecule-core` [#187](https://git.moleculesai.app/molecule-ai/molecule-core/pull/187))
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
- **gh-identity module path migration**: `github.com/Molecule-AI/gh-identity` imports migrated to `git.moleculesai.app/molecule-ai/gh-identity` across all workspace templates. (`molecule-core` [#189](https://git.moleculesai.app/molecule-ai/molecule-core/pull/189))
|
||||
- **Pending uploads test isolation fix**: sweeper test isolation corrected — eliminates cross-test pollution in CI. (`molecule-core` [#185](https://git.moleculesai.app/molecule-ai/molecule-core/pull/185))
|
||||
- **Poll error counter to 0 before assert**: RecordsMetricsOnSuccess now polls error counter to 0 before asserting, eliminating flaky E2E test failures. (`molecule-core` [#194](https://git.moleculesai.app/molecule-ai/molecule-core/pull/194))
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-08
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **molecule-app CI testTimeout bumped to 20s**: vitest `testTimeout` increased to 20 s to handle shared act_runner load on the molecule-app repo. (`molecule-app` [#4](https://git.moleculesai.app/molecule-ai/molecule-app/pull/4))
|
||||
- **molecule-app drops staging branch — trunk-based migration**: first repo of the trunk-based development migration; staging branch removed. (`molecule-app` [#3](https://git.moleculesai.app/molecule-ai/molecule-app/pull/3))
|
||||
- **docs CI switches to ubuntu-latest**: docs repo CI now uses `ubuntu-latest` now that the repo is public. (`docs` [#4](https://git.moleculesai.app/molecule-ai/docs/pull/4))
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-07
|
||||
|
||||
### 📚 Docs
|
||||
|
||||
- **Install guide — GitHub.com refs → Gitea**: all active `github.com/Molecule-AI` references migrated to `git.moleculesai.app/molecule-ai` in the installation docs. (`docs` [#1](https://git.moleculesai.app/molecule-ai/docs/pull/1))
|
||||
- **Website github.com → Gitea link migration**: `molecules-market` website links updated to point at Gitea. (`landingpage` [#3](https://git.moleculesai.app/molecule-ai/landingpage/pull/3))
|
||||
- **molecule-monorepo → molecule-core rename (Phase 4)**: landingpage follow-up renaming of `molecule-monorepo` to `molecule-core` in all cross-repo references. (`landingpage` [#4](https://git.moleculesai.app/molecule-ai/landingpage/pull/4))
|
||||
- **CI lowercase 'molecule-ai/' in cross-repo workflow refs**: cross-repo workflow references now consistently lowercase for Gitea Actions compatibility. (`landingpage` [#2](https://git.moleculesai.app/molecule-ai/landingpage/pull/2))
|
||||
- **Market Purchase button on tier cards**: demo Mock #1 — Purchase button now appears on tier cards in the molecules-market. (`landingpage` [#5](https://git.moleculesai.app/molecule-ai/landingpage/pull/5))
|
||||
|
||||
### 🔧 Fixes
|
||||
|
||||
- **molecule-app runs-on ubuntu-latest**: CI `runs-on` switched from the Hetzner runner labels to `ubuntu-latest` following the act-runner suspension. (`molecule-app` [#1](https://git.moleculesai.app/molecule-ai/molecule-app/pull/1))
|
||||
- **molecule-app GitHub → Gitea URL migration**: all `github.com/Molecule-AI` references migrated to `git.moleculesai.app/molecule-ai` in molecule-app. (`molecule-app` [#2](https://git.moleculesai.app/molecule-ai/molecule-app/pull/2))
|
||||
- **docs GitHub → Gitea URL migration**: `github.com/Molecule-AI` references migrated to Gitea across docs repo. (`docs` [#3](https://git.moleculesai.app/molecule-ai/docs/pull/3))
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-06
|
||||
|
||||
### 🧹 Internal
|
||||
|
||||
- **molecule-core org-wide Gitea URL migration**: all `github.com/Molecule-AI` references migrated to `git.moleculesai.app/molecule-ai` across all repos in the org. (`molecule-core`)
|
||||
- **Hetzner act-runner suspension**: CI runners updated to use `ubuntu-latest` labels following Hetzner act-runner suspension. (`molecule-app` [#1](https://git.moleculesai.app/molecule-ai/molecule-app/pull/1))
|
||||
|
||||
---
|
||||
|
||||
## 2026-04-22
|
||||
|
||||
### ✨ New features
|
||||
|
||||
@ -110,9 +110,7 @@ import os, logging
|
||||
client = RemoteAgentClient(
|
||||
workspace_id = os.environ["WORKSPACE_ID"],
|
||||
platform_url = os.environ["PLATFORM_URL"],
|
||||
org_id = os.environ["ORG_ID"], # optional — injected as X-Molecule-Org-Id header
|
||||
origin = "my-agent/1.0", # optional — injected as Origin header for tracing
|
||||
agent_card = {"name": "researcher", "skills": ["web-search", "research"]},
|
||||
agent_card = {"name": "researcher", "skills": ["web-search", "research"]},
|
||||
)
|
||||
client.register() # Phase 30.1 — get + cache token
|
||||
secrets = client.pull_secrets() # Phase 30.2 — decrypt API keys
|
||||
@ -132,84 +130,6 @@ The agent appears on the canvas with a **purple REMOTE badge** within seconds. F
|
||||
|
||||
---
|
||||
|
||||
## RemoteAgentClient API Reference
|
||||
|
||||
### Constructor
|
||||
|
||||
```python
from molecule_agent import RemoteAgentClient

client = RemoteAgentClient(
    workspace_id = "ws-...", # required — your workspace UUID
    platform_url = "https://...", # required — your platform base URL
    auth_token = "...", # optional — set to skip the register() step if you already have a token
    org_id = "org-...", # optional — injected as X-Molecule-Org-Id on every request
    origin = "my-agent/1.0", # optional — injected as Origin header for request tracing
    agent_card = {...}, # optional — updated on every heartbeat
)
```
|
||||
|
||||
| Parameter | Type | Description |
|---|---|---|
| `workspace_id` | `str` | Your workspace UUID, obtained from `POST /workspaces`. |
| `platform_url` | `str` | Your platform base URL, e.g. `https://acme.moleculesai.app`. |
| `auth_token` | `str` | Pre-obtained bearer token. If omitted, call `register()` to fetch one. |
| `org_id` | `str` | Optional org UUID. When set, injected as `X-Molecule-Org-Id` on every outbound request. |
| `origin` | `str` | Optional UA string (e.g. `"researcher/2.1"`). Injected as `Origin` header for logging/tracing. |
| `agent_card` | `dict` | Agent metadata broadcast to the canvas. Updated on every heartbeat. |
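
To make the header behaviour in the table concrete, here is a minimal sketch of how `org_id` and `origin` could be folded into a request's headers. It is not the SDK's `_auth_headers()` code: the function name `build_auth_headers` is hypothetical, and the `Authorization: Bearer ...` form is an assumption based on the table describing `auth_token` as a bearer token.

```python
from typing import Dict, Optional


def build_auth_headers(
    auth_token: str,
    org_id: Optional[str] = None,
    origin: Optional[str] = None,
) -> Dict[str, str]:
    """Build outbound headers; optional values are added only when set."""
    headers = {"Authorization": f"Bearer {auth_token}"}  # assumed scheme
    if org_id:
        headers["X-Molecule-Org-Id"] = org_id
    if origin:
        headers["Origin"] = origin
    return headers
```

Centralising header construction in one function is what lets a new constructor argument reach every outbound call without touching each call site.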
|
||||
|
||||
### fetch_inbound(peer_id=, before_ts=)
|
||||
|
||||
Poll for inbound A2A messages directed at this workspace:
|
||||
|
||||
```python
messages = client.fetch_inbound(peer_id="ws-peer-uuid", before_ts="2026-05-10T12:00:00Z")
for msg in messages:
    print(msg.id, msg.method, msg.peer_name, msg.peer_role)
```
|
||||
|
||||
| Parameter | Type | Description |
|---|---|---|
| `peer_id` | `str` | Filter to messages from a specific peer workspace UUID. Omit to receive from all peers. |
| `before_ts` | `str` | RFC3339 timestamp. Return only messages older than this cut-off. Use for pagination by tracking the oldest message seen. |
|
||||
|
||||
Returns a list of `InboundMessage` objects.
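
The `before_ts` pagination pattern can be sketched as a loop that keeps moving the cut-off to the oldest timestamp seen so far. This is an illustration under assumptions, not SDK code: `drain_older_messages` is a hypothetical helper, the documented `InboundMessage` fields do not include a timestamp, so the caller supplies `timestamp_of`, and plain string comparison is only valid if every timestamp uses the same RFC3339 UTC format.

```python
from typing import Any, Callable, Iterable, List, Optional


def drain_older_messages(
    fetch_page: Callable[[Optional[str]], Iterable[Any]],
    timestamp_of: Callable[[Any], str],
    max_pages: int = 10,
) -> List[Any]:
    """Collect pages of messages, walking the before_ts cursor backwards."""
    cursor: Optional[str] = None
    collected: List[Any] = []
    for _ in range(max_pages):
        page = list(fetch_page(cursor))
        if not page:
            break
        oldest = min(timestamp_of(message) for message in page)
        if cursor is not None and oldest >= cursor:
            break  # no progress; stop rather than loop on the same page
        collected.extend(page)
        cursor = oldest
    return collected
```

With a real client, `fetch_page` might be something like `lambda ts: client.fetch_inbound(before_ts=ts)`, assuming `fetch_inbound` accepts `before_ts=None` for the first page, which the reference does not confirm.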
|
||||
|
||||
### InboundMessage
|
||||
|
||||
Each inbound message carries these fields in addition to the standard A2A fields:
|
||||
|
||||
| Field | Type | Description |
|---|---|---|
| `id` | `str` | Message ID. |
| `method` | `str` | A2A method (`delegate_task`, `cancel_task`, etc.). |
| `params` | `dict` | Method parameters. |
| `peer_id` | `str` | UUID of the peer workspace that sent this message. |
| `peer_name` | `str` | Display name of the sending peer (from its agent card). |
| `peer_role` | `str` | Role of the sending peer (`"sre"`, `"frontend"`, etc.). |
| `agent_card_url` | `str` | URL of the sending peer's agent card. |
| `raw` | `dict` | Raw channel envelope for forward-compatibility. |
|
||||
|
||||
> **Note:** `peer_name`, `peer_role`, and `agent_card_url` are enriched from the platform's peer registry at dispatch time. They are `None` if the sending peer has not registered an agent card.
|
||||
|
||||
### Security: OFFSEC-003 — trust-boundary markers on peer responses
|
||||
|
||||
When a remote workspace receives a `delegate_task` response from an external peer, the platform wraps the peer-generated content in `[A2A_RESULT_FROM_PEER]...[/A2A_RESULT_FROM_PEER]` trust-boundary markers. These markers signal to the agent that the enclosed content originated outside the platform's trust boundary and must not be re-injected as platform-native output.
|
||||
|
||||
Use `strip_a2a_boundary()` to strip the wrappers before processing the content:
|
||||
|
||||
```python
from molecule_agent import RemoteAgentClient, strip_a2a_boundary

# Normalise inbound peer result — safe on pre-OFFSEC-003 responses (returns
# input unchanged when markers absent) and on None/empty strings.
result = strip_a2a_boundary(msg.params.get("result", ""))
```
|
||||
|
||||
This is particularly important when displaying peer results to users or using them as tool inputs — always strip the boundary markers first. See `molecule-core` [#334](https://git.moleculesai.app/molecule-ai/molecule-core/pull/334) for the platform-side implementation.
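
For readers who want to see what stripping the wrapper amounts to, the following stand-alone sketch mimics the documented behaviour with a regular expression. It is not the SDK's implementation: `strip_boundary_sketch` is a hypothetical name, and returning `None` unchanged for a `None` input is an assumption about what "safe on `None`" means.

```python
import re
from typing import Optional

# Matches one wrapped block and captures the peer-generated content inside it.
_PEER_BLOCK = re.compile(
    r"\[A2A_RESULT_FROM_PEER\](.*?)\[/A2A_RESULT_FROM_PEER\]",
    re.DOTALL,
)


def strip_boundary_sketch(text: Optional[str]) -> Optional[str]:
    """Remove trust-boundary wrappers, leaving the enclosed content in place."""
    if not text:
        return text  # None and "" pass through untouched
    # A function replacement avoids backslash-escape handling in the content.
    return _PEER_BLOCK.sub(lambda match: match.group(1), text)
```

Text without markers passes through unchanged, which matches the documented behaviour on pre-OFFSEC-003 responses. In real code, prefer the exported `strip_a2a_boundary()`.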
|
||||
|
||||
---
|
||||
|
||||
## What Phase 30 Covers
|
||||
|
||||
| Phase | What shipped | Endpoint |
|
||||
|
||||
@ -1,214 +0,0 @@
|
||||
---
title: "a2a-sdk v0 → v1 migration"
description: "Cheat sheet for migrating workspace runtime code (and forks) from a2a-sdk 0.3.x to 1.x — renamed/removed symbols, common error shapes, before/after diffs."
---
|
||||
|
||||
import { Callout } from 'fumadocs-ui/components/callout';
|
||||
|
||||
The `a2a-sdk` Python package released v1.0 in late April 2026. The
Molecule workspace runtime migrated under tracking ID **KI-009** and
shipped in `molecule-ai-workspace-runtime` **v0.1.11** (commit
`d5cf872`, PR #39). The platform now runs exclusively on v1.

If you're consuming the platform's published wheel, bumping
`molecule-ai-workspace-runtime>=0.1.11` handles the migration for
you. If you maintain a fork of the runtime, an external agent talking
A2A directly, or your own adapter that imports from `a2a.*`, this page
is your checklist.
|
||||
|
||||
## Why migrate
|
||||
|
||||
- **Upstream**: `a2a-sdk` 1.0 reorganised the import surface, flattened
|
||||
`Part`, removed deprecated capability flags, and replaced the
|
||||
`A2AStarletteApplication` wrapper with explicit Starlette route
|
||||
factories.
|
||||
- **Platform**: as of 2026-04-24 the platform sends/receives via v1
|
||||
shapes natively. The SDK ships a v0_3 compat layer (enabled in the
|
||||
runtime via `enable_v0_3_compat=True` on `create_jsonrpc_routes`) so
|
||||
in-flight 0.x callers don't break, but new code should target v1.
|
||||
- **Forks/external runtimes**: v0 code throws on `import a2a.utils`
|
||||
and `from a2a.server.apps import A2AStarletteApplication` once you
|
||||
install v1, so the migration is a hard cutover at install time, not
|
||||
a soft deprecation.
|
||||
|
||||
## Cheat sheet — renamed and removed symbols
|
||||
|
||||
These are the four breaking changes that hit the Molecule runtime during KI-009.
All four are confirmed against the
`molecule-core/workspace/` source.
|
||||
|
||||
### 1. `new_agent_text_message` renamed to `new_text_message`
|
||||
|
||||
- **v0 location**: `a2a.utils.new_agent_text_message`
|
||||
- **v1 location**: `a2a.helpers.new_text_message`
|
||||
|
||||
Both the module path and the symbol name changed.
|
||||
|
||||
### 2. `Part` API flattened — `TextPart` removed
|
||||
|
||||
- **v0**: `Part(root=TextPart(text="..."))` — `Part` wrapped a `root`
|
||||
union of `TextPart` / `FilePart` / `DataPart`.
|
||||
- **v1**: `Part(text="...")` — `Part` accepts the text payload
|
||||
directly. `TextPart` no longer exists as a public symbol.
|
||||
|
||||
`FilePart` / `DataPart` are similarly flattened (`Part(file=...)`,
|
||||
`Part(data=...)`); the Molecule runtime only emits text parts so the
|
||||
file/data shapes weren't exercised in KI-009 and aren't covered by
|
||||
this guide.
|
||||
|
||||
### 3. `A2AStarletteApplication` removed — use route factories
|
||||
|
||||
- **v0**: `from a2a.server.apps import A2AStarletteApplication` then
|
||||
`A2AStarletteApplication(agent_card, request_handler).build()`.
|
||||
- **v1**: `from a2a.server.routes import create_agent_card_routes,
|
||||
create_jsonrpc_routes` then build a Starlette app from the returned
|
||||
route lists.
|
||||
|
||||
The factories also let you mount the JSON-RPC endpoint at any path
|
||||
(the runtime mounts at `/` because the platform POSTs to root, see
|
||||
`workspace/main.py:279`).
|
||||
|
||||
### 4. `state_transition_history` capability flag removed
|
||||
|
||||
- **v0**: `AgentCapabilities(streaming=..., push_notifications=...,
|
||||
state_transition_history=True)` was a per-agent opt-in.
|
||||
- **v1**: the field is gone from `AgentCapabilities`. Per the SDK's own
|
||||
`a2a/compat/v0_3/conversions.py`: *"No longer supported in v1.0"*.
|
||||
The capability is now universal — `Task.history` is always available
|
||||
and `tasks/get` accepts `historyLength` via `apply_history_length()`.
|
||||
|
||||
If you pass `state_transition_history=...` as a kwarg to
`AgentCapabilities` under v1, the constructor rejects it with the `ValueError` listed under [Common error shapes](#common-error-shapes). Drop the kwarg.
|
||||
See [`workspace/main.py:215`](https://git.moleculesai.app/Molecule-AI/molecule-core/blob/main/workspace/main.py#L215)
|
||||
for the explanatory comment that prevents future accidental re-adds.
|
||||
|
||||
## Common error shapes
|
||||
|
||||
When v0 code runs against the v1 SDK, the failure modes look like this:
|
||||
|
||||
| Error | Cause |
|---|---|
| `ModuleNotFoundError: No module named 'a2a.utils'` | v0 import path; module renamed to `a2a.helpers`. |
| `ImportError: cannot import name 'A2AStarletteApplication' from 'a2a.server.apps'` | The whole `a2a.server.apps` module is gone in v1. Switch to `a2a.server.routes` factories. |
| `ImportError: cannot import name 'TextPart' from 'a2a.types'` | Flattened `Part` API; use `Part(text=...)`. |
| `ValueError: Protocol message AgentCapabilities has no "state_transition_history" field` | Removed capability flag passed as kwarg; drop it. |
| `ValueError: Protocol message Part has no "root" field` | v0 `Part(root=TextPart(...))` shape against v1 schema; flatten to `Part(text=...)`. |
|
||||
|
||||
The protobuf-style `ValueError` messages always follow the pattern
|
||||
`Protocol message <Type> has no "<field>" field` — that's the
|
||||
fingerprint of "v0 shape against v1 schema." Treat it as a v0→v1 hint
|
||||
even if the field name isn't on the cheat sheet above.
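
The fingerprint described above can be checked mechanically. The sketch below is one way to do it, assuming the message format is exactly as shown in the table; `v0_shape_hint` is a hypothetical helper name, not part of `a2a-sdk`.

```python
import re
from typing import Optional, Tuple

# Matches the protobuf-style message quoted in the table above.
_V0_SHAPE = re.compile(r'Protocol message (\w+) has no "(\w+)" field')


def v0_shape_hint(exc: BaseException) -> Optional[Tuple[str, str]]:
    """Return (message_type, field) if exc looks like a v0-shape error."""
    if not isinstance(exc, ValueError):
        return None
    match = _V0_SHAPE.search(str(exc))
    return (match.group(1), match.group(2)) if match else None
```

A caller could use this in an `except ValueError` block to log a pointer to this page when the hint is not `None`.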
|
||||
|
||||
## Migration checklist
|
||||
|
||||
1. **Bump the dep** — `a2a-sdk[http-server]>=0.3.25` is the floor; remove
|
||||
any `<1.0` upper bound. The Molecule wheel uses
|
||||
`a2a-sdk[http-server]>=0.3.25` with no upper bound (see
|
||||
[`molecule-ai-workspace-runtime/pyproject.toml`](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/blob/main/pyproject.toml)).
|
||||
2. **Fix imports** — sweep the four renamed/removed symbols above. A
|
||||
safe grep is `grep -rn "from a2a\\|import a2a"` across your tree.
|
||||
3. **Fix removed-field reads/writes** — search for
|
||||
`state_transition_history` usage and delete the kwarg/field access.
|
||||
4. **Flatten `Part` constructors** — search for `Part(root=` and
|
||||
convert to `Part(text=...)` / `Part(file=...)` / `Part(data=...)`.
|
||||
5. **Replace the app factory** — search for `A2AStarletteApplication`
|
||||
and rewrite the bootstrap using `create_agent_card_routes` +
|
||||
`create_jsonrpc_routes`. Pass `enable_v0_3_compat=True` to
|
||||
`create_jsonrpc_routes` if your peers may still be on v0.
|
||||
6. **Re-run tests** — fixture-level mocks of `a2a.helpers` /
|
||||
`a2a.utils` need to mock both names so tests still pass during the
|
||||
rename rollout (see
|
||||
[`workspace/tests/conftest.py:105-111`](https://git.moleculesai.app/Molecule-AI/molecule-core/blob/main/workspace/tests/conftest.py#L105-L111)
|
||||
for the dual-name pattern).
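
Steps 2 to 5 of the checklist amount to searching for a handful of v0 symbols. As a rough aid, the sketch below scans source text for them. The pattern list is taken from the cheat sheet; the function name and labels are illustrative assumptions, and a regex scan can produce false positives (for example in comments).

```python
import re
from typing import Dict, List, Tuple

# v0 symbols from the cheat sheet; labels are illustrative only.
V0_PATTERNS: Dict[str, str] = {
    "a2a.utils import": r"\ba2a\.utils\b",
    "new_agent_text_message": r"\bnew_agent_text_message\b",
    "TextPart": r"\bTextPart\b",
    "Part(root=...)": r"\bPart\(\s*root\s*=",
    "A2AStarletteApplication": r"\bA2AStarletteApplication\b",
    "state_transition_history": r"\bstate_transition_history\b",
}


def find_v0_usages(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, label) pairs for every v0 pattern found."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in V0_PATTERNS.items():
            if re.search(pattern, line):
                hits.append((lineno, label))
    return hits
```

Running it over each `.py` file in a tree (for instance via `pathlib.Path.rglob("*.py")`) gives a worklist before you start editing.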
|
||||
|
||||
## Before / after diffs
|
||||
|
||||
### `new_agent_text_message` → `new_text_message`
|
||||
|
||||
```diff
-from a2a.utils import new_agent_text_message
+from a2a.helpers import new_text_message

 async def execute(self, context, event_queue):
-    await event_queue.enqueue_event(new_agent_text_message("hello"))
+    await event_queue.enqueue_event(new_text_message("hello"))
```
|
||||
|
||||
### Flat `Part` API
|
||||
|
||||
```diff
-from a2a.types import Part, TextPart
+from a2a.types import Part

-msg_parts = [Part(root=TextPart(text=final_text))]
+msg_parts = [Part(text=final_text)]
```
|
||||
|
||||
### `AgentCapabilities` — drop `state_transition_history`
|
||||
|
||||
```diff
 capabilities=AgentCapabilities(
     streaming=config.a2a.streaming,
     push_notifications=config.a2a.push_notifications,
-    state_transition_history=True,
 ),
```
|
||||
|
||||
### `A2AStarletteApplication` → route factories
|
||||
|
||||
```diff
-from a2a.server.apps import A2AStarletteApplication
+from a2a.server.routes import create_agent_card_routes, create_jsonrpc_routes

-app = A2AStarletteApplication(
-    agent_card=agent_card,
-    http_handler=request_handler,
-).build()
+routes = []
+routes.extend(create_agent_card_routes(agent_card))
+routes.extend(create_jsonrpc_routes(
+    request_handler=request_handler,
+    rpc_url="/",
+    enable_v0_3_compat=True,
+))
+app = Starlette(routes=routes)
```
|
||||
|
||||
The `enable_v0_3_compat=True` flag on `create_jsonrpc_routes` is what
|
||||
keeps in-flight v0 callers (peers that haven't migrated yet) from
|
||||
breaking — it accepts the old method names and translates them. The
|
||||
Molecule runtime ships with this flag on (see
|
||||
[`workspace/main.py:279`](https://git.moleculesai.app/Molecule-AI/molecule-core/blob/main/workspace/main.py#L279));
|
||||
strip it once your entire fleet is on v1.
|
||||
|
||||
## For downstream consumers
|
||||
|
||||
- **Using the published wheel** (`pip install
|
||||
molecule-ai-workspace-runtime>=0.1.11`): the migration is in the
|
||||
wheel — no code changes needed in your adapter or workspace template
|
||||
beyond bumping the pin.
|
||||
- **Running a fork of the runtime**: cherry-pick or rebase against
|
||||
commit `d5cf872` ("feat: migrate a2a-sdk 1.x (KI-009) (#39)") in
|
||||
`molecule-ai-workspace-runtime`. The diff is the canonical reference
|
||||
for what KI-009 actually changed.
|
||||
- **Standalone external agent** (talking A2A without the wheel): apply
|
||||
the [Migration checklist](#migration-checklist) directly to your
|
||||
source. The four cheat-sheet items are the entire surface that
|
||||
changed for the typical agent role; only `Part` flattening and the
|
||||
`state_transition_history` removal affect on-the-wire shapes — the
|
||||
other two are import-only.
|
||||
|
||||
<Callout type="info">
|
||||
The wheel keeps `enable_v0_3_compat=True` on `create_jsonrpc_routes`,
|
||||
so a v0 peer can still hit a v1 wheel and vice versa during the
|
||||
migration window. You don't need to coordinate a fleet-wide cutover —
|
||||
migrate at your own pace.
|
||||
</Callout>
|
||||
|
||||
## See also
|
||||
|
||||
- [`molecule-ai-workspace-runtime` v0.1.11 release](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/releases/tag/v0.1.11) — first wheel containing KI-009
|
||||
- [PR #39 — feat: migrate a2a-sdk 1.x (KI-009)](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/pulls/39)
|
||||
- [PR #48 — feat(a2a): dual-compat for a2a-sdk 0.3.x and 1.x](https://git.moleculesai.app/Molecule-AI/molecule-ai-workspace-runtime/pulls/48) — runtime-side compat shim that keeps v0 peers working against the v1 wheel
|
||||
- [Bring Your Own Runtime (MCP)](/docs/runtime-mcp) — universal wheel install path
|
||||
- [External Agents](/docs/external-agents) — manual A2A path for non-MCP runtimes
|
||||
@ -102,22 +102,6 @@ example above. Drop it into your client's MCP settings file
|
||||
(typically `~/.cursor/mcp.json` for Cursor, the MCP Servers panel for
|
||||
Cline) and restart the client.
|
||||
|
||||
## Environment variables
|
||||
|
||||
The following env vars are supported by the `molecule-mcp` wheel in addition to the
|
||||
required trio (`WORKSPACE_ID`, `PLATFORM_URL`, `MOLECULE_WORKSPACE_TOKEN`):
|
||||
|
||||
| Env var | What it controls | Default |
|---|---|---|
| `MOLECULE_MODEL` | **Canonical.** The model ID the workspace runtime uses — e.g. `claude-opus-4-7`, `minimax/MiniMax-M2.7-highspeed` | _(unset — template default)_ |
| `MODEL` | **Alias for `MOLECULE_MODEL`.** Accepted for backwards compatibility. | _(unset)_ |
| `MODEL_PROVIDER` | **Deprecated.** This var was previously misread as "runtime selector" (`claude-code`, `minimax`, etc.) but carried the model ID, causing the wrong model to be used. Prefer `MOLECULE_MODEL`. | _(unset — emits deprecation warning)_ |
| `MOLECULE_AGENT_SKILLS` | Comma-separated skill names — e.g. `research,code-review,memory-curation` | `[]` |
|
||||
|
||||
<Callout type="warn">
|
||||
`MODEL_PROVIDER` is deprecated. It was misnamed — despite its name it carried the **model ID** (e.g. `claude-opus-4-7`), not the runtime/provider name. Setting it caused production incidents where the Claude CLI received `--model MODEL_PROVIDER_VALUE` and returned 404s. Use `MOLECULE_MODEL` instead.
|
||||
</Callout>
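
The relationship between the three model variables can be pictured as a simple lookup. The sketch below is illustrative only: the precedence order (`MOLECULE_MODEL`, then `MODEL`, then `MODEL_PROVIDER` as a last resort) is an assumption, and `resolve_model` is a hypothetical name rather than a function in the wheel.

```python
import warnings
from typing import Mapping, Optional


def resolve_model(env: Mapping[str, str]) -> Optional[str]:
    """Pick the model ID from an environment mapping.

    Assumed precedence: MOLECULE_MODEL, then the MODEL alias, then the
    deprecated MODEL_PROVIDER. Setting MODEL_PROVIDER always warns.
    """
    if env.get("MODEL_PROVIDER"):
        warnings.warn(
            "MODEL_PROVIDER is deprecated; use MOLECULE_MODEL",
            DeprecationWarning,
            stacklevel=2,
        )
    return (
        env.get("MOLECULE_MODEL")
        or env.get("MODEL")
        or env.get("MODEL_PROVIDER")
        or None  # unset: the template default applies
    )
```

Passing `os.environ` as `env` gives the effective value for the current process under these assumptions.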
|
||||
|
||||
## Optional — declare your identity & capabilities
|
||||
|
||||
Three additional env vars control how your workspace appears on the
|
||||
@ -222,38 +206,6 @@ Claude Code, Cursor, Cline, OpenCode, hermes-agent, or anything else
|
||||
that opens an MCP stdio connection. If your client speaks MCP, it
|
||||
speaks the wheel.
|
||||
|
||||
## HTTP/SSE transport for Hermes workspaces
|
||||
|
||||
Hermes workspaces (which are MCP-native) can connect to the platform MCP
|
||||
server over **HTTP + Server-Sent Events** instead of stdio. This is the
|
||||
recommended path when Hermes runs as a standalone service rather than
|
||||
inside a shell.
|
||||
|
||||
The `a2a_mcp_server.py` in the runtime exposes three endpoints:
|
||||
|
||||
| Endpoint | Method | Purpose |
|---|---|---|
| `/mcp` | `POST` | Receive JSON-RPC requests |
| `/mcp/stream` | `GET` | SSE stream for push-based responses |
| `/health` | `GET` | Health check |
|
||||
|
||||
Start the server with the `--transport=http --port=<N>` flags:
|
||||
|
||||
```bash
python a2a_mcp_server.py \
  --transport=http \
  --port=8080 \
  --workspace-id=<uuid> \
  --platform-url=https://<tenant>.moleculesai.app \
  --workspace-token=<token>
```
|
||||
|
||||
<Callout type="info">
|
||||
The stdio transport (described in [Step 2](#step-2--add-it-to-your-runtime))
|
||||
remains the default. HTTP/SSE is an alternative for Hermes deployments
|
||||
where a long-running daemon process is preferred over a stdio subprocess.
|
||||
</Callout>
|
||||
|
||||
## Heartbeat & lifecycle
|
||||
|
||||
The wheel spawns a daemon thread that POSTs `/registry/heartbeat` every
|
||||
|
||||
@ -1,284 +0,0 @@
|
||||
---
|
||||
title: "Provisioning Workspaces on AWS EC2 (production SaaS provisioner)"
|
||||
description: "How the molecule-controlplane EC2 provisioner turns POST /cp/orgs and POST /workspaces calls into running tenant + workspace EC2 instances — env vars, lifecycle, tier sizing, and the migration off Fly Machines."
|
||||
---
|
||||
|
||||
# Provisioning Workspaces on AWS EC2 (production SaaS provisioner)
|
||||
|
||||
As of April 2026, Molecule AI's SaaS control plane provisions both **tenants**
|
||||
(per-org platform VMs) and **workspaces** (per-agent inference VMs) on
|
||||
AWS EC2 instances. The provisioner lives at
|
||||
[`molecule-controlplane/internal/provisioner/ec2.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/internal/provisioner/ec2.go)
|
||||
and is auto-wired by [`cmd/server/main.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go)
|
||||
whenever AWS credentials are present in the control-plane environment. The
|
||||
platform manages workspace lifecycle, auth, and routing; AWS manages the
|
||||
underlying EC2, security groups, and network plumbing.
|
||||
|
||||
This tutorial documents what env vars the provisioner reads, what AWS
|
||||
actions it performs on a `POST /workspaces`, and how to operate it. It is
|
||||
the replacement for the deprecated [Fly Machines provisioner](./fly-machines-provisioner.md)
|
||||
tutorial.
|
||||
|
||||
> **Audience:** operators running a self-hosted Molecule AI control plane
|
||||
> against their own AWS account, and contributors debugging the
|
||||
> production CP. End-users of `*.moleculesai.app` do not need any of
|
||||
> this — provisioning happens transparently when you create an org or
|
||||
> workspace in the canvas.
|
||||
|
||||
## When EC2 is the active provisioner
|
||||
|
||||
`cmd/server/main.go` switches on whether `AWS_ACCESS_KEY_ID` is set in the
|
||||
process environment. If yes, it constructs an `*provisioner.EC2` from the
|
||||
config below and registers it as the tenant provisioner. There is **no**
|
||||
`CONTAINER_BACKEND=ec2` switch — the dispatcher key is presence of AWS
|
||||
credentials. (The legacy `flyio` backend still has dead code in the tree
|
||||
but is no longer wired in `main.go`.)
|
||||
|
||||
A typical Railway-hosted control plane log line on boot:
|
||||
|
||||
```
provisioner: EC2 (region=us-east-2, ami=ami-0ea3c35c5c3284d82)
tenant provisioner: EC2 ✓
```
|
||||
|
||||
If `AWS_ACCESS_KEY_ID` is unset, you'll see `provisioner: disabled`
|
||||
instead — useful for local dev where you want orgs CRUD to work without
|
||||
AWS access.
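
The selection rule can be summarised as a small decision function. The real wiring is Go code in `cmd/server/main.go`; the Python sketch below only mirrors the behaviour described on this page (credential presence as the switch, plus the `SECRETS_ENCRYPTION_KEY` gate noted under Operating notes). The function name and the returned strings are hypothetical.

```python
from typing import Mapping


def provisioner_mode(env: Mapping[str, str]) -> str:
    """Describe which provisioner would be active for this environment."""
    # Presence of AWS credentials is the dispatcher key; there is no
    # CONTAINER_BACKEND=ec2 switch.
    if not env.get("AWS_ACCESS_KEY_ID"):
        return "disabled: no AWS credentials"
    # The crypto envelope is required even when AWS creds are present.
    if not env.get("SECRETS_ENCRYPTION_KEY"):
        return "disabled: no encryption key"
    return "ec2"
```

This is handy as a pre-flight check in a deploy script, before the control plane boots and logs the real decision.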
|
||||
|
||||
## Environment variables
|
||||
|
||||
The full list of env vars `cmd/server/main.go` passes into
|
||||
`provisioner.EC2Config`. Anything not listed here is unused by the
|
||||
provisioner.
|
||||
|
||||
### Required for any EC2 provisioning
|
||||
|
||||
| Var | Default | Purpose |
|-----|---------|---------|
| `AWS_ACCESS_KEY_ID` | — | Toggle: presence enables EC2 wiring at all |
| `AWS_SECRET_ACCESS_KEY` | — | Standard AWS SDK credential pair |
| `AWS_REGION` | `us-east-1` | Region for tenant + workspace launches |
| `EC2_AMI` | `ami-0ea3c35c5c3284d82` (Ubuntu 22.04 us-east-2) | Default AMI when no `thin_ami_pins` row matches |
| `EC2_VPC_ID` | — | VPC for per-tenant SG creation; falls back to `EC2_SECURITY_GROUP` if unset |
| `EC2_SUBNET_ID` | — | Subnet for `RunInstances` |
| `SECRETS_ENCRYPTION_KEY` | — | KMS-envelope DEK for tenant secret-at-rest; provisioner stays disabled until set |
|
||||
|
||||
### Required for production (#44 secure bootstrap)
|
||||
|
||||
| Var | Purpose |
|-----|---------|
| `EC2_TENANT_IAM_PROFILE` | Instance profile attached to every tenant EC2 so it can fetch its bootstrap bundle from Secrets Manager at boot. Without this set, `Provision` returns the error `"Secrets Manager + IAM instance profile are required (#113 — plaintext user-data path removed)"`. |
| `PROVISION_SHARED_SECRET` | Shared HMAC-secret stored alongside the tenant bootstrap bundle so workspace-server can authenticate inbound `/cp/...` callbacks |
| `CP_ADMIN_API_TOKEN` | Token the tenant uses to call admin endpoints back on the control plane |
| `CP_BASE_URL` | URL the tenant boot script uses to reach the control plane (typically `https://api.moleculesai.app`) |
|
||||
|
||||
### Required for the canvas Terminal tab
|
||||
|
||||
| Var | Purpose |
|-----|---------|
| `EIC_ENDPOINT_SG_ID` | Security-group ID of the region's [EC2 Instance Connect endpoint](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-endpoint.html). The provisioner adds a `tcp/22` ingress rule to every per-tenant + per-workspace SG sourced from this SG, so the canvas Terminal can EIC-tunnel into the box for diagnostic ssh. Empty leaves the canvas Terminal broken with `failed to open EIC tunnel`. Discover with `aws ec2 describe-instance-connect-endpoints --region <region>`. |
|
||||
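
For readers unfamiliar with security-group-sourced rules, the sketch below shows the shape of such a `tcp/22` ingress permission as a plain dictionary, in the form the EC2 `AuthorizeSecurityGroupIngress` API expects for an `IpPermissions` entry. The provisioner itself is Go; `eic_ssh_ingress_rule` is a hypothetical helper, and returning `None` when the variable is empty is an assumption about how the missing rule comes about.

```python
from typing import Any, Dict, Optional


def eic_ssh_ingress_rule(eic_endpoint_sg_id: str) -> Optional[Dict[str, Any]]:
    """Build a tcp/22 ingress permission sourced from the EIC endpoint SG.

    Returns None when the SG ID is empty, in which case no SSH rule is
    added and the canvas Terminal cannot open its tunnel.
    """
    if not eic_endpoint_sg_id:
        return None
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        # Source is a security group, not a CIDR: no IpRanges entry,
        # so 0.0.0.0/0 is never opened.
        "UserIdGroupPairs": [{"GroupId": eic_endpoint_sg_id}],
    }
```

The absence of an `IpRanges` key is the point: only traffic originating from the endpoint's own security group can reach port 22.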
|
||||
### Cloudflare integration (per-tenant subdomains)
|
||||
|
||||
| Var | Purpose |
|-----|---------|
| `CLOUDFLARE_API_TOKEN` | Enables CF DNS client; provisioner creates the per-tenant `<slug>.<APP_DOMAIN>` CNAME |
| `CLOUDFLARE_ACCOUNT_ID` | Enables CF Tunnel client (preferred over Worker + wildcard DNS) |
| `CLOUDFLARE_ZONE_ID` | DNS zone the tenant CNAMEs are written under |
| `APP_DOMAIN` | Default `moleculesai.app`; tenant FQDN becomes `<slug>.<APP_DOMAIN>` |
|
||||
|
||||
### Optional — runtime images, tier image, backups, canary, multi-env
|
||||
|
||||
| Var | Purpose |
|-----|---------|
| `MOLECULE_ENV` | `dev` / `staging` / `prod`; stamped on every EC2 tag and scopes the orphan-report's AWS lister so envs don't false-positive each other |
| `EC2_INSTANCE_TYPE` | Default `t3.small` for tenant VMs (workspaces use the per-tier table below) |
| `EC2_SECURITY_GROUP` | Fallback shared SG when `EC2_VPC_ID` is unset; production should leave this empty |
| `EC2_KEY_NAME` | Optional EC2 KeyPair name for emergency console SSH |
| `TENANT_IMAGE` | OCI ref for the tenant platform image (e.g. `ghcr.io/molecule-ai/platform-tenant:staging-<sha>`) |
| `CANARY_TENANT_IMAGE` | Override `TENANT_IMAGE` for orgs flagged `is_canary=true` |
| `CANARY_ROLE_ARN`, `CANARY_REGION`, `CANARY_VPC_ID`, `CANARY_SUBNET_ID` | Second-AWS-account target for canary tenant launches; all four required together |
| `TENANT_BACKUP_S3_PREFIX` | Empty disables nightly `pg_dump`; set `s3://bucket/path` to enable |
| `TENANT_BACKUP_REPORT_URL` | Defaults to `${CP_BASE_URL}/cp/tenants/backup-report` |
| `GHCR_PULL_TOKEN` | GHCR pull token written into the tenant bootstrap bundle (private images only) |
|
||||
|
||||
For the always-current set, grep
|
||||
[`cmd/server/main.go` lines 86–158](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go#L86-L158)
|
||||
for `os.Getenv` calls inside the `provisioner.NewEC2` block.
|
||||
|
||||
## What happens on `POST /cp/orgs` (tenant provision)
|
||||
|
||||
`OrgsHandler.Create` calls into `(*EC2).Provision(ctx, cfg)`. Roughly:
|
||||
|
||||
1. **Cloudflare cleanup** — `cleanupStaleSlugArtifacts` scrubs any
|
||||
leftover tunnel/DNS rows from a previously-purged org with the same
|
||||
slug, so the slug is reusable.
|
||||
2. **Cloudflare Tunnel + DNS** — `CreateTunnel` → `CreateTunnelDNS`
|
||||
(writes `<slug>.<APP_DOMAIN>` → `<tunnel-id>.cfargotunnel.com`) →
|
||||
`ConfigureTunnelIngress` (registers the hostname on the tunnel's
|
||||
remote config so CF's edge knows to forward). DNS or ingress
|
||||
failures roll back the tunnel and abort the provision — fail-fast
|
||||
behavior added 2026-04-26 after a six-hour outage in which
|
||||
unreachable tenants timed out at 600–900s instead of surfacing the
|
||||
real CF API problem.
|
||||
3. **Bootstrap secrets to AWS Secrets Manager** — the provisioner
|
||||
generates a per-tenant DB password + admin token, packages them with
|
||||
the GHCR pull token, tunnel token, encryption key, and shared
|
||||
secret, and `PutSecret`s them at `awsapi.TenantSecretName(orgID)`.
|
||||
The tenant fetches this bundle at boot via its instance profile —
|
||||
no plaintext secrets in user-data (see #113).
|
||||
4. **Per-tenant SG creation** — `createPerTenantSG` calls
|
||||
`CreateSecurityGroup` with the resolved VPC, the per-org name, and
|
||||
the ingress rules from `tenantIngressRules(vpcCidr, EICEndpointSGID)`.
|
||||
The SG ingress always includes the canvas-terminal EIC `tcp/22`
|
||||
rule sourced from the EIC endpoint's own SG (UserIdGroupPairs, not
|
||||
`0.0.0.0/0` — only AWS EIC's endpoint can use it).
|
||||
5. **`RunInstances`** — `awsClient.RunInstance(ctx, awsapi.LaunchConfig{...})`
|
||||
launches with `InstanceType = TenantInstanceType` (default
|
||||
`t3.small`), the resolved AMI, IAM instance profile, base64-encoded
|
||||
user-data, and tags `OrgID` / `OrgSlug` / `Role=tenant` / `TunnelID`
|
||||
/ `SGID`. Volume size is 30 GB.
|
||||
6. **Audit row** — every CF, SG, Secrets Manager, and EC2 lifecycle
|
||||
event is recorded in the `tenant_resources` audit table (#2343)
|
||||
so the orphan reconciler can diff claims vs live state.
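
The fail-fast behaviour in step 2 follows a common create-then-roll-back pattern. The sketch below illustrates it with an in-memory fake. `FakeCloudflare`, its method names, and `provision_tunnel` are hypothetical stand-ins written for this example; they are not the controlplane's Go types.

```python
from typing import List


class FakeCloudflare:
    """Hypothetical in-memory stand-in for the Cloudflare client."""

    def __init__(self, fail_on: str = "") -> None:
        self.fail_on = fail_on
        self.calls: List[str] = []

    def _step(self, name: str) -> None:
        self.calls.append(name)
        if name == self.fail_on:
            raise RuntimeError(f"{name} failed")

    def create_tunnel(self, slug: str) -> str:
        self._step("create_tunnel")
        return f"tunnel-{slug}"

    def create_tunnel_dns(self, slug: str, tunnel_id: str) -> None:
        self._step("create_tunnel_dns")

    def configure_tunnel_ingress(self, slug: str, tunnel_id: str) -> None:
        self._step("configure_tunnel_ingress")

    def delete_tunnel(self, tunnel_id: str) -> None:
        self.calls.append("delete_tunnel")


def provision_tunnel(cf: FakeCloudflare, slug: str) -> str:
    """Create tunnel, DNS, and ingress; roll the tunnel back on failure."""
    tunnel_id = cf.create_tunnel(slug)
    try:
        cf.create_tunnel_dns(slug, tunnel_id)
        cf.configure_tunnel_ingress(slug, tunnel_id)
    except Exception:
        # Fail fast: remove the half-configured tunnel and surface the
        # original error instead of letting the tenant time out later.
        cf.delete_tunnel(tunnel_id)
        raise
    return tunnel_id
```

Re-raising after cleanup keeps the original Cloudflare error visible to the caller, which is the property the 2026-04-26 change was after.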
|
||||
|
||||
`Provision` returns a `*Result` whose fields (`FlyMachineID`, `FlyRegion`,
|
||||
`AdminToken`) are still named after Fly. The EC2 provisioner fake-fills
|
||||
them with EC2 equivalents (`InstanceID`, `AWSRegion`); a column-rename
|
||||
migration is on the controlplane backlog.
|
||||
|
||||
## What happens on `POST /workspaces` (workspace provision)
|
||||
|
||||
`workspace-server`'s `POST /workspaces` reaches the control plane via
|
||||
`/cp/workspaces/provision`, which calls
|
||||
`(*EC2).ProvisionWorkspace(ctx, workspaceID, runtime, orgID, tier, platformURL, env)`:
|
||||
|
||||
1. **Resolve tier resources** — `workspaceTierResources(tier)` returns
|
||||
`(instanceType, volumeSize)` per the table below. Hermes runtime
|
||||
floors `volumeSize` to 50 GB regardless of tier (uv + Python venv +
|
||||
Node.js gateway pegs disk at 18–25 GB during install).
|
||||
2. **Resolve AMI** — `resolveWorkspaceAMI` looks up `thin_ami_pins`
|
||||
for the runtime + region. A pin row means the AMI is pre-baked
|
||||
(per `packer/scripts/install-base.sh`) and user-data can skip
|
||||
apt-update + the Python/Node installs (60–140 s saved per
|
||||
provision, RFC #388). Fallback to the static `WorkspaceAMI`.
|
||||
3. **Resolve runtime image** — `resolveRuntimeImage` looks up
|
||||
`runtime_image_pins` and emits the containerized user-data path
|
||||
(docker pull + run) when present. Independent of the AMI gate
|
||||
above; the new path also installs Docker if missing on a thin/stock
|
||||
AMI.
|
||||
4. **Per-workspace SG creation** — same `createPerTenantSG` call with
|
||||
`namePrefix="workspace"`. Workspace SGs get
|
||||
`workspaceIngressRules(EICEndpointSGID)` — currently the EIC
|
||||
`tcp/22` rule and nothing else (workspaces sit behind the
|
||||
Cloudflare Tunnel for HTTP).
|
||||
5. **`RunInstance`** — launches with `wsShort = workspaceID[:12]`
|
||||
prefixed name, the resolved instance type + volume + AMI +
|
||||
user-data, and tags `WorkspaceID` / `Runtime` / `Role=workspace`
|
||||
/ `SGID` / `OrgID`. The `OrgID` tag is what lets
|
||||
`DeprovisionInstance` cascade-terminate workspace EC2s when their
|
||||
tenant is deleted (incident 2026-04-23: ~27 orphaned workspace
|
||||
EC2s pinned staging at the 64 vCPU limit before the tag was
|
||||
added).
|
||||
6. **Audit row** — `tenant_resources` `KindEC2Instance` `StateCreated`
|
||||
with role / runtime / tier / workspace metadata.
|
||||
|
||||
The boot script registers the workspace agent with the platform via
|
||||
`/workspaces/:id/register`, the platform issues an A2A auth token, and
|
||||
the agent comes up ready for `message/send` calls.
|
||||
|
||||
## Tier-based resource sizing
|
||||
|
||||
`workspaceTierResources` is the single source of truth. At the time of writing,
all tiers below T4 are clamped up to T4 (the SaaS floor), and tiers
above T4 are clamped down to T4 (today's max):
|
||||
|
||||
| Tier | Instance type | Volume | Effective use |
|------|---------------|--------|---------------|
| T1 / T2 | clamped to T4 | clamped to T4 | not in production |
| T3 | `t3.medium` | 40 GB | reserved (clamped today) |
| T4 | `t3.large` | 80 GB | all production workspaces |
|
||||
|
||||
If you set a tier outside `[3, 4]` the clamp lifts it to T4 — a cheap
|
||||
mis-provision rather than a fall-through to the unset `t3.small`
|
||||
default. The clamp was added in PR #434 follow-up after `tier=5`
|
||||
silently yielded `t3.small`.
|
||||
|
||||
Hermes overrides volume to 50 GB minimum regardless of tier.
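
The clamp and the Hermes floor can be expressed in a few lines. The sketch below is a Python illustration of the rules in this section, not a port of the Go function: the `min_tier` / `max_tier` parameters, the `"hermes"` runtime string, and the snake_case name are assumptions. With the defaults every tier resolves to T4, matching the "clamped today" note in the table.

```python
from typing import Dict, Tuple

# (instance_type, volume_gb) per tier, from the table above.
TIER_TABLE: Dict[int, Tuple[str, int]] = {
    3: ("t3.medium", 40),
    4: ("t3.large", 80),
}
HERMES_MIN_VOLUME_GB = 50


def workspace_tier_resources(
    tier: int,
    runtime: str = "",
    min_tier: int = 4,
    max_tier: int = 4,
) -> Tuple[str, int]:
    """Resolve instance type and volume size for a workspace tier."""
    # Clamp into [min_tier, max_tier] so an out-of-range tier can never
    # fall through to an unset default.
    clamped = max(min_tier, min(tier, max_tier))
    instance_type, volume_gb = TIER_TABLE[clamped]
    if runtime == "hermes":
        # Hermes needs at least 50 GB regardless of tier.
        volume_gb = max(volume_gb, HERMES_MIN_VOLUME_GB)
    return instance_type, volume_gb
```

Lowering `min_tier` to 3 shows how the reserved T3 row would behave if the floor were relaxed: a Hermes workspace on T3 would get `t3.medium` with the volume raised from 40 GB to 50 GB.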
|
||||
|
||||
## Lifecycle — stop, restart, redeploy, teardown
|
||||
|
||||
| Operation | Mechanism |
|-----------|-----------|
| **Stop / start a tenant** | `POST /cp/admin/tenants/:slug/{stop,start}` → `(*EC2).Stop` / `Start` via the EC2 API (no termination) |
| **Redeploy a tenant** (in-place new image) | `POST /cp/admin/tenants/:slug/redeploy` → SSM Run Command pulls the latest `TENANT_IMAGE` and recreates the platform container; never reboots EC2 |
| **Refresh workspace template images** | `POST /cp/admin/tenants/:slug/workspaces/redeploy` (single-tenant) or `POST /cp/admin/tenants/workspaces/redeploy-fleet` (canary-batched fleet); HTTP-only, no SSM |
| **Delete a workspace** | platform `DELETE /workspaces/:id` → CP `DeprovisionInstance(workspaceInstanceID, ...)` terminates the EC2 + cleans DNS + SG |
| **Delete a tenant (Art. 17 cascade)** | `DELETE /cp/orgs/:slug` → cascade-terminates all workspace EC2s tagged with this `OrgID`, then terminates the tenant EC2, then deletes the SG, Secrets Manager bundle, CF tunnel + CNAME |
| **Orphan recovery** | `tenant_resources` audit table + 30-min reconciler that diffs claims vs live AWS state and exposes orphan counts via `/cp/admin/stats` |
|
||||
|
||||
`DeprovisionInstance` polls termination under its own deadline so a
|
||||
stuck shutdown surfaces as a deprovision failure (and the caller's
|
||||
retry replays the cascade) instead of becoming a silent leak (#263).
|
||||
|
||||
## Why EC2 (vs Fly Machines)
|
||||
|
||||
The control plane has migrated infrastructure twice in April 2026 — both
|
||||
documented in the
|
||||
[molecule-controlplane README "Migration history"](https://git.moleculesai.app/molecule-ai/molecule-controlplane#migration-history):
|
||||
|
||||
- **Apr 2026 — CP host:** Fly (`molecule-cp.fly.dev`) → Railway
|
||||
(`api.moleculesai.app`).
|
||||
- **Apr 2026 — tenant + workspace compute:** Fly Machines → AWS EC2
|
||||
with SSM Run Command for redeploy.
|
||||
|
||||
The drivers were production needs Fly couldn't easily meet:
|
||||
|
||||
- **Region + data-residency control.** EU customers required
|
||||
EU-resident tenant data; AWS regional pinning per tenant is
|
||||
straightforward, Fly's region routing is per-app and harder to
|
||||
guarantee per-tenant.
|
||||
- **AWS-native auth chain for the canvas Terminal.** EC2 Instance
|
||||
Connect lets the platform open SSH tunnels to a tenant box via
|
||||
short-lived (60 s) IAM-signed public keys — no shared SSH keys,
|
||||
no inbound `0.0.0.0/0` rules. The same path powers the Files API
|
||||
EIC writes (see [SaaS file writes via EC2 Instance Connect](./saas-file-writes-eic.md)).
|
||||
- **Secrets Manager + IAM instance profiles** for tenant bootstrap
|
||||
secrets (#113 removed the plaintext user-data path).
|
||||
- **Cloudflare Tunnels** instead of public IPs — no inbound exposure
|
||||
on tenant EC2s; CF edge is the only ingress.
|
||||
- **`tenant_resources` audit table + reconciler** for cascade-cleanup
|
||||
guarantees that Fly's flat machine list couldn't enforce.
|
||||
|
||||
Old `internal/flyapi/` and `internal/provisioner/fly.go` files remain
|
||||
in the controlplane tree as legacy code awaiting cleanup; they are not
|
||||
wired in `cmd/server/main.go`.
|
||||
|
||||
## Operating notes
|
||||
|
||||
- **Schema names still say "fly".** The `org_instances` columns
|
||||
`fly_app` / `fly_machine_id` / `fly_region` are fake-filled with EC2
|
||||
equivalents; a rename migration is on the controlplane backlog
|
||||
(`PLAN.md`).
|
||||
- **`SECRETS_ENCRYPTION_KEY` gates the whole provisioner.** The crypto
|
||||
envelope is required even when only AWS creds are present; without
|
||||
it, `tenant provisioner: DISABLED` is logged and `POST /cp/orgs`
|
||||
accepts the row but never spins a tenant.
|
||||
- **Per-tenant SG creation needs `EC2_VPC_ID`.** If you only set
  `EC2_SECURITY_GROUP` (the legacy shared-SG fallback), every tenant
  shares one SG, a bug that was caught in PR #434 review. Production must
  set `EC2_VPC_ID`.
|
||||
- **`EIC_ENDPOINT_SG_ID` is silently load-bearing.** If unset, the
|
||||
canvas Terminal hangs with `failed to open EIC tunnel` and the
|
||||
Files API EIC write path returns 500 — the EC2 boots fine, the
|
||||
symptom only shows when an operator opens the canvas Terminal tab.
|
||||
|
||||
## References
|
||||
|
||||
- [`molecule-controlplane/internal/provisioner/ec2.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/internal/provisioner/ec2.go) — provisioner source
|
||||
- [`molecule-controlplane/cmd/server/main.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/blob/main/cmd/server/main.go) — env-var wiring
|
||||
- [`molecule-controlplane` README "Migration history"](https://git.moleculesai.app/molecule-ai/molecule-controlplane#migration-history) — canonical record
|
||||
- [AWS EC2 Instance Connect endpoints](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-endpoint.html)
|
||||
- [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html)
|
||||
- [SaaS file writes via EC2 Instance Connect](./saas-file-writes-eic.md) — EIC is also the Files API write channel
|
||||
- [Fly Machines provisioner (DEPRECATED)](./fly-machines-provisioner.md) — previous backend, retained for migration history
|
||||