molecule-ai-workspace-runtime

molecule-ai/molecule-ai-workspace-runtime

Author	SHA1	Message	Date
Hongming Wang	1b04da2061	Merge pull request #38 from Molecule-AI/fix/auto-detect-llm-token-type feat(runtime): auto-detect LLM token type, normalise env on boot	2026-04-23 13:53:06 -07:00
Hongming Wang	e562b7a03e	Merge branch 'staging' into fix/auto-detect-llm-token-type	2026-04-23 13:52:25 -07:00
Hongming Wang	3556244725	Merge pull request #40 from Molecule-AI/fix/heartbeat-401-token-refresh-1877 fix(heartbeat): refresh on-disk auth token on 401 + retry once (#1877)	2026-04-23 13:51:42 -07:00
rabbitblood	a78b9f229e	test(1877): convert async tests to sync httpx.Client to unblock CI CI doesn't have pytest-asyncio installed, and the async wrapping was incidental — the production retry pattern (refresh-on-401) is identical in sync and async forms. Switching to httpx.Client + MockTransport keeps the same coverage without the async dep. 6/6 still pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 13:35:45 -07:00
rabbitblood	050c2412b3	fix(heartbeat): refresh on-disk auth token on 401 + retry once (#1877) ## Problem Auto-restart rotates the workspace's auth token in two non-atomic steps: 1. Platform issues new token via wsauth.IssueToken 2. Provisioner writes the new token to /configs/.auth_token AFTER ContainerStart returns Between steps 1 and 2, the new container has booted and the runtime has already loaded the OLD cached value of .auth_token (or no value if the file was empty during boot). The runtime's first /registry/heartbeat call sends the stale token, gets 401, but the loop never re-reads the on-disk token — so subsequent heartbeats also send the stale value. Each 401 means the platform never sees the workspace as alive → status stays 'provisioning' → scheduler won't dispatch → workspace looks dead from every angle even though the container is actually running. The existing code comment in workspace_provision.go acknowledges this: "the workspace will get 401 on its first heartbeat and can recover on the next restart." That recovery only worked because workspaces used to crash for unrelated reasons and get restarted. After PR #1861 (provisioner empty-volume auto-recover) removed those crashes, workspaces get stuck in the 401 loop with no exit. ## Fix Two-part runtime-side fix in molecule-ai-workspace-runtime: 1. platform_auth.refresh_from_disk() — new helper that clears the in-memory cache and re-reads /configs/.auth_token. Returns the fresh value (or None if missing). Updates the cache as a side effect. 2. HeartbeatLoop._loop() — on 401 from /registry/heartbeat, calls refresh_from_disk() and retries the request ONCE with the new token. Same pattern in _check_delegations(). Bounded retry budget — if the on-disk token is also stale (bug elsewhere), no infinite loop.
## Tests 6/6 new tests in tests/test_token_refresh_1877.py: - refresh_picks_up_rotated_token — happy path - refresh_returns_none_when_file_missing — defensive - refresh_clears_stale_cache_when_file_disappears - refresh_is_idempotent - 401_retry_pattern_uses_refreshed_token — the production fix path - 401_retry_no_loop_when_disk_token_also_stale — bounded retry budget All pass locally on Python 3.13 + pytest 9. ## Why this fix and not the alternatives - Alternative B (platform writes token before ContainerStart): Right architecturally but invasive — needs provisioner refactor to prep volumes before docker run. - Alternative C (skip rotation on auto-restart): Breaks the multi-instance-safety invariant the existing code calls out (revoke prevents stale tokens from sister deployments). - This fix (A): 3-line core change + helper. Self-healing for any timing edge case, not just the post-restart one. Costs nothing in the happy path (only triggers on 401). ## Version Bumped to 0.1.9. Once published to PyPI + workspace template image rebuilt, deployed workspaces auto-recover from token-rotation races without operator intervention. Closes #1877. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 13:26:36 -07:00
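The refresh-on-401 pattern from #1877 can be sketched as follows. This is a minimal illustration, not the runtime's code: `TokenCache` and `post_with_refresh` are hypothetical names, and the HTTP call is abstracted as a `send(token) -> status` callable so the retry logic is visible without httpx.

```python
from typing import Callable, Optional


class TokenCache:
    """In-memory token cache backed by a disk reader (hypothetical stand-in for platform_auth)."""

    def __init__(self, read_disk: Callable[[], Optional[str]]) -> None:
        self._read_disk = read_disk
        self._token: Optional[str] = None

    def get(self) -> Optional[str]:
        # Lazily load once; later calls reuse the cached (possibly stale) value.
        if self._token is None:
            self._token = self._read_disk()
        return self._token

    def refresh_from_disk(self) -> Optional[str]:
        # Replace the cached value with a fresh read; None if the file is missing.
        self._token = self._read_disk()
        return self._token


def post_with_refresh(send: Callable[[Optional[str]], int], cache: TokenCache) -> int:
    """Send with the cached token; on 401, refresh from disk and retry exactly once."""
    status = send(cache.get())
    if status == 401:
        status = send(cache.refresh_from_disk())  # bounded: a second 401 is returned as-is
    return status
```

Because the retry happens at most once, a token that is also stale on disk produces two requests and a 401, not a loop.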
rabbitblood	4bafea58ae	fix(llm_auth): tighten base-URL hostname match + strip whitespace + no token in logs Self-review findings on #38: 1. Token substring leak: the "unknown prefix" warning included the first 12 chars of the token in the log message. Logs get shipped to Langfuse / CloudWatch / slack-firehose — 12 bytes of a secret in a log is still 12 bytes too many. Warning no longer references the token value at all. 2. Base-URL substring match was too loose: `"anthropic.com" not in base` would accept `https://proxy.anthropic.com.evil.example/` as "looks like Anthropic, keep the URL." Replaced with an allowlist of exact hostnames parsed via urllib.parse.urlparse. 3. Whitespace in pasted tokens: operators frequently paste tokens from terminals with a trailing newline. The token would flow through startswith() detection but then fail downstream auth with a confusing "malformed token" error. Strip and persist the cleaned value. 4. Malformed base URL crash guard: if someone sets ANTHROPIC_BASE_URL to something urlparse can't handle, don't crash — fall through to clearing it, which is the safe choice in OAuth mode. Added 5 new tests covering each of the above. 16/16 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:46:07 -07:00
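Points 2 to 4 of the self-review above can be sketched with `urllib.parse.urlparse`. The allowlist contents are an assumption (the commit does not name the accepted hostnames), and `is_native_anthropic_url` / `clean_token` are illustrative names.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the commit does not list the exact hostnames it accepts.
NATIVE_ANTHROPIC_HOSTS = frozenset({"api.anthropic.com"})


def is_native_anthropic_url(base_url: str) -> bool:
    """Exact-hostname check instead of a substring test on the whole URL."""
    try:
        host = urlparse(base_url.strip()).hostname  # .hostname is already lower-cased
    except ValueError:
        # Malformed URL: report "not native" so the caller can clear it safely.
        return False
    return host in NATIVE_ANTHROPIC_HOSTS


def clean_token(raw: str) -> str:
    """Drop surrounding whitespace, e.g. a trailing newline from a terminal paste."""
    return raw.strip()
```

A substring test would accept `https://proxy.anthropic.com.evil.example/` because `"anthropic.com"` occurs in it. Comparing the parsed hostname rejects that URL.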
rabbitblood	0a0f11b41f	feat(runtime): auto-detect LLM token type, normalise env on boot Platform stores per-workspace LLM credentials under a single key (ANTHROPIC_AUTH_TOKEN in workspace_secrets). But downstream tools expect different env var names depending on the token type: sk-ant-oat01-* → CLAUDE_CODE_OAUTH_TOKEN (Claude Code OAuth session) sk-ant-api03-* → ANTHROPIC_API_KEY (direct Anthropic API) sk-cp-* → ANTHROPIC_AUTH_TOKEN (proxy: MiniMax, gateways) Without normalisation, an OAuth token under ANTHROPIC_AUTH_TOKEN gets sent as a bearer to api.anthropic.com, which responds: 401 authentication_error: OAuth authentication is currently not supported. This was a platform-wide footgun: anyone rotating LLM keys had to know the exact env var for each token type, AND make sure stale overrides were cleared, AND set ANTHROPIC_BASE_URL correctly for proxies (or NOT set for native Claude). Nothing downstream could help — the SDK just saw the wrong var. Fix: - New molecule_runtime/llm_auth.py — normalise_llm_env() mutates os.environ (or any dict) to the correct shape based on token prefix. Returns a NormalisationResult for logging. - main.py calls it as step 0, before any adapter/executor import. Every adapter (claude-code, langgraph, crewai, autogen, hermes, …) benefits automatically — no per-adapter branching needed. - 11 unit tests covering all prefix paths, edge cases, and the "operator deliberately set CLAUDE_CODE_OAUTH_TOKEN" precedence rule. Operationally: this means operators can keep using one ANTHROPIC_AUTH_TOKEN slot in platform settings and just paste whatever token the agent needs. No env-var-name awareness required. Tested locally: 11/11 new tests pass. 83 other tests unchanged (pre-existing failures on staging are all unrelated: test_workspace_id_validation, test_a2a_mcp_server RBAC, the test_imports.main module-walker — same signature as on staging HEAD before this PR). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:41:47 -07:00
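The prefix table above maps directly onto a small normaliser. This simplified sketch makes several assumptions. It returns only the target variable name rather than the real `NormalisationResult`, and it omits the `ANTHROPIC_BASE_URL` handling. It also assumes that an already-set `CLAUDE_CODE_OAUTH_TOKEN` takes precedence and that a re-homed token is removed from `ANTHROPIC_AUTH_TOKEN`.

```python
from typing import MutableMapping, Optional

# Prefix -> env var mapping taken from the commit message.
PREFIX_TO_VAR = (
    ("sk-ant-oat01-", "CLAUDE_CODE_OAUTH_TOKEN"),
    ("sk-ant-api03-", "ANTHROPIC_API_KEY"),
    ("sk-cp-", "ANTHROPIC_AUTH_TOKEN"),
)


def normalise_llm_env(env: MutableMapping[str, str]) -> Optional[str]:
    """Re-home ANTHROPIC_AUTH_TOKEN under the var its prefix expects; return that var."""
    if env.get("CLAUDE_CODE_OAUTH_TOKEN"):
        return "CLAUDE_CODE_OAUTH_TOKEN"  # assumed precedence: an explicit operator value wins
    token = env.get("ANTHROPIC_AUTH_TOKEN", "").strip()
    for prefix, var in PREFIX_TO_VAR:
        if token.startswith(prefix):
            if var != "ANTHROPIC_AUTH_TOKEN":
                # Don't leave an OAuth/API key where it would be sent as a proxy bearer.
                del env["ANTHROPIC_AUTH_TOKEN"]
            env[var] = token
            return var
    return None  # unknown prefix: leave env untouched
```

Passing `os.environ` applies the change in place. Passing a plain dict keeps tests hermetic, which matches the commit's "mutates os.environ (or any dict)".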
molecule-ai[bot]	dcb6edd1a1	fix(shared_runtime): push heartbeat on CLEAR in set_current_task() (#37) Fixes #1372 — phantom busy: canvas showed workspace as active for up to 30s after task completion because set_current_task("") returned early without posting the updated heartbeat. Before: clearing only updated the heartbeat object; the next 30s scheduled heartbeat cycle propagated the clear. Quick tasks would leave a phantom-busy indicator. After: both SET and CLEAR push immediately to /registry/heartbeat. active_tasks=0 on clear, active_tasks=1 on set. Heartbeat object update and HTTP post are now unconditional. Tests: 5 new cases covering SET/CLEAR HTTP body, error resilience, None heartbeat, and missing env vars. Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>	2026-04-22 17:33:42 +00:00
rabbitblood	1e545ed6ba	chore: bump 0.1.8 — executor_helpers phantom-busy fix confirmed in tree	2026-04-21 07:16:47 -07:00
rabbitblood	5a1990552d	chore: bump 0.1.7 — ensure executor_helpers phantom-busy fix in PyPI build	2026-04-21 07:07:17 -07:00
rabbitblood	59f54560a0	Merge branch 'main' of https://github.com/Molecule-AI/molecule-ai-workspace-runtime into fix/507-mcp-server-path-absolute-imports # Conflicts: # pyproject.toml	2026-04-21 06:37:38 -07:00
rabbitblood	d3235cc564	fix(heartbeat): increment/decrement active_tasks + push on clear (#1372, #1408) Both set_current_task() implementations (shared_runtime.py + executor_helpers.py): - Increment active_tasks on task start, decrement on completion (was binary 0/1) - Push heartbeat immediately on BOTH increment AND decrement - Only clear current_task when active_tasks reaches 0 (preserves description for still-running tasks) Fixes phantom-busy: the old code returned early on clear, leaving active_tasks=1 in the platform DB until the next 30s heartbeat cycle. If a new cron fired before the heartbeat, the workspace appeared permanently busy — required manual DB reset every 30 min. Bump: 0.1.2 → 0.1.3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 06:37:12 -07:00
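The counting rules listed in this commit can be sketched as a small class. `TaskTracker` and the `push` callback are hypothetical. In the runtime the push is an HTTP POST to /registry/heartbeat, and the payload fields shown here are illustrative only.

```python
from typing import Callable


class TaskTracker:
    """Hypothetical sketch of set_current_task() counting semantics (#1372, #1408)."""

    def __init__(self, push: Callable[[dict], None]) -> None:
        self._push = push  # stands in for the POST to /registry/heartbeat
        self.active_tasks = 0
        self.current_task = ""

    def set_current_task(self, description: str) -> None:
        if description:
            self.active_tasks += 1
            self.current_task = description
        else:
            self.active_tasks = max(0, self.active_tasks - 1)
            if self.active_tasks == 0:
                # Only clear the description once nothing is running.
                self.current_task = ""
        # Push on both start and completion: no early return on clear.
        self._push({"active_tasks": self.active_tasks, "current_task": self.current_task})
```

Usage: two overlapping tasks followed by two completions produce four pushes, and the last push reports zero active tasks.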
Hongming Wang	7febb51382	Merge pull request #36 from Molecule-AI/chore/bump-0.1.5 chore: bump to 0.1.5 for X-Molecule-Org-Id header fix	2026-04-20 20:30:54 -07:00
Hongming Wang	742b7d1dfb	chore: bump version to 0.1.5 for org-id-header fix	2026-04-20 20:30:31 -07:00
Hongming Wang	4b0185a57b	Merge pull request #35 from Molecule-AI/feat/send-org-id-header feat(auth): send X-Molecule-Org-Id on every outbound platform call	2026-04-20 20:28:40 -07:00
Hongming Wang	ba5466243b	feat(auth): send X-Molecule-Org-Id on every outbound platform call The SaaS tenant platform's TenantGuard middleware rejects cross-org routing with synthetic 404s unless the request carries X-Molecule-Org-Id matching the tenant's MOLECULE_ORG_ID env var. The runtime never sent it, so every non-allowlisted workspace→platform path (memories, delegations, notify, a2a, update-card, peers...) 404'd. Paired with CP change feat/workspace-export-org-id which injects MOLECULE_ORG_ID into workspace user-data env. auth_headers() now returns both headers — the existing Authorization bearer AND the new X-Molecule-Org-Id — so every caller that already threads auth_headers() through httpx picks it up for free. Self- hosted deployments with MOLECULE_ORG_ID unset keep the old behavior (no header, TenantGuard is a no-op). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 20:28:07 -07:00
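The header rule described above can be sketched in a few lines. The signature is hypothetical: the real `auth_headers()` presumably loads the token itself. The injected `env` mapping is there only so the "unset means no header" case is easy to see.

```python
import os
from typing import Mapping, Optional


def auth_headers(token: Optional[str], env: Mapping[str, str] = os.environ) -> dict:
    """Bearer auth plus X-Molecule-Org-Id, the latter only when MOLECULE_ORG_ID is set."""
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    org_id = env.get("MOLECULE_ORG_ID", "")
    if org_id:
        headers["X-Molecule-Org-Id"] = org_id
    # Self-hosted (MOLECULE_ORG_ID unset): no org header, TenantGuard is a no-op.
    return headers
```

Callers that already pass `headers=auth_headers(...)` to their HTTP client pick up the new header without further changes.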
molecule-ai[bot]	0e2e1fc2c4	Merge pull request #33 from Molecule-AI/fix/a2a-cli-discover-workspace-id-validation fix(a2a_cli): validate WORKSPACE_ID in discover() before X-Workspace-ID header	2026-04-21 01:53:19 +00:00
Molecule AI Infra-Runtime-BE	d4b9bff5d0	fix(a2a_cli): validate WORKSPACE_ID in discover() before X-Workspace-ID header PR #32 wrapped all platform URL construction sites with get_validated_workspace_id() but missed a2a_cli.discover(), which passed the raw unvalidated WORKSPACE_ID in the X-Workspace-ID header. All other functions (peers, info) had try/except guards added. discover() now calls get_validated_workspace_id() upfront and returns None (printing the error) if validation fails — consistent with the best-effort error handling pattern used elsewhere in the module. Tests: 2 new cases in TestA2aCliDiscoverValidation covering empty and slash-injected WORKSPACE_ID values. Follow-up to: PR #32 (fix/908-add-namespace-param-commit-memory) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 01:35:37 +00:00
molecule-ai[bot]	40c30c068a	Merge pull request #32 from Molecule-AI/fix/908-add-namespace-param-commit-memory fix(CI): set WORKSPACE_ID env var + validation coverage	2026-04-21 01:29:32 +00:00
Molecule AI Infra-SRE	4bfe6222a6	fix(CI): remove conflicting bandit flags from security linter step PR #31 added `-ll --severity-level=high` but these flags conflict: - `-ll` is the repeated form of `-l/--level` and sets a MEDIUM-or-higher severity threshold (`-l` LOW, `-ll` MEDIUM, `-lll` HIGH) - `--severity-level=high` suppresses everything but high-severity issues The combination causes bandit to exit 2 because `--severity-level` is not allowed alongside `-l/--level`. Use `--severity-level=high` alone. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 00:58:43 +00:00
Molecule AI Infra-SRE	875a8ef952	fix(CI): set WORKSPACE_ID env var for test job PR #29 introduced WORKSPACE_ID validation at module import time (platform_auth.py). The CI environment did not set WORKSPACE_ID, causing 8 failures + 13 errors on every main push. Add a dummy CI-only value so imports succeed without affecting real workspaces. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 00:55:08 +00:00
Molecule AI Infra-SRE	249e5c07eb	fix(builtin_tools/validation): complete WORKSPACE_ID validation in a2a_tools.py Add get_validated_workspace_id() to all 6 remaining unguarded URL positions in molecule_runtime/a2a_tools.py (the MCP tool body implementations): - report_activity(): /workspaces/{id}/activity + heartbeat - tool_delegate_task_async(): /workspaces/{id}/delegate - tool_check_task_status(): /workspaces/{id}/delegations - tool_send_message_to_user(): /workspaces/{id}/notify - tool_commit_memory(): /workspaces/{id}/memories (POST) - tool_recall_memory(): /workspaces/{id}/memories (GET) All 6 functions now use validated ws_id. The last remaining unguarded WORKSPACE_ID use in the entire molecule_runtime package is in builtin_tools/telemetry.py:142 (metric service name — not a URL path, low security risk). 67/67 tests pass.	2026-04-21 00:55:08 +00:00
Molecule AI Infra-SRE	32a7880f4f	test+fix(builtin_tools/validation): add test coverage + fix ".." bypass in regex Tests: 37 new test cases in tests/test_validation.py covering: - Valid ID patterns (6): normal IDs, underscores, dots, max-length (256) - Empty/missing (1): raises with "empty" in message - Invalid chars (10): / \ .. # ? & whitespace - Caching (2): result is cached; raises on repeated bad calls - Error type (1): WorkspaceIdValidationError is a ValueError subclass Fix: regex now uses negative lookahead `(?!.*\.\.)` to reject ".." anywhere in the string (not just at the start). The old pattern `^[A-Za-z0-9_\-.]{1,256}$` matched ".." literally because two dots ARE in the allowed character class. Also adds test cases for embedded ".." (ws..example, ws../etc). Fixes: the ".." bypass was a gap in the original CWE-20 fix.	2026-04-21 00:55:08 +00:00
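The corrected pattern can be exercised directly. This sketch reuses the regex quoted in the commit. The function and exception names follow the commit text, but the bodies are illustrative and omit caching. Using `fullmatch` is a deliberate choice here: with `re.match`, `$` would also accept a trailing newline such as `"ws-1\n"`.

```python
import re

# Negative lookahead rejects ".." anywhere; the character class alone would allow it.
WORKSPACE_ID_RE = re.compile(r"^(?!.*\.\.)[A-Za-z0-9_\-.]{1,256}$")


class WorkspaceIdValidationError(ValueError):
    """Raised for empty or malformed workspace IDs."""


def validate_workspace_id(ws_id: str) -> str:
    if not ws_id:
        raise WorkspaceIdValidationError("WORKSPACE_ID is empty")
    # fullmatch: the whole string must match, so a trailing "\n" is rejected too.
    if not WORKSPACE_ID_RE.fullmatch(ws_id):
        raise WorkspaceIdValidationError(f"invalid WORKSPACE_ID: {ws_id!r}")
    return ws_id
```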
Molecule AI Infra-SRE	be9c9997c0	fix(builtin_tools/validation): cover remaining WORKSPACE_ID URL usages Extend get_validated_workspace_id() to all remaining unguarded URL positions: - consolidation.py: _consolidate() — validates before GET/POST/DELETE to /workspaces/{id}/memories endpoints. Graceful skip on failure (log + return). - coordinator.py: get_children() — validates before /registry/{id}/peers. Graceful skip (empty list) on failure. - molecule_ai_status.py: set_status() — validates before /registry/heartbeat and /workspaces/{id}/activity. Exits with descriptive error on failure. With these three, every runtime use of WORKSPACE_ID in a URL path is now validated. Remaining WORKSPACE_ID uses are: - JSON body fields (not injection-risky): heartbeat, memory POST bodies - Header values (X-Workspace-ID): lower risk, non-URL-injection	2026-04-21 00:55:08 +00:00
Molecule AI Infra-SRE	42bdf530b5	fix(builtin_tools/validation): extend WORKSPACE_ID validation to top-level modules Fixes remaining unguarded WORKSPACE_ID URL usages identified after the initial builtin_tools/ fix: - a2a_client.py: get_peers() and get_workspace_info() now use get_validated_workspace_id() before URL construction. The raw module-level constant is still used in the discover_peer() header (low risk, not URL path). - a2a_cli.py: peers() and info() CLI commands now validate WORKSPACE_ID before calling the platform API. Commands exit with error code 1 + descriptive message if WORKSPACE_ID is empty or malformed. Follow-up candidates (lower priority, not URL injection risk): - coordinator.py: WORKSPACE_ID in registry peer URL - consolidation.py: WORKSPACE_ID in memory URLs (long-running consolidation job) - molecule_ai_status.py: WORKSPACE_ID in activity log URL	2026-04-21 00:55:08 +00:00
Molecule AI Infra-SRE	d52082839f	fix(builtin_tools): validate WORKSPACE_ID before URL construction Add WORKSPACE_ID format validation before every URL/header use to prevent URL injection (CWE-20 / CWE-88). The validator: - Rejects empty values (fail-fast with clear error) - Rejects path-traversal chars (/ \ ..) and fragment/query chars (# ? &) - Accepts alphanumeric, hyphen, underscore, dot (typical ID formats) - Caches the result after first successful call (zero overhead per call) Validated in: - memory.py: commit_memory, search_memory (both awareness-client + httpx paths) - approval.py: _create_approval_request, _wait_polling - delegation.py: _notify_completion, _record_delegation_on_platform, _update_delegation_on_platform - a2a_tools.py: list_peers, delegate_task Fixes #14.	2026-04-21 00:55:08 +00:00
molecule-ai[bot]	548549d5e9	feat(CI): add bandit security linter (audit rec #2) (#31) Bandit runs on every PR against molecule_runtime/ at high severity. Addresses audit recommendation from issue #9. Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 00:23:17 +00:00
molecule-ai[bot]	30d96b4e4e	fix(platform_auth): validate WORKSPACE_ID at import time (issue #14, CWE-20) (#29) WORKSPACE_ID was read via os.environ.get("WORKSPACE_ID", "") in multiple builtin_tools modules and used directly in platform API URLs and X-Workspace-ID headers without validation. A crafted ID containing /, .., or # could cause URL path injection. Fix: validate_workspace_id() in platform_auth.py now validates the ID format at module import time using a regex that permits only lowercase alphanumerics and hyphens (matching UUIDs and org-generated IDs). The validated value is exposed as a module-level WORKSPACE_ID constant. builtin_tools/approval.py and builtin_tools/delegation.py now import from platform_auth instead of reading os.environ directly. Failing input raises ValueError with a clear message — workspace fails fast at startup rather than silently accepting malformed IDs in requests. Add 15 regression tests (45/45 passing total). Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> Co-authored-by: Infra-Runtime-BE <infra-runtime-be@molecule.ai>	2026-04-21 00:04:54 +00:00
Hongming Wang	953aa2847c	Merge pull request #30 from Molecule-AI/fix/adapter-loader-find-subclass fix(adapter-loader): fall back to any BaseAdapter subclass	2026-04-20 16:59:38 -07:00
Hongming Wang	4aa0d9f110	fix(adapter-loader): fall back to any BaseAdapter subclass ADAPTER_MODULE resolution required the imported module to export a class literally named `Adapter`. The claude-code, langgraph, and openclaw adapter-template repos (3 of 4 currently in production) don't ship that alias — they export ClaudeCodeAdapter / LangGraphAdapter / OpenClawAdapter directly. Only hermes has the `Adapter = HermesAdapter` shim at the bottom of adapter.py. Consequence in prod: every fresh claude-code / langgraph / openclaw workspace crashed at runtime startup with "module 'adapter' has no attribute 'Adapter'", even with a2a-sdk correctly pinned <1.0. Provisioning looked successful from CP's side (EC2 ran) but the agent never registered because the process never reached A2A bootstrap. Fix: if `Adapter` is absent from the imported module, scan the module for any attribute that is a proper BaseAdapter subclass (excluding BaseAdapter itself — regression guard in tests). The explicit alias remains the preferred contract; this is purely additive tolerance. Bump to 0.1.4 and publish to PyPI via the existing v* tag trigger. 6 new tests cover: explicit alias, subclass-fallback, non-adapter-noise ignored, empty module → error, missing module → error, re-exported BaseAdapter → not selected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 16:59:12 -07:00
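The fallback lookup can be sketched with `inspect` over the module's attributes. `BaseAdapter` here is a stand-in class. Raising `ImportError` when nothing matches is an assumption: the commit only says an empty module produces "an error".

```python
import inspect
import types


class BaseAdapter:  # stand-in for the runtime's real base class
    pass


def find_adapter_class(module: types.ModuleType) -> type:
    # Preferred contract: an explicit `Adapter` alias.
    explicit = getattr(module, "Adapter", None)
    if explicit is not None:
        return explicit
    # Fallback: first attribute that is a proper BaseAdapter subclass.
    for obj in vars(module).values():
        if inspect.isclass(obj) and issubclass(obj, BaseAdapter) and obj is not BaseAdapter:
            return obj  # a re-exported BaseAdapter itself is never selected
    raise ImportError(f"module {module.__name__!r} exports no BaseAdapter subclass")
```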
molecule-ai[bot]	457adcbd64	Merge pull request #28 from Molecule-AI/fix/908-add-namespace-param-commit-memory feat(builtin_tools/memory): add namespace param to commit_memory and search_memory	2026-04-20 23:18:45 +00:00
Molecule AI Infra-SRE	ecc0a231bf	feat(builtin_tools/memory): add optional namespace param to commit_memory and search_memory Adds optional namespace parameter so agents can organize memories into named buckets (e.g. "facts", "procedures", "blockers"). Defaults to "general". - commit_memory(content, scope, *, namespace=None): namespace normalised to "general" when None or whitespace-only, forwarded to awareness client and included in httpx POST body. - search_memory(query, scope, *, namespace=None): namespace forwarded as ?namespace= query param (omitted when None), matching the existing behaviour for the scope param. - AwarenessClient.commit() and .search() updated to accept namespace kwarg. Fixes #908.	2026-04-20 23:12:32 +00:00
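The namespace rules for the two calls can be sketched as payload builders. These helpers are hypothetical. The field names `content`, `scope`, and `query` are assumptions, and only the namespace handling follows the commit.

```python
from typing import Optional


def normalise_namespace(namespace: Optional[str]) -> str:
    # None or whitespace-only falls back to the default bucket.
    if namespace is None or not namespace.strip():
        return "general"
    return namespace


def commit_body(content: str, scope: str, *, namespace: Optional[str] = None) -> dict:
    # The POST body always carries a namespace.
    return {"content": content, "scope": scope, "namespace": normalise_namespace(namespace)}


def search_params(query: str, scope: Optional[str] = None, *, namespace: Optional[str] = None) -> dict:
    # Query params omit namespace when None, mirroring how scope is handled.
    params = {"query": query}
    if scope is not None:
        params["scope"] = scope
    if namespace is not None:
        params["namespace"] = namespace
    return params
```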
molecule-ai[bot]	830381d40b	Merge pull request #27 from Molecule-AI/fix/cli-auth-helper-and-sandbox-warn fix(cli_executor + sandbox): CWE-78 auth helper + subprocess isolation warning	2026-04-20 23:07:07 +00:00
Molecule AI Infra-Runtime-BE	83f87702ea	fix(cli_executor + sandbox): CWE-78 auth helper + subprocess warning Issue #21 (CWE-78): _create_auth_helper() wrote a shell script using shlex.quote() which does NOT protect against $(...) command substitution inside the token value. Replaced with a mode-0600 token file passed via AGENT_AUTH_TOKEN_FILE env var — token is never interpreted by a shell. Issue #22 (CWE-266): sandbox subprocess backend warns once at module load time when active, alerting operators that SANDBOX_BACKEND=docker or e2b should be used for production isolation. Co-Authored-By: Infra-Runtime-BE <infra-runtime-be@molecule.ai>	2026-04-20 23:05:57 +00:00
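The token-file approach from the CWE-78 fix above can be sketched with `tempfile.mkstemp`, which creates a file readable and writable only by its owner. The helper name and file prefix are hypothetical. The `AGENT_AUTH_TOKEN_FILE` name comes from the commit.

```python
import os
import tempfile


def write_token_file(token: str, directory: str) -> str:
    """Store the token in an owner-only file and return its path.

    The caller exports the path (not the token) as AGENT_AUTH_TOKEN_FILE, so the
    token text is never parsed by a shell and $(...) inside it stays inert.
    """
    # mkstemp creates the file with owner-only read/write permissions (0600).
    fd, path = tempfile.mkstemp(prefix="agent-token-", dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    return path
```

Compared with quoting the token into a generated shell script, this design means the token never passes through a shell parser.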
molecule-ai[bot]	2bb0f97085	Merge pull request #26 from Molecule-AI/fix/plugin-setup-env-scrub fix(plugins_registry/builtins): strip API keys from plugin setup.sh env	2026-04-20 23:04:33 +00:00
molecule-ai[bot]	097908e707	Merge pull request #25 from Molecule-AI/fix/security-failopen-rbac-and-token-log-v2 fix(builtin_tools/audit): fail-secure RBAC + 3 additional security fixes	2026-04-20 23:04:31 +00:00
Molecule AI Infra-Runtime-BE	d6944086fe	fix(plugins_registry/builtins): strip API keys from plugin setup.sh env Issue #19 (CWE-C-312): AgentskillsAdaptor.install() passed the full os.environ to the subprocess running setup.sh, including ANTHROPIC_API_KEY, OPENAI_API_KEY, GITHUB_TOKEN, WORKSPACE_AUTH_TOKEN, etc. A malicious or compromised plugin's setup.sh could exfiltrate them. Fix: _scrubbed_env() builds a copy of os.environ with sensitive keys removed, matching the same _SCRUB_KEYS list used in skill_loader/loader.py so the scrubbing policy is consistent. CONFIGS_DIR is still passed via the extra dict. Non-secret vars (PATH, HOME, etc.) are preserved. Add 6 regression tests (30/30 passing). Co-Authored-By: Infra-Runtime-BE <infra-runtime-be@molecule.ai>	2026-04-20 22:52:13 +00:00
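The scrubbing step can be sketched as a dictionary filter. The key set below is only the subset named in the commit message, not the full `_SCRUB_KEYS` list from skill_loader/loader.py. The `base` parameter is a hypothetical hook for testing.

```python
import os
from typing import Mapping, Optional

# Illustrative subset; the real _SCRUB_KEYS list lives in skill_loader/loader.py.
SCRUB_KEYS = frozenset({"ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GITHUB_TOKEN", "WORKSPACE_AUTH_TOKEN"})


def scrubbed_env(extra: Optional[Mapping[str, str]] = None,
                 base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment without secrets, plus explicit extras such as CONFIGS_DIR."""
    source = os.environ if base is None else base
    env = {k: v for k, v in source.items() if k not in SCRUB_KEYS}
    if extra:
        env.update(extra)
    return env
```

The result is intended for `subprocess.run(..., env=scrubbed_env({"CONFIGS_DIR": configs_dir}))`, so `setup.sh` still sees PATH and HOME but none of the listed keys.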
Molecule AI Infra-Runtime-BE	c72fbfc9a4	fix(builtin_tools/audit): fail-secure RBAC — read-only default when config unavailable Fixes #11 (CWE-285): get_workspace_roles() returned ["operator"] (full delegate/approve/memory.write) when workspace config could not be loaded. Changed to ["read-only"] — deny-by-default per Principle of Least Privilege. Add regression tests in tests/test_audit.py. Also includes: - main.py: remove token prefix log (CWE-532) — issue #10/#17 - a2a_mcp_server.py: RBAC gate on sensitive MCP tools (CWE-862) — issue #12 - cli_executor.py: sanitize stderr in error logs (CWE-209) — issue #13 - tests/test_a2a_mcp_server.py: 5 new regression tests for MCP RBAC Co-Authored-By: Infra-Runtime-BE <infra-runtime-be@molecule.ai>	2026-04-20 22:47:38 +00:00
Hongming Wang	0d1c8e711f	Merge pull request #24 from Molecule-AI/fix/pin-a2a-sdk-pre-1-0 fix: pin a2a-sdk<1.0 — keep a2a.server.apps import working	2026-04-20 15:36:15 -07:00
Hongming Wang	90a1bdbbf4	fix: pin a2a-sdk<1.0 to keep a2a.server.apps import working a2a-sdk 1.0.0 restructured the package and removed a2a.server.apps, which main.py imports directly for A2AStarletteApplication. The current >=0.3.25 constraint resolves to 1.0.0 on fresh installs and the runtime crashes at startup with ModuleNotFoundError — which is exactly what bit production workspace EC2 instances provisioned on 2026-04-20. Bump to 0.1.3 and pin <1.0 until we're ready to migrate to the 1.x import paths. Companion fix in molecule-controlplane PR #174 pins at pip-install time; this PR fixes the upstream package so other callers don't re-hit the same trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 15:34:27 -07:00
molecule-ai[bot]	2391952eae	Merge pull request #7 from Molecule-AI/fix/auth-headers-and-pip-audit fix: add auth headers to skill promotion logs and improve pip-audit severity parsing	2026-04-20 08:50:26 -07:00
Molecule AI Backend Engineer 3	fa64a04cba	fix: add auth headers to skill promotion logs and improve pip-audit severity parsing - Extract _auth_headers_for_platform() helper so _maybe_log_skill_promotion() includes auth headers when calling /workspaces/:id/activity (was missing) - Improve pip-audit severity parsing: if fix_versions is present, severity is 'high' (patch available); otherwise 'medium' (no known fix yet)	2026-04-20 05:03:22 +00:00
rabbitblood	2da6f2d1cd	Merge branch 'main' of https://github.com/Molecule-AI/molecule-ai-workspace-runtime into fix/507-mcp-server-path-absolute-imports	2026-04-17 21:36:51 -07:00
rabbitblood	d1719dd2a6	fix: strip CRLF from .sh/.py files in plugin hook installer — permanent #507 fix The TRUE root cause of recurring CRLF: shutil.copy2() in _copy_dir_files() copies hook files byte-for-byte from /plugins/ (mounted from Windows host) into /configs/.claude/hooks/. Windows git checkout introduces \r\n regardless of .gitattributes. Previous fixes were band-aids: - .gitattributes eol=lf (only works for files IN git, not host disk) - entrypoint.sh sed strip (runs at boot but after plugin install) - provisioner CopyTemplateToContainer strip (wrong code path — hooks come through the Python plugin installer, not the Go template copier) This fix strips CRLF at the single point where ALL plugin hooks enter a container: _copy_dir_files() in builtins.py. read_bytes() + replace + write_bytes for .sh/.py files. Other file types pass through unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 21:36:43 -07:00
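The per-file rule in `_copy_dir_files()` can be sketched as follows. `copy_hook_file` is a hypothetical name. The `shutil.copymode` call is an assumption added so rewritten hooks keep their executable bit, which `copy2` would otherwise have preserved.

```python
import shutil
from pathlib import Path

# Only these suffixes get line-ending normalisation; everything else is copied verbatim.
CRLF_SUFFIXES = frozenset({".sh", ".py"})


def copy_hook_file(src: Path, dst: Path) -> None:
    if src.suffix in CRLF_SUFFIXES:
        dst.write_bytes(src.read_bytes().replace(b"\r\n", b"\n"))
        # Assumption: keep the source's permission bits (e.g. +x on hooks), as copy2 would.
        shutil.copymode(src, dst)
    else:
        shutil.copy2(src, dst)
```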
Hongming Wang	ceeec69c8c	Merge pull request #6 from Molecule-AI/fix/507-mcp-server-path-absolute-imports fix: resolve MCP server path from package + absolute imports (2nd half of #507)	2026-04-16 13:47:54 -07:00
rabbitblood	18d904cfc1	fix: MCP server path resolution + absolute imports (2nd half of #507) The a2a MCP subprocess was launched with a hard-coded /app/a2a_mcp_server.py path that only existed in the legacy workspace-template layout. Current templates copy adapter.py into /app but not the MCP server script, so claude-code's mcp_servers={"a2a": ...} config spawned a non-existent file, the server never registered any tools, and every agent reported that search_memory / commit_memory / list_peers / delegate_task / send_message_to_user were unavailable in the tool registry. Surfaced this cycle after the CRLF hook fix (PR molecule-core#508 + plugin repo's .gitattributes) unblocked the primary (no response generated) symptom. Before that, agents crashed before the missing-MCP issue was observable — the two bugs stacked. Changes ------- * executor_helpers._default_mcp_server_path: resolves the installed molecule_runtime.a2a_mcp_server module's __file__ so the path is always correct regardless of template layout. Legacy /app path kept as last-resort fallback for any old images still in rotation. * a2a_mcp_server.py, a2a_tools.py, a2a_client.py: convert bare module imports (from a2a_tools import ...) to absolute (from molecule_runtime.a2a_tools import ...). Previously this worked only when main.py injected the package dir onto sys.path; the MCP subprocess doesn't go through main.py, so the bare imports would fail. Added a sys.path shim at the top of a2a_mcp_server.py so running as a standalone script (python path/to/a2a_mcp_server.py) still works — the subprocess can now locate the package root automatically. * consolidation.py, heartbeat.py, main.py: same bare-to-absolute conversion for platform_auth imports (unblocks the same class of failure if any of these modules are imported from a non-main.py entrypoint in the future). Verification ------------ Deployed the updated files into ws-8010dbd0 (PM) and ran an isolated sdk.query() as agent user.
SystemMessage.init.mcp_servers now reports [{'name': 'a2a', 'status': 'connected'}] and the tools list includes all 8 mcp__a2a__* entries: mcp__a2a__check_task_status, mcp__a2a__commit_memory, mcp__a2a__delegate_task, mcp__a2a__delegate_task_async, mcp__a2a__get_workspace_info, mcp__a2a__list_peers, mcp__a2a__recall_memory, mcp__a2a__send_message_to_user Rolled the in-container hotfix across all 22 workspaces pending release (docker cp the 4 changed files into each site-packages/molecule_runtime/). Fixes Molecule-AI/molecule-core#507 (secondary) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:28:57 -07:00
Hongming Wang	d140999a09	Merge pull request #5 from Molecule-AI/fix/488-session-file-existence-gate fix: gate session resume on file existence (closes #488 in molecule-core)	2026-04-16 11:16:26 -07:00
rabbitblood	6cd4d74c5a	test: move sdk stubs to conftest.py (consistent across all test modules)	2026-04-16 11:15:45 -07:00
rabbitblood	a35d128870	test: stub claude_agent_sdk + a2a in session-resume tests CI failed on collect because claude_agent_sdk + a2a aren't test-env deps (they're installed inside the claude-code workspace image). The test file now stubs both via sys.modules so the collector can import claude_sdk_executor without pulling the real SDKs. Tests don't exercise the SDK anyway — only _resolve_resume() glob logic.	2026-04-16 11:13:33 -07:00
rabbitblood	3b56410ad5	fix: gate session resume on file existence (closes #488) ## Symptom (cycle 6+ of #488) Workspaces appear `online` (heartbeats fine) but every cron tick fails silently with `No conversation found with session ID: <uuid>` → `ProcessError: exit code 1` → idle loop logs HTTP 200, no actual work happens. Backend Engineer received 5 idle pulses without claiming a single one of the 6 open Hermes issues (#496-500) because the bug prevents `gh issue list` from ever firing. ## Root cause (verified live in ws-20cb8ff8-3e4 today) claude-code stores sessions at `/root/.claude/projects/<cwd-with-/-as-->/<id>.jsonl`. When a workspace container is recreated, `self._session_id` from a prior instance references a file that no longer exists. Passing it as `resume=<id>` to ClaudeAgentOptions crashes the CLI on the very first call. The existing #75 fix only fires AFTER the first ProcessError lands, and per-cycle executor re-instantiation can reload the stale id from elsewhere — restart-with-reset_claude_session was the only working mitigation, hand-fired every cycle. ## Fix New `_resolve_resume()` in ClaudeSDKExecutor: probes a handful of well-known session-file locations (`/root/.claude/projects/*/<id>.jsonl`, `/root/.claude/sessions/<id>.jsonl`, plus the agent-uid variants) via `glob.glob`. If no file matches the in-memory `_session_id`, drops the id (sets to None) AND returns None so `ClaudeAgentOptions.resume` is unset — CLI starts a fresh session. Logged at INFO with `#488` in the message so operators correlate. `_build_options()` now calls `_resolve_resume()` instead of reading `self._session_id` directly. Cheap path when no session set: zero glob calls. Hot path (session set + file exists): one glob call, short-circuits on first match.
## Drive-by fix: stale `from X import` in 4 modules Same regression class as #1 (the runtime release that closed it): - `claude_sdk_executor.py:43`: `from executor_helpers import …` - `cli_executor.py:39-40`: `from config import …`, `from executor_helpers import …` - `main.py:28-30`: `from config import …`, `from heartbeat import …`, `from preflight import …` - `preflight.py:7`: `from config import …` All rewritten to absolute `from molecule_runtime.<module> import …` so they resolve outside of workspace containers (e.g. test environments where `/app` isn't on sys.path). The grep guard in `tests/test_imports.py` already covered `adapters` — extending to all top-level imports would catch this class going forward; not in this PR to keep scope tight. ## Tests 6 new in `tests/test_session_resume_gate.py`: - baseline (no session) → no glob, returns None - file exists → keep id, returns id, single glob (early-exit) - file missing → drop id (clears `_session_id`), returns None - late-pattern match → walks all patterns until hit - log includes session id (operator triage) - log references #488 (debugger discoverability) All 16 tests (10 existing + 6 new) pass. ## Release plan - Bump version 0.1.1 → 0.1.2 (in this commit) - After merge, push v0.1.2 tag → publish.yml auto-publishes to PyPI - Then rebuild workspace template images locally so workspaces pick up the fix (templates pin `>=0.1.0`, will resolve to 0.1.2 on next build) - Then mass-restart workspaces with reset_claude_session=true once to clear any DB-side stale state, and the permanent fix kicks in Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 11:12:03 -07:00
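The existence gate in `_resolve_resume()` can be sketched as a free function. This sketch makes several assumptions. The patterns are the two listed in the commit, and the agent-uid variants are omitted. The `roots` parameter is hypothetical and exists so tests can point at a temp directory. The caller would assign the return value back to `_session_id` to "drop" a stale id.

```python
import glob
import logging
from typing import Iterable, Optional

logger = logging.getLogger(__name__)

# Patterns from the commit message; "{root}" replaces the hard-coded /root for testability.
SESSION_PATTERNS = (
    "{root}/.claude/projects/*/{sid}.jsonl",
    "{root}/.claude/sessions/{sid}.jsonl",
)


def resolve_resume(session_id: Optional[str], roots: Iterable[str] = ("/root",)) -> Optional[str]:
    if not session_id:
        return None  # cheap path: zero glob calls
    for root in roots:
        for pattern in SESSION_PATTERNS:
            # Escape both parts so glob metacharacters in names are matched literally.
            if glob.glob(pattern.format(root=glob.escape(root), sid=glob.escape(session_id))):
                return session_id  # first hit short-circuits
    logger.info("#488: no session file for %s; starting a fresh session", session_id)
    return None
```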


57 Commits