molecule-ai-workspace-runtime

molecule-ai/molecule-ai-workspace-runtime

Author	SHA1	Message	Date
devops-engineer	3a9c76eeef	Merge pull request 'fix(post-suspension): migrate github.com/Molecule-AI refs to git.moleculesai.app (Class G #168)' (#2) from fix/post-suspension-github-urls into main	2026-05-07 20:02:39 +00:00
devops-engineer	91eac4b611	fix(post-suspension): migrate github.com/Molecule-AI refs to git.moleculesai.app (Class G #168) The GitHub org Molecule-AI was suspended on 2026-05-06; canonical SCM is now Gitea at https://git.moleculesai.app/molecule-ai/. Stale github.com/Molecule-AI/... URLs return 404 and break tooling that clones / pip-installs / curls them. This bundles all non-Go-module URL fixes for this repo into a single PR. Go module path references (in *.go, go.mod, go.sum) are out of scope here -- tracked separately under Task #140. Token-auth clone URLs also flip ${GITHUB_TOKEN} -> ${GITEA_TOKEN} since the GitHub token does not auth against Gitea. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 13:02:35 -07:00
security-auditor	a96f696ffb	fix(ci): inline secret-scan body, drop cross-repo uses: of private molecule-core The 3-line wrapper at .github/workflows/secret-scan.yml referenced `uses: molecule-ai/molecule-core/.github/workflows/secret-scan.yml@staging`. molecule-core is private; act_runner clones cross-repo reusable workflows anonymously, so the resolve fails at 0s with no logs. Same root cause + same fix that molecule-controlplane already shipped (see its secret-scan.yml comment block lines 10-22). Inlining keeps the gate functional until Gitea is upgraded or the canonical scanner moves to a public repo. When either lands, this file reverts to the 3-line wrapper. Refs: internal#46 Phase 3 Class 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 02:29:03 -07:00
claude-ceo-assistant	c266664f12	Merge pull request 'fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs' (#1) from fix/lowercase-org-slug into main	2026-05-07 08:59:04 +00:00
security-auditor	d7ea277ce4	fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs Gitea is case-sensitive on owner slugs; canonical is lowercase `molecule-ai/...`. Mixed-case `Molecule-AI/...` refs fail-at-0s when the runner tries to resolve the cross-repo workflow / checkout. Same fix as molecule-controlplane#12. Mechanical case-correction; no behavior change beyond making CI resolve again. Refs: internal#46 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 01:00:01 -07:00
Hongming Wang	05486e193c	Merge pull request #56 from Molecule-AI/chore/lockdown-as-mirror chore: lock down as publish artifact; source-of-truth is monorepo	2026-04-29 01:58:40 -07:00
Hongming Wang	0fb1038724	Merge pull request #58 from Molecule-AI/chore/precommit-add-minimax-pattern chore(precommit): add sk-cp- MiniMax pattern (F1088 retroactive fix); bump 0.1.16 → 0.1.17	2026-04-29 00:54:13 -07:00
Hongming Wang	d517deea72	Merge pull request #57 from Molecule-AI/chore/enroll-secret-scan chore(ci): enroll in org-wide secret-scan reusable workflow (#2109 rollout)	2026-04-29 00:54:09 -07:00
Hongming Wang	b8903eac09	Merge pull request #59 from Molecule-AI/fix/secret-scan-add-sk-cp-pattern fix(pre-commit): align SECRET_PATTERNS with molecule-core canonical (add sk-cp-)	2026-04-28 15:25:44 -07:00
Hongming Wang	2d514612a2	fix(pre-commit): align SECRET_PATTERNS with molecule-core canonical (add sk-cp-) The bundled pre-commit hook is the runtime-side mirror of molecule-core's canonical .github/workflows/secret-scan.yml SECRET_PATTERNS array. They drifted: canonical added the MiniMax sk-cp- pattern (F1088 vector — caught only after the fact) but this side wasn't updated. Result: a workspace developer's local pre-commit would let through a sk-cp- token that the org-wide CI scan would then refuse — useless friction. This brings the two sides back into byte-aligned-on-the-pattern-list state. The drift is exactly the maintenance gap that task #139's upcoming molecule-core CI lint is designed to surface automatically; this PR clears the gap so the lint passes from day 1. Refs: task #139.	2026-04-28 15:21:13 -07:00
rabbitblood	e927d3b281	chore(precommit): add sk-cp- MiniMax pattern (F1088 retroactive fix); bump 0.1.16 → 0.1.17	2026-04-26 21:43:24 -07:00
rabbitblood	d381f20779	fix(ci): use molecule-core@staging — repo was renamed from molecule-monorepo, workflow lives on staging	2026-04-26 15:44:29 -07:00
rabbitblood	0b11d669b5	chore(ci): enroll in org-wide secret-scan reusable workflow Calls the canonical workflow shipped in Molecule-AI/molecule-monorepo#2109. Defense against the #2090-class leak: a hosted-agent commit slipping a credential-shaped string into a PR — caught at the PR layer, before merge. Higher stakes here than most repos: this package publishes to PyPI, so a leaked credential on a release tag would propagate to every downstream tenant on next pip install. Pattern set lives in molecule-monorepo so we don't maintain a parallel copy here. Pairs with the runtime-side pre-commit hook (scripts/pre-commit-checks.sh) which catches local commits before they reach a PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 15:14:17 -07:00
Hongming Wang	01b818d1c8	Merge pull request #55 from Molecule-AI/feat/precommit-secret-scan feat(precommit): add secret scan to bundled pre-commit hook (defense-in-depth for #2090-style leaks)	2026-04-26 12:40:50 -07:00
Hongming Wang	96864263bb	chore: lock down as publish artifact; source-of-truth is monorepo This repo is now a publish artifact of Molecule-AI/molecule-core/workspace/. Runtime code edits go to the monorepo; the publish-runtime workflow regenerates this mirror + uploads to PyPI on every runtime-v* tag. Changes: - Delete .github/workflows/publish.yml. PyPI publishing now happens only from the monorepo's publish-runtime workflow. Without removing this, two different code shapes could reach PyPI depending on which workflow fired (the drift this lockdown is preventing). - Delete .github/workflows/auto-promote-staging.yml. The staging→main fast-forward dance has no purpose on a mirror repo — the mirror is rebuilt wholesale on each release. - Replace .github/workflows/ci.yml with a 'mirror-guard' job that fails on any pull_request event with a clear redirect message. Push events are still allowed (so existing in-flight branches don't all turn red while the migration finishes); that allowance becomes a follow-up removal once the auto-sync from monorepo is wired up. - Rewrite README.md with a prominent ⚠ banner pointing at the monorepo. - Add CONTRIBUTING.md with the explicit redirect table. What this does NOT do: - Wire up the auto-sync from monorepo → this repo. The publish-runtime workflow currently uploads to PyPI but doesn't push the rewritten tree back here. As a follow-up, extend that workflow with a step that commits the build dir to this repo's main. Until then this repo's contents will go stale relative to PyPI — but that's fine because no one should be reading code from here anyway. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-04-26 12:03:12 -07:00
rabbitblood	f1bede31a8	feat(precommit): add secret scan to bundled pre-commit hook (defense-in-depth for #2090-style leaks) Adds a secret-scan gate alongside the existing internal-paths block in the runtime's bundled pre-commit hook. Runs on every commit in every repo (not scoped to Molecule-AI public repos like the internal-paths block) — refuses any staged addition matching a high-value credential shape and prints a recovery message that does NOT echo the secret value. Pattern set covers GitHub family (ghp_, ghs_, gho_, ghu_, ghr_, github_pat_), Anthropic / OpenAI / Slack / AWS — same shape as the tenant-proxy CI scanner; keep aligned when either side adds a pattern. Single hook file dispatches both checks (renamed pre-commit-block-internal-paths.sh → pre-commit-checks.sh) so each agent commit pays one git-config + one hook-install surface, not two. Both checks share the existing fast-paths (skip if GIT_AUTHOR_NAME unset; skip during rebase / cherry-pick / merge / revert). End-to-end test exercises a real bash subprocess against a real temp git repo with real staged content. Three cases: - ghs_-prefixed token in package.json (the actual #2090 vector) → refuse - clean README → pass through - sk-ant- key in a non-Molecule-AI repo → refuse (secret scan is universal, internal-paths block is not) Skipped when bash is not on PATH so Windows test environments without WSL stay green. Bumps version 0.1.15 → 0.1.16. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 11:57:39 -07:00
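The secret-scan half of the hook can be sketched in Python as follows. This is a minimal sketch, not the bundled bash hook: the function name `scan_added_lines` and the exact regexes are hypothetical (only the prefixes come from these commit messages), and it assumes the caller passes the output of `git diff --cached`. It checks only added lines and returns the file and pattern name, never the matched value, so a refusal message built from it cannot echo the secret.

```python
import re

# Hypothetical pattern set. The prefixes come from the commit messages;
# the lengths and character classes are illustrative assumptions.
SECRET_PATTERNS = {
    "github": re.compile(r"\b(?:gh[psour]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})"),
    "anthropic": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
    "minimax": re.compile(r"\bsk-cp-[A-Za-z0-9_-]{20,}"),
    "openai": re.compile(r"\bsk-(?:proj-)?[A-Za-z0-9]{20,}"),
    "slack": re.compile(r"\bxox[baprs]-[A-Za-z0-9-]{10,}"),
    "aws": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_added_lines(diff_text: str) -> list[tuple[str, str]]:
    """Return (file, pattern_name) for every credential-shaped addition.

    Only "+" lines of a unified diff are checked, and the matched value is
    never returned, so a refusal message built from this cannot echo it.
    """
    hits = []
    current_file = "?"
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path" names the file that the following hunks modify.
            current_file = line[4:].removeprefix("b/")
            continue
        if not line.startswith("+"):
            continue  # context lines, removed lines, and headers are ignored
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line[1:]):
                hits.append((current_file, name))
    return hits
```

A hook built on this would print the file and pattern name with a recovery hint and exit non-zero whenever the list is non-empty.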
Hongming Wang	7fc3537b2f	Merge pull request #42 from Molecule-AI/fix/stderr-capture-a2a-response fix(runtime): capture stderr in A2A error response (closes #66)	2026-04-24 13:25:15 -07:00
Hongming Wang	1759e221e9	Merge pull request #53 from Molecule-AI/chore/bump-0.1.15 chore: bump to 0.1.15 — ship A2A_ERROR observability fix (#51)	2026-04-24 12:03:36 -07:00
rabbitblood	84f3faea8a	chore: bump to 0.1.15 — ship A2A_ERROR observability fix (#51) PR #52 fixed the empty '[A2A_ERROR] ' suffix but didn't bump the version — the fix landed on main without a corresponding PyPI release, so workspace-template rebuilds keep pulling 0.1.14 and the fix never reaches running agents. Bump to 0.1.15 to trigger the publish-on-tag workflow (maintainer pushes v0.1.15 tag after staging→main promotion). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:54:13 -07:00
Hongming Wang	0d71ee8345	Merge pull request #52 from Molecule-AI/fix/a2a-error-observability-51 fix(a2a): include exception class + error code in [A2A_ERROR] (#51)	2026-04-24 11:35:42 -07:00
rabbitblood	4940abdc68	fix(tests): remove pytest-asyncio dependency from #51 regression tests CI does not install pytest-asyncio — follow test_shared_runtime.py's _run(coro) helper pattern. Tests still cover the same two paths (bare exception class-name fallback + message passthrough) but no longer require the async pytest plugin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:34:30 -07:00
rabbitblood	6ead3b433e	fix(a2a): include exception class + error code in [A2A_ERROR] (#51) When an exception's str() is empty (bare TimeoutError(), BrokenPipeError(), some httpx transport errors) `f"{_A2A_ERROR_PREFIX}{e}"` produced `"[A2A_ERROR] "` with a trailing space and zero diagnostic context, masking the real cause of peer-delegation failures in activity_logs. Observed on main monorepo: 22+ occurrences in 75 min across 7 leads during the MiniMax M2.7 trial rate-limit episode — zero breadcrumbs to route the debug from. Fix: - Exception branch: fall back to `type(e).__name__` when str(e) is empty - Error branch: include JSON-RPC `error.code` alongside message when present Tests: test_a2a_error_observability.py covers both the bare-exception path (must surface class name) and the message-passthrough path (must preserve existing useful messages). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:22:57 -07:00
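A minimal sketch of the two fallbacks described in the #51 fix, assuming the prefix is the literal `"[A2A_ERROR] "` string quoted in that message. The helper names and the `(code N)` suffix format are hypothetical and are not the runtime's actual output.

```python
# The prefix string matches the "[A2A_ERROR] " output quoted in the commit.
_A2A_ERROR_PREFIX = "[A2A_ERROR] "


def format_exception_error(e: BaseException) -> str:
    # str(e) is "" for bare TimeoutError(), BrokenPipeError(), and similar,
    # so fall back to the exception class name instead of an empty suffix.
    detail = str(e) or type(e).__name__
    return f"{_A2A_ERROR_PREFIX}{detail}"


def format_jsonrpc_error(error: dict) -> str:
    # Include the JSON-RPC error code alongside the message when present.
    message = error.get("message") or "unknown error"
    code = error.get("code")
    if code is not None:
        return f"{_A2A_ERROR_PREFIX}{message} (code {code})"
    return f"{_A2A_ERROR_PREFIX}{message}"
```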
Hongming Wang	d75a161ee8	fix(ci): sync auto-promote workflow (ff-only, no-gates mode)	2026-04-24 08:35:15 -07:00
Hongming Wang	ae624a1f6a	Merge pull request #50 from Molecule-AI/chore/add-auto-promote-staging chore(ci): add auto-promote-staging workflow	2026-04-24 08:18:43 -07:00
Hongming Wang	f58d12bee2	chore(ci): add auto-promote-staging workflow	2026-04-24 07:43:56 -07:00
Hongming Wang	a80294766c	Merge pull request #49 from Molecule-AI/fix/precommit-skip-rebase fix(precommit): skip during rebase/cherry-pick/merge/revert — unblocks DIRTY PR rebase	2026-04-24 04:35:19 -07:00
rabbitblood	c43df7f947	fix(precommit): skip during rebase/cherry-pick/merge/revert — unblocks DIRTY PR rebase Trace from molecule-core cycle 107 (2026-04-24): 15 staging PRs stuck DIRTY (real merge conflicts) with 0 merges in 1+ hours. Authors couldn't rebase to fix the conflicts because the pre-commit hook (shipped in 0.1.11) refuses ANY commit that includes forbidden paths in the diff — including rebase replays of historical commits that pre-date the gate. Specifically, agents trying to `git rebase staging` on a PR like "docs(marketing): Phase 30 social copy" fail at the first commit replay because that commit added marketing/* files. The fix would require interactive rebase + manual file deletion + commit amend — agents don't do that, so the PR stays DIRTY indefinitely. Detection: check .git for rebase-merge/, rebase-apply/, CHERRY_PICK_HEAD, MERGE_HEAD, or REVERT_HEAD. These state markers exist only during the corresponding git operation. Skip the hook silently when present. The hook still blocks fresh `git commit` (the failure mode it was designed for). It just doesn't try to police what was already in git history. Bumped to 0.1.14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 04:34:55 -07:00
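The state-marker check can be sketched in Python as below. The real hook does this in bash. The function name is hypothetical, and the caller is assumed to pass the repository's git directory, for example the output of `git rev-parse --git-dir`. Using that output also covers worktrees, where `.git` is a file rather than a directory.

```python
from pathlib import Path

# State markers listed in the commit message; each exists only while the
# corresponding git operation is in progress.
_IN_PROGRESS_MARKERS = (
    "rebase-merge",      # merge-backend / interactive rebase (directory)
    "rebase-apply",      # apply-backend rebase or `git am` (directory)
    "CHERRY_PICK_HEAD",
    "MERGE_HEAD",
    "REVERT_HEAD",
)


def git_operation_in_progress(git_dir: Path) -> bool:
    """True if a rebase, cherry-pick, merge, or revert is underway."""
    return any((git_dir / marker).exists() for marker in _IN_PROGRESS_MARKERS)
```

A hook would exit 0 early when this returns True. Fresh `git commit` runs stay fully checked.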
Hongming Wang	faa5b42aa4	Merge pull request #47 from Molecule-AI/fix/enable-v0-3-compat fix: enable v0_3 compat in JSON-RPC dispatcher	2026-04-24 02:37:20 -07:00
rabbitblood	19f0033222	fix: enable v0_3 compat in JSON-RPC dispatcher — platform sends old method names Sister fix to 0.1.12 (root mounting). After fixing the route mount, every inbound A2A still returned `-32601 Method not found` because the 1.x dispatcher's method table doesn't recognize v0.3-shaped names (`message/send`, `tasks/get`) that the platform's ProxyA2A still sends. Reproduces in the SDK on a minimal handler: create_jsonrpc_routes(h, "/") → "Method not found" create_jsonrpc_routes(h, "/", enable_v0_3_compat=True) → dispatches OK Bumped to 0.1.13. Both 0.1.12 and 0.1.13 are needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 02:37:07 -07:00
Hongming Wang	d22c19ad31	Merge pull request #46 from Molecule-AI/fix/jsonrpc-mount-at-root fix: mount JSON-RPC at root — fixes silent fleet productivity loss	2026-04-24 02:06:58 -07:00
rabbitblood	30ebe9baf3	fix: mount JSON-RPC at root — platform POSTs to /, not /api/v1/jsonrpc/ Baseline restart 2026-04-24: every workspace came up healthy (uvicorn listening, agent-card serving) but produced zero delegations for two maintenance cycles. Tracing revealed platform's ProxyA2A POSTs to `http://ws-<id>:8000/` (no path suffix, see workspace-server/internal/provisioner.InternalURL) while the runtime's JSON-RPC routes were mounted at `/api/v1/jsonrpc/` under the a2a-sdk 1.x API migration. Result was silent — every inbound A2A returned 404 Not Found, the platform logged "Not Found" at INFO level, but no error bubbled up because the SDK's jsonrpc route factory doesn't respond to root when mounted at a subpath. Agents stayed warm, crons fired, but no work flowed. Fix: `create_jsonrpc_routes(handler, "/")` — matches platform expectation and the agent-card self-advertisement (which also shows root as the JSON-RPC URL). Agent-card route keeps its hard-coded `/.well-known/agent-card.json` path so there's no collision. Bumped to 0.1.12. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 02:06:04 -07:00
Hongming Wang	64720c0fc6	Merge pull request #45 from Molecule-AI/feat/precommit-hook-block-internal-paths feat: pre-commit hook to block internal paths in public monorepo (A)	2026-04-24 00:49:42 -07:00
rabbitblood	89739bf848	feat: pre-commit hook to block internal paths in public monorepo (A) Anti-leak proposal item A. Companion to D (decision tree in role prompts, separate PR on org-templates). Why a local pre-commit hook =========================== Agents try to `git add /research/foo.md` despite SHARED_RULES, the .gitignore patterns, and the CI gate. Each leak attempt costs ~5 cycles (PR opens, CI fails, agent retries with workaround) and pollutes git history with reverts. A pre-commit hook converts the failure from "PR opens then fails" → "commit refused immediately, with the recovery command printed in the same error message the agent reads." Agents act on what's in the current response context — putting the redirect command literally in the failure output is the highest-density feedback we can provide. What changes ============ - molecule_runtime/scripts/pre-commit-block-internal-paths.sh — bash hook. Checks `git remote get-url origin`, only enforces in Molecule-AI/molecule-monorepo + molecule-core. In every other repo (internal, plugins, templates, third-party) it's a no-op. When forbidden paths are staged, refuses the commit with the redirect recipe + the alternative public-facing paths + the workflow-edit path for legitimate exceptions. - molecule_runtime/precommit_hook.py — install_pre_commit_hook(): 1. Extracts bundled hook to ~/.molecule-runtime/git-hooks/pre-commit 2. chmod +x 3. Sets core.hooksPath globally — UNLESS already set by an operator (then logs a warning + skips, doesn't clobber) - molecule_runtime/main.py — calls install_pre_commit_hook() at step 0.2, right after install_credential_helper() - pyproject.toml bumped to 0.1.11 Both A and D together close the loop: D ensures the agent knows the right path before writing; A enforces it at the local git boundary if the agent forgets. CI gate remains the third backstop for anything that gets pushed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 00:48:47 -07:00
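The hook makes two decisions, sketched here in Python. The sketch assumes the remote URL comes from `git remote get-url origin` and the staged paths from `git diff --cached --name-only`. The forbidden-prefix tuple is hypothetical, because only `research/` and `marketing/` appear in these commit messages. The function names are illustrative and are not the hook's own.

```python
import re

# Repositories where the internal-paths block applies, per the commit
# message; any other origin makes the hook a no-op.
_ENFORCED_REPOS = {"molecule-monorepo", "molecule-core"}
# Hypothetical prefixes: the message cites research/ and marketing/ only.
_FORBIDDEN_PREFIXES = ("research/", "marketing/")
# Matches the trailing owner/repo of HTTPS or SSH remotes, minus ".git".
_REMOTE_RE = re.compile(r"[:/]([^/:]+)/([^/]+?)(?:\.git)?/?$")


def should_enforce(origin_url: str) -> bool:
    match = _REMOTE_RE.search(origin_url.strip())
    if match is None:
        return False
    owner, repo = match.groups()
    return owner.lower() == "molecule-ai" and repo in _ENFORCED_REPOS


def forbidden_staged(paths: list[str]) -> list[str]:
    """Staged paths the hook would refuse (leading slashes tolerated)."""
    return [p for p in paths if p.lstrip("/").startswith(_FORBIDDEN_PREFIXES)]
```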
Hongming Wang	f334872d56	Merge pull request #44 from Molecule-AI/feat/inline-credential-helper feat: ship GitHub credential-helper inline in runtime (fixes #1933 class)	2026-04-24 00:42:32 -07:00
rabbitblood	f1329fe230	feat: ship GitHub credential-helper inline in runtime (fixes #1933 class) Lifts the per-template wiring (Dockerfile COPY + entrypoint.sh git config + nohup daemon launch) into the Python runtime. Templates that depend on molecule-ai-workspace-runtime get the behavior automatically — they no longer need to maintain their own copy of the helper scripts or remember to write the right git config in their entrypoint. Background: - GitHub App installation tokens (ghs_…) expire ~60min after issue - claude-code-default template shipped without wiring → 39 workspaces lost their tokens, three PMs' A2A queues filled with retry-status messages, manual fleet restart required (cycle 62-66 incident) This commit: - Adds molecule_runtime/scripts/{molecule-git-token-helper.sh, molecule-gh-token-refresh.sh} as package data (copies from canonical workspace/scripts/ in molecule-monorepo) - Adds molecule_runtime/credential_helper.py with install_credential_helper() that: 1. Extracts bundled scripts to ~/.molecule-runtime/scripts/ 2. Configures git credential.helper for github.com 3. Creates ~/.molecule-token-cache/ mode 0700 4. Spawns refresh daemon under respawn loop (PID file dedup) 5. Runs initial gh auth login --with-token - Hooks call site early in main.py (step 0.1, before config load) - Fails-soft: each step independently fault-tolerant; missing git/gh binary doesn't block runtime startup Bumped to 0.1.10. Templates can drop their entrypoint.sh credential helper setup once they update the runtime pin (separate PRs per template). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 00:41:32 -07:00
Molecule AI SDK-Dev	19fde6f466	fix(runtime): capture stderr in A2A error response (closes #66) - Lower _PROCESS_ERROR_STDERR_MAX_CHARS to 1024 (was 4096) so A2A responses stay bounded — the full context is already in workspace logs via logger.error/exception. - Add stderr= kwarg to sanitize_agent_error() so callers can surface subprocess stderr verbatim in A2A responses. - In _execute_locked() non-retryable error path, extract the first 1 KB of exc.stderr and pass it to sanitize_agent_error() so the A2A response carries actionable context (rate limit message, auth error, etc.) instead of just a class name. - Add test_executor_helpers.py unit tests for the new stderr= kwarg.	2026-04-24 05:00:51 +00:00
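The bounded-stderr idea can be sketched with a hypothetical `build_error_message` helper. It is not the runtime's real `sanitize_agent_error()`, whose signature these messages do not show. The sketch assumes `stderr` may be bytes or text, as on `subprocess.CalledProcessError.stderr`, and keeps at most the first 1024 characters.

```python
from __future__ import annotations

_STDERR_MAX_CHARS = 1024  # cap from the commit message (was 4096)


def build_error_message(exc: BaseException, stderr: str | bytes | None = None) -> str:
    """Hypothetical helper: class-name fallback plus a bounded stderr excerpt."""
    message = str(exc) or type(exc).__name__
    if stderr:
        if isinstance(stderr, bytes):
            stderr = stderr.decode("utf-8", errors="replace")
        # Keep only the first 1 KB; the full stderr belongs in workspace logs.
        snippet = stderr[:_STDERR_MAX_CHARS].strip()
        if snippet:
            message = f"{message}: {snippet}"
    return message
```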
molecule-ai[bot]	d5cf872311	feat: migrate a2a-sdk 1.x (KI-009) (#39) - Replace a2a.utils.new_agent_text_message → a2a.helpers.new_text_message - Replace Part(root=TextPart(...)) → Part(text=...) (flat Part API) - Replace A2AStarletteApplication → Starlette route factories (create_agent_card_routes, create_jsonrpc_routes) - Update conftest stubs: remove a2a.server.apps/a2a.utils, add a2a.server.routes/a2a.helpers/AgentInterface - Add AgentInterface to AgentCard supported_interfaces - Rename snake_case AgentCard fields per 1.x schema Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 01:54:33 +00:00
Hongming Wang	1b04da2061	Merge pull request #38 from Molecule-AI/fix/auto-detect-llm-token-type feat(runtime): auto-detect LLM token type, normalise env on boot	2026-04-23 13:53:06 -07:00
Hongming Wang	e562b7a03e	Merge branch 'staging' into fix/auto-detect-llm-token-type	2026-04-23 13:52:25 -07:00
Hongming Wang	3556244725	Merge pull request #40 from Molecule-AI/fix/heartbeat-401-token-refresh-1877 fix(heartbeat): refresh on-disk auth token on 401 + retry once (#1877)	2026-04-23 13:51:42 -07:00
rabbitblood	a78b9f229e	test(1877): convert async tests to sync httpx.Client to unblock CI CI doesn't have pytest-asyncio installed, and the async wrapping was incidental — the production retry pattern (refresh-on-401) is identical in sync and async forms. Switching to httpx.Client + MockTransport keeps the same coverage without the async dep. 6/6 still pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 13:35:45 -07:00
rabbitblood	050c2412b3	fix(heartbeat): refresh on-disk auth token on 401 + retry once (#1877) ## Problem Auto-restart rotates the workspace's auth token in two non-atomic steps: 1. Platform issues new token via wsauth.IssueToken 2. Provisioner writes the new token to /configs/.auth_token AFTER ContainerStart returns Between steps 1 and 2, the new container has booted and the runtime has already loaded the OLD cached value of .auth_token (or no value if the file was empty during boot). The runtime's first /registry/heartbeat call sends the stale token, gets 401, but the loop never re-reads the on-disk token — so subsequent heartbeats also send the stale value. Each 401 means the platform never sees the workspace as alive → status stays 'provisioning' → scheduler won't dispatch → workspace looks dead from every angle even though the container is actually running. The existing code comment in workspace_provision.go acknowledges this: "the workspace will get 401 on its first heartbeat and can recover on the next restart." That recovery only worked because workspaces used to crash for unrelated reasons and get restarted. After PR #1861 (provisioner empty-volume auto-recover) removed those crashes, workspaces get stuck in the 401 loop with no exit. ## Fix Two-part runtime-side fix in molecule-ai-workspace-runtime: 1. platform_auth.refresh_from_disk() — new helper that clears the in-memory cache and re-reads /configs/.auth_token. Returns the fresh value (or None if missing). Updates the cache as a side effect. 2. HeartbeatLoop._loop() — on 401 from /registry/heartbeat, calls refresh_from_disk() and retries the request ONCE with the new token. Same pattern in _check_delegations(). Bounded retry budget — if the on-disk token is also stale (bug elsewhere), no infinite loop. ## Tests 6/6 new tests in tests/test_token_refresh_1877.py: - refresh_picks_up_rotated_token — happy path - refresh_returns_none_when_file_missing — defensive - refresh_clears_stale_cache_when_file_disappears - refresh_is_idempotent - 401_retry_pattern_uses_refreshed_token — the production fix path - 401_retry_no_loop_when_disk_token_also_stale — bounded retry budget All pass locally on Python 3.13 + pytest 9. ## Why this fix and not the alternatives - Alternative B (platform writes token before ContainerStart): Right architecturally but invasive — needs provisioner refactor to prep volumes before docker run. - Alternative C (skip rotation on auto-restart): Breaks the multi-instance-safety invariant the existing code calls out (revoke prevents stale tokens from sister deployments). - This fix (A): 3-line core change + helper. Self-healing for any timing edge case, not just the post-restart one. Costs nothing in the happy path (only triggers on 401). ## Version Bumped to 0.1.9. Once published to PyPI + workspace template image rebuilt, deployed workspaces auto-recover from token-rotation races without operator intervention. Closes #1877. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 13:26:36 -07:00
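The refresh-on-401 pattern can be sketched as follows. `TokenStore` and `post_with_refresh` are hypothetical names standing in for `platform_auth.refresh_from_disk()` and the heartbeat loop. `send` is an injected stand-in for the real HTTP call and returns a status code. The key property is the bounded budget: each request gets at most one retry, even if the on-disk token is also stale.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


class TokenStore:
    """Hypothetical in-memory cache over an on-disk token file."""

    def __init__(self, path: Path) -> None:
        self.path = path  # e.g. /configs/.auth_token
        self._cached: str | None = None

    def get(self) -> str | None:
        if self._cached is None:
            self._cached = self._read()
        return self._cached

    def refresh_from_disk(self) -> str | None:
        # Replace the cache unconditionally, so a vanished file also
        # clears a stale value instead of leaving it in memory.
        self._cached = self._read()
        return self._cached

    def _read(self) -> str | None:
        try:
            value = self.path.read_text().strip()
        except FileNotFoundError:
            return None
        return value or None


def post_with_refresh(store: TokenStore, send: Callable[[str | None], int]) -> int:
    """Send once; on 401, re-read the token from disk and retry exactly once."""
    status = send(store.get())
    if status == 401:
        status = send(store.refresh_from_disk())
    # No further retries: if the disk token is also stale, the 401 surfaces.
    return status
```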
rabbitblood	4bafea58ae	fix(llm_auth): tighten base-URL hostname match + strip whitespace + no token in logs Self-review findings on #38: 1. Token substring leak: the "unknown prefix" warning included the first 12 chars of the token in the log message. Logs get shipped to Langfuse / CloudWatch / slack-firehose — 12 bytes of a secret in a log is still 12 bytes too many. Warning no longer references the token value at all. 2. Base-URL substring match was too loose: `"anthropic.com" not in base` would accept `https://proxy.anthropic.com.evil.example/` as "looks like Anthropic, keep the URL." Replaced with an allowlist of exact hostnames parsed via urllib.parse.urlparse. 3. Whitespace in pasted tokens: operators frequently paste tokens from terminals with a trailing newline. The token would flow through startswith() detection but then fail downstream auth with a confusing "malformed token" error. Strip and persist the cleaned value. 4. Malformed base URL crash guard: if someone sets ANTHROPIC_BASE_URL to something urlparse can't handle, don't crash — fall through to clearing it, which is the safe choice in OAuth mode. Added 5 new tests covering each of the above. 16/16 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:46:07 -07:00
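A minimal sketch of the hostname allowlist check, assuming a single allowed host; the real allowlist is not listed in the message. Parsing with `urllib.parse.urlparse` compares the exact, lowercased hostname, so `proxy.anthropic.com.evil.example` no longer passes. A `ValueError` from a malformed URL is treated as "not Anthropic", so the caller clears the URL.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical allowlist; the real list is not given in the commit message.
_ANTHROPIC_HOSTS = frozenset({"api.anthropic.com"})


def is_anthropic_base_url(base_url: str | None) -> bool:
    """Exact-hostname check instead of a substring test on the whole URL."""
    if not base_url:
        return False
    try:
        host = urlparse(base_url.strip()).hostname  # lowercased by urllib
    except ValueError:
        # Malformed URL: report "not Anthropic" so the caller clears it.
        return False
    return host in _ANTHROPIC_HOSTS
```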
rabbitblood	0a0f11b41f	feat(runtime): auto-detect LLM token type, normalise env on boot Platform stores per-workspace LLM credentials under a single key (ANTHROPIC_AUTH_TOKEN in workspace_secrets). But downstream tools expect different env var names depending on the token type: sk-ant-oat01-* → CLAUDE_CODE_OAUTH_TOKEN (Claude Code OAuth session) sk-ant-api03-* → ANTHROPIC_API_KEY (direct Anthropic API) sk-cp-* → ANTHROPIC_AUTH_TOKEN (proxy: MiniMax, gateways) Without normalisation, an OAuth token under ANTHROPIC_AUTH_TOKEN gets sent as a bearer to api.anthropic.com, which responds: 401 authentication_error: OAuth authentication is currently not supported. This was a platform-wide footgun: anyone rotating LLM keys had to know the exact env var for each token type, AND make sure stale overrides were cleared, AND set ANTHROPIC_BASE_URL correctly for proxies (or NOT set for native Claude). Nothing downstream could help — the SDK just saw the wrong var. Fix: - New molecule_runtime/llm_auth.py — normalise_llm_env() mutates os.environ (or any dict) to the correct shape based on token prefix. Returns a NormalisationResult for logging. - main.py calls it as step 0, before any adapter/executor import. Every adapter (claude-code, langgraph, crewai, autogen, hermes, …) benefits automatically — no per-adapter branching needed. - 11 unit tests covering all prefix paths, edge cases, and the "operator deliberately set CLAUDE_CODE_OAUTH_TOKEN" precedence rule. Operationally: this means operators can keep using one ANTHROPIC_AUTH_TOKEN slot in platform settings and just paste whatever token the agent needs. No env-var-name awareness required. Tested locally: 11/11 new tests pass. 83 other tests unchanged (pre-existing failures on staging are all unrelated: test_workspace_id_validation, test_a2a_mcp_server RBAC, the test_imports.main module-walker — same signature as on staging HEAD before this PR). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:41:47 -07:00
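The prefix table in this commit maps naturally to a small normaliser. This is a hedged sketch, not `normalise_llm_env()` itself. Three details are assumptions: the name `normalise_env`, the rule that an explicitly set target variable wins, and the choice to drop `ANTHROPIC_AUTH_TOKEN` for native Anthropic tokens. Following the self-review commit, the sketch strips pasted whitespace and never logs the token. `ANTHROPIC_BASE_URL` handling is left out.

```python
from __future__ import annotations

from collections.abc import MutableMapping

# Prefix -> target variable, as listed in the commit message.
_PREFIX_TO_VAR = (
    ("sk-ant-oat01-", "CLAUDE_CODE_OAUTH_TOKEN"),
    ("sk-ant-api03-", "ANTHROPIC_API_KEY"),
    ("sk-cp-", "ANTHROPIC_AUTH_TOKEN"),
)


def normalise_env(env: MutableMapping[str, str]) -> str | None:
    """Move ANTHROPIC_AUTH_TOKEN to the variable its prefix calls for.

    Works on os.environ or a plain dict. Returns the target variable name,
    or None if the slot is empty or the prefix is unknown (env untouched).
    """
    raw = env.get("ANTHROPIC_AUTH_TOKEN")
    if not raw:
        return None
    token = raw.strip()  # pasted tokens often carry a trailing newline
    target = next((var for prefix, var in _PREFIX_TO_VAR if token.startswith(prefix)), None)
    if target is None:
        return None  # unknown prefix: leave as-is and never log the value
    if target == "ANTHROPIC_AUTH_TOKEN":
        env[target] = token  # proxy token stays put, whitespace stripped
    else:
        env.setdefault(target, token)  # assumption: an explicit value wins
        # Native Anthropic tokens must not also be sent as a proxy bearer.
        del env["ANTHROPIC_AUTH_TOKEN"]
    return target
```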
molecule-ai[bot]	dcb6edd1a1	fix(shared_runtime): push heartbeat on CLEAR in set_current_task() (#37) Fixes #1372 — phantom busy: canvas showed workspace as active for up to 30s after task completion because set_current_task("") returned early without posting the updated heartbeat. Before: clearing only updated the heartbeat object; the next 30s scheduled heartbeat cycle propagated the clear. Quick tasks would leave a phantom-busy indicator. After: both SET and CLEAR push immediately to /registry/heartbeat. active_tasks=0 on clear, active_tasks=1 on set. Heartbeat object update and HTTP post are now unconditional. Tests: 5 new cases covering SET/CLEAR HTTP body, error resilience, None heartbeat, and missing env vars. Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>	2026-04-22 17:33:42 +00:00
rabbitblood	1e545ed6ba	chore: bump 0.1.8 — executor_helpers phantom-busy fix confirmed in tree	2026-04-21 07:16:47 -07:00
rabbitblood	5a1990552d	chore: bump 0.1.7 — ensure executor_helpers phantom-busy fix in PyPI build	2026-04-21 07:07:17 -07:00
rabbitblood	59f54560a0	Merge branch 'main' of https://github.com/Molecule-AI/molecule-ai-workspace-runtime into fix/507-mcp-server-path-absolute-imports # Conflicts: # pyproject.toml	2026-04-21 06:37:38 -07:00
rabbitblood	d3235cc564	fix(heartbeat): increment/decrement active_tasks + push on clear (#1372, #1408) Both set_current_task() implementations (shared_runtime.py + executor_helpers.py): - Increment active_tasks on task start, decrement on completion (was binary 0/1) - Push heartbeat immediately on BOTH increment AND decrement - Only clear current_task when active_tasks reaches 0 (preserves description for still-running tasks) Fixes phantom-busy: the old code returned early on clear, leaving active_tasks=1 in the platform DB until the next 30s heartbeat cycle. If a new cron fired before the heartbeat, the workspace appeared permanently busy — required manual DB reset every 30 min. Bump: 0.1.2 → 0.1.3 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 06:37:12 -07:00
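The counter semantics from the #1372/#1408 fix can be sketched with a hypothetical `TaskTracker` class. It stands in for the two `set_current_task()` implementations, and an injected `push` callable stands in for the POST to `/registry/heartbeat`. Every start and finish pushes immediately, and `current_task` is cleared only when the count returns to zero. A failed push is logged rather than raised, which is an assumption about the desired resilience.

```python
from __future__ import annotations

import logging
from typing import Callable

log = logging.getLogger(__name__)


class TaskTracker:
    """Hypothetical stand-in for set_current_task() bookkeeping."""

    def __init__(self, push: Callable[[dict], None]) -> None:
        self.active_tasks = 0
        self.current_task = ""
        self._push = push  # e.g. a POST to /registry/heartbeat

    def start(self, description: str) -> None:
        self.active_tasks += 1
        self.current_task = description
        self._push_now()

    def finish(self) -> None:
        self.active_tasks = max(0, self.active_tasks - 1)
        if self.active_tasks == 0:
            # Clear the description only when nothing is still running.
            self.current_task = ""
        self._push_now()  # push on decrement too, not just on increment

    def _push_now(self) -> None:
        payload = {"active_tasks": self.active_tasks, "current_task": self.current_task}
        try:
            self._push(payload)
        except Exception:
            # A failed push must not break task execution; the periodic
            # heartbeat will carry the state on its next cycle.
            log.warning("heartbeat push failed", exc_info=True)
```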
Hongming Wang	7febb51382	Merge pull request #36 from Molecule-AI/chore/bump-0.1.5 chore: bump to 0.1.5 for X-Molecule-Org-Id header fix	2026-04-20 20:30:54 -07:00

94 Commits