WORKSPACE_ID was read via os.environ.get("WORKSPACE_ID", "") in multiple
builtin_tools modules and used directly in platform API URLs and X-Workspace-ID
headers without validation. A crafted ID containing /, .., or # could cause
URL path injection.
Fix: validate_workspace_id() in platform_auth.py now validates the ID format
at module import time using a regex that permits only lowercase alphanumerics
and hyphens (matching UUIDs and org-generated IDs). The validated value is
exposed as a module-level WORKSPACE_ID constant. builtin_tools/approval.py
and builtin_tools/delegation.py now import from platform_auth instead of
reading os.environ directly.
Failing input raises ValueError with a clear message — workspace fails fast
at startup rather than silently accepting malformed IDs in requests.
Add 15 regression tests (45/45 passing total).
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Infra-Runtime-BE <infra-runtime-be@molecule.ai>
ADAPTER_MODULE resolution required the imported module to export a
class literally named `Adapter`. The claude-code, langgraph, and
openclaw adapter-template repos (3 of 4 currently in production) don't
ship that alias — they export ClaudeCodeAdapter / LangGraphAdapter /
OpenClawAdapter directly. Only hermes has the `Adapter = HermesAdapter`
shim at the bottom of adapter.py.
Consequence in prod: every fresh claude-code / langgraph / openclaw
workspace crashed at runtime startup with
"module 'adapter' has no attribute 'Adapter'", even with a2a-sdk
correctly pinned <1.0. Provisioning looked successful from CP's side
(EC2 ran) but the agent never registered because the process never
reached A2A bootstrap.
Fix: if `Adapter` is absent from the imported module, scan the module
for any attribute that is a proper BaseAdapter subclass (excluding
BaseAdapter itself — regression guard in tests). The explicit alias
remains the preferred contract; this is purely additive tolerance.
Bump to 0.1.4 and publish to PyPI via the existing v* tag trigger.
6 new tests cover: explicit alias, subclass-fallback, non-adapter-noise
ignored, empty module → error, missing module → error, re-exported
BaseAdapter → not selected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #19 (CWE-C-312): AgentskillsAdaptor.install() passed the full
os.environ to the subprocess running setup.sh, including
ANTHROPIC_API_KEY, OPENAI_API_KEY, GITHUB_TOKEN, WORKSPACE_AUTH_TOKEN,
etc. A malicious or compromised plugin's setup.sh could exfiltrate them.
Fix: _scrubbed_env() builds a copy of os.environ with sensitive keys
removed, matching the same _SCRUB_KEYS list used in skill_loader/loader.py
so the scrubbing policy is consistent. CONFIGS_DIR is still passed via
the extra dict. Non-secret vars (PATH, HOME, etc.) are preserved.
Add 6 regression tests (30/30 passing).
Co-Authored-By: Infra-Runtime-BE <infra-runtime-be@molecule.ai>
CI failed on collect because claude_agent_sdk + a2a aren't test-env deps
(they're installed inside the claude-code workspace image). The test file
now stubs both via sys.modules so the collector can import
claude_sdk_executor without pulling the real SDKs. Tests don't exercise
the SDK anyway — only _resolve_resume() glob logic.
## Symptom (cycle 6+ of #488)
Workspaces appear `online` (heartbeats fine) but every cron tick fails
silently with `No conversation found with session ID: <uuid>` →
`ProcessError: exit code 1` → idle loop logs HTTP 200, no actual work
happens. Backend Engineer received 5 idle pulses without claiming a
single one of the 6 open Hermes issues (#496-500) because the bug
prevents `gh issue list` from ever firing.
## Root cause (verified live in ws-20cb8ff8-3e4 today)
claude-code stores sessions at `/root/.claude/projects/<cwd-with-/-as-->/<id>.jsonl`.
When a workspace container is recreated, `self._session_id` from a
prior instance references a file that no longer exists. Passing it as
`resume=<id>` to ClaudeAgentOptions crashes the CLI on the very first
call. The existing #75 fix only fires AFTER the first ProcessError
lands, and per-cycle executor re-instantiation can reload the stale id
from elsewhere — restart-with-reset_claude_session was the only working
mitigation, hand-fired every cycle.
## Fix
New `_resolve_resume()` in ClaudeSDKExecutor: probes a handful of
well-known session-file locations (`/root/.claude/projects/*/<id>.jsonl`,
`/root/.claude/sessions/<id>.jsonl`, plus the agent-uid variants) via
`glob.glob`. If no file matches the in-memory `_session_id`, drops the
id (sets to None) AND returns None so `ClaudeAgentOptions.resume` is
unset — CLI starts a fresh session. Logged at INFO with `#488` in the
message so operators correlate.
`_build_options()` now calls `_resolve_resume()` instead of reading
`self._session_id` directly. Cheap path when no session set: zero
glob calls. Hot path (session set + file exists): one glob call,
short-circuits on first match.
## Drive-by fix: stale `from X import` in 4 modules
Same regression class as #1 (the runtime release that closed it):
- `claude_sdk_executor.py:43`: `from executor_helpers import …`
- `cli_executor.py:39-40`: `from config import …`, `from executor_helpers import …`
- `main.py:28-30`: `from config import …`, `from heartbeat import …`, `from preflight import …`
- `preflight.py:7`: `from config import …`
All rewritten to absolute `from molecule_runtime.<module> import …`
so they resolve outside of workspace containers (e.g. test environments
where `/app` isn't on sys.path). The grep guard in `tests/test_imports.py`
already covered `adapters` — extending to all top-level imports would
catch this class going forward; not in this PR to keep scope tight.
## Tests
6 new in `tests/test_session_resume_gate.py`:
- baseline (no session) → no glob, returns None
- file exists → keep id, returns id, single glob (early-exit)
- file missing → drop id (clears `_session_id`), returns None
- late-pattern match → walks all patterns until hit
- log includes session id (operator triage)
- log references #488 (debugger discoverability)
All 16 tests (10 existing + 6 new) pass.
## Release plan
- Bump version 0.1.1 → 0.1.2 (in this commit)
- After merge, push v0.1.2 tag → publish.yml auto-publishes to PyPI
- Then rebuild workspace template images locally so workspaces pick up the
fix (templates pin `>=0.1.0`, will resolve to 0.1.2 on next build)
- Then mass-restart workspaces with reset_claude_session=true once to clear
any DB-side stale state, and the permanent fix kicks in
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every modular workspace template repo (claude-code, hermes, langgraph,
…) was crashing on boot with:
KeyError: "Unknown runtime '<runtime>'. Available: "
Root cause: `molecule_runtime/main.py` and four other modules used
top-level imports like `from adapters import get_adapter` — a monorepo
legacy that resolved when something on sys.path had an `adapters/`
package. Standalone template repos COPY only `adapter.py` (singular) to
/app and don't ship an `adapters/` package, so this import path went
through some side-resolution that left `get_adapter` unable to see the
user's adapter. The ADAPTER_MODULE → import → getattr → issubclass
chain then silently fell through to the discovery branch and reported
"Unknown runtime".
Fix is one-line per file: `from adapters` → `from molecule_runtime.adapters`
in:
- molecule_runtime/main.py:27
- molecule_runtime/a2a_executor.py:44
- molecule_runtime/coordinator.py:20
- molecule_runtime/prompt.py:6
- molecule_runtime/builtin_tools/temporal_workflow.py:417
Tests + CI added so this regression class is caught at PR time, not at
runtime in self-hosters' clusters:
- tests/test_imports.py: parametrised import smoke for every previously
affected module + a grep guard that fails if any future change
reintroduces a top-level `from adapters` / `import adapters` line
- .github/workflows/ci.yml: runs the smoke on every PR (no CI existed
before — the publish workflow only fires on tag push)
Closes#1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>