Commit Graph

58 Commits

Author SHA1 Message Date
Hongming Wang
2b9b4306eb fix(adapter): per-entry isolation in _load_providers + tighten _normalize_provider
Two correctness issues spotted in self-review of c6f4912:

1. String-as-prefix typo split into character tuple. ``model_prefixes:
   mimo-`` (operator forgot brackets) used to iterate over characters
   → ``('m','i','m','o','-')``, silently routing every model id starting
   with 'm', 'i', 'o', or '-' through the entry. Now: non-list values coerce
   to empty tuple (entry survives but matches nothing — operator notices
   in boot banner, not via misrouted requests).

2. Single bad provider entry nuked the whole registry. _load_providers
   built the registry via a generator inside tuple(...). One AttributeError
   mid-comprehension (e.g. ``[mimo-, 123]`` — int's missing .lower())
   propagated out, broad except caught it, registry silently fell back
   to _BUILTIN_PROVIDERS (oauth + anthropic-api only). Every third-party
   model would then route to anthropic-oauth — exactly the silent-fallback
   failure mode this PR was meant to eliminate. Now: per-entry try/except
   drops the bad entry with a warning, rest survives.

Also: entries without a string ``name`` field are now dropped with a
warning instead of silently using the placeholder ``<unnamed>`` —
operator typos surface in boot logs.

Tests: 28 passing (3 new regression tests covering both issues plus
the no-name path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:58:24 -07:00
Hongming Wang
7c3aeb5a14 ci: install pyyaml so the YAML-loading test path is exercised
Without pyyaml in CI, adapter._load_providers' broad except-Exception
swallows the ImportError and silently falls back to _BUILTIN_PROVIDERS.
Tests then assert 7 providers but get 2; setup() can't route any
third-party model. Locally pyyaml is system-installed so the issue
went unnoticed.

Same failure mode as the 2026-04-30 incident (CI green, prod broken)
— pinning the dep here closes that gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:40:47 -07:00
Hongming Wang
9de33057aa feat(config): add MiniMax-M2.7-highspeed model entry
Routes via the existing `minimax` provider entry (model prefix matches
`minimax-` case-insensitively) — no registry change needed.
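
For illustration, case-insensitive prefix routing could look like the
sketch below. `match_provider` and the dict shape of a provider are
assumptions made for the example, not names taken from adapter.py:

```python
def match_provider(model_id, providers):
    """Return the first provider whose lowercase prefix matches, else None."""
    lowered = (model_id or "").lower()
    for provider in providers:
        if any(lowered.startswith(prefix) for prefix in provider["model_prefixes"]):
            return provider
    return None


# Assumed shape: prefixes are stored lowercase at load time.
PROVIDERS = (
    {"name": "minimax", "model_prefixes": ("minimax-",)},
)
```

Because the model id is lowercased before comparison, a new id such as
MiniMax-M2.7-highspeed needs no additional prefix in the registry.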

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:30:24 -07:00
Hongming Wang
c6f4912d09 feat(adapter): data-driven provider registry in config.yaml
Move the model→endpoint→auth-env mapping out of hardcoded constants
in adapter.py + entrypoint.sh into a single `providers:` list at the
top of config.yaml. The adapter loads it at boot via _load_providers;
canvas Config tab will read the same YAML for its Provider dropdown so
UI and adapter never disagree on what's available. Adding a new
provider becomes a one-line YAML edit — no Python or shell changes.

Includes 5 third-party providers ready out of the box (Anthropic-compat
endpoints, Bearer-style ANTHROPIC_AUTH_TOKEN OR ANTHROPIC_API_KEY auth):

  xiaomi-mimo  https://api.xiaomimimo.com/anthropic
  minimax      https://api.minimax.io/anthropic
  zai          https://api.z.ai/api/anthropic           (NEW)
  moonshot     https://api.moonshot.ai/anthropic        (NEW)
  deepseek     https://api.deepseek.com/anthropic       (NEW)

Plus 9 new model entries in runtime_config.models (mimo-v2.5, MiniMax-M2,
MiniMax-M2.7, GLM-4.6, GLM-4.5, kimi-k2.5, kimi-k2, deepseek-v4-pro,
deepseek-v4-flash) so they show up in the Canvas Config dropdown.

Operator override unchanged: ANTHROPIC_BASE_URL set as a workspace
secret still wins over the registry default — the escape hatch for
regional endpoints (Xiaomi token-plan-sgp, MiniMax api.minimaxi.com).
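
A minimal sketch of that precedence, assuming an empty value counts as
unset. `effective_base_url` is a hypothetical helper, and the regional
URL in the usage note is a placeholder:

```python
def effective_base_url(env, registry_default):
    """Operator-set ANTHROPIC_BASE_URL wins; otherwise use the registry default."""
    operator_value = (env.get("ANTHROPIC_BASE_URL") or "").strip()
    return operator_value or registry_default
```

Called with `{"ANTHROPIC_BASE_URL": "https://regional.example.invalid/anthropic"}`
and the minimax default, this returns the operator's value; called with
an empty env it returns the registry default.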

entrypoint.sh: drops the `mimo-*` case mapping (adapter handles routing
now). _BUILTIN_PROVIDERS retained as malformed-YAML fallback so a
bare-bones workspace still boots with oauth + anthropic-api defaults.

Tests: 25 passing. New coverage:
  - YAML parses + normalizes to expected shape
  - Malformed YAML falls back to builtins (warning, not raise)
  - Each new provider routes its model id to the right base_url
  - ANTHROPIC_AUTH_TOKEN alone satisfies third-party auth check
  - Operator-set ANTHROPIC_BASE_URL overrides registry default
  - Case-insensitive prefix match (MiniMax-M2 / minimax-m2.7 / GLM-4.6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:29:40 -07:00
Hongming Wang
e02c5bf34b
Merge pull request #21 from Molecule-AI/feat/setup-raise-on-third-party-no-base-url
feat(adapter): raise on third-party model without ANTHROPIC_BASE_URL
2026-04-30 23:09:41 -07:00
Hongming Wang
c646b8cebe feat(adapter): raise on third-party model without ANTHROPIC_BASE_URL
Aligns setup()'s third-party-model-without-URL handling with
create_executor()'s pre-validate (#19) — both unrecoverable
misconfigurations now raise ValueError at boot, where previously one
warned and the other raised.

Why: a third-party (mimo-*) model selected without ANTHROPIC_BASE_URL
sends every LLM request to api.anthropic.com with a non-Anthropic key,
401-ing every prompt. Workspace boots, looks "online" via heartbeat,
but is structurally broken on the user-facing path. The previous
warning-only path produced the same end-user symptom as the
2026-04-30 incident (workspace looks alive, every interaction fails)
just via a different misconfig shape.

Symmetry: create_executor raises when ANTHROPIC_BASE_URL is set to a
non-Anthropic host but no model is picked. setup() now raises when a
third-party model is picked but no URL is set. Together they catch
both halves of the misconfig surface at boot, before the workspace
enters "online" status.
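
The setup() half of that check might look roughly like this.
`is_third_party` and `check_setup` are hypothetical names, and treating
only mimo-* ids as third-party is an assumption that matches this commit:

```python
def is_third_party(model):
    # Assumption: at this commit only mimo-* ids count as third-party.
    return (model or "").lower().startswith("mimo-")


def check_setup(model, env):
    """Raise at boot when a third-party model has no base URL to route to."""
    if is_third_party(model) and not env.get("ANTHROPIC_BASE_URL"):
        raise ValueError(
            f"model {model!r} is third-party but ANTHROPIC_BASE_URL is unset; "
            "requests would go to api.anthropic.com and fail with 401"
        )
```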

Adds 4 setup() tests:
- raises on third-party + no URL
- passes on third-party + URL
- passes on OAuth alias (sonnet) + no URL
- passes on Anthropic API id (claude-*) + no URL

Stubs molecule_runtime.plugins.load_plugins as a no-op so the pass-path
tests run cleanly without the runtime installed. Test count: 11 (7
create_executor + 4 setup).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:50:25 -07:00
Hongming Wang
3d83e0513c
Merge pull request #20 from Molecule-AI/chore/adapter-prevalidate-cleanup
chore(adapter): drop redundant urlparse imports + dead ternary
2026-04-30 22:48:54 -07:00
Hongming Wang
a4d83cb356 chore(adapter): drop redundant urlparse imports + dead ternary
Self-review follow-up to #19. Two cosmetic cleanups:

- urlparse is now imported at module-top (added in #17 alongside the
  auth-mode classification) so the two inline `from urllib.parse import
  urlparse` statements inside conditional branches are redundant.
- The log-format ternary " (custom upstream)" if base_url else "" lives
  inside `if base_url:` — base_url is unconditionally truthy there, so
  the else branch was dead code.

No behavior change. Tests still 7/7 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:45:04 -07:00
Hongming Wang
a8d3b97668
Merge pull request #19 from Molecule-AI/feat/adapter-prevalidate
feat(adapter): pre-validate ANTHROPIC_BASE_URL + missing model combo
2026-04-30 22:43:49 -07:00
Hongming Wang
61f935674f
Merge branch 'main' into feat/adapter-prevalidate
2026-04-30 22:38:53 -07:00
Hongming Wang
0d95b5098a feat(adapter): pre-validate ANTHROPIC_BASE_URL + missing model combo
The 2026-04-30 staging incident traced back to workspaces booting with
ANTHROPIC_BASE_URL pointing at a non-Anthropic shim (MiniMax / OpenAI
gateway) but no explicit model configured. The adapter silently fell
back to "sonnet" — an Anthropic-native alias the upstream didn't
recognize — and the SDK --print probe hung 30s before timing out.
Platform's phantom-busy sweep then nuked the workspace at 10min,
producing "every workspace dead" with the root cause buried in a
30s subprocess hang.

Pre-validate the combo at adapter boot: when ANTHROPIC_BASE_URL host
is non-Anthropic AND no explicit model is set, raise ValueError with
an actionable message pointing to MODEL_PROVIDER / runtime_config.model.
Also log the resolved model + base_url_host every boot so future
failures explain themselves in the workspace logs without digging
into the SDK subprocess.
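
As a sketch of the pre-validation, under two assumptions: an Anthropic
host is api.anthropic.com or a subdomain of anthropic.com, and an
unparseable URL is treated like no URL. `base_url_host` and
`prevalidate` are hypothetical names:

```python
from urllib.parse import urlparse


def base_url_host(base_url):
    """Return the lowercase host of base_url, or None if absent or unparseable."""
    if not base_url:
        return None
    try:
        return urlparse(base_url).hostname
    except ValueError:
        # e.g. a malformed IPv6 literal; do not crash the boot path.
        return None


def prevalidate(base_url, model):
    """Raise when a non-Anthropic upstream is configured without a model."""
    host = base_url_host(base_url)
    is_anthropic = (
        host is None
        or host == "api.anthropic.com"
        or host.endswith(".anthropic.com")
    )
    if not is_anthropic and not model:
        raise ValueError(
            f"ANTHROPIC_BASE_URL host {host!r} is not Anthropic and no model "
            "is set; set MODEL_PROVIDER or runtime_config.model"
        )
    return host  # host only, suitable for the boot log
```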

Tests live under tests/ with their own pytest.ini that anchors rootdir
there — keeps pytest from importing the package __init__.py (which
does the runtime-discovery relative import that requires
molecule_runtime installed). 7 tests cover: misconfig raises with the
right message, Anthropic-native passes, no-base-url passes, custom-url
+ explicit model passes, dataclass + dict shapes, unparseable URL
no-crash. CI runs them on every push/PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:35:49 -07:00
14f27b7886
Merge pull request #17 from Molecule-AI/feat/xiaomi-mimo-anthropic-compat
feat: add Xiaomi MiMo support (testing — entrypoint-shell mapping)
2026-04-29 17:13:23 -07:00
8c57ee0a95
Merge pull request #18 from Molecule-AI/fix/token-plan-url-support
fix: Token Plan URL support and multi-endpoint routing docs
2026-04-29 17:11:44 -07:00
528531f30d
Merge branch 'main' into feat/xiaomi-mimo-anthropic-compat
2026-04-29 17:11:32 -07:00
Hongming Wang
def15d3738 fix: document Token Plan URL support and multi-endpoint routing
- README: split Xiaomi MiMo into pay-as-you-go vs Token Plan rows,
  explicitly document ANTHROPIC_BASE_URL as a required secret for
  Token Plan users, and note that operator-set values always win over
  the shell mapping fallback
- entrypoint.sh: add supported Xiaomi MiMo endpoints comment listing
  pay-as-you-go + Token Plan SG/HK URLs for discoverability
2026-04-29 16:56:43 -07:00
Hongming Wang
f6577c6853
Merge pull request #11 from Molecule-AI/chore/enroll-secret-scan
chore(ci): enroll in org-wide secret-scan reusable workflow (Molecule-AI/molecule-core#2109)
2026-04-29 13:48:20 -07:00
Hongming Wang
4af6cd612a
Merge branch 'main' into chore/enroll-secret-scan
2026-04-29 13:46:33 -07:00
Hongming Wang
824bc4a176 adapter: warn for the right env var per auth mode + log boot banner
The pre-multi-provider warning hardcoded CLAUDE_CODE_OAUTH_TOKEN — it
fired even when an operator legitimately picked claude-sonnet-4-6 (API
key) or mimo-v2-flash (third-party) and set ANTHROPIC_API_KEY instead.
Misleading.

Now classifies the picked model into oauth / anthropic_api /
third_party_anthropic_compat and warns about the env var that auth path
actually needs. Adds a single-line boot banner so workspace logs surface
which provider was selected and (for third-party) which base-URL host
took effect — host-only, never full URL.
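
One way the classification and banner could be written, as a sketch
only. The function names, the banner wording, and the choice to fall
back to oauth for empty or unknown ids are assumptions:

```python
from urllib.parse import urlparse

# Env var each auth path needs, per the description above.
REQUIRED_ENV = {
    "oauth": "CLAUDE_CODE_OAUTH_TOKEN",
    "anthropic_api": "ANTHROPIC_API_KEY",
    "third_party_anthropic_compat": "ANTHROPIC_API_KEY",
}


def classify_auth_mode(model):
    lowered = (model or "").strip().lower()
    if lowered.startswith("claude-"):
        return "anthropic_api"
    if lowered.startswith("mimo-"):
        return "third_party_anthropic_compat"
    # Aliases such as "sonnet", plus empty/None/unknown ids (assumed fallback).
    return "oauth"


def boot_banner(model, env):
    """Single-line banner; logs the base-URL host only, never the full URL."""
    mode = classify_auth_mode(model)
    banner = f"provider={mode} model={model or '<default>'}"
    if mode == "third_party_anthropic_compat":
        host = urlparse(env.get("ANTHROPIC_BASE_URL", "")).hostname
        banner += f" base_url_host={host or '<unset>'}"
    return banner
```

Logging only the host keeps any path or query-string credentials out of
the workspace logs.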

Adds an additional warning when a third-party model is selected but
ANTHROPIC_BASE_URL is unset, since the symptom otherwise is silent
fall-through to api.anthropic.com with a third-party key (401).

Functional tests against 14 model-id cases (oauth aliases, claude-*
versioned, all 4 mimo-* variants, case-insensitivity, empty/None,
unknown id fallback) all pass — see commit's pre-push validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 03:15:21 -07:00
Hongming Wang
a21d16d94f feat: add Xiaomi MiMo support via Anthropic-API-compatible routing (testing)
Adds 4 model entries (mimo-v2-flash, mimo-v2-pro, mimo-v2-omni,
mimo-v2.5-pro) selectable from canvas. When MODEL matches mimo-*,
entrypoint.sh exports ANTHROPIC_BASE_URL=https://api.xiaomimimo.com/anthropic
so the claude CLI's native ANTHROPIC_BASE_URL handling routes there.
ANTHROPIC_API_KEY in this case is the Xiaomi key, not Anthropic Console.

Verified live against all 4 model IDs with x-api-key auth — all returned
200 with proper Anthropic-shape Messages responses (id, type=message,
role=assistant, content[].text, usage including cache_read_input_tokens).

Operator-set ANTHROPIC_BASE_URL is never overridden — the case-statement
only fills in the default when unset, so a user-supplied proxy still wins.

Marked as testing because the model→base-URL mapping currently lives in
entrypoint.sh shell. The robust shape is a data-driven `runtime_env`
field in config.yaml read by the platform provisioner; will follow up
with that as a separate cross-repo PR (workspace-server + canvas) so
this template no longer carries provider-specific knowledge in shell.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 03:13:01 -07:00
Hongming Wang
e1e3c8d3d5
Merge pull request #7 from Molecule-AI/fix/oauth-token-startup-warning
fix(adapter): warn at startup if CLAUDE_CODE_OAUTH_TOKEN is absent (KI-001)
2026-04-29 02:01:09 -07:00
c930626f82 fix(adapter): warn at startup if CLAUDE_CODE_OAUTH_TOKEN is absent (KI-001)
adapter.py:setup() now emits a logger.warning() if CLAUDE_CODE_OAUTH_TOKEN
is absent, so operators see the problem immediately instead of getting a silent
AuthenticationError on the first LLM call. known-issues.md updated to mark
KI-001 as resolved.
2026-04-29 01:57:16 -07:00
Hongming Wang
8bb5d91199
Merge pull request #9 from Molecule-AI/fix/wire-up-gh-token-refresh
fix: wire up GitHub App token refresh — fixes #1933
2026-04-29 00:59:55 -07:00
Hongming Wang
f48c993bbb
Merge branch 'main' into fix/wire-up-gh-token-refresh
2026-04-29 00:58:08 -07:00
Hongming Wang
afc0fae6e7
Merge pull request #14 from Molecule-AI/fix/no-publish-on-pr
fix(publish-image): drop pull_request trigger — leaks PR builds to GHCR
2026-04-29 00:56:54 -07:00
Hongming Wang
fd92de2591
Merge branch 'main' into fix/wire-up-gh-token-refresh
2026-04-29 00:56:02 -07:00
Hongming Wang
2bd206e89b
Merge branch 'main' into fix/no-publish-on-pr
2026-04-29 00:54:31 -07:00
Hongming Wang
7cb0c6c45b
Merge pull request #15 from Molecule-AI/fix/a2a-sdk-v1-file-part-protobuf
fix(a2a-v1): rewrite FilePart emit using v1 protobuf Part struct
2026-04-29 00:50:32 -07:00
Hongming Wang
1a84de8a61 fix(a2a-v1): rewrite FilePart emit using v1 protobuf Part struct
a2a-sdk v1.0.2 replaced the v0 Pydantic discriminated-union types
(Part(root=TextPart(...))/Part(root=FilePart(file=FileWithUri(...))))
with a single protobuf Part struct that has optional `text`, `url`,
`raw`, `data`, `filename`, `media_type` fields. The classes
FilePart, TextPart, FileWithUri don't exist in v1 — import fails:

    File "claude_sdk_executor.py", line 592
        from a2a.types import FilePart, FileWithUri, Message, Part, Role, TextPart
    ImportError: cannot import name 'FilePart' from 'a2a.types'

Production impact: every claude-code workspace (Design Director, UX
Researcher, all coordinators in molecule-core teams) crashes on
result delivery whenever the response includes a /workspace/* file
reference. The A2A delegation loop is broken at the result-delivery
step. Workspaces can receive tasks but can't ship results back.

Fix:

  - Drop FilePart/TextPart/FileWithUri imports (don't exist in v1).
  - `Part(root=TextPart(text=t))` → `Part(text=t)`.
  - `Part(root=FilePart(file=FileWithUri(uri=u, name=n, mimeType=m)))` →
    `Part(url=u, filename=n, media_type=m)`.
  - `messageId=...` → `message_id=...` (snake_case in protobuf).
  - `Role.agent` → `Role.ROLE_AGENT` (v1 enum).
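
The Part field renames above can be summarised as a plain mapping. The
helpers below are illustrative only; they build keyword arguments and
do not import a2a-sdk, and their names are made up for this sketch:

```python
def v1_text_part_kwargs(text):
    """Kwargs for the v1 call that replaces Part(root=TextPart(text=...))."""
    return {"text": text}


def v1_file_part_kwargs(uri, name, mime_type):
    """Kwargs for the v1 call that replaces the v0 FilePart/FileWithUri nesting."""
    return {"url": uri, "filename": name, "media_type": mime_type}
```

With a2a-sdk v1 installed, the result would be passed as
`Part(**v1_file_part_kwargs(u, n, m))`, matching the `Part(url=u,
filename=n, media_type=m)` form listed above.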

Verified by constructing the exact shape against v1.0.2 in the
running claude-code template image:

  Message:
    message_id: 03ff9367
    role: ROLE_AGENT
    parts count: 2
    text part: hello
    file part: workspace:foo.txt foo.txt text/plain

Refs: molecule-core memory `reference_a2a_sdk_v0_to_v1_migration`
documents the Pydantic→protobuf shift; this is the fifth migration
finding today (after the new_agent_text_message rename in
crewai/openclaw/autogen/gemini-cli).

Test plan:

  - [x] `python3 -m py_compile claude_sdk_executor.py` clean.
  - [x] Runtime construction smoke verified against the live v1.0.2
        a2a-sdk in the claude-code template image.
  - [ ] End-to-end: provision a claude-code workspace, send a task
        whose response references a /workspace/* file, confirm
        result lands without ImportError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:46:47 -07:00
Hongming Wang
3531f19668 fix(publish-image): drop pull_request trigger — leaks PR builds to GHCR
This was the only one of the 8 template repos with an `on: pull_request:`
trigger (the other 7 trigger only on push:main, repository_dispatch,
workflow_dispatch). The reusable publish-template-image workflow has
no PR-skip guard, so the PR trigger fired every time a PR was opened
or updated and pushed both `:latest` (clobbering the production tag
with unmerged code) and `:sha-<7>` (a stable tag for an unmerged
commit) to GHCR.

Verification at PR time already happens via the
validate-workspace-template workflow's "Docker build smoke test"
step, which builds the image but does NOT push. That's the right
place for PR-time verification.

Removing the trigger here aligns claude-code with the canonical 7
templates and stops the GHCR leak.

While here, updated the runtime_version comment to drop the now-
stale "/PR" reference.
2026-04-27 15:15:46 -07:00
Hongming Wang
de2ab5ab33 feat: forward client_payload.runtime_version + ARG RUNTIME_VERSION
Closes the cache trap structurally (instead of pin-bumping every
runtime release):
1. publish-image.yml caller now forwards
   github.event.client_payload.runtime_version (set by cascade) to
   the molecule-ci reusable workflow as runtime_version input.
2. Reusable workflow forwards it to docker build as a --build-arg.
3. Dockerfile declares ARG RUNTIME_VERSION near the pip install
   layer so its value becomes part of the cache key.
4. The pip install RUN command does an extra targeted upgrade to
   the exact version when ARG is set — guarantees the version is
   what we expect even if requirements.txt resolves to something
   else.
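
Step 4 expressed as a sketch in Python rather than Dockerfile syntax,
to show the conditional upgrade. `pip_install_commands` is a
hypothetical helper; the real logic lives in the Dockerfile's RUN line:

```python
def pip_install_commands(runtime_version=""):
    """Return the pip invocations for the install layer.

    The second command is added only when RUNTIME_VERSION is set, pinning
    the runtime to the exact version the cascade announced.
    """
    commands = [["pip", "install", "-r", "requirements.txt"]]
    if runtime_version:
        commands.append(
            ["pip", "install", "--upgrade",
             f"molecule-ai-workspace-runtime=={runtime_version}"]
        )
    return commands
```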

Pairs with molecule-ci PR #12 + molecule-core PR #2181. Together
the pipeline is now race- and cache-proof end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 07:46:14 -07:00
Hongming Wang
059db9ba14 chore: bump pin to >=0.1.22 (state_transition_history fix)
Forces docker layer cache invalidation. Cascade rebuilt against
0.1.22 but hit GHA cache and shipped 0.1.21 (broken AgentCard
construction). Pin bump invalidates the cache key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 07:39:20 -07:00
Hongming Wang
b1c4cab460 chore: bump runtime pin to >=0.1.21 (lib/ + manifest)
Forces docker layer cache invalidation. The 13:29 cascade rebuild
hit GHA's cached pip-install layer (requirements.txt unchanged →
same cache key → 0.1.19 baked in). Image shipped with 0.1.19 even
though 0.1.21 was on PyPI. Same race as the 0.1.18 → 0.1.19 cycle
earlier today (task #130 — structural fix is to wait for PyPI
propagation in the cascade step).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:52:30 -07:00
Hongming Wang
2b0b0d9fcd fix: migrate claude_sdk_executor to a2a-sdk 1.x (new_text_message)
Same a2a-sdk 1.x rename already shipped in hermes/executor.py and
workspace/a2a_executor.py: a2a-sdk dropped `new_agent_text_message`
in favor of `new_text_message` (role=Role.agent default preserves
behavior). Three call sites in this file.

Symptom: every claude-code workspace died at create_executor →
ImportError: cannot import name 'new_agent_text_message' from
'a2a.helpers'. Why this slipped past every prior fix:

The boot smoke gate only does `import adapter`. adapter.py imports
ClaudeSDKExecutor lazily INSIDE create_executor() (line 106),
which means claude_sdk_executor.py is never loaded at module
import time. The lazy-load pattern hid the bug from CI.

molecule-ci PR #8 (lint + import-every-app-py smoke) catches this
class going forward — the new smoke loop iterates every /app/*.py
including claude_sdk_executor.py, forcing module-level imports to
resolve.
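
An import-every-file smoke check in the spirit of that loop could be
sketched as follows. This is not the molecule-ci implementation;
`import_every_module` is a hypothetical name:

```python
import importlib.util
import pathlib


def import_every_module(directory):
    """Import each *.py file in directory; return {filename: error} for failures."""
    failures = {}
    for path in sorted(pathlib.Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(f"smoke_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        try:
            # Executes module-level code, so top-level imports must resolve.
            spec.loader.exec_module(module)
        except Exception as exc:
            failures[path.name] = f"{type(exc).__name__}: {exc}"
    return failures
```

A file whose only broken import sits inside a function body still
passes this check, but the module that function would load is itself
imported directly, which is what exposes the failure.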

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:55:19 -07:00
Hongming Wang
e7dea39df2 fix: qualify all bare imports of runtime modules
Five `from <runtime_module> import` statements in adapter.py +
claude_sdk_executor.py were never qualified when the template was
extracted to its own repo (#87). They worked when the runtime was
bundled into workspace/ where bare imports resolved against
sibling files; in the template repo they explode at startup with
ModuleNotFoundError as soon as Python reaches the import.

Caught by manual provision after pipeline-3 wire-real E2E. The
plugins import was the first one tripped because it sits in
adapter.setup() — earlier bare imports inside claude_sdk_executor.py
are deferred until the executor is constructed.

Pattern: any `from <X> import Y` where X is a workspace/ module ->
`from molecule_runtime.X import Y`. Fixes:
- adapter.py:97          plugins
- claude_sdk_executor.py executor_helpers, heartbeat, a2a_client, platform_auth

Same class of bug as the runtime's TOP_LEVEL_MODULES drift but
inverted — instead of forgetting to rewrite imports IN the wheel,
the template authors forgot to qualify imports IN the template
code (the build script's rewriter only runs on workspace/ -> wheel).
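
The rewrite pattern can be illustrated with a small regex pass. This is
a sketch, not the build script's rewriter; the module list is the five
names fixed in this commit:

```python
import re

RUNTIME_MODULES = (
    "plugins", "executor_helpers", "heartbeat", "a2a_client", "platform_auth",
)

# Match `from <module> import ` at the start of a line, keeping indentation.
_BARE_IMPORT = re.compile(
    r"^([ \t]*)from (" + "|".join(RUNTIME_MODULES) + r") import ",
    re.MULTILINE,
)


def qualify_runtime_imports(source):
    """Rewrite bare runtime imports to their molecule_runtime.* form."""
    return _BARE_IMPORT.sub(r"\1from molecule_runtime.\2 import ", source)
```

Already-qualified imports and unrelated modules are left untouched, so
running the pass twice gives the same result as running it once.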

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:20:24 -07:00
Hongming Wang
280e89c50b fix: export Adapter alias so runtime adapter discovery works
`workspace/adapters/__init__.py:get_adapter()` does
`getattr(mod, "Adapter")` after importing ADAPTER_MODULE. Without the
alias the runtime's preflight check fails with:

  [FAIL] Runtime: ADAPTER_MODULE='adapter' imported, but no `Adapter`
  class is exported. Add `Adapter = YourAdapterClass` at module scope

Symptom: workspace container restarts forever, never reaches `online`.

This contract was added (or hardened) in #123's adapter-discovery
refactor. Hermes's adapter.py already has `Adapter = HermesAgentAdapter`
at module scope; claude-code missed the migration. gemini-cli template
has the same bug — file separately.
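
The discovery contract, sketched with a synthetic module.
`ClaudeCodeAdapter` and this `get_adapter` are stand-ins written for
the example; the runtime's own function may differ in detail:

```python
import types


def get_adapter(module):
    """Return the module's exported `Adapter` class or fail loudly."""
    adapter_cls = getattr(module, "Adapter", None)
    if adapter_cls is None:
        raise RuntimeError(
            f"ADAPTER_MODULE={module.__name__!r} imported, but no `Adapter` "
            "class is exported"
        )
    return adapter_cls


class ClaudeCodeAdapter:  # hypothetical adapter class name
    pass


def build_module(with_alias):
    module = types.ModuleType("adapter")
    module.ClaudeCodeAdapter = ClaudeCodeAdapter
    if with_alias:
        # The one-line fix: an alias at module scope.
        module.Adapter = ClaudeCodeAdapter
    return module
```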

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 04:55:30 -07:00
Hongming Wang
e18986e7b4 chore: bump runtime pin to >=0.1.19
Forces docker layer cache invalidation. The cascade-triggered build at
10:59:46 UTC raced PyPI propagation: publish-runtime completed at 10:59:47
UTC and the cascade fired repository_dispatch immediately, so the
template build's `pip install` got 0.1.18 (still missing main_sync)
instead of the freshly-uploaded 0.1.19. GHA layer cache then pinned
that for any subsequent build with identical requirements.txt.

Bumping the pin invalidates the cache and forces a fresh resolve. The
proper structural fix is to add a sleep/poll-PyPI step to the cascade
job in publish-runtime.yml so it doesn't fan out until the new version
is actually visible on PyPI's index.
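
A poll step along those lines might look like the sketch below. It
checks PyPI's per-version JSON endpoint; since pip resolves against the
simple index, a real gate may need to check that instead. Function
names, attempt counts, and delays are assumptions:

```python
import time
import urllib.error
import urllib.request


def version_visible_on_pypi(package, version):
    """True once PyPI serves metadata for this exact version."""
    url = f"https://pypi.org/pypi/{package}/{version}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.URLError:
        # Includes HTTPError (e.g. 404 while the release is not yet visible).
        return False


def wait_for_version(package, version, attempts=30, delay=10.0,
                     is_visible=version_visible_on_pypi, sleep=time.sleep):
    """Poll until the version is visible; return False if attempts run out."""
    for attempt in range(attempts):
        if is_visible(package, version):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The cascade would fan out only after `wait_for_version` returns True,
and fail the job otherwise.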

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 04:16:51 -07:00
Hongming Wang
d313e45117 chore: bump molecule-ai-workspace-runtime pin to >=0.1.16
Forces docker layer cache invalidation so the next image rebuild
actually pulls 0.1.16 (which has RuntimeCapabilities + post-#87
changes) instead of reusing the cached layer with 0.1.15.

Caught by the new boot-import smoke gate in molecule-ci PR #7 — the
build at 09:34 UTC pushed a SHA-tagged image whose pip layer was
cached with 0.1.15, then ImportError'd on `from
molecule_runtime.adapters.base import RuntimeCapabilities`. Smoke
gate failed red, broken image never reached :latest. Working as
designed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:36:49 -07:00
Hongming Wang
1a54cdb308
Merge pull request #13 from Molecule-AI/feat/import-claude-sdk-executor
feat(template): own claude_sdk_executor locally (universal-runtime refactor)
2026-04-27 00:36:48 -07:00
Hongming Wang
033cf33c42
Merge pull request #12 from Molecule-AI/feat/declare-runtime-capabilities
feat(adapter): declare native_session + idle_timeout_override
2026-04-27 00:36:34 -07:00
Hongming Wang
fab7c6a929 feat(template): own claude_sdk_executor locally (universal-runtime refactor)
First half of molecule-core task #87 — move adapter-specific code out
of the universal molecule-runtime package into the template that
actually consumes it.

Adds:
  - claude_sdk_executor.py (757 LOC) — copied verbatim from
    molecule-core/workspace/claude_sdk_executor.py @ commit 186f25c2.
    The adapter at adapter.py:59 already does
    `from claude_sdk_executor import ClaudeSDKExecutor` — once this
    file lands at /app/, Python's import order picks the local copy
    over the same-named module that older molecule-runtime versions
    ship under site-packages.
  - Dockerfile: COPY claude_sdk_executor.py . alongside adapter.py.

Pure additive at this stage — molecule-runtime still ships the
file too, so any image built from this template just has two copies
on disk (local /app shadows the site-packages one). No behavior
change.

Sequencing (the molecule-core PR follows AFTER this image rebuilds):
  1. THIS PR — template gets local copy, image rebuilds with it
     (current PR; safe because no removal yet)
  2. molecule-core PR — drop workspace/claude_sdk_executor.py, bump
     molecule-ai-workspace-runtime PyPI version. Templates that
     haven't pulled the new runtime version still work because their
     local copy is unchanged.
  3. (later) Bump requirements.txt pin in this template once the
     new runtime version is on PyPI, so future builds explicitly
     install the slimmed runtime.

Why local-copy-first:
  - Reverse order (drop from runtime first, then add to template)
    creates a window where any template image build pulling the
    latest runtime would fail to import claude_sdk_executor.
  - This order has zero downtime: every intermediate state is valid.

Validates the capability primitives shipped in molecule-core PRs
#2137-#2144 — once this template image rebuilds and the molecule-
core deletion lands, the claude-code workspace is the FIRST adapter
to live entirely outside molecule-runtime, with native_session +
idle_timeout_override declared via capabilities() (PR #12 here).

Source: molecule-core/workspace/claude_sdk_executor.py @ 186f25c2
(commit hash pinned for traceability of any future divergence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:58:05 -07:00
Hongming Wang
c9ca671cd5 feat(adapter): declare provides_native_session + idle_timeout_override
Wires this template into the platform's capability-primitive layer
(molecule-core task #117). Two declarations:

1. RuntimeCapabilities(provides_native_session=True) — the claude-agent-sdk
   maintains a long-lived streaming session with its own client state.
   The platform's a2a_queue would double-buffer that in-flight state
   if it didn't know the SDK owned it. Once primitive #5 lands in
   molecule-core, the platform's enqueue path will skip workspaces
   declaring this and dispatch directly.

2. idle_timeout_override() returning 900 (15 min) — Opus + multi-step
   tool use legitimately runs 8-10 min between broadcaster events.
   The pre-capability bug (molecule-core PR #2128) hit this: the
   platform's 5min idle timer cancelled mid-flight during long
   packaging steps. The override moves the per-workspace ceiling up
   without leaving genuinely-wedged runs hanging too long. Consumed
   by molecule-core PR #2139 in a2a_proxy.dispatchA2A.
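
In outline, the two declarations amount to something like this. The
dataclass is a stand-in for molecule_runtime's RuntimeCapabilities
(whose full field set is not shown here), and the adapter class name is
hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeCapabilitiesSketch:
    """Stand-in for RuntimeCapabilities; the real class may have more fields."""
    provides_native_session: bool = False


class ClaudeCodeAdapterSketch:
    def capabilities(self):
        # The SDK owns its streaming session, so the platform should not
        # buffer in-flight state on top of it.
        return RuntimeCapabilitiesSketch(provides_native_session=True)

    def idle_timeout_override(self):
        # 15 minutes, in seconds: above the 8-10 min quiet periods seen
        # with Opus and multi-step tool use.
        return 900
```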

Other capability flags stay False — see inline docstring for the
per-flag rationale (notably native_status_mgmt is partially adapter-
driven via runtime_state="wedged" but the recovery path stays platform-
owned, so we don't claim it yet).

Requires molecule-ai-workspace-runtime with RuntimeCapabilities (PR
#2137 in molecule-core, merged 2026-04-27). The current
requirements.txt pin (>=0.1.0) will pick up the latest released
version on next image rebuild.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:07:28 -07:00
rabbitblood
8e905cda0d chore(ci): enroll in org-wide secret-scan reusable workflow (Molecule-AI/molecule-core#2109)
2026-04-26 20:09:02 -07:00
Hongming Wang
13eadcc158
ci(publish-image): accept repository_dispatch from monorepo runtime publish (#10)
Adds 'repository_dispatch' trigger (event-type: runtime-published) so
molecule-core's publish-runtime.yml cascade job can fire this template's
image rebuild after a new molecule-ai-workspace-runtime PyPI release.

Without this, every runtime release waited for the next push to main or
a manual workflow_dispatch to propagate to the published image. With it,
runtime fixes flow monorepo → PyPI → all 8 template images
automatically.

Part of the runtime CD chain. See molecule-core docs/workspace-runtime-package.md.

Co-authored-by: Hongming Wang <hongmingwangalt@gmail.com>
2026-04-26 12:42:19 -07:00
Hongming Wang
8fbd6689f0
Merge pull request #8 from Molecule-AI/fix/publish-image-pr-trigger
fix(ci): add pull_request trigger to publish-image workflow
2026-04-24 13:25:54 -07:00
rabbitblood
39c5b5b11f chore: enforce LF line endings + fix entrypoint.sh CRLF
Without this, Windows Docker Desktop checks out the entrypoint and
helper scripts with CRLF, and `#!/bin/sh\r` either fails outright or
silently exec's the wrong interpreter, depending on the kernel +
busybox combo.

Adds .gitattributes to pin LF on all shell/Python/YAML files +
renormalises the existing entrypoint.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:57:57 -07:00
rabbitblood
d4ab584deb fix: wire up GitHub App token refresh — fixes #1933
Symptoms before this PR:
- After ~60 min of workspace uptime, every git push/clone returns 401
- PMM, DevRel, Social Media Brand and other content agents infinite-loop
  status reports back to PMs ("I tried, GH_TOKEN dead")
- PM A2A queues overflow with retry-status messages (depth 27 on Marketing
  Lead, 18 on Dev Lead, 11 on Core Platform Lead at peak)

Root cause:
- GH_TOKEN/GITHUB_TOKEN injected at provision time has a ~60 min TTL
  (GitHub App installation tokens cap at one hour)
- Workspace env is frozen at container start — no in-process mechanism
  to refresh after expiry
- The credential-helper architecture exists in the codebase but was
  never wired up at template boot. Specifically the claude-code template:
  - did not COPY the helper scripts into the image
  - did not configure git credential.helper at boot
  - did not start the background refresh daemon
  - did not run initial gh auth login

Fix:
1. Dockerfile COPYs scripts/molecule-git-token-helper.sh and
   scripts/molecule-gh-token-refresh.sh into /app/scripts/
2. entrypoint.sh (root half) configures git credential helper for
   github.com and creates the per-user token cache directory
3. entrypoint.sh (agent half) starts the refresh daemon under a
   respawn loop and runs initial `gh auth login --with-token`

The helper hits the platform's /admin/github-installation-token endpoint
(fallback to env-var GH_TOKEN when platform unreachable). The refresh
daemon calls _refresh_gh every ~45 min ± 2 min jitter so cli auth and
helper cache stay warm even when no git operation triggers a refresh.
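
The refresh cadence works out as below. This is a Python rendering of
the arithmetic only (the daemon itself is a shell script), with uniform
jitter assumed and `next_refresh_delay` a made-up name:

```python
import random

BASE_INTERVAL_S = 45 * 60   # ~45 min between refreshes
JITTER_S = 2 * 60           # plus or minus 2 min
TOKEN_TTL_S = 60 * 60       # installation tokens last about one hour


def next_refresh_delay(rng=random):
    """Seconds to sleep before the next refresh, with uniform jitter."""
    return BASE_INTERVAL_S + rng.uniform(-JITTER_S, JITTER_S)
```

Even the longest delay (47 min) stays well inside the one-hour token
lifetime, so a refresh always lands before the current token expires.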

Acceptance:
- After this image deploys, `gh api /user` from inside a workspace
  should keep returning 200 even after >60 min uptime
- Marketing Lead / Dev Lead a2a queues should drain to <5 within one
  cycle of the new image rolling

Follow-up issues to file (not in this PR):
- Replicate this wiring in the other 7 template repos (autogen, crewai,
  deepagents, gemini-cli, hermes, langgraph, openclaw)
- Lift the wiring into the molecule-runtime PyPI package so future
  templates inherit it instead of re-implementing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:57:30 -07:00
8250fd0008 fix(ci): add pull_request trigger to publish-image workflow
Branch protection on main requires the publish / Build & push template
image check to pass for all PRs. The workflow previously only triggered
on push to main, so PRs could never satisfy branch protection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-04-23 05:37:16 +00:00
molecule-ai[bot]
fc6f71194e
fix(security): remove API key from git history + add publish-image CI
Removes the .auth-token file (containing a live API key) from git history.
The file was committed in the initial commit (b8859da, Apr 16) but is now
replaced with an empty placeholder in this branch.

Also adds .github/workflows/publish-image.yml for GHCR image publishing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 00:09:30 +00:00
molecule-ai[bot]
2ef87f2f23
fix(security): remove .auth-token API key from git history
The .auth-token file committed in b8859da contains a live API key.
Remove it from git history and add CI publish-image workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 00:06:36 +00:00
335474b71b docs: add known-issues.md and runbooks/local-dev-setup.md
Recovered from prior work. Previously pushed commit 03c6929
was lost during reset-to-origin/main divergence resolution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 08:36:22 +00:00