The `HongmingWang-Rabbit/hermes-agent` fork is no longer reachable on
github.com (account suspended 2026-05-06). The patched fork now lives
at https://git.moleculesai.app/molecule-ai/hermes-agent. Same SHAs,
same branches — pure URL flip.
See molecule-ai/internal#72 for the github.com fork shell decision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two artifacts that unblock the parked follow-ups from #59:
1. scripts/edge-429-probe.sh (closes the "operator-blocked" status of
#62). An operator without CF/Vercel dashboard access can reproduce
a canvas-sized burst against a tenant subdomain and read each 429's
response shape — workspace-server bucket overflow (JSON body +
X-RateLimit-* headers) is distinguishable from CF (cf-ray) and
Vercel (x-vercel-id) by inspection of the report. Read-only,
parallel via background subshells (no GNU parallel dependency),
no credential use. Smoke-tested against example.com end-to-end.
2. docs/engineering/ratelimit-observability.md (closes the
"metric-blocked" status of #64). The existing
molecule_http_requests_total{path,status} counter + X-RateLimit-*
response headers already cover #64's acceptance criterion ("watch
metrics for two weeks"). The runbook collects the PromQL queries,
a decision tree for the re-tune (keep / per-tenant override /
change default), an alert rule template, and a hard "do not roll
ad-hoc per-bucket-key exposure" note (in-memory map includes
SHA-256 of bearer tokens — exposing it is a security review
surface, file a follow-up if needed).
Neither artifact changes runtime behaviour. Pure operational tooling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OSS contributors who clone molecule-core and `go run ./workspace-server/cmd/server`
now get a working end-to-end provision without authenticating to GHCR or AWS ECR.
Pre-fix: with MOLECULE_IMAGE_REGISTRY unset, the provisioner attempted to pull
ghcr.io/molecule-ai/workspace-template-<runtime>:latest, which has been
returning 403 since the 2026-05-06 GitHub-org suspension.
Post-fix: when MOLECULE_IMAGE_REGISTRY is unset, the provisioner switches to
local-build mode — looks up the workspace-template-<runtime> repo's HEAD sha
on Gitea via a single API call, shallow-clones into ~/.cache/molecule/, and
runs `docker build --platform=linux/amd64`. SHA-pinned cache key skips the
clone+build entirely on subsequent provisions.
Production tenants are unaffected: every prod tenant sets the var to its
private ECR mirror, so the SaaS pull path is byte-for-byte identical.
SSOT for mode detection lives in Resolve() (registry_mode.go) returning a
discriminated RegistrySource{Mode, Prefix} so call sites that branch on
mode get a compile-time push instead of a string-equality footgun.
Coverage:
* registry_mode.go — new SSOT (Resolve, RegistryMode, IsKnownRuntime)
* registry_mode_test.go — 8 tests pinning mode-decision contract
* localbuild.go — clone+build pipeline (570 LOC, fully unit-tested)
* localbuild_test.go — 22 tests covering happy/sad paths, fail-closed
* provisioner.go — Start() inserts ensureLocalImageHook in local mode
* docs/adr/ADR-002 — design rationale + alternatives + security review
* docs/development/local-development.md — local-build flow + env overrides
Security:
* Allowlist-only runtime names (knownRuntimes) gate the clone path.
* Repo prefix hardcoded to git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-;
forks via opt-in MOLECULE_LOCAL_TEMPLATE_REPO_PREFIX.
* MOLECULE_GITEA_TOKEN masked in every log line via maskTokenInURL/maskTokenInString.
* Fail-closed: Gitea unreachable / runtime not mirrored → clear error, never
silently fall back to GHCR/ECR.
* docker build invocation passes no --build-arg from external input.
* HTTP body cap 64KB on Gitea API responses (defence vs malicious upstream).
Closes#63 / Task #194.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-reported friction: pip install molecule-ai-workspace-runtime on a
3.10 interpreter fails with "Could not find a version that satisfies the
requirement (from versions: none)" — pip's requires_python filter
silently drops the only available artifact before attempting install,
so the error doesn't mention Python at all. Operators see
"package missing", file a bug, and chase a phantom CDN/visibility
issue.
Two changes mirror the requirement at the two operator-touch surfaces:
1. workspace-server/internal/handlers/external_connection.go:
the externalUniversalMcpTemplate snippet (rendered into the
canvas Connect-External-Agent modal) now leads with a brief
"Requires Python >= 3.11" block + diagnostic + upgrade paths.
2. docs/workspace-runtime-package.md: same callout at the top of
the doc, before the Overview, so anyone landing here from search
gets the answer immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-model retrospective review of #2856 (Phase 1 Expand removal)
flagged that TeamHandler.Collapse is unreachable from the canvas UI:
the "Collapse Team" button calls PATCH /workspaces/:id { collapsed }
(visual flag toggle on canvas_layouts), NOT POST /workspaces/:id/collapse.
The destructive POST route — which stops EC2s, marks children removed,
and deletes layouts — has zero UI callers (verified via grep across
canvas/, scripts/, and the MCP tool registry; only docs referenced it).
Two semantically different operations had been sharing the word
"Collapse":
- Visual collapse (canvas) → PATCH { collapsed: true }. Hides
children visually. Reversible. UI-only.
- Destructive collapse (POST /collapse) → Stops + marks removed.
Irreversible. No caller.
Deleting the destructive one + its supporting machinery:
- workspace-server/internal/handlers/team.go (entirely)
- workspace-server/internal/handlers/team_test.go (entirely)
- POST /collapse route + teamh init in router.go
- findTemplateDirByName helper (zero non-test callers after Expand
was deleted in #2856; package-private so no out-of-package consumers)
- NewTeamHandler constructor (no callers after route removed)
Plus stale doc references (the most dangerous was the MCP wrapper
mapping in mcp-server-setup.md — anyone generating MCP tool wrappers
from that table was wiring a 404):
- docs/agent-runtime/team-expansion.md (deleted entirely — whole
guide taught the deleted flow)
- docs/api-reference.md (dropped two team.go rows)
- docs/api-protocol/platform-api.md (dropped /expand + /collapse
rows)
- docs/architecture/molecule-technical-doc.md (dropped /expand +
/collapse rows)
- docs/guides/mcp-server-setup.md (dropped expand_team +
collapse_team MCP wrapper mappings)
- docs/glossary.md (dropped "(org template expand_team)"
parenthetical)
- docs/frontend/canvas.md (dropped broken link to deleted
team-expansion.md)
Kept: docs/architecture/backends.md mention of "TeamHandler.Expand
(#2367) bypassed routing on Start" — correct historical context for
the AST gate's existence, no live route reference.
Visual-collapse path unaffected:
canvas/src/components/ContextMenu.tsx:227 → api.patch — unchanged
canvas/src/components/WorkspaceNode.tsx:128 → api.patch — unchanged
go vet ./... clean. go test ./internal/handlers/ -count 1 — all green
(4.3s, no regression).
Net: -388/+10 = ~378 lines removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#10.
The 2026-05-05 hongming silent-drop incident shipped because the
backends.md parity matrix didn't enforce a "go through the dispatcher"
rule — three handlers (TeamHandler.Expand, OrgHandler.createWorkspaceTree,
workspace_crud.go's stopAndRemove) silently bypassed routing on
SaaS for ~6 months across two distinct verbs.
This doc pass:
- Adds a "How to dispatch" section that's the canonical answer to
"where do I call Start / Stop / Has from?". Names the three
dispatchers (provisionWorkspaceAuto, StopWorkspaceAuto,
HasProvisioner), their fallbacks, and the allowed exceptions.
- Updates the matrix lifecycle rows so every dispatched operation
points at the dispatcher source, not the per-backend bodies.
- Adds Org-import + Team-collapse rows so the bulk paths are visible
to anyone scanning for parity gaps.
- Lists the source-level pins (4 of them) under Enforcement so
future contributors see them as load-bearing tests, not noise.
- Adds a "When you add a NEW dispatch site" section so the next verb
(Pause / Hibernate / Snapshot) lands as a dispatcher mirror, not
as another bespoke handler that drifts from the existing two.
- Refreshes Last audit to 2026-05-05.
No code change; doc-only. The SoT abstractions described here landed
in PRs #2811 + #2824.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#9.
Three pieces, all small:
1. **docs/e2e-coverage.md** — source of truth for which E2E suites
guard which surfaces. Today three were running but informational
only on staging; that's how the org-import silent-drop bug shipped
without a test catching it pre-merge. Now the matrix shows what's
required where + a follow-up note for the two suites that need an
always-emit refactor before they can be required.
2. **tools/branch-protection/apply.sh** — branch protection as code.
Lets `staging` and `main` required-checks live in a reviewable
shell script instead of UI clicks that get lost between admins.
This PR's net change: add `E2E API Smoke Test` and `Canvas tabs E2E`
as required on staging. Both already use the always-emit path-filter
pattern (no-op step emits SUCCESS when the workflow's paths weren't
touched), so making them required can't deadlock unrelated PRs.
3. **branch-protection-drift.yml** — daily cron + drift_check.sh
that compares live protection against apply.sh's desired state.
Catches out-of-band UI edits before they drift further. Fails the
workflow on mismatch; ops re-runs apply.sh or updates the script.
Out of scope (filed as follow-ups):
- e2e-staging-saas + e2e-staging-external use plain `paths:` filters
and never trigger when paths are unchanged. They need refactoring
to the always-emit shape (same as e2e-api / e2e-staging-canvas)
before they can be required.
- main branch protection mirrors staging here; if main wants the
E2E SaaS / External added later, do it in apply.sh and rerun.
Operator must apply once after merge:
bash tools/branch-protection/apply.sh
The drift check picks it up from there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent → child knowledge sharing previously lived behind a `shared_context`
list in config.yaml: at boot, every child workspace HTTP-fetched its parent's
listed files via GET /workspaces/:id/shared-context and prepended them as
a "## Parent Context" block. That paid the full transfer cost on every
boot regardless of whether the agent needed it, single-parent SPOF, no team
or org scope, and broken if the parent was unreachable.
Replace with memory v2's team:<id> namespace: agents call recall_memory
on demand. For large blob-shaped artefacts see RFC #2789 (platform-owned
shared file storage).
Removed:
- workspace/coordinator.py: get_parent_context()
- workspace/prompt.py: parent_context arg + injection block
- workspace/adapter_base.py: import + call + arg pass
- workspace/config.py: shared_context field + parser entry
- workspace-server/internal/handlers/templates.go: SharedContext handler
- workspace-server/internal/router/router.go: GET /shared-context route
- canvas/src/components/tabs/ConfigTab.tsx: Shared Context tag input
- canvas/src/components/tabs/config/form-inputs.tsx: schema field + default
- canvas/src/components/tabs/config/yaml-utils.ts: serializer entry
- 6 tests pinning the removed behavior; 5 doc references
Added regression gates so any reintroduction is loud:
- workspace/tests/test_prompt.py: build_system_prompt must NOT emit
"## Parent Context"
- workspace/tests/test_config.py: legacy YAML key loads cleanly but
shared_context attr must NOT exist on WorkspaceConfig
- tests/e2e/test_staging_full_saas.sh §9d: GET /shared-context must NOT
return 200 against a live tenant
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates plugin-author and operator docs to reflect the four fixup
PRs (C1, C2, I1, I4) for self-review findings.
Stacked on C1+C2 so the docs reference behavior that lands in the
same wave; rebases to staging once those merge.
What changes:
* docs/memory-plugins/README.md
- New "Memory idempotency" section explaining MemoryWrite.id
contract: omit → plugin generates UUID; supplied → upsert
- "Replacing the built-in plugin" rewritten as a 6-step
operator runbook with concrete commands for -dry-run / -apply
/ -verify / MEMORY_V2_CUTOVER, including the failure path
("if -verify reports mismatches, do not flip the cutover flag")
- Added link to new CHANGELOG.md
* docs/memory-plugins/testing-your-plugin.md
- New TestMyPlugin_IDIsIdempotencyKey example: write same id
twice, assert single row + updated content
- "What the harness does NOT cover" expanded with two new
operational gates: backfill twice → no double; verify-mode
reports zero mismatches
* docs/memory-plugins/pinecone-example/README.md
- Wire-mapping table updated: id (caller-supplied) → Pinecone
vector id (upsert); id (omitted) → plugin-generated UUID
- Production-hardening checklist gained an idempotency-key item
* docs/memory-plugins/CHANGELOG.md (new)
- Captures the four fixup PRs in one place with severity-ordered
summary, plugin-author action items, and remaining open
follow-ups (#289, #291, #293) for transparency
No code changes. Docs-only PR.
Self-review (post-merge) flagged that the backfill claimed to be
idempotent on re-run but actually duplicates every row because the
plugin's INSERT uses gen_random_uuid() and ignores any id passed in.
Fix is contract-level: extend MemoryWrite with an optional `id`
idempotency key. When supplied, the plugin MUST treat the write as
upsert keyed on this id; when omitted, the plugin generates a fresh
UUID (production agent commits keep working unchanged).
Changes:
* docs/api-protocol/memory-plugin-v1.yaml: add id field with
description that flags it as idempotency key
* internal/memory/contract/contract.go: add ID to MemoryWrite struct,
update memory_write_minimal golden vector
* internal/memory/pgplugin/store.go: split CommitMemory into two
paths — upsert when body.ID set (INSERT ... ON CONFLICT (id) DO
UPDATE), plain INSERT otherwise
* cmd/memory-backfill/main.go: pass agent_memories.id to MemoryWrite,
fix the false comment about 409 deduplication
New tests:
* pgplugin: TestCommitMemory_WithIDUpserts pins the upsert SQL is
used when id is set; TestCommitMemory_UpsertScanError covers the
error branch
* backfill: TestBackfill_PassesSourceUUIDAsIdempotencyKey pins the
forwarding behavior; TestBackfill_RerunIsIdempotent simulates a
retry and asserts both runs pass the same uuid (plugin upsert is
what makes this safe)
Why this matters: operators retrying a failed backfill (which they
will — networks fail, transactions abort) would otherwise create N
duplicates per memory. The duplicates aren't visible until search
results show obvious dupes — debugging that under prod load is bad.
Production agent commits are unaffected: they leave id empty, the
plugin generates a fresh UUID via gen_random_uuid(), zero behavior
change for the hot path.
Builds on merged PR-1..7 (PR-8 in queue). Pure docs; no code.
What ships:
* docs/memory-plugins/README.md — contract overview, capability
negotiation, deployment models, replacement workflow
* docs/memory-plugins/testing-your-plugin.md — using the contract
test harness to validate wire compatibility, what the harness
DOES NOT cover (capability accuracy, TTL eviction, concurrency)
* docs/memory-plugins/pinecone-example/README.md — worked example
of a Pinecone-backed plugin: capability mapping (only embedding,
no FTS), wire mapping (memory → vector + metadata), production-
hardening checklist
Documentation strategy:
* Lead with what workspace-server takes care of (security perimeter,
redaction, ACL, GLOBAL audit, prompt-injection wrap) so plugin
authors don't reimplement those layers
* Show three deployment models (same machine / separate container /
self-managed) so operators see their topology
* Capability table makes it explicit what each capability gates so
a plugin that supports only one (e.g. semantic search) is still
a useful plugin
* Pinecone example is honest: shows the skeleton, the wire mapping,
and explicitly calls out what's MISSING from the sketch (batch
commits, TTL janitor, circuit breaker, metrics)
First of 11 PRs implementing the memory-system plugin refactor (RFC #2728).
This PR is pure additive scaffolding — no behavior change, no integration
yet. It defines the wire shape between workspace-server and a memory
plugin so PR-2 (HTTP client) and PR-3 (built-in postgres plugin) can be
built against a single source of truth.
What ships:
- docs/api-protocol/memory-plugin-v1.yaml: OpenAPI 3.0.3 spec covering
/v1/health, namespace upsert/patch/delete, memory commit, search,
forget. Auth-free (private network only); workspace-server is the
only sanctioned client and the security perimeter.
- workspace-server/internal/memory/contract: typed Go bindings with
Validate() methods on every wire object so both client (PR-2) and
server (PR-3) self-check at the boundary.
- Round-trip JSON tests for every type (catch asymmetric tag bugs).
- 5 golden vector files under testdata/ pinning the exact wire shape;
update via UPDATE_GOLDENS=1.
Coverage: 100% of statements in contract.go.
The validation rules encode design decisions worth flagging in review:
- SearchRequest with empty Namespaces is REJECTED at plugin level —
workspace-server is required to intersect the readable set
server-side; an empty list reaching the plugin is a bug.
- NamespacePatch with no fields is REJECTED — empty patches are
pointless round-trips.
- MemoryWrite with whitespace-only Content is REJECTED — zero-info
memories pollute search results.
No code yet calls into this package; integration starts in PR-2.
Closes the documentation + audit gap for declarative skill-compat. The
plumbing has been live since PR #117 (RuntimeCapabilities) and
skill_loader's `_normalize_runtime_field` has been emitting filter
decisions for weeks, but:
- No public doc explained the `runtime` frontmatter field, so skill
authors didn't know how to opt in / opt out.
- No structural gate ensured every load_skills() call site threads
current_runtime — a future caller forgetting the kwarg silently
force-loads runtime-incompatible skills (no AttributeError, just a
delayed crash on first tool invocation).
Two changes:
1. docs/agent-runtime/skills.md
- Adds `runtime`, `tags`, `examples` to the Frontmatter Fields table.
- Adds a Runtime Compatibility section with example, accepted shapes
(universal default, list, string sugar), and the "logged + omitted,
not crashed" failure mode. Notes that match values come from each
adapter's name() (the same string in config.yaml's runtime: field).
2. workspace/tests/test_load_skills_call_sites.py
- Static AST gate: walks every workspace/*.py (excluding tests),
finds load_skills(...) Call nodes, fails if any lacks
current_runtime= as a keyword.
- Defense-in-depth `test_known_call_sites_present` — pins that the
scan actually sees the two known callers (adapter_base,
skill_loader.watcher) so a refactor that moves them is loud.
- Sanity-checked the matcher against a synthetic violating module.
Same-shape pattern as PR #2358 (tenant_resources audit-coverage AST
gate, #150) — pin the contract structurally, not just behaviorally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #32 (workspace template) merged 2026-05-02; image rebuild
succeeded. Plugin baked in. Local full-chain E2E green; caught + fixed
a real KeyError in upstream hermes_cli/tools_config.py. Upstream PR
#18775 still OPEN/CONFLICTING — not on critical path.
Also rewrites hermes-platform-plugins-upstream-pr.md to reflect the
final landing shape (existing hermes_cli/plugins.py, not a new
plugins/platforms/ system).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the OpenAI Codex CLI as a Molecule workspace runtime and lands
the design docs that drove the runtime native-MCP push parity work
across claude-code, hermes, openclaw, and codex.
manifest.json:
- Adds `codex` workspace_template entry pointing at the new
Molecule-AI/molecule-ai-workspace-template-codex repo (initial
commit landed there in parallel; 14 files / 1411 LOC). The
workspace-server runtime registry already had `codex` in its
fallback set — this entry makes it manifest-reachable in prod.
docs/integrations/:
- runtime-native-mcp-status.md — index across all four runtime streams
- codex-app-server-adapter-design.md — full design including v2 RPC
sequence, executor skeleton, schema-vs-runtime drift findings
(real codex 0.72 returns thread.id, schema says thread.threadId)
- hermes-platform-plugins-upstream-pr.md — pre-submission draft of
the hermes-agent upstream PR
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- workspace-runtime-package.md: add explicit "Where to make changes"
section documenting the mirror-only policy on
Molecule-AI/molecule-ai-workspace-runtime — direct PRs are auto-rejected
by mirror-guard CI; staging push regenerates both the mirror and the
PyPI wheel via .github/workflows/publish-runtime.yml.
- infra/workspace-terminal.md: replace dead molecule-core#1528 reference
(repo renamed to molecule-monorepo, no longer accepting issues at the
old name) with a forward-pointer to monorepo + molecule-controlplane
issue trackers.
- architecture/backends.md: bump audit date to 2026-05-02 and add rows
for channel envelope enrichment (#2471), chat_history MCP tool
(#2474), /activity before_ts paging (#2476), /activity peer_id filter
(#2472), runtime_wedge smoke gate (#2473 + #2475), and the canvas-E2E
state-file requirement (#2327).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Manual fresh-user clean-slate test surfaced three friction points in
the existing dev-start.sh:
1. The script ran docker compose -f docker-compose.infra.yml
directly, bypassing infra/scripts/setup.sh — so the workspace
template registry was never populated and the canvas template
palette came up empty (the "Template palette is empty"
troubleshooting hit).
2. ADMIN_TOKEN was not handled at all. Without it, the AdminAuth
fail-open gate worked initially but slammed shut the moment the
first workspace registered a token — at which point the canvas
could no longer call /workspaces or /templates. New users hit
401s with no obvious next step.
3. The script wasn't mentioned in docs/quickstart.md. New users
followed the documented 4-step manual flow and never discovered
the single command existed.
Fixes:
- dev-start.sh now calls infra/scripts/setup.sh, which brings up
full infra (postgres + redis + langfuse + clickhouse + temporal)
AND populates the template/plugin registry from manifest.json.
- On first run, dev-start.sh writes MOLECULE_ENV=development to
.env. This activates middleware.isDevModeFailOpen() which lets
the canvas keep calling admin endpoints without a bearer (the
intended local-dev escape hatch). The .env is preserved on
re-runs and sourced before the platform launches.
- The script intentionally does NOT auto-generate an ADMIN_TOKEN.
A first attempt did, and broke the canvas because isDevModeFailOpen
requires ADMIN_TOKEN empty AND MOLECULE_ENV=development together.
Setting ADMIN_TOKEN in dev would close the hatch and the canvas
has no way to read that token in a dev build (no
NEXT_PUBLIC_ADMIN_TOKEN bake step here). The .env comment block
explicitly warns future contributors not to add it.
- Both processes' logs go to /tmp/molecule-{platform,canvas}.log
instead of stdout-mixed so the readiness banner stays clean.
- Health-poll loops cap at 30s with a clear timeout error pointing
to the log file, instead of hanging forever.
- The readiness banner now lists the log paths AND tells the user
the next step is "open localhost:3000 → add API key in Config →
Secrets & API Keys → Global", instead of just listing service
URLs.
Quickstart doc rewrite leads with:
git clone ...
cd molecule-monorepo
./scripts/dev-start.sh
The 4-step manual flow is preserved as "Manual setup (advanced)"
for contributors who want per-component logs.
Verified end-to-end from clean Docker (no containers, no volumes,
no .env) three times: total wall-clock ~12s for a re-run with
cached npm/docker layers. Platform's HTTP 200 on /workspaces
without a bearer confirms the dev-mode auth hatch is active.
Same root cause as the workspace/molecule_ai_status.py docstring fix
in this PR: this doc claimed `molecule-monorepo-status` was a usable
shell alias and `from molecule_ai_status import set_status` was a
usable Python import. Both worked under the pre-#87 monolithic-template
layout (where workspace/Dockerfile created the symlink and COPY'd the
modules into /app/) but neither works in current standalone template
images that install the runtime as a wheel:
- `which molecule-monorepo-status` errors — only `a2a-db` and
`molecule-runtime` are registered console scripts.
- `from molecule_ai_status` raises ImportError — modules are under the
`molecule_runtime` package now.
Switched both examples to the canonical `python3 -m
molecule_runtime.molecule_ai_status` form (CLI) and `from
molecule_runtime.molecule_ai_status import set_status` (Python). Same
form the runtime ships in its own usage banner, so anyone discovering
this doc gets a runnable example.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Quota gates are resource-state conflicts, not payment failures —
RFC 9110 reserves 402 for billing/payment failures specifically. The
canonical Molecule-AI/docs PR #82 already shipped the corrected text;
this brings the molecule-core copy of the tutorial in line.
The inline parenthetical "(not 402 Payment Required — quota gates are
resource-state conflicts, not payment failures, per RFC 9110)" doubles
as a regression anchor: a future edit that flips 409 back to 402 would
have to also reword that explanation, making the change a deliberate
two-step act rather than a casual oversight.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an opt-in goroutine that polls GHCR every 5 minutes for digest
changes on each workspace-template-*:latest tag and invokes the same
refresh logic /admin/workspace-images/refresh exposes. With this, the
chain from "merge runtime PR" to "containers running new code" is fully
hands-off — no operator step between auto-tag → publish-runtime →
cascade → template image rebuild → host pull + recreate.
Opt-in via IMAGE_AUTO_REFRESH=true. SaaS deploys whose pipeline already
pulls every release should leave it off (would be redundant work);
self-hosters get true zero-touch.
Why a refactor of admin_workspace_images.go is in this PR:
The HTTP handler held all the refresh logic inline. To share it with
the new watcher without HTTP loopback, extracted WorkspaceImageService
with a Refresh(ctx, runtimes, recreate) (RefreshResult, error) shape.
HTTP handler is now a thin wrapper; behavior is preserved (same JSON
response, same 500-on-list-failure, same per-runtime soft-fail).
Watcher design notes:
- Last-observed digest tracked in memory (not persisted). On boot the
first observation per runtime is seed-only — no spurious refresh
fires on every restart.
- On Refresh error, the seen digest rolls back so the next tick retries.
Without this rollback a transient Docker glitch would convince the
watcher the work was done.
- Per-runtime fetch errors don't block other runtimes (one template's
brief 500 doesn't pause the others).
- digestFetcher injection seam in tick() lets unit tests cover all
bookkeeping branches without standing up an httptest GHCR server.
Verified live: probed GHCR's /token + manifest HEAD against
workspace-template-claude-code; got HTTP 200 + a real
Docker-Content-Digest. Same calls the watcher makes.
Co-authored-by: Hongming Wang <hongmingwangalt@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: point new-runtime-template flow at the GitHub template repo
The 'Writing a new adapter' section was a 6-step manual checklist that
re-derived the canonical shape every time. Now that
Molecule-AI/molecule-ai-workspace-template-starter exists as a GitHub
template, the flow collapses to:
gh repo create ... --template Molecule-AI/molecule-ai-workspace-template-starter
Plus a fill-in-the-TODO-markers table.
Why this matters: the starter ships with the
'repository_dispatch: [runtime-published]' cascade receiver pre-wired,
which means new templates pick up runtime PyPI publishes automatically
without the one-time setup PR each existing template needed (PRs #6-#22
across the 8 template repos that we just opened to retrofit). At
'hundreds of runtimes' scale this is the difference between linear PR-
toil and zero PR-toil per template addition.
Also adds: 'When the starter itself needs to evolve' — explicit pattern
for keeping the canonical shape in one place when it changes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* docs(workspace-runtime): drop PYPI_TOKEN refs — OIDC is the new auth
Reflects PR #2113 (PyPI Trusted Publisher / OIDC migration). No static
PyPI token exists in the repo anymore, so the docs shouldn't claim one
does. Replaces the PYPI_TOKEN row in the Required Secrets table with an
"Auth" section pointing at the OIDC config; TEMPLATE_DISPATCH_TOKEN is
still the only repo secret the cascade needs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hongming Wang <hongmingwangalt@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
External architecture review flagged the SECRETS_ENCRYPTION_KEY env var
on the platform as encryption-at-rest theater. The reviewer read only
the platform repo and missed that the master key actually lives in AWS
KMS at the control plane layer, with envelope encryption wrapping each
tenant secret blob.
Adds docs/architecture/secrets-key-custody.md as the canonical source
of truth for the full chain:
- Two-mode envelope (KMS_KEY_ARN vs static-key fallback)
- Per-blob AES-256-GCM with KMS-wrapped DEKs
- Where each key actually lives (KMS, CP env, tenant env)
- Threat model per attacker capability
- Rotation story (annual KMS CMK rotation, manual DEK rotation on incident)
- Audit posture (SOC2 / ISO 27001 questionnaire bullets)
Patches three downstream docs that previously stopped at the env-var
level and link them to the new custody doc:
- development/constraints-and-rules.md (Rule 11)
- architecture/database-schema.md (workspace_secrets paragraph)
- architecture/molecule-technical-doc.md (env-vars table)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The production-side end of the runtime CD chain. Operators (or the post-
publish CI workflow) hit this after a runtime release to pull the latest
workspace-template-* images from GHCR and recreate any running ws-* containers
so they adopt the new image. Without this, freshly-published runtime sat in
the registry but containers kept the old image until naturally cycled.
Implementation notes:
- Uses Docker SDK ImagePull rather than shelling out to docker CLI — the
alpine platform container has no docker CLI installed.
- ghcrAuthHeader() reads GHCR_USER + GHCR_TOKEN env, builds the base64-
encoded JSON payload Docker engine expects in PullOptions.RegistryAuth.
Both empty → public/cached images only; both set → private GHCR pulls.
- Container matching uses ContainerInspect (NOT ContainerList) because
ContainerList returns the resolved digest in .Image, not the human tag.
Inspect surfaces .Config.Image which is what we need.
- Provisioner.DefaultImagePlatform() exported so admin handler picks the
same Apple-Silicon-needs-amd64 platform as the provisioner — single
source of truth for the multi-arch override.
Local-dev companion: scripts/refresh-workspace-images.sh runs on the
host and inherits the host's docker keychain auth — alternate path for
when GHCR_USER/TOKEN aren't set in the platform env.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
71 files across docs/marketing/ and marketing/ are blocked by the
Block-internal-flavored-paths CI gate (CEO directive 2026-04-23).
These paths must live in Molecule-AI/internal, not the public monorepo.
Unblocks PR #1981 (staging→main sync).
Public-facing blog/devrel content should be re-added via correct paths:
docs/blog/<slug>.md, docs/devrel/<slug>.md, docs/tutorials/<slug>.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Discord-format announcement for Phase 34 GA (April 30, 2026).
All four features: Tool Trace, Platform Instructions, Partner API Keys,
SaaS Federation v2. ~550 words, community-native tone.
Address: Molecule-AI/molecule-core#1836
Co-Authored-By: Claude Community Manager <noreply@anthropic.com>
Community announcement for Phase 34 GA (April 30, 2026).
Four features: Tool Trace, Platform Instructions, Partner API Keys,
SaaS Federation v2. Discord-format, ~550 words, community-native tone.
Addresses Molecule-AI/molecule-core#1836.
Co-Authored-By: Claude Community Manager <noreply@anthropic.com>
Community announcement for Phase 34 GA (April 30, 2026).
Four features: Tool Trace, Platform Instructions, Partner API Keys,
SaaS Federation v2. Discord-format, ~550 words, community-native tone.
Addresses Molecule-AI/molecule-core#1836.
Co-Authored-By: Claude Community Manager <noreply@anthropic.com>
This monorepo is public. Internal content (positioning, competitive
briefs, sales playbooks, PMM/press drip, draft campaigns) belongs in
Molecule-AI/internal — never here.
## What this PR removes
/research/ (3 competitive briefs)
/marketing/ (45 files: assets, audio, community, copy,
demos, devrel, drip, pmm, press, sales)
/docs/marketing/ (31 draft campaign / blog / brief files)
comment-1172.json + comment-1173.json
test-pmm-temp.txt
tick-reflections-temp.md
83 files removed, 7,141 lines deleted from public history (going forward —
historical commits remain visible in this repo's git log).
## Companion: internal repo absorption
Molecule-AI/internal PR `chore/migrate-monorepo-internal-content-2026-04-23`
absorbs all 79 files into `from-monorepo-2026-04-23/` for curator triage
into the existing internal/marketing/ tree. Bulk-dump avoids file-collision
on overlapping subdirs (audio, devrel, pmm).
## Three-layer enforcement so this can't recur
1. .gitignore — blocks `git add` of /research, /marketing, /docs/marketing,
/comment-*.json, *-temp.{md,txt}, /test-pmm-*, /tick-reflections-*
2. .github/workflows/block-internal-paths.yml — CI hard gate. Fails any PR
that adds a forbidden path. Cannot be silently bypassed.
3. docs/internal-content-policy.md — canonical decision tree for agents
and humans. Linked from the CI failure message.
A separate PR on molecule-ai-org-template-molecule-dev updates SHARED_RULES
to teach every agent role to write internal content directly to
Molecule-AI/internal via gh repo clone + commit + PR (the prevention-at-
source layer; this PR is the mechanical backstop).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The monitoring section referenced GET /org/tokens/:id/logs which does
not exist. The org token API only exposes List/Create/Revoke
(GET/POST/DELETE /org/tokens). Per-token activity logs via API are
a planned feature, not yet built.
Fixes: molecule-core#1914
- Replaced fake curl example with Canvas Activity Log path
- Added roadmap note: per-token activity logs via API (planned)
- Updated footer to include per-token activity logs on roadmap
- Kept the operational guidance (monitor call patterns, revoke if
suspicious) since the principle is correct even if the API is TBD
- Rename combined overview slug to tool-trace-platform-instructions-overview
- Add og_image placeholder to all 3 posts
- Cross-link all Phase 34 posts bidirectionally
- Add Tool Trace X post and Platform Instructions LinkedIn post
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Research team competitive audit confirmed no competitor has documented
programmatic partner org provisioning API equivalent to mol_pk_*. Updated
lead claim from unverified "only platform" to verified "first-mover" /
"first agent platform" framing for legal defensibility. Resolves the
VERIFICATION REQUIRED warning blocks in the battlecard.
Co-authored-by: Molecule AI Marketing Lead <marketing-lead@agents.moleculesai.app>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>