Commit Graph

3663 Commits

Author SHA1 Message Date
Hongming Wang
e778d477ba Merge pull request #381 from Molecule-AI/chore/triage-operator-handoff
chore(handoff): Triage Operator role + agent handoff package
2026-04-15 23:43:05 -07:00
Hongming Wang
16ae320bed Merge pull request #381 from Molecule-AI/chore/triage-operator-handoff
chore(handoff): Triage Operator role + agent handoff package
2026-04-15 23:43:05 -07:00
Hongming Wang
2e43bb7271 chore(handoff): triage-operator role + agent handoff package
Wraps up a ~100-tick autonomous triage session by converting the prior
operator's institutional knowledge into standing, checked-in artifacts
so the next team picking up the hourly PR + issue cycle can drop in
without re-discovering everything from scratch.

## New role: Triage Operator

Peer to Dev Lead, Research Lead, Documentation Specialist under PM.
Owns the 7-gate PR verification + issue-pickup cycle across both
molecule-monorepo and molecule-controlplane. NOT an engineer — never
writes logic, never makes design calls. Mechanical fixes on other
people's branches + verified-merge only.

Runs on cron `17 * * * *`. On first boot reads four handoff files +
the last 20 lines of cron-learnings.jsonl, waits for the scheduled
tick (no first-boot triage — known stale-state footgun).
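A minimal sketch of the "wait for the scheduled tick" rule, assuming the schedule `17 * * * *` is evaluated in the clock's own location. The names `nextTick` and `minutesUntilTick` are hypothetical and not taken from the repository.

```go
package main

import (
	"fmt"
	"time"
)

// tickMinute is the minute-of-hour from the `17 * * * *` schedule.
const tickMinute = 17

// nextTick returns the first instant strictly after now that falls on
// minute 17, second 0 of an hour in now's location.
func nextTick(now time.Time) time.Time {
	t := time.Date(now.Year(), now.Month(), now.Day(), now.Hour(), tickMinute, 0, 0, now.Location())
	if !t.After(now) {
		t = t.Add(time.Hour)
	}
	return t
}

// minutesUntilTick is an illustration helper: whole minutes from hh:mm:00 UTC
// on an arbitrary day until the next tick.
func minutesUntilTick(hh, mm int) int {
	now := time.Date(2026, time.April, 15, hh, mm, 0, 0, time.UTC)
	return int(nextTick(now).Sub(now) / time.Minute)
}

func main() {
	// First boot at 09:05, at 09:17 exactly, and at 09:40.
	fmt.Println(minutesUntilTick(9, 5), minutesUntilTick(9, 17), minutesUntilTick(9, 40))
}
```

A boot that lands exactly on `:17` waits a full hour under this sketch, which matches the "no first-boot triage" rule rather than firing immediately.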

## Files

org-templates/molecule-dev/triage-operator/
- system-prompt.md (48 lines) — role prompt loaded at boot. Standing
  rules, verification discipline, escalation paths.
- philosophy.md (135 lines) — 10 principles each tied to a real
  incident. Rule 2 ("tool succeeded ≠ work done") references the
  WorkOS refresh-token + missing-migration saga. Rule 3 (authority
  verification) references PR #370 CEO directive hold.
- playbook.md (234 lines) — step-by-step tick flow (Step 0 guards →
  1 list → 2 seven-gate → 3 docs sync → 4 issue pickup → 5 report).
  Expected 5–30 min wall-clock. When-not-to-triage.
- handoff-notes.md (146 lines) — point-in-time state for the NEXT
  operator arriving fresh. 15 PRs merged this session, in-flight
  items, design-call backlog with recommendations per issue.
- SKILL.md (152 lines) — installable skill spec. Invocation, inputs,
  outputs, required composed skills, edge cases, output format.

.claude/AGENT_HANDOFF.md (206 lines) — top-level handoff for any
Claude Code agent working this repo (not just the triage operator).
The 10 principles (one-liners), communication style the user
expects, currently-live state, open items, what NOT to do, break-
glass escalation conditions. Points at triage-operator/philosophy.md
for full incident context.

## Wiring

org.yaml gains a Triage Operator workspace block under PM with:
- tier: 3, model: opus
- 8 plugins (careful-bash, session-context, cron-learnings,
  code-review, cross-vendor-review, llm-judge, update-docs, hitl)
- Hourly cron at `:17` with the full Step 0–5 flow inline as prompt
- canvas position (1150, 250) — peer to Documentation Specialist

## Why this ships now

The 30-min manual triage cron was cancelled per CEO direction. The
role moves to another team. Without this handoff package they'd be
rediscovering the same incident-classes I shipped fixes for
(#318 fail-open, #327 cross-tenant decrypt, #351 tokenless grace,
WorkOS refresh-token saga, missing migration runner). The philosophy
file gives them the scar tissue in ~10 min of reading; the playbook
gives them the steps; the SKILL gives them an invocable entry point.

No code changes outside org.yaml. Existing TestPlugins_UnionWithDefaults
still passes (verified in platform test run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:41:01 -07:00
Hongming Wang
df5821a251 chore(handoff): triage-operator role + agent handoff package
Wraps up a ~100-tick autonomous triage session by converting the prior
operator's institutional knowledge into standing, checked-in artifacts
so the next team picking up the hourly PR + issue cycle can drop in
without re-discovering everything from scratch.

## New role: Triage Operator

Peer to Dev Lead, Research Lead, Documentation Specialist under PM.
Owns the 7-gate PR verification + issue-pickup cycle across both
molecule-monorepo and molecule-controlplane. NOT an engineer — never
writes logic, never makes design calls. Mechanical fixes on other
people's branches + verified-merge only.

Runs on cron `17 * * * *`. On first boot reads four handoff files +
the last 20 lines of cron-learnings.jsonl, waits for the scheduled
tick (no first-boot triage — known stale-state footgun).

## Files

org-templates/molecule-dev/triage-operator/
- system-prompt.md (48 lines) — role prompt loaded at boot. Standing
  rules, verification discipline, escalation paths.
- philosophy.md (135 lines) — 10 principles each tied to a real
  incident. Rule 2 ("tool succeeded ≠ work done") references the
  WorkOS refresh-token + missing-migration saga. Rule 3 (authority
  verification) references PR #370 CEO directive hold.
- playbook.md (234 lines) — step-by-step tick flow (Step 0 guards →
  1 list → 2 seven-gate → 3 docs sync → 4 issue pickup → 5 report).
  Expected 5–30 min wall-clock. When-not-to-triage.
- handoff-notes.md (146 lines) — point-in-time state for the NEXT
  operator arriving fresh. 15 PRs merged this session, in-flight
  items, design-call backlog with recommendations per issue.
- SKILL.md (152 lines) — installable skill spec. Invocation, inputs,
  outputs, required composed skills, edge cases, output format.

.claude/AGENT_HANDOFF.md (206 lines) — top-level handoff for any
Claude Code agent working this repo (not just the triage operator).
The 10 principles (one-liners), communication style the user
expects, currently-live state, open items, what NOT to do, break-
glass escalation conditions. Points at triage-operator/philosophy.md
for full incident context.

## Wiring

org.yaml gains a Triage Operator workspace block under PM with:
- tier: 3, model: opus
- 8 plugins (careful-bash, session-context, cron-learnings,
  code-review, cross-vendor-review, llm-judge, update-docs, hitl)
- Hourly cron at `:17` with the full Step 0–5 flow inline as prompt
- canvas position (1150, 250) — peer to Documentation Specialist

## Why this ships now

The 30-min manual triage cron was cancelled per CEO direction. The
role moves to another team. Without this handoff package they'd be
rediscovering the same incident-classes I shipped fixes for
(#318 fail-open, #327 cross-tenant decrypt, #351 tokenless grace,
WorkOS refresh-token saga, missing migration runner). The philosophy
file gives them the scar tissue in ~10 min of reading; the playbook
gives them the steps; the SKILL gives them an invocable entry point.

No code changes outside org.yaml. Existing TestPlugins_UnionWithDefaults
still passes (verified in platform test run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:41:01 -07:00
Hongming Wang
04c3911c0e fix(security): forward Authorization header in transcript proxy (#405) (#380)
The platform's GET /workspaces/:id/transcript proxy was constructing the
outbound request without an Authorization header. The workspace's /transcript
endpoint (hardened in #287/#328) fails closed when the header is absent,
so every transcript call in production returned 401 from the workspace.

Fix: after WorkspaceAuth validates the incoming bearer token, the handler
now forwards it verbatim via req.Header.Set("Authorization", ...).
Forwarding is safe — the token has already been validated by the middleware.
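The forwarding step can be sketched roughly as below. This is an illustration under assumptions, not the platform's handler: `transcriptProxy` and `roundTrip` are hypothetical names, the upstream is an `httptest` stub standing in for the workspace, and the WorkspaceAuth middleware is assumed to have already run.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// transcriptProxy relays a GET to upstreamURL and copies the caller's
// Authorization header onto the outbound request when one is present.
// It never synthesizes a header of its own.
func transcriptProxy(upstreamURL string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, upstreamURL, nil)
		if err != nil {
			http.Error(w, "bad upstream request", http.StatusInternalServerError)
			return
		}
		if auth := r.Header.Get("Authorization"); auth != "" {
			req.Header.Set("Authorization", auth)
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			http.Error(w, "upstream unreachable", http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		// Relay the upstream status and body unchanged.
		w.WriteHeader(resp.StatusCode)
		_, _ = io.Copy(w, resp.Body)
	})
}

// roundTrip sends one request through the proxy to a stub workspace and
// reports the Authorization value the stub saw plus the relayed status.
func roundTrip(auth string) (string, int) {
	seenCh := make(chan string, 1)
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("Authorization")
		seenCh <- got
		if got == "" {
			// The stub fails closed, like the hardened workspace endpoint.
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer stub.Close()

	req := httptest.NewRequest(http.MethodGet, "/workspaces/ws-1/transcript", nil)
	if auth != "" {
		req.Header.Set("Authorization", auth)
	}
	rec := httptest.NewRecorder()
	transcriptProxy(stub.URL).ServeHTTP(rec, req)
	return <-seenCh, rec.Code
}

func main() {
	seen, status := roundTrip("Bearer example-token")
	fmt.Printf("%q %d\n", seen, status)
	seen, status = roundTrip("")
	fmt.Printf("%q %d\n", seen, status)
}
```

The two calls in `main` mirror the two tests listed below: one checks that the header reaches the stub, the other that a missing header stays missing and the 401 is relayed.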

Tests:
- TestTranscript_ForwardsAuthHeader: was t.Skip'd as a bug marker; now
  active. Verifies the Authorization header reaches the workspace stub.
- TestTranscript_NoAuthHeader_PassesThrough: new. Verifies that a missing
  header produces no synthetic Authorization on the upstream call, and the
  workspace 401 is faithfully relayed.

Identified by QA audit 2026-04-16.

Co-authored-by: QA Engineer <qa-engineer@molecule-ai.internal>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 23:38:07 -07:00
Hongming Wang
52bdadbd6d fix(security): forward Authorization header in transcript proxy (#405) (#380)
The platform's GET /workspaces/:id/transcript proxy was constructing the
outbound request without an Authorization header. The workspace's /transcript
endpoint (hardened in #287/#328) fails closed when the header is absent,
so every transcript call in production returned 401 from the workspace.

Fix: after WorkspaceAuth validates the incoming bearer token, the handler
now forwards it verbatim via req.Header.Set("Authorization", ...).
Forwarding is safe — the token has already been validated by the middleware.

Tests:
- TestTranscript_ForwardsAuthHeader: was t.Skip'd as a bug marker; now
  active. Verifies the Authorization header reaches the workspace stub.
- TestTranscript_NoAuthHeader_PassesThrough: new. Verifies that a missing
  header produces no synthetic Authorization on the upstream call, and the
  workspace 401 is faithfully relayed.

Identified by QA audit 2026-04-16.

Co-authored-by: QA Engineer <qa-engineer@molecule-ai.internal>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 23:38:07 -07:00
Hongming Wang
1ff544eba8 feat(adapters): add gemini-cli runtime adapter (closes #332) (#379)
Adds a `gemini-cli` workspace runtime backed by Google's Gemini CLI
(@google/gemini-cli, ~101k ★, Apache 2.0). Mirrors the claude-code
adapter pattern: Docker image installs the CLI, CLIAgentExecutor
drives the subprocess, A2A MCP tools wire via ~/.gemini/settings.json.

Changes:
- workspace-template/adapters/gemini_cli/ — new adapter (Dockerfile,
  adapter.py, __init__.py, requirements.txt); setup() seeds GEMINI.md
  from system-prompt.md and injects A2A MCP server into settings.json
- workspace-template/cli_executor.py — adds gemini-cli to
  RUNTIME_PRESETS (--yolo flag, -p prompt, --model, GEMINI_API_KEY env
  auth); adds mcp_via_settings preset flag to skip --mcp-config
  injection for runtimes that own their own settings file
- workspace-configs-templates/gemini-cli/ — default config.yaml +
  system-prompt.md template
- tests/test_adapters.py — adds gemini-cli to expected adapter set
- CLAUDE.md — documents new runtime row in the image table

Requires: GEMINI_API_KEY global secret. Build:
  bash workspace-template/build-all.sh gemini-cli

Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 23:30:00 -07:00
Hongming Wang
0aec76400a feat(adapters): add gemini-cli runtime adapter (closes #332) (#379)
Adds a `gemini-cli` workspace runtime backed by Google's Gemini CLI
(@google/gemini-cli, ~101k ★, Apache 2.0). Mirrors the claude-code
adapter pattern: Docker image installs the CLI, CLIAgentExecutor
drives the subprocess, A2A MCP tools wire via ~/.gemini/settings.json.

Changes:
- workspace-template/adapters/gemini_cli/ — new adapter (Dockerfile,
  adapter.py, __init__.py, requirements.txt); setup() seeds GEMINI.md
  from system-prompt.md and injects A2A MCP server into settings.json
- workspace-template/cli_executor.py — adds gemini-cli to
  RUNTIME_PRESETS (--yolo flag, -p prompt, --model, GEMINI_API_KEY env
  auth); adds mcp_via_settings preset flag to skip --mcp-config
  injection for runtimes that own their own settings file
- workspace-configs-templates/gemini-cli/ — default config.yaml +
  system-prompt.md template
- tests/test_adapters.py — adds gemini-cli to expected adapter set
- CLAUDE.md — documents new runtime row in the image table

Requires: GEMINI_API_KEY global secret. Build:
  bash workspace-template/build-all.sh gemini-cli

Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 23:30:00 -07:00
Hongming Wang
592fe6d7f7 feat(org-templates): add 7-role marketing team sub-tree (#373)
Add Marketing Lead + 6 reports as a peer sub-tree of PM under the CEO:
DevRel Engineer, Product Marketing Manager, Content Marketer, Community
Manager, SEO Growth Analyst, Social Media / Brand.

- Marketing Lead: tier-3 Opus CMO-equivalent with a 5-min orchestrator
  pulse (minutes 4/9/14/... offset from Dev Lead's 2/7/12/...) that
  dispatches cross-role work, reviews drafts, and routes cross-team
  asks back to PM.
- DevRel + PMM: tier-3 Opus (technical writing + positioning judgment).
  Each has an idle_prompt for proactive issue-claim plus an hourly
  evolution cron (DevRel = sample-coverage audit, PMM = competitor
  diff against docs/ecosystem-watch.md).
- Content / Community / SEO / Social: tier-2 Sonnet with idle_prompts
  for backlog-pull (matches the #205 idle-loop pattern proven on
  Technical Researcher + Market Analyst + Competitive Intelligence).
  Each has an hourly cron tuned to its surface.
- category_routing gets 6 new keys (content, positioning, community,
  growth, social, devrel) so audit_summary messages fan out correctly.
- Canvas positions lay out the marketing cluster to the right of
  PM/Dev Lead (x=1000-1300, y=50/250/400) so the graph stays readable.

Each role also gets a system-prompt.md under its files_dir with
responsibilities, team interfaces, conventions, and self-review gates
(molecule-skill-llm-judge or molecule-hitl depending on risk).

Per CEO directive 2026-04-16 ("comprehensive marketing team"). This is
PR 1 of 2 — follow-up will add cross-tree A2A conventions and wire
DevRel ↔ Backend Engineer / PMM ↔ Competitive Intelligence delegations.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:20:04 -07:00
Hongming Wang
b2e1631640 feat(org-templates): add 7-role marketing team sub-tree (#373)
Add Marketing Lead + 6 reports as a peer sub-tree of PM under the CEO:
DevRel Engineer, Product Marketing Manager, Content Marketer, Community
Manager, SEO Growth Analyst, Social Media / Brand.

- Marketing Lead: tier-3 Opus CMO-equivalent with a 5-min orchestrator
  pulse (minutes 4/9/14/... offset from Dev Lead's 2/7/12/...) that
  dispatches cross-role work, reviews drafts, and routes cross-team
  asks back to PM.
- DevRel + PMM: tier-3 Opus (technical writing + positioning judgment).
  Each has an idle_prompt for proactive issue-claim plus an hourly
  evolution cron (DevRel = sample-coverage audit, PMM = competitor
  diff against docs/ecosystem-watch.md).
- Content / Community / SEO / Social: tier-2 Sonnet with idle_prompts
  for backlog-pull (matches the #205 idle-loop pattern proven on
  Technical Researcher + Market Analyst + Competitive Intelligence).
  Each has an hourly cron tuned to its surface.
- category_routing gets 6 new keys (content, positioning, community,
  growth, social, devrel) so audit_summary messages fan out correctly.
- Canvas positions lay out the marketing cluster to the right of
  PM/Dev Lead (x=1000-1300, y=50/250/400) so the graph stays readable.

Each role also gets a system-prompt.md under its files_dir with
responsibilities, team interfaces, conventions, and self-review gates
(molecule-skill-llm-judge or molecule-hitl depending on risk).

Per CEO directive 2026-04-16 ("comprehensive marketing team"). This is
PR 1 of 2 — follow-up will add cross-tree A2A conventions and wire
DevRel ↔ Backend Engineer / PMM ↔ Competitive Intelligence delegations.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:20:04 -07:00
Hongming Wang
06c205da77 Merge pull request #370 from Molecule-AI/feat/engineers-pick-up-issues
feat(template): engineers pick up issues proactively (CEO 2026-04-16 directive)
2026-04-15 22:53:44 -07:00
Hongming Wang
e557259aad Merge pull request #370 from Molecule-AI/feat/engineers-pick-up-issues
feat(template): engineers pick up issues proactively (CEO 2026-04-16 directive)
2026-04-15 22:53:44 -07:00
rabbitblood
4f9ef2dd0e feat(template): engineers pick up issues proactively (CEO 2026-04-16 directive)
CEO directive verbatim: *"devs should pick up issues and declare that its
assigned to them, PM and leaders regularly check in. dont just rely on
outside reviewer"*.

Adds `idle_prompt` + `idle_interval_seconds: 600` to Frontend Engineer,
Backend Engineer, and DevOps Engineer. Each engineer now polls open GH
issues matching its specialty, claims unassigned ones via `gh issue edit
--add-assignee @me`, leaves a public comment declaring the pickup, and
commits memory to prevent double-pickup on the next tick.

Previously engineers were reactive-only per the #159 orchestrator/worker
split. The CEO is correcting that: devs should be a true self-organizing
unit, not a work-queue that only advances when an outside reviewer
dispatches.

## Per-role specialty filters

| Role | Labels it claims |
|---|---|
| Frontend Engineer | canvas, a11y, ux, typescript, frontend, bug, security |
| Backend Engineer | security, platform, go, database, bug |
| DevOps Engineer | docker, ci, deployment, infra, devops, bug |

Priority order within each role: security > bug > feature.

## Self-review gates

Each engineer's idle_prompt includes the self-review chain:
- Frontend: molecule-skill-code-review + molecule-skill-llm-judge
- Backend: molecule-skill-code-review + molecule-security-scan + molecule-skill-llm-judge
- DevOps: molecule-skill-code-review + molecule-freeze-scope + molecule-hitl for risky ops

These plugins were wired into engineer roles by #280, #303, #310, #322 —
the idle_prompt makes them the PRIMARY quality gate instead of a nice-to-
have before PR. Matches the "team self-regulates, don't rely on outside
reviewer" spirit.

## Hard rules (same shape as researcher idle_prompts from #216/#321)

- Max 1 claim per tick (1 `gh issue edit --add-assignee` call)
- Never take someone else's assigned issue
- Under 90 seconds wall-clock for the claim + plan step
- Don't double-pick: check `task-assigned:<role>` memory first
- No busy-work fabrication: write "<role>-idle HH:MM — no work" if nothing matches
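The label filters, priority order, and claim rules above could be combined into a single selection step along these lines. It is a sketch only: the `issue` type, `pickClaim`, the tie-break (first match in input order), and the sample issue numbers are assumptions for illustration. The real behaviour lives in each role's `idle_prompt` and in `gh` calls.

```go
package main

import "fmt"

// issue is a hypothetical, trimmed-down view of an open GitHub issue.
type issue struct {
	Number   int
	Labels   []string
	Assignee string // empty when unassigned
}

// backendLabels mirrors the Backend Engineer row of the table above.
var backendLabels = map[string]bool{
	"security": true, "platform": true, "go": true, "database": true, "bug": true,
}

// rank maps labels to the priority order security (0) > bug (1) > feature (2).
// Anything that is neither security nor bug is treated as feature-level.
func rank(labels []string) int {
	r := 2
	for _, l := range labels {
		switch l {
		case "security":
			return 0
		case "bug":
			r = 1
		}
	}
	return r
}

// matches reports whether any label belongs to the role's specialty set.
func matches(labels []string, specialty map[string]bool) bool {
	for _, l := range labels {
		if specialty[l] {
			return true
		}
	}
	return false
}

// pickClaim returns at most one issue number to claim this tick.
// hasOpenTask stands in for the `task-assigned:<role>` memory check.
func pickClaim(issues []issue, specialty map[string]bool, hasOpenTask bool) (int, bool) {
	if hasOpenTask {
		return 0, false // don't double-pick
	}
	best, bestRank := 0, 3
	for _, is := range issues {
		if is.Assignee != "" || !matches(is.Labels, specialty) {
			continue // never take someone else's issue or one outside the specialty
		}
		if r := rank(is.Labels); r < bestRank {
			best, bestRank = is.Number, r
		}
	}
	return best, bestRank < 3
}

func main() {
	open := []issue{
		{Number: 101, Labels: []string{"go"}},
		{Number: 102, Labels: []string{"bug", "database"}},
		{Number: 103, Labels: []string{"security"}, Assignee: "someone-else"},
		{Number: 104, Labels: []string{"frontend"}},
	}
	if n, ok := pickClaim(open, backendLabels, false); ok {
		fmt.Println("claim", n)
	} else {
		fmt.Println("backend-idle: no work")
	}
}
```

Returning a single number rather than a list keeps the "max 1 claim per tick" rule structural instead of relying on the caller to stop after the first claim.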

## What this does NOT change

- Leaders' orchestrator pulses still dispatch (#159) — this is the TAIL
  pickup, not the primary dispatch path. Dev Lead still prioritizes via
  its own pulse.
- PR merging still goes through reviewer per `feedback_never_merge_prs.md`.
  This directive is about the QUALITY GATE (team self-review, peer review
  via Dev Lead's pulse) not about bypassing merge approval.
- Destructive/irreversible ops still need explicit human ack via
  molecule-hitl's @requires_approval decorator.

## Rollout plan

- Ship template change (this PR)
- After merge: rebuild workspace-template:claude-code, re-provision
  BE + FE + DevOps via apply_template=true, re-inject idle_prompt
  (platform doesn't auto-propagate org.yaml to live configs — tracked
  separately)
- Measure: 24h of activity_logs. Should see `a2a_receive` events every
  10 min per engineer, response bodies mentioning claim decisions or
  idle-clean states, and `gh issue edit` events showing up as assignees.

## Related
- `feedback_devs_pick_up_issues_leaders_check_in.md` — memory saved last cycle
- #159 orchestrator/worker split (leaders dispatch)
- #216 / #321 researcher idle_prompts (same pattern applied to researchers)
- `project_north_star_24_7.md` — team self-regulation is the north-star
2026-04-15 22:49:10 -07:00
rabbitblood
90d68ca039 feat(template): engineers pick up issues proactively (CEO 2026-04-16 directive)
CEO directive verbatim: *"devs should pick up issues and declare that its
assigned to them, PM and leaders regularly check in. dont just rely on
outside reviewer"*.

Adds `idle_prompt` + `idle_interval_seconds: 600` to Frontend Engineer,
Backend Engineer, and DevOps Engineer. Each engineer now polls open GH
issues matching its specialty, claims unassigned ones via `gh issue edit
--add-assignee @me`, leaves a public comment declaring the pickup, and
commits memory to prevent double-pickup on the next tick.

Previously engineers were reactive-only per the #159 orchestrator/worker
split. The CEO is correcting that: devs should be a true self-organizing
unit, not a work-queue that only advances when an outside reviewer
dispatches.

## Per-role specialty filters

| Role | Labels it claims |
|---|---|
| Frontend Engineer | canvas, a11y, ux, typescript, frontend, bug, security |
| Backend Engineer | security, platform, go, database, bug |
| DevOps Engineer | docker, ci, deployment, infra, devops, bug |

Priority order within each role: security > bug > feature.

## Self-review gates

Each engineer's idle_prompt includes the self-review chain:
- Frontend: molecule-skill-code-review + molecule-skill-llm-judge
- Backend: molecule-skill-code-review + molecule-security-scan + molecule-skill-llm-judge
- DevOps: molecule-skill-code-review + molecule-freeze-scope + molecule-hitl for risky ops

These plugins were wired into engineer roles by #280, #303, #310, #322 —
the idle_prompt makes them the PRIMARY quality gate instead of a nice-to-
have before PR. Matches the "team self-regulates, don't rely on outside
reviewer" spirit.

## Hard rules (same shape as researcher idle_prompts from #216/#321)

- Max 1 claim per tick (1 `gh issue edit --add-assignee` call)
- Never take someone else's assigned issue
- Under 90 seconds wall-clock for the claim + plan step
- Don't double-pick: check `task-assigned:<role>` memory first
- No busy-work fabrication: write "<role>-idle HH:MM — no work" if nothing matches

## What this does NOT change

- Leaders' orchestrator pulses still dispatch (#159) — this is the TAIL
  pickup, not the primary dispatch path. Dev Lead still prioritizes via
  its own pulse.
- PR merging still goes through reviewer per `feedback_never_merge_prs.md`.
  This directive is about the QUALITY GATE (team self-review, peer review
  via Dev Lead's pulse) not about bypassing merge approval.
- Destructive/irreversible ops still need explicit human ack via
  molecule-hitl's @requires_approval decorator.

## Rollout plan

- Ship template change (this PR)
- After merge: rebuild workspace-template:claude-code, re-provision
  BE + FE + DevOps via apply_template=true, re-inject idle_prompt
  (platform doesn't auto-propagate org.yaml to live configs — tracked
  separately)
- Measure: 24h of activity_logs. Should see `a2a_receive` events every
  10 min per engineer, response bodies mentioning claim decisions or
  idle-clean states, and `gh issue edit` events showing up as assignees.

## Related
- `feedback_devs_pick_up_issues_leaders_check_in.md` — memory saved last cycle
- #159 orchestrator/worker split (leaders dispatch)
- #216 / #321 researcher idle_prompts (same pattern applied to researchers)
- `project_north_star_24_7.md` — team self-regulation is the north-star
2026-04-15 22:49:10 -07:00
Hongming Wang
829e4bf89b Merge pull request #369 from Molecule-AI/chore/eco-watch-2026-04-18
All CI green. Docs-only: adds AMD GAIA + ClawRun ecosystem survey entries.
2026-04-15 22:46:53 -07:00
Hongming Wang
4b467c37a8 Merge pull request #369 from Molecule-AI/chore/eco-watch-2026-04-18
All CI green. Docs-only: adds AMD GAIA + ClawRun ecosystem survey entries.
2026-04-15 22:46:53 -07:00
Research Lead
dff50f5927 chore(eco-watch): 2026-04-18 survey — AMD GAIA + ClawRun
Add two new entries to docs/ecosystem-watch.md:

- **AMD GAIA** (amd/gaia, ~1.2k ★, MIT, v0.17.2 April 10 2026):
  AMD-backed local-first agent framework with MCP client support,
  RAG, vision, and voice. Hardware-locked to Ryzen AI but signals
  local/privacy-first positioning. @tool decorator pattern worth
  borrowing for workspace adapters.

- **ClawRun** (clawrun-sh/clawrun, ~84 ★, Apache 2.0, 45 releases):
  Closest architectural match we've tracked — hosting/lifecycle layer
  with sandbox, heartbeat, snapshot/resume, channels, and cost
  tracking. Per-channel budget enforcement is a concrete gap in our
  workspace_channels. Filed #368.

HEAD at survey time: 8db86df

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:40:44 +00:00
Research Lead
3ed4038149 chore(eco-watch): 2026-04-18 survey — AMD GAIA + ClawRun
Add two new entries to docs/ecosystem-watch.md:

- **AMD GAIA** (amd/gaia, ~1.2k ★, MIT, v0.17.2 April 10 2026):
  AMD-backed local-first agent framework with MCP client support,
  RAG, vision, and voice. Hardware-locked to Ryzen AI but signals
  local/privacy-first positioning. @tool decorator pattern worth
  borrowing for workspace adapters.

- **ClawRun** (clawrun-sh/clawrun, ~84 ★, Apache 2.0, 45 releases):
  Closest architectural match we've tracked — hosting/lifecycle layer
  with sandbox, heartbeat, snapshot/resume, channels, and cost
  tracking. Per-channel budget enforcement is a concrete gap in our
  workspace_channels. Filed #368.

HEAD at survey time: a4a89a3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:40:44 +00:00
Hongming Wang
8db86df330 Merge pull request #363 from Molecule-AI/chore/eco-watch-2026-04-17
All CI green. Docs-only: adds GenericAgent + OpenSRE ecosystem survey entries.
2026-04-15 22:14:23 -07:00
Hongming Wang
a4a89a30c1 Merge pull request #363 from Molecule-AI/chore/eco-watch-2026-04-17
All CI green. Docs-only: adds GenericAgent + OpenSRE ecosystem survey entries.
2026-04-15 22:14:23 -07:00
Research Lead
04ceb95142 chore(eco-watch): 2026-04-17 survey — GenericAgent + OpenSRE
Add two new entries to docs/ecosystem-watch.md:

- **GenericAgent** (lsdefine/GenericAgent, ~2.1k ★, MIT, v1.0 January
  2026): self-evolving skill tree with a four-tier memory hierarchy
  (rules/indices/facts/skills/archives). Skill crystallisation at
  runtime is the automation of our install-time plugins model. Filed
  #361 to add named memory tiers to agent_memories.

- **OpenSRE** (Tracer-Cloud/opensre, ~900 ★, Apache 2.0): AI SRE
  agent toolkit with 40+ production DevOps integrations and MCP
  support. Filed #362 to evaluate its adapters as a Molecule AI
  DevOps workspace skill pack.

HEAD at survey time: 2e1fc8d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:11:01 +00:00
Research Lead
fe6e3032a4 chore(eco-watch): 2026-04-17 survey — GenericAgent + OpenSRE
Add two new entries to docs/ecosystem-watch.md:

- **GenericAgent** (lsdefine/GenericAgent, ~2.1k ★, MIT, v1.0 January
  2026): self-evolving skill tree with a four-tier memory hierarchy
  (rules/indices/facts/skills/archives). Skill crystallisation at
  runtime is the automation of our install-time plugins model. Filed
  #361 to add named memory tiers to agent_memories.

- **OpenSRE** (Tracer-Cloud/opensre, ~900 ★, Apache 2.0): AI SRE
  agent toolkit with 40+ production DevOps integrations and MCP
  support. Filed #362 to evaluate its adapters as a Molecule AI
  DevOps workspace skill pack.

HEAD at survey time: 93fd546

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:11:01 +00:00
Hongming Wang
2e1fc8d832 Merge pull request #360 from Molecule-AI/chore/issue-358-wsauth-dead-constants
All CI green. Removes dead constants and stale comment left over from PR #357 grace-period test deletion (closes #358).
2026-04-15 22:05:37 -07:00
Hongming Wang
93fd5467e2 Merge pull request #360 from Molecule-AI/chore/issue-358-wsauth-dead-constants
All CI green. Removes dead constants and stale comment left over from PR #357 grace-period test deletion (closes #358).
2026-04-15 22:05:37 -07:00
PM Bot
409a249ca6 chore(test): remove dead constants from wsauth_middleware_test.go (#358)
PR #357 deleted the grace-period tests that used hasLiveTokenQuery and
workspaceExistsQuery, but the constants themselves (and the stale comment
describing the old HasAnyLiveToken-based dispatch) were not removed.

Remove both dead const declarations and update the header comment to
reflect the strict-enforcement contract introduced by #357.

Closes #358.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:02:11 +00:00
PM Bot
e257cd80d4 chore(test): remove dead constants from wsauth_middleware_test.go (#358)
PR #357 deleted the grace-period tests that used hasLiveTokenQuery and
workspaceExistsQuery, but the constants themselves (and the stale comment
describing the old HasAnyLiveToken-based dispatch) were not removed.

Remove both dead const declarations and update the header comment to
reflect the strict-enforcement contract introduced by #357.

Closes #358.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:02:11 +00:00
Hongming Wang
d09e72c5fd Merge pull request #357 from Molecule-AI/fix/issue-351-remove-tokenless-grace-period
All CI green. Merges strict WorkspaceAuth — removes tokenless grace period that enabled zombie workspace enumeration (#351).
2026-04-15 21:57:17 -07:00
Hongming Wang
4e514aa59a Merge pull request #357 from Molecule-AI/fix/issue-351-remove-tokenless-grace-period
All CI green. Merges strict WorkspaceAuth — removes tokenless grace period that enabled zombie workspace enumeration (#351).
2026-04-15 21:57:17 -07:00
Hongming Wang
b2b0045913 fix(security): remove WorkspaceAuth tokenless grace period (#351)
Severity HIGH. #318 closed the fake-UUID fail-open for WorkspaceAuth
but left the grace period intact for *real* workspaces with no live
tokens. Zombie test-artifact workspaces from prior DAST runs still
exist in the DB with empty configs and no tokens, so they pass
WorkspaceExists=true but HasAnyLiveToken=false — and fell through the
grace period, leaking every global-secret key name to any
unauthenticated caller on the Docker network.

Phase 30.1 shipped months ago; every production workspace has gone
through multiple boot cycles and acquired a token since. The
"legacy workspaces grandfathered" window no longer serves legitimate
traffic. Removing it entirely is the cleanest fix — and does NOT
affect registration (which is on /registry/register, outside this
middleware's scope).

New contract (strict):

  every /workspaces/:id/* request MUST carry
  Authorization: Bearer <token-for-this-workspace>

Any missing/mismatched/revoked/wrong-workspace bearer → 401. No
existence check, no fallback. The wsauth.WorkspaceExists helper is
kept in the package for any future caller but no longer used here.
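A rough sketch of the strict contract, assuming a `net/http`-style middleware and a `/workspaces/<id>/...` path shape. `strictWorkspaceAuth`, `tokenCheck`, and `probe` are hypothetical names, and the in-memory token map stands in for the DB-backed lookup. A production check would also want a constant-time token comparison.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// tokenCheck reports whether token is a live token for workspaceID.
// Hypothetical signature; the real lookup is DB-backed.
type tokenCheck func(workspaceID, token string) bool

// strictWorkspaceAuth rejects every request without a valid bearer for the
// workspace named in the path. There is no existence check and no fallback.
func strictWorkspaceAuth(check tokenCheck, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		const prefix = "Bearer "
		auth := r.Header.Get("Authorization")
		if !strings.HasPrefix(auth, prefix) || len(auth) == len(prefix) {
			// Missing bearer: 401 before any lookup is attempted.
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		rest := strings.TrimPrefix(r.URL.Path, "/workspaces/")
		id := strings.SplitN(rest, "/", 2)[0]
		if !check(id, auth[len(prefix):]) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// probe issues one request and returns the status plus how many token
// lookups the middleware performed.
func probe(workspaceID, authHeader string) (status, lookups int) {
	tokens := map[string]string{"ws-1": "tok-1"} // workspace ID -> live token
	check := func(id, token string) bool {
		lookups++
		want, ok := tokens[id]
		return ok && token == want
	}
	ok200 := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/workspaces/"+workspaceID+"/secrets", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	strictWorkspaceAuth(check, ok200).ServeHTTP(rec, req)
	return rec.Code, lookups
}

func main() {
	fmt.Println(probe("zombie", ""))           // tokenless workspace, no bearer
	fmt.Println(probe("ws-2", "Bearer tok-1")) // token for a different workspace
	fmt.Println(probe("ws-1", "Bearer tok-1")) // matching token
}
```

The lookup counter in `probe` plays the role of the "zero DB calls on missing bearer" assertion described in the tests below.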

Tests:
- TestWorkspaceAuth_351_NoBearer_Returns401_NoDBCalls — new, covers
  fake UUID / zombie / pre-token in one sub-table. Asserts zero DB
  calls on missing bearer.
- Existing C4/C8 + #170 tests updated to drop the stale
  HasAnyLiveToken sqlmock expectations.
- Renamed TestWorkspaceAuth_Issue170_SecretDelete_FailOpen_NoTokens
  to _NoTokensStillRejected and flipped the assertion from 200 to 401.
- Dropped TestWorkspaceAuth_318_ExistsQueryError_Returns500 — the
  code path it covered no longer exists.

Full platform test sweep green.

Closes #351

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:52:44 -07:00
Hongming Wang
fa239217a0 fix(security): remove WorkspaceAuth tokenless grace period (#351)
Severity HIGH. #318 closed the fake-UUID fail-open for WorkspaceAuth
but left the grace period intact for *real* workspaces with no live
tokens. Zombie test-artifact workspaces from prior DAST runs still
exist in the DB with empty configs and no tokens, so they pass
WorkspaceExists=true but HasAnyLiveToken=false — and fell through the
grace period, leaking every global-secret key name to any
unauthenticated caller on the Docker network.

Phase 30.1 shipped months ago; every production workspace has gone
through multiple boot cycles and acquired a token since. The
"legacy workspaces grandfathered" window no longer serves legitimate
traffic. Removing it entirely is the cleanest fix — and does NOT
affect registration (which is on /registry/register, outside this
middleware's scope).

New contract (strict):

  every /workspaces/:id/* request MUST carry
  Authorization: Bearer <token-for-this-workspace>

Any missing/mismatched/revoked/wrong-workspace bearer → 401. No
existence check, no fallback. The wsauth.WorkspaceExists helper is
kept in the package for any future caller but no longer used here.

Tests:
- TestWorkspaceAuth_351_NoBearer_Returns401_NoDBCalls — new, covers
  fake UUID / zombie / pre-token in one sub-table. Asserts zero DB
  calls on missing bearer.
- Existing C4/C8 + #170 tests updated to drop the stale
  HasAnyLiveToken sqlmock expectations.
- Renamed TestWorkspaceAuth_Issue170_SecretDelete_FailOpen_NoTokens
  to _NoTokensStillRejected and flipped the assertion from 200 to 401.
- Dropped TestWorkspaceAuth_318_ExistsQueryError_Returns500 — the
  code path it covered no longer exists.

Full platform test sweep green.

Closes #351

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:52:44 -07:00
Hongming Wang
742d061787 Merge pull request #350 from Molecule-AI/chore/eco-watch-2026-04-16b
chore(eco-watch): 2026-04-16b survey — AgentScope + Plannotator
2026-04-15 21:47:50 -07:00
Hongming Wang
75146f4314
Merge pull request #350 from Molecule-AI/chore/eco-watch-2026-04-16b
chore(eco-watch): 2026-04-16b survey — AgentScope + Plannotator
2026-04-15 21:47:50 -07:00
Research Lead
93720565b0 chore(eco-watch): 2026-04-16b survey — AgentScope + Plannotator
Add two new entries to docs/ecosystem-watch.md:

- **AgentScope** (modelscope/agentscope, ~23.8k stars, Apache 2.0,
  v1.0.18 March 26 2026): Alibaba/ModelScope multi-agent framework
  with MCP support, MsgHub typed routing, and OpenTelemetry
  observability. No canvas or workspace lifecycle — framework-layer
  complement, not a platform competitor.

- **Plannotator** (backnotprop/plannotator, ~4.3k stars, Apache 2.0+MIT,
  v0.17.10 April 13 2026): Browser-based agent plan annotation tool
  with structured feedback types (delete/insert/replace/comment).
  Directly informs our hitl.py feedback schema. Filed #349 to add
  structured feedback types to resume_task.

HEAD at survey time: 0897f9e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 04:40:51 +00:00
Research Lead
6be5d09764 chore(eco-watch): 2026-04-16b survey — AgentScope + Plannotator
Add two new entries to docs/ecosystem-watch.md:

- **AgentScope** (modelscope/agentscope, ~23.8k stars, Apache 2.0,
  v1.0.18 March 26 2026): Alibaba/ModelScope multi-agent framework
  with MCP support, MsgHub typed routing, and OpenTelemetry
  observability. No canvas or workspace lifecycle — framework-layer
  complement, not a platform competitor.

- **Plannotator** (backnotprop/plannotator, ~4.3k stars, Apache 2.0+MIT,
  v0.17.10 April 13 2026): Browser-based agent plan annotation tool
  with structured feedback types (delete/insert/replace/comment).
  Directly informs our hitl.py feedback schema. Filed #349 to add
  structured feedback types to resume_task.

HEAD at survey time: 4196876

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 04:40:51 +00:00
Hongming Wang
0897f9e59c Merge pull request #346 from Molecule-AI/chore/issue-342-auditor-prompt-drift
chore(auditor): close #319 + #337 prompt drift on Security Auditor (#342)
2026-04-15 21:31:06 -07:00
Hongming Wang
4196876c2b
Merge pull request #346 from Molecule-AI/chore/issue-342-auditor-prompt-drift
chore(auditor): close #319 + #337 prompt drift on Security Auditor (#342)
2026-04-15 21:31:06 -07:00
Hongming Wang
d8183e16cc Merge pull request #343 from Molecule-AI/fix/issue-337-webhook-secret-constant-time
fix(security): constant-time webhook_secret comparison (#337)
2026-04-15 21:31:02 -07:00
Hongming Wang
c5d40b861b
Merge pull request #343 from Molecule-AI/fix/issue-337-webhook-secret-constant-time
fix(security): constant-time webhook_secret comparison (#337)
2026-04-15 21:31:02 -07:00
Hongming Wang
c6a721fd56 Merge pull request #341 from Molecule-AI/fix/publish-platform-image-keychain-again
fix(ci): disable osxkeychain credsStore on self-hosted runner (#199 follow-up)
2026-04-15 21:30:59 -07:00
Hongming Wang
af3d9904e1
Merge pull request #341 from Molecule-AI/fix/publish-platform-image-keychain-again
fix(ci): disable osxkeychain credsStore on self-hosted runner (#199 follow-up)
2026-04-15 21:30:59 -07:00
Hongming Wang
c7477047c2 Merge pull request #338 from Molecule-AI/fix/issue-328-transcript-fail-closed
fix(security): /transcript fails closed when auth token missing (#328)
2026-04-15 21:30:56 -07:00
Hongming Wang
e7bde9a919
Merge pull request #338 from Molecule-AI/fix/issue-328-transcript-fail-closed
fix(security): /transcript fails closed when auth token missing (#328)
2026-04-15 21:30:56 -07:00
Hongming Wang
2da48dda13 chore(auditor): close #319 + #337 prompt drift on Security Auditor (#342)
Two recent platform-level security changes (#319 channel_config
encryption, #337 constant-time webhook_secret compare) were not
reflected in the Security Auditor's system prompt or the schedule cron
prompt. That meant the auditor wouldn't proactively look for the
*next* instance of either class — a new credential field added to
channel_config without being added to sensitiveFields, or a new
secret comparison using raw `!=`, would slip through until a human
happened to notice.

Updated two files:

1. org-templates/molecule-dev/security-auditor/system-prompt.md
   Added two bullets to "What You Check":
   - Secret comparisons must use subtle.ConstantTimeCompare /
     crypto.timingSafeEqual (cites #337 as the repo's recent instance)
   - Secret storage at rest: any new channel_config credential field
     must be added to sensitiveFields and exercised in both the
     Encrypt (write) and Decrypt (read) boundary helpers, and the
     ec1: prefix must never leak into API responses (cites #319)

2. org-templates/molecule-dev/org.yaml
   Same two checks added to the Security Auditor's 12-hour cron
   prompt's "MANUAL REVIEW of every changed file" section. Wording
   is concrete enough to paste into a grep: "flag any `!=` / `==` /
   bytes.Equal against a user-supplied value that gates auth".

Pure config / prompt — no code changes, no tests to write. YAML parse
verified, TestPlugins_UnionWithDefaults still passes.

Closes #342

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:24:34 -07:00
Hongming Wang
6b153ca3cb chore(auditor): close #319 + #337 prompt drift on Security Auditor (#342)
Two recent platform-level security changes (#319 channel_config
encryption, #337 constant-time webhook_secret compare) were not
reflected in the Security Auditor's system prompt or the schedule cron
prompt. That meant the auditor wouldn't proactively look for the
*next* instance of either class — a new credential field added to
channel_config without being added to sensitiveFields, or a new
secret comparison using raw `!=`, would slip through until a human
happened to notice.

Updated two files:

1. org-templates/molecule-dev/security-auditor/system-prompt.md
   Added two bullets to "What You Check":
   - Secret comparisons must use subtle.ConstantTimeCompare /
     crypto.timingSafeEqual (cites #337 as the repo's recent instance)
   - Secret storage at rest: any new channel_config credential field
     must be added to sensitiveFields and exercised in both the
     Encrypt (write) and Decrypt (read) boundary helpers, and the
     ec1: prefix must never leak into API responses (cites #319)

2. org-templates/molecule-dev/org.yaml
   Same two checks added to the Security Auditor's 12-hour cron
   prompt's "MANUAL REVIEW of every changed file" section. Wording
   is concrete enough to paste into a grep: "flag any `!=` / `==` /
   bytes.Equal against a user-supplied value that gates auth".

Pure config / prompt — no code changes, no tests to write. YAML parse
verified, TestPlugins_UnionWithDefaults still passes.

Closes #342

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:24:34 -07:00
Hongming Wang
7af8f33bcc fix(security): constant-time webhook_secret comparison (#337)
Severity LOW. The /webhooks/:type handler compared the Telegram
X-Telegram-Bot-Api-Secret-Token header against the decrypted
webhook_secret using Go's `!=` operator, which short-circuits on the
first mismatched byte. Under low-latency Docker-network conditions an
attacker could time response latency byte-by-byte and converge on the
real secret, then inject Telegram-formatted messages into any channel.

Fix: switch to crypto/subtle.ConstantTimeCompare, whose running time
depends only on the length of its inputs, not on where they differ
(it returns immediately on a length mismatch, so at most the secret's
length is observable). Same posture as the cdp-proxy token compare in
host-bridge (which already used timingSafeEqual).
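
A minimal sketch of the comparison pattern, not the handler's actual
code. The function name secretsEqual is hypothetical, and the
empty-secret guard is an assumption of this sketch rather than
something this commit states:

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// secretsEqual reports whether the presented header value matches the
// expected secret without leaking the position of the first mismatch.
func secretsEqual(presented, expected string) bool {
	// Assumption of this sketch: an unset secret never matches.
	// Without this guard, two empty inputs would compare as equal.
	if expected == "" {
		return false
	}
	// ConstantTimeCompare returns 1 only when both slices have the same
	// length and contents. On a length mismatch it returns 0 at once,
	// so the secret's length (but not its bytes) can still be inferred.
	return subtle.ConstantTimeCompare([]byte(presented), []byte(expected)) == 1
}

func main() {
	fmt.Println(secretsEqual("s3cret", "s3cret"))
	fmt.Println(secretsEqual("s3creT", "s3cret"))
	fmt.Println(secretsEqual("", ""))
}
```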

Risk profile over the public internet is low (Telegram webhooks have
natural jitter that masks the signal), but the defensive pattern
matters for consistency across all secret comparisons.

Closes #337

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:23:12 -07:00
Hongming Wang
50819500f0 fix(security): constant-time webhook_secret comparison (#337)
Severity LOW. The /webhooks/:type handler compared the Telegram
X-Telegram-Bot-Api-Secret-Token header against the decrypted
webhook_secret using Go's `!=` operator, which short-circuits on the
first mismatched byte. Under low-latency Docker-network conditions an
attacker could time response latency byte-by-byte and converge on the
real secret, then inject Telegram-formatted messages into any channel.

Fix: switch to crypto/subtle.ConstantTimeCompare, whose running time
depends only on the length of its inputs, not on where they differ
(it returns immediately on a length mismatch, so at most the secret's
length is observable). Same posture as the cdp-proxy token compare in
host-bridge (which already used timingSafeEqual).

Risk profile over the public internet is low (Telegram webhooks have
natural jitter that masks the signal), but the defensive pattern
matters for consistency across all secret comparisons.

Closes #337

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:23:12 -07:00
Hongming Wang
94a9f92c50 fix(security): scope PausePollersForToken to requesting workspace (closes #329)
CI 5/6 pass (E2E cancel = run-supersession pattern). Dev Lead review 04:21: Approved. Fixes cross-tenant token exposure: PausePollersForToken now scoped to requesting workspace_id via SQL WHERE clause. Closes #329.
2026-04-15 21:22:50 -07:00
Hongming Wang
a205c92428
fix(security): scope PausePollersForToken to requesting workspace (closes #329)
CI 5/6 pass (E2E cancel = run-supersession pattern). Dev Lead review 04:21: Approved. Fixes cross-tenant token exposure: PausePollersForToken now scoped to requesting workspace_id via SQL WHERE clause. Closes #329.
2026-04-15 21:22:50 -07:00
Hongming Wang
9ea6fc23e0 chore(eco-watch): 2026-04-16 daily survey — Gemini CLI + open-multi-agent
CI fully green. Dev Lead review: Approved. Docs-only: adds Gemini CLI and open-multi-agent entries to ecosystem-watch.md; files issues #332 (gemini-cli adapter) and #333 (PM goal-decomp skill).
2026-04-15 21:22:37 -07:00
Hongming Wang
12dc0ebdf2
chore(eco-watch): 2026-04-16 daily survey — Gemini CLI + open-multi-agent
CI fully green. Dev Lead review: Approved. Docs-only: adds Gemini CLI and open-multi-agent entries to ecosystem-watch.md; files issues #332 (gemini-cli adapter) and #333 (PM goal-decomp skill).
2026-04-15 21:22:37 -07:00