Commit Graph

382 Commits

Author SHA1 Message Date
Hongming Wang
12dc0ebdf2
chore(eco-watch): 2026-04-16 daily survey — Gemini CLI + open-multi-agent
CI fully green. Dev Lead review:  Approved. Docs-only: adds Gemini CLI and open-multi-agent entries to ecosystem-watch.md; files issues #332 (gemini-cli adapter) and #333 (PM goal-decomp skill).
2026-04-15 21:22:37 -07:00
Hongming Wang
c11d8f3ec3
fix(security): hitl task-id ownership + wire fail_open_if_no_scanner in loader (closes #265, #268)
Security audit cycle 13: hitl.py LGTM (workspace-scoped task IDs). Loader.py fix applied (commit 0557f73): fail_open_if_no_scanner now read from config and forwarded to scan_skill_dependencies(); regression test added. CI 5/6 pass (E2E cancel = run-supersession pattern). Closes #265. Closes #268.
2026-04-15 21:18:52 -07:00
Hongming Wang
d3a7e4c8f9
chore(org): wire molecule-compliance + molecule-audit + molecule-freeze-scope (closes #322)
Config-only YAML. CI green on all 6 checks (E2E cancel = run-supersession pattern). Adds missing plugin wiring: Security Auditor→compliance+audit, Backend→compliance, QA→compliance, DevOps→freeze-scope. Closes #322.
2026-04-15 21:13:26 -07:00
Hongming Wang
75dee70027
docs(glossary): add terminology disambiguation table (closes #320)
CI fully green (all 6 checks pass). Docs-only: adds docs/glossary.md, links from README.md and CLAUDE.md. Closes #320.
2026-04-15 21:13:04 -07:00
Hongming Wang
d85ee97472
fix(security): encrypt channel_config bot_token at rest (closes #319)
CI fully green. Dev Lead code review:  clean, all read/write paths verified, tests cover round-trip + idempotency + legacy plaintext. Closes #319.
2026-04-15 21:09:34 -07:00
Hongming Wang
5c3aac11e3
fix(security): close WorkspaceAuth fail-open on non-existent workspace IDs (#318)
CI fully green. Security Audit cycle 15 LGTM. Closes #318. Closes #325.
2026-04-15 21:02:29 -07:00
Hongming Wang
4d7b1f56de
chore(template): widen idle-loop to Market Analyst + Competitive Intelligence (wave 2)
Expands autonomous orchestration reach to Market Analyst and Competitive Intelligence roles.
2026-04-15 20:29:41 -07:00
Hongming Wang
3252af6ea6
fix(template): Telegram channel for Security Auditor + DevOps Engineer (#246 #247)
Closes #246
Closes #247

Critical security findings and CI build-break alerts are now pushed via Telegram instead of waiting for someone to manually check memory/logs.
2026-04-15 19:57:34 -07:00
Hongming Wang
17b9263167
Merge pull request #314 from Molecule-AI/fix/issue-310-llm-judge-be-fe
feat(template): add molecule-skill-llm-judge to Backend + Frontend Engineer (#310)
2026-04-15 19:51:00 -07:00
Hongming Wang
ac8daf2f70 feat(template): add molecule-skill-llm-judge to Backend + Frontend Engineer (#310)
Backend Engineer and Frontend Engineer were missing molecule-skill-llm-judge
while Dev Lead, QA Engineer, and Security Auditor already have it.

llm-judge lets engineers self-gate their PR against the issue body before
requesting review, catching 'shipped the wrong thing' before Dev Lead sees it.
No new plugins needed — already installed org-wide.

Closes #310
2026-04-16 02:48:08 +00:00
Hongming Wang
fec287fce3
fix(security): add bearer token auth to /transcript endpoint (#287)
Closes #287

Any container on molecule-monorepo-net could previously read the full Claude session log without authentication. Guard uses get_token() from platform_auth — skipped only before workspace registration (dev-mode).
2026-04-15 19:47:23 -07:00
airenostars
af95a6eb78
feat(reno-stars): citation-builder — one backlink directory per day (#299)
Closes #301

Co-authored-by: airenostars <noreply@github.com>
2026-04-15 19:47:20 -07:00
Hongming Wang
8fc4940798
Merge pull request #308 from Molecule-AI/fix/uiux-cron-cadence-hourly
fix(template): UIUX Designer cron from 15min to hourly (#306)
2026-04-15 19:22:29 -07:00
Hongming Wang
ece45bbf45 fix(template): UIUX Designer cron from 15min to hourly (#306)
Closes #306. The cron expression was "5,20,35,50 * * * *" (every 15
min = 96 ticks/day) despite the schedule being named "Hourly UI/UX
audit". Each tick launches Chromium, takes 8 screenshots, runs them
through Claude vision, and delegates to PM — 768 vision calls/day
from one workspace with no meaningful delta between ticks (canvas UI
only changes on deploys).

Changed to "5 * * * *" (hourly, at :05 past the hour). 6x reduction
in cost + noise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:22:19 -07:00
Hongming Wang
5c4146e09c
Merge pull request #307 from Molecule-AI/fix/backend-engineer-security-scan
feat(template): add molecule-security-scan to Backend Engineer (#303)
2026-04-15 19:21:19 -07:00
Hongming Wang
d9065bcc4d feat(template): add molecule-security-scan to Backend Engineer (#303)
Closes #303. Surfaces CVE/secret scanning at dev time instead of
waiting for the Security Auditor's 12h cron. Backend Engineer's
plugin list: [molecule-hitl, molecule-skill-code-review,
molecule-security-scan].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:21:11 -07:00
Hongming Wang
e88ae9f6d0
fix(a2a-tools): auth_headers on recall_memory + commit_memory (#304)
Adds auth_headers to recall_memory and commit_memory in a2a_tools.py. Fixes the #215-class auth regression for A2A memory tools. Test mocks updated to accept headers kwarg.
2026-04-15 19:12:18 -07:00
Hongming Wang
f28bba0321
Merge pull request #297 from Molecule-AI/fix/cdp-plist-chmod-600
fix(security): chmod 600 macOS launchd plist (#296)
2026-04-15 18:20:55 -07:00
Hongming Wang
009769e263 fix(security): chmod 600 macOS launchd plist containing CDP token (#296)
One-liner oversight from #295: the macOS install path wrote the plist
with the default umask (~0644), leaving CDP_PROXY_TOKEN world-readable
to any local user account. The Linux path already writes to a chmod
600 env-file — this brings macOS to parity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:20:48 -07:00
Hongming Wang
5ba54ba574
Merge pull request #295 from Molecule-AI/fix/cdp-proxy-bind-localhost
fix(security): token-auth on cdp-proxy to prevent LAN exposure (#293)
2026-04-15 18:00:30 -07:00
Hongming Wang
c0be9baab1 fix(security): token-auth on cdp-proxy to prevent LAN exposure (#293)
HIGH finding from security-auditor on PR #291 (merged tick-37). The
cdp-proxy bound to 0.0.0.0:9223 with no authentication, exposing
Chrome DevTools Protocol — full remote control of any tab, including
cookie/localStorage exfiltration — to anyone on the same WiFi/LAN.

Root cause: Docker Desktop on macOS routes host.docker.internal
through the VM network interface, not loopback. Binding to 127.0.0.1
would break the primary use case (containers reaching the host
Chrome). The design trade was "bind wide for reachability, accept LAN
exposure" — #293 makes that trade unacceptable.

Fix: bearer token auth on every HTTP + WebSocket request. The proxy
REFUSES TO START without a token — no unauth mode.

Three-file change:

1. cdp-proxy.cjs
   - Read token from CDP_PROXY_TOKEN env OR ~/.molecule-cdp-proxy-token
   - Fail loudly if neither is set (exit 1 with install-host-bridge.sh
     pointer)
   - Validate X-CDP-Proxy-Token header via crypto.timingSafeEqual on
     every HTTP request AND every WS upgrade
   - Strip the header before forwarding to Chrome (defense in depth —
     token never leaks into Chrome's request log)

2. install-host-bridge.sh
   - New ensure_token() function generates a 64-char hex token via
     openssl rand -hex 32 (fallback to /dev/urandom). Written to
     ~/.molecule-cdp-proxy-token with chmod 600.
   - macOS: token injected into launchd plist EnvironmentVariables
   - Linux: written to ~/.molecule-cdp-proxy.env (chmod 600) and
     referenced via systemd EnvironmentFile — avoids embedding the
     token in the often world-readable unit file
   - Install reuses existing token if present (16+ chars); uninstall
     preserves token file so a reinstall keeps the same token
   - Verify command now includes the token header
   - Documents container-side bind-mount pattern
     (-v ~/.molecule-cdp-proxy-token:/run/secrets/cdp-proxy-token:ro)

3. lib/connect.js
   - New loadProxyToken() with precedence: env var >
     /run/secrets/cdp-proxy-token > ~/.molecule-cdp-proxy-token
   - Attaches X-CDP-Proxy-Token header on both /json/version probe +
     final puppeteer.connect() call via headers: {} option
     (puppeteer-core v21+ supports this natively)
   - Host-direct fallback (CDP port 9222 on loopback) unchanged —
     Chrome's own port is loopback-only so it doesn't need the token

Attack surface now:
  - LAN attacker must also steal the token file from the user's home
    directory (requires shell access) OR the env var (requires
    launchd/systemd process inspection as the same user) — reduces to
    local-privilege-escalation territory
  - Containers on the same Docker network still have access (they
    mount the token by design) — intentional, any workspace-template
    install already runs inside the platform's trust boundary

Not fixing in this PR:
  - Rate limiting on /json/version (low priority — probe-and-mine is
    expensive even without)
  - IP allowlist on top of token auth (diminishing returns)
  - Rotating the token periodically (user can rm ~/.molecule-cdp-proxy-token
    and reinstall)

Closes #293.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:00:02 -07:00
Hongming Wang
004f418d36
Merge pull request #271 from Molecule-AI/fix/seo-builder-delegate-code-blockers
fix(reno-stars): SEO Builder delegates code blockers to Dev Leader, not human
2026-04-15 17:56:09 -07:00
Hongming Wang
472495c380
Merge pull request #270 from Molecule-AI/feat/workspace-transcript-endpoint
feat: GET /workspaces/:id/transcript — live agent session log
2026-04-15 17:55:41 -07:00
Hongming Wang
bd51ea6190
Merge pull request #292 from Molecule-AI/feat/reno-stars-social-publish-helpers
feat(reno-stars): social-publish skill with 7 battle-tested helpers
2026-04-15 17:53:58 -07:00
Hongming Wang
8dc833f306
Merge pull request #291 from Molecule-AI/feat/browser-automation-cdp-proxy-bundled
feat(browser-automation): bundle host-bridge CDP proxy for portable Chrome access
2026-04-15 17:53:31 -07:00
airenostars
f2ab9eb924 fix(reno-stars): SEO Builder delegates code blockers to Dev Leader, not human
Issue surfaced in SEO Builder Run 10 (2026-04-15):
- Marketing Leader found 2 code-level metadata blockers
  (white-rock page.tsx override + en.json description >160c)
- Telegram report listed them under "⚠️ ACTION ITEMS (human)"
- User: "it should automatically report to dev team instead of
  just asking CEO to do it"

Fix: when seo-builder finds a code-level blocker it can't fix via
DB, it delegates to the Dev Leader sibling workspace via A2A instead
of flagging for human. Only genuine human actions (Yelp email
verification, Google account-linked operations) stay in the human
bucket.

Also clarify marketing-leader/CLAUDE.md so the "DO NOT DELEGATE"
rule doesn't accidentally block this pattern — it's now explicit
that sibling handoff for scope mismatches is allowed (as opposed
to delegating down the hierarchy to spawn sub-agents, which stays
forbidden).
2026-04-15 17:47:27 -07:00
airenostars
66b8cbb7fa fix(transcript): validate workspace URL to prevent SSRF (#272)
`TranscriptHandler.Get` previously proxied `agent_card->>'url'` directly
to the outbound HTTP client with no validation. Since `agent_card` is
attacker-writable via /registry/register, a workspace-token holder
could point it at cloud metadata (169.254.169.254), link-local ranges,
or non-http schemes and pivot the platform container against internal
services (IMDS, Redis, Postgres, other containers on the Docker net).

Four required fixes per reviewer:

1. `validateWorkspaceURL(u *url.URL)` — runs before `httpClient.Do`:
   - scheme must be http/https (rejects file://, gopher://, ftp://)
   - cloud metadata hostname blocklist (GCP + Azure + plain "metadata")
   - IMDS IP blocklist (169.254.169.254)
   - IPv4/IPv6 link-local blocklist (169.254/16, fe80::/10, multicast)
   - IPv6 unique-local fd00::/8 blocklist
   - loopback + docker.internal still allowed for local dev

2. Query-param allowlist — `target.RawQuery = c.Request.URL.RawQuery`
   forwarded everything verbatim, letting a caller smuggle params the
   upstream transcript endpoint didn't intend to expose. Replaced with
   an allowlist of `since` and `limit`.

3. Sanitized error string — `fmt.Sprintf("workspace unreachable: %v", err)`
   leaked the actual internal host/IP via `net.OpError`. Now logs the
   real error server-side and returns a plain "workspace unreachable"
   to the caller.

4. 10 new regression test cases:
   - `TestTranscript_Rejects{CloudMetadataIP,NonHTTPScheme,MetadataHostname,LinkLocalIPv6}`
     exercise the handler end-to-end with each attack URL and assert
     400 before the HTTP client fires.
   - `TestValidateWorkspaceURL` table-drives the validator across
     localhost/public/docker-internal (allowed) + IMDS/GCP/Azure/file/
     gopher/link-local/multicast (rejected).
   - `TestTranscript_ProxyPropagatesAllowlistedQueryParams` asserts
     `secret=leak&cmd=rm` is stripped while `since=42&limit=7` pass
     through.

Also fixed a pre-existing test bug: `seedWorkspace` was issuing a real
SQL Exec against sqlmock with no expectation set, so the prior test
helpers silently failed in CI. Replaced with `expectWorkspaceURLLookup`
which programs the mock correctly. All 11 tests now pass.

Closes #272

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:46:55 -07:00
airenostars
f6922d9cb5 feat(reno-stars): social-publish skill with 7 battle-tested helpers
Add a new `social-publish` skill under the Marketing Leader template
containing verbatim copies of 7 puppeteer-core helper scripts that reliably
publish video posts to Facebook, Instagram, X, LinkedIn, TikTok, YouTube,
and Google Business Profile. Each helper encapsulates hours of debugging
from the 2026-04-15 incident (Lexical editor mirror selection, FB Reel
Next-button disambiguation, post-publish upsell dismissal, TikTok
beforeunload race, GBP iframe scoping, etc).

Rewrite the existing social-media-poster / monitor / engage skills to
delegate publishing to these helpers instead of freestyling puppeteer
per run. Mirror the same delegation note into the social-media-specialist
skill copies so both the Marketing Leader and its specialist agent follow
the same rule.

Not implemented as a platform plugin: the helpers are dom-specific to
Reno Stars Chrome sessions (profile path, account IDs, hardcoded URLs)
and belong in org-template content rather than a generic platform
capability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:34:13 -07:00
airenostars
ff19c2ce26 feat(browser-automation): bundle host-bridge CDP proxy + connect helper
The plugin now ships everything a user needs to wire Chrome on their
host to workspaces inside Docker:

- host-bridge/cdp-proxy.cjs — rewrites the Host header so Chrome accepts
  DevTools Protocol connections from container-originated traffic, and
  forwards both HTTP (tab list, screenshots) and WebSocket upgrades.

- host-bridge/install-host-bridge.sh — one-command install on macOS
  (launchd user agent) or Linux (systemd --user unit). `uninstall`
  subcommand cleans up. No root required.

- skills/browser-automation/lib/connect.js — the mandatory helper
  consumers already use; re-exported here so the plugin is self-contained.

- SKILL.md — documents the one-time host setup and the existing
  defaultViewport:null + disconnect-not-close rules. The 2026-04-15
  social-media-poster incident (3h debug chasing phantom "sessions
  expired" errors on an 800x600 viewport) is captured inline.

Smoke-tested on macOS: install script registered the agent, proxy
listens on 0.0.0.0:9223, and a live workspace container
(ws-bee4d521-3d3) successfully reached Chrome via
host.docker.internal:9223.

This replaces ad-hoc per-user CDP proxies and makes the plugin
usable by any Molecule operator, not just the Reno Stars org.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:29:46 -07:00
Hongming Wang
7720489df9
Merge pull request #290 from Molecule-AI/chore/ci-e2e-api-concurrency-group
chore(ci): serialize e2e-api across runs to prevent docker collision
2026-04-15 17:29:40 -07:00
Hongming Wang
b231449eb4
Merge pull request #285 from Molecule-AI/fix/memory-tools-auth-headers
fix(memory-tools): #215-class — auth_headers on commit_memory + search_memory HTTP fallback
2026-04-15 17:29:24 -07:00
Hongming Wang
469d24c23a fix(tests): update memory fakes for auth_headers kwarg + activity overwrite
The #215-class fix in memory.py (859a60e) adds headers=_headers to the
direct-httpx commit_memory + search_memory paths, but 9 existing tests
in test_memory.py had FakeAsyncClient.post/get signatures like
`async def post(self, url, json):` with no headers kwarg. Python
raised TypeError: unexpected keyword argument 'headers' on every call,
commit_memory caught it and returned {success: False}, tests failed.

Fixes applied:

1. Add `headers=None` to every FakeAsyncClient.post + .get signature
   across test_memory.py. Uses replace_all so all 9+ fakes match.

2. For tests that capture a single captured["url"]:
   - test_commit_memory_uses_awareness_client_when_configured
   - test_commit_memory_uses_platform_fallback_without_awareness
   - test_commit_memory_httpx_201_success
   filter to only capture /memories URLs. Without the filter, the
   subsequent _record_memory_activity fire-and-forget post to /activity
   overwrites captured["url"] and the assertion fails.

3. For test_commit_memory_promoted_packet_logs_skill_promotion: bump
   expected captured["calls"] from 3 to 4. Pre-fix, the memory_write
   /activity call (from _record_memory_activity #125) was silently
   dropped because the fake rejected headers=; post-fix it succeeds
   and lands in the captured list alongside the skill_promotion
   /activity and /registry/heartbeat calls. Also extend that test's
   fake to accept /registry/heartbeat (was raising AssertionError).

Total: 36/36 memory tests pass. Full workspace-template suite 1189/1189.

This is strictly test-infrastructure work — zero production code
changed. CI never caught the break because the Mac mini runner has
been stuck for ~4 hours (tick-33/34/35/36 reports).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:29:15 -07:00
rabbitblood
48eca7264c fix(memory-tools): #215-class — auth_headers on commit_memory + search_memory HTTP fallback
Context: platform now gates `GET /workspaces/:id/memories` and
`POST /workspaces/:id/memories` behind workspace auth (post-#166 /
#167 AdminAuth wave). The `builtin_tools.memory` tool had three HTTP
call sites:

  1. commit_memory POST fallback (line 121)        ← NO auth_headers
  2. search_memory GET fallback (line 269)         ← NO auth_headers
  3. activity-log helper POST (line 371)           ← HAS auth_headers

Path 3 was already fixed. Paths 1 + 2 silently 401 every call, but the
tool's error-handling path returns `{"success": False}` without surfacing
the auth failure to the agent. Result: the agent sees an empty memory
backlog on every call and assumes there's nothing to do.

## Discovered today

Technical Researcher is the first workspace opted in to the idle-loop
pilot from #216 (reflection-on-completion pattern). The pilot fires
every 10 min, the agent calls `search_memory "research-backlog:..."` as
the first step, gets back an empty result, writes "tr-idle clean" to
memory, and stops. Clean-idle outcome every tick, 9 consecutive ticks.

Looking at TR's activity_logs response bodies:

    "Memory auth has failed on every tick this session — skipping the call"
    "tr-idle — step 2 done. Memory unavailable (auth token missing..."
    "tr-idle 04:15 — clean (memory auth still down, 3rd consecutive tick)"

The AGENT knew the memory calls were failing. The platform 401 error
was surfacing in the tool response, but our instrumentation wasn't
counting it as a defect — we saw "tr-idle clean" writes and assumed
the pilot was working as designed. It was actually silently broken.

## Fix

Import `platform_auth.auth_headers` lazily (same pattern as the
activity-log path already uses), attach `headers=_auth()` to both
httpx call sites. Matches the #225 fix for the register call.

## Not in this PR

- awareness_client.py also makes HTTP calls to a separate AWARENESS_URL
  service (not the platform), which may or may not need the same fix
  depending on that service's auth posture. Out of scope for this PR.

- TR's specific token problem: TR's `/configs/.auth_token` file is
  empty because it was re-provisioned via `apply_template: true`
  (recovery path from the failed-volume incident) and Phase 30.1
  only mints a token on FIRST register per workspace. This fix
  doesn't help TR until TR gets a fresh token — tracked separately.

## Test plan

- [x] Python syntax check on memory.py passes
- [ ] CI: all memory-related tests should still pass (the new code
      paths only add header passing, no shape change)
- [ ] Real-world verification: after TR gets a fresh token, idle-loop
      pilot should produce a dispatch within 10 min (seeded backlog
      already in place from this session)

## Related
- #215 / #225 — register call auth_headers fix (same pattern)
- #216 — TR idle-loop pilot (couldn't measure until this lands)
- #166 / #167 — platform AdminAuth wave that surfaced this gap
2026-04-15 17:26:26 -07:00
Hongming Wang
f2457ac287 chore(ci): serialize e2e-api across runs to prevent docker collision
Now that the Molecule-AI org has two self-hosted Apple-silicon runners
(`hongming-m1-mini` + `hongming-m1-mini-2`) servicing the same label set,
two CI runs could execute the e2e-api job concurrently. Each run starts
fixed-name docker containers (`molecule-ci-postgres`, `molecule-ci-redis`)
bound to host ports 15432/16379 — a collision means the second run fails
with "container name already in use" or "port already in use".

Adds a workflow-level `concurrency: e2e-api` group to the job so GitHub
Actions serializes e2e-api executions globally regardless of which runner
picks them up. `cancel-in-progress: false` ensures later runs queue
rather than cancelling the in-flight one (we want every PR's e2e check
to actually execute, not get skipped by a newer push).

Tradeoff: e2e-api is now effectively single-threaded across the whole
org. Measured duration is ~1-2 min per run, so the added serialization
latency is small relative to total CI wall time. All other jobs still
parallelize across both runners.
2026-04-15 17:06:41 -07:00
Hongming Wang
ba285504e0
Merge pull request #289 from Molecule-AI/fix/code-review-plugin-on-engineers
feat(template): add molecule-skill-code-review to Frontend/Backend/DevOps Engineer (#280)
2026-04-15 16:55:47 -07:00
Hongming Wang
ea12ff9761 feat(template): add molecule-skill-code-review to Frontend/Backend/DevOps Engineer (#280)
Closes #280. Self-review rubric now runs on the same workspaces that
raise PRs, not just on the reviewers. Dev Lead uses the same
16-criteria rubric in review, so catching issues pre-PR cuts the
review loop.

- Frontend Engineer: new plugins: [molecule-skill-code-review]
- Backend Engineer: plugins extended from [molecule-hitl] to
  [molecule-hitl, molecule-skill-code-review]
- DevOps Engineer: plugins extended from [molecule-hitl] to
  [molecule-hitl, molecule-skill-code-review]

The issue didn't explicitly call out DevOps Engineer but the reasoning
applies — DevOps Engineer writes Dockerfiles + CI workflows + infra
scripts that Dev Lead reviews with the same rubric. Including here
for consistency.

Verified all 5 reviewer/engineer roles' plugin lists via
walk-script:
  Dev Lead:        [code-review, llm-judge]
  Frontend Eng:    [code-review]                         ← NEW
  Backend Eng:     [hitl, code-review]                   ← NEW
  DevOps Eng:      [hitl, code-review]                   ← NEW
  Security Aud:    [code-review, cross-vendor, llm-judge,
                    security-scan, hitl]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:55:24 -07:00
Hongming Wang
b31a5a4a53
Merge pull request #276 from Molecule-AI/feat/hermes-phase2d-i-system-prompt
feat(hermes): Phase 2d-i — system-prompt.md injection on all 3 dispatch paths
2026-04-15 16:53:31 -07:00
Hongming Wang
d340924479
Merge pull request #288 from Molecule-AI/fix/security-headers-referrer-permissions
fix(security): add Referrer-Policy + Permissions-Policy headers (#282)
2026-04-15 16:52:37 -07:00
Hongming Wang
cb37aa850c fix(security): add Referrer-Policy + Permissions-Policy headers (#282)
Closes #282. CLAUDE.md documented the SecurityHeaders() middleware as
setting 6 headers (X-Content-Type-Options, X-Frame-Options, Referrer-
Policy, Content-Security-Policy, Permissions-Policy, HSTS) but the
implementation only set 4 — Referrer-Policy and Permissions-Policy
were silently missing.

Adds:
- Referrer-Policy: strict-origin-when-cross-origin — prevents
  browsers from leaking full paths/queries in Referer on cross-
  origin navigation. Particularly relevant for canvas embeds of
  Langfuse trace URLs that may contain trace IDs.
- Permissions-Policy: camera=(), microphone=(), geolocation=() —
  denies sensor access by default. Iframes the canvas embeds
  (Langfuse trace viewer etc.) can no longer request these
  without an explicit delegation.

Regression tests added to securityheaders_test.go — both headers
are now in the same table-driven assertion loop as the other 4,
so a future edit that drops them again fails CI loudly.

LOW severity — this is defense-in-depth, not a direct exploit path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:52:19 -07:00
Hongming Wang
60bc2dba2e
Merge pull request #277 from Molecule-AI/fix/wire-security-plugins-to-roles
feat(template): wire molecule-hitl + molecule-security-scan into roles (#266, #275)
2026-04-15 16:22:19 -07:00
Hongming Wang
bb366c13ba feat(template): wire molecule-hitl + molecule-security-scan into roles (#266, #275)
Closes #266 and #275. Per-role install matrix matching the per-tick
#266 triage comment.

## Added plugins

| Role | Plugin | Rationale |
|---|---|---|
| Backend Engineer | molecule-hitl | Scope includes destructive DB migrations + runtime config changes — @requires_approval stops unattended agents from shipping prod schema mutations. |
| DevOps Engineer | molecule-hitl | Scope covers fly deploys + registry pushes + CI pipeline mutations — @requires_approval before destructive infra ops. |
| Security Auditor | molecule-hitl | Gates public issue filing for critical findings; prevents false-positive spam of the tracker. |
| Security Auditor | molecule-security-scan | Primary consumer of gosec/bandit/CVE scanning via builtin_tools/security_scan.py. Security Auditor system prompt already expects to run these tools; this wires them. |

## Per-PR #71 semantics
Each workspace's `plugins:` UNIONs with `defaults.plugins` — these
additions don't drop any existing plugin. Security Auditor's list went
from 3 → 5; Backend + DevOps Engineer now have a role-specific list
layered on top of defaults.

## NOT adding (yet)
Dev Lead / Research Lead / Technical Researcher / QA Engineer / UIUX
Designer / PM / Documentation Specialist — none have destructive ops
scope in the role description. If you want belt-and-suspenders HITL
coverage I can extend this PR; leaving narrow for now.

## Test plan
- [x] YAML parses cleanly (python3 -c 'import yaml; yaml.safe_load(...)')
- [x] Three edited roles' plugins lists verified by walk-script
- [ ] Next org re-import activates the plugins on each workspace container
- [ ] Agents invoke request_approval / security_scan from their system
      prompts after re-import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:21:58 -07:00
rabbitblood
baffc6b0c3 feat(hermes): Phase 2d-i — system-prompt.md injection on all 3 dispatch paths
The Hermes adapter never read /configs/system-prompt.md. Any role that
switched to runtime: hermes was silently losing its role identity because
the system prompt wasn't passed to the model. This PR fixes that by:

1. HermesA2AExecutor.__init__ takes new optional `config_path` kwarg
2. `create_executor(config_path=...)` forwards to the constructor
3. `adapter.py` passes `config.config_path` through from AdapterConfig
4. `execute()` reads system-prompt.md via executor_helpers.get_system_prompt
   (hot-reload-capable — reads on every turn, not just at startup)
5. `_do_inference(user_message, history, system_prompt)` — new arg threads
   through the dispatch to each native path
6. Each path uses the provider's NATIVE system field:
   - OpenAI-compat: prepends `{"role":"system", "content":...}` to messages
   - Anthropic: top-level `system=` kwarg (NOT in messages — Anthropic
     requires system at the top level)
   - Gemini: `config=GenerateContentConfig(system_instruction=...)`

## Phase scoreboard
- 2a (in main) — native Anthropic dispatch infra
- 2b (in main) — native Gemini dispatch
- 2c (in main) — multi-turn history on all paths
- **2d-i (this PR)** — system prompts on all paths
- 2d-ii (future) — tool calling on native paths
- 2d-iii (future) — vision content blocks on native paths
- 2d-iv (future) — streaming

## Test coverage

46/46 tests pass (20 Phase 2 dispatch + 26 Phase 1 registry):

- Existing dispatch tests updated to assert the 3-arg call shape
  `("hello", None, None)` — history + system_prompt both None
- 4 new tests:
  - `dispatch_passes_system_prompt_to_anthropic` — happy path, third arg flows
  - `dispatch_passes_system_prompt_to_gemini` — happy path
  - `dispatch_passes_system_prompt_to_openai` — happy path
  - `executor_accepts_config_path_kwarg` — constructor stores config_path
  - `create_executor_forwards_config_path` — both back-compat and registry
    resolution paths forward config_path through to the executor

## Back-compat

- `config_path=None` (default) → execute() skips system-prompt injection,
  same behavior as pre-2d-i
- Workspaces with `runtime: hermes` but no `/configs/system-prompt.md`
  file get `system_prompt=None` (get_system_prompt returns fallback),
  same as before
- The 13 OpenAI-compat providers work identically — system_prompt just
  adds a leading message, which every OpenAI-compat endpoint already
  supports
- Anthropic + Gemini previously got zero system context; now they get
  the same system prompt the workspace's system-prompt.md carries

## Why this matters

Before this PR: if someone flipped a workspace from `runtime: claude-code`
to `runtime: hermes`, the agent would act generically (no role identity,
no project conventions, no CLAUDE.md context) because the Hermes executor
never looked at system-prompt.md. That's a silent correctness regression
the test suite wouldn't catch because none of our live workspaces use
the hermes runtime today.

With this PR: Hermes workspaces get the same system prompt injection as
Claude-code workspaces, making the `runtime: hermes` switch a true drop-in
alternative.

## Related
- #267 Phase 2c (multi-turn history — in main)
- #255 Phase 2b (gemini native — in main)
- #240 Phase 2a (anthropic native — in main)
- #208 Phase 1 (provider registry — in main)
- project_hermes_multi_provider.md — Phase 2d-i was the next queued item
2026-04-15 16:21:47 -07:00
Hongming Wang
ab8f6a1c7a
Merge pull request #267 from Molecule-AI/feat/hermes-phase2c-streaming
feat(hermes): Phase 2c — multi-turn history passed natively to all dispatch paths
2026-04-15 16:10:21 -07:00
Hongming Wang
d02ede498d
Merge pull request #273 from Molecule-AI/fix/ci-self-hosted-runner-failures
fix(ci): publish-platform-image keychain + path diagnostics
2026-04-15 16:06:53 -07:00
Hongming Wang
0b403aeeab fix(ci): publish-platform-image keychain + path diagnostics
Every publish-platform-image run since the aa41947 self-hosted runner
migration has been failing with two runner-level issues that the
workflow now works around (keychain) or surfaces clearly (path):

1. "error storing credentials - err: exit status 1, out:
   'User interaction is not allowed. (-25308)'"

   docker/login-action tries to persist the GHCR + Fly tokens in the
   macOS Keychain, but the Mac mini runner runs as a non-interactive
   launchd service without an unlocked desktop session — keychain
   access raises -25308. Fix: set DOCKER_CONFIG to a per-run temp dir
   containing a plain config.json before the login step so credentials
   land in a file, not the keychain. This is the same trick the
   GitHub-hosted macos runners use in docker action examples.

2. "Unexpected error attempting to determine if executable file
   exists '/usr/local/bin/docker': Error: EACCES: permission denied,
   stat '/usr/local/bin/docker'"

   Not a workflow bug — the runner literally can't read the Docker
   binary path. Adds a diagnostic step before QEMU/buildx setup that
   prints: PATH, `command -v docker`, `docker --version`, and
   `ls -la` on both /usr/local/bin/docker and /opt/homebrew/bin/docker.
   Surfacing these in the log means the next failure (if any) shows
   the actual problem instead of hiding behind a cryptic buildx error.

Does NOT fix the root cause of #2 — that needs the user to SSH into
the Mac mini runner and reinstall / re-permission Docker Desktop
(or switch to Colima/OrbStack). The diagnostic output will tell us
exactly which path is broken.

The 20+ queued CI runs from `ci.yml` are unrelated to this PR — they
are stuck because the self-hosted runner has severely degraded queue
throughput (runs wait 2+ hours before being picked up). That's a
separate runner-health issue tracked as a user action in the triage
report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:06:28 -07:00
airenostars
1f22d7df1b feat: GET /workspaces/:id/transcript — live agent session log
Closes #N (issue to be filed)

Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
  - doesn't work for remote workspaces (Phase 30 / Fly Machines)
  - requires shell access on the host
  - has no pagination

This PR adds:

1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
   `{runtime, supported, lines, cursor, more, source}`. Default returns
   `supported: false` so non-claude-code runtimes pass through gracefully.

2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
   recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
   cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
   project dir name matches what Claude Code actually writes to. Limit
   capped at 1000 to prevent OOM.

3. Workspace HTTP route `GET /transcript` — Starlette handler added
   alongside the A2A app. Trusts the internal Docker network (same
   model as POST / for A2A); Phase 30 remote-workspace auth is a
   follow-up.

4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
   workspace's URL, forwards GET, caps response at 1MB. Gated by
   existing `WorkspaceAuth` middleware (same as /traces, /memories,
   /delegations).

Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.

Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.

## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
2026-04-15 14:29:43 -07:00
rabbitblood
cb3c7dcf91 feat(hermes): Phase 2c — multi-turn history passed natively to all paths
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).

Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.

## What's new

- **`_history_to_openai_messages(user_message, history)`** (static) — maps
  A2A `(role, text)` tuples to OpenAI Chat Completions
  `[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
  `ai`→`assistant`. Current turn appended as the final user message.

- **`_history_to_anthropic_messages`** (static) — identical wire shape to
  OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
  blocks will diverge here.

- **`_history_to_gemini_contents`** (static) — Gemini uses a different
  shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
  `parts=[{"text":...}]`. Delegates to none of the others.

- **`_do_openai_compat(user_message, history=None)`** — accepts history,
  builds messages via `_history_to_openai_messages`. Back-compat: pass
  `history=None` to get the old single-turn behavior.

- **`_do_anthropic_native(user_message, history=None)`** — same signature
  change, calls `_history_to_anthropic_messages`. Still uses
  `anthropic.AsyncAnthropic().messages.create()`, just with proper
  multi-turn.

- **`_do_gemini_native(user_message, history=None)`** — same pattern,
  calls `_history_to_gemini_contents`, passes to Gemini's
  `generate_content(contents=...)`.

- **`_do_inference(user_message, history=None)`** — new signature,
  dispatches by auth_scheme as before, passes both args through.

- **`execute()`** — no longer calls `build_task_text`. Calls
  `extract_history(context)` directly and forwards to `_do_inference`.
  Removes the `build_task_text` import (not needed in this file anymore).

## Tests

Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.

5 NEW tests:

- `test_history_to_openai_messages_empty_history` — empty history degrades
  to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
  history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
  anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
  verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
  forwards history to the chosen provider path

All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    41 passed in 0.07s

## Back-compat

- No public API changes to `create_executor()`. Callers that hit
  `execute()` via A2A get the new multi-turn behavior automatically via
  `extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
  single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
  adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
  bypasses it now.

## What's NOT in this PR (Phase 2d)

- Tool calling / function calling on native paths (anthropic `tools=`,
  gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
  {type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
  `system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
  thinking config

Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.

## Related

- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
2026-04-15 14:21:10 -07:00
Hongming Wang
2afd65104d
Merge pull request #264 from Molecule-AI/feat/plugin-compliance-posture-split
feat(plugin): split compliance-posture into 3 plugins (#256)
2026-04-15 14:15:55 -07:00
Hongming Wang
45e4eb0be3 feat(plugin): split compliance-posture into 3 plugins (#256)
Closes #256. Per CEO direction, shipping three separate opt-in plugins
instead of one bundled "compliance-posture" — keeps installs granular
so a workspace that only wants CVE scanning doesn't carry OWASP policy
or append-only audit retention.

- plugins/molecule-compliance/        — wraps compliance.py (OWASP OA-01
  prompt injection + OA-03 excessive agency). Skill: owasp-agentic.
- plugins/molecule-audit/              — wraps audit.py (EU AI Act Art.
  12/13/17 append-only JSONL log, SIEM-friendly). Skill: ai-act-audit-log.
- plugins/molecule-security-scan/      — wraps security_scan.py (Snyk or
  pip-audit CVE gate on skill requirements.txt). Skill: skill-cve-gate.

Each plugin ships a manifest + one SKILL.md with:
- When to install / when to skip
- Configuration shape (config.yaml blocks)
- Anti-patterns to avoid
- Cross-references to the other two plugins so an operator can reason
  about the full compliance surface

All three wrap code that already exists in workspace-template/builtin_tools/
— no Python changes. Install per workspace via
POST /workspaces/:id/plugins {"source":"builtin://molecule-<name>"}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:15:25 -07:00
Hongming Wang
2aa901882f
Merge pull request #263 from Molecule-AI/docs/sync-2026-04-15-tick-32
docs: sync CLAUDE.md test counts after tick-32
2026-04-15 14:11:16 -07:00