Commit Graph

285 Commits

Author SHA1 Message Date
Hongming Wang
975f55a560 Merge branch 'main' into feat/tr-idle-prompt 2026-04-15 11:54:08 -07:00
Hongming Wang
daf1ce8cb5 Merge pull request #232 from Molecule-AI/fix/code-review-idle-loop-and-docs
fix(code-review): idle loop hardening + idle_prompt docs + admin-auth runbook
2026-04-15 11:52:06 -07:00
Hongming Wang
54b49ffd1b fix(code-review): idle loop hardening + idle_prompt docs + admin-auth runbook
Addresses items 4, 5, 7 from the self-review of the batch merge. PR A
(#228) covered items 1, 2, 3, 6 on the Go side.

## workspace-template/main.py — idle loop hardening

- Replace asyncio.get_event_loop() with asyncio.get_running_loop() —
  the former is deprecated in 3.12+ and emits a DeprecationWarning on
  every idle fire.
- Replace hardcoded urlopen timeout=600 with IDLE_FIRE_TIMEOUT_SECONDS
  clamped to max(60, min(300, idle_interval_seconds)). Long cadence
  workspaces no longer hold dangling requests open for 10 minutes; the
  cap adapts automatically when the interval is short.
- Type the exception handling: split HTTPError (has .code) from URLError
  (connection-level) from the generic catch-all. Log status + error
  class separately so operators can grep for specific failure modes
  instead of a bare "post failed".
- Fire-and-forget no longer loses exceptions. run_in_executor Future
  now has an add_done_callback that logs the outcome, so a panic in
  _post_sync surfaces as "Idle loop: post failed — status=None err=..."
  instead of Python's default "Task exception was never retrieved"
  warning burried in stderr.

## org-templates/molecule-dev/org.yaml — discoverability

Added idle_prompt + idle_interval_seconds to the defaults: block with
explanatory comments. Without this, users had to read main.py to
discover the feature.

## docs/runbooks/admin-auth.md — new

Documents the three middleware variants (AdminAuth strict,
CanvasOrBearer soft, WorkspaceAuth per-id), the exact contract of each,
and the three-question test for adding a new route to CanvasOrBearer.
Also flags the session-cookie follow-up as Phase H.

Referenced PRs: #138, #164, #165, #166, #167, #168, #190, #194, #203,
#228.

No code deltas in platform/ beyond the Python + YAML + docs changes.
Full pytest suite unchanged except the pre-existing test_hermes_smoke
flake that fails in full-suite but passes in isolation (test isolation
bug, not introduced by this PR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:52:01 -07:00
Hongming Wang
acb8721bbb Merge pull request #228 from Molecule-AI/fix/code-review-go-batch
fix(code-review): Go-side follow-ups from self-review batch
2026-04-15 11:48:30 -07:00
Hongming Wang
35705274c9 fix(code-review): CanvasOrBearer fall-through, scheduler short(), activity spoof log + 6 new tests
Addresses self-review of the 10-PR batch merged earlier this session.
Splits the follow-ups into this Go-side PR and a later Python/docs PR.

## Fixes

1. wsauth_middleware.go CanvasOrBearer — invalid bearer now hard-rejects
   with 401 instead of falling through to the Origin check. Previous code
   let an attacker with an expired token + matching Origin bypass auth.
   Empty bearer still falls through to the Origin path (the intended
   canvas path).

2. scheduler.go short() helper — extracts safe UUID prefix truncation.
   Pre-existing unsafe [:12] and [:8] slices would panic on workspace IDs
   shorter than the bound. #115's new skip path had the bounds check;
   the happy-path log lines did not. One helper, three call sites.

3. activity.go security-event log on source_id spoof — #209 added the
   403 but the attempt was invisible to any auditor cron. Stable
   greppable log line with authed_workspace, body_source_id, client IP.

## New tests

- TestShort_helper — bounds-safety regression guard for the helper
- TestRecordSkipped_writesSkippedStatus — #115 coverage gap, exercises
  UPDATE + INSERT via sqlmock
- TestRecordSkipped_shortWorkspaceIDNoPanic — short-ID crash regression
- TestActivityHandler_Report_SourceIDSpoofRejected — #209 403 path
- TestActivityHandler_Report_MatchingSourceIDAccepted — non-spoof path
- TestHistory_IncludesErrorDetail — #152 problem B coverage

go test -race ./... green locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:48:25 -07:00
rabbitblood
9ceaea33ce chore(template): enable idle-loop pilot on Technical Researcher (#205 follow-up)
PR #205 shipped the workspace idle-loop mechanism (reflection-on-completion
pattern from the Hermes/Letta research survey) but deliberately added NO
default idle_prompt in org.yaml so rollout could be measured one workspace
at a time before going team-wide.

This is that first opt-in: Technical Researcher gets a backlog-pull + reflect
idle prompt on a 10-minute cadence.

## Why TR first

- Research-heavy role with a naturally bursty load — lots of idle time
  between the once-per-hour plugin curation cron fires
- Non-user-facing (no canvas UI impact, no UX risk)
- Already has a clear backlog shape: the plugin curation cron produces
  findings that could feed follow-up studies
- Vision-free (no Playwright) so cost per idle tick is pure text

## What the idle_prompt does

Three-step reflection, under 60s wall-clock, max 1 A2A send per tick:

1. **Backlog pull** — search_memory "research-backlog:technical-researcher"
   for any stashed research questions (from prior cron fires or Research
   Lead delegations). If found → delegate_task to Research Lead with a
   concrete deliverable spec, then commit_memory to remove the item from
   the backlog.

2. **Reflection fallback** — if backlog is empty, look at the last memory
   entry from the Hourly plugin curation cron. Does it surface a follow-up
   study worth doing? If yes → file a GH issue labeled `research` and
   commit_memory to put the question on the backlog for next tick.

3. **Idle-clean outcome** — if neither backlog nor reflection produced
   anything, write "tr-idle HH:MM — clean" to memory and stop. No busy work.

Hard rules enforce: max 1 A2A per tick, skip step 1 if Research Lead busy,
under 60s wall-clock, never re-run a cron's own prompt from inside the idle
loop.

## Rollout plan

- **This PR**: enables TR only via the `idle_prompt` + `idle_interval_seconds`
  fields added to its workspace entry in org.yaml.
- **Next 24h**: measure activity_logs delta on TR vs baseline, count
  idle-fired delegations vs idle-clean outcomes, confirm Research Lead
  isn't being flooded.
- **If green** (delegations land useful work, no flood): roll to Market
  Analyst + Competitive Intelligence in a follow-up PR.
- **If noisy** (too many idle fires producing nothing): tune idle_interval
  up to 1200-1800s.

## Apply locally per feedback rule

Per `feedback_apply_template_locally_too.md`: not waiting for merge. After
pushing this PR I'll edit TR's live /configs/config.yaml to add the same
idle_prompt + idle_interval_seconds fields, then restart ws-57e13b54-119
(Technical Researcher) so the new workspace-template binary picks up the
idle loop immediately. Measurement clock starts from that restart.

## Related
- #205 (mechanism) — just merged in this cycle (7f11328)
- #208 Hermes Phase 1 — also just merged (be53a33)
- docs/ecosystem-watch.md → `### Hermes Agent` — reflection-on-completion
  pattern reference
2026-04-15 11:34:51 -07:00
Hongming Wang
5ce848367e Merge pull request #212 from Molecule-AI/fix/issue-211-migration-runner-skips-down
fix(db): #211 — migration runner skips *.down.sql (stop wiping data on boot)
2026-04-15 11:24:11 -07:00
Hongming Wang
0b627816ed fix(db): #211 — migration runner skips *.down.sql (stop wiping data on boot)
Closes #211 HIGH ops/security. RunMigrations globbed \`*.sql\` which
matches both \`.up.sql\` AND \`.down.sql\`. Alphabetical sort puts \"d\"
before \"u\", so every platform boot ran the rollback BEFORE the forward
migration for any pair starting with migration 018.

Net effect: every restart wiped workspace_auth_tokens (the 020 pair),
which in turn regressed AdminAuth to its fail-open bootstrap bypass for
every route protected by it — the live server was effectively
unauthenticated from restart until the next workspace re-registered.
Also wiped 018_secrets_encryption_version and 019_workspace_access
pairs silently.

Fix is a 3-line filter: skip files whose base name ends in \`.down.sql\`.
Down migrations remain on disk for operator-driven rollback via psql,
but are never picked up by the auto-run loop.

Added unit test against a tmp dir to lock the filter behaviour so this
can never regress: stages a mix of legacy plain .sql, matched up/down
pairs, asserts only forward files survive.

Follow-up (not in this PR): the runner still re-applies every migration
on every boot. Migrations must be idempotent. A proper schema_migrations
tracking table is tracked as a future cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:24:06 -07:00
Hongming Wang
7f11328e22 Merge pull request #205 from Molecule-AI/feat/workspace-idle-loop
feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape, opt-in, ~90 LOC)
2026-04-15 11:21:47 -07:00
Hongming Wang
8a011c9f51 Merge remote-tracking branch 'origin/main' into feat/workspace-idle-loop 2026-04-15 11:21:15 -07:00
Hongming Wang
be53a33546 Merge pull request #208 from Molecule-AI/feat/hermes-phase1-provider-registry
feat(hermes): Phase 1 — multi-provider registry (15 providers, 26 tests, back-compat preserved)
2026-04-15 11:21:05 -07:00
Hongming Wang
80ae2bd6ad Merge remote-tracking branch 'origin/main' into feat/hermes-phase1-provider-registry 2026-04-15 11:20:51 -07:00
Hongming Wang
c2fee56d59 Merge branch 'main' into feat/hermes-phase1-provider-registry 2026-04-15 11:20:06 -07:00
Hongming Wang
2b8aef5c60 Merge pull request #210 from Molecule-AI/fix/issue-204-push-sender-abstract
fix(workspace-template): #204 — drop PushNotificationSender (abstract class)
2026-04-15 11:18:57 -07:00
Hongming Wang
7d7d5995e0 fix(workspace-template): #204 — drop PushNotificationSender (abstract class)
Closes #204. PR #198 wired push_sender=PushNotificationSender() into
DefaultRequestHandler to satisfy #175's push-notification capability,
but PushNotificationSender in a2a-sdk is an abstract base class and
cannot be instantiated. Every workspace container crashed on startup
with TypeError.

Reverted to DefaultRequestHandler's defaults. The pushNotifications
capability still appears in AgentCard.capabilities (advertised to A2A
clients) but actual implementation of the sender is deferred to a
Phase-H follow-up that subclasses PushNotificationSender properly.

Existing pytest suite unchanged (the crash was only at runtime on
main.py import, which no existing test exercises directly).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:18:52 -07:00
Hongming Wang
a076318cf5 Merge pull request #209 from Molecule-AI/fix/c2-source-id-spoof-check
fix(security): C2 from #169 — reject spoofed source_id in activity.Report
2026-04-15 11:15:14 -07:00
Hongming Wang
25bbfd3bfc fix(security): C2 from #169 — reject spoofed source_id in activity.Report
Cherry-picks the one genuinely new fix from #169 after confirming the
rest of that PR is already covered on main (C1/C3/C5 by wsAuth group,
C6 by #94+#119 SSRF blocklist, C4 ownership by existing WHERE filter).

Pre-existing middleware (WorkspaceAuth on /workspaces/:id/* sub-routes)
proves the caller owns the :id path param. But the body field
source_id was never validated — a workspace authenticated for its own
/activity endpoint could still attribute logs to a different workspace
by setting source_id=<foreign UUID>. Rejected with 403 now.

No schema change, no new middleware. 4-line handler delta. Closes the
only real gap in #169; #169 itself will be closed as superseded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:15:08 -07:00
rabbitblood
8d8ca18bc0 feat(hermes): Phase 1 — multi-provider registry (15 providers, back-compat preserved)
Ships the first half of the queued Hermes adapter expansion. PR 2 only
supported Nous Portal + OpenRouter; this adds 13 more providers reachable
via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are
Phase 2 (better tool-calling + vision fidelity).

## What's new

**`workspace-template/adapters/hermes/providers.py`** (new file, 220 LOC):
- ``ProviderConfig`` dataclass: name, env vars, base URL, default model, auth scheme, docs
- ``PROVIDERS`` dict with 15 entries across 4 groups:
  - PR 2 baseline: nous_portal, openrouter
  - Frontier commercial: openai, anthropic, xai, gemini
  - Chinese providers: qwen, glm, kimi, minimax, deepseek
  - OSS/alt: groq, together, fireworks, mistral
- ``RESOLUTION_ORDER`` tuple: priority for auto-detect (back-compat first,
  then commercial, then Chinese, then OSS/alt)
- ``resolve_provider(explicit=None)`` -> (ProviderConfig, api_key)
  - With explicit name: routes to that provider, raises if env var empty
  - Without: walks RESOLUTION_ORDER, first env-var-set provider wins

**`workspace-template/adapters/hermes/executor.py`** (refactored):
- `create_executor(hermes_api_key=None, provider=None, model=None)` now has
  three parameters:
  - `hermes_api_key`: PR 2 back-compat — routes to Nous Portal
  - `provider`: canonical short name from the registry (e.g. "anthropic")
  - `model`: optional override of the provider's default model
- Delegates all resolution to `providers.resolve_provider()` — no more
  hardcoded URLs or env var lookups in the executor itself
- `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers
  pass base_url + model explicitly (which create_executor always does)

**`workspace-template/tests/test_hermes_providers.py`** (new file, 26 tests):
- Registry shape invariants (count >= 15, no duplicates, every config valid)
- PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly
- Auto-detect for every provider in the registry (parametrized — guards against
  typos in env var lists)
- Explicit `provider=` bypass of auto-detect
- Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env
- All 26 tests pass locally in 0.08s

## Back-compat guarantees

| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |

Nothing existing can break. New callers gain access to 13 more providers.

## What's NOT in this PR (Phase 2)

- **Native Anthropic Messages API path** — better tool calling, vision, extended
  thinking. Requires pulling in `anthropic` SDK. ~50 LOC.
- **Native Gemini generateContent path** — for vision + google tools. Requires
  `google-genai` SDK. ~50 LOC.
- **Streaming support across all providers** — current executor is non-streaming
  (single chat.completions.create call). Streaming works with openai.AsyncOpenAI
  but hasn't been wired to the A2A event queue path. ~30 LOC.
- **Per-provider model overrides in config.yaml** — Phase 1 uses the registry's
  default_model. Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }`
  block in the workspace config.
- **`.env.example` updates** — not critical since the registry itself documents
  every env var via the `env_vars` field, but nice-to-have.

## Related
- Queued memory: `project_hermes_multi_provider.md`
- CEO directive 2026-04-15: *"once current works are cleared, I want you to
  focus on supporting hermes agent, right now it doesnt take too much providers"*
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch
  entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which
  shaped this registry's initial set

## Test plan
- [x] Unit tests: 26/26 pass locally (pytest)
- [ ] CI will run on the self-hosted macOS arm64 runner
- [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical
      Researcher actually hits Alibaba DashScope successfully
- [ ] Integration test per provider with real API keys (gated on env, skip
      when not set — Phase 2 CI addition)
2026-04-15 11:14:35 -07:00
Hongming Wang
2a0552214a Merge pull request #207 from Molecule-AI/fix/issue-115-scheduler-busy-skip
fix(scheduler): #115 — skip cron fire when workspace busy
2026-04-15 11:13:20 -07:00
Hongming Wang
fb942fbb0c fix(scheduler): #115 — skip cron fire when workspace is busy
Closes #115. The Security Auditor hourly cron (and likely others) hit a
~36% miss rate because the platform's A2A proxy rejected fires with
"workspace agent busy — retry after a short backoff" while the agent was
still executing the prior audit. That error was recorded as a hard
failure and polluted last_error.

New behaviour:

Before fireSchedule calls into the A2A proxy, it reads
workspaces.active_tasks for the target. If >0, it:
  - Advances next_run_at to the next cron slot (cron keeps ticking)
  - Bumps run_count
  - Sets last_status='skipped' + last_error=<reason>
  - Inserts a cron_run activity_logs row with status='skipped' + error_detail
  - Broadcasts CRON_SKIPPED for canvas + operators

Effect: busy-collision ceases to be an error. The history surface now
distinguishes "ran and failed" from "skipped because busy". Operators
can tell the difference at a glance, and the liveness view doesn't
stall waiting for the next ticker cycle.

Pairs with #149 (dedicated heartbeat pulse) and #152 problem B
(error_detail surfaced in history) for a coherent scheduler story.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:13:15 -07:00
Hongming Wang
187e8c30ed Merge pull request #206 from Molecule-AI/fix/issue-152-schedule-history-error-detail
fix(scheduler): #152 problem B — surface cron error_detail in schedule history
2026-04-15 11:11:21 -07:00
Hongming Wang
ce88a396da fix(scheduler): #152 problem B — persist and surface cron error_detail
Closes #152 problem B (schedule history API drops error detail).

Two tiny changes:

1. scheduler.fireSchedule now writes lastError into activity_logs.error_detail
   when inserting the cron_run row. Previously the column was left NULL even
   on failure because the INSERT didn't include it.

2. schedules.History SELECT now reads error_detail and includes it in the
   JSON response under error_detail. Frontend + audit cron can now display
   "why did this run fail" instead of just "status=error".

No schema change — activity_logs.error_detail already exists from
migration 009. This just starts using the column.

Problem A of #152 (Research Lead ecosystem-watch 50% error rate on its
own) is a separate ops investigation and stays open.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:11:16 -07:00
rabbitblood
37bca9176e feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape)
Today's multi-framework research (Hermes, Letta, Trigger.dev, Inngest, AG2,
Rivet, n8n, Composio, SWE-agent — see docs/ecosystem-watch.md) confirmed
that nobody runs while(true) per agent. The working patterns are:

  (a) event-driven + hibernation (Hermes, Letta, Trigger.dev, Inngest)
  (b) cron/user-triggered ephemeral runs (AG2, Rivet, n8n, SWE-agent)

Molecule AI is currently 100% in category (b). Observed team utilization:
~0.5% — agents idle 99.5% of the time because cron fires and CEO-typed
A2A are the only initiating signals. CEO's north-star is 24/7 iteration,
current cadence falls short.

This PR closes the gap by adding an in-workspace idle loop that wakes the
agent periodically ONLY when it has no active task. The shape is the
Hermes reflection-on-completion pattern combined with the Letta backlog-pull
pattern, collapsed into a ~60 LOC change in the workspace-template. Zero
new Go code. Zero new DB tables. Zero new API endpoints.

## How it works

1. `config.py` gets two new fields on WorkspaceConfig:
   - `idle_prompt: str = ""` — the prompt to self-send when idle
   - `idle_interval_seconds: int = 600` — how often to check (default 10 min)
   Both support inline or file ref (matching the initial_prompt pattern).

2. `main.py` spawns an `_run_idle_loop()` asyncio task alongside the
   existing initial_prompt task (same lifecycle hooks — cancelled in the
   `finally:` of the server.serve() block).

3. The loop body:
   a. Sleep interval
   b. Check `heartbeat.active_tasks == 0` LOCALLY (no LLM call, no HTTP)
   c. If idle → self-POST the idle_prompt via the existing /workspaces/{id}/a2a proxy
   d. Loop
   The agent's own concurrency control rejects the post if it becomes busy
   between the check and the POST — that's the safety valve.

4. Gated on `config.idle_prompt` being non-empty. Default = "" = no loop.
   Existing workspaces upgrade silently as no-ops until someone explicitly
   opts in by setting idle_prompt in org.yaml (either defaults: or
   per-workspace:).

## Cost analysis (from the research report)

- while(true) pattern: ~$93/day/org (12 agents × 12 thinks/hour × $0.027). Unshippable.
- Hermes reflection-on-completion: ~$0.45/day/org. Cost ∝ useful work.
- This PR's idle loop at 10-min cadence: upper bound 12 × 6/hour × 24h
  × ~3k tokens × Sonnet rate ≈ $5/day/org PER ROLE, only if they're
  genuinely idle every check. In practice far less because busy periods
  skip the LLM call entirely (the active_tasks check is local).

## Rollout plan

Research report recommended rolling to ONE workspace first (Technical
Researcher) and measuring 24h of activity_logs before enabling for
all 12. This PR enables the mechanism; it does NOT add any default
idle_prompt to org-templates/molecule-dev/org.yaml. That's a follow-up
PR after this one lands and one workspace has been manually opted in
for measurement.

## Not touched in this PR

- No Go code (no new platform endpoint, no new DB columns)
- No org.yaml changes (zero-impact until someone opts in)
- No scheduler changes (the idle loop is a workspace concern, not a
  scheduler concern — matches the research report's layering)

## Test plan

- [x] Python syntax check (ast.parse) on main.py + config.py
- [ ] Unit test: WorkspaceConfig parses idle_prompt / idle_interval_seconds from yaml
- [ ] Integration test: set idle_prompt on Technical Researcher, measure that
      an A2A message is received every ~10 min while idle, and NOT received
      while busy with a delegation
- [ ] Dogfood: enable on Technical Researcher for 24h, count activity_logs
      delta vs baseline, confirm cost stays within model

## Related

- Today's research report (conversation output, summarized in commit trailer)
- docs/ecosystem-watch.md → `### Hermes Agent` (the canonical reflection-on-completion example)
- #159 orchestrator/worker split — complementary: leaders pulse for dispatch,
  workers idle-loop for pull. Together: leaders push work, workers pull work,
  no role ever sits idle with a cold queue.
2026-04-15 11:09:43 -07:00
Hongming Wang
ddce151698 Merge pull request #203 from Molecule-AI/fix/issue-168-route-split
fix(auth): #168 — CanvasOrBearer on PUT /canvas/viewport (route-split)
2026-04-15 11:09:22 -07:00
Hongming Wang
7a16eb4f70 fix(auth): #168 — CanvasOrBearer middleware for PUT /canvas/viewport only
Closes #168 by the route-split path from #194's review. #167 put PUT
/canvas/viewport behind strict AdminAuth, breaking canvas drag/zoom
persist because the canvas uses session cookies not bearer tokens.

New narrow middleware CanvasOrBearer:
  - Accepts a valid bearer (same contract as AdminAuth) OR
  - Accepts a request whose Origin exactly matches CORS_ORIGINS
  - Lazy-bootstrap fail-open preserved for fresh installs

Applied ONLY to PUT /canvas/viewport. The softer check is acceptable
there because viewport corruption is cosmetic-only — worst case a
user refreshes the page. This middleware must NOT be used on routes
that leak prompts (#165), create resources (#164), or write files
(#190) — see #194 review for why.

The other canvas-facing routes mentioned in #168 (Events tab, Bundle
Export/Import) remain behind strict AdminAuth pending a proper
session-cookie-accepting AdminAuth (#168 follow-up for Phase H).

6 new tests cover: bootstrap fail-open, no-creds 401, canvas origin
match, wrong origin 401, empty origin rejected, localhost default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:09:16 -07:00
Hongming Wang
87d330cf15 Merge pull request #198 from Molecule-AI/fix/a2a-compat-batch-173-174-175
fix(a2a): A2A protocol compliance — cancel(), capabilities, push store (closes #173 #174 #175)
2026-04-15 11:02:11 -07:00
Hongming Wang
1634594ca0 Merge branch 'main' into fix/a2a-compat-batch-173-174-175 2026-04-15 11:01:54 -07:00
Hongming Wang
adbcb4fe3b Merge pull request #200 from Molecule-AI/fix/issue-190-templates-import-auth
fix(security): #190 — gate POST /templates/import behind AdminAuth
2026-04-15 11:00:54 -07:00
Hongming Wang
7a164f98d3 fix(security): #190 — gate POST /templates/import behind AdminAuth
Closes #190 (HIGH). The route was registered on the root router with no
auth middleware, letting any unauthenticated caller write arbitrary files
into configsDir via a crafted template. Same vulnerability class as #164
(bundles/import) and path-traversal risk same as #103 (org/import).

One-line gate via the existing wsAdmin pattern. Lazy-bootstrap fail-open
preserved for fresh installs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:00:49 -07:00
Hongming Wang
fb8ba8e3d3 Merge pull request #197 from Molecule-AI/fix/ci-python-bypass-setup-python
fix(ci): apply bypass-setup-python to main (missed in #186 squash)
2026-04-15 10:58:27 -07:00
Hongming Wang
dde97433ed fix(ci): apply user's bypass-setup-python to main (missed in #186 squash-merge)
#186's squash-merge commit (3ff40c4b) took 15e15a21 (AGENT_TOOLSDIRECTORY
override) but missed a6cfc5f (bypass setup-python entirely) which was
pushed to the PR branch after the merge was initiated. The merge
commit still has the old setup-python@v5 job config.

Applies a6cfc5f's ci.yml verbatim via git checkout. Restores the
Homebrew-python3.11 bypass path that the user prototyped. No other
changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:58:22 -07:00
Backend Engineer
4cea1c6478 fix(a2a): cancel() event, stateTransitionHistory capability, wire push store (#173 #174 #175)
#173 — implement cancel() in LangGraphA2AExecutor: emits
TaskStatusUpdateEvent(state=canceled, final=True) so clients see the
state transition rather than silence. Removes pragma: no cover.
Test: test_cancel_emits_canceled_event.

#174 — add stateTransitionHistory=True to AgentCapabilities in main.py
so microsoft/agent-framework clients know they can request full task
history via the A2A protocol.

#175 — wire InMemoryPushNotificationConfigStore and PushNotificationSender
into DefaultRequestHandler so the advertised pushNotifications capability
is backed by a real store. Both classes live in a2a.server.tasks (a2a-sdk
0.3.25); import confirmed by probe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:58:10 +00:00
Hongming Wang
53299828a4 Merge pull request #187 from Molecule-AI/fix/issue-179-trusted-proxies
fix(router): SetTrustedProxies(nil) closes rate-limit bypass via X-Forwarded-For (#179)
2026-04-15 10:55:01 -07:00
Hongming Wang
5f56e6d555 Merge pull request #192 from Molecule-AI/fix/issue-170-secret-delete-auth
fix: require workspace auth on DELETE /secrets/:key (#170)
2026-04-15 10:54:58 -07:00
Hongming Wang
f2b34e590b Merge pull request #189 from Molecule-AI/fix/issue-178-security-auditor-cron
fix(template): revert Security Auditor cron to 2x/day (closes #178)
2026-04-15 10:54:55 -07:00
Hongming Wang
8e93c4ea73 Merge branch 'main' into fix/issue-178-security-auditor-cron 2026-04-15 10:54:45 -07:00
Hongming Wang
96e8cfbd4c Merge branch 'main' into fix/issue-170-secret-delete-auth 2026-04-15 10:54:36 -07:00
Hongming Wang
271e3427c4 Merge branch 'main' into fix/issue-179-trusted-proxies 2026-04-15 10:54:21 -07:00
Hongming Wang
3ff40c4b68 chore(ci): migrate all jobs to self-hosted macOS arm64 runner
* chore(ci): migrate all jobs to self-hosted macOS arm64 runner

Switches every job in `ci.yml` and `publish-platform-image.yml` from
`ubuntu-latest` to `[self-hosted, macos, arm64]` to avoid GitHub-hosted
minute rate limits. All jobs run on a single Apple-silicon self-hosted
runner registered at the Molecule-AI org level.

Notable non-trivial adaptations (macOS runners can't use `services:` and
some GHA marketplace actions are Linux-only):

- e2e-api: `services: postgres/redis` replaced with inline `docker run`
  steps. Ports remapped to 15432/16379 to avoid collision with anything
  the host may already expose on the standard ports. Containers are named
  (`molecule-ci-postgres` / `molecule-ci-redis`) and torn down in an
  `if: always()` step. Postgres readiness is still gated on pg_isready
  via `docker exec`.
- shellcheck: `ludeeus/action-shellcheck` is a Docker action, Linux-only.
  Replaced with a direct `shellcheck` invocation (pre-installed on the
  runner) that scans `tests/e2e/*.sh` with `--severity=warning`.
- publish-platform-image: added `docker/setup-qemu-action@v3` and an
  explicit `platforms: linux/amd64` on both `docker/build-push-action`
  invocations. The runner is arm64 but Fly tenant machines pull amd64,
  so QEMU-emulated cross-arch builds are required. GHA cache-from/cache-to
  behavior is unchanged.

Runner prereqs (one-time host setup):
- Docker Desktop installed and running (for e2e-api + image publish)
- `shellcheck` on PATH
- `docker` on PATH
- Go / Node / gh / Python are installed via setup-* actions per job

* fix(ci): set AGENT_TOOLSDIRECTORY for python-lint on self-hosted runner

setup-python@v5 defaults to /Users/runner/hostedtoolcache which doesn't
exist on the hongming-claw self-hosted runner. AGENT_TOOLSDIRECTORY tells
the action to use a writable path under the runner user's home directory.

Fixes the only failing job in CI run 24469156329 on PR #186.

---------

Co-authored-by: Hongming Wang <HongmingWang-Rabbit@users.noreply.github.com>
2026-04-15 10:48:27 -07:00
Backend Engineer
734c0f6bcf fix: require workspace auth on DELETE /secrets/:key (#170)
The route wsAuth.DELETE("/secrets/:key", sech.Delete) was already moved
inside the WorkspaceAuth group in a prior commit, closing the CWE-306
unauthenticated-delete vector. This commit adds two regression tests to
lock that in:

- TestWorkspaceAuth_Issue170_SecretDelete_NoBearer_Returns401: workspace
  with live tokens, no bearer header → 401 (blocks the attack).
- TestWorkspaceAuth_Issue170_SecretDelete_FailOpen_NoTokens: workspace
  with no tokens (bootstrap/legacy) → 200 (fail-open preserved).

Mirrors the TestAdminAuth_Issue120_* and TestWorkspaceAuth_C4_C8_* patterns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:42:08 +00:00
Hongming Wang
7b0387e9be fix(template): revert Security Auditor cron to 2x/day — closes #178
Every-10-min cadence introduced in PR #159 increased Security Auditor
from 2 runs/day to 144 runs/day (144x). Combined with PM, Research Lead,
Dev Lead, and other hourly evolution-lever crons, this is the likely
root cause of the P0 OAuth quota exhaustion (#160, resets Apr 17 23:00 UTC).

Restored: cron_expr 7 6,18 * * * (twice daily, 12-hour interval)
Schedule name updated to match new cadence.
Audit prompt content (DAST teardown, PM routing, PM deliverable) retained.
2026-04-15 17:33:54 +00:00
Hongming Wang
6582ece523 Merge pull request #188 from Molecule-AI/fix/e2e-auth-headers-post-167
fix(tests): e2e auth headers for /events + /bundles/export (post #167)
2026-04-15 10:33:44 -07:00
Hongming Wang
8f23908304 fix(tests): add auth headers to e2e GET /events + /bundles/export (post #167)
PR #167 gated /events and /bundles/export/:id behind AdminAuth. The e2e
script's 3 calls to these routes were unauthenticated and broke when the
runner picked them up for the first time on PR #186 (self-hosted runner
migration). Same admin-gate contract, same fix pattern as the #99/#110
e2e hotfixes.

POST /bundles/import is left unauthenticated because by that point in
the script both workspaces have been deleted and #110 revoked their
tokens, so HasAnyLiveTokenGlobal=0 and AdminAuth fails-open.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:33:38 -07:00
Backend Engineer
06e02a310c fix(router): call SetTrustedProxies(nil) to close IP-spoofing bypass (#179)
Without this call Gin's default trusts all X-Forwarded-For headers, letting
any caller rotate their effective IP and bypass per-IP rate limiting.
SetTrustedProxies(nil) forces c.ClientIP() to always return the real
TCP RemoteAddr.

Adds two regression tests: one documenting the pre-fix bypass, one
asserting the spoofed header is ignored after the fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:32:54 +00:00
Hongming Wang
9979845391 Merge pull request #182 from Molecule-AI/fix/issue-177-documentation-specialist-dir
fix(template): add missing documentation-specialist/system-prompt.md (closes #177)
2026-04-15 10:31:02 -07:00
Hongming Wang
4bd377fe28 Merge branch 'main' into fix/issue-177-documentation-specialist-dir 2026-04-15 10:30:49 -07:00
Hongming Wang
762b2f14a2 Merge pull request #185 from Molecule-AI/fix/issue-180-approvals-auth
fix(security): gate GET /approvals/pending behind AdminAuth (#180)
2026-04-15 10:30:38 -07:00
Backend Engineer
9122d6aeea fix(security): gate GET /approvals/pending behind AdminAuth (#180)
GET /approvals/pending was registered on the open router with no
middleware, allowing any unauthenticated caller to enumerate all pending
approvals across every workspace on the platform.

Fix: add inline middleware.AdminAuth(db.DB) to the route registration,
matching the pattern used in PR #167 for bundles, events, and viewport.

The three workspace-scoped approvals routes (POST/GET /approvals,
POST /approvals/:id/decide) were already correctly behind WorkspaceAuth
inside the wsAuth group — no change needed there.

Tests: two new regression tests in wsauth_middleware_test.go —
  TestAdminAuth_Issue180_ApprovalsListing_NoBearer_Returns401
  TestAdminAuth_Issue180_ApprovalsListing_FailOpen_NoTokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:25:09 +00:00
Hongming Wang
b2d7c3407f fix(template): add missing documentation-specialist/system-prompt.md (closes #177) 2026-04-15 17:23:38 +00:00
Hongming Wang
aded2e3762 Merge pull request #159 from Molecule-AI/chore/orchestrator-worker-split
chore(template): orchestrator/worker split — leaders pulse every 5min, workers stay reactive (supersedes #158)
2026-04-15 09:53:51 -07:00