Commit Graph

4144 Commits

Author SHA1 Message Date
Hongming Wang
28472f0d2d
Merge pull request #2764 from Molecule-AI/auto-sync/main-f42feb4e
chore: sync main → staging (auto, ff to f42feb4e)
2026-05-04 19:51:06 +00:00
molecule-ai[bot]
f42feb4ed7
Merge pull request #2763 from Molecule-AI/staging
staging → main: auto-promote 99e7f13
2026-05-04 19:35:21 +00:00
Hongming Wang
99e7f13149
Merge pull request #2762 from Molecule-AI/fix/preflight-env-warn-not-fail
fix(preflight): downgrade required_env + auth_token failures to warnings
2026-05-04 19:23:06 +00:00
Hongming Wang
6488ba09e7 fix(preflight): downgrade required_env + auth_token failures to warnings
Preflight was hard-failing the workspace boot when required env vars or
legacy auth_token_files were missing, raising SystemExit(1) before
main.py's PR #2756 try/except could mount the not-configured handler.
Result: codex/openclaw workspaces launched without OPENAI_API_KEY were
INVISIBLE — `/.well-known/agent-card.json` never returned 200, the bench
timed out at 600s, canvas had no actionable signal. PR #2756 fixed half
the puzzle (decouple agent-card from adapter.setup() failure); this
fixes the other half (decouple from preflight failure).

Caught by bench-provision-time run 25335853189 on 2026-05-04: codex and
openclaw both timed_out at 609s while claude-code (whose default model
needs no env) hit 86.7s on the same AMI. Hermes hit 147s because hermes
config doesn't declare top-level required_env.

After this change:
- Missing required_env: WARN (operator sees it in boot logs); workspace
  proceeds to adapter.setup() which raises with the same env-name detail;
  PR #2756's try/except mounts the not-configured handler;
  /.well-known/agent-card.json serves 200; JSON-RPC POST / returns
  -32603 "agent not configured" with the env-name in `error.data`.
- Missing auth_token_file (legacy path): same treatment.
- Other preflight failures (runtime adapter not installable, invalid
  A2A port) STAY as fails — those are structural, the workspace truly
  can't run.
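
The warning-versus-failure split listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's preflight code: `PreflightReport`, `run_preflight`, and the `failures` field are hypothetical names; only `report.ok` and `report.warnings` are named in this commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class PreflightReport:
    """Hypothetical report shape; only `ok` and `warnings` appear in the commit."""
    warnings: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Only structural failures block boot; warnings never do.
        return not self.failures


def run_preflight(required_env: Iterable[str], env: Dict[str, str], a2a_port: int) -> PreflightReport:
    report = PreflightReport()
    for name in required_env:
        if not env.get(name):
            # Downgraded to a warning: visible in boot logs, boot continues.
            report.warnings.append(f"missing required env: {name}")
    if not 1 <= a2a_port <= 65535:
        # Structural problem: the workspace cannot run at all.
        report.failures.append(f"invalid A2A port: {a2a_port}")
    return report
```

With this shape, a missing `OPENAI_API_KEY` leaves `report.ok` true and surfaces the env name in `report.warnings`, while an invalid port still fails the report.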

Updated 4 existing tests that asserted `report.ok is False` on
required_env / auth_token misses to assert `report.ok is True` and
check `report.warnings` instead. All 31 preflight tests pass; full
suite 1664 pass + 1 unrelated flake on staging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:20:34 -07:00
Hongming Wang
8176b5142d
Merge pull request #2759 from Molecule-AI/auto-sync/main-31427776
chore: sync main → staging (auto, ff to 31427776)
2026-05-04 18:03:49 +00:00
Hongming Wang
314277769e
Merge pull request #2758 from Molecule-AI/staging
staging → main: auto-promote 4f9e3fe
2026-05-04 10:53:03 -07:00
e0b567e992
Merge pull request #2757 from Molecule-AI/fix/memory-v2-wiring-real-tests
Memory v2 wiring: replace decorative tests with real integration
2026-05-04 17:43:09 +00:00
Hongming Wang
707e4d7342 Memory v2 wiring: replace decorative tests with real integration
Self-review of #2755 found two tests that didn't actually exercise the
production code path:

- TestNamespaceCleanupFn_NamespaceFormat asserted
  "workspace:" + "abc-123" == "workspace:abc-123" — a compile-time
  invariant, not runtime behavior. Provided no protection if the closure
  in Bundle.NamespaceCleanupFn ever stopped using that prefix.

- TestNamespaceCleanupFn_FailureLogsButReturns built a *parallel*
  cleanup closure inline with errors.New, then invoked the parallel
  closure. The production closure was never exercised. A regression
  in NamespaceCleanupFn (e.g. forgetting the deferred recover, calling
  the plugin without nil-check) would still pass this test.

Replaced both with real integration:

- TestNamespaceCleanupFn_HitsPluginAtCorrectNamespace spins up
  httptest.Server, points MEMORY_PLUGIN_URL at it, calls Build(),
  invokes the production closure, and asserts the server actually
  saw DELETE /v1/namespaces/workspace:abc-123.

- TestNamespaceCleanupFn_PluginErrorDoesNotPanic exercises the
  failure path for real: server returns 500 on DELETE, closure must
  log and return without propagating. defer-recover is belt-and-
  suspenders since production calls this from a for-loop in
  workspace_crud.go that has no recover.

Couldn't ship with #2755 because the merge queue locks the branch
once enqueued. Following up now that #2755 is merged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:38:59 -07:00
Hongming Wang
4f9e3feece
Merge pull request #2756 from Molecule-AI/fix/agent-card-decouple-from-setup
fix(runtime): decouple agent-card readiness from adapter.setup()
2026-05-04 17:32:02 +00:00
Hongming Wang
10752fe330
Merge pull request #2755 from Molecule-AI/fix/memory-v2-main-wiring
Memory v2 fixup CRITICAL: wire plugin from main.go (was fully dormant)
2026-05-04 17:31:01 +00:00
Hongming Wang
8f7122a9b6
Merge branch 'staging' into fix/agent-card-decouple-from-setup
2026-05-04 10:24:41 -07:00
Hongming Wang
b3982035b3
Merge branch 'staging' into fix/memory-v2-main-wiring
2026-05-04 10:24:31 -07:00
Hongming Wang
d1122f8d28 fix(build): register not_configured_handler in TOP_LEVEL_MODULES
The wheel-build drift gate caught the new module added in this PR —
without registering it, the published wheel would ship `import
not_configured_handler` un-rewritten, which would `ModuleNotFoundError`
at runtime under `molecule_runtime.main`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:24:02 -07:00
Hongming Wang
4b35d25d86 fix(runtime): decouple agent-card readiness from adapter.setup()
Today, if `adapter.setup()` raises (most often: an LLM credential is
missing/rotated), main.py crashes before the agent-card route is mounted.
start.sh restart-loops, /.well-known/agent-card.json never returns 200,
and the workspace is invisible to the bench/canvas — operators see
"stuck booting forever" with no clear error to act on.

The agent-card is a static capability advertisement (name, version,
skills, supported protocols). It doesn't need a working LLM. Coupling
its mount to setup() conflates *availability* ("am I up?") with
*configuration* ("can I actually answer?"). They're different concerns.

This change:
- Builds AgentCard from `config.skills` (static names from config.yaml)
  BEFORE adapter.setup(), so the route mounts independent of setup state.
- Wraps setup() + create_executor in try/except. On success, mounts
  the real DefaultRequestHandler with rich loaded_skills metadata
  swapped into the card in-place. On failure, mounts a JSON-RPC
  handler that returns -32603 "agent not configured" with the
  setup() exception in error.data.
- Heartbeat keeps running on misconfigured boots so the platform
  marks the workspace as reachable-but-misconfigured rather than
  crash-looping. Operators redeploy with corrected env without
  chasing a restart loop.
- initial_prompt and idle_loop are skipped on misconfigured boots —
  they self-fire to /, which would land in -32603 anyway, and the
  marker would be consumed on the first useless attempt.

Bench impact (RFC #388 strict <120s): codex/openclaw bench timeouts
were the agent-card-never-returns-200 symptom. With this fix those
runtimes serve the card immediately on EC2 boot, so the bench
measures infrastructure cold-start (claude-code class: ~50–80s)
instead of credential-coupled boot.

Adds workspace/not_configured_handler.py (factory + module-level so
behavior is unit-testable; main.py is `# pragma: no cover`) and
workspace/tests/test_not_configured_handler.py (6 tests covering
status code, JSON-RPC envelope shape, id-echo, malformed-body
fallback, reason surfacing, batch-body safety).
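
The envelope behavior those tests cover (id echo, malformed-body fallback, reason surfacing, batch-body safety) might look roughly like the sketch below. The function name `not_configured_response`, the choice to return a null id for batch bodies, and the use of a plain string for `error.data` are assumptions for illustration; the commit only states the -32603 code, the "agent not configured" message, and that the setup() exception lands in `error.data`.

```python
import json
from typing import Any, Dict


def not_configured_response(raw_body: bytes, reason: str) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 error envelope for an agent that failed setup."""
    request_id = None
    try:
        parsed = json.loads(raw_body)
        # Echo the id only for a single request object. Batch (list) bodies
        # and malformed bodies fall back to a null id (assumed behavior).
        if isinstance(parsed, dict):
            request_id = parsed.get("id")
    except ValueError:
        # Covers JSONDecodeError and UnicodeDecodeError alike.
        pass
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": -32603,
            "message": "agent not configured",
            "data": reason,
        },
    }
```

Keeping the builder a pure function of the request body and the failure reason is what makes it unit-testable without starting the server.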

All 1665 existing workspace tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:22:31 -07:00
Hongming Wang
46731729d4 Memory v2 fixup Critical: wire plugin from main.go (was fully dormant)
Caught during continued review: the entire v2 plugin system shipped
in PRs #2729-#2742 + #2744-#2751 was never actually invoked because
main.go and router.go don't construct the plugin client/resolver or
attach the WithMemoryV2 / WithNamespaceCleanup hooks.

Operators setting MEMORY_PLUGIN_URL=... saw zero behavior change
because nothing read it. Every fixup we shipped (idempotency, verify
mode, expires_at validation, audit JSON, namespace cleanup, O(N)
export, boot E2E) was also dormant for the same reason.

Root cause: when a multi-handler feature lands across many PRs, none
of them are individually responsible for wiring main.go — and the
master-task-tracking issue didn't gate-check that the wiring landed.
Add main.go integration to every multi-handler RFC checklist.

What ships:

  * internal/memory/wiring/wiring.go: new package that constructs the
    plugin client + resolver from MEMORY_PLUGIN_URL once. Returns nil
    when unset (preserves zero-config legacy behavior). Probes
    /v1/health at boot but doesn't fail-closed — the MCP layer's
    circuit breaker handles ongoing unavailability.

  * internal/memory/wiring/wiring_test.go: 6 tests covering the
    nil/non-nil bundle paths + the namespace-cleanup closure
    contract (nil-safe, format-stable, failure-tolerant).

  * cmd/server/main.go: imports memwiring, calls Build(db.DB) once
    after WorkspaceHandler creation, attaches WithNamespaceCleanup,
    threads the bundle through router.Setup.

  * internal/router/router.go: Setup signature gains *memwiring.Bundle
    param. Inside, attaches WithMemoryV2 to AdminMemoriesHandler and
    MCPHandler when the bundle is non-nil.
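
The nil-bundle gating described in the list above can be sketched in Python (the real code is Go). `Bundle`, `build`, and `attached_hooks` are hypothetical stand-ins; the sketch leaves out the /v1/health probe and only shows that an unset MEMORY_PLUGIN_URL yields no bundle and therefore no hooks.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Bundle:
    """Hypothetical stand-in for the wiring bundle (client + resolver)."""
    plugin_url: str


def build(env: Mapping[str, str]) -> Optional[Bundle]:
    url = env.get("MEMORY_PLUGIN_URL", "").strip()
    if not url:
        # Unset: preserve the zero-config legacy behavior.
        return None
    return Bundle(plugin_url=url)


def attached_hooks(bundle: Optional[Bundle]) -> List[str]:
    # Hooks are attached only when a bundle exists.
    if bundle is None:
        return []
    return ["WithNamespaceCleanup", "WithMemoryV2"]
```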

After this, the v2 plugin is reachable end-to-end:

  Operator sets MEMORY_PLUGIN_URL → memwiring.Build instantiates client +
  resolver → WorkspaceHandler gets cleanup hook → router wires
  AdminMemoriesHandler + MCPHandler with WithMemoryV2 → MCP tool
  calls (commit_memory_v2, search_memory, etc.) actually do
  something → admin export/import respects MEMORY_V2_CUTOVER.

Prerequisite for #292 (staging verification) — without this, the
operator runbook's step 2 (set MEMORY_PLUGIN_URL, observe behavior)
silently no-ops.

Verified: all 9 affected test packages still green
(memory/{client,contract,e2e,namespace,pgplugin,wiring}, handlers,
router, plus the build).
2026-05-04 10:22:30 -07:00
Hongming Wang
6dc2d907a2
Merge pull request #2754 from Molecule-AI/auto-sync/main-849bc973
chore: sync main → staging (auto, ff to 849bc973)
2026-05-04 17:19:03 +00:00
molecule-ai[bot]
849bc97349
Merge pull request #2753 from Molecule-AI/staging
staging → main: auto-promote e13dcab
2026-05-04 17:08:11 +00:00
Hongming Wang
e13dcab5e0
Merge pull request #2749 from Molecule-AI/fix/memory-v2-i3-export-on
Memory v2 fixup I3: admin export O(workspaces) → O(N_roots+1)
2026-05-04 16:49:43 +00:00
Hongming Wang
721010307c
Merge pull request #2752 from Molecule-AI/auto-sync/main-73a949bb
chore: sync main → staging (auto, ff to 73a949bb)
2026-05-04 16:49:23 +00:00
Hongming Wang
9f47ecf86e
Merge branch 'staging' into fix/memory-v2-i3-export-on
2026-05-04 09:44:37 -07:00
Hongming Wang
ebc20794f3 fix(admin-memories): include each member's private namespace in export
ReadableNamespaces(rootID) returns {workspace:rootID, team:rootID,
org:rootID} — the workspace: namespace it surfaces is the root's only.
The I3 batching change resolved namespaces once per root which silently
dropped every child workspace's private memories from admin export
(workspace:childID never reached the plugin search).

Keep the per-root batching win for team:/org:/custom: namespaces;
inject each member's workspace:<id> + owner mapping explicitly so
coverage matches the legacy per-workspace iteration.

Cost stays at 1 SQL + N_roots resolver + 1 plugin search.
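
A rough Python sketch of the namespace set for one root group after this fix. `export_namespaces` is a hypothetical helper, custom: namespaces are omitted, and the sketch assumes the root workspace is itself one of the members.

```python
from typing import List


def export_namespaces(root_id: str, member_ids: List[str]) -> List[str]:
    # Shared namespaces are resolved once per root.
    namespaces = [f"team:{root_id}", f"org:{root_id}"]
    # Every member's private namespace is injected explicitly, so child
    # workspaces are no longer dropped from the export.
    for member_id in member_ids:
        private = f"workspace:{member_id}"
        if private not in namespaces:
            namespaces.append(private)
    return namespaces
```

For three workspaces under one root this yields five namespaces (3 workspace + team + org), matching the count the updated batching test expects.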

Test changes:
- New TestExport_IncludesEveryMembersPrivateNamespace uses a
  per-workspace resolver stub (mirrors real behaviour) and asserts
  every member's workspace:<id> reaches the plugin search AND that
  children's private memories appear in the response with correct
  owner attribution. Verified to FAIL on the pre-fix code.
- TestExport_BatchesPluginCallsByRoot updated to expect 5 namespaces
  (3 workspace + team + org) instead of 3 — it had pinned the buggy
  3-namespace behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:44:06 -07:00
Hongming Wang
73a949bb5c
Merge pull request #2737 from Molecule-AI/staging
staging → main: auto-promote f74fff6
2026-05-04 09:37:55 -07:00
Hongming Wang
281cb04163
Merge pull request #2751 from Molecule-AI/fix/memory-v2-opt2-boot-e2e
Memory v2 fixup Opt-2: real-subprocess boot E2E
2026-05-04 16:27:56 +00:00
Hongming Wang
fe7ff5440d Memory v2 fixup Opt-2: add E2E.md operator runbook
Companion to boot_e2e_test.go (just merged). Documents:
  - When the E2E suite runs (build tag + env var)
  - Local run with docker postgres
  - CI integration example (label-gated workflow step)
  - What each test pins
  - Explicit gap list (migration drift, recovery, TTL)
2026-05-04 09:24:16 -07:00
Hongming Wang
5b0a75ab73 Memory v2 fixup Optional-2: real-subprocess boot E2E
Self-review #293. PR-11's E2E test uses sqlmock + httptest —
integration, not E2E. This adds the actual real-subprocess test:
build the binary with `go build`, start it pointing at real postgres,
drive HTTP via the real client.

What in-process tests miss that this catches:
  - Binary build / boot-path panics (env var typos, mixed-key
    interface bugs that only surface when start() runs)
  - Wire encoding bugs that sqlmock smooths over (the pq.Array
    regression from PR-3 development would have been caught here)
  - HTTP+TCP-socket edge cases
  - Real upsert behavior under postgres ON CONFLICT (C1 fix)

Build-tag gated so default CI doesn't require docker:
  go test -tags memory_plugin_e2e -v ./cmd/memory-plugin-postgres/

Tests skip silently when MEMORY_PLUGIN_E2E_DB is unset.

Three tests:
  1. TestE2E_BootAndHealth — capabilities advertised correctly
  2. TestE2E_FullCommitSearchForgetRoundTrip — full agent flow
  3. TestE2E_IdempotencyKey — C1 upsert against real postgres

Plus E2E.md operator runbook with docker quickstart + CI integration
example + explicit statement of what's still uncovered (migration
drift, recovery scenarios, TTL eviction over real time).
2026-05-04 09:23:46 -07:00
Hongming Wang
a6dadc7ee0
Merge pull request #2750 from Molecule-AI/fix/memory-v2-i5-namespace-cleanup
Memory v2 fixup I5: workspace purge cleans up plugin namespace
2026-05-04 16:23:41 +00:00
Hongming Wang
5e52a0fdad
Merge pull request #2748 from Molecule-AI/docs/memory-v2-fixup-docs
Memory v2 docs update: idempotency key + verify mode + cutover runbook
2026-05-04 16:21:02 +00:00
Hongming Wang
6b445aae2d Memory v2 fixup I5: workspace purge cleans up plugin namespace
Self-review #291. When a workspace is hard-purged, its
`workspace:<id>` namespace stays in the plugin storage. Over time
deleted workspaces accumulate as orphan namespaces.

Fix: optional namespaceCleanupFn hook on WorkspaceHandler. The
purge path (workspace_crud.go ~line 520) iterates each purged id
and calls the hook best-effort. main.go wires the hook to
plugin.DeleteNamespace when MEMORY_PLUGIN_URL is set; operators
who haven't enabled the plugin keep the no-op default.
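
The best-effort hook loop can be sketched in Python (the real code is Go). `run_namespace_cleanup` is a hypothetical name, and passing the bare workspace id to the hook is an assumption; the sketch only shows the nil guard, the per-id call, and the fact that a failing hook does not abort the purge.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def run_namespace_cleanup(
    purged_ids: Iterable[str],
    cleanup_fn: Optional[Callable[[str], None]] = None,
) -> int:
    """Call the hook once per purged id; return how many calls succeeded."""
    if cleanup_fn is None:
        # No plugin configured: keep the no-op default.
        return 0
    cleaned = 0
    for workspace_id in purged_ids:
        try:
            cleanup_fn(workspace_id)
            cleaned += 1
        except Exception as exc:
            # Best-effort: log and continue with the remaining ids.
            logger.warning("namespace cleanup failed for %s: %s", workspace_id, exc)
    return cleaned
```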

Why a hook (not direct plugin import):
  * Keeps WorkspaceHandler decoupled from the memory contract
    package (easier to test, smaller blast radius if the contract
    bumps)
  * Tests inject a captureCleanupHook stub without standing up a
    real plugin client
  * Production wiring stays a one-liner in main.go

What gets cleaned up:
  * `workspace:<id>` for each purged workspace
  * NOT `team:<root>` / `org:<root>` — those may still be
    referenced by other workspaces under the same root, so dropping
    them on a single workspace's purge would orphan team/org data
    for the survivors. Operator can purge those manually after
    confirming the entire root is gone.

What stays untouched:
  * Soft-removed workspaces (status='removed', no ?purge=true). The
    grace window is by design — the data should still be there if
    the operator unremoves.

Tests:
  * TestWithNamespaceCleanup_DefaultIsNil pins the safe default
  * TestWithNamespaceCleanup_NilStaysNil pins the explicit-nil case
  * TestWithNamespaceCleanup_AttachesFn pins the wiring
  * TestPurge_CallsCleanupHookPerID exercises the per-id loop body
  * TestPurge_NilHookIsSkipped pins the nil guard

A full end-to-end Delete-handler test requires mocking broadcaster
+ provisioner + descendant SQL chain, which is out-of-scope for a
single fixup. Integration coverage for the wired path lives in
PR-11's E2E swap test (#293 follow-up).
2026-05-04 09:20:37 -07:00
Hongming Wang
4f3d51bd61
Merge branch 'staging' into docs/memory-v2-fixup-docs
2026-05-04 09:18:49 -07:00
Hongming Wang
9a64aeaa2c Memory v2 fixup I3: admin export O(workspaces) → O(N_roots+1)
Self-review #289. The previous exportViaPlugin ran one resolver CTE
walk + one plugin search PER WORKSPACE. For a 1000-workspace tenant
that's 1000× of each, mostly redundant — workspaces sharing a
team/org root see identical readable namespaces.

New strategy:
  1. Single SQL pass returns each workspace + its computed root_id
     via a recursive CTE (loadWorkspacesWithRoots).
  2. Group by root → unique tree count is typically << workspace
     count.
  3. Resolver runs ONCE per root (any member sees the same readable
     list).
  4. Build the union of all root namespaces; single plugin.Search
     call.
  5. Map each memory back to a workspace_name via pickOwnerForNamespace
     (workspace:<id> → matching member; team:* / org:* / custom:* →
     canonical first member of root group).

Net call cost: 1 SQL + N_roots resolver + 1 plugin call (vs
N_workspaces × resolver + N_workspaces × plugin in the old code).
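
The call-count arithmetic can be checked with a small Python sketch. `group_by_root` and `call_counts` are hypothetical helpers, and the input rows stand in for the (workspace, root_id) pairs the single SQL pass returns.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_root(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (workspace_id, root_id) pairs by their root."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for workspace_id, root_id in rows:
        groups[root_id].append(workspace_id)
    return dict(groups)


def call_counts(rows: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    n_workspaces = len(rows)
    n_roots = len(group_by_root(rows))
    return {
        # Old code: one resolver walk and one plugin search per workspace.
        "old": {"resolver": n_workspaces, "plugin_search": n_workspaces},
        # New code: one SQL pass, one resolver call per root, one search.
        "new": {"sql": 1, "resolver": n_roots, "plugin_search": 1},
    }
```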

Tests:
  * TestExport_BatchesPluginCallsByRoot pins the new behavior
    explicitly: 3 workspaces under 1 root → exactly 1 plugin search
    (was 3 with the old code).
  * TestPickOwnerForNamespace covers all five attribution cases:
    workspace:<id> match, workspace:<id> no-match-fallback, team:*,
    org:*, custom:* → first-member-of-root-group; plus empty-members
    fallback.
  * All 9 existing TestExport_* / TestImport_* / TestPickOwner /
    TestNamespaceKindFromLegacyScope / TestSkipImport / etc. tests
    remain green (verified with -run "Export").

The legacy DB path (when MEMORY_V2_CUTOVER unset) is unchanged.
2026-05-04 09:17:30 -07:00
Hongming Wang
2d783b5ca6 Memory v2 docs update: idempotency key + verify mode + cutover runbook
Updates plugin-author and operator docs to reflect the four fixup
PRs (C1, C2, I1, I4) for self-review findings.

Stacked on C1+C2 so the docs reference behavior that lands in the
same wave; rebases to staging once those merge.

What changes:

  * docs/memory-plugins/README.md
    - New "Memory idempotency" section explaining MemoryWrite.id
      contract: omit → plugin generates UUID; supplied → upsert
    - "Replacing the built-in plugin" rewritten as a 6-step
      operator runbook with concrete commands for -dry-run / -apply
      / -verify / MEMORY_V2_CUTOVER, including the failure path
      ("if -verify reports mismatches, do not flip the cutover flag")
    - Added link to new CHANGELOG.md

  * docs/memory-plugins/testing-your-plugin.md
    - New TestMyPlugin_IDIsIdempotencyKey example: write same id
      twice, assert single row + updated content
    - "What the harness does NOT cover" expanded with two new
      operational gates: backfill twice → no double; verify-mode
      reports zero mismatches

  * docs/memory-plugins/pinecone-example/README.md
    - Wire-mapping table updated: id (caller-supplied) → Pinecone
      vector id (upsert); id (omitted) → plugin-generated UUID
    - Production-hardening checklist gained an idempotency-key item

  * docs/memory-plugins/CHANGELOG.md (new)
    - Captures the four fixup PRs in one place with severity-ordered
      summary, plugin-author action items, and remaining open
      follow-ups (#289, #291, #293) for transparency

No code changes. Docs-only PR.
2026-05-04 09:08:28 -07:00
Hongming Wang
6fc328ef44
Merge pull request #2747 from Molecule-AI/fix/memory-v2-c2-backfill-verify
Memory v2 fixup C2: backfill -verify mode (parity check)
2026-05-04 16:08:27 +00:00
Hongming Wang
bb3212ad37
Merge branch 'staging' into fix/memory-v2-c2-backfill-verify
2026-05-04 09:08:21 -07:00
Hongming Wang
1986260603
Merge remote-tracking branch 'origin/fix/memory-v2-c1-backfill-idempotent' into docs/memory-v2-fixup-docs
2026-05-04 09:05:11 -07:00
Hongming Wang
d297e75fc9
Merge pull request #2746 from Molecule-AI/fix/memory-v2-i1-i4-small
Memory v2 fixup I1+I4: expires_at validation + audit JSON marshal
2026-05-04 16:05:02 +00:00
Hongming Wang
3ae0513209
Merge pull request #2744 from Molecule-AI/fix/memory-v2-c1-backfill-idempotent
Memory v2 fixup C1: backfill idempotency via MemoryWrite.id
2026-05-04 16:04:54 +00:00
Hongming Wang
4b6373861c Memory v2 fixup C2: backfill -verify mode (parity check)
Self-review found a deliverable missing from PR-7's task spec. Operators
had no way to confirm that a -apply produced search results equivalent
to the legacy agent_memories direct queries; this PR ships that.

Usage:
  memory-backfill -verify                      # 50-workspace random sample
  memory-backfill -verify -verify-sample=200   # bigger sample
  memory-backfill -verify -workspace=<uuid>    # one specific workspace

Algorithm:
  1. Pick N random workspaces (or use -workspace if specified)
  2. For each: query agent_memories direct, query plugin search via
     the workspace's readable namespace list
  3. Multiset-compare contents: every legacy row must have a matching
     plugin row. Plugin having MORE rows is OK (team-shared content
     may be visible from sibling workspaces).
  4. Print mismatches with content excerpt; non-zero mismatches/errors
     yields a non-zero exit so CI can gate cutover.
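
Step 3 of the algorithm, the one-sided multiset comparison, can be sketched in Python with `collections.Counter`. `missing_from_plugin` is a hypothetical name and the sketch compares content strings only; the actual tool is Go and may compare more fields.

```python
from collections import Counter
from typing import Iterable, List


def missing_from_plugin(legacy: Iterable[str], plugin: Iterable[str]) -> List[str]:
    """Return legacy contents that have no matching plugin row."""
    # Counter subtraction keeps only positive counts, so extra plugin rows
    # (for example team-shared content) never count as mismatches.
    deficit = Counter(legacy) - Counter(plugin)
    return sorted(deficit.elements())
```

An empty result means the legacy rows are a sub-multiset of the plugin rows; anything else is a mismatch that should produce a non-zero exit.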

SQL:
  - Sampling uses ORDER BY random() LIMIT N (TABLESAMPLE has surprising
    distribution at small populations).
  - Filters out status='removed' workspaces.

Test coverage:
  * pickWorkspaceSample: single-ws short-circuit, random sampling,
    query error, scan error
  * queryLegacyMemories: happy path, error path
  * verifyParity:
      - all match → 1 match, 0 mismatch
      - missing-from-plugin → 1 mismatch with content excerpt
      - plugin-extra rows → 1 match (legacy is subset of plugin)
      - legacy query error → 1 error counter
      - resolver error → 1 error counter
      - plugin search error → 1 error counter
      - no readable namespaces + empty legacy → match
      - no readable namespaces + non-empty legacy → mismatch
      - pickSample error → propagated up
  * CLI: -verify+-apply rejected as mutually exclusive; -verify alone
    is a valid mode

Note: namespaceResolverAdapter bridges *namespace.Resolver to the
verify package's verifyResolver interface so verify.go has zero
dependency on the namespace package — keeps test stubs minimal.
2026-05-04 09:01:31 -07:00
Hongming Wang
3886e8fb9f
Merge pull request #2745 from Molecule-AI/fix/harness-stub-auth-headers-1arg
fix(harness): stub platform_auth with *args lambdas (#2743 fallout)
2026-05-04 15:58:24 +00:00
Hongming Wang
d48693144b Memory v2 fixup I1+I4: expires_at validation + audit JSON marshal
Two small Important findings from self-review, bundled because both
are <20-line changes touching the same file.

I1: expires_at silent drop
  - mcp_tools_memory_v2.go:130 had `if t, err := ...; err == nil { ... }`
    which dropped malformed timestamps without telling the agent.
    Agent passes `expires_at: "tomorrow"`, gets a 200, and the memory
    has no TTL.
  - Now returns a clear error: "invalid expires_at: must be RFC3339"
  - Test renamed: TestCommitMemoryV2_BadExpiresIsIgnored (which
    codified the bug) → TestCommitMemoryV2_BadExpiresReturnsError
    (which pins the fix).
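
The difference between dropping and rejecting a bad timestamp can be sketched in Python (the real handler is Go). `parse_expires_at` is a hypothetical name, and the regular expression is only an approximation of RFC 3339; the error text is the one quoted above.

```python
import re
from datetime import datetime

# Approximate RFC 3339 shape: date, "T", time, optional fraction, offset.
_RFC3339 = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def parse_expires_at(value: str) -> datetime:
    """Parse a timestamp or raise; never silently drop a malformed value."""
    if not _RFC3339.match(value):
        raise ValueError("invalid expires_at: must be RFC3339")
    try:
        # Older Python versions do not accept a trailing "Z" here.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError as exc:
        raise ValueError("invalid expires_at: must be RFC3339") from exc
```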

I4: audit log JSON via Sprintf-%q
  - auditOrgWrite was building activity_logs.metadata via fmt.Sprintf
    with %q. Go-quoted strings happen to coincide with JSON-quoted
    for ASCII (and today's values are pure ASCII: UUID + hex digest)
    so the bug was latent.
  - Replaced with json.Marshal of map[string]string. Same wire shape
    today, but won't silently produce invalid JSON if metadata grows
    to include arbitrary content snippets.
  - New test TestAuditOrgWrite_MetadataIsValidJSON uses a custom
    sqlmock.Argument matcher (jsonValidMatcher) that fails the test
    if the metadata column isn't parseable JSON. The test runs
    auditOrgWrite with a content string containing quotes,
    backslashes, and a control byte — values where %q would diverge
    from JSON-quote.

Both pre-existing tests (TestCommitMemoryV2_AuditsOrgWrites etc.)
remain green.
2026-05-04 08:57:58 -07:00
Hongming Wang
1b207b214d fix(harness): stub platform_auth with *args lambdas (#2743 fallout)
PR #2743 (multi-workspace MCP PR-2) made auth_headers accept an
optional ``workspace_id`` arg and self_source_headers stayed
1-arg-required. The peer-discovery-404 harness replay stubbed both
with 0-arg lambdas, so the helper call inside the replay raised:

    TypeError: <lambda>() takes 0 positional arguments but 1 was given

…and the diagnostic captured by the replay was the TypeError text,
not the platform-404 string the assertion grep'd for. Caught by
PR #2737 (auto-promote staging→main): the replay went red right
after #2743 merged into staging.

Switching both stubs to ``*args, **kwargs`` makes them tolerant of
both the legacy 0-arg call shape AND the new 1-arg-with-workspace
call shape, so neither the harness nor the in-tree unit tests need
to know which version of the runtime helpers ran the call.
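
A small Python sketch of the two stub shapes. `legacy_stub`, `tolerant_stub`, `call_both_shapes`, and the workspace id are illustrative names, not the harness's actual identifiers.

```python
# 0-arg stub: raises TypeError as soon as a helper passes workspace_id.
legacy_stub = lambda: {}

# Tolerant stub: accepts the 0-arg and the 1-arg call shape alike.
tolerant_stub = lambda *args, **kwargs: {}


def call_both_shapes(stub) -> bool:
    """Return True if the stub survives both call shapes."""
    try:
        stub()
        stub("ws-123")  # hypothetical workspace id
        return True
    except TypeError:
        return False
```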

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 08:55:42 -07:00
Hongming Wang
1e97fb9a16 Memory v2 fixup C1: backfill idempotency via MemoryWrite.id
Self-review (post-merge) flagged that the backfill claimed to be
idempotent on re-run but actually duplicates every row because the
plugin's INSERT uses gen_random_uuid() and ignores any id passed in.

Fix is contract-level: extend MemoryWrite with an optional `id`
idempotency key. When supplied, the plugin MUST treat the write as
upsert keyed on this id; when omitted, the plugin generates a fresh
UUID (production agent commits keep working unchanged).
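
The contract can be illustrated with an in-memory Python stand-in. `InMemoryStore` is hypothetical; the real plugin does the same thing in postgres with INSERT ... ON CONFLICT (id) DO UPDATE.

```python
import uuid
from typing import Dict, Optional


class InMemoryStore:
    """Toy stand-in for the plugin's memory table, keyed by id."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def commit(self, content: str, id: Optional[str] = None) -> str:
        # Supplied id: upsert keyed on it. Omitted id: generate a fresh UUID.
        memory_id = id or str(uuid.uuid4())
        self.rows[memory_id] = content
        return memory_id
```

Re-running a backfill that passes the source row's id overwrites the same entry instead of adding a duplicate, while commits without an id always create a new row.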

Changes:
  * docs/api-protocol/memory-plugin-v1.yaml: add id field with
    description that flags it as idempotency key
  * internal/memory/contract/contract.go: add ID to MemoryWrite struct,
    update memory_write_minimal golden vector
  * internal/memory/pgplugin/store.go: split CommitMemory into two
    paths — upsert when body.ID set (INSERT ... ON CONFLICT (id) DO
    UPDATE), plain INSERT otherwise
  * cmd/memory-backfill/main.go: pass agent_memories.id to MemoryWrite,
    fix the false comment about 409 deduplication

New tests:
  * pgplugin: TestCommitMemory_WithIDUpserts pins the upsert SQL is
    used when id is set; TestCommitMemory_UpsertScanError covers the
    error branch
  * backfill: TestBackfill_PassesSourceUUIDAsIdempotencyKey pins the
    forwarding behavior; TestBackfill_RerunIsIdempotent simulates a
    retry and asserts both runs pass the same uuid (plugin upsert is
    what makes this safe)

Why this matters: operators retrying a failed backfill (which they
will, because networks fail and transactions abort) would otherwise
duplicate every memory once per retry. The duplicates stay invisible
until search results show obvious dupes, which is hard to debug under
production load.

Production agent commits are unaffected: they leave id empty, the
plugin generates a fresh UUID via gen_random_uuid(), zero behavior
change for the hot path.
2026-05-04 08:54:13 -07:00
Hongming Wang
7cffff844b
Merge pull request #2743 from Molecule-AI/feat/mcp-multi-workspace-pr2
feat(mcp): cross-workspace delegation routing (multi-ws PR-2)
2026-05-04 15:43:20 +00:00
Hongming Wang
4a0d7cd545
Merge branch 'staging' into feat/mcp-multi-workspace-pr2
2026-05-04 08:37:20 -07:00
Hongming Wang
35b3ea598a test: fix WORKSPACE_ID assert to match module attr (CI portability)
CI's pytest harness pre-sets WORKSPACE_ID=test in the env before
test collection, so a2a_client's module-level WORKSPACE_ID
(captured at import time, line 24) holds "test" — but the local
fixture's monkeypatch.setenv("WORKSPACE_ID", ...) only affects the
ENV value seen on later os.environ reads, NOT the already-bound
module attribute.

Assert against a2a_client.WORKSPACE_ID directly so the test is
portable across local + CI runs without monkey-patching the module
itself (which a future test reload might undo).
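
The import-time capture can be reproduced in a few lines of Python. The values are illustrative, and the sketch sets os.environ directly instead of using pytest's monkeypatch.

```python
import os

# What the CI harness pre-sets before test collection.
os.environ["WORKSPACE_ID"] = "test"

# Module-level capture: evaluated once, at import time.
WORKSPACE_ID = os.environ["WORKSPACE_ID"]

# What a later monkeypatch.setenv("WORKSPACE_ID", ...) effectively does.
os.environ["WORKSPACE_ID"] = "fixture-value"

# Later environment reads see the new value; the bound name does not.
env_now = os.environ["WORKSPACE_ID"]
```

After this runs, `WORKSPACE_ID` still holds "test" while `env_now` holds "fixture-value", which is why the test asserts against the module attribute.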

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 08:35:48 -07:00
Hongming Wang
1161b97faf feat(mcp): cross-workspace delegation routing (multi-ws PR-2)
PR-2 of the multi-workspace external-agent stack. PR-1 (#2739)
landed per-workspace auth + heartbeat + inbox. This PR threads
``source_workspace_id`` through the A2A client + tool surface so an
agent registered against multiple workspaces can list peers across
all of them and delegate from a specific source.

Changes
-------

* ``a2a_client``: ``discover_peer``, ``send_a2a_message``,
  ``get_peers_with_diagnostic``, and ``enrich_peer_metadata`` now
  accept ``source_workspace_id``. Routing uses it for both the
  X-Workspace-ID header and (transitively, via ``auth_headers(src)``)
  the bearer token. Defaults to module-level WORKSPACE_ID for
  back-compat.
* ``a2a_client._peer_to_source``: a new lock-free cache mapping each
  discovered peer back to the source workspace whose registry
  surfaced it. ``tool_list_peers`` populates the cache on every call;
  ``tool_delegate_task`` consults it for auto-routing.
* ``a2a_tools.tool_list_peers(source_workspace_id=None)``: when
  multiple workspaces are registered (MOLECULE_WORKSPACES) and no
  explicit source is passed, aggregates peers across every
  registered workspace and tags each entry with ``via: <src[:8]>``.
  Single-workspace mode is unchanged — no ``via:`` annotation, same
  output shape.
* ``a2a_tools.tool_delegate_task`` and ``tool_delegate_task_async``
  resolve source via ``source_workspace_id arg → _peer_to_source[target]
  → WORKSPACE_ID``. Agents almost never need to specify ``source_*``
  explicitly — call ``list_peers`` first and the cache handles the
  rest.
* ``tool_delegate_task_async`` idempotency key now includes the
  source workspace, so the same task delegated from two registered
  workspaces produces two distinct delegations (the right behavior
  — one per tenant audit trail).
* ``platform_auth.list_registered_workspaces()``: new helper for the
  tool layer to enumerate the multi-ws registry. Lock-free reads
  matched by the existing single-writer-per-workspace contract from
  PR-1.
* ``platform_auth.self_source_headers``: now passes ``workspace_id``
  through to ``auth_headers`` — without this, a multi-workspace POST
  source-tagged with ``X-Workspace-ID=ws_b`` was authenticating
  with ws_a's token (or no token if MOLECULE_WORKSPACE_TOKEN unset).
  Latent PR-1 bug exposed by the new tool surface.
* ``a2a_mcp_server`` tool dispatch passes ``source_workspace_id``
  from the tool call arguments.
* ``platform_tools.registry``: add ``source_workspace_id`` to the
  delegate_task, delegate_task_async, check_task_status, list_peers
  input schemas with copy explaining when to use it (rarely — the
  cache handles it).

Tests (15 new, all passing)
---------------------------

``test_a2a_multi_workspace.py``:
* TestDiscoverPeerSourceRouting (3): src arg drives header+token,
  fallback to module ws when omitted, invalid target short-circuits
  before any HTTP attempt.
* TestSendA2AMessageSourceRouting (1): X-Workspace-ID source header
  + Authorization bearer both come from the source arg via the
  patched self_source_headers chain.
* TestGetPeersSourceRouting (1): URL path AND headers use the
  source workspace id.
* TestToolListPeersAggregation (4): aggregates across multiple
  registered workspaces, tags origin, leaves single-workspace path
  unchanged, explicit src arg overrides aggregation, diagnostic
  joining when every workspace returns empty.
* TestToolDelegateTaskAutoRouting (3): cache-driven auto-route,
  explicit override beats cache, single-workspace fallback to
  module WORKSPACE_ID.
* TestListRegisteredWorkspaces (3): registry enumeration helper.

Plus ``tests/snapshots/a2a_instructions_mcp.txt`` regenerated to
absorb the new ``source_workspace_id`` schema entries.

Back-compat
-----------

Every change defaults ``source_workspace_id=None``; legacy
single-workspace operators (no MOLECULE_WORKSPACES) see identical
behavior — same URLs, same headers, same tool output. The 24
PR-1 tests + 125 existing A2A tests all still pass.

Out of scope (PR-3)
-------------------

Memory namespacing per registered workspace lands after the new
memory system v2 PR (#2740) settles in production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 08:32:24 -07:00
Hongming Wang
059962a0a3
Merge pull request #2742 from Molecule-AI/feat/memory-v2-pr11-e2e-swap
Memory v2 PR-11: E2E test — flat-plugin swap proves contract works
2026-05-04 15:29:56 +00:00
Hongming Wang
b07575c710
Merge branch 'staging' into feat/memory-v2-pr11-e2e-swap
2026-05-04 08:24:26 -07:00
Hongming Wang
586fa5f84e
Merge pull request #2741 from Molecule-AI/feat/memory-v2-pr10-docs
Memory v2 PR-10: operator docs for writing a custom memory plugin
2026-05-04 15:20:35 +00:00
Hongming Wang
b937415e1e Memory v2 PR-11: E2E test — flat-plugin swap proves contract works
Final implementation PR. Builds on PR-1..10 (all merged or queued).

Proves the central design property of the plugin contract: ANY
plugin satisfying the v1 OpenAPI spec works as a drop-in replacement
for the built-in postgres plugin. If this test fails after a refactor,
the contract has drifted in a way that breaks ecosystem plugins.

What ships:
  * internal/memory/e2e/swap_test.go — five E2E tests against a
    deliberately minimal "flat-memory" stub plugin (~50 LOC, single
    map, zero capabilities)
  * MCPHandler.Dispatch — small exported wrapper around dispatch so
    out-of-package E2E tests can drive tools by name without
    duplicating the whole MCP RPC stack

E2E coverage:
  * TestE2E_FlatPluginRoundTrip: full lifecycle
    - list_writable_namespaces returns 3 entries
    - commit_memory_v2 writes through plugin
    - search_memory retrieves it
    - commit_summary writes a summary
    - forget_memory deletes
    - search after forget excludes the deleted memory

  * TestE2E_LegacyShimRoutesThroughFlatPlugin: PR-6 shim wired up
    - Legacy commit_memory(scope=LOCAL) ends up in plugin storage
    - Legacy recall_memory retrieves it through plugin search
    - Response shapes preserved (scope:LOCAL stays scope:LOCAL)

  * TestE2E_OrgMemoriesDelimiterWrap: prompt-injection mitigation
    - Org-namespace memory committed
    - Audit INSERT into activity_logs verified
    - Search returns content with [MEMORY id=... scope=ORG ns=...]
      prefix applied

  * TestE2E_StubPluginCapabilitiesAreEmpty: capability negotiation
    - Stub plugin reports zero capabilities
    - Client.SupportsCapability returns false for FTS and embedding
    - Confirms graceful degradation when plugin doesn't support a
      feature

  * TestE2E_PluginUnreachable_AgentSeesClearError: failure surface
    - Plugin URL pointing at bogus port
    - commit_memory_v2 returns informative error
    - No nil-pointer dereference; error message is actionable

The flat plugin is intentionally minimal — it has no namespaces table
distinct from memory records, no FTS, no semantic search, no TTL. The
test proves operators can drop in a 50-line plugin and the agent
behavior is identical (modulo capability-gated features).
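
For illustration only, the idea of a flat plugin with zero
capabilities might look like the sketch below. The real stub is the Go
code in internal/memory/e2e/swap_test.go and is not reproduced here;
the class, method names, and ID format in this Python sketch are
hypothetical and do not reflect the v1 OpenAPI contract.

```python
from typing import Dict, List, Set


class FlatMemoryStub:
    """Hypothetical single-map store: no namespaces table, FTS, embeddings, or TTL."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}
        self._next_id = 0

    def capabilities(self) -> Set[str]:
        # Zero capabilities: callers are expected to degrade gracefully.
        return set()

    def commit(self, content: str) -> str:
        # Store the content under a freshly generated ID and return that ID.
        self._next_id += 1
        memory_id = f"mem-{self._next_id}"
        self._records[memory_id] = content
        return memory_id

    def search(self, query: str) -> List[str]:
        # Plain substring match; no full-text index or semantic ranking.
        return [mid for mid, text in self._records.items() if query in text]

    def forget(self, memory_id: str) -> bool:
        # Returns True only if the record existed and was removed.
        return self._records.pop(memory_id, None) is not None


def supports_capability(plugin: FlatMemoryStub, name: str) -> bool:
    """Capability check in the spirit of Client.SupportsCapability."""
    return name in plugin.capabilities()
```

A commit, search, forget, search sequence against this stub mirrors
the round trip the E2E test exercises: the second search no longer
returns the forgotten record, and every capability check is false.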
2026-05-04 08:20:35 -07:00
Hongming Wang
0f46c7eefe
Merge pull request #2739 from Molecule-AI/feat/mcp-multi-workspace-pr1
mcp: support multi-workspace external-agent registration (PR-1 of stack)
2026-05-04 15:19:03 +00:00