molecule-core/docs/ecosystem-research-outcomes.md
Hongming Wang d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:24:44 -07:00

15 KiB
Raw Blame History

Ecosystem Research Outcomes

Input: docs/ecosystem-watch.md — Holaboss (holaboss-ai/holaboss-ai), Hermes Agent (NousResearch/hermes-agent), gstack (garrytan/gstack).

Method: Molecule AI-dev team coordination — PM fan-out to Research Lead (3 analysts) and Dev Lead (6 engineers). Full research corpus archived under /tmp/eco_research/ during synthesis; raw outputs are what the team actually said. Cross-referenced against real repo files before listing any file path in this doc.

Date: 2026-04-12


Top-5 platform improvements (prioritized)

Ranking is by convergence across team members + impact for the hours spent. All effort tags are S (≤1 day), M (13 days), L (≥1 week).

1. Memory: Postgres FTS + namespace scoping — S, high impact

Replace the content ILIKE '%q%' sequential scan in workspace-server/internal/handlers/memories.go:Search with a tsvector generated column, GIN index, and ts_rank ordering. Add a namespace VARCHAR(50) DEFAULT 'general' column plus the (workspace_id, namespace) composite index. Ship as migration workspace-server/migrations/017_memories_fts_namespace.sql. Purely additive — old rows get namespace = 'general', new query params (?q=, ?namespace=) are optional, no breaking change.

Converged across Backend, QA, Frontend, UIUX — everyone proposed some flavour of this. Combines Hermes's FTS5 recall pattern with Holaboss's knowledge/{facts,procedures,blockers,reference}/ namespace model. Canvas can render namespace-grouped accordions against the same endpoint with zero backend changes after day one.

Ecosystem citations: Hermes — "FTS5 + LLM-summarization for cross-session recall — cheap, no vector-store overhead"; Holaboss — filesystem-as-memory hierarchy.

2. Workspace hibernation: idle watchdog + auto-pause — M, DevOps win

DevOps Engineer's proposal: add a _idle_watchdog background job in workspace/entrypoint.sh that reads /tmp/.last_activity (written by main.py on each A2A request) and calls the existing POST /workspaces/:id/pause after IDLE_SHUTDOWN_MINUTES (default 30). Platform's existing liveness monitor handles resume on next task; no new Go code required — this is a shell + one main.py line + an env var. Enables Hermes-style serverless-ish behaviour for agents that only wake for cron audits (Security Auditor, QA Engineer). Pairs naturally with Proposal 3 below.

Ecosystem citation: Hermes — Daytona / Modal serverless backends with hibernation.

3. Parallel adapter builds — S, QoL

workspace/build-all.sh builds the 6 adapter images sequentially (~15 min wall-clock). They all FROM workspace-template:base with no inter-adapter dependency — swap the Step 3 loop for background jobs + wait, log each build to /tmp/build_<tag>.log. Cuts total build time to ~57 minutes. Prerequisite for hibernate/wake feeling snappy (Proposal 2).

4. Plugin manifest: permissions + version floor + config schema — S, spec-alignment

Extend pluginInfo in workspace-server/internal/handlers/plugins.go with permissions []string (e.g. env:GITHUB_TOKEN, path:/workspace/repo, docker:CAP), min_platform_version (semver floor enforced at install time when PLATFORM_VERSION env is set), and config_schema json.RawMessage (stored raw so canvas can render a form without re-parsing). All three are additive — missing keys unmarshal to zero values. Positions Molecule AI ahead of the agentskills.io spec picking up permissions semantics, and mirrors Holaboss's workspace.yaml-forces-prompts-into-AGENTS.md principle (config stays machine-readable).

Ecosystem citations: Hermes — "if agentskills.io spec picks up mass adoption → align our plugin manifest"; Holaboss — workspace.yaml rejects inline prompt bodies.

5. Fail-secure encryption at boot — S, security critical

Security Auditor's top proposal. Today SECRETS_ENCRYPTION_KEY is optional — when unset, the platform boots and silently falls back to storing secrets in plaintext. Flip to fail-secure: if the binary is built with go build -tags prod (or MOLECULE_ENV=prod is set), refuse to start without a 32-byte key and log a loud abort. Dev builds retain the current fallback with a startup warning. Small, surgical change in workspace-server/internal/crypto/aes.go

  • cmd/server/main.go init; unit test already exists to verify encryption path.

Ecosystem citation: gstack CSO persona — OWASP A02:2021 (cryptographic failures), STRIDE "Tampering / Information Disclosure."


Per-agent improvement proposals

Each of the 9 team members produced 23 concrete proposals mapped to real file paths. Summary here; full proposals live in the captured research (happy to expand any). Proposals adopted into Top-5 above are marked .

Market Analyst (Holaboss axis, 3,164 chars)

  • Structured filesystem memory layer (→ Top-5 #1)
  • Compaction-boundary artifact for long-horizon single-agent mode — defer (we're multi-agent; only useful if we add a persistent PM-only mode).
  • Section-based prompt assembly with per-section cache fingerprints — consider once Claude-SDK prompt-caching becomes a cost line item.

Technical Researcher (Hermes axis, 3,874 chars)

  • Nudge-to-persist memory pattern → exposed in UIUX Proposal 2 below.
  • FTS5 recall (→ Top-5 #1).
  • Daytona/Modal-style hibernation (→ Top-5 #2).
  • Honcho dialectic user-modelling backend — evaluate for PM role only; too invasive to bolt onto every workspace.
  • hermes claw migrate graceful-import pattern — add to backlog if we ever deprecate a runtime adapter.

Competitive Intelligence (gstack axis, 2,974 chars)

  • Weekly Retro Synthesis command (/retro) — CEO-side skill, see below.
  • /freeze, /guard, /unfreeze architectural guardrails — see QA proposal 3.
  • Lift CSO (OWASP + STRIDE) and Designer (AI-slop detection) role prompts into our Security Auditor and UIUX system-prompts as attributed additions — S effort, high leverage.

Frontend Engineer (6,154 chars)

  1. Namespaced Memory Browsercanvas/src/components/tabs/MemoryTab.tsx parses namespace:key naming into grouped accordions. Zero backend change for MVP; converges with BE Top-5 #1.
  2. "Save as memory" nudge in ActivityTabcanvas/src/components/tabs/ActivityTab.tsx renders an inline "Save as memory →" link on task_complete and skill_promo events; clicks pre-populate the MemoryTab add form. Hermes closed-learning-loop pattern.
  3. [3rd proposal in full output] — available on request.

Backend Engineer (12,938 chars, the longest output)

  1. Memory FTS + namespace (→ Top-5 #1)
  2. Plugin manifest extension (→ Top-5 #4)
  3. Schedule import/export via bundle system — M; currently workspace_schedules rows are orphaned on bundles/export. Small handler change in workspace-server/internal/handlers/bundle.go.

DevOps Engineer (6,761 chars)

  1. Idle watchdog auto-pause (→ Top-5 #2)
  2. Parallel adapter builds (→ Top-5 #3)
  3. Per-adapter CI stages (build + smoke-test each image in its own GitHub-Actions matrix job) — M; currently adapter images only get built locally.

Security Auditor (8,875 chars, strongest deliverable)

  1. Fail-secure encryption at boot (→ Top-5 #5)
  2. Remove test:* from production systemCallerPrefixesS. workspace-server/internal/handlers/a2a_proxy.go:50 currently whitelists the literal prefix test: in every environment; it's an access-control bypass waiting to be exploited. Guard behind MOLECULE_ENV != prod.
  3. Plugin supply-chain hardening — mandate plugin.yaml presence and reject staged trees containing executable bits (+x) outside skills/*/hook.sh. S; adds a preflight in workspace-server/internal/plugins/localresolver.go.

QA Engineer (6,395 chars)

  1. Filesystem memory namespace isolation (tests enforcing namespace separation) — S, complements Top-5 #1.
  2. Autonomous skill-creation loop + FTS5 recall test suite — M; Hermes self-improvement pattern needs explicit coverage before landing.
  3. Freeze / Guard / Unfreeze architectural guardrail tests — M; ports gstack's /freeze primitive as enforced test fixtures (e.g. a /freeze on the auth middleware fails CI if any handler modifies it without an override flag).

UIUX Designer (14,285 chars, the longest engineering output)

  1. Namespaced Memory Browser — same as FE proposal 1; the two should be implemented as one ticket, UIUX leads, FE executes.
  2. Clickable Skill Promotion Nudge on node card — surfaces Hermes's skill-promotion pattern at the canvas-graph level. S.
  3. Inline initial_prompt body warning in ConfigTab — canvas/src/components/tabs/ConfigTab.tsx flags when initial_prompt: has inline body text >200 chars with a "Extract to AGENTS.md" lint-style hint. S; encodes the Holaboss principle that config should stay machine-readable.

CEO workflow improvements

Patterns to adopt at the Claude Code CLI layer (the CEO's interface), not inside the Molecule AI platform itself.

New skills to add under .claude/skills/

  1. /retro — lifted directly from gstack. Generates a weekly retrospective by reading git log --since='7 days ago' --oneline --shortstat, the merged PR list, the set of closed issues, and the activity logs across the org. Outputs a markdown doc under docs/retros/<YYYY-MM-DD>.md. High leverage at near-zero cost; gstack validated the pattern at 70k.
  2. /freeze <path> — sets a repo-level flag (a file under .claude/freezes/) that any future code-review or edit skill reads. When the next CEO-driven change touches a frozen path, the edit skill blocks with a clear message. Adopted from gstack's /freeze / /guard / /unfreeze trio.
  3. /verify-refs — explicit helper for the "verify before citing" discipline we encoded in the team's hardened prompts. Takes a draft message, finds #NNN / sha:hex / path: refs, runs gh issue view, git log -1, cat respectively, and reports any mismatches before the CEO sends.

Settings / Hooks changes

  • Pre-tool hook on Bash commands that match git push origin main — reject unless FORCE_PUSH_MAIN=1 is exported. This session we caught ourselves (and PM) about to commit to main twice. A hook makes the rule programmatic.
  • Status line / telemetry counter for MCP tool failures — so PR breakage from upstream MCP-client issues (e.g. #67) surfaces in the prompt, not only when we try to use it.

Process / conventions

  • When briefing PM on a fan-out task: always include the explicit workspace IDs and instruction to inline documents — even though this is now encoded in the hardened system prompts (PR #69), the reminder at task-issue time saves a round-trip.
  • Treat Agent error (ProcessError) as a platform bug, not a transient failure. Restart the affected workspace, note the incident in the issue tracker referencing #66 and #71 until they land.

Ecosystem signals to monitor (next quarter)

Items to watch on docs/ecosystem-watch.md and in the repos directly:

  • agentskills.io spec finalisation — if the upstream spec locks in permissions semantics, our plugin.yaml should conform on the first release day. Today's Top-5 #4 positions us to lead rather than follow.
  • Hermes multi-agent / A2A addition — would put us in direct overlap on the core thesis. Signal: any Nous Research blog post or commit matching a2a|delegate|subagent_a2a.
  • gstack parallel / multi-session — if gstack ships simultaneous Claude Code workers + routing between them, their 70k head start converts into direct competition. Signal: any /multi-* command in the garrytan/gstack repo or a Garry Tan post showing it.
  • Holaboss → A2A — Holaboss shipping workspace-to-workspace messaging would put them in the "AI company" shape we occupy today. Signal: a workspace.yaml connections: field or a holaboss a2a subcommand.
  • Atropos RL trajectory format — if Nous standardises the schema for RL training-data export, our activity logs should adopt it so users can export Molecule AI runs for training.

Explicit non-adoptions

Decisions to NOT copy, with reasons, so we don't revisit them:

  • Holaboss single-active-agent-per-workspace shape — incompatible with our core thesis. Keep the concept of workspace-as-container but don't collapse to a single agent.
  • Hermes six-backend abstraction (Docker / SSH / Daytona / Singularity / Modal) — our Docker provisioner is the right ceiling for v1. Serverless hibernation (Top-5 #2) buys us 80% of the cost win without the plumbing.
  • gstack's Claude-Code-native-only execution model — gstack is a prompt library living inside one Claude Code session. Adopting that shape would eliminate our multi-agent / multi-runtime differentiation. We borrow specific role prompts, not the architecture.
  • workspace.yaml banning inline prompts at the schema level — Holaboss rejects inline prompt bodies at parse time. We ship a UIUX warning instead (UIUX proposal 3) so existing templates keep working. The principle is right; the enforcement mechanism is too blunt for a platform that already has shipped templates out in the wild.
  • Compaction-boundary artifact — solves long-horizon single-agent cost. We are multi-agent with per-workspace checkpointing already; this would be complexity for no direct gain.

Process observations (meta)

Notes on how this coordination went that inform future runs:

  1. #66 (opaque stderr) and #71 (initial_prompt replay crash) are blocking team coordination. Every fresh org import today started with ProcessError cascades. Until these land, any multi-agent research task requires host-side intervention (touching .initial_prompt_done, restarting crashed workspaces).
  2. #65 (per-agent repo-access YAML) would eliminate the inline-documents workaround that every Hard-Learned Rule we just added to the prompts tells the team to do. This is the single highest-leverage platform improvement on the list.
  3. Capturing raw analyst outputs from the activity log is a valid fallback when PM crashes mid-synthesis. All 9 outputs in this doc were retrieved from GET /workspaces/:id/activity after the PM/RL/DL round-trip failed. Worth surfacing in platform docs as the "recovery" path.
  4. The hardened system prompts (PR #69) are already paying off: Research Lead and Dev Lead both immediately fanned out in parallel with delegation IDs, rather than attempting solo synthesis. The "always fan out" rule is doing work.

Next actions

Recommend proceeding in this order, each as its own PR:

  1. Ship #71 fix (initial_prompt marker up-front) — unblocks all future fresh org imports.
  2. Ship #66 fix (stderr capture) — restores debuggability.
  3. Top-5 #1 (memory FTS + namespace) — highest-convergence team ask, cleanest migration.
  4. Top-5 #5 (fail-secure encryption) — security-critical, trivial.
  5. CEO /retro skill — near-zero effort, compounding weekly.

Everything else in this doc flows from there.