molecule-core/docs/ecosystem-research-outcomes.md
Hongming Wang d8026347e5 chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames:
- platform/ → workspace-server/ (Go module path stays as "platform" for
  external dep compat — will update after plugin module republish)
- workspace-template/ → workspace/

Removed (moved to separate repos or deleted):
- PLAN.md — internal roadmap (move to private project board)
- HANDOFF.md, AGENTS.md — one-time internal session docs
- .claude/ — gitignored entirely (local agent config)
- infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy
- org-templates/molecule-dev/ → standalone template repo
- .mcp-eval/ → molecule-mcp-server repo
- test-results/ — ephemeral, gitignored

Security scrubbing:
- Cloudflare account/zone/KV IDs → placeholders
- Real EC2 IPs → <EC2_IP> in all docs
- CF token prefix, Neon project ID, Fly app names → redacted
- Langfuse dev credentials → parameterized
- Personal runner username/machine name → generic

Community files:
- CONTRIBUTING.md — build, test, branch conventions
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1

All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml,
README, CLAUDE.md updated for new directory names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 00:24:44 -07:00

345 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Ecosystem Research Outcomes
**Input:** [`docs/ecosystem-watch.md`](./ecosystem-watch.md) — Holaboss
(`holaboss-ai/holaboss-ai`), Hermes Agent (`NousResearch/hermes-agent`),
gstack (`garrytan/gstack`).
**Method:** Molecule AI-dev team coordination — PM fan-out to Research Lead
(3 analysts) and Dev Lead (6 engineers). Full research corpus archived
under `/tmp/eco_research/` during synthesis; raw outputs are what the
team actually said. Cross-referenced against real repo files before
listing any file path in this doc.
**Date:** 2026-04-12
---
## Top-5 platform improvements (prioritized)
Ranking is by convergence across team members + impact for the hours
spent. All effort tags are S (≤1 day), M (13 days), L (≥1 week).
### 1. Memory: Postgres FTS + namespace scoping — **S, high impact**
Replace the `content ILIKE '%q%'` sequential scan in
`workspace-server/internal/handlers/memories.go:Search` with a `tsvector`
generated column, GIN index, and `ts_rank` ordering. Add a
`namespace VARCHAR(50) DEFAULT 'general'` column plus the
`(workspace_id, namespace)` composite index. Ship as migration
`workspace-server/migrations/017_memories_fts_namespace.sql`. Purely
additive — old rows get `namespace = 'general'`, new query params
(`?q=`, `?namespace=`) are optional, no breaking change.
Converged across Backend, QA, Frontend, UIUX — everyone proposed
some flavour of this. Combines Hermes's FTS5 recall pattern with
Holaboss's `knowledge/{facts,procedures,blockers,reference}/`
namespace model. Canvas can render namespace-grouped accordions
against the same endpoint with zero backend changes after day one.
**Ecosystem citations:** Hermes — "FTS5 + LLM-summarization for
cross-session recall — cheap, no vector-store overhead"; Holaboss —
filesystem-as-memory hierarchy.
### 2. Workspace hibernation: idle watchdog + auto-pause — **M, DevOps win**
DevOps Engineer's proposal: add a `_idle_watchdog` background job
in `workspace/entrypoint.sh` that reads `/tmp/.last_activity`
(written by `main.py` on each A2A request) and calls the existing
`POST /workspaces/:id/pause` after `IDLE_SHUTDOWN_MINUTES` (default
30). Platform's existing liveness monitor handles resume on next task;
no new Go code required — this is a shell + one `main.py` line + an
env var. Enables Hermes-style serverless-ish behaviour for agents
that only wake for cron audits (Security Auditor, QA Engineer).
Pairs naturally with Proposal 3 below.
**Ecosystem citation:** Hermes — Daytona / Modal serverless backends
with hibernation.
### 3. Parallel adapter builds — **S, QoL**
`workspace/build-all.sh` builds the 6 adapter images
sequentially (~15 min wall-clock). They all `FROM
workspace-template:base` with no inter-adapter dependency — swap the
Step 3 loop for background jobs + `wait`, log each build to
`/tmp/build_<tag>.log`. Cuts total build time to ~57 minutes.
Prerequisite for hibernate/wake feeling snappy (Proposal 2).
### 4. Plugin manifest: permissions + version floor + config schema — **S, spec-alignment**
Extend `pluginInfo` in `workspace-server/internal/handlers/plugins.go`
with `permissions []string` (e.g. `env:GITHUB_TOKEN`,
`path:/workspace/repo`, `docker:CAP`), `min_platform_version`
(semver floor enforced at install time when `PLATFORM_VERSION`
env is set), and `config_schema json.RawMessage` (stored raw so
canvas can render a form without re-parsing). All three are
additive — missing keys unmarshal to zero values. Positions
Molecule AI ahead of the agentskills.io spec picking up
permissions semantics, and mirrors Holaboss's
`workspace.yaml`-forces-prompts-into-AGENTS.md principle
(config stays machine-readable).
**Ecosystem citations:** Hermes — "if `agentskills.io` spec picks
up mass adoption → align our plugin manifest"; Holaboss —
`workspace.yaml` rejects inline prompt bodies.
### 5. Fail-secure encryption at boot — **S, security critical**
Security Auditor's top proposal. Today `SECRETS_ENCRYPTION_KEY`
is optional — when unset, the platform boots and silently falls
back to storing secrets in plaintext. Flip to fail-secure: if the
binary is built with `go build -tags prod` (or `MOLECULE_ENV=prod`
is set), refuse to start without a 32-byte key and log a loud
abort. Dev builds retain the current fallback with a startup
warning. Small, surgical change in `workspace-server/internal/crypto/aes.go`
+ `cmd/server/main.go` init; unit test already exists to verify
encryption path.
**Ecosystem citation:** gstack CSO persona — OWASP A02:2021
(cryptographic failures), STRIDE "Tampering / Information
Disclosure."
---
## Per-agent improvement proposals
Each of the 9 team members produced 23 concrete proposals mapped to
real file paths. Summary here; full proposals live in the captured
research (happy to expand any). Proposals adopted into Top-5 above are
marked ✅.
### Market Analyst (Holaboss axis, 3,164 chars)
- ✅ Structured filesystem memory layer (→ Top-5 #1)
- Compaction-boundary artifact for long-horizon single-agent mode —
**defer** (we're multi-agent; only useful if we add a persistent
PM-only mode).
- Section-based prompt assembly with per-section cache fingerprints —
**consider** once Claude-SDK prompt-caching becomes a cost line item.
### Technical Researcher (Hermes axis, 3,874 chars)
- ✅ Nudge-to-persist memory pattern → exposed in UIUX Proposal 2 below.
- ✅ FTS5 recall (→ Top-5 #1).
- ✅ Daytona/Modal-style hibernation (→ Top-5 #2).
- Honcho dialectic user-modelling backend — **evaluate for PM role
only**; too invasive to bolt onto every workspace.
- `hermes claw migrate` graceful-import pattern — **add to backlog**
if we ever deprecate a runtime adapter.
### Competitive Intelligence (gstack axis, 2,974 chars)
- Weekly Retro Synthesis command (`/retro`) — **CEO-side skill, see
below**.
- `/freeze`, `/guard`, `/unfreeze` architectural guardrails — see
QA proposal 3.
- Lift CSO (OWASP + STRIDE) and Designer (AI-slop detection) role
prompts into our Security Auditor and UIUX system-prompts as
attributed additions — **S effort, high leverage**.
### Frontend Engineer (6,154 chars)
1. **Namespaced Memory Browser**`canvas/src/components/tabs/MemoryTab.tsx`
parses `namespace:key` naming into grouped accordions. Zero backend
change for MVP; converges with BE Top-5 #1.
2. **"Save as memory" nudge in ActivityTab** —
`canvas/src/components/tabs/ActivityTab.tsx` renders an inline
"Save as memory →" link on `task_complete` and `skill_promo`
events; clicks pre-populate the MemoryTab add form. Hermes
closed-learning-loop pattern.
3. **[3rd proposal in full output]** — available on request.
### Backend Engineer (12,938 chars, the longest output)
1. ✅ Memory FTS + namespace (→ Top-5 #1)
2. ✅ Plugin manifest extension (→ Top-5 #4)
3. Schedule import/export via bundle system — **M**; currently
`workspace_schedules` rows are orphaned on `bundles/export`. Small
handler change in `workspace-server/internal/handlers/bundle.go`.
### DevOps Engineer (6,761 chars)
1. ✅ Idle watchdog auto-pause (→ Top-5 #2)
2. ✅ Parallel adapter builds (→ Top-5 #3)
3. Per-adapter CI stages (build + smoke-test each image in its own
GitHub-Actions matrix job) — **M**; currently adapter images only
get built locally.
### Security Auditor (8,875 chars, strongest deliverable)
1. ✅ Fail-secure encryption at boot (→ Top-5 #5)
2. Remove `test:*` from production `systemCallerPrefixes`**S**.
`workspace-server/internal/handlers/a2a_proxy.go:50` currently whitelists
the literal prefix `test:` in every environment; it's an
access-control bypass waiting to be exploited. Guard behind
`MOLECULE_ENV != prod`.
3. Plugin supply-chain hardening — mandate `plugin.yaml` presence
and reject staged trees containing executable bits (`+x`) outside
`skills/*/hook.sh`. **S**; adds a preflight in
`workspace-server/internal/plugins/localresolver.go`.
### QA Engineer (6,395 chars)
1. Filesystem memory namespace isolation (tests enforcing namespace
separation) — **S**, complements Top-5 #1.
2. Autonomous skill-creation loop + FTS5 recall test suite — **M**;
Hermes self-improvement pattern needs explicit coverage before
landing.
3. Freeze / Guard / Unfreeze architectural guardrail tests — **M**;
ports gstack's `/freeze` primitive as enforced test fixtures
(e.g. a `/freeze` on the auth middleware fails CI if any handler
modifies it without an override flag).
### UIUX Designer (14,285 chars, the longest engineering output)
1. Namespaced Memory Browser — same as FE proposal 1; the two
should be implemented as one ticket, UIUX leads, FE executes.
2. Clickable Skill Promotion Nudge on node card — surfaces Hermes's
skill-promotion pattern at the canvas-graph level. **S**.
3. Inline `initial_prompt` body warning in ConfigTab —
`canvas/src/components/tabs/ConfigTab.tsx` flags when
`initial_prompt:` has inline body text >200 chars with a
"Extract to AGENTS.md" lint-style hint. **S**; encodes the
Holaboss principle that config should stay machine-readable.
---
## CEO workflow improvements
Patterns to adopt at the **Claude Code CLI** layer (the CEO's
interface), not inside the Molecule AI platform itself.
### New skills to add under `.claude/skills/`
1. **`/retro`** — lifted directly from gstack. Generates a weekly
retrospective by reading `git log --since='7 days ago' --oneline
--shortstat`, the merged PR list, the set of closed issues, and
the activity logs across the org. Outputs a markdown doc under
`docs/retros/<YYYY-MM-DD>.md`. High leverage at near-zero cost;
gstack validated the pattern at 70k⭐.
2. **`/freeze <path>`** — sets a repo-level flag (a file under
`.claude/freezes/`) that any future code-review or edit skill
reads. When the next CEO-driven change touches a frozen path,
the edit skill blocks with a clear message. Adopted from
gstack's `/freeze` / `/guard` / `/unfreeze` trio.
3. **`/verify-refs`** — explicit helper for the "verify before
citing" discipline we encoded in the team's hardened prompts.
Takes a draft message, finds `#NNN` / `sha:hex` / `path:` refs,
runs `gh issue view`, `git log -1`, `cat` respectively, and
reports any mismatches before the CEO sends.
### Settings / Hooks changes
- **Pre-tool hook on `Bash` commands that match `git push origin main`**
— reject unless `FORCE_PUSH_MAIN=1` is exported. This session we
caught ourselves (and PM) about to commit to `main` twice. A
hook makes the rule programmatic.
- **Status line / telemetry counter** for MCP tool failures — so
PR breakage from upstream MCP-client issues (e.g. #67) surfaces
in the prompt, not only when we try to use it.
### Process / conventions
- When briefing PM on a fan-out task: always include the explicit
workspace IDs and instruction to **inline documents** — even
though this is now encoded in the hardened system prompts
(PR #69), the reminder at task-issue time saves a round-trip.
- Treat `Agent error (ProcessError)` as a **platform bug**, not a
transient failure. Restart the affected workspace, note the
incident in the issue tracker referencing #66 and #71 until
they land.
---
## Ecosystem signals to monitor (next quarter)
Items to watch on `docs/ecosystem-watch.md` and in the repos directly:
- **agentskills.io spec finalisation** — if the upstream spec
locks in permissions semantics, our `plugin.yaml` should
conform on the first release day. Today's Top-5 #4 positions
us to lead rather than follow.
- **Hermes multi-agent / A2A addition** — would put us in direct
overlap on the core thesis. Signal: any Nous Research blog
post or commit matching `a2a|delegate|subagent_a2a`.
- **gstack parallel / multi-session** — if gstack ships
simultaneous Claude Code workers + routing between them,
their 70k⭐ head start converts into direct competition.
Signal: any `/multi-*` command in the `garrytan/gstack` repo
or a Garry Tan post showing it.
- **Holaboss → A2A** — Holaboss shipping workspace-to-workspace
messaging would put them in the "AI company" shape we occupy
today. Signal: a `workspace.yaml` `connections:` field or a
`holaboss a2a` subcommand.
- **Atropos RL trajectory format** — if Nous standardises the
schema for RL training-data export, our activity logs should
adopt it so users can export Molecule AI runs for training.
---
## Explicit non-adoptions
Decisions to NOT copy, with reasons, so we don't revisit them:
- **Holaboss single-active-agent-per-workspace shape** — incompatible
with our core thesis. Keep the concept of workspace-as-container
but don't collapse to a single agent.
- **Hermes six-backend abstraction** (Docker / SSH / Daytona /
Singularity / Modal) — our Docker provisioner is the right
ceiling for v1. Serverless hibernation (Top-5 #2) buys us 80%
of the cost win without the plumbing.
- **gstack's Claude-Code-native-only execution model** — gstack is
a prompt library living inside one Claude Code session. Adopting
that shape would eliminate our multi-agent / multi-runtime
differentiation. We borrow specific role prompts, not the
architecture.
- **`workspace.yaml` banning inline prompts at the schema level**
— Holaboss rejects inline prompt bodies at parse time. We ship a
UIUX *warning* instead (UIUX proposal 3) so existing templates
keep working. The principle is right; the enforcement mechanism
is too blunt for a platform that already has shipped templates
out in the wild.
- **Compaction-boundary artifact** — solves long-horizon single-agent
cost. We are multi-agent with per-workspace checkpointing already;
this would be complexity for no direct gain.
---
## Process observations (meta)
Notes on how this coordination went that inform future runs:
1. **`#66` (opaque stderr) and `#71` (initial_prompt replay crash)
are blocking team coordination.** Every fresh org import today
started with ProcessError cascades. Until these land, any
multi-agent research task requires host-side intervention
(touching `.initial_prompt_done`, restarting crashed workspaces).
2. **`#65` (per-agent repo-access YAML) would eliminate the
inline-documents workaround** that every Hard-Learned Rule we
just added to the prompts tells the team to do. This is the
single highest-leverage platform improvement on the list.
3. **Capturing raw analyst outputs from the activity log is a valid
fallback** when PM crashes mid-synthesis. All 9 outputs in this
doc were retrieved from `GET /workspaces/:id/activity` after the
PM/RL/DL round-trip failed. Worth surfacing in platform docs
as the "recovery" path.
4. **The hardened system prompts (PR #69) are already paying off**:
Research Lead and Dev Lead both immediately fanned out in
parallel with delegation IDs, rather than attempting solo
synthesis. The "always fan out" rule is doing work.
---
## Next actions
Recommend proceeding in this order, each as its own PR:
1. **Ship #71 fix** (initial_prompt marker up-front) — unblocks all
future fresh org imports.
2. **Ship #66 fix** (stderr capture) — restores debuggability.
3. **Top-5 #1** (memory FTS + namespace) — highest-convergence
team ask, cleanest migration.
4. **Top-5 #5** (fail-secure encryption) — security-critical, trivial.
5. **CEO `/retro` skill** — near-zero effort, compounding weekly.
Everything else in this doc flows from there.