Commit Graph

5 Commits

Author SHA1 Message Date
rabbitblood
2492f8c806 feat(platform): wire github-app-auth plugin for per-installation tokens
Integrates github.com/Molecule-AI/molecule-ai-plugin-github-app-auth.
When GITHUB_APP_ID is set, the platform constructs a plugin
Authenticator at boot and registers it as an EnvMutator on the
WorkspaceHandler. Every workspace provision then gets a fresh
GITHUB_TOKEN / GH_TOKEN injected from the App's installation token
(rotates ~hourly, refresh 5 min before expiry).

Verified live this turn:
- Platform boot log: `github-app-auth: registered, 1 mutator(s) in chain`
- `docker exec ws-<id> gh auth status` → `Logged in as molecule-ai[bot] (GH_TOKEN)`
- `gh issue list --repo Molecule-AI/molecule-core` returns real data
  (Hermes #498/#499/#500 visible from inside a workspace container)

## Changes
- platform/go.mod + go.sum: new dep on the plugin
- platform/cmd/server/main.go: import + conditional registration
  (soft-skip when GITHUB_APP_ID is unset for self-hosted/dev)
- docker-compose.yml: pass GITHUB_APP_* env + bind-mount private key

## Drive-by
.gitignore: exclude /org-templates /plugins /workspace-configs-templates
— these dirs are populated locally by clone-manifest.sh from the
standalone repos, should never be committed to core. Without this rule
my previous git add -A staged 33 embedded git dirs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:52:20 -07:00
Hongming Wang
a152342e8c feat(platform): auto-detect SaaS tenant → control plane provisioner
No env vars to configure. The platform auto-detects the backend:

  MOLECULE_ORG_ID set → SaaS tenant → control plane provisioner
  MOLECULE_ORG_ID empty → self-hosted → Docker provisioner

The control plane URL defaults to https://api.moleculesai.app (override
with CP_PROVISION_URL for testing). No FLY_API_TOKEN on the tenant.

Removed: direct Fly provisioner (FlyProvisioner) — all SaaS workspace
provisioning goes through the control plane which holds the Fly token
and manages billing, quotas, and cleanup.

Two backends: CPProvisioner (SaaS) and Docker Provisioner (self-hosted).

Closes #494

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:50:52 -07:00
Hongming Wang
c2c80fd269 feat(platform): Fly Machines provisioner for SaaS workspace deployment
When CONTAINER_BACKEND=flyio, workspaces are provisioned as Fly Machines
instead of local Docker containers. This enables workspace deployment
on SaaS tenants where no Docker daemon is available.

New files:
- provisioner/fly_provisioner.go: FlyProvisioner with Start/Stop/
  IsRunning/Restart/Close via Fly Machines API (api.machines.dev/v1)
- FlyRuntimeImages maps runtimes to GHCR image tags

Changes:
- main.go: select Docker vs Fly based on CONTAINER_BACKEND env var
- workspace.go: SetFlyProvisioner() setter, Create checks flyProv first
- workspace_provision.go: provisionWorkspaceFly() loads secrets, calls
  FlyProvisioner.Start, issues auth token for the new machine

Env vars for Fly backend:
- CONTAINER_BACKEND=flyio (activates Fly provisioner)
- FLY_API_TOKEN (Fly deploy token)
- FLY_WORKSPACE_APP (Fly app name for workspace machines)
- FLY_REGION (default: ord)

Closes #494

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:51:15 -07:00
rabbitblood
76a36e8062 fix(platform): panic-recovering supervisor for every background goroutine (#92)
Yesterday's scheduler-died incident (#85) was one instance of a systemic
bug: every long-running goroutine in the platform lacks panic recovery
and exposes no liveness signal. In a multi-tenant SaaS deployment, a
single tenant's bad data panicking any subsystem takes down the
subsystem for every tenant, silently, with all standard health probes
still green. That is a scale-of-one sev-1.

This PR:

1. Introduces `platform/internal/supervised/` with two primitives:

   a. RunWithRecover(ctx, name, fn) — runs fn in a recover wrapper.
      On panic logs the stack + exponential-backoff restart (1s → 2s →
      4s → … → 30s cap). On clean return (fn decided to stop) returns.
      On ctx.Done cancels cleanly.

   b. Heartbeat(name) + LastTick(name) + Snapshot() + IsHealthy(names,
      staleThreshold) — shared in-memory liveness registry. Every
      subsystem calls Heartbeat(name) at the end of each tick so
      operators can distinguish "goroutine alive and healthy" from
      "alive but stuck inside a single tick".

2. Wraps every `go X.Start(ctx)` in main.go:
   - broadcaster.Subscribe   (Redis pub/sub relay → WebSocket)
   - registry.StartLivenessMonitor
   - registry.StartHealthSweep
   - scheduler.Start         (the one that died yesterday)
   - channelMgr.Start        (Telegram / Slack)

3. Adds `supervised.Heartbeat("scheduler")` inside the scheduler tick
   loop as the first end-to-end demonstration. Follow-up PRs will add
   heartbeats to the other four subsystems.

4. Adds `GET /admin/liveness` endpoint returning per-subsystem
   last_tick_at + seconds_ago. Operators can poll this and alert on
   any subsystem whose seconds_ago exceeds 2x its cron/tick interval.

5. Unit tests for RunWithRecover (clean return no restart; panic
   restarts with backoff; ctx cancel stops restart loop) and for the
   liveness registry.

Net new code: ~160 lines + ~100 lines of tests. Refactor of main.go:
~10 line changes. No behavior change on happy path; only lifts what
happens on a panic.

Closes #92. Supersedes the local recover added to scheduler.go in
#90 (kept conceptually, but now via the shared helper).
2026-04-14 20:34:18 -07:00
Hongming Wang
24fec62d7f initial commit — Molecule AI platform
Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1)
with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo.

Brand: Starfire → Molecule AI.
Slug: starfire / agent-molecule → molecule.
Env vars: STARFIRE_* → MOLECULE_*.
Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform.
Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent.
DB: agentmolecule → molecule.

History truncated; see public repo for prior commits and contributor
attribution. Verified green: go test -race ./... (platform), pytest
(workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:55:37 -07:00