molecule-core/workspace-server/internal/router/router.go
Hongming Wang 94d9331c76 feat(canvas+platform): chat attachments, model selection, deploy/delete UX
Session's accumulated UX work across the frontend and platform. Reviewable
in logical sections — the diff is large but internally cohesive
(each section fixes a gap the next one depends on).

## Chat attachments — user ↔ agent file round trip

- New POST /workspaces/:id/chat/uploads (multipart, 50 MB total /
  25 MB per file, UUID-prefixed storage under
  /workspace/.molecule/chat-uploads/).
- New GET /workspaces/:id/chat/download with RFC 6266 filename
  escaping and binary-safe io.CopyN streaming.
- Canvas: drag-and-drop onto chat pane, pending-file pills,
  per-message attachment chips with fetch+blob download (anchor
  navigation can't carry auth headers).
- A2A flow carries FileParts end-to-end; hermes template executor
  now consumes attachments via platform helpers.

## Platform attachment helpers (workspace/executor_helpers.py)

Every runtime's executor routes through the same helpers so future
runtimes inherit attachment awareness for free:
- extract_attached_files — resolve workspace:/, file://, and bare URIs;
  reject traversal; skip non-existent files.
- build_user_content_with_files — manifest for non-image files,
  multi-modal list (text + image_url) for images. Respects
  MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision
  adapter hangs on base64 payloads (MiniMax M2.7).
- collect_outbound_files — scans agent reply for /workspace/...
  paths, stages each into chat-uploads/ (download endpoint
  whitelist), emits as FileParts in the A2A response.
- ensure_workspace_writable — called at molecule-runtime startup
  so non-root agents can write /workspace without each template
  having to chmod in its Dockerfile.

Hermes template executor + langgraph (a2a_executor.py) + claude-code
(claude_sdk_executor.py) all adopt the helpers.
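The helpers themselves are Python, but the traversal contract is compact enough to sketch; a hypothetical Go rendition of the workspace URI resolution (function and constant names are illustrative, not from the codebase):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

const workspaceRoot = "/workspace"

// resolveWorkspaceURI maps a workspace:/ URI or bare relative path onto an
// absolute path under /workspace, rejecting anything that ".." walks out of
// the root — the same rejection extract_attached_files performs in Python.
// (file:// handling omitted for brevity.)
func resolveWorkspaceURI(uri string) (string, error) {
	rel := strings.TrimPrefix(uri, "workspace:/")
	rel = strings.TrimPrefix(rel, "/")
	abs := filepath.Join(workspaceRoot, rel) // Join also cleans ".." segments
	if abs != workspaceRoot && !strings.HasPrefix(abs, workspaceRoot+"/") {
		return "", fmt.Errorf("path traversal rejected: %q", uri)
	}
	return abs, nil
}

func main() {
	p, _ := resolveWorkspaceURI("workspace:/notes/todo.md")
	fmt.Println(p) // prints /workspace/notes/todo.md
	if _, err := resolveWorkspaceURI("workspace:/../etc/passwd"); err != nil {
		fmt.Println("rejected")
	}
}
```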

## Model selection & related platform fixes

- PUT /workspaces/:id/model — was 404'ing, so canvas "Save"
  silently lost the model choice. Stores into workspace_secrets
  (MODEL_PROVIDER), auto-restarts via RestartByID.
- applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"]
  so Restart propagates the stored model to HERMES_DEFAULT_MODEL
  without needing the caller to rehydrate payload.Model.
- ConfigTab Tier dropdown now reads from workspaces row, not the
  (stale) config.yaml — fixes "badge shows T3, form shows T2".
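The fallback order is the crux of the restart fix; a sketch of the precedence (the body is illustrative, not the real applyRuntimeModelEnv):

```go
package main

import "fmt"

// resolveModelEnv prefers an explicit payload model, falls back to the
// stored MODEL_PROVIDER secret, and projects the winner into
// HERMES_DEFAULT_MODEL — so Restart propagates the stored choice without
// the caller rehydrating payload.Model.
func resolveModelEnv(payloadModel string, envVars map[string]string) map[string]string {
	model := payloadModel
	if model == "" {
		model = envVars["MODEL_PROVIDER"] // restart path: no payload.Model
	}
	out := make(map[string]string, len(envVars)+1)
	for k, v := range envVars {
		out[k] = v
	}
	if model != "" {
		out["HERMES_DEFAULT_MODEL"] = model
	}
	return out
}

func main() {
	env := resolveModelEnv("", map[string]string{"MODEL_PROVIDER": "stored-model"})
	fmt.Println(env["HERMES_DEFAULT_MODEL"]) // prints stored-model
}
```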

## ChatTab & WebSocket UX fixes

- Send button no longer locks after a dropped TASK_COMPLETE —
  `sending` no longer initializes from data.currentTask.
- A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s;
  the previous default aborted fetches while the server was still
  replying, producing "agent may be unreachable" on success.
- socket.ts: disposed flag + reconnectTimer cancellation + handler
  detachment fix zombie-WebSocket in React StrictMode.
- Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' —
  the adapter's purpose IS the form; the banner was contradictory.
- workspace_provision.go auto-recovery: try <runtime>-default AND
  bare <runtime> for template path (hermes lives at the bare name).

## Org deploy/delete animation (theme-ready CSS)

- styles/theme-tokens.css — design tokens (durations, easings,
  colors). Light theme overrides by setting only the deltas.
- styles/org-deploy.css — animation classes + keyframes, every
  value references a token. prefers-reduced-motion respected.
- Canvas projects node.draggable=false onto locked workspaces
  (deploying children AND actively-deleting ids) — RF's
  authoritative drag lock; useDragHandlers retains a belt-and-
  braces check.
- Organ cancel button (red pulse pill on root during deploy)
  cascades via existing DELETE /workspaces/:id?confirm=true.
- Auto fit-view after each arrival, debounced 500 ms so rapid
  sibling arrivals coalesce into one fit (previous per-event
  fit made the viewport lurch continuously).
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan
  timestamp only when event !== null (ignores programmatic
  fitView) so auto-fits don't self-cancel.
- deletingIds store slice + useOrgDeployState merge gives the
  delete flow the same dim + non-draggable treatment as deploy.
- Platform-level classNames.ts shared by canvas-events +
  useCanvasViewport (DRY'd 3 copies of split/filter/join).

## Server payload change

- org_import.go WORKSPACE_PROVISIONING broadcast now includes
  parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas
  renders the child at the right parent-nested slot without doing
  any absolute-position walk. createWorkspaceTree signature gains
  relX, relY alongside absX, absY; both call sites updated.
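The coordinate change amounts to subtracting the parent's absolute position once on the server; sketched below (function and field names hypothetical):

```go
package main

import "fmt"

// slotFor converts a child's absolute canvas position into the
// parent-relative slotX/slotY the WORKSPACE_PROVISIONING broadcast
// carries, sparing the canvas any absolute-position walk.
func slotFor(absX, absY, parentAbsX, parentAbsY float64) (slotX, slotY float64) {
	return absX - parentAbsX, absY - parentAbsY
}

func main() {
	x, y := slotFor(420, 310, 400, 300)
	fmt.Println(x, y) // prints 20 10
}
```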

## Tests

- workspace/tests/test_executor_helpers.py — 11 new cases
  covering URI resolution (including traversal rejection),
  attached-file extraction (both Part shapes), manifest-only
  vs multi-modal content, large-image skip, outbound staging,
  dedup, and ensure_workspace_writable (chmod 777 + non-root
  tolerance).
- workspace-server chat_files_test.go — upload validation,
  Content-Disposition escaping, filename sanitisation.
- workspace-server secrets_test.go — SetModel upsert, empty
  clears, invalid UUID rejection.
- tests/e2e/test_chat_attachments_e2e.sh — round-trip against
  a live hermes workspace.
- tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static
  plumbing check + round-trip across hermes/langgraph/claude-code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:27:51 -07:00

660 lines · 29 KiB · Go

package router

import (
	"context"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"

	"github.com/Molecule-AI/molecule-monorepo/platform/internal/channels"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/events"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/handlers"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/metrics"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/middleware"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/supervised"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/ws"

	"github.com/docker/docker/client"
	"github.com/gin-contrib/cors"
	"github.com/gin-gonic/gin"
)
func Setup(hub *ws.Hub, broadcaster *events.Broadcaster, prov *provisioner.Provisioner, platformURL, configsDir string, wh *handlers.WorkspaceHandler, channelMgr *channels.Manager) *gin.Engine {
	r := gin.Default()

	// Issue #179 — trust no reverse-proxy headers. Without this call Gin's
	// default is to trust ALL X-Forwarded-For values, which lets any caller
	// spoof their IP and bypass per-IP rate limiting. With nil, c.ClientIP()
	// always returns the real TCP RemoteAddr.
	if err := r.SetTrustedProxies(nil); err != nil {
		panic("router: SetTrustedProxies: " + err.Error())
	}

	// CORS origins — configurable via CORS_ORIGINS env var (comma-separated)
	corsOrigins := []string{"http://localhost:3000", "http://localhost:3001"}
	if v := os.Getenv("CORS_ORIGINS"); v != "" {
		corsOrigins = strings.Split(v, ",")
	}
	r.Use(cors.New(cors.Config{
		AllowOrigins:     corsOrigins,
		AllowMethods:     []string{"GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"},
		AllowHeaders:     []string{"Origin", "Content-Type", "X-Workspace-ID", "Authorization"},
		AllowCredentials: true,
	}))

	// Rate limiting — configurable via RATE_LIMIT env var (default 600 req/min).
	// 15 workspaces × 2 heartbeats/min + canvas polling + user actions needs headroom.
	rateLimit := 600
	if v := os.Getenv("RATE_LIMIT"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			rateLimit = n
		}
	}
	limiter := middleware.NewRateLimiter(rateLimit, time.Minute, context.Background())
	r.Use(limiter.Middleware())

	// Prometheus metrics middleware — records every request's method/path/status/latency.
	// Must be registered after rate limiter so aborted requests are also counted.
	r.Use(metrics.Middleware())

	// Tenant guard — the public repo's only SaaS hook. When MOLECULE_ORG_ID is
	// set (only by the private molecule-controlplane provisioner on tenant Fly
	// Machines), rejects requests whose X-Molecule-Org-Id header doesn't match.
	// Unset (self-hosted / dev / CI) → no-op. Registered after metrics so
	// rejected requests still land on the 4xx counter.
	r.Use(middleware.TenantGuard())

	// Security headers (#151) — sets X-Content-Type-Options, X-Frame-Options,
	// Referrer-Policy, Content-Security-Policy, Permissions-Policy, HSTS on
	// every response. Tests in securityheaders_test.go assert each header is
	// present and that handler-set headers are not overridden. Registered
	// last so a handler can still opt out by setting its own header before
	// c.Next() returns.
	r.Use(middleware.SecurityHeaders())

	// Health
	r.GET("/health", func(c *gin.Context) {
		c.JSON(200, gin.H{"status": "ok"})
	})

	// /admin/liveness — per-subsystem last-tick timestamps. Operators read this
	// to catch stuck-but-not-crashed goroutines (the failure mode that caused
	// the 12h scheduler outage of 2026-04-14, issue #85). Any subsystem whose
	// last tick is older than 2× its expected interval is stale.
	//
	// #166: gated behind AdminAuth. Internal health state is an ops-intel leak
	// in production (scheduler tick cadence reveals fleet size + work pattern).
	r.GET("/admin/liveness", middleware.AdminAuth(db.DB), func(c *gin.Context) {
		snap := supervised.Snapshot()
		out := make(map[string]interface{}, len(snap))
		now := time.Now()
		for name, last := range snap {
			out[name] = gin.H{
				"last_tick_at": last,
				"seconds_ago":  int(now.Sub(last).Seconds()),
			}
		}
		c.JSON(200, gin.H{"subsystems": out})
	})

	// Prometheus metrics — exempt from rate limiter via separate registration
	// (registered before Use(limiter) takes effect on this specific route — the
	// middleware.Middleware() still records it for observability).
	// Scrape with: curl http://localhost:8080/metrics
	r.GET("/metrics", metrics.Handler())

	// Single-workspace read — open so canvas nodes can fetch their own state
	// without a token (used by WorkspaceNode polling and health checks).
	r.GET("/workspaces/:id", wh.Get)

	// C1 + C20: workspace list and life-cycle mutations gated behind AdminAuth.
	// Fail-open when no tokens exist anywhere (fresh install / pre-Phase-30).
	// Blocks:
	//   C1  — unauthenticated GET /workspaces (workspace topology exposure)
	//   C20 — unauthenticated DELETE /workspaces/:id (mass-deletion attack)
	//         unauthenticated POST /workspaces (workspace creation)
	{
		wsAdmin := r.Group("", middleware.AdminAuth(db.DB))
		wsAdmin.GET("/workspaces", wh.List)
		wsAdmin.POST("/workspaces", wh.Create)
		wsAdmin.DELETE("/workspaces/:id", wh.Delete)

		// Out-of-band bootstrap signal: CP's watcher POSTs here when it
		// detects "RUNTIME CRASHED" in a workspace EC2 console output,
		// so the canvas flips to failed in seconds instead of waiting
		// for the 10-minute provision-timeout sweeper.
		wsAdmin.POST("/admin/workspaces/:id/bootstrap-failed", wh.BootstrapFailed)

		// Proxy to CP's serial-console endpoint so the canvas's "View
		// Logs" button can render the actual boot trace without handing
		// the tenant AWS credentials. Admin-gated because console output
		// can include user-data snippets we treat as semi-sensitive.
		wsAdmin.GET("/workspaces/:id/console", wh.Console)

		// Admin memory backup/restore (#1051) — bulk export/import of agent
		// memories for safe Docker rebuilds. Matches workspaces by name on import.
		// F1084/#1131: Export applies redactSecrets before returning content.
		// F1085/#1132: Import applies redactSecrets before persisting content.
		adminMemH := handlers.NewAdminMemoriesHandler()
		wsAdmin.GET("/admin/memories/export", adminMemH.Export)
		wsAdmin.POST("/admin/memories/import", adminMemH.Import)
	}
	// A2A proxy — registered outside the auth group; already enforces CanCommunicate access control.
	r.POST("/workspaces/:id/a2a", wh.ProxyA2A)

	// Auth-gated workspace sub-routes — ALL /workspaces/:id/* paths except /a2a.
	// Fix A (Cycle 5): single WorkspaceAuth middleware blocks C2-C5, C7-C9, C12, C13
	// by requiring a valid bearer token for any workspace that has one on file.
	// Legacy workspaces (no token) are grandfathered to allow rolling upgrades.
	wsAuth := r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB))
	{
		// #680: PATCH /workspaces/:id moved under WorkspaceAuth (#680 IDOR fix).
		// WorkspaceAuth enforces that the caller holds a valid bearer token for
		// this specific workspace — both auth AND ownership in one check. Cosmetic
		// updates (x/y drag-reposition, inline rename) from the combined tenant
		// image canvas still pass via the isSameOriginCanvas bypass in WorkspaceAuth.
		wsAuth.PATCH("", wh.Update)

		// Lifecycle
		wsAuth.GET("/state", wh.State)
		wsAuth.POST("/restart", wh.Restart)
		wsAuth.POST("/pause", wh.Pause)
		wsAuth.POST("/resume", wh.Resume)

		// Manual hibernate (opt-in, #711) — stops the container and sets status
		// to 'hibernated'. The workspace auto-wakes on the next A2A message.
		wsAuth.POST("/hibernate", wh.Hibernate)

		// Async Delegation
		delh := handlers.NewDelegationHandler(wh, broadcaster)
		wsAuth.POST("/delegate", delh.Delegate)
		wsAuth.GET("/delegations", delh.ListDelegations)

		// Record-only endpoint for agent-initiated delegations (#64). Agent-side
		// delegate_to_workspace fires A2A directly for speed + OTEL propagation;
		// this endpoint just adds an activity_logs row so GET /delegations returns
		// the same set the agent's local `check_delegation_status` sees.
		wsAuth.POST("/delegations/record", delh.Record)
		wsAuth.POST("/delegations/:delegation_id/update", delh.UpdateStatus)

		// Traces (Langfuse proxy)
		trh := handlers.NewTracesHandler()
		wsAuth.GET("/traces", trh.List)

		// Live agent transcript proxy — surfaces the runtime-specific session
		// log (claude-code reads ~/.claude/projects/<cwd>/<session>.jsonl).
		// Lets canvas / operators see live tool calls + AI thinking instead
		// of waiting for the high-level activity log to flush.
		trsh := handlers.NewTranscriptHandler()
		wsAuth.GET("/transcript", trsh.Get)

		// Agent Memories (HMA)
		memsh := handlers.NewMemoriesHandler()
		wsAuth.POST("/memories", memsh.Commit)
		wsAuth.GET("/memories", memsh.Search)
		wsAuth.DELETE("/memories/:memoryId", memsh.Delete)

		// Approvals
		apph := handlers.NewApprovalsHandler(broadcaster)
		wsAuth.POST("/approvals", apph.Create)
		wsAuth.GET("/approvals", apph.List)
		wsAuth.POST("/approvals/:approvalId/decide", apph.Decide)

		// /approvals/pending is a cross-workspace admin path; WorkspaceAuth cannot
		// be used here (no workspace scope), but it still needs auth so an
		// unauthenticated caller cannot enumerate all pending approvals across the
		// entire platform. Gated behind AdminAuth (issue #180).
		r.GET("/approvals/pending", middleware.AdminAuth(db.DB), apph.ListAll)

		// Team Expansion
		teamh := handlers.NewTeamHandler(broadcaster, prov, platformURL, configsDir)
		wsAuth.POST("/expand", teamh.Expand)
		wsAuth.POST("/collapse", teamh.Collapse)

		// Agents
		ah := handlers.NewAgentHandler(broadcaster)
		wsAuth.POST("/agent", ah.Assign)
		wsAuth.PATCH("/agent", ah.Replace)
		wsAuth.DELETE("/agent", ah.Remove)
		wsAuth.POST("/agent/move", ah.Move)
	}
	// Registry
	rh := handlers.NewRegistryHandler(broadcaster)
	// #1870 Phase 1: wire the queue drain hook so Heartbeat can dispatch
	// a queued A2A request when the workspace reports spare capacity.
	rh.SetQueueDrainFunc(wh.DrainQueueForWorkspace)
	r.POST("/registry/register", rh.Register)
	r.POST("/registry/heartbeat", rh.Heartbeat)
	r.POST("/registry/update-card", rh.UpdateCard)

	// Webhooks
	whh := handlers.NewWebhookHandlerWithWorkspace(wh)
	r.POST("/webhooks/github", whh.GitHub)
	r.POST("/webhooks/github/:id", whh.GitHub)

	// Discovery
	dh := handlers.NewDiscoveryHandler()
	r.GET("/registry/discover/:id", dh.Discover)
	r.GET("/registry/:id/peers", dh.Peers)
	r.POST("/registry/check-access", dh.CheckAccess)

	// Events — #165: gated behind AdminAuth. The raw event log contains org
	// topology, workspace names, and agent-card fragments; an unauth read
	// leaks the entire fleet structure. GET /events/:workspaceId is still
	// a cross-workspace read so it uses AdminAuth, not WorkspaceAuth.
	eh := handlers.NewEventsHandler()
	{
		eventsAdmin := r.Group("", middleware.AdminAuth(db.DB))
		eventsAdmin.GET("/events", eh.List)
		eventsAdmin.GET("/events/:workspaceId", eh.ListByWorkspace)
	}
	// Remaining auth-gated workspace sub-routes — appended to the wsAuth group declared above.
	{
		// Activity Logs
		acth := handlers.NewActivityHandler(broadcaster)
		wsAuth.GET("/activity", acth.List)
		wsAuth.GET("/session-search", acth.SessionSearch)
		wsAuth.POST("/activity", acth.Report)
		wsAuth.POST("/notify", acth.Notify)

		// Config
		cfgh := handlers.NewConfigHandler()
		wsAuth.GET("/config", cfgh.Get)
		wsAuth.PATCH("/config", cfgh.Patch)

		// Schedules (cron tasks)
		schedh := handlers.NewScheduleHandler()
		wsAuth.GET("/schedules", schedh.List)
		wsAuth.POST("/schedules", schedh.Create)
		wsAuth.PATCH("/schedules/:scheduleId", schedh.Update)
		wsAuth.DELETE("/schedules/:scheduleId", schedh.Delete)
		wsAuth.POST("/schedules/:scheduleId/run", schedh.RunNow)
		wsAuth.GET("/schedules/:scheduleId/history", schedh.History)

		// Schedule health — open to CanCommunicate peers (no workspace bearer token
		// required) so peer agents can detect silent cron failures without admin auth.
		// Auth is enforced inside the handler via X-Workspace-ID + CanCommunicate
		// (mirrors the /workspaces/:id/a2a pattern). Issue #249.
		r.GET("/workspaces/:id/schedules/health", schedh.Health)

		// Budget — per-workspace spend ceiling and current usage (#541).
		// GET stays on wsAuth — a workspace agent reading its own budget is legitimate.
		// PATCH is admin-only — workspace agents must not be able to self-clear their
		// spending ceiling (that would defeat the entire budget enforcement feature).
		budgeth := handlers.NewBudgetHandler()
		wsAuth.GET("/budget", budgeth.GetBudget)
		r.PATCH("/workspaces/:id/budget", middleware.AdminAuth(db.DB), budgeth.PatchBudget)

		// Token management (user-facing create/list/revoke)
		tokh := handlers.NewTokenHandler()
		wsAuth.GET("/tokens", tokh.List)
		wsAuth.POST("/tokens", tokh.Create)
		wsAuth.DELETE("/tokens/:tokenId", tokh.Revoke)

		// Memory
		memh := handlers.NewMemoryHandler()
		wsAuth.GET("/memory", memh.List)
		wsAuth.GET("/memory/:key", memh.Get)
		wsAuth.POST("/memory", memh.Set)
		wsAuth.DELETE("/memory/:key", memh.Delete)

		// Secrets (auto-restart workspace after secret change)
		sech := handlers.NewSecretsHandler(wh.RestartByID)
		wsAuth.GET("/secrets", sech.List)
		// Phase 30.2 — decrypted values pull, token-gated. Canvas uses List
		// (keys + metadata only); remote agents use Values to bootstrap env.
		wsAuth.GET("/secrets/values", sech.Values)
		wsAuth.POST("/secrets", sech.Set)
		wsAuth.PUT("/secrets", sech.Set)
		wsAuth.DELETE("/secrets/:key", sech.Delete)
		wsAuth.GET("/model", sech.GetModel)
		wsAuth.PUT("/model", sech.SetModel)

		// Token usage metrics — cost transparency (#593).
		// WorkspaceAuth middleware (on wsAuth) binds the bearer to :id.
		mtrh := handlers.NewMetricsHandler()
		wsAuth.GET("/metrics", mtrh.GetMetrics)

		// Cloudflare Artifacts demo integration (#595).
		// All four routes require workspace-scoped bearer auth (wsAuth).
		// CF credentials read from CF_ARTIFACTS_API_TOKEN / CF_ARTIFACTS_NAMESPACE;
		// missing credentials return 503 so the handler still registers in
		// every deployment — the demo is gated on env vars, not compilation.
		arth := handlers.NewArtifactsHandler()
		wsAuth.POST("/artifacts", arth.Create)
		wsAuth.GET("/artifacts", arth.Get)
		wsAuth.POST("/artifacts/fork", arth.Fork)
		wsAuth.POST("/artifacts/token", arth.Token)

		// Temporal workflow checkpoints — step-level persistence for resumable
		// workflows (#788, #837, parent #583). WorkspaceAuth on wsAuth ensures each
		// workspace can only read/write its own checkpoints.
		// NOTE: /checkpoints/latest must be registered BEFORE /checkpoints/:wfid
		// so Gin's static-segment resolution takes precedence over the wildcard.
		cpth := handlers.NewCheckpointsHandler(db.DB)
		wsAuth.POST("/checkpoints", cpth.Upsert)
		wsAuth.GET("/checkpoints/latest", cpth.Latest)
		wsAuth.GET("/checkpoints/:wfid", cpth.List)
		wsAuth.DELETE("/checkpoints/:wfid", cpth.Delete)

		// MCP bridge — opencode / Claude Code integration (#800).
		// Exposes A2A delegation, peer discovery, and workspace operations as a
		// remote MCP server over HTTP (Streamable HTTP + SSE transports).
		//
		// Security:
		// C1: WorkspaceAuth on wsAuth validates bearer token before any MCP logic.
		// C2: MCPRateLimiter caps tool calls at 120/min/token so a long-lived
		//     opencode session cannot saturate the platform.
		// C3: commit_memory/recall_memory with scope=GLOBAL → permission error;
		//     send_message_to_user excluded unless MOLECULE_MCP_ALLOW_SEND_MESSAGE=true.
		mcpH := handlers.NewMCPHandler(db.DB, broadcaster)
		mcpRl := middleware.NewMCPRateLimiter(120, time.Minute, context.Background())
		wsAuth.GET("/mcp/stream", mcpRl.Middleware(), mcpH.Stream)
		wsAuth.POST("/mcp", mcpRl.Middleware(), mcpH.Call)
	}
	// Global secrets — /settings/secrets is the canonical path; /admin/secrets kept for backward compat.
	// Fix (Cycle 7): protected by AdminAuth — any valid workspace bearer token grants access.
	// Fail-open when no tokens exist (fresh install / pre-Phase-30 upgrade).
	{
		adminAuth := r.Group("", middleware.AdminAuth(db.DB))
		sechGlobal := handlers.NewSecretsHandler(wh.RestartByID)
		adminAuth.GET("/settings/secrets", sechGlobal.ListGlobal)
		adminAuth.PUT("/settings/secrets", sechGlobal.SetGlobal)
		adminAuth.POST("/settings/secrets", sechGlobal.SetGlobal)
		adminAuth.DELETE("/settings/secrets/:key", sechGlobal.DeleteGlobal)
		adminAuth.GET("/admin/secrets", sechGlobal.ListGlobal)
		adminAuth.POST("/admin/secrets", sechGlobal.SetGlobal)
		adminAuth.DELETE("/admin/secrets/:key", sechGlobal.DeleteGlobal)
	}

	// Platform instructions — configurable rules with global/workspace scope.
	// Admin endpoints for CRUD; workspace-facing resolve endpoint for agent bootstrap.
	// (Team scope is reserved in the schema but not yet wired — needs teams/team_members
	// migration first.)
	{
		instrH := handlers.NewInstructionsHandler()
		adminInstr := r.Group("", middleware.AdminAuth(db.DB))
		adminInstr.GET("/instructions", instrH.List)
		adminInstr.POST("/instructions", instrH.Create)
		adminInstr.PUT("/instructions/:id", instrH.Update)
		adminInstr.DELETE("/instructions/:id", instrH.Delete)
		// Resolve mounted under wsAuth — caller must hold a valid bearer token
		// for :id, preventing cross-workspace enumeration of operator policy.
		wsAuth.GET("/instructions/resolve", instrH.Resolve)
	}

	// Admin — cross-workspace schedule health monitoring (issue #618).
	// Lets cron-audit agents and operators detect silent schedule failures
	// across all workspaces without holding individual workspace bearer tokens.
	// AdminAuth mirrors the /admin/liveness gate — fail-open on fresh install,
	// strict bearer-only once any token exists.
	{
		asHealth := handlers.NewAdminSchedulesHealthHandler()
		r.GET("/admin/schedules/health", middleware.AdminAuth(db.DB), asHealth.Health)
	}

	// Admin — test token minting (issue #6). Hidden in production via TestTokensEnabled().
	// NOT behind AdminAuth — this is the bootstrap endpoint E2E tests and
	// fresh installs use to obtain their first admin bearer. Adding AdminAuth
	// (#612) broke the chicken-and-egg: after first workspace provision creates
	// a live token in the DB, AdminAuth requires auth for ALL requests, but the
	// client has no token yet because it needs this endpoint to get one.
	// The handler itself rejects calls when MOLECULE_ENV=prod (TestTokensEnabled).
	{
		tokh := handlers.NewAdminTestTokenHandler()
		r.GET("/admin/workspaces/:id/test-token", tokh.GetTestToken)
	}

	// Admin — GitHub App installation token refresh (issue #547).
	// Long-running workspaces (>60 min) use this endpoint to refresh
	// GH_TOKEN without restarting. Returns the current installation token
	// from the github-app-auth plugin's in-process cache (which proactively
	// refreshes 5 min before expiry). 404 when no GitHub App is configured
	// (dev / self-hosted without GITHUB_APP_ID).
	{
		ghTokH := handlers.NewGitHubTokenHandler(wh.TokenRegistry())
		// #1068: moved from AdminAuth to allow any authenticated workspace to
		// refresh its GitHub token. The credential helper in containers calls
		// this endpoint with a workspace bearer token — AdminAuth (PR #729)
		// rejects those, breaking token refresh after 60 min.
		// Keep the old path as an alias for backward compat.
		r.GET("/admin/github-installation-token", middleware.AdminAuth(db.DB), ghTokH.GetInstallationToken)
		wsAuth.GET("/github-installation-token", ghTokH.GetInstallationToken)
	}
	// Terminal — shares Docker client with provisioner
	var dockerCli *client.Client
	if prov != nil {
		dockerCli = prov.DockerClient()
	}
	th := handlers.NewTerminalHandler(dockerCli)
	wsAuth.GET("/terminal", th.HandleConnect)

	// Canvas Viewport — #166 + #168: GET stays fully open for bootstrap.
	// PUT uses CanvasOrBearer (accepts Origin-match OR bearer token) so the
	// browser canvas can persist drag/zoom state without a bearer, while
	// bearer-carrying clients (molecli, integration tests) still work.
	// Viewport corruption is cosmetic-only — worst case a user refreshes
	// the page — so the softer check is acceptable here. This middleware
	// MUST NOT be used on routes that leak prompts, create workspaces,
	// or write files (#164/#165/#190 class).
	vh := handlers.NewViewportHandler()
	r.GET("/canvas/viewport", vh.Get)
	r.PUT("/canvas/viewport", middleware.CanvasOrBearer(db.DB), vh.Save)

	// Templates
	tmplh := handlers.NewTemplatesHandler(configsDir, dockerCli)
	// #686: GET /templates lists all template names+metadata from configsDir.
	// Open access lets unauthenticated callers enumerate org configurations and
	// installed plugins. AdminAuth-gate it alongside POST /templates/import.
	// #190: POST /templates/import writes arbitrary files into configsDir.
	// Must be admin-gated — same class as /bundles/import (#164) and /org/import.
	{
		tmplAdmin := r.Group("", middleware.AdminAuth(db.DB))
		tmplAdmin.GET("/templates", tmplh.List)
		tmplAdmin.POST("/templates/import", tmplh.Import)
	}
	wsAuth.GET("/shared-context", tmplh.SharedContext)
	wsAuth.PUT("/files", tmplh.ReplaceFiles)
	wsAuth.GET("/files", tmplh.ListFiles)
	wsAuth.GET("/files/*path", tmplh.ReadFile)
	wsAuth.PUT("/files/*path", tmplh.WriteFile)
	wsAuth.DELETE("/files/*path", tmplh.DeleteFile)

	// Chat attachments — file upload (user → agent) and binary-safe
	// streaming download (agent → user). Namespaced under /chat/ so
	// the security model is obviously distinct from /files/* (which
	// handles workspace config/templates and has a different caller).
	chatfh := handlers.NewChatFilesHandler(tmplh)
	wsAuth.POST("/chat/uploads", chatfh.Upload)
	wsAuth.GET("/chat/download", chatfh.Download)

	// Plugins
	pluginsDir := findPluginsDir(configsDir)
	// Runtime lookup lets the plugins handler filter the registry to plugins
	// that declare support for the workspace's runtime, without taking a
	// direct DB dependency in the handler package.
	runtimeLookup := func(workspaceID string) (string, error) {
		var runtime string
		err := db.DB.QueryRowContext(
			context.Background(),
			`SELECT COALESCE(runtime, 'langgraph') FROM workspaces WHERE id = $1`,
			workspaceID,
		).Scan(&runtime)
		return runtime, err
	}
	plgh := handlers.NewPluginsHandler(pluginsDir, dockerCli, wh.RestartByID).
		WithRuntimeLookup(runtimeLookup)
	r.GET("/plugins", plgh.ListRegistry)
	r.GET("/plugins/sources", plgh.ListSources)
	wsAuth.GET("/plugins", plgh.ListInstalled)
	wsAuth.GET("/plugins/available", plgh.ListAvailableForWorkspace)
	wsAuth.GET("/plugins/compatibility", plgh.CheckRuntimeCompatibility)
	wsAuth.POST("/plugins", plgh.Install)
	wsAuth.DELETE("/plugins/:name", plgh.Uninstall)
	// Phase 30.3 — stream plugin as tar.gz so remote agents can pull +
	// unpack locally instead of going through Docker exec.
	wsAuth.GET("/plugins/:name/download", plgh.Download)

	// Bundles — #164 + #165: both gated behind AdminAuth.
	// POST /bundles/import — CRITICAL: anon creation of arbitrary workspaces
	//   with user-supplied config (system prompts, plugins, secrets envelope). #164.
	// GET /bundles/export/:id — HIGH: full system prompts + memory for any
	//   workspace by UUID probe. #165.
	bh := handlers.NewBundleHandler(broadcaster, prov, platformURL, configsDir, dockerCli)
	{
		bundleAdmin := r.Group("", middleware.AdminAuth(db.DB))
		bundleAdmin.GET("/bundles/export/:id", bh.Export)
		bundleAdmin.POST("/bundles/import", bh.Import)
	}

	// Org Templates
	orgDir := findOrgDir(configsDir)
	orgh := handlers.NewOrgHandler(wh, broadcaster, prov, channelMgr, configsDir, orgDir)
	// #686: GET /org/templates exposes the org template catalogue (names, roles,
	// configured system prompts). AdminAuth-gate to match /org/import.
	r.GET("/org/templates", middleware.AdminAuth(db.DB), orgh.ListTemplates)

	// Organization-scoped API tokens — user-facing replacement for
	// ADMIN_TOKEN. Same AdminAuth gate: you need ADMIN_TOKEN, a
	// session cookie, OR an existing org token to mint more. That's
	// bootstrap-friendly (first token from ADMIN_TOKEN or canvas
	// session) and self-sustaining afterwards (tokens mint tokens).
	//
	// The mint endpoint gets an extra per-IP rate limiter — a
	// compromised session or leaked bearer could otherwise mint
	// thousands of tokens per second, making forensic cleanup
	// painful. 10 mints per hour per IP is ample for real usage;
	// legitimate bursts fit in the ceiling and abuse bounces off.
	// List + Delete don't need the extra limit (they can't be used
	// to generate new secret material).
	{
		orgTokenHandler := handlers.NewOrgTokenHandler()
		orgTokenAdmin := r.Group("", middleware.AdminAuth(db.DB))
		orgTokenAdmin.GET("/org/tokens", orgTokenHandler.List)
		orgTokenMintLimiter := middleware.NewRateLimiter(10, time.Hour, context.Background())
		orgTokenAdmin.POST("/org/tokens", orgTokenMintLimiter.Middleware(), orgTokenHandler.Create)
		orgTokenAdmin.DELETE("/org/tokens/:id", orgTokenHandler.Revoke)
	}
	// /org/import can create arbitrary workspaces from an uploaded YAML — it
	// must be an admin-gated route. The handler also path-sanitizes
	// `dir`/`template`/`files_dir` via resolveInsideRoot, but defence-in-
	// depth keeps the route behind AdminAuth regardless.
	r.POST("/org/import", middleware.AdminAuth(db.DB), orgh.Import)

	// Org plugin allowlist — tool governance (#591).
	// Both endpoints are admin-gated: reading the allowlist reveals approved
	// tooling policy; writing it enforces org-level install governance.
	{
		allowlistAdmin := r.Group("", middleware.AdminAuth(db.DB))
		aplh := handlers.NewOrgPluginAllowlistHandler()
		allowlistAdmin.GET("/orgs/:id/plugins/allowlist", aplh.GetAllowlist)
		allowlistAdmin.PUT("/orgs/:id/plugins/allowlist", aplh.PutAllowlist)
	}

	// Channels (social integrations — Telegram, Slack, Discord, etc.)
	chh := handlers.NewChannelHandler(channelMgr)
	r.GET("/channels/adapters", chh.ListAdapters)
	wsAuth.GET("/channels", chh.List)
	wsAuth.POST("/channels", chh.Create)
	wsAuth.PATCH("/channels/:channelId", chh.Update)
	wsAuth.DELETE("/channels/:channelId", chh.Delete)
	wsAuth.POST("/channels/:channelId/send", chh.Send)
	wsAuth.POST("/channels/:channelId/test", chh.Test)

	// #250: /channels/discover is an admin-setup helper (takes a bot
	// token, asks the vendor "what chats is this token a member of?").
	// Leaving it unauthenticated turned it into a bot-token oracle plus
	// a drive-by deleteWebhook side effect against any valid token an
	// attacker could probe. AdminAuth matches the intent — it's a
	// platform-operator helper, not a per-workspace route.
	r.POST("/channels/discover", middleware.AdminAuth(db.DB), chh.Discover)
	r.POST("/webhooks/:type", chh.Webhook)

	// Audit — EU AI Act Annex III compliance endpoint (#594).
	// Returns append-only HMAC-chained agent event log with optional inline
	// chain verification when AUDIT_LEDGER_SALT is configured.
	audh := handlers.NewAuditHandler()
	wsAuth.GET("/audit", audh.Query)

	// SSE — AG-UI compatible event stream per workspace (#590).
	// WorkspaceAuth middleware (on wsAuth) binds the bearer token to :id.
	sseh := handlers.NewSSEHandler(broadcaster)
	wsAuth.GET("/events/stream", sseh.StreamEvents)

	// WebSocket
	sh := handlers.NewSocketHandler(hub)
	r.GET("/ws", sh.HandleConnect)

	// Control-plane reverse proxy — forwards /cp/* to the SaaS CP.
	// Canvas's browser bundle fetches /cp/auth/me, /cp/orgs, etc. on
	// SAME ORIGIN (the tenant's <slug>.moleculesai.app). Those paths
	// aren't mounted on the tenant platform; without this proxy they
	// 404 and login breaks. When CP_UPSTREAM_URL is empty (self-
	// hosted / local dev where no CP exists), we skip the mount so
	// Gin's default 404 surfaces cleanly instead of proxying to a
	// placeholder.
	//
	// Mounted via NoRoute-style group BEFORE the canvas NoRoute so
	// /cp/* wins over the UI fallback.
	if cpURL := os.Getenv("CP_UPSTREAM_URL"); cpURL != "" {
		cpProxy := newCPProxy(cpURL)
		r.Any("/cp/*path", cpProxy)
	}

	// Canvas reverse proxy — when running as a combined tenant image
	// (Dockerfile.tenant), the Next.js canvas server runs on :3000 inside
	// the same container. Any route not matched by the API handlers above
	// gets proxied to the canvas so the browser only ever talks to :8080.
	//
	// When CANVAS_PROXY_URL is empty (self-hosted / local dev), this is a
	// no-op and Gin returns its default 404. The canvas dev server runs
	// separately on :3000 in that setup.
	if canvasURL := os.Getenv("CANVAS_PROXY_URL"); canvasURL != "" {
		canvasProxy := newCanvasProxy(canvasURL)
		r.NoRoute(canvasProxy)
	}

	return r
}
func findPluginsDir(configsDir string) string {
	// configsDir-relative is most reliable; plugins live at repo-root plugins/
	candidates := []string{
		filepath.Join(configsDir, "..", "plugins"),
		"../plugins",
		"plugins",
	}
	for _, c := range candidates {
		if info, err := os.Stat(c); err == nil && info.IsDir() {
			// Must have at least one plugin subfolder to be valid
			entries, _ := os.ReadDir(c)
			for _, e := range entries {
				if e.IsDir() {
					abs, _ := filepath.Abs(c)
					return abs
				}
			}
		}
	}
	abs, _ := filepath.Abs(filepath.Join(configsDir, "..", "plugins"))
	return abs
}

func findOrgDir(configsDir string) string {
	candidates := []string{
		"org-templates",
		"../org-templates",
		filepath.Join(configsDir, "..", "org-templates"),
	}
	for _, c := range candidates {
		if info, err := os.Stat(c); err == nil && info.IsDir() {
			abs, _ := filepath.Abs(c)
			return abs
		}
	}
	return "org-templates"
}