molecule-core/workspace-server/internal/bundle/importer.go
Hongming Wang 1125a029b8 fix(platform): unblock SaaS workspace registration end-to-end
Every workspace in the cross-EC2 SaaS provisioning shape was failing
registration, heartbeat, or A2A routing. Four distinct blockers sat
between "EC2 is up" and "agent responds"; three are platform-side and
fixed here (the fourth is in the CP user-data, separate PR).

1. SSRF validator blocked RFC-1918 (registry.go + mcp.go)
   validateAgentURL and isPrivateOrMetadataIP rejected 172.16.0.0/12,
   which contains the AWS default VPC range (172.31.x.x) that every
   sibling workspace EC2 registers from. Registration returned 400 and
   the 10-min provision sweep flipped status to failed. RFC-1918 +
   IPv6 ULA are now gated behind saasMode(); link-local / metadata
   (169.254/16, fe80::/10), loopback (127/8, ::1), and TEST-NET stay
   blocked unconditionally in both modes.

   saasMode() resolution order:
     1. MOLECULE_DEPLOY_MODE=saas|self-hosted (explicit operator flag)
     2. MOLECULE_ORG_ID presence (legacy implicit signal, kept for
        back-compat so existing deployments don't need a config change)

   isPrivateOrMetadataIP now actually checks IPv6 — previously it
   returned false on any non-IPv4 input, which would let a registered
   [::1] or [fe80::...] URL bypass the SSRF check entirely.
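
   The gate, roughly (illustrative sketch over stdlib net/os/strings,
   not the verbatim diff):

     func saasMode() bool {
         mode := strings.ToLower(strings.TrimSpace(os.Getenv("MOLECULE_DEPLOY_MODE")))
         switch mode {
         case "saas":
             return true
         case "self-hosted":
             return false
         }
         // Legacy implicit signal, kept for back-compat.
         return strings.TrimSpace(os.Getenv("MOLECULE_ORG_ID")) != ""
     }

     func isPrivateOrMetadataIP(ip net.IP) bool {
         // Always blocked, both modes: loopback (127/8, ::1) and
         // link-local / metadata (169.254/16, fe80::/10).
         // TEST-NET ranges stay blocked too (checks elided here).
         if ip.IsLoopback() || ip.IsLinkLocalUnicast() {
             return true
         }
         // RFC-1918 + IPv6 ULA (net.IP.IsPrivate covers both):
         // blocked only outside SaaS mode.
         if ip.IsPrivate() {
             return !saasMode()
         }
         return false
     }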

2. Orphan auth-token minting (workspace_provision.go)
   issueAndInjectToken mints a token and stuffs it into
   cfg.ConfigFiles[".auth_token"]. The Docker provisioner writes that
   file into the /configs volume — the CP provisioner ignores it
   (only cfg.EnvVars crosses the wire). Result: live token in DB, no
   plaintext on disk, RegistryHandler.requireWorkspaceToken 401s every
   /registry/register attempt because the workspace is no longer in
   the "no live token → bootstrap-allowed" state. issueAndInjectToken
   now no-ops in SaaS mode; the register handler already mints on the
   first successful register and returns the plaintext in the response
   body for the runtime to persist locally.

   Also removes the redundant wsauth.IssueToken call at the bottom of
   provisionWorkspaceCP, which created the same orphan-token pattern
   a second time.
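
   The guard, roughly (sketch; the signature and wsauth.IssueToken's
   exact shape are assumed here):

     func issueAndInjectToken(
         ctx context.Context, wsID string, cfg *provisioner.WorkspaceConfig,
     ) error {
         if saasMode() {
             // CP path: the register handler mints on first successful
             // register, so pre-minting here only creates an orphan.
             return nil
         }
         token, err := wsauth.IssueToken(ctx, wsID)
         if err != nil {
             return err
         }
         // Docker path: the provisioner writes this into the /configs volume.
         cfg.ConfigFiles[".auth_token"] = []byte(token)
         return nil
     }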

3. Compaction artefacts (bundle/importer.go, handlers/org_tokens.go,
   scheduler.go, workspace_provision.go)
   Four pre-existing compile errors on main from an earlier session's
   code truncation: missing tuple destructuring on ExecContext /
   redactSecrets / orgTokenActor, missing close-brace in
   Scheduler.fireSchedule's panic recovery. All one-line mechanical
   fixes; without them the binary would not build.
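
   Representative shape of the destructuring fixes (illustrative, not
   the exact lines):

     // before: won't compile, ExecContext returns (sql.Result, error)
     //     res := db.DB.ExecContext(ctx, q, args...)
     // after:
     res, err := db.DB.ExecContext(ctx, q, args...)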

Tests
-----
ssrf_test.go adds:
  * TestSaasMode — covers the env resolution ladder (explicit flag
    wins over legacy signal, case-insensitive, whitespace tolerant)
  * TestIsPrivateOrMetadataIP_SaaSMode — asserts RFC-1918 + IPv6 ULA
    flip to allowed, metadata/loopback/TEST-NET still blocked
  * TestIsPrivateOrMetadataIP_IPv6 — regression guard for the old
    "returns false for all IPv6" behaviour

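Sketch of the table shape TestSaasMode uses (illustrative; cases
mirror the ladder above, field names are assumed):

  func TestSaasMode(t *testing.T) {
      cases := []struct {
          deployMode, orgID string
          want              bool
      }{
          {"saas", "", true},              // explicit flag
          {" SaaS ", "", true},            // case/whitespace tolerant
          {"self-hosted", "org-1", false}, // flag wins over legacy signal
          {"", "org-1", true},             // legacy implicit signal
          {"", "", false},
      }
      for _, tc := range cases {
          t.Setenv("MOLECULE_DEPLOY_MODE", tc.deployMode)
          t.Setenv("MOLECULE_ORG_ID", tc.orgID)
          if got := saasMode(); got != tc.want {
              t.Errorf("saasMode() = %v, want %v (mode=%q, org=%q)",
                  got, tc.want, tc.deployMode, tc.orgID)
          }
      }
  }
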
A follow-up issue for CP-sourced workspace_id attestation will be
filed separately; that attestation closes the residual intra-VPC SSRF
and token-race windows the SaaS-mode relaxation introduces.

Verified end-to-end today on workspace 6565a2e0 (hermes runtime, OpenAI
provider) — agent returned "PONG" in 1.4s after register → heartbeat →
A2A proxy → runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 03:06:46 -07:00

package bundle

import (
	"context"
	"fmt"
	"strings"

	"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/events"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
	"github.com/google/uuid"
)

// ImportResult tracks the outcome of importing a bundle tree.
type ImportResult struct {
	WorkspaceID string         `json:"workspace_id"`
	Name        string         `json:"name"`
	Status      string         `json:"status"` // "provisioning" or "failed"
	Error       string         `json:"error,omitempty"`
	Children    []ImportResult `json:"children,omitempty"`
}

// Import provisions a workspace tree from a Bundle.
// It creates workspace records, writes config files to a temp dir, and triggers the provisioner.
func Import(
	ctx context.Context,
	b *Bundle,
	parentID *string,
	broadcaster *events.Broadcaster,
	prov *provisioner.Provisioner,
	platformURL string,
) ImportResult {
	// Generate fresh workspace ID.
	wsID := uuid.New().String()
	result := ImportResult{
		WorkspaceID: wsID,
		Name:        b.Name,
		Status:      "provisioning",
	}

	// Create workspace record.
	_, err := db.DB.ExecContext(ctx, `
		INSERT INTO workspaces (id, name, role, tier, status, parent_id, source_bundle_id)
		VALUES ($1, $2, $3, $4, 'provisioning', $5, $6)
	`, wsID, b.Name, nilIfEmpty(b.Description), b.Tier, parentID, b.ID)
	if err != nil {
		result.Status = "failed"
		result.Error = fmt.Sprintf("failed to create workspace record: %v", err)
		return result
	}

	_ = broadcaster.RecordAndBroadcast(ctx, "WORKSPACE_PROVISIONING", wsID, map[string]interface{}{
		"name":             b.Name,
		"tier":             b.Tier,
		"source_bundle_id": b.ID,
	})

	// Build config files in memory for the provisioner.
	configFiles := buildBundleConfigFiles(b)

	// Extract runtime from config.yaml in the bundle.
	bundleRuntime := "langgraph"
	if configYaml, ok := b.Prompts["config.yaml"]; ok {
		for _, line := range strings.Split(configYaml, "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "runtime:") {
				bundleRuntime = strings.TrimSpace(strings.TrimPrefix(line, "runtime:"))
				break
			}
		}
	}

	// Store runtime in DB.
	_, _ = db.DB.ExecContext(ctx, `UPDATE workspaces SET runtime = $1 WHERE id = $2`, bundleRuntime, wsID)

	// Provision the container if provisioner is available.
	if prov != nil {
		cfg := provisioner.WorkspaceConfig{
			WorkspaceID: wsID,
			ConfigFiles: configFiles,
			Tier:        b.Tier,
			Runtime:     bundleRuntime,
			EnvVars:     map[string]string{},
			PlatformURL: platformURL,
			// PluginsPath set by caller if available.
		}
		go func() {
			provCtx, cancel := context.WithTimeout(context.Background(), provisioner.ProvisionTimeout)
			defer cancel()
			url, err := prov.Start(provCtx, cfg)
			if err != nil {
				markFailed(provCtx, wsID, broadcaster, err)
			} else if url != "" {
				db.DB.ExecContext(provCtx, `UPDATE workspaces SET url = $1 WHERE id = $2`, url, wsID)
			}
		}()
	}

	// Recursively import sub-workspaces.
	for _, sub := range b.SubWorkspaces {
		childResult := Import(ctx, &sub, &wsID, broadcaster, prov, platformURL)
		result.Children = append(result.Children, childResult)
	}
	return result
}

// buildBundleConfigFiles builds a map of config files from a bundle for writing into a container volume.
func buildBundleConfigFiles(b *Bundle) map[string][]byte {
	files := make(map[string][]byte)

	// Write system-prompt.md.
	if b.SystemPrompt != "" {
		files["system-prompt.md"] = []byte(b.SystemPrompt)
	}

	// Write config.yaml from prompts if present.
	if configYaml, ok := b.Prompts["config.yaml"]; ok {
		files["config.yaml"] = []byte(configYaml)
	}

	// Write skills.
	for _, skill := range b.Skills {
		for relPath, content := range skill.Files {
			files[fmt.Sprintf("skills/%s/%s", skill.ID, relPath)] = []byte(content)
		}
	}
	return files
}

func markFailed(ctx context.Context, wsID string, broadcaster *events.Broadcaster, err error) {
	db.DB.ExecContext(ctx,
		`UPDATE workspaces SET status = 'failed', updated_at = now() WHERE id = $1`, wsID)
	broadcaster.RecordAndBroadcast(ctx, "WORKSPACE_PROVISION_FAILED", wsID, map[string]interface{}{
		"error": err.Error(),
	})
}

func nilIfEmpty(s string) interface{} {
	if s == "" {
		return nil
	}
	return s
}