molecule-core/workspace-server/internal/bundle/importer.go
Hongming Wang 8059fee128 fix(tenant-guard): allowlist /registry/register + /registry/heartbeat (#1236)
* fix(security): call redactSecrets before seeding workspace memories (F1085)

seedInitialMemories() in workspace_provision.go was inserting template/config
memories directly into agent_memories without scrubbing credential patterns.
A workspace provisioned from a template containing API keys, tokens, or other
secrets would store them in plain text — the same class of issue as #838.

Fix: call redactSecrets(workspaceID, content) on the truncated memory content
before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400)
is preserved — redaction runs after truncation so the size limit still applies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(workspace_provision): add seedInitialMemories coverage for #1208

Cover the truncate-at-100k boundary (PR #1167, CWE-400) and the
redactSecrets call (F1085 / #1132), both identified as untested in #1208.

- TestSeedInitialMemories_TruncatesOversizedContent: boundary at exactly
  100k, 1 byte over, far over, and well under. Verifies INSERT receives
  exactly maxMemoryContentLength bytes.
- TestSeedInitialMemories_RedactsSecrets: verifies redactSecrets runs
  before INSERT, regression test for F1085.
- TestSeedInitialMemories_InvalidScopeSkipped: invalid scope is silently
  skipped, no INSERT called.
- TestSeedInitialMemories_EmptyMemoriesNil: nil slice is handled without
  DB calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): Discord adapter launch visual assets (#1209)

Squash-merge: Discord adapter launch visual assets (3 PNGs) + social copy. Acceptance: assets on staging.

* fix(ci): golangci-lint errcheck failures on staging

Suppress errcheck warnings for calls where the return value is safely
ignored:
  - resp.Body.Close() (artifacts/client.go): deferred cleanup — failure
    to close a response body is non-critical; the defer itself is what
    matters for connection reuse.
  - rows.Close() (bundle/exporter.go): deferred cleanup in a loop where
    rows.Err() already handles query errors.
  - filepath.Walk (bundle/exporter.go): top-level walk call; errors in
    sub-directory traversal are handled by the inner callback (which
    returns nil for err != nil).
  - broadcaster.RecordAndBroadcast (bundle/importer.go): fire-and-forget
    event broadcast; errors are logged internally by the broadcaster.
  - db.DB.ExecContext (bundle/importer.go): best-effort runtime column
    update; non-critical auxiliary data that the provisioner re-extracts
    if needed.

Fixes: #1143

* test(artifacts): suppress w.Write return values to satisfy errcheck

All httptest.ResponseWriter.Write calls in client_test.go now discard
the byte count and error return with _, _ = prefix. The Write method
is safe to discard in test handlers — httptest.ResponseWriter.Write
never returns an error for in-memory buffers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(CI): move changes job off self-hosted runner + add workflow concurrency

Cherry-pick from staging PR #1194 for main. Two changes to relieve
macOS arm64 runner saturation:

1. `changes` job: runs on ubuntu-latest instead of
   [self-hosted, macos, arm64]. This job does a plain `git diff`
   with zero macOS dependencies — moving it off the runner frees
   a slot immediately on every workflow trigger.

2. Add workflow-level concurrency:
   concurrency: group: ci-${{ github.ref }}; cancel-in-progress: true

   Prevents multiple stale in-flight CI runs from queuing on the
   same ref when new commits arrive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): call redactSecrets before seeding workspace memories (F1085) (#1203)

seedInitialMemories() in workspace_provision.go was inserting template/config
memories directly into agent_memories without scrubbing credential patterns.
A workspace provisioned from a template containing API keys, tokens, or other
secrets would store them in plain text — the same class of issue as #838.

Fix: call redactSecrets(workspaceID, content) on the truncated memory content
before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400)
is preserved — redaction runs after truncation so the size limit still applies.

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* tick: 2026-04-21 ~03:40Z — CI stalled 59+ min, GH_TOKEN 4th rotation, PR reviews done

* fix(tenant-guard): allowlist /registry/register + /registry/heartbeat

Final layer of today's stuck-provisioning saga. With the private-IP
platform_url fix and the intra-VPC :8080 SG rule in place, workspace
EC2s finally reached the tenant on the right port — only to have every
POST bounced with a synthetic 404 by TenantGuard.

TenantGuard is the SaaS hook that rejects cross-tenant routing. It
demands X-Molecule-Org-Id on every request, but CP's workspace user-
data doesn't export MOLECULE_ORG_ID (only WORKSPACE_ID, PLATFORM_URL,
RUNTIME, PORT), so the runtime can't attach the header. Net effect:
every workspace's first heartbeat to /registry/heartbeat was a silent
404, and the workspace sat in 'provisioning' until the platform
sweeper timed it out.

Allowlist the two workspace-boot paths:
  - /registry/register  — one-shot at runtime startup
  - /registry/heartbeat — every 30s

Both are still gated by wsauth.HasAnyLiveToken (workspaces with a
token on file must present it; legacy tokenless workspaces are
grandfathered). And the tenant SG already scopes :8080 to the VPC
CIDR, so only intra-VPC callers can reach these paths in the first
place. The allowlist bypasses cross-org routing, not auth.

Follow-up: passing MOLECULE_ORG_ID into the workspace env would let
the runtime attach the header and drop this allowlist entry. Tracked
separately; not urgent since the multi-layer auth above is already
adequate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
2026-04-21 02:47:27 +00:00

146 lines
4.2 KiB
Go

package bundle
import (
"context"
"fmt"
"strings"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/events"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
"github.com/google/uuid"
)
// ImportResult tracks the outcome of importing a bundle tree.
type ImportResult struct {
WorkspaceID string `json:"workspace_id"`
Name string `json:"name"`
Status string `json:"status"` // "provisioning" or "failed"
Error string `json:"error,omitempty"`
Children []ImportResult `json:"children,omitempty"`
}
// Import provisions a workspace tree from a Bundle.
// It creates workspace records, writes config files to a temp dir, and triggers the provisioner.
func Import(
ctx context.Context,
b *Bundle,
parentID *string,
broadcaster *events.Broadcaster,
prov *provisioner.Provisioner,
platformURL string,
) ImportResult {
// Generate fresh workspace ID
wsID := uuid.New().String()
result := ImportResult{
WorkspaceID: wsID,
Name: b.Name,
Status: "provisioning",
}
// Create workspace record
_, err := db.DB.ExecContext(ctx, `
INSERT INTO workspaces (id, name, role, tier, status, parent_id, source_bundle_id)
VALUES ($1, $2, $3, $4, 'provisioning', $5, $6)
`, wsID, b.Name, nilIfEmpty(b.Description), b.Tier, parentID, b.ID)
if err != nil {
result.Status = "failed"
result.Error = fmt.Sprintf("failed to create workspace record: %v", err)
return result
}
_ = broadcaster.RecordAndBroadcast(ctx, "WORKSPACE_PROVISIONING", wsID, map[string]interface{}{
"name": b.Name,
"tier": b.Tier,
"source_bundle_id": b.ID,
})
// Build config files in memory for the provisioner
configFiles := buildBundleConfigFiles(b)
// Extract runtime from config.yaml in the bundle
bundleRuntime := "langgraph"
if configYaml, ok := b.Prompts["config.yaml"]; ok {
for _, line := range strings.Split(configYaml, "\n") {
line = strings.TrimSpace(line)
if strings.HasPrefix(line, "runtime:") {
bundleRuntime = strings.TrimSpace(strings.TrimPrefix(line, "runtime:"))
break
}
}
}
// Store runtime in DB
_ = db.DB.ExecContext(ctx, `UPDATE workspaces SET runtime = $1 WHERE id = $2`, bundleRuntime, wsID)
// Provision the container if provisioner is available
if prov != nil {
cfg := provisioner.WorkspaceConfig{
WorkspaceID: wsID,
ConfigFiles: configFiles,
Tier: b.Tier,
Runtime: bundleRuntime,
EnvVars: map[string]string{},
PlatformURL: platformURL,
// PluginsPath set by caller if available
}
go func() {
provCtx, cancel := context.WithTimeout(context.Background(), provisioner.ProvisionTimeout)
defer cancel()
url, err := prov.Start(provCtx, cfg)
if err != nil {
markFailed(provCtx, wsID, broadcaster, err)
} else if url != "" {
db.DB.ExecContext(provCtx, `UPDATE workspaces SET url = $1 WHERE id = $2`, url, wsID)
}
}()
}
// Recursively import sub-workspaces
for _, sub := range b.SubWorkspaces {
childResult := Import(ctx, &sub, &wsID, broadcaster, prov, platformURL)
result.Children = append(result.Children, childResult)
}
return result
}
// buildBundleConfigFiles builds a map of config files from a bundle for writing into a container volume.
func buildBundleConfigFiles(b *Bundle) map[string][]byte {
files := make(map[string][]byte)
// Write system-prompt.md
if b.SystemPrompt != "" {
files["system-prompt.md"] = []byte(b.SystemPrompt)
}
// Write config.yaml from prompts if present
if configYaml, ok := b.Prompts["config.yaml"]; ok {
files["config.yaml"] = []byte(configYaml)
}
// Write skills
for _, skill := range b.Skills {
for relPath, content := range skill.Files {
files[fmt.Sprintf("skills/%s/%s", skill.ID, relPath)] = []byte(content)
}
}
return files
}
func markFailed(ctx context.Context, wsID string, broadcaster *events.Broadcaster, err error) {
db.DB.ExecContext(ctx,
`UPDATE workspaces SET status = 'failed', updated_at = now() WHERE id = $1`, wsID)
broadcaster.RecordAndBroadcast(ctx, "WORKSPACE_PROVISION_FAILED", wsID, map[string]interface{}{
"error": err.Error(),
})
}
func nilIfEmpty(s string) interface{} {
if s == "" {
return nil
}
return s
}