Compare commits

..

1 Commits

Author SHA1 Message Date
core-be d88a320f0c fix: resolve SourceResolver naming conflict, SSRF guard placement, and multiple test regressions
- plugins/drift_sweeper.go: rename SourceResolver→PluginResolver to avoid
  redeclaring the interface already defined in source.go (core#228)

- handlers/workspace.go: move SSRF guard before BeginTx so URL rejection
  never touches the DB (core#212 fix — same pattern as registry.go:324)

- handlers/restart_signals.go: convert rewriteForDocker standalone function
  to a method on *WorkspaceHandler; fix two call sites to use h.rewriteForDocker

- handlers/plugins.go: change Sources() return type from plugins.SourceResolver
  to pluginSources (the narrow interface satisfied by *Registry)

- handlers/admin_plugin_drift.go: remove unused "context" import

- handlers/delegation_test.go: remove stray closing brace

- handlers/restart_signals_test.go: rewrite with correct miniredis v2 API
  (mr.Get takes context, mr.Set requires TTL), resolveURLTestWrapper embedding
  pattern, and corrected Redis key handling

- handlers/workspace_test.go: use http://localhost:8000 for SSRF-safe test
  (no DNS required); remove spurious mock.ExpectExec for Redis CacheURL call

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 06:05:11 +00:00
10 changed files with 124 additions and 189 deletions
+8 -33
View File
@@ -1,12 +1,10 @@
# Staging Environment Design
> **Status:** In Progress — Phase 36. Partially implemented. The image pipeline
> (`:staging-<sha>`, `:staging-latest` tags on ECR) is live. Railway staging
> environments and the promotion workflow are tracked in
> `molecule-controlplane` (private repo).
> **Status:** Planned — gates all future infra changes (Tunnel migration,
> security fixes, etc.)
>
> **Problem:** We merge directly to main and auto-deploy to production.
> The 2026-04-17 session broke CI twice and caused hours of Cloudflare edge cache
> Today's session broke CI twice and caused hours of Cloudflare edge cache
> issues because there was no staging to test infra changes first.
>
> **Goal:** Full staging environment that mirrors production. Every change
@@ -55,28 +53,6 @@ Developer pushes to PR branch
## Components
### 0. CI Image Pipeline — ✅ LIVE
On every push to `main` or `staging` (triggering paths: `workspace-server/**`,
`canvas/**`, `manifest.json`, `scripts/**`), the Gitea Actions workflow
(`.gitea/workflows/publish-workspace-server-image.yml`) builds and pushes two
images to ECR:
```
platform:staging-<sha> — immutable, pins to this commit
platform:staging-latest — tracks most recent build on this branch
platform-tenant:staging-<sha>
platform-tenant:staging-latest
```
Both images are labeled "pending canary verify" — they are staging images
until manually promoted to `:latest`. See the workflow file for the full
pre-clone step (manifest deps → `.tenant-bundle-deps/`), ECR auth, and build
args.
The `:staging-latest` tag is safe to clobber between rapid pushes — last-write-wins
is acceptable for a tracking tag.
### 1. Railway: two environments
Railway supports multiple environments per project. Create a `staging`
@@ -219,16 +195,15 @@ Until the automated workflow is built:
## Implementation order
1. **Publish workflow** — ✅ DONE. `.gitea/workflows/publish-workspace-server-image.yml`
pushes `:staging-<sha>` + `:staging-latest` on every `main`/`staging` push.
2. **Railway staging environment** — in `molecule-controlplane` (private)
3. **Neon staging branch** — in `molecule-controlplane` (private)
4. **Staging DNS**`staging.api.moleculesai.app` CNAME to Railway (~5 min)
1. **Railway staging environment** — create + configure vars (~30 min)
2. **Neon staging branch** — create from main (~5 min)
3. **Staging DNS**`staging.api.moleculesai.app` CNAME to Railway (~5 min)
4. **Publish workflow** — push `:staging` tag instead of `:latest` (~15 min)
5. **Promotion workflow** — manual trigger to promote staging → production (~30 min)
6. **Vercel staging** — configure preview deployment URL (~15 min)
7. **Staging smoke test** — automated test after staging deploy (~30 min)
**Done in public repo:** items 1. **Remaining:** items 27 (tracked in `molecule-controlplane`).
**Total:** ~2.5 hours for full staging pipeline.
## Cost
+24 -57
View File
@@ -1,7 +1,7 @@
# Phase 30 Remote Workspaces — Customer FAQ
> **Cycle:** Marketing work cycle — offline content prep
> **Status:** Live — updated 2026-05-10 to reflect actual onboarding path
> **Status:** Draft — needs review from Marketing Lead and Doc Specialist before publishing
Top customer and sales-engineer questions about Phase 30 Remote Workspaces, answered in a format ready to drop into the docs site or adapt for the support team.
@@ -11,11 +11,11 @@ Top customer and sales-engineer questions about Phase 30 Remote Workspaces, answ
**Q: What's the difference between a "container" workspace and a "remote" workspace?**
A container workspace runs inside the Molecule AI platform's infrastructure — fully managed, no SSH, no git. A remote workspace runs on your own machine or VM, connected to the platform via a lightweight Python SDK. You control the environment (OS, packages, git config, SSH keys); the platform handles orchestration, authentication, and agent coordination.
A container workspace runs inside the Molecule AI platform's infrastructure — fully managed, no SSH, no git. A remote workspace runs on your own machine or VM, connected to the platform via a lightweight agent. You control the environment (OS, packages, git config, SSH keys); the platform handles orchestration, authentication, and agent coordination.
**Q: Do remote workspaces still appear in the Canvas UI?**
Yes. Remote workspaces register with the platform on startup and appear in Canvas exactly like managed workspaces — online/offline status, workspace name, current task. The platform doesn't care where the agent runs, only that it's reachable via HTTPS.
Yes. Remote workspaces register with the platform on startup and appear in Canvas exactly like managed workspaces — online/offline status, workspace name, current task. The platform doesn't care where the agent runs, only that it's reachable.
**Q: Can I run both container and remote workspaces in the same org?**
@@ -23,7 +23,7 @@ Yes — in fact that's the primary pattern. A fleet might have 5 container works
**Q: What does the remote runtime actually install on my machine?**
The `molecule-ai-sdk` Python package (~1MB, only `requests` as a dependency). The SDK wraps all Phase 30 protocol calls. Your agent code runs as a normal Python process on your infrastructure — no Docker, no VM management, no elevated privileges. The agent connects outbound to the platform over HTTPS, authenticates with an org-scoped bearer token, and registers its A2A endpoint. That's it — no VPN, no inbound firewall holes beyond outbound HTTPS.
The agent binary (~30MB) plus a minimal bootstrap script. No root required. The agent connects to `wss://[your-org].moleculesai.app`, authenticates with your org token, and registers its A2A endpoint. That's it — no VPN, no firewall holes beyond outbound HTTPS.
---
@@ -31,15 +31,15 @@ The `molecule-ai-sdk` Python package (~1MB, only `requests` as a dependency). Th
**Q: How does the platform authenticate a remote workspace?**
Remote workspaces authenticate with a workspace-scoped bearer token. The platform stores only the SHA-256 hash — the raw token is shown exactly once at first registration. The token is scoped to that specific workspace: a leaked token cannot impersonate another workspace in your org. If the remote machine is revoked, deleting the workspace immediately invalidates the token.
Remote workspaces authenticate with an org-scoped bearer token (not a personal token). The platform validates the token against the tenant and provisions a session-scoped credential for A2A communication. If the remote machine is revoked from the org, the token is invalidated and the workspace goes offline within one heartbeat cycle (~15s).
**Q: Can a remote workspace make outbound connections my firewall would block?**
The SDK only makes outbound HTTPS calls to the platform. It does not accept inbound connections. Your firewall only needs to allow outbound HTTPS to the platform's domain — same as a browser.
The agent only makes outbound HTTPS/WSS connections to the platform. It does not accept inbound connections. Your firewall only needs to allow `*.moleculesai.app` outbound — same as a browser.
**Q: What happens to data if the remote workspace is disconnected or the machine is wiped?**
Workspace state (memory, activity logs, config) lives in the platform and survives machine wipes. If the agent reconnects, it re-registers and Canvas picks up where it left off. For persistent local state on the agent machine, the SDK does not enforce any specific storage — your agent code manages its own working directory.
Workspace state lives in the platform unless explicitly persisted. For remote workspaces, you can attach a Cloudflare Artifacts repo to snapshot state to disk on your own infrastructure. If the agent reconnects, it re-registers and Canvas picks up where it left off.
**Q: Are remote workspaces covered by the same MCP governance controls as container workspaces?**
@@ -51,59 +51,26 @@ Yes. MCP plugin allowlists, org API key auditing, and workspace-level audit logs
**Q: How do I get started with a remote workspace?**
1. **Install the SDK:** `pip install molecule-ai-sdk`
2. **Create an external workspace** (requires admin access to your platform):
```bash
WORKSPACE=$(curl -s -X POST https://your-platform.example.com/workspaces \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name":"my-agent","runtime":"external","tier":2}')
WORKSPACE_ID=$(echo $WORKSPACE | jq -r '.id')
echo $WORKSPACE_ID # save this — needed by the agent
```
3. **Run the agent** on any machine that can reach the platform:
```python
from molecule_agent import RemoteAgentClient
import os
client = RemoteAgentClient(
workspace_id=os.environ["WORKSPACE_ID"],
platform_url=os.environ["PLATFORM_URL"],
agent_card={"name": "my-agent", "skills": ["research"]},
)
client.register() # issues + caches bearer token
secrets = client.pull_secrets() # fetch workspace secrets
print("Secrets:", list(secrets.keys()))
# Heartbeat loop — keeps workspace visible on Canvas
client.run_heartbeat_loop()
```
4. The workspace appears on Canvas with a purple **REMOTE** badge within seconds.
For the full protocol reference (direct HTTP, Node.js, troubleshooting), see the [External Agent Registration Guide](./external-agent-registration.md).
1. Install the agent: `curl -sSL https://get.moleculesai.app | bash`
2. Authenticate: `molecule login --org your-org`
3. Bootstrap: `molecule workspace init --name my-agent --runtime remote`
4. The workspace registers with the platform and appears in Canvas within ~10 seconds.
**Q: Can I use my existing SSH keys and git config with a remote workspace?**
Yes. The remote SDK does not virtualize or override your shell environment. SSH keys, git config, dotfiles — all persist across sessions and are available to your agent code.
Yes. The remote runtime does not virtualize or override your shell environment. SSH keys, git config, dotfiles — all persist across sessions and are available to the agent.
**Q: How do I update the remote agent when a new SDK version ships?**
**Q: How do I update the remote agent when a new version ships?**
```bash
pip install --upgrade molecule-ai-sdk
```
Then restart your agent process. Zero downtime if the agent reconnects within the heartbeat window (~30s).
`molecule update` — pulls the latest agent binary from the platform, does a rolling restart. Zero downtime if the agent reconnects within the heartbeat window.
**Q: What's the latency like for A2A coordination between a remote workspace and a container workspace?**
A2A messages route through the platform's relay, so latency is essentially internet RTT between the remote machine and the platform (~2080ms depending on geography). For comparison, container workspaces on-platform have <5ms RTT. The practical difference for most coordination patterns is imperceptible.
A2A messages route through the platform's relay, so latency is essentially internet RTT between the remote machine and the platform's edge (~2080ms depending on geography). For comparison, container workspaces on-platform have <5ms RTT. The practical difference for most coordination patterns is imperceptible.
**Q: Can I run a remote workspace on a machine that's behind NAT with no public IP?**
Yes. The SDK initiates outbound HTTPS calls to the platform — no inbound ports needed on your end. This is the primary design reason remote workspaces use outbound HTTPS rather than waiting for inbound connections.
Yes. The agent initiates the outbound WebSocket connection to the platform — no inbound ports needed. This is the primary design reason remote workspaces use WSS rather than HTTP.
---
@@ -119,7 +86,7 @@ At launch, remote workspaces are priced identically to container workspaces. Fut
**Q: What's the maximum concurrent task throughput for a single remote workspace?**
Same as a container workspace — up to 5 concurrent delegated tasks. The remote SDK adds no throughput cap.
Same as a container workspace — up to 5 concurrent delegated tasks. Remote runtime adds no throughput cap.
---
@@ -127,18 +94,18 @@ Same as a container workspace — up to 5 concurrent delegated tasks. The remote
**Q: Remote workspace shows offline in Canvas but the process is running on my machine.**
1. Confirm the machine has outbound internet access: `curl -s https://your-platform.example.com/health`
2. Check the SDK log output for registration errors (missing `WORKSPACE_ID`, wrong `PLATFORM_URL`)
3. Verify the bearer token is valid — re-register with `client.register()` to confirm
4. Check network path: `curl -v -X POST https://your-platform.example.com/registry/heartbeat` with the token
1. Check the agent log: `molecule logs --workspace my-agent`
2. Confirm the machine has outbound internet access: `curl -s https://[your-org].moleculesai.app/health`
3. Check token validity: `molecule auth status` — re-authenticate if expired
4. Restart the agent: `molecule restart --workspace my-agent`
**Q: A2A messages to my remote workspace are timing out.**
The agent must call `/registry/heartbeat` every 30 seconds to stay online. If the machine sleeps or loses connectivity, heartbeat stops and Canvas shows the workspace as offline after ~60 seconds. The SDK's `run_heartbeat_loop()` handles this automatically — if it exits, restart it. On reconnect, the agent re-registers and Canvas returns to online.
Remote workspaces must maintain the outbound WebSocket connection. If the machine sleeps or loses connectivity, the connection drops and A2A messages queue for up to 5 minutes before failing. The agent will re-register on reconnect — Canvas will show it back online.
**Q: My remote workspace is online but can't reach internal APIs.**
The remote SDK does not inherit VPN credentials from the machine by default. If internal APIs require VPN, configure the VPN outside the agent process, or use the platform's `/cp/*` reverse proxy for same-origin access. See [same-origin-canvas-fetches](./same-origin-canvas-fetches.md) for details.
The remote runtime does not inherit VPN credentials from the machine by default. If internal APIs require VPN, you'll need to either configure the VPN on the host machine outside the agent, or use the platform's `/cp/*` reverse proxy for same-origin access (same-origin-canvas-fetches.md).
---
@@ -154,4 +121,4 @@ Modal and Railway are inference platforms — they run your code on their infras
---
*Technical accuracy review: Technical Writer — 2026-05-10. Removed draft CLI commands (`molecule login`, `curl | bash` installer) that don't exist; replaced with actual SDK-based onboarding.*
*Needs review from: Marketing Lead (voice + accuracy), Doc Specialist (technical accuracy), possibly Support for the troubleshooting section.*
@@ -8,7 +8,6 @@ package handlers
// POST /admin/plugin-updates/:id/apply — apply a queued drift update
import (
"context"
"database/sql"
"errors"
"fmt"
@@ -1262,4 +1262,3 @@ func TestExecuteDelegation_CleanProxyResponse_Unchanged(t *testing.T) {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
}
@@ -112,7 +112,10 @@ func (h *PluginsHandler) WithInstanceIDLookup(lookup InstanceIDLookup) *PluginsH
// Sources returns the underlying plugin source registry. Used by main.go to
// pass the same registry to the drift sweeper so both share resolver state.
func (h *PluginsHandler) Sources() plugins.SourceResolver {
// Returns the narrow pluginSources interface so callers receive only the
// methods they need (Register, Resolve, Schemes), not the full SourceResolver
// contract with Fetch.
func (h *PluginsHandler) Sources() pluginSources {
return h.sources
}
@@ -120,7 +120,7 @@ func (h *WorkspaceHandler) resolveAgentURLForRestartSignal(ctx context.Context,
// Try Redis cache first.
agentURL, err := db.GetCachedURL(ctx, workspaceID)
if err == nil && agentURL != "" {
return rewriteForDocker(agentURL, workspaceID), nil
return h.rewriteForDocker(agentURL, workspaceID), nil
}
// Cache miss — fall back to DB.
@@ -136,13 +136,13 @@ func (h *WorkspaceHandler) resolveAgentURLForRestartSignal(ctx context.Context,
}
agentURL = *urlNullable
_ = db.CacheURL(ctx, workspaceID, agentURL)
return rewriteForDocker(agentURL, workspaceID), nil
return h.rewriteForDocker(agentURL, workspaceID), nil
}
// rewriteForDocker rewrites a 127.0.0.1 agent URL to the Docker-DNS form
// when the platform is running inside a Docker container. When platform is
// on the host (non-Docker), 127.0.0.1 IS the host and the original URL works.
func rewriteForDocker(agentURL, workspaceID string) string {
func (h *WorkspaceHandler) rewriteForDocker(agentURL, workspaceID string) string {
if platformInDocker && h.provisioner != nil {
// Only rewrite if the URL points to localhost (the ephemeral port
// binding the container published to the host). Internal Docker
@@ -97,10 +97,10 @@ func TestRewriteForDocker_LocalhostUrlRewritten(t *testing.T) {
// TestResolveAgentURLForRestartSignal_CacheHit verifies that a Redis-cached
// URL is returned without hitting the DB.
func TestResolveAgentURLForRestartSignal_CacheHit(t *testing.T) {
mockDB, mock := setupTestDB(t) // must come before setupTestRedisWithURL so db.DB is correct
_ = setupTestDB(t) // db.DB must be set before setupTestRedisWithURL
_ = setupTestRedisWithURL(t, "http://cached.internal:9000/agent")
h := newHandlerWithTestDepsWithDB(t, mockDB)
h := newHandlerWithTestDeps(t)
// Redis cache hit → DB should NOT be queried
url, err := h.resolveAgentURLForRestartSignal(context.Background(), "ws-cache-hit-123")
@@ -110,19 +110,18 @@ func TestResolveAgentURLForRestartSignal_CacheHit(t *testing.T) {
if url == "" {
t.Fatal("expected non-empty URL from cache")
}
// DB should not be queried (no rows returned to sqlmock)
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unfulfilled DB expectations: %v", err)
if url != "http://cached.internal:9000/agent" {
t.Errorf("expected cached URL, got %q", url)
}
}
// TestResolveAgentURLForRestartSignal_DBError verifies that a DB error is
// returned and propagated when neither Redis cache nor DB lookup succeeds.
func TestResolveAgentURLForRestartSignal_DBError(t *testing.T) {
mockDB, mock := setupTestDB(t) // must come before setupTestRedis so db.DB is correct
_ = setupTestRedis(t) // empty → cache miss
mock := setupTestDB(t) // must come before setupTestRedis so db.DB is correct
_ = setupTestRedis(t) // empty → cache miss
h := newHandlerWithTestDepsWithDB(t, mockDB)
h := newHandlerWithTestDeps(t)
mock.ExpectQuery(`SELECT url FROM workspaces WHERE id =`).
WithArgs("ws-db-err-789").
@@ -141,10 +140,10 @@ func TestResolveAgentURLForRestartSignal_DBError(t *testing.T) {
// TestResolveAgentURLForRestartSignal_CacheMiss verifies that on Redis miss,
// the URL is fetched from the DB and cached.
func TestResolveAgentURLForRestartSignal_CacheMiss(t *testing.T) {
mockDB, mock := setupTestDB(t) // must come before setupTestRedis so db.DB is correct
mr := setupTestRedis(t) // empty → cache miss
mock := setupTestDB(t) // must come before setupTestRedis so db.DB is correct
_ = setupTestRedis(t) // empty → cache miss
h := newHandlerWithTestDepsWithDB(t, mockDB)
h := newHandlerWithTestDeps(t)
mock.ExpectQuery(`SELECT url FROM workspaces WHERE id =`).
WithArgs("ws-cache-miss-456").
@@ -159,10 +158,12 @@ func TestResolveAgentURLForRestartSignal_CacheMiss(t *testing.T) {
t.Errorf("expected DB URL, got %q", url)
}
// Verify the URL was cached in Redis
cached, err := mr.Get(context.Background(), "ws:ws-cache-miss-456:url").Result()
// Verify the URL was cached in Redis via db.GetCachedURL.
// GetCachedURL takes workspaceID and builds the key internally, so
// pass "ws-cache-miss-456" (not the full "ws:ws-cache-miss-456:url").
cached, err := db.GetCachedURL(context.Background(), "ws-cache-miss-456")
if err != nil {
t.Fatalf("URL was not cached in Redis: %v", err)
t.Fatalf("URL cache read failed: %v", err)
}
if cached != "http://db.internal:8000/agent" {
t.Errorf("expected cached URL %q, got %q", "http://db.internal:8000/agent", cached)
@@ -175,9 +176,7 @@ func TestResolveAgentURLForRestartSignal_CacheMiss(t *testing.T) {
// TestGracefulPreRestart_Success verifies that when the workspace returns 200,
// the signal is logged as acknowledged without error.
func TestGracefulPreRestart_Success(t *testing.T) {
_ = setupTestDB(t) // must come before setupTestRedisWithURL so db.DB is correct
mr := setupTestRedisWithURL(t, "http://localhost:18000/agent")
_ = setupTestDB(t)
// httptest server simulating the workspace container's /signals/restart_pending
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
@@ -206,44 +205,40 @@ func TestGracefulPreRestart_Success(t *testing.T) {
})
}))
defer srv.Close()
mr.Set("ws:ws-ack-789:url", srv.URL, 5*time.Minute)
// Patch the handler's resolveAgentURLForRestartSignal to return the test server URL
// (avoids needing a real provisioner for this test)
h := newHandlerWithTestDeps(t)
origResolve := h.resolveAgentURLForRestartSignal
h.resolveAgentURLForRestartSignal = func(ctx context.Context, wsID string) (string, error) {
return srv.URL + "/agent", nil
// Pre-populate Redis cache with the test server URL
_ = setupTestRedisWithURL(t, srv.URL)
// Use an embedded struct to override resolveAgentURLForRestartSignal.
hWrapper := &resolveURLTestWrapper{
WorkspaceHandler: newHandlerWithTestDeps(t),
testURL: srv.URL + "/agent",
}
defer func() { h.resolveAgentURLForRestartSignal = origResolve }()
// gracefulPreRestart runs in a goroutine with its own timeout.
// We give it time to complete before the test ends.
h.gracefulPreRestart(context.Background(), "ws-ack-789")
hWrapper.gracefulPreRestart(context.Background(), "ws-ack-789")
time.Sleep(200 * time.Millisecond)
}
// TestGracefulPreRestart_NotImplemented verifies that when the workspace returns
// 404 (old SDK version), the platform proceeds gracefully (log + no error).
func TestGracefulPreRestart_NotImplemented(t *testing.T) {
_ = setupTestDB(t) // must come before setupTestRedisWithURL so db.DB is correct
mr := setupTestRedisWithURL(t, "http://localhost:18001/agent")
_ = setupTestDB(t)
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusNotFound)
}))
defer srv.Close()
mr.Set("ws:ws-noimpl-999:url", srv.URL, 5*time.Minute)
h := newHandlerWithTestDeps(t)
origResolve := h.resolveAgentURLForRestartSignal
h.resolveAgentURLForRestartSignal = func(ctx context.Context, wsID string) (string, error) {
return srv.URL + "/agent", nil
_ = setupTestRedisWithURL(t, srv.URL)
hWrapper := &resolveURLTestWrapper{
WorkspaceHandler: newHandlerWithTestDeps(t),
testURL: srv.URL + "/agent",
}
defer func() { h.resolveAgentURLForRestartSignal = origResolve }()
h.gracefulPreRestart(context.Background(), "ws-noimpl-999")
hWrapper.gracefulPreRestart(context.Background(), "ws-noimpl-999")
time.Sleep(200 * time.Millisecond)
// No panic or error expected — graceful degradation
}
@@ -251,19 +246,17 @@ func TestGracefulPreRestart_NotImplemented(t *testing.T) {
// TestGracefulPreRestart_ConnectionRefused verifies that when the workspace
// is unreachable, the platform proceeds gracefully without error.
func TestGracefulPreRestart_ConnectionRefused(t *testing.T) {
_ = setupTestDB(t) // must come before setupTestRedisWithURL so db.DB is correct
_ = setupTestDB(t)
mr := setupTestRedisWithURL(t, "http://localhost:19999/agent") // nothing listening on 19999
mr.Set("ws:ws-unreachable-000:url", "http://localhost:19999/agent", 5*time.Minute)
_ = mr
h := newHandlerWithTestDeps(t)
origResolve := h.resolveAgentURLForRestartSignal
h.resolveAgentURLForRestartSignal = func(ctx context.Context, wsID string) (string, error) {
return "http://localhost:19999/agent", nil
hWrapper := &resolveURLTestWrapper{
WorkspaceHandler: newHandlerWithTestDeps(t),
testURL: "http://localhost:19999/agent",
}
defer func() { h.resolveAgentURLForRestartSignal = origResolve }()
h.gracefulPreRestart(context.Background(), "ws-unreachable-000")
hWrapper.gracefulPreRestart(context.Background(), "ws-unreachable-000")
time.Sleep(200 * time.Millisecond)
// No panic or error expected — proceeds with stop as documented
}
@@ -274,36 +267,35 @@ func TestGracefulPreRestart_URLResolutionError(t *testing.T) {
_ = setupTestDB(t)
_ = setupTestRedis(t) // empty → URL resolution will fail in resolveAgentURLForRestartSignal
h := newHandlerWithTestDeps(t)
// Override resolveAgentURLForRestartSignal to return an error
origResolve := h.resolveAgentURLForRestartSignal
h.resolveAgentURLForRestartSignal = func(ctx context.Context, wsID string) (string, error) {
return "", context.DeadlineExceeded
hWrapper := &resolveURLTestWrapper{
WorkspaceHandler: newHandlerWithTestDeps(t),
errToReturn: context.DeadlineExceeded,
}
defer func() { h.resolveAgentURLForRestartSignal = origResolve }()
h.gracefulPreRestart(context.Background(), "ws-url-err-111")
hWrapper.gracefulPreRestart(context.Background(), "ws-url-err-111")
time.Sleep(200 * time.Millisecond)
// No panic or error expected — proceeds with stop as documented
}
// ─── helpers ─────────────────────────────────────────────────────────────────
// newHandlerWithTestDeps creates a WorkspaceHandler with test stubs.
// provisioner is nil so rewriteForDocker returns URL unchanged.
func newHandlerWithTestDeps(t *testing.T) *WorkspaceHandler {
return NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
// resolveURLTestWrapper embeds *WorkspaceHandler and overrides
// resolveAgentURLForRestartSignal so tests can inject a fixed URL or error.
type resolveURLTestWrapper struct {
*WorkspaceHandler
testURL string
errToReturn error
}
// newHandlerWithTestDepsWithDB creates a WorkspaceHandler with a specific mock DB.
// Use this when you need to control the DB mock expectations.
func newHandlerWithTestDepsWithDB(t *testing.T, mockDB *sql.DB) *WorkspaceHandler {
// We need to temporarily replace db.DB with our mock
origDB := db.DB
db.DB = mockDB
t.Cleanup(func() { db.DB = origDB })
func (w *resolveURLTestWrapper) resolveAgentURLForRestartSignal(ctx context.Context, workspaceID string) (string, error) {
if w.errToReturn != nil {
return "", w.errToReturn
}
return w.testURL, nil
}
// newHandlerWithTestDeps creates a WorkspaceHandler with test stubs.
func newHandlerWithTestDeps(t *testing.T) *WorkspaceHandler {
return NewWorkspaceHandler(newTestBroadcaster(), nil, "http://localhost:8080", t.TempDir())
}
@@ -314,7 +306,6 @@ func setupTestRedisWithURL(t *testing.T, url string) *miniredis.Miniredis {
t.Fatalf("failed to start miniredis: %v", err)
}
db.RDB = redis.NewClient(&redis.Options{Addr: mr.Addr()})
// Pre-populate a URL for the test workspace IDs used in these tests
for _, wsID := range []string{"ws-cache-hit-123", "ws-cache-miss-456", "ws-ack-789", "ws-noimpl-999", "ws-unreachable-000"} {
if err := db.CacheURL(context.Background(), wsID, url); err != nil {
t.Fatalf("failed to cache URL for %s: %v", wsID, err)
@@ -322,9 +313,4 @@ func setupTestRedisWithURL(t *testing.T, url string) *miniredis.Miniredis {
}
t.Cleanup(func() { mr.Close() })
return mr
}
// rewriteForDocker is exported from restart_signals.go so it can be tested here.
func (h *WorkspaceHandler) rewriteForDocker(agentURL, workspaceID string) string {
return rewriteForDocker(agentURL, workspaceID)
}
}
+16 -10
View File
@@ -248,6 +248,19 @@ func (h *WorkspaceHandler) Create(c *gin.Context) {
// Begin a transaction so the workspace row and any initial secrets are
// committed atomically. A secret-encrypt or DB error rolls back the
// workspace insert so we never leave a workspace row with missing secrets.
// SSRF guard: validate workspace URL before starting any DB transaction.
// registry.go:324 calls this same guard for agent self-registration;
// the admin-create path must be covered too (core#212).
// Must stay above BeginTx so the rejection path never touches the DB.
if payload.URL != "" {
if err := validateAgentURL(payload.URL); err != nil {
log.Printf("Create: workspace URL rejected: %v", err)
c.JSON(http.StatusBadRequest, gin.H{"error": "unsafe workspace URL: " + err.Error()})
return
}
}
tx, txErr := db.DB.BeginTx(ctx, nil)
if txErr != nil {
log.Printf("Create workspace: begin tx error: %v", txErr)
@@ -383,16 +396,9 @@ func (h *WorkspaceHandler) Create(c *gin.Context) {
if payload.External || payload.Runtime == "external" {
var connectionToken string
if payload.URL != "" {
// SSRF guard (issue #212): validateAgentURL blocks cloud metadata
// IPs (169.254/16), loopback, link-local, and RFC-1918 in
// strict/self-hosted mode. AdminAuth is required here, but the
// admin token could be leaked or a compromised insider — defence
// in depth. Compare: registry.go:324 (heartbeat path) also
// calls validateAgentURL; external_rotate.go should too.
if err := validateAgentURL(payload.URL); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": "unsafe workspace URL: " + err.Error()})
return
}
// URL already validated by validateAgentURL above (before BeginTx).
// Now persist it: the external URL is set after the workspace row
// commits so that a failed URL UPDATE doesn't roll back the row.
db.DB.ExecContext(ctx, `UPDATE workspaces SET url = $1, status = $2, runtime = 'external', updated_at = now() WHERE id = $3`, payload.URL, models.StatusOnline, id)
if err := db.CacheURL(ctx, id, payload.URL); err != nil {
log.Printf("External workspace: failed to cache URL for %s: %v", id, err)
@@ -537,17 +537,15 @@ func TestWorkspaceCreate_ExternalURL_SSRFSafe(t *testing.T) {
WithArgs(sqlmock.AnyArg(), "Ext Agent", nil, 3, "external", sqlmock.AnyArg(), (*string)(nil), nil, "none", (*int64)(nil), models.DefaultMaxConcurrentTasks, "push").
WillReturnResult(sqlmock.NewResult(0, 1))
mock.ExpectCommit()
// External URL update (SSRF-safe public URL passes validateAgentURL).
// External URL update (localhost is explicitly allowed by validateAgentURL).
mock.ExpectExec("UPDATE workspaces SET url").
WillReturnResult(sqlmock.NewResult(0, 1))
// CacheURL is non-fatal but still called.
mock.ExpectExec("SELECT").
WillReturnRows(sqlmock.NewRows([]string{"ok"}).AddRow("ok"))
// CacheURL is non-fatal — uses Redis (db.RDB, set by setupTestRedis), not the DB.
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
body := `{"name":"Ext Agent","runtime":"external","external":true,"url":"https://agent.example.com/a2a"}`
body := `{"name":"Ext Agent","runtime":"external","external":true,"url":"http://localhost:8000"}`
c.Request = httptest.NewRequest("POST", "/workspaces", bytes.NewBufferString(body))
c.Request.Header.Set("Content-Type", "application/json")
@@ -9,7 +9,7 @@ package plugins
// 1. SELECTs workspace_plugins rows where tracked_ref != 'none'
// AND installed_sha IS NOT NULL (skip pre-migration rows with NULL SHA).
// 2. For each row, resolves the tracked ref to its current upstream SHA
// using the appropriate SourceResolver.
// using the appropriate PluginResolver.
// 3. If the resolved SHA differs from installed_sha → drift detected.
// 4. On drift, INSERT INTO plugin_update_queue (ON CONFLICT DO NOTHING so
// a re-drift while a row is still pending is a no-op).
@@ -61,10 +61,12 @@ const DriftSweepInterval = 1 * time.Hour
// that handles Gitea instances on high-latency links.
const ResolveRefDeadline = 60 * time.Second
// SourceResolver resolves plugin sources to installable directories.
// PluginResolver resolves plugin sources to installable directories.
// Satisfied by *Registry (which wraps GithubResolver + LocalResolver).
type SourceResolver interface {
Resolve(source Source) (SourceResolver, error)
// Named PluginResolver (not SourceResolver) to avoid redeclaring the
// SourceResolver interface defined in source.go (core#228 fix).
type PluginResolver interface {
Resolve(source Source) (PluginResolver, error)
Schemes() []string
}
@@ -74,7 +76,7 @@ type SourceResolver interface {
//
// Registers itself via atexits in cmd/server/main.go so the process
// shuts down cleanly on SIGTERM.
func StartPluginDriftSweeper(ctx context.Context, resolver SourceResolver) {
func StartPluginDriftSweeper(ctx context.Context, resolver PluginResolver) {
if resolver == nil {
log.Println("Plugin drift sweeper: resolver is nil — sweeper disabled")
return
@@ -107,7 +109,7 @@ func StartPluginDriftSweeper(ctx context.Context, resolver SourceResolver) {
// sweepDriftOnce runs one full drift-detection cycle.
// Errors are non-fatal — each row is handled independently so a single
// slow row doesn't block the rest of the sweep.
func sweepDriftOnce(parent context.Context, resolver SourceResolver) {
func sweepDriftOnce(parent context.Context, resolver PluginResolver) {
ctx, cancel := context.WithTimeout(parent, 10*time.Minute)
defer cancel()
@@ -170,7 +172,7 @@ func sweepDriftOnce(parent context.Context, resolver SourceResolver) {
// resolveLatestSHA resolves the tracked ref to its current upstream SHA.
// Handles both github:// and local:// sources; local sources are skipped
// (no meaningful upstream to drift against).
func resolveLatestSHA(ctx context.Context, resolver SourceResolver, sourceRaw, trackedRef string) (string, error) {
func resolveLatestSHA(ctx context.Context, resolver PluginResolver, sourceRaw, trackedRef string) (string, error) {
// Strip the scheme prefix to get the raw spec.
// sourceRaw is stored as the full string, e.g. "github://owner/repo#tag:v1.0.0"
spec := sourceRaw
@@ -231,7 +233,7 @@ func queueDriftEntry(ctx context.Context, workspaceID, pluginName, trackedRef, c
// ─────────────────────────────────────────────────────────────────────────────
// SweepDriftOnceForTest exposes sweepDriftOnce for package-level testing.
func SweepDriftOnceForTest(parent context.Context, resolver SourceResolver) {
func SweepDriftOnceForTest(parent context.Context, resolver PluginResolver) {
sweepDriftOnce(parent, resolver)
}