molecule-core/platform/internal/handlers/discovery.go
Hongming Wang 789f568bef chore: quality pass — native dialogs, env sync, Go handler splits
Three parallel cleanups driven by the second code-review pass.

## Native dialogs → ConfirmDialog (7 sites)

Violated the standing feedback_no_native_dialogs rule.

- ChannelsTab: confirm() → ConfirmDialog danger variant with pendingDelete state
- ScheduleTab: window.confirm() → ConfirmDialog danger
- ChatTab: confirm("Restart...") → ConfirmDialog warning (restart is recoverable)
- TemplatePalette: two alert() sites collapsed into a single notice state +
  ConfirmDialog as OK-only info toast
- ErrorBoundary: dropped both window.alert calls entirely. Clipboard-copy
  click is self-evident; console.error already captures the fallback.

## .env.example ↔ Go env var sync

Added 11 previously-undocumented env vars grouped into 6 new sections:

- Platform: PLATFORM_URL, MOLECULE_URL, WORKSPACE_DIR, MOLECULE_ENV
- CORS / rate limiting: CORS_ORIGINS, RATE_LIMIT
- Activity retention: ACTIVITY_RETENTION_DAYS, ACTIVITY_CLEANUP_INTERVAL_HOURS
- Container detection: MOLECULE_IN_DOCKER (moved to dedup)
- Observability: AWARENESS_URL
- Webhooks: GITHUB_WEBHOOK_SECRET
- CLI: MOLECLI_URL

All 21 distinct os.Getenv / envx.* keys (excluding HOME) now documented.
Zero orphans in the other direction.

## Go handler function splits (4 funcs, pure refactor)

No behavior change; same tests pass.

| Function                  | Before | After | Helpers                                                       |
|---------------------------|-------:|------:|---------------------------------------------------------------|
| proxyA2ARequest           |    257 |    56 | resolveAgentURL, normalizeA2APayload, dispatchA2A,            |
|                           |        |       | handleA2ADispatchError, maybeMarkContainerDead,               |
|                           |        |       | logA2AFailure, logA2ASuccess                                  |
| Delegate                  |    127 |    60 | bindDelegateRequest, lookupIdempotentDelegation,              |
|                           |        |       | insertDelegationRow                                           |
| Discover                  |    125 |    40 | discoverWorkspacePeer, writeExternalWorkspaceURL,             |
|                           |        |       | discoverHostPeer                                              |
| SessionSearch             |    109 |    24 | parseSessionSearchParams, buildSessionSearchQuery,            |
|                           |        |       | scanSessionSearchRows                                         |

Preserved exact error semantics, log.Printf calls, status codes, and
response shapes. Introduced a proxyDispatchBuildError sentinel in
a2a_proxy so the orchestrator can distinguish "couldn't build the
request" from "Do() failed" without changing existing branches.

## Verification

- go build ./... clean
- go vet ./... clean
- go test -race ./internal/... — all pass
- canvas npm run build — clean
- canvas npm test -- --run — 352/352 pass
- grep window.confirm|window.alert|window.prompt in canvas/src — 0 matches
- every platform os.Getenv key present in .env.example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:36:30 -07:00

343 lines
12 KiB
Go

package handlers
import (
"context"
"database/sql"
"encoding/json"
"errors"
"log"
"net/http"
"strings"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/registry"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/wsauth"
"github.com/gin-gonic/gin"
)
type DiscoveryHandler struct{}
func NewDiscoveryHandler() *DiscoveryHandler {
return &DiscoveryHandler{}
}
// Discover handles GET /registry/discover/:id
func (h *DiscoveryHandler) Discover(c *gin.Context) {
targetID := c.Param("id")
callerID := c.GetHeader("X-Workspace-ID")
if callerID == "" {
c.JSON(http.StatusBadRequest, gin.H{"error": "X-Workspace-ID header is required"})
return
}
// Phase 30.6 — verify the caller's bearer token before revealing any
// peer URL. Without this, a random internet host that knows a
// workspace ID could enumerate siblings. Legacy workspaces (no
// live tokens) grandfather through the same way heartbeat does.
if err := validateDiscoveryCaller(c.Request.Context(), c, callerID); err != nil {
return // response already written
}
if !registry.CanCommunicate(callerID, targetID) {
c.JSON(http.StatusForbidden, gin.H{"error": "not authorized to discover this workspace"})
return
}
ctx := c.Request.Context()
// Workspace-to-workspace: return Docker-internal URL (containers can't
// reach host ports). External targets need their registered URL with
// 127.0.0.1/localhost rewritten to host.docker.internal when the caller
// is itself a Docker container.
if callerID != "" {
discoverWorkspacePeer(ctx, c, callerID, targetID)
return
}
discoverHostPeer(ctx, c, targetID)
}
// discoverHostPeer handles the canvas/external (no X-Workspace-ID) branch of
// Discover. It returns the host-accessible URL for `targetID`, following any
// forwarding chain (max 5 hops). Currently unreachable because Discover
// requires the X-Workspace-ID header up front, but kept to preserve the
// original code path 1:1 in case the requirement is relaxed.
func discoverHostPeer(ctx context.Context, c *gin.Context, targetID string) {
if url, err := db.GetCachedURL(ctx, targetID); err == nil {
c.JSON(http.StatusOK, gin.H{"id": targetID, "url": url})
return
}
var url sql.NullString
var status string
var forwardedTo sql.NullString
err := db.DB.QueryRowContext(ctx,
`SELECT url, status, forwarded_to FROM workspaces WHERE id = $1`, targetID,
).Scan(&url, &status, &forwardedTo)
if err == sql.ErrNoRows {
c.JSON(http.StatusNotFound, gin.H{"error": "workspace not found"})
return
}
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "lookup failed"})
return
}
// Follow forwarding chain (max 5 hops to prevent loops)
resolvedID := targetID
for i := 0; i < 5 && forwardedTo.Valid && forwardedTo.String != ""; i++ {
resolvedID = forwardedTo.String
err = db.DB.QueryRowContext(ctx,
`SELECT url, status, forwarded_to FROM workspaces WHERE id = $1`, resolvedID,
).Scan(&url, &status, &forwardedTo)
if err != nil {
break
}
}
if !url.Valid || url.String == "" {
c.JSON(http.StatusServiceUnavailable, gin.H{"error": "workspace has no URL", "status": status})
return
}
db.CacheURL(ctx, resolvedID, url.String)
c.JSON(http.StatusOK, gin.H{
"id": resolvedID,
"url": url.String,
"status": status,
})
}
// discoverWorkspacePeer handles the workspace-to-workspace branch of Discover —
// resolves an internal/Docker-routable URL for `targetID` from the perspective
// of `callerID` and writes the JSON response (or an appropriate 404/503 error).
func discoverWorkspacePeer(ctx context.Context, c *gin.Context, callerID, targetID string) {
var wsName, wsRuntime string
db.DB.QueryRowContext(ctx, `SELECT COALESCE(name,''), COALESCE(runtime,'langgraph') FROM workspaces WHERE id = $1`, targetID).Scan(&wsName, &wsRuntime)
// External workspaces: return their registered URL.
// Rewrite 127.0.0.1/localhost → host.docker.internal ONLY when the
// caller itself is a Docker container; a remote (external) caller
// lives on the other side of the wire and needs the URL as-is
// (localhost rewrites wouldn't resolve from its host anyway).
// Phase 30.6.
if wsRuntime == "external" {
if handled := writeExternalWorkspaceURL(ctx, c, callerID, targetID, wsName); handled {
return
}
}
// Try cached internal URL first
if internalURL, err := db.GetCachedInternalURL(ctx, targetID); err == nil && internalURL != "" {
c.JSON(http.StatusOK, gin.H{"id": targetID, "url": internalURL, "name": wsName})
return
}
// Fallback: only synthesize a URL if the workspace exists and is online/degraded
var wsStatus string
dbErr := db.DB.QueryRowContext(ctx,
`SELECT status FROM workspaces WHERE id = $1`, targetID,
).Scan(&wsStatus)
if dbErr == nil && (wsStatus == "online" || wsStatus == "degraded") {
internalURL := provisioner.InternalURL(targetID)
if cacheErr := db.CacheInternalURL(ctx, targetID, internalURL); cacheErr != nil {
log.Printf("Discovery: failed to cache internal URL for %s: %v", targetID, cacheErr)
}
c.JSON(http.StatusOK, gin.H{"id": targetID, "url": internalURL, "name": wsName})
return
}
// Workspace is not reachable — don't fall through to host URL path
if dbErr == nil {
c.JSON(http.StatusServiceUnavailable, gin.H{"error": "workspace not available", "status": wsStatus})
} else {
c.JSON(http.StatusNotFound, gin.H{"error": "workspace not found"})
}
}
// writeExternalWorkspaceURL resolves the registered URL for an external-runtime
// target and writes the response. Returns true when a response was written
// (URL present); returns false when the external workspace has no URL on
// file, leaving the caller to fall through to the internal-URL path.
func writeExternalWorkspaceURL(ctx context.Context, c *gin.Context, callerID, targetID, wsName string) bool {
var wsURL string
db.DB.QueryRowContext(ctx, `SELECT COALESCE(url,'') FROM workspaces WHERE id = $1`, targetID).Scan(&wsURL)
if wsURL == "" {
return false
}
outURL := wsURL
var callerRuntime string
db.DB.QueryRowContext(ctx, `SELECT COALESCE(runtime,'langgraph') FROM workspaces WHERE id = $1`, callerID).Scan(&callerRuntime)
if callerRuntime != "external" {
outURL = strings.Replace(outURL, "127.0.0.1", "host.docker.internal", 1)
outURL = strings.Replace(outURL, "localhost", "host.docker.internal", 1)
}
c.JSON(http.StatusOK, gin.H{"id": targetID, "url": outURL, "name": wsName})
return true
}
// Peers handles GET /registry/:id/peers
func (h *DiscoveryHandler) Peers(c *gin.Context) {
workspaceID := c.Param("id")
ctx := c.Request.Context()
// Phase 30.6 — the peer list leaks sibling identities and URLs.
// Require the bearer token bound to `workspaceID` before returning it.
// The caller HERE is identified by the URL path param, not a header,
// because `/registry/:id/peers` is scoped to "my own peers" — a
// workspace asking for its own view of the team.
if err := validateDiscoveryCaller(ctx, c, workspaceID); err != nil {
return // response already written
}
var parentID sql.NullString
err := db.DB.QueryRowContext(ctx, `SELECT parent_id FROM workspaces WHERE id = $1`, workspaceID).
Scan(&parentID)
if err == sql.ErrNoRows {
c.JSON(http.StatusNotFound, gin.H{"error": "workspace not found"})
return
}
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "lookup failed"})
return
}
var peers []map[string]interface{}
// Siblings
if parentID.Valid {
siblings, _ := queryPeerMaps(`
SELECT w.id, w.name, COALESCE(w.role, ''), w.tier, w.status,
COALESCE(w.agent_card, 'null'::jsonb), COALESCE(w.url, ''),
w.parent_id, w.active_tasks
FROM workspaces w WHERE w.parent_id = $1 AND w.id != $2 AND w.status != 'removed'`,
parentID.String, workspaceID)
peers = append(peers, siblings...)
} else {
siblings, _ := queryPeerMaps(`
SELECT w.id, w.name, COALESCE(w.role, ''), w.tier, w.status,
COALESCE(w.agent_card, 'null'::jsonb), COALESCE(w.url, ''),
w.parent_id, w.active_tasks
FROM workspaces w WHERE w.parent_id IS NULL AND w.id != $1 AND w.status != 'removed'`,
workspaceID)
peers = append(peers, siblings...)
}
// Children
children, _ := queryPeerMaps(`
SELECT w.id, w.name, COALESCE(w.role, ''), w.tier, w.status,
COALESCE(w.agent_card, 'null'::jsonb), COALESCE(w.url, ''),
w.parent_id, w.active_tasks
FROM workspaces w WHERE w.parent_id = $1 AND w.status != 'removed'`, workspaceID)
peers = append(peers, children...)
// Parent
if parentID.Valid {
parent, _ := queryPeerMaps(`
SELECT w.id, w.name, COALESCE(w.role, ''), w.tier, w.status,
COALESCE(w.agent_card, 'null'::jsonb), COALESCE(w.url, ''),
w.parent_id, w.active_tasks
FROM workspaces w WHERE w.id = $1 AND w.status != 'removed'`, parentID.String)
peers = append(peers, parent...)
}
if peers == nil {
peers = make([]map[string]interface{}, 0)
}
c.JSON(http.StatusOK, peers)
}
// queryPeerMaps returns clean JSON-serializable maps instead of Workspace structs.
func queryPeerMaps(query string, args ...interface{}) ([]map[string]interface{}, error) {
rows, err := db.DB.Query(query, args...)
if err != nil {
log.Printf("queryPeerMaps error: %v", err)
return nil, err
}
defer rows.Close()
var result []map[string]interface{}
for rows.Next() {
var id, name, role, status, url string
var tier, activeTasks int
var parentID *string
var agentCard []byte
err := rows.Scan(&id, &name, &role, &tier, &status, &agentCard, &url, &parentID, &activeTasks)
if err != nil {
log.Printf("queryPeerMaps scan error: %v", err)
continue
}
peer := map[string]interface{}{
"id": id,
"name": name,
"tier": tier,
"status": status,
"url": url,
"parent_id": parentID,
"active_tasks": activeTasks,
}
if role != "" {
peer["role"] = role
} else {
peer["role"] = nil
}
if len(agentCard) > 0 && string(agentCard) != "null" {
peer["agent_card"] = json.RawMessage(agentCard)
} else {
peer["agent_card"] = nil
}
result = append(result, peer)
}
return result, nil
}
// CheckAccess handles POST /registry/check-access
func (h *DiscoveryHandler) CheckAccess(c *gin.Context) {
var payload struct {
CallerID string `json:"caller_id" binding:"required"`
TargetID string `json:"target_id" binding:"required"`
}
if err := c.ShouldBindJSON(&payload); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
allowed := registry.CanCommunicate(payload.CallerID, payload.TargetID)
c.JSON(http.StatusOK, gin.H{"allowed": allowed})
}
// validateDiscoveryCaller enforces the Phase 30.6 bearer-token contract
// on the discovery endpoints. Same lazy-bootstrap shape as the registry
// and secrets handlers: legacy workspaces with no tokens are grandfathered,
// workspaces with tokens must present a matching Bearer, token binding
// is strict (A's token cannot authenticate caller B).
//
// Fail-open on DB hiccups. Unlike secrets.Values (which returns plaintext
// secrets and must fail closed), discovery only exposes peer URLs that
// are already behind the existing `CanCommunicate` hierarchy check — a
// momentary DB outage shouldn't take agent-to-agent discovery offline.
func validateDiscoveryCaller(ctx context.Context, c *gin.Context, workspaceID string) error {
hasLive, err := wsauth.HasAnyLiveToken(ctx, db.DB, workspaceID)
if err != nil {
log.Printf("wsauth: discovery HasAnyLiveToken(%s) failed: %v — allowing request", workspaceID, err)
return nil
}
if !hasLive {
return nil // legacy / pre-upgrade
}
tok := wsauth.BearerTokenFromHeader(c.GetHeader("Authorization"))
if tok == "" {
c.JSON(http.StatusUnauthorized, gin.H{"error": "missing workspace auth token"})
return errors.New("missing token")
}
if err := wsauth.ValidateToken(ctx, db.DB, workspaceID, tok); err != nil {
c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid workspace auth token"})
return err
}
return nil
}