molecule-core/workspace-server/cmd/server/dotenv.go
Hongming Wang 9a223afba1 fix(dotenv,socket): review-driven hardening of .env loader + WS poll
Independent code review surfaced three required fixes and one cheap
optional one. All addressed here.

dotenv parser:
- `export FOO=bar` was parsed as key `"export FOO"` (with embedded
  space) and silently os.Setenv'd, so a developer pasting from a
  direnv `.envrc` would get junk vars. Now strips the prefix.
- Quoted values weren't unwrapped: `FOO="hello world"` produced value
  `"hello world"` with literal quotes. Now strips one matched pair of
  surrounding `"` or `'`. Inside a quoted value `#` is part of the
  value, not a comment marker (matches godotenv convention).
- UTF-8 BOM at file start (Windows editors) would have produced a
  first key like U+FEFF + "FOO". Now stripped via TrimPrefix.
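
The three parser fixes above are easier to see against a splitter that lacks them. The sketch below uses `naiveSplit`, a hypothetical stand-in written for this illustration (it is not the project's earlier code), which only splits on the first `=` and trims whitespace. Under that assumption it reproduces each symptom: a key with an embedded space, a value that keeps its quotes, and a key that starts with the BOM.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveSplit is a hypothetical parser without the fixes: it splits on
// the first '=' and trims whitespace, nothing more.
func naiveSplit(line string) (key, value string, ok bool) {
	k, v, found := strings.Cut(strings.TrimSpace(line), "=")
	if !found || k == "" {
		return "", "", false
	}
	return strings.TrimSpace(k), strings.TrimSpace(v), true
}

func main() {
	lines := []string{
		"export FOO=bar",    // key keeps the `export ` prefix
		`FOO="hello world"`, // value keeps its literal quotes
		"\ufeffFOO=bar",     // key keeps the leading BOM (TrimSpace does not remove U+FEFF)
	}
	for _, line := range lines {
		k, v, _ := naiveSplit(line)
		fmt.Printf("key=%q value=%q\n", k, v)
	}
}
```

Each of these inputs is what the corresponding fix in `parseDotEnvLine` normalizes before the key and value reach `os.Setenv`.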

dotenv loader:
- findDotEnv()'s upward walk would happily pick up `~/.env` or a
  sibling-repo `.env` if the binary was run from
  `~/Documents/other-project/`. Real foot-gun on shared dev boxes. Now
  gated on a monorepo sentinel: the candidate directory must contain
  `workspace-server/go.mod`. Falls through to "no .env found" (=
  pre-fix behavior) when the sentinel is absent.
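
The sentinel gate can be tried in isolation. This is a minimal sketch, assuming the check is exactly "`workspace-server/go.mod` exists under the candidate directory"; `hasSentinel` and `makeTree` are hypothetical names introduced for the example, and the module path written into the temporary go.mod is a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hasSentinel reports whether dir contains workspace-server/go.mod as a
// non-directory entry. Hypothetical name for this sketch.
func hasSentinel(dir string) bool {
	st, err := os.Stat(filepath.Join(dir, "workspace-server", "go.mod"))
	return err == nil && !st.IsDir()
}

// makeTree is a hypothetical helper: it creates a temp directory holding
// a .env and, optionally, workspace-server/go.mod. The returned cleanup
// function removes the whole tree.
func makeTree(withSentinel bool) (root string, cleanup func(), err error) {
	root, err = os.MkdirTemp("", "dotenv-sentinel-")
	if err != nil {
		return "", func() {}, err
	}
	cleanup = func() { os.RemoveAll(root) }
	if err = os.WriteFile(filepath.Join(root, ".env"), []byte("FOO=bar\n"), 0o600); err != nil {
		return root, cleanup, err
	}
	if withSentinel {
		mod := filepath.Join(root, "workspace-server")
		if err = os.MkdirAll(mod, 0o755); err != nil {
			return root, cleanup, err
		}
		// Placeholder module path; only the file's existence matters here.
		if err = os.WriteFile(filepath.Join(mod, "go.mod"), []byte("module example.test/ws\n"), 0o644); err != nil {
			return root, cleanup, err
		}
	}
	return root, cleanup, nil
}

func main() {
	for _, withSentinel := range []bool{false, true} {
		root, cleanup, err := makeTree(withSentinel)
		if err != nil {
			cleanup()
			fmt.Println("setup failed:", err)
			return
		}
		fmt.Printf("sentinel=%v accepted=%v\n", withSentinel, hasSentinel(root))
		cleanup()
	}
}
```

A directory that holds only a `.env` is rejected, which is the case the sentinel-rejection regression test is described as covering.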

socket fallback poll:
- startFallbackPoll() previously fired only on onclose, so the very
  first connect attempt — when onclose hasn't fired yet because we
  never had a successful onopen — left the canvas with no HTTP poll
  for the duration of the failing handshake (Chrome can hold a
  SYN-SENT WebSocket open ~75s before giving up). Now also called at
  the top of connect(); the timer-already-running guard makes it a
  no-op when one cycle later onclose calls it again.

Test coverage added: export prefix, single+double quoted values, hash
inside quotes preserved, unterminated quote falls back to bare value,
CRLF stripping locked in, BOM stripping, and a sentinel-rejection
regression test that creates a temp .env with no workspace-server
sibling and asserts findDotEnv refuses to load it.
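
The parser cases listed above can be written as a table. The sketch below is an illustration only: `parseLine` is a condensed restatement of the parsing rules, not the file's exact code, and the case names and inputs are assumptions rather than the project's actual subtests.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine is a condensed restatement of the rules, for this example only.
func parseLine(line string) (key, value string, ok bool) {
	line = strings.TrimSpace(strings.TrimPrefix(line, "\ufeff"))
	if line == "" || strings.HasPrefix(line, "#") {
		return "", "", false
	}
	line = strings.TrimLeft(strings.TrimPrefix(line, "export "), " \t")
	eq := strings.IndexByte(line, '=')
	if eq <= 0 {
		return "", "", false
	}
	key = strings.TrimSpace(line[:eq])
	value = strings.TrimLeft(line[eq+1:], " \t")
	// One matched pair of surrounding quotes: contents are taken verbatim.
	if n := len(value); n >= 2 && (value[0] == '"' || value[0] == '\'') && value[n-1] == value[0] {
		return key, value[1 : n-1], true
	}
	// Bare value: whitespace followed by '#' starts an inline comment.
	for i := 1; i < len(value); i++ {
		if value[i] == '#' && (value[i-1] == ' ' || value[i-1] == '\t') {
			value = value[:i-1]
			break
		}
	}
	return key, strings.TrimSpace(value), true
}

// cases holds illustrative inputs, one per item in the coverage list.
var cases = []struct{ name, line, key, value string }{
	{"export prefix", "export FOO=bar", "FOO", "bar"},
	{"double quotes", `FOO="hello world"`, "FOO", "hello world"},
	{"single quotes", `FOO='hello world'`, "FOO", "hello world"},
	{"hash inside quotes", `FOO="a # b"`, "FOO", "a # b"},
	{"unterminated quote", `FOO="abc`, "FOO", `"abc`},
	{"CRLF", "FOO=bar\r", "FOO", "bar"},
	{"BOM", "\ufeffFOO=bar", "FOO", "bar"},
}

func main() {
	for _, c := range cases {
		k, v, ok := parseLine(c.line)
		status := "ok"
		if !ok || k != c.key || v != c.value {
			status = "MISMATCH"
		}
		fmt.Printf("%-20s %s (key=%q value=%q)\n", c.name, status, k, v)
	}
}
```

Note the unterminated-quote row: falling back to the bare-value path means the opening quote stays in the value.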

Verified: 985 canvas tests + 30 dotenv subtests + 4 dotenv integration
tests all pass; tsc clean; the platform, rebuilt from the monorepo root
and started with a stripped env, still loads .env (49 vars) and
/workspaces returns 200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:09:18 -07:00

package main

import (
	"bufio"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// loadDotEnvIfPresent walks upward from CWD looking for a .env file and
// merges its KEY=VALUE pairs into the process environment. Already-set
// vars (e.g. from `docker run -e`, CI exports, or ad-hoc `KEY=val
// ./binary`) win over file values so operators can override without
// editing the file.
//
// Why walk upward: the binary may be launched from the monorepo root,
// the workspace-server subdir, or anywhere else the operator finds
// convenient. Walking upward from CWD finds the canonical .env
// (gitignored, lives at the monorepo root) regardless of cwd, so a
// fresh `go build -o /tmp/molecule-server ./cmd/server && /tmp/molecule-server`
// from any subdir picks up the same MOLECULE_ENV / DATABASE_URL / etc.
// the operator already has — without sourcing or `set -a`.
//
// Why no godotenv dep: the format we use is simple (KEY=VALUE lines with
// an optional `export ` prefix, optional quotes, `#` comments, and no
// interpolation), so a tiny in-tree parser is auditable, has no
// supply-chain surface, and avoids drift across repos where some teams
// configure godotenv differently.
//
// Why it's safe in production: the Dockerfile does not COPY .env into
// the image and `.env` is gitignored, so production containers have no
// .env on disk to load. If an operator goes out of their way to put one
// there, the explicit-env-wins rule above means container env still
// dominates.
func loadDotEnvIfPresent() {
	path, ok := findDotEnv()
	if !ok {
		return
	}
	f, err := os.Open(path)
	if err != nil {
		log.Printf(".env: open %s: %v (skipping)", path, err)
		return
	}
	defer f.Close()

	loaded := 0
	skipped := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		k, v, ok := parseDotEnvLine(scanner.Text())
		if !ok {
			continue
		}
		if _, exists := os.LookupEnv(k); exists {
			skipped++
			continue
		}
		if err := os.Setenv(k, v); err != nil {
			log.Printf(".env: set %s: %v", k, err)
			continue
		}
		loaded++
	}
	if err := scanner.Err(); err != nil {
		log.Printf(".env: scan %s: %v", path, err)
	}
	log.Printf(".env: %s \u2014 loaded %d, %d already set in env", path, loaded, skipped)
}

// findDotEnv walks upward from CWD and returns the path of the nearest
// .env file that passes the sentinel gate. The walk is capped at 6
// levels (CWD plus five ancestors) so a deeply-nested launch dir doesn't
// scan the entire filesystem.
//
// Sentinel gate: only accept a .env whose directory also contains
// `workspace-server/go.mod` (the monorepo marker; see isMonorepoRoot).
// Without it, a developer running the binary from
// `~/Documents/other-project/` would walk up to `~/.env` and load
// arbitrary variables: a real foot-gun on shared dev machines and a
// possible information-leak vector on bare-metal deploys. When no
// candidate passes the gate, the result is "no .env found", which is
// identical to the pre-fix behavior (the operator must export env
// explicitly).
func findDotEnv() (string, bool) {
	dir, err := os.Getwd()
	if err != nil {
		return "", false
	}
	for i := 0; i < 6; i++ {
		p := filepath.Join(dir, ".env")
		if st, err := os.Stat(p); err == nil && !st.IsDir() {
			if isMonorepoRoot(dir) {
				return p, true
			}
			// .env exists here but the directory isn't the monorepo
			// root, so keep walking. Loading it could pollute the
			// environment with values from an unrelated project.
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			break
		}
		dir = parent
	}
	return "", false
}

// isMonorepoRoot returns true if `dir` looks like the molecule-core
// monorepo root, the directory that owns the .env we want to load.
// The marker is `workspace-server/go.mod`, which is the canonical
// in-tree go module and exists only in this monorepo. A bare
// `workspace-server/` directory check would false-positive on any
// unrelated tree that happens to contain a directory with that name;
// requiring the go.mod inside it is more precise.
func isMonorepoRoot(dir string) bool {
	st, err := os.Stat(filepath.Join(dir, "workspace-server", "go.mod"))
	return err == nil && !st.IsDir()
}

// parseDotEnvLine parses a single .env line. Returns (key, value, true)
// for KEY=VALUE pairs. Returns (_, _, false) for blanks, comments, and
// malformed lines. Handles:
//   - leading `export ` prefix (so shell-friendly .env files written
//     for `source .env` or direnv work without modification)
//   - leading UTF-8 BOM on the first line (Windows editors)
//   - inline `# comment` after a value when preceded by whitespace
//   - surrounding `"` or `'` quotes on the value (stripped one matched
//     pair); inside a quoted value, `#` is part of the value, not a
//     comment marker
func parseDotEnvLine(line string) (string, string, bool) {
	// Strip a UTF-8 BOM if present. bufio.Scanner doesn't filter it,
	// so the very first line of a Windows-edited .env would otherwise
	// produce a key like U+FEFF + "FOO" that os.Setenv silently accepts.
	line = strings.TrimPrefix(line, "\ufeff")
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return "", "", false
	}
	// Drop a leading `export ` so lines like `export FOO=bar` (the
	// form direnv and many `.env` templates emit) don't end up as a
	// junk key with an embedded space.
	line = strings.TrimPrefix(line, "export ")
	line = strings.TrimLeft(line, " \t") // re-trim in case `export ` was followed by extra whitespace
	eq := strings.IndexByte(line, '=')
	if eq <= 0 {
		return "", "", false
	}
	k := strings.TrimSpace(line[:eq])
	v := line[eq+1:]
	// Quoted value: strip one matched pair of surrounding quotes and
	// take the contents verbatim (no inline-comment splitting). Matches
	// the godotenv convention so values with leading/trailing spaces or
	// `#` survive round-trip.
	v = strings.TrimLeft(v, " \t")
	if len(v) >= 2 {
		first := v[0]
		if (first == '"' || first == '\'') && v[len(v)-1] == first {
			return k, v[1 : len(v)-1], true
		}
	}
	// Bare value: strip inline comment introduced by whitespace + `#`.
	// A bare `#` inside the value (no preceding whitespace) is part of
	// the value; this matches dotenv parsers and lets
	// `KEY=token#fragment` round-trip. Cut at the earliest separator so
	// a tab-introduced comment that precedes a later ` #` is not left in
	// the value.
	cut := -1
	for _, sep := range []string{" #", "\t#"} {
		if i := strings.Index(v, sep); i >= 0 && (cut < 0 || i < cut) {
			cut = i
		}
	}
	if cut >= 0 {
		v = v[:cut]
	}
	return k, strings.TrimSpace(v), true
}
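
The explicit-env-wins rule in `loadDotEnvIfPresent` has two consequences worth seeing in isolation: a variable exported by the shell is never overwritten, and a key repeated later in the file is counted as "already set" because its first occurrence has been applied by then. The sketch below is a simplified model, not the loader itself; `mergeEnv`, `demo`, and the `DEMO_DOTENV_*` variable names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// mergeEnv applies pairs in order and skips any key already present in
// the process environment. Hypothetical helper for this sketch.
func mergeEnv(pairs [][2]string) (loaded, skipped int) {
	for _, kv := range pairs {
		if _, exists := os.LookupEnv(kv[0]); exists {
			skipped++
			continue
		}
		if err := os.Setenv(kv[0], kv[1]); err != nil {
			continue
		}
		loaded++
	}
	return loaded, skipped
}

// demo seeds one variable as if it came from the shell, merges three
// file-style pairs, and reports the counters and the resulting values.
func demo() (loaded, skipped int, preset, fresh string) {
	os.Setenv("DEMO_DOTENV_PRESET", "from-shell")
	os.Unsetenv("DEMO_DOTENV_NEW")
	loaded, skipped = mergeEnv([][2]string{
		{"DEMO_DOTENV_PRESET", "from-file"}, // already set by the "shell": skipped
		{"DEMO_DOTENV_NEW", "first"},        // not set yet: loaded
		{"DEMO_DOTENV_NEW", "second"},       // duplicate key: skipped, first value stays
	})
	return loaded, skipped, os.Getenv("DEMO_DOTENV_PRESET"), os.Getenv("DEMO_DOTENV_NEW")
}

func main() {
	loaded, skipped, preset, fresh := demo()
	// Prints: loaded=1 skipped=2 PRESET="from-shell" NEW="first"
	fmt.Printf("loaded=%d skipped=%d PRESET=%q NEW=%q\n", loaded, skipped, preset, fresh)
}
```

Because the check uses `os.LookupEnv` rather than comparing against the empty string, a variable that is exported but empty also counts as set and wins over the file value.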