eaade616c5
CI / Shellcheck (E2E scripts) (pull_request) Blocked by required conditions
CI / Canvas Deploy Reminder (pull_request) Blocked by required conditions
CI / Python Lint & Test (pull_request) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (pull_request) Blocked by required conditions
Harness Replays / Harness Replays (pull_request) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 44s
CI / Detect changes (pull_request) Successful in 56s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m7s
E2E Chat / detect-changes (pull_request) Successful in 1m4s
Harness Replays / detect-changes (pull_request) Successful in 33s
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 49s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m24s
gate-check-v3 / gate-check (pull_request) Successful in 22s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m11s
qa-review / approved (pull_request) Successful in 46s
security-review / approved (pull_request) Successful in 41s
sop-checklist / all-items-acked (pull_request) Successful in 38s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 2m0s
sop-tier-check / tier-check (pull_request) Successful in 38s
E2E Chat / E2E Chat (pull_request) Failing after 57s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 14s
CI / Canvas (Next.js) (pull_request) Successful in 22m0s
CI / Platform (Go) (pull_request) Failing after 22m16s
CI / all-required (pull_request) SOP-13 override — molecule-core#1264 (repo-wide internal/handlers parallel-load flake). Diff verified clean locally.
Phase 2a of the Files API roots RFC. Today, the same credential-shape
regex set lives as a duplicated bash array in two unrelated places:
- .gitea/workflows/secret-scan.yml SECRET_PATTERNS
- molecule-ai-workspace-runtime molecule_runtime/scripts/pre-commit-checks.sh
Adding a pattern requires editing both, and drift is caught only via
secret-scan workflow failures on unrelated PRs (#2090-class vector).
This commit centralises the regex set into a new Go package
workspace-server/internal/secrets — pure-Go SSOT, exposing:
- Patterns: []Pattern slice (Name + Description + regex source)
- ScanBytes(b []byte) (*Match, error)
- ScanString(s string) (*Match, error)
- Match{Name, Description} — deliberately NOT including matched bytes
13 pattern families covered (GitHub PAT classic + 5 OAuth shapes +
fine-grained, Anthropic, OpenAI project/svcacct, MiniMax, Slack 5
variants, AWS access key + STS temp).
Phase 2b (docker-exec backend) will import secrets.ScanBytes to gate
listFilesViaDockerExec / readFileViaDockerExec against both
secret-shaped paths AND content. Today this package has one consumer
— its own unit tests — which is fine because Phase 2a is pure
extraction; the YAML + bash arrays still hold the runtime contract
until 2b lands.
Tests:
- TestEveryPatternCompiles: pins all regex strings parse as RE2
- TestNoDuplicateNames: prevents accidental shadowing
- TestKnownPatternsAllPresent: pins the public set so a rename in
one consumer doesn't silently widen the leak surface
- TestPositiveMatches: table-driven, one fixture per pattern
- TestNegativeShapes: too-short / wrong-prefix / prose / empty
- TestScanString_NoOp: pins the zero-copy wrapper contract
- TestMatch_NoRoundtrip: pins that Match doesn't carry secret bytes
Refs internal#425.
227 lines
7.9 KiB
Go
227 lines
7.9 KiB
Go
// Package secrets provides the canonical SSOT for credential-shaped
|
|
// regex patterns used by:
|
|
//
|
|
// - the CI `Secret scan` workflow (.gitea/workflows/secret-scan.yml)
|
|
// - the runtime's bundled pre-commit hook
|
|
// (molecule-ai-workspace-runtime/molecule_runtime/scripts/pre-commit-checks.sh)
|
|
// - the upcoming Phase 2b docker-exec Files API backend, which has
|
|
// to refuse to surface files whose path OR content matches a
|
|
// credential shape (RFC internal#425, Hongming 2026-05-15)
|
|
//
|
|
// Before this package, the same regex set lived as duplicate bash
|
|
// arrays in two unrelated repos; adding a pattern required editing
|
|
// both, and pattern drift was caught only via secret-scan workflow
|
|
// failures on PRs that had unrelated changes (#2090-class incident
|
|
// vector). Centralising in Go makes the Files API the SSOT, with the
|
|
// YAML + bash arrays generated/asserted from this package so drift
|
|
// is detected at CI time, not at exfiltration time.
|
|
//
|
|
// This file is Phase 2a of the internal#425 RFC. Phase 2b will import
|
|
// `Patterns` from `template_files_docker_exec.go` to gate
|
|
// `listFilesViaDockerExec` / `readFileViaDockerExec` against
|
|
// secret-shaped paths AND content. Until 2b lands, the package has
|
|
// one consumer: this package's own unit tests, which pin the regex
|
|
// strings so a refactor that drops or weakens one is caught here.
|
|
package secrets
|
|
|
|
import (
|
|
"fmt"
|
|
"regexp"
|
|
"sync"
|
|
)
|
|
|
|
// Pattern is one named credential shape — a human label plus the
|
|
// compiled regex. The label appears in CI error output ("matched:
|
|
// github-pat") so an operator can identify the family without seeing
|
|
// the actual matched bytes (echoing the bytes widens the blast radius
|
|
// per the secret-scan workflow's recovery prose).
|
|
type Pattern struct {
|
|
// Name is a short kebab-case identifier (e.g. "github-pat",
|
|
// "anthropic-api-key"). Stable across versions — consumers may
|
|
// switch on it.
|
|
Name string
|
|
// Description is a one-line human-readable explanation of what
|
|
// the pattern matches. Used in CI error messages and the Files
|
|
// API "<denied: secret-shape>" placeholder rationale.
|
|
Description string
|
|
// regexSource is the regex literal in Go-RE2 syntax. Stored as a
|
|
// string so the slice declaration below stays readable; compiled
|
|
// once via sync.Once into a *regexp.Regexp.
|
|
regexSource string
|
|
}
|
|
|
|
// Patterns is the canonical credential-shape regex set.
|
|
//
|
|
// Adding a pattern here:
|
|
//
|
|
// 1. Add a new Pattern{} entry below with a kebab-case Name, a
|
|
// one-line Description, and the regex literal. Anchor on a
|
|
// low-false-positive prefix.
|
|
// 2. Add a positive + negative test case in patterns_test.go.
|
|
// 3. Mirror the regex string into:
|
|
// a. .gitea/workflows/secret-scan.yml SECRET_PATTERNS array
|
|
// b. molecule-ai-workspace-runtime/molecule_runtime/scripts/pre-commit-checks.sh
|
|
// (or wait for the codegen target that consumes this slice — TBD
|
|
// follow-up; tracked in the Phase 2a PR description.)
|
|
//
|
|
// The order is: alphabetical within each provider family, families
|
|
// grouped by ecosystem (GitHub family, AI-provider family, chat
|
|
// family, cloud family). Keep this stable so diffs are reviewable.
|
|
var Patterns = []Pattern{
|
|
// --- GitHub token family ---
|
|
{
|
|
Name: "github-pat-classic",
|
|
Description: "GitHub personal access token (classic)",
|
|
regexSource: `ghp_[A-Za-z0-9]{36,}`,
|
|
},
|
|
{
|
|
Name: "github-app-installation-token",
|
|
Description: "GitHub App installation token (#2090 vector)",
|
|
regexSource: `ghs_[A-Za-z0-9]{36,}`,
|
|
},
|
|
{
|
|
Name: "github-oauth-user-to-server",
|
|
Description: "GitHub OAuth user-to-server token",
|
|
regexSource: `gho_[A-Za-z0-9]{36,}`,
|
|
},
|
|
{
|
|
Name: "github-oauth-user",
|
|
Description: "GitHub OAuth user token",
|
|
regexSource: `ghu_[A-Za-z0-9]{36,}`,
|
|
},
|
|
{
|
|
Name: "github-oauth-refresh",
|
|
Description: "GitHub OAuth refresh token",
|
|
regexSource: `ghr_[A-Za-z0-9]{36,}`,
|
|
},
|
|
{
|
|
Name: "github-pat-fine-grained",
|
|
Description: "GitHub fine-grained personal access token",
|
|
regexSource: `github_pat_[A-Za-z0-9_]{82,}`,
|
|
},
|
|
|
|
// --- AI-provider API key family ---
|
|
{
|
|
Name: "anthropic-api-key",
|
|
Description: "Anthropic API key",
|
|
regexSource: `sk-ant-[A-Za-z0-9_-]{40,}`,
|
|
},
|
|
{
|
|
Name: "openai-project-key",
|
|
Description: "OpenAI project API key",
|
|
regexSource: `sk-proj-[A-Za-z0-9_-]{40,}`,
|
|
},
|
|
{
|
|
Name: "openai-service-account-key",
|
|
Description: "OpenAI service-account API key",
|
|
regexSource: `sk-svcacct-[A-Za-z0-9_-]{40,}`,
|
|
},
|
|
{
|
|
Name: "minimax-api-key",
|
|
Description: "MiniMax API key (F1088 vector)",
|
|
regexSource: `sk-cp-[A-Za-z0-9_-]{60,}`,
|
|
},
|
|
|
|
// --- Chat-platform token family ---
|
|
{
|
|
Name: "slack-token",
|
|
Description: "Slack token (xoxb/xoxa/xoxp/xoxr/xoxs)",
|
|
regexSource: `xox[baprs]-[A-Za-z0-9-]{20,}`,
|
|
},
|
|
|
|
// --- Cloud-provider credential family ---
|
|
{
|
|
Name: "aws-access-key-id",
|
|
Description: "AWS access key ID",
|
|
regexSource: `AKIA[0-9A-Z]{16}`,
|
|
},
|
|
{
|
|
Name: "aws-sts-temp-access-key-id",
|
|
Description: "AWS STS temporary access key ID",
|
|
regexSource: `ASIA[0-9A-Z]{16}`,
|
|
},
|
|
}
|
|
|
|
// compiledOnce protects the lazy build of compiledPatterns. We compile
|
|
// lazily so package init is cheap; callers pay only on first match
|
|
// (typically once per workspace-server boot).
|
|
var (
|
|
compiledOnce sync.Once
|
|
compiledPatterns []*compiledPattern
|
|
compileErr error
|
|
)
|
|
|
|
type compiledPattern struct {
|
|
Name string
|
|
Description string
|
|
Re *regexp.Regexp
|
|
}
|
|
|
|
// compileAll compiles every Pattern.regexSource into a *regexp.Regexp.
|
|
// Called once via compiledOnce. Any compile failure here is a build
|
|
// bug (the unit tests assert each regex compiles) — surfacing via
|
|
// returned error so callers don't panic in request handling.
|
|
func compileAll() {
|
|
out := make([]*compiledPattern, 0, len(Patterns))
|
|
for _, p := range Patterns {
|
|
re, err := regexp.Compile(p.regexSource)
|
|
if err != nil {
|
|
compileErr = fmt.Errorf("secrets: pattern %q failed to compile: %w", p.Name, err)
|
|
return
|
|
}
|
|
out = append(out, &compiledPattern{Name: p.Name, Description: p.Description, Re: re})
|
|
}
|
|
compiledPatterns = out
|
|
}
|
|
|
|
// ScanBytes returns a non-nil Match if any pattern matches anywhere
|
|
// inside b. Returns (nil, nil) on no match. Returns (nil, err) only
|
|
// if a regex in the package fails to compile — that's a build bug,
|
|
// not a runtime data issue.
|
|
//
|
|
// Match contains the pattern Name + Description so the caller can
|
|
// emit a path-or-content-denial rationale WITHOUT round-tripping the
|
|
// matched bytes (which would defeat the purpose). The matched bytes
|
|
// stay inside this function.
|
|
//
|
|
// The Files API Phase 2b backend will call ScanBytes on:
|
|
//
|
|
// - the absolute path string (catches a file literally named
|
|
// `ghs_abc.txt`)
|
|
// - the file content (catches a credential pasted into a workspace
|
|
// file by an agent or user — the Files API refuses to surface it
|
|
// and the canvas renders "<denied: secret-shape>")
|
|
//
|
|
// Ordering: patterns are tried in declaration order. First match
|
|
// wins. This means narrower patterns (e.g. `sk-svcacct-…`) should
|
|
// appear in `Patterns` before broader ones (`sk-…`) — today there's
|
|
// no overlap, so order is descriptive only.
|
|
func ScanBytes(b []byte) (*Match, error) {
|
|
compiledOnce.Do(compileAll)
|
|
if compileErr != nil {
|
|
return nil, compileErr
|
|
}
|
|
for _, cp := range compiledPatterns {
|
|
if cp.Re.Match(b) {
|
|
return &Match{Name: cp.Name, Description: cp.Description}, nil
|
|
}
|
|
}
|
|
return nil, nil
|
|
}
|
|
|
|
// ScanString is the string-input convenience wrapper around ScanBytes.
|
|
// Identical semantics — the body never copies, []byte(s) is a
|
|
// zero-copy reinterpret for the regex matcher.
|
|
func ScanString(s string) (*Match, error) {
|
|
return ScanBytes([]byte(s))
|
|
}
|
|
|
|
// Match describes which pattern caught a value. Deliberately does
|
|
// NOT include the matched substring — callers must not echo it.
|
|
type Match struct {
|
|
// Name is the pattern's kebab-case identifier (e.g. "github-pat-classic").
|
|
Name string
|
|
// Description is the human-readable line for UI / log surfaces.
|
|
Description string
|
|
}
|