Hongming Wang 8417bce50d Memory v2 PR-10: operator docs for writing a custom memory plugin
Builds on merged PR-1..7 (PR-8 in queue). Pure docs; no code.

What ships:
  * docs/memory-plugins/README.md — contract overview, capability
    negotiation, deployment models, replacement workflow
  * docs/memory-plugins/testing-your-plugin.md — using the contract
    test harness to validate wire compatibility, what the harness
    DOES NOT cover (capability accuracy, TTL eviction, concurrency)
  * docs/memory-plugins/pinecone-example/README.md — worked example
    of a Pinecone-backed plugin: capability mapping (only embedding,
    no FTS), wire mapping (memory → vector + metadata), production-
    hardening checklist

Documentation strategy:
  * Lead with what workspace-server takes care of (security perimeter,
    redaction, ACL, GLOBAL audit, prompt-injection wrap) so plugin
    authors don't reimplement those layers
  * Show three deployment models (same machine / separate container /
    self-managed) so operators see their topology
  * Capability table makes it explicit what each capability gates so
    a plugin that supports only one (e.g. semantic search) is still
    a useful plugin
  * Pinecone example is honest: shows the skeleton, the wire mapping,
    and explicitly calls out what's MISSING from the sketch (batch
    commits, TTL janitor, circuit breaker, metrics)
2026-05-04 08:17:03 -07:00


# Pinecone-backed Memory Plugin (worked example)

A working sketch of a memory plugin that delegates storage to Pinecone instead of Postgres.

This is example code, not a production binary. It demonstrates how to map the v1 contract onto a vector database. Operators who want to ship this would harden auth, add retries, batch the commit path, etc.

## Why Pinecone is interesting

The default Postgres plugin's pgvector index works for ~10M memories on a single node. Beyond that, semantic search becomes painful. A managed vector database can handle 1B+ memories, but the trade-offs are different:

- **Capabilities:** Pinecone is great at embedding (its core feature) but has no first-class FTS. So the plugin reports `["embedding"]` and ignores the `query` field.
- **TTL:** Pinecone supports per-vector metadata with deletion via metadata filter — TTL becomes a periodic janitor task, not a per-row property.
- **Cost:** per-vector billing, so the plugin should batch writes and dedup before posting.
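The dedup-then-batch step from the cost bullet can be sketched as a pure helper. The `Vector` type and the batch size here are illustrative stand-ins, not types from the contract or the Pinecone SDK:

```go
package main

import "fmt"

// Vector is a simplified stand-in for a Pinecone upsert record
// (illustrative; the real SDK type differs).
type Vector struct {
	ID     string
	Values []float32
}

// dedupAndBatch drops duplicate IDs (last write wins) and splits the
// result into batches of at most batchSize for bulk upserts.
func dedupAndBatch(vecs []Vector, batchSize int) [][]Vector {
	seen := make(map[string]int)
	deduped := make([]Vector, 0, len(vecs))
	for _, v := range vecs {
		if i, ok := seen[v.ID]; ok {
			deduped[i] = v // later write replaces the earlier one
			continue
		}
		seen[v.ID] = len(deduped)
		deduped = append(deduped, v)
	}
	var batches [][]Vector
	for start := 0; start < len(deduped); start += batchSize {
		end := start + batchSize
		if end > len(deduped) {
			end = len(deduped)
		}
		batches = append(batches, deduped[start:end])
	}
	return batches
}

func main() {
	vecs := []Vector{{ID: "a"}, {ID: "b"}, {ID: "a"}, {ID: "c"}}
	batches := dedupAndBatch(vecs, 2)
	fmt.Println(len(batches), len(batches[0])) // 2 2
}
```

Deduping before batching matters here because per-vector billing means every duplicate upsert is money spent twice.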

## Wire mapping

| Contract field | Pinecone shape |
| --- | --- |
| `namespace` | `namespace` (Pinecone's first-class concept) |
| `id` | `id` |
| `content` | `metadata.text` |
| `embedding` | `values` |
| `kind` / `source` / `pin` / `expires_at` | `metadata.{kind, source, pin, expires_at}` |
| `propagation` (opaque JSON) | `metadata.propagation` (also opaque) |

The contract's `expires_at` becomes a metadata field; a separate janitor cron periodically queries `expires_at < now` and deletes.
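The table can be read as a pure transform. The struct names below (`Memory`, `UpsertVector`) are illustrative stand-ins for the contract and SDK types, but the field routing is exactly the mapping above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Memory is an illustrative stand-in for the v1 contract's memory record.
type Memory struct {
	Namespace   string          `json:"namespace"`
	ID          string          `json:"id"`
	Content     string          `json:"content"`
	Embedding   []float32       `json:"embedding"`
	Kind        string          `json:"kind"`
	Source      string          `json:"source"`
	Pin         bool            `json:"pin"`
	ExpiresAt   int64           `json:"expires_at"`
	Propagation json.RawMessage `json:"propagation"` // opaque, passed through untouched
}

// UpsertVector mirrors the Pinecone wire shape from the table
// (illustrative; the real SDK has its own types).
type UpsertVector struct {
	ID       string                 `json:"id"`
	Values   []float32              `json:"values"`
	Metadata map[string]interface{} `json:"metadata"`
}

// toVector applies the wire mapping: content and contract fields land in
// metadata, the embedding becomes the vector values. The namespace is
// returned separately because Pinecone scopes upserts by namespace.
func toVector(m Memory) (string, UpsertVector) {
	return m.Namespace, UpsertVector{
		ID:     m.ID,
		Values: m.Embedding,
		Metadata: map[string]interface{}{
			"text":        m.Content,
			"kind":        m.Kind,
			"source":      m.Source,
			"pin":         m.Pin,
			"expires_at":  m.ExpiresAt,
			"propagation": m.Propagation,
		},
	}
}

func main() {
	ns, v := toVector(Memory{
		Namespace: "team-a", ID: "m1", Content: "hello",
		Embedding:   []float32{0.1, 0.2},
		Propagation: json.RawMessage(`{"hops":1}`),
	})
	fmt.Println(ns, v.ID, v.Metadata["text"]) // team-a m1 hello
}
```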

## Skeleton

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"

	"github.com/pinecone-io/go-pinecone/pinecone"
)

type pineconePlugin struct {
	client *pinecone.Client
	index  string
}

func main() {
	apiKey := os.Getenv("PINECONE_API_KEY")
	if apiKey == "" {
		log.Fatal("PINECONE_API_KEY required")
	}
	client, err := pinecone.NewClient(pinecone.NewClientParams{ApiKey: apiKey})
	if err != nil {
		log.Fatal(err)
	}
	p := &pineconePlugin{client: client, index: os.Getenv("PINECONE_INDEX")}

	http.HandleFunc("/v1/health", p.health)
	http.HandleFunc("/v1/search", p.search)
	// ... rest of the routes ...

	log.Fatal(http.ListenAndServe(":9100", nil))
}

func (p *pineconePlugin) health(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]interface{}{
		"status":       "ok",
		"version":      "1.0.0",
		"capabilities": []string{"embedding"}, // no FTS, no TTL out of the box
	})
}

func (p *pineconePlugin) search(w http.ResponseWriter, r *http.Request) {
	// Parse contract.SearchRequest
	// Build Pinecone QueryByVectorValuesRequest with body.Embedding
	// For each Pinecone namespace in body.Namespaces, call Query
	// Map results to contract.Memory
	// ...
}
```

## What's missing from this sketch

A production-ready Pinecone plugin would add:

- **Batch commits:** bulk upsert N memories in a single Pinecone call
- **TTL janitor:** periodic deletion of expired vectors
- **Connection pooling:** keep one Pinecone client alive across requests
- **Retry + circuit breaker:** Pinecone occasionally returns 5xx
- **Metrics:** latency histograms per endpoint, write/read counters

But the mapping above is the load-bearing part — the rest is operational hardening, not contract-specific.

## See also