From cad80ee73d534b28d022006b57d88360ebc83e9b Mon Sep 17 00:00:00 2001 From: "molecule-ai[bot]" <276602405+molecule-ai[bot]@users.noreply.github.com> Date: Tue, 21 Apr 2026 08:02:35 +0000 Subject: [PATCH] docs: add docs/research/cognee-architecture-deep-dive.md --- .../research/cognee-architecture-deep-dive.md | 65 +++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 content/docs/research/cognee-architecture-deep-dive.md diff --git a/content/docs/research/cognee-architecture-deep-dive.md b/content/docs/research/cognee-architecture-deep-dive.md new file mode 100644 index 0000000..a24469d --- /dev/null +++ b/content/docs/research/cognee-architecture-deep-dive.md @@ -0,0 +1,65 @@ +# Cognee Architecture Deep-Dive — Workspace Isolation + +**Date:** 2026-04-20 +**Issue:** Molecule-AI/molecule-core#1146 +**Research by:** Research Lead +**Status:** Complete + +--- + +## Executive Summary + +Cognee has **dataset-level isolation primitives** but **no storage-layer enforcement** and **no native `workspace_id` support** in its MCP tool interface. Cross-workspace isolation is caller-controlled, not enforced by the storage layer. + +--- + +## Isolation Layer Analysis + +| Layer | Mechanism | Enforced? | Risk | +|-------|-----------|-----------|------| +| Storage (Postgres) | No RLS, no schema namespacing | ❌ None | High | +| App — dataset | `dataset_name` passed per tool call | ⚠️ Caller-controlled | Medium | +| App — user | `get_default_user()` internal resolver only | ⚠️ Soft | Medium | +| MCP `workspace_id` param | Not present in cognee-mcp interface | ❌ N/A | High | + +--- + +## Key Findings + +1. **Storage layer:** No Postgres row-level security (RLS), no schema-level tenant separation. Any admin with DB access can read any tenant's data. + +2. **Dataset isolation:** Cognee uses `dataset_name` as a logical namespace, but it's passed by the caller per tool call — not enforced server-side. A misconfigured or malicious caller could read/write across datasets. + +3. **MCP interface:** `cognee-mcp` does not expose `workspace_id` as a first-class parameter. Workspaces would need to be mapped to dataset names externally. + +4. **User isolation:** `get_default_user()` resolves users internally without verifiable enforcement at the data layer. + +--- + +## Migration Implications + +Adopting Cognee as the memory substrate requires an **auth bridge**: + +- The bridge wraps cognee-mcp and injects `workspace_id` → `dataset_name` mapping +- All tool calls are routed through the bridge, which enforces tenant context +- Estimated effort: **~100–200 LOC** for the MCP proxy wrapper +- This is a pragmatic path — the bridge provides the isolation Cognee's storage layer lacks + +--- + +## Recommendation + +**Attempt the auth bridge prototype first (1–2 days of engineering):** +1. Build MCP proxy that maps workspace_id to dataset_name on each call +2. Validate that cross-workspace calls are correctly rejected +3. If clean → adopt Cognee for Phase 9 +4. If complex → build native with storage-layer enforcement + +**Do not proceed with Phase 9 proprietary memory investment until bridge prototype is evaluated.** + +--- + +## Sources + +- Cognee GitHub: https://github.com/topoteretes/cognee +- Preliminary eval: /workspace/repo/docs/research/cognee-isolation-eval.md