

Give Your AI Agent a Real Browser: MCP + Chrome DevTools

Most AI agents hit the same wall: they can reason, plan, and call APIs — but the moment a task requires clicking through a website, filling a form, or reading a page that has no API, they're stuck.

The fix is giving your agent a real browser. Not a screenshot API, not a Playwright script written by a human. A browser your AI agent controls itself — deciding when to navigate, extract, and interact, the same way a human would.

The Model Context Protocol (MCP) is the bridge. It gives AI models a standardized interface to call browser tools — not buried in a prompt, but as first-class, typed tool calls. Chrome DevTools Protocol (CDP) is the engine: the same underlying protocol that powers Chrome DevTools, Puppeteer, and Playwright, exposed directly to your agent.

This post shows how it works end-to-end — with working Python code and a complete example you can run today.

Why MCP for Browser Automation

Before MCP, connecting an AI agent to a browser meant one of two paths:

Path 1: Custom wrapper scripts. You write Python functions that call Puppeteer or Playwright, expose them via a prompt, and hope the model routes tool calls correctly. It works in demos. It breaks in production when the prompt drifts or the tool schema is ambiguous.

Path 2: SaaS browser APIs. Services like Browserbase or Steel provide managed browser infrastructure, but they add a dependency, a pricing tier, and a network hop between your agent and the browser. For teams already self-hosting or using Molecule AI, it's the wrong direction.

MCP solves both problems. It gives you:

- Typed tool definitions: your agent sees browser_navigate, dom_query, page_screenshot with JSON Schema inputs, not raw Python function names buried in a system prompt.
- Streaming tool calls: long-running browser operations (page loads, form submissions) stream progress back without blocking the agent's reasoning loop.
- Session persistence: CDP sessions maintain browser state (cookies, localStorage, scroll position) across tool calls, so your agent isn't starting from a blank page every turn.

Compare that to the alternatives:

LangChain agents can call Playwright, but you manage session state, handle Playwright timeouts in your prompt, and debug failures by reading through a tangled chain of decorator-wrapped functions. CrewAI's browser tools are tool-use wrappers rather than agent-native tools: the agent sees them as function calls but can't introspect browser state between steps.

With Molecule AI and MCP, the browser is a first-class citizen in the agent's tool context. The agent sees the browser session as a live state — it can navigate, query, screenshot, and wait without a human manually sequencing the steps.

Infrastructure comparison:

| Approach | Setup effort | Session management | Cost |
| --- | --- | --- | --- |
| Custom Puppeteer/Playwright | High: you write and maintain the wrapper | DIY | Free (your infra) |
| Browserbase / Steel (SaaS) | Low | Managed | Per-session pricing |
| Molecule AI + MCP | Low: built into the workspace | Agent-native | Free (self-hosted) or standard Molecule AI tier |

Molecule AI workspaces ship MCP browser tools as part of the standard runtime. If you're already on Molecule AI, browser automation is available — you configure which tools the agent can access, not how they work.

The Chrome DevTools Protocol + MCP Bridge

Chrome ships with a built-in remote debugging interface: the Chrome DevTools Protocol (CDP). It's the same protocol that Chrome DevTools and Puppeteer are built on, and the one Playwright uses to drive Chromium. CDP exposes browser functionality over a WebSocket connection as JSON messages in a JSON-RPC-like format (each command carries an id, a method, and params), grouped into domains:

| Domain | What it does |
| --- | --- |
| `Page` | Navigate, reload, capture screenshots |
| `DOM` | Query and traverse the DOM tree |
| `Runtime` | Execute JavaScript in the page context |
| `Network` | Inspect and intercept network requests |
| `Input` | Dispatch mouse and keyboard events |

An MCP server that bridges to CDP maps these domains onto MCP tool definitions. The result: your AI agent calls browser_navigate and the MCP server translates it to a Page.navigate CDP command over WebSocket.
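As a rough illustration of that translation, the sketch below turns a tool call into the JSON message a bridge would write to the CDP WebSocket. The `TOOL_TO_CDP` table and the `to_cdp_command` helper are hypothetical names introduced here, not part of any MCP SDK; only the CDP method name `Page.navigate` comes from the protocol itself.

```python
import json

# Hypothetical mapping: tool name -> (CDP method, function building CDP params).
# Only browser_navigate is shown; tools such as dom_query need several CDP
# commands and would not fit a one-line mapping like this.
TOOL_TO_CDP = {
    "browser_navigate": ("Page.navigate", lambda args: {"url": args["url"]}),
}


def to_cdp_command(tool_name: str, arguments: dict, msg_id: int) -> str:
    """Serialize one MCP tool call as a CDP command string."""
    if tool_name not in TOOL_TO_CDP:
        raise KeyError(f"no CDP mapping for tool {tool_name!r}")
    method, build_params = TOOL_TO_CDP[tool_name]
    return json.dumps({
        "id": msg_id,            # echoed back by Chrome in the matching reply
        "method": method,
        "params": build_params(arguments),
    })


if __name__ == "__main__":
    print(to_cdp_command("browser_navigate", {"url": "https://example.com"}, 1))
```

Keeping the parameter builder next to the method name makes it explicit that tool arguments and CDP parameters are separate vocabularies, even when they happen to look alike.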

The tool schema looks like this:

{
  "name": "browser_navigate",
  "description": "Navigate to a URL in the headless Chrome session",
  "inputSchema": {
    "type": "object",
    "properties": {
      "url": { "type": "string", "description": "The URL to navigate to" }
    },
    "required": ["url"]
  }
}

{
  "name": "dom_query",
  "description": "Query the DOM using a CSS selector",
  "inputSchema": {
    "type": "object",
    "properties": {
      "selector": { "type": "string", "description": "CSS selector" }
    },
    "required": ["selector"]
  }
}

{
  "name": "page_screenshot",
  "description": "Capture a screenshot of the current page",
  "inputSchema": {
    "type": "object",
    "properties": {
      "fullPage": { "type": "boolean", "description": "Capture the full scrollable page", "default": false }
    }
  }
}

The MCP server handles the WebSocket lifecycle, CDP command dispatch, and response parsing. Your agent code stays clean.
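One practical payoff of typed schemas is that a server can reject malformed calls before they reach the browser. The sketch below is a deliberately small check, assuming schemas shaped like the ones above; `validate_arguments` and `NAVIGATE_SCHEMA` are illustrative names, and the check covers only `required` and a few primitive types. A real server would more likely rely on a full JSON Schema validator.

```python
# Maps the JSON Schema type names used in this post to Python types.
JSON_TYPES = {"string": str, "boolean": bool, "object": dict}


def validate_arguments(schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems with the arguments (empty when valid)."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            continue  # JSON Schema permits extra properties by default
        expected = JSON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems


# Same shape as the browser_navigate inputSchema shown earlier.
NAVIGATE_SCHEMA = {
    "type": "object",
    "properties": {"url": {"type": "string"}},
    "required": ["url"],
}

if __name__ == "__main__":
    print(validate_arguments(NAVIGATE_SCHEMA, {"url": 42}))
```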

Full Code Example: AI Agent That Researches Competitors

Here's a complete example using Molecule AI's Python SDK. The agent's task: go to a competitor's pricing page, extract the plan names and prices, and save a screenshot.

from molecule_ai import Agent, MCPToolset
from browser_mcp import ChromeDevToolsMCP  # your MCP server

# Start the CDP session — connects to Chrome's remote debugging port
browser = ChromeDevToolsMCP(debugging_port=9222)

# Attach browser tools as MCP tools on the agent
agent = Agent(
    system_prompt="You are a competitive research assistant. "
                  "Use the browser tools to gather data.",
    mcp_tools=browser.tools(),   # fetches tools via MCP manifest
)

# Run the task
result = agent.run(
    "Go to https://example-competitor.com/pricing, extract all plan "
    "names and monthly prices, then save a screenshot of the page."
)

print(result.final_output)

Behind the scenes, the tool call cycle looks like this:

Agent → MCP invoke: browser_navigate { url: "https://example-competitor.com/pricing" }
MCP Server → CDP command: Page.navigate { url: "https://example-competitor.com/pricing" }
CDP → Page.loadEventFired event (streamed back)
Agent → MCP invoke: dom_query { selector: ".pricing-plan, [data-plan]" }
Agent → MCP invoke: page_screenshot { fullPage: false }
Agent → MCP invoke: browser_navigate { url: "about:blank" }  # cleanup

Each step is a structured tool call with typed inputs. The agent's prompt never mentions WebSocket, JSON-RPC, or port 9222. The MCP abstraction hides the infrastructure.
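On the server side, each of those invocations has to be routed to the code that performs it. A minimal sketch of that routing follows, assuming every tool definition carries a `handler` coroutine next to its name. The `dispatch` function and the `fake_navigate` stand-in are hypothetical and exist only so the example runs without Chrome.

```python
import asyncio


async def dispatch(tools: list[dict], name: str, arguments: dict) -> str:
    """Find a tool by name and await its handler with the call's arguments."""
    for tool in tools:
        if tool["name"] == name:
            return await tool["handler"](**arguments)
    raise LookupError(f"unknown tool: {name}")


async def fake_navigate(url: str) -> str:
    # Stand-in for a real CDP-backed handler.
    return f"Navigated to {url}"


FAKE_TOOLS = [{"name": "browser_navigate", "handler": fake_navigate}]

if __name__ == "__main__":
    print(asyncio.run(dispatch(FAKE_TOOLS, "browser_navigate", {"url": "about:blank"})))
```

Passing the arguments as keyword arguments means the handler signature doubles as a second, code-level statement of the tool's inputs.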

Setting Up Chrome for Remote Debugging

To use CDP, start Chrome with the remote debugging port open:

# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-debug

# Linux
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug

# Windows
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\tmp\chrome-debug"

Or launch a headless instance:

google-chrome \
  --headless \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-headless

Make sure no other Chrome instance is already using port 9222 on your machine.
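Both checks can be scripted. The sketch below first tests whether anything is already listening on the port, then asks a running Chrome for its browser-level WebSocket URL through the `/json/version` HTTP endpoint that Chrome serves on the debugging port. The helper names `port_in_use` and `browser_ws_url` are made up for this example.

```python
import json
import socket
import urllib.request


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something already accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def browser_ws_url(port: int = 9222, host: str = "127.0.0.1") -> str:
    """Ask a running Chrome for its browser-level WebSocket debugger URL."""
    url = f"http://{host}:{port}/json/version"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)["webSocketDebuggerUrl"]


if __name__ == "__main__":
    if port_in_use(9222):
        print("Port 9222 is busy; if that is Chrome, its endpoint is:")
        print(browser_ws_url(9222))
    else:
        print("Port 9222 is free; start Chrome with --remote-debugging-port=9222")
```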

The MCP Server: Minimal Implementation

If you want to roll your own MCP-to-CDP bridge (or understand what browser_mcp is doing above), here's the core of it:

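import itertools
import json
import urllib.request

import websockets


class ChromeDevToolsMCP:
    def __init__(self, debugging_port: int = 9222):
        self.http_url = f"http://localhost:{debugging_port}"
        self._ids = itertools.count(1)        # CDP replies are matched by id
        self._session_id: str | None = None   # session attached to our tab
        self._ws = None

    async def __aenter__(self):
        # The browser WebSocket URL contains a per-launch GUID, so read it
        # from /json/version instead of hard-coding it. (Blocking call, kept
        # for brevity.)
        with urllib.request.urlopen(f"{self.http_url}/json/version") as resp:
            ws_url = json.load(resp)["webSocketDebuggerUrl"]
        # max_size=None lifts the default 1 MiB message limit for screenshots.
        self._ws = await websockets.connect(ws_url, max_size=None)
        # Open a tab and attach to it. With flatten=True, commands for the tab
        # travel over this same socket, tagged with the returned sessionId.
        target = await self._send("Target.createTarget", {"url": "about:blank"})
        attached = await self._send(
            "Target.attachToTarget",
            {"targetId": target["targetId"], "flatten": True},
        )
        self._session_id = attached["sessionId"]
        return self

    async def __aexit__(self, *args):
        if self._ws:
            await self._ws.close()

    async def _send(self, method: str, params: dict | None = None) -> dict:
        """Send a CDP command and return its result, skipping events."""
        msg_id = next(self._ids)
        message = {"id": msg_id, "method": method, "params": params or {}}
        # Target.* commands address the browser; everything else goes to the tab.
        if self._session_id and not method.startswith("Target."):
            message["sessionId"] = self._session_id
        await self._ws.send(json.dumps(message))
        while True:
            reply = json.loads(await self._ws.recv())
            if reply.get("id") != msg_id:
                continue  # an event, not the reply to this command
            if "error" in reply:
                raise RuntimeError(f"{method} failed: {reply['error']}")
            return reply["result"]

    def tools(self) -> list[dict]:
        """Return MCP tool definitions for this server."""
        return [
            {
                "name": "browser_navigate",
                "description": "Navigate to a URL",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "url": {"type": "string", "format": "uri"}
                    },
                    "required": ["url"]
                },
                "handler": self._navigate,
            },
            {
                "name": "page_screenshot",
                "description": "Capture a screenshot",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "fullPage": {"type": "boolean", "default": False}
                    }
                },
                "handler": self._screenshot,
            },
        ]

    async def _navigate(self, url: str) -> str:
        resp = await self._send("Page.navigate", {"url": url})
        return f"Navigated. FrameId: {resp.get('frameId')}"

    async def _screenshot(self, fullPage: bool = False) -> str:
        params: dict = {"format": "png"}
        if fullPage:
            # CDP has no fullPage flag: clip to the full content size instead.
            size = (await self._send("Page.getLayoutMetrics"))["cssContentSize"]
            params["clip"] = {"x": 0, "y": 0, "width": size["width"],
                              "height": size["height"], "scale": 1}
            params["captureBeyondViewport"] = True
        resp = await self._send("Page.captureScreenshot", params)
        return f"screenshot:{resp['data']}"  # base64-encoded PNG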

This is deliberately minimal: it shows the shape of the bridge without timeouts, retries, multi-tab management, or the full CDP command surface. Production MCP servers (including Molecule AI's built-in browser tools) handle all of that.
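The screenshot handler above returns its image as a `screenshot:<base64>` string, so whatever consumes the tool result has to decode it before saving. A small sketch of that step, with `save_screenshot` as a hypothetical helper name:

```python
import base64
from pathlib import Path

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(tool_result: str, path: str) -> int:
    """Decode a 'screenshot:<base64>' tool result and write it to path.

    Returns the number of bytes written.
    """
    prefix = "screenshot:"
    if not tool_result.startswith(prefix):
        raise ValueError("not a screenshot result")
    data = base64.b64decode(tool_result[len(prefix):])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return Path(path).write_bytes(data)
```

Checking the PNG signature is a cheap way to catch a truncated or mis-encoded payload before it ends up on disk.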

Real-World Use Cases

Browser automation via MCP isn't just a demo trick. Here are the production use cases teams are already running:

Competitive intelligence pipelines. An agent that visits a competitor's site weekly, extracts pricing and feature data, and writes a diff summary to a Notion page. No Puppeteer scripts to maintain — the agent updates the extraction logic itself when the competitor redesigns.

AI-assisted data entry. An agent that receives a spreadsheet row, navigates to a web form, fills it in, and submits. Particularly useful for legacy systems that only have a web UI and no API.

Automated UI regression testing. Instead of writing Playwright test scripts that break on every CSS change, describe the expected state in natural language. The agent uses dom_query and page_screenshot to verify the UI matches your specification.

Real-time price and availability monitoring. An agent that polls a retail or ticketing site, captures a screenshot on price change, and sends a Slack alert. Runs on a schedule or triggers from a webhook.

All four of these work with the same MCP toolset — the agent's reasoning layer is identical; only the task description changes.
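For the pricing use cases, the step after extraction is usually a plain comparison between two snapshots. The sketch below assumes the agent has already reduced each visit to a mapping from plan name to monthly price; `diff_prices` and the sample plan names are invented for illustration.

```python
def diff_prices(previous: dict[str, float], current: dict[str, float]) -> list[str]:
    """Describe plan-level changes between two pricing snapshots."""
    changes = []
    for plan in sorted(previous.keys() | current.keys()):
        old, new = previous.get(plan), current.get(plan)
        if old is None:
            changes.append(f"{plan}: added at {new:.2f}")
        elif new is None:
            changes.append(f"{plan}: removed (was {old:.2f})")
        elif old != new:
            changes.append(f"{plan}: {old:.2f} -> {new:.2f}")
    return changes


if __name__ == "__main__":
    last_week = {"Pro": 20.0, "Team": 50.0}
    this_week = {"Pro": 25.0, "Enterprise": 99.0}
    for line in diff_prices(last_week, this_week):
        print(line)
```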

Compare this to n8n workflows: a human manually wires together a sequence of browser nodes — open tab, wait, click, extract, close. Molecule AI agents decide that sequence at runtime. When a competitor's page changes, the agent adapts the extraction strategy itself rather than waiting for a human to redraw the workflow.

Getting Started with Molecule AI

To use browser automation in a Molecule AI workspace, you connect your own MCP server (such as the ChromeDevToolsMCP shown above) using Molecule AI's built-in MCP tool registration. The platform handles the WebSocket lifecycle and tool call routing — you bring the browser logic.

Configure the MCP server URL in your workspace:

# Set your browser MCP server endpoint via the platform API
curl -X PATCH "${PLATFORM_URL}/workspaces/${WORKSPACE_ID}/config" \
  -H "Authorization: Bearer ${WORKSPACE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "mcp_servers": {
      "browser": {
        "type": "streamable_http",
        "url": "http://localhost:9223/mcp"
      }
    }
  }'
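The same request can be built from Python when the registration is part of a provisioning script. This sketch mirrors the curl call using only the standard library. The function name `build_config_request` is hypothetical, and the endpoint path and payload shape are taken from the curl example rather than from a separately verified API reference.

```python
import json
import urllib.request


def build_config_request(platform_url: str, workspace_id: str, token: str,
                         mcp_url: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that registers the server."""
    body = {
        "mcp_servers": {
            "browser": {"type": "streamable_http", "url": mcp_url},
        }
    }
    return urllib.request.Request(
        url=f"{platform_url}/workspaces/{workspace_id}/config",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; keeping construction separate from sending makes the payload easy to inspect or log first.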

Or use the Canvas UI: Workspace → Config → MCP Servers → Add browser MCP server.

What Molecule AI provides: WebSocket routing, tool call auth, session lifecycle, and the A2A bridge so your agent sees browser tools as native workspace tools. You bring the CDP bridge (or use the ChromeDevToolsMCP example above).

Compare that to wiring Playwright into LangChain: you write async wrapper functions, handle page.goto() timeouts in the prompt, and debug failures by reading through decorator-stacked chain outputs. With Molecule AI and MCP, the browser is a first-class tool — typed, session-aware, and registered the same way as any other MCP tool.

→ MCP Server Setup Guide → Quickstart: Deploy your first AI agent

Try it free — Molecule AI is open source and self-hostable. Get a workspace running in under 5 minutes.

→ Get started on GitHub →

Have a browser automation use case you want to see covered? File an issue with the enhancement label on the molecule-core issue tracker.
