Mass-sed across 17 files / 38 active refs in molecule-core .md docs (README + CONTRIBUTING + docs/architecture/ + docs/blog/ + docs/guides/ + docs/integrations/ + docs/quickstart.md + scripts/README.md). Driver: /tmp/sweep_core.py — same pattern set as the internal-marketing bulk-sed (PR #50). 4 url-substitution patterns + SKIP_PATTERN preserves /pull/<n> /issues/<n> /commit/<sha> /releases/... historical refs. Files NOT touched in this PR: - docs/workspace-runtime-package.md — owned by molecule-core#15 (workspace-runtime source-edit per #41). Reverted my bulk-sed of that file to avoid merge conflict. - 2 Go-import-path refs in docs/memory-plugins/testing-your-plugin.md (github.com/Molecule-AI/molecule-monorepo/platform/internal/...) — Q5 cross-repo Go-module migration territory. - 1 GitHub Gist link in docs/guides/external-workspace-quickstart.md (gist.github.com/molecule-ai/...) — no Gitea equivalent; consistent with the same handling in docs#1. Manual fixes (2): - docs/blog/2026-04-20-chrome-devtools-mcp-seo/index.md:306 — GitHub Discussions (no Gitea equivalent) → issue tracker link - docs/guides/external-workspace-quickstart.md:218 — tracking-issue ?q= query-string url (regex didn't catch) → reformulated text + Gitea search-by-query approach Pattern matches my docs#1 (public docs site) PR + internal#50 (internal/marketing bulk-sed). Standard substitutions: - https://github.com/Molecule-AI/<repo> → https://git.moleculesai.app/molecule-ai/<repo> - /blob/<branch>/ + /tree/<branch>/ → /src/branch/<branch>/ Refs: molecule-ai/internal#37, molecule-ai/internal#38
14 KiB
| title | date | slug | description | tags | |||||
|---|---|---|---|---|---|---|---|---|---|
| Give Your AI Agent a Real Browser: MCP + Chrome DevTools | 2026-04-20 | browser-automation-ai-agents-mcp | Learn how to add browser automation to your AI agents using Chrome DevTools and the Model Context Protocol. Full Python code examples — no Puppeteer wrappers, no SaaS dependencies. |
|
Give Your AI Agent a Real Browser: MCP + Chrome DevTools
Most AI agents hit the same wall: they can reason, plan, and call APIs — but the moment a task requires clicking through a website, filling a form, or reading a page that has no API, they're stuck.
The fix is giving your agent a real browser. Not a screenshot API, not a Playwright script written by a human. A browser your AI agent controls itself — deciding when to navigate, extract, and interact, the same way a human would.
The Model Context Protocol (MCP) is the bridge. It gives AI models a standardized interface to call browser tools — not buried in a prompt, but as first-class, typed tool calls. Chrome DevTools Protocol (CDP) is the engine: the same underlying protocol that powers Chrome DevTools, Puppeteer, and Playwright, exposed directly to your agent.
This post shows how it works end-to-end — with working Python code and a complete example you can run today.
Why MCP for Browser Automation
Before MCP, connecting an AI agent to a browser meant one of two paths:
Path 1: Custom wrapper scripts. You write Python functions that call Puppeteer or Playwright, expose them via a prompt, and hope the model routes tool calls correctly. It works in demos. It breaks in production when the prompt drifts or the tool schema is ambiguous.
Path 2: SaaS browser APIs. Services like Browserbase or Steel provide managed browser infrastructure, but they add a dependency, a pricing tier, and a network hop between your agent and the browser. For teams already self-hosting or using Molecule AI, it's the wrong direction.
MCP solves both problems. It gives you:
- Typed tool definitions — your agent sees
browser_navigate,dom_query,page_screenshotwith JSON Schema inputs, not raw Python function names buried in a system prompt. - Streaming tool calls — long-running browser operations (page loads, form submissions) stream progress back without blocking the agent's reasoning loop.
- Session persistence — CDP sessions maintain browser state (cookies, localStorage, scroll position) across tool calls, so your agent isn't starting from a blank page every turn.
Compare that to the alternatives:
LangChain agents can call Playwright — but you manage session state, handle Playwright timeouts in your prompt, and debug failures by reading through a tangled chain of decorator-wrapped functions. CrewAI's browser tools are tool_USE wrappers, not agent-native — the agent sees them as function calls but can't introspect browser state between steps.
With Molecule AI and MCP, the browser is a first-class citizen in the agent's tool context. The agent sees the browser session as a live state — it can navigate, query, screenshot, and wait without a human manually sequencing the steps.
Infrastructure comparison:
| Approach | Setup effort | Session management | Cost |
|---|---|---|---|
| Custom Puppeteer/Playwright | High — you write and maintain the wrapper | DIY | Free (your infra) |
| Browserbase / Steel (SaaS) | Low | Managed | Per-session pricing |
| Molecule AI + MCP | Low — built into the workspace | Agent-native | Free (self-hosted) or standard Molecule AI tier |
Molecule AI workspaces ship MCP browser tools as part of the standard runtime. If you're already on Molecule AI, browser automation is available — you configure which tools the agent can access, not how they work.
The Chrome DevTools Protocol + MCP Bridge
Chrome ships with a built-in remote debugging interface: the Chrome DevTools Protocol (CDP). It's the same protocol that Chrome DevTools, Puppeteer, and Playwright are built on. CDP exposes browser functionality over a WebSocket connection as JSON-RPC 2.0 commands across a set of domains:
| Domain | What it does |
|---|---|
Page |
Navigate, reload, capture screenshots |
DOM |
Query and traverse the DOM tree |
Runtime |
Execute JavaScript in the page context |
Network |
Inspect and intercept network requests |
Input |
Dispatch mouse and keyboard events |
An MCP server that bridges to CDP maps these domains onto MCP tool definitions. The result: your AI agent calls browser_navigate and the MCP server translates it to a Page.navigate CDP command over WebSocket.
The tool schema looks like this:
{
"name": "browser_navigate",
"description": "Navigate to a URL in the headless Chrome session",
"inputSchema": {
"type": "object",
"properties": {
"url": { "type": "string", "description": "The URL to navigate to" }
},
"required": ["url"]
}
}
{
"name": "dom_query",
"description": "Query the DOM using a CSS selector",
"inputSchema": {
"type": "object",
"properties": {
"selector": { "type": "string", "description": "CSS selector" }
}
}
}
{
"name": "page_screenshot",
"description": "Capture a screenshot of the current page",
"inputSchema": {
"type": "object",
"properties": {
"fullPage": { "type": "boolean", "description": "Capture the full scrollable page", "default": false }
}
}
}
The MCP server handles the WebSocket lifecycle, CDP command dispatch, and response parsing. Your agent code stays clean.
Full Code Example: AI Agent That Researches Competitors
Here's a complete example using Molecule AI's Python SDK. The agent's task: go to a competitor's pricing page, extract the plan names and prices, and save a screenshot.
from molecule_ai import Agent, MCPToolset
from browser_mcp import ChromeDevToolsMCP # your MCP server
# Start the CDP session — connects to Chrome's remote debugging port
browser = ChromeDevToolsMCP(debugging_port=9222)
# Attach browser tools as MCP tools on the agent
agent = Agent(
system_prompt="You are a competitive research assistant. "
"Use the browser tools to gather data.",
mcp_tools=browser.tools(), # fetches tools via MCP manifest
)
# Run the task
result = agent.run(
"Go to https://example-competitor.com/pricing, extract all plan "
"names and monthly prices, then save a screenshot of the page."
)
print(result.final_output)
Behind the scenes, the tool call cycle looks like this:
Agent → MCP invoke: browser_navigate { url: "https://example-competitor.com/pricing" }
MCP Server → CDP command: Page.navigate { url: "https://example-competitor.com/pricing" }
CDP → Page.loadEventFired event (streamed back)
Agent → MCP invoke: dom_query { selector: ".pricing-plan, [data-plan]" }
Agent → MCP invoke: page_screenshot { fullPage: false }
Agent → MCP invoke: browser_navigate { url: "about:blank" } # cleanup
Each step is a structured tool call with typed inputs. The agent's prompt never mentions websocket, JSON-RPC, or port 9222. The MCP abstraction hides the infrastructure.
Setting Up Chrome for Remote Debugging
To use CDP, start Chrome with the remote debugging port open:
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-debug
# Linux
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
# Windows
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\tmp\chrome-debug"
Or launch a headless instance:
google-chrome \
--headless \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-headless
Make sure no other Chrome instance is already using port 9222 on your machine.
The MCP Server: Minimal Implementation
If you want to roll your own MCP-to-CDP bridge (or understand what browser_mcp is doing above), here's the core of it:
import json
import asyncio
import websockets
class ChromeDevToolsMCP:
def __init__(self, debugging_port: int = 9222):
self.ws_url = f"ws://localhost:{debugging_port}/devtools/browser"
self._session_id: str | None = None
self._ws: websockets.WebSocketClientProtocol | None = None
async def __aenter__(self):
self._ws = await websockets.connect(self.ws_url)
# Create a new browser session
resp = await self._send("Target.createBrowserContext")
self._session_id = resp["browserContextId"]
return self
async def __aexit__(self, *args):
if self._ws:
await self._ws.close()
async def _send(self, method: str, params: dict = None) -> dict:
"""Send a CDP command and wait for the response."""
await self._ws.send(json.dumps({
"id": 1,
"method": method,
"params": params or {},
}))
raw = await self._ws.recv()
return json.loads(raw)
def tools(self) -> list[dict]:
"""Return MCP tool definitions for this server."""
return [
{
"name": "browser_navigate",
"description": "Navigate to a URL",
"inputSchema": {
"type": "object",
"properties": {
"url": {"type": "string", "format": "uri"}
},
"required": ["url"]
},
"handler": self._navigate,
},
{
"name": "page_screenshot",
"description": "Capture a screenshot",
"inputSchema": {
"type": "object",
"properties": {
"fullPage": {"type": "boolean", "default": False}
}
},
"handler": self._screenshot,
},
]
async def _navigate(self, url: str) -> str:
resp = await self._send("Page.navigate", {"url": url})
return f"Navigated. FrameId: {resp.get('frameId')}"
async def _screenshot(self, fullPage: bool = False) -> str:
# Enable screenshot domain first
await self._send("Page.enable")
resp = await self._send("Page.captureScreenshot", {
"format": "png",
"fullPage": fullPage,
})
return f"screenshot:{resp['data']}" # base64-encoded PNG
This is deliberately minimal — it shows the shape of the bridge without error handling, tab management, or the full CDP command surface. Production MCP servers (including Molecule AI's built-in browser tools) handle all of that.
Real-World Use Cases
Browser automation via MCP isn't just a demo trick. Here are the production use cases teams are already running:
Competitive intelligence pipelines. An agent that visits a competitor's site weekly, extracts pricing and feature data, and writes a diff summary to a Notion page. No Puppeteer scripts to maintain — the agent updates the extraction logic itself when the competitor redesigns.
AI-assisted data entry. An agent that receives a spreadsheet row, navigates to a web form, fills it in, and submits. Particularly useful for legacy systems that only have a web UI and no API.
Automated UI regression testing. Instead of writing Playwright test scripts that break on every CSS change, describe the expected state in natural language. The agent uses dom_query and page_screenshot to verify the UI matches your specification.
Real-time price and availability monitoring. An agent that polls a retail or ticketing site, captures a screenshot on price change, and sends a Slack alert. Runs on a schedule or triggers from a webhook.
All four of these work with the same MCP toolset — the agent's reasoning layer is identical; only the task description changes.
Compare this to n8n workflows: a human manually wires together a sequence of browser nodes — open tab, wait, click, extract, close. Molecule AI agents decide that sequence at runtime. When a competitor's page changes, the agent adapts the extraction strategy itself rather than waiting for a human to redraw the workflow.
Getting Started with Molecule AI
To use browser automation in a Molecule AI workspace, you connect your own MCP server (such as the ChromeDevToolsMCP shown above) using Molecule AI's built-in MCP tool registration. The platform handles the WebSocket lifecycle and tool call routing — you bring the browser logic.
Configure the MCP server URL in your workspace:
# Set your browser MCP server endpoint via the platform API
curl -X PATCH "${PLATFORM_URL}/workspaces/${WORKSPACE_ID}/config" \
-H "Authorization: Bearer ${WORKSPACE_TOKEN}" \
-d '{
"mcp_servers": {
"browser": {
"type": "streamable_http",
"url": "http://localhost:9223/mcp"
}
}
}'
Or use the Canvas UI: Workspace → Config → MCP Servers → Add browser MCP server.
What Molecule AI provides: WebSocket routing, tool call auth, session lifecycle, and the A2A bridge so your agent sees browser tools as native workspace tools. You bring the CDP bridge (or use the ChromeDevToolsMCP example above).
Compare that to wiring Playwright into LangChain: you write async wrapper functions, handle page.goto() timeouts in the prompt, and debug failures by reading through decorator-stacked chain outputs. With Molecule AI and MCP, the browser is a first-class tool — typed, session-aware, and registered the same way as any other MCP tool.
→ MCP Server Setup Guide → Quickstart: Deploy your first AI agent
Try it free — Molecule AI is open source and self-hostable. Get a workspace running in under 5 minutes.
Have a browser automation use case you want to see covered? File an issue with the enhancement label on the molecule-core issue tracker.