Mass-sed across 17 files / 38 active refs in molecule-core .md docs (README + CONTRIBUTING + docs/architecture/ + docs/blog/ + docs/guides/ + docs/integrations/ + docs/quickstart.md + scripts/README.md).

Driver: /tmp/sweep_core.py — same pattern set as the internal-marketing bulk-sed (PR #50). 4 url-substitution patterns + SKIP_PATTERN preserves /pull/<n> /issues/<n> /commit/<sha> /releases/... historical refs.

Files NOT touched in this PR:
- docs/workspace-runtime-package.md — owned by molecule-core#15 (workspace-runtime source-edit per #41). Reverted my bulk-sed of that file to avoid merge conflict.
- 2 Go-import-path refs in docs/memory-plugins/testing-your-plugin.md (github.com/Molecule-AI/molecule-monorepo/platform/internal/...) — Q5 cross-repo Go-module migration territory.
- 1 GitHub Gist link in docs/guides/external-workspace-quickstart.md (gist.github.com/molecule-ai/...) — no Gitea equivalent; consistent with the same handling in docs#1.

Manual fixes (2):
- docs/blog/2026-04-20-chrome-devtools-mcp-seo/index.md:306 — GitHub Discussions (no Gitea equivalent) → issue tracker link
- docs/guides/external-workspace-quickstart.md:218 — tracking-issue ?q= query-string url (regex didn't catch) → reformulated text + Gitea search-by-query approach

Pattern matches my docs#1 (public docs site) PR + internal#50 (internal/marketing bulk-sed).

Standard substitutions:
- https://github.com/Molecule-AI/<repo> → https://git.moleculesai.app/molecule-ai/<repo>
- /blob/<branch>/ + /tree/<branch>/ → /src/branch/<branch>/

Refs: molecule-ai/internal#37, molecule-ai/internal#38
307 lines
14 KiB
Markdown
---
title: "Give Your AI Agent a Real Browser: MCP + Chrome DevTools"
date: 2026-04-20
slug: browser-automation-ai-agents-mcp
description: "Learn how to add browser automation to your AI agents using Chrome DevTools and the Model Context Protocol. Full Python code examples — no Puppeteer wrappers, no SaaS dependencies."
tags: [MCP, browser-automation, AI-agents, CDP, tutorial]
---

# Give Your AI Agent a Real Browser: MCP + Chrome DevTools

Most AI agents hit the same wall: they can reason, plan, and call APIs — but the moment a task requires clicking through a website, filling a form, or reading a page that has no API, they're stuck.

The fix is giving your agent a real browser. Not a screenshot API, not a Playwright script written by a human. A browser your AI agent controls itself — deciding when to navigate, extract, and interact, the same way a human would.

The Model Context Protocol (MCP) is the bridge. It gives AI models a standardized interface to call browser tools — not buried in a prompt, but as first-class, typed tool calls. Chrome DevTools Protocol (CDP) is the engine: the same underlying protocol that powers Chrome DevTools, Puppeteer, and Playwright, exposed directly to your agent.

This post shows how it works end-to-end — with working Python code and a complete example you can run today.

## Why MCP for Browser Automation

Before MCP, connecting an AI agent to a browser meant one of two paths:

**Path 1: Custom wrapper scripts.** You write Python functions that call Puppeteer or Playwright, expose them via a prompt, and hope the model routes tool calls correctly. It works in demos. It breaks in production when the prompt drifts or the tool schema is ambiguous.

**Path 2: SaaS browser APIs.** Services like Browserbase or Steel provide managed browser infrastructure, but they add a dependency, a pricing tier, and a network hop between your agent and the browser. For teams already self-hosting or using Molecule AI, it's the wrong direction.

MCP solves both problems. It gives you:

- **Typed tool definitions** — your agent sees `browser_navigate`, `dom_query`, `page_screenshot` with JSON Schema inputs, not raw Python function names buried in a system prompt.
- **Streaming tool calls** — long-running browser operations (page loads, form submissions) stream progress back without blocking the agent's reasoning loop.
- **Session persistence** — CDP sessions maintain browser state (cookies, localStorage, scroll position) across tool calls, so your agent isn't starting from a blank page every turn.

**Compare that to the alternatives:**

LangChain agents can call Playwright, but you manage session state, handle Playwright timeouts in your prompt, and debug failures by reading through a tangled chain of decorator-wrapped functions. CrewAI's browser tools are tool-use wrappers rather than agent-native tools: the agent sees them as function calls but can't introspect browser state between steps.

With Molecule AI and MCP, the browser is a first-class citizen in the agent's tool context. The agent sees the browser session as live state: it can navigate, query, screenshot, and wait without a human manually sequencing the steps.

**Infrastructure comparison:**

| Approach | Setup effort | Session management | Cost |
|---|---|---|---|
| Custom Puppeteer/Playwright | High — you write and maintain the wrapper | DIY | Free (your infra) |
| Browserbase / Steel (SaaS) | Low | Managed | Per-session pricing |
| Molecule AI + MCP | Low — built into the workspace | Agent-native | Free (self-hosted) or standard Molecule AI tier |

Molecule AI workspaces ship MCP browser tools as part of the standard runtime. If you're already on Molecule AI, browser automation is available — you configure which tools the agent can access, not how they work.

## The Chrome DevTools Protocol + MCP Bridge

Chrome ships with a built-in remote debugging interface: the Chrome DevTools Protocol (CDP). It's the same protocol that Chrome DevTools, Puppeteer, and Playwright are built on. CDP exposes browser functionality over a WebSocket connection as JSON-RPC-style messages (each command carries an `id`, a `method`, and `params`), grouped into a set of domains:

| Domain | What it does |
|---|---|
| `Page` | Navigate, reload, capture screenshots |
| `DOM` | Query and traverse the DOM tree |
| `Runtime` | Execute JavaScript in the page context |
| `Network` | Inspect and intercept network requests |
| `Input` | Dispatch mouse and keyboard events |

An MCP server that bridges to CDP maps these domains onto MCP tool definitions. The result: your AI agent calls `browser_navigate` and the MCP server translates it to a `Page.navigate` CDP command over WebSocket.

The tool schema looks like this:

```json
{
  "name": "browser_navigate",
  "description": "Navigate to a URL in the headless Chrome session",
  "inputSchema": {
    "type": "object",
    "properties": {
      "url": { "type": "string", "description": "The URL to navigate to" }
    },
    "required": ["url"]
  }
}
```

```json
{
  "name": "dom_query",
  "description": "Query the DOM using a CSS selector",
  "inputSchema": {
    "type": "object",
    "properties": {
      "selector": { "type": "string", "description": "CSS selector" }
    }
  }
}
```

```json
{
  "name": "page_screenshot",
  "description": "Capture a screenshot of the current page",
  "inputSchema": {
    "type": "object",
    "properties": {
      "fullPage": { "type": "boolean", "description": "Capture the full scrollable page", "default": false }
    }
  }
}
```

The MCP server handles the WebSocket lifecycle, CDP command dispatch, and response parsing. Your agent code stays clean.

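
To make the translation step concrete, here is a minimal sketch of how a bridge might turn an MCP tool call into the JSON frame it sends to Chrome. The `TOOL_TO_CDP` table and the `to_cdp_frame` helper are hypothetical names used for illustration; only the `Page.navigate` method and the `id` / `method` / `params` frame shape come from CDP itself.

```python
import json

# Hypothetical lookup table: MCP tool name -> (CDP method, argument translator).
# Only browser_navigate is mapped here; a real bridge would cover more tools.
TOOL_TO_CDP = {
    "browser_navigate": ("Page.navigate", lambda args: {"url": args["url"]}),
}


def to_cdp_frame(tool_name: str, arguments: dict, msg_id: int) -> str:
    """Build the JSON text frame for one CDP command from an MCP tool call."""
    method, translate = TOOL_TO_CDP[tool_name]
    frame = {"id": msg_id, "method": method, "params": translate(arguments)}
    return json.dumps(frame)


print(to_cdp_frame("browser_navigate", {"url": "https://example.com"}, msg_id=1))
```

Keeping the mapping in a table means a new tool is one more entry rather than another branch in the dispatch code.
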
## Full Code Example: AI Agent That Researches Competitors

Here's a complete example using Molecule AI's Python SDK. The agent's task: go to a competitor's pricing page, extract the plan names and prices, and save a screenshot.

```python
from molecule_ai import Agent, MCPToolset
from browser_mcp import ChromeDevToolsMCP  # your MCP server

# Start the CDP session — connects to Chrome's remote debugging port
browser = ChromeDevToolsMCP(debugging_port=9222)

# Attach browser tools as MCP tools on the agent
agent = Agent(
    system_prompt="You are a competitive research assistant. "
                  "Use the browser tools to gather data.",
    mcp_tools=browser.tools(),  # fetches tools via MCP manifest
)

# Run the task
result = agent.run(
    "Go to https://example-competitor.com/pricing, extract all plan "
    "names and monthly prices, then save a screenshot of the page."
)

print(result.final_output)
```

Behind the scenes, the tool call cycle looks like this:

```
Agent → MCP invoke: browser_navigate { url: "https://example-competitor.com/pricing" }
MCP Server → CDP command: Page.navigate { url: "https://example-competitor.com/pricing" }
CDP → Page.loadEventFired event (streamed back)
Agent → MCP invoke: dom_query { selector: ".pricing-plan, [data-plan]" }
Agent → MCP invoke: page_screenshot { fullPage: false }
Agent → MCP invoke: browser_navigate { url: "about:blank" }  # cleanup
```

Each step is a structured tool call with typed inputs. The agent's prompt never mentions `websocket`, `JSON-RPC`, or `port 9222`. The MCP abstraction hides the infrastructure.

### Setting Up Chrome for Remote Debugging

To use CDP, start Chrome with the remote debugging port open:

```bash
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-debug

# Linux
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug

# Windows
chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\tmp\chrome-debug"
```

Or launch a headless instance:

```bash
google-chrome \
  --headless \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-headless
```

Make sure no other Chrome instance is already using port 9222 on your machine.

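
The port check can be scripted before launching Chrome. The sketch below uses only the Python standard library; `port_in_use` is a hypothetical helper name, and the loopback address is an assumption about where Chrome will listen.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if port_in_use(9222):
    print("Port 9222 is taken; stop the other process or pick another port.")
else:
    print("Port 9222 is free.")
```

If the port is taken, any free port works as long as the same number is passed to `--remote-debugging-port` and to the bridge.
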
### The MCP Server: Minimal Implementation

If you want to roll your own MCP-to-CDP bridge (or understand what `browser_mcp` is doing above), here's the core of it:

```python
import itertools
import json
import urllib.request

import websockets


class ChromeDevToolsMCP:
    def __init__(self, debugging_port: int = 9222):
        self.http_url = f"http://localhost:{debugging_port}"
        self._ids = itertools.count(1)  # every CDP command needs a unique id
        self._session_id: str | None = None
        self._ws = None  # WebSocket connection, opened in __aenter__

    async def __aenter__(self):
        # The browser WebSocket URL ends in a per-launch ID, so read it from
        # Chrome's /json/version endpoint (blocking call, kept for brevity).
        with urllib.request.urlopen(f"{self.http_url}/json/version") as resp:
            ws_url = json.load(resp)["webSocketDebuggerUrl"]
        # Screenshots can exceed the default 1 MiB message limit
        self._ws = await websockets.connect(ws_url, max_size=None)
        # Page.* commands need a page session: open a tab and attach to it
        target = await self._send("Target.createTarget", {"url": "about:blank"})
        attached = await self._send(
            "Target.attachToTarget",
            {"targetId": target["targetId"], "flatten": True},
        )
        self._session_id = attached["sessionId"]
        return self

    async def __aexit__(self, *args):
        if self._ws:
            await self._ws.close()

    async def _send(self, method: str, params: dict | None = None) -> dict:
        """Send a CDP command and wait for the matching response."""
        msg_id = next(self._ids)
        message = {"id": msg_id, "method": method, "params": params or {}}
        if self._session_id:
            message["sessionId"] = self._session_id  # route to the attached tab
        await self._ws.send(json.dumps(message))
        while True:
            reply = json.loads(await self._ws.recv())
            if reply.get("id") != msg_id:
                continue  # skip events and replies to other commands
            if "error" in reply:
                raise RuntimeError(reply["error"]["message"])
            return reply["result"]

    def tools(self) -> list[dict]:
        """Return MCP tool definitions for this server."""
        return [
            {
                "name": "browser_navigate",
                "description": "Navigate to a URL",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "url": {"type": "string", "format": "uri"}
                    },
                    "required": ["url"]
                },
                "handler": self._navigate,
            },
            {
                "name": "page_screenshot",
                "description": "Capture a screenshot",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "fullPage": {"type": "boolean", "default": False}
                    }
                },
                "handler": self._screenshot,
            },
        ]

    async def _navigate(self, url: str) -> str:
        resp = await self._send("Page.navigate", {"url": url})
        return f"Navigated. FrameId: {resp.get('frameId')}"

    async def _screenshot(self, fullPage: bool = False) -> str:
        params = {"format": "png"}
        if fullPage:
            # CDP has no fullPage flag; clip to the full content size instead
            metrics = await self._send("Page.getLayoutMetrics")
            size = metrics["cssContentSize"]
            params["captureBeyondViewport"] = True
            params["clip"] = {"x": 0, "y": 0, "width": size["width"],
                              "height": size["height"], "scale": 1}
        resp = await self._send("Page.captureScreenshot", params)
        return f"screenshot:{resp['data']}"  # base64-encoded PNG
```

This is deliberately minimal: it shows the shape of the bridge without robust error handling, multi-tab management, or the full CDP command surface. Production MCP servers (including Molecule AI's built-in browser tools) handle all of that.

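
One piece the class leaves implicit is how an incoming tool call reaches its handler. A minimal sketch of that routing, assuming tool definitions shaped like the ones returned by `tools()` above (a `name`, an `inputSchema`, and an async `handler`): the `dispatch` function, `_fake_navigate`, and `DEMO_TOOLS` are hypothetical stand-ins so the example runs without Chrome.

```python
import asyncio


async def dispatch(tools: list[dict], name: str, arguments: dict):
    """Look up a tool by name, check required arguments, and await its handler."""
    by_name = {tool["name"]: tool for tool in tools}
    if name not in by_name:
        raise KeyError(f"unknown tool: {name}")
    tool = by_name[name]
    required = tool["inputSchema"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing arguments for {name}: {missing}")
    return await tool["handler"](**arguments)


# Stand-in handler so the sketch runs without a browser attached.
async def _fake_navigate(url: str) -> str:
    return f"navigated to {url}"


DEMO_TOOLS = [{
    "name": "browser_navigate",
    "inputSchema": {"type": "object", "required": ["url"]},
    "handler": _fake_navigate,
}]

print(asyncio.run(dispatch(DEMO_TOOLS, "browser_navigate", {"url": "https://example.com"})))
```

Checking required arguments before calling the handler turns a malformed tool call into a clear error instead of a `TypeError` from deep inside the bridge.
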
## Real-World Use Cases

Browser automation via MCP isn't just a demo trick. Here are the production use cases teams are already running:

**Competitive intelligence pipelines.** An agent that visits a competitor's site weekly, extracts pricing and feature data, and writes a diff summary to a Notion page. No Puppeteer scripts to maintain — the agent updates the extraction logic itself when the competitor redesigns.

**AI-assisted data entry.** An agent that receives a spreadsheet row, navigates to a web form, fills it in, and submits. Particularly useful for legacy systems that only have a web UI and no API.

**Automated UI regression testing.** Instead of writing Playwright test scripts that break on every CSS change, describe the expected state in natural language. The agent uses `dom_query` and `page_screenshot` to verify the UI matches your specification.

**Real-time price and availability monitoring.** An agent that polls a retail or ticketing site, captures a screenshot on price change, and sends a Slack alert. Runs on a schedule or triggers from a webhook.

All four of these work with the same MCP toolset — the agent's reasoning layer is identical; only the task description changes.

Compare this to n8n workflows: a human manually wires together a sequence of browser nodes — open tab, wait, click, extract, close. Molecule AI agents *decide* that sequence at runtime. When a competitor's page changes, the agent adapts the extraction strategy itself rather than waiting for a human to redraw the workflow.

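
For the pricing-related use cases above, the step after extraction is usually a plain comparison between two snapshots. The sketch below shows one way that diff could look; `diff_plans` and the plan names and prices are made up for illustration and are not part of any Molecule AI API.

```python
def diff_plans(old: dict[str, float], new: dict[str, float]) -> list[str]:
    """Describe how a plan-name -> monthly-price mapping changed between runs."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            changes.append(f"added {name} at {new[name]:.2f}")
        elif name not in new:
            changes.append(f"removed {name}")
        elif old[name] != new[name]:
            changes.append(f"{name}: {old[name]:.2f} -> {new[name]:.2f}")
    return changes


last_week = {"Starter": 10.0, "Pro": 49.0}
this_week = {"Pro": 59.0, "Team": 99.0}
for line in diff_plans(last_week, this_week):
    print(line)
```

An empty result means nothing changed, which is a natural condition for skipping the screenshot and the alert.
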
## Getting Started with Molecule AI

To use browser automation in a Molecule AI workspace, you connect your own MCP server (such as the `ChromeDevToolsMCP` shown above) using Molecule AI's built-in MCP tool registration. The platform handles the WebSocket lifecycle and tool call routing — you bring the browser logic.

**Configure the MCP server URL in your workspace:**

```bash
# Set your browser MCP server endpoint via the platform API
curl -X PATCH "${PLATFORM_URL}/workspaces/${WORKSPACE_ID}/config" \
  -H "Authorization: Bearer ${WORKSPACE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "mcp_servers": {
      "browser": {
        "type": "streamable_http",
        "url": "http://localhost:9223/mcp"
      }
    }
  }'
```

Or use the Canvas UI: Workspace → Config → MCP Servers → Add browser MCP server.

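
If you would rather script the configuration step in Python, the same PATCH request can be built with the standard library. This is a sketch under the assumption that the endpoint and payload match the `curl` call above; `build_mcp_config_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_mcp_config_request(
    platform_url: str, workspace_id: str, token: str, mcp_url: str
) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that registers a browser MCP server."""
    body = {
        "mcp_servers": {
            "browser": {"type": "streamable_http", "url": mcp_url},
        }
    }
    return urllib.request.Request(
        f"{platform_url}/workspaces/{workspace_id}/config",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


# Placeholder values; substitute your own platform URL, workspace ID, and token.
req = build_mcp_config_request(
    "https://platform.example", "ws-123", "example-token", "http://localhost:9223/mcp"
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` sends it; building the request separately makes it easy to inspect or log before anything goes over the network.
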
**What Molecule AI provides:** WebSocket routing, tool call auth, session lifecycle, and the A2A bridge so your agent sees browser tools as native workspace tools. You bring the CDP bridge (or use the `ChromeDevToolsMCP` example above).

**Compare that to wiring Playwright into LangChain:** you write async wrapper functions, handle `page.goto()` timeouts in the prompt, and debug failures by reading through decorator-stacked chain outputs. With Molecule AI and MCP, the browser is a first-class tool: typed, session-aware, and registered the same way as any other MCP tool.

→ [MCP Server Setup Guide](/docs/guides/mcp-server-setup)
→ [Quickstart: Deploy your first AI agent](/docs/quickstart)

**Try it free:** Molecule AI is open source and self-hostable. Get a workspace running in under 5 minutes.

→ [Get started with molecule-core →](https://git.moleculesai.app/molecule-ai/molecule-core)

---

*Have a browser automation use case you want to see covered? File an issue with the `enhancement` label on the [molecule-core issue tracker](https://git.moleculesai.app/molecule-ai/molecule-core/issues).*