Agent Browser

Add a real browser to a workspace so agents can use it through MCP, with live human handoff for logins, MFA, and CAPTCHAs.

Some work has no API. Internal tools, legacy portals, and dashboards behind a login sometimes require a real browser. Agent Browser adds a hosted Chrome session to a workspace so agents can navigate, read, click, type, download files, and ask a human to take over the same live session when a password, MFA challenge, or CAPTCHA appears.

For most teams, setup is simple: add agent-browser as a workspace integration, then create an MCP server for that workspace. The agent receives browser tools the same way it receives Slack, Gmail, Filesystem, State KV, Sandbox, and other workspace integrations.

It is not OAuth, and it is not a CAPTCHA bypass. The human signs in directly inside the browser; the agent never receives the password or MFA secret.

By default, browser profile persistence is scoped to the current end user. Pass your user's external ID as endUserId when agents should keep using that user's saved sign-in state. If a workflow is intentionally shared by the workspace or by a custom namespace, set settings.persistence on the workspace integration.

Use agent-browser for deterministic browser tools with no connection auth. Add agent-browser-ai only when you want natural-language browser actions backed by your own LLM provider key.

Primary Path: Workspace Integration + MCP

1. Add Agent Browser to the workspace

Create a workspace integration with a stable alias such as browser. No connection is required for the deterministic browser tools.

2. Create an MCP server for the workspace

Use Code Mode when you want compact search, read, and execute meta-tools. Use Tool Mode when the MCP client should see every browser action as an individual tool.

3. Let the agent use the browser

The first browser action starts a hosted session automatically. Later actions reuse the same live session for that workspace and end user when one is available.

4. Hand control to a human when needed

When a login, MFA challenge, or CAPTCHA blocks progress, the agent calls request_human. Weavz returns a viewer link so a person can control the same browser and then hand it back with resume.

Add It To A Workspace And Create MCP

bash
curl -X POST https://api.weavz.io/api/v1/workspaces/YOUR_WORKSPACE_ID/integrations \
  -H "Authorization: Bearer wvz_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "integrationName": "agent-browser",
    "alias": "browser",
    "displayName": "Agent Browser",
    "settings": { "persistence": { "scope": "end_user" } }
  }'
 
curl -X POST https://api.weavz.io/api/v1/mcp/servers \
  -H "Authorization: Bearer wvz_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Browser Agent Workspace",
    "workspaceId": "YOUR_WORKSPACE_ID",
    "mode": "CODE",
    "authMode": "oauth",
    "endUserAccess": "restricted"
  }'

How Sessions Are Handled

Agent Browser uses browser sessions behind the scenes, but normal MCP and action callers do not need to create sessions manually.

| Behavior | What happens |
| --- | --- |
| First browser action | Starts a hosted browser session automatically when no live session is available |
| Later browser actions | Reuse the active session for the same workspace and end user |
| sessionId omitted | Weavz uses the auto-managed session for that caller context |
| sessionId provided | The action targets that explicit session after access checks |
| request_human | Mints a viewer link and blocks agent control while the human is driving |
| resume | Returns control to the agent |
| end_session | Ends the live session and snapshots the profile to the configured persistence scope |

Pass the end user's externalId as endUserId when you want per-user browser identity. End-user scoped sessions can reuse saved sign-in state across runs. With the workspace and external scopes, every caller in the configured scope shares the same live browser session and the same saved browser profile.
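
The session rules above can be illustrated with a small request builder. This is a minimal sketch, not part of the Weavz SDK: buildBrowserAction is a hypothetical helper, the top-level field names are taken from the curl examples in this document, and placing sessionId inside input is an assumption based on the Code Mode example, where actions receive sessionId as an input field.

```javascript
// Hypothetical helper: builds the JSON body for POST /api/v1/actions/execute.
// Assumption: an explicit sessionId travels inside `input`.
function buildBrowserAction({ workspaceId, actionName, input = {}, endUserId, sessionId }) {
  const body = {
    workspaceId,
    integrationName: 'agent-browser',
    integrationAlias: 'browser',
    actionName,
    input: { ...input },
  }
  // Per-user browser identity: pass the end user's external ID.
  if (endUserId) body.endUserId = endUserId
  // Leave sessionId out to use the auto-managed session; set it to target one session.
  if (sessionId) body.input.sessionId = sessionId
  return body
}

// Auto-managed session for user_123.
const autoManaged = buildBrowserAction({
  workspaceId: 'YOUR_WORKSPACE_ID',
  actionName: 'navigate',
  input: { url: 'https://app.acme.com' },
  endUserId: 'user_123',
})

// Explicit session; 'SESSION_ID' is a placeholder value.
const explicit = buildBrowserAction({
  workspaceId: 'YOUR_WORKSPACE_ID',
  actionName: 'snapshot',
  endUserId: 'user_123',
  sessionId: 'SESSION_ID',
})
```

In most workflows the first form is enough; the second is only useful when a later run has to pick up a specific live session.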

Batched Code Mode Workflows

When Agent Browser is exposed through a Code Mode MCP server, agents should batch related browser operations inside one weavz_execute call. A single run can navigate, inspect the page, click or type by snapshot ref, take a screenshot, and return the observations the agent needs. Batching avoids a separate execute call for every browser action, which is faster and more reliable.

javascript
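// Start a session and keep its ID so later runs can target the same browser.
const session = await weavz.browser.start_session({ headless: true })
await weavz.browser.navigate({
  sessionId: session.sessionId,
  url: 'https://app.example.com',
})

// Capture the page structure, including element refs.
const snapshot = await weavz.browser.snapshot({ sessionId: session.sessionId })
// The #status element may be missing; fall back to null instead of failing the run.
const status = await weavz.browser.read_text({
  sessionId: session.sessionId,
  target: '#status',
}).catch(() => null)
// Quality 55 is slightly below the default of 60, which keeps the image small.
const screenshot = await weavz.browser.screenshot({
  sessionId: session.sessionId,
  quality: 55,
})

// Return only what the agent needs: a truncated snapshot and image metadata.
return {
  sessionId: session.sessionId,
  snapshot: String(snapshot.snapshot).slice(0, 3000),
  status,
  screenshot: {
    mimeType: screenshot.mimeType,
    width: screenshot.width,
    height: screenshot.height,
  },
}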

Use sessionId across separate runs only when the workflow needs incremental state, human handoff, or a later follow-up. For unfamiliar pages, the most robust loop is snapshot, choose an element ref such as e5, then call click, type, read_text, or screenshot with that ref.
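
The snapshot-then-ref loop can be sketched as a small lookup step. The snapshot text format is not specified in this document, so the sketch assumes elements appear on lines such as `- button "Download PDF" [ref=e5]`; findRef is a hypothetical helper and the sample text is made up for illustration.

```javascript
// Assumption: each element line in the snapshot carries a marker like [ref=e5].
function findRef(snapshotText, label) {
  for (const line of String(snapshotText).split('\n')) {
    if (!line.includes(label)) continue
    const match = line.match(/\[ref=(e\d+)\]/)
    if (match) return match[1]
  }
  // No line mentions the label, or the matching line has no ref.
  return null
}

// Made-up snapshot text in the assumed format.
const sampleSnapshot = [
  '- heading "Invoices" [ref=e2]',
  '- button "Download PDF" [ref=e5]',
].join('\n')

const downloadRef = findRef(sampleSnapshot, 'Download PDF')
```

Inside a weavz_execute run, the returned ref would then be handed to click, type, read_text, or screenshot. Which input field carries the ref is not shown here, so check the tool schema before relying on a particular name.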

Persistence Scopes

Agent Browser and Agent Browser AI use the same settings.persistence object as Filesystem and State KV:

| Scope | Use when |
| --- | --- |
| end_user | Each of your users should keep a separate browser identity. This is the default and requires endUserId on calls. |
| workspace | The browser identity is intentionally shared by the workspace. |
| external | You want a custom namespace such as a tenant, project, or account key. Set settings.persistence.externalId. |

json
{
  "integrationName": "agent-browser",
  "alias": "browser",
  "settings": {
    "persistence": {
      "scope": "workspace"
    }
  }
}
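
The three scopes can be captured in a small builder that produces the settings object for a workspace integration. This is a sketch under stated assumptions: persistenceSettings is a hypothetical helper, and the client-side validation reflects only what the table above says (external needs an externalId). Weavz may apply its own rules.

```javascript
// Hypothetical helper: returns the `settings` object for a workspace integration.
function persistenceSettings(scope, externalId) {
  const scopes = ['end_user', 'workspace', 'external']
  if (!scopes.includes(scope)) {
    throw new Error(`Unknown persistence scope: ${scope}`)
  }
  const persistence = { scope }
  if (scope === 'external') {
    // The external scope is a custom namespace, so it needs a key.
    if (!externalId) throw new Error('The external scope needs an externalId')
    persistence.externalId = externalId
  }
  return { persistence }
}

// 'tenant_acme' is an illustrative namespace key.
const tenantScoped = persistenceSettings('external', 'tenant_acme')
const perUser = persistenceSettings('end_user')
```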

Deterministic Browser Actions

Use these actions directly through REST or the SDK when you are not going through MCP, or when you want to test the integration before connecting an MCP client.

bash
curl -X POST https://api.weavz.io/api/v1/actions/execute \
  -H "Authorization: Bearer wvz_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "workspaceId": "YOUR_WORKSPACE_ID",
    "integrationName": "agent-browser",
    "integrationAlias": "browser",
    "actionName": "navigate",
    "endUserId": "user_123",
    "input": { "url": "https://app.acme.com" }
  }'

The deterministic action set includes snapshot, navigate, navigate_back, click, type, fill_form, select_option, hover, drag, press_key, file_upload, evaluate, read_text, read_html, screenshot, wait_for, handle_dialog, tabs, request_human, resume, start_session, session_status, and end_session.
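
The same REST call can be wrapped in a small function so each action becomes a one-liner. This is a minimal sketch rather than the official SDK: executeBrowserAction is a hypothetical name, the endpoint and body fields mirror the curl example above, and the shape of the response body is not assumed beyond being JSON. The fetch implementation is injectable so the wrapper can be exercised without network access.

```javascript
// Hypothetical wrapper around POST /api/v1/actions/execute.
// `fetchImpl` defaults to the global fetch (available in Node 18+ and browsers).
async function executeBrowserAction(
  { apiKey, workspaceId, endUserId, actionName, input = {} },
  fetchImpl = fetch,
) {
  const response = await fetchImpl('https://api.weavz.io/api/v1/actions/execute', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      workspaceId,
      integrationName: 'agent-browser',
      integrationAlias: 'browser',
      actionName,
      endUserId,
      input,
    }),
  })
  if (!response.ok) {
    throw new Error(`Action ${actionName} failed with HTTP ${response.status}`)
  }
  // The response shape is not documented here, so return the parsed JSON as-is.
  return response.json()
}
```

A call such as `executeBrowserAction({ apiKey, workspaceId, endUserId: 'user_123', actionName: 'navigate', input: { url: 'https://app.acme.com' } })` sends the same request as the curl example.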

Screenshots

screenshot returns a browser image envelope:

json
{
  "mimeType": "image/jpeg",
  "width": 1280,
  "height": 720,
  "imageContent": "base64-encoded JPEG",
  "url": "hosted screenshot URL"
}

MCP tool calls also include an MCP image content item, so agents can inspect the screenshot directly without fetching the hosted URL. The URL is included when Filesystem can store the image for human viewing or downstream download. By default, screenshots are JPEG at quality 60 and scaled for agent use; set fullResolution only when the agent or your backend needs the original device-scale image.
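
When a backend receives the envelope through REST, the imageContent field has to be decoded before the image can be stored or inspected. The sketch below assumes Node.js (it uses the built-in Buffer) and an envelope shaped like the example above; decodeScreenshot is a hypothetical helper, and the sample envelope contains only a three-byte JPEG header, not a real image.

```javascript
// Decodes the base64 imageContent field into raw bytes (Node.js Buffer).
function decodeScreenshot(envelope) {
  const bytes = Buffer.from(envelope.imageContent, 'base64')
  // JPEG data starts with the bytes FF D8; use that as a cheap sanity check.
  const looksLikeJpeg = bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xd8
  return { bytes, looksLikeJpeg, width: envelope.width, height: envelope.height }
}

// Illustrative envelope: only the first bytes of a JPEG, for demonstration.
const sampleEnvelope = {
  mimeType: 'image/jpeg',
  width: 1280,
  height: 720,
  imageContent: Buffer.from([0xff, 0xd8, 0xff]).toString('base64'),
}

const decoded = decodeScreenshot(sampleEnvelope)
```

The resulting bytes can be written to disk with `fs.writeFileSync('page.jpg', decoded.bytes)` or passed to any image library.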

Optional LLM Driver

Natural-language browser actions live in a separate integration: agent-browser-ai. This integration requires a connection because it uses your LLM provider key. If you do not add agent-browser-ai, the auth-free agent-browser tools still work.

When configuring Agent Browser AI, choose the provider and model from dropdowns and store the API key as a secret connection value. The deterministic agent-browser integration does not require auth.

agent-browser-ai provides:

  • act - complete a natural-language browser task by looping over snapshots and browser actions.
  • extract - extract structured data from the current page.
  • observe - identify relevant page elements without taking action.

bash
curl -X POST https://api.weavz.io/api/v1/actions/execute \
  -H "Authorization: Bearer wvz_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "integrationName": "agent-browser-ai",
    "actionName": "act",
    "workspaceId": "YOUR_WORKSPACE_ID",
    "endUserId": "user_123",
    "input": { "instruction": "Open the latest invoice and download the PDF" }
  }'

Human Handoff

Use request_human when the browser reaches a step the agent should not complete.

bash
curl -X POST https://api.weavz.io/api/v1/actions/execute \
  -H "Authorization: Bearer wvz_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "integrationName": "agent-browser",
    "actionName": "request_human",
    "workspaceId": "YOUR_WORKSPACE_ID",
    "integrationAlias": "browser",
    "endUserId": "user_123",
    "input": { "reason": "Login or MFA required" }
  }'
 
curl -X POST https://api.weavz.io/api/v1/actions/execute \
  -H "Authorization: Bearer wvz_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "integrationName": "agent-browser",
    "actionName": "resume",
    "workspaceId": "YOUR_WORKSPACE_ID",
    "integrationAlias": "browser",
    "endUserId": "user_123",
    "input": {}
  }'

While the session is in the user state, the viewer's clicks, typing, scrolling, paste, and navigation control the live page. Agent browser actions are blocked until control returns to the agent.
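
In application code, the two calls usually bracket a wait for the person to finish. The following is a sketch of that flow under stated assumptions: execute stands in for however you invoke actions (REST, SDK, or MCP), notify and waitForHuman are hypothetical callbacks you would supply, and the request_human result is passed through untouched because its exact fields are not described here.

```javascript
// Hypothetical orchestration of a human handoff.
//   execute(actionName, input): invokes an agent-browser action
//   notify(handoff): delivers the request_human result (with the viewer link) to a person
//   waitForHuman(): resolves once the person reports that they are done
async function handOffToHuman(execute, notify, waitForHuman, reason) {
  // Agent actions are blocked from here until resume is called.
  const handoff = await execute('request_human', { reason })
  await notify(handoff)
  await waitForHuman()
  // Return control to the agent.
  return execute('resume', {})
}
```

How the person signals completion is up to your application, for example a button in your UI or a reply in chat.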

Restrict Browsing

Pass allowedHosts to start_session or the first browser action in a run when a workflow should stay inside a known set of domains.

json
{
  "allowedHosts": ["app.acme.com", "*.acme-cdn.com"]
}

Omit allowedHosts for unrestricted browsing.
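
Weavz enforces allowedHosts on the hosted session, and this document does not spell out the exact matching rules. If you want to pre-check URLs in your own code before sending them, one plausible reading of the patterns is sketched below. The semantics are an assumption: a plain entry matches that host exactly, and a `*.` entry matches subdomains only. isHostAllowed is a hypothetical helper.

```javascript
// Assumed semantics, not confirmed by Weavz:
//   'app.acme.com'   matches only app.acme.com
//   '*.acme-cdn.com' matches img.acme-cdn.com but not acme-cdn.com itself
function isHostAllowed(url, allowedHosts) {
  const host = new URL(url).hostname.toLowerCase()
  return allowedHosts.some((pattern) => {
    const p = pattern.toLowerCase()
    // For wildcard entries, compare against the suffix that includes the leading dot.
    if (p.startsWith('*.')) return host.endsWith(p.slice(1))
    return host === p
  })
}

const allowedHosts = ['app.acme.com', '*.acme-cdn.com']
const okApp = isHostAllowed('https://app.acme.com/invoices', allowedHosts)
const okCdn = isHostAllowed('https://img.acme-cdn.com/logo.png', allowedHosts)
const blocked = isHostAllowed('https://example.org/', allowedHosts)
```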

Session Lifecycle

Agent Browser manages the hosted browser session behind the workspace integration. The first browser action starts a session for the workspace and end user, later actions reuse it, and end_session releases it when the workflow is finished.

When an action includes endUserId, the value must be the external ID of an existing end user in that workspace. This scopes browser identity and saved sign-in state to that user. Use request_human to mint a fresh human viewer link for login, MFA, CAPTCHA, or payment steps, then call resume after the person completes the step.
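
Because end_session releases the browser and snapshots the profile, it helps to make sure it runs even when a step fails. A minimal sketch using try/finally follows. withBrowserSession is a hypothetical helper, and execute again stands in for your own way of invoking actions. How end_session behaves when no session was ever started is not covered here, so treat that case as unverified.

```javascript
// Hypothetical helper: runs `work`, then always ends the session.
//   execute(actionName, input): invokes an agent-browser action
async function withBrowserSession(execute, work) {
  try {
    return await work(execute)
  } finally {
    // Runs on success and on failure, so the live session is always released.
    await execute('end_session', {})
  }
}
```

A workflow would pass its browser steps as the work callback, for example `withBrowserSession(execute, async (run) => run('navigate', { url: 'https://app.acme.com' }))`.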

See Also