Feature Guide

AI controls a browser.
You watch it happen.

The Web Agent puts an AI in control of a real browser. It navigates, clicks, fills forms, extracts data, and handles logins — on any website, with or without an API. Every step streams directly into your chat as it happens.


The Core Idea

Not every system has an API. A browser works on everything.

Most integrations on this platform work via APIs — structured endpoints that return data cleanly. But plenty of useful systems don't have APIs, have bad ones, or lock functionality behind a UI that was never designed for automation. Government portals, legacy software, web-only admin panels.

The Web Agent bridges that gap. It uses an AI model to drive a real browser — the same way a human would, but faster, without fatigue, and with instructions in plain English.

Capabilities

Give it a task in plain English.

Navigate

Open any URL, follow links, handle redirects, wait for page load, deal with popups and modals.

Fill forms

Type into text fields, select dropdowns, tick checkboxes, upload files, submit forms.

Click and interact

Click buttons, tabs, toggles, accordion rows, table actions — anything visible on screen.

Extract data

Read tables, lists, cards, dashboards. Return structured data back to the AI so it can act on it, save it, or summarise it.

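Handle authentication

Log in to sites, keep the session alive across tasks in the same chat thread, and sign in to platform pages automatically.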

Multi-step workflows

Chain sequences of actions across multiple pages or sites in a single task. The agent remembers context across steps.

Live in Your Chat

Watch every step as it happens.

Most background tools are silent — you wait, then you see results. The Web Agent is different. Because browsing tasks can take minutes and the steps matter, it streams progress directly into the chat in real time.

Task dispatched

The AI sends the task to the Web Agent. The browser opens in the background. Control returns to you immediately — you don't have to wait.

Step-by-step updates

As the agent works, each step appears in a chat bubble below the AI's response: what it did, what it saw, what it plans to do next. Text streams in live.

Screenshots at each step

A screenshot of the browser appears after each significant action. You see exactly what the agent is looking at. Identical frames (e.g. during a form fill where nothing visible changed) are skipped automatically to keep the gallery clean.
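The frame-skipping described above can be sketched as a comparison against the last frame that was kept. This is a minimal illustration, not the platform's implementation: the guide does not say how "identical" is determined, so the byte-exact SHA-256 comparison and the name `dedupe_frames` are assumptions.

```python
from __future__ import annotations

import hashlib


def dedupe_frames(frames: list[bytes]) -> list[bytes]:
    """Drop any frame whose bytes match the previously kept frame.

    Assumption: "identical" means byte-for-byte equal. A real system
    might compare pixels or use a perceptual hash instead.
    """
    kept: list[bytes] = []
    last_digest: str | None = None
    for frame in frames:
        digest = hashlib.sha256(frame).hexdigest()
        if digest != last_digest:
            kept.append(frame)
            last_digest = digest
    return kept
```

Only consecutive duplicates are dropped, so a page that returns to an earlier state still produces a new screenshot in the gallery.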

Task complete

When the agent finishes, the bubble is finalised. All screenshots are stored on the bubble's row in the chat history, so the bubble renders cleanly if you reload the chat later.

Authentication

Sessions persist. Platform pages log themselves in.

Persistent sessions per chat thread

The browser runs in a remote session (via Browserbase) that's tied to your chat thread. Once the agent logs in to a site during a task, that login persists for subsequent tasks in the same thread because session cookies are preserved. You don't need to include login instructions every time.
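One way to picture per-thread persistence is a registry that hands back the same browser context whenever the same thread asks for one. The sketch below is purely illustrative: `BrowserContext` and `SessionRegistry` are hypothetical names, the store is in-memory, and nothing here calls the Browserbase API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BrowserContext:
    """Hypothetical stand-in for a remote browser context; holds cookies only."""
    cookies: dict[str, dict[str, str]] = field(default_factory=dict)  # domain -> cookie jar


class SessionRegistry:
    """Maps each chat thread to one long-lived browser context."""

    def __init__(self) -> None:
        self._by_thread: dict[str, BrowserContext] = {}

    def context_for(self, thread_id: str) -> BrowserContext:
        # Reuse the thread's existing context so earlier logins carry over.
        if thread_id not in self._by_thread:
            self._by_thread[thread_id] = BrowserContext()
        return self._by_thread[thread_id]
```

A cookie set during one task is visible to the next task in the same thread, while a different thread starts with an empty context.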

Platform pages auto-login

When the agent needs to visit any page on bitcreative.com.au, the platform automatically generates a short-lived signed token and prepends an auto-login step to the task. The agent visits /auto-login?token=..., the token is verified, and the session is established — exactly as if you had logged in yourself.

The token expires in 5 minutes and is single-use. Passwords are never exposed.
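A short-lived, single-use signed token of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the platform's scheme: the guide does not specify the token format, so the HMAC-SHA256 signature, the payload fields, the `SECRET` value, and the in-memory nonce set are all hypothetical.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import secrets
import time

SECRET = b"demo-secret"   # hypothetical; a real key would come from secure config
TTL_SECONDS = 5 * 60      # the 5-minute expiry described above
_used_nonces: set[str] = set()  # in-memory stand-in for a server-side store


def issue_token(user_id: str, now: float | None = None) -> str:
    """Return "<payload>.<signature>" for the given user."""
    now = time.time() if now is None else now
    claims = {"uid": user_id, "exp": now + TTL_SECONDS, "nonce": secrets.token_hex(8)}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_token(token: str, now: float | None = None) -> str | None:
    """Return the user id if the token is valid, unexpired, and unused."""
    now = time.time() if now is None else now
    body, _, sig = token.partition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # signature mismatch: tampered or forged
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    if now > claims["exp"] or claims["nonce"] in _used_nonces:
        return None  # expired, or already redeemed once
    _used_nonces.add(claims["nonce"])
    return claims["uid"]
```

Because the token carries only a user id, an expiry, and a nonce, the password never needs to appear in the task instructions or in the URL.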

Third-party sites

For sites outside the platform, include the login credentials in your task instructions. Once logged in during a task, the session cookie keeps the agent authenticated for all follow-up tasks in the same chat thread.

Acting on Results

The AI can automatically continue after the browser finishes.

By default, the Web Agent is fire-and-forget — it does its job and you see the results in the chat. But when you need the AI to do something with those results (save them to a database, create tasks, send an email), it can trigger itself automatically once the browser task is done.

Fire-and-forget (default)

Browser runs, you see results

The AI dispatches the task, you watch the steps, and when it's done the results are in the chat. You're in control of what happens next.

Example: “Go to this site and tell me the current pricing.”

Auto-continue

Browser runs, AI acts on results

When the task implies follow-up work, the platform automatically re-triggers the AI once the browser finishes. The AI gets the full step log and all screenshots, then continues — without you sending another message.

Example: “Scrape the table from this dashboard and save it to the database.”

How the AI decides which mode

When you describe the task, the AI infers whether follow-up work is implied. Tasks that end with “and then save it”, “then create tasks from it”, or “then email me a summary” trigger auto-continue. Tasks that are purely observational don't.
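The distinction between the two modes can be illustrated with a crude keyword check. This is not how the platform decides: the guide says the AI infers the mode from the task description, and the regular expression and the name `pick_mode` below are hypothetical stand-ins for that inference.

```python
import re

# Hypothetical heuristic: a connective followed by a verb that acts on the results.
FOLLOW_UP = re.compile(r"\b(?:and|then)\s+(?:save|create|email|send)\b", re.IGNORECASE)


def pick_mode(task: str) -> str:
    """Return "auto-continue" if the task wording implies follow-up work."""
    return "auto-continue" if FOLLOW_UP.search(task) else "fire-and-forget"
```

A keyword list like this misses paraphrases and can misfire on unrelated wording, which is one reason to leave the decision to a model rather than a pattern.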

Cancellation

Stop a task that's going the wrong way.

While a Web Agent task is running, a Stop button appears in the chat UI. Clicking it sends a cancellation signal to the browser session. The agent stops at the end of its current step and the task is marked cancelled.

The partial results (all steps completed so far, including screenshots) remain in the chat bubble. You can review what it did before you stopped it.
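Stopping at the end of the current step is a form of cooperative cancellation: the stop signal sets a flag, and the agent loop checks it only between steps. The sketch below shows that pattern under assumptions; `run_steps` and the use of `threading.Event` are illustrative choices, not details taken from the platform.

```python
from __future__ import annotations

import threading
from typing import Callable


def run_steps(steps: list[Callable[[], str]], stop: threading.Event) -> tuple[str, list[str]]:
    """Run steps in order, checking the stop flag only between steps.

    A step that has started always runs to completion, and everything
    completed before cancellation is returned as partial results.
    """
    completed: list[str] = []
    for step in steps:
        if stop.is_set():
            return "cancelled", completed
        completed.append(step())
    return "completed", completed
```

In this sketch, a stop request that arrives during the final step has no effect, because there is no later step left to skip.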

When to Use It

The right tool for the right job.

Good fit

✓ Extracting data from a site with no API
✓ Filing with or checking a government portal
✓ Migrating data between two web apps by copy-pasting between screens
✓ Verifying what a page currently looks like
✓ Automating a repetitive multi-step UI workflow
✓ Filling out a complex form that can't be pre-populated via API

Use an API instead

→ Syncing records from a service that has a REST API (use a credential + automation)
→ Sending emails (use Brevo)
→ Reading/writing your own Supabase data
→ Anything that needs to run on a reliable cron schedule (browser tasks are slower and less deterministic than API calls)

A note on reliability

Browser automation is inherently less predictable than an API call. A site can change its layout, load slowly, show a CAPTCHA, or render differently in a headless browser. The agent tries to recover, but a task that worked yesterday can still fail today. This is normal. For anything that needs to run every hour without intervention, build an API integration instead.

Under the Hood

How it actually works.

Browser

Runs in Browserbase — a managed remote browser service. Each chat thread gets its own persistent browser context so sessions survive across multiple tasks.

AI model

Gemini 2.5 Flash drives the browser agent — it evaluates each screenshot, decides the next action, and explains its reasoning. This is separate from the main chat model.

Streaming

Step text and screenshots are written directly to the chat history database. The chat UI's real-time subscription picks them up instantly, with no polling or refresh needed.
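The write-then-notify flow can be modelled with a tiny in-memory store that pushes each new row to its subscribers. This is only a conceptual sketch: `ChatHistory` is a hypothetical class, and it stands in for a real database with a real-time subscription rather than showing any actual client API.

```python
from __future__ import annotations

from typing import Callable


class ChatHistory:
    """In-memory stand-in for a chat history table with push notifications."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        # The chat UI would register a callback like this once, on load.
        self._subscribers.append(callback)

    def insert(self, row: dict) -> None:
        # Writing a row immediately notifies every subscriber: no polling.
        self.rows.append(row)
        for callback in self._subscribers:
            callback(row)
```

The agent only ever writes rows; the UI learns about them through the subscription, so neither side needs to know about the other.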

Max steps

The default limit is 50 steps per task. Each step is one browser action, such as a click, a navigation, or a form fill. Complex tasks can be broken into multiple calls in the same session.
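A step budget like this is typically enforced by a bounded loop around the agent's decide-and-act cycle. The sketch below assumes a hypothetical `next_action` callback that returns `None` when the task is finished. The guide does not say what the platform does when the limit is reached, so the `"step_limit_reached"` status is an assumption.

```python
from __future__ import annotations

from typing import Callable, Optional

MAX_STEPS = 50  # the default described above


def run_with_budget(
    next_action: Callable[[int], Optional[str]],
    max_steps: int = MAX_STEPS,
) -> tuple[str, int]:
    """Call next_action until it returns None or the step budget runs out."""
    steps_taken = 0
    while steps_taken < max_steps:
        action = next_action(steps_taken)
        if action is None:
            return "completed", steps_taken
        steps_taken += 1  # one browser action counts as one step
    return "step_limit_reached", steps_taken
```

A task that would exceed the budget can be split into several smaller calls, each with its own budget, while the shared session keeps logins and page state between them.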
