AI controls a browser.
You watch it happen.
The Web Agent puts an AI in control of a real browser. It navigates, clicks, fills forms, extracts data, and handles logins — on any website, with or without an API. Every step streams directly into your chat as it happens.
The Core Idea
Not every system has an API. A browser works on everything.
Most integrations on this platform work via APIs — structured endpoints that return data cleanly. But plenty of useful systems don't have APIs, have bad ones, or lock functionality behind a UI that was never designed for automation. Government portals, legacy software, web-only admin panels.
The Web Agent bridges that gap. It uses an AI model to drive a real browser — the same way a human would, but faster, without fatigue, and with instructions in plain English.
Capabilities
Give it a task in plain English.
Navigate
Open any URL, follow links, handle redirects, wait for page load, deal with popups and modals.
Fill forms
Type into text fields, select dropdowns, tick checkboxes, upload files, submit forms.
Click and interact
Click buttons, tabs, toggles, accordion rows, table actions — anything visible on screen.
Extract data
Read tables, lists, cards, dashboards. Return structured data back to the AI so it can act on it, save it, or summarise it.
Handle authentication
Log in with credentials. Platform pages (on bitcreative.com.au) are logged in automatically with no password required.
Multi-step workflows
Chain sequences of actions across multiple pages or sites in a single task. The agent remembers context across steps.
Live in Your Chat
Watch every step as it happens.
Most background tools are silent — you wait, then you see results. The Web Agent is different. Because browsing tasks can take minutes and the steps matter, it streams progress directly into the chat in real time.
Task dispatched
The AI sends the task to the Web Agent. The browser opens in the background. Control returns to you immediately — you don't have to wait.
Step-by-step updates
As the agent works, each step appears in a chat bubble below the AI's response: what it did, what it saw, what it plans to do next. Text streams in live.
Screenshots at each step
A screenshot of the browser appears after each significant action. You see exactly what the agent is looking at. Identical frames (e.g. during a form fill where nothing visible changed) are skipped automatically to keep the gallery clean.
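One way to picture the frame skipping is a hash comparison between consecutive screenshots. The sketch below is an assumption about how such deduplication could work, not the platform's actual code. `dedupe_frames` is a hypothetical name, and this version only catches frames that are byte-for-byte identical.

```python
import hashlib
from typing import Iterable, List


def dedupe_frames(frames: Iterable[bytes]) -> List[bytes]:
    """Keep a frame only if it differs from the frame kept just before it."""
    kept: List[bytes] = []
    last_digest = None
    for frame in frames:
        # Hash the raw image bytes; equal digests mean identical frames.
        digest = hashlib.sha256(frame).hexdigest()
        if digest != last_digest:
            kept.append(frame)
            last_digest = digest
    return kept
```

A frame that reappears later (for example, returning to an earlier page) is kept, because only neighbouring frames are compared.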
Task complete
When the agent finishes, the bubble is finalised. All screenshots are stored on the message's row in the chat history, so the bubble renders cleanly if you reload the chat later.
Authentication
Sessions persist. Platform pages log themselves in.
Persistent sessions per chat thread
The browser runs in a remote session (via Browserbase) that's tied to your chat thread. Once the agent logs into a site during a task, that login persists for subsequent tasks in the same thread, because session cookies are preserved. You don't need to include login instructions every time.
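The thread-to-session relationship can be thought of as a lookup table: the first task in a thread creates a browser context, and later tasks reuse it. The sketch below is illustrative only. `SessionRegistry` and the `ctx-` identifiers are hypothetical, and it does not call Browserbase; a real implementation would create the context through that service's API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SessionRegistry:
    """Maps a chat thread ID to one long-lived browser context ID (hypothetical)."""

    _contexts: Dict[str, str] = field(default_factory=dict)

    def context_for(self, thread_id: str) -> str:
        # Reuse the thread's existing context, or create one on first use.
        if thread_id not in self._contexts:
            self._contexts[thread_id] = f"ctx-{uuid.uuid4().hex}"
        return self._contexts[thread_id]
```

Two tasks in the same thread get the same context ID, which is why cookies set by the first task are still present for the second.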
Platform pages auto-login
When the agent needs to visit any page on bitcreative.com.au, the platform automatically generates a short-lived signed token and prepends an auto-login step to the task. The agent visits /auto-login?token=..., the token is verified, and the session is established, exactly as if you had logged in yourself.
The token expires in 5 minutes and is single-use. Passwords are never exposed.
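A short-lived, single-use signed token is commonly built from an HMAC over a payload that carries an expiry time and a random nonce. The sketch below shows that general technique under stated assumptions; it is not the platform's actual token format. `issue_token`, `verify_token`, the payload layout, and the in-memory nonce set are all hypothetical, and the user ID is assumed to contain no dots.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional, Set

SECRET = secrets.token_bytes(32)  # hypothetical server-side signing key
TTL_SECONDS = 5 * 60              # five-minute lifetime, as described above
_used_nonces: Set[str] = set()    # in-memory single-use tracking (sketch only)


def _sign(payload: str) -> str:
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac).decode()


def issue_token(user_id: str, now: Optional[float] = None) -> str:
    """Build 'user.expiry.nonce.signature' for a user ID without dots."""
    issued = now if now is not None else time.time()
    expires = int(issued) + TTL_SECONDS
    nonce = secrets.token_urlsafe(16)
    payload = f"{user_id}.{expires}.{nonce}"
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user ID if the token is genuine, unexpired and unused."""
    try:
        payload, signature = token.rsplit(".", 1)
        user_id, expires, nonce = payload.split(".")
        expires_at = int(expires)
    except ValueError:
        return None  # malformed token
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None
    current = now if now is not None else time.time()
    if current > expires_at or nonce in _used_nonces:
        return None
    _used_nonces.add(nonce)  # burn the nonce so the token cannot be replayed
    return user_id
```

Because only a signed payload travels in the URL, the password never appears in the task instructions or the browser.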
Third-party sites
For sites outside the platform, include the login credentials in your task instructions. Once logged in during a task, the session cookie keeps the agent authenticated for all follow-up tasks in the same chat thread.
Acting on Results
The AI can automatically continue after the browser finishes.
By default, the Web Agent is fire-and-forget — it does its job and you see the results in the chat. But when you need the AI to do something with those results (save them to a database, create tasks, send an email), it can trigger itself automatically once the browser task is done.
Fire-and-forget (default)
Browser runs, you see results
The AI dispatches the task, you watch the steps, and when it's done the results are in the chat. You're in control of what happens next.
Example: “Go to this site and tell me the current pricing.”
Auto-continue
Browser runs, AI acts on results
When the task implies follow-up work, the platform automatically re-triggers the AI once the browser finishes. The AI gets the full step log and all screenshots, then continues — without you sending another message.
Example: “Scrape the table from this dashboard and save it to the database.”
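The difference between the two modes comes down to whether anything is called when the browser task ends. The sketch below illustrates that idea with a completion callback. Every name in it (`TaskResult`, `run_browser_task`, `on_complete`) is hypothetical, and the real platform decides the mode from your instructions rather than from a flag you set.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TaskResult:
    steps: List[str] = field(default_factory=list)
    screenshots: List[bytes] = field(default_factory=list)


def run_browser_task(
    run: Callable[[], TaskResult],
    auto_continue: bool = False,
    on_complete: Optional[Callable[[TaskResult], None]] = None,
) -> TaskResult:
    """Run the task; in auto-continue mode, hand the result to a follow-up."""
    result = run()
    if auto_continue and on_complete is not None:
        # Auto-continue: the follow-up receives the full step log and screenshots.
        on_complete(result)
    # Fire-and-forget: the result is simply returned for display.
    return result
```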
How the AI decides which mode
When you describe the task, the AI infers whether follow-up work is implied. Tasks that end with “and then save it”, “then create tasks from it”, or “then email me a summary” trigger auto-continue. Tasks that are purely observational don't.
Cancellation
Stop a task that's going the wrong way.
While a Web Agent task is running, a Stop button appears in the chat UI. Clicking it sends a cancellation signal to the browser session. The agent stops at the end of its current step and the task is marked cancelled.
The partial results (all steps completed so far, including screenshots) remain in the chat bubble. You can review what it did before you stopped it.
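Stopping at the end of the current step, rather than mid-action, is the behaviour you get when a cancellation flag is checked between steps. The sketch below shows that pattern; it is an illustration with hypothetical names (`run_steps`, the `cancel` event), not the agent's real loop.

```python
import threading
from typing import Callable, List, Tuple


def run_steps(
    steps: List[Callable[[], str]],
    cancel: threading.Event,
) -> Tuple[str, List[str]]:
    """Run steps in order, checking for cancellation only between steps."""
    completed: List[str] = []
    for step in steps:
        if cancel.is_set():
            break  # stop before starting another step
        completed.append(step())  # a step in progress always finishes
    status = "cancelled" if cancel.is_set() else "complete"
    # Completed steps are returned either way, so partial results survive.
    return status, completed
```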
When to Use It
The right tool for the right job.
Good fit
- ✓ Extracting data from a site with no API
- ✓ Filing a form on, or checking the status of, a government portal
- ✓ Migrating data between two web apps by copy-pasting between screens
- ✓ Verifying what a page currently looks like
- ✓ Automating a repetitive multi-step UI workflow
- ✓ Filling out a complex form that can't be pre-populated via API
Use API instead
- → Syncing records from a service that has a REST API (use a credential + automation)
- → Sending emails (use Brevo)
- → Reading/writing your own Supabase data
- → Anything that needs to run on a reliable cron schedule (browser tasks are slower and less deterministic than API calls)
A note on reliability
Browser automation is inherently less predictable than an API call. A site can change its layout, load slowly, show a CAPTCHA, or render differently to a headless browser. The agent tries to recover, but a task that worked fine yesterday can still fail today. This is normal. For anything that needs to run every hour without intervention, build an API integration instead.
Under the Hood
How it actually works.
Runs in Browserbase — a managed remote browser service. Each chat thread gets its own persistent browser context so sessions survive across multiple tasks.
Gemini 2.5 Flash drives the browser agent — it evaluates each screenshot, decides the next action, and explains its reasoning. This is separate from the main chat model.
Step text and screenshots write directly to the chat history database. The chat UI's real-time subscription picks them up instantly — no polling, no refresh needed.
The default limit is 50 steps per task. Each step is one browser action: a click, a navigation, a form fill. Complex tasks can be broken into multiple calls in the same session.
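A step limit of this kind can be modelled as a bounded loop: the agent keeps choosing actions until it reports that it is done or the budget runs out. The sketch below assumes a hypothetical `next_action` callback that returns `None` when the task is finished. It illustrates the budgeting idea only and does not reflect the agent's internals.

```python
from typing import Callable, List, Optional, Tuple

DEFAULT_MAX_STEPS = 50  # default per-task step limit described above


def run_with_budget(
    next_action: Callable[[int], Optional[str]],
    max_steps: int = DEFAULT_MAX_STEPS,
) -> Tuple[List[str], bool]:
    """Collect actions until next_action returns None or the budget is spent."""
    log: List[str] = []
    for step_index in range(max_steps):
        action = next_action(step_index)
        if action is None:
            return log, True  # the agent reported completion
        log.append(action)
    return log, False  # budget exhausted before completion was reported
```

When the second value is `False`, the remaining work can be sent as a follow-up task in the same chat thread, where the existing browser session is reused.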