Browser
The Browser API gives your agent a persistent Chrome session it can control step by step. Create a session, get an accessibility-tree snapshot of the page, then interact — click buttons, type into fields, navigate, and extract content. No CSS selectors needed: every element gets a stable ref like @e1, @e2 that your agent can target directly.
How it works
- Create a session with a starting URL. Ilmenite boots a sandboxed Chrome instance and navigates to the page.
- Snapshot the page. You get back an accessibility tree with labeled element refs — your agent reads it like a screen reader.
- Interact. Click, type, evaluate JS, navigate, or extract content. Each action returns the updated snapshot so your agent always knows the current page state.
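The three steps above can be sketched as a small driver loop. This is a minimal sketch rather than an official client: `run_steps` and the `send` callable are hypothetical names, and the code assumes that each interaction reply carries the updated snapshot under a `snapshot` key, which this page states in prose but does not show as a response body.

```python
from typing import Any, Callable, Optional

# Hypothetical transport signature: (method, path, json_body) -> parsed JSON reply.
Send = Callable[[str, str, Optional[dict]], dict]


def run_steps(send: Send, start_url: str, steps: list[tuple[str, dict]]) -> Any:
    """Create a session, apply each step, and always close the session.

    `steps` is a list of (action, body) pairs, where action is one of the
    interaction paths listed under Endpoints: "click", "type", "eval", "go".
    """
    session = send("POST", "/v1/browser", {"url": start_url})
    session_id = session["id"]
    snapshot = session.get("snapshot")  # snapshot returned by the create call
    try:
        for action, body in steps:
            reply = send("POST", f"/v1/browser/{session_id}/{action}", body)
            # Assumption: the updated snapshot arrives under a "snapshot" key.
            snapshot = reply.get("snapshot", snapshot)
    finally:
        # Closing is free and releases the Chrome instance even after an error.
        send("DELETE", f"/v1/browser/{session_id}", None)
    return snapshot
```

Passing the transport in as a callable keeps the loop independent of any HTTP library and makes it easy to exercise with a stub.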
Endpoints
| Method | Path | Description | Cost |
|---|---|---|---|
| POST | /v1/browser | Create a new browser session | $0.005 |
| GET | /v1/browser/:id/snapshot | Get accessibility tree snapshot | $0.001 |
| POST | /v1/browser/:id/click | Click an element by ref | $0.002 |
| POST | /v1/browser/:id/type | Type text into an element | $0.002 |
| POST | /v1/browser/:id/eval | Execute JavaScript in page context | $0.002 |
| POST | /v1/browser/:id/go | Navigate to a new URL | $0.003 |
| GET | /v1/browser/:id/text | Extract visible text | $0.001 |
| GET | /v1/browser/:id/html | Get full page HTML | $0.001 |
| GET | /v1/browser/:id/screenshot | Capture PNG screenshot | $0.003 |
| GET | /v1/browser/:id | Get session status | Free |
| DELETE | /v1/browser/:id | Close session and free resources | Free |
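All of these endpoints share one host and the same bearer-token header, so a single request builder can cover the whole table. The sketch below uses only the Python standard library; `build_request` and `send` are hypothetical helper names, and the host and headers are taken from the curl examples on this page.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.ilmenite.dev"  # host taken from the curl examples on this page


def build_request(api_key: str, method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) an authenticated request for one endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        # POST endpoints take a JSON body, as in the curl examples.
        request.add_header("Content-Type", "application/json")
    return request


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse a JSON reply (not suitable for /screenshot)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_request(key, "POST", "/v1/browser", {"url": "https://example.com"}))` would create a session, assuming `key` holds a valid API key.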
Create a session
POST https://api.ilmenite.dev/v1/browser
Request body
- url (string, required): starting URL to navigate to.
- wait_for_ms (number, optional): milliseconds to wait after page load before returning. Default 1000, max 30000.
- viewport (object, optional): set width and height. Default 1280x720.
- proxy (string, optional): residential, datacenter, or a proxy URL.
- timeout_ms (number, optional): max session lifetime in milliseconds. Default 300000 (5 min), max 600000 (10 min).
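The limits above can be checked on the client before a request is sent. The following is a minimal sketch, assuming you want to fail fast locally: `create_session_body` is a hypothetical helper, the lower bounds are an assumption, and how the server itself treats out-of-range values is not documented here.

```python
from typing import Optional

MAX_WAIT_MS = 30_000        # documented maximum for wait_for_ms
MAX_TIMEOUT_MS = 600_000    # documented maximum for timeout_ms (10 min)


def create_session_body(url: str,
                        wait_for_ms: Optional[int] = None,
                        viewport: Optional[tuple[int, int]] = None,
                        proxy: Optional[str] = None,
                        timeout_ms: Optional[int] = None) -> dict:
    """Return a JSON-ready body, omitting options left at their defaults."""
    if not url:
        raise ValueError("url is required")
    body: dict = {"url": url}
    if wait_for_ms is not None:
        if not 0 <= wait_for_ms <= MAX_WAIT_MS:
            raise ValueError(f"wait_for_ms must be between 0 and {MAX_WAIT_MS}")
        body["wait_for_ms"] = wait_for_ms
    if viewport is not None:
        width, height = viewport
        body["viewport"] = {"width": width, "height": height}
    if proxy is not None:
        body["proxy"] = proxy  # "residential", "datacenter", or a proxy URL
    if timeout_ms is not None:
        if not 0 < timeout_ms <= MAX_TIMEOUT_MS:
            raise ValueError(f"timeout_ms must be between 1 and {MAX_TIMEOUT_MS}")
        body["timeout_ms"] = timeout_ms
    return body
```

Omitting unset options lets the server apply its own defaults instead of hard-coding them in the client.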
Example
curl -X POST https://api.ilmenite.dev/v1/browser \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/login",
    "wait_for_ms": 2000
  }'
Response
{
  "id": "br_8xk2m9f3",
  "status": "active",
  "url": "https://example.com/login",
  "created_at": "2026-04-12T10:30:00Z",
  "expires_at": "2026-04-12T10:35:00Z",
  "snapshot": {
    "title": "Login — Example",
    "tree": [
      "- heading \"Login\" [ref=e1]",
      "- textbox \"Email\" [ref=e2]",
      "- textbox \"Password\" [ref=e3]",
      "- button \"Sign In\" [ref=e4]",
      "- link \"Forgot password?\" [ref=e5]"
    ]
  }
}
Snapshot
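The snapshot is an accessibility-tree representation of the page. Each element in the tree carries a unique ref, shown as [ref=e1], [ref=e2], and so on. Your agent targets an element by passing that ref with an @ prefix (for example, "selector": "@e2") in click, type, and eval actions, so there are no fragile CSS selectors or XPath expressions to maintain. Refs belong to the snapshot that produced them: in the full example, e1 is the Email field on the login page and the Dashboard heading after sign-in.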
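An agent usually needs the tree in a structured form before it can choose a target. The sketch below parses tree lines of the shape used in the examples on this page (`- role "name" [ref=eN]`). It is an assumption that every line follows that shape; nesting, indentation, and extra attributes are not handled, and `Element`, `parse_tree`, and `find` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as: - textbox "Email" [ref=e2]   or   - table [ref=e2]
LINE_PATTERN = re.compile(
    r'^\s*-\s+(?P<role>[\w-]+)(?:\s+"(?P<name>.*)")?\s+\[ref=(?P<ref>\w+)\]\s*$'
)


class Element(NamedTuple):
    role: str
    name: Optional[str]
    selector: str  # the ref with an "@" prefix, as used in the click/type bodies


def parse_tree(tree: list[str]) -> list[Element]:
    """Turn snapshot tree lines into Element records, skipping unmatched lines."""
    elements = []
    for line in tree:
        match = LINE_PATTERN.match(line)
        if match:
            elements.append(
                Element(match["role"], match["name"], "@" + match["ref"])
            )
    return elements


def find(elements: list[Element], role: str, name: str) -> Optional[Element]:
    """Return the first element with the given role and accessible name."""
    return next((e for e in elements if e.role == role and e.name == name), None)
```

Looking elements up by role and name on every fresh snapshot avoids relying on a ref number staying the same across page changes.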
GET https://api.ilmenite.dev/v1/browser/:id/snapshot
{
  "url": "https://example.com/login",
  "title": "Login — Example",
  "tree": [
    "- heading \"Login\" [ref=e1]",
    "- textbox \"Email\" [ref=e2]",
    "- textbox \"Password\" [ref=e3]",
    "- checkbox \"Remember me\" [ref=e4]",
    "- button \"Sign In\" [ref=e5]",
    "- link \"Forgot password?\" [ref=e6]",
    "- link \"Create account\" [ref=e7]"
  ]
}
Interact
Use element refs from the snapshot to click, type, or run JavaScript. Each interaction returns the updated snapshot.
Click
POST https://api.ilmenite.dev/v1/browser/:id/click
curl -X POST https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/click \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "selector": "@e5" }'
Type
POST https://api.ilmenite.dev/v1/browser/:id/type
curl -X POST https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/type \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "selector": "@e2",
    "text": "user@example.com"
  }'
Eval
POST https://api.ilmenite.dev/v1/browser/:id/eval
curl -X POST https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/eval \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "expression": "document.querySelectorAll(\"tr\").length"
  }'
# → { "result": 42 }
Navigate
POST https://api.ilmenite.dev/v1/browser/:id/go
Navigate the session to a different URL. Optionally wait for the page to settle before returning the snapshot.
curl -X POST https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/go \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/dashboard",
    "wait_for_ms": 3000
  }'
Extract content
Pull text, HTML, or a screenshot from the current page state without closing the session.
- GET /v1/browser/:id/text returns the visible text content, cleaned and concatenated.
- GET /v1/browser/:id/html returns the full page HTML source.
- GET /v1/browser/:id/screenshot returns a PNG screenshot of the current viewport, with the image/png content type.
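Because the screenshot endpoint returns binary image data while the text endpoint returns JSON (as in the full example on this page), a client has to branch on the reply's content type. The sketch below shows one way to do that. `decode_reply` is a hypothetical helper, and it is an assumption that JSON replies are labeled `application/json`; the shape of the HTML reply is not shown on this page, so anything else is returned as plain text.

```python
import json
from typing import Union

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_reply(content_type: str, raw: bytes) -> Union[bytes, dict, str]:
    """Decode a reply body according to its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "image/png":
        if not raw.startswith(PNG_SIGNATURE):
            raise ValueError("body is not a PNG image")
        return raw  # write these bytes to a .png file unchanged
    if media_type == "application/json":
        return json.loads(raw.decode("utf-8"))
    # Fallback for any other type: return the body as text.
    return raw.decode("utf-8", errors="replace")
```

Checking the PNG signature catches the case where an error body arrives in place of the image.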
Close
DELETE https://api.ilmenite.dev/v1/browser/:id
Terminates the Chrome instance and frees resources. Sessions also auto-close when timeout_ms expires.
curl -X DELETE https://api.ilmenite.dev/v1/browser/br_8xk2m9f3 \
  -H "Authorization: Bearer $ILMENITE_API_KEY"
# → { "id": "br_8xk2m9f3", "status": "closed" }
Full example
Log in to a site, navigate to a reports page, and extract its text, all in one session.
# 1. Create session at the login page
curl -X POST https://api.ilmenite.dev/v1/browser \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://app.example.com/login" }'
# → id: br_8xk2m9f3
# → snapshot: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign In" [ref=e3]
# 2. Type email
curl -X POST https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/type \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "selector": "@e1", "text": "agent@mycompany.com" }'
# 3. Type password
curl -X POST https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/type \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "selector": "@e2", "text": "s3cret" }'
# 4. Click sign in
curl -X POST https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/click \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "selector": "@e3" }'
# → snapshot: heading "Dashboard" [ref=e1], table [ref=e2], ...
# 5. Navigate to the reports page
curl -X POST https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/go \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://app.example.com/reports", "wait_for_ms": 2000 }'
# 6. Extract the page text
curl https://api.ilmenite.dev/v1/browser/br_8xk2m9f3/text \
  -H "Authorization: Bearer $ILMENITE_API_KEY"
# → { "text": "Monthly Report\n\nRevenue: $142,300\nUsers: 8,291\n..." }
# 7. Close the session
curl -X DELETE https://api.ilmenite.dev/v1/browser/br_8xk2m9f3 \
  -H "Authorization: Bearer $ILMENITE_API_KEY"
Cost per action
| Action | Cost |
|---|---|
| Create session | $0.005 |
| Snapshot | $0.001 |
| Click | $0.002 |
| Type | $0.002 |
| Eval | $0.002 |
| Navigate (go) | $0.003 |
| Text extract | $0.001 |
| HTML extract | $0.001 |
| Screenshot | $0.003 |
| Session status | Free |
| Close session | Free |
A typical login and extract flow (create, 2 types, 1 click, 1 navigate, 1 text extract) costs $0.015: $0.005 + 2 × $0.002 + $0.002 + $0.003 + $0.001.
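To estimate the cost of any other flow, the per-action prices can be summed programmatically. The sketch below copies the prices from the table above; the short action keys and the `flow_cost` function are hypothetical labels chosen for this example.

```python
from decimal import Decimal

# Per-action prices in US dollars, copied from the table above.
PRICES = {
    "create": Decimal("0.005"),
    "snapshot": Decimal("0.001"),
    "click": Decimal("0.002"),
    "type": Decimal("0.002"),
    "eval": Decimal("0.002"),
    "go": Decimal("0.003"),
    "text": Decimal("0.001"),
    "html": Decimal("0.001"),
    "screenshot": Decimal("0.003"),
    "status": Decimal("0"),
    "close": Decimal("0"),
}


def flow_cost(actions: list[str]) -> Decimal:
    """Sum the price of each action; Decimal avoids float rounding drift."""
    return sum((PRICES[action] for action in actions), Decimal("0"))


login_and_extract = ["create", "type", "type", "click", "go", "text", "close"]
print(flow_cost(login_and_extract))  # prints 0.015
```

Using `Decimal` keeps sub-cent prices exact, which matters once thousands of actions are added up.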