Engineering · April 18, 2026 · 8 min · Ilmenite Team

Web Scraping API for LLM Agents — The Complete Guide

An LLM agent that can't read the web is a toy. A browser plugin. A novelty that forgets the world changed after its training cutoff. The moment you want an agent to research a topic, look up a product, check a policy, or read a vendor's documentation, you're in web-scraping territory — and the traditional scraping tools are not designed for agents.

This post is the pillar for how to think about web scraping specifically for LLM agents: what agents actually need, why traditional browser-per-request scrapers fail at agent economics, and how the dual-engine pattern (pure-Rust fast path + Chromium opt-in) is changing the shape of scraping infrastructure.

What agents actually need from scraping

Three things, in this order:

1. Clean, token-efficient output. An LLM doesn't want raw HTML. Raw HTML is 10-50× larger than the actual content on a page. At $0.015 per million input tokens for a mid-tier model, that multiplier matters. Agents want Markdown (or structured JSON) — the same content, 1/10th the tokens, with semantic structure preserved.

2. Per-operation economics. Agent workloads are bursty and high-fanout. A research agent might make 50 scrape calls to answer one user question. If you're paying per-browser-hour, one research session can cost more than the user paid for the month. Agents need per-operation pricing where the cost of each call reflects the work actually done.

3. Predictable latency. An agent decision loop that's waiting 6 seconds per scrape can't do real-time work. A user asking "what's the latest on X" won't wait 30 seconds for a research summary. Agents need sub-second response on typical pages.

Traditional scraping tools optimize for none of these. They emit HTML (not Markdown), they meter by browser-hour (not operation), and they default to full-render latency on every request (not adaptive).
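The token multiplier in point 1 is easy to see with nothing but the standard library. This sketch strips a (made-up) page down to its visible text and compares sizes using the common rule of thumb of roughly four characters per token — a rough illustration, not Ilmenite's actual Markdown converter:

```python
from html.parser import HTMLParser

# Hypothetical page: a short article wrapped in typical markup overhead.
HTML = """
<!DOCTYPE html><html><head><title>Example</title>
<script src="/bundle.js"></script><style>.a{color:red}</style></head>
<body><div class="wrap"><nav>...</nav>
<article><h1>Dual-engine scraping</h1>
<p>Markdown is the same content at a fraction of the tokens.</p>
</article><footer>...</footer></div></body></html>
"""

class TextOnly(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

parser = TextOnly()
parser.feed(HTML)
text = "\n".join(parser.chunks)

# ~4 characters per token is a common rule of thumb for English text.
print(f"HTML: ~{len(HTML) // 4} tokens, extracted text: ~{len(text) // 4} tokens")
```

On real pages — where the markup-to-content ratio is far worse than this toy example — the gap is the 10-50× the text describes.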

Why browser-per-request is wrong for agents

Most commercial scraping APIs run a Chromium instance per request. It's the safe default — any page, any JavaScript, any weird rendering, just works. But for agent workloads it's economically backwards.

The numbers: a browser startup + page load + extraction costs roughly $0.0005-$0.002 in compute plus 2-5 seconds of wall time. For a static page that doesn't need rendering, you just paid 10-100× more than the actual work cost. Repeat that across an agent's 50 calls per session and you've burned $0.05-$0.10 on nothing.

Compare to a pure-Rust HTTP fetch + HTML parse for a static page: roughly $0.0001 and 50-200ms. An order of magnitude cheaper and an order of magnitude faster.

The right mental model: ~85% of web pages don't need a browser. They're static HTML, JSON APIs, or server-side-rendered pages where all the content is in the initial response. The remaining ~15% (heavy React SPAs, client-side-only content) genuinely need Chromium.

If your scraper runs Chromium on everything, you're subsidizing the 15% with your 85% traffic. That's fine at small scale. At agent scale it kills your unit economics.
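The back-of-envelope math, using the midpoints of the per-call cost ranges quoted above (these are the article's illustrative numbers, not a pricing table):

```python
# Assumed per-call compute cost (midpoints of the ranges in the text):
FAST_PATH = 0.0001   # pure-Rust fetch + parse, static page
CHROMIUM  = 0.001    # browser startup + render + extract

def cost_per_1k(static_share: float, route_smartly: bool) -> float:
    """Expected compute cost (USD) of 1,000 scrape calls."""
    if route_smartly:
        # Dual-engine: pay for Chromium only on the pages that need it.
        per_call = static_share * FAST_PATH + (1 - static_share) * CHROMIUM
    else:
        per_call = CHROMIUM  # browser-per-request: render everything
    return 1000 * per_call

print(cost_per_1k(0.85, route_smartly=False))  # all Chromium: ~$1.00
print(cost_per_1k(0.85, route_smartly=True))   # dual-engine: ~$0.24
```

At an 85/15 split, routing cuts compute cost by roughly 4×, before you even count the latency win.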

The dual-engine pattern

Ilmenite's architecture makes the render decision per-request:

  • Default path: pure-Rust HTTP client, HTML parser (scraper), basic JS interpreter (Boa for simple eval), reading-order-aware Markdown conversion. Zero browser processes. Sub-100ms typical. Sub-$0.001 per call.
  • Opt-in path: full Chromium, JS execution, viewport simulation, screenshots. Seconds per call. Higher per-call cost but explicit rather than hidden.

Calling code picks per request:

# Static page / JSON API / SSR site: use the fast path
requests.post(
    "https://api.ilmenite.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"url": "https://example.com/article", "format": "markdown"},
)

# React SPA with client-rendered content: opt into Chromium
requests.post(
    "https://api.ilmenite.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://spa.example.com/dashboard",
        "format": "markdown",
        "render": "chrome",
    },
)

Agents that know what kind of page they're hitting can make the choice explicitly. Agents that don't can try the fast path first and fall back to Chromium when the fast-path result comes back empty or incomplete — a common pattern that ends up routing 80-90% of traffic to the cheap path in practice.
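A sketch of that fallback, assuming the response's `markdown` field comes back near-empty when a page is a client-rendered shell. The length heuristic and threshold here are illustrative, not part of the API:

```python
import requests

API_KEY = "ilm_..."  # your key
ENDPOINT = "https://api.ilmenite.dev/v1/scrape"

def scrape_adaptive(url: str) -> str:
    """Try the cheap pure-Rust path first; escalate to Chromium only
    when the result looks like an empty JS shell."""
    headers = {"Authorization": f"Bearer {API_KEY}"}

    fast = requests.post(ENDPOINT, headers=headers,
                         json={"url": url, "format": "markdown"})
    markdown = fast.json().get("markdown", "")

    # Heuristic: a real article is rarely under ~200 characters of Markdown.
    if len(markdown.strip()) >= 200:
        return markdown

    full = requests.post(ENDPOINT, headers=headers,
                         json={"url": url, "format": "markdown",
                               "render": "chrome"})
    return full.json().get("markdown", "")
```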

/v1/discover — the underrated unfair advantage

Most web scraping is unnecessary. Here's the thing nobody talks about: the vast majority of modern web pages are built on an internal JSON API that the page itself calls. If you can find that API, you don't scrape HTML at all — you call the API directly, get structured JSON back, and skip the entire DOM parsing dance.

Ilmenite's /v1/discover endpoint does exactly this. Give it a URL, it launches a headless browser, records every network request the page makes, classifies them (JSON APIs, JS bundles, tracking beacons, etc.), and returns the JSON APIs that look like they're serving page content:

response = requests.post(
    "https://api.ilmenite.dev/v1/discover",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"url": "https://example.com/products/123"},
)
apis = response.json()["apis"]
# apis is a list of {method, url, body_schema, sample_response}
# Find the one that returned the product data. Hit it directly next time.

For an agent that's going to scrape a site 100 times, running /v1/discover once and then hitting the discovered API saves 99 full scrapes. Structured JSON is also cleaner input for the LLM than scraped-HTML-to-Markdown — no chance of the Markdown converter misinterpreting a layout.
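Continuing the example above: once discovery has surfaced an entry, subsequent reads skip scraping entirely and call the page's own API. The `product_api` entry below is an invented sample with the shape described in the comment above (`method`, `url`):

```python
import requests

# Suppose /v1/discover surfaced this entry for the product page:
product_api = {
    "method": "GET",
    "url": "https://example.com/api/products/123",  # hypothetical internal API
}

def fetch_product(api: dict) -> dict:
    """Hit the discovered JSON API directly: structured data back,
    no DOM parsing, no Markdown conversion step."""
    resp = requests.request(api["method"], api["url"])
    resp.raise_for_status()
    return resp.json()
```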

Deeper walkthrough here.

MCP: the right protocol for agent-to-scraper communication

Model Context Protocol (MCP) is Anthropic's open standard for how LLM agents talk to tools. Claude Desktop, Cursor, Continue, and most modern agent frameworks speak MCP. If you're building agent infrastructure in 2026 and you're not MCP-compatible, you're building a walled garden in a standardized world.

Ilmenite ships a first-class MCP server. In Claude Desktop you add:

{
  "mcpServers": {
    "ilmenite": {
      "command": "npx",
      "args": ["-y", "@ilmenite/mcp"],
      "env": { "ILMENITE_API_KEY": "ilm_..." }
    }
  }
}

And Claude immediately has scrape, crawl, extract, discover, search as tools it can call. No custom integration code. No prompt engineering to teach Claude what the API looks like. Full Claude + MCP walkthrough.

For agents built on LangChain, LlamaIndex, or raw SDK calls, plain HTTP works too. But MCP is where the ecosystem is consolidating. Build for MCP and your tool works in every agent host, not just the one you integrated first.

Why MCP isn't a plugin but the default shape of agent tools.

The five endpoints an agent actually uses

Most scraping APIs expose one endpoint: "give URL, get content." Agent workloads need five distinct verbs because they're doing different things at different points in a decision loop.

/v1/scrape — single page fetch, Markdown output. The workhorse. An agent reading "the latest docs page" or "this specific article" hits this. Guide.

/v1/crawl — start at a URL, follow links within the same domain, extract Markdown from each page. Agents use this when they need "everything on this documentation site" or "all product pages in this catalog." Use case.

/v1/extract — fetch a URL, extract data matching a JSON schema. LLM-powered. Agents use this when they need structured data ("price, title, availability") not generic Markdown. The LLM does the field extraction; you get typed output you can put in a database or pass to the next tool. Guide.
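A sketch of what an extract call might look like. The `schema` request field name is an assumption here — check the guide for the exact request shape:

```python
import requests

API_KEY = "ilm_..."

def extract(url: str, schema: dict) -> dict:
    """Schema-driven extraction: the LLM fills the fields, you get
    typed JSON back. (`schema` field name is illustrative.)"""
    resp = requests.post(
        "https://api.ilmenite.dev/v1/extract",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": url, "schema": schema},
    )
    return resp.json()

# A product-page schema: typed fields instead of free-form Markdown.
PRODUCT = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["title", "price"],
}
```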

/v1/discover — find the JSON API behind a site (described above). Agents use this once per target as an optimization pass — next 100 scrapes become direct API calls.

/v1/search — search-engine-ish endpoint, returns results with content already scraped. Agents use this for "find articles about X" without having to run a separate search API and then scrape each result.

Together these five cover the full agent scraping workflow. No agent is writing BeautifulSoup selectors in 2026 — you tell the API what you want and it handles the how.

Handling the hard sites: Cloudflare, bot detection, anti-scraping

The uncomfortable truth about modern web scraping: a significant fraction of the web is behind Cloudflare, Akamai, DataDome, or similar anti-bot services. These services detect scraping and block it. No amount of JavaScript rendering fixes this — they're fingerprinting at the TLS, HTTP, and behavioral layers, not the content layer.

Two honest answers:

1. Where it's legitimate to scrape the site, use a residential-IP proxy. Ilmenite supports bring-your-own-proxy (BYOP) — you provide proxy credentials, we route through them. Residential IPs pass Cloudflare's reputation checks in a way datacenter IPs never will.

2. Where it's not legitimate, don't. Robots.txt, ToS, and polite rate-limiting exist for reasons. An agent that hammers a site into serving 429s is a liability, not a feature. If the content you need isn't accessible to a polite scraper, consider whether there's a licensed data source or an official API.
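For the legitimate case, a BYOP call might look like this. The `proxy` request field name is an assumption for illustration — consult the API reference for the real parameter:

```python
import requests

API_KEY = "ilm_..."

def scrape_via_proxy(url: str, proxy_url: str) -> dict:
    """Bring-your-own-proxy: route the fetch through residential IPs
    you control. (`proxy` field name is illustrative.)"""
    return requests.post(
        "https://api.ilmenite.dev/v1/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": url, "format": "markdown",
              "proxy": proxy_url},  # e.g. "http://user:pass@pool.example:8080"
    ).json()
```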

More on Cloudflare bypass legitimacy.

Pricing: per-operation vs per-browser-hour

One more time because this is the biggest cost lever: agents make a lot of calls. Per-operation pricing wins dramatically over per-browser-hour for that usage shape.

At Ilmenite's current rates, 1,000 fast-path scrapes cost ~$1, and 1,000 Chromium-rendered scrapes cost ~$5-10 depending on page complexity. A research agent that makes 50 scrapes per user question and handles 1,000 questions per day on the fast path costs you ~$50 a day. On a per-browser-hour scraper where each call consumes 2-5 seconds of browser time, the same workload runs $500-2,500. That's the difference between a viable agent business and one that doesn't make sense.

The micro-dollar billing means this math is explicit, not hidden. Every response includes the cost. You can budget a user's research session in advance. You can cap agent runaways. You can predict what next month looks like.
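Capping a runaway session is then a few lines. This sketch assumes each response reports its own cost in a `cost` field (the field name is illustrative); `scrape` is any callable that returns the API's JSON:

```python
def run_session(scrape, urls, budget_usd=0.05):
    """Scrape until the session budget is exhausted. Assumes each
    response carries a `cost` field (name illustrative)."""
    spent, results = 0.0, []
    for url in urls:
        if spent >= budget_usd:
            break  # hard cap: a runaway agent stops here, not on your invoice
        data = scrape(url)
        spent += data["cost"]
        results.append(data)
    return spent, results
```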

Integration examples

Python, raw HTTP:

import asyncio

import httpx

async def main() -> None:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.ilmenite.dev/v1/scrape",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"url": "https://example.com/article", "format": "markdown"},
        )
        markdown = response.json()["markdown"]

asyncio.run(main())

TypeScript / Node, raw fetch:

const response = await fetch("https://api.ilmenite.dev/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url, format: "markdown" }),
});
const { markdown } = await response.json();

LangChain integration: Walkthrough.

LlamaIndex integration: Walkthrough.

Claude Desktop via MCP: Walkthrough.

Building an agent that reads the web

The minimum shape of a real agent with web access:

  1. Receive a user question.
  2. Decide what to search for (LLM call with available tools).
  3. Call /v1/search with query terms.
  4. For each promising result, call /v1/scrape to get full content.
  5. Summarize + synthesize into an answer (LLM call).
  6. Cite sources.

That's the loop. Every scrape call is explicit, metered, and fast enough to stay within user latency expectations. The agent decides when to go deep (scrape full article) vs. shallow (use search snippets). The cost is predictable per question.
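The six steps above, as a minimal sketch. `llm(prompt) -> str` stands in for your model call, and the response field names (`results`, `markdown`) are assumptions about the API shape carried over from the earlier examples:

```python
import requests

API_KEY = "ilm_..."
BASE = "https://api.ilmenite.dev/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def answer(question: str, llm, max_sources: int = 3) -> str:
    """Minimal research loop: search, read the top hits, synthesize, cite."""
    # Step 2: let the model decide what to search for.
    query = llm(f"Write a web search query for: {question}")

    # Step 3: search, with content pre-scraped in the results.
    hits = requests.post(f"{BASE}/search", headers=HEADERS,
                         json={"query": query}).json()["results"]

    # Step 4: go deep on the most promising results.
    sources = []
    for hit in hits[:max_sources]:
        page = requests.post(f"{BASE}/scrape", headers=HEADERS,
                             json={"url": hit["url"], "format": "markdown"})
        sources.append((hit["url"], page.json()["markdown"]))

    # Steps 5-6: synthesize an answer and cite where it came from.
    context = "\n\n".join(md for _, md in sources)
    answer_text = llm(f"Answer '{question}' using:\n{context}")
    citations = "\n".join(url for url, _ in sources)
    return f"{answer_text}\n\nSources:\n{citations}"
```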

A concrete example that ties this together: Building an AI Research Assistant That Reads the Web.


One-paragraph summary

An LLM agent that reads the web needs per-operation pricing (not per-browser-hour), Markdown output (not raw HTML), sub-second typical latency (not always-render), and the ability to choose render depth per-request (not one-size-fits-all). Modern scraping APIs built on the dual-engine pattern deliver all four. If you're building an agent and your scraping costs or latencies aren't in that profile, the scraper is the bottleneck — not the model.