Web scraping API for AI agents.
Turn any URL into clean markdown, structured data, or extracted JSON. Scrape, crawl, map, extract, and search — one API, pay per use.
- 0.19ms cold start
- 2MB per session
- PDF + JS rendering
- MCP for Claude
# Scrape any URL → clean markdown
curl -X POST https://api.ilmenite.dev/v1/scrape \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://news.ycombinator.com",
"format": "markdown"
}'
# → { "markdown": "# Hacker News\n\n1. Show HN: ..." }
# → 120ms total
Head to head
Ilmenite vs Chrome headless.
Measured on an Apple M2 (release build; 100 sequential runs, averaged). Chrome numbers are from Puppeteer launched with --headless=new.
Cost estimate: 100 concurrent sessions, 24h/day, on DigitalOcean · methodology
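The averaging procedure described above can be sketched as a small harness. This is illustrative only: `run` stands in for whichever operation is being timed, and the published numbers come from the authors' own setup.

```typescript
// Average wall-clock latency over N sequential runs, matching the
// "100 sequential runs averaged" methodology described above.
async function benchmark(
  run: () => Promise<void>,
  runs = 100,
): Promise<number> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await run();
    samples.push(performance.now() - start);
  }
  return samples.reduce((a, b) => a + b, 0) / samples.length;
}
```

For example, `await benchmark(() => client.scrape(url).send())` would time a hypothetical client call the same way.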
How it works
See every step your agent ran.
One API call can navigate, wait for selectors, click elements, and extract structured data — all in under a second.
- navigate · news.ycombinator.com · 0.00s
- wait_for · tr.athing · 0.03s
- page.markdown() · 4.2KB · 0.18s
- click · a[href*=item?id]:nth(0) · 0.32s
- wait_for · .comment · 0.46s
- extract(schema) · 32 comments · 0.61s
- complete · 0.78s total
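A request that produces a trace like the one above would carry an ordered list of actions. The body below is hypothetical: the `actions` array and its field names are assumptions mirroring the step labels, not confirmed API parameters.

```typescript
// Hypothetical multi-step request body mirroring the trace above.
// Field names (actions, type, selector, schema) are illustrative.
const body = {
  url: "https://news.ycombinator.com",
  actions: [
    { type: "wait_for", selector: "tr.athing" },
    { type: "markdown" },
    { type: "click", selector: "a[href*='item?id']" },
    { type: "wait_for", selector: ".comment" },
    { type: "extract", schema: { comments: "string[]" } },
  ],
};
console.log(JSON.stringify(body, null, 2));
```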
Primitives
Five endpoints. Everything your agent needs.
Turn any URL into clean markdown.
HTML → LLM-ready markdown, JSON, or structured text. JavaScript rendering, smart content extraction, adaptive parsing. The default call your agent will make 100× a day.
Read reference
curl https://api.ilmenite.dev/v1/scrape \
-H "Authorization: Bearer $KEY" \
-d '{
"url": "https://example.com",
"formats": ["markdown", "links"]
}'
Walk an entire site with one call.
Pass a seed URL, get every reachable page. Depth limits, URL globs, concurrency control, robots.txt support, webhook callbacks. 200 pages in under 2 seconds.
Read reference
const job = await client.crawl.start({
url: "https://docs.stripe.com",
maxPages: 200,
include: ["/docs/*"],
});
for await (const page of client.crawl.stream(job.id)) {
console.log(page.url);
}
Schema in, typed data out.
Pass a JSON Schema, get validated structured data. Powered by OpenAI or Anthropic (bring your own key). Built for RAG pipelines, tool calls, and data enrichment.
Read reference
curl https://api.ilmenite.dev/v1/extract \
-H "Authorization: Bearer $KEY" \
-d '{
"url": "https://example.com/product/42",
"schema": {
"name": "string",
"price_usd": "number",
"in_stock": "boolean"
},
"provider": "openai"
}'
# → { "data": { "name": "Widget", "price_usd": 29.99, "in_stock": true } }
Discover every URL on a site.
Pass a domain, get every discoverable URL. Parses sitemap.xml, follows links, filters by path. Feed the list into /crawl or /batch/scrape to extract them all.
Read reference
curl https://api.ilmenite.dev/v1/map \
-H "Authorization: Bearer $KEY" \
-d '{
"url": "https://docs.stripe.com",
"limit": 500,
"search": "/docs"
}'
# → { "count": 342, "links": ["https://docs.stripe.com/...", ...] }
Google search + scrape in one call.
Send a query, get top results with snippets. Set scrape=true to also extract full markdown from each result page. Built for RAG and research agents.
Read reference
curl https://api.ilmenite.dev/v1/search \
-H "Authorization: Bearer $KEY" \
-d '{
"query": "rust web scraping 2026",
"num_results": 5,
"scrape": true
}'
# → results with title, url, snippet, markdown
Cookbook
Recipes you can copy today.
Scrape an e-commerce catalog
Crawl product pages, extract typed records.
client.crawl.start({
url: "shop.example.com",
include: ["/products/*"],
extract: ProductSchema,
});
Monitor price changes
Poll a URL, diff extracted fields, fire webhooks.
await client.extract({
url: "flights.example.com/lax-jfk",
schema: { price_usd: "number" },
webhook_url: "https://my.app/hook",
});
Feed RAG with clean markdown
Walk docs site, chunk markdown, embed.
const pages = client.crawl.stream(job.id);
for await (const p of pages) {
await embedAndStore(p.markdown);
}
Give Claude a browser
Connect Ilmenite MCP to Claude Desktop.
// claude_desktop_config.json
{
"mcpServers": {
"ilmenite": { "url": "mcp.ilmenite.dev" }
}
}
Stream sitemap → warehouse
Map a site, pipe URLs to BigQuery.
const { links } = await client.map({
url: "docs.example.com",
});
await bq.insert("pages", links);
PDF → searchable text
Extract text from any PDF URL. Handles arxiv papers, invoices, scans.
const result = await fetch(API + "/v1/scrape", {
method: "POST",
headers: {
"Authorization": `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
url: "https://arxiv.org/pdf/2410.02073.pdf"
})
});
const { markdown } = await result.json();
console.log(markdown);
Works with your stack
Drop into what you already have.
Tool + DocumentLoader adapters. Python and TypeScript.
from ilmenite.langchain import IlmeniteScrapeTool
agent = initialize_agent(
tools=[IlmeniteScrapeTool()],
llm=ChatOpenAI(model="gpt-4o"),
)
Native MCP server.
{
"ilmenite": {
"url": "mcp.ilmenite.dev"
}
}
Sync + async, Pydantic extract.
pip install ilmenite
ESM, edge runtime, Zod extract.
npm install ilmenite
Cloud client or local engine. Same crate.
use ilmenite::Client;
let client = Client::from_env()?;
let md = client.scrape(url).send().await?.markdown;
Give Claude web access. 5 tools: scrape, crawl, map, extract, search.
// claude_desktop_config.json
{
"mcpServers": {
"ilmenite": { "command": "ilmenite-mcp" }
}
}
Compare
How Ilmenite stacks up.
Ilmenite is a managed API + self-hostable engine. Playwright is included as a reference point for the automation space.
| Feature | Ilmenite | Browserbase | Steel.dev | Playwright (OSS library) |
|---|---|---|---|---|
| Cold start | 0.19ms | 500-2000ms | 500-2000ms | 500-2000ms |
| RAM per instance | ~2 MB | 200-500 MB | 200-500 MB | 200-500 MB |
| Single binary deploy | Yes | No | No | No |
| Chrome dependency | Optional | Required | Required | Required |
| Pure Rust | Yes | No | No | No |
| Native MCP server | Yes | No | No | No |
| Browser hours metered | No | Yes | Yes | No |
| Self-hostable | Yes | No | Yes | Yes |
Pricing
Pay per use. Start free.
Credits-based pricing. 1 scrape = 1 credit. No subscriptions, no browser-hour metering.
Free
Try it out. No credit card.
500 credits / month
Start free
- 500 credits included
- 2 concurrent requests
- All endpoints (scrape, crawl, map, extract, search)
- PDF extraction
- Community support
Developer
For builders shipping agents. No monthly commitment.
Pay as you go
Get started
- Pay only for what you use
- 10 concurrent requests
- $5 min top-up
- MCP server access
- Chrome JS rendering
- Email support
Pro
For production workloads at scale.
Volume pricing
Get started
- 40% cheaper per credit
- 50 concurrent requests
- Priority queue
- Webhook on crawl completion
- 99.9% uptime SLA
- Slack support
Enterprise
For platforms and regulated industries.
Volume + self-host
Contact sales
- Custom credit pricing
- Unlimited concurrency
- Self-hosted or air-gapped deploy
- SSO, SAML, audit logs
- 99.99% SLA + dedicated engineer
- SOC 2 Type II
Credit usage per operation
FAQ