Core APIs
Crawl
Start from a seed URL and walk every link that stays within your filter rules. Crawls run asynchronously — you get a job ID back and poll (or subscribe via webhook) until it finishes.
Endpoint
POST https://api.ilmenite.dev/v1/crawl

Request body
url (string, required) — seed URL.
max_depth (number, optional) — link depth from seed. Default 3.
max_pages (number, optional) — hard cap on pages. Default 100.
include (string[], optional) — URL patterns to include (glob).
exclude (string[], optional) — URL patterns to exclude.
same_origin (boolean, optional) — restrict to seed origin. Default true.
respect_robots (boolean, optional) — honor robots.txt. Default true.
concurrency (number, optional) — parallel fetches. Default 10, capped by plan.
webhook_url (string, optional) — receive a completion callback.
formats (string[], optional) — output formats per page. Defaults to ["markdown"].
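
The Example below exercises the include and exclude filters; the remaining options are passed the same way. As a rough sketch, a request that tightens the crawl shape by overriding several defaults (the specific values are illustrative, not recommendations):

curl -X POST https://api.ilmenite.dev/v1/crawl \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.stripe.com",
    "max_depth": 2,
    "max_pages": 50,
    "concurrency": 5,
    "respect_robots": true
  }'
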
Example
curl -X POST https://api.ilmenite.dev/v1/crawl \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.stripe.com",
    "max_pages": 200,
    "include": ["/docs/*"],
    "exclude": ["/docs/search"]
  }'

Response
{
  "job_id": "crw_8f3a9e12",
  "status": "queued",
  "seed_url": "https://docs.stripe.com",
  "created_at": "2026-04-05T21:48:00Z"
}

Polling a job
GET https://api.ilmenite.dev/v1/crawl/:job_id

{
  "job_id": "crw_8f3a9e12",
  "status": "complete",
  "pages_crawled": 187,
  "pages_queued": 0,
  "pages_failed": 3,
  "total_ms": 18340,
  "pages": [
    { "url": "/docs", "title": "Docs", "status": 200, "markdown": "..." },
    { "url": "/docs/api", "title": "API", "status": 200, "markdown": "..." }
  ]
}
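
To wait for a job from a shell, poll this endpoint until status reaches a terminal state. A minimal sketch, assuming jq is installed and ILMENITE_API_KEY is exported; the job ID and the 5-second interval are illustrative:

JOB_ID="crw_8f3a9e12"
while true; do
  STATUS=$(curl -s "https://api.ilmenite.dev/v1/crawl/$JOB_ID" \
    -H "Authorization: Bearer $ILMENITE_API_KEY" | jq -r '.status')
  echo "status: $STATUS"
  # Stop once the job reaches a terminal state (see Lifecycle below).
  if [ "$STATUS" = "complete" ] || [ "$STATUS" = "failed" ]; then
    break
  fi
  sleep 5
done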

Lifecycle
queued — accepted, not yet running.
running — at least one page in flight.
complete — finished; pages available for 7 days.
failed — crawl aborted; see the error field.
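
If you would rather not poll, set webhook_url when creating the crawl and handle the completion callback instead. A sketch of such a request; the receiving URL is a placeholder you control, and the callback payload is not documented in this section, so inspect a real delivery before depending on specific fields:

curl -X POST https://api.ilmenite.dev/v1/crawl \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.stripe.com",
    "max_pages": 200,
    "webhook_url": "https://example.com/hooks/ilmenite-crawl"
  }'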