Core APIs

Crawl

Start from a seed URL and walk every link that stays within your filter rules. Crawls run asynchronously — you get a job ID back and poll (or subscribe via webhook) until it finishes.

Endpoint

POST https://api.ilmenite.dev/v1/crawl

Request body

  • url (string, required) — seed URL.
  • max_depth (number, optional) — link depth from seed. Default 3.
  • max_pages (number, optional) — hard cap on pages. Default 100.
  • include (string[], optional) — URL patterns to include (glob).
  • exclude (string[], optional) — URL patterns to exclude.
  • same_origin (boolean, optional) — restrict to seed origin. Default true.
  • respect_robots (boolean, optional) — honor robots.txt. Default true.
  • concurrency (number, optional) — parallel fetches. Default 10, capped by plan.
  • webhook_url (string, optional) — receive completion callback.
  • formats (string[], optional) — output formats per page. Defaults to ["markdown"].
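
As a sketch of how the defaults above compose with caller overrides, the helper below merges them client-side. The function and constant names (`build_crawl_request`, `CRAWL_DEFAULTS`) are our own, not part of any official SDK:

```python
import json

# Defaults mirrored from the parameter list above; only "url" is required.
CRAWL_DEFAULTS = {
    "max_depth": 3,
    "max_pages": 100,
    "same_origin": True,
    "respect_robots": True,
    "concurrency": 10,
    "formats": ["markdown"],
}

# Optional keys that have no default value.
OPTIONAL_KEYS = {"include", "exclude", "webhook_url"}

def build_crawl_request(url, **overrides):
    """Merge caller overrides onto the documented defaults.

    Unknown keys are rejected so a typo (e.g. "max_page") fails fast
    locally instead of being silently ignored.
    """
    unknown = set(overrides) - set(CRAWL_DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown crawl option(s): {sorted(unknown)}")
    return {"url": url, **CRAWL_DEFAULTS, **overrides}

body = build_crawl_request("https://docs.stripe.com", max_pages=200, include=["/docs/*"])
payload = json.dumps(body)  # ready to send as the POST body
```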

Example

curl -X POST https://api.ilmenite.dev/v1/crawl \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.stripe.com",
    "max_pages": 200,
    "include": ["/docs/*"],
    "exclude": ["/docs/search"]
  }'
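
The same request in Python might look like the sketch below. The HTTP transport is injected (`post` can be `requests.post`, `httpx.post`, or a test stub), which keeps the sketch library-agnostic; `start_crawl` is our own illustrative name:

```python
import json

API_ROOT = "https://api.ilmenite.dev/v1"

def start_crawl(post, api_key, body):
    """Kick off a crawl and return the job ID from the queued response.

    `post(url, headers=..., data=...)` is any HTTP callable returning an
    object with a .json() method, so the function works unchanged with
    requests, httpx, or a stub in tests.
    """
    resp = post(
        f"{API_ROOT}/crawl",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        data=json.dumps(body),
    )
    return resp.json()["job_id"]
```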

Response

{
  "job_id": "crw_8f3a9e12",
  "status": "queued",
  "seed_url": "https://docs.stripe.com",
  "created_at": "2026-04-05T21:48:00Z"
}

Polling a job

GET https://api.ilmenite.dev/v1/crawl/:job_id

{
  "job_id": "crw_8f3a9e12",
  "status": "complete",
  "pages_crawled": 187,
  "pages_queued": 0,
  "pages_failed": 3,
  "total_ms": 18340,
  "pages": [
    { "url": "/docs", "title": "Docs", "status": 200, "markdown": "..." },
    { "url": "/docs/api", "title": "API", "status": 200, "markdown": "..." }
  ]
}
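
A minimal polling loop over this endpoint could look like the sketch below. The fetcher and sleep function are injected so the loop can be exercised without the network; `wait_for_crawl` is our own name, not an SDK function:

```python
import time

# Statuses after which the job will not change again (see Lifecycle below).
TERMINAL = {"complete", "failed"}

def wait_for_crawl(get_job, job_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll GET /v1/crawl/:job_id until the job reaches a terminal state.

    `get_job(job_id)` returns the decoded status JSON; inject a stub to
    test the loop, or a thin wrapper over your HTTP client in production.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL:
            return job
        if time.monotonic() > deadline:
            raise TimeoutError(f"crawl {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

For long crawls, the `webhook_url` parameter avoids polling entirely: the API calls you back on completion instead.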

Lifecycle

  1. queued — accepted, not yet running.
  2. running — at least one page in flight.
  3. complete — finished; pages available for 7 days.
  4. failed — crawl aborted; see the error field in the job response.
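
Once a job is terminal, a caller only has two cases to handle. A small dispatch sketch, with `handle_finished` as our own illustrative helper:

```python
def handle_finished(job):
    """Return the crawled pages on success; raise on failure.

    Expects a job dict already in a terminal state (complete or failed).
    """
    status = job["status"]
    if status == "complete":
        return job["pages"]
    if status == "failed":
        raise RuntimeError(job.get("error", "crawl failed"))
    raise ValueError(f"job is not in a terminal state: {status}")
```

Note that pages from a complete job are only retrievable for 7 days, so persist anything you need beyond that window.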