
Blog · April 15, 2026 · 3 min · Ilmenite Team

PDF→Markdown, billed per feature — what we shipped

Draft notice: This blog post is a draft. Numbers marked [BENCHMARK PENDING] will be filled in from the public benchmark JSON files committed alongside the engine. Do not publish until every such marker has a real, traceable number; see the memory file project_positioning_discover.md and the recent commits that swept fictional numbers out of the blog.

What changed

We shipped a new PDF→Markdown engine inside ilmenite. It runs entirely in Rust, replaces the legacy pdftotext + tesseract subprocess chain with a single in-process pipeline, and bills per capability per page instead of a flat per-page rate. New endpoints:

POST /v1/pdf/extract — fetches the PDF and returns markdown + itemized billing.
POST /v1/pdf/estimate — same request shape, returns a cost preview without actually extracting. Free.
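Because both endpoints share one request shape, a client only has to vary the path. Below is a minimal Python sketch of that idea, assuming the base URL, headers, and `url`/`tier` body fields used in the curl example in the "Try it" section. `build_pdf_request` is a hypothetical helper, not part of any official SDK, and it only builds the request rather than sending it.

```python
import json
import urllib.request

# Base URL as used in the curl example in this post.
API_BASE = "https://api.ilmenite.dev"

# The two PDF endpoints described above; both accept the same JSON body.
PDF_ENDPOINTS = ("extract", "estimate")


def build_pdf_request(endpoint: str, pdf_url: str, tier: str, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a POST to /v1/pdf/extract or /v1/pdf/estimate (hypothetical helper)."""
    if endpoint not in PDF_ENDPOINTS:
        raise ValueError(f"unknown PDF endpoint: {endpoint!r}")
    # Same request shape for both endpoints: the PDF's URL plus a tier name (or "auto").
    body = json.dumps({"url": pdf_url, "tier": tier}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/pdf/{endpoint}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Switching `"estimate"` to `"extract"` is the only change needed to go from a cost preview to the real job. Sending either request (for example with `urllib.request.urlopen`) is left out because the response schema, including the layout of the itemized bill, is not documented here.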

How the pricing works

Most providers charge a flat credit per page for PDFs, regardless of what you actually need. We charge for the capabilities that ran on each page. Base text extraction is the floor; tables, formulas, images, OCR, and ML layout each add their own per-page surcharge, billed only on pages where the capability actually fired (`preserve_layout`, when requested, applies to every page).

Capability	Per-page price	Billed when
Base text extraction	$0.0001	Every page
`tables`	+$0.0002	Pages where ≥1 table was detected
`formulas`	+$0.0003	Pages where ≥1 math region was detected
`images`	+$0.0001	Pages where ≥1 image was extracted
`preserve_layout`	+$0.0001	Every page in the request
`ocr` (auto)	+$0.0008	Only pages the classifier flagged as scanned
`quality` (ML)	+$0.0015	Only pages the classifier flagged as complex
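The billing rule in the table reduces to addition. A minimal sketch, assuming a page's price is the base rate plus the surcharge of every capability that fired on that page; `page_cost` and `document_cost` are hypothetical names, and the production billing code may round or aggregate differently. `Decimal` keeps sub-cent prices exact, which binary floats do not.

```python
from __future__ import annotations

from decimal import Decimal

# Per-page prices (USD) from the capability table above.
BASE_PRICE = Decimal("0.0001")
SURCHARGES = {
    "tables": Decimal("0.0002"),
    "formulas": Decimal("0.0003"),
    "images": Decimal("0.0001"),
    "preserve_layout": Decimal("0.0001"),
    "ocr": Decimal("0.0008"),
    "quality": Decimal("0.0015"),
}


def page_cost(fired: set[str]) -> Decimal:
    """Base rate plus one surcharge per capability that actually fired on this page."""
    unknown = fired - set(SURCHARGES)
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    return BASE_PRICE + sum((SURCHARGES[c] for c in fired), Decimal("0"))


def document_cost(pages: list[set[str]]) -> Decimal:
    """Total for a request: every page pays the base rate, plus whatever fired on it."""
    return sum((page_cost(fired) for fired in pages), Decimal("0"))
```

For example, a page where both `tables` and `formulas` fired costs $0.0001 + $0.0002 + $0.0003 = $0.0006. Since `preserve_layout` is billed on every page of a request that asks for it, a caller modeling that option would add it to every page's set.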

You can pick a named tier (preset of features) or pass features directly. Five tiers map to common workloads:

Tier	Per-page price	Pages per $1	When to pick it
Light	$0.0001	10,000	Plain born-digital PDFs
Standard (most popular)	$0.0003	3,333	Reports with tables
Scientific	$0.0006	1,666	Papers with math
Scanned	$0.0010	1,000	Scanned docs
Max	$0.0025	400	Maximum fidelity
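The "Pages per $1" column is the per-page price inverted and rounded down (presumably because a fraction of a page buys nothing): 1 / 0.0006 ≈ 1,666.7, hence 1,666. A small sketch that reproduces the column for any budget; `pages_for_budget` and the lowercase tier keys are hypothetical, not API identifiers.

```python
from decimal import Decimal

# Per-page tier prices (USD) from the tier table above.
TIER_PRICE = {
    "light": Decimal("0.0001"),
    "standard": Decimal("0.0003"),
    "scientific": Decimal("0.0006"),
    "scanned": Decimal("0.0010"),
    "max": Decimal("0.0025"),
}


def pages_for_budget(tier: str, budget_usd: Decimal) -> int:
    """Whole pages a budget covers at a tier's per-page price, rounded down."""
    # For positive operands, Decimal's // (truncating division) is the same as flooring.
    return int(budget_usd // TIER_PRICE[tier])
```

With a $1 budget this yields 10,000 / 3,333 / 1,666 / 1,000 / 400, matching the table; `pages_for_budget("standard", Decimal("5"))` gives 16,666.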

Or pass tier: "auto" and we route each page using a cheap classifier (<5 ms per page). On a 100-page mixed PDF (60 simple pages, 30 with tables, 10 scanned), auto bills the sum of the per-page tier costs: 60 × $0.0001 + 30 × $0.0003 + 10 × $0.0010 = $0.025, versus roughly $0.10–$0.20 for flat-credit competitors.
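Under `tier: "auto"`, each page is billed at whichever tier the classifier routes it to, and the request total is the sum. A sketch of that rule, assuming the classifier emits one tier label per page; the labels and the `itemize`/`auto_bill` names are hypothetical, since the real classifier's output format is not described in this post.

```python
from __future__ import annotations

from collections import Counter
from decimal import Decimal

# Per-page tier prices (USD) from the tier table above.
TIER_PRICE = {
    "light": Decimal("0.0001"),
    "standard": Decimal("0.0003"),
    "scientific": Decimal("0.0006"),
    "scanned": Decimal("0.0010"),
    "max": Decimal("0.0025"),
}


def itemize(page_tiers: list[str]) -> dict[str, Decimal]:
    """Group pages by the tier the classifier routed them to and price each group."""
    counts = Counter(page_tiers)
    return {tier: n * TIER_PRICE[tier] for tier, n in counts.items()}


def auto_bill(page_tiers: list[str]) -> Decimal:
    """Total for tier="auto": the sum of every page's own tier price."""
    return sum(itemize(page_tiers).values(), Decimal("0"))


# The 100-page example from the text: 60 simple pages, 30 with tables, 10 scanned.
example_pages = ["light"] * 60 + ["standard"] * 30 + ["scanned"] * 10
```

Grouping by tier before summing mirrors the itemized bill the extract endpoint is said to return, though the actual bill layout may differ.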

How fast it is

We benchmarked the new engine against the legacy ilmenite path, Firecrawl's Fire-PDF, Marker, and Docling on a fixed corpus of real PDFs.

Workload	ilmenite (auto)	ilmenite (max)	Firecrawl Fire-PDF	Marker	Docling
100-page born-digital report	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`
50-page scientific paper	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`
30-page scanned legal	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`	`[BENCHMARK PENDING]`

Source: tests/fixtures/pdfs/ + docs/pdf-engine/public-benchmark.md. Every number in the table must trace back to a JSON file in this repo. Until those files are populated, each cell stays [BENCHMARK PENDING] and this post does not ship.
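The ship rule above can be enforced mechanically rather than by eye. A hypothetical pre-publish check (the function names, and the idea of running it in CI, are assumptions rather than existing tooling in the repo) that lists every line still carrying a `[BENCHMARK PENDING]` or `[CASE PENDING]` marker:

```python
from __future__ import annotations

import re
from pathlib import Path

# Placeholder markers used in this draft; any hit means the post must not ship.
PENDING = re.compile(r"\[(?:BENCHMARK PENDING|CASE PENDING)\]")


def pending_markers(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line that still has a placeholder."""
    return [
        (n, line.strip())
        for n, line in enumerate(text.splitlines(), start=1)
        if PENDING.search(line)
    ]


def check(paths: list[str]) -> int:
    """Print every remaining marker across the given files; return a CI-style exit code."""
    status = 0
    for path in paths:
        for n, line in pending_markers(Path(path).read_text(encoding="utf-8")):
            print(f"{path}:{n}: {line}")
            status = 1
    return status
```

Run as `sys.exit(check(sys.argv[1:]))` against the post's source file, it would fail the build while any marker remains.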

Where we lose

We're going to be honest about the failure modes too. Here are the PDFs in the corpus where Marker or Docling beat us, and why:

[CASE PENDING] — we lose by [N]% on [METRIC]. Why: [REASON].
[CASE PENDING] — [REASON].

We'd rather tell you where the seam is than pretend it isn't there. The benchmark corpus grows whenever a customer reports a PDF that fails — every reported failure becomes a permanent regression test.

Try it

curl -X POST https://api.ilmenite.dev/v1/pdf/estimate \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/report.pdf",
    "tier": "auto"
  }'

The estimate endpoint is free and returns the bill before you commit. There's also a paste-and-extract tool at /dashboard/pdf.

Free tier: 5,000 PDF pages/month, no credit card.

Docs: /docs/pdf-extraction.