Stop Fighting Cloudflare. Find the Hidden API Instead.
Every web scraping tutorial starts the same way: "launch a headless browser, render the page, extract the content." Then you run it against a real target and Cloudflare returns Error 1010. You swap in residential proxies. You patch your TLS fingerprint. You solve the Turnstile challenge. You burn a week and $400 in proxy bandwidth just to scrape a product catalog.
There is a better way. Nearly every JavaScript-heavy website fetches its data from an API. That API is usually not behind Cloudflare's bot protection; only the HTML shell is. If you can find the API, you can often call it directly with plain HTTP. No browser. No proxy. No bot detection.
Today we are shipping /v1/discover, an Ilmenite endpoint that automates exactly this workflow. Give it a URL, it launches Chrome, captures every XHR and fetch call the page makes, probes each one to see if it works without a browser, and returns the list of discovered APIs with a recommendation for how to call each one.
The problem with scraping the page
Consider a site like hn.algolia.com — the Hacker News search frontend. Open it in Chrome and the page shows search results from Algolia. Under the hood, that page is a thin React app that fetches data from Algolia's public API.
A traditional scraper would render the React app through headless Chrome, wait for hydration, then parse the resulting DOM. It works but it is slow (2-4 seconds per page), expensive (credits-per-Chrome-launch), and fragile (a React update breaks your selectors).
The actual data lives one layer deeper. Every search hits an API endpoint like:
https://uj5wyc0l7x-dsn.algolia.net/1/indexes/Item_dev/query?x-algolia-api-key=28f0e1ec37a5e792e6845e67da5f20dd&x-algolia-application-id=UJ5WYC0L7X
The API key is right there in the URL. The endpoint returns JSON. It has no Cloudflare protection because Algolia is not Cloudflare. A single curl gives you structured data in under 100 milliseconds. But nobody told you this endpoint existed.
That is the gap /v1/discover closes.
How /v1/discover works
The endpoint runs a four-step workflow that mirrors what you would do manually with DevTools, except it completes in one HTTP call:
- Launch Chrome. Ilmenite spins up a fresh, stealth-configured Chromium instance with the same anti-detection measures it uses for regular scraping.
- Capture network traffic. Using the Chrome DevTools Protocol's Network domain, every XHR, fetch, GraphQL call, and resource request made during the page load is recorded: URL, method, status, headers, mime type, response size.
- Probe each captured request. For every API call captured, Ilmenite runs a transport ladder: first direct-http (a plain reqwest GET with no extra headers), then matched-headers (the same request but with the headers Chrome sent). If the probe response matches the original's status and shape, that transport is marked as working.
- Return a structured report. You get back a JSON array of discovered APIs with a probe.recommended field telling you exactly how to replay each one, or null if the request needs a browser.
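The probe step can be sketched in a few lines of Python. This is an illustration of the ladder idea, not Ilmenite's implementation: `run_ladder` and `try_transport` are hypothetical names, and the sketch compares status codes only, whereas the real probe also compares response shape. It assumes the ladder stops at the first transport that works, which is consistent with the single attempt recorded for the analytics beacon in the example response.

```python
from typing import Callable, Optional

# Transport order as described in the article; everything else is illustrative.
LADDER = ["direct-http", "matched-headers"]


def run_ladder(original_status: int, try_transport: Callable[[str], int]) -> dict:
    """Try each transport in order and recommend the first one that matches.

    `try_transport` is a hypothetical callback that replays the request with
    the given transport and returns the HTTP status it got back.
    """
    attempts = []
    recommended: Optional[str] = None
    for transport in LADDER:
        status = try_transport(transport)
        ok = status == original_status  # simplification: status only, no shape check
        attempts.append({"transport": transport, "status": status, "ok": ok})
        if ok:
            recommended = transport
            break  # assumed: stop climbing once a transport works
    return {"recommended": recommended, "attempts": attempts}
```

If every rung fails, `recommended` stays `None`, mirroring the `null` verdict in the report.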
The whole process takes 5-10 seconds and costs 10 credits.
A real example
Let us run it against hn.algolia.com:
curl https://api.ilmenite.dev/v1/discover \
  -H "Authorization: Bearer $ILMENITE_API_KEY" \
  -d '{
    "url": "https://hn.algolia.com/",
    "wait_ms": 5000,
    "probe": true
  }'
The response:
{
  "url": "https://hn.algolia.com/",
  "total_requests": 2,
  "elapsed_ms": 8165,
  "discovered": [
    {
      "method": "GET",
      "url": "https://www.google-analytics.com/j/collect?v=1&_v=j102&...",
      "status": 200,
      "resource_type": "xhr",
      "mime_type": "text/plain",
      "probe": {
        "recommended": "direct-http",
        "attempts": [
          {
            "transport": "direct-http",
            "status": 200,
            "ok": true,
            "duration_ms": 29
          }
        ]
      }
    },
    {
      "method": "GET",
      "url": "https://uj5wyc0l7x-dsn.algolia.net/1/indexes/Item_dev/query?x-algolia-api-key=28f0e1ec37a5e792e6845e67da5f20dd&x-algolia-application-id=UJ5WYC0L7X",
      "status": 200,
      "resource_type": "xhr",
      "mime_type": "application/json",
      "probe": {
        "recommended": null,
        "attempts": [
          { "transport": "direct-http", "status": 404, "ok": false },
          { "transport": "matched-headers", "status": 404, "ok": false }
        ]
      }
    }
  ]
}
Two things to notice. First, the analytics beacon is marked direct-http — it can be called from any HTTP client. Second, the Algolia endpoint is honestly flagged as "needs more work": our GET-only probe returns 404 because the real request is a POST with a JSON body. But the Algolia API key is now in your hands. You know the endpoint. You know the application ID. One curl with the right POST body and you are pulling structured search results — no Chrome, no rendering, no Cloudflare.
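To make "one curl with the right POST body" concrete, here is a minimal Python sketch that builds that request with the standard library. The URL is copied from the response above. The body fields (`query`, `hitsPerPage`) are an assumption based on Algolia's general search API conventions, not something the discover report tells you, so check them against the real request in DevTools. `build_search_request` and `search` are illustrative names.

```python
import json
import urllib.request

# URL copied verbatim from the discover response above.
ALGOLIA_URL = (
    "https://uj5wyc0l7x-dsn.algolia.net/1/indexes/Item_dev/query"
    "?x-algolia-api-key=28f0e1ec37a5e792e6845e67da5f20dd"
    "&x-algolia-application-id=UJ5WYC0L7X"
)


def build_search_request(query: str, hits_per_page: int = 5) -> urllib.request.Request:
    """Build the POST request without sending it."""
    # Assumed body shape; verify against the request the page actually sends.
    body = json.dumps({"query": query, "hitsPerPage": hits_per_page}).encode("utf-8")
    return urllib.request.Request(
        ALGOLIA_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def search(query: str) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(build_search_request(query), timeout=10) as resp:
        return json.load(resp)
```

Calling `search("rust")` would issue the request. Building and sending are kept separate so you can inspect the request before it goes out.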
Why this beats stealth mode
Ilmenite already ships with a sophisticated stealth stack: 20+ Chrome launch flags, CDP-level user-agent overrides, JavaScript injection to hide navigator.webdriver, 6 realistic user-agent presets, and full Chrome-matching HTTP headers. It handles most bot-protected sites. But some sites (the Uniswap frontends, the e-commerce giants) sit behind Cloudflare Enterprise Bot Management, which detects headless Chrome at the TLS fingerprint layer before a single line of JavaScript runs. No amount of stealth fixes that.
The insight from /v1/discover is that you do not need to fix it. Cloudflare protects the HTML page because that is the user-facing surface. The backend APIs that page calls are usually on a different subdomain, behind a different CDN, or not behind Cloudflare at all. Protecting every API endpoint with bot challenges would break the company's own SPA. So they do not.
Once you know the API URL, you have three options:
- Call it directly with plain HTTP. Works if the probe returned direct-http. You just got a structured JSON endpoint for free.
- Use Ilmenite's existing /v1/scrape against the API URL. The scrape fast path uses pooled reqwest connections: 100ms per request, no Chrome overhead.
- Fall back to browser-level replay. If the API enforces Chrome TLS fingerprinting or session cookies, route the replay through Ilmenite's BYOP residential proxy feature. You still save on the browser-render cost.
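Choosing between these options starts with reading probe.recommended for each entry. The sketch below sorts a discover report into buckets. It assumes the response shape shown in the example above, and the bucket names and the `triage` function are made up for illustration.

```python
def triage(report: dict) -> dict:
    """Group discovered URLs by probe verdict.

    Assumes the report shape from the example response; bucket names are
    illustrative, not part of the API.
    """
    buckets = {"direct": [], "matched_headers": [], "needs_work": []}
    for entry in report.get("discovered", []):
        # `probe` may be absent if probing was disabled, so default to {}.
        recommended = (entry.get("probe") or {}).get("recommended")
        if recommended == "direct-http":
            buckets["direct"].append(entry["url"])
        elif recommended == "matched-headers":
            buckets["matched_headers"].append(entry["url"])
        else:
            # null verdict: wrong method, missing body, or browser required
            buckets["needs_work"].append(entry["url"])
    return buckets
```

Entries in `direct` are candidates for the first option. Everything in `needs_work` deserves a manual look before you reach for browser-level replay, since a null verdict can simply mean the probe used the wrong method.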
What you can do with it
A few workflows that become trivial once you have the API URL:
- Price monitoring. Run /v1/discover against a product page once. Find the pricing API. Call it every 10 minutes from a cron job for pennies.
- Search aggregation. Discover the underlying search API of any SPA. Build a unified search layer across sites without rendering a single browser.
- RAG data pipelines. Find the content API behind a docs site. Pull structured JSON for every page instead of scraping HTML and parsing markdown.
- Competitive intelligence. See exactly which third-party APIs a competitor integrates with — payments, analytics, recommendations, feature flags. Every XHR is visible.
- Reverse-engineering for your own frontend. Forgot what internal API your team built three quarters ago? Run discover against staging. Get a full map in 10 seconds.
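For the competitive-intelligence and API-mapping workflows, a useful first pass is to group the captured requests by host. This is a small sketch under the assumption that each discovered entry carries the `method` and `url` fields shown in the example response. `hosts_contacted` is a hypothetical helper name.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def hosts_contacted(report: dict) -> dict:
    """Map each hostname to the list of "METHOD /path" calls seen for it."""
    by_host = defaultdict(list)
    for entry in report.get("discovered", []):
        parts = urlsplit(entry["url"])
        # Query strings are dropped so keys and tracking params do not add noise.
        by_host[parts.hostname].append(f'{entry["method"]} {parts.path}')
    return dict(by_host)
```

Run against the hn.algolia.com report, this would list one call under www.google-analytics.com and one under uj5wyc0l7x-dsn.algolia.net, which is already enough to see which third parties the page talks to.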
Try it now
/v1/discover is live today for every Ilmenite account, free tier included. It costs 10 credits per call, which makes sense once you see how much browser time it saves you downstream.
The easiest way to try it is the dashboard playground at ilmenite.dev/dashboard/discover — paste a URL, click Run, see the captured APIs with per-endpoint probe verdicts (green for "call directly", red for "needs browser"). Copy any URL with one click. Expand any row to see the full probe attempts.
If you are wiring it into an agent, the endpoint is POST https://api.ilmenite.dev/v1/discover. Full reference at ilmenite.dev/docs. The x-algolia-api-key in our example response is a real key, and exposing it is not an accident: Algolia designs those keys to be public, and its docs tell you to expose them. But plenty of sites expose more than they should, and /v1/discover shows you exactly what.
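For the agent-wiring case, here is a minimal standard-library sketch of the same call as the curl example. The request fields (`url`, `wait_ms`, `probe`) and the Bearer header come from that example. The `Content-Type` header and the function names are assumptions added for illustration, so check the reference docs before relying on them.

```python
import json
import os
import urllib.request

DISCOVER_URL = "https://api.ilmenite.dev/v1/discover"


def build_discover_request(target_url: str, api_key: str,
                           wait_ms: int = 5000, probe: bool = True) -> urllib.request.Request:
    """Build the POST request for /v1/discover without sending it."""
    payload = {"url": target_url, "wait_ms": wait_ms, "probe": probe}
    return urllib.request.Request(
        DISCOVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: not shown in the curl example
        },
    )


def discover(target_url: str) -> dict:
    """Call the endpoint and return the decoded report (performs network I/O)."""
    req = build_discover_request(target_url, os.environ["ILMENITE_API_KEY"])
    # Generous timeout: the article quotes 5-10 seconds per call.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```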
Stop fighting the front door. Go around it.