Price Monitoring at Scale — Architecture Guide
Monitoring competitor prices at scale requires a reliable price monitoring API that can handle JavaScript rendering without the overhead of managing a browser cluster. Most e-commerce sites today use React, Next.js, or Vue, meaning a simple HTTP request often returns an empty shell instead of the actual price.
To build a production-grade monitor, you need to solve three problems: discovering product pages, extracting structured data, and running infrastructure that doesn't collapse under the weight of headless Chrome.
The problem with traditional price monitoring
Most developers start by using Puppeteer or Playwright. While these tools are powerful, they create a significant infrastructure burden. A single Chrome instance consumes 200-500MB of RAM. If you are monitoring 10,000 products daily, the memory leaks and process management overhead become a full-time job.
Beyond infrastructure, there is the "messy HTML" problem. Prices are often buried in deeply nested div tags or rendered dynamically via API calls after the page loads. Writing custom CSS selectors for every competitor is fragile; as soon as a site updates its UI, your scrapers break.
Finally, anti-bot protections like Cloudflare often block basic headless browsers. You need a tool that mimics a real browser environment but operates with the efficiency of a backend service.
The architecture of a price monitoring API system
A scalable price monitoring system consists of four primary stages: Discovery, Extraction, Storage, and Diffing.
1. Discovery (Mapping)
You cannot monitor what you cannot find. Instead of hardcoding URLs, use the /v1/map endpoint to discover all product URLs on a competitor's domain. This allows your system to automatically detect new product launches or category changes.
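The discovery stage can be sketched as follows. The `client.map` call and the shape of its response mirror the /v1/map endpoint described above, but the exact SDK method name and response fields are assumptions; check the Ilmenite documentation. The URL-filtering helper is pure Python and runs without an API key.

```python
# Sketch of the discovery stage. `client.map` and `result.urls` are
# assumed names mirroring the /v1/map endpoint -- verify against the SDK.

def filter_product_urls(urls, product_path="/p/"):
    """Keep only URLs that look like product pages (the path is an assumption)."""
    return sorted(u for u in urls if product_path in u)

def discover_products(client, domain):
    # Hypothetical call: ask the API to crawl the domain's link graph
    result = client.map(url=domain)
    return filter_product_urls(result.urls)

# The pure part can be exercised locally:
sample = [
    "https://competitor.com/p/gaming-laptop-x1",
    "https://competitor.com/about",
    "https://competitor.com/p/wireless-mouse-z2",
]
print(filter_product_urls(sample))
```

Filtering discovered URLs down to product pages keeps you from wasting extraction credits on category and marketing pages.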
2. Structured Extraction
Once you have the URLs, you need the data. Rather than parsing raw HTML, use the /v1/extract endpoint. This endpoint allows you to pass a JSON schema. Ilmenite renders the JavaScript, strips the boilerplate, and returns exactly the fields you requested.
3. Storage and State
Store the extracted price, currency, and timestamp in a database (such as PostgreSQL or MongoDB). You must maintain a state for each product to compare the current scrape against the previous one.
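A minimal sketch of the state table this stage needs is shown below. PostgreSQL is the target described above; the stdlib `sqlite3` module is used here only so the snippet runs without a database server, and the DDL and upsert translate almost directly (column types and names are illustrative).

```python
import sqlite3

# In-memory stand-in for the Postgres state table described above.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE product_prices (
        id         TEXT PRIMARY KEY,   -- product identifier
        price      REAL NOT NULL,
        currency   TEXT NOT NULL,
        scraped_at TEXT NOT NULL       -- ISO-8601 timestamp
    )
""")

def upsert_price(conn, product_id, price, currency, ts):
    # Insert a new row, or overwrite the stored price for an existing product
    conn.execute(
        "INSERT INTO product_prices (id, price, currency, scraped_at) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT (id) DO UPDATE SET price = excluded.price, "
        "scraped_at = excluded.scraped_at",
        (product_id, price, currency, ts),
    )
    conn.commit()

upsert_price(db, "gaming-laptop-x1", 1299.00, "USD", "2024-01-01T00:00:00Z")
upsert_price(db, "gaming-laptop-x1", 1199.00, "USD", "2024-01-02T00:00:00Z")
print(db.execute("SELECT price FROM product_prices WHERE id = ?",
                 ("gaming-laptop-x1",)).fetchone()[0])
```

The `ON CONFLICT ... DO UPDATE` upsert means each product keeps exactly one current row of state, which is all the diff engine needs.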
4. Diffing and Alerting
A background worker compares the new price with the stored price. If the difference exceeds a certain percentage, the system triggers an alert via a webhook, Slack, or email.
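The diff step reduces to a small pure function. The 5% threshold below is illustrative, not a recommendation:

```python
def should_alert(old_price, new_price, threshold_pct=5.0):
    """Return (alert?, percent change). Threshold value is illustrative."""
    if old_price is None or old_price == 0:
        return False, 0.0        # no baseline yet -> nothing to diff
    change = (new_price - old_price) / old_price * 100
    return abs(change) >= threshold_pct, change

print(should_alert(100.0, 89.0))   # 11% drop, exceeds threshold
print(should_alert(100.0, 101.0))  # 1% rise, below threshold
```

Keeping this logic pure makes it trivial to unit-test the alerting policy separately from the scraping and storage layers.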
System Flow:
Competitor Domain → /v1/map → Product URL List → /v1/extract → Database → Diff Engine → Alert
Implementation with Python
Below is a professional implementation using the Ilmenite Python SDK. This example demonstrates how to extract prices from a list of URLs and trigger an alert on change.
Prerequisites
Install the SDK and a database client:
pip install ilmenite-sdk psycopg2-binary
Complete Implementation
import os
import psycopg2
from ilmenite import Ilmenite

# Initialize Ilmenite client
client = Ilmenite(api_key=os.environ.get("ILMENITE_API_KEY"))

# Database connection for state management
db = psycopg2.connect("dbname=prices user=admin password=secret")
cur = db.cursor()

# Define the schema for the price monitoring API extraction
# This ensures we get structured JSON back, not messy HTML
price_schema = {
    "product_name": "string",
    "price": "number",
    "currency": "string",
    "in_stock": "boolean"
}

def monitor_prices(urls):
    for url in urls:
        try:
            # Use /v1/extract to get structured data
            # This handles JS rendering automatically
            result = client.extract(
                url=url,
                schema=price_schema
            )
            data = result.data
            product_id = data['product_name']
            current_price = data['price']

            # Fetch previous price from DB
            cur.execute("SELECT price FROM product_prices WHERE id = %s",
                        (product_id,))
            row = cur.fetchone()

            if row:
                old_price = row[0]
                if current_price != old_price:
                    trigger_alert(product_id, old_price, current_price)

            # Update DB with latest price
            cur.execute(
                "INSERT INTO product_prices (id, price) VALUES (%s, %s) "
                "ON CONFLICT (id) DO UPDATE SET price = EXCLUDED.price",
                (product_id, current_price)
            )
            db.commit()
        except Exception as e:
            print(f"Error scraping {url}: {e}")

def trigger_alert(name, old, new):
    diff = ((new - old) / old) * 100
    print(f"ALERT: {name} price changed from {old} to {new} ({diff:.2f}%)")

# Example usage
product_urls = [
    "https://competitor.com/p/gaming-laptop-x1",
    "https://competitor.com/p/wireless-mouse-z2"
]
monitor_prices(product_urls)
Results and performance
When running this architecture, the primary bottleneck is usually browser startup time and memory consumption. Because Ilmenite is built in pure Rust, it eliminates the "Chrome Tax."
Infrastructure Comparison
| Metric | Traditional Chrome Cluster | Ilmenite API |
|---|---|---|
| Cold Start Time | 500ms - 2,000ms | 0.19ms |
| RAM per Session | 200MB - 500MB | ~2MB |
| Deployment | Heavy Docker / Kubernetes | Single Binary / 12MB Image |
| Latency (p95) | 200ms - 2,000ms | 47ms |
In a traditional setup, a $5/month VPS might struggle to run 10 concurrent Chrome sessions without swapping to disk. With Ilmenite, that same server can handle 1,000 concurrent sessions. This represents a 100x reduction in memory requirements.
From a cost perspective, you pay per operation via credits rather than paying for "browser-hours." A standard scrape costs 1 credit, while an LLM-powered extraction costs 5 credits. This makes the cost per product monitored predictable and significantly lower than maintaining a dedicated headless browser fleet.
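Using the rates quoted above (1 credit per standard scrape, 5 per LLM-powered extraction), the monthly budget for a fleet of monitored products is simple arithmetic:

```python
# Back-of-envelope credit math using the per-operation rates stated above.
PRODUCTS = 10_000       # products monitored
CHECKS_PER_DAY = 1      # one price check per product per day
DAYS = 30               # billing month

standard_credits = PRODUCTS * CHECKS_PER_DAY * DAYS * 1   # standard scrape
llm_credits = PRODUCTS * CHECKS_PER_DAY * DAYS * 5        # LLM extraction

print(standard_credits)  # 300000
print(llm_credits)       # 1500000
```

Because credits scale linearly with checks, doubling your check frequency doubles cost; that predictability is the point of per-operation pricing.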
Going further with your price monitoring API
Once the basic pipeline is operational, you can optimize for scale and accuracy.
Handling Complex SPAs
While Ilmenite's native Rust-based JavaScript engine (Boa) handles most sites, some extremely complex Single Page Applications (SPAs) require full V8 compatibility. In these cases, you can enable Chrome rendering in your request. This increases the credit cost to 3 credits but ensures 100% compatibility with the most complex web apps.
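Opting into Chrome rendering is a per-request decision. The parameter name below (`render="chrome"`) is an assumption for illustration; consult the API reference for the actual flag:

```python
# The exact parameter for Chrome rendering is an assumption, shown here
# as render="chrome". Only pay the higher credit cost where Boa falls short.

def build_extract_request(url, schema, needs_full_v8=False):
    payload = {"url": url, "schema": schema}
    if needs_full_v8:
        payload["render"] = "chrome"   # hypothetical flag; 3 credits per call
    return payload

req = build_extract_request(
    "https://competitor.com/p/gaming-laptop-x1",
    {"price": "number"},
    needs_full_v8=True,
)
print(req["render"])
```

A practical pattern is to default to the native engine and flip this flag only for the handful of domains where extraction comes back empty.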
Enterprise Scaling and Self-Hosting
For teams with strict data residency requirements or massive scale (millions of pages per day), the Enterprise tier allows you to self-host Ilmenite as a single binary or Docker container. Because the image is only 12MB, it can be deployed into sidecar containers or edge functions with almost zero overhead.
AI-Driven Analysis
Instead of simple diffing, you can pipe the markdown output from Ilmenite into an LLM to analyze why a price changed. By combining the search endpoint with the extract endpoint, your agent can determine if a competitor's price drop is part of a wider seasonal sale or a targeted promotion.
If you are building an autonomous agent to handle this entire workflow, you can use the MCP integration to give Claude native access to these browsing capabilities.
Ready to build your monitor? Sign up for a free account and start with 500 free credits.