Tutorial · March 27, 2026 · 5 min · Ilmenite Team

LlamaIndex + Ilmenite — Loading Live Web Data

AI agents and RAG (Retrieval-Augmented Generation) pipelines are only as good as the data they can access. This guide shows you how to implement LlamaIndex web scraping with Ilmenite, turning any URL into clean, LLM-ready markdown that can be indexed and queried in natural language.

What we're building

We are building a RAG pipeline that takes a live URL, converts its content into clean markdown via the Ilmenite API, and loads that data into a LlamaIndex vector store. Once indexed, you can ask complex questions about the website's content, and the LLM will answer using the most relevant chunks of the scraped page. This removes the need to manually download HTML or manage headless browser infrastructure.

Prerequisites

To follow this tutorial, you will need:

  • An Ilmenite API key (the free tier provides $5 in balance every month).
  • An OpenAI API key (or any other LLM provider supported by LlamaIndex).
  • Python 3.9+ installed on your machine.
  • The following Python packages:
    • llama-index
    • requests

You can install the dependencies via pip:

pip install llama-index requests

Why Ilmenite for LlamaIndex Web Scraping?

Many developers reach for Puppeteer or Playwright for web scraping, but running headless Chrome at scale is resource-intensive. Each Chrome instance can consume 200-500 MB of RAM, and cold starts are slow.

Ilmenite is different. It is built in pure Rust, so static scrapes run on a fast path with a small memory footprint. For a RAG pipeline, this means faster data ingestion and significantly lower infrastructure costs.

Furthermore, LLMs struggle with raw HTML. HTML is filled with boilerplate, navigation menus, and script tags that waste tokens and confuse the model. Ilmenite's /v1/scrape endpoint strips this noise and returns clean markdown. Markdown preserves the structural hierarchy (headers, lists, links) that LlamaIndex needs for effective chunking and embedding, without the overhead of HTML tags.

Implementing LlamaIndex Web Scraping Step-by-Step

Step 1: Scraping the page with Ilmenite

The first step is to fetch the content of a webpage. We use the /v1/scrape endpoint, which handles JavaScript rendering (React, Vue, Next.js) automatically.

In the example below, we request the output in markdown format. This is the default and most efficient format for RAG pipelines.

import requests

def fetch_web_content(url, api_key):
    endpoint = "https://api.ilmenite.dev/v1/scrape"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "format": "markdown"
    }
    
    # Set a timeout so a stalled connection cannot hang the pipeline
    response = requests.post(endpoint, json=payload, headers=headers, timeout=30)
    
    if response.status_code == 200:
        return response.json().get("markdown")
    else:
        raise Exception(f"API Error: {response.status_code} - {response.text}")

# Example usage
ILMENITE_API_KEY = "your_ilmenite_key"
url = "https://example.com/blog-post"
content = fetch_web_content(url, ILMENITE_API_KEY)
print(content)

This single API call replaces an entire browser management stack. Because Ilmenite is a hosted API (no Docker to deploy) with a Rust-based engine, per-request latency stays low on the fast path, so ingestion is less likely to become the bottleneck in your pipeline.
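
Even a fast hosted API can fail transiently, so it may be worth wrapping the call in a retry. The sketch below is one way to do that with exponential backoff. The helper name `fetch_with_retry` and its parameters are hypothetical, and it retries on any exception because `fetch_web_content` above raises a plain `Exception`; a production version would likely retry only on errors known to be temporary.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff when it raises.

    `fetch` is any callable that takes a URL and returns markdown.
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # assumption: every failure is worth retrying
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... between attempts
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

With the function from this step, usage would look like `fetch_with_retry(lambda u: fetch_web_content(u, ILMENITE_API_KEY), url)`.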

Step 2: Loading data into LlamaIndex

Once we have the markdown string, we need to wrap it in a LlamaIndex Document object. LlamaIndex uses these objects to manage the text before it is split into chunks and converted into embeddings.

from llama_index.core import Document

# Convert the markdown string into a LlamaIndex Document
document = Document(
    text=content,
    metadata={
        "source": url,
        "title": "Example Page"
    }
)

Adding the URL to the metadata attaches it to every chunk created from this document, so each retrieved chunk can be traced back to its source. This makes it possible to show citations alongside answers, which helps users verify responses from production AI agents.
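
The hard-coded "Example Page" title can be replaced with one derived from the scraped markdown itself. The helper below is a minimal sketch that takes the first heading it finds. The name `title_from_markdown` is hypothetical, and it assumes the page uses `#`-style headings; a `#` comment inside a code sample near the top of the page would be mistaken for a heading.

```python
import re

# Matches "# Title", "## Title ##", etc. and captures the heading text
_HEADING = re.compile(r"^\s{0,3}#{1,6}\s+(.*?)\s*#*\s*$")


def title_from_markdown(markdown, fallback="Untitled"):
    """Return the text of the first '#'-style heading, or the fallback."""
    for line in markdown.splitlines():
        match = _HEADING.match(line)
        if match and match.group(1):
            return match.group(1)
    return fallback
```

In the Document above, this would become `"title": title_from_markdown(content, fallback=url)`.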

Step 3: Creating the Vector Index

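Now we take the document and create a VectorStoreIndex. LlamaIndex automatically chunks the markdown text, computes an embedding for each chunk, and stores the embeddings in an in-memory vector store. Because this step calls the embedding model, the default OpenAI setup requires the OPENAI_API_KEY environment variable to be set before you run it.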

from llama_index.core import VectorStoreIndex

# Build the index from the document list
index = VectorStoreIndex.from_documents([document])
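
To see why heading structure matters for chunking, here is a simplified, standalone illustration that splits markdown into sections at each heading. It is not what `from_documents` does internally: by default LlamaIndex splits by sentences and chunk size, and it provides `MarkdownNodeParser` for heading-aware splitting. The function name `split_by_headings` is hypothetical.

```python
def split_by_headings(markdown):
    """Split markdown into (heading, body) pairs at each '#'-style heading.

    Text before the first heading is returned with a heading of None.
    """
    sections = []
    heading, body = None, []

    def flush():
        # Keep a section if it has a heading or any non-blank body text
        if heading is not None or any(line.strip() for line in body):
            sections.append((heading, "\n".join(body).strip()))

    for line in markdown.splitlines():
        is_heading = line.startswith("#") and line.lstrip("#").startswith(" ")
        if is_heading:
            flush()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return sections
```

Each section keeps its heading next to its text, which is the kind of context that HTML boilerplate tends to bury.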

If you are building a larger system, you can replace the in-memory store with a production database like Pinecone, Weaviate, or Qdrant.

Step 4: Querying the live web data

The final step is to create a query engine. This allows you to ask questions in natural language. The engine will search the vector index for the most relevant markdown chunks and feed them to the LLM as context.

query_engine = index.as_query_engine()
response = query_engine.query("What are the main arguments presented in this article?")
print(response)
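
The response object also carries the chunks that were retrieved, in `response.source_nodes`, each with a similarity score and the metadata set in Step 2. The helper below is a small sketch for listing which URLs an answer drew on. The name `list_sources` is hypothetical, and it assumes the metadata uses the `"source"` key from Step 2.

```python
def list_sources(response):
    """Return unique (source, score) pairs in retrieval order.

    Keeps the first (highest-ranked) score seen for each source URL.
    """
    seen = set()
    sources = []
    for item in getattr(response, "source_nodes", []):
        source = item.node.metadata.get("source", "unknown")
        if source not in seen:
            seen.add(source)
            sources.append((source, item.score))
    return sources
```

Printing `list_sources(response)` after the query gives a quick way to check that answers are grounded in the page you scraped.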

Scaling from one page to a whole site

The /v1/scrape endpoint is perfect for single pages. However, if you are building a comprehensive knowledge base, you will need to index entire domains.

For this, you should use the /v1/crawl endpoint. Instead of one URL, you provide a starting point and a depth limit. Ilmenite will discover all reachable pages, render them, and return the markdown for each.
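
If you already know the handful of URLs you want, a loop over /v1/scrape may be enough before moving to a full crawl. The sketch below collects markdown for a list of URLs and records failures instead of stopping at the first one. The name `collect_pages` is hypothetical, and `fetch` stands for any function that takes a URL and returns markdown. The request and response format of /v1/crawl is not covered in this post, so the sketch does not guess at it.

```python
def collect_pages(urls, fetch):
    """Fetch each URL once and return (pages, failures).

    pages maps url -> markdown; failures maps url -> error message.
    """
    pages, failures = {}, {}
    for url in dict.fromkeys(urls):  # de-duplicate while keeping order
        try:
            markdown = fetch(url)
        except Exception as exc:
            failures[url] = str(exc)
            continue
        if markdown:
            pages[url] = markdown
        else:
            failures[url] = "empty response"
    return pages, failures
```

Each entry in `pages` can then be wrapped as `Document(text=markdown, metadata={"source": url})` and the whole list passed to `VectorStoreIndex.from_documents`.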

Cost Comparison:

Operation      Cost              Use Case
/v1/scrape     $0.001            Single page analysis
/v1/crawl      $0.001 per page   Indexing a full documentation site
/v1/extract    $0.005            Getting structured JSON from a page

You can find more details on these operations in the Ilmenite documentation.
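
To make the numbers concrete, here is a small estimator built from the prices in the table above and the $5 monthly free-tier balance from the prerequisites. It assumes flat per-operation billing with no other charges, which is a simplification; the function names are hypothetical.

```python
from decimal import Decimal

# Prices per operation, copied from the table above (USD)
PRICES = {
    "scrape": Decimal("0.001"),
    "crawl_page": Decimal("0.001"),
    "extract": Decimal("0.005"),
}


def estimate_cost(scrapes=0, crawled_pages=0, extracts=0):
    """Estimated total in USD for a mix of operations."""
    return (
        scrapes * PRICES["scrape"]
        + crawled_pages * PRICES["crawl_page"]
        + extracts * PRICES["extract"]
    )


def operations_within_budget(balance, operation="crawl_page"):
    """How many operations of one type a given balance covers."""
    return int(Decimal(str(balance)) // PRICES[operation])
```

Under these assumptions, crawling a 500-page documentation site comes to `estimate_cost(crawled_pages=500)`, or $0.50, and `operations_within_budget(5)` shows that a $5 balance covers 5,000 crawled pages.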

Full Working Code Example

Here is the complete implementation combining all the steps above.

import os
import requests
from llama_index.core import Document, VectorStoreIndex

# Configuration
ILMENITE_API_KEY = "your_ilmenite_key"
OPENAI_API_KEY = "your_openai_key"
TARGET_URL = "https://ilmenite.dev"

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

def get_markdown_from_ilmenite(url):
    """Fetches clean markdown from a URL using Ilmenite API."""
    endpoint = "https://api.ilmenite.dev/v1/scrape"
    headers = {
        "Authorization": f"Bearer {ILMENITE_API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "format": "markdown"
    }
    
    # Set a timeout so a stalled connection cannot hang the script
    response = requests.post(endpoint, json=payload, headers=headers, timeout=30)
    if response.status_code == 200:
        return response.json().get("markdown")
    else:
        print(f"Error fetching {url}: {response.text}")
        return None

def main():
    print(f"Scraping {TARGET_URL}...")
    markdown_content = get_markdown_from_ilmenite(TARGET_URL)
    
    if not markdown_content:
        print("Failed to retrieve content.")
        return

    # Create LlamaIndex Document
    doc = Document(
        text=markdown_content,
        metadata={"source": TARGET_URL}
    )

    # Index the document
    print("Indexing content...")
    index = VectorStoreIndex.from_documents([doc])

    # Query the index
    query_engine = index.as_query_engine()
    question = "What is Ilmenite and what are its performance benefits?"
    
    print(f"\nQuestion: {question}")
    response = query_engine.query(question)
    print(f"Answer: {response}")

if __name__ == "__main__":
    main()
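
The script above keeps both keys in the source file for simplicity. If you plan to commit or share it, one option is to read them from the environment and fail early when one is missing. This is a sketch: `load_api_keys` is a hypothetical helper, and `ILMENITE_API_KEY` as an environment variable name is our own convention here, while `OPENAI_API_KEY` is the variable the script already sets for LlamaIndex.

```python
import os


def load_api_keys(env=os.environ):
    """Read both API keys from the environment, or raise if any is missing."""
    keys, missing = {}, []
    for name in ("ILMENITE_API_KEY", "OPENAI_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            keys[name] = value
        else:
            missing.append(name)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return keys
```

The configuration block would then start with `keys = load_api_keys()` and use `keys["ILMENITE_API_KEY"]` in the request header.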

Next Steps

Now that you have a basic RAG pipeline running with live web data, you can expand its capabilities:

  1. Implement Site-wide Indexing: Use the /v1/crawl endpoint to load an entire documentation site into LlamaIndex instead of a single page.
  2. Structured Data Extraction: If you need specific fields (like product prices or dates), use the /v1/extract endpoint to get JSON instead of markdown.
  3. Optimize Costs: Check the pricing page to see how to top up your balance for higher concurrency and volume bonuses.
  4. Explore the Playground: Test different URLs and see the markdown output in real-time using the Ilmenite playground.

Ready to build your AI agent? Sign up for Ilmenite and start scraping for free.