Tutorial · April 9, 2026 · 5 min · Ilmenite Team

LlamaIndex + Ilmenite — Loading Live Web Data

AI agents and RAG (Retrieval-Augmented Generation) pipelines are only as good as the data they can access. This guide shows you how to implement LlamaIndex web scraping with Ilmenite, turning any URL into clean, LLM-ready markdown that can be indexed and queried in natural language.

What we're building

We are building a RAG pipeline that takes a live URL, converts its content into clean markdown via the Ilmenite API, and loads that data into a LlamaIndex vector store. Once indexed, you can ask complex questions about the website's content, and the LLM will answer using the most relevant chunks of the scraped page. This removes the need to manually download HTML or manage headless browser infrastructure.

Prerequisites

To follow this tutorial, you will need:

  • An Ilmenite API key (the free tier provides 500 credits/month).
  • An OpenAI API key (or any other LLM provider supported by LlamaIndex).
  • Python 3.9+ installed on your machine.
  • The following Python packages:
    • llama-index
    • requests

You can install the dependencies via pip:

pip install llama-index requests

Why Ilmenite for LlamaIndex Web Scraping?

Most developers use Puppeteer or Playwright for web scraping, but running headless Chrome at scale is resource-intensive. Each Chrome instance consumes 200-500MB of RAM and suffers from slow cold starts.

Ilmenite is different. It is built in pure Rust, which allows it to start in 0.19ms and use only 2MB of RAM per session. For a RAG pipeline, this means your data ingestion is faster and your infrastructure costs are significantly lower.

Furthermore, LLMs struggle with raw HTML. HTML is filled with boilerplate, navigation menus, and script tags that waste tokens and confuse the model. Ilmenite's /v1/scrape endpoint strips this noise and returns clean markdown. Markdown preserves the structural hierarchy (headers, lists, links) that LlamaIndex needs for effective chunking and embedding, without the overhead of HTML tags.
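To make the difference concrete, here is a quick illustration. The strings below are made-up examples, not real API output, but they show how much of a typical HTML page is boilerplate that markdown simply omits:

```python
# Illustrative comparison: the same content as raw HTML vs. clean markdown.
# (These strings are contrived examples, not actual Ilmenite output.)
html = (
    '<nav><ul><li><a href="/">Home</a></li><li><a href="/blog">Blog</a></li></ul></nav>'
    '<div class="post-wrapper"><h1 class="title">Hello</h1><p>A short post.</p></div>'
    '<script src="/analytics.js"></script>'
)
markdown = "# Hello\n\nA short post."

# The markdown carries the same information in a fraction of the characters,
# which translates directly into fewer tokens sent to the LLM.
print(len(html), len(markdown))
```

Fewer characters means fewer wasted tokens per embedded chunk, and none of the navigation or script noise that can confuse retrieval.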

Implementing LlamaIndex Web Scraping Step-by-Step

Step 1: Scraping the page with Ilmenite

The first step is to fetch the content of a webpage. We use the /v1/scrape endpoint, which handles JavaScript rendering (React, Vue, Next.js) automatically.

In the example below, we request the output in markdown format. This is the default and most efficient format for RAG pipelines.

import requests

def fetch_web_content(url, api_key):
    endpoint = "https://api.ilmenite.dev/v1/scrape"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "format": "markdown"
    }
    
    response = requests.post(endpoint, json=payload, headers=headers, timeout=60)
    
    if response.status_code == 200:
        return response.json().get("markdown")
    else:
        raise Exception(f"API Error: {response.status_code} - {response.text}")

# Example usage
ILMENITE_API_KEY = "your_ilmenite_key"
url = "https://example.com/blog-post"
content = fetch_web_content(url, ILMENITE_API_KEY)
print(content)

This single API call replaces an entire browser-management stack. Thanks to its Rust-based engine and 12MB Docker image, Ilmenite delivers a p95 API latency of just 47ms, so data ingestion won't become the bottleneck in your pipeline.
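Even with low latency, any network call can fail transiently (timeouts, rate limits). A minimal retry wrapper, shown below as our own sketch rather than part of any Ilmenite SDK, is worth adding around the scrape call in production:

```python
import time

def with_retry(call, attempts=3, base_delay=0.5):
    """Retry a zero-argument callable with exponential backoff (illustrative sketch)."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))  # waits 0.5s, 1s, 2s, ...

# Hypothetical usage with the function defined above:
# content = with_retry(lambda: fetch_web_content(url, ILMENITE_API_KEY))
```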

Step 2: Loading data into LlamaIndex

Once we have the markdown string, we need to wrap it in a LlamaIndex Document object. LlamaIndex uses these objects to manage the text before it is split into chunks and converted into embeddings.

from llama_index.core import Document

# Convert the markdown string into a LlamaIndex Document
document = Document(
    text=content,
    metadata={
        "source": url,
        "title": "Example Page"
    }
)

By adding the URL to the metadata, you ensure that the LLM can cite its sources when answering questions, which is critical for reducing hallucinations in production AI agents.
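If you are ingesting several pages, a small helper keeps the metadata consistent so every chunk can be traced back to its URL. The helper below is our own (it is not part of LlamaIndex); it just assembles the keyword arguments you would pass to the Document constructor:

```python
def build_doc_kwargs(markdown, url, title=None):
    """Assemble keyword arguments for a LlamaIndex Document (hypothetical helper)."""
    metadata = {"source": url}
    if title:
        metadata["title"] = title
    return {"text": markdown, "metadata": metadata}

# Hypothetical usage:
# document = Document(**build_doc_kwargs(content, url, title="Example Page"))
```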

Step 3: Creating the Vector Index

Now we will take the document and create a VectorStoreIndex. LlamaIndex will automatically handle the chunking of the markdown text and store the embeddings in an in-memory vector store.

from llama_index.core import VectorStoreIndex

# Build the index from the document list
index = VectorStoreIndex.from_documents([document])

If you are building a larger system, you can replace the in-memory store with a production database like Pinecone, Weaviate, or Qdrant.
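To build intuition for what the index does under the hood, here is a deliberately simplified, pure-Python sketch of header-based markdown chunking. LlamaIndex's real node parsers are far more sophisticated (sentence-aware splitting, overlap, token counting), so treat this only as a mental model:

```python
def chunk_markdown(text, max_chars=500):
    """Split markdown on header boundaries, capping chunk size (illustrative only)."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk whenever a header begins and we already have content
        if line.startswith("#") and current.strip():
            chunks.append(current.strip())
            current = ""
        current += line
        # Also flush if the chunk grows too large
        if len(current) >= max_chars:
            chunks.append(current.strip())
            current = ""
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Because markdown keeps the header hierarchy intact, splits land on natural section boundaries, which is exactly why markdown input chunks better than raw HTML.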

Step 4: Querying the live web data

The final step is to create a query engine. This allows you to ask questions in natural language. The engine will search the vector index for the most relevant markdown chunks and feed them to the LLM as context.

query_engine = index.as_query_engine()
response = query_engine.query("What are the main arguments presented in this article?")
print(response)

Scaling from one page to a whole site

The /v1/scrape endpoint is perfect for single pages. However, if you are building a comprehensive knowledge base, you will need to index entire domains.

For this, you should use the /v1/crawl endpoint. Instead of one URL, you provide a starting point and a depth limit. Ilmenite will discover all reachable pages, render them, and return the markdown for each.
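A sketch of how a crawl could feed LlamaIndex is shown below. The exact request and response shape of /v1/crawl is an assumption here (check the Ilmenite documentation for the real schema); the pages_to_records helper is our own and simply turns crawled pages into (text, metadata) records ready to become Document objects:

```python
import requests

def crawl_site(start_url, api_key, depth=2):
    """Call the /v1/crawl endpoint (payload and response shape are assumptions)."""
    resp = requests.post(
        "https://api.ilmenite.dev/v1/crawl",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"url": start_url, "depth": depth},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json().get("pages", [])

def pages_to_records(pages):
    """Turn crawled pages into text/metadata records, skipping empty pages."""
    return [
        {"text": p["markdown"], "metadata": {"source": p.get("url", "")}}
        for p in pages
        if p.get("markdown")
    ]

# Hypothetical usage:
# records = pages_to_records(crawl_site("https://docs.example.com", ILMENITE_API_KEY))
# docs = [Document(**r) for r in records]
```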

Credit Cost Comparison:

Operation      Credits        Use Case
/v1/scrape     1              Single page analysis
/v1/crawl      1 per page     Indexing a full documentation site
/v1/extract    5              Getting structured JSON from a page
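Since crawl credits scale with page count, it is worth estimating costs before indexing a large site. A tiny estimator based on the table above:

```python
# Credit costs per operation, taken from the comparison table above.
CREDIT_COST = {"scrape": 1, "crawl_per_page": 1, "extract": 5}

def estimate_credits(scrapes=0, crawled_pages=0, extracts=0):
    """Estimate total credits for a mix of scrape, crawl, and extract calls."""
    return (
        scrapes * CREDIT_COST["scrape"]
        + crawled_pages * CREDIT_COST["crawl_per_page"]
        + extracts * CREDIT_COST["extract"]
    )

# e.g. 2 single-page scrapes, a 100-page crawl, and 3 extractions:
print(estimate_credits(scrapes=2, crawled_pages=100, extracts=3))  # 105 credits... see below
```

Against the free tier's 500 credits/month, that budget covers roughly one mid-sized documentation site per month plus a handful of structured extractions.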

You can find more details on these operations in the Ilmenite documentation.

Full Working Code Example

Here is the complete implementation combining all the steps above.

import os
import requests
from llama_index.core import Document, VectorStoreIndex

# Configuration
ILMENITE_API_KEY = "your_ilmenite_key"
OPENAI_API_KEY = "your_openai_key"
TARGET_URL = "https://ilmenite.dev"

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

def get_markdown_from_ilmenite(url):
    """Fetches clean markdown from a URL using Ilmenite API."""
    endpoint = "https://api.ilmenite.dev/v1/scrape"
    headers = {
        "Authorization": f"Bearer {ILMENITE_API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "format": "markdown"
    }
    
    response = requests.post(endpoint, json=payload, headers=headers, timeout=60)
    if response.status_code == 200:
        return response.json().get("markdown")
    else:
        print(f"Error fetching {url}: {response.text}")
        return None

def main():
    print(f"Scraping {TARGET_URL}...")
    markdown_content = get_markdown_from_ilmenite(TARGET_URL)
    
    if not markdown_content:
        print("Failed to retrieve content.")
        return

    # Create LlamaIndex Document
    doc = Document(
        text=markdown_content,
        metadata={"source": TARGET_URL}
    )

    # Index the document
    print("Indexing content...")
    index = VectorStoreIndex.from_documents([doc])

    # Query the index
    query_engine = index.as_query_engine()
    question = "What is Ilmenite and what are its performance benefits?"
    
    print(f"\nQuestion: {question}")
    response = query_engine.query(question)
    print(f"Answer: {response}")

if __name__ == "__main__":
    main()

Next Steps

Now that you have a basic RAG pipeline running with live web data, you can expand its capabilities:

  1. Implement Site-wide Indexing: Use the /v1/crawl endpoint to load an entire documentation site into LlamaIndex instead of a single page.
  2. Structured Data Extraction: If you need specific fields (like product prices or dates), use the /v1/extract endpoint to get JSON instead of markdown.
  3. Optimize Costs: Check the pricing page to see how to move from the Free tier to the Developer or Pro tiers for higher concurrency and lower credit costs.
  4. Explore the Playground: Test different URLs and see the markdown output in real-time using the Ilmenite playground.

Ready to build your AI agent? Sign up for Ilmenite and start scraping for free.