Use Case · April 9, 2026 · 5 min · Ilmenite Team

Building an AI Research Assistant That Reads the Web

LLMs are limited by their training cut-off dates and tend to hallucinate when they lack specific, real-time information. To build a functional AI research assistant for the web, you must give the model a way to browse the live internet and consume data in a format it understands.

The challenge isn't just getting the data, but getting it clean. Raw HTML is noisy, filled with navigation menus, scripts, and ads that waste tokens and confuse the model. You need a pipeline that converts a user's question into a set of clean, markdown-formatted documents that an LLM can synthesize into an accurate answer.

The problem with traditional web browsing for AI

Most developers attempt to build research assistants by wrapping a headless browser like Playwright or Puppeteer. While these tools are powerful, they introduce significant infrastructure overhead.

A single Chrome instance consumes between 200MB and 500MB of RAM. If your research assistant scrapes five sources in parallel to answer a single query, that is 1GB to 2.5GB of RAM for one request, along with the risk of process crashes. Furthermore, the "cold start" time for these browsers (the time it takes to launch the process and navigate to a page) is often between 500ms and 2,000ms.

Then there is the data cleaning problem. LLMs do not need <div> tags or CSS classes; they need the actual content. Manually writing selectors for every website your assistant might visit is impractical. You need a system that automatically strips the boilerplate and returns clean markdown.
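
To make "stripping the boilerplate" concrete, here is a deliberately naive sketch that uses only Python's standard library. It is not how Ilmenite works, and it returns plain text rather than markdown; the list of noise tags is an assumption chosen for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as page chrome in this sketch (an assumption).
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text while skipping anything nested inside noise tags."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # > 0 while we are inside a noise tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.noise_depth == 0:
            self.parts.append(text)


def strip_boilerplate(html):
    """Return the text outside noise tags, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav><a href='/'>Home</a></nav>"
    "<div class='post'><h1>Title</h1><p>Actual content.</p></div>"
    "<script>track()</script></body></html>"
)
print(strip_boilerplate(sample))
```

Even this toy version shows the gap: the sample page is mostly markup, yet only two short lines of it are content. Real pages need far more than a tag blocklist, which is why hand-rolled cleaning rarely scales.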

The architecture of an AI research assistant

To solve these problems, we use a four-stage pipeline: query expansion and search, parallel scraping, context window management, and synthesis. Instead of managing browser infrastructure, we use Ilmenite, a web scraping API built in Rust that handles rendering and cleaning in a single call.

1. Query Expansion and Search

The process begins when a user asks a question. The assistant uses an LLM to turn that question into a search query. This query is sent to the /v1/search endpoint.

The search endpoint performs a web search and returns the top relevant URLs. By combining search and scraping into one API, you eliminate the need to manage separate search API keys and custom parsing logic for search engine result pages (SERPs).
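
The query expansion step can be sketched as a small function that wraps whichever model you use. The `complete` callable and the prompt wording here are assumptions for illustration, not part of any specific SDK; passing the model in as a function keeps the sketch testable without network access.

```python
def build_expansion_prompt(question):
    """Prompt asking the model for a single search-engine-friendly query."""
    return (
        "Rewrite the question below as one concise web search query. "
        "Return only the query, with no quotes or explanation.\n\n"
        f"Question: {question}"
    )


def expand_query(question, complete):
    """Turn a user question into a search query.

    `complete` is any callable that maps a prompt string to model text
    (hypothetical; wire it to your own LLM client).
    """
    raw = complete(build_expansion_prompt(question))
    # Models sometimes add quotes or extra lines; keep the first non-empty line.
    lines = [line.strip().strip("\"'") for line in raw.splitlines() if line.strip()]
    # Fall back to the original question if the model returned nothing usable.
    return lines[0] if lines and lines[0] else question


# Stand-in for a real model call, used only to demonstrate the flow.
def fake_model(prompt):
    return '"rust browser engine benchmarks vs chrome"\n'


print(expand_query("How do Rust browser engines compare to Chrome?", fake_model))
```

The fallback matters in practice: if the model returns an empty string, searching with the raw question is usually better than searching with nothing.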

2. Parallel Scraping

Once the assistant has a list of URLs, it must fetch the content. This is where performance becomes critical. Because Ilmenite is built in pure Rust, it has a cold start time of 0.19ms and uses only ~2MB of RAM per session.

The assistant sends these URLs to the /v1/scrape endpoint. Ilmenite renders the JavaScript (handling React, Vue, or Next.js sites) and strips away the noise. The result is clean markdown that preserves the structure of the page without the HTML clutter.
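
One way to issue the scrape calls in parallel is a thread pool, since the work is I/O-bound. This is a minimal sketch: `fetch` is a hypothetical callable that takes a URL and returns markdown (for example, a thin wrapper around a POST to /v1/scrape), and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=5):
    """Run `fetch(url)` for every URL concurrently; return {url: markdown}.

    `fetch` is a hypothetical callable that returns markdown text or raises.
    Failed or empty pages are skipped so one bad page does not sink the query.
    """
    def safe_fetch(url):
        try:
            return url, fetch(url)
        except Exception:
            return url, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_fetch, urls))  # preserves input order
    return {url: text for url, text in results if text}


# Stand-in fetcher used only to demonstrate the behavior.
def fake_fetch(url):
    if "bad" in url:
        raise RuntimeError("simulated scrape failure")
    return f"# Page at {url}"


pages = scrape_all(
    ["https://a.example", "https://bad.example", "https://b.example"],
    fake_fetch,
)
print(list(pages))
```

With this shape, total retrieval time is roughly that of the slowest page rather than the sum of all pages.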

3. Context Window Management

The assistant receives several markdown documents. Since even large context windows have limits, the assistant may need to chunk the text or use a RAG (Retrieval-Augmented Generation) approach to find the most relevant snippets across the scraped pages.
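
As a rough illustration of the chunk-and-select idea, the sketch below splits each document on blank lines, packs paragraphs into chunks, and ranks chunks by keyword overlap with the question. Keyword overlap is a simple stand-in for the embedding-based retrieval a real RAG setup would use, and the size limits are arbitrary assumptions.

```python
import re


def chunk_text(text, max_chars=1000):
    """Split on blank lines, then pack paragraphs into chunks near max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def score(chunk, question):
    """Count how many distinct question words appear in the chunk."""
    question_words = set(re.findall(r"[a-z0-9]+", question.lower()))
    chunk_words = set(re.findall(r"[a-z0-9]+", chunk.lower()))
    return len(question_words & chunk_words)


def top_chunks(documents, question, k=3, max_chars=1000):
    """Return the k highest-scoring (url, chunk) pairs across all documents."""
    scored = [
        (score(chunk, question), url, chunk)
        for url, text in documents.items()
        for chunk in chunk_text(text, max_chars)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(url, chunk) for _, url, chunk in scored[:k]]


docs = {
    "https://a.example": "Cats purr.\n\nRust engines are fast.",
    "https://b.example": "Chrome uses more memory than Rust engines.",
}
print(top_chunks(docs, "Rust engines memory", k=1))
```

Keeping the URL attached to each chunk is what lets the final answer cite its sources after the text has been cut up.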

4. Synthesis

The final stage is synthesis. The LLM takes the clean markdown, the original user question, and a system prompt instructing it to cite its sources. It then generates a comprehensive answer based solely on the retrieved web data.
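
One possible way to assemble those three inputs is to number each source so the model can cite it as [n]. The prompt wording and the message layout below are assumptions; the role/content dictionaries follow the common chat-message convention and should be adapted to your provider.

```python
def build_synthesis_messages(question, sources):
    """Build chat messages for the synthesis call.

    `sources` is a list of (url, markdown) pairs. Each source gets a number
    so the model can cite it inline as [n].
    """
    numbered = [
        f"[{i}] {url}\n{markdown}"
        for i, (url, markdown) in enumerate(sources, start=1)
    ]
    system = (
        "You are a research assistant. Answer using only the numbered sources. "
        "Cite sources inline as [n]. If the sources do not contain the answer, "
        "say you don't know."
    )
    user = (
        "Sources:\n\n"
        + "\n\n---\n\n".join(numbered)
        + f"\n\nQuestion: {question}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_synthesis_messages(
    "Which engine uses less memory?",
    [
        ("https://a.example", "Engine A uses about 2MB per session."),
        ("https://b.example", "Engine B uses 200MB or more."),
    ],
)
print(messages[1]["content"])
```

Putting the grounding rules in the system message and the retrieved text in the user message keeps the instructions separate from content that came from the open web.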

Implementation with Python and Ilmenite

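Below is a compact implementation of this architecture using Python. To stay short, it sends the user's question directly as the search query, scrapes the results one at a time, and concatenates every page into a single context block, so query expansion, parallel scraping, and chunking are left out. The example uses the OpenAI API for the synthesis stage (any chat-completion provider would work) and an Ilmenite API key for the web data.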

import os

import openai
import requests

# Configuration: read keys from the environment instead of hard-coding them
ILMENITE_API_KEY = os.environ["ILMENITE_API_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
ILMENITE_BASE_URL = "https://api.ilmenite.dev/v1"

client = openai.OpenAI(api_key=OPENAI_API_KEY)

def research_web(query):
    print(f"Searching for: {query}...")
    
    # Step 1: Search for top results
    search_response = requests.post(
        f"{ILMENITE_BASE_URL}/search",
        headers={"Authorization": f"Bearer {ILMENITE_API_KEY}"},
        json={"q": query, "num_results": 3},
        timeout=30,
    )
    search_response.raise_for_status()  # fail fast on auth or quota errors
    search_results = search_response.json().get("results", [])
    
    # Step 2: Scrape top results into markdown
    context_data = []
    for result in search_results:
        url = result['url']
        print(f"Scraping {url}...")
        
        scrape_response = requests.post(
            f"{ILMENITE_BASE_URL}/scrape",
            headers={"Authorization": f"Bearer {ILMENITE_API_KEY}"},
            json={"url": url, "format": "markdown"},
            timeout=30,
        )

        # Skip pages that failed to scrape or came back empty
        if scrape_response.status_code == 200:
            markdown_content = scrape_response.json().get("content")
            if markdown_content:
                context_data.append(f"Source: {url}\nContent:\n{markdown_content}")

    # Combine all scraped data into one context block
    full_context = "\n\n---\n\n".join(context_data)
    
    # Step 3: Synthesize answer with LLM
    print("Synthesizing answer...")
    prompt = f"""
    You are a professional research assistant. Use the following web context to answer the user's question.
    If the answer is not in the context, say you don't know. Always cite the source URL.

    Context:
    {full_context}

    Question: {query}
    """
    
    completion = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    
    return completion.choices[0].message.content

# Execution
user_query = "What are the current benchmarks for Rust-based browser engines compared to Chrome?"
answer = research_web(user_query)
print("\nFinal Answer:\n", answer)

Results and performance

When building an AI research assistant, the perceived speed of the agent depends largely on the latency of data retrieval.

Using traditional headless Chrome infrastructure, the time spent waiting for browser boot-up and page rendering often exceeds the time the LLM spends generating the answer. In contrast, Ilmenite's p95 API latency is 47ms.

Resource Efficiency

If you are deploying this assistant as a microservice, the infrastructure requirements are drastically lower. A standard Chrome-based scraper requires significant memory to avoid OOM (Out of Memory) errors. Because Ilmenite is a single binary with a 12MB Docker image, you can run thousands of concurrent sessions on a small server.

Token Optimization

By converting HTML to markdown, you reduce the token count by 60-80% per page. This not only lowers your LLM API costs but also allows you to fit more sources into the context window, increasing the accuracy and depth of the research assistant's answers.
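
A quick back-of-the-envelope calculation shows what that reduction means for context capacity. Only the 60-80% range comes from the figures above; the 100,000-token budget and the 20,000-token raw page size are illustrative assumptions.

```python
def pages_that_fit(context_budget, html_tokens_per_page, reduction):
    """How many pages fit in the budget after cutting tokens by `reduction`."""
    markdown_tokens = round(html_tokens_per_page * (1 - reduction))
    return context_budget // markdown_tokens


BUDGET = 100_000     # assumed tokens reserved for web context
RAW_PAGE = 20_000    # assumed tokens in one page of raw HTML

print(pages_that_fit(BUDGET, RAW_PAGE, 0.0))  # raw HTML: 5 pages
print(pages_that_fit(BUDGET, RAW_PAGE, 0.6))  # 60% reduction: 12 pages
print(pages_that_fit(BUDGET, RAW_PAGE, 0.8))  # 80% reduction: 25 pages
```

Under these assumptions the same budget holds between two and five times as many sources, which is where the gain in answer depth comes from.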

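| Metric          | Traditional Headless Chrome | Ilmenite API      |
|-----------------|-----------------------------|-------------------|
| Cold Start      | 500ms - 2,000ms             | 0.19ms            |
| RAM per Session | 200MB - 500MB               | ~2MB              |
| Output Format   | Raw HTML (Noisy)            | Clean Markdown    |
| Deployment      | Large Docker Images (>1GB)  | 12MB Docker Image |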

Going further with your AI assistant

Once you have a basic research loop working, you can enhance the assistant with more advanced features available in the documentation.

Structured Data Extraction

If your research assistant needs to compare specific data points—such as pricing tables or technical specifications—don't rely on the LLM to find them in a wall of text. Use the /v1/extract endpoint. You can provide a JSON schema, and Ilmenite will return structured data directly, which can be fed into a database or a comparison table.
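
To give a sense of what such a schema might look like, here is a sketch that describes a pricing table in standard JSON Schema and wraps it in a request body. The request field names ("url" and "schema") and the schema contents are assumptions for illustration; consult the API reference for the actual request shape before using this.

```python
import json

# Hypothetical JSON Schema describing the pricing data we want back.
PRICING_SCHEMA = {
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "monthly_price_usd": {"type": "number"},
                },
                "required": ["name", "monthly_price_usd"],
            },
        }
    },
    "required": ["plans"],
}


def build_extract_request(url, schema):
    """Build the JSON body for an extraction call.

    The "url" and "schema" field names are assumptions for illustration;
    check the API reference for the real request shape.
    """
    return {"url": url, "schema": schema}


body = build_extract_request("https://example.com/pricing", PRICING_SCHEMA)
print(json.dumps(body, indent=2))
```

Declaring required fields and numeric types up front means malformed results can be rejected before they reach a database or comparison table.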

Integration with Claude via MCP

For those using Claude, you don't need to build a custom wrapper. Ilmenite provides an MCP server integration, allowing Claude to use the browser engine as a native tool. This removes the need to write the orchestration code entirely.

Self-Hosting for Privacy

For enterprise research assistants that handle sensitive queries, you can deploy Ilmenite as a self-hosted binary or Docker container. This keeps web traffic inside your own infrastructure and helps you meet data residency requirements without sacrificing the performance of a Rust-based engine.

To start building your own agent, you can sign up for a free account or test the scraping logic in the playground. For a full breakdown of costs per request, visit our pricing page.