Tutorial · April 9, 2026 · 5 min · Ilmenite Team


How to Convert Any Website to Markdown with an API


AI agents and RAG pipelines require clean, structured data to function. Raw HTML is filled with noise—navigation bars, footer links, scripts, and CSS—that wastes LLM tokens and confuses the model. By using a website-to-markdown API, you can strip away the boilerplate and deliver only the core content in a format that LLMs understand natively.

In this guide, we will show you how to use Ilmenite to convert any URL into clean markdown. We will cover basic implementation using curl, Python, and TypeScript, and explain how to handle complex JavaScript-heavy sites.

Why Markdown is Better Than HTML for LLMs

Before implementing the API, it is important to understand why you should avoid feeding raw HTML into a Large Language Model (LLM).

Token Efficiency

LLMs have finite context windows. A typical web page might have 50KB of HTML but only 5KB of actual content. The rest is metadata, scripts, and styling. Converting a page to markdown reduces the token count significantly, allowing you to fit more information into a single prompt and reducing your API costs.
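The size difference is easy to see on a toy example. The snippet below (illustrative only, with made-up HTML rather than a real scrape) compares a page fragment full of navigation, scripts, and footer markup against the markdown that carries the same content:

```python
# Toy comparison of raw HTML versus its markdown equivalent.
# The actual ratio varies by page; real pages skew even further.

raw_html = (
    '<div class="nav"><ul><li><a href="/home">Home</a></li>'
    '<li><a href="/about">About</a></li></ul></div>'
    '<script src="analytics.js"></script>'
    '<article><h1>Getting Started</h1><p>Install the CLI first.</p></article>'
    '<footer><p>&copy; 2026 Example Corp</p></footer>'
)

# Only the <article> content survives conversion.
markdown = "# Getting Started\n\nInstall the CLI first.\n"

ratio = len(raw_html) / len(markdown)
print(f"HTML: {len(raw_html)} bytes, markdown: {len(markdown)} bytes "
      f"(~{ratio:.0f}x smaller)")
```

Every byte saved here is a token the LLM never has to read, which compounds across thousands of scraped pages.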

Noise Reduction

HTML contains "noise" that can lead to hallucinations. For example, a sidebar containing "Related Articles" might be interpreted by an LLM as part of the main body text. A dedicated website-to-markdown API removes these elements, ensuring the model focuses only on the primary content.

Structural Preservation

Unlike plain text, markdown preserves the semantic structure of a page. It keeps headers (`#`), lists (`-`), and links (`[text](url)`). This allows the LLM to understand the hierarchy of the information, which is critical for tasks like summarization or data extraction.
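For instance, a converted documentation page might come back looking like this (illustrative output, not from a real scrape):

```markdown
# Getting Started

Install the CLI, then authenticate:

- Download the binary
- Run `cli login`

See the [full docs](https://example.com/docs) for details.
```

The heading levels, list items, and link targets all survive, so a downstream model can tell a section title from a bullet point.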

Prerequisites

To follow this tutorial, you will need:

  • An Ilmenite API key. You can sign up for a free account to get started.
  • A terminal with curl installed.
  • Python 3.8+ or Node.js 16+ installed on your machine.
  • A target URL you wish to convert to markdown.

Implementing the Website to Markdown API

The core of this process is the /v1/scrape endpoint. Unlike traditional scrapers that require you to write complex CSS selectors, this endpoint handles the cleaning and conversion automatically.

Step 1: Basic Request with curl

The fastest way to test the API is via curl. This request sends a URL to the server, which then renders the page and returns the markdown.

curl -X POST https://api.ilmenite.dev/v1/scrape \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'

The response will be a JSON object containing the cleaned markdown text, the page title, and metadata.

Step 2: Implementation in Python

For AI agent builders using LangChain or LlamaIndex, Python is the standard. We use the requests library to communicate with the API.

import requests

def convert_to_markdown(url, api_key):
    endpoint = "https://api.ilmenite.dev/v1/scrape"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    data = {
        "url": url,
        "format": "markdown"
    }

    # A timeout prevents the agent from hanging on a slow page
    response = requests.post(endpoint, json=data, headers=headers, timeout=30)
    
    if response.status_code == 200:
        return response.json().get("markdown")
    else:
        return f"Error: {response.status_code} - {response.text}"

# Usage
API_KEY = "your_api_key_here"
TARGET_URL = "https://docs.python.org/3/"
markdown_content = convert_to_markdown(TARGET_URL, API_KEY)
print(markdown_content)

Step 3: Implementation in TypeScript

If you are building a web-based AI tool or a Node.js backend, TypeScript is a natural fit.

import axios from 'axios';

async function convertToMarkdown(url: string, apiKey: string) {
  const endpoint = 'https://api.ilmenite.dev/v1/scrape';
  
  try {
    const response = await axios.post(endpoint, {
      url: url,
      format: 'markdown',
    }, {
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`,
      },
    });

    return response.data.markdown;
  } catch (error) {
    console.error('Error converting website to markdown:', error);
    throw error;
  }
}

// Usage
const API_KEY = 'your_api_key_here';
const TARGET_URL = 'https://typescriptlang.org/';
convertToMarkdown(TARGET_URL, API_KEY).then(console.log);

Step 4: Handling JavaScript-Heavy Websites

Many modern websites use React, Vue, or Next.js. A standard HTTP request to these sites often returns an empty shell because the content is rendered in the browser via JavaScript.

Ilmenite solves this by using a browser engine built in Rust. While our native engine handles most sites, some complex Single Page Applications (SPAs) require full Chrome rendering. You can trigger this by adding the render_js parameter to your request.

Updated curl request for JS rendering:

curl -X POST https://api.ilmenite.dev/v1/scrape \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://complex-react-site.com",
    "format": "markdown",
    "render_js": true
  }'

Note that rendering JavaScript is more resource-intensive and costs 3 credits per request, compared to 1 credit for standard scraping. You can find more details on credit costs in our pricing page.
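Because of that 3:1 cost difference, it can pay to request JS rendering only for the sites that need it. The helper below is an illustrative sketch (not part of any Ilmenite SDK) that builds the request payload and estimates credit spend for a batch:

```python
# Credit costs from the pricing above: 1 credit standard, 3 with render_js.
CREDIT_COST = {False: 1, True: 3}

def build_scrape_payload(url: str, render_js: bool = False) -> dict:
    """Build a /v1/scrape request body, adding render_js only when needed."""
    payload = {"url": url, "format": "markdown"}
    if render_js:
        payload["render_js"] = True
    return payload

def estimate_credits(urls_with_flags) -> int:
    """Sum the credit cost for a batch of (url, render_js) pairs."""
    return sum(CREDIT_COST[flag] for _, flag in urls_with_flags)

batch = [
    ("https://example.com", False),            # static page: 1 credit
    ("https://complex-react-site.com", True),  # SPA: 3 credits
]
print(estimate_credits(batch))  # 4
```

Flagging known SPAs up front keeps your standard pages on the cheaper path while still getting full content from JavaScript-heavy sites.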

Performance and Architecture

When choosing a website-to-markdown API, performance matters—especially for autonomous agents that need to browse the web in real-time.

Ilmenite is built in pure Rust, which eliminates the overhead associated with Node.js or Python-based wrappers. Our browser engine starts in 0.19ms and uses only 2MB of RAM per session. This is 100x lighter than Chrome-based alternatives.

For developers who require strict data residency or air-gapped environments, Ilmenite can be self-hosted as a single binary or a 12MB Docker image. This ensures your data never leaves your infrastructure while maintaining sub-millisecond startup times.

Full Working Example: RAG-Ready Scraper

Below is a complete Python script that takes a list of URLs and prepares them for a vector database by converting them to markdown.

import requests
import json

class WebToMarkdownConverter:
    def __init__(self, api_key):
        self.api_key = api_key
        self.endpoint = "https://api.ilmenite.dev/v1/scrape"

    def scrape(self, url):
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}"
        }
        payload = {
            "url": url,
            "format": "markdown",
            "render_js": True # Ensure we get content from SPAs
        }
        
        try:
            response = requests.post(self.endpoint, json=payload, headers=headers, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"Failed to scrape {url}: {e}")
            return None

def main():
    API_KEY = "your_api_key_here"
    urls = [
        "https://ilmenite.dev",
        "https://rust-lang.org",
        "https://openai.com/blog"
    ]
    
    converter = WebToMarkdownConverter(API_KEY)
    processed_data = []

    for url in urls:
        print(f"Processing {url}...")
        result = converter.scrape(url)
        if result:
            processed_data.append({
                "url": url,
                "title": result.get("title"),
                "content": result.get("markdown")
            })

    # Save for RAG pipeline indexing
    with open("web_data.json", "w") as f:
        json.dump(processed_data, f, indent=2)
    
    print(f"Converted {len(processed_data)} of {len(urls)} pages to markdown.")

if __name__ == "__main__":
    main()
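Before the saved markdown goes into a vector database, it usually needs to be chunked. One simple approach, sketched below, is to split on top-level markdown headers so each chunk stays topically coherent (production pipelines often use token-aware splitters instead):

```python
import re

def split_by_headers(markdown: str) -> list[str]:
    """Split markdown into sections, one per '#' or '##' header."""
    chunks = re.split(r"\n(?=#{1,2} )", markdown)
    return [c.strip() for c in chunks if c.strip()]

doc = "# Intro\nWelcome.\n## Install\nRun pip.\n## Usage\nCall the API."
for chunk in split_by_headers(doc):
    print(repr(chunk))
```

This works precisely because the API preserved the header structure: the same split on raw stripped text would have no anchors to cut on.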

Next Steps

Now that you can convert websites to markdown, you can expand your AI agent's capabilities:

  1. Crawl Entire Sites: Use the /v1/crawl endpoint to index an entire documentation site rather than single pages. See the crawl documentation.
  2. Structured Extraction: If you need specific data (like product prices) instead of a full page, use the /v1/extract endpoint to get structured JSON.
  3. Integrate with Claude: Use our MCP (Model Context Protocol) server to give Claude native access to the web without writing custom glue code.

Ready to start building? You can start free or test the API in our playground.