MarkCrawl by iD8 by AIMLPM
Fast Python web crawler for RAG and AI ingestion. Extracts clean Markdown from any site for LLMs and vector stores.
About MarkCrawl by iD8
From the project's README at github.com/AIMLPM/markcrawl. Lightly cleaned for readability; for the full source see the upstream repo.
Turn any webpage or website into clean Markdown for LLM pipelines, in one command.
MarkCrawl is a crawl-and-structure engine. It fetches one page or crawls an entire website, strips navigation/scripts/boilerplate, and writes clean Markdown files with a structured JSONL index. Every page includes a citation with the access date. No API keys needed.
Everything else (LLM extraction, Supabase upload, the MCP server, and LangChain tools) is optional and installed separately.
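The crawl-and-structure flow described above (strip boilerplate, emit Markdown, record a JSONL index entry with a dated citation) can be illustrated with a minimal standard-library sketch. This is not MarkCrawl's code or API: the class `MarkdownSketch`, the function `page_to_record`, the set of skipped tags, the citation wording, and the index field names are all assumptions made for illustration.

```python
import json
from datetime import date
from html.parser import HTMLParser

# Tags treated as boilerplate in this sketch (an assumption, not MarkCrawl's rules).
SKIP_TAGS = {"nav", "script", "style", "header", "footer", "aside"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = {"p", "li", *HEADING_PREFIX}


class MarkdownSketch(HTMLParser):
    """Collects headings, paragraphs, and list items as Markdown blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate tag
        self.prefix = None   # Markdown prefix of the open block, if any
        self.buffer = []     # text pieces of the open block
        self.blocks = []     # finished Markdown blocks
        self.title = ""      # text of the first <h1>, if present

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.prefix = HEADING_PREFIX.get(tag, "- " if tag == "li" else "")
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS and self.prefix is not None:
            # Collapse runs of whitespace before emitting the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(self.prefix + text)
                if tag == "h1" and not self.title:
                    self.title = text
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def page_to_record(html, url, accessed=None):
    """Return (markdown, jsonl_line) for one page; field names are hypothetical."""
    accessed = accessed or date.today()
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    citation = f"{parser.title or url}. {url} (accessed {accessed.isoformat()})."
    markdown = "\n\n".join(parser.blocks + [f"Source: {citation}"])
    index_line = json.dumps({
        "url": url,
        "title": parser.title,
        "accessed": accessed.isoformat(),
        "chars": len(markdown),
    })
    return markdown, index_line
```

In a real pipeline each `markdown` string would be written to its own `.md` file and each `index_line` appended to a single `.jsonl` file, one line per page. For actual usage, follow the upstream README rather than this sketch.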
Health score breakdown
6-dimension composite. See methodology for formula and weights.
Adoption signals
Real-world usage data, pulled from each registry. The bigger the numbers, the more battle-tested the project.
| Signal | Value | Source |
|---|---|---|
| GitHub stars | 2 | github.com/AIMLPM/markcrawl |
| GitHub forks | 0 | github.com/AIMLPM/markcrawl |
| PyPI downloads (last month) | 2.7k | markcrawl |
Release & maintenance
Is this project actively maintained, or about to die? Check the recency of last commit and last release.
| Signal | Value | Detail |
|---|---|---|
| Project age | 0.7 years | since Sep 2025 |
| Last commit | 2 days ago | May 5, 2026 |
| Releases shipped | 20 | last: 2 days ago |
| Security policy | SECURITY.md | declared by maintainers |
Self-hosting cost across providers
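Detected requirements: at least 4GB RAM and 40GB disk. The table lists the cheapest plan per provider. Of the rows shown, only the hetzner CAX11 meets both minimums; the vultr, linode, and digitalocean plans listed offer 1GB RAM and 25GB disk.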
| Provider | Plan | Specs | Monthly |
|---|---|---|---|
| hetzner | CAX11 | 2c · 4GB · 40GB | $4.13 USD |
| vultr | VC2 | 1c · 1GB · 25GB | $5 USD |
| linode | Nanode 1GB | 1c · 1GB · 25GB | $5.12 USD |
| digitalocean | Basic Regular 1GB | 1c · 1GB · 25GB | $6 USD |
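As a quick check of the table against the detected requirements, the following sketch filters the listed plans by minimum RAM and disk and picks the cheapest match. The plan data is copied from the table above; the `Plan` dataclass and the `cheapest_meeting` helper are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    provider: str
    name: str
    ram_gb: int
    disk_gb: int
    monthly_usd: float


# Rows transcribed from the provider table.
PLANS = [
    Plan("hetzner", "CAX11", 4, 40, 4.13),
    Plan("vultr", "VC2", 1, 25, 5.00),
    Plan("linode", "Nanode 1GB", 1, 25, 5.12),
    Plan("digitalocean", "Basic Regular 1GB", 1, 25, 6.00),
]


def cheapest_meeting(plans, min_ram_gb, min_disk_gb):
    """Return the cheapest plan meeting both minimums, or None if none qualifies."""
    eligible = [
        p for p in plans
        if p.ram_gb >= min_ram_gb and p.disk_gb >= min_disk_gb
    ]
    return min(eligible, key=lambda p: p.monthly_usd, default=None)


best = cheapest_meeting(PLANS, min_ram_gb=4, min_disk_gb=40)
print(best)
```

With the 4GB RAM and 40GB disk minimums, the only eligible row is the hetzner CAX11, which matches the recommendation further down the page.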
Replaces these paid SaaS
MarkCrawl by iD8 is one of the open-source alternatives to:
Ready to self-host MarkCrawl by iD8?
Spin up a hetzner CAX11 (4GB RAM, 40GB disk) for $4.13/mo and follow the project's official install docs.
Data last refreshed May 7, 2026.
Similar open-source projects
Projects in our directory that replace the same SaaS or share topics with MarkCrawl by iD8.