MarkCrawl by iD8 by AIMLPM

Fast Python web crawler for RAG and AI ingestion. Extracts clean Markdown from any site for LLMs and vector stores.

ai-agents · ingestion-pipeline · llm · markdown-extraction · pgvector · python · rag · supabase · webcrawler · anthropic-claude · data-extraction · gemini
Verdict: 54/100 health · $4.13/mo cheapest (hetzner) · 2/5 setup difficulty · Last release 2 days ago

Self-host MarkCrawl by iD8 on hetzner CAX11 for $4.13/mo.

Health score: 54/100 (6-dim composite)
Self-hosts from: $4.13/mo (hetzner · CAX11)
Difficulty: 2/5 (Docker + read README)
GitHub stars: 2 (0 forks)

About MarkCrawl by iD8

From the project's README at github.com/AIMLPM/markcrawl. Lightly cleaned for readability; for the full source see the upstream repo.

Turn any webpage or website into clean Markdown for LLM pipelines, in one command.

MarkCrawl is a crawl-and-structure engine. It fetches one page or crawls an entire website, strips navigation/scripts/boilerplate, and writes clean Markdown files with a structured JSONL index. Every page includes a citation with the access date. No API keys needed.

Everything else (LLM extraction, Supabase upload, MCP server, LangChain tools) is optional and installed separately.
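The crawl-and-structure step described above can be illustrated with a minimal standard-library sketch. This is not MarkCrawl's implementation or API: the class `MarkdownSketch`, the function `page_to_record`, the list of tags treated as boilerplate, and the record field names are all assumptions made for illustration. It shows the general idea of dropping navigation and script content, keeping headings and paragraphs as Markdown, and emitting one JSONL-style record with a dated citation.

```python
import json
from datetime import date
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Markdown prefixes for the heading levels this sketch handles.
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownSketch(HTMLParser):
    """Collects headings and paragraphs as Markdown, skipping boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a boilerplate element
        self.prefix = None    # Markdown prefix of the block being read, or None
        self.buffer = []      # text fragments of the current block
        self.blocks = []      # finished Markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and (tag in HEADINGS or tag == "p"):
            self.prefix = HEADINGS.get(tag, "")
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif (self.skip_depth == 0 and self.prefix is not None
              and (tag in HEADINGS or tag == "p")):
            # Collapse internal whitespace before storing the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def page_to_record(url, html, accessed=None):
    """Return one JSONL-ready dict: URL, Markdown body, and a dated citation."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    accessed = accessed or date.today().isoformat()
    return {
        "url": url,
        "markdown": "\n\n".join(parser.blocks),
        "citation": f"{url} (accessed {accessed})",
    }


sample = (
    "<html><body><nav><p>Home | Docs</p></nav>"
    "<h1>Guide</h1><p>Hello <b>world</b>.</p>"
    "<script>track()</script><footer><p>(c) site</p></footer></body></html>"
)
record = page_to_record("https://example.com/guide", sample, accessed="2026-05-07")
print(json.dumps(record))  # one line of a JSONL index
```

A real crawler would also need fetching, link discovery, and handling for lists, tables, and code blocks; the sketch covers only the stripping and indexing idea.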

Health score breakdown

6-dimension composite. See methodology for formula and weights.

activity: 68
maturity: 75
community: 53
security: 85
sustainability: 55
adoption: 3
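The formula and weights are not given on this page, so the composite cannot be reproduced here. As a rough sanity check only, the sketch below computes a weighted average of the six dimension scores with equal weights, which is a hypothetical choice and not the directory's methodology. The helper name `weighted_composite` is likewise an assumption.

```python
# Dimension scores as listed in the breakdown above.
scores = {
    "activity": 68, "maturity": 75, "community": 53,
    "security": 85, "sustainability": 55, "adoption": 3,
}


def weighted_composite(scores, weights):
    """Weighted average of the dimension scores; weights need not sum to 1."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Equal weights are an assumption; the real weights live in the methodology.
equal = {name: 1.0 for name in scores}
unweighted = weighted_composite(scores, equal)
print(unweighted)  # 56.5
```

The unweighted mean is 56.5 rather than the published 54, which suggests the methodology does not weight the six dimensions equally.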

Adoption signals

Real-world usage data, pulled from each registry. The bigger the numbers, the more battle-tested the project.

Signal | Value | Source
GitHub stars | 2 | github.com/AIMLPM/markcrawl
GitHub forks | 0 | github.com/AIMLPM/markcrawl
PyPI downloads (last month) | 2.7k | markcrawl

Release & maintenance

Is this project actively maintained, or about to die? Check how recent the last commit and the last release are.

Project age: 0.7 years (since Sep 2025)
Last commit: 2 days ago (May 5, 2026)
Releases shipped: 20 (last: 2 days ago)
Security policy: SECURITY.md (declared by maintainers)

Self-hosting cost across providers

Detected requirements: 4GB RAM, 40GB disk minimum. Cheapest plan per provider that meets the requirement.

Provider | Plan | Specs (cores · RAM · disk) | Monthly
hetzner | CAX11 | 2c · 4GB · 40GB | $4.13 USD
vultr | VC2 | 1c · 1GB · 25GB | $5 USD
linode | Nanode 1GB | 1c · 1GB · 25GB | $5.12 USD
digitalocean | Basic Regular 1GB | 1c · 1GB · 25GB | $6 USD
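The selection rule stated above (cheapest plan per provider that meets the detected requirement) can be sketched as a simple filter. The `Plan` dataclass and the `cheapest_qualifying` function are hypothetical names for illustration; the plan data is copied from the rows above, and "c" is read as CPU cores, which is an assumption. Run against those rows with the stated 4GB RAM and 40GB disk minimum, only the hetzner CAX11 qualifies, since the other three listed plans are 1GB / 25GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    provider: str
    name: str
    cores: int
    ram_gb: int
    disk_gb: int
    usd_month: float


# Rows copied from the provider table above.
PLANS = [
    Plan("hetzner", "CAX11", 2, 4, 40, 4.13),
    Plan("vultr", "VC2", 1, 1, 25, 5.00),
    Plan("linode", "Nanode 1GB", 1, 1, 25, 5.12),
    Plan("digitalocean", "Basic Regular 1GB", 1, 1, 25, 6.00),
]


def cheapest_qualifying(plans, min_ram_gb, min_disk_gb):
    """Return the cheapest plan per provider that meets both minimums."""
    best = {}
    for plan in plans:
        if plan.ram_gb < min_ram_gb or plan.disk_gb < min_disk_gb:
            continue  # plan is too small for the detected requirement
        current = best.get(plan.provider)
        if current is None or plan.usd_month < current.usd_month:
            best[plan.provider] = plan
    # Cheapest provider first.
    return sorted(best.values(), key=lambda p: p.usd_month)


qualifying = cheapest_qualifying(PLANS, min_ram_gb=4, min_disk_gb=40)
for plan in qualifying:
    print(f"{plan.provider} {plan.name}: ${plan.usd_month:.2f}/mo")
```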

Replaces these paid SaaS

MarkCrawl by iD8 is one of the open-source alternatives to Supabase.

Ready to self-host MarkCrawl by iD8?

Spin up a hetzner CAX11 (4GB RAM, 40GB disk) for $4.13/mo and follow the project's official install docs.

Data last refreshed May 7, 2026.

Data refreshes every 30 minutes.