DataComPy by capitalone

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

Topics: python, pandas, spark, data, data-science, compare, dataframes, numpy, pyspark, fugue, dask, polars
Verdict: 71/100 health · $4.13/mo cheapest (hetzner) · 2/5 setup difficulty · last release 28 days ago

Self-host DataComPy on a hetzner CAX11 for $4.13/mo.

- Health score: 71/100 (6-dimension composite)
- Self-hosts from: $4.13/mo (hetzner CAX11)
- Difficulty: 2/5 (Docker + read README)
- GitHub stars: 639 (160 forks)

About DataComPy

From the project's README at github.com/capitalone/datacompy. Lightly cleaned for readability; for the full source see the upstream repo.


DataComPy is a package to compare two DataFrames (or tables), such as Pandas, Spark, Polars, and even Snowflake. It was originally created as something of a replacement for SAS's `` for Pandas DataFrames, with more functionality than just `` (it prints out some stats and lets you tweak how accurate matches have to be). Supported types include:

- Pandas
- Polars
- Spark
- Snowflake (via snowpark)
- Dask (via Fugue)
- DuckDB (via Fugue)
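
The core idea behind this kind of comparison is a keyed join: match rows from both tables on a key column, report rows that exist on only one side, and compare the shared columns value by value, optionally within an absolute or relative tolerance. The sketch below illustrates that idea in plain Python, with lists of dicts standing in for DataFrames. It is not DataComPy's API: `compare_rows`, `values_match`, and the sample data are hypothetical, and the tolerance rule `|a - b| <= abs_tol + rel_tol * |b|` is borrowed from `numpy.isclose` as an assumption about how tolerances are applied.

```python
import math

# Hypothetical sketch: plain lists of dicts stand in for DataFrames.
# This is NOT datacompy's API; it only illustrates a keyed comparison.


def values_match(a, b, abs_tol=0.0, rel_tol=0.0):
    """Numeric values match if |a - b| <= abs_tol + rel_tol * |b|
    (the numpy.isclose rule, assumed here); two NaNs count as equal.
    Non-numeric values must be exactly equal."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if math.isnan(a) and math.isnan(b):
            return True
        return abs(a - b) <= abs_tol + rel_tol * abs(b)
    return a == b


def compare_rows(rows1, rows2, key, abs_tol=0.0, rel_tol=0.0):
    """Join two row lists on `key` and summarize the differences.

    Assumes every row in a list has the same columns as its first row.
    """
    left = {row[key]: row for row in rows1}
    right = {row[key]: row for row in rows2}
    shared_cols = set()
    if rows1 and rows2:
        shared_cols = (set(rows1[0]) & set(rows2[0])) - {key}
    mismatches = {}  # column name -> keys whose values differ
    for k in sorted(left.keys() & right.keys()):
        for col in sorted(shared_cols):
            if not values_match(left[k][col], right[k][col], abs_tol, rel_tol):
                mismatches.setdefault(col, []).append(k)
    return {
        "only_in_first": sorted(left.keys() - right.keys()),
        "only_in_second": sorted(right.keys() - left.keys()),
        "mismatches": mismatches,
    }


df1 = [
    {"acct_id": 1, "balance": 100.00, "name": "Ann"},
    {"acct_id": 2, "balance": 250.50, "name": "Bob"},
    {"acct_id": 3, "balance": 75.00, "name": "Cy"},
]
df2 = [
    {"acct_id": 1, "balance": 100.004, "name": "Ann"},
    {"acct_id": 2, "balance": 251.00, "name": "Bob"},
    {"acct_id": 4, "balance": 10.00, "name": "Di"},
]

print(compare_rows(df1, df2, key="acct_id", abs_tol=0.01))
# {'only_in_first': [3], 'only_in_second': [4], 'mismatches': {'balance': [2]}}
```

Passing `abs_tol=0.01` absorbs the 0.004 difference on account 1 but not the 0.50 difference on account 2; with the default zero tolerances, both accounts would be reported as `balance` mismatches.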

> [!IMPORTANT]
> datacompy is progressing towards a release. During this transition, a branch will be maintained solely for users.
> This branch will only receive dependency updates and critical bug fixes; no new features will be added.
> All new feature development should target the branches ( and eventually ).

Quick Installation

Health score breakdown

6-dimension composite. See methodology for formula and weights.

| Dimension | Score |
|---|---|
| activity | 80 |
| maturity | 100 |
| community | 94 |
| security | 70 |
| sustainability | 53 |
| adoption | 26 |
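
The methodology's actual weights are not given on this page. As a purely hypothetical illustration, the sketch below computes a weighted mean of the six dimension scores and falls back to equal weights; with equal weights the mean is 70.5, which rounds half up to the reported 71. That agreement does not confirm the site uses equal weights, and `composite` is an illustrative name, not part of any published formula.

```python
import math

# Dimension scores from the health score breakdown.
SCORES = {
    "activity": 80,
    "maturity": 100,
    "community": 94,
    "security": 70,
    "sustainability": 53,
    "adoption": 26,
}


def composite(scores, weights=None):
    """Weighted mean of the dimension scores.

    The real weights are not published here, so equal weights are used by
    default purely as a hypothetical stand-in.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(score * weights[name] for name, score in scores.items()) / total_weight


value = composite(SCORES)
print(value)                    # 70.5
print(math.floor(value + 0.5))  # 71 (round half up)
```

`math.floor(value + 0.5)` is used instead of `round()` because Python's built-in `round()` rounds halves to the nearest even number and would return 70 here.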

Adoption signals

Real-world usage data, pulled from each registry. The bigger the numbers, the more battle-tested the project.

| Signal | Value | Source |
|---|---|---|
| GitHub stars | 639 | github.com/capitalone/datacompy |
| GitHub forks | 160 | github.com/capitalone/datacompy |
| PyPI downloads (last month) | 2,598k | datacompy |

Release & maintenance

Is this project actively maintained, or about to die? Check how recent the last commit and last release are.

- Project age: 8.1 years (since Mar 2018)
- Last commit: 6 days ago (May 1, 2026)
- Releases shipped: 61 (last: 28 days ago)

Self-hosting cost across providers

Detected requirements: 4GB RAM, 40GB disk minimum. The table lists the cheapest plan per provider; of these, only the hetzner CAX11 actually meets that minimum.

| Provider | Plan | Specs (vCPU · RAM · disk) | Monthly |
|---|---|---|---|
| hetzner | CAX11 | 2c · 4GB · 40GB | $4.13 USD |
| vultr | VC2 | 1c · 1GB · 25GB | $5 USD |
| linode | Nanode 1GB | 1c · 1GB · 25GB | $5.12 USD |
| digitalocean | Basic Regular 1GB | 1c · 1GB · 25GB | $6 USD |
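
Read literally, "cheapest plan per provider that meets the requirement" is a filter followed by a per-provider minimum. The sketch below applies that rule to the four rows of the cost table; the `Plan` type and `cheapest_per_provider` function are hypothetical names. With a 4 GB RAM / 40 GB disk minimum, only the hetzner CAX11 survives, because the vultr, linode, and digitalocean rows each list 1 GB RAM and 25 GB disk.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    provider: str
    name: str
    vcpus: int
    ram_gb: int
    disk_gb: int
    usd_per_month: float


# Rows copied from the self-hosting cost table.
PLANS = [
    Plan("hetzner", "CAX11", 2, 4, 40, 4.13),
    Plan("vultr", "VC2", 1, 1, 25, 5.00),
    Plan("linode", "Nanode 1GB", 1, 1, 25, 5.12),
    Plan("digitalocean", "Basic Regular 1GB", 1, 1, 25, 6.00),
]


def cheapest_per_provider(plans, min_ram_gb, min_disk_gb):
    """Return {provider: cheapest plan meeting the RAM and disk minimums}."""
    best = {}
    for plan in plans:
        if plan.ram_gb < min_ram_gb or plan.disk_gb < min_disk_gb:
            continue  # plan is too small for the detected requirements
        current = best.get(plan.provider)
        if current is None or plan.usd_per_month < current.usd_per_month:
            best[plan.provider] = plan
    return best


eligible = cheapest_per_provider(PLANS, min_ram_gb=4, min_disk_gb=40)
print({p: (plan.name, plan.usd_per_month) for p, plan in eligible.items()})
# {'hetzner': ('CAX11', 4.13)}
```

Providers with no qualifying plan simply drop out of the result rather than falling back to a smaller plan, which is what keeps an undersized option from being presented as meeting the requirement.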


Ready to self-host DataComPy?

Spin up a hetzner CAX11 (4GB RAM, 40GB disk) for $4.13/mo and follow the project's official install docs.

Data last refreshed May 7, 2026.



Last verified . Data refreshes every 30 minutes.