DataComPy by capitalone
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
About DataComPy
From the project's README at github.com/capitalone/datacompy. Lightly cleaned for readability; for the full source see the upstream repo.
DataComPy is a package to compare two DataFrames (or tables) such as Pandas, Spark, Polars, and even Snowflake. Originally it was created to be something of a replacement for SAS's `` for Pandas DataFrames with some more functionality than just `` (in that it prints out some stats, and lets you tweak how accurate matches have to be).

Supported types include:
- Pandas
- Polars
- Spark
- Snowflake (via snowpark)
- Dask (via Fugue)
- DuckDB (via Fugue)
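
To make "tweak how accurate matches have to be" concrete, here is a minimal standard-library sketch of a keyed, tolerance-aware comparison. It is not DataComPy's implementation or API: the `compare_rows` function, its parameters, and the sample rows are hypothetical names chosen for illustration, and the tolerance check relies on Python's `math.isclose`, whose exact semantics may differ from the library's.

```python
import math


def compare_rows(left, right, key, abs_tol=0.0, rel_tol=0.0):
    """Compare two lists of dict rows joined on `key` (hypothetical helper).

    Returns a tuple (only_left, only_right, mismatches) where mismatches is a
    list of (key_value, column, left_value, right_value).
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Rows whose key appears on one side only.
    only_left = sorted(left_by_key.keys() - right_by_key.keys())
    only_right = sorted(right_by_key.keys() - left_by_key.keys())

    mismatches = []
    for k in sorted(left_by_key.keys() & right_by_key.keys()):
        l_row, r_row = left_by_key[k], right_by_key[k]
        for col in sorted(l_row.keys() & r_row.keys()):
            if col == key:
                continue
            a, b = l_row[col], r_row[col]
            if isinstance(a, (int, float)) and isinstance(b, (int, float)):
                # Numeric values match when they fall within the tolerance.
                same = math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            else:
                same = a == b
            if not same:
                mismatches.append((k, col, a, b))
    return only_left, only_right, mismatches


# Made-up sample data for the sketch.
original = [
    {"acct_id": 1, "balance": 100.00, "name": "Ann"},
    {"acct_id": 2, "balance": 250.50, "name": "Bob"},
    {"acct_id": 3, "balance": 75.25, "name": "Cy"},
]
new = [
    {"acct_id": 1, "balance": 100.004, "name": "Ann"},
    {"acct_id": 2, "balance": 251.00, "name": "Bob"},
    {"acct_id": 4, "balance": 10.0, "name": "Dee"},
]

strict = compare_rows(original, new, "acct_id")               # exact matching
loose = compare_rows(original, new, "acct_id", abs_tol=0.01)  # allow 0.01 drift

print("strict mismatches:", strict[2])
print("loose mismatches:", loose[2])
```

With exact matching both shared accounts are reported as different; with an absolute tolerance of 0.01 only account 2 still differs. In both runs, account 3 is reported as present only in the first table and account 4 only in the second.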
> [!IMPORTANT]
> datacompy is progressing towards a release. During this transition, a branch will be maintained solely for users.
> This branch will only receive dependency updates and critical bug fixes; no new features will be added.
> All new feature development should target the branches ( and eventually ).

Quick Installation
Health score breakdown
6-dimension composite. See methodology for formula and weights.
Adoption signals
Real-world usage data, pulled from each registry. The bigger the numbers, the more battle-tested the project.
| Signal | Value | Source |
|---|---|---|
| GitHub stars | 639 | github.com/capitalone/datacompy |
| GitHub forks | 160 | github.com/capitalone/datacompy |
| PyPI downloads (last month) | 2598k | datacompy |
Release & maintenance
Is this project actively maintained, or about to die? Check the recency of last commit and last release.
| Signal | Value | Detail |
|---|---|---|
| Project age | 8.1 years | since Mar 2018 |
| Last commit | 6 days ago | May 1, 2026 |
| Releases shipped | 61 | last: 28 days ago |
Self-hosting cost across providers
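Detected requirements: 4GB RAM, 40GB disk minimum. The table lists one low-cost plan per provider; of the plans shown, only the Hetzner CAX11 meets the detected requirement, while the 1GB · 25GB plans fall short of it.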
| Provider | Plan | Specs | Monthly |
|---|---|---|---|
| Hetzner | CAX11 | 2c · 4GB · 40GB | $4.13 USD |
| Vultr | VC2 | 1c · 1GB · 25GB | $5 USD |
| Linode | Nanode 1GB | 1c · 1GB · 25GB | $5.12 USD |
| DigitalOcean | Basic Regular 1GB | 1c · 1GB · 25GB | $6 USD |
What people say on Hacker News
- Xero.DataComparer – High-performance generic list comparator for .NET
- Docker vs. Podman: Which Containerization Tool Is Right for You – DataCamp
- Ask HN: Learning resources for building AI agents?
- Show HN: Open-Source Inbox-as-a-Service for LLM Agents
- Show HN: Parquetastic – a browser-based Parquet metadata inspector
Ready to self-host DataComPy?
Spin up a Hetzner CAX11 (4GB RAM, 40GB disk) for $4.13/mo and follow the project's official install docs.
Data last refreshed May 7, 2026.
Similar open-source projects
Projects in our directory that replace the same SaaS or share topics with DataComPy.