Apache Spark by apache
Apache Spark - A unified analytics engine for large-scale data processing
About Apache Spark
From the project's README at github.com/apache/spark. Lightly cleaned for readability; for the full source see the upstream repo.
Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R (Deprecated), and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
Health score breakdown
6-dimension composite. See methodology for formula and weights.
Adoption signals
Real-world usage data, pulled from each registry. Larger numbers generally indicate wider real-world use.
| Signal | Value | Source |
|---|---|---|
| GitHub stars | 43k | github.com/apache/spark |
| GitHub forks | 29k | github.com/apache/spark |
| Docker Hub pulls | 24.36M | hub.docker.com / apache |
Release & maintenance
Is this project actively maintained, or about to die? Check the recency of last commit and last release.
| Signal | Value | Details |
|---|---|---|
| Project age | 12.2 years | since Feb 2014 |
| Last commit | 2 days ago | May 5, 2026 |
| Security policy | SECURITY.md | declared by maintainers |
Self-hosting cost across providers
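Detected requirements: 4GB RAM, 40GB disk minimum. The table lists the cheapest plan found per provider. Only the hetzner CAX11 row meets the detected requirement; the vultr, linode, and digitalocean rows are 1GB RAM / 25GB disk plans and fall short of it.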
| Provider | Plan | Specs | Monthly |
|---|---|---|---|
| hetzner | CAX11 | 2c · 4GB · 40GB | $4.13 USD |
| vultr | VC2 | 1c · 1GB · 25GB | $5 USD |
| linode | Nanode 1GB | 1c · 1GB · 25GB | $5.12 USD |
| digitalocean | Basic Regular 1GB | 1c · 1GB · 25GB | $6 USD |
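The "cheapest plan that meets the requirement" selection can be sketched in a few lines of Python. This is a minimal illustration, not the directory's actual method: the `Plan` class and the `meets_requirement` and `cheapest_qualifying` helpers are hypothetical names introduced here, and the plan data is copied by hand from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One row of the provider table (hypothetical structure)."""
    provider: str
    name: str
    vcpus: int
    ram_gb: int
    disk_gb: int
    monthly_usd: float


# Rows transcribed from the table above.
PLANS = [
    Plan("hetzner", "CAX11", 2, 4, 40, 4.13),
    Plan("vultr", "VC2", 1, 1, 25, 5.00),
    Plan("linode", "Nanode 1GB", 1, 1, 25, 5.12),
    Plan("digitalocean", "Basic Regular 1GB", 1, 1, 25, 6.00),
]

# Detected minimum requirements stated on this page.
MIN_RAM_GB = 4
MIN_DISK_GB = 40


def meets_requirement(plan, min_ram_gb=MIN_RAM_GB, min_disk_gb=MIN_DISK_GB):
    """Return True if the plan has at least the required RAM and disk."""
    return plan.ram_gb >= min_ram_gb and plan.disk_gb >= min_disk_gb


def cheapest_qualifying(plans, min_ram_gb=MIN_RAM_GB, min_disk_gb=MIN_DISK_GB):
    """Return the cheapest plan meeting the requirement, or None if none does."""
    qualifying = [p for p in plans if meets_requirement(p, min_ram_gb, min_disk_gb)]
    return min(qualifying, key=lambda p: p.monthly_usd, default=None)


if __name__ == "__main__":
    best = cheapest_qualifying(PLANS)
    if best is None:
        print("No listed plan meets the requirement.")
    else:
        print(f"{best.provider} {best.name}: ${best.monthly_usd:.2f}/mo")
```

Filtering before taking the minimum matters here: sorting by price alone would still pick the hetzner plan in this table, but only because it happens to be both the cheapest and the only one with 4GB RAM and 40GB disk.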
Security advisories
CVE-2025-54920
What people say on Hacker News
- Show HN: We're building Apache spark for agents with Rust and Datafusion
- 2025 ACM Prize in Computing Goes to Apache Spark Creator Matei Zaharia
- SparkVSR: Video Super-Resolution You Can Control with Keyframes
- Show HN: I built a new programming language for AI and Data – 'ThinkingLanguage'
- Show HN: Flare – Full-Stack OpenTelemetry Observability for Apache Spark
Ready to self-host Apache Spark?
Spin up a hetzner CAX11 (4GB RAM, 40GB disk) for $4.13/mo and follow the project's official install docs.
Data last refreshed May 7, 2026.
Similar open-source projects
Projects in our directory that replace the same SaaS or share topics with Apache Spark.