GitHub - dataflint/spark: Drop-in replacement for Apache Spark UI

Spark Performance Made Simple

If you enjoy DataFlint OSS, please give us a ⭐️ and join our Slack community for feature requests, support and more!

What is DataFlint OSS?

DataFlint OSS is a modern, user-friendly enhancement for Apache Spark that simplifies performance monitoring and debugging. It adds an intuitive tab to the existing Spark Web UI, transforming a powerful but often overwhelming interface into something easy to navigate and understand.

Looking for more? Our full solution is a Production-aware AI copilot for Apache Spark. Learn more at dataflint.io.

Why DataFlint OSS?

Intuitive Design: DataFlint OSS's tab in the Spark Web UI presents complex metrics in a clear, easy-to-understand format, making Spark performance accessible to everyone.
Effortless Setup: Install DataFlint OSS in minutes with just a few lines of code or configuration, without making any changes to your existing Spark environment.
For All Skill Levels: Whether you're a seasoned data engineer or just starting with Spark, DataFlint OSS provides valuable insights that help you work more effectively.

With DataFlint OSS, spend less time deciphering the Spark Web UI and more time deriving value from your data. Make big data work better for you, regardless of your role or experience level with Spark.

Usage

After installation, you will see a "DataFlint OSS" tab in the Spark Web UI. Click on it to start using DataFlint OSS.

Demo (Full YouTube Walkthrough)

Features

📈 Real-time query and cluster status
📊 Query breakdown with performance heat map
📋 Application Run Summary
⚠️ Performance alerts and suggestions
👀 Identify query failures
🤖 Spark AI Assistant

See Our Features for more information

Installation

Scala

Install DataFlint OSS via sbt:

libraryDependencies += "io.dataflint" %% "spark" % "0.2.3"

Then instruct Spark to load the DataFlint OSS plugin:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
    .builder()
    .config("spark.plugins", "io.dataflint.spark.SparkDataflintPlugin")
    ...
    .getOrCreate()

PySpark

Add these 2 configs to your PySpark session builder:

builder = pyspark.sql.SparkSession.builder \
    ...
    .config("spark.jars.packages", "io.dataflint:spark_2.12:0.2.3") \
    .config("spark.plugins", "io.dataflint.spark.SparkDataflintPlugin") \
    ...
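As a minimal sketch of the two settings above, the helper below applies them to any builder object that exposes a `config(key, value)` method, which `pyspark.sql.SparkSession.builder` does. The names `DATAFLINT_CONFIGS` and `with_dataflint` are hypothetical and are not part of DataFlint OSS or PySpark.

```python
# Hypothetical helper: the names below are illustrative, not a DataFlint API.
# The keys and values are the two settings listed in this README.
DATAFLINT_CONFIGS = {
    "spark.jars.packages": "io.dataflint:spark_2.12:0.2.3",
    "spark.plugins": "io.dataflint.spark.SparkDataflintPlugin",
}


def with_dataflint(builder):
    """Apply both DataFlint settings to a SparkSession-style builder."""
    for key, value in DATAFLINT_CONFIGS.items():
        # Each config() call returns the builder, so calls can be chained.
        builder = builder.config(key, value)
    return builder


# Usage, assuming pyspark is installed:
#   from pyspark.sql import SparkSession
#   spark = with_dataflint(SparkSession.builder).getOrCreate()
```

Note that `spark.jars.packages` is a comma-separated list, so a session that already sets it would need the DataFlint coordinate appended to the existing value rather than set separately.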

Spark Submit

Alternatively, install DataFlint OSS with no code changes as a Spark Ivy package by adding these 2 lines to your spark-submit command:

spark-submit \
--packages io.dataflint:spark_2.12:0.2.3 \
--conf spark.plugins=io.dataflint.spark.SparkDataflintPlugin \
...

Additional installation options

Scala 2.13 is also supported: if your Spark cluster uses Scala 2.13, change the package name to io.dataflint:spark_2.13:0.2.3
For more installation options, including for Python and the Kubernetes spark-operator, see the Install on Spark docs
To install DataFlint OSS on the Spark History Server for observability on completed runs, see the install on Spark History Server docs
To install DataFlint OSS on Databricks, see the install on Databricks docs
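The package name differs only in its Scala suffix, so the choice can be sketched as a small function. This is an illustrative sketch, not part of DataFlint OSS: the function names are hypothetical, `job.py` is a placeholder application, and the version 0.2.3 is simply the one used throughout this README.

```python
import shlex

DATAFLINT_PLUGIN = "io.dataflint.spark.SparkDataflintPlugin"
SUPPORTED_SCALA = ("2.12", "2.13")  # the two Scala versions named in this README


def dataflint_coordinate(scala_version, dataflint_version="0.2.3"):
    """Return the package coordinate for the given Scala binary version."""
    if scala_version not in SUPPORTED_SCALA:
        raise ValueError(f"unsupported Scala version: {scala_version}")
    return f"io.dataflint:spark_{scala_version}:{dataflint_version}"


def spark_submit_command(app, scala_version="2.12"):
    """Build a spark-submit command line that loads the DataFlint plugin."""
    args = [
        "spark-submit",
        "--packages", dataflint_coordinate(scala_version),
        "--conf", f"spark.plugins={DATAFLINT_PLUGIN}",
        app,  # hypothetical application file, e.g. "job.py"
    ]
    return shlex.join(args)


print(spark_submit_command("job.py", scala_version="2.13"))
```

Rejecting unknown Scala versions early is a design choice of this sketch; a mismatched suffix would otherwise only surface when Spark tries to resolve or load the package.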

How it Works

DataFlint OSS is installed as a plugin on the Spark driver and history server.

The plugin exposes additional HTTP resources that serve metrics not available in the Spark UI, along with a modern SPA web app that fetches data from Spark without the need to refresh the page.

For more information, see the how it works docs

Medium Articles

Compatibility Matrix

DataFlint OSS requires Spark version 3.2 or later and supports Scala 2.12 and 2.13.

Spark Platforms	DataFlint OSS Realtime	DataFlint OSS History server
Local	✅	✅
Standalone	✅	✅
Kubernetes Spark Operator	✅	✅
EMR	✅	✅
Dataproc	✅	❓
HDInsights	✅	❓
Databricks	✅	❌

For more information, see supported versions docs
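The version requirement stated above can be expressed as a quick pre-flight check. This is a rough sketch under stated assumptions: the function names are hypothetical, "3.2 and up" is read literally as any version at or above 3.2, and platform support from the table is not considered.

```python
def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '3.5.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(spark_version, scala_version):
    """Check the stated requirement: Spark 3.2 and up, Scala 2.12 or 2.13."""
    spark_ok = parse_major_minor(spark_version) >= (3, 2)
    scala_ok = parse_major_minor(scala_version) in {(2, 12), (2, 13)}
    return spark_ok and scala_ok


# Usage, assuming pyspark is installed and a session named `spark` exists:
#   is_supported(spark.version, "2.12")
```

Comparing tuples of integers avoids the string-comparison pitfall where "3.10" would sort before "3.2".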
