Solana Validator Data

This project shows the performance impact of using DoubleZero on Solana validators. The dashboard is powered by a data pipeline that extracts data from public APIs, then transforms and combines it for reporting.

Tableau dashboard: https://public.tableau.com/views/DoubleZeroSummary/SummaryDashboard

Project Structure

doublezero_data/
├── README.md
├── scripts/
│   ├── extraction/             # Data extraction scripts
│   ├── transformation/         # Data transformation scripts
│   ├── utils/                  # Utility modules
│   └── combined_data.py        # Prepare data for reporting
├── data/
│   ├── raw/                    # Epoch level raw JSON data from APIs
│   │   ├── validators_main/    # validators.app data
│   │   └── epoch_performance/  # Solana Compass data
│   ├── transformed/            # Epoch level transformed CSV data
│   │   ├── validators_main/
│   │   └── epoch_performance/
│   └── combined/               # Combined data for reporting
├── requirements.txt
└── .env                       # Configuration file

Setup

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

Create a .env file in the project root with your API keys (secrets only):

# Required API keys (replace with your actual keys, no quotes needed)
# Visit https://www.validators.app/api-documentation, sign up, and generate an API key

VALIDATORS_APP_API_KEY=your_validators_app_key_here
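
For reference, here is a minimal sketch of how a script might read this key, assuming the python-dotenv package is used (an assumption, not confirmed by the repo):

# sketch: loading the API key from .env (python-dotenv assumed)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
VALIDATORS_APP_API_KEY = os.environ["VALIDATORS_APP_API_KEY"]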

Configuration Values: Non-secret configuration values are stored in scripts/utils/config.py (a sketch follows the list below) and include:

  • API Base URLs (validators.app, Solana Beach, Solana Compass)
  • Network configuration (DEFAULT_NETWORK=mainnet)
  • Epoch range (START_EPOCH=835, END_EPOCH=851)
  • Logging level (LOG_LEVEL=INFO)
  • API rate limiting settings
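
A hypothetical sketch of what scripts/utils/config.py might contain, based on the list above (the variable names, the Solana Beach URL, and the Solana Compass URL are illustrative, not taken from the repo):

# scripts/utils/config.py -- illustrative sketch, names are assumptions
VALIDATORS_APP_BASE_URL = "https://www.validators.app/api/v1"
SOLANA_BEACH_BASE_URL = "https://api.solanabeach.io"   # hypothetical
SOLANA_COMPASS_BASE_URL = "https://solanacompass.com"  # hypothetical

DEFAULT_NETWORK = "mainnet"
START_EPOCH = 835
END_EPOCH = 851
LOG_LEVEL = "INFO"
API_REQUEST_DELAY_SECONDS = 1.0  # hypothetical rate-limiting knob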

Usage

Data Extraction

Run the extraction scripts to fetch raw data from APIs:

# Extract validator data from validators.app (API key required)
python scripts/extraction/extract_validators_main.py

# Extract epoch performance data from Solana Compass (no API key required)
python scripts/extraction/extract_epoch_performance.py
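
To illustrate the extraction pattern, here is a rough sketch of what extract_validators_main.py might do, assuming the Token authentication header described in the validators.app API documentation and the data/raw layout from the project structure (the endpoint path and epoch handling are assumptions):

# sketch: fetch validator data and store it as epoch-level raw JSON
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()

def extract_validators_main(epoch: int, network: str = "mainnet") -> None:
    url = f"https://www.validators.app/api/v1/validators/{network}.json"
    resp = requests.get(
        url,
        headers={"Token": os.environ["VALIDATORS_APP_API_KEY"]},
        timeout=30,
    )
    resp.raise_for_status()
    out_dir = "data/raw/validators_main"
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"epoch_{epoch}.json"), "w") as f:
        json.dump(resp.json(), f)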

Data Transformation

Run the transformation scripts to convert raw JSON to clean CSV:

# Transform validators.app data
python scripts/transformation/transform_validators_main.py

# Transform epoch performance data
python scripts/transformation/transform_epoch_performance.py
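
A minimal sketch of the transformation step, assuming pandas and the key fields listed under Data Sources (the identifier column "account" is an assumption):

# sketch: flatten one epoch of raw JSON into a clean CSV
import json
import os

import pandas as pd

def transform_validators_main(epoch: int) -> None:
    with open(f"data/raw/validators_main/epoch_{epoch}.json") as f:
        records = json.load(f)
    df = pd.json_normalize(records)
    keep = ["account", "is_dz", "jito", "software_version", "active_stake"]
    df = df[[c for c in keep if c in df.columns]]
    df["epoch"] = epoch  # tag rows so epochs can be concatenated later
    os.makedirs("data/transformed/validators_main", exist_ok=True)
    df.to_csv(f"data/transformed/validators_main/epoch_{epoch}.csv", index=False)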

Data Finalization

Run the finalization script to combine the transformed data for reporting:

python scripts/transformation/combined_data.py

  • This generates the data consumed by the dashboard.
  • In a production environment, this step would be replaced by a load-to-database step plus the creation of appropriate reporting views.
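
A sketch of the combining step, assuming the two transformed datasets are joined on a validator identifier and epoch (the join keys "account" and "epoch" are assumptions):

# sketch: merge transformed datasets into one reporting table
import glob
import os

import pandas as pd

def load_all(folder: str) -> pd.DataFrame:
    files = sorted(glob.glob(f"data/transformed/{folder}/epoch_*.csv"))
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

validators = load_all("validators_main")
performance = load_all("epoch_performance")
combined = validators.merge(performance, on=["account", "epoch"], how="inner")
os.makedirs("data/combined", exist_ok=True)
combined.to_csv("data/combined/combined_data.csv", index=False)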

Data Sources

1. Validators.app API

  • Purpose: Primary validator dimension data by epoch
  • Authentication: API key required
  • Key Fields: is_dz, jito, software_version, active_stake, location

2. Solana Compass API

  • Purpose: Performance metrics for validators by epoch
  • Authentication: None required
  • Key Fields: median_block_time, transaction_count, skip_rate_percent, fees

Limitations / Next Steps

  • Some of the data is unreliable. For example, the values of is_dz (whether a validator uses DoubleZero) and active_stake do not appear to change with the epoch; for any given validator, the same values are repeated for all epochs.
  • I was unable to find a high-quality source for is_dz by epoch.
  • The median block time for DoubleZero validators shown in the dashboard is actually the median of the per-validator median block times for the epoch, since we do not have block-level data. The same applies to the other median statistics.
  • Raw data is collected by epoch. As each epoch lasts ~2 days, we do not have data for some dates. To create timeline charts by date (instead of by epoch number), we would need to impute missing values from the previous day's value (see the sketch after this list).
  • The latitude and longitude of validators do not always align with the city the data center is in. This inconsistency needs further investigation.
  • Creating a timeline view of staked SOL by validator category (DoubleZero vs. non-DoubleZero) would be interesting, but I did not find a source that provides this data for hundreds of validators in one API call. This chart will be added in the future.
  • Instead of a separate extraction file for each data source, a single script could take the data source name as an argument and issue the corresponding API call; the same applies to the transformation scripts. Separate files are used for now for simplicity.
  • Currently, the scripts are executed locally and the Tableau data is refreshed manually. This could be productionized using an appropriate cloud service.
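
As noted in the imputation bullet above, here is a sketch of how missing dates could be filled with a pandas forward fill (the epoch_start_date column and the input file name are assumptions):

# sketch: build a daily series from epoch-level data by forward-filling gaps
import pandas as pd

df = pd.read_csv("data/combined/combined_data.csv", parse_dates=["epoch_start_date"])
daily = (
    df.set_index("epoch_start_date")
      .resample("D")                # one row per calendar day
      .median(numeric_only=True)    # aggregate validators within a day
      .ffill()                      # carry the previous day's value forward
)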
