Solana Validator Data

This project shows the performance impact of using DoubleZero on Solana validators. The dashboard is powered by a data pipeline that extracts data from public APIs, then transforms and combines it for reporting.

Tableau dashboard: https://public.tableau.com/views/DoubleZeroSummary/SummaryDashboard

Project Structure

doublezero_data/
├── README.md
├── scripts/
│   ├── extraction/             # Data extraction scripts
│   ├── transformation/         # Data transformation scripts
│   ├── utils/                  # Utility modules
│   └── combined_data.py        # Prepare data for reporting
├── data/
│   ├── raw/                    # Epoch level raw JSON data from APIs
│   │   ├── validators_main/    # validators.app data
│   │   └── epoch_performance/  # Solana Compass data
│   ├── transformed/            # Epoch level transformed CSV data
│   │   ├── validators_main/
│   │   └── epoch_performance/
│   └── combined/               # Combined data for reporting
├── requirements.txt
└── .env                       # Configuration file

Setup

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

Create a .env file in the project root with your API keys (secrets only):

# Required API keys (replace with your actual keys, no quotes needed)
# Visit https://www.validators.app/api-documentation, sign up, and generate an API key

VALIDATORS_APP_API_KEY=your_validators_app_key_here
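
For reference, here is a minimal sketch of how a script might read this key, assuming the python-dotenv package is used (an assumption, not confirmed by the repo):

# sketch: loading the API key from .env (python-dotenv assumed)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
VALIDATORS_APP_API_KEY = os.environ["VALIDATORS_APP_API_KEY"]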

Configuration Values: Non-secret configuration values are stored in scripts/utils/config.py (a sketch follows the list below) and include:

  • API Base URLs (validators.app, Solana Beach, Solana Compass)
  • Network configuration (DEFAULT_NETWORK=mainnet)
  • Epoch range (START_EPOCH=835, END_EPOCH=851)
  • Logging level (LOG_LEVEL=INFO)
  • API rate limiting settings
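
A hypothetical sketch of what scripts/utils/config.py might contain, based on the list above (the variable names, the Solana Beach URL, and the Solana Compass URL are illustrative, not taken from the repo):

# scripts/utils/config.py -- illustrative sketch, names are assumptions
VALIDATORS_APP_BASE_URL = "https://www.validators.app/api/v1"
SOLANA_BEACH_BASE_URL = "https://api.solanabeach.io"   # hypothetical
SOLANA_COMPASS_BASE_URL = "https://solanacompass.com"  # hypothetical

DEFAULT_NETWORK = "mainnet"
START_EPOCH = 835
END_EPOCH = 851
LOG_LEVEL = "INFO"
API_REQUEST_DELAY_SECONDS = 1.0  # hypothetical rate-limiting knob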

Usage

Data Extraction

Run the extraction scripts to fetch raw data from APIs:

# Extract validator data from validators.app (API key required)
python scripts/extraction/extract_validators_main.py

# Extract epoch performance data from Solana Compass (no API key required)
python scripts/extraction/extract_epoch_performance.py
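
To illustrate the extraction pattern, here is a rough sketch of what extract_validators_main.py might do, assuming the Token authentication header described in the validators.app API documentation and the data/raw layout from the project structure (the endpoint path and epoch handling are assumptions):

# sketch: fetch validator data and store it as epoch-level raw JSON
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()

def extract_validators_main(epoch: int, network: str = "mainnet") -> None:
    url = f"https://www.validators.app/api/v1/validators/{network}.json"
    resp = requests.get(
        url,
        headers={"Token": os.environ["VALIDATORS_APP_API_KEY"]},
        timeout=30,
    )
    resp.raise_for_status()
    out_dir = "data/raw/validators_main"
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"epoch_{epoch}.json"), "w") as f:
        json.dump(resp.json(), f)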

Data Transformation

Run the transformation scripts to convert raw JSON to clean CSV:

# Transform validators.app data
python scripts/transformation/transform_validators_main.py

# Transform epoch performance data
python scripts/transformation/transform_epoch_performance.py
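
A minimal sketch of the transformation step, assuming pandas and the key fields listed under Data Sources (the identifier column "account" is an assumption):

# sketch: flatten one epoch of raw JSON into a clean CSV
import json
import os

import pandas as pd

def transform_validators_main(epoch: int) -> None:
    with open(f"data/raw/validators_main/epoch_{epoch}.json") as f:
        records = json.load(f)
    df = pd.json_normalize(records)
    keep = ["account", "is_dz", "jito", "software_version", "active_stake"]
    df = df[[c for c in keep if c in df.columns]]
    df["epoch"] = epoch  # tag rows so epochs can be concatenated later
    os.makedirs("data/transformed/validators_main", exist_ok=True)
    df.to_csv(f"data/transformed/validators_main/epoch_{epoch}.csv", index=False)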

Data Finalization

Run the finalization script to combine the transformed data for reporting:

python scripts/transformation/combined_data.py

  • This generates the data consumed by the dashboard.
  • In a production environment, this step would be replaced by a load-to-database step plus the creation of appropriate reporting views.
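
A sketch of the combining step, assuming the two transformed datasets are joined on a validator identifier and epoch (the join keys "account" and "epoch" are assumptions):

# sketch: merge transformed datasets into one reporting table
import glob
import os

import pandas as pd

def load_all(folder: str) -> pd.DataFrame:
    files = sorted(glob.glob(f"data/transformed/{folder}/epoch_*.csv"))
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

validators = load_all("validators_main")
performance = load_all("epoch_performance")
combined = validators.merge(performance, on=["account", "epoch"], how="inner")
os.makedirs("data/combined", exist_ok=True)
combined.to_csv("data/combined/combined_data.csv", index=False)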

Data Sources

1. Validators.app API

  • Purpose: Primary validator dimension data by epoch
  • Authentication: API key required
  • Key Fields: is_dz, jito, software_version, active_stake, location

2. Solana Compass API

  • Purpose: Performance metrics for validators by epoch
  • Authentication: None required
  • Key Fields: median_block_time, transaction_count, skip_rate_percent, fees

Limitations / Next Steps

  • Some of the data is unreliable. For example, the values of is_dz (whether a validator uses DoubleZero) and active_stake do not appear to change with the epoch; for any given validator, the same values are repeated for all epochs.
  • I was unable to find a high-quality source for is_dz by epoch.
  • The median block time for DoubleZero validators shown in the dashboard is actually the median of the per-validator median block times for the epoch, since we do not have block-level data. The same applies to the other median statistics.
  • Raw data is collected by epoch. As each epoch lasts ~2 days, we do not have data for some dates. To create timeline charts by date (instead of by epoch number), we would need to impute missing values from the previous day's value (see the sketch after this list).
  • The latitude and longitude of validators do not always align with the city the data center is in. This inconsistency needs further investigation.
  • Creating a timeline view of staked SOL by validator category (DoubleZero vs. non-DoubleZero) would be interesting, but I did not find a source that provides this data for hundreds of validators in one API call. This chart will be added in the future.
  • Instead of a separate extraction file for each data source, a single script could take the data source name as an argument and issue the corresponding API call; the same applies to the transformation scripts. Separate files are used for now for simplicity.
  • Currently, the scripts are executed locally and the Tableau data is refreshed manually. This could be productionized using an appropriate cloud service.
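
As noted in the imputation bullet above, here is a sketch of how missing dates could be filled with a pandas forward fill (the epoch_start_date column and the input file name are assumptions):

# sketch: build a daily series from epoch-level data by forward-filling gaps
import pandas as pd

df = pd.read_csv("data/combined/combined_data.csv", parse_dates=["epoch_start_date"])
daily = (
    df.set_index("epoch_start_date")
      .resample("D")                # one row per calendar day
      .median(numeric_only=True)    # aggregate validators within a day
      .ffill()                      # carry the previous day's value forward
)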
