This project shows the performance impact of using DoubleZero on Solana validators. The dashboard is powered by a data pipeline that extracts data from public APIs, then transforms and combines it for reporting.
Tableau dashboard: https://public.tableau.com/views/DoubleZeroSummary/SummaryDashboard
doublezero_data/
├── README.md
├── scripts/
│ ├── extraction/ # Data extraction scripts
│ ├── transformation/ # Data transformation scripts
│ ├── utils/ # Utility modules
│ └── combined_data.py # Prepare data for reporting
├── data/
│ ├── raw/ # Epoch level raw JSON data from APIs
│ │ ├── validators_main/ # validators.app data
│ │ └── epoch_performance/ # Solana Compass data
│ ├── transformed/ # Epoch level transformed CSV data
│ │ ├── validators_main/
│ │ └── epoch_performance/
│ └── combined/ # Combined data for reporting
├── requirements.txt
└── .env # Configuration file
Install the required Python packages:

pip install -r requirements.txt

Create a .env file in the project root with your API keys (secrets only):
# Required API keys (replace with your actual keys, no quotes needed)
# Visit https://www.validators.app/api-documentation, sign up, and generate an API key
VALIDATORS_APP_API_KEY=your_validators_app_key_here

Configuration Values: Non-secret configuration values are stored in scripts/utils/config.py (a sketch of this module follows the list below) and include:
- API Base URLs (validators.app, Solana Beach, Solana Compass)
- Network configuration (DEFAULT_NETWORK=mainnet)
- Epoch range (START_EPOCH=835, END_EPOCH=851)
- Logging level (LOG_LEVEL=INFO)
- API rate limiting settings
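For reference, here is a minimal sketch of what scripts/utils/config.py might contain, based on the values listed above. The constant names and the exact base URLs are assumptions, not the actual module contents.

```python
# scripts/utils/config.py -- illustrative sketch only; constant names and URLs are assumptions.
# Non-secret configuration shared by the extraction/transformation scripts.

# API base URLs (check each provider's documentation for the exact paths)
VALIDATORS_APP_BASE_URL = "https://www.validators.app/api/v1"
SOLANA_BEACH_BASE_URL = "https://api.solanabeach.io/v1"
SOLANA_COMPASS_BASE_URL = "https://solanacompass.com"

# Network and epoch range covered by the pipeline
DEFAULT_NETWORK = "mainnet"
START_EPOCH = 835
END_EPOCH = 851

# Logging and rate limiting
LOG_LEVEL = "INFO"
REQUEST_DELAY_SECONDS = 1.0  # assumed simple pause between API calls

# Secrets (e.g. VALIDATORS_APP_API_KEY) are read from the .env file, never stored here.
```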
Run the extraction scripts to fetch raw data from APIs:
# Extract validator data from validators.app (API key required)
python scripts/extraction/extract_validators_main.py
# Extract epoch performance data from Solana Compass (no API key required)
python scripts/extraction/extract_epoch_performance.py
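To illustrate the shape of the extraction step, here is a simplified sketch of how one epoch of validator data could be fetched and written under data/raw/. The endpoint path, the Token auth header, the epoch handling, and the file naming are assumptions; the actual scripts may differ.

```python
# Simplified sketch of an extraction script -- not the actual extract_validators_main.py.
# Endpoint, auth header, epoch handling, and file naming are assumptions.
import json
import os
import time

import requests
from dotenv import load_dotenv  # assumes python-dotenv is in requirements.txt

load_dotenv()
API_KEY = os.getenv("VALIDATORS_APP_API_KEY")


def extract_validators(epoch: int, network: str = "mainnet") -> None:
    """Fetch validator data and save it as raw JSON for the given epoch."""
    url = f"https://www.validators.app/api/v1/validators/{network}.json"  # assumed endpoint
    response = requests.get(url, headers={"Token": API_KEY}, timeout=30)
    response.raise_for_status()

    out_dir = "data/raw/validators_main"
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"epoch_{epoch}.json"), "w") as f:
        json.dump(response.json(), f)


if __name__ == "__main__":
    for epoch in range(835, 852):  # START_EPOCH..END_EPOCH from config.py
        extract_validators(epoch)
        time.sleep(1)  # crude rate limiting between calls
```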
Run the transformation scripts to convert raw JSON to clean CSV:

# Transform validators.app data
python scripts/transformation/transform_validators_main.py
# Transform epoch performance data
python scripts/transformation/transform_epoch_performance.py
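As an illustration of the transformation step, the sketch below flattens raw epoch-level JSON into CSV and keeps only the fields used downstream. The input/output paths and column names are assumptions.

```python
# Simplified sketch of a transformation script -- not the actual transform_validators_main.py.
# Paths and the selected columns are assumptions.
import glob
import json
import os

import pandas as pd

RAW_DIR = "data/raw/validators_main"
OUT_DIR = "data/transformed/validators_main"
os.makedirs(OUT_DIR, exist_ok=True)

for raw_path in sorted(glob.glob(os.path.join(RAW_DIR, "epoch_*.json"))):
    epoch = os.path.basename(raw_path)[len("epoch_"):-len(".json")]
    with open(raw_path) as f:
        records = json.load(f)  # assumed to be a list of validator dicts

    df = pd.json_normalize(records)
    df["epoch"] = int(epoch)

    # Keep only the fields needed for reporting (illustrative subset)
    wanted = ["epoch", "account", "is_dz", "jito", "software_version", "active_stake", "location"]
    df[[c for c in wanted if c in df.columns]].to_csv(
        os.path.join(OUT_DIR, f"epoch_{epoch}.csv"), index=False
    )
```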
Finally, combine the transformed data:

- This generates the data used for reporting
- In a production environment, this step would be replaced by a 'Load to database' step plus the creation of appropriate views
python scripts/transformation/combined_data.py
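For illustration, the combine step could look roughly like the sketch below, stacking the per-epoch CSVs and joining the two sources on validator identity and epoch. The join keys and file layout are assumptions, not the actual combined_data.py.

```python
# Simplified sketch of the combine step -- not the actual combined_data.py.
# Join keys ('account', 'epoch') and file layout are assumptions.
import glob
import os

import pandas as pd


def load_all(folder: str) -> pd.DataFrame:
    """Stack all per-epoch CSVs in a transformed-data folder into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "epoch_*.csv")))
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)


validators = load_all("data/transformed/validators_main")
performance = load_all("data/transformed/epoch_performance")

# One row per validator per epoch: dimensions from validators.app,
# performance metrics from Solana Compass.
combined = validators.merge(performance, on=["account", "epoch"], how="left")

os.makedirs("data/combined", exist_ok=True)
combined.to_csv("data/combined/combined_data.csv", index=False)
```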
The pipeline uses two data sources:

validators.app

- Purpose: Primary validator dimension data by epoch
- Authentication: API key required
- Key Fields: is_dz, jito, software_version, active_stake, location
Solana Compass

- Purpose: Performance metrics for validators by epoch
- Authentication: None required
- Key Fields: median_block_time, transaction_count, skip_rate_percent, fees
- Some of the data is not reliable. For example, the values of is_dz (whether a validator uses DoubleZero) and active_stake do not appear to change with epoch; for any given validator, the same values are repeated across all epochs.
- I was unable to find a high-quality source for is_dz by epoch.
- The median block time for DoubleZero validators shown in the dashboard is actually the median of the per-validator median block times for the epoch, since we do not have block-level information. The same applies to the other median statistics (see the sketch after this list).
- Raw data is collected by epoch. As each epoch lasts ~2 days, we do not have data for some dates. To create timeline charts by date (instead of epoch number), missing values would need to be imputed from the previous day's value.
- The latitude and longitude of validators do not always align with the city the data center is in. We need to understand this inconsistency in more detail.
- Creating a timeline view of staked SOL by validator category (DoubleZero vs non-DoubleZero) would be interesting, but I did not find a source that provides this data for hundreds of validators in one API call. This chart will be added in the future.
- Instead of a separate extraction script for each data source, a single script could take the data source name as an argument and issue the corresponding API call; the same applies to transformation. Separate files are used for now for simplicity.
- Currently, the scripts are executed locally and Tableau data is refreshed manually. However, this can be productionized by using an appropriate cloud service.
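To make the median-of-medians caveat above concrete, this is roughly the aggregation involved. The column names (is_dz, median_block_time) and the combined file path are assumptions.

```python
# Illustrative only: how the dashboard-level median block time can be derived
# without block-level data. Column names and file path are assumptions.
import pandas as pd

combined = pd.read_csv("data/combined/combined_data.csv")

# Each validator reports a per-epoch median block time; the dashboard value is
# the median of those per-validator medians within the DoubleZero group.
dz = combined[combined["is_dz"] == True]  # is_dz assumed to be a boolean column
median_block_time_by_epoch = dz.groupby("epoch")["median_block_time"].median()

print(median_block_time_by_epoch)
```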