grc-iit/Datasets

GRC Datasets Repository

Centralized repository for datasets used by the GRC organization at IIT. Contains scientific simulation datasets and documentation for accessing petabyte-scale public datasets.

Dataset Statistics

| Category | Datasets | Formats | Total Size | Description |
|---|---|---|---|---|
| ADIOS | 23 | BP5 | 755 MB | CFD, MD, Weather simulations |
| Oceanography | 2 | NetCDF | 2.4 MB | CTD profiles, surface analysis |
| Genomics | 7 | FASTA, HDF5, SAM, VCF, FASTQ | 8.5 MB | Genomes, variants, RNA-seq |
| Astronomy | 4 | FITS, HDF5 | 2.7 MB | Images, spectra, light curves |
| Seismology | 3 | HDF5 | 16 MB | Earthquake data, noise, RFs |
| Parquet | 2 | Parquet | 48 MB | NYC taxi, analytics samples |
| NetCDF | 2 | NetCDF | 6.7 MB | NOAA climate data |
| HDF5 | 5 | HDF5, PDB | 766 KB | OpenPMD, protein structures |
| ROOT | 2 | ROOT | 20.5 MB | Higgs analysis, tutorials |
| FITS | 2 | FITS | 4.8 MB | Hubble observations |
| CIF | 3 | CIF | 85 KB | Crystal structures |
| Darshan | Examples | LOG | 36 MB | I/O characterization traces |
| Shadow | 50+ | Various | PB-scale | Documentation for public data |

Total local datasets: ~840 MB across 50+ datasets

Total accessible (shadow): Petabytes of public scientific data

Index

Data Formats

  • Adios - ADIOS2 I/O framework datasets
  • HDF5 - Hierarchical Data Format files
  • NetCDF - Network Common Data Form files
  • Parquet - Columnar format files
  • ROOT - Particle physics data from CERN
  • FITS - Astronomy image and data files
  • CIF - Crystallographic Information Files (crystal structures)
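Several of the binary formats listed above can be told apart by their leading bytes, which helps when a downloaded file has a missing or misleading extension. The following is a minimal sketch using only the Python standard library. The function name `sniff_format` is hypothetical and is not part of this repository. It only checks offset 0, and it does not cover ADIOS BP5 output or text formats such as CIF.

```python
from pathlib import Path

# Leading byte signatures of the binary formats indexed above.
# Note: NetCDF-4 files are stored as HDF5, so they are reported as "HDF5" here.
MAGIC = [
    (b"\x89HDF\r\n\x1a\n", "HDF5"),
    (b"CDF\x01", "NetCDF"),   # NetCDF classic
    (b"CDF\x02", "NetCDF"),   # NetCDF 64-bit offset
    (b"PAR1", "Parquet"),
    (b"root", "ROOT"),
    (b"SIMPLE  =", "FITS"),   # first header card of a FITS primary HDU
]


def sniff_format(path):
    """Guess a file's format from its first bytes; return None if unknown."""
    with Path(path).open("rb") as f:
        head = f.read(16)
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None
```

Calling `sniff_format` on a file path returns one of the names above, or `None` when no signature matches, in which case falling back to the file extension is a reasonable next step.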

Scientific Domains

  • Oceanography - Ocean and marine data (NetCDF)
  • Astronomy - Astronomical observations (FITS, HDF5)
  • Seismology - Earthquake and seismic data (HDF5)
  • Genomics - Genomics and bioinformatics data (FASTA, HDF5, SAM, VCF, FASTQ)
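Of the genomics formats listed, FASTA is plain text and can be read without extra libraries. As a rough illustration, here is a small standard-library reader. The name `read_fasta` is hypothetical, and the sketch assumes uncompressed input in which each record starts with a `>` header line.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped across several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because it accepts any iterable of lines, it can be used directly on an open file handle, for example `for name, seq in read_fasta(open(path)): ...`, where `path` points to a FASTA file of your choice.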

Tracking

Shadow Datasets

Documentation for petabyte-scale public datasets:

About

A collection of datasets focusing on I/O systems in HPC
