A powerful command-line tool for Apache Parquet files 🚀
English | 简体中文
- 📊 Metadata Viewing: Quickly view Parquet file metadata (row count, column count, file size, compression type, etc.)
- 📋 Schema Display: Beautifully display file column structure and data types
- 👀 Data Preview: Support viewing the first N rows or last N rows of a file
- 🔢 Row Count: Quickly get the total number of rows in a file
- ✂️ File Splitting: Split large Parquet files into multiple smaller files
- 🗜️ Compression Info: Display file compression type and file size
- 🎨 Beautiful Output: Use the Rich library for colorful, formatted terminal output
- 📦 Smart Display: Automatically detect nested structures, showing both logical and physical column counts
```bash
pip install parq-cli
```

```bash
# View file metadata
parq meta data.parquet
# Display schema information
parq schema data.parquet
# Display first 5 rows (default)
parq head data.parquet
# Display first 10 rows
parq head -n 10 data.parquet
# Display last 5 rows (default)
parq tail data.parquet
# Display last 20 rows
parq tail -n 20 data.parquet
# Display total row count
parq count data.parquet
# Split file into 3 parts
parq split data.parquet --file-count 3
# Split file with 1000 records per file
parq split data.parquet --record-count 1000
```

`parq meta FILE`

Display Parquet file metadata (row count, column count, file size, compression type, etc.).
`parq schema FILE`

Display the column structure and data types of a Parquet file.
```bash
# Display first N rows (default 5)
parq head FILE
parq head -n N FILE
```

```bash
# Display last N rows (default 5)
parq tail FILE
parq tail -n N FILE
```

```bash
# Display total row count
parq count FILE
```

```bash
# Split into N files
parq split FILE --file-count N
# Split with M records per file
parq split FILE --record-count M
# Custom output format
parq split FILE -f N -n "output-%03d.parquet"
# Split into subdirectory
parq split FILE -f 3 -n "output/part-%02d.parquet"
```

Split a Parquet file into multiple smaller files. You can specify either the number of output files (`--file-count`) or the number of records per file (`--record-count`). The output file names are formatted according to the `--name-format` pattern (default: `result-%06d.parquet`).
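Conceptually, the record-count mode slices the table into consecutive chunks and writes each chunk to its own file. A rough PyArrow sketch of that idea, not parq's actual implementation (it reads the whole file into memory), could look like this:

```python
import pyarrow.parquet as pq

def split_by_record_count(path: str, records_per_file: int,
                          name_format: str = "result-%06d.parquet") -> None:
    """Write consecutive slices of a Parquet file as separate files."""
    table = pq.read_table(path)
    for index, offset in enumerate(range(0, table.num_rows, records_per_file)):
        chunk = table.slice(offset, records_per_file)  # final chunk may be shorter
        pq.write_table(chunk, name_format % index)
```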
- `--version`, `-v`: Display version information
- `--help`: Display help information
Regular File (No Nested Structure):
```
$ parq meta data.parquet

╭─────────── 📊 Parquet File Metadata ───────────╮
│ file_path: data.parquet                        │
│ num_rows: 1000                                 │
│ num_columns: 5 (logical)                       │
│ file_size: 123.45 KB                           │
│ compression: SNAPPY                            │
│ num_row_groups: 1                              │
│ format_version: 2.6                            │
│ serialized_size: 126412                        │
│ created_by: parquet-cpp-arrow version 18.0.0   │
╰────────────────────────────────────────────────╯
```
Nested Structure File (Shows Physical Column Count):
```
$ parq meta nested.parquet

╭─────────── 📊 Parquet File Metadata ───────────╮
│ file_path: nested.parquet                      │
│ num_rows: 500                                  │
│ num_columns: 3 (logical)                       │
│ num_physical_columns: 8 (storage)              │
│ file_size: 2.34 MB                             │
│ compression: ZSTD                              │
│ num_row_groups: 2                              │
│ format_version: 2.6                            │
│ serialized_size: 2451789                       │
│ created_by: parquet-cpp-arrow version 21.0.0   │
╰────────────────────────────────────────────────╯
```
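The two column counts differ because of nesting: a struct or list field is one logical column in the Arrow schema but is stored as several physical leaf columns in the Parquet file. A minimal PyArrow sketch of the distinction (file name and data are just illustrative):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# One flat column plus one struct column: 2 logical columns,
# but 3 physical leaf columns (id, user.name, user.age) on disk.
table = pa.table({
    "id": [1, 2],
    "user": [{"name": "alice", "age": 30}, {"name": "bob", "age": 25}],
})
pq.write_table(table, "example_nested.parquet")

metadata = pq.ParquetFile("example_nested.parquet").metadata
print(len(metadata.schema.to_arrow_schema()))  # logical columns -> 2
print(metadata.num_columns)                    # physical leaf columns -> 3
```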
```
$ parq schema data.parquet

        📋 Schema Information
┏━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Column Name ┃ Data Type ┃ Nullable ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ id          │ int64     │ ✓        │
│ name        │ string    │ ✓        │
│ age         │ int64     │ ✓        │
│ city        │ string    │ ✓        │
│ salary      │ double    │ ✓        │
└─────────────┴───────────┴──────────┘
```
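parq renders this table with Rich; the underlying information is simply the Arrow schema, which you can cross-check directly with PyArrow:

```python
import pyarrow.parquet as pq

schema = pq.read_schema("data.parquet")
for field in schema:
    print(field.name, field.type, field.nullable)
```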
- PyArrow: High-performance Parquet reading engine
- Typer: Modern CLI framework
- Rich: Beautiful terminal output
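To illustrate how these three pieces fit together, here is a minimal, hypothetical sketch of a count-style subcommand built on the same stack; it is not the project's actual source, just the general pattern:

```python
import pyarrow.parquet as pq
import typer
from rich.console import Console

app = typer.Typer()
console = Console()

@app.command()
def count(file: str) -> None:
    """Print the total number of rows in a Parquet file."""
    metadata = pq.ParquetFile(file).metadata  # reads only the footer, not the data
    console.print(f"[bold green]{metadata.num_rows}[/] rows")

if __name__ == "__main__":
    app()
```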
pip install -e ".[dev]"pytestpytest --cov=parq --cov-report=html# Check and auto-fix with Ruff
ruff check --fix parq tests- Basic metadata viewing
- Schema display
- Data preview (head/tail)
- Row count statistics
- File size and compression information display
- Nested structure smart detection (logical vs physical column count)
- Add split command: split a Parquet file into multiple Parquet files
- Statistical analysis of data
- Add convert command: convert a Parquet file to other formats (CSV, JSON, Excel)
- Add diff command: compare the differences between two Parquet files
- Add merge command: merge multiple Parquet files into one Parquet file
We use automated scripts to manage versions and releases:
```bash
# Bump version and create tag
python scripts/bump_version.py patch # 0.1.0 -> 0.1.1 (bug fixes)
python scripts/bump_version.py minor # 0.1.0 -> 0.2.0 (new features)
python scripts/bump_version.py major # 0.1.0 -> 1.0.0 (breaking changes)
# Push to trigger GitHub Actions
git push origin main
git push origin v0.1.1  # Replace with actual version
```

GitHub Actions will automatically:
- ✅ Check for version conflicts
- ✅ Build the package
- ✅ Publish to PyPI
- ✅ Create GitHub Release
See scripts/README.md for detailed documentation.
Issues and Pull Requests are welcome!
- Fork this repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details
- Inspired by parquet-cli
- Thanks to the Apache Arrow team for powerful Parquet support
- Thanks to the Rich library for adding color to terminal output
- Author: SimonSun
- Project URL: https://github.com/Tendo33/parq-cli
⭐ If this project helps you, please give it a Star!