parq-cli

A command-line tool for inspecting, previewing, and splitting Apache Parquet files 🚀

✨ Features

📊 Metadata Viewing: Quickly view Parquet file metadata (row count, column count, file size, compression type, etc.)
📋 Schema Display: Show the file's column structure and data types in a formatted table
👀 Data Preview: View the first or last N rows of a file
🔢 Row Count: Quickly get the total number of rows in a file
✂️ File Splitting: Split large Parquet files into multiple smaller files
🗜️ Compression Info: Display the file's compression type and size
🎨 Beautiful Output: Colorful, formatted terminal output powered by the Rich library
📦 Smart Display: Automatically detects nested structures and shows both logical and physical column counts

📦 Installation

pip install parq-cli

🚀 Quick Start

Basic Usage

# View file metadata
parq meta data.parquet

# Display schema information
parq schema data.parquet

# Display first 5 rows (default)
parq head data.parquet

# Display first 10 rows
parq head -n 10 data.parquet

# Display last 5 rows (default)
parq tail data.parquet

# Display last 20 rows
parq tail -n 20 data.parquet

# Display total row count
parq count data.parquet

# Split file into 3 parts
parq split data.parquet --file-count 3

# Split file with 1000 records per file
parq split data.parquet --record-count 1000

📖 Command Reference

View Metadata

parq meta FILE

Display Parquet file metadata (row count, column count, file size, compression type, etc.).
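The file_size field is shown in human-readable units (123.45 KB and 2.34 MB in the output examples). Below is a minimal sketch of such a formatter, assuming 1024-byte units and two decimal places; format_file_size is a hypothetical helper, not parq-cli's actual code, and the tool's exact rounding may differ.

```python
def format_file_size(num_bytes: int) -> str:
    """Render a byte count such as 2048 as '2.00 KB' (assumes 1024-byte units)."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = num_bytes / 1024
    for unit in ("KB", "MB", "GB"):
        # Keep this unit once the value fits below the next 1024 step.
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TB"  # anything larger is reported in TB


print(format_file_size(2048))       # 2.00 KB
print(format_file_size(2453668))    # 2.34 MB
```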

View Schema

parq schema FILE

Display the column structure and data types of a Parquet file.

Preview Data

# Display first N rows (default 5)
parq head FILE
parq head -n N FILE

# Display last N rows (default 5)
parq tail FILE
parq tail -n N FILE
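Both commands select one contiguous range of rows. The sketch below shows only the range arithmetic, assuming the total row count is already known; head_range and tail_range are hypothetical helpers. With PyArrow, the resulting (offset, length) pair could be passed to pyarrow.Table.slice(offset, length), though whether parq-cli reads rows this way is an assumption.

```python
from typing import Tuple


def head_range(num_rows: int, n: int = 5) -> Tuple[int, int]:
    """(offset, length) of the first n rows, clamped to the file size."""
    length = max(0, min(n, num_rows))
    return 0, length


def tail_range(num_rows: int, n: int = 5) -> Tuple[int, int]:
    """(offset, length) of the last n rows, clamped to the file size."""
    length = max(0, min(n, num_rows))
    return num_rows - length, length


print(head_range(1000, 10))  # (0, 10)
print(tail_range(1000, 20))  # (980, 20)
print(tail_range(3))         # (0, 3): fewer rows than requested
```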

Statistics

# Display total row count
parq count FILE
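A Parquet file's footer records the row count of every row group, so the total can be computed without decoding any column data. The sketch below sums hypothetical per-row-group records (RowGroupInfo is an illustrative stand-in); in PyArrow the same total is exposed as pyarrow.parquet.ParquetFile(path).metadata.num_rows. Whether parq count uses exactly this path is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupInfo:
    """Hypothetical stand-in for one row group's footer metadata."""
    num_rows: int


def total_row_count(row_groups: List[RowGroupInfo]) -> int:
    # Only footer metadata is summed; no column data is read or decoded.
    return sum(rg.num_rows for rg in row_groups)


print(total_row_count([RowGroupInfo(300), RowGroupInfo(200)]))  # 500
```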

Split Files

# Split into N files
parq split FILE --file-count N

# Split with M records per file
parq split FILE --record-count M

# Custom output file name pattern (-f is short for --file-count, -n for --name-format)
parq split FILE -f N -n "output-%03d.parquet"

# Write the output files into a subdirectory
parq split FILE -f 3 -n "output/part-%02d.parquet"

Split a Parquet file into multiple smaller files. Specify either the number of output files (--file-count) or the number of records per file (--record-count), not both. Output file names follow the --name-format pattern, a printf-style format string (default: result-%06d.parquet).
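The two splitting modes can be sketched as a mapping from the total row count to one row range per output file. plan_split and output_paths are hypothetical helpers; spreading rows evenly (earlier files get one extra row when the count does not divide evenly) and numbering files from 0 are assumptions, not confirmed parq-cli behavior.

```python
from typing import List, Optional, Tuple


def plan_split(
    num_rows: int,
    file_count: Optional[int] = None,
    record_count: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return one (offset, length) row range per output file."""
    if (file_count is None) == (record_count is None):
        raise ValueError("specify exactly one of file_count or record_count")

    if file_count is not None:
        if file_count <= 0:
            raise ValueError("file_count must be positive")
        # Spread rows evenly; the first `extra` files get one extra row.
        base, extra = divmod(num_rows, file_count)
        sizes = [base + (1 if i < extra else 0) for i in range(file_count)]
    else:
        if record_count <= 0:
            raise ValueError("record_count must be positive")
        # Full chunks of record_count rows, plus a smaller final chunk if needed.
        full, rest = divmod(num_rows, record_count)
        sizes = [record_count] * full + ([rest] if rest else [])

    ranges = []
    offset = 0
    for size in sizes:
        ranges.append((offset, size))
        offset += size
    return ranges


def output_paths(count: int, name_format: str = "result-%06d.parquet") -> List[str]:
    """Expand a printf-style name pattern (numbering from 0 is an assumption)."""
    return [name_format % i for i in range(count)]


for (offset, length), path in zip(plan_split(1000, file_count=3), output_paths(3)):
    print(path, offset, length)
# result-000000.parquet 0 334
# result-000001.parquet 334 333
# result-000002.parquet 667 333
```

With a file count, output files differ in size by at most one row; with a record count, only the last file may be smaller.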

Global Options

--version, -v: Display version information
--help: Display help information

🎨 Output Examples

Metadata Display

Regular File (No Nested Structure):

$ parq meta data.parquet

╭─────────────────────── 📊 Parquet File Metadata ───────────────────────╮
│ file_path: data.parquet                                                │
│ num_rows: 1000                                                         │
│ num_columns: 5 (logical)                                               │
│ file_size: 123.45 KB                                                   │
│ compression: SNAPPY                                                    │
│ num_row_groups: 1                                                      │
│ format_version: 2.6                                                    │
│ serialized_size: 126412                                                │
│ created_by: parquet-cpp-arrow version 18.0.0                          │
╰────────────────────────────────────────────────────────────────────────╯

Nested Structure File (Shows Physical Column Count):

$ parq meta nested.parquet

╭─────────────────────── 📊 Parquet File Metadata ───────────────────────╮
│ file_path: nested.parquet                                              │
│ num_rows: 500                                                          │
│ num_columns: 3 (logical)                                               │
│ num_physical_columns: 8 (storage)                                      │
│ file_size: 2.34 MB                                                     │
│ compression: ZSTD                                                      │
│ num_row_groups: 2                                                      │
│ format_version: 2.6                                                    │
│ serialized_size: 2451789                                               │
│ created_by: parquet-cpp-arrow version 21.0.0                          │
╰────────────────────────────────────────────────────────────────────────╯
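Logical columns are the top-level fields of the schema, while physical columns are the leaf columns Parquet actually stores: every primitive value nested inside a struct, list, or map becomes its own stored column. The sketch below counts both over a hypothetical schema representation (dicts for structs, tuples for lists and maps), not the schema of nested.parquet. In PyArrow, FileMetaData.num_columns reports the leaf count, but whether parq-cli computes its numbers this way is an assumption.

```python
from typing import Dict, Tuple, Union

# Hypothetical schema representation (not PyArrow's):
#   "int64", "string", ...        -> primitive type
#   {"field": type, ...}          -> struct
#   ("list", element_type)        -> list
#   ("map", key_type, value_type) -> map
FieldType = Union[str, dict, tuple]


def count_leaves(field_type: FieldType) -> int:
    """Count the physical (leaf) columns stored for one field."""
    if isinstance(field_type, dict):
        # A struct stores each of its children separately.
        return sum(count_leaves(child) for child in field_type.values())
    if isinstance(field_type, tuple):
        kind, *children = field_type
        if kind not in ("list", "map"):
            raise ValueError(f"unknown nested kind: {kind!r}")
        # A list stores its element; a map stores its key and value.
        return sum(count_leaves(child) for child in children)
    return 1  # a primitive type is exactly one stored column


def column_counts(schema: Dict[str, FieldType]) -> Tuple[int, int]:
    """Return (logical, physical) column counts for a top-level schema."""
    return len(schema), sum(count_leaves(t) for t in schema.values())


schema = {
    "id": "int64",
    "user": {"name": "string", "address": {"city": "string", "zip": "string"}},
    "tags": ("list", "string"),
    "attrs": ("map", "string", "double"),
}
print(column_counts(schema))  # (4, 7)
```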

Schema Display

$ parq schema data.parquet

                    📋 Schema Information
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Column Name ┃ Data Type     ┃ Nullable ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ id          │ int64         │ ✗        │
│ name        │ string        │ ✓        │
│ age         │ int64         │ ✓        │
│ city        │ string        │ ✓        │
│ salary      │ double        │ ✓        │
└─────────────┴───────────────┴──────────┘

🛠️ Tech Stack

PyArrow: High-performance Parquet reading engine
Typer: Modern CLI framework
Rich: Beautiful terminal output

🧪 Development

Install Development Dependencies

pip install -e ".[dev]"

Run Tests

pytest

Run Tests (With Coverage)

pytest --cov=parq --cov-report=html

Code Formatting and Checking

# Check and auto-fix with Ruff
ruff check --fix parq tests

🗺️ Roadmap

📦 Release Process (for maintainers)

We use automated scripts to manage versions and releases:

# Bump version and create tag
python scripts/bump_version.py patch  # 0.1.0 -> 0.1.1 (bug fixes)
python scripts/bump_version.py minor  # 0.1.0 -> 0.2.0 (new features)
python scripts/bump_version.py major  # 0.1.0 -> 1.0.0 (breaking changes)

# Push to trigger GitHub Actions
git push origin main
git push origin v0.1.1  # Replace with actual version
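The three bump levels follow semantic versioning. Below is a minimal sketch of the version arithmetic and tag naming, assuming plain MAJOR.MINOR.PATCH strings; bump_version and tag_for are hypothetical and are not the contents of scripts/bump_version.py.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for part 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"   # breaking changes reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new features reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fixes only
    raise ValueError(f"unknown part: {part!r}")


def tag_for(version: str) -> str:
    # Release tags carry a leading "v", as in v0.1.1.
    return f"v{version}"


print(tag_for(bump_version("0.1.0", "patch")))  # v0.1.1
```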

GitHub Actions will automatically:

✅ Check for version conflicts
✅ Build the package
✅ Publish to PyPI
✅ Create a GitHub Release

See scripts/README.md for detailed documentation.

🤝 Contributing

Issues and Pull Requests are welcome!

1. Fork this repository
2. Create a feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

🙏 Acknowledgments

Inspired by parquet-cli
Thanks to the Apache Arrow team for powerful Parquet support
Thanks to the Rich library for adding color to terminal output

📮 Contact

Author: SimonSun
Project URL: https://github.com/Tendo33/parq-cli

⭐ If this project helps you, please give it a Star!
