A fast CLI tool, written in Go, for working with Apache Avro files.
- Fast & Lightweight - Efficient streaming processing with minimal memory footprint
- JSON Lines Output - Compatible with jq and other standard UNIX tools
- Powerful Filtering - Built-in CEL (Common Expression Language) query engine
- Pretty Output - Human-readable formatting with the --pretty flag
- Robust - Comprehensive test coverage including fuzzing tests for security
- Stdin Support - Use "-" to read from stdin for easy piping between tools
- Extensible - Clean architecture designed for easy feature additions
Install with go install:
go install github.com/en666ki/gavro@latest
Or build from source:
git clone https://github.com/en666ki/gavro.git
cd gavro
go build -o gavro
Usage examples:
# Output Avro file contents as JSON Lines
gavro cat users.avro
# Output with pretty formatting
gavro cat users.avro --pretty
# Display schema
gavro schema users.avro
# Display schema (pretty-printed)
gavro schema users.avro --pretty
# Filter records using CEL expressions
gavro query users.avro "record.age > 18"
gavro q users.avro "record.age > 18" # short alias
# Pretty-printed query results
gavro query users.avro "record.age > 18" --pretty
# Complex filters
gavro query users.avro "record.age >= 30 && record.name.startsWith('A')"
gavro query users.avro "record.email.endsWith('@gmail.com')"
gavro query users.avro "has(record.score) && record.score > 0.5"
# Read from stdin (use "-" as filename)
cat users.avro | gavro cat -
curl https://example.com/data.avro | gavro query - "record.status == 'ERROR'"
cat users.avro | gavro select - record.name
cat users.avro | gavro schema -
# Pipe to jq for further processing
gavro cat users.avro | jq 'select(.age > 18)'
gavro query users.avro "record.active == true" | jq '.name'
# Limit output
gavro cat users.avro --limit 10
gavro query users.avro "record.age > 18" -n 5
# Count records
gavro cat users.avro --count
gavro query users.avro "record.age > 18" --count
# Count with limit
gavro cat users.avro --count --limit 100
# Extract specific fields
gavro cat users.avro | jq '{name, email}'
# Analyze schema with jq
gavro schema users.avro | jq '.fields[].name'
Commands:
- gavro cat <file.avro | -> - Output Avro file contents as JSON Lines (use "-" for stdin)
  - --pretty, -p - Pretty-print JSON with indentation
  - --limit, -n - Maximum number of records to output
  - --count, -c - Only print the number of records
- gavro query <file.avro | -> <expression> - Filter records using CEL expressions
  - Alias: q
  - --pretty, -p - Pretty-print JSON with indentation
  - --limit, -n - Maximum number of records to output
  - --count, -c - Only print the number of matching records
- gavro select <file.avro | -> <field>... - Extract specific fields from records
  - --pretty, -p - Pretty-print JSON with indentation
  - --limit, -n - Maximum number of records to output
  - --count, -c - Only print the number of records
- gavro schema <file.avro | -> - Display Avro schema as JSON
  - --pretty, -p - Pretty-print JSON with indentation
- gavro --help - Show help
- gavro --version - Show version
The query command uses Common Expression Language (CEL) for filtering:
Syntax:
- Fields: record.fieldName (e.g., record.age)
- Operators: &&, ||, !, ==, !=, <, <=, >, >=
- String functions: startsWith(), endsWith(), contains()
- Type functions: has(), size(), int(), string()
- Math: +, -, *, /, %
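CEL expressions use characters such as &&, >, and single quotes that the shell also treats specially, so the expression needs to reach gavro as a single argument. The sketch below shows how double quotes achieve that without invoking gavro at all; print_args is a hypothetical helper written for this illustration, not part of gavro.

```shell
# print_args is a hypothetical helper: it reports how many arguments the
# shell passed and prints each one in brackets.
print_args() {
  printf '%d argument(s)\n' "$#"
  for arg in "$@"; do
    printf '  [%s]\n' "$arg"
  done
}

# Double quotes keep the whole expression together as one argument and
# leave the inner single quotes (CEL string literals) untouched.
print_args query users.avro "record.age >= 30 && record.name.startsWith('A')"

# Double quotes still allow shell variables to be substituted, which is
# handy for parameterized filters.
min_age=18
print_args query users.avro "record.age > ${min_age}"
```

Without the quotes, the shell would interpret && as a command separator and > as an output redirection before the command ever saw the expression.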
Examples:
# Simple comparison
gavro query users.avro "record.age > 25"
# Boolean logic
gavro query users.avro "record.age > 18 && record.active == true"
gavro query users.avro "record.age < 20 || record.age > 60"
gavro query users.avro "!(record.deleted == true)"
# String operations
gavro query users.avro "record.name.startsWith('A')"
gavro query users.avro "record.email.endsWith('.com')"
gavro query users.avro "record.description.contains('urgent')"
# Field existence
gavro query users.avro "has(record.optional_field)"
# Complex expressions
gavro query logs.avro "record.level == 'ERROR' && record.timestamp > 1234567890"
gavro outputs data in JSON Lines format: one JSON object per line. This format is:
- Streaming-friendly (no need to load the entire file into memory)
- Easy to pipe to other tools
- Compatible with jq and standard UNIX utilities
- Human-readable
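Because each record occupies exactly one line, ordinary line-oriented tools can count, truncate, or search a stream of records. The sketch below uses a few sample records in place of real gavro output so that it runs without an Avro file; sample_records is a hypothetical helper, not part of gavro.

```shell
# sample_records stands in for `gavro cat users.avro` (hypothetical helper).
sample_records() {
  cat <<'EOF'
{"name":"Alice","age":30,"email":"alice@example.com"}
{"name":"Bob","age":25,"email":"bob@example.com"}
{"name":"Charlie","age":35,"email":"charlie@example.com"}
EOF
}

# One record per line, so counting records is just counting lines.
sample_records | wc -l

# Taking the first N records does not require reading the whole stream.
sample_records | head -n 2

# Plain text matching works as well, although it is not JSON-aware.
sample_records | grep '"name":"Bob"'
```

For anything that depends on JSON structure rather than raw text, a JSON-aware tool such as jq is the safer choice.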
Example output:
{"name":"Alice","age":30,"email":"alice@example.com"}
{"name":"Bob","age":25,"email":"bob@example.com"}
{"name":"Charlie","age":35,"email":"charlie@example.com"}
# Build
go build
# Build to /tmp
go build -o /tmp/gavro
# Install to $GOPATH/bin
go install
# All tests
go test ./...
# E2E tests
go test -v ./tests/e2e/...
# Fuzzing tests (30 seconds)
go test ./tests/fuzz/... -fuzz=FuzzAvroReader -fuzztime=30s
# With coverage
go test ./... -cover
# With race detector
go test ./... -race
# Generate test data
go run tests/testdata/generate.go
The project follows a clean, layered architecture:
gavro/
├── cmd/                 # CLI commands (cobra)
│   ├── cat.go           # Output records
│   ├── query.go         # Filter records
│   └── schema.go        # Display schema
├── internal/
│   ├── reader/          # Avro file reading
│   ├── writer/          # JSON Lines output (compact & pretty)
│   ├── filter/          # CEL expression filtering
│   └── processor/       # Orchestration layer
├── tests/
│   ├── e2e/             # End-to-end tests
│   ├── fuzz/            # Fuzzing tests (Avro & CEL)
│   └── testdata/        # Test Avro files
└── main.go              # Entry point
See CLAUDE.md for detailed architecture documentation.
Features:
- gavro cat - Output Avro as JSON Lines (done)
- gavro schema - Display Avro schema (done)
- gavro query - Filter records with CEL expressions (done)
- --pretty flag for human-readable output (done)
Future features planned:
- gavro convert - Convert between formats (Avro, JSON, CSV)
- gavro stats - Show statistics about an Avro file
- Support for reading from stdin (use "-" as filename) (done)
- --limit flag for output records (done)
- --count flag to only count matches (done)
gavro has comprehensive test coverage:
- E2E tests: Full CLI behavior testing for all commands (cat, query, schema) including error handling, large files, and integration with jq
- Fuzzing tests: 9 fuzzing strategies covering:
  - Avro file parsing (5 strategies)
  - CEL query expressions (4 strategies including injection attacks)
- Test data: Automatically generated test files (simple, complex, corrupted, large)
- Benchmarks: Performance benchmarks for all commands
See tests/README.md for more details.
Requirements:
- Go 1.21 or higher
Dependencies:
- github.com/hamba/avro/v2 - Fast Avro library
- github.com/spf13/cobra - CLI framework
- github.com/google/cel-go - Common Expression Language for query filtering
License: MIT
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Run tests (go test ./...)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request