Sonnet Scripts is a collection of pre-built data architecture patterns that you can quickly spin up on a local machine, along with real-world example datasets to use with them.
One of the challenges of creating data content and tutorials is the lack of readily available data infrastructure and real-world datasets. We found ourselves rebuilding the same environments over and over, so we created this open-source repo to expedite the process.
According to the Academy of American Poets, a "...sonnet is a fourteen-line poem written in iambic pentameter, employing one of several rhyme schemes, and adhering to a tightly structured thematic organization." Through the constraints of a particular sonnet format, poets throughout the centuries have pushed their creativity to express themselves, William Shakespeare being one of the most well-known. I've seen data architectures fill a similar role, where their specific patterns push data practitioners to think of creative ways to solve business problems.
Welcome to Sonnet Scripts – a fully containerized environment designed for data analysts, analytics engineers, and data engineers to experiment with databases, queries, and ETL pipelines. This repository provides a pre-configured sandbox where users can ingest data, transform it using SQL/Python, and test integrations with PostgreSQL, DuckDB, MinIO, and more!
This project is ideal for:
- Data Engineers who want a lightweight environment for testing data pipelines.
- Analytics Engineers experimenting with dbt and SQL transformations.
- Data Analysts looking for a structured PostgreSQL + DuckDB setup.
- Developers working on data APIs using Python.
Before setting up the environment, ensure you have the following installed:
- **Docker & Docker Compose**
- **Make** (for automation)
  - Linux/macOS: comes pre-installed
  - Windows: install via Chocolatey → `choco install make`
- **Python** (3.12+)
```shell
git clone https://github.com/onthemarkdata/sonnet-scripts.git
cd sonnet-scripts
make setup
```

This will:
- Build the Docker images
- Start the PostgreSQL, DuckDB, and other containers
- Ensure dependencies are installed
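For orientation, a minimal `docker-compose.yml` for a stack like this might look as follows. This is an illustrative sketch only; the service names, images, and credentials below are assumptions, not the repo's actual configuration:

```yaml
# Illustrative sketch only -- the repo's real docker-compose.yml will differ.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
```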
Additional Makefile targets:

```shell
# Load sample data and verify it
make load-db
make verify-db

# Run the test suite
make test

# Open a shell inside a specific container
make exec-pythonbase
make exec-postgres
make exec-duckdb
make exec-pipelinebase

# Move data from PostgreSQL to MinIO
make load-db-postgres-to-minio
```

The `make load-db-postgres-to-minio` command:
- Exports a sample of data from PostgreSQL to CSV
- Transfers the CSV to the pipelinebase container
- Converts the CSV to Parquet and uploads to MinIO
- Cleans up temporary files
```shell
# Load data from MinIO into DuckDB
make load-db-minio-to-duckdb

# Verify MinIO and DuckDB contents
make check-minio
make check-duckdb

# Run the entire ETL process from PostgreSQL to MinIO to DuckDB
make run-all-data-pipelines
```
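On the DuckDB side, reading Parquet out of MinIO is typically done through DuckDB's `httpfs` extension. A sketch of what that SQL might look like is below; the bucket, object key, table name, and credentials are assumptions, not the repo's actual pipeline:

```sql
-- Illustrative sketch: point DuckDB's S3 support at a local MinIO.
INSTALL httpfs;
LOAD httpfs;
SET s3_endpoint = 'localhost:9000';
SET s3_access_key_id = 'minioadmin';       -- hypothetical credentials
SET s3_secret_access_key = 'minioadmin';
SET s3_use_ssl = false;
SET s3_url_style = 'path';

-- Materialize a table from a Parquet object in MinIO.
CREATE TABLE sample_data AS
SELECT * FROM read_parquet('s3://my-bucket/sample.parquet');
```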
```shell
# Stop all containers
make stop

# Rebuild images and restart
make rebuild

# Remove all containers, volumes, and images, then rebuild from scratch
make rebuild-clean
```
```shell
make status                 # Show container status
make logs                   # Tail logs for all containers
make logs c=container_name  # Logs for a specific container
```
```
📂 sonnet-scripts
│── 📂 pythonbase/          # Python-based processing container
│── 📂 pipelinebase/        # ETL pipeline and data ingest container
│── 📂 linuxbase/           # Base container for Linux dependencies
│── 📂 jupyterbase/         # Jupyter container for analytics and data science
│── 📂 cli/                 # Sonnet CLI tool
│── 🐳 docker-compose.yml   # Container orchestration
│── 🛠 Makefile             # Automation commands
│── 📜 README.md            # You are here!
```

The Sonnet CLI lets you scaffold and run your own local Modern Data Stack projects anywhere on your machine. Zero to running SQL in under 5 minutes.
```shell
make install-cli
```

```shell
# Create a project with default services (pgduckdb + pgadmin)
sonnet init myproject

# Or interactively select which services to include
sonnet init myproject --interactive

cd myproject

# Start all services
sonnet up

# Check status
sonnet status

# Stop all services
sonnet down
```

Available services:

| Service | Description | Port |
|---|---|---|
| pgduckdb | PostgreSQL with DuckDB extension | 5432 |
| pgadmin | pgAdmin 4 web interface | 8080 |
| cloudbeaver | CloudBeaver web interface | 8978 |
| minio | S3-compatible object storage | 9000, 9001 |
| jupyterbase | Jupyter Lab for Python/SQL | 8888 |
| pipelinebase | ETL pipelines and data loading | - |
| dbtbase | dbt Core for transformations | - |
After running sonnet up, access your stack:
- Database: `postgresql://postgres:postgres@localhost:5432/postgres`
- pgAdmin: http://localhost:8080 (pgadmin4@pgadmin.org / password)
GitHub Actions automates builds, tests, and environment validation. The pipeline:

- Builds Docker images (`pythonbase`, `linuxbase`)
- Starts all services using `docker compose`
- Runs unit & integration tests (`make test`)
- Shuts down containers after tests pass

The workflow triggers on:

- Pushes to `main` or `feature/*`
- Pull requests to `main`
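A workflow wiring up these triggers and steps might look like the following sketch. The file name, job layout, and action versions are assumptions, not the repo's actual workflow:

```yaml
# .github/workflows/ci.yml -- illustrative sketch only
name: CI
on:
  push:
    branches: [main, "feature/*"]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and start services
        run: docker compose up -d --build
      - name: Run unit & integration tests
        run: make test
      - name: Tear down
        if: always()
        run: docker compose down
```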
Want to improve Sonnet Scripts? Here's how:
- Fork the repository
- Make your changes and test them locally
- Submit a pull request (PR) for review
For major changes, please open an issue first to discuss your proposal.
We follow Conventional Commits for all commit messages.
Maintained by:
- **Juan Pablo Urrutia** (GitHub: jpurrutia, LinkedIn: Juan Pablo Urrutia)
- **Mark Freeman** (GitHub: onthemarkdata, LinkedIn: Mark Freeman II)
If you have questions or encounter issues, feel free to:
- Open a GitHub issue
- Contact directly via LinkedIn
- COMING SOON: Join our Discord community
🚀 Happy data wrangling!
