chore: initial version by xaviviro · Pull Request #6 · toon-format/toon-python

xaviviro · 2025-11-03T09:30:26Z

Initial Release: Python TOON Format Implementation v1.0.0

Description

This PR establishes the official Python implementation of the TOON (Token-Oriented Object Notation) format. TOON is a compact, human-readable serialization format designed for passing structured data to Large Language Models with 30-60% token reduction compared to JSON.

This release migrates the complete implementation from the pytoon repository, adds comprehensive CI/CD infrastructure, and establishes the package as python-toon on PyPI.

Type of Change

New feature (non-breaking change that adds functionality)
Documentation update
Bug fix (non-breaking change that fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactoring (no functional changes)
Performance improvement
Test coverage improvement

Related Issues

Initial release - no related issues.

Changes Made

Core Implementation (11 modules, ~1,922 lines)

Complete encoder implementation with support for objects, arrays, tabular format, and primitives
Full decoder with strict/lenient parsing modes
CLI tool for JSON ↔ TOON conversion
Type definitions and constants following TOON specification
Value normalization for Python-specific types (Decimal, datetime, etc.)
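The normalization step is only named here, not specified. The sketch below shows one plausible way to map Python-specific types onto JSON-like primitives before encoding. `normalize_value` is a hypothetical helper, and its choices are assumptions that the real `toon_format` rules may not share: `Decimal` becomes `float`, dates become ISO 8601 strings, and non-finite floats become `None`.

```python
import math
from datetime import date, datetime
from decimal import Decimal
from typing import Any


def normalize_value(value: Any) -> Any:
    """Hypothetical sketch: map Python-specific types onto JSON-like primitives."""
    if value is None or isinstance(value, (bool, int, str)):
        return value
    if isinstance(value, float):
        # NaN and +/-Infinity have no JSON-style literal; this sketch maps them to None
        return value if math.isfinite(value) else None
    if isinstance(value, Decimal):
        # Assumption: converting to float (accepting precision loss) is acceptable
        return normalize_value(float(value))
    if isinstance(value, (datetime, date)):
        # datetime subclasses date; both become ISO 8601 strings
        return value.isoformat()
    if isinstance(value, dict):
        return {str(key): normalize_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        # Note: set iteration order is not guaranteed
        return [normalize_value(item) for item in value]
    raise TypeError(f"cannot normalize {type(value).__name__}")
```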

Package Configuration

Package name: python-toon (PyPI)
Module name: toon_format (Python import)
Version: 1.0.0
Python support: 3.8-3.14 (including 3.14t free-threaded)
Build system: hatchling (modern, PEP 517 compliant)
Dependencies: Zero runtime dependencies

CI/CD Infrastructure

GitHub Actions workflow for testing across Python 3.8-3.12
Automated PyPI publishing via OIDC trusted publishing
TestPyPI workflow for pre-release validation
Ruff linting and formatting enforcement
Type checking with mypy
Coverage reporting with pytest-cov

Testing

73 comprehensive tests covering:
- Encoding: primitives, objects, arrays (tabular and mixed), delimiters, indentation
- Decoding: basic structures, strict mode, delimiters, length markers, edge cases
- Roundtrip: encode → decode → encode consistency
- 100% test pass rate
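The roundtrip property can be written as a small reusable check. In this sketch, `json.dumps`/`json.loads` stand in for `toon_format.encode`/`decode` so it runs without the package installed. `check_roundtrip` is a hypothetical helper and does not come from the PR's test suite.

```python
import json
from typing import Any, Callable


def check_roundtrip(
    encode: Callable[[Any], str], decode: Callable[[str], Any], value: Any
) -> None:
    """Fail if decode(encode(x)) != x or if encode -> decode -> encode is unstable."""
    first = encode(value)
    decoded = decode(first)
    if decoded != value:
        raise AssertionError(f"decode(encode(x)) changed the value: {decoded!r}")
    if encode(decoded) != first:
        raise AssertionError("encode -> decode -> encode produced different text")


# json stands in for the TOON encoder/decoder so the sketch is self-contained
check_roundtrip(json.dumps, json.loads, {"users": [{"id": 1, "name": "Alice"}]})
```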

Documentation

Comprehensive README.md with:
- Installation instructions (pip and uv)
- Quick start guide
- Complete API reference
- CLI usage examples
- LLM integration best practices
- Token efficiency comparisons
CONTRIBUTING.md with development workflow
PR template for future contributions
Issue templates for bug reports
examples.py with 7 runnable demonstrations

SPEC Compliance

This PR implements/fixes spec compliance
Spec section(s) affected: All sections (complete implementation)
Spec version: Latest (https://github.com/toon-format/spec)

Implementation Details:

✅ YAML-style indentation for nested objects
✅ CSV-style tabular format for uniform arrays
✅ Inline format for primitive arrays
✅ List format for mixed arrays
✅ Length markers [N] for all arrays
✅ Optional # prefix for length markers
✅ Delimiter options: comma (default), tab, pipe
✅ Quoting rules for strings (minimal, spec-compliant)
✅ Escape sequences: \", \\, \n, \r, \t
✅ Primitives: null, true, false, numbers, strings
✅ Strict and lenient parsing modes
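To show what "strict" can mean for length markers, here is a minimal decoder for a single root tabular block. In strict mode it rejects a row count that differs from the declared `[N]` and rows of the wrong width. Everything here is an illustrative assumption, not the library's decoder or API: `decode_table`, `parse_scalar`, the header regex, and the lenient-mode behavior.

```python
import re
from typing import Any, Dict, List

# Hypothetical header pattern for a root tabular array such as "[3,]{id,name,age}:".
# Assumptions: optional "#" length marker, optional delimiter (",", "|" or tab) after N.
HEADER = re.compile(r"^\[#?(\d+)([,|\t]?)\]\{([^}]*)\}:$")


def parse_scalar(token: str) -> Any:
    """Tiny subset of TOON primitives: null, booleans, integers, bare strings."""
    if token == "null":
        return None
    if token in ("true", "false"):
        return token == "true"
    if re.fullmatch(r"-?\d+", token):
        return int(token)
    return token


def decode_table(text: str, strict: bool = True) -> List[Dict[str, Any]]:
    lines = text.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("expected a tabular header like [N,]{fields}:")
    declared = int(match.group(1))
    delimiter = match.group(2) or ","
    fields = match.group(3).split(delimiter)
    rows: List[Dict[str, Any]] = []
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # this sketch ignores blank lines
        values = line.lstrip(" ").split(delimiter)
        if strict and len(values) != len(fields):
            raise ValueError(f"line {number}: expected {len(fields)} values, got {len(values)}")
        # Lenient mode (assumption): extra values are dropped, missing ones are absent
        rows.append(dict(zip(fields, map(parse_scalar, values))))
    if strict and len(rows) != declared:
        raise ValueError(f"header declares {declared} rows, found {len(rows)}")
    return rows
```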

Testing

Test Output

============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-8.4.2, pluggy-1.6.0
collected 73 items

tests/test_decoder.py .................................            [ 45%]
tests/test_encoder.py ........................................      [100%]

============================== 73 passed in 0.03s ==============================

Test Coverage:

Encoder: 40 tests covering all encoding scenarios
Decoder: 33 tests covering parsing and validation
All edge cases, delimiters, and format options tested
100% pass rate

Code Quality

Ran ruff check src/toon_format tests - no issues
Ran ruff format src/toon_format tests - code formatted
Ran mypy src/toon_format - informational only (24 type hints to improve in future)
All tests pass: pytest tests/ -v

Linter Output:

$ ruff check src/toon_format tests
All checks passed!

Checklist

My code follows the project's coding standards (PEP 8, line length 100)
I have added type hints to new code
I have added tests that prove my fix/feature works
New and existing tests pass locally
I have updated documentation (README.md if needed)
My changes do not introduce new dependencies
I have maintained Python 3.8+ compatibility
I have reviewed the TOON specification for relevant sections

Performance Impact

No performance impact
Performance improvement (describe below)
Potential performance regression (describe and justify below)

Performance Characteristics:

Encoder: Fast string building with minimal allocations
Decoder: Single-pass parsing with minimal backtracking
Zero runtime dependencies for optimal load times
Suitable for high-frequency encoding/decoding scenarios

Breaking Changes

No breaking changes
Breaking changes (describe migration path below)

This is the initial release, so no breaking changes apply.

Screenshots / Examples

Basic Usage

from toon_format import encode

# Simple object
data = {"name": "Alice", "age": 30}
print(encode(data))

Output:

name: Alice
age: 30
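For a flat object of primitives, the output above is one `key: value` line per entry. A minimal sketch of that rule follows. `encode_flat_object` and `format_primitive` are hypothetical names, and the sketch deliberately omits quoting, nesting, and number canonicalization.

```python
from typing import Any, Dict


def format_primitive(value: Any) -> str:
    """Map Python primitives to TOON literals (hypothetical subset, no quoting)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is an int subclass
        return "true" if value else "false"
    return str(value)


def encode_flat_object(obj: Dict[str, Any]) -> str:
    """One `key: value` line per entry, in dict insertion order."""
    return "\n".join(f"{key}: {format_primitive(value)}" for key, value in obj.items())


print(encode_flat_object({"name": "Alice", "age": 30}))
```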

Tabular Array Example

users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]
print(encode(users))

Output:

[3,]{id,name,age}:
  1,Alice,30
  2,Bob,25
  3,Charlie,35
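The tabular layout can be reproduced with a short sketch. It writes one header carrying the row count and field names, then one comma-joined row per object, indented by two spaces. `encode_table` is a hypothetical helper that mirrors the `[N,]` header style shown above. It assumes every row has the same keys in the same order and does not quote values.

```python
from typing import Any, Dict, List


def format_cell(value: Any) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_table(rows: List[Dict[str, Any]], key: str = "", indent: int = 2) -> str:
    """Encode uniform dicts as a `[N,]{fields}:` header plus CSV-style rows."""
    if not rows:
        raise ValueError("this sketch needs at least one row")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("rows are not uniform; a list format would be needed")
    header = f"{key}[{len(rows)},]{{{','.join(fields)}}}:"
    pad = " " * indent
    body = [pad + ",".join(format_cell(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]
print(encode_table(users))
```

With `key="users"` and the four-field records from the token-efficiency example, this sketch produces 85 characters, which matches the TOON count reported there.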

Token Efficiency

import json
from toon_format import encode

data = {
    "users": [
        {"id": 1, "name": "Alice", "age": 30, "active": True},
        {"id": 2, "name": "Bob", "age": 25, "active": True},
        {"id": 3, "name": "Charlie", "age": 35, "active": False},
    ]
}

json_str = json.dumps(data)
toon_str = encode(data)

print(f"JSON: {len(json_str)} characters")
print(f"TOON: {len(toon_str)} characters")
print(f"Reduction: {100 * (1 - len(toon_str) / len(json_str)):.1f}%")

Output:

JSON: 177 characters
TOON: 85 characters
Reduction: 52.0%
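The reported percentage follows directly from the two lengths. Note that `len()` counts characters, not model tokens:

$$\text{Reduction} = 100 \times \left(1 - \frac{85}{177}\right) = 100 \times \frac{92}{177} \approx 52.0\%$$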

Additional Context

Package Details

PyPI Package: python-toon
Import Path: toon_format
CLI Command: toon
License: MIT
Repository: https://github.com/toon-format/toon-python
Documentation: https://github.com/toon-format/spec

Installation

# With pip
pip install python-toon

# With uv (recommended)
uv pip install python-toon

Development Setup

# Clone repository
git clone https://github.com/toon-format/toon-python.git
cd toon-python

# Install with uv
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linters
ruff check src/toon_format tests
mypy src/toon_format

Key Features

Token Efficiency: 30-60% reduction compared to JSON
Human Readable: YAML-like syntax for objects, CSV-like for arrays
Spec Compliant: 100% compatible with official TOON specification
Type Safe: Full type hints throughout codebase
Well Tested: 73 tests with 100% pass rate
Zero Dependencies: No runtime dependencies
Python 3.8+: Supports Python 3.8 through 3.14t (free-threaded)
Fast: Single-pass parsing, minimal allocations
Flexible: Multiple delimiters, indentation options, strict/lenient modes
CLI Included: Command-line tool for JSON ↔ TOON conversion
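The delimiter and length-marker options mainly change the array header and the row separator. The sketch below shows one way to build both. `format_header` and `format_row` are hypothetical helpers. It assumes the active delimiter appears after the length (as in the `[3,]` output above) and also separates the field names inside the braces. A tab or pipe delimiter can help when values themselves contain commas, since only the active delimiter forces quoting.

```python
from typing import Sequence

DELIMITERS = {"comma": ",", "tab": "\t", "pipe": "|"}


def format_header(
    length: int,
    fields: Sequence[str],
    delimiter: str = ",",
    length_marker: bool = False,
    key: str = "",
) -> str:
    """Build a tabular header such as `users[#2|]{id|name}:` (hypothetical helper)."""
    if delimiter not in DELIMITERS.values():
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    marker = "#" if length_marker else ""
    # Assumption: field names inside the braces use the active delimiter too
    return f"{key}[{marker}{length}{delimiter}]{{{delimiter.join(fields)}}}:"


def format_row(values: Sequence[str], delimiter: str = ",", indent: int = 2) -> str:
    """Join already-formatted cell strings with the active delimiter."""
    return " " * indent + delimiter.join(values)
```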

Code Quality Notes

Mypy Type Checking: The project currently has 24 mypy type errors that are informational only. The CI is configured with continue-on-error: true for mypy checks, and the pyproject.toml has lenient mypy settings (disallow_untyped_defs = false, check_untyped_defs = false). These type hints can be improved incrementally in future releases without blocking the current functionality.

All runtime behavior is validated through 73 comprehensive tests with 100% pass rate.

Future Roadmap

Improve type hint coverage (address 24 mypy warnings)
Additional encoding options (custom formatters)
Performance optimizations for large datasets
Streaming encoder/decoder for very large files
Additional language implementations
Enhanced CLI features (pretty-printing, validation)

Checklist for Reviewers

Code changes are clear and well-documented
Tests adequately cover the changes
Documentation is updated
No security concerns
Follows TOON specification
Backward compatible (or breaking changes are justified and documented)

Review Focus Areas

Spec Compliance: Verify encoding/decoding matches TOON spec exactly
Edge Cases: Check handling of empty strings, special characters, nested structures
Type Safety: Ensure type hints are accurate and complete
Error Messages: Verify error messages are clear and helpful
Documentation: Confirm examples work as shown
CI/CD: Verify workflows are properly configured for PyPI deployment
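For the edge-case review, the sketch below approximates "minimal quoting" plus the five escape sequences listed under SPEC Compliance. A string is quoted only when a bare value would be ambiguous. `quote_if_needed` and the exact trigger list are assumptions loosely modeled on the TOON spec. Reviewers should check the actual rules against the specification rather than this sketch.

```python
import re

_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
_NUMBER_LIKE = re.compile(r"^-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?$")


def escape(text: str) -> str:
    """Escape backslash, double quote, newline, carriage return and tab."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def quote_if_needed(text: str, delimiter: str = ",") -> str:
    """Quote only when a bare string would be ambiguous (approximate rules)."""
    needs_quotes = (
        text == ""
        or text != text.strip()  # leading/trailing whitespace would be lost
        or text in ("true", "false", "null")  # would parse as a literal
        or _NUMBER_LIKE.match(text) is not None  # would parse as a number
        or text.startswith("-")  # could be read as a list item
        or any(ch in text for ch in (delimiter, ":", '"', "\\", "[", "]", "{", "}"))
        or any(ch in text for ch in "\n\r\t")
    )
    return f'"{escape(text)}"' if needs_quotes else text
```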

johannschopplich · 2025-11-03T09:33:59Z

@xaviviro Please keep a PR open instead of closing and reopening it. Rather, iterate on your branch with separate commits. The pipeline should re-run on every commit.

xaviviro · 2025-11-03T09:43:49Z

Thanks @johannschopplich for letting me know I could iterate on the PR - I wasn't aware of that! All checks are now passing. ✅

johannschopplich · 2025-11-03T09:48:33Z

@xaviviro Sure! @toon-format/python-maintainers Please decide whether it's the right approach to do it all in one PR vs. separate PRs for setup/CI etc. Smaller PRs usually make it easier to review and move forward.

Maybe it's even better to make a poll which code base to pick as base and then incorporate what you all have worked on into this new repo. Please discuss beforehand; it's not about first come first served, but the best foundation for this package. Thank you. 🙏

.github/ISSUE_TEMPLATE/bug_report.yml

xaviviro · 2025-11-03T10:04:45Z

Perfect! I had some time now and created this as a starting point to get things moving. I'm completely open to whatever approach the team decides is best - whether it's using this as a base, starting fresh, or incorporating elements from different implementations.
I agree smaller PRs make more sense for review. Happy to break this down or adjust the approach based on what works best for everyone.
Looking forward to the first official release so I can archive my repo and point everyone to the canonical implementation. Thanks for coordinating this! 🙏


xaviviro · 2025-11-04T10:08:55Z

Dear team @toon-format/python-maintainers @johannschopplich ,

I hope this message finds you well. Since there hasn't been much activity on this PR, I'd like to provide some additional context that might help move things forward.

The implementation I'm proposing here is based on my python-toon package, which has been live on PyPI and has already accumulated over 5,000 downloads with zero reported issues. You can see the download statistics here:

https://pepy.tech/projects/python-toon?timeRange=threeMonths

This track record demonstrates the robustness and reliability of the codebase.

The implementation includes:

73 comprehensive tests with 100% pass rate
Full CI/CD pipeline with GitHub Actions
Complete TOON spec compliance
Production-ready code quality (ruff, mypy)
Extensive documentation and examples

I'm open to any alternative approach you might prefer, but I think it's important we move forward with an official Python implementation. Let me know how you'd like to proceed!

Best regards,
Xavi


Features:
- Add benchmark dependency group with tiktoken>=0.4.0 to pyproject.toml
- Export count_tokens, estimate_savings, and compare_formats utilities
- Implement token counting using tiktoken with o200k_base encoding (gpt5/gpt5-mini)

Documentation Updates:
- Add Token Counting & Comparison section to main README with examples
- Update docs/README.md with new utility functions in API reference list
- Add roadmap section announcing planned comprehensive benchmarks
- Add complete Utility Functions section to docs/api.md covering:
  * count_tokens() - Token counting with tiktoken
  * estimate_savings() - JSON vs TOON comparison metrics
  * compare_formats() - Formatted comparison tables
- Add Token Efficiency examples with cost estimation patterns
- Update LLM integration guide with Measuring Token Savings section
- Include cost calculation examples and integration patterns
- Update model references from GPT-4 to gpt5 throughout docs
- Add benchmark disclaimer noting comprehensive benchmarks coming soon

Technical Details:
- Update tokenizer documentation from GPT-4o/GPT-4 to gpt5/gpt5-mini
- Fix TypedDict usage examples in docs/api.md (EncodeOptions uses dict syntax)
- Clarify DecodeOptions is a class while EncodeOptions is a TypedDict
- Add toon-spec/ submodule files (CHANGELOG.md and SPEC.md v1.3)
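The commit above adds token-counting utilities backed by tiktoken's o200k_base encoding, but their signatures and return shapes are not shown here. The following is only a hypothetical sketch of a JSON-vs-TOON savings comparison. `estimate_savings_sketch` is not the package's `estimate_savings`, and it accepts any text-to-size function. It defaults to character length so it runs without tiktoken installed.

```python
import json
from typing import Any, Callable, Dict


def estimate_savings_sketch(
    data: Any, toon_text: str, count: Callable[[str], int] = len
) -> Dict[str, float]:
    """Compare a JSON rendering of `data` with an already-encoded TOON string.

    With tiktoken installed, a token counter could be passed instead, e.g.
    enc = tiktoken.get_encoding("o200k_base"); count=lambda s: len(enc.encode(s)).
    """
    json_size = count(json.dumps(data))
    toon_size = count(toon_text)
    saved = json_size - toon_size
    return {
        "json": json_size,
        "toon": toon_size,
        "saved": saved,
        "percent": round(100 * saved / json_size, 1) if json_size else 0.0,
    }
```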

Justar96 · 2025-11-04T12:02:13Z

@johannschopplich @toon-format/python-maintainers
Hey everyone,
I've added a full test suite for compliance [https://github.com/toon-format/spec/tree/main/tests] with 91% coverage, and fixed some encode and decode issues to comply with the main spec.

Code Organization

Add Google-style headers to all 18 source files
- Copyright (c) 2025 TOON Format Organization
- SPDX-License-Identifier: MIT
- Comprehensive module docstrings
Format all source code with Ruff

Test Suite Expansion

Increase test coverage from 78% to 91% (792 tests)
Add comprehensive test modules:
- test_security.py: 24 tests for injection prevention and resource exhaustion
- test_internationalization.py: 24 tests for Unicode/UTF-8 support
- test_cli.py: 30 integration tests for command-line interface
- test_scanner.py: 31 tests for scanner module (100% coverage)
- test_string_utils.py: 42 tests for string utilities (100% coverage)
- test_normalize_functions.py: 37 tests for normalization (95% coverage)
- test_parsing_utils.py: Complete parsing utility coverage
Add 306 official spec compliance tests via test_spec_fixtures.py
Create test fixture infrastructure with JSON schema validation

Files Changed

Modified: All 18 source files in src/toon_format/
Added: 8 new test modules
Added: Test fixtures and schema
Added: New utility module _parsing_utils.py

johannschopplich · 2025-11-04T12:48:03Z

@bpradana @davidpirogov Usually I don't want to interfere with code style and repo setup. However, to iterate quickly, please leave a review on this PR in the upcoming days. Otherwise, I'd like to merge it in order to move forward quickly. Hope you understand that. 🙂 You can incorporate all the best practices with smaller, incremental PRs.

@Justar96 Lovely! When this PR is merged, feel free to open a new PR to incorporate these changes.

bpradana

Overall, LGTM 🚀

src/toon_format/cli.py

johannschopplich · 2025-11-04T15:28:33Z

@bpradana You have to leave a review 🙂 – and approve this PR so we can get it merged.

@xaviviro The honor of merging afterwards is yours! Thank you for your work. 👏

bpradana

LGTM!

bpradana · 2025-11-04T15:31:46Z

@johannschopplich My bad, I didn't see the approve button; turns out it's in the review section 😓

johannschopplich · 2025-11-04T15:33:10Z

@bpradana No worries at all! It's kinda complicated on GitHub anyway. I have set up this repo to require 2 reviewers per PR. If that's too strict for the team, I can always lower that.

Justar96 · 2025-11-04T15:37:06Z

I think it's good.

bpradana · 2025-11-04T15:48:56Z

@johannschopplich 2 reviewers for now is totally fine, especially in the early stages of development. We can dial it back if it starts feeling like overkill 😁

Justar96 · 2025-11-05T05:55:59Z

We still need to publish to pypi @johannschopplich @toon-format/python-maintainers

davidpirogov · 2025-11-05T06:05:42Z

Let’s just hold off for a few days until we stabilize the code base and make sure that we are in compliance with spec and tests.

We still have a lot to migrate and test. We’ll be ready very soon!

Justar96 · 2025-11-05T06:07:56Z

I just wanted to point that out, since we have no proper plan or status yet.

davidpirogov · 2025-11-05T06:11:58Z

Yeah - fair point. Our plan is documented here: toon-format/toon#54

It's probably better to make an issue in this repo; we’ll fix up the chaos once we get all the code and content migrated into this repo.

Justar96 · 2025-11-05T06:14:57Z

Should we create a proper plan? We could collaborate in Notion or some other way.

davidpirogov · 2025-11-05T06:18:10Z

Yeah, we need to. Let’s stick to GitHub Discussions; it keeps everything in one place.

johannschopplich · 2025-11-05T08:14:32Z

@davidpirogov Thanks for explaining – I agree!

@Justar96 Usually, to prevent comments from being buried in closed threads (like this one), you can:

Open a discussion for open topics, like general roadmap
Or create issues for specific issues/ideas that can be worked on independently

Both help to keep track of what work still needs to be done. It also gives the team the opportunity to choose the person who works best with a task. E.g.: CI integration for GitHub releases and auto-publishing to PyPI.

If you want, you can compare the current state of the repo to the goal of a v1 (using the language-agnostic tests for example) and open a roadmap discussion. Title idea: "Roadmap to v1"

xaviviro added 7 commits November 3, 2025 09:53

- first commit (4721c8d)
- first (f3e0040)
- uv fix (a4990c3)
- uv fix (4bc5354)
- uv fix (7d78331)
- code formatted for lint (6853331)
- code formatted for lint mypi (85d260c)

xaviviro requested review from a team and johannschopplich as code owners November 3, 2025 09:30

xaviviro closed this Nov 3, 2025

fixes (87bc369)

xaviviro reopened this Nov 3, 2025

johannschopplich requested changes Nov 3, 2025


.github/ISSUE_TEMPLATE/bug_report.yml

xaviviro added 2 commits November 3, 2025 10:56

- templates (0d25799)
- from python-toon to toon_format (be59f51)

xaviviro and others added 2 commits November 3, 2025 20:51

- Merge branch 'main' into first-version (1cc45a9)
- Resolve .gitignore merge conflict (f074da7): Keep both reference repositories section and standard Python gitignore structure. Co-authored-by: Justar96

Justar96 and others added 5 commits November 4, 2025 18:08

- Fix coverage configuration for GitHub Actions (c571ab7)
- Fix linting errors in test suite (3665973)
- Fix format (dbefa7e)

johannschopplich mentioned this pull request Nov 4, 2025

docs: update README with improved TOON description and installation i… #7

Closed

johannschopplich changed the title ~~First version~~ chore: initial version Nov 4, 2025

johannschopplich requested a review from bpradana November 4, 2025 13:19

bpradana reviewed Nov 4, 2025


src/toon_format/cli.py

johannschopplich mentioned this pull request Nov 4, 2025

JSON indentation option in decode method #10

Open

johannschopplich approved these changes Nov 4, 2025


bpradana approved these changes Nov 4, 2025


Justar96 merged commit 43fd07b into main Nov 4, 2025
6 checks passed

xaviviro deleted the first-version branch November 4, 2025 18:19

Justar96 mentioned this pull request Nov 7, 2025

feat: setup PyPI publishing infrastructure and v0.9.0-beta.1 release #22

Merged

39 tasks

xaviviro commented Nov 3, 2025 • edited by Justar96