Conversation
|
@xaviviro Please keep a PR open instead of closing and opening again. Rather, iterate on your branch with separate commits. The pipeline should re-run at every commit. |
|
Thanks @johannschopplich for letting me know I could iterate on the PR - I wasn't aware of that! All checks are now passing. ✅ |
|
@xaviviro Sure! @toon-format/python-maintainers Please decide whether it's the right approach to do it all in one PR vs. separate PRs for setup/CI etc. Smaller PRs usually make it easier to review and move forward. Maybe it's even better to make a poll which code base to pick as base and then incorporate what you all have worked on into this new repo. Please discuss beforehand; it's not about first come first served, but the best foundation for this package. Thank you. 🙏 |
|
Perfect! I had some time now and created this as a starting point to get things moving. I'm completely open to whatever approach the team decides is best - whether it's using this as a base, starting fresh, or incorporating elements from different implementations. |
Keep both reference repositories section and standard Python gitignore structure. Co-authored-by: Justar96
|
Dear team @toon-format/python-maintainers @johannschopplich , I hope this message finds you well. Since there hasn't been much activity on this PR, I'd like to provide some additional context that might help move things forward. The implementation I'm proposing here is based on my python-toon package, which has been live on PyPI and has already accumulated over 5,000 downloads with zero reported issues. You can see the download statistics here: https://pepy.tech/projects/python-toon?timeRange=threeMonths This track record demonstrates the robustness and reliability of the codebase. The implementation includes:
I'm open to any alternative approach you might prefer, but I think it's important we move forward with an official Python implementation. Let me know how you'd like to proceed! Best regards, |
## Code Organization

- Add Google-style headers to all 18 source files
  - Copyright (c) 2025 TOON Format Organization
  - SPDX-License-Identifier: MIT
  - Comprehensive module docstrings
- Format all source code with Ruff

## Test Suite Expansion

- Increase test coverage from 78% to 91% (792 tests)
- Add comprehensive test modules:
  - test_security.py: 24 tests for injection prevention and resource exhaustion
  - test_internationalization.py: 24 tests for Unicode/UTF-8 support
  - test_cli.py: 30 integration tests for command-line interface
  - test_scanner.py: 31 tests for scanner module (100% coverage)
  - test_string_utils.py: 42 tests for string utilities (100% coverage)
  - test_normalize_functions.py: 37 tests for normalization (95% coverage)
  - test_parsing_utils.py: Complete parsing utility coverage
- Add 306 official spec compliance tests via test_spec_fixtures.py
- Create test fixture infrastructure with JSON schema validation

## Files Changed

- Modified: All 18 source files in src/toon_format/
- Added: 8 new test modules
- Added: Test fixtures and schema
- Added: New utility module _parsing_utils.py
Features:

- Add benchmark dependency group with tiktoken>=0.4.0 to pyproject.toml
- Export count_tokens, estimate_savings, and compare_formats utilities
- Implement token counting using tiktoken with o200k_base encoding (gpt5/gpt5-mini)

Documentation Updates:

- Add Token Counting & Comparison section to main README with examples
- Update docs/README.md with new utility functions in API reference list
- Add roadmap section announcing planned comprehensive benchmarks
- Add complete Utility Functions section to docs/api.md covering:
  * count_tokens() - Token counting with tiktoken
  * estimate_savings() - JSON vs TOON comparison metrics
  * compare_formats() - Formatted comparison tables
- Add Token Efficiency examples with cost estimation patterns
- Update LLM integration guide with Measuring Token Savings section
- Include cost calculation examples and integration patterns
- Update model references from GPT-4 to gpt5 throughout docs
- Add benchmark disclaimer noting comprehensive benchmarks coming soon

Technical Details:

- Update tokenizer documentation from GPT-4o/GPT-4 to gpt5/gpt5-mini
- Fix TypedDict usage examples in docs/api.md (EncodeOptions uses dict syntax)
- Clarify DecodeOptions is a class while EncodeOptions is a TypedDict
- Add toon-spec/ submodule files (CHANGELOG.md and SPEC.md v1.3)
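For reviewers who want to see the arithmetic behind a JSON vs TOON comparison, here is a minimal sketch. It is not the package's `estimate_savings()` implementation: the function name `estimate_savings_sketch`, the returned keys, and the regex-based stand-in tokenizer are all assumptions made for illustration, so the numbers it produces will differ from real tiktoken counts.

```python
import re
from typing import Callable, Dict


def crude_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts word runs and punctuation marks.

    This is NOT tiktoken; it only gives a rough, dependency-free proxy.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def estimate_savings_sketch(
    json_text: str,
    toon_text: str,
    count: Callable[[str], int] = crude_token_count,
) -> Dict[str, float]:
    """Compare token counts of two serializations of the same data.

    Hypothetical helper: the key names below are illustrative only.
    """
    json_tokens = count(json_text)
    toon_tokens = count(toon_text)
    saved = json_tokens - toon_tokens
    # Guard against division by zero for empty input.
    percent = 100.0 * saved / json_tokens if json_tokens else 0.0
    return {
        "json_tokens": json_tokens,
        "toon_tokens": toon_tokens,
        "saved_tokens": saved,
        "savings_percent": percent,
    }


if __name__ == "__main__":
    # Hand-written strings for illustration; no encoder is involved here.
    print(estimate_savings_sketch('{"a": 1, "b": 2}', "a: 1\nb: 2"))
```

With tiktoken installed, a real counter could be passed in instead, for example `count=lambda s: len(tiktoken.get_encoding("o200k_base").encode(s))`.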
|
@johannschopplich @toon-format/python-maintainers Code Organization
Test Suite Expansion
Files Changed
|
|
@bpradana @davidpirogov Usually I don't want to interfere with code style and repo setup. However, to iterate quickly, please leave a review in the upcoming days for this MR. Otherwise, I'd like to merge in order to move forward quickly. Hope you understand that. 🙂 You can incorporate all the best practices with smaller, incremental PRs. @Justar96 Lovely! When this PR is merged, feel free to open a new PR to incorporate these changes. |
|
@johannschopplich my bad, I don't see the approve button, turns out it's in the review section 😓 |
|
@bpradana No worries at all! It's kinda complicated on GitHub anyway. I have set up this repo to require 2 reviewers per PR. If that's too strict for the team, I can always lower that. |
|
i think it's good. |
|
@johannschopplich 2 reviewers for now is totally fine, especially in the early stages of development. we can dial it back, if it starts feeling overkill 😁 |
|
We still need to publish to pypi @johannschopplich @toon-format/python-maintainers |
|
Let’s just hold off for a few days until we stabilize the code base and make sure that we are in compliance with spec and tests. We still have a lot to migrate and test. We’ll be ready very soon! |
|
I just wanna point that out, since we have no proper plan or status. |
|
Yeah - fair point. Our plan is documented here: toon-format/toon#54 Probably better make an issue in this repo - we’ll fix up the chaos once we get all the code and content migrated into this repo |
|
Let's create a proper plan? We can collaborate in notion or any way? |
|
Yeah, we need to - let’s stick to GitHub discussions - keeps everything in one place |
|
@davidpirogov Thanks for explaining – I agree! @Justar96 Usually, to prevent comments from being buried in closed threads (like this one), you can:
Both help to keep track of what work still needs to be done. It also gives the team the opportunity to choose the person who works best with a task. E.g.: CI integration for GitHub releases and auto-publishing to PyPI. If you want, you can compare the current state of the repo to the goal of a v1 (using the language-agnostic tests for example) and open a roadmap discussion. Title idea: "Roadmap to v1" |

Initial Release: Python TOON Format Implementation v1.0.0
Description
This PR establishes the official Python implementation of the TOON (Token-Oriented Object Notation) format. TOON is a compact, human-readable serialization format designed for passing structured data to Large Language Models with 30-60% token reduction compared to JSON.
This release migrates the complete implementation from the pytoon repository, adds comprehensive CI/CD infrastructure, and establishes the package as python-toon on PyPI.
Type of Change
Related Issues
Initial release - no related issues.
Changes Made
Core Implementation (11 modules, ~1,922 lines)
Package Configuration
python-toon (PyPI)
toon_format (Python import)
CI/CD Infrastructure
Testing
Documentation
SPEC Compliance
Implementation Details:
[N] for all arrays
# prefix for length markers
\", \\, \n, \r, \t
Testing
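As a quick round-trip check of the five escape sequences listed under Implementation Details, the following sketch escapes and unescapes a string. It is an illustration only and not the package's code: the names `escape_string` and `unescape_string` and the choice to reject any other backslash sequence are assumptions.

```python
# Hypothetical helpers: map each special character to its escaped form.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
# Reverse mapping, keyed by the character that follows the backslash.
_UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t"}


def escape_string(value: str) -> str:
    """Replace backslash, double quote, newline, carriage return, and tab."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_string(value: str) -> str:
    """Invert escape_string; reject unknown or dangling escapes (assumption)."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            if i + 1 >= len(value):
                raise ValueError("dangling backslash at end of string")
            nxt = value[i + 1]
            if nxt not in _UNESCAPES:
                raise ValueError(f"invalid escape sequence: \\{nxt}")
            out.append(_UNESCAPES[nxt])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = 'He said "hi"\tthen left\n'
    # Round trip should return the original string unchanged.
    assert unescape_string(escape_string(sample)) == sample
```

Processing the backslash in the same single pass as the other characters avoids the double-escaping bug that chained `str.replace` calls can introduce when applied in the wrong order.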
Test Output
Test Coverage:
Code Quality
ruff check src/toon_format tests - no issues
ruff format src/toon_format tests - code formatted
mypy src/toon_format - informational only (24 type hints to improve in future)
pytest tests/ -v
Linter Output:
$ ruff check src/toon_format tests
All checks passed!
Checklist
Performance Impact
Performance Characteristics:
Breaking Changes
This is the initial release, so no breaking changes apply.
Screenshots / Examples
Basic Usage
Output:
Tabular Array Example
Output:
Token Efficiency
Output:
Additional Context
Package Details
python-toon
toon_format
toon
Installation
Development Setup
Key Features
Code Quality Notes
Mypy Type Checking: The project currently has 24 mypy type errors that are informational only. The CI is configured with continue-on-error: true for mypy checks, and the pyproject.toml has lenient mypy settings (disallow_untyped_defs = false, check_untyped_defs = false). These type hints can be improved incrementally in future releases without blocking the current functionality.
All runtime behavior is validated through 73 comprehensive tests with 100% pass rate.
Future Roadmap
Checklist for Reviewers
Review Focus Areas