Implement RFC 6901-style tilde escaping for special characters in keys #10

Open
simonw wants to merge 1 commit into main from claude/review-format-design-gKy5S

simonw commented Feb 13, 2026

Summary

This PR implements proper escaping of special characters in dictionary keys using RFC 6901-style tilde sequences. This allows keys containing ., $, [, and ~ characters to round-trip correctly through flatten() and unflatten() operations.

Key Changes

  • Added escaping functions: Implemented _escape_key() and _unescape_key() functions that handle tilde-based escaping:

    • ~0 for ~ (tilde)
    • ~1 for . (dot)
    • ~2 for $ (dollar sign)
    • ~3 for [ (opening bracket)
  • Updated flatten logic: Modified _object_to_rows() to escape dictionary keys before building the flattened path, preventing special characters from being misinterpreted as format delimiters.

  • Enhanced unflatten logic: Implemented a three-pass approach:

    1. First pass: Parse flattened keys and apply type conversions (unchanged)
    2. Second pass: Convert integer-keyed dicts to lists using [N] pattern matching on escaped keys
    3. Third pass: Unescape all remaining dictionary keys
  • Fixed regex patterns:

    • Changed _int_key_re to use fullmatch() instead of match() and added $ anchor to prevent false matches on keys like [0]suffix
    • Fixed rsplit() call to use maxsplit=1 instead of 2 for correct type suffix extraction
  • Updated test case: Changed the existing dollar sign test case from _$!<home>!$_ to use proper escaping, demonstrating the new behavior.

  • Added comprehensive test suite: 150+ lines of new tests covering:

    • Dots in keys
    • Dollar signs in keys
    • Ambiguous type suffixes
    • Bracket notation in keys
    • Tilde characters in keys
    • Combinations of special characters
  • Updated documentation: Added RFC 6901 reference and escaping table to README with examples.
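
The two fixes listed under "Fixed regex patterns" come down to standard-library semantics, which the following minimal sketch illustrates. The pattern assigned to INT_KEY_RE and the sample key "a$b$int" are assumptions made for illustration; the PR's actual regex and inputs may differ.

```python
import re

# Assumed stand-in for _int_key_re: matches an array index such as "[0]".
INT_KEY_RE = re.compile(r"\[(\d+)\]")

# match() only anchors at the start, so a key with trailing text still matches.
prefix_hit = INT_KEY_RE.match("[0]suffix")

# fullmatch() requires the whole string to match, so the same key is rejected.
full_hit = INT_KEY_RE.fullmatch("[0]suffix")

# Hypothetical flattened key containing two dollar signs.
# maxsplit=2 can yield three parts, which breaks a two-name unpacking.
two_splits = "a$b$int".rsplit("$", 2)

# maxsplit=1 always yields at most two parts: the key and the type suffix.
one_split = "a$b$int".rsplit("$", 1)

print(prefix_hit is not None, full_hit is None)
print(two_splits, one_split)
```

With maxsplit=2, unpacking the result into two names would raise a ValueError for a key like this one, which is consistent with the crashes described in the commit message.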

Implementation Details

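The escaping order is critical: ~ must be escaped first, otherwise the tildes introduced by the other escape sequences would themselves be escaped. Unescaping reverses this order, decoding ~3, ~2, and ~1 before ~0, so that a literal tilde followed by a digit (for example the key ~1, which is escaped as ~01) is not prematurely decoded into a different character.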
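
The ordering constraint can be sketched as follows. This is a minimal illustration, not the PR's code: escape_key() and unescape_key() are hypothetical stand-ins for _escape_key() and _unescape_key(), and the table-driven approach is an assumption.

```python
# (raw character, escape sequence) pairs; the tilde entry must come first.
ESCAPES = [("~", "~0"), (".", "~1"), ("$", "~2"), ("[", "~3")]


def escape_key(key: str) -> str:
    """Escape special characters, handling "~" before anything else."""
    for raw, escaped in ESCAPES:
        key = key.replace(raw, escaped)
    return key


def unescape_key(key: str) -> str:
    """Reverse escape_key(), decoding "~0" last."""
    for raw, escaped in reversed(ESCAPES):
        key = key.replace(escaped, raw)
    return key


def unescape_key_wrong_order(key: str) -> str:
    """Deliberately broken variant that decodes "~0" first."""
    for raw, escaped in ESCAPES:
        key = key.replace(escaped, raw)
    return key


tricky = "~1"                      # a literal tilde followed by "1"
encoded = escape_key(tricky)       # "~01"
print(unescape_key(encoded))              # round-trips to "~1"
print(unescape_key_wrong_order(encoded))  # wrongly becomes "."
```

The broken variant turns ~01 into ~1 and then into a dot, which is the premature decoding that the reversed order avoids.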

The three-pass unflatten approach ensures that array index detection ([N] patterns) works correctly on escaped keys, so literal bracket keys like [0] (escaped as ~30]) are not confused with actual array indices.
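
A rough sketch of the second and third passes, assuming the first pass has already produced nested dicts whose keys are still escaped. The function names, the regex, and the sample data are hypothetical and only illustrate why list detection has to run before unescaping.

```python
import re

# Assumed array-index pattern; an escaped literal "[0]" is "~30]" and cannot match.
INT_KEY_RE = re.compile(r"\[(\d+)\]")


def unescape_key(key):
    """Decode escape sequences, handling "~0" last."""
    for escaped, raw in (("~3", "["), ("~2", "$"), ("~1", "."), ("~0", "~")):
        key = key.replace(escaped, raw)
    return key


def dicts_to_lists(node):
    """Pass 2: convert dicts whose keys are all [N] into lists."""
    if not isinstance(node, dict):
        return node
    node = {k: dicts_to_lists(v) for k, v in node.items()}
    if node and all(INT_KEY_RE.fullmatch(k) for k in node):
        # Order the values by their numeric index, not by string order.
        return [node[k] for k in sorted(node, key=lambda k: int(k[1:-1]))]
    return node


def unescape_keys(node):
    """Pass 3: unescape the dictionary keys that remain."""
    if isinstance(node, list):
        return [unescape_keys(v) for v in node]
    if isinstance(node, dict):
        return {unescape_key(k): unescape_keys(v) for k, v in node.items()}
    return node


# Hypothetical output of pass 1: one real array and one literal "[0]" key.
parsed = {"items": {"[0]": "a", "[1]": "b"}, "literal": {"~30]": "x"}}
result = unescape_keys(dicts_to_lists(parsed))
print(result)
```

Running the passes in the opposite order would unescape ~30] to [0] before list detection, and the literal key would then be mistaken for an array index.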

https://claude.ai/code/session_01SwKhkAmPwW1qrMHpJmYT3x

Keys containing `.`, `$`, `[`, or `~` previously caused crashes
(ValueError, AttributeError) or silent data corruption on round-trip
through flatten/unflatten. This adds escape sequences (~0 for ~, ~1
for ., ~2 for $, ~3 for [) so all valid JSON keys round-trip correctly.

Also fixes: rsplit("$", 2) -> rsplit("$", 1), _int_key_re.match ->
fullmatch, and guards against AttributeError when root becomes list.