
feat: add Canada SIN + international patterns (v1.5.0)#26

Merged
FelipeMorandini merged 1 commit into main from feat/canada-international-patterns
Mar 20, 2026
@FelipeMorandini
Owner

Summary

  • Canadian SIN (XXX-XXX-XXX) — Luhn validated, first digit 1-7 or 9
  • E.164 Phone (+CC XXXXXXXXX) — international format, 8-15 digits, validator for length
  • SWIFT/BIC (AAAABBCCXXX) — 8 or 11 chars, ISO 3166-1 country code validation
  • E.164 placed after US/BR/IN phone patterns to avoid double-matching
  • 24 built-in patterns, 559 tests (28 new)
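
A minimal sketch of the SIN check described above, assuming the validator strips separators, rejects a first digit of 0 or 8, and then applies the standard Luhn checksum. The names `luhn_ok` and `sin_validate` are hypothetical and are not taken from `src/hushlog/_patterns.py`.

```python
def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum over a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def sin_validate(text: str) -> bool:
    """Hypothetical validator: 9 digits, first digit 1-7 or 9, Luhn-valid."""
    digits = "".join(c for c in text if c.isdigit())
    if len(digits) != 9:
        return False
    if digits[0] not in "12345679":
        return False
    return luhn_ok(digits)


print(sin_validate("130-692-544"))  # Luhn-valid, first digit 1
print(sin_validate("046-454-286"))  # Luhn-valid, but first digit 0
```

The first-digit rule runs before the checksum so that Luhn-valid strings starting with 0 or 8 are still rejected.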

Test plan

  • 559/559 tests pass
  • All checks clean
  • CI passes
  • Copilot review

Add Canadian SIN (XXX-XXX-XXX with Luhn validation, first digit 1-7/9),
E.164 international phone numbers (+ country code + 8-15 digits), and
SWIFT/BIC codes (8 or 11 chars with ISO 3166-1 country validation).

E.164 placed after country-specific phone patterns to avoid overlap.
24 built-in patterns total. 559 tests including 28 new.
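
To illustrate why ordering matters, here is a rough sketch under stated assumptions: patterns are applied in list order, so a number already redacted by a country-specific pattern no longer contains a `+` for the generic E.164 pattern to match. The regexes, the `PATTERNS` list, and `redact` are hypothetical stand-ins, not the library's actual `PatternEntry` definitions.

```python
import re

# Hypothetical country-specific pattern (US, with explicit +1 prefix).
US_PHONE_RE = re.compile(r"\+1[ \-]?\d{3}[ \-]?\d{3}[ \-]?\d{4}\b")

# Hypothetical generic pattern: "+", then digits with optional separators.
E164_RE = re.compile(r"\+\d[\d \-]{6,18}\d")


def e164_validate(text: str) -> bool:
    """Length check from the PR summary: 8 to 15 digits in total."""
    digit_count = sum(c.isdigit() for c in text)
    return 8 <= digit_count <= 15


# Order matters: the specific pattern runs before the generic one.
PATTERNS = [
    ("us_phone", US_PHONE_RE, None),
    ("e164_phone", E164_RE, e164_validate),
]


def redact(text: str) -> str:
    for name, regex, validator in PATTERNS:
        def repl(m: re.Match, name=name, validator=validator) -> str:
            # Leave the match untouched when the validator rejects it.
            if validator is not None and not validator(m.group(0)):
                return m.group(0)
            return f"[{name.upper()} REDACTED]"
        text = regex.sub(repl, text)
    return text


print(redact("call +1 415-555-0100"))
print(redact("ring +44 20 7946 0958"))
```

With the order reversed, the generic pattern would claim the US number first and the more specific label would never be used.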

Copilot AI left a comment

Pull request overview

Adds new built-in redaction patterns/validators (Canada SIN, E.164 international phones, SWIFT/BIC) and bumps the library version to v1.5.0, updating tests and docs accordingly.

Changes:

  • Add SIN/E.164/SWIFT built-in PatternEntrys with validators and partial maskers, and wire them into built-in pattern ordering.
  • Expand unit tests to cover the new validators/pattern behavior and update built-in pattern count/order assertions.
  • Bump package version to 1.5.0 and update README/ROADMAP documentation.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/hushlog/_patterns.py | Introduces SIN/E.164/SWIFT patterns, validators, partial maskers, and updates built-in ordering. |
| tests/test_patterns.py | Adds validator + pattern tests for SIN/E.164/SWIFT and updates built-in count/order expectations. |
| tests/test_registry.py | Updates registry size assertions to reflect 24 built-ins. |
| src/hushlog/__init__.py | Bumps __version__ to 1.5.0. |
| tests/integration/test_logging_pipeline.py | Updates version assertion to 1.5.0. |
| pyproject.toml | Bumps project version to 1.5.0. |
| uv.lock | Updates locked package version to 1.5.0. |
| README.md | Documents new patterns and updates phone coverage statement. |
| ROADMAP.md | Marks SIN/E.164/SWIFT items as completed and clarifies passport deferral. |

Comment on lines +337 to +339
# Show + and first 2 digits (country code approximation)
cc = "".join(digits[:2])
return f"+{cc} {mc * 4} {mc * 4} {last4}"

Copilot AI Mar 20, 2026

_partial_mask_e164() always exposes the first 2 digits after + as the “country code”. For 1-digit (e.g. +1…) or 3-digit country codes, this leaks a subscriber digit or mis-identifies the country code, which defeats the intent of only preserving the country code. Consider capturing the country-code part in _E164_PHONE_RE (e.g., a named group) and using that exact group in the partial masker, or otherwise deriving the country code from the matched prefix rather than digits[:2].

Suggested change
- # Show + and first 2 digits (country code approximation)
- cc = "".join(digits[:2])
- return f"+{cc} {mc * 4} {mc * 4} {last4}"
+ # Mask all digits except the last 4 to avoid leaking country/subscriber digits
+ return f"+{mc * 2} {mc * 4} {mc * 4} {last4}"

Owner Author

Valid. The partial mask approximates country code as first 2 digits. For +1 (US/Canada) this shows +1X which includes the first subscriber digit. Acceptable trade-off for simplicity — the exact country code length varies (1-3 digits) and detecting it properly requires a country code table.
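
For reference, a sketch of what the table-based alternative could look like. It assumes a lookup set of known calling codes and relies on E.164 country codes being prefix-free, so the shortest prefix found in the table is the country code. `KNOWN_COUNTRY_CODES` is a deliberately tiny, hypothetical sample, and `split_country_code` and `partial_mask_e164` are illustrative names rather than the merged implementation.

```python
from __future__ import annotations

# Hypothetical, incomplete sample; a real table would list every assigned code.
KNOWN_COUNTRY_CODES = frozenset({"1", "7", "44", "55", "91", "351"})


def split_country_code(digits: str) -> tuple[str, str]:
    """Return (country_code, rest); country_code is "" when not in the table."""
    for length in (1, 2, 3):
        prefix = digits[:length]
        if prefix in KNOWN_COUNTRY_CODES:
            return prefix, digits[length:]
    return "", digits


def partial_mask_e164(text: str, mc: str = "*") -> str:
    """Show only a recognized country code and the last 4 digits."""
    digits = "".join(c for c in text if c.isdigit())
    cc, rest = split_country_code(digits)
    last4 = rest[-4:]
    # Unknown country code: mask it rather than guess its length.
    shown_cc = cc if cc else mc * 2
    return f"+{shown_cc} {mc * 4} {mc * 4} {last4}"


print(partial_mask_e164("+14155550100"))
print(partial_mask_e164("+351 912 345 678"))
print(partial_mask_e164("+999 12345678"))
```

Falling back to a fully masked prefix for unknown codes matches the behavior of Copilot's suggested change, so only numbers with a recognized code reveal it.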



def _swift_validate(text: str) -> bool:
"""Validate a SWIFT/BIC code: chars 5-6 must be a valid ISO 3166-1 country code."""

Copilot AI Mar 20, 2026

_SWIFT_COUNTRY_CODES includes "XK" while the surrounding comments/docstring state ISO 3166-1 alpha-2 validation. XK is commonly used for Kosovo but is not an official ISO 3166-1 code; either remove it to be strictly ISO-compliant or update the documentation to explicitly allow this non-ISO exception.

Suggested change
- """Validate a SWIFT/BIC code: chars 5-6 must be a valid ISO 3166-1 country code."""
+ """Validate a SWIFT/BIC code: chars 5-6 must be a valid ISO 3166-1 alpha-2 country code (plus 'XK' for Kosovo)."""

Owner Author

Correct — XK (Kosovo) is a user-assigned code, not officially ISO 3166-1. However, it's widely used in practice (SWIFT, EU, UN) for Kosovo operations. Including it is the pragmatic choice for a PII detection library.
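
One way to make the exception explicit in code is to keep the ISO set and the user-assigned extras separate. The sketch below assumes that split; `ISO_3166_ALPHA2_SAMPLE` is a small hypothetical subset, and `EXTRA_COUNTRY_CODES` and `swift_validate` are illustrative names, not the contents of `_SWIFT_COUNTRY_CODES` or `_swift_validate`.

```python
# Hypothetical subset; a real set would hold every ISO 3166-1 alpha-2 code.
ISO_3166_ALPHA2_SAMPLE = frozenset({"BR", "CA", "DE", "GB", "IN", "US"})

# User-assigned codes accepted on purpose, kept apart so the exception is visible.
EXTRA_COUNTRY_CODES = frozenset({"XK"})  # Kosovo

ALLOWED_COUNTRY_CODES = ISO_3166_ALPHA2_SAMPLE | EXTRA_COUNTRY_CODES


def swift_validate(text: str) -> bool:
    """Check length (8 or 11) and that chars 5-6 are an allowed country code."""
    if len(text) not in (8, 11):
        return False
    # Characters 5-6 (1-based) are the slice [4:6].
    return text[4:6] in ALLOWED_COUNTRY_CODES


print(swift_validate("DEUTDEFF"))     # 8 chars, country DE
print(swift_validate("DEUTDEFF500"))  # 11 chars, country DE
print(swift_validate("ABCDXK12"))     # made-up code using the XK exception
print(swift_validate("ABCDZZ12"))     # ZZ is not in the allowed set
```

Keeping the extras in their own set lets the docstring stay accurate about ISO 3166-1 while documenting XK in one place.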

@FelipeMorandini FelipeMorandini merged commit ebd3f16 into main Mar 20, 2026
10 checks passed
@FelipeMorandini FelipeMorandini deleted the feat/canada-international-patterns branch March 20, 2026 02:22