feat: add India patterns — Aadhaar, PAN, phone (v1.4.0) by FelipeMorandini · Pull Request #25 · FelipeMorandini/hushlog

FelipeMorandini · 2026-03-20T00:05:24Z

Summary

Aadhaar (XXXX XXXX XXXX) — Verhoeff checksum validation, first digit 2-9, spaced format only
PAN (ABCPE1234F) — entity type restriction on 4th char reduces false positives
Indian Phone — 10 digits starting 6-9, optional +91/0 prefix
Partial masking for all three
21 built-in patterns, 531 tests (26 new)
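
The Aadhaar check described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/hushlog/_patterns.py`: the tables are the standard published Verhoeff tables, while the names `_D`, `_P`, `verhoeff_valid`, and `is_aadhaar` are hypothetical, and the regex uses a literal space as the separator.

```python
import re

# Standard Verhoeff multiplication table (dihedral group D5).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Standard Verhoeff permutation table (period 8).
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

# Spaced format only, first digit 2-9 (literal spaces are an assumption here).
_AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3} [0-9]{4} [0-9]{4}\b")


def verhoeff_valid(number: str) -> bool:
    """Return True if a digit string (check digit last) passes Verhoeff."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def is_aadhaar(candidate: str) -> bool:
    """Hypothetical helper: shape check first, then the checksum."""
    if not _AADHAAR_RE.fullmatch(candidate):
        return False
    return verhoeff_valid(candidate.replace(" ", ""))


print(verhoeff_valid("2363"))  # True: 3 is the Verhoeff check digit for 236
```

Running the regex first keeps the checksum off strings that cannot be Aadhaar numbers; the checksum then rejects most 12-digit groups that merely look like one.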

Test plan

531/531 tests pass
All checks clean
CI passes
Copilot review

Add Aadhaar (12-digit spaced format with Verhoeff checksum validation), PAN (XXXXX0000X with entity type restriction on 4th char), and Indian phone numbers (10 digits starting 6-9, optional +91/0 prefix). 21 built-in patterns total. 531 tests including 26 new.

Copilot

Pull request overview

This PR bumps HushLog to v1.4.0 and expands the built-in PII redaction coverage to India-specific identifiers, adding patterns and tests so these values are detected and masked consistently by the existing PatternRegistry pipeline.

Changes:

Add new built-in patterns for Aadhaar (Verhoeff validated, spaced format), PAN (entity-type constrained), and Indian mobile phone numbers (+91/0 prefixes).
Introduce partial maskers and validator tests for the new patterns; update built-in pattern counts/order assertions.
Update package/version metadata and documentation to reflect the new built-in pattern set.
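
To illustrate the "entity-type constrained" PAN pattern mentioned above, here is a hedged sketch. A PAN is five letters, four digits, and a final letter, and the fourth letter encodes the holder type. The letter set below is the commonly documented one and is an assumption about what the merged pattern allows; `_PAN_ENTITY_TYPES`, `_PAN_RE`, and `find_pans` are hypothetical names.

```python
import re

# Assumption: commonly documented holder-type letters for the 4th character
# (for example P = individual, C = company, H = HUF, F = firm, T = trust).
_PAN_ENTITY_TYPES = "ABCFGHJLPT"

# Three letters, one entity-type letter, one letter, four digits, one letter.
_PAN_RE = re.compile(
    rf"\b[A-Z]{{3}}[{_PAN_ENTITY_TYPES}][A-Z][0-9]{{4}}[A-Z]\b"
)


def find_pans(text: str) -> list[str]:
    """Return every PAN-shaped token found in the text."""
    return _PAN_RE.findall(text)


print(find_pans("PAN ABCPE1234F on file"))  # ['ABCPE1234F']
print(find_pans("ref ABCDE1234F"))          # []: 'D' is not an entity type
```

Restricting the fourth character rejects 16 of the 26 possible letters in that position, which is where the reduction in false positives comes from.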

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.


File	Description
`src/hushlog/_patterns.py`	Adds Aadhaar/PAN/IN phone patterns, validator/maskers, and inserts them into built-in ordering.
`tests/test_patterns.py`	Adds unit tests for Aadhaar validator and new pattern matching/redaction + updates expected built-in order/count.
`tests/test_registry.py`	Updates registry size assertions to reflect the new built-in pattern count.
`docs/index.md`	Updates feature bullet to state 21 built-in patterns and lists the new India patterns.
`docs/configuration.md`	Documents the new `disable_patterns` names (`aadhaar`, `pan`, `in_phone`).
`README.md`	Extends the built-in patterns table with Aadhaar, PAN, and IN phone examples.
`ROADMAP.md`	Marks India items as completed.
`src/hushlog/__init__.py`	Bumps `__version__` to 1.4.0.
`pyproject.toml`	Bumps project version to 1.4.0.
`uv.lock`	Updates locked editable package version to 1.4.0.
`tests/integration/test_logging_pipeline.py`	Updates integration test expectation for the version string.


Copilot · 2026-03-20T00:09:32Z

src/hushlog/_patterns.py

+    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
+    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
+]
+_VERHOEFF_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


_VERHOEFF_INV is defined but never referenced anywhere in this module. Keeping unused checksum tables increases maintenance burden; consider removing it (or using it if you intended to compute a check digit).

Suggested change

-_VERHOEFF_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]

Valid. _VERHOEFF_INV is part of the standard Verhoeff algorithm table set. It's used for generating check digits (not just validating). Keeping for completeness and potential future use in partial masking.
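
For context on the reply above, this is roughly how the inverse table is used when generating a check digit rather than validating one. It is a sketch with the standard Verhoeff tables; `_D`, `_P`, and `verhoeff_check_digit` are hypothetical names, and nothing here is taken from the module beyond the `_VERHOEFF_INV` values quoted in the diff.

```python
# Standard Verhoeff multiplication table (dihedral group D5).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Standard Verhoeff permutation table (period 8).
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
# Inverse table, same values as _VERHOEFF_INV in the diff.
_VERHOEFF_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> int:
    """Return the digit that makes payload + digit pass the Verhoeff check."""
    c = 0
    # Positions are shifted by one because the check digit will occupy slot 0.
    for i, ch in enumerate(reversed(payload)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return _VERHOEFF_INV[c]


print(verhoeff_check_digit("236"))  # 3, so 2363 is a valid Verhoeff number
```

Validation alone never needs the inverse table, which is why a validate-only module leaves it unreferenced.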

Copilot · 2026-03-20T00:09:33Z

src/hushlog/_patterns.py

+# --- Aadhaar (Indian 12-digit ID) ---
+# Matches spaced format only (XXXX XXXX XXXX) to minimize false positives.
+# First digit must be 2-9. Validated with Verhoeff checksum.
+_AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b")


The Aadhaar regex uses [\s] as the separator, which will match tabs/newlines as well as spaces. Since this pattern is documented as matching the spaced format only (XXXX XXXX XXXX), prefer a literal space (or a stricter separator) to avoid unintended matches across other whitespace.

Suggested change

-_AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b")
+_AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3} [0-9]{4} [0-9]{4}\b")

Same pattern as IBAN discussion (PR #24). Aadhaar numbers are formatted with spaces in practice, never tabs. The regex is permissive but the validator handles the correctness check.
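
The difference between the two separators in this thread can be checked directly. The snippet below compares the two regexes quoted above at the regex stage only; the sample digits are arbitrary and are not meant to pass the Verhoeff check, and the names `PERMISSIVE` and `STRICT` are illustrative.

```python
import re

# Pattern as merged: any whitespace character may separate the groups.
PERMISSIVE = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b")
# Pattern as suggested in review: literal spaces only.
STRICT = re.compile(r"\b[2-9][0-9]{3} [0-9]{4} [0-9]{4}\b")

samples = [
    "2345 6789 0123",    # spaces
    "2345\t6789\t0123",  # tabs, e.g. columns in a TSV log line
    "2345\n6789\n0123",  # newlines, e.g. three unrelated values on three lines
]

for s in samples:
    print(repr(s), bool(PERMISSIVE.search(s)), bool(STRICT.search(s)))
# '2345 6789 0123' True True
# '2345\t6789\t0123' True False
# '2345\n6789\n0123' True False
```

With the permissive form, tab- or newline-separated digit groups reach the validator, so the checksum is the only thing preventing redaction across unrelated fields.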

Copilot · 2026-03-20T00:09:33Z

src/hushlog/_patterns.py

+    r"(?:\+91[\s.-]?)?"  # Optional +91 prefix
+    r"(?:0)?"  # Optional 0 prefix for STD


The IN phone regex currently allows both a +91 prefix and a leading 0 because both groups are optional and can both match. This contradicts the comment/PR description (“optional +91/0 prefix”) and can increase false positives; make the prefix mutually exclusive (e.g., either +91 or 0 or nothing).

Suggested change

-    r"(?:\+91[\s.-]?)?"  # Optional +91 prefix
-    r"(?:0)?"  # Optional 0 prefix for STD
+    r"(?:(?:\+91[\s.-]?)|0)?"  # Optional, mutually exclusive +91 or 0 prefix

Valid edge case. +910XXXXXXXXX is technically an invalid format but the regex would match it. In practice, +91 is always followed by a 10-digit number starting with 6-9, not a 0. The false positive risk is very low.
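
The edge case discussed here can be reproduced with a small sketch. Only the prefix fragments come from the diff; the subscriber part (`[6-9][0-9]{9}`) is an assumption based on the PR summary, and the real pattern may include anchors or separators not shown in this thread. `STACKING` and `EXCLUSIVE` are hypothetical names.

```python
import re

# Assumption: 10 digits starting 6-9, per the PR summary.
_SUBSCRIBER = r"[6-9][0-9]{9}"

# Prefix as merged: both groups are optional, so "+91" and "0" can stack.
STACKING = re.compile(r"(?:\+91[\s.-]?)?" r"(?:0)?" + _SUBSCRIBER)

# Prefix as suggested in review: "+91", or "0", or nothing.
EXCLUSIVE = re.compile(r"(?:(?:\+91[\s.-]?)|0)?" + _SUBSCRIBER)

samples = ["9876543210", "09876543210", "+91 9876543210", "+9109876543210"]

for s in samples:
    print(s, bool(STACKING.fullmatch(s)), bool(EXCLUSIVE.fullmatch(s)))
# 9876543210 True True
# 09876543210 True True
# +91 9876543210 True True
# +9109876543210 True False
```

Only the stacked `+910...` form distinguishes the two variants, which matches the author's assessment that the practical impact is small.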

FelipeMorandini requested a review from Copilot March 20, 2026 00:05

Copilot started reviewing on behalf of FelipeMorandini March 20, 2026 00:05

Copilot AI reviewed Mar 20, 2026


FelipeMorandini merged commit 028e4cb into main Mar 20, 2026
10 checks passed

FelipeMorandini deleted the feat/india-patterns branch March 20, 2026 01:47
