feat: add India patterns — Aadhaar, PAN, phone (v1.4.0) #25
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -84,6 +84,50 @@ def _cnpj_validate(text: str) -> bool: | |||||||
| return digits[13] == check2 | ||||||||
|
|
||||||||
|
|
||||||||
| # --------------------------------------------------------------------------- | ||||||||
| # Verhoeff checksum tables (for Aadhaar validation) | ||||||||
| # --------------------------------------------------------------------------- | ||||||||
|
|
||||||||
| _VERHOEFF_D = [ | ||||||||
| [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], | ||||||||
| [1, 2, 3, 4, 0, 6, 7, 8, 9, 5], | ||||||||
| [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], | ||||||||
| [3, 4, 0, 1, 2, 8, 9, 5, 6, 7], | ||||||||
| [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], | ||||||||
| [5, 9, 8, 7, 6, 0, 4, 3, 2, 1], | ||||||||
| [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], | ||||||||
| [7, 6, 5, 9, 8, 2, 1, 0, 4, 3], | ||||||||
| [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], | ||||||||
| [9, 8, 7, 6, 5, 4, 3, 2, 1, 0], | ||||||||
| ] | ||||||||
| _VERHOEFF_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9] | ||||||||
| _VERHOEFF_P = [ | ||||||||
| [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], | ||||||||
| [1, 5, 7, 6, 2, 8, 3, 0, 9, 4], | ||||||||
| [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], | ||||||||
| [8, 9, 1, 6, 0, 4, 3, 5, 2, 7], | ||||||||
| [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], | ||||||||
| [4, 2, 8, 6, 5, 7, 3, 9, 0, 1], | ||||||||
| [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], | ||||||||
| [7, 0, 4, 6, 9, 1, 3, 2, 5, 8], | ||||||||
| ] | ||||||||
|
|
||||||||
|
|
||||||||
| def _aadhaar_validate(text: str) -> bool: | ||||||||
| """Validate an Aadhaar number using the Verhoeff checksum.""" | ||||||||
| digits = [int(c) for c in text if c in _ASCII_DIGITS] | ||||||||
| if len(digits) != 12: | ||||||||
| return False | ||||||||
| # First digit must be 2-9 (Aadhaar doesn't start with 0 or 1) | ||||||||
| if digits[0] < 2: | ||||||||
| return False | ||||||||
| # Verhoeff checksum | ||||||||
| c = 0 | ||||||||
| for i, d in enumerate(reversed(digits)): | ||||||||
| c = _VERHOEFF_D[c][_VERHOEFF_P[i % 8][d]] | ||||||||
| return c == 0 | ||||||||
|
|
||||||||
|
|
||||||||
| def _iban_validate(text: str) -> bool: | ||||||||
| """Validate an IBAN using the mod-97 algorithm (ISO 7064).""" | ||||||||
| # Remove spaces | ||||||||
|
|
@@ -239,6 +283,22 @@ def _partial_mask_eu_vat(m: re.Match[str], mc: str) -> str: | |||||||
| return f"{prefix}{mc * middle_len}{last3}" | ||||||||
|
|
||||||||
|
|
||||||||
| def _partial_mask_aadhaar(m: re.Match[str], mc: str) -> str: | ||||||||
| digits = [c for c in m.group() if c in _ASCII_DIGITS] | ||||||||
| return f"{''.join(digits[:4])} {''.join(digits[4:8])} {mc * 4}" | ||||||||
|
|
||||||||
|
|
||||||||
| def _partial_mask_pan(m: re.Match[str], mc: str) -> str: | ||||||||
| text = m.group() | ||||||||
| return f"{text[:3]}{mc * 6}{text[-1]}" | ||||||||
|
|
||||||||
|
|
||||||||
| def _partial_mask_in_phone(m: re.Match[str], mc: str) -> str: | ||||||||
| digits = [c for c in m.group() if c in _ASCII_DIGITS] | ||||||||
| last4 = "".join(digits[-4:]) | ||||||||
| return f"{mc * 5} {mc * 5}-{last4}" | ||||||||
|
|
||||||||
|
|
||||||||
| def _partial_mask_generic_secret(m: re.Match[str], mc: str) -> str: | ||||||||
| text = m.group() | ||||||||
| for sep in ("=", ":"): | ||||||||
|
|
@@ -594,6 +654,55 @@ def _partial_mask_generic_secret(m: re.Match[str], mc: str) -> str: | |||||||
| ) | ||||||||
|
|
||||||||
|
|
||||||||
| # --- Aadhaar (Indian 12-digit ID) --- | ||||||||
| # Matches spaced format only (XXXX XXXX XXXX) to minimize false positives. | ||||||||
| # First digit must be 2-9. Validated with Verhoeff checksum. | ||||||||
| _AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b") | ||||||||
|
||||||||
Suggested change
| - _AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b") |
| + _AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3} [0-9]{4} [0-9]{4}\b") |
Same pattern as IBAN discussion (PR #24). Aadhaar numbers are formatted with spaces in practice, never tabs. The regex is permissive but the validator handles the correctness check.
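To make the trade-off in this thread concrete, here is a minimal sketch comparing the `[\s]` pattern from the diff with the literal-space pattern from the suggested change. The sample strings are arbitrary digit groups chosen for illustration; they are not checked against the Verhoeff validator, and the names `permissive`, `strict`, `samples`, and `results` are hypothetical.

```python
import re

# Pattern as written in the diff: any whitespace character may separate the groups.
permissive = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b")
# Pattern from the suggested change: only a literal space may separate the groups.
strict = re.compile(r"\b[2-9][0-9]{3} [0-9]{4} [0-9]{4}\b")

# Arbitrary digit groups (not real Aadhaar numbers) with different separators.
samples = {
    "spaces": "2345 6789 0123",
    "tabs": "2345\t6789\t0123",
    "newlines": "2345\n6789\n0123",
}

# For each sample: (matches permissive pattern, matches strict pattern).
results = {
    name: (bool(permissive.search(text)), bool(strict.search(text)))
    for name, text in samples.items()
}

for name, (loose, tight) in results.items():
    print(f"{name:9s} permissive={loose} strict={tight}")
```

Both patterns accept the space-separated form. Only the permissive one also accepts tabs and line breaks, so with `[\s]` a match can span two lines of input before the checksum is ever consulted.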
Copilot AI · Mar 20, 2026
The IN phone regex currently allows both a +91 prefix and a leading 0 because both groups are optional and can both match. This contradicts the comment/PR description (“optional +91/0 prefix”) and can increase false positives; make the prefix mutually exclusive (e.g., either +91 or 0 or nothing).
Suggested change
| - r"(?:\+91[\s.-]?)?" # Optional +91 prefix |
| - r"(?:0)?" # Optional 0 prefix for STD |
| + r"(?:(?:\+91[\s.-]?)|0)?" # Optional, mutually exclusive +91 or 0 prefix |
Valid edge case. +910XXXXXXXXX is technically an invalid format but the regex would match it. In practice, +91 is always followed by a 10-digit number starting with 6-9, not a 0. The false positive risk is very low.
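The edge case can be reproduced with a small sketch. The full phone regex is not shown in this excerpt, so the number body below (`[6-9][0-9]{9}`) is an assumption based on the description "a 10-digit number starting with 6-9", and `BODY`, `both_optional`, `exclusive`, and `accepted` are hypothetical names.

```python
import re

# Assumption: the rest of the pattern is a 10-digit number starting with 6-9.
BODY = r"[6-9][0-9]{9}"

# Prefix as in the diff: both groups are optional, so both can match together.
both_optional = re.compile(r"(?:\+91[\s.-]?)?(?:0)?" + BODY)
# Prefix from the suggested change: +91 or 0 or nothing.
exclusive = re.compile(r"(?:(?:\+91[\s.-]?)|0)?" + BODY)

candidates = ["9876543210", "09876543210", "+91 9876543210", "+9109876543210"]

# For each candidate: (accepted by both_optional, accepted by exclusive),
# using fullmatch so the whole string must be consumed.
accepted = {
    s: (both_optional.fullmatch(s) is not None, exclusive.fullmatch(s) is not None)
    for s in candidates
}

for s, (old, new) in accepted.items():
    print(f"{s!r:18} both_optional={old} exclusive={new}")
```

With `fullmatch`, only the mutually exclusive prefix rejects the combined `+91` plus `0` form. With `search` and no left-hand boundary, the exclusive pattern would still find the `0`-prefixed number inside that string, so the change only helps if the real pattern also anchors the start of the match.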
_VERHOEFF_INV is defined but never referenced anywhere in this module. Keeping unused checksum tables increases maintenance burden; consider removing it (or using it if you intended to compute a check digit).
Valid. _VERHOEFF_INV is part of the standard Verhoeff algorithm table set. It's used for generating check digits (not just validating). Keeping for completeness and potential future use in partial masking.
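For context on what `_VERHOEFF_INV` would be used for, here is a minimal sketch of check-digit generation with the same three tables. The helper names `verhoeff_check_digit` and `verhoeff_is_valid` are hypothetical and not part of this PR; the validation loop mirrors the one in `_aadhaar_validate` but omits the length and first-digit rules.

```python
# Verhoeff tables, copied from the diff in this PR.
_VERHOEFF_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_VERHOEFF_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]
_VERHOEFF_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_check_digit(payload: str) -> int:
    """Return the digit that, appended to payload, passes the Verhoeff check."""
    c = 0
    # Positions are shifted by one: position 0 is reserved for the check digit.
    for i, ch in enumerate(reversed(payload)):
        c = _VERHOEFF_D[c][_VERHOEFF_P[(i + 1) % 8][int(ch)]]
    # The inverse table gives the digit that brings the running value back to 0.
    return _VERHOEFF_INV[c]


def verhoeff_is_valid(number: str) -> bool:
    """Checksum-only validation (same loop as _aadhaar_validate)."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _VERHOEFF_D[c][_VERHOEFF_P[i % 8][int(ch)]]
    return c == 0


check = verhoeff_check_digit("236")
print(check, verhoeff_is_valid("236" + str(check)))
```

Validation alone never needs the inverse table, which is why it shows up as unused. It becomes necessary only if the module ever has to produce a checksum-valid number, for example from an 11-digit payload.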