3 changes: 3 additions & 0 deletions README.md
@@ -75,6 +75,9 @@ HushLog wraps your existing logging formatters with a `RedactingFormatter` that
| BR Phone | `(11) 91234-5678` | `[BR_PHONE REDACTED]` | Brazilian mobile/landline, optional +55 prefix |
| IBAN | `GB29 NWBK 6016 1331 9268 19` | `[IBAN REDACTED]` | International bank account number, mod-97 validated |
| EU VAT | `DE123456789` | `[EU_VAT REDACTED]` | EU VAT numbers with country prefix |
| Aadhaar | `2345 6789 0124` | `[AADHAAR REDACTED]` | Indian 12-digit ID, Verhoeff checksum validated |
| PAN | `BNZPM2501F` | `[PAN REDACTED]` | Indian Permanent Account Number, entity type validated |
| IN Phone | `+91 98765 43210` | `[IN_PHONE REDACTED]` | Indian mobile numbers, optional +91/0 prefix |

## Configuration

6 changes: 3 additions & 3 deletions ROADMAP.md
@@ -172,9 +172,9 @@
- [x] EU VAT numbers (country-prefixed)

**India (`hushlog[in]`):**
- [ ] Aadhaar (12 digits with Verhoeff checksum)
- [ ] PAN (`XXXXX0000X` format)
- [ ] Indian phone numbers (+91 formats)
- [x] Aadhaar (12 digits with Verhoeff checksum)
- [x] PAN (`XXXXX0000X` format)
- [x] Indian phone numbers (+91 formats)

**Canada (`hushlog[ca]`):**
- [ ] SIN (`XXX-XXX-XXX` with Luhn validation)
3 changes: 3 additions & 0 deletions docs/configuration.md
@@ -55,6 +55,9 @@ A frozenset of built-in pattern names to skip. Available names:
| `br_phone` | Brazilian phone numbers |
| `iban` | International bank account numbers (mod-97 validated) |
| `eu_vat` | EU VAT numbers (country-prefixed) |
| `aadhaar` | Indian Aadhaar numbers (Verhoeff checksum validated) |
| `pan` | Indian PAN numbers (entity type validated) |
| `in_phone` | Indian mobile phone numbers |
| `generic_secret` | Generic secrets (label-based) |

### `mask_style`
2 changes: 1 addition & 1 deletion docs/index.md
@@ -7,7 +7,7 @@ HushLog automatically detects and redacts personally identifiable information (P
## Features

- **Zero-config** — one call to `hushlog.patch()` and you're done
- **18 built-in patterns** — email, credit card (Luhn validated), SSN, phone, JWT, AWS keys, Stripe, GitHub tokens, GCP keys, IPv4/IPv6, CPF, CNPJ, BR phone, IBAN, EU VAT, generic secrets
- **21 built-in patterns** — email, credit card (Luhn validated), SSN, phone, JWT, AWS keys, Stripe, GitHub tokens, GCP keys, IPv4/IPv6, CPF, CNPJ, BR phone, IBAN, EU VAT, Aadhaar, PAN, Indian phone, generic secrets
- **Non-invasive** — wraps existing formatters, no logger rewrites needed
- **Partial masking** — `j***@e***.com` instead of `[EMAIL REDACTED]`
- **Ecosystem integrations** — JSON logs, structlog, loguru
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "hushlog"
version = "1.3.0"
version = "1.4.0"
description = "Zero-config PII redaction for Python logging"
readme = "README.md"
license = "MIT"
2 changes: 1 addition & 1 deletion src/hushlog/__init__.py
@@ -14,7 +14,7 @@
from hushlog._registry import PatternRegistry as PatternRegistry
from hushlog._structlog import structlog_processor as structlog_processor

__version__ = "1.3.0"
__version__ = "1.4.0"

__all__ = [
    "Config",
113 changes: 113 additions & 0 deletions src/hushlog/_patterns.py
@@ -84,6 +84,50 @@ def _cnpj_validate(text: str) -> bool:
    return digits[13] == check2


# ---------------------------------------------------------------------------
# Verhoeff checksum tables (for Aadhaar validation)
# ---------------------------------------------------------------------------

_VERHOEFF_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_VERHOEFF_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]

Copilot AI Mar 20, 2026

_VERHOEFF_INV is defined but never referenced anywhere in this module. Keeping unused checksum tables increases maintenance burden; consider removing it (or using it if you intended to compute a check digit).

Suggested change
_VERHOEFF_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]

Owner Author

Valid. _VERHOEFF_INV is part of the standard Verhoeff algorithm table set. It's used for generating check digits (not just validating). Keeping for completeness and potential future use in partial masking.
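
The reply above says `_VERHOEFF_INV` is the table that turns validation into check-digit generation. A minimal, self-contained sketch of that relationship follows. The function names `verhoeff_is_valid` and `verhoeff_check_digit` are illustrative only and are not part of HushLog; the tables are the standard Verhoeff tables, with the same values as in the diff.

```python
# Standard Verhoeff tables (same values as in the diff).
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_is_valid(number: str) -> bool:
    """Return True if the digit string ends in a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(payload: str) -> int:
    """Compute the check digit to append to ``payload`` (digits only)."""
    c = 0
    # Positions start at 1 because position 0 is reserved for the check digit.
    for i, ch in enumerate(reversed(payload), start=1):
        c = D[c][P[i % 8][int(ch)]]
    # INV maps the running state to the digit that brings it back to 0.
    return INV[c]


payload = "236"
digit = verhoeff_check_digit(payload)
print(digit, verhoeff_is_valid(payload + str(digit)))  # prints "3 True"
```

Validation never needs `INV` because it only checks that the running state ends at 0; generation needs it to find the digit that produces that state.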

_VERHOEFF_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def _aadhaar_validate(text: str) -> bool:
    """Validate an Aadhaar number using the Verhoeff checksum."""
    digits = [int(c) for c in text if c in _ASCII_DIGITS]
    if len(digits) != 12:
        return False
    # First digit must be 2-9 (Aadhaar doesn't start with 0 or 1)
    if digits[0] < 2:
        return False
    # Verhoeff checksum
    c = 0
    for i, d in enumerate(reversed(digits)):
        c = _VERHOEFF_D[c][_VERHOEFF_P[i % 8][d]]
    return c == 0


def _iban_validate(text: str) -> bool:
    """Validate an IBAN using the mod-97 algorithm (ISO 7064)."""
    # Remove spaces
@@ -239,6 +283,22 @@ def _partial_mask_eu_vat(m: re.Match[str], mc: str) -> str:
    return f"{prefix}{mc * middle_len}{last3}"


def _partial_mask_aadhaar(m: re.Match[str], mc: str) -> str:
    digits = [c for c in m.group() if c in _ASCII_DIGITS]
    return f"{''.join(digits[:4])} {''.join(digits[4:8])} {mc * 4}"


def _partial_mask_pan(m: re.Match[str], mc: str) -> str:
    text = m.group()
    return f"{text[:3]}{mc * 6}{text[-1]}"


def _partial_mask_in_phone(m: re.Match[str], mc: str) -> str:
    digits = [c for c in m.group() if c in _ASCII_DIGITS]
    last4 = "".join(digits[-4:])
    return f"{mc * 5} {mc * 5}-{last4}"
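
To make the three partial maskers above concrete, here is a hedged sketch that copies their bodies into standalone functions and runs them on the sample values from the README table. `ASCII_DIGITS` is an assumption standing in for the module's `_ASCII_DIGITS`, whose definition is not shown in this diff, and the match objects are built with a catch-all regex rather than HushLog's own patterns.

```python
from __future__ import annotations

import re

# Assumption: stands in for the module-level _ASCII_DIGITS, not shown in the diff.
ASCII_DIGITS = "0123456789"


def partial_mask_aadhaar(m: re.Match[str], mc: str) -> str:
    # Keep the first two 4-digit groups, mask the last group.
    digits = [c for c in m.group() if c in ASCII_DIGITS]
    return f"{''.join(digits[:4])} {''.join(digits[4:8])} {mc * 4}"


def partial_mask_pan(m: re.Match[str], mc: str) -> str:
    # Keep the first three letters and the final letter.
    text = m.group()
    return f"{text[:3]}{mc * 6}{text[-1]}"


def partial_mask_in_phone(m: re.Match[str], mc: str) -> str:
    # Keep only the last four digits; any prefix is dropped from the output.
    digits = [c for c in m.group() if c in ASCII_DIGITS]
    last4 = "".join(digits[-4:])
    return f"{mc * 5} {mc * 5}-{last4}"


def whole(text: str) -> re.Match[str]:
    """Build a match object covering the whole sample string."""
    match = re.fullmatch(r".+", text)
    assert match is not None
    return match


masked = {
    "aadhaar": partial_mask_aadhaar(whole("2345 6789 0124"), "*"),
    "pan": partial_mask_pan(whole("BNZPM2501F"), "*"),
    "in_phone": partial_mask_in_phone(whole("+91 98765 43210"), "*"),
}
for name, value in masked.items():
    print(f"{name}: {value}")
```

With `*` as the mask character this prints `2345 6789 ****`, `BNZ******F`, and `***** *****-3210`.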


def _partial_mask_generic_secret(m: re.Match[str], mc: str) -> str:
    text = m.group()
    for sep in ("=", ":"):
@@ -594,6 +654,55 @@ def _partial_mask_generic_secret(m: re.Match[str], mc: str) -> str:
)


# --- Aadhaar (Indian 12-digit ID) ---
# Matches spaced format only (XXXX XXXX XXXX) to minimize false positives.
# First digit must be 2-9. Validated with Verhoeff checksum.
_AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b")

Copilot AI Mar 20, 2026

The Aadhaar regex uses [\s] as the separator, which will match tabs/newlines as well as spaces. Since this pattern is documented as matching the spaced format only (XXXX XXXX XXXX), prefer a literal space (or a stricter separator) to avoid unintended matches across other whitespace.

Suggested change
_AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b")
_AADHAAR_RE = re.compile(r"\b[2-9][0-9]{3} [0-9]{4} [0-9]{4}\b")

Owner Author

Same pattern as IBAN discussion (PR #24). Aadhaar numbers are formatted with spaces in practice, never tabs. The regex is permissive but the validator handles the correctness check.
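
The difference between the two regexes in this thread can be checked directly. The sketch below compares the pattern from the diff (`[\s]` separator) with the reviewer's literal-space suggestion; the names `PERMISSIVE`, `STRICT`, and `results` are illustrative. It only exercises the regex step and not the Verhoeff validator that runs afterwards.

```python
import re

# Pattern as merged: any whitespace character is accepted as a separator.
PERMISSIVE = re.compile(r"\b[2-9][0-9]{3}[\s][0-9]{4}[\s][0-9]{4}\b")
# Reviewer's suggestion: only a literal space is accepted.
STRICT = re.compile(r"\b[2-9][0-9]{3} [0-9]{4} [0-9]{4}\b")

samples = {
    "spaces": "2345 6789 0124",
    "tabs": "2345\t6789\t0124",
    "newlines": "2345\n6789\n0124",
}

# Map each sample to (matches permissive pattern, matches strict pattern).
results = {
    name: (PERMISSIVE.search(text) is not None, STRICT.search(text) is not None)
    for name, text in samples.items()
}
for name, outcome in results.items():
    print(name, outcome)
```

Both patterns match the space-separated form; only the permissive one also matches digits split by tabs or newlines, for example across adjacent log lines or tab-separated columns.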


_AADHAAR = PatternEntry(
    name="aadhaar",
    regex=_AADHAAR_RE,
    heuristic=None,
    mask="[AADHAAR REDACTED]",
    validator=_aadhaar_validate,
    partial_masker=_partial_mask_aadhaar,
)


# --- PAN (Indian Permanent Account Number) ---
# Format: 5 uppercase letters + 4 digits + 1 uppercase letter.
# The 4th character is restricted to valid entity type codes (P, C, H, A, B, G, J, L, F, T).
_PAN_RE = re.compile(r"\b[A-Z]{3}[ABCFGHLJPT][A-Z][0-9]{4}[A-Z]\b")

_PAN = PatternEntry(
    name="pan",
    regex=_PAN_RE,
    heuristic=None,
    mask="[PAN REDACTED]",
    partial_masker=_partial_mask_pan,
)
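
Since the PAN entry has no validator, the entity-type character class in the regex is the only filter beyond the overall shape. A small sketch of what that restriction accepts and rejects, using the regex from the diff; the sample strings other than the README's `BNZPM2501F` are made up for illustration.

```python
import re

# Regex copied from the diff: the 4th character must be an entity type code.
PAN_RE = re.compile(r"\b[A-Z]{3}[ABCFGHLJPT][A-Z][0-9]{4}[A-Z]\b")

candidates = [
    "BNZPM2501F",   # 4th character P: accepted
    "BNZXM2501F",   # 4th character X is not an entity type code
    "bnzpm2501f",   # lowercase is not matched
    "ABNZPM2501F",  # 11 characters: the word boundaries prevent a partial match
]

matches = {text: PAN_RE.search(text) is not None for text in candidates}
for text, hit in matches.items():
    print(f"{text}: {hit}")
```

Only the first candidate matches. Any other uppercase token of this shape with an allowed 4th character will also be redacted, which is the trade-off of relying on the regex alone.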


# --- Indian Phone ---
# Matches Indian mobile numbers (10 digits starting with 6-9).
# Optional +91 or 0 prefix. Supports spaces, dots, and dashes as separators.
_IN_PHONE_RE = re.compile(
    r"(?<![0-9])"
    r"(?:\+91[\s.-]?)?"  # Optional +91 prefix
    r"(?:0)?"  # Optional 0 prefix for STD
Comment on lines +691 to +692

Copilot AI Mar 20, 2026

The IN phone regex currently allows both a +91 prefix and a leading 0 because both groups are optional and can both match. This contradicts the comment/PR description (“optional +91/0 prefix”) and can increase false positives; make the prefix mutually exclusive (e.g., either +91 or 0 or nothing).

Suggested change
r"(?:\+91[\s.-]?)?" # Optional +91 prefix
r"(?:0)?" # Optional 0 prefix for STD
r"(?:(?:\+91[\s.-]?)|0)?" # Optional, mutually exclusive +91 or 0 prefix

Owner Author

Valid edge case. +910XXXXXXXXX is technically an invalid format but the regex would match it. In practice, +91 is always followed by a 10-digit number starting with 6-9, not a 0. The false positive risk is very low.
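
The edge case discussed in this thread can be reproduced in a few lines. The sketch below compiles the pattern as written in the diff and the reviewer's mutually exclusive variant, then checks both against a handful of inputs; `ORIGINAL`, `EXCLUSIVE`, and `is_full_match` are illustrative names, not HushLog identifiers.

```python
import re

# Pattern as written in the diff: both prefix groups are independently optional.
ORIGINAL = re.compile(
    r"(?<![0-9])"
    r"(?:\+91[\s.-]?)?"
    r"(?:0)?"
    r"[6-9][0-9]{4}[\s.-]?[0-9]{5}"
    r"(?![0-9])"
)

# Reviewer's suggestion: +91 or 0 or nothing, never both prefixes.
EXCLUSIVE = re.compile(
    r"(?<![0-9])"
    r"(?:(?:\+91[\s.-]?)|0)?"
    r"[6-9][0-9]{4}[\s.-]?[0-9]{5}"
    r"(?![0-9])"
)


def is_full_match(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern consumes the entire string."""
    return pattern.fullmatch(text) is not None


samples = ["+91 98765 43210", "09876543210", "9876543210", "+9109876543210"]
table = {s: (is_full_match(ORIGINAL, s), is_full_match(EXCLUSIVE, s)) for s in samples}
for sample, outcome in table.items():
    print(sample, outcome)
```

The two patterns agree on the first three inputs and differ only on `+9109876543210`, which the original accepts and the exclusive variant rejects.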

    r"[6-9][0-9]{4}[\s.-]?[0-9]{5}"
    r"(?![0-9])"
)

_IN_PHONE = PatternEntry(
    name="in_phone",
    regex=_IN_PHONE_RE,
    heuristic=None,
    mask="[IN_PHONE REDACTED]",
    partial_masker=_partial_mask_in_phone,
)


# --- Generic Secret ---
# Context-dependent: matches values after common secret-related labels.
# Replaces the entire match (label + value) to avoid leaking context.
@@ -641,6 +750,7 @@ def get_builtin_patterns() -> tuple[PatternEntry, ...]:
    - IPv6 before IPv4 to catch ::ffff:x.x.x.x mapped addresses first
    - CPF/CNPJ/BR phone after IPv4 — formatted with check digits, specific enough
    - IBAN/EU VAT after BR patterns — validated with mod-97 / prefix matching
    - Aadhaar/PAN/IN phone after EU patterns — Verhoeff-validated / prefix-restricted
    - Context-dependent patterns (generic_secret) before general patterns (email)
      because email redaction inserts spaces that break generic_secret's ``\\S{8,128}``
    - Broadest patterns (email, phone) last to avoid consuming text needed by
@@ -662,6 +772,9 @@ def get_builtin_patterns() -> tuple[PatternEntry, ...]:
        _BR_PHONE,
        _IBAN,
        _EU_VAT,
        _AADHAAR,
        _PAN,
        _IN_PHONE,
        _GENERIC_SECRET,
        _EMAIL,
        _PHONE,
2 changes: 1 addition & 1 deletion tests/integration/test_logging_pipeline.py
@@ -31,7 +31,7 @@ def test_import_hushlog(self) -> None:

    def test_version_matches_pep440(self) -> None:
        """Version string is present and looks like a PEP 440 version."""
        assert hushlog.__version__ == "1.3.0"
        assert hushlog.__version__ == "1.4.0"

    def test_all_exports(self) -> None:
        """__all__ lists exactly the public API surface."""