@JasonPierce
Contributor

Fixes #154:

  • Add unicodeWordBoundaries option for Unicode-aware word matching
  • Enhance benchmarks to include unicodeWordBoundaries

- Introduced `unicodeWordBoundaries` in `ProfanityOptions` to control word boundary behavior.
- Updated `Profanity` class to handle both ASCII and Unicode-aware boundaries based on the new option.
- Enhanced README with usage examples for the new feature.
- Added tests to verify default behavior and custom settings for `unicodeWordBoundaries`.
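
As a rough illustration of how such an option can stay backward compatible, the sketch below resolves caller overrides against defaults and picks a boundary mode. It is not the library's code: `BoundaryOptions`, `resolveOptions`, and `boundaryMode` are hypothetical names, the `wholeWord: true` default is an assumption, and so is the idea that the new option only matters when `wholeWord` is enabled.

```typescript
// Hypothetical, simplified stand-in for the options described above;
// this is not the library's ProfanityOptions class.
interface BoundaryOptions {
  wholeWord: boolean;
  unicodeWordBoundaries: boolean;
}

// Assumed defaults: whole-word matching on, Unicode-aware boundaries off,
// so callers who never set the new option keep the ASCII behavior.
const DEFAULT_OPTIONS: BoundaryOptions = {
  wholeWord: true,
  unicodeWordBoundaries: false,
};

type BoundaryMode = "none" | "ascii" | "unicode";

// Merge caller overrides over the defaults.
function resolveOptions(overrides: Partial<BoundaryOptions> = {}): BoundaryOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

// Decide which kind of word boundary a matcher would use.
function boundaryMode(options: BoundaryOptions): BoundaryMode {
  if (!options.wholeWord) return "none";
  return options.unicodeWordBoundaries ? "unicode" : "ascii";
}
```

With no overrides the mode stays `"ascii"`; passing `{ unicodeWordBoundaries: true }` switches it to `"unicode"`.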
@JasonPierce requested a review from Copilot on September 13, 2025 at 03:16
Copilot AI left a comment

Pull Request Overview

This PR adds Unicode-aware word boundary support to fix issues with whole-word matching when Unicode characters are present. The enhancement allows proper handling of diacritics and Unicode punctuation while maintaining backward compatibility.

Key changes:

  • Added unicodeWordBoundaries option that defaults to false for performance
  • Enhanced regex building to support Unicode-aware word boundaries using \p{L}, \p{N}, and \p{M} character classes
  • Updated benchmarks to compare Unicode-aware vs ASCII performance
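
To make the boundary change concrete, here is a minimal sketch of the two matching styles. It assumes the Unicode-aware variant is built from lookarounds over `\p{L}`, `\p{N}`, and `\p{M}`; the helper names (`escapeRegExp`, `asciiWholeWord`, `unicodeWholeWord`) are hypothetical and the PR's actual regex may differ.

```typescript
// Hypothetical helpers for illustration; not the library's implementation.

// Characters treated as "part of a word": letters, numbers, combining marks.
const WORD_CHARS = "[\\p{L}\\p{N}\\p{M}]";

// Escape regex metacharacters so the word is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// ASCII boundaries: \b only treats [A-Za-z0-9_] as word characters.
function asciiWholeWord(word: string): RegExp {
  return new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
}

// Unicode-aware boundaries: the word must not be preceded or followed
// by a letter, number, or combining mark (requires the "u" flag).
function unicodeWholeWord(word: string): RegExp {
  return new RegExp(
    "(?<!" + WORD_CHARS + ")" + escapeRegExp(word) + "(?!" + WORD_CHARS + ")",
    "iu"
  );
}

// "cafe" inside "café": \b sees a boundary between "e" and "é".
const asciiFalsePositive = asciiWholeWord("cafe").test("caf\u00e9");     // true
const unicodePrecomposed = unicodeWholeWord("cafe").test("caf\u00e9");   // false
// Decomposed form: "e" followed by U+0301, a combining mark (\p{M}).
const unicodeDecomposed = unicodeWholeWord("cafe").test("cafe\u0301");   // false

// A word that starts with a non-ASCII letter: \b finds no boundary before "é".
const asciiMiss = asciiWholeWord("\u00e9clair").test("an \u00e9clair");    // false
const unicodeHit = unicodeWholeWord("\u00e9clair").test("an \u00e9clair"); // true
```

Including `\p{M}` is what keeps a decomposed accent (base letter plus combining mark) from being read as a word end.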

Reviewed Changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/profanity-options.ts | Adds unicodeWordBoundaries option with false default |
| src/profanity.ts | Implements Unicode-aware word boundary logic in regex building and whitelist checking |
| tests/profanity-unicode-boundaries.spec.ts | Comprehensive test coverage for Unicode boundary scenarios |
| tests/profanity-options.spec.ts | Tests for the new unicodeWordBoundaries option |
| src/tools/benchmark/benchmark.ts | Enhanced benchmarks to compare Unicode vs ASCII performance with paired execution |
| README.md | Documentation for the new unicodeWordBoundaries option |
| package.json | Updated dependencies and Docker Compose syntax |
| .nvmrc | Updated Node.js version to 22.19.0 |

@JasonPierce requested a review from Copilot on September 13, 2025 at 03:21
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated no new comments.


@JasonPierce merged commit f304e18 into main on Sep 13, 2025 (5 checks passed)
@JasonPierce deleted the jp-issue-154 branch on September 13, 2025 at 03:30

Development

Successfully merging this pull request may close these issues.

Unicode characters cause wholeWord to break

2 participants