Enhancement: Internationalized word boundary checking

## Source
- [flashtext PR#49](https://github.com/vi3k6i5/flashtext/pull/49)

## Problem description
Currently, `non_word_boundaries` is a hard-coded ASCII character set:
```python
self.non_word_boundaries = set(string.digits + string.ascii_letters + '_')
```

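This does not work well for non-ASCII languages: any character outside this set, such as `é` or `國`, is treated as a word boundary.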
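
A minimal sketch of the effect, using only the standard library. It rebuilds the same set outside the class; the helper `chars_outside_set` and the sample strings are illustrative assumptions, not flashtext code. It lists the characters that fall outside the set and would therefore count as boundaries:

```python
import string

# Same set as the hard-coded default quoted above.
non_word_boundaries = set(string.digits + string.ascii_letters + '_')

def chars_outside_set(text):
    """Return the characters of `text` that are not in the ASCII set."""
    return [ch for ch in text if ch not in non_word_boundaries]

print(chars_outside_set("cafe"))        # []
print(chars_outside_set("caf\u00e9"))   # ['é']
print(chars_outside_set("國際化"))       # ['國', '際', '化']
```

Since `é` is outside the set, a keyword that ends right before it would appear to end at a word boundary, even though it sits in the middle of a word.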

## Proposed solution
Use Python's `\W` regex class or Unicode categories to determine word boundaries:

```python
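import unicodedata

def is_word_char(char):
    """Return True if `char` should count as part of a word."""
    if char == '_':  # keep '_' as a word character, as in the current ASCII set
        return True
    category = unicodedata.category(char)
    return category.startswith(('L', 'N'))  # Letter or Number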
```
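
The regex option mentioned above could be sketched as follows. This is an assumption about how it might look, not code taken from the PR, and `is_word_char_re` is a hypothetical name. It relies on the fact that in Python 3, `\w` on `str` patterns matches Unicode word characters by default, and also matches `_`.

```python
import re

# \w on str patterns is Unicode-aware in Python 3; it also matches '_'.
_WORD_CHAR = re.compile(r'\w')

def is_word_char_re(char):
    """Hypothetical helper: True if `char` is one character in the regex word class."""
    return len(char) == 1 and _WORD_CHAR.fullmatch(char) is not None

print(is_word_char_re('a'))       # True
print(is_word_char_re('\u00e9'))  # True  (é)
print(is_word_char_re('國'))      # True
print(is_word_char_re('_'))       # True
print(is_word_char_re(' '))       # False
print(is_word_char_re('-'))       # False
```

A character is then a boundary exactly when this check returns `False`, which corresponds to matching `\W`.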

## Considerations
- Performance impact: needs benchmark testing
- Backward compatibility: may need a new parameter `unicode_boundaries=True`
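
One way these two points might be approached is sketched below. `build_word_char_check` and `benchmark` are hypothetical names, not part of flashtext. The `unicode_boundaries` flag follows the parameter name suggested above, but it defaults to `False` here, as an assumption, so that existing behaviour stays unchanged unless callers opt in. Timings depend on the machine and the input, so no figures are given.

```python
import string
import timeit
import unicodedata

ASCII_WORD_CHARS = frozenset(string.digits + string.ascii_letters + '_')

def build_word_char_check(unicode_boundaries=False):
    """Hypothetical factory returning a function that maps a character to a bool."""
    if not unicode_boundaries:
        # Current behaviour: membership test against the ASCII set.
        return ASCII_WORD_CHARS.__contains__

    def is_word_char(char):
        # Assumption: keep '_' as a word character for parity with the ASCII set.
        return char == '_' or unicodedata.category(char).startswith(('L', 'N'))
    return is_word_char

def benchmark(check, text, repeat=3, number=20):
    """Return the best time in seconds for `number` passes of `check` over `text`."""
    timer = timeit.Timer(lambda: [check(ch) for ch in text])
    return min(timer.repeat(repeat=repeat, number=number))

if __name__ == "__main__":
    sample = "flashtext keyword extraction, 國際化 caf\u00e9 " * 100
    for flag in (False, True):
        seconds = benchmark(build_word_char_check(flag), sample)
        print(f"unicode_boundaries={flag}: {seconds:.4f}s")
```

Comparing the two printed timings on representative input gives a first estimate of the per-character overhead of the Unicode check.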
