Commit 2bd88b9
Introducing SentencePiece Unigram Tokenizer Model (#7390)
* Introducing SentencePiece Unigram Tokenizer Model
* Copilot feedback addressing :-)
* addressing Copilot feedback
* Feedback Addressing 1
* Clean up
* More clean up
* Addressing the feedback
* More feedback addressing
* add space
1 parent fd62e6c, commit 2bd88b9
File tree
17 files changed, +5872 -1955 lines changed
- eng
- src/Microsoft.ML.Tokenizers
- Model
- Normalizer
- Utils
- test/Microsoft.ML.Tokenizers.Tests