Commit f6751d2
authored
Add an abstract class for Tokenizer (#53)
* Add an abstract class for tokenizer
* Add sentence piece tokenizer as a subclass of Tokenizer
* Fix decode method for SentencePieceTokenizer
* Fix circular import issue
* fix type annotations
* fix linting issues
* Format files using pyink
* Update the tokenizer decode interface to return ids instead of str
* format using pyink
* Move Tokenizer class to a tokenizer_api.py file
* Update engine.build_tokenizer method to return SentencePieceTokenizer by default1 parent 3fdacc8 commit f6751d2
4 files changed
Lines changed: 180 additions & 12 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
93 | 93 | | |
94 | 94 | | |
95 | 95 | | |
96 | | - | |
97 | 96 | | |
98 | 97 | | |
99 | 98 | | |
| |||
397 | 396 | | |
398 | 397 | | |
399 | 398 | | |
400 | | - | |
| 399 | + | |
401 | 400 | | |
402 | 401 | | |
403 | 402 | | |
| |||
429 | 428 | | |
430 | 429 | | |
431 | 430 | | |
432 | | - | |
| 431 | + | |
433 | 432 | | |
434 | | - | |
435 | 433 | | |
436 | 434 | | |
437 | 435 | | |
| |||
568 | 566 | | |
569 | 567 | | |
570 | 568 | | |
571 | | - | |
572 | | - | |
| 569 | + | |
573 | 570 | | |
574 | 571 | | |
575 | 572 | | |
| |||
587 | 584 | | |
588 | 585 | | |
589 | 586 | | |
590 | | - | |
| 587 | + | |
591 | 588 | | |
592 | 589 | | |
593 | 590 | | |
594 | | - | |
595 | 591 | | |
596 | 592 | | |
597 | 593 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| 29 | + | |
29 | 30 | | |
30 | 31 | | |
31 | 32 | | |
| |||
39 | 40 | | |
40 | 41 | | |
41 | 42 | | |
| 43 | + | |
| 44 | + | |
42 | 45 | | |
43 | 46 | | |
44 | 47 | | |
| |||
200 | 203 | | |
201 | 204 | | |
202 | 205 | | |
203 | | - | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
204 | 214 | | |
205 | 215 | | |
206 | 216 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
| 19 | + | |
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
27 | | - | |
28 | 27 | | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
29 | 33 | | |
30 | 34 | | |
31 | 35 | | |
| |||
112 | 116 | | |
113 | 117 | | |
114 | 118 | | |
115 | | - | |
| 119 | + | |
116 | 120 | | |
117 | 121 | | |
118 | 122 | | |
| |||
196 | 200 | | |
197 | 201 | | |
198 | 202 | | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
0 commit comments