Rob Speer | d6cdef6039 | Use langcodes when tokenizing again (it no longer connects to a DB) | 2017-04-27 15:09:59 -04:00
Rob Speer | f671a1db7f | import new wordlists from Exquisite Corpus | 2017-01-05 17:59:26 -05:00
Rob Speer | f89ac5e400 | test_chinese: fix typo in comment (Former-commit-id: 2a84a926f5) | 2015-09-24 13:41:11 -04:00
Rob Speer | e3a79ab8c9 | add external_wordlist option to tokenize (Former-commit-id: 669bd16c13) | 2015-09-10 18:09:41 -04:00
Rob Speer | a13f459f88 | Lower the frequency of phrases with inferred token boundaries (Former-commit-id: 5c8c36f4e3) | 2015-09-10 14:16:22 -04:00
Rob Speer | 91cc82f76d | tokenize Chinese using jieba and our own frequencies (Former-commit-id: 2327f2e4d6) | 2015-09-05 03:16:56 -04:00