Commit Graph

4 Commits

Author     SHA1        Message                                                        Date
Rob Speer  2a84a926f5  test_chinese: fix typo in comment                              2015-09-24 13:41:11 -04:00
Rob Speer  669bd16c13  add external_wordlist option to tokenize                       2015-09-10 18:09:41 -04:00
Rob Speer  5c8c36f4e3  Lower the frequency of phrases with inferred token boundaries  2015-09-10 14:16:22 -04:00
Rob Speer  2327f2e4d6  tokenize Chinese using jieba and our own frequencies           2015-09-05 03:16:56 -04:00