wordfreq/tests
Robyn Speer 51e260b713 Leave Thai segments alone in the default regex
Our regex already has a special case to leave Chinese and Japanese alone
when an appropriate tokenizer for the language isn't being used, as
Unicode's default segmentation would make every character into its own
token.

The same thing happens in Thai, and we don't even *have* an appropriate
tokenizer for Thai, so I've added a similar fallback.

Former-commit-id: 07f16e6f03
2016-02-22 14:32:59 -05:00
test_chinese.py   test_chinese: fix typo in comment                                          2015-09-24 13:41:11 -04:00
test_japanese.py  Revert a small syntax change introduced by a circular series of changes.  2015-09-24 13:24:11 -04:00
test.py           Leave Thai segments alone in the default regex                            2016-02-22 14:32:59 -05:00