Mirror of https://github.com/rspeer/wordfreq.git (synced 2024-12-23 09:21:37 +00:00)

Commit 51e260b713
Our regex already has a special case to leave Chinese and Japanese alone
when an appropriate tokenizer for the language isn't being used, as
Unicode's default segmentation would make every character into its own
token.
The same thing happens in Thai, and we don't even *have* an appropriate
tokenizer for Thai, so I've added a similar fallback.
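As an illustration of this kind of fallback (a minimal sketch, not wordfreq's actual pattern), the idea can be expressed with the third-party `regex` module: runs of scripts that are written without spaces between words (Han, Hiragana, Katakana, Thai) are kept together as single tokens, while other text tokenizes into ordinary word-like runs. The names `UNSEGMENTED`, `TOKEN_RE`, and `simple_tokenize` here are illustrative, not identifiers from the repo.

```python
# Sketch of the fallback (illustrative; not wordfreq's actual regex).
# Requires the third-party "regex" module, which supports \p{Script} classes.
import regex

# Scripts written without spaces between words: without a real word
# segmenter, Unicode's default segmentation would split every character
# into its own token, so we keep whole runs of these scripts together.
UNSEGMENTED = r"\p{Han}\p{Hiragana}\p{Katakana}\p{Thai}"

TOKEN_RE = regex.compile(
    rf"[{UNSEGMENTED}]+"      # one token per run of an unsegmented script
    r"|[\p{L}\p{M}\p{N}']+"   # otherwise, ordinary letter/number runs
)

def simple_tokenize(text):
    """Lowercase and tokenize, keeping unsegmented scripts as single spans."""
    return TOKEN_RE.findall(text.lower())

print(simple_tokenize("Hello, สวัสดีครับ"))
# ['hello', 'สวัสดีครับ'] -- the Thai span survives as one token
```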
Former-commit-id:
Files changed:

- test_chinese.py
- test_japanese.py
- test.py