wordfreq/wordfreq_builder/tests
Rob Speer a3b37f6619 Strip apostrophes from edges of tokens
The issue here is that French text with an apostrophe, such as "d'un",
would be split into "d'" and "un", but if "d'" were re-tokenized it
would come out as "d". Stripping apostrophes from the edges of tokens
makes the process more idempotent.
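A minimal sketch of the idempotence problem and the fix, assuming a toy regex tokenizer. `TOKEN_RE`, `tokenize_unstripped`, `tokenize`, and `is_idempotent` are hypothetical names for illustration; this is not wordfreq's actual tokenizer. The pattern keeps "d'" as a token only when a word character follows it, which reproduces the "d'un" to "d'" + "un" split described above.

```python
import re

# Hypothetical toy pattern, NOT wordfreq's real tokenizer regex: a word
# followed by an apostrophe is kept as one token ("d'") only when another
# word character comes right after it; otherwise the bare word is matched.
TOKEN_RE = re.compile(r"\w+'(?=\w)|\w+")


def tokenize_unstripped(text):
    """Old behavior: the elided token keeps its trailing apostrophe."""
    return TOKEN_RE.findall(text.lower())


def tokenize(text):
    """New behavior: strip apostrophes from both edges of every token."""
    return [tok.strip("'") for tok in TOKEN_RE.findall(text.lower())]


def is_idempotent(tokenizer, text):
    """True if re-tokenizing each output token returns that token unchanged."""
    return all(tokenizer(tok) == [tok] for tok in tokenizer(text))


if __name__ == "__main__":
    print(tokenize_unstripped("d'un"))  # ["d'", 'un']
    print(tokenize_unstripped("d'"))    # ['d'], so "d'" does not survive re-tokenizing
    print(tokenize("d'un"))             # ['d', 'un']
    print(is_idempotent(tokenize_unstripped, "l'homme d'un"))  # False
    print(is_idempotent(tokenize, "l'homme d'un"))             # True
```

Stripping after matching, rather than changing the pattern, leaves the split points where they were and only normalizes the token text, so every emitted token is already in the form the tokenizer would produce for it on its own.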


Former-commit-id: 5a1fc00aaa
2015-08-25 12:41:48 -04:00
test_tokenizer.py Strip apostrophes from edges of tokens 2015-08-25 12:41:48 -04:00