Commit Graph

2 Commits

Author SHA1 Message Date
Robyn Speer
b22a4b0f02 Strip apostrophes from edges of tokens
The issue here is that French text with an apostrophe, such as "d'un",
would be split into "d'" and "un", but if "d'" were re-tokenized it
would come out as "d". Stripping apostrophes from the edges of tokens
makes the process more idempotent.


Former-commit-id: 5a1fc00aaa
2015-08-25 12:41:48 -04:00
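
The behaviour described in commit b22a4b0f02 can be illustrated with a minimal sketch. This is not the repository's actual tokenizer: the pattern `TOKEN_RE` and the functions `tokenize_raw` and `tokenize` are hypothetical names, and the regular expression is only an assumption chosen to reproduce the "d'un" example from the commit message.

```python
import re

# Hypothetical pattern (an assumption, not the project's real one):
# a run of word characters, plus a trailing apostrophe only when
# another word character follows it.
TOKEN_RE = re.compile(r"\w+(?:'(?=\w))?")


def tokenize_raw(text):
    """Tokenize without stripping: apostrophes may stay on token edges."""
    return TOKEN_RE.findall(text.lower())


def tokenize(text):
    """Tokenize, then strip apostrophes from the edges of each token."""
    stripped = (token.strip("'") for token in TOKEN_RE.findall(text.lower()))
    return [token for token in stripped if token]


def retokenize(tokens, tokenizer):
    """Run the tokenizer again over each token and flatten the result."""
    return [piece for token in tokens for piece in tokenizer(token)]


raw = tokenize_raw("d'un")
print(raw)                            # ["d'", 'un']
print(retokenize(raw, tokenize_raw))  # ['d', 'un'], so a second pass changes the output

clean = tokenize("d'un")
print(clean)                          # ['d', 'un']
print(retokenize(clean, tokenize))    # ['d', 'un'], so a second pass is a no-op
```

In this sketch the unstripped tokenizer gives a different result when its own output is fed back in, while the stripped version returns the same tokens on a second pass, which is the property the commit message is after.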
Joshua Chin
78e9cf5d8f moved test tokenizers
Former-commit-id: c2d1cdcb31
2015-07-17 14:58:58 -04:00