3 Commits

Author SHA1 Message Date
Rob Speer
c4a2594217 fix URL expression 2015-08-26 15:00:46 -04:00
Rob Speer
5a1fc00aaa Strip apostrophes from edges of tokens
The issue here is that if you had French text with an apostrophe,
such as "d'un", it would split it into "d'" and "un", but if "d'"
were re-tokenized it would come out as "d". Stripping apostrophes
makes the process more idempotent.
2015-08-25 12:41:48 -04:00
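
The idempotence problem described in this commit message can be illustrated with a minimal sketch. It is not the repository's actual tokenizer: the regular expression and the names `tokenize_naive` and `tokenize_stripped` are hypothetical, chosen only to reproduce the behavior the message describes, where an apostrophe is kept when a letter follows it and dropped otherwise.

```python
import re

# Hypothetical pattern (an assumption, not the project's real rule):
# a run of word characters, plus a trailing apostrophe only when
# another word character follows it.
TOKEN_RE = re.compile(r"\w+(?:'(?=\w))?")


def tokenize_naive(text):
    """Split text into tokens, leaving edge apostrophes in place."""
    return TOKEN_RE.findall(text.lower())


def tokenize_stripped(text):
    """Same split, then strip apostrophes from the edges of each token."""
    stripped = (token.strip("'") for token in tokenize_naive(text))
    return [token for token in stripped if token]


# Naive version: "d'un" yields "d'", but re-tokenizing "d'" yields "d".
naive_first = tokenize_naive("d'un")
naive_again = [t for token in naive_first for t in tokenize_naive(token)]

# Stripped version: re-tokenizing the output leaves it unchanged.
fixed_first = tokenize_stripped("d'un")
fixed_again = [t for token in fixed_first for t in tokenize_stripped(token)]
```

In this sketch `naive_first` and `naive_again` differ, while `fixed_first` and `fixed_again` are the same list, which is the property the commit is after.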
Joshua Chin
c2d1cdcb31 moved test tokenizers 2015-07-17 14:58:58 -04:00