Clarify the changelog.

Andrew Lin 2017-02-14 13:09:12 -05:00
parent 1363f9d2e0
commit c2e1504643

@@ -14,7 +14,7 @@
- Add automatic transliteration of Serbian text
- Adjust tokenization of apostrophes next to vowel sounds: the French word
"l'heure" is now tokenized similarly to "l'arc"
-- Numbers longer than a single digit are smashed into the same word frequency,
+- Multi-digit numbers of each length are smashed into the same word frequency,
to remove meaningless differences and increase compatibility with word2vec.
(Internally, their digits are replaced by zeroes.)
- Another new frequency-merging strategy (drop the highest and lowest,