Clarify the changelog.

Andrew Lin 2017-02-14 13:09:12 -05:00
parent 1363f9d2e0
commit c2e1504643

@@ -14,7 +14,7 @@
 - Add automatic transliteration of Serbian text
 - Adjust tokenization of apostrophes next to vowel sounds: the French word
   "l'heure" is now tokenized similarly to "l'arc"
-- Numbers longer than a single digit are smashed into the same word frequency,
+- Multi-digit numbers of each length are smashed into the same word frequency,
   to remove meaningless differences and increase compatibility with word2vec.
   (Internally, their digits are replaced by zeroes.)
 - Another new frequency-merging strategy (drop the highest and lowest,
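
Note on the reworded entry: "of each length" means that numbers are grouped by digit count, so all four-digit numbers share one frequency entry, all two-digit numbers share another, and single digits are left alone. A minimal sketch of that idea, assuming a plain regex substitution; `smash_digits` and `MULTI_DIGIT_RUN` are hypothetical names for illustration, not the project's actual code:

```python
import re

# Hypothetical illustration, not the library's implementation.
# Matches any run of two or more ASCII digits.
MULTI_DIGIT_RUN = re.compile(r"[0-9]{2,}")


def smash_digits(token: str) -> str:
    """Replace every digit in a multi-digit run with '0', keeping the length."""
    return MULTI_DIGIT_RUN.sub(lambda m: "0" * len(m.group()), token)


print(smash_digits("2017"))  # same key as any other four-digit number
print(smash_digits("42"))    # same key as any other two-digit number
print(smash_digits("7"))     # single digits are not changed
```

Under this assumption "2017" and "1999" map to the same key while "42" maps to a different one, which is the distinction the new wording makes explicit.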