From c2e1504643880dd416c5d4ec774535d74f4094fe Mon Sep 17 00:00:00 2001
From: Andrew Lin
Date: Tue, 14 Feb 2017 13:09:12 -0500
Subject: [PATCH] Clarify the changelog.

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 619e4fc..2add68f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,7 +14,7 @@
 - Add automatic transliteration of Serbian text
 - Adjust tokenization of apostrophes next to vowel sounds: the French word
   "l'heure" is now tokenized similarly to "l'arc"
-- Numbers longer than a single digit are smashed into the same word frequency,
+- Multi-digit numbers of each length are smashed into the same word frequency,
   to remove meaningless differences and increase compatibility with word2vec.
   (Internally, their digits are replaced by zeroes.)
 - Another new frequency-merging strategy (drop the highest and lowest,