From 86f22e852334c6bae67412af43d99967d516be0c Mon Sep 17 00:00:00 2001
From: Robyn Speer <rspeer@luminoso.com>
Date: Thu, 5 Jan 2017 19:24:28 -0500
Subject: [PATCH] Mention that multi-digit numbers are combined together

---
 CHANGELOG.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9cf98f4..0fefac7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,9 @@
 - Add automatic transliteration of Serbian text
 - Adjust tokenization of apostrophes next to vowel sounds: the French word
   "l'heure" is now tokenized similarly to "l'arc"
+- Multi-digit numbers that differ only in their digits now share a single word
+  frequency, removing meaningless differences and improving compatibility with
+  word2vec. (Internally, their digits are replaced by zeroes.)
 - Another new frequency-merging strategy (drop the highest and lowest,
   average the rest)