mirror of https://github.com/rspeer/wordfreq.git
synced 2024-12-23 09:21:37 +00:00
Clarify the changelog.
parent 1363f9d2e0
commit c2e1504643
@@ -14,7 +14,7 @@
 - Add automatic transliteration of Serbian text
 - Adjust tokenization of apostrophes next to vowel sounds: the French word
 "l'heure" is now tokenized similarly to "l'arc"
-- Numbers longer than a single digit are smashed into the same word frequency,
+- Multi-digit numbers of each length are smashed into the same word frequency,
 to remove meaningless differences and increase compatibility with word2vec.
 (Internally, their digits are replaced by zeroes.)
 - Another new frequency-merging strategy (drop the highest and lowest,
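
The reworded entry says that multi-digit numbers are grouped by length, with their digits replaced by zeroes. A minimal sketch of that idea follows. It assumes that "multi-digit" means a run of two or more consecutive digits; the name `smash_digits` is hypothetical and is not taken from wordfreq's code.

```python
import re

# Assumption: a "multi-digit number" is a run of two or more digits.
MULTI_DIGIT = re.compile(r"\d{2,}")


def smash_digits(token: str) -> str:
    """Replace each run of 2+ digits with the same number of zeroes.

    Hypothetical helper for illustration only; not wordfreq's API.
    """
    return MULTI_DIGIT.sub(lambda m: "0" * len(m.group()), token)


if __name__ == "__main__":
    for example in ["7", "24", "1999", "2024", "l'heure"]:
        print(example, "->", smash_digits(example))
```

Under this sketch, "1999" and "2024" map to the same four-zero token and would share one frequency, while a two-digit number maps to a different token and a single digit is left unchanged.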