From 287df17a71ed30e81f1e4c1fef5625f1cef317ac Mon Sep 17 00:00:00 2001
From: Robyn Speer
Date: Mon, 23 Jul 2018 16:38:39 -0400
Subject: [PATCH] update the changelog for version 2.2

---
 CHANGELOG.md | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3bb5c32..6596a9b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,23 @@
-## Version 2.2
+## Version 2.2 (2018-07-24)
+
+Library change:
+
+- While the @ sign is usually considered a symbol and not part of a word, there
+  is a case where it acts like a letter. It's used in one way of writing
+  gender-neutral words in Spanish and Portuguese, such as "l@s niñ@s". The
+  tokenizer in wordfreq will now allow words to end with "@" or "@s", so it
+  can recognize these words.
+
+Data changes:
+
+- Updated the data from Exquisite Corpus to filter the ParaCrawl web crawl
+  better. ParaCrawl provides two metrics (Zipporah and Bicleaner) for the
+  goodness of its data, and we now filter it to only use texts that get
+  positive scores on both metrics.
+
+- The input data includes the change to tokenization described above, giving
+  us word frequencies for words such as "l@s".
+
 ## Version 2.1 (2018-06-18)
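The tokenizer change in this release lets a word end in "@" or "@s" while still treating "@" as a separator everywhere else. The sketch below shows one way such a rule can be written with Python's standard `re` module. It is a simplified, hypothetical pattern for illustration only, not wordfreq's actual regular expression, which handles many more Unicode cases. The names `TOKEN_RE` and `tokenize` are made up for this example.

```python
import re
from typing import List

# Hypothetical, simplified token pattern (NOT wordfreq's real regex):
# a run of word characters, optionally followed by "@" or "@s".
# The (?!\w) lookahead only accepts the "@" suffix when it ends the word,
# so an "@" inside something like an email address still splits tokens.
TOKEN_RE = re.compile(r"\w+(?:@s?(?!\w))?")


def tokenize(text: str) -> List[str]:
    """Casefold the text and return its word tokens, keeping a final '@' or '@s'."""
    # The group is non-capturing, so findall returns the full matches.
    return TOKEN_RE.findall(text.casefold())


if __name__ == "__main__":
    print(tokenize("l@s niñ@s"))                # ['l@s', 'niñ@s']
    print(tokenize("write to me@example.com"))  # ['write', 'to', 'me', 'example', 'com']
```

The lookahead is the design choice that matters here: without it, "me@example.com" would produce a spurious token "me@", because the optional suffix would match the "@" even though more letters follow it.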
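The ParaCrawl data change keeps a text only when it scores positively on both Zipporah and Bicleaner, so the two conditions are combined with AND rather than OR. A minimal sketch of that rule follows, under loud assumptions: the `ParallelSegment` record and its field names are hypothetical and do not reflect the ParaCrawl file format or the Exquisite Corpus code. It reads "positive" literally as "greater than zero". Because the changelog does not say how each metric's scale was interpreted, the cut-off is exposed as a `threshold` parameter.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class ParallelSegment:
    # Hypothetical record: one ParaCrawl text plus its two quality scores.
    text: str
    zipporah: float
    bicleaner: float


def keep_good_segments(
    segments: Iterable[ParallelSegment], threshold: float = 0.0
) -> Iterator[ParallelSegment]:
    """Yield only segments whose Zipporah AND Bicleaner scores exceed threshold."""
    for seg in segments:
        # Both metrics must pass; failing either one drops the text.
        if seg.zipporah > threshold and seg.bicleaner > threshold:
            yield seg


if __name__ == "__main__":
    segs = [
        ParallelSegment("a", 1.2, 0.7),
        ParallelSegment("b", -0.3, 0.9),
        ParallelSegment("c", 0.4, 0.0),
    ]
    print([s.text for s in keep_good_segments(segs)])  # ['a']
```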