mirror of https://github.com/rspeer/wordfreq.git
synced 2024-12-23 17:31:41 +00:00
update the changelog for version 2.2
parent f73406c69a
commit 287df17a71
21
CHANGELOG.md
@@ -1,4 +1,23 @@
-## Version 2.2
+## Version 2.2 (2018-07-24)
+
+Library change:
+
+- While the @ sign is usually considered a symbol and not part of a word, there
+  is a case where it acts like a letter. It's used in one way of writing
+  gender-neutral words in Spanish and Portuguese, such as "l@s niñ@s". The
+  tokenizer in wordfreq will now allow words to end with "@" or "@s", so it
+  can recognize these words.
+
+Data changes:
+
+- Updated the data from Exquisite Corpus to filter the ParaCrawl web crawl
+  better. ParaCrawl provides two metrics (Zipporah and Bicleaner) for the
+  goodness of its data, and we now filter it to only use texts that get
+  positive scores on both metrics.
+
+- The input data includes the change to tokenization described above, giving
+  us word frequencies for words such as "l@s".
+
 
 ## Version 2.1 (2018-06-18)
 
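The tokenizer change in the "Library change" entry can be illustrated with a small sketch. This is not wordfreq's actual tokenizer: `TOKEN_RE` and `simple_tokenize` are hypothetical names, and the pattern is a simplified assumption that treats a token as a run of letters optionally followed by "@" or "@s", provided no further word character comes right after.

```python
import re
import unicodedata
from typing import List

# Hypothetical, simplified pattern (an assumption, not wordfreq's real regex):
# one or more letters, optionally followed by "@" or "@s", as long as the
# next character is not a word character. That last condition keeps the "@"
# in an e-mail address such as "user@example.com" from joining a token.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:@s?(?!\w))?")


def simple_tokenize(text: str) -> List[str]:
    """Split text into lowercase tokens, allowing a trailing "@" or "@s"."""
    # NFC normalization keeps letters such as "ñ" as single code points,
    # so the letter class in the pattern can match them.
    normalized = unicodedata.normalize("NFC", text).casefold()
    return TOKEN_RE.findall(normalized)


print(simple_tokenize("l@s niñ@s"))
print(simple_tokenize("user@example.com"))
```

In wordfreq itself the comparable entry point is `wordfreq.tokenize(text, lang)`, for example with `lang="es"`; the sketch above only mimics the rule the changelog describes.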
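The ParaCrawl filtering rule in the "Data changes" entry (keep only texts with positive scores on both metrics) could look roughly like the sketch below. The record layout is an assumption: `ScoredText`, its `zipporah` and `bicleaner` fields, and `keep_positive` are hypothetical names for illustration, not the format ParaCrawl or Exquisite Corpus actually uses.

```python
from typing import Iterable, Iterator, NamedTuple


class ScoredText(NamedTuple):
    """Hypothetical record: a text plus its two quality scores."""
    text: str
    zipporah: float
    bicleaner: float


def keep_positive(records: Iterable[ScoredText]) -> Iterator[ScoredText]:
    """Yield only the records whose scores are positive on both metrics."""
    for record in records:
        if record.zipporah > 0 and record.bicleaner > 0:
            yield record


# Made-up sample values, used only to exercise the filter.
sample = [
    ScoredText("good on both metrics", 0.7, 0.4),
    ScoredText("bad Zipporah score", -0.2, 0.9),
    ScoredText("bad Bicleaner score", 0.5, -0.1),
    ScoredText("zero is not positive", 0.0, 0.3),
]
kept = [record.text for record in keep_positive(sample)]
print(kept)
```

Requiring both scores to be positive is stricter than accepting either one, so a text that only one metric rates well is dropped.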