mirror of https://github.com/rspeer/wordfreq.git, synced 2024-12-23 09:21:37 +00:00
readme update: web text comes from OSCAR
parent b13d35e503
commit c244ff0d10
@@ -153,8 +153,7 @@ come from multiple sources:
 - **Subtitles**, from OPUS OpenSubtitles 2018 and SUBTLEX
 - **News**, from NewsCrawl 2014 and GlobalVoices
 - **Books**, from Google Books Ngrams 2012
-- **Web** text, from ParaCrawl, the Leeds Internet Corpus, and the MOKK
-  Hungarian Webcorpus
+- **Web** text, from OSCAR
 - **Twitter**, representing short-form social media
 - **Reddit**, representing potentially longer Internet comments
 - **Miscellaneous** word frequencies: in Chinese, we import a free wordlist