mirror of https://github.com/rspeer/wordfreq.git, synced 2024-12-23 09:21:37 +00:00
readme update: web text comes from OSCAR
parent b13d35e503
commit c244ff0d10
@@ -153,8 +153,7 @@ come from multiple sources:
 - **Subtitles**, from OPUS OpenSubtitles 2018 and SUBTLEX
 - **News**, from NewsCrawl 2014 and GlobalVoices
 - **Books**, from Google Books Ngrams 2012
-- **Web** text, from ParaCrawl, the Leeds Internet Corpus, and the MOKK
-  Hungarian Webcorpus
+- **Web** text, from OSCAR
 - **Twitter**, representing short-form social media
 - **Reddit**, representing potentially longer Internet comments
 - **Miscellaneous** word frequencies: in Chinese, we import a free wordlist