From 64bbcbd51bbff36e439690776381651e6d7ab235 Mon Sep 17 00:00:00 2001
From: Robyn Speer
Date: Thu, 15 Apr 2021 14:45:29 -0400
Subject: [PATCH] readme update: web text comes from OSCAR

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index e34f62c..c916ede 100644
--- a/README.md
+++ b/README.md
@@ -153,8 +153,7 @@ come from multiple sources:
 - **Subtitles**, from OPUS OpenSubtitles 2018 and SUBTLEX
 - **News**, from NewsCrawl 2014 and GlobalVoices
 - **Books**, from Google Books Ngrams 2012
-- **Web** text, from ParaCrawl, the Leeds Internet Corpus, and the MOKK
-  Hungarian Webcorpus
+- **Web** text, from OSCAR
 - **Twitter**, representing short-form social media
 - **Reddit**, representing potentially longer Internet comments
 - **Miscellaneous** word frequencies: in Chinese, we import a free wordlist