wordfreq
Copyright 2022 Robyn Speer
Attribution notes
Robyn Speer must be credited as Robyn Speer, which is her maiden name, used on academic work. Crediting her as Elia Robyn Lake (her married name) will make the credit less effective, as it will not line up with other work.
Crediting Robyn Speer by a different name than one of the above is a serious violation of the license, in which case you do not have permission to use, copy, or redistribute wordfreq.
If you use wordfreq in academic work, you must cite it. See "Citing wordfreq" in README.md.
Included licenses
wordfreq is freely redistributable under the Apache license (see LICENSE.txt), and it includes data files that may be redistributed under a Creative Commons Attribution-ShareAlike 4.0 license (https://creativecommons.org/licenses/by-sa/4.0/).
wordfreq contains data extracted from Google Books Ngrams (http://books.google.com/ngrams) and Google Books Syntactic Ngrams (http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html). The terms of use of this data are:
Ngram Viewer graphs and data may be freely used for any purpose, although
acknowledgement of Google Books Ngram Viewer as the source, and inclusion
of a link to http://books.google.com/ngrams, would be appreciated.
wordfreq also contains data derived from the following Creative Commons-licensed sources:
- The Leeds Internet Corpus, from the University of Leeds Centre for Translation Studies (http://corpus.leeds.ac.uk/list.html)
- Wikipedia, the free encyclopedia (http://www.wikipedia.org)
- ParaCrawl, a multilingual Web crawl (https://paracrawl.eu)
It contains data from OPUS OpenSubtitles 2018 (http://opus.nlpl.eu/OpenSubtitles.php), whose data originates from the OpenSubtitles project (http://www.opensubtitles.org/) and may be used with attribution to OpenSubtitles.
It contains data from various SUBTLEX word lists: SUBTLEX-US, SUBTLEX-UK, SUBTLEX-CH, SUBTLEX-DE, and SUBTLEX-NL, created by Marc Brysbaert et al. (see citations below) and available at http://crr.ugent.be/programs-data/subtitle-frequencies.
I (Robyn Speer) have obtained permission by e-mail from Marc Brysbaert to distribute these wordlists in wordfreq, to be used for any purpose, not just for academic use, under these conditions:
- Wordfreq and code derived from it must credit the SUBTLEX authors.
- It must remain clear that SUBTLEX is freely available data.
These terms are similar to the Creative Commons Attribution-ShareAlike license.
Some additional data was collected by a custom application that watches the streaming Twitter API, in accordance with Twitter's Developer Agreement & Policy. This software gives statistics about words that are commonly used on Twitter; it does not display or republish any Twitter content.