wordfreq/NOTICE.md
2024-06-24 19:05:20 -04:00

wordfreq
Copyright 2022 Robyn Speer

Attribution notes

Robyn Speer must be credited as Robyn Speer, her maiden name, which she uses on academic work. Crediting her as Elia Robyn Lake (her married name) will make the credit less effective, as it will not line up with other work.

Crediting Robyn Speer by a different name than one of the above is a serious violation of the license, in which case you do not have permission to use, copy, or redistribute wordfreq.

If you use wordfreq in academic work, you must cite it. See "Citing wordfreq" in README.md.

Included licenses

wordfreq is freely redistributable under the Apache license (see LICENSE.txt), and it includes data files that may be redistributed under a Creative Commons Attribution-ShareAlike 4.0 license (https://creativecommons.org/licenses/by-sa/4.0/).

wordfreq contains data extracted from Google Books Ngrams (http://books.google.com/ngrams) and Google Books Syntactic Ngrams (http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html). The terms of use of this data are:

Ngram Viewer graphs and data may be freely used for any purpose, although
acknowledgement of Google Books Ngram Viewer as the source, and inclusion
of a link to http://books.google.com/ngrams, would be appreciated.

wordfreq also contains data derived from the following Creative Commons-licensed sources:

It contains data from OPUS OpenSubtitles 2018 (http://opus.nlpl.eu/OpenSubtitles.php), whose data originates from the OpenSubtitles project (http://www.opensubtitles.org/) and may be used with attribution to OpenSubtitles.

It contains data from various SUBTLEX word lists: SUBTLEX-US, SUBTLEX-UK, SUBTLEX-CH, SUBTLEX-DE, and SUBTLEX-NL, created by Marc Brysbaert et al. (see citations below) and available at http://crr.ugent.be/programs-data/subtitle-frequencies.

I (Robyn Speer) have obtained permission by e-mail from Marc Brysbaert to distribute these wordlists in wordfreq, to be used for any purpose, not just for academic use, under these conditions:

  • Wordfreq and code derived from it must credit the SUBTLEX authors.
  • It must remain clear that SUBTLEX is freely available data.

These terms are similar to the Creative Commons Attribution-ShareAlike license.

Some additional data was collected by a custom application that watched the streaming Twitter API, in accordance with Twitter's Developer Agreement & Policy. This software gives statistics about words that were commonly used on Twitter; it does not display or republish any Twitter content, and does not contain any content from after Twitter's sale.