wordfreq
Copyright 2022 Robyn Speer
# Attribution notes

Robyn Speer must be credited as Robyn Speer, her maiden name, which she uses on academic work.
Crediting her as Elia Robyn Lake (her married name) will make the credit less effective, as it will
not line up with her other work.

Crediting Robyn Speer by any name other than the ones above is a serious violation of the license,
in which case you do not have permission to use, copy, or redistribute wordfreq.

If you use wordfreq in academic work, you must cite it. See "Citing wordfreq" in README.md.
# Included licenses

`wordfreq` is freely redistributable under the Apache license (see
`LICENSE.txt`), and it includes data files that may be
redistributed under a Creative Commons Attribution-ShareAlike 4.0
license (<https://creativecommons.org/licenses/by-sa/4.0/>).

`wordfreq` contains data extracted from Google Books Ngrams
(<http://books.google.com/ngrams>) and Google Books Syntactic Ngrams
(<http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html>).
The terms of use of this data are:

> Ngram Viewer graphs and data may be freely used for any purpose, although
> acknowledgement of Google Books Ngram Viewer as the source, and inclusion
> of a link to http://books.google.com/ngrams, would be appreciated.

`wordfreq` also contains data derived from the following Creative Commons-licensed
sources:

- The Leeds Internet Corpus, from the University of Leeds Centre for Translation
  Studies (<http://corpus.leeds.ac.uk/list.html>)

- Wikipedia, the free encyclopedia (<http://www.wikipedia.org>)

- ParaCrawl, a multilingual Web crawl (<https://paracrawl.eu>)

It contains data from OPUS OpenSubtitles 2018
(<http://opus.nlpl.eu/OpenSubtitles.php>), whose data originates from the
OpenSubtitles project (<http://www.opensubtitles.org/>) and may be used with
attribution to OpenSubtitles.

It contains data from various SUBTLEX word lists: SUBTLEX-US, SUBTLEX-UK,
SUBTLEX-CH, SUBTLEX-DE, and SUBTLEX-NL, created by Marc Brysbaert et al.
(see citations below) and available at
<http://crr.ugent.be/programs-data/subtitle-frequencies>.

I (Robyn Speer) have obtained permission by e-mail from Marc Brysbaert to
distribute these wordlists in wordfreq, to be used for any purpose, not just
for academic use, under these conditions:

- Wordfreq and code derived from it must credit the SUBTLEX authors.
- It must remain clear that SUBTLEX is freely available data.

These terms are similar to the Creative Commons Attribution-ShareAlike license.

Some additional data was collected by a custom application that watches the
streaming Twitter API, in accordance with Twitter's Developer Agreement &
Policy. This software gives statistics about words that are commonly used on
Twitter; it does not display or republish any Twitter content.