mirror of https://github.com/rspeer/wordfreq.git
synced 2024-12-24 09:51:38 +00:00
updated word_frequency docstring
parent 5e8ef19321
commit 4304a400f7
@@ -247,13 +247,14 @@ def word_frequency(word, lang, wordlist='combined', default=0.):
     """
     Get the frequency of `word` in the language with code `lang`, from the
     specified `wordlist`. The default wordlist is 'combined', built from
-    whichever of these four sources have sufficient data for the language:
+    whichever of these five sources have sufficient data for the language:
 
     - Full text of Wikipedia
     - A sample of 72 million tweets collected from Twitter in 2014,
       divided roughly into languages using automatic language detection
     - Frequencies extracted from OpenSubtitles
     - The Leeds Internet Corpus
+    - Google Books Ngrams and Google Books Syntactic Ngrams
 
     Another available wordlist is 'twitter', which uses only the data from
     Twitter.
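
The lookup behaviour the docstring describes (a per-language frequency table selected by `wordlist`, with `default` returned for unknown words) can be sketched roughly as follows. This is a toy stand-in, not the library's implementation: `toy_word_frequency`, `TOY_WORDLISTS`, and every frequency value below are hypothetical and exist only for illustration.

```python
# Toy illustration only: the tables and numbers here are invented,
# and this is not how wordfreq stores or loads its data.
TOY_WORDLISTS = {
    ("combined", "en"): {"the": 0.05, "cat": 0.0001},
    ("twitter", "en"): {"the": 0.04, "lol": 0.002},
}


def toy_word_frequency(word, lang, wordlist="combined", default=0.0):
    """Return the toy frequency of `word`, or `default` if it is not listed."""
    # Pick the table for this (wordlist, language) pair; empty if unavailable.
    freqs = TOY_WORDLISTS.get((wordlist, lang), {})
    return freqs.get(word, default)


if __name__ == "__main__":
    print(toy_word_frequency("the", "en"))                      # 'combined' list
    print(toy_word_frequency("lol", "en"))                      # falls back to default
    print(toy_word_frequency("lol", "en", wordlist="twitter"))  # 'twitter' list
```

The sketch mirrors the parameter names from the hunk header above, so the same call shape should carry over to the real function at this commit, assuming the package is installed at a matching version.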