updated word_frequency docstring

Joshua Chin 2015-07-07 14:56:12 -04:00
parent 5e8ef19321
commit 4304a400f7


@@ -247,13 +247,14 @@ def word_frequency(word, lang, wordlist='combined', default=0.):
     """
     Get the frequency of `word` in the language with code `lang`, from the
     specified `wordlist`. The default wordlist is 'combined', built from
-    whichever of these four sources have sufficient data for the language:
+    whichever of these five sources have sufficient data for the language:
       - Full text of Wikipedia
       - A sample of 72 million tweets collected from Twitter in 2014,
         divided roughly into languages using automatic language detection
       - Frequencies extracted from OpenSubtitles
       - The Leeds Internet Corpus
+      - Google Books Ngrams and Google Books Syntactic Ngrams
     Another available wordlist is 'twitter', which uses only the data from
     Twitter.
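
For readers unfamiliar with the signature in the hunk header, here is a minimal sketch of how the `wordlist` and `default` parameters might interact. It is a toy stand-in, not the library's implementation: `TOY_WORDLISTS` and `toy_word_frequency` are hypothetical names, the frequencies are made up, and returning `default` for a missing wordlist or language is an assumption (the real function may behave differently).

```python
# Toy stand-in for illustration only. The names and numbers below are
# hypothetical and are not taken from the library or its data files.
TOY_WORDLISTS = {
    ("combined", "en"): {"the": 0.05, "cafe": 0.00001},
    ("twitter", "en"): {"the": 0.04, "lol": 0.001},
}


def toy_word_frequency(word, lang, wordlist="combined", default=0.0):
    """Return the stored frequency of `word`, or `default` if it is absent.

    Assumption: an unknown (wordlist, lang) pair also yields `default`.
    """
    freqs = TOY_WORDLISTS.get((wordlist, lang), {})
    return freqs.get(word, default)


if __name__ == "__main__":
    # Looks up the word in the default 'combined' toy list.
    print(toy_word_frequency("the", "en"))
    # 'lol' is absent from the toy 'combined' list, so `default` is returned.
    print(toy_word_frequency("lol", "en"))
    # Selecting the toy 'twitter' list finds the word.
    print(toy_word_frequency("lol", "en", wordlist="twitter"))
```

The point of the sketch is only that the wordlist name selects which frequency table is consulted, and `default` is what the caller gets back when the word is not in that table.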