updated word_frequency docstring

Former-commit-id: 4304a400f7
Joshua Chin 2015-07-07 14:56:12 -04:00
parent a9a48229e0
commit e1435136e3

@@ -247,13 +247,14 @@ def word_frequency(word, lang, wordlist='combined', default=0.):
"""
Get the frequency of `word` in the language with code `lang`, from the
specified `wordlist`. The default wordlist is 'combined', built from
-whichever of these four sources have sufficient data for the language:
+whichever of these five sources have sufficient data for the language:
 - Full text of Wikipedia
 - A sample of 72 million tweets collected from Twitter in 2014,
 divided roughly into languages using automatic language detection
 - Frequencies extracted from OpenSubtitles
 - The Leeds Internet Corpus
+ - Google Books Ngrams and Google Books Syntactic Ngrams
Another available wordlist is 'twitter', which uses only the data from
Twitter.