Joshua Chin
a289ab7f8b
clarified the docstrings for random_words and random_ascii_words
2015-06-17 14:26:06 -04:00
Joshua Chin
9f288bac31
corrected available_languages to return a dict of strs to strs
2015-06-17 12:43:13 -04:00
Joshua Chin
6cc962bfea
changed yield to yield from in iter_wordlist
2015-06-17 12:38:31 -04:00
Joshua Chin
9b30da4dec
updated db_to_freq docstring
2015-06-17 12:24:23 -04:00
Joshua Chin
7e808bf7c1
added docstrings
2015-06-17 12:20:50 -04:00
Joshua Chin
053e4da3e6
removed temporary variable
2015-06-17 12:16:02 -04:00
Rob Speer
ed19d79c5a
clearer error on py2
2015-05-28 14:05:11 -04:00
Rob Speer
0f4ca80026
add installation instructions to the readme
2015-05-28 14:02:12 -04:00
Rob Speer
611a6a35de
update Japanese data; test Japanese and token combining
2015-05-28 14:01:56 -04:00
Rob Speer
05cf94d1fd
Work on making Japanese tokenization use MeCab consistently
2015-05-27 18:10:25 -04:00
Rob Speer
0e5156e162
Merge branch 'master' into newbuild
Conflicts:
setup.py
wordfreq/build.py
wordfreq/config.py
2015-05-21 20:41:47 -04:00
Rob Speer
84e5edcea1
rebuild data
2015-05-21 20:36:15 -04:00
Rob Speer
410912d8f0
remove old tests
2015-05-21 20:36:09 -04:00
Rob Speer
b42594fa5f
allow more language matches; reorder some parameters
2015-05-21 20:35:02 -04:00
Rob Speer
df863a5169
tests for new wordfreq with full coverage
2015-05-21 20:34:17 -04:00
Rob Speer
dd41e61c57
update README, another setup fix
2015-05-13 04:09:34 -04:00
Rob Speer
f13cca4d81
update dependencies
2015-05-12 12:30:01 -04:00
Rob Speer
bb18f741e2
restore missing line in setup.py
2015-05-12 12:24:18 -04:00
Rob Speer
35aec061de
add new data files from wordfreq_builder
2015-05-11 18:45:47 -04:00
Rob Speer
9b63e54471
WIP: burn stuff down
2015-05-08 15:28:52 -04:00
Lance Nathan
e8a1548d93
Tweak to previous variable name fix
2015-05-06 17:57:10 -04:00
Lance Nathan
4632ffb177
Merge pull request #6 from LuminosoInsight/ftfy4
Clean data with ftfy v4
2015-05-06 17:32:45 -04:00
Lance Nathan
5f05b52fe5
Merge pull request #5 from LuminosoInsight/dutch-201504
Better Dutch surface-form data
2015-05-06 17:15:21 -04:00
Rob Speer
506073030a
fix reused variable name
2015-05-06 17:06:37 -04:00
Rob Speer
2f3bb955d1
set version number to 0.8
2015-05-05 12:05:00 -04:00
Rob Speer
24a7c73e6d
Merge branch 'dutch-201504' into ftfy4
Conflicts:
setup.py
2015-05-05 12:04:44 -04:00
Rob Speer
70b2c678ea
require ftfy 4
2015-05-05 12:04:13 -04:00
Rob Speer
873ace87db
v0.7: make a proper Dutch 'surfaces' list
2015-04-30 13:01:24 -04:00
Rob Speer
6cf46ee5aa
Merge branch 'master' into dutch-201503
Conflicts:
wordfreq/build.py
2015-04-29 14:36:24 -04:00
Rob Speer
af5f65b328
start a new multilingual wordlist called 'stems'
So far, this wordlist is only in Dutch.
2015-03-31 15:59:30 -04:00
Rob Speer
3507d8b630
Fix Dutch lists
- Use surface forms consistently, not stems
- Count all instances of words on Wikipedia, not one per article
2015-03-12 16:00:03 -04:00
Andrew Lin
cfe58cd899
Merge pull request #3 from LuminosoInsight/variable_name_fix
Fix a variable name for clarity.
2015-03-11 14:10:53 -04:00
Rob Speer
377336bcdc
new Dutch data, bump version to 0.6
2015-03-03 15:54:45 -05:00
Andrew Lin
434c603798
Fix a variable name for clarity.
2015-03-03 11:59:46 -05:00
Andrew Lin
5a4d3a87d5
Merge pull request #2 from LuminosoInsight/new-twitter-lists
New twitter lists
2015-02-17 15:36:13 -05:00
Rob Speer
ffdaa82b11
add surface forms from Twitter 2014 data
2015-02-17 15:06:11 -05:00
Rob Speer
b6f246ecbb
stop running 'remove_unsafe_private_use' unnecessarily
2015-02-17 14:02:36 -05:00
Rob Speer
6ab72201cd
add twitter-stems-2014 wordlist data
2015-02-11 13:29:32 -05:00
Rob Speer
bf0071fd8b
Allow multithreaded SQLite on Python 3
2014-10-02 18:10:09 -04:00
Rob Speer
6d90cef415
construct the download path correctly, even on Windows
2014-09-08 10:56:48 -04:00
Rob Speer
c55a701885
remove unused global
2014-09-02 14:29:31 -04:00
Rob Speer
5dee417302
cleanups to building and uploading, from code review
2014-08-18 14:14:01 -04:00
Rob Speer
cb7b2b76e6
Add license text for the whole package
2014-06-02 16:37:32 -04:00
Rob Speer
44ccf40742
A different plan for the top-level word_frequency function.
Previously, importing wordfreq.query at the top level created a
dependency loop when installing wordfreq.
The new top-level __init__.py provides just a `word_frequency` function,
which imports the real function as needed and calls it. This should
avoid the dependency loop, at the cost of making
`wordfreq.word_frequency` slightly less efficient than
`wordfreq.query.word_frequency`.
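The deferred-import pattern described above can be sketched roughly as
follows. This is an illustration, not the code from this commit: the
`deferred` helper and the `lazy_sqrt` name are hypothetical, and the
standard-library `math.sqrt` stands in for `wordfreq.query.word_frequency`
so the sketch runs without wordfreq installed.

```python
import importlib


def deferred(module_name, function_name):
    """Build a wrapper that imports `module_name` only when it is called."""
    def wrapper(*args, **kwargs):
        # Importing here, at call time, keeps the defining module free of
        # an import-time dependency on `module_name`.
        module = importlib.import_module(module_name)
        return getattr(module, function_name)(*args, **kwargs)
    wrapper.__name__ = function_name
    return wrapper


# Hypothetical stand-in target; in the package, the wrapper would point at
# the real function in wordfreq.query instead.
lazy_sqrt = deferred("math", "sqrt")
```

Each call goes through `importlib.import_module`, which after the first
import is only a lookup in the module cache. That extra lookup is the
small per-call cost mentioned above.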
2014-02-24 18:03:31 -05:00
Rob Speer
3702a7c8d0
version 0.4: minor code changes, debugged database
- The database is built under Python 3.3.2, so it should correctly
implement Python 3's Unicode tricks, including special handling
of Greek lowercase letters. (Version 0.3 was supposed to do this
as well, but apparently, it didn't.)
- `word_frequency` and `iter_wordlist` can be imported from the
top level.
- The new function `random_words` supplies a string made from
random words that are sufficiently high in rank order.
2014-02-24 16:29:06 -05:00
Rob Speer
3447ae732e
Sometimes you need some random words.
2014-01-06 15:51:10 -05:00
Andrew Lin
68d262791c
Remove the tests for metanl_word_frequency too. Doh.
2013-11-11 13:21:25 -05:00
Rob Speer
63bebe6ad3
Merge pull request #1 from LuminosoInsight/remove_metanl_wf
Remove metanl_word_frequency(), which we no longer need.
2013-11-11 10:13:25 -08:00
Rob Speer
56f2c606f1
data is now hosted on wordfreq.services.luminoso.com
2013-11-07 14:43:15 -05:00
Andrew Lin
76a7267670
Remove metanl_word_frequency(), which we no longer need.
2013-11-04 16:51:25 -05:00