Robyn Speer
524f7c760b
WIP on Ninja build automation
2015-04-29 15:59:06 -04:00
Robyn Speer
f77a61e675
move commands into cli/ directory
2015-04-29 15:22:04 -04:00
Robyn Speer
2bf8870832
always use surface forms
2015-04-29 15:17:00 -04:00
Robyn Speer
b4dfdaa47c
Merge branch 'master' into dutch-201503
Conflicts:
wordfreq/build.py
Former-commit-id: 6cf46ee5aa
2015-04-29 14:36:24 -04:00
Robyn Speer
38261a6a0a
handle multi-word stems correctly
2015-04-29 13:45:53 -04:00
Robyn Speer
d29e8bfddf
start a new multilingual wordlist called 'stems'
So far, this wordlist is only in Dutch.
Former-commit-id: af5f65b328
2015-03-31 15:59:30 -04:00
Robyn Speer
6b57075275
revise the process of building Wikipedia counts
2015-03-30 18:09:07 -04:00
Robyn Speer
56e811be19
Fix Dutch lists
- Use surface forms consistently, not stems
- Count all instances of words on Wikipedia, not one per article
Former-commit-id: 3507d8b630
2015-03-12 16:00:03 -04:00
Andrew Lin
6e98ca9822
Merge pull request #3 from LuminosoInsight/variable_name_fix
Fix a variable name for clarity.
Former-commit-id: cfe58cd899
2015-03-11 14:10:53 -04:00
Robyn Speer
ca944e54aa
new Dutch data, bump version to 0.6
Former-commit-id: 377336bcdc
2015-03-03 15:54:45 -05:00
Andrew Lin
6882ac9f0e
Fix a variable name for clarity.
Former-commit-id: 434c603798
2015-03-03 11:59:46 -05:00
Andrew Lin
39d914f8e1
Merge pull request #2 from LuminosoInsight/new-twitter-lists
New twitter lists
Former-commit-id: 5a4d3a87d5
2015-02-17 15:36:13 -05:00
Robyn Speer
ad22387a53
add surface forms from Twitter 2014 data
Former-commit-id: ffdaa82b11
2015-02-17 15:06:11 -05:00
Robyn Speer
8d57b39a7b
stop running 'remove_unsafe_private_use' unnecessarily
Former-commit-id: b6f246ecbb
2015-02-17 14:02:36 -05:00
Robyn Speer
bc780c63c8
enable wordlist balancing, surface form counting
2015-02-17 13:43:22 -05:00
Robyn Speer
f4280dcad0
add twitter-stems-2014 wordlist data
Former-commit-id: 6ab72201cd
2015-02-11 13:29:32 -05:00
Robyn Speer
07e61be7e3
add utility for combining wordlists
2015-02-11 11:45:10 -05:00
Robyn Speer
23bd5ba76c
command-line entry points
2015-02-10 12:28:29 -05:00
Robyn Speer
8b322ce534
Initial commit
2015-02-04 20:19:36 -05:00
Robyn Speer
03fac20b1b
Allow multithreaded SQLite on Python 3
Former-commit-id: bf0071fd8b
2014-10-02 18:10:09 -04:00
Robyn Speer
5153faf43e
construct the download path correctly, even on Windows
Former-commit-id: 6d90cef415
2014-09-08 10:56:48 -04:00
Robyn Speer
0c61406cdc
remove unused global
Former-commit-id: c55a701885
2014-09-02 14:29:31 -04:00
Robyn Speer
b357ffaa09
cleanups to building and uploading, from code review
Former-commit-id: 5dee417302
2014-08-18 14:14:01 -04:00
Robyn Speer
759534392f
Add license text for the whole package
Former-commit-id: cb7b2b76e6
2014-06-02 16:37:32 -04:00
Robyn Speer
a06c3fc648
A different plan for the top-level word_frequency function.
When, before, I was importing wordfreq.query at the top level, this
created a dependency loop when installing wordfreq.
The new top-level __init__.py provides just a `word_frequency` function,
which imports the real function as needed and calls it. This should
avoid the dependency loop, at the cost of making
`wordfreq.word_frequency` slightly less efficient than
`wordfreq.query.word_frequency`.
Former-commit-id: 44ccf40742
2014-02-24 18:03:31 -05:00
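The wrapper described in commit a06c3fc648 can be sketched roughly as follows. This is an illustration only, not the code from that commit: `make_lazy` and `lazy_sqrt` are hypothetical names, and the standard library's `math.sqrt` stands in for `wordfreq.query.word_frequency` so that the snippet runs on its own.

```python
import importlib


def make_lazy(module_name, attr_name):
    """Return a wrapper that imports `module_name` only when it is called.

    Hypothetical helper for illustration; the actual commit defines a plain
    `word_frequency` function in the top-level __init__.py instead.
    """
    def wrapper(*args, **kwargs):
        # The import runs at call time, not when the package is imported,
        # so importing the top-level package does not pull in the
        # submodule (or its dependencies) during installation.
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)(*args, **kwargs)
    return wrapper


# In a real top-level __init__.py this might read (assumed shape):
#     def word_frequency(*args, **kwargs):
#         from wordfreq.query import word_frequency as real
#         return real(*args, **kwargs)
# Here math.sqrt is a stand-in for the real function.
lazy_sqrt = make_lazy("math", "sqrt")
```

Each call pays for a module lookup before delegating, which matches the commit's note that the top-level function is slightly less efficient than calling `wordfreq.query.word_frequency` directly.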
Robyn Speer
b6b3a6f5f6
version 0.4: minor code changes, debugged database
- The database is built under Python 3.3.2, so it should correctly
  implement Python 3's Unicode tricks, including special handling
  of Greek lowercase letters. (Version 0.3 was supposed to do this
  as well, but apparently, it didn't.)
- `word_frequency` and `iter_wordlist` can be imported from the
  top level.
- The new function `random_words` supplies a string made from
  random words that are sufficiently high in rank order.
Former-commit-id: 3702a7c8d0
2014-02-24 16:29:06 -05:00
Robyn Speer
207defe6ff
Sometimes you need some random words.
Former-commit-id: 3447ae732e
2014-01-06 15:51:10 -05:00
Andrew Lin
181e8e08fa
Remove the tests for metanl_word_frequency too. Doh.
Former-commit-id: 68d262791c
2013-11-11 13:21:25 -05:00
Robyn Speer
f369df3e82
Merge pull request #1 from LuminosoInsight/remove_metanl_wf
Remove metanl_word_frequency(), which we no longer need.
Former-commit-id: 63bebe6ad3
2013-11-11 10:13:25 -08:00
Robyn Speer
634cf6af6d
data is now hosted on wordfreq.services.luminoso.com
Former-commit-id: 56f2c606f1
2013-11-07 14:43:15 -05:00
Andrew Lin
cf45720f66
Remove metanl_word_frequency(), which we no longer need.
Former-commit-id: 76a7267670
2013-11-04 16:51:25 -05:00
Robyn Speer
5f7c7e032c
Clear wordlists before inserting them; yell at Python 2
Former-commit-id: 823b3828cd
2013-11-01 19:29:37 -04:00
Robyn Speer
5fc933495f
Revert "code review and pep8 fixes"
This reverts commit ae6e03fa06 [formerly b4b8ba8be7].
Conflicts:
wordfreq/transfer.py
Former-commit-id: 5c8ba34492
2013-11-01 17:33:39 -04:00
Robyn Speer
4d904a3bae
Merge branch 'master' of github.com:LuminosoInsight/wordfreq
Conflicts:
wordfreq/transfer.py
Former-commit-id: 90e042f196
2013-11-01 17:05:59 -04:00
Robyn Speer
ae6e03fa06
code review and pep8 fixes
Former-commit-id: b4b8ba8be7
2013-11-01 17:05:12 -04:00
Lance Nathan
cbb3207e4f
Two small stylistic tweaks
Former-commit-id: ea29469643
2013-10-31 16:00:48 -04:00
Robyn Speer
5168da105a
make the tests less picky about numerical exactness
Former-commit-id: 2b2bd943d2
2013-10-31 15:43:19 -04:00
Robyn Speer
313306f12e
try to match the wordlist metanl actually uses
Former-commit-id: 90772e33fb
2013-10-31 15:13:22 -04:00
Robyn Speer
773f6b9843
The metanl scale is not what I thought it was.
Former-commit-id: 0d2fb21726
2013-10-31 14:38:01 -04:00
Robyn Speer
351378e318
Don't download the DB if the right version is already there
Former-commit-id: e931062b5a
2013-10-31 14:12:04 -04:00
Robyn Speer
16bc844841
try being really nonspecific about functools32 versions
Former-commit-id: c1564908f2
2013-10-31 14:06:06 -04:00
Robyn Speer
8690ac3f57
be less specific about the functools32 version
Former-commit-id: 2542cf9e35
2013-10-31 14:02:40 -04:00
Robyn Speer
9163a67a9f
Add wordfreq_data files.
Now the build process is repeatable from scratch, even if something goes
wrong with the download server.
Former-commit-id: 26c0d7dd28
2013-10-31 13:39:02 -04:00
Robyn Speer
101e767ad9
When strings are inconsistent between py2 and 3, don't test them on py2.
2013-10-31 13:11:13 -04:00
Robyn Speer
52bcb99c48
add util.py, which provides standardize_word
2013-10-30 18:14:43 -04:00
Robyn Speer
5b31bd415f
and of course this changes the metanl constant
2013-10-30 18:14:34 -04:00
Robyn Speer
4bda3e6b6f
Turns out we need to change the metanl constant after normalizing words.
2013-10-30 16:58:10 -04:00
Robyn Speer
8f00846117
Normalize words when storing them or looking them up.
2013-10-30 14:59:57 -04:00
Robyn Speer
ea5de7cb2a
Revise the build test to compare lengths of wordlists.
The test currently fails on Python 3, for some strange reason.
2013-10-30 13:22:56 -04:00
Lance
74cfb69f5a
Another Py3 change, this one for functools32
2013-10-30 12:06:41 -04:00