Robyn Speer
860e929bf8
update Japanese data; test Japanese and token combining
Former-commit-id: 611a6a35de
2015-05-28 14:01:56 -04:00
Robyn Speer
5db3c4ef9e
Work on making Japanese tokenization use MeCab consistently
Former-commit-id: 05cf94d1fd
2015-05-27 18:10:25 -04:00
Robyn Speer
c66b55d8dd
Merge branch 'master' into newbuild
Conflicts:
    setup.py
    wordfreq/build.py
    wordfreq/config.py
Former-commit-id: 0e5156e162
2015-05-21 20:41:47 -04:00
Robyn Speer
65f6107d36
rebuild data
Former-commit-id: 84e5edcea1
2015-05-21 20:36:15 -04:00
Robyn Speer
4a865bfaec
remove old tests
Former-commit-id: 410912d8f0
2015-05-21 20:36:09 -04:00
Robyn Speer
8954061a2a
allow more language matches; reorder some parameters
Former-commit-id: b42594fa5f
2015-05-21 20:35:02 -04:00
Robyn Speer
26517c1b86
tests for new wordfreq with full coverage
Former-commit-id: df863a5169
2015-05-21 20:34:17 -04:00
Robyn Speer
7c6cf84749
update README, another setup fix
Former-commit-id: dd41e61c57
2015-05-13 04:09:34 -04:00
Robyn Speer
c1edefa419
update dependencies
Former-commit-id: f13cca4d81
2015-05-12 12:30:01 -04:00
Robyn Speer
fd4df8d1eb
restore missing line in setup.py
Former-commit-id: bb18f741e2
2015-05-12 12:24:18 -04:00
Robyn Speer
aa0e844b81
add new data files from wordfreq_builder
Former-commit-id: 35aec061de
2015-05-11 18:45:47 -04:00
Robyn Speer
f92598b13d
WIP: burn stuff down
Former-commit-id: 9b63e54471
2015-05-08 15:28:52 -04:00
Lance Nathan
60c7f3a7da
Tweak to previous variable name fix
Former-commit-id: e8a1548d93
2015-05-06 17:57:10 -04:00
Lance Nathan
8400dee933
Merge pull request #6 from LuminosoInsight/ftfy4
Clean data with ftfy v4
Former-commit-id: 4632ffb177
2015-05-06 17:32:45 -04:00
Lance Nathan
bbf9164542
Merge pull request #5 from LuminosoInsight/dutch-201504
Better Dutch surface-form data
Former-commit-id: 5f05b52fe5
2015-05-06 17:15:21 -04:00
Robyn Speer
5ef406fd43
fix reused variable name
Former-commit-id: 506073030a
2015-05-06 17:06:37 -04:00
Robyn Speer
d5d24cb098
set version number to 0.8
Former-commit-id: 2f3bb955d1
2015-05-05 12:05:00 -04:00
Robyn Speer
c9ca5b94b0
Merge branch 'dutch-201504' into ftfy4
Conflicts:
    setup.py
Former-commit-id: 24a7c73e6d
2015-05-05 12:04:44 -04:00
Robyn Speer
922c658b68
require ftfy 4
Former-commit-id: 70b2c678ea
2015-05-05 12:04:13 -04:00
Robyn Speer
cb6b2a8002
v0.7: make a proper Dutch 'surfaces' list
Former-commit-id: 873ace87db
2015-04-30 13:01:24 -04:00
Robyn Speer
b4dfdaa47c
Merge branch 'master' into dutch-201503
Conflicts:
    wordfreq/build.py
Former-commit-id: 6cf46ee5aa
2015-04-29 14:36:24 -04:00
Robyn Speer
d29e8bfddf
start a new multilingual wordlist called 'stems'
So far, this wordlist is only in Dutch.
Former-commit-id: af5f65b328
2015-03-31 15:59:30 -04:00
Robyn Speer
56e811be19
Fix Dutch lists
- Use surface forms consistently, not stems
- Count all instances of words on Wikipedia, not one per article
Former-commit-id: 3507d8b630
2015-03-12 16:00:03 -04:00
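
The second bullet of commit 56e811be19 distinguishes two ways of counting words. The sketch below illustrates that difference on a toy corpus. The corpus, the function names, and the use of `collections.Counter` are assumptions made for illustration, not the code from this commit.

```python
from collections import Counter

# Hypothetical toy corpus: each inner list holds the tokens of one article.
articles = [
    ["de", "kat", "ziet", "de", "hond"],
    ["de", "hond", "slaapt"],
]


def count_once_per_article(docs):
    """Count how many articles contain each word (at most one per article)."""
    counts = Counter()
    for tokens in docs:
        # A set removes repeats, so each article contributes at most 1 per word.
        counts.update(set(tokens))
    return counts


def count_all_instances(docs):
    """Count every occurrence of each word across all articles."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


per_article = count_once_per_article(articles)
all_instances = count_all_instances(articles)
```

In this toy corpus, "de" appears twice in the first article, so the two counts differ for that word, while a word that never repeats within an article gets the same value either way.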
Andrew Lin
6e98ca9822
Merge pull request #3 from LuminosoInsight/variable_name_fix
Fix a variable name for clarity.
Former-commit-id: cfe58cd899
2015-03-11 14:10:53 -04:00
Robyn Speer
ca944e54aa
new Dutch data, bump version to 0.6
Former-commit-id: 377336bcdc
2015-03-03 15:54:45 -05:00
Andrew Lin
6882ac9f0e
Fix a variable name for clarity.
Former-commit-id: 434c603798
2015-03-03 11:59:46 -05:00
Andrew Lin
39d914f8e1
Merge pull request #2 from LuminosoInsight/new-twitter-lists
New twitter lists
Former-commit-id: 5a4d3a87d5
2015-02-17 15:36:13 -05:00
Robyn Speer
ad22387a53
add surface forms from Twitter 2014 data
Former-commit-id: ffdaa82b11
2015-02-17 15:06:11 -05:00
Robyn Speer
8d57b39a7b
stop running 'remove_unsafe_private_use' unnecessarily
Former-commit-id: b6f246ecbb
2015-02-17 14:02:36 -05:00
Robyn Speer
f4280dcad0
add twitter-stems-2014 wordlist data
Former-commit-id: 6ab72201cd
2015-02-11 13:29:32 -05:00
Robyn Speer
03fac20b1b
Allow multithreaded SQLite on Python 3
Former-commit-id: bf0071fd8b
2014-10-02 18:10:09 -04:00
Robyn Speer
5153faf43e
construct the download path correctly, even on Windows
Former-commit-id: 6d90cef415
2014-09-08 10:56:48 -04:00
Robyn Speer
0c61406cdc
remove unused global
Former-commit-id: c55a701885
2014-09-02 14:29:31 -04:00
Robyn Speer
b357ffaa09
cleanups to building and uploading, from code review
Former-commit-id: 5dee417302
2014-08-18 14:14:01 -04:00
Robyn Speer
759534392f
Add license text for the whole package
Former-commit-id: cb7b2b76e6
2014-06-02 16:37:32 -04:00
Robyn Speer
a06c3fc648
A different plan for the top-level word_frequency function.
Previously, importing wordfreq.query at the top level created a
dependency loop when installing wordfreq.
The new top-level __init__.py provides just a `word_frequency` function,
which imports the real function as needed and calls it. This should
avoid the dependency loop, at the cost of making
`wordfreq.word_frequency` slightly less efficient than
`wordfreq.query.word_frequency`.
Former-commit-id: 44ccf40742
2014-02-24 18:03:31 -05:00
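
The deferred-import idea described in commit a06c3fc648 can be sketched as follows. This is a minimal illustration, not the package's actual `__init__.py`: the helper `make_lazy_function` is hypothetical, and the standard-library `math.sqrt` stands in for `wordfreq.query.word_frequency` so that the example runs without wordfreq installed.

```python
import importlib


def make_lazy_function(module_name, function_name):
    """Return a wrapper that imports `module_name` only when it is called.

    Hypothetical helper for illustration. Defining the wrapper does not
    import the target module, so importing the top-level package stays cheap
    and cannot trigger an import cycle at install time.
    """
    def wrapper(*args, **kwargs):
        # The import happens on every call; after the first call it is a
        # lookup in sys.modules, which is the small per-call cost the
        # commit message mentions.
        module = importlib.import_module(module_name)
        real_function = getattr(module, function_name)
        return real_function(*args, **kwargs)

    wrapper.__name__ = function_name
    return wrapper


# Stand-in target (an assumption): math.sqrt instead of
# wordfreq.query.word_frequency.
lazy_sqrt = make_lazy_function("math", "sqrt")
```

Calling `lazy_sqrt(9.0)` imports `math` at that moment and forwards the call to the real function.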
Robyn Speer
b6b3a6f5f6
version 0.4: minor code changes, debugged database
- The database is built under Python 3.3.2, so it should correctly
  implement Python 3's Unicode tricks, including special handling
  of Greek lowercase letters. (Version 0.3 was supposed to do this
  as well, but apparently, it didn't.)
- `word_frequency` and `iter_wordlist` can be imported from the
  top level.
- The new function `random_words` supplies a string made from
  random words that are sufficiently high in rank order.
Former-commit-id: 3702a7c8d0
2014-02-24 16:29:06 -05:00
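
One way to read the `random_words` description in commit b6b3a6f5f6 is "pick words at random from the top of a frequency-ranked list and join them into a string". The sketch below follows that reading. The function name `random_words_sketch`, its parameters, and the sample list are all assumptions; the real signature is not given in this log.

```python
import random


def random_words_sketch(ranked_words, nwords=4, top_n=1000, rng=None):
    """Join `nwords` random choices from the `top_n` highest-ranked words.

    `ranked_words` is assumed to be ordered from most to least frequent.
    This is a hypothetical illustration, not wordfreq's implementation.
    """
    rng = rng or random.Random()
    # Only words ranked within the top `top_n` are eligible.
    pool = ranked_words[:top_n]
    if not pool:
        raise ValueError("no words available to choose from")
    return " ".join(rng.choice(pool) for _ in range(nwords))


# Hypothetical ranked list; the last entry falls outside top_n=4.
ranked = ["the", "of", "and", "to", "zyzzyva"]
sample = random_words_sketch(ranked, nwords=3, top_n=4, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the output repeatable, which helps when such a string is used in tests.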
Robyn Speer
207defe6ff
Sometimes you need some random words.
Former-commit-id: 3447ae732e
2014-01-06 15:51:10 -05:00
Andrew Lin
181e8e08fa
Remove the tests for metanl_word_frequency too. Doh.
Former-commit-id: 68d262791c
2013-11-11 13:21:25 -05:00
Robyn Speer
f369df3e82
Merge pull request #1 from LuminosoInsight/remove_metanl_wf
Remove metanl_word_frequency(), which we no longer need.
Former-commit-id: 63bebe6ad3
2013-11-11 10:13:25 -08:00
Robyn Speer
634cf6af6d
data is now hosted on wordfreq.services.luminoso.com
Former-commit-id: 56f2c606f1
2013-11-07 14:43:15 -05:00
Andrew Lin
cf45720f66
Remove metanl_word_frequency(), which we no longer need.
Former-commit-id: 76a7267670
2013-11-04 16:51:25 -05:00
Robyn Speer
5f7c7e032c
Clear wordlists before inserting them; yell at Python 2
Former-commit-id: 823b3828cd
2013-11-01 19:29:37 -04:00
Robyn Speer
5fc933495f
Revert "code review and pep8 fixes"
This reverts commit ae6e03fa06 [formerly b4b8ba8be7].
Conflicts:
    wordfreq/transfer.py
Former-commit-id: 5c8ba34492
2013-11-01 17:33:39 -04:00
Robyn Speer
4d904a3bae
Merge branch 'master' of github.com:LuminosoInsight/wordfreq
Conflicts:
    wordfreq/transfer.py
Former-commit-id: 90e042f196
2013-11-01 17:05:59 -04:00
Robyn Speer
ae6e03fa06
code review and pep8 fixes
Former-commit-id: b4b8ba8be7
2013-11-01 17:05:12 -04:00
Lance Nathan
cbb3207e4f
Two small stylistic tweaks
Former-commit-id: ea29469643
2013-10-31 16:00:48 -04:00
Robyn Speer
5168da105a
make the tests less picky about numerical exactness
Former-commit-id: 2b2bd943d2
2013-10-31 15:43:19 -04:00
Robyn Speer
313306f12e
try to match the wordlist metanl actually uses
Former-commit-id: 90772e33fb
2013-10-31 15:13:22 -04:00
Robyn Speer
773f6b9843
The metanl scale is not what I thought it was.
Former-commit-id: 0d2fb21726
2013-10-31 14:38:01 -04:00