171 Commits

Author SHA1 Message Date
Rob Speer
05cf94d1fd Work on making Japanese tokenization use MeCab consistently 2015-05-27 18:10:25 -04:00
Rob Speer
0e5156e162 Merge branch 'master' into newbuild
Conflicts:
	setup.py
	wordfreq/build.py
	wordfreq/config.py
2015-05-21 20:41:47 -04:00
Rob Speer
84e5edcea1 rebuild data 2015-05-21 20:36:15 -04:00
Rob Speer
410912d8f0 remove old tests 2015-05-21 20:36:09 -04:00
Rob Speer
b42594fa5f allow more language matches; reorder some parameters 2015-05-21 20:35:02 -04:00
Rob Speer
df863a5169 tests for new wordfreq with full coverage 2015-05-21 20:34:17 -04:00
Rob Speer
dd41e61c57 update README, another setup fix 2015-05-13 04:09:34 -04:00
Rob Speer
f13cca4d81 update dependencies 2015-05-12 12:30:01 -04:00
Rob Speer
bb18f741e2 restore missing line in setup.py 2015-05-12 12:24:18 -04:00
Rob Speer
35aec061de add new data files from wordfreq_builder 2015-05-11 18:45:47 -04:00
Rob Speer
9b63e54471 WIP: burn stuff down 2015-05-08 15:28:52 -04:00
Lance Nathan
e8a1548d93 Tweak to previous variable name fix 2015-05-06 17:57:10 -04:00
Lance Nathan
4632ffb177 Merge pull request #6 from LuminosoInsight/ftfy4
Clean data with ftfy v4
2015-05-06 17:32:45 -04:00
Lance Nathan
5f05b52fe5 Merge pull request #5 from LuminosoInsight/dutch-201504
Better Dutch surface-form data
2015-05-06 17:15:21 -04:00
Rob Speer
506073030a fix reused variable name 2015-05-06 17:06:37 -04:00
Rob Speer
2f3bb955d1 set version number to 0.8 2015-05-05 12:05:00 -04:00
Rob Speer
24a7c73e6d Merge branch 'dutch-201504' into ftfy4
Conflicts:
	setup.py
2015-05-05 12:04:44 -04:00
Rob Speer
70b2c678ea require ftfy 4 2015-05-05 12:04:13 -04:00
Rob Speer
873ace87db v0.7: make a proper Dutch 'surfaces' list 2015-04-30 13:01:24 -04:00
Rob Speer
6cf46ee5aa Merge branch 'master' into dutch-201503
Conflicts:
	wordfreq/build.py
2015-04-29 14:36:24 -04:00
Rob Speer
af5f65b328 start a new multilingual wordlist called 'stems'
So far, this wordlist is only in Dutch.
2015-03-31 15:59:30 -04:00
Rob Speer
3507d8b630 Fix Dutch lists
- Use surface forms consistently, not stems
- Count all instances of words on Wikipedia, not one per article
2015-03-12 16:00:03 -04:00
Andrew Lin
cfe58cd899 Merge pull request #3 from LuminosoInsight/variable_name_fix
Fix a variable name for clarity.
2015-03-11 14:10:53 -04:00
Rob Speer
377336bcdc new Dutch data, bump version to 0.6 2015-03-03 15:54:45 -05:00
Andrew Lin
434c603798 Fix a variable name for clarity. 2015-03-03 11:59:46 -05:00
Andrew Lin
5a4d3a87d5 Merge pull request #2 from LuminosoInsight/new-twitter-lists
New twitter lists
2015-02-17 15:36:13 -05:00
Rob Speer
ffdaa82b11 add surface forms from Twitter 2014 data 2015-02-17 15:06:11 -05:00
Rob Speer
b6f246ecbb stop running 'remove_unsafe_private_use' unnecessarily 2015-02-17 14:02:36 -05:00
Rob Speer
6ab72201cd add twitter-stems-2014 wordlist data 2015-02-11 13:29:32 -05:00
Rob Speer
bf0071fd8b Allow multithreaded SQLite on Python 3 2014-10-02 18:10:09 -04:00
Rob Speer
6d90cef415 construct the download path correctly, even on Windows 2014-09-08 10:56:48 -04:00
Rob Speer
c55a701885 remove unused global 2014-09-02 14:29:31 -04:00
Rob Speer
5dee417302 cleanups to building and uploading, from code review 2014-08-18 14:14:01 -04:00
Rob Speer
cb7b2b76e6 Add license text for the whole package 2014-06-02 16:37:32 -04:00
Rob Speer
44ccf40742 A different plan for the top-level word_frequency function.
Previously, importing wordfreq.query at the top level created a
dependency loop when installing wordfreq.

The new top-level __init__.py provides just a `word_frequency` function,
which imports the real function as needed and calls it. This should
avoid the dependency loop, at the cost of making
`wordfreq.word_frequency` slightly less efficient than
`wordfreq.query.word_frequency`.
2014-02-24 18:03:31 -05:00
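The lazy-import approach described in 44ccf40742 can be sketched roughly as follows. This is a minimal illustration, not the code from that commit. The helper name `make_lazy` is hypothetical, and the demo uses the standard-library `math` module as a stand-in for `wordfreq.query` so that the sketch runs on its own.

```python
import importlib


def make_lazy(module_name, func_name):
    """Return a wrapper that imports `module_name` only when it is called.

    Hypothetical helper: deferring the import means that importing the
    package itself (for example, while setup.py runs) does not trigger it.
    """
    def wrapper(*args, **kwargs):
        # After the first call, import_module finds the module in
        # sys.modules. That lookup is the small per-call overhead the
        # commit message mentions.
        module = importlib.import_module(module_name)
        return getattr(module, func_name)(*args, **kwargs)

    wrapper.__name__ = func_name
    wrapper.__doc__ = "Lazily calls {}.{}".format(module_name, func_name)
    return wrapper


# In a package's top-level __init__.py this might read (hypothetical):
#     word_frequency = make_lazy('wordfreq.query', 'word_frequency')
# Stand-in demo with a standard-library module:
lazy_sqrt = make_lazy('math', 'sqrt')
print(lazy_sqrt(16.0))  # prints 4.0
```

The trade-off matches the one the commit describes. Calling through the wrapper costs one extra function call and module lookup, compared with calling the real function directly.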
Rob Speer
3702a7c8d0 version 0.4: minor code changes, debugged database
- The database is built under Python 3.3.2, so it should correctly
  implement Python 3's Unicode tricks, including special handling
  of Greek lowercase letters. (Version 0.3 was supposed to do this
  as well, but apparently, it didn't.)
- `word_frequency` and `iter_wordlist` can be imported from the
  top level.
- The new function `random_words` supplies a string made from
  random words that are sufficiently high in rank order.
2014-02-24 16:29:06 -05:00
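The log does not show the signature of `random_words`. The idea of building a string from random words that rank high enough could be sketched as below. This is only an illustrative sketch: the parameters `ranked_words`, `nwords` and `top_n` are hypothetical, and the real function presumably reads its words from the wordfreq database.

```python
import random
from typing import Optional, Sequence


def random_words(ranked_words: Sequence[str], nwords: int = 4,
                 top_n: int = 1000,
                 rng: Optional[random.Random] = None) -> str:
    """Join `nwords` words drawn from the `top_n` highest-ranked entries.

    Hypothetical sketch: `ranked_words` is assumed to be sorted from most
    to least frequent, so slicing keeps only the high-ranked words.
    """
    if rng is None:
        rng = random.Random()
    pool = ranked_words[:top_n]
    if not pool:
        raise ValueError("wordlist is empty")
    # Sampling is uniform and with replacement, so a word may repeat.
    return ' '.join(rng.choice(pool) for _ in range(nwords))


words = ['the', 'of', 'and', 'to', 'a', 'in']
print(random_words(words, nwords=3, top_n=4, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the output reproducible, which is convenient in tests.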
Rob Speer
3447ae732e Sometimes you need some random words. 2014-01-06 15:51:10 -05:00
Andrew Lin
68d262791c Remove the tests for metanl_word_frequency too. Doh. 2013-11-11 13:21:25 -05:00
Rob Speer
63bebe6ad3 Merge pull request #1 from LuminosoInsight/remove_metanl_wf
Remove metanl_word_frequency(), which we no longer need.
2013-11-11 10:13:25 -08:00
Rob Speer
56f2c606f1 data is now hosted on wordfreq.services.luminoso.com 2013-11-07 14:43:15 -05:00
Andrew Lin
76a7267670 Remove metanl_word_frequency(), which we no longer need. 2013-11-04 16:51:25 -05:00
Rob Speer
823b3828cd Clear wordlists before inserting them; yell at Python 2 2013-11-01 19:29:37 -04:00
Rob Speer
5c8ba34492 Revert "code review and pep8 fixes"
This reverts commit b4b8ba8be7.

Conflicts:
	wordfreq/transfer.py
2013-11-01 17:33:39 -04:00
Rob Speer
90e042f196 Merge branch 'master' of github.com:LuminosoInsight/wordfreq
Conflicts:
	wordfreq/transfer.py
2013-11-01 17:05:59 -04:00
Rob Speer
b4b8ba8be7 code review and pep8 fixes 2013-11-01 17:05:12 -04:00
Lance Nathan
ea29469643 Two small stylistic tweaks 2013-10-31 16:00:48 -04:00
Rob Speer
2b2bd943d2 make the tests less picky about numerical exactness 2013-10-31 15:43:19 -04:00
Rob Speer
90772e33fb try to match the wordlist metanl actually uses 2013-10-31 15:13:22 -04:00
Rob Speer
0d2fb21726 The metanl scale is not what I thought it was. 2013-10-31 14:38:01 -04:00
Rob Speer
e931062b5a Don't download the DB if the right version is already there 2013-10-31 14:12:04 -04:00