Commit Graph

27 Commits

Author SHA1 Message Date
slibs63
927d4f45a4 Merge pull request #30 from LuminosoInsight/add-reddit
Add English data from Reddit corpus

Former-commit-id: d18fee3d78
2016-01-14 15:52:39 -05:00
Sara Jewett
42d209cbe2 Specify encoding when dealing with files
Former-commit-id: 37f9e12b93
2015-12-23 15:49:13 -05:00
Robyn Speer
23949a4512 rebuild data files
Former-commit-id: 2dcf368481
2015-11-30 17:06:39 -05:00
Robyn Speer
a4554fb87c tokenize Chinese using jieba and our own frequencies
Former-commit-id: 2327f2e4d6
2015-09-05 03:16:56 -04:00
Robyn Speer
6f10e71d29 bump to version 1.1
Former-commit-id: 694c28d5e4
2015-08-25 17:44:52 -04:00
Robyn Speer
8795525372 Use the regex implementation of Unicode segmentation
Former-commit-id: 95998205ad
2015-08-24 17:11:08 -04:00
Robyn Speer
3ff0f30218 put back the freqs_to_cBpack cutoff; prepare for 1.0
Former-commit-id: c5708b24e4
2015-07-28 18:01:12 -04:00
Robyn Speer
090cfa7088 declare 'mecab' as an extra
Former-commit-id: a69ea5ad52
2015-07-02 17:11:51 -04:00
Robyn Speer
83939020d0 declare that tests require mecab-python3
Former-commit-id: 7b4ebd1805
2015-07-02 11:29:11 -04:00
Robyn Speer
215eafc50b add Twitter-specific wordlists
Former-commit-id: 7e3066d3fc
2015-07-01 17:49:33 -04:00
Robyn Speer
4c2b766f46 bump version number
Former-commit-id: 053f372ebc
2015-06-30 14:54:13 -04:00
Robyn Speer
2dc3d82a98 clearer error on py2
Former-commit-id: ed19d79c5a
2015-05-28 14:05:11 -04:00
Robyn Speer
a3cc8d403c add installation instructions to the readme
Former-commit-id: 0f4ca80026
2015-05-28 14:02:12 -04:00
Robyn Speer
7c6cf84749 update README, another setup fix
Former-commit-id: dd41e61c57
2015-05-13 04:09:34 -04:00
Robyn Speer
c1edefa419 update dependencies
Former-commit-id: f13cca4d81
2015-05-12 12:30:01 -04:00
Robyn Speer
fd4df8d1eb restore missing line in setup.py
Former-commit-id: bb18f741e2
2015-05-12 12:24:18 -04:00
Robyn Speer
aa0e844b81 add new data files from wordfreq_builder
Former-commit-id: 35aec061de
2015-05-11 18:45:47 -04:00
Robyn Speer
f92598b13d WIP: burn stuff down
Former-commit-id: 9b63e54471
2015-05-08 15:28:52 -04:00
Robyn Speer
cb6b2a8002 v0.7: make a proper Dutch 'surfaces' list
Former-commit-id: 873ace87db
2015-04-30 13:01:24 -04:00
Robyn Speer
351378e318 Don't download the DB if the right version is already there
Former-commit-id: e931062b5a
2013-10-31 14:12:04 -04:00
Robyn Speer
16bc844841 try being really nonspecific about functools32 versions
Former-commit-id: c1564908f2
2013-10-31 14:06:06 -04:00
Robyn Speer
8690ac3f57 be less specific about the functools32 version
Former-commit-id: 2542cf9e35
2013-10-31 14:02:40 -04:00
Robyn Speer
8f00846117 Normalize words when storing them or looking them up.
2013-10-30 14:59:57 -04:00
Lance
74cfb69f5a Another Py3 change, this one for functools32
2013-10-30 12:06:41 -04:00
Robyn Speer
a95d88d1b9 Implement the data uploady downloady stuff in setup.
2013-10-29 16:44:13 -04:00
Robyn Speer
36344d3737 prepare to write custom commands in setup.py
2013-10-29 12:43:41 -04:00
Robyn Speer
e8273e47a1 Initial version.
Noticeably missing: data files or any way to get them.
2013-10-28 19:26:44 -04:00