| Author | Commit | Message | Date |
|---|---|---|---|
| slibs63 | 927d4f45a4 | Merge pull request #30 from LuminosoInsight/add-reddit. Add English data from Reddit corpus (Former-commit-id: d18fee3d78) | 2016-01-14 15:52:39 -05:00 |
| Sara Jewett | 42d209cbe2 | Specify encoding when dealing with files (Former-commit-id: 37f9e12b93) | 2015-12-23 15:49:13 -05:00 |
| Robyn Speer | 23949a4512 | rebuild data files (Former-commit-id: 2dcf368481) | 2015-11-30 17:06:39 -05:00 |
| Robyn Speer | a4554fb87c | tokenize Chinese using jieba and our own frequencies (Former-commit-id: 2327f2e4d6) | 2015-09-05 03:16:56 -04:00 |
| Robyn Speer | 6f10e71d29 | bump to version 1.1 (Former-commit-id: 694c28d5e4) | 2015-08-25 17:44:52 -04:00 |
| Robyn Speer | 8795525372 | Use the regex implementation of Unicode segmentation (Former-commit-id: 95998205ad) | 2015-08-24 17:11:08 -04:00 |
| Robyn Speer | 3ff0f30218 | put back the freqs_to_cBpack cutoff; prepare for 1.0 (Former-commit-id: c5708b24e4) | 2015-07-28 18:01:12 -04:00 |
| Robyn Speer | 090cfa7088 | declare 'mecab' as an extra (Former-commit-id: a69ea5ad52) | 2015-07-02 17:11:51 -04:00 |
| Robyn Speer | 83939020d0 | declare that tests require mecab-python3 (Former-commit-id: 7b4ebd1805) | 2015-07-02 11:29:11 -04:00 |
| Robyn Speer | 215eafc50b | add Twitter-specific wordlists (Former-commit-id: 7e3066d3fc) | 2015-07-01 17:49:33 -04:00 |
| Robyn Speer | 4c2b766f46 | bump version number (Former-commit-id: 053f372ebc) | 2015-06-30 14:54:13 -04:00 |
| Robyn Speer | 2dc3d82a98 | clearer error on py2 (Former-commit-id: ed19d79c5a) | 2015-05-28 14:05:11 -04:00 |
| Robyn Speer | a3cc8d403c | add installation instructions to the readme (Former-commit-id: 0f4ca80026) | 2015-05-28 14:02:12 -04:00 |
| Robyn Speer | 7c6cf84749 | update README, another setup fix (Former-commit-id: dd41e61c57) | 2015-05-13 04:09:34 -04:00 |
| Robyn Speer | c1edefa419 | update dependencies (Former-commit-id: f13cca4d81) | 2015-05-12 12:30:01 -04:00 |
| Robyn Speer | fd4df8d1eb | restore missing line in setup.py (Former-commit-id: bb18f741e2) | 2015-05-12 12:24:18 -04:00 |
| Robyn Speer | aa0e844b81 | add new data files from wordfreq_builder (Former-commit-id: 35aec061de) | 2015-05-11 18:45:47 -04:00 |
| Robyn Speer | f92598b13d | WIP: burn stuff down (Former-commit-id: 9b63e54471) | 2015-05-08 15:28:52 -04:00 |
| Robyn Speer | cb6b2a8002 | v0.7: make a proper Dutch 'surfaces' list (Former-commit-id: 873ace87db) | 2015-04-30 13:01:24 -04:00 |
| Robyn Speer | 351378e318 | Don't download the DB if the right version is already there (Former-commit-id: e931062b5a) | 2013-10-31 14:12:04 -04:00 |
| Robyn Speer | 16bc844841 | try being really nonspecific about functools32 versions (Former-commit-id: c1564908f2) | 2013-10-31 14:06:06 -04:00 |
| Robyn Speer | 8690ac3f57 | be less specific about the functools32 version (Former-commit-id: 2542cf9e35) | 2013-10-31 14:02:40 -04:00 |
| Robyn Speer | 8f00846117 | Normalize words when storing them or looking them up. | 2013-10-30 14:59:57 -04:00 |
| Lance | 74cfb69f5a | Another Py3 change, this one for functools32 | 2013-10-30 12:06:41 -04:00 |
| Robyn Speer | a95d88d1b9 | Implement the data uploady downloady stuff in setup. | 2013-10-29 16:44:13 -04:00 |
| Robyn Speer | 36344d3737 | prepare to write custom commands in setup.py | 2013-10-29 12:43:41 -04:00 |
| Robyn Speer | e8273e47a1 | Initial version. Noticeably missing: data files or any way to get them. | 2013-10-28 19:26:44 -04:00 |