27 Commits

Author SHA1 Message Date
slibs63
258f5088e9 Merge pull request #30 from LuminosoInsight/add-reddit
Add English data from Reddit corpus

Former-commit-id: d18fee3d78
2016-01-14 15:52:39 -05:00
Sara Jewett
7b6f88b059 Specify encoding when dealing with files
Former-commit-id: 37f9e12b93
2015-12-23 15:49:13 -05:00
Rob Speer
9a1b00ba0c rebuild data files
Former-commit-id: 2dcf368481
2015-11-30 17:06:39 -05:00
Rob Speer
91cc82f76d tokenize Chinese using jieba and our own frequencies
Former-commit-id: 2327f2e4d6
2015-09-05 03:16:56 -04:00
Rob Speer
1f5c828642 bump to version 1.1
Former-commit-id: 694c28d5e4
2015-08-25 17:44:52 -04:00
Rob Speer
f4cf46ab9c Use the regex implementation of Unicode segmentation
Former-commit-id: 95998205ad
2015-08-24 17:11:08 -04:00
Rob Speer
4350bc3ed7 put back the freqs_to_cBpack cutoff; prepare for 1.0
Former-commit-id: c5708b24e4
2015-07-28 18:01:12 -04:00
Rob Speer
19e74e91c6 declare 'mecab' as an extra
Former-commit-id: a69ea5ad52
2015-07-02 17:11:51 -04:00
Rob Speer
5d0d5f7cd2 declare that tests require mecab-python3
Former-commit-id: 7b4ebd1805
2015-07-02 11:29:11 -04:00
Rob Speer
66ad6f882e add Twitter-specific wordlists
Former-commit-id: 7e3066d3fc
2015-07-01 17:49:33 -04:00
Rob Speer
c9c7e49465 bump version number
Former-commit-id: 053f372ebc
2015-06-30 14:54:13 -04:00
Rob Speer
9a46b80028 clearer error on py2
Former-commit-id: ed19d79c5a
2015-05-28 14:05:11 -04:00
Rob Speer
51f4e4c826 add installation instructions to the readme
Former-commit-id: 0f4ca80026
2015-05-28 14:02:12 -04:00
Rob Speer
c953fc1626 update README, another setup fix
Former-commit-id: dd41e61c57
2015-05-13 04:09:34 -04:00
Rob Speer
5cbc0d0f94 update dependencies
Former-commit-id: f13cca4d81
2015-05-12 12:30:01 -04:00
Rob Speer
6f61cac4cb restore missing line in setup.py
Former-commit-id: bb18f741e2
2015-05-12 12:24:18 -04:00
Rob Speer
1c65cb9f14 add new data files from wordfreq_builder
Former-commit-id: 35aec061de
2015-05-11 18:45:47 -04:00
Rob Speer
9cd6f7c5c5 WIP: burn stuff down
Former-commit-id: 9b63e54471
2015-05-08 15:28:52 -04:00
Rob Speer
732c932ac7 v0.7: make a proper Dutch 'surfaces' list
Former-commit-id: 873ace87db
2015-04-30 13:01:24 -04:00
Rob Speer
63b465c767 Don't download the DB if the right version is already there
Former-commit-id: e931062b5a
2013-10-31 14:12:04 -04:00
Rob Speer
8c3e8f9eb4 try being really nonspecific about functools32 versions
Former-commit-id: c1564908f2
2013-10-31 14:06:06 -04:00
Rob Speer
676cba640f be less specific about the functools32 version
Former-commit-id: 2542cf9e35
2013-10-31 14:02:40 -04:00
Rob Speer
40102a3f63 Normalize words when storing them or looking them up.
2013-10-30 14:59:57 -04:00
Lance
ce07c881c5 Another Py3 change, this one for functools32
2013-10-30 12:06:41 -04:00
Rob Speer
ca5b3e2f5d Implement the data uploady downloady stuff in setup.
2013-10-29 16:44:13 -04:00
Rob Speer
bc00bb3a8b prepare to write custom commands in setup.py
2013-10-29 12:43:41 -04:00
Rob Speer
709ca6be66 Initial version.
Noticeably missing: data files or any way to get them.
2013-10-28 19:26:44 -04:00