wordfreq

mirror of https://github.com/rspeer/wordfreq.git synced 2024-12-23 09:21:37 +00:00

Author	SHA1	Message	Date
Robyn Speer	0a2bfb2710	Tokenization in Korean, plus abjad languages (#38) * Remove marks from more languages * Add Korean tokenization, and include MeCab files in data * add a Hebrew tokenization test * fix terminology in docstrings about abjad scripts * combine Japanese and Korean tokenization into the same function Former-commit-id: `fec6eddcc3`	2016-07-15 15:10:25 -04:00
Andrew Lin	e27a75029d	Revert "Remove the no-longer-existent .txt files from the MANIFEST." This reverts commit `2089090151` [formerly `db41bc7902`]. Former-commit-id: `cd0797e1c8`	2015-09-24 13:31:34 -04:00
Andrew Lin	2089090151	Remove the no-longer-existent .txt files from the MANIFEST. Former-commit-id: `db41bc7902`	2015-09-02 14:27:15 -04:00
Joshua Chin	7c8266aeb7	removes combining marks from arabic words instead of treating them as punctuation Former-commit-id: `cebca52ea3`	2015-06-25 12:36:41 -04:00
Joshua Chin	0ddf0220fa	added non_punct to MANIFEST.in and moved it into data Former-commit-id: `b198f4b0c2`	2015-06-24 17:30:01 -04:00
Robyn Speer	aa0e844b81	add new data files from wordfreq_builder Former-commit-id: `35aec061de`	2015-05-11 18:45:47 -04:00