wordfreq

mirror of https://github.com/rspeer/wordfreq.git synced 2024-12-23 17:31:41 +00:00

Author	SHA1	Message	Date
Elia Robyn Lake	bf05b1b1dc	estimate the freq distribution of numbers	2022-03-10 18:33:42 -05:00
Elia Robyn Speer	c2a9fe03f1	use ftfy's uncurl_quotes in lossy_tokenize	2021-09-02 17:47:47 +00:00
Robyn Speer	08816a21d1	Remove Malayalam; support for it isn't ready. There are Unicode normalization problems with Malayalam -- as best I understand it, Unicode simply neglected to include normalization forms for Malayalam "chillu" characters even though they changed how they're represented in Unicode 5.1 and again in Unicode 9. The result is that words that print the same end up with multiple entries, with different codepoint sequences that don't normalize to each other. I certainly don't know how to resolve this, and it would need to be resolved to have something that we could reasonably call Malayalam word frequencies.	2021-03-30 14:10:58 -04:00
Robyn Speer	90f0e0a88e	Update table, remove Galician (only two sources)	2021-03-30 13:17:36 -04:00
Robyn Speer	8777ad0811	remove Swahili, the data isn't reliable	2021-03-29 18:15:58 -04:00
Robyn Speer	ec48c0a123	update data and tests for 2.5	2021-03-29 16:18:08 -04:00
Robyn Speer	7a32b56c1c	Round frequencies to 3 significant digits	2018-06-18 15:21:33 -04:00
Robyn Speer	42efcfc1ad	relax the test that assumed the Chinese list has few ASCII words	2018-06-15 16:29:15 -04:00
Robyn Speer	ad0f046f47	fixes to tests, including that 'test.py' wasn't found by pytest	2018-06-15 15:48:41 -04:00
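
Note on commit 08816a21d1: the normalization problem it describes can be reproduced with Python's standard `unicodedata` module. The sketch below is an illustration only and is not taken from the wordfreq code. The code points are an assumed example of the issue: chillu N as the single code point U+0D7B (available since Unicode 5.1) versus the older sequence NA + virama + ZERO WIDTH JOINER. The helper names `normalized_forms` and `unify_under_any_form` are hypothetical.

```python
import unicodedata

# Assumed example pair (not from the commit message):
# chillu N as one atomic code point, encoded since Unicode 5.1.
ATOMIC_CHILLU_N = "\u0d7b"
# The older way to write the same letter: NA + virama + ZERO WIDTH JOINER.
LEGACY_CHILLU_N = "\u0d28\u0d4d\u200d"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_forms(text):
    """Return the text under each of the four standard normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def unify_under_any_form(a, b):
    """Return True if some standard normalization form maps a and b to the same string."""
    forms_a = normalized_forms(a)
    forms_b = normalized_forms(b)
    return any(forms_a[form] == forms_b[form] for form in FORMS)


if __name__ == "__main__":
    for form in FORMS:
        atomic = unicodedata.normalize(form, ATOMIC_CHILLU_N)
        legacy = unicodedata.normalize(form, LEGACY_CHILLU_N)
        # Show the code points that remain after normalization.
        print(form,
              [f"U+{ord(c):04X}" for c in atomic],
              [f"U+{ord(c):04X}" for c in legacy])
    print("unified by any form:",
          unify_under_any_form(ATOMIC_CHILLU_N, LEGACY_CHILLU_N))
```

Because neither spelling is changed by any of the four forms, a frequency list keyed on normalized strings would count a word containing this letter under two separate entries, which is the duplication the commit message reports.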