wordfreq

mirror of https://github.com/rspeer/wordfreq.git synced 2024-12-23 09:21:37 +00:00

Robyn Speer 08816a21d1 Remove Malayalam; support for it isn't ready	2021-03-30 14:10:58 -04:00

There are Unicode normalization problems with Malayalam -- as best I understand it, Unicode simply neglected to include normalization forms for Malayalam "chillu" characters even though they changed how they're represented in Unicode 5.1 and again in Unicode 9. The result is that words that print the same end up with multiple entries, with different codepoint sequences that don't normalize to each other.

I certainly don't know how to resolve this, and it would need to be resolved to have something that we could reasonably call Malayalam word frequencies.
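
The normalization problem described in the commit message above can be reproduced with Python's standard library. This is a minimal sketch, not code from wordfreq: the two code point sequences are an assumed example (the atomic chillu-n letter added in Unicode 5.1, and the older NA + VIRAMA + ZERO WIDTH JOINER convention), and the helper names `normalized_forms` and `ever_equal` are hypothetical.

```python
import unicodedata

# Two encodings of Malayalam chillu-n that are meant to render alike.
# These code points are illustrative assumptions, not taken from wordfreq.
ATOMIC_CHILLU_N = "\u0d7b"                # MALAYALAM LETTER CHILLU N
SEQUENCE_CHILLU_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_forms(text):
    """Return the text under each of the four standard normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def ever_equal(a, b):
    """True if any normalization form maps both strings to the same sequence."""
    forms_a, forms_b = normalized_forms(a), normalized_forms(b)
    return any(forms_a[form] == forms_b[form] for form in FORMS)


if __name__ == "__main__":
    # If no form unifies the two spellings, a frequency list keyed on
    # normalized text would count them as two separate words.
    print(ever_equal(ATOMIC_CHILLU_N, SEQUENCE_CHILLU_N))
```

If the check reports that no form unifies the two spellings, a word list built from mixed sources would hold separate entries for what a reader sees as one word, which is the situation the commit describes.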
test_at_sign.py	include data from xc rebuild	2018-07-15 01:01:35 -04:00
test_chinese.py	specifically test that the long sequence underflows to 0	2021-02-18 15:09:31 -05:00
test_french_and_related.py	fix regex's inconsistent word breaking around apostrophes	2020-04-28 15:19:56 -04:00
test_general.py	Remove Malayalam; support for it isn't ready	2021-03-30 14:10:58 -04:00
test_japanese.py	Round frequencies to 3 significant digits	2018-06-18 15:21:33 -04:00
test_korean.py	Round frequencies to 3 significant digits	2018-06-18 15:21:33 -04:00
test_transliteration.py	port remaining tests to pytest	2018-06-01 16:40:51 -04:00