wordfreq

mirror of https://github.com/rspeer/wordfreq.git synced 2024-12-23 17:31:41 +00:00

Author	SHA1	Message	Date
Elia Robyn Lake	2be781fd1a	v3.1: support py3.12, update formatting, replace pkg_resources with locate	2023-11-21 18:07:04 -05:00
Elia Robyn Lake	bf05b1b1dc	estimate the freq distribution of numbers	2022-03-10 18:33:42 -05:00
Elia Robyn Speer	b60ac1b803	Merge remote-tracking branch 'origin/apostrophe-consistency'	2021-09-02 18:13:53 +00:00
Robyn Speer	ed23bf3ebe	specifically test that the long sequence underflows to 0	2021-02-18 15:09:31 -05:00
Robyn Speer	75a56b68fb	change math for INFERRED_SPACE_FACTOR to not overflow	2021-02-18 14:44:39 -05:00
Robyn Speer	ad02d96f1b	update dependencies and test for consistent results	2020-09-08 16:03:33 -04:00
Robyn Speer	86b928f967	include data from xc rebuild	2018-07-15 01:01:35 -04:00
Robyn Speer	75b4d62084	port test.py and test_chinese.py to pytest	2018-06-01 16:33:06 -04:00
Robyn Speer	8e3dff3c1c	Traditional Chinese should be preserved through tokenization	2018-03-08 18:08:55 -05:00
Robyn Speer	5ab5d2ea55	Separate preprocessing from tokenization	2018-03-08 16:26:17 -05:00
Robyn Speer	71a0ad6abb	Use langcodes when tokenizing again (it no longer connects to a DB)	2017-04-27 15:09:59 -04:00
Robyn Speer	7dc3f03ebd	import new wordlists from Exquisite Corpus	2017-01-05 17:59:26 -05:00
Robyn Speer	4a4534c466	test_chinese: fix typo in comment Former-commit-id: `2a84a926f5`	2015-09-24 13:41:11 -04:00
Robyn Speer	1adbb1aaf1	add `external_wordlist` option to tokenize Former-commit-id: `669bd16c13`	2015-09-10 18:09:41 -04:00
Robyn Speer	f0c7c3a02c	Lower the frequency of phrases with inferred token boundaries Former-commit-id: `5c8c36f4e3`	2015-09-10 14:16:22 -04:00
Robyn Speer	a4554fb87c	tokenize Chinese using jieba and our own frequencies Former-commit-id: `2327f2e4d6`	2015-09-05 03:16:56 -04:00

16 Commits