Elia Robyn Lake (Robyn Speer)
372f6dbb3b
Merge pull request #97 from synapticarbors/patch-1
Include license file in source distribution
2022-09-26 17:48:55 -04:00
Elia Robyn Lake
804b809c9f
Merge branch 'master' of github.com:rspeer/wordfreq
2022-09-26 17:47:59 -04:00
Elia Robyn Lake
1e343970e6
fix version dependency of regex
2022-09-26 17:47:06 -04:00
Elia Robyn Lake (Robyn Speer)
7b46df6895
Merge pull request #105 from xxyzz/add-optional-deps
Add extras packages to `tool.poetry.dependencies` in pyproject.toml
2022-09-26 17:46:38 -04:00
xxyzz
3714db9dd6
Add extras packages to tool.poetry.dependencies in pyproject.toml
Extras dependencies need to be added as optional dependencies, otherwise they
won't be installed.
Documentation: https://python-poetry.org/docs/pyproject/#extras
Poetry GitHub issue: https://github.com/python-poetry/poetry/issues/5604
2022-09-25 13:26:17 +08:00
Elia Robyn Lake
f3074a67be
move mypy to dev dependencies
2022-04-01 12:11:39 -04:00
Elia Robyn Lake
0fc775636b
packaging updates
2022-03-11 10:43:37 -05:00
Elia Robyn Lake
318097264f
documentation updates
2022-03-10 19:22:53 -05:00
Elia Robyn Lake
2738737293
add py.typed
2022-03-10 19:16:38 -05:00
Elia Robyn Lake
2563eb8d72
update version and documentation
2022-03-10 19:12:45 -05:00
Elia Robyn Lake
5d6a41499b
estimate the freq distribution of numbers
2022-03-10 18:33:42 -05:00
Elia Robyn Lake
a01110604b
move notes to self into notes/
2022-03-09 17:22:36 -05:00
Elia Robyn Lake
342c1d0f0e
work on rel. frequencies of numbers, and other features
2022-02-18 11:33:28 -05:00
Elia Robyn Lake
538145c05c
run black
2022-02-08 18:27:18 -05:00
Elia Robyn Lake
91195c793d
update packaging, try to handle digits better
2022-02-08 18:24:36 -05:00
Joshua Adelman
dab4c8da2a
Include license file in source distribution
2021-10-19 15:30:59 -04:00
Elia Robyn Speer
11a3138cea
fix merge conflict markers in setup
2021-09-02 21:49:49 +00:00
Elia Robyn Speer
cc4f39d8c2
Merge remote-tracking branch 'origin/apostrophe-consistency'
2021-09-02 18:13:53 +00:00
Elia Robyn Speer
dc9585766a
use ftfy's uncurl_quotes in lossy_tokenize
2021-09-02 17:47:47 +00:00
Robyn Speer
af847699f6
update email address
2021-08-23 17:46:34 -04:00
Robyn Speer
64bbcbd51b
readme update: web text comes from OSCAR
2021-04-15 14:45:29 -04:00
Sara Jewett
c56e633d53
Merge pull request #91 from LuminosoInsight/data-update-2.5
Version 2.5, incorporating OSCAR data
2021-04-15 14:32:10 -04:00
Robyn Speer
2417ea0d39
XC was built without Russian Web data; reflect this in the table
The Russian sub-corpus of OSCAR is corrupted, so we skipped over it in
the exquisite-corpus build.
2021-04-14 14:28:12 -04:00
Robyn Speer
81bb9f4338
Merge branch 'data-update-2.5' of github.com:LuminosoInsight/wordfreq into data-update-2.5
2021-04-14 14:26:54 -04:00
Robyn Speer
f885a60bf0
Remove Malayalam; support for it isn't ready
There are Unicode normalization problems with Malayalam -- as best I understand
it, Unicode simply neglected to include normalization forms for Malayalam "chillu"
characters even though they changed how they're represented in Unicode 5.1 and
again in Unicode 9.
The result is that words that print the same end up with multiple entries, with
different codepoint sequences that don't normalize to each other.
I certainly don't know how to resolve this, and it would need to be resolved to
have something that we could reasonably call Malayalam word frequencies.
2021-03-30 14:10:58 -04:00
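The normalization problem described in the commit above can be reproduced with the standard library. This is a minimal sketch, not code from wordfreq: it assumes the widely documented legacy encoding of chillu-N (NA + VIRAMA + ZERO WIDTH JOINER) and compares it with the atomic code point U+0D7B. The helper name `normalized_forms_match` is hypothetical.

```python
import unicodedata

# Assumption: legacy spelling of chillu-N as a three-code-point sequence.
LEGACY_CHILLU_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER
# Atomic code point MALAYALAM LETTER CHILLU N.
ATOMIC_CHILLU_N = "\u0d7b"


def normalized_forms_match(a, b):
    """Report, for each Unicode normalization form, whether a and b
    normalize to the same code point sequence."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {
        form: unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    }


matches = normalized_forms_match(LEGACY_CHILLU_N, ATOMIC_CHILLU_N)
for form, same in matches.items():
    print(form, same)
```

No standard normalization form maps one spelling to the other, so a word list keyed on normalized strings can hold two entries for what prints as the same word.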
Robyn Speer
08b6cea451
Update table, remove Galician (only two sources)
2021-03-30 13:17:36 -04:00
Robyn Speer
8fd3d77e4f
add OSCAR citation
2021-03-30 12:56:10 -04:00
Robyn Speer
efdf110351
Merge remote-tracking branch 'origin/master' into data-update-2.5
2021-03-30 12:53:09 -04:00
Robyn Speer
cb78887446
remove Swahili, the data isn't reliable
2021-03-29 18:15:58 -04:00
Robyn Speer
ec2e148f8e
Merge branch 'master' into data-update-2.5
2021-03-29 16:42:24 -04:00
Robyn Speer
4263f1af14
small documentation fixes
2021-03-29 16:41:47 -04:00
Robyn Speer
d1949a486a
update data and tests for 2.5
2021-03-29 16:18:08 -04:00
Lance Nathan
4c0b29f460
Merge pull request #89 from LuminosoInsight/dependencies-and-tokens
Rework CJK dependencies and fix a tokenization bug
2021-02-23 15:15:17 -05:00
Robyn Speer
d99ac1051a
fix version, update instructions and changelog
2021-02-18 18:25:16 -05:00
Robyn Speer
2cc58d68ad
Use Python packages to find dictionaries for MeCab
2021-02-18 18:18:06 -05:00
Robyn Speer
6b97d093b6
specifically test that the long sequence underflows to 0
2021-02-18 15:09:31 -05:00
Robyn Speer
bd57b64d00
change math for INFERRED_SPACE_FACTOR to not overflow
2021-02-18 14:44:39 -05:00
Lance Nathan
02c3cbe3fb
Merge pull request #88 from LuminosoInsight/version2.4
work with langcodes 3.0, without language_data
2021-02-09 17:36:09 -05:00
Robyn Speer
f71acec2d7
work with langcodes 3.0, without language_data
2021-02-09 17:27:22 -05:00
Robyn Speer
7a742499a4
Merge pull request #84 from LuminosoInsight/add-initial-vowels
Update the "initial vowels" in French/Catalan
2021-02-03 13:47:30 -05:00
Lance Nathan
917bcdebaa
Update the "initial vowels" in French/Catalan
User LBeaudoux observed (https://github.com/LuminosoInsight/wordfreq/pull/82)
that "Œ and œ should be considered as vowels that might appear at the start of
a word in French". Further investigation of the French wordfreq list revealed
words in the data starting with other vowels (such as d'yvonne, d'åland, l'ïle,
d'özil). This PR is a combination of LBeaudoux's PR and the latter fact.
(The updated regex is also used for Catalan, but should have no actual effect.
To the best of our understanding, "y" appears in Catalan only in the digraph
"ny" and in foreign words--the Catalan wordlist contains "york", "by", "city",
several English names, and so forth, but no real Catalan words starting with
"y"; cf. "ioga", "iogurt". The wordlist in fact contained "l'fbi" and "l'nba",
but no cases of "l'" followed by a vowel like the ones found in French.)
2020-10-08 12:23:22 -04:00
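The idea in the commit above, treating a wider set of letters as word-initial vowels after an elided article, can be sketched as follows. This is an illustration under stated assumptions, not wordfreq's tokenizer: the vowel set, the pattern `ELISION_RE`, and the helper `split_elision` are all hypothetical, chosen only to cover the examples named in the commit message.

```python
import re

# Illustrative vowel set only; the expression used in wordfreq may differ.
INITIAL_VOWELS = "aeiouyáéíóúàèìòùâêîôûåïöœ"

# One or two word characters, an apostrophe, then a vowel-initial word.
ELISION_RE = re.compile(
    rf"(\w{{1,2}}')([{INITIAL_VOWELS}]\w*)", re.IGNORECASE
)


def split_elision(token):
    """Split "d'yvonne" into ["d'", "yvonne"]; leave other tokens whole."""
    match = ELISION_RE.fullmatch(token)
    if match is None:
        return [token]
    return [match.group(1), match.group(2)]


for token in ["d'yvonne", "l'œuvre", "d'özil", "l'fbi"]:
    print(token, split_elision(token))
```

Tokens such as "l'fbi" stay whole because the letter after the apostrophe is not in the vowel set, which matches the commit's remark that the change should have no practical effect on such entries.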
Robyn Speer
a8915d67f7
update the changelog
2020-10-01 16:12:41 -04:00
Robyn Speer
5986342bc6
update README examples
2020-10-01 16:05:43 -04:00
Robyn Speer
fa98f0b2f6
updated frequency data
2020-09-30 17:56:12 -04:00
Robyn Speer
174ecf580a
update dependencies and test for consistent results
2020-09-08 16:03:33 -04:00
Lance Nathan
e3f87d4aed
Merge pull request #77 from LuminosoInsight/regex-apostrophe-fix
Fix regex's inconsistent word breaking around apostrophes
2020-04-28 16:19:40 -04:00
Robyn Speer
becf94f767
update version and changelog
2020-04-28 15:24:24 -04:00
Robyn Speer
96e7792a4a
fix regex's inconsistent word breaking around apostrophes
2020-04-28 15:19:56 -04:00
Robyn Speer
3b7382d770
update CHANGELOG for 2.3.1
2020-04-22 11:12:02 -04:00
Robyn Speer
59f4a08920
packaging fix: require msgpack >= 1.0
2020-04-22 11:10:03 -04:00