Elia Robyn Lake
b24094b726
update changelog
2022-09-26 17:54:52 -04:00
Elia Robyn Lake (Robyn Speer)
287a7602a5
Merge pull request #97 from synapticarbors/patch-1
...
Include license file in source distribution
2022-09-26 17:48:55 -04:00
Elia Robyn Lake
f722248f4d
Merge branch 'master' of github.com:rspeer/wordfreq
2022-09-26 17:47:59 -04:00
Elia Robyn Lake
f1926de486
fix version dependency of regex
2022-09-26 17:47:06 -04:00
Elia Robyn Lake (Robyn Speer)
a6cf11f94d
Merge pull request #105 from xxyzz/add-optional-deps
...
Add extras packages to `tool.poetry.dependencies` in pyproject.toml
2022-09-26 17:46:38 -04:00
xxyzz
fae9f5843d
Add extras packages to tool.poetry.dependencies
in pyproject.toml
...
Extras dependencies need to be added as optional dependencies, otherwise they
won't be installed.
Document: https://python-poetry.org/docs/pyproject/#extras
Poetry GitHub issue: https://github.com/python-poetry/poetry/issues/5604
2022-09-25 13:26:17 +08:00
Elia Robyn Lake
19535d08ef
move mypy to dev dependencies
2022-04-01 12:11:39 -04:00
Elia Robyn Lake
71f2757b8b
packaging updates
2022-03-11 10:43:37 -05:00
Elia Robyn Lake
f893435b75
documentation updates
2022-03-10 19:22:53 -05:00
Elia Robyn Lake
981fab53aa
add py.typed
2022-03-10 19:16:38 -05:00
Elia Robyn Lake
ed7dccbf8b
update version and documentation
2022-03-10 19:12:45 -05:00
Elia Robyn Lake
bf05b1b1dc
estimate the freq distribution of numbers
2022-03-10 18:33:42 -05:00
Elia Robyn Lake
4e373750e8
move notes to self into notes/
2022-03-09 17:22:36 -05:00
Elia Robyn Lake
f800ff9bcc
work on rel. frequencies of numbers, and other features
2022-02-18 11:33:28 -05:00
Elia Robyn Lake
ef4d6fe0df
run black
2022-02-08 18:27:18 -05:00
Elia Robyn Lake
3c4819e7e5
update packaging, try to handle digits better
2022-02-08 18:24:36 -05:00
Joshua Adelman
60f7baba5d
Include license file in source distribution
2021-10-19 15:30:59 -04:00
Elia Robyn Speer
2361606b3a
fix merge conflict markers in setup
2021-09-02 21:49:49 +00:00
Elia Robyn Speer
b60ac1b803
Merge remote-tracking branch 'origin/apostrophe-consistency'
2021-09-02 18:13:53 +00:00
Elia Robyn Speer
c2a9fe03f1
use ftfy's uncurl_quotes in lossy_tokenize
2021-09-02 17:47:47 +00:00
Robyn Speer
6f1f626f1b
update email address
2021-08-23 17:46:34 -04:00
Robyn Speer
c244ff0d10
readme update: web text comes from OSCAR
2021-04-15 14:45:29 -04:00
Sara Jewett
b13d35e503
Merge pull request #91 from LuminosoInsight/data-update-2.5
...
Version 2.5, incorporating OSCAR data
2021-04-15 14:32:10 -04:00
Robyn Speer
16122083b3
XC was built without Russian Web data; reflect this in the table
...
The Russian sub-corpus of OSCAR is corrupted, so we skipped over it in
the exquisite-corpus build.
2021-04-14 14:28:12 -04:00
Robyn Speer
b6614c1a33
Merge branch 'data-update-2.5' of github.com:LuminosoInsight/wordfreq into data-update-2.5
2021-04-14 14:26:54 -04:00
Robyn Speer
08816a21d1
Remove Malayalam; support for it isn't ready
...
There are Unicode normalization problems with Malayalam -- as best I understand
it, Unicode simply neglected to include normalization forms for Malayalam "chillu"
characters even though they changed how they're represented in Unicode 5.1 and
again in Unicode 9.
The result is that words that print the same end up with multiple entries, with
different codepoint sequences that don't normalize to each other.
I certainly don't know how to resolve this, and it would need to be resolved to
have something that we could reasonably call Malayalam word frequencies.
2021-03-30 14:10:58 -04:00
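The normalization problem described in this commit can be reproduced with Python's standard `unicodedata` module. This is a minimal sketch, not code from the repository: the choice of chillu-n as the example character and the helper name `normalized_forms` are assumptions made for illustration.

```python
import unicodedata

# Two encodings of Malayalam chillu-n, picked here purely as an illustration.
ATOMIC_CHILLU_N = "\u0d7b"                # single atomic code point
SEQUENCE_CHILLU_N = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZERO WIDTH JOINER


def normalized_forms(text):
    """Return the text under each of the four standard normalization forms."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, text) for form in forms}


atomic = normalized_forms(ATOMIC_CHILLU_N)
sequence = normalized_forms(SEQUENCE_CHILLU_N)

# Report, per normalization form, whether the two encodings become identical.
for form in atomic:
    print(form, atomic[form] == sequence[form])
```

If the two encodings stay distinct under every form, a word list keyed on normalized strings will keep separate entries for words that render identically, which is the symptom the commit message describes.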
Robyn Speer
90f0e0a88e
Update table, remove Galician (only two sources)
2021-03-30 13:17:36 -04:00
Robyn Speer
9bab1024b7
add OSCAR citation
2021-03-30 12:56:10 -04:00
Robyn Speer
fea45fd501
Merge remote-tracking branch 'origin/master' into data-update-2.5
2021-03-30 12:53:09 -04:00
Robyn Speer
8777ad0811
remove Swahili, the data isn't reliable
2021-03-29 18:15:58 -04:00
Robyn Speer
00e60df106
Merge branch 'master' into data-update-2.5
2021-03-29 16:42:24 -04:00
Robyn Speer
fc5c4cdda8
small documentation fixes
2021-03-29 16:41:47 -04:00
Robyn Speer
ec48c0a123
update data and tests for 2.5
2021-03-29 16:18:08 -04:00
Lance Nathan
32093d9efc
Merge pull request #89 from LuminosoInsight/dependencies-and-tokens
...
Rework CJK dependencies and fix a tokenization bug
2021-02-23 15:15:17 -05:00
Robyn Speer
168bb2a6ed
fix version, update instructions and changelog
2021-02-18 18:25:16 -05:00
Robyn Speer
de636a804e
Use Python packages to find dictionaries for MeCab
2021-02-18 18:18:06 -05:00
Robyn Speer
ed23bf3ebe
specifically test that the long sequence underflows to 0
2021-02-18 15:09:31 -05:00
Robyn Speer
75a56b68fb
change math for INFERRED_SPACE_FACTOR to not overflow
2021-02-18 14:44:39 -05:00
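The two commits above mention an overflow in the `INFERRED_SPACE_FACTOR` math and a test that a long sequence underflows to 0. The sketch below shows one way such a change can behave in Python. It is not the repository's code: the factor value `10.0`, the function names, and the exact formula are assumptions for illustration only.

```python
import math

# Hypothetical value; the real constant is not shown in this log.
INFERRED_SPACE_FACTOR = 10.0


def penalize_by_dividing(freq, n_tokens):
    """Divide by a growing power; the power itself can overflow a float."""
    return freq / INFERRED_SPACE_FACTOR ** (n_tokens - 1)


def penalize_by_multiplying(freq, n_tokens):
    """Multiply by a shrinking power; the power underflows quietly to 0.0."""
    return freq * INFERRED_SPACE_FACTOR ** -(n_tokens - 1)


# For short sequences the two formulations agree up to rounding.
assert math.isclose(penalize_by_dividing(0.01, 3), penalize_by_multiplying(0.01, 3))

# For a very long sequence, the dividing form raises instead of returning 0.
try:
    penalize_by_dividing(0.001, 500)
    overflowed = False
except OverflowError:
    overflowed = True

long_result = penalize_by_multiplying(0.001, 500)
```

Raising a float to a large positive power raises `OverflowError`, while a large negative power underflows to `0.0`, so the multiplying form yields a frequency of zero for very long token sequences rather than an exception.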
Lance Nathan
7318f58df9
Merge pull request #88 from LuminosoInsight/version2.4
...
work with langcodes 3.0, without language_data
2021-02-09 17:36:09 -05:00
Robyn Speer
ad3a5c533f
work with langcodes 3.0, without language_data
2021-02-09 17:27:22 -05:00
Robyn Speer
53b1ee2fa0
Merge pull request #84 from LuminosoInsight/add-initial-vowels
...
Update the "initial vowels" in French/Catalan
2021-02-03 13:47:30 -05:00
Lance Nathan
a31deec580
Update the "initial vowels" in French/Catalan
...
User LBeaudoux observed (https://github.com/LuminosoInsight/wordfreq/pull/82 )
that "Œ and œ should be considered as vowels that might appear at the start of
a word in French". Further investigation of the French wordfreq list revealed
words in the data starting with other vowels (such as d'yvonne, d'åland, l'ïle,
d'özil). This PR is a combination of LBeaudoux's PR and the latter fact.
(The updated regex is also used for Catalan, but should have no actual effect.
To the best of our understanding, "y" appears in Catalan only in the digraph
"ny" and in foreign words--the Catalan wordlist contains "york", "by", "city",
several English names, and so forth, but no real Catalan words starting with
"y"; cf "ioga", "iogurt". The wordlist in fact contained "l'fbi" and "l'nba",
but no cases of "l'" followed by a vowel like the ones found in French.)
2020-10-08 12:23:22 -04:00
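The idea of an "initial vowels" pattern can be sketched with the standard `re` module. This is a hypothetical illustration, not wordfreq's actual regex: the vowel set, the restriction to `d'` and `l'`, and the name `split_elision` are all assumptions, and the input is assumed to use precomposed characters.

```python
import re

# Hypothetical vowel set for illustration; it includes the extra word-initial
# vowels named in the commit message (y, å, ï, ö, œ).
INITIAL_VOWELS = "aeiouyàâåæéèêëîïôöœùûü"

# Match an elided "d'" or "l'" only when a listed vowel follows it.
ELISION_RE = re.compile(rf"^([dl])'(?=[{INITIAL_VOWELS}])", re.IGNORECASE)


def split_elision(token):
    """Split "d'yvonne" into ["d'", "yvonne"]; leave other tokens whole."""
    match = ELISION_RE.match(token)
    if match is None:
        return [token]
    return [token[:match.end()], token[match.end():]]


for example in ["d'yvonne", "l'œuvre", "d'özil", "l'fbi"]:
    print(example, split_elision(example))
```

Because the lookahead requires a vowel, a token such as "l'fbi" is left unsplit under this sketch, which is why widening the vowel set matters only for words that actually begin with the added characters.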
Robyn Speer
c8229a5378
update the changelog
2020-10-01 16:12:41 -04:00
Robyn Speer
fd0ac9a272
update README examples
2020-10-01 16:05:43 -04:00
Robyn Speer
8c00a3c500
updated frequency data
2020-09-30 17:56:12 -04:00
Robyn Speer
ad02d96f1b
update dependencies and test for consistent results
2020-09-08 16:03:33 -04:00
Lance Nathan
ca4681b361
Merge pull request #77 from LuminosoInsight/regex-apostrophe-fix
...
Fix regex's inconsistent word breaking around apostrophes
2020-04-28 16:19:40 -04:00
Robyn Speer
0ff812a711
update version and changelog
2020-04-28 15:24:24 -04:00
Robyn Speer
13ce4606b2
fix regex's inconsistent word breaking around apostrophes
2020-04-28 15:19:56 -04:00
Robyn Speer
86ae2a610f
update CHANGELOG for 2.3.1
2020-04-22 11:12:02 -04:00