650 Commits

Author SHA1 Message Date
Elia Robyn Lake (Robyn Speer)
bafaf71cdd
Sep 2024 update based on popular coverage 2024-09-22 20:58:30 -04:00
Elia Robyn Lake (Robyn Speer)
146fbae1b3
Update SUNSET.md
Remove a misinterpretable sentence about Reddit data
2024-09-19 11:33:53 -04:00
Elia Robyn Lake
7fcbe64c84 sunset: rephrase a couple of paragraphs 2024-06-25 11:00:43 -04:00
Elia Robyn Lake
9e033608b2 update more text 2024-06-24 19:05:20 -04:00
Elia Robyn Lake
b2e1f68ac8 update docs 2024-06-24 19:02:22 -04:00
Elia Robyn Lake (Robyn Speer)
ca7055b667 Merge pull request #109 from rspeer/dependabot/pip/pygments-2.15.0
Bump pygments from 2.13.0 to 2.15.0
2023-11-21 18:14:40 -05:00
dependabot[bot]
d9799e3d00 Bump pygments from 2.13.0 to 2.15.0
Bumps [pygments](https://github.com/pygments/pygments) from 2.13.0 to 2.15.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.13.0...2.15.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-21 23:14:14 +00:00
Elia Robyn Lake
acea20fdf9 update changelog 2023-11-21 18:11:47 -05:00
Elia Robyn Lake
355ef19cfa v3.1.1: fix README 2023-11-21 18:08:47 -05:00
Elia Robyn Lake
891c20919a Merge branch 'master' of github.com:rspeer/wordfreq 2023-11-21 18:07:48 -05:00
Elia Robyn Lake
2be781fd1a v3.1: support py3.12, update formatting, replace pkg_resources with locate 2023-11-21 18:07:04 -05:00
Elia Robyn Lake
c889bd67ed fix mixed up language codes 2023-11-16 09:31:38 -05:00
Elia Robyn Lake
6fc77b4b29 simplify deps by updating pytest 2022-10-25 14:25:00 -04:00
Elia Robyn Lake
59103c52b9 update citation 2022-10-25 14:24:32 -04:00
Elia Robyn Lake
ba424e6c2d update to Apache license 2022-10-25 14:20:23 -04:00
Elia Robyn Lake
d8f10b7fb8 fix tox setup, test on python 3.11 2022-10-25 13:59:13 -04:00
Elia Robyn Lake
b24094b726 update changelog 2022-09-26 17:54:52 -04:00
Elia Robyn Lake (Robyn Speer)
287a7602a5 Merge pull request #97 from synapticarbors/patch-1
Include license file in source distribution
2022-09-26 17:48:55 -04:00
Elia Robyn Lake
f722248f4d Merge branch 'master' of github.com:rspeer/wordfreq 2022-09-26 17:47:59 -04:00
Elia Robyn Lake
f1926de486 fix version dependency of regex 2022-09-26 17:47:06 -04:00
Elia Robyn Lake (Robyn Speer)
a6cf11f94d Merge pull request #105 from xxyzz/add-optional-deps
Add extras packages to `tool.poetry.dependencies` in pyproject.toml
2022-09-26 17:46:38 -04:00
xxyzz
fae9f5843d Add extras packages to tool.poetry.dependencies in pyproject.toml
Extras dependencies need to be added as optional dependencies, otherwise they
won't be installed.

Document: https://python-poetry.org/docs/pyproject/#extras
Poetry GitHub issue: https://github.com/python-poetry/poetry/issues/5604
2022-09-25 13:26:17 +08:00
Elia Robyn Lake
19535d08ef move mypy to dev dependencies 2022-04-01 12:11:39 -04:00
Elia Robyn Lake
71f2757b8b packaging updates 2022-03-11 10:43:37 -05:00
Elia Robyn Lake
f893435b75 documentation updates 2022-03-10 19:22:53 -05:00
Elia Robyn Lake
981fab53aa add py.typed 2022-03-10 19:16:38 -05:00
Elia Robyn Lake
ed7dccbf8b update version and documentation 2022-03-10 19:12:45 -05:00
Elia Robyn Lake
bf05b1b1dc estimate the freq distribution of numbers 2022-03-10 18:33:42 -05:00
Elia Robyn Lake
4e373750e8 move notes to self into notes/ 2022-03-09 17:22:36 -05:00
Elia Robyn Lake
f800ff9bcc work on rel. frequencies of numbers, and other features 2022-02-18 11:33:28 -05:00
Elia Robyn Lake
ef4d6fe0df run black 2022-02-08 18:27:18 -05:00
Elia Robyn Lake
3c4819e7e5 update packaging, try to handle digits better 2022-02-08 18:24:36 -05:00
Joshua Adelman
60f7baba5d Include license file in source distribution 2021-10-19 15:30:59 -04:00
Elia Robyn Speer
2361606b3a fix merge conflict markers in setup 2021-09-02 21:49:49 +00:00
Elia Robyn Speer
b60ac1b803 Merge remote-tracking branch 'origin/apostrophe-consistency' 2021-09-02 18:13:53 +00:00
Elia Robyn Speer
c2a9fe03f1 use ftfy's uncurl_quotes in lossy_tokenize 2021-09-02 17:47:47 +00:00
Robyn Speer
6f1f626f1b update email address 2021-08-23 17:46:34 -04:00
Robyn Speer
c244ff0d10 readme update: web text comes from OSCAR 2021-04-15 14:45:29 -04:00
Sara Jewett
b13d35e503 Merge pull request #91 from LuminosoInsight/data-update-2.5
Version 2.5, incorporating OSCAR data
2021-04-15 14:32:10 -04:00
Robyn Speer
16122083b3 XC was built without Russian Web data; reflect this in the table
The Russian sub-corpus of OSCAR is corrupted, so we skipped over it in
the exquisite-corpus build.
2021-04-14 14:28:12 -04:00
Robyn Speer
b6614c1a33 Merge branch 'data-update-2.5' of github.com:LuminosoInsight/wordfreq into data-update-2.5 2021-04-14 14:26:54 -04:00
Robyn Speer
08816a21d1 Remove Malayalam; support for it isn't ready
There are Unicode normalization problems with Malayalam -- as best I understand
it, Unicode simply neglected to include normalization forms for Malayalam "chillu"
characters even though they changed how they're represented in Unicode 5.1 and
again in Unicode 9.

The result is that words that print the same end up with multiple entries, with
different codepoint sequences that don't normalize to each other.

I certainly don't know how to resolve this, and it would need to be resolved to
have something that we could reasonably call Malayalam word frequencies.
2021-03-30 14:10:58 -04:00
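A minimal Python sketch of the normalization problem described in the Malayalam commit above. It assumes the atomic chillu N letter (U+0D7B, added in Unicode 5.1) and the older NA + VIRAMA + ZWJ sequence are the two competing encodings. The helper name `same_after_normalization` is hypothetical and illustrative only; this is not code from wordfreq.

```python
import unicodedata

# Two encodings of the Malayalam chillu N (an illustrative assumption):
# U+0D7B is the atomic "chillu N" letter introduced in Unicode 5.1;
# the older form is NA (U+0D28) + VIRAMA (U+0D4D) + ZERO WIDTH JOINER (U+200D).
ATOMIC_CHILLU_N = "\u0d7b"
SEQUENCE_CHILLU_N = "\u0d28\u0d4d\u200d"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def same_after_normalization(a: str, b: str, form: str) -> bool:
    """Return True if a and b become identical under the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    for form in FORMS:
        # The atomic chillu has no decomposition mapping, so no form maps one
        # encoding onto the other. A frequency counter keyed on normalized
        # strings would therefore keep two separate entries for the same word.
        print(form, same_after_normalization(ATOMIC_CHILLU_N, SEQUENCE_CHILLU_N, form))
```

Each line should print `False`. By contrast, a precomposed "é" and "e" + U+0301 do compare equal under NFC.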
Robyn Speer
90f0e0a88e Update table, remove Galician (only two sources) 2021-03-30 13:17:36 -04:00
Robyn Speer
9bab1024b7 add OSCAR citation 2021-03-30 12:56:10 -04:00
Robyn Speer
fea45fd501 Merge remote-tracking branch 'origin/master' into data-update-2.5 2021-03-30 12:53:09 -04:00
Robyn Speer
8777ad0811 remove Swahili, the data isn't reliable 2021-03-29 18:15:58 -04:00
Robyn Speer
00e60df106 Merge branch 'master' into data-update-2.5 2021-03-29 16:42:24 -04:00
Robyn Speer
fc5c4cdda8 small documentation fixes 2021-03-29 16:41:47 -04:00
Robyn Speer
ec48c0a123 update data and tests for 2.5 2021-03-29 16:18:08 -04:00
Lance Nathan
32093d9efc Merge pull request #89 from LuminosoInsight/dependencies-and-tokens
Rework CJK dependencies and fix a tokenization bug
2021-02-23 15:15:17 -05:00