Robyn Speer
fd0ac9a272
update README examples
2020-10-01 16:05:43 -04:00
Robyn Speer
8c00a3c500
updated frequency data
2020-09-30 17:56:12 -04:00
Lance Nathan
ca4681b361
Merge pull request #77 from LuminosoInsight/regex-apostrophe-fix
Fix regex's inconsistent word breaking around apostrophes
2020-04-28 16:19:40 -04:00
Robyn Speer
0ff812a711
update version and changelog
2020-04-28 15:24:24 -04:00
Robyn Speer
13ce4606b2
fix regex's inconsistent word breaking around apostrophes
2020-04-28 15:19:56 -04:00
Robyn Speer
86ae2a610f
update CHANGELOG for 2.3.1
2020-04-22 11:12:02 -04:00
Robyn Speer
26b4175f3b
packaging fix: require msgpack >= 1.0
2020-04-22 11:10:03 -04:00
Lance Nathan
7c537134ae
Merge pull request #75 from LuminosoInsight/language-match-update
use langcodes 2.0 and deprecate 'match_cutoff'
2020-04-20 14:48:58 -04:00
Robyn Speer
d45bcf97de
update changelog for 2.3
2020-04-16 15:51:20 -04:00
Robyn Speer
bf795e6d6c
use langcodes 2.0 and deprecate 'match_cutoff'
2020-04-16 14:09:30 -04:00
Moss Collum
40443c9a3b
Merge pull request #74 from LuminosoInsight/msgpack-1.0-bugfix
Fix code affected by a breaking change in msgpack 1.0
2020-02-28 13:05:37 -05:00
Lance Nathan
45a002c1e1
Fix code affected by a breaking change in msgpack 1.0
The msgpack readme explains: "Default value of strict_map_key is changed to
True to avoid hashdos. You need to pass strict_map_key=False if you have data
which contain map keys which type is not bytes or str."
chinese.py loads SIMPLIFIED_MAP from disk. Since it is a str.translate
dictionary, its keys are numbers. And since it's a dictionary we created
ourselves, there's no hashdos concern, so we can load it with
strict_map_key=False.
2020-02-28 13:02:45 -05:00
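As a sketch of the change described in commit 45a002c1e1: a str.translate table is keyed by integer code points, which is why msgpack 1.0 needs strict_map_key=False to load it. The two-character table below is a stand-in, not the real SIMPLIFIED_MAP, and `roundtrip` is a hypothetical helper. The msgpack step assumes msgpack >= 1.0 is installed and is skipped otherwise.

```python
# Stand-in for a str.translate table; the real SIMPLIFIED_MAP is far larger.
table = str.maketrans({"愛": "爱", "國": "国"})

# str.maketrans converts one-character keys to integer code points.
assert all(isinstance(key, int) for key in table)
simplified = "愛國".translate(table)

try:
    import msgpack  # third-party; assumed to be version 1.0 or later
except ImportError:
    msgpack = None


def roundtrip(mapping):
    """Pack and unpack a mapping with integer keys (hypothetical helper)."""
    if msgpack is None:
        # msgpack is not installed: return a plain copy so the sketch still runs.
        return dict(mapping)
    packed = msgpack.packb(mapping)
    # With the msgpack 1.0 default (strict_map_key=True), integer keys are rejected.
    return msgpack.unpackb(packed, raw=False, strict_map_key=False)


restored = roundtrip(table)
```

Passing strict_map_key=False is reasonable here only because the data is produced by the project itself, as the commit message notes.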
Lance Nathan
e043ebb481
Merge pull request #73 from LuminosoInsight/add-mailmap
Add a mailmap
2019-12-18 13:59:36 -05:00
Robyn Speer
feab8b77fb
add a mailmap
2019-12-18 13:52:22 -05:00
Lance Nathan
5f085b2c17
Merge pull request #71 from LuminosoInsight/pytest-fixes
Fix a deprecation warning by using raw strings
2019-08-14 16:25:42 -04:00
Robyn Speer
7690bd5b49
fix a deprecation warning by using raw strings
2019-07-16 17:27:14 -04:00
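For context on commit 7690bd5b49: Python warns about unrecognized escape sequences such as `\d` in ordinary string literals, and a raw string avoids the warning without changing the pattern. The pattern and input below are illustrative assumptions, not taken from the repository.

```python
import re

# A raw string keeps the backslash literally, so "\d" reaches the regex engine
# unchanged and no invalid-escape warning is triggered.
raw_pattern = r"\d+"

# The equivalent ordinary literal has to double the backslash.
escaped_pattern = "\\d+"

same_pattern = raw_pattern == escaped_pattern
digits = re.findall(raw_pattern, "version 2.2.1")
```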
Lance Nathan
832d8f2fdd
Merge pull request #70 from LuminosoInsight/pytest-fixes
Fixes to scripts that accidentally run during tests
2019-04-16 11:41:27 -04:00
Robyn Speer
3d02a88b14
Protect top_n from running on import
2019-04-16 11:33:22 -04:00
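The usual way to keep a script from running when a test collector imports it is a `__main__` guard. A minimal sketch follows; the body and signature of `top_n` here are hypothetical stand-ins, not the script's actual code.

```python
def top_n(words, n):
    """Hypothetical stand-in: return the first n items of a word list."""
    return list(words)[:n]


def main():
    # Work that should happen only when the file is executed as a script.
    print(top_n(["the", "of", "and"], 2))


# Importing this module (as pytest does during collection) skips this branch.
if __name__ == "__main__":
    main()
```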
Robyn Speer
17b1537f2f
ignore the 'scripts' dir when collecting tests
2019-02-20 17:21:07 -05:00
Moss Collum
90bbacb5cb
Merge pull request #69 from LuminosoInsight/revert-68-pytest-jenkins
Revert "Build with Pytest on Jenkins"
2019-02-13 18:11:57 -05:00
Moss Collum
50ea040d65
Revert "Build with Pytest on Jenkins"
2019-02-13 18:11:44 -05:00
Lance Nathan
f467504835
Merge pull request #68 from LuminosoInsight/pytest-jenkins
Build with Pytest on Jenkins
2019-02-13 17:57:16 -05:00
Moss Collum
e014f1abf7
Build with Pytest on Jenkins
2019-02-13 17:56:20 -05:00
Robyn Speer
a3834180c9
update changelog for v2.2.1
2019-02-05 15:58:10 -05:00
Lance Nathan
96b9808550
Merge pull request #66 from LuminosoInsight/update-msgpack-call
Update msgpack parameter
2019-02-05 11:17:07 -05:00
Robyn Speer
dd72051929
update msgpack call in scripts/make_chinese_mapping
2019-02-05 11:16:22 -05:00
Robyn Speer
61a1604b38
update encoding='utf-8' to raw=False
2019-02-04 14:57:38 -05:00
Moss Collum
65a6a89993
Add Jenkinsfile to drive internal build scripts
2019-02-01 19:05:35 -05:00
Robyn Speer
d30183a7d7
Allow a wider range of 'regex' versions
The behavior of segmentation shouldn't change within this range, and it
includes the version currently used by SpaCy.
2018-10-25 11:07:55 -04:00
Lance Nathan
c1fe37bab5
Merge pull request #62 from LuminosoInsight/name-update
Update my name and the Zenodo citation
2018-10-03 17:30:47 -04:00
Robyn Speer
563e8f7444
Update my name and the Zenodo citation
2018-10-03 17:27:10 -04:00
Lance Nathan
2f8600e975
Merge pull request #60 from LuminosoInsight/gender-neutral-at
Recognize "@" in gender-neutral word endings as part of the token
2018-07-24 18:16:31 -04:00
Robyn Speer
287df17a71
update the changelog for version 2.2
2018-07-23 16:38:39 -04:00
Robyn Speer
f73406c69a
Update README to describe @ tokenization
2018-07-23 11:21:44 -04:00
Robyn Speer
86b928f967
include data from xc rebuild
2018-07-15 01:01:35 -04:00
Robyn Speer
65692c3d81
Recognize "@" in gender-neutral word endings as part of the token
2018-07-03 13:22:56 -04:00
Robyn Speer
7bf69595bb
update the CHANGELOG for MeCab fix
2018-06-26 11:31:03 -04:00
Lance Nathan
0149e9ec7f
Merge pull request #59 from LuminosoInsight/korean-install-fixes
Korean install fixes
2018-06-26 11:08:06 -04:00
Lance Nathan
79caa526c3
Merge pull request #58 from LuminosoInsight/significant-figures
Round wordfreq output to 3 sig. figs, and update documentation
2018-06-25 18:53:39 -04:00
Robyn Speer
830157d8e4
Fix instructions and search path for mecab-ko-dic
I'm starting a new Python environment on a new Ubuntu installation. You
never know when a huge yak will show up and demand to be shaved.
I tried following the directions in the README, and found that a couple
of steps were missing. I've added those.
When you follow those steps, it appears to install the MeCab Korean
dictionary in `/usr/lib/x86_64-linux-gnu/mecab/dic`, which was none
of the paths we were checking, so I've added that as a search path.
2018-06-21 15:56:54 -04:00
Robyn Speer
fdf064b234
doctest the README
2018-06-18 17:11:42 -04:00
Robyn Speer
c6552f923f
update README and CHANGELOG
2018-06-18 15:21:43 -04:00
Robyn Speer
7a32b56c1c
Round frequencies to 3 significant digits
2018-06-18 15:21:33 -04:00
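One way to round a value to 3 significant digits is sketched below. This is a generic illustration using only the standard library; `round_sig` is a hypothetical name, and this is not necessarily how commit 7a32b56c1c implements the rounding.

```python
import math


def round_sig(value, figures=3):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. -4 for 0.00012345.
    magnitude = math.floor(math.log10(abs(value)))
    # Keep `figures` digits, counting from the leading digit.
    return round(value, figures - 1 - magnitude)


small = round_sig(0.00012345)
large = round_sig(1234.5)
```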
Lance Nathan
a95b360563
Merge pull request #57 from LuminosoInsight/version2.1
Version 2.1
2018-06-18 12:06:47 -04:00
Robyn Speer
39a1308770
update table in README: Dutch has 5 sources
2018-06-18 11:43:52 -04:00
Robyn Speer
0280f82496
fix typo in previous changelog entry
2018-06-18 10:52:28 -04:00
Robyn Speer
42efcfc1ad
relax the test that assumed the Chinese list has few ASCII words
2018-06-15 16:29:15 -04:00
Robyn Speer
ad0f046f47
fixes to tests, including that 'test.py' wasn't found by pytest
2018-06-15 15:48:41 -04:00
Robyn Speer
a975bcedae
update tests to include new languages
Also, it's easy to say `>=` in pytest
2018-06-12 17:55:44 -04:00
Robyn Speer
4b7e3d9655
bump version to 2.1; add test requirement for pytest
2018-06-12 17:48:24 -04:00