639 Commits

Author SHA1 Message Date
Robyn Speer
ad02d96f1b update dependencies and test for consistent results 2020-09-08 16:03:33 -04:00
Lance Nathan
ca4681b361 Merge pull request #77 from LuminosoInsight/regex-apostrophe-fix
Fix regex's inconsistent word breaking around apostrophes
2020-04-28 16:19:40 -04:00
Robyn Speer
0ff812a711 update version and changelog 2020-04-28 15:24:24 -04:00
Robyn Speer
13ce4606b2 fix regex's inconsistent word breaking around apostrophes 2020-04-28 15:19:56 -04:00
Robyn Speer
86ae2a610f update CHANGELOG for 2.3.1 2020-04-22 11:12:02 -04:00
Robyn Speer
26b4175f3b packaging fix: require msgpack >= 1.0 2020-04-22 11:10:03 -04:00
Lance Nathan
7c537134ae Merge pull request #75 from LuminosoInsight/language-match-update
use langcodes 2.0 and deprecate 'match_cutoff'
2020-04-20 14:48:58 -04:00
Robyn Speer
d45bcf97de update changelog for 2.3 2020-04-16 15:51:20 -04:00
Robyn Speer
bf795e6d6c use langcodes 2.0 and deprecate 'match_cutoff' 2020-04-16 14:09:30 -04:00
Moss Collum
40443c9a3b Merge pull request #74 from LuminosoInsight/msgpack-1.0-bugfix
Fix code affected by a breaking change in msgpack 1.0
2020-02-28 13:05:37 -05:00
Lance Nathan
45a002c1e1 Fix code affected by a breaking change in msgpack 1.0
The msgpack readme explains: "Default value of strict_map_key is changed to
True to avoid hashdos. You need to pass strict_map_key=False if you have data
which contain map keys which type is not bytes or str."

chinese.py loads SIMPLIFIED_MAP from disk.  Since it is a str.translate
dictionary, its keys are numbers.  And since it's a dictionary we created
ourselves, there's no hashdos concern, so we can load it with
strict_map_key=False.
2020-02-28 13:02:45 -05:00
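
The reasoning in commit 45a002c1e1 can be sketched roughly as follows. A `str.translate` table is keyed by integer code points, which is exactly the kind of map key that msgpack 1.0 rejects under its default `strict_map_key=True`. The two-entry `EXAMPLE_MAP` and the helper names `simplify` and `load_translate_map` are hypothetical illustrations, not the contents of chinese.py; only the `strict_map_key=False` argument comes from the commit message.

```python
# Hypothetical two-entry table in the shape str.translate expects:
# keys are integer code points, values are replacement strings.
EXAMPLE_MAP = {ord("漢"): "汉", ord("語"): "语"}


def simplify(text, table=EXAMPLE_MAP):
    """Apply a translate table to text, character by character."""
    return text.translate(table)


def load_translate_map(packed_bytes):
    """Unpack a msgpack-encoded translate table (hypothetical helper).

    strict_map_key=False permits the integer keys. That is only
    reasonable here because the data is a trusted file generated
    locally, so hash-collision attacks are not a concern.
    """
    import msgpack  # third-party; imported lazily so the rest runs without it

    return msgpack.unpackb(packed_bytes, raw=False, strict_map_key=False)
```

With this table, `simplify("漢語")` maps each traditional character to its simplified form and leaves any character without an entry unchanged.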
Lance Nathan
e043ebb481 Merge pull request #73 from LuminosoInsight/add-mailmap
Add a mailmap
2019-12-18 13:59:36 -05:00
Robyn Speer
feab8b77fb add a mailmap 2019-12-18 13:52:22 -05:00
Lance Nathan
5f085b2c17 Merge pull request #71 from LuminosoInsight/pytest-fixes
Fix a deprecation warning by using raw strings
2019-08-14 16:25:42 -04:00
Robyn Speer
7690bd5b49 fix a deprecation warning by using raw strings 2019-07-16 17:27:14 -04:00
Lance Nathan
832d8f2fdd Merge pull request #70 from LuminosoInsight/pytest-fixes
Fixes to scripts that accidentally run during tests
2019-04-16 11:41:27 -04:00
Robyn Speer
3d02a88b14 Protect top_n from running on import 2019-04-16 11:33:22 -04:00
Robyn Speer
17b1537f2f ignore the 'scripts' dir when collecting tests 2019-02-20 17:21:07 -05:00
Moss Collum
90bbacb5cb Merge pull request #69 from LuminosoInsight/revert-68-pytest-jenkins
Revert "Build with Pytest on Jenkins"
2019-02-13 18:11:57 -05:00
Moss Collum
50ea040d65 Revert "Build with Pytest on Jenkins" 2019-02-13 18:11:44 -05:00
Lance Nathan
f467504835 Merge pull request #68 from LuminosoInsight/pytest-jenkins
Build with Pytest on Jenkins
2019-02-13 17:57:16 -05:00
Moss Collum
e014f1abf7 Build with Pytest on Jenkins 2019-02-13 17:56:20 -05:00
Robyn Speer
a3834180c9 update changelog for v2.2.1 2019-02-05 15:58:10 -05:00
Lance Nathan
96b9808550 Merge pull request #66 from LuminosoInsight/update-msgpack-call
Update msgpack parameter
2019-02-05 11:17:07 -05:00
Robyn Speer
dd72051929 update msgpack call in scripts/make_chinese_mapping 2019-02-05 11:16:22 -05:00
Robyn Speer
61a1604b38 update encoding='utf-8' to raw=False 2019-02-04 14:57:38 -05:00
Moss Collum
65a6a89993 Add Jenkinsfile to drive internal build scripts 2019-02-01 19:05:35 -05:00
Robyn Speer
d30183a7d7 Allow a wider range of 'regex' versions
The behavior of segmentation shouldn't change within this range, and it
includes the version currently used by SpaCy.
2018-10-25 11:07:55 -04:00
Lance Nathan
c1fe37bab5 Merge pull request #62 from LuminosoInsight/name-update
Update my name and the Zenodo citation
2018-10-03 17:30:47 -04:00
Robyn Speer
563e8f7444 Update my name and the Zenodo citation 2018-10-03 17:27:10 -04:00
Lance Nathan
2f8600e975 Merge pull request #60 from LuminosoInsight/gender-neutral-at
Recognize "@" in gender-neutral word endings as part of the token
2018-07-24 18:16:31 -04:00
Robyn Speer
287df17a71 update the changelog for version 2.2 2018-07-23 16:38:39 -04:00
Robyn Speer
f73406c69a Update README to describe @ tokenization 2018-07-23 11:21:44 -04:00
Robyn Speer
86b928f967 include data from xc rebuild 2018-07-15 01:01:35 -04:00
Robyn Speer
65692c3d81 Recognize "@" in gender-neutral word endings as part of the token 2018-07-03 13:22:56 -04:00
Robyn Speer
7bf69595bb update the CHANGELOG for MeCab fix 2018-06-26 11:31:03 -04:00
Lance Nathan
0149e9ec7f Merge pull request #59 from LuminosoInsight/korean-install-fixes
Korean install fixes
2018-06-26 11:08:06 -04:00
Lance Nathan
79caa526c3 Merge pull request #58 from LuminosoInsight/significant-figures
Round wordfreq output to 3 sig. figs, and update documentation
2018-06-25 18:53:39 -04:00
Robyn Speer
830157d8e4 Fix instructions and search path for mecab-ko-dic
I'm starting a new Python environment on a new Ubuntu installation. You
never know when a huge yak will show up and demand to be shaved.

I tried following the directions in the README, and found that a couple
of steps were missing. I've added those.

When you follow those steps, it appears to install the MeCab Korean
dictionary in `/usr/lib/x86_64-linux-gnu/mecab/dic`, which was none
of the paths we were checking, so I've added that as a search path.
2018-06-21 15:56:54 -04:00
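
The search-path fix described in commit 830157d8e4 amounts to checking a list of candidate directories in order and using the first one that exists. A minimal sketch of that idea follows, assuming a helper named `find_dictionary` (hypothetical, not the project's actual function). Only the first directory in `SEARCH_PATHS` is taken from the commit message; the second is a placeholder standing in for whichever paths were already being checked.

```python
import os

# Only the first directory comes from the commit message; the second is a
# hypothetical placeholder for the paths that were checked previously.
SEARCH_PATHS = [
    "/usr/lib/x86_64-linux-gnu/mecab/dic",
    "/usr/local/lib/mecab/dic",
]


def find_dictionary(name, search_paths=SEARCH_PATHS):
    """Return the first existing directory <path>/<name>, or raise OSError."""
    for base in search_paths:
        candidate = os.path.join(base, name)
        if os.path.isdir(candidate):
            return candidate
    raise OSError(
        "Couldn't find the dictionary %r in any of: %s"
        % (name, ", ".join(search_paths))
    )
```

Listing every path that was tried in the error message makes a missing dictionary easier to diagnose on a fresh installation.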
Robyn Speer
fdf064b234 doctest the README 2018-06-18 17:11:42 -04:00
Robyn Speer
c6552f923f update README and CHANGELOG 2018-06-18 15:21:43 -04:00
Robyn Speer
7a32b56c1c Round frequencies to 3 significant digits 2018-06-18 15:21:33 -04:00
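
Rounding to 3 significant digits, as commit 7a32b56c1c describes, can be done by deriving the number of decimal places from the order of magnitude of the value. The function below is a generic sketch under that assumption; the name `round_sig` is hypothetical and the project's own implementation may differ.

```python
import math


def round_sig(value, digits=3):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    # Position of the leading digit: 0 for 1-9, -1 for 0.1-0.9, and so on.
    magnitude = math.floor(math.log10(abs(value)))
    # Keep (digits - 1) further digits after the leading one.
    return round(value, digits - 1 - magnitude)
```

For example, `round_sig(0.0271828)` keeps the digits 2, 7, 1 and rounds on the following 8, giving a value of 0.0272.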
Lance Nathan
a95b360563 Merge pull request #57 from LuminosoInsight/version2.1
Version 2.1
2018-06-18 12:06:47 -04:00
Robyn Speer
39a1308770 update table in README: Dutch has 5 sources 2018-06-18 11:43:52 -04:00
Robyn Speer
0280f82496 fix typo in previous changelog entry 2018-06-18 10:52:28 -04:00
Robyn Speer
42efcfc1ad relax the test that assumed the Chinese list has few ASCII words 2018-06-15 16:29:15 -04:00
Robyn Speer
ad0f046f47 fixes to tests, including that 'test.py' wasn't found by pytest 2018-06-15 15:48:41 -04:00
Robyn Speer
a975bcedae update tests to include new languages
Also, it's easy to say `>=` in pytest
2018-06-12 17:55:44 -04:00
Robyn Speer
4b7e3d9655 bump version to 2.1; add test requirement for pytest 2018-06-12 17:48:24 -04:00
Robyn Speer
3259c4a375 Merge remote-tracking branch 'origin/pytest' into version2.1 2018-06-12 17:46:48 -04:00