589 Commits

Author SHA1 Message Date
Robyn Speer
174ecf580a update dependencies and test for consistent results 2020-09-08 16:03:33 -04:00
Lance Nathan
e3f87d4aed
Merge pull request #77 from LuminosoInsight/regex-apostrophe-fix
Fix regex's inconsistent word breaking around apostrophes
2020-04-28 16:19:40 -04:00
Robyn Speer
becf94f767 update version and changelog 2020-04-28 15:24:24 -04:00
Robyn Speer
96e7792a4a fix regex's inconsistent word breaking around apostrophes 2020-04-28 15:19:56 -04:00
Robyn Speer
3b7382d770 update CHANGELOG for 2.3.1 2020-04-22 11:12:02 -04:00
Robyn Speer
59f4a08920 packaging fix: require msgpack >= 1.0 2020-04-22 11:10:03 -04:00
Lance Nathan
af22c03609
Merge pull request #75 from LuminosoInsight/language-match-update
use langcodes 2.0 and deprecate 'match_cutoff'
2020-04-20 14:48:58 -04:00
Robyn Speer
258670b823 update changelog for 2.3 2020-04-16 15:51:20 -04:00
Robyn Speer
3aeeeb64c7 use langcodes 2.0 and deprecate 'match_cutoff' 2020-04-16 14:09:30 -04:00
Moss Collum
33bfb1409d
Merge pull request #74 from LuminosoInsight/msgpack-1.0-bugfix
Fix code affected by a breaking change in msgpack 1.0
2020-02-28 13:05:37 -05:00
Lance Nathan
86e988b838 Fix code affected by a breaking change in msgpack 1.0
The msgpack readme explains: "Default value of strict_map_key is changed to
True to avoid hashdos. You need to pass strict_map_key=False if you have data
which contain map keys which type is not bytes or str."

chinese.py loads SIMPLIFIED_MAP from disk.  Since it is a str.translate
dictionary, its keys are numbers.  And since it's a dictionary we created
ourselves, there's no hashdos concern, so we can load it with
strict_map_key=False.
2020-02-28 13:02:45 -05:00
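
The reasoning in the message above can be sketched with the standard library alone: a `str.translate` table is keyed by integer code points, which is why a loader that only accepts bytes or str map keys would reject it. This is a minimal illustration, not the code in chinese.py; `example_map` is a made-up two-character stand-in for the real SIMPLIFIED_MAP.

```python
# Hypothetical miniature of a traditional-to-simplified mapping.
# The real SIMPLIFIED_MAP is loaded from disk and is far larger.
example_map = str.maketrans({"體": "体", "國": "国"})

# str.maketrans converts single-character keys to integer code points,
# so every key in the resulting dict is an int, not a str.
key_types = {type(key) for key in example_map}

# str.translate looks up each character of the input by its code point.
simplified = "國體".translate(example_map)

# With msgpack >= 1.0, unpacking a dict like this needs the option quoted
# in the commit message, for example:
#   msgpack.load(infile, raw=False, strict_map_key=False)
```

Because the table is produced locally rather than received from an untrusted source, relaxing the key check does not open the hashdos exposure that the stricter default guards against.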
Lance Nathan
401889d7c8
Merge pull request #73 from LuminosoInsight/add-mailmap
Add a mailmap
2019-12-18 13:59:36 -05:00
Robyn Speer
f91cdb3e9b add a mailmap 2019-12-18 13:52:22 -05:00
Lance Nathan
cea8dcbea9
Merge pull request #71 from LuminosoInsight/pytest-fixes
Fix a deprecation warning by using raw strings
2019-08-14 16:25:42 -04:00
Robyn Speer
55e72977a7 fix a deprecation warning by using raw strings 2019-07-16 17:27:14 -04:00
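
For context on the kind of change this commit describes: in a regular Python string literal, an unrecognized escape such as `\d` triggers a deprecation warning in recent Python versions, while a raw string passes the backslash through untouched. A small sketch follows; the pattern and sample text are illustrative assumptions, not taken from the repository.

```python
import re

# A raw string keeps the backslash literally, so "\d" reaches the regex
# engine without Python first treating it as a string escape.
DIGITS = re.compile(r"\d+")

# Illustrative input only; any text containing digits would do.
found = DIGITS.findall("wordfreq 2.2 and 2.3")
```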
Lance Nathan
170e3c6536
Merge pull request #70 from LuminosoInsight/pytest-fixes
Fixes to scripts that accidentally run during tests
2019-04-16 11:41:27 -04:00
Robyn Speer
1f61c9b27a Protect top_n from running on import 2019-04-16 11:33:22 -04:00
Robyn Speer
bb1bd50c44 ignore the 'scripts' dir when collecting tests 2019-02-20 17:21:07 -05:00
Moss Collum
a17587dcbb
Merge pull request #69 from LuminosoInsight/revert-68-pytest-jenkins
Revert "Build with Pytest on Jenkins"
2019-02-13 18:11:57 -05:00
Moss Collum
26cbb5a7c8
Revert "Build with Pytest on Jenkins" 2019-02-13 18:11:44 -05:00
Lance Nathan
53ec5d87d2
Merge pull request #68 from LuminosoInsight/pytest-jenkins
Build with Pytest on Jenkins
2019-02-13 17:57:16 -05:00
Moss Collum
92c3ca0a66
Build with Pytest on Jenkins 2019-02-13 17:56:20 -05:00
Robyn Speer
0931f1297d update changelog for v2.2.1 2019-02-05 15:58:10 -05:00
Lance Nathan
1442ee044d
Merge pull request #66 from LuminosoInsight/update-msgpack-call
Update msgpack parameter
2019-02-05 11:17:07 -05:00
Robyn Speer
36fd42ca08 update msgpack call in scripts/make_chinese_mapping 2019-02-05 11:16:22 -05:00
Robyn Speer
c7a14cd4ab update encoding='utf-8' to raw=False 2019-02-04 14:57:38 -05:00
Moss Collum
0b69118558 Add Jenkinsfile to drive internal build scripts 2019-02-01 19:05:35 -05:00
Robyn Speer
4cd7b4bada Allow a wider range of 'regex' versions
The behavior of segmentation shouldn't change within this range, and it
includes the version currently used by SpaCy.
2018-10-25 11:07:55 -04:00
Lance Nathan
fa8be1962b
Merge pull request #62 from LuminosoInsight/name-update
Update my name and the Zenodo citation
2018-10-03 17:30:47 -04:00
Robyn Speer
51ca052b62 Update my name and the Zenodo citation 2018-10-03 17:27:10 -04:00
Lance Nathan
bc12599010
Merge pull request #60 from LuminosoInsight/gender-neutral-at
Recognize "@" in gender-neutral word endings as part of the token
2018-07-24 18:16:31 -04:00
Rob Speer
d9fc6ec42c update the changelog for version 2.2 2018-07-23 16:38:39 -04:00
Rob Speer
0644c8920a Update README to describe @ tokenization 2018-07-23 11:21:44 -04:00
Rob Speer
d06a6a48c5 include data from xc rebuild 2018-07-15 01:01:35 -04:00
Rob Speer
b2d242e8bf Recognize "@" in gender-neutral word endings as part of the token 2018-07-03 13:22:56 -04:00
Rob Speer
ca9cf7d90f update the CHANGELOG for MeCab fix 2018-06-26 11:31:03 -04:00
Lance Nathan
3961a28973
Merge pull request #59 from LuminosoInsight/korean-install-fixes
Korean install fixes
2018-06-26 11:08:06 -04:00
Lance Nathan
a619ba6457
Merge pull request #58 from LuminosoInsight/significant-figures
Round wordfreq output to 3 sig. figs, and update documentation
2018-06-25 18:53:39 -04:00
Rob Speer
676686fda1 Fix instructions and search path for mecab-ko-dic
I'm starting a new Python environment on a new Ubuntu installation. You
never know when a huge yak will show up and demand to be shaved.

I tried following the directions in the README, and found that a couple
of steps were missing. I've added those.

When you follow those steps, it appears to install the MeCab Korean
dictionary in `/usr/lib/x86_64-linux-gnu/mecab/dic`, which was none
of the paths we were checking, so I've added that as a search path.
2018-06-21 15:56:54 -04:00
Rob Speer
5e05c942ac doctest the README 2018-06-18 17:11:42 -04:00
Rob Speer
1dc763c9c5 update README and CHANGELOG 2018-06-18 15:21:43 -04:00
Rob Speer
c3b32b3c4a Round frequencies to 3 significant digits 2018-06-18 15:21:33 -04:00
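
Rounding to a fixed number of significant digits, as this commit describes, can be sketched as follows. This is one common way to do it, not necessarily how the repository implements it; `round_sig` is a hypothetical helper name.

```python
import math


def round_sig(value: float, digits: int = 3) -> float:
    """Round value to `digits` significant digits (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the leading digit: -4 for 0.000123, 4 for 12345.
    magnitude = math.floor(math.log10(abs(value)))
    # Keep `digits` digits counting from the leading one.
    return round(value, digits - 1 - magnitude)
```

For small frequencies this keeps the leading digits rather than a fixed number of decimal places, so `round_sig(0.000123456)` gives `0.000123` instead of collapsing toward zero.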
Lance Nathan
0911e90ba0
Merge pull request #57 from LuminosoInsight/version2.1
Version 2.1
2018-06-18 12:06:47 -04:00
Rob Speer
2b85a1cef2 update table in README: Dutch has 5 sources 2018-06-18 11:43:52 -04:00
Rob Speer
52aae3459d fix typo in previous changelog entry 2018-06-18 10:52:28 -04:00
Rob Speer
2f6b87c86b relax the test that assumed the Chinese list has few ASCII words 2018-06-15 16:29:15 -04:00
Rob Speer
57f676f4a6 fixes to tests, including that 'test.py' wasn't found by pytest 2018-06-15 15:48:41 -04:00
Rob Speer
93e3e03c60 update tests to include new languages
Also, it's easy to say `>=` in pytest
2018-06-12 17:55:44 -04:00
Rob Speer
93ddc192d8 bump version to 2.1; add test requirement for pytest 2018-06-12 17:48:24 -04:00
Rob Speer
ff4f7bf3f6 Merge remote-tracking branch 'origin/pytest' into version2.1 2018-06-12 17:46:48 -04:00