Robyn Speer
174ecf580a
update dependencies and test for consistent results
2020-09-08 16:03:33 -04:00
Lance Nathan
e3f87d4aed
Merge pull request #77 from LuminosoInsight/regex-apostrophe-fix
...
Fix regex's inconsistent word breaking around apostrophes
2020-04-28 16:19:40 -04:00
Robyn Speer
becf94f767
update version and changelog
2020-04-28 15:24:24 -04:00
Robyn Speer
96e7792a4a
fix regex's inconsistent word breaking around apostrophes
2020-04-28 15:19:56 -04:00
Robyn Speer
3b7382d770
update CHANGELOG for 2.3.1
2020-04-22 11:12:02 -04:00
Robyn Speer
59f4a08920
packaging fix: require msgpack >= 1.0
2020-04-22 11:10:03 -04:00
Lance Nathan
af22c03609
Merge pull request #75 from LuminosoInsight/language-match-update
...
use langcodes 2.0 and deprecate 'match_cutoff'
2020-04-20 14:48:58 -04:00
Robyn Speer
258670b823
update changelog for 2.3
2020-04-16 15:51:20 -04:00
Robyn Speer
3aeeeb64c7
use langcodes 2.0 and deprecate 'match_cutoff'
2020-04-16 14:09:30 -04:00
Moss Collum
33bfb1409d
Merge pull request #74 from LuminosoInsight/msgpack-1.0-bugfix
...
Fix code affected by a breaking change in msgpack 1.0
2020-02-28 13:05:37 -05:00
Lance Nathan
86e988b838
Fix code affected by a breaking change in msgpack 1.0
...
The msgpack readme explains: "Default value of strict_map_key is changed to
True to avoid hashdos. You need to pass strict_map_key=False if you have data
which contain map keys which type is not bytes or str."
chinese.py loads SIMPLIFIED_MAP from disk. Since it is a str.translate
dictionary, its keys are numbers. And since it's a dictionary we created
ourselves, there's no hashdos concern, so we can load it with
strict_map_key=False.
2020-02-28 13:02:45 -05:00
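The reasoning in the commit above can be sketched as follows. This is an illustration, not the code in chinese.py: the two-entry `table` is a hypothetical stand-in for SIMPLIFIED_MAP, and the msgpack round trip only runs if msgpack happens to be installed.

```python
# A str.translate table maps integer code points to replacement strings,
# so its keys are ints, not str or bytes.
# Hypothetical two-entry table; the real SIMPLIFIED_MAP is far larger.
table = {ord("漢"): "汉", ord("語"): "语"}

simplified = "漢語".translate(table)

# Round-trip the table through msgpack when it is available.
restored = table
try:
    import msgpack
except ImportError:
    msgpack = None

if msgpack is not None:
    packed = msgpack.packb(table, use_bin_type=True)
    # With msgpack >= 1.0, integer map keys are rejected unless
    # strict_map_key=False is passed. That is safe here because the
    # data is self-generated rather than untrusted input.
    restored = msgpack.unpackb(packed, raw=False, strict_map_key=False)
```

Leaving out `strict_map_key=False` under msgpack 1.0 makes the unpack step fail on the first integer key, which is the breakage the commit describes.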
Lance Nathan
401889d7c8
Merge pull request #73 from LuminosoInsight/add-mailmap
...
Add a mailmap
2019-12-18 13:59:36 -05:00
Robyn Speer
f91cdb3e9b
add a mailmap
2019-12-18 13:52:22 -05:00
Lance Nathan
cea8dcbea9
Merge pull request #71 from LuminosoInsight/pytest-fixes
...
Fix a deprecation warning by using raw strings
2019-08-14 16:25:42 -04:00
Robyn Speer
55e72977a7
fix a deprecation warning by using raw strings
2019-07-16 17:27:14 -04:00
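For context, a minimal sketch of the kind of change this commit describes. The pattern and input string are hypothetical and not taken from the repository.

```python
import re

# In a normal string literal, "\d" is an invalid escape sequence and newer
# Python versions warn about it. A raw string passes the backslash through
# to the regex engine unchanged and avoids the warning.
DIGITS = re.compile(r"\d+")

found = DIGITS.findall("wordfreq 2.2.1")

# The raw literal is the same string as the explicitly escaped one.
same = r"\d+" == "\\d+"
```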
Lance Nathan
170e3c6536
Merge pull request #70 from LuminosoInsight/pytest-fixes
...
Fixes to scripts that accidentally run during tests
2019-04-16 11:41:27 -04:00
Robyn Speer
1f61c9b27a
Protect top_n from running on import
2019-04-16 11:33:22 -04:00
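The usual way to keep a script from running on import is a `__main__` guard. The sketch below assumes that is what the commit does; `top_items` and `main` are hypothetical names, not the contents of the actual script.

```python
from collections import Counter


def top_items(counts, n):
    """Return the n most frequent (item, count) pairs."""
    return Counter(counts).most_common(n)


def main():
    # Hypothetical demo data; a real script would read its input here.
    for word, count in top_items({"the": 3, "of": 2, "yak": 1}, 2):
        print(word, count)


# Test collectors import modules to look for tests. Without this guard,
# main() would run as a side effect of that import.
if __name__ == "__main__":
    main()
```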
Robyn Speer
bb1bd50c44
ignore the 'scripts' dir when collecting tests
2019-02-20 17:21:07 -05:00
Moss Collum
a17587dcbb
Merge pull request #69 from LuminosoInsight/revert-68-pytest-jenkins
...
Revert "Build with Pytest on Jenkins"
2019-02-13 18:11:57 -05:00
Moss Collum
26cbb5a7c8
Revert "Build with Pytest on Jenkins"
2019-02-13 18:11:44 -05:00
Lance Nathan
53ec5d87d2
Merge pull request #68 from LuminosoInsight/pytest-jenkins
...
Build with Pytest on Jenkins
2019-02-13 17:57:16 -05:00
Moss Collum
92c3ca0a66
Build with Pytest on Jenkins
2019-02-13 17:56:20 -05:00
Robyn Speer
0931f1297d
update changelog for v2.2.1
2019-02-05 15:58:10 -05:00
Lance Nathan
1442ee044d
Merge pull request #66 from LuminosoInsight/update-msgpack-call
...
Update msgpack parameter
2019-02-05 11:17:07 -05:00
Robyn Speer
36fd42ca08
update msgpack call in scripts/make_chinese_mapping
2019-02-05 11:16:22 -05:00
Robyn Speer
c7a14cd4ab
update encoding='utf-8' to raw=False
2019-02-04 14:57:38 -05:00
Moss Collum
0b69118558
Add Jenkinsfile to drive internal build scripts
2019-02-01 19:05:35 -05:00
Robyn Speer
4cd7b4bada
Allow a wider range of 'regex' versions
...
The behavior of segmentation shouldn't change within this range, and it
includes the version currently used by SpaCy.
2018-10-25 11:07:55 -04:00
Lance Nathan
fa8be1962b
Merge pull request #62 from LuminosoInsight/name-update
...
Update my name and the Zenodo citation
2018-10-03 17:30:47 -04:00
Robyn Speer
51ca052b62
Update my name and the Zenodo citation
2018-10-03 17:27:10 -04:00
Lance Nathan
bc12599010
Merge pull request #60 from LuminosoInsight/gender-neutral-at
...
Recognize "@" in gender-neutral word endings as part of the token
2018-07-24 18:16:31 -04:00
Rob Speer
d9fc6ec42c
update the changelog for version 2.2
2018-07-23 16:38:39 -04:00
Rob Speer
0644c8920a
Update README to describe @ tokenization
2018-07-23 11:21:44 -04:00
Rob Speer
d06a6a48c5
include data from xc rebuild
2018-07-15 01:01:35 -04:00
Rob Speer
b2d242e8bf
Recognize "@" in gender-neutral word endings as part of the token
2018-07-03 13:22:56 -04:00
Rob Speer
ca9cf7d90f
update the CHANGELOG for MeCab fix
2018-06-26 11:31:03 -04:00
Lance Nathan
3961a28973
Merge pull request #59 from LuminosoInsight/korean-install-fixes
...
Korean install fixes
2018-06-26 11:08:06 -04:00
Lance Nathan
a619ba6457
Merge pull request #58 from LuminosoInsight/significant-figures
...
Round wordfreq output to 3 sig. figs, and update documentation
2018-06-25 18:53:39 -04:00
Rob Speer
676686fda1
Fix instructions and search path for mecab-ko-dic
...
I'm starting a new Python environment on a new Ubuntu installation. You
never know when a huge yak will show up and demand to be shaved.
I tried following the directions in the README, and found that a couple
of steps were missing. I've added those.
When you follow those steps, it appears to install the MeCab Korean
dictionary in `/usr/lib/x86_64-linux-gnu/mecab/dic`, which was none
of the paths we were checking, so I've added that as a search path.
2018-06-21 15:56:54 -04:00
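A search-path lookup like the one described above might look roughly like this. It is a sketch under assumptions: `find_first_existing` is a hypothetical helper, and only the first entry of `CANDIDATE_DIRS` comes from the commit message; the second is a placeholder.

```python
import os
import tempfile


def find_first_existing(candidates):
    """Return the first candidate that is an existing directory, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


CANDIDATE_DIRS = [
    "/usr/lib/x86_64-linux-gnu/mecab/dic",  # path named in the commit message
    "/hypothetical/other/mecab/dic",        # placeholder, not from the source
]

# May be None on machines without MeCab installed.
dict_dir = find_first_existing(CANDIDATE_DIRS)

# Demonstration with a directory that is guaranteed to exist.
with tempfile.TemporaryDirectory() as tmp:
    demo = find_first_existing(["/no/such/dir", tmp])
    demo_found = demo == tmp
```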
Rob Speer
5e05c942ac
doctest the README
2018-06-18 17:11:42 -04:00
Rob Speer
1dc763c9c5
update README and CHANGELOG
2018-06-18 15:21:43 -04:00
Rob Speer
c3b32b3c4a
Round frequencies to 3 significant digits
2018-06-18 15:21:33 -04:00
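Rounding to three significant digits can be sketched as below. This is one simple way to do it, not necessarily how wordfreq implements it; `round_sig` is a hypothetical helper name.

```python
def round_sig(x, digits=3):
    """Round x to the given number of significant digits."""
    # The 'g' format keeps `digits` significant digits; converting back
    # to float discards the string formatting.
    return float(f"{x:.{digits}g}")


small = round_sig(0.0123456)
large = round_sig(123456.0)
```

Unlike `round(x, 3)`, which keeps three decimal places, this keeps three meaningful digits whatever the magnitude, which suits word frequencies spanning several orders of magnitude.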
Lance Nathan
0911e90ba0
Merge pull request #57 from LuminosoInsight/version2.1
...
Version 2.1
2018-06-18 12:06:47 -04:00
Rob Speer
2b85a1cef2
update table in README: Dutch has 5 sources
2018-06-18 11:43:52 -04:00
Rob Speer
52aae3459d
fix typo in previous changelog entry
2018-06-18 10:52:28 -04:00
Rob Speer
2f6b87c86b
relax the test that assumed the Chinese list has few ASCII words
2018-06-15 16:29:15 -04:00
Rob Speer
57f676f4a6
fixes to tests, including that 'test.py' wasn't found by pytest
2018-06-15 15:48:41 -04:00
Rob Speer
93e3e03c60
update tests to include new languages
...
Also, it's easy to say `>=` in pytest
2018-06-12 17:55:44 -04:00
Rob Speer
93ddc192d8
bump version to 2.1; add test requirement for pytest
2018-06-12 17:48:24 -04:00
Rob Speer
ff4f7bf3f6
Merge remote-tracking branch 'origin/pytest' into version2.1
2018-06-12 17:46:48 -04:00