Fix code affected by a breaking change in msgpack 1.0

The msgpack readme explains: "Default value of strict_map_key is changed to
True to avoid hashdos. You need to pass strict_map_key=False if you have data
which contain map keys which type is not bytes or str."

chinese.py loads SIMPLIFIED_MAP from disk.  Since it is a str.translate
dictionary, its keys are integers (Unicode code points) rather than str or
bytes.  And since it's a dictionary we created ourselves, there's no hashdos
concern, so we can load it with strict_map_key=False.
Lance Nathan 2020-02-28 12:51:18 -05:00
parent e043ebb481
commit 45a002c1e1
3 changed files with 8 additions and 2 deletions
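
The commit message's point that a str.translate dictionary is keyed by numbers can be illustrated with a minimal sketch. The two-character table below is a hypothetical stand-in, not the contents of the real mapping file.

```python
# Hypothetical two-entry table; the real SIMPLIFIED_MAP is far larger.
table = str.maketrans({"漢": "汉", "語": "语"})

# str.maketrans converts single-character keys to integer code points,
# which is the key type that str.translate looks up.
print(all(isinstance(key, int) for key in table))

# Translation works with the integer-keyed table.
print("漢語".translate(table))

# A table keyed by str is silently ignored, because str.translate
# looks characters up by their code point.
print("漢語".translate({"漢": "汉"}))
```

Because msgpack preserves these integer keys on disk, loading the file back yields map keys that are neither str nor bytes, which is exactly the case the new strict default rejects.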

@@ -1,3 +1,9 @@
+## Version 2.2.2 (2020-02-28)
+Library change:
+- Fixed an incompatibility with newly-released `msgpack 1.0`.
 ## Version 2.2.1 (2019-02-05)
 Library changes:

@@ -35,7 +35,7 @@ if sys.version_info < (3, 4):
 setup(
     name="wordfreq",
-    version='2.2.1',
+    version='2.2.2',
     maintainer='Robyn Speer',
     maintainer_email='rspeer@luminoso.com',
     url='http://github.com/LuminosoInsight/wordfreq/',

@@ -6,7 +6,7 @@ import gzip
 DICT_FILENAME = resource_filename('wordfreq', 'data/jieba_zh.txt')
 ORIG_DICT_FILENAME = resource_filename('wordfreq', 'data/jieba_zh_orig.txt')
 SIMP_MAP_FILENAME = resource_filename('wordfreq', 'data/_chinese_mapping.msgpack.gz')
-SIMPLIFIED_MAP = msgpack.load(gzip.open(SIMP_MAP_FILENAME), raw=False)
+SIMPLIFIED_MAP = msgpack.load(gzip.open(SIMP_MAP_FILENAME), raw=False, strict_map_key=False)
 jieba_tokenizer = None
 jieba_orig_tokenizer = None
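
The changed call can be tried in isolation with a rough sketch like the one below. It assumes a msgpack version that accepts the strict_map_key argument. EXAMPLE_MAP, dump_mapping and load_mapping are hypothetical names used only for illustration, and the import guard exists only so the sketch still runs where msgpack is not installed; wordfreq itself requires msgpack.

```python
import gzip
import io

try:
    import msgpack  # third-party; assumed to accept strict_map_key
except ImportError:
    msgpack = None  # guard for this sketch only

# Hypothetical stand-in for the contents of the real mapping file.
EXAMPLE_MAP = {ord("漢"): "汉", ord("語"): "语"}


def dump_mapping(mapping):
    """Serialize an int-keyed mapping to gzip-compressed msgpack bytes."""
    return gzip.compress(msgpack.packb(mapping, use_bin_type=True))


def load_mapping(blob):
    """Load the mapping back, allowing map keys that are not str or bytes."""
    with gzip.open(io.BytesIO(blob)) as stream:
        # strict_map_key=False is acceptable here only because the data
        # is trusted: we produced it ourselves.
        return msgpack.load(stream, raw=False, strict_map_key=False)


if msgpack is not None:
    restored = load_mapping(dump_mapping(EXAMPLE_MAP))
    print("漢語".translate(restored))
```

For data from an untrusted source, leaving strict_map_key at its default is the safer choice, since that default exists to limit hash-collision attacks.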