fix version, update instructions and changelog

2024-12-23 17:31:41 +00:00 · 2021-02-18 18:25:16 -05:00 · 2021-02-18 18:25:16 -05:00 · d99ac1051a
commit d99ac1051a
parent 2cc58d68ad
3 changed files with 24 additions and 60 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -1,3 +1,14 @@
 ## Version 2.4.2 (2021-02-19)
 - When tokenizing Japanese or Korean, MeCab's dictionaries no longer have to
  be installed separately as system packages. They can now be found via the
  Python packages `ipadic` and `mecab-ko-dic`.
 - When the tokenizer had to infer word boundaries in languages without spaces,
  inputs that were too long (such as the letter 'l' repeated 800 times) were
  causing overflow errors. We changed the sequence of operations so that it
  no longer overflows, and such inputs simply get a frequency of 0.
 ## Version 2.4.1 (2021-02-09)
 - Changed a log message to not try to call a language by name, to remove
--- a/README.md
+++ b/README.md
@ -381,67 +381,16 @@ Simplified Chinese), you will get the `zh` wordlist, for example.
 ## Additional CJK installation
 Chinese, Japanese, and Korean have additional external dependencies so that
-they can be tokenized correctly. Here we'll explain how to set them up,
+they can be tokenized correctly. They can all be installed at once by requesting
-in increasing order of difficulty.
+the 'cjk' feature:
    pip install wordfreq[cjk]
-### Chinese
+Tokenizing Chinese depends on the `jieba` package, tokenizing Japanese depends
 on `mecab-python` and `ipadic`, and tokenizing Korean depends on `mecab-python`
 and `mecab-ko-dic`.
-To be able to look up word frequencies in Chinese, you need Jieba, a
+As of version 2.4.2, you no longer have to install dictionaries separately.
 pure-Python Chinese tokenizer:
    pip3 install jieba
 ### Japanese
 We use MeCab, by Taku Kudo, to tokenize Japanese. To use this in wordfreq, three
 things need to be installed:
  * The MeCab development library (called `libmecab-dev` on Ubuntu)
  * The UTF-8 version of the `ipadic` Japanese dictionary
    (called `mecab-ipadic-utf8` on Ubuntu)
  * The `mecab-python3` Python interface
 To install these three things on Ubuntu, you can run:
 ```sh
 sudo apt-get install python3-dev libmecab-dev mecab-ipadic-utf8
 pip3 install mecab-python3
 ```
 If you choose to install `ipadic` from somewhere else or from its source code,
 be sure it's configured to use UTF-8. By default it will use EUC-JP, which will
 give you nonsense results.
 ### Korean
 Korean also uses MeCab, with a Korean dictionary package by Yongwoon Lee and
 Yungho Yu. This dictionary is not available as an Ubuntu package.
 Here's a process you can use to install the Korean dictionary and the other
 MeCab dependencies:
 ```sh
 sudo apt-get install libmecab-dev mecab-utils
 pip3 install mecab-python3
 wget https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.1-20150920.tar.gz
 tar xvf mecab-ko-dic-2.0.1-20150920.tar.gz
 cd mecab-ko-dic-2.0.1-20150920
 ./autogen.sh
 ./configure
 make
 sudo make install
 ```
 If wordfreq cannot find the Japanese or Korean data for MeCab when asked to
 tokenize those languages, it will raise an error and show you the list of
 paths it searched.
 Sorry that this is difficult. We tried to just package the data files we need
 with wordfreq, like we do for Chinese, but PyPI would reject the package for
 being too large.
 ## License
--- a/setup.py
+++ b/setup.py
@ -33,7 +33,7 @@ dependencies = [
 setup(
    name="wordfreq",
-    version='2.5.0',
+    version='2.4.2',
    maintainer='Robyn Speer',
    maintainer_email='rspeer@luminoso.com',
    url='http://github.com/LuminosoInsight/wordfreq/',
@ -55,8 +55,12 @@ setup(
    #
    # Similarly, jieba is required for Chinese word frequencies.
    extras_require={
        # previous names for extras
        'mecab': ['mecab-python3', 'ipadic', 'mecab-ko-dic'],
-        'jieba': ['jieba >= 0.42']
+        'jieba': ['jieba >= 0.42'],
        # get them all at once
        'cjk': ['mecab-python3', 'ipadic', 'mecab-ko-dic', 'jieba >= 0.42']
    },
    tests_require=['pytest', 'mecab-python3', 'jieba >= 0.42', 'ipadic', 'mecab-ko-dic'],
 )