Document the version of Unicode used to build the regexes.

Former-commit-id: 9f8464c2d1
Andrew Lin 2015-07-08 18:48:33 -04:00
parent 8b3c5348e3
commit 8961729401


@@ -21,6 +21,12 @@ install them on Ubuntu:
sudo apt-get install mecab-ipadic-utf8 libmecab-dev
pip3 install mecab-python3
## Unicode data
The tokenizers that split non-Japanese phrases rely on regexes built with the
`unicodedata` module from Python 3.4, which implements Unicode version 6.3.0.
To update these regexes, run `scripts/gen_regex.py`.
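For reference, the Unicode version bundled with a given Python build can be read from `unicodedata.unidata_version`. The snippet below is only a sketch of how a category-based character class could be generated with `unicodedata`; the `char_class` helper and the chosen categories are hypothetical illustrations, not the actual contents of `scripts/gen_regex.py`.

```python
import re
import unicodedata

# Unicode version implemented by this Python's unicodedata module;
# Python 3.4 reports '6.3.0'.
print(unicodedata.unidata_version)

# Hypothetical helper: gather every code point whose general category is in
# `categories` and wrap the escaped result in a regex character class.
def char_class(categories):
    chars = ''.join(
        chr(cp) for cp in range(0x110000)
        if unicodedata.category(chr(cp)) in categories
    )
    return '[' + re.escape(chars) + ']'

# Example: treat a token as a run of letters or combining marks.
TOKEN_RE = re.compile(char_class({'Lu', 'Ll', 'Lt', 'Lm', 'Lo',
                                  'Mn', 'Mc', 'Me'}) + '+')
print(TOKEN_RE.findall('¡Hola, mundo! 123'))  # ['Hola', 'mundo']
```

Because the generated class depends on the data tables shipped with `unicodedata`, rerunning such a script under a different Python version can produce a different regex, which is why the Unicode version is worth documenting.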
## License
`wordfreq` is freely redistributable under the MIT license (see