Mirror of https://github.com/rspeer/wordfreq.git, synced 2024-12-23 17:31:41 +00:00
Document the version of Unicode used to build the regexes.
Parent: cc6920d7e4
Commit: 9f8464c2d1
@@ -21,6 +21,12 @@ install them on Ubuntu:
 sudo apt-get install mecab-ipadic-utf8 libmecab-dev
 pip3 install mecab-python3
 
+## Unicode data
+
+The tokenizers used to split non-Japanese phrases use regexes built using the
+`unicodedata` module from Python 3.4, which uses Unicode version 6.3.0. To
+update these regexes, run `scripts/gen_regex.py`.
+
 ## License
 
 `wordfreq` is freely redistributable under the MIT license (see
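The new README section ties the tokenizer regexes to whatever Unicode version the running Python's `unicodedata` module ships. As a minimal sketch of that idea (not the contents of wordfreq's `scripts/gen_regex.py`, which may select different categories and emit a different format), the snippet below reports the bundled version via `unicodedata.unidata_version` and builds a regex character class from Unicode general categories. The helper name `category_class` and the choice of letter (`L*`) and mark (`M*`) categories are assumptions for illustration.

```python
import re
import sys
import unicodedata

# Version of the Unicode Character Database bundled with this Python build.
# Regexes generated from it change whenever this version changes.
print("Unicode", unicodedata.unidata_version)


def category_class(prefixes):
    """Hypothetical helper: return a regex character class matching every
    code point whose general category starts with one of `prefixes`."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        match = unicodedata.category(chr(cp)).startswith(prefixes)
        if match and start is None:
            start = cp                      # a run of matching code points begins
        elif not match and start is not None:
            ranges.append((start, cp - 1))  # the run ended at the previous code point
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))

    parts = []
    for lo, hi in ranges:
        if lo == hi:
            parts.append(re.escape(chr(lo)))
        else:
            parts.append(re.escape(chr(lo)) + "-" + re.escape(chr(hi)))
    return "[" + "".join(parts) + "]"


# Runs of letters and combining marks; digits and punctuation split tokens.
letters = re.compile(category_class(("L", "M")) + "+")
print(letters.findall("wordfreq counts naïve words, 2024"))
# → ['wordfreq', 'counts', 'naïve', 'words']
```

Scanning every code point takes on the order of a second, which is why a generator script that writes the finished regex to a file, rather than rebuilding it at import time, is a reasonable design; the trade-off is that the file must be regenerated to pick up a newer Unicode version.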