Document the version of Unicode used to build the regexes.
Former-commit-id: 9f8464c2d1
parent 8b3c5348e3
commit 8961729401
@@ -21,6 +21,12 @@ install them on Ubuntu:
     sudo apt-get install mecab-ipadic-utf8 libmecab-dev
     pip3 install mecab-python3
 
+## Unicode data
+
+The tokenizers used to split non-Japanese phrases use regexes built using the
+`unicodedata` module from Python 3.4, which uses Unicode version 6.3.0. To
+update these regexes, run `scripts/gen_regex.py`.
+
 ## License
 
 `wordfreq` is freely redistributable under the MIT license (see
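For context on what "regexes built using the `unicodedata` module" means in practice, here is a minimal, hypothetical sketch of the technique: enumerate code points, collapse the ones whose Unicode general category is wanted into contiguous ranges, and emit a regex character class. The `char_class` helper and the category set below are illustrative assumptions, not the contents of `scripts/gen_regex.py`; the generated pattern depends on the Unicode tables baked into the running interpreter, which is why the README pins the build to Python 3.4 / Unicode 6.3.0.

```python
import re
import unicodedata

# Reports which Unicode version this interpreter's `unicodedata` module was
# built against (6.3.0 on Python 3.4).
print(unicodedata.unidata_version)


def char_class(categories):
    """Build a regex character class matching every code point whose Unicode
    general category is in `categories`, collapsing runs into ranges.
    (Hypothetical helper for illustration only.)"""
    ranges = []
    start = prev = None
    for cp in range(0x110000):
        if unicodedata.category(chr(cp)) in categories:
            if start is None:
                start = cp
            prev = cp
        elif start is not None:
            ranges.append((start, prev))
            start = None
    if start is not None:
        ranges.append((start, prev))
    return '[' + ''.join(
        re.escape(chr(a)) if a == b
        else re.escape(chr(a)) + '-' + re.escape(chr(b))
        for a, b in ranges
    ) + ']'


# Example: match runs of letters, using whatever Unicode tables the running
# interpreter provides.
LETTERS = re.compile(char_class({'Lu', 'Ll', 'Lt', 'Lm', 'Lo'}) + '+')
print(LETTERS.findall('naïve 漢字 café!'))
```

Regenerating the pattern under a newer Python would silently pick up a newer Unicode version, so documenting the version used at build time (as this commit does) keeps the precompiled regexes reproducible.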