Robyn Speer
5986342bc6
update README examples
2020-10-01 16:05:43 -04:00
Robyn Speer
1f61c9b27a
Protect top_n from running on import
2019-04-16 11:33:22 -04:00
Robyn Speer
36fd42ca08
update msgpack call in scripts/make_chinese_mapping
2019-02-05 11:16:22 -05:00
Rob Speer
61b2e4062d
remove unnecessary enumeration from top_n.py
2017-09-08 16:52:06 -04:00
Rob Speer
e3352392cc
v1.7: update tokenization, update data, add bn and mk
2017-08-25 17:37:48 -04:00
Rob Speer
b5b653f0a1
Remove ninja2dot script, which is no longer used
2017-02-01 14:49:44 -05:00
Rob Speer
42ccba4fa6
load the Chinese character mapping from a .msgpack.gz file
Former-commit-id: 6cf4210187
2015-09-22 16:32:33 -04:00
Rob Speer
e12a42f38a
document what this file is for
Former-commit-id: 06f8b29971
2015-09-22 15:31:27 -04:00
Rob Speer
e2a3758832
WIP: Traditional Chinese
Former-commit-id: 7906a671ea
2015-09-04 18:52:37 -04:00
Rob Speer
cb5b696ffa
Add SUBTLEX as a source of English and Chinese data
Meanwhile, fix up the dependency graph thingy. It's actually kind of
legible now.
Former-commit-id: 2d58ba94f2
2015-09-03 18:13:13 -04:00
Rob Speer
4aac7bdd65
update the build diagram and its script
Former-commit-id: 5def3a7897
2015-08-28 17:47:04 -04:00
Rob Speer
759a8199fb
remove obsolete gen_regex.py
Former-commit-id: 102bc715ae
2015-08-24 17:11:18 -04:00
Joshua Chin
40ba602c10
made single line docstring single line
Former-commit-id: c70ddf00ea
2015-07-20 10:29:02 -04:00
Joshua Chin
2180f71296
added docstring and moved to scripts
Former-commit-id: 5d26c9f57f
2015-07-17 14:56:18 -04:00
Joshua Chin
0594eb65c2
added comment about parsing_range
Former-commit-id: 7706496080
2015-07-10 14:27:48 -04:00
Joshua Chin
275b761fe1
%c is a thing
Former-commit-id: 447fb7aacd
2015-07-10 14:23:06 -04:00
Joshua Chin
1578a1eb0d
merge
Former-commit-id: 2612bc23ff
2015-07-10 14:12:42 -04:00
Joshua Chin
b3ae254f87
updated func_to_regex to remove end check
Former-commit-id: 87830d138b
2015-07-10 14:10:26 -04:00
Andrew Lin
b77cb1ac75
Improve variable names.
Former-commit-id: 95da6985d4
2015-07-10 14:02:33 -04:00
Joshua Chin
648f15e997
created alternate implementation of func-to-regex
Former-commit-id: 7c189ef129
2015-07-10 11:03:57 -04:00
Andrew Lin
8e89671560
Clarify the algorithm for range calculation using an explicit variable.
Former-commit-id: 6755741e7d
2015-07-09 16:47:33 -04:00
Andrew Lin
2b7d1249f6
Whoops -- put back 'file' as a variable name. (The perils of trusting syntax highlighting...)
Former-commit-id: f591e74663
2015-07-09 16:18:56 -04:00
Andrew Lin
8b3c5348e3
Tweaks to the regex generator for brevity:
* Don't repeat the logic that generates the ranges
* Include only unassigned characters between two accepted ranges; this causes the resulting regexes to be a bit more readable.
* Rearrange the script itself to avoid long lambdas and group helper functions together
* Precompute the list of all the character classes for speed and terseness
Former-commit-id: cc6920d7e4
2015-07-08 15:29:31 -04:00
Joshua Chin
b145e02ce4
removed unused imports
Former-commit-id: b9578ae21e
2015-07-07 16:21:22 -04:00
Joshua Chin
4d3123e2ee
cleaned up gen regex
Former-commit-id: 27ea107e6f
2015-07-07 16:00:24 -04:00
Joshua Chin
a5dc6eb5fc
updated emoji parser
Former-commit-id: f04ca8fc9e
2015-07-07 15:43:34 -04:00
Joshua Chin
0589bed362
updated docstring
Former-commit-id: 9b851f3afe
2015-07-07 15:33:51 -04:00
Joshua Chin
950e41c8bb
fixed spacing
Former-commit-id: ae4699029d
2015-07-07 15:23:15 -04:00
Joshua Chin
aeea503739
fixed gen_regex
Former-commit-id: 5510fce675
2015-07-07 15:22:04 -04:00
Joshua Chin
f1e71839ea
fix grammar
Former-commit-id: bd172594d3
2015-07-07 14:59:28 -04:00
Joshua Chin
589bb624af
updated _emoji_char_class docstring
Former-commit-id: 10b5727725
2015-07-07 14:58:50 -04:00
Joshua Chin
b81c04a182
updated gen_regex to be run as script
Former-commit-id: 22fbea4248
2015-07-07 14:50:56 -04:00
Joshua Chin
20c4930435
updated imports
Former-commit-id: f2b615b0f0
2015-07-07 14:46:42 -04:00
Joshua Chin
6deced5244
factored out regex generation
Former-commit-id: 476a909e4d
2015-07-07 14:38:21 -04:00