Robyn Speer
b042f2be9d
remove unnecessary enumeration from top_n.py
2017-09-08 16:52:06 -04:00
Robyn Speer
46e32fbd36
v1.7: update tokenization, update data, add bn and mk
2017-08-25 17:37:48 -04:00
Robyn Speer
a099a5a881
Remove ninja2dot script, which is no longer used
2017-02-01 14:49:44 -05:00
Robyn Speer
db30d09947
load the Chinese character mapping from a .msgpack.gz file
Former-commit-id: 6cf4210187
2015-09-22 16:32:33 -04:00
Robyn Speer
fe8a6b51e7
document what this file is for
Former-commit-id: 06f8b29971
2015-09-22 15:31:27 -04:00
Robyn Speer
7d1c2e72e4
WIP: Traditional Chinese
Former-commit-id: 7906a671ea
2015-09-04 18:52:37 -04:00
Robyn Speer
f66d03b1b9
Add SUBTLEX as a source of English and Chinese data
Meanwhile, fix up the dependency graph thingy. It's actually kind of
legible now.
Former-commit-id: 2d58ba94f2
2015-09-03 18:13:13 -04:00
Robyn Speer
247d7c6579
update the build diagram and its script
Former-commit-id: 5def3a7897
2015-08-28 17:47:04 -04:00
Robyn Speer
3674d35501
remove obsolete gen_regex.py
Former-commit-id: 102bc715ae
2015-08-24 17:11:18 -04:00
Joshua Chin
3b6b8d3ab1
made single line docstring single line
Former-commit-id: c70ddf00ea
2015-07-20 10:29:02 -04:00
Joshua Chin
4bfdd263b7
added docstring and moved to scripts
Former-commit-id: 5d26c9f57f
2015-07-17 14:56:18 -04:00
Joshua Chin
821fbb1b02
added comment about parsing_range
Former-commit-id: 7706496080
2015-07-10 14:27:48 -04:00
Joshua Chin
6dee3054de
%c is a thing
Former-commit-id: 447fb7aacd
2015-07-10 14:23:06 -04:00
Joshua Chin
c77449785b
merge
Former-commit-id: 2612bc23ff
2015-07-10 14:12:42 -04:00
Joshua Chin
8cbcef9bef
updated func_to_regex to remove end check
Former-commit-id: 87830d138b
2015-07-10 14:10:26 -04:00
Andrew Lin
2262088b5f
Improve variable names.
Former-commit-id: 95da6985d4
2015-07-10 14:02:33 -04:00
Joshua Chin
e23a8c0dc6
created alternate implementation of func-to-regex
Former-commit-id: 7c189ef129
2015-07-10 11:03:57 -04:00
Andrew Lin
54eece5e8c
Clarify the algorithm for range calculation using an explicit variable.
Former-commit-id: 6755741e7d
2015-07-09 16:47:33 -04:00
Andrew Lin
8a3638bc59
Whoops -- put back 'file' as a variable name. (The perils of trusting syntax highlighting...)
Former-commit-id: f591e74663
2015-07-09 16:18:56 -04:00
Andrew Lin
05e14592af
Tweaks to the regex generator for brevity:
* Don't repeat the logic that generates the ranges
* Include only unassigned characters between two accepted ranges; this causes the resulting regexes to be a bit more readable.
* Rearrange the script itself to avoid long lambdas and group helper functions together
* Precompute the list of all the character classes for speed and terseness
Former-commit-id: cc6920d7e4
2015-07-08 15:29:31 -04:00
Joshua Chin
d4409a2214
removed unused imports
Former-commit-id: b9578ae21e
2015-07-07 16:21:22 -04:00
Joshua Chin
7e9338f87e
cleaned up gen regex
Former-commit-id: 27ea107e6f
2015-07-07 16:00:24 -04:00
Joshua Chin
4389422958
updated emoji parser
Former-commit-id: f04ca8fc9e
2015-07-07 15:43:34 -04:00
Joshua Chin
94ba6e650f
updated docstring
Former-commit-id: 9b851f3afe
2015-07-07 15:33:51 -04:00
Joshua Chin
a87d84b796
fixed spacing
Former-commit-id: ae4699029d
2015-07-07 15:23:15 -04:00
Joshua Chin
cb4e444723
fixed gen_regex
Former-commit-id: 5510fce675
2015-07-07 15:22:04 -04:00
Joshua Chin
a408e6f96a
fix grammar
Former-commit-id: bd172594d3
2015-07-07 14:59:28 -04:00
Joshua Chin
02526f658c
updated _emoji_char_class docstring
Former-commit-id: 10b5727725
2015-07-07 14:58:50 -04:00
Joshua Chin
d875aa8842
updated gen_regex to be run as script
Former-commit-id: 22fbea4248
2015-07-07 14:50:56 -04:00
Joshua Chin
3d221f0605
updated imports
Former-commit-id: f2b615b0f0
2015-07-07 14:46:42 -04:00
Joshua Chin
93681e43b3
factored out regex generation
Former-commit-id: 476a909e4d
2015-07-07 14:38:21 -04:00