31 Commits

Author SHA1 Message Date
Robyn Speer
b042f2be9d remove unnecessary enumeration from top_n.py 2017-09-08 16:52:06 -04:00
Robyn Speer
46e32fbd36 v1.7: update tokenization, update data, add bn and mk 2017-08-25 17:37:48 -04:00
Robyn Speer
a099a5a881 Remove ninja2dot script, which is no longer used 2017-02-01 14:49:44 -05:00
Robyn Speer
db30d09947 load the Chinese character mapping from a .msgpack.gz file
Former-commit-id: 6cf4210187
2015-09-22 16:32:33 -04:00
Robyn Speer
fe8a6b51e7 document what this file is for
Former-commit-id: 06f8b29971
2015-09-22 15:31:27 -04:00
Robyn Speer
7d1c2e72e4 WIP: Traditional Chinese
Former-commit-id: 7906a671ea
2015-09-04 18:52:37 -04:00
Robyn Speer
f66d03b1b9 Add SUBTLEX as a source of English and Chinese data
Meanwhile, fix up the dependency graph thingy. It's actually kind of
legible now.
Former-commit-id: 2d58ba94f2
2015-09-03 18:13:13 -04:00
Robyn Speer
247d7c6579 update the build diagram and its script
Former-commit-id: 5def3a7897
2015-08-28 17:47:04 -04:00
Robyn Speer
3674d35501 remove obsolete gen_regex.py
Former-commit-id: 102bc715ae
2015-08-24 17:11:18 -04:00
Joshua Chin
3b6b8d3ab1 made single line docstring single line
Former-commit-id: c70ddf00ea
2015-07-20 10:29:02 -04:00
Joshua Chin
4bfdd263b7 added docstring and moved to scripts
Former-commit-id: 5d26c9f57f
2015-07-17 14:56:18 -04:00
Joshua Chin
821fbb1b02 added comment about parsing_range
Former-commit-id: 7706496080
2015-07-10 14:27:48 -04:00
Joshua Chin
6dee3054de %c is a thing
Former-commit-id: 447fb7aacd
2015-07-10 14:23:06 -04:00
Joshua Chin
c77449785b merge
Former-commit-id: 2612bc23ff
2015-07-10 14:12:42 -04:00
Joshua Chin
8cbcef9bef updated func_to_regex to remove end check
Former-commit-id: 87830d138b
2015-07-10 14:10:26 -04:00
Andrew Lin
2262088b5f Improve variable names.
Former-commit-id: 95da6985d4
2015-07-10 14:02:33 -04:00
Joshua Chin
e23a8c0dc6 created alternate implementation of func-to-regex
Former-commit-id: 7c189ef129
2015-07-10 11:03:57 -04:00
Andrew Lin
54eece5e8c Clarify the algorithm for range calculation using an explicit variable.
Former-commit-id: 6755741e7d
2015-07-09 16:47:33 -04:00
Andrew Lin
8a3638bc59 Whoops -- put back 'file' as a variable name. (The perils of trusting syntax highlighting...)
Former-commit-id: f591e74663
2015-07-09 16:18:56 -04:00
Andrew Lin
05e14592af Tweaks to the regex generator for brevity:
  * Don't repeat the logic that generates the ranges
  * Include only unassigned characters between two accepted ranges; this causes the resulting
    regexes to be a bit more readable.
  * Rearrange the script itself to avoid long lambdas and group helper functions together
  * Precompute the list of all the character classes for speed and terseness
Former-commit-id: cc6920d7e4
2015-07-08 15:29:31 -04:00
Joshua Chin
d4409a2214 removed unused imports
Former-commit-id: b9578ae21e
2015-07-07 16:21:22 -04:00
Joshua Chin
7e9338f87e cleaned up gen regex
Former-commit-id: 27ea107e6f
2015-07-07 16:00:24 -04:00
Joshua Chin
4389422958 updated emoji parser
Former-commit-id: f04ca8fc9e
2015-07-07 15:43:34 -04:00
Joshua Chin
94ba6e650f updated docstring
Former-commit-id: 9b851f3afe
2015-07-07 15:33:51 -04:00
Joshua Chin
a87d84b796 fixed spacing
Former-commit-id: ae4699029d
2015-07-07 15:23:15 -04:00
Joshua Chin
cb4e444723 fixed gen_regex
Former-commit-id: 5510fce675
2015-07-07 15:22:04 -04:00
Joshua Chin
a408e6f96a fix grammar
Former-commit-id: bd172594d3
2015-07-07 14:59:28 -04:00
Joshua Chin
02526f658c updated _emoji_char_class docstring
Former-commit-id: 10b5727725
2015-07-07 14:58:50 -04:00
Joshua Chin
d875aa8842 updated gen_regex to be run as script
Former-commit-id: 22fbea4248
2015-07-07 14:50:56 -04:00
Joshua Chin
3d221f0605 updated imports
Former-commit-id: f2b615b0f0
2015-07-07 14:46:42 -04:00
Joshua Chin
93681e43b3 factored out regex generation
Former-commit-id: 476a909e4d
2015-07-07 14:38:21 -04:00