34 Commits

Author SHA1 Message Date
Robyn Speer
5986342bc6 update README examples 2020-10-01 16:05:43 -04:00
Robyn Speer
1f61c9b27a Protect top_n from running on import 2019-04-16 11:33:22 -04:00
Robyn Speer
36fd42ca08 update msgpack call in scripts/make_chinese_mapping 2019-02-05 11:16:22 -05:00
Rob Speer
61b2e4062d remove unnecessary enumeration from top_n.py 2017-09-08 16:52:06 -04:00
Rob Speer
e3352392cc v1.7: update tokenization, update data, add bn and mk 2017-08-25 17:37:48 -04:00
Rob Speer
b5b653f0a1 Remove ninja2dot script, which is no longer used 2017-02-01 14:49:44 -05:00
Rob Speer
42ccba4fa6 load the Chinese character mapping from a .msgpack.gz file
Former-commit-id: 6cf4210187
2015-09-22 16:32:33 -04:00
Rob Speer
e12a42f38a document what this file is for
Former-commit-id: 06f8b29971
2015-09-22 15:31:27 -04:00
Rob Speer
e2a3758832 WIP: Traditional Chinese
Former-commit-id: 7906a671ea
2015-09-04 18:52:37 -04:00
Rob Speer
cb5b696ffa Add SUBTLEX as a source of English and Chinese data
Meanwhile, fix up the dependency graph thingy. It's actually kind of
legible now.
Former-commit-id: 2d58ba94f2
2015-09-03 18:13:13 -04:00
Rob Speer
4aac7bdd65 update the build diagram and its script
Former-commit-id: 5def3a7897
2015-08-28 17:47:04 -04:00
Rob Speer
759a8199fb remove obsolete gen_regex.py
Former-commit-id: 102bc715ae
2015-08-24 17:11:18 -04:00
Joshua Chin
40ba602c10 made single line docstring single line
Former-commit-id: c70ddf00ea
2015-07-20 10:29:02 -04:00
Joshua Chin
2180f71296 added docstring and moved to scripts
Former-commit-id: 5d26c9f57f
2015-07-17 14:56:18 -04:00
Joshua Chin
0594eb65c2 added comment about parsing_range
Former-commit-id: 7706496080
2015-07-10 14:27:48 -04:00
Joshua Chin
275b761fe1 %c is a thing
Former-commit-id: 447fb7aacd
2015-07-10 14:23:06 -04:00
Joshua Chin
1578a1eb0d merge
Former-commit-id: 2612bc23ff
2015-07-10 14:12:42 -04:00
Joshua Chin
b3ae254f87 updated func_to_regex to remove end check
Former-commit-id: 87830d138b
2015-07-10 14:10:26 -04:00
Andrew Lin
b77cb1ac75 Improve variable names.
Former-commit-id: 95da6985d4
2015-07-10 14:02:33 -04:00
Joshua Chin
648f15e997 created alternate implementation of func-to-regex
Former-commit-id: 7c189ef129
2015-07-10 11:03:57 -04:00
Andrew Lin
8e89671560 Clarify the algorithm for range calculation using an explicit variable.
Former-commit-id: 6755741e7d
2015-07-09 16:47:33 -04:00
Andrew Lin
2b7d1249f6 Whoops -- put back 'file' as a variable name. (The perils of trusting syntax highlighting...)
Former-commit-id: f591e74663
2015-07-09 16:18:56 -04:00
Andrew Lin
8b3c5348e3 Tweaks to the regex generator for brevity:
  * Don't repeat the logic that generates the ranges
  * Include only unassigned characters between two accepted ranges; this causes the resulting
    regexes to be a bit more readable.
  * Rearrange the script itself to avoid long lambdas and group helper functions together
  * Precompute the list of all the character classes for speed and terseness
Former-commit-id: cc6920d7e4
2015-07-08 15:29:31 -04:00
Joshua Chin
b145e02ce4 removed unused imports
Former-commit-id: b9578ae21e
2015-07-07 16:21:22 -04:00
Joshua Chin
4d3123e2ee cleaned up gen regex
Former-commit-id: 27ea107e6f
2015-07-07 16:00:24 -04:00
Joshua Chin
a5dc6eb5fc updated emoji parser
Former-commit-id: f04ca8fc9e
2015-07-07 15:43:34 -04:00
Joshua Chin
0589bed362 updated docstring
Former-commit-id: 9b851f3afe
2015-07-07 15:33:51 -04:00
Joshua Chin
950e41c8bb fixed spacing
Former-commit-id: ae4699029d
2015-07-07 15:23:15 -04:00
Joshua Chin
aeea503739 fixed gen_regex
Former-commit-id: 5510fce675
2015-07-07 15:22:04 -04:00
Joshua Chin
f1e71839ea fix grammar
Former-commit-id: bd172594d3
2015-07-07 14:59:28 -04:00
Joshua Chin
589bb624af updated _emoji_char_class docstring
Former-commit-id: 10b5727725
2015-07-07 14:58:50 -04:00
Joshua Chin
b81c04a182 updated gen_regex to be run as script
Former-commit-id: 22fbea4248
2015-07-07 14:50:56 -04:00
Joshua Chin
20c4930435 updated imports
Former-commit-id: f2b615b0f0
2015-07-07 14:46:42 -04:00
Joshua Chin
6deced5244 factored out regex generation
Former-commit-id: 476a909e4d
2015-07-07 14:38:21 -04:00