Rob Speer
4771c12814
remove wiki2tokens and tokenize_wikipedia
These components are no longer necessary. Wikipedia output can and
should be tokenized with the standard tokenizer, instead of the
almost-equivalent one in the Nim code.
2015-06-30 15:28:01 -04:00
Joshua Chin
1cf7e3d2b9
added pycld2 dependency
2015-06-16 15:06:22 -04:00
Rob Speer
1b7a2b9d0b
fix dependency
2015-05-07 23:55:57 -04:00
Rob Speer
abb0e059c8
a reasonably complete build process
2015-05-07 19:38:33 -04:00
Rob Speer
d2f9c60776
WIP on more build steps
2015-05-07 16:49:53 -04:00
Rob Speer
5787b6bb73
add and adjust some build steps
- more build steps for Wikipedia
- rename 'tokenize_twitter' to 'pretokenize_twitter' to indicate that
  the results are preliminary
2015-05-05 13:59:21 -04:00
Rob Speer
5437bb4e85
WIP on new build system
2015-04-30 16:24:28 -04:00
Rob Speer
693c35476f
Initial commit
2015-02-04 20:19:36 -05:00