Commit Graph

8 Commits

Author SHA1 Message Date
Rob Speer
4771c12814 remove wiki2tokens and tokenize_wikipedia
These components are no longer necessary. Wikipedia output can and
should be tokenized with the standard tokenizer, instead of the
almost-equivalent one in the Nim code.
2015-06-30 15:28:01 -04:00
Joshua Chin
1cf7e3d2b9 added pycld2 dependency
2015-06-16 15:06:22 -04:00
Rob Speer
1b7a2b9d0b fix dependency
2015-05-07 23:55:57 -04:00
Rob Speer
abb0e059c8 a reasonably complete build process
2015-05-07 19:38:33 -04:00
Rob Speer
d2f9c60776 WIP on more build steps
2015-05-07 16:49:53 -04:00
Rob Speer
5787b6bb73 add and adjust some build steps
- more build steps for Wikipedia
- rename 'tokenize_twitter' to 'pretokenize_twitter' to indicate that
  the results are preliminary
2015-05-05 13:59:21 -04:00
Rob Speer
5437bb4e85 WIP on new build system
2015-04-30 16:24:28 -04:00
Rob Speer
693c35476f Initial commit
2015-02-04 20:19:36 -05:00