Commit Graph

  • ed32b278cc make an example clearer with wordlist='large' Rob Speer 2016-03-30 15:08:32 -0400
  • cecf852040 update wordlists for new builder settings Robyn Speer 2016-03-28 12:26:47 -0400
  • c82073270b update wordlists for new builder settings Rob Speer 2016-03-28 12:26:47 -0400
  • a10c1d7ac0 update wordlists for new builder settings Rob Speer 2016-03-28 12:26:47 -0400
  • 0c7527140c Discard text detected as an uncommon language; add large German list Robyn Speer 2016-03-28 12:26:02 -0400
  • 3e34dbdd38 Discard text detected as an uncommon language; add large German list Rob Speer 2016-03-28 12:26:02 -0400
  • abbc295538 Discard text detected as an uncommon language; add large German list Rob Speer 2016-03-28 12:26:02 -0400
  • aa7802b552 oh look, more spam Robyn Speer 2016-03-24 18:42:47 -0400
  • 1c4a2077a4 oh look, more spam Rob Speer 2016-03-24 18:42:47 -0400
  • 08130908c7 oh look, more spam Rob Speer 2016-03-24 18:42:47 -0400
  • 2840ca55aa filter out downvoted Reddit posts Robyn Speer 2016-03-24 18:05:13 -0400
  • cebf99f7ba filter out downvoted Reddit posts Rob Speer 2016-03-24 18:05:13 -0400
  • 5b98794b86 filter out downvoted Reddit posts Rob Speer 2016-03-24 18:05:13 -0400
  • 16841d4b0c disregard Arabic Reddit spam Robyn Speer 2016-03-24 17:44:30 -0400
  • fe6d8fea85 disregard Arabic Reddit spam Rob Speer 2016-03-24 17:44:30 -0400
  • cfe68893fa disregard Arabic Reddit spam Rob Speer 2016-03-24 17:44:30 -0400
  • 034d8f540b fix extraneous dot in intermediate filenames Robyn Speer 2016-03-24 16:52:44 -0400
  • d2cc42936f fix extraneous dot in intermediate filenames Rob Speer 2016-03-24 16:52:44 -0400
  • 6feae99381 fix extraneous dot in intermediate filenames Rob Speer 2016-03-24 16:52:44 -0400
  • 460fbb84fd bump version to 1.4 Robyn Speer 2016-03-24 16:29:29 -0400
  • 28028115c2 bump version to 1.4 Rob Speer 2016-03-24 16:29:29 -0400
  • 1df97a579e bump version to 1.4 Rob Speer 2016-03-24 16:29:29 -0400
  • 969a024dea actually use the results of language-detection on Reddit Robyn Speer 2016-03-24 16:27:24 -0400
  • c3364ef821 actually use the results of language-detection on Reddit Rob Speer 2016-03-24 16:27:24 -0400
  • 75a4a92110 actually use the results of language-detection on Reddit Rob Speer 2016-03-24 16:27:24 -0400
  • fbc19995ab Merge remote-tracking branch 'origin/master' into big-list Robyn Speer 2016-03-24 14:11:44 -0400
  • a5fcfd100d Merge remote-tracking branch 'origin/master' into big-list Rob Speer 2016-03-24 14:11:44 -0400
  • 164a5b1a05 Merge remote-tracking branch 'origin/master' into big-list Rob Speer 2016-03-24 14:11:44 -0400
  • f493d0eec4 make max-words a real, documented parameter Robyn Speer 2016-03-24 14:10:02 -0400
  • 670ab12f54 make max-words a real, documented parameter Rob Speer 2016-03-24 14:10:02 -0400
  • 178a8b1494 make max-words a real, documented parameter Rob Speer 2016-03-24 14:10:02 -0400
  • 298cb69353 Merge pull request #33 from LuminosoInsight/bugfix Robyn Speer 2016-03-24 13:59:50 -0400
  • 384cd6a9fc Merge pull request #33 from LuminosoInsight/bugfix staging-20160408 code-review-20160408 Rob Speer 2016-03-24 13:59:50 -0400
  • 7b539f9057 Merge pull request #33 from LuminosoInsight/bugfix Rob Speer 2016-03-24 13:59:50 -0400
  • 1942bc690f Restore a missing comma. Andrew Lin 2016-03-24 13:57:18 -0400
  • c85146e156 Restore a missing comma. Andrew Lin 2016-03-24 13:57:18 -0400
  • 38016cf62b Restore a missing comma. #33 Andrew Lin 2016-03-24 13:57:18 -0400
  • 68e7846d50 Merge pull request #32 from LuminosoInsight/thai-fix Andrew Lin 2016-03-10 11:57:44 -0500
  • 241956ed7c Merge pull request #32 from LuminosoInsight/thai-fix staging-20160311 code-review-20160311 Andrew Lin 2016-03-10 11:57:44 -0500
  • 84497429e1 Merge pull request #32 from LuminosoInsight/thai-fix Andrew Lin 2016-03-10 11:57:44 -0500
  • f25985379c move Thai test to where it makes more sense Robyn Speer 2016-03-10 11:56:04 -0500
  • c2eab6881e move Thai test to where it makes more sense Rob Speer 2016-03-10 11:56:04 -0500
  • 4ec6b56faa move Thai test to where it makes more sense #32 Rob Speer 2016-03-10 11:56:04 -0500
  • 51e260b713 Leave Thai segments alone in the default regex Robyn Speer 2016-02-22 14:26:50 -0500
  • a32162c04f Leave Thai segments alone in the default regex Rob Speer 2016-02-22 14:26:50 -0500
  • 07f16e6f03 Leave Thai segments alone in the default regex Rob Speer 2016-02-22 14:26:50 -0500
  • 6344b38194 Add and document large wordlists Robyn Speer 2016-01-22 16:23:43 -0500
  • 23c5c4adca Add and document large wordlists Rob Speer 2016-01-22 16:23:43 -0500
  • d79ee37da9 Add and document large wordlists Rob Speer 2016-01-22 16:23:43 -0500
  • 12e779fc79 configuration that builds some larger lists Robyn Speer 2016-01-22 14:20:12 -0500
  • 3b95d349e0 configuration that builds some larger lists Rob Speer 2016-01-22 14:20:12 -0500
  • c1a12cebec configuration that builds some larger lists Rob Speer 2016-01-22 14:20:12 -0500
  • 83559a53d4 add Zipf scale Robyn Speer 2016-01-21 14:07:01 -0500
  • 35ee23591e add Zipf scale Rob Speer 2016-01-21 14:07:01 -0500
  • 9907948d11 add Zipf scale Rob Speer 2016-01-21 14:07:01 -0500
  • 927d4f45a4 Merge pull request #30 from LuminosoInsight/add-reddit slibs63 2016-01-14 15:52:39 -0500
  • 258f5088e9 Merge pull request #30 from LuminosoInsight/add-reddit staging-20160129 code-review-20160129 slibs63 2016-01-14 15:52:39 -0500
  • d18fee3d78 Merge pull request #30 from LuminosoInsight/add-reddit slibs63 2016-01-14 15:52:39 -0500
  • 6eca3cff5a fix documentation in wordfreq_builder.tokenizers Robyn Speer 2016-01-13 15:18:12 -0500
  • ee8cfb5a50 fix documentation in wordfreq_builder.tokenizers Rob Speer 2016-01-13 15:18:12 -0500
  • 8ddc19a5ca fix documentation in wordfreq_builder.tokenizers #30 Rob Speer 2016-01-13 15:18:12 -0500
  • 95cdf41fe8 reformat some argparse argument definitions Robyn Speer 2016-01-13 12:05:07 -0500
  • 56f830d678 reformat some argparse argument definitions Rob Speer 2016-01-13 12:05:07 -0500
  • 511fcb6f91 reformat some argparse argument definitions Rob Speer 2016-01-13 12:05:07 -0500
  • 738243e244 build a bigger wordlist that we can optionally use Robyn Speer 2016-01-12 14:05:17 -0500
  • f4761029d0 build a bigger wordlist that we can optionally use Rob Speer 2016-01-12 14:05:17 -0500
  • df8caaff7d build a bigger wordlist that we can optionally use Rob Speer 2016-01-12 14:05:17 -0500
  • 2069e30c89 fix usage text: one comment, not one tweet Robyn Speer 2016-01-12 13:05:38 -0500
  • 83bd019efe fix usage text: one comment, not one tweet Rob Speer 2016-01-12 13:05:38 -0500
  • 8d9668d8ab fix usage text: one comment, not one tweet Rob Speer 2016-01-12 13:05:38 -0500
  • 883aa5baeb Separate tokens with spaces, not line breaks, in intermediate files Robyn Speer 2016-01-12 12:59:18 -0500
  • 1d3485c855 Separate tokens with spaces, not line breaks, in intermediate files Rob Speer 2016-01-12 12:59:18 -0500
  • 115c74583e Separate tokens with spaces, not line breaks, in intermediate files Rob Speer 2016-01-12 12:59:18 -0500
  • eae7b2752e Merge pull request #31 from LuminosoInsight/use_encoding Andrew Lin 2015-12-23 16:13:47 -0500
  • c9f679a7a3 Merge pull request #31 from LuminosoInsight/use_encoding staging-20160114 code-review-20160114 Andrew Lin 2015-12-23 16:13:47 -0500
  • f30efebba0 Merge pull request #31 from LuminosoInsight/use_encoding Andrew Lin 2015-12-23 16:13:47 -0500
  • 42d209cbe2 Specify encoding when dealing with files Sara Jewett 2015-12-23 15:49:13 -0500
  • 7b6f88b059 Specify encoding when dealing with files Sara Jewett 2015-12-23 15:49:13 -0500
  • 37f9e12b93 Specify encoding when dealing with files #31 Sara Jewett 2015-12-23 15:49:13 -0500
  • 7d1719cfb4 builder: Use an optional cutoff when merging counts Robyn Speer 2015-12-15 14:44:34 -0500
  • 6d62a8ff51 builder: Use an optional cutoff when merging counts Rob Speer 2015-12-15 14:44:34 -0500
  • 973caca253 builder: Use an optional cutoff when merging counts Rob Speer 2015-12-15 14:44:34 -0500
  • f5e09f3f3d gzip the intermediate step of Reddit word counting Robyn Speer 2015-12-09 13:30:08 -0500
  • 4e985e3bca gzip the intermediate step of Reddit word counting Rob Speer 2015-12-09 13:30:08 -0500
  • 9a5d9d66bb gzip the intermediate step of Reddit word counting Rob Speer 2015-12-09 13:30:08 -0500
  • 682e08fee2 no Thai because we can't tokenize it Robyn Speer 2015-12-02 12:38:03 -0500
  • dc94222d7d no Thai because we can't tokenize it Rob Speer 2015-12-02 12:38:03 -0500
  • 95f53e295b no Thai because we can't tokenize it Rob Speer 2015-12-02 12:38:03 -0500
  • 064ee22a33 forgot about Italian Robyn Speer 2015-11-30 18:18:24 -0500
  • 237fabb4c5 forgot about Italian Rob Speer 2015-11-30 18:18:24 -0500
  • 8f6cd0e57b forgot about Italian Rob Speer 2015-11-30 18:18:24 -0500
  • ab8c2e2331 add tokenizer for Reddit Robyn Speer 2015-11-30 18:16:54 -0500
  • 6caa9ca443 add tokenizer for Reddit Rob Speer 2015-11-30 18:16:54 -0500
  • 5ef807117d add tokenizer for Reddit Rob Speer 2015-11-30 18:16:54 -0500
  • 23949a4512 rebuild data files Robyn Speer 2015-11-30 17:06:39 -0500
  • 9a1b00ba0c rebuild data files Rob Speer 2015-11-30 17:06:39 -0500
  • 2dcf368481 rebuild data files Rob Speer 2015-11-30 17:06:39 -0500
  • 6d2709f064 add word frequencies from the Reddit 2007-2015 corpus Robyn Speer 2015-11-30 16:38:11 -0500
  • d1b667909d add word frequencies from the Reddit 2007-2015 corpus Rob Speer 2015-11-30 16:38:11 -0500
  • b2d7546d2d add word frequencies from the Reddit 2007-2015 corpus Rob Speer 2015-11-30 16:38:11 -0500