Commit Graph

  • 721a1e9fd9 Merge pull request #51 from LuminosoInsight/version1.7 v1.7 staging-20170922 code-review-20170922 Andrew Lin 2017-09-08 17:02:05 -0400
  • b042f2be9d remove unnecessary enumeration from top_n.py Robyn Speer 2017-09-08 16:52:06 -0400
  • 61b2e4062d remove unnecessary enumeration from top_n.py #51 Rob Speer 2017-09-08 16:52:06 -0400
  • fb4a7db6f7 update README for 1.7; sort language list in English order Robyn Speer 2017-08-25 17:38:31 -0400
  • 396b0f78df update README for 1.7; sort language list in English order Rob Speer 2017-08-25 17:38:31 -0400
  • 46e32fbd36 v1.7: update tokenization, update data, add bn and mk Robyn Speer 2017-08-25 17:37:48 -0400
  • e3352392cc v1.7: update tokenization, update data, add bn and mk Rob Speer 2017-08-25 17:37:48 -0400
  • 9dac967ca3 Tokenize by graphemes, not codepoints (#50) Robyn Speer 2017-08-08 11:35:28 -0400
  • dcef5813b3 Tokenize by graphemes, not codepoints (#50) staging-20170811 code-review-20170811 Rob Speer 2017-08-08 11:35:28 -0400
  • e68f253370 approve using version 2017.07.28 of regex #50 Rob Speer 2017-08-08 11:03:39 -0400
  • 353329e458 Update docstring - Brahmic scripts are no longer an exception Rob Speer 2017-07-13 14:10:12 -0400
  • e8af247688 Remove extra line break Rob Speer 2017-07-07 15:47:07 -0400
  • 29d9957280 Add more documentation to TOKEN_RE Rob Speer 2017-07-07 15:21:34 -0400
  • 5672240e52 Tokenize by graphemes, not codepoints Rob Speer 2017-06-30 18:05:18 -0400
  • 6c118c0b6a Merge pull request #49 from LuminosoInsight/restore-langcodes Andrew Lin 2017-05-10 16:20:06 -0400
  • baf6771e97 Merge pull request #49 from LuminosoInsight/restore-langcodes staging-20170519 code-review-20170519 Andrew Lin 2017-05-10 16:20:06 -0400
  • aa3ed23282 v1.6.1: depend on langcodes 1.4 Robyn Speer 2017-05-10 13:26:23 -0400
  • 37b4914970 v1.6.1: depend on langcodes 1.4 #49 Rob Speer 2017-05-10 13:26:23 -0400
  • 71a0ad6abb Use langcodes when tokenizing again (it no longer connects to a DB) Robyn Speer 2017-04-27 15:09:59 -0400
  • d6cdef6039 Use langcodes when tokenizing again (it no longer connects to a DB) Rob Speer 2017-04-27 15:09:59 -0400
  • ae7bc5764b Merge pull request #48 from LuminosoInsight/code-review-notes Robyn Speer 2017-02-15 12:29:25 -0800
  • 97042e6f60 Merge pull request #48 from LuminosoInsight/code-review-notes staging-20170224 code-review-20170224 Rob Speer 2017-02-15 12:29:25 -0800
  • c2e1504643 Clarify the changelog. Andrew Lin 2017-02-14 13:09:12 -0500
  • f28a193015 Clarify the changelog. #48 Andrew Lin 2017-02-14 13:09:12 -0500
  • 1363f9d2e0 Correct a case in transliterate.py. Andrew Lin 2017-02-14 13:08:23 -0500
  • e21bcc2a58 Correct a case in transliterate.py. Andrew Lin 2017-02-14 13:08:23 -0500
  • 72e3678e89 Merge pull request #47 from LuminosoInsight/all-1.6-changes Andrew Lin 2017-02-01 15:36:38 -0500
  • 21b331e898 Merge pull request #47 from LuminosoInsight/all-1.6-changes staging-20170210 code-review-20170210 Andrew Lin 2017-02-01 15:36:38 -0500
  • a099a5a881 Remove ninja2dot script, which is no longer used Robyn Speer 2017-02-01 14:49:44 -0500
  • b5b653f0a1 Remove ninja2dot script, which is no longer used #47 Rob Speer 2017-02-01 14:49:44 -0500
  • 7dec335f74 describe the current problem with 'cyrtranslit' as a dependency Robyn Speer 2017-01-31 18:25:52 -0500
  • 391a723662 describe the current problem with 'cyrtranslit' as a dependency Rob Speer 2017-01-31 18:25:52 -0500
  • 19b72132e7 Fix some outdated numbers in English examples Robyn Speer 2017-01-31 18:25:41 -0500
  • 7fa5e7fc22 Fix some outdated numbers in English examples Rob Speer 2017-01-31 18:25:41 -0500
  • abd0820a32 Handle smashing numbers only at the end of tokenize(). Robyn Speer 2017-01-11 19:04:19 -0500
  • 68e4ce16cf Handle smashing numbers only at the end of tokenize(). Rob Speer 2017-01-11 19:04:19 -0500
  • 93306e55a0 Update README with new examples and URL Robyn Speer 2017-01-09 15:13:19 -0500
  • e6114bf0fa Update README with new examples and URL Rob Speer 2017-01-09 15:13:19 -0500
  • 9a6beb0089 test that number-smashing still happens in freq lookups Robyn Speer 2017-01-06 19:20:41 -0500
  • f03a37e19c test that number-smashing still happens in freq lookups Rob Speer 2017-01-06 19:20:41 -0500
  • 573ecc53d0 Don't smash numbers in *all* tokenization, just when looking up freqs Robyn Speer 2017-01-06 19:18:52 -0500
  • 4dfa800cd8 Don't smash numbers in *all* tokenization, just when looking up freqs Rob Speer 2017-01-06 19:18:52 -0500
  • 3cb3c38f47 update the README, citing OpenSubtitles 2016 Robyn Speer 2017-01-06 19:04:40 -0500
  • d2bb5b78f3 update the README, citing OpenSubtitles 2016 Rob Speer 2017-01-06 19:04:40 -0500
  • 86f22e8523 Mention that multi-digit numbers are combined together Robyn Speer 2017-01-05 19:24:28 -0500
  • 3f9c8449ff Mention that multi-digit numbers are combined together Rob Speer 2017-01-05 19:24:28 -0500
  • 48a5967e9a mention tokenization change in changelog Robyn Speer 2017-01-05 19:19:31 -0500
  • a05a1c8d5c mention tokenization change in changelog Rob Speer 2017-01-05 19:19:31 -0500
  • 39e459ac71 Update documentation and bump version to 1.6 Robyn Speer 2017-01-05 19:18:06 -0500
  • 803ebc25bb Update documentation and bump version to 1.6 Rob Speer 2017-01-05 19:18:06 -0500
  • 23c7c8e936 update data from Exquisite Corpus in English and Swedish Robyn Speer 2017-01-05 19:17:51 -0500
  • f9238ac30f update data from Exquisite Corpus in English and Swedish Rob Speer 2017-01-05 19:17:51 -0500
  • 7dc3f03ebd import new wordlists from Exquisite Corpus Robyn Speer 2017-01-05 17:59:26 -0500
  • f671a1db7f import new wordlists from Exquisite Corpus Rob Speer 2017-01-05 17:59:26 -0500
  • de32a15b4f Merge branch 'transliterate-serbian' into all-1.6-changes Robyn Speer 2017-01-05 17:57:52 -0500
  • 847b85c5b8 Merge branch 'transliterate-serbian' into all-1.6-changes Rob Speer 2017-01-05 17:57:52 -0500
  • d66d04210f transliterate: organize the 'borrowed letters' better Robyn Speer 2017-01-05 13:23:20 -0500
  • e4f40a0ce9 transliterate: organize the 'borrowed letters' better Rob Speer 2017-01-05 13:23:20 -0500
  • 87b03325db transliterate: Handle unexpected Russian invasions Robyn Speer 2017-01-04 18:51:00 -0500
  • 99eac54b31 transliterate: Handle unexpected Russian invasions Rob Speer 2017-01-04 18:51:00 -0500
  • c27e7f9b76 remove wordfreq_builder (obsoleted by exquisite-corpus) Robyn Speer 2017-01-04 17:45:53 -0500
  • 6171b3d066 remove wordfreq_builder (obsoleted by exquisite-corpus) Rob Speer 2017-01-04 17:45:53 -0500
  • 6211b35fb3 Add transliteration of Cyrillic Serbian Robyn Speer 2016-12-29 18:27:17 -0500
  • b3e5d1c9e9 Add transliteration of Cyrillic Serbian Rob Speer 2016-12-29 18:27:17 -0500
  • 0aa7ad46ae fixes to tokenization Robyn Speer 2016-12-13 14:43:29 -0500
  • d376f4e2e2 fixes to tokenization Rob Speer 2016-12-13 14:43:29 -0500
  • d6d528de74 Replace multi-digit sequences with zeroes Robyn Speer 2016-12-09 15:55:08 -0500
  • bb5df3b074 Replace multi-digit sequences with zeroes Rob Speer 2016-12-09 15:55:08 -0500
  • 0620176ea9 Merge 24e26c4c1d into f6f0914e81 #46 Rob Speer 2016-12-06 22:39:48 +0000
  • a8e2fa5acf add a test for "aujourd'hui" Robyn Speer 2016-12-06 17:39:40 -0500
  • 24e26c4c1d add a test for "aujourd'hui" #46 Rob Speer 2016-12-06 17:39:40 -0500
  • 21a78f5eb9 Bake the 'h special case into the regex Robyn Speer 2016-12-06 17:37:35 -0500
  • d18b149262 Bake the 'h special case into the regex Rob Speer 2016-12-06 17:37:35 -0500
  • 82eba05f2d eh, this is still version 1.5.2, not 1.6 Robyn Speer 2016-12-05 18:58:33 -0500
  • 752c90c8a5 eh, this is still version 1.5.2, not 1.6 Rob Speer 2016-12-05 18:58:33 -0500
  • 4376636316 add a specific test in Catalan Robyn Speer 2016-12-05 18:48:02 -0500
  • f285430c84 add a specific test in Catalan Rob Speer 2016-12-05 18:48:02 -0500
  • ff5a8f2a65 add tests for French apostrophe tokenization Robyn Speer 2016-12-05 18:42:16 -0500
  • 02e2430dfb add tests for French apostrophe tokenization Rob Speer 2016-12-05 18:42:16 -0500
  • 596368ac6e fix tokenization of words like "l'heure" Robyn Speer 2016-12-05 18:40:53 -0500
  • a92c805a82 fix tokenization of words like "l'heure" Rob Speer 2016-12-05 18:40:53 -0500
  • 7f26270644 Merge pull request #45 from LuminosoInsight/citation Lance Nathan 2016-09-12 18:34:55 -0400
  • f6f0914e81 Merge pull request #45 from LuminosoInsight/citation staging-20160923 code-review-20160923 Lance Nathan 2016-09-12 18:34:55 -0400
  • 7fabbfef31 Describe how to cite wordfreq Robyn Speer 2016-09-12 18:24:55 -0400
  • 872eeb8848 Describe how to cite wordfreq #45 Rob Speer 2016-09-12 18:24:55 -0400
  • c0fbd844f6 Add a changelog Robyn Speer 2016-08-22 12:41:39 -0400
  • 0ba563c99c Add a changelog v1.5.1 staging-20160830 code-review-20160830 Rob Speer 2016-08-22 12:41:39 -0400
  • 976c8df0fd Merge pull request #44 from LuminosoInsight/mecab-loading-fix Andrew Lin 2016-08-19 11:59:44 -0400
  • 91f7ef37eb Merge pull request #44 from LuminosoInsight/mecab-loading-fix Andrew Lin 2016-08-19 11:59:44 -0400
  • aa880bcd84 bump version to 1.5.1 Robyn Speer 2016-08-19 11:42:29 -0400
  • fb5a55de7e bump version to 1.5.1 #44 Rob Speer 2016-08-19 11:42:29 -0400
  • e1d6e7d96f Allow MeCab to work in Japanese or Korean without the other Robyn Speer 2016-08-19 11:41:35 -0400
  • 31be4fd309 Allow MeCab to work in Japanese or Korean without the other Rob Speer 2016-08-19 11:41:35 -0400
  • e4b32afa18 Merge pull request #42 from LuminosoInsight/mecab-finder Andrew Lin 2016-08-08 16:00:39 -0400
  • 0250547c7a Merge pull request #42 from LuminosoInsight/mecab-finder staging-20160811 code-review-20160811 Andrew Lin 2016-08-08 16:00:39 -0400
  • 548162c563 Remove unnecessary variable from make_mecab_analyzer #42 Rob Speer 2016-08-04 15:17:02 -0400
  • 88c93f6204 Remove unnecessary variable from make_mecab_analyzer Robyn Speer 2016-08-04 15:17:02 -0400
  • 8c79465d28 Remove unnecessary variable from make_mecab_analyzer Rob Speer 2016-08-04 15:17:02 -0400
  • 2b984937be consolidate logic about MeCab path length Rob Speer 2016-08-04 15:16:20 -0400
  • 6440d81676 consolidate logic about MeCab path length Robyn Speer 2016-08-04 15:16:20 -0400