31 Commits

Author SHA1 Message Date
Rob Speer
4e4c77e7d7 fix to README: we're only using Reddit in English
Former-commit-id: dcb77a552b
2016-05-11 15:38:29 -04:00
Rob Speer
f4aa2cad7b fix table showing marginal Korean support
Former-commit-id: 697842b3f9
2016-03-30 15:11:13 -04:00
Rob Speer
758e37af07 make an example clearer with wordlist='large'
Former-commit-id: ed32b278cc
2016-03-30 15:08:32 -04:00
Rob Speer
c82073270b update wordlists for new builder settings
Former-commit-id: a10c1d7ac0
2016-03-28 12:26:47 -04:00
Rob Speer
23c5c4adca Add and document large wordlists
Former-commit-id: d79ee37da9
2016-01-22 16:23:43 -05:00
Rob Speer
8fea2ca181 Merge branch 'master' into chinese-external-wordlist
Conflicts:
	wordfreq/chinese.py

Former-commit-id: 1793c1bb2e
2015-09-28 14:34:59 -04:00
Rob Speer
3bd1fe2fe6 Fix documentation and clean up, based on Sep 25 code review
Former-commit-id: 44b0c4f9ba
2015-09-28 12:58:46 -04:00
Rob Speer
7c596de98a describe optional dependencies better in the README
Former-commit-id: b460eef444
2015-09-24 17:54:52 -04:00
Rob Speer
76c4a8975a fix README conflict
Former-commit-id: 5b918e7bb0
2015-09-22 14:23:55 -04:00
Rob Speer
7f92557a58 Merge branch 'greek-and-turkish' into chinese-and-more
Conflicts:
	README.md
	wordfreq_builder/wordfreq_builder/ninja.py

Former-commit-id: 3cb3061e06
2015-09-10 15:27:33 -04:00
Rob Speer
a13f459f88 Lower the frequency of phrases with inferred token boundaries
Former-commit-id: 5c8c36f4e3
2015-09-10 14:16:22 -04:00
Rob Speer
9c08442dc5 fixes based on code review notes
Former-commit-id: 354555514f
2015-09-09 13:10:18 -04:00
Rob Speer
37e5e1009f fix SUBTLEX citations
Former-commit-id: 6502f15e9b
2015-09-08 17:45:25 -04:00
Rob Speer
0f9497d864 take out OpenSubtitles for Chinese
Former-commit-id: d9c44d5fcc
2015-09-08 17:25:05 -04:00
Rob Speer
b4100b5bfb update the README for Chinese
Former-commit-id: d576e3294b
2015-09-05 03:42:54 -04:00
Rob Speer
e2a3758832 WIP: Traditional Chinese
Former-commit-id: 7906a671ea
2015-09-04 18:52:37 -04:00
Rob Speer
62f5a8eb1e add Polish and Swedish to README
Former-commit-id: 3c3371a9ff
2015-09-04 17:10:40 -04:00
Rob Speer
138e8aaa3f add more citations
Former-commit-id: 8196643509
2015-09-04 15:57:40 -04:00
Rob Speer
c08e593234 Use SUBTLEX for German, but OpenSubtitles for Greek
In German and Greek, SUBTLEX and Hermit Dave turn out to have been
working from the same source data. I looked at the quality of how they
processed the data, and chose SUBTLEX for German, and Dave's wordlist
for Greek.

Former-commit-id: 77c60c29b0
2015-09-04 15:52:21 -04:00
Rob Speer
a0997a79a4 update README with additional SUBTLEX support
Former-commit-id: 81bbe663fb
2015-09-04 13:23:33 -04:00
Rob Speer
bf88f97744 expand list of sources and supported languages
Former-commit-id: d9a1c34d00
2015-09-04 01:03:36 -04:00
Rob Speer
a6ef3224a6 support Turkish and more Greek; document more
Former-commit-id: d94428d454
2015-09-04 00:57:04 -04:00
Rob Speer
a92c398258 add SUBTLEX to the readme
Former-commit-id: e6a2886a66
2015-09-03 18:56:56 -04:00
Rob Speer
d883eaeca5 fix heading
Former-commit-id: 00a2812907
2015-08-28 17:49:38 -04:00
Rob Speer
390a431181 fix list formatting
Former-commit-id: 93f44683c5
2015-08-28 17:49:07 -04:00
Rob Speer
43fd15c938 improve README with function documentation and examples
Former-commit-id: 2370287539
2015-08-28 17:45:50 -04:00
Rob Speer
d064fbec7d update the README
Former-commit-id: 573dd1ec79
2015-08-25 17:44:34 -04:00
Joshua Chin
45799955ab no use for use
Former-commit-id: b0a9a2980f
2015-07-17 14:46:40 -04:00
Andrew Lin
8961729401 Document the version of Unicode used to build the regexes.
Former-commit-id: 9f8464c2d1
2015-07-08 18:48:33 -04:00
Rob Speer
51f4e4c826 add installation instructions to the readme
Former-commit-id: 0f4ca80026
2015-05-28 14:02:12 -04:00
Rob Speer
1f41cb083c update Japanese data; test Japanese and token combining
Former-commit-id: 611a6a35de
2015-05-28 14:01:56 -04:00