22 Commits

Author SHA1 Message Date
Robyn Speer
f2be213933 Merge branch 'greek-and-turkish' into chinese-and-more
Conflicts:
	README.md
	wordfreq_builder/wordfreq_builder/ninja.py

Former-commit-id: 3cb3061e06
2015-09-10 15:27:33 -04:00
Robyn Speer
f0c7c3a02c Lower the frequency of phrases with inferred token boundaries
Former-commit-id: 5c8c36f4e3
2015-09-10 14:16:22 -04:00
Robyn Speer
872556f7bb fixes based on code review notes
Former-commit-id: 354555514f
2015-09-09 13:10:18 -04:00
Robyn Speer
3dd70ed1c2 fix SUBTLEX citations
Former-commit-id: 6502f15e9b
2015-09-08 17:45:25 -04:00
Robyn Speer
1d3521dfda take out OpenSubtitles for Chinese
Former-commit-id: d9c44d5fcc
2015-09-08 17:25:05 -04:00
Robyn Speer
c1f27d3095 update the README for Chinese
Former-commit-id: d576e3294b
2015-09-05 03:42:54 -04:00
Robyn Speer
7d1c2e72e4 WIP: Traditional Chinese
Former-commit-id: 7906a671ea
2015-09-04 18:52:37 -04:00
Robyn Speer
e77c2dbca8 add Polish and Swedish to README
Former-commit-id: 3c3371a9ff
2015-09-04 17:10:40 -04:00
Robyn Speer
032fea27c3 add more citations
Former-commit-id: 8196643509
2015-09-04 15:57:40 -04:00
Robyn Speer
8277b34571 Use SUBTLEX for German, but OpenSubtitles for Greek
In German and Greek, SUBTLEX and Hermit Dave turn out to have been
working from the same source data. I looked at the quality of how they
processed the data, and chose SUBTLEX for German, and Dave's wordlist
for Greek.

Former-commit-id: 77c60c29b0
2015-09-04 15:52:21 -04:00
Robyn Speer
37e510345d update README with additional SUBTLEX support
Former-commit-id: 81bbe663fb
2015-09-04 13:23:33 -04:00
Robyn Speer
3cb4dd777e expand list of sources and supported languages
Former-commit-id: d9a1c34d00
2015-09-04 01:03:36 -04:00
Robyn Speer
574c383202 support Turkish and more Greek; document more
Former-commit-id: d94428d454
2015-09-04 00:57:04 -04:00
Robyn Speer
d267e0967c add SUBTLEX to the readme
Former-commit-id: e6a2886a66
2015-09-03 18:56:56 -04:00
Robyn Speer
942761d2f6 fix heading
Former-commit-id: 00a2812907
2015-08-28 17:49:38 -04:00
Robyn Speer
7bdffaae5c fix list formatting
Former-commit-id: 93f44683c5
2015-08-28 17:49:07 -04:00
Robyn Speer
44c655d9a6 improve README with function documentation and examples
Former-commit-id: 2370287539
2015-08-28 17:45:50 -04:00
Robyn Speer
a3a3180bb9 update the README
Former-commit-id: 573dd1ec79
2015-08-25 17:44:34 -04:00
Joshua Chin
4c7910246e no use for use
Former-commit-id: b0a9a2980f
2015-07-17 14:46:40 -04:00
Andrew Lin
383963f8a9 Document the version of Unicode used to build the regexes.
Former-commit-id: 9f8464c2d1
2015-07-08 18:48:33 -04:00
Robyn Speer
a3cc8d403c add installation instructions to the readme
Former-commit-id: 0f4ca80026
2015-05-28 14:02:12 -04:00
Robyn Speer
860e929bf8 update Japanese data; test Japanese and token combining
Former-commit-id: 611a6a35de
2015-05-28 14:01:56 -04:00