Robyn Speer
f2be213933
Merge branch 'greek-and-turkish' into chinese-and-more
Conflicts:
    README.md
    wordfreq_builder/wordfreq_builder/ninja.py
Former-commit-id: 3cb3061e06
2015-09-10 15:27:33 -04:00
Robyn Speer
f0c7c3a02c
Lower the frequency of phrases with inferred token boundaries
Former-commit-id: 5c8c36f4e3
2015-09-10 14:16:22 -04:00
Robyn Speer
872556f7bb
fixes based on code review notes
Former-commit-id: 354555514f
2015-09-09 13:10:18 -04:00
Robyn Speer
3dd70ed1c2
fix SUBTLEX citations
Former-commit-id: 6502f15e9b
2015-09-08 17:45:25 -04:00
Robyn Speer
1d3521dfda
take out OpenSubtitles for Chinese
Former-commit-id: d9c44d5fcc
2015-09-08 17:25:05 -04:00
Robyn Speer
c1f27d3095
update the README for Chinese
Former-commit-id: d576e3294b
2015-09-05 03:42:54 -04:00
Robyn Speer
7d1c2e72e4
WIP: Traditional Chinese
Former-commit-id: 7906a671ea
2015-09-04 18:52:37 -04:00
Robyn Speer
e77c2dbca8
add Polish and Swedish to README
Former-commit-id: 3c3371a9ff
2015-09-04 17:10:40 -04:00
Robyn Speer
032fea27c3
add more citations
Former-commit-id: 8196643509
2015-09-04 15:57:40 -04:00
Robyn Speer
8277b34571
Use SUBTLEX for German, but OpenSubtitles for Greek
In German and Greek, SUBTLEX and Hermit Dave turn out to have been
working from the same source data. I looked at the quality of how they
processed the data, and chose SUBTLEX for German, and Dave's wordlist
for Greek.
Former-commit-id: 77c60c29b0
2015-09-04 15:52:21 -04:00
Robyn Speer
37e510345d
update README with additional SUBTLEX support
Former-commit-id: 81bbe663fb
2015-09-04 13:23:33 -04:00
Robyn Speer
3cb4dd777e
expand list of sources and supported languages
Former-commit-id: d9a1c34d00
2015-09-04 01:03:36 -04:00
Robyn Speer
574c383202
support Turkish and more Greek; document more
Former-commit-id: d94428d454
2015-09-04 00:57:04 -04:00
Robyn Speer
d267e0967c
add SUBTLEX to the readme
Former-commit-id: e6a2886a66
2015-09-03 18:56:56 -04:00
Robyn Speer
942761d2f6
fix heading
Former-commit-id: 00a2812907
2015-08-28 17:49:38 -04:00
Robyn Speer
7bdffaae5c
fix list formatting
Former-commit-id: 93f44683c5
2015-08-28 17:49:07 -04:00
Robyn Speer
44c655d9a6
improve README with function documentation and examples
Former-commit-id: 2370287539
2015-08-28 17:45:50 -04:00
Robyn Speer
a3a3180bb9
update the README
Former-commit-id: 573dd1ec79
2015-08-25 17:44:34 -04:00
Joshua Chin
4c7910246e
no use for use
Former-commit-id: b0a9a2980f
2015-07-17 14:46:40 -04:00
Andrew Lin
383963f8a9
Document the version of Unicode used to build the regexes.
Former-commit-id: 9f8464c2d1
2015-07-08 18:48:33 -04:00
Robyn Speer
a3cc8d403c
add installation instructions to the readme
Former-commit-id: 0f4ca80026
2015-05-28 14:02:12 -04:00
Robyn Speer
860e929bf8
update Japanese data; test Japanese and token combining
Former-commit-id: 611a6a35de
2015-05-28 14:01:56 -04:00