wordfreq/tests/test_japanese.py

from nose.tools import eq_, assert_almost_equal
from wordfreq import tokenize, word_frequency


def test_tokens():
    eq_(tokenize('おはようございます', 'ja'),
        ['おはよう', 'ござい', 'ます'])


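def test_combination():
    ohayou_freq = word_frequency('おはよう', 'ja')
    gozai_freq = word_frequency('ござい', 'ja')
    masu_freq = word_frequency('ます', 'ja')

    # Two tokens: the reciprocals add (1/f = 2 / ohayou_freq), and the one
    # inferred token boundary lowers the result tenfold: ohayou_freq / 20.
    assert_almost_equal(
        word_frequency('おはようおはよう', 'ja'),
        ohayou_freq / 20
    )
    # Three tokens: two inferred boundaries scale the reciprocal sum by 10 * 10.
    assert_almost_equal(
        1.0 / word_frequency('おはようございます', 'ja'),
        (100.0 / ohayou_freq + 100.0 / gozai_freq + 100.0 / masu_freq)
    )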
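
The two assertions in `test_combination` are consistent with a simple combining rule: add the reciprocals of the token frequencies, invert the sum, then divide by 10 for each token boundary that the tokenizer had to infer. The sketch below restates that rule as a standalone function so the expected values can be checked by hand. It is inferred from the test alone and is not taken from wordfreq's source; the name `combine_frequencies` and the parameter `boundary_penalty` are hypothetical.

```python
import math


def combine_frequencies(token_freqs, boundary_penalty=10.0):
    """Combine per-token frequencies into one phrase frequency.

    Hypothetical helper, reconstructed from the test's expected values:
    reciprocals are summed (an associative, commutative operation), and the
    result is lowered by `boundary_penalty` once per inferred token boundary.
    """
    token_freqs = list(token_freqs)
    # An empty phrase or an unknown token (frequency 0) yields frequency 0.
    if not token_freqs or any(freq <= 0 for freq in token_freqs):
        return 0.0
    reciprocal_sum = sum(1.0 / freq for freq in token_freqs)
    inferred_boundaries = len(token_freqs) - 1
    return 1.0 / reciprocal_sum / boundary_penalty ** inferred_boundaries


# Made-up frequencies, used only to illustrate the arithmetic.
ohayou, gozai, masu = 1e-4, 2e-4, 5e-4

# Same shape as the first assertion: a doubled token gives freq / 20.
assert math.isclose(combine_frequencies([ohayou, ohayou]), ohayou / 20)

# Same shape as the second assertion: three tokens, two inferred boundaries.
assert math.isclose(
    1.0 / combine_frequencies([ohayou, gozai, masu]),
    100.0 / ohayou + 100.0 / gozai + 100.0 / masu,
)
```

Because the reciprocal sum does not depend on token order or grouping, only the number of boundaries changes the penalty, which is why the three-token case carries a factor of 100 rather than 10.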


# Commit history for this file:
#   2015-05-28  update Japanese data; test Japanese and token combining
#               (Former-commit-id: 611a6a35de798e7033e33fa4085be9919a53a503)
#   2015-07-09  Express the combining of word frequencies in an explicitly
#               associative and commutative way.
#               (Former-commit-id: 32b4033d6399f10e10dd3f1c9194847a7f01f302)
#   2015-09-10  Lower the frequency of phrases with inferred token boundaries
#               (Former-commit-id: 5c8c36f4e30bdb329861a514a1c4d54a8636a95b)