# wordfreq/tests/test_build.py

from nose.tools import eq_
from wordfreq.build import load_all_data
from wordfreq.query import wordlist_info
from wordfreq.transfer import download_and_extract_raw_data
from wordfreq import config
import os
import tempfile
import shutil
import sqlite3
import sys

PYTHON2 = (sys.version_info.major == 2)

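def flatten_list_of_dicts(list_of_dicts):
    """
    Convert a list of dicts into a sorted list of sorted (key, value) lists,
    so that two such lists can be compared regardless of ordering.
    """
    return sorted(sorted(d.items()) for d in list_of_dicts)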


def test_build():
    """
    Ensure that the build process builds the same DB that gets distributed.
    """
    if not os.path.exists(config.RAW_DATA_DIR):
        download_and_extract_raw_data()

    tempdir = tempfile.mkdtemp('.wordfreq')
    try:
        db_file = os.path.join(tempdir, 'test.db')
        load_all_data(config.RAW_DATA_DIR, db_file, do_it_anyway=True)
        conn = sqlite3.connect(db_file)
        try:
            # Compare the information we got to the information in the
            # default DB.
            new_info = flatten_list_of_dicts(wordlist_info(conn))
            old_info = flatten_list_of_dicts(wordlist_info(None))
            eq_(len(new_info), len(old_info))
            for new, old in zip(new_info, old_info):
                # Don't test Greek and emoji on Python 2; we can't make them
                # consistent with Python 3.
                if PYTHON2 and ((u'lang', u'el') in new):
                    continue
                if PYTHON2 and ((u'wordlist', u'twitter') in new):
                    continue
                eq_(new, old)
        finally:
            # Close the connection before the temp directory is removed.
            conn.close()
    finally:
        shutil.rmtree(tempdir)


def test_python2():
    """
    Python 2 got to skip two comparisons in test_build, because it builds a
    slightly wrong wordlist. Now test that, in normal operation, it refuses
    to build this wordlist.
    """
    if not PYTHON2:
        return

    # mkstemp() returns a (file descriptor, path) tuple; only the path is a
    # valid database filename.
    fd, db_file = tempfile.mkstemp('.db')
    os.close(fd)
    try:
        load_all_data(config.RAW_DATA_DIR, db_file)
        assert False, "The database should not have been built"
    except UnicodeError:
        # This is the correct case
        pass
    finally:
        os.remove(db_file)