Unit Testing with Erlang’s Common Test Framework

November 26, 2008 at 10:57 am (erlang)

One of the first things people look for when getting started with Erlang is a unit testing framework, and EUnit tends to be the framework of choice. But I always had trouble getting EUnit to play nice with my code, since it uses parse transforms, which screw up the handling of include files and record definitions. And because Erlang has pattern matching, there’s really no reason for assert macros. So I looked around for alternatives and found that a testing framework called common_test has been included since Erlang/OTP-R12B. common_test (and test_server) are much more heavy duty than EUnit, but don’t let that scare you away. Once you’ve set everything up, writing and running unit tests is quite painless.

Directory Setup

I’m going to assume an OTP compliant directory setup, specifically:

  1. a top level directory we’ll call project/
  2. a lib/ directory containing your applications at project/lib/
  3. application directories inside lib/, such as project/lib/app1/
  4. code files are in app1/src/ and beam files are in app1/ebin/

So we end up with a directory structure like this:

project/
    lib/
        app1/
            src/
            ebin/

Test Suites

Inside the app1/ directory, create a directory called test/. This is where your test suites will go. Generally, you’ll have 1 test suite per code module, so if you have app1/src/module1.erl, then you’ll create app1/test/module1_SUITE.erl for all your module1 unit tests. Each test suite should look something like this: (unfortunately, wordpress doesn’t do syntax highlighting for erlang, so it looks kinda crappy)

-module(module1_SUITE).

% easier than exporting by name
-compile(export_all).

% required for common_test to work
-include("ct.hrl").

%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% common test callbacks %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Specify a list of all unit test functions
all() -> [test1, test2].

% required, but can just return Config. this is a suite level setup function.
init_per_suite(Config) ->
    % do custom per suite setup here
    Config.

% required, but can just return Config. this is a suite level tear down function.
end_per_suite(Config) ->
    % do custom per suite cleanup here
    Config.

% optional, can do function level setup for all functions,
% or for individual functions by matching on TestCase.
init_per_testcase(TestCase, Config) ->
    % do custom test case setup here
    Config.

% optional, can do function level tear down for all functions,
% or for individual functions by matching on TestCase.
end_per_testcase(TestCase, Config) ->
    % do custom test case cleanup here
    Config.

%%%%%%%%%%%%%%%%
%% test cases %%
%%%%%%%%%%%%%%%%

test1(Config) ->
    % write standard erlang code to test whatever you want
    % use pattern matching to specify expected return values
    ok.

test2(Config) -> ok.

Test Specification

Now that we have a test suite at project/lib/app1/test/module1_SUITE.erl, we can make a test specification so common_test knows where to find the test suites, and which suites to run. Something I found out the hard way is that common_test requires absolute paths in its test specifications. So instead of creating a file called test.spec, we’ll create a file called test.spec.in, and use make to generate the test.spec file with absolute paths.

test.spec.in

{logdir, "@PATH@/log"}.
{alias, app1, "@PATH@/lib/app1"}.
{suites, app1, [module1_SUITE]}.

Makefile

src:
    erl -pa lib/*/ebin -make

test.spec: test.spec.in
    cat test.spec.in | sed -e "s,@PATH@,$(PWD)," > $(PWD)/test.spec

test: test.spec src
    run_test -pa $(PWD)/lib/*/ebin -spec test.spec
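
For readers who would rather not depend on sed, the placeholder substitution in the test.spec rule can be sketched in Python. This is only an illustration of the idea, not part of the original setup: the function name render_spec is hypothetical, and the template text is copied from test.spec.in above.

```python
import os

def render_spec(template_text, project_root):
    """Replace every @PATH@ placeholder with an absolute project path."""
    return template_text.replace("@PATH@", os.path.abspath(project_root))

# Same three lines as test.spec.in
template = (
    '{logdir, "@PATH@/log"}.\n'
    '{alias, app1, "@PATH@/lib/app1"}.\n'
    '{suites, app1, [module1_SUITE]}.\n'
)

# "." stands in for the project/ directory (an assumption for this sketch)
spec = render_spec(template, ".")
print(spec)
```

Unlike the sed command, which replaces only the first match on each line, str.replace substitutes every occurrence. For this template the result is the same, since no line contains @PATH@ more than once.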

Running the Tests

As you can see above, I also use the Makefile for running the tests with the command make test. For this command to work, run_test must be installed in your PATH. To do so, you need to run /usr/lib/erlang/lib/common_test-VERSION/install.sh (where VERSION is whatever version number you currently have). See the common_test installation instructions for more information. I’m also assuming you have an Emakefile for compiling the code in lib/app1/src/ with the make src command.

Final Thoughts

So there you have it, an example test suite, a test specification, and a Makefile for running the tests. The final file and directory structure should look something like this:

project/
    Emakefile
    Makefile
    test.spec.in
    lib/
        app1/
            src/
                module1.erl
            ebin/
            test/
                module1_SUITE.erl

Now all you need to do is write your unit tests in the form of test suites and add those suites to test.spec.in. There’s a lot more you can get out of common_test, such as code coverage analysis, HTML logging, and large scale testing. I’ll be covering some of those topics in the future, but for now I’ll end with some parting thoughts from the Common Test User’s Guide:

It’s not possible to prove that a program is correct by testing. On the contrary, it has been formally proven that it is impossible to prove programs in general by testing.

There are many kinds of test suites. Some concentrate on calling every function or command… Some other do the same, but uses all kinds of illegal parameters.

Aim for finding bugs. Write whatever test that has the highest probability of finding a bug, now or in the future. Concentrate more on the critical parts. Bugs in critical subsystems are a lot more expensive than others.

Aim for functionality testing rather than implementation details. Implementation details change quite often, and the test suites should be long lived.

Aim for testing everything once, no less, no more


Part of Speech Tagging with NLTK – Part 2

November 10, 2008 at 2:42 pm (python)

Following up on Part of Speech Tagging with NLTK – Part 1, I test the accuracy of adding an AffixTagger and a RegexpTagger to my SequentialBackoffTagger chain.

Affix Tagging

The AffixTagger learns prefix and suffix patterns to determine the part of speech tag for a word. I tried inserting the AffixTagger into every possible position of the ubt_tagger to see which method increased accuracy the most. As you’ll see in the results, the aubt_tagger had the highest accuracy.

ubta_tagger = backoff_tagger(train_sents, [nltk.tag.UnigramTagger, nltk.tag.BigramTagger, nltk.tag.TrigramTagger, nltk.tag.AffixTagger])
ubat_tagger = backoff_tagger(train_sents, [nltk.tag.UnigramTagger, nltk.tag.BigramTagger, nltk.tag.AffixTagger, nltk.tag.TrigramTagger])
uabt_tagger = backoff_tagger(train_sents, [nltk.tag.UnigramTagger, nltk.tag.AffixTagger, nltk.tag.BigramTagger, nltk.tag.TrigramTagger])
aubt_tagger = backoff_tagger(train_sents, [nltk.tag.AffixTagger, nltk.tag.UnigramTagger, nltk.tag.BigramTagger, nltk.tag.TrigramTagger])
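
To make the idea of learning suffix patterns concrete, here is a toy sketch in plain Python. It is not NLTK's AffixTagger implementation: the function names, the fixed suffix length of 3, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_suffix_table(tagged_sents, suffix_len=3):
    """Map each word-final suffix to the tag it most often carries."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            # skip words too short to have a stem in front of the suffix
            if len(word) > suffix_len:
                counts[word[-suffix_len:]][tag] += 1
    return {suffix: tags.most_common(1)[0][0] for suffix, tags in counts.items()}

def tag_by_suffix(words, table, suffix_len=3):
    """Tag each word from its suffix; None means 'defer to the next tagger'."""
    return [(w, table.get(w[-suffix_len:]) if len(w) > suffix_len else None)
            for w in words]

# hypothetical training data, just enough to learn that "-ing" suggests VBG
toy_sents = [
    [("running", "VBG"), ("quickly", "RB")],
    [("jumping", "VBG"), ("slowly", "RB")],
]
table = train_suffix_table(toy_sents)
print(tag_by_suffix(["walking", "cat"], table))
```

A word whose suffix was never seen, or that is too short, gets None, which is the point at which a backoff tagger would take over in a chain.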

Regexp Tagging

The RegexpTagger allows you to define your own word patterns for determining the part of speech tag. Some of the patterns defined below were taken from chapter 3 of the NLTK book, others I added myself. Since I had already determined that the aubt_tagger was the most accurate, I only tested the RegexpTagger at the beginning and end of the chain.

word_patterns = [
	(r'^-?[0-9]+(\.[0-9]+)?$', 'CD'),
	(r'.*ould$', 'MD'),
	(r'.*ing$', 'VBG'),
	(r'.*ed$', 'VBD'),
	(r'.*ness$', 'NN'),
	(r'.*ment$', 'NN'),
	(r'.*ful$', 'JJ'),
	(r'.*ious$', 'JJ'),
	(r'.*ble$', 'JJ'),
	(r'.*ic$', 'JJ'),
	(r'.*ive$', 'JJ'),
	(r'.*est$', 'JJ'),
	(r'^a$', 'PREP'),
]

aubtr_tagger = nltk.tag.RegexpTagger(word_patterns, backoff=aubt_tagger)
raubt_tagger = backoff_tagger(train_sents, [nltk.tag.AffixTagger, nltk.tag.UnigramTagger, nltk.tag.BigramTagger, nltk.tag.TrigramTagger],
    backoff=nltk.tag.RegexpTagger(word_patterns))
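
The way a pattern list like this is applied can be sketched with the standard re module. This is a simplified stand-in rather than NLTK's code: regexp_tag and toy_patterns are hypothetical names, and the sketch assumes each word takes the tag of the first pattern that matches.

```python
import re

def regexp_tag(words, patterns, default=None):
    """Tag each word with the first pattern that matches it."""
    compiled = [(re.compile(regex), tag) for regex, tag in patterns]
    tagged = []
    for word in words:
        for regex, tag in compiled:
            if regex.match(word):
                tagged.append((word, tag))
                break
        else:
            # no pattern matched; a real chain would ask its backoff here
            tagged.append((word, default))
    return tagged

# a small illustrative pattern list, in priority order
toy_patterns = [
    (r'^-?[0-9]+(\.[0-9]+)?$', 'CD'),
    (r'.*ing$', 'VBG'),
    (r'.*ness$', 'NN'),
]
print(regexp_tag(["3.14", "testing", "kindness", "erlang"], toy_patterns))
```

Because the first match wins, more specific patterns belong earlier in the list than general ones.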

[Figure: Affix and Regexp Tagging Accuracy]

Conclusion

As you can see, the aubt_tagger provided the most gain over the ubt_tagger, and the raubt_tagger had a slight gain on top of that. In Part 3 I’ll discuss the results of using the BrillTagger to push the accuracy even higher.


Part of Speech Tagging with NLTK – Part 1

November 3, 2008 at 6:19 pm (python)

An important part of weotta’s tag extraction is part of speech tagging, a process of identifying nouns, verbs, adjectives, and other parts of speech in context. NLTK provides the necessary tools for tagging, but doesn’t actually tell you what methods work best, so I decided to find out for myself.

Training and Test Sentences

NLTK has a data package that includes 3 tagged corpora: brown, conll2000, and treebank. I divided each of these corpora into 2 sets, the training set and the testing set. The choice and size of your training set can have a significant effect on the tagging accuracy, so for real world usage, you need to train on a corpus that is very representative of the actual text you want to tag. In particular, the brown corpus has a number of different categories, so choose your categories wisely. I chose these categories primarily because they have a higher occurrence of the word food than other categories.

import nltk.corpus, nltk.tag, itertools
from nltk.tag import brill
# PRESS: REVIEWS
brownc_sents = nltk.corpus.brown.tagged_sents(categories="c")
# POPULAR LORE
brownf_sents = nltk.corpus.brown.tagged_sents(categories="f")
# FICTION: ROMANCE
brownp_sents = nltk.corpus.brown.tagged_sents(categories="p")

brown_train = list(itertools.chain(brownc_sents[:1000], brownf_sents[:1000], brownp_sents[:1000]))
brown_test = list(itertools.chain(brownc_sents[1000:2000], brownf_sents[1000:2000], brownp_sents[1000:2000]))

conll_sents = nltk.corpus.conll2000.tagged_sents()
conll_train = list(conll_sents[:4000])
conll_test = list(conll_sents[4000:8000])

treebank_sents = nltk.corpus.treebank.tagged_sents()
treebank_train = list(treebank_sents[:1500])
treebank_test = list(treebank_sents[1500:3000])

Ngram Tagging

I started by testing different combinations of the 3 NgramTaggers: UnigramTagger, BigramTagger, and TrigramTagger. These taggers inherit from SequentialBackoffTagger, which allows them to be chained together for greater accuracy. To save myself a little pain when constructing and training these taggers, I created a utility method for creating a chain of SequentialBackoffTaggers.

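def backoff_tagger(tagged_sents, tagger_classes, backoff=None):
	# with no initial backoff, the first class becomes the end of the chain
	if not backoff:
		backoff = tagger_classes[0](tagged_sents)
		# slice instead of del so the caller's list is not modified
		tagger_classes = tagger_classes[1:]

	# each later tagger backs off to the one built before it
	for cls in tagger_classes:
		backoff = cls(tagged_sents, backoff=backoff)

	return backoff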

ubt_tagger = backoff_tagger(train_sents, [nltk.tag.UnigramTagger, nltk.tag.BigramTagger, nltk.tag.TrigramTagger])
utb_tagger = backoff_tagger(train_sents, [nltk.tag.UnigramTagger, nltk.tag.TrigramTagger, nltk.tag.BigramTagger])
but_tagger = backoff_tagger(train_sents, [nltk.tag.BigramTagger, nltk.tag.UnigramTagger, nltk.tag.TrigramTagger])
btu_tagger = backoff_tagger(train_sents, [nltk.tag.BigramTagger, nltk.tag.TrigramTagger, nltk.tag.UnigramTagger])
tub_tagger = backoff_tagger(train_sents, [nltk.tag.TrigramTagger, nltk.tag.UnigramTagger, nltk.tag.BigramTagger])
tbu_tagger = backoff_tagger(train_sents, [nltk.tag.TrigramTagger, nltk.tag.BigramTagger, nltk.tag.UnigramTagger])

Accuracy Testing

To test the accuracy of a tagger, we can compare it to the test sentences using the nltk.tag.accuracy function.

nltk.tag.accuracy(tagger, test_sents)
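
As a rough picture of what such an accuracy score measures, here is a plain-Python sketch. It assumes accuracy means the fraction of tokens whose predicted tag equals the gold-standard tag; the function name tagging_accuracy and the sample sentences are hypothetical, and this is not NLTK's implementation.

```python
def tagging_accuracy(predicted_sents, gold_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for predicted, gold in zip(predicted_sents, gold_sents):
        for (_, predicted_tag), (_, gold_tag) in zip(predicted, gold):
            total += 1
            correct += predicted_tag == gold_tag
    # avoid dividing by zero on empty input
    return correct / total if total else 0.0

# hypothetical data: the tagger gets 2 of 3 tokens right
gold = [[("the", "DT"), ("cat", "NN"), ("sat", "VBD")]]
predicted = [[("the", "DT"), ("cat", "NN"), ("sat", "NN")]]
score = tagging_accuracy(predicted, gold)
print(score)
```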

[Figure: Ngram Tagging Accuracy]

Conclusion

The ubt_tagger and utb_tagger are extremely close to each other, but the ubt_tagger is the slight favorite. (Note that the backoff sequence is in reverse order, so for the ubt_tagger, the TrigramTagger backs off to the BigramTagger, which backs off to the UnigramTagger.)
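
The reverse ordering is easy to get backwards, so here is a toy illustration that needs no NLTK. LookupTagger and build_chain are hypothetical stand-ins: build_chain wires taggers together in the same order as the loop in backoff_tagger, and each toy tagger reports which link in the chain produced the tag.

```python
class LookupTagger:
    """Toy tagger: looks a word up in a dict, else asks its backoff."""
    def __init__(self, name, table, backoff=None):
        self.name, self.table, self.backoff = name, table, backoff

    def tag_word(self, word):
        if word in self.table:
            return self.table[word], self.name
        if self.backoff is not None:
            return self.backoff.tag_word(word)
        return None, None

def build_chain(specs):
    """Chain taggers in list order, so the first spec ends up innermost."""
    backoff = None
    for name, table in specs:
        backoff = LookupTagger(name, table, backoff=backoff)
    return backoff

chain = build_chain([
    ("first", {"food": "NN", "eat": "VB"}),
    ("second", {"eat": "VB"}),
    ("third", {}),
])
print(chain.name)              # the tagger that is consulted first
print(chain.tag_word("eat"))   # answered by the middle link
print(chain.tag_word("food"))  # falls all the way back to the first link
```

The last entry in the list is the outermost tagger and gets the first chance at every word, while the first entry is the final fallback.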

Update: in Part of Speech Tagging with NLTK – Part 2, I do further testing using the AffixTagger and the RegexpTagger to get the accuracy up past 80%.
