14 August 2010

Unit tests

A program that has not been tested does not work. The ideal of designing and/or verifying a program so that it works the first time is unattainable for all but the most trivial programs. We should strive toward that ideal, but we should not be fooled into thinking that testing is easy. […] Test strategies should be generated as part of the design and implementation efforts or at least should be developed in parallel with them. As soon as there is a running system, testing should begin. Postponing serious testing until “after the implementation is complete” is a prescription for slipped schedules and/or flawed releases.
The C++ Programming Language, Bjarne Stroustrup

Like many people I've found unit testing to be a useful technique for raising the quality of the code I work on. What I learned when I started using unit tests was that they are not just a means to expose bugs - they have a wider positive influence on the code. More anon.


An example


Say I need to extract the CPU clock speed from the processor name embedded in the chip. So I want something like this:

    extract_mhz("Intel(R) Pentium(R) 4 CPU 1.70GHz")

to return the value 1700 (meaning 1700MHz). So I code this function and then I think about how to test it.

To test this function I might want something like this:

    TEST_EQUAL(extract_mhz("Intel(R) Pentium(R) 4 CPU 1.70GHz"), 1700);

This line would most likely be in a function called from main() in a standalone unit test program. When the program finishes it says something like “tests executed 1, tests failed 0”. If the function doesn’t produce the expected value the unit test app tells me there has been a failure and gives the module name, line number, the expected value and the actual value.

Some people use unit test frameworks, of which there are many available. I tend to use just a couple of simple macros such as the TEST_EQUAL shown here.

Obviously one doesn’t stop there. I add as many real-world examples of valid embedded processor strings as I can get my hands on, and I try to test all the edge cases I can think of:

void test_typical_inputs_give_correct_results()
{
TEST_EQUAL(extract_mhz("Intel(R) Pentium(R) 4 CPU 1.70GHz"), 1700);
TEST_EQUAL(extract_mhz("Intel(R) Pentium(R) 4 CPU 1800MHz"), 1800);
TEST_EQUAL(extract_mhz("Mobile Intel(R) Pentium(R) 4 - M CPU 2.40GHz"), 2400);
TEST_EQUAL(extract_mhz("AMD Athlon(tm) XP 2600+"), 0);
TEST_EQUAL(extract_mhz("Pentium(R)III"), 0);
}

void test_valid_speeds_correctly_extracted()
{
TEST_EQUAL(extract_mhz("1234MHz"), 1234);
TEST_EQUAL(extract_mhz("2000 MHz"), 2000);
TEST_EQUAL(extract_mhz("12.34MHz"), 12);
TEST_EQUAL(extract_mhz("1.234GHz"), 1234);
TEST_EQUAL(extract_mhz("1.2345678GHz"), 1234);
TEST_EQUAL(extract_mhz("1234GHz"), 1234000);
}

void test_missing_speed_produces_zero_result()
{
// none of these strings contain valid speeds
TEST_EQUAL(extract_mhz(""), 0);
TEST_EQUAL(extract_mhz("just text"), 0);
TEST_EQUAL(extract_mhz("1"), 0);
TEST_EQUAL(extract_mhz("1234"), 0);
TEST_EQUAL(extract_mhz("1234mHz"), 0);
TEST_EQUAL(extract_mhz("1234GH"), 0);
// the following are more interesting cases; see the text below
TEST_EQUAL(extract_mhz("12..34MHz"), 0);
TEST_EQUAL(extract_mhz("1234.GHz"), 0);
TEST_EQUAL(extract_mhz("1.2345.678GHz"), 0);
}


You get to decide whether 12..34MHz is a real speed or not (e.g. you could argue it represents junk followed by 34 MHz). Or you may have a specification that decides this for you. But unless this is safety critical code I wouldn't spend too much time worrying about it. Often we just need a reasonably correct result and robust code that won't bomb out when given an unexpected input (cf. Robustness principle). It's very unlikely any processor manufacturer is going to embed garbage like 12..34MHz in their chip, but if they do you know your code will do something reasonable.
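The implementation of extract_mhz() isn't shown here. As a rough sketch only, one way to satisfy the tests above might look like the following. It assumes the function takes a std::string and returns an int, that the unit is case sensitive, and that any run of digits and dots in front of the unit must be a well-formed number (so 12..34MHz gives 0 rather than 34). The cap of six integer digits is an arbitrary choice made for this sketch to avoid overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical sketch: returns the clock speed in MHz found in 'name',
// or 0 if no valid speed is present.
int extract_mhz(const std::string & name)
{
    const std::string::size_type npos = std::string::npos;

    // find the first "MHz" or "GHz" (case sensitive); npos is the largest
    // size_type value, so std::min picks whichever unit appears first
    const std::string::size_type mhz_pos = name.find("MHz");
    const std::string::size_type ghz_pos = name.find("GHz");
    const std::string::size_type unit = std::min(mhz_pos, ghz_pos);
    if (unit == npos)
        return 0;
    const bool is_ghz = (unit == ghz_pos);

    // step back over optional spaces, so "2000 MHz" is accepted
    std::string::size_type end = unit;
    while (end > 0 && name[end - 1] == ' ')
        --end;

    // collect the run of digits and dots immediately before the unit
    std::string::size_type begin = end;
    while (begin > 0 && (name[begin - 1] == '.'
            || std::isdigit(static_cast<unsigned char>(name[begin - 1]))))
        --begin;
    const std::string number = name.substr(begin, end - begin);

    // the run must look like "digits" or "digits.digits"
    const std::string::size_type dot = number.find('.');
    const std::string whole = number.substr(0, dot);
    std::string fraction = (dot == npos) ? std::string() : number.substr(dot + 1);
    if (whole.empty() || whole.size() > 6)   // arbitrary cap to avoid int overflow
        return 0;
    if (dot != npos && (fraction.empty() || fraction.find('.') != npos))
        return 0;

    // integer arithmetic throughout, so no floating point rounding surprises
    int mhz = std::stoi(whole);
    if (is_ghz) {
        fraction.resize(3, '0');   // exactly three decimals: "70" -> "700", "2345678" -> "234"
        mhz = mhz * 1000 + std::stoi(fraction);
    }
    return mhz;
}
```

Working in integers means 1.70GHz becomes exactly 1700, and a fractional MHz value such as 12.34MHz is simply truncated to 12, as the tests expect.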

But the point is, thinking up crazy unit test cases gives you a moment to consider what you are doing.

I try to think about the code in two different ways: I’m a ‘customer’ of the code – testing the interface – how would I like it to behave? But I also know the code, so I can try to exercise as many code paths as I can.

Kevlin Henney suggests you name unit tests using propositions; not test_men but test_all_men_are_created_equal. (I don't always remember to do this.)

Some people say you are not the best person to test your own code because you will hold back in some way, deliberately not trying hard to break it. I don't recognise that mindset; I'd much prefer to break my own code in a unit test and quietly understand, fix and learn from the failure than have the code break more publicly, with who knows what consequences.

Of course, it may not be a deliberate holding back; it may just be some blind spot that stops you creating a body of tests that gives adequate coverage of the code. This is where writing at least some of the tests before the implementation may help.

It’s not appropriate for this extract_mhz() function, but sometimes I’ve been able to exhaustively test a function (or class) by iterating over the entire range of possible inputs and expected outputs. For example, I have code that converts a UCS-2 character (16-bit value) to the equivalent UTF-8 string (1, 2 or 3 bytes long), and another that converts a UTF-8 string to a UCS-2 character. So it is possible with a simple for loop to convert 65,536 possible UCS-2 input values to UTF-8 and then back again and test we have the same value we started with. (Clearly this must be combined with other tests as it doesn't say anything about whether the intermediate UTF-8 is correct.)
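To make the round-trip idea concrete, here is a minimal sketch. The two converters are hypothetical stand-ins written for this example, not the real code: they treat every 16-bit value uniformly (including the surrogate range) and the decoder assumes well-formed input of 1 to 3 bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoder: one UCS-2 value to 1, 2 or 3 UTF-8 bytes.
std::string ucs2_to_utf8(unsigned c)
{
    std::string out;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}

// Hypothetical decoder: assumes 's' holds exactly one well-formed
// sequence of 1, 2 or 3 bytes; no validation is attempted.
unsigned utf8_to_ucs2(const std::string & s)
{
    const unsigned char b0 = s[0];
    if (s.size() == 1)
        return b0;
    const unsigned char b1 = s[1];
    if (s.size() == 2)
        return ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
    const unsigned char b2 = s[2];
    return ((b0 & 0x0Fu) << 12) | ((b1 & 0x3Fu) << 6) | (b2 & 0x3Fu);
}

// Exhaustive round trip: returns how many of the 65,536 input values
// fail to come back unchanged.
unsigned count_round_trip_failures()
{
    unsigned failures = 0;
    for (unsigned c = 0; c <= 0xFFFF; ++c)
        if (utf8_to_ucs2(ucs2_to_utf8(c)) != c)
            ++failures;
    return failures;
}
```

A handful of spot checks on the intermediate UTF-8 (for example that 0x20AC encodes as the three bytes E2 82 AC) would cover what the round trip alone cannot.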

Sometimes I use files of sample real-world data. For example, Windows executables from which the embedded file version information is extracted. The unit test suite may include files that have caused specific problems in the field. Once a bug is fixed and the file that exposed the bug is included in the unit tests I know that bug will never again resurface without me knowing.

Unit tests are not always the appropriate way to test code, but I’ve found them useful even when at first sight I didn’t think they applied. For example, I have some code to identify the Windows operating system. (Sadly, this is much trickier than you might hope.) This code calls various Microsoft APIs and, of course, is always going to give just one answer, according to the version of Windows it is currently running on. So how can it be unit tested? Maybe it's not worth the effort.

And so I thought. However, perhaps because of the arcane nature of the APIs, the code was not always giving the right answer; perhaps it would report Windows 2003 Advanced Server when it should have reported Windows Server 2003 Enterprise Edition, or whatever. So I tried a bit harder to write unit tests for this code.

I separated the part of the code that gets the raw data from the various Windows APIs from the part that interprets this raw data. This separation allows me to (a) obtain the raw data from many different flavours of Windows independently of the unit test suite, and then (b) unit test the interpretation part by playing in the different sets of raw data and testing that the results are as expected. So the unit test doesn’t test the entire module, but at least it tests the more complex interpretation part.
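The shape of that separation might be sketched like this. The RawOsInfo struct, its fields and interpret_os() are all hypothetical names invented for the illustration, and the real raw data is certainly richer than a version number and a server flag. The point is only that the interpretation is a pure function of data that can be captured once on each flavour of Windows and replayed anywhere.

```cpp
#include <cassert>
#include <string>

// Hypothetical raw data, as it might be captured on each machine by the
// part of the code that calls the operating system.
struct RawOsInfo {
    unsigned major_version;
    unsigned minor_version;
    bool is_server;
};

// Hypothetical interpretation step: a pure function of the raw data, so
// it can be unit tested on any machine.
std::string interpret_os(const RawOsInfo & raw)
{
    if (raw.major_version == 5 && raw.minor_version == 1)
        return "Windows XP";
    if (raw.major_version == 5 && raw.minor_version == 2 && raw.is_server)
        return "Windows Server 2003";
    if (raw.major_version == 6 && raw.minor_version == 0)
        return raw.is_server ? "Windows Server 2008" : "Windows Vista";
    if (raw.major_version == 6 && raw.minor_version == 1)
        return raw.is_server ? "Windows Server 2008 R2" : "Windows 7";
    return "unknown";
}
```

Each set of captured raw data then becomes one line in a unit test, along the lines of TEST_EQUAL(interpret_os(raw_from_some_machine), std::string("Windows 7")).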


The good effects


It’s tedious and unreliable to do much testing by hand; proper testing involves lots of tests, lots of inputs, and lots of comparisons of outputs. Testing should therefore be done by programs, which don’t get tired or careless. It’s worth taking the time to write a script or trivial program that encapsulates all the tests, so a complete test suite can be run by (literally or figuratively) pushing a single button. The easier a test suite is to run, the more often you’ll run it and the less likely you’ll skip it when time is short.
The Practice of Programming, Kernighan & Pike


Unit tests will not bring to light all the defects in a piece of code, but they do have some advantages over other techniques (such as code inspections):

Regression testing: unit tests are repeatable at zero cost. This means that if you change the code and break something you get to know about it there and then. This in turn gives you confidence when working on the code.

Documentation: unit tests document the code, showing how you expect the code under test to be called.

Pause for thought: the process of creating unit tests forces me to put the implementation details to one side for a moment and think clearly about the interface to the code; is my interface simple to use? What do I want it to do when it gets given invalid inputs?

Encourage good code structure: if the code is going to be capable of being driven by a unit test it is going to have to be parameterisable. I’ve found code designed to be unit tested is often easier to reuse in new situations.

Framework for debugging: When something goes wrong at a remote customer site you may be able to take a file or snippet of log data from them, create a new unit test from it and reproduce the problem.


A few final thoughts


The English Positivist philosophers Locke, Berkeley, and Hume said that anything that can’t be measured doesn’t exist. When it comes to code, I agree with them completely. Software features that can’t be demonstrated by automated tests simply don’t exist. I am good at fooling myself into believing that what I wrote is what I meant. I am also good at fooling myself into believing that what I meant is what I should have meant. So I don’t trust anything I write until I have tests for it. The tests give me a chance to think about what I want independent of how it will be implemented. Then the tests tell me if I implemented what I thought I implemented.
Extreme Programming Explained, Kent Beck


I consider myself a careful programmer but, like every other human being, I sometimes make mistakes. When I write some code that I believe to be correct and then write some unit tests, I often find at least one bug.

In the book Test-Driven Development Kent Beck suggests that you write the unit test first and see it fail, then you write the code to make the test pass as simply as possible, then you refactor the code if necessary.

In the book Code Complete Steve McConnell says that unit testing will typically only find about 25% of the defects in a given piece of code, compared with code inspections which may detect 60%. He recommends combining two or more quality improving techniques together.

Finally, here is a complete unit test program for a simple is_power_of_2 function.

#include <cstdlib> // for EXIT_SUCCESS and EXIT_FAILURE
#include <iostream>
#include <stdexcept>


// This is the function being tested. Usually the unit test code is in a
// separate module and includes an appropriate header file to access the
// code under test. Sometimes I keep the unit tests together with the
// implementation but enclosed in #ifdef UNIT_TESTS_ENABLED.

// return true iff ‘n’ is a power of 2
bool is_power_of_2(long n)
{
// to demonstrate the TEST_EXCEPTION macro we're going to have negative inputs throw
if (n < 0)
throw std::runtime_error("is_power_of_2 negative input");
return n > 0 && (n & (n - 1)) == 0;
}

// The rest of this file is the unit test for the above function.


unsigned g_test_count; // count of number of unit tests executed
unsigned g_fault_count; // count of number of unit tests that fail


template <typename T>
void test_equal_(const T & value, const T & expected_value, const char * file, int line)
{
++g_test_count;
if (value != expected_value) {
std::cout
<< file << '(' << line << ") : "
<< " expected " << expected_value
<< ", got " << value
<< '\n';
++g_fault_count;
}
}

// write a message to std::cout if value != expected_value
#define TEST_EQUAL(value, expected_value) test_equal_(value, expected_value, __FILE__, __LINE__)

// write a message to std::cout if the expected exception is not thrown by expression
#define TEST_EXCEPTION(expression, exception_expected) \
{ \
bool got_exception = false; \
try { \
expression; \
} \
catch (const exception_expected &) { \
got_exception = true; \
} \
TEST_EQUAL(got_exception, true); \
}


// for every possible power of 2, test is_power_of_2() returns true
void test_all_powers_of_2_detected()
{
TEST_EQUAL(is_power_of_2(0x00000001L), true);
TEST_EQUAL(is_power_of_2(0x00000002L), true);
TEST_EQUAL(is_power_of_2(0x00000004L), true);
TEST_EQUAL(is_power_of_2(0x00000008L), true);
TEST_EQUAL(is_power_of_2(0x00000010L), true);
TEST_EQUAL(is_power_of_2(0x00000020L), true);
TEST_EQUAL(is_power_of_2(0x00000040L), true);
TEST_EQUAL(is_power_of_2(0x00000080L), true);
TEST_EQUAL(is_power_of_2(0x00000100L), true);
TEST_EQUAL(is_power_of_2(0x00000200L), true);
TEST_EQUAL(is_power_of_2(0x00000400L), true);
TEST_EQUAL(is_power_of_2(0x00000800L), true);
TEST_EQUAL(is_power_of_2(0x00001000L), true);
TEST_EQUAL(is_power_of_2(0x00002000L), true);
TEST_EQUAL(is_power_of_2(0x00004000L), true);
TEST_EQUAL(is_power_of_2(0x00008000L), true);
TEST_EQUAL(is_power_of_2(0x00010000L), true);
TEST_EQUAL(is_power_of_2(0x00020000L), true);
TEST_EQUAL(is_power_of_2(0x00040000L), true);
TEST_EQUAL(is_power_of_2(0x00080000L), true);
TEST_EQUAL(is_power_of_2(0x00100000L), true);
TEST_EQUAL(is_power_of_2(0x00200000L), true);
TEST_EQUAL(is_power_of_2(0x00400000L), true);
TEST_EQUAL(is_power_of_2(0x00800000L), true);
TEST_EQUAL(is_power_of_2(0x01000000L), true);
TEST_EQUAL(is_power_of_2(0x02000000L), true);
TEST_EQUAL(is_power_of_2(0x04000000L), true);
TEST_EQUAL(is_power_of_2(0x08000000L), true);
TEST_EQUAL(is_power_of_2(0x10000000L), true);
TEST_EQUAL(is_power_of_2(0x20000000L), true);
TEST_EQUAL(is_power_of_2(0x40000000L), true);
// Note: where long is 32 bits, 0x80000000L converts to a negative value (-2147483648) and is therefore not a power of 2
}

// is_power_of_2 should always return false when given an input that is not a power of 2
void test_non_powers_of_2_detected()
{
TEST_EQUAL(is_power_of_2(0), false);
TEST_EQUAL(is_power_of_2(0x10000001L), false);
TEST_EQUAL(is_power_of_2(0x00000003L), false);
TEST_EQUAL(is_power_of_2(0x00000005L), false);
TEST_EQUAL(is_power_of_2(0x00000006L), false);
TEST_EQUAL(is_power_of_2(0x00000007L), false);
TEST_EQUAL(is_power_of_2(0x00000009L), false);
TEST_EQUAL(is_power_of_2(0x0000000aL), false);
TEST_EQUAL(is_power_of_2(0x0000000bL), false);
TEST_EQUAL(is_power_of_2(0x0000000cL), false);
TEST_EQUAL(is_power_of_2(0x0000000dL), false);
TEST_EQUAL(is_power_of_2(0x0000000eL), false);
TEST_EQUAL(is_power_of_2(0x0000000fL), false);
TEST_EQUAL(is_power_of_2(0x00001001L), false);
TEST_EQUAL(is_power_of_2(0x00003000L), false);
TEST_EQUAL(is_power_of_2(0x0000ffffL), false);
TEST_EQUAL(is_power_of_2(0x12345678L), false);
}

// is_power_of_2 should always throw an exception when given a negative input
void test_negative_inputs_generate_exception()
{
TEST_EXCEPTION(is_power_of_2(-2147483647L - 1), std::runtime_error); // -2147483648, written so that it is negative whatever the width of long
TEST_EXCEPTION(is_power_of_2(-1L), std::runtime_error);
TEST_EXCEPTION(is_power_of_2(-2L), std::runtime_error);
TEST_EXCEPTION(is_power_of_2(-3L), std::runtime_error);
TEST_EXCEPTION(is_power_of_2(-4L), std::runtime_error);
}

int main ()
{
test_all_powers_of_2_detected();
test_non_powers_of_2_detected();
test_negative_inputs_generate_exception();

std::cout
<< "total tests " << g_test_count
<< ", total failures " << g_fault_count
<< "\n";

return g_fault_count ? EXIT_FAILURE : EXIT_SUCCESS;
}
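As an aside, the long list of literals in test_all_powers_of_2_detected() could be generated by a loop instead, which also adapts to platforms where long is wider than 32 bits. This is only a sketch of that alternative: it repeats is_power_of_2() so that it stands alone, and count_undetected_powers_of_2() is a name made up for the example.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// copy of the function under test from the program above
bool is_power_of_2(long n)
{
    if (n < 0)
        throw std::runtime_error("is_power_of_2 negative input");
    return n > 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: returns the number of positive powers of 2 that
// is_power_of_2() fails to detect (0 means every one was detected).
int count_undetected_powers_of_2()
{
    int failures = 0;
    // digits is the number of value bits in long, excluding the sign bit,
    // so 1L << bit never overflows inside this loop
    for (int bit = 0; bit < std::numeric_limits<long>::digits; ++bit)
        if (!is_power_of_2(1L << bit))
            ++failures;
    return failures;
}
```

The explicit list has one advantage the loop lacks: when a test fails, the reported line number points directly at the offending value.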


The problem is simple: Art is never defect-free. Things that are remarkable never meet spec because that would make them standardized, not worth talking about.
Linchpin, Seth Godin


