Minimum fuss

One way to waste a lot of time trying to find bugs is to write code like this:

#define min(a,b) a < b ? a : b

// ...

int i(5);
int j = min(++i, 8);
double d(3.0/min(1.0,2.0));

std::cout << j << std::endl;
std::cout << d << std::endl;

(This code is in min.cpp in git://git.istic.org/cplusplus.git.)

As you can see, the programmer has written a macro whose intent is to evaluate to the lesser of its two arguments. But this is not the actual behaviour. Let's look first at j.

int j = min(++i, 8);

If we're thinking of min as if it were a function, we'd expect this to evaluate ++i, which is 6, compare it with 8, and then return 6. But in fact j gets the value 7. How? What does this code become after being run through the macro preprocessor? You can find out by running the above snippet through gcc -E: the -E option tells it to stop after the preprocessor step and output what it has so far. (It doesn't matter whether the snippet would compile or not.)

int j = ++i < 8 ? ++i : 8;

It should be fairly obvious that ++i is going to be evaluated twice: first to determine the result of ++i < 8, and then (assuming that is true) to compute the result. In this case, it's fairly obvious that our macro is doing the wrong thing because the expression we are giving it has noticeable side-effects, but we could just as easily have done

int j = min(some_function_that_takes_ages_to_run(), 100);

and then the result would be correct, but in many cases the code would take twice as long to run. But what of our other use? We might expect

double d(3.0/min(1.0,2.0));

to evaluate to 3.0/1.0, which equals 3.0, but in fact d ends up as 2.0. When run through the preprocessor, this line becomes

double d(3.0/1.0 < 2.0 ? 1.0 : 2.0);

which, given that the precedence of / is higher than <, is evaluated in the order the spacing suggests: that is, the division is done before the comparison. The condition is therefore 3.0 < 2.0, which is false, so the result is 2.0.
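The two surprises above can be checked with a small sketch. The macro body is the one from the start of the article, but it is renamed BAD_MIN here (a name chosen only for this illustration) so that it cannot collide with anything in the standard headers, and the function names are likewise hypothetical.

```cpp
#include <cassert>

// Same body as the article's macro; BAD_MIN is a name invented for this sketch.
#define BAD_MIN(a,b) a < b ? a : b

// The value j receives in the first example.
inline int j_from_macro()
{
    int i(5);
    int j = BAD_MIN(++i, 8);          // expands to: ++i < 8 ? ++i : 8
    return j;
}

// The value d receives in the second example.
inline double d_from_macro()
{
    double d(3.0/BAD_MIN(1.0,2.0));   // expands to: 3.0/1.0 < 2.0 ? 1.0 : 2.0
    return d;
}

// The assertions state the values described in the text.
inline void check_unbracketed_macro()
{
    assert(j_from_macro() == 7);      // ++i ran twice
    assert(d_from_macro() == 2.0);    // the division happened before the comparison
}
```

Calling check_unbracketed_macro() from main should pass silently if the behaviour is as described.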

Another more obvious error results if I try to use the macro like this:

std::cout << min(5,6) << std::endl;

This is for much the same reason: running it through the preprocessor yields the code

std::cout << 5 < 6 ? 5 : 6 << std::endl;

Since << binds more tightly than <, this first does std::cout << 5, and then tries to compare the result of that (which is std::cout, because operator<< should always return its left-hand argument) with 6. There is no such comparison operator defined, so the compilation fails. (The trailing 6 << std::endl is not valid either.)

How can we avoid these problems? The first possibility is to define the macro in a more sensible way:

#define min(a,b) ((a) < (b) ? (a) : (b))

The extra set of brackets round the outside stops the ternary operator (? :) being broken up by other operators, as in my d example, and the brackets around a and b mean that if they are expressions they will stay together. It doesn't fix the problem of evaluating arguments more than once, which is inherent to macros and has no easy fix. There's another problem that's inherent to macros as they are done in C: they don't have any namespacing mechanism. A macro introduces a new global name, so if I had the following line somewhere after my macro definition (maybe even in an include file I don't control), it would fail to compile.

int k = std::numeric_limits<unsigned int>::min();

The macro preprocessor doesn't understand namespaces, so it tries to expand the min macro even though I actually want to call a static member function, and it finds the call doesn't have enough arguments, so the build fails even before getting to the compiler proper. The workaround for this, should you ever be stuck with such a macro interfering with your function or variable names, is to put brackets around the name of the function:

int k = (std::numeric_limits<unsigned int>::min)();

This separates the min from the following (, which ensures that it is not treated as a macro.
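As a rough check of what the bracketed macro does and doesn't fix, here is a sketch. The macro is the bracketed one from the article; the helper names and the call counter are invented for the illustration, and the macro is undefined again at the end so that it doesn't leak into later code.

```cpp
#include <cassert>
#include <limits>

#define min(a,b) ((a) < (b) ? (a) : (b))

int calls = 0;   // counts calls to counted_value(); hypothetical helper state

inline int counted_value() { ++calls; return 42; }

// The precedence problem is gone: this is 3.0/1.0.
inline double d_with_brackets() { return 3.0/min(1.0,2.0); }

// The double evaluation is not: the argument is evaluated once for the
// comparison and once more to produce the result.
inline int calls_per_use()
{
    calls = 0;
    const int smallest = min(counted_value(), 100);
    (void)smallest;
    return calls;
}

// The bracket workaround: min is not followed directly by (, so the
// preprocessor leaves it alone.
inline unsigned int smallest_unsigned()
{
    return (std::numeric_limits<unsigned int>::min)();
}

inline void check_bracketed_macro()
{
    assert(d_with_brackets() == 3.0);
    assert(calls_per_use() == 2);
    assert(smallest_unsigned() == 0u);
}

#undef min
```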

All of these workarounds are just hackery that shouldn't be necessary any more. C++ compilers, unlike many early C compilers, honour the inline keyword on functions, which means you get the advantages of using a function—respecting namespaces, evaluating its arguments exactly once, not having to worry about the precedence of operators inside the function body—as well as the efficiency advantage that macro-lovers cling to, of not introducing the cost of a function call. The inline keyword is just a hint to the compiler, and it's not obliged to insert the code inline, but sensible compilers almost always will. You need to make min templated, so that like its macro counterpart, it can operate on values of any type (as long as operator< is defined appropriately).

template<typename T>
inline T min(T a, T b)
{
    return a < b ? a : b;
}

(The earlier example, using this definition, is given in min-inline.cpp.)
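For comparison, a sketch of the two earlier examples using the inline template. This is not the contents of min-inline.cpp, just an illustration; the demo namespace and the helper names are assumptions made here to keep the template from clashing with any other min.

```cpp
#include <cassert>

namespace demo {   // namespace name chosen for this sketch only

template<typename T>
inline T min(T a, T b)
{
    return a < b ? a : b;
}

}

// ++i is evaluated exactly once, before the call.
inline int j_from_inline()
{
    int i(5);
    int j = demo::min(++i, 8);
    assert(i == 6);
    return j;
}

// The call is a single operand of /, so precedence is not an issue.
inline double d_from_inline()
{
    return 3.0/demo::min(1.0,2.0);
}

inline void check_inline_min()
{
    assert(j_from_inline() == 6);
    assert(d_from_inline() == 3.0);
}
```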

Another good feature of having this is that, even though it is inline, the compiler will still generate an actual function if one is needed. Unlike a macro, you can pass a pointer to this function to higher-order functions. For example, you could pass it to std::transform to ensure that no element of a list of numbers is greater than zero, by taking the minimum of each with zero. (To make them all non-negative, you would take the maximum of each with zero instead.)
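One way this might look, as a sketch: the two-range form of std::transform takes a binary function, so a pointer to the int instantiation of the template can be passed directly, with a second range of zeros supplying the other argument. Taking the minimum of each element with zero leaves no element above zero. The demo namespace and cap_at_zero are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace demo {   // namespace name chosen for this sketch only

template<typename T>
inline T min(T a, T b)
{
    return a < b ? a : b;
}

}

// Replaces every element with the minimum of that element and zero.
inline std::vector<int> cap_at_zero(std::vector<int> values)
{
    const std::vector<int> zeros(values.size(), 0);

    // A pointer to one instantiation of the template; a macro has no address.
    int (*smaller)(int, int) = &demo::min<int>;

    std::transform(values.begin(), values.end(), zeros.begin(),
                   values.begin(), smaller);
    return values;
}

inline void check_cap_at_zero()
{
    const int raw[] = {3, -2, 0, 7};
    const std::vector<int> capped = cap_at_zero(std::vector<int>(raw, raw + 4));
    assert(capped[0] == 0 && capped[1] == -2 && capped[2] == 0 && capped[3] == 0);
}
```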

There's even better news to come: you don't even have to write this function. It already exists, and is called std::min; std::max exists as well (both are declared in the <algorithm> header). There is one slight gotcha with the latter. If asked to guess how std::max is defined, you might write something like:

template<typename T>
inline T max(T a, T b)
{
    return a > b ? a : b;
}

which seems pretty sane, but in fact the crucial line has a < b ? b : a. There are two reasons for this. First, it means that a user-defined type only needs to define operator< and not operator> to work with both std::min and std::max. Second, it pins down what happens when neither argument is less than the other, even if the comparison operators are slightly oddly defined: std::max then returns its first argument, just as std::min does. (This might occur if not all values of the type in question are comparable; that is, if operator< doesn't represent a total order.)
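To illustrate the first reason, here is a sketch of a small user-defined type that provides only operator< and still works with both functions. The Length type and the helper names are invented for the example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical type for illustration: it defines operator< and nothing else.
struct Length
{
    explicit Length(int mm) : millimetres(mm) {}
    int millimetres;
};

inline bool operator<(const Length& a, const Length& b)
{
    return a.millimetres < b.millimetres;
}

inline int shorter_mm(Length a, Length b) { return std::min(a, b).millimetres; }

// This compiles without an operator> because std::max only uses <.
inline int longer_mm(Length a, Length b) { return std::max(a, b).millimetres; }

inline void check_length_ordering()
{
    assert(shorter_mm(Length(3), Length(5)) == 3);
    assert(longer_mm(Length(3), Length(5)) == 5);
}
```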

These two properties are very useful, but they can cause some confusion when you are replacing a naïve max macro or inline with std::max. If you take the max of a NaN and an ordinary floating-point number (whatever the precision), with the NaN as the first argument, then the version above will return the number, but std::max will return the NaN; with the arguments the other way round, the results swap. This is because all ordered comparisons with NaNs return false: numbers are neither greater than nor less than NaNs, so each version falls through to its second branch, which is b in the version above and a in std::max. That's a minor point, but it caught me out in similar circumstances.
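The NaN behaviour can be seen in a short sketch. naive_max is a renamed copy of the guessed definition above (the name is an assumption, chosen to avoid clashing with std::max), and std::isnan assumes a C++11 or later compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// The "obvious" definition, renamed for this sketch.
template<typename T>
inline T naive_max(T a, T b)
{
    return a > b ? a : b;
}

inline void check_nan_behaviour()
{
    const double not_a_number = std::numeric_limits<double>::quiet_NaN();

    // Every ordered comparison with a NaN is false, so the naive version
    // always returns its second argument and std::max always returns its first.
    assert(naive_max(not_a_number, 1.0) == 1.0);
    assert(std::isnan(std::max(not_a_number, 1.0)));

    assert(std::isnan(naive_max(1.0, not_a_number)));
    assert(std::max(1.0, not_a_number) == 1.0);
}
```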

One final point: if you're developing on Windows, in general it's useful to call std::min and std::max by their full names rather than bring them into your namespace with a using std::min declaration. windows.h defines min and max macros unless you tell it not to (by defining NOMINMAX before including it), so it's very easy to end up with them available in your file. If you use std::min, then if this occurs you get an obvious compile error and you can fix your includes or use the bracket workaround discussed above, whereas if you just call it min then it will silently use the macro instead, which may not do what you want.
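A sketch of both defences together, assuming the usual NOMINMAX switch for windows.h. The include is guarded so that the same file also builds on other platforms; the function names are invented for the example.

```cpp
// Must be defined before windows.h is included, or it has no effect.
#ifndef NOMINMAX
#define NOMINMAX
#endif

#ifdef _WIN32
#include <windows.h>
#endif

#include <algorithm>
#include <cassert>

// Fully qualified, so a stray min macro would show up as a compile error
// rather than being silently used.
inline int smaller_of(int a, int b) { return std::min(a, b); }

// Bracketed, so this works even if a max macro is defined after all.
inline int larger_of(int a, int b) { return (std::max)(a, b); }

inline void check_windows_safe_calls()
{
    assert(smaller_of(5, 6) == 5);
    assert(larger_of(5, 6) == 6);
}
```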


