Shallow copy constructors are broken

If you are a C++ programmer, please do not write or allow to be generated shallow copy constructors or assignment operators when your destructor deletes allocated data. If you are not a C++ programmer, you may safely disregard this article.

The other day at work, I was working on a program that uses the C++ plugin mechanism of another company's program. That program had a class set up like this:

class A {
  int *data;

  public:
    A() : data(new int(3)) { }
    ~A() { delete data; }
    void print() const { std::cout << *data << std::endl; }
};

(The example code along with the source for this post is available from git://git.istic.org/cplusplus.git, which will be updated with future examples. Please do clone it and follow my C++ gitcast. If you don't have git, you can read the files from my gitweb interface at http://git.istic.org/.)

That is, it contained a pointer to some heap-allocated data, this being allocated in the constructor and deallocated in the destructor. (I will use the abbreviations ctor and dtor.) But, consider the copy ctor (and the copy assignment operator). Both of these are automatically generated by the compiler if not supplied by the class author. The automatically generated variants copy all the member variables, in this case data. The compiler will generate something like this for A:

A::A(const A& right) {
  data = right.data;
}

This is really bad, as now the copy and the original point to the same bit of memory. Now, when the dtor on one of them is called, the memory will be deallocated, and the other will be left with a dangling pointer, like this:

int main() {
  A alpha;
  if (1) { // force there to be a new scope here
    A abel = alpha; // XXX

    alpha.print();
    abel.print();
  } // abel.~A() is called here
  alpha.print(); // undefined behaviour happens here

  return 0;
} // alpha.~A() is called here if you get this far, and does a double-free

Once the line marked XXX has executed, there is no safe way to proceed. You can't avoid calling the dtor, and whichever way around you do it, the second one will do a double-free. If you don't realise this has happened, and you carry on accessing one copy after the other has gone out of scope, you will get arbitrary results, and they will probably be different when run under the debugger. The results are something like this:

z1@foyle:~/pub/blog/cplusplus$ ./copyctors
3
3
0
*** glibc detected *** ./copyctors: double free or corruption (fasttop): 0x0000000000602010 ***
[ some lines of debugging information ]
Aborted

In my case, this bug in the supplied class interacted most infelicitously with a mistake I had made. I was getting a reference to an A from inside a complicated data structure, using that reference to get at its data, and then using the larger data structure. At least, that's what I was trying to do. Can you spot the deliberate mistake?

// in vendor's headers
class ProgramState {
  A a;
  ...
  public:
    const A& getA() const { return a; } // the consts, they do nothing!
};

// in my plugin
void doStuff(ProgramState *ps) {
  const A a = ps->getA();
  a.print();
}

When doStuff returns, the program is dead; it just doesn't realise it yet. I should have written const A& a = ps->getA();, but I missed the ampersand, so I got a copy rather than a reference. When my copy went out of scope, the program state became invalid and it would fall over later. It took me some hours to find this bug, not helped by it being slightly non-deterministic, behaving differently under the debugger, and the failure mode tending to leave the A object's data incorrect but plausible. Eventually I tracked it down by looking at the disassembly (on a hunch) and noticing that A's dtor was unexpectedly being called at the end of doStuff. After that, it became obvious what was happening. (Needless to say, this property was not mentioned in the class's documentation or header file.)

Oddly enough, having getA return a pointer instead of a reference would have mitigated the problem in this specific case, because assigning an A* to an A alerts the type-checker that something is up. Because there is (by design) no explicit syntax for dereferencing a reference, there is no way to distinguish this error from a genuine operation of copying the refenced object.

The vendor's original source is not as const-correct as I suggest; I merely added them to show that const-correctness doesn't protect you in this case. Destructors are still called on const objects, and modifying my copy of a doesn't modify the original structure, merely its referents. Still, a well-designed library would not have allowed me to break it in this way; in fact, there is no safe way to copy instances of A in the library as it stands.

How could it have been robust? The obvious way is to provide a copy ctor and assignment operator that do a deep copy. Writing such methods is slightly non-trivial, though, as there are subtleties to take care of: self-assignment and exception-safety. And, of course, if A's field is a large one, it will be inefficient and should be discouraged.

In that case, copying should be disallowed for that class. This can be done trivially by deriving (inheriting) from boost::noncopyable, documentation for which exists at http://www.boost.org/libs/utility/utility.htm#Class_noncopyable. This class declares a private copy ctor and assignment operator, causing the compiler to reject copy attempts. I often use noncopyable, often for cases where I just haven't got around to writing a copy ctor and assignment operator yet, and want to make sure I don't accidentally call an incorrect compiler-generated one.

Using boost::scoped_ptr in this case has the same effect of disallowing copying. As you can see from reading http://www.boost.org/libs/smart_ptr/scoped_ptr.htm, a scoped_ptr is just a pointer with a dtor that deletes the referent. (There is a variant, scoped_array, that is the same but delete[]s the referent, but you shouldn't be using C-style arrays anyway without a really good excuse.) It takes no extra space compared to a normal pointer. The delete-on-destruct feature often removes the need to write your own destructor and stops you copying the pointer. A with a scoped_ptr looks like this, and doesn't exhibit the error.

class A {
  scoped_ptr<int> data;
  public:
    A() : data(new int(3)) { }
    // note lack of dtor
    void print() const { std::cout << *data << std::endl; }
};

With this modified version, the same test program fails at compile-time:

z1@foyle:~/pub/blog/cplusplus$ make copyctors-scoped
g++     copyctors-scoped.cpp   -o copyctors-scoped
/usr/include/boost/scoped_ptr.hpp: In copy constructor ‘A::A(const A&)’:
/usr/include/boost/scoped_ptr.hpp:45: error: ‘boost::scoped_ptr<T>::scoped_ptr(const boost::scoped_ptr<T>&) [with T = int]’ is private
copyctors2.cpp:4: error: within this context
copyctors2.cpp: In function ‘int main()’:
copyctors2.cpp:15: note: synthesized method ‘A::A(const A&)’ first required here 
make: *** [copyctors2] Error 1

(No, I'm not holding a secret Makefile not in the repo; I'm building using GNU make's implicit rules.)

The error may not be very readable — C++ errors rarely are — but at least it tells you where the pointer is declared, and where the offending copy operation is. To anyone who has used a scoped_ptr, that makes the error obvious.

You don't have to choose between a slow and memory-hungry deep copy and a noncopyable class, though. If you really do want to do a shallow copy, you can easily do it safely by using reference-counting. boost::shared_ptr does just that. Again, it means you don't have to worry about deleting the pointer (or, in particular, about deleting it too many or too few times).

class A {
  boost::shared_ptr<int> data;
  public:
    A() : data(new int(3)) { }
    // note lack of dtor
    void print() const { std::cout << *data << std::endl; }
};

This time, the test program compiles and runs without error. I assume you understand what reference-counting is, so I'll skip that lecture, but the copyctors-scoped example in the gitcast shows that now modifying one of the data fields modifies both, as they are both pointing to the same int. If you really need to be able to modify the two fields independently, you'll have to bite the bullet and do a deep copy. If you're worried about people triggering this expensive operation by accident, you could always make your class noncopyable and write a static deepCopy method, or even make your copy ctor do a shallow copy and zero all the pointers, but whatever you do, don't do what the author of this library did and write (or allow to be generated) a broken copy ctor.

Notice how, in both examples using smart pointers, the class definition is simpler and shorter than the original, and less bug-prone. Smart pointers allow you to think once about what it means to copy your object and who is responsible for it, and then forget. You don't have to keep chasing round ownership flags like you did in C. You don't have to worry about large structures being left in memory or holding locks, or inappropriate referent sharing, like you do in environments with garbage collection as the only mechanism for lifetime control and inflexible reference semantics. You don't even have to remember to delete your heap objects. And future maintainers of your code will be able to tell what your intention was and how they can avoid introducing bugs, without having to draw a diagram. Using smart pointers is so easy, and makes your life (and the lives of people who have to touch your code) so much easier, that there's no excuse for writing classes you have to pussyfoot around to avoid crashing your program.


Last modified: Sat Mar 8 09:53:54 2008