Shallow copy constructors are broken
If you are a C++ programmer, please do not write or allow to be generated shallow copy constructors or assignment operators when your destructor deletes allocated data. If you are not a C++ programmer, you may safely disregard this article.
The other day at work, I was working on a program that uses the C++ plugin mechanism of another company's program. That program had a class set up like this:
class A {
int *data;
public:
A() : data(new int(3)) { }
~A() { delete data; }
void print() const { std::cout << *data << std::endl; }
};
(The example code along with the source for this post is available from git://git.istic.org/cplusplus.git, which will be updated with future examples. Please do clone it and follow my C++ gitcast. If you don't have git, you can read the files from my gitweb interface at http://git.istic.org/.)
That is, it contained a pointer to some heap-allocated data, this being
allocated in the constructor and deallocated in the destructor. (I will use the
abbreviations ctor and dtor.) But, consider the copy ctor (and the copy
assignment operator). Both of these are automatically generated by the compiler
if not supplied by the class author. The automatically generated variants copy
all the member variables, in this case data. The compiler will generate
something like this for A:
A::A(const A& right) {
data = right.data;
}
This is really bad, as now the copy and the original point to the same bit of memory. Now, when the dtor on one of them is called, the memory will be deallocated, and the other will be left with a dangling pointer, like this:
int main() {
A alpha;
if (1) { // force there to be a new scope here
A abel = alpha; // XXX
alpha.print();
abel.print();
} // abel.~A() is called here
alpha.print(); // undefined behaviour happens here
return 0;
} // alpha.~A() is called here if you get this far, and does a double-free
Once the line marked XXX has executed, there is no safe way to proceed. You can't avoid calling the dtor, and whichever way around you do it, the second one will do a double-free. If you don't realise this has happened, and you carry on accessing one copy after the other has gone out of scope, you will get arbitrary results, and they will probably be different when run under the debugger. The results are something like this:
z1@foyle:~/pub/blog/cplusplus$ ./copyctors 3 3 0 *** glibc detected *** ./copyctors: double free or corruption (fasttop): 0x0000000000602010 *** [ some lines of debugging information ] Aborted
In my case, this bug in the supplied class interacted most infelicitously with a mistake I had made. I was getting a reference to an A from inside a complicated data structure, using that reference to get at its data, and then using the larger data structure. At least, that's what I was trying to do. Can you spot the deliberate mistake?
// in vendor's headers
class ProgramState {
A a;
...
public:
const A& getA() const { return a; } // the consts, they do nothing!
};
// in my plugin
void doStuff(ProgramState *ps) {
const A a = ps->getA();
a.print();
}
When doStuff returns, the program is dead; it just doesn't realise it yet. I
should have written const A& a = ps->getA();, but I missed the ampersand,
so I got a copy rather than a reference. When my copy went out of scope, the
program state became invalid and it would fall over later. It took me some hours
to find this bug, not helped by it being slightly non-deterministic, behaving
differently under the debugger, and the failure mode tending to leave the A
object's data incorrect but plausible. Eventually I tracked it down by looking
at the disassembly (on a hunch) and noticing that A's dtor was unexpectedly
being called at the end of doStuff. After that, it became obvious what was
happening. (Needless to say, this property was not mentioned in the class's
documentation or header file.)
Oddly enough, having getA return a pointer instead of a reference would have
mitigated the problem in this specific case, because assigning an A* to an
A alerts the type-checker that something is up. Because there is (by design)
no explicit syntax for dereferencing a reference, there is no way to distinguish
this error from a genuine operation of copying the refenced object.
The vendor's original source is not as const-correct as I suggest; I merely
added them to show that const-correctness doesn't protect you in this case.
Destructors are still called on const objects, and modifying my copy of a
doesn't modify the original structure, merely its referents. Still, a
well-designed library would not have allowed me to break it in this way; in
fact, there is no safe way to copy instances of A in the library as it
stands.
How could it have been robust? The obvious way is to provide a copy ctor and
assignment operator that do a deep copy. Writing such methods is slightly
non-trivial, though, as there are subtleties to take care of: self-assignment
and exception-safety. And, of course, if A's field is a large one, it will be
inefficient and should be discouraged.
In that case, copying should be disallowed for that class. This can be done
trivially by deriving (inheriting) from boost::noncopyable, documentation for
which exists at
http://www.boost.org/libs/utility/utility.htm#Class_noncopyable. This class
declares a private copy ctor and assignment operator, causing the compiler to
reject copy attempts. I often use noncopyable, often for cases where I just
haven't got around to writing a copy ctor and assignment operator yet, and want
to make sure I don't accidentally call an incorrect compiler-generated one.
Using boost::scoped_ptr in this case has the same effect of disallowing
copying. As you can see from reading
http://www.boost.org/libs/smart_ptr/scoped_ptr.htm, a scoped_ptr is just a
pointer with a dtor that deletes the referent. (There is a variant,
scoped_array, that is the same but delete[]s the referent, but you
shouldn't be using C-style arrays anyway without a really good excuse.) It takes
no extra space compared to a normal pointer. The delete-on-destruct feature
often removes the need to write your own destructor and stops you copying the
pointer. A with a scoped_ptr looks like this, and doesn't exhibit the
error.
class A {
scoped_ptr<int> data;
public:
A() : data(new int(3)) { }
// note lack of dtor
void print() const { std::cout << *data << std::endl; }
};
With this modified version, the same test program fails at compile-time:
z1@foyle:~/pub/blog/cplusplus$ make copyctors-scoped g++ copyctors-scoped.cpp -o copyctors-scoped /usr/include/boost/scoped_ptr.hpp: In copy constructor ‘A::A(const A&)’: /usr/include/boost/scoped_ptr.hpp:45: error: ‘boost::scoped_ptr<T>::scoped_ptr(const boost::scoped_ptr<T>&) [with T = int]’ is private copyctors2.cpp:4: error: within this context copyctors2.cpp: In function ‘int main()’: copyctors2.cpp:15: note: synthesized method ‘A::A(const A&)’ first required here make: *** [copyctors2] Error 1
(No, I'm not holding a secret Makefile not in the repo; I'm building using GNU make's implicit rules.)
The error may not be very readable — C++ errors rarely are — but at least it
tells you where the pointer is declared, and where the offending copy operation
is. To anyone who has used a scoped_ptr, that makes the error obvious.
You don't have to choose between a slow and memory-hungry deep copy and a
noncopyable class, though. If you really do want to do a shallow copy, you can
easily do it safely by using reference-counting. boost::shared_ptr does just
that. Again, it means you don't have to worry about deleting the pointer (or, in
particular, about deleting it too many or too few times).
class A {
boost::shared_ptr<int> data;
public:
A() : data(new int(3)) { }
// note lack of dtor
void print() const { std::cout << *data << std::endl; }
};
This time, the test program compiles and runs without error. I assume you
understand what reference-counting is, so I'll skip that lecture, but the
copyctors-scoped example in the gitcast shows that now modifying one of the
data fields modifies both, as they are both pointing to the same int. If you
really need to be able to modify the two fields independently, you'll have to
bite the bullet and do a deep copy. If you're worried about people triggering
this expensive operation by accident, you could always make your class
noncopyable and write a static deepCopy method, or even make your copy ctor
do a shallow copy and zero all the pointers, but whatever you do, don't do what
the author of this library did and write (or allow to be generated) a broken
copy ctor.
Notice how, in both examples using smart pointers, the class definition is simpler and shorter than the original, and less bug-prone. Smart pointers allow you to think once about what it means to copy your object and who is responsible for it, and then forget. You don't have to keep chasing round ownership flags like you did in C. You don't have to worry about large structures being left in memory or holding locks, or inappropriate referent sharing, like you do in environments with garbage collection as the only mechanism for lifetime control and inflexible reference semantics. You don't even have to remember to delete your heap objects. And future maintainers of your code will be able to tell what your intention was and how they can avoid introducing bugs, without having to draw a diagram. Using smart pointers is so easy, and makes your life (and the lives of people who have to touch your code) so much easier, that there's no excuse for writing classes you have to pussyfoot around to avoid crashing your program.
Last modified: Sat Mar 8 09:53:54 2008
It's so hard to see the Sun with the truth in your eyes.
Comments on Shallow copy constructors are broken | no comments | Post a comment