Initialisation and Destruction

This is a follow-up to my previous article, Five Things You Didn't Know About C++, a collection of obscure, surprising and (possibly) useful C++ features. This article is more of the same, but focuses on aspects of C++ that deal with initialisation, constructors, destructors and memory management.

B b = a vs B b(a)

Assuming that B is a class and a is a variable (not necessarily of type B), what is the difference between these declarations?

    B b = a;
    B b(a);

Well, the first looks like an assignment, but it is actually an initialisation, so if you thought the first involved the assignment operator you would be mistaken. Some textbooks may tell you that the two are equivalent, and this is true for non-class types and where a is of type B (or a class derived from B). In the latter case, both declarations initialise b using the copy constructor.

If a is not of type B or a derived class, then things get a little more interesting. The second declaration is a normal constructor call (assuming that there is a matching constructor; other things happen if not). In the first declaration, however, the constructor is used to create a temporary of type B, which in turn is used to initialise b through the copy constructor.

Now before you test this by writing a program that uses cout in its copy constructor, keep in mind that the C++ standard allows the compiler to eliminate this kind of redundant copy (and GCC does so in this case), so in fact the two versions may end up executing identically. You can still detect that this is really going on, however, by changing the access control on the copy constructor (which the compiler is required to check even when it eliminates the copy). Define a private copy constructor for B, and the first declaration will fail while the second will work. This holds for C++ before C++17; from C++17 onwards the temporary is never materialised in this case, so the copy constructor is not needed and the first declaration compiles too. This is potentially useful to know because copy constructors can sometimes have unwanted consequences.
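The two declaration forms can be sketched as follows. This is only an illustration: the struct B, its copies counter and the two helper functions are hypothetical names invented here to make the copy constructor observable.

```cpp
#include <cassert>

// Hypothetical class B that counts how often its copy constructor runs.
struct B {
    static int copies;
    int value;
    B(int v) : value(v) {}                                  // converting constructor
    B(const B& other) : value(other.value) { ++copies; }    // copy constructor
};
int B::copies = 0;

// Case 1: the initialiser is already a B, so both forms use the copy constructor.
int copies_from_same_type() {
    B::copies = 0;
    B a(1);
    B b1 = a;   // copy-initialisation: calls the copy constructor
    B b2(a);    // direct-initialisation: calls the copy constructor
    return B::copies + (b1.value - b2.value);   // the values are equal, so this is the copy count
}

// Case 2: the initialiser is an int.
int value_from_int() {
    int a = 42;
    B b1 = a;   // conceptually B(a) makes a temporary that is copied into b1;
                // compilers normally elide that copy
    B b2(a);    // calls B(int) directly
    return b1.value + b2.value;
}
```

In the first case the source is an existing object, so neither copy can be elided and the counter reliably reaches two. In the second case the counter is deliberately not inspected, because whether the extra copy happens depends on the compiler and language standard.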
For example, if a class holds a pointer to data and frees that pointer on destruction, then the implicitly generated member-wise copy is a dangerous thing to have around, as it will cause the pointer to be freed twice. A more serious case is an object that encapsulates a lock or other non-clonable resource. In these cases it is good programming practice to declare the copy constructor and assignment operator as private, so that copies cannot be made accidentally. In the event that you have such a class which needs to be initialised from some other object, you will need to use the second form of initialisation (at least before C++17).

A consequence of the implicit conversion of a is that only converting constructors qualify for the conversion, that is, those without the explicit keyword. For example, if you write vector<int> p = 3 rather than vector<int> p(3), it will not work, because the constructor that takes an integer is marked explicit.

Initialisation of variables

It is common wisdom that variables in C and C++ are not initialised to any particular value. In fact, this is not true for variables of “static storage duration” (basically, global variables), which are initialised in three stages. First, they are all zero-initialised (meaning that all the fundamental values are filled with zeros). The second stage is static initialisation, where specified constant values are filled in. The third stage is dynamic initialisation, which means calling constructors on objects.

For local variables the situation is a little different. To understand the rules it is necessary to understand the concept of POD (Plain Old Data) types. The definition is quite complex, but essentially types that would be valid in C are POD, while full-blown classes with destructors and the like are not. Non-POD classes will always be initialised by a constructor, possibly an implicit one. This isn't necessarily a good thing, because POD members within these classes will not be implicitly initialised.
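A small sketch of these rules, under the assumption that a std::string member is enough to make a class non-POD. The names global_counter, global_pointer, global_table, Record and SafeRecord are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Static storage duration: zero-initialised before any other initialisation.
int global_counter;        // starts at 0
int* global_pointer;       // starts as a null pointer
double global_table[4];    // every element starts at 0.0

// Non-POD class relying on the implicit constructor.
struct Record {
    std::string name;   // initialised by std::string's default constructor
    int count;          // left indeterminate in a local Record: do not read it
};

// Same layout, but the constructor initialises the POD member explicitly.
struct SafeRecord {
    std::string name;
    int count;
    SafeRecord() : name(), count(0) {}
};

bool statics_are_zero() {
    return global_counter == 0 && global_pointer == 0 && global_table[3] == 0.0;
}

bool record_name_is_empty() {
    Record r;                 // r.name is constructed, r.count is not initialised
    return r.name.empty();    // only the constructed member is inspected
}

int safe_record_count() {
    SafeRecord r;
    return r.count;           // well defined because the constructor set it
}
```

The sketch never reads Record::count, since reading an indeterminate value is undefined behaviour; the safe variant shows the usual remedy of initialising every member in the constructor.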
POD local variables are not initialised by default, but if an incomplete initialiser is given, then the remaining elements are set to zero. For example, in the code below, all the local variables will be completely zeroed out:

    struct T { bool b; int *p; };
    int int_array[15] = {0};
    T structure = { false };
    T structure_array[10] = {{false, NULL}, {false}};

The situation is even more complicated than I've presented it, but this should be enough to let you feel confident in not initialising some things and only partially initialising others, while at the same time realising that it isn't obvious and playing it safe when in doubt.

Internals of new

The new keyword is the standard way to allocate memory in C++, as in new int or new Object(123). You probably think of it as just a memory allocator that also handles object construction, but in fact there is a lot of machinery under the hood. The object construction is mostly straightforward, so I'm going to leave that and look at the internals of the memory allocation.

So when you use new, how does the compiler allocate the memory? The first layer of abstraction is that it looks for a function called operator new, first in the scope of Object, then at global scope, passes it the number of bytes that should be allocated, and expects to get a void * back. The user can thus either override the global allocator, or provide a per-class allocator (e.g., allocating aligned memory for a class that contains data for SSE).

Not surprisingly, the standard library is required to provide an implementation of operator new. But even here, it is more than just a wrapper around malloc. It is required to attempt an allocation (the method is not specified, although GCC uses malloc). If that fails, and the user has provided a “new handler”, it should call that before trying again. If no handler is installed, it throws bad_alloc. Presumably, the handler could attempt to free up some caches or other memory to make room for a new allocation.
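The lookup and retry behaviour described above can be sketched with a per-class allocator. This is an illustration of the described loop, not the source of any real standard library; the class Tracked and its live_allocations counter are hypothetical, and std::get_new_handler assumes C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class with its own operator new and operator delete.
struct Tracked {
    static int live_allocations;
    int payload;
    explicit Tracked(int p) : payload(p) {}

    // Found in class scope before the global operator new is considered.
    static void* operator new(std::size_t size) {
        for (;;) {
            void* p = std::malloc(size);
            if (p) {
                ++live_allocations;
                return p;
            }
            // Allocation failed: let the installed handler try to free memory,
            // then loop round and attempt the allocation again.
            std::new_handler handler = std::get_new_handler();
            if (!handler) throw std::bad_alloc();
            handler();
        }
    }

    static void operator delete(void* p) {
        if (p) --live_allocations;
        std::free(p);
    }
};
int Tracked::live_allocations = 0;

bool allocate_and_release() {
    Tracked* t = new Tracked(5);   // calls Tracked::operator new(sizeof(Tracked))
    bool counted = (Tracked::live_allocations == 1) && (t->payload == 5);
    delete t;                      // destructor, then Tracked::operator delete
    return counted && Tracked::live_allocations == 0;
}
```

Because the operators are class members, only allocations of Tracked go through them; every other new expression in the program still uses the global allocator.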
So far, I've referred to operator new as a single function, but in fact there can be multiple versions with different signatures. These are selected by passing arguments to the new keyword, for example new(nothrow) Object or new(ptr) Object. The arguments are passed to operator new immediately after the size.

The standard library provides a few convenient implementations. The first is “nothrow new”, where the magic argument nothrow is passed. This indicates that if allocation fails, a NULL pointer should be returned rather than a bad_alloc exception being thrown. The second is “placement new”, where a pointer is provided by the user and is used instead of any allocation. Placement new is used to separate memory allocation from object construction. This is heavily used in STL implementations, where for example a vector may contain spare memory, and calling resize may result in objects being constructed in the existing memory.

I have glossed over the handling of array allocation; it's mostly the same but uses operator new[] instead of operator new.

Internals of delete

The first and probably most practical thing I discovered about delete is that it is safe to use it on a NULL pointer; this is specified to have no effect. This means that if you might or might not be holding a pointer to an object, you're free (excuse the pun) to delete it without first having to check whether the pointer is NULL. C programmers may be happy to know that free and realloc have a similar property: free(NULL) is a no-op, while realloc(NULL, size) is equivalent to malloc(size).

As you may expect by now, a delete expression calls the destructor for the object and then calls operator delete to deallocate the memory. The operation is somewhat symmetric to new, except that you can't pass parameters to the delete expression, so there is only one signature for operator delete. Or is there? In fact, there is one case where an overloaded version may be used.
If the constructor throws an exception during a new expression, the memory is released by calling an operator delete that takes the same extra parameters that were given to new. This is probably not very useful information for most people, but it is at least comforting to know that you can throw exceptions from a constructor and it will not cause a memory leak or otherwise wreak havoc.

Destructor and pseudo-destructor calls

Earlier on we saw how placement new can be used to construct an object in previously allocated memory. What about the reverse, destroying an object without reclaiming the memory? The syntax for this turns out to be a lot more obvious: you just call the destructor as if it were a method. For example, if a is of type A, then a.~A() destroys the object.

The part that makes this interesting is that it works (as a no-op) on scalar types as well, in which case it is known as a pseudo-destructor. It doesn't work if you pass the actual name of the type (e.g., int), but does work if it is aliased by a typedef or template parameter. This allows the STL memory management to call the destructor when you call pop_back on a vector, without concern for whether a destructor really exists.
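As a minimal sketch of construction and destruction without allocation, assuming C++11 for alignas: the type Widget, its alive counter and the helper destroy_in_place are hypothetical names used only here.

```cpp
#include <cassert>
#include <new>

// Hypothetical type that counts how many instances are currently alive.
struct Widget {
    static int alive;
    int id;
    explicit Widget(int i) : id(i) { ++alive; }
    ~Widget() { --alive; }
};
int Widget::alive = 0;

// Destroys the object at p without releasing its memory.
// T may be a class type or a scalar type.
template <typename T>
void destroy_in_place(T* p) {
    p->~T();   // real destructor for Widget, pseudo-destructor for int
}

bool placement_round_trip() {
    alignas(Widget) unsigned char storage[sizeof(Widget)];   // raw memory only
    Widget* w = new (storage) Widget(7);    // placement new: construct, no allocation
    bool constructed = (Widget::alive == 1) && (w->id == 7);
    destroy_in_place(w);                    // destructor runs, storage is untouched
    return constructed && Widget::alive == 0;
}

int pseudo_destructor_on_int() {
    int n = 5;
    int before = n;
    destroy_in_place(&n);   // compiles for a scalar and generates no code
    return before;
}
```

Writing p->~int() directly would be rejected, but inside the template the type is named through the parameter T, which is what lets generic containers destroy elements of any type with the same code.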
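Returning to the case of a constructor that throws inside a new expression, the matching of operator delete to operator new can be sketched like this. The types Pool and Fragile are hypothetical, and the Pool argument is simply an invented extra parameter used to make the calls visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical bookkeeping object passed as the extra argument to new.
struct Pool {
    int allocated;
    int released;
    Pool() : allocated(0), released(0) {}
};

struct Fragile {
    explicit Fragile(bool fail) {
        if (fail) throw std::runtime_error("constructor failed");
    }

    // Overload selected by new (pool) Fragile(...): the Pool& follows the size.
    static void* operator new(std::size_t size, Pool& pool) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        ++pool.allocated;
        return p;
    }

    // Matching overload: called automatically only if the constructor throws.
    static void operator delete(void* p, Pool& pool) {
        ++pool.released;
        std::free(p);
    }

    // Usual form, used by ordinary delete expressions.
    static void operator delete(void* p) { std::free(p); }
};

bool released_when_constructor_throws() {
    Pool pool;
    try {
        Fragile* f = new (pool) Fragile(true);   // constructor throws
        delete f;                                // never reached
    } catch (const std::runtime_error&) {
        // operator delete(void*, Pool&) has already run by this point
    }
    return pool.allocated == 1 && pool.released == 1;
}

bool usual_delete_when_constructor_succeeds() {
    Pool pool;
    Fragile* f = new (pool) Fragile(false);
    delete f;                                    // uses operator delete(void*)
    return pool.allocated == 1 && pool.released == 0;
}
```

If the two-argument operator delete were missing, the memory from a failed construction would simply not be released, which is why overloads of operator new are normally written in matching pairs.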