Tips and Tricks of the C++ Professionals

Table of Contents

This book is for C++ programmers who want to reduce their development time, ease their portability problems, and reduce the number of bugs and memory leaks.

A different name for this book was considered:

Tips and Tricks of the C++ Masters

but this led to the connotation that this was a book of C++ peculiarities and wizards tricks. While there might be one or two of those to be found -- most of them gleened from looking at the source code for early STL versions -- that isn't the focus of the book.

Instead, this book is for the professional developer of large scale C++ programs. Its about standard practices, regularity, simplicity, and correctness.

Experience can be a difficult task master. Hopefully this book can help the reader avoid a few painful lashes.

This book is not a tutorial of the C++ language. Rather, it addresses approaches to successfully using existing C++ compilers to develop real world programs in a professional manner -- and, hopefully, as quickly and as painlessly as possible. The following chapters are organized such that they can be easily used as a reference, but also contain sufficient explanatory material to justify the advice found in them. Each chapter begins with a very quick summary of what is said in the chapter -- usually as an indented list of suggestions. The remainder of each chapter expands and justifies each.

The remainder of the book is arranged according to this broad outline:

Become a C++ expert

A discussion of some of the important skills professional programmers need to develop.

Adopt good habits

Specific C++ implementation habits that will simply development when working in large teams.

Speed: writing fast programs

Tips for improving performance.

Designing Classes

Tips for class design that improve maintainability.

Tricks of the template masters

Using templates to simplify code maintenance.

Write no code before its time

This chapter addresses the issues to be considered before coding actually starts -- such as

what is supposed to do?
what platforms is the code supposed to run on
how are errors handled

Compilers, librarians, and linkers

A discussion of the wierdnesses to experienced using the primary tools of software development. Particularly, unexpected and annoying behaviours of the real world programs implementing the ivory tower descriptions of these tools.

Here are some things all professional C++ programs need to learn or do:

Understand the language!
Practice, practice, practice!
Read good books on programming
- Bjarne Stroustrup's: The C++ Programming Language, 3rd edition.
- Scott Myers: Effective C++
- John Lakos: Large Scale C++ Development
Learn to use your tools
- Learn your editor -- you'll use it a LOT
- Use a good CM system and be expert
- Learn purify, quantify, purecov
- Learn a good text processing script language
Learn the STL and follow its design patterns

C++ can be used to implement object oriented programs, but it is not object oriented in itself: class objects are proper 'objects' only if designed to be so.

Objected oriented thinking can be extremely helpful in system design and analsysis and should be well understood by professional programmers -- particularly for application level class designs. However, focussing only on object oriented methodology can lead to a failure to use some of the more helpful features of C++ -- particularly templates and meta-object construction.

That is, C++ provides OO tools for high level program artifacts, but it also provides its own particular specialties to assist in the day to day mechanics of development quite apart from OO methodology. Like many complex things, there are several layers in which C++ can be understood. The following paragraphs discuss these ways. You really only need to understand C to use this level. This level of understanding gives you better compile errors, but will not allow you to take advantages of the C++ features that make it worthwhile to use.

Features will begin to notice at this level include:

C++ complains more about questionable practices, so it helps you avoid problems.

At this level, you need to start learning to do without macros. Templates, inline functions, and reference variables (used as function arguments) almost completely eliminate the need for C macros. Note that const variable declarations and enum's remove the need for #defining constants. For example:

enum constants_list { constant_name_1=constant_value, constant_name_2=constant_value_2 };

Note that #define'd constants are really only needed for integers and single characters -- because constants are required for switch statements and array sizes. Neither strings nor floats can be used as switch statement case selectors nor array sizes.

All other uses of #defined constants can be just as effectively handled with global variables instead of defined constants. C++ class and struct design allows for related functions to be packaged together with an enforced naming convention. That is, the names of class methods all begin with the class' name: SomeClass::some_member().

Further, the member access declarators let the class designer prohibit access to methods not part of documented interfaces.

Further, constructors and destructors help guarantee that package specific design assumptions are enforced. That is, when a class is designed, object member variable values are assumed. In plain old C, without constructors and with no power to prevent the calling of the wrong functions, these guarantees are impossible to enforce.

The first step in true object oriented programming is understand encapsulation of data and methods in classes. The next is to learn to design a class' methods so that they guarantee the internal state of the data in the class objects at all times and then to prevent access by outside functions which might not obey these state guarantees.

The C++ language does not force object oriented programming on the developer -- it is up to the developer rigorously attend to this paradigm. Collections of class objects can be implemented in numerous ways. Some standard container paradigms are: vectors, stacks, lists, associative arrays, etc. Templates let you instantiate behaviorally consistent but class specific collections of objects in an efficient manner. There is an ANSI standard set of container classes, the STL (Standard Template Library). Each compiler also provides its own approximation to the STL. These are typically not 100% compatible. You can make your own container templates and port them but doing so is by no means a trivial task. It requires that you understand how new and delete really work. It requires that you understand the assignment and copy constructor logic, etc. You really need a good understanding of pointers, references, inline functions, and operator overloading to build an efficient container class. Here, you build data structures that emulate the atomic objects. For example, you might want to implement a representation of the time that can be treated as an integer in that you can assign an integer to the time and vice versa. You would want to be able to use the time on the right hand side of operator= to perform calculations involving integers and time values. Further, you probably want a character string representation, eg "12/28/57-11:03pm". So, you'd want to be able to write expressions like this:

newTime = oldTime + 3 /*seconds*/ - "4 hours"; Doing this kind of thing requires that you understand all the operators functions and compiler temporaries. You need to worry about the efficiency issues related to function returns, assignment operations, compiler temporary generation. You also need to understand operator overloading -- both at the global level and between cooperating classes. A common approach to speeding up the copying of objects is to use a copy on write approach where the obvious objects are really pointers to hidden objects and whose member functions are pass throughs to that object. When you modify the obvious object, a new hidden object is created with the modified attributes. In this way copying is fast because you are just copying a pointer and incrementing a reference count. Modifications become slower because you have to allocate another heap packet to perform the modification. Meta objects are objects that describe objects. Certain kinds of objects which emulate atomic objects, cannot be implemented efficiently if you do so directly. Two examples:

String concatenation expressions:

  A = B + C + D + E;
If this were implemented directly, you'd get

  concatenate B with C and store the result in T1   concatenate T1 with D and store the result in T2   concatenate T2 with E and store the result in T3   store T3 in A
A more efficient approach is to define the string::operator+ to return not a string but a description of the string. That is, instead of T1, T2, and T3 being strings with the concatenated values in them, they are actually data structures that describe the components that make them up:

   T3 contains a pointer to T2 and to E    T2 contains a pointer to T1 and to D    T1 contains a pointer to B and to C
Then make the string::operator= know how to convert a string description into a string and store it in A efficiently. It must also free up any memory used to hold the description.
Matrix arithmetic operations are similar. Instead of a having

A = B * C * D * E;
evaluated using 3 intermediate steps, it is much faster to have a matrix multiply by a list of matrices function and have the matrix:operator* produce the list. The matrix::operator= then knows how to invoke the multiply list function and save the results.

With a little thought, numerous slow operations can be turned into lazy evaluation scenarious like the above. Understanding how to do this though can be very complex because the compiler creates temporaries, destroys them, copies them, constructs new ones from old ones at all kinds of odd places. Developing a meta-object representation and ensuring that the memory allocated to it is freed up when the operator= function is done with it can be very complex. A great deal of experimentation and function call logging is needed to really understand what's going on.

Practice is the best teacher

There is no substitute for practice when it comes to software development. The more often you deal with your editor, the compiler, its error messages, and porting, the easier it will be to do these things. But don't play in the production code base. Do these things in your own personal directory. If an idea pops into your head, create a small C++ source file to check it out. Then port it to all platforms available to you. A good configuration management system is essential to the success of any production software product. For individual developers, the most important benefit it provides is the ability to get back the code you used to have after you have screwed it up. This ability, to run back time, as it were, allows you to edit your code without fear. Many developers are rightly afraid that they will mess up the product and spend a lot of time and effort getting it fixed. A good CM system will help you overcome the irrational aspects of this fear -- you can always find out from the CM system who did what -- and how to undo it.

A good CM system is not necessarily cheap, nor is it necessarily easy to use. Just because you can get one, like CVS, for free, does not necessarily mean that it is a good system. Whatever system you use, be an expert with it. If you don't become an expert, you are like to be more afraid of it than you are of breaking in the code base. Learn the CM system and become and expert.

In particular, you should learn how to make branches and apply labels for your own private use. Once you have learned to branch the code base, you can make any change you want with impunity. If you completely screw it up, just delete the branch -- or simply ignore it and go back to the 'standard' branch. Clearcase, for example, makes this very simple. It is a bit more complex with CVS. Because people are habitual in nature, we sometimes do the same wrong thing over and over again in dozens of files. Perhaps your organization established standards and practices that later turned out to be a bad idea. The ubiquitousness of a mistake should not be an impediment to fixing it.

Don't be afraid to edit your code using some text processing script language. The standard unix tool, sed, is relatively easy to use for simple textual substitutions using regular expressions. For more sophisticated changes -- or substitutions that span multiple lines of text -- you'll have to use perl.

If you have a good CM system, you should not be afraid to sweep through every file, check it out and change it with a script. If your script messes up a file, just get back the prior version and make the needed changes by hand. Even if you don't have a good CM system, you can just copy the files to another directory and do your experiments there.

Often, you can only fix some significant fraction of the code using a script -- perfecting said script to the point where it fixes all the cases might well take longer than fixing the remainders by hand. Thus in many cases, when a script is used there will still be some hand work to be done.

"Algorithms and iterators work so well together because they nothing about each other". -- Andrew Koenig.

The standard template library was design by Mr. Andrew Koenig. If ever there was a 'gods gift to programming' -- he's it. The rest of us can relax -- the job is already filled.

(Of course, Bjarne Stroustrup is the giver of the gift!)

In addition to being 'standard', the STL is a great source of programming paradigms and implementation paradigms. Chief among them are 'algorithsm', 'containers', and 'iterators'.

When you implement your own algorithms, containers, and iterators, make sure you follow the paradigm set by the STL classes completely -- or as closely as possible. If you don't, your users (and later yourself) will be totally confused as to why they can't write working code using your interfaces.

Iterators are class objects that more or less emulate pointers. All true STL style containers provide several types of iterator classes nested within their definition:

iterator: The normal iterator for the container.
const_iterator: The iterator that only points to constant data
reverse_iterator: An iterator which iterates 'backwards'.
const_reverse_iterator: The const version of the reverse_iterator

When writing your own containers, don't forget to make it 'const correct'. This is very tedius to fix later. Worse, it forces your container's users to break const correctness in their own code in order to use your code. Bad habit.

When creating custom iterators, it is best to fully implement one of the iterator categories below -- and not to try to add features to the iterator that are not appropriate for it. For example, a linked list iterator could be enhanced to support subtraction -- but the result would be horribly slow. Rather, stick one of the appropriate categories below:

output iterators

Output iterators only really support dereferencing and assignment:

operator*
operator++

A good example is the std::ostream_iterator. In that case, increment operator causes the data to be physically written.

Note that output iterators cannot (conceptually at least) be copied -- although they can be returned from functions. You can't save them in temporary variables and use the saved value later.

input iterators

Similar to output iterators, this form also only performs operator* and operator++. The same restrictions about saving input iterators applies here.

A good example, is the std::istream_iterator.

forward iterators

Forward iterators have all the features of input and output iterators, but they do allow you to save them and use the old value.

bidirectional iterators

A bi-directional iterator is the same as a forward iterator but it also supports operator--

random access iterators

Random access iterators support all the features of all other iterators and also add

the array index operator, operator [] ().
iterator subtraction
adding/subtracting ints to iterators

Further, the assumption is that all these activities are O(1).

Note that the above iterator categories are not classes from which iterators are inherited. They are merely a notational and documentative convenience used in the STL to convey the requirements standard algorithms place on the iterators. For example, you might see an algorithm designed like this: template<class ForwardIterator> void write_one_o (ForwardIterator t) { *t++ = '0'; }

When writing iterators of your own, make sure that maintain const correctness. This usually means that you must have both an iterator class and a const_iterator. The reverse iterators rarely needed.

Make sure that all your iterator member functions are as fast as possible and as small as make sense. Do not, for example, design a forward iterator which supports the array index operators. This gives the user the false impression that the function will be fast -- which it won't.

STL algorithms are very numerous. This has the advantage that you can probably find one that does exactly what you want, but it also means that you need a good reference manual to understand your choices. The STL source code is a bit difficult to understand because of function name overloading and concepts like parameter type promotion. Further, different compilers will promote


       const

parameter types in slightly different ways.

When writing your own template algorithms, try to do the following:

Write your new algorithms according to standard paradigms -- or as Stroustrup suggests: "prefer the standard to the offbeat" .
Understand the iterator classes and make sure that you use the least functional iterator that does the work. For example:
- don't use the iterator's operator + () unless you really have to. If you are going to only add one or two, just use the operator ++ (). Ditto for the subtraction operator.
- don't use the iterator's operator -- () unless you have to.
If you feel that you need to implement your algorithm to get the maximum efficiency out of random access iterators -- which do support pointer arithmetic, consider writing two algorithms -- one for random access iterators and another for all others. Theoretically, you might need 4 versions -- one for random access iterators, one for bidirectional iterators, and one for forward iterators (this will handle the input and output iterator categories as well -- usually).
When you decide to provide multiple instances of your algorithm, use 'routing' template function to select among the alternates. See the distance overloaded function example below.

The auto_ptr is a really nice way to tell the compiler, and code maintainers, that a given object is a pointer and more importantly that the memory to which it points is owned by the auto_ptr object. There are however a lot of caveats with its use (but it is still worthwhile, see below).

An auto_ptr is particularly useful in some important cases:

local variables initialized with operator new
class members that own memory
functions returning new'd up objects which must be deleted by the caller.

It is important in these cases because it does two things: first, it makes sure that the memory gets deleted -- even if you forget, and second it provides compile time detection of mis-uses. For example, an auto_ptr acts like a pointer but is not a pointer -- thus it cannot be easily passed as a function parameter when a regular pointer is expected. Instead, it must be clearly stated in the code whether or not the pointer is being temporarily shared or given up completely when it is passed to the function.

Here is a simple example of an auto_ptr's use:

#include <memory.h> using namespace std; void func() { auto_ptr<int> bob( new int(8) ); cout << *bob << endl; // prints the number 8 } // at this point, the memory owned by bob is freed // as the auto_ptr is destructed

Only one auto_ptr object can own a given piece of memory. Whenever the copy constructor or assignment operator of the auto_ptr class is invoked, it takes ownership away from the source variable. Consider the following code:

{ auto_ptr<int> bill( new int(47) ); auto_ptr<int> tom; // at this point, bill points to 47 but // tom is not initialized tom = bill; // at this point, bill is uninitialized // and tom now points to 47. auto_ptr<int> fred(tom); // at this point fred points to 47 // and tom is uninitialized } // at this point, the integer is safely freed. As you can see, while this is a very useful feature, it very significantly violates what is normally the sacrosanct definition of the copy construction and assignment operations. Further, this causes all kinds of trouble for compilers which are trying to help diagnose programming mistakes. Consider the VA5 compiler from IBM. That compiler, at the time of this writing produces an warning if the copy constructor is written without its parameter being a const & when it used. Normally this is a good thing -- copy constructors should not be modifying the source. Its just that the auto_ptr is a very special case.

While the auto_ptr is a good tool to use, it is not a good design template to follow. Use the extant


       auto_ptr

but don't make your own!

Consider the following code: #include <memory> using namespace std; extern void other_function(float *p); void some_function() { auto_ptr<float> fp( new float(9.1) ); other_function(fp); // wrong! won't compile other_function(fp.get()); // retains ownership other_function(fp.release()); // gives ownership away } The question of course is this: which right:

fp.get() or fp.release()

in the above case? The answer depends on how

other_function

is written.

It is tempting to think that other_function should be declared like this if it were to take ownership of the pointer:

void other_function(auto_ptr<float>); and like this if it were not: void other_function(auto_ptr<float> const &); However this can result in non-portable behaviour due to auto_ptr implementation differences. It can also result in some great confusions when calling other_function incorrectly. Consider the following code: void other_function(auto_ptr<float> p); void some_function( float *p ) { other_function(p); // will destroy p -- is this desired? } Note that in this definition of other_function, and auto_ptr temporary variable will be created by the compiler in order to make the code compile. This will be true even if other_function were written to take auto_ptr<float> const & as its parameter.

The automatic destruction of the temporary variable will destroy the object refered to by parameter p. This is not likely to be the desired result -- and it will be very hard to debug without using purify.

In general, it is better not to use auto_ptr as function parameters -- although function returns are a very good use:

auto_ptr<SomeClass> some_function() { return new SomeClass(parms); } void caller() { auto_ptr<SomeClass> p (some_function()); p->member(); } Writing some_function() such that it returns an auto_ptr to SomeClass guarantees that the returned value will eventually be destroyed -- as is the intent of the author of some_function. Simply documenting that such is needed will not.

Finally, while it is tempting to use a container of auto_ptr's to automate the process of destructing objects whose pointers are stored in the container when the container is destroyed, this won't work very well because of the copy constructor logic of auto_ptr does not in fact make a copy. This of course violates a basic principle in the design of STL containers and they cannot be expected to work correctly using auto_ptr as template parameters.

Instead of using, say a std::vector< auto_ptr< t > > which will automatically destroy the T's owned by the container, derive from the vector and make the derived class' destructor do the freeing:

// Instead of this: typedef std:vector< std::auto_ptr< SomeClass > > Container; // Do this: struct Container : public std::vector< SomeClass* > { ~Container() { iterator f = begin(), l = end(); while(f != l) { delete *f++; } } Container(Container const &src) { // new up copies of the objects in the source // container and store them in this container } Container& operator=(Container const &src) { this->~Container(); new(this) Container(src); } }; After auto_ptr's are initialized they can be changed via assignment using operator = (). They can also be changed using the auto_ptr::reset. The reset method is the basis for its assignment operator. Resetting an auto_ptr deletes the current pointer and takes ownership of another. Generally, it is clearer to use the reset method that the assignment operator. Consider the following example program fragment that parses expressions consisting of terms optionally added together: class Expression; auto_ptr<Expression> parse_term(); auto_ptr<Expression> add_expression(Expression*left, Expression*right ); char current_token(); // returns next token auto_ptr<Expression> parse_expression() //. //. Parse expressions of the form: //. a + b + c //. and return an Expression ptr //. { auto_ptr<Expression> rv( parse_term() ); while(rv.get() && current_token() == '+') { rv.reset( add_expression(rv.release(), parse_term()) ); } return rv; }

Unfortunately, at this point in the history of the STL, the auto_ptr class is implemented/declared in different ways on different compilers. The HP compiler, for example, requires that the strict ansii compatibility compile option (-AA) be used event to use auto_ptr -- and even then it is very selective about non-trivial uses.

The following code will likely work even if the runtime library version causes trouble. This code puts its auto_ptr definition in a namespace, alt_tools. Doing this is not a requirement, and will not work unless namespaces are supported. Simply remove the namespace wrapper if needed.

Due to the unusual nature of the auto_ptr many uses and bugs in various compilers, there is some ugly casting away of const that cannot be avoided. But they mean that care must be taken with this auto_ptr's use (for example, don't use it as a function parameter's type).

namespace alt_tools { template<class T> class auto_ptr //. A self deleting pointer object -- easier to use that //. std::auto_ptr. //. //. a minimalistic definition of auto_ptr that results in //. fewer compile errors and compiler bugs (SPOS MSVC++!) //. //. Not that this definition does not include the member //. templates that life a little easier when using derived //. pointers. Sigh. //. //. Because of compiler wierdnesses, I have declined to include an //. operator=. Instead of using it, use the reset() method. //. { T *p_; //. actual pointer being managed public: explicit auto_ptr(T*p): p_(p) { } //. construct from a pointer but don't allow automatic //. conversions ~auto_ptr() { reset(0); } //. destroy and make unusuable auto_ptr( auto_ptr<T> const &r) // //. copy construct (and transfer ownership!) //. //. Yes this violates const correctness but //. the techniques to make this function work //. properly without doing so aren't truly portable //. yet. { auto_ptr<T> &q = const_cast< auto_ptr<T> & >(r); p_ = q.release(); } auto_ptr& operator=( auto_ptr<T> const &r) //. assign from other ap and own //. //. Shouldn't need this -- use .reset(r.release()) //. instead. //. //. Yes this violates const correctness but //. the techniques to make this function work //. properly without doing so aren't truly portable //. yet. { auto_ptr<T> &q = const_cast< auto_ptr<T> & >(r); reset(q.release()); } auto_ptr& operator=( T* r ) //. assign from and own a pointer //. //. Shouldn't need this -- use .reset(r) //. instead. { reset(r); } void reset(T *p) //. delete the current object and take //. ownership of a new one { if(p != p_) delete p_; p_ = p; } T* get() { return p_; } //. get a copy of the pointer T const * get() const { return p_; } //. get a const copy of the pointer T* release() { T* tmp = p_; p_ = 0; return tmp; } // //. disconnect the auto_ptr from its pointer //. so it won't get deleted when the auto_ptr is destructed T& operator*() { return *p_; } //. perform the dereference on the pointer T const & operator*() const { return *p_; } //. ditto T* operator->() { return p_; } //. access a member give the pointer T const * operator->() const { return p_; } //. ditto }; };

Use auto_ptr to make sure that memory is not accidentally lost due to a failure to call delete either on local variables or on class members.

Use auto_ptr to declare the return value from functions returning memory created via operator new (assuming the caller is supposed to delete the memory).

Do not use auto_ptr as function parameters.

Here are some good habits to develop:

Play in your home directory, not the code base
design all software as if it were a library
Keep it simple stupid
Don't imbed calculations in your UI layer -- they are too hard to test there.
Use maximum warning detection (strict Ansi too)
Test early and often
adopt and stick to a naming convention
Port early and often
Use a CM system; commit and label often
Avoid macros, prefer inline functions
Make your destructors work correctly!
Don't do things half way
Learn from the STL
Static initialization order depencency
Limit the use of 3rd party library symbols
Use nested typedefs to shorten and standardize names
when writing templates, add code to check for erroneous template parameter types

Programming is a fascinating subject with so many facets that it is almost like some kind of giant video game. However, don't play the game in the project code base. Do this in your home directory, not the code base.

For example, if you decide (as I once did) that it would be really cool to write your own operating system, don't jump at the chance to write your first operating system as part of some production product development cycle (as I did). Instead, do this in your own private directories -- and stick to traditional operating systems until you perfect yours (and can convince other people to use it by force of argument rather than by shoving it down their throats just before leaving them to debug it when you go on to the next project where you get to write the next cool thing and stick it in the production code base).

Practice is good, but don't practice in the code base -- that is for work, not play. One of the advantages that are found in the development of good library modules is that they are easily tested. Unfortunately, a lot of application programs quickly lose this ability due to complex internal interfaces that do not lend themselves to easy testing. For example, there may be many functions in a program which were written with the assumption that they will be used but at any given time the program never call them.

The fact that these functions have never been tested will be lost and when program is changed and the code finally does call the neglected functions they will malfunction in strange ways. This often leads developers to tear their hear out in frustration with the famous exclamation, "How did this ever work?!" In this case, it didn't but no one knew.

The principle problem here is that the program was thought of as a sand pile rather than as a building constructed of floors -- each of which could have been individually built and tested. But this view is simply a matter of choice on the developer's part. There is no reason that the "application" cannot be a thin layer of glue logic on top of an existing set of libraries whose interfaces are clearly documented and well defined and easily tested early.

Another important advantage of library design is that it encourages re-use. Re-use is a complex thing that is best accomplished by practice. The more you try to make the code re-usable, the more often you will be successful at it. People writing "programs" rather than libraries rarely pay attention to the usability and documentation details that make or break re-usability.

The pursuit of re-usability is to some extent like a moral philosophy. At any given point in time it might not be in one's short term best interest to adhere to a particular precept of the philosophy, but the belief is that, if done consistently, one's long term best interest will be ultimately served by conformity.

Of course, deadlines and ill considered promises interfere with perfect adherence to the goal of writing all code to be re-used. Further, in many cases, code is known to be discardable. In these situations, it is still valuable to follow the general outline of creating all programs as a collection of libraries which are glued together by the main program. This design makes it easier to understand the parts of the program -- even if they were hastily constructed. It also makes it easier to write a new library that replaces an old buggy one!

In programs that are not constructed as collections of libraries, there are often interconnections between disjoint parts that are difficult to break or even understand. Even worse, some of the interconnections are not physical so much as accidental. For example, if a function never returns a number greater than 10 during its initial implementation phase, a consumer of this function might mistakenly assume that it never will and forget to test for this condition. Part of design code in libraries is the act of documenting the behavior. Library design makes the need for this documentation more obvious than is often found in "application" code where all the source code is bundled together for easy viewing and misunderstanding. Human beings cannot remember to always do the right thing, but computers can help -- use automation to make sure that mistakes are either prevented or caught early.

Products, of course, should be built using either Makefiles, scripts, or IDE project build instructions. This is a good place to automate the detection of the use of prohibited symbols or coding practices -- if such can be quickly detected using grep, perl, or other fast text processing language. Just add additional rules to search the code during build time for violations. The following paragraphs describe some approaches to using the compiler to help prevent problems. Also, see Early Error Detecton in the templates chapter. The principle advantage of C++ is that it provides automatic behaviors such as the firing of constructors and destructors. A standard automatic way of making sure that allocated resources get freed is to declare a variable of a special type whose destructor performs the release. That is declare an object that represents the allocation of a resource. When the object is destructed -- if not before -- free the allocated resource. Here's an example code fragment:

extern void alloc_it(); extern void unalloc_it(); class Locker { public: Locker() { alloc_it(); } ~Locker() { unalloc_it(); } }; ... void main() { Locker lock1; // allocates then destroys // a Locker object which // first calls alloc_it() // then unalloc_it(); } There is of course little reason to create Locker objects on the heap -- unless auto_ptr's are used to make sure they are in fact deleted as needed -- but the principle use of this design pattern is to make sure that destructors get called properly from all possible exists from a function. When using old fashioned C code, failure to call free() after having called malloc is a major source of memory leaks. Consider cases like the following: void print_name() { NameBuffer * p = new NameBuffer; if( get_name(p) ) { return; // bug! } cout << "p contains " << p << endl; delete p; } This code appears to be working correctly -- its checking for a non-zero return code from the call to function get_name() but it is not handling the cleanup correctly -- it forgets to delete the NameBuffer in the event of an error. The auto_ptr class can be used to make sure that buffer gets deleted. Here's a rewritten version: void print_name() { auto_ptr<NameBuffer> p(new NameBuffer); if( get_name(p.get()) ) { return; // no bug } cout << "p contains " << *p << endl; } The auto_ptr class is discussed elsewhere.

Handles are similar to auto_ptr's in that destruction is guaranteed (almost) but differ in that there can be more than one handle to a heap packet -- not just the one allowed by an auto_ptr. Unfortunately, there is no standard 'handle' as of yet. The following paragraphs describe one possible implemenation.

Typically, a handle is a class object that is used like a pointer. However, its implementation is such that there is both a pointer to the 'handled' object, but also a reference count. Creating a handle to a given physical object increments the reference count associated with that object. Destroying the handle decrements the count. When a handle's destructor fires, and the reference count decrements to 0, then the object refered to by the handle is destroyed.

In a single threaded application, handles are a clear win because they prevent memory leaks (mostly) in cases where ownership of objects must be shared between multiple code fragments or classes. However, in a threaded situation, where multiple threads may share control of an object, and thus its handle's reference count must be thread locked, signficant performance penalties can occur. Multiple copies may well be the best solution in a multi-threaded environment. Performance analysis tools may be required to understand one way or the other.

The construction of an auto_ptr object looks basically like this:

auto_ptr<ClassName> p ( new ClassName(parms) ); In this case, when p is de-scoped, the ClassName object is destroyed -- unless it has been somehow released. The same would be true of an object owned by a handle -- except that if another handle refers to the same ClassName object, it would not be deleted until all the handles had been de-scoped. Consider the following example: { auto_handle<ClassName> p ( new ClassName(parms) ); { auto_handle<ClassName> q( new ClassName(other parms)); p = q; // both point to same things, the first ClassName // object is destructed here } // q is de-descoped but the second ClassName object is still // in existence } // destruction of the second ClassName object occurs ehre There is simply no substitute for proper testing. Unfortunately, organizations often confuse the issue by having a 'QA' or a 'test' group. There focus should be on customer acceptance tests. Developers are, whether it looks like it from an organizational standpoint, ultimately repsonsible for testing the code. The QA team should be viewed as the customer's representative in the testing arena -- not the principle source of testing. You developers should create a test suite that they can run before checking changes into your code base. This test suite should run quickly but have far more diagnostic testing turned on that the QA team would know or care to use.

Only developers can write 'white box' tests because they know how the code works. They have to think about the things that can go wrong given the architecture of the code and make sure that tests are written that detect and report such malfunctions.

The QA team should only be writing 'black box' tests. That is, tests written using only properly documented functions that the user is specifically paying for. If the QA team writes tests that rely on undocumented features, it will be very difficult to get rid of features -- and likely be expensive to maintain them -- even though the customer never wanted them. Only developers can write tests that validate the test coverage levels on a per line of code basis. It is extremely important that code be tested to a level of at least 85%. That is, that 85% of the lines of code have been executed during the course of a suite of tests and that these tests do in fact actually operate on real data -- and most importantly that the code does in fact give the right answers. There are several tools that can help you determine if your tests have achieved at least 85% coverage:

purecov by IBM/rational
tcov on linux
pcov on other unix machines
covtool -- see covtool.sourceforge.net

The general approach to test coverage analysis is as follows:

Prepare a special version of your executables and libraries that will produce information about the lines of code that have been executed as the program runs. The technique for doing this varies from tool to tool.
Run all your tests using these instrumented executables.
Aggregate the coverage information produced by all the various runs.
Use the tool to make a report of which lines of code you have executed and which you have not.
If you have not achieved at lest 85% test coverage, add more tests and repeat the above.

An important way for you to detect problems with your code is to execute your tests using some for of memory misuse detection software. Clearly, the best product on the market for this is purify, a product of IBM/rational. Purify is used to create special versions of your executables that diagnose themselves as they run. You run all your tests using purify, from time to time to give yourself the warm and fuzzy feeling that all is well. If it says your program has no errors -- you are likely not to have them. Developers can also use purify to help them isolate where problems are when bugs are reported.

There is a share ware program that works roughly like purify that is avaialable for linux: valgrind. It does not have all of purify's features but since you can download it for free in some settings, it can give you a taste of how valuable purify can be -- and valgrind can be quite helpful in its own right.

Using purify correctly is somewhat complex but if you learn it and set up regular regression testing using it, you will find that the payoff can be very high. Imaging trying to catch a random memory misuse that only occurs after 12 hours of program execution when the program's memory size has grown to 20 gigabytes or so -- how would you go about it? You get a machine loaded with 56 GB of ram, purify the executable, and let it run for 2 weeks until it prints the first message: memory misuse caused by line 102 in file "bla.c" with the following calling stack ....

Luckily most problems can be duplicated quickly, with a little work, but even then, purify can often point you to the exact line of code that caused the problem and describe the circumstance in such detail as to make it easy to fix. If you are unable to use purify or valgrind to help you detect memory bugs, try using the memory allocator's internal tables to detect when its memory chains are broken. Sometimes when bugs envolving randomly writing all over memory. The memory allocator has pointers throughout your heap space. The random writes may hit your memory allocator's chains. The malloc header file may well describe how to iterate over the chains and detect bad links. The Microsoft compiler provides a function called heapwalk to help you do this. Other platforms do not.

If you are using a 3^rd party memory allocation tool, it might might have special features.

Another 'doit it youself' technique is to superseed the global ::operator new. This function basically just calls malloc and returns its return value. You can intercept these calls to new and add some extra space at the beginning and ending of the heap packets requested from malloc.

You fill these packets with known values in ::operator new. Then, during ::operator delete (), you check to make sure the correct values are still there. If they are not, then you know that your program has misbehaved.

If you try something like this, remember the following:

the size parameter to operator new can be any number
memory can only be allocated in fixed size chunks -- usually multiples of sizeof(double)
adding space to the beginning of the packets must be done in multiples of the chunk size or you will get memory crashes caused by your implementation of operator new ()

What's in a name?

As annoying as it may sound, if you are working in a porting environment, you have to take into account the mistakes of the compiler vendors and those of the vendors of any third party libraries.

One of the most tedius problems encountered is the use of the preprocessor to define names that you want to use for something else. For example, the name 'unix', as well as 'UNIX', and 'Unix', are highly likely to be #define'd to be '1' on some system or other. This means that you cannot define a class, a macro, a function, or a variable named simply 'unix', or 'UNIX', or 'Unix'.

Sadly, there is a myriad host of simple computer science related words that should not be used for the same reason. There is no fixed list, but a simple rule to follow is this: do not use any simple english word that might be computer science related. Instead, used underscores to separate word fragments. For example, instead of using 'unix', use 'unix_flag', or some other such thing.

The compiler vendors are supposed to put leading underscores on their #define's -- so you should never define a symbol that begins with an '_'. For example, don't use '_unix', nor use '__unix'. Nor '_bob', '_srikanth', etc. Leading underscores belong to the compiler vendor.

Nesting your symbols in namespaces, functions, and classes does not help if the symbol is defined in the preprocessor.

The only work around to this problem if you have unwittingly caused it is to put #undef's in your source somewhere after including the header file defining the symbol causing you trouble. For example, the stdio.h headef file defines the name 'fileno()' as a macro. If you had large body of code and you could not easily change your own variable name, fileno, to something else, but you absolutely have to include stdio.h somewhere where your own fileno is defined, you can do it like this:

#include <stdio.h> #undef fileno #include <your_header.h> .... // code using your version of fileno

Developing and sticking to a standard naming convention can greatly ease your porting burdens -- and improve the understandability of your code even if you aren't porting. Here is a simple example of such a naming convention:

Do not define any name that begins with a leading underscore.
Avoid #define whenever possible -- prefer typedef's, inline functions and const variable's instead.
Avoid #define, symbol, class, and variable names that are simple computer science terms or operating system names, etc -- use longer names separated by underscores instead.
Whenever possible, do not pollute the global namespace -- that, put your symbols in namespaces or nested within classes.
The standard library uses all lower case identifiers, so projects can use mixed case names for their own purposes. Here are some example choice:
- Class, typedef, and enum type names could be distinguished by making them mixed case: Some_Class, or SomeClass.
- Namespace, variables and function names can be lower case because functions are easily distinguished by the '()'s that immediately follow their use: fred_is_bad, func(), etc.
- global or namespace constants, enumeration values, and #define'd constants can be all upper case with separating underscores.
- class members can have a trailing underscore: member_one_.
Source code file names should be limited to characters that are known to port to all target platforms of interest. This means that you should in general use the unix file pathname syntax -- ie with a '/' as a separator rather than the '\' (backslash) character in any #include directives. This is because the microsoft compiler will recognize the '/' as a path separator but the unix compilers will not recognize the backslash ('\').
When writing #include directives, it is generally inadvisable to use fully qualified pathnames. No to versions of the same operating system / compiler combination will guarantee continuity of the names of directories where standard tools can be found. Usually they are consistent but there is no guarantee. Rather than imbed specific pathnames in your source code, rely on the compiler's -I parameters to help you find the code.
Also, the use of double quotes as opposed to angle brackets is a bit tricky. Some compilers treat the double quotes as being exactly like the angle brackets and some (Borland) treat them differently. The difference has to do with the start of any search for the named file. G++ also seems have unusual features in this regard. It is best to avoid the double quotes unless you understand the porting implications for the specific platforms of interest.
To reduce the chance of a name conflict between header file names, it is advisable to refer to project header files using the directory name as well as the file name. This requires that when you compile, the -I directives be set up correctly to make it work. So, instead of using this:
#include <myfile.h>
You would do something like this:
#include <my_project/myfile.h>
This prevents trouble caused when 'myfile.h' is defined in some third party library as well as yours.
Note that this approach can also reduce the number of -I statements needed by the compiler on a large project. For example, suppose your project source code tree looks like this:
main_dir
  lib1
      header1.h
      header2.h
  lib2
      header3.h
      header4.h

Instead of using using two -I statements on the command line to the compiler like this:
-Imain_dir/lib1 -Imain_dir/lib2
You could just use one:
-Imain_dir
This means of course that the header files would have be included using the directory name fragment:
#include <lib1/header1.h> #include <lib1/header2.h> #include <lib2/header3.h> #include <lib2/header4.h>
In general, header file names should have some file name extension, preferably .h. Unfortunately, writing 'make' rules that properly handle all the cases is made much more complex if you leave the file name extension. Also, Microsoft Windows and most browsers rely on the file name extension for type information.

Here is a short example source file obeying these rules:

   #ifndef SOME_CLASS_DEF_H
   #define SOME_CLASS_DEF_H 

   extern const int MAX_DOG_COUNT;

   namespace our_company
   {
     void 
     some_function (int some_parm)
     {
	float some_variable = some_parm;
     }

     struct Some_Struct
     {
       typedef std::vector Vector;

       Vector vec_;

       Vector const &vec () const { return vec_; }

       enum Constants
       {
	 ONE=1, TWO=2
       };

     };

   };

   #endif

Despite people's best efforts to review and understand code, the language is sufficiently complicated that accidental features can become essential to the proper operation of your code. This is very bad. Out of the blue, one day, your code will suddently malfunction after some trivial change -- and after spending many hours tracking down the problem -- you will say to yourself "how did this ever work!"

Porting your code to other compilers can help detect such things with little work on your part. Other compilers will have had other developers write them. These other developers will have made different choices about warnings and error detection. If you port your code to a variety of platforms, one of them will likely give you a good error message about the problem you have unwittingly created. Therefore, it is wise to start the porting process as early as possible -- rather than leave it to the end.

Porting the code envolves both getting it to compile and also running your regression tests on that platform. You should do both of these things early and often. When the time between a code commit and the discovery of the breakage is large, it becomes more difficult to remember what you did that might have caused the problem. Not all porting errors appear as compile errors -- some are just bugs. Detecting a bug on a porting platform long after the author created it is very tedius.

Some porting difficulty is caused by floating point numbers -- you will get minor differences in roundoff or stream output behavior. These differences are not necessarily bugs -- but they might be.

One way to work around such 'allowed' differences between operating system behaviours -- particularly with respect to floating point numbers is allow different results on different platforms based on specific test results on specific platforms. You should be wary of just ignoring the test results on a platform though. You just need some way of officially stating that 'test A works differently on platform P than on platform O'. Don't let these differences between platforms dissuade you from automted test mechanisms -- use scripts if you have to enable automated testing for all platforms.

assuming that pointers are the same size as integers or longs -- there is no guarantee that this is so.
assuming that integers and longs are the same size
using platform specific behaviours
- header files that exist on only one platform
- run time library functions that exist on only one platform
- treating vector<T>::iterator as it if were simply a pointer not a full function class object (which cannot be converted directly into a pointer!)
- forgetting to avoid symbol names that might easily be defined as macros on some other system -- such as 'unix'.
different compiler revision levels on different platforms means that some features of the language have to be avoided
assuming that all local and global variables obey the worst case memory alignment criteria. For example, the following code may crash and it might work fine:
Here is a slight change that will gaurantee the proper alighment:

C++ is a big complex language and there are many stepping stones that often become stumbling blocks because developers fail to make complete implementations.

A common mistake is to make a class object with a variety of data members and functions but which does not make guarantees about the state of these data members. You do not have a class, but rather a named 'pile' in this case. And just like a pile of sand, as you add more grains to the top, you will eventually see a catastrophic collapse as all the sand grains slide to the ground.

Don't make piles of data. C++ classes should be objects whose internal states are guaranteed at all times which makes sense. Here are some general guidelines for completeness:

Watch the ownership of new'ed up memory! This isn't java -- you must take responsibility for memory which has been new'd up.
Make your destructors work. You might want to use auto_ptr's to simplify your life.
If your class has any pointers which refer to memory which has been newed up in the constructor or which the class has been given ownership thereof you are likely need to create all of the following:
- copy constructor
- destructor
- assignment operator
- comparison operators (if needed)
Failure to make these things when needed is silent but deadly -- it will come back to haunt you. This is an easy thing to look for in code reviews: if you own memory, you need all the above.
When writing templates, make add template type constraints to help clarify errors that your poor user is likely to get. See below.

One of the more interesting features of the C++ language is that it provides static initialization using function calls and class object construction -- unlike C which only allows for initialization via constant expressions. Basically, this means that you can instruct the compiler to intialize variables in a user defined (and complex) way before main() begins executing. Here is an example of a static initialization:

// at file scope extern int fred(); int bill = fred();

A slightly different variation of the above theme occurs when you declare a global variable of a type which is a class having a constructor -- or has members which have constructors. For example:

// at file scope #include <class.h> Class varname; // this is a static init

An almost infinite number of useful ways can be imagined to use this feature. However, it should be used with great caution. The following sections will describe various problems that can occur if you do use it. The problems are such that you should able to avoid them all, and use static intialization to your hearts content. But, "should" and "likely to" are very different in this case -- for reasons discussed below.

In general, you should not rely on static initializations of this form in production products -- at least not those in which you place code in object module libraries.

There is always a special case that will work, for a few releases of yor product at least, but in the long term, after newly hired people have taken over maintenance of your product, static initialization will eventually come back to bite you for reasons described below. It will be safer if you find a way to initialize all your program variables after main() begins executing. Typically class objects should not be allocated statically because of their constructors -- but you can allocate static pointers and fill those pointers after main() begins executing. In the example from the prior section, a declaration of an external function, fred() is made, and a variable name bill is defined. Also, before main begins executing, bill is initialized with return value of fred()

This is a nice feature, but it leaves you open for some vary nasty shocks. What if fred() requires that bill be properly initialized before you call it? It will return an undefined value -- which may be annoyingly constant until you port to another platform and then you will get nasty surprises.

You might think that an intelligent developer is not likely to make this mistake. However, experience shows this not to be true. And worse, it is not one developer that is involved in an error of this form. The original author of fred() may well know not to use variable bill as party of its implementation -- but subsequent maintainers may not know that bill is initialized by calling fred() and may unwittingly make fred() dependent on bill -- perhaps even accidentally. Consider, if a change to fred() envolves a call to function tom() and the developer who modifies fred() did not write tom(), how will he know that tom() uses the global variable bill?

There is no solution to this problem -- you just cannot write code that depends on a global variable safely if the function is used to initialize the variable or any other variable envolved that variable's initialization. Luckily bugs of this form are easily caught. Unlike the obvious order dependency bug described above, other, more subtle order dependency bugs can creap into your statici intialization logic. Suppose your program has two global variables which are both statically initialized and further suppose that the second variable's intialization requires that the first be initialized it can be initialized correctly. Will this situation work? Here is a concrete example. Consider the following files:

File h.h

extern int g (); extern int f (); extern int var_one; extern int var_two;

File one.c

#include <h.h> int var_one = f();

File two.c

#include <h.h> int var_two = var_one + g();

This program will work correctly one if var_one is statically constructed before var_two. That will only happen if file one.c is linked into the program before file two.c.

In a program with only two object modules, this is easily controlled. Typically, the linker places the objects into the module in the order in which it encounters them. All you have to do then, is make sure that file one.c's object module appears on the command line to the linker before file two.c's object module. But what if file one.c and file two.c are in libraries? The linker has no way of knowing that it should put file one.c's object first. It might put it first, and it might not depending on a large number variables. Further, different linkers on different platforms will make decisions differently -- and thus the code might work one platform in one release, but may not work on other platforms in that same release -- and may not work on any platform in a different release. Here are some approaches for forcing initialization after main():

You could make all variables that require function calls to initialize them be pointers -- that means all global variables and static class objects. You then publish the fact that before using your code, the customer must call some initialization function you provide -- which will fill those pointers.
You could make a global linked list of 'stuff to do' and have main() execute code to 'do' all the stuff. This of approach requires that you create static initalizations that append the 'stuff to do' for each file into the global linked list. You also have verify that the global linked list gets linked into the exeuctable first.

This may not seem like a safe solution -- given all the problems documented above, but the fact is that it is possible to get a single object module linked first -- you just can't get a bunch of them linked first. Besides, it is better to have your main() program call a single function to handle all initializations everywhere than to have a long list.

So you want your code to be fast!. Consider the following ways of achieving this:

don't do things you don't need to do
don't leak arge amounts of memory
reserve threads for very special situations
The "Big O" -- understanding algorithmic complexity
1. STL container overview
2. using std::vector
  - Fastest 'find' but slowest insert
  - Pre-allocate space if you can
3. using std::list
4. using std::map
use the throw() keyword when you don't use exceptions
virtual function calls
maximizing the chance of inlining
Malloc performance and operator new

Algorithmic complexity is measured in 'orders'. The orders are typically something like this:

O(1): Order 1. When something takes 'order 1 time' it means that the cost for accomplishing the task requires a constant amount of time -- which is to say, 'outstanding' from a complexity viewpoint.
There are no O(99)'s or any other O(constant number) -- O(1) suffices to describe this case. Even zero time, when no work at all needs to be done, counts as O(1).
O(ln(N)): When N things are available to be processed, the cost for actually doing the work is only logarithm(N) -- which is to say 'great' because ln(N) (as well as log(N)) is always less than N when N is greater than 1. Whether base 2 or base 10 does not matter.
O(N): When N things are available to be processed, the cost for actually doing the work is N. That is to say, you have to process all the objects at least once. This is 'good'.
O(N*ln(N)): When N things are available to be processed, the cost for actually doing the work is N * ln(N). That is to say, you have to process all the objects at least once. This is 'ok'.
O(N^2): When N things are available to be processed, the cost for actually doing the work is at least N squared. Which is to say 'bad'.
O(F(N)): When N things need to be processed, the cost for actually doing the work is N^3, N!, or some other horrifyingly huge amount of time.

Note that because an algorithm if O(1) does not mean that it will be fast. It could be a large O(1). However, O(1) does mean that the amount of data being processed does not matter. Which means that adding a million times as much data does not making the code any slower.

Consider the cost of the following operations on an STL container: first, insert 1 million items, then find each item in a random order. The following table describes the cost (time) required for each step:

Container	Insert Cost	Find Cost
vector	O(N^2)	O(N^2)
list	O(N)	O(N^2)
map	O(N*ln(N))	O(N*ln(N))
set	O(N*ln(N))	O(N*ln(N))

The above table is an example of the 'worst case' situation. It is applicable when large amounts of data must be randomly created and searched.

The use of threads may or may not improve the performance or responsiveness of your programs:

The simple fact of starting a second thread causes malloc to run a lot slower than it did before. If your program makes a lot of small malloc calls (or in this case operator new) a threaded program will spend up to 50% of its total runtime waiting for memory. This is true, even if all threads save 1 are actually waiting for something to do.
Global variables which are truly shared between threads must be guarded with mutex locks to prevent data corruption. Waiting for access to a variable (both and read and write) can consume a large portion of your program's runtime.

To overcome the above problems: pre-allocate all the memory for the threads, and architect a program so that each thread has its own data to process -- with little or no overlap in access to global variables. However, retrofitting a large application, which was not designed with threads in mind, can be difficult or impossible -- and simply turning on threads may cut your performance in 1/2.

In general, before committing to the path of using threads in an application, verify that it will in fact speed it up to do so. Numerous experiments are generally required for this. These experiments should be done on a multi-processor machine because single cpu boxes will hide certain programmatic bugs. Failure to put mutex locks around accesses to global variables will result in program malfunction much faster on a multiple cpu machine than on one having only one cpu.

It may be advisable to break an application into two programs. One that does not have threads -- but executes commands in serial. The second is threaded and handles i/o such as web transactions which benefit from thread parallelism. The threaded application can martial data and serialize commands to the non-threaded engine (thus allowing it to run a full speed).

An alternative to this draconian separation is to have an executable which is only threaded during certain phases. When performing lots of small mallocs, for instance, thread could be turned off. Then when the memory is allocated, turn threads on. This only works if the threads are all dead though -- not just stopped during the non-threaded periods.

Another problem with threads is that the stack size is generally fixed. Unlikely the primary thread of an application, the secondary thread stacks do not grow as needed. This means that you have to be more carefull with recursive algorithms implemented in threads than in a non-threaded program -- which is to say, that you have to guess correctly about the stack size. Technically, of course, all recursive algorithms can overflow any stack size -- even the program's main stack. However, threads tend to make the problem worse.

Starting threads is faster than launching a whole new process -- but it is not infinitely fast. In really high performance situations, a pool of threads which are already up and running but waiting to handle the next command is likely to give the best results. The threads in this pool might not terminate until the application does.

In general, threads should be reserved for special situations where each thread can have its own data to process -- with little or no dependence on global variables or interaction with other threads. If possible, pre-allocate all the memory to be used by the threads because malloc must perform mutex locks.

Making class members virtual does not add a considerable overhead in most situations. There are a few cases where care should be taken to avoid them. All these cases boil down to code fragments that do a large number of function calls which do not in themselves do very much. Some obvious cases are:

container class member accessors and iterator methods
input/output functions

Consider the input/output functions. Suppose a 1 megabyte file is being read using std::istream_iterator<T>. Such an iterator allows for algorithms of the following form to be written:



   //
   // Function to read 1 megabyte from an input stream 
   // using an std::istream_iterator.
   //
   typedef  std::vector<char>                         CharVec;
   typedef  std::istream_iterator<char, std::ostream> StrmItr;
   void 
   read_1mb(StrmItr  &it, 
	    CharVec  &buffer
	   )
   {
      StrmItr           end;
      CharVec::iterator out   = buffer.begin();
      int               count = 0;

      while(count != one_million && it != end)
      {
	*out++ = *it++;
	++count;
      }
   }

Despite the fact that iterators are envolved, the above code can approach the speeds of simple character pointer operations -- if the following methods are all inline and simple:



    CharVec::iterator::operator++(int);
    CharVec::iterator::operator*();
    StrmItr::operator*();
    StrmItr::operator !=(StrmItr const &);

If these functions are not inline, there will be very noticeable performance impacts compared to implementations where they are because of the very large amount of data being processed (1 megabyte) and there being several function calls per byte.

In fact, in all major implementations, the istream_iterator and the ostream_iterator are not truly inline.

No virtual method gets actually called in an inline manner unless it is declared inline and is referenced using the class member operator (::). In some implementations, the ostream and istream i/o methods are virtual members of a base class -- resulting in at least one true function call for every i/o operation in the calling code.

Luckily, there is a pair of alternate classes that are:


      std::ostreambuf_iterator
     
 std::istreambuf_iterator

The function above runs much faster using istreambuf_iterator;

In all major implementations, the std::vector member accessors are made as close to purely inline as possible. Using the inline keyword is not always sufficient to ensure that a given function or class method is in fact inline. The reason for this is that the langauge definition allows compiler vendors to determine that an given function is too complex to inline and in these cases the function is implemented as a static method in all translation units that reference it. Understanding the rules for all compilers is a bit of a challenge. In general the following things prevent inlining on at least one and probably more of the major compilers:

large functions
functions with loops (while, for)
recursive functions
functions with multiple return statements

Sometimes there is nothing to be done, but sometimes a trick can be used to allow the most common paths through the candidate function to actually be inlined -- while leaving the uncommon paths as outline functions.

For example, suppose a function has a while loop in it -- but it is only used some times by the function, whereas most of the time the function does something very trivial. This code can be split into two functions, one inline without the loop, and the other outline with the loop. Here is an example of the original function:


  //
  // extract characters from an input string
  // but ignore the space characters
  //
  inline
  void
  fetch_ignoring_spaces(char** in, char *out)
  {
    while(**in == ' ')
       ++(*in);

    char c = **in;

    ++*in;

    return c;
  }

As can be seen, there is a loop in the above code to ignore repeated space characters in the input data. This will prevent inlining on many compilers. Inlining can be achieved though with a simple change: add a new function that begins inline and switches to out of line only if it has to:



  //
  // slow version way of extracting characters from an input string
  // but ignore the space characters
  //
  void
  slow_fetch_ignoring_spaces(char** in, char *out)
  {
    while(**in == ' ')
       ++(*in);

    char c = **in;

    ++*in;

    return c;
  }

  inline
  void
  fetch_ignoring_spaces(char** in, char *out)
  {
    if(**in == ' ')
      slow_fetch_ignoring_spaces(in,out);
    else
    {
      char c = **in;
      
      ++*in;
      
      return c;
    }
  }

The C++ operator new function ultimately calls the C language function malloc. Most operating systems provide an implementation thereof that solves most application problems in an affective if not optimal manner. There are a couple of cases where the built in malloc may turn out to be less than optimal:

if the program allocates a large number of small objects
multi-threaded applications

When a performance analysis tool such as quantify, prof, indicates that a program is spending large amounts of time in malloc, the first step should be to determine if the program, cannot easily be changed in some way to reduce the number of calls. However, this is not always easy or the best solution. After market memory allocators are available that can give significant performance optimizations without major code rewrites. Consider trying:

In addition to speeding up malloc, these tools typically provide additional built in error detection that can be turned on with little effort.

It is generally inadvisable to use these tools when building a purified executable -- so the build process must provide for ways of choosing the 3^rdparty memory allocator or not in a given build.

Home grown memory allocators may work work well in non-threaded applications but writing an efficient allocator that works well in a threaded application requires careful design and a lot of debug time. It is best left to specialists.

Using operator new to initialize local variables is generally dangerous -- although it is certainly a ubiquitous practice. Consider using auto_ptr's or handles.

Stroustrup has written several times about the desire, when adding language features, to not force an expensive feature on programs that take no advantage of it. Generally, the language does not do this, but exception handling is an "exception" to this general rule.

The mere fact that a program is compiled in such a way that allows exceptions to be handled forces it to accept a performance degradation of between 5% and 25% depending on the compiler.

Compilers often provide a way to turn exception handling off at compile time. If exceptions are not used in the program, then it will be worthwhile to use this option. However, this might disable use of the STL, etc.

If a specific function does not use the throw keyword and does not call any other functions that do, it can be individually marked as 'not supporting exceptions' by using the following syntax:

void function() throw() { // do not throw any exceptions }

Sometimes the bulk of a program does not use exceptions but some small part does. If the part that does use exceptions is invoked from a single function call, or small number thereof, it might make sense to compile that section of the code with exception support turned on -- and never let any exceptions leave -- that is, have the highest level function in the exception handling part of the program catch all exceptions.

If you must live with exceptions and are noticing a signficant performace drop, try increasing the compiler optimization levels. Unfortunately, this may result in compiler bugs and thus some work to find the modules that can not be successfully compiled at higher optimization. Object oriented programming methodologies spend a lot of energy on determining the specific needs the high level object types -- the application objects. This is obviously good in that all requirements should be met and no unnecessary functionality should be included as it merely burdens the implementation without value.

However, the bulk of the C++ classes that get implemented are not high level application objects but rather represent temporary values and other objects not directly visible in the UML, design diagrams, etc. Attempting to design all these classes using design diagrams may provide a low return on the investment of effort.

Further, focusing only on the specific needed features of these many "helper" classes as understood at the time the project is begun can result in program bugs because the objects will be used in ways not conceived up front. The trade off between over design of simple classes and under design will often fall towards overdesign -- simply to eliminate unnecessary debug effort when these classes are used in ways not originally conceived. At a high level, however, the trade off often falls the other way. Even the low level classes need to be implemented completely. Documenting that such and such a feature does not work is not good enough -- people often think that understand code when they don't. When designing classes try to help prevent such errors using one or all of the following:

implement all features a reasonable person would expect to find.
use the private keyword to force the compiler to complain about misuses.
imbed diagnostic code (perhaps only in debug builds) that verifies the internal state of objects is correct (as best you can).

When an class represents a number or a string, it is often imagined by its users as if it were an integer with special properties. It is easier to invision extra properties than missing properties. Thus, it will save debug time and eliminate bugs if classes always have at least the following members properly implemented: class SomeClass { public: SomeClass(); SomeClass(SomeClass const &); ~SomeClass(); SomeClass &operator= (SomeClass const &); }; In the event that a class cannot be copied, the member functions that do the copying should be specifically made private and not implemented: class SomeClass { public: SomeClass(); ~SomeClass(); private: SomeClass(SomeClass const &); // Not implemented SomeClass &operator= (SomeClass const &); // Not implemented }; Failure to make these methods private will result in the compiler generating a bad version when code mistakenly copies an object for which copying was not intended.

The compiler generates bad code in the event that the class owns a pointer that should be unique to every instance of the object. The compiler won't know this, so it won't copy the objects pointed to, only the pointers -- leaving two objects pointing to the same thing. A key reason that C++ is better than C as a programming language is that it provides for data hiding. That is, instead of simply exposing data members to all callers -- some of which will not be perfect in their understanding of their responsibilities when using the class -- hide the data and provide members to correctly perform all needed activities on that class.

Unfortunately, a lot of code is written that does something like the following:

hide a data member
provide a method returning a reference or pointer to the member

For example: class BadClass { int member_; public: BadClass(); ~BadClass(); int & member() { return member_; } }; The purpose of encapsulation is to make sure that fundamental assumptions about a class' design are always maintained. This is impossible if references and pointers to members are exported and it is difficult for friend classes are used extensively.

Instead of export references to members, export public functions that do all needed things with the members -- and implement them in a way to maintain the class' design assumptions.

Of course, from time to time one might want to export a member reference for speed reasons -- but this should almost always be limited to const references. Consider:

class GoodClass { SomeType member_; public: BadClass(); ~BadClass(); SomeType const & member() { return member_; } void change_member(int parm); }; In this case, member() is an inline function which gives fast but un-modifiable access to the member. If changes need to be made quickly, make change_member() fast rather than making member() return a writeable reference. Object oriented programming techniques tend to focus on class hierarchies but this causes an inherent reduction in execution speed when polymorphism is used. See Virtual Function Call Performance above. As described there, the penalty is not always high enough to be concerned with, but when implementing container classes, I/O methods, or any methods where large numbers of objects will be processed it can be high enough to warrant consideration.

An alternative to inheritence is the use of template algorithms. Templates increase the chance of achieving inline code performance and because of template specialization can actually be more flexible -- in that basic assumptions can, for specific template parameter types, be changed. Of course, templates do not provide the 'is a' relationship that inheritence provides.

That is, a family of template classes provides the same basic functionality as a class hierarchy where the inheritence is private rather than public.

The two approaches are not mutually incompatible, of course. A template class specialization can be designed so that it is derived from a non-template class. For example:

class SomeBase { ... }; template<class T> class SomeTemplate : public SomeBase { public: .... };

In this case, any SomeTemplate 'is_a' SomeBase object. There are several reasons to design a class hierarchy using templates for the derived classes:

mainly to get the 'is_a' relationship
but also to get inline speeds rather than pay the virtual function overhead
but also rely on the default implementations of virtual methods of the base class
to enable all templates to share the same static data members of the base class

If the 'is_a' relationship between the base and derived classes is not desired, using private inheritence.

When you write templates, add in template parameter type constraints.
Port early and often!
Instantiation models
Templates make debugging harder
- So test the heck out of them
- Use them when needed to keep code simple -- not just because they are cool.
Understand compiler limitations
- The 'normal instantiation model (COMDAT)
- The AIX VA5 variant and its limitations (static objects)
Use templates to make maintenance easier not because they are cool.
Avoid const types as template parameters
Policy template parameters.
Template specialization, tool of the gods!

Because templates make understanding compile errors a bit difficult, it is advisable when writing templates to put in code to help clarify errors that are likely to occur later. Stroustrup calls this 'adding template parameter constraints'. That is, you should put code into your template class or function definitions which is likely to produce understandable compile errors when the template is misused rather than simply allowing the natural errors to occur.

Consider the following example:

template<class T> class Templ { public: Templ () { typename T::known_member *t1=0; // If the above line fails to compile it // means that T is of the wrong type. You // can only use class X, Y, and Z. } };

As you can see in this example, a single line of code that does nothing more than declare a pointer and set it to zero is used to intentionally cause a compile error if the wrong template parameter type is used. Further, the line of code about which the error occurs has only one purpose -- to cause the error. This makes it very easy to understand what the error means. On the other hand, if left it to chance, you might get an error of the form:

Error, file "x.c", line 100: can't construct a Blurb from a Plarf.

And what would that mean? If you looked up line 100 in file x.c, you would likely see some perfectly normal code with no hint to the fact that you should never have been using a Plarf reference in the first place as a template parameter. But when you get a compile error telling you that


     
       Error, "file "y.c", line 237: Plarf has no member type named 'known_member'

And a comment, such as the one in the example above, tells you that you can only use X, Y, and Z as template parameters to template Templ, you can easily understand your mistake. The following sub-topics are covered in this section:

Overloaded template functions
Passing types as parameters
Distance algorithm optimization example
Template class specialization -- and category classes

Templates are normally a mechanism implementing the same algorithms and data structures across a wide variety of data types. Specialization lets you actually have slightly different algorithms based on the types actually used. That is, you get to have it both ways.

There are two types of specialization -- overloaded functions and explicit specialization. Like normal functions, template functions can be overloaded (at least in the compilers conforming to the 1998/09/01 standard -- and most do). Unlike normal function signature overloads, however, overloaded template functions let you apply algorithms to broad classes of types. For example, you could write one template function that applies to pointers, and another that applies only to non-pointers.

Consider the following functions that let you have different algorithms for different data types using the same function name (check()):

// //Header file 'check.h' // template<class T> void check (T &t) { printf("Not a pointer\n"); } template<class T> void check (T *t) { printf("Pointer!\n"); } template<class T> void check (T const*t) { printf("Const Pointer!\n"); }

In this case, the difference in algorithms is quite trivial: we are just printing different text. However, it is possible using this approach to have completely different algorithms. Here are some example uses of the above functions:

#include <check.h> int main () { int t; int *tp = &t; int const *tcp = &t; int ta[4]; check(t); // prints 'Not a pointer' check(tp); // prints 'Pointer' check(tcp); // prints 'Const Pointer!' check(ta); // iffy -- see below } Unfortunately, you can't get too carried away with this approach for specializing. Some compilers are very picky about the placement of const and the reference operator (&) and it may be that you cannot sucessfully port your code if you get too sneaky.

For example, the HPUX aCC compiler, at least in the older versions, will not promote an array name to be treated as a pointer. That is, on older HP compilers, the check(ta); line in the above example will print 'Not a Pointer'. But on solaris, aix, g++, and Microsoft 7.0+, and the Intel C++ compiler it will print 'Pointer'.

Additionally, some older compilers, like HP will produce copious quantities of meaningless warnings about the above code. The warnings are mistaken per the language standard but it will be faster to suppress the warnings using compiler directives than to wait for HP to fix the compiler (;->)

Note also that you cannot, portably at least, declare this overload in addition to the ones above:

template<class T> void check (T const &t) { printf("Not a pointer\n"); }

If you do, you will numerous real overload resolution conflict errors particularly on aix.

A word to the wise: if you write your own templates, like the above, make sure you test your code on all platforms of interest and using all parameter types of interested (particularly pointer versus non-pointer and const versus non-const) before checking your code into your configuration management system.

At first, the title of this section would seem to make no sense -- how can you pass a type to a function -- and don't all parameters carry type information? Why would you want need to only pass the type by itself?

Well, I'm glad you asked. Because template functions can be overloaded based on the types of parameters, you can create a family of functions with the same name but providing slightly different algorithms based on the types of the parameters actually passed. See Overloaded template functions above. And, yes, you can do this without doing anything special -- it is just helpful sometimes to be able to explicitly pass a parameter whose only purpose is to help establish which algorithm to use. For example, you might want a variety of copy() algorithms, each differing only the type of iterators it deals with, albeit with profoundly different algorithms based on those iterator types. You could have a variety of algorithms with exactly the same name and almost identical parameter lists, or you could have a single copy() interface that detects the types of iterators and then invokes a different algorithm, say copy_helper() with an additional parameter whose type makes it absolutely clear how one copy_helper() differs from another.

Passing a 'type' is really nothing more than passing a 0 -- its just that the compiler knows 'what kind of 0' you are talking about. The runtime performance implication is that of passing an extra parameter to the function.

When you pass a'type' as a parameter to a function, you are really doing nothing more than forcing the compiler to pick a specific overload from the family of functions with the same name. This is done extensively in the STL and can occassionally be useful as a general purpose tool to reduce unnecessary duplication of similar text (and the associated duplicate maintenance).

A good example of this from the STL is the distance() family of functions. This family of functions has the simple task of counting the number objects between two iterators.

The general form is like this:

template<class Iterator, class Size_T> void distance (Iterator first, Iterator last, Size_T &count) { count = 0; while(first != last) { ++count; ++first; } }

Warning: some compilers implement this function a bit differently (usually AIX or the Microsoft compiler will be the divergent forms. Perhaps they will implement the function such that returns a 'size_t' instead of taking the count parameter. Make sure you test any code on all platforms of interest before committing it.

As you can see the general algorithm is required to actually step through all the locations the iterator can point to. However, this is O(N) -- a good 'Big O', but not a great one. With some kinds of iterator, this is the best you can do. However, for iterators that are actually implemented as pointers, this is horrible. You should be able to do the calculation in O(1) time. In fact, for all random access iterators you should be able to calculate the distance between two iterators in constant time.

That is, the random access iterator form should look like this:

template<class Iterator, class Size_T> void distance (Iterator first, Iterator last, Size_T &count) { count = (last - first); } Again, compilers very with respect to this features

So the question is, how do you instruct the compiler as to which algorithm to use? The STL answer is to provide a family of related functions that perform the task in different ways and used overloaded template functions to implement the alternatives.

The general scheme works like this:

There is a public interface that is documented.
The public interface serves only to detect which specific interface needs to be actually used, and routes a call to that 'private' interface -- although it is not 'private' in the C++ sense.

In the distance() example, you could implement the public interface like this:

template<class Iterator, class Size_T> void distance (Iterator first, Iterator last, Size_T &count) { distance_helper(first,last,count, iterator_category(first)); } Again, compilers very with respect to this features

In this case, there are two iterator_helper implementations: one for plan vanilla iterators and one for random access iterators. There must also be a family of overloaded functions that help you detect which of the two helper algorithms should be used. Of course, this concept of 'iterator category' goes well beyond the distance() algorithm, so the STL provides a general purpose technique -- the iterator_category() family.

This functionality is documented in other places, but the general idea is that each of the kinds of iterators, is represented as a struct with no members. Something like:


      struct input_iterator_tag   {};
     
 struct output_iterator_tag  {};
     
 struct forward_iterator_tag {};
     
 struct bidirectional_iterator_tag {};
     
 struct random_access_iterator_tag {};

There are also, a collection of functions of which return an object of one of the above iterator category tags. Pointers are treated as random access iterators. The general format of these routines is like this:

template<class Iterator> some_iterator_tag iterator_category ( Iterator const & it ) { return some_iterator_tag(); }

Given that this family of routines exists, and there are two distance_helper functions with signatures like this:

     template<class Iter, class TagType>
     void distance_helper (Iter first, 
			   Iter last, 
			   Size_T size, 
                           TagType t)
     {
       // see the generic O(N) algorithm
     }

     template<class Iter>
     void distance_helper(Iter first, 
			  Iter last, 
                          Size_T size, 
			  random_access_iterator_tag)
     {
       // see the fast O(1) algorithm
     }

then, the optimal algorithm will be chosen:

for pointers and random access iterators (like those in the vector class) the O(1) algorithm will be chosen.
all other iterators will get the O(N) algorithm.

In addition to template function signature overloading as a way to determine the 'class' of an object, or other categorization thereof, you can also use template class specialization. When you specialize a template class, you can add, delete, and change nested template types -- and typedefs. For that matter, you can override the name of a class and make it a function -- although that is a bit wierd. The stl numeric_limits is a good example of this. There is also the char_traits, etc. For an example of class specialization in a manner similar to the stl approach, see below.

Specializing a template class is done in manner quite different from function overloading. Although, it is possible to specialize template functions in a manner similar to template class specialization. Explicit template function specialization lets you override the function body for a given type. Explicit class specialization lets you override a class' body for a given template parameter. Template specialization provides an alternative to virtual methods. Virtual function calls are neither terribly slow, nor terribly fast, but as Stroustrup points out, the iostream interface was designed so that it would not be necessary to have a virtual function call on very character read or written -- so it is an issue worth taking into account.

Here's how to use template specialization instead of virtual methods:

Declare a normal template signature, but do not instantiate the bodies of class members or the body of the function if it is a global free function.
Specialize the signature for all instances of interest, implementing function bodies and data members as needed. Note that you can add members to specialized template classes.

Here is a comparison between the above method and simply using virtual functions:

advantages: you don't have to pay the penalty for virtual methods, the implementations can be wildly different -- including different data types and members, different protection for functions, etc.
disadvantages: no inheritence relationship exists between classes

Use template class specialization instead of template class derivation to cut down on namespace pollution when you need conceptually related classes to have different numbers of members:

template<class T> struct Numeric_Traits // no implementations { T max_value () const; // standard methods only T min_value () const; int size () const; }; // all instances are specializations template<> struct Numeric_Traits<char> // integral types need std methods only { char max_value () const { return 0x7f; } char min_value () const { return 0x80; } int size () const { return 1; } }; template<> struct Numeric_Traits<float> // float types need wierd extras { float max_value () const { return FLT_MAX; } float min_value () const { return FLT_MIN; } int size () const { return sizeof(float); } // now adding float epsilon () const { return FLT_EPSILON; } int mantissa_bits () const { return 24; } int exponent_bits () const { return 8; } // ... }; // ...

Use specialization to handle wierd stuff that should be associated with a class but which might be so bulky as to obscure readability of the original class:

template<class T> struct Helpers {}; class RealClass { friend class Helpers<RealClass>; public: // ... }; template<> struct Helpers<RealClass> { typedef RealClass::iterator iterator; int size () const { return sizeof(RealClass); } // ... };

Note that this mechanism helps make built in types more usable in templates because you can write code that assumes that a builtin class has a Helpers but you cannot assume that a builtin class has a nested method. For example, the following code won't compile:

template<class T> void function (T const &t) { T::method(t); } ... { int i; function(i); // wrong! won't compile } but this code will: template<class T> void function (T const &t) { Helpers<T>::method(t); } ... { int i; function(i); } because you can instantiate Helpers containing 'method(int)' easily.

A nice thing about this approach is that it lets you separate 'views' of a class. In the one way of looking at a vector, it is only an array of ints -- dependent on nothing but operator new and delete to implement.

In another view, it is an object which can be read/written to a persistant store -- and thus has dependencies on i/o concepts.

If you can't describe exactly what a piece of code is going to do, you aren't ready to start writing it.

Bjarne Stroustrup asserts in his excellent book, "The C++ Programming Language -- 3rd Edition", that

Vague and unrealistic goals are the primary cause of software project failure.

Truer words were never spoken -- but sometimes vague and unrealistic goals are all you have to work with. It would be wonderful to have all the project requirements neatly wrapped up in a box and handed to you in toto before you start. However, this almost never occurs. Even the most oppressive military style development process does not prevent signficant design changes after you have made critical design decisions.

None the less, the inevitability of coming changes does not give you an excuse to do sloppy design work. In fact, it makes it more important that designs be carefully thought out in light all the changes you are likely to face -- it is rare that changes occur out of the blue, there is usually plenty of up front warning of the broad outlines you can expect. You may, however, be forced to take a pro-active role in ascertaining the range of changes. Customers often have no idea what they really want. Make sure you find out, even if they don't.

Requirements analysis is by no means a simple process. Requirements do not simply exist and are merely waiting to be gathered. Rather, requirements are manufactured through hard work and planning. Good requirements are far more than just a list of things that the customers would like to find in the product when it is finished.

The work product of the requirements analysis phase of a product development cycle should be a top level design from the customer's perspective -- if not from the implementors. It is a base document that establishes the language by which the customers will be communicating their understanding of their needs. Unfortunately, most customers do not have a full understanding of what their needs are and will only be able to dribble them out as questions are asked.

The software vendors must be able to elicit the customer's understanding of what is needed and be able to communicate that understanding effectively to the customer as well as to the software designers. Properly done, the customer will dribble out all the requirements in this phase, rather than as the software is demonstrated for the first time.

Experience has shown that customers who are not themselves software developers will be unable to read and understand developer oriented documentation. Instead, they need to focus on

The differing types of users of the software and the roles they play -- such as:
- top managers need profit / cost reports
- line workers need to input assembly line production info
- middle managers need monthly productivity reports, production defect rates, etc.
- sysadmins need to be able to install the software, backup and restore working data, upgrade the software, etc.
The workflows each kind of user will go through and the data they need to see at each step as well as the commands that they need to be able to perform.
Performance criteria

While the customers may not immediately see the need for it, they will at least understand when asked how to provide "use cases" or actual examples of real or approximately real data that can be used to demonstrate proper operation when the developers are nearing completion.

This is a complex subject in its own right, consider reading "Conceptual Design" by Beyer and Holtzblatt. There is a tendency, as the saying goes, to have the programmers start coding while the system engineers go find out what the customers want. This is a source of many major mis-steps.

Code written in this way is like a "black hole" in astronomy. One way for a black hole to form is by accretion. That is, material falls into the star until it swells to such a large mass that the gravitational force exceeds the outward pressure of the star's thermonuclear reactions. When this occurs, the star collapses under its own weight and the density further increases the rate of gravitational collapse. The star litterally disappears out of the face of the universe never to be seen again -- directly at least.

At first, the rate of progress on a project run in the Nike way, by "just doing it", will seem high. Unfortunately, at some point the lack of planning will inevitably result in violations of the basic principles of good design and progress will slow to a halt. Even an army of programmers won't be able to speed things up. This is why that on many traditional projects, the initial group of developers all seem to be geniuses and the later groups seem less gifted.

Following good architectural principles is more important when building a sky scraper than when building a chicken coup. Small programs don't benefit that much from a lot of thought. However, unlike chicken coup's, computer programs are often extended well beyond their original design goals. If a chicken coup's requirements change and a sky scraper is needed it is likely that the coup will be discarded and the sky scraper designed from scratch. Unfortunately, there is a tendency in software to over extend extant implementations until progress becomes impossible.

A word to the wise: develop the habit of always using professional requirements analsis, architectural analysis, and coding practices -- use them even in toy programs. Practice makes perfect. Sometimes these little programs accrete features and become key to a company's business model.

In any complex multi-person activity, it is particularly important that interfaces between products created by different individuals or teams be designed very carefully. The implementations of the functionality behind these interfaces requires less (or no) coordination. However, the functions, objects, protocols, that are provided or required across teams should be thought about carefully and changes thereto be very carefully managment across the teams affected.

A key feature that many projects neglect is software layering. Rather than simply creating a large pile of code, software should be implemented in layers, like floors in a building. The lower levels strong enough to support the upper levels. Like a building, the layers should be only allowed to interact with one another at key interfaces. For example, buildings don't have stairwells placed at random locations, and pipes don't run willy nilly through the floors at odd angles. Software created by accretion, however, is likely to. These inappropriate interfaces are the source of the gradual reduction in productivity that randomly written programs will inevitably face.

In addition to layers, code should be implemented in a modular fashion. That is, a layer is not a pool of water into which salt is poured -- and evenly distributed. Rather it is like the floor of a building with rooms in it. Each room serves its own purpose and interacts with the rest of the rooms on the floor only through specific doors and windows.

The 'rooms' are analogous to software modules. A module should be highly cohesive -- meaning that it serves a single purpose and does not contain extraneous functions -- they go in a different module. For example, one does not expect to find a sink in the living room.

Program modules should also have low coupling between them. That is, in modules should not be broken out into such fine detail that there are a huge number interfaces from one module to another in the same layer.

Software architecture is a complex problem. Much is already written on that subject. Consider "Large Scale C++ Software Design" by John Lakos. Portability means that you are able to compile and execute your program correctly on more than one platform. It is easier to write portable code if you know what the target platforms are. Despite the similarities between operating systems and numerous standards available, there will be minor differences that make portability a taxing experience if you don't attend to it properly from the start of the project. The best way to have portability is to port your code early and often.

Porting your code to platforms that you do not necessarily think you need to, if they are easily available to develop on (such as Linux if you are doing PC development and vice versa), will help you in two ways:

It will give your company the option to sell your product on the platform to which you ported. You can't sell a product unless you have it running on the customer's platform. They typically won't wait for you to port your code before making a purchasing decison.
The act of porting will greatly help you clean up your code. Different compiler vendors will make different decisions about how to detect and report errors. Having many electronic eyes looking at your code will greatly help catch mistakes that only a single compiler won't.

A common trick to aid in porting is to require that the first header file included in all compilations be a 'porting' header file. That is, you add a header file whose only purpose is to define flags and or work around problems on specific machines.

This approach also allows you to force standard symbols into every compilation -- but you have to be careful not to include the entire C++ include file set in every compilation! Decide at a product level if you are using C++ exceptions or not.

Decide how you are going to handle floating point exceptions, etc. For example: divide by zero. Nan's, infinities, etc. Individual programmers should not be making this strategy decision because everyone will come to a different understanding of how to do this and it will be harder to work on someone else's code.

Know your tools

This chapter provides advice concerning compilers and related tools: linkers and librarians. The following topics are covered:

Running the compiler, linker, and librarian
Optimization levels
Types of builds (compilations)
Understanding compile errors
Dynamic linking
Rom-able code

The 'compiler' command line command or IDE functionality usually allows for automatic invocation of the linker and sometimes the librarian. That is, the compiler not only compiles .C source files into object modules but also lets you automatically link said objects into programs or archive the libraries. That is, the command you actually invoke is just a wrapper around a script that defines the compilation workflow:

compile .C to objects
merge objects into libraries
link objects and libraries into executables.

Usually, compilers also provide a way to convert .C source files into preprocessed output source files. These are useful when trying to debugging compile errors if the compiler decides to be annoying and fails to give you exact line numbers describing the your mistakes. See Understanding compile errors below.

Generally, with C++ programs, you should use the compiler to link rather than trying to figure out how to run the linker yourself. Often the compiler vendor doesn't want to document all the steps truly needed for linking and trying to figure this out for yourself only leads time, trouble, and non-portability.

Further, if your compiler provides a mechanism to build libraries, you should probably use that instead of manually running the librarian. Older compilers using non-standard template instantiation logic will provide a mechanism to 'close' the library that is not easy to do any other way. Library 'closure' refers to its completeness. Earlier compilers generate out of line template functions as separate step than normal compilation. This means that merely compiling an object module does not, in these older compilers, actually build all the object code needed. So, if you only take the generated objects and archive them into a library, you cannot successfully link with that library -- you'll get missing symbols. To get 'closed' libraries, that is libraries with all the needed symbols, you need to use the compiler to make the library. Again, this applies to older compilers. Newer compilers use compile time instantiation of all needed object code -- so the problem does not usually occur. Compilers often provide a variety of optimization levels. Optimization can refer to the efficiency of the generated object code but it can include other things as well. The question is, "optimized for what?".

Compilers typically provide for several levels of optimization. These levels exist primarily because the compiler vendors are only human and sometimes make mistakes writing the compiler for high levels of optimization. Usually, the lower optimization levels are likelier to give you properly working code, but not always. Since the lower optimization levels are less desireable to the users, they get less testing than the higher optimization levels. But the lesson here is that if code gets generated badly at one optimization level, it might work at another. When this occurs, an small example program that demonstrates the problem should be sent the compiler so that the compiler bug can be fixed (assuming your company is important enough to the compiler vendor ;-).

Optimization levels typically work like this:

debug: This optimization level generates object modules which are typically simple stack machines. That is, the actions your code specifically says to perform get performed exactly that way. Additionally, debug symbol table information is included. Some compilers provide the ability to include debug symbol table information with other compiler optimization levels but this is not a given and further it is very confusing to debug an optimized program. Importantly, in C++, the inline functions you define are generated out of line in the debug optimization level.
O1: Level 1 optimization does not typically include debug symbol table information, and adds register optimizations and perhaps loop unrolling. Code generated in level 1 optimization is faster than debug levels but often is still somewhat inefficient -- for example the inline functions might still be instantiated out of line.
O2: Level 2 optimization usually guarantees inline functions to be truly inline (unless too complicated) and adds more sophisticated optimizations. This is the highest level of optimization recommeded for general programming activities.
O3: Level 3 and Higher should be used at your own risk and generally only on small program subsets. Compiler bugs may make these levels unusuable -- further the generated code may grow greatly in size and may or may not be faster than level 2. Basically, try and see. If wierd things start happening, drop back to level 2 first before trying much debugging.

Sadly, you may well be forced to turn off compiler optimization, or change the level thereof in one file or another. Your build process should account for this. The Microsoft compiler allows you to insert #pragma statements in your code to set the compiler options. Others do not. For these compilers, you will have to ensure that the build rules supply the needed optimization selection options.

Sadly, even the Microsoft compiler occasionally has bugs that prevent proper operation of the pragmas to select the compiler options in pragma directives.Before using pragmas to select compiler optimization levels, first make sure that using command line options to change them does fix your problem. Then, if the change helps, try using pragmas. In addition to debugging information provided by the compiler, it is often desireable to add your own additional code to perform error detection as the program runs. In some cases, that code has low cost and should be left in the final product builds. However, in some cases, the diagnostic code may be too expensive for this.

A common approach to writing the code is to have more than one kind of compilation (ie "build"). Typically, a developer must select the kind of build desired by editing a file and adding or changing a #define'd symbol to make the selection. Common build types are:

production build: An highly optimized build meant to be sent to end customers and containing neither debug information nor diagnostic code.
diagnostic build: An highly optimized build containing diagnostic code but no debug symbols. This is meant mainly as a debugging aide when bugs show up in production builds but not in debug builds.
debug build: A diagnostic build that also has the debug symbol table information.

The build process is envolved in selecting these types of build: you have to set the optimization level to debug, O1, or O2 as described above. However, there must also be some #define'd symbol set, perhaps named "OurCompany_Diagnost_Build", or some such name. When this symbol is defined, your code should include additional error checks as needed to exhaustively check for the correctness of its own internal state. Without the symbol, these exhaustive checks should be left out. Here is an example class definition that uses this technique:

class MyClass { public: MyClass () { ... } ~MyClass () { diag(); ... } MyClass ( MyClass const & other ) { other.diag(); ... } void member () { diag(); ... } private: #ifdef MyCompany_Diagnostic_Build void diag () { // check state of *this } #else void diag () { // do nothing in normal builds } #endif }; All professional C++ programmers will learn to both love and hate compilers -- which are only programs -- written by people just like you. All compilers have their own bugs and annoyances. Sometimes these issues make it impossible for you to write the code the way that seems most natural to you. However, with few exceptions, the C++ language provides a variety of ways in which to say and do the same things -- thus it behooves you to be flexible and creative -- and also patient.

In addition to the obvious kinds of errors most people would expect to get -- such as syntax errors, using undefined type names, etc -- compilers often provide some helpful warnings and errors that may or may not require work on your part to enable. Turn them on!. As Scott Myers says in his book, "Efficient C++":

"Prefer compile time and link time errors to run time errors".

Finding and fixing your errors at compile time is likely to be 10 times as easy as fixing a bug that accidentally got to your customer's web site -- and is a lot less annoying (although it might not seem like it at the time). Fixing all warnings will give you safer code and once the act of fixing the problems has trained you not to create them in the first place, your code quality will be greatly improved.

If possible, use a compiler option (if avaiable) to convert warnings to errors -- which will require that all warnings be fixed in order to compile successfully.

Many compilers allow you to turn on 'portability' errors and warnings. The exact meaning of the 'portability problems' will vary by vendor. Still, it is advisable to try turning them on. If you get an error that you just can't fix -- probably because the error was incorrectly detected by the compiler, you can always turn off this flag.

Caveat: Sadly, nothing is ever perfect. Some compilers do mistakenly detect non-problems and call them problems. Hopefully, in this case, the compiler will allow you to turn off the erroneous detection of the specific error -- so that you will not be forced to turn off the extra warning detection in general.

When faced with compiler bugs, it might take you considerable time to determine a single syntax that works on all platforms to which you are trying to port. Luckily, since February 2001, this kind of thing has greatly abated but it has not gone away completely. This is generally time well spent however. Using #ifdef's to work around compiler differences can greatly complicate your code. It is better to find one way than to try and support many. One of the most annoying problems developers face is to determine what the compiler is trying to tell you. This is particularly true of of compile errors in templates -- or even worse if you use a lot of macros. If you are lucky, your compile will give you good errors of the form:

Error 122, "something went wrong" in file "name.c", line 247
where instantiated from file "other.c", line 1000
where instantiated from file "different.c", line 12345
where instantiated from non-template code in file "main.c", line 100

You will get a message of this form if you are lucky. All compilers have limitations about producing these error tracebacks. Sometimes the traceback just gets cut off.

In these cases, you have figure the problem out for yourself. Here is one approach. This approach is analogous to Newton's method of root location:

Instead of compiling your .c file directly, use the compiler to pre-compile it into a .i file first and start working with that. The technique for generating a .i file varies from compiler to compiler. Sometimes, you have to run a separate program, cpp. In other cases, you add the -P or -E option to the compiler command line you normally use.
Some compilers will let you compile the .i file directly, others require you to rename it to "something.c".
Once you have a macro-expanded (.i) file that you can compile, you will have a huge file to edit. (Your editor will need to be robust to do this). Remove bottom 1/2 the file and recompile. If the error goes away, you know that the problem was caused by the bottom half. Put the the bottom half back and remove the bottom quarter. If the problem goes away, put that quarter back, then remove the bottom 1/8'th. Repeat this process recursively until you get down the line that causes the bug.
Once you have the line of code that caused the problem, check for the mis-use of const objects or the use of types that the template does not accept as valid parameters.

To reduce the difficulty in understanding errors envolving templates which you write, see Early Error Detection. Dynamic linking is the mechanism that lets us build our programs in pieces which can be shipped to the customer separately. The pieces can even be built in different languages and be made from different product builds.

Dynamic linking means that a program fragment is loaded at runtime rather than being bound into the executable at link time. A program fragment is actually a program unto itself -- it is just a simplistic program whose only purpose is to provide access to functions to other programs.

On Windows, a dynamically linked library is named Something.DLL. On unix, it might be named 'Something.SO' or 'Something.SL'. However the concept is pretty much the same.

The C++ habit of mangling function names make the use of C++ in dynamic libraries a bit tedius. That is, the name of the function actually stored in the library will be long and ugly. Instead of MyClass::member (int) you might see something roughtly like:

_Fx7MyClass6Member_fi

To make matters worse, the exact name varies from compiler version to version! So that code compiled with a new compiler version may not link with code compiled with an older version of the same compiler.

The solution to this problem is relatively simple, however. Instead of calling the functions in your dynamic library directly, declare them to be virtual members of a some class. Then, declare inside the dynamic library a C, not C++, function that returns a pointer to a member of this class. Once returned, the virtual calls should work as expected -- although even this could malfunction of the compiler changes the way virtual tables are layed out. With a given compiler, however, this is unlikely. Here is an example:

// // Header file defining funtions // in the library // class DynamicFuncs { public: virtual void method1 (); }; extern "C" { // // Call get_ptr() to use functions // in this library. For example, // use get_ptr()->method1(); // DynamicFuncs* get_ptr (); };

When you wrap a function declaration or definition in an extern "C" wrapper, the compiler does not mangle the name according to C++ rules, but rather according to C rules. The function calling convention (ie, how the parameters are passed) is likely to be different in C than it is in C++. This has little practical importance, but you must remember to get the declarations all identical so that you do not get link errors. Template instatiation is the act of creating a real function or class from the 'template' provided to the compiler. Newer language specifications call this 'template specialization'.

Template bodies can be specialized in two fundamental ways:

It can be done manually.
The compiler can use the template body to perform the specialization. If there is no template body available, then the compiler cannot make the template instance.

The STL and presumably any templates you write will make the template bodies available to the compiler (in header files) so that it can make instances as it needs them -- but there are occasional cases wherein you will need to manually force template instantiation. Manual specialization occurs in either of the following ways:

Definining new template function bodies for a extant template but which are only applicable to a specific set of template parameters.
The use of the 'template' directive to force the compiler to 'make it now' from the extant template bodies.

One use of manual instantiation is to close a library. That is, if you have written a library and that library uses templates, need to make sure that all template bodies needed by the library are in fact in it. Since libraries don't require linking to create them, there is always some chance you'll forget some bodies. Thus, you can force the compiler to instantiate all template bodies for all template signatures you need. When you use the template keyword to instantiate a class, all member functions and variables get instantiated.

This however, can lead to problems -- sometimes templates are written to take any kind of parameter -- and some functions in the template will not work with some kinds of template parameters. Then, forcing the template to instantiate in its entirely will result in compile errors that you really can not deal with -- nor do you need to. The work around for this is painful, however: Manually instantiate the specific members you do need -- and there might be a lot of them.

A work around for having to do lots of manual instantiations is to write a function, which is never called, but which when compiled forces the instantiation of all needed members -- and none you don't need.

In the past, there was a great deal of concern for the time that the compiler would take when instantiating template function bodies. This led to the view that the bodies for templates should be separated from their declarations. Compiler writers went to great lengths to create repositories of template bodies which had already been created so that each compilation would not have to duplicate this work. On single processor machines with a very fast operating system, this seemed to make sense. However, on multiprocessor machines, this approach actually makes things worse. The act of detecting whether or not a template body had in fact been instantiated by some other compilation stream was shown to be actually larger than simply compiling the template body. Therefore modern compilers employ the 'compile time instantiation' strategy wherein the source for the template bodies is placed in header files (or is otherwise available at compile time).

Thus, every compilation you invoke will, normally, instantiate in every object module, all templates needed by that object modules (assuming the bodies are available of course). The linker is then required to ignore duplicate template bodies. This doesn't sound faster, but all compiler vendors have moved to the compile time instantiation approach -- so it must be. As strange as it may seem, it is occasionally necessary to compile your C++ code into assembly language source files rather than into object modules. All compiler provide some sort of command line option to allow this. Usually, it is -s.

So why would you need to do this? To find out what the compiler is actually doing for one thing. If you have a lot of overloaded functions with similar signatures, it may become very difficult to determine which of the alternatives was actually selected. Of course, if you have the source to all these functions, you need only debug the program and single step into the called function. But what if the function is is part of the STL or a 3^rd party run time library -- and you have no source?

The solution here is to:

Modify the compile options to generate assembly source rather than an object module.
Generate the assembly source.
Use c++filt to demangle the mangled names you will see in the assembly source.
Look at the generated code to see which function it actually called.

Another reason to generate assembly language source files is to diagnose problems with the compiler's name mangling logic. Yes, as mentioned before, compilers are written by human beings and occaisionally have bugs. By producing the assembly source, you see what functions are actually implemented by the compiler (with mangled names) and you can also look at the raw function calls (with mangled names) to make sure the compiler did in fact generate a call to the function that got defined.

Don't be fooled by the output of c++filt. Sometimes it has bugs too! If you get a link error but you are feel certain that the symbols are in fact defined, you must compare the object module symbol tables directly.

Hopefully, you won't have to do this very often. Whenever the name mangling goes wrong, it is often only wrong by a single character in very long name filled with seemingly random characters...

When developing code that is meant to execute from a ROM (read only memory) one has to accept some signficant restrictions. Unfortunately, there is no general list of restrictions that is universally applicable. Restrictions on one platform may or may not be there on another.

Program variables cannot be stored in ROM, although simple numeric and "C style" string constants can. Class objects which require construction typically cannot because the 'static construction' logic makes no sense for symbols stored in ROM.

While it is not possible to write to ROM as the program runs, there must be some section of memory wherein the program stack resides -- usually a very limited amount. There may or may not be any heap space in such a situation -- and there may or may not be any 'global data' space wherein global and static variables might be found. If heap and global data space is available, it might be assessible only through pointers rather than through the normal linking and operator new mechanisms. Luckily, C++ provides a builtin mechanism for dealing with special purpose memories -- the stl::allocator concept.

Understand your tools before writing the code!

Code not originally written to run in ROM will likely either not run or will require rework to make it run from ROM.

For better or worse, this web page was hand constructed using html, java script, cascading style sheets, and a lot of typing.