Reduce Compilation Dependencies in Large Scale C++ Projects: Factory Pattern

1. Introduction

Most large-scale projects start small and gradually evolve into larger ones.
Issues that are barely noticeable in a small project can become serious as the
project grows, and projects that started small often do not handle them
properly once they are large. One such problem in large-scale C++ projects is
physical dependencies, also known as compilation dependencies. If not managed
properly, compilation dependencies can increase the compilation time of a
project unnecessarily.

Design patterns [2] are usually used to discuss the logical design of a
project, but they also help to manage its physical design. Although the
protocol hierarchy [1] was the first design pattern to address compilation
dependencies, several techniques and idioms [3] that deal with this issue
already existed. The PImpl idiom [4], also known as pointer to implementation,
is one of them; it can be regarded as a variant of the Handle/Body idiom [3].
Changes are inevitable in large projects. Here we introduce some techniques
that help to minimize compilation time during development.

2. Separate Compilation

It is common practice among C++ programmers to split code into multiple
implementation files (usually with extensions such as .c, .cxx, or .cpp) and
definition files, commonly called header files (usually with extensions such
as .h, .hxx, or .hpp). It is the responsibility of the preprocessor to make
the contents of all the required definition files available in the
implementation file before compilation.

We do this to reduce compilation time during development and to reuse code
written in different files. For example, if a project consists of a single
file with 10,000 lines of code, then changing any single line, during
development or afterwards, forces the compiler to recompile all 10,000 lines.
On today’s computers this might not be a big problem, but it eventually
becomes a nightmare as projects grow larger and larger. If, on the other hand,
we split the project into several files, such as 10 files of roughly 1,000
lines each, then a change in one file ideally should not affect the other
files. It is also very common for large-scale projects to have
general-purpose classes that are useful in other projects too, and the natural
way to reuse those classes is to keep them in separate files.

However, if we don’t develop the program carefully, it is sometimes
impossible to just include a class’s two files (its definition file and its
implementation file) in another project and use the class. One of the most
common problems is having to include other definition files that we do not
actually need; those files may in turn need further files, so in the end we
may have to include a whole bunch of files just to use one single class.

From the compiler’s perspective, an implementation file with all its
preprocessor directives expanded is called a translation unit. In other words,
a translation unit is an implementation file with all its definition files
included and all macros expanded. If we change anything in a definition file,
then every file that includes this definition file needs to be recompiled,
whether it is a definition file or an implementation file.

If one definition file is included in another definition file, then a change
in the first definition file affects all the files that include either the
first file or the second file. The situation becomes even worse when a
definition file includes another definition file, which includes yet another
definition file, and so on. A change in one file is then no longer limited to
recompiling that file; it may involve recompiling the whole project.
Figure 1 illustrates this concept.

Figure 1: Physical

It doesn’t matter that our Camera class does not include Point.H or
ViewPort.H directly; both end up in the Camera translation unit. A change in
the Point header file forces recompilation of not only the Camera translation
unit, but of all translation units in this example.
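
One common way to shorten such a chain is to let a header name a class without
including its definition file. The following is a minimal single-file sketch of
that idea, not the code behind Figure 1: the members of Point and Camera, and
the function DistanceToOrigin, are assumptions made for illustration, and the
comments mark which (hypothetical) file each part would live in.

```cpp
#include <cmath>

// ---- Camera.h (hypothetical contents) ----
// Camera only stores a pointer to Point, so the compiler needs the name of
// the class, not its layout. No #include "Point.h" is required here.
class Point;  // forward declaration

class Camera
{
public:
    explicit Camera(const Point* position) : m_position(position) {}
    double DistanceToOrigin() const;  // body lives in Camera.cpp
private:
    const Point* m_position;
};

// ---- Point.h (hypothetical contents) ----
class Point
{
public:
    Point(double x, double y) : m_x(x), m_y(y) {}
    double X() const { return m_x; }
    double Y() const { return m_y; }
private:
    double m_x;
    double m_y;
};

// ---- Camera.cpp (hypothetical contents) ----
// Only this translation unit would #include "Point.h", because only here
// are the members of Point actually used.
double Camera::DistanceToOrigin() const
{
    return std::sqrt(m_position->X() * m_position->X() +
                     m_position->Y() * m_position->Y());
}
```

With this layout, a file that includes only Camera.h would not need to be
recompiled when Point.h changes; only Camera.cpp and the files that use Point
directly would.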

3. Applying Patterns to Minimize Compilation Dependencies

The above dependencies can be minimized with the help of forward
declarations [4]. However, sometimes it is impossible to use a class with only
a forward declaration. Let’s look at an example to better understand this. It
is not unusual for a program to communicate with several databases, such as
Oracle, Sybase, and SQL Server, at the same time, and to switch databases at
run time. To gain the maximum speed benefit, we can use the native APIs of
these databases. To give the client a uniform, polymorphic interface, we
create an abstract base class called Database, which declares all the required
operations as pure virtual functions, and derive all the database-specific
classes from it. We can also keep these classes in separate components, if
necessary. To keep things simple, here is our Database class.

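#include <string>

// __declspec(dllexport) is a Microsoft-specific extension that exports the
// class from a DLL.
class __declspec(dllexport) Database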
{
public:
        Database(void);
        virtual ~Database(void);
        virtual bool OpenConnection(std::string connectionString) = 0;
        virtual void CloseConnection(void) = 0;
        virtual void ExecuteCommand(std::string command) = 0;
};

We derive three classes from it, for the Oracle, SQL Server, and Sybase
implementations. Here is the code of the Oracle class; the others are very
similar.

class __declspec(dllexport) Oracle :
        public Database
{
public:
        Oracle(void);
        ~Oracle(void);
        bool OpenConnection(std::string connectionString);
        void CloseConnection(void);
        void ExecuteCommand(std::string command);
};

In the implementation of these methods, each function simply prints its own
name when it is called. Here is our implementation.

#include <iostream>

bool Oracle::OpenConnection(std::string connectionString)
{
        std::cout << "Oracle::OpenConnection" << std::endl;
        return false;
}


void Oracle::CloseConnection(void)
{
        std::cout << "Oracle::CloseConnection" << std::endl;
}


void Oracle::ExecuteCommand(std::string command)
{
        std::cout << "Oracle::ExecuteCommand" << std::endl;
}

This is a class diagram of our classes.

Figure 2: No Factory Pattern

In this design we have to include the definition file of a child class in the
client program, because without it we cannot create an object of that
class [5]. If the client does not know in advance which database it will
communicate with, or wants to leave that choice to the user, then it has to
include the definition files of all the child classes. Here is simple client
code to demonstrate this.

Database* pDataBase = NULL;

switch (choice)
{
case 1:
        pDataBase = new Oracle();
        break;

case 2:
        pDataBase = new SQLServer();
        break;

case 3:
        pDataBase = new Sybase();
        break;
}

if (pDataBase != NULL)
{
        pDataBase->OpenConnection("This is connection string");
        pDataBase->ExecuteCommand("This is command");
        pDataBase->CloseConnection();

        delete pDataBase;
}

In addition, if we want to add support for one more database, then we need to
derive its class from Database and also include its definition file in the
client, which results in a lot of recompilation.

We can reduce the dependencies between these classes and their clients by
introducing a level of indirection. We introduce a Factory Method [2] that
creates the objects of the child classes on behalf of the client. The client
now communicates only with the factory method to create instances of the
required class. We create a DatabaseFactory class with one static method,
CreateObject. It is the responsibility of this method to create an object of
the appropriate class and return its address. Here is the code of our factory
method (the CreateObject method of the DatabaseFactory class).

Database* DatabaseFactory::CreateObject(int databaseType)
{
        if (databaseType == 1)
               return new Oracle();
        else if (databaseType == 2)
               return new SQLServer();
        else if (databaseType == 3)
               return new Sybase();
        else
               return NULL;
}

Here is the class diagram of this design.

Figure 3: Factory Pattern

The client of the database classes now creates an instance of the appropriate
database with the CreateObject method of the DatabaseFactory class, depending
on the information passed as a parameter. The advantage of this technique is
that the client now needs the definition files of only two classes, namely
DatabaseFactory and Database. Here is the client code using the factory
method.

Database* pDataBase = NULL;

pDataBase = DatabaseFactory::CreateObject(choice);

if (pDataBase != NULL)
{
        pDataBase->OpenConnection("This is connection string");
        pDataBase->ExecuteCommand("This is command");
        pDataBase->CloseConnection();

        delete pDataBase;
}

In the future, if we want to add support for one more database, such as DB2
or MySQL, we don’t need to include its definition file on the client side.
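
The declaration of DatabaseFactory is not shown in this article, so the
following single-file sketch assumes one plausible shape for it and marks in
comments which file each part would belong to. The DB2 class and its type
number 4 are hypothetical, SQLServer and Sybase are left out for brevity, and
__declspec(dllexport) is omitted so that the sketch compiles on any compiler.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// ---- Database.h: the abstract interface seen by the client ----
class Database
{
public:
    virtual ~Database() {}
    virtual bool OpenConnection(std::string connectionString) = 0;
    virtual void CloseConnection() = 0;
    virtual void ExecuteCommand(std::string command) = 0;
};

// ---- DatabaseFactory.h (assumed shape) ----
// In its own file this header would need only "class Database;", because
// it mentions Database solely as a pointer return type.
class DatabaseFactory
{
public:
    static Database* CreateObject(int databaseType);
};

// ---- Oracle.h and DB2.h: included only by DatabaseFactory.cpp ----
class Oracle : public Database
{
public:
    bool OpenConnection(std::string)
    {
        std::cout << "Oracle::OpenConnection" << std::endl;
        return false;
    }
    void CloseConnection() { std::cout << "Oracle::CloseConnection" << std::endl; }
    void ExecuteCommand(std::string) { std::cout << "Oracle::ExecuteCommand" << std::endl; }
};

class DB2 : public Database  // hypothetical newly added database
{
public:
    bool OpenConnection(std::string)
    {
        std::cout << "DB2::OpenConnection" << std::endl;
        return false;
    }
    void CloseConnection() { std::cout << "DB2::CloseConnection" << std::endl; }
    void ExecuteCommand(std::string) { std::cout << "DB2::ExecuteCommand" << std::endl; }
};

// ---- DatabaseFactory.cpp: the only file edited when DB2 is added ----
Database* DatabaseFactory::CreateObject(int databaseType)
{
    switch (databaseType)
    {
    case 1:  return new Oracle();
    case 4:  return new DB2();  // assumed type number for the new database
    default: return NULL;
    }
}
```

In this layout the client code keeps including only Database.h and
DatabaseFactory.h, and neither of those files changes when DB2 is added.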

When support for a new database is added, the only thing we need to change is
the implementation of the CreateObject function in the DatabaseFactory class.
As long as this function is not inline, the change does not affect the clients
of the database classes, which reduces recompilation. It is generally better
practice to write the function body in the implementation file rather than in
the definition file, to reduce physical dependencies [6]; if performance is a
concern, the function can be declared inline explicitly. When the body is in
the implementation file, a change in the implementation of the function makes
the compiler recompile only that translation unit. When the body is in the
definition file, on the other hand, a change in the implementation means
recompiling all the translation units that include this definition file.
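
The difference between the two placements can be sketched with a small helper.
The functions DatabaseName and DatabaseNameInline and the file names in the
comments are hypothetical and are used only to show where each body would
live; the type numbers follow the CreateObject listing above.

```cpp
#include <string>

// ---- DatabaseNames.h (hypothetical definition file) ----

// Variant A: the body sits in the definition file. Every translation unit
// that includes this file sees the body, so editing it recompiles them all.
inline std::string DatabaseNameInline(int databaseType)
{
    switch (databaseType)
    {
    case 1:  return "Oracle";
    case 2:  return "SQLServer";
    case 3:  return "Sybase";
    default: return "Unknown";
    }
}

// Variant B: only the declaration sits in the definition file. Clients
// depend on the signature alone.
std::string DatabaseName(int databaseType);

// ---- DatabaseNames.cpp (hypothetical implementation file) ----
// Editing this body recompiles only this translation unit; the clients
// just need to be relinked.
std::string DatabaseName(int databaseType)
{
    switch (databaseType)
    {
    case 1:  return "Oracle";
    case 2:  return "SQLServer";
    case 3:  return "Sybase";
    default: return "Unknown";
    }
}
```

Both variants behave the same at run time; they differ only in which files
must be recompiled after the body changes.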

4. Conclusion

Most compile-time dependencies can be removed with the proper use of design
patterns. Design patterns are not only useful for improving the logical design
of a project; they can also improve its physical design and so minimize its
compilation time. There is a rule in “The Elements of Style”: “Omit needless
words” [7]. We can apply a similar rule here: “Omit needless headers”.

The techniques discussed here reduce the compile-time dependencies of a
project. This work can be extended to minimize link-time dependencies too.

5. References

  1. Large-Scale C++ Software Design
    John Lakos
  2. Design Patterns: Elements of Reusable Object-Oriented Software
    Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides
  3. Advanced C++ Programming Styles and Idioms
    James O. Coplien
  4. Exceptional C++
    Herb Sutter
  5. The C++ Programming Language, 3rd edition
    Bjarne Stroustrup
  6. Manage Physical Dependencies of a Project to Reduce Compilation
    Zeeshan Amjad
    http://www.codeproject.com/KB/cpp/ZeeshanPhysical.aspx
    http://www.codeguru.com/Cpp/Cpp/cpp_mfc/files/article.php/c6859/
  7. The Elements of Style
    William Strunk Jr., E.B. White, Roger Angell
