Allocators (STL)

Environment:

Compilers/IDE: VC++ 6.0 SP5, Dev-C++ 5 using gcc version 3.2 (mingw special 20020817-1), KDevelop 2.0 using gcc 2.96 20000731
STL Implementations: Dinkumware for VC++ 6.0, STLport 4-5-3, GNU ISO C++ Library
Operating systems: Windows 2000 SP2, Red Hat Linux 7.2 (kernel 2.4.7-10)

Introduction
What Is an Allocator?
Why Do We Need an Allocator?
What Is the Default Allocator?
Why Would I Want to Write My Own Allocator?
First Steps to Implement Your Own Allocator
Memory Management—A Memory Pool
Memory Management—Simple Segregated Storage
Implementing the Allocators
Making the Code Compile and Run
Conclusion

Introduction

In the previous article, A Beginner’s Tutorial For std::vector, Part 1, it was promised that the next part of the tutorial will comment on the second template parameter of the 'std::vector', that is, the allocator. However, as we started digging into this topic, we realized that it is worth a stand-alone article. One of the reasons is that, after putting together a neat custom allocator and successfully testing it with 'std::vector', we thought we give it a shot with 'std::list', too. Boom! It did not work. So, besides talking about allocators as a concept, we will also take a look at some pitfalls you may encounter if you decide to implement one.

However, this concept is pretty complex and thus—although we are trying to explain things in a simple way—we have to expect a little bit of knowledge about the basic principles of programming from you. Furthermore, by no means is this supposed to be a complete tutorial to standard allocators or to memory management in general.

By the way, as you can see, the Environment section at the beginning of this article is somewhat special. That is because most of the code we are going to look at is standard C++. We wanted to test it with various compilers and STL implementations, both under Windows and Linux. The reason for doing this is the fact that, at the time speaking, you will hardly find a compiler that fulfills the ISO C++ Standard to 100%. Moreover, the compilers people use for their daily work often are not the newest ones. They tend to lack a couple of advanced facilities, like partial template specialization, template member functions, and so on. As we will see, this has a major impact on the way you write standard C++ code. Okay…enough with the smalltalk…let’s get started.

What Is an Allocator?

If you take a look at the word allocator, you might already know that it will allocate—well, something. If you are already a programmer—which we assume at this point—you also might have noticed that it sounds like the old ANSI C memory allocations routines such as 'malloc()' or 'calloc()'; thus, you already can imagine that it most likely has something to do with the management of the resource memory.

As you (hopefully) have learned in the first article, a 'std::vector' is a replacement for an old ANSI C array. The following code should look familiar to many developers:

  int *Array = (int *) malloc(100 * sizeof(int));

It will simply allocate memory for storing 100 integers. 'malloc()' will allocate the requested amount of memory from the heap. Thus, after you are done with it, you have to explicitly release the memory by a call to 'free()':

  if(Array)
    free(Array);

In C++, this error-prone approach is no longer necessary using the container 'std::vector' from the STL:

  #include <vector>

  std::vector<int> Array(100);

As you can see, there is no explicit request for memory allocation involved. All the necessary memory allocation will be done by the 'std::vector' itself…implicitly. And yes, as you already might have guessed, the whole memory management is done through the so-called allocator.

Why Do We Need an Allocator?

If you think about how you usually allocate memory dynamically (using the 'new' operator), you could ask why the STL provides such a thing called allocator that does all the memory management of the container classes. The concept of allocators was originally introduced to provide an abstraction for different memory models to handle the problem of having different pointer types on certain 16-bit operating systems (such as near, far, and so forth). However, this approach failed. Nowadays, allocators serve as an abstraction to translate the need to use memory into a raw call for memory.

Thus, allocators simply separate the implementation of containers, which need to allocate memory dynamically, from the details of the underlying physical memory management. Thus, you can simply apply different memory models such as shared memory, garbage collections, and so forth to your containers without any hassle because allocators provide a common interface.

To completely understand why allocators are an abstraction, you have to think about how they are integrated into the container classes. If you take a look at the constructor of 'std::vector':

  vector<T, Alloc>

you will notice that two template parameters exist. ‘T’ represents the vector’s value type—in other words, the type of object that is stored in the vector. ‘Alloc’ represents the vector’s allocator—in other words, the method for the internal memory management.

The internal implementation of the allocator is completely irrelevant to the vector itself. It is simply relying on the standardized public interface every allocator has to provide. The vector does not need to care any longer whether it would need to call 'malloc', 'new', and so on to allocate some memory; it simply calls a standardized function of the allocator object named 'allocate()' that will simply return a pointer to the newly allocated memory. Whether this function internally uses 'malloc', 'new', or something else, is not of any interest to the vector.

What Is the Default Allocator?

After reading the background and purpose of allocators, you might wonder whether you need to provide your own allocator every time you want to use a container from the STL. You can breathe a sigh of relief…you do not have to. The standard provides an allocator that internally uses the global operators 'new' and 'delete'. It is defined within the header file <memory> and is used as the default one everywhere an allocator is needed.

The public interface is described by the ISO C++ standard, section 20.4.1:

  namespace std {
    template <class T> class allocator;

    // specialize for void:
    template <> class allocator<void> {
    public:
      typedef void*       pointer;
      typedef const void* const_pointer;
      // reference to void members are impossible.
      typedef void value_type;
      template <class U> struct rebind { typedef allocator<U>
                                         other; };
    };

    template <class T> class allocator {
    public:
      typedef size_t    size_type;
      typedef ptrdiff_t difference_type;
      typedef T*        pointer;
      typedef const T*  const_pointer;
      typedef T&        reference;
      typedef const T&  const_reference;
      typedef T         value_type;
      template <class U> struct rebind { typedef allocator<U>
                                         other; };

      allocator() throw();
      allocator(const allocator&) throw();
      template <class U> allocator(const allocator<U>&) throw();
      ~allocator() throw();

      pointer address(reference x) const;
      const_pointer address(const_reference x) const;

      pointer allocate(size_type,
                       allocator<void>::const_pointer hint = 0);
      void deallocate(pointer p, size_type n);
      size_type max_size() const throw();

      void construct(pointer p, const T& val);
      void destroy(pointer p);
    };
  }

In addition to that, the following two global functions belong to it as well:

  template <class T1, class T2>
  bool operator==(const allocator<T1>&, const allocator<T2>&)
                  throw();

  template <class T1, class T2>
  bool operator!=(const allocator<T1>&, const allocator<T2>&)
                  throw();

It does not even look that scary, does it? The public interface consists of five parts:

A couple of type definitions. These ensure that the allocators’ client (for instance, 'std::vector') is able to use some relevant types by known names. For example, consider that you write an allocator, that is able to allocate memory in a far area, that cannot be reached by normal pointers (let your imagination wander). Now, the 'allocator' will use some pointer-like construct. The allocators’ client has, of course, no idea of such a thing. When a client needs to pass such a pointer it will use the
```
typedef T* pointer;
```
and if it needs to suptract such pointers, the result will have the type 'difference_type', whatever that internally means for the allocator.
A peculiar looking template member structure:
```
template <class U> struct rebind { typedef allocator
 other; };
```
This is a very elegant solution to a requirement any allocator has to fulfill: to be able to allocate objects of different types than its template parameter. You might wonder why one cannot simply write 'allocator' instead of doing this 'rebind' thingie. Because, as stated before, the allocators’ client has no idea what type the allocator actually has; thus, the need to ask the allocator itself to instantiate its own template with '' and give it back to the client.
The default constructor, two copy constructors, and the destructor. There are two issues here:
- The one copy constructor is a template.
- There is no 'operator=()'.
As you know, when you need to write a copy constructor for one of your classes, you will also need to write the 'operator=()'. Both issues are explained by a further requirement any standard allocator has to accomplish: Any two allocator objects, instantiated from the same template class, must be equivalent. That is made clear by the two template operators '==' and '!=' (the 5th part of the public interface). This being said, it is clear that an assignment operator is useless. So, why are the copy constructors there? Because of their exception specification. The construction of an allocator is not allowed to throw.

All constructors and the destructor are trivial for the standard allocator (as in, empty). For user-defined allocators, they might be non-trivial. However, they are not allowed to throw any exception at all.
The allocators functionality is to allocate or deallocate raw memory in the first place. Having said this, it might additionally initialize the allocated memory—which is the case for the default allocator because it uses the 'new' operator. The allocator also must be able to provide a (constant) pointer to a given object it allocated and the maximum size of memory it can allocate.
The free template operators '==' and '!='. These are hardcoded to return true (operator '==') or false (operator '!='). Every (standard-like) allocator should provide these the same way. However, there are extensions in some STL implementations that treat allocator objects of the same type as distinct objects (for example, copies them and let them have state)—this is a relatively uncharted area (thanks to Bjarne Stroustrup for this comment).
In case you are wondering why we count the free template operators to the public interface of the 'allocator': We use the term public interface in a wider sense—we are not referring strictly to the class members declared as 'public:', but to the set of entities that are visible from outside the class and that define that class. The 'allocator' class won’t be complete without the two free template operators because the operators would not have any meaning without the allocator class. They belong together and represent the public interface.

The allocator needs to be specialized for 'void' because you cannot have references to 'void'.

Why Would I Want to Write My Own Allocator?

The default allocator that comes with your implementation of the STL will do a very good job in nearly all cases. That is not surprising because the implementations of the STL are written by very experienced people. In other words, the assumption of being able to write an allocator that outperforms the standard one in the general case is at least questionable. So, why implement an allocator on your own in the first place? There are a couple of reasons justifying that:

You want to use a different memory model, such as shared memory, and so forth.
It is not raw speed you are after, but some other effect. For example, you want to trace the memory operations of your application to a file.
It is raw speed you are after, and you plan to use the 'allocator' for a well-defined, particular task. In this case, your a-priori knowledge about the allocation/deallocation behavior of your code enables you to design an 'allocator' that is faster than the default one.
Let your imagination wander…

First Steps to Implement Your Own Allocator

As you can see, the reasons to write your own allocator are very dependent on your application. In this article, we will ignore the fact that we (probably) cannot out-perform the default allocator, and write a couple of all-purpose custom allocators. Believe it or not, the devil lies in the details…

If you need (or want) to write a user-defined allocator, you have to follow some basic requirements:

Required type definitions
- value_type
 
 Type of the element that is being used in this memory model.
- size_type
 
 Unsigned integral value that can represent the size of the largest object in the memory model.
 
 If the allocator should be used with STL containers, this type must be equivalent to 'size_t'.
- difference_type
 
 Unsigned integral value that can represent the difference between two pointers in the memory model.
 
 If the allocator should be used with STL containers, this type must be equivalent to 'ptrdiff_t'.
- pointer
 
 Type of a pointer to the element type being used in the memory model.
 
 If the allocator should be used with STL containers, this type must be equivalent to 'T*' for 'allocator<T>.
- const_pointer
 
 Type of a constant pointer to the element type being used in the memory model.
 
 If the allocator should be used with STL containers, this type must be equivalent to 'const T*' for 'allocator<T>.
- reference
 
 Type of a reference to the element type being used in the memory model.
 
 If the allocator should be used with STL containers, this type must be equivalent to 'T&' for 'allocator<T>.
- const_reference
 
 Type of a constant reference to the element type being used in the memory model.
 
 If the allocator should be used with STL containers, this type must be equivalent to 'const T&' for 'allocator<T>.
- rebind
 
 A template structure, which allows any allocator might allocate memory of another type indirectly.
 
 It has to be declared as follows:
```
template <class U> struct rebind { typedef allocator
 other; };
```
Required operations
- allocator()
  
  Default constructor.
- allocator(const allocator&)
  
  Copy constructor.
  
  Copies an allocator, so that storage allocated by the original can be released by the copy and vice-versa.
- ~allocator()
  
  Default destructor.
- address(reference v)
  
  Returns a pointer to the value v.
- address(const_reference v)
  
  Returns a constant pointer to the constant value v.
- max_size()
  
  Returns the largest value which can be passed to the 'allocate()' function.
- allocate(size_type n)
  
  Returns storage for n elements of the element type being used in the memory model.
  
  Elements will not be constructed/initialized.
- deallocate(pointer p, size_type n)
  
  Deallocates the storage for n elements of the element type being used in the memory model,beginning at location p.
  
  The storage must have been allocated by the same or any equal allocator.
  
  n must be the same value as it was while calling allocate().
  
  p must not be 0.
  
  Elements must have been destroyed before.
- construct(pointer p, const_reference v)
  
  Initializes the storage of the element, pointed to by p, with the value v.
  
  The storage must have been allocated by a call to allocate().
- destroy(pointer p)
  
  Destroys the element, pointed to by p, without deallocating the storage.
- operator ==(const_reference a1, const_reference a2)
  
  Returns true if storage allocated by allocator a1 can be deallocated by allocator a2 and vice-versa.
  
  If the allocator should be used with STL containers, the above is required. Thus, this function should always return true.
- operator !=(const_reference a1, const_reference a2)
  
  Returns true if storage allocated by allocator a1 cannot be deallocated by allocator a2 and vice-versa.
  
  If the allocator should be used with STL containers, the opposite of the above is required. Thus, this function should always return false.

On a more practical level, there are two steps when implementing a custom allocator (actually there are three steps, but we will come to that later).

Design a memory management mechanism. For example, grab a huge chunk of memory and manage that by yourself.
Use the designed memory management mechanism to implement a standard-like allocator.

Although it is possible to have both the storage management and the allocator interface in one and the same class, we prefer to separate them. That is, we prefer to implement a class that does the memory management, and to implement the allocator in terms of this class, by composition (and NOT inheritance!).

Memory Management—A Memory Pool

If you did an Internet search, you would find dozens of high-tuned or special purpose memory management implementations. Just have a look at the Boost library. This is a vast topic that is beyond the scope of this article. What we want to demonstrate is the link between memory management schemes and standard-like allocators. Actually, as we stated before, it is the allocation/deallocation behaviour of the code that dictates the design of the memory management scheme. So let’s start simple…

The concept of our first memory management class is that it grabs a large chunk of raw memory on construction, and emulates allocations and deallocations on this chunk. We will call it a memory pool. We will keep track of the allocated and free blocks of memory by using a linked list of nodes, that are directly embedded into the memory chunk. For those who are curious, we saw this kind of memory management in the source code of Doom (by ID Software—the source code used to be available under the GPL). Doom was written in C (not C++) and their memory management scheme was a bit more complex than what we are going to show. We have no idea who invented it, we suspect that it was many, many years ago though.

For a start, we could say that our chunk is a 'char[fixed_size]'. Something like this:

  class pool
  {
    enum {
      size = 0xffff;
  };

  public:
    // Whatever

  private:
    char mem_[size];
  };

That has a couple of drawbacks, one of them being deadly:

We would not be able to grow the chunk.
The standard does not guarantee that 'mem_' will have a correct alignment for something else than 'char'. And this is deadly. There are rumors that the next revision of the standard will address this problem, but until then…

To avoid the second issue, we could dynamically allocate 'mem_' (the standard guarantees correct alignment for dynamically allocated memory). However, despite growing the chunk would not be impossible anymore, it would for sure be a mess—complicated, error prone, and so on. Well, we are talking about standard C++ here, aren’t we? What about this approach:

  #include <list>

  class pool
  {
    enum {
      size = 0xffff;
    };

  public:
    // Whatever

  private:
    std::list<char*> mem_;
  };

We 'push_back()' a 'new char[size]' into 'mem_' in the constructor, and on demand, another one and so on. By the way, 'size = 0xffff' is an ad hoc value we typed into the code. It is up to you to find a better one. Now that we have decided how to handle the raw storage, we also need to organize it. The client is likely to ask for a block of memory, or to release a block of memory it previously allocated. We will keep track of this blocks by prepending a structure at the beginning of each block:

  struct block {
    block  *prev_;
    block  *next_;
    size_t size_;
    int    free_;

    block(block *prev, block *next, size_t size, int free):
    prev_(prev), next_(next), size_(size), free_(free) {}
    ~block() {}
  };

A newly created 'pool' will contain only one chunk in the list, like this:

The entire chunk will be described by one single block, marked as free. When the client allocates some memory, a new block is created:

This allocation process can continue until the chunk is exhausted. Then, a new chunk is allocated and 'push_back()'ed into 'mem_' and the process continues. Of course, allocations and deallocations happen randomly, in general. When deallocating a block, we check the previous and the next blocks, to see whether they are free. In this case, we glue them together into one larger block. This avoids the fragmentation of the chunks.

When an allocation request is received, we will travel along the blocks until we find one that is large enough and free. If there are free blocks, but none large enough, a new chunk is added to 'mem_'. Finally, when allocating from a block, we check the remaining part (assuming that the block is larger than the amount of memory to allocate). If the remaining part is smaller than a threshold value, we do not split the block, but mark it allocated entirely. This ensures that we won’t have useless small blocks.

The memory pool allocating schema has a couple of drawbacks:

It behaves enthropically: that is, if a chunk is added to 'mem_', it will remain there. We could add a check for completely free chunks, and delete them, but that would have a performance impact. We will leave that to you as an exercise (read: We are too lazy to implement that too…).
It cannot allocate contiguous memory blocks larger than 'size - sizeof(block)'. This could be addressed by making 'size' large enough (we do not like this large enough solution—it is a poor man’s approach), or, we could pass the 'size' as a template or constructor parameter.
Allocating and deallocating a large number of small blocks is slow. There is nothing to do here—just do not use a pool allocator in such a case.

With the design in place, implementing the 'pool' class is not that hard. You just have to keep in mind what you want to use it for, that is, an allocator. Thus, a couple of implementation decisions: copying pools makes no sense—the copy constructor and the 'operator =()' shall be private and not implemented. We plan to use pool by composition—there will be no virtual functions. The destructor also is not virtual, signaling that 'pool' is not meant to be inherited from.

We have implemented 'pool' directly in the header (pool.h). It makes no sense to show the entire implementation here—all we need to know for now, is, that it defines the public interface:

  class pool
  {
    enum {
      init_size = 0xfffff,
      min_size = 0xf
    };

  public:
    pool(size_t size = init_size) : size_(size);
    ~pool();
    void* allocate(size_t size);
    void deallocate(void *p, size_t = 0);
    void dump();    // for debugging

  private:
    // ...
  };

The second argument of 'pool::deallocate()' is a dummy. We will just pass the second argument of the allocators 'deallocate()', but won’t use it because 'pool' does not need this information. It is more of an aesthetic issue.

With the class 'pool' in place, writing a 'pool_allocator' seems to be trivial. We will come back to this later. Let’s have a look at the second memory management scheme first…

Memory Management—Simple Segregated Storage

The concept of a simple segregated storage also is not new. Our implementation is based on the example presented in The C++ Programming Language, 3rd Edition, by Bjarne Stroustrup, page 570. The idea behind this memory management technique is to grab a large chunk of memory and to partition it into equally large blocks of a known size—that is, the size of the type you want to allocate. This leads to a very fast allocation/deallocation of storage for that particular type.

As opposed to the memory pool, we will use a simple linked list. That is, each block contains a pointer that points to the next (free) block. Another difference is that, when allocating a block, the pointer will be trashed. When the block gets deallocated, we rebuild the pointer so it points to the 'head_' of the list and make the 'head_' point to the just released block. Well, as Pulitzer said, an image is worth a thousand words:

The image above shows the initial state of the storage. After allocating four blocks, it will look like this:

Now, assume that the second and third blocks are deallocated (in this order):

As you can see, after randomly allocating and deallocating blocks, 'head_' will point to an arbitrary block, and the pointers will jump back and forth through the chunk. Of course, we want to be able to grow the storage, so we will do the same thing we did with 'pool'—we will use a 'std::list<char *>' to store the chunks. This corresponds more or less to Bjarne Stroustrup’s example. There is one more point, though: We should also be able to allocate blocks of arbitrary size. Stroustrup leaves that as an exercise to the reader.

When allocating a block of a size different from the segregation size, we first check whether it is by chance smaller than it. In this case, we allocate a block as we usually do, and…finish (yes, we waste some memory here). If the size to allocate is larger than the segregation size, we compute the number of blocks needed. Then, we scan the chunk until we find a contiguous portion containing at least the computed number of blocks. We might have to grow the storage, even if we have enough free blocks, but not contiguous.

The public interface of the simple segregated storage class ('ss_storage') does not differ much from that of 'pool':

  template <typename T>
  class ss_storage
  {
    enum ss_defaults{init_size = 0xfffff};

  public:
    ss_storage(site_t size = init_size);

    T* allocate();
    void* allocate(size_t n);
    void deallocate(void *p, size_t n);
    void dump();
    ~ss_storage();

  private:
    // ...
  };

There are two differences:

We have a template class. We use the template parameter to determine the segregation size.
We have two versions of 'allocate()'.

As with 'pool', 'ss_storage' also is non-copyable and not meant to be inherited from.

Implementing the Allocators

With 'pool' and 'ss_storage' in place, it is no big deal to implement two allocators, 'pool_allocator' respectively 'ss_allocator'. The pool allocator will look like this:

  template <typename T>
  class pool_allocator
  {
  public:
    // The same as the default allocator

  private:
    static pool mem_;
  };

Note that the 'pool' member variable must be 'static' because any two 'pool_allocator' objects have to be equivalent. We will simply delegate the 'allocate()' and 'deallocate()' allocators to the 'pool'. The member function 'construct()' is trivial:

  void construct(pointer p, const T& val)
  {
    new(static_cast<void*>(p)) T(val);
  }

That is called placement new. The member function 'destroy()' is a bit trickier because it has to decide whether to call a destructor or not. We will use the way C++ resolves function overloading to solve this. We need a couple of helper functions:

  namespace pool_alloc {
    inline void destruct(char*) {}
    inline void destruct(wchar_t*) {}
    template <typename T>
    inline void destruct(T* t) { t->~T(); }
  }

We can simply delegate 'destroy()' this way:

  void destroy(pointer p) { pool_alloc::destruct(p); }

The compiler will pick the best match for 'destruct()' and we have no problem.

We implement 'ss_allocator' exactly the same way, except that we delegate to one of the 'allocate()' functions of 'ss_storage', depending on what is to be allocated:

  pointer allocate(size_type size, ss_allocator<void>
                   ::const_pointer hint = 0) const
  {
    if(size == 1)
      return mem_.allocate();

    return static_cast<pointer>(mem_.allocate(size * sizeof(T)));
  }

That is all. Or, it seems to be. We wrote a simple test harness for our allocators. We will use them to allocate some (useless) 'foo' class, both with 'std::vector' and 'std::list'. We will time the code to see how long the allocation takes and how long sorting the 'std::vector' and the 'std::list' takes. The test code looks like this:

  #include "pool_allocator.h"
  #include "ss_allocator.h"
  #include <vector>
  #include <iostream>
  #include <algorithm>
  #include "timer.h"
  #include <string>
  #include <ctime>

  using namespace std;

  const int size = 10000;

  struct foo
  {
    foo(const string &s = "Hello world", int i = 0) : s_(s),
       i_(i) {}

    string s_;
    int    i_;

    bool operator <(const foo& other) const { return i_ <
                                              other.i_; }
  };

  foo generator()
  {
    return foo("Hello world", rand());
  }


  int main()
  {
    timer t;
    srand(time(0));

    cout.precision(10);
    cout << "***Tests with <vector>***" << endl;

    t.elapsed();

    vector<foo, pool_allocator<foo> > pool_vector(size);
    cout << "Construction (pool_allocator):" << t.elapsed()
         << endl;

    vector<foo, ss_allocator<foo> > ss_vector(size);
    cout << "Construction (ss_allocator):" << t.elapsed()
         << endl;

    vector<foo> default_vector(size);
    cout << "Construction (default allocator):"
         << t.elapsed() << endl;

    generate(pool_vector.begin(), pool_vector.end(), generator);
    copy(pool_vector.begin(), pool_vector.end(),
                              default_vector.begin());
    copy(pool_vector.begin(), pool_vector.end(),
                              ss_vector.begin());

    t.elapsed();

    sort(pool_vector.begin(), pool_vector.end());
    cout << "Sort (pool_allocator):" << t.elapsed()
         << endl;

    sort(ss_vector.begin(), ss_vector.end());
    cout << "Sort (ss_allocator):" << t.elapsed()
         << endl;

    sort(default_vector.begin(), default_vector.end());
    cout << "Sort (default allocator):" << t.elapsed()
         << endl;


    cout << "***Tests with <list>***" << endl;

    t.elapsed();

    list<foo, pool_allocator<foo> > pool_list(size);
    cout << "Construction (pool_allocator):" << t.elapsed()
         << endl;

    list<foo, ss_allocator<foo> > ss_list(size);
    cout << "Construction (ss_allocator):" << t.elapsed()
         << endl;

    list<foo> default_list(size);
    cout << "Construction (default allocator):" << t.elapsed()
         << endl;

    generate(pool_list.begin(), pool_list.end(), generator);
    copy(pool_list.begin(), pool_list.end(), default_list.begin());
    copy(pool_list.begin(), pool_list.end(), ss_list.begin());

    t.elapsed();

    pool_list.sort();
    cout << "Sort (pool_allocator):" << t.elapsed() << endl;

    ss_list.sort();
    cout << "Sort (ss_allocator):" << t.elapsed() << endl;

    default_list.sort();
    cout << "Sort (default allocator):" << t.elapsed() << endl;

    cout << "Press <Enter>..." << endl;

    char c;
    cin.getline(&c, 1);

    return 0;
  }

That is standard C++ code, except for the class 'timer'. The latter does not interact much with the code. It compiles fine under all environments.

The first surprise is that the test code fails to compile under VC++ 6.0, both when using the Dinkumware STL and when using STLport; it fails to compile under Dev-C++, both with the GNU ISO C++ Library and STLport and … yes, you guessed it, it fails to compile under KDevelop, too. This brings us to the third step when implementing your own allocator. Remember, I said there were two steps (actually three)?

This third step is the most unpleasant one because it deals with broken compilers, STL implementations that use obscure workarounds, and documentation that is written in a way that makes you think that it was never meant to be read by someone.

Making the Code Compile and Run

Everyone who has used templates and the STL knows how confusing the error message the compilers issues can be. Because we mainly work with VC++ 6.0, it was also the first compiler we used to test the code. We used the standard settings (that is, Dinkumware), and the result was:

C:\Programme\Microsoft Visual Studio\VC98\INCLUDE\list(386) : error C2039: ‘_Charalloc’ :
is not a member of ‘pool_allocator<struct foo>’
C:\Programme\Microsoft Visual Studio\VC98\INCLUDE\list(386)
: while compiling class-template member function ‘struct std::list<struct foo,class pool_allocator<struct
foo> >::_Node *__thiscall std::list<struct foo,class pool_allocator<struct foo> >::_Buynode(struct
std::list<struct foo,class pool_allocator<struct foo> >::_Node *,struct std::list<struct foo,class
pool_allocator<struct foo> >::_Node *)’

'_Charalloc'…what on earth is '_Charalloc'? In this case, MSDN kindly gives us a hint.

VC++ 6.0 does not understand member template classes, so 'template <class U> struct rebind { typedef allocator other; };' is useless. Good to know. So, if our allocators have to work with VC++ 6.0 and Dinkumware, we need to implement a member function called '_Charalloc()':

  // Begin Dinkumware (VC++ 6.0 SP5):
  char *_Charalloc(size_type n) { return static_cast<char*>
                   (mem_.allocate(n)); }
  // End Dinkumware

As a side note, you normally should not choose identifier names that begin with an underscore, or that contain two concatenated underscores. The standard clearly says that such names are reserved for the compiler/library implementation. In the case of '_Charalloc', it is okay, however, because we follow the instructions of the library implementer.

Now it should work, shouldn’t it? Yes, it compiles. However, the VC++ linker complains in debug mode (and sometimes in release mode, too):

main.obj : fatal error LNK1179: invalid or corrupt file: duplicate comdat
“?for_each@std@@YAP6AXPAD@ZViterator@?$list@PADV?$allocator@PAD@std@@@1@1P6AX0@Z@Z”

This time MSDN is practically useless. The help for LNK1179 says nothing that is even remotely applicable to this code. And, searching for COMDAT will tell you that the compiler will package functions in the form of packaged functions (COMDATs). Not really helpful. Skimming through the Internet, we read suggestions to turn on inlining in debug mode, and other tweaks of the project settings. First, that does not help in a consistent manner (that is, the error goes away, you change a few lines in the code, and then you have the error again). Second, we do not want to change the project settings. Okay, this is just example code, so we actually might not care, but if we did so, all this discussion had no practical value.

The correct solution to eliminate the error has two steps. The first one is to accept that the VC++ 6.0 compiler is brain-damaged here—changing its settings won’t get you anything. The second step is to find a way to circumvent the conditions generating the error without breaking the code. The error clearly complains about 'std::for_each()'. We use 'for_each()' twice, namely in the destructors of 'pool' and 'ss_storage':

  ~pool()
  {
    std::for_each(pool_mem_.begin(), pool_mem_.end(), kill);
  }


  ~ss_storage()
  {
    std::for_each(ss_mem_.begin(), ss_mem_.end(), kill);
  }

Both classes have a static member function named 'kill()':

  static void kill(char *p){delete [] p;}

Because both classes are implemented in their headers, the definition of both destructors will be in the same translation unit ('main()'). And, the compiler will instantiate 'for_each()' with the following template parameters twice:

  std::for_each<std::list<char *>, std::allocator<...> >::iterator,
                std::list<char *>, std::allocator<...> >::iterator,
                void(*)(char *)>()

Well, we know that the third template parameter has the wrong type. The VC++ 6.0 compiler does not understand that the two static member functions have different types because they belong to different classes, and looks at them as being identical. And even if they were identical, the compiler should not instantiate 'for_each()' twice. It should definitively not do that, but it does it. We could replace 'for_each()' with a hand-crafted loop, or, if VC++ 6.0 insists on instantiating 'for_each()' more than once, we can fool it into taking different template parameters. Replacing it with a hand-crafted loop is very simple—but not elegant. Imagine that we had the same problem with 'std::sort()'—not so easy to replace with hand-crafted code. That is why we choose the second solution.

We simply replace the static member function 'kill()' with a functor, named 'killer':

  struct killer
  {
    void operator()(char *p) { delete [] p; }
  };

We also slightly change the destructors:

  ~pool()
  {
    std::for_each(pool_mem_.begin(), pool_mem_.end(), killer());
  }


  ~ss_storage()
  {
    std::for_each(ss_mem_.begin(), ss_mem_.end(), killer());
  }

That solves the problem. Although the two 'for_each()' lines look similar, the second template parameters have different types, namely 'pool::killer::operator()(char*)' and 'ss_storage::killer::operator()(char*)', and VC++ 6.0 understands it this time.

Note: The first code version, using a static member function, is perfectly legal C++. It compiles and links without problems under Dev-C++ and KDevelop. However, if we need the code to compile under VC++ 6.0, we have to change it. Always seek for a solution that also is standard and thus portable.

Dev-C++ was less problematic. It just gave us a lesson about const-correctness. As we stated before, we were using some 'foo' structure to fill the 'std::vector' and the 'std::list' in the test harness. Here is the definition:

  struct foo
  {
    foo(const string &s = "Hello world", int i = 0) : s_(s),
       i_(i) {}

    string s_;
    int    i_;

    bool operator <(const foo& other) const { return i_<other.i_; }
  };

We accidentally omitted the 'const' that is marked in bold. And, we promptly got an error because the STL implementation expects the 'operator <' to be 'const'.

So, we got it running, under VC++ 6.0, Dev-C++, and KDevelop—all using their built-in STL implementations. Do we manage to have the code running with STLport, too?

Well, the first prerequisite is to manage to install STLport for the named IDEs. For VC++ 6.0, it is pretty simple, but for Dev-C++ it is a real fight. The details are beyond the scope of this article, but if you are curious to hear what we have to say about that, drop us a line.

Having the a priori knowledge that VC++ 6.0 does not support member template classes, we expected the allocators not to work with STLport because they certainly solve the rebinding problem differently than Dinkumware. And we were right—switching to STLport under VC++ 6.0 gave us the following error:

C:\PROGRAMME\MICROSOFT VISUAL STUDIO\VC98\INCLUDE\STLPORT\stl/_alloc.h(503) : error C2784: ‘class ss_allocator<_Tp2>
&__cdecl _STL::__stl_alloc_rebind(class ss_allocator<_Tp1> &,const _Tp2 *)’
: could not deduce template argument for ‘class ss_alloc ator<T> &’
from ‘class pool_allocator<STRUCTfoo>’
C:\PROGRAMME\MICROSOFT VISUAL STUDIO\VC98\INCLUDE\STLPORT\stl/_alloc.h(502) : while compiling class-template
member function ‘struct foo *__thiscall _STL::_STLP_alloc_proxy<STRUCT foo
pool_allocator<struct foo,class *,struct> >::allocate (unsigned int)’

There actually were more errors, but all looked like this one. How do you solve something like that? The first step is, of course, to have a look at the line in the library file that gives the error. A few words about STLport, at this place: First, it comes with some documentation, but that won’t help much. Second, it is implemented in a much more complicated way than the Dinkumware or the GNU ISO C++ one. The reason is that STLport is designed to work with a large number of compilers and compiler flavors, under a large number of platforms, as opposed to the other two, which are tailored to work with a specific compiler under a specific platform.

This has its price, though: The implementation is a labyrinth of conditional compiled blocks and macros. It is not a simple task to track them down and to find out what exactly gets compiled in a particular case. Fortunately, most compilers have an option that will have them output the preprocessed code to a file. For VC++ 6.0, the necessary switches are /P /EP. This will produce a file with the extension .i containing the preprocessed code. The bad news is that this file contains a huge number of empty lines, thus being hard to read. The following IDE macro will eliminate all empty lines from the currently opened file:

  Sub DeleteEmptyLines

    Dim l
    l = 1

    ActiveDocument.Selection.StartOfDocument

    While(True)
      ActiveDocument.Selection.SelectLine

      If ActiveDocument.Selection = vbCrLf Then
        ActiveDocument.Selection.Delete
      Else
        ActiveDocument.Selection.LineDown l = l  + 1
      End If

      ActiveDocument.Selection.CurrentLine < l Then Exit Sub
    Wend

  End Sub

The file will, of course, contain all included headers, all macros will be expanded, and only the portions that are really going to be compiled will appear. So, if you have to dig yourself into the implementation of STLport, this is one good method. It has a drawback, however: The comments are stripped away, and these are of crucial importance sometimes. This is unfortunately the case when implementing a custom allocator that has to work with STLport and a compiler that does not support template member classes (such as VC++ 6.0). Having a look to stl/_alloc.h, 38 lines before the line where the error is reported, reveals the following comment:

  // If custom allocators are being used without member template
  // classes support:
  // user (on purpose) is forced to define rebind/get operations!!!

The rebind/get operations for the default STLport 'allocator' follow after this comment. Using them, we implement the rebind/get operations for our allocators:

  // For VC++ 6.0/STLPort 4-5-3 see /stl/_alloc.h, line 464
  // "If custom allocators are being used without member template
  // classes support:
  // user (on purpose) is forced to define rebind/get operations!!!"
  #ifdef _WIN32
  #define SS_ALLOC_CDECL __cdecl
  #else
  #define SS_ALLOC_CDECL
  #endif

  namespace std {
    template <class _Tp1, class _Tp2>
    inline ss_allocator<_Tp2>& SS_ALLOC_CDECL
    __stl_alloc_rebind(ss_allocator<_Tp1>& __a, const _Tp2*)
    {
      return (ss_allocator<_Tp2>&)(__a);
    }

    template <class _Tp1, class _Tp2>
    inline ss_allocator<_Tp2> SS_ALLOC_CDECL
    __stl_alloc_create(const ss_allocator<_Tp1>&, const _Tp2*)
    {
      return ss_allocator<_Tp2>();
    }
  }

We do the same for the 'pool_allocator'. Note that we have to place the template functions into the 'std' namespace. Normally, the 'std' namespace is terra sacra for the user: You are not allowed to place anything of your own there. However, in this case, we follow the instructions of the library implementer.

With these changes in place, both our allocators work under all STL implementations, compilers, and platforms we aimed for:

Conclusion

This article is not only meant to teach you what a standard allocator is and how it works. It is also aimed at showing a modus operandi, when implementing extensions to the STL and when having to deal with broken compilers. Understanding how to write portable code is much more important than having some out-of-the-box allocator that won’t outperform the standard one in the general case, anyway.

Final note: If you are wondering why 'pool_allocator' performs so poorly with 'std::list' under Linux: our Linux computer is much slower than the Windows 2000 one.

Final-final note: If you are wondering why we do not provide complete out-of-the-box demo applications…we have decided to simply provide the necessary source code files only rather than specific Visual Studio or KDevelop projects for various reasons. Nevertheless, creating a project should be obvious…