Application-Level Memory Management for Memory-Constrained Devices

1. Introduction

Dynamic memory allocation has always played an important role in software development. Real-time applications allocate and deallocate memory frequently, and the cost of those operations, along with the fragmentation they cause, can severely affect performance. Both can be improved if applications manage their 'own' memory on real-time systems. This article discusses a faster, more predictable dynamic memory allocation technique for real-time systems.

2. Techniques for Application-Level Memory Management

Before discussing the development of a C++ framework for memory management, here is some background on two memory management techniques that are popular within the development community:

  • Explicit lists
  • Segregated lists

The C++ framework discussed here makes use of the segregated list mechanism.

2a. Explicit lists

Explicit lists generally have the free memory regions connected as:

  • A singly linked or a doubly linked list
  • Optionally, a circular list

Figure 1 shows how contiguous heap regions are woven into singly linked lists. Occupied cells generally have a header placed at the beginning of the cell, followed by the actual 'dynamic data'. The memory management system keeps a pointer to the head of the free list. On getting a 'malloc()' request (as in the case of C/C++) for an object of size X, the system traverses the free list looking for a block of size Y, with Y >= X, where the object can fit. The problem arises if Y is far greater than X. Typical memory management systems adopt a 'best fit' strategy, in which the block that minimizes Y - X is chosen, thus reducing fragmentation. The user also can opt for a 'first fit' strategy, which simply takes the first block that is large enough. A sketch of both strategies follows Figure 1.

Figure 1: Explicit lists structure
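To make the traversal concrete, here is a minimal sketch of both strategies. The FreeBlock layout and the function names are illustrative assumptions, not part of the framework developed later.

#include <cstddef>

// Hypothetical free-list node: each free region starts with this header.
struct FreeBlock
{
   std::size_t size;   // usable bytes in this block
   FreeBlock*  next;   // next free block in the singly linked list
};

// First fit: return the first block with size >= request.
FreeBlock* firstFit(FreeBlock* freelist, std::size_t request)
{
   for (FreeBlock* b = freelist; b != 0; b = b->next)
      if (b->size >= request)
         return b;
   return 0;   // no block is large enough
}

// Best fit: scan the whole list and pick the block that
// minimizes (size - request).
FreeBlock* bestFit(FreeBlock* freelist, std::size_t request)
{
   FreeBlock* best = 0;
   for (FreeBlock* b = freelist; b != 0; b = b->next)
      if (b->size >= request && (best == 0 || b->size < best->size))
         best = b;
   return best;
}

Note that bestFit() always walks the entire list; this unbounded traversal is exactly why allocation time is unpredictable, the first disadvantage listed below.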

Advantages:

  1. Bookkeeping is easy.
  2. Less memory is consumed for bookkeeping.

Disadvantages:

  1. Allocation and deallocation are much slower. The time required is not predictable.
  2. Coalescing of cells is difficult.
  3. Loops can result if not managed properly.
  4. The address arithmetic involved is complex.

2b. Segregated lists

Segregated lists are another mechanism, in which a separate free list is kept for blocks of a particular size. In this scheme:

  • There are different free lists for blocks of different sizes
  • The lists are typically doubly linked
  • A header and footer are needed for coalescing
  • The links need not be in the same order as the blocks

Figure 2: Segregated Lists structure

Advantages:

  1. Less, and finer-grained, fragmentation
  2. Faster allocation and deallocation
  3. Easier coalescing of cells
  4. Easier locating of a free cell

Disadvantages:

  1. Bookkeeping is difficult.
  2. realloc() is difficult.


3. Segregated Lists

Any kind of memory management begins with a big chunk of contiguous free memory, and this free block has to be initialized first.

Initialization:

Initially, the free block would be as shown below. This free block is called the 'free list'.

Figure 3: Free memory block (memory pool)

This big block is divided into smaller blocks of equal size, called cells or pages; say each is of size M.

Figure 4: Memory segmented into pages

Objects of size "less than or equal to" M/2 are packed into the same cell/page. That is, a page of size M can hold a maximum of 2 objects of size M/2 and a free cell/page can hold 4 objects of size M/4. To identify the "free block/cell" within a cell, a bitmap is used. An occupied cell is marked with a 1 and the one not occupied is marked with a 0. This can be easily noticed from Figure 5.

Figure 5: Bitmap indicating free or allocated block

In Figure 5, the shaded areas are the blocks that are occupied and the white ones are free. Each block within the cell is of size M/4. The inner blocks are not restricted to objects of exactly M/4 bytes: a block of size M/4 can hold any object of size X, where M/8 < X <= M/4, making the arrangement completely binary.
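As a hedged illustration of the bitmap test, the snippet below scans for the first free block within a page. The function name and the use of a plain long as the bitmap (bit i for partition i) are assumptions, though they are consistent with the Cell structure shown in section 6.

// Returns the index of the first free partition (bit == 0), or -1
// if the page is full. 'partitions' is the number of blocks per page.
int firstFreePartition(long bitmap, int partitions)
{
   for (int i = 0; i < partitions; ++i)
      if ((bitmap & (1L << i)) == 0)
         return i;
   return -1;
}

// Occupying partition i is a single OR:   bitmap |= (1L << i);
// Freeing it is an AND with the inverse:  bitmap &= ~(1L << i);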

Speed and response times are an important aspect of any real-time system, which means that the memory management also has to be fast. Power-of-2 (divide-and-conquer) algorithms are well suited to fast, predictable performance. The application-level memory manager discussed here also uses a power-of-2 technique to achieve better performance.

3a. Segregated lists and binary arithmetic (buddy system)

In the algorithm discussed here, all memory allocations are categorized based on the size of the object, where the size falls within a range between adjacent powers of 2. For example, if the size of the object being allocated is, say, 35 bytes, the object falls into the category of objects of size greater than 32 (2^5) and less than or equal to 64 (2^6).

Each page holds objects of one 'category'; thus, objects are grouped into different categories by size. That is, if an object of size m is allocated in a page P such that 2^i < m <= 2^j (with j = i+1), then all the objects within that page are allotted 2^j bytes each.

For example, if the allocation request is for an object X of size 17 bytes, the inequality above translates to 16 < 17 <= 32, with i = 4 and j = 5. All the objects in this category are given 32 bytes, whether they need all 32 bytes or not. The page in which X is stored holds only objects of size greater than 16 and at most 32 bytes. An allocation request for an object whose size falls outside this range goes to a different page. A sketch of the category computation follows.
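A small sketch of the category computation (the function name is hypothetical): find the smallest exponent j such that size <= 2^j.

#include <cstddef>

// Returns j such that 2^(j-1) < size <= 2^j, for size >= 1.
int categoryOf(std::size_t size)
{
   int j = 0;
   std::size_t cap = 1;
   while (cap < size)
   {
      cap <<= 1;   // next power of 2
      ++j;
   }
   return j;
}

For the worked example above, categoryOf(17) returns 5, because 2^4 = 16 < 17 <= 32 = 2^5.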

Objects that fall into the same 'category' are linked together using a separate chain as shown in Figure 6 below.

Figure 6: Data structures in segregated list

There will be a 'head' pointer that would point to the first page in the 'category' chain and there will be a 'tail' pointer that would point to the last page in the 'category' chain.


3b. Segregated lists and allocation strategy

Allocation strategies are grouped into three, based on the size of the object.

  • Allocation for objects of size X, where X <= M/2

    Pages generally contain more than one object if the size of the requested object X satisfies X <= M/2, M being the page size. As discussed earlier, the category to which X belongs is identified by finding two adjacent powers of 2, 2^i and 2^j, such that 2^i < X <= 2^j.

    If the head of the segregated list does not have free space, a new page is allocated from the free list and put at the head of the segregated list (in the appropriate category).

  • Allocation for objects of size X, where M/2 < X <= M

    This is a request for a block of size X where M/2 < X <= M, M being the page (block) size. In this case, a page accommodates only one element.

    A new page is allocated from the free list and put at the head of the segregated list.

  • Allocation for objects of size X, where X > M

    1. The block spans multiple cells/pages.
    2. It may or may not be part of the "segregated chain".

This is the case of a multi-page block. Multi-page blocks waste memory; however, it is quite unlikely that memory-constrained applications request big blocks. The number of pages occupied by such a block is X / M rounded up to the next integer, computed as sketched below. These multi-page blocks have to be handled separately. The number of pages that a block spans can be identified easily by looking at the contents of the first page that contains it.
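The page count is a ceiling division; a one-line sketch (the function name is hypothetical):

#include <cstddef>

// Pages needed for a multi-page block of X bytes with page size M.
std::size_t pagesNeeded(std::size_t X, std::size_t M)
{
   return (X + M - 1) / M;   // ceiling of X / M
}

Rounding up this way also avoids claiming a wasted extra page when X is an exact multiple of M.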

Figure 7: Memory fragments in segregated lists

Figure 8 shows one possible memory layout for the allocation scheme shown above.

Figure 8: Memory layout in segregated lists

In Figure 8, the yellow pages hold objects of size X, where 1 <= X <= 16, whereas the blue pages hold objects of size Y, where 16 < Y <= 32. Head1 begins somewhere in the middle and chains to the page at the beginning of the contiguous memory chunk, which in turn connects to the last page in the chunk. The chain for objects of size Y, however, moves forward without backward links. The white blocks in between are pages that either have never been allocated or have been deallocated after an allocate operation.

4. Multi-Page Objects and Boundary Tag

A free list is normally a chain of free pages. The pages that appear in the free list can be either contiguous or scattered. In Figure 9, the cells marked in orange are the free blocks. The first two blocks consist of just one page each, whereas the third block in the chain is a set of contiguous pages. Objects that span multiple pages (objects of size X, where X > M, M being the page size) occupy a set of contiguous pages.

Figure 9: Free list tracking in segregated list

Boundary Tags

Boundary tags are used for objects that span multiple pages. The last page of a multi-page object has a pointer to the first page of the block, which is typically used when coalescing blocks during deallocation. In Figure 10, the data is a multi-page object occupying a contiguous block of n pages. The last page (page n) has a link back to the first page (page 1). This link is the boundary tag.

Figure 10: Boundary tag pointers for deallocation of memory

5. Deallocation and Coalescing

During deallocation of a cell, the manager checks whether the adjacent page(s) can be merged with the page being freed. In the layout shown in Figure 11, the orange block is the one being freed. It can easily be merged with the page on its left; however, it cannot be merged with the page on its right, because that page is still occupied.

Figure 11: Merging adjacent free memory blocks

In the algorithm discussed here, coalescing with the "left" page is easy: all the user needs to do is add the size of the pages being freed to the left block. In other words, if the size of the orange block is 2M (where M is the page size) and the blank block on its left has a size of a*M, then on freeing the orange block the user gets an (a+2)*M sized contiguous block, as shown in Figure 12. Each page has a bitmap that identifies whether it is free or not. The last page in a multi-page block always has a pointer to the head; this is the boundary tag. In the case of a single-page block, the boundary tag can be set either to NULL or to point to the page itself. Using the boundary tag, the user can get to the head of any block, and the head always holds the size of that particular block: a single page has size M, whereas a multi-page block has size a*M, where a is the number of pages in the block. The sketch after Figure 12 shows how the head is located.

Figure 12: Memory block after free blocks merging
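Given these conventions, getting to the head of any block is a short lookup. The sketch below uses the Cell structure that section 6 introduces, whose header field is the boundary tag; the function name is hypothetical.

// Return the first page of the block to which 'page' belongs.
Cell* blockHead(Cell* page)
{
   if (page->header == 0 || page->header == page)
      return page;          // single-page block
   return page->header;     // multi-page block: tag points to page 1
}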

Coalescing with the "right" blocks is a bit tricky. First, you have to find out what the "right" cell was pointing to, and make the "cell getting freed" point to what the "right" cell was pointing to. In Figure 13, the purple cell is the one getting freed. The cell to its "right" is free and it is pointing to the next free cell available.

In this case, the user can:

  1. First coalesce this cell with the one to its right, and then coalesce with the left one, OR
  2. Coalesce with the cell to its left first, and then with the right one.

The strategy used here coalesces with the right one first. In Figure 13, block C points to block E, which is at the end of the contiguous memory chunk. On freeing B, it first has to coalesce with block C, and then the coalesced block has to be consolidated further with block A. In the first step (B coalescing with C), B is made to point to block E (the block that C was pointing to until the merge), as shown in Figure 14. The size of block C is added to the size of block B. After that, a test is made to see whether this consolidated chunk (B + C) can be coalesced further with block A (in this case, it can). The result is shown in Figure 15.

Note: The intermediate pointer to E, created during the first merge, gets dropped.

Figure 13: Initial memory block

Figure 14: Memory block after 'right' merge

Figure 15: Memory block after 'left' merge


6. C++ Framework

The backbone of the C++ memory manager discussed here is the MemManager class. It has to be ensured that only one instance of the MemManager class is active at any point in time; in other words, MemManager is made a singleton. The good old technique of using a 'friend function' is used here because it not only ensures a single instance but also ensures that the instance gets created only when the user first tries to allocate space dynamically. Other techniques, such as exposing a static member variable, do not give this lazy, on-first-use construction.

Note: This framework does not discuss making this class thread safe.
class MemManager
{
private:
   MemManager();    // suppress the constructor and copy the
                    // constructor
   MemManager (MemManager&);
   void operator= (MemManager&);
                    // and the overloaded '=' operator

public:
   friend MemManager& getInstance();
};

In the implementation of the "getInstance()" method, the user creates a 'static' local instance and returns a reference to it, as shown in the following code snippet.

MemManager::MemManager()
{
   init();
}
MemManager& getInstance()
{
   static MemManager theInstance;
   return theInstance;
}
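This is the classic construct-on-first-use idiom: the static local is created the first time getInstance() is called, and every later call returns the same object.

// Usage sketch: the first call constructs the instance (running
// init() via the constructor); there is never a second instance.
MemManager& a = getInstance();
MemManager& b = getInstance();
// &a == &b always holds.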

Once the 'singleton framework' is ready, the user has to provide the basic allocation and deallocation routines. The allocation and deallocation routines do all the necessary memory management activities. The init() routine initializes the data structures.

Once the user has ensured that only one instance of the memory manager is alive during execution, the user also has to make sure that no other instance can be created, purposely or accidentally. Suppressing the constructor, copy constructor, and '=' operator by making them 'private' ensures that users can at no point create another instance of this class.

class MemManager
{
private:
   MemManager();    // suppress the constructor
                    // and copy the constructor
   MemManager(MemManager&);
   void operator=(MemManager&);
                    // and the overloaded '=' operator

   void init();     // sets up the free list and bookkeeping
                    // (called from the constructor)

public:
   friend MemManager& getInstance();

   void* allocate(size_t s);
   void deAllocate(void* ptr);
};

C++ provides a facility to intercept all "new" and "delete" calls. By overloading the "global" new, new[], delete, and delete[] operators, the user can redirect all calls that would go to the OS memory management system to the allocate() and deAllocate() routines of the memory management framework.

void* operator new(size_t s)
{
   return getInstance().allocate(s);
}
void operator delete(void* ptr)
{
   return getInstance().deAllocate(ptr);
}
void* operator new[](size_t s)
{
   return getInstance().allocate(s);
}
void operator delete[](void* ptr)
{
   return getInstance().deAllocate(ptr);
}
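With these overloads in place, ordinary client code is redirected transparently; for example:

// Both lines below now go through the memory manager rather than
// the default heap.
char* buf = new char[17];   // -> operator new[] -> allocate(17)
delete[] buf;               // -> operator delete[] -> deAllocate(buf)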

All the information pertaining to a page would be kept in the Cell class. The basic structure of the Cell class would be as shown in the code below.

struct Cell
{
   long    dataSize;      // size of each object in this page (a power of 2)
   void    *data;         // start of the page's data area

   long    noOfElements;  // number of partitions in the page
   long    bitmap;        // bit per partition: 1 = occupied, 0 = free

   Cell    *prev;         // category-chain links
   Cell    *next;
   Cell    *header;       // boundary tag: first page of a multi-page block
};

The data member points to the actual memory location where a newly allocated object is kept. The bitmap member tells whether a particular area within a cell/page is occupied or can hold a new object. The dataSize member gives the size of each object that the page contains (always a power of 2). The header field is used to get to the first page in the case of a multi-page block.


The init() routine has to do a lot of work before the actual allocation-deallocation cycle happens. This involves:

  1. Creating the free list. (The free list is the initial chunk of memory that the application obtains from the system at the beginning. All further allocations and deallocations happen within this chunk.)
  2. Initializing the data structures used for bookkeeping.
  3. Partitioning the free list into cells/pages of equal size.

Figure 16: Data structure for bookkeeping

static char buffer[MAX_BUFFER];
// align it to a word boundary if desired
const int noOfPages = MAX_BUFFER / PAGESIZE;

void MemManager::init()
{
   // 1. Take a big chunk of memory. Preferably a static
   //    block; for example, 'buffer'.
   // 2. Divide it into cells/pages of equal size. This is
   //    done by making the 'Cell' data structures point
   //    into the buffer. ('cells' and 'categories' are
   //    assumed member arrays of Cell and CellCategory.)
   for (int i = 0; i < noOfPages; i++)
      cells[i].data = buffer + i * PAGESIZE;

   cells[0].dataSize = noOfPages * PAGESIZE;
   // This is required because, initially, it's one big
   // chunk/block of data. Later it gets fragmented into
   // smaller chunks.

   // 3. Also set the category lists to null, as they
   //    initially do not contain any page.
   for (int j = 0; j < noOfCategories; j++)
      categories[j].head = categories[j].tail = 0;
}

Figure 17: Bitmap indicating free and allocated blocks


In this case, the cell/page holds at most four objects. The bitmap in Figure 17 shows that, of the four partitions, partitions 1 and 3 are occupied and partitions 2 and 4 are not. Assuming that the size of each cell is 256 bytes, the size of each inner partition is 256/4 = 64 bytes, and the size of the objects allocated in each partition ranges from 33 to 64 bytes. This results in fine-grained fragmentation, the worst case being an object of 33 bytes (which wastes 31 bytes).

The categorization and chaining of pages of the same 'size' is done using the CellCategory class.

class CellCategory
{
   Cell *head;
   Cell *tail;
};

Figure 18: Memory structure

Any call for dynamic memory, say "new char[17]", gets intercepted by the allocate() routine of the memory manager. The allocate() routine first identifies the category into which the object falls. As is visible from Figure 18, the category is 2; in other words, objects of size 17 to 32 bytes. If the page at the head of that category has free space, the new object is assigned to that location and its address is returned. In addition, the bitmap is masked accordingly.

Figure 19: Memory structure and bitmap showing free and allocated blocks

Once the head page/cell is full, it is moved to the end of the category list, as sketched below. This ensures that either a page with free space is available at the beginning of the list, or the memory manager has to allocate a fresh page from the free list.
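A minimal sketch of that move, assuming the category chain is a non-circular doubly linked list built on Cell's prev/next links, and that the manager can reach CellCategory's head and tail members (for example, as a friend):

// Detach the full page at the head of a category chain and append
// it at the tail, so any page with free space stays at the head.
void moveHeadToTail(CellCategory& cat)
{
   Cell* page = cat.head;
   if (page == 0 || page == cat.tail)
      return;                  // empty chain or single page

   cat.head = page->next;      // unlink from the front
   cat.head->prev = 0;

   page->prev = cat.tail;      // relink at the back
   page->next = 0;
   cat.tail->next = page;
   cat.tail = page;
}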

In general, the allocate() routine does the following.

void* MemManager::allocate(size_t s)
{
   // 1. Find the category into which the object falls.
   // 2. If a free 'partition' is available at the 'head',
   //    grab it immediately.
   // 3. If not, grab a new cell/page from the free list.
   //    Put it at the 'head' of the category list.
   // 4. Allocate space for the object from the cell.
   // 5. Set the bitmap accordingly.
   // 6. Return the address.
}

Multi-page/multi-cell blocks and single-page blocks have to be taken care of separately.

void* MemManager::allocate(size_t s)
{
   // 1. Find the category into which the object falls.
   // 2. If the object needs a whole page, or multiple pages,
   //    grab fresh cells/pages from the free list.
   // 3. Add them to the category list.
   // 4. If the object doesn't span a whole page or multiple
   //    pages, then if a free 'partition' is available at the
   //    'head', grab it immediately.
   // 5. If not, grab a new cell/page from the free list.
   //    Put it at the 'head' of the category list.
   // 6. Allocate space for the object from the cell.
   // 7. Set the bitmap accordingly.
   // 8. Return the address.
}

Any freeing of dynamically created memory, say "delete[] obj", gets intercepted by the deAllocate() routine of the memory manager. The deAllocate() routine first identifies the cell (and hence the category) to which the object belongs. This is easy because the size of the cell/page is always fixed; the cell number can be obtained with simple arithmetic.

CellNum = (objectAddress - baseAddress) / CELLSIZE;

Once this is known, the active cell boundary can be obtained by the following arithmetic.

boundaryAddress = baseAddress + CellNum*CELLSIZE;

With all this information in hand, freeing an allocated space is very simple: set the bitmap to '0' for that particular location, as sketched below.
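A sketch of that step, assuming 'cell' and 'boundaryAddress' come from the arithmetic above (the function name is hypothetical):

// Clear the bitmap bit for the partition being freed. dataSize is
// the power-of-2 partition size stored in the Cell.
void markFree(Cell* cell, char* boundaryAddress, char* ptr)
{
   long partition = (ptr - boundaryAddress) / cell->dataSize;
   cell->bitmap &= ~(1L << partition);   // 0 = free
}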

Figure 20: Memory structure and bitmap before 'free'

Figure 21: Memory structure and bitmap after 'free'

It is quite likely that, once the object is freed from the 'Cell', the complete 'cell/page' is free. Once a cell/page is completely free, it has to be returned to the free list. This is also done by the deAllocate() routine.

void MemManager::deAllocate(void* ptr)
{
   // 1. Find the category and the cell to which the
   //    object belongs.
   // 2. Set the bitmap to '0' for the partition being freed.
   // 3. If the whole cell/page is free, delete the page from
   //    the category list and return it to the free list.
   // 4. If not, put the page at the 'head' of the list, as
   //    this guarantees that there is always a free cell at
   //    the head of the list.
}

void MemManager::deAllocate(void* ptr)
{
   // 1. Find the category and the cell to which the object belongs.
   // 2. Set the bitmap to '0' for the partition being freed.
   // 3. If the whole cell/page is free, delete the page from
   //    the category list and return it to the free list.
   // 3-1. During the return to the free list, try to coalesce
   //      it with the adjacent blocks.
   // 4. If not, put the page at the 'head' of the list, as
   //    this guarantees that there is always a free cell at the
   //    head of the list.
}


void MemManager::coalesce(void* ptr)
{
   // 1. Check whether the immediately previous block is free.
   //    Let us call it 'p'.
   // 2. If it's free, the merge is simple: just add the size of
   //    the current block (let's call it 'curr') to the
   //    previous block's capacity.
   //    i.e., p->size += curr->size; curr = p;

   // 3. Next, check whether the adjacent 'next' block is free.
   //    Let us call it 'n'.
   // 4. If the next block is free, it gets a bit more involved.
   // 4-1. Find the 'next' block of 'n'. Make that the 'next'
   //      of the 'curr' block.
   //      i.e., curr->next = n->next;
   //      curr->size += n->size;
}
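Filling in those comments, one possible shape of the routine is sketched below. The prevOf(), nextOf(), and isFree() helpers are assumptions; the article leaves open how physical neighbours are located (in practice, via the boundary tag and the size stored in each block's head).

// A free block as seen by the coalescing step: its size in bytes
// and its free-list link. This mirrors the p/curr/n fields used in
// the comments above; it is a sketch, not the framework's layout.
struct Block
{
   long   size;
   Block* next;
};

Block* prevOf(Block* b);   // physically previous block (assumed helper)
Block* nextOf(Block* b);   // physically next block (assumed helper)
bool   isFree(Block* b);   // bitmap test (assumed helper)

void coalesce(Block* curr)
{
   // Steps 1-2: merge with the previous block if it is free.
   Block* p = prevOf(curr);
   if (p != 0 && isFree(p))
   {
      p->size += curr->size;
      curr = p;
   }

   // Steps 3-4: merge with the next block if it is free, taking
   // over its free-list link and dropping the intermediate pointer.
   Block* n = nextOf(curr);
   if (n != 0 && isFree(n))
   {
      curr->next = n->next;
      curr->size += n->size;
   }
}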

7. Summary

In short, a 'power of 2 buddy-system' algorithm, along with 'separate chains' for objects of particular size ranges, allows the application to achieve very good performance and memory management.


