Simple Binary Tree Class

Environment: Developed in (but not restricted to) VC6

I won't go into the gory details of Binary Tree Theory as this has already been discussed by Per Nilsson in his "Balanced Binary Tree" submission found here.

Binary Search Trees are useful for finding items in a list that changes infrequently. Add and Remove operations are typically expensive since Binary Search Trees require that a tree be balanced.

Suffice to say that this class is just a different implementation on the same theme.

Implementation Details

Creating the Tree

To create the Binary Tree simply create a CSimpleBinaryTree object and initialise it.

#define MAXELEMS 100
CSimpleBinaryTree btree;

btree.Initialise(MAXELEMS,sizeof(CSomeObject));
or
btree.Initialise(MAXELEMS,sizeof(CSomeObject), someCompareFunction);
or
btree.Initialise(MAXELEMS,sizeof(CSomeObject), someCompareFunction, nGrowTrigger, nGrowByValue, nShrinkTrigger, nShrinkByValue);

The "someCompareFunction" is a pointer to a function that takes two pointer parameters and returns a negative value, 0 or a positive value depending on whether the 1st parameter is less than, equal to or greater than the 2nd parameter respectively. CSimpleBinaryTree provides a generic global compare function that compares two pointers of type long.

Adding Elements

To add an item to the tree, simply call AddItem() and give it a pointer to the type of object being stored. Internally the data referenced by the pointer is copied into dynamically allocated memory.

CSomeObject sObj;
CSomeObject *psObj1=&sObj;
CSomeObject psObj2;
int nResult;
nResult=btree.AddItem(&sObj);
or
nResult=btree.AddItem(psObj1);
or
nResult=btree.AddItem(psObj1, psObj2);

The AddItem() function returns two values. The first is the integer result. If the item was added successfully then the index of the item in the tree is returned, if not then one of the following error values is returned:

  • DUPLICATE_FOUND
  • OUT_OF_MEMORY
  • TREE_IS_FULL

The second value returned is to the optional second parameter. If the operation was successful then a pointer to the newly allocated object is returned, if the operation was unsuccessful then the pointer is set to NULL;

The return value DUPLICATE_FOUND is only returned if the public member variable m_bAllowDuplicates is set to FALSE otherwise the tree will allow duplicate values (the default).

If duplicates are not allowed then the tree is searched every time an item is added to see whether it exists already. This slows down the speed of adding items but does potentially save on the amount of memory allocated.

Removing Elements

To remove an item simply call the RemoveItem() function and feed it the index of the item you want to remove.

BOOL bResult;
bResult=RemoveItem(nIndex);

If the item was successfully removed from the tree, the function returns TRUE, otherwise it returns FALSE;

When an item is removed from the tree all elements above the said item are shifted left one place and the "total number of items" counter decremented.

This isn't very efficient as, potentially, the function might have to loop through the whole tree. However adding and removing elements from binary trees is inherently slower than other storage/retrieval algorithms due to the fact that binary search trees require the elements be sorted.

Searching the Tree

To find if an item exists in the tree, simply call the FindItem() function.

int nIndex;
nIndex = btree.FindItem(psObj);
or
CSomeObject *pFoundObject;
nIndex = btree.FindItem(psObj, pFoundObject);

FindItem() returns two values. The first is the index of the item in the tree if it exists or the ITEM_NOT_PRESENT value if it does not. The second value is returned to the optional second parameter. If the item is found then pFoundObject will point to the object stored in the tree, if the item was not found then pFoundObject will be set to NULL;

There are two searching algorithms implemented in the CSimpleBinaryTree class. A linear search and a binary search. The linear search is used up to the point when the number of items in the tree exceeds the threshold switching value (default is 10), thereafter the searching is done using the binary search algorithm. The reason for this is that a linear search doesn't require the elements to be sorted and its algorithm is a lot less complicated. So for a small number of items a linear search is ideal.

If duplicates are allowed (m_bAllowDuplicates=TRUE) in the tree then balancing (all items sorted) is done only on an "as needed" basis as opposed to when m_bAllowDuplicates=FALSE and the items are sorted everytime an item is added. Instead, sorting may occur within a call to the FindItem() function. When searching for an item using FindItem() and the binary search algorithm is being used, the search may fail even if the item exists in the tree. This happens if the tree is not fully balanced. In this case the algorithm searchs for the item and fails, detects that the tree is not fully balanced, balances it by sorting the items in the tree and calls FindItem() again recursively. Once the tree is balanced an internal flag is set and is only unset when new items are added to the tree, thus avoiding any recursive loops. Therefore all subsequent calls to FindItem() will avoid this double call mechanism. To the programmer there is no need to take any special action. This is just a note about the internal implementation of the class.

It is hoped that this increases the efficiency in the case where duplicates are allowed as there is no need to call SortItems() every time a new item is added to the tree. The sorting algorithm used is the C library function qsort().

Resizing

To increase the size of the binary tree to allow more items to be added or to reduce the number of items allowed in the tree use the ReSize() method.

btree.ReSize(300);

The call to ReSize() above sets the maximum number of elements allowed in the tree to 300. However if the tree contained say 220 items and it's maximum allowed number of items was 250 and we called btree.ReSize(100), then the first 100 items of the tree would be transferred internally to a new pointer array and the subsequent 120 items would be removed from the tree and the memory they occupied would be freed.

By setting the public member variable m_bAutoResize to TRUE (the default), the tree is automatically resized using the parameters contained in the member variables controlling Growing and Shrinking.

Growing and Shrinking

Growing and shrinking parameters control the size of the tree when the public member variable m_bAutoResize is set to TRUE. There are two variables for each operation. A Trigger and a GrowBy/ShrinkBy value. All these values are represented as percentages of the current tree size.

The trigger value determines when either a grow or shrink operation will occur. The GrowBy/ShrinkBy percentages determine to what extent the tree will grow or shrink.

For example if I have 80 items in the tree where the max allowed number of items is 100 and m_bAutoResize=TRUE. Also the GrowTrigger is set to 90 (90%) and the ShrinkTrigger is set to 10 (10%) and both the GrowBy and ShrinkBy values are 50 (50%).

If I add 11 more items to the tree, the tree will become 91% full and cause the GrowTrigger to fire. This then increases the tree by 50%, i.e. internally it calls ReSize(150). Similarly if I removed 71 of the 80 items the tree would be only 9% full and the ShrinkTrigger would fire causing the tree to shrink by 50%, i.e. internally it calls ReSize(50);

Resizing is a costly operation and should be avoided as much as possible. The default values for the grow/shrink trigger and the growby/shrinkby values are given in the example above.

Summary of Public Methods

AddItem() Adds a new item to the BTree by allocating space for the new item and adding the pointer to the BTree array. Returns negative value on error.
RemoveItem() Removes an Item from the BTree. Item is deleted and all pointers above this position in the array are shifted left 1 place. Item count member variable is updated accordingly. Returns TRUE if nItem is in the range of valid indexes.
FindItem() Decides which search method to use to find an item. If the Array size is below the Binary Search Threshold then a linear search is performed instead of a Binary Search.
ReSize()

Shrinks or Grows the BTree according to the new Size indicated by nNewSize. If nNewSize equals the current size or no more memory can be allocated the function returns FALSE otherwise returns TRUE.

Initialise() Initialisation function MUST be called after object is created. Sets internal member variables and allocates memory for BTree array.
GetSearchMethodThreshold() Gets the current search method threshold value.
GetTotalItems() Gets the number of items currently stored in the tree.
GetTreeSize() Gets the maximum number of items the tree can hold.
GetShrinkTriggerValue() Returns the current value at which a Shrink Resize is triggered
SetShrinkTriggerValue() Sets the percentage value of the Shrink Trigger. When the Tree is using less than nPercentageUsed of its current space, the tree is truncated by automatically resizing if and only if AutoResizing is enabled. The default value for the ShrinkTrigger is 10%
GetShrinkByPercentage() Returns the current ShrinkBy Percentage
SetShrinkByPercentage() Sets the percentage value by which the tree will shrink if AutoResize is enabled. When AutoResize is enabled and a Resize is triggered the tree will deallocate nPercentDecrease empty items from the tree.
GetGrowTriggerValue() Returns the current value at which a Growth Resize is triggered
SetGrowTriggerValue() Sets the percentage value of the Grow Trigger. When the Tree has allocated nPercentageUsed of its current space, the tree is expanded by automatically resizing if and only if AutoResizing is enabled. The default value for the GrowTrigger is 90%
GetGrowByPercentage() Returns the current GrowBy Percentage
SetGrowByPercentage() Sets the percentage value by which the tree will grow if AutoResize is enabled. When AutoResize is enabled and a Resize is triggered the tree will allocate nPercentIncrease items to the tree
FreeMemory()

Frees up any heap memory allocated for the BTree array and items contained there within. User should never have to call this unless they need memory in a hurry and are willing to destroy the BTree.

Better to let the BTree go out of scope and have the default destructor call FreeMemory() automatically.

SetSearchMethodThreshold() Sets the threshold value at which searching will be done using the binary search method instead of the linear search method. The The search method is switched when the number of items in the tree exceeds the threshold value. The default value is 10.
GetItemPtr() Returns a void* which points to the data item of the BTree indexed by nItem. Returns NULL if nItem index is outside the valid range.
GenericLongCompare() Global generic 'compare' function provided for the user. Takes two longs and compares them. Returns 0 if equal, negative int if a<b, postive int if a>b.
DoTest() Provides example usage of the functions contained in the CSimpleBinarySearch Class. MUST define BINARY_TREE_EXAMPLE to use this function.

Downloads

Download demo project - ~21Kb
Download source - ~5Kb


Comments

  • Practical application?

    Posted by Legacy on 10/13/2002 12:00am

    Originally posted by: Sergey

    There are two interested articles about BBTs (for beginners). Who study programinig language in high school , should know this theory. But what about practice? Databases have frequently changing lists of data. How often trees and balancing functions are used in databases? In which "computing branches" this theory is in use?

    P.S. simple calculations:
    1) I have simple sorted list (16 000 000 entries). For finding element in worse case I have to make 24 comparsions.
    2) I have Balanced Binary Tree (16 000 000 entries). For finding the same element in worse I will need 11 comparsions.

    So, now I have to choose between speed for searching (2X slower) and between complicity of maintanence ( 100X easiest ).
    I made last week simple application with 250 000 list of elements. I chose 1 method

    Reply
  • STL already supports this.

    Posted by Legacy on 09/10/2002 12:00am

    Originally posted by: Josh

    This is a nice article and is well thought out.

    However, STL already has a tree feature. If your current version of STL doesn't support this then you can download it at... http://www.sgi.com/tech/stl/download.html

    The current version of SGI's STL doesn't support all platforms. If you are concerned about this then you can use "STL Port" which wraps SGI's STL. This can be downloaded at... http://www.stlport.org

    Reply
  • Small bug in AddItem

    Posted by Legacy on 09/05/2002 12:00am

    Originally posted by: Steve V

    Your implementation was very useful. Thanks! However, I found a small bug.

    In the AddItem method, the items are sorted before the m_nTotalItems is incremented. This means that the last item added is not sorted into the list. The next time the list is sorted, this corrects itself. However, if it is out of order and you immediately add a duplicate of the previous item, it will be added even though m_bAllowDuplicates is FALSE. There can never be more then one duplicate, but I think this proposed code change will fix the problem.

    ...
    if((m_BTree[m_nLastItemIndex]=malloc(m_nSizeOfItem))!=NULL){
    memcpy(m_BTree[m_nLastItemIndex], pNewData, m_nSizeOfItem);
    // This line moved to before the sort
    m_nTotalItems++;
    if(!m_bAllowDuplicates && m_nTotalItems>=m_nDoBSThreshold)
    SortItems();
    else
    m_bFullySorted=FALSE;
    //Now m_Btree[m_LastItemIndex] is NOT guaranteed to be
    //the last item added. So we search for new item. However,
    //if duplicates are allowed, this will select the first
    //occurence of the new data.
    if ((nNewIndex = FindItem(pNewData))) {
    pStoredObject=m_BTree[nNewIndex];
    }
    else {
    // return ERROR_FINDING_NEW_OBJECT
    }
    nResult=m_nLastItemIndex++;
    } else
    ...

    Reply
  • Thank you

    Posted by Legacy on 06/11/2002 12:00am

    Originally posted by: Mighty B

    Thank you for putting this stuff up it is heaps more helpful than my textbook, cheers,

    Mighty b.

    Reply
  • As usual..

    Posted by Legacy on 04/13/2002 12:00am

    Originally posted by: Jimmy

    as usual, this is a nice academic article - but in real life, just use the STL.

    Reply
Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • IBM Worklight is a mobile application development platform that lets you extend your business to mobile devices. It is designed to provide an open, comprehensive platform to build, run and manage HTML5, hybrid and native mobile apps.

  • Agile methodologies give development and test teams the ability to build software at a faster rate than ever before. Combining DevOps with hybrid cloud architectures give teams not just the principles, but also the technology necessary to achieve their goals. By combining hybrid cloud and DevOps: IT departments maintain control, visibility, and security Dev/test teams remain agile and collaborative Organizational barriers are broken down Innovation and automation can thrive Download this white paper to …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds