Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
Environment: Developed in (but not restricted to) VC6
I won't go into the gory details of Binary Tree Theory as this has already been discussed by Per Nilsson in his "Balanced Binary Tree" submission found here.
Suffice to say that this class is just a different implementation on the same theme.
Creating the Tree
To create the Binary Tree simply create a CSimpleBinaryTree object and initialise it.
btree.Initialise(MAXELEMS,sizeof(CSomeObject), someCompareFunction, nGrowTrigger, nGrowByValue, nShrinkTrigger, nShrinkByValue);
The "someCompareFunction" is a pointer to a function that takes two pointer parameters and returns a negative value, 0 or a positive value depending on whether the 1st parameter is less than, equal to or greater than the 2nd parameter respectively. CSimpleBinaryTree provides a generic global compare function that compares two pointers of type long.
To add an item to the tree, simply call AddItem() and give it a pointer to the type of object being stored. Internally the data referenced by the pointer is copied into dynamically allocated memory.
The AddItem() function returns two values. The first is the integer result. If the item was added successfully then the index of the item in the tree is returned, if not then one of the following error values is returned:
The second value returned is to the optional second parameter. If the operation was successful then a pointer to the newly allocated object is returned, if the operation was unsuccessful then the pointer is set to NULL;
The return value DUPLICATE_FOUND is only returned if the public member variable m_bAllowDuplicates is set to FALSE otherwise the tree will allow duplicate values (the default).
If duplicates are not allowed then the tree is searched every time an item is added to see whether it exists already. This slows down the speed of adding items but does potentially save on the amount of memory allocated.
To remove an item simply call the RemoveItem() function and feed it the index of the item you want to remove.
If the item was successfully removed from the tree, the function returns TRUE, otherwise it returns FALSE;
When an item is removed from the tree all elements above the said item are shifted left one place and the "total number of items" counter decremented.
This isn't very efficient as, potentially, the function might have to loop through the whole tree. However adding and removing elements from binary trees is inherently slower than other storage/retrieval algorithms due to the fact that binary search trees require the elements be sorted.
Searching the Tree
To find if an item exists in the tree, simply call the FindItem() function.
nIndex = btree.FindItem(psObj);
nIndex = btree.FindItem(psObj, pFoundObject);
FindItem() returns two values. The first is the index of the item in the tree if it exists or the ITEM_NOT_PRESENT value if it does not. The second value is returned to the optional second parameter. If the item is found then pFoundObject will point to the object stored in the tree, if the item was not found then pFoundObject will be set to NULL;
There are two searching algorithms implemented in the CSimpleBinaryTree class. A linear search and a binary search. The linear search is used up to the point when the number of items in the tree exceeds the threshold switching value (default is 10), thereafter the searching is done using the binary search algorithm. The reason for this is that a linear search doesn't require the elements to be sorted and its algorithm is a lot less complicated. So for a small number of items a linear search is ideal.
If duplicates are allowed (m_bAllowDuplicates=TRUE) in the tree then balancing (all items sorted) is done only on an "as needed" basis as opposed to when m_bAllowDuplicates=FALSE and the items are sorted everytime an item is added. Instead, sorting may occur within a call to the FindItem() function. When searching for an item using FindItem() and the binary search algorithm is being used, the search may fail even if the item exists in the tree. This happens if the tree is not fully balanced. In this case the algorithm searchs for the item and fails, detects that the tree is not fully balanced, balances it by sorting the items in the tree and calls FindItem() again recursively. Once the tree is balanced an internal flag is set and is only unset when new items are added to the tree, thus avoiding any recursive loops. Therefore all subsequent calls to FindItem() will avoid this double call mechanism. To the programmer there is no need to take any special action. This is just a note about the internal implementation of the class.
It is hoped that this increases the efficiency in the case where duplicates are allowed as there is no need to call SortItems() every time a new item is added to the tree. The sorting algorithm used is the C library function qsort().
To increase the size of the binary tree to allow more items to be added or to reduce the number of items allowed in the tree use the ReSize() method.
The call to ReSize() above sets the maximum number of elements allowed in the tree to 300. However if the tree contained say 220 items and it's maximum allowed number of items was 250 and we called btree.ReSize(100), then the first 100 items of the tree would be transferred internally to a new pointer array and the subsequent 120 items would be removed from the tree and the memory they occupied would be freed.
By setting the public member variable m_bAutoResize to TRUE (the default), the tree is automatically resized using the parameters contained in the member variables controlling Growing and Shrinking.
Growing and Shrinking
Growing and shrinking parameters control the size of the tree when the public member variable m_bAutoResize is set to TRUE. There are two variables for each operation. A Trigger and a GrowBy/ShrinkBy value. All these values are represented as percentages of the current tree size.
The trigger value determines when either a grow or shrink operation will occur. The GrowBy/ShrinkBy percentages determine to what extent the tree will grow or shrink.
For example if I have 80 items in the tree where the max allowed number of items is 100 and m_bAutoResize=TRUE. Also the GrowTrigger is set to 90 (90%) and the ShrinkTrigger is set to 10 (10%) and both the GrowBy and ShrinkBy values are 50 (50%).
If I add 11 more items to the tree, the tree will become 91% full and cause the GrowTrigger to fire. This then increases the tree by 50%, i.e. internally it calls ReSize(150). Similarly if I removed 71 of the 80 items the tree would be only 9% full and the ShrinkTrigger would fire causing the tree to shrink by 50%, i.e. internally it calls ReSize(50);
Resizing is a costly operation and should be avoided as much as possible. The default values for the grow/shrink trigger and the growby/shrinkby values are given in the example above.
Summary of Public Methods
|AddItem()||Adds a new item to the BTree by allocating space for the new item and adding the pointer to the BTree array. Returns negative value on error.|
|RemoveItem()||Removes an Item from the BTree. Item is deleted and all pointers above this position in the array are shifted left 1 place. Item count member variable is updated accordingly. Returns TRUE if nItem is in the range of valid indexes.|
|FindItem()||Decides which search method to use to find an item. If the Array size is below the Binary Search Threshold then a linear search is performed instead of a Binary Search.|
Shrinks or Grows the BTree according to the new Size indicated by nNewSize. If nNewSize equals the current size or no more memory can be allocated the function returns FALSE otherwise returns TRUE.
|Initialise()||Initialisation function MUST be called after object is created. Sets internal member variables and allocates memory for BTree array.|
|GetSearchMethodThreshold()||Gets the current search method threshold value.|
|GetTotalItems()||Gets the number of items currently stored in the tree.|
|GetTreeSize()||Gets the maximum number of items the tree can hold.|
|GetShrinkTriggerValue()||Returns the current value at which a Shrink Resize is triggered|
|SetShrinkTriggerValue()||Sets the percentage value of the Shrink Trigger. When the Tree is using less than nPercentageUsed of its current space, the tree is truncated by automatically resizing if and only if AutoResizing is enabled. The default value for the ShrinkTrigger is 10%|
|GetShrinkByPercentage()||Returns the current ShrinkBy Percentage|
|SetShrinkByPercentage()||Sets the percentage value by which the tree will shrink if AutoResize is enabled. When AutoResize is enabled and a Resize is triggered the tree will deallocate nPercentDecrease empty items from the tree.|
|GetGrowTriggerValue()||Returns the current value at which a Growth Resize is triggered|
|SetGrowTriggerValue()||Sets the percentage value of the Grow Trigger. When the Tree has allocated nPercentageUsed of its current space, the tree is expanded by automatically resizing if and only if AutoResizing is enabled. The default value for the GrowTrigger is 90%|
|GetGrowByPercentage()||Returns the current GrowBy Percentage|
|SetGrowByPercentage()||Sets the percentage value by which the tree will grow if AutoResize is enabled. When AutoResize is enabled and a Resize is triggered the tree will allocate nPercentIncrease items to the tree|
Frees up any heap memory allocated for the BTree array and items contained there within. User should never have to call this unless they need memory in a hurry and are willing to destroy the BTree.
Better to let the BTree go out of scope and have the default destructor call FreeMemory() automatically.
|SetSearchMethodThreshold()||Sets the threshold value at which searching will be done using the binary search method instead of the linear search method. The The search method is switched when the number of items in the tree exceeds the threshold value. The default value is 10.|
|GetItemPtr()||Returns a void* which points to the data item of the BTree indexed by nItem. Returns NULL if nItem index is outside the valid range.|
|GenericLongCompare()||Global generic 'compare' function provided for the user. Takes two longs and compares them. Returns 0 if equal, negative int if a<b, postive int if a>b.|
|DoTest()||Provides example usage of the functions contained in the CSimpleBinarySearch Class. MUST define BINARY_TREE_EXAMPLE to use this function.|
DownloadsDownload demo project - ~21Kb
Download source - ~5Kb