Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
The Tree Container Library (TCL) is a class template library that you can use to store data elements in a tree-like structure. The TCL consists of four templatized container classes, similar to those found in the STL. These classes allow storage of basic data types or user-defined types to be stored as nodes within a tree structure. Each node of the tree is considered a tree itself, having all the properties of a tree container. So, any operations that can be performed on the tree container likewise can be performed on any node within the tree. Iterators are provided to allow the transversal of the tree nodes. Insert and find operations, as well as various other operations, are also provided. The TCL provides four tree containers that differ according to their intention of use:
- tree: The tree container is used for tree structures in which every sibling node is unique, or rather, every child node of a particular parent can be uniquely distinguished. Non-sibling nodes need not be unique.
- multitree: The multitree container is used for tree structures in which siblings need not be unique, or rather, children which have the same parent node need not be distinguishable.
- unique_tree: The unique_tree container is used for tree structures in which every node in the tree is unique. Because every node in the tree is guaranteed to be unique, the unique tree offers a find_deep() operation, as well as other operations different from tree and multitree.
- sequential_tree: The sequential_tree container is used for tree structures in which the tree nodes remain in the same order in which they are inserted. Any or all nodes may later be sorted, by using a binary predicate or function object.
The tree and multitree are very similar in operation and interface. The difference between the two is much like the difference between the set and multiset in the STL. The unique_tree offers many more features because each node in the tree is unique. For example, the find() operation is available to tree, multitree, and unique_tree, which searches for a matching node contained within a single parent node. The unique_tree offers an additional operation, find_deep(), which not only searches the parent node's immediate children but also searches its descendants.
The unique_tree also offers extensions to the common interface for the three tree types. For example, all four trees offer the insert(child) operation, in which a child is inserted in the parent issuing the insert operation. The unique_tree offers an extension to this and other operations. So, the unique_tree provides another insert operation insert(parent, child), which inserts the child in the specified parent node (if found).
tree, multitree, and unique_tree are considered associative tree containers because they all use std::set for internal child node containment. sequential_tree, however, uses a std::vector to store the child nodes. Thus, sequential_tree does not offer find() operations like the associative tree containers do. sequential_tree has its unique operations, however, like multiple sort() operations that can be used to sort any/all of the child nodes within their parent. The associative trees do not need sort() operations because their nodes are ordered naturally.
Like the STL, the TCL uses iterators in many of its operations. Not only can iterators be used to traverse the tree structures in various ways, many of the operations performed on the trees returns an iterator, such as the find() and insert() operations. The most common iterators in the library are the iterator and const_iterator. These iterators are created using the same syntax as iterators in the STL. If a tree container contains objects of the type CMyClass, an iterator would be created as tree<CMyClass>::iterator it;, or tree<CMyClass>::const_iterator it;. These two standard iterators traverse only a parents immediate children.
To traverse a parent's descendant nodes as well as its children, there are three more iterators provided: pre_order_iterator, post_order_iterator, and level_order_iterator. The creation of these descendant iterators uses the same syntax as the standard iterators mentioned above.
Like the iterators of STL, the * and -> operators are overridden to return the reference/pointer to the underlying node. Because of this, the underlying node operations are available to the iterators, such as insert() and find(). All nodes have the same interface as the declared tree in which the nodes reside (remember, nodes are themselves trees). This is a very important concept in the TCL. Every node in a tree structure has the same operations of the tree itself because nodes are in fact the same type as the declared tree that it's in. The only difference between a declared tree container and one of its nodes is the simple fact that the declared tree container doesn't have a parent node (it's the root node).
Using the Tree Container Library (TCL) is much like using the container classes in the STL. The syntax and operations are very similar. First, you will need to decide which of the four tree containers will fit your needs. The difference between the tree containers is described above.
The tree and multitree both take two template parameters, the second being optional. The first parameter represents the data type you will store in the container, henceforth called the stored_type. The stored_type can be either a basic data type, such as an int or a string, or a user defined type (class). The second template parameter represents the node comparison operation, a functor, used in the searching and ordering of the nodes, henceforth called the node_compare_type. If not supplied, this template parameter defaults to std::less<stored_type>. So, if the container stores a basic type such as ints, the node_compare_type will default to std::less<int>. If the container stores class objects, you could either supply a < operation for the class that would be used when the node_compare_type defaults to std::less<stored_type>, or, create and use a functor for the second parameter.
The unique_tree takes three template parameters, the last two being optional. The first two template parameters are the same as with the tree and multitree described above. The third template parameter represents another node comparison operation, called the node_order_compare_type that is used for special iterators called ordered_iterators that can be used to determine an alternate ordering of nodes within their parent. An alternate ordering for child nodes can be handy if the comparison operation used to distinguish unique nodes within the unique_tree is not the desired ordering you want to be used to traverse child nodes within a single parent.
The sequential_tree takes a single template parameter, which is required. This parameter represents the data type you will store in the container, stored_type, discussed above. Because this container does not naturally order the nodes within their parent, there is not a node_compare_type that is needed for sequential_tree. The nodes remain in the order in which they were inserted into the tree. This container does, however, offer sorting operations for any or all of the nodes, so the order of the nodes can be changed if desired.
Given the above information, the following are simple declarations of the four tree containers:
- tree<std::string> my_tree;
- multitree<int> my_tree
- unique_tree<CMyClass> my_tree;
- sequential_tree<std::string> my_tree;
The TCL allows for polymorphic storage without the need to store pointers in the tree container. This means that you can store objects of stored_type and class objects derived from stored_type, and those objects will keep their original identity. To do this, you must supply a clone function to the tree container. The container uses the clone function to copy the stored_type or objects derived from stored_type, rather than using the stored_type copy constuctor.
The demo download contains a number of files. The files are all part of four sample console programs. The following list details the four demo programs and their associated files.
- Generic Example: This example shows all three associative tree containers at work. It demonstrates insertion and the use of child_iterators and descendant iterators. It also demonstrates the difference between the three types of associative tree containers, by attempting to insert duplicate nodes.
- generic_example.cpp: The CPP file to use in your console project.
- generic_example_explanation.rtf: An RTF file that explains the example code in detail.
- generic_example_diagram.jpg: A diagram depicting the tree structure used in the example.
- generic_example_results.jpg: A screenshot of the console results you should expect to see.
- unique_tree Example: This example demonstrates the operations and features of the unique_tree because the unique_tree offers features that the tree, multitree, and sequential_tree do not.
- unique_tree_example.cpp: The CPP file to use in your console project.
- unique_tree_explanation.rtf: An RTF file that explains the example code in detail.
- unique_tree_example_diagram.jpg: A diagram depicting the tree structure used in the example.
- unique_tree_example_results.jpg: A screenshot of the console results you should expect to see.
- Polymorphic Example: This example demonstrates the polymorphic nature of the associative tree containers by inserting derived class objects of the declared tree type.
- polymorphic_example.cpp: The CPP file to be used in your console project.
- polymorphic_example_explanation.rtf: An RTF file that explains the example code in detail.
- polymorphic_example_diagram.jpg: A diagram depicting the tree structure used in the example.
- polymorphic_example_results.jpg: A screenshot of the console results you should expect to see.
- sequential_tree example
- sequential_tree_example.cpp: The CPP file to be used in your console project.
- sequential_tree_example_explanation.rtf: An RTF file which explains in detail the example code.
- sequential_tree_example_diagram.jpg: A diagram depicting the tree structure used in the example.
- sequential_tree_example_results.jpg: A screenshot of the console results you should expect to see.
If you are using Visual C++ 6.0, you will need to uncheck the Enable minimal rebuild option in the project for successful compilation.
The following operations describe the public interface that is available to all four tree containers in the TCL. Many of the basic functions return child iterators. All occurences of the term iterator below refer to the appropriate type of child iterator for the specific tree container or tree_type. Both child iterators, iterator and const_iterator, are bi-directional iterators.
In the function signatures below, stored_type is used to refer to the type of objects being stored in the tree container. The term node_compare_type is used to refer to the function object that provides the comparison operator for the tree nodes.
The following list is categorized for easy reference.
- iterator insert(const stored_type& stored_obj): This is the standard insertion function for all trees. It's overridden in each class, and each in turn calls the similar function in the base class. The parameter is a reference to the stored object being stored in the node. This function is polymorphic, and allows the insertion of objects derived from stored_type. The object is inserted into a child node of the parent in which the function was called. An iterator is returned that points to the inserted node. For tree and unique_tree, if the stored_type object already exists as a child node, the object is not inserted, and the end iterator is returned. For sequential_tree, this function inserts the object as the last child in the parent.
- iterator insert(const tree_type& tree_obj): This function inserts a tree into another tree of the same type. All descendants of the inserted tree are inserted in the process.
- void erase(iterator it):
This operation is available for all trees. It accepts an iterator that points to a child node within a parent to erase.
- bool erase(stored_type& stored_obj): This operation is available for the associative trees only, and not for sequential_tree.
This function varies slightly depending on the tree container that it's called. For a tree, the immediate children are searched. The return result indicates its success. For a multitree, the immediate children are searched, and all occurences of the object are erased. The return result indicates whether any node was erased. For a unique_tree, all descendants are searched. The return result indicates its success.
- void clear(): Clears (erases) all descendants of the node. The node itself is not erased, and its stored_type object remains intact.
All find() operations are available for the associative trees only.
- iterator find(const stored_type& stored_obj): Searches child nodes for the stored_type object. Returns an iterator pointing to the found node, or the end node if unsuccessful. There is a slight variance in the behavior of the multitree container. Because multitrees can have multiple occurences in a single parent, the iterator will return a pointer to the first found occurrence, or the end iterator if not found. Remember that, when searching nodes, the node_compare_type will determine the equality of the node that's being sought.
- const_iterator find(const stored_type& stored_obj) const: Same as the above function, but returns a const_iterator rather than iterator.
Querying node state
- stored_type get(): Returns a pointer to the stored_type object within the node. Remember that the nodes are not themselves objects of store_type, but rather contain objects of those types.
- const stored_type get() const: Returns a const pointer to the stored_type object within the node.
- tree_type* parent(): Returns a pointer to the parent node, or NULL if there is no parent node (indicating that this is the root node).
- bool is_root(): Indicates whether node is the root node in the tree structure.
- bool empty(): Indicates whether or not the node has children. Does not care about the stored_type within the node itself.
- int size(): Returns the number of child nodes within the node. Only counts the immediate children, not the descendants.
Setting node state
- void set(const stored_type& stored_obj): Sets the stored_type object in a node. This function is polymorphic, so the passed type can either be a stored_type or a type derived from stored_type.
- void set(const tree_type& tree_obj): Sets the node to the passed tree object. All descendants of the passed tree object will afterwards be descendants of the node. A copy is made of the passed tree object and all its descendants.
- void for_each(void (*pFcn)(tree_type*)): Calls pFcn(tree_type*) on the node and all its descendants. Uses a pre_order_iterator internally to traverse the descendants.
- void for_each(T& func_ob) Same as the function above, but a function object is passed, rather than a pointer to a function.
- iterator begin(): Returns an iterator pointing to the first child in the node. If the node is empty, returns the end iterator.
- const_iterator begin() const: Same as above, but returns a const_iterator rather than iterator.
- iterator end(): Returns an iterator pointing to past the last child node. Iterators can be checked against the end iterator to see whether an operation succeeded, or whether they have traversed through all the children.
- const_iterator end() const: Same as above, but returns a const_iterator rather than iterator.
- pre_order_iterator pre_order_begin(): Returns the beginning pre_order_iterator for the node. pre_order_begin() will by nature return an iterator pointing to the first immediate child, if any.
- const_pre_order_iterator pre_order_begin() const: Same as above, but returns a const_pre_order_iterator
- pre_order_iterator pre_order_end(): Returns the ending pre_order_iterator for the node. When using the pre_order_iterator, it must be checked against pre_order_end() to know when all the descendants have been traversed.
- const_pre_order_iterator pre_order_end() const: Same as above, but returns a const_pre_order_iterator
- post_order_iterator post_order_begin(): Returns the beginning post_order_iterator for the node. post_order_begin() will, by nature, return an iterator pointing to the deepest first descendant node, if any.
- const_post_order_iterator post_order_begin() const: Same as above, but returns a const_post_order_iterator
- post_order_iterator post_order_end(): Returns the ending post_order_iterator for the node. When using the post_order_iterator, it must be checked against post_order_end() to know when all the descendants have been traversed.
- const_post_order_iterator post_order_end() const: Same as above, but returns a const_post_order_iterator.
- level_order_iterator level_order_begin(): Returns the beginning level_order_iterator for the node. level_order_begin() will, by nature, return an iterator pointing to the first immediate child, if any.
- const_level_order_iterator level_order_begin() const: Same as above, but returns a const_level_order_iterator.
- level_order_iterator level_order_end(): Returns the ending level_order_iterator for the node. When using the level_order_iterator, it must be checked against level_order_end() to know when all the descendants have been traversed.
- const_level_order_iterator level_order_end() const: Same as above, but returns a const_level_order_iterator