HybridList - A fast N lg (N) sort algorithm for lists

Management of large dictionaries is always a big problem.  That's why I decided to create an efficient and simple algorithm to manage very long sorted lists.  The HybridList algorithm resembles skip-lists but it differs considerably in basic principles and memory usage.  I hate the idea of allocating memory blocks for an array of pointers and predicting the number of nodes in a nonexistent list.

Main features

  • Search, insert, delete operations require  O(lg n) comparisons in all cases
  • As every insertion sort it is stable - it retains the original ordering of  identical keys 
  • It is in-place algorithm - no additional memory or stack is required
  • Data  can be linked in simple as well as double-linked list
  • Only 4 bytes for 'Next' pointer are required in user data structure

Concept

There are two types of nodes: user bottom-nodes (blue circles) and top-nodes (green rectangles) that  create a kind of hierarchy. When user adds/removes bottom-node to/from the list, 'parent' top-node on the first level is checked to keep 10-30 bottom nodes. If node has more than 30 nodes, then new node is created and both nodes have 15 subnodes. If node has less than 10 subnodes, its right neighbor is removed. Therefore hierarchy is always quite balanced.

QList_scheme.gif (6718 bytes)

Simple example

  #include "HybridList.h"

  class SampleNode : public HListNode <SampleNode>
  {
    // defines Prev,Next members
    public:
      int Value;  // random value to sort
      SampleNode() { Value = rand(); }
      bool operator ==(SampleNode& comp) {return Value ==comp.Value;}
      bool operator > (SampleNode& comp) {return Value > comp.Value;}
      bool operator < (SampleNode& comp) {return Value < comp.Value;}
  };

  bool TEST()
  {
    HybridList<SampleNode> List;
    for (int i=0; i < 1000; i++)
        List.InsertSorted (new SampleNode);
    SampleNode to_find;
    SampleNode * first_equal = List.FindEqual(to_find);
    while (true)
        {
          SampleNode * head = List.RemoveHead();
          if (!head)
             break; // end of list achieved
          printf ("%ld->",head->Value);
          delete head;
        }
    return true;
  }

Speed

There is no sense to compare O(lg n) algorithm with O(n) algorithm. But even  O(lg n) algorithms, like binary trees, red-black trees, skip-lists have different speed due to balancing techniques, memory usage, comparison function, etc. My preliminary tests of red-black tree, that seems to be the fastest algorithm for dictionaries, shows that HybridList is more than twice as fast. But it seems to me to be not so important with regard to memory usage issue: HybridList node require only additional 4 byte for 'Next' pointer and red-black node require 12-16 bytes.

More details  you can find on http://members.tripod.com/~DanKozub. Timing diagram and comparison with Quicksort and test exe-file are also available there.

Source

The algorithm is implemented in C++ . In principle it does not require any additional libraries or system support. But I think some insignificant changes will be needed to make it run on other platforms than Windows 95/NT.

Download source   HybridList.zip (7 KB)



Comments

  • could you help me?

    Posted by zhoeng on 10/23/2005 08:56am

    i can not download the source code

    • Download Failed

      Posted by Promotional Engine on 08/24/2006 09:39pm

      There isn't any file to download.

      Reply
    Reply
  • Good for tree with this mode !!!!

    Posted by Legacy on 03/15/2002 12:00am

    Originally posted by: D

    I do not care this good or other good!
    I want a tree structure like folders,
    do not tell me to use CTreeCtrl, that is a wnd, not a structure.
    also i do not need sort, i only need search and display part of them.
    but it have a lot of elements and is dynamic!

    i think this is good for me!

    Reply
  • Simple skip list

    Posted by Legacy on 04/10/2001 12:00am

    Originally posted by: Boris

    That data type is called skip list and is widely discussed in various literature

    Reply
  • Agreement with chris..

    Posted by Legacy on 07/29/1999 12:00am

    Originally posted by: Brad Hochgesang

    I have to agree with Chris on this one. The stl is meant as a very genaric class to provide basic functionality. Any good Computer Scientist knows that an extreemly generic function isn't always the most efficient one. Most of the time, it isn't. For instance, I had an application the required dynamic allocation of lists that needed to be maintained in order to provide a functional binary search. There wasn't any efficient way to do it using a Vector or a List, since their [] operators are fairly slow compared to that of an array. I had large data structures. Very large structures, and couldn't waste memory by trying to allocating them in an array. So, I made an array of pointers, which are much much smaller, and used them to point to the corresponding address of my structure. For my aplication, that was a much more efficient way to do it (since I have to search thousands of times in a row). When I used a vector (which I did try) my searching took over ten times as long.

    You see, encapsulation is a wonderful concept. Program it once, and forget about it.... But that is equivelant to a factory manager in the 70's when computers were coming out. They already had their plants set up and running efficiently. Why should they change what's working? Simple, because there is always a better way!

    Excellent linked list class...

    Thank you.

    Reply
  • Ponderings ...

    Posted by Legacy on 06/16/1999 12:00am

    Originally posted by: Chris K

    What does this have to do with the differences between MFC and STL?

    The previous two guys took this opportunity to bash MFC users as lazy in learning STL. Are you against MFC? This is an MFC forum, is it not? I don't understand.

    Anyway this is a templated set of classes that sort, allegedly faster than the existing stl classes. I don't see anything wrong with these exercises. Its not reinventing if your improving. Plus it doesn't use or require MFC that I can tell.

    Please, view the source and test what's offered before you steal the moment to make comments that truly are unrelated.

    A lot of people aren't completely satisfied with all of the implementations in the standard, and recommending that people just accept what's there is bad advice. We might miss out on some innovation if we stick with this advice.
    On the other hand, recommending that the STL be checked out first IMO is good advice. But, assuming that just because someone is building a class or set of classes that exists int the STL, that they haven't checked the standard first, is a bad one.

    This is a decent piece of work.

    Speaking for myself, I don't feel that religious about either MFC or STL, they both have their uses and drawbacks.

    Reply
  • Pondering on STL

    Posted by Legacy on 03/31/1999 12:00am

    Originally posted by: xu

    I agree with Brett Calcott completely on the use of STL. I guess too many MFC programmers are so fancinated with MFC that they forget to learn and use STL. In my personal opinion, when it comes to data structures and algorithm, STL is more flexible and portable than MFC. For example, STL's vector is far better than MFC's ill-concocted CArray and its equally ugly designed cousines. Also, some people like to reinvent such wheels as doubly linked list or binary tree, while there are list and set already in STL. Remember folks, STL is C++ standard, so you should be as comfortable with STL as with strcpy or sqrt.

    Reply
  • Ponderings on the STL...

    Posted by Legacy on 03/30/1999 12:00am

    Originally posted by: Brett Calcott

    I am no expert on sorting, and applaud any inventive faster/smaller utility classes, however...
    The STL provides all of this. And its now part of the C++ standard, and its (relatively) portable, and stable.
    More than that (IMHO) its an elegant design, it decouples algorithms and data storage and allow orthogonal extensibility.

    This is not to say that you can't improve these classes. But using them as a model you can implement a far more modular and extensible class. Of course, it takes a lot more effort to create an "STL compliant" data structure.

    <getting on soapbox..>
    its seems that a lot MFC programmers neglect learning this powerful part of the C++ standard.
    <getting off soapbox..>

    Any thoughts...

    Reply
Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • Live Event Date: October 29, 2014 @ 11:00 a.m. ET / 8:00 a.m. PT Are you interested in building a cognitive application using the power of IBM Watson? Need a platform that provides speed and ease for rapidly deploying this application? Join Chris Madison, Watson Solution Architect, as he walks through the process of building a Watson powered application on IBM Bluemix. Chris will talk about the new Watson Services just released on IBM bluemix, but more importantly he will do a step by step cognitive …

  • Companies must routinely transfer files and share data to run their business, work with partners, and speed operations. However, many find the traditional approach to file transfer lacks necessary security, is too complex and difficult to manage, does not support the levels of automation needed, and breaks down when addressing the file transfer requirements of new areas like Big Data analytics and mobile applications. This QuinStreet SmartSelect discusses how the changing business environment is making the use …

Most Popular Programming Stories

More for Developers

Latest Developer Headlines

RSS Feeds