Click to See Complete Forum and Search --> : I need a datastructure: identifiers for sets


seb30
February 21st, 2008, 03:11 PM
Hi,
I wonder if there is an efficient datastructure with an interface like this (For example, in C++. Answers in other programming languages or pseudo languages or book references are welcome, too):

class Set_Id_Factory
{
public:
SetId empty_set();
SetId insert(SetId, Element);
private:
....
};

SetId should be an integral type (or a pointer), the factory may choose them.

The following condition should hold:

insert(...insert(empty_set(), a_1)..., a_n) == insert(...insert(empty_set(), b_1)..., b_m)

where a_1...a_n is the same set of elements as b_1...b_m, but can be in a different order and have different duplication of elements.

I know I can implement it like this (C++ Pseudo-Code!):

vector<set<Element>> id2set;
map<set<Element>, unsigned> set2id;
unsigned insert(unsigned id, Element e)
{
set<Element> s = id2set(id);
s.insert(e);
if(not set2id.has_key(s))
{
id2set.append(s);
set2id[s] = id2set.size();
}
return set2id[s];
}

But I wonder if there is a more efficient solution which doesn't need as much space (doesn't store each element of each set separately) and/or as much time (ideally less than linear with respect to the size of the set s).

Thanks,
Sebastian

TheCPUWizard
February 21st, 2008, 03:20 PM
What is the type of "element". If it is integral, and of a constrained range, then the best implementation is usually a "bit array". This can be done with standard classes, or custom coded. Trivial implementation
(hand written, not tested, assumes 32 bit machine with 32 bit int.

class MyBitSet<SIZE>
{
private:
unsigned int m_Bits[(SIZE+31)/32];

unsigned int Byte(int element) { return element / 32; }
unsidgned int Mask(int element) { return element % 32; }
void Set(int element { m_Bits[Byte(element)] |= Mask(element);
void Clear(nt element { m_Bits[Byte(element)] ^&= ~Mask(element);
void Intersect(MyBitSet<SIZE> const &other)
{
for (int i= 0; i< [(SIZE+31)/32; ++i)
m_Bits[i] &= other.m_Bits[i];
}
}


The rest of the operations should be easy to derive given the above...

seb30
February 22nd, 2008, 11:44 AM
The reason I want a SetId is that I can compare it fast (for example to use it as a key for another map).

To compare bitarrays (which you need if you want to use it as key for a map to SetId) you need not only linear time in the size of the sets, you need linear time in the size of the element range, which is worse.

Sebastian

TheCPUWizard
February 22nd, 2008, 02:04 PM
The reason I want a SetId is that I can compare it fast (for example to use it as a key for another map).

To compare bitarrays (which you need if you want to use it as key for a map to SetId) you need not only linear time in the size of the sets, you need linear time in the size of the element range, which is worse.

Sebastian

1) Ideally the element range should be exactly the same as the max size of the set. (i.e. Union "All").

2) I dont see how the "SetID" would help in comparing two different instances. If you are talking about having two references to the SAME set, then that is a completely different issue, and has absolutely NOTHING to do with sets at all. You can simply compare if the address refered to by the reference is the same.....