Click to See Complete Forum and Search --> : Weighted Sorting


NickR
September 19th, 2005, 11:38 PM
Hey guys,

I'm looking at this particular problem of weighted sorting and group inference. Maybe someone can help me devise an efficient and robust algorithm for it.

The scenario is that I have an array (or any other data collection for that matter) of string values. For example:

a[1] = "home/public/file1"
a[2] = "home/ajax/file2"
a[3] = "home/math/file3"
a[4] = "home/ajax/file4"
a[5] = "home/ajax/file5"
:
:
a[n] = "..."

Now what I want to derieve from the base collection is a set of similar string values. If I sort the array, the values can be grouped but what would be the best solution for filtering/fetching a group of similar values, such that the above example would render a new filtered collection based on weighted sorting:

a[1] = "home/ajax/file2"
a[2] = "home/ajax/file4"
a[3] = "home/ajax/file5"
:
:
a[n] = "..."

I hope the problem description is clear enough. I don't know how else to refer to it except for "weighted sorting". Maybe there's a formal term for it based on lexicography or heuristic patterns. Maybe regular expressions can be applied to it as well.

Maybe you guys have some thoughts or suggestions on solution.

mehdi62b
September 20th, 2005, 12:20 PM
well,what is the diferrence if you just sort them in its usual form because they were string and they would give you the same result.
BTW,if you need abit more complicated conditions for your searching method you should consider other possibilities
(I don't know anything about C++ and STL, but in .NET I make a class(structure) for strings and implement IComparable interface for my structure,furthermore I can implement an IComparare for giving more options to the user for different searching methods, example (http://www.codeguru.com/forum/showpost.php?p=1134114&postcount=13))

NickR
September 21st, 2005, 12:46 AM
I guess the earlier example is not the best, but I'll try explaining with another one:

Input Array (unsorted, 8 elements):

a[1] = "/home/dir1/file1"
a[2] = "/home/dir2/file3"
a[3] = "/home/dir1/file3"
a[4] = "/home/dir2/file2"
a[5] = "/home/dir1/file2"
a[6] = "/home/dir1/file4"
a[7] = "/home/dir2/file1"
a[8] = "/home/dir3/file2"

Output Array (sorted 4 elements):

b[1] = "/home/dir1/file1"
b[2] = "/home/dir1/file2"
b[3] = "/home/dir1/file3"
b[4] = "/home/dir1/file4"

The output array now holds the most number of closely similar string elements. Based on a prefix pattern lookup (the prefix will be the substring till the last slash) the algorithm groups and filters the most similar string values (since /home/dir1 has the most number of prefix occurences). Makes sense guys?!?!

mehdi62b
September 21st, 2005, 12:05 PM
I told you what you should do,you need here a suitable Comparator which determine the order of sorting method according to the count of the words or its alphabetic word,your collection can support every comparable object or could accept any comparator ,it would be sorted with an efficent algoritm like quicksort (n*logn),this is the the most efficent one.
its better to make a structure for your inputs!

Edit:
[C#]public class WordData:IComparable
{
string prefix;
public string Prefix
{
get{return prefix;}
}
ArrayList words;
public ICollection Words
{
get{return words;}
}
public int Count
{
get{return this.words.Count;}
}

public WordData(string word,string prefix)
{
words=new ArrayList();
words.Add(word);
this.prefix=prefix;
}
public void AddNewItem(string word)
{
words.Add(word);
}
#region IComparable Members
public int CompareTo(object obj)
{
WordData theword=(WordData )obj;
return this.words.Count.CompareTo(theword.words.Count);
}
#endregion
public enum SortKind{Perefix,Count};
public class Comparator:IComparer
{
SortKind sortKind;
public Comparator(sortKind SortKind)
{
sortKind=SortKind;
}
#region IComparer Members
public int Compare(object x, object y)
{
WordData wordData1=(WordData )x;
WordData wordData2=(WordData )y;
if(sortKind==SortKind.Count)
return wordData1.Count.CompareTo(wordData2.Count);
else
return wordData1.Prefix.CompareTo(wordData2.Prefix);
}
#endregion
}

}
now make another class which has a collection(hashtable),ittererate through the array and instasiate the WordData's ,give the prefix to the key of the hashtable ,in next itteration if you have the same key just call AddNewItem of the the current Worddata which is accessible from the value property of the hashtable,otherwise instansiate a new WordData and add it to the hashtable
finally assigne the Value property of the hashtable to an arraylist and call its sort method,it would sort according to the nummber of the occurrences(IComparable),
you can give the sort method a comparator which exist as nested class in Worddata so you can select whetere it should be sorted according the number of occurrences or according to prefixes(IComparer)
it implies for .NET but I know definitly Java also have such things,or you can make your own interfaces and collections though.
I hope you get its picture!