Click to See Complete Forum and Search --> : Calculate the median using only counters
jharvey963
October 5th, 2006, 03:54 PM
Somebody must have an algorithm for this. Here's the situation:
I have a stream of integers of known size (12 bit) and known cardinality (i.e., how many integers there are).
I also have 2 counters that will count the numbers <= or >= limit values that I can set (which can be different for the 2 counters).
I can run the stream of integers past the counters as many times as I want (although fewer is better, obviously).
How do I determine the MEDIAN of the integers?
What I DON'T have is any extra storage: i.e., I can't sort the elements or change the elements' locations. All I can do is stream the entire sequence past the counters. I also don't need to know the index of the median value -- just the median value itself.
While I'll be doing this on modern custom hardware, it harkens back to the day of streaming tape drives. If you had a set of integers on a streaming tape drive and had to determine the median value, how would you do it?
My initial thought was to do a "binary search" on the median value; e.g., Set the limit for the first counter at 2^6 and stream the values. If the count is less than the Cardinality / 2 (i.e., N/2), I'd switch the limit value to 2^9, and stream the integers again, etc. Using this method, I can "shrink" the range of my limit value and hone in on the median.
The problems, though:
1. If the sample size is even, I'll have to find both the N/2 and the N/2+1 values and average them.
2. Duplicate values in the stream are allowed.
3. What is the stopping condition?
Anyone know of an algorithm to handle this?
Thanks,
J.
kumaresh_ana
October 6th, 2006, 05:18 AM
median is the value of c such that the quantity E(|X - c|) is minimum. This is what wikipedia tells. here E stands for average and X is your distribution. So, if you can do this, you can find median without sorting the elements or moving them around.
jharvey963
October 6th, 2006, 11:44 AM
median is the value of c such that the quantity E(|X - c|) is minimum. This is what wikipedia tells. here E stands for average and X is your distribution. So, if you can do this, you can find median without sorting the elements or moving them around.
Unfortunately, my hardware doesn't have the ability to do this...
J.
RoboTact
October 6th, 2006, 03:00 PM
The problems, though:
1. If the sample size is even, I'll have to find both the N/2 and the N/2+1 values and average them.
2. Duplicate values in the stream are allowed.
3. What is the stopping condition?
Problems are in specification. When you say 'find median of this set of N integers' you ususlly mean 'find an element at index k in sorted order of these elements' having a specific case of k=N/2. It's most likely what you need.
Now solving this problem optimally may be interesting. When you check condition with given boundaries you obtain the knowledge on number of elements in specified intervals of values. Bisection if far from optimal here. :)
jharvey963
October 6th, 2006, 07:33 PM
Problems are in specification. When you say 'find median of this set of N integers' you ususlly mean 'find an element at index k in sorted order of these elements' having a specific case of k=N/2. It's most likely what you need.
Now solving this problem optimally may be interesting. When you check condition with given boundaries you obtain the knowledge on number of elements in specified intervals of values. Bisection if far from optimal here. :)
The set of integers is not sorted. I'm not interested in the INDEX of the median in the set of integers. I'm interested in the VALUE of the median value. It is not possible to sort the set of integers before looking for the median.
I'm also not interested in an optimal solution, just one that's "good enough".
I'm looking for a "pretty good" algorithm that will make multiple passes through the data set (by streaming the set of integers through the counters) and determine the median value.
J.
RoboTact
October 7th, 2006, 04:55 AM
The set of integers is not sorted. I'm not interested in the INDEX of the median in the set of integers. I'm interested in the VALUE of the median value. It is not possible to sort the set of integers before looking for the median.I didn't say you are to sort numbers, did I? I said that median is usually defined as value of element at given index in sorted order of elements in given set. Sorting is in this case only a mathematical obstraction, you don't need to find sorted order in order to find single k-th smallest element in the set.
http://en.wikipedia.org/wiki/Selection_algorithmI'm also not interested in an optimal solution, just one that's "good enough".Then define 'good enough'... :rolleyes:
kumaresh_ana
October 7th, 2006, 06:13 AM
The constraints are too tight :confused:. The bisection method is the only one I can think of.
My initial thought was to do a "binary search" on the median value; e.g., Set the limit for the first counter at 2^6 and stream the values. If the count is less than the Cardinality / 2 (i.e., N/2), I'd switch the limit value to 2^9, and stream the integers again, etc.Now move this 2^6 to the second counter. Stream the numbers. Adjust the counters accordingly. If some counter hits N / 2 stop adjusting it. If the other counter also hits N / 2 then the problem is solved. Obviously, this method is way too costly.
RoboTact
October 7th, 2006, 06:43 AM
The constraints are too tight :confused:. The bisection method is the only one I can think of.
Now move this 2^6 to the second counter. Stream the numbers. Adjust the counters accordingly. If some counter hits N / 2 stop adjusting it. If the other counter also hits N / 2 then the problem is solved. Obviously, this method is way too costly.You can actually perform trisection.
kumaresh_ana
October 7th, 2006, 07:14 AM
You can actually perform trisection.
Can you elaborate it?
EDIT
I get it. With a single pivot you can bisect. So, with two pivots you can trisect.
MikeAThon
October 7th, 2006, 02:04 PM
Finding the median is a specialized case of so-called "selection algorithms", in which you are given a list of N elements and your job is to find the k-th element if the list had been sorted (or, put another way, to find the k-th smallest element in the list). For the median (obviously), k=N/2.
One brute-force way is to run through the list k times, the first time selecting the smallest elemnt, and thereafter selecting the next smallest element. That is guaranteed to find the k-th element without any extra storage or sorting of elements, in O(kN) time. If that's good enough for you, then use it.
Start with the wikipedia to get going on other ideas: http://en.wikipedia.org/wiki/Selection_algorithm
Mike
kumaresh_ana
October 9th, 2006, 12:21 AM
One brute-force way is to run through the list k times, the first time selecting the smallest elemnt, and thereafter selecting the next smallest element. That is guaranteed to find the k-th element without any extra storage or sorting of elements, in O(kN) time. If that's good enough for you, then use it.
Start with the wikipedia to get going on other ideas: http://en.wikipedia.org/wiki/Selection_algorithm
MikeThis method needs the elements to be swapped. OP says the elements cannot be moved around.
MikeAThon
October 9th, 2006, 12:23 AM
This method needs the elements to be swapped. OP says the elements cannot be moved around.
Of course it doesn't require swapping of elements. Think harder. It only requires some method of tracking the index of the currently smallest/smaller element.
Mike
kumaresh_ana
October 9th, 2006, 01:19 AM
Of course it doesn't require swapping of elements. Think harder. It only requires some method of tracking the index of the currently smallest/smaller element.
MikeCorrect me if I am wrong. You need to keep track of all the extreme elements (say smaller) by far found to be less than median. This requires extra storage. Provided that the algo will not be O(nk) as we need to stream the numbers again for every inner loop.
MikeAThon
October 9th, 2006, 11:50 AM
Correct me if I am wrong. You need to keep track of all the extreme elements (say smaller) by far found to be less than median.
You are partly correct. You do not need to keep track of *all* the extreme elements; you only need to keep track of the next-most extreme element (i.e., the one that you have found so far).
This requires extra storage.
Yes, but as stated by the OP:I also have 2 counters that will count the numbers <= or >= limit values that I can set (which can be different for the 2 counters).
Provided that the algo will not be O(nk) as we need to stream the numbers again for every inner loop.
I am not certain of your point here. I already stated that the algorithm will be O(kN), where N is the number of elements in the list, and k is the number of the element you are seeking. For the median, k=N/2, so the algorithm is O(N^2/2). Is that what you mean by "inner loop"?
Mike
Yves M
October 9th, 2006, 12:35 PM
Well, if N is reasonably large, the trisection method is a lot better, because it will be in O(N log3(2^12))
karal
October 27th, 2006, 12:07 AM
sorry, i would not be able to tell through counters.
codeguru.com
Copyright Internet.com Inc., All Rights Reserved.