Click to See Complete Forum and Search --> : Large Datasets Handling


fragy153
February 15th, 2005, 10:52 AM
Hi guys,

I am developing an application for fast processing and displaying (using gdi) of large amounts of data read from files. Data will be displayed as graphs in the x-y plane.

I need suggestions on which are the best data structures and memory management methods for reading storing this data from the file into the computer memory.

For example creating an object for each one point is like committing suicide if you consider that a file may contain 2-3 million points.

Arrays is also out of discussion

All ideas are welcome

Thanks!

MrViggy
February 15th, 2005, 10:58 AM
Well, if you're going to display single points, then that information is going to have to be in memory in one way or another. At the very least, the GDI system will store that information.

Is that all the data is? A set of points on the x-y plane?

Viggy

fragy153
February 16th, 2005, 02:26 AM
Yes that's all a set of points on the x-y plane.

but they may be a lot. I mean milllions..

And I need them to be organized in memory in a way that they can be very quickly accessed.

Because, to go a bit further, if someone wants to zoom in or out that means redrawing of all the graph and furthermore calculating new coordinates for each one of the points. If we want this to be performed fast we need very fast access to where the points are stored.

Maybe store the points in different sections of memory (according to their range of values for example) and not in a continuous big piece of memory.

I am a bit of a begginer in this type of tasks so all ideas are welcome

thanks again!

MrViggy
February 16th, 2005, 03:40 PM
Yes that's all a set of points on the x-y plane.

but they may be a lot. I mean milllions..

And I need them to be organized in memory in a way that they can be very quickly accessed.

Because, to go a bit further, if someone wants to zoom in or out that means redrawing of all the graph and furthermore calculating new coordinates for each one of the points. If we want this to be performed fast we need very fast access to where the points are stored.
If you plan on doing all the drawing yourself, yes. If you plan on using the GDI, then nope. Basically, you just setup a couple or coordinate spaces; set some transforms; and the GDI takes care of zooming, panning, rotating, etc. All you need to draw is the graph, in one size, in one way:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/gdi/cordspac_784z.asp

Maybe store the points in different sections of memory (according to their range of values for example) and not in a continuous big piece of memory.

I am a bit of a begginer in this type of tasks so all ideas are welcome

thanks again!
Not sure that would really help. Are you going to be searching for a point (by "region of x/y space")? Or, are you just going to be plotting the graph as is? Do you actually need search capability?

Viggy

fragy153
February 17th, 2005, 02:52 AM
What you said is quite helpful

If I understand what you are asking yes. I will need searching capability.

For example I will have trackers dragged with the mouse and I will need to display labels on the points where the trackers intersect with the curve. This means that I will need to very frequently search in my data whether the coordinates that the tracker is in, currently exist in my data (the curve points).

I guess there must be more efficient ways to do that than searching between millions of points every time a mouse event arrives (movement of the tracker) something that will degrade severely the performance of the application

Any help on that.?

MrViggy
February 17th, 2005, 11:52 AM
That really depends. If you only have trackers on one axis, then the dataset only needs to be sorted in that direction.

If you need trackers on both axes, then one option is to store two datasets, one sorted for each axis. If all you're storing is integers, then this may double the memory requirements. If these data points are anything more then 32 bits in size (like doubles, objects, etc), then you could just store the actual data in one place, and each sorted data structure just points to the actual value.

I was going to suggest a BSP tree. But, they are really optimized for polygons stored in n-dimentional space. I'm not sure how efficient a BSP tree would be on a set of single data points.

Unfortunately, I can't think of anything really good for this situation. If you must display all the points on the screen; and there are millions, then you are going to use up memory. Tricks like memory mapped files; caching; not displaying some points (there's no point in loading/showing 1000 points if they all map to the same display point on the screen) might help.

Viggy