Click to See Complete Forum and Search --> : XMl files vs Database


mansur78
August 8th, 2002, 06:58 AM
I have to take a design time decision. We are developing an enterprise application based on COM/COM+ that already makes use of SQL Server. Our application supports some standard documents that are XML-based and are stored in the form of files. Our application requires frequent read/write operations on these documents. Now there are two techniques that may be followed:

One of them is to continue using the physical files as documents, and use parsers like SAX, XPATH, DOM etc to manipulate the documents. The drawbacks would be very slow read/write operations, large memory consumption, and concurrent accesses to these files to support multiple users at a single time would be cumbersome to manage. I cannot think of any advantage except that the data would exist physically separate from each other in separate files.

The other alternative is to map the hierarchy of our xml document to a hierarchy of tables in our database. The data from the xml files can be parsed and dumped into those tables in database at the time when the file is imported into the system, any subsequent read/write operation would now require simple queries on the database. Anytime a user demands a physical file, it can be built on the fly from the stored state in the database. The database option would also allow to utilize the inherent transaction management support which will solve the problems of concurrent accesses to the documents. This would also result in simpler logic and less complex, efficient and traditional code, which may have a significant effect over the overall performance of our application. The drawbacks, as feared by some of my teammates, would likely be increased size of the database(which might be coz of redundant data), speed(Well, I strictly oppose this argument, since I believe that speed of the operations on the database would still be faster than manipulating the physical files themselves, since commercial databases are highly optimized form of data storage).

I am not here to argue over the use of database or xml files as a means of document storage in an enterprise application, even though I am in favor of going with the 2nd approach. I have to see things from cost and benefit point of view, since we have limited resources, and we cannot go for adventures like writing a customized database system of our own using xml files as the storage mechanism, and I want to seek a solution which is optimized and efficient enough to be incorporated in our enterprise-level application and at the same time requiring less time and resources. Well these are my points of view, I might be missing something or may be wrong altogether. I need some input from you guys who have been though this kind of situation before, and what you prefer would be an ideal solution in this situation. Your help will be appreciated.

Thanking you
Mansoor Ali Khan

Waldo2k2
August 8th, 2002, 12:47 PM
Just so I can get an idea of time consumption on your backend...what does the application do, particularly with the XML files. From what you said I agree, XML files are great for some things, but as far as rapid data transmission they're not well optimized (mainly because they have to be parsed). To drive your point home to your colleagues, you should bring up the question as to why you're using XML in the first place. The advantages of XML over things like an SQL database is that it's data can be read by any computer, it's more flexible than records stored in a DB. However, if they idea was considered to map your XML files to records in a DB, then why even include the extra step? It seems as though you wouldn't be taking any advantage from the use of XML in this manner. Only time and headaches. However I'm not 100% sure what you're doing with your application, so if I'm of base you could explain what your application does. In this way I could offer better advice, but I'd bring these points up at your next meeting if they're pertinent. Good luck.

mansur78
August 12th, 2002, 04:54 AM
Let me reelaborate my stand. Actually, as i have mentioned, our application operates on data which is stored in the form of a xml-based standard called PDX(Product Data Exchange). PDX packages come in the form of xml files, and currently our application supports importing of these files into our system from other tools/applications, thus we CANNOT avoid use of xml files altogether. But we do have the option to use database for representing this data for internal usage to avoid compexity and redundancy in our code as well as to improve performance of our system, as it is a very critical factor for any enterprise-level application. And any time we need to export a PDX package, we can do it instantly by building an xml file from the state stored in database. My question is that what are the advantages/disadvantages of each of the approach I had mentioned earlier.

Waldo2k2
August 12th, 2002, 11:36 AM
Well first of all the advantages would far outweigh the latter. Most databases/servers are quite fast, especially if they are dealing with text (which is all the an html page would be to a server, such is the internet). Creating a file from the fly would take less time than to open one, copy it, and send the copy. This uses much more system resources than opening a file, writing, and sending. They do both have three steps, but, accessing the bulk of xml files will take more time considering they aren't organized properly, disk seek times will differ. Copying files that exist takes up valuable system resources, most of the time the copy will be either in RAM (not good at all for the system) or somewhere on the hard drive (in which case the added steps of writing and deleting will occur), however most of the time the copies will end up in RAM. Creating the xml files on the fly cuts logic. If you don't have to decide what file to go retrieve, that's precious time you're recovering. With fewer lines of code you can just create a file and send it. On another note, if the application using the PDX system has write access to the location you're sending your XML files out to (i'm just assuming they're being sent somewhere besides the local disk), using the database approach will cut time even further. Now, instead of generating and sending, it will simply create the file on the other disk, saving memory and thus time. So far, a clear disadvantage that i see is the task up mapping these XML files to your database, which may take some time before the app can be ready.