Environment: .NET, Visual Basic
Abstract
Based on a real-world project, this article shows how to handle XML data in .NET using C# DataSets and DataViews. The article presents, step by step, how to bind, read, and view XML data and generate graphical reports with the built-in Crystal Reports engine. The article puts special emphasis on XML standardization and interoperability issues experienced during the project, and presents workarounds for existing limitations.
Introduction
This project was part of a larger project in the context of training, acquisition, and analysis. A Palm PDA is used for mobile data acquisition; these data are transferred onto a PC into a well-formed and valid XML data file. The task of this project was to develop a sophisticated data analysis and reporting application on the PC using the given XML data file.
The .NET framework was chosen for the PC application because of its great data handling capabilities, its powerful graphic functionality, and the built-in reporting engine of Crystal Reports. We selected the programming language C# because we believe that this is the language of the future under .NET—and because there is no major difference among the supported languages in VS.NET with respect to their offered functionality.
The idea was that both the existing Palm application and the new C# application can share one common XML file as its data source.
XML Access Methods in .NET
.NET offers numerous XML-related classes for reading and writing XML documents. Although the documentation states that any class conforms to the W3C XML 1.0 standard, there are trade-offs in respect to performance, efficiency, productivity, and XML compliance among the different implementations.
So, we first investigated these different access methods and then selected the most appropriate for our application. This step also addresses important issues that must be considered when dealing with legacy valid and well-formed XML data in .NET.
The C# developer can use any of the following five methods to access XML data:
- XmlTextReader
- XmlValidatingReader
- XmlDocument (and the rest of the DOM API)
- XPathNavigator
- ADO.NET DataSets
The detailed description of these methods would go beyond the scope of this article. Thus, the following table summarizes very briefly the characteristics of each method and shows when to use which technology.
Table 1. XML Data Access Methods in .NET | |
---|---|
Method | Use |
XmlTextReader |
|
XmlValidatingReader |
|
XmlDocument |
|
XPathNavigator |
|
ADO.NET DataSets |
|
When dealing with complex XML Schemas, ADO.NET DataSet offers the greatest support and flexibility. The DataSet, which can be used in either relational databases or XML data files, is an in-memory cache of retrieved data. It is the major component for data handling in the ADO.NET architecture.
The DataSet provides a consistent relational programming model regardless of the data source; thus, it consists of a collection of DataTable objects that you can relate to each other with DataRelation objects. A DataSet reads and writes data and schema as XML documents. The data and schema can be transported across HTTP and used by any application, on any platform that is XML-enabled.
Considering all these features and the fact that this C# application will need to be Web Service interoperable with a server application in the future, we decided to go for DataSets.
XML Conformance in .NET
Going with DataSets, ReadXML and ReadXMLSchema are the methods of choice to read the XML data and the corresponding XML schema.
However, doing so with the given valid and well-formed XML data file, raised the following runtime error:
“The same table {description} cannot be the child table in two nested relations.”
Looking deeper into the file shows that the valid and well-formed (and thus fully W3C compliant) XML file had a child table called “description” which had more than one parent table. This part of the XML Schema is shown in Figure 1.
Figure 1. Original XML Schema
This is a valid and allowed design according to the W3C XML schema specification. (Refer to W3C’s definition of the Parent-child Relationship). Also, the XMLSpy tool validated this XML schema as valid and well-formed.
More investigation showed that the .NET XML framework does not (yet?) support the full XML 1.0 standard. It does not support recursive schemas or designs where a child node type is beneath more than one parent node type. Of course, also all classes based on DataSets, like the System.Windows.Forms.DataGrid, which we were going to use, have the same restriction. In other words:
XML Design Issue 1: “.NET Datasets require that a table can only be a child of one other table.”
Microsoft has identified this as a bug (Q325695) and their knowledge base article [MS-Q325695] gives more information.
There are essentially three ways to solve this problem:
- Write custom functions.
- Write a converter.
- Change your XML Schema.
If you can’t change the existing XML Schema (for example, because you are not the owner of the file), there are two ways to solve the problem: You customize all your code, or you write a converter.
Customizing all your code means that you write special methods for the XmlReader class and add records to a table with a hierarchical type design. You also have to implement a custom filtering mechanism if you want to display this data in bound controls, and a custom writing mechanism to write to the type of hierarchical format you are reading from. Even worse, any change in the schema file will cause changes in all of your custom functions—so this isn’t really an option.
The second option, if you can’t change the given XML Schema, is to write a converter. That means that you define a new .NET-compliant XML Schema for the given XML file and make use of the given DataSet capabilities. Then, you write a bi-directional converter to convert the data from one XML Schema into the other before you use it. Although this still causes additional effort, it causes the least additional effort.
If you are the owner of the XML Schema file, the solution is to change the given the given XML Schema and also make it .NET compliant. In our situation, this meant to define three different description type structures (Idescription, Ddecscription, and Fdescription), for the three occurrences of the description.
To avoid multiple definitions of the same structure (which can cause maintenance errors in the future), we made use of the inheritance capabilities of XML Schema. We derived the three description types from the still-existing base structure “description”. The details are shown in Figure 2.
Figure 2. Avoid one child with several parents
Reading XML Data with DataSets
C# developers have several approaches of using a DataSet in the applications. This section explains the chosen approach to create a DataSet from the modified XML Schema, the tools that were used, and how we populated the DataSet from the XML source file.
A DataSet can be typed or untyped. A typed DataSet is a dataset that is created, based a given XML Schema definition (XSD) file. Information from the schema (tables, columns, and so forth) is generated and compiled into the new DataSet class as a set of first-class objects and properties.
Because a typed DataSet class inherits from the base DataSet class, the typed class assumes all of the functionality of the DataSet class and can be used with methods that take an instance of a DataSet class as a parameter.
An untyped DataSet, in contrast, has no corresponding built-in schema. As in a typed dataset, an untyped DataSet contains tables and columns—but those are exposed only as collections. (However, after manually creating the tables and other data elements in an untyped DataSet, you can export the DataSet‘s structure as a schema using the DataSet‘s WriteXmlSchema method.)
You can use either type of DataSet in your application. However, Visual Studio has more tool support for typed DataSets, and they make programming with the DataSet much easier and less error-prone. So, having considered all these options, the decision was to go with ADO.NET typed DataSets.
Typed Datasets can be generated with the XSD.EXE tool, which is part of the VS.NET environment. The tool accepts a valid XML Schema file as input, as well as the language to use (C#, VB). The following line shows a typical command line of the tool that uses the XML Schema file XSDSchemaFileName.xsd.
xsd.exe /d /l:C# XSDSchemaFileName.xsd /n:XSDSchema.Namespace
The /d directive tells the tool to generate DataSets, /l specifies the language to use, and the optional /n defines the namespace to generate. The generated DataSet classes will be saved in the source file XSDSchemaFileName.cs.
Once the Typed DataSet classes are generated, the further procedure is almost child’s play. The provided methods and properties guarantee data access in a type safe manner.
So, the next step was to populate the Typed DataSet at runtime from the XML file. The ReadXml() and the WriteXml() methods of the typed DataSet class do this very easily without any difficulty, as the following two lines of code show:
DataSet myDS = new DataSet(); myDS.ReadXml("input.xml", XmlReadMode.ReadSchema);
Viewing XML Data
Having created and populated the DataSet, the next step was to bind the data to the Windows controls for user interaction. Because we were going to implement a grid view, this involved binding DataTables to DataGrids with user navigation facilities and providing parent-child relationships in the grid view so that when a user selects a parent row, the corresponding child rows are to be shown automatically for editing purposes.
The DataTable is the primary building block of ADO.NET. The DataTable is made up of DataColumn and DataRow collections. The columns define the schema of the DataTable and the rows make up the actual data in the DataTable. A DataView is a bindable, customized view of a DataTable. You can create multiple DataViews of the same DataTable; each one can contain different data sorted in different order. Additionally, you can add, delete, or edit information in each DataView.
DataTable and DataView use the same instances of DataColumns; they share the common structure. However, DataTable and DataView each has its own row collections. The DataTable consists of DataRows while the DataView is made up of DataRowViews. Figure 3 shows the relationship among DataSet, DataTable, and DataView.
Figure 3. Tables and Views
The data binding of typed DataSets is very smooth in C#. You can bind the data sources to the WindowsForms controls at design time or runtime without any difficulty. For example, you can specify which data column is to be bound to a ComboBox in a WindowsForm by setting the DataSource and DataMember property of that control so that it gets bound automatically at runtime.
We used the .NET DataView class to bind DataTables to the DataGrid. DataView supports data filtering and data sorting at design time or at runtime. For convenience, a default DataView is generated for the developer for customization.
In the analysis part, the application needs to display some statistical information, which is calculated from several columns and is shown in some dynamically generated extra columns, as shown in Figure 4.
Figure 4. The data view
Some of the data columns needed for these calculations were in different parent and child tables. Unfortunately, it was not possible to join two tables in the dataset to show up in a single DataGrid (for example, by using the DataSet.Relations property). To further clarify the situation, say I have two Parent & child tables Table1(id,columnA,columnB) and Table2(id,columnC,columnD) and I want the resulting datagrid columns to be viewed as (columnA , columnB , columnC , columnD). It is similar to the situation as if I had written a sql query like this one:
select a.id , a.columnA , a.columnB , b.[columnC] , b.[columnD] From Table1 a , Table2 b where a.id =b.id
It is important to understand that a DataView is not equivalent to a SQL VIEW. A Dataset does not contain a query processor and has no mechanism to perform a join.
The other way to do it (most developers seem to suggest) is to simply create a new datatable that is typed the way you want it and manually move the data from both datatables to the new one. Also, any data manipulation has to copy over to the original tables using nested loops. This is a brute force method. It would be nice if anyone came up with something else.
Finally, more references were added to the schema file. But that is not the best workaround. The reason that the columns could not be merged is founded in the relational programming model of the DataSets. The existing XML file uses nested child tables instead of references. This is not (yet?) resolved by the .NET DataSets, so that references are missing to cross-reference tables and to establish relationships.
XML Design Issue 2: “Use references in XML data files instead of nested child tables to show up a joined table in a single DataGrid.”
So, the next change was to introduce references where needed.
Figure 5. Adding references to tables
In our application, we used the Date column as a filter criteria for the generated graphs in the Crystal Reports Engine. Thus, in Figure 5, a Date column has been added to the session table, which is a direct reference to the date column of the day table.
Reporting XML Data
Having completed the data binding to Windows forms, the next task was to generate the summarized reports in both graphical and tabular forms. We have chosen the Crystal Reports Engine (which is delivered with the VS.NET environment) with its powerful dynamic reporting capabilities.
Crystal Reports for Visual Studio .NET offers the following three object models:
The top-level Engine object model is the report document object. The report document object offers sophisticated control over your object. It contains all the properties and methods needed to interface with and customize a report. However, it does not provide the display of the reports.
This is where viewer objects and the report document objects come together. The Windows Forms viewer allows you to view a report in a Windows application and the Web Forms viewer does it for a Web application.
So, we selected the Engine Object model together with the Windows Object Model to accomplish the requirements. The C# CrystalDecisions.Windows.Forms namespace in the CrystalDecisions.Windows.Forms.dll assembly provides support for the Windows Forms Viewer control and its associated classes. It also can dynamically update the report being hosted and interact with controls within a Windows application.
When using the Crystal Reports Engine, you basically have two options for the selection of data source. The first one is passing the raw XML file, and the second is passing the populated DataSet to the Crystal Reports Engine.
When using the first approach, you have to verify on every usage that the XML file contains the correct data types (data validation). Otherwise, the Crystal Reports Engine would interpret the data types in the XML files differently (for example, floats are used as strings), which prevents the data from being processed further within Crystal Reports.
The second, recommended, option is passing the already populated typed DataSets to the Crystal Reports Engine. This saves the runtime verification of data types. The content and the scope of the report is determined at runtime according to the user selection of the required report format. Crystal Reports provides template files for this. Figure 6 shows a sample report.
Figure 6. Crystal Report View
Summary
Visual Studio .NET has a very rich set of tightly integrated tools that work together to support the C# application developer. For XML-based applications, it provides schema generation and validation tools. It offers the methods of choice for XML handling. Especially the DataSet classes make reading and writing of XML data extremely easy. The DataSet classes provide easy data binding to the GUI and even to third-party tools such as Crystal Reports for advanced reporting.
Other very nice new features are the new source code documentation support and application installer production. The code documentation tool generates a very nice HTML documentation (they improved the JavaDoc style), which is completely XML-based. The integrated application installer allows the easy creation of a setup program for your application.
Despite the currently existing limitations of DataSets concerning its XML standard support, it is the tool of choice for XML data handling with .NET.
Once your XML data sources confirm to the “.NET XML standard,” the rest can be handled without much difficulty. It offers a lot of built-in powerful classes and methods for reading, writing, and displaying data. This allows the developer to concentrate on the more important application logic rather than having to worry about data handling issues or framework specifics.
References
Reference | Description |
MSVSNET | Visual Studio .NET Help, Microsoft Visual Studio IDE v7.0 |
W3CXML | Extensible Markup Language (XML) 1.0, 2nd Ed., http://www.w3.org/TR/2000/REC-xml-20001006, Oct 2000 |
W3CDOM | Document Object Model Technical Reports, http://www.w3.org/DOM/DOMTR |
W3CCHI | W3C definition of Parent child Relationship, http://www.w3.org/TR/2000/sec-well-formed |
MS-Q325695 | “Error Message When You View Multi-Dimensional XSD Schemas in the Visual Studio .NET XML Schema Designer,” http://support.microsoft.com/default.aspx?scid=kb;en-us;Q325695 |
About the Authors
Nalaka Withanage has just finished his study of Electronics & Telecommunications Engineering at the University of Moratuwa, Sri Lanka. He is currently doing his summer practice at the University of Applied Sciences Solothurn/Northwest Switzerland in Oensingen. He has been working on various projects in the area of .NET and Web Services. You can reach him at nalaka.withanage@fhso.ch.
Martin Kropp is a professor of Computer Science at the University of Applied Sciences Solothurn/Northwest Switzerland in Oensingen. He teaches software engineering with an emphasis on object-oriented technology. His current interests are in the .NET Framework. He has conducted several student projects on .NET with special emphasis on interoperability capabilities with other platforms. You can reach him at martin.kropp@fhso.ch.