Using Collections in .NET

By Luther Stanton

Collections are a vital element of any object-oriented architecture. By grouping objects into logical constructs, they improve code readability and self-documentation and enhance maintainability. In this article, and others in the series, Luther Stanton of Intellinet Corporation will cover topics related to collections, from basic implementation and performance analysis to extending collections with polymorphism and inheritance.


Object-Oriented Primer

Object-oriented constructs are most often described by the type of relationship they implement. A classic example is inheritance, which implements the “IS A” relationship. Suppose there is a Vehicle class that has certain properties of a vehicle, such as Make, Model, NumberOfDoors, or HorsePower, and a class named SportUtility that derives from it. The SportUtility class inherits all the properties of the Vehicle class and also adds its own, such as GroundClearance. It is then said that a SportUtility “IS A” Vehicle. This relationship is illustrated in Figure 1.




Figure 1 – Inheritance sample showing the SportUtility class deriving from the Vehicle class
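
Since the figure is a diagram, the same “IS A” relationship can also be sketched in C#. This is only an illustration built from the class and property names in the text. The property types, the unit implied by GroundClearance, and the Describe helper are assumptions, and the auto-implemented property syntax is used purely for brevity.

```csharp
using System;

// Base class holding the properties shared by every vehicle (names from the text).
public class Vehicle
{
    public string Make { get; set; }
    public string Model { get; set; }
    public int NumberOfDoors { get; set; }
    public int HorsePower { get; set; }
}

// SportUtility "IS A" Vehicle: it inherits every member above and adds its own.
public class SportUtility : Vehicle
{
    public double GroundClearance { get; set; } // type and unit are assumptions
}

public static class InheritanceSketch
{
    // Hypothetical helper: written against Vehicle, yet it accepts a SportUtility.
    public static string Describe(Vehicle vehicle)
    {
        return vehicle.Make + " " + vehicle.Model + ", " + vehicle.NumberOfDoors + " doors";
    }
}
```

Because a SportUtility can be passed wherever a Vehicle is expected, code such as Describe never needs to know which derived type it was handed.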

The Collection class implements the “HAS” relationship, and as such could be considered a form of the containment relationship that is used to implement the “HAS A” relationship. In a collection, however, the multiplicity is defined as one to many. Continuing with our vehicle example, a VehicleDealership object “HAS” vehicles. This relation could logically be implemented as a collection of Vehicle objects, as illustrated in Figure 2.




Figure 2 – UML representation of a collection relationship

When comparing a collection to real-world constructs, the collection does not implement a concrete type; rather, it implements a relationship. As above, there is a class that physically represents a sport utility vehicle. That is, if there are ten physical vehicles that need to be represented, there will be ten instances (objects) of the SportUtility type. The collection, however, does not represent something that can be touched; it represents a relationship or grouping. As such, a relational database management system (RDBMS) will not necessarily implement a table to represent the Vehicles collection in the same way there may be a table with one row for each vehicle. Rather the collection will represent the relationship between the parent and children objects. For example, there may be a foreign key in the SportUtility table that links each row (or class instance) to the VehicleDealership that has that SportUtility on its lot. Figure 3 illustrates a possible data model to realize the Vehicle Dealership – Vehicles – Vehicle relationship shown in Figure 2. One case where a collection may be represented by a physical table is a many-to-many relationship implemented through a mapping table.




Figure 3 – Data representation of collection relationship

Another important point to take from the previous paragraph is the statement “rather the collection will represent the relationship between the parent and children objects.” While it is a little too early in this article to discuss the physical implementation of a collection class, it is good to keep in mind that we will need some mechanism to maintain the relationship between a collection and its parent. If you look carefully at the properties of the Vehicles class in Figure 2, you will see there is a private integer declared to hold this relationship value. This can be read to mean that collections never exist by themselves, but are always created in the context of their parent. In fact, this makes a good test for your architecture and design: if you have standalone collections, it may be time to go back and review your architecture or its implementation. Object-oriented systems, in the purest sense, should always have some parent object from which everything derives. That base object is always a singleton, not a collection. Turning again to Figure 2, this requirement is enforced through the constructor, which takes the parent key as an integer. That is, the collection cannot be instantiated without a parent key. Even something such as a listing of all possible values in a table used for pick-list maintenance would belong to the “system” or “application.”
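
The parent-key idea can be sketched as follows. This is a minimal illustration, not the class from Figure 2: the field names, the Add and Count members, and the use of an ArrayList for storage are all assumptions made to keep the sketch short. What it does take from the text is that the only constructor requires the parent key, and that the dealership creates its collection in its own context.

```csharp
using System;
using System.Collections;

public class Vehicle
{
    public string Make { get; set; }
    public string Model { get; set; }
}

// The collection offers no parameterless constructor, so it cannot be
// created without the key of the parent that owns it.
public class Vehicles
{
    private readonly int m_intParentKey;                  // hypothetical field name
    private readonly ArrayList m_items = new ArrayList(); // storage choice is an assumption

    public Vehicles(int parentKey)
    {
        m_intParentKey = parentKey;
    }

    public int ParentKey { get { return m_intParentKey; } }
    public int Count { get { return m_items.Count; } }
    public void Add(Vehicle vehicle) { m_items.Add(vehicle); }
}

// VehicleDealership "HAS" vehicles: it builds the collection with its own key.
public class VehicleDealership
{
    private readonly int m_intKey;
    private readonly Vehicles m_vehicles;

    public VehicleDealership(int key)
    {
        m_intKey = key;
        m_vehicles = new Vehicles(key);
    }

    public int Key { get { return m_intKey; } }
    public Vehicles Vehicles { get { return m_vehicles; } }
}
```

With this shape, client code reaches the collection only through its parent, for example by reading the Vehicles property of a dealership.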

Motivations To Develop Object-Oriented Solutions

With a basic theoretical understanding of what collections represent in an object-oriented world, let’s discuss what object-oriented development brings to the table. It is often said that software is written to solve a business problem. This statement isn’t true, in my opinion, in the majority of the cases. More often, software is written to enhance a business process. There is rarely a case where software is the only way a business can “solve a problem.” More common is developing software to improve or expedite a business process by increasing operational efficiencies or removing tedious repetitive tasks.

Two common approaches to software development are functional decomposition and object-oriented analysis and design, also known as business modeling. Functional decomposition focuses on identifying very specific areas and determining the exact functions needed to realize application functionality. Each requirement is decomposed into its constituent functions. This type of analysis often leads to a very procedural system that relies on specific function calls from start to finish to achieve the desired results. The cost of building software in this manner is that the natural relationships between entities within the system are not modeled. A system developed using functional decomposition is often quick to build and very efficient within the specific area it was designed to automate; however, it is typically very brittle when it comes to adapting to changing business requirements or enhancing the functionality. Often, these systems do not create many reusable components outside of the utility space.

The second approach, object-oriented analysis and design, develops software that models the business process through objects representing real-world elements. This process typically involves understanding the business need and creating models, often through Unified Modeling Language (UML), to capture the interactions and processes that deliver the needed functionality. The models are then used to identify and define the classes that will ultimately become the application elements. The business requirements are then met, not with a precise execution of function calls, but rather through the interactions of the objects within the system, much the same way in which the process the software is mimicking achieves results. Because a process is emulated, it is easy to change how the key players within that process, the objects, interact when the real-world process changes. This allows greater flexibility in adapting applications to changing business requirements. Also, businesses are not disconnected entities. The same notion of a customer may exist within numerous elements of a business, from marketing to customer service to accounting. By creating software constructs that model these entities, we can also get a larger amount of code reuse through well-designed objects.

Another benefit of using an object-oriented approach to software development is the intuitive Application Programming Interface (API) that emerges from the object associations. Suddenly, operations such as saving the data associated with a person become very intuitive, especially with the IntelliSense built into the Visual Studio .NET (VS .NET) Integrated Development Environment (IDE). Figure 4 shows the difference between a sample procedural system that uses an administrative module to perform functions, such as saving user data, and an object version of the same system. While both approaches technically use objects (after all, everything in .NET derives from the type System.Object), I like to term the two approaches object-based development and object-oriented development. Object-based development is more procedural in nature and does not necessarily combine data and behavior within the same class, while object-oriented development does keep behavior and data in the same class.




Figure 4a – Object-based code sample




Figure 4b – Object-oriented code sample

One example of where the intuitive API that emerges from an object-oriented system lends even more benefit is in larger projects where you have separate teams working on the user interface (UI) and the business logic. I was involved in a project some time back where there was a team of ASP user-interface developers that were not familiar with middle-tier or database development. There was a separate team of object developers that implemented the business logic in COM / COM+ objects. It was very easy for the user interface team to use our objects without having to know how our objects were implemented. Those developers did not have to have volumes of documentation to look up information such as what function within what module needed to be called to update user information, as the code structure in Figure 4a would require. Instead, they simply knew they needed to grab an instance of the user object that they needed, set the properties, and call the update method, as in Figure 4b.
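
The contrast between the two styles can be sketched side by side. This is not the code from Figure 4. The names AdminModule, SaveUserData, UserData, and User, along with the in-memory Hashtable standing in for a data store, are hypothetical and chosen only to mirror the description above.

```csharp
using System;
using System.Collections;

// Object-based style: the data sits in one class, the behavior in a separate module.
public class UserData
{
    public int UserId;
    public string Email;
}

public class AdminModule
{
    // An in-memory Hashtable stands in for a real data store (assumption).
    public static Hashtable Store = new Hashtable();

    // The caller has to know that this module, and this function, saves a user.
    public static void SaveUserData(UserData data)
    {
        Store[data.UserId] = data.Email;
    }
}

// Object-oriented style: the User class carries both its data and its behavior.
public class User
{
    public static Hashtable Store = new Hashtable(); // same stand-in store

    private int m_intUserId;
    public string Email;

    public User(int userId)
    {
        m_intUserId = userId;
    }

    // The caller sets properties and asks the object to update itself.
    public void Update()
    {
        Store[m_intUserId] = Email;
    }
}
```

In the second style, a UI developer who has a User instance can discover Update through IntelliSense without knowing which module owns the persistence logic.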

Benefits of Custom Collections in .NET

Hopefully you are starting to see some of the benefits of moving to an object-oriented approach to software development. Let’s look at some specific reasons to implement custom collections. You may say, “I understand why it is important to have a customer object, but I can just use the Rows collection of a DataSet to represent a group of related customers. After all, isn’t the DataSet an object and the Rows object a collection?” This statement raises valid points. A DataSet is an object, and the Rows property of each DataTable it contains is a collection. Additionally, the .NET Framework provides powerful methods of binding DataSets (as well as DataReaders) to display elements such as grids and drop-down lists with very little custom code. Another point is that a DataSet is easy to retrieve; it is the native form of retrieving data from persistent data stores such as Microsoft SQL Server and Oracle.

However, when compared to a custom collection, a DataSet is a very generic object; it can represent everything from a collection of customers to a collection of line items associated with a purchase order. While this serves a purpose in retrieving data from the database and perhaps providing an implementation-agnostic transport mechanism, it is not the best choice for a system that wants to model specific, real-world entities as objects.

Additionally, a true object collection contains individual elements, i.e., objects that can model real-world behavior through custom operations. The constituent elements of a DataSet, the DataRow objects, cannot provide this same level of custom behavior. Another limitation of DataSets is that the code that consumes them needs to have intimate knowledge of the data schema. Perhaps the object that needs to be represented is stored in multiple tables in the database. Or perhaps your database-naming scheme does not match the naming scheme of the properties of an object. Although the latter point could be fixed by renaming the columns as they are selected from the database, this is an extra, tedious task that needs to be completed. I have provided two code samples in Figure 5. One shows iterating through a collection of objects and the other shows iterating through the DataRow collection of a DataSet to pull out the same data. You’ll probably agree that iterating through the collection is more self-documenting and intuitive than using the DataSet Rows collection.




Figure 5a – Iterating through a collection of objects




Figure 5b – Iterating through the DataRow collection of a DataSet
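
To make the comparison concrete, here is a rough sketch of both iteration styles. It is not the code from Figure 5. The Customer class, the table name Customers, and the column names cust_fname and cust_lname are hypothetical, and an ArrayList stands in for a custom collection.

```csharp
using System;
using System.Collections;
using System.Data;
using System.Text;

public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    // Behavior lives on the element itself, something a DataRow cannot offer.
    public string FullName()
    {
        return FirstName + " " + LastName;
    }
}

public static class IterationSketch
{
    // Collection style: the loop reads in terms of the business object.
    public static string NamesFromCollection(ArrayList customers)
    {
        StringBuilder names = new StringBuilder();
        foreach (Customer customer in customers)
        {
            names.Append(customer.FullName()).Append(";");
        }
        return names.ToString();
    }

    // DataSet style: the caller must know the table and column names (hypothetical here).
    public static string NamesFromDataSet(DataSet data)
    {
        StringBuilder names = new StringBuilder();
        foreach (DataRow row in data.Tables["Customers"].Rows)
        {
            names.Append((string)row["cust_fname"]).Append(" ")
                 .Append((string)row["cust_lname"]).Append(";");
        }
        return names.ToString();
    }
}
```

Both methods produce the same string for the same data, but only the second breaks if a column is renamed in the database.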

Table 1 summarizes the pros and cons of using custom collections within your object architectures. Hopefully these points are enough to sway you to at least consider implementing some of your own collections in a future development effort. The remainder of this article may help you down that road. Enough theory; let’s write some code!

Table 1

Pros of Using Custom Collections

Provides a container of specific objects

  • Provides a built-in API for code
  • Reduces errors
  • Individual elements can exhibit business behavior

Hides persistence implementation

  • Extra layer of isolation allows persistence layer to change without affecting “upstream” code
  • Clients of collections have no need to maintain knowledge of data sources

Can easily be extended using other OO concepts, such as polymorphism and inheritance

  • Less work to implement derivatives once the base classes are implemented
  • Reduces errors by reusing stabilized code

Cons of Using Custom Collections

Need custom code to develop

  • Increased development time

Need to be marshaled from a data source, such as a data reader or data set

  • Requires additional code
  • Extra processing required to complete the marshaling every time a collection is instantiated

Implementing a Custom Collection Class

There are numerous ways to implement a custom Collection class. One of the benefits of implementing a collection is the ability to use the foreach (C#) or For Each...Next (VB.NET) keywords to iterate through the elements of the collection. In addition to being a very compact syntax, the foreach keyword makes for very clean and self-documenting code. Figure 6 illustrates a segment of C# code that iterates through a collection of vehicle objects.




Figure 6 – Using the C# foreach syntax to iterate over elements of a collection

(NOTE: While a list can be populated by iterating through a list of items such as the code in Figure 6 does, this is often not the optimal or preferred way to populate lists. Techniques such as data binding are often a better way to accomplish this task. I included this snippet of code for academic reasons. Future articles in this series will look at using .NET data binding with custom collections.)

In order to use the foreach keyword to iterate through a collection of objects, the Collection class needs to implement the IEnumerable interface and supply an enumerator class that implements the IEnumerator interface. Both interfaces reside in the System.Collections namespace.

Implementing Interfaces

Before we move on to actually implementing these two interfaces, let’s take a moment to discuss interfaces. An interface can be likened to a contract. When a class decides to implement an interface, it must abide by the contract. However, how the class chooses to implement the “terms” of the contract is the responsibility of the class. An interface definition never includes implementation and therefore can never be instantiated directly.

Declaring that a class will implement an interface is quite simple. In VB.NET, use the Implements keyword when the class is defined, as shown in Figure 7.




Figure 7 – VB.NET interface implementation syntax

In C#, follow the class name with a colon (“:”) and the name of the interface, as in Figure 8.




Figure 8 – C# interface implementation syntax
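
As a small, self-contained illustration of the contract idea in C#, consider the sketch below. The IDrivable interface and its Start member are hypothetical and are not part of the article’s sample; they are here only to show the colon syntax and the split between the contract and its realization.

```csharp
using System;

// The "contract": member signatures only, with no implementation.
public interface IDrivable // hypothetical interface, for illustration only
{
    string Start();
}

// The colon declares that Vehicle agrees to the IDrivable contract.
public class Vehicle : IDrivable
{
    // How the terms of the contract are met is up to the class.
    public string Start()
    {
        return "Engine started";
    }
}
```

A variable of type IDrivable can hold a Vehicle instance, but `new IDrivable()` does not compile, which is the point made above about interfaces never being instantiated directly.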

Implementing IEnumerable

Once a class is declared as implementing an interface, each interface operation must be realized within the class. According to the .NET Framework SDK, the IEnumerable interface has only a single operation that needs to be implemented, GetEnumerator. This operation returns an enumerator, typed as the IEnumerator interface. But wait, previously I said that an interface can never be instantiated directly. So our implementation of GetEnumerator must return an instance of a class that implements the IEnumerator interface. This is why both interfaces are needed in order to use our collection with the foreach syntax: the Collection class implements IEnumerable, and the object it hands back implements IEnumerator.
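
It may help to see how the two interfaces cooperate when a foreach loop runs. The sketch below shows a foreach loop next to a hand-written loop that is roughly what the compiler generates for it. The real expansion also handles cleanup of disposable enumerators, which is omitted here. The class and method names are hypothetical, and any IEnumerable, such as an ArrayList, can be passed in.

```csharp
using System;
using System.Collections;
using System.Text;

public static class ForEachSketch
{
    // The compact form: foreach hides the enumerator entirely.
    public static string JoinWithForEach(IEnumerable items)
    {
        StringBuilder result = new StringBuilder();
        foreach (object item in items)
        {
            result.Append(item);
        }
        return result.ToString();
    }

    // Roughly the same loop written by hand against the two interfaces.
    public static string JoinByHand(IEnumerable items)
    {
        StringBuilder result = new StringBuilder();
        IEnumerator enumerator = items.GetEnumerator(); // IEnumerable's only member
        while (enumerator.MoveNext())                   // advance; false when exhausted
        {
            object item = enumerator.Current;           // read the current element
            result.Append(item);
        }
        return result.ToString();
    }
}
```

Note that the enumerator starts positioned before the first element, so MoveNext is called once before Current is read for the first time.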

Armed with our current knowledge, let’s start a skeleton Collection class. Figure 9 illustrates all that is needed in our code to implement the IEnumerable interface. Let’s look at a few lines of this code in detail.




Figure 9 – Implementing the required elements of the IEnumerable interface

Ensure the System.Collections namespace is included in the project, as in line 2. Both the IEnumerable and IEnumerator interfaces reside in this namespace. Line 9 declares our custom Collection class as Vehicles. It also indicates that the class implements the IEnumerable interface. Line 12 declares a private array of Vehicle objects to store the elements of the collection. A simple array is far from ideal for this task; subsequent articles in this series will discuss some alternatives. For purposes of this article, which focuses on the implementation of the IEnumerable and IEnumerator interfaces, the use of the array helps maintain simplicity. Line 14 declares the default constructor. Line 18 declares the implementation of GetEnumerator, the only method of the IEnumerable interface. Notice that GetEnumerator returns an instance of VehicleEnumerator, our private class declared on line 23 that implements the IEnumerator interface.

(NOTE: There is no private parent-key variable included in this sample. While in actual implementations there should always be a parent key associated with a collection, it is intentionally left out in this example in order not to cloud the concepts being studied.)
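
A skeleton along these lines might look like the sketch below. It is not the listing from Figure 9, and its line numbers do not correspond to the ones cited above. One deliberate difference: so that the sketch compiles on its own, GetEnumerator borrows the array’s built-in enumerator as a stand-in, rather than returning the VehicleEnumerator class that the article builds in the next section. The field name m_vehicles is an assumption.

```csharp
using System;
using System.Collections; // home of IEnumerable and IEnumerator

public class Vehicle
{
    public string Make { get; set; }
    public string Model { get; set; }
}

// The custom collection declares that it implements IEnumerable.
public class Vehicles : IEnumerable
{
    // A simple array keeps the sketch short; the field name is an assumption.
    private Vehicle[] m_vehicles;

    // Default constructor: starts with an empty collection.
    public Vehicles()
    {
        m_vehicles = new Vehicle[0];
    }

    // The only member IEnumerable requires.
    public IEnumerator GetEnumerator()
    {
        // Stand-in: reuse the array's own enumerator so this compiles alone.
        return m_vehicles.GetEnumerator();
    }
}
```

Even in this reduced form, an instance of Vehicles can already appear on the right-hand side of a foreach statement.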

Implementing IEnumerator

Thus far we have a complete build-out of the IEnumerable interface. Don’t attempt to build the solution just yet. Remember that, by contract, we must always provide realizations of all the members defined within the interfaces being implemented. Realizations need to be added for the property and methods declared in the IEnumerator interface. These members are:

  • Current – property to retrieve the “current” element of the collection
  • MoveNext() – method to move the “current” element of the collection to the next element
  • Reset() – Sets the enumerator to its initial position within the collection

Developing these functions is straightforward. The resulting code is shown in Figure 10. As before, a detailed look at some of the code is in order.




Figure 10 – Code showing the implementation of the IEnumerator interface

Lines 41 and 42 declare the two member variables the enumerator needs. We need access to the Collection class itself in order to access the internal array’s elements. Because the enumerator class is nested as a private member of the Collection class, it can access the elements of the private array. The integer m_intPos is used to keep track of the array index pointing to the “current” element of the collection.

Line 44 declares the constructor. The only action taken in the constructor is to take the passed instance of the collection to enumerate over and assign it to a local class variable.

Line 49 declares the property “Current,” which is required when IEnumerator is implemented. It simply returns the array element residing at the index pointed to by m_intPos.

Line 55 declares the MoveNext() method, also required when implementing the IEnumerator interface. A check is performed to ensure that boundaries of the internal element array are not exceeded.

Finally, Line 67 declares the final method we need to realize for the IEnumerator interface, Reset. This method simply resets the value of m_intPos to -1.

Figure 11 shows some additional code added to the collection’s constructor to add two elements to the internal array for testing purposes.




Figure 11 – Code sample showing the logic contained within the collection constructor to populate the internal array
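
Pulling together the pieces described for Figures 9 through 11, a complete version might look like the following sketch. It follows the walkthrough above (a nested private VehicleEnumerator, an m_intPos index that starts at -1, a bounds check in MoveNext, and a constructor that adds two test elements), but it is a reconstruction from the prose and not the original listing. The field names other than m_intPos, the Vehicle constructor, and the two sample vehicles are assumptions.

```csharp
using System;
using System.Collections;

public class Vehicle
{
    public string Make { get; set; }
    public string Model { get; set; }

    public Vehicle(string make, string model)
    {
        Make = make;
        Model = model;
    }
}

public class Vehicles : IEnumerable
{
    private Vehicle[] m_vehicles;

    public Vehicles()
    {
        // Two hard-coded elements for testing; the values are illustrative only.
        m_vehicles = new Vehicle[2];
        m_vehicles[0] = new Vehicle("Ford", "Explorer");
        m_vehicles[1] = new Vehicle("Honda", "Civic");
    }

    public IEnumerator GetEnumerator()
    {
        return new VehicleEnumerator(this);
    }

    // Nested class, so it can read the private array of the enclosing collection.
    private class VehicleEnumerator : IEnumerator
    {
        private Vehicles m_collection; // the collection being enumerated
        private int m_intPos = -1;     // index of the "current" element

        public VehicleEnumerator(Vehicles collection)
        {
            m_collection = collection;
        }

        // Returns the element at the current index (throws if MoveNext was never called).
        public object Current
        {
            get { return m_collection.m_vehicles[m_intPos]; }
        }

        // Advances only while another element exists; otherwise reports the end.
        public bool MoveNext()
        {
            if (m_intPos < m_collection.m_vehicles.Length - 1)
            {
                m_intPos++;
                return true;
            }
            return false;
        }

        // Returns the enumerator to its position before the first element.
        public void Reset()
        {
            m_intPos = -1;
        }
    }
}
```

A loop such as `foreach (Vehicle v in new Vehicles())` then visits the two sample vehicles in array order.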

That’s really all there is to implementing the IEnumerator and IEnumerable interfaces. At this point, once the code is compiled, we have a very rudimentary, but functional, custom Collection object that can be used with the foreach syntax to iterate over the elements of the collection. After all the code is in place and compiled, Figure 12 shows the results of populating a drop-down list using the elements of the collection.




Figure 12 – A functional collection

Figure 13 provides the UML representation of our custom collection.



Summary

This article covers a lot of ground. First, it looked at where object-oriented analysis and design fits into development within the industry today. Next, it looked at some of the benefits and potential drawbacks of using custom collections versus built-in collection-like elements provided with the .NET Framework, such as DataSets. It then provided a brief discussion of interfaces and discussed the requirements to enable the collection to interact with the foreach syntax to iterate over constituent elements. Finally, by using a sample Vehicles collection, the article demonstrated the code needed to implement the appropriate enumeration interfaces. In the ZIP file containing the article’s code, there is a project containing the VB.NET implementation of the Vehicles collection as well. The UML models used to create the images for this article were created with Sparx Systems’ Enterprise Architect 3.51. The models also are included with the code download. A free viewer application for the models can be downloaded from http://www.sparxsystems.com.au/.

Future articles in this series will examine some performance enhancements and basing collections on more robust .NET Framework objects such as ArrayLists and the abstract CollectionBase object.

About the Author

Luther Stanton is a Principal Consultant with Intellinet Corporation, based in the Atlanta area. Intellinet, the southeast’s only four-time Microsoft Gold Certified Partner, provides application-development and infrastructure consulting services throughout the southeast. Intellinet builds enterprise .NET systems using object-oriented development techniques and their custom process, a best-of-breed development methodology combining the leading agile techniques in the business.

Luther resides in Locust Grove, Georgia, with his wife, Heidi. He has actively developed Microsoft-centric applications for over seven years and has worked with .NET since Beta 1. Luther’s current focus is on methodology development and application design using object-oriented analysis and design techniques.
Luther can be reached at luthers@intellinet.com.
