.NET Serialization

Serialization is a key part of the .NET framework. The remoting infrastructure, including Web Services and SOAP, depends on serialization, which is the process of reducing an object instance to a transportable format that is a high-fidelity representation of the object. What makes this process interesting is that you can also take the serialized representation, transport it to another context such as a different machine, and rebuild your original object. Given an effective serialization framework, objects can be persisted to storage by simply serializing the object representation to disk. Given an efficient serialization framework, objects can be remoted by simply serializing an object to a stream of bytes stored in memory, and transmitting the stream to a cooperating machine that understands your serialization format.

In .NET, serialization is often used with streams, which are the abstractions used to read from and write to sources such as files, network endpoints, and memory sinks. Richard Grimes wrote an earlier column on .NET streams for CodeGuru.

How Do You Use Serialization?
Simplified Serialization Using Attributes
Serialization and Private Members
How does Serialization Work in .NET?
When is Deserialization Complete?
A Word About Final Classes
Next Column
About the Author

How Do You Use Serialization?

Serialization is handled primarily by classes and interfaces in the System.Runtime.Serialization namespace. To serialize an object, you need to create two things:

a stream to contain the serialized objects
a formatter to serialize the objects into the stream

The code required to perform serialization in .NET is very simple. Most serialization code is similar to the boilerplate code shown below, which serializes an object into a file stream using the BinaryFormatter class:

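// Requires: using System.IO;
//           using System.Runtime.Serialization.Formatters.Binary;
public static void WriteToFile(BaseballPlayer bp, String filename)
{
    // File.Create truncates an existing file, so no stale bytes are left behind.
    // The using block closes the stream even if Serialize throws.
    using (Stream str = File.Create(filename))
    {
        BinaryFormatter formatter = new BinaryFormatter();
        formatter.Serialize(str, bp);
    }
}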

Using the BinaryFormatter for serialization results in a compact representation on disk, although it is not a form that is easily read using a text editor. If you would like a more human-friendly representation, you can use the SOAP formatter, as shown below:

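// Requires: using System.IO;
//           using System.Runtime.Serialization.Formatters.Soap;
public static void WriteToFile(SerialCircle shape, String filename)
{
    // File.Create truncates an existing file, so no stale bytes are left behind.
    // The using block closes the stream even if Serialize throws.
    using (Stream str = File.Create(filename))
    {
        SoapFormatter formatter = new SoapFormatter();
        formatter.Serialize(str, shape);
    }
}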
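
Reading an object back is the mirror image of writing it. The following is a minimal sketch of a round trip through a memory stream, assuming the .NET Framework (BinaryFormatter is obsolete and disabled in current .NET releases). The Pitcher class and the RoundTripDemo helper names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical type used only for this sketch.
[Serializable]
public class Pitcher
{
    public string Name;
    public int Wins;
}

public static class RoundTripDemo
{
    // Serialize any [Serializable] object graph to a byte array in memory.
    public static byte[] ToBytes(object graph)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, graph);
            return stream.ToArray();
        }
    }

    // Rebuild the object graph from the bytes produced by ToBytes.
    public static object FromBytes(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        {
            return new BinaryFormatter().Deserialize(stream);
        }
    }

    public static void Main()
    {
        Pitcher original = new Pitcher { Name = "Lefty", Wins = 12 };
        Pitcher copy = (Pitcher)FromBytes(ToBytes(original));

        // The copy is a distinct object that carries the same field values.
        Console.WriteLine("{0} {1}", copy.Name, copy.Wins);
        Console.WriteLine(object.ReferenceEquals(original, copy));
    }
}
```

The byte array returned by ToBytes could just as well be written to a file or sent over a socket; only the stream type changes.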

Simplified Serialization Using Attributes

The simplest way to make your classes eligible for serialization is to use the Serializable attribute. Decorating a class with this attribute, as shown below, immediately makes it serializable:

[Serializable]
public class BaseballPlayer
{
    [...]
}

By default, all fields of a class are serialized, including private fields. To reduce the amount of data serialized to the stream, you can inhibit serialization of fields that are not required to reconstitute the class by attaching the NonSerialized attribute to them:

[NonSerialized]
private Decimal _salary;

The NonSerialized attribute is useful for those member variables that represent calculated values or contain information that is transient rather than persistent, or data that should be hidden and not persisted to a storage medium.
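
One way to see the effect of NonSerialized without writing anything to a stream is to ask the framework which fields it would serialize. The sketch below uses FormatterServices.GetSerializableMembers for that purpose; the Player class and the NonSerializedDemo helper are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical class for this sketch; the field names are illustrative.
[Serializable]
public class Player
{
    private string _name;

    [NonSerialized]
    private decimal _salary;   // transient: skipped by the formatters

    public Player(string name, decimal salary)
    {
        _name = name;
        _salary = salary;
    }

    public string Name { get { return _name; } }
    public decimal Salary { get { return _salary; } }
}

public static class NonSerializedDemo
{
    // Returns the names of the fields a formatter would write for the type.
    public static string[] SerializableFieldNames(Type type)
    {
        MemberInfo[] members = FormatterServices.GetSerializableMembers(type);
        string[] names = new string[members.Length];
        for (int i = 0; i < members.Length; i++)
        {
            names[i] = members[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        // _name is listed; _salary is not, because it is marked NonSerialized.
        foreach (string name in SerializableFieldNames(typeof(Player)))
        {
            Console.WriteLine(name);
        }
    }
}
```

After deserialization, a NonSerialized field holds the default value for its type, so the class must be able to tolerate or recompute it.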

Serialization and Private Members

In order for serialization to work effectively, the serialization mechanism must be able to capture enough information about the state of an object to allow it to properly recreate a true copy of the original object at a later time. This often requires information not available to public clients of a class, but it is a necessary side-effect of the work that the serializer must perform. Improperly serialized objects cannot be deserialized properly—it’s as simple as that. An example of failed serialization can be seen in the movie Galaxy Quest, where the transporter mechanism fails its test, effectively deserializing a menacing beast inside-out, with explosive (and messy) results.

Is Exposure Required?

In the MFC class library, objects were responsible for serializing themselves. While this did prevent the sharing of internal class details with formatters, it did require all class authors to write correct serialization code for all classes that might ever require serialization. This requirement leads to the following problems:

Many developers who can write usable classes cannot write good serialization code. This leads to problems similar to the exploding space pig in Galaxy Quest.
Testing can detect the likelihood that a specific class will result in explosions, but such tests increase the testing effort required for each class that implements serialization support.
Embedded serialization code adds to the costs of maintainability and adds to the risk of updating components. Any changes to a class that supports serialization in MFC must be properly reflected in the serialization code. Errors in this code may be, as noted previously, just as catastrophic as code that performs the “real work” of the class.

But there is a simple way around this mechanism: you can simply elect not to participate in MFC serialization, which may limit the usefulness of your class. Note that even though your class does not need serialization today, it may need it tomorrow, and software developers are notoriously bad at foretelling the future.

In .NET, all types are self-describing, and the serialization architecture simply leverages the self-describing nature of .NET objects to perform serialization, without all of the problems inherent in the MFC approach. What do you get out of the improved serialization in .NET?

In most cases, you need to write very little serialization code. In this column, the code examples are covering special cases, but much of the time you’ll need to write zero (or very little) code.
You don’t need to maintain the serialization code you don’t write—there’s somebody at Microsoft doing that for you.
You can version your classes as needed—the serialization will occur correctly, as the serialization architecture adapts to changes to your classes.
The serialization framework provides several places where you can customize portions of the framework to suit your needs. For example, you can write your own formatter class if you need to format your serialization output in a specific way, using ROT-13 encoded XML, for example.
All of your classes can participate in serialization, with no work (other than an attribute tag) required by you.

How does Serialization Work in .NET?

As discussed earlier, .NET objects are serialized to streams, which are covered in the Richard Grimes column mentioned earlier. To summarize and review, when serializing objects to a stream, you must use a .NET formatter class to control the serialization of the object to and from the stream. In addition to the serialized data, the serialization stream carries information about the object's type, including its assembly name, culture, and version.

The Role of Formatters in .NET Serialization

A formatter is used to determine the serialized format for objects. All formatters expose the IFormatter interface, and two formatters are provided as part of the .NET framework:

BinaryFormatter provides binary encoding for compact serialization to storage, or for socket-based network streams. The BinaryFormatter class is generally not appropriate when data must be passed through a firewall.
SoapFormatter provides formatting that can be used to enable objects to be serialized using the SOAP protocol. The SoapFormatter class is primarily used for serialization through firewalls or among diverse systems.

The .NET framework also includes the abstract Formatter class that may be used as a base class for custom formatters. This class implements the IFormatter interface, and all IFormatter properties and methods are kept abstract, but you do get the benefit of a number of helper methods that are provided for you.

When implementing a formatter, you'll need to make use of the FormatterServices and ObjectManager classes. The FormatterServices class provides basic functionality that a formatter requires, such as retrieving the set of serializable members of an object, discovering their types, and retrieving their values. The ObjectManager class is used during deserialization to assist with recovering objects from the stream. When a type is encountered in the stream, it is sometimes a forward reference, which requires special handling by the ObjectManager class.
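
The FormatterServices calls described above can be tried in isolation. The following sketch copies an object the way a formatter might, by reading its serializable members and writing them into an uninitialized instance, with no stream involved. Point3 and FormatterServicesDemo are hypothetical names for this illustration, and a real formatter would also have to encode the values and handle object references.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical type used only for this sketch.
[Serializable]
public class Point3
{
    public int X;
    public int Y;
    public int Z;
}

public static class FormatterServicesDemo
{
    // Copies the serializable field values of source into a new instance.
    // This is a shallow copy: reference-type fields are shared, not cloned.
    public static object CopyViaFormatterServices(object source)
    {
        Type type = source.GetType();

        // Step 1: discover which members take part in serialization.
        MemberInfo[] members = FormatterServices.GetSerializableMembers(type);

        // Step 2: read their current values from the source object.
        object[] values = FormatterServices.GetObjectData(source, members);

        // Step 3: create an instance without running any constructor,
        // then write the captured values into it.
        object target = FormatterServices.GetUninitializedObject(type);
        return FormatterServices.PopulateObjectMembers(target, members, values);
    }

    public static void Main()
    {
        Point3 original = new Point3();
        original.X = 1;
        original.Y = 2;
        original.Z = 3;

        Point3 copy = (Point3)CopyViaFormatterServices(original);
        Console.WriteLine("{0} {1} {2}", copy.X, copy.Y, copy.Z);
    }
}
```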

Taking Control of Serialization with the ISerializable Interface

While the [Serializable] attribute is fine for classes that don’t require fine-grained control of their object state, occasionally you may require a more flexible serialization mechanism. Classes that require more control over the serialization process can implement the ISerializable interface.

When implementing the ISerializable interface, a class must provide the GetObjectData method that is included in the interface, as well as a special constructor that accepts two parameters: an instance of SerializationInfo, and an instance of StreamingContext. A minimal class that implements ISerializable is shown below:

[Serializable]
public class SerialCircle: ISerializable
{
    public SerialCircle(double radius)
    {
        Console.WriteLine("Normal constructor");
        ConfigureCircleFromRadius(radius);
    }

    private SerialCircle(SerializationInfo info, StreamingContext context)
    {
        Console.WriteLine("Deserialization constructor via ISerializable");
        double radius = info.GetDouble("_radius");
        ConfigureCircleFromRadius(radius);
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        Console.WriteLine("Serialization via ISerializable.GetObjectData");
        info.AddValue("_radius", _radius);
    }

    private void ConfigureCircleFromRadius(double radius)
    {
        _radius = radius;
        _circumference = 2 * 3.14 * radius;
        _area = 3.14 * radius * radius;
    }

    public double Circumference { get {return _circumference;} }
    public double Radius { get {return _radius;} }
    public double Area   { get {return _area;} }

    private double _radius;
    private double _area;
    private double _circumference;
}

A version of SerialCircle that uses default serialization serializes the values of each member variable to the stream. However, the _area and _circumference members can be calculated based on the value of _radius. The SerialCircle class implements the ISerializable interface so that it can control which class members are serialized: it serializes only the _radius member, and calculates the values for the other members when deserialized.

The GetObjectData function is used to serialize the object, and the specialized constructor is used to deserialize the object. The constructor and GetObjectData are passed the same parameters: an instance of the SerializationInfo class and an instance of the StreamingContext structure.

The framework calls GetObjectData to notify an object that a serialization is in progress. The object is expected to serialize itself into the SerializationInfo object that is passed as a parameter to GetObjectData:

public void GetObjectData(SerializationInfo info, StreamingContext context)
{
    info.AddValue("_radius", _radius);
}

SerializationInfo is a final class that holds the serialized representation of an object. During serialization, an instance of this class is populated with data about the serialized object using the AddValue function. During deserialization, an instance of SerializationInfo is used to construct a new instance of the serialized class, by calling one of the GetXxxx functions to extract data from the SerializationInfo object:

private SerialCircle(SerializationInfo info, StreamingContext context)
{
    double radius = info.GetDouble("_radius");
}

The SerializationInfo.AddValue member function is overloaded to provide versions for any type that you can serialize. The AddValue method is used to create a name-value pair that is serialized to the stream. During deserialization the name is used to retrieve the value, using one of the GetXxxx methods, where Xxxx is replaced by the type to be recovered. In the example above, GetDouble is used to return a double; there are similar versions for strings, integers, and the other primitive .NET types, and the general-purpose GetValue method retrieves a value of any other type.
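
The name-value behavior of SerializationInfo can be explored without a formatter by constructing an instance directly. This is a minimal sketch; the entry names and the SerializationInfoDemo helper are hypothetical, and in normal use the framework creates the SerializationInfo instance for you.

```csharp
using System;
using System.Runtime.Serialization;

public static class SerializationInfoDemo
{
    // Builds a SerializationInfo by hand and fills it the way
    // GetObjectData would. The entry names are illustrative.
    public static SerializationInfo BuildInfo()
    {
        // FormatterConverter is the stock IFormatterConverter; it converts a
        // stored value when a GetXxxx call asks for a different type.
        SerializationInfo info =
            new SerializationInfo(typeof(object), new FormatterConverter());

        info.AddValue("_radius", 2.5);     // double overload
        info.AddValue("_label", "demo");   // object overload, holds a string
        info.AddValue("_sides", 0);        // int overload
        return info;
    }

    public static void Main()
    {
        SerializationInfo info = BuildInfo();

        // Each value is retrieved by the name it was stored under.
        double radius = info.GetDouble("_radius");
        string label = info.GetString("_label");
        int sides = info.GetInt32("_sides");
        Console.WriteLine("{0} {1} {2}", radius, label, sides);

        // The entries can also be enumerated as name-value pairs.
        foreach (SerializationEntry entry in info)
        {
            Console.WriteLine("{0} = {1}", entry.Name, entry.Value);
        }
    }
}
```

Asking for a name that was never added throws a SerializationException, which is worth remembering when a newer version of a class reads a stream written by an older one.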

Using the StreamingContext Structure

The StreamingContext structure is used to indicate how a serialized object will be used. Classes that implement ISerializable may optionally use this information to determine which fields are relevant for serialization. For example, some objects may use a more compact representation when serialized to disk, with the intention of recreating internal structures when the object is recovered. When the object is serialized into a memory stream in the same process, perhaps to clone it, the serialization policy may preserve internal structures for runtime efficiency. Will every class need to use this sort of advanced serialization Kung Fu? No, but it's there if you need it.

There are two properties that are exposed by the StreamingContext structure:

State is a value from the StreamingContextStates enumeration, which is discussed below. This property gives you a hint about the reason for the serialization request.
Context is an object that is associated with this instance of StreamingContext. This value is generally not used unless you have associated an interesting value with the StreamingContext as part of the serialization process.

StreamingContextStates is an enumeration that provides a clue about the type of serialization that is occurring. This information is sometimes useful. For example, when a client is on a remote machine, information about process handles should be serialized completely, but if the serialization is occurring within a process, a reference to the handle may be sufficient. The enumeration values for StreamingContextStates are shown in the table below.

Value	Meaning
All	The serialized data may be used or sent from any context
Clone	The serialized data targets the same process
CrossAppDomain	The serialized data is for a different AppDomain
CrossMachine	The serialized data is for a different computer
CrossProcess	The serialized data is for a different process on the current computer
File	The serialized data is read or written to a file
Other	The context is not known
Persistence	The serialized data is stored in a database, file, or other persistent store
Remoting	The serialized data is for a remote context, which may be a different computer

Of the possible values for StreamingContextStates, you should pay special attention to the File and Persistence states. These two values indicate that the object is being serialized into a stream that may be long-lived, and is likely to require special handling. For example, the object may be deserialized days, weeks, or years from now, so serializing values that are short-lived may not be required.
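
As a sketch of how a class might act on the streaming context, the example below skips a recomputable cache when the State property (whose type is named StreamingContextStates in the framework) has the File or Persistence flag set. The CachedReport class, its entry names, and the StreamingContextDemo helper are hypothetical; GetObjectData is called directly here so the effect can be observed without a formatter.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical class for this sketch; the names are illustrative only.
[Serializable]
public class CachedReport : ISerializable
{
    private string _query;
    private string _cachedResult;   // can always be rebuilt from _query

    public CachedReport(string query)
    {
        _query = query;
        _cachedResult = Compute(query);
    }

    private CachedReport(SerializationInfo info, StreamingContext context)
    {
        _query = info.GetString("_query");
        // Restore the cache if it was written, otherwise rebuild it.
        _cachedResult = info.GetBoolean("_hasCache")
            ? info.GetString("_cachedResult")
            : Compute(_query);
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // Long-lived destinations do not need the short-lived cache.
        StreamingContextStates longLived =
            StreamingContextStates.File | StreamingContextStates.Persistence;
        bool keepCache = (context.State & longLived) == 0;

        info.AddValue("_query", _query);
        info.AddValue("_hasCache", keepCache);
        if (keepCache)
        {
            info.AddValue("_cachedResult", _cachedResult);
        }
    }

    private static string Compute(string query)
    {
        return "result of " + query;
    }
}

public static class StreamingContextDemo
{
    // Returns how many entries GetObjectData writes for the given state.
    public static int CountEntries(StreamingContextStates state)
    {
        SerializationInfo info = new SerializationInfo(
            typeof(CachedReport), new FormatterConverter());
        new CachedReport("standings").GetObjectData(
            info, new StreamingContext(state));
        return info.MemberCount;
    }

    public static void Main()
    {
        Console.WriteLine(CountEntries(StreamingContextStates.File));   // prints 2
        Console.WriteLine(CountEntries(StreamingContextStates.Clone));  // prints 3
    }
}
```

Because the enumeration values are bit flags, the test uses a bitwise AND rather than an equality comparison; a state of All also has the File bit set.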

In the SerialCircle example, ISerializable is implemented in order to remove two fields from the serialization stream; however, you could just as easily add information into the stream, such as authentication hints or optimization instructions. Once you take control of the serialization process for your class, you can manage serialization however you see fit.

When is Deserialization Complete?

Serializing simple objects that have no dependencies on other objects is a simple matter; even a computer book author can do it. In real life, objects are often serialized together, with some objects in the serialization stream depending on other objects. This presents a problem, because during deserialization there is no guarantee about the order in which specific objects are reconstituted. If you find that instances of your class depend on other objects that are being deserialized, you can receive a notification when all deserialization is complete by implementing the IDeserializationCallback interface.

IDeserializationCallback has one method: OnDeserialization. This method is implemented by serializable classes, and is invoked by the framework when all objects have been deserialized. Continuing with the SerialCircle example presented earlier, initialization of the circle can be deferred until deserialization is complete by waiting until OnDeserialization is called:

private SerialCircle(SerializationInfo info, StreamingContext context)
{
    _radius = info.GetDouble("_radius");
}

public void OnDeserialization(Object sender)
{
    ConfigureCircleFromRadius(_radius);
}

In the code fragment above, we have changed the deserialization constructor so that it only initializes the _radius member variable. When the framework invokes OnDeserialization through the IDeserializationCallback interface, the initialization of the object is completed by calling ConfigureCircleFromRadius. The SerialCircle project included with this article includes the OnDeserialization code.
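
For reference, here is a self-contained sketch of a class that combines ISerializable with IDeserializationCallback and is round-tripped through a memory stream. It is not the SerialCircle project code; DeferredCircle and CallbackDemo are hypothetical names, and the sketch assumes the .NET Framework, because BinaryFormatter is obsolete and disabled in current .NET releases.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical class for this sketch, modeled loosely on SerialCircle.
[Serializable]
public class DeferredCircle : ISerializable, IDeserializationCallback
{
    private double _radius;
    private double _area;

    public DeferredCircle(double radius)
    {
        _radius = radius;
        _area = Math.PI * radius * radius;
    }

    // Deserialization constructor: restores only the stored state.
    private DeferredCircle(SerializationInfo info, StreamingContext context)
    {
        _radius = info.GetDouble("_radius");
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("_radius", _radius);
    }

    // Called by the framework after the whole object graph is deserialized;
    // derived state is computed here.
    public void OnDeserialization(object sender)
    {
        _area = Math.PI * _radius * _radius;
    }

    public double Radius { get { return _radius; } }
    public double Area { get { return _area; } }
}

public static class CallbackDemo
{
    // Serializes the circle to memory and reads it back.
    public static DeferredCircle RoundTrip(DeferredCircle circle)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, circle);
            stream.Position = 0;   // rewind before reading
            return (DeferredCircle)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        DeferredCircle copy = RoundTrip(new DeferredCircle(2.0));
        // Area was never written to the stream; OnDeserialization rebuilt it.
        Console.WriteLine(copy.Area == Math.PI * 4.0);
    }
}
```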

A Word About Final Classes

The .NET framework allows classes such as SerializationInfo and SerializableAttribute to be declared as final, meaning that they cannot be subclassed. Although the framework uses the term final, as in “this is the final form of this type,” each language in the runtime seems to use a different term:

Visual Basic programmers use the NotInheritable keyword
C# programmers use the sealed keyword
If you’re using the managed C++ compiler, look for __sealed
Eiffel users have the frozen keyword

That’s all of the languages I know about, but if you’re aware of more, send them to me at mw@codevtech.com, and I’ll add them to my .NET Rosetta Stone.

Next Column

The next column will discuss memory management in .NET, and how it differs from the memory management you’re probably accustomed to. We’ll discuss the garbage collector, finalizers, and how to write efficient code in a GC environment.

About the Author

Mickey Williams is the founder of Codev Technologies, a provider of tools and consulting for Windows Developers. He is also on the staff at .NET Experts (www.dotnetexperts.com), where he teaches the .NET Framework course. He has spoken at conferences in the USA and Europe, and has written eight books on Windows programming. Mickey can be reached at mw@codevtech.com.