A Simple Approach to Using Microsoft Azure Table Storage

Introduction

Microsoft Azure has several different storage options. These include blobs for storing files, queues for passing messages around, and tables for highly scalable, structured, non-relational data. Microsoft Azure also provides relational data storage, using traditional mechanics, with SQL Azure.

In this article we are going to cover a simple way to start using Azure tables for easy data storage.

Azure Tables store data in the cloud for us, making that data available to applications running in Azure or anywhere else. Azure Tables are exposed through a REST API, which makes it easy for any platform to connect and use the data in your table. Most .NET developers will use the Storage Client Library that is provided with the Microsoft Azure SDK.

Before we get too far along, let's briefly look at how tables work. Once you have an Azure subscription you can log in to the Azure portal and create a storage account. A storage account is a storage container attached to your subscription. You can hold several types of storage within one storage account.

Tables can be easily created and destroyed in the cloud. This is one key advantage over traditional databases. You won't have any delays or overhead in creating the tables you need. This also makes it easy for your application to provision itself when it starts up. During startup it can check whether the tables it needs exist and, if not, create them on the fly, potentially even loading default data into them.

Each table is really a collection of entities, which are basically the same as the entity objects you might be using in your application. Each entity has its own set of properties and values, which means that each entity can have its own schema. This gives you a great degree of flexibility at both design time and run time. It is probably the biggest difference you will run into compared with traditional relational data servers such as SQL Server.

We are going to build an Azure table, and then some code to work with it. While the table can run locally in our local development storage fabric, we will eventually move it to the cloud so it can run in production. The code we are going to write will be a simple command line program, so we can focus on the core code needed to interact with the table. You will notice that while we have one line of code to create the table itself, we never run any code that creates columns, stored procedures, or indexes of any kind.

We are going to build a small application that tracks parking tickets issued by a parking lot attendant. We are going to start with our entity class, the class in our code that contains the plain old data our system will need to track for each ticket. Here is our definition.

    public class ParkingTicket
    {
        public string LotID { get; set; }
        public string TicketID { get; set; }
        public DateTime DateIssued { get; set; }
        public int AttendentID { get; set; }
        public string CarTagNumber { get; set; }
        public string FineAmount { get; set; }
        public string CauseForTicket { get; set; }
    }

This is a simple entity class. Its only job is to hold the data that we need to work with for a parking ticket. It has several properties, each defined as an auto-property in C#. This means that the compiler will create and manage the hidden private fields we need to back these public properties, a common shortcut when defining properties in C#.

This entity is what we will store in our Azure Table. The table will have a property for each property of this entity. In addition, every entity in an Azure Table must have three specific properties. These required properties help Azure scale and manage your data. They are as follows:

  1. PartitionKey - The scale group the entity belongs to.
  2. RowKey - The unique id for this entity (when combined with the PartitionKey).
  3. Timestamp - The last time this entity was updated.

The PartitionKey and the RowKey, when combined logically, create a sort of composite primary key for the entity. The PartitionKey is used by itself to help Azure scale your data. All of the entities in your table with the same PartitionKey will be managed together as a group. These partitions are used to dynamically scale up the system.

As an example, let's pretend your table has millions of rows and thousands of partitions. As the load on your system increases, some of those partitions are going to carry a greater load than others. Perhaps, in our sample, there is a lot of load on a particular parking lot. As that partition becomes busy, or heats up, it is moved by itself to a separate storage server. This gives the partition more available hardware (CPU and memory) to respond to queries, resulting in better performance. As the load on partitions changes, they are moved around the storage servers so that they always have enough hardware to respond to their queries. As a partition cools off, it collapses back down to the old server with the other cold partitions.

A PartitionKey is any string that you want to pick. The trick with partitions is to choose a strategy for your key that will match the vector of scale and common queries. In our example we are going to put the parking lot id in the PartitionKey property, since we are likely to want to group by the parking lot the ticket was written in.
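To make the role of the two keys concrete, here is a minimal in-memory sketch. It is not how Azure stores data, and the InMemoryTicketTable class is a hypothetical stand-in rather than anything from the Azure SDK. It only illustrates that entities are grouped by PartitionKey and that a RowKey has to be unique within its partition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a table; not part of the Azure SDK.
public class InMemoryTicketTable
{
    // Outer key: PartitionKey (the lot). Inner key: RowKey (the ticket). Value: any payload.
    private readonly Dictionary<string, Dictionary<string, string>> _partitions =
        new Dictionary<string, Dictionary<string, string>>();

    // Returns false when the (PartitionKey, RowKey) pair is already taken.
    public bool TryInsert(string partitionKey, string rowKey, string payload)
    {
        Dictionary<string, string> partition;
        if (!_partitions.TryGetValue(partitionKey, out partition))
        {
            partition = new Dictionary<string, string>();
            _partitions[partitionKey] = partition;
        }

        if (partition.ContainsKey(rowKey))
        {
            return false;
        }

        partition[rowKey] = payload;
        return true;
    }

    // Everything in one partition, for example all tickets written in one parking lot.
    public IList<string> GetPartition(string partitionKey)
    {
        Dictionary<string, string> partition;
        return _partitions.TryGetValue(partitionKey, out partition)
            ? partition.Values.ToList()
            : new List<string>();
    }
}
```

In this sketch the same RowKey can appear in two different lots, but inserting it twice into the same lot is rejected, which mirrors the composite key behavior described above.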

The RowKey is any string you want to assign. When combined, the PartitionKey and the RowKey must be unique in your entire Azure Table.

We don't want to hard code these three specific properties into our entity class because they have nothing to do with our class. Doing so would mingle our data with storage details and create hard coupling between our class and Azure Tables. To get around this, our entity class needs to inherit from TableServiceEntity. To upgrade our code we need to reference two Azure assemblies, Microsoft.WindowsAzure.ServiceRuntime and Microsoft.WindowsAzure.StorageClient. Add the matching using directives to your code, and then update your class to inherit from TableServiceEntity.

We are then going to update our definitions for LotID and TicketID so they are tracked as the PartitionKey and RowKey respectively. We do this by updating their set methods. Because we are doing something special we can't use auto-properties for these two properties.

    public class ParkingTicket : TableServiceEntity
    {
        private string _lotID;
        public string LotID
        {
            get { return _lotID; }
            set
            {
                _lotID = value;
                PartitionKey = value.ToLower();
            }
        }

        private string _ticketID;
        public string TicketID
        {
            get { return _ticketID; }
            set
            {
                _ticketID = value;
                RowKey = value.ToLower();
            }
        }

        public DateTime DateIssued { get; set; }
        public int AttendentID { get; set; }
        public string CarTagNumber { get; set; }
        public string FineAmount { get; set; }
        public string CauseForTicket { get; set; }
    }

After making these changes our entity class is ready to be stored in an Azure Table. Notice we haven't hard coded what the name of the table is, or how to connect to it. The entity class only represents the data, not the behavior or configuration.



We will need to create a 'context' class that will understand where to put the data, and how to work with it. The Storage Client Library that is provided with the Azure SDK implements the WCF Data Services context model for you. You just need to inherit from it and provide some specifics. This context class understands everything it needs to know (beyond connection credentials, of course) about how to interact with your data in an Azure Table. If you ever needed to move your data to another storage mechanism, you could swap out this context class for something else and be quickly rewired.

To get this done we will need to add another class to our solution called ParkingTicketContext. The class will need to inherit from TableServiceContext. Most of the work is done for us by the base class. We just need to provide some basic info to wire it up.

    public class ParkingTicketContext : TableServiceContext
    {
        private string tableName = ConfigurationManager.AppSettings["ticketTableName"];

        public ParkingTicketContext(string baseAddress, 
            StorageCredentials credentials) : 
            base(baseAddress, credentials) { }

        public IQueryable<ParkingTicket> ParkingTicket
        {
            get
            {
                return this.CreateQuery<ParkingTicket>(tableName);
            }
        }
    }

The first thing we do is grab the table name we should store tickets in from our app.config file. We also need to provide a valid constructor for the class that takes in the base address and the credentials needed to connect to the cloud. The last part is a small property that exposes a generic query interface. When using WCF Data Services you can easily query your data using LINQ.

We now have an entity class that holds our data, and a context class that knows how to talk to the data source, the Azure Table. Now we need a service class that will hold our behavior when working with these parking ticket entities.

This service class is a normal class; we don't have to inherit from anything special. It will use configuration to grab our Azure credentials and the name of the table we want to store tickets in. We will also declare an instance of the context class we created above. We will work exclusively through this service class when we want to interact with parking tickets, especially when we want to read, write, and query the ticket table. Please don't confuse this use of the term service with a web service.

To start with, we will add a method that will return all of the tickets in the table. This will execute a simple LINQ query.

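    public static IEnumerable<ParkingTicket> GetAllTickets()
    {
        try
        {
            // Select every entity in the tickets table.
            var tickets = (from item in _ParkingTicketContext.ParkingTicket
                           select item);

            // ToList() forces the query to execute now.
            return tickets.ToList();
        }
        catch (Exception)
        {
            // Any failure is reported to the caller as null.
            return null;
        }
    }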

If we want to return just a single ticket, perhaps retrieved by the ticketID, we just change our LINQ expression. In this case we will use the First() operator on the query so that the expression returns a single object instead of a collection with only one object in it.

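    public static ParkingTicket GetParkingTicketByID(string ticketID)
    {
        try
        {
            // Filtering on RowKey alone makes the service scan every partition;
            // add a PartitionKey filter when the lot is known.
            var theParkingTicket = (from item in _ParkingTicketContext.ParkingTicket
                                    where item.RowKey == ticketID.ToLower()
                                    select item).First();

            return theParkingTicket;
        }
        catch (Exception)
        {
            // A missing ticket or a failed request is reported as null.
            return null;
        }
    }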

Saving objects to the table is quite easy using the context class and WCF Data Services. We simply need to provide an object. We can create a new object like this:

    ParkingTicket aParkingTicket = new ParkingTicket
    {
        CarTagNumber = "SAMPLE",
        AttendentID = 31415,
        CauseForTicket = "Slippery Parking",
        DateIssued = DateTime.Now,
        FineAmount = "134.50",
        LotID = "15",
        TicketID = System.Guid.NewGuid().ToString()
    };

To save it we need to add it to the context, and then call the save method. There are a lot of options for handling more advanced scenarios, which should be saved for a dedicated article on WCF Data Services. In our simple case we will just save the object. The context class will hold onto the object in memory until we call the SaveChanges() method. We can add and remove objects through the context class several times before we call SaveChanges().

    _ParkingTicketContext.AddObject(ticketsTableName, aParkingTicket);
    _ParkingTicketContext.AddObject(ticketsTableName, aSecondParkingTicket);

    _ParkingTicketContext.SaveChanges();
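The idea of collecting changes in memory and sending them only when SaveChanges() is called can be sketched without any Azure code at all. The PendingChangeBuffer class below is purely hypothetical and is not how TableServiceContext is implemented. It only illustrates the pattern, with a second list standing in for the remote table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; this is not the TableServiceContext implementation.
public class PendingChangeBuffer<T>
{
    private readonly List<T> _pending = new List<T>(); // tracked, not yet sent
    private readonly List<T> _saved = new List<T>();   // stands in for the remote table

    // Start tracking an entity; nothing is written yet.
    public void AddObject(T entity)
    {
        _pending.Add(entity);
    }

    // Undo an AddObject that has not been saved yet.
    public bool Detach(T entity)
    {
        return _pending.Remove(entity);
    }

    // Flush everything that is still pending; returns how many entities were written.
    public int SaveChanges()
    {
        int count = _pending.Count;
        _saved.AddRange(_pending);
        _pending.Clear();
        return count;
    }

    // Number of entities that have been written so far.
    public int SavedCount
    {
        get { return _saved.Count; }
    }
}
```

With this sketch, adding three tickets, detaching one, and then calling SaveChanges() writes two tickets in a single step. Nothing reaches the stand-in table before that call.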

Hopefully we have shown you how easy it is to store and retrieve data using Azure Tables. We should probably spend some time looking at some of the limits Azure Tables have, and how they might differ from a traditional data store.

Azure Tables can hold up to 100 TB of data! That is a lot of data. While I have worked with some very large databases, it usually takes a lot of money to buy the hardware and to hire the right people to manage such a sophisticated infrastructure. Being able to store that much data without worrying about those issues is just wonderful.

Tables do not have any relationships at all. There aren't any foreign keys, joins, or anything of the sort. There is no way to query across tables on the server side. You could write a LINQ query that accesses multiple tables, but the joining would be executed by your local code, not on the server. While transactions of the sort you might be familiar with are not available across a whole table, they do exist if you restrict the scope of the transaction to a single partition in the table.
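Here is a rough sketch of what a join executed by your local code looks like. It assumes the rows from two tables have already been pulled into memory. The TicketRow and LotRow types, and the idea of a second table holding lot names, are hypothetical and only serve the illustration. The join itself is ordinary LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types standing in for entities read from two separate tables.
public class TicketRow
{
    public string LotID;
    public string TicketID;
}

public class LotRow
{
    public string LotID;
    public string LotName;
}

public static class ClientSideJoin
{
    // Joins the two in-memory sequences on LotID; the storage service is not involved.
    public static List<string> Describe(IEnumerable<TicketRow> tickets, IEnumerable<LotRow> lots)
    {
        var query = from t in tickets
                    join l in lots on t.LotID equals l.LotID
                    select t.TicketID + " @ " + l.LotName;

        return query.ToList();
    }
}
```

Because this is an inner join, a ticket whose lot is missing from the second sequence is simply dropped from the result. Both sequences also have to fit in memory, which is the price of doing the join on the client.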

Azure Tables are a lot cheaper than running local data servers or running SQL Azure. You pay for them like all of the other Azure storage options: $0.15/GB of storage, any bandwidth used, plus $0.01 for every 10,000 reads or writes to the data service. These charges are very small when you consider the kind of data you keep in a table, compared with storing large videos and images in Azure Blob storage.
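To get a feel for those numbers, here is a small back-of-the-envelope sketch. The TableStorageCostEstimator class is a hypothetical helper. The rates are simply the ones quoted above and may well be out of date, and bandwidth is left out because no rate is given for it here.

```csharp
using System;

// Hypothetical helper; the rates are the ones quoted in this article and may be out of date.
public static class TableStorageCostEstimator
{
    public const decimal StorageRatePerGb = 0.15m;      // dollars per GB stored
    public const decimal TransactionRatePer10K = 0.01m; // dollars per 10,000 reads or writes

    // Storage plus transaction charges; bandwidth is not included.
    public static decimal Estimate(decimal gigabytesStored, long transactions)
    {
        decimal storage = gigabytesStored * StorageRatePerGb;
        decimal requests = (transactions / 10000m) * TransactionRatePer10K;
        return storage + requests;
    }
}
```

For example, Estimate(50m, 2000000) works out to 50 x 0.15 = 7.50 for storage plus 200 x 0.01 = 2.00 for the reads and writes, or 9.50 dollars before bandwidth.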

The magic of the Storage Client Library API in the Microsoft Azure SDK is made possible through the use of WCF Data Services. WCF Data Services sit on top of a new open protocol called OData. This protocol is gaining popularity, making it easy to consume any data with commands based on REST. Microsoft is investing heavily in this technology.

The code we wrote would work to access an Azure Table from code running in the cloud, on a server in your data center, or on every desktop in your company. This is thanks to OData and REST, and it makes it easy for you to deploy data separately from your applications.





About the Author

Brian Prince

Brian H. Prince is an Architect Evangelist with Microsoft focused on building and educating the architect community in his district. Prior to joining Microsoft in March 2008, he was a Senior Director, Technology Strategy for a major mid-west partner.

Further, he is a co-founder of the non-profit organization CodeMash (www.codemash.org). He speaks at various regional and national technology events including TechEd.

Brian holds a Bachelor of Arts degree in Computer Science and Physics from Capital University, Columbus, Ohio. He is also an avid gamer.
