Debugging in the Cloud - Using Microsoft Azure Diagnostics

Introduction

Azure is Microsoft's cloud computing platform. It makes it easy to deploy your application to the cloud, abstracting the platform details away from you. You can deploy an application in minutes and easily scale it to as many servers as you need.

You use the Azure SDK to build applications with Microsoft Visual Studio 2008 or 2010. When doing so, you can easily debug your application locally. The debugging experience is just like that of a normal ASP.NET application: you can set breakpoints and watches, and start the application locally with F5.

You can't, however, debug in the cloud; it isn't allowed, for several reasons. The first is plain connectivity: since instances of your application can be moved around as needed for support and reliability reasons, any debug session would quickly be disrupted. There is also the matter of connecting directly to a machine-specific IP address, which you can't do because each server sits behind firewalls and load balancers.

So what is a developer to do? How do you solve production issues? If the platform is abstracted away, how does an IT pro detect performance issues? All of these questions are answered by the Windows Azure Diagnostics system.

Using normal diagnostics in the cloud is challenging because of the sheer number of servers you work with, and because those servers can be created and deleted over time with ease. Since cloud applications can be very dynamic, it is hard to know which server to look at to solve a problem. You also can't use remote desktop to get hands-on with traditional diagnostic tools. Even if you could, traditional tools only look at a single machine's information, and in a distributed system you need to collate information from many servers.

The diagnostics team's goal was to give you access to all of the data and diagnostics systems you are used to, but bring them together into one cohesive data set that is easier to work with. This is the job of the Diagnostic Agent. The agent runs on each role instance of your application and is configured with a connection string to your Windows Azure storage account. On your Azure servers the process is called MonAgentHost.exe.

The agent's job is to collect the data you want into a local buffer, and then transfer it to your Azure storage account when you want it. The agent can tap into all of the diagnostics sources you normally use:

  • Trace Logs*
  • Diagnostic Infrastructure Logs*
  • IIS Logs*
  • Performance Counters
  • Windows Event Logs
  • IIS Failed Request Logs
  • Crash Dumps
  • Any Arbitrary Files

  (* enabled by default)

Windows trace logs, IIS logs, and the diagnostic infrastructure logs are all enabled by default. Some data sources are stored in a table (such as performance counter data), and some are transferred to blob storage as files (such as the IIS logs). The agent is started by default in the OnStart method of your role. If you don't change anything, it starts with the default configuration above, including a 4GB local buffer for diagnostic data; as the buffer fills up, the oldest data is aged out. With this default configuration, all of the normal trace calls in your code will still work. They are caught and captured by the Azure Diagnostics trace listener that the Azure project templates enable by default.
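For example, any standard System.Diagnostics trace call in your role code flows through that listener into the agent's local buffer. A minimal sketch (the message text here is illustrative):

```csharp
using System.Diagnostics;

// These ordinary trace calls are routed to the Azure Diagnostics trace
// listener (and from there into the agent's local buffer) because the
// Azure project templates register that listener for you.
Trace.TraceInformation("Starting order processing");
Trace.TraceWarning("Inventory service responded slowly");
Trace.TraceError("Could not reach the payment gateway");
```

No extra wiring is needed; the listener registration ships in the project template's configuration file.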

Each diagnostic data type is configured separately. To change the configuration you need to get a copy of the default configuration with the following line:

var newConfig = DiagnosticMonitor.GetDefaultInitialConfiguration();

Once you have done this you can start making changes to the configuration. One easy change is to tell the agent to start collecting some performance counters for you. In this case we will collect the processor idle time to see how much CPU load we are putting on the server (the lower the idle time, the heavier the load). We will configure it to sample the counter once every five seconds.

newConfig.PerformanceCounters.DataSources.Add(
    new PerformanceCounterConfiguration()
    {
        CounterSpecifier = @"\Processor(*)\% Idle Time",
        SampleRate = TimeSpan.FromSeconds(5.0)
    });

While we are only adding one piece of configuration here, you can make all the changes you want in one go. We also need to tell the monitoring agent when we want this data transferred to our storage account. Once the data is in the storage account, it is up to us to use and manage it the way we want. We can let it pile up forever, or we can download it to our local workstation and delete it from the cloud.

newConfig.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);

In our example we have set the transfer to happen every minute, which is fine in a development environment but probably a little too aggressive in production. In production I might set it to transfer once an hour, but the interval really depends on how critical it is that you have the data. You can stop the scheduled transfer for a diagnostic element by setting its interval to zero.
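To illustrate that last point, turning the transfer off again is just a matter of assigning a zero interval (a sketch using the same newConfig object as above):

```csharp
// A zero period disables the scheduled transfer for performance counters.
// The agent keeps sampling into the local buffer, but nothing is pushed
// to storage until a non-zero period is set again.
newConfig.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.Zero;
```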

When you have made all of your changes you need to restart the monitoring agent with the Start method.

DiagnosticMonitor.Start("DiagnosticsConnectionString", newConfig);

This method restarts the agent on the role instance with your new configuration. Our sample has covered how to change the agent's behavior from within the role instance it runs in, perhaps in response to an issue detected in your code.
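Putting the pieces together, a role's OnStart method might look roughly like this (a sketch; the connection string setting name matches the one used above):

```csharp
public override bool OnStart()
{
    // Start from the default configuration (trace, IIS, and
    // infrastructure logs are already enabled in it).
    var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

    // Sample processor idle time every five seconds...
    config.PerformanceCounters.DataSources.Add(
        new PerformanceCounterConfiguration()
        {
            CounterSpecifier = @"\Processor(*)\% Idle Time",
            SampleRate = TimeSpan.FromSeconds(5.0)
        });

    // ...and push the buffered samples to table storage once a minute.
    config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);

    DiagnosticMonitor.Start("DiagnosticsConnectionString", config);
    return base.OnStart();
}
```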

After a few minutes the diagnostics agent will start transferring the buffered performance counter data to a table in your storage account. Each row includes columns for the role name, the instance name, the time of the sample, the counter name, and the counter value. Because we used "\Processor(*)" we get an entry for the total value, plus an entry for each processor on the machine.

You can use very similar code to make changes from outside the instance. You simply use the DeploymentDiagnosticManager class to get references to the instances' agents, and then set their configuration as shown above. This way you can control the configuration of a single instance, or all the instances of a single role, and work with any role in your service.
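As a sketch of that remote approach (the connection string, deployment ID, role name, and one-hour interval are all illustrative values):

```csharp
// Substitute your storage account's real connection string here.
var storageAccount = CloudStorageAccount.Parse(storageConnectionString);

// The deployment ID identifies which deployment of the service to manage.
var deployment = new DeploymentDiagnosticManager(storageAccount, "myDeploymentId");

// Update every instance of a role to transfer counters once an hour.
foreach (var instanceManager in
         deployment.GetRoleInstanceDiagnosticManagersForRole("WebRole1"))
{
    var config = instanceManager.GetCurrentConfiguration();
    config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromHours(1.0);
    instanceManager.SetCurrentConfiguration(config);
}
```

Because this runs against the storage account rather than inside the role, it works equally well from a management tool on your own workstation.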

By using the diagnostics agent we can gather all of the diagnostic data we would normally want to access, but it is gathered and assembled for us in one central place, making it easy to see a complete picture of our Azure environment.

Uses for the diagnostics system vary widely. The obvious one is tracking down what might be going wrong with an application, figuring out where the issues and bottlenecks are. You could also use the system to gather any file you generate locally, perhaps an audit file that tracks all user interaction. You might use the data to track use of the system over time, which can be useful for planning future capacity allocation in the cloud.
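For the audit-file scenario, the agent's directory data source can be pointed at a local folder. A sketch, where the local resource name and container name are made-up values:

```csharp
var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

// Watch a local folder (declared as a local storage resource in the
// service definition) and copy its files to a blob container.
config.Directories.DataSources.Add(new DirectoryConfiguration()
{
    Path = RoleEnvironment.GetLocalResource("AuditFiles").RootPath,
    Container = "wad-audit-files",   // destination blob container
    DirectoryQuotaInMB = 128         // cap on local disk usage
});
config.Directories.ScheduledTransferPeriod = TimeSpan.FromHours(1.0);

DiagnosticMonitor.Start("DiagnosticsConnectionString", config);
```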

We have taken a quick look at the Microsoft Azure Diagnostics system, which gives you access to the same diagnostic data you would normally gather through the usual channels on a Windows Server. The system is geared to the challenges of gathering and analyzing diagnostic data from many machines in a distributed system. Once you have this simple sample working, you can explore some of the more advanced features, such as on-demand transfers and the ability to transfer any file on the role instance.



About the Author

Brian Prince

Brian H. Prince is an Architect Evangelist with Microsoft focused on building and educating the architect community in his district. Prior to joining Microsoft in March 2008, he was a Senior Director, Technology Strategy for a major Midwest partner.

Further, he is a co-founder of the non-profit organization CodeMash (www.codemash.org). He speaks at various regional and national technology events including TechEd.

Brian holds a Bachelor of Arts degree in Computer Science and Physics from Capital University, Columbus, Ohio. He is also an avid gamer.
