Developers are rapidly adopting Microsoft Windows Azure as their cloud. Because it is based on Windows and .NET, it presents a high degree of compatibility with their existing applications, making it easy to move an application to the cloud.
There are two key phases that an application usually goes through as it is moved to the cloud. The first is changing as little as possible so that it works as-is in Microsoft Azure, just changing what HAS to be changed to make it work. The second phase is to then upgrade elements of the application to take advantage of some of the unique capabilities Microsoft Azure has to offer.
1. Data Migration
Starting from the bottom of the application, we have to contend with where and how we store our data. Most common ASP.NET applications use SQL Server to store data in a relational data model. Regardless of how your code works with this data (Entity Framework, NHibernate, ADO.NET, etc.), you should look at moving your SQL database to SQL Azure. This puts you in a 'near data' scenario, where application responsiveness will remain high.
It is possible for an application in Microsoft Azure to connect and consume data from an on-premises SQL Server, but this creates a 'far data' scenario. In this scenario you will see higher latencies in database access, and this will degrade performance.
SQL Azure was built to have a high degree of compatibility with SQL Server, so migration isn't that hard, especially for most run-of-the-mill databases. You will have to concern yourself with the maximum size a SQL Azure database is allowed to have, which at this time is 50GB. If your database is larger than this, you will need to look at partitioning or sharding your data.
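If you do end up splitting data across several databases, the application needs a consistent rule for deciding which database holds a given record. The following is only a minimal sketch of one such rule, assuming you shard by a customer key. The class name, the shard names, and the choice of an FNV-1a hash are all illustrative assumptions, not part of SQL Azure.

```csharp
using System;
using System.Text;

// Hypothetical helper: maps a customer key to one of several SQL Azure databases.
public static class ShardMap
{
    // Assumed connection string names; replace these with your own.
    private static readonly string[] ShardNames =
        { "OrdersShard0", "OrdersShard1", "OrdersShard2" };

    // FNV-1a hash. Unlike string.GetHashCode, the result does not vary
    // between processes or machines, so a key always maps to the same shard.
    private static uint StableHash(string key)
    {
        uint hash = 2166136261u;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;
            hash = unchecked(hash * 16777619u);
        }
        return hash;
    }

    // Returns the name of the connection string for the shard that owns the key.
    public static string GetShardName(string customerKey)
    {
        if (customerKey == null) throw new ArgumentNullException("customerKey");
        int index = (int)(StableHash(customerKey) % (uint)ShardNames.Length);
        return ShardNames[index];
    }
}
```

Note that with a simple modulo rule like this, changing the number of shards remaps most keys, so you would want to settle on a shard count (or a lookup table) before loading data.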
There is an open source tool available, called SQL Azure Migration Wizard, that you can use to both analyze and move your data. It will look at your current schema and point out any elements that are not compatible with SQL Azure, and help you fix them. It can then, using BCP behind the scenes, move your data into the cloud. You can find the tool at http://sqlazuremw.codeplex.com/.
Long term you might look at your data and determine if any of it is not relational in nature, and move it to Windows Azure Table storage.
2. ASP.NET Session State
Many ASP.NET web applications use Session State to track the state of a user's behavior with the application. This could be their profile information, a shopping cart, or where they are in a business process. Session state is a powerful (if not overused) tool to bring the appearance of state to an inherently stateless environment.
Because many web applications depend on session state, their load balancers are configured for 'sticky sessions.' This means that once a user starts using your site, they are always routed back to the same server in your server farm, at least during that session. This makes it possible to host the state in the process memory of that particular web server. I don't like this approach because it can lead to an imbalance in the load on the servers over time, and to a reliability blind spot when a server has an issue. If a server goes down, the users 'stuck' to that server go down with it.
Since the load balancers in Microsoft Azure are non-sticky, you have to find a way to make the stored session state available to each server in your Azure environment. Thankfully, the session state mechanism in ASP.NET uses a provider model. This allows you to swap in a new provider to change how session state is stored and managed in your web server farm. You can change which provider you are using by deploying an assembly and changing some settings in your web.config file.
ASP.NET comes with three providers by default. The most common is the in-process memory provider, and it behaves as described above. The second is a state server, which is a server that is dedicated to hosting the state data in its own memory. The third is a SQL Server provider, which stores the session state in a SQL Server hosted database. Since each web server can then load the right session state from the database for the user, the user can now bounce around the farm.
While you could use the SQL Server provider in Microsoft Azure, and point it to a SQL Azure database, you will be better off using the Windows Azure Storage based provider. With this provider, entries are made in an Azure Table that relate to each user and the application. Because the exact size and shape of the state cannot be known ahead of time, the actual state is stored as a file in an Azure Blob container.
You can download the provider by finding the Windows Azure Platform Training Kit. In the kit there is a sample project called AspProviders. You can either bring this project into your solution, or use an assembly reference to bring the DLL into your project.
Once you have referenced the assembly you will have to make some changes to your web.config. Open your config and look for the <system.web> element. Remove any existing <sessionState> configuration that is already present, and paste in the following. When you do, make sure to put in the real name of your application.
<sessionState mode="Custom" customProvider="TableStorageSessionStateProvider">
  <providers>
    <clear/>
    <add name="TableStorageSessionStateProvider"
         type="Microsoft.Samples.ServiceHosting.AspProviders.TableStorageSessionStateProvider"
         applicationName="yourWebAppName"/>
  </providers>
</sessionState>
3. Configuration
Many web applications rely on the web.config to store runtime configuration. This is a handy and secure place to store your configuration (although you should get in the habit of encrypting sensitive information). An advantage we are used to is the ability to change this configuration while the application is still running (although it causes a restart).
When you deploy your web application to Windows Azure, your package is deployed to your web role instance as a read-only package. None of the files in your project can be changed at runtime, and this includes the web.config. If you want to change the web.config, you have to do a whole new deployment, which isn't very convenient.
The best way to handle this is to either deal with the limitation, or refactor your configuration and store it in the ServiceConfiguration.cscfg file in your Microsoft Azure project. Data in this configuration can be edited at runtime (which also causes a restart).
Migrating your configuration to the cscfg file will require three steps. First, you will have to define the configuration element in the ServiceDefinition.csdef file. If we were moving a configuration element that sets the 'maximum money laundering limit' as a business rule, it might look like the following.
<ConfigurationSettings>
  <Setting name="DiagnosticsConnectionString" />
  <Setting name="MaxMoneyLaunderingLimit" />
</ConfigurationSettings>
Once you have done this, you can add the setting to your ServiceConfiguration.cscfg file, as follows. We have set our current limit at $100,000.
<ConfigurationSettings>
  <Setting name="DiagnosticsConnectionString" value="UseDevelopmentStorage=true" />
  <Setting name="MaxMoneyLaunderingLimit" value="100000" />
</ConfigurationSettings>
Finally, you will have to go through your code and change how you read your configuration. Hopefully you have abstracted away how your application reads configuration into one class so it is all in one place. Either way, you will change the code that reads the configuration to be something like the following.
string s = RoleEnvironment.GetConfigurationSettingValue("MaxMoneyLaunderingLimit");
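If your configuration reads are not yet in one place, a small wrapper makes the switch less invasive. The sketch below is one possible shape for it, not a prescribed pattern: the AppSettings class name and the fallback to web.config are assumptions made for illustration. It relies on RoleEnvironment.IsAvailable to detect whether the code is running inside Windows Azure.

```csharp
using System.Configuration;
using System.Globalization;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical wrapper: the one place where the application reads settings.
public static class AppSettings
{
    public static string Get(string name)
    {
        // RoleEnvironment.IsAvailable is false when the code runs outside
        // Windows Azure (for example, under plain IIS), so fall back to the
        // appSettings section of web.config in that case.
        if (RoleEnvironment.IsAvailable)
        {
            return RoleEnvironment.GetConfigurationSettingValue(name);
        }
        return ConfigurationManager.AppSettings[name];
    }

    // Typed accessor for the business rule used in the example above.
    public static decimal MaxMoneyLaunderingLimit
    {
        get
        {
            return decimal.Parse(Get("MaxMoneyLaunderingLimit"),
                                 CultureInfo.InvariantCulture);
        }
    }
}
```

The rest of the application would then read AppSettings.MaxMoneyLaunderingLimit and never touch either configuration API directly. The typed accessor throws if the setting is missing, which is usually preferable to silently running with no limit.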
You can change the contents of the cscfg file manually through the portal, or you can use the Service Management API to upload a new config. When you do, your application will restart with the new configuration, and the RoleEnvironmentChanging event will fire to warn you that your configuration is changing.
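The handler for that event is typically wired up in the role's entry point. The following sketch mirrors the kind of code the SDK project templates generate. Treat it as an illustration rather than the only way to do it; the WebRole class name is simply the template default.

```csharp
using System.Linq;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Subscribe before the role starts taking traffic.
        RoleEnvironment.Changing += RoleEnvironmentChanging;
        return base.OnStart();
    }

    private void RoleEnvironmentChanging(object sender, RoleEnvironmentChangingEventArgs e)
    {
        // If any configuration setting is changing, set Cancel so that this
        // instance is recycled and comes back up with the new values.
        if (e.Changes.Any(change => change is RoleEnvironmentConfigurationSettingChange))
        {
            e.Cancel = true;
        }
    }
}
```

Setting e.Cancel to true is what triggers the restart. If your code can pick up new values on the fly, you could leave Cancel alone and re-read the settings instead.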
4. File System Use
Many web applications read and write to the file system. Their use of the file system is usually fairly simple and straightforward. This is probably the least convenient change you might have to make to your code.
Why is this a problem in the cloud? Because the web servers your code is running on are stateless, and could be destroyed and recreated at any time. Because of the stateless nature, the local disk is normally not available for your use.
If you only use the file system as a temporary store, perhaps to store an uploaded file that is then read and loaded into a database, then 'local storage' may be what you need. Local storage is a feature of Microsoft Azure that will assign a piece of the local disk on the server for your full use. You have to configure the local storage in the csdef file, and then load the path from the RoleEnvironment object. Local storage is considered volatile, meaning that there is no guarantee that the storage will be there for any specified length of time.
To configure the local storage space you need, add a LocalStorage entry to the role configuration in your csdef file, and provide a name for the space you are requesting. You can request more than one piece of local storage. This is handy if you typically would use several folders in your code.
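A minimal sketch of such an entry is shown here. The name TempUploads matches the code that reads the path at runtime; the size and the cleanOnRoleRecycle value are assumptions chosen for illustration, so pick values that suit your application.

```xml
<!-- Goes inside the <WebRole> (or <WorkerRole>) element of ServiceDefinition.csdef. -->
<LocalResources>
  <!-- sizeInMB and cleanOnRoleRecycle are illustrative values, not recommendations. -->
  <LocalStorage name="TempUploads" sizeInMB="100" cleanOnRoleRecycle="true" />
</LocalResources>
```

To request several spaces, add one LocalStorage element per name inside the same LocalResources element.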
At runtime, ask the RoleEnvironment object for the path to that space:
LocalResource localCache = RoleEnvironment.GetLocalResource("TempUploads");
string localCacheRootDirectory = localCache.RootPath;
Your second option in migrating code that reads and writes directly to the file system is to convert your code to use Azure Blob storage. This will require some code changes, but will move you towards the full use of the cloud that much quicker.
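To give a feel for the size of that change, here is a rough sketch of a class that could stand in for direct file reads and writes. It is written against the StorageClient library that ships with the Windows Azure SDK of this era. The BlobFileStore class and the DataConnectionString setting name are assumptions for illustration, and newer storage libraries use different class names.

```csharp
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical replacement for code that wrote to and read from local files.
public class BlobFileStore
{
    private readonly CloudBlobContainer container;

    // containerName must follow blob container naming rules (lowercase).
    public BlobFileStore(string containerName)
    {
        // "DataConnectionString" is an assumed setting name in the cscfg file.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
        CloudBlobClient client = account.CreateCloudBlobClient();
        container = client.GetContainerReference(containerName);
        container.CreateIfNotExist();
    }

    // Writes the stream to a blob named after the file.
    public void Save(string fileName, Stream content)
    {
        container.GetBlobReference(fileName).UploadFromStream(content);
    }

    // Copies the blob's contents into the supplied stream.
    public void Load(string fileName, Stream target)
    {
        container.GetBlobReference(fileName).DownloadToStream(target);
    }
}
```

Because every instance talks to the same container, a file saved by one web server is immediately visible to the others, which is exactly what the local disk cannot offer.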
If you can't change your code to that degree during the migration, then you can consider your third option, which is the use of the Microsoft Azure Drive. This is a blob that is formatted as an NTFS drive (think of it as a VHD file for Virtual PC). This file can then be mounted as a local drive letter on your role instance. An Azure drive can be mounted to multiple server instances only if it is read-only. If you want to be able to write to the drive, you will only be able to mount it on one instance.
The pattern is very similar to the local storage pattern. You will configure the drive to be mounted, and then use some code at runtime to determine the physical drive letter and path to the mounted drive. Once you have that path, your normal code will work as usual.
An important way to boost the performance of a Windows Azure Drive is to configure some local storage as a cache for the drive. This is built into the drive API, and results in a big improvement in performance.
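The mount-and-cache pattern might look roughly like the sketch below, written against the CloudDrive API in the Windows Azure SDK of this era. It assumes the VHD page blob already exists. The DriveCache local storage name, the DataConnectionString setting, and the container and blob names are all hypothetical.

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public static class DriveMounter
{
    // Mounts an existing drive and returns the path assigned on this instance.
    public static string MountDataDrive()
    {
        // "DriveCache" is an assumed local storage name declared in the csdef file.
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        // "DataConnectionString" is an assumed setting name in the cscfg file.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));

        // The drive lives in a page blob; "drives" and "data.vhd" are assumed names.
        CloudBlobClient client = account.CreateCloudBlobClient();
        string vhdUri = client.GetContainerReference("drives")
                              .GetPageBlobReference("data.vhd")
                              .Uri.ToString();
        CloudDrive drive = account.CreateCloudDrive(vhdUri);

        // Use the whole local resource as the read cache for this drive.
        return drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
    }
}
```

The returned path is what you would hand to your existing file-handling code in place of the old on-premises folder.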
5. Identity in the Cloud
Many web applications rely on integrated authentication to provide a seamless authentication experience to internal users. When the user visits your site while they are logged into their desktop, their Windows identity is transferred to the web application. They log in without knowing it.
This process doesn't work when the server is not in the same domain as the users, and not on the same network. When you migrate the application, users will be confused when they suddenly have to log into the application.
The easiest way to bridge your internal user identities with the cloud is to use a concept called Federated Identity. Federated identity is based on open standards, either OAuth or SAML. Using these protocols you can configure your application to trust the identities from users in your domain.
To do this you will need to use the Windows Azure AppFabric ACS service, which is an authentication service in the cloud. You will also need to have a security token service (STS) in your company. Most companies use Windows Server Active Directory Federation Services v2. Once the services are configured, when the user goes to your site, their identity will be transferred from the ADFSv2 server, through ACS, to your application. This results in a seamless login experience for the user, and minimizes the changes to your code.
Once you start using ACS and federated identity, you will be able to federate with trading partners, vendors, and customers. This will make it easy for them to log in to your system as well. It is important to note that only the identity you want shared is shared (who they are, groups they belong to). Actual credentials (such as the password or smartcard) are never shared.
Microsoft Azure is based on Windows Server, and SQL Azure is based on SQL Server. This makes the Microsoft Azure platform a highly compatible environment, which in turn makes it easier to migrate an application from on-premises to the cloud.
Notice I said easier, not easy. How long it will take to migrate your application will be driven by the architecture and code base of your application. I have worked with many customers who have migrated significant web applications in a matter of weeks. Of course, once the migration is done, a significant amount of testing must be performed on the new cloud-based system to ensure that it still functions properly.