Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
Welcome to the Renaissance, version 2.0. There are many new possibilities open to developers today. The breadth of technologies and platforms that have become available in only the past few years has created an environment of unprecedented innovation and creativity. Today, we are able to build applications that we could not have dreamed of only a few years ago. We are seeing the birth of a second renaissance period, one that will make the dot-com boom of the late nineties look like a practice lap.
Part of this Renaissance is caused by the move towards service oriented architecture, and the availability of cloud computing. Suddenly, we are able to create applications, with sophisticated capabilities, without having to write the entire infrastructure by hand, or pay for it all out of pocket. Skype is a very popular application that allows its users to make phone calls over the Internet. Imagine how you would build Skype today. You would not have to figure out how to talk across firewalls, or how to encrypt the data properly. You wouldn't have to build the entire user credential infrastructure. You would only have to set out with a grand idea, and write only the code that is important to your idea. Skype isn't about traversing firewalls, or a new way to manage user names and passwords. It's about people communicating easily across the Internet. The entire infrastructure that isn't strategic to you is left to someone else. This frees up an amazing amount of power and productivity. This leads us to this new Renaissance.
Microsoft Azure, The Cloud Operating System
What Is This Cloud You Speak Of?
In general, the term 'cloud' refers to delivering a service of value to a consumer over the Internet. These services are typically infrastructure or platform oriented in nature. Many of these services can play a role in the application you are building today. For example, if you are building a photo album application, you may not have the funds or expertise to build a large file storage farm to store all of the images. This would involve buying a lot of hard disks, SAN arrays, backup systems, as well as hiring smart people to run the infrastructure. Instead of doing all of that, you can simply buy some storage in the cloud, from a storage provider. Your application can still store photos, but you aren't bogged down with the details, and aren't limited to the amount of cash you have to buy the required infrastructure.
Later, when your application becomes immensely popular, you only need to call up the cloud, and ask for more space. The term 'Internet scale' is used a lot in conjunction with cloud services. This means that the service you are relying on can scale to a great degree, hosting thousands, if not millions, of users. It is important when you are building your own applications that the services you rely on can scale as high as you need them.
There are many cloud platform vendors that want to convince you that your entire application should run in the cloud. This is possible and, in many cases, a very realistic approach. Unfortunately, it doesn't meet everyone's needs. In order to deliver the value and experience that you want to deliver to your users, you may need the flexibility of local software, which takes advantage of some services in the cloud.
If developers host their application with a traditional hosting provider, they can define all of the servers' specifications. They can configure the servers to their liking, deploying specific patches and configurations. When you are using a cloud service, these details (and work) are abstracted from you. When you are using that storage in the cloud for your photo album, you have no idea what hard disk brand they are using, how they perform backups, or how they are achieving their geographic redundancy and scale. You shouldn't care about those details, just like you don't care how your cable provider gets the signal from the Syfy channel to your house on Friday night. The details are abstracted from you by the service provider.
What Is a Cloud Operating System?
You use an operating system in front of you every day. A million years ago (in Internet years) when the personal computer was first making its way into every home, a developer had to care about all of the pieces of the PC. They had to take into account the different CPUs, different sound cards, and drive capabilities. They had to provide their own way of printing, and of storing user data.
Over the years, the operating system evolved to abstract away these details from the developer (and the user), so the developer could focus on what they wanted to write, and bypass the entire underlying infrastructure. Today, you don't usually care what sound card, or printer the user may have. The operating system deals with it.
The operating system takes care of a lot of details for us. It manages a shared pool of resources, and allocates those resources across the different applications running on it. It also manages authorization and access control to those resources.
As you move your applications into the cloud, you will be faced with similar concerns. Azure's job is to provide that layer of abstraction for you. Not just on one PC, but over hundreds of thousands of servers, in many different locations. Azure abstracts all of this away, and presents it to you as a seamless 'fabric'. When you deploy an application to Azure, you have no idea which server it is running on, which nodes, or which data centers. You simply deploy your application, with a configuration model. Azure manages all of the details. It configures a virtual LAN for your application, assigns cores from available CPUs, deploys lightweight virtual servers to run your application, automatically configures DNS and network load balancers to distribute the workload, and wires all of this up to monitoring systems to make sure everything is running smoothly. All you have to do is hit 'go.'
When Microsoft set out to build the Azure Services Platform, they had to take into account all of the possibilities a developer might face. Azure had to have these qualities:
- Easy to develop with
- Support open standards and protocols
- Allow the developer to use the existing developer tools
- Provide 'Internet scale'
- Easy enough to handle simple scenarios
- Sophisticated enough for the complexity of modern systems
- Reduce the friction in migrating to the cloud
Understanding the Microsoft Azure Services Platform
The Azure Services Platform encompasses all of the cloud services that Microsoft is offering. The platform includes Azure at its foundation, and builds on that with a series of building block services.
FIGURE 1 The Microsoft Azure operating system, and the building block services that it supports
Figure 1 represents the Azure Services Platform. Azure, the operating system for the cloud, sits at the bottom of this diagram. It provides the foundation for everything above it, the operating system layer of abstraction we discussed above. Azure has been built so you can write your own applications and deploy them to the cloud. It provides all of the capabilities you would need. Microsoft is so sure that you can run your applications on it that they have designed many of their own services and applications to run on it.
NOTE: Azure's code word during development was 'Red Dog.' Some of the team wore red sneakers emblazoned with the code word during PDC2008, when Azure was first released as a CTP to the public.
In order for Azure to be able to provide the level of reliability and scale that is demanded, Microsoft has shifted to building data centers out of shipping containers. Each container comes prebuilt from the server vendor. It is pre-wired with about 2,500 servers, and includes everything it needs. The container is dropped off by a truck, and plugged into power, cooling, and network connections. The Chicago data center has around 360,000 servers. The data center located in Dublin, Ireland, part of their next generation of data centers, stores the containers outside, inside a walled-off compound without a roof. This will save on air conditioning and power demands. Of course, they will have to hire groundskeepers to mow the data center, which is an unusual budget item for a data center to have.
The Components of Microsoft Azure
When your code is deployed, and is executed on Azure, it is contained within a role. There are currently two roles available to you, with more in development. The role is responsible for executing your code, and providing an implementation of the Azure APIs that you may be referencing.
The two current roles available are the Web Role, and the Worker Role. These are the two initial roles, because in many systems, you will have logical front end and back end components. You can scale your system by defining how many instances of a role should be provisioned by the Azure fabric. A simple system can consist of one role acting alone, while a complex system might have many roles, each running different pieces of code, all acting together.
The Azure fabric allows you to dynamically define how many instances of a role are running. This allows you an elastic computation resource. As demand on your system grows (perhaps because your sales force signed a big new customer or it's a busy weekend), you simply tell Azure to run more instances of your roles. This scales up the availability of your system. It would take months of work and planning in a traditional IT center to respond to this sudden growth. Azure makes it just as easy to scale down, when that big weekend event is over. You simply reduce the number of instances of your roles, and your Azure bill gets smaller. You don't have to worry about a bunch of expensive hardware lying around, being underutilized.
You can change your instance allocation to each role in several ways. The first way is to change the configuration through the portal. You can also use the service management API to change the number of instances through code. This management code can be run from within Azure, as part of a role, or from outside Azure as part of a management tool.
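The scaling decision described above can be sketched in a few lines of Python. This is only an illustration, not the service management API itself: the function name `target_instance_count`, its parameters, and the idea of sizing a role from the number of pending requests are all assumptions made for the example. The call that would actually apply the new count is deliberately left out.

```python
import math


def target_instance_count(pending_requests: int,
                          requests_per_instance: int,
                          min_instances: int = 1,
                          max_instances: int = 20) -> int:
    """Return how many role instances to ask for, given the current backlog.

    All names and defaults here are hypothetical. A real system would hand
    the result to whatever management interface it uses to change the count.
    """
    if requests_per_instance <= 0:
        raise ValueError("requests_per_instance must be positive")
    # Round up so that a partial batch of work still gets an instance.
    needed = math.ceil(pending_requests / requests_per_instance)
    # Clamp to the floor and ceiling the operator is willing to pay for.
    return max(min_instances, min(max_instances, needed))
```

With these defaults, a quiet day with 30 pending requests and a capacity of 100 per instance stays at one instance, while a busy weekend with 1,500 pending requests asks for 15. The same function scales the count back down once the backlog drains.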
Let's look at an example. Imagine you are running a company that provides background checks on prospective employees, and you have a few small clients. Your customers log into your website, fill out a form for a request for a background check for a person, and submit payment. Their request is packaged up, and dropped into the pending background check queue. The backend process then picks up the request, makes a series of outbound calls to databases you own, and perhaps third party systems such as government and law enforcement agencies. Once the data is collected, the processor builds a report, and drops it into a shared storage area. The web site sees this new report, and displays it in a list to the user. The user can then view the report, or download it to help make a hiring decision. You might build your system similar to what is shown in figure 2.
FIGURE 2 - A simple Azure application
The Roles in Microsoft Azure
A Web Role is designed to run an ASP.NET application, running managed code on .NET 3.5 SP1 inside of IIS7. It is expected that this role will provide the front end capabilities for your Azure based system, although it is not limited to only front end type work. The Web Role is designed to communicate with HTTP or HTTPS, and supports a subset of ASP.NET and WCF capabilities.
A Worker Role is similar in many regards. This role is not running IIS like the web role because it is intended to be the backend processor for your service. The Worker Role can make outbound connections as needed, perhaps to other services, or to your other enterprise systems. The worker role can host WCF services, and make these available to the Internet or only to your application in Azure.
What roles your service uses, and how they are provisioned is controlled by the Service Definition. This is a lot like a configuration file, and it tells Azure what roles you have, how many instances of each role should be running, and other metadata about your service.
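As a rough mental model, the information a Service Definition carries can be pictured as a small data structure. The Python sketch below is only an illustration: the class names `RoleSpec` and `ServiceSpec` and their fields are invented for this example, and they do not reflect the actual file format that Azure uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RoleSpec:
    name: str            # hypothetical role name, e.g. "WebFrontEnd"
    kind: str            # "web" or "worker"
    instances: int = 1   # how many copies the fabric should run


@dataclass
class ServiceSpec:
    name: str
    roles: List[RoleSpec] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

    def total_instances(self) -> int:
        """Sum the requested instances across every role."""
        return sum(role.instances for role in self.roles)

    def validate(self) -> None:
        """Reject role kinds and instance counts this sketch does not model."""
        for role in self.roles:
            if role.kind not in ("web", "worker"):
                raise ValueError(f"unknown role kind: {role.kind}")
            if role.instances < 1:
                raise ValueError(f"{role.name} needs at least one instance")
```

The background check service from the earlier example might be described as `ServiceSpec("BackgroundChecks", [RoleSpec("WebFrontEnd", "web", 2), RoleSpec("CheckProcessor", "worker", 1)])`, which requests three instances in total.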
For the most part, an existing ASP.NET application should port over easily. There are some parts of the .NET framework that will not work in an Azure environment. For example, an application writing directly to the disk will need to be modified. Since Azure abstracts away the file system (how would you know which disk on which server to access, when your code could be running on any server in the fabric?), the Azure SDK provides alternatives.
In this example, the application could be changed to leverage Azure storage. Azure has BLOBs, tables, and queues. You could also create a small writable space in isolated storage. Most of the restrictions are made because of security limitations.
Storing Data In The Storage Fabric
As mentioned above, there are currently three pieces in the storage system of Windows Azure: BLOBs, tables, and queues. These are the fundamental pieces of storage that most systems will need. All data stored in the storage system is kept in three replicas. This is to ensure the proper level of reliability and scalability.
The first storage mechanism is the BLOB. BLOB means binary large object, and many developers are familiar with them from using them to store large objects in a database, for example a photo of a user. While working with BLOBs in a database can be challenging at times, the API for BLOBs in Azure is simple to use. Each BLOB is given a name, and is stored in the fabric, available to all of your running instances.
The Azure storage system also provides a capability called 'tables.' Do not confuse these with traditional database tables. A single table can hold up to 100TB of data. Each table holds entities, and each entity can have its own schema. Each entity must have two properties, Partition Key and Row Key.
The partition key is used to break your data into partitions, which are used as units of scale. As partitions heat up (by receiving a lot of queries) they can be moved to other storage servers to better respond. As they cool down they can be consolidated again. The row key and the partition key properties act together as a sort of composite primary key in your table.
Each entity can have different properties, even in the same table. This gives you a great degree of flexibility in working with your data. Azure tables are meant to be extremely fast to use, and can be accessed with REST or the storage client in the Azure SDK.
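The table model described above can be pictured with a small in-memory stand-in. The Python sketch below is a toy, not the Azure storage client: the class `EntityTable` and its methods are hypothetical, and the row identifier is called `row_key` here purely for the example. It shows the two ideas from the text: the partition key and row identifier together act as a composite key, and entities in the same table need not share a schema.

```python
from typing import Any, Dict, List, Tuple


class EntityTable:
    """Toy in-memory stand-in for a schema-less table (hypothetical)."""

    def __init__(self) -> None:
        # (partition_key, row_key) -> the entity's own set of properties
        self._rows: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def insert(self, partition_key: str, row_key: str, **properties: Any) -> None:
        key = (partition_key, row_key)
        if key in self._rows:
            raise KeyError(f"entity {key} already exists")
        self._rows[key] = dict(properties)

    def get(self, partition_key: str, row_key: str) -> Dict[str, Any]:
        return self._rows[(partition_key, row_key)]

    def query_partition(self, partition_key: str) -> List[Dict[str, Any]]:
        """Return every entity in one partition, ordered by row key."""
        matching_keys = sorted(k for k in self._rows if k[0] == partition_key)
        return [self._rows[k] for k in matching_keys]
```

For instance, `table.insert("customer-42", "report-001", subject="Jane Doe", status="pending")` and `table.insert("customer-42", "invoice-001", amount=49.95)` store two entities with different properties in the same partition, and `table.query_partition("customer-42")` returns both.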
The last piece of Azure storage is the queue. Queues are often used to help different systems communicate in a loosely coupled manner. Architecting your system in this manner de-couples the front end from the back end. With this structure the backend doesn't care how something was placed into the queue, only that it needs to process that work, and move onto the next work item.
A queue is a simple data structure that allows messages to be passed from one system to another. These messages are stored in a FIFO manner, that is, first in, first out. A good analogy would be that these messages are like emails in your inbox, but you are only allowed to read them in the order they were received. Each queue can hold an unlimited number of messages, but each message is limited to 8KB in size. A common approach, like our scenario above, is to store the real piece of data to be worked on in a BLOB, and send a work ticket to the Worker role via the queue.
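The work-ticket approach can be sketched as follows. Plain Python containers stand in for BLOB storage and the queue; the names `blob_store`, `work_queue`, `submit_request`, and `process_next` are hypothetical, and only the 8KB message limit comes from the text above.

```python
import json
import uuid
from collections import deque
from typing import Deque, Dict, Optional

MAX_MESSAGE_BYTES = 8 * 1024  # per-message limit quoted in the text

blob_store: Dict[str, bytes] = {}   # stand-in for BLOB storage
work_queue: Deque[bytes] = deque()  # stand-in for the queue (FIFO)


def submit_request(payload: bytes) -> str:
    """Store the payload as a named blob and enqueue a small ticket for it."""
    blob_name = f"request-{uuid.uuid4()}"
    blob_store[blob_name] = payload
    # The ticket carries only the blob's name, so it stays tiny no matter
    # how large the payload is.
    ticket = json.dumps({"blob": blob_name}).encode("utf-8")
    if len(ticket) > MAX_MESSAGE_BYTES:
        raise ValueError("ticket exceeds the queue message limit")
    work_queue.append(ticket)
    return blob_name


def process_next() -> Optional[int]:
    """Take the oldest ticket, load its blob, and return the payload size."""
    if not work_queue:
        return None
    ticket = json.loads(work_queue.popleft().decode("utf-8"))
    payload = blob_store[ticket["blob"]]
    return len(payload)
```

The front end would call `submit_request` with the full request body, however large, and the back end would call `process_next` in a loop. In this sketch the worker only reports the payload size; a real worker would do its processing at that point.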
A queue provides a very easy mechanism for multiple instances of your worker roles to work together to process all of the requests made to the system. As each worker role instance becomes able to process the next request, it retrieves the next message in the queue. The Azure queue then marks that message as invisible, but does not delete it. Once the worker role is done processing that message, it tells the queue that it was successful, and the queue then deletes that message. Marking the message invisible keeps the same message from being processed multiple times by different role instances. Because the message is hidden rather than deleted, the system can recover it if the code processing it crashes: the message becomes visible again after a timeout, and another instance can pick it up. In this manner, the queue will not lose a message to a failed worker.
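The hide-then-delete behavior can be modeled with a short Python sketch. This is a toy, not the Azure queue API: the class `VisibilityQueue` and its methods are hypothetical, and the sketch assumes that a hidden message becomes visible again after a fixed timeout if nobody deletes it. Time is passed in explicitly so the behavior is easy to follow.

```python
import itertools
from typing import Dict, Optional, Tuple


class VisibilityQueue:
    """Toy queue: a received message is hidden, not removed (hypothetical)."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self._timeout = visibility_timeout
        self._ids = itertools.count(1)
        # message id -> (body, time at which the message is visible again)
        self._messages: Dict[int, Tuple[str, float]] = {}

    def put(self, body: str, now: float) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = (body, now)  # visible immediately
        return msg_id

    def get(self, now: float) -> Optional[Tuple[int, str]]:
        """Return the oldest visible message and hide it for the timeout."""
        for msg_id in sorted(self._messages):
            body, visible_at = self._messages[msg_id]
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self._timeout)
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Called by the worker once processing has succeeded."""
        self._messages.pop(msg_id, None)
```

If a worker calls `get` and then crashes without calling `delete`, a second `get` during the timeout returns nothing, so no other instance duplicates the work. Once the timeout has passed, `get` hands the same message out again.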
It is important to note that interacting with Azure storage is via REST, and doesn't have to come from code running in Azure. You could have an application on-premises that receives requests. That application could place the request into the Azure queue, where an Azure based Worker Role picks up the message, and processes the request.
The next level up in figure 1 contains the building block services of the Azure Services Platform. They are called building block services, because they are not meant to be used directly by end users. They are intended to be used by an application running in the cloud, perhaps on Azure, or by applications that are running elsewhere, and just want to leverage their value.
These services may be used in a piecemeal way, as needed. You may have an on-premises application that only uses a few of them, perhaps the service bus to connect to clients out on the Internet.
SQL Azure will make it easier for you to migrate applications to the cloud, giving you a SQL compatible relational database platform in the cloud. It is highly compatible with SQL Server. Many tools that work with SQL Server will work with SQL Azure. If you are migrating a database you might want to check out the SQL Azure Migration Wizard (on CodePlex at http://sqlazuremw.codeplex.com/). This tool helps you migrate your schema and data to the cloud.
Windows Azure platform AppFabric gives you two services to work with REST services. The first is ACS, which is based on an open protocol called OAuth, codeveloped by Microsoft, Google, and Yahoo!. ACS makes it dead simple to secure REST based services with claims based authorization.
The second AppFabric service is the Service Bus that makes it easy for any client to connect with any service, regardless of where it is hosted. You register your cloud based or on-premises based service with the bus, and then your clients can connect to it through the bus in the cloud. This makes it easy for communication to work across NATs, proxies, firewalls, and other network barriers by relaying messages through the cloud.
Wrapping Up the Cloud
So an application doesn't have to be all in the cloud. Applications will evolve to be hybrid in nature, combining the best of on-premises with the cloud to provide increasing value to the users and the business. By following a hybrid approach you can easily adopt the cloud features that make sense to use with your existing applications. Windows Azure makes this easy since the platform stretches from the phone, to the desktop, to the server, and now the cloud. It is easy to take your existing skills and tools, and start writing applications that use the cloud. Don't be afraid. Log in and try deploying an application today.