The term 'web scale' has become very popular lately. It is now thrown about willy-nilly, and refers to a site's ability to scale well. We're not talking about the scale of my blog; we are talking Twitter and Facebook scale. If you want your architecture to scale well, you are going to have to rely on caching in several places. Microsoft has just released Windows Server AppFabric, which has a distributed caching component that can fill this need quite well, and it works with Microsoft .NET Framework 3.5 and 4.
Windows Server AppFabric is a new server tool that provides two services. The first is workflow and service hosting, which works in addition to, and on top of, IIS and WAS. The other major service is the distributed caching component. The caching component is designed to be your caching tier (which usually resides just on top of your data tier).
Using Cache in your Application
There are many other ways to use cache in your application. You will likely also want to reduce the load on a backend system by reducing the number of repeated queries for read-only data. For example, why should you always requery the database for a list of the states to put in a drop-down? Why should you always requery the list of products from the database? Neither of the two lists tends to change very often (we haven't added a new state to the US in a while, and most companies have a fairly static list of products), so both can safely be cached for a period of time. This will reduce the load on the backend system, as well as improve the performance of the application and its responsiveness to the user.
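Caching a list "for a period of time" usually means attaching an expiry to each entry. The following is a minimal sketch of that idea in Python. The `ExpiringCache` class and its `put`/`get` methods are hypothetical names chosen for illustration; this is an in-process stand-in, not the AppFabric API.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Illustrative in-process cache whose entries expire after a fixed time.

    Hypothetical helper for this article; not part of AppFabric.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        # Store the value together with the time at which it stops being valid.
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller goes back to the database.
            del self._entries[key]
            return None
        return value
```

With a time-to-live of, say, one day, the list of states would be read from the database once per day instead of once per page request. How long is safe depends on how often the underlying data really changes.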
I have seen other systems that use cache as a way to bypass or reduce the impact of a slower system. Imagine a doctor's office that needs to download the full patient record from the main office (the hospital) to have it handy when the patient is seen during their appointment. If the office waited until the appointment to download that data, it would lead to delays in seeing and helping the patient. It would be much faster if the system could fetch these records the night before, based on the patient schedule for the day, and have them ready in the local cache before they are needed. This would speed up record retrieval for sure.
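The night-before fetch described above is often called prefetching, or warming the cache. Here is a minimal sketch, assuming a plain dictionary as the local cache and a hypothetical `fetch_record` function standing in for the slow download from the hospital.

```python
from typing import Any, Callable, Dict, Iterable


def prefetch_records(patient_ids: Iterable[str],
                     fetch_record: Callable[[str], Any],
                     cache: Dict[str, Any]) -> int:
    """Load tomorrow's patient records into the local cache ahead of time.

    Returns the number of records actually downloaded. Patients that are
    already cached (or appear twice in the schedule) are not fetched again.
    """
    downloaded = 0
    for patient_id in patient_ids:
        if patient_id in cache:
            continue
        cache[patient_id] = fetch_record(patient_id)  # the slow call
        downloaded += 1
    return downloaded
```

A scheduled job could call `prefetch_records` each night with the next day's appointment list, so the slow transfer happens while nobody is waiting on it.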
Installing Windows Server AppFabric
The cache server for Windows Server AppFabric needs to be installed on each server in your cache tier. You will also need to install the 'client' on each web server calling that cache tier. The easiest way to install Windows Server AppFabric is to use the Web Platform Installer. It will automatically download and install all the parts and any dependencies you may need. It is important to note that AppFabric requires .NET Framework 4 installed to run on the server, but clients can use .NET 3.5 or greater.
All of your web servers will be configured to speak to the cache servers (we discuss exactly how in a few moments). We will write our code using a very standard pattern, and one that you are probably using right now in your application. This pattern is called 'cache aside.' When we need data we will check the cache first. If it has the data, the cache will return it. If it doesn't, we will go to the data source to get the data, and then place it in the cache. This is the primary pattern that AppFabric Cache was designed to support. You will see this relationship in the following diagram. The web servers talk to both the cache servers and the SQL servers. The cache servers do not talk to the data source themselves. They are ignorant of the data storage system.
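The cache-aside steps can be sketched in a few lines. This is an illustration of the pattern only: `InMemoryCache` is a hypothetical stand-in for a cache client, and `load_from_source` stands in for your database query. Neither is the AppFabric API.

```python
from typing import Any, Callable, Dict, Optional


class InMemoryCache:
    """Hypothetical stand-in for a cache client with get/put operations."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._items.get(key)

    def put(self, key: str, value: Any) -> None:
        self._items[key] = value


def get_with_cache_aside(cache: InMemoryCache,
                         key: str,
                         load_from_source: Callable[[str], Optional[Any]]) -> Optional[Any]:
    # 1. Check the cache first.
    value = cache.get(key)
    if value is not None:
        return value
    # 2. Cache miss: the application, not the cache, goes to the data source.
    value = load_from_source(key)
    # 3. Place the result in the cache for the next caller.
    if value is not None:
        cache.put(key, value)
    return value
```

Note that the calling code talks to both the cache and the data source, which mirrors the web servers in the diagram. The cache itself never contacts the database.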
When you install the cache server on a server, be aware that you don't want to be running anything else on that server. You don't want to install this on your data server or your web server, because AppFabric Cache will grab as much memory as it can and starve out other applications on the server. Cache Server is not meant to be installed side by side with other applications; it is meant to be installed on its own server. You can install it into a virtual server if you want.
When a client is accessing the cache server it needs several DLLs. There are two ways to install the cache client DLLs. The first is to use the Web Platform Installer on the web servers, and choose to install just the client. The second is to copy the two needed DLLs into your solution with copy local = true. This will deploy the DLLs with your code, and make for a much easier deployment.
In the diagram above I used two cache servers as an example. AppFabric Cache can be deployed to just one server. While this is less expensive (it uses less hardware), it will not meet the high availability requirements of many environments. You can put as many servers into your caching tier as you think you need. They will auto-configure with each other, working out which data needs to be replicated to each server. When they do this they are using several self-healing algorithms Microsoft developed for Microsoft Azure. This means that not every piece of data in the cache will be copied to every server. Instead the Cache engine determines what the optimal layout will be, and then implements it.
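To make the idea of "not every piece of data on every server" concrete, here is a generic sketch of hash-based placement with a single backup copy. This is an assumption made purely for illustration: it is not AppFabric's actual layout algorithm, and the `place_key` function and server names are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple


def place_key(key: str, servers: List[str]) -> Tuple[str, Optional[str]]:
    """Pick a primary server and, if possible, one backup server for a key.

    Generic illustration only; not how AppFabric decides placement.
    """
    if not servers:
        raise ValueError("at least one cache server is required")
    # Use a stable hash so every client computes the same placement.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    primary = servers[index]
    # With a single server there is nowhere to keep a second copy.
    backup = servers[(index + 1) % len(servers)] if len(servers) > 1 else None
    return primary, backup
```

In this sketch each key lives on at most two servers regardless of how large the tier grows, and a one-server tier has no backup at all, which is the high availability gap mentioned above.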
The servers use three different ports on the network. The first is the port that will be used to communicate with the clients (the web servers). The second port is used by the cache servers to communicate with each other, to replicate their data. The third port is used as a heartbeat, so that the cache servers can monitor their health and status.