Many Web applications let users search all of the application's content. At first glance, this seems difficult to build. How can you search every file in your Web application and at the same time provide the robust search capabilities most users have come to take for granted? For example, users expect to combine words with AND or OR, search for phrases, and search for partial word matches. Building such a search engine is no small task. Many custom solutions implement their own search algorithms, but they are either cheap and limited in their search capabilities or they are expensive.
But there is no need to build your own solution or buy an expensive one. Microsoft provides a solution to this problem: Microsoft Indexing Server. Indexing Server comes as part of Windows 2000, Windows XP, and Windows 2003 and does not require any additional licensing. This article explains the capabilities of Indexing Server and walks through code that shows how to use it to provide search capabilities from within your application. It is important to understand that Indexing Server indexes files on the file system. It is not a Web spider that walks your Web site, finds all linked pages, and indexes them. Web spiders are used by search engines such as Google, MSN, and Yahoo to index Web pages. However, Indexing Server can be pointed to a local Web site. It then looks up the physical path where the files for this Web site are located, indexes those files, and also stores the virtual path of each file. This way, you can still index the files of your local Web application and know the URL for each file.
How to Install Microsoft Indexing Server
Windows 2000 comes with MS Indexing Server 2.0; Windows XP and Windows 2003 come with MS Indexing Server 3.0. This article concentrates on MS Indexing Server 3.0, although almost everything also applies to the previous version. MS Indexing Server is a separate Windows component that needs to be installed. Go to "Add or Remove Programs" in your Control Panel and select "Add/Remove Windows Components." Make sure that the "Indexing Service" component is installed. It is important to install this component after IIS has been installed. IIS has an Indexing Service extension, which only gets installed if the Indexing Service is installed after IIS. If the extension does not show up in IIS, uninstall the "Indexing Service" component and then reinstall it. Afterwards, the "Indexing Service" extension also appears in IIS. More information on this issue can be found here.
How to Start the Indexing Service
You configure Indexing Server through the "Computer Management" console. In the left side pane, under "Services and Applications," you find an entry called "Indexing Service." Right-click "Indexing Service" and select Start from the popup menu. The first time you do this, you are asked whether you want to start the service when the computer starts up. Answer Yes, which sets the "Indexing Service" to automatic startup. From here you also can stop, pause, and resume the Indexing Service. When you expand the "Indexing Service" entry with the plus sign, you see a list of catalogs. A catalog is a group of folders that gets indexed for search. Indexing Server comes out of the box with two catalogs, System and Web, but you can create custom catalogs as needed.
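For scripted setups, the two actions described above (switching the service to automatic startup and starting it) could also be issued from the command line. The following is only a sketch: it assumes the Indexing Service is registered under the service name cisvc and that the sc and net command-line utilities are available on the machine, so verify both before relying on it. The helper names are made up for illustration.

```python
import subprocess

# Assumption: the Indexing Service is registered under this service name.
SERVICE_NAME = "cisvc"


def startup_commands(service=SERVICE_NAME):
    """Return the commands that set automatic startup and start the service."""
    return [
        # "start=" and "auto" are passed as two separate arguments to sc.
        ["sc", "config", service, "start=", "auto"],
        ["net", "start", service],
    ]


def start_indexing_service():
    """Run the commands; requires Windows and administrative rights."""
    for command in startup_commands():
        subprocess.run(command, check=True)
```

Calling start_indexing_service() only makes sense on a Windows machine with sufficient rights; startup_commands() can be inspected anywhere to see what would be run.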
How the System Catalog Is Used
The System catalog is used by the Windows file search function. The Windows file search opens when you select "Search" from your Windows Start menu or open the file explorer and click the "Search" button. If Indexing Server is not running or there is no System catalog, the search goes through the actual file system. When the Indexing Service is running and the System catalog is available, the search uses the catalog instead. You also can configure whether the Search function uses the System catalog. At the bottom of the search pane (in the file explorer), select "Change Option." Select "With Indexing Service" and then "Yes, enable Indexing Service." You can turn the use of the "Indexing Service" off the same way. Using the Indexing Service makes searching faster because all the files have already been indexed and the system only needs to search the index instead of the file system itself.
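To illustrate why searching an index is faster than searching the files themselves, here is a small conceptual sketch. It is not how Indexing Server stores its catalogs internally; the document names and contents are invented, and the index is a plain in-memory dictionary that maps each word to the documents containing it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Read every document once and map each word to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search_index(index, word):
    """Look the word up directly; no document is read again."""
    return sorted(index.get(word.lower(), set()))


def search_by_scanning(documents, word):
    """Without an index, every document is re-read on every search."""
    target = word.lower()
    return sorted(name for name, text in documents.items() if target in tokenize(text))


# Invented sample data standing in for files on disk.
documents = {
    "report.txt": "Quarterly sales report for the Web team",
    "notes.txt": "Meeting notes about the search feature",
    "readme.txt": "How to search the sales archive",
}
index = build_index(documents)
```

Both search functions return the same documents, but search_index does a single dictionary lookup, whereas search_by_scanning has to walk through all the content each time.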
Expand the "System" catalog with the plus sign to find out more details about it. Select the "Directories" entry to see all the folders that are included in or excluded from this catalog. By default, the catalog includes the folders "c:\" and "c:\Documents and Settings" but excludes the folders "c:\Documents and Settings\*\Application Data\*" and "c:\Documents and Settings\*\Local Settings\*." This means that, for every user profile, it excludes the "Application Data" folder and the "Local Settings" folder together with all their subfolders. You can double-click each folder and change it to be included or excluded. You also can add new folders by right-clicking the "Directories" entry in the left side pane and then selecting "New | Directory" from the popup menu. You can enter a local folder or a UNC path to any network folder. For network folders, you also need to enter the username and password to use.
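The effect of the default include and exclude entries can be modeled with a few lines of code. This is a simplified sketch, not the matching logic Indexing Server actually uses: it assumes that an included folder covers everything beneath it, that "*" matches any run of characters, that matching is case-insensitive, and that an exclude entry always wins. The function name is_indexed is made up for illustration.

```python
from fnmatch import fnmatchcase

# (pattern, include?) pairs taken from the default System catalog entries.
RULES = [
    ("c:\\", True),
    ("c:\\Documents and Settings", True),
    ("c:\\Documents and Settings\\*\\Application Data\\*", False),
    ("c:\\Documents and Settings\\*\\Local Settings\\*", False),
]


def is_indexed(path, rules=RULES):
    """Return True if path lies under an included folder and matches no exclude pattern."""
    p = path.lower()
    included = False
    for pattern, include in rules:
        pat = pattern.lower()
        if include:
            # An included folder covers everything beneath it.
            prefix = pat if pat.endswith("\\") else pat + "\\"
            if p.startswith(prefix):
                included = True
        elif fnmatchcase(p, pat):
            # Assumption: any matching exclude entry overrides the includes.
            return False
    return included
```

With these rules, a file in a user's "My Documents" folder is indexed, while anything below a user's "Application Data" or "Local Settings" folder is not.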
How the Web Catalog Is Used
The Web catalog is available only when IIS has been installed. By default, it points to the "Default Web Site" of the local IIS instance. Right-click the "Web" entry in the left side pane and select Properties from the popup menu. Select the "Tracking" tab to see which Web site this catalog points to. You can select any existing Web site of your local IIS instance. The Indexing Service finds the physical path of this Web site and adds it to the directories to be indexed. It also looks for all virtual folders configured under this Web site and adds their physical paths to the catalog.
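Because the catalog knows which physical folder belongs to which virtual root, a virtual path can be derived for each indexed file. The sketch below shows the idea only; the site layout in VIRTUAL_ROOTS and the function name virtual_path are hypothetical, and Indexing Server performs this mapping itself rather than through code like this.

```python
from pathlib import PureWindowsPath

# Hypothetical site layout: virtual roots mapped to physical folders.
VIRTUAL_ROOTS = {
    "/": r"C:\Inetpub\wwwroot",
    "/docs": r"D:\Shared\Documentation",
}


def virtual_path(physical, roots=VIRTUAL_ROOTS):
    """Return the virtual path of a physical file, or None if it lies outside every root."""
    file_path = PureWindowsPath(physical)
    best = None
    for vroot, proot in roots.items():
        try:
            relative = file_path.relative_to(PureWindowsPath(proot))
        except ValueError:
            continue  # the file is not below this physical root
        candidate = vroot.rstrip("/") + "/" + relative.as_posix()
        # Prefer the most specific (longest) physical root if several match.
        if best is None or len(proot) > best[0]:
            best = (len(proot), candidate)
    return best[1] if best else None
```

Prefixing the result with the site's host name would give the URL of the file.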
The "Indexing Service" extension ties IIS right into Indexing Server. In the IIS Manager, open the properties dialog box of the Web site or of any virtual folder configured under this Web site. Under the "Home Directory" or "Virtual Directory" tab, you see the option "Index this resource." Clear this option and save the settings to remove the physical path from the appropriate catalog in Indexing Server (this requires that the catalog is started, and it may take a while until the change shows up). Selecting the option again automatically adds the physical folder back to the appropriate catalog. This allows you to control, right from within the IIS Manager, which folders are indexed. The "Index this resource" option also is available for all file system folders shown under a Web site or virtual folder. However, I have not seen this option make any difference for file system folders shown in the IIS Manager.
For this to work on IIS 6, which comes with Windows 2003, you need to make sure that the "Indexing Service" extension is allowed. Open the IIS Manager and select the "Web Service Extensions" item in the left side pane. The right side shows all extensions; by default, the "Indexing Service" extension is prohibited. Select it and enable it with the "Allow" button. Also, make sure that IIS was installed before the Indexing Service, as explained earlier.
Other Administrative Options that Are Available
You can create new catalogs by right-clicking the "Indexing Service" entry in the left side pane and selecting "New | Catalog" from the popup menu. Enter the name of the catalog and the folder where the catalog files are to be stored. You then need to stop and restart the Indexing Service itself so that the catalog files for the new catalog are created. You also can stop, pause, and start individual catalogs by right-clicking the catalog name and selecting the appropriate option under "All Tasks" in the popup menu. You can add or remove folders through the "Directories" entry under the catalog name. If you want to index a Web site, open the properties of the catalog (right-click the catalog name and select Properties from the popup menu) and, under the "Tracking" tab, select the Web site to index.
The Indexing Service also can create an abstract for each indexed file. Open the properties dialog of the catalog, select the "Generation" tab, and uncheck the "Inherit above settings from Service" option. Then, select the "Generate abstracts" option and enter the maximum length of the abstract. You also can set this through the properties of the Indexing Service itself; the setting then applies to every catalog that inherits its settings from the Indexing Service.
Windows also gives administrators control over which folders or files can be indexed by Indexing Server. This allows you to protect sensitive files, for example financial details about the company, so that they never get included in an index and therefore never show up in a search result. Open the file explorer and navigate to the appropriate folder or file. Bring up the properties of the folder or file, click the Advanced button, check or uncheck the "For fast searching, allow Indexing Service to index this file" option, and then save the settings. If you selected a folder, you are asked whether the setting should be applied to all subfolders or just to the selected folder itself. For example, the catalog files themselves all have this setting unchecked so that Indexing Server never tries to index its own catalog files.
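This check box can also be inspected from code. To my understanding, it corresponds to the NTFS file attribute FILE_ATTRIBUTE_NOT_CONTENT_INDEXED (value 0x2000): when the box is unchecked, the attribute is set. The sketch below relies on that assumption; the function names are made up for illustration, and reading the attributes of a real file works only on Windows.

```python
import os
import stat


def allows_indexing(attributes):
    """True when the 'not content indexed' bit is clear in a file attribute value."""
    return not (attributes & stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)


def file_allows_indexing(path):
    """Check a real file; st_file_attributes is only provided on Windows."""
    info = os.stat(path)
    attributes = getattr(info, "st_file_attributes", None)
    if attributes is None:
        raise OSError("file attributes are only available on Windows")
    return allows_indexing(attributes)
```

A script built on file_allows_indexing could, for example, walk a folder tree and report sensitive files that are still open to indexing.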
You also can query a catalog through the Computer Management console. Expand a catalog with the plus sign and you see an entry called "Query the Catalog." Selecting it brings up a Web page with a simple query form. You can perform simple or advanced searches. For a simple search, select the "Standard query (free text)" option, type in a search term, and then click the Search button; any matches are displayed below the search form. For an advanced search, select "Advanced query" and type in a complex search term that can include operators such as AND and OR. You will look at the actual query language used by Indexing Server later in this article. This is a convenient way for administrators to test a catalog.
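When testing with the advanced query form, it can help to assemble the search term programmatically. The helpers below are a minimal sketch with made-up names; they assume that phrases are written in double quotes, that a trailing "*" matches words starting with the given letters, and that parentheses group sub-expressions. The query language itself is covered later in the article, so treat this only as an illustration of combining AND, OR, phrases, and partial word matches.

```python
def phrase(text):
    """Wrap a multi-word phrase in double quotes so it is matched as a unit."""
    return '"' + text.replace('"', "") + '"'


def prefix(word):
    """Append * to match words that start with the given letters."""
    return word + "*"


def all_of(*terms):
    """Combine terms so that every one of them must match."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms):
    """Combine terms so that at least one of them must match."""
    return "(" + " OR ".join(terms) + ")"


# Documents containing the phrase plus either "catalog" or a word starting with "quer".
query = all_of(phrase("indexing service"), any_of("catalog", prefix("quer")))
```

The resulting string can be pasted into the "Advanced query" form to check what a catalog returns.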
Indexing Server is very easy to use and very powerful. The only annoyances in production use are that you need to restart the Indexing Service after creating a new catalog and stop it before deleting a catalog. Sure, these are not everyday tasks, but with many catalogs, potentially for many customers in a hosted environment, this becomes a bit annoying.