Full-Text Searching with IFilters

Introduction

The nineties were all about creating and sharing information. Today's challenge is finding the information you need when you need it. We all know the pain of not being able to find the piece of information that would help with the task at hand. As a result, we either spend a lot of time searching or, if the search fails, spend a lot of time working through the task by trial and error. Microsoft products such as Indexing Server, Exchange Server, SharePoint Server, SQL Server, and Windows Desktop Search provide powerful full text search capabilities. All of these products share a common building block for their full text searching: IFilters.

All Microsoft full text search engines index the actual content and then let you search against these indexes. The indexing process determines the file type of the content and then invokes the associated IFilter. The COM object that implements the IFilter encapsulates the knowledge of the content structure and performs the actual indexing of the content. If a third-party ISV has proprietary content that should be searchable by these Microsoft products, the ISV needs to create an appropriate IFilter COM object. As soon as this IFilter is registered, it can be used by all Microsoft full text search engines. This greatly simplifies the process for ISVs to make their content searchable with all the different Microsoft products.

How IFilters Get Associated with the Different File/Content Types

All searchable content has an associated "file extension." Indexing Server, SharePoint, and Windows Desktop Search index and search files on the file system. Exchange Server, SharePoint, and SQL Server can hold embedded files, which again have a file extension. All other fields in SQL Server are assumed to be in text format and are therefore treated as having the ".txt" extension. Messages in Exchange Server are also treated as having the ".txt" extension. The Registry, therefore, is the natural place to associate IFilters with each file extension. The indexing process first determines the file extension of the content. Then, it performs the following steps:

  • Step 1: Determine whether there is a PersistentHandler associated with the file extension. This can be found in the Registry under HKEY_LOCAL_MACHINE\Software\Classes\FileExtension; for example, HKLM\Software\Classes\.htm. The default value of the sub key called PersistentHandler gives you the GUID of the PersistentHandler. If present, skip to Step 4; otherwise, continue with Step 2.
  • Step 2: Determine the CLSID associated with the file extension. Take the default value that is associated with the extension; for example, "htmlfile" for the key HKLM\Software\Classes\.htm. Next, search for that entry (for example, "htmlfile") under HKLM\Software\Classes. The default value of the sub key CLSID contains the CLSID associated with that file extension.
  • Step 3: Next, search for that CLSID under HKLM\Software\Classes\CLSID. The default value of the sub key called PersistentHandler gives you the GUID of the PersistentHandler.
  • Step 4: Search for that GUID under HKLM\Software\Classes\CLSID. Under it, you find a PersistentAddinsRegistered sub key that always has a {89BCB740-6119-101A-BCB7-00DD010655AF} sub key (this is the GUID of the IFilter interface). The default value of this key holds the GUID of the IFilter component for this PersistentHandler.
  • Step 5: Search for this GUID once more under HKLM\Software\Classes\CLSID. Under its key, you find the InProcServer32 sub key and its default value contains the name of the DLL that provides the IFilter interface to use for this extension. For example, for the .htm and .html extension, this is the nlhtml.dll DLL.
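
The five lookup steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not how any Microsoft product implements the lookup: the read_default callable, the key helper, the SAMPLE_REGISTRY dictionary, and the placeholder GUID strings in it are invented for illustration. Only the IFilter interface GUID, the "htmlfile" class name, and nlhtml.dll come from the text. On Windows, a reader backed by the standard winreg module could be passed in instead of the dictionary.

```python
from typing import Callable, Dict, Optional

CLASSES = r"HKLM\Software\Classes"
IFILTER_IID = "{89BCB740-6119-101A-BCB7-00DD010655AF}"  # IFilter interface GUID

# A reader returns the default value of a registry key, or None if it is missing.
Reader = Callable[[str], Optional[str]]


def key(*parts: str) -> str:
    """Join registry path segments with backslashes."""
    return "\\".join(parts)


def find_ifilter_dll(extension: str, read_default: Reader) -> Optional[str]:
    """Follow Steps 1 to 5 and return the IFilter DLL name, or None."""
    # Step 1: look for a PersistentHandler directly under the extension.
    handler = read_default(key(CLASSES, extension, "PersistentHandler"))
    if handler is None:
        # Step 2: extension -> class name (e.g. "htmlfile") -> CLSID.
        class_name = read_default(key(CLASSES, extension))
        if class_name is None:
            return None
        clsid = read_default(key(CLASSES, class_name, "CLSID"))
        if clsid is None:
            return None
        # Step 3: CLSID -> PersistentHandler.
        handler = read_default(key(CLASSES, "CLSID", clsid, "PersistentHandler"))
        if handler is None:
            return None
    # Step 4: PersistentHandler -> GUID of the IFilter component.
    filter_guid = read_default(
        key(CLASSES, "CLSID", handler, "PersistentAddinsRegistered", IFILTER_IID)
    )
    if filter_guid is None:
        return None
    # Step 5: IFilter component -> DLL name.
    return read_default(key(CLASSES, "CLSID", filter_guid, "InProcServer32"))


# Hypothetical registry contents; the braces hold placeholders, not real GUIDs.
SAMPLE_REGISTRY: Dict[str, str] = {
    # .htm has a PersistentHandler directly under the extension (Step 1).
    key(CLASSES, ".htm", "PersistentHandler"): "{HANDLER}",
    # .html is resolved through its class name (Steps 2 and 3).
    key(CLASSES, ".html"): "htmlfile",
    key(CLASSES, "htmlfile", "CLSID"): "{HTML-CLASS}",
    key(CLASSES, "CLSID", "{HTML-CLASS}", "PersistentHandler"): "{HANDLER}",
    key(CLASSES, "CLSID", "{HANDLER}", "PersistentAddinsRegistered", IFILTER_IID): "{FILTER}",
    key(CLASSES, "CLSID", "{FILTER}", "InProcServer32"): "nlhtml.dll",
}

if __name__ == "__main__":
    for ext in (".htm", ".html", ".chm"):
        print(ext, find_ifilter_dll(ext, SAMPLE_REGISTRY.get))
```

Passing the reader in as a parameter keeps the lookup logic separate from the Registry access, so the same function can be tried against a dictionary first and against the real Registry later.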

The following article provides a more detailed description, with examples, of how the IFilter DLL is found. For more information about the PersistentHandler, refer to this article.

How to Create Your Own IFilter Component

ISVs that register their own file extensions with proprietary content structures need to provide their own IFilter components so these file types can be searched by Microsoft products. The Platform SDK describes in detail the IFilter interface. The Platform SDK also contains three sample IFilter implementations.

The Three IFilter Components Used in this Article

The "Channel9 Wiki" lists the IFilter components that are present out of the box. Note that a number of software packages install additional IFilter components. The wiki also provides links to a number of additional IFilter components. Windows Desktop Search has its own site for additional IFilter components: http://addins.msn.com. The rest of this article explains how full text search works in Indexing Server, SQL Server, Windows Desktop Search, and SharePoint. It also documents any additional settings you need to make for new IFilter components to work. The three IFilter components used are:

  • CHM file extension: The CHM extension is used by compiled Windows help files. Out of the box, CHM files have no PersistentHandler associated so they are not searchable. The installer places and registers one DLL: CHMIFilter.dll.
  • ZIP file extension: ZIP files are also not searchable out of the box because they, too, have no PersistentHandler associated. This installer also places and registers one DLL: ztvArchFil.dll. The ZIP IFilter made available by Citeknet worked fine with Indexing Server and Windows Desktop Search, but I could not get it to work with SQL Server. It, too, places and registers one DLL: ZIPIFilter.dll.
  • XML file extension: The XML file extension has by default a PersistentHandler associated that works fine with Indexing Server and Windows Desktop Search. But, the default IFilter did not work with SQL Server. This XML IFilter component works with all three. First, you need to extract the file, then copy the XMLFilter.dll to the windows\system32 folder, and then register it.

You can also download a Filter Explorer from Citeknet. This explorer walks the Registry and will list all the IFilter components available. It also can show all the file extensions that have no IFilter associated, meaning they are not searchable. This can be very useful in understanding what content is searchable or not. It also simulates the slightly different behavior of the different Microsoft products as to how each will read the Registry entries to find available IFilter components.

Full Text Search with Indexing Server

The following article describes how you can use Indexing Server to index and search files on the file system. Indexing Server can perform an auto registration of filters if they are added to the DLLsToRegister registry value (under HKLM\System\CurrentControlSet\Control\ContentIndex). When the Indexing service starts up, it calls DllRegisterServer for each DLL listed. In Windows 2003 and XP, this is a multi string value so you can edit it through the Windows Registry editor. In Windows 2000, this is a binary value. Some filters add themselves to this Registry value during registration; for example, the ZIP IFilter.

A newly registered IFilter takes effect only after the Indexing service has been restarted or an individual Indexing catalog has been restarted (in which case it takes effect only for that catalog). Unregistering an IFilter likewise takes effect only after the Indexing service or an individual Indexing catalog has been restarted. To remove already indexed content, you need to start a full rescan.

Full Text Search with SQL Server

The SQL Server full text search capability is available only when the "Full-Text Search" component (under Server Components) has been installed. The component is included if you chose a typical or complete installation. Otherwise, you need to launch setup again and add the missing component. This adds the "Microsoft Search" service, which performs the actual indexing and full text searching. This service needs to be running for the full text search to work. Any time SQL Server encounters the CONTAINS or FREETEXT predicate or the CONTAINSTABLE or FREETEXTTABLE function, it calls out to the "Microsoft Search" service to perform the actual full text search against the full text catalog. The following article provides an overview of the Microsoft Search service.

How do I create a full text search catalog for my SQL Server database?

Open the Enterprise Manager for SQL Server and navigate to your SQL Server database (in the left side navigation pane, select a SQL Server Group; next, select a registered SQL Server, and finally select under the "Databases" entry your SQL Server database). You see a number of entries under your database; one of them is called "Full Text Catalogs." This shows you any full text search catalog defined for your database. Right-click on the right side pane or the "Full Text Catalogs" item and select "New Full-Text Catalog" from the popup menu. Give the full text search catalog a name and select where the catalog files get placed which is by default "c:\Program Files\Microsoft SQL Server\MSSQL\FTData". The "Schedules" tab allows you to set one or more schedules when the full text search catalog gets updated; for example, perform a full population every Sunday and an incremental population every 10 minutes.

Click on the "New Catalog Schedule" button to define a new schedule. Give the schedule a name and select whether it is enabled or disabled through the "Enabled" check box. Under the job type, select whether this schedule performs a full population or incremental population. A full population rebuilds the search catalog and an incremental population indexes changes only since the last population. Finally, you select the schedule, which can be either at startup time of the SQL Server Agent, a specific date and time, or a recurring schedule. If you select "Recurring," click on the "Change" button to define the recurrence schedule; for example, every 10 minutes or every day at 1:00 AM from July 1st to July 31st.

How do I enable full text searching on one or more fields of a table?

To perform a full text search on a table, you need to create a full text index on the table. You can only define one full text index per table. Select the "Table" entry in the left side navigation pane to get a list of all tables defined in the database. Right-click on a table and select "Full-Text Index Table | Define Full-Text Indexing on a Table" from the popup menu. This will bring up a wizard that allows you to select which fields should be indexed. Tables that are indexed need to have a "unique single column index" that does not allow Nulls; for example, a primary key on an ID field that does not allow Nulls or an index on an ID field that has a unique constraint and does not allow Nulls. As the index gets built, each index entry points back to the table rows it applies to (through the unique identifier). SQL Server delegates any free text search as part of a query to the Microsoft Search service. The Search service performs the actual free text search and then returns the list of rows to include in the result-set. This is done through the unique identifier associated with each index entry.

The "Full-Text Indexing" wizard guides you through the process of creating a full text index. First, you select the "unique index" to use from a list of all unique indexes present on the table. Next, you select the fields to index by checking the checkbox in front of each one. The list shows only fields that can be indexed, which are fields of the following data types: char, nchar, varchar, nvarchar, text, ntext, and image. All data types except the image data type are treated as text fields; therefore, the TXT IFilter is used during the indexing process. Fields of the image data type contain file images. When you select a field of the image data type, you also select under "document type column" which field contains the file type of the file stored in the image field. The Search service looks at this field during the indexing process to determine the stored file type and which IFilter to apply. For example, you may have a field called FileImage of the data type image and a field called FileType of the data type nvarchar. When creating records in the table, you would store the file in the FileImage field and the file type in the FileType field; for example, "zip".
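
Preparing the two values for such a record can be sketched in Python. This is an illustrative sketch only: the function names prepare_document_row and demo are hypothetical, and storing the extension in lower case without the leading dot simply follows the "zip" example above, so check what your SQL Server version expects in the document type column. The actual INSERT is left out because it depends on the database driver you use.

```python
import os
import tempfile
from typing import Tuple


def prepare_document_row(path: str) -> Tuple[bytes, str]:
    """Return (file bytes, file type) for the FileImage and FileType fields."""
    with open(path, "rb") as handle:
        content = handle.read()
    # Extension without the leading dot, lower-cased, as in the "zip" example.
    file_type = os.path.splitext(path)[1].lstrip(".").lower()
    if not file_type:
        # Without a file type the Search service cannot pick an IFilter.
        raise ValueError("file has no extension: " + path)
    return content, file_type


def demo() -> Tuple[bytes, str]:
    """Write a tiny throwaway file and prepare it for insertion."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "Archive.ZIP")
        with open(path, "wb") as handle:
            handle.write(b"PK\x05\x06")
        return prepare_document_row(path)


if __name__ == "__main__":
    content, file_type = demo()
    print(len(content), file_type)
```

Rejecting files without an extension up front avoids rows that would be stored but could never be indexed.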

After selecting all the fields to index, you select the full text catalog to which this index belongs. You can select from the list of existing catalogs or create a new one. Next, you can add a new catalog schedule or table index schedule. Any catalog schedule you add applies to all table indexes in the full text catalog. A table index schedule applies only to that table index, which allows you to create different schedules for different tables. Finishing the wizard applies all the changes, meaning it adds the catalog schedules, creates the table index, and creates the table index schedules.

How do I manage existing full-text catalogs?

SQL Server provides a number of options to manage your full-text catalogs. The "Full-Text Catalogs" entry will list all defined catalogs. In the right side pane, right-click on the catalog to manage. The popup menu will show you a number of options:

  • Rebuild Catalog: Rebuilds the catalog, which generates a new, empty catalog
  • Start Full Population: Starts a full population, which effectively rebuilds the catalog
  • Start Incremental Population: Indexes all changes since the last population
  • Stop Population: Stops a running population
  • Schedules: Brings up the list of defined schedules and allows you to change the existing schedules or create new ones
  • Delete: Allows you to delete the catalog with all its table indexes
  • Properties: Shows the properties of the catalog; the "Tables" tab shows all the tables that have an index and are part of this catalog; the "Schedules" tab shows all the schedules defined.

How do I manage table indexes?

You can find out through the full-text catalogs which tables have a table index defined. You then find the appropriate table and right-click on it. From the "Full-Text Index Table" popup menu, you can select from a number of options:

  • Edit Full-Text Indexing: Brings up the "Full-Text Indexing" wizard that allows you to edit the full text index
  • Remove Full-Text Indexing from a Table: Allows you to remove a table index
  • Start Full Population: Starts a full population, which effectively rebuilds the table index
  • Start Incremental Population: Indexes all changes since the last population
  • Stop Population: Stops a running population
  • Schedules: Brings up the list of defined schedules and allows you to change the existing schedules or create new ones.

The attached sample database

The attached SQL Server database named "FullTextSearchSample" illustrates how you can store files in a database and search the file contents through the SQL Server full-text search engine. It contains a table called DocumentLibrary that has a field DocumentImage of type image and a field DocumentType of type nvarchar. The attached sample application "Insert Files into Database" provides a Windows forms application to insert files into a table. You enter the name of the database server and the user credentials to use. Next, you enter the name of the table to insert the file into, the name of the field that receives the file contents, and the name of the field that receives the file extension. Finally, you select the file to insert and click "Insert" to create a new record in the table and insert the file and file type. You also can achieve this by using the TextCopy utility provided by SQL Server. It is located at "c:\Program Files\Microsoft SQL Server\MSSQL\Binn". Here is an example:

TextCopy /S servername /U username /P password
         /D FullTextSearchSample /T DocumentLibrary
         /C DocumentImage /I /F filename /W "where ID = 8" /z

For a complete description of all command line arguments, click here. This utility can only update an existing record, and the field must not be NULL; otherwise, you will get the following error: "Text or image pointer and timestamp retrieval failed". After you add some files to the sample database, make sure that the index gets updated. You can start the update manually by right-clicking on the "DocumentLibrary" full-text catalog and selecting "Start Full Population" from the popup menu. Next, open the "SQL Query Analyzer," log on, select the "FullTextSearchSample" database, and run the following query:

SELECT * FROM DocumentLibrary
         WHERE CONTAINS( DocumentImage, 'Enterprise-Minds' )

This will query for all records with files in the DocumentImage field that contain the text "Enterprise-Minds." The sample database comes pre-populated with some files; therefore, it will return two records.
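
The CONTAINSTABLE function mentioned earlier returns the matching keys together with a rank, so results can be ordered by relevance. The Python sketch below only assembles such a query as a string; it is an illustration, not part of the sample application. The helper names are hypothetical, and it assumes that the ID column seen in the TextCopy example is the unique key of the full text index. Table and column names are expected to be trusted identifiers, not user input.

```python
def quote_literal(text: str) -> str:
    """Wrap text in single quotes, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def phrase_condition(phrase: str) -> str:
    """Build a search condition that matches the phrase as a whole."""
    if '"' in phrase:
        # Embedded double quotes are deliberately not handled in this sketch.
        raise ValueError("double quotes are not supported here")
    return '"' + phrase + '"'


def ranked_query(table: str, column: str, key_column: str, phrase: str) -> str:
    """Return a query that joins the table to its CONTAINSTABLE matches."""
    condition = quote_literal(phrase_condition(phrase))
    return (
        f"SELECT d.*, k.[RANK] FROM {table} AS d "
        f"INNER JOIN CONTAINSTABLE({table}, {column}, {condition}) AS k "
        f"ON d.{key_column} = k.[KEY] ORDER BY k.[RANK] DESC"
    )


if __name__ == "__main__":
    print(ranked_query("DocumentLibrary", "DocumentImage", "ID", "Enterprise-Minds"))
```

The generated statement can be pasted into the SQL Query Analyzer in the same way as the CONTAINS query above.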

A newly registered IFilter takes effect without the need to restart services. Unregistering an IFilter also takes effect without restarting any services. To remove already indexed content, you need to start a full population.

Full Text Search with Windows Desktop Search

Windows Desktop Search is Microsoft's approach to enabling individual users to index and search their personal content. You can download the MSN Search toolbar with Windows Desktop Search from here. Similar products are Google Desktop and Yahoo! Desktop Search. This article covers Windows Desktop Search because it also utilizes IFilter components. This article does not rate or compare any of the desktop search tools mentioned above. Installation of the MSN Search toolbar with Windows Desktop Search is very straightforward. Windows Desktop Search also can search your personal e-mails; this requires Outlook 2000 or higher. If Outlook 2000 or higher is not present, the installation asks whether you want to proceed. You can then still search the content of files, just not e-mails. Confirm the message with Ok to proceed (if the message appears). This starts the installation and registration of all the files, followed by the "MSN Search Toolbar Customization Wizard" that allows you to configure the MSN Search toolbar with Windows Desktop Search.

You can check the "Use the default settings and close this wizard" option if you want to use all default options. Otherwise, proceed with the Next button. There are three MSN Search toolbars you can install: the MSN Search toolbar displayed in Internet Explorer and Windows Explorer, the MSN Search toolbar displayed in Outlook (grayed out if Outlook 2000 or higher is missing), and the MSN Search Deskbar, which is shown in the Windows taskbar. Select which search toolbars you want to enable. Next, you can choose whether you want to participate in the "Customer Experience Improvement Program" and whether you want to make msn.com your default Internet search. You also have control over which content gets indexed and over the indexing process in general. Next, you are presented with the following options:

  • Automatically start Windows Desktop Search: This is recommended because it will automatically start Windows Desktop Search when you log in (the install adds Windows Desktop Search to the Startup group). Otherwise, you need to start it manually through the Windows start menu. Under "All Programs," you find a Windows Desktop Search icon. As soon as Windows Desktop Search is started, you will see an icon in the icon tray of the Windows taskbar.
  • What to index: Here you specify what drives, folders, and e-mail folders to index. You have the option to index all emails and hard-disks (this indexes only the files and folders you have access to), all emails and files stored under your "My Documents" (which is all your personal content and is the default option) or you can specify which folders and e-mails to index. If you choose the last option, click the Browse button to see a list of all hard drives, folders, and e-mail folders. Select the ones you want to index (this shows again only drives and folders you have access to).
  • Index e-mail attachments: Check this checkbox if you also want to index file attachments (which is recommended and the default).
  • Index new items while on battery power: This is useful to preserve power while on battery and is again the recommended and default selection.

This finishes the configuration wizard and also launches (if selected) your browser to http://addins.msn.com. This site shows the available Windows Desktop Search add-ins; for example, additional IFilter components. The installation automatically starts Windows Desktop Search, which shows an icon in the icon tray of the Windows taskbar. This starts two processes, WindowsSearch.exe and WindowsSearchIndexer.exe. A third process, WindowsSearchFilter.exe, is started while indexing is in progress. You can right-click on the icon in the icon tray of the Windows taskbar to see a list of available options:

  • Snooze Indexing 1 Hour: Allows you to pause the indexing process for an hour
  • Index Now: Windows Desktop Search only indexes when the user is inactive. This allows optimal responsiveness of Windows while the user is active and indexes content while the user is inactive. Through this option, you can force an indexing to happen now.
  • Indexing Status: Shows a Windows form in the lower right corner with the current indexing status. You can snooze the indexing process for up to a day or also force an indexing now.
  • Desktop Search options: Brings up the "MSN Search Toolbar Options" dialog that provides a number of settings:
    • General: Here, you can set which country/region will be searched if you invoke a Web search from the Windows Desktop Search (which by default uses msn.com). You also can specify any other search engine to use, for example http://www.google.com/search?hl=en&q=$w to use Google.
    • Deskbar: Allows you to set a number of options for the Windows Deskbar. You can show or hide the Deskbar from here, set the key combination how to activate it (same as clicking into the search box on the Deskbar), whether to search and display results while you type, show the go button, and so forth.
    • Desktop Search: Provides a number of options for what to index and how to run the indexing. These are the same options you were presented with during the installation. Under the Advanced sub-item, you can specify which file extensions are handled as text files (for example, files with the extension .c or .cs) and which file extensions should be ignored (for example, files with the extension .386 or .bak). It also allows you to change the location where the index files are placed, which is by default under "c:\Documents and Settings\Username\Local Settings\Application Data\Microsoft". Under it, you find a folder named "Desktop Search" that contains configuration information, temporary folders, log files, and so on. Placing it under a user's profile ensures that each user stores and sees only their own information. By checking the "Prioritize Indexing" option, you can tell Windows Desktop Search to index also while the user is active.
    • Toolbar: Contains a number of options such as which icons to show in the toolbars, to enable tabbed browsing, to enable the popup blocker, turn on search result highlighting (which highlights the search term in the search result), and the like.
  • Search Now: Brings up the "Windows Desktop Search Result" window that allows users to perform searches for local or Web content. It can also be opened through the "Windows Desktop Search" icon under All Programs of the Windows start menu.
  • Exit: Closes Windows Desktop Search which then also stops all indexing.
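
The custom search engine setting described under "General" uses a URL template in which $w stands for the search term. A small Python sketch shows how such a template could be expanded. The function name expand_search_url is hypothetical, and the URL encoding used here is an assumption: the article does not document how Windows Desktop Search itself encodes the term.

```python
from urllib.parse import quote_plus


def expand_search_url(template: str, term: str) -> str:
    """Replace the $w placeholder with the URL-encoded search term."""
    return template.replace("$w", quote_plus(term))


if __name__ == "__main__":
    template = "http://www.google.com/search?hl=en&q=$w"
    print(expand_search_url(template, "full text search"))
```

Encoding the term matters because characters such as spaces, "#", or "&" would otherwise change the meaning of the resulting URL.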

How do I use the Windows Deskbar to search local content?

You can show or hide the Windows Deskbar by right-clicking on the Windows taskbar and selecting "Toolbars | MSN Search Deskbar" from the popup menu. This shows a toolbar with a search box and go button. Activate the search box by clicking on it or by pressing Ctrl+Alt+M (the default key combination). This brings up the "Windows Desktop Results" popup window, which shows matching results as you type your search phrase. Clicking the go button beside the search box or the "Search Desktop" button at the bottom of the "Windows Desktop Results" popup window brings up the "Windows Desktop Search Result" window. The search phrase is passed along, and the window already shows the result of the local search. Clicking on the Web button at the bottom of the "Windows Desktop Results" popup window invokes a Web search and launches a browser showing the Web search result. By default, this uses the msn.com search or whatever search engine has been configured.

Searching from within the "Windows Desktop Search Result" window

As explained above, the "Windows Desktop Search Result" window can be launched through the Windows start menu or the Windows Deskbar (which already passes along a search term and shows the result). At the top of the window, you see the search bar where you can enter a search phrase and perform a local search by clicking on the "Search Desktop" button or perform a Web search by clicking on the "Web" button. The Web search again will use the configured search engine. Local searches will show the result in the lower left pane. You can filter the result set by clicking on the filter buttons (on the filter bar, which is right underneath the search); for example, Everything, Documents, Music, IM Chats, and so forth. This allows you to filter by content type. The "Other" button shows you a list of all available filter options to choose from. In the lower right pane, you see a preview of the selected file in the search result. The second last icon from the right in the filter bar allows you to switch the result set from large icons to small icons and to select where to place the preview pane (right, bottom, or off).

A newly registered IFilter takes effect without the need to restart services. Unregistering an IFilter also takes effect without restarting any services. To remove already indexed content, you need to rebuild your index. Open the "Desktop Search Options," select the "Desktop Search" item on the left side, and then select the "Rebuild Index" button. This will close the Windows Desktop Search, rebuild the catalog, start the Windows Desktop Search again, and then start the indexing process. This may take a few minutes.

Full text search with Windows SharePoint Services

Both Windows SharePoint Services and SharePoint Portal Server utilize IFilter components for searching. The following article explains how to install Windows SharePoint Services so that searching is available. The search settings in Windows SharePoint Services are limited. Open the SharePoint Central Administration (through the Administrative Tools of the Windows start menu) and select the item "Configure full-text search" under the section "Component Configuration". The only option is to enable or disable the full-text search capability, which applies to all SharePoint portals. When full-text searching is enabled, you can search the contents of any document uploaded into any "Document Library." Windows SharePoint Services automatically refreshes the index when new documents are uploaded. Normally, it takes a few minutes until the index has been updated and the new document is included in the search result.

A newly registered IFilter takes effect only after you disable and then re-enable the full-text search option, which effectively rebuilds the catalog. It may then take a few minutes until the index has been rebuilt. Unregistering an IFilter also takes effect only after disabling and then re-enabling the full-text search option.

Full Text Search with SharePoint Portal Server

SharePoint Portal Server provides much greater control over the indexing process. SharePoint Portal Server registers its own "Microsoft SharePointPS Search" Windows service for full-text searching. Similar to "Microsoft Search" for SQL Server, it is responsible for the actual indexing and search process. To configure the search in SharePoint Portal Server, open your portal and then click on "Site Settings." This brings up the "Site Settings" page that has a section called "Search Settings and Indexed Content." This section provides a number of settings for how portal content and non-portal content gets indexed. Non-portal content can be a Web page or complete Web site, a public folder from Exchange Server, or a file share. So, SharePoint Portal Server allows you to index and search a much wider array of sources.

Select the "Configure search and indexing" item under the "Search Settings and Indexed Content" section. The "Configure Search and Indexing" page has three sections. The "General Content Settings and Indexing Status" section shows you an overall overview of the index. You can see how many portal or non-portal documents have been indexed, the last time the portal or non-portal index was updated, as well as any errors or warnings from the last index update. Click on the number of errors or warnings shown to bring up the "Gatherer log details" page with the detailed list of the occurred errors or warnings. SharePoint has four different modes of index updates:

  • Full: This mode indexes all content and performs a full update of the index
  • Incremental: This mode indexes changes since the last update
  • Incremental (inclusive): This mode indexes changes including Web Part pages and pages since the last update
  • Adaptive: This mode indexes content that is likely to have changed based on the site history

You can find a detailed description of each index update mode in the following article. Under the overall index overview, you find a list of links as follows:

  • Refresh: Refreshes this page with the latest information.
  • Start portal content update: Shows the four modes of index updates available for portal content. Select one to perform that index update; for example, click on Full to perform a full index update. If an index update is in progress, you only see Stop that allows you to stop the index update.
  • Start non portal content update: Provides the same index update modes for non-portal content.
  • Manage Search schedules: Brings up the "Manage Search Schedules" page that allows you to manage the index update schedule. Out of the box, there are three schedules defined: an incremental update of non-portal content once a day, an inclusive incremental update of portal content once a day, and an incremental update of portal content every ten minutes. You can add new schedules and remove or update existing schedules.
  • View errors and warnings on: Allows you to bring up the "Gatherer log details" page to view all errors and warnings for either portal or non-portal content.
  • Manage Search scopes: Brings up the "Manage Search Scopes" page that is used to manage search scopes. Out of the box, there is only one scope with the name "All Sources" defined. You can create new search scopes and remove or edit existing search scopes. Search scopes allow you to limit the search to only one or more areas. When users perform a search, they can select from a list of available search scopes.
  • Include file types: Brings up the "Specify File Types to Include" page. This allows you to specify file types that will be included in searches. For new IFilter components to take effect, you need to register the component and then add the file type to this list; for example, register the "ztvArchFil.dll" component and then add the "zip" extension to the include file types list. If you unregister IFilter components, you should also remove the file type from this list. You also have the option to keep the IFilter registered but only remove the file type from this list. All changes take effect when a full index update is performed.
    New file types will have no image associated out of the box. Copy the image file to associate with the new file type to the folder "c:\Program Files\Common Files\Microsoft Shared\web server extensions\60\template\images". Then, edit the "c:\Program Files\Common Files\Microsoft Shared\web server extensions\60\template\xml\docicon.xml" file. Under the "ByExtension" node, add a new mapping between the file type and the image to use for that file type; for example, "<Mapping Key="zip" Value="zip.gif"/>". You need to restart IIS for the image to take effect.
    Note that SharePoint also allows you to block file types. If you want to add a file type that is currently blocked, you need to unblock it first. Open "SharePoint Central Administration" (through the Administrative Tools of the Windows start menu) and select the item "Manage blocked file types" under the section "Security Configuration". Remove the file type you want to unblock or add the file type you want to block. Changes take effect immediately.
  • Add content source: Shows the "Add Content Source" wizard, which allows you to add new non-portal data sources to search. These can be public folders on Exchange Server, file shares, Web pages, or entire Web sites. For example, to add a new file share, select the "File Share" option and click Next. Then, enter the file share in the Address field (for example, \\Enterprise-Minds\MyShare), add a description, and select whether to include sub folders. Finally, click the Finish button, which shows a summary of the new data source. From here, you can create a new update schedule or start a full index update.
  • Enable advanced search administration mode: The advanced mode allows you to manage multiple search indexes. Switching to this mode shows a new section called "Content Indexes" on the "Configure Search and Indexing" page. By default, you see a Portal_Content and a Non_Portal_Content index. The "Manage content indexes" link brings up the "Manage Content Indexes" page, where you can create new indexes and edit or remove existing ones. This mode also removes the "Start portal content update," "Start non portal content update," and "View errors and warnings on" links from the "Configure Search and Indexing" page. You now perform these actions through the "Manage Content Indexes" page: click one of the indexes to bring up its popup menu and select the action from there.
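
The docicon.xml change described under "Include file types" can also be scripted. The following Python sketch adds (or updates) a Mapping entry under the "ByExtension" node. It works on an XML string rather than on the real file. The sample document, including its "DocIcons" root element and the "icdoc.gif" entry, is an assumption made for illustration; only the "ByExtension" node and the Mapping Key/Value attributes come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for docicon.xml (assumed layout).
SAMPLE_DOCICON = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
</DocIcons>"""


def add_icon_mapping(xml_text, extension, image):
    """Return xml_text with a Mapping for `extension` under ByExtension.

    An existing mapping for the same extension (compared case-insensitively)
    is updated in place; otherwise a new Mapping element is appended.
    """
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no ByExtension node found")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            mapping.set("Value", image)
            break
    else:
        # No existing entry for this extension: append a new one.
        ET.SubElement(by_ext, "Mapping", {"Key": extension, "Value": image})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_icon_mapping(SAMPLE_DOCICON, "zip", "zip.gif"))
```

To apply this to a real installation, you would read the file, pass its text to add_icon_mapping, write the result back, and restart IIS as described above. Back up the original file first.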

The section "Other Content Sources" on the "Configure Search and Indexing" page allows you to add new content sources. By default, you see a "This Portal" and a "People" data source. The "Manage content sources" link brings up the "Manage Content Sources" page, which allows you to add new data sources (the same as the "Add content source" wizard described above) and edit or remove existing ones. Through the section "Site Directory" on the "Configure Search and Indexing" page, you can manage how SharePoint crawls Web sites. Please refer to the SharePoint documentation because this topic is beyond the scope of this article.

Using Windows Desktop Search as a Front End for the Other Search Applications

The Windows Deskbar allows you to create shortcuts that can point to URLs, files, or applications. The search text box of the Windows Deskbar can be used to execute, create, and update these shortcuts. To create or update a shortcut, start with the @ character followed by the name of the shortcut, a comma, and then the URL or file name to associate with the shortcut. If the shortcut points to an application, type an equal sign followed by the application name after the comma. Here is an example:

@ixs,=iexplore.exe file:///c:/windows/help/ciquery.htm#machine
     =MachineName, catalog=CatalogName

This creates a shortcut with the name ixs that launches Internet Explorer and passes along a local HTML page to load, including some parameters. The ciquery.htm page is provided by Indexing Server and provides a front end to search an indexing catalog. The machine argument names the Indexing Server to query and the catalog argument names the indexing catalog to query. The page shows the "Indexing Query Form," which allows a user to enter search criteria, execute the query, and then view the matching results. To launch the "Indexing Query Form," type the shortcut name in the search box of the Windows Deskbar and press the Enter key. This launches the browser and loads the form. On Windows XP and Windows Server 2003, the browser shows a warning about active content. Click the information bar at the top and select "Allow Blocked Content" from the popup menu. This allows the "Indexing Query Form" to load properly. The form works only with Internet Explorer.
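
To make the shortcut syntax concrete, here is a minimal Python sketch that splits a shortcut definition into its parts, following the rules described above: an @ character, the shortcut name, a comma, and then either a URL or file name, or an equal sign followed by an application. The Shortcut type and the parse_shortcut function are hypothetical names used for illustration; this is not how Windows Desktop Search itself is implemented.

```python
from typing import NamedTuple


class Shortcut(NamedTuple):
    name: str             # text typed into the Deskbar to run the shortcut
    target: str           # URL, file name, or application command line
    is_application: bool  # True when the target was prefixed with "="


def parse_shortcut(definition: str) -> Shortcut:
    """Parse a definition of the form '@name,target' or '@name,=application'."""
    if not definition.startswith("@"):
        raise ValueError("a shortcut definition starts with '@'")
    # Split on the first comma only, so commas inside the target survive.
    name, sep, target = definition[1:].partition(",")
    name, target = name.strip(), target.strip()
    if not sep or not name or not target:
        raise ValueError("expected '@name,target'")
    is_application = target.startswith("=")
    if is_application:
        target = target[1:].strip()
    return Shortcut(name, target, is_application)


if __name__ == "__main__":
    print(parse_shortcut("@ixs,=iexplore.exe file:///c:/windows/help/ciquery.htm"))
```

Splitting on the first comma only matters here because the target itself may contain commas, as the ciquery.htm parameters in the example do.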

You can also create a shortcut to search SharePoint Portal Server from the Windows Deskbar. Create a shortcut pointing it to the search.aspx page of your SharePoint Portal Server. Please note that this does not work with Windows SharePoint Services. Here is an example:

@sp,http://<SharePoint Server>/search.aspx?k=$w&s=All%20sources

This example creates a shortcut called "sp" that calls the search.aspx page and passes along two arguments. The "k" argument is the search term; it is set to $w, which Windows Deskbar replaces with the text the user types after the shortcut name. The "s" argument is the search scope, which in this example is always set to "All sources." The user can then enter "sp <search term>" into the Windows Deskbar, which opens a browser showing the results of the search performed by SharePoint Portal Server; for example, "sp Vancouver". This also works with multiple search terms separated by spaces; for example, "sp City of Vancouver".
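
The $w substitution can be pictured with a short Python sketch. It replaces $w in a shortcut target with the URL-encoded user input. The expand_shortcut function and the host name "portal" are hypothetical, and the percent-encoding of spaces is an assumption; the exact encoding Windows Deskbar applies is not covered here.

```python
from urllib.parse import quote


def expand_shortcut(template: str, user_input: str) -> str:
    """Replace $w in the shortcut target with the URL-encoded user input."""
    # safe="" also encodes "/" so the input cannot alter the URL structure.
    return template.replace("$w", quote(user_input, safe=""))


if __name__ == "__main__":
    # "portal" stands in for the name of your SharePoint Portal Server.
    template = "http://portal/search.aspx?k=$w&s=All%20sources"
    print(expand_shortcut(template, "City of Vancouver"))
```

Under these assumptions, typing "sp City of Vancouver" would open a URL whose "k" argument carries all three words as one search expression.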

Summary

With Indexing Server, Microsoft released its first full-text search engine. Since then, it has built on that very same concept and provided a common way to extend the full-text search engines of Indexing Server, SQL Server, SharePoint, Exchange Server, and Windows Desktop Search. This makes it very easy for ISVs to make their proprietary content searchable in most Microsoft products. IFilters are widely known in the community and you can find IFilter components for most file types. The Platform SDK has detailed examples and also provides a number of tools for testing IFilter components. If you have comments or questions about this article or this topic, please contact me at klaus_salchner@hotmail.com. I want to hear if you learned something new.

About the author

Klaus Salchner has worked for 14 years in the industry, nine years in Europe and another five years in North America. As a Senior Enterprise Architect with solid experience in enterprise software development, Klaus spends considerable time on performance, scalability, availability, maintainability, globalization/localization, and security. The projects he has been involved in are used by more than a million users in 50 countries on three continents.

Klaus calls Vancouver, British Columbia his home at the moment. His next big goal is running the New York marathon in 2006. Klaus is interested in guest speaking opportunities and in writing for .NET magazines or Web sites. He can be contacted at klaus_salchner@hotmail.com or http://www.enterprise-minds.com.

Enterprise application architecture and design consulting services are available. If you want to hear more about it, contact me! Involve me in your projects and I will make a difference for you. Contact me if you have an idea for an article or research project. Also, contact me if you want to co-author an article or join future research projects!


