Full Text Search: The Key to Better Natural Language Queries for NoSQL in Node.js
My daughter George up from Texas this week with the grandkids, so I didn't get a lot of work done—but, that's it should be. I was having a problem getting a reader registered on my book's support site when she looked over my shoulder. I was in the process of building a Reporting Services report against my Hitchhiker's Guide users' database which that track of registered readers so they can download the errata and premium content and ask questions. She asked if I could break down the data into regions so we could see where my readers live—she's an engineer/mom and thinks of things like that. I thought it was a good idea, so I dove into the problem while the kids were napping (we think).
It did not take long before I discovered that very few of the registered readers divulge their personal information, such as country or state. I don't really blame them—I don't either. It turns out that for some reason the registration application did capture their time zone in virtually every case. Okay, I told myself I just need to create a report that builds a pie chart based on the time zone. That took about 10 minutes from start to finish with SQL Server Reporting Services. Of course, Peter Blackburn and I wrote a 5-star book on Reporting Services, so your mileage may vary. The problem is that the report did not include the name of the time zone. For example, the Pacific Time Zone in the US is designated as UTC-8 or eight hours ahead of Greenwich Mean Time (which is now known as Coordinated Universal Time or UTC). Those reading the report would not know (even generally) which portion of the registered users lived in each of the segments. I needed a way to name each zone in the report's pie chart—or at least in the report legend.
After considerable searching, I hit on a few articles and MSDN help topics that said all a developer needs to do is just pull the time zone data from the Registry—another "just." As a rule of thumb, whenever I find the word "just" in a suggestion, a warning bell goes off in my head. For example, if someone says "If you just take out their leader, we'll be out of there in no time... " you might think again. To me, "just" means that they just don't understand the whole problem. But, I digress...
Extracting Values from the Time Zones Registry Keys
After some Google searching, I found there is a list of time zones in the system Registry. Ordinarily, I don't encourage my readers to dink with the Registry—it's akin to brain surgery in the mirror, but in this case you're only going to read the values from the Registry. These key values are used to populate the dropdown lists used by the Windows applications such as those used to set the system clock. I'm also not accessing the Registry as the report is rendered, but extracting the data into a database table ahead of time. This permits me to use the Business Intelligence tools more easily to fold in the time zone name data corresponding to the UTC offset value (like -8 in my case). Sure, this means when Kazakhstan decides to change their time zone to align with Uzbekistan, I'll have to re-run the application used to rebuild the database table. Yes, another approach would be to extract the data from the system hosting the report-rendering engine (as I discuss in Chapter 14 of my book Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition)), but that would have meant a 301 lesson instead of a 201 lesson. Reporting Services is quite capable of running report-specified DLLs or even report-imbedded Visual Basic source code, but these require a lot more work (IMHO) than they are worth—at least in this case.
So, how does one extract this information from the Registry? Well, part of the solution can be found online. MSDN includes this frail page that simply describes the binary structures used in the Registry key that holds the time zone information in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones. Unfortunately, it provides no code or additional information about the contents of the specified Registry key. I had to dump the key values using Regedit to see what was available to extract. Figure 1 shows the Time Zones key values for Eastern Standard Time:
Figure 1: Regedit showing Time Zones key values
Google also led me to http://www.dotnet247.com/247reference/msgs/24/123755.aspx, which hosts a C# example that illustrates a technique to extract data from the Registry by walking through the aforementioned Time Zone key to get the values. Unfortunately, the example missed the "Description" and "Dlt" subkey values. But, once you see how easy it is to extract any Registry key, all you need to do is just...
The Visual Basic Registry Extractor
Okay, take this one step at a time. Because the example is in C#, I had to start from scratch in Visual Basic.NET so that more developers could use it. Nope, the automatic code translators on the web didn't work (they rarely do for the C# examples I pull down from the web), so I had to translate this myself.
To start with, to extract the TZI values (if you need them) you're going to need to build a couple of structures as described in the MSDN article. This binary structure is used to show how and when the daylight savings time should be applied. Frankly, I didn't need this information for my report, but I thought that while I'm here, I might as well extract it. The Visual Basic Structure blocks map the binary value in TZI to Visual Basic variables, as shown in Figure 2.
Figure 2: Structures to map the TZI binary key value
The data extraction application collects data from the registry keys and saves them to a new database table I've already built (with the SQL Server Management Studio Object Explorer). Yes, I could have done it in Visual Studio with the Server Explorer but I chose the SSMS tools for this task.
Looping Through the Zones
The next step loops through the Registry key and dumps the zone key values. Notice that I first extracted the list of time zones into a string array. Each zone is referenced by name using the OpenSubKey method against the root key in the registry. A list of the time zones is saved to a string array populated with the GetSubKeyNames method.
As shown in Figure 3, the code saves the individual Registry key values to a strongly typed DataTable. This was built using Visual Studio's Data Source wizard that was pointed at the empty TimeZone table in the database. The benefit here is that you don't have to build any of the CRUD to save the data and you can reference the columns with human-readable names.
Figure 3: Saving key values to DataTable columns.
Extracting the Binary TZI Value
As you would imagine, accessing a binary key value in the Registry is a bit more complex. The trick is to map the binary value to a Structure that permits the code to reference each element of the structure with a named variable. This code is shown in Figure 4.
Figure 4: Extracting the TZI binary value.
The last step in the code (as shown in Figure 4) uses the TableAdapter Update method to save the DataTable to the database. Notice that the code generates the GMTOffset column by extracting a piece of the Description key value.
Building the Report
Okay, now that I have a table that has the GMTOffset and a matching Description, I can use Reporting Services to build a report that shows the correlated values in a chart by including query that uses a JOIN on the TimeZone and the RegisteredUsers tables. Figure 5 shows how the report looks so far. Just exactly how this report is built is fodder for another article.
Figure 5: The Users by Time Zone Report
Notice that the GMT -5 time zone (Eastern Standard Time in the US) is marked as "Bogota, Lima...". That's correct. Unfortunately, I don't really know if these users are from North or South America. That's a task I'll need to take up after the grandkids fly home. Right now, they seem to be disassembling something with a chainsaw in the next room.
About the Author
William (Bill) Vaughn is an industry-recognized author, mentor, and subject-matter expert on Visual Studio, SQL Server, Reporting Services, and data access interfaces. He's worked in the computer industry for over thirty-five years—working with mainframe, minicomputer, and personal computer systems as a developer, manager, architect, trainer, marketer, support specialist, writer, and publisher. In 2000, after 14 years at Microsoft, Bill stepped away to work on his books, mentoring, and independent training seminars. He's written seven editions of the Hitchhiker's Guide to Visual Basic and SQL Server and three editions of ADO.NET and ADO Examples and Best Practices for Visual Basic (and C#) Programmers. He and Peter Blackburn also wrote the critically acclaimed Hitchhiker's Guide to SQL Server 2000 Reporting Services.
Bill is a top-rated speaker and frequents conferences all over the world including TechEd, Visual Studio/SQL Connections, DevTeach, and many others. He's also written a wealth of articles for magazines such as MSDN, SQL Server Magazine, Visual Basic Programmer's Journal, .NET Magazine, and many others as well as a regular editorial for Processor magazine. Bill spends considerable time answering questions on the public newsgroups and speaking at INETA user group meetings all over the country and at other speaking venues all over the world. He's available for consulting, mentoring, or custom training. See www.betav.com or www.betav.com/blog/billva for his current schedule and course catalog.