Logging 404s Using a Filter DLL

When maintaining a very large website, scanning the log for 404s can be a tedious task. There are some log readers here and there that simplify this process.

I chose to create a filter DLL that logs only 404 (Not Found) responses. The general logfile never includes the referer URL, which means you still don't know which page to change. By using ISAPI, you can parse all HTTP headers and, if the browser sends it along, log the referer URL as well. That way you know which link was broken on which page.

If you put the logfile in a smart location (one the webserver can serve), it is possible to query the logfile remotely. It is quite feasible to create an Excel file with macros and a button. When the user clicks the button, a connection is made to the webserver and the logfile is loaded. The logfile is then parsed in memory and each field of a logline is placed in its own cell on a data sheet. By sorting the data afterwards, you can collapse duplicate 404s into a single row and increment a counter for every repeat. The result is a sortable, filterable data sheet listing the 404s for a certain time period, where the referer URL gives you an idea of where to look.

Note that some web analyzers can already check HTML links. However, sometimes a link is built with JavaScript, or someone links to your pages from an external site. With this log you can fix the JavaScript on the page in question, or inform the owner of the external site of the mistake by e-mail. (I already found that certain 'image-buffering' scripts are not analyzed by web analyzers. Their Not Found errors do show up when using these DLLs.)
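The deduplicate-and-count step described above is done in Excel in my setup, but the idea is language-independent. Here is a minimal sketch in portable C++, assuming each logline has already been split into a URL and a referer. The type and function names (NotFoundEntry, NotFoundCount, summarize404s) are made up for this illustration and are not part of the DLLs.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One parsed logline. Field names are illustrative only.
struct NotFoundEntry {
    std::string url;      // the URL that returned 404
    std::string referer;  // the page that linked to it (may be empty)
};

// One row of the summary: a unique (url, referer) pair and its hit count.
struct NotFoundCount {
    std::string url;
    std::string referer;
    int hits;
};

// Collapse identical (url, referer) pairs into one row with a counter,
// then sort so the most frequent broken links come first.
std::vector<NotFoundCount> summarize404s(const std::vector<NotFoundEntry>& entries) {
    std::map<std::pair<std::string, std::string>, int> counts;
    for (const NotFoundEntry& e : entries) {
        ++counts[std::make_pair(e.url, e.referer)];
    }

    std::vector<NotFoundCount> rows;
    rows.reserve(counts.size());
    for (const auto& kv : counts) {
        rows.push_back({kv.first.first, kv.first.second, kv.second});
    }
    // stable_sort keeps the map's alphabetical order among rows with equal counts.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const NotFoundCount& a, const NotFoundCount& b) {
                         return a.hits > b.hits;
                     });
    return rows;
}
```

Keying on the (url, referer) pair rather than on the URL alone keeps one row per broken link per referring page, which is what you need in order to know which page to fix.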

Log404.dll

This DLL is the filter DLL. When it initializes, it reads the path to the log from the registry. This path should be written with escaped backslashes, as in a normal C/C++ application. The DLL then fills a security attributes structure. This is necessary if you want to be able to clean the log using an extension DLL. The filter ISAPI DLL does not run under the same user as an extension DLL (called from a form, for instance), so the extension DLL normally cannot write to the logfile (only read it). In this example, I set the permissions for the logfile to give "Everyone" "Full control".

As soon as a file cannot be found on the IIS server, control passes to the OnLog event of this filter DLL. This method first checks whether the reply code to the client is 404 (Not Found). Any other status is not handled and the function returns. It then checks that the path to the logfile is not zero length, and returns if it is. If the function has not returned by then, we are allowed to carry on. Because we open, write and close a file here, we need to enter a critical section for thread safety.
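The sequence of checks above can be sketched in portable C++. This is not the DLL's code: the real filter uses MFC and a Win32 critical section, whereas this sketch substitutes std::mutex, and the class name NotFoundLogger and its members are hypothetical. A std::lock_guard is used here because it releases the lock on every return path, so an early return cannot leave the section locked.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Portable stand-in for the filter's logging step. All names are hypothetical.
class NotFoundLogger {
public:
    explicit NotFoundLogger(std::string logPath) : logPath_(std::move(logPath)) {}

    // Ignore everything but 404, ignore requests while no log path is
    // configured, then do the "file work" under a lock.
    // Returns true if a line was recorded.
    bool onLog(int httpStatus, const std::string& url) {
        if (httpStatus != 404) return false;

        // The guard unlocks in its destructor, whichever return is taken below.
        std::lock_guard<std::mutex> guard(lock_);
        if (logPath_.empty()) return false;

        lines_.push_back(url);  // placeholder for open / write / close
        return true;
    }

    // Number of lines recorded so far.
    std::size_t recorded() const {
        std::lock_guard<std::mutex> guard(lock_);
        return lines_.size();
    }

private:
    mutable std::mutex lock_;
    std::string logPath_;
    std::vector<std::string> lines_;
};
```

With raw EnterCriticalSection / LeaveCriticalSection calls, every early return inside the section needs its own matching leave call. A small guard object in the same spirit as lock_guard removes that burden.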

The 'referer' attribute is not included in the log structure, but if the browser sent it, it is available through the CHttpFilterContext class that is passed to this function. There I query the ServerVariables for all unparsed HTTP headers. Next I look for http-referer: inside these headers to find what I am looking for. The result is returned in a CString. Then the logline is filled using the information from the http-referer header and some information from the HTTP_FILTER_LOG structure. You can add any information you like there.
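Searching the unparsed headers for one header can be sketched as below. This is a portable illustration using std::string rather than CString, and it assumes the headers arrive as "Name: value" lines separated by CRLF or LF. The function names are invented for this example, and the exact header name to search for depends on which server variable you query, so it is passed in as a parameter.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive comparison of two ASCII strings.
static bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// Return the value of header 'name' from a block of "Name: value" lines.
// Returns an empty string if the header is not present.
std::string findHeaderValue(const std::string& rawHeaders, const std::string& name) {
    std::size_t pos = 0;
    while (pos < rawHeaders.size()) {
        std::size_t end = rawHeaders.find('\n', pos);
        if (end == std::string::npos) end = rawHeaders.size();
        std::string line = rawHeaders.substr(pos, end - pos);
        pos = end + 1;

        // Drop the '\r' of a CRLF line ending.
        if (!line.empty() && line.back() == '\r') line.pop_back();

        // Split at the first colon only; the value itself may contain colons.
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        if (!equalsIgnoreCase(line.substr(0, colon), name)) continue;

        std::size_t valueStart = line.find_first_not_of(" \t", colon + 1);
        return valueStart == std::string::npos ? std::string()
                                               : line.substr(valueStart);
    }
    return std::string();
}
```

Comparing the header name case-insensitively matters because header names are not case-sensitive in HTTP, so browsers are free to send "Referer", "referer" or "REFERER".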

After that, it tries to create the file using the "CreateFile" function. Because of the "CREATE_NEW" constant, this function fails if the file already exists. When that happens, I assume that this is the error that occurred and open the file using plain "fopen". If the file did not exist before, it IS created. In both cases the logline is then written to the file. Finally, the critical section is left and we can wait for another 404 logging request.
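For comparison, here is a minimal portable sketch of the "create if missing, otherwise append" behaviour using only the C standard library. It is a different technique from the CreateFile / fopen combination described above: fopen in append mode creates the file when it does not exist, but it cannot apply a security attributes structure at creation time, which the CreateFile route can. The function name appendLogLine is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Append one line to the log, creating the file first if it does not exist.
// Returns false if the file could not be opened or written.
bool appendLogLine(const std::string& path, const std::string& line) {
    // "a" = append; the file is created when it is missing.
    std::FILE* f = std::fopen(path.c_str(), "a");
    if (f == nullptr) return false;

    bool ok = std::fputs(line.c_str(), f) >= 0 && std::fputc('\n', f) != EOF;

    // fclose flushes the buffer, so a failure here also means the write failed.
    if (std::fclose(f) != 0) ok = false;
    return ok;
}
```

In a multi-threaded filter this call would still have to be made inside the critical section, just like the original open, write and close sequence.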

Because I do not want to restart the service when the logfile location changes, I included "/magic_string_reset_log_location". When this URL is requested on the server, the DLL re-reads the path to the logfile from the registry and stores it in the m_LogPath attribute. Any subsequent logging is done to that file. This comes in handy when you really need to keep your IIS service running and cannot afford to shut it down for a few seconds. The disadvantage is that the client connection has already been terminated (we are in the logging phase now), so we cannot tell the client that this special URL was recognized, nor report success or error messages. Checking the URL at an earlier stage is something I would not advise, because you simply do not want to do this parsing at the URL mapping stage. That would go beyond the purpose of this logging DLL, which is only called when logging should occur. (However, it is also called many times for 200 OK responses.)
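The reload-on-magic-URL idea can be sketched in portable C++ as follows. The registry read is replaced by an injected function so that the sketch runs anywhere. The class name LogPathHolder, its members and the readPath callback are all hypothetical and only mirror the role of m_LogPath described above.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Holds the current log path and re-reads it when the magic URL is seen.
// 'readPath' stands in for the registry lookup done by the real DLL.
class LogPathHolder {
public:
    LogPathHolder(std::string magicUrl, std::function<std::string()> readPath)
        : magicUrl_(std::move(magicUrl)),
          readPath_(std::move(readPath)),
          path_(readPath_()) {}

    // Returns true if 'url' was the magic URL and the path was re-read.
    bool handleUrl(const std::string& url) {
        if (url != magicUrl_) return false;
        std::string fresh = readPath_();           // do the slow read outside the lock
        std::lock_guard<std::mutex> guard(lock_);  // swap the value in under the lock
        path_ = std::move(fresh);
        return true;
    }

    // Returns a copy of the current path, so callers never see a half-updated value.
    std::string path() const {
        std::lock_guard<std::mutex> guard(lock_);
        return path_;
    }

private:
    const std::string magicUrl_;
    std::function<std::string()> readPath_;
    mutable std::mutex lock_;
    std::string path_;
};
```

Because other threads may be logging while the path changes, the stored path is only read and written under the lock.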

404Maintenance.dll

This DLL was written to support maintenance of the log in a more graceful way. The magic-string solution from the previous section is insufficient here, because this time we want a report on whether our function has succeeded or not. This DLL also gives us better means of performing maintenance on the log, and we can restrict access to it, or change that access frequently, using standard IIS security functionality.

This DLL goes into a directory with authentication, so only some people will be able to call the functions contained here. Consult IIS documentation for how to set this up.

When this DLL is called from a web browser, we need to specify the function we want to use. Since at this time only the "GET" method is used, using this DLL from a form is highly insecure. Anyone who has access to the machine where the web browser was run can read the "Username" and "Password" from the URL history list. It is therefore highly recommended to use the "POST" method instead, but I could not find a way to use it at short notice (this is the first ISAPI application I have written).

The "SetLogLocation" function resets the value in the registry to point to the new location, as entered on the web form. Once this function has succeeded, it returns a page with the magic string set up as a link. The user can then click that link to make the new location current for the filter DLL (Log404.dll).

The "LogClean" function will read the path to the logfile from the registry and will try to open it for writing. The file will be truncated if it succeeds. The file is then immediately closed and a success message returned to the client.
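The truncation step itself is small. Here is a minimal sketch using the C standard library, assuming the path has already been read from the registry. The function name cleanLog is made up for this example, and the success or failure message sent to the client is reduced to a boolean.

```cpp
#include <cstdio>
#include <string>

// Truncate the logfile to zero length.
// Returns true on success, false if the file could not be opened for writing.
bool cleanLog(const std::string& path) {
    // "w" opens for writing and truncates an existing file.
    // Note that it also creates the file when it does not exist yet.
    std::FILE* f = std::fopen(path.c_str(), "w");
    if (f == nullptr) return false;
    return std::fclose(f) == 0;
}
```

If the extension DLL's user lacks write permission on the logfile, the fopen call fails and the function returns false, which is the case the "Everyone" permission on the logfile is meant to avoid.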

The validity check on the UserName and Password does not use any system calls. Instead, these parameters are compiled in. Consult the header file for these values. Currently, in this example, these are both set to "Test". A better solution should be

Maintenance.htm

This file is a sample form which you can use for testing. "MfcISAPICommand" is a special hidden field that allows you to use "GET" operations on specific functions inside the ISAPI extension DLL. Many browsers are not consistent when "ACTION" is specified with a question mark in between. Consult the MSDN library, chapter "ISAPI Extension DLL", for more details on this topic.

Disclaimer

This DLL is my first project on this topic. The software may contain bugs and it will have security-problems. Use this software at your own risk, as I will not accept any responsibility for production loss where this software has been used as-is.

Downloads

Download source code for filtering functions - 9 Kb
Download source code for log file maintenance - 12 Kb


Comments

  • LeaveCriticalSection missing on return in CLog404Filter::OnLog

    Posted by Legacy on 12/13/2001 12:00am

    Originally posted by: winmike

    ...after strcmp(...,..."/magic...")
