Logging 404s Using a Filter DLL

When you maintain a very large website, combing the log for 404s can be a tedious
task. Some log readers exist that simplify this process.

I chose to create a filter DLL that logs only 404 (not found) responses. The general log
file does not contain the referer URL, which means you still don’t know which page to
change. By using ISAPI, you can parse all HTTP headers and, if the browser sends it along,
log the referer URL as well. That way you know which link was broken on which page.

If you put the log file in a location the web server can serve, you can query it
remotely. For example, you can create an Excel file with macros and a button. When the
user clicks the button, the macro connects to the web server and loads the log file. The
log is then parsed in memory and every field of a log line is put into its own cell on a
separate data sheet. By sorting the data afterwards, you can collapse duplicate 404s into
a single row and increment a counter for each repeat. The result is a sortable, filterable
data sheet listing the 404s in a certain time period, where the referer URL tells you
where to look.

Note that certain web analyzers can already check HTML links. However, sometimes a link
is generated by JavaScript, or someone links to your pages from an external site. With
this log you can fix the JavaScript on the page in question, or e-mail the owner of the
external site about the mistake. (I already found that certain ‘image-buffering’ scripts
are not analyzed by web analyzers. Their not-founds do show up in the log written by this
DLL.)
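
The deduplicate-and-count step described for the Excel macro can also be sketched in C++. This is only an illustration, not part of the downloadable source: the tab-separated line layout (timestamp, URL, referer) and the names Broken404, SplitTabs and Summarize404s are assumptions, since the article does not fix a log format.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One aggregated row: a missing URL, the page that linked to it, and a hit count.
struct Broken404 {
    std::string url;
    std::string referer;
    int hits;
};

// Splits a line on TAB characters; empty fields are kept.
std::vector<std::string> SplitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;) {
        const std::size_t tab = line.find('\t', start);
        if (tab == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, tab - start));
        start = tab + 1;
    }
}

// Assumed (hypothetical) layout: timestamp <TAB> url <TAB> referer.
// Lines with fewer than three fields are skipped.
std::vector<Broken404> Summarize404s(const std::string& logText) {
    std::map<std::pair<std::string, std::string>, int> counts;
    std::istringstream lines(logText);
    std::string line;
    while (std::getline(lines, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        const std::vector<std::string> fields = SplitTabs(line);
        if (fields.size() < 3) continue;
        ++counts[{fields[1], fields[2]}];  // same URL + same referer = same 404
    }
    std::vector<Broken404> rows;
    for (const auto& entry : counts)
        rows.push_back({entry.first.first, entry.first.second, entry.second});
    // Most frequent first; ties keep the map's alphabetical order.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const Broken404& a, const Broken404& b) { return a.hits > b.hits; });
    return rows;
}
```

Keying on the URL and referer together keeps one row per broken link per page, which matches the goal of knowing which page to edit.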

Log404.dll

This DLL is the filter DLL. When it initializes, it reads the path to the log from the
registry. This path should be written with escaped backslashes, as in a normal C/C++
application. The DLL then fills a security attributes structure. This is necessary if you
want to be able to clean the log using an extension DLL. The filter DLL does not run under
the same user as an extension DLL (called from a form, for instance), so the extension DLL
normally cannot write to the log file (only read it). In this example, I set the
permissions on the log file to give "Everyone" access with "Full control" permissions.

As soon as a file cannot be found on the IIS server, IIS calls the OnLog event handler
of this filter DLL. This method first checks whether the reply code sent to the client is
404 (not found); any other event is not handled and the function returns. It then checks
that the path to the log file is not zero length. If the function has not returned by
then, we can carry on. Because we open, write and close a file here, we need to enter a
critical section for thread safety.
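
The order of these checks can be sketched in portable C++. This is not the code from the download: NotFoundLogger is a hypothetical stand-in for the filter class, and std::mutex replaces the Win32 critical section so that the sketch also compiles outside Windows.

```cpp
#include <fstream>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the filter object described above.
class NotFoundLogger {
public:
    explicit NotFoundLogger(std::string logPath) : m_LogPath(std::move(logPath)) {}

    // Returns true only when a line was actually written.
    bool OnLog(int httpStatus, const std::string& logLine) {
        if (httpStatus != 404) return false;   // only "not found" replies are of interest
        if (m_LogPath.empty()) return false;   // no log location configured

        // From here on only one thread at a time may touch the file.
        std::lock_guard<std::mutex> guard(m_Lock);
        std::ofstream out(m_LogPath, std::ios::app);
        if (!out) return false;
        out << logLine << '\n';
        return static_cast<bool>(out);
    }

private:
    std::string m_LogPath;
    std::mutex m_Lock;   // plays the role of the critical section
};
```

The cheap checks come first, so requests that are not 404s never take the lock.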

The referer is not part of the log data itself, but if the browser sent it, it is
available through the CHttpFilterContext object passed to this function. I query the
server variables for all unparsed HTTP headers and then look for http-referer: inside
these headers. The result is returned in a CString. The log line is then filled with the
value of the http-referer header and some fields from the HTTP_FILTER_LOG structure. You
can add any information you like there.
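
Searching a header block for the referer might look like the sketch below. It assumes the block holds one "Name: value" pair per line and takes the header name as a parameter, because the exact spelling depends on which server variable is queried (for example a raw "Referer" line versus an "HTTP_REFERER" entry). FindHeaderValue is a hypothetical helper and uses std::string rather than the CString mentioned above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive check that `line` starts with `prefix`.
static bool StartsWithNoCase(const std::string& line, const std::string& prefix) {
    if (line.size() < prefix.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(line[i])) !=
            std::tolower(static_cast<unsigned char>(prefix[i]))) {
            return false;
        }
    }
    return true;
}

// Returns the value of header `name` from a block of "Name: value" lines,
// or an empty string when the header is absent.
std::string FindHeaderValue(const std::string& headers, const std::string& name) {
    const std::string prefix = name + ":";
    std::size_t start = 0;
    while (start < headers.size()) {
        std::size_t end = headers.find('\n', start);
        if (end == std::string::npos) end = headers.size();
        std::string line = headers.substr(start, end - start);
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR of CRLF
        if (StartsWithNoCase(line, prefix)) {
            // Skip optional whitespace between the colon and the value.
            const std::size_t value = line.find_first_not_of(" \t", prefix.size());
            return value == std::string::npos ? std::string() : line.substr(value);
        }
        start = end + 1;
    }
    return std::string();
}
```

An empty result simply means the browser did not send a referer, for instance when the URL was typed in by hand.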

After that, it tries to create the file using the "CreateFile" function. Because of the
"CREATE_NEW" constant, this function fails if the file already exists. When it fails, I
assume that this is the error that occurred and open the file using a plain "fopen" call
instead. If the file did not exist before, the file IS created, and the log line is
written to that file. Finally, the critical section is left and we can wait for the next
404 logging request.
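
The same create-first, append-otherwise pattern can be sketched without the Win32 API. Here the C11 "wx" mode of fopen (create, fail if the file exists) stands in for CREATE_NEW. Whether a particular C runtime supports that mode is an assumption, and WriteLogLine and WriteResult are hypothetical names.

```cpp
#include <cstdio>
#include <string>

enum class WriteResult { Failed, Created, Appended };

// Writes one line to the log, creating the file on first use.
WriteResult WriteLogLine(const std::string& path, const std::string& line) {
    WriteResult result = WriteResult::Created;
    // "wx": create for writing, fail if the file already exists.
    std::FILE* file = std::fopen(path.c_str(), "wx");
    if (file == nullptr) {
        // Like the article, assume the failure means "already exists" and append.
        file = std::fopen(path.c_str(), "a");
        result = WriteResult::Appended;
    }
    if (file == nullptr) return WriteResult::Failed;

    const bool written = std::fputs(line.c_str(), file) >= 0 &&
                         std::fputc('\n', file) != EOF;
    const bool closed = std::fclose(file) == 0;
    return (written && closed) ? result : WriteResult::Failed;
}
```

Opening in append mode alone would also create a missing file. The two-step version is mainly useful when the file needs special treatment at creation time, such as the security attributes mentioned earlier.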

Because I do not want to restart the service when the log file location changes, I
included "/magic_string_reset_log_location". When this URL is requested on this server,
the DLL re-reads the path to the log file from the registry and stores it in the
m_LogPath attribute. Any subsequent logging goes to that file. This comes in handy when
you really need to keep your IIS service running and cannot afford to shut down for a few
seconds. The disadvantage is that, since the client connection has already been
terminated (we’re in the logging phase now), we cannot tell the client that this special
URL was recognized, nor report success or error messages. I would not advise parsing the
URL at an earlier stage, however, because you simply do not want to parse at the
URL-mapping stage. That would go beyond the purpose of this logging DLL, which is only
called when logging should occur. (However, 200 OKs get logged many times as well…
hmmm.)
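
A minimal sketch of the magic-string check follows. The registry lookup is replaced by a caller-supplied function so the example stays portable. LogLocation, HandleUrl and the reader callback are hypothetical names; only the URL and the idea of an m_LogPath member come from the text above.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

class LogLocation {
public:
    // `reader` is a hypothetical stand-in for the registry lookup.
    explicit LogLocation(std::function<std::string()> reader)
        : m_Reader(std::move(reader)), m_LogPath(m_Reader()) {}

    // Call for every logged URL; returns true when the path was re-read.
    bool HandleUrl(const std::string& url) {
        if (url != "/magic_string_reset_log_location") return false;
        std::string fresh = m_Reader();             // re-read the configured path
        std::lock_guard<std::mutex> guard(m_Lock);  // writers may be using the old path
        m_LogPath = std::move(fresh);
        return true;
    }

    std::string Path() const {
        std::lock_guard<std::mutex> guard(m_Lock);
        return m_LogPath;
    }

private:
    std::function<std::string()> m_Reader;  // declared first: used to initialize m_LogPath
    std::string m_LogPath;
    mutable std::mutex m_Lock;
};
```

The return value is only useful inside the filter itself. As noted above, nothing can be reported back to the client at this stage.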

404Maintenance.dll

This DLL was written to support maintenance of the log in a more elegant way. The
magic-string solution from the previous section is insufficient here, because this time
we want a report on whether our function succeeded. This DLL also gives us better means
of performing maintenance on the log, and we can set special access rights for its use,
or change them frequently, using standard IIS security functionality.

This DLL goes into a directory with authentication, so only some people will be able to
call the functions contained here. Consult IIS documentation for how to set this up.

When this DLL is called from a web browser, we need to specify the function we want to
use. Since at this time only the "GET" method is used, using this DLL from a form is
highly insecure. Anyone who has access to the machine where the web browser ran can read
the "Username" and "Password" from the URL history list. It is therefore highly
recommended to use the "POST" method instead, but I could not find a way to do so at
short notice (and this is the first ISAPI application I wrote).

The "SetLogLocation" function sets the value in the registry to the new location entered
on the web form. Once it succeeds, it returns a page with the magic string set up as a
link. The user can then click that link to make the new location current for the filter
DLL (Log404.dll).

The "LogClean" function will read the path to the logfile from the registry
and will try to open it for writing. The file will be truncated if it succeeds. The file
is then immediately closed and a success message returned to the client.
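
Truncating the log comes down to opening it for writing and closing it again. The sketch below assumes the path has already been read from the registry and is passed in. CleanLog and the message texts are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Empties the log at `path` and returns a message for the response page.
std::string CleanLog(const std::string& path) {
    if (path.empty()) return "Error: no log location configured.";
    // Mode "w" truncates an existing file (and creates a missing one).
    std::FILE* file = std::fopen(path.c_str(), "w");
    if (file == nullptr) return "Error: could not open " + path + " for writing.";
    std::fclose(file);
    return "Log " + path + " has been cleaned.";
}
```

If the filter happens to be writing at the same moment, the two DLLs are not synchronized by this sketch, so a log line could be lost around the truncation.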

The validity check on the UserName and Password does not use any system calls. Instead,
these parameters are compiled in. Consult the header file for these values. Currently, in
this example, these are both set to "Test". A better solution should be

Maintenance.htm

This file is a sample form that you can use for testing. "MfcISAPICommand" is a special
hidden field that allows you to use "GET" operations on specific functions inside the
ISAPI extension DLL. Many browsers behave inconsistently when "ACTION" is specified with
a question mark in between. Consult the MSDN library, chapter "ISAPI Extension DLL", for
more details on this topic.
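
To illustrate what the hidden field does, the sketch below pulls the function name out of a query string of the kind such a form submits with "GET". MFC performs this dispatch itself, so the code is only a simplified picture: ParseQuery and CommandName are hypothetical helpers, and percent-decoding is left out.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Splits "a=1&b=2" into name/value pairs. No percent-decoding is done.
std::map<std::string, std::string> ParseQuery(const std::string& query) {
    std::map<std::string, std::string> fields;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        if (!pair.empty()) {
            const std::size_t eq = pair.find('=');
            if (eq == std::string::npos) {
                fields[pair] = "";                               // name without a value
            } else {
                fields[pair.substr(0, eq)] = pair.substr(eq + 1);
            }
        }
        start = end + 1;
    }
    return fields;
}

// Returns the function name carried in the hidden field, or "" if absent.
std::string CommandName(const std::string& query) {
    const std::map<std::string, std::string> fields = ParseQuery(query);
    const auto it = fields.find("MfcISAPICommand");
    return it == fields.end() ? std::string() : it->second;
}
```

For a query string such as "MfcISAPICommand=LogClean&UserName=Test&Password=Test", CommandName returns "LogClean". It also shows why the credentials end up readable in the browser's URL history.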

Disclaimer

This DLL is my first project on this topic. The software may contain bugs and it will
have security problems. Use this software at your own risk; I do not accept any
responsibility for production loss where this software has been used as-is.

Downloads

Download source code for filtering functions – 9 Kb

Download source code for log file maintenance – 12 Kb
