There are many reasons to log what users do throughout the shell. You may need logging for an application’s sake, to reconstruct the steps that led to a crash or blocking error, or to monitor the user’s activity. The key tool for this is a fairly simple, and definitely underestimated, COM interface called IShellExecuteHook. Write a COM object that exposes this interface, register it properly, and you get a real chance to control, and even influence, how things happen throughout the Windows shell. IShellExecuteHook shell extensions are supported on Windows 98 and Windows 2000. On Windows 95 and Windows NT 4.0 they are supported only if the Active Desktop Shell Update has also been installed. Notice that on those two systems you can install the Active Desktop only through the IE 4.01 setup, and only if you check the corresponding option, which is unchecked by default.
A COM object that implements IShellExecuteHook is a shell extension that lets you hook any call made to either ShellExecute or ShellExecuteEx. These two API functions have a couple of very interesting features. They can accept a document file name in place of an executable and are capable of retrieving the name of the associated executable. In addition, they enforce any system policy set by the administrator. For example, if you don’t want users with a certain set of rights to be able to run a certain group of applications, you just define a system policy. ShellExecute and ShellExecuteEx will check for this before creating the new process. (The same capability isn’t provided by, for example, CreateProcess or WinExec.) The flow of execution for the functions can be outlined like this:
- Obtain the name of the executable to run. The name can be passed as an actual parameter or, if you passed a file name as the argument, retrieved from the registry.
- Check the executable name against the Run system policies
- Invoke all the registered IShellExecuteHook extensions
- When all of them have agreed to continue, spawn a new process and return
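The four steps above can be sketched as a small, portable model. This is only an illustration under stated assumptions, not the shell’s actual code: the names SimulateShellExecute, Outcome, Hook, kS_OK and kS_FALSE are hypothetical, the policy list is reduced to a plain list of executable names, and step 1 (resolving a document to its executable) is assumed to have happened already.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the COM result codes (same numeric values).
constexpr long kS_OK = 0;     // the hook completed the operation itself
constexpr long kS_FALSE = 1;  // the hook wants default processing to continue

enum class Outcome { BlockedByPolicy, HandledByHook, HookError, ProcessSpawned };

// A hook is modeled as a callable that receives the resolved executable name.
using Hook = std::function<long(const std::string&)>;

// Hypothetical model of the flow outlined above.
Outcome SimulateShellExecute(const std::string& exe,
                             const std::vector<std::string>& restricted,
                             const std::vector<Hook>& hooks)
{
    // Step 2: check the executable name against the Run policies.
    for (const auto& name : restricted)
        if (name == exe) return Outcome::BlockedByPolicy;

    // Step 3: invoke the registered hooks until one takes over or fails.
    for (const auto& hook : hooks) {
        const long hr = hook(exe);
        if (hr == kS_OK) return Outcome::HandledByHook;
        if (hr != kS_FALSE) return Outcome::HookError;
    }

    // Step 4: every hook agreed to continue, so the process is created.
    return Outcome::ProcessSpawned;
}
```

Calling SimulateShellExecute with an empty policy list and a single hook that returns kS_FALSE yields Outcome::ProcessSpawned, which is the situation a pure logging hook aims for.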
The Windows shell relies heavily on ShellExecute and ShellExecuteEx to start almost all the operations Explorer performs. One of these functions sits behind common operations such as double-clicking a folder item, exploring the content of a folder, printing or editing a document, viewing the Properties dialog box, or selecting an item from a context menu. In addition, the Start menu’s Run dialog box executes programs through ShellExecuteEx, and so does the start.exe command-line utility available from the MS-DOS prompt. In a nutshell, almost everything the user does through the shell is caught by a shellexecute hook. Furthermore, the hook also detects shell operations that originate programmatically.
Writing a Shell Logging Utility
To write a shell logging utility, let’s start by creating a minimal COM object through ATL. All you need is an in-process COM object, so feel free to use raw C++ code as well. Make sure the ATL component implements the IShellExecuteHook interface by inheriting from the IShellExecuteHookImpl class, defined in a hand-crafted IShellExecuteHookImpl.h header file:
#include <AtlCom.h>
#include <ShlObj.h>

class ATL_NO_VTABLE IShellExecuteHookImpl : public IShellExecuteHook
{
public:
    // IUnknown
    //
    STDMETHOD(QueryInterface)(REFIID riid, void** ppvObject) = 0;
    _ATL_DEBUG_ADDREF_RELEASE_IMPL( IShellExecuteHookImpl )

    // IShellExecuteHook
    //
    STDMETHOD(Execute)(LPSHELLEXECUTEINFO lpsei) { return S_FALSE; }
};
In Listing 1 you can see the complete source code for the ATL object’s header file. Notice that it exposes the IShellExecuteHook interface through its COM map
BEGIN_COM_MAP(CLogger)
    COM_INTERFACE_ENTRY(ILogger)
    COM_INTERFACE_ENTRY(IShellExecuteHook)
END_COM_MAP()
and overrides the Execute method.
// ILogger
public:
STDMETHOD(Execute)(LPSHELLEXECUTEINFO lpsei);
In Listing 2 you can see the code necessary to detect and log any operation that goes through the shell. Once the shell extension has been properly registered (more on this later on), your Execute method is called automatically each time ShellExecute or ShellExecuteEx is invoked by any piece of software running on the machine.
Execute can do whatever you need it to, including stopping the whole operation.
Execute receives as its argument a pointer to a structure called SHELLEXECUTEINFO:
typedef struct _SHELLEXECUTEINFO{
DWORD cbSize;
ULONG fMask;
HWND hwnd;
LPCTSTR lpVerb;
LPCTSTR lpFile;
LPCTSTR lpParameters;
LPCTSTR lpDirectory;
int nCmdShow;
HINSTANCE hInstApp;
// Optional members
LPVOID lpIDList;
LPCTSTR lpClass;
HKEY hkeyClass;
DWORD dwHotKey;
HANDLE hIcon;
HANDLE hProcess;
} SHELLEXECUTEINFO, FAR *LPSHELLEXECUTEINFO;
The operation that is about to take place is identified by the lpVerb member (the matching ShellExecute parameter is named lpOperation), while lpFile contains the name of the file being processed. An operation is nothing more than a string, often referred to as a verb in shell-related documentation. It can be a string like open, edit, explore, properties, find, or the name of any other context menu item.
Execute knows which executable the shell is going to start and which operation it is going to perform. Sometimes, though, the lpFile member doesn’t contain an executable file name. This happens when ShellExecute receives a document file name. When you double-click a .txt file, you’re actually asking the shell to run a text file. Windows reacts by calling ShellExecute with the text file name in place of the executable name. Internally, ShellExecute looks up the .exe file associated with the .txt class of documents and launches it. FindExecutable is the API function that returns this name.
In brief, Execute knows about the executable you want to run, its arguments, working directory and activation flags. However, with the sole exception of nCmdShow, there’s nothing that you can modify on-the-fly to alter the default behavior. The nCmdShow argument defines how the main window of the called application should open. It can be minimized, maximized, normal, and the like. This is the only parameter you can silently modify. Unfortunately, any other change you make to the executable name or any of the command-line arguments will be ignored.
Given this, there are essentially two things that you can do from within Execute. You can simply observe what’s going on, or you can carry out the operation yourself. When you have finished pre-processing the operation, you can either ask the shell to continue with its default tasks or tell it that no further processing is needed. This depends upon the return value of Execute. Let’s analyze the first option in more detail.
A shell logging utility just needs to gather all the information it wants to store and then update the log file.
TCHAR szText[BUFSIZE];
wsprintf(szText, _T("%s: %s at %s\r\n"), lpsei->lpVerb, lpsei->lpFile, szTime);
The above code snippet produces a log file filled with lines like this:
open: C:\WINNT\explorer.exe at 5:41:18 PM
You can make this information more complete by adding the user name or any other data you may need.
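As a rough sketch of such a richer log line, the portable helper below appends a user name and guards against a missing verb, since callers may leave lpVerb unset to request the default verb. The function name FormatLogLine, the "(default)" and "(no file)" placeholders, and the " by " suffix are all assumptions made for illustration; in a real hook the user name might come from an API such as GetUserName.

```cpp
#include <string>

// Hypothetical helper: builds one log line from values the hook already has.
// A null or empty verb is logged as "(default)"; a null file as "(no file)".
std::string FormatLogLine(const char* verb, const char* file,
                          const std::string& time, const std::string& user)
{
    std::string line;
    line += (verb && *verb) ? verb : "(default)";
    line += ": ";
    line += file ? file : "(no file)";
    line += " at ";
    line += time;
    if (!user.empty()) {   // the user name is optional
        line += " by ";
        line += user;
    }
    line += "\r\n";        // same line terminator as the snippet above
    return line;
}
```

The helper only formats text, so it can be exercised outside the shell before being wired into Execute.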
Execute’s return value is interpreted as the answer to the following question: has the operation been completed? If the answer is S_FALSE, the shell goes on with the default behavior associated with that command. If the answer is S_OK, the shell understands that the extension successfully completed the operation and does nothing more. If the return value is an error code, or any other unknown value, the user gets an error message.
What does this mean to you? Taking advantage of this feature, you can block the execution of certain executables with certain parameters, or when run by certain users. In this way, you can build your own policy manager. For example, the following code snippet prevents any .bmp file from being opened, no matter how the registry is configured.
TCHAR szFile[MAX_PATH];
lstrcpyn(szFile, lpsei->lpFile, MAX_PATH);  // don't modify the caller's string
_tcslwr(szFile);
if (_tcsstr(szFile, _T(".bmp")))
    return S_OK;
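A substring search matches .bmp anywhere in the path, so a file such as notes.bmp.txt would be blocked as well. A stricter check compares only the trailing extension. The portable helper below is a minimal sketch of that idea; the name HasExtension is hypothetical, and it works on plain narrow strings rather than on TCHAR buffers.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when `path` ends with `ext` (for example ".bmp"),
// compared case-insensitively. Neither input string is modified.
bool HasExtension(const std::string& path, const std::string& ext)
{
    if (ext.empty() || path.size() < ext.size()) return false;
    const std::size_t start = path.size() - ext.size();
    for (std::size_t i = 0; i < ext.size(); ++i) {
        const unsigned char a = static_cast<unsigned char>(path[start + i]);
        const unsigned char b = static_cast<unsigned char>(ext[i]);
        if (std::tolower(a) != std::tolower(b)) return false;
    }
    return true;
}
```

Inside Execute, a call along the lines of HasExtension(file, ".bmp") would then decide whether to return S_OK or S_FALSE.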
If you don’t like how the shell processes a certain file or resolves a certain operation, you can always do it yourself. Suppose you want to make sure that a certain program uses a command line that is completely invisible to users in the icon’s Properties dialog box. What you can do is run the program yourself with the right command line and tell the shell that no further processing is required after you return.
In this case, though, you cannot use ShellExecute or ShellExecuteEx to run your executable. If you do, you’ll enter an infinite loop, since any new call to either function triggers a new call to the hook. You should run your executable with the modified command line through CreateProcess or WinExec. Neither of these functions is detected by the shellexecute hook. Once the process has been spawned, you just return S_OK to the shell and exit.
Registering a ShellExecute Hook
For a shellexecute hook to work, a proper set of registry keys is needed. The following source code is the ATL script that you should add to the RGS file of your ATL project.
HKLM
{
  SOFTWARE
  {
    Microsoft
    {
      Windows
      {
        CurrentVersion
        {
          Explorer
          {
            ShellExecuteHooks
            {
              val {CLSID} = s 'Description'
            }
          }
        }
      }
    }
  }
}
Replace the {CLSID} string with the CLSID of your ATL object and set an explanatory description for the component. The description isn’t strictly necessary, but I recommend adding one for the sake of clarity. CLSIDs aren’t very readable, and once you have several shell hooks installed, identifying the right one can quickly become a real problem.
The above script creates a tree under HKEY_LOCAL_MACHINE and lists all the available hooks under the ShellExecuteHooks key, below the Explorer node. Immediately after Windows is installed, no hooks are registered. However, at any moment in time you can have any number of shellexecute hooks. When there are several, it’s not clear what logic the shell uses to order them. My guess is that the shell loads the shell extensions based on their timestamp, starting with the oldest. You obtain the timestamp of a registry key using the RegEnumKeyEx API function. What matters, though, is that when an extension returns S_OK, the shell stops calling the others. In pseudo code, this is what the shell does:
For Each extension In ShellExecuteHooks
    Load extension
    Invoke Execute()
    If return_value Is S_OK Then Exit
Next
As you can easily imagine, this can be a problem if you absolutely need your extension to run. Let’s consider a possible workaround. During the setup of your shell extension, record and then delete all the existing hook registrations. Next, register your own and rewrite all the others. In this way, in the shell’s eyes your extension looks like the oldest one.
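The ordering part of this workaround can be sketched without touching the registry. The helper below is purely illustrative and rests on the guess stated earlier, namely that the shell favors the oldest registration. The name PlanRegistrationOrder is hypothetical, and the actual registry reads and writes are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical planner: returns the order in which hook CLSIDs would be
// written back after the existing registrations have been deleted, so that
// `mine` is written first and therefore carries the oldest timestamp.
std::vector<std::string> PlanRegistrationOrder(
    const std::vector<std::string>& existing, const std::string& mine)
{
    std::vector<std::string> order;
    order.push_back(mine);            // our hook goes first
    for (const auto& clsid : existing)
        if (clsid != mine)            // skip a stale copy of our own entry
            order.push_back(clsid);   // the others keep their relative order
    return order;
}
```

A setup program would enumerate the current hooks, pass them to this function, and then write the entries back in the returned order.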
ShellExecuteHook Warnings
Keep in mind that a shellexecute hook is not a perfect tool for monitoring the system in all cases. It simply allows you to inject your custom code into the body of ShellExecute and ShellExecuteEx. While these two functions are widely used within the shell, they can’t guarantee that you keep all possible Explorer events under control. Want an example?
Judging from what I stated earlier, and based on what the documentation suggests, you may think that all the commands available from a context menu execute through ShellExecute. Well, this is untrue at least for the Properties context menu item. If you write a shellexecute hook with the precise goal of intercepting and redirecting the Properties dialog box, you’ll soon find that you’re out of luck. You would expect the lpVerb field of SHELLEXECUTEINFO to be set to properties when you right-click any file in any folder and select Properties. Unfortunately, when this happens the extension is never triggered, and consequently there’s no way to hook the event.
Quite surprisingly, if you write a little piece of C++ code and call ShellExecute or ShellExecuteEx specifying ‘properties’ as the verb name,
ShellExecute(NULL, "properties", "foo.txt", NULL, NULL, SW_SHOW);
the hook is invoked as expected. What does this mean? The shell itself isn’t using ShellExecute to show the Properties dialog box.
Is there a lesson to learn from this? The IShellExecuteHook interface is useful, and really helpful in several cases, but I strongly recommend that you make sure it really solves your problem before you suggest its use to your boss. I have solved apparently impossible issues with it, but it has also left me looking bad in front of clients.