Porting Legacy Browser Helper and IE Extension Objects to .NET


Picture 1 – Peek at Document Object Model from Browser Helper Object

Environment: .NET Framework Beta 2

Introduction

This article describes porting Browser Helper Object (BHO) and IE Extension objects, originally implemented in C++/ATL/WTL, to a .NET class library written in C#. A concrete example illustrates some instructive problems and advantages of such an implementation.

Legacy ATL/WTL component code is available for comparison on this site at:
https://www.codeguru.com/atl/AnalyzeIE.html

In addition, the article demonstrates a dynamic TreeView form in C# whose child nodes are created on demand. This is a useful technique for large trees.

Implementing IObjectWithSite Interface

The legacy component is an in-process COM server, so I started by creating a C# class library. The C# class must be visible as a COM class and implement the IObjectWithSite interface, through which we obtain the other interfaces required to display the Document Object Model (DOM) tree of an HTML document.

IObjectWithSite method with COM signature:


virtual HRESULT STDMETHODCALLTYPE
SetSite( /* [in] */ IUnknown *pUnkSite);

is declared in the managed interface:


[ComImport, Guid("FC4801A3-2BA9-11CF-A229-00AA003D7352"),
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IObjectWithSite
{
    void SetSite( [In, MarshalAs(UnmanagedType.IUnknown)]
                  object pUnkSite);
    void GetSite( [In] ref Guid riid, [Out] IntPtr ppvSite);
}

The MSDN Library specification for SetSite requires that the incoming interface pointer be AddRef-ed before the previously stored interface, if any, is released. However, the .NET runtime creates a so-called Runtime Callable Wrapper (RCW) to represent the unmanaged COM interface, and that .NET object is what gets passed as the input parameter. This may prompt you to assume that the COM Interop code in the .NET runtime takes care of all reference counting. Thus, if you define a member to store the incoming interface object like:


public class DOMPeek: IObjectWithSite
{
private object m_IUnkSite;

method implementation should be as simple as:


void IObjectWithSite.SetSite( object pUnkSite)
{
m_IUnkSite = pUnkSite;
}

If you need more information about RCWs, read the MSDN article:
http://msdn.microsoft.com/msdnmag/issues/01/08/Interop/Interop.asp

The implementation above works fine, but only because of the way IE calls this method on a BHO: when IE loads the BHO it calls SetSite once with a non-null interface parameter and, before it quits, it calls it a second time with a null parameter. Because IE is quitting, it does not really matter whether the reference counts on the objects it used, and which are going away with it, are correct. However, if we want to make sure that the IObjectWithSite interface is implemented according to the specification, irrespective of how it is used in one specific situation, we can imagine some unmanaged client calling it as in the following code snippet:


IUnknown* pUnk = NULL;
// create some object and get its IUnknown

// at this point reference count on pUnk is 1

// now create our component that implements IObjectWithSite

IObjectWithSite* pSiteHolder = NULL;
ULONG count = 0;
HRESULT hRes = CoCreateInstance( CLSID_AnalyzeIE,
                                 NULL,
                                 CLSCTX_INPROC,
                                 IID_IObjectWithSite,
                                 (void**)&pSiteHolder);

if ( SUCCEEDED( hRes))
{
    // pass pUnk as site pointer
    pSiteHolder->SetSite( pUnk);    // this adds a reference on
                                    // pUnk - should be 2

    // do something that does not change reference count of pUnk
    // …….

    pSiteHolder->SetSite( NULL);    // pUnk should be released by the
                                    // other object - ref count back to 1

    // but if pUnk was not released above, let's
    // release COM object and, presumably, also all
    // corresponding .NET objects including RCWs.

    count = pSiteHolder->Release(); // this reference count
                                    // is 0 - object goes away
}

// execute free unused libraries below and wait in
// debugger for about 10 minutes because COM Runtime
// does not really free all DLLs right after this call

CoFreeUnusedLibraries();

count = pUnk->Release();            // should be 0, at least after
                                    // few hours to make 100% sure
                                    // that COM freed all DLLs

When the .NET runtime created the RCW for the pUnk parameter, it incremented the reference count to 2. However, in my tests the reference count on the last line above was 1, not 0 as expected. Even if the runtime released the RCW, it did not call Release on the wrapped COM interface pointer.

While this is just one example, you can easily imagine a situation where the object behind pUnk holds resources that you want freed. A lingering reference count of 1 would then be a reason for concern.

If you read the MSDN article referenced above, you might try to solve the reference count problem this way:


void IObjectWithSite.SetSite( object pUnkSite)
{
m_IUnkSite = pUnkSite;
GC.Collect(); // when pUnkSite is null GC will collect
// old RCW and release interface?

}

However, in .NET Beta 2 this does not work either. An implementation that does work is:


void IObjectWithSite.SetSite( object pUnkSite)
{
if ( m_IUnkSite != null)
Marshal.ReleaseComObject( m_IUnkSite);
m_IUnkSite = pUnkSite;
}

I do not know whether the final release of .NET will remove the need to use ReleaseComObject. So be aware that COM interop issues may be more complex than you would expect and that the .NET runtime does not always take care of all reference counting.
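The ordering rule quoted from the SetSite specification (AddRef the incoming pointer before releasing the stored one) can be modelled without COM at all. The following is a minimal, portable C++ sketch under stated assumptions: FakeUnknown and SiteHolder are hypothetical stand-ins invented for illustration, not types from the article's project, and only the reference count is simulated.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object: only the reference count is modelled.
struct FakeUnknown {
    long refs = 1;                        // the creator holds the first reference
    long AddRef()  { return ++refs; }
    long Release() { return --refs; }     // a real COM object would destroy itself at 0
};

// Hypothetical site holder that follows the SetSite contract.
class SiteHolder {
public:
    ~SiteHolder() { SetSite(nullptr); }   // drop the stored reference on destruction

    void SetSite(FakeUnknown* site) {
        if (site)  site->AddRef();        // 1. take a reference on the incoming pointer
        if (site_) site_->Release();      // 2. only then release the previous one
        site_ = site;
    }

private:
    FakeUnknown* site_ = nullptr;
};
```

Taking the new reference first matters when a caller passes the same pointer twice: releasing first could, for a real object, drop the count to zero before the new reference is taken.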

Next comes the implementation of GetSite method with COM signature:


HRESULT STDMETHODCALLTYPE GetSite( /*[in]*/ REFIID riid,
/*[iid_is][out]*/ void **ppvSite);

It took some experimentation to figure out how to deal with the outgoing void** parameter. A client calling this method on us can, in principle, pass any interface IID, and we should return a pointer to that interface if the site implements it. You will rarely query for an interface explicitly in .NET managed languages. Instead, you cast a managed object representing one interface to an object representing the requested interface, and the runtime performs the interface query behind the scenes. If the query fails, you get an InvalidCastException. In this case, however, we do not know in advance which interface may be requested, so we cannot use casts. Instead, we use Marshal.QueryInterface and, because it does not throw an exception when the query fails, we have to throw one ourselves.


void IObjectWithSite.GetSite( ref Guid riid, IntPtr ppvSite)
{
    const int e_fail = unchecked((int)0x80004005);

    if ( !ppvSite.Equals((IntPtr)0))
    {
        IntPtr pvSite = (IntPtr)0;
        // be a good COM interface imp - NULL the
        // destination ptr first

        Marshal.WriteIntPtr( ppvSite, pvSite);

        if ( m_IUnkSite != null)
        {
            IntPtr pUnk =
                Marshal.GetIUnknownForObject( m_IUnkSite);
            Marshal.QueryInterface( pUnk, ref riid, out pvSite);
            Marshal.Release(pUnk); // GetIUnknownForObject
                                   // AddRefs so Release

            if ( !pvSite.Equals((IntPtr)0))
                Marshal.WriteIntPtr( ppvSite, pvSite);
            else
                Marshal.ThrowExceptionForHR(e_fail);
        }
        else
            Marshal.ThrowExceptionForHR(e_fail);

    }
    else
        Marshal.ThrowExceptionForHR(e_fail);
}

Note that I defined the standard COM E_FAIL HRESULT value myself, as a constant.
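The control flow of GetSite, and the reason the constant needs the unchecked cast, can be seen in a small portable C++ model. This is only a sketch: HResult, kSOk, kEFail, Failed and SiteStore are names invented here, and the interface query step is left out so that only the out-parameter handling remains.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias: a COM HRESULT is a 32-bit signed value.
using HResult = std::int32_t;

constexpr HResult kSOk   = 0;
// 0x80004005 has the top bit set, so as a signed 32-bit value it is negative.
constexpr HResult kEFail = static_cast<HResult>(0x80004005u);

// Same test the SUCCEEDED/FAILED macros perform: failure means a negative value.
constexpr bool Failed(HResult hr) { return hr < 0; }

// Hypothetical holder mirroring the out-parameter handling of GetSite.
struct SiteStore {
    void* site = nullptr;

    HResult GetSite(void** ppvSite) const {
        if (ppvSite == nullptr) return kEFail;  // nowhere to write the result
        *ppvSite = nullptr;                     // null the destination first
        if (site == nullptr) return kEFail;     // no site has been set
        *ppvSite = site;                        // a real object would QueryInterface here
        return kSOk;
    }
};
```

Nulling the destination before doing anything else means a caller that ignores the return code still never sees a stale pointer.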

At this stage I wanted to make sure that IE would see my .NET component as a BHO COM component. On top of the standard COM component registration that the RegAsm tool performs, a BHO needs extra keys and values in the Registry so that IE can find it. These are added using the RegistryKey class inside a method marked with the special attribute that causes it to be called during registration:


[ComRegisterFunctionAttribute()]
static void RegisterServer(String str1)
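For orientation, the extra key is a subkey named after the component's CLSID under the Browser Helper Objects key in HKEY_LOCAL_MACHINE. The portable C++ sketch below only builds that key path as a string; BhoRegistryKey is a hypothetical helper and the CLSID shown in the usage note is a placeholder, not the one used by the article's component.

```cpp
#include <cassert>
#include <string>

// Builds the path (relative to HKEY_LOCAL_MACHINE) of the subkey that
// registers a BHO. The clsid argument is expected in "{...}" registry form.
std::string BhoRegistryKey(const std::string& clsid) {
    static const char kBase[] =
        "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Explorer\\"
        "Browser Helper Objects";
    return std::string(kBase) + "\\" + clsid;
}
```

Calling BhoRegistryKey("{00000000-0000-0000-0000-000000000000}") yields the path of the subkey that a registration function would create, and that the matching unregistration function would delete.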

Sink for DWebBrowserEvents2

Once the BHO is visible to IE, we can start adding code that handles DWebBrowserEvents2 events. To make the types from SHDocVw.dll (the DWebBrowserEvents2 interface being one of them) available to the BHO, I added a reference to this DLL using the “Add Reference” menu in VS.NET. A proxy DLL was created as the result.

One way to use this proxy DLL is described in the MSDN Library article “Handling Events Raised by a COM Source” and its accompanying sample code. However, that case is very different from ours because the sample code has a client that actually creates the managed IExplorer object. In our case the unmanaged IExplorer already exists and our component was created by it as an in-process server. We have no managed object on which to register our event handlers! In other words, we have to implement the DWebBrowserEvents2 interface ourselves.

I imagine some of you may now be thinking along these lines: implementing IObjectWithSite in the legacy project was one line of code, because we simply inherited the ATL implementation. Here we had to struggle a bit, but the interface has only two methods. Now, however, we need to implement DWebBrowserEvents2 and also hook up the sink via the connection point interfaces! That certainly looks like a lot of work compared to the one line that made all of it happen in the ATL code.

The good news is that the managed version of the connection point hookup is easy and takes only a few lines of code. What bothered me, however, was that I could not reuse more code from the SHDocVw proxy DLL. I looked at it with ILDASM and found the DWebBrowserEvents2_SinkHelper class, which implements DWebBrowserEvents2, but this class does not have a public constructor. Luckily, that does not prevent the Activator class from creating instances:


SHDocVw.DWebBrowserEvents2_SinkHelper m_SinkHelper;
…..
Type type = typeof(SHDocVw.DWebBrowserEvents2_SinkHelper);
m_SinkHelper =
(SHDocVw.DWebBrowserEvents2_SinkHelper)Activator.CreateInstance(type);

You can see the rest in the attached project sources. At this point browser events were arriving at the sink, and the next step was a GUI to show the DOM tree.

DOM dialog with treeview

The somewhat non-standard GUI work here involved making the treeview resize with the dialog. It takes very simple code in the handlers for the dialog's Load and Resize events. A much tougher, and ultimately unsolved, problem was how to make the modeless dialog form owned by the IE window, as it is in the legacy code. That is, displaying the dialog with:


m_DocDlg.Show();

makes it owned by the Desktop, with the effect that it stays visible on screen when you minimize the browser window. One Form can be owned by another by setting the Owner property. However, we do not have an owner form, only the browser window handle from IWebBrowser2's HWND property. I could not figure out how to “wrap” a Form object around an existing window handle, assuming that is doable at all. The best I could come up with was to make the browser window the parent of the dialog:


[DllImport("user32")]
static extern IntPtr SetParent( IntPtr hWndChild, IntPtr hWndNewParent);

// window handles are pointer-sized, so pass them as IntPtr
IntPtr parenthwnd = new IntPtr( m_IWebBrowser2.HWND);
SetParent( m_DocDlg.Handle, parenthwnd);

To get access to the HTML DOM interfaces I added a reference to MSHTML.tlb. Beware: this can take several minutes (at one point I even thought that the program had locked up) and results in a proxy, Interop.MSHTML_4_0.dll, of about 10 MB. That seems like a lot to distribute along with our assembly, which takes only 36 KB. When .NET becomes more widespread, maybe the primary assemblies for popular legacy components will come preinstalled in the Global Assembly Cache. Alternatively, one would hope that it will become possible to import only those types that one intends to use.

In the ATL/WTL treeview we did not store text with each item but, instead, provided it in response to the TVN_GETDISPINFO message. System.Windows.Forms.TreeView does not expose any event for this functionality. Although the data for display could not be provided on demand, I was still unwilling to construct a potentially huge DOM tree each time the user downloads an HTML document. I have seen several otherwise good products where I had to wait about 15 seconds just to see a few children of the treeview item on which I clicked. The software anticipated that I might want to look at all the children's children, and so on, and took its time to construct an entire branch. It is possible to do better!

In the C++ code we set the cChildren field of the TVITEM structure, which in turn is a field in TV_INSERTSTRUCT, to tell the treeview that an item will have children even if we did not add any at that time. Nothing similar is available for the treeview form in C#, so we must resort to a trick: we add a dummy child node so that we get the BeforeExpand event when the user clicks on the parent node. At that point we remove the dummy and add the real child nodes. See the source code for the details.

Adding a context menu was somewhat simpler, in my judgment, than in the C++ equivalent. Also, the TreeView form has a method to expand an entire branch, while in C++ we had to code it explicitly and work around one Win32 treeview bug (or poorly documented feature).
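The dummy-child technique is independent of any particular GUI toolkit. Below is a minimal, portable C++ sketch of the idea, not the article's C# code: LazyNode, MakeLazyNode and OnBeforeExpand are hypothetical names, and the loadChildren callback stands in for whatever walks the DOM to find the real children.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct LazyNode {
    std::string text;
    bool isPlaceholder = false;
    std::vector<std::unique_ptr<LazyNode>> children;
    // Hypothetical callback that produces the labels of the real children.
    std::function<std::vector<std::string>()> loadChildren;
};

// Creates a node; if it can have children, a single dummy child is added so
// that a tree control would show an expand button for it.
std::unique_ptr<LazyNode> MakeLazyNode(
        std::string text,
        std::function<std::vector<std::string>()> loader) {
    auto node = std::make_unique<LazyNode>();
    node->text = std::move(text);
    node->loadChildren = std::move(loader);
    if (node->loadChildren) {
        auto dummy = std::make_unique<LazyNode>();
        dummy->isPlaceholder = true;
        node->children.push_back(std::move(dummy));
    }
    return node;
}

// Plays the role of the BeforeExpand handler: on the first expansion the
// dummy is replaced by the real children; later expansions do nothing.
void OnBeforeExpand(LazyNode& node) {
    const bool pending = node.children.size() == 1 &&
                         node.children.front()->isPlaceholder;
    if (!pending) return;
    node.children.clear();
    for (auto& label : node.loadChildren())
        node.children.push_back(MakeLazyNode(label, nullptr));
}
```

In this simplified model the new children are created as leaves; a fuller version would hand each child its own loader so that deeper levels are also built only when expanded.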

Tastes may differ but, comparing the treeview-related code in ATL/WTL with the C# version, I would say that this is where C# shines: for example, we use casts instead of COM smart pointers and there is no setting of various structure fields. Altogether there is less code and it is more legible.

IE Extension

The purpose of this object is to provide a user interface (a button on the IE toolbar and a menu item on the Tools menu) that lets the user stop the dialog from popping up and building the DOM tree each time an HTML page is downloaded. It is a toggle that changes the value of a static variable:


public class DOMPeek: IDOMPeek, IObjectWithSite
{
private static bool m_bShowDialog = true;

and, based on the new value, closes or displays the dialog. If you create multiple instances of IExplorer (for example by clicking the New Window menu item), there will be multiple instances of the IE Extension object too. Even though this object is small, it makes sense to have only one instance of it. Note that I am not talking about a C# language or .NET remoting singleton, but about controlling the way the COM factory creates new object instances. Unfortunately, I could not find anything in the Beta 2 documentation to help me do this in C# code.

The Extension needs to implement the IOleCommandTarget interface, which also presented some problems. In particular, the structure passed to one of the IOleCommandTarget methods, defined in the C header as:


typedef struct _tagOLECMDTEXT
{
DWORD cmdtextf;
ULONG cwActual;
ULONG cwBuf;
/* [size_is] */ wchar_t rgwz[ 1 ];
} OLECMDTEXT;

could not be marshalled as a managed structure:


[StructLayout(LayoutKind.Sequential)]
public struct OLECMDTEXT
{
public uint cmdtextf;
public uint cwActual;
public uint cwBuf; // specifies the number of chars in array
public char[] rgwz; // string or whatever – nothing
// works, no MarshalAs attributes help

}

and it looks like using IntPtr and manually poking at it in unsafe code would be the only option if we needed to read or change the values of the structure fields. IntPtr is the .NET wrapper for a native integer, and I would recommend that, whenever you have problems marshalling a pointer parameter in COM, you try using IntPtr first. Namely, if you do not get the managed type for a parameter right, one of two things can happen:


  • the interop layer throws an exception without ever calling your method. This can be confusing if you set a breakpoint at the method entry, because you do not know whether something else went wrong

  • the method is called but the marshaled parameters contain garbage. At this point you can start experimenting with parameter types and the MarshalAs attribute

With IntPtr as the parameter type you are more likely to encounter the second case. Going back to our problem: for our use of the IE Extension, the OLECMDTEXT parameter can be ignored. So let us keep in mind one more gotcha of COM Interop (in Beta 2 at least) and proceed to the part where it does a nice job of making life easier in comparison to the legacy code. First, look at the picture of a typical situation with multiple instances:

Picture 2 – BHO and IE Extension objects in two IE instances

Now, if you click the Extension's toolbar button in instance 2, it has to call a method (blue pointed lines) on both instances of the BHO. Should we worry about cross-apartment access then? One may think that we need not, because the BHO and the IE Extension are C# classes. We can define a static array for the BHO instances like:


public class DOMPeek: IDOMPeek, IObjectWithSite
{

public static ArrayList m_Instances = new ArrayList();

initialize array in SetSite:


void IObjectWithSite.SetSite( object pUnkSite)
{

m_Instances.Add( this);

and provide static access method for IE Extension:


public static bool ToggleDialogShow()
{
m_bShowDialog = !m_bShowDialog;
// call code to close or display the dialog
return m_bShowDialog;
}
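The three fragments above (a static list of instances, registration of each instance, and a static toggle) fit together as in the following portable C++ model. It is a sketch under assumptions rather than the article's code: PeekInstance is a hypothetical class, instances register in the constructor instead of in SetSite, and removal on destruction plus the mutex are additions that the original fragments do not show.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

class PeekInstance {
public:
    PeekInstance() {
        std::lock_guard<std::mutex> lock(Mutex());
        dialogVisible_ = ShowFlag();
        Instances().push_back(this);          // register this instance
    }
    ~PeekInstance() {
        std::lock_guard<std::mutex> lock(Mutex());
        auto& all = Instances();
        all.erase(std::remove(all.begin(), all.end(), this), all.end());
    }
    PeekInstance(const PeekInstance&) = delete;
    PeekInstance& operator=(const PeekInstance&) = delete;

    // Flips the shared flag and applies the new value to every instance.
    static bool ToggleDialogShow() {
        std::lock_guard<std::mutex> lock(Mutex());
        ShowFlag() = !ShowFlag();
        for (PeekInstance* p : Instances())
            p->dialogVisible_ = ShowFlag();   // stands in for show/close dialog
        return ShowFlag();
    }

    bool dialogVisible() const { return dialogVisible_; }

private:
    static std::vector<PeekInstance*>& Instances() {
        static std::vector<PeekInstance*> all;
        return all;
    }
    static bool& ShowFlag() { static bool show = true; return show; }
    static std::mutex& Mutex() { static std::mutex m; return m; }

    bool dialogVisible_ = true;
};
```

The lock is there because each browser window runs on its own thread, so the toggle may be invoked from a different thread than the one that registered a given instance.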

However, C# objects calling methods on other C# objects is not the only thing going on here: the code that initializes the dialog also queries COM interfaces and calls methods on those interfaces. For example:


private void SetTitle()
{
MSHTML.IHTMLLocation ILoc = m_IDoc2.location;
Text = ILoc.href;
….

which gets the HTML page location and sets it as the dialog's title. In effect, we are calling a method on a COM interface pointer from a different STA apartment. Moreover, the call above worked when executed inside the calling apartment but failed in the other one! Initially I took this as a sign that I do need to account for COM apartments and that some .NET equivalent of the Global Interface Table (which helped cross apartments in C++) is required. However, other COM calls worked fine, prompting me to conclude that the failure in getting the location property is just a bug in the interop proxy DLL. Another conclusion we may draw is that RCWs wrap apartment-neutral COM interface pointers. Some posts on the very useful DevelopMentor DOTNET discussion list indicate that this is indeed true or, in other words, that .NET classes are “context agile” and multiple threads can execute within the same (in our case default) context.

The Final Hack

I encountered the following problem early in development, but I describe it last because the solution is a hack. A BHO can be loaded by both Windows Explorer (WE) and IE. Not only is this BHO intended solely for IE but, if WE loads it, you will not be able to link while you are developing, because the linker cannot overwrite a file that is in use. You must unregister the component, log out and log back in to continue modifying and testing your code. Even worse, if the BHO you just tested has a major bug and you forget to unregister it before logging out, Explorer may lock up immediately the next time you log in!

The hack starts with the observation that, when we register our assembly as a COM server using RegAsm, it is not the path to assembly that is stored under the InprocServer32 key. Instead, it points to the proxy, mscoree.dll, part of the .NET runtime. Therefore, we can insert another proxy DLL under the InprocServer32 key, with the difference that this one will allow only Internet Explorer to load it.

Consequently, I created the new DLL project in VS6, called wrapmscoree, with the entry code:


HINSTANCE hLib;
extern "C" BOOL WINAPI DllMain( HINSTANCE hInstance,
                                DWORD dwReason,
                                LPVOID /*lpReserved*/)
{
    if (dwReason == DLL_PROCESS_ATTACH)
    {
        TCHAR Loader[MAX_PATH];
        LPCTSTR pName = Loader;   // file name part of the loader path

        if ( !GetModuleFileName( NULL, Loader, MAX_PATH))
            return FALSE;
        // find the last backslash; point at the name instead of copying
        // it over itself (lstrcpy on overlapping buffers is undefined)
        for ( int i = lstrlen( Loader); i > 0; i--)
            if ( Loader[i] == _T('\\'))
            {
                pName = Loader + i + 1;
                break;
            }

        if ( lstrcmpi( pName, _T("iexplore.exe")))
            return FALSE;
        if ( ( hLib = LoadLibrary(_T("mscoree.dll"))) == NULL)
            return FALSE;
        ….

As you can see, DllMain returns FALSE if it is not called by iexplore.exe and also loads the real .NET proxy. The only other functions in wrapmscoree.dll that IE, as COM client, cares about are implemented by calling the corresponding ones in mscoree.dll:


STDAPI DllCanUnloadNow(void)
{
    typedef HRESULT (_stdcall *fpCanUnloadNow)(void);
    fpCanUnloadNow fp;
    // GetProcAddress always takes an ANSI name, so no _T() here
    if ( hLib && ( fp =
         (fpCanUnloadNow)GetProcAddress( hLib,
                                         "DllCanUnloadNow")))
        return fp();
    return S_OK;
}

STDAPI DllGetClassObject(REFCLSID rclsid,
                         REFIID riid,
                         LPVOID* ppv)
{
    typedef HRESULT (_stdcall *fpGetClassObject)
        (REFCLSID rclsid, REFIID riid, LPVOID* ppv);
    fpGetClassObject fp;
    if ( hLib && ( fp =
         (fpGetClassObject)GetProcAddress( hLib,
                                           "DllGetClassObject")))
        return fp( rclsid, riid, ppv);
    return E_FAIL;
}

Finally, I added some code to change default InprocServer32 path set by RegAsm during registration.
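The host check performed in DllMain boils down to two steps: take the file name that follows the last backslash of the loading process's path, then compare it case-insensitively with iexplore.exe. Here is a portable C++ sketch of just that logic, with no Win32 calls; FileNameOf, EqualsIgnoreCase and IsHostAllowed are hypothetical helper names, and the paths in the usage note are examples only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Returns the part of a Windows path after the last backslash,
// or the whole string if there is no backslash.
std::string FileNameOf(const std::string& path) {
    const std::string::size_type pos = path.find_last_of('\\');
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// ASCII case-insensitive comparison, similar in spirit to lstrcmpi(...) == 0.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// True only when the loading process is Internet Explorer.
bool IsHostAllowed(const std::string& modulePath) {
    return EqualsIgnoreCase(FileNameOf(modulePath), "iexplore.exe");
}
```

For example, IsHostAllowed("C:\\Program Files\\Internet Explorer\\IEXPLORE.EXE") is true, while IsHostAllowed("C:\\WINNT\\explorer.exe") is false, which is the case where the wrapper refuses to load.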

Building and installing

I removed the references to SHDocVw.dll and MSHTML.tlb from the solution file in the attached project, because building the corresponding proxy assemblies can take a long time and the files may not be located in the \WINNT\System32 folder as they are on my machine. Therefore, you should add them yourself before building. The BHO can be installed by copying it, together with its support files, into any folder and then invoking RegAsm “path to BHO” /codebase at the command prompt. Here is the list of the installed files:


  • AnalyzeIE.dll – BHO and IE Extension assembly

  • Interop.MSHTML_4_0.dll – MSHTML proxy

  • Interop.SHDocVw_1_1.dll – SHDocVw proxy

  • wrapmscoree.dll – DllInit hack

  • iespy.ico – IE Extension button standard icon

  • iespyhot.ico – IE Extension button hot icon

Conclusion

If DOM display is all that it does, there is no compelling reason to re-implement a Browser Helper Object and IE Extension in one of the .NET managed languages. It is a “classic COM” component for use by a classic COM client. However, the amount of code to write is not large, and it is not entirely trivial either. Combined with the availability of the original C++ code for comparison, I thought that a full port to C#, as opposed to one more tutorial, would be instructive.

You are more than welcome to comment on implementation decisions for which you think there are more elegant solutions in the .NET Framework. There certainly are issues that I have not had the time to investigate in detail or simply do not understand as well as I would like to.

A few such issues are listed below, in no particular order:


  • can we tell COM factory to create COM singleton from managed code?

  • is there no simpler way to prevent loading of BHO by Windows Explorer except for the hack with
    wrapper DLL? Can the same be accomplished with managed code?

  • is there really no way to start with a window handle and attach it to Form object in managed code?

  • are standard COM HRESULTs defined somewhere in .NET and accessible to managed code?

Downloads

Download demo project and source code – 22 Kb
