Viewing the IE Document Object Model

Using WTL treeview in ATL Browser Helper Object

Environment: VC++ 6.0, IE 5+

The VC++ 6.0 project that comes with this article is an ATL in-process COM server that houses a Browser Helper Object (BHO) and an IE Extension object. When Internet Explorer finishes downloading an HTML page, the BHO pops up a dialog box with a treeview that displays the Document Object Model (DOM) nodes and attributes of that HTML document.

You can close the dialog, but it pops up again when you download a new document. Also, each new browser window shows its own DOM tree dialog. The IE Extension object is used to change this default behavior: by clicking the Extension's button on the IE toolbar, you can stop the dialog from popping up while you browse, and display it later when you want to see the DOM tree of a particular HTML page.

This article demonstrates some useful techniques and ways of using WTL features:

  • The dialog box is resizable.
  • The WTL treeview doesn't store a text string for each tree item. Each item is displayed 'on demand', when the TVN_GETDISPINFO notification message arrives.
  • Tree items contain COM interface pointers as item data. In the display phase these interfaces are queried to find out what text and bitmap to display for the DOM node or attribute in question.
  • The tree is constructed dynamically. No child nodes are added until a node is expanded, and child nodes are deleted when their parent is collapsed.
  • A context menu lets you expand or collapse a node in its entirety, with all of its children, its children's children, and so on.
  • The BHO and IE Extension objects may be loaded in different COM apartments. The BHO registers its instances in the Global Interface Table (GIT) so that the Extension can call methods on them.

There are practical reasons for applying these techniques. I don't like it when I cannot resize a control and have to scroll just to see a tiny bit of the tree. A DOM tree can be huge, and constructing a large tree can consume a lot of memory and time. There is little sense in constructing, say, a 500-item tree as you move from one Web page to another, especially if you are not going to look at the tree of each page in turn. Finally, when I want to look at the DOM tree I don't want to spend five minutes clicking the mouse to open each node: the interesting information is typically in the values of the terminal leaf nodes.
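
The idea of building the tree dynamically can be illustrated without any Windows dependency. The following is a minimal, portable sketch and not the project's code: SourceNode, TreeItem and makeSampleDocument are hypothetical names that stand in for a DOM node, a treeview item and a downloaded document. Child items are created only when a node is expanded, are thrown away when it is collapsed, and a recursive expandAll plays the role of the context menu command.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node: a name plus child nodes.
struct SourceNode {
    std::string name;
    std::vector<SourceNode> children;
};

// Hypothetical stand-in for a treeview item. It keeps only a pointer to its
// source node; child items exist only while the item is expanded.
struct TreeItem {
    const SourceNode* source = nullptr;
    std::vector<std::unique_ptr<TreeItem>> children;
    bool expanded = false;

    explicit TreeItem(const SourceNode* s) : source(s) { assert(s != nullptr); }

    // Is an expand button worth showing for this item?
    bool hasChildren() const { return !source->children.empty(); }

    // Create child items only when the node is opened.
    void expand() {
        if (expanded) return;
        for (const SourceNode& child : source->children)
            children.push_back(std::make_unique<TreeItem>(&child));
        expanded = true;
    }

    // Discard the whole subtree of items when the node is closed.
    void collapse() {
        children.clear();
        expanded = false;
    }

    // "Expand all": open this node and every descendant.
    void expandAll() {
        expand();
        for (auto& child : children) child->expandAll();
    }

    // Number of items currently alive in this subtree, including this one.
    std::size_t liveItems() const {
        std::size_t n = 1;
        for (const auto& child : children) n += child->liveItems();
        return n;
    }
};

// Builds a small sample document: html -> (head -> title, body -> (p, p)).
inline SourceNode makeSampleDocument() {
    SourceNode title{"title", {}};
    SourceNode head{"head", {title}};
    SourceNode p{"p", {}};
    SourceNode body{"body", {p, p}};
    return SourceNode{"html", {head, body}};
}
```

With the sample document, a freshly created root item is the only item alive; expanding it creates just its two direct children, and the rest of the tree costs nothing until somebody asks for it. The source document must outlive the items, because they hold plain pointers into it.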

To compile this project you will need to put the WTL version 3.1 headers somewhere in your include path. You can get them as part of the Platform SDK. Declarations for the Explorer and HTML interfaces come from the #import statements in my StdAfx.h:

#import "c:\winnt\system32\mshtml.tlb" named_guids
#import "c:\winnt\system32\shdocvw.dll" named_guids

You will need to change the paths to point to the correct location on your machine. If you download the source code, it also includes a compiled component DLL in the ReleaseMinDependency folder. Register it with Regsvr32 and you will get the DOM tree display the next time you open IE.

I tested this COM server on Win95 with IE 5.0 and on Win2000 with IE 6.0, using both UNICODE and MBCS minimum-dependency Release builds. At the time of posting, I am not aware of any crippling bugs.

With such a good article as Dino Esposito's "Browser Helper Objects: The Browser the Way You Want It" available in the MSDN Library, I won't waste space explaining how BHOs work.

My BHO object, CDOMPeek, is declared as:

class ATL_NO_VTABLE CDOMPeek : 
 public CComObjectRootEx< CComSingleThreadModel>,
 public CComCoClass< CDOMPeek, &CLSID_DOMPeek>,
 public IObjectWithSiteImpl< CDOMPeek>,
 public IDispatchImpl< IDOMPeek, &IID_IDOMPeek, &LIBID_ANALYZEIELib>,
 public IDispEventImpl< 0, 
                           CDOMPeek, 
                           &SHDocVw::DIID_DWebBrowserEvents2,
                           &SHDocVw::LIBID_SHDocVw,
                           1,
                           0>

where IObjectWithSiteImpl is used to get the IWebBrowser2 interface, from which we can find all the other interfaces required to display the DOM. IDispEventImpl handles the DWebBrowserEvents2 outgoing interface so that we can construct a new DOM tree, or update the existing one, when an event fires telling us that the document download has completed.

The CDOMPeek object has an ATL dialog declared as a member object (CMainDlg m_DocDlg), which in turn has a WTL treeview object as its member (CDOMTree m_DOMTree).

We make the dialog resizable by declaring it as:

class CMainDlg : public CDialogImpl< CMainDlg>, 
                  public CDialogResize< CMainDlg>

and calling DlgResize_Init() in response to WM_INITDIALOG. The CDialogResize WTL class needs to handle Windows messages, so we chain it into the dialog's message map:

 BEGIN_MSG_MAP(CMainDlg)
    CHAIN_MSG_MAP( CDialogResize< CMainDlg>) 
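
CDialogResize also relies on a resize map (the BEGIN_DLGRESIZE_MAP family of macros) that says, for each control, whether it should stretch or move when the dialog changes size. The arithmetic behind that idea is simple enough to sketch in portable C++. This is only an illustration under my own assumptions, not WTL's implementation: ResizeFlags, Rect and applyResize are hypothetical names, and the flag values are not the real DLSZ_* definitions.

```cpp
#include <cassert>

// Hypothetical flags modelled on the idea behind WTL's DLSZ_* values;
// the names and numbers here are illustrative, not the WTL definitions.
enum ResizeFlags : unsigned {
    kSizeX = 1u << 0, // stretch horizontally with the dialog
    kSizeY = 1u << 1, // stretch vertically with the dialog
    kMoveX = 1u << 2, // keep the distance to the right edge
    kMoveY = 1u << 3  // keep the distance to the bottom edge
};

struct Rect { int left, top, right, bottom; };

// Returns the control rectangle after the dialog client area has changed by
// (dx, dy) relative to the size it had when the layout was recorded.
inline Rect applyResize(Rect initial, unsigned flags, int dx, int dy) {
    assert(initial.right >= initial.left && initial.bottom >= initial.top);
    Rect r = initial;
    if (flags & kMoveX) { r.left += dx; r.right += dx; }
    else if (flags & kSizeX) { r.right += dx; }
    if (flags & kMoveY) { r.top += dy; r.bottom += dy; }
    else if (flags & kSizeY) { r.bottom += dy; }
    return r;
}
```

A treeview that fills the dialog would use both stretch flags, so it grows by exactly as much as the dialog does, while a button in the bottom right corner would use both move flags and keep its size.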

The DOM tree is a contained window:

 class CDOMTree : public CContainedWindowT< CTreeViewCtrl>

and routes its messages to alternate message map #1 of the dialog, as set up in the dialog constructor:

 CMainDlg() : m_DOMTree( this, 1) {}

Finally, we need to hook the CDOMTree object to the treeview control in WM_INITDIALOG:

 m_DOMTree.SubclassWindow( GetDlgItem(IDC_TREE1)); 

Now that we have the BHO and UI objects in place, we need to construct the DOM tree when the browser tells us the document is complete. From IWebBrowser2 we get IHTMLDocument2:

 CComPtr<IDispatch> spDispatch;
 if ( SUCCEEDED( m_spWebBrowser2->get_Document( &spDispatch)))
    m_spDoc2 = spDispatch;

where m_spDoc2 is:

 CComQIPtr< MSHTML::IHTMLDocument2,
               &MSHTML::IID_IHTMLDocument2> m_spDoc2;

and we also get the handle of the window that will be the owner of the dialog:

 // plug our dialog into browser window
 HWND hwnd = NULL;
 if ( FAILED( m_spWebBrowser2->get_HWND( 
               reinterpret_cast< long*>(&hwnd))))
    hwnd = ::GetActiveWindow();

 if ( m_DocDlg.Create( hwnd))
    m_DocDlg.ShowWindow( SW_NORMAL);

The DOM tree is constructed starting with the root DOM node:

inline bool CMainDlg::CDOMTree::PrepareDOMTree()
{
 CComQIPtr< MSHTML::IHTMLDocument3,
               &MSHTML::IID_IHTMLDocument3> spDoc3 = m_spDoc2;
 bool bRet = spDoc3 != NULL;

 if ( bRet)
 {
    CComPtr< MSHTML::IHTMLElement> spRootElement;
    bRet = SUCCEEDED( spDoc3->get_documentElement( &spRootElement));
    if ( bRet)
    {
      CComQIPtr< MSHTML::IHTMLDOMNode> spRootNode = spRootElement;
      bRet = spRootNode != NULL;
      if ( bRet)
        InsertDOMNode( spRootNode, NULL, NULL);
    }
 }
 else
 {
    ATLASSERT(0); // IE 4 does not support IHTMLDocument3
 }
 return bRet;
}

Notice that an interface pointer to IHTMLDOMNode was passed to the InsertDOMNode method. After some standard preparation of the TVITEM structure, it stores the interface pointer as treeview item data:

inline HTREEITEM CMainDlg::CDOMTree::InsertDOMNode( 
                     MSHTML::IHTMLDOMNode* pINode,
                     HTREEITEM hparent,
                     HTREEITEM hinsertAfter)
{
  TV_INSERTSTRUCT tvis;

  ....

  tvis.item.mask =  TVIF_TEXT | TVIF_IMAGE | 
                    TVIF_SELECTEDIMAGE | TVIF_CHILDREN |
                    TVIF_PARAM; 
  tvis.item.pszText = LPSTR_TEXTCALLBACK;
  tvis.item.iImage = I_IMAGECALLBACK; 
  tvis.item.iSelectedImage = I_IMAGECALLBACK;
  tvis.item.cChildren = 0;

  .....

  // Need to AddRef because we'll be keeping interface 
  // pointer as treeview item data and use it in display phase
  pINode->AddRef();

  tvis.item.lParam = reinterpret_cast< LPARAM>(pINode);
  HTREEITEM hthisItem = InsertItem( &tvis);
  ......
}
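
The ownership rule behind that AddRef can be shown in a small portable sketch. It is not the project's code: FakeNode and FakeTree are hypothetical stand-ins for a COM object and the treeview control. The snippet above does not show where the matching Release happens, so the sketch assumes it is done when the item is deleted (the real treeview sends TVN_DELETEITEM at that point). It also shows the on-demand display: the item stores no text, only an opaque integer like lParam, and the text is produced from the object when it is asked for.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a COM object: intrusive reference counting with
// AddRef/Release, plus one property used to build the display text.
class FakeNode {
public:
    FakeNode(std::string name, int* liveCounter)
        : name_(std::move(name)), live_(liveCounter) { ++*live_; }
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }
    const std::string& nodeName() const { return name_; }
private:
    ~FakeNode() { --*live_; }
    std::string name_;
    int* live_;
    unsigned long refs_ = 1; // the creator holds the first reference
};

// Hypothetical stand-in for the treeview: each item stores only an opaque
// integer (like lParam) and no text of its own.
class FakeTree {
public:
    using ItemId = int;
    FakeTree() = default;
    FakeTree(const FakeTree&) = delete;            // items own references
    FakeTree& operator=(const FakeTree&) = delete;

    // Insert: take a reference, because the item now shares ownership.
    ItemId insert(FakeNode* node) {
        assert(node != nullptr);
        node->AddRef();
        ItemId id = nextId_++;
        items_[id] = reinterpret_cast<std::intptr_t>(node);
        return id;
    }
    // Display phase: recover the pointer and ask the node for its text.
    std::string displayText(ItemId id) const {
        auto* node = reinterpret_cast<FakeNode*>(items_.at(id));
        return "<" + node->nodeName() + ">";
    }
    // Delete: give back the reference taken in insert().
    void remove(ItemId id) {
        auto it = items_.find(id);
        if (it == items_.end()) return;
        reinterpret_cast<FakeNode*>(it->second)->Release();
        items_.erase(it);
    }
    ~FakeTree() {
        for (auto& entry : items_)
            reinterpret_cast<FakeNode*>(entry.second)->Release();
    }
private:
    std::unordered_map<ItemId, std::intptr_t> items_;
    ItemId nextId_ = 1;
};
```

The point of the design is that every pointer parked in item data is balanced by exactly one Release, so the object stays alive for as long as an item refers to it and no longer.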

You can check the rest of the dynamic construction and treeview display code in the attached VS 6.0 project. I hope there are enough comments in the code to make the reasons for particular algorithms, API calls and implementation decisions easy to understand.

Some possible enhancements:

  • When an HTML page contains frames, only the DOM tree of the top frame is displayed, and it is not very interesting. The DOM tree of the document in each frame should also be displayed.
  • The CDOMTree class can be decoupled from the dialog class for better reusability.
  • Add features to edit the DOM tree: add and delete nodes and change node properties.
  • Extend it so that it can display an XML tree if the document is XML.

Downloads

Download source - 50.2 Kb


Comments

  • Microsoft Web Browser Object in a Dialog window ?

    Posted by Legacy on 04/08/2003 12:00am

    Originally posted by: Adrian Bacaianu

    I need something simpler than your IE DOM map in a dialog: I just need to have an IE control window in my dialog box.

    I have a very simple Win32 console application, without MFC.
    I called CreateDialog to open a dialog box window from a dialog resource.

    In that dialog I put a Microsoft Web Browser Object control, from the controls list.

    How do I connect to that object and tell it to Navigate2?
    (With MFC it is very simple: an m_spBrowser variable is linked automatically...)

  • BHO can not work well under Windows XP with new patches

    Posted by Legacy on 10/09/2002 12:00am

    Originally posted by: microran

    Recently, I found that BHOs (Browser Helper Objects) do not work well under Windows XP with the new security patches. It seems that Microsoft will not support BHOs in its future operating systems or Internet Explorer versions.
    After the patches were downloaded and installed, some software (IE plug-ins such as ZeroPop) would no longer be loaded by IE or Explorer. While IE or Explorer is running, I can delete the BHO component easily, which means the BHOs are never running. I downloaded some BHO source code, only to get the same results. However, these components run well under other operating systems and under XP without these patches.
    Who can tell me how to deal with this so that these BHOs can be used in Windows XP?
    Thanks,
    ran

  • You might want to point out this caveat

    Posted by Legacy on 07/19/2002 12:00am

    Originally posted by: Steve Owens

    If you are coding for IE 6.0 in ATL, the following could cause you great headaches. There is no rhyme or reason for why this should be so.

    Intuitively, you would think that if you called createElement in MSHTML you would get back an interface to an object corresponding to the element created. For example, say you wish to create a table dynamically in code. If you call createElement(L"TABLE", &pResult) and then query for the IHTMLTable interface, you would expect to get it. But you don't!

    Read on:

    CComPtr<IHTMLDocument2> pDoc;
    // ...
    // Create pDoc or somehow make it valid,
    // then call the Foo function.
    // ...
    HRESULT Foo(IHTMLDocument2* pDoc)
    {
        HRESULT hr = E_FAIL;
        CComPtr<IHTMLElement> pElement;
        hr = pDoc->createElement(L"TABLE", &pElement);
        if (SUCCEEDED(hr))
        {
            CComQIPtr<IHTMLTable> pTable;
            pTable = pElement;
            ASSERT(pTable); // Fails always.
            if (!pTable)
                hr = E_UNEXPECTED;
            // Would also fail if you called
            // hr = pElement->QueryInterface(IID_IHTMLTable,
            //                               (void**)&pTable);
        }
        return hr;
    }

  • Very Good. Need More Info.

    Posted by Legacy on 02/11/2002 12:00am

    Originally posted by: Guna

    This code is really useful for what I am doing right now. I am quite new to VC++ and would like to understand how to get IHTMLDocument2 into my application. I would appreciate any help in this regard.

  • No IHTMLDocument3

    Posted by Legacy on 01/13/2002 12:00am

    Originally posted by: JTG

    I would like to use IHTMLDocument3, but my VC++ doesn't know what it is. I am already using IHTMLDocument and IHTMLDocument2 in the same project. Do I need to get a newer Mshtml.h file?
