Parsing HTML without Using the Browser Control

How to Use MSHTML as an HTML Parser in Visual Basic Without Using the Browser Control


Environment: VB6 SP5, XPPro, IE6

The main goal of this article is to show how to use the HTML parser built into Microsoft Internet Explorer from within your own program.

This is usually easy if you use the browser control, and there are plenty of examples of that on the Internet. When it comes to using the parser without any UI, however, I could find nothing written in Visual Basic. All the examples I've seen are in Visual C++ and use interfaces that are not available in Visual Basic.

After days of searching, including trying the .NET platform so that I could use an HTML parser in a Windows NT service, I finally found a way. I don't claim this is the nicest way to do it, but it works like a charm, and it gives you access to the DOM of the HTML document you want, which can be very useful if you're looking to parse an HTML document.

Your code must have a reference to the Microsoft HTML Object Library, and Internet Explorer 5 or later is required. Simply copy this code into any function. As written, it expects a form with a text box named txtURL and a list box named lstLinks.

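' document.links holds <a> and <area> elements, so use a generic
' late-bound reference rather than HTMLLinkElement (the <link> element).
Dim objLink As Object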
Dim objMSHTML As New MSHTML.HTMLDocument
Dim objDocument As MSHTML.HTMLDocument


' createDocumentFromUrl requires Internet Explorer 5 or later

Set objDocument = objMSHTML.createDocumentFromUrl(txtURL.Text, _
                                                  vbNullString)
    
' Tricky part: the transfer is usually asynchronous, so loop
' until the document is complete before reading it. Note
' that this string might be different if Internet Explorer
' is installed in a language other than English on the
' machine where the code is executed.

While objDocument.readyState <> "complete"
    DoEvents
Wend

' Source Code

Debug.Print objDocument.documentElement.outerHTML

' Title

Debug.Print "Title : " & objDocument.Title

' Link Collection

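For Each objLink In objDocument.links
    ' Read the href property explicitly rather than relying on
    ' the element's default property.
    lstLinks.AddItem objLink.href
    Debug.Print "Link:  " & objLink.href
Next objLink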
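
The wait loop above spins until readyState reports "complete", so it never returns if the download stalls. That matters most in an unattended process such as a service. The following is only a sketch of one way to bound the wait, not part of the original demo project: WaitForDocument and its sngTimeoutSeconds parameter are hypothetical names introduced here, and the 30-second limit in the usage lines is an arbitrary assumption.

```vb
' Hypothetical helper (not in the original article): waits until the
' document reports "complete" or the timeout elapses.
' Requires a reference to the Microsoft HTML Object Library.
Private Function WaitForDocument(ByVal objDoc As MSHTML.HTMLDocument, _
                                 ByVal sngTimeoutSeconds As Single) As Boolean
    Dim sngStart As Single
    Dim sngElapsed As Single

    sngStart = Timer    ' seconds elapsed since midnight

    Do While objDoc.readyState <> "complete"
        DoEvents        ' let the asynchronous download make progress

        sngElapsed = Timer - sngStart
        ' Timer wraps back to 0 at midnight; correct for that.
        If sngElapsed < 0 Then sngElapsed = sngElapsed + 86400

        If sngElapsed > sngTimeoutSeconds Then
            WaitForDocument = False   ' gave up before completion
            Exit Function
        End If
    Loop

    WaitForDocument = True
End Function

' Usage, in place of the While ... Wend loop (30 seconds is an
' assumed limit, adjust as needed):
'
'   If WaitForDocument(objDocument, 30) Then
'       Debug.Print "Title : " & objDocument.Title
'   Else
'       Debug.Print "Timed out loading " & txtURL.Text
'   End If
```

Returning a Boolean instead of raising an error leaves it to the caller to decide whether a timeout should be retried, logged, or ignored.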

Downloads

Download demo project - 3 Kb