Parsing HTML without Using the Browser Control

How to Use MSHTML as an HTML Parser in Visual Basic Without Using the Browser Control.


Environment: VB6 SP5, XPPro, IE6

The main goal of this article is to provide a way to use the HTML parser inside Microsoft Internet Explorer within your program.

This is usually easy if you use the browser control, and there are plenty of examples of that on the Internet. When it comes to using the parser without a user interface, however, I could find nothing written in Visual Basic. All the examples I've seen are in Visual C++ and use interfaces that are not available in Visual Basic.

After days of searching, including trying the .NET platform in order to use an HTML parser in a Windows NT service, I finally found a way. I don't claim this is the nicest way to do it, but it works like a charm and gives you access to the DOM of the document, which is very useful if you need to parse an HTML document.

Your project must have a reference to the Microsoft HTML Object Library, and Internet Explorer 5 or later must be installed. Simply copy this code into any function.

Dim objLink As Object
Dim objMSHTML As New MSHTML.HTMLDocument
Dim objDocument As MSHTML.HTMLDocument


' createDocumentFromUrl requires Internet Explorer 5 or later.

Set objDocument = objMSHTML.createDocumentFromUrl(txtURL.Text, _
                                                  vbNullString)
    
' The transfer is asynchronous, so wait until the document has
' finished loading. readyState reports one of a fixed set of
' strings ("loading", "interactive", "complete", and so on).
' DoEvents keeps the message loop running while the loop waits.

While objDocument.readyState <> "complete"
    DoEvents
Wend

' Source code

Debug.Print objDocument.documentElement.outerHTML

' Title

Debug.Print "Title : " & objDocument.Title

' Link collection. The links collection holds <a> and <area>
' elements (not <link> elements), so objLink is declared As Object
' and its href property is read late-bound.

For Each objLink In objDocument.links
    lstLinks.AddItem objLink.href
    Debug.Print "Link:  " & objLink.href
Next
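
The wait loop above never exits if the page fails to finish loading. Here is a minimal sketch of the same loop with a timeout, using the built-in Timer function. The helper name WaitForDocument and its timeout parameter are hypothetical and are not part of MSHTML.

```vb
' Hypothetical helper (not part of MSHTML): waits until the document
' reports "complete" or the timeout elapses.
' Returns True if loading finished, False if the wait timed out.
Private Function WaitForDocument(ByVal objDoc As MSHTML.HTMLDocument, _
                                 ByVal sngTimeoutSeconds As Single) As Boolean
    Dim sngStart As Single

    sngStart = Timer   ' seconds elapsed since midnight

    Do While objDoc.readyState <> "complete"
        ' Give up once the timeout has elapsed. This simple check does
        ' not handle a wait that spans midnight, when Timer resets to 0.
        If Timer - sngStart > sngTimeoutSeconds Then
            WaitForDocument = False
            Exit Function
        End If
        DoEvents
    Loop

    WaitForDocument = True
End Function
```

It could replace the While loop with a call such as WaitForDocument(objDocument, 30), where 30 seconds is an arbitrary value; check the return value before reading the DOM.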

Downloads

Download demo project - 3 Kb


Comments

  • Great example...

    Posted by ShaneB on 08/13/2009 07:21am

    Great example, but how would you go about parsing just the text inside a <textarea> tag?
  • Very good .. but i need more ..

    Posted by Ranjan.net on 07/24/2008 01:31pm

    This 20-line code is very good. One quick question: if I already have the file on disk (cached), can I parse it? How will partial links (e.g. \images\share\full_8789.jpg) be resolved?
  • Acrux Advanced Html Parser

    Posted by Acrux2 on 03/28/2008 06:34am

    A good parser that handles real-world messy HTML and even provides an XmlDocument-like structure of the parsed HTML is the Acrux Advanced Html Parser: http://www.acruxsoftware.net/products.html
  • Strange Error - System.AccessViolationException

    Posted by beauner13 on 09/21/2006 04:04pm

    This code looks like it would be very useful to me, with just one problem:

    When I attempt to use that exact code or any derivation, I get this error:
    A first chance exception of type 'System.AccessViolationException' occurred in mscorlib.dll

    And when I look at the error message, it tells me that memory could be corrupt elsewhere. I've attempted this line of code by omitting the "http://" portion of the URL, by trying numerous web sites, and with various other arguments in the 2nd parameter, such as "", ControlChars.NullChar and "null". I've also reset my PC and created a brand new application with only that code and get the same results.

    I am using VS.NET 2005 w/ .NET framework 2.0

    I don't know if it will help, but the details from the exception object are as follows:

    System.AccessViolationException was caught
      Message="Attempted to read or write protected memory. This is often an indication that other memory is corrupt."
      Source="mscorlib"
      StackTrace:
        at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)
        at mshtml.HTMLDocumentClass.createDocumentFromUrl(String bstrUrl, String bstrOptions)
        at MSHTML_DOM_Practice_1.Form1.createDoc(String URL) in D:\Dev V.2\Misc practice projects\MSHTML DOM Practice 1\MSHTML DOM Practice 1\Form1.vb:line 19


    I'm reaching my threshold of frustration and could really use some help!

    Thanks, Beau

    • fix for accessviolation issue

      Posted by sampaths85 on 04/26/2012 03:21am

      This worked for me! http://social.msdn.microsoft.com/forums/en-US/vblanguage/thread/cfbe816a-dc15-4a73-a7fc-8dfbf01d98f0/

  • I would like to connect visual basic to my html page

    Posted by Legacy on 02/05/2004 12:00am

    Originally posted by: Tracy Knowles

    Hello

    I'm doing a project for class and I wanted to know if this code would work to link Visual Basic to an HTML page I created.

    All I want to do is have a link on an HTML page that opens a Visual Basic 6.0 program.

    I really do need your help if you can help me.
  • Great stuff, but how

    Posted by Legacy on 12/29/2003 12:00am

    Originally posted by: Homer

    Just what I was looking for. Now I just need to know how to select and activate a button on the page. I'm not sure of the proper lingo because I'm new to any type of web development. The page that I'm opening is on an intranet and displays current data. To see the previous week's data I have to click a back arrow labeled "prior week". How do I do that in code?

    I now know that the arrow I'm clicking on executes some JavaScript. Is there a way to execute that same JavaScript in code using VB/MSHTML? I have tried using the IHTMLElementCollection. With this I can capture the element, but once I have it I don't know how to execute the JavaScript. Is it possible to do that?
  • Scope issue

    Posted by Legacy on 12/19/2003 12:00am

    Originally posted by: Robert C

    A little issue I found while using this code:

    Dim objMSHTML As New MSHTML.HTMLDocument
    Dim objDocument As MSHTML.HTMLDocument

    Set objDocument = objMSHTML.createDocumentFromUrl(txtURL.Text, vbNullString)

    If you want to pass objDocument between functions then objMSHTML must be global in the module; so if you use this in a form initialise objMSHTML in Form_Load and dispose of it in Form_Terminate.

    If you used that code in a function then returned the HTMLDocument you opened, the data would be lost - even though the reference passes OK.

    • Memory leaks: can you show an example of Form_Terminate?

      Posted by blackbookcoder on 10/23/2004 12:40am

      I get memory leaks when I run this. Can you tell us how to use the Form_Terminate sub for this code? Thanks, blackbookcoder