Parsing HTML without Using the Browser Control
Environment: VB6 SP5, XPPro, IE6
The main goal of this article is to provide a way to use the HTML parser inside Microsoft Internet Explorer within your program.
This is usually easy if you use the browser control, and there are plenty of examples of that on the Internet. When it comes to using the parser without a UI, however, I found nothing written in Visual Basic. All the examples I've seen are in Visual C++ and use interfaces that are not available in Visual Basic.
I spent days looking for a solution, including trying the .NET platform, so that I could use an HTML parser in a Windows NT service, and I finally found one. I don't claim this is the nicest way to do it, but it works like a charm, and it gives you access to the DOM of the HTML document, which is very useful if you need to parse an HTML document.
Your project must have a reference to the Microsoft HTML Object Library, and Internet Explorer 5 or later is required. Simply copy this code into any function.
Dim objLink As Object
Dim objMSHTML As New MSHTML.HTMLDocument
Dim objDocument As MSHTML.HTMLDocument

' createDocumentFromUrl requires Internet Explorer 5 or later
Set objDocument = objMSHTML.createDocumentFromUrl(txtURL.Text, _
                                                  vbNullString)

' Tricky: the transfer is usually asynchronous, so make the
' function wait for the document to complete. Note that this
' string might be different if you have another language than
' English for Internet Explorer on the machine where the code
' is executed.
While objDocument.readyState <> "complete"
    DoEvents
Wend

' Source code
Debug.Print objDocument.documentElement.outerHTML

' Title
Debug.Print "Title : " & objDocument.Title

' Link collection. It holds <a> and <area> elements, so the loop
' variable is a late-bound Object and href is read explicitly.
For Each objLink In objDocument.links
    lstLinks.AddItem objLink.href
    Debug.Print "Link: " & objLink.href
Next
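The wait loop above spins until readyState reports "complete", so it never returns if the download stalls. Here is a minimal sketch of a bounded wait. WaitForDocument and its timeout parameter are hypothetical names introduced for illustration and are not part of the original code.

```vb
' Hypothetical helper (not from the original article): waits until the
' document reports "complete" or the timeout elapses.
' Returns True if the document finished loading, False on timeout.
Private Function WaitForDocument(ByVal objDoc As MSHTML.HTMLDocument, _
                                 ByVal sngTimeoutSeconds As Single) As Boolean
    Dim sngStart As Single
    Dim sngElapsed As Single

    sngStart = Timer    ' seconds elapsed since midnight

    Do While objDoc.readyState <> "complete"
        DoEvents        ' let the asynchronous download make progress

        sngElapsed = Timer - sngStart
        ' Timer resets to 0 at midnight; compensate for the wrap-around
        If sngElapsed < 0 Then sngElapsed = sngElapsed + 86400

        If sngElapsed > sngTimeoutSeconds Then
            WaitForDocument = False
            Exit Function
        End If
    Loop

    WaitForDocument = True
End Function
```

It could replace the While/Wend loop with something like: If Not WaitForDocument(objDocument, 30) Then Debug.Print "Timed out". The 30-second value is only an example.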

Comments
Great example...
Posted by ShaneB on 08/13/2009 07:21am
Great example, but how would you go about doing this when you only need to parse the text inside a <textarea> tag?
Very good .. but I need more ..
Posted by Ranjan.net on 07/24/2008 01:31pm
This 20-line code is very good. One quick question: if I already have the file on disk (cached), can I parse it? And how will partial links (e.g. \images\share\full_8789.jpg) be resolved?
Acrux Advanced Html Parser
Posted by Acrux2 on 03/28/2008 06:34am
A good parser that handles real-world messy HTML and even provides an XmlDocument-like structure of the parsed HTML is the Acrux Advanced Html Parser: http://www.acruxsoftware.net/products.html
Strange Error - System.AccessViolationException
Posted by beauner13 on 09/21/2006 04:04pm
This code looks like it would be very useful to me, with just one problem:
When I attempt to use that exact code or any derivation, I get this error:
A first chance exception of type 'System.AccessViolationException' occurred in mscorlib.dll
And when I look at the error message, it tells me that memory could be corrupt elsewhere. I've attempted this line of code by omitting the "http://" portion of the URL, by trying numerous web sites, and with various other arguments in the 2nd parameter, such as "", ControlChars.NullChar and "null". I've also reset my PC and created a brand new application with only that code and get the same results.
I am using VS.NET 2005 w/ .NET framework 2.0
I don't know if it will help, but the details from the exception object are as follows:
System.AccessViolationException was caught
  Message="Attempted to read or write protected memory. This is often an indication that other memory is corrupt."
  Source="mscorlib"
  StackTrace:
    at System.RuntimeType.ForwardCallToInvokeMember(String memberName, BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData& msgData)
    at mshtml.HTMLDocumentClass.createDocumentFromUrl(String bstrUrl, String bstrOptions)
    at MSHTML_DOM_Practice_1.Form1.createDoc(String URL) in D:\Dev V.2\Misc practice projects\MSHTML DOM Practice 1\MSHTML DOM Practice 1\Form1.vb:line 19
I'm reaching my threshold of frustration and could really use some help!
Thanks, Beau
-
Fix for AccessViolation issue
Posted by sampaths85 on 04/26/2012 03:21am
This worked for me! http://social.msdn.microsoft.com/forums/en-US/vblanguage/thread/cfbe816a-dc15-4a73-a7fc-8dfbf01d98f0/
I would like to connect Visual Basic to my HTML page
Posted by Legacy on 02/05/2004 12:00am
Originally posted by: Tracy Knowles
Hello
I'm doing a project for class and I wanted to know if this code would work to link Visual Basic to an HTML page I created.
All I want to do is have a link on an HTML page that opens a Visual Basic 6.0 program.
I really do need your help if you can help me.
Great stuff, but how
Posted by Legacy on 12/29/2003 12:00am
Originally posted by: Homer
Just what I was looking for. Now I just need to know how to select and activate a button on the page. I'm not sure of the proper lingo because I'm new to any type of web development. The page that I'm opening is on an intranet and displays current data. To see the previous week's data I have to click a back arrow labeled "prior week". How do I do that in code?
I now know that the arrow I'm clicking on executes a JavaScript function. Is there a way to execute that same JavaScript in code using VB/MSHTML? I have tried using the IHTMLElementCollection. With it I can capture the element, but once I have it I don't know how to execute the JavaScript. Is it possible to do that?
Scope issue
Posted by Legacy on 12/19/2003 12:00am
Originally posted by: Robert C
A little issue I found while using this code:
Dim objMSHTML As New MSHTML.HTMLDocument
Dim objDocument As MSHTML.HTMLDocument
Set objDocument = objMSHTML.createDocumentFromUrl(txtURL.Text, vbNullString)
If you want to pass objDocument between functions then objMSHTML must be global in the module; so if you use this in a form initialise objMSHTML in Form_Load and dispose of it in Form_Terminate.
If you used that code in a function then returned the HTMLDocument you opened, the data would be lost - even though the reference passes OK.
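The arrangement described above could look roughly like the following sketch. It assumes the code lives in a form module; mobjMSHTML and LoadDocument are hypothetical names chosen for illustration, and Form_Terminate is used only because it is the event suggested above.

```vb
' Form-level variable: the parser object stays alive for as long as the
' form does, so documents created from it remain usable after the
' function that created them has returned.
Private mobjMSHTML As MSHTML.HTMLDocument

Private Sub Form_Load()
    Set mobjMSHTML = New MSHTML.HTMLDocument
End Sub

' Hypothetical helper: creates a document from a URL using the
' form-level parser object and returns it to the caller.
' The caller still has to wait for readyState = "complete".
Private Function LoadDocument(ByVal strUrl As String) As MSHTML.HTMLDocument
    Set LoadDocument = mobjMSHTML.createDocumentFromUrl(strUrl, vbNullString)
End Function

Private Sub Form_Terminate()
    ' Release the parser object when the form goes away
    Set mobjMSHTML = Nothing
End Sub
```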
-
Memory leaks: can you show an example of Form_Terminate?
Posted by blackbookcoder on 10/23/2004 12:40am
I get memory leaks when I run this. Can you tell us how to use the Form_Terminate sub for this code? Thanks, blackbookcoder