Get location of element by IHTMLElement in the HTML?


dc_2000
July 2nd, 2007, 04:54 AM
Hi everyone:


I'm trying to isolate all <SCRIPT></SCRIPT> tags inside an HTML document. I can easily find the beginning of the SCRIPT tag (by simply using a text search function on HTML text) but to find its end one needs to parse the contents of the script itself. In light of this, I thought that maybe there's an easier way to find the character offset of all script objects using the MSHTML interfaces?


Here's again what I need in a small example. Say HTML is:
<html><body><script>var v=2;</script></body></html>

I want to get the offset of the script block. In this case it begins at offset 12 and has a length of 25 characters.
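To make the numbers above concrete, here is a minimal sketch in portable C++ (plain std::string, no MSHTML) that computes the offset and length for this exact example. The names Span and find_plain_script are made up for illustration, and the sketch assumes lowercase tags without attributes, which holds for the sample but not for HTML in general.

```cpp
#include <cstddef>
#include <string>

// Character range of one block in the source text (hypothetical helper type).
struct Span {
    bool found;          // false if no complete block was located
    std::size_t offset;  // index of the '<' of the opening tag
    std::size_t length;  // characters up to and including the '>' of the closing tag
};

// Finds the first literal "<script>" ... "</script>" pair at or after `from`.
// Assumption: lowercase tags with no attributes, as in the example above.
Span find_plain_script(const std::string& html, std::size_t from = 0) {
    static const std::string open = "<script>";
    static const std::string close = "</script>";

    const std::size_t begin = html.find(open, from);
    if (begin == std::string::npos) return Span{false, 0, 0};

    const std::size_t end = html.find(close, begin + open.size());
    if (end == std::string::npos) return Span{false, 0, 0};

    return Span{true, begin, end + close.size() - begin};
}
```

For the sample document this gives offset 12 and length 25: the two preceding tags are 6 characters each, and the block is 8 + 8 + 9 characters long.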


Now the coding part. I have an IHTMLDocument2 interface pointer, which I use to get an IHTMLElementCollection of all script objects; I then loop through the collection and get an IHTMLElement interface pointer for each of them. At that point my knowledge of DHTML ends. How can I get the offset of an element in the HTML code? I would appreciate it if someone could help me out here :)


Thanks in advance.

Sahir
July 2nd, 2007, 07:01 AM
I'm trying to isolate all <SCRIPT></SCRIPT> tags inside an HTML document........ I have an IHTMLDocument2 interface pointer, which I use to get an IHTMLElementCollection of all script objects; I then loop through the collection and get an IHTMLElement interface pointer for each of them. At that point my knowledge of DHTML ends. How can I get the offset of an element in the HTML code? I would appreciate it if someone could help me out here :)

If you are treating the contents of the HTML file as a string and trying to find the start position of the substring <script>, you can use the string's find function. But remember that the script tag need not always appear as <script>; it can vary in case and according to the attributes used, e.g. <script type="text/javascript">
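A rough sketch of such a search, in portable C++ with std::string: it matches the opening tag case-insensitively and accepts attributes. The function name find_script_open is hypothetical. It is also a plain text scan, so it will report "<script" text that happens to sit inside an HTML comment or an attribute value.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the offset of the next opening <script ...> tag at or after `from`,
// or std::string::npos if there is none. The match is case-insensitive and
// accepts attributes, e.g. <SCRIPT type="text/javascript">.
std::size_t find_script_open(const std::string& html, std::size_t from = 0) {
    static const std::string name = "<script";

    // Stop early enough that the character after "<script" can be inspected.
    for (std::size_t i = from; i + name.size() < html.size(); ++i) {
        bool match = true;
        for (std::size_t k = 0; k < name.size(); ++k) {
            const int c = std::tolower(static_cast<unsigned char>(html[i + k]));
            if (c != name[k]) { match = false; break; }
        }
        if (!match) continue;

        // The next character must end the tag name; otherwise this is
        // some other tag that merely starts with the same letters.
        const char next = html[i + name.size()];
        if (next == '>' || next == '/' ||
            std::isspace(static_cast<unsigned char>(next))) {
            return i;
        }
    }
    return std::string::npos;
}
```

To walk over every opening tag, call the function again with the previous result plus one as `from` until it returns std::string::npos.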

If you want the contents of the script element, iterate through the element collection and get each item as an IHTMLElement. Note: that is IHTMLElement, not IHTMLObjectElement; IHTMLObjectElement represents the <object> element. Check the IHTMLElement's tagName property; if it is "SCRIPT", query the element for IHTMLScriptElement (with QueryInterface rather than a plain cast). The IHTMLScriptElement's text property then gives you the contents of the script.

dc_2000
July 2nd, 2007, 03:26 PM
Thanks for your input. What I am trying to accomplish is to remove all the <script> blocks from the HTML code. If I simply wanted to retrieve the code inside them, your method would definitely work. In my case I need to pinpoint the exact location of each script block in the HTML text. Of course, I could search the whole HTML document for the contents of each SCRIPT tag, but that would not work when two or more script blocks have the same contents, would it?

Any other ideas how I can implement that?
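One possible approach, sketched below in portable C++ without MSHTML, is to scan the text from left to right and always resume the search after the block just found. Blocks are then identified by position, not by contents, so identical script blocks are not a problem. The sketch assumes that a script block ends at the first "</script" after its opening tag; as far as I know this is how HTML parsers treat it too (they do not parse the script code, which is why scripts write "<\/script>" inside string literals), but corner cases such as scripts wrapped in HTML comments are not handled. The names find_ci, ends_tag_name and strip_script_blocks are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive search for an ASCII `needle` (given in lowercase) in `text`.
static std::size_t find_ci(const std::string& text, const std::string& needle,
                           std::size_t from) {
    if (needle.empty() || text.size() < needle.size()) return std::string::npos;
    for (std::size_t i = from; i + needle.size() <= text.size(); ++i) {
        std::size_t k = 0;
        while (k < needle.size() &&
               std::tolower(static_cast<unsigned char>(text[i + k])) == needle[k]) {
            ++k;
        }
        if (k == needle.size()) return i;
    }
    return std::string::npos;
}

// True if `c` can directly follow a tag name, i.e. it ends the name "script".
static bool ends_tag_name(char c) {
    return c == '>' || c == '/' || std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Returns a copy of `html` with every <script ...>...</script> block removed.
std::string strip_script_blocks(const std::string& html) {
    const std::size_t name_len = 7;  // length of "<script"
    std::string out;
    std::size_t pos = 0;  // everything before `pos` has already been handled

    while (pos < html.size()) {
        // Locate the next opening tag, skipping look-alikes such as <scripted>.
        std::size_t open = find_ci(html, "<script", pos);
        while (open != std::string::npos &&
               (open + name_len >= html.size() ||
                !ends_tag_name(html[open + name_len]))) {
            open = find_ci(html, "<script", open + 1);
        }
        if (open == std::string::npos) break;

        // The block ends at the first "</script" after the opening tag,
        // followed by the next '>'.
        const std::size_t close = find_ci(html, "</script", open + name_len);
        if (close == std::string::npos) break;  // unterminated: keep the rest as-is
        const std::size_t gt = html.find('>', close);
        if (gt == std::string::npos) break;

        out.append(html, pos, open - pos);  // keep the text between blocks
        pos = gt + 1;                        // resume right after this block
    }
    out.append(html, pos, std::string::npos);  // copy whatever is left
    return out;
}
```

If the offsets themselves are needed, `open` and `gt + 1 - open` inside the loop are the start and length of each block. Note that these are offsets into the original text you pass in; markup read back from the parsed document may have been normalized and need not line up with the source file.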