Regular Expression Woes


Mutilated1
February 16th, 2005, 12:14 PM
I have several large HTML documents that were created in FrontPage. I want to remove all the junk that FrontPage puts in the tags and be left with regular plain vanilla HTML.

In other words I want to take all the tags like this:

<TD
style="PADDING-LEFT: 0.75pt; PADDING-BOTTOM: 0in; WIDTH: 1in; PADDING-TOP: 0.75pt; HEIGHT: 12.75pt"
vAlign=bottom noWrap width=96>

And trim out everything except this:

<TD>

I know I can use preg_replace, but I can't quite seem to hit upon the right pattern to make the replacement work.

Can any of you guys give me a hint?

khp
February 17th, 2005, 08:44 PM
Here's the obligatory question: what have you got so far?

Without giving the problem much thought, I would say something like:
$string = preg_replace('/(<\w+)(.*?)(\/?>)/s', '$1$3', $string);
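
As a rough sketch, here is that pattern run against the tag from the first post. The function name strip_tag_attributes is made up for illustration, and the sample string is a shortened version of the FrontPage tag. It uses the `s` modifier so that `.` also matches the line breaks inside the tag.

```php
<?php
// Hypothetical helper name, for illustration only.
function strip_tag_attributes(string $html): string
{
    // (<\w+)  : "<" followed by the tag name
    // .*?     : the attributes, matched lazily; /s lets "." cross newlines
    // (\/?>)  : the closing ">" or "/>"
    // preg_replace returns null on error, so fall back to the input.
    return preg_replace('/(<\w+).*?(\/?>)/s', '$1$2', $html) ?? $html;
}

// Shortened version of the multi-line tag from the first post.
$messy = "<TD\nstyle=\"PADDING-LEFT: 0.75pt; WIDTH: 1in\"\nvAlign=bottom noWrap width=96>cell</TD>";

echo strip_tag_attributes($messy), "\n"; // prints <TD>cell</TD>
```

Bear in mind this strips every attribute, including ones you probably want to keep such as href and src, and it will cut a tag short if an attribute value contains a ">" character.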

visualAd
March 3rd, 2005, 02:14 AM
Have you looked at the HTML Tidy module? http://www.php.net/tidy

It is very good at cleaning up messy HTML created by the likes of Front Page.
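
A minimal sketch of what that could look like, assuming the tidy extension is installed. The wrapper name tidy_cleanup and the choice of options are my own assumptions, so check the Tidy option list before relying on them.

```php
<?php
// Hypothetical wrapper; returns null when the tidy extension is
// not loaded or when the repair fails.
function tidy_cleanup(string $html): ?string
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    $config = [
        'show-body-only' => true,              // emit only the body fragment
        'drop-proprietary-attributes' => true, // drop non-standard attributes
    ];
    $result = tidy_repair_string($html, $config, 'utf8');
    return $result === false ? null : $result;
}

$clean = tidy_cleanup('<table><tr><td vAlign=bottom noWrap width=96>cell</td></tr></table>');
echo $clean ?? "tidy extension not available\n";
```

Tidy repairs and normalizes the markup rather than blindly stripping every attribute, so the result may still carry some attributes that a regular expression pass would have removed.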