text property regex


saul11
October 3rd, 2004, 11:41 AM
I have been working on a regex that should filter the text property attributes or CSS in a font or span tag, e.g. :

<span style="color: #2D1600; vertical-align: top; font-size: 12px; padding-left: 15px" bar="foo" color=red size="2" wordInventedAttribute="expendable">1</span>

Only the color and font-size style properties and the color and size attributes should match.


Below you can see what I have so far, but there are two things I want to improve:
- First, I need the regex to match even when only one of the two style properties is present (only font-size or only color).
- Second, it would be nice if a leading space were captured along with each style property too. I have tried this already but didn't succeed. With the tag attributes it does work.


# one tag name matcher, then the style/attribute matcher repeated three times, each with 6 match possibilities (three times to allow for any ordering of the attributes)
# captures one (and only one) leading space of each attribute
# quotes aren't needed to match, but are captured when available
<
(span|font) # tagname

.*?
(?:
(?:
(\s?style="?).*? # style opener
((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]* # font-size or color style property
((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]* # font-size or color style property
# (the line above should be made optional.)
("?) # style end quote if available
|
(\s?size="?.*?(?:(?=\s)|"|(?=>))) # size attribute
|
(\s?color="?.*?(?:(?=\s)|"|(?=>))) # color attribute
|
(?=>)
)
.*?
){3}

>

saul11
October 9th, 2004, 09:03 AM
I solved my main (first) problem with just another 'alternative branch', but if someone should come up with a more elegant solution, please reply.



<
(span|font) # tagname

.*?
(?:
(?:
(\s?style="?).*? # style opener
((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]* # font-size or color style property
((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]* # font-size or color style property
("?) # style end quote if available
|
(\s?style="?).*? # style opener
((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]* # font-size or color style property
("?) # style end quote if available
|
(\s?size="?.*?(?:(?=\s)|"|(?=>))) # size attribute
|
(\s?color="?.*?(?:(?=\s)|"|(?=>))) # color attribute
|
(?=>)
)
.*?
){3}

>
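One possible way to avoid the duplicated branch is to make the second property an optional group, `(?: ... )?`. The snippet below is only a sketch on a deliberately simplified pattern, not a drop-in replacement for the regex above: the names `prop` and `styleRe` are made up for the example, and it assumes the wanted properties never contain a double quote. It also captures one leading space with each style property.

```javascript
// Hypothetical, simplified pattern: one wanted declaration, optionally followed by a second.
// `prop` matches a single "color: ..." or "font-size: ..." declaration up to ; or , if present.
var prop = '(?:font-size|color)\\s*:[^;,"]*[;,]?';
var styleRe = new RegExp(
  '(\\s?style="?)[^"]*?' +            // style opener, then skip unwanted declarations
  '(\\s?' + prop + ')' +              // first wanted declaration (required)
  '(?:[^"]*?(\\s?' + prop + '))?' +   // second wanted declaration (optional)
  '[^"]*("?)',                        // rest of the value and the end quote if available
  'i'
);

// Only font-size present: group 3 stays undefined.
var one = styleRe.exec('<span style="vertical-align: top; font-size: 12px; padding-left: 15px">');
// Both present: groups 2 and 3 are filled.
var two = styleRe.exec('<span style="color: #2D1600; vertical-align: top; font-size: 12px">');

console.log(one[2], one[3]);
console.log(two[2], two[3]);
```

In a replacement string an unset group such as `$3` expands to nothing, so the same replacement pattern can serve both cases.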

saul11
October 10th, 2004, 04:24 AM
Hey all, maybe I've caused some confusion, so here's some clarification. The regex now does what I want, but if someone comes up with a more elegant solution, please let me know...

This post concentrates on the regex itself, so its layout (the line breaks and the comments) is just for ease of reading.
I build up my regex in the Regulator found here (http://weblogs.asp.net/rosherove/archive/2003/10/23/33126.aspx) and here (http://royo.is-a-geek.com/iserializable/regulator/) if the first site should be down.
With this program you can make use of line breaks and comments.
The Regulator is a nice tool, but beware of the editor's behavior: save frequently, or you'll find yourself losing code when the editor doesn't act like it should (see more info and other bugs at SourceForge (http://sourceforge.net/tracker/index.php?func=detail&aid=983940&group_id=105210&atid=640570)).

So much for the format of the regex.

I'll paste a small example where I use (test) the regex.
The regex now has no line breaks or comments and is written as a JavaScript literal: /regex/flags (see more info at evolt.org (http://www.evolt.org/article/Regular_Expressions_in_JavaScript/17/36435/)).
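If you want to keep the comments in the JavaScript source as well, one option is to build the pattern from string pieces and pass it to the `RegExp` constructor. This is only a sketch: the `parts` array and `sizeRe` are names made up for the example, and the pattern is a much smaller stand-in that keeps just the size attribute. Note that backslashes have to be doubled inside string literals.

```javascript
// Each entry is one piece of the pattern; the comments stay in the source
// but never reach the regex engine.
var parts = [
  '<(span|font)',               // tag name
  '[^>]*?',                     // anything before the attribute we want
  '(\\s?size="?[^\\s">]*"?)',   // size attribute, with one leading space if present
  '[^>]*>'                      // rest of the tag
];

// Same as writing /<(span|font)[^>]*?(\s?size="?[^\s">]*"?)[^>]*>/ig
var sizeRe = new RegExp(parts.join(''), 'ig');

var result = '<font size="2" face="Arial">x</font>'.replace(sizeRe, '<$1$2>');
console.log(result); // <font size="2">x</font>
```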

The regex scans the HTML for span or font tags and removes all attributes and style properties that don't define the font size or color.

e.g.
<span size="2" color=red style="vertical-align: top; padding-left: 15px; color: 12px;" bar="foo" wordInventedAttribute="expendable">1</span>
becomes
<span style=" color: 12px;" size="2" color=red>1</span>

Run the code below to see it in action.



<script>

// Test input: one tag per line, mixing wanted and unwanted attributes.
var string = '<span size="2" color=red style="vertical-align: top; padding-left: 15px; color: 12px;" bar="foo" wordInventedAttribute="expendable">1</span>\n<span color=blue style="FONT-SIZE: 26pt;" hello size="4" itworks>2</span>\n<span size=14 style="border: 1 px; FONT-SIZE: 26pt; width: 100 % ; color:25; height : 2 px;" color=red>3</span>\n<span style="border: 1 px; width: 100 % ; height : 2 px;">no match</span>\n<font color="#000000"> innerHTML </font>\n<font size=25 color="#000000"> abc </font>\n<font color="#000000">hello</font>\n<span>matching</span>\n<strong color=blue>bar</strong>\nsome text';
// Groups: $1 tag name; $2-$5 style with two wanted properties; $6-$8 style with one; $9 size; $10 color.
var $regex = /<(span|font).*?(?:(?:(\s?style="?).*?((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]*((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]*("?)|(\s?style="?).*?((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]*("?)|(\s?size="?.*?(?:(?=\s)|"|(?=>)))|(\s?color="?.*?(?:(?=\s)|"|(?=>)))|(?=>)).*?){3}>/ig;
// Rebuild each tag from the captured pieces only.
var matches = string.replace($regex, '<$1$2$3$4$5$6$7$8$9$10>');
alert(string + "\n\n" + matches);

</script>
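One thing worth checking in your own browser before relying on the script above: the ECMAScript specification says that capture groups inside a repeated group are reset at the start of each repetition. In an engine that follows the specification, only the groups that took part in the last of the three repetitions are still set after the match, so results may differ between engines. The check below uses a deliberately tiny stand-in pattern, not the real regex.

```javascript
// Tiny stand-in for "(style|size|color){3}": two alternatives, repeated twice.
var m = /(?:(a)|(b)){2}/.exec('ab');

// m[1] was set during the first repetition, m[2] during the second.
// A spec-conforming engine clears m[1] again when the second repetition starts.
console.log(m[0]);
console.log(m[1] === undefined ? 'group 1 was reset' : 'group 1 kept: ' + m[1]);
console.log(m[2]);
```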




(The regex is used to clean up code pasted from Word into my online WYSIWYG editor.)
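As for a more elegant solution, one possible alternative is to split the work into two passes: first match the whole span or font tag, then walk over its attributes in a replace callback and keep only the wanted ones. This is a different technique from the single regex above, sketched under assumptions: `cleanTag` is a hypothetical helper name, attributes are assumed to be written as name=value (quoted or unquoted), and kept attributes come out in their original order rather than with style first.

```javascript
// Hypothetical helper (not part of the original script): strips every attribute
// from span/font tags except size, color, and the color/font-size style properties.
function cleanTag(html) {
  return html.replace(/<(span|font)\b([^>]*)>/ig, function (whole, tag, attrs) {
    var kept = '';
    // Visit every name=value pair; the value may be quoted or unquoted.
    attrs.replace(/\s([\w-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/g, function (m, name, value) {
      name = name.toLowerCase();
      if (name === 'size' || name === 'color') {
        kept += ' ' + name + '=' + value;
      } else if (name === 'style') {
        // Drop the surrounding quotes, then test each declaration separately.
        var decls = value.replace(/^["']|["']$/g, '').split(';');
        var wanted = [];
        for (var i = 0; i < decls.length; i++) {
          if (/^\s*(?:font-size|color)\s*:/i.test(decls[i])) {
            wanted.push(decls[i].replace(/^\s+|\s+$/g, ''));
          }
        }
        if (wanted.length) kept += ' style="' + wanted.join('; ') + '"';
      }
      return m; // the return value is unused; replace() is only used to iterate
    });
    return '<' + tag + kept + '>';
  });
}

var cleaned = cleanTag('<span size="2" color=red style="vertical-align: top; padding-left: 15px; color: 12px;" bar="foo" wordInventedAttribute="expendable">1</span>');
console.log(cleaned); // <span size="2" color=red style="color: 12px">1</span>
```

Because each style declaration is tested from its start, a property such as background-color is not mistaken for color, and a style with only one of the two wanted properties needs no extra branch.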