saul11
October 3rd, 2004, 11:41 AM
I have been working on a regex that should pick out the text-related attributes and CSS properties in a font or span tag, e.g.:
<span style="color: #2D1600; vertical-align: top; font-size: 12px; padding-left: 15px" bar="foo" color=red size="2" wordInventedAttribute="expendable">1</span>
Only the color and font-size style properties and the color and size attributes should match.
Below you can see what I have so far, but there are two things I want to improve:
- First, I need the regex to match even when only one of the two style properties is present (only font-size or only color).
- Second, it would be nice if a leading space were captured with the style properties too. I have already tried this but didn't succeed. With the tag attributes it does work.
# one tag-name matcher, then the style-or-attribute matcher repeated three times, each with 6 match possibilities (repeated three times to allow for the attributes appearing in any order)
# captures (only one) leading space of each attribute
# quotes aren't needed to match, but are captured when present
<
(span|font) # tagname
.*?
(?:
(?:
(\s?style="?).*? # style opener
((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]* # font-size or color style property
((?:\s?font-size:.+?\s*(?:;|,|(?="))+)|(?:\s?color:.+?\s*(?:;|,|(?="))+))[^"]* # font-size or color style property
# (the line above should be made optional.)
("?) # style end quote if available
|
(\s?size="?.*?(?:(?=\s)|"|(?=>))) # size attribute
|
(\s?color="?.*?(?:(?=\s)|"|(?=>))) # color attribute
|
(?=>)
)
.*?
){3}
>
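For comparison, here is a rough two-pass sketch in Python of what the pattern above is meant to extract. Python is an assumption (the pattern above does not name a regex flavor), and the names `TAG_RE`, `ATTR_RE`, `STYLE_PROP_RE` and `kept_parts` are made up for illustration. Instead of one big pattern, it first grabs the tag, then walks the attributes one at a time, so a style with only font-size or only color needs no special case, and one leading space is kept with each piece. It assumes no attribute value contains a `>`.

```python
import re

# Pass 1: the opening tag and its raw attribute text (assumes no '>' inside values).
TAG_RE = re.compile(r'<(span|font)\b([^>]*)>', re.IGNORECASE)

# Pass 2: one attribute at a time: at most one leading space, a name,
# and an optional value (double-quoted, single-quoted, or bare).
ATTR_RE = re.compile(r'''\s?([\w-]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]+))?''')

# Inside a style value: a font-size or color property with at most one
# leading space and its trailing ';' if present. The lookbehind stops
# "background-color" from matching as "color".
STYLE_PROP_RE = re.compile(r'\s?(?<![\w-])(?:font-size|color)\s*:[^;"]*;?',
                           re.IGNORECASE)


def kept_parts(html):
    """Return the tag name plus the style properties and attributes to keep."""
    tag = TAG_RE.search(html)
    if tag is None:
        return None
    result = {'tag': tag.group(1).lower(), 'style': [], 'attrs': []}
    for attr in ATTR_RE.finditer(tag.group(2)):
        name = attr.group(1).lower()
        value = attr.group(2) or ''
        if name == 'style':
            # Strip the surrounding quotes, then collect the wanted properties.
            result['style'] = STYLE_PROP_RE.findall(value.strip('"\''))
        elif name in ('color', 'size'):
            # group(0) includes the single leading space and any quotes.
            result['attrs'].append(attr.group(0))
    return result


sample = ('<span style="color: #2D1600; vertical-align: top; font-size: 12px; '
          'padding-left: 15px" bar="foo" color=red size="2" '
          'wordInventedAttribute="expendable">1</span>')
print(kept_parts(sample))
```

Because each attribute is matched on its own, the order of style, color and size does not matter, and the `{3}` repetition is not needed in this variant.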