XML to specific text format


E-Rock
September 16th, 2003, 11:12 AM
Hi, I'm relatively new to XML/XSLT. I managed to figure out how to create some XSL files to use with some XML files for plain text and HTML formats. I am trying to figure out another text format.

The format is like plain text except I want only 64 characters per line maximum. The other item for this formatting is that words do not get cut off. So for instance, if I have used 58 out of the 64 characters and I come across a word that is 7 letters long, I do not want to split this word up between line 'a' and line 'b'. I will just move the word to the next line.

I looked at the substring functions, but I don't think they will work for me. They want a specified start and stop position which I do not know ahead of time.

I thought maybe there was a way to keep a running count of how many characters I've used up and use some kind of method to tokenize the substrings out. I'm not having any luck right now locating information on this.

If any of you could provide information or ideas on this, I'd appreciate it.

E

khp
September 16th, 2003, 02:14 PM
Well, producing text files that are always as close to 64 characters per line as possible, without word breaks, is very difficult in XSLT, because XSLT has no true variables.
(There are things called variables, but they behave like constants: once bound, a value cannot be changed.)

As long as we are only operating on single text nodes, it is possible to do something like what you want with recursive use of a named template. Something like this... (This is just written off the top of my head without any testing, so beware of bugs.)


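<xsl:template name="textCutter">
  <xsl:param name="outString"/>
  <xsl:param name="remaining" select="64"/>
  <!-- Next word: text before the first space, or the whole string when no space is left -->
  <xsl:variable name="firstWord">
    <xsl:choose>
      <xsl:when test="contains($outString,' ')">
        <xsl:value-of select="substring-before($outString,' ')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$outString"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:choose>
    <!-- Nothing left to output: stop recursing -->
    <xsl:when test="string-length($outString) = 0"/>
    <!-- Word plus its leading space does not fit: start a new line and retry -->
    <xsl:when test="$remaining &lt; 64 and string-length($firstWord) + 1 &gt; $remaining">
      <xsl:text>&#10;</xsl:text>
      <xsl:call-template name="textCutter">
        <xsl:with-param name="outString" select="$outString"/>
        <xsl:with-param name="remaining" select="64"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- Separate words with a space, except at the start of a line -->
      <xsl:if test="$remaining &lt; 64">
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="$firstWord"/>
      <xsl:call-template name="textCutter">
        <xsl:with-param name="outString" select="substring-after($outString,' ')"/>
        <!-- number() of a boolean is 1 or 0, so the space is only counted when it was written -->
        <xsl:with-param name="remaining"
          select="$remaining - string-length($firstWord) - number($remaining &lt; 64)"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>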


Solving the problem in general, when text might come from any number of nodes, is next to impossible in XSLT.
Also note that the above code is extremely inefficient.
I also don't check for newline characters that are already in the text. These might cause some odd breaks, so they still need to be removed.
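One way to remove the existing newlines before cutting is the standard XPath 1.0 function normalize-space(), which trims leading and trailing whitespace and collapses every run of spaces, tabs and line breaks into a single space. A minimal sketch, assuming the text lives in an element called "para" (a made-up name, adjust it to the real input):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "para" is a hypothetical element name used only for illustration -->
  <xsl:template match="para">
    <!-- Line breaks and repeated spaces inside the element become single spaces -->
    <xsl:variable name="clean" select="normalize-space(.)"/>
    <!-- $clean is the value to hand to the line-cutting template as outString;
         here it is simply written out so the effect is visible -->
    <xsl:value-of select="$clean"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the text cleaned this way, the only line breaks in the output are the ones the cutting template writes itself.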

E-Rock
September 16th, 2003, 03:07 PM
Hey, thanks for the info. I found the following link:

http://www.dpawson.co.uk/xsl/sect2/N7240.html

If you scroll to "5. splitting lines at n characters," you'll see a snippet of XSL code there. Check it out. I modified it to fit the specifics of the format I'm going for and it seems to work ok (right now).

Thanks for your help though, I appreciate it.

E


Edit: It's pretty close to "correct." The only problem with the author's XSL code is that the last line of output should actually be on the line above it. I'm not sure what's causing it right now. I'm trying to follow the recursion that's going on with it at the moment. - E

khp
September 16th, 2003, 07:24 PM
The code you linked to is a somewhat more efficient method that uses the same recursion I suggested above.

The idea behind my code above is to output one word at a time, checking whether the output line is going to overflow.
The code you linked takes things a step further and uses a more direct method of finding the longest substring that doesn't overflow the output. But it still uses a recursive call for each output line, which technically speaking has a runtime of O(N²), because of the string copying done at each recursive call.

For small files this is ok, but on large files, both methods are extremely inefficient.
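To make the line-at-a-time idea concrete, here is a rough sketch of it. This is not the code from the linked page, just one way to write the same technique in XSLT 1.0. The names wrapLines, lastSpace, txt and width, and the "para" element, are all made up for illustration. It looks at the first width + 1 characters so that a line of exactly width characters followed by a space is kept whole, and it lets a word longer than the width overflow rather than splitting it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "para" is a hypothetical element name; adjust to the real input -->
  <xsl:template match="para">
    <xsl:call-template name="wrapLines">
      <xsl:with-param name="txt" select="normalize-space(.)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Writes one line per call, then recurses on the remaining text -->
  <xsl:template name="wrapLines">
    <xsl:param name="txt"/>
    <xsl:param name="width" select="64"/>
    <xsl:choose>
      <!-- Nothing left -->
      <xsl:when test="string-length($txt) = 0"/>
      <!-- The rest fits on one line: write it and stop -->
      <xsl:when test="string-length($txt) &lt;= $width">
        <xsl:value-of select="$txt"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <!-- Last space among the first width + 1 characters (0 if none) -->
        <xsl:variable name="inWindow">
          <xsl:call-template name="lastSpace">
            <xsl:with-param name="s" select="substring($txt, 1, $width + 1)"/>
          </xsl:call-template>
        </xsl:variable>
        <!-- If the window has no space, the first word is too long:
             cut at the first space in the whole text, or at its end -->
        <xsl:variable name="cut">
          <xsl:choose>
            <xsl:when test="$inWindow &gt; 0">
              <xsl:value-of select="$inWindow"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="string-length(substring-before(concat($txt, ' '), ' ')) + 1"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>
        <xsl:value-of select="substring($txt, 1, $cut - 1)"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:call-template name="wrapLines">
          <xsl:with-param name="txt" select="substring($txt, $cut + 1)"/>
          <xsl:with-param name="width" select="$width"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Returns the 1-based position of the last space in $s, or 0 -->
  <xsl:template name="lastSpace">
    <xsl:param name="s"/>
    <xsl:param name="pos" select="string-length($s)"/>
    <xsl:choose>
      <xsl:when test="$pos = 0">0</xsl:when>
      <xsl:when test="substring($s, $pos, 1) = ' '">
        <xsl:value-of select="$pos"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="lastSpace">
          <xsl:with-param name="s" select="$s"/>
          <xsl:with-param name="pos" select="$pos - 1"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The backwards scan in lastSpace touches at most width + 1 characters per line, but each recursive call to wrapLines still copies the remaining text, so the cost concern above still applies to long inputs.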

E-Rock
September 17th, 2003, 10:47 AM
There are a couple of slight problems with the author's XSL code.

For example, if you specify a width of 30 and the value of the 30th character is a single letter such as 'a' or 'I', it should output this character on the same line, but instead looks for the first white space and writes the line out without the 'I'.

The second problem is when the remaining text to be examined has a width value that is less than the specified width.

In other words, if the remaining text is 50 characters long and the width per line is 64 characters, it generates incorrect output. To remedy this, I added the following:

<xsl:otherwise>
<xsl:choose>
<xsl:when test="string-length($txt) &lt; $width">
<xsl:value-of select="string-length($txt)"/>
</xsl:when>

This snippet of code was placed after the author's check for a width value of '0'.

This effectively assigns 'real-width' the value of the string length.

I still need to do more testing on it, but I haven't had any 'bad' output generated from my current XML file yet.

E