RTF-to-HTML Converter
Note: Doxygen-compatible documentation, warning level 2.
Portions (the tree template) are from Alexander Kovachev's tree class.
To edit formatted text, Windows provides a RichEdit control (RICHED32.DLL or RICHED20.DLL, depending on the version), which MFC encapsulates in the CRichEditCtrl class. Its content is stored in Rich Text Format (RTF); WordPad and MS Word are examples of applications that use RTF files. But today's world speaks HTML, so I looked for a way to convert RTF data to HTML data. I couldn't find a free example, so I decided to write my own exporter class. Here it is.
The source zip file contains the needed classes and files:
- RTF2HTMLConverter.h/cpp, which is the main converter class CRTF_HTMLConverter
- RTF2HTMLTree.h/cpp, which contains Alexander Kovachev's template tree class
- Util.h/cpp, some very simple helper routines
- RTF2HTML.h/cpp, a console-based converter demo app
The converter class itself does no reading or writing from/to files or RichEdit controls; this has to be done outside. (To learn how to stream the complete RTF content from or into a RichEdit control, see the related article in this same section.) The class is derived only from CObject and works with CString >>/<< streaming operators. The conversion happens when the data is streamed in.
Note: Only the RTF->HTML direction is supported at the moment, and only a very small subset of RTF is supported so far:
- Bold, Italic, Underline
- Font Size, Color, and Face
- Paragraph alignment
- Special characters, such as encoded German Umlauts
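To illustrate the last item: RTF writes 8-bit characters such as umlauts as \'hh hex escapes (for example \'fc for ü). The following is a minimal sketch of how such escapes could be turned into HTML numeric character references. It is not taken from the converter class; DecodeHexEscapes is a hypothetical helper, and it assumes code page 1252 (as in \ansicpg1252), where the bytes 0xA0 to 0xFF have the same values as their Unicode code points.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the article's classes.
// Replaces RTF \'hh escapes with HTML numeric character references.
// Assumption: code page 1252, so bytes 0xA0..0xFF equal their Unicode code points.
std::string DecodeHexEscapes(const std::string& rtf)
{
    std::string html;
    for (std::size_t i = 0; i < rtf.size(); ++i) {
        // An escape is a backslash, an apostrophe, and exactly two hex digits.
        const bool isEscape =
            rtf[i] == '\\' && i + 3 < rtf.size() && rtf[i + 1] == '\'' &&
            std::isxdigit(static_cast<unsigned char>(rtf[i + 2])) &&
            std::isxdigit(static_cast<unsigned char>(rtf[i + 3]));
        if (!isEscape) {
            html += rtf[i];  // ordinary character, copied unchanged
            continue;
        }
        const int code = std::stoi(rtf.substr(i + 2, 2), nullptr, 16);
        if (code >= 0xA0) {
            html += "&#" + std::to_string(code) + ";";
        } else {
            // Values below 0xA0 are passed through as a raw byte in this sketch;
            // 0x80..0x9F would need a real code page table.
            html += static_cast<char>(code);
        }
        i += 3;  // skip the rest of the escape
    }
    return html;
}
```

With this sketch, the RTF text Gr\'fc\'dfe becomes Gr&#252;&#223;e, which a browser shows as "Grüße".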
I hope the class is easy to extend (for new tags, it is mostly ::R2H_InterpretTag that has to be modified), and any suggestions or extensions are very welcome; I'll post them here. But please don't send me "My RTF file isn't correctly exported" comments; as mentioned, this is only a demo and only a few tags are currently supported. I created my RTF file with the WordPad editor shipped with Windows; MS Word builds a more complex RTF structure. For complete RTF documentation, see MSDN ("RTF Specification").
An RTF file stores text data in a structured way, together with formatting tags (slightly similar to HTML). Consider the following example:
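To give an idea of what supporting a tag involves, here is a rough sketch of a tag-to-HTML mapping. It is not the code of R2H_InterpretTag, which is not reproduced in this article; InterpretTagSketch is a hypothetical free function, and the choice of HTML output (for example a span with a font-size style) is my assumption for illustration. It relies on two RTF conventions: a toggle such as \b is switched off by a 0 parameter (\b0), and \fsN gives the font size in half-points.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the kind of mapping a tag interpreter performs.
// 'tag' is an RTF control word without the leading backslash, e.g. "b", "b0", "fs24".
// Returns the HTML fragment for the tag, or an empty string if it is unsupported.
std::string InterpretTagSketch(const std::string& tag)
{
    // Split the control word into its alphabetic name and optional numeric parameter.
    std::size_t pos = 0;
    while (pos < tag.size() && std::isalpha(static_cast<unsigned char>(tag[pos]))) {
        ++pos;
    }
    const std::string name = tag.substr(0, pos);
    const bool hasParam = pos < tag.size();
    const int param = hasParam ? std::atoi(tag.c_str() + pos) : 0;

    // Toggles: "\b" switches on, "\b0" switches off.
    const bool on = !hasParam || param != 0;
    if (name == "b")      return on ? "<b>" : "</b>";
    if (name == "i")      return on ? "<i>" : "</i>";
    if (name == "ul")     return on ? "<u>" : "</u>";
    if (name == "ulnone") return "</u>";
    if (name == "par")    return "<br>";

    // "\fsN" is a size in half-points, so \fs24 means 12pt.
    if (name == "fs" && hasParam) {
        std::string size = std::to_string(param / 2);
        if (param % 2 != 0) size += ".5";
        return "<span style=\"font-size:" + size + "pt\">";
    }
    return "";  // anything else is ignored in this sketch
}
```

Adding a tag then means adding one more branch. Note that the sketch only emits opening elements for font sizes; a real converter also has to close or adjust previously opened elements.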
--
TEST BIG SMALL AGAIN BROWN BLUE AND ANOTHER fONT.
This is a left-aligned paragraph
right-aligned one
centered one
--
It is represented in RTF as the following:
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}
{\f1\fmodern\fprq1\fcharset0 Courier New;}
{\f2\fswiss\fprq2\fcharset0 Arial;}
{\f3\fnil\fcharset2 Symbol;}}
{\colortbl ;\red128\green0\blue0;\red0\green0\blue255;}
\viewkind4\uc1\pard\f0\fs24 TEST \fs40 BIG \fs24 SMALL AGAIN\b
\cf1 BROWN \cf2 BLUE \cf0\b0 AND \f1 ANOTHER fONT.\par
\par
\par
\f2 This is a left-aligned paragraph\par
\pard\qr right-aligned one\par
\pard\qc centered one\par
}
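The \colortbl group in this sample lists its colors as \redN\greenN\blueN triples separated by semicolons, and \cf1 and \cf2 in the text refer to those entries by index. As a small illustration, the sketch below turns the body of such a group into HTML color strings. ParseColorTable and ComponentOf are hypothetical helpers, not functions of the converter class, and the sketch assumes the caller has already cut out the text between "\colortbl" and the closing brace.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: returns the number following 'keyword' (e.g. "\\red")
// inside one color table entry, or 0 if the keyword is absent.
static int ComponentOf(const std::string& entry, const std::string& keyword)
{
    const std::size_t pos = entry.find(keyword);
    if (pos == std::string::npos) return 0;
    return std::atoi(entry.c_str() + pos + keyword.size());
}

// Hypothetical helper: converts the body of a \colortbl group into "#RRGGBB" strings.
// An entry without color components (typically entry 0) becomes an empty string,
// meaning "default color".
std::vector<std::string> ParseColorTable(const std::string& body)
{
    std::vector<std::string> colors;
    std::size_t start = 0;
    for (std::size_t semi = body.find(';'); semi != std::string::npos;
         semi = body.find(';', start)) {
        const std::string entry = body.substr(start, semi - start);
        start = semi + 1;
        if (entry.find("\\red") == std::string::npos) {
            colors.push_back("");  // default/auto color
            continue;
        }
        char buf[16];
        std::snprintf(buf, sizeof buf, "#%02X%02X%02X",
                      ComponentOf(entry, "\\red") & 0xFF,
                      ComponentOf(entry, "\\green") & 0xFF,
                      ComponentOf(entry, "\\blue") & 0xFF);
        colors.push_back(buf);
    }
    return colors;
}
```

For the table in the sample, entry 1 comes out as #800000 and entry 2 as #0000FF, so \cf1 and \cf2 can be mapped to those HTML colors, while \cf0 selects the default color.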
ConvertRTF2HTML is the main converting procedure. It performs the following steps:
- R2H_BuildTree
As you can see, RTF has a nested structure in which each section is enclosed in braces {}. So our first step is to build a tree structure:
+RTF1
  +COLORTBL
  +FONTTBL
    +F0
    +F1
    +F2
    +F3
Here, I've noted only each section's first attribute (the section name). Each section also contains more code: both plain text (RTF1 is the main section, holding the main text) and attributes.
- R2H_SetMetaData
Sub-items such as colortbl and fonttbl are helper tables that the RTF tags in the main text refer to, so these global attributes have to be scanned and stored.
- R2H_CreateHTMLElements
Loop through the RTF1 main text and add HTML elements. An element can be either:
  - Plain text, which is added as it is
  - An RTF tag starting with a \. These have to be converted to the corresponding HTML tags with R2H_InterpretTag. Sometimes this requires look-ups in the global tables (e.g. the color or font table), or previously inserted elements must be scanned or modified.
- R2H_GetHTMLHeader: Write the HTML header to the target HTML
- R2H_GetHTMLElements: Dump the added HTML elements to the target HTML
- R2H_GetHTMLFooter: Write the HTML footer to the target HTML
Ready!
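The first step above, building a tree from the nested braces, can be sketched in a few lines. This is not the code of R2H_BuildTree and it does not use Alexander Kovachev's tree template; GroupNode, ParseGroup, ParseRtf and SectionName are hypothetical names, and a plain struct with a std::vector of children stands in for the tree class.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type standing in for the tree template used by the converter.
struct GroupNode {
    std::string content;              // text and tags directly inside this group
    std::vector<GroupNode> children;  // nested {...} groups, in document order
};

// Parses one group. 'pos' must index the opening '{' and is left just past
// the matching '}' (or at the end of the input if the brace is missing).
GroupNode ParseGroup(const std::string& rtf, std::size_t& pos)
{
    GroupNode node;
    ++pos;  // skip '{'
    while (pos < rtf.size() && rtf[pos] != '}') {
        if (rtf[pos] == '{') {
            node.children.push_back(ParseGroup(rtf, pos));
        } else {
            // Copy a backslash together with the next character, so that
            // escaped braces (\{ and \}) are not mistaken for group delimiters.
            if (rtf[pos] == '\\' && pos + 1 < rtf.size()) {
                node.content += rtf[pos++];
            }
            node.content += rtf[pos++];
        }
    }
    if (pos < rtf.size()) ++pos;  // skip '}'
    return node;
}

// Parses the outermost group of an RTF string; returns an empty node if there is none.
GroupNode ParseRtf(const std::string& rtf)
{
    std::size_t pos = rtf.find('{');
    if (pos == std::string::npos) return GroupNode();
    return ParseGroup(rtf, pos);
}

// Returns the group's first control word in upper case, e.g. "RTF1" or "FONTTBL".
std::string SectionName(const GroupNode& node)
{
    std::string name;
    const std::size_t slash = node.content.find('\\');
    if (slash == std::string::npos) return name;
    for (std::size_t i = slash + 1; i < node.content.size() &&
         std::isalnum(static_cast<unsigned char>(node.content[i])); ++i) {
        name += static_cast<char>(std::toupper(static_cast<unsigned char>(node.content[i])));
    }
    return name;
}
```

Applied to the sample file, ParseRtf would return a root node named RTF1 whose children are the FONTTBL group (with F0 to F3 below it) and the COLORTBL group, while the main text stays in the root's content.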

Comments
Data base access to excel
Posted by vancong180182 on 06/23/2006 04:58am
could you tell me how to read access table then write to a excel file. pls send me the demo project. Thanks
HTML contains more code than needed
Posted by _Sanjay_ on 03/18/2006 12:48am
The html contains more lines than needed. For eg {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Bitstream Vera Sans Mono;}{\f1\fnil Arial;}} {\colortbl ;\red0\green0\blue0;} \viewkind4\uc1\pard\tx720\cf1\f0\fs26 Hello World\f1\fs20\par } converts to "" Hello World"" instead of just one line ""Hello World""
Cool.. how bout HTML 2 RTF
Posted by Legacy on 09/02/2003 12:00am
Originally posted by: Muzbye
:)
emp
Posted by emp on 08/31/2004 10:00pm
hello