RTF-to-HTML Converter

Environment: VC++, RichEdit V.1

Note: Doxygen-compatible documentation, warning level 2
Portions (tree template) from Alexander Kovachev’s tree class.

To edit formatted text, Windows provides a RichEdit control (RICHED32.DLL or RICHED20.DLL), which MFC encapsulates in the CRichEditCtrl class. The control stores its content in Rich Text Format (RTF). WordPad and MS Word are examples of applications that use RTF files. But today’s world speaks HTML, so I looked for a way to convert RTF data to HTML. I couldn’t find any free example, so I decided to write my own exporter class. Here it is.

The source zip file contains the following classes and files:

  • RTF2HTMLConverter.h/cpp, containing the main converter class CRTF_HTMLConverter
  • RTF2HTMLTree.h/cpp, which contains Alexander Kovachev’s template tree class
  • Util.h/cpp, some very simple helper routines
  • RTF2HTML.h/cpp, a console-based converter demo app

The converter class itself does no reading from or writing to files or RichEdit controls. This has to be done outside the class. (Another article in this section explains how to stream the complete RTF content into and out of a RichEdit control.) The class is derived only from CObject and works with the CString >>/<< streaming operators. The data is converted when it is streamed in.

Note: Only the RTF->HTML direction is supported at the moment, and only a very small subset of RTF is supported:

  • Bold, Italic, Underline
  • Font Size, Color, and Face
  • Paragraph alignment
  • Special characters, such as encoded German Umlauts

I hope the class is easy to extend (for new tags, mostly ::R2H_InterpretTag has to be modified), and any suggestions or extensions are very welcome. I’ll post them here. But please don’t send me “My RTF file isn’t exported correctly” comments. As mentioned, this is only a demo and only a few tags are currently supported. I made my RTF file with the WordPad editor shipped with Windows; MS Word builds a more complex RTF structure. For the complete RTF documentation, see MSDN (“RTF Specification”).

An RTF file stores text data in a structured way, together with formatting tags (somewhat similar to HTML). Consider the following example:




This is a left-aligned paragraph

right-aligned one

centered one

It is represented in RTF as follows:

  {\rtf1 ...{\fonttbl{\f0 ...\fcharset0 Arial;}{\f1\fmodern\fprq1\fcharset0 Courier New;}
                   {\f2\fswiss\fprq2\fcharset0 Arial;}
                   {\f3\fnil\fcharset2 Symbol;}}
  {\colortbl ;\red128\green0\blue0;\red0\green0\blue255;}
  \viewkind4\uc1\pard\f0\fs24 TEST \fs40 BIG \fs24 SMALL AGAIN\b
            \cf1 BROWN \cf2 BLUE \cf0\b0 AND \f1 ANOTHER fONT.\par
  \f2 This is a left-aligned paragraph\par
  \pard\qr right-aligned one\par
  \pard\qc centered one\par
  }

ConvertRTF2HTML is the main conversion routine. It performs the following steps:

  1. R2H_BuildTree
    As you can see, RTF has a nested structure in which each section is enclosed in braces {}. So the first step is to build a tree with one node per section.

    Each node is labeled with its section’s first attribute (the section name). Each section then contains further code: both plain text (RTF1 is the main section and holds the main text) and attributes.

  2. R2H_SetMetaData
    Sub-items such as colortbl and fonttbl are helper tables that the RTF tags in the main text refer to by index (for example, \cf1 selects entry 1 of the color table). These global attributes therefore have to be scanned and stored first.
  3. R2H_CreateHTMLElements
    Loop through the RTF1 main text and add HTML elements. An HTML element is either:
    • Plain text: added as is.
    • An RTF tag starting with a backslash (\): converted to the corresponding HTML tag by R2H_InterpretTag. Sometimes this requires look-ups in the global tables (e.g. the color or font table), or previously inserted elements must be scanned or modified.
  4. R2H_GetHTMLHeader: write the HTML header to the target HTML.
  5. R2H_GetHTMLElements: dump the added HTML elements to the target HTML.
  6. R2H_GetHTMLFooter: write the HTML footer to the target HTML.



Download source and demo project – 15 Kb
