RTF-to-HTML Converter

Environment: VC++, RichEdit V.1

Note: Doxygen-compatible documentation, warning level 2
Portions (tree template) from Alexander Kovachev's tree class.

To edit formatted text, Windows provides a RichEdit control (RICHED32.DLL or RICHED20.DLL, depending on the version), which MFC encapsulates in the CRichEditCtrl class. The control's content is stored in Rich Text Format (RTF); WordPad and MS Word are examples of applications that use RTF files. But today's world speaks HTML, so I looked for a way to convert RTF data to HTML. I couldn't find a free example, so I decided to write my own exporter class. Here it is.

The source zip file contains the following classes and files:

  • RTF2HTMLConverter.h/cpp, which contain the main converter class CRTF_HTMLConverter
  • RTF2HTMLTree.h/cpp, which contains Alexander Kovachev's template tree class
  • Util.h/cpp, some very simple helper routines
  • RTF2HTML.h/cpp, a console-based converter demo app

The converter class itself does not read from or write to files or RichEdit controls; this has to be done outside the class. (For example, to learn how to stream the complete RTF content from or into a RichEdit control, look elsewhere in this same section.) The class is derived only from CObject and exchanges data through CString >> and << streaming operators. The data is converted as it is streamed in.

Note: Only the RTF->HTML direction is supported at the moment, and only a very small subset of RTF is handled:
  • Bold, Italic, Underline
  • Font Size, Color, and Face
  • Paragraph alignment
  • Special characters, such as encoded German Umlauts
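
To illustrate the last item: RTF writes a character outside the 7-bit range as \'hh, where hh is the byte value in hexadecimal, so with \ansicpg1252 the letter ü appears as \'fc. The following is a minimal sketch of how such escapes could be turned into HTML numeric character references. DecodeHexEscapes is a hypothetical helper, not a function from the download, and it only handles bytes 0xA0-0xFF, where code page 1252 and Unicode use the same values.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of the converter's source): replaces RTF
// \'hh escapes with HTML numeric character references.
// Assumption: the text uses code page 1252, where bytes 0xA0-0xFF have the
// same values as the corresponding Unicode code points.
std::string DecodeHexEscapes(const std::string& rtf)
{
    std::string html;
    for (std::size_t i = 0; i < rtf.size(); ++i) {
        const bool isEscape =
            rtf[i] == '\\' && i + 3 < rtf.size() && rtf[i + 1] == '\'' &&
            std::isxdigit(static_cast<unsigned char>(rtf[i + 2])) &&
            std::isxdigit(static_cast<unsigned char>(rtf[i + 3]));
        if (!isEscape) {
            html += rtf[i];  // ordinary character: copy unchanged
            continue;
        }
        const unsigned long value =
            std::strtoul(rtf.substr(i + 2, 2).c_str(), nullptr, 16);
        if (value >= 0xA0) {
            html += "&#" + std::to_string(value) + ";";
        } else {
            html += rtf.substr(i, 4);  // outside the sketch's scope: keep as is
        }
        i += 3;  // skip the rest of the four-character escape
    }
    return html;
}
```

With this helper, DecodeHexEscapes("M\\'fcller") yields M&#252;ller.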

I hope the class is easy to extend (for new tags, it is mostly ::R2H_InterpretTag that has to be modified), and any suggestions or extensions are very welcome; I'll post them here. But please don't send me "My RTF file isn't exported correctly" comments; as mentioned, this is only a demo and only a few tags are currently supported. I created my RTF test file with the WordPad editor shipped with Windows; MS Word produces a more complex RTF structure. For complete RTF documentation, see MSDN ("RTF Specification").
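
As an illustration of the kind of mapping a tag interpreter performs, here is a minimal sketch in standard C++. InterpretControlWord is a hypothetical stand-in; the real R2H_InterpretTag in the download may be structured quite differently, and the choice of HTML tags here is my own assumption.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a tag interpreter: maps one RTF control word
// (without the leading backslash) to an HTML fragment.
std::string InterpretControlWord(const std::string& word)
{
    static const std::map<std::string, std::string> simpleTags = {
        {"b", "<b>"},  {"b0", "</b>"},
        {"i", "<i>"},  {"i0", "</i>"},
        {"ul", "<u>"}, {"ul0", "</u>"}, {"ulnone", "</u>"},
        {"par", "<br>"},
    };
    const auto it = simpleTags.find(word);
    if (it != simpleTags.end()) {
        return it->second;
    }

    // \fsN gives the font size in half-points, so \fs24 means 12pt.
    // Closing the span again is left to the caller in this sketch.
    if (word.size() > 2 && word.compare(0, 2, "fs") == 0 &&
        word.find_first_not_of("0123456789", 2) == std::string::npos) {
        const int halfPoints = std::stoi(word.substr(2));
        std::string size = std::to_string(halfPoints / 2);
        if (halfPoints % 2 != 0) {
            size += ".5";
        }
        return "<span style=\"font-size:" + size + "pt\">";
    }
    return "";  // unsupported control words are simply dropped
}
```

Supporting a new tag in a design like this would mean adding one more entry to the table or one more branch for tags that carry a numeric parameter.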

An RTF file stores text data in a structured way, together with formatting tags (somewhat like HTML). Consider the following example:

--
TEST BIG SMALL AGAIN BROWN BLUE AND ANOTHER fONT.

 

 

This is a left-aligned paragraph

right-aligned one

centered one

--
In RTF, it is represented as follows:

{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\
  fcharset0 Arial;}{\f1\fmodern\fprq1\fcharset0 Courier New;}
                   {\f2\fswiss\fprq2\fcharset0 Arial;}
                   {\f3\fnil\fcharset2 Symbol;}}
{\colortbl ;\red128\green0\blue0;\red0\green0\blue255;}
\viewkind4\uc1\pard\f0\fs24 TEST \fs40 BIG \fs24 SMALL AGAIN\b
          \cf1 BROWN \cf2 BLUE \cf0\b0 AND \f1 ANOTHER fONT.\par
\par
\par
\f2 This is a left-aligned paragraph\par
\pard\qr right-aligned one\par
\pard\qc centered one\par
}
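
The \colortbl group in the listing above holds the colors that \cf1 and \cf2 in the main text refer to. Its entries are separated by semicolons, and the empty first entry stands for the default color. As a rough sketch of how such a table could be turned into HTML color values (ParseColorTable and ComponentValue are hypothetical names, not functions from the download):

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: reads the number after a control word such as \red
// inside one color table entry and clamps it to 0..255.
static unsigned ComponentValue(const std::string& entry, const std::string& name)
{
    const std::size_t pos = entry.find("\\" + name);
    if (pos == std::string::npos) {
        return 0;
    }
    const long value =
        std::strtol(entry.c_str() + pos + 1 + name.size(), nullptr, 10);
    return static_cast<unsigned>(value < 0 ? 0 : (value > 255 ? 255 : value));
}

// Takes the text that follows "\colortbl" inside its group and returns one
// "#rrggbb" string per entry; an empty string marks the default color.
std::vector<std::string> ParseColorTable(const std::string& body)
{
    std::vector<std::string> colors;
    std::size_t start = 0;
    std::size_t end = 0;
    while ((end = body.find(';', start)) != std::string::npos) {
        const std::string entry = body.substr(start, end - start);
        start = end + 1;
        if (entry.find('\\') == std::string::npos) {
            colors.push_back(std::string());  // no control words: default color
            continue;
        }
        char hex[8];
        std::snprintf(hex, sizeof hex, "#%02x%02x%02x",
                      ComponentValue(entry, "red"),
                      ComponentValue(entry, "green"),
                      ComponentValue(entry, "blue"));
        colors.push_back(hex);
    }
    return colors;
}
```

For the table in the example, entry 1 becomes #800000 (the "BROWN" text) and entry 2 becomes #0000ff (the "BLUE" text), so a \cfN tag can be resolved by indexing into the returned vector.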

ConvertRTF2HTML is the main conversion routine. It performs the following steps:

  1. R2H_BuildTree
    As you can see, RTF has a nested structure in which each section is enclosed in braces {}. So, our first step is to build a tree structure:
    +RTF1
      +COLORTBL
      +FONTTBL
        +F0
        +F1
        +F2
        +F3
    Here, I've noted only each section's first attribute (the section name). Each section contains further content, both plain text (RTF1 is the main section and holds the main text) and attributes.
  2. R2H_SetMetaData
    Sub-items such as colortbl and fonttbl are helper tables that the RTF tags in the main text refer to, so these global attributes have to be scanned and stored first.
  3. R2H_CreateHTMLElements
    Loop through the RTF1 main text and add HTML elements. Each element is either:
    • Plain text: added as is
    • RTF tags (starting with a \): these have to be converted to the corresponding HTML tags with R2H_InterpretTag. Sometimes this requires look-ups in the global tables (e.g. the color or font table), or scanning or modifying previously inserted elements.
  4. R2H_GetHTMLHeader: writes the HTML header to the target HTML
  5. R2H_GetHTMLElements: dumps the added HTML elements to the target HTML
  6. R2H_GetHTMLFooter: writes the HTML footer to the target HTML
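
The tree-building step above can be sketched as a small recursive parser over the braces. This is only an illustration in standard C++: GroupNode and ParseGroup are hypothetical names, whereas the download uses Alexander Kovachev's tree template and its own parsing code.

```cpp
#include <cctype>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for one {...} section.
struct GroupNode {
    std::string name;                                  // first control word, e.g. "rtf1"
    std::vector<std::unique_ptr<GroupNode>> children;  // nested sections
};

// Parses the section starting at rtf[pos] == '{' and leaves pos just past
// the matching '}'. Plain text and later control words are skipped here.
std::unique_ptr<GroupNode> ParseGroup(const std::string& rtf, std::size_t& pos)
{
    std::unique_ptr<GroupNode> node(new GroupNode());
    ++pos;  // skip the opening '{'
    while (pos < rtf.size() && rtf[pos] != '}') {
        if (rtf[pos] == '{') {
            node->children.push_back(ParseGroup(rtf, pos));
        } else if (rtf[pos] == '\\') {
            std::size_t end = pos + 1;
            while (end < rtf.size() &&
                   std::isalnum(static_cast<unsigned char>(rtf[end]))) {
                ++end;
            }
            if (end == pos + 1) {
                // Control symbol such as \{ or \}: skip the escaped character
                // so that it is not mistaken for a section boundary.
                pos = (end < rtf.size()) ? end + 1 : end;
            } else {
                if (node->name.empty()) {
                    node->name = rtf.substr(pos + 1, end - pos - 1);
                }
                pos = end;
            }
        } else {
            ++pos;  // plain text
        }
    }
    if (pos < rtf.size()) {
        ++pos;  // skip the closing '}'
    }
    return node;
}
```

Calling ParseGroup with pos set to the index of the first brace returns the root section, whose children are the nested sections in the order they appear in the file.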

Ready!

Downloads

Download source and demo project - 15 Kb


Comments

  • Data base access to excel

    Posted by vancong180182 on 06/23/2006 04:58am

    could you tell me how to read access table then write to a excel file. pls send me the demo project. Thanks

  • HTML contains more code than needed

    Posted by _Sanjay_ on 03/18/2006 12:47am

    The html contains more lines than needed
    For eg
    {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Bitstream Vera Sans Mono;}{\f1\fnil Arial;}}
    {\colortbl ;\red0\green0\blue0;}
    \viewkind4\uc1\pard\tx720\cf1\f0\fs26 Hello World\f1\fs20\par
    }
    
    converts to 
    
    
    
    Hello World
    
    

    instead of just one line Hello World

  • Cool.. how bout HTML 2 RTF

    Posted by Legacy on 09/02/2003 12:00am

    Originally posted by: Muzbye

    :)

    • emp

      Posted by emp on 08/31/2004 10:00pm

      hello
