CodeGuru
RTF-to-HTML Converter

Daniel Beutler
August 27, 2003

Environment: VC++, RichEdit V.1

Note: Doxygen-compatible documentation, warning level 2.
Portions (the tree template) are taken from Alexander Kovachev's tree class.

To edit formatted text, Windows provides the RichEdit control (RICHED32.DLL or RICHED20.DLL, depending on the version), which MFC encapsulates in the CRichEditCtrl class. Its content is stored in Rich Text Format (RTF); WordPad and MS Word are examples of applications that use RTF files. But today's world speaks HTML, so I looked for a way to convert RTF data to HTML. I couldn't find any free example, so I decided to write my own exporter class. Here it is.


(continued)




The source zip file contains the following classes and files:

  • RTF2HTMLConverter.h/cpp, which contains the main converter class, CRTF_HTMLConverter
  • RTF2HTMLTree.h/cpp, which contains Alexander Kovachev's template tree class
  • Util.h/cpp, some very simple helper routines
  • RTF2HTML.h/cpp, a console-based converter demo app

The converter class itself does no reading from or writing to files or RichEdit controls; this has to be done by the caller. (For example, to learn how to stream the complete RTF content out of or into a RichEdit control, see the other articles in this section.) The class is derived only from CObject and works with the CString >> and << streaming operators; the conversion takes place when the data is streamed in.

Note: Only the RTF-to-HTML direction is currently supported, and only a very small subset of RTF is handled:
  • Bold, Italic, Underline
  • Font Size, Color, and Face
  • Paragraph alignment
  • Special characters, such as encoded German umlauts
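The last bullet refers to characters that RTF stores as hex escapes of the form \'hh, where hh is a byte in the document's code page (\ansicpg1252 in the example below, so \'e4 is "ä"). The following is a minimal sketch of turning such escapes into HTML numeric character references; DecodeRtfText and AppendHtmlChar are hypothetical names, not functions from the download. It assumes code page 1252, where bytes 0xA0 to 0xFF coincide with the Unicode code points of the same value, and that it is applied to plain-text runs whose formatting tags have already been interpreted. Characters that are special in HTML are escaped along the way.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: append one character to HTML output, escaping the
// characters that have a special meaning in HTML.
void AppendHtmlChar(std::string& out, char c)
{
    switch (c) {
    case '&': out += "&amp;"; break;
    case '<': out += "&lt;"; break;
    case '>': out += "&gt;"; break;
    case '"': out += "&quot;"; break;
    default:  out += c; break;
    }
}

// Hypothetical helper: convert a plain-text run from RTF to HTML.
// Assumption: code page 1252. Bytes 0xA0-0xFF map directly to the Unicode
// code point of the same value; bytes 0x80-0x9F do not, so this sketch
// emits '?' for them instead of a full CP1252 lookup table.
std::string DecodeRtfText(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Escaped literals: \\ \{ \} stand for the character itself.
        if (in[i] == '\\' && i + 1 < in.size() &&
            (in[i + 1] == '\\' || in[i + 1] == '{' || in[i + 1] == '}')) {
            AppendHtmlChar(out, in[i + 1]);
            ++i;
            continue;
        }
        bool isHexEscape = in[i] == '\\' && i + 3 < in.size() && in[i + 1] == '\'' &&
                           std::isxdigit(static_cast<unsigned char>(in[i + 2])) &&
                           std::isxdigit(static_cast<unsigned char>(in[i + 3]));
        if (!isHexEscape) {
            AppendHtmlChar(out, in[i]);
            continue;
        }
        int code = std::stoi(in.substr(i + 2, 2), nullptr, 16);
        if (code >= 0xA0)
            out += "&#" + std::to_string(code) + ";";   // e.g. \'e4 -> &#228; (ä)
        else if (code < 0x80)
            AppendHtmlChar(out, static_cast<char>(code)); // plain ASCII
        else
            out += '?';                                   // CP1252-specific range
        i += 3;  // skip the quote and the two hex digits
    }
    return out;
}
```

For example, the RTF text Gr\'fc\'dfe becomes Gr&#252;&#223;e, which a browser displays as "Grüße".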

The class should be easy to extend (to support new tags, you mostly need to modify ::R2H_InterpretTag). Suggestions and extensions are very welcome, and I'll post them here. However, please don't send "My RTF file isn't exported correctly" comments; as mentioned, this is only a demo and only a few tags are currently supported. I created my test RTF file with the WordPad editor that ships with Windows; MS Word produces a more complex RTF structure. For the complete RTF documentation, see the "Rich Text Format (RTF) Specification" on MSDN.
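To give an idea of what extending the tag interpretation involves, here is a minimal, table-driven sketch in the spirit of ::R2H_InterpretTag. It is not the article's code: the name InterpretTag, the RtfColor type, and the choice of <span style=...> output are assumptions. It maps the character-formatting tags from the list above to HTML fragments. \fsN gives the font size in half-points, and \cfN is an index into the color table, whose entry 0 is the default color.

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical color-table entry (entry 0 is usually the default "auto" color).
struct RtfColor { int r = 0, g = 0, b = 0; };

// Hypothetical tag interpreter: returns the HTML fragment for one RTF control
// word, or an empty string if the tag is not supported. Closing previously
// opened <span> elements is left to the caller in this sketch.
std::string InterpretTag(const std::string& word, bool hasParam, int param,
                         const std::vector<RtfColor>& colors)
{
    // On/off toggles; a parameter of 0 switches them off (\b0, \i0, \ul0).
    static const std::map<std::string, std::string> toggles = {
        {"b", "b"}, {"i", "i"}, {"ul", "u"}};
    auto it = toggles.find(word);
    if (it != toggles.end()) {
        bool on = !hasParam || param != 0;
        return std::string(on ? "<" : "</") + it->second + ">";
    }
    if (word == "ulnone")  // \ulnone stops all underlining
        return "</u>";
    if (word == "fs" && hasParam && param > 0) {
        // Half-points: \fs24 = 12pt, \fs40 = 20pt, \fs21 = 10.5pt.
        std::string pt = std::to_string(param / 2) + (param % 2 ? ".5" : "");
        return "<span style=\"font-size:" + pt + "pt\">";
    }
    if (word == "cf" && hasParam && param > 0 &&
        static_cast<std::size_t>(param) < colors.size()) {
        const RtfColor& c = colors[static_cast<std::size_t>(param)];
        char hex[8];
        std::snprintf(hex, sizeof hex, "#%02X%02X%02X", c.r, c.g, c.b);
        return std::string("<span style=\"color:") + hex + "\">";
    }
    return "";  // unsupported tag, or \cf0 (default color): nothing to emit
}
```

Keeping the simple toggles in a table means that supporting another on/off tag (for example \strike, mapped to <s>) is a one-line change.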

An RTF file stores text data in a structured way, together with formatting tags (somewhat similar to HTML). Consider the following example:

--
TEST BIG SMALL AGAIN BROWN BLUE AND ANOTHER fONT.

This is a left-aligned paragraph

right-aligned one

centered one

--
It is represented in RTF as follows:

{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}{\f1\fmodern\fprq1\fcharset0 Courier New;}{\f2\fswiss\fprq2\fcharset0 Arial;}{\f3\fnil\fcharset2 Symbol;}}
{\colortbl ;\red128\green0\blue0;\red0\green0\blue255;}
\viewkind4\uc1\pard\f0\fs24 TEST \fs40 BIG \fs24 SMALL AGAIN \b\cf1 BROWN \cf2 BLUE \cf0\b0 AND \f1 ANOTHER fONT.\par
\par
\par
\f2 This is a left-aligned paragraph\par
\pard\qr right-aligned one\par
\pard\qc centered one\par
}
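Before a tree can be built, the raw RTF has to be split into its basic pieces. Following the RTF specification, there are four kinds: group delimiters { and }; control words (a backslash, letters, an optional signed number, and an optional space delimiter, such as \fs40); control symbols (a backslash plus one non-letter, such as \\ or \'e4); and plain text. The tokenizer below is a minimal sketch under these rules; the Token type and the TokenizeRtf name are hypothetical and not part of the article's classes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model for a minimal RTF scanner.
enum class TokKind { GroupStart, GroupEnd, ControlWord, ControlSymbol, Text };

struct Token {
    TokKind kind;
    std::string text;      // control word name, symbol character, or plain text
    bool hasParam = false; // true if a numeric parameter was present
    long param = 0;        // e.g. 40 for \fs40, 0xE4 for \'e4
};

std::vector<Token> TokenizeRtf(const std::string& s)
{
    std::vector<Token> out;
    std::string text;
    auto isAlpha = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };
    auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    auto isHex = [](char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; };
    auto flushText = [&] {
        if (!text.empty()) out.push_back({TokKind::Text, text});
        text.clear();
    };

    std::size_t i = 0;
    while (i < s.size()) {
        char c = s[i];
        if (c == '{' || c == '}') {
            flushText();
            out.push_back({c == '{' ? TokKind::GroupStart : TokKind::GroupEnd, std::string(1, c)});
            ++i;
        } else if (c == '\r' || c == '\n') {
            ++i;  // raw line breaks are not part of the text
        } else if (c == '\\' && i + 1 < s.size()) {
            flushText();
            ++i;  // skip the backslash
            if (isAlpha(s[i])) {
                Token t{TokKind::ControlWord, ""};
                while (i < s.size() && isAlpha(s[i])) t.text += s[i++];
                std::size_t start = i;
                if (i < s.size() && s[i] == '-') ++i;
                std::size_t digits = i;
                while (i < s.size() && isDigit(s[i])) ++i;
                if (i > digits) {
                    t.hasParam = true;
                    t.param = std::stol(s.substr(start, i - start));
                } else {
                    i = start;  // a lone '-' is not a parameter
                }
                if (i < s.size() && s[i] == ' ') ++i;  // the space delimiter belongs to the word
                out.push_back(t);
            } else if (s[i] == '\'' && i + 2 < s.size() && isHex(s[i + 1]) && isHex(s[i + 2])) {
                Token t{TokKind::ControlSymbol, "'"};  // \'hh: byte value as parameter
                t.hasParam = true;
                t.param = std::stol(s.substr(i + 1, 2), nullptr, 16);
                out.push_back(t);
                i += 3;
            } else {
                out.push_back({TokKind::ControlSymbol, std::string(1, s[i])});  // \\ \{ \} \~ ...
                ++i;
            }
        } else {
            text += c;
            ++i;
        }
    }
    flushText();
    return out;
}
```

Applied to the sample above, \fs24 TEST yields the control word fs with parameter 24, followed by the text "TEST " (the space after TEST is real text, while the space after \fs24 is only a delimiter).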

ConvertRTF2HTML is the main conversion procedure. It performs the following steps:

  1. R2H_BuildTree
    As you can see, RTF has a nested structure in which each section is enclosed in braces {}. So the first step is to build a tree structure:
    +RTF1
      +COLORTBL
      +FONTTBL
        +F0
        +F1
        +F2
        +F3
    Here, I've noted only each section's first attribute (the section name). Each section contains further content: both plain text (RTF1 is the main section and holds the main text) and attributes.
  2. R2H_SetMetaData
    Sub-items such as colortbl and fonttbl are lookup tables that the RTF tags in the main text refer to by index (for example, \cf1 selects color 1 and \f1 selects font 1), so these global attributes have to be scanned and stored first.
  3. R2H_CreateHTMLElements
    Loop through the RTF1 main text and add HTML elements. An HTML element is created from either:
    • Plain text, which is added as is
    • RTF tags starting with a backslash (\). These have to be converted to the corresponding HTML tags by R2H_InterpretTag. Sometimes this requires look-ups in the global tables (e.g., the color or font table), or previously inserted elements must be scanned or modified.
  4. R2H_GetHTMLHeader: Write the HTML header to the target HTML.
  5. R2H_GetHTMLElements: Dump the added HTML elements to the target HTML.
  6. R2H_GetHTMLFooter: Write the HTML footer to the target HTML.
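Steps 1 and 2 can be sketched in portable C++ as follows. This is an illustration under assumptions, not the code from the download (which uses Alexander Kovachev's tree template and MFC's CString): RtfGroup, BuildTree, ColorEntry, and ParseColorTable are hypothetical names. Each group is named after its first control word, as in the tree above, and the color table is split at its ';' separators, with an empty entry standing for the default color.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical tree node for one RTF group {...}.
struct RtfGroup {
    std::string name;     // first control word of the group, e.g. "rtf1", "fonttbl", "f0"
    std::string content;  // the group's own text and tags, child groups removed
    std::vector<std::unique_ptr<RtfGroup>> children;
};

// Step 1 sketch: nest groups by matching braces. Escaped characters (\{, \}, \\)
// stay in the content; raw CR/LF are dropped because RTF ignores them.
std::unique_ptr<RtfGroup> BuildTree(const std::string& rtf)
{
    auto root = std::make_unique<RtfGroup>();  // synthetic root above {\rtf1 ...}
    std::vector<RtfGroup*> stack{root.get()};
    for (std::size_t i = 0; i < rtf.size(); ++i) {
        char c = rtf[i];
        if (c == '\\' && i + 1 < rtf.size() &&
            (rtf[i + 1] == '{' || rtf[i + 1] == '}' || rtf[i + 1] == '\\')) {
            stack.back()->content += rtf.substr(i, 2);
            ++i;
        } else if (c == '{') {
            stack.back()->children.push_back(std::make_unique<RtfGroup>());
            RtfGroup* group = stack.back()->children.back().get();
            std::size_t j = i + 1;
            if (j < rtf.size() && rtf[j] == '\\') {  // name = first control word
                for (++j; j < rtf.size() && std::isalnum(static_cast<unsigned char>(rtf[j])); ++j)
                    group->name += rtf[j];
            }
            stack.push_back(group);
        } else if (c == '}') {
            if (stack.size() > 1) stack.pop_back();  // ignore unbalanced '}'
        } else if (c != '\r' && c != '\n') {
            stack.back()->content += c;
        }
    }
    return root;
}

// Hypothetical color-table entry; isAuto marks an empty (default color) entry.
struct ColorEntry { bool isAuto = true; int r = 0, g = 0, b = 0; };

// Step 2 sketch: parse the content of a {\colortbl ...} group.
std::vector<ColorEntry> ParseColorTable(const std::string& content)
{
    std::vector<ColorEntry> table;
    ColorEntry cur;
    std::size_t i = 0;
    while (i < content.size()) {
        if (content[i] == ';') {          // ';' terminates one entry
            table.push_back(cur);
            cur = ColorEntry{};
            ++i;
        } else if (content[i] == '\\') {  // read a control word and its number
            std::size_t j = i + 1;
            std::string word;
            while (j < content.size() && std::isalpha(static_cast<unsigned char>(content[j])))
                word += content[j++];
            std::size_t digits = j;
            while (j < content.size() && std::isdigit(static_cast<unsigned char>(content[j])))
                ++j;
            int value = j > digits ? std::stoi(content.substr(digits, j - digits)) : 0;
            if (word == "red")   { cur.r = value; cur.isAuto = false; }
            if (word == "green") { cur.g = value; cur.isAuto = false; }
            if (word == "blue")  { cur.b = value; cur.isAuto = false; }
            i = j;                        // other words such as \colortbl are ignored
        } else {
            ++i;                          // skip spaces and other characters
        }
    }
    return table;
}
```

On the sample above, BuildTree yields rtf1 with the children fonttbl (holding f0 to f3) and colortbl, and ParseColorTable on the colortbl content returns three entries: the default color, (128, 0, 0) for \cf1, and (0, 0, 255) for \cf2.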

That's it!

Downloads

Download source and demo project - 15 KB
