How to Paste HTML

Environment: Win32

I was working on a project where I had to paste HTML copied from the browser into a text box. A quick search for "HTML Clipboard Format" in MSDN turns up an article that thoroughly explains how HTML is kept on the clipboard. Unfortunately, the article only tells you that the HTML is stored as UTF-8, without explaining how to turn that UTF-8 text back into plain ASCII HTML. So I had to do some research on my own.

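UTF-8 encodes each Unicode character as one to four bytes: ASCII characters (below 0x80) are stored as a single byte, and every other character becomes a sequence of two to four bytes in the range 0x80 to 0xFF. HTML, in turn, can represent any Unicode character in plain ASCII text with a numeric character reference, &#code;, where code is the character's Unicode code point in decimal. Some characters also have named entities; for example, the non-breaking space &#160; can be written as &nbsp;. You can jump to the specification if you need more examples.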
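
To make the two representations concrete, here is a minimal sketch that encodes one code point as UTF-8 bytes and formats the same code point as a numeric character reference. The names EncodeUtf8 and NumericReference are made up for this illustration and are not part of the article's code; surrogate code points are not rejected.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: encode one Unicode code point (up to U+10FFFF) as UTF-8.
std::string EncodeUtf8(unsigned long cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                 // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: the same code point as an HTML numeric character reference.
std::string NumericReference(unsigned long cp)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "&#%lu;", cp);
    return buf;
}
```

For the non-breaking space (code point 160), EncodeUtf8 yields the two bytes 0xC2 0xA0, while NumericReference yields the six ASCII characters &#160;. Converting clipboard HTML is the reverse trip: decode the bytes to a code point, then write the reference.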

Here is a UTF8ToHtml function that converts UTF-8 text to HTML by replacing every multi-byte sequence with a character reference. The algorithm is not explained, but you can read more about it here.

//utf8 - pointer to the UTF-8 text. dwSize - size of the UTF-8 text in bytes. ptr - pointer to the output buffer, which the caller must make large enough for the expanded, null-terminated text.

//OnClickedPastehtml is the handler for the BN_CLICKED notification of the button in the dialog box. IDC_TEXT is the multiline text box.

void UTF8ToHtml(BYTE *utf8, DWORD dwSize, CHAR *ptr )
{
  int code;
  BYTE *end = utf8 + dwSize;
  while( utf8 < end )
  {
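    code = 0;
    // 4-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    if( (*utf8 & 0xF8) == 0xF0 && end - utf8 >= 4 )
    {
      code = ((utf8[0] & 0x07) << 18) | ((utf8[1] & 0x3F) << 12)
           | ((utf8[2] & 0x3F) << 6)  |  (utf8[3] & 0x3F);
      utf8 += 3;
    }
    // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
    else if( (*utf8 & 0xF0) == 0xE0 && end - utf8 >= 3 )
    {
      code = ((utf8[0] & 0x0F) << 12) | ((utf8[1] & 0x3F) << 6)
           |  (utf8[2] & 0x3F);
      utf8 += 2;
    }
    // 2-byte sequence: 110xxxxx 10xxxxxx
    else if( (*utf8 & 0xE0) == 0xC0 && end - utf8 >= 2 )
    {
      code = ((utf8[0] & 0x1F) << 6) | (utf8[1] & 0x3F);
      utf8 += 1;
    }
    // Otherwise code stays 0: an ASCII byte (or an undecodable one) is copied as-is below.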


    if( code == 0 )
    {
      *ptr = *utf8;
    }
    else
    {
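      char s[16];   // room for the longest reference, e.g. "&#1114111;", plus the terminator
      switch(code)
      {
      case 160:   // non-breaking space
        strcpy(s, "&nbsp;");
        break;
      case 34:    // "
        strcpy(s, "&quot;");
        break;
      case 38:    // &
        strcpy( s, "&amp;");
        break;
      case 60:    // <
        strcpy( s, "&lt;");
        break;
      case 62:    // >
        strcpy( s, "&gt;");
        break;
      default:
        sprintf( s, "&#%d;", code );
        break;
      }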
      strcpy( ptr, s );
      ptr += strlen(s)-1;
    }
    utf8++;
    ptr++;
  }
  *ptr = 0;
}
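
For comparison, the same conversion can be sketched with std::string, which removes the need to size the output buffer in advance. This is an illustrative variant, not the author's code: the names Utf8ToHtmlString and AppendReference are hypothetical, only numeric references are emitted (no named entities), and malformed or truncated bytes are copied through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append "&#<code>;" to the output.
static void AppendReference(std::string& out, unsigned long code)
{
    assert(code <= 0x1FFFFF);   // 21 bits is the most a 4-byte sequence can carry
    out += "&#" + std::to_string(code) + ";";
}

// Hypothetical std::string variant of UTF8ToHtml (an assumption for illustration).
std::string Utf8ToHtmlString(const std::string& in)
{
    std::string out;
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len = 1;          // number of bytes in this sequence
        unsigned long code = 0;
        if ((b & 0xF8) == 0xF0)      { len = 4; code = b & 0x07; }
        else if ((b & 0xF0) == 0xE0) { len = 3; code = b & 0x0F; }
        else if ((b & 0xE0) == 0xC0) { len = 2; code = b & 0x1F; }

        // A multi-byte sequence is usable only if all continuation bytes are present.
        bool ok = (len > 1) && (i + len <= n);
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;      // not a 10xxxxxx byte
            else code = (code << 6) | (c & 0x3F);
        }

        if (ok) {
            AppendReference(out, code);
            i += len;
        } else {
            out += in[i];             // ASCII, or a byte we cannot decode: copy as-is
            ++i;
        }
    }
    return out;
}
```

Because ASCII bytes pass through untouched, markup such as <p> in the clipboard fragment stays markup; only the non-ASCII characters are rewritten.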
LRESULT CDialog::OnClickedPastehtml( WORD wNotifyCode, 
                                     WORD wID,
                                     HWND hWndCtl, 
                                     BOOL& bHandled)
{
  UINT uHtmlFormat = RegisterClipboardFormat("HTML Format");
  UINT uFormat = uHtmlFormat;
  if( IsClipboardFormatAvailable( uHtmlFormat ) == FALSE )
  {
    if( IsClipboardFormatAvailable( CF_TEXT ) == FALSE )
      return 0;
    uFormat = CF_TEXT;
  }
  // Open the clipboard only after a usable format is known to exist,
  // so that no return path leaves it open.
  if (!OpenClipboard() )
    return 0;
  
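  HGLOBAL   hglb;
  LPSTR     lpstr;   // both formats hold 8-bit text (UTF-8 or ANSI), never wide characters
  hglb = GetClipboardData(uFormat);
  if (hglb != NULL) 
  {
    lpstr = (LPSTR)GlobalLock(hglb); 
    if (lpstr != NULL) 
    {
      char *ptr1 = strstr( lpstr, "<!--StartFragment-->");
      char *ptr2 = ptr1 ? strstr( ptr1, "<!--EndFragment-->") : NULL;
      if( ptr1 != NULL && ptr2 != NULL )
      {
        ptr1 += 20;   // skip the "<!--StartFragment-->" marker itself
        int iSize = (int)(ptr2 - ptr1);
        // Worst case: a 2-byte sequence can grow to 7 characters (e.g. "&#2047;"),
        // so reserve 4 output bytes per input byte plus the terminator.
        char * tmp = (char*)malloc( iSize * 4 + 1 );
        if( tmp != NULL )
        {
          UTF8ToHtml((BYTE*)ptr1, iSize, tmp );
          SetDlgItemText(IDC_TEXT, tmp );
          free( tmp );
        }
      }
      else
        SetDlgItemText(IDC_TEXT, lpstr );
      GlobalUnlock(hglb); 
    }
  }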
  CloseClipboard(); 
  return 0;
}
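
The handler above locates the fragment by searching for the comment markers. The HTML clipboard format also begins with a plain-text header whose StartFragment: and EndFragment: fields give the byte offsets of the fragment, counted from the start of the clipboard data. The following is a minimal sketch of reading those offsets instead; ExtractFragment and ReadOffset are hypothetical names, and the function returns an empty string when the header is missing or inconsistent.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: read the decimal number that follows `key` in the header.
// Returns -1 if the key is not present.
static long ReadOffset(const std::string& data, const std::string& key)
{
    assert(!key.empty());
    const std::string::size_type pos = data.find(key);
    if (pos == std::string::npos)
        return -1;
    return std::strtol(data.c_str() + pos + key.size(), nullptr, 10);
}

// Hypothetical helper: return the bytes between the StartFragment and
// EndFragment offsets of an "HTML Format" clipboard buffer.
std::string ExtractFragment(const std::string& data)
{
    const long begin = ReadOffset(data, "StartFragment:");
    const long end   = ReadOffset(data, "EndFragment:");
    if (begin < 0 || end < begin || static_cast<unsigned long>(end) > data.size())
        return std::string();
    return data.substr(begin, end - begin);
}
```

The result is still UTF-8, so it would be passed through UTF8ToHtml in the same way as the marker-based fragment.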



Comments

  • Unicode Build

    Posted by Legacy on 02/06/2004 12:00am

    Originally posted by: Andy Tang

    This line
    lptstr = (LPTSTR)GlobalLock(hglb);

    should be
    lpstr = (LPSTR)GlobalLock(hglb);

    where
    LPTSTR lptstr;
    becomes
    LPSTR lpstr;

    This is because the data returned by GlobalLock here is ALWAYS the non-wide-char version. In a Unicode build, LPTSTR will treat it as wide char when it isn't, so you must explicitly use the non-wide-char type.

    Correct me if I'm wrong. :)

  • Create Text File in UTF-8

    Posted by Legacy on 01/20/2004 12:00am

    Originally posted by: shanmuk

    Any idea how to create a text file in UTF-8 format?

  • How to Drag and Drop Hyperlinks

    Posted by Legacy on 09/04/2003 12:00am

    Originally posted by: Agent of FBI

    How to Drag and Drop Hyperlinks (like in WebZip)?

  • UTF-8 to HTML

    Posted by Legacy on 07/30/2003 12:00am

    Originally posted by: Gail

    I need to convert my MSN 8 Favorites in UTF-8 to HTML so that they will mesh with Internet Explorer Favorites. I cannot find a converter to do this. Any ideas?
    

  • How to paste a HTML page ( graphic and text) ?

    Posted by Legacy on 06/22/2003 12:00am

    Originally posted by: Holm

    Hi,

    Can someone explain how to paste my HTML-formatted page, including text and a picture, into the clipboard?

    Text only is simple, a picture is simple too.

    Both as a page ????

    Thanks for your help.

    Holm

  • Drag-drop problem (not HTML)

    Posted by Legacy on 04/20/2003 12:00am

    Originally posted by: rajas

    Can someone tell me how I can debug my drag-drop implementation? Obviously I cannot step through my code, and usually I just manage to 'hang' either my application or MSVC6.
    Secondarily, I have drag-drop working between different windows in my application; but I am not able to drag items on to other applications (say MSWord). However, I can use the clipboard! I use the same code to create the datasource (am using COleDataSource ) - in one instance it is SetClipboard and in another it is DoDragDrop. This has been rather frustrating.

  • Thanks

    Posted by Legacy on 08/04/2002 12:00am

    Originally posted by: Paul Selormey

    I have just used it :-)

    Thanks so much for sharing it.

    Best regards,
    Paul.

  • Code is incorrect due HTML formatting.

    Posted by Legacy on 08/02/2002 12:00am

    Originally posted by: George Ter-Saakov

    Hi guys. You are right, I gave an incorrect definition of UTF-8. But the article was about how to convert UTF-8 to HTML, so I did not concentrate on the definition of UTF-8. Sorry about that.

    Also, due to the HTML formatting, the code is shown incorrectly.
    For example, "&nbsp;" is shown as "& ".
    Here is the correct code for the switch statement in UTF8ToHtml:

    switch(code)
    {
    case 160:
      strcpy(s, "&nbsp;");
      break;
    case 34:
      strcpy(s, "&quot;");
      break;
    case 38:
      strcpy( s, "&amp;");
      break;
    case 60:
      strcpy( s, "&lt;");
      break;
    case 62:
      strcpy( s, "&gt;");
      break;
    default:
      sprintf( s, "&#%d;", code );
      break;
    }


  • That's not UTF-8!

    Posted by Legacy on 07/23/2002 12:00am

    Originally posted by: David Piepgrass

    Character codes such as &nbsp; or &whatever; are HTML-specific and have nothing to do with UTF-8. UTF-8 is a way of encoding Unicode characters in an 8-bit stream by translating non-ASCII characters (characters above 0x007F) into a series of 2 or more non-ASCII bytes between 0x80 and 0xFF. HTML can represent characters above 0x007F using entirely ASCII characters, therefore it is NOT UTF-8!

  • HTML Buffer

    Posted by Legacy on 07/11/2002 12:00am

    Originally posted by: VC

    Does anyone know how to show a (non-persisted) buffer in an HTML view?
    Details: I have a CHtmlView-derived view in which I want to show an HTML page that is generated dynamically by XML/XSLT.

    Any clue is appreciated.
    Thanks
