How to Paste HTML

Environment: Win32

I was working on a project where I had to paste HTML, copied from the browser, into a text box. A quick search for "HTML Clipboard Format" in MSDN turns up an article that thoroughly explains how HTML is kept in the Clipboard. Unfortunately, this article tells you that it's kept in UTF-8 format without explaining how to convert from UTF-8 back to HTML. So I had to do some research on my own.
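For orientation: the data stored under the "HTML Format" clipboard format begins with a short ASCII header of "Key:number" lines (Version, StartHTML, EndHTML, StartFragment, EndFragment), where the numbers are byte offsets from the start of the data. Below is a minimal sketch of reading the fragment through those offsets. The helper names HeaderOffset and ExtractFragment are hypothetical, and the sketch assumes the whole clipboard block has already been copied into a std::string.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the number that follows "key:" in the
// clipboard header, or -1 if the key is not present.
static long HeaderOffset(const std::string& data, const std::string& key)
{
    const std::string token = key + ":";
    const std::string::size_type pos = data.find(token);
    if (pos == std::string::npos)
        return -1;
    // The offsets are zero-padded decimal numbers, e.g. "00000079".
    return std::strtol(data.c_str() + pos + token.size(), nullptr, 10);
}

// Hypothetical helper: returns the bytes between the StartFragment and
// EndFragment offsets, or an empty string if the header is missing or invalid.
static std::string ExtractFragment(const std::string& data)
{
    const long start = HeaderOffset(data, "StartFragment");
    const long end = HeaderOffset(data, "EndFragment");
    if (start < 0 || end < start ||
        static_cast<std::string::size_type>(end) > data.size())
        return std::string();
    return data.substr(start, end - start);
}
```

Searching for the <!--StartFragment--> and <!--EndFragment--> comments, as the code in this article does, gives the same range when the markers are present; the offsets are simply the other way to find it.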

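UTF-8 is an encoding that stores each Unicode character as one to four bytes: ASCII characters (up to 0x7F) are stored as a single byte, and every other character becomes a sequence of two to four bytes in the range 0x80 to 0xFF. HTML, on the other hand, can represent any Unicode character in plain ASCII text by embedding a special token, &#code;, into the text, where the code is the Unicode code (in decimal format) for the symbol. For some symbols there are special names. For example, "&#160;" is "&nbsp;"... You can jump to the specification if you need more examples.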
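To make the two representations concrete, here is a minimal sketch that writes the same code point once as UTF-8 bytes and once as an ASCII-only HTML numeric reference. EncodeUtf8 and NumericReference are hypothetical helper names, and the input is assumed to be a valid code point (0 to 0x10FFFF).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
static std::string EncodeUtf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {                 // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: the same code point as an HTML numeric reference.
static std::string NumericReference(unsigned long cp)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "&#%lu;", cp);
    return buf;
}
```

For the non-breaking space (code 160), EncodeUtf8 yields the two bytes 0xC2 0xA0, while NumericReference yields the six ASCII characters &#160;. Converting clipboard data for display is this mapping run in the opposite direction: decode the bytes, then print the reference.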

Here is a UTF8ToHtml function, which converts from UTF-8 to HTML. The algorithm is not explained, but you can read more about it here.

// utf8 - pointer to the UTF-8 text; dwSize - size of the UTF-8 text in bytes; ptr - pointer to the output buffer.

// OnClickedPastehtml is the handler for the BN_CLICKED event of the button in the dialog box. IDC_TEXT is the multiline text box.

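// Converts UTF-8 text to ASCII-only HTML. ASCII bytes are copied unchanged
// (so the markup in the fragment survives); every multi-byte character
// becomes a named entity or a numeric character reference.
// The output buffer must hold at least dwSize * 4 + 1 characters.
void UTF8ToHtml(BYTE *utf8, DWORD dwSize, CHAR *ptr )
{
  BYTE *end = utf8 + dwSize;
  while( utf8 < end )
  {
    int code = 0;
    int extra = 0;   // number of continuation bytes after the lead byte
    if( (*utf8 & 0xF8) == 0xF0 )        // 11110xxx: 4-byte sequence
    {
      code = *utf8 & 0x07;
      extra = 3;
    }
    else if( (*utf8 & 0xF0) == 0xE0 )   // 1110xxxx: 3-byte sequence
    {
      code = *utf8 & 0x0F;
      extra = 2;
    }
    else if( (*utf8 & 0xE0) == 0xC0 )   // 110xxxxx: 2-byte sequence
    {
      code = *utf8 & 0x1F;
      extra = 1;
    }

    if( extra == 0 || utf8 + extra >= end )
    {
      // ASCII, a stray continuation byte, or a truncated sequence:
      // copy the byte as is.
      *ptr++ = *utf8++;
      continue;
    }

    // Each continuation byte (10xxxxxx) contributes six more bits.
    for( int i = 1; i <= extra; i++ )
      code = (code << 6) | (utf8[i] & 0x3F);
    utf8 += extra + 1;

    char s[16];   // large enough for the longest reference, "&#2097151;"
    switch(code)
    {
    case 160:
      strcpy( s, "&nbsp;" );
      break;
    // The ASCII cases below are reached only for overlong (non-canonical)
    // encodings; plain ASCII bytes were already copied unchanged above.
    case 34:
      strcpy( s, "&quot;" );
      break;
    case 38:
      strcpy( s, "&amp;" );
      break;
    case 60:
      strcpy( s, "&lt;" );
      break;
    case 62:
      strcpy( s, "&gt;" );
      break;
    default:
      sprintf( s, "&#%d;", code );
      break;
    }
    strcpy( ptr, s );
    ptr += strlen(s);
  }
  *ptr = 0;
}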
LRESULT CDialog::OnClickedPastehtml( WORD wNotifyCode, 
                                     WORD wID,
                                     HWND hWndCtl, 
                                     BOOL& bHandled)
{
  if (!OpenClipboard() )
    return 0; 
  // The "A" functions are used explicitly because the clipboard data is
  // 8-bit text, even in a Unicode build.
  UINT uHtmlFormat = RegisterClipboardFormatA("HTML Format");
  UINT uFormat = uHtmlFormat;
  if( IsClipboardFormatAvailable( uHtmlFormat ) == FALSE )
  {
    if( IsClipboardFormatAvailable( CF_TEXT ) == FALSE )
    {
      CloseClipboard();   // do not leave the clipboard open on the early exit
      return 0;
    }
    uFormat = CF_TEXT;
  }
  
  HGLOBAL   hglb;
  LPSTR     lpstr;
  hglb = GetClipboardData(uFormat);
  if (hglb != NULL) 
  {
    lpstr = (LPSTR)GlobalLock(hglb); 
    if (lpstr != NULL) 
    {
      char *ptr1 = strstr( lpstr, "<!--StartFragment-->");
      char *ptr2 = ptr1 ? strstr( ptr1, "<!--EndFragment-->") : NULL;
      if( ptr1 != 0 && ptr2 != 0 )
      {
        ptr1 += 20;   // skip the "<!--StartFragment-->" marker itself
        int iSize = (int)(ptr2 - ptr1);
        // Worst case: a 2-byte character becomes a 7-character reference
        // such as "&#2047;", so reserve four characters per input byte.
        char * tmp = (char*)_alloca( iSize * 4 + 1 );
        UTF8ToHtml((BYTE*)ptr1, iSize, tmp );
        ::SetDlgItemTextA(m_hWnd, IDC_TEXT, tmp );
      }
      else
        ::SetDlgItemTextA(m_hWnd, IDC_TEXT, lpstr );
      GlobalUnlock(hglb); 
    }
  }
  CloseClipboard(); 
  return 0;
}



Comments

  • Unicode Build

    Posted by Legacy on 02/06/2004 12:00am

    Originally posted by: Andy Tang

    This line
    lptstr = (LPTSTR)GlobalLock(hglb);

    should be
    lpstr = (LPSTR)GlobalLock(hglb);

    where
    LPTSTR lptstr;
    becomes
    LPSTR lpstr;

    This is because the data that GlobalLock points to here is ALWAYS non-wide (8-bit) text. In a Unicode build, LPTSTR treats it as wide-character text when it isn't, so you must explicitly force it to the non-wide type.

    Correct me if I'm wrong. :)

  • Create Text File in UTF-8

    Posted by Legacy on 01/20/2004 12:00am

    Originally posted by: shanmuk

    How do I create a text file in UTF-8 format? Any ideas?

  • How to Drag and Drop Hyperlinks

    Posted by Legacy on 09/04/2003 12:00am

    Originally posted by: Agent of FBI

    How to Drag and Drop Hyperlinks (like in WebZip)?

  • UTF-8 to HTML

    Posted by Legacy on 07/30/2003 12:00am

    Originally posted by: Gail

    I need to convert my MSN 8 Favorites in UTF-8 to HTML so that they will mesh with Internet Explorer Favorites.  I cannot find a converter to do this.  Any ideas?

  • How to paste a HTML page ( graphic and text) ?

    Posted by Legacy on 06/22/2003 12:00am

    Originally posted by: Holm

    Hi,

    Can someone explain how to paste my HTML-formatted page, including text and a picture, into the clipboard?

    Text only is simple, a picture is simple too.

    Both as a page ????

    Thanks for your help.

    Holm

  • Drag-drop problem (not HTML)

    Posted by Legacy on 04/20/2003 12:00am

    Originally posted by: rajas

    Can someone tell me how I can debug my drag-drop implementation? Obviously I cannot step through my code, and usually I just manage to 'hang' either my application or MSVC6.
    Secondarily, I have drag-drop working between different windows in my application; but I am not able to drag items on to other applications (say MSWord). However, I can use the clipboard! I use the same code to create the datasource (am using COleDataSource ) - in one instance it is SetClipboard and in another it is DoDragDrop. This has been rather frustrating.

  • Thanks

    Posted by Legacy on 08/04/2002 12:00am

    Originally posted by: Paul Selormey

    I have just used it :-)

    Thanks so much for sharing it.

    Best regards,
    Paul.

  • Code is incorrect due to HTML formatting.

    Posted by Legacy on 08/02/2002 12:00am

    Originally posted by: George Ter-Saakov

    Hi guys. You are right, I gave an incorrect definition of UTF-8. But the article was about how to convert UTF-8 to HTML, so I did not concentrate on the definition of UTF-8. Sorry about that.

    Also, due to the HTML formatting, the code is shown incorrectly.
    The "&nbsp;" string is shown as "& ".
    Here is the correct code for the switch statement in UTF8ToHtml:

    switch(code)
    {
    case 160:
    strcpy(s, "&nbsp;");
    break;
    case 34:
    strcpy(s, "&quot;");
    break;
    case 38:
    strcpy( s, "&amp;");
    break;
    case 60:
    strcpy( s, "&lt;");
    break;
    case 62:
    strcpy( s, "&gt;");
    break;
    default:
    sprintf( s, "&#%d;", code );
    break;
    }


  • That's not UTF-8!

    Posted by Legacy on 07/23/2002 12:00am

    Originally posted by: David Piepgrass

    Character codes such as &nbsp; or &whatever; are HTML-specific and have nothing to do with UTF-8. UTF-8 is a way of encoding UNICODE characters in an 8-bit stream by translating non-ASCII characters (characters above 0x007F) into a series of 2 or more non-ASCII characters between 0x80 and 0xFF. HTML can represent characters above 0x007F using entirely ASCII characters, therefore it is NOT UTF-8!

  • HTML Buffer

    Posted by Legacy on 07/11/2002 12:00am

    Originally posted by: VC

    Does anyone know how to show a (non-persisted) buffer in an HTML view?
    Details: I have a CHtmlView-derived view in which I want to show an HTML page that is generated dynamically by XML/XSLT.

    Any clue is appreciated.
    Thanks
