How to Paste HTML

Environment: Win32

I was working on a project where I had to paste HTML, copied from the browser, into a text box. A quick search for "HTML Clipboard Format" in MSDN turns up an article that thoroughly explains how HTML is kept on the Clipboard. Unfortunately, that article only tells you that the text is stored in UTF-8; it does not explain how to turn those UTF-8 bytes back into plain ASCII HTML. So I had to do some research on my own.

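UTF-8 is an encoding that stores each Unicode character as a sequence of one to four bytes. ASCII characters (codes below 128) are stored unchanged as a single byte; every other character becomes a lead byte followed by one to three continuation bytes, all with the high bit set. HTML, in turn, lets you write any Unicode character in pure ASCII text by embedding a special token, &#code;, into the text, where code is the Unicode code (in decimal format) for the symbol. For some symbols there are special names. For example, "&#160;" (the non-breaking space) can also be written as "&nbsp;". You can jump to the specification if you need more examples.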
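
To make the difference between the two representations concrete, here is a minimal sketch that turns one Unicode code into its UTF-8 bytes and into its &#code; token. EncodeUtf8 and FormatCharRef are hypothetical helper names used only for this illustration; they are not part of the article's code, and the sketch assumes a compiler that provides snprintf.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper, not part of the article's code.
// Encodes one Unicode code point as UTF-8. Writes 1 to 4 bytes into out
// and returns the number of bytes written, or 0 if cp cannot be encoded.
static int EncodeUtf8(unsigned long cp, unsigned char out[4])
{
  assert(out != NULL);
  if (cp < 0x80) {                       // 0xxxxxxx: ASCII, stored as is
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {                      // 110xxxxx 10xxxxxx
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {                    // 1110xxxx 10xxxxxx 10xxxxxx
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return 0;                          // surrogates are not characters
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {                  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;                              // beyond the Unicode range
}

// Hypothetical helper: writes the HTML numeric token for cp into out
// and returns the length reported by snprintf.
static int FormatCharRef(unsigned long cp, char *out, size_t outSize)
{
  return snprintf(out, outSize, "&#%lu;", cp);
}
```

For the non-breaking space (code 160), EncodeUtf8 produces the two bytes C2 A0, while FormatCharRef produces the six ASCII characters &#160;. Converting clipboard data to HTML means going from the first form to the second.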

Here is a UTF8ToHtml function, which converts UTF-8 text to ASCII HTML by replacing every multi-byte sequence with the corresponding token. The algorithm is not explained, but you can read more about it here.

//utf8 - pointer to the UTF-8 text; dwSize - size of the UTF-8 text in bytes; ptr - pointer to the output buffer, which the caller must make large enough.

//OnClickedPastehtml is the handler for the BN_CLICKED event of the button in the dialog box. IDC_TEXT is the multiline text box.

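void UTF8ToHtml(BYTE *utf8, DWORD dwSize, CHAR *ptr )
{
  // Worst case is 3.5 output chars per input byte (a 2-byte sequence
  // such as U+07FF becomes "&#2047;"), so the output buffer must hold
  // at least dwSize * 4 + 1 chars.
  BYTE *end = utf8 + dwSize;
  while( utf8 < end )
  {
    int code = 0;    // decoded Unicode code
    int extra = 0;   // continuation bytes expected after the lead byte

    if( (*utf8 & 0xF8) == 0xF0 )        // 11110xxx: 4-byte sequence
    {
      code = *utf8 & 0x07;
      extra = 3;
    }
    else if( (*utf8 & 0xF0) == 0xE0 )   // 1110xxxx: 3-byte sequence
    {
      code = *utf8 & 0x0F;
      extra = 2;
    }
    else if( (*utf8 & 0xE0) == 0xC0 )   // 110xxxxx: 2-byte sequence
    {
      code = *utf8 & 0x1F;
      extra = 1;
    }

    if( extra == 0 || (end - utf8) <= extra )
    {
      // ASCII, a stray byte, or a sequence cut off by the end of the
      // input: copy the byte unchanged.
      *ptr++ = *utf8++;
      continue;
    }

    // Each continuation byte (10xxxxxx) contributes its low six bits.
    for( int i = 1; i <= extra; i++ )
      code = (code << 6) | (utf8[i] & 0x3F);
    utf8 += extra + 1;

    char s[16];   // "&#1114111;" needs 10 chars plus the terminator
    switch(code)
    {
    case 160:
      strcpy(s, "&nbsp;");
      break;
    // ASCII bytes are copied unchanged above, so the next four cases
    // are reached only for overlong (non-standard) encodings.
    case 34:
      strcpy(s, "&quot;");
      break;
    case 38:      // '&' is code 38 (36 is '$')
      strcpy( s, "&amp;");
      break;
    case 60:
      strcpy( s, "&lt;");
      break;
    case 62:
      strcpy( s, "&gt;");
      break;
    default:
      sprintf( s, "&#%d;", code );
      break;
    }
    strcpy( ptr, s );
    ptr += strlen(s);
  }
  *ptr = 0;
}
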
LRESULT CDialog::OnClickedPastehtml( WORD wNotifyCode, 
                                     WORD wID,
                                     HWND hWndCtl, 
                                     BOOL& bHandled)
{
  // Note: this sample assumes an ANSI (non-Unicode) build.
  if (!OpenClipboard() )
    return 0; 
  UINT uHtmlFormat = RegisterClipboardFormat("HTML Format");
  UINT uFormat = uHtmlFormat;
  if( IsClipboardFormatAvailable( uHtmlFormat ) == FALSE )
  {
    if( IsClipboardFormatAvailable( CF_TEXT ) == FALSE )
    {
      CloseClipboard();   // do not leave the clipboard open on early exit
      return 0;
    }
    uFormat = CF_TEXT;
  }
  
  // Both "HTML Format" and CF_TEXT hold 8-bit text, so the data is
  // accessed through LPSTR rather than LPTSTR.
  HGLOBAL   hglb;
  LPSTR     lpstr;
  hglb = GetClipboardData(uFormat);
  if (hglb != NULL) 
  {
    lpstr = (LPSTR)GlobalLock(hglb); 
    if (lpstr != NULL) 
    {
      char *ptr1 = strstr( lpstr, "<!--StartFragment-->");
      char *ptr2 = (ptr1 != 0) ? strstr( ptr1, "<!--EndFragment-->") : 0;
      if( ptr1 != 0 && ptr2 != 0 )
      {
        ptr1 += 20;   // skip the "<!--StartFragment-->" marker itself
        int iSize = (int)(ptr2 - ptr1);
        // A 2-byte UTF-8 sequence can expand to 7 chars ("&#2047;"),
        // so reserve 4 chars per input byte plus the terminator.
        char * tmp = (char*)_alloca( iSize * 4 + 1 );
        UTF8ToHtml((BYTE*)ptr1, iSize, tmp );
        SetDlgItemText(IDC_TEXT, tmp );
      }
      else
        SetDlgItemText(IDC_TEXT, lpstr );
      GlobalUnlock(hglb); 
    }
  }
  CloseClipboard(); 
  return 0;
}
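
The handler above locates the fragment by searching for the <!--StartFragment--> comment. The HTML Clipboard Format also begins with a short text header whose StartFragment: and EndFragment: fields give the same positions as byte offsets from the start of the clipboard data. The following is only a sketch of reading those two fields; ReadHeaderOffset and FindFragment are hypothetical names introduced for this illustration, and the code assumes the header comes first in the buffer, so the first match of each key is the header field.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns the decimal number that follows key
// (for example "StartFragment:") in data, or -1 if it is missing.
static long ReadHeaderOffset(const char *data, const char *key)
{
  assert(data != NULL && key != NULL);
  const char *p = strstr(data, key);
  if (p == NULL)
    return -1;
  p += strlen(key);
  char *endp = NULL;
  long value = strtol(p, &endp, 10);
  if (endp == p)          // no digits after the key
    return -1;
  return value;
}

// Hypothetical helper: uses the header offsets to find the fragment.
// On success stores a pointer to its first byte and its length in bytes.
static bool FindFragment(const char *data, size_t dataLen,
                         const char **start, size_t *len)
{
  long s = ReadHeaderOffset(data, "StartFragment:");
  long e = ReadHeaderOffset(data, "EndFragment:");
  if (s < 0 || e < s || (size_t)e > dataLen)
    return false;         // missing or inconsistent offsets
  *start = data + s;
  *len = (size_t)(e - s);
  return true;
}
```

In the handler, the returned pointer and length could be passed to UTF8ToHtml in place of ptr1 and iSize, keeping the comment search as a fallback when the header fields are missing.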



Comments

  • Unicode Build

    Posted by Legacy on 02/06/2004 12:00am

    Originally posted by: Andy Tang

    This line
    lptstr = (LPTSTR)GlobalLock(hglb);

    should be
    lpstr = (LPSTR)GlobalLock(hglb);

    where
    LPTSTR lptstr;
    becomes
    LPSTR lpstr;

    This is because GlobalLock will ALWAYS return the non-wide-char version of the text here. In a Unicode build of the application, LPTSTR will force it to be wide char when it isn't, so you must explicitly force it to non-wide char.

    Correct me if I'm wrong. :)

  • Create Text File in UTF-8

    Posted by Legacy on 01/20/2004 12:00am

    Originally posted by: shanmuk

    How do I create a text file in UTF-8 format? Any ideas?

  • How to Drag and Drop Hyperlinks

    Posted by Legacy on 09/04/2003 12:00am

    Originally posted by: Agent of FBI

    How to Drag and Drop Hyperlinks (like in WebZip)?

  • UTF-8 to HTML

    Posted by Legacy on 07/30/2003 12:00am

    Originally posted by: Gail

    I need to convert my MSN 8 Favorites in UTF-8 to HTML so that they will mesh with Internet Explorer Favorites.  I can not find a converter to do this.  Any ideas?
    

  • How to paste a HTML page ( graphic and text) ?

    Posted by Legacy on 06/22/2003 12:00am

    Originally posted by: Holm

    Hi,

    Can someone explain how to paste my HTML-formatted page, including text and a picture, into the clipboard?

    Text only is simple, a picture is simple too.

    Both as a page ????

    Thanks for your help.

    Holm

  • Drag-drop problem (not HTML)

    Posted by Legacy on 04/20/2003 12:00am

    Originally posted by: rajas

    Can someone tell me how I can debug my drag-drop implementation? Obviously I cannot step through my code, and usually I just manage to 'hang' either my application or MSVC6.
    Secondarily, I have drag-drop working between different windows in my application; but I am not able to drag items on to other applications (say MSWord). However, I can use the clipboard! I use the same code to create the datasource (am using COleDataSource ) - in one instance it is SetClipboard and in another it is DoDragDrop. This has been rather frustrating.

  • Thanks

    Posted by Legacy on 08/04/2002 12:00am

    Originally posted by: Paul Selormey

    I have just used it :-)

    Thanks so much for sharing it.

    Best regards,
    Paul.

  • Code is incorrect due HTML formatting.

    Posted by Legacy on 08/02/2002 12:00am

    Originally posted by: George Ter-Saakov

    Hi guys. You are right, I gave an incorrect definition of UTF-8. But the article was about how to convert UTF-8 to HTML, so I did not concentrate on the definition of UTF-8. Sorry about that.

    Also, due to the HTML formatting, the code is shown incorrectly.
    The "&nbsp;" string, for example, is shown as "& ".
    Here is the correct code for the switch statement in UTF8ToHtml:

    switch(code)
    {
    case 160:
    strcpy(s, "&nbsp;");
    break;
    case 34:
    strcpy(s, "&quot;");
    break;
    case 38:
    strcpy( s, "&amp;");
    break;
    case 60:
    strcpy( s, "&lt;");
    break;
    case 62:
    strcpy( s, "&gt;");
    break;
    default:
    sprintf( s, "&#%d;", code );
    break;
    }


  • That's not UTF-8!

    Posted by Legacy on 07/23/2002 12:00am

    Originally posted by: David Piepgrass

    Character codes such as &nbsp; or &whatever; are HTML-specific and have nothing to do with UTF-8. UTF-8 is a way of encoding Unicode characters in an 8-bit stream by translating non-ASCII characters (characters above 0x007F) into a series of 2 or more non-ASCII bytes between 0x80 and 0xFF. HTML can represent characters above 0x007F using entirely ASCII characters, therefore it is NOT UTF-8!

  • HTML Buffer

    Posted by Legacy on 07/11/2002 12:00am

    Originally posted by: VC

    Does anyone know how to show a buffer (non-persisted) in an HTML view?
    Details: I have a CHtmlView-derived view in which I want to show an HTML page that is generated dynamically by XML/XSLT.

    Any clue is appreciated.
    Thanks
