How to Paste HTML
Environment: Win 32
I was working on a project where I had to paste HTML, copied from the browser, into a text box. A quick search for "HTML Clipboard Format" in MSDN turns up an article that thoroughly explains how HTML is kept in the clipboard. Unfortunately, the article says only that the text is kept in UTF-8 format, without explaining how to convert from UTF-8 back to HTML. So I had to do some research on my own.
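UTF-8 is an encoding that stores each Unicode character as a sequence of one to four bytes: ASCII characters stay as single bytes, and every other character becomes a lead byte followed by one to three continuation bytes in the range 0x80 to 0xBF. HTML, for its part, allows using Unicode characters in ASCII text by embedding a special token, &#code;, into the text, where the code is the Unicode code (in decimal format) for the symbol. For some symbols there are special names. For example, "&nbsp;" is "&#160;"... You can jump to the specification if you need more examples.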
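To make the two representations concrete, here is a minimal sketch in portable C++ that produces both forms for a single code point. The helper names EncodeUtf8 and NumericReference are made up for this example, and the sketch assumes it is given a valid code point no larger than 0x10FFFF.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid code point (no larger than 0x10FFFF).
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: the ASCII-only HTML form of the same code point.
std::string NumericReference(std::uint32_t cp)
{
    return "&#" + std::to_string(cp) + ";";
}
```

For code point 160 (the no-break space), EncodeUtf8 yields the two bytes 0xC2 0xA0, while NumericReference yields the six ASCII characters of the decimal token.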
Here is a UTF8ToHtml function, which converts from UTF-8 to HTML. The algorithm is not explained, but you can read more about it here.
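//utf8 - pointer to the UTF-8 text. dwSize - size of the UTF-8 text in bytes. ptr - pointer to the output buffer,
//which should hold at least 4 * dwSize + 1 characters, because one multi-byte character can expand to a longer token.
//OnClickedPastehtml is the handler for the BN_CLICKED event of the button in the dialog box. IDC_TEXT is the multiline text box.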
void UTF8ToHtml(BYTE *utf8, DWORD dwSize, CHAR *ptr )
{
    int code;
    BYTE *end = utf8 + dwSize;
    while( utf8 < end )
    {
        code = 0;
        if( (*utf8 & 0xF8) == 0xF0 && end - utf8 > 3 )
        {
            // 4-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            code = ((*utf8 & 0x07) << 18) | ((*(utf8+1) & 0x3F) << 12)
                 | ((*(utf8+2) & 0x3F) << 6) | (*(utf8+3) & 0x3F);
            utf8 += 3;
        }
        else if( (*utf8 & 0xF0) == 0xE0 && end - utf8 > 2 )
        {
            // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
            code = ((*utf8 & 0x0F) << 12) | ((*(utf8+1) & 0x3F) << 6)
                 | (*(utf8+2) & 0x3F);
            utf8 += 2;
        }
        else if( (*utf8 & 0xE0) == 0xC0 && end - utf8 > 1 )
        {
            // 2-byte sequence: 110xxxxx 10xxxxxx
            code = ((*utf8 & 0x1F) << 6) | (*(utf8+1) & 0x3F);
            utf8 += 1;
        }
        if( code == 0 )
        {
            // ASCII or unrecognized byte: copy it unchanged
            *ptr = *utf8;
        }
        else
        {
            char s[16]; // large enough for "&#2097151;" plus the terminator
            switch(code)
            {
            case 160:
                strcpy( s, "&nbsp;" );
                break;
            case 34:
                strcpy( s, "&quot;" );
                break;
            case 38:
                strcpy( s, "&amp;" );
                break;
            case 60:
                strcpy( s, "&lt;" );
                break;
            case 62:
                strcpy( s, "&gt;" );
                break;
            default:
                sprintf( s, "&#%d;", code );
                break;
            }
            strcpy( ptr, s );
            ptr += strlen(s)-1;
        }
        utf8++;
        ptr++;
    }
    *ptr = 0;
}
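As a point of comparison, the same conversion can be sketched with std::string, so that the output grows as needed and malformed input is never read past the end of the buffer. This is only a sketch under stated assumptions: the name Utf8ToHtmlChecked is made up for this example, invalid or truncated sequences are replaced with '?', and overlong encodings are not rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical variant of UTF8ToHtml: ASCII bytes pass through unchanged,
// every multi-byte character becomes &nbsp; or a decimal &#code; token.
std::string Utf8ToHtmlChecked(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long code = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { code = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { code = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { code = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { code = lead & 0x07; extra = 3; }
        else { out += '?'; ++i; continue; } // stray continuation or invalid lead

        // Every continuation byte must exist and look like 10xxxxxx.
        bool ok = (i + extra < in.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else code = (code << 6) | (c & 0x3F);
        }
        if (!ok) { out += '?'; ++i; continue; }

        i += extra + 1;
        if (code < 0x80)      out += static_cast<char>(code);
        else if (code == 160) out += "&nbsp;";
        else                  out += "&#" + std::to_string(code) + ";";
    }
    return out;
}
```

Returning a std::string avoids having to guess the size of the output buffer in advance, which matters because a two-byte character can turn into a token of up to seven characters.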
LRESULT CDialog::OnClickedPastehtml( WORD wNotifyCode,
                                     WORD wID,
                                     HWND hWndCtl,
                                     BOOL& bHandled)
{
    if( !OpenClipboard() )
        return 0;
    UINT uHtmlFormat = RegisterClipboardFormat("HTML Format");
    UINT uFormat = uHtmlFormat;
    if( IsClipboardFormatAvailable( uHtmlFormat ) == FALSE )
    {
        if( IsClipboardFormatAvailable( CF_TEXT ) == FALSE )
        {
            CloseClipboard(); // do not leave the clipboard open on early exit
            return 0;
        }
        uFormat = CF_TEXT;
    }
    HGLOBAL hglb;
    LPSTR lpstr; // both formats hold narrow (8-bit) text
    hglb = GetClipboardData(uFormat);
    if( hglb != NULL )
    {
        lpstr = (LPSTR)GlobalLock(hglb);
        if( lpstr != NULL )
        {
            char *ptr1 = strstr( lpstr, "<!--StartFragment-->");
            char *ptr2 = (ptr1 != NULL) ? strstr( ptr1, "<!--EndFragment-->") : NULL;
            if( ptr1 != NULL && ptr2 != NULL )
            {
                ptr1 += 20; // skip the "<!--StartFragment-->" marker
                int iSize = (int)(ptr2 - ptr1);
                // A 2-byte character can expand to 7 output characters,
                // so reserve 4 per input byte plus the terminator.
                char *tmp = (char*)_alloca( iSize * 4 + 1 );
                UTF8ToHtml((BYTE*)ptr1, iSize, tmp );
                //memcpy(tmp, ptr1, iSize );
                //tmp[iSize] = 0;
                SetDlgItemText(IDC_TEXT, tmp );
            }
            else
                SetDlgItemText(IDC_TEXT, lpstr );
            GlobalUnlock(hglb);
        }
    }
    CloseClipboard();
    return 0;
}
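The handler above finds the fragment by searching for the comment markers. The "HTML Format" data also begins with a short text header whose StartFragment: and EndFragment: fields give the same boundaries as byte offsets from the start of the data. The following is a rough sketch of reading those offsets in portable C++. The helper names ReadOffset and ExtractFragment are made up for this example, and the sketch assumes the whole clipboard block has already been copied into a std::string.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: read the decimal number that follows `key`
// (for example "StartFragment:"). Returns false if the key is missing.
bool ReadOffset(const std::string& data, const std::string& key,
                std::size_t& value)
{
    std::size_t pos = data.find(key);
    if (pos == std::string::npos)
        return false;
    value = std::strtoul(data.c_str() + pos + key.size(), nullptr, 10);
    return true;
}

// Hypothetical helper: copy the bytes between the two header offsets.
bool ExtractFragment(const std::string& data, std::string& fragment)
{
    std::size_t start = 0, end = 0;
    if (!ReadOffset(data, "StartFragment:", start) ||
        !ReadOffset(data, "EndFragment:", end))
        return false;
    if (start > end || end > data.size())
        return false; // offsets do not fit the data
    fragment = data.substr(start, end - start);
    return true;
}
```

The extracted fragment is still UTF-8, so it would still go through UTF8ToHtml before being shown in the text box. When the header is missing, as with plain CF_TEXT data, ExtractFragment returns false and the caller can fall back to the raw text.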

Comments
Unicode Build
Posted by Legacy on 02/06/2004 12:00am
Originally posted by: Andy Tang
This line
lptstr = (LPTSTR)GlobalLock(hglb);
should be
lpstr = (LPSTR)GlobalLock(hglb);
where
LPTSTR lptstr;
becomes
LPSTR lpstr;
This is because the data GlobalLock returns here is ALWAYS non-wide-char text. In a Unicode build, LPTSTR will treat it as wide char when it isn't, so you must explicitly force it to non-wide char.
Correct me if I'm wrong. :)
Create Text File in UTF-8
Posted by Legacy on 01/20/2004 12:00am
Originally posted by: shanmuk
How do I create a text file in UTF-8 format? Any idea?
How to Drag and Drop Hyperlinks
Posted by Legacy on 09/04/2003 12:00am
Originally posted by: Agent of FBI
How to Drag and Drop Hyperlinks (like in WebZip)?
UTF-8 to HTML
Posted by Legacy on 07/30/2003 12:00am
Originally posted by: Gail
How to paste a HTML page ( graphic and text) ?
Posted by Legacy on 06/22/2003 12:00am
Originally posted by: Holm
Hi,
Can someone explain how to paste my HTML formatted page, including text and a picture, into the clipboard?
Text only is simple, a picture is simple too.
Both as a page ????
Thanks for your help.
Holm
Drag-drop problem (not HTML)
Posted by Legacy on 04/20/2003 12:00am
Originally posted by: rajas
Can someone tell me how I can debug my drag-drop implementation? Obviously i cannot step through my code - and usually I just manage to 'hang' either my application or MSVC6.
Secondarily, I have drag-drop working between different windows in my application; but I am not able to drag items on to other applications (say MSWord). However, I can use the clipboard! I use the same code to create the datasource (am using COleDataSource ) - in one instance it is SetClipboard and in another it is DoDragDrop. This has been rather frustrating.
Thanks
Posted by Legacy on 08/04/2002 12:00am
Originally posted by: Paul Selormey
I have just used it :-)
Thanks so much sharing it.
Best regards,
Paul.
Code is incorrect due to HTML formatting.
Posted by Legacy on 08/02/2002 12:00am
Originally posted by: George Ter-Saakov
Hi guys. You are right, I gave an incorrect definition of UTF-8. But the article was about how to convert UTF-8 to HTML, so I did not concentrate on the definition of UTF-8. Sorry about that.
Also, due to the HTML formatting, the code is shown incorrectly.
The "&nbsp;" is shown as "& ".
Here is the correct code for the switch statement in UTF8ToHtml:
switch(code)
{
case 160:
    strcpy( s, "&nbsp;" );
    break;
case 34:
    strcpy( s, "&quot;" );
    break;
case 38:
    strcpy( s, "&amp;" );
    break;
case 60:
    strcpy( s, "&lt;" );
    break;
case 62:
    strcpy( s, "&gt;" );
    break;
default:
    sprintf( s, "&#%d;", code );
    break;
}
That's not UTF-8!
Posted by Legacy on 07/23/2002 12:00am
Originally posted by: David Piepgrass
Character codings such as &#code; or &whatever; are HTML-specific and have nothing to do with UTF-8. UTF-8 is a way of encoding UNICODE characters in an 8-bit stream by translating non-ASCII characters (characters above 0x007F) into a series of 2 or more non-ASCII bytes between 0x80 and 0xFF. HTML can represent characters above 0x007F using entirely ASCII characters, therefore it is NOT UTF-8!
HTML Buffer
Posted by Legacy on 07/11/2002 12:00am
Originally posted by: VC
Does anyone know how to show a buffer(non persisted) in html view.
Details: I have CHtmlView derived view in which i wanted to show a html page which is generated dynamically by xml/xslt.
Any clue is appreciated.
Thanks