| C++ (Non Visual C++ Issues) Ask or answer C and C++ questions not related to Visual C++. This includes Console programming, Linux programming, or general ANSI C++. |
#1
In a nutshell: I have a wstring, but each consecutive pair of wchar_t needs to be converted to a single wchar_t, which is then the character code to use for display purposes.
How best to do the conversion? How did it end up like this? I am downloading foreign web pages that use a different charset for their content, although the HTML file itself is ASCII. So the browser would know to take two consecutive chars and convert them into a single wchar. I read the contents into a wstring, so each char is converted to a wchar_t. This much I do not want to change. Thanks in advance.
#2
Re: How to convert 2 wchar_t to 1 wchar_t
Could you attach a sample file that you are trying to read? And some code demonstrating how you are currently reading the file.
gg |
#3
I think it's me getting confused. I'm reading in and parsing a downloaded web page, but it uses the Cyrillic character set. The file itself is still ASCII, as all web pages are?!
As I understand it now, I need to switch code pages to map onto the Cyrillic characters. All this is very confusing until you know how. I was originally thinking there was a 2:1 mapping of characters when a web page's content is in something other than our character set. I think I was wrong: it is still 1:1, but I must use a code-page parameter when converting these strings. I'm taking a subset of this (Cyrillic) content and displaying it in a CListCtrl. It's just not happening yet.
#4
Re: How to convert 2 wchar_t to 1 wchar_t
The web page may be encoded in UTF-8 Unicode. That's not quite the same thing as using a code page.
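To make the difference concrete, here is a minimal sketch of UTF-8 encoding for a single code point. The function name `utf8_encode_bmp` is hypothetical and only covers the Basic Multilingual Plane. It shows that a Cyrillic letter such as U+042F occupies two bytes in UTF-8 (0xD0 0xAF), whereas a single-byte code page such as Windows-1251 stores it in one byte. That two-byte form may be where the impression of a "2:1 mapping" comes from.

```cpp
#include <string>

// Hypothetical helper: encode one code point from the Basic Multilingual
// Plane (U+0000..U+FFFF) as UTF-8. Surrogates and code points above U+FFFF
// are not handled in this sketch.
std::string utf8_encode_bmp(unsigned int cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx (covers the Cyrillic block)
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

If each of those two bytes is widened to its own wchar_t when the file is read, the result is two wide characters per letter, which has to be decoded as UTF-8 rather than through a code page.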
#5
Re: How to convert 2 wchar_t to 1 wchar_t
The HTML should tell you how to interpret the bytes: http://en.wikipedia.org/wiki/Charact...odings_in_HTML
Once we know how the data is encoded, we can help with converting it to a wchar_t string. Here are some references and conversion code samples to get you started: http://www.codeguru.com/forum/showpo...82&postcount=8
gg
#6
Re: How to convert 2 wchar_t to 1 wchar_t
Quote:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
I need to take a subset of the text from this page and display it as Cyrillic within a CListCtrl.
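As a rough illustration of reading that declaration programmatically, the sketch below pulls the charset name out of a snippet of HTML. The function name `find_charset` is hypothetical, and the approach is a naive case-sensitive text search rather than a real HTML parser, so treat it as a starting point only.

```cpp
#include <string>

// Hypothetical helper: return the value that follows "charset=" in an HTML
// snippet, or an empty string if none is found. This is a plain text search
// (case-sensitive), not an HTML parser.
std::string find_charset(const std::string &html)
{
    const std::string key = "charset=";
    std::string::size_type pos = html.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();

    // Skip an optional opening quote, as in <meta charset="...">
    if (pos < html.size() && (html[pos] == '"' || html[pos] == '\''))
        ++pos;

    // The value ends at the first quote, space, semicolon, or '>'
    const std::string::size_type end = html.find_first_of("\"' ;>", pos);
    if (end == std::string::npos)
        return html.substr(pos);
    return html.substr(pos, end - pos);
}
```

For the meta tag quoted above this yields the string windows-1251, which then has to be translated into whatever identifier the conversion routine expects (for the Windows API, code page number 1251).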
#7
Re: How to convert 2 wchar_t to 1 wchar_t
The Wikipedia page on 1251:
http://en.wikipedia.org/wiki/Windows-1251
You can use that page to create a 256-element array which maps the characters in the HTML document to their Unicode equivalents. Since none of the Unicode values on the page is excessively large, you should be able to drop them into a wchar_t directly; there is no need for anything special to make them UTF-16.
Last edited by Lindley; September 9th, 2009 at 03:41 PM.
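A minimal sketch of the lookup-table idea follows. The function name `cp1251_to_wstr` is hypothetical, and the table is deliberately partial: it maps ASCII, the main alphabet range 0xC0..0xFF (U+0410..U+044F), and the two letters 0xA8 and 0xB8 (U+0401 and U+0451). Every other byte is replaced with U+FFFD here, so a complete version would need the remaining entries filled in from the Wikipedia table.

```cpp
#include <string>

// Hypothetical helper: decode Windows-1251 bytes into a wide string using a
// 256-entry lookup table. PARTIAL table: bytes in 0x80..0xBF other than 0xA8
// and 0xB8 are mapped to U+FFFD (the replacement character) in this sketch.
std::wstring cp1251_to_wstr(const std::string &bytes)
{
    wchar_t table[256];

    // ASCII maps to itself; everything else starts out as "unknown"
    for (int i = 0; i < 256; ++i)
        table[i] = (i < 0x80) ? static_cast<wchar_t>(i)
                              : static_cast<wchar_t>(0xFFFD);

    // 0xC0..0xFF are the 64 basic Cyrillic letters, in Unicode order
    for (int i = 0xC0; i <= 0xFF; ++i)
        table[i] = static_cast<wchar_t>(0x0410 + (i - 0xC0));

    table[0xA8] = 0x0401; // CYRILLIC CAPITAL LETTER IO
    table[0xB8] = 0x0451; // CYRILLIC SMALL LETTER IO

    std::wstring out;
    out.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
        out += table[static_cast<unsigned char>(bytes[i])];
    return out;
}
```

The cast to unsigned char matters: plain char is often signed, and indexing the table with a negative value would read outside the array. In real code the table would be built once rather than on every call.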
#8
Re: How to convert 2 wchar_t to 1 wchar_t
Quote:
I was under the impression that I could call something like the CW2AEX helper class, specifying the proper code-page identifier (e.g. 1251 for Windows-1251 Cyrillic) in the constructor.
#9
Re: How to convert 2 wchar_t to 1 wchar_t
Maybe you can, I'm not an expert in that. I was merely offering an approach which would get the job done, not necessarily the best one.
#10
Re: How to convert 2 wchar_t to 1 wchar_t
Me neither, I'm just picking it up as I go along. I'm at home right now, so I will try these approaches tomorrow and let you know how I get on. Can't wait to see some Cyrillic text in my CListCtrl. Don't suppose many people wish for such a thing :-)
#11
Re: How to convert 2 wchar_t to 1 wchar_t
Here's a more generic version of the conversion code samples:
Code:
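#include <windows.h>
#include <string>
#include <vector>

// Convert a multibyte string encoded in code page `cp` to a UTF-16 wstring.
std::wstring str_to_wstr(const std::string &str, UINT cp = CP_ACP)
{
    // MultiByteToWideChar fails on zero-length input, so handle it up front
    if (str.empty())
        return std::wstring();

    const int srclen = static_cast<int>(str.length());

    // First call: ask how many wchar_t the converted text needs
    int len = MultiByteToWideChar(cp, 0, str.c_str(), srclen, 0, 0);
    if (!len)
        return L"ErrorA2W";

    std::vector<wchar_t> wbuff(len);
    // Second call: perform the conversion into the buffer
    if (!MultiByteToWideChar(cp, 0, str.c_str(), srclen, &wbuff[0], len))
        return L"ErrorA2W";

    // Build from an explicit length so no terminating NUL is required
    return std::wstring(&wbuff[0], len);
}//str_to_wstr

// Convert a UTF-16 wstring to a multibyte string encoded in code page `cp`.
std::string wstr_to_str(const std::wstring &wstr, UINT cp = CP_ACP)
{
    // WideCharToMultiByte fails on zero-length input, so handle it up front
    if (wstr.empty())
        return std::string();

    const int srclen = static_cast<int>(wstr.length());

    // First call: ask how many bytes the converted text needs
    int len = WideCharToMultiByte(cp, 0, wstr.c_str(), srclen, 0, 0, 0, 0);
    if (!len)
        return "ErrorW2A";

    std::vector<char> abuff(len);
    // Second call: perform the conversion into the buffer
    if (!WideCharToMultiByte(cp, 0, wstr.c_str(), srclen,
                             &abuff[0], len, 0, 0))
    {
        return "ErrorW2A";
    }//if

    // Build from an explicit length so no terminating NUL is required
    return std::string(&abuff[0], len);
}//wstr_to_str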
Code:
std::wstring cyrillic_wstr = str_to_wstr(cyrillic_str, 1251);
Last edited by Codeplug; September 18th, 2009 at 08:24 AM. Reason: bug fix
#12
Quote:
#13
Re: How to convert 2 wchar_t to 1 wchar_t
This will give the same results using the ATL conversion classes (declared in <atlconv.h>):
Code:
std::wstring cyrillic_wstr(ATL::CA2WEX<>(cyrillic_str.c_str(), 1251));