#1
|
|||
|
|||
|
Code page in WideCharToMultiByte
I am using _getmbcp() to get the code page that I pass to WideCharToMultiByte. By the way, it is a _UNICODE compile.
I am trying to convert Kanji Unicode text to multibyte. Somewhere along the line I am not getting the correct characters, but the fault could be elsewhere. Is this the correct code page for this situation?
Last edited by Bob H; September 26th, 2002 at 12:06 PM.
|
#2
|
|||
|
|||
|
_getmbcp returns the current multibyte code page.
Another possibly related issue: I thought that all Unicode strings could be translated into a multibyte string of 2 bytes per character using WideCharToMultiByte. But in one of my books on Unicode I see a table showing the number of bytes needed to encode UTF-8 characters, and the number goes up to 4. Does anyone know if Kanji uses more than 2 bytes?
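As a rough illustration of the byte counts in question, here is a minimal sketch of how many bytes UTF-8 needs per code point. The helper name `utf8_length` is hypothetical and not part of any Windows API. Assuming standard UTF-8, common Kanji in the Basic Multilingual Plane (for example U+6F22) take 3 bytes, and only characters above U+FFFF take 4. A double-byte code page such as 932 (Shift-JIS), by contrast, should give 1 or 2 bytes per character from WideCharToMultiByte.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes standard UTF-8 uses for one
// Unicode code point. It does not validate the code point itself.
constexpr std::size_t utf8_length(char32_t cp) {
    return cp < 0x80    ? 1   // ASCII
         : cp < 0x800   ? 2   // e.g. Latin, Greek, Cyrillic, Hebrew, Arabic
         : cp < 0x10000 ? 3   // rest of the BMP, including most Kanji
         :                4;  // supplementary planes (above U+FFFF)
}
```

For example, `utf8_length(0x6F22)` yields 3 and `utf8_length(0x20B9F)` yields 4, so the "2 bytes per character" expectation only fits a double-byte code page, not CP_UTF8.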
|
#3
|
|||
|
|||
|
I read your reply to another unicode question on the forum which was helpful.
I have created a test program for my contact in Japan to run on Windows 2000. It displays text which he enters in a CEdit box, and in another CEdit box it displays the length of the CString holding that text. The simple test program is a Unicode compile. By the way, the code runs great with MS Mincho (with the code page set to Japan) on my XP machine but fails when it runs on Windows XP and Windows 2000 computers in Japan. So I presume that if the length of the text equals the length of the string, we are in UTF-16 mode. If we are not, then I am in deep trouble: my software assumes one 2-byte TCHAR per character. The code uses macros like _istlead and _tcsinc. Is the UTF value set by the font or by the operating system, and is there a way to test for it and/or set it?
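On the one-TCHAR-per-character assumption: wide strings on Windows 2000 and later are UTF-16 regardless of the font, and in a _UNICODE build `_istlead` always evaluates to 0 while `_tcsinc` simply advances by one wchar_t. Kanji in the Basic Multilingual Plane occupy a single 16-bit unit, but rarer ones above U+FFFF need a surrogate pair. The sketch below counts characters so the result can be compared with the string length. The helper names are hypothetical, and `char16_t` stands in for the 16-bit Windows `wchar_t` so the code stays portable.

```cpp
#include <cstddef>

// True when `u` is the first (high) half of a UTF-16 surrogate pair.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True when `u` is the second (low) half of a UTF-16 surrogate pair.
constexpr bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: counts Unicode code points in `len` UTF-16 code units.
// A well-formed high+low surrogate pair counts as one character; an unpaired
// surrogate is counted as one character on its own.
constexpr std::size_t count_code_points(const char16_t* s, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < len && is_low_surrogate(s[i + 1])) {
            ++i;  // skip the low half of the pair
        }
        ++count;
    }
    return count;
}
```

If `count_code_points` equals the code-unit length for every string the program handles, the one-unit-per-character assumption holds for that text. A string such as `u"\U00020B9F"` has length 2 but contains a single character, which is the case that would break it.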
|
#4
|
|||
|
|||
|
See GetFontUnicodeRanges( ) in MSDN.
__________________
Waqar |
|
#5
|
||||
|
||||
|
Hum...
Quote:
Another possibility is to use the Windows API calls which work fine for me. Code:
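UINT LangIDToCodePage(long lLangID)
{
    // The code page comes back as a digit string: up to 5 digits plus the terminating null.
    char codepage[7];
    int Res;
    memset(codepage, 0, 7);
    // Ask Windows for the default ANSI code page of this language.
    Res = GetLocaleInfo(lLangID, LOCALE_IDEFAULTANSICODEPAGE, codepage, 6);
    if (Res != 0) {
        return atoi(codepage);
    } else {
        return CP_ACP;   // lookup failed: fall back to the system ANSI code page
    }
}
...
// On startup do : // for me in OnCreate
m_InputCP = LangIDToCodePage(LOWORD(GetKeyboardLayout(0)));
// In your message handler do :
case WM_INPUTLANGCHANGE :
    // The low word of lParam is the language ID of the new input locale.
    m_InputCP = LangIDToCodePage(LOWORD(lParam));
    bHandled = TRUE;
    break;
// When you convert from Unicode to MultiByte, use m_InputCP as the codepage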
Last edited by Yves M; September 27th, 2002 at 10:15 AM.
|
#6
|
|||
|
|||
|
I can't imagine that a single byte code page would be the situation since the problem is occurring on Win 2000 computers in Japan. But, I will create a test dialog which displays the value of your routine and _getmbcp().
|
|
#7
|
||||
|
||||
|
True, it would not be related to your problem with Japanese, but in Russian, Greek, Arabic, Hebrew, etc. things wouldn't work.
Oh yes, by the way, you will have to rewrite my LangIDToCodePage function for Unicode, since you compile your app for Unicode.
Last edited by Yves M; September 28th, 2002 at 08:39 AM.
|
#8
|
|||
|
|||
|
The code also serves ANSI purposes (English, German, etc.) and Win 9x computers, so I need to go the TCHAR/_MBCS route. There will be a separate _MBCS build for 9x computers which, by the way, works correctly on Japanese computers. It is the Unicode version which has problems, and these are probably due to my mapping between text characters and text glyphs.
I don't have sufficient resources to maintain a different code base for this Unicode Japanese application. Also, I don't want to rewrite all the MFC controls which, I believe, use CString. Evidence so far is that there is one TCHAR per text character. I have a bastardized GetGlyphIndex function which was inherited from the _MBCS world (and works for that world). I need to try the true Unicode call for this function, and I think my problem may be solved.
|
#9
|
|||
|
|||
|
Since the last posting I have figured out my problems and learned some things.
First, the LangIDToCodePage code returns the same value as _getmbcp(). Second, in my _MBCS compile I was passing what, I believe, are called character codes to GetGlyphOutline. This does not work in general for a _UNICODE compile. When I used glyph indices (which for ASCII codes < 127 differ from the character codes by 29) and set the glyph-index flag (GGO_GLYPH_INDEX) in GetGlyphOutline, my problem went away. I used GetCharacterPlacement to get the glyph indices.
|
#10
|
||||
|
||||
|
Quote:
|
|
#11
|
|||
|
|||
|
I am fairly sure that English can be entered from a Japanese keyboard. I get a lot of emails from Japan in English.
|
|
#12
|
||||
|
||||
|
Well, I can also enter Japanese on my Spanish or my Swiss keyboards when I switch input locales.
|