Easy Unicode

Introduction

I was given a very simple task at work: Take a dialog box that acts as a registration page with a few registration fields (edit boxes), and convert it to support Unicode. This means that the entered data consists of wide characters. This data must of course be processed and (in my case) sent to a server. So? Easy, isn't it?

I looked it up, studied the concept of Unicode, and... NADA! You can easily change the character set of your project to Unicode, but who is crazy enough to do that in an existing ANSI code base? You would waste a lot of time altering your code to fit a Unicode character set, and you might even end up rewriting all of it. I could not find an easy or simple way of doing it otherwise; all the solutions I found contained pages of useless or irrelevant code.

After some research, I came up with a simple and elegant solution that I share here, and I hope it might help someone's project or save some people a very large headache. This article includes code samples and a complete downloadable project (Unicode.zip) that you can compile and play with. In addition, the compiled sample application (Unicode_app.zip) is downloadable as well.

My Solution

Use a rich edit control (CRichEditCtrl) instead of a plain edit control (CEdit). Please note that, to use a rich edit control, you must call AfxInitRichEdit() during the initialization of your application. This function loads and initializes the rich edit control (see the AfxInitRichEdit MSDN documentation regarding rich edit versions). The messages used in the example below, EM_GETTEXTEX and EM_SETTEXTEX, are not supported by Rich Edit 1.0, so make sure your application loads a later version. Then use SendMessageW to send text to, or retrieve text from, the rich edit control.

What SendMessageW Is

The Windows headers define SendMessage as a macro:

#ifdef UNICODE
#define SendMessage SendMessageW
#else
#define SendMessage SendMessageA
#endif    // !UNICODE

SendMessageW is the Unicode (wide-character) version of SendMessage. If you compile your code with the Unicode character set, SendMessage resolves to SendMessageW; otherwise, it resolves to SendMessageA. Because you want the Unicode version without defining UNICODE, you call SendMessageW directly. How do you use SendMessageW? The same way you use SendMessage, but remember that any text passed through the message parameters is now wide-character (Unicode) text; the window handle and the message ID are unchanged.
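To make the mechanism concrete, here is a small portable sketch of the same pattern. It assumes two hypothetical functions, DemoSendA and DemoSendW, that merely stand in for SendMessageA and SendMessageW; they are not Win32 APIs. The macro picks one of them at compile time, but the wide version can always be called by its full name, which is what the solution relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for SendMessageA/SendMessageW (not Win32 APIs).
// Each one simply returns the number of characters in the text it receives.
inline std::size_t DemoSendA(const char* text)
{
    assert(text != nullptr);
    return std::strlen(text);
}

inline std::size_t DemoSendW(const wchar_t* text)
{
    assert(text != nullptr);
    return std::wcslen(text);
}

// The same selection scheme that the Windows headers use for SendMessage.
#ifdef UNICODE
#define DemoSend DemoSendW
#else
#define DemoSend DemoSendA
#endif

// The generic name follows the build setting (narrow in an ANSI build)...
inline std::size_t viaGenericName()
{
#ifdef UNICODE
    return DemoSend(L"abc");
#else
    return DemoSend("abc");
#endif
}

// ...but the wide version stays reachable by its explicit name.
// The literal is the four-letter Hebrew word "shalom".
inline std::size_t viaExplicitWideName()
{
    return DemoSendW(L"\u05E9\u05DC\u05D5\u05DD");
}
```

In an ANSI build, viaGenericName() goes through the narrow function, while viaExplicitWideName() still handles wide text.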

How to Use It

The following example code (taken from Unicode.exe) shows how to handle Unicode text in a non-Unicode (ANSI) application:

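void CUnicodeDlg::OnCopy()
{
   GETTEXTEX wParamIN;
   SETTEXTEX wParamOUT;
   LRESULT lResult;

   // Receives the text as UTF-16: up to 99 characters plus the
   // terminating null. Longer input does not fit in this buffer.
   WCHAR lParam[100]      = {0};

   // cb is the size of the receiving buffer in BYTES. It must never
   // exceed the real buffer size, or the control can write past it.
   wParamIN.cb            = sizeof(lParam);
   wParamIN.flags         = GT_RAWTEXT;
   wParamIN.codepage      = 1200;    // 1200 = Unicode (UTF-16)
   wParamIN.lpDefaultChar = NULL;    // must be NULL for code page 1200
   wParamIN.lpUsedDefChar = NULL;    // must be NULL for code page 1200
   wParamOUT.codepage     = 1200;
   wParamOUT.flags        = ST_SELECTION;    // replace the selection

   // Read the text of the source control into lParam.
   lResult = SendMessageW(this->m_rich1.m_hWnd,    // source control
                          EM_GETTEXTEX,            // message ID
                          (WPARAM) &wParamIN,      // GETTEXTEX settings
                          (LPARAM) lParam);        // receiving buffer

   MessageBoxW(this->m_hWnd, lParam, L"Here we go...",
               MB_OK|MB_ICONASTERISK);

   // Write the text into the destination control.
   lResult = SendMessageW(this->m_rich2.m_hWnd,    // destination control
                          EM_SETTEXTEX,            // message ID
                          (WPARAM) &wParamOUT,     // SETTEXTEX settings
                          (LPARAM) lParam);        // null-terminated text
}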
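The cb field of GETTEXTEX counts bytes, not characters, and it describes the buffer that receives the text. The arithmetic can be sketched in portable C++ as follows. This is only an illustration: char16_t stands in for the 16-bit WCHAR, and the helper names bytesForText and makeTextBuffer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// char16_t stands in for the 16-bit Windows WCHAR
// (an assumption made so that the sketch is portable).
using WideUnit = char16_t;

// Hypothetical helper: bytes needed for charCount UTF-16 code units
// plus the terminating null.
inline std::size_t bytesForText(std::size_t charCount)
{
    assert(charCount < SIZE_MAX / sizeof(WideUnit));    // guard against overflow
    return (charCount + 1) * sizeof(WideUnit);
}

// Hypothetical helper: a zero-filled buffer that is large enough for
// charCount code units plus the terminating null.
inline std::vector<WideUnit> makeTextBuffer(std::size_t charCount)
{
    return std::vector<WideUnit>(charCount + 1, u'\0');
}
```

For the 100-element WCHAR buffer in the sample, this gives bytesForText(99) == 200, which is the largest value cb may safely hold. If you expect longer input, allocate the buffer from the actual text length instead of using a fixed array.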

How to Use Unicode.exe

My simple application (Unicode.exe) demonstrates entering a Unicode string into a rich edit box of a non-Unicode application and copying it to another rich edit box. You can use the Windows Character Map (Start->Programs->Accessories->System Tools) to get Unicode symbols.

Paste the copied symbols into the Input rich edit box of Unicode.exe.

Click Copy. A message box with the input string is shown. Click OK.

The entered Unicode string is copied to the Output rich edit box.

Summary

In my sample application, I used lParam as a wide-character buffer to hold the Unicode string. You can keep using it to pass its content along (for example, to a server), but always remember that it contains UTF-16 data: each character takes at least two bytes, twice the size of a single-byte ANSI character, so size your buffers and byte counts accordingly.
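The memory cost can be checked with a short portable sketch. It assumes that the text is held as UTF-16 in a std::u16string (a stand-in for a WCHAR buffer); the helper names are hypothetical. It also shows that a character outside the Basic Multilingual Plane occupies two 16-bit units, so "two bytes per character" is a minimum rather than a fixed rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes occupied by the UTF-16 text itself (terminating null excluded).
inline std::size_t utf16ByteSize(const std::u16string& s)
{
    return s.size() * sizeof(char16_t);
}

// Number of Unicode code points: a high surrogate followed by a low
// surrogate is counted as one character.
inline std::size_t utf16CodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool nextLow = i + 1 < s.size() &&
                             s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && nextLow) {
            ++i;    // skip the second half of the surrogate pair
        }
        ++count;
    }
    return count;
}

// Usage: four Hebrew letters take 8 bytes; U+1F600 alone takes 4 bytes.
inline void demoSizes()
{
    assert(utf16ByteSize(u"\u05E9\u05DC\u05D5\u05DD") == 8);
    assert(utf16CodePoints(u"\u05E9\u05DC\u05D5\u05DD") == 4);
    assert(utf16ByteSize(u"\U0001F600") == 4);
    assert(utf16CodePoints(u"\U0001F600") == 1);
}
```

When you forward the string, send the byte count rather than the character count, and make sure the receiving side knows that the data is UTF-16.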

I hope I could help. Enjoy.



About the Author

Lior Peretz

I'm a developer at Aladdin Ltd., in the Software DRM R&D group.

Downloads