      #1    
    September 9th, 2009, 07:27 AM
    PRMARJORAM
    Member
    Join Date: Apr 2005
    Posts: 66
    How to convert 2 wchar_t to 1 wchar_t

    In a nutshell: I have a wstring in which each consecutive pair of wchar_t needs to be converted to a single wchar_t, which is then the character code to use for display.
    What is the best way to do the conversion?

    How did it end up like this? I am downloading foreign web pages that use a different charset for their content, although the HTML file itself is ASCII. So the browser would know to take two consecutive chars and convert them into a single wchar_t. I read the contents into a wstring, so each char is converted to a wchar_t. This much I do not want to change.

    Thanks in advance.
      #2    
    September 9th, 2009, 12:25 PM
    Codeplug
    Senior Member
    Join Date: Nov 2003
    Posts: 1,346
    Re: How to convert 2 wchar_t to 1 wchar_t

    Could you attach a sample file that you are trying to read? And some code demonstrating how you are currently reading the file.

    gg
      #3    
    September 9th, 2009, 01:01 PM
    PRMARJORAM
    Member
    Join Date: Apr 2005
    Posts: 66
    Re: How to convert 2 wchar_t to 1 wchar_t

    I think it's me getting confused. I'm reading in and parsing a downloaded web page, but it uses the Cyrillic character set. The file itself is still ASCII, as all web pages are?!

    But as I understand it now, I need to switch code pages to map onto the Cyrillic characters. All this is very confusing until you know how.

    So I was originally thinking there was a 2:1 mapping of characters when a web page's content is in something other than our character set. But I think I'm wrong: it's still 1:1, and I must pass a code-page parameter when converting these strings.

    I'm taking a subset of this (Cyrillic) content and displaying it in a CListCtrl. It's just not happening yet.
      #4    
    September 9th, 2009, 01:11 PM
    Lindley
    Elite Member
    Power Poster
    Join Date: Oct 2007
    Location: Fairfax, VA
    Posts: 8,701
    Re: How to convert 2 wchar_t to 1 wchar_t

    The web page may be encoded in UTF-8 Unicode. That's not quite the same thing as using a code page.
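
    To illustrate the difference: if the page were UTF-8 rather than a single-byte code page, each Cyrillic letter would occupy two bytes, which is the kind of 2:1 mapping asked about in the first post. Below is a minimal sketch of decoding one such pair. The function name decode_utf8_pair is made up for this example, and it deliberately handles only two-byte sequences, not UTF-8 in general.

```cpp
// Decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
// Returns 0 if the two bytes do not form a valid two-byte sequence.
unsigned decode_utf8_pair(unsigned char lead, unsigned char trail)
{
    // The lead byte must start with bits 110, the trail byte with bits 10
    if ((lead & 0xE0) != 0xC0 || (trail & 0xC0) != 0x80)
        return 0;

    // 5 payload bits from the lead byte, 6 from the trail byte
    unsigned cp = ((lead & 0x1Fu) << 6) | (trail & 0x3Fu);

    // Values below 0x80 must be encoded as a single byte (overlong form)
    return cp >= 0x80 ? cp : 0;
}
```

    For example, the UTF-8 bytes 0xD0 0xAF decode to U+042F (Cyrillic capital letter Ya), whereas in Windows-1251 the same letter is the single byte 0xDF.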
      #5    
    September 9th, 2009, 01:31 PM
    Codeplug
    Senior Member
    Join Date: Nov 2003
    Posts: 1,346
    Re: How to convert 2 wchar_t to 1 wchar_t

    The HTML should tell you how to interpret the bytes: http://en.wikipedia.org/wiki/Charact...odings_in_HTML

    Once we know how the data is encoded, we can help with converting it to a wchar_t string.

    Here are some references and conversion code samples to get you started: http://www.codeguru.com/forum/showpo...82&postcount=8

    gg
      #6    
    September 9th, 2009, 03:27 PM
    PRMARJORAM
    Member
    Join Date: Apr 2005
    Posts: 66
    Re: How to convert 2 wchar_t to 1 wchar_t

    Quote:
    Originally Posted by Codeplug
    The HTML should tell you how to interpret the bytes: http://en.wikipedia.org/wiki/Charact...odings_in_HTML

    Once we know how the data is encoded, we can help with converting it to a wchar_t string.

    Here are some references and conversion code samples to get you started: http://www.codeguru.com/forum/showpo...82&postcount=8

    gg
    Well, it's just a web page from a Ukrainian website that has the following:
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1251">

    I need to take a subset of the text from this page and display it as cyrillic within a CListCtrl.
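
    A meta tag like the one above can also be read programmatically to pick the code page. The following is only a rough sketch: code_page_from_html is a hypothetical helper, the list of recognized names is far from complete, and a real page may also declare its charset in the HTTP Content-Type header, which this does not look at.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the Windows code page number for the charset
// named in an HTML fragment, or 0 if none is found or it is not recognized.
unsigned code_page_from_html(const std::string &html)
{
    // Lower-case a copy so the search is case-insensitive
    std::string lower(html);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return 0;
    pos += key.size();

    // Skip an opening quote, as in <meta charset="...">
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\''))
        ++pos;

    // The charset name ends at a quote, whitespace, ';' or '>'
    const std::string::size_type end = lower.find_first_of("\"' \t\r\n;>", pos);
    const std::string name =
        lower.substr(pos, end == std::string::npos ? end : end - pos);

    // Only a handful of names are mapped in this sketch
    if (name == "windows-1251") return 1251;
    if (name == "windows-1252") return 1252;
    if (name == "koi8-r")       return 20866;
    if (name == "utf-8")        return 65001; // same value as CP_UTF8
    return 0;
}
```

    The returned number is in the form that the Windows conversion functions take as their code-page argument, so for this page it would yield 1251.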
      #7    
    September 9th, 2009, 03:38 PM
    Lindley
    Elite Member
    Power Poster
    Join Date: Oct 2007
    Location: Fairfax, VA
    Posts: 8,701
    Re: How to convert 2 wchar_t to 1 wchar_t

    The wikipedia page on 1251:
    http://en.wikipedia.org/wiki/Windows-1251
    You can use that page to create a 256-element array which maps the characters in the HTML document to their Unicode equivalents. Since none of the Unicode values on the page is excessively large, you should be able to drop them into a wchar_t directly, no need for anything special to make them UTF-16.
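
    A portable sketch of that table-based approach might look like the following. It is deliberately incomplete: a function with a switch stands in for the full 256-element array, and only ASCII, the basic Cyrillic alphabet, and the extra Ukrainian letters are mapped. Every other byte is turned into U+FFFD and would need to be filled in from the Wikipedia table. The names cp1251_to_unicode and decode_cp1251 are invented for this example.

```cpp
#include <string>

// Map one Windows-1251 byte to its Unicode code point (partial mapping).
wchar_t cp1251_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return static_cast<wchar_t>(b);                    // ASCII is unchanged
    if (b >= 0xC0)
        return static_cast<wchar_t>(0x0410 + (b - 0xC0));  // U+0410..U+044F

    switch (b) {
        case 0xA8: return 0x0401; // capital Io
        case 0xB8: return 0x0451; // small io
        case 0xAA: return 0x0404; // capital Ukrainian Ie
        case 0xBA: return 0x0454; // small Ukrainian ie
        case 0xAF: return 0x0407; // capital Yi
        case 0xBF: return 0x0457; // small yi
        case 0xB2: return 0x0406; // capital Byelorussian-Ukrainian I
        case 0xB3: return 0x0456; // small Byelorussian-Ukrainian i
        case 0xA5: return 0x0490; // capital Ghe with upturn
        case 0xB4: return 0x0491; // small ghe with upturn
        default:   return 0xFFFD; // not covered by this sketch
    }
}

// Decode a whole Windows-1251 byte string, one byte per output character.
std::wstring decode_cp1251(const std::string &bytes)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
        out += cp1251_to_unicode(static_cast<unsigned char>(bytes[i]));
    return out;
}
```

    Note the cast to unsigned char before the lookup: with a signed char, bytes of 0x80 and above would otherwise be negative.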

    Last edited by Lindley; September 9th, 2009 at 03:41 PM.
      #8    
    September 9th, 2009, 03:45 PM
    PRMARJORAM
    Member
    Join Date: Apr 2005
    Posts: 66
    Re: How to convert 2 wchar_t to 1 wchar_t

    Quote:
    Originally Posted by Lindley
    The wikipedia page on 1251:
    http://en.wikipedia.org/wiki/Windows-1251
    You can use that page to create a 256-element array which maps the characters in the HTML document to their Unicode equivalents. Since none of the Unicode values on the page is excessively large, you should be able to drop them into a wchar_t directly, no need for anything special to make them UTF-16.
    Thanks.

    I was under the impression that I could use something like the CW2AEX helper class, specifying the proper code-page identifier (e.g. 1251 for Windows-1251 Cyrillic) in the constructor.
      #9    
    September 9th, 2009, 03:53 PM
    Lindley
    Elite Member
    Power Poster
    Join Date: Oct 2007
    Location: Fairfax, VA
    Posts: 8,701
    Re: How to convert 2 wchar_t to 1 wchar_t

    Maybe you can, I'm not an expert in that. I was merely offering an approach which would get the job done, not necessarily the best one.
      #10    
    September 9th, 2009, 03:58 PM
    PRMARJORAM
    Member
    Join Date: Apr 2005
    Posts: 66
    Re: How to convert 2 wchar_t to 1 wchar_t

    Quote:
    Originally Posted by Lindley
    Maybe you can, I'm not an expert in that. I was merely offering an approach which would get the job done, not necessarily the best one.
    Me neither, I'm just picking it up as I go along. I'm at home right now, so I will try these approaches tomorrow and let you know how I get on. Can't wait to see some Cyrillic text in my CListCtrl. I don't suppose many people wish for such a thing :-)
      #11    
    September 9th, 2009, 04:04 PM
    Codeplug
    Senior Member
    Join Date: Nov 2003
    Posts: 1,346
    Re: How to convert 2 wchar_t to 1 wchar_t

    Here's a more generic version of the conversion code samples:
    Code:
    #include <windows.h>
    #include <string>
    #include <vector>
    
    // Converts a multi-byte string in code page cp to a UTF-16 wide string.
    std::wstring str_to_wstr(const std::string &str, UINT cp = CP_ACP)
    {
        if (str.empty())
            return std::wstring(); // a zero-length input makes MultiByteToWideChar fail
    
        const int srclen = static_cast<int>(str.length());
        // The first call only queries the required length, in wchar_t units
        int len = MultiByteToWideChar(cp, 0, str.c_str(), srclen, 0, 0);
        if (!len)
            return L"ErrorA2W";
    
        std::vector<wchar_t> wbuff(len);
        if (!MultiByteToWideChar(cp, 0, str.c_str(), srclen, &wbuff[0], len))
            return L"ErrorA2W";
    
        // Build from the explicit range: the buffer is not NUL terminated
        return std::wstring(wbuff.begin(), wbuff.end());
    }//str_to_wstr
    
    // Converts a UTF-16 wide string to a multi-byte string in code page cp.
    std::string wstr_to_str(const std::wstring &wstr, UINT cp = CP_ACP)
    {
        if (wstr.empty())
            return std::string(); // a zero-length input makes WideCharToMultiByte fail
    
        const int srclen = static_cast<int>(wstr.length());
        int len = WideCharToMultiByte(cp, 0, wstr.c_str(), srclen, 0, 0, 0, 0);
        if (!len)
            return "ErrorW2A";
    
        std::vector<char> abuff(len);
        if (!WideCharToMultiByte(cp, 0, wstr.c_str(), srclen, &abuff[0], len, 0, 0))
            return "ErrorW2A";
    
        return std::string(abuff.begin(), abuff.end());
    }//wstr_to_str
    So you take a Cyrillic string, extracted from the HTML, and call:
    Code:
    std::wstring cyrillic_wstr = str_to_wstr(cyrillic_str, 1251);
    gg
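
    One practical wrinkle: the first post says the page was read into a wstring with each char widened to a wchar_t, while the conversion above expects the original bytes in a std::string. A possible way to recover them is sketched below. narrow_bytes is a hypothetical helper, and it assumes the widening was a plain element-by-element copy rather than a locale-aware conversion, in which case the low 8 bits of each wchar_t are still the original byte.

```cpp
#include <string>

// Hypothetical helper: undo a byte-by-byte widening by keeping the low 8 bits
// of every wchar_t. The mask also copes with bytes >= 0x80 that were
// sign-extended (for example to 0xFFCF) when a signed char was widened.
std::string narrow_bytes(const std::wstring &widened)
{
    std::string bytes;
    bytes.reserve(widened.size());
    for (std::wstring::size_type i = 0; i < widened.size(); ++i) {
        const unsigned long low = static_cast<unsigned long>(widened[i]) & 0xFFu;
        bytes += static_cast<char>(static_cast<unsigned char>(low));
    }
    return bytes;
}
```

    The result could then be passed on as the cyrillic_str argument in the call shown above. If the text was read through a stream that already applied a code-page conversion, this recovery would not be valid and the file would be better re-read as raw bytes.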

    Last edited by Codeplug; September 18th, 2009 at 08:24 AM. Reason: bug fix
      #12    
    September 9th, 2009, 04:06 PM
    PRMARJORAM
    Member
    Join Date: Apr 2005
    Posts: 66
    Re: How to convert 2 wchar_t to 1 wchar_t

    Quote:
    Originally Posted by Codeplug
    Here's a more generic version of the conversion code samples: [...]
    Excellent, thanks. I will give these a try tomorrow when I'm back at my desk.
      #13    
    September 9th, 2009, 04:12 PM
    Codeplug
    Senior Member
    Join Date: Nov 2003
    Posts: 1,346
    Re: How to convert 2 wchar_t to 1 wchar_t

    This will give the same results using ATL tools:
    Code:
    // requires <atlconv.h>; direct-initialization avoids chaining two user-defined conversions
    std::wstring cyrillic_wstr(ATL::CA2WEX<>(cyrillic_str.c_str(), 1251));
    gg