Locales and Facets in Visual C++


If you've come across the notion of a "locale" while reading about C++, you may have come away somewhat unfulfilled. If you planned on using Bjarne Stroustrup's "The C++ Programming Language" to enlighten you, he brushes off the topic, stating the discussion "is beyond the scope of this book". "C++ Primer" by Stanley Lippman and Josee Lajoie is not much help on this topic either; since their implementation did not support locales, they used the Standard C implementation. I still highly recommend both books to anyone interested in serious C++ development.

To begin with, my development environment is MSVC++ 5.0 SP3 and 6.0. The code listings provided have been compiled and tested with both compilers. Note that I use the latest STL (or rather C++ Standard) headers as provided by P.J. Plauger. Mr. Plauger is president of Dinkumware and wrote many of Microsoft's C++ Standard headers; he posts bug fixes for them at http://www.dinkumware.com/vc_fixes.html. These fixes have not yet been incorporated into MSVC++ 6, which is why I still rely on P.J.'s headers.

To understand locales, we will begin with the C programming language. C programs use locales via the ANSI locale.h header. A locale defines how times, dates, characters, strings and other expressions are formatted for the current country or cultural area. In C, the one mandatory, predefined locale is the "C" locale (United States defaults); the empty string, "", selects an implementation-defined locale. The "C" locale assumes that all char data types are 1 byte and that their value is usually less than 256. By default, all locale-dependent routines in the Microsoft run-time library use the code page that corresponds to the "C" locale. In C, you retrieve some locale-specific information with the localeconv() function and set the locale with the setlocale() function. As we'll see later, the run-time library always uses the "C" locale until told otherwise with setlocale(). C++ adds functionality by wrapping locale information in classes, so you can have more than one of these locale objects alive at the same time.

Listing 1 shows the simplest example of a C application using locales by displaying the date in the "C" and the "french" locales. You'll notice the program specifies LC_ALL in setlocale(), which is a #define in locale.h. We can change a specific category of a locale by specifying it as the first parameter to setlocale(). In our example, we could have used LC_TIME, and the results would have been identical. On the other hand, specifying LC_NUMERIC would not have affected the date displayed. The locale categories are:

LC_COLLATE   Which is used by strcoll() and strxfrm()
LC_CTYPE     Which is used by the character handling functions in 
LC_MONETARY  Monetary formatting, no C functions use this information except localeconv()
LC_NUMERIC   For non-monetary formatting.
LC_TIME      Which is used by strftime(), not ctime() or asctime()

The localeconv() function returns a pointer to a struct lconv object. A quick glance into the header reveals the structure's innards. Listing 2 shows the default values for each lconv member in the "C" locale. To demonstrate some of these default values, Listing 3 retrieves and displays some "C" locale default information using the localeconv() function.

By putting our C++ thinking cap on, we can quickly get a feel for how all this could be done in C++: wrap each locale category in a separate class, and have a generic locale class container default to the "C" behavior for each of the locale classes. Then if someone needs any of those locale objects to behave in a specific way, he can derive his own class from them and override the necessary virtual functions.

Given our C locale categories, let's begin by determining which classes handle each category. Following is the category, C++ Standard class name and the header file where the class is located. You only need to include to use any of these:

Locale category   C++ class name        Header

LC_COLLATE        collate               <Locale>
LC_CTYPE          ctype                 <Xlocale>
LC_MONETARY       money_put/money_get   <Xlocmon>
LC_NUMERIC        numpunct              <Xlocnum>
LC_TIME           time_get/time_put     <Xloctime>

All these classes are templates, capable of handling the 'char' and 'wchar_t' types. This can be a real help for developers creating ANSI and UNICODE applications. The standard library doesn't support MBCS, only single-byte and wide characters. All of these classes have a _Getcat() member function which returns the LC_xxx category. Listing 4 shows the locale category associated with each class.

The C++ framework for the aforementioned classes is as follows: all these classes are derived from the locale::facet base class. _Getcat(), for example, is a member of locale::facet which simply returns -1. The reason for the locale::facet syntax is that facet is a class nested inside the enclosing locale class. So, the facet class actually contains information on the localization aspect, whereas the locale class is a container of facets. To use a facet, you use the Standard-provided global function template use_facet(). More on use_facet() later.

You may be thinking _Getcat() is one of many virtual functions which behave polymorphically in the derived classes and have defaults in the base class (returning -1 in this case). In fact, neither locale nor facet has any virtual functions (with the exception of the facet destructor). So then, how do we extend the behavior of these facets? You do so by adding your own unique facet to a locale with locale::_Addfac(), specifying the facet and the facet id. All of this machinery can be found in \crt\src\locale.cpp. A locale cannot contain two facets with the same id. So when we derive new classes from those previously mentioned, they'll have the same id as the original class, and will therefore replace it. For example, if Mynumpunct is derived from numpunct, and we then call _ADDFAC(locale(), new Mynumpunct), the facet id will be the same as numpunct::id. Listing 5 shows how a class derived from numpunct has the same id as numpunct itself. Once created, a locale and its facets cannot change. _Getcat() and _Addfac() are nonstandard MSVC additions.

A special note about the ctype class; for efficiency, the C++ Standard requires that ctype for chars be implemented as a template specialization. For this reason ctype differs somewhat from the other classes in that it derives from ctype_base, which itself is derived from locale::facet, as opposed to being directly derived from locale::facet.

ctype<char> : public ctype_base
template<class _E> class ctype : public ctype_base

The ctype_base class contains an enumeration indicating a character's particular semantics. This allows very fast character classification using bit masks. Listing 6 shows how ctype can be used for determining character semantics. I recommend stepping through all the examples listed to better grasp all the concepts presented. You might be surprised to find the implementation for class locale in locale.cpp and locale0.cpp, in the "\crt\src" directory. I would have thought a "\cpp\src" directory structure would have been more appropriate.

Note the _DIGIT #define which Listing 6 uses. This checks if the character is in the '0'-'9' range. Other available bit masks are:

_UPPER to determine if it is an upper case letter A-Z
_LOWER to determine if it is a lower case letter a-z
_SPACE to determine if it is a horizontal tab, carriage return, 
       newline, vertical tab or form feed
_PUNCT to determine if it is a punctuation character
_CONTROL to determine if it is a control character such as BEL and backspace
_BLANK to determine if it is a space
_HEX   to determine if it is a hexadecimal digit (0-9, A-F or a-f)

You'll find the definitions for these in <wchar.h> and in <Xlocinfo.h>. If the syntax seems contrived to you, <Locale> provides some support templates. Using those helper templates, Listing 6 can be rewritten in a clearer fashion as shown in Listing 7.

Previously I mentioned that to access a facet contained in a locale you call the use_facet function. As you may have noticed in the sample listings, I use a macro called _USE(). In this implementation, _USE() is simply defined as #define _USE(loc, fac) use_facet(loc, (fac *)0, true). The first parameter is a locale, and the second parameter is a facet type. I often use locale::classic() or locale::empty() as the locale parameter. empty() is a nonstandard locale object with no facets, and it behaves differently from locale(). Whereas classic() is the locale that C defines, empty() makes a transparent locale where any missing facets permit facets from the global locale to shine through.

use_facet is a template function which returns a reference to the requested facet found in the specified locale. It does this by calling locale::_Getfacet(). If there is no such facet in the locale object, it simply throws bad_cast. We could instead ask use_facet to create the facet for us; that is what the third parameter is for. Passing true, as the _USE() macro always does, asks use_facet to new the facet. Another macro, _USEFAC, is exactly like _USE() except that it passes false to use_facet, asking that the facet not be new'ed. The use of true as a third argument is designed to support lazy evaluation of the standard facets and should not be used indiscriminately.

Listing 8 shows what happens when attempting to use a facet without passing true to use_facet. The facet returned by these two macros is locale::_Locimp::facet::_Fv. P.J. had to provide these two macros because at the time the MSVC++ compiler did not support explicit template argument specification. Listing 9 shows what happens when a function foo() has a template argument which the compiler cannot determine on its own. The function bar() shows a similar problem: in the call bar(1., 2), the two arguments are of different types, so the compiler does not know which of the two types Type should be, and it chooses not to decide by flagging the call as an error.

With all of this framework background, we're ready to implement our own facets and use them within a real-world C++ Windows application. Win32 provides many functions similar to the facet functionality. IsCharAlpha(), for example, is similar to isalpha<char>() except that it uses the information specified by the user in the Control Panel. For Windows applications this is preferable, except for the fact that it would leave out all the Standard locale and facet classes we've come to love. What we need to do is implement our own facets which make use of the Win32 APIs. That would allow us to program in a consistent fashion.

We can use GetLocaleInfo() to retrieve information about the current user locale and implement our own facet which makes use of this information. Listing 10 shows our Dmoneypunct facet in action. Between executions, if you go into Control Panel, select the Regional Settings icon and open the Currency tab, you can make changes and they will immediately be used by the Dmoneypunct class.

Special thanks to P.J. Plauger for his technical proofreading and comments.

/*
Mario Contestabile is a C++ developer at Zero-Knowledge Systems,
and can be reached at MarioC@Computer.Org
*/

/* Listing 1, Locales in C */

#include <locale.h>
#include <stdio.h>
#include <time.h>

/* Display today's date formatted for the given locale. */
void localf(const char* pLoc)
{
  if(setlocale(LC_ALL, pLoc) == NULL)
  {
    fprintf(stderr, "Unable to establish locale\n");
    return;
  }
  else
  {
    time_t system_time = time(NULL);
    char time_text[81];
    strftime(time_text, sizeof time_text, "%x %A %B %d", localtime(&system_time));
    printf("[%s]\n", time_text);
  }
}

int main()
{

 localf("C");
 localf("french");  /* locale name as accepted by the Microsoft run-time library */

 return 0;
}

/* Listing 2 struct lconv from <locale.h> with default value in "C" locale.
CHAR_MAX is 127 */

struct lconv {
 char *decimal_point;      /* "." */
 char *thousands_sep;      /* "" */
 char *grouping;           /* "" */
 char *int_curr_symbol;    /* "" */
 char *currency_symbol;    /* "" */
 char *mon_decimal_point;  /* "" */
 char *mon_thousands_sep;  /* "" */
 char *mon_grouping;       /* "" */
 char *positive_sign;      /* "" */
 char *negative_sign;      /* "" */
 char int_frac_digits;     /* CHAR_MAX */
 char frac_digits;         /* CHAR_MAX */
 char p_cs_precedes;       /* CHAR_MAX */
 char p_sep_by_space;      /* CHAR_MAX */
 char n_cs_precedes;       /* CHAR_MAX */
 char n_sep_by_space;      /* CHAR_MAX */
 char p_sign_posn;         /* CHAR_MAX */
 char n_sign_posn;         /* CHAR_MAX */
};

/* Listing 3 Some "C" locale defaults */

#include <locale.h>
#include <stdio.h>


int main()
{

 /* No setlocale() call has been made, so this describes the "C" locale */
 struct lconv *loc = localeconv();
 printf("[%s] [%s] [%s] [%s]\n", loc->decimal_point, loc->thousands_sep,
                                 loc->currency_symbol, loc->positive_sign);

 return 0;
}

/* Listing 4 locale categories associated with each class */

#include <clocale>   // LC_xxx macros
#include <locale>
#include <cassert>

using namespace std;

int main(){

 // _Getcat() is a nonstandard MSVC addition
 assert(LC_COLLATE == collate<char>::_Getcat());
 assert(LC_CTYPE == ctype<char>::_Getcat());
 assert(LC_MONETARY == money_put<char>::_Getcat());
 assert(LC_MONETARY == money_get<char>::_Getcat());
 assert(LC_NUMERIC == numpunct<char>::_Getcat());
 assert(LC_TIME == time_put<char>::_Getcat());
 assert(LC_TIME == time_get<char>::_Getcat());

 return 0;
}

/* Listing 5 What are the facet id's? */

#include <iostream>
#include <locale>
#include <cassert>

using namespace std;

// A derived facet inherits the static id member of its base class
class Mynumpunct : public numpunct<char>{
};

int main(){

 // Comparing and printing an id relies on this implementation
 // letting locale::id convert to an integer
 assert(numpunct<char>::id == Mynumpunct::id);

 cout << numpunct<char>::id << endl;
 cout << ctype<char>::id << endl;
 cout << collate<char>::id << endl;
 cout << money_put<char>::id << endl;
 cout << money_get<char>::id << endl;
 cout << time_get<char>::id << endl;
 cout << time_put<char>::id << endl;

 cout << numpunct<wchar_t>::id << endl;
 cout << ctype<wchar_t>::id << endl;
 cout << collate<wchar_t>::id << endl;
 cout << money_put<wchar_t>::id << endl;
 cout << money_get<wchar_t>::id << endl;
 cout << time_get<wchar_t>::id << endl;
 cout << time_put<wchar_t>::id << endl;

 return 0;
}

/* Listing 6 ctype<char> is a specialized template for efficiency */

#include <iostream>
#include <locale>

using namespace std;

int main(){

 // _USE(), locale::empty() and _DIGIT are specific to this implementation
 const char a = 'a';

 if(_USE(locale::empty(), ctype<char>).is(_DIGIT, a))
  cout << a << " is a digit" << endl;
 else
  cout << a << " is not a digit" << endl;

 const char b = '1';
 if(_USE(locale::empty(), ctype<char>).is(_DIGIT, b))
  cout << b << " is a digit" << endl;
 else
  cout << b << " is not a digit" << endl;

 return 0;
}

/* Listing 7 ctype helper functions */

#include <iostream>
#include <locale>

using namespace std;

int main(){

 // The Standard helper templates take the locale as a second argument
 const locale loc;
 const char a = 'a';

 if(isalpha(a, loc))
  cout << a << " is an alphabetic character" << endl;
 else
  cout << a << " is not an alphabetic character" << endl;

 char b = 'b';
 if(islower(b, loc))
  cout << b << " is a lowercase character" << endl;
 else
  cout << b << " is not a lowercase character" << endl;

 b = toupper(b, loc);
 if(islower(b, loc))
  cout << b << " is a lowercase character" << endl;
 else
  cout << b << " is not a lowercase character" << endl;

 return 0;
}

/* Listing 8 use_facet new's your facet, or throws bad_cast if there's no
such facet in the given locale */

#include <iostream>
#include <locale>
#include <typeinfo>   // std::bad_cast

int main(){

 const char a = 'a';

 try{
  // The _USE macro passes true to use_facet
  if(std::_USE(std::locale(), std::ctype<char>).is(_DIGIT, a))
   std::cout << a << " is a digit" << std::endl;
  else
   std::cout << a << " is not a digit" << std::endl;

  // Add the facet explicitly so that _USEFAC can find it in loc
  std::locale loc = std::_ADDFAC(std::locale(), new std::ctype<char>);

  if(std::_USEFAC(loc, std::ctype<char>).is(_DIGIT, a))
   std::cout << a << " is a digit" << std::endl;
  else
   std::cout << a << " is not a digit" << std::endl;

  // The _USEFAC macro passes false to use_facet
  if(std::_USEFAC(std::locale(), std::ctype<char>).is(_DIGIT, a))
   std::cout << a << " is a digit" << std::endl;
  else
   std::cout << a << " is not a digit" << std::endl;


 }
 catch(std::bad_cast& ex){
  std::cout << ex.what() << std::endl;
 }
 return 0;
}

/* Listing 9 Help your compiler in determining template arguments! */

// Type appears in no parameter, so it can never be deduced from a call
template<typename Type>
int foo(){
 return 1;
}

// Both parameters must deduce to the same Type
template <typename Type>
Type bar(Type p1, Type p2){
 return 1;
}

int main(){

 //foo();  could not deduce template argument
 foo<int>();

 //bar(1., 2); ambiguous template parameter
 bar(1, 2);

 return 0;
}

/* Listing 10 C++ and Win32 hand-in-hand */

#include <windows.h>
#include <locale>
#include <iostream>

// As written, this listing assumes an ANSI (non-UNICODE) build, where TCHAR is char.
class Dmoneypunct : public std::moneypunct<TCHAR, true>{
    // GetLocaleInfo() expects the buffer size in TCHARs, not in bytes
    enum { buf_len = 20 };
    // Scratch buffer; mutable because the do_xxx() overrides are const
    mutable TCHAR buf[buf_len];

protected:
    virtual TCHAR do_decimal_point() const {
        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SMONDECIMALSEP, buf, buf_len))
            return buf[0];
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).decimal_point();
    }
    virtual TCHAR do_thousands_sep() const {
        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SMONTHOUSANDSEP, buf, buf_len))
            return buf[0];
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).thousands_sep();
    }

    // Note: Win32 reports grouping as text such as "3;0", whereas the Standard
    // expects each char to hold a group size as a number; the text is returned as is.
    virtual std::string do_grouping() const{

        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SMONGROUPING, buf, buf_len))
            return buf;
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).grouping();
    }

    virtual std::string do_curr_symbol() const {

        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SCURRENCY, buf, buf_len))
            return buf;
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).curr_symbol();
    }
    virtual std::string do_positive_sign() const {

        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SPOSITIVESIGN, buf, buf_len))
            return buf;
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).positive_sign();
    }
    virtual std::string do_negative_sign() const {

        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SNEGATIVESIGN, buf, buf_len))
            return buf;
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).negative_sign();
    }
};

int main(){

 std::moneypunct<TCHAR, true> i;
 std::cout<< "moneypunct decimal point " << i.decimal_point() << std::endl;
 std::cout<< "moneypunct thousands sep " << i.thousands_sep() << std::endl;
 std::cout<< "moneypunct grouping " << i.grouping() << std::endl;
 std::cout<< "moneypunct currency symbol " << i.curr_symbol() << std::endl;
 std::cout<< "moneypunct positive sign " << i.positive_sign() << std::endl;
 std::cout<< "moneypunct negative sign " << i.negative_sign() << std::endl;

 std::cout<< std::endl;

 Dmoneypunct D;
 std::cout<< "Dmoneypunct decimal point " << D.decimal_point() << std::endl;
 std::cout<< "Dmoneypunct thousands sep " << D.thousands_sep() << std::endl;
 std::cout<< "Dmoneypunct grouping " << D.grouping() << std::endl;
 std::cout<< "Dmoneypunct currency symbol " << D.curr_symbol() << std::endl;
 std::cout<< "Dmoneypunct positive sign " << D.positive_sign() << std::endl;
 std::cout<< "Dmoneypunct negative sign " << D.negative_sign() << std::endl;

 return 0;
}



Comments

  • Locales and Facets in Visual C++

    Posted by Legacy on 08/25/1999 12:00am

    Originally posted by: Lars Schouw

    How do I get information in and out of the class?
    I want to add a number to the class and get it out localized.
    Please reply in email as well, thanks.

  • Sharing Locales and facets from EXE to DLL?

    Posted by Legacy on 05/18/1999 12:00am

    Originally posted by: Michael S. Scherotter

    I have a project with one EXE and multiple DLLs.
    I am using the C++ locales and locale::facets to manage
    internationalization. The changes that I make to the EXE's global
    locale are not being reflected in the DLL's locales. Is there any way
    for all of the modules to share one global locale or automatically
    transfer changes from one module to another?

  • using Dmoneypunct facet?

    Posted by Legacy on 04/26/1999 12:00am

    Originally posted by: Todd Gruben

    I am very new to localization and am having trouble utilizing your Dmoneypunct facet with money_put. The
    output of my routine is simply the integer part of my double value. It seems that my _ADDFAC doesn't take.
    Here is my code on VC6 SP 2
    
    


    using namespace std;
    locale loc = std::_ADDFAC(std::locale(), new Dmoneypunct);



    double val = 1211.49;
    ostringstream str;
    str.imbue(loc);
    const money_put<char>& t = _USE(loc,money_put<char>);


    ostreambuf_iterator<char> junk (str);
    string s;

    t.put(junk,true,str,'X',val);
    s = str.str();
    cout << s << endl;
    //the output of this function is 1211

    any ideas?
