Locales and Facets in Visual C++

 

If you’ve come across the “locale” notion while reading on C++
you may feel somewhat unfulfilled. If you planned on using Bjarne
Stroustrup’s “C++ Programming Language” book to clear your lantern, he
brushes
off the topic, stating the discussion “is beyond the scope of this book”.
“C++ Primer” by Stanley Lippman and Josee Lajoie will not be of much help
in this topic either; since their implementation did not support locales
they used the Standard C implementation. I do still highly recommend both
books to anyone interested in serious C++ development.

To begin with, my development environment is MSVC++ 5.0 sp3 and 6.0. The
code listings provided have been compiled and tested with both compilers.
Note that I use the latest STL (or rather C++ Standard) headers as provided
by P.J. Plauger. Mr. Plauger is president of Dinkumware and he wrote many of
Microsoft’s C++ Standard headers, of which he posts bug fixes at
http://www.dinkumware.com/vc_fixes.html These fixes have not yet been
incorporated into MSVC++ 6, which is why I still rely on the P.J’s headers.

To understand locales, we will begin with the C programming language. C
programs
use locales via the ANSI locale.h header. A locale defines how time,
dates, characters, strings and other expressions are formatted for the
current
country, or cultural area. To learn more about C, consider checking out some other articles on the site. In C, the one mandatory, or predefined locale is
the “C” locale (for United States defaults), and the empty string, “”,
meaning
implementation defined. The “C” locale assumes that all char data types
are 1 byte and that their value is usually less than 256. By default, all
locale-dependent routines in the Microsoft run-time library use the code
page
that corresponds to the “C” locale. In C, you retrieve some locale-specific
information with the
localeconv() function and set the locale with the setlocale() function. As
we’ll see later, The runtime library always uses the C locale until told
otherwise with
setlocale(). C++ adds functionality as well by wrapping locale information
in
classes. You can then have more than one of these locale objects alive at
the same
time.

Listing 1 shows the simplest example of a C application using locales by
displaying the date in the “C” and the “french” locales. You’ll notice the
program specifies LC_ALL in setlocale(), which is a #define in locale.h.
We can change a specific category of a locale by specifying it as the
parameter
to setlocale(). In our example, we could of used LC_TIME, and the results
would
of been identical. On the other hand, specifying LC_NUMERIC would not of
affected the date displayed. The locale categories are

LC_COLLATE   Which is used by strcoll() and strxfrm()
LC_CTYPE     Which is used by the character handling functions in 
LC_MONETARY  Monetary formatting, no C functions use this information except localconv()
LC_NUMERIC   For non-monetary formatting.
LC_TIME      Which is used by strftime(), not ctime() or asctime()

Retrieving locale information with the localeconv() function returns a
struct lconv
object. A quick glance into the header reveals the structures
innards.
Listing 2 shows the default values for each lconv member in the “C” locale.
To demonstrate
some of these default values, Listing 3 retrieves and displays some “C”
locale default
information using the localeconv() function.

By putting our C++ thinking cap on, we can quickly get a feel of how all
this could
be done in C++; wrap each locale category in a separate class, have a
generic locale
class container default to the “C” behavior for each of the locale classes.
Then if
someone needs any of those locale objects to behave in a specific way, he
can derive
his own class from them and override the necessary virtual functions.

Given our C locale categories, let’s begin by determining which classes
handle each category. Following is the category, C++ Standard class name and the header
file where the class is located. You only need to include to use any of these:

Locale category C++ class name Header

LC_COLLATE    collate             <Locale>
LC_CTYPE      ctype               <Xlocale>
LC_MONETARY   money_put/money_get <Xlocmon>
LC_NUMERIC    numpunct            <Xlocnum>
LC_TIME       time_get/time_put   <Xloctime>

All these classes are templates, capable of handling
he ‘char’ and ‘wchar_t’ types. This can be a real help for developers
creating ANSI and UNICODE applications. The standard library doesn’t support MBCS, only
single-byte and wide characters. All of these classes have a _Getcat() member function
which returns the LC_xxx category. Listing 4 shows the locale category
associated with each class.

The C++ framework for the aforementioned classes is as follows; All these
classes are
derived from the locale::facet base class. _Getcat() for example is a member
of locale::facet which simply returns -1. The reason for the locale::facet
syntax is because facet is a local class of the enclosing locale class.
So, the facet class actually contains information on the localization aspect
whereas the locale class is a container of facets. To use a facet, you use
the
Standard provided global function template use_facet(). More on use_facet()
later.

You may be thinking _Getcat() is one of many virtual functions which behaves
polymorphically in the derived classes, and has defaults in the base class
(returning -1
in this case). In fact, neither locale nor facet have any virtual functions
(with
the exception of the facet destructor). So then, how do we extend the
behavior of
these facets? You do so by adding your own unique facet to a locale, with
locale::_Addfac(), specifying the facet and the facet Id. All of this
machinery can
be found in \crt\src\locale.cpp. A locale cannot contain two facets with the
same id.
So when we’ll derive new classes from those previously mentioned, they’ll
have the
same id as the original class, and will therefore replace them. For example,
if Mynumpunct is derived from numpunct, and we then call _ADDFAC(locale(),
Mynumpunct),
the facet Id will be the same as numpunct::Id. Listing 5 shows how a class
derived
from numpunct has the same Id as numpunct itself. Once created a locale and
its
facet cannot change. _Getcat() and _Addfac() are nonstandard MSVC additions.

A special note about the ctype class; for efficiency, the C++ Standard
requires that ctype for chars be implemented as a template specialization. For this
reason ctype differs somewhat from the other classes in that it derives from
ctype_base, which itself is derived from locale::facet, as opposed to being directly
derived from locale::facet.

ctype<char> : public ctype_base
template<class _E> class ctype : public ctype_base

The ctype_base contains an enumeration indicating a character’s particular
semantics. This allows very fast character classification using bit masks. Listing 6
shows how ctype can be used for determining character semantics. I recommend stepping
through all the examples listed to better grasp all concepts presented. You might be
surprised to find the implementation for class locale in locale.cpp and local0.cpp, in
the “\crt\src” directory. I would of thought a “\cpp\src” directory structure would of been
more appropriate.

Note the _DIGIT #define which Listing 6 uses. This checks if the character
is in the ‘0’-‘9’ range. Other available bit masks are:

_UPPER to determine if it is an upper case letter A-Z
_LOWER to determine if it is a lower case letter a-z
_SPACE to determine if it is a horizontal tab, carriage return,
       newline, vertical tab or form feed
_PUNCT to determine if it is a punctuation character
_CONTROL to determine if it is a control character such as BEL and backspace
_BLANK to determine if it is a space
_HEX   to determine if it is a hexadecimal digit (0-9, A-F or a-f)

You’ll find the definitions for these in <wchar.h> and in <Xlocinfo.h>. If
the syntax seems contrived to you, <Locale> provides some support templates. Using
those helper templates, Listing 6 can be rewritten in a clearer fashion as shown in
Listing 7.

Previously I mentioned that to access a facet contained in a local you call
the use_facet function. As you may of noticed in the sample listings, I use
a macro called _USE(). _USE() is simply defined as

#define _USE(loc, fac) use_facet(loc, (fac *)0, true)

in this implementation. The first parameter is a locale, and the second
parameter is a facet. I often use locale::classic() or locale::empty() as the locale
parameter. empty() is a nonstandard locale object with no facets which behaves
differently from locale(). Whereas classic() is the locale that C defines, empty()
makes a transparent locale where any missing facets permit facets from the global
locale to shine through. use_facet is a template function which returns
a reference to the locale facet found in the specified locale. It does this
by calling locale::_Getfacet(). If there is no facet in the locale object, it simply
throws bad_cast. We could ask use_facet to create the facet for us, that is what the third
parameter is for. Passing true, as is always the case with the _USE() macro,
asks use_facet to new the facet. Another macro, _USEFAC, is exactly like
_USE() except it passes false to use_facet, asking that the facet not be
new’ed. The use of true as a third argument is designed to support lazy evaluation
of the standard facets and should not be used indiscriminately.

Listing 8 shows what happens when attempting to use a facet without passing
true to use_facet. The facet returned by these two macros is
locale::_Locimp::facet::_Fv. P.J. had to provide these two macros because at the time the MSVC++ compiler
did not support explicit template argument specification. Listing 9 shows what
happens when a function foo() has a template argument which the compiler cannot determine
on its own. The function bar() shows a similar problem, in the call to bar(1., 2), the
two parameters are of different types, so the compiler does not know which of the two types
Type should be, so it chooses not to decide by flagging it as an error.
With all of this framework background we’re ready to implement our own
facets and use them within a real-world C++ Windows application. Win32 provides
many functions similar to the facet functionality. IsCharAlpha() for example is similar to
isalpha<char>() except that it uses the information specified by the user in the control panel. For
Windows applications, this is preferable, except for the fact that it would leave out all the
Standard locale and facet classes we’ve come to love. What we need to do is implement our
own facets which make use of the Win32 APIs. That would allow us to program in a
consistent fashion.

We can use GetLocaleInfo() to retrieve information about the current user
locale and implement our own facet which makes use of this information. Listing 10
shows our Dmoneypunct facet in action. In between executions, if you go in Control
Panel and select the Regional Settings Icon and go in the Currency tab, you can make
changes and they will immediately be used by the Dmoneypunct class.

Special thanks to P.J. Plauger for his technical proof reading and comments.

/*
Mario Contestabile is a C++ developer at Zero-Knowledge Systems,
and can be reached at MarioC@Computer.Org
*

/* Listing 1, Locales in C *
/*
#include <windows.h>
#include <locale.h>
#include <stdio.h>
#include <time.h>

void localf(const char* pLoc)
{
  if(setlocale(LC_ALL, pLoc) == NULL)
  {
    fprintf(stderr, "Unable to establish locale\n");
    return;
  }
  else
  {
    time_t system_time = time(NULL);
    CHAR time_text[81];
    strftime(time_text, 80, "%x %A %B %d", localtime(&system_time));
    printf("[%s]\n", time_text);
  }
  return;
}

int main()
{

 localf("C");
 localf("french");

 return 1;
}
*

/* Listing 2 struct lconv from <locale.h> with default value in "C" locale.
CHAR_MAX is 127

struct lconv {
 char *decimal_point;  "."
 char *thousands_sep;  ""
 char *grouping;    ""
 char *int_curr_symbol;  ""
 char *currency_symbol;  ""
 char *mon_decimal_point; ""
 char *mon_thousands_sep; ""
 char *mon_grouping;   ""
 char *positive_sign;  ""
 char *negative_sign;  ""
 char int_frac_digits;  CHAR_MAX
 char frac_digits;   CHAR_MAX
 char p_cs_precedes;   CHAR_MAX
 char p_sep_by_space;  CHAR_MAX
 char n_cs_precedes;   CHAR_MAX
 char n_sep_by_space;  CHAR_MAX
 char p_sign_posn;   CHAR_MAX
 char n_sign_posn;   CHAR_MAX
};
*

/* Listing 3 Some "C" local defaults *
/*
#include <locale.h>
#include <stdio.h>


int main()
{

 struct lconv *loc = localeconv();
 printf("[%s] [%s] [%s] [%s]\n", loc->decimal_point, loc->thousands_sep,
                                 loc->currency_symbol, loc->positive_sign);

 return 1;
}
*

/* Listing 4 locale categories associated with each class *
/*
#include <locale>
#include <cassert>

using namespace std;

int main(){

 assert(LC_COLLATE == collate<char>::_Getcat());
 assert(LC_CTYPE == ctype<char>::_Getcat());
 assert(LC_MONETARY == money_put<char>::_Getcat());
 assert(LC_MONETARY == money_get<char>::_Getcat());
 assert(LC_NUMERIC == numpunct<char>::_Getcat());
 assert(LC_TIME == time_put<char>::_Getcat());
 assert(LC_TIME == time_get<char>::_Getcat());

 return 1;
}
*

/* Listing 5 What are the facet Id's? *
/*
#include <iostream>
#include <locale>
#include <cassert>

using namespace std;

class Mynumpunct : public numpunct<char>{
};

int main(){

 assert(numpunct<char>::id == Mynumpunct::id);

 cout << numpunct<char>::id << endl;
 cout << ctype<char>::id << endl;
 cout << collate<char>::id << endl;
 cout << money_put<char>::id << endl;
 cout << money_get<char>::id << endl;
 cout << time_get<char>::id << endl;
 cout << time_put<char>::id << endl;

 cout << numpunct<wchar_t>::id << endl;
 cout << ctype<wchar_t>::id << endl;
 cout << collate<wchar_t>::id << endl;
 cout << money_put<wchar_t>::id << endl;
 cout << money_get<wchar_t>::id << endl;
 cout << time_get<wchar_t>::id << endl;
 cout << time_put<wchar_t>::id << endl;

 return 1;
}
*

/* Listing 6 ctype<char> is a specialized template for efficiency *
/*
#include <iostream>
#include <locale>

using namespace std;

int main(){

 const char a = 'a';

 if(_USE(locale::empty(), ctype<char>).is(_DIGIT, a))
  cout << a << " is a digit" << endl;
 else
  cout << a << " is not a digit" << endl;

 const char b = '1';
 if(_USE(locale::empty(), ctype<char>).is(_DIGIT, b))
  cout << b << " is a digit" << endl;
 else
  cout << b << " is not a digit" << endl;

 return 1;
}
*

/* Listing 7 ctype helper functions *
/*
#include <iostream>
#include <locale>

using namespace std;

int main(){

 const char a = 'a';

 if(isalpha<char>(a))
  cout << a << " is an alphabetic character" << endl;
 else
  cout << a << " is not an alphabetic character" << endl;

 char b = 'b';
 if(islower<char>(b))
  cout << b << " is a lowercase character" << endl;
 else
  cout << b << " is not a lowercase character" << endl;

 b = toupper<char>(b);
 if(islower<char>(b))
  cout << b << " is a lowercase character" << endl;
 else
  cout << b << " is not a lowercase character" << endl;

 return 1;
}
*

/* Listing 8 use_facet new's your facet, or throws bad_cast if there's no
such facet in the given locale*
/*
#include <iostream>
#include <locale>

int main(){

 const char a = 'a';

 try{
  // The _USE macro passes true to use_facet
  if(std::_USE(std::locale(), std::ctype<char>).is(_DIGIT, a))
   std::cout << a << " is a digit" << std::endl;
  else
   std::cout << a << " is not a digit" << std::endl;

  std::locale loc = std::_ADDFAC(std::locale(), new std::ctype<char>);

  if(std::_USEFAC(loc, std::ctype<char>).is(_DIGIT, a))
   std::cout << a << " is a digit" << std::endl;
  else
   std::cout << a << " is not a digit" << std::endl;

  // The _USEFAC macro passes false to use_facet
  if(std::_USEFAC(std::locale(), std::ctype<char>).is(_DIGIT, a))
   std::cout << a << " is a digit" << std::endl;
  else
   std::cout << a << " is not a digit" << std::endl;


 }
 catch(std::bad_cast& ex){
  std::cout << ex.what() << std::endl;
 }
 return 1;
}
*

/* Listing 9 Help your compiler in determining template arguments! *
/*
template<typename Type>
int foo(){
 return 1;
}

template <typename Type>
Type bar(Type p1, Type p2){
 return 1;
}

int main(){

 //foo();  could not deduce template argument
 foo<int>();

 //bar(1., 2); ambiguous template parameter
 bar(1, 2);

 return 1;
}
*

/* Listing 10 C++ and Win32 hand-in-hand *

#include <windows.h>
#include <locale>
#include <iostream>

class Dmoneypunct : public std::moneypunct<TCHAR, true>{
    mutable TCHAR buf[20];

protected:
    virtual TCHAR do_decimal_point() const {
        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SMONDECIMALSEP, buf, sizeof buf))
            return buf[0];
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).decimal_point();
    }
    virtual TCHAR do_thousands_sep() const {
        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SMONTHOUSANDSEP, buf, sizeof buf))
            return buf[0];
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).thousands_sep();
    }

    virtual std::string do_grouping() const{

        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SMONGROUPING, buf, sizeof buf))
            return buf;
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).grouping();
    }

    virtual std::string do_curr_symbol() const {

        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SCURRENCY, buf, sizeof buf))
            return buf;
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).curr_symbol();
    }
    virtual std::string do_positive_sign() const {

        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SPOSITIVESIGN, buf, sizeof buf))
            return buf;
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).positive_sign();
    }
    virtual std::string do_negative_sign() const {

        if(0 != GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SNEGATIVESIGN, buf, sizeof buf))
            return buf;
        else
            return std::_USE(std::locale(), std::moneypunct<TCHAR>).negative_sign();
    }
};

int main(){

 std::moneypunct<TCHAR, true> i;
 std::cout<< "moneypunct decimal point " << i.decimal_point() << std::endl;
 std::cout<< "moneypunct thousands sep " << i.thousands_sep() << std::endl;
 std::cout<< "moneypunct grouping " << i.grouping() << std::endl;
 std::cout<< "moneypunct currency symbol " << i.curr_symbol() << std::endl;
 std::cout<< "moneypunct positive sign " << i.positive_sign() << std::endl;
 std::cout<< "moneypunct negative sign " << i.negative_sign() << std::endl;

 std::cout<< std::endl;

 Dmoneypunct D;
 std::cout<< "Dmoneypunct decimal point " << D.decimal_point() << std::endl;
 std::cout<< "Dmoneypunct thousands sep " << D.thousands_sep() << std::endl;
 std::cout<< "Dmoneypunct grouping " << D.grouping() << std::endl;
 std::cout<< "Dmoneypunct currency symbol " << D.curr_symbol() << std::endl;
 std::cout<< "Dmoneypunct positive sign " << D.positive_sign() << std::endl;
 std::cout<< "Dmoneypunct negative sign " << D.negative_sign() << std::endl;

 return 1;
}

 

More by Author

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Must Read