The Kazakh Language and OS400

The subject of this article is the encoding of Kazakh language characters in OS400. It might seem there is no need to investigate this problem: there must be a national standard, leading vendors have developed their strategies, and all questions will be solved in time. In practice, the situation is less simple.

The problem of national language support in the former USSR republics became more important as the new countries created their own state machinery. The main language in all the republics was Russian. After the Soviet Union collapsed, some of the new states joined the EU; they developed new standards for national language encoding and registered them with ISO, thus giving software developers sufficient resources. Other countries moved from the Cyrillic alphabet to the Roman one, which turned the problem into an internal one. The Republic of Kazakhstan also seriously considered moving to the Roman alphabet, but a special committee analyzed the expense and time such a move would involve and closely scrutinized other countries' experiences (for example, Turkey's), and the project was postponed until conditions are more suitable. Thus, people remain in the Cyrillic universe and have to use today's standard, RK 1048-2002, having nothing better at this time. This standard was created in 2002; its first goal was to remove the chaos that existed among the different encodings used for ASCII, ANSI, and UNICODE. That chaos has been well described elsewhere.

The RK 1048-2002 standard defines two encodings: UNICODE and ANSI. The first (2-byte) encoding happily coincides with the existing ISO standard, but the second (1-byte) is local and not registered with ISO. Leading vendors (Microsoft and IBM) support Kazakh-UNICODE, but not Kazakh-ANSI. The latter is in practice supported by a group of volunteers who offer proprietary drivers, fonts, conversion tables, and procedures.

Additionally, there were few IBM AS400 computers in the CIS countries in 2002. This explains why the RK 1048-2002 standard did not try to link up with EBCDIC encoding and AS400 applications.

The company I work for bought an AS400 (IBM iSeries with OS400) and a great banking system (Equation). All seemed okay until support for Kazakh was requested. All our applications and databases use a simple single-byte encoding, namely CCSID 1025. The main AS400 applications installed work via the 5250 terminal emulation program from the IBM Client Access package. This is really a Windows application (more precisely, a Java program) that performs its own international support, and the Kazakh language is not on its list.

Here is what we have: On the AS400 side, data and modules use CCSID 1025 (EBCDIC Cyrillic); on the client side (workstation), the code page is CP1251 (Windows Cyrillic). All the necessary conversions are made automatically, according to the type of interaction used (ODBC, ADO, JDBC, data transfer to and from the host, file data sharing, and so on).

Today's encoding of the Kazakh language is actually an extension of Cyrillic, and it seems reasonable to exploit this relationship. I mean, one may take some code points of CP1025 and use them to represent Kazakh letters instead of their original characters. I tested the RK 1048-2002 standard as described below. The C_1251.nls file on a workstation was replaced by the one needed for Kazakh standard support on Windows (it is included in the KAZWIN version 3.0 driver package), the necessary keyboard layout was added, and data was entered and transferred to an AS400 table. Then the data was retrieved by a query on the 5250 terminal screen. It turned out that the Kazakh letters Ө and ө were mapped to control characters (called field modifiers), which corrupt the displayed information. The effect is shown in Figure 1. The standard RK 1048-2002 encoding is shown in Figure 2.

Figure 1: The effect of the control characters.

Figure 2: Standard RK 1048-2002 encoding.
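The failed test above can be approximated on any workstation with a short Python sketch. It rests on assumptions that are mine, not part of the test itself: Python ships a `kz1048` codec for the ANSI part of RK 1048-2002 but no codec for CCSID 1025, so ISO 8859-5 is used here as a stand-in for the host repertoire, and the helper name `lost_on_host` is hypothetical. The sketch encodes each Kazakh letter in KZ-1048, reads the byte the way a CP1251-based converter would, and reports the letters that have no counterpart on the host side.

```python
# Assumption: ISO 8859-5 stands in for the CCSID 1025 repertoire,
# because Python has no built-in CCSID 1025 codec.
KAZAKH_LETTERS = "ӘәҒғҚқҢңӨөҰұҮүҺһІі"


def lost_on_host(letters):
    """Return letters whose KZ-1048 byte, read as CP1251, has no host equivalent."""
    lost = []
    for ch in letters:
        raw = ch.encode("kz1048")  # byte stored by the Kazakh-enabled workstation
        seen = raw.decode("cp1251", errors="replace")  # what a CP1251 converter sees
        try:
            seen.encode("iso8859_5")  # stand-in for the conversion to the host page
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


if __name__ == "__main__":
    print(lost_on_host(KAZAKH_LETTERS))
```

If the stand-in assumption holds, the letters reported should be the same ones that turned into control characters on the 5250 screen.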

The investigation continued, and a productive idea was found. That idea was used for encoding Kazakh letters in the KAZWIN driver version 2.5 package: the Kazakh letters take the code points that CP1251 assigns to letters of other Cyrillic subfamilies that are not needed for Kazakh (I mean Serbian, Macedonian, and others). This may work because these languages are supported by CP1025 (EBCDIC Cyrillic), so in this case all the letters are mapped to letters during the conversion from CP1251 (Windows Cyrillic) to CP1025 (EBCDIC Cyrillic) and vice versa. And, voilà, everything works perfectly. The only thing left was to create a sorting table and an uppercase table on the AS400 side; this may be easily done with the corresponding OS400 service. Thus, my task may be solved by installing CP1251k (see Figure 3) from the KAZWIN version 2.5 package on every PC.

Figure 3: CP1251k.
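To see which CP1251 positions are candidates for such reuse, one can list the non-Russian Cyrillic letters of CP1251 and check which of them also exist on the host side. This is only a sketch under the same assumption as before: ISO 8859-5 stands in for the CP1025 repertoire, and the function name `reusable_cp1251_slots` is hypothetical. It does not reproduce the actual CP1251k table, which is defined by the KAZWIN package.

```python
def reusable_cp1251_slots():
    """Split non-Russian Cyrillic letters of CP1251 into safe and unsafe slots.

    A slot is 'safe' if its letter also exists in ISO 8859-5, used here
    as a stand-in (assumption) for the CP1025 repertoire.
    """
    russian = {chr(c) for c in range(0x0410, 0x0450)} | {"Ё", "ё"}
    safe, unsafe = [], []
    for byte in range(0x80, 0x100):
        ch = bytes([byte]).decode("cp1251", errors="ignore")
        # Skip undefined bytes, non-Cyrillic symbols, and Russian letters.
        if not ch or not (0x0400 <= ord(ch) <= 0x04FF) or ch in russian:
            continue
        try:
            ch.encode("iso8859_5")
            safe.append((hex(byte), ch))
        except UnicodeEncodeError:
            unsafe.append((hex(byte), ch))
    return safe, unsafe


if __name__ == "__main__":
    safe, unsafe = reusable_cp1251_slots()
    print("safe:  ", safe)
    print("unsafe:", unsafe)
```

Any Kazakh letter placed in a "safe" slot travels to the host as an ordinary letter, which is the property the idea relies on.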

Other Advantages

  • Kazakh language support as an extension of Russian, both for Windows and OS400, gives the opportunity to buy software created for Russia and benefit from Russian experience with no additional changes or programming.
  • The opportunity to get the latest solutions for Russia released by IBM and third parties.

However, using CP1251k on one PC and the standard RK 1048-2002 encoding on the others at the same time is very uncomfortable. So, there is a pressing need to correct the national standard, use CP1251k instead of the encoding offered in RK 1048-2002, and register this new standard with ISO! Of course, this may lead to some losses, but how large would they be?

Costs: ANSI-coded data will need to be converted. This may be done by saving the data in UNICODE, making the transition, and saving it again in the new ANSI encoding. One may use conversion programs; there are a lot of them, and they are easy to write. It is expected that some programs that use proprietary sorting procedures, case conversion, and raster fonts will have to be rewritten.
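The conversion through UNICODE described above can be sketched in a few lines of Python. The assumptions are mine: the source data is in the ANSI encoding of RK 1048-2002 (Python's `kz1048` codec), and because Python has no CP1251k codec, `ptcp154` (another single-byte Cyrillic code page that covers Kazakh) is used purely as a placeholder for the new target encoding. The function name `recode` is hypothetical.

```python
def recode(data, source="kz1048", target="ptcp154"):
    """Convert single-byte text from one code page to another via Unicode.

    'target' is a placeholder: a real migration would plug in a codec
    for the new code page instead.
    """
    text = data.decode(source)   # old ANSI bytes -> Unicode
    return text.encode(target)   # Unicode -> new single-byte encoding


if __name__ == "__main__":
    old = "Қазақстан, өзен".encode("kz1048")
    new = recode(old)
    print(old.hex(), "->", new.hex())
```

Because both encodings are single-byte, the converted data keeps its length, so fixed-width fields do not need to be resized.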

I should mention that IBM has its own vision of how to support the Kazakh language. It has created CCSID 01166 (EBCDIC), and many applications support UNICODE. See the Kazakh language support by IBM. However, Kazakh ANSI is not supported pending its registration with ISO.

Losses If Today's Standard Is Left As Is

  • There is no way to use OS400 software created for Russia as is.
  • Such software has to be specially ordered.
  • Such software needs additional support.
  • There is an unavoidable lag behind the state of the art, and rising costs for software, development, and support, as a consequence of the smaller RK market.
  • Large expenses for IBM and other vendors for Kazakh language support. (The list of IBM products mentioned above contains about 400 packages, and there is a large number of third-party products.)

The ideas used here may also be applicable to the support of other languages.

Questions to the Reader

  • What do you think about this solution?
  • Do you recommend that RK accept these ideas and change its standard?

Your comments and advice are welcome here: tradmir@mail.ru.

You may also send any questions, requests, and arguments for or against changing the standard to the same address.

Turmukhambetov Radmir N.
Monday, February 11, 2008



Comments

  • Text Editor which supports CT RK 1048 character encoding

    Posted by Dhananjay Mirashi on 02/26/2013 09:39pm

    Hello tradmir, I'm looking for a text editor that supports the CT RK 1048 character set. At present we use Notepad++, which supports multiple character encodings, but CT RK 1048 is not among them. When we try to open a file provided by the support team that is encoded in CT RK 1048, it gets converted to ANSI while being downloaded from email. Could you please help us with this?

  • Appendix

    Posted by radmir on 06/17/2010 07:00am

    Many thanks to those who supported me in this investigation and helped by sending me much advice. However, the bureaucratic Ministry of Trade refused to consider my proposal, requesting that I translate it into the second state language, send it to and gather reviews from the institutions on their list, and assemble everything into one file according to their standards. Today, the ANSI standard 1048 is in use despite not being registered with ISO, and the only way that remains is to write a small in-house DLL to implement this proposal. Supporting the Dispatch interface is enough to reach the goal. In that case, Client Access will work with CP1251, changed to support 1048, while the AS400 may use page 1025. This works for IBM Access Client versions 5.4.1-4. Yours, Radmir.
