iconv_ibmkanji(5) [osf1 man page]

iconv_ibmkanji(5)						File Formats Manual						 iconv_ibmkanji(5)

NAME

       iconv_ibmkanji - Specification for controlling conversion between IBM Kanji and Tru64 UNIX Japanese codesets

DESCRIPTION

       The iconv utility can convert the encoding of characters between IBM Kanji System Characters (IBM Kanji) and one of the following
       Tru64 UNIX codesets: DEC Kanji, Super DEC Kanji, Japanese EUC, or Shift JIS. You choose the type of conversion by specifying the
       appropriate values for the utility's from-code and to-code parameters, as follows:

       -----------------------------------------------------
       Type of Code Conversion	      from-code   to-code
       -----------------------------------------------------
       IBM Kanji to DEC Kanji	      ibmkanji	  deckanji
       IBM Kanji to Super DEC Kanji   ibmkanji	  sdeckanji
       IBM Kanji to Japanese EUC      ibmkanji	  eucJP
       IBM Kanji to Shift JIS	      ibmkanji	  SJIS
       DEC Kanji to IBM Kanji	      deckanji	  ibmkanji
       Super DEC Kanji to IBM Kanji   sdeckanji   ibmkanji
       Japanese EUC to IBM Kanji      eucJP	  ibmkanji
       Shift JIS to IBM Kanji	      SJIS	  ibmkanji
       -----------------------------------------------------
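
       The pairs in the table above can be checked before the utility is invoked. The following Python sketch builds an argument list for
       iconv(1) using its -f and -t options and rejects pairs that this page does not cover. The helper name iconv_argv and the input file
       name are hypothetical.

```python
# Codesets that this page pairs with ibmkanji (names are case-sensitive).
TRU64_CODESETS = ("deckanji", "sdeckanji", "eucJP", "SJIS")


def iconv_argv(from_code, to_code, path):
    """Return an argv list for iconv(1); raise ValueError for an unsupported pair."""
    pair_ok = (
        (from_code == "ibmkanji" and to_code in TRU64_CODESETS)
        or (to_code == "ibmkanji" and from_code in TRU64_CODESETS)
    )
    if not pair_ok:
        raise ValueError(f"unsupported conversion: {from_code} -> {to_code}")
    return ["iconv", "-f", from_code, "-t", to_code, path]
```

       The returned list can be passed to a process launcher such as subprocess.run on a system that provides these converters.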

       Conversion behavior for the following items is affected by the definition of environment variables or profile entries in the user's
       environment. For more information, see the "Environment Variables" and "Profile File" sections.

       The UDC (User-Defined Character) mapping table that is used for UDC conversion
              This table must be an ASCII text file that contains UDC mapping information. The table affects conversion of user-defined
              characters between the codesets.

       The EBCDIC to/from ISO code (ASCII, JIS Roman characters) mapping table that is used for conversion
              This table must be an ASCII text file that contains information on how to map characters between EBCDIC and ISO code.

       The K-shift code
              This is a one- or two-byte hexadecimal code that marks the beginning of Kanji mode.

       The A-shift code
              This is a one- or two-byte hexadecimal code that marks the beginning of EBCDIC mode.

       The status of the initial mode (Kanji or EBCDIC) at the time the iconv command starts or the first time the iconv() function is called
       after calling the iconv_open() function that initializes the converter in a program
              The status keywords are either kanji_mode or ebcdic_mode.

       How to treat undefined characters when these are detected in Kanji mode
              Specify this action by using one of the following keywords:

              abort     Stop codeset conversion.
              pass      Output the undefined characters without any processing and continue codeset conversion.
              replace   Output padding characters instead of the undefined characters and continue codeset conversion.
              ignore    Ignore the undefined characters and continue codeset conversion.

       The two-byte padding character used in Kanji mode
              This value is meaningful when replace is chosen for the processing of undefined characters in Kanji mode. Specify the padding
              character by its hexadecimal value.

       How to treat undefined characters when these are detected in EBCDIC mode
              Specify this action by using one of the following keywords:

              abort     Stop codeset conversion.
              pass      Output the undefined characters without any processing and continue codeset conversion.
              replace   Output padding characters instead of the undefined characters and continue codeset conversion.
              ignore    Ignore the undefined characters and continue codeset conversion.

       The one-byte padding character used in EBCDIC mode
              This value is meaningful when replace is chosen for the processing of undefined characters in EBCDIC mode. Specify the padding
              character by its hexadecimal value.
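
       To illustrate how the K-shift and A-shift codes delimit the two modes, the following Python sketch splits an IBM Kanji byte stream into
       runs. It assumes one-byte shift codes with the default values 0x0e and 0x0f, and it assumes that neither value occurs inside a two-byte
       Kanji character. The function name split_modes is hypothetical; this is not the converter's actual implementation.

```python
K_SHIFT = 0x0E  # assumed one-byte K-shift code: marks the start of Kanji mode
A_SHIFT = 0x0F  # assumed one-byte A-shift code: marks the start of EBCDIC mode


def split_modes(data, initial_state="ebcdic_mode"):
    """Split bytes into (mode, payload) runs, dropping the shift codes themselves."""
    runs = []
    mode = initial_state
    buf = bytearray()
    for byte in data:
        if byte in (K_SHIFT, A_SHIFT):
            if buf:  # close the run that was in progress
                runs.append((mode, bytes(buf)))
                buf = bytearray()
            mode = "kanji_mode" if byte == K_SHIFT else "ebcdic_mode"
        else:
            buf.append(byte)
    if buf:
        runs.append((mode, bytes(buf)))
    return runs
```

       Each run could then be handed to a mode-specific lookup: two bytes at a time in Kanji mode, one byte at a time in EBCDIC mode.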

       When the to-code parameter for the conversion is ibmkanji, you can also specify the following items for conversion behavior:

       Whether the initial shift code is output at the start of conversion if the status of the initial mode (Kanji or EBCDIC) is different
       from the mode of the first input character
              The start of conversion is the time the iconv utility starts processing, or when the iconv() function is called just after
              opening the converter with iconv_open(). Keyword values for this item are yes or no.

       Whether or not the utility outputs the last shift code when iconv() is called with a zero length input string, and the current mode
       (Kanji or EBCDIC) is different from the mode specified by the last shift state
              Keyword values for this item are yes or no.

       The last status (Kanji mode or EBCDIC mode)
              Specify kanji_mode or ebcdic_mode for this value. It is meaningful only when yes is the setting for whether the utility outputs
              the last shift code.

       If the items that control conversion behavior are specified by both environment variables and the profile file, values set  by  environment
       variables  override  values set by comparable entries in the profile. Note that values for all conversion control items are case-sensitive,
       whether they are set by environment variables or in the profile. The following table contains the default values for each  conversion  con-
       trol item:

       ----------------------------------------------------
       Conversion Control Item		     Default Value
       ----------------------------------------------------
       UDC mapping table		     None
       K shift code			     0x0e
       A shift code			     0x0f
       Initial state			     ebcdic_mode
       Processing for undefined characters
       in Kanji mode			     abort
       Processing for undefined characters
       in EBCDIC mode			     pass
       ----------------------------------------------------
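
       The precedence described above (environment variable first, then profile entry, then default) can be sketched in Python as follows.
       The function name resolve and the dictionary layout are hypothetical, and only the defaults listed in the table above are included.

```python
import os

# Defaults copied from the table above, keyed by profile entry_name.
DEFAULTS = {
    "k_shift_code": "0x0e",
    "a_shift_code": "0x0f",
    "initial_state": "ebcdic_mode",
    "kanji_except_proc": "abort",
    "ebcdic_except_proc": "pass",
}


def resolve(item, env_name, profile, environ=None):
    """Return the effective value: environment, then profile, then default."""
    environ = os.environ if environ is None else environ
    if env_name in environ:
        return environ[env_name]
    if item in profile:
        return profile[item]
    return DEFAULTS.get(item)  # None when the item has no default (UDC table)
```

       Values are returned as the case-sensitive strings the user supplied; any conversion to numbers is left to the caller.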

       The  default  padding  characters  are white spaces, whose code values for each destination codeset are noted in the following table. These
       padding characters are output when you specify replace for processing of undefined characters and do not  explicitly  specify  the  padding
       character.

       ---------------------------------------------------
       Mode	     Default Value   Destination Codeset
       ---------------------------------------------------
       Kanji mode    0x44e9	     ibmkanji
		     0xa1a1	     deckanji, sdeckanji,
				     or eucJP
		     0x8140	     SJIS
       EBCDIC mode   0x40	     ibmkanji
		     0x20	     deckanji, sdeckanji,
				     eucJP, or SJIS
       ---------------------------------------------------
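
       The default padding values in the table above can be expressed as a lookup. The Python sketch below returns the padding bytes for a
       mode and destination codeset, using an explicitly specified hexadecimal value when one is given. The function name padding_bytes is
       hypothetical, and big-endian byte order for the two-byte value is an assumption.

```python
# Default padding (white space) values copied from the table above.
DEFAULT_PADDING = {
    "kanji_mode": {
        "ibmkanji": 0x44E9,
        "deckanji": 0xA1A1,
        "sdeckanji": 0xA1A1,
        "eucJP": 0xA1A1,
        "SJIS": 0x8140,
    },
    "ebcdic_mode": {
        "ibmkanji": 0x40,
        "deckanji": 0x20,
        "sdeckanji": 0x20,
        "eucJP": 0x20,
        "SJIS": 0x20,
    },
}


def padding_bytes(mode, to_code, explicit=None):
    """Return the padding character as bytes (two in Kanji mode, one in EBCDIC mode)."""
    width = 2 if mode == "kanji_mode" else 1
    if explicit is not None:
        value = int(explicit, 16)  # accepts strings such as "0x44e9"
    else:
        value = DEFAULT_PADDING[mode][to_code]
    return value.to_bytes(width, "big")
```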

       The default EBCDIC-ISO mapping table is as follows:

       For conversion from IBM Kanji to other codesets:
              /usr/lib/nls/loc/iconv/data/ebcdic_kana.tbl

       For conversion from other codesets to IBM Kanji:
              /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl

       These mapping tables map between EBCDIC and ISO code, which includes JIS Roman characters. The kana_ebcdic.tbl mapping table also maps
       ISO lowercase characters to EBCDIC uppercase characters.

       The following default values for conversion control items are meaningful when the iconv utility's to-code conversion parameter is ibmkanji:

       ---------------------------------------------
       Conversion Control Item		Default
       ---------------------------------------------
       Output the initial shift code?	yes
       Output the last shift code?	yes
       Output the last status?		ebcdic_mode
       ---------------------------------------------
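
       One reading of the three output items above is sketched below in Python: runs of converted bytes are joined into IBM Kanji output, and
       shift codes are inserted at mode changes. Treating the end of the run list as the zero length iconv() call, and suppressing only the
       first shift code when the initial item is no, are assumptions of this sketch. The function name frame_output is hypothetical.

```python
def frame_output(runs, initial_state="ebcdic_mode", output_initial=True,
                 output_trailer=True, last_state="ebcdic_mode",
                 k_shift=b"\x0e", a_shift=b"\x0f"):
    """Join (mode, payload) runs into one byte string with shift codes inserted."""
    shift = {"kanji_mode": k_shift, "ebcdic_mode": a_shift}
    out = bytearray()
    mode = initial_state
    for index, (run_mode, payload) in enumerate(runs):
        if run_mode != mode:
            # The shift before the very first run is the "initial shift code".
            if index > 0 or output_initial:
                out += shift[run_mode]
            mode = run_mode
        out += payload
    # End of input: return to last_state if the trailer shift code is enabled.
    if output_trailer and mode != last_state:
        out += shift[last_state]
    return bytes(out)
```

       With the defaults, output that starts in Kanji mode is wrapped in a K-shift code and a closing A-shift code; with both items set to no,
       the payload is emitted bare.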

   Environment Variables
       This  section  discusses the environment variables that you can set to control conversion behavior. The names for these variables adhere to
       the following format:

       fromcode_tocode_controlitem

       The name segments for fromcode and tocode can each be one of the following keywords:

       ----------------------------
       For Codeset:	 Use:
       ----------------------------
       IBM Kanji	 IBMKANJI
       DEC Kanji	 DECKANJI
       Super DEC Kanji	 SDECKANJI
       Japanese EUC	 EUCJP
       Shift JIS	 SJIS
       ----------------------------

       The name segments for controlitem can be one of the following keywords:

       --------------------------------------------------------
       For Control Item:		    Use:
       --------------------------------------------------------
       UDC mapping table		    UDC_TABLE
       EBCDIC-ISO mapping table 	    EBCDIC_TABLE
       K shift code			    K_SHIFT_CODE
       A shift code			    A_SHIFT_CODE
       Initial state			    INITIAL_STATE
       Processing of undefined characters
       in Kanji mode			    KANJI_EXCEPT_PROC
       Processing of undefined characters
       in EBCDIC mode			    EBCDIC_EXCEPT_PROC
       Padding characters
       in Kanji mode			    PADDING_2BYTE_CHAR
       Padding characters
       in EBCDIC mode			    PADDING_1BYTE_CHAR
       Output initial
       shift code			    INITIAL_SHIFT_CODE
       Output last
       shift code			    TRAILER_SHIFT_CODE
       Last status			    LAST_STATE
       File path of the profile 	    PROFILE
       --------------------------------------------------------
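
       The naming format can be sketched as a small Python helper that joins the three segments from the two tables above. The function name
       env_name is hypothetical.

```python
# Codeset name as used by iconv -> name segment used in environment variables.
CODESET_SEGMENT = {
    "ibmkanji": "IBMKANJI",
    "deckanji": "DECKANJI",
    "sdeckanji": "SDECKANJI",
    "eucJP": "EUCJP",
    "SJIS": "SJIS",
}

# Control item keywords from the table above.
CONTROL_ITEMS = {
    "UDC_TABLE", "EBCDIC_TABLE", "K_SHIFT_CODE", "A_SHIFT_CODE",
    "INITIAL_STATE", "KANJI_EXCEPT_PROC", "EBCDIC_EXCEPT_PROC",
    "PADDING_2BYTE_CHAR", "PADDING_1BYTE_CHAR", "INITIAL_SHIFT_CODE",
    "TRAILER_SHIFT_CODE", "LAST_STATE", "PROFILE",
}


def env_name(from_code, to_code, control_item):
    """Build a fromcode_tocode_controlitem environment variable name."""
    if control_item not in CONTROL_ITEMS:
        raise ValueError(f"unknown control item: {control_item}")
    return f"{CODESET_SEGMENT[from_code]}_{CODESET_SEGMENT[to_code]}_{control_item}"
```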

       Following are examples of using the setenv C shell command to define environment variables to control conversion behavior. In  these  exam-
       ples, the fromcode name segment indicates Japanese EUC and the tocode name segment indicates IBM Kanji:

       setenv EUCJP_IBMKANJI_UDC_TABLE eucjp_ibmkanji_udc.tbl
       setenv EUCJP_IBMKANJI_EBCDIC_TABLE kana_ebcdic.tbl
       setenv EUCJP_IBMKANJI_K_SHIFT_CODE 0x0e
       setenv EUCJP_IBMKANJI_A_SHIFT_CODE 0x0f
       setenv EUCJP_IBMKANJI_INITIAL_STATE ebcdic_mode
       setenv EUCJP_IBMKANJI_KANJI_EXCEPT_PROC replace
       setenv EUCJP_IBMKANJI_EBCDIC_EXCEPT_PROC replace
       setenv EUCJP_IBMKANJI_PADDING_2BYTE_CHAR 0x44e9
       setenv EUCJP_IBMKANJI_PADDING_1BYTE_CHAR 0x40
       setenv EUCJP_IBMKANJI_INITIAL_SHIFT_CODE yes
       setenv EUCJP_IBMKANJI_TRAILER_SHIFT_CODE yes
       setenv EUCJP_IBMKANJI_LAST_STATE ebcdic_mode
       setenv EUCJP_IBMKANJI_PROFILE .eucjp_ibmkanji_profile

   Directory Search Path
       When you specify a file name without a directory, the iconv utility searches the following directories and uses the first file found:

              Current directory
              Home directory
              The iconv/data subdirectory of the directory specified by the environment variable LOCPATH
              /usr/lib/nls/loc/iconv/data
              /usr/i18n/lib/nls/loc/iconv/data

       If you specify a relative directory path for a file, the utility searches these same directories in the same order and uses the first  file
       found.
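
       The search order can be sketched in Python as follows. The sketch treats LOCPATH as a single directory and skips that entry when the
       variable is unset; both are assumptions. The function names search_dirs and find_file are hypothetical.

```python
import os


def search_dirs(environ=None):
    """Return the directories to search, in the order listed above."""
    environ = os.environ if environ is None else environ
    dirs = [os.getcwd(), environ.get("HOME", os.path.expanduser("~"))]
    locpath = environ.get("LOCPATH")
    if locpath:  # assumption: LOCPATH names one directory
        dirs.append(os.path.join(locpath, "iconv", "data"))
    dirs += ["/usr/lib/nls/loc/iconv/data", "/usr/i18n/lib/nls/loc/iconv/data"]
    return dirs


def find_file(name, dirs=None):
    """Return the first existing path for a bare or relative name, else None."""
    if os.path.isabs(name):
        return name if os.path.isfile(name) else None
    for directory in (search_dirs() if dirs is None else dirs):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```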

   Profile File
       Entry lines in the profile file adhere to the following format:

       entry_name	 string_value

       The  entry_name	and  string_value  fields  are	separated by spaces or tabs. Do not append a colon (:) after entry_name. The file can also
       include blank lines and comment entries, which begin with the # character.

       Following are the entry_name values for different conversion control items:

       ------------------------------------------------------------
       Conversion Control Item		 entry_name
       ------------------------------------------------------------
       UDC mapping table		 udc_mapping_table
       EBCDIC-ISO mapping table 	 ebcdic_mapping_table
       K shift code			 k_shift_code
       A shift code			 a_shift_code
       Initial state			 initial_state
       Processing undefined characters
       in Kanji mode			 kanji_except_proc
       Processing undefined characters
       in EBCDIC mode			 ebcdic_except_proc
       Padding character
       in Kanji mode			 padding_2byte_char
       Padding character
       in EBCDIC mode			 padding_1byte_char

       Output initial
       shift code			 output_initial_shift_code
       Output last
       shift code			 output_trailer_shift_code
       Last state			 last_state
       ------------------------------------------------------------

       Following is a sample profile for converting from Japanese EUC to IBM Kanji.

       #
       #  sample profile for eucJP_ibmkanji
       #
       udc_mapping_table           eucjp_ibmkanji_udc.tbl
       ebcdic_mapping_table        kana_ebcdic.tbl
       k_shift_code                0x0e          # ebcdic -> kanji
       a_shift_code                0x0f          # kanji -> ebcdic
       initial_state               ebcdic_mode
       kanji_except_proc           replace
       ebcdic_except_proc          replace
       padding_2byte_char          0x44e9        # kanji mode
       padding_1byte_char          0x40          # ebcdic mode
       output_initial_shift_code   yes
       output_trailer_shift_code   yes
       last_state                  ebcdic_mode
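
       A profile in this format is simple to read programmatically. The Python sketch below parses profile text into a dictionary, skipping
       blank lines and comments. Treating everything after a # as a comment, including on entry lines, is an assumption based on the sample.
       The function name parse_profile is hypothetical.

```python
def parse_profile(text):
    """Parse profile text into a dict of entry_name -> string_value."""
    entries = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split(None, 1)  # entry_name, then the rest of the line
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 'entry_name string_value'")
        name, value = fields
        entries[name] = value.strip()  # values stay case-sensitive strings
    return entries
```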

       The default file names for the profile are as follows:

       -----------------------------------------------------------
       Code Conversion		      Default Profile Name
       -----------------------------------------------------------

       IBM Kanji to DEC Kanji	      .ibmkanji_deckanji_profile
       IBM Kanji to Super DEC Kanji   .ibmkanji_sdeckanji_profile
       IBM Kanji to Shift JIS	      .ibmkanji_sjis_profile
       IBM Kanji to Japanese EUC      .ibmkanji_eucjp_profile

       DEC Kanji to IBM Kanji	      .deckanji_ibmkanji_profile
       Super DEC Kanji to IBM Kanji   .sdeckanji_ibmkanji_profile
       Shift JIS to IBM Kanji	      .sjis_ibmkanji_profile
       Japanese EUC to IBM Kanji      .eucjp_ibmkanji_profile
       -----------------------------------------------------------

       By default, the iconv utility checks the directory search path mentioned in the "Directory Search Path" section and uses the first  profile
       it  finds.  However,  you  can  also specify an arbitrary file path for your profile instead of the default names by defining the following
       environment variables:

       -----------------------------------------------------------------
       Code Conversion		      Profile Path Environment Variable
       -----------------------------------------------------------------
       IBM Kanji to DEC Kanji	      IBMKANJI_DECKANJI_PROFILE
       IBM Kanji to Super DEC Kanji   IBMKANJI_SDECKANJI_PROFILE
       IBM Kanji to Shift JIS	      IBMKANJI_SJIS_PROFILE
       IBM Kanji to Japanese EUC      IBMKANJI_EUCJP_PROFILE

       DEC Kanji to IBM Kanji	      DECKANJI_IBMKANJI_PROFILE
       Super DEC Kanji to IBM Kanji   SDECKANJI_IBMKANJI_PROFILE
       Shift JIS to IBM Kanji	      SJIS_IBMKANJI_PROFILE
       Japanese EUC to IBM Kanji      EUCJP_IBMKANJI_PROFILE
       -----------------------------------------------------------------
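
       Both tables follow one pattern, which the Python sketch below captures: the profile path comes from the conversion's PROFILE
       environment variable when it is set, and otherwise falls back to the default file name. The function name profile_location is
       hypothetical, and treating an empty variable as unset is an assumption.

```python
import os


def profile_location(from_code, to_code, environ=None):
    """Return the profile path from the environment, or the default file name."""
    environ = os.environ if environ is None else environ
    stem = f"{from_code}_{to_code}"
    override = environ.get(stem.upper() + "_PROFILE")
    if override:
        return override
    return "." + stem.lower() + "_profile"
```

       A default name returned by this helper has no directory part, so it would still be looked up through the directory search path.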

   UDC Mapping Table
       Entries in a UDC mapping table adhere to the following format:

       fromcode      tocode

       Each of these values is a two-byte hexadecimal number. In the case of Super DEC Kanji and Japanese EUC, three-byte hexadecimal values  that
       begin with SS3 (0x8f), such as 0x8fxxxx, are also valid.

       You  can  specify  ranges  of UDC from and to values in the same file entry by using a hyphen to separate the codes that start and end each
       range:

       start_fromcode-end_fromcode   start_tocode-end_tocode

       When specifying entries that include ranges of values, the number of codes in the from range must always equal the number of codes  in  the
       to  range. A UDC mapping table can also include blank lines and comment lines, which begin with the # character. Following is an example of
       a UDC mapping table:

       # ibmkanji	     eucJP

       0x6941-0x72fe           0xf5a1-0xfefe           # udc
       0x7341-0x7cfe           0x8ff5a1-0X8ffefe       # udc
       0x7d41-0x7ffe           0x8feea1-0X8ff0fe       # udc

       The first entry in this file specifies a range of IBM Kanji values from 0x6941 to 0x72fe that are mapped to Japanese EUC code values in the
       range 0xf5a1 to 0xfefe. You can find additional sample UDC mapping table files in the /usr/i18n/examples/iconv/data directory.
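
       The entry format can be read with a short parser. The Python sketch below returns each entry as a pair of (start, end) ranges and treats
       a single code as a one-element range. It does not check that the from and to ranges contain the same number of codes, because which
       code values inside a range are valid depends on the codeset. The function name parse_udc_table is hypothetical.

```python
def parse_udc_table(text):
    """Return a list of ((from_start, from_end), (to_start, to_end)) tuples."""

    def parse_field(field):
        start, _, end = field.partition("-")
        low = int(start, 16)  # base 16 accepts both "0x" and "0X" prefixes
        high = int(end, 16) if end else low
        if high < low:
            raise ValueError(f"descending range: {field}")
        return low, high

    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # drop comments, split on blanks
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 'fromcode tocode'")
        entries.append((parse_field(fields[0]), parse_field(fields[1])))
    return entries
```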

   EBCDIC-ISO Mapping Table
       Entries in an EBCDIC-ISO mapping table adhere to the following format:

       fromcode       tocode

       Each code is a one-byte hexadecimal number. You can specify a range of character codes as follows:

       start_fromcode-end_fromcode     start_tocode-end_tocode

       When using the range format, the number of hex values in the from range must be the same as the number of hex values in the to range.

       The EBCDIC-ISO mapping table can also include blank lines and comment entries, which begin with the # character.

       Following is an example of an EBCDIC-ISO code mapping table:

       # EBCDIC 	       Kana

       0x40                    0x20            # space
       0x4f                    0x21            # '!'
       0x7f                    0x22            # '"'
         .                       .
         .                       .
         .                       .
       0xc1-0xc9               0x41-0x49       # 'A' - 'I'
       0xd1-0xd9               0x4a-0x52       # 'J' - 'R'
       0xe2-0xe9               0x53-0x5a       # 'S' - 'Z'
         .                       .
         .                       .
         .                       .

       In this example, the first column contains the from codes and the second column contains the to codes. The first three entry lines
       specify mapping for single characters, whereas the last three entry lines specify mapping for ranges of characters. You can find
       additional sample EBCDIC-ISO mapping tables in the /usr/i18n/lib/nls/loc/iconv/data directory.
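
       Because every code in this table is a single byte, ranges can be expanded into a per-byte lookup. The Python sketch below does so,
       assuming that consecutive from codes in a range map to consecutive to codes. The function name parse_byte_table is hypothetical.

```python
def parse_byte_table(text):
    """Return a dict mapping each from byte to its to byte, with ranges expanded."""

    def bounds(field):
        start, _, end = field.partition("-")
        low = int(start, 16)
        high = int(end, 16) if end else low
        if not 0 <= low <= high <= 0xFF:
            raise ValueError(f"not a one-byte code or range: {field}")
        return low, high

    mapping = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # drop comments, split on blanks
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 'fromcode tocode'")
        (from_low, from_high), (to_low, to_high) = bounds(fields[0]), bounds(fields[1])
        if from_high - from_low != to_high - to_low:
            raise ValueError(f"line {number}: range lengths differ")
        for offset in range(from_high - from_low + 1):
            mapping[from_low + offset] = to_low + offset
    return mapping
```

       A run of EBCDIC bytes could then be translated with bytes(mapping[b] for b in data), provided every input byte has an entry.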

NOTES

       This  reference	page contains code conversion specifications that apply only to conversion between IBM Kanji System characters and the DEC
       Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_JEF(5) for code conversion specifications between Fujitsu  JEF
       characters  and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_KEIS(5) for code conversion specifica-
       tions between Hitachi KEIS characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_intro(5) for
       information about conversion between DEC Kanji, Super DEC Kanji, Japanese EUC, Shift JIS, and other Tru64 UNIX codesets.

SEE ALSO

       Commands: iconv(1)

       Functions: iconv(3), iconv_close(3), iconv_open(3)

       Others: deckanji(5), eucJP(5), iconv_intro(5), iconv_JEF(5), iconv_KEIS(5), Japanese(5), sdeckanji(5), SJIS(5)

																 iconv_ibmkanji(5)