Encode::JP(3pm) 					 Perl Programmers Reference Guide					   Encode::JP(3pm)

NAME
       Encode::JP - Japanese Encodings

SYNOPSIS
           use Encode qw/encode decode/;
           $euc_jp = encode("euc-jp", $utf8);   # loads Encode::JP implicitly
           $utf8   = decode("euc-jp", $euc_jp); # ditto

ABSTRACT
       This module implements Japanese charset encodings. The supported
       encodings are as follows.

         Canonical     Alias             Description
         --------------------------------------------------------------------
         euc-jp        /euc.*jp$/i       EUC (Extended Unix Character)
                       /jp.*euc/i
                       /ujis$/i
         shiftjis      /shift.*jis$/i    Shift JIS (aka MS Kanji)
                       /sjis$/i
         7bit-jis      /jis$/i           7bit JIS
         iso-2022-jp                     ISO-2022-JP [RFC1468]
                                         = 7bit JIS with all Halfwidth Kana
                                           converted to Fullwidth
         iso-2022-jp-1                   ISO-2022-JP-1 [RFC2237]
                                         = ISO-2022-JP with JIS X 0212-1990
                                           support. See below
         MacJapanese                     Shift JIS + Apple vendor mappings
         cp932         /windows-31j$/i   Code Page 932
                                         = Shift JIS + MS/IBM vendor mappings
         jis0201-raw                     JIS0201, raw format
         jis0208-raw                     JIS0208, raw format
         jis0212-raw                     JIS0212, raw format
         --------------------------------------------------------------------

DESCRIPTION
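       The alias patterns listed in the ABSTRACT can be checked with
       find_encoding() from Encode, which returns an encoding object whose
       name() method reports the canonical name. The following is a minimal
       sketch; the alias spellings and the %canonical hash are illustrative
       choices, not part of the module.

```perl
use strict;
use warnings;
use Encode qw/find_encoding/;

# Illustrative spellings picked to match the alias regexes in the ABSTRACT.
my @aliases = qw/EUC-JP ujis Shift_JIS sjis jis windows-31j/;

my %canonical;
for my $alias (@aliases) {
    # find_encoding() returns undef when the name cannot be resolved.
    my $enc = find_encoding($alias);
    $canonical{$alias} = $enc ? $enc->name : undef;
    printf "%-12s => %s\n", $alias, $canonical{$alias} // '(not found)';
}
```

       Resolving an alias this way also loads Encode::JP on demand, so no
       explicit "use Encode::JP" is needed.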
       To find out how to use this module in detail, see Encode.

   Note on ISO-2022-JP(-1)?
       ISO-2022-JP-1 (RFC2237) is a superset of ISO-2022-JP (RFC1468) which
       adds support for JIS X 0212-1990. That means you can use the same code
       to decode to utf8 but not vice versa.

           $utf8 = decode('iso-2022-jp-1', $stream);

       and

           $utf8 = decode('iso-2022-jp', $stream);

       yield the same result but

           $with_0212 = encode('iso-2022-jp-1', $utf8);

       is now different from

           $without_0212 = encode('iso-2022-jp', $utf8 );

       In the latter case, characters that map to 0212 are first converted to
       U+3013 (0xA2AE in EUC-JP; a white square also known as 'Tofu' or 'geta
       mark') then fed to the decoding engine. U+FFFD is not used, in order to
       preserve text layout as much as possible.

BUGS
       The ASCII region (0x00-0x7f) is preserved for all encodings, even
       though this conflicts with mappings by the Unicode Consortium.

SEE ALSO
       Encode

perl v5.18.2                      2013-11-04                   Encode::JP(3pm)
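
       The ISO-2022-JP(-1) note in the DESCRIPTION of Encode::JP can be
       explored with a short script. This is a sketch under one assumption:
       U+4E02 is taken to be a character that JIS X 0212 covers and JIS X 0208
       does not. The hex dumps are printed rather than asserted, since the
       exact substitute bytes may depend on the installed Encode version.

```perl
use strict;
use warnings;
use Encode qw/encode decode/;

# Assumption: U+4E02 maps to JIS X 0212 only (no JIS X 0208 mapping).
my $text = "\x{4E02}";

my $with_0212    = encode('iso-2022-jp-1', $text);
my $without_0212 = encode('iso-2022-jp',   $text);

# Hex dumps make the escape sequences and any substitution visible.
printf "iso-2022-jp-1: %s\n", unpack('H*', $with_0212);
printf "iso-2022-jp  : %s\n", unpack('H*', $without_0212);

# A stream with JIS X 0208 content only should decode the same either way.
my $stream = encode('iso-2022-jp', "\x{65E5}\x{672C}");
my $same   = decode('iso-2022-jp-1', $stream) eq decode('iso-2022-jp', $stream);

# With JIS X 0212 support the character is expected to survive a round trip.
my $roundtrip_ok = decode('iso-2022-jp-1', $with_0212) eq $text;

print "decoders agree: ", ($same ? "yes" : "no"), "\n";
print "round trip ok : ", ($roundtrip_ok ? "yes" : "no"), "\n";
```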

JIS2K(3)						User Contributed Perl Documentation						  JIS2K(3)

NAME
       Encode::JIS2K - JIS X 0213 (aka JIS 2000) Encodings

SYNOPSIS
           use Encode::JIS2K;
           use Encode qw/encode decode/;
           $euc_2k = encode("euc-jisx0213", $utf8);
           $utf8   = decode("euc-jisx0213", $euc_2k);

ABSTRACT
       This module implements encodings that cover the JIS X 0213 charset
       (AKA JIS 2000, hence the module name). The supported encodings are as
       follows.

         Canonical      Alias                                Description
         --------------------------------------------------------------------
         euc-jisx0213   qr/euc.*jp[ -]?(?:2000|2k)$/i        EUC-JISX0213
                        qr/jp.*euc[ -]?(2000|2k)$/i
                        qr/ujis[ -]?(?:2000|2k)$/i
         shiftjisx0213  qr/shift.*jis(?:2000|2k)$/i          Shift_JISX0213
                        qr/sjis[ -]?(?:2000|2k)$/i
         iso-2022-jp-3
         jis0213-1-raw                     JIS X 0213 plane 1, raw format
         jis0213-2-raw                     JIS X 0213 plane 2, raw format
         --------------------------------------------------------------------

DESCRIPTION
       To find out how to use this module in detail, see Encode.

   What is JIS X 0213 anyway?
       Simply put, JIS X 0213 is a rework and reorganization of JIS X 0208
       and JIS X 0212. It consists of two 94x94 planes which roughly
       correspond as follows:

           JIS X 0213 Plane 1 = JIS X 0208 + extension
           JIS X 0213 Plane 2 = JIS X 0212 reorganized + extension

       And here is the character repertoire thereof at a glance.

                               # of codepoints   Kuten Ku (rows) used
           --------------------------------------------------------
           JIS X 0208               6,879        1..8,16..83
           JIS X 0213-1             8,762        1..94 (all!)
           JIS X 0212               6,067        2,6..7,9..11,16..77
           JIS X 0213-2             2,436        1,3..5,8,12..15,78..94
           --------------------------------------------------------
           (JIS X0213 Total)       11,197

       JIS X 0213 was designed to extend JIS X 0208 and JIS X 0212 without
       being incompatible with (classic) EUC-JP and Shift_JIS. The following
       characteristics are a result thereof.

       o   JIS X 0213 plane 1 is (almost) a superset of JIS X 0208. However,
           with Unicode 3.2.0 the mappings differ in 3 codepoints.

             Kuten    JIS X 0208 -> Unicode          JIS X 0213 -> Unicode
             --------------------------------------------------------------
             1-1-17   <UFFE3> # FULLWIDTH MACRON     <U203E> # OVERLINE
             1-1-29   <U2014> # EM DASH              <U2015> # HORIZONTAL BAR
             1-1-79   <UFFE5> # FULLWIDTH YEN SIGN   <U00A5> # YEN SIGN

       o   By the same token, JIS X 0213 plane 2 contains JIS Dai-4 Suijun
           Kanji (JIS Kanji Repertoire Level 4). This allows EUC-JP's G3 to
           contain both JIS X 0212 and JIS X 0213 plane 2. However, JIS X
           0212:1990 already contains many of the Dai-4 Suijun Kanji, so
           EUC's G3 is subject to containing duplicate mappings.

       o   Because of Halfwidth Katakana, Shift_JIS mapping has always been
           tricky, and it is now even trickier. Here is a regex that matches
           a Shift_JISX0213 sequence (note: you have to "use bytes" to make
           it work!)

             $re_valid_shifjisx0213 =
               qr/^(?:
                   [\x00-\x7f] |                            # ASCII or
                   [\xa1-\xdf] |                            # JIS X 0201 KANA or
                   [\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc] # JIS X 0213
                 )+$/xo;

   Note on EUC-JISX0213 (vs. EUC-JP)
       As of Encode-1.64, 'euc-jp' does support euc-jisx0213 for decoding.
       However, 'euc-jp' in Encode and 'euc-jisx0213' differ as follows:

                             euc-jp                     euc-jisx0213
         --------------------------------------------------------------
         Decodes....         (0201-K|0208|0212|0213)    ditto
         Round-Trip (|0)     JIS X (0201-K|0208|0212)   JIS X (0201-K|0213)
         Decode Only (|3)    those only found in 0213   those only found in 0212
         --------------------------------------------------------------

AUTHORS
       Dan Kogai <dankogai@dan.co.jp>

COPYRIGHT
       Copyright 2002 by Dan Kogai <dankogai@dan.co.jp>.

       This program is free software; you can redistribute it and/or modify
       it under the same terms as Perl itself.

       See <http://www.perl.com/perl/misc/Artistic.html>

SEE ALSO
       Encode, Encode::JP

       Japanese Graphic Character Set for Information Interchange -- Plane 1
           <http://www.itscj.ipsj.or.jp/ISO-IR/228.pdf>

       Japanese Graphic Character Set for Information Interchange -- Plane 2
           <http://www.itscj.ipsj.or.jp/ISO-IR/229.pdf>

perl v5.12.1                      2005-05-12                          JIS2K(3)
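
       The Shift_JISX0213 byte pattern given in the DESCRIPTION of
       Encode::JIS2K can be exercised without the module itself. This is a
       minimal sketch that assumes its input is a raw octet string (not a
       decoded character string), which is the situation the "use bytes"
       remark is concerned with. The helper name looks_like_shiftjisx0213 is
       hypothetical. A match only means the byte ranges are plausible, not
       that every sequence is an assigned character.

```perl
use strict;
use warnings;

# Byte-level pattern from the DESCRIPTION, with the \x escapes written out.
my $re_valid_shiftjisx0213 = qr/^(?:
    [\x00-\x7f]                              | # ASCII
    [\xa1-\xdf]                              | # JIS X 0201 KANA
    [\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc]   # two-byte JIS X 0213
)+$/x;

# Hypothetical helper: returns 1 if the octets fit the pattern, else 0.
sub looks_like_shiftjisx0213 {
    my ($octets) = @_;
    return $octets =~ $re_valid_shiftjisx0213 ? 1 : 0;
}

# Sample inputs (labels are illustrative).
my %samples = (
    'ascii'          => "abc",
    'two-byte'       => "\x82\xa0",
    'halfwidth kana' => "\xb1",
    'lone lead byte' => "\x82",
    'bad trail byte' => "\x82\x20",
);

for my $label (sort keys %samples) {
    printf "%-15s %s\n", $label,
        looks_like_shiftjisx0213($samples{$label}) ? "plausible" : "rejected";
}
```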