euc(5) [mojave man page]

EUC(5)							      BSD File Formats Manual							    EUC(5)

NAME
euc -- EUC encoding of wide characters

SYNOPSIS
ENCODING "EUC"
VARIABLE len1 mask1 len2 mask2 len3 mask3 len4 mask4 mask

DESCRIPTION
EUC implements a system of 4 multibyte codesets. A multibyte character in the first codeset consists of len1 bytes starting with a byte in the range 0x00 to 0x7f. To allow use of ASCII, len1 is always 1. A multibyte character in the second codeset consists of len2 bytes starting with a byte in the range 0x80 to 0xff, excluding 0x8e and 0x8f. A multibyte character in the third codeset consists of len3 bytes starting with the byte 0x8e. A multibyte character in the fourth codeset consists of len4 bytes starting with the byte 0x8f.

The wchar_t encoding of EUC multibyte characters depends on the len and mask arguments. First, the bytes are moved into a wchar_t as follows:

      byte[0] << ((lenN-1) * 8) | byte[1] << ((lenN-2) * 8) | ... | byte[lenN-1]

The result is then ANDed with ~mask and ORed with maskN. The third and fourth codesets are special in that the leading byte (0x8e or 0x8f) is first removed and the lenN argument is reduced by 1.

For example, the ja_JP.eucJP locale has the following VARIABLE line:

      VARIABLE 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080

Codeset 1 consists of the values 0x0000 - 0x007f.

Codeset 2 consists of the values that have both bits of 0x8080 set.

Codeset 3 consists of the values 0x0080 - 0x00ff.

Codeset 4 consists of the values 0x8000 - 0xff7f, excluding the values that have the 0x0080 bit set.

Note that the global mask is set to 0x8080; this implies that the codeset of a wide character can be determined from those 2 bits alone.

SEE ALSO
mklocale(1), setlocale(3)

BSD								November 8, 2003							       BSD
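
The packing rule in the euc(5) DESCRIPTION can be sketched in C as follows. This is only an illustration of the text, not the C library's implementation: struct euc_params, eucjp, euc_codeset(), euc_to_wide() and euc_wide_codeset() are hypothetical names, uint32_t stands in for wchar_t, and the parameter values are those of the ja_JP.eucJP VARIABLE line quoted in that page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for one VARIABLE line; not a libc type. */
struct euc_params {
    int      len[4];       /* len1..len4, leading 0x8e/0x8f included */
    uint32_t set_mask[4];  /* mask1..mask4 */
    uint32_t mask;         /* global mask */
};

/* Values from the ja_JP.eucJP example in euc(5). */
static const struct euc_params eucjp = {
    {1, 2, 2, 3}, {0x0000, 0x8080, 0x0080, 0x8000}, 0x8080
};

/* Map a leading byte to an index 0..3, i.e. codesets 1..4 of euc(5). */
static int euc_codeset(unsigned char first)
{
    if (first < 0x80)  return 0;
    if (first == 0x8e) return 2;
    if (first == 0x8f) return 3;
    return 1;
}

/*
 * Pack one multibyte character from s[0..n) into *out.
 * Returns the number of bytes consumed, or 0 if the input is too short.
 */
static size_t euc_to_wide(const struct euc_params *p,
                          const unsigned char *s, size_t n, uint32_t *out)
{
    assert(p != NULL && s != NULL && out != NULL);
    if (n == 0)
        return 0;

    int cs = euc_codeset(s[0]);
    size_t len = (size_t)p->len[cs];
    if (n < len)
        return 0;

    /* Codesets 3 and 4: drop the leading 0x8e/0x8f before packing. */
    size_t skip = (cs >= 2) ? 1 : 0;
    uint32_t w = 0;
    for (size_t i = skip; i < len; i++)
        w = (w << 8) | s[i];        /* big-endian byte packing */

    *out = (w & ~p->mask) | p->set_mask[cs];
    return len;
}

/* Recover the codeset (1..4) of a wide value from the global mask bits. */
static int euc_wide_codeset(const struct euc_params *p, uint32_t w)
{
    for (int i = 0; i < 4; i++)
        if ((w & p->mask) == p->set_mask[i])
            return i + 1;
    return 0;                       /* no codeset matches */
}
```

With these parameters the bytes 0xa4 0xa2 pack to 0xa4a2 (codeset 2), 0x8e 0xb1 packs to 0x00b1 (codeset 3), and 0x8f 0xb0 0xa1 packs to 0xb021 (codeset 4).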


cset(3C)						   Standard C Library Functions 						  cset(3C)

NAME
cset, csetlen, csetcol, csetno, wcsetno - get information on EUC codesets

SYNOPSIS
#include <euc.h>

int csetlen(int codeset);

int csetcol(int codeset);

int csetno(unsigned char c);

#include <widec.h>

int wcsetno(wchar_t pc);

DESCRIPTION
Both csetlen() and csetcol() take a codeset number codeset, which must be 0, 1, 2, or 3. The csetlen() function returns the number of bytes needed to represent a character of the given Extended Unix Code (EUC) codeset, excluding the single-shift characters SS2 and SS3 for codesets 2 and 3. The csetcol() function returns the number of columns a character in the given EUC codeset would take on the display.

The csetno() function is implemented as a macro that returns a codeset number (0, 1, 2, or 3) for the EUC character whose first byte is c. For example,

      #include <euc.h>
      ...
      x += csetcol(csetno(c));

increments a counter "x" (such as the cursor position) by the width of the character whose first byte is c.

The wcsetno() function is implemented as a macro that returns a codeset number (0, 1, 2, or 3) for the given process code character pc. For example,

      #include <euc.h>
      #include <widec.h>
      ...
      x += csetcol(wcsetno(pc));

increments a counter "x" (such as the cursor position) by the width of the process code character pc.

USAGE
These functions work only for the EUC locales.

The cset(), csetlen(), csetcol(), csetno(), and wcsetno() functions can be used safely in multithreaded applications, as long as setlocale(3C) is not being called to change the locale.

ATTRIBUTES
See attributes(5) for descriptions of the following attributes:

+-----------------------------+-----------------------------+
|       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
+-----------------------------+-----------------------------+
|MT-Level                     |MT-Safe with exceptions      |
+-----------------------------+-----------------------------+

SEE ALSO
setlocale(3C), euclen(3C), attributes(5)

SunOS 5.11							   16 Nov 2003							  cset(3C)
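
Because <euc.h> is specific to SunOS, the column-counting idiom from the cset(3C) DESCRIPTION can be tried elsewhere with stand-ins. The sketch below is a minimal illustration under stated assumptions: demo_csetno() and demo_width() are hypothetical names, not the library's macros, and the demo_len and demo_col tables hold assumed values for a Japanese EUC locale rather than values read from a real locale.

```c
#include <assert.h>
#include <stddef.h>

#define SS2 0x8e   /* single-shift 2: introduces codeset 2 */
#define SS3 0x8f   /* single-shift 3: introduces codeset 3 */

/* Assumed per-codeset byte lengths (excluding SS2/SS3) and widths. */
static const int demo_len[4] = {1, 2, 1, 2};
static const int demo_col[4] = {1, 2, 1, 2};

/* Stand-in for csetno(): classify a character by its first byte. */
static int demo_csetno(unsigned char c)
{
    if (c < 0x80) return 0;
    if (c == SS2) return 2;
    if (c == SS3) return 3;
    return 1;
}

/*
 * Sum the display columns of the EUC string s[0..n).
 * Returns -1 if the last character is truncated.
 */
static int demo_width(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    int x = 0;
    size_t i = 0;
    while (i < n) {
        int cs = demo_csetno(s[i]);
        /* The length tables exclude the single-shift byte, so add it back. */
        size_t bytes = (size_t)demo_len[cs] + (cs >= 2 ? 1 : 0);
        if (n - i < bytes)
            return -1;
        x += demo_col[cs];          /* same idea as x += csetcol(csetno(c)) */
        i += bytes;
    }
    return x;
}
```

On a system that provides <euc.h>, the two tables would be replaced by calls to csetlen() and csetcol(), and demo_csetno() by csetno().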