utf(6) [plan9 man page]

UTF(6)								   Games Manual 							    UTF(6)

NAME

       UTF, Unicode, ASCII, rune - character set and format

DESCRIPTION

       The Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character
       Set Transformation Format, 8 bits wide).  The Unicode Standard represents its characters in 16 bits; UTF-8 represents such values in an
       8-bit byte stream.  Throughout this manual, UTF-8 is shortened to UTF.

       In Plan 9, a rune is a 16-bit quantity representing a Unicode character.  Internally, programs may store characters as runes.  However, any
       external manifestation of textual information, in files or at the interface between programs, uses a machine-independent, byte-stream
       encoding called UTF.

       UTF is designed so that the characters of the 7-bit ASCII set (values hexadecimal 00 to 7F) appear only as themselves in the encoding.
       Runes with values above 7F appear as sequences of two or more bytes with values only from 80 to FF.
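       Because every byte of a multibyte sequence has its high bit set, a byte below 80 can only be an ASCII character, and the leading byte of a
       sequence can be told apart from the bytes that follow it.  The C sketch below classifies a single byte by its bit pattern; the names
       utfbyteclass and enum utfclass are hypothetical and are not part of Plan 9's libraries.

```c
/* Hypothetical byte classes for a UTF stream, following the encoding
 * table on this page (1-, 2- and 3-byte sequences only). */
enum utfclass { UTF_ASCII, UTF_CONT, UTF_LEAD2, UTF_LEAD3, UTF_BAD };

/* Classifies one byte of a UTF stream by its high-order bits. */
enum utfclass utfbyteclass(unsigned char b)
{
	if (b < 0x80)
		return UTF_ASCII;	/* 0bbbbbbb: the byte is its own character */
	if ((b & 0xC0) == 0x80)
		return UTF_CONT;	/* 10bbbbbb: only ever follows a leading byte */
	if ((b & 0xE0) == 0xC0)
		return UTF_LEAD2;	/* 110bbbbb: starts a two-byte sequence */
	if ((b & 0xF0) == 0xE0)
		return UTF_LEAD3;	/* 1110bbbb: starts a three-byte sequence */
	return UTF_BAD;		/* 1111xxxx: longer forms, unsupported here */
}
```

       A consequence is that a program searching UTF text for an ASCII byte such as '/' or newline can scan bytewise without decoding.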

       The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even if not
       written to deal with UTF, as do programs that deal with uninterpreted byte streams.  However, programs that perform semantic processing on
       ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input.  See rune(2).
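       As a small illustration, a byte count is no longer a character count once non-ASCII input appears.  rune(2) documents the library
       routines Plan 9 provides for such work; the standalone C sketch below (count_runes is a hypothetical name) counts the characters in a
       well-formed UTF string by counting the bytes that are not continuation bytes (10bbbbbb).

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters (runes) in a NUL-terminated
 * UTF string.  Assumes well-formed input, in which each character has
 * exactly one byte not of the form 10bbbbbb. */
size_t count_runes(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;	/* a leading or ASCII byte starts a new character */
	return n;
}
```

       For the string "h\xC3\xA9llo" (hello with an accented e), strlen reports 6 bytes while count_runes reports 5 characters.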

       Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:

       01.   x in [00000000.0bbbbbbb] -> 0bbbbbbb
       10.   x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
       11.   x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb

       Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way.  Conversions 10 and 11 represent
       higher-valued characters as sequences of two or three bytes with the high bit set.  Plan 9 does not support the 4-, 5-, and 6-byte
       sequences proposed by X/Open.  When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.
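       The three conversions above translate directly into shifts and masks.  A minimal C sketch, assuming a 16-bit rune as this page describes;
       Rune16 and rune_to_utf are hypothetical names, and Plan 9's library equivalent, runetochar, is documented in rune(2).

```c
#include <stdint.h>

typedef uint16_t Rune16;	/* hypothetical: a 16-bit rune as described on this page */

/* Encodes rune r into buf (at least 3 bytes) using conversions 01, 10
 * and 11; returns the number of bytes written.  The shortest form is
 * always chosen, so rune 0 becomes the single byte 00. */
int rune_to_utf(unsigned char *buf, Rune16 r)
{
	if (r <= 0x7F) {		/* 01: 0bbbbbbb */
		buf[0] = (unsigned char)r;
		return 1;
	}
	if (r <= 0x7FF) {		/* 10: 110bbbbb 10bbbbbb */
		buf[0] = (unsigned char)(0xC0 | (r >> 6));
		buf[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	/* 11: 1110bbbb 10bbbbbb 10bbbbbb */
	buf[0] = (unsigned char)(0xE0 | (r >> 12));
	buf[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	buf[2] = (unsigned char)(0x80 | (r & 0x3F));
	return 3;
}
```

       For example, rune 00E9 encodes as C3 A9 and rune 2200 as E2 88 80.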

       In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
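       A matching decoder sketch under the same assumptions (Rune16, RUNE_ERROR and utf_to_rune are hypothetical names; Plan 9's library decoder
       is chartorune, documented in rune(2)).  It treats non-shortest forms such as C0 80 as incorrect, consistent with the shortest-encoding
       rule above, and consumes a single byte for each incorrect sequence so that decoding can resynchronize; both are choices of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Rune16;	/* hypothetical 16-bit rune type */
#define RUNE_ERROR 0x0080	/* value this page assigns to incorrect sequences */

/* Decodes one rune from s, which holds n >= 1 bytes, into *r and returns
 * the number of bytes consumed. */
size_t utf_to_rune(Rune16 *r, const unsigned char *s, size_t n)
{
	unsigned c = s[0];
	unsigned v;

	if (c < 0x80) {			/* 01: 0bbbbbbb */
		*r = (Rune16)c;
		return 1;
	}
	if ((c & 0xE0) == 0xC0 && n >= 2 && (s[1] & 0xC0) == 0x80) {
		v = ((c & 0x1F) << 6) | (s[1] & 0x3F);
		if (v >= 0x80) {	/* 10: reject non-shortest forms */
			*r = (Rune16)v;
			return 2;
		}
	} else if ((c & 0xF0) == 0xE0 && n >= 3 &&
	    (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
		v = ((c & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		if (v >= 0x800) {	/* 11: reject non-shortest forms */
			*r = (Rune16)v;
			return 3;
		}
	}
	*r = RUNE_ERROR;		/* any other sequence is incorrect */
	return 1;
}
```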

FILES

       /lib/unicode
	      table of characters and descriptions, suitable for look(1).

SEE ALSO

       ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.

																	    UTF(6)


ASCII(1)						      General Commands Manual							  ASCII(1)

NAME

       ascii, unicode - interpret ASCII, Unicode characters

SYNOPSIS

       ascii [ -8 ] [ -oxdbn ] [ -nct ] [ text ]

       unicode [ -nt ] hexmin-hexmax

       unicode [ -t ] hex [ ...  ]

       unicode [ -n ] characters

       look hex /lib/unicode

DESCRIPTION

       Ascii prints the ASCII values corresponding to characters and vice versa; under the -8 option, the ISO Latin-1 extensions (codes 0200-0377)
       are included.  The values are interpreted in a settable numeric base; -o specifies octal, -d decimal, -x hexadecimal (the default), and -bn
       base n.
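       The base conversion behind -o, -d, -x and -bn can be sketched as repeated division.  This is a hypothetical helper, not taken from
       /sys/src/cmd/ascii.c, and it restricts n to 2 through 16 by assumption.

```c
#include <stddef.h>

/* Hypothetical helper: writes v in base n (2 <= n <= 16, caller's
 * responsibility) into buf as a NUL-terminated string and returns buf.
 * 33 bytes is enough for a 32-bit value in base 2 plus the NUL. */
char *format_base(char *buf, unsigned v, unsigned n)
{
	static const char digits[] = "0123456789abcdef";
	char tmp[33];
	size_t i = 0, j;

	do {
		tmp[i++] = digits[v % n];	/* least significant digit first */
		v /= n;
	} while (v != 0);
	for (j = 0; j < i; j++)		/* reverse into the caller's buffer */
		buf[j] = tmp[i - 1 - j];
	buf[i] = '\0';
	return buf;
}
```

       For example, 'A' (decimal 65) formats as 101 in base 8, 65 in base 10, 41 in base 16 and 1000001 in base 2.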

       With no arguments, ascii prints a table of the character set in the specified base.  Characters of text are converted to their ASCII
       values, one per line.  If, however, the first text argument is a valid number in the specified base, conversion goes the opposite way.
       Control characters are printed as two- or three-character mnemonics.  Other options are:

       -n     Force numeric output.

       -c     Force character output.

       -t     Convert from numbers to running text; do not interpret control characters or insert newlines.

       Unicode is similar; it converts between UTF and character values from the Unicode Standard (see utf(6)).  If given a range of hexadecimal
       numbers, unicode prints a table of the specified Unicode characters: their values and UTF representations.  Otherwise it translates from
       UTF to numeric value or vice versa, depending on the appearance of the supplied text; the -n option forces numeric output to avoid
       ambiguity with numeric characters.  If converting to UTF, the characters are printed one per line unless the -t flag is set, in which
       case the output is a single string containing only the specified characters.  Unlike ascii, unicode treats no characters specially.

       The output of ascii and unicode may be unhelpful if the characters printed are not available in the current font.

       The  file /lib/unicode contains a table of characters and descriptions, sorted in hexadecimal order, suitable for look(1) on the lower case
       hex values of characters.
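       Because look(1) matches lines that begin with the given string, a search key is just the lowercase hex value of the character.  The sketch
       below assumes, as the look 039 example suggests, that each /lib/unicode entry begins with a zero-padded four-digit value; unicode_key is
       a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: formats a 16-bit rune value as the lowercase,
 * zero-padded four-digit hex key assumed to begin its /lib/unicode
 * line.  key must hold at least 5 bytes (4 digits plus NUL). */
void unicode_key(char key[5], unsigned r)
{
	snprintf(key, 5, "%04x", r & 0xFFFF);
}
```

       Rune 0391 gives the key 0391; dropping the last digit, as in look 039, matches every entry from 0390 through 039f.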

EXAMPLES

       ascii -d
	      Print the ASCII table base 10.

       unicode p
	      Print the hex value of `p'.

       unicode 2200-22f1
	      Print a table of miscellaneous mathematical symbols.

       look 039 /lib/unicode
	      See the start of the Greek alphabet's encoding in the Unicode Standard.

FILES

       /lib/unicode
	      table of characters and descriptions.

SOURCE

       /sys/src/cmd/ascii.c
       /sys/src/cmd/unicode.c

SEE ALSO

       look(1), tcs(1), utf(6), font(6).

																	  ASCII(1)
