TCS(1) General Commands Manual TCS(1)
NAME
tcs - translate character sets
SYNOPSIS
tcs [ -slcv ] [ -f ics ] [ -t ocs ] [ file ... ]
DESCRIPTION
Tcs interprets the named file(s) (standard input default) as a stream of characters from the ics character set or format, converts them to
runes, and then converts them into a stream of characters from the ocs character set or format on the standard output. The default value
for ics and ocs is utf, the UTF encoding described in utf(6). The -l option lists the character sets known to tcs. Processing continues
in the face of conversion errors (the -s option prevents reporting of these errors). The -c option forces the output to contain only
correctly converted characters; otherwise, 0x80 characters will be substituted for UTF encoding errors and 0xFFFD characters will be
substituted for unknown characters.
The -v option generates various diagnostic and summary information on standard error, or makes the -l output more verbose.
Tcs recognizes an ever-changing list of character sets. In particular, it supports a variety of Russian and Japanese encodings. Some of
the supported encodings are
utf The Plan 9 UTF encoding, known by ISO as UTF-8
utf1 The deprecated original UTF encoding from ISO 10646
ascii 7-bit ASCII
8859-1 Latin-1 (Western European)
8859-2 Latin-2 (Czech .. Slovak)
8859-3 Latin-3 (Dutch .. Turkish)
8859-4 Latin-4 (Scandinavian)
8859-5 Part 5 (Cyrillic)
8859-6 Part 6 (Arabic)
8859-7 Part 7 (Greek)
8859-8 Part 8 (Hebrew)
8859-9 Latin-5 (Finnish .. Portuguese)
koi8 KOI-8 (GOST 19769-74)
jis-kanji ISO 2022-JP
ujis EUC-JX: JIS 0208
ms-kanji Microsoft, or Shift-JIS
jis (from only) guesses between ISO 2022-JP, EUC or Shift-JIS
gb Chinese national standard (GB2312-80)
big5 Big 5 (HKU version)
unicode Unicode Standard 1.0
tis Thai character set plus ASCII (TIS 620-1986)
msdos IBM PC: CP 437
atari Atari-ST character set
EXAMPLES
tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into UTF format.
tcs -s -f jis
Convert characters encoded in one of several JIS encodings into UTF format. Unknown Kanji will be converted into 0xFFFD characters.
tcs -lv
Print an up-to-date list of the supported character sets.
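The first example above maps each 8859-1 byte to the rune with the same value and then encodes that rune as UTF. A minimal C sketch of that mapping follows. The function latin1_to_utf8 is a hypothetical helper written for illustration; it is not taken from /sys/src/cmd/tcs and does no error reporting.

```c
#include <stddef.h>

/*
 * Convert n bytes of ISO 8859-1 text in src to UTF-8 in dst.
 * Returns the number of bytes written, or (size_t)-1 if dst
 * (capacity cap) is too small.  Hypothetical helper, not part of tcs.
 */
size_t
latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst, size_t cap)
{
	size_t i, o = 0;

	for (i = 0; i < n; i++) {
		unsigned int r = src[i];	/* a Latin-1 byte equals its code point */

		if (r < 0x80) {
			/* 7-bit ASCII is copied unchanged. */
			if (o + 1 > cap)
				return (size_t)-1;
			dst[o++] = (unsigned char)r;
		} else {
			/* 0x80..0xFF becomes a two-byte UTF-8 sequence. */
			if (o + 2 > cap)
				return (size_t)-1;
			dst[o++] = (unsigned char)(0xC0 | (r >> 6));
			dst[o++] = (unsigned char)(0x80 | (r & 0x3F));
		}
	}
	return o;
}
```

Because every 8859-1 byte is a valid rune, this direction cannot produce an unknown character; the reverse direction can.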
SOURCE
/sys/src/cmd/tcs
SEE ALSO
ascii(1), rune(2), utf(6).
kiconv_open(9F) Kernel Functions for Drivers kiconv_open(9F)
NAME
kiconv_open - code conversion descriptor allocation function
SYNOPSIS
#include <sys/sunddi.h>
kiconv_t kiconv_open(const char *tocode, const char *fromcode);
INTERFACE LEVEL
Solaris DDI specific (Solaris DDI).
PARAMETERS
tocode Points to a target codeset name string.
fromcode Points to a source codeset name string.
DESCRIPTION
The kiconv_open() function returns a code conversion descriptor that describes a conversion from the codeset specified by fromcode to the
codeset specified by tocode. For state-dependent encodings, the conversion descriptor is in a codeset-dependent initial state (ready for
immediate use with the kiconv() function).
Supported code conversions are between UTF-8 and the following:
Name Description
Big5 Traditional Chinese Big5
Big5-HKSCS Traditional Chinese Big5-Hong Kong Supplementary Character Set
CP720 DOS Arabic
CP737 DOS Greek
CP850 DOS Latin-1 (Western European)
CP852 DOS Latin-2 (Eastern European)
CP857 DOS Latin-5 (Turkish)
CP862 DOS Hebrew
CP866 DOS Cyrillic Russian
CP932 Japanese Shift JIS (Windows)
CP950-HKSCS Traditional Chinese HKSCS-2001 (Windows)
CP1250 Central Europe
CP1251 Cyrillic
CP1252 Western Europe
CP1253 Greek
CP1254 Turkish
CP1255 Hebrew
CP1256 Arabic
CP1257 Baltic
EUC-CN Simplified Chinese EUC
EUC-JP Japanese EUC
EUC-JP-MS Japanese EUC MS
EUC-KR Korean EUC
EUC-TW Traditional Chinese EUC
GB18030 Simplified Chinese GB18030
GBK Simplified Chinese GBK
ISO-8859-1 Latin-1 (Western European)
ISO-8859-2 Latin-2 (Eastern European)
ISO-8859-3 Latin-3 (Southern European)
ISO-8859-4 Latin-4 (Northern European)
ISO-8859-5 Cyrillic
ISO-8859-6 Arabic
ISO-8859-7 Greek
ISO-8859-8 Hebrew
ISO-8859-9 Latin-5 (Turkish)
ISO-8859-10 Latin-6 (Nordic)
ISO-8859-13 Latin-7 (Baltic)
ISO-8859-15 Latin-9 (Western European with euro sign)
KOI8-R Cyrillic
Shift_JIS Japanese Shift JIS (JIS)
TIS_620 Thai (a.k.a. ISO 8859-11)
Unified-Hangul Korean Unified Hangul
UTF-8 and the above names can be used as tocode and fromcode to specify the desired code conversion. The following aliases are also
supported as alternative names:
Aliases Original Name
720 CP720
737 CP737
850 CP850
852 CP852
857 CP857
862 CP862
866 CP866
932 CP932
936, CP936 GBK
949, CP949 Unified-Hangul
950, CP950 Big5
1250 CP1250
1251 CP1251
1252 CP1252
1253 CP1253
1254 CP1254
1255 CP1255
1256 CP1256
1257 CP1257
ISO-8859-11 TIS_620
PCK, SJIS Shift_JIS
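The alias table above can be read as a simple name lookup. The sketch below is a userland illustration only: canonical_codeset and struct codeset_alias are hypothetical names, the exact-match (case-sensitive) comparison is an assumption, and nothing here claims to reflect how kiconv_open() resolves names internally.

```c
#include <stddef.h>
#include <string.h>

/* One row of the alias table: alias on the left, original name on the right. */
struct codeset_alias {
	const char *alias;
	const char *name;
};

static const struct codeset_alias aliases[] = {
	{ "720", "CP720" }, { "737", "CP737" }, { "850", "CP850" },
	{ "852", "CP852" }, { "857", "CP857" }, { "862", "CP862" },
	{ "866", "CP866" }, { "932", "CP932" },
	{ "936", "GBK" }, { "CP936", "GBK" },
	{ "949", "Unified-Hangul" }, { "CP949", "Unified-Hangul" },
	{ "950", "Big5" }, { "CP950", "Big5" },
	{ "1250", "CP1250" }, { "1251", "CP1251" }, { "1252", "CP1252" },
	{ "1253", "CP1253" }, { "1254", "CP1254" }, { "1255", "CP1255" },
	{ "1256", "CP1256" }, { "1257", "CP1257" },
	{ "ISO-8859-11", "TIS_620" },
	{ "PCK", "Shift_JIS" }, { "SJIS", "Shift_JIS" },
};

/*
 * Return the original name for an alias, or the argument itself
 * when it is not a known alias.  Exact matching is an assumption.
 */
const char *
canonical_codeset(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
		if (strcmp(name, aliases[i].alias) == 0)
			return aliases[i].name;
	return name;
}
```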
A conversion descriptor remains valid until it is closed by using kiconv_close().
RETURN VALUES
Upon successful completion, kiconv_open() returns a code conversion descriptor for use on subsequent calls to kiconv(). Otherwise, if the
conversion specified by fromcode and tocode is not supported, or if the code conversion descriptor cannot be allocated for any other
reason, kiconv_open() returns (kiconv_t)-1 to indicate the error.
CONTEXT
kiconv_open() can be called from user context only.
EXAMPLES
Example 1 Opening a Code Conversion
The following example shows how to open a code conversion from ISO 8859-15 to UTF-8:
#include <sys/sunddi.h>

kiconv_t cd;

cd = kiconv_open("UTF-8", "ISO-8859-15");
if (cd == (kiconv_t)-1) {
        /* Cannot open up the code conversion. */
        return (-1);
}
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+-----------------------------+-----------------------------+
|Interface Stability |Committed |
+-----------------------------+-----------------------------+
SEE ALSO
iconv(3C), iconv_close(3C), iconv_open(3C), u8_strcmp(3C), u8_textprep_str(3C), u8_validate(3C), uconv_u16tou32(3C), uconv_u16tou8(3C),
uconv_u32tou16(3C), uconv_u32tou8(3C), uconv_u8tou16(3C), uconv_u8tou32(3C), attributes(5), kiconv(9F), kiconvstr(9F), kiconv_close(9F),
u8_strcmp(9F), u8_textprep_str(9F), u8_validate(9F), uconv_u16tou32(9F), uconv_u16tou8(9F), uconv_u32tou16(9F), uconv_u32tou8(9F),
uconv_u8tou16(9F), uconv_u8tou32(9F)
The Unicode Standard
http://www.unicode.org/standard/standard.html
NOTES
The code conversions are available between UTF-8 and the above noted codesets. For example, to convert from EUC-JP to Shift_JIS, first
convert EUC-JP to UTF-8 and then convert UTF-8 to Shift_JIS.
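The two-step conversion described above can be sketched with the userland iconv(3C) family listed in SEE ALSO, which is used here only as a testable stand-in for the kernel interface. The names convert_step and convert_via_utf8 are hypothetical, the fixed-size pivot buffer is a simplification, and the codeset names accepted depend on the local iconv implementation.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Run one conversion with userland iconv(3C).
 * Returns the number of bytes written to out, or (size_t)-1 on error.
 */
static size_t
convert_step(const char *tocode, const char *fromcode,
    const char *in, size_t inlen, char *out, size_t outcap)
{
	iconv_t cd;
	char *ip = (char *)in;	/* iconv() takes char **; input is not modified */
	char *op = out;
	size_t ileft = inlen, oleft = outcap;
	size_t rc;

	cd = iconv_open(tocode, fromcode);
	if (cd == (iconv_t)-1)
		return (size_t)-1;
	rc = iconv(cd, &ip, &ileft, &op, &oleft);
	if (rc != (size_t)-1)
		rc = iconv(cd, NULL, NULL, &op, &oleft);	/* flush shift state */
	iconv_close(cd);
	if (rc == (size_t)-1)
		return (size_t)-1;
	return outcap - oleft;
}

/*
 * Convert between two non-UTF-8 codesets by pivoting through UTF-8.
 * The 1024-byte pivot buffer limits the input size in this sketch.
 */
size_t
convert_via_utf8(const char *tocode, const char *fromcode,
    const char *in, size_t inlen, char *out, size_t outcap)
{
	char pivot[1024];
	size_t n;

	n = convert_step("UTF-8", fromcode, in, inlen, pivot, sizeof pivot);
	if (n == (size_t)-1)
		return (size_t)-1;
	return convert_step(tocode, "UTF-8", pivot, n, out, outcap);
}
```

A kernel module would follow the same shape with two descriptors from kiconv_open(), one per step, each released with kiconv_close().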
The code conversions supported are based on simple one-to-one mappings. There is no special treatment or processing done during code
conversions such as case conversion, Unicode Normalization, or mapping between combining or conjoining sequences of UTF-8 and pre-composed
characters in non-UTF-8 codesets.
All supported non-UTF-8 codesets use pre-composed characters only. However, UTF-8 allows combining or conjoining characters too. For this
reason, applying a form of Unicode Normalization to UTF-8 text with u8_textprep_str() before or after doing code conversions might be
necessary.
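To see why normalization can matter, consider a strict one-to-one conversion from UTF-8 to ISO-8859-1. The sketch below is a userland illustration with a hypothetical helper, utf8_to_latin1, and is not the kiconv implementation. It accepts the pre-composed U+00E9 but rejects the equivalent combining sequence U+0065 U+0301, because U+0301 has no ISO-8859-1 counterpart.

```c
#include <stddef.h>

/*
 * Convert UTF-8 to ISO 8859-1 using a strict one-to-one mapping.
 * Returns the number of bytes written, or (size_t)-1 on malformed
 * input, a code point above U+00FF, or a full output buffer.
 * Hypothetical helper for illustration only.
 */
size_t
utf8_to_latin1(const unsigned char *src, size_t n, unsigned char *dst, size_t cap)
{
	size_t i = 0, o = 0;

	while (i < n) {
		unsigned int r;

		if (src[i] < 0x80) {
			r = src[i++];
		} else if ((src[i] == 0xC2 || src[i] == 0xC3) &&
		    i + 1 < n && (src[i + 1] & 0xC0) == 0x80) {
			/* Lead bytes C2 and C3 cover exactly U+0080..U+00FF. */
			r = ((src[i] & 0x1Fu) << 6) | (src[i + 1] & 0x3Fu);
			i += 2;
		} else {
			/* Anything else, including combining marks, has no mapping. */
			return (size_t)-1;
		}
		if (o >= cap)
			return (size_t)-1;
		dst[o++] = (unsigned char)r;
	}
	return o;
}
```

Composing the text first, so that the combining sequence becomes U+00E9, lets the same input convert cleanly.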
SunOS 5.11 16 Oct 2007 kiconv_open(9F)