cz::cstocs(3pm) [debian man page]

Cz::Cstocs(3pm) 					User Contributed Perl Documentation					   Cz::Cstocs(3pm)

NAME
Cz::Cstocs - conversions of charset encodings for the Czech language

SYNOPSIS
    use Cz::Cstocs;
    my $il2_to_ascii = new Cz::Cstocs 'il2', 'ascii';
    while (<>) {
        print &$il2_to_ascii($_);
    }

    use Cz::Cstocs 'il2_ascii';
    while (<>) {
        print il2_ascii($_);
    }

    use Cz::Cstocs;
    sub il2toascii;     # inform the parser that there is a function il2toascii
    *il2toascii = new Cz::Cstocs 'il2', 'ascii';    # now define the function
    print il2toascii $data;     # thanks to Jan Krynicky for pointing this out

DESCRIPTION
This module helps in converting texts between the various charset encodings used for the Czech and Slovak languages. An instance of the Cz::Cstocs object is created using the method new. It takes at least two parameters, the input and the output encoding, and can afterwards be used as a function reference to convert strings or lists. Cz::Cstocs supports a fairly free form of aliases, so iso8859-2, ISO-8859-2, iso88592 and il2 are all aliases of the same encoding.

For backward compatibility, the method conv is supported as well, so the example above could also read

    while (<>) {
        print $il2_to_ascii->conv($_);
    }

You can also use typeglob syntax.

The conversion function takes a list and returns a list of converted strings (in list context) or one string consisting of the concatenated results (in scalar context).

You can modify the behaviour of the conversion function by specifying a hash of other options after the encoding names in the call to new.

fillstring
    Gives an alternate string that will replace characters from the input encoding that are not present in the output encoding. Default is a space.

use_accent
    Defines whether the accent file should be used. Default is 1 (true).

nofillstring
    When 1 (true), characters that have no counterpart in either the accent file or the output encoding are kept and are not replaced with fillstring. Default is 0 except for tex, because you would probably rather keep backslashed symbols than lose them.

cstocsdir
    Alternate location for encoding and accent files. The default is the Cz/Cstocs/enc directory in the Perl library tree. This location can also be changed with the CSTOCSDIR environment variable.

There is an alternate way to define the conversion function: any arguments after use Cz::Cstocs that have the form encoding_encoding or encoding_to_encoding are processed and the appropriate functions are imported. So,

    use Cz::Cstocs qw(pc2_to_il2 il2_ascii);

defines two functions that are loaded into the caller's namespace and can be used directly. In this case you cannot specify additional options; you only get the default behaviour.

ERROR HANDLING
If you request an unknown encoding in the call to new Cz::Cstocs, the conversion object is not defined and the variable $Cz::Cstocs::errstr is set to the error message. When you specify an unknown encoding in the use call style (like "use Cz::Cstocs 'il2_ascii';"), die is called.

AUTHOR
Jan Pazdziora, adelton@fi.muni.cz, created the module version. Jan "Yenya" Kasprzak did the original Un*x implementation.

VERSION
3.4

SEE ALSO
cstocs(1), perl(1), or Xcstocs at http://www.lut.fi/~kurz/programs/xcstocs.tar.gz.

perl v5.10.1 2002-10-17 Cz::Cstocs(3pm)
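
The options and error handling described in the Cz::Cstocs DESCRIPTION and ERROR HANDLING sections might be combined roughly as in the sketch below. It is not taken from the module's documentation. The helper make_converter is a hypothetical name introduced here, and the sketch assumes that the options are passed to new as a flat key => value list after the two encoding names. The module is loaded at run time so that the script still runs, and reports the problem, where Cz::Cstocs is not installed.

```perl
use strict;
use warnings;

# make_converter is a hypothetical helper, not part of Cz::Cstocs.
# It returns (converter, undef) on success or (undef, message) on failure.
sub make_converter {
    my ($from, $to, %options) = @_;

    # Load the module at run time so the sketch degrades gracefully
    # when Cz::Cstocs is not installed.
    eval { require Cz::Cstocs; 1 }
        or return (undef, "Cz::Cstocs is not available: $@");

    # Assumption: options follow the encoding names as key => value pairs.
    my $converter = eval { Cz::Cstocs->new($from, $to, %options) };
    return ($converter, undef) if defined $converter;

    # For an unknown encoding the object is left undefined and the
    # package variable errstr carries the error message.
    no warnings 'once';
    my $message = $Cz::Cstocs::errstr || $@ || 'unknown error';
    return (undef, $message);
}

my ($il2_to_ascii, $error) = make_converter('il2', 'ascii', fillstring => '?');

if (defined $il2_to_ascii) {
    # List context: one converted string per argument.
    my @lines = $il2_to_ascii->("first line\n", "second line\n");
    # Scalar context: the converted strings concatenated into one.
    my $joined = $il2_to_ascii->("first line\n", "second line\n");
    print scalar(@lines), " strings, joined length ", length($joined), "\n";
}
else {
    warn "No converter: $error\n";
}
```

Checking the returned value before use matters only for the new call style; with the import style (use Cz::Cstocs 'il2_ascii';) an unknown encoding ends the program through die.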

Check Out this Related Man Page

UM(3pm) 						User Contributed Perl Documentation						   UM(3pm)

NAME
XML::UM - Convert UTF-8 strings to any encoding supported by XML::Encoding

SYNOPSIS
    use XML::UM;

    # Set directory with .xml files that come with XML::Encoding distribution
    # Always include the trailing slash!
    $XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';

    # Create the encoding routine
    my $encode = XML::UM::get_encode (
        Encoding => 'ISO-8859-2',
        EncodeUnmapped => \&XML::UM::encode_unmapped_dec);

    # Convert a string from UTF-8 to the specified Encoding
    my $encoded_str = $encode->($utf8_str);

    # Remove circular references for garbage collection
    XML::UM::dispose_encoding ('ISO-8859-2');

DESCRIPTION
This module provides methods to convert UTF-8 strings to any XML encoding that XML::Encoding supports. It creates mapping routines from the .xml files that can be found in the maps/ directory of the XML::Encoding distribution. Note that the XML::Encoding distribution does install the .enc files in your perl directory, but not the .xml files they were created from. That's why you have to specify $ENCDIR as in the SYNOPSIS.

This implementation uses the XML::Encoding class to parse the .xml file and creates a hash that maps UTF-8 characters (each consisting of up to 4 bytes) to their equivalent byte sequence in the specified encoding. Note that large mappings may consume a lot of memory!

Future implementations may parse the .enc files directly, or do the conversions entirely in XS (i.e. C code).

get_encode (Encoding => STRING, EncodeUnmapped => SUB)
    The central entry point to this module is the XML::UM::get_encode() method. It forwards the call to the global $XML::UM::FACTORY, which is defined as an instance of XML::UM::SlowMapperFactory by default. Override this variable to plug in your own mapper factory.

    The XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and caches it for subsequent use) that reads in the .xml encoding file and creates a hash that maps UTF-8 characters to encoded characters.

    Finally, the get_encode() method of XML::UM::SlowMapper is called, which generates an anonymous subroutine that uses the hash to convert multi-character UTF-8 blocks to the proper encoding.

dispose_encoding ($encoding_name)
    Call this to free the memory used by the SlowMapper for a specific encoding. Note that in order to free the big conversion hash, the user should no longer hold references to the subroutines generated by get_encode().

The parameters to the get_encode() method (defined as name/value pairs) are:

o   Encoding
    The name of the desired encoding, e.g. 'ISO-8859-2'.

o   EncodeUnmapped (Default: \&XML::UM::encode_unmapped_dec)
    Defines how Unicode characters not found in the mapping file (of the specified encoding) are printed. By default, they are converted to decimal entity references, like '&#123;'. Use \&XML::UM::encode_unmapped_hex for hexadecimal constants, like '&#xAB;'.

CAVEATS
I'm not exactly sure about which Unicode characters in the range (0 .. 127) should be mapped to themselves. See comments in XML/UM.pm near %DEFAULT_ASCII_MAPPINGS.

The encodings that expat supports by default (e.g. UTF-16, ISO-8859-1) are currently not supported, because there are no .enc files available for these encodings.

This module needs some more work. If you have the time, please help!

AUTHOR
Original Author is Enno Derksen.

Send bug reports, hints, tips, suggestions to T.J Mather at <tjmather@tjmather.com>.

perl v5.10.1 2010-01-03 UM(3pm)
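
The mechanism described in the XML::UM DESCRIPTION (a hash lookup wrapped in an anonymous subroutine, with a callback for unmapped characters) can be illustrated with a small self-contained sketch. This is not XML::UM's code and does not call it. The names %LATIN2_SAMPLE, unmapped_dec, unmapped_hex and make_encoder are hypothetical, the table is a three-entry excerpt of ISO-8859-2 typed in by hand, and the sketch works on Perl character strings instead of raw UTF-8 bytes. Passing every character below 128 through unchanged is a simplification.

```perl
use strict;
use warnings;

# Hypothetical excerpt of an ISO-8859-2 mapping: code point => byte value.
my %LATIN2_SAMPLE = (
    0x010D => 0xE8,    # c with caron
    0x0159 => 0xF8,    # r with caron
    0x017E => 0xBE,    # z with caron
);

# Fallbacks in the spirit of encode_unmapped_dec and encode_unmapped_hex.
sub unmapped_dec { return sprintf '&#%d;',  ord $_[0] }
sub unmapped_hex { return sprintf '&#x%X;', ord $_[0] }

# Return an anonymous sub that closes over the mapping and the fallback.
sub make_encoder {
    my ($map, $unmapped) = @_;
    return sub {
        my ($text) = @_;
        return join '', map {
            my $code_point = ord($_);
            $code_point < 0x80            ? $_
          : exists($map->{$code_point})   ? chr($map->{$code_point})
          :                                 $unmapped->($_);
        } split //, $text;
    };
}

my $encode  = make_encoder(\%LATIN2_SAMPLE, \&unmapped_hex);
my $encoded = $encode->("\x{10D}a\x{20AC}");

# Show non-printable bytes as \xNN so the result is readable on any terminal.
(my $shown = $encoded) =~ s/([^\x20-\x7E])/sprintf '\\x%02X', ord $1/ge;
print "$shown\n";    # prints \xE8a&#x20AC;
```

As long as $encode is held, the closure keeps the mapping hash alive, which is the reason the text above asks callers to drop their references to the generated subroutines before expecting dispose_encoding to free memory.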