MARC::Charset(3pm)					User Contributed Perl Documentation					MARC::Charset(3pm)

NAME
       MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS
           # import the marc8_to_utf8 function
           use MARC::Charset 'marc8_to_utf8';

           # prepare STDOUT for utf8
           binmode(STDOUT, ':utf8');

           # print out some marc8 as utf8
           print marc8_to_utf8($marc8_string);

DESCRIPTION
       MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8
       strings. MARC-8 is a single-byte character encoding that predates
       Unicode and allows you to put non-Roman scripts in MARC bibliographic
       records.

       http://www.loc.gov/marc/specifications/spechome.html

EXPORTS
   ignore_errors()
       Tells MARC::Charset whether or not to ignore all encoding errors, and
       returns the current setting. This is helpful if you have records that
       contain both MARC-8 and Unicode characters.

           my $ignore = MARC::Charset->ignore_errors();

           MARC::Charset->ignore_errors(1); # ignore errors
           MARC::Charset->ignore_errors(0); # DO NOT ignore errors

   assume_unicode()
       Tells MARC::Charset whether or not to assume Unicode when an error is
       encountered in ignore_errors mode, and returns the current setting.
       This is helpful if you have records that contain both MARC-8 and
       Unicode characters.

           my $setting = MARC::Charset->assume_unicode();

           MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
           MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode

   assume_encoding()
       Tells MARC::Charset whether or not to assume a specific encoding when
       an error is encountered in ignore_errors mode, and returns the current
       setting. This is helpful if you have records that contain both MARC-8
       and characters in some other encoding.

           my $setting = MARC::Charset->assume_encoding();

           MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
           MARC::Charset->assume_encoding('');      # DO NOT assume any encoding

   marc8_to_utf8()
       Converts a MARC-8 encoded string to UTF-8.

           my $utf8 = marc8_to_utf8($marc8);

       If you'd like to ignore errors, pass in a true value as the second
       parameter or call MARC::Charset->ignore_errors() with a true value:

           my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

       or

           MARC::Charset->ignore_errors(1);
           my $utf8 = marc8_to_utf8($marc8);

   utf8_to_marc8()
       Will attempt to translate UTF-8 into MARC-8.

           my $marc8 = utf8_to_marc8($utf8);

       If you'd like to ignore errors, or characters that can't be converted
       to MARC-8, then pass in a true value as the second parameter:

           my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

       or

           MARC::Charset->ignore_errors(1);
           my $marc8 = utf8_to_marc8($utf8);

DEFAULT CHARACTER SETS
       If you need to alter the default character sets you can set the
       $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to
       the appropriate character set code:

           use MARC::Charset::Constants qw(:all);
           $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
           $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

SEE ALSO
       o   MARC::Charset::Constants

       o   MARC::Charset::Table

       o   MARC::Charset::Code

       o   MARC::Charset::Compiler

       o   MARC::Record

       o   MARC::XML

AUTHOR
       Ed Summers (ehs@pobox.com)

perl v5.12.4							    2011-08-05						MARC::Charset(3pm)
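
The error-handling calls documented for MARC::Charset above can be combined as in the following minimal sketch. It assumes MARC::Charset is installed. The helper name convert_lenient and the sample string are hypothetical and not part of the module. The helper switches ignore_errors on for one conversion and then restores the previous setting, because the switch is set on the class rather than per call.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Emit the converted text as UTF-8 on STDOUT.
binmode(STDOUT, ':utf8');

# Hypothetical helper (not part of MARC::Charset): convert one MARC-8
# string while ignoring encoding errors, then put the class-level
# ignore_errors setting back the way it was.
sub convert_lenient {
    my ($marc8) = @_;

    # With no argument, ignore_errors() only reports the current setting.
    my $previous = MARC::Charset->ignore_errors();

    MARC::Charset->ignore_errors(1);
    my $utf8 = marc8_to_utf8($marc8);

    # Restore the earlier setting so other callers are unaffected.
    MARC::Charset->ignore_errors( $previous ? 1 : 0 );

    return $utf8;
}

# Sample input (an assumption for illustration): plain ASCII letters,
# which the default MARC-8 Basic Latin set covers.
my $marc8 = 'Hello';
print convert_lenient($marc8), "\n";
```

Passing a true second argument to marc8_to_utf8(), as described under marc8_to_utf8() above, has the same effect for a single call without touching the class-level setting, so the helper is mainly useful when the lenient call is buried in code you do not control.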

MARC::File(3pm) 					User Contributed Perl Documentation					   MARC::File(3pm)

NAME
       MARC::File - Base class for files of MARC records

SYNOPSIS
           use MARC::File::USMARC;

           # If you have weird control fields...
           use MARC::Field;
           MARC::Field->allow_controlfield_tags('FMT', 'LDX');

           my $file = MARC::File::USMARC->in( $filename );

           while ( my $marc = $file->next() ) {
               # Do something
           }
           $file->close();
           undef $file;

EXPORT
       None.

METHODS
   in()
       Opens a file for import. Ordinarily you will use "MARC::File::USMARC"
       or "MARC::File::MicroLIF" to do this.

           my $file = MARC::File::USMARC->in( 'file.marc' );

       Returns a "MARC::File" object, or "undef" on failure. If an error
       occurs, the error message is stored in $MARC::File::ERROR.

       Optionally you can also pass in a filehandle, and "MARC::File" will
       "do the right thing".

           use IO::File;

           my $handle = IO::File->new( 'gunzip -c file.marc.gz |' );
           my $file = MARC::File::USMARC->in( $handle );

   next( [&filter_func] )
       Reads the next record from the file handle passed in.

       The $filter_func is a reference to a filtering function. Currently,
       only USMARC records support this. See MARC::File::USMARC's "decode()"
       function for details.

       Returns a MARC::Record reference, or "undef" on error.

   skip()
       Skips over the next record in the file. Same as "next()", without the
       overhead of parsing a record you're going to throw away anyway.

       Returns 1 or undef.

   warnings()
       Similar to the methods in MARC::Record and MARC::Batch, "warnings()"
       returns any warnings that have accumulated while processing this file
       and, as a side effect, clears the warnings buffer.

   close()
       Closes the file, both from the object's point of view, and the actual
       file.

   write()
       Writes a record to the output file. This method must be overridden in
       your subclass.

   decode()
       Decodes a record into a USMARC format. This method must be overridden
       in your subclass.

RELATED MODULES
       MARC::Record

TODO
       o   "out()" method

           We only handle files for input right now.

LICENSE
       This code may be distributed under the same terms as Perl itself.

       Please note that these modules are not products of or supported by the
       employers of the various contributors to the code.

AUTHOR
       Andy Lester, "<andy@petdance.com>"

perl v5.10.1							    2010-03-29						   MARC::File(3pm)
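
The MARC::File methods described above (in(), next(), warnings() and close()) can be combined with a check of $MARC::File::ERROR, as in the following sketch. It assumes the MARC::Record distribution, which provides MARC::File::USMARC, is installed. The helper name count_records is hypothetical and not part of the module.

```perl
use strict;
use warnings;
use MARC::File::USMARC;

# Hypothetical helper (not part of MARC::File): count the records in a
# USMARC file. Returns undef if the file cannot be opened.
sub count_records {
    my ($filename) = @_;

    # in() returns undef on failure and leaves the reason in
    # $MARC::File::ERROR.
    my $file = MARC::File::USMARC->in($filename);
    unless ($file) {
        warn "Cannot open $filename: $MARC::File::ERROR\n";
        return;
    }

    my $count = 0;
    while ( my $marc = $file->next() ) {
        $count++;    # $marc is a MARC::Record object
    }

    # warnings() returns the accumulated warnings and clears the buffer.
    warn "$_\n" for $file->warnings();

    $file->close();
    return $count;
}
```

Because next() returns "undef" on error as well as at end of file, the loop stops at the first record that cannot be read, so the count covers only the records read before that point.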