utf8trans(1) [debian man page]

utf8trans(1)							     docbook2X							      utf8trans(1)

NAME
       utf8trans - Transliterate UTF-8 characters according to a table

SYNOPSIS
       utf8trans charmap [file]...

DESCRIPTION
       utf8trans transliterates characters in the specified files (or standard input, if no files are specified) and writes the output to standard output. All input and output is in the UTF-8 encoding.

       This program is usually used to render characters in Unicode text files as markup escapes or ASCII transliterations. (It is not intended for general charset conversions.) It provides functionality similar to the character maps in XSLT 2.0 (Extensible Stylesheet Language Transformations, version 2.0).

OPTIONS
       -m, --modify
              Modifies the given files in-place with their transliterated output, instead of sending it to standard output. This option is useful for efficient transliteration of many files at once.

       --help Show brief usage information and exit.

       --version
              Show version and exit.

USAGE
       The translation is done according to the rules in the 'character map', which is read from the file charmap. It has the following format:

       1. Each line represents a translation entry, except for blank lines and comment lines, which are ignored.

       2. Any amount of whitespace (space or tab) may precede the start of an entry.

       3. Comment lines begin with #. Everything on the same line is ignored.

       4. Each entry consists of the Unicode codepoint of the character to translate, in hexadecimal, followed by one space or tab, followed by the translation string, up to the end of the line.

       5. The translation string is taken literally, including any leading and trailing spaces (except the delimiter between the codepoint and the translation string), and all types of characters. The newline at the end is not included.

       The above format is intended to be restrictive, to keep utf8trans simple. If an XML-based format is desired, the xmlcharmap2utf8trans script that comes with the docbook2X distribution converts character maps in XSLT 2.0 format to the utf8trans format.

LIMITATIONS
       o   utf8trans does not work with binary files, because malformed UTF-8 sequences in the input are substituted with U+FFFD characters. However, null characters in the input are handled correctly. This limitation may be removed in the future.

       o   There is no way to include a newline or null in the substitution string.

AUTHOR
       Steve Cheng <stevecheng@users.sourceforge.net>.

docbook2X 0.8.8                          3 March 2007                          utf8trans(1)
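
The character map format described under USAGE can be illustrated with a short Python sketch. This is not the utf8trans implementation, only one reading of the five rules. It assumes that codepoints are bare hexadecimal digits (no 0x or U+ prefix), that a comment line may be preceded by whitespace, and that an entry with no delimiter maps to the empty string; none of these points is stated in this page. The names parse_charmap and transliterate, and the sample map, are made up for the example.

```python
import re

# Assumed entry shape: optional leading spaces/tabs, bare hex digits, then
# exactly one space or tab as the delimiter, then the literal translation.
_ENTRY = re.compile(r"^[ \t]*([0-9A-Fa-f]+)(?:[ \t](.*))?$")


def parse_charmap(text):
    """Return a {codepoint: translation} dict usable with str.translate()."""
    table = {}
    for line in text.split("\n"):
        stripped = line.lstrip(" \t")
        if stripped == "" or stripped.startswith("#"):
            continue  # blank lines and comment lines are ignored
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f"malformed charmap entry: {line!r}")
        # Everything after the single delimiter is kept verbatim, including
        # leading and trailing spaces. A bare codepoint maps to "" (assumed).
        table[int(match.group(1), 16)] = match.group(2) or ""
    return table


def transliterate(text, table):
    """Replace each mapped character; unmapped characters pass through."""
    return text.translate(table)


# Hypothetical sample map: two roff quote escapes and a dash rendered as
# " -- " (note the significant spaces after the delimiter and at line end).
CHARMAP = "# sample map\n201C \\(lq\n201D \\(rq\n\t2014  -- \n"

table = parse_charmap(CHARMAP)
result = transliterate("\u201cHi\u201d\u2014there", table)
print(result)  # \(lqHi\(rq -- there
```

The third entry shows rules 2 and 5 together: the leading tab is skipped, one space is consumed as the delimiter, and the remaining " -- " is kept with its surrounding spaces.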

docbook2x-man(1)						     docbook2X							  docbook2x-man(1)

NAME
       docbook2x-man - Convert DocBook to man pages

SYNOPSIS
       docbook2x-man [options] xml-document

DESCRIPTION
       docbook2x-man converts the given DocBook XML document into man pages. By default, the man pages will be output to the current directory.

       Only the refentry content in the DocBook document is converted. (To convert content outside of a refentry, stylesheet customization is required. See the docbook2X package for details.)

       The docbook2x-man command is a wrapper script for a two-step conversion process. See the section "CONVERSION PROCESS" below for details.

OPTIONS
       The available options are essentially the union of the options from db2x_xsltproc(1) and db2x_manxml(1).

       Some commonly-used options are listed below:

       --encoding=encoding
              Sets the character encoding of the output.

       --string-param parameter=value
              Sets a stylesheet parameter (options that affect how the output looks). See "Stylesheet parameters" below for the parameters that can be set.

       --sgml Accept an SGML source document as input instead of XML.

       --solinks
              Make stub pages for alternate names for an output man page.

   STYLESHEET PARAMETERS
       uppercase-headings
              Brief. Make headings uppercase?

              Default setting. 1 (boolean true)

              Specifies whether headings in man page content are uppercased.

       manvolnum-cite-numeral-only
              Brief. Man page section citation should use only the number

              Default setting. 1 (boolean true)

              When citing other man pages, the man-page section is either given as is, or has the letters stripped from it, citing only the number of the section (e.g. section 3x becomes 3). This option specifies which style.

       quotes-on-literals
              Brief. Display quotes on literal elements?

              Default setting. 0 (boolean false)

              If true, render literal elements with quotes around them.

       show-comments
              Brief. Display comment elements?

              Default setting. 1 (boolean true)

              If true, comments will be displayed, otherwise they are suppressed. Comments here refers to the comment element, which will be renamed remark in DocBook V4.0, not XML comments (<!-- like this -->) which are unavailable.

       function-parens
              Brief. Generate parentheses after a function?

              Default setting. 0 (boolean false)

              If true, the formatting of a <function> element will include generated parentheses.

       xref-on-link
              Brief. Should link generate a cross-reference?

              Default setting. 1 (boolean true)

              Man pages cannot render the hypertext links created by link. If this option is set, then the stylesheet renders a cross reference to the target of the link. (This may reduce clutter). Otherwise, only the content of the link is rendered and the actual link itself is ignored.

       header-3
              Brief. Third header text

              Default setting. (blank)

              Specifies the text of the third header of a man page, typically the date for the man page. If empty, the date content for the refentry is used.

       header-4
              Brief. Fourth header text

              Default setting. (blank)

              Specifies the text of the fourth header of a man page. If empty, the refmiscinfo content for the refentry is used.

       header-5
              Brief. Fifth header text

              Default setting. (blank)

              Specifies the text of the fifth header of a man page. If empty, the 'manual name', that is, the title of the book or reference container, is used.

       default-manpage-section
              Brief. Default man page section

              Default setting. 1

              The source document usually indicates the sections that each man page should belong to (with manvolnum in refmeta). In case the source document does not indicate man-page sections, this option specifies the default.

       custom-localization-file
              Brief. URI of XML document containing custom localization data

              Default setting. (blank)

              This parameter specifies the URI of an XML document that describes text translations (and other locale-specific information) needed by the stylesheet to process the DocBook document.

              The text translations pointed to by this parameter always override the default text translations (from the internal parameter localization-file). If a particular translation is not present here, the corresponding default translation is used as a fallback.

              This parameter is primarily for changing certain punctuation characters used in formatting the source document. The settings for punctuation characters are often specific to the source document, but can also be dependent on the locale.

              To not use custom text translations, leave this parameter as the empty string.

       custom-l10n-data
              Brief. XML document containing custom localization data

              Default setting. document($custom-localization-file)

              This parameter specifies the XML document that describes text translations (and other locale-specific information) needed by the stylesheet to process the DocBook document.

              This parameter is internal to the stylesheet. To point to an external XML document with a URI or a file name, you should use the custom-localization-file parameter instead.

              However, inside a custom stylesheet (not on the command-line) this parameter can be set to the XPath expression document(''), which will cause the custom translations directly embedded inside the custom stylesheet to be read.

       author-othername-in-middle
              Brief. Is othername in author a middle name?

              Default setting. 1

              If true, the othername of an author appears between the firstname and surname. Otherwise, othername is suppressed.

EXAMPLES
       $ docbook2x-man --solinks manpages.xml
       $ docbook2x-man --solinks --encoding=utf-8//TRANSLIT manpages.xml
       $ docbook2x-man --string-param header-4="Free Recode 3.6" document.xml

CONVERSION PROCESS
   Converting to man pages
       DocBook documents are converted to man pages in two steps:

       1. The DocBook source is converted by an XSLT stylesheet into an intermediate XML format, Man-XML.

          Man-XML is simpler than DocBook and closer to the man page format; it is intended to make the stylesheets' job easier.

          The stylesheet for this purpose is in xslt/man/docbook.xsl. For portability, it should always be referred to by the following URI:

          http://docbook2x.sourceforge.net/latest/xslt/man/docbook.xsl

          Run this stylesheet with db2x_xsltproc(1).

          Customizing. You can also customize the output by creating your own XSLT stylesheet (changing parameters or adding new templates) and importing xslt/man/docbook.xsl.

       2. Man-XML is converted to the actual man pages by db2x_manxml(1).

       The docbook2x-man command does both steps automatically, but if any problems occur, you can see the errors more clearly if you do each step separately:

       $ db2x_xsltproc -s man mydoc.xml -o mydoc.mxml
       $ db2x_manxml mydoc.mxml

       Options to the conversion stylesheet are described in the man-pages stylesheets reference.

       Pure XSLT conversion. An alternative to the db2x_manxml Perl script is the XSLT stylesheet in xslt/backend/db2x_manxml.xsl. This stylesheet performs a similar function of converting Man-XML to actual man pages. It is useful if you desire a pure XSLT solution to man-page conversion. Of course, the quality of the conversion using this stylesheet will never be as good as the Perl db2x_manxml, and it runs slower. In particular, the pure XSLT version currently does not support tables in man pages, but its Perl counterpart does.

   Character set conversion
       When translating XML to legacy ASCII-based formats with poor support for Unicode, such as man pages and Texinfo, there is always the problem that Unicode characters in the source document also have to be translated somehow.

       A straightforward character set conversion from Unicode does not suffice, because the target character set, usually US-ASCII or ISO Latin-1, does not contain common characters such as dashes and directional quotation marks that are widely used in XML documents. But document formatters (man and Texinfo) allow such characters to be entered by a markup escape: for example, \(lq for the left directional quote “. And if a markup-level escape is not available, an ASCII transliteration might be used: for example, using the ASCII less-than sign < for an angle quotation mark.

       So the Unicode character problem can be solved in two steps:

       1. utf8trans(1), a program included in docbook2X, maps Unicode characters to markup-level escapes or transliterations.

          Since there is not necessarily a fixed, official mapping of Unicode characters, utf8trans can read in user-modifiable character mappings expressed in text files and apply them. (Unlike most character set converters.)

          The files charmaps/man/roff.charmap and charmaps/man/texi.charmap are character maps that may be used for man-page and Texinfo conversion. The programs db2x_manxml(1) and db2x_texixml(1) will apply these character maps, or another character map specified by the user, automatically.

       2. The rest of the Unicode text is converted to some other character set (encoding). For example, a French document with accented characters (such as é) might be converted to ISO Latin 1.

          This step is applied after utf8trans character mapping, using the iconv(1) encoding conversion tool. Both db2x_manxml(1) and db2x_texixml(1) can call iconv(1) automatically when producing their output.

FILES
       /usr/local/share/docbook2X/xslt/man/docbook.xsl
       /usr/local/share/docbook2X/xslt/backend/db2x_manxml.xsl
       /usr/local/share/docbook2X/xslt/catalog.xml
       /usr/local/share/docbook2X/charmaps/roff.charmap
       /usr/local/share/docbook2X/charmaps/roff.charmap.xml

       The above files are distributed and installed by the docbook2X package.

NOTES
       The docbook2x-man or docbook2texi command described in this manual page comes from the docbook2X package. It should not be confused with the command of the same name from the obsolete docbook-utils package.

LIMITATIONS
       o   Internally there is one long pipeline of programs which your document goes through. If any segment of the pipeline fails (even trivially, for example from mistyped program options), the resulting errors can be difficult to decipher. In this case, try running the components of docbook2X separately.

AUTHOR
       Steve Cheng <stevecheng@users.sourceforge.net>.

SEE ALSO
       db2x_xsltproc(1), db2x_manxml(1), utf8trans(1)

       The docbook2X manual (in Texinfo or HTML format) fully describes how to convert DocBook to man pages and Texinfo.

       Up-to-date information about this program can be found at the docbook2X Web site <http://docbook2x.sourceforge.net/>.

docbook2X 0.8.8                          3 March 2007                          docbook2x-man(1)
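
The two steps described under "Character set conversion" can be sketched in Python, mainly to show why the order matters. This is only an illustration, not how db2x_manxml works internally: str.translate stands in for utf8trans, str.encode stands in for iconv, and the function name to_legacy_bytes and the two-entry character map are hypothetical.

```python
def to_legacy_bytes(text, charmap, encoding="iso-8859-1"):
    """Map special characters to escapes, then encode what remains."""
    # Step 1: apply the character map (the role utf8trans plays). Characters
    # with an entry become markup escapes or ASCII transliterations.
    mapped = text.translate(charmap)
    # Step 2: convert the remaining text to the target encoding (the role
    # iconv plays). With Python's default strict error handling, a character
    # that is neither mapped nor representable raises UnicodeEncodeError.
    return mapped.encode(encoding)


# Hypothetical map: directional double quotes become roff escapes.
charmap = {0x201C: "\\(lq", 0x201D: "\\(rq"}

out = to_legacy_bytes("\u201cCaf\u00e9\u201d", charmap)
print(out)  # b'\\(lqCaf\xe9\\(rq'
```

The directional quotes have no ISO Latin-1 representation, so they must be replaced in step 1 before the encoding step runs. The accented é needs no map entry because step 2 can encode it directly as the single byte 0xE9.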