inline-detox(1) [debian man page]

DETOX(1)						    BSD General Commands Manual 						  DETOX(1)

NAME

     inline-detox -- clean up filenames (stream-based)

SYNOPSIS

     inline-detox [-hnLrv] [-s sequence] [-f configfile] file ...

DESCRIPTION

     The inline-detox utility removes spaces and other such annoyances from streams.  It also translates or cleans up Latin-1 (ISO 8859-1)
     characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI-escaped characters.  In essence it is detox, except that it
     operates on streams rather than on files.

   Sequences
     inline-detox is driven by a configurable series of filters, called a sequence.  Sequences are covered in more detail in detoxrc(5) and are
     discoverable with the -L option.  Some examples of default sequences are iso8859_1 and utf_8.
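
     As a rough illustration of the idea (not detox's actual code or translation tables), a sequence can be pictured as a list of filters
     applied in order to each line of a stream.  The Python sketch below uses two hypothetical stand-in filters, uncgi_sketch and safe_sketch;
     their names and exact behavior are assumptions made for this example and are not the filters documented in detoxrc(5).

```python
import re
from urllib.parse import unquote


def uncgi_sketch(text: str) -> str:
    """Hypothetical filter: decode CGI (percent) escapes such as %20."""
    return unquote(text)


def safe_sketch(text: str) -> str:
    """Hypothetical filter: replace whitespace, parentheses, brackets, and '&' with '_'."""
    return re.sub(r"[\s()\[\]&]", "_", text)


def run_sequence(filters, text: str) -> str:
    """Apply each filter in order, feeding each result to the next filter."""
    for apply_filter in filters:
        text = apply_filter(text)
    return text


# An assumed sequence for this sketch: decode escapes first, then sanitize.
SKETCH_SEQUENCE = [uncgi_sketch, safe_sketch]

if __name__ == "__main__":
    print(run_sequence(SKETCH_SEQUENCE, "Foo%20Bar (1).txt"))
```

     Filter order matters in this sketch: sanitizing before decoding would leave the space produced by %20 in the output.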

   Options
     The main options:

     -f configfile
		 Use configfile instead of the default configuration files for loading translation sequences.  No other config file will be
		 parsed.

     -h --help	 Display helpful information.

     -L 	 List the currently available sequences.  When paired with -v this option shows what filters are used in each sequence and any
		 properties applied to the filters.

     -r 	 Recurse into subdirectories.

     -s sequence
		 Use sequence instead of the default sequence.

     -v 	 Be verbose about which files are being renamed.

     -V 	 Show the current version of inline-detox.

   Deprecated Options
     Deprecated Options are options that were available in earlier versions of inline-detox but have lost their meaning and are being phased out.

     --remove-trailing
		 Removes _ and - after .'s in filenames.  This was first provided in the 0.9 series of inline-detox.  After the introduction of
		 sequences, it lost its meaning, as you could now determine the properties of wipeup through a particular sequence's configura-
		 tion.	It presently forces all instances of the wipeup filter to use remove trailing, regardless of what's actually in the config
		 files.

FILES

     detoxrc	    The system-wide detoxrc file.
     ~/.detoxrc     A user's personal detoxrc.	Normally it extends the system-wide detoxrc, unless -f has been specified, in which case, it is
		    ignored.
     iso8859_1.tbl  The default ISO 8859-1 translation table.
     unicode.tbl    The default Unicode (UTF-8) translation table.

EXAMPLES

     echo Foo Bar | inline-detox -s iso8859_1 -v
		 Runs the sequence iso8859_1, listing any changes and writing the result to standard output.

SEE ALSO

     detox(1), detoxrc(5), detox.tbl(5).

HISTORY

     detox was originally designed to clean up files that I had received from friends which had been created using other operating systems.  It's
     trivial to create a filename with spaces, parentheses, brackets, and ampersands under some operating systems.  These have special meaning
     within FreeBSD and Linux, and cause problems when you go to access them.  I created inline-detox to clean up these files.

AUTHORS

     inline-detox was written by Doug Harple.

BUGS

     Long options don't work under Solaris or Darwin.

     An error in the config file causes a segfault when inline-detox tries to print the offending word from the config file.

BSD
								  August 3, 2004							       BSD


UTF(6)								   Games Manual 							    UTF(6)

NAME

       UTF, Unicode, ASCII, rune - character set and format

DESCRIPTION

       The  Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character
       Set Transformation Format, 8 bits wide).  The Unicode Standard represents its characters in 16 bits; UTF-8 represents  such  values  in	an
       8-bit byte stream.  Throughout this manual, UTF-8 is shortened to UTF.

       In Plan 9, a rune is a 16-bit quantity representing a Unicode character.  Internally, programs may store characters as runes.  However, any
       external manifestation of textual information, in files or at the interface  between  programs,	uses  a  machine-independent,  byte-stream
       encoding called UTF.

       UTF is designed so that the 7-bit ASCII set (values hexadecimal 00 to 7F) appears only as itself in the encoding.  Runes with values above
       7F appear as sequences of two or more bytes with values only from 80 to FF.

       The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even  if  not
       written	to deal with UTF, as do programs that deal with uninterpreted byte streams.  However, programs that perform semantic processing on
       ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input.	See rune(2).

       Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:

       01.   x in [00000000.0bbbbbbb] -> 0bbbbbbb
       10.   x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
       11.   x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb

       Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way.  Conversions 10 and 11 represent higher-
       valued  characters  as  sequences of two or three bytes with the high bit set.  Plan 9 does not support the 4, 5, and 6 byte sequences pro-
       posed by X-Open.  When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.
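
       The three conversions can be sketched in Python as follows.  This is a minimal illustration of the table above, not Plan 9 library
       code; the function name rune_to_utf is an assumption made for this example.  Picking the first range that fits also yields the
       shortest encoding.

```python
def rune_to_utf(x: int) -> bytes:
    """Encode a 16-bit rune with conversions 01, 10, and 11 (hypothetical helper)."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("rune must fit in 16 bits")
    if x <= 0x7F:
        # Conversion 01: 0bbbbbbb
        return bytes([x])
    if x <= 0x7FF:
        # Conversion 10: 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])
    # Conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([0xE0 | (x >> 12), 0x80 | ((x >> 6) & 0x3F), 0x80 | (x & 0x3F)])


if __name__ == "__main__":
    for rune in (0x41, 0xE9, 0x20AC):
        print(f"{rune:04X} -> {rune_to_utf(rune).hex()}")
```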

       In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
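
       A sketch of the inverse mapping in Python, under stated assumptions: the name utf_to_runes is hypothetical, and the text does not say
       how many bytes an incorrect sequence consumes, so this example assumes one byte is consumed per error.  Encodings longer than
       necessary are treated as incorrect, following the shortest-encoding rule.

```python
from typing import List

BAD_RUNE = 0x0080  # the rune the text assigns to incorrect sequences


def is_continuation(byte: int) -> bool:
    """True for bytes of the form 10bbbbbb."""
    return 0x80 <= byte <= 0xBF


def utf_to_runes(data: bytes) -> List[int]:
    """Decode a UTF byte string into 16-bit runes (hypothetical helper)."""
    runes = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            # Conversion 01: one byte, plain ASCII
            runes.append(b0)
            i += 1
            continue
        if 0xC0 <= b0 <= 0xDF and i + 1 < n and is_continuation(data[i + 1]):
            # Conversion 10: two bytes
            x = ((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F)
            if x >= 0x80:  # reject encodings longer than necessary
                runes.append(x)
                i += 2
                continue
        elif (0xE0 <= b0 <= 0xEF and i + 2 < n
              and is_continuation(data[i + 1]) and is_continuation(data[i + 2])):
            # Conversion 11: three bytes
            x = ((b0 & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F)
            if x >= 0x800:  # reject encodings longer than necessary
                runes.append(x)
                i += 3
                continue
        # Anything else is incorrect; assumption: skip a single byte.
        runes.append(BAD_RUNE)
        i += 1
    return runes
```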

FILES

       /lib/unicode
	      table of characters and descriptions, suitable for look(1).

SEE ALSO

       ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.

																	    UTF(6)
