unihist(1) [debian man page]

unihist(1)						      General Commands Manual							unihist(1)

NAME

       unihist - Generate a histogram of the characters in a Unicode file

SYNOPSIS

       unihist [option flags]

DESCRIPTION

       unihist generates a histogram of the characters in its input, which must be encoded in UTF-8 Unicode.  By default, for each character it
       prints the frequency of the character as a percentage of the total, the number of times it occurs in the input, its UTF-32 code in
       hexadecimal, and, if the character is displayable, the glyph itself as UTF-8 Unicode.  Command line flags allow unwanted information to be
       suppressed.  In particular, by suppressing the percentages and counts (-c) it is possible to generate a list of the unique characters in
       the input.

       Output is ordered by character code.  To sort it in descending order of frequency, pipe the output into the command:

	      sort -k1 -n -r

       This sorts numerically on the first field, the percentage, so it gives no useful ordering when -c is used.
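       The counting and ordering described above can be sketched in Python.  This is an illustration, not unihist's source: the function names
       are hypothetical, the exact column layout of unihist's output is assumed, and str.isprintable() stands in for unihist's own notion of a
       displayable character.

```python
import sys
from collections import Counter
from typing import List, Tuple

Row = Tuple[int, int, float]  # (code point, count, percentage)


def histogram(text: str) -> List[Row]:
    """Count each character and return rows ordered by code point."""
    counts = Counter(ord(ch) for ch in text)
    total = sum(counts.values())
    return [(cp, n, 100.0 * n / total) for cp, n in sorted(counts.items())]


def by_frequency(rows: List[Row]) -> List[Row]:
    """Reorder rows by descending count; ties keep code-point order."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


def format_row(cp: int, n: int, pct: float) -> str:
    # Field order follows the description: percentage, count, code, glyph.
    # Widths and the 0x prefix are assumptions, not unihist's exact format.
    # isprintable() is only an approximation of "displayable".
    glyph = chr(cp) if chr(cp).isprintable() else ""
    return f"{pct:8.4f} {n:8d} 0x{cp:06X} {glyph}".rstrip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the named file strictly as UTF-8; newline="" keeps CR/LF as-is
    # so that line-ending characters are counted exactly as stored.
    with open(sys.argv[1], encoding="utf-8", newline="") as f:
        for row in histogram(f.read()):
            print(format_row(*row))
```

       Usage: python3 unihist_sketch.py input.txt (the file name is hypothetical).  Passing the rows through by_frequency() gives a
       frequency-ordered listing similar in effect to the sort pipeline above.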

       By default, unihist handles all of Unicode.  To reduce memory usage and increase speed, it can be compiled to handle only the Basic
       Multilingual Plane (plane 0) by defining the macro BMPONLY.
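       The saving from BMPONLY is easiest to see under an assumption this page does not state: that counts are kept in a flat table with one
       fixed-width counter per code point.  The sketch below (hypothetical names, 4-byte counters assumed) computes the table size for each build
       and counts characters into such a table.

```python
from typing import List

BMP_CODE_POINTS = 0x10000   # plane 0: U+0000..U+FFFF
ALL_CODE_POINTS = 0x110000  # planes 0-16: U+0000..U+10FFFF


def table_bytes(code_points: int, counter_bytes: int = 4) -> int:
    """Size of a flat table holding one counter per code point (assumed design)."""
    return code_points * counter_bytes


def count_into_table(text: str, size: int) -> List[int]:
    """Count characters into a table indexed directly by code point."""
    counts = [0] * size
    for ch in text:
        cp = ord(ch)
        if cp >= size:
            # How a BMPONLY build treats such characters is not documented;
            # this sketch simply rejects them.
            raise ValueError(f"U+{cp:04X} lies outside a {size}-entry table")
        counts[cp] += 1
    return counts


print(table_bytes(BMP_CODE_POINTS))  # 262144 bytes (256 KiB)
print(table_bytes(ALL_CODE_POINTS))  # 4456448 bytes (4.25 MiB)
```

       Under these assumptions the full table is 17 times the size of the BMP-only one, since Unicode has 17 planes of 65,536 code points each.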

COMMAND LINE FLAGS

       -c     Suppress printing of counts and percentages.

       -g     Suppress printing of glyphs.

       -h     Print usage information.

       -u     Suppress printing of the Unicode code as text.

       -v     Print version information.
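       The effect of -c, -g and -u can be sketched as choosing which fields to print.  The Python below is illustrative only: parse_flags and
       format_row are hypothetical names, the field layout is assumed, and -h and -v, which print information rather than select fields, are
       accepted but otherwise ignored here.

```python
import getopt
from typing import Dict, List


def parse_flags(argv: List[str]) -> Dict[str, bool]:
    """Turn unihist-style flags into on/off switches for each output field."""
    opts, _ = getopt.getopt(argv, "cghuv")  # -h and -v parsed but not used here
    given = {flag for flag, _ in opts}
    return {
        "counts": "-c" not in given,  # -c: no counts or percentages
        "glyph": "-g" not in given,   # -g: no glyph
        "code": "-u" not in given,    # -u: no code point as text
    }


def format_row(cp: int, n: int, pct: float, show: Dict[str, bool]) -> str:
    """Build one output line from the fields that are still switched on."""
    fields = []
    if show["counts"]:
        fields += [f"{pct:.4f}", str(n)]
    if show["code"]:
        fields.append(f"0x{cp:06X}")
    if show["glyph"] and chr(cp).isprintable():
        fields.append(chr(cp))
    return " ".join(fields)
```

       With -c each line reduces to the code and glyph, which is how the list of unique characters mentioned under DESCRIPTION arises.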

SEE ALSO

       uniname(1)

REFERENCES

       Unicode Standard, version 5.0

AUTHOR

       Bill Poser
       billposer@alum.mit.edu

LICENSE

       GNU General Public License

								     May, 2008								unihist(1)

Check Out this Related Man Page

ASCII(1)						      General Commands Manual							  ASCII(1)

NAME

       ascii, unicode - interpret ASCII, Unicode characters

SYNOPSIS

       ascii [ -8 ] [ -oxdbn ] [ -nct ] [ text ]

       unicode [ -nt ] hexmin-hexmax

       unicode [ -t ] hex [ ...  ]

       unicode [ -n ] characters

       look hex /lib/unicode

DESCRIPTION

       Ascii prints the ASCII values corresponding to characters and vice versa; under the -8 option, the ISO Latin-1 extensions (codes 0200-0377)
       are included.  The values are interpreted in a settable numeric base; -o specifies octal, -d decimal, -x hexadecimal (the default), and -bn
       base n.
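       The character-to-value direction in a chosen base can be sketched in Python.  The helper names are hypothetical; the 2 to 36 limit for
       -bn, lower-case digits, and rejecting characters outside 7-bit ASCII (or Latin-1 under -8) are assumptions of the sketch, not documented
       behavior of ascii.

```python
import string
from typing import List

DIGITS = string.digits + string.ascii_lowercase


def to_base(value: int, base: int) -> str:
    """Render a non-negative integer in base 2..36 (limit assumed)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if value == 0:
        return "0"
    digits = []
    while value:
        value, d = divmod(value, base)
        digits.append(DIGITS[d])
    return "".join(reversed(digits))


def char_values(text: str, base: int = 16, latin1: bool = False) -> List[str]:
    """Value of each character, one entry per character (one per line)."""
    limit = 0o400 if latin1 else 0o200  # -8 extends the range through 0377
    values = []
    for ch in text:
        code = ord(ch)
        if code >= limit:
            raise ValueError(f"{ch!r} is outside the selected character set")
        values.append(to_base(code, base))
    return values
```

       For example, char_values("p", 8) corresponds to asking for the value of p in octal (-o).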

       With no arguments, ascii prints a table of the character set in the specified base.  Characters of text are converted to their ASCII
       values, one per line.  If, however, the first text argument is a valid number in the specified base, conversion goes the opposite way.
       Control characters are printed as two- or three-character mnemonics.  Other options are:

       -n     Force numeric output.

       -c     Force character output.

       -t     Convert from numbers to running text; do not interpret control characters or insert newlines.
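       The rule for choosing a direction, together with -n, -c and -t, might be sketched as below.  It is a simplification under stated
       assumptions: the names are hypothetical, only bases 8, 10 and 16 are handled, Python's int() may accept inputs that ascii's own number
       parsing rejects, and control-character mnemonics are not produced.

```python
from typing import List, Optional

FORMAT = {8: "o", 10: "d", 16: "x"}  # base -> format() type code


def is_number(token: str, base: int) -> bool:
    """True if token parses as an integer in the given base."""
    try:
        int(token, base)
    except ValueError:
        return False
    return True


def convert(args: List[str], base: int = 16, force: Optional[str] = None,
            running_text: bool = False) -> List[str]:
    """Text to values, or values to characters, decided by the first argument."""
    numbers_in = bool(args) and is_number(args[0], base)
    if force == "n":      # -n: force numeric output
        numbers_in = False
    elif force == "c":    # -c: force character output
        numbers_in = True
    if numbers_in:
        chars = [chr(int(tok, base)) for tok in args]
        # -t: one running string instead of one character per line.
        return ["".join(chars)] if running_text else chars
    return [format(ord(ch), FORMAT[base]) for tok in args for ch in tok]
```

       With the default base 16, an argument such as cafe is itself a valid number, so it is read as a value; force="n" (the -n option) makes
       the sketch print the values of its letters instead.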

       Unicode is similar; it converts between UTF and character values from the Unicode Standard (see utf(6)).  If given a range of hexadecimal
       numbers, unicode prints a table of the specified Unicode characters: their values and UTF representations.  Otherwise it translates from
       UTF to numeric value or vice versa, depending on the appearance of the supplied text; the -n option forces numeric output to avoid
       ambiguity with numeric characters.  If converting to UTF, the characters are printed one per line unless the -t flag is set, in which
       case the output is a single string containing only the specified characters.  Unlike ascii, unicode treats no characters specially.
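       A range table of the kind unicode prints can be sketched with Python's built-in UTF-8 encoder, UTF-8 being the UTF described in utf(6).
       The column layout (hex value, character, UTF-8 bytes in hex), the helper names, and the skipping of surrogate code points, which have no
       UTF-8 encoding, are assumptions of this sketch; unicode's real output may differ.

```python
from typing import List


def parse_range(spec: str) -> range:
    """Turn 'hexmin-hexmax' into an inclusive range of code points."""
    lo, hi = spec.split("-")
    return range(int(lo, 16), int(hi, 16) + 1)


def range_table(spec: str) -> List[str]:
    """One row per code point: hex value, character, UTF-8 bytes."""
    rows = []
    for cp in parse_range(spec):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        ch = chr(cp)
        utf8 = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
        rows.append(f"{cp:04x} {ch} {utf8}")
    return rows


def hex_to_text(values: List[str], running_text: bool = False) -> List[str]:
    """Hex values to characters: one per line, or one string as with -t."""
    chars = [chr(int(v, 16)) for v in values]
    return ["".join(chars)] if running_text else chars
```

       For instance, range_table("2200-22f1") builds rows for the same range as the mathematical-symbols example under EXAMPLES.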

       The output of ascii and unicode may be unhelpful if the characters printed are not available in the current font.

       The file /lib/unicode contains a table of characters and descriptions, sorted in hexadecimal order, suitable for searching with look(1)
       by the lower-case hex value of a character.
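       Because the table is sorted, all lines beginning with a given prefix are contiguous, so a prefix lookup of the kind look(1) performs can
       start with a binary search.  The Python sketch below works on an in-memory list; the look name here is a stand-in, and the sample lines
       are invented placeholders whose layout is not the real format of /lib/unicode.  The key is written in lower case to match the hex values
       in the table.

```python
import bisect
from typing import List


def look(prefix: str, sorted_lines: List[str]) -> List[str]:
    """Return every line that starts with prefix; input must be sorted."""
    start = bisect.bisect_left(sorted_lines, prefix)  # first line >= prefix
    matches = []
    for line in sorted_lines[start:]:
        if not line.startswith(prefix):
            break  # sorted order: no later line can match either
        matches.append(line)
    return matches


# Invented sample lines (hypothetical layout), already in sorted order.
TABLE = [
    "0390 greek small letter iota with dialytika and tonos",
    "0391 greek capital letter alpha",
    "0392 greek capital letter beta",
    "03a0 greek capital letter pi",
]
```

       Here look("039", TABLE) mirrors the look 039 /lib/unicode example; a file-based version would binary-search over byte offsets rather
       than list indices, but the contiguity argument is the same.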

EXAMPLES

       ascii -d
	      Print the ASCII table in base 10.

       unicode p
	      Print the hex value of `p'.

       unicode 2200-22f1
	      Print a table of miscellaneous mathematical symbols.

       look 039 /lib/unicode
	      See the start of the Greek alphabet's encoding in the Unicode Standard.

FILES

       /lib/unicode
	      table of characters and descriptions.

SOURCE

       /sys/src/cmd/ascii.c
       /sys/src/cmd/unicode.c

SEE ALSO

       look(1), tcs(1), utf(6), font(6)

																	  ASCII(1)
