cssdiff(1)							      CRM114								cssdiff(1)

  NAME
      cssdiff - generate a difference summary of two .css files

  SYNOPSIS
      cssdiff [cssfile1] [cssfile2]

  WARNING
      This  man  page  is  taken from an older CRM114 version.	It is provided as a convenience to Debian users and may not be up-to-date.  If you
      would like to update it, please send appropriate patches to the Debian bug tracking system.

  DESCRIPTION
      cssdiff is a special-purpose utility to measure the distance between the classes represented by cssfile1 and cssfile2. The summary output
      reports how many features were in each of the .css files, how many features appeared in both (balanced overlap), how many features
      appeared in only one (unbalanced overlap), and how often the feature set of one .css file strictly dominated the feature set of the
      other. This set of metrics provides an intuitive way to determine the similarity (or dissimilarity) of two classes represented as .css
      files. When using the CRM114 spamfilter, cssdiff can be used to find out how easy it will be for CRM114 to differentiate spam from
      nonspam with your .css files. cssdiff prints a report such as:

	 Sparse spectra file spam.css has 1048577 bins total
	 Sparse spectra file nonspam.css has 1048577 bins total

	 File 1 total features		  :	 1605968
	 File 2 total features		  :	 1045152

	 Similarities between files	  :	  142039
	 Differences between files	  :	 1279964

	 File 1 dominates file 2	  :	 1463929
	 File 2 dominates file 1	  :	  903113

      Note that in this case there is a big difference between the two files: there are about 9 times as many differences between them as
      there are similarities.
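
      The comparison above can be checked with a few lines of Python. This is only a sketch: the report text is the sample shown above, and
      the names REPORT and parse_report are hypothetical helpers, not part of CRM114. In this sample each "dominates" figure happens to equal
      that file's total minus the similarities; this page does not say whether that holds in general.

```python
import re

# Sample report lines copied from the example above (not live cssdiff output).
REPORT = """\
File 1 total features             :      1605968
File 2 total features             :      1045152
Similarities between files        :       142039
Differences between files         :      1279964
File 1 dominates file 2           :      1463929
File 2 dominates file 1           :       903113
"""

def parse_report(text):
    """Map each 'label : number' line of the report to its integer value."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(\d+)\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

fields = parse_report(REPORT)

# Ratio of differences to similarities for the sample report.
ratio = fields["Differences between files"] / fields["Similarities between files"]
print(f"differences per similarity: {ratio:.2f}")  # prints 9.01

# Observation on this sample only: dominance equals total minus similarities.
f1_only = fields["File 1 total features"] - fields["Similarities between files"]
f2_only = fields["File 2 total features"] - fields["Similarities between files"]
print(f1_only == fields["File 1 dominates file 2"])  # prints True
print(f2_only == fields["File 2 dominates file 1"])  # prints True
```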

  OPTIONS
      There are no options to cssdiff.

  SHORTCOMINGS
      Note that cssdiff as of version 20040816 is NOT capable of dealing with the CRM114 Winnow classifier's floating-point .cow files. Worse,
      cssdiff is unaware of its shortcomings, and will try anyway. The only recourse is to be aware of this issue and not use cssdiff on Winnow
      classifier floating-point .cow format files.

  HOMEPAGE AND REPORTING BUGS
      http://crm114.sourceforge.net/

  VERSION
      This manpage: $Id: cssdiff.azm,v 1.5 2004/08/19 09:06:44 vanbaal Exp $ This  manpage  describes  cssdiff	as  shipped  with  crm114  version
      20040816.BlameClockworkOrange.

  AUTHOR
      William S. Yerazunis. Manpage typesetting by Joost van Baal and Shalendra Chhabra

  COPYRIGHT
      Copyright  (C)  2001, 2002, 2003, 2004 William S. Yerazunis.  This is free software, copyrighted under the FSF's GPL.  There is NO warranty;
      not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the file COPYING for more details.

  SEE ALSO
      cssutil(1), cssmerge(1), crm(1)

  cssdiff 20040816.BlameClockworkOrange-auto.3			      19 Aug 2004							  cssdiff(1)

cssutil(1)							      CRM114								cssutil(1)

  NAME
      cssutil - utility to measure and manipulate CRM114 statistics files.

  SYNOPSIS
      cssutil [.css file] [OPTIONS]

  WARNING
      This  man  page  is  taken from an older CRM114 version.	It is provided as a convenience to Debian users and may not be up-to-date.  If you
      would like to update it, please send appropriate patches to the Debian bug tracking system.

  OPTIONS
      -h
	 print basic help

      -b
	 brief - print only a summary of the statistics of the .css file (otherwise, prints a full list of how	many  bins  are  in  each  counter
	 state)

      -q
	 quiet mode; no warning messages

      -r
	 report then exit (no menu). The default if -r is not specified is to drop into a command-menu based system.

      -s
	 if no css file is found, create a new one with this many buckets. Default is 1 million + 1 buckets.

      -S
	 same as -s, but round up to next 2^n + 1 boundary.

      -v
	 print version and exit

      -D
	 dump the css file to stdout in the architecture-independent CSV format, suitable for reloading with -R on another architecture (note
	 that .css files are a hardware-architecture dependent format).

      -R
	 create and restore css from the hardware-architecture independent CSV format file (reads from stdin if csv-file is not supplied).
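
      The rounding described for -S can be illustrated in Python. This is a sketch of one reading of "round up to next 2^n + 1 boundary",
      not the cssutil source: the function name round_up_to_pow2_plus_1 is hypothetical, and it assumes that a value already on a boundary
      is left unchanged.

```python
def round_up_to_pow2_plus_1(buckets):
    """Return the smallest value of the form 2**n + 1 that is >= buckets.

    Assumption: a request that already sits on a 2**n + 1 boundary is kept as is.
    """
    n = 0
    while (1 << n) + 1 < buckets:
        n += 1
    return (1 << n) + 1

print(round_up_to_pow2_plus_1(1000000))  # prints 1048577
print(round_up_to_pow2_plus_1(1048577))  # prints 1048577
print(round_up_to_pow2_plus_1(1048578))  # prints 2097153
```

      Under this reading, a request for 1000000 buckets lands on 1048577, which is the bin count shown for spam.css and nonspam.css in the
      cssdiff(1) sample report.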

  THE COMMAND MENU
      If -r is not supplied, a menu appears with the following options. Note that all of these operations are "in place" and surgical; there is
      NO undo functionality. Wise users will make a backup copy of all .css files before using cssutil to alter values.

      -Z
	 zero  all  bins  at or below a value. This is useful for deleting all small-count features from the .css statistics files leaving higher-
	 count features untouched.

      -S
	 subtract a constant from all bins - this rolls all features back a constant amount.

      -D
	 divide all bins by a constant - this rolls features back linearly, rather than in scalar fashion.

      -R
	 rescan - regenerate the statistics output that was initially printed.

      -P
	 pack - re-slot features to optimize access time.

      -Q
	 gracefully exit, saving changes (note that since these operations are in-place and surgical, there is no option to exit without saving
	 changes).
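
      The effect of the zero, subtract and divide operations can be pictured on a small list of bin counts. The Python below is an
      illustration only and does not read or write .css files. The function names are hypothetical, and two behaviours are assumptions not
      stated on this page: subtraction is clamped at zero, and division is integer division. Unlike cssutil, which works in place, these
      helpers return new lists.

```python
def zero_at_or_below(bins, threshold):
    """Zero every bin whose count is at or below threshold; leave the rest alone."""
    return [0 if count <= threshold else count for count in bins]

def subtract_constant(bins, amount):
    """Subtract amount from every bin (assumption: counts never go below zero)."""
    return [max(count - amount, 0) for count in bins]

def divide_by_constant(bins, divisor):
    """Divide every bin by divisor (assumption: integer division)."""
    return [count // divisor for count in bins]

bins = [0, 1, 2, 5, 40]  # made-up counts for illustration
print(zero_at_or_below(bins, 2))    # prints [0, 0, 0, 5, 40]
print(subtract_constant(bins, 2))   # prints [0, 0, 0, 3, 38]
print(divide_by_constant(bins, 2))  # prints [0, 0, 1, 2, 20]
```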

  DESCRIPTION
      cssutil  is a general utility to manipulate and measure the .css format statistics files used by CRM114's Markovian and OSB classifiers. The
      biggest uses are to check the available space remaining in a .css file, to selectively groom a .css file, and to port architecture-dependent
      .css  files  to and from an ASCII CSV format, which is architecture independent.	The cssutil program can be used to create information-less
      .css files:

	   cssutil -b -r spam.css
	   cssutil -b -r nonspam.css

      This creates the full-size files ./spam.css and ./nonspam.css, holding no information. The cssutil program can also be used to check that
      the .css files are reasonable. Invoke cssutil as:

	  cssutil -b -r spam.css
	  cssutil -b -r nonspam.css

      You should get back a report something like this:

	   Sparse spectra file spam.css statistics:

	   Total available buckets	    :	   1048576
	   Total buckets in use 	    :	    506987
	   Total hashed datums in file	    :	   1605968
	   Average datums per bucket	    :	      3.17
	   Maximum length of overflow chain :		39
	   Average length of overflow chain :	      1.84
	   Average packing density	    :	      0.48

      Note that the packing density is 0.48; this means that this .css file is about half full of features. Once the packing density gets above
      about 0.9, you will notice that CRM114 takes longer to process text. The penalty is small at packing densities below about 0.95 and is
      only about a factor of 2 at 0.97. It is best to keep the packing density below 0.7 to 0.8.
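
      The derived figures in the sample report are consistent with simple ratios of the raw counts, as the Python sketch below shows. The
      formulas are inferred from the sample numbers rather than taken from the cssutil source, and the function advice with its cut-offs is
      a hypothetical reading of the guidance above.

```python
# Raw counts copied from the sample report above.
TOTAL_BUCKETS = 1048576
BUCKETS_IN_USE = 506987
HASHED_DATUMS = 1605968

# Inferred formulas (assumptions that match the sample figures).
packing_density = BUCKETS_IN_USE / TOTAL_BUCKETS
datums_per_bucket = HASHED_DATUMS / BUCKETS_IN_USE

def advice(density):
    """Turn a packing density into a rough hint (hypothetical cut-offs)."""
    if density < 0.7:
        return "comfortable"
    if density < 0.9:
        return "getting full, consider grooming"
    return "expect slower processing"

print(f"packing density  : {packing_density:.2f}")    # prints 0.48
print(f"datums per bucket: {datums_per_bucket:.2f}")  # prints 3.17
print(advice(packing_density))                        # prints comfortable
```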

  SHORTCOMINGS
      Note that cssutil as of version 20040816 is NOT capable of dealing with the CRM114 Winnow classifier's floating-point .cow files. Worse,
      cssutil is unaware of its shortcomings, and will try anyway. The only recourse is to be aware of this issue and not use cssutil on a Winnow
      classifier floating-point .cow format file.

  HOMEPAGE AND REPORTING BUGS
      http://crm114.sourceforge.net/

  VERSION
      This  manpage:  $Id:  cssutil.azm,v  1.4	2004/08/19  09:23:24  vanbaal  Exp $ This manpage describes cssutil as shipped with crm114 version
      20040816.BlameClockworkOrange.

  AUTHOR
      William S. Yerazunis. Manpage typesetting by Joost van Baal and Shalendra Chhabra

  COPYRIGHT
      Copyright (C) 2001, 2002, 2003, 2004 William S. Yerazunis. This is free software, copyrighted under the FSF's GPL. There is NO warranty; not
      even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the file COPYING for more details.

  SEE ALSO
      cssmerge(1), cssdiff(1), crm(1)

  cssutil 20040816.BlameClockworkOrange-auto.3			      19 Aug 2004							  cssutil(1)