formatrpsdb(1) [debian man page]

FORMATRPSDB(1)						     NCBI Tools User's Manual						    FORMATRPSDB(1)

NAME

       formatrpsdb - Build databases for RPS Blast

SYNOPSIS

       formatrpsdb [-] [-E N] [-G N] [-S X] [-U str] [-b] [-f X] -i filename [-l filename] [-n str] [-o] [-t str] [-v N]

DESCRIPTION

       Formatrpsdb is a utility that converts a collection of input sequences into a database suitable for use with Reverse Position Specific
       (RPS) Blast.  Each input sequence, together with its position-specific scoring matrix (PSSM), is ASN.1 encoded into a PssmWithParameters
       (or `scoremat') object and resides in a separate file.  Scoremat objects can be created using blastpgp.  Formatrpsdb is given a list of
       these files and produces the corresponding database.
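
       The list of scoremat files described above can be written by a short script.  The Python sketch below is illustrative only: the
       helper name write_scoremat_list, the .asn suffix, and the one-filename-per-line layout are assumptions, not documented behaviour
       of formatrpsdb.

```python
import tempfile
from pathlib import Path


def write_scoremat_list(scoremat_dir, list_path, pattern="*.asn"):
    """Write the paths of matching scoremat files to list_path.

    Assumptions (not taken from the manual): scoremat files end in ".asn"
    and the list file holds one filename per line.
    """
    names = sorted(str(p) for p in Path(scoremat_dir).glob(pattern))
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names


# Demo in a scratch directory with two empty placeholder files.
demo_dir = Path(tempfile.mkdtemp())
for placeholder in ("b.asn", "a.asn"):
    (demo_dir / placeholder).touch()
listed = write_scoremat_list(demo_dir, demo_dir / "scoremats.txt")
```

       The resulting file (here the hypothetical scoremats.txt) is what would be passed to formatrpsdb with -i.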

       Formatrpsdb is designed to perform the work of formatdb, makemat and copymat simultaneously, without generating the large number of
       intermediate files these utilities would need to create an RPS Blast database.  Further, scoremat objects are in more general use than
       the binary format makemat requires.  It is hoped that direct manipulation of scoremat objects will encourage conversion of more diverse
       sequence collections into RPS Blast databases.

       Databases generated by formatrpsdb are binary compatible with databases generated by formatdb/makemat/copymat, although the database
       files will in general not be byte-for-byte identical.

OPTIONS

       A summary of options is included below.

       -      Print usage message

       -E N   The gap extension penalty (if not specified in the scoremat; default = 1)

       -G N   The gap opening penalty (if not specified in the scoremat; default = 11)

       -S X   For scoremats that contain only residue frequencies, the scaling factor to apply when creating PSSMs (default = 100)

       -U str Underlying score matrix (if not specified in the scoremat; default = BLOSUM62)

       -b     Scoremat files are binary (vs. text) ASN.1

       -f X   Threshold for extending hits for RPS database (default = 11)

       -i filename
	      Input file containing list of ASN.1 Scoremat filenames

       -l filename
	      Log file name (default = formatrpsdb.log)

       -n str Base name of output database (same as input file if not specified)

       -o     Create index files for database

       -t str Title for database file

       -v N   Database volume size in millions of letters (default = 0, meaning no limit)
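
       As an illustration of how the options above combine, the following Python sketch assembles a formatrpsdb command line without
       running it.  The function name formatrpsdb_argv, its parameter names, and the file and database names in the demo are hypothetical;
       only the flags themselves come from the option list.

```python
import shlex


def formatrpsdb_argv(list_file, name=None, title=None, binary=False,
                     index=False, volume_millions=None):
    """Build an argument list for formatrpsdb from the documented flags."""
    argv = ["formatrpsdb", "-i", list_file]   # -i is the only required option
    if name is not None:
        argv += ["-n", name]                  # base name of output database
    if title is not None:
        argv += ["-t", title]                 # title for database file
    if binary:
        argv.append("-b")                     # scoremat files are binary ASN.1
    if index:
        argv.append("-o")                     # create index files
    if volume_millions is not None:
        argv += ["-v", str(volume_millions)]  # volume size, millions of letters
    return argv


argv = formatrpsdb_argv("scoremats.txt", name="mydb",
                        title="My RPS database", index=True)
print(shlex.join(argv))
```

       On a system where formatrpsdb is installed, the list could be handed to subprocess.run(argv, check=True); options left unset fall
       back to the defaults listed above.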

AUTHOR

       The National Center for Biotechnology Information.

SEE ALSO

       blast(1), copymat(1), formatdb(1), makemat(1), /usr/share/doc/blast2/formatrpsdb.html

NCBI
								    2004-10-20							    FORMATRPSDB(1)

CLEANASN(1)						     NCBI Tools User's Manual						       CLEANASN(1)

NAME

       cleanasn - clean up irregularities in NCBI ASN.1 objects

SYNOPSIS

       cleanasn  [-]  [-A filename]  [-C str] [-D str] [-F str] [-K str] [-L filename] [-M filename] [-N str] [-P str] [-Q str] [-R] [-S str] [-T]
       [-U str] [-V str] [-X str] [-Z str] [-a str] [-b] [-c] [-d str]	[-f str]  [-i filename]  [-j filename]	[-k filename]  [-m str]  [-n path]
       [-o filename] [-p path] [-q path] [-r path] [-v path] [-x ext]

DESCRIPTION

       cleanasn is a utility program to clean up irregularities in NCBI ASN.1 objects.

OPTIONS

       A summary of options is included below.

       -      Print usage message

       -A filename
	      Accession list file

       -C str Sequence operations, per the flags in str:
	      c      Compress
	      d      Decompress
	      v      Virtual gaps inside segmented sequence
	      s      Convert segmented set to delta sequence

       -D str Clean up descriptors, per the flags in str:
	      t      Remove Title
	      c      Remove Comment
	      n      Remove Nuc-Prot Set title
	      e      Remove Pop/Phy/Mut/Eco Set title
	      m      Remove mRNA title
	      p      Remove Protein title

       -F str Clean up features, per the flags in str:
	      u      Remove User-objects
	      d      Remove db_xrefs
	      e      Remove /evidence and /inference
	      r      Remove redundant gene xrefs
	      f      Fuse duplicate features
	      k      Package coding-region or parts features
	      z      Delete or update EC numbers

       -K str Perform a general cleanup, per the flags in str:
	      b      BasicSeqEntryCleanup
	      p      C++ BasicCleanup (via an external utility)
	      s      SeriousSeqEntryCleanup
	      g      GpipeSeqEntryCleanup
	      n      Normalize descriptor order
	      u      Remove NcbiCleanup User Objects
	      c      Synchronize genetic Codes
	      d      Resynchronize CDS partials
	      m      Resynchronize mRNA partials
	      t      Resynchronize Peptide partials
	      a      Adjust consensus splice
	      i      Promote to "worst" Seq-ID

       -L filename
	      Log file

       -M filename
	      Macro file

       -N str Clean up links, per the flags in str:
	      o      Link CDS mRNA by Overlap
	      p      Link CDS mRNA by Product
	      r      Reassign feature IDs
	      f      Fix missing reciprocal feature IDs
	      c      Clear feature IDs

       -P str Publication options, per the flags in str:
	      a      Remove All publications
	      s      Remove Serial number
	      f      Remove Figure, numbering, and name
	      r      Remove Remark
	      u      Update PMID-only publication
	      #      Replace unpublished with PMID

       -Q str Report:
	      c      Record count
	      r      ASN.1 BSEC report
	      s      ASN.1 SSEC report
	      n      NORM vs. SSEC report
	      e      PopPhyMutEco AutoDef report
	      o      Overlap report
	      l      Latitude-longitude country diff
	      d      Log SSEC differences
	      g      GenBank SSEC diff
	      f      asn2gb/asn2flat diff
	      h      Seg-to-delta GenBank diff
	      v      Validator SSEC diff
	      m      Modernize Gene/RNA/PCR
	      u      Unpublished Pub lookup
	      p      Published Pub lookup
	      j      Unindexed Journal report
	      x      Custom scan

       -R     Remote fetching from ID (NCBI sequence databases)

       -S str Selective difference filter (capital letters skip)
	      s      SSEC
	      b      BSEC
	      A      Author
	      p      Publication
	      l      Location
	      r      RNA
	      q      Qualifier sort order
	      g      Genbank block
	      k      Package CdRegion or parts features
	      m      Move publication
	      o      Leave duplicate Bioseq publication
	      d      Automatic definition line
	      e      Pop/Phy/Mut/Eco Set definition line

       -T     Taxonomy Lookup

       -U str Modernize, per the flags in str:
	      g      Genes
	      r      RNA
	      p      PCR Primers

       -V str Remove features by validator severity:
	      r      Reject
	      e      Error
	      w      Warning
	      i      Info

       -X str Miscellaneous options, per str:
	      d      Automatic definition line
	      e      Pop/Phy/Mut/Eco Set definition line
	      n      Instantiate NC title
	      m      Instantiate NM titles
	      x      Special XM titles
	      p      Instantiate Protein titles
	      c      Create mRNAs for coding sequences
	      f      Fix reciprocal protein_id/transcript_id

       -Z str Remove indicated User-object

       -a str ASN.1 type
	      a      Any (default)
	      e      Seq-entry
	      b      Bioseq
	      s      Bioseq-set
	      m      Seq-submit
	      t      Batch Processing [String]

       -b     Input ASN.1 is Binary

       -c     Input ASN.1 is Compressed

       -d str Source database
	      a      Any (default)
	      g      GenBank
	      e      EMBL
	      d      DDBJ
	      b      EMBL or DDBJ
	      r      RefSeq
	      n      NCBI
	      v      Only segmented sequences
	      w      Exclude segmented sequences
	      x      Exclude EMBL/DDBJ
	      y      Exclude gbcon, gbest, gbgss, gbhtg, gbpat, gbsts

       -f str Substring filter

       -i filename
	      Single input file (defaults to stdin)

       -j filename
	      First filename

       -k filename
	      Last filename

       -m str Flatfile mode:
	      r      Release
	      e      Entrez
	      s      Sequin
	      d      Dump

       -n path
	      asn2flat executable (default is /netopt/ncbi_tools/bin/asn2flat)

       -o filename
	      Single output file (defaults to stdout)

       -p path
	      Process all matching files in path

       -q path
	      ffdiff executable (default is /netopt/genbank/subtool/bin/ffdiff)

       -r path
	      Path for results

       -v path
	      asnval executable (default is /netopt/ncbi_tools/bin/asnval)

       -x ext File selection suffix for use with -p (defaults to .ent)
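
       Several of the options above take a string of single-letter flags.  The Python sketch below shows one way to check such a string
       against the -K and -a tables before building a cleanasn command line.  The function name cleanasn_argv and the file names are
       hypothetical, and it is an assumption that several -K letters may be concatenated in one string; the command is only assembled,
       not executed.

```python
# Flag letters copied from the -K and -a tables in the option list.
K_FLAGS = set("bpsgnucdmtai")
A_TYPES = set("aebsmt")


def cleanasn_argv(infile, outfile, cleanup="b", asn_type="a", binary=False):
    """Build an argument list for cleanasn, rejecting unknown flag letters."""
    unknown = set(cleanup) - K_FLAGS
    if unknown:
        raise ValueError("unknown -K flags: " + "".join(sorted(unknown)))
    if asn_type not in A_TYPES:
        raise ValueError("unknown -a type: " + repr(asn_type))
    argv = ["cleanasn", "-i", infile, "-o", outfile,
            "-a", asn_type, "-K", cleanup]
    if binary:
        argv.append("-b")  # input ASN.1 is binary
    return argv


# Basic cleanup plus descriptor-order normalization on a Seq-entry file.
demo = cleanasn_argv("in.asn", "out.asn", cleanup="bn", asn_type="e")
```

       Validating the letters up front gives a clear error for a mistyped flag instead of relying on how the tool itself reacts.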

AUTHOR

       The National Center for Biotechnology Information.

SEE ALSO

       asndisc(1), asnval(1), sequin(1).

NCBI
								    2012-06-24							       CLEANASN(1)
