seqstat(1) [debian man page]

seqstat(1)							  Biosquid Manual							seqstat(1)

NAME
seqstat - show statistics and format for a sequence file

SYNOPSIS
seqstat [options] seqfile

DESCRIPTION
seqstat reads a sequence file seqfile and shows a number of simple statistics about it. The printed statistics include the name of the format, the residue type of the first sequence (protein, RNA, or DNA), the number of sequences, the total number of residues, and the average and range of the sequence lengths.

OPTIONS
-a  Show additional verbose information: a table with one line per sequence showing name, length, and description line. These lines are prefixed with a * character to enable easily grep'ing them out and sorting them.

-h  Print brief help; includes version number and summary of all options, including expert options.

-B  (Babelfish). Autodetect and read a sequence file format other than the default (FASTA). Almost any common sequence file format is recognized (including Genbank, EMBL, SWISS-PROT, PIR, and GCG unaligned sequence formats, and Stockholm, GCG MSF, and Clustal alignment formats). See the printed documentation for a complete list of supported formats.

EXPERT OPTIONS
--informat <s>  Specify that the sequence file is in format <s>, rather than the default FASTA format. Common examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal, MSF, or PHYLIP; see the printed documentation for a complete list of accepted format names. This option overrides the default expected format (FASTA) and the -B Babelfish autodetection option.

--quiet  Suppress the verbose header (program name, release number and date, the parameters and options in effect).

SEE ALSO
afetch(1), alistat(1), compalign(1), compstruct(1), revcomp(1), seqsplit(1), sfetch(1), shuffle(1), sindex(1), sreformat(1), stranslate(1), weight(1).

AUTHOR
Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
See COPYING in the source code distribution for more details, or contact me.

Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
FAX  : 1-314-362-2157
Email: eddy@genetics.wustl.edu

Biosquid 1.9g                          January 2003                          seqstat(1)
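
The summary statistics named in the seqstat DESCRIPTION (number of sequences, total number of residues, average and range of sequence lengths) can be illustrated with a short Python sketch. This is not seqstat's implementation: it assumes plain FASTA input only, and the function names fasta_lengths and summarize, as well as the EXAMPLE data, are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Union


def fasta_lengths(text: str) -> List[int]:
    """Return the residue count of each record in FASTA-formatted text."""
    lengths: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            lengths.append(0)
        elif lengths:
            # A residue line extends the current record; whitespace
            # inside the line is not counted as residues.
            lengths[-1] += len("".join(line.split()))
    return lengths


def summarize(lengths: List[int]) -> Dict[str, Union[int, float]]:
    """Compute the simple statistics described above from record lengths."""
    if not lengths:
        raise ValueError("no sequences found")
    total = sum(lengths)
    return {
        "sequences": len(lengths),
        "residues": total,
        "shortest": min(lengths),
        "longest": max(lengths),
        "average": total / len(lengths),
    }


# Made-up example data, not taken from any real sequence file.
EXAMPLE = ">seq1 first example\nACGTACGT\nACGT\n>seq2 second example\nACG\n"

if __name__ == "__main__":
    for key, value in summarize(fasta_lengths(EXAMPLE)).items():
        print(f"{key}: {value}")
```

Unlike seqstat, the sketch does not detect the file format or the residue type; it only shows how the length statistics relate to the records in the file.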

sindex(1)							  Biosquid Manual							 sindex(1)

NAME
sindex - index a sequence database for sfetch

SYNOPSIS
sindex [options] seqfile1 [seqfile2...]

DESCRIPTION
sindex indexes one or more seqfiles for future sequence retrievals by sfetch. An SSI ("squid sequence index") file is created in the same directory as the sequence files. By default, this file is called <seqfile>.ssi. If there is more than one sequence file on the command line, the SSI filename will be constructed from the last sequence file name. This may not be what you want; see the -o option to specify your own name for the SSI file.

sindex is capable of indexing large files (>2 GB) if optional LFS support has been enabled at compile-time. See the INSTALL instructions that came with Biosquid.

OPTIONS
-h  Print brief help; includes version number and summary of all options, including expert options.

-o <ssi outfile>  Direct the SSI index to a file named <outfile>. By default, the SSI file would go to <seqfile>.ssi.

EXPERT OPTIONS
--64  Force the SSI file into 64-bit (large seqfile) mode, even if the seqfile is small. You don't want to do this unless you're debugging.

--external  Force sindex to do its record sorting by external (on-disk) sorting. This is only useful for debugging, too.

--informat <s>  Specify that the sequence file is definitely in format <s>; blocks sequence file format autodetection. This is useful in automated pipelines, because it improves robustness (autodetection can occasionally go wrong on a perversely misformed file). Common examples include genbank, embl, gcg, pir, stockholm, clustal, msf, or phylip; see the printed documentation for a complete list of accepted format names.

--pfamseq  A hack for Pfam; indexes a FASTA file that is known to have identifier lines in format ">[name] [accession] [optional description]". Normally only the sequence name would be indexed as a primary key in a FASTA SSI file, but this allows indexing both the name (as a primary key) and accession (as a secondary key).

SEE ALSO
afetch(1), alistat(1), compalign(1), compstruct(1), revcomp(1), seqsplit(1), seqstat(1), sfetch(1), shuffle(1), sreformat(1), stranslate(1), weight(1).

AUTHOR
Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
See COPYING in the source code distribution for more details, or contact me.

Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
FAX  : 1-314-362-2157
Email: eddy@genetics.wustl.edu

Biosquid 1.9g                          January 2003                          sindex(1)
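
The SSI file naming rule from the sindex DESCRIPTION (default <seqfile>.ssi, last sequence file wins when several are given, -o overrides) can be sketched in Python as follows. The function name ssi_filename is hypothetical, and the sketch assumes that ".ssi" is appended to the full sequence file path, including any existing extension; the man page does not spell out that detail.

```python
from typing import Optional, Sequence


def ssi_filename(seqfiles: Sequence[str], outfile: Optional[str] = None) -> str:
    """Return the index file name implied by the rule described above.

    Assumption: ".ssi" is appended to the complete path of the sequence
    file, so the index lands in the same directory as that file.
    """
    if outfile is not None:
        # Corresponds to the -o option: an explicit name always wins.
        return outfile
    if not seqfiles:
        raise ValueError("at least one sequence file is required")
    # With several sequence files, the name comes from the last one.
    return seqfiles[-1] + ".ssi"


if __name__ == "__main__":
    print(ssi_filename(["db/part1.fa", "db/part2.fa"]))
    print(ssi_filename(["db/part1.fa", "db/part2.fa"], outfile="db/all.ssi"))
```

The second call shows why -o is worth using with multiple sequence files: without it, the index name reflects only the last file on the command line.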
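
The identifier line layout that --pfamseq expects, ">[name] [accession] [optional description]", can be illustrated by splitting such a line into a primary key (name) and a secondary key (accession). This is a sketch under assumptions, not sindex's parser: it assumes the fields are separated by whitespace, and the names PfamseqHeader and parse_pfamseq_header and the example identifiers are hypothetical.

```python
from typing import NamedTuple


class PfamseqHeader(NamedTuple):
    name: str         # would serve as the primary key
    accession: str    # would serve as the secondary key
    description: str  # empty string when the line has no description


def parse_pfamseq_header(line: str) -> PfamseqHeader:
    """Split a FASTA identifier line of the form '>name accession [description]'."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA identifier line")
    # Split on whitespace at most twice so the description keeps its spaces.
    fields = line[1:].strip().split(None, 2)
    if len(fields) < 2:
        raise ValueError("expected at least a name and an accession")
    description = fields[2] if len(fields) == 3 else ""
    return PfamseqHeader(fields[0], fields[1], description)


if __name__ == "__main__":
    # Made-up identifiers, used only to show the three fields.
    print(parse_pfamseq_header(">seq1 ACC0001 hypothetical example protein"))
    print(parse_pfamseq_header(">seq2 ACC0002"))
```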