
formatrpsdb(1) [debian man page]

FORMATRPSDB(1)						     NCBI Tools User's Manual						    FORMATRPSDB(1)

NAME
formatrpsdb - Build databases for RPS Blast

SYNOPSIS
formatrpsdb [-] [-E N] [-G N] [-S X] [-U str] [-b] [-f X] -i filename [-l filename] [-n str] [-o] [-t str] [-v N]

DESCRIPTION
Formatrpsdb is a utility that converts a collection of input sequences into a database suitable for use with Reverse Position Specific (RPS) Blast. Each input sequence, together with its position-specific scoring matrix (PSSM), is ASN.1-encoded into a PssmWithParameters (or 'scoremat') object and stored in a separate file. Scoremat objects can be created using blastpgp. Formatrpsdb takes a list of these files and produces the corresponding database.

Formatrpsdb is designed to do the work of formatdb, makemat and copymat in a single step, without generating the large number of intermediate files those utilities would need to create an RPS Blast database. Further, scoremat objects are in more general use than the binary format that makemat requires. It is hoped that direct manipulation of scoremat objects will encourage conversion of more diverse sequence collections into RPS Blast databases. Databases generated by formatrpsdb are binary compatible with databases generated by formatdb/makemat/copymat, although the database files will in general not be byte-for-byte identical.

OPTIONS
A summary of options is included below.
-            Print usage message
-E N         The gap extension penalty (if not specified in the scoremat; default = 1)
-G N         The gap opening penalty (if not specified in the scoremat; default = 11)
-S X         For scoremats that contain only residue frequencies, the scaling factor to apply when creating PSSMs (default = 100)
-U str       Underlying score matrix (if not specified in the scoremat; default = BLOSUM62)
-b           Scoremat files are binary (vs. text) ASN.1
-f X         Threshold for extending hits for RPS database (default = 11)
-i filename  Input file containing a list of ASN.1 scoremat filenames
-l filename  Log file name (default = formatrpsdb.log)
-n str       Base name of output database (same as input file if not specified)
-o           Create index files for database
-t str       Title for database file
-v N         Database volume size in millions of letters (default = 0, meaning no limit)

AUTHOR
The National Center for Biotechnology Information.

SEE ALSO
blast(1), copymat(1), formatdb(1), makemat(1), /usr/share/doc/blast2/formatrpsdb.html

NCBI                                   2004-10-20                                   FORMATRPSDB(1)
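The -i option expects a file that lists the scoremat files to combine. The following Python script is a minimal sketch of one way to prepare that list and build a command line from the options documented above. Several details are assumptions that this page does not confirm:

* the list file holds one filename per line;
* the scoremat files carry a .smp suffix;
* formatrpsdb is on the PATH.

The helper names write_scoremat_list and formatrpsdb_command are hypothetical.

```python
"""Sketch: prepare the -i list for formatrpsdb and build its argv.

Assumptions (not stated in the man page): one scoremat filename per
line in the list file, and a hypothetical ".smp" suffix for scoremats.
"""
from pathlib import Path
from typing import List, Optional


def write_scoremat_list(scoremat_dir: Path, list_file: Path,
                        suffix: str = ".smp") -> List[Path]:
    """Collect scoremat files and write their paths, one per line (assumed format)."""
    # sorted() consumes the directory listing before the list file is written
    scoremats = sorted(p for p in scoremat_dir.iterdir() if p.suffix == suffix)
    list_file.write_text("".join(f"{p}\n" for p in scoremats))
    return scoremats


def formatrpsdb_command(list_file: Path, db_name: Optional[str] = None,
                        title: Optional[str] = None, binary: bool = False,
                        index: bool = True) -> List[str]:
    """Build an argument list using only options listed in OPTIONS."""
    cmd = ["formatrpsdb", "-i", str(list_file)]  # list of scoremat filenames
    if db_name is not None:
        cmd += ["-n", db_name]  # base name of output database
    if title is not None:
        cmd += ["-t", title]    # title for database file
    if binary:
        cmd.append("-b")        # scoremat files are binary ASN.1
    if index:
        cmd.append("-o")        # create index files for database
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). An argument list is used instead of a shell string so that a -t title containing spaces needs no quoting.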


CLEANASN(1)						     NCBI Tools User's Manual						       CLEANASN(1)

NAME
cleanasn - clean up irregularities in NCBI ASN.1 objects

SYNOPSIS
cleanasn [-] [-A filename] [-C str] [-D str] [-F str] [-K str] [-L filename] [-M filename] [-N str] [-P str] [-Q str] [-R] [-S str] [-T] [-U str] [-V str] [-X str] [-Z str] [-a str] [-b] [-c] [-d str] [-f str] [-i filename] [-j filename] [-k filename] [-m str] [-n path] [-o filename] [-p path] [-q path] [-r path] [-v path] [-x ext]

DESCRIPTION
cleanasn is a utility program to clean up irregularities in NCBI ASN.1 objects.

OPTIONS
A summary of options is included below.
-            Print usage message
-A filename  Accession list file
-C str       Sequence operations, per the flags in str:
               c  Compress
               d  Decompress
               v  Virtual gaps inside segmented sequence
               s  Convert segmented set to delta sequence
-D str       Clean up descriptors, per the flags in str:
               t  Remove Title
               c  Remove Comment
               n  Remove Nuc-Prot Set title
               e  Remove Pop/Phy/Mut/Eco Set title
               m  Remove mRNA title
               p  Remove Protein title
-F str       Clean up features, per the flags in str:
               u  Remove User-objects
               d  Remove db_xrefs
               e  Remove /evidence and /inference
               r  Remove redundant gene xrefs
               f  Fuse duplicate features
               k  Package coding-region or parts features
               z  Delete or update EC numbers
-K str       Perform a general cleanup, per the flags in str:
               b  BasicSeqEntryCleanup
               p  C++ BasicCleanup (via an external utility)
               s  SeriousSeqEntryCleanup
               g  GpipeSeqEntryCleanup
               n  Normalize descriptor order
               u  Remove NcbiCleanup User Objects
               c  Synchronize genetic codes
               d  Resynchronize CDS partials
               m  Resynchronize mRNA partials
               t  Resynchronize Peptide partials
               a  Adjust consensus splice
               i  Promote to "worst" Seq-ID
-L filename  Log file
-M filename  Macro file
-N str       Clean up links, per the flags in str:
               o  Link CDS mRNA by Overlap
               p  Link CDS mRNA by Product
               r  Reassign feature IDs
               f  Fix missing reciprocal feature IDs
               c  Clear feature IDs
-P str       Publication options, per the flags in str:
               a  Remove All publications
               s  Remove Serial number
               f  Remove Figure, numbering, and name
               r  Remove Remark
               u  Update PMID-only publication
               #  Replace unpublished with PMID
-Q str       Report, per the flags in str:
               c  Record count
               r  ASN.1 BSEC report
               s  ASN.1 SSEC report
               n  NORM vs. SSEC report
               e  PopPhyMutEco AutoDef report
               o  Overlap report
               l  Latitude-longitude country diff
               d  Log SSEC differences
               g  GenBank SSEC diff
               f  asn2gb/asn2flat diff
               h  Seg-to-delta GenBank diff
               v  Validator SSEC diff
               m  Modernize Gene/RNA/PCR
               u  Unpublished Pub lookup
               p  Published Pub lookup
               j  Unindexed Journal report
               x  Custom scan
-R           Remote fetching from ID (NCBI sequence databases)
-S str       Selective difference filter (capital letters skip):
               s  SSEC
               b  BSEC
               A  Author
               p  Publication
               l  Location
               r  RNA
               q  Qualifier sort order
               g  Genbank block
               k  Package CdRegion or parts features
               m  Move publication
               o  Leave duplicate Bioseq publication
               d  Automatic definition line
               e  Pop/Phy/Mut/Eco Set definition line
-T           Taxonomy Lookup
-U str       Modernize, per the flags in str:
               g  Genes
               r  RNA
               p  PCR Primers
-V str       Remove features by validator severity:
               r  Reject
               e  Error
               w  Warning
               i  Info
-X str       Miscellaneous options, per the flags in str:
               d  Automatic definition line
               e  Pop/Phy/Mut/Eco Set definition line
               n  Instantiate NC title
               m  Instantiate NM titles
               x  Special XM titles
               p  Instantiate Protein titles
               c  Create mRNAs for coding sequences
               f  Fix reciprocal protein_id/transcript_id
-Z str       Remove indicated User-object
-a str       ASN.1 type:
               a  Any (default)
               e  Seq-entry
               b  Bioseq
               s  Bioseq-set
               m  Seq-submit
               t  Batch Processing
-b           Input ASN.1 is Binary
-c           Input ASN.1 is Compressed
-d str       Source database:
               a  Any (default)
               g  GenBank
               e  EMBL
               d  DDBJ
               b  EMBL or DDBJ
               r  RefSeq
               n  NCBI
               v  Only segmented sequences
               w  Exclude segmented sequences
               x  Exclude EMBL/DDBJ
               y  Exclude gbcon, gbest, gbgss, gbhtg, gbpat, gbsts
-f str       Substring filter
-i filename  Single input file (defaults to stdin)
-j filename  First filename
-k filename  Last filename
-m str       Flatfile mode:
               r  Release
               e  Entrez
               s  Sequin
               d  Dump
-n path      asn2flat executable (default is /netopt/ncbi_tools/bin/asn2flat)
-o filename  Single output file (defaults to stdout)
-p path      Process all matching files in path
-q path      ffdiff executable (default is /netopt/genbank/subtool/bin/ffdiff)
-r path      Path for results
-v path      asnval executable (default is /netopt/ncbi_tools/bin/asnval)
-x ext       File selection suffix for use with -p (defaults to .ent)

AUTHOR
The National Center for Biotechnology Information.

SEE ALSO
asndisc(1), asnval(1), sequin(1).

NCBI                                   2012-06-24                                   CLEANASN(1)
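Several cleanasn options (for example -C, -D, -F, -K, -U and -V) take a string in which each letter selects one action. The following Python script is a minimal sketch that checks such strings against the letters listed in OPTIONS before it builds a command line. An unlisted letter is therefore caught before cleanasn runs. It covers only a subset of the options and does not execute cleanasn. The names ALLOWED_FLAGS and cleanasn_command are hypothetical.

```python
"""Sketch: build a cleanasn argv after validating flag-string letters.

The allowed letters are copied from the OPTIONS section; the helper
and the subset of options it covers are hypothetical.
"""
from typing import Dict, List, Optional

# Documented letters for some of the flag-string options.
ALLOWED_FLAGS: Dict[str, str] = {
    "-C": "cdvs",
    "-D": "tcnemp",
    "-F": "uderfkz",
    "-K": "bpsgnucdmtai",
    "-U": "grp",
    "-V": "rewi",
}


def cleanasn_command(flags: Dict[str, str],
                     input_file: Optional[str] = None,
                     output_file: Optional[str] = None,
                     binary_input: bool = False) -> List[str]:
    """Return an argument list; raise ValueError on an unlisted flag letter."""
    cmd = ["cleanasn"]
    if input_file is not None:
        cmd += ["-i", input_file]    # without -i, cleanasn reads stdin
    if output_file is not None:
        cmd += ["-o", output_file]   # without -o, cleanasn writes stdout
    if binary_input:
        cmd.append("-b")             # input ASN.1 is binary
    for option, letters in flags.items():
        allowed = ALLOWED_FLAGS.get(option)
        if allowed is None:
            raise ValueError(f"{option} is not a flag-string option handled here")
        bad = sorted(set(letters) - set(allowed))
        if bad:
            raise ValueError(f"{option}: unlisted flag letter(s) {''.join(bad)}")
        cmd += [option, letters]
    return cmd
```

For example, cleanasn_command({"-K": "bsu"}, input_file="in.asn", output_file="out.asn") yields arguments requesting BasicSeqEntryCleanup, SeriousSeqEntryCleanup and removal of NcbiCleanup User Objects, as those letters are described above.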