slmbuild(1) [debian man page]

SLMBUILD(1)						User Contributed Perl Documentation					       SLMBUILD(1)

NAME
slmbuild - generate language model from idngram file

SYNOPSIS
slmbuild [option]... idngram_file...

DESCRIPTION
slmbuild generates a back-off smoothed language model from a given idngram file. The idngram_file is generally created by ids2ngram.

OPTIONS
All the following options are mandatory.

-n, --NMax N
    1 for unigram, 2 for bigram, 3 for trigram. Any number outside the range 1..3 is invalid.

-o, --out output-file
    Specify the output file name.

-l, --log
    Use -log(pr); by default, pr is used directly.

-w, --wordcount N
    Lexicon size, i.e. the number of distinct words.

-b, --brk id...
    Set the ids that should be treated as breakers.

-e, --e id...
    Set the ids that should not be put into the LM.

-c, --cut c...
    k-grams whose freq <= c[k] are dropped.

-d, --discount method,param...
    The k-th -d parameter specifies the discount method for k-grams. Possible values for method/param are:

    GT,R,dis
        GT (Good-Turing) discount for r <= R, where r is the frequency of an n-gram. Linear discount for r > R, i.e. r'=r*dis, with 0 << dis < 1.0, for example 0.999.

    ABS,[dis]
        Absolute discount r'=r-dis. dis is optional; 0 << dis < cut[k]+1.0, normally dis < 1.0.

    LIN,[dis]
        Linear discount r'=r*dis. dis is optional; 0 < dis < 1.0.

NOTE
-n must be given before -c and -b. -c must supply the right number of cut-offs, and -d must appear exactly N times, specifying the discounts for 1-grams, 2-grams, ..., respectively.

BREAKER-IDs could be SentenceTokens or ParagraphTokens. Conceptually, these ids have no meaning when they appear in the middle of an n-gram.

EXCLUDE-IDs could be ambiguous ids. Conceptually, n-grams that contain those ids are meaningless.

N-grams cannot be erased directly from the IDNGRAM file according to BREAKER-IDs and EXCLUDE-IDs, because some low-level information in it is still useful.

EXAMPLE
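The ordering and count rules from the NOTE section can be checked before slmbuild is run. The following Python sketch is an illustration only and is not part of slmbuild: the helper name build_slmbuild_argv is hypothetical, and it assumes that -c takes exactly N comma-separated cut-offs and that -b and -e take comma-separated ids. It only assembles an argument list and does not execute anything.

```python
import shlex


def build_slmbuild_argv(n, out, wordcount, cuts, discounts, idngram,
                        breakers=(), excludes=(), use_log=False):
    """Assemble an slmbuild argument list after checking the NOTE rules.

    Hypothetical helper: it only builds the list and never runs slmbuild.
    """
    if n not in (1, 2, 3):
        raise ValueError("-n must be 1, 2 or 3")
    # Assumption: one cut-off per k-gram level, i.e. exactly N values.
    if len(cuts) != n:
        raise ValueError(f"-c needs {n} cut-offs, got {len(cuts)}")
    # -d must appear exactly N times, one per k-gram level.
    if len(discounts) != n:
        raise ValueError(f"-d must appear {n} times, got {len(discounts)}")

    argv = ["slmbuild"]
    if use_log:
        argv.append("-l")
    # -n is emitted first so that it precedes -c and -b.
    argv += ["-n", str(n), "-o", out, "-w", str(wordcount)]
    argv += ["-c", ",".join(str(c) for c in cuts)]
    for method in discounts:
        argv += ["-d", method]
    if breakers:
        argv += ["-b", ",".join(str(i) for i in breakers)]
    if excludes:
        argv += ["-e", ",".join(str(i) for i in excludes)]
    argv.append(idngram)
    return argv


if __name__ == "__main__":
    argv = build_slmbuild_argv(
        n=3, out="all.slm", wordcount=200000,
        cuts=[0, 3, 2],
        discounts=["GT,8,0.9995", "ABS", "ABS"],
        idngram="all.id3gram",
        breakers=[10, 11, 12], excludes=[9], use_log=True,
    )
    print(shlex.join(argv))
```

Returning a list rather than a single string keeps each argument intact if the result is later handed to a process launcher; shlex.join is used only for display.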
The following example reads 'all.id3gram' and writes the trigram model 'all.slm'. At the 1-gram level, it uses Good-Turing discount with cut-off 0, R=8, dis=0.9995. At the 2-gram level, it uses Absolute discount with cut-off 3, dis auto-calculated. At the 3-gram level, it uses Absolute discount with cut-off 2, dis auto-calculated. Word ids 10, 11 and 12 are breakers (sentence/para/paper breaker, etc). The exclude-ID is 9. The lexicon contains 200000 words. The resulting language model uses -log(pr).

    slmbuild -l -n 3 -o all.slm -w 200000 -c 0,3,2 -d GT,8,0.9995 -d ABS -d ABS -b 10,11,12 -e 9 all.id3gram

AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.

SEE ALSO
ids2ngram(1), slmprune(1).

perl v5.14.2 2012-06-09 SLMBUILD(1)

dis(1)								   User Commands							    dis(1)

NAME
dis - object code disassembler

SYNOPSIS
dis [-onqCLV] [-d sec] [-D sec] [-F function] [-l string] [-t sec] file...

DESCRIPTION
The dis command produces an assembly language listing of file, which can be an object file or an archive of object files. The listing includes assembly statements and an octal or hexadecimal representation of the binary that produced those statements.

OPTIONS
Options are interpreted by the disassembler and can be specified in any order.

The following options are supported:

-C
    Displays demangled C++ symbol names in the disassembly.

-d sec
    Disassembles the named section as data, printing the offset of the data from the beginning of the section.

-D sec
    Disassembles the named section as data, printing the actual address of the data.

-F function
    Disassembles only the named function in each object file specified on the command line. The -F option can be specified multiple times on the command line.

-l string
    Disassembles the archive file specified by string. For example, one would issue the command dis -l x -l z to disassemble libx.a and libz.a, which are assumed to be in LIBDIR. This option is obsolete and might be removed in a future release of Solaris.

-L
    Invokes a lookup of C-language source labels in the symbol table for subsequent writing to standard output. This option is obsolete and might be removed in a future release of Solaris.

-n
    Displays all addresses numerically. Addresses are displayed using symbolic names by default.

-o
    Prints numbers in octal. The default is hexadecimal.

-q
    Quiet mode. Does not print any headers or function entry labels.

-t sec
    Disassembles the named section as text.

-V
    Prints, on standard error, the version number of the disassembler being executed. This option is obsolete and might be removed in a future release of Solaris.

If the -d, -D, or -t options are specified, only those named sections from each user-supplied file are disassembled. Otherwise, all sections containing text are disassembled.

On output, a number enclosed in brackets at the beginning of a line, such as [5], indicates that the break-pointable line number starts with the following instruction. These line numbers are printed only if the file was compiled with additional debugging information, for example, the -g option of cc(1B).

An expression such as <40> in the operand field or in the symbolic disassembly, following a relative displacement for control transfer instructions, is the computed address within the section to which control is transferred.

A function name appears in the first column, followed by () if the object file contains a symbol table.

OPERANDS
The following operand is supported:

file
    A path name of an object file or an archive (see ar(1)) of object files.

ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of dis: LC_CTYPE, LC_MESSAGES, and NLSPATH.

LIBDIR
    If this environment variable contains a value, use this as the path to search for the library. If the variable contains a null value, or is not set, it defaults to searching for the library under /usr/lib.

EXIT STATUS
The following exit values are returned:

0
    Successful completion.

>0
    An error occurred.

FILES
/usr/lib
    default LIBDIR

ATTRIBUTES
See attributes(5) for descriptions of the following attributes:

+-----------------------------+-----------------------------+
|       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
+-----------------------------+-----------------------------+
|Availability                 |SUNWbtool                    |
+-----------------------------+-----------------------------+
|Interface Stability          |See below.                   |
+-----------------------------+-----------------------------+

The human readable output is Unstable. The command line options are Evolving.

SEE ALSO
ar(1), as(1), cc(1B), ld(1), a.out(4), attributes(5), environ(5)

DIAGNOSTICS
The self-explanatory diagnostics indicate errors in the command line or problems encountered with the specified files.

SunOS 5.11 28 Jun 2007 dis(1)