tslminfo(1) [debian man page]

TSLMINFO(1)						User Contributed Perl Documentation					       TSLMINFO(1)

NAME
tslminfo - get information about a threaded back-off language model

SYNOPSIS
tslminfo [option]... threaded_slm_file

DESCRIPTION
tslminfo prints information about the threaded back-off language model 'threaded_slm_file'. It can also print the model in ARPA format. When no option is given, tslminfo prints only the number of items at each level of the language model.

OPTIONS
-v, --verbose
       Turn on verbose mode, printing the model in ARPA format.

-p, --pr
       Print plain probabilities instead of -log(Pr), which is the default. Valid only with the -v option.

-l, --lexicon dict_file
       Specify the lexicon. Valid only with the -v option. Substitutes the word text for the word id in the output.

AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.

SEE ALSO
slminfo(1).

perl v5.14.2                          2012-06-09                          TSLMINFO(1)
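
The option rules on this page can be sketched as a small Python helper that assembles a tslminfo command line. The helper name tslminfo_argv and the file names lm.t3g and dict.txt are hypothetical; only the flags come from this page. Rejecting -p or -l without -v is a choice made by this sketch, since the page only states that those options are valid under -v and does not say how the tool reacts otherwise.

```python
def tslminfo_argv(model_file, verbose=False, pr=False, lexicon=None):
    """Build an argument list for tslminfo from the documented options."""
    # -p and -l are documented as valid only under -v; this sketch treats
    # any other combination as an error (an assumption, not tool behaviour).
    if (pr or lexicon is not None) and not verbose:
        raise ValueError("-p and -l are only valid together with -v")

    argv = ["tslminfo"]
    if verbose:
        argv.append("-v")
    if pr:
        argv.append("-p")
    if lexicon is not None:
        argv += ["-l", lexicon]
    argv.append(model_file)
    return argv


# Summary only: number of items at each level of the model.
print(" ".join(tslminfo_argv("lm.t3g")))

# ARPA dump with plain probabilities and word text instead of word ids.
print(" ".join(tslminfo_argv("lm.t3g", verbose=True, pr=True, lexicon="dict.txt")))
```

The returned list can be passed to subprocess.run on a system where tslminfo is installed.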

SLMBUILD(1)						User Contributed Perl Documentation					       SLMBUILD(1)

NAME
slmbuild - generate a language model from an idngram file

SYNOPSIS
slmbuild [option]... idngram_file...

DESCRIPTION
slmbuild generates a back-off smoothing language model from a given idngram file. Generally, the idngram_file is created by ids2ngram.

OPTIONS
All the following options are mandatory.

-n, --NMax N
       1 for unigram, 2 for bigram, 3 for trigram. Any number outside the range 1..3 is invalid.

-o, --out output-file
       Specify the output file name.

-l, --log
       Use -log(pr); pr is used directly by default.

-w, --wordcount N
       Lexicon size, i.e. the number of different words.

-b, --brk id...
       Set the ids that should be treated as breakers.

-e, --e id...
       Set the ids that should not be put into the LM.

-c, --cut c...
       k-grams whose freq <= c[k] are dropped.

-d, --discount method,param...
       The k-th -d option specifies the discount method for k-grams. Possible values for method/param are:

       GT,R,dis
              GT discount for r <= R, where r is the frequency of an n-gram. Linear discount for r > R, i.e. r'=r*dis, with 0 << dis < 1.0, for example 0.999.

       ABS,[dis]
              Absolute discount, r'=r-dis. dis is optional, with 0 << dis < cut[k]+1.0; normally dis < 1.0.

       LIN,[dis]
              Linear discount, r'=r*dis. dis is optional, with 0 < dis < 1.0.

NOTE
-n must be given before -c and -b. -c must give the right number of cut-offs, and -d must appear exactly N times, specifying the discounts for 1-grams, 2-grams, ..., respectively.

BREAKER-IDs could be SentenceTokens or ParagraphTokens. Conceptually, these ids have no meaning when they appear in the middle of an n-gram.

EXCLUDE-IDs could be ambiguous ids. Conceptually, n-grams that contain those ids are meaningless.

We cannot erase n-grams according to BREAKER-IDs and EXCLUDE-IDs directly from the IDNGRAM file, because some low-level information in it is still useful.

EXAMPLE
The following example reads 'all.id3gram' and writes the trigram model 'all.slm'. At the 1-gram level, it uses Good-Turing discount with cut-off 0, R=8, dis=0.9995. At the 2-gram level, it uses absolute discount with cut-off 3, dis auto-calculated. At the 3-gram level, it uses absolute discount with cut-off 2, dis auto-calculated. Word ids 10, 11 and 12 are breakers (sentence/paragraph/paper breakers, etc.). The exclude id is 9. The lexicon contains 200000 words. The resulting language model uses -log(pr).

       slmbuild -l -n 3 -o all.slm -w 200000 -c 0,3,2 -d GT,8,0.9995 -d ABS -d ABS -b 10,11,12 -e 9 all.id3gram

AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.

SEE ALSO
ids2ngram(1), slmprune(1).

perl v5.14.2                          2012-06-09                          SLMBUILD(1)
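
The cut-off and discount arithmetic described under OPTIONS can be illustrated with a few lines of Python. This is only a sketch of the formulas stated on this page (drop k-grams with freq <= c[k], r'=r-dis, r'=r*dis), not slmbuild's implementation. The function names and the bigram counts are made up for illustration, and the Good-Turing branch for r <= R is left out because the page does not give its formula.

```python
def apply_cutoff(counts, cut):
    """Drop n-grams whose frequency is <= cut, as described for -c."""
    return {ngram: r for ngram, r in counts.items() if r > cut}


def absolute_discount(r, dis):
    """ABS: r' = r - dis."""
    return r - dis


def linear_discount(r, dis):
    """LIN: r' = r * dis, with 0 < dis < 1.0."""
    return r * dis


# Hypothetical bigram counts keyed by word-id pairs (illustrative data only).
bigram_counts = {(1, 2): 5, (1, 3): 3, (2, 3): 1, (3, 4): 4}

# With cut-off 3, only bigrams seen more than 3 times survive.
kept = apply_cutoff(bigram_counts, cut=3)

# dis=0.5 is an arbitrary value chosen so the arithmetic is exact.
abs_discounted = {ng: absolute_discount(r, 0.5) for ng, r in kept.items()}
lin_discounted = {ng: linear_discount(r, 0.5) for ng, r in kept.items()}

print(kept)            # {(1, 2): 5, (3, 4): 4}
print(abs_discounted)  # {(1, 2): 4.5, (3, 4): 3.5}
print(lin_discounted)  # {(1, 2): 2.5, (3, 4): 2.0}
```

The count mass removed by discounting is what a back-off model redistributes to lower-order estimates; how slmbuild does that redistribution is not covered on this page.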
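
The ordering and counting rules from the slmbuild NOTE section can be checked before the tool is run. The Python sketch below assembles a command line and rejects inputs that break those rules. The helper name slmbuild_argv and its parameters are hypothetical, and reading "the right number of cut-offs" as one cut-off per level is an assumption based on the EXAMPLE, where N=3 comes with three values.

```python
def slmbuild_argv(n, out, wordcount, cuts, discounts, idngram_file,
                  use_log=False, breakers=(), excludes=()):
    """Assemble a slmbuild command line, checking the rules from NOTE."""
    if n not in (1, 2, 3):
        raise ValueError("-n must be 1, 2 or 3")
    if len(cuts) != n:
        # Assumption: one cut-off per level, as in the EXAMPLE section.
        raise ValueError("-c needs one cut-off per level")
    if len(discounts) != n:
        raise ValueError("-d must appear exactly N times")

    argv = ["slmbuild"]
    if use_log:
        argv.append("-l")
    # -n is emitted before -c and -b, as NOTE requires.
    argv += ["-n", str(n), "-o", out, "-w", str(wordcount)]
    argv += ["-c", ",".join(str(c) for c in cuts)]
    for spec in discounts:
        argv += ["-d", spec]
    if breakers:
        argv += ["-b", ",".join(str(i) for i in breakers)]
    if excludes:
        argv += ["-e", ",".join(str(i) for i in excludes)]
    argv.append(idngram_file)
    return argv


# Rebuild the command shown in the EXAMPLE section.
command = slmbuild_argv(
    n=3, out="all.slm", wordcount=200000,
    cuts=[0, 3, 2],
    discounts=["GT,8,0.9995", "ABS", "ABS"],
    idngram_file="all.id3gram",
    use_log=True, breakers=[10, 11, 12], excludes=[9],
)
print(" ".join(command))
```

Keeping the per-level settings in lists makes it harder to pass, say, two -d options for a trigram model, which NOTE forbids.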