apertium-lextor(1)

NAME
apertium-lextor - lexical selector module of apertium
This tool is part of the apertium machine translation architecture: http://apertium.org.
SYNOPSIS
apertium-lextor --trainwrd stopwords words n left right corpus model [ --weightexp w ] [ --debug ]
apertium-lextor --trainlch stopwords lexchoices n left right corpus wordmodel dic bildic model [ --weightexp w ] [ --debug ]
apertium-lextor --lextor model dic left right [ --debug ] [ --weightexp w ]
DESCRIPTION
apertium-lextor is the application responsible for both training and applying the lexical selector module.
OPTIONS
--trainwrd | -t
Train word co-occurrence models. It needs the following required parameters:
stopwords file containing a list of stop words. Stop words are ignored.
words file containing a list of words. For each word a co-occurrence model is built.
n number of words per co-occurrence model (for each model, the n most frequent words).
left left-side context to take into account (number of words).
right right-side context to take into account (number of words).
corpus file containing the training corpus.
model output file on which the co-occurrence models are saved.
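The training step above can be sketched in a few lines of Python. This is an illustrative sketch of the technique (count context words in a window, skip stop words, keep the n most frequent), not apertium-lextor's actual implementation; corpus handling and file formats are deliberately simplified, and all names are hypothetical.

```python
from collections import Counter

def train_word_models(words, stopwords, corpus_tokens, n, left, right):
    """For each word of interest, count the words that co-occur with it
    inside a window of `left` words before and `right` words after,
    ignoring stop words, and keep the n most frequent as its model."""
    counts = {w: Counter() for w in words}
    for i, tok in enumerate(corpus_tokens):
        if tok not in counts:
            continue
        window = (corpus_tokens[max(0, i - left):i]
                  + corpus_tokens[i + 1:i + 1 + right])
        for ctx in window:
            if ctx not in stopwords:
                counts[tok][ctx] += 1
    # keep only the n most frequent context words per model
    return {w: dict(c.most_common(n)) for w, c in counts.items()}
```

For example, training a model for "red" on the toy corpus "the red wine and the red car" with the stop words "the" and "and" and a one-word context on each side yields the model {"wine": 1, "car": 1}.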
--trainlch | -r
Train lexical-choice co-occurrence models using a target-language word co-occurrence model and a bilingual dictionary. It needs the following required parameters:
stopwords file containing a list of stop words. Stop words are ignored.
lexchoices file containing a list of lexical choices. For each lexical choice a co-occurrence model is built.
n number of words per co-occurrence model (for each model, the n most frequent words).
left left-side context to take into account (number of words).
right right-side context to take into account (number of words).
corpus file containing the training corpus.
wordmodel target-language word co-occurrence model (previously trained by means of the --trainwrd option).
dic the lexical-selection dictionary (binary format).
bildic the bilingual dictionary (binary format).
model output file on which the co-occurrence models are saved.
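The idea behind this unsupervised step can be sketched as follows: the target-language word models vote for the most plausible lexical choice at each occurrence, and the winning choice then collects the source-side context counts. This is an assumed simplification for illustration only, not apertium-lextor's actual algorithm or data format; `bildic` here is a plain word-to-translation mapping and all names are hypothetical.

```python
from collections import Counter

def train_lexchoice_models(occurrences, bildic, word_models, n):
    """occurrences: (candidate_choices, context_words) pairs for an
    ambiguous source word.  Each occurrence is disambiguated with the
    target-language word co-occurrence models, and the winning choice's
    own model accumulates the context counts."""
    models = {}
    for choices, context in occurrences:
        translated_ctx = [bildic[w] for w in context if w in bildic]

        def score(choice):
            # score a choice with the target-language model of its
            # translation: total co-occurrence count of the context
            model = word_models.get(bildic.get(choice, choice), {})
            return sum(model.get(t, 0) for t in translated_ctx)

        best = max(choices, key=score)
        models.setdefault(best, Counter()).update(context)
    return {c: dict(m.most_common(n)) for c, m in models.items()}
```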
--lextor | -l
Perform the lexical selection on the input stream. It needs the following required parameters:
model file containing the model to be used for the lexical selection.
dic lexical-selection dictionary (binary format).
left left-side context to take into account (number of words).
right right-side context to take into account (number of words).
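At translation time, selection amounts to scoring each candidate lexical choice against the surrounding words with its trained co-occurrence model and keeping the best one. The sketch below shows only this core scoring idea under that assumption; it is not apertium-lextor's actual code, and the optional per-position weights anticipate the --weightexp option described below.

```python
def select(choices, context, models, weights=None):
    """Pick the lexical choice whose trained co-occurrence model gives
    the highest (optionally position-weighted) score to the context."""
    weights = weights or [1.0] * len(context)

    def score(choice):
        model = models.get(choice, {})
        return sum(w * model.get(tok, 0)
                   for tok, w in zip(context, weights))

    return max(choices, key=score)
```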
--weightexp w
Specify a weight value to change the influence of surrounding words while training or performing the lexical selection. The parameter w
must be a positive value.
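The manual does not state the exact weighting formula, so the sketch below uses one plausible scheme, assumed purely for illustration: a context word d positions away contributes with weight 1/d**w, so a larger w concentrates influence on the words nearest the one being disambiguated, while w close to 0 weights all positions almost equally.

```python
def context_weights(left, right, w):
    """Illustrative distance decay for a [left, right] context window:
    weight 1/d**w for a word d positions away (an assumed scheme, not
    apertium-lextor's actual formula).  Returns the weights for the
    left context (nearest word last) and the right context."""
    left_w = [1.0 / d ** w for d in range(left, 0, -1)]
    right_w = [1.0 / d ** w for d in range(1, right + 1)]
    return left_w, right_w
```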
--debug | -d
Show debug information while working.
--help | -h
Show this help.
--version | -v
Show license information.
SEE ALSO
apertium-gen-lextorbil(1), apertium-preprocess-corpus-lextor(1), apertium-gen-stopwords-lextor(1), apertium-gen-wlist-lextor(1), apertium-gen-wlist-lextor-translation(1), apertium-lextor-eval(1), apertium-lextor-mono(1).
BUGS
Lots of... lurking in the dark and waiting for you!
AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All rights reserved.
2006-12-12 apertium-lextor(1)