apertium-lextor(1)

NAME
apertium-lextor - lexical selector module of apertium
This tool is part of the apertium machine translation architecture: http://apertium.org.
SYNOPSIS
apertium-lextor --trainwrd stopwords words n left right corpus model [ --weightexp w ] [ --debug ]
apertium-lextor --trainlch stopwords lexchoices n left right corpus wordmodel dic bildic model [ --weightexp w ] [ --debug ]
apertium-lextor --lextor model dic left right [ --debug ] [ --weightexp w ]
DESCRIPTION
apertium-lextor is the application responsible for both training and applying the lexical selector module.
OPTIONS
--trainwrd | -t
Train word co-occurrence models. It needs the following required parameters:
stopwords file containing a list of stop words. Stop words are ignored.
words file containing a list of words. For each word a co-occurrence model is built.
n number of words per co-occurrence model (for each model, the n most frequent words).
left left-side context to take into account (number of words).
right right-side context to take into account (number of words).
corpus file containing the training corpus.
model output file on which the co-occurrence models are saved.
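The training step above can be sketched in a few lines of Python. This is an illustrative sketch of the technique (count context words in a window, skip stop words, keep the n most frequent), not apertium-lextor's actual implementation; corpus handling and file formats are deliberately simplified, and all names are hypothetical.

```python
from collections import Counter

def train_word_models(words, stopwords, corpus_tokens, n, left, right):
    """For each word of interest, count the words that co-occur with it
    inside a window of `left` words before and `right` words after,
    ignoring stop words, and keep the n most frequent as its model."""
    counts = {w: Counter() for w in words}
    for i, tok in enumerate(corpus_tokens):
        if tok not in counts:
            continue
        window = (corpus_tokens[max(0, i - left):i]
                  + corpus_tokens[i + 1:i + 1 + right])
        for ctx in window:
            if ctx not in stopwords:
                counts[tok][ctx] += 1
    # keep only the n most frequent context words per model
    return {w: dict(c.most_common(n)) for w, c in counts.items()}
```

For example, training a model for "red" on the toy corpus "the red wine and the red car" with the stop words "the" and "and" and a one-word context on each side yields the model {"wine": 1, "car": 1}.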
--trainlch | -r
Train lexical-choice co-occurrence models using a target-language word co-occurrence model and a bilingual dictionary. It needs the following required parameters:
stopwords file containing a list of stop words. Stop words are ignored.
lexchoices file containing a list of lexical choices. For each lexical choice a co-occurrence model is built.
n number of words per co-occurrence model (for each model, the n most frequent words).
left left-side context to take into account (number of words).
right right-side context to take into account (number of words).
corpus file containing the training corpus.
wordmodel target-language word co-occurrence model (previously trained by means of the --trainwrd option).
dic the lexical-selection dictionary (binary format).
bildic the bilingual dictionary (binary format).
model output file on which the co-occurrence models are saved.
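The idea behind this unsupervised step can be sketched as follows: the target-language word models vote for the most plausible lexical choice at each occurrence, and the winning choice then collects the source-side context counts. This is an assumed simplification for illustration only, not apertium-lextor's actual algorithm or data format; `bildic` here is a plain word-to-translation mapping and all names are hypothetical.

```python
from collections import Counter

def train_lexchoice_models(occurrences, bildic, word_models, n):
    """occurrences: (candidate_choices, context_words) pairs for an
    ambiguous source word.  Each occurrence is disambiguated with the
    target-language word co-occurrence models, and the winning choice's
    own model accumulates the context counts."""
    models = {}
    for choices, context in occurrences:
        translated_ctx = [bildic[w] for w in context if w in bildic]

        def score(choice):
            # score a choice with the target-language model of its
            # translation: total co-occurrence count of the context
            model = word_models.get(bildic.get(choice, choice), {})
            return sum(model.get(t, 0) for t in translated_ctx)

        best = max(choices, key=score)
        models.setdefault(best, Counter()).update(context)
    return {c: dict(m.most_common(n)) for c, m in models.items()}
```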
--lextor | -l
Perform the lexical selection on the input stream. It needs the following required parameters:
model file containing the model to be used for the lexical selection.
dic lexical-selection dictionary (binary format).
left left-side context to take into account (number of words).
right right-side context to take into account (number of words).
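At translation time, selection amounts to scoring each candidate lexical choice against the surrounding words with its trained co-occurrence model and keeping the best one. The sketch below shows only this core scoring idea under that assumption; it is not apertium-lextor's actual code, and the optional per-position weights anticipate the --weightexp option described below.

```python
def select(choices, context, models, weights=None):
    """Pick the lexical choice whose trained co-occurrence model gives
    the highest (optionally position-weighted) score to the context."""
    weights = weights or [1.0] * len(context)

    def score(choice):
        model = models.get(choice, {})
        return sum(w * model.get(tok, 0)
                   for tok, w in zip(context, weights))

    return max(choices, key=score)
```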
--weightexp w
Specify a weight value to change the influence of surrounding words while training or performing the lexical selection. The parameter w
must be a positive value.
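The manual does not state the exact weighting formula, so the sketch below uses one plausible scheme, assumed purely for illustration: a context word d positions away contributes with weight 1/d**w, so a larger w concentrates influence on the words nearest the one being disambiguated, while w close to 0 weights all positions almost equally.

```python
def context_weights(left, right, w):
    """Illustrative distance decay for a [left, right] context window:
    weight 1/d**w for a word d positions away (an assumed scheme, not
    apertium-lextor's actual formula).  Returns the weights for the
    left context (nearest word last) and the right context."""
    left_w = [1.0 / d ** w for d in range(left, 0, -1)]
    right_w = [1.0 / d ** w for d in range(1, right + 1)]
    return left_w, right_w
```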
--debug | -d
Show debug information while working.
--help | -h
Show this help.
--version | -v
Show license information.
SEE ALSO
apertium-gen-lextorbil(1), apertium-preprocess-corpus-lextor(1), apertium-gen-stopwords-lextor(1), apertium-gen-wlist-lextor(1), apertium-gen-wlist-lextor-translation(1), apertium-lextor-eval(1), apertium-lextor-mono(1).
BUGS
Lots of... lurking in the dark and waiting for you!
AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All rights reserved.
2006-12-12 apertium-lextor(1)