mbtg(1) [debian man page]

mbtg(1) 						      General Commands Manual							   mbtg(1)

NAME
       MBTG - Memory Based Tagger generator

SYNOPSIS
       mbtg -T <filename> -s <setting filename>
       or
       mbtg [options]

DESCRIPTION
       This program generates, from a tagged corpus, all the files needed
       to tag a text with mbt.

OPTIONS
       -h or --help
              show help

       -T <tagged training corpus file>
       or
       -E <enriched tagged training corpus file>

       All further options have reasonable defaults, so using them is only
       needed for the experienced user. See the mbt manual for more details.

       -s settingsfile
              mbtg creates this file, which can be used to run mbt with
              minimal effort. (like mbt -s settings -T somefile)

       -p pattern
              the pattern for known words (default ddfa)

       -P pattern
              the pattern for unknown words (default dFapsss)

       -% <number>
              filter threshold for ambitag construction (default 5%)

       -l <lexiconfile>

       -L <file with list of frequent words>

       -r <ambitagfile>

       -k <known words case base>

       -u <unknown words case base>

       -K <known words instances file>

       -U <unknown words instances file>

       -V or --version
              show version info

       -e <sentence delimiter>
              (default '<utt>')

       -X     keep the intermediate files

       -O<timbl options>
              (Note: there is NO SPACE between O and the options)

              <options>
                     classifier options for both known and unknown words
                     instance bases

              K: <options>
                     classifier options for known words instance base

              U: <options>
                     classifier options for unknown words case base

              valid timbl options are: a d k m q v w x -

BUGS
       possibly

AUTHORS
       Ko van der Sloot Timbl@uvt.nl
       Antal van den Bosch Timbl@uvt.nl

SEE ALSO
       timbl(1) mbt(1) mbtserver(1)

2011 March 21                                                          mbtg(1)
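
The two-step workflow described under -s can be sketched as a small shell
script. This is only an illustration built from the options listed on this
page: the file names train.tagged, train.settings and somefile are
hypothetical placeholders, and the commands are executed only when mbtg and
mbt are installed and the input files exist. Otherwise the script just prints
what it would run.

```shell
#!/bin/sh
# Hypothetical file names (assumptions, not part of the man page).
corpus="train.tagged"       # tagged training corpus for -T
settings="train.settings"   # settings file that mbtg creates via -s
textfile="somefile"         # file to hand to mbt afterwards

# Step 1: generate the files mbt needs, plus the settings file.
gen_cmd="mbtg -T $corpus -s $settings"
# Step 2: run mbt with the generated settings, as in the -s description.
tag_cmd="mbt -s $settings -T $textfile"

echo "$gen_cmd"
echo "$tag_cmd"

# Execute only when both tools and both input files are available.
if command -v mbtg >/dev/null 2>&1 && command -v mbt >/dev/null 2>&1 \
   && [ -f "$corpus" ] && [ -f "$textfile" ]; then
    mbtg -T "$corpus" -s "$settings" && mbt -s "$settings" -T "$textfile"
fi
```

The defaults for -p, -P and -% are left untouched here, in line with the
remark that the remaining options are only needed by experienced users.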


frog(1) 						      General Commands Manual							   frog(1)

NAME
       frog - Dutch morpho-syntactic analyzer, IOB chunker and dependency
       parser

SYNOPSIS
       frog [options]
       frog -t test-file

DESCRIPTION
       frog is an integration of memory-based natural language processing
       (NLP) modules developed for Dutch. frog's current version will
       tokenize, tag, lemmatize, and morphologically segment word tokens in
       Dutch text files, add IOB chunks, and assign a dependency graph to
       each sentence.

OPTIONS
       -c <configfile>
              set the configuration using 'file'

       -d <level>
              set debug level.

       -e <encoding>
              set input encoding. (default UTF8)

       -h     give some help

       --keep-parser-files=[yes|no]
              keep the intermediate files from the parser. Last sentence
              only!

       -n     assume inputfile to hold one sentence per line

       -o <file>
              send output to 'file' instead of stdout. Defaults to the name
              of the inputfile with '.out' appended.

       --outputdir <dir>
              send all output to 'dir' instead of stdout. Creates filenames
              from the inputfilename(s) with '.out' appended.

       --skip=[mptc]
              skip parts of the process: Tokenizer (t), Chunker (c),
              Multi-Word unit (m) or Parser (p)

       -Q     Enable quote detection in the tokenizer. May wreak havoc!

       -S <port>
              Run a server on 'port'

       -t <file>
              process 'file'

       -x <xmlfile>
              process 'xmlfile', which is supposed to be in FoLiA format!
              If 'xmlfile' is empty, and --testdir=<dir> is provided, all
              files in 'dir' will be processed as FoLiA XML.

       --testdir=<dir>
              process all files in 'dir'. See also --outputdir

       --tmpdir=<dir>
              location to store intermediate files. Default /tmp.

       -V or --version
              show version info

       --xmldir=<dir>
              generate FoLiA XML output and send it to 'dir'. Creates
              filenames from the inputfilename with '.xml' appended.

       -X <file>
              generate FoLiA XML output and send it to 'file'. Defaults to
              the name of the inputfile(s) with '.xml' appended.

       --id=<id>
              When -X for FoLiA is given, use 'id' to give the doc an ID.

BUGS
       likely

AUTHORS
       Maarten van Gompel proycon@anaproy.nl
       Ko van der Sloot Timbl@uvt.nl
       Antal van den Bosch Timbl@uvt.nl

SEE ALSO
       ucto(1)

2012 January 31                                                        frog(1)
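
As a rough illustration of how the options above combine, the following
sketch processes one text file, writes the result to an explicit output file
and skips the parser stage. The file names input.txt and input.txt.out are
hypothetical, and the option spellings are taken from this page only, so they
may differ in other frog versions. The command is executed only when frog is
installed and the input file exists. Otherwise it is just printed.

```shell
#!/bin/sh
# Hypothetical file names (assumptions, not part of the man page).
infile="input.txt"
outfile="input.txt.out"

# -t: file to process, -o: output file, --skip=p: skip the Parser stage.
frog_cmd="frog -t $infile -o $outfile --skip=p"

echo "$frog_cmd"

# Execute only when frog is available and the input file exists.
if command -v frog >/dev/null 2>&1 && [ -f "$infile" ]; then
    frog -t "$infile" -o "$outfile" --skip=p
fi
```

Skipping the parser is just one possible choice. Any combination of the
letters documented under --skip can be given in the same way.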