
OPTIONS

-d dictionary

use the given dictionary. Several dictionaries may be given; at least one must be specified. Dictionaries are automata built with fsa_ubuild, or with fsa_build using the -X option. The data for the automata must be prepared in a special way.

If the automata are to be used to predict only the categories, each line of the input to fsa_build should contain the inverted word, with the beginning of the word (its end once inverted) marked with the filler character, followed by an annotation separator, followed by tags. See the prep_atg.awk script available in the package. To handle such dictionaries, fsa_guess should not be compiled with the GUESS_LEXEMES compile option. The standard file name extension for dictionaries prepared in this way is atg.
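As an illustration only, the line format described above can be sketched in Python. The filler character `_` and the annotation separator `+` are assumptions made for this sketch, as are the function name atg_line and the sample tag; check your build of the package for the characters it actually uses, and prefer prep_atg.awk for real data.

```python
# Sketch of one input line for a categories-only (atg) dictionary.
# FILLER and SEPARATOR are assumed values, not confirmed by this page.
FILLER = "_"
SEPARATOR = "+"


def atg_line(word: str, tags: str) -> str:
    """Return the inverted word, the filler marking the original
    beginning of the word, the annotation separator, and the tags."""
    inverted = word[::-1]
    return inverted + FILLER + SEPARATOR + tags


if __name__ == "__main__":
    # "walked" with a hypothetical tag "VBD"
    print(atg_line("walked", "VBD"))  # deklaw_+VBD
```

Inverting the word places its ending first, so the automaton can match a word by its suffix.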

If fsa_guess is also to guess lexemes, it must be compiled with the GUESS_LEXEMES compile option, but without GUESS_PREFIX, and each line of the input to fsa_build must contain: the inflected form, an annotation separator, a code, the lexeme ending, an annotation separator, and tags (annotations). The code specifies how many characters must be deleted from the end of the inflected form before the lexeme ending is appended to obtain the lexeme. The code is a single character; to calculate the number, take its character code and subtract 65 (the character code of `A'). See the prep_atl.awk script available in the package. The standard file name extension for automata prepared in this way is atl.
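The decoding rule above (character code minus 65) can be sketched as follows. This is a minimal illustration in Python, not the code used by fsa_guess; the function names decode_strip_count and lexeme and the sample words are hypothetical.

```python
# Sketch: recover a lexeme from an inflected form, a one-character
# code, and a lexeme ending, following the rule "character code - 65".

def decode_strip_count(code: str) -> int:
    """Number of characters to delete from the end of the inflected form."""
    if len(code) != 1:
        raise ValueError("the code must be a single character")
    return ord(code) - ord("A")  # ord("A") == 65


def lexeme(inflected: str, code: str, ending: str) -> str:
    """Delete the coded number of trailing characters, then append the ending."""
    count = decode_strip_count(code)
    if count < 0 or count > len(inflected):
        raise ValueError("code does not fit the inflected form")
    # Slice by explicit length so that a count of 0 keeps the whole word.
    return inflected[: len(inflected) - count] + ending


if __name__ == "__main__":
    print(lexeme("walked", "C", ""))    # 'C' -> 2 deleted: walk
    print(lexeme("studies", "D", "y"))  # 'D' -> 3 deleted: study
    print(lexeme("walk", "A", ""))      # 'A' -> 0 deleted: walk
```

So `A' means nothing is deleted, `B' means one character, and so on.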

To make fsa_guess take into account the information contained in prefixes, it must be compiled with GUESS_PREFIX. In the data lines for fsa_build, the first annotation separator is replaced by two annotation separators for entries that do not contain prefixes; for entries that do, the prefix is deleted from the inverted inflected form, leaving the filler character, and is placed between the two annotation separators. The standard file name extension for automata prepared in this way is atp.

-g
makes fsa_guess work as if it were compiled without GUESS_LEXEMES. This option is available only if the program was compiled with GUESS_LEXEMES.

-p
makes fsa_guess work as if it were compiled without GUESS_PREFIX. This option is available only if the program was compiled with GUESS_PREFIX.

-i input_file
specifies an input file, i.e. a file containing the words whose categories should be guessed. More than one file can be specified (the option can be used more than once). In the absence of the -i option, standard input is used.

-l language_file
specifies a file that holds language-specific information, i.e. (for now) the characters that form words, and pairs of (lowercase, uppercase) characters for case conversion. If the option is not specified, Latin letters with standard case conversions are used.

-v
prints version details.



Jan Daciuk
Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html