fsa_synth - perform morphological synthesis of inflected forms
fsa_synth [ options ] [ <infile ] [ >outfile ]
fsa_synth reads lines from the input. Each line contains a pair: a
canonical form and tags, separated by whitespace. All inflected forms
from the dictionary that match the tags are printed.
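The input format described above can be sketched in a few lines of Python. This is only an illustration of how a line might be split into its two fields, not the program's own parser; the function name parse_request and the sample tag string V+PAST are hypothetical, and the assumption is that the first run of whitespace separates the canonical form from the tags.

```python
def parse_request(line):
    """Split one input line into (canonical_form, tags).

    Assumption: the canonical form is the first whitespace-delimited
    token and everything after it is the tag string. With the -a
    option no tags are given, so the tag string is empty.
    """
    parts = line.split(None, 1)  # split on the first run of whitespace
    if not parts:
        return None  # blank line: nothing to synthesize
    canonical = parts[0]
    tags = parts[1].strip() if len(parts) > 1 else ""
    return canonical, tags


if __name__ == "__main__":
    # "V+PAST" is an invented tag string used only for illustration.
    print(parse_request("walk  V+PAST\n"))
    print(parse_request("walk\n"))
```

A line holding only a canonical form yields an empty tag string, which is the shape of input expected with -a.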
- -d dictionary
- use that dictionary. Several dictionaries may be given. At least one
dictionary must be specified. Dictionaries are automata built using
fsa_ubuild or fsa_build. The data for them must be prepared in a special
way. Each line of data consists of the canonical form, followed by a
separator, followed by a code K, followed by the ending of the lexeme,
followed by the separator, followed by tags. The code K specifies how many
characters at the end of the inflected form differ from the last
characters of the lexeme (i.e. how many characters constitute the
inflected word ending). ’A’ means that no characters are to be rejected,
’B’ means 1, ’C’ means 2, and so on. Although the data differs from the
normal word list format, the automaton format is the same (only its
content is different), so the magic number is the same as for normal data.
- -i input_file
- specifies an input file that contains the words to be processed. More
than one file can be specified (i.e. the option can be used more than
once). In the absence of the -i option, standard input is used.
- -P
- indicates that the dictionary contains coded prefixes (see fsa_synth(5)).
- -I
- indicates that the dictionary contains coded infixes (see fsa_synth(5)).
- -a
- generate all surface forms for the given lexical form (no tags should be
given).
- -r
- indicates that the tags are given
as regular expressions. A regular expression is:
- a
- any normal character (it means itself). Any special character (see
below) should be escaped with a backslash.
- .
- means any character.
- [ab]
- means any of the characters given inside the brackets (neither commas
nor vertical bars are needed to separate the characters; they would
represent themselves). A circumflex (^) immediately after the opening
bracket means complementation, i.e. the expression in brackets will match
a character that is not any of the characters listed after the circumflex.
Ranges of characters may be given, e.g. a-z means all characters with
codes not smaller than the code of a, and not greater than the code of z.
The character preceding the dash must have a smaller code than the
character following the dash.
- (a)
- means a. Parentheses can be used for grouping.
- ab
- concatenation of a and b
- a|b
- either a or b
- a*
- means a appearing any number of times (including 0).
- a+
- means aa*
- a?
- means an optional a, i.e. an a that may appear 0 or 1 times.
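The operators listed above can be tried out with Python's re module, which happens to use the same notation for these particular constructs. This is only a stand-in for experimenting with patterns: fsa_synth has its own regular expression implementation, the tag strings below are invented for illustration, and requiring the whole tag string to match (re.fullmatch) is an assumption rather than documented behaviour.

```python
import re

# Hypothetical tag strings, invented for this illustration only.
TAGS = ["NsgNom", "NsgGen", "NplNom", "Vinf"]


def matching(pattern, candidates):
    """Return the candidates matched in full by pattern.

    Assumption: a tag is selected only if the whole string matches,
    hence fullmatch() rather than search().
    """
    rx = re.compile(pattern)
    return [t for t in candidates if rx.fullmatch(t)]


if __name__ == "__main__":
    print(matching(r"N(sg|pl)Nom", TAGS))  # grouping and alternation
    print(matching(r"N[a-z]+Gen", TAGS))   # character range with +
    print(matching(r"[^N].*", TAGS))       # complemented set, then .*
    print(matching(r"Nsg(Nom)?", TAGS + ["Nsg"]))  # optional group
    print(matching(r"a\.b", ["a.b", "axb"]))       # escaped special character
```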
- -v
- print version details, including compile
options used to build the program.
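The code K described under the -d option maps letters to counts (’A’ is 0, ’B’ is 1, ’C’ is 2, and so on). The sketch below shows that mapping and one way such a code could be applied to a word. The function names rejected_count and apply_code are hypothetical, and the choice of which word gets trimmed before the ending is appended is an assumption made for illustration; the dictionary format itself is defined by the tools that build it, not by this code.

```python
def rejected_count(code):
    """Map the code letter K to a number: 'A' -> 0, 'B' -> 1, 'C' -> 2, ..."""
    count = ord(code) - ord("A")
    if len(code) != 1 or count < 0:
        raise ValueError("code must be a single character from 'A' upwards")
    return count


def apply_code(word, code, ending):
    """Drop rejected_count(code) characters from the end of word,
    then append ending.

    Assumption: the word passed in is the one whose final characters
    are rejected; this direction is chosen for illustration only.
    """
    n = rejected_count(code)
    if n > len(word):
        raise ValueError("code rejects more characters than the word has")
    stem = word[:len(word) - n]  # slicing by length avoids the word[:-0] pitfall
    return stem + ending


if __name__ == "__main__":
    print(apply_code("walk", "A", "ed"))    # nothing rejected
    print(apply_code("carry", "B", "ied"))  # one character rejected
```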
- OK
- Invalid options, or lack of a required option.
- Dictionary file could not be opened.
- Invalid regular expression.
- Not enough memory.
- Invalid UTF8 character.
- Error in determinization (transitions not grouped for source states).
fsa_accent(1), fsa_build(1), fsa_guess(1), fsa_hash(1), fsa_morph(5),
fsa_prefix(1), fsa_spell(1), fsa_ubuild(1), fsa_visual(1).
Send bug reports to the author: Jan Daciuk, jan dac@eti.pg.gda.pl
(remove the space in my user name).