Table of Contents

Name

fsa_synth - perform morphological synthesis of inflected forms

Synopsis

fsa_synth [ options ] [ <infile ] [ >outfile ]

Description

fsa_synth reads lines from the input. Each line contains a pair: a canonical form and tags, separated by whitespace. All inflected forms in the dictionary that match the tags are printed.
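
The input format described above can be sketched in Python as follows. This is only an illustration of how a line splits into a canonical form and tags; the function name `parse_request` and the sample tag `VBD` are hypothetical and are not part of fsa_synth.

```python
def parse_request(line):
    """Split one input line into (canonical_form, tags).

    The first whitespace-delimited token is taken as the canonical form;
    the rest of the line (stripped) is taken as the tags. The tags are
    empty when only a canonical form is present, as with the -a option.
    Returns None for a blank line.
    """
    parts = line.split(None, 1)
    if not parts:
        return None
    form = parts[0]
    tags = parts[1].strip() if len(parts) > 1 else ""
    return form, tags


# Hypothetical sample line: canonical form "walk", tag "VBD".
print(parse_request("walk  VBD\n"))
```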

Options

-d dictionary
use the given dictionary. Several dictionaries may be given; at least one must be specified. Dictionaries are automata built using fsa_ubuild or fsa_build. The data for them must be prepared in a special way. Each line of data consists of the canonical form, followed by a separator, followed by a code K, followed by the ending of the lexeme, followed by the separator, followed by tags. The code K specifies how many characters at the end of the inflected form differ from the last characters of the lexeme (i.e. how many characters constitute the inflected word ending): 'A' means that no characters are to be rejected, 'B' means 1, 'C' means 2, and so on. Although the data differs from the normal word list format, the automaton itself has the same structure (only its content differs), so the magic number is the same as for normal data.
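
A minimal Python sketch of the field layout and the K code described above. It only splits a data line into its fields and converts K to a number; it does not build or read an automaton. The separator `+`, the function names, and the sample line are assumptions made for illustration, and the sketch assumes the canonical form and the ending do not themselves contain the separator.

```python
def decode_k(code):
    """Convert the K letter to a count: 'A' -> 0, 'B' -> 1, 'C' -> 2, ..."""
    if len(code) != 1 or code < "A":
        raise ValueError("K must be a single character from 'A' upwards")
    return ord(code) - ord("A")


def parse_entry(line, sep="+"):
    """Split 'canonical<sep>K+ending<sep>tags' into its four parts.

    The separator '+' is an assumption; use whatever separator the
    dictionary data was prepared with.
    """
    canonical, coded, tags = line.rstrip("\n").split(sep, 2)
    if not coded:
        raise ValueError("missing K code")
    return canonical, decode_k(coded[0]), coded[1:], tags


# Made-up line showing only the order of the fields.
print(parse_entry("walk+Ced+VBD"))
```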
-i input_file
specifies an input file, i.e. a file that contains the words to be processed. More than one file can be specified (the option can be used more than once). In the absence of the -i option, standard input is used.
-P
indicates that the dictionary contains coded prefixes (see fsa_synth(5)).
-I
indicates that the dictionary contains coded infixes (see fsa_synth(5)).
-a
generate all surface forms for the given lexical form (no tags should be given).
-r
indicates that the tags are given as regular expressions. A regular expression is:
    a
    any normal character (it means itself). Any special character (see below) should be escaped with a backslash.
    .
    means any character.
    [ab]
    means any of the characters given inside brackets (no commas nor vertical bars are needed to separate the characters - they would represent themselves). A circumflex (^) immediately after the opening bracket means complementation, i.e. the expression in brackets will match a character that is not any of the characters listed after the circumflex. Ranges of characters may be given, i.e. a-z means all characters with codes not smaller than the code of a, and not greater than the code of z. The character preceding the dash must have a smaller code than the character following the dash.
    (a)
    means a. Parentheses can be used for grouping.
    ab
    concatenation of a and b
    a|b
    either a or b
    a*
    means a appearing any number of times (including 0).
    a+
    means aa*
    a?
    means optional a, i.e. an a that may appear 0 or 1 time.
-v
print version details, including compile options used to build the program.
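
The regular expression constructs listed under -r can be tried out with the following Python sketch. It uses Python's `re` module, which is not the engine used by fsa_synth but gives the same meaning to the constructs listed there. The tag strings are invented, and the sketch assumes that a pattern must match a whole tag string, which the description above does not state.

```python
import re


def matching_tags(pattern, tags):
    """Return the tags that the pattern matches in full.

    Whole-string matching is an assumption of this sketch.
    """
    compiled = re.compile(pattern)
    return [tag for tag in tags if compiled.fullmatch(tag)]


# Invented tag strings, for illustration only.
tags = ["subst:sg:nom", "subst:sg:gen", "subst:pl:nom", "verb:inf"]

# Grouping and alternation: (sg|pl)
print(matching_tags(r"subst:(sg|pl):nom", tags))

# Any character (.), complemented bracket ([^n]), repetition (*)
print(matching_tags(r"subst:..:[^n].*", tags))

# Optional group: (:inf)?
print(matching_tags(r"verb(:inf)?", tags))
```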

Exit Status

1. OK
2. Invalid options, or lack of a required option.
3. Dictionary file could not be opened.
4. Invalid regular expression.
5. Not enough memory.
6. Invalid UTF8 character.
7. Error in determinization (transitions not grouped for source states).

See Also

fsa_accent(1), fsa_build(1), fsa_guess(1), fsa_hash(1), fsa_morph(5), fsa_prefix(1), fsa_spell(1), fsa_ubuild(1), fsa_visual(1).

Bugs

Send bug reports to the author: Jan Daciuk, jan dac@eti.pg.gda.pl (remove the space in my user name).

