A transducer translates one string into another, so it can translate a surface string into a lexical string, and vice versa. The translation can either be stored in a lexicon or generated from translation rules. The latter approach is described in numerous sources, e.g. [KK94], [Kos83], [Kos84], [RRBP92], [RS95], and [Spr92]. We neither describe it nor use it here.
The lexicon-based method (see [Kar94] or [Moh94a]) is very simple. The paths in a transducer describe translations of strings (fig. 3.1).
Figure 3.1: Morphology of spała as a path in a transducer. Categories are kept to a minimum (part of speech only) due to lack of space.
The format of the input data can therefore be very simple. In our implementation, each line of input data has three fields separated by HT (the horizontal tabulation character, a.k.a. TAB): the inflected form, the lexeme, and the categories (fig. 3.2).
Figure 3.2: Transducer input data for "spała"
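As an illustration of the three-field format, the following minimal sketch reads one input line. The sample line is an assumption made for illustration, not a copy of fig. 3.2: the lexeme "spać" and the tag "V" are hypothetical field values, and the names `Entry` and `parse_line` are ours.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One line of transducer input data (hypothetical container)."""
    inflected: str   # surface form
    lexeme: str      # base form
    categories: str  # morphological categories, kept as an opaque string


def parse_line(line: str) -> Entry:
    """Split one HT-separated input line into its three fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    return Entry(*fields)


# Illustrative line only; the tag "V" is an assumed part-of-speech label.
entry = parse_line("spała\tspać\tV\n")
print(entry.inflected, entry.lexeme, entry.categories)
```

Splitting on the tab character alone, rather than on arbitrary whitespace, keeps any spaces inside a field intact.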
The surface and lexical strings may differ in length. Special filler characters must then be used to pad the shorter string. They can, however, be used inside the strings as well. By inserting filler characters at well-chosen places, one can reduce the size of the transducer by increasing the number of identical transitions. The well-chosen places are those that align corresponding segments of the two strings.
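The effect of filler placement can be sketched on a single pair of strings. The example below is an assumption made for illustration: the filler symbol `_`, the word pair psa/pies, and the helper names are ours, and counting pairs of identical characters is only a rough proxy for the number of transitions that can be shared.

```python
FILLER = "_"  # hypothetical filler symbol; the text does not name one


def pad_to_pairs(surface, lexical, filler=FILLER):
    """Pad the shorter string at its end and pair the symbols position by position."""
    width = max(len(surface), len(lexical))
    return list(zip(surface.ljust(width, filler), lexical.ljust(width, filler)))


def identical_pairs(pairs):
    """Count pairs whose surface and lexical symbols are the same character."""
    return sum(1 for s, l in pairs if s == l and s != FILLER)


# Filler used only as padding at the end of the shorter (surface) string.
end_padded = pad_to_pairs("psa", "pies")

# Fillers placed inside the surface string so that "s" lines up with "s".
aligned = pad_to_pairs("p__sa", "pies")

print(end_padded, identical_pairs(end_padded))
print(aligned, identical_pairs(aligned))
```

In the second alignment the stem consonants face each other, so more of the path consists of identity pairs that other entries can reuse.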
Note that such a representation leads to small devices, as the beginnings of words consist mainly of pairs of identical characters, and many words share the same mapping of surface ending (the ending of the inflected form) to lexical ending. Also, the analysis is simple, as no decoding is necessary.
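To show why no decoding is needed, here is a minimal sketch of analysis over paths of symbol pairs. It uses a plain trie of nested dictionaries rather than a minimized transducer, leaves categories out for brevity, and relies on assumed names throughout (`FILLER`, `add_path`, `analyze`, and the sample entries are ours).

```python
FILLER = "_"  # hypothetical filler symbol


def add_path(root, pairs):
    """Store one aligned entry as a path of (surface, lexical) symbol pairs."""
    node = root
    for pair in pairs:
        node = node.setdefault(pair, {})
    node[None] = True  # the key None marks a final state


def analyze(root, word):
    """Return the lexical strings of all paths whose surface side spells `word`."""
    results = []

    def walk(node, pos, out):
        if pos == len(word) and None in node:
            results.append("".join(out))
        for key, child in node.items():
            if key is None:
                continue
            surface, lexical = key
            # The lexical symbol is emitted directly; fillers are skipped.
            emitted = out + ([lexical] if lexical != FILLER else [])
            if surface == FILLER:
                walk(child, pos, emitted)       # consumes no input symbol
            elif pos < len(word) and word[pos] == surface:
                walk(child, pos + 1, emitted)   # consumes one input symbol

    walk(root, 0, [])
    return results


root = {}
add_path(root, list(zip("p__sa", "pies_")))  # inflected form "psa"
add_path(root, list(zip("pies", "pies")))    # inflected form "pies"

print(analyze(root, "psa"))
print(analyze(root, "pies"))
```

The lexical string is read off the transitions while the surface string is being matched, so the result needs no further processing once a final state is reached.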