
Morphological Lexicon

 

The specific form of the data for a morphological lexicon depends on the kind of automata to be used: simple automata (acceptors) or transducers.

In morphology, there are two levels, the surface level and the lexical level, and each has its own strings. The surface string is an inflected form of a word. E.g. the string spała is an inflected form (one of many) of the word spać. The lexical string is the corresponding lexeme with morphological annotations. The lexeme is the word in the form that can be found as the main entry for that word in a dictionary. E.g. the lexeme corresponding to the inflected form spała is spać. The annotations describe the properties of a word form, e.g. for the word spała, we could note that it is a verb in the past tense, third person, feminine, singular, imperfective.
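The relation between the two levels can be sketched in a few lines of Python. This is only an illustration of the idea, not the data format used by the software described here: the names Analysis, LEXICON and analyze, as well as the tag labels, are assumptions made up for this example, and a plain dictionary stands in for the automaton.

```python
from typing import NamedTuple, Tuple, List


class Analysis(NamedTuple):
    """One lexical-level reading of a surface form (hypothetical structure)."""
    lexeme: str             # dictionary form of the word
    tags: Tuple[str, ...]   # morphological annotations (illustrative labels)


# Toy lexicon mapping a surface string to its lexical-level analyses.
# A real lexicon would be stored in an automaton, not in a dict.
LEXICON = {
    "spała": [
        Analysis("spać", ("verb", "past", "3rd", "feminine",
                          "singular", "imperfective")),
    ],
}


def analyze(surface: str) -> List[Analysis]:
    """Return all known analyses of a surface form (empty list if unknown)."""
    return LEXICON.get(surface, [])


if __name__ == "__main__":
    for a in analyze("spała"):
        print(a.lexeme, "+".join(a.tags))
```

A surface form may have several analyses, which is why the sketch maps each form to a list rather than to a single entry.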

We may need morphology for various purposes. For tagging (annotating a corpus with part-of-speech labels), we do not need lexemes; only categories matter. For stemming, which is widely used in e.g. information retrieval, we retain lexemes but suppress categories.
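The two uses can be seen as two projections of the same analyses. The sketch below assumes a toy table of (lexeme, categories) pairs; the table ANALYSES, the functions tag and stem, and the category strings are hypothetical and serve only to illustrate which part of an analysis each application keeps.

```python
# Toy analyses: surface form -> list of (lexeme, categories) pairs.
# The category strings are illustrative, not a real tagset.
ANALYSES = {
    "spała": [("spać", "verb+past+3rd+fem+sg+imperf")],
    "spać": [("spać", "verb+inf+imperf")],
}


def tag(surface):
    """Tagging view: keep the categories, drop the lexemes."""
    return [cats for _lexeme, cats in ANALYSES.get(surface, [])]


def stem(surface):
    """Stemming view: keep the lexemes, drop the categories."""
    return sorted({lexeme for lexeme, _cats in ANALYSES.get(surface, [])})


if __name__ == "__main__":
    print(tag("spała"))
    print(stem("spała"))
```

In the stemming view, different inflected forms of one word collapse to the same lexeme, which is the property information retrieval relies on.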





Jan Daciuk
Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html