This page contains no Javascript. No need to analyze the source!

Important note

Due to problems with GUT WWW servers, and due to GUT's planned switch to another internet domain, this page has been moved to a new location: http://www.jandaciuk.pl/fsa.html. The software repository on the FTP server had to be moved as well: not only will it be affected by the domain switch (the current domain will be abandoned!), but GUT's FTP server crew is also unreliable (they switched off the server for several months at the beginning of 2016).

Finite state utilities

This page describes two software packages and some accompanying files available from http://www.jandaciuk.pl/Software/Fsa/ and http://www.jandaciuk.pl/Software/Utr/. Further down this page, you will find direct links to the current versions of both software packages.

Further down on this page:

What are they for?

Available Packages

Both packages are written in C++, and they can be compiled with g++. Problems may arise when compiling with other compilers, because the packages use templates, which tend not to be implemented universally. Both packages use compact dictionaries that are automata (of different forms). An interface in elisp is provided for both of them. The interface worked with emacs19, but it is highly unlikely that it will work with emacs20 or later (especially with mule).

fsa - finite state automata

Current version number: 0.51. Man pages (except for fsa_synth) in HTML format are also available as one file.

utr - transducers

Current version number: 0.10

Dictionaries

Name extension conventions

To avoid confusion, I use some name extensions to indicate the contents of a dictionary:
fsa
a list of words compiled to a simple automaton - to be used with fsa_accent, fsa_spell.
fsm
morphology in the form of a simple automaton - to be used with fsa_morph.
atg
a list a tergo (i.e. inverted) of inflected forms with their categories - to be used with fsa_guess compiled without GUESS_LEXEMES or run with the -a option. This is used to guess categories of words from their endings.
atl
a list a tergo (i.e. inverted) of inflected forms with lexemes and categories compiled to a simple automaton - to be used with fsa_guess compiled with GUESS_LEXEMES, but without GUESS_PREFIX, or with GUESS_PREFIX, but run with the -p option. This is used to guess corresponding lexemes and categories from word endings.
atp
a list a tergo (i.e. inverted) of inflected forms with lexemes and categories compiled to a simple automaton - to be used with fsa_guess compiled with both GUESS_LEXEMES and GUESS_PREFIX. In this automaton prefixes are stored differently so that the automaton is smaller, and some more generalizations are possible. This is used to guess corresponding lexemes and categories from word endings and beginnings.
tr
transducer - to be used with all programs that have names beginning with tr_.
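
To illustrate what "a tergo" means in the atg, atl and atp entries above, here is a minimal sketch that inverts each inflected form, so that words sharing an ending end up sharing a prefix and sort next to each other. It only illustrates the idea: the function names invert and a_tergo_list are hypothetical and not part of the fsa package, the input is assumed to be in a single-byte encoding (reversing UTF-8 byte by byte would corrupt multi-byte characters), and the actual input format expected by fsa_build is not shown here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Reverse a word byte by byte (assumes a single-byte encoding).
// Hypothetical helper, not part of the fsa package.
std::string invert(const std::string& word) {
    return std::string(word.rbegin(), word.rend());
}

// Build a sorted list of inverted words. After inversion, a common
// ending (e.g. "ing") becomes a common prefix ("gni"), so words with
// the same ending become neighbours in the sorted list.
std::vector<std::string> a_tergo_list(const std::vector<std::string>& words) {
    std::vector<std::string> inverted;
    inverted.reserve(words.size());
    for (const std::string& w : words) {
        inverted.push_back(invert(w));
    }
    std::sort(inverted.begin(), inverted.end());
    return inverted;
}
```

With the words walking, talked and talking, the two forms ending in -ing are stored as gniklat and gniklaw and sort next to each other, which is what makes guessing from endings a prefix lookup.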

Available dictionaries

Note that the dictionaries were constructed a long time ago, with a version of fsa_build that used the only format available at that time. To use the default set of compile options for the current version, you have to rebuild the automata.
deutsch1.fsa.gz
German word list from ftp.informatik.tu-muenchen.de:/pub/doc/dict/. 7 bit only, umlauts coded with following e, sharp s with ss. It is difficult to convert them to 8 bit, as not every oe is o umlaut, not every ss is sharp s, etc.
english.fsa.gz
English word list from /usr/dict/words
francais.atl.gz
French dictionary a tergo with categories and lexemes from ISSCO
francais.fsa.gz
French word list from ISSCO
francais.fsm.gz
French morphology (simple automaton) from ISSCO
francais.tr.gz
French transducer from ISSCO
french_moby.fsa.gz
French word list from the Moby Project
polski.fsa.gz
Polish word list extracted from Dziennik Baltycki articles
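
The conversion problem mentioned for deutsch1.fsa.gz can be seen in a small sketch. Going from 8 bit to 7 bit is a plain substitution, but different words can collapse to the same 7 bit string, so the reverse direction cannot be done by substitution alone. The function name to_7bit is hypothetical, and the input is assumed to be ISO 8859-1 (Latin-1) with one byte per character; this illustrates the coding convention, not the procedure actually used to prepare the word list.

```cpp
#include <string>

// Transliterate Latin-1 German text to 7 bit ASCII: umlauts are coded
// with a following e, sharp s with ss. Hypothetical helper.
std::string to_7bit(const std::string& latin1) {
    std::string out;
    for (unsigned char c : latin1) {
        switch (c) {
            case 0xE4: out += "ae"; break;  // a umlaut
            case 0xF6: out += "oe"; break;  // o umlaut
            case 0xFC: out += "ue"; break;  // u umlaut
            case 0xC4: out += "Ae"; break;  // A umlaut
            case 0xD6: out += "Oe"; break;  // O umlaut
            case 0xDC: out += "Ue"; break;  // U umlaut
            case 0xDF: out += "ss"; break;  // sharp s
            default:   out += static_cast<char>(c);
        }
    }
    return out;
}
```

Under this coding, the word spelled with a sharp s (Ma + sharp s + e) and the word Masse both come out as Masse, and the oe in Poet was never an umlaut, so decoding needs knowledge of the words themselves rather than a substitution table.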

Some Other Finite-State Software I Wrote

Software Directly Related to Mine

More On-line Information on Finite State Automata

The bibliography from my Ph.D. thesis lists more sources, but not all of them are available on-line. In the HTML version I provided links to those that are. A good source of articles from this field is Computational Linguistics, a journal distributed to the members of the ACL.

Money

This software is free.

This software is provided "as is". If you have lost a million dollars by using it, that was your million, not mine. I should not be liable for any losses.

Starting from version 0.42, the fsa package can be distributed under the GPL licence, except for one third party file that has its own copyright and licence. Since I'm not sure how to understand the legalese gibberish with regard to third party software, the GPL is not included in the package.

Feedback

If you find bugs in these programs, please tell me about them. If you do not, you will have to correct them yourself in all future versions!
Jan Daciuk