From the NannyMUD documentation

LAST CHANGE

2000-09-15

NAME

        typo - Typo detection/correction.

FUNCTIONS

QUERY FUNCTIONS

	investigate_word

DESCRIPTION

	INTRODUCTION
	This object provides functionality by which you can detect
	single typoes in user input. This can be used to find the
	correct words, and suggest those to the user. This is very
	useful in a help system.

	It is common knowledge, or so the customer support at my work
	tells me, that the most common errors when writing texts on a
	keyboard is to make single typing mistakes. There are four
	such common mistakes:

	1/ Two consecutive characters are swapped.
	2/ One character is missing.
	3/ There is an extra character.
	4/ One character has been replaced.


	USING THE RULES
	When the user enters a word which does not match anything in
	the dictionary, we assume that it is a typo of the above
	kind. Thus, a single mistake has been done. We have:

	correct ->(apply one rule once) -> typo.

	Thus, the typo is one step away from the correct word. But
	this is bidirectional; we also have:

	typo ->(apply one rule once) -> correct.

	Thus, we can generate a list of candidates of correct words by
	applying the rules to the typo. Mosty of the words in the list
	of candidates will not be correct, and we check the list
	against the dictionary. After that, we will have a very short
	list of candidates. usually it contains a single word.


	PERFORMANCE CONSIDERATION
	We will have to loop over the content of the word for rule 1
	and 2, and over the word and alphabet content for rule 3 and
	4. If the word contains N characters, and the alphabet is M
	long, we have:
	1/ Rule 1 will give N-1 candidates.
	2/ Rule 2 will give N candidates.
	3/ Rule 3 will give M*(N+1) candidates.
	4/ Rule 4 will give (M-1)*N candidates.

	All in all, we get 2*M*N + N + M + 1 candidates, i.e. the
	method has an Ordo(M*N) behaviour. For the alphabet a-zåäö-_
	M is 31, giving candidate lists that are 63*N + 32 words
	long. At the time of writing, the longest keyword used in the
	xdoc system is 'query_prevent_move_from_statue', which gives a
	max length of the candidate list of 1932.

FUNCTION


NAME

        investigate_word - Find similar words.

SYNTAX

	varargs string *investigate_word(string word,
				         mapping dictionary,
					 [ string *alphabet ] )

DESCRIPTION

	This function finds words in the dictionary that are one step
	away from the given word. It uses the given alphabet in the
	process.