From the NannyMUD documentation
2000-09-15
typo - Typo detection/correction.
investigate_word
INTRODUCTION This object provides functionality by which you can detect single typoes in user input. This can be used to find the correct words, and suggest those to the user. This is very useful in a help system. It is common knowledge, or so the customer support at my work tells me, that the most common errors when writing texts on a keyboard is to make single typing mistakes. There are four such common mistakes: 1/ Two consecutive characters are swapped. 2/ One character is missing. 3/ There is an extra character. 4/ One character has been replaced. USING THE RULES When the user enters a word which does not match anything in the dictionary, we assume that it is a typo of the above kind. Thus, a single mistake has been done. We have: correct ->(apply one rule once) -> typo. Thus, the typo is one step away from the correct word. But this is bidirectional; we also have: typo ->(apply one rule once) -> correct. Thus, we can generate a list of candidates of correct words by applying the rules to the typo. Mosty of the words in the list of candidates will not be correct, and we check the list against the dictionary. After that, we will have a very short list of candidates. usually it contains a single word. PERFORMANCE CONSIDERATION We will have to loop over the content of the word for rule 1 and 2, and over the word and alphabet content for rule 3 and 4. If the word contains N characters, and the alphabet is M long, we have: 1/ Rule 1 will give N-1 candidates. 2/ Rule 2 will give N candidates. 3/ Rule 3 will give M*(N+1) candidates. 4/ Rule 4 will give (M-1)*N candidates. All in all, we get 2*M*N + N + M + 1 candidates, i.e. the method has an Ordo(M*N) behaviour. For the alphabet a-zåäö-_ M is 31, giving candidate lists that are 63*N + 32 words long. At the time of writing, the longest keyword used in the xdoc system is 'query_prevent_move_from_statue', which gives a max length of the candidate list of 1932.
NAME
investigate_word - Find similar words.SYNTAX
varargs string *investigate_word(string word, mapping dictionary, [ string *alphabet ] )DESCRIPTION
This function finds words in the dictionary that are one step away from the given word. It uses the given alphabet in the process.