Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!agate!ames!news.Hawaii.Edu!uhunix3.uhcc.Hawaii.Edu!pollarda
From: pollarda@uhunix3.uhcc.Hawaii.Edu (Art Pollard)
Subject: Re: n-tag spell checking??
Message-ID: <Cz3CAD.41H@news.Hawaii.Edu>
Sender: news@news.Hawaii.Edu
Organization: University of Hawaii
References: <1994Nov7.180757.3984@seas.smu.edu>
Date: Fri, 11 Nov 1994 06:49:24 GMT
Lines: 27


Well, this might work for some errors but not others.

Approximately 80% of all spelling errors come from one of the following:

One letter missing
One letter wrong
One extra letter
Transposed letters
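
As a rough illustration of those four cases, here is a minimal Python sketch
that generates every string one such edit away from a misspelled word and
keeps the ones found in a word list.  The names single_edit_variants and
suggest, and the tiny word list, are made up for this example; this is not
how any particular toolkit does it.

```python
import string

def single_edit_variants(word, alphabet=string.ascii_lowercase):
    """Return every string that is one simple edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Inserting a letter repairs "one letter missing".
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # Replacing a letter repairs "one letter wrong".
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    # Deleting a letter repairs "one extra letter".
    deletes = {a + b[1:] for a, b in splits if b}
    # Swapping adjacent letters repairs "transposed letters".
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return (inserts | replaces | deletes | swaps) - {word}

def suggest(word, dictionary):
    """Return the dictionary words reachable from `word` by one edit."""
    return sorted(single_edit_variants(word) & dictionary)

# Hypothetical word list, for illustration only.
words = {"spelling", "checker", "letter", "word"}
print(suggest("speling", words))   # one letter missing
print(suggest("wurd", words))      # one letter wrong
print(suggest("lettter", words))   # one extra letter
print(suggest("chekcer", words))   # transposed letters
```

Note that even this sketch needs a word list to check the candidates against.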

However, the other 20% are either royally mangled words or words that
look and sound right but are wrong.

So, it will work up to a point.  Unfortunately, I don't see it as a viable
technique because it takes too much memory to store the model.  I have written
a commercial spellchecker toolkit and get by quite well with approx. 60K of
RAM.  A model for English such as you describe will take much more than that
if it is any good (possibly on the order of 1MB).  Also, you risk "missing" a
character sequence during training unless you run the trainer against a
dictionary.  And there are enough weird words out there that you might end up
tagging words as wrong when they are in fact right.  Unfortunately, there is
no way to escape using a dictionary of some sort.
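
To make the "missing sequence" problem concrete, here is a small Python
sketch.  It assumes the model is simply the set of character trigrams seen
during training, which is my guess at the kind of scheme being proposed; the
function names and the four-word training list are invented for the example.

```python
def trigrams(word):
    """Return the set of character trigrams in `word`, with boundary marks."""
    padded = f"^{word}$"   # ^ and $ mark the start and end of the word
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def train(words):
    """Collect every trigram that occurs in the training words."""
    seen = set()
    for w in words:
        seen |= trigrams(w)
    return seen

def looks_wrong(word, seen):
    """Flag a word if it contains any trigram never seen in training."""
    return not trigrams(word) <= seen

# Hypothetical training text, for illustration only.
model = train(["string", "thing", "rhyme", "time"])
print(looks_wrong("thing", model))    # a word the trainer has seen
print(looks_wrong("rhythm", model))   # a correct word the trainer never saw
```

Here "rhythm" gets flagged even though it is spelled correctly, because
sequences such as "hyt" never appeared in the training words.  Training
against a full dictionary is what closes that gap.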

-Art



