Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!cs.utexas.edu!convex!seas.smu.edu!pedersen
From: pedersen@seas.smu.edu (Ted Pedersen)
Subject: n-tag spell checking??
Message-ID: <1994Nov7.180757.3984@seas.smu.edu>
Sender: news@seas.smu.edu (USENET News System)
Nntp-Posting-Host: rapid_f.seas.smu.edu
Organization: SMU - School of Engineering & Applied Science - Dallas
Date: Mon, 7 Nov 1994 18:07:57 GMT
Lines: 28


Here's an idea I've been toying with lately: probabilistic spelling
error detection. It seems like the n-tag models familiar to those who
use probabilistic part-of-speech taggers could also be used to check
spelling.

For instance, take a large corpus of correctly spelled words and
suppose you are using a tri-tag model. You calculate that `e' follows
`th' x percent of the time, `i' follows `ch' y percent of the time,
etc.
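
A rough sketch of that counting step, assuming Python and a toy word
list standing in for the large corpus. The function name
train_trigram_probs and the sample words are made up for illustration
only:

```python
from collections import Counter, defaultdict

def train_trigram_probs(words):
    """Estimate P(next letter | previous two letters) from a word list."""
    counts = defaultdict(Counter)
    for word in words:
        w = word.lower()
        # Slide a three-letter window over the word: two letters of
        # context, followed by the letter that comes next.
        for i in range(len(w) - 2):
            counts[w[i:i + 2]][w[i + 2]] += 1
    probs = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probs[context] = {ch: n / total for ch, n in followers.items()}
    return probs

# Toy corpus (an assumption; a real one would be far larger).
corpus = ["the", "then", "them", "this", "chin", "chip", "chat"]
probs = train_trigram_probs(corpus)
print(probs["th"])  # `e' follows `th' 75% of the time in this toy corpus
print(probs["ch"])  # `i' follows `ch' about 67% of the time
```

The percentages here only reflect the seven sample words; they say
nothing about real English text.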

Then you use those probabilities to spell check a text. If certain
combinations in your text didn't occur in the training text, then you
probably have a spelling error.
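
The check itself might look something like the following sketch,
again assuming Python. It only implements the simplest version of the
rule (flag any word containing a three-letter combination never seen
in training); the names letter_trigrams, build_seen and
suspicious_words and the tiny training list are hypothetical:

```python
def letter_trigrams(word):
    """Return every run of three consecutive letters in a word."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]

def build_seen(training_words):
    """Collect all letter trigrams that occur in the training words."""
    seen = set()
    for word in training_words:
        seen.update(letter_trigrams(word))
    return seen

def suspicious_words(text, seen):
    """Flag words containing at least one trigram absent from training."""
    flagged = []
    for word in text.split():
        if any(t not in seen for t in letter_trigrams(word)):
            flagged.append(word)
    return flagged

# Toy training data (an assumption; a real corpus would be far larger).
training = ["the", "then", "them", "this", "chin", "chip", "chat"]
seen = build_seen(training)
print(suspicious_words("then thsi chip", seen))  # prints ['thsi']
```

Two limits of this sketch: words shorter than three letters have no
trigrams and so are never flagged, and with a small training text
plenty of correctly spelled words would be flagged too. Flagging
low-probability combinations rather than only unseen ones would need a
threshold, which is not attempted here.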

Does this idea make any sense? Has anyone pursued such an idea? I'm
not much of an expert on spell checking so I don't know if there would
be any advantages to this approach over whatever methods are currently
used. But I've become pretty curious about this notion so I'd
appreciate any comments. 

Regards
Ted

---
* Ted Pedersen                                  pedersen@seas.smu.edu * 
* Department of Computer Science and Engineering,                     *
* Southern Methodist University, Dallas, TX 75275      (214) 768-2126 *

