Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!swrinde!gatech!bloom-beacon.mit.edu!news.kei.com!simtel!noc.netcom.net!netcom.com!pingpong
From: pingpong@netcom.com (Robert Shain)
Subject: Scanned Documents to HTML/SGML??
Message-ID: <pingpongD9E8qG.34M@netcom.com>
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
X-Newsreader: TIN [version 1.2 PL1]
Date: Tue, 30 May 1995 13:42:15 GMT
Lines: 7
Sender: pingpong@netcom.netcom.com

Does anyone know of algorithms or software that can figure out where 
chapters, subsections, figures, and pointers to figures begin in scanned 
documents?  Even if only 60% of the work were done automatically, it would 
help.  I need to put some scanned research docs up as a WAIS database.

- Bob

