Newsgroups: comp.ai.doc-analysis.ocr
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!fas-news.harvard.edu!newspump.wustl.edu!news.ecn.bgu.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!spool.mu.edu!bloom-beacon.mit.edu!news!media.mit.edu!testarne
From: testarne@media.mit.edu (Thad E Starner)
Subject: Re: OCR from video images?
Message-ID: <1995Jun13.031655.348@media.mit.edu>
Sender: news@media.mit.edu (USENET News System)
Organization: MIT Media Laboratory
References:  <3ri24f$7s7@decaxp.harvard.edu>
Date: Tue, 13 Jun 1995 03:16:55 GMT
Lines: 31

	OCR is hard in such an unconstrained environment.  Video in particular
is very noisy and low resolution.  Unless the video frames are shot directly
off the lecture notes (i.e., the camera is on a stand looking directly
at the paper) and the font is very large, your chance of success is slim.
Resolution enhancement and image mosaicing can help, but the compute power
required is prohibitive.
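
	If you do have cycles to burn, even simple frame averaging after
registration improves the signal-to-noise ratio before you hand anything
to an OCR engine.  Here is a rough sketch in Python/NumPy (my own toy
code, not from any of the reports below; it does phase correlation for
registration, integer-pixel shifts only):

import numpy as np

def estimate_shift(ref, img):
    # Phase correlation: the peak of the inverse FFT of the
    # normalized cross-power spectrum gives the translation
    # between the two frames (whole pixels only here).
    f1 = np.fft.fft2(ref)
    f2 = np.fft.fft2(img)
    cps = f1 * np.conj(f2)
    cps /= np.abs(cps) + 1e-12
    corr = np.fft.ifft2(cps).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Shifts past the halfway point wrap around to negative.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dy, dx

def average_registered(frames):
    # Align every frame to the first, then average; the noise
    # drops roughly as 1/sqrt(N) while the static text stays put.
    ref = frames[0].astype(np.float64)
    acc = ref.copy()
    for f in frames[1:]:
        g = f.astype(np.float64)
        dy, dx = estimate_shift(ref, g)
        acc += np.roll(g, (dy, dx), axis=(0, 1))
    return acc / len(frames)
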

	However, another method is possible.  Instead of keying on recognized
words, it may be possible to key on the images themselves (though this is
difficult too).
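
	As a strawman, you can match a grabbed frame against a small library
of stored page images with a crude normalized-correlation signature, no
character recognition at all.  Again a toy Python/NumPy sketch (the names
and grid size are arbitrary, not anything from the reports):

import numpy as np

def signature(img, size=16):
    # Crude whole-page signature: block-average the grayscale
    # image down to a size x size grid, then normalize out
    # brightness and contrast so lighting changes matter less.
    h, w = img.shape
    g = img[:h - h % size, :w - w % size].astype(np.float64)
    g = g.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    g -= g.mean()
    return g / (np.linalg.norm(g) + 1e-12)

def best_match(query, library):
    # Normalized correlation between signatures; the stored page
    # with the highest score is the retrieved "key".
    q = signature(query)
    scores = [float((q * signature(page)).sum()) for page in library]
    best = int(np.argmax(scores))
    return best, scores[best]
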

See Tech Report #302 and #245 at

http://www-white.media.mit.edu/vismod/cgi-bin/tr_pagemaker

to get a feel for what is possible.  Some of the methods described can 
be extended to general images (active research...not mine).

You may also want to consider a multi-modal, user-assisted approach,
combining user input, speech, video, and location information to help
with the task (a toy sketch of fusing such cues follows the links below).
For more on this, and the equipment that enables it, see Tech Report #318
at the same location and

http://www-white.media.mit.edu/vismod/people/students/starner.html

and

http://www-white.media.mit.edu/~steve/
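
	The fusion itself can start as simply as a weighted log-linear vote
over whatever candidates each modality scores.  A toy sketch follows; the
modality names, scores, and weights are all invented for illustration:

import numpy as np

def fuse(scores_by_modality, weights):
    # Weighted log-linear combination of per-modality scores.
    # scores_by_modality: dict name -> scores over the candidates.
    # weights: dict name -> relative trust in that modality.
    total = None
    for name, s in scores_by_modality.items():
        contrib = weights[name] * np.log(np.asarray(s, dtype=float) + 1e-9)
        total = contrib if total is None else total + contrib
    return int(np.argmax(total))

# E.g., speech favors page 2, location splits pages 2/3, video is unsure:
pick = fuse({"speech":   [0.10, 0.70, 0.20],
             "location": [0.10, 0.45, 0.45],
             "video":    [0.33, 0.33, 0.34]},
            {"speech": 1.0, "location": 0.5, "video": 0.25})
print(pick)   # -> 1, the second candidate
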

A wearable computing web page is also in the works.

						Thad
