Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!warwick!pipex!uunet!olivea!koriel!sh.wide!wnoc-kyo!atrwide!atr-la!paul
From: paul@itl.atr.co.jp (Paul Taylor)
Subject: Re: Do I need more than 8-bits voice recognition?
In-Reply-To: pablo@austin.ibm.com's message of Fri, 3 Dec 1993 14:19:10 GMT
Message-ID: <PAUL.93Dec10115057@as51.itl.atr.co.jp>
Sender: news@itl.atr.co.jp (USENET News System)
Nntp-Posting-Host: as51
Organization: ATR Interpreting Telecommunications Research Labs., Japan
References: <CHGqFz.nEM@austin.ibm.com>
Date: Fri, 10 Dec 1993 02:50:57 GMT
Lines: 40

In article <CHGqFz.nEM@austin.ibm.com> pablo@austin.ibm.com (Paul Greenwood) writes:

   I am making a voice-input device for a computer.  I am trying to decide if I
   need 10,12, or even 16 bits.  If I put the math to it, I come up with:

   With 8-bits:
   Assuming a 5Vp waveform, 1 / 256 = .00390625
   This is .00390625 * 5volts = .01953125 volts of resolution


   With 12-bits:
   Assuming a 5Vp waveform, 1 / 4096 = .0002441406
   This is .0002441406 * 5volts = .0012207031 volts of resolution

   Now, it seems to me that 2/100th of a volt in relation to 5 volts full-scale
   would only be background noise as it is.  This is the 8-bit converter.  12 or
   more bits seems overkill.  So, why do the manufacturers of the Soundblaster and
   the ThunderBoard have a 16-bit A/D?  Is there a reason?  Am I missing
   something?  Is my math right?  Is my assumption that 2/100th of a volt is
   insignificant wrong?

The standard way of looking at these things is in dB. Each bit adds about
6 dB of dynamic range (20*log10(2) is roughly 6.02), so 8 bit is 48 dB,
12 bit is 72 dB and 16 bit is 96 dB. Human hearing typically has a useful
sensitivity of 60 dB (I think). Record players are about 60 dB and CDs
are 96 dB.
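
The dB figures above, and the volts-per-step numbers in the quoted
article, can be checked with a few lines of Python. This is only a sketch
for an ideal converter; the function names are mine, and the 5 V full
scale is taken from the original question.

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of an ideal converter: 20*log10(2**bits)."""
    return 20.0 * math.log10(2 ** bits)

def step_size_volts(bits, full_scale_volts=5.0):
    """Voltage of one quantization step, assuming a 5 V full-scale input."""
    return full_scale_volts / (2 ** bits)

for bits in (8, 12, 16):
    print("%2d bits: %5.1f dB, %.10f V per step"
          % (bits, dynamic_range_db(bits), step_size_volts(bits)))
```

A real converter will do somewhat worse than this because of its own
noise and nonlinearity.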

So you are right in that 16 bit does seem like overkill, but 8 bit
is too low to really ensure you have all the dynamic range. A while
ago, people used 12 bit sampling, but 16 bit seems more common, maybe
due to ease in storing the data.

However, it is really only in playback that the dynamic range is important.
If you want to listen to opera, a high dynamic range is preferable. For
sound input, especially speech, I reckon that 8 bit should be fine. Some
information might be lost, but I would be surprised if it made a big
difference.
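
To get a feel for how much is lost, here is a rough sketch that quantizes
a near-full-scale sine wave and measures the signal-to-noise ratio. The
test tone, the rounding scheme and the function names are all my own
assumptions; a sine is not speech, so this says nothing definite about
recognition accuracy.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1) to the nearest of 2**bits uniformly spaced levels."""
    levels = 2 ** (bits - 1)
    q = round(x * levels)
    q = max(-levels, min(levels - 1, q))  # clip to the representable range
    return q / levels

def sine_snr_db(bits, n=10000, cycles=37):
    """SNR in dB of a quantized test sine against the unquantized original."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = 0.99 * math.sin(2.0 * math.pi * cycles * i / n)
        err = quantize(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)

for bits in (8, 12, 16):
    print("%2d bits: about %.1f dB SNR" % (bits, sine_snr_db(bits)))
```

Quiet passages use only a fraction of the full scale, so their effective
SNR is lower than the figure for a full-scale tone.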

Paul Taylor
ATR, Japan.

