Archive-name: comp-speech-faq/part2
Last-modified: 1995/01/19


              COMP.SPEECH FAQ POSTING - PART 2/3


[Note: this document has been automatically extracted from a WWW site:
        http://www.speech.su.oz.au/comp.speech
This may introduce some formatting errors.]



===========================================================================

   
FAQ SECTION 2 - Signal Processing for Speech

  Q2.1: WHAT SAMPLING DO I NEED FOR SPEECH?
  
   For recorded speech to be understood by humans you need an 8kHz
   sampling rate or more and at least 8 bit sampling. This produces poor
   quality speech - but it can be understood.
   
   Improvements can be achieved by increasing the number of bits in
   sampling to 12bits or 16bits, or by using a non-linear encoding
   technique such as mu-law or A-law (see Q2.7). This improves the
   "signal-to-noise" ratio.
   
   Increasing the sampling rate above 8kHz, say to 10kHz, 16kHz or 20kHz,
   improves the frequency response: the higher the sampling frequency the
   better the high frequency content will be. A 16kHz sampling rate is a
   reasonable target for high quality speech recording and playback.
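
   The two effects described above can be put into rough numbers. The
   sketch below uses the common textbook approximation for an ideal
   linear quantiser driven by a full-scale sine wave (about 6.02 dB per
   bit plus 1.76 dB) and the Nyquist limit (a sampling rate of fs can
   only represent frequencies up to fs/2). The function names are made
   up for illustration, and real recordings will fall short of the ideal
   SNR figures.

```c
/* Rule-of-thumb SNR (in dB) of an ideal N-bit linear quantiser with a
 * full-scale sine-wave input. Illustrative only: it ignores microphone
 * noise, clipping and non-linear encodings such as mu-law. */
double quant_snr_db(int bits)
{
    return 6.02 * (double)bits + 1.76;
}

/* Highest frequency (Hz) that a given sampling rate can represent. */
long nyquist_hz(long sample_rate_hz)
{
    return sample_rate_hz / 2;
}

/* Uncompressed data rate in bytes per second for one channel.
 * Assumption: samples are stored in whole bytes, so a 12-bit sample
 * occupies 2 bytes. */
long bytes_per_second(long sample_rate_hz, int bits_per_sample)
{
    int bytes_per_sample = (bits_per_sample + 7) / 8;
    return sample_rate_hz * (long)bytes_per_sample;
}
```

   Under these assumptions 8 bit sampling gives roughly 50 dB and 16 bit
   sampling roughly 98 dB, while moving from 8kHz to 16kHz sampling
   raises the upper frequency limit from 4kHz to 8kHz.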
   
   When doing speech recognition you need to remember that your
   computer is not as good as your ear so it will have trouble with poor
   quality sounds. The choice of an appropriate sampling setup depends
   very much on the speech recognition task and the amount of computer
   power available.
     _________________________________________________________________
   
  Q2.2: HOW DO I FIND THE PITCH OF A SPEECH SIGNAL?
  
   This topic comes up regularly in the comp.dsp newsgroup. Question 2.5
   of the FAQ posting for comp.dsp gives a comprehensive list of
   references on the definition, perception and processing of pitch.
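
   For a first experiment, one widely used approach is to pick the lag
   that maximises the autocorrelation of a short voiced frame. The
   following is only a minimal sketch of that idea and is not taken from
   the comp.dsp FAQ: the function name and its parameters are
   illustrative, and it makes no voicing decision and does no windowing
   or octave-error correction.

```c
#include <stddef.h>

/* Estimate the pitch (Hz) of one frame by autocorrelation peak picking.
 *   x          frame of samples
 *   n          number of samples in the frame
 *   fs         sampling rate in Hz
 *   fmin,fmax  pitch search range in Hz
 * Returns 0.0 if the arguments are unusable or no positive peak is found. */
double estimate_pitch_hz(const short *x, size_t n, double fs,
                         double fmin, double fmax)
{
    size_t min_lag, max_lag, lag, i, best_lag = 0;
    double best = 0.0;

    if (n < 2 || fs <= 0.0 || fmin <= 0.0 || fmax < fmin)
        return 0.0;

    min_lag = (size_t)(fs / fmax);   /* shortest period to try */
    max_lag = (size_t)(fs / fmin);   /* longest period to try  */
    if (min_lag < 1) min_lag = 1;
    if (max_lag >= n) max_lag = n - 1;

    for (lag = min_lag; lag <= max_lag; lag++) {
        double r = 0.0;
        for (i = 0; i + lag < n; i++)
            r += (double)x[i] * (double)x[i + lag];
        /* Strict '>' keeps the shortest lag when two peaks are equal. */
        if (r > best) {
            best = r;
            best_lag = lag;
        }
    }
    return best_lag ? fs / (double)best_lag : 0.0;
}
```

   A typical call for 8kHz speech might search between 50 Hz and 400 Hz
   on frames of a few hundred samples. The references mentioned above
   describe more robust methods.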
     _________________________________________________________________
   
  Q2.3: HOW DO I FIND THE START AND END POINTS OF A SPEECH SIGNAL?
  
   A large number of papers have been presented on this task. Try the
   following papers:
     * Rabiner LR, Sambur MR, "An Algorithm for Determining the Endpoints
       of Isolated Utterances", Bell System Technical Journal, Vol 54,
       No. 2, pp 297-315, 1975.
     * Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans
       on Communications, Vol 26, No 1, Jan 78, pp. 140-145.
     * Newman, W.C. "Detecting Speech with an Adaptive Neural Network."
       Electronic Design. 22 March 1990.
     * Taboada, J. et al. "Explicit Estimation of Speech Boundaries." IEE
       Proc. Sci. Meas. Technol., Vol 141, No.3, May 1994 pp153-159.
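
   As a starting point before reading the papers above, the sketch below
   marks the first and last frames whose average magnitude exceeds a
   threshold. It is loosely in the spirit of energy-based detectors but
   is much simpler than any of the published algorithms: there is no
   zero-crossing measure and no noise-floor estimate, and the caller
   must supply the threshold. The function name and parameters are
   illustrative.

```c
#include <stddef.h>

/* Find the speech region using a fixed threshold on the mean absolute
 * amplitude of non-overlapping frames.
 * On success returns 1 and stores sample indices in *start (inclusive)
 * and *end (exclusive). Returns 0 if no frame exceeds the threshold.
 * Any trailing partial frame is ignored. */
int find_endpoints(const short *x, size_t n, size_t frame_len,
                   double threshold, size_t *start, size_t *end)
{
    size_t pos, i;
    int found = 0;

    if (frame_len == 0)
        return 0;

    for (pos = 0; pos + frame_len <= n; pos += frame_len) {
        double sum = 0.0;
        for (i = 0; i < frame_len; i++) {
            int s = x[pos + i];
            sum += (double)(s < 0 ? -s : s);
        }
        if (sum / (double)frame_len > threshold) {
            if (!found) {
                *start = pos;        /* first loud frame */
                found = 1;
            }
            *end = pos + frame_len;  /* extends to the last loud frame */
        }
    }
    return found;
}
```

   In practice the threshold would be set from a stretch of background
   noise, and weak fricatives at the start or end of a word are easily
   missed by a magnitude test alone.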
       
     _________________________________________________________________
   
  Q2.4: WHERE CAN I FIND FFT SOFTWARE?
  
   Try the following file available by anonymous ftp. It contains a
   series of optimised fft routines, including mixed-radix algorithms.
   The .gz suffix indicates GNU zip format.
     * ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz
       
     _________________________________________________________________
   
  Q2.5: WHAT SIGNAL PROCESSING TECHNIQUES ARE USED IN SPEECH TECHNOLOGY?
  
   This question is far too big to be answered in a FAQ posting.
   Fortunately there are many good books which answer the question. Some
   good introductory books include
     * Digital processing of speech signals; L. R. Rabiner, R. W.
       Schafer. Englewood Cliffs; London: Prentice-Hall, 1978
     * Voice and Speech Processing; T. W. Parsons. New York; McGraw Hill
       1986
     * Computer Speech Processing; ed Frank Fallside, William A. Woods
       Englewood Cliffs: Prentice-Hall, c1985
     * Digital speech processing : speech coding, synthesis, and
       recognition edited by A. Nejat Ince; Kluwer Academic Publishers,
       Boston, c1992
     * Speech science and technology; edited by Shuzo Saito pub. Ohmsha,
       Tokyo, c1992
     * Speech analysis; edited by Ronald W. Schafer, John D. Markel New
       York, IEEE Press, c1979
     * Douglas O'Shaughnessy -- Speech Communication: Human and Machine
       Addison Wesley series in Electrical Engineering: Digital Signal
       Processing, 1987.
     * Discrete-time processing of speech signals; John R Deller, John G
       Proakis, John H L Hansen; Macmillan 1993.
     * Signal processing of speech; F J Owens; Macmillan 1993.
       
     _________________________________________________________________
   
  Q2.6: WHAT SPEECH SAMPLING AND SIGNAL PROCESSING HARDWARE CAN I USE?
  
   In addition to the following information, have a look at the Audio
   File format document prepared by Guido van Rossum (see details in
   Section 1.8).
   
   Can anyone provide information on Mac, SGI, NeXT and other hardware?
   
    Sun standard audio port: SPARC I & II
     * Input and Output: 1 channel, 8 bit mu-law encoded, 8kHz sample
       rate. This provides telephone quality sampling.
       
    Sun standard audio port (SPARC 10 & 20)
     * Input and Output: Stereo (2 channels). 16-bit linear sampling.
       Multiple sample rates (48000, 44100, 37800, 32000, 22050, 18900,
       16000, 11025, 9600, 8000 Hz)
       
    Macintosh Audio Hardware - an overview
     * Description: ALL Macintosh computers come with the ability to
       play back sounds at any sample rate (sample rate conversion is
       done in software.) Older machines have 8 bit stereo output
       (hardware runs at 22254 samples/second). The newer machines have
       16 bit stereo hardware running at 44100 samples/second.
       
       Most of the recent Macintosh computers come with sound input
       hardware. There are probably exceptions to this, but the older and
       some of the current low-end machines have 8 bit (linear) mono
       hardware running at 22254.54 samples/second. All of the PowerPC,
       AV, and the 500 series notebook computers come with 16 bit 44kHz
       stereo sampling hardware. They can also record at 22050
       samples/second. The sound manager implements an AGC (Automatic
       Gain Control) function for the 8 bit hardware. The drivers have a
       switch to turn off the AGC.
       
       There are a number of DSP vendors that support high quality audio.
       Generally this means quieter analog sections, and more IO formats
       (AES/EBU, for example). Try DigiDesign and Spectral Innovations.
       
       The software drivers for sound are described in "Inside Macintosh:
       Sound". If you want to see some sample code check out the sources
       for the Matlab "Sound and Image Toolbox". They can be found at
          + ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx
   
       Routines that play and record sounds using the toolbox are
       included (and interfaced to Matlab).
       
    Ariel Signal Processors
     * Platform: Various
     * Description: A range of signal I/O, A/D, D/A and DSP products
       are available. There are too many to list.
     * Contact:
    Ariel Corp.
    433 River Road, Highland Park, NJ 08904.
    Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124
    
    IBM RS/6000 ACPA (Audio Capture and Playback Adapter)
     * Description: The card supports PCM, Mu-Law, A-Law and ADPCM at
       44.1kHz (& 22.05, 11.025, 8kHz) with 16-bits of resolution in
       stereo. The card has a built-in DSP (don't know which one). The
       device also supports various formats for the output data, like
       big-endian, twos complement, etc. Good noise immunity.
       
       The card is used for IBM's VoiceServer (they use the DSP for
       speech recognition). Apparently, the IBM voiceserver has a
       speaker-independent vocabulary of over 20,000 words and each ACPA
       can support two independent sessions at once.
     * Cost: $US495
     * Contact: ?
       
    Sound Galaxy NX , Aztech Systems
     * Platform: PC - DOS,Windows 3.1
     * Cost: ?
     * Input: 8bit linear, 4-22 kHz.
     * Output: 8bit linear, 4-44.1 kHz
     * Misc: 11-voice FM Music Synthesizer YM3812; Built-in power
       amplifier; DSP signal processing support - ST70019SB Hardware
       ADPCM decompression (2:1,3:1,4:1) "AdLib" and "Sound Blaster"
       compatibility. Software includes a simple Text-to-Speech program
       "Monologue".
       
    Sound Galaxy NX PRO, Aztech Systems
     * Platform: PC - DOS,Windows 3.1
     * Cost: ?
     * Input: 2 * 8bit linear, 4-22.05 kHz(stereo), 4-44.1 KHz(mono).
     * Output: 2 * 8bit linear, 4-44.1 kHz(stereo/mono)
     * Misc: 20-voice FM Music Synthesizer; Built-in power amplifier;
       Stereo Digital/Analog Mixer; Configuration in EEPROM. Hardware
       ADPCM decompression (2:1,3:1,4:1). Includes DSP signal processing
       support. "AdLib" and "Sound Blaster Pro II" compatibility.
       Software includes a simple Text-to-Speech program "Monologue" and
       Sampling laboratory for Windows 3.1: WinDAT.
     * Contact: USA (510)6238988
       
    ATI Stereo F/X Sound Board
     * Platform: PC XT or AT - DOS, Windows 3.0, 3.1
     * Cost: $120 Canadian
     * Description: Input - 8 bit ADC, 44.1 kHz mono, 22.05 kHz Stereo.
       Output - Dynamic range = 48 dB, 32 anti-aliasing filters. Adds
       Stereo effect to existing mono Adlib or Sound Blaster apps.
       11-voice YAMAHA FM Music Synthesizer. Built-in 8 watt power
       amplifier, 4 watts per channel. Volume ctrl on rear. 2 Joystick
       input, software setup (no switches), software included. "AdLib"
       and "Sound Blaster" compatibility. DMA support for high speed
       digital audio. ADPCM decomp @ 4:1, 3:1, 2:1. Will play .WAV files.
       Optional MIDI I/O port $79. (MIDI IN, OUT, THRU, and sequencer).
     * Contact:
    ATI Technologies Inc.
    3761 Victoria Park Avenue, Scarborough, Ontario
    CANADA, M1W 3S2
    Ph: (416) 756-0711 Fax: (416) 756-0720
    BBS: (416) 764-9404 (9600 baud N.8.1)
    
    Other PC Sound Cards
============================================================================
sound          stereo/mono              compatible     included   voices
card           & sample rate            with           ports
============================================================================
Adlib Gold     stereo: 8-bit 44.1khz    Adlib ?        audio      20 (opl3)
1000                  16-bit 44.1khz                   in/out,    +2 digital
               mono: 8-bit 44.1khz                     mic in,    channels
                    16-bit 44.1khz                     joystick,
                                                       MIDI

Sound Blaster  mono: 8-bit 22.1khz      Adlib          audio       11 synth.
               FM synth with                           in/out,
               2 operators                             joystick,

Sound Blaster  stereo: 8-bit 22.05khz   Adlib          audio       22
Pro Basic      mono: 8-bit 44.1khz      Sound Blaster  in/out,
                                                       joystick,

Sound Blaster  stereo: 8-bit 22.05khz   Adlib          audio       11
Pro            mono: 8-bit 44.1khz      Sound Blaster  in/out
                                                       joystick,
                                                       MIDI, SCSI

Sound Blaster  stereo: 8-bit 4-44.1khz  Sound Blaster  audio       20
16 ASP         stereo: 16-bit 4-44.1khz                in/out,
                                                       joystick,
                                                       MIDI

Audio Port     mono: 8-bit 22.05khz     Adlib          audio       11
                                        Sound Blaster  in/out,
                                                       joystick

Pro Audio      stereo: 8-bit 44.1khz    Adlib          audio,      20
Spectrum +                              Pro Audio      in/out,
                                        Spectrum       joystick

Pro Audio      stereo: 16-bit 44.1khz   Adlib          audio       20
Spectrum 16                             Pro Audio      in/out,
                                        Spectrum       joystick,
                                        Sound Blaster  MIDI, SCSI

Thunder Board  stereo: 8-bit 22khz      Adlib          audio       11
                                        Sound Blaster  in/out,
                                                       joystick

Gravis         stereo: 8-bit 44.1khz    Adlib,         audio line  32 sampled
Ultrasound     mono: 8-bit 44.1khz      Sound Blaster  in/out,     32 synth.
                                                       amplified
                                                       out,
               (w/16-bit daughtercard)                 mic in, CD
               stereo: 16-bit 44.1khz                  audio in,
               mono: 16-bit 44.1khz                    daughterboard
                                                       ports (for
                                                       SCSI and
                                                       16-bit)

MultiSound     stereo: 16-bit 44.1kHz   Nothing        audio       32 sampled
               64x oversampling                        in/out,
                                                       joystick,
                                                       MIDI

=============================================================================

     _________________________________________________________________
   
  Q2.7: HOW DO I CONVERT TO/FROM MU-LAW FORMAT?
  
   Mu-law coding is a form of compression for audio signals including
   speech. It is widely used in the telecommunications field because it
   improves the signal-to-noise ratio without increasing the amount of
   data. Typically, mu-law compressed speech is carried in 8-bit samples.
   It is a companding technique, which means that it carries more
   information about the smaller signals than about larger signals.
   
   On SUN Sparc systems have a look in the directory /usr/demo/SOUND.
   Included are table lookup macros for ulaw conversions. [Note however
   that not all systems will have /usr/demo/SOUND installed as it is
   optional - see your system admin if it is missing.]
   
   OR, here is some sample conversion code in C.
/**
 ** Signal conversion routines for use with Sun4/60 audio chip
 **/

#include <stdio.h>

unsigned char linear2ulaw(/* int */);
int ulaw2linear(/* unsigned char */);

/*
** This routine converts from linear to ulaw
**
** Craig Reese: IDA/Supercomputing Research Center
** Joe Campbell: Department of Defense
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711  (very difficult to follow)
** 2) "A New Digital Technique for Implementation of Any
**     Continuous PCM Companding Law," Villeret, Michel,
**     et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
**     1973, pg. 11.12-11.17
** 3) MIL-STD-188-113,"Interoperability and Performance Standards
**     for Analog-to_Digital Conversion Techniques,"
**     17 February 1987
**
** Input: Signed 16 bit linear sample
** Output: 8 bit ulaw sample
*/

#define ZEROTRAP    /* turn on the trap as per the MIL-STD */
#define BIAS 0x84   /* define the add-in bias for 16 bit samples */
#define CLIP 32635

unsigned char
linear2ulaw(sample)
int sample; {
  static int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
                             4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
                             5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                             5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
  int sign, exponent, mantissa;
  unsigned char ulawbyte;

  /* Get the sample into sign-magnitude. */
  sign = (sample >> 8) & 0x80;          /* set aside the sign */
  if (sign != 0) sample = -sample;              /* get magnitude */
  if (sample > CLIP) sample = CLIP;             /* clip the magnitude */

  /* Convert from 16 bit linear to ulaw. */
  sample = sample + BIAS;
  exponent = exp_lut[(sample >> 7) & 0xFF];
  mantissa = (sample >> (exponent + 3)) & 0x0F;
  ulawbyte = ~(sign | (exponent << 4) | mantissa);
#ifdef ZEROTRAP
  if (ulawbyte == 0) ulawbyte = 0x02;   /* optional CCITT trap */
#endif

  return(ulawbyte);
}

/*
** This routine converts from ulaw to 16 bit linear.
**
** Craig Reese: IDA/Supercomputing Research Center
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711  (very difficult to follow)
** 2) MIL-STD-188-113,"Interoperability and Performance Standards
**     for Analog-to_Digital Conversion Techniques,"
**     17 February 1987
**
** Input: 8 bit ulaw sample
** Output: signed 16 bit linear sample
*/

int
ulaw2linear(ulawbyte)
unsigned char ulawbyte;
{
  static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
  int sign, exponent, mantissa, sample;

  ulawbyte = ~ulawbyte;
  sign = (ulawbyte & 0x80);
  exponent = (ulawbyte >> 4) & 0x07;
  mantissa = ulawbyte & 0x0F;
  sample = exp_lut[exponent] + (mantissa << (exponent + 3));
  if (sign != 0) sample = -sample;

  return(sample);
}
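
   To see the companding behaviour in numbers, the sketch below restates
   the two routines above in compact ANSI C and measures the round-trip
   error. It assumes the same bias (0x84) and clip (32635) values,
   derives the exponent with a loop instead of the lookup table, and
   leaves out the optional zero trap. The mulaw_* names are illustrative
   and do not come from the original code.

```c
#define MULAW_BIAS 0x84    /* same add-in bias as the code above */
#define MULAW_CLIP 32635   /* same clip level as the code above  */

/* Encode one signed 16 bit linear sample as an 8 bit mu-law byte. */
unsigned char mulaw_encode(int sample)
{
    int sign = (sample < 0) ? 0x80 : 0;
    int exponent = 7;
    int mask = 0x4000;
    int mantissa;

    if (sign) sample = -sample;
    if (sample > MULAW_CLIP) sample = MULAW_CLIP;
    sample += MULAW_BIAS;

    /* The exponent is set by the highest set bit among bits 14..8. */
    while (exponent > 0 && !(sample & mask)) {
        exponent--;
        mask >>= 1;
    }
    mantissa = (sample >> (exponent + 3)) & 0x0F;
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

/* Decode one 8 bit mu-law byte to a signed 16 bit linear sample. */
int mulaw_decode(unsigned char code)
{
    int u = ~code & 0xFF;
    int exponent = (u >> 4) & 0x07;
    int mantissa = u & 0x0F;
    int sample = ((MULAW_BIAS + (mantissa << 3)) << exponent) - MULAW_BIAS;

    return (u & 0x80) ? -sample : sample;
}

/* Absolute error introduced by encoding and then decoding a sample. */
int mulaw_roundtrip_error(int sample)
{
    int err = mulaw_decode(mulaw_encode(sample)) - sample;
    return err < 0 ? -err : err;
}
```

   With this sketch a sample of 100 comes back as 104 (an error of 4),
   while a sample of 20000 comes back as 19836 (an error of 164): small
   signals are represented much more finely than large ones.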

     _________________________________________________________________


===========================================================================

   
FAQ SECTION 3 - Speech Coding and Compression

  Q3.1: SPEECH COMPRESSION TECHNIQUES.
  
   Can anyone provide a 1-2 page summary on speech compression?
   
   Note: the FAQ for comp.compression includes a few questions and
   answers on the compression of speech.
     _________________________________________________________________
   
  Q3.2: WHAT ARE SOME GOOD REFERENCES/BOOKS ON CODING/COMPRESSION?
     * Douglas O'Shaughnessy -- Speech Communication: Human and Machine
       Addison Wesley series in Electrical Engineering: Digital Signal
       Processing, 1987.
     * Bishnu Atal in ed. Fallside, F. and W. Woods, ed. Computer Speech
       Processing. London: Prentice/Hall International, 1985.
     * Makhoul, J. "Linear Prediction: A Tutorial Review." Proc. of the
       IEEE 63 (1975): 561 - 580.
       
     _________________________________________________________________
   
  Q3.3: WHAT SPEECH COMPRESSION/CODING SOFTWARE IS AVAILABLE?
  
   Note: there are two types of speech compression technique referred to
   below. Lossless techniques preserve the speech exactly through a
   compression-decompression cycle. Lossy techniques do not preserve the
   speech perfectly. As a general rule, the more you compress speech, the
   more the quality degrades.
   
    File format conversion
     * Platform: SUN OS?
     * Description: Conversion utility able to encode and decode
       between the following formats: G.723, G.721, A-law, u-law and
       linear.
     * Availability: By anonymous ftp from
          + ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
            
    shorten - a lossless compressor for speech signals
     * Platform: UNIX/DOS
     * Description: A fast waveform coder suitable for speech and
       music signals in a wide variety of file formats. The degree of
       compression is adjustable from lossless to three bits a sample.
       16bit 16kHz speech generally attains 50% lossless compression and
       16:3 compression of CDROM quality speech is obtainable with only
       minor audible degradation.
     * Availability: Anonymous ftp - UNIX and DOS versions are in
          + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/shorten-1.14.tar.Z
          + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/shn114.zip
            
    32 kbps ADPCM
     * Platform: SGI and Sun Sparcs
     * Description: 32 kbps ADPCM C-source code (G.721 compatibility is
       uncertain)
     * Contact: Jack Jansen
     * Availability: Anonymous ftp
          + ftp://ftp.cwi.nl/pub/adpcm.shar
            
    GSM 06.10 Compression
     * Platform: Unix; faster than real time on most Sun SPARCstations
     * Description: GSM 06.10 is a standardized lossy speech
       compression employed by most European wireless telephones. It uses
       RPE/LTP (regular pulse excitation/long term prediction) coding to
       compress frames of 160 13-bit samples (8 kHz sampling rate, i.e. a
       frame rate of 50 Hz) into 260 bits.
     * Contact: GSM 06.10 support and implementation
       jutta@cs.tu-berlin.de, cabo@cs.tu-berlin.de
     * Availability: The following configurations are available by
       anonymous ftp:
          + gzip compression from Germany:
            ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.5.tar.gz
          + MS-DOS compression from Germany:
            ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-105.zip
          + MS-DOS compression from USA:
            ftp://ftp.mv.com/pub/ddj/1194.12/gsm-105.zip
     * Misc: The WWW site is
          + http://www.cs.tu-berlin.de/~jutta/toast.html
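
   The frame figures quoted in the description above translate directly
   into bit rates. The helper below is a small illustrative sketch (the
   function name is made up) that converts a frame size into bits per
   second.

```c
/* Bit rate in bits per second for a coder that emits bits_per_frame
 * bits for every samples_per_frame input samples, with the input
 * sampled at sample_rate_hz. */
long frame_bit_rate(long bits_per_frame, long samples_per_frame,
                    long sample_rate_hz)
{
    return bits_per_frame * sample_rate_hz / samples_per_frame;
}
```

   For GSM 06.10, 260 bits per 160-sample frame at 8 kHz gives 13000
   bits per second. The 13-bit input (160 * 13 bits per frame) amounts
   to 104000 bits per second, so the coder achieves an 8:1 reduction.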
            
    G.711/721/723 Compression
     * Description:
          + G.711 : CCITT u-law and A-law compression
          + G.721 : CCITT 32 kbps ADPCM coder
          + G.723 : CCITT 24 kbps and 40 kbps ADPCM coders
     * Availability: By email to teledoc@itu.arcom.ch, with
                GET ITU-3022
   as the *only* line in the body of the message. This is also available
       by anonymous ftp from:
          + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/G711_G721_G723.tar.Z
            
    G.728 Compression
     * Description: G.728 low delay celp package written by Alex
       Zatsman of Analog Devices, Inc.
     * Availability: By anonymous ftp from
          + ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz
            
    G.728 LD-CELP vocoder
     * Platform: Analog Devices ADSP-2171
     * Description: Real-time, full-duplex G.728 LD-CELP vocoder that
       runs on a single Analog Devices ADSP-2171. Source and object code
       available for a one-time license fee.
     * Contact:
    Cole Erskine
    Analogical Systems
    299 California Avenue, Suite 120
    Palo Alto, CA 94306, USA
    Tel:(415) 323-3232 FAX:(415) 323-4222
    Internet: cole@analogical.com
    
    U.S.F.S. 1016 CELP vocoder for DSP56001
     * Platform: DSP56001
     * Description: Real-time U.S.F.S. 1016 CELP vocoder that runs on a
       single 27MHz Motorola DSP56001. Free demo software available for
       PC-56 and PC-56D. Source and object code available for a one-time
       license fee.
     * Contact:
    Cole Erskine
    Analogical Systems
    299 California Avenue, Suite 120
    Palo Alto, CA 94306, USA
    Tel:(415) 323-3232 FAX:(415) 323-4222
    Email: cole@analogical.com
    
    8 Kbit/s CELP on the TMS320C5x family of DSP chips
     * Description: For low bandwidth transmission of voice, compact
       voice storage for archival purposes, low-cost digital answering
       machines and efficient storage for voice mail. Features :
          + near toll quality at 8 Kb/s.
          + Variable rate option with 1 Kb/s silence encoding.
          + Implemented on a fixed-point processor for lower system cost.
          + Attractive licensing scheme.
          + Future availability of 4 Kb/s.
          + Custom rates possible.
   Capacity :
          + Two half-duplex or one full duplex channels on the 20 MIPS
            'C5x (at 95% and 55% CPU utilization respectively).
          + Two full duplex channels on the 28.6 MIPS 'C5x (at 77% CPU
            utilization).
          + Requires 9 K-words program memory and 3 K-words data memory.
          + Decoding in real-time on a 486 class CPU.
     * Contact:
    CVI Inc.
    443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
    Tel: (604) 987 1719 Fax: (604) 986 8139
    Email: cvi@extropia.wimsey.com
    
    CELP 3.2a & LPC
     * Platform: Sun (the makefiles & source can be modified for other
       platforms)
     * Description: CELP is a lossy compression technique. The U.S. DoD's
       Federal-Standard-1016 based 4800 bps code excited linear
       prediction voice coder version 3.2a (CELP 3.2a) Fortran and C
       simulation source codes. Available for worldwide distribution (on
       DOS diskettes, but configured to compile on Sun SPARC stations)
       from NTIS and DTIC. Example input and processed speech files are
       included. A Technical Information Bulletin (TIB), "Details to
       Assist in Implementation of Federal Standard 1016 CELP," and the
       official standard, "Federal Standard 1016, Telecommunications:
       Analog to Digital Conversion of Radio Voice by 4,800 bit/second
       Code Excited Linear Prediction (CELP)," are also available.
     * Availability 1: Through the National Technical Information
       Service:
    NTIS
    U.S. Department of Commerce
    5285 Port Royal Road, Springfield, VA 22161, USA
   
       The "AD" ordering number for the CELP software is AD M000 118 (US$
       90.00) and for the TIB it's AD A256 629 (US$ 17.50). The LPC-10
       standard, described below, is FIPS Pub 137 (US$ 12.50). There is a
       $3.00 shipping charge on all U.S. orders. The telephone number for
       their automated system is 703-487-4650, or 703-487-4600 if you'd
       prefer to talk with a real person.
       
       (U.S. DoD personnel and contractors can receive the package from
       the Defense Technical Information Center: DTIC, Building 5,
       Cameron Station, Alexandria, VA 22304-6145. Their telephone number
       is 703-274-7633.)
     * Availability 2: By anonymous ftp from:
          + ftp://ftp.super.org(192.31.192.1)/pub/celp_3.2a.tar.Z
          + OR
            ftp://svr-ftp.eng.cam.ac.uk/comp.speech/sources/celp_3.2a.tar.Z
     * Misc: The following articles describe the Federal-Standard-1016
       4.8-kbps CELP coder (it's unnecessary to read more than one):
          + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
            Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder,"
            Digital Signal Processing, Academic Press, 1991, Vol. 1, No.
            3, p. 145-155.
          + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
            Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard
            1016)," in Advances in Speech Coding, ed. Atal, Cuperman and
            Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p.
            121-133.
          + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
            Welch, "The Proposed Federal Standard 1016 4800 bps Voice
            Coder: CELP," Speech Technology Magazine, April/May 1990, p.
            58-64.
   
       The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400
       bps linear prediction coder (LPC-10) was republished as a Federal
       Information Processing Standards Publication 137 (FIPS Pub 137).
       It is described in:
          + Thomas E. Tremain, "The Government Standard Linear Predictive
            Coding Algorithm: LPC-10," Speech Technology Magazine, April
            1982, p. 40-49.
   
       There is also a section about FS-1015 in the book:
          + Panos E. Papamichalis, Practical Approaches to Speech Coding,
            Prentice-Hall, 1987.
   
       The voicing classifier used in the enhanced LPC-10 (LPC-10e) is
       described in:
          + Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/ Unvoiced
            Classification of Speech with Applications to the U.S.
            Government LPC-10E Algorithm," Proceedings of the IEEE
            International Conf. on Acoustics, Speech, and Signal
            Processing, 1986, p. 473-6.
   Copies of the official standard, "Federal Standard 1016,
       Telecommunications: Analog to Digital Conversion of Radio Voice by
       4,800 bit/second Code Excited Linear Prediction (CELP)" are
       available for US$ 5.00 each from:
    GSA Federal Supply Service Bureau
    Specification Section, Suite 8100
    470 E. L'Enfant Place, S.W.
    Washington, DC 20407
    (202)755-0325
   Realtime DSP code for FS-1015 and FS-1016 is sold by:
    John DellaMorte, DSP Software Engineering
    165 Middlesex Tpk, Suite 206, Bedford, MA 01730, USA
    Ph: 1-617-275-3733 Fax: 1-617-275-4323
    dspse.bedford@channel1.com
   DSP Software Engineering's FS-1016 code can run on DSP Research's
       Tiger 30 (a PC board with a TMS320C3x and analog interface suited
       to development work).
    DSP Research
    1095 E. Duane Ave, Sunnyvale, CA 94086, USA
    Ph: (408)773-1042 Fax: (408)736-3451
    
     _________________________________________________________________


===========================================================================

   
FAQ SECTION 4 - Natural Language Processing

   There is now a newsgroup specifically for Natural Language Processing.
   It is called comp.ai.nat-lang.
   
   There is also a lot of useful information on Natural Language
   Processing in the FAQ for comp.ai. That FAQ lists available software
   and useful references. It includes a substantial list of software,
   documentation and other info available by ftp.
     _________________________________________________________________
   
  Q4.1: WHAT ARE SOME GOOD REFERENCES/BOOKS ON NLP?
  
   Take a look at the FAQ for the "comp.ai" newsgroup as it also includes
   some useful references.
     * James Allen: Natural Language Understanding, (Benjamin/Cummings
       Series in Computer Science) Menlo Park: Benjamin/Cummings
       Publishing Company, 1987.
          + This book consists of four parts: syntactic processing,
            semantic interpretation, context and world knowledge, and
            response generation.
     * G. Gazdar and C. Mellish, Natural Language Processing in Prolog,
       Addison Wesley, 1989
     * G. Gazdar and C. Mellish, Natural Language Processing in Lisp,
       Addison Wesley, 1989
     * G. Gazdar and C. Mellish, Natural Language Processing in Pop11,
       Addison Wesley, 1989
          + Emphasis on parsing, especially unification-based parsing,
            lots of details on the lexicon, feature propagation, etc.
            Fair coverage of semantic interpretation, inference in
            natural language processing, and pragmatics; much less
            extensive than in Allen's book, but more formal. There are
            three versions, one for each programming language listed
            above, with complete code.
     * Shapiro, Stuart C.: Encyclopedia of Artificial Intelligence Vol.1
       and 2. New York: John Wiley & Sons, 1990.
          + There are articles on the different areas of natural language
            processing which also give additional references.
     * Paris, Ce'cile L.; Swartout, William R.; Mann, William C.:
       Natural Language Generation in Artificial Intelligence and
       Computational Linguistics. Boston: Kluwer Academic Publishers,
       1991.
          + The book describes the most current research developments in
            natural language generation and all aspects of the generation
            process are discussed. The book is comprised of three
            sections: one on text planning, one on lexical choice, and
            one on grammar.
     * Readings in Natural Language Processing, ed by B. Grosz, K.
       Sparck Jones and B. Webber, Morgan Kaufmann, 1986
          + A collection of classic papers on Natural Language
            Processing. Fairly complete at the time the book came out
            (1986) but now seriously out of date. Still useful for ATNs,
            etc.
     * Klaus K. Obermeier, Natural Language Processing Technologies in
       Artificial Intelligence: The Science and Industry Perspective,
       Ellis Horwood Ltd, John Wiley & Sons, Chichester, England, 1989.
       
    Journals
    
   The major journals of the field are
     * Computational Linguistics and Cognitive Science for the
       artificial intelligence aspects,
     * Cognition for the psychological aspects,
     * Language, Linguistics and Philosophy, and Linguistic Inquiry
       for the linguistic aspects.
     * Artificial Intelligence occasionally has papers on natural
       language processing.
       
    Conferences
    
   The major conferences of the field are
     * ACL (held every year), and
     * COLING (held every two years).
       
   Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the
   Cognitive Science Society conferences are usually the most
   interesting for NLP. CUNY is an important psycholinguistic
   conference. There are many linguistics conferences; the most
   important seem to be NELS, the conference of the Chicago Linguistic
   Society (CLS), WCCFL, LSA, the Amsterdam Colloquium, and SALT.
       
     _________________________________________________________________
   
  Q4.2: WHAT NLP SOFTWARE IS AVAILABLE?
  
   Check the comments at the start of this section for information on
   other newsgroups and sources of information on NLP.
   
    Natural Language Software Registry (NLSR) - NLP Tools
     * The Natural Language Software Registry is available from the
       German Research Institute for Artificial Intelligence (DFKI) in
       Saarbruecken. Its purpose is to facilitate the exchange and
       evaluation of natural language processing software within the
       research community. To this end, the NLSR is cataloging natural
       language software projects, both commercial and non-commercial.
       The updated and enlarged version contains more than 100
       descriptions of natural language processing software. Registry
       listings include:
          + speech signal processors, such as the Computerized Speech Lab
            (Kay Elemetrics)
          + morphological analyzers, such as PC-KIMMO (Summer Institute
            of Linguistics)
          + parsers, such as Alveytools (University of Edinburgh)
          + semantic and pragmatic analyzers, such as NLL (University of
            the Saarland, Germany)
          + generation programs, such as FUF (Ben Gurion University of
            the Negev)
          + knowledge representation systems, such as Rhet (University of
            Rochester)
          + multicomponent systems, such as ELU (ISSCO), PENMAN (ISI),
            Pundit (UNISYS), SNePS (SUNY Buffalo)
          + NLP-Tools, such as GULP (University of Georgia) or Linguist
            (Kansai Research Laboratory)
          + application programs (misc.)
     * If you have developed a piece of software for natural language
       processing that other researchers might find useful, you can
       include it by returning the questionnaire available from the
       sources below.
     * ftp://ftp.dfki.uni-sb.de/pub/registry
     * e-mail: registry@dfki.uni-sb.de
     * post:
    Natural Language Software Registry
    Deutsches Forschungsinstitut fuer Kuenstliche Intelligenz (DFKI)
    Stuhlsatzenhausweg 3
    D-66123 Saarbruecken
    Germany
     * Other ftp sites are
          + ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy
          + ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry
            
    Part of Speech Tagger
     * Description: A rule-based part-of-speech tagger developed by
       Eric Brill. For a detailed description of the tagger, see
       chapter 6 of his thesis.
     * Availability: The tagger and description are available by
       anonymous ftp from
          + ftp://lightning.lcs.mit.edu/pub/BRILL/Programs & Papers
            
     _________________________________________________________________




Andrew Hunt
  ---
Speech Technology Research Group		Ph:  61-2-351 4509
Dept. of Electrical Engineering			Fax: 61-2-351 3847
University of Sydney, NSW, 2006, Australia	email: andrewh@speech.su.oz.au