From: akra@uranus.di.uoa.ariadne-t.gr (Argiris Kranidiotis)
Newsgroups: alt.sci.physics.acoustics,comp.dsp,comp.music,comp.speech
Subject: Human Audio Perception FAQ (v.2 / June 4,1994)
Date: 7 Jun 1994 04:57:14 -0500
Organization: UTexas Mail-to-News Gateway
Lines: 501
Sender: nobody@cs.utexas.edu
Distribution: inet
Message-ID: <9406070938.AA01046@uranus.di.uoa.ariadne-t.gr>
NNTP-Posting-Host: news.cs.utexas.edu
Xref: lyra.csx.cam.ac.uk alt.sci.physics.acoustics:874 comp.dsp:7478 comp.music:15033 comp.speech:2637

Note: This is my second attempt to post this message.
I apologize if you receive it a second time.

Argiris A. Kranidiotis
----------------------------------------------------------------------------


           ______________________________________________________
          |                                                      |
          |   HUMAN AUDIO PERCEPTION FREQUENTLY ASKED QUESTIONS  |
          |              version 2.0   June 4 , 1994             |
          |______________________________________________________|


                          I n t r o d u c t i o n
                        ---------------------------

It all started with a recent UseNet posting of mine. Judging from the volume
of mail I received, it seems to be a very interesting subject. I decided to
release an edited version of all the answers I have received so far in the
form of a F.A.Q. (Frequently Asked Questions).

This version is preliminary. It is still *VERY* incomplete. With your help I
will try to make it as complete as possible. Please read on to see what
additional information is needed...

The main topic remains the same:

Given two spectra (STFFTs, Short-Time Fast Fourier Transforms, for example)
we try to estimate a psychoacoustic distance between them (i.e. a timbral
metric). This involves some additional data:

1) Equal loudness curves (Fletcher-Munson).
   Originally published in J.A.S.A. (Journal of the Acoustical Society of
   America) in 1933. Please send me your data/approximations/formulae.
   More information is still needed on this subject.

2) Bark frequency scale (critical bands). I have found some approximations
   in the range 0..5 kHz. Again, more precise information is needed.

3) "Masking" effects. Useful introductory information can be found in the
   MPEG Audio compression FAQ (available via anonymous FTP at sunsite.unc.edu,
   in the IUMA archive).

4) Other psychoacoustic data?

______________________________________________________________________________


-MANY THANKS to all the kind people who contributed to this text
 (there are too many to list).

-My comments are put in square brackets [ ... ].

-A recent version of this text is available via anonymous FTP at:
 svr-ftp.eng.cam.ac.uk (maintained by Tony Robinson <ajr@eng.cam.ac.uk>)
 Directory: /pub/comp.speech/info , Filename: HumanAudioPerception.
 Please note that this FAQ is *NOT* restricted to speech topics.


                          Argiris A. Kranidiotis

                           University Of Athens
                          Informatics Department

                       akra@zeus.di.uoa.ariadne-t.gr


______________________________________________________________________________

                           Equal loudness curves
______________________________________________________________________________


From: Various people
------------------------------------------------------------------------
-Fletcher-Munson curves (the most popular answer).

Peak sensitivity at 3,300 Hz, falling off below 40 Hz and above 10 kHz.

-"An Introduction to the Psychology of Hearing", by Moore, 3rd edition
(the most popular reference).


From: Vincent Pagel <Vincent.Pagel@loria.fr>
------------------------------------------------------------------------

[...]

It's a family of curves [Fletcher Munson curves --AK] a bit like this:


     dB ^|
	||                            |
	| \                          |
	| |                         |
	|  \                       /
	|   |                     /
	|    \________     ______/
	|             \___/
	|
	|
	|_________________________________________________>  Frequency (Hz)
           400      2500   6000    10000  20000


PERCEPTUALLY, all the sounds corresponding to points on the curve have
the same loudness: this means that the ear has a large range where its
response is nearly flat (1000 to 8000 Hz), with the best sensitivity in a
small region (around 3000 Hz, if my memory serves).

[ the curve has a minimum at 3,300 Hz -- AK ]

Sensitivity drops dramatically above 10000 Hz and below 500 Hz.

You can draw different equal loudness curves depending on the intensity you
start with (e.g. if the intensity at 2500 Hz is 50 dB you get one curve,
but if you start at 2500 Hz with 70 dB you get another equal loudness curve).
In general, equal loudness curves have nearly the same shape, which does not
depend too much on the starting point.

To my knowledge there is no mathematical formula for approximating equal
loudness curves, but with the data in the book by Moore it should not be very
difficult to find an approximation.


From: Angelo Campanella <acampane@magnus.acs.ohio-state.edu>
------------------------------------------------------------------------

Obtain the ISO "Zero Phons" standard threshold of human hearing.

-The standard was ISO 389-1975 "Audiometer Standard Reference Zero".
-The US Equivalent is ANSI S3.6 - 1969.

The following numbers apply. They are in dB re 20 micropascals, for a pure
tone or very narrow-band noise:

--------------------------------------------------------------------------
Audio frequency (Hz)     125   250   500  1000  2000  3000  4000  6000  8000
==========================================================================
Threshold of hearing    45.5  24.5    11   6.5   8.5   7.5     9     8   9.5
(dB re 20 micropascals)
--------------------------------------------------------------------------
Human monaural threshold of hearing for a normal young adult with
undisturbed hearing.


Binaural hearing is 10 to 15 dB better, since the brain has a magnificent
capability to correlate the simultaneous listening of both ears.
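
As a rough illustration, the threshold table above can be turned into a
lookup function. This is only a sketch and not part of either standard: it
assumes linear interpolation on a log-frequency axis between the tabulated
points, and the names THRESHOLD_DB and threshold_db are made up for this
example.

```python
import math

# Monaural threshold table from the text: (frequency in Hz, dB re 20 uPa).
THRESHOLD_DB = [
    (125, 45.5), (250, 24.5), (500, 11.0), (1000, 6.5), (2000, 8.5),
    (3000, 7.5), (4000, 9.0), (6000, 8.0), (8000, 9.5),
]


def threshold_db(f_hz):
    """Interpolate the tabulated threshold.

    ASSUMPTION: linear interpolation in log2(frequency) between table
    points. No extrapolation is attempted outside 125..8000 Hz.
    """
    if not THRESHOLD_DB[0][0] <= f_hz <= THRESHOLD_DB[-1][0]:
        raise ValueError("frequency outside the tabulated 125-8000 Hz range")
    for (f0, t0), (f1, t1) in zip(THRESHOLD_DB, THRESHOLD_DB[1:]):
        if f0 <= f_hz <= f1:
            frac = math.log2(f_hz / f0) / math.log2(f1 / f0)
            return t0 + frac * (t1 - t0)


for f in (125, 700, 1000, 5000):
    print(f, "Hz ->", round(threshold_db(f), 2), "dB")
```

Subtracting threshold_db(f) from a measured level gives the level above
threshold at that frequency, which is one simple way to weight a spectrum.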


From: walkow@compsci.bristol.ac.uk (Tomasz Walkowiak)
------------------------------------------------------------------------
The equal loudness curve can be approximated by:

E(w)=1.151*SQRT( (w^2+144*10^4)*w^2/((w^2+16*10^4)*(w^2+961*10^4)) )

From: Robinson et al.: Br.J.A.Phys. 7, 166-181, 1956.

This approximation is for a Nyquist frequency of 5 kHz, so
w = 2*Pi*f/5kHz   , for 0<f<5kHz. Therefore E(w) is defined for 0<w<Pi.
E(w) is a linear (not dB) weight and is usually applied to the power spectrum.
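
A minimal sketch of how the expression for E(w) might be evaluated and
applied follows. The function names are hypothetical. The scaling of w is
ambiguous in the text above: the constants (16*10^4, 144*10^4, 961*10^4) only
have a visible effect if w is of the order of hundreds or thousands, so the
demo loop ASSUMES w = 2*Pi*f with f in Hz. That choice is a guess and should
be checked against the cited paper before use.

```python
import math


def equal_loudness_weight(w):
    """Evaluate E(w) exactly as the formula is written in the text."""
    num = (w ** 2 + 144e4) * w ** 2
    den = (w ** 2 + 16e4) * (w ** 2 + 961e4)
    return 1.151 * math.sqrt(num / den)


def weight_power_spectrum(power, omegas):
    """Multiply each power-spectrum bin by the linear weight E(w)."""
    return [p * equal_loudness_weight(w) for p, w in zip(power, omegas)]


# ASSUMPTION (not confirmed by the source): w = 2*pi*f, f in Hz.
for f_hz in (100, 500, 1000, 4000):
    w = 2 * math.pi * f_hz
    print(f_hz, "Hz ->", round(equal_loudness_weight(w), 3))
```

Whatever the scaling, the expression is zero at w = 0 and tends to 1.151 for
large w.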


______________________________________________________________________________

                        Bark scale / Critical Bands
______________________________________________________________________________


From: basbug@netcom.com (Filiz Basbug)
------------------------------------------------------------------------

From a paper given by David Lubman at Inter-Noise '92 (Toronto), the
critical band rate (z) in Bark can be determined by

z=[13*arctan(0.76*f)+3.5*arctan(f^2/56.25)]

where f is in kHz and the angles returned from the arctangent
expressions are in radians. When z is an integer, f is the dividing
line frequency between two critical bands.

If the frequency corresponding to a particular Bark (z) is desired,
use the following:

f={[(exp(0.219*z)/352)+0.1]*z-0.032*exp[-0.15*(z-5)^2]}

where f is in kHz.

Finally, the critical bandwidth (df) can be calculated for a given
center frequency (f) by

df={25+75*[1+1.4*(f^2)]^0.69}

where f is in kHz and df is in Hz.

There are no explicitly stated limits on the variables, but according to
the table that Mr. Lubman generated from the formulas, 1<=z<=24 for Bark,
and 20<=f<=15500 Hz for frequency, except 50<=f<=13500 Hz for the center
frequencies. (df) ranges from 100 Hz to 3500 Hz.

Also note that these formulas are generally accepted approximations but,
as far as I know, are not yet standardized. I believe they have all been
empirically derived.
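
The three formulas above translate directly into code. The sketch below uses
hypothetical function names and takes frequencies in Hz, converting to kHz
internally as the formulas require. It makes no attempt to enforce the
ranges quoted from Mr. Lubman's table.

```python
import math


def hz_to_bark(f_hz):
    """Critical band rate z (Bark) for a frequency given in Hz."""
    f = f_hz / 1000.0  # the formula expects kHz
    return 13.0 * math.atan(0.76 * f) + 3.5 * math.atan(f * f / 56.25)


def bark_to_hz(z):
    """Approximate frequency (Hz) for a critical band rate z (Bark)."""
    f_khz = ((math.exp(0.219 * z) / 352.0) + 0.1) * z \
        - 0.032 * math.exp(-0.15 * (z - 5.0) ** 2)
    return 1000.0 * f_khz


def critical_bandwidth_hz(f_hz):
    """Critical bandwidth df (Hz) around a center frequency given in Hz."""
    f = f_hz / 1000.0  # the formula expects kHz
    return 25.0 + 75.0 * (1.0 + 1.4 * f * f) ** 0.69


for f_hz in (100, 1000, 4000, 10000):
    z = hz_to_bark(f_hz)
    print(f_hz, "Hz:", round(z, 2), "Bark,",
          "round trip", round(bark_to_hz(z)), "Hz,",
          "df", round(critical_bandwidth_hz(f_hz)), "Hz")
```

Since both directions are independent empirical fits, bark_to_hz(hz_to_bark(f))
only returns approximately f; the loop prints the round trip so the size of
the mismatch can be inspected.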

Calculation of the psychoacoustic loudness of steady-state sounds is defined
in ISO 532, ISO Rec. 675, and DIN 45631.

Extension to non-steady sounds was defined by Zwicker but is not yet
standardized (as of 1992).


______________________________________________________________________________

                              Masking effects
______________________________________________________________________________


From: Vincent Pagel <Vincent.Pagel@loria.fr>
--------------------------------------------------------------------------

[...]

About the curves corresponding to the masking effect:

These curves show the minimal intensity that a sound of a given frequency
must have to be perceived when played simultaneously with a masking sound
whose frequency is held constant during the experiment. For example, say you
want to find the masking effect of a 500 Hz tone: you play it at, for
example, 50 dB, and at the same time you play a second tone, adjusting its
level to find the threshold (limen) at which it is perceived. A sound
played at 1000 Hz, for example, has to be louder than a sound at 700 Hz,
because 1000 Hz is a harmonic of the 500 Hz masking frequency.


______________________________________________________________________________

                   Psychoacoustic norm / Timbral Metric
______________________________________________________________________________


From: Fahey@psyvax.psy.utexas.edu (Richard Fahey)
--------------------------------------------------------------------------

These curves [Fletcher-Munson again...--AK] may be used to normalize
spectra for loudness at different frequencies (changing dB into phons),
and with a further change into sones one obtains a loudness density plot.

The plot can be made more psychologically real by changing the frequency
scale to the Bark scale, and using an auditory filter to smear the spectrum.

The distance between two spectra represented in ways similar to this can be
calculated as a Euclidean distance, and compared with psychoacoustic data.
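
A minimal sketch of the last two steps (phons to sones, then a Euclidean
distance) might look like the following. It assumes the spectra have already
been converted to loudness levels in phons on a common frequency or Bark
grid. The phon-to-sone rule used here is the usual textbook one (40 phons =
1 sone, doubling every 10 phons, reasonable from about 40 phons up) and is
not taken from the contribution above. The Bark warping and auditory-filter
smearing are left out, and the function names are hypothetical.

```python
import math


def phon_to_sone(phon):
    """Textbook rule: 40 phons = 1 sone, loudness doubles every 10 phons.

    ASSUMPTION: used for all inputs, although the rule is only a good
    fit from roughly 40 phons upwards.
    """
    return 2.0 ** ((phon - 40.0) / 10.0)


def loudness_distance(phons_a, phons_b):
    """Euclidean distance between two loudness-density profiles (sones)."""
    if len(phons_a) != len(phons_b):
        raise ValueError("both spectra must use the same frequency grid")
    return math.sqrt(sum(
        (phon_to_sone(a) - phon_to_sone(b)) ** 2
        for a, b in zip(phons_a, phons_b)
    ))


spectrum_a = [40.0, 55.0, 60.0, 45.0]  # made-up loudness levels in phons
spectrum_b = [42.0, 50.0, 63.0, 45.0]
print(loudness_distance(spectrum_a, spectrum_b))
```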

From: James Beauchamp <beaucham@uxh.cso.uiuc.edu>
--------------------------------------------------------------------------

Here, we are comparing two time-varying spectra which are very similar
to one another.

This would be used to measure the efficiency of a particular synthesis
technique.  Our first guess was to use :

	             SUM(k=1 to n) (A2(t,k) - A1(t,k))^2
	e(t) = sqrt( ----------------------------------- )
	                  SUM(k=1 to n) A1(t,k)^2

which gives a normalized difference (error) vs. time.  k is the partial number,
t is time, and A1(t,k) and A2(t,k) are the kth partial amplitudes vs. time for
signals s1(t) and s2(t).  Then the average error over time is given by

	e_ave = (1/DUR) SUM(t=0 to DUR) e(t)
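
In discrete form the two formulas above can be sketched as follows. The
layout of the data is an assumption made for this example: each signal is a
list of analysis frames, and each frame is a list of partial amplitudes, so
A1(t,k) becomes a1[t][k]. The average over time is taken as the plain mean
over equally spaced frames. Frames in which the original is silent (zero
denominator) are not handled.

```python
import math


def relative_error(a1_frame, a2_frame):
    """e(t): RMS difference of partial amplitudes, normalized by the original."""
    num = sum((a2 - a1) ** 2 for a1, a2 in zip(a1_frame, a2_frame))
    den = sum(a1 ** 2 for a1 in a1_frame)
    return math.sqrt(num / den)


def average_error(a1, a2):
    """e_ave: mean of e(t) over all frames (ASSUMES uniform frame spacing)."""
    errors = [relative_error(f1, f2) for f1, f2 in zip(a1, a2)]
    return sum(errors) / len(errors)


# Made-up data: 2 frames, 3 partials each.
original = [[1.0, 0.5, 0.25], [0.8, 0.4, 0.2]]
synth_2 = [[1.0, 0.5, 0.20], [0.8, 0.4, 0.2]]
synth_3 = [[0.7, 0.5, 0.25], [0.8, 0.1, 0.2]]

# The synthesis with the smaller average error is taken to be the better one.
print(average_error(original, synth_2) < average_error(original, synth_3))
```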

The theory is that given two syntheses of signal s1, namely s2 and s3, s2 is a
better synthesis of s1 than is s3 if e_ave_2 < e_ave_3.  This formulation
seems to work fairly well, but it really fails when a synthesis has weak upper
partials not found in the original.  The weak upper partials contribute very
little to the error calculation, but make a big difference in the perceived
result.  Therefore, it would probably be much better to add up the amplitudes
within critical bands than to give all frequencies equal weights as we have
been doing, and also to use an amplitude-to-loudness (in sones) translation.
(Usually, S = K*A^0.6).
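
One possible reading of this suggestion is sketched below; it is not Mr.
Beauchamp's method, only an illustration under several assumptions. Partials
are assigned to critical bands by taking the integer part of the Bark value
from the arctan formula quoted in the Bark scale section of this FAQ,
amplitudes are summed within each band, each band sum is converted with
S = K*A^0.6 (K = 1 here), and the two results are compared with a Euclidean
distance. All function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Bark value from the arctan approximation quoted in this FAQ."""
    f = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f) + 3.5 * math.atan(f * f / 56.25)


def band_loudness(freqs_hz, amps, k=1.0, n_bands=25):
    """Sum partial amplitudes per critical band, then apply S = K*A^0.6."""
    band_amp = [0.0] * n_bands
    for f, a in zip(freqs_hz, amps):
        # ASSUMPTION: band index = integer part of the Bark value.
        band = min(int(hz_to_bark(f)), n_bands - 1)
        band_amp[band] += a
    return [k * a ** 0.6 for a in band_amp]


def band_distance(freqs_hz, amps1, amps2):
    """Euclidean distance between the per-band loudness of two frames."""
    s1 = band_loudness(freqs_hz, amps1)
    s2 = band_loudness(freqs_hz, amps2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s1, s2)))


# Made-up frame: the synthesis adds a weak 8 kHz partial absent in the original.
freqs = [440.0, 880.0, 8000.0]
orig_amps = [1.0, 0.5, 0.0]
synth_amps = [1.0, 0.5, 0.01]
print(band_distance(freqs, orig_amps, synth_amps))
```

Because of the 0.6 exponent, a weak partial in an otherwise empty band
contributes far more to this distance than its raw amplitude difference
would, which is the effect the plain error measure misses.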

The problem with equalizing the A(t,k) using the Fletcher-Munson curves is that
one doesn't really know the absolute level of a given sound prior to playing it
back, except perhaps in a lab testing situation. Thus, the difference result
would vary with playback level, an uncomfortable situation.


From: Richard Parncutt <parncutt@sound.music.mcgill.ca>
-------------------------------------------------------------------------

The psychoacoustic distance between two steady-state complex sounds
(or its converse, perceived similarity) is influenced by a number of
factors, including similarity of loudness, timbral similarity, and the
degree to which the sounds have pitches in common (where by "pitch" I
mean PERCEIVED pitch in the psychoacoustic sense).

Terhardt (1972) distinguished two kinds of pitch. Spectral pitches
correspond to individual audible pure-tone components. Virtual
pitches correspond to groups of audible pure-tone components whose
frequencies form an approximately harmonic pattern, suggesting the
presence of an (embedded) harmonic-complex tone. Most pitches
perceived in everyday and musical sounds are virtual pitches. The
relative perceptual salience of pitches may be estimated by the
algorithm of Terhardt et al. (1982).

Parncutt (1989) defined the pitch commonality of two complex sounds
as the extent to which they have perceived pitches in common,
depending on the number and salience of coinciding pitches (by
comparison to non-coinciding pitches). Calculated pitch commonality
values correlate well with similarity judgments of pairs of complex
sounds that differ relatively little in loudness and timbre
(Parncutt, 1989, 1993), and with music-theoretic accounts of the
strength of harmonic relationship between musical tones and chords
(Parncutt, 1989).


From: Christopher John Rolfe <rolfe@sfu.ca>
-------------------------------------------------------------------------

Metrics have a long tradition in the literature, beginning
with Fechner in the 19th Century. Cognitive science, however, points
out that perceptual space may be non-Euclidean. In other words, there
is NO simple metric.


______________________________________________________________________________

                            References / Books
______________________________________________________________________________


"Loudness: its definition, measurement, and calculation", Journal of the
Acoustical Society of America, 1933, vol 5, p 9.

Author: Fry R.B.  PhD Dissertation, Duke University
Title: Measurement of Specific Sequence Effects in Loudness Perception
Date: 1981

Author: Lane H.L., Catania A.C., Stevens S.S.
Title: Voice Level: Autophonic Scale, Perceived Loudness, and Effects of
Sidetone
Journal: JASA
Volume: 33
Number: 2
Page(s): 160-167
Date: 1961

Author: Peterson G E, McKinney N P
Title: The measurement of speech power
Journal: Phonetica
Volume: 7
Page(s): 65-84
Date: 1961

Author: Schlauch R.S., Wier C.C.
Title: A Method for Relating Loudness-Matching and Intensity-Discrimination
Data
Journal: Journal of Speech and Hearing Research
Volume: 30
Page(s): 13-20
Date: 1987

Author: Small AM, Brandt JF, Cox PG
Title: [...?] function of signal duration
Journal: JASA
Volume: 34
Page(s): 513-514
Date: 1962

Author: Stevens S.S.
Title: Calculation of the Loudness of Complex Noise
Journal: JASA
Volume: 28
Number: 5
Page(s): 807-832
Date: 1956

Handel, S. (1989).  "Listening: an introduction to the perception of
auditory events." MIT, Cambridge, MA

Dooling, R. J. and Hulse, S. H. (ed.) (1989).  The comparative
psychology of audition: Perceiving complex sounds.  Erlbaum, Hillsdale, NJ.

McAdams, S. and Bigand, E. (ed.) (1993).  Thinking in sound: the
cognitive psychology of human audition. Oxford Univ. Press, NY

Sloboda, J. A. (1985).  The musical mind: The cognitive psychology of
music.  Clarendon, Oxford

Proceedings of IEEE, V. 81, No 10 ,"Signal Compression Based on Models
of Human Perception".

Grey, J.M. "Multidimensional Perceptual Scaling of Musical Timbres"
Journal of the Acoustical Society of America, 63, 1493-1500.

Repp, B.H (1984) "Categorical perception: Issues, methods, findings"
In N.J. Lass (ed.) Speech and Language: Advances in Basic
Research and Practice. Vol. 10. 1249-1257.

Moore and Glasberg, JASA 74(3) 1983. "Suggested formulae for calculating
auditory-filter bandwidths and excitation patterns"

Bladon and Lindblom, JASA 69(5) 1981. "Modeling the judgement of vowel
quality differences"

J. R. Pierce, The Science of Musical Sound (Freeman, New York, 1983).

J. G. Roederer, Introduction to the Physics and Psychophysics of Music
(Springer-Verlag, New York, 1975).

S. S. Stevens, "Measurement of Loudness", JASA 27 (1955): 815

S. S. Stevens, "Neural Events and the Psychophysical Law", _Science 170_
(1970): 1043

E. Zwicker, G. Flottorp, and S. S. Stevens, "Critical Bandwidth in Loudness
Summation",  JASA 29 (1957): 548

Author: Hynek Hermansky
Institution: Speech Technology Laboratory, Division of Panasonic
Technologies, Inc., 3888 State Street, Santa Barbara, CA 93105, USA
Title: Perceptual linear predictive (PLP) analysis of speech
Journal: JASA
Volume: 87
Number: 4
Page(s): 1738-1752
Date: 1990

Gersho et al. (Bark Spectral Distance).
IEEE Journal on Selected Areas in Communications, Sept. (?) 1992


Name:    "An Introduction to the Physiology of Hearing"
Author:  James O. Pickles,Dept. of Physiology,Uni. Birmingham,England.
Publisher: Academic Press,1982.
ISBN 0-12-554750-1 (hardback)
ISBN 0-12-554752-8 (paperback).

"An introduction to the psychology of hearing" by B. MOORE , 3d Edition.

Terhardt, E. (1972). Zur Tonhoehenwahrnehmung von Klaengen
(Perception of the pitch of complex tones). Acustica, 26, 173-199.

Terhardt, E., Stoll, G., & Seewann, M. (1982). Algorithm for
extraction of pitch and pitch salience from complex tonal signals.
Journal of the Acoustical Society of America, 71, 679-688.

[ The following papers are from Richard Parncutt
  (parncutt@sound.music.mcgill.ca) -- AK ]

Bigand, E., Parncutt, R., & Lerdahl, F. (under review). Perception of
musical tension in short chord sequences: The influence of harmonic
function, sensory dissonance, horizontal motion, and musical
training.  Perception and Psychophysics.

Parncutt, R. (1993). Pitch properties of chords of octave-spaced
tones. Contemporary Music Review, 9, 35-50.

Parncutt, R. (1989). Harmony: A Psychoacoustical Approach.
Springer-Verlag, Berlin. (Springer Series in Information Sciences,
Vol. 19. Eds.: T.S. Huang & M.R. Schroeder. ISBN 3-540-51279-9. 218
pages, 22 figs.)

Stoll, G., & Parncutt, R. (1987). Harmonic relationship in similarity
judgments of nonsimultaneous complex tones. Acustica, 63, 111-119.

Terhardt, E., Stoll., G., Schermbach, R., & Parncutt, R. (1986).
Tonhoehenmehrdeutigkeit, Tonverwandschaft und Identifikation von
Sukzessivintervallen (Pitch ambiguity, harmonic relationship, and
melodic interval identification). Acustica, 61, 57-66.

______________________________________________________________________________

-- 
      ____________________________      __________________________________
     /                           /\    /                                 /\  
    /   Argiris A. Kranidiotis _/ /\  /       E-mail (Internet):       _/ /\ 
   /  University Of Athens    / \/   /                                / \/  
  / Informatics Department    /\    /  akra@zeus.di.uoa.ariadne-t.gr  /\    
 /___________________________/ /   /_________________________________/ /     
 \___________________________\/    \_________________________________\/      
  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \     \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \