Notes from Yost, Chapter 10: Auditory sensitivity 27 March 2003
The first step in understanding perception is to understand sensitivity to frequency, amplitude, and starting phase, writes Yost, and the hope is that understanding how the system responds to sinusoids may provide a complete picture, since "all acoustic stimuli can be defined in terms of sinusoids." But then he goes on to say it is not that simple. For example, when a tone is presented to one ear we are insensitive to its starting phase, though we are sensitive to the phase relations of two tones presented simultaneously (this leads to beats), and we are sensitive to phase for a single tone presented to both ears (this contributes to our sense of its location).
The first measure of sensitivity is that of amplitude: the smallest stimulus level required for detection is the threshold of audibility. But does this mean detection 100% of the time? Or a significant (more than chance) percentage of detections? Thresholds may be defined according to a number of the psychophysical methods given in Appendix D. Several psychometric functions are given in Figure 10.1, and Figure 10.2 gives several different sorts of thresholds across different frequencies. The different curves given in Figure 10.2 (and Table 10.1) are especially of interest to the audiologist, who must standardize audiogram tests in order to compare hearing and hearing loss across individuals. There are detailed differences between collecting thresholds with earphones vs. in the free field, in part because earphones do not take advantage of the resonance of the outer ear and also because we hear bone-conducted internal noises more easily when the ear canal is blocked off. From a practical standpoint these procedural effects seem not to be significant, as clinical losses of 10 dB to perhaps 20 dB are not generally thought of as being serious; however, in the clinic it is important to standardize, and it may be important as well if one is trying to determine whether particular work conditions are producing a temporary hearing loss (many audiologists work in industrial settings).
The thresholds for pain and discomfort do not depend on frequency, which may be saying something important about the functioning of the ear, at least for discomfort (loudness is flat across frequency at high levels). It turns out as well that loudness judgments across frequency are very similar at high levels, but at low levels the mid frequencies seem much louder than the very low and the very high frequencies given at the same SPL. This may be in part because very high levels of stimulation affect most of the basilar membrane, as well as affecting cutaneous/pain receptors in the middle ear and the tympanic membrane, and perhaps in the cochlea. Most of the details in this first section are important only to those who may be going into audiology (page 150, the supra-aural phones and the 6-cc coupler, for example); but as a piece of trivia, it is impressive that we can hear a displacement of the tympanic membrane that is the diameter of a hydrogen atom (page 152).
The little section on duration is interesting, and to some extent unsurprising, though there is a nice point to be made about the difference between using units of power and units of energy in figuring out what happens to perception with an increase in duration. If energy is held constant as duration increases, then power must decrease. Up to some duration this does not make a difference, and we detect a brief stimulus on the basis of its constant energy; but with longer durations (above 100 to 200 ms) the energy gets spread out over too long a time and the threshold increases, that is, we need more energy in the signal in order to hear it. On the other hand, if power (energy per unit time) is held constant, then with increasing duration the threshold drops until it reaches an asymptote. Figure 10.3 is a hard graph to figure out because it compares the threshold as a function of duration for some particular frequency with that of a 1-second tone. But the main conclusion is that there is a range of durations over which a log unit change in duration (i.e., a ten-fold increase in time, equivalent to 10 dB) is offset by about a log unit change in threshold power (i.e., 10 dB), which makes sense if the ear is a perfect integrator of energy. This approximately constant effect of energy in yielding detection thresholds holds only between limits of about 10 ms and about 200 ms, for a couple of reasons. For short durations, below about 10 ms, the energy at one frequency is scattered over a wide frequency range (think of the Fourier analysis of the brief signal), and because the energy is spread over the entire basilar membrane, any one small patch of hair cells will not be excited up to a threshold response. Then for long durations, beyond about 200 ms for weak stimuli, the duration for temporal summation is exceeded and the energy leaks out of the auditory system.
The integration time depends also on the level of the stimulus: for the absolute threshold (that is, the detection of the presence of a stimulus on some proportion of the times it is presented, say 50%) it will be 200 to possibly 300 ms; for higher levels it will be much lower, less than 100 ms, maybe less than 10 ms; and long-duration, high-level stimuli get to be very annoying! The practical lesson for audiologists is that they must be careful in controlling the duration of a test tone, but they do not have to be very careful: if they try to keep the duration at least 500 ms or so, they are certainly working with a comfortable margin for error for near-threshold detection levels. (It is more important that the person being tested does not come to anticipate when the stimulus is about to be presented, as people often hear what they expect to hear!)
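The energy vs. power bookkeeping can be checked with a little arithmetic. What follows is only a minimal sketch, assuming an idealized perfect integrator with a single integration time of 200 ms (the notes give 200 to possibly 300 ms near absolute threshold). The function name and the constant are illustrative choices, not anything from Yost, and the sketch ignores the spectral scatter that spoils integration below about 10 ms.

```python
import math

INTEGRATION_MS = 200.0  # assumed integration time near absolute threshold


def threshold_power_shift_db(duration_ms, integration_ms=INTEGRATION_MS):
    """Threshold power in dB relative to a long tone, for an ideal energy integrator.

    Below the integration time the threshold energy (power x duration) is
    constant, so the threshold power rises by 10*log10(integration/duration).
    At or beyond the integration time the threshold power is flat (0 dB).
    """
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    if duration_ms >= integration_ms:
        return 0.0
    return 10.0 * math.log10(integration_ms / duration_ms)


for duration in (20, 50, 100, 200, 500):
    shift = threshold_power_shift_db(duration)
    print(f"{duration:4d} ms: threshold power {shift:+5.1f} dB re long tone")
```

Under these assumptions, shortening the tone ten-fold, from 200 ms to 20 ms, costs 10 dB of power, which is the log unit trade described above; beyond 200 ms extra duration buys nothing.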
Yost then begins to talk about differential sensitivity for frequency and for intensity, and relates these to the Weber-Fechner Law. This Law, described in Fechner's "Elements of Psychophysics," published in 1860, states that the threshold for discrimination of some stimulus from a standard along a dimension is a constant proportion of the measure of the standard, so that ΔS/S = K. In words, if we can just detect a 1 unit increment (ΔS = 1) in a 10 unit standard (S = 10), then we should also be able to just detect a 5 unit increment in a 50 unit standard: 1/10 = 5/50 = .10. Yost calls our attention to something else as well, which is that in these sorts of experiments we might hear other things besides the change in the dimension of interest: an abrupt change in the level of a tone can produce a click, for example, which is a spectral shift, and unless the experimenter is careful to prevent off-frequency listening, subjects might pick up on the click even if they cannot hear the pure tone. Experimenters often use a masking noise at frequencies that are not of interest in order to prevent off-frequency listening. Psychophysical experiments are very boring for the subject. In fact, William James wrote of psychophysics in 1890, "This method taxes patience to the utmost, and could hardly have arisen in a country whose natives could be bored. Such Germans as Weber, Fechner, Vierordt, and Wundt obviously cannot...." (in Principles of Psychology, 1890, Volume 1, page 192). But they are also more difficult to do cleanly than would first appear, that is, to be sure that the stimulus is just what it is supposed to be, and neither less nor more.
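The Weber-Fechner arithmetic (ΔS/S = K) is simple enough to write out. This is a minimal sketch using the 1-in-10 example from the paragraph above; the function names are made up for illustration, and the 200 unit standard is an extra case added only to show the prediction.

```python
def weber_fraction(delta_s, standard):
    """Weber fraction K = delta S / S for a just-detectable increment."""
    if standard <= 0:
        raise ValueError("standard must be positive")
    return delta_s / standard


def predicted_increment(standard, k):
    """Just-detectable increment predicted by the law: delta S = K * S."""
    return k * standard


k = weber_fraction(1, 10)  # the 1 unit increment on a 10 unit standard
for standard in (10, 50, 200):
    print(f"S = {standard:3d}: predicted delta S = {predicted_increment(standard, k):.1f}")
```

If the law holds, the same K should come out no matter which standard is used to measure it; the sections that follow are really a check on where that is and is not true.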
For frequency discrimination (Figure 10.5) we can see that over a range of low to moderate frequencies (200 up to 1000 or maybe up to 2000 Hz) the Weber-Fechner Law does not hold, because ΔF (rather than ΔF/F) is a constant and is close to 1 Hz, while in the mid-range above 1000 Hz ΔF/F is a constant and so ΔF changes as F increases. In general also, ΔF is a function of stimulus level, at least up to about 20 dB, so that we are better at detecting frequency differences for mid-level stimuli than for very quiet, and perhaps for very loud, stimuli. (Perhaps this makes sense, given the way the basilar membrane responds to high-level input, but the spread of BM movement in response to a high-level tone is much too wide to account for how good our acuity remains at high levels.) Figure 10.6 gives a different view of the same data, and spreads out the small differences along the seemingly straight line from 2 kHz down for 40 dB. Here we see that the Weber fraction ΔF/F varies from about .005 at 200 Hz (we may detect a change of about 1 Hz against a base of 200 Hz); to about .0025 at 400 Hz (again, we may detect a change of about 1 Hz, but now around 400 Hz); to about .002 at 1000 Hz (now about 2 Hz); and .002 at 2000 Hz (now about 4 Hz); and then at 5000 and 10000 Hz the Weber fraction starts to climb to much larger values, close to .01 (which at 10000 Hz would be 100 Hz).
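The conversions between the Weber fraction and the change in Hz can be tabulated directly. This is a minimal sketch: the fractions are the approximate values quoted in these notes for Figure 10.6 (not precise data), and the names are illustrative.

```python
# Approximate Weber fractions (delta F / F) as quoted in the notes on Figure 10.6.
WEBER_FRACTION_BY_HZ = {
    200: 0.005,
    400: 0.0025,
    1000: 0.002,
    2000: 0.002,
    10000: 0.01,
}


def frequency_jnd_hz(frequency_hz, fraction):
    """Just-detectable frequency change in Hz: delta F = (delta F / F) * F."""
    return fraction * frequency_hz


for freq_hz, fraction in WEBER_FRACTION_BY_HZ.items():
    jnd = frequency_jnd_hz(freq_hz, fraction)
    print(f"{freq_hz:6d} Hz: fraction {fraction:.4f} -> delta F of about {jnd:.1f} Hz")
```

Laid out this way the two regimes are easy to see: from 200 to 400 Hz the fraction halves while ΔF stays near 1 Hz, and from 1000 to 2000 Hz the fraction holds at .002 while ΔF doubles.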
It would be interesting to speculate how this might translate into differences in neural capacity at different frequencies, but unfortunately Yost does not help us to think about what this difference in the threshold means. One hypothesis is that it might have to do with the density of hair cells that are assigned to different frequencies, there being roughly a logarithmic relation between the number of hair cells assigned to a frequency and the frequency itself. However, the same logarithmic relationship seems to hold for low frequencies, but there it is ΔF, not ΔF/F, that is constant. A second hypothesis is that the different threshold values mean some frequencies are coded by temporal events, and so the number of hair cells assigned to a narrow frequency range is not a critical factor in the judgment.
For intensity discrimination (Figure 10.7) we see only a very modest change in the difference limen for intensity, of about 1 dB, with a white noise stimulus over a wide range of 10 to 80 dB: but now remember, the dB scale is already a ratio measure. It is easier to detect a change at 80 dB than at 10 or 20 dB when the measure is in dB, and this directly confirms the Weber-Fechner rule, because the dB measure is already a ratio of two power or pressure values. For separate frequencies the curves are less stable, and vary over a range of about 1 dB. Again it would be interesting to speculate about why this may be so. If loudness is related to firing rate in the auditory nerve, then intuitively one would think that it should be easier to pick up changes at low levels, not at high levels, because of saturation in the typical auditory nerve fiber at high stimulus levels. A more complicated thought is that somehow the auditory system at its higher levels is able to modulate or adapt to the prevailing external intensity so as to maintain a constant low level of firing, and then a change in the prevailing rate is picked up against this adapted rate. Perhaps this is the function of the complicated cells in the DCN that show side-band inhibition.
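The point that a dB step is already a ratio can be made concrete. This is a minimal sketch, assuming the difference limen is expressed in the common form 10 log10((I + ΔI)/I); the notes do not say exactly how Figure 10.7 is plotted, so treat that convention, and the function names, as assumptions.

```python
import math


def db_step_to_intensity_fraction(step_db):
    """Relative intensity increment delta I / I implied by a level step in dB."""
    return 10.0 ** (step_db / 10.0) - 1.0


def db_step_to_pressure_fraction(step_db):
    """Relative pressure increment delta p / p implied by a level step in dB."""
    return 10.0 ** (step_db / 20.0) - 1.0


def level_after_increment(level_db, intensity_fraction):
    """Level in dB after raising the intensity by the given fraction."""
    return level_db + 10.0 * math.log10(1.0 + intensity_fraction)


fraction = db_step_to_intensity_fraction(1.0)
print(f"1 dB step: delta I / I = {fraction:.3f}, "
      f"delta p / p = {db_step_to_pressure_fraction(1.0):.3f}")
for base_db in (10, 40, 80):
    print(f"{base_db} dB standard -> {level_after_increment(base_db, fraction):.1f} dB")
```

A constant limen of 1 dB thus corresponds to the same proportional increment (about 26% in intensity, about 12% in pressure) at every standard level, which is just what a constant Weber fraction means.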
Temporal discrimination is studied in a variety of ways, the easiest (comparing two tones having different durations) being not the best because of loudness and spectral scatter differences. So Yost talks about gap detection, as if the tones on the edges of the gap served as markers for time, which is reasonable; and in general we note that in Figure 10.8 the function relating ΔS to S is more or less linear: as the duration of the standard gap is increased, say, from 10 ms to 100 ms to 1000 ms, the change in duration that we can discriminate from the standard also increases (estimating from the values given in Figure 10.8, the threshold changes from 3 ms, to 10 ms, to 70 ms). But this increase obviously doesn't follow the Weber-Fechner rule, because these numbers as proportions are 30%, 10% and 7%. Why might this be? Unfortunately Yost does not speculate about this, but one could imagine that the neural representation of time changes for short vs. long intervals because it takes a while for the nervous system to suppress neural activity. If this were in general to be about 5 ms, then a 1 ms external pulse would really be 6 ms long, a 2 ms pulse would be 7 ms long, a 10 ms pulse would be 15 ms long, and so forth. Then being able to detect a difference between a 1 ms and a 2 ms pulse would translate into discriminating between 6 and 7 ms periods of neural activity (about 17%); and detecting the difference between pulses of 10 ms and 13 ms is the difference between neural representations of 15 and 18 ms (= 20%), and so forth. There is also a literature concerned with a construct called "the psychological moment," which suggests that the coding of momentary events is such that we think of them as lasting for some minimal period of time, maybe 50 ms or more.
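The thought experiment about neural persistence is easy to run through numerically. This is only a sketch of the speculation in these notes (not a model from Yost): it assumes a fixed 5 ms of added neural activity, uses the rough threshold values read off Figure 10.8 above, and the function names are illustrative.

```python
PERSISTENCE_MS = 5.0  # assumed neural persistence, from the thought experiment


def external_fraction(standard_ms, delta_ms):
    """Weber fraction computed on the physical durations."""
    return delta_ms / standard_ms


def neural_fraction(standard_ms, delta_ms, persistence_ms=PERSISTENCE_MS):
    """Weber fraction if every interval is lengthened by a fixed persistence."""
    return delta_ms / (standard_ms + persistence_ms)


# (standard, just-detectable change) in ms: the 1 vs. 2 ms example, then the
# rough values estimated from Figure 10.8 in the notes.
pairs = [(1, 1), (10, 3), (100, 10), (1000, 70)]
for standard, delta in pairs:
    print(f"{standard:5d} ms + {delta:2d} ms: "
          f"external {100 * external_fraction(standard, delta):5.1f}%, "
          f"neural {100 * neural_fraction(standard, delta):5.1f}%")
```

With these numbers the persistence term matters mainly for the shortest standards (100% becomes about 17%, and 30% becomes 20%), and it has almost no effect at 100 ms and beyond.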
The last part of this section is on "temporal modulation transfer functions," which we have seen before in electrophysiological work in the inferior colliculus. In these experiments a carrier sound (in Figure 10.9 a white noise) is amplitude modulated, often by a sinusoid (in Figure 10.9 apparently one with a frequency of 100 per second), and the depth of modulation is varied from 0 to 100%. The experiment is run under many conditions; in each, the subject tries to detect which of 2 stimuli is modulated, while within a series a particular AM frequency is given and the depth of modulation is raised from 0 to some threshold value. In Figure 10.10 the ordinate is written in a standard way, as 20 log m (where m is the modulation depth expressed as a proportion), so that when MD = 100% then m = 1 and 20 log 1 = 0, and when MD = 10% then m = .1 and 20 log (.1) = -20, etc. We can see in Figure 10.10 that the threshold stays at a modulation depth of about 5% up to a modulation frequency of about 20 Hz, and then sensitivity drops rapidly, the threshold rising to near 50% at rates of around 700 Hz. This sensory function is not substantially different from functions that could be obtained in the inferior colliculus or in the auditory cortex.
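The 20 log m scale used for modulation depth is worth working through once. This is a minimal sketch of the conversion in both directions; the function names are illustrative, and the 5% and 50% depths are the rough threshold values quoted above for Figure 10.10.

```python
import math


def modulation_depth_to_db(m):
    """Convert modulation depth m (a proportion, 0 < m <= 1) to 20*log10(m)."""
    if not 0.0 < m <= 1.0:
        raise ValueError("modulation depth must be in (0, 1]")
    return 20.0 * math.log10(m)


def db_to_modulation_depth(level_db):
    """Inverse conversion: m = 10 ** (dB / 20)."""
    return 10.0 ** (level_db / 20.0)


for percent in (100, 50, 10, 5):
    level = modulation_depth_to_db(percent / 100.0)
    print(f"MD = {percent:3d}%  ->  20 log m = {level:6.1f} dB")
```

On this scale the 5% threshold at low modulation rates sits near -26 dB and the 50% threshold at high rates sits near -6 dB, so more negative values mean better sensitivity to modulation.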