Notes from Yost, Chapter 10: Auditory sensitivity 31 March 2004

 

The first step in understanding perception is to understand sensitivity to frequency, amplitude, and starting phase, writes Yost, and the hope is that understanding how the system responds to sinusoids may provide a complete picture, since “all acoustic stimuli can be defined in terms of sinusoids.” But then he goes on to say it is not that simple. For example, when a tone is presented to one ear we are insensitive to starting phase, though we are sensitive to the phase relations of two tones presented simultaneously (this leads to beats), and we are sensitive to phase for a single tone presented to both ears (and this leads to the perception of sound location).
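
As a rough illustration of the beats mentioned above, the following sketch sums two equal-amplitude tones whose frequencies differ slightly, so that their phase relation drifts continuously. The function names and the 1000 Hz and 1004 Hz values are illustrative assumptions, not taken from Yost.

```python
import math

def summed_tones(f1_hz, f2_hz, t_s):
    """Instantaneous sum of two unit-amplitude sine tones at time t_s (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t_s) + math.sin(2 * math.pi * f2_hz * t_s)

def beat_envelope(f1_hz, f2_hz, t_s):
    """Slowly varying envelope of the sum.

    Uses the identity sin a + sin b = 2 sin((a + b) / 2) cos((a - b) / 2).
    """
    return abs(2 * math.cos(math.pi * (f1_hz - f2_hz) * t_s))

def beat_rate_hz(f1_hz, f2_hz):
    """Number of envelope maxima per second, equal to the frequency difference."""
    return abs(f1_hz - f2_hz)

if __name__ == "__main__":
    # Hypothetical pair of tones, 4 Hz apart.
    print("beats per second:", beat_rate_hz(1000.0, 1004.0))
    for t in (0.0, 0.0625, 0.125, 0.25):
        print(t, round(beat_envelope(1000.0, 1004.0, t), 3))
```

The envelope swings between 2 (tones in phase) and 0 (tones in opposite phase), which is the waxing and waning heard as beats.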

 

The first measure of sensitivity is the smallest stimulus level required for detection: the threshold of audibility, sometimes called the absolute threshold. But does this mean detection 100% of the time? A significant (more than chance) percentage of detection? Most often it is neither of these possibilities. Thresholds may be defined according to a number of the psychophysical methods given in Appendix D. Several psychometric functions are given in Figure 10.1, and Figure 10.2 gives several different sorts of thresholds across different frequencies. The different curves given in Figure 10.2 (and Table 10.1) are especially of interest to the audiologist, who must standardize the audiogram tests in order to compare hearing and hearing loss across individuals. There are detailed differences between collecting thresholds with earphones vs. in the free field, in part because earphones do not take advantage of the resonance of the outer ear and also because we hear bone-conducted internal noises more easily when the ear canal is blocked off. From a practical standpoint, however, these procedural effects seem not to be really important for diagnoses about a single individual, as clinical losses of 10 dB to perhaps 20 dB are not generally thought of as being serious; however, in the clinic it is important to standardize. It is important also if one is trying to determine if particular work conditions are producing a temporary hearing loss (many audiologists work in industrial settings). In addition to standardizing the procedures it is also very important to standardize the stimuli, which requires special care with calibration of test equipment.
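
To make the question "what proportion of detections counts as threshold?" concrete, here is a minimal sketch of a psychometric function. The logistic shape, the parameter names, and all numeric values are assumptions chosen for illustration; they are not the functions plotted in Figure 10.1.

```python
import math

def psychometric(level_db, midpoint_db, slope_per_db, guess_rate=0.0):
    """Proportion of correct or 'yes' responses at a given level (logistic sketch).

    guess_rate is the proportion expected by chance, e.g. 0.5 in a
    two-alternative forced-choice task and 0.0 in an idealized yes/no task.
    """
    core = 1.0 / (1.0 + math.exp(-slope_per_db * (level_db - midpoint_db)))
    return guess_rate + (1.0 - guess_rate) * core

def threshold_db(criterion, midpoint_db, slope_per_db, guess_rate=0.0):
    """Level at which the function reaches the chosen criterion proportion."""
    core = (criterion - guess_rate) / (1.0 - guess_rate)
    if not 0.0 < core < 1.0:
        raise ValueError("criterion must lie strictly between guess_rate and 1")
    return midpoint_db + math.log(core / (1.0 - core)) / slope_per_db

if __name__ == "__main__":
    # Hypothetical listener: midpoint at 10 dB, slope 0.8 per dB.
    for criterion in (0.5, 0.75, 0.9):
        print(criterion, round(threshold_db(criterion, 10.0, 0.8), 2))
```

The same listener yields a different "threshold" for each criterion, which is one reason the psychophysical method has to be stated along with the number.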

 

Absolute thresholds depend very much on frequency, but the thresholds for pain and discomfort do not. Also, while loudness at low levels depends on frequency (being greatest at mid frequencies, less at low and high frequencies), loudness at high levels is not so affected. These effects on loudness may result in part because high levels of stimulation affect most of the basilar membrane, and "very-very" high levels affect cutaneous/pain receptors in the middle ear and the tympanic membrane, and perhaps in the cochlea. Most of the details in this first section are important only to those who may be going into audiology (page 150, the supra-aural phones and the 6-cc coupler, for example); but as a piece of trivia, it is impressive that we can hear a displacement of the tympanic membrane that is the diameter of a hydrogen atom (page 152).

 

The little section on duration is interesting, and to some extent unsurprising, though there is a nice point to be made about the difference between using units of power or energy in figuring out what happens to perception with an increase in duration. Thus, when duration is increased but total energy is held constant, the power must decrease with duration. Up to some duration this does not make a difference and we detect a brief stimulus on the basis of its constant energy. This means that the ear (probably the brain, not the ear) is able to integrate energy over relatively brief periods of time. However, with longer durations (above 100 to 200 ms) the energy gets spread out over too long a time, longer than the brain's integration time. Then the absolute threshold increases, that is, we need more energy in the signal in order to hear it. On the other hand, if power (energy per unit time) is constant, then with increasing duration the threshold drops until it comes to an asymptote, which again has to do with how long a time the system can integrate energy without it "leaking out". Figure 10.3 is a hard graph to figure out because it compares the threshold as a function of duration for some particular frequency with that of a 1 second duration tone. But the main conclusion is that there is a range of effects over which a log unit change in duration (i.e., a ten-fold increase in time, = 10 dB in temporal units) leads to about a log unit change in power (i.e., 10 dB in power units): this makes sense if the ear is a perfect integrator of energy in this range.
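
The power-versus-energy argument can be sketched numerically. The model below assumes a perfect energy integrator up to a fixed integration limit and no integration at all beyond it; the 200 ms limit is taken from the range quoted in these notes, and the sharp corner and the function names are simplifying assumptions rather than anything Yost specifies. The short-duration limit discussed later is not modeled.

```python
import math

def threshold_power_db(duration_ms, integration_limit_ms=200.0):
    """Threshold power in dB relative to the long-duration (asymptotic) threshold.

    Assumes power * duration is constant at threshold for durations up to
    integration_limit_ms, and that power alone matters beyond it.
    """
    effective_ms = min(duration_ms, integration_limit_ms)
    return 10.0 * math.log10(integration_limit_ms / effective_ms)

def threshold_energy_db(duration_ms, integration_limit_ms=200.0):
    """Threshold energy in dB relative to the energy needed within the limit.

    Energy is power times duration, so in dB the duration term is added.
    """
    power_db = threshold_power_db(duration_ms, integration_limit_ms)
    return power_db + 10.0 * math.log10(duration_ms / integration_limit_ms)

if __name__ == "__main__":
    for d in (2.0, 20.0, 200.0, 2000.0):
        print(d, round(threshold_power_db(d), 1), round(threshold_energy_db(d), 1))
```

Within the integration range each ten-fold increase in duration lowers the threshold power by 10 dB while the threshold energy stays flat; beyond the limit the power threshold is flat and the energy threshold climbs instead.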

 

This approximately constant effect of energy in yielding detection thresholds holds only between limits of about 10 ms and about 200 ms, for a couple of reasons. For short durations, below about 10 ms, the energy at one frequency is scattered over a wide frequency range (think of the Fourier analysis of the brief signal, Figure 4.3 in Yost for example), and because the energy is spread over the entire basilar membrane any one small patch of hair cells may not be excited up to a threshold response. Then for long durations, beyond about 200 ms for weak stimuli, the duration for temporal summation is exceeded and the energy leaks out of the auditory system. The integration time depends also on the level of the stimulus: for the absolute threshold (that is, the detection of the presence of a stimulus on some proportion of the times it is presented, say 50%) it will be 200 to possibly 300 ms; for higher levels it will be much lower, less than 100 ms, maybe less than 10 ms (and at a high level a long duration stimulus gets to be very annoying, as well as potentially causing hearing loss!). The practical lesson for audiologists is that they must be careful in controlling the duration of a test tone, but they do not have to be very careful: if they keep the duration at 500 ms or more, they are certainly working with a comfortable margin for error for near-threshold detection levels. (It is more important that the audiologist does not provide subtle cues that a stimulus is about to be delivered, because then the patient may learn to anticipate the stimulus. Listeners often hear what they expect to hear, even if there is no stimulus.)
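
The spectral scattering of very brief tones can be estimated with a standard Fourier result: a tone switched on and off abruptly for T seconds has a spectral main lobe 2/T Hz wide (null to null). The sketch below assumes that rectangular gating, which is a simplification, since real test tones are usually given onset and offset ramps; the function name is illustrative.

```python
def main_lobe_width_hz(duration_ms):
    """Null-to-null width of the spectral main lobe of a rectangularly gated tone.

    Width in Hz is 2 / T with T in seconds, i.e. 2000 / T with T in milliseconds.
    """
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return 2000.0 / duration_ms

if __name__ == "__main__":
    for d in (1.0, 10.0, 200.0, 500.0):
        print(f"{d:6.1f} ms tone burst: main lobe about {main_lobe_width_hz(d):7.1f} Hz wide")
```

On this estimate a 1 ms burst spreads its energy over roughly 2000 Hz, while a 200 ms tone stays within about 10 Hz of its nominal frequency.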

 

Yost then begins the next section, concerned with differential sensitivity for frequency and for intensity. This study is mostly related to the Weber-Fechner Law, which specifies how what we hear depends on the background level. The Law was described in Fechner's “Elements of Psychophysics” (1860). Fechner wrote that the threshold for discrimination of some stimulus from a standard along a dimension is a constant proportion of the measure of the standard, so that ∆S/S = K. In words, if we can just detect a 1 unit increment (∆S = 1) in a 10 unit standard (S = 10), then we should also be able to just detect a 5 unit increment in a 50 unit standard: the ratio 1/10 = 5/50 = .10. Yost calls our attention to something else as well, which is that in these experiments we might hear other things besides the change in the dimension of interest -- an abrupt change in the level of a tone can produce a click, for example, which is a spectral shift. Unless the experimenter is careful to prevent off-frequency listening, subjects might pick up on the click, even if they cannot hear the pure tone. Experimenters often use a masking noise to obscure frequencies that are not of interest in order to prevent off-frequency listening.
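
The ∆S/S = K relation is easy to check with the numbers in the example above. This is only a sketch of the arithmetic; the function names are illustrative.

```python
def weber_fraction(delta_s, standard):
    """Weber fraction K: the just-detectable increment divided by the standard."""
    return delta_s / standard

def predicted_increment(standard, k):
    """Just-detectable increment predicted by the law for a given standard."""
    return k * standard

if __name__ == "__main__":
    k = weber_fraction(1.0, 10.0)          # 1 unit detected against 10 units
    print("K =", k)
    for s in (10.0, 50.0, 200.0):
        print(s, predicted_increment(s, k))  # increment grows in proportion to the standard
```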

 

Psychophysical experiments tend to be very boring for the listener. In fact, William James wrote of the "science of psychophysics" in 1890 “This method taxes patience to the utmost, and could hardly have arisen in a country whose natives could be bored. Such Germans as Weber, Fechner, Vierordt, and Wundt obviously cannot....” (in Principles of Psychology, 1890, Volume 1, page 192). But these experiments are also more difficult to do cleanly than would first appear, that is, to be sure that the stimulus is just what it is supposed to be, and neither less nor more.

 

For frequency discrimination (Figure 10.5) we can see that over a range of low to moderate frequencies (200 up to 1000 or maybe up to 2000 Hz) the Weber-Fechner Law does not hold, because ∆F (rather than ∆F/F) is a constant and is close to 1 Hz. However, in the mid-range above 1000 Hz ∆F/F is a constant, as the W-F Law expects, and so ∆F is changing as F increases. In general ∆F is not a function of stimulus level, except at low levels, below about 20 dB SL (remember, SL is a measure of stimulus level relative to the person's individual threshold at that frequency). It is possible also that ∆F/F gets bigger at very high levels. This last bit might make sense, given the way the basilar membrane responds to high level input (look at the right-hand side of Fig. 10.5), but the measured spread of BM movement to a high level tone (look at Figure 9.4 for example) is much too widespread, compared to how good our acuity is at high levels. Figure 10.6 gives a different view of the same data, and spreads out the small differences in ∆F/F along the seemingly straight line from 400 Hz to 2000 Hz, all at a level of 40 dB SL. Here we see that the Weber fraction ∆F/F varies from about .005 at 200 Hz (we may detect a change of about 1 Hz against a base of 200 Hz); to about .0025 at 400 Hz (again, we may detect a change of about 1 Hz but now around 400 Hz); to about .002 at 1000 Hz (now about 2 Hz); and .002 at 2000 Hz (now about 4 Hz); and then at 5000 and 10000 Hz the Weber-Fechner ratio starts to climb to much larger values, close to .01 (which at 10,000 Hz would be 100 Hz).
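
The conversion between the Weber fraction and the difference limen in Hz can be tabulated directly. The fractions below are the approximate values quoted in these notes for 40 dB SL, not values read independently from Figure 10.6; the names are illustrative.

```python
# Approximate Weber fractions quoted in the notes (base frequency in Hz -> dF/F).
WEBER_FRACTIONS_40_DB_SL = {
    200: 0.005,
    400: 0.0025,
    1000: 0.002,
    2000: 0.002,
    10000: 0.01,
}

def delta_f_hz(frequency_hz, fractions=WEBER_FRACTIONS_40_DB_SL):
    """Just-detectable frequency change in Hz: the Weber fraction times the base frequency."""
    return fractions[frequency_hz] * frequency_hz

if __name__ == "__main__":
    for f in sorted(WEBER_FRACTIONS_40_DB_SL):
        print(f"{f:6d} Hz: dF/F = {WEBER_FRACTIONS_40_DB_SL[f]:.4f}, dF = {delta_f_hz(f):.1f} Hz")
```

Laid out this way, ∆F is constant at about 1 Hz for the two lowest frequencies, while ∆F/F is constant between 1000 and 2000 Hz, which is the pattern described above.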

 

It would be interesting to know how these differences in ∆F/F translate into differences in neural capacity at different frequencies, but unfortunately Yost does not help us to think about what this difference in the threshold means. One hypothesis is that it might have to do with the density of hair cells that are assigned to different frequencies, there being roughly a logarithmic relation between the number of hair cells assigned to a frequency and the frequency itself. However, the same logarithmic relationship seems to hold for very low frequencies, but here ∆F is constant. Possibly the apparently wider tuning curves for these very low frequencies may contribute to this effect (see Fig. 9.5), but that still leaves the change in the relative width of the tuning curves to be explained. A second hypothesis is that the different threshold values mean some frequencies are coded by temporal events, and so the number of hair cells assigned to a narrow frequency range is not a critical factor in the judgment. This could be particularly relevant at very high frequencies, beyond the 4 kHz upper limit on phase locking (which is not all that good above 2 kHz, for that matter).

 

For intensity discrimination (Figure 10.7) we see that with a white noise stimulus the difference limen for intensity changes very little, staying at about 1 dB over a wide range of 10 to 80 dB. So this is a near-constant 1 dB, but remember that the dB scale is already a ratio measure of two different intensities. For pure tones the functions shown in Figure 10.7 are almost but not quite constant, varying over a range of about 1 dB for most frequencies (note that there is a big change in the increment threshold for 8 kHz from about 5 to 10 dB, but this is dB SPL, and so it is likely that a 5 dB tone at 8 kHz was not heard very well). Again it would be interesting to speculate about why increment thresholds are about, but not quite, constant across frequency. Presumably we are picking up differences in loudness, which we might assume is related to the rate of firing in the auditory nerve. Given the relationship between firing rate and stimulus level seen in the auditory nerve (Figure 9.2 for example), intuitively it would be thought that it should be easier to pick up changes at low stimulus levels, not at high levels, because of saturation in the typical auditory nerve fiber. A more complicated idea is that somehow the central auditory system could be able to modulate or adapt to the prevailing external intensity so as to maintain a constant low level of firing, so that any change in the prevailing external rate is picked out against this adapted level of responding. Perhaps this "adaptive calibration" of the background level is the task of the complicated inhibitory networks in the DCN that produce cells with side band inhibition (Figure 15.14).
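
Because the dB scale is a ratio measure, a constant difference limen in dB is itself a statement of Weber's law. The sketch below converts a limen in dB to a Weber fraction ∆I/I, assuming the limen is expressed as 10 log((I + ∆I)/I); other conventions exist, and that choice, like the function names, is an assumption here.

```python
import math

def intensity_ratio(delta_db):
    """Intensity ratio (I + dI) / I corresponding to a level difference in dB."""
    return 10.0 ** (delta_db / 10.0)

def weber_fraction_from_db(delta_db):
    """Weber fraction dI / I implied by a difference limen given in dB."""
    return intensity_ratio(delta_db) - 1.0

def level_difference_db(weber_fraction):
    """Inverse conversion: dB difference implied by a Weber fraction dI / I."""
    return 10.0 * math.log10(1.0 + weber_fraction)

if __name__ == "__main__":
    for dl in (0.5, 1.0, 2.0):
        print(f"{dl} dB limen -> dI/I = {weber_fraction_from_db(dl):.3f}")
```

On this convention a 1 dB limen corresponds to an intensity increase of roughly 26%, whatever the base level.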

 

Temporal discrimination is measured in a variety of ways. The apparently easiest procedure, which would ask for a discrimination between two tones having different durations, is not the best method, because it would produce loudness differences and also spectral differences due to transient splatter if the stimuli were very brief. Yost describes one way of measuring temporal discrimination as using a procedure called "gap detection" (which for most people is usually reserved as a measure of temporal acuity). In the typical gap detection experiment the subject is presented with a brief sound, usually a noise burst, on two occasions, one when the noise is continuous, and one when the sound includes a brief quiet period. So this could be seen as asking for a perception of one noise burst vs. two noise bursts, but normally the gap is so short that it sounds just like a tiny glitch. (Of course the two different types of trials have to be carefully controlled so that they seem equally loud; and noise is usually used because the brief gap would produce a spectral change, which would, however, be masked by the noise.)

 

Yost talks about gap detection as if the tones on the edges served as markers for times, which is reasonable; and in general we note that in Figure 10.8 the ∆S/S function is more or less linear: as the duration of the standard gap is increased, say, from 10 ms to 100 ms to 1000 ms, the change in duration that we can discriminate from the standard also increases (estimating from the values given in Figure 10.8, the threshold changes from 3 ms, to 10 ms, to 70 ms). But this increase obviously doesn’t follow the Weber-Fechner rule, because these numbers as proportions are 30%, 10% and 7%. Why might this be? I went back to the original article to find out how the author thought about these data, which led to a very long chase, as "Abel (1971)" is not in the bibliography. Eventually I found an abstract by S. Abel, who presented a paper at a convention in 1971 that seemed to be on this topic, and then eventually found that Sharon M. Abel published a paper in JASA, 1972, called "Discrimination of temporal gaps", which is obviously the right publication. However, the data are not those presented in Figure 10.8, but are very close to them in outcome, though the procedure is different from that described by Yost. (In fact Yost appears to have mistakenly used Figure 1 from a paper Sharon Abel published earlier in 1972, on the duration of noise stimuli.) She actually used gaps between two noise bursts for her experiment, putting standard gaps between the noise bursts ranging from 0.63 ms to 640 ms, in 11 log steps. Then in a 2AFC experiment she presented one of the standard gaps and then a test comparison gap that was variable, and got the threshold for the judgment of "longer." She presents data as ∆T in ms, and as ∆T/T as a ratio. She used different durations of noise at different levels but the overall shapes of the curves were about the same.
To a first approximation there were two linear functions describing the ratio outcome: a decreasing line, with ∆T/T falling from about 4/1 down to about 1/1 for gaps of .63, 1.25 and 2.5 ms; and then a second straight line, along which ∆T keeps increasing, running from a ratio of about 1/2 at 5 ms (an increase of about 2.5 ms is necessary to say that the gap is longer than 5 ms) on out to a ratio of 1/5 at 640 ms (so it takes an increase of about 130 ms to say that a gap is longer than 640 ms). So the Weber function doesn't work at all. The interesting part of this task is trying to understand how we can do it at all, let alone come up with a reasonably straight line (in a log/log plot) that describes our data. For durations of noise we can imagine that the nervous system counts spikes; but here we are discriminating (approximately) 300 ms of silence from 360 ms of silence: what are we counting?
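
The failure of the Weber rule for gaps can be checked with the numbers quoted above. The sketch assumes that Abel's 11 standard gaps were octave steps starting at 0.625 ms (rounded to .63 in the notes), which matches the gaps named in the text but is an inference, not something confirmed from the paper; all names are illustrative.

```python
def weber_fraction(delta_t_ms, standard_ms):
    """Ratio of the just-discriminable change in gap duration to the standard gap."""
    return delta_t_ms / standard_ms

# (standard gap in ms, threshold change in ms), estimated in the notes from Figure 10.8.
FIGURE_10_8_ESTIMATES = [(10.0, 3.0), (100.0, 10.0), (1000.0, 70.0)]

def assumed_standard_gaps_ms(shortest_ms=0.625, steps=11):
    """Standard gaps in assumed octave (doubling) steps, shortest first."""
    return [shortest_ms * 2 ** k for k in range(steps)]

if __name__ == "__main__":
    for standard, delta in FIGURE_10_8_ESTIMATES:
        print(f"{standard:7.1f} ms standard: dT/T = {weber_fraction(delta, standard):.2f}")
    print(assumed_standard_gaps_ms())
    # Increments implied by the ratios quoted for the two ends of the second line.
    print(5.0 * (1 / 2), 640.0 * (1 / 5))
```

If the Weber rule held, the three fractions from Figure 10.8 would be equal; instead they fall steadily as the standard gap lengthens.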

 

The last part of this section is on “temporal modulation transfer functions,” which we have seen before in electrophysiological work in the inferior colliculus. In these experiments a carrier sound (in Figure 10.9 a white noise) is often modulated by a sinusoid (in Figure 10.9 apparently one with a frequency of 100 per second), and the depth of modulation is changed from 0 to 100%. The experiment is run under many conditions, but the subject is always trying to detect which of 2 stimuli is modulated: within a series a particular AM frequency is given, and the depth of modulation is changed from 0 up to some threshold value. In Figure 10.10 the standard way of expressing modulation depth is used, as 20 log m (where m is the modulation depth as a proportion, so that m = 1 is 100%, etc.): when MD = 100% then m = 1, and 20 log 1 = 0; and when MD = 10%, m = .1 and 20 log (.1) = -20, etc. We can see in Figure 10.10 that the threshold is at a modulation depth of about 5% up to a modulation frequency of about 20 Hz, and that sensitivity then drops rapidly, with the threshold rising to near 50% at rates of around 700 Hz. This sensory function is not substantially different from data that could be obtained in the inferior colliculus or in the auditory cortex of a laboratory mammal, looking at single units or evoked potentials for the same type of stimuli.
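
The 20 log m scale can be converted back and forth with a couple of lines. This is a sketch of the arithmetic only; the function names are illustrative, and the 5% and 50% depths are the approximate thresholds quoted above.

```python
import math

def modulation_db(m):
    """Modulation depth in dB, 20 * log10(m), where m = 1.0 means 100% modulation."""
    if m <= 0:
        raise ValueError("modulation depth must be positive")
    return 20.0 * math.log10(m)

def modulation_depth(db):
    """Inverse conversion: modulation depth m for a value of 20 * log10(m)."""
    return 10.0 ** (db / 20.0)

if __name__ == "__main__":
    for percent in (100.0, 50.0, 10.0, 5.0):
        print(f"{percent:5.1f}% modulation -> {modulation_db(percent / 100.0):6.1f} dB")
```

On this scale the 5% threshold at low modulation rates sits near -26 dB and the 50% threshold at high rates near -6 dB, so better sensitivity corresponds to a more negative value.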