Notes from Yost Chapter 12: Binaural hearing pp. 179-192
7 April 2003
We have already covered in class many of the points made in this chapter. Yost begins with the concept of localization, which refers to all three dimensions: the horizontal plane (azimuth); the vertical plane; and distance (Figure 12.1). Spatial location is not directly given by the stimuli, but depends on our ability to process temporal, level, and spectral cues to obtain spatial correlates.
Localization along the azimuth depends to a large extent on the binaural (two-ear) auditory system (Figures 12.2 and 12.3). This of course refers to interaural differences in time and stimulus level (ITD and ILD), though Yost quite reasonably distinguishes between differences in time of arrival, which are the same at all frequencies (or almost the same -- low frequencies take a slightly longer path around the head), and interaural phase differences, which vary across frequency: he gives the example of a 0.5 msec interaural delay, which puts a 1 kHz tone 180 degrees out of phase at the two ears, a 2000 Hz tone 360 degrees out of phase (that is, no apparent difference at all), and a 500 Hz tone 90 degrees out of phase. He notes too that there are two sources of the interaural level differences, one being the additional distance traveled to the far ear (which is negligible) and the other being the result of the head's sound shadow, which we know is a high-frequency effect. Figure 12.3 is complicated but will repay study: it shows the interaural time differences that result from different locations as a function of frequency (at the top) and the interaural level differences that result at different frequencies across the azimuth. Figure 12.4 shows that listeners make the most errors at about 2 kHz, but note that the graph is a little distorted and misleading: one way of looking at these data is to say that our error rate roughly doubles as the frequency moves from about 60 Hz to 2 kHz and then drops back down by 4 kHz, which it does, but another way is to think that, OK, we are quite good at low and high frequencies, at about 11% errors, and not that much worse at 2 kHz, at 20% errors. The Stevens and Newman data of 1936 captured in this graph are classic, obtained with the listeners, possibly blindfolded, seated atop a very tall chair on top of a very tall building on the Harvard campus, surrounded by moving loudspeakers.
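To make the arithmetic concrete, here is a minimal sketch (my own, not from the chapter) that converts the fixed 0.5 msec interaural delay in Yost's example into the phase difference it produces at each frequency:

```python
# Sketch (illustrative only): a fixed arrival-time difference maps onto a
# different phase difference at each frequency, IPD = 360 * f * ITD degrees.

itd = 0.0005  # interaural time difference in seconds (0.5 ms, as in the example)

for freq in (500, 1000, 2000):        # Hz
    ipd = 360.0 * freq * itd          # interaural phase difference, degrees
    print(f"{freq} Hz: {ipd:.0f} degrees")

# 500 Hz  ->  90 degrees
# 1000 Hz -> 180 degrees
# 2000 Hz -> 360 degrees, which is indistinguishable from 0: the cue is ambiguous
```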
Figure 12.5 is for a blindfolded subject locating a white-noise burst whose source ranged along the azimuth across about 300 degrees. In general the judgments are very accurate, though it does look as if there is some error around 90 to 120 degrees left and right. The next section deals with a subject's ability to tell the difference between two locations, which is a slightly different task. The experiment tries to determine the "minimum audible angle" (MAA), which is the threshold separation, as shown in Figure 12.6. Figure 12.7 shows this function for different tonal frequencies and indicates that the MAA is best around the 0 degree position (directly in front) compared to the side, that it is poor for frequencies around 1500 to 2000 Hz, and that it is poor again for high frequencies around 8 kHz. The hypothesis, which is reasonable, is that the middle frequencies where the MAA is large fall between the low frequencies that are good for IPD and the high frequencies that are good for ILD; but the additional loss at high frequencies is a puzzle to me: perhaps the listener had a high-frequency hearing loss? Localization in three dimensions, as shown by Wightman and Kistler in Figure 12.9, is not as good as localization along the azimuth (horizontal plane). Yost points out that this task is done by spectral cues, which depend on the component frequencies of a complex sound being partially and differentially reflected off the ear and the head. As we know, this allows us to locate complex sounds by monaural as well as binaural cues, and takes us back to studies by Plenge, by the Gardners, and by Butler and Belendiuk on the role of the pinna in "coloring" the spectrum of a complex auditory event.
Distance is an obvious fact of space, but not so obvious a fact for auditory localization. Of course if we recognize a sound then we have a sense of how loud the sound object should be at a given distance: the train whistle in the distance, for example. A more subtle cue for a complex object is that long wavelengths travel better than short wavelengths (which are reflected back by any little object in their way), so low-frequency sounds tend to be heard as more distant, again if we recognize the object and what it should sound like. In most listening environments we are beset by echoes that might be expected to confuse localization. As we have noted before, there are neural mechanisms that subdue these echoes, and it works out that only the first wavefront is used for localization. This is called the "law of the first wavefront" or the "precedence effect" in localization experiments.
The next section has to do with "lateralization," which is the illusion of stimuli having sources within the head, towards one ear or the other. We know that this is a headphone effect, resulting because headphones do not provide the subtle shift in the pattern of frequencies for compound stimuli normally provided by the head and the external ear (Plenge again). The fusion of an image depends on the two ears being stimulated at meaningful time differences and at reasonably close frequencies. If the time differences are large enough (perhaps 2 ms) and the frequencies are not the same, then two separate stimuli are heard. Figure 12.10 is complicated. Stimuli are presented separately to the two ears over headphones. In the top panel the parameter is the starting phase difference: say 0 degrees phase difference between the two ears, which would be the equivalent of straight ahead at all frequencies; then 45 and 90 degrees; then 135 and 180 degrees, whose equivalent positions in space depend on the frequency of the tone. The just noticeable phase difference is then determined across frequency. Not surprisingly, low-frequency stimuli are best for this (where a given number of degrees corresponds to a bigger time difference), and in general as the starting phase difference increases so does the difference limen. This means we are best at discriminating spatial locations when we are facing the sound source, that is, when the phase difference at the two ears is 0 degrees. In part b (the lower graph) level differences are manipulated for different frequencies, with the parameter being the starting level difference between the two ears, and the task is to find the difference limen for interaural level at each starting difference. The louder tone is displaced towards that side of the head from the midline. If we start with a 0 dB level difference we can detect a change of about 0.5 dB between the two ears, while for a 15 dB starting difference it takes about 1.5 dB: again we are most sensitive around the straight-ahead position, where the two ears pick up the same stimulus level. Also note that this is worst around 1 kHz. The reason for this is not clear (to me anyway), but it is a bit curious: in fact the normal case in the free field for low-frequency stimuli would be never to have a level difference at the two ears, unless one were going deaf in one ear! So most of the data here are for artificial stimulus presentations.
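To see how such stimuli are built, here is a small sketch (my own; the parameter values are arbitrary, not taken from Figure 12.10) of a headphone lateralization stimulus: the same tone goes to both ears, with one channel delayed (an interaural time/phase difference) and attenuated (an interaural level difference):

```python
# Sketch of a dichotic headphone stimulus for lateralization; values are arbitrary.
import numpy as np

fs = 44100        # sample rate in Hz
dur = 0.5         # duration in seconds
freq = 500.0      # tone frequency in Hz
itd = 0.0005      # 0.5 ms delay applied to the right ear
ild_db = 6.0      # right ear attenuated by 6 dB

t = np.arange(int(fs * dur)) / fs
left = np.sin(2 * np.pi * freq * t)
right = np.sin(2 * np.pi * freq * (t - itd)) * 10 ** (-ild_db / 20)

stereo = np.column_stack([left, right])   # write to a sound device or file to listen
# With itd = 0 and ild_db = 0 the fused image sits at the midline; increasing
# either pushes the image toward the leading / louder (left) ear.
```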
The next point is that the binaural timing system is sensitive to the amplitude modulation of high-frequency tones, so that if a tone is modulated at, for example, 100 Hz, then it can be accurately located even if it is a high-frequency tone, though perhaps not quite as well as a 100 Hz tone itself. Yost treats the hypothesis that phase locking is responsible for localization by IPD only briefly, but indicates at least that this is a plausible explanation. Physiological data support this hypothesis.
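A sketch of the kind of stimulus at issue (my own construction, assuming a sinusoidally amplitude-modulated tone; Yost does not specify one here): a high-frequency carrier whose 100 Hz envelope is delayed in one ear, so that only the envelope carries the interaural timing cue:

```python
# Sketch: 4 kHz carrier, 100 Hz amplitude modulation, envelope delayed in one ear.
import numpy as np

fs = 44100
t = np.arange(int(fs * 0.5)) / fs
fc, fm = 4000.0, 100.0        # carrier and modulation frequencies (Hz)
env_itd = 0.0005              # 0.5 ms envelope delay in the right ear

left = (1 + np.cos(2 * np.pi * fm * t)) / 2 * np.sin(2 * np.pi * fc * t)
right = (1 + np.cos(2 * np.pi * fm * (t - env_itd))) / 2 * np.sin(2 * np.pi * fc * t)
# The binaural system can use the 100 Hz envelope disparity to lateralize the
# sound even though the 4 kHz carrier itself is too high for phase locking.
```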
A little section is given to the differences between localization and lateralization, which we have taken up before in Plenge's work. Here Yost treats a more recent paper, a review written by Wightman and colleagues, as the source of his treatment. These days this sort of approach is used for "virtual reality" experiments, in which head-related transfer functions are run through a computer so that the input to headphones can be suitably altered as the listeners move their heads.
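The computational idea is simple enough to sketch (the impulse responses below are placeholders, not measured data): convolve a mono source with a left-ear and a right-ear head-related impulse response (HRIR) to produce a headphone signal that carries the spectral "coloring" of the head and pinna for a given direction:

```python
# Sketch of HRTF-based spatialization; hrir_left / hrir_right stand in for
# measured head-related impulse responses, which are not provided here.
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Return a 2-channel signal: the mono source filtered through each ear's HRIR."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    n = max(len(left), len(right))
    out = np.zeros((n, 2))
    out[:len(left), 0] = left
    out[:len(right), 1] = right
    return out

# In a head-tracked system the HRIR pair is swapped for the pair measured at the
# new head-relative direction each time the listener moves.
```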
The last section is binaural masking, which is very important. The critical experimental paradigm independently varies the input to the two ears. On page 190 we see this as a monotic condition (one ear); a diotic condition, which is two-eared with the same stimulus in both ears; and a dichotic condition, in which different signals are put into the two ears. Masked thresholds are the same for monotic and diotic conditions (it doesn't help to use two ears when both have the same signal-to-noise ratio), but for dichotic conditions the masked thresholds are lower. There follows on page 190 a list of different ways of presenting the maskers and signals to the two ears, variously in phase and out of phase: note the 7 conditions. So it could be that a masker is in phase at the two ears (hence M0) while the signal is out of phase (Sπ). Note that if the two stimuli are in phase they would appear fused and straight ahead, but if they are out of phase they could be perceived separately at the two ears. In this case there would be less masking compared to monaural conditions or to binaural conditions with both in phase (or both out of phase). This difference is called the Masking Level Difference (MLD) and it can be substantial (see Table 12.2). It is thought to be responsible for Cherry's "cocktail party phenomenon," in which we can pick signals out of a background noise if they are spatially dispersed. Like localization findings in general, the effect depends on the frequency of the signal and is reduced at high frequencies (see Figure 12.11). The binaural masking level difference was discovered by Licklider, who was a U of R graduate from about 1937. He is also given credit for the early vision of what became the Internet. He provided funding for its development when he headed an office of the Advanced Research Projects Agency (ARPA) in the Defense Department (he did, however, interest former congressman Al Gore in this work, who did in fact funnel funds into DARPA during his congressional days to foster the development of the Internet).
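The key dichotic comparison can be sketched as follows (my own illustration; the levels and frequencies are arbitrary, not taken from Table 12.2): the masker is identical at the two ears (M0), while the signal is either in phase (S0) or inverted at one ear (Sπ):

```python
# Sketch of the M0S0 and M0Spi stimulus conditions for a binaural masking experiment.
import numpy as np

fs, dur = 44100, 0.5
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(0)

noise = rng.normal(0, 0.1, t.size)            # same noise to both ears -> M0
signal = 0.02 * np.sin(2 * np.pi * 500 * t)   # 500 Hz tone to be detected

m0_s0 = np.column_stack([noise + signal, noise + signal])    # signal in phase at both ears
m0_spi = np.column_stack([noise + signal, noise - signal])   # signal inverted at the right ear
# Listeners detect the tone at a lower level in the M0Spi condition; the
# improvement in threshold relative to M0S0 is the masking level difference.
```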