Notes on Yost: Chapter 11, Masking. pp. 165-178.
2 April 2003
The practical interest in the topic of masking is that most stimuli are not heard in the pure conditions of the laboratory or clinic but in the context of other stimuli, and to a certain but variable extent, the two stimulus events will interact: so we should learn something about these interactions. It is interesting that the nervous system has evolved ways to limit the degree of interference -- binaural unmasking for example, and perhaps the signal to noise processing provided by the efferent system or by sideband inhibition. It is interesting too that under some conditions these neural mechanisms seem to break down. One of the problems of the aged listener, for example, is a decrease in the ability to separate signal from noise.
Yost began in the third edition with a phenomenon in tone-on-tone masking, which is a somewhat devious way of approaching the problem of our trying to determine whether a tone has increased in loudness or not (the "difference limen" of Chapter 10). Imagine that you are listening for an increase in the level of a tone of 70 dB. It may be that the difference limen is 1 dB -- so that you can hear the difference between 70 and 71 dB. But if one were to take one tone of 70 dB and add it to a second tone of the same frequency and phase then that second tone would have to be presented at about 64 dB in order to reach a compound level of 71 dB (you might try going back to Chapter 3 to work this out). One way of thinking about this is that the tone of 70 dB has then "masked" tones of that same frequency and phase of less than 64 dB, and so could be seen as being a simple case of tone on tone masking.
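The 64 dB figure can be checked with a short calculation. The sketch below assumes the two tones combine by summing their powers (intensities), which is the rule that reproduces the value of about 64 dB. For comparison it also computes the case in which the pressure amplitudes add exactly in phase, which gives a lower figure. The function names are illustrative and do not come from Yost.

```python
import math

def added_tone_level_power(base_db, target_db):
    """Level (dB) of a second tone whose power, summed with the base tone's power, yields target_db."""
    extra_power = 10 ** (target_db / 10) - 10 ** (base_db / 10)
    return 10 * math.log10(extra_power)

def added_tone_level_in_phase(base_db, target_db):
    """Level (dB) of a second tone whose pressure amplitude, added exactly in phase, yields target_db."""
    extra_amplitude = 10 ** (target_db / 20) - 10 ** (base_db / 20)
    return 20 * math.log10(extra_amplitude)

# Assumption: powers add (the rule that gives the 64 dB figure quoted in the notes).
print(round(added_tone_level_power(70, 71), 1))     # 64.1
# Assumption: pressures add perfectly in phase.
print(round(added_tone_level_in_phase(70, 71), 1))  # 51.7
```

Either way the point of the paragraph stands: a second tone well below the level of the 70 dB tone adds too little to the combined level to be noticed.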
In the 4th edition Yost begins by presenting two tones of different frequencies. We may be able to discriminate between tones of 1000 Hz and 1002 Hz, but one might ask to what extent the two tones interact. The answer can be found in a masking experiment: how does a tone of 1000 Hz mask a tone of 1002 Hz, for example? Yost asks us to think about a weak, near-threshold tone A presented simultaneously with tone B, and then to increase the level of tone B until it just masks our detection of tone A. If the level of tone B must be raised by a lot before A is masked, then B provides little masking of A; if tone B need be raised only a little, then B provides a lot of masking of A. One can then repeat this for different frequencies of the masking tone B. The results of such an experiment, conducted by Wightman, et al. (1977), are shown in Figure 11.1. The outcome is our old familiar tuning curve, of the same sort as might be seen on the basilar membrane, etc. In fact this is called a "Psychophysical Tuning Curve" (PTC) and it is a way of doing "physiology" in human listeners. You could imagine, for example, getting PTCs in old humans and then looking at physiological tuning curves in the auditory nerves, or inferior colliculi, or auditory cortex of old mice, and comparing them to the data obtained in humans. Figure 11.1 gives three tuning curves for humans, which are rather like the curves obtained from auditory nerve fibers at different frequencies, in their being symmetrical at low frequencies but sharply asymmetric at high frequencies.
But Figure 11.2 shows that the PTC is less sharp than the auditory nerve tuning curve, and in general this is the case -- there seems to be a little more relative masking going on at the edges of the function than would be expected from the filter shape as measured in the auditory nerve fiber. In Figure 11.2 a level of about 65 dB is necessary in the auditory nerve for a masker at 500 Hz to mask the detection of a 1000 Hz tone, but the comparable effect in the psychophysical tuning curve seems to need only 50 dB. So why is this? Is it that sensory experience is not as good as the nervous system? Or is it something about the method? There are different ideas about why this is so. Part of the difference is that there are two stimuli in the human experiment (a signal and a masker) while in the auditory nerve work there is no second stimulus (just a tone at the "masker frequency" that is driving a cell that happens to have its best frequency at the signal frequency). Understanding why this is important takes a bit of a digression (and will end up at Figure 11.13).
Not all masking is "simultaneous" masking in which the two tones are presented together. Masking can also be shown if the masking tone is presented first and then the masked tone is presented a short time later. This is called forward masking, and is treated towards the end of Chapter 11. If the PTC is obtained in a forward masking procedure then the function is much sharper and looks more like the auditory nerve fiber tuning curve. In auditory nerve experiments there is no masking stimulus: the auditory nerve fiber is simply stimulated by tones of different frequencies at different times, to see how well they drive a fiber that has some other best frequency at threshold. Moore (in an edited textbook called simply "Hearing") suggests that in addition to their masking each other, two tones that are close together might suppress each other (as we saw in the auditory nerve), and so in addition to masking we also get two-tone suppression adding to the masking effect in the PTC experiment. But it has been observed that two-tone suppression is an "immediate" phenomenon, with no lag in latency before it begins or ends its work. So, if the widening of the PTC results from two-tone suppression, then forward masking should involve no suppression, and should give a purer picture of masking. And, indeed, forward masking gives a better approximation to auditory nerve effects.
Moore also gives some other examples of interactions between two tones which are interesting in the context of masking. As mentioned above, we know from the initial section on acoustics that two tones will "beat" -- wax and wane in level at the frequency of the frequency difference between the two tones. This means that in a two-alternative forced choice experiment a subject might use the presence of beats as information that a second tone has been added to the first. The way to get around this is to use brief stimuli, obviously. Then there are some other problems. For example, at high levels there may be distortion in a collection of harmonic tones or combination tones, both of which might also be used to indicate the presence of two tones, again at frequencies so far apart that they are not masked. These account for the very complex masking pattern found in the classic data of Wegel and Lane in 1924, which are given in Figure 11.3, when the masker was a 1200 Hz tone at 80 dB and the masked tone varied in frequency from about 300 to 3500 Hz. Note that the asymmetric shape of the function is turned around in Figure 11.3 compared to 11.1; this is because 11.1 gives the frequency of the masker while Figure 11.3 gives the frequency of the signal along the abscissa. Note the beating effect near 1200, the harmonic beating effect near 2400 and 3600, and places where difference tones might be heard.
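The beating cues described above can be illustrated with a little arithmetic. This is a minimal sketch, assuming that distortion at high levels produces harmonics at integer multiples of the masker frequency and that a signal near one of them beats at a rate equal to the frequency difference. The function name and the example signal frequencies are hypothetical, chosen only to fall near the 1200, 2400, and 3600 Hz regions noted in Figure 11.3.

```python
def nearest_harmonic_beat(masker_hz, signal_hz, n_harmonics=3):
    """Return (harmonic_hz, beat_rate_hz) for the masker harmonic closest to the signal.

    Assumption: harmonics lie at integer multiples of the masker frequency,
    and the beat rate equals the absolute frequency difference.
    """
    harmonics = [k * masker_hz for k in range(1, n_harmonics + 1)]
    nearest = min(harmonics, key=lambda h: abs(signal_hz - h))
    return nearest, abs(signal_hz - nearest)

# Hypothetical signal frequencies near the masker and its first two overtones.
for signal in (1205, 2390, 3610):
    harmonic, beat = nearest_harmonic_beat(1200, signal)
    print(f"signal {signal} Hz: {beat} Hz beats against {harmonic} Hz")
```

A listener who hears such slow fluctuations can report that a second tone is present even without hearing the tone itself, which is why the measured thresholds dip in these regions.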
Yost then goes on to talk about "noise-on-tone" masking. Figures 11.4 and 11.5 give some interesting data on this phenomenon. In 11.4 we see that as the frequency of the masked signal is increased for a constant masker then the signal level for detection has to be progressively raised. Note also that the lowest curve is not really the threshold for audibility (which is measured in quiet): it has its greatest sensitivity at 500 Hz rather than at 3 to 4 kHz. Why is this? Possibly it is because high frequencies are masked by low frequencies but not the other way around; but also because the critical bands for high frequencies are wider than those for low frequencies. Note also a very common effect, which is that the signal-to-noise ratio at threshold remains the same across differences in masker intensity: this is in line with the Weber-Fechner ratio.
The next section is very important in its introducing the concept of the "critical band" which is related to the hypothetical "internal filter" as seen in Figure 11.6. We have been talking in class about the auditory system (or any sensory system) as being made up of filters, and then the ear and the basilar membrane and so forth acting as filters. Here the notion is to imagine that the task of the listener is to pick a filter in which there is a signal, but also some noise. If the noise fills up the filter then the signal will be hard to detect; if the noise fills up only part of the filter then the signal will be easier to detect; and if the noise only acts on a filter other than that which contains the signal then there should be no masking and the tone should be easy to detect. The internal filter is the "critical band" because those frequencies are "critical for masking" according to Fletcher (a famous hearing scientist at Bell Labs in the 1930s and 40s). Figure 11.7 is a great approach to this, first used by Patterson in his study of masking in elderly vs. normal listeners in 1983. He developed the use of "band-reject" noise to figure out the width of the critical band. Note that as the notch in the noise grew larger the threshold for the signal dropped precipitously, along a more or less straight line. Figure 11.8 indicates that the width of the critical band increases with frequency, about linearly in a log-log plot.
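The logic of the band-reject experiment can be sketched as a toy calculation. The assumptions here are strong ones: the internal filter is treated as a rectangle (a real filter has sloping skirts), the noise has a flat spectrum, and the amount of masking is taken to follow the noise power that falls inside the filter. The 160 Hz bandwidth, the noise edges, and all function names are hypothetical values chosen for illustration, not figures from Yost or Patterson.

```python
def overlap_hz(band_a, band_b):
    """Width in Hz of the overlap between two (low, high) frequency bands."""
    low = max(band_a[0], band_b[0])
    high = min(band_a[1], band_b[1])
    return max(0.0, high - low)

def notched_noise(center_hz, notch_width_hz, low_edge=0.0, high_edge=4000.0):
    """Two flat noise bands with a notch centred on the signal frequency."""
    half = notch_width_hz / 2
    return [(low_edge, center_hz - half), (center_hz + half, high_edge)]

def noise_power_in_filter(center_hz, bandwidth_hz, noise_bands, density=1.0):
    """Noise power passed by a rectangular filter (assumed shape) centred on the signal."""
    filt = (center_hz - bandwidth_hz / 2, center_hz + bandwidth_hz / 2)
    return density * sum(overlap_hz(filt, band) for band in noise_bands)

# Hypothetical 160 Hz wide filter at 1000 Hz; widen the notch step by step.
for notch in (0, 80, 160, 320):
    power = noise_power_in_filter(1000, 160, notched_noise(1000, notch))
    print(f"notch {notch:>3} Hz -> noise power in filter: {power}")
```

In this cartoon the noise in the filter falls as the notch widens and vanishes once the notch is as wide as the filter, so the notch width at which masking stops changing gives an estimate of the filter width. With a realistic filter shape the decline is gradual rather than abrupt.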
The next section asks us to imagine the relationship between excitation patterns and critical bands. This topic is developed on the assumption of there being a fixed masker and various signals (as in Figure 11.4). Then by looking at the degree of masking of different signals one can (maybe) figure out what frequencies that masker is affecting, that is, its "excitation pattern." The assumption is that if the masker masks a signal it must be extending into the critical band of that signal (look at Figure 11.10), and vice versa.
The next section on temporal masking gives the standard simultaneous, forward, and backward masking temporal patterns, described in Figure 11.11. Figure 11.12 is supposed to show that there is relatively more masking just at the onset and offset of the masking stimulus. This is called an "overshoot" -- which I would like to think occurs because the nervous system weights "change" heavily at these points. However, Yost reprints Figure 11.2 in place of 11.12. A real 11.12 is shaped a little bit like a Batman costume, with ear-like increases in masking right around the onset and offset of the masking stimulus. The next section (Figure 11.13) treats the example given earlier, that the PTC obtained with forward masking looks more like the auditory nerve tuning curve.
The final section treats a very complicated case, comparing simultaneous vs. forward masking when a pair of tones is used as maskers: one tone (the masker) is at the same frequency as the signal, and the other tone (the suppressor), 20 dB more intense than the masker, is varied from very high to very low frequencies. I cannot imagine quite what the realistic example of this is, but the data are interesting anyway, if only because they are so complicated in the case of forward masking. First note that when the two maskers and the signal are all presented together there is no big deal. When the suppressor tone is around the frequency of the other tones then it just adds to the masking effect, and when it is much different then nothing happens. For forward masking, however, the situation is quite different. When the suppressor is lower than the other tones then there is not much difference between the effects of simultaneous vs. forward masking. However, when the suppressor is a bit higher than the masking tone then we see that it suppresses the masker but not the signal, and masking is actually reduced by the suppressor. Why does suppression not work for suppressing tones that are below the masker? Unfortunately Yost does not explain this effect, but one might note that two-tone suppression in physiological data is about equal on both sides of the center frequency while masking is more asymmetric. We could imagine that the lower suppressor tones suppress the masker but also forward-mask the signal, while higher suppressor tones (simultaneously) suppress the masker but do not themselves mask the signal.
Yost concludes with the comment that these complex effects are not well understood, but must be considered when trying to understand how the auditory system handles combinations of tones. And also, masking must be considered when we think about the effects of hearing loss and age on auditory perception. Age has an effect on masking that seems additional to any age effects on hearing loss, suggesting that the physiological control of masking by neural networks that are devoted to that function is diminished in aged human listeners.