Notes from Yost: Chapter 13, Loudness and Pitch pp 193-203
14 April 2004
This is a surprisingly short chapter given that the two dimensions it concerns are arguably the central dimensions of auditory sensation. As usual, Yost is concerned with the basics of these two dimensions, and he does not discuss the long history of interesting theorizing that surrounds them.
The initial problem is how to measure loudness. With humans one uses psychophysical judgments to measure sensory dimensions, primarily scaling methods and matching methods. So, for example, one might not know the loudness of a 1000 Hz tone presented at 40 dB, but one can ask how other tones compare to this one in loudness. Indeed this is a common method, and the results of one such experiment are presented in Figure 13.1. A psychological unit of loudness called the “phon” is constructed, and it is asserted by definition that the 40 dB 1000 Hz stimulus has a loudness of 40 phons (at 1000 Hz, the number of phons equals the level in dB). Then a series of other tones is given at different frequencies, and at each frequency the subject is asked to find the level at which that tone is equal in loudness to the 40 phon 1000 Hz tone (perhaps by the method of adjustment). All of those matched levels, plotted against frequency, then mark off the 40 phon loudness curve. Figure 13.1 provides equal loudness curves for 20, 40, 60, 80, and 100 phons. Note that they look very much like the audiogram, save that they become increasingly shallow at higher loudness values. The phon scale is easy to understand: ask subjects to match the unknown loudness against various levels of a 1000 Hz tone, and the level of the match is the answer.
The "sone" scale is also thought to measure loudness, though it has a less obvious rationale. It does not depend on simply matching one stimulus to another, but on making a “scaling judgment” of the sort that allows a person to say that “sound A is twice as loud as sound B” etc. “One sone” is the loudness of a 1000 Hz tone given at 40 dB SPL (so here one sone corresponds to 40 phons). Figure 13.2a shows the effect of varying the level of the tone in dB SPL on these judgments. Save for a little deviation at the lowest values this gives a straight line, but note that the dB abscissa (X-axis) is the equivalent of the phon scale (because the tone is at 1000 Hz), and the ordinate “Loudness in Sones” is on a log scale. The empirical rule seems to be that loudness doubles for every 10 dB increase in the level of the stimulus; put another way, loudness goes up by a factor of 2 while power increases by a factor of 10.
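The doubling rule can be written as a small formula. This is only a sketch: it assumes the rule holds exactly (Figure 13.2a deviates a little at the lowest levels), and the function names `sones_from_phons` and `power_ratio_from_db` are made up here for illustration.

```python
def sones_from_phons(phons):
    """Loudness in sones, assuming 1 sone at 40 phons and a doubling per 10 phons.

    Assumption: the straight-line rule is applied everywhere, although the
    notes say the data bend a little at the lowest levels.
    """
    return 2.0 ** ((phons - 40.0) / 10.0)


def power_ratio_from_db(db_change):
    """Factor by which stimulus power changes for a given change in dB."""
    return 10.0 ** (db_change / 10.0)


if __name__ == "__main__":
    # At 1000 Hz the level in dB SPL is the same number as the phon value.
    for level in (40, 50, 60, 70, 80):
        print(level, "phons ->", sones_from_phons(level), "sones")
    print("10 dB step -> power x", power_ratio_from_db(10))
```

So a 1000 Hz tone at 60 dB SPL comes out at 4 sones, four times as loud as the 40 dB reference, even though its power is 100 times greater.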
The lower part of Figure 13.2 provides data (very much prettified) from a famous paper by Steinberg and Gardner in 1937, in which a group of unilaterally impaired listeners matched loudness in their two ears for a 1000 Hz tone presented at various levels, and a group of normal listeners matched loudness in their two ears when one ear also heard a masking tone at about 50 dB. Not surprisingly, in both the impaired and the masked ears the tone was not heard at low levels, and so the loudness remains at “0” sones even while the power in the stimulus is increased. Then, as the stimulus reached and exceeded the threshold, it was first heard and loudness thereafter increased very rapidly, until over the course of 30 or 40 dB the two curves (impaired or masked versus normal) were identical. The rapid growth of loudness is called “loudness recruitment” and it is characteristic of both masking and sensorineural hearing loss (it is not present with conductive hearing loss). The present thought is that this results because the missing outer hair cells affect near-threshold hearing but become less important to detection and loudness as the stimulus reaches the threshold for the inner hair cells. I think this may have been the first publication to suggest that hearing loss can be considered a form of masking, and vice versa.
Of course loudness also varies with bandwidth and with the duration of a tone (up to about 300 ms). One last little section notes that the longer a tone is on, the less loud it appears to be. This is called loudness adaptation (“perstimulatory fatigue”). One would like to think it is related to the change in neural firing in the auditory nerve with a continued stimulus, save that it takes several seconds or even minutes to be noticed, while the changes seen in the auditory nerve are measured in milliseconds.
Yost says that “pitch has come to represent different aspects of an auditory stimulus.” He seems to explain this curious statement in the next few sentences: pitch is correlated with frequency, but we can report perceiving a pitch (does this mean “hear”?) even if there is no energy at that frequency (and he is not talking about tinnitus or hallucinations, of course, but about the missing fundamental that we heard in the demonstrations). He says that the normal perception is predictable on the basis of the tonotopic organization of the basilar membrane and so forth, but the perception of pitch in the absence of energy must have some other explanation. (We would like to think it has to do with phase locking.)
He notes that experimenters (and here he is referring to Smitty Stevens, the famous hearing scientist from the 1940s who sat on a chair on top of a tall building at Harvard to measure his ability to locate sounds) have used various kinds of scaling procedures with pitch (for example, asking a subject to pick a pitch that is half that of a standard). Yost says the values are different for pitch than for loudness because pitch is a different kind of scale, one that does not vary along a single axis. There is a typo on page 196, but he intends to say that “one pitch may not be greater in magnitude than another pitch.” Yost is an expert in psychophysics, but it does seem to me that some researchers would disagree with him. Pitch is often graphed as a rising helix, with the octaves of a note lined up along a vertical straight line. Yost unfortunately neglects these representations of pitch, but he does give three musical tuning scales, in which there are 12 intervals per octave and the intervals are more or less evenly spaced on a logarithmic scale. These scales have to do with methods of tuning musical instruments, and I am not sure how they relate to auditory theory, save that they may have something to do with the inability of the auditory nerve to phase lock completely as frequency increases. Yost also mentions the mel scale, which along with the sone scale is an invention of Smitty Stevens, but he says the mel scale is not used much these days. Others say the mel scale is a bit bizarre because it doesn’t take account of octave relationships. I am not sure how Stevens dealt with the special effects of harmonics and octaves.
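To make “12 intervals per octave, evenly spaced on a logarithmic scale” concrete, here is a minimal sketch of the exactly even case (equal temperament), in which each step multiplies frequency by the twelfth root of 2. The 440 Hz reference and the function name `equal_tempered_hz` are assumptions for illustration and are not taken from Yost; the other tuning scales he lists depart slightly from this even spacing.

```python
import math

REFERENCE_HZ = 440.0  # assumed reference pitch, chosen only for illustration


def equal_tempered_hz(steps_from_reference, reference_hz=REFERENCE_HZ):
    """Frequency that lies a whole number of 1/12-octave steps from the reference."""
    return reference_hz * 2.0 ** (steps_from_reference / 12.0)


if __name__ == "__main__":
    octave = [equal_tempered_hz(n) for n in range(13)]
    for n, hz in enumerate(octave):
        print(n, round(hz, 2))
    # On a log2 axis every step has the same size, 1/12 of an octave.
    step_sizes = [math.log2(b / a) for a, b in zip(octave, octave[1:])]
    print(step_sizes[0], step_sizes[-1])
```

Twelve steps land exactly one octave up (a doubling of frequency), which is the octave relationship that the mel scale is said to ignore.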
An interesting question is how long a stimulus has to be on before we can appreciate its tonality. This must have something to do with the Fourier analysis of brief stimuli: a tone needs to be on for some finite time so that its energy is not dispersed across many frequencies. In general, tones above about 1000 Hz need to be on for about 10 ms (which is at least 10 cycles), but a low frequency tone, below 1000 Hz, may require a certain number of cycles: from 3 to 9 said von Bekesy, and others say 6 (which is in the middle of 3 to 9, I guess). So the time needed for tonality is not a constant but varies with frequency; a 100 Hz tone, with a period of 10 ms, may have to be on for 30 to 90 ms before its tonality can be appreciated.
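The arithmetic in this paragraph can be sketched as a tiny function. It simply encodes the rule of thumb as stated in these notes (a fixed 10 ms at or above about 1000 Hz, a fixed number of cycles below that); the function name, the sharp switch at exactly 1000 Hz, and the default of 6 cycles are assumptions made for illustration.

```python
def min_duration_ms(freq_hz, cycles_needed=6):
    """Rough duration (ms) a tone must last before it sounds tonal.

    Assumptions: at or above 1000 Hz about 10 ms is needed; below 1000 Hz a
    fixed number of cycles is needed (3 to 9 according to the notes).
    """
    if freq_hz >= 1000:
        return 10.0
    period_ms = 1000.0 / freq_hz  # one cycle, in milliseconds
    return cycles_needed * period_ms


if __name__ == "__main__":
    for cycles in (3, 6, 9):
        print("100 Hz,", cycles, "cycles:", min_duration_ms(100, cycles), "ms")
    print("2000 Hz:", min_duration_ms(2000), "ms")
```

For the 100 Hz tone this reproduces the 30 to 90 ms range quoted above.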
The next section has been critically important in theories of pitch perception. There are several early demonstrations of pitch without energy at the corresponding frequency, but the example that Yost likes is Fletcher's demonstration of “the missing fundamental.” If a complex tone consists of tones whose frequencies have a common divisor, then the pitch of the compound is that matched by a single tone at the frequency of the divisor. So, a sound having components at 700, 800, 900, and 1000 Hz will have the pitch of a 100 Hz tone (though it can be easily distinguished from a pure tone on the basis of its complexity). The point is that these are the 7th, 8th, 9th, and 10th harmonics of 100 Hz, which is why the phenomenon is called the missing fundamental. Figure 13.3a gives the Fourier analysis of this complex (the dotted line showing no energy at 100 Hz), and 13.3b gives the time description of the waveform. It is apparent that there is a burst of activity every 10 ms, and so very obviously (but the story is more complicated than this) it may be imagined that this is the simple neural basis of the phenomenon. Periodicity pitch can also be shown for periodic bursts, such as a square wave having a constant inter-click period, as was shown in Chapter 4 (Figure 4.8), which actually has energy at the frequency of the click rate in the Fourier analysis. Modulated noise has a similar effect, as is shown in Figure 13.4 a and b, even without any energy at the frequency that is heard. Then further, in Figure 13.3a all of the components are in phase, which generates the time-domain waveform of 13.3b. But the missing fundamental is also heard when the components are out of phase, and then the time-domain waveform no longer shows an obvious burst every 10 ms. On the other hand, the effect does not arise because of nonlinearity on the basilar membrane.
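The 700 to 1000 Hz example can be checked numerically. The sketch below finds the largest common divisor of the component frequencies and confirms that the summed in-phase waveform (as in Figure 13.3a) repeats once per period of that missing fundamental. The function names are made up for illustration, and this is only arithmetic on the stimulus, not a model of how the ear extracts the pitch.

```python
import math
from functools import reduce


def implied_fundamental_hz(components_hz):
    """Largest frequency that divides every (integer) component frequency."""
    return reduce(math.gcd, components_hz)


def complex_tone(t_sec, components_hz):
    """Sum of equal-amplitude sine components, all starting in phase."""
    return sum(math.sin(2.0 * math.pi * f * t_sec) for f in components_hz)


components = [700, 800, 900, 1000]
f0 = implied_fundamental_hz(components)  # 100 Hz, with no energy there
period_sec = 1.0 / f0                    # 10 ms

# Compare the waveform with itself one period later at 100 sample times.
times = [k * 1e-4 for k in range(100)]
max_difference = max(
    abs(complex_tone(t, components) - complex_tone(t + period_sec, components))
    for t in times
)

if __name__ == "__main__":
    print("implied fundamental:", f0, "Hz")
    print("largest change after one period:", max_difference)
```

The difference after one 10 ms period is zero apart from rounding error, which is the regular repetition visible in Figure 13.3b.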
Licklider showed (and we heard in the demonstration CD) that if a narrow band noise masker is placed over the 100 Hz site on the basilar membrane, so that a real 100 Hz tone would be masked, still the imagined pitch is not masked: so it is something happening at higher levels of the nervous system. Obviously it must be related to phase locking but it also must be (because of the random phase results, above) that the fibers for each of the frequencies are separately evaluated: it is not that they all have to fire together every 10 ms, but that their own regular rates of firing must be separately noted. I don’t think that we know how the neural machinery within the auditory system accomplishes this feat.
A last complicated example is given on page 199 in Figure 13.5, in which a set of harmonically related tones are all displaced by the same amount, so that, for example, 400, 500, 600, etc. become 425, 525, 625, etc. Now 100 Hz is not the missing fundamental; instead, it should be 25 Hz. However, what is heard is not 25 Hz or 100 Hz but 104 Hz. Yost calls this “the pitch shift of the residue” and its perception remains a major mystery: it is as if the nervous system is trying the best it can to find an approximate common divisor that is within the range of hearing (25 Hz isn't).
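The numbers in this example can be laid out explicitly. The greatest common divisor below is plain arithmetic. The second step, dividing each shifted component by the nearest whole harmonic number of the 100 Hz spacing, is an illustrative assumption and not a model taken from Yost; it is included only to show that values a few Hz above 100 fall out naturally.

```python
import math
from functools import reduce


def residue_pitch_candidates(components_hz, spacing_hz):
    """For each component return (component, harmonic number, component / number).

    Assumption (not from Yost): each component is treated as the nearest
    whole-numbered harmonic of some fundamental close to the spacing.
    """
    candidates = []
    for f in components_hz:
        n = round(f / spacing_hz)
        candidates.append((f, n, f / n))
    return candidates


shifted = [425, 525, 625]
exact_fundamental = reduce(math.gcd, shifted)      # 25 Hz
candidates = residue_pitch_candidates(shifted, 100)

if __name__ == "__main__":
    print("exact common divisor:", exact_fundamental, "Hz")
    for f, n, pitch in candidates:
        print(f, "Hz as harmonic", n, "->", round(pitch, 2), "Hz")
```

The three estimates are 106.25, 105, and about 104.2 Hz, which are in the neighborhood of the 104 Hz that is reported, though which of them (if any) the listener actually uses is not something this sketch can say.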
The last section in this chapter examines the loudness of nonlinear tones, that is, the harmonics and the difference tones. So, for example, the strong “cubic difference tone” (CDT) at 2f1-f2 [if f1 is 1200 Hz and f2 is 2000 Hz, then the CDT is 400 Hz] is heard even though there is no energy at this frequency in the stimulus, though there is displacement on the basilar membrane. Yost describes the way of measuring this perception by adding a cancellation tone of 400 Hz and then adjusting its phase and its level until the two tones cancel, so that neither one is heard (if the tones are 180 degrees out of phase and of equal level, then they cancel each other). Figure 13.6a shows the equivalent level of the difference tone and how it varies with the level of the original tones (it looks about 20 to 25 dB down from the primaries).
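Both pieces of arithmetic here, the 2f1-f2 frequency and the cancellation of two equal tones that are 180 degrees apart, can be checked in a few lines. This is only a sketch of the idea: the function names are made up, and real cancellation happens inside the ear against a distortion product, not between two sine waves in a program.

```python
import math


def cubic_difference_tone_hz(f1_hz, f2_hz):
    """Frequency of the cubic difference tone, 2*f1 - f2."""
    return 2 * f1_hz - f2_hz


def residual_after_cancellation(freq_hz, phase_offset_rad, level_ratio=1.0):
    """Largest absolute value of tone + cancellation tone over 10 ms.

    level_ratio is the amplitude of the cancellation tone relative to the
    original tone; 1.0 means equal level.
    """
    times = [k * 1e-5 for k in range(1000)]  # 10 ms in 10 microsecond steps
    return max(
        abs(
            math.sin(2.0 * math.pi * freq_hz * t)
            + level_ratio * math.sin(2.0 * math.pi * freq_hz * t + phase_offset_rad)
        )
        for t in times
    )


cdt = cubic_difference_tone_hz(1200, 2000)  # 400 Hz

if __name__ == "__main__":
    print("cubic difference tone:", cdt, "Hz")
    print("180 degrees, equal level:", residual_after_cancellation(cdt, math.pi))
    print("0 degrees, equal level:", residual_after_cancellation(cdt, 0.0))
```

With equal levels and a 180 degree offset the residual is zero apart from rounding error, while any other phase or level leaves something behind, which is why the subject has to adjust both.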
Lastly Yost talks about other subjective attributes of sound, namely timbre, consonance, and dissonance. Timbre is strangely defined as the difference between two sounds that makes them sound different though they have the same pitch, loudness, and duration, which makes it seem like a junk bag concept, which it is not. More than that, it has to do with things like the richness of harmonics and the rise and decay times of the notes. Consonance means “pleasurable” and is opposed to dissonance or harshness, but pleasantness in music seems a learned emotion. Some people think of consonant complex sounds as being made up of harmonic components as opposed to non-harmonic components. People also describe sounds as being dense or voluminous, and although we have a volume control on our audio equipment, volume is not thought of as being quite the same as loudness [though the difference to me is awfully subtle]. Yost says that density accompanies an increase in level or frequency while volume is the opposite. However, this may be grain of salt time, as Yost suggests: perhaps we learn to use these words through experience and they do not have much fundamental meaning.