Notes from Yost Chapter 4: Complex stimuli pp 39-51: 24 January 2003

My Overview

There is a point to all of these pictures of multiple sinusoids as they variously summate or separate in Fourier analyses. It is that whereas the case of the simple single sinusoid is the easiest to think about and the easiest to manipulate in analytical experiments and demonstrations, 100% of real world sounds consist of multiple sinusoidal complexes (except when musicians play a flute). In fact, some experimenters believe that there are auditory modules in the brains of animals, including humans, that are particularly sensitive to combinations of sounds that occur in species-specific communication, and cannot be activated by pure tones. Yost describes what happens when sinusoids are mixed, including the effects of abrupt shifts in the amplitude of single sinusoids and noise bands, and of periodic brief transients. The most complex and interesting waveform is that of speech. You might look ahead to page 223 and check out a "spectrograph" of a speech signal. Yost made an odd mistake in his section about beats on page 43: his example uses a complex stimulus made up of a 560 Hz and a 660 Hz tone. I suspect it was meant to be 56 Hz and 66 Hz. Beats do not occur when the separation is over about 10 to 15 Hz.

Yost begins by pointing out that most real stimuli consist of many sinusoids, and that such stimuli can be characterized or graphed in several ways. In Chapter 2 (Figure 2.10 for example) they are drawn in the TIME DOMAIN, showing the instantaneous amplitude (displacement etc.) along the ordinate with time along the abscissa. Now Yost introduces the FREQUENCY DOMAIN, in which each dimension of the sound wave (amplitude and phase) gets its own ordinate (y-axis) and the abscissa (x-axis) is frequency rather than time. The graph showing amplitude as a function of frequency is called (reasonably enough) the "amplitude (or power) spectrum"; the graph showing starting phase as a function of frequency is called the "phase spectrum." All of this is shown in Figure 4.1. The complex wave (4.1 a) apparently consists of 3 frequencies of equal amplitude and with the same starting phase (4.1 b and c), bearing a harmonic relationship in that all are multiples of 100 Hz (that is, they are 100, 200, and 300 Hz tones). The steps from 4.1 a to b and c require Fourier analysis, described in some detail in the Appendix for those who remember calculus. Nowadays one buys a machine, puts in the complex wave, and out comes the Fourier analysis. Or otherwise, one puts the wave as a sound wave into one's ear, and lets the ear do the work.
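
Here is a minimal sketch, in Python (numpy), of the machine's job, assuming the complex of Figure 4.1 is three sine waves of 100, 200, and 300 Hz with equal amplitudes and the same starting phase (the figure's exact values may differ):

    import numpy as np

    fs = 8000                      # sampling rate (Hz)
    t = np.arange(fs) / fs         # exactly 1 second, so FFT bins fall on whole Hz
    wave = sum(np.sin(2 * np.pi * f * t) for f in (100, 200, 300))

    spectrum = np.fft.rfft(wave) / (len(wave) / 2)   # scale so a unit sine reads 1.0
    for f in (100, 200, 300):
        amp = abs(spectrum[f])                       # bin f is f Hz with a 1-s window
        phase = np.angle(spectrum[f])
        print(f"{f} Hz: amplitude {amp:.3f}, starting phase {phase:.3f} rad")
    # The amplitude spectrum has three equal lines, and the phase spectrum
    # shows the same starting phase at each line; no energy appears elsewhere.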

Yost then makes another point, distinguishing between "line spectra" and "continuous spectra": line spectra belong to "simple complex" stimuli, with energy at certain discrete frequencies (as in 4.1 b), while continuous spectra have a blend of all frequencies, at least over a large range -- the noise of a thunderclap for example, or a click. "Simple-complex" stimuli are often generated by the human voice or by a musical instrument, and their frequencies are all whole-number multiples of a common value: that largest common value is the fundamental (the first harmonic), and the successive multiples are the second, third, fourth harmonics, etc. Continuous spectra are heard as noise.

Figure 4.2 illustrates the synthesis part of Fourier analysis, showing how 3 harmonics add up to a complex tone. Figure 4.3 illustrates a fact that not enough people who do experiments with auditory stimuli appreciate: "pure tones" do not have simple line spectra when they are presented with abrupt onsets and offsets. The sharp onset is heard as a click, which is not the result of some defect in the ear, but happens because the synthesis of many frequencies is necessary to simulate the sharp edge. Even if the tone is on for a long time the onset and offset are heard as clicks, although the center portion of the stimulus will sound as a pure tone should. Yost shows that a brief tone burst, D seconds in duration, has a curious "continuous spectrum" which peaks at the frequency of the tone (F) and drops off to zero at F +/- n(1/D), where n is a series of integers. His example is a 20 ms (= .020 seconds) tone set at F = 1000 Hz. Its continuous spectrum has a peak at 1000 Hz, which drops off to zero points at [1000 +/- 1/.020] Hz: 1000 + 50 = 1050, 1000 - 50 = 950, etc. Similarly, a tone of 10,000 Hz presented for 10 ms (.01 sec) would have a power peak at the center frequency of 10,000 Hz, and would decline to 0 at integer multiples of (1/.01 =) 100 Hz away from it, namely at 10,100 and 9,900 Hz; rise again and decline again, etc.; and all of the frequencies would have the same starting phase. Obviously if the duration were very short, say, a fraction of a millisecond, then the zero points would be very far from the frequency of the tone and the sound would be a noisy click.
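
A quick way to convince oneself of these zero points is to compute the spectrum of a gated tone. This sketch assumes a 1000 Hz tone turned on abruptly for D = 20 ms, matching Yost's first example (the sampling rate is my own illustrative choice):

    import numpy as np

    fs, D, F = 40000, 0.020, 1000              # sampling rate, duration (s), tone (Hz)
    t = np.arange(int(fs * D)) / fs
    burst = np.sin(2 * np.pi * F * t)

    # Zero-pad to 1 second so the FFT samples the continuous spectrum at 1-Hz steps.
    padded = np.concatenate([burst, np.zeros(fs - len(burst))])
    mag = np.abs(np.fft.rfft(padded))

    for f in (1000, 1025, 1050, 1075, 1100):
        print(f"{f} Hz: relative magnitude {mag[f] / mag.max():.3f}")
    # Peak at 1000 Hz, smooth rolloff at 1025 and 1075 Hz, and essentially
    # zero at 1050 and 1100 Hz -- the nulls spaced 1/D = 50 Hz from F.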

In Figure 4.4 he shows the analysis of a single 1 ms click, which has another sort of multi-modal spectrum, with nodes at n(1/D). In Figure 4.5 he shows the effect of a periodic click train, with duration D and period Pr: this yields multiple nodes at multiples of 1/D, and bands made up of harmonics spaced at 1/Pr. So if the duration is 1 ms the nodes are at 1000 Hz, 2000 Hz, and so on, and if the inter-pulse onset interval is 5 ms, then the harmonics are multiples of 1/.005 = 200 Hz.

Note that the fundamental would be 200 Hz, and presumably we would hear a very rich 200 Hz tone (lots of harmonics). We will see when we look at auditory nerve function that brief click presentations are technically very useful, because they stimulate all of the nerve fibers, and because each fiber then fires at its own resonant frequency. This response to an acoustic transient is called "the impulse response."
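
Here is a sketch of that click train in Python, assuming rectangular 1 ms pulses repeating every 5 ms, to show the 200 Hz harmonics sitting under an envelope that dips to zero at multiples of 1/D = 1000 Hz:

    import numpy as np

    fs = 20000
    t = np.arange(fs) / fs                       # 1 second of signal
    train = ((t % 0.005) < 0.001).astype(float)  # 1-ms pulse (D) every 5 ms (Pr)

    mag = np.abs(np.fft.rfft(train))             # 1-Hz bins with a 1-s window
    for f in (100, 200, 400, 600, 800, 1000, 1200):
        print(f"{f} Hz: magnitude {mag[f]:.1f}")
    # Energy appears only at multiples of 1/Pr = 200 Hz (nothing at 100 Hz),
    # and the 1000 Hz harmonic is missing because it falls on the 1/D node.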

The next topic has to do with "beats" and amplitude modulation. I find Yost's descriptions of beating confusing and wrong. Most authors treat beating as being heard when the two tones are close together, within a few Hertz: say 400 and 410. Then the "fused" single impression of a tone has the average frequency of 405 Hz, and it beats at the difference between them, which equals 10 Hz. When the difference is beyond 10 or 15 Hertz beating stops and is replaced by roughness in one tone, then roughness in two tones; then, beyond a "critical band" (which is about 100 Hz wide for tones up to 1000 Hz, and then increases with frequency to about 10% of the center frequency), the pair is heard as two separate tones. So 560 and 660 should not beat. Yost's error possibly began with a typographical error in which 56 Hz and 66 Hz became 560 and 660.
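
The textbook case is easy to verify with a little algebra or a few lines of Python; this sketch uses my 400 and 410 Hz example:

    import numpy as np

    fs = 8000
    t = np.arange(fs) / fs
    pair = np.sin(2 * np.pi * 400 * t) + np.sin(2 * np.pi * 410 * t)

    # Trig identity: sin(a) + sin(b) = 2 * sin((a+b)/2) * cos((a-b)/2), so the
    # sum is a 405-Hz tone whose envelope, 2*cos(2*pi*5*t), peaks 10 times a second.
    fused = 2 * np.sin(2 * np.pi * 405 * t) * np.cos(2 * np.pi * 5 * t)
    print("max difference:", np.max(np.abs(pair - fused)))   # ~1e-13: same waveform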

The next example, amplitude modulation, is produced by multiplying two frequencies together rather than adding them. In the time domain the result looks very similar to the sum of two nearby sinusoids -- the graphs shown in 4.9 look nearly identical to 4.8 -- but the Fourier analysis is very different. Here a single tone (Fc = carrier frequency) has its amplitude modulated by another tone (Fm = modulating frequency): so perhaps there is a basic tone of 1000 Hz, but then the amplitude of this tone is modulated by a 10 Hz sinusoid. Here the spectrum does not show a single apparent tone at the average frequency of (1000 + 10)/2, nor a line at 1000 Hz accompanied by one at 10 Hz. Indeed the Fourier analysis doesn't provide anything at the frequency of 10 Hz: instead it has three tones, Fc, Fc - Fm, and Fc + Fm. The two flanking tones are called side bands, and their amplitude is determined by the depth of modulation provided by the 10 Hz modulating frequency. Yost points out that these stimuli are often used to measure temporal acuity in the auditory system. In addition, the presence of beats is used by piano tuners to detect small frequency differences between a standard and the frequency to be tuned. As we will see, at the periphery of the auditory system the ear is very much concerned with identifying Fc; but in the higher regions of the auditory brainstem the ear is much more concerned with Fm!
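
A sketch of the spectrum of such a stimulus, assuming Fc = 1000 Hz, Fm = 10 Hz, and 100% modulation depth (Yost's depth may differ):

    import numpy as np

    fs, m = 8000, 1.0                            # sampling rate; modulation depth
    t = np.arange(fs) / fs
    am = (1 + m * np.sin(2 * np.pi * 10 * t)) * np.sin(2 * np.pi * 1000 * t)

    mag = np.abs(np.fft.rfft(am)) / (len(am) / 2)
    for f in (10, 990, 1000, 1010):
        print(f"{f} Hz: amplitude {mag[f]:.2f}")
    # Nothing at 10 Hz itself; side bands of amplitude m/2 appear at
    # Fc - Fm = 990 and Fc + Fm = 1010, flanking the carrier at 1000 Hz.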

Frequency modulation (contrasted with amplitude modulation, above) is a very cunning stimulus, in which the complex sound varies over time in frequency rather than amplitude. In 4.9 through 4.11 these stimuli are given in the time domain, and in 4.12 in the frequency domain. This latter is very complicated, consisting of lines at Fc +/- (n x Fm), where n is an integer. So, we could imagine a basic 1000 Hz stimulus with a 10% modulation depth, sweeping from 950 to 1050 Hz, the sweep itself cycling at 4 Hz. The long-term Fourier analysis would not have all of the frequencies from 950 to 1050, but would have discrete lines at 996 and 1004; 992 and 1008; 988 and 1012; etc. In 4.11 Yost describes a complex tone that changes in both frequency and amplitude over time, which is shown in the spectrogram in Figure 4.12 (the legend says it is for 4.9, but it is really 4.11). This indeed seems too complicated to deal with: it is complicated, but it is also the idealized form of the speech signal, which can be displayed, with frequency plotted against time, as a "spectrograph."
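
The Fc +/- (n x Fm) lines are easy to demonstrate; this sketch assumes my 1000 Hz example swept +/- 50 Hz at a 4 Hz modulation rate:

    import numpy as np

    fs, Fc, Fm, dev = 8000, 1000, 4, 50          # carrier, mod rate, peak deviation (Hz)
    t = np.arange(fs) / fs
    # Instantaneous frequency Fc + dev*cos(2*pi*Fm*t); the phase is its integral.
    fm_tone = np.sin(2 * np.pi * Fc * t + (dev / Fm) * np.sin(2 * np.pi * Fm * t))

    mag = np.abs(np.fft.rfft(fm_tone)) / (len(fm_tone) / 2)
    for f in (996, 998, 1000, 1002, 1004, 1008, 1012):
        print(f"{f} Hz: amplitude {mag[f]:.3f}")
    # Lines appear only at 1000 +/- n*4 Hz; bins like 998 and 1002 stay empty,
    # even though the instantaneous frequency sweeps continuously through them.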

Yost then goes on to the technical discussion of "noise," in which the instantaneous amplitude varies randomly over time, so there is no periodicity. "Gaussian" noise has instantaneous amplitudes distributed as a normal curve over time, so that the mean amplitude is zero and small variations around zero are more likely than large variations. "White" noise means that all of the frequencies have the same average amplitude across time, across some bandwidth of interest (note Figures 4.3 and 4.4). "Gaussian white noise" has both properties: a normal density distribution of amplitudes and a flat spectrum. Note that a little later Yost specifies something called "pink noise," for which octave bands of frequencies have the same power: every time the frequency is doubled the width of the octave band is doubled, and so the power at any given frequency must be halved, which is a decrement of 3 dB. The interest in pink noise is that because the frequencies are laid out by octaves along the basilar membrane, it should provide approximately a constant amount of energy for every receptor. Sinusoids have one measure of intensity, namely the power (or pressure) at that frequency. Noise has two: one is the total power (or pressure) for the entire sound; the other is the "spectrum level," which is the amount of power in a 1-Hz band, say 999.5 to 1000.5 Hz; spectrum level is abbreviated No (pronounced "en oh"). The total power is the sum of all of the intensities in the bandwidth (see Figure 4.4). How do we get from No to total power? This is a good question, because the answer is, it isn't easy. We would have to add up the powers (or pressures) in the separate frequencies, and not the dB values (because dB are logarithms, and adding logarithms is in fact to multiply the underlying values). The formula is

Total power in dB = No (in dB) + 10 log BW. So, if No is 50 dB, a noise with a bandwidth of 2 Hz would yield Total power = 50 + 10 log(2) = 50 + 10(.3) = 53 dB.
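
The arithmetic is easy to check by doing it the long way, in linear power units, as in this sketch (assuming the 50 dB, 2 Hz example above):

    import math

    No_dB, BW = 50.0, 2.0                       # spectrum level (dB) and bandwidth (Hz)

    shortcut = No_dB + 10 * math.log10(BW)      # the formula from the text

    # The long way: convert to linear power, sum across the 1-Hz slices, convert back.
    per_hz = 10 ** (No_dB / 10)
    long_way = 10 * math.log10(per_hz * BW)

    print(round(shortcut, 2), round(long_way, 2))   # both 53.01 dB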

A general rule that appears here is that any time the bandwidth of a noise is doubled the level of the noise goes up by 3 dB. (Because the power is doubled, the increase in dB is 10 log(2/1); the log of 2 is very nearly .3, so the increase is 3 dB.) The little section on narrow-band noise is interesting because of the slow amplitude changes that occur: if the range of acceptable frequencies is small, the random process that generates the noise will often produce energy at frequencies that are filtered out, and at those moments the amplitude of the noise falls. As a consequence the noise signal starts to look a lot like the envelope of a speech signal.
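
A sketch of that effect, assuming Gaussian white noise restricted to a 50 Hz band around 1000 Hz (my own illustrative numbers):

    import numpy as np

    rng = np.random.default_rng(0)
    fs = 8000
    noise = rng.standard_normal(fs)              # 1 s of Gaussian white noise

    # Band-limit the noise by zeroing all FFT bins outside 975-1025 Hz.
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(noise), 1 / fs)
    spec[(freqs < 975) | (freqs > 1025)] = 0
    narrow = np.fft.irfft(spec)

    # Measure the RMS amplitude in consecutive 20-ms frames: it swings widely,
    # which is the slow rise and fall of the narrow-band noise envelope.
    rms = np.sqrt((narrow.reshape(50, -1) ** 2).mean(axis=1))
    print("quietest frame:", rms.min().round(4), " loudest frame:", rms.max().round(4))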

Researchers manipulate FM and AM sounds because they seem to capture in a simple way complex features of the speech signal, but they are simple only with respect to the complexity of speech. Compared to simple tones, they can be very complex. The most complex stimulus that I have run across in an experiment with a very practical intent was devised by Stefanatos, Green, and Ratcliff (Archives of Neurology, 1989), in an attempt to explore the possibility that dyslexia has in part a basis in auditory dysfunction.

Stefanatos et al. started with a frequency modulated tone, in which the frequency was modulated at 10 Hz. The carrier tone was 1000 Hz, and so the frequency rose above and fell below 1000 every 50 ms, with the complete cycle taking 100 ms. Normally an FM tone would be expected to swing between a constant high value and a constant low value in that period of time: for example, it might cycle back and forth between 900 Hz and 1100 Hz. The authors wanted to know whether children would be able to detect this rate of change in frequency, but they also wanted to use an objective measure of brain activity, the auditory evoked potential. For this purpose they had to have an acoustic event that occurred at a relatively slow rate: so they needed to present both the fast change in frequency modulation, which could be heard or not, and a slow change in something else that would be picked up as an evoked potential, but only if the subjects heard the fast change. To do this Stefanatos et al. also varied, at 4 Hz, the range of frequencies over which the carrier frequency was being swept (this is called the modulation depth). Thus in one 100 ms cycle the tone might sweep relatively modestly from 950 to 1050, but then as the depth modulation continued the upper target would shift to 1080 and then 1100, and so forth. They never varied the amplitude of the stimulus, so the loudness was constant. Also constant was the rate at which the depth increased or decreased over time, so the only thing that could be heard to change over time was how fast (and how far) the frequency was sweeping. They thought that their stimulus would be very similar to the frequency shifts of speech, and they could also obtain auditory evoked potentials to the 4 Hz modulation, but only if the children could hear the frequency shift. They went on to show that a certain category of dyslexic children (receptive dyslexics) could not detect this modulation rate, indicating that they had a problem with temporal acuity.
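
As a way of fixing the stimulus in mind, here is a speculative sketch of a Stefanatos-style signal. The 1000 Hz carrier, 10 Hz modulation rate, and 4 Hz depth modulation come from the description above, but the depth range (0 to 100 Hz) and the sinusoidal ramp shape are my own guesses:

    import numpy as np

    fs = 16000
    t = np.arange(2 * fs) / fs                       # 2 seconds of signal

    depth = 50 * (1 + np.sin(2 * np.pi * 4 * t))     # depth swells 0-100 Hz at 4 Hz (assumed)
    inst_freq = 1000 + depth * np.sin(2 * np.pi * 10 * t)   # 10-Hz sweeps around 1000 Hz

    # Integrate the instantaneous frequency to get phase, then synthesize the tone.
    stimulus = np.sin(2 * np.pi * np.cumsum(inst_freq) / fs)  # amplitude never varies

    print(inst_freq.min().round(1), "to", inst_freq.max().round(1))   # ~900 to ~1100 Hz
    # Loudness is constant, so the only 4-Hz event is the widening and
    # narrowing of the frequency sweeps -- the cue the evoked potential tracked.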