Yost Chapter 9: The Neural Response and the Auditory Code (17 Feb 2003)

This chapter concerns the function of the afferent and efferent nerves and the way in which frequency, level, and phase are coded at the periphery. The story begins with the graded electrical potentials of the hair cells, which we know as the cochlear microphonic and the summating potential. The hair cell releases a neurotransmitter (glutamate) that produces a graded potential at the NMDA and AMPA receptors of the nerve fiber (the "generator potential"), which travels (possibly diminishing along the way) to the myelinated portion of the fiber as it passes through the habenula perforata; at that point it becomes an action potential. Yost then describes the familiar spiked action potential, probably no different here than in other nerves (for a review see Appendix E). By and large neurons in the brain have an absolute refractory period that is a substantial fraction of a millisecond, which, if true of the hair cell/auditory nerve connection, would limit the ability of a single fiber to follow an acoustic sinusoid cycle by cycle beyond about 1 kHz. However, recent evidence shows that there are very special concentrations of very fast potassium channels in the auditory nerve and other parts of the auditory system, suggesting that auditory neurons are able to fire at faster rates than those in other parts of the nervous system (Adamson et al., 2002, Journal of Comparative Neurology, v 447, pp 331-350).
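
The refractory-period argument above is simple arithmetic, and it can be sketched as follows. The refractory values in the loop are illustrative assumptions, not figures from Yost, and the function names are made up for this sketch.

```python
def max_firing_rate_hz(refractory_ms):
    """Upper bound on spikes/sec if successive spikes are at least refractory_ms apart."""
    return 1000.0 / refractory_ms


def can_fire_every_cycle(tone_hz, refractory_ms):
    """True if the period of the tone is no shorter than the refractory period."""
    period_ms = 1000.0 / tone_hz
    return period_ms >= refractory_ms


# Illustrative refractory periods only (assumed values, in milliseconds).
for refractory_ms in (0.5, 1.0, 2.0):
    print(refractory_ms, "ms ->", max_firing_rate_hz(refractory_ms), "spikes/sec at most")
```

With an assumed 1 ms refractory period the fiber can fire on every cycle of a 500 Hz tone but not of a 2000 Hz tone, which is the sense in which the refractory period limits cycle-by-cycle following.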

Single unit recording methods, now in common use, begin with a baseline firing rate (in spikes/sec, say) measured with no stimulation, which provides a measure of spontaneous activity (though Yost is careful to point out that some sounds in the environment, internal noises for example, may be responsible for this firing). After the experimenter establishes the spontaneous rate, a threshold is typically determined: but what firing rate counts as greater than the spontaneous rate? This is an ambiguous statistical decision, often handled by ear: the experimenter passes the action potentials through an amplifier to a speaker and listens for the difference. Fancier methods start with a pair of brief observation periods, one with a given stimulus and one without. Each pair is scored "1" if the period with the stimulus contains more spikes than the period without it and "0" otherwise, and the stimulus is said to give rise to a response if the proportion of "1" scores exceeds the proportion of "0" scores by a certain amount, say, 75% to 25%. Note in Figure 9.1 an adaptation of the famous data provided by Liberman (1978) for the thresholds of high (>18 spikes/sec), medium (0.5 to 18 spikes/sec) and low (<0.5 spikes/sec) spontaneous rate fibers, as a function of their best frequency. These data were obtained in the cat. This adaptation is a bit heavy, and the data as presented by Liberman are easier to understand. Basically he showed that auditory nerve fibers have a wide range of spontaneous rates, from near 0 spikes/sec to over 100 spikes/sec. The modal rate is close to 0 spikes/sec (low spontaneous rate), but a broad part of the distribution lies between about 20 and 100 spikes/sec (high spontaneous rate). The thresholds for all of the high rate units are low, while the thresholds for the low spontaneous rate units range from pretty low (but not very low) to very high.
Also, the different spontaneous rates were spread across the audiogram, so it wasn't that fibers in the most sensitive region had the highest spontaneous rates. The overall population of high and medium spontaneous rate fibers looks like the cat audiogram, which seems a reasonable outcome. On the other hand, the low spontaneous rate fibers are a bit anomalous and seem not to follow the audiogram. Note especially Figure 9.2, a graph describing the "rate-level function." The high spontaneous rate fibers continue to outperform the other fibers as a group, but as a rule all of the groups show on average a dynamic range of only 20 or 30 dB, though a few might extend over a 50 dB range.
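
The paired-observation threshold procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: spikes are modeled as a Poisson process, ties are split evenly, and the rates, window length, and function names are all invented for the example rather than taken from Yost or Liberman.

```python
import random


def spike_count(rate_hz, window_s, rng):
    """Poisson spike count in one observation window (an assumed firing model)."""
    if rate_hz <= 0:
        return 0
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(rate_hz)  # exponential inter-spike interval
        if t > window_s:
            return n
        n += 1


def proportion_scored_one(spont_hz, driven_hz, window_s=0.05, n_pairs=400, seed=0):
    """Fraction of pairs in which the stimulus period holds more spikes."""
    rng = random.Random(seed)
    score = 0.0
    for _ in range(n_pairs):
        with_stim = spike_count(driven_hz, window_s, rng)
        without_stim = spike_count(spont_hz, window_s, rng)
        if with_stim > without_stim:
            score += 1.0
        elif with_stim == without_stim:
            score += 0.5  # assumption: ties are split evenly
    return score / n_pairs


def gives_response(spont_hz, driven_hz, criterion=0.75):
    """Apply the 75% criterion mentioned in the text."""
    return proportion_scored_one(spont_hz, driven_hz) >= criterion


print(gives_response(60.0, 60.0))   # stimulus does not change the rate
print(gives_response(60.0, 200.0))  # stimulus clearly raises the rate
```

When the stimulus leaves the rate unchanged the proportion hovers near 50%, so the 75% criterion is not met; threshold is then the lowest stimulus level at which it is.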

Figure 9.3 is worth some study, as it puts together rate-level functions for several frequencies on one fiber. One striking outcome is that the "best frequency" at threshold may not be the best frequency at high levels of input. For example, this fiber has a best frequency (BF) of 2000 Hz, in the sense that its lowest threshold, about 34 dB, is at 2 kHz. For a lower frequency stimulus (1800 Hz) the firing rate never reaches that of the 2000 Hz stimulus, and the dynamic range (the distance in dB between the threshold and the asymptote) is about the same. But while a 2200 Hz stimulus has a higher threshold (about 50 dB), it eventually catches up to the 2000 Hz stimulus in asymptotic firing rate. Below Figure 9.3a are two other curves, one giving the response rate (spikes/sec) at a 60 dB stimulus level, and the other the threshold values (dB at a significant shift in firing rate). The tuning curve of Figure 9.3c is called an "iso-rate curve": it could be taken at any criterion rate, and the function might vary considerably with that rate. Figure 9.4 is a famous sort of figure, showing input/output functions for one fiber with a BF of 4100 Hz, tested at frequencies ranging from about 500 to 5000 Hz and at levels from 40 to 90 dB. Note that a broad band of frequencies activates this fiber at high levels. Figure 9.5 shows threshold measurements for fibers having different BFs, all looking like asymmetric filters (in log-frequency units). At least beyond the lowest frequency fibers, cells do not respond well to tones higher than their BF, but have a long tail for low frequencies: think of the traveling wave for an explanation of this effect. In fact, if you look back to the basilar membrane and hair cell responses that I borrowed from Pickles for my overheads to supplement chapters 7 and 8, you see functions on the basilar membrane and the hair cells very similar to those shown here for the auditory nerve.
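
The idea of a dynamic range as the distance in dB between threshold and asymptote can be made concrete with a toy rate-level function. The sigmoid shape and every parameter value here are assumptions chosen for illustration; they are not fitted to the data in Figures 9.2 or 9.3, and the 10% and 90% criteria are one arbitrary way of defining "threshold" and "saturation."

```python
import math


def rate_level(level_db, spont=10.0, max_rate=200.0, midpoint_db=45.0, slope_db=6.0):
    """Toy sigmoidal rate-level function in spikes/sec (assumed shape)."""
    drive = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))
    return spont + (max_rate - spont) * drive


def dynamic_range_db(rate_fn, levels, lo_frac=0.1, hi_frac=0.9):
    """Return (threshold, saturation, range) using fractions of the driven rate."""
    rates = [rate_fn(level) for level in levels]
    floor, ceiling = min(rates), max(rates)
    lo = floor + lo_frac * (ceiling - floor)
    hi = floor + hi_frac * (ceiling - floor)
    threshold = next(l for l, r in zip(levels, rates) if r >= lo)
    saturation = next(l for l, r in zip(levels, rates) if r >= hi)
    return threshold, saturation, saturation - threshold


levels = list(range(0, 121))  # 0 to 120 dB in 1 dB steps
print(dynamic_range_db(rate_level, levels))
```

Changing `slope_db` stretches or compresses the range, and changing `midpoint_db` slides the whole function along the level axis, which is roughly how the curves for different frequencies in Figure 9.3 differ from one another.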

The next section treats an important topic with an emphasis on recording methodology: the "post-stimulus time histogram." For very low frequency stimuli (Yost suggests below 1000 Hz, but this varies across species) the cell will fire on each cycle of the wave; at higher frequencies it will not fire on every cycle, but when it does fire it fires at a particular part of the cycle, so that spikes are separated by multiples of the period of the tone. This "phase-locked response" is certainly important for spatial location; it is not completely certain that it is important for other dimensions of the stimulus (i.e., pitch, or possibly threshold), but pitch at least seems a likely possibility. A "histogram" is a record of a cell's firing plotted as a simple bar chart in which the bars are adjacent. Typically it starts with the presentation of a stimulus, and so it is called a "post-stimulus time histogram," as in Figure 9.7. Quite apart from the technique, note that there is a rapid high onset response; a decay (adaptation) to a plateau; and then a prominent depression at offset. You can imagine some of the sensory properties of these figures. Phase locking is seen with one further modification, the "interval histogram," in which the abscissa is scaled as the interval between successive spikes. For low frequency stimuli the cells fire at some multiple of the period, i.e., they phase lock (see Figure 9.8). In this figure a single fiber with a BF of 1000 Hz was tested at lots of different frequencies. Note that for a single sinusoid it responds at the frequency of the input and not at its own best frequency, though it does respond best at that frequency. [If it were responding to a noise or to a click then it would respond at its own resonant best frequency.] Figure 9.9 shows how the cell can track the phase of the sinusoid by firing differentially at different parts of the phase cycle, essentially providing a rectified sinusoidal output.
Note how complicated the stimulus is (two tones added together, the 3rd and 4th harmonics of 266 Hz, so that they wax and wane in intensity, and the fiber follows this pattern). The cell does not fire throughout the entire cycle because it is following the hair cell, and the hair cell is following the direction of basilar membrane movement.
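
An interval histogram for a phase-locked fiber can be sketched with a small simulation. The firing model is an assumption made for illustration: on each cycle of the tone the fiber fires with a fixed probability at one preferred phase, with a little timing jitter. The probability, jitter, and function names are invented and do not come from Figure 9.8.

```python
import random


def phase_locked_spikes(tone_hz, duration_s, p_fire=0.3, jitter_s=0.0001, seed=1):
    """Spike times (sec): at most one spike per cycle, near a fixed phase (assumed model)."""
    rng = random.Random(seed)
    period = 1.0 / tone_hz
    spikes = []
    for k in range(int(duration_s * tone_hz)):
        if rng.random() < p_fire:  # the fiber skips most cycles
            spikes.append(k * period + rng.gauss(0.0, jitter_s))
    return sorted(spikes)


def interval_histogram(spikes, bin_s):
    """Count inter-spike intervals in bins centered on multiples of bin_s."""
    counts = {}
    for a, b in zip(spikes, spikes[1:]):
        idx = round((b - a) / bin_s)
        counts[idx] = counts.get(idx, 0) + 1
    return counts


def fraction_near_multiples(spikes, period_s, tol=0.25):
    """Fraction of intervals lying within tol periods of a whole number of periods."""
    intervals = [b - a for a, b in zip(spikes, spikes[1:])]
    near = 0
    for interval in intervals:
        cycles = interval / period_s
        nearest = round(cycles)
        if nearest >= 1 and abs(cycles - nearest) <= tol:
            near += 1
    return near / len(intervals)


tone_hz = 500.0
spikes = phase_locked_spikes(tone_hz, duration_s=2.0)
hist = interval_histogram(spikes, bin_s=1.0 / tone_hz / 4.0)  # quarter-period bins
print(sorted(hist.items())[:6])
print(fraction_near_multiples(spikes, 1.0 / tone_hz))
```

With quarter-period bins, the occupied bins cluster at indices 4, 8, 12 and so on, that is, at one, two, three periods of the tone, which is the signature of phase locking in an interval histogram.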

The next phenomenon of interest is the response to a click. Now note that the fiber fires with its own best frequency: what is the difference between this case of the click and the prior example of a tone? Why is the response different to a rarefaction as opposed to a condensation click? We know this has to do first with the Fourier analysis of the click and the resonant frequency of the hair cell, and second with the direction of the deflection of the hair cells at the onset of the click.

Yost then begins to deal with more of the "non-linear," non-additive properties of the auditory system, beginning with two-tone suppression. Strangely, adding one tone to another tone may cause not a further increase in firing, but a decrement. Note in Figure 9.11 that an auditory nerve fiber tuning curve may have inhibitory sidebands at certain frequencies that lower the firing rate, even though by themselves these frequencies may produce firing: so two positive tones somehow might cause a negative response. Yost points out that this may be seen as a change in firing rate, or in phase locking. A significant question from the past has been whether this effect is a neural process or a biomechanical process. Because it turns on and off so fast it is thought to occur at the level of the basilar membrane, but there may be additional, similar neural processes at higher levels as well. The next kind of suppression of neural firing is very different in its explanation.

A very small section is given to the function of the efferent system, a system that seems to be inhibitory in its effect on auditory nerve firing. That is, if one records from a nerve fiber while presenting a sound and then additionally stimulates the crossed olivocochlear bundle (chosen because it is easier to get to), the firing rate declines. A similar effect can be produced by presenting a contralateral sound (see Figure 9.12, from an often quoted paper of Warren and Liberman, 1989). It is thought that this activity may "protect" the ear, and Liberman has some recent data suggesting that animals with a strong olivocochlear reflex response are less susceptible to noise induced hearing loss. Other people suggest it is important in auditory attention and possibly in increasing the signal-to-noise ratio at the inner ear (work showing attentional deficits in patients without an efferent system, Scharf et al., 1997, Hearing Research, v 75, pp. 11-26).

The last section of this chapter gives a very fast treatment of coding for pitch, loudness, and phase effects, building on chapters 7 and 8. The first characteristic of interest is the relative position of the response along the basilar membrane, but note that this may be quite crude at levels well above threshold. Some further evidence is given by the approximate correlation of hearing loss for certain frequencies with hair cell loss along the basilar membrane. This is the physical basis for the "place theory of pitch perception." But additionally pitch can be coded as periodicity in nerve cell firing, and this does not degrade with stimulus level. Data relevant to this distinction are given in Figures 9.13 versus 9.14, showing in the first a response to the phoneme /e/ measured as rate, and in the second as synchronized rate (the speech sounds were presented to an anesthetized cat, not a human). The synchronized rate does very nicely at two different intensities. Note that the formants of the speech sound are preserved in the synchronized rate (at about 600, 1800 and 2400 Hz).

Rate itself has been thought to be related to loudness, not a surprising notion. The fact that one fiber saturates is of little consequence because others come in at different levels, and so the population as a whole might code loudness over a very large range. However, one notes that there must be integrating mechanisms present at higher levels of the auditory system that are able to receive information from an assembly of neurons and encode their joint activity. Figure 9.15 depicts a computer model of the basilar membrane response to the phoneme /e/. Yost says the time intervals separating the low from the higher frequencies correspond to the length of time that it takes for the traveling wave to traverse the basilar membrane, but numbers on the order of 10 ms seem a bit excessive. Apparent in this 3-D picture are the dominant formants of the phoneme (at 550 Hz for example, and around 2000 Hz), and then the periodicity around 100 Hz in the amplitude modulation of the signal.
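
The argument that a population of fibers can code loudness over a range much wider than any single fiber can be sketched with a toy model. Everything numerical here is an assumption for illustration: each fiber gets the same sigmoidal rate-level function, and only its midpoint is shifted, in 10 dB steps from 20 to 100 dB.

```python
import math


def fiber_rate(level_db, midpoint_db, spont=5.0, max_rate=200.0, slope_db=6.0):
    """Toy rate-level function for one fiber, in spikes/sec (assumed shape)."""
    drive = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))
    return spont + (max_rate - spont) * drive


def population_rate(level_db, midpoints):
    """Summed firing rate across fibers with staggered thresholds."""
    return sum(fiber_rate(level_db, m) for m in midpoints)


midpoints = list(range(20, 101, 10))  # nine hypothetical fibers

for level_db in (20, 40, 60, 80, 100):
    single = fiber_rate(level_db, midpoints[0])
    total = population_rate(level_db, midpoints)
    print(level_db, round(single, 1), round(total, 1))
```

In this sketch the most sensitive fiber is saturated well before 80 dB, yet the summed rate keeps growing up to 100 dB because less sensitive fibers are still being recruited. Summing is itself a stand-in for the integrating mechanism that the text says must exist at higher levels.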

The supplement to the chapter contains some important information. First, the history of hair cell recording is described, together with the idea that the summating potential is responsible for auditory nerve firing at non-phase-locked frequencies, while the hair cell AC components are responsible for phase locking.

Another subtle point is that low frequency fibers are not as sharply tuned as high frequency fibers: Q10 values increase from about 1 to 10 with frequency. For humans the critical band is about 10 to 15% of the center frequency beyond about 1000 Hz, but it is proportionally wider at low frequencies, and may be close to 100%. The critical bandwidth of the filter is important because it suggests which frequencies are likely to mask a signal at a given frequency.
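
The relation between Q10 and bandwidth is easy to work through numerically. Q10 is the best frequency divided by the width of the tuning curve 10 dB above its tip. The sketch below uses that standard definition; the particular frequencies are made-up examples, and treating the critical band as a fixed fraction of center frequency is only the rough rule quoted above.

```python
def q10(best_freq_hz, bandwidth_10db_hz):
    """Q10 = best frequency / tuning-curve bandwidth measured 10 dB above the tip."""
    return best_freq_hz / bandwidth_10db_hz


def bandwidth_from_q10(best_freq_hz, q10_value):
    """Invert the definition to recover the 10 dB bandwidth in Hz."""
    return best_freq_hz / q10_value


def critical_bandwidth_hz(center_hz, fraction):
    """Critical band as a fraction of center frequency (rough rule from the text)."""
    return center_hz * fraction


# Hypothetical fibers at the two ends of the Q10 range quoted in the text.
print(bandwidth_from_q10(500.0, 1.0))     # broadly tuned low frequency fiber
print(bandwidth_from_q10(10000.0, 10.0))  # sharply tuned high frequency fiber

# Critical band at 2000 Hz under the 10 to 15% rule.
print(critical_bandwidth_hz(2000.0, 0.10), critical_bandwidth_hz(2000.0, 0.15))
```

Note that a high Q10 means sharp tuning relative to the best frequency, not a small bandwidth in absolute terms: the hypothetical 10 kHz fiber with a Q10 of 10 still has a wider bandwidth in Hz than the 500 Hz fiber with a Q10 of 1.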