The Handbook of Speech Perception
Figure 3.3 Firing‐rate distributions in response to the vowel [ɛ] in head [hɛːd] for low spontaneous rate fibers (left) and high spontaneous rate fibers (right) at three different sound intensities.
To give you a sense of what these different auditory nerve fiber types contribute to speech representations at different sound levels, Figure 3.3 shows the firing‐rate distributions for the vowel [ɛ], much as in the right panel of Figure 3.2, but at three different sound levels (from a very quiet 25 dB SPL to a pretty loud 85 dB SPL), and for both LSR and HSR populations. As you can see, the LSR fibers (left panel) hardly respond at all at 25 dB, but the HSR fibers already show clear peaks at the formant frequencies at that very low sound level. At loud sound levels, however, most of the HSR fibers saturate, meaning that they are firing as fast as they can, so that the valleys between the formant peaks begin to disappear.

One interesting consequence of this division of labor between HSR and LSR fibers, representing speech at low and high sound levels respectively, is that it may explain why some people, particularly among the elderly, complain of an increasing inability to understand speech in situations with high background noise. Recent work by Kujawa and Liberman (2015) has shown that, perhaps paradoxically, the less sound‐sensitive LSR fibers are actually more likely to be damaged by prolonged noise exposure. Patients with such selective fiber loss would still hear quiet sounds quite well, because their HSR fibers are intact, but they would find it very difficult to resolve sounds at high sound levels, where the HSR fibers are saturating and the LSR fibers that should encode spectral contrast are missing. It has long been recognized that our ability to hear speech in noise tends to decline with age, even in those elderly who are lucky enough to retain normal auditory sensitivity (Stuart & Phillips, 1996), and cumulative noise‐induced damage to LSR fibers, such as that described by Kujawa and Liberman in their mouse model, has been suggested as a possible culprit. Such hidden hearing loss, which is not detectable with standard audiometric tests that measure sensitivity to probe tones in quiet, can be a significant problem, for example by taking all the fun out of important social occasions, such as lively parties and get‐togethers, and thereby contributing to social isolation. However, some recent studies have looked for, but failed to find, a clear link between greater noise exposure and poorer reception of speech in noise (Grinn et al., 2017; Grose, Buss, & Hall, 2017), which suggests that the decline in our ability to understand speech in noise as we age may have more to do with impaired representations of speech in higher cortical centers than with impaired auditory nerve representations.
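To make this division of labor concrete, here is a minimal numerical sketch of the idea, assuming simple sigmoidal rate‐level functions with purely illustrative thresholds and dynamic ranges (not fitted to physiological data): the HSR fiber responds strongly at 25 dB SPL but saturates at 85 dB SPL, while the LSR fiber is nearly silent at 25 dB SPL yet still coding at 85 dB SPL.

```python
import numpy as np

def rate_level(level_db, threshold_db, dynamic_range_db, spont, max_rate):
    """Toy sigmoidal rate-level function (firing rate vs. sound level).
    All parameter values used below are illustrative, not fitted to data."""
    x = (level_db - threshold_db) / dynamic_range_db
    drive = 1.0 / (1.0 + np.exp(-6.0 * (x - 0.5)))  # logistic growth from 0 to 1
    return spont + (max_rate - spont) * drive

levels = np.array([25.0, 55.0, 85.0])  # dB SPL, as in Figure 3.3

# HSR fiber: low threshold, narrow dynamic range -> sensitive but saturates early
hsr = rate_level(levels, threshold_db=0.0, dynamic_range_db=30.0,
                 spont=60.0, max_rate=250.0)
# LSR fiber: high threshold, wide dynamic range -> keeps coding at high levels
lsr = rate_level(levels, threshold_db=30.0, dynamic_range_db=60.0,
                 spont=1.0, max_rate=200.0)

for lvl, h, l in zip(levels, hsr, lsr):
    print(f"{lvl:3.0f} dB SPL: HSR ~ {h:5.1f} spikes/s, LSR ~ {l:5.1f} spikes/s")
```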
Of course, when you listen to speech, you don’t really want to have to ask yourself whether, given the current ambient sound level, you should be listening to your HSR or your LSR auditory nerve fibers in order to get the best representation of speech formants. One of the jobs of the auditory brainstem and midbrain circuitry is therefore to combine information across these nerve fiber populations, so that representations at midbrain and cortical stations automatically adapt to changes both in mean sound level and in sound‐level contrast or variability, and features like formants are efficiently encoded whatever the current acoustic environment happens to be (Dean, Harper, & McAlpine, 2005; Rabinowitz et al., 2013; Willmore et al., 2016).
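The computational principle at work here is a form of adaptive gain control. The snippet below is a toy sketch of that principle only, not of the actual brainstem circuitry: by re‐centering and re‐scaling a population response to its recent mean and variability, a downstream stage produces the same output pattern for a given formant profile whether it arrives quietly or loudly, at low or high contrast. The firing‐rate numbers are hypothetical.

```python
import numpy as np

def adapt_gain(rates, eps=1e-9):
    """Toy adaptive gain control: normalize a population response to its
    recent mean (ambient level) and standard deviation (level contrast).
    A schematic illustration of the principle, not a biological model."""
    return (rates - rates.mean()) / (rates.std() + eps)

# The same hypothetical formant profile across CF channels, once
# quiet/low-contrast and once loud/high-contrast:
quiet = np.array([12.0, 30.0, 14.0, 26.0, 10.0])  # spikes/s across channels
loud = 4.0 * quiet + 80.0                         # louder, higher contrast

print(np.allclose(adapt_gain(quiet), adapt_gain(loud)))  # True: same code
```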
As we saw earlier, the tonotopic representation of speech‐sound spectra in the auditory nerve provides much information about speech formants, but not a great deal about harmonics, which would reveal voicing or voice pitch. We probably owe much of our ability nevertheless to hear voicing and pitch easily and with high accuracy to the fact that, in addition to the small number of resolved harmonics, the auditory nerve delivers a great deal of so‐called temporal fine structure information to the brain. To appreciate what is meant by that, consider Figure 3.4, which shows the waveform (top), a spectrogram (middle), and an auditory nerve neurogram display (bottom) for a recording of the spoken word ‘head.’ The neurogram was produced by computing the firing rates of a bank of LSR auditory nerve fibers in response to the sound as a function of time, using the model by Zilany, Bruce, and Carney (2014). The waveform reveals the characteristic energy arc remarked upon by Greenberg (2006) for spoken syllables, with a relatively loud vowel flanked by much quieter consonants. The voicing in the vowel is manifest in the large sound‐pressure amplitude peaks, which arise from the glottal pulse train at regular intervals of approximately 7 ms, that is, at a rate of approximately 140 Hz. This voice pitch is also reflected in the harmonic stack in the spectrogram, with harmonics at multiples of ~140 Hz, but the harmonic stack is not apparent in the neurogram. Instead, we see that the nerve fibers rapidly modulate their firing rates to produce a temporal pattern of bands, either at intervals that directly reflect the 7 ms period of the glottal pulse train (for nerve fibers with CFs below 0.2 kHz or above 1 kHz) or at intervals that are integer fractions of that period (corresponding to its harmonics). In this manner auditory nerve fibers convey important cues for acoustic features such as periodicity pitch, by phase locking their discharges to salient features of the temporal fine structure of speech sounds with sub‐millisecond accuracy.
Figure 3.4 Waveform (top), spectrogram (middle), and simulated LSR auditory nerve‐fiber neurogram (bottom) of the spoken word head [hɛːd].
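To see why such phase‐locked timing information is so useful, consider the following sketch. It builds a crude stand‐in for a voiced sound (a stack of harmonics of 140 Hz), half‐wave rectifies it as a very rough proxy for hair‐cell transduction, and then reads the pitch period out of the autocorrelation, much as a downstream neuron pooling phase‐locked spikes could. The sample rate, harmonic count, and lag limit are illustrative choices, not values from the figure.

```python
import numpy as np

fs = 16000                      # sample rate (Hz); illustrative choice
f0 = 140.0                      # glottal pulse rate from the text (~7 ms period)
t = np.arange(int(0.2 * fs)) / fs

# Crude stand-in for a voiced vowel: the first ten harmonics of 140 Hz
signal = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 11))

# Half-wave rectification: a very rough proxy for hair-cell transduction,
# which is what allows nerve fibers to phase-lock to the waveform
drive = np.maximum(signal, 0.0)

# Autocorrelation of the "neural" drive: the strongest peak away from zero
# lag falls at the glottal pulse interval, i.e. the pitch period
ac = np.correlate(drive, drive, mode="full")[len(drive) - 1:]
min_lag = int(fs / 400)         # only consider pitches below 400 Hz
period = (np.argmax(ac[min_lag:]) + min_lag) / fs

print(f"estimated period: {1000 * period:.2f} ms, pitch: {1 / period:.0f} Hz")
```

Run on this 140 Hz pulse‐train stand‐in, the autocorrelation peak lands at a lag of about 7.1 ms, recovering the voice pitch directly from the timing of the simulated neural drive.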
As an aside, note that severe hearing impairment is quite commonly caused by an extensive loss of auditory hair cells in the cochlea, which can leave the auditory nerve fibers largely intact. In such patients it is now often possible to restore some hearing through cochlear implants, which use electrode arrays implanted along the tonotopic axis of the cochlea to deliver direct electrical stimulation to the auditory nerve fibers. The electrical stimulation patterns delivered by the 20‐odd electrode contacts these devices provide are quite crude compared to the activity patterns created when the delicate dance of the basilar membrane is captured by some 3,000 phenomenally sensitive auditory hair cells. But because coarsely resolving only a modest number of formant peaks is normally sufficient to discriminate speech sounds, the large majority of deaf cochlear implant patients do gain the ability to have pretty normal spoken conversations, as long as there is little background noise. Current cochlear implant processors are essentially incapable of delivering, via the auditory nerve, any of the temporal fine structure information we have just described, and consequently cochlear implant users miss out on cues such as periodicity pitch, which might help them separate out voices in a cluttered auditory scene. A lack of temporal fine structure can also affect the perception of dialect and emotional affect in speech, as well as of melody, harmony, and timbre in music.
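As a sketch of why that information goes missing, the snippet below shows the core step that generic envelope‐based implant strategies share, under simplifying assumptions and without reference to any particular manufacturer’s processing: within each frequency channel the signal is rectified and low‐pass filtered, keeping the slow amplitude envelope that modulates the stimulation current while discarding the fast carrier, which is exactly the temporal fine structure. The channel frequency, cutoff, and test signal are hypothetical.

```python
import numpy as np

def channel_envelope(x, fs, cutoff=300.0):
    """Envelope extraction for one frequency channel, as in generic
    envelope-based cochlear implant strategies: rectify, then smooth.
    A schematic sketch; real processors differ in many details."""
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff / fs)  # one-pole low-pass
    env = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(np.abs(x)):                 # rectification
        acc += alpha * (v - acc)                      # smoothing
        env[i] = acc
    return env

fs = 16000
t = np.arange(int(0.05 * fs)) / fs
# One bandpass channel of a voiced sound: a 1 kHz carrier (the temporal
# fine structure) whose amplitude is modulated at the 140 Hz voice pitch
x = (1.0 + np.sin(2 * np.pi * 140 * t)) * np.sin(2 * np.pi * 1000 * t)

env = channel_envelope(x, fs)
# The fast 1 kHz fine structure is discarded here; only the slow envelope
# remains to modulate a fixed-rate electrical pulse train, which is why
# phase-locked fine-structure cues never reach the auditory nerve.
```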
Subcortical pathways