Figure 3.5 (Source: Adapted from Humphries, Liebenthal, & Binder, 2010.)
Figure 3.5 depicts a flattened view of the left‐hemisphere cortex, colored in dark gray. Superimposed on the flattened cortex is a tonotopic map (grayscale corresponding to the color bar at the bottom right). Each point on the surface of the tonotopic map has a preferred stimulus frequency, in hertz, and along the dotted arrow across HG the responses form a gradient from low frequencies to high frequencies and back to low frequencies. Given this tonotopic organization of the primary auditory cortex, which is in some respects not that different from the tonotopy seen in lower parts of the auditory system, we may expect the representation of sounds (including speech sounds) in this structure to be largely spectrogram‐like. That is, if we were to read out the firing‐rate distributions along the frequency axes of these areas while speech sounds are presented, the resulting neurogram of activity would exhibit dynamically shifting peaks and troughs that reflect the changing formant structure of the speech. That this is indeed the case has been shown in animal experiments by Engineer et al. (2008), who, in one set of experiments, trained rats to discriminate a large set of consonant–vowel syllables and, in another, recorded neurograms for the same set of syllables from the primary auditory cortices of anesthetized rats using microelectrodes. They found, first, that rats can learn to discriminate most American English syllables easily, but are more likely to confuse the syllable pairs that humans also find similar and prone to confusion (e.g. ‘sha’ vs. ‘da’ is easy, but ‘sha’ vs. ‘cha’ is harder). Second, Engineer et al. found that the ease with which rats discriminate two speech syllables can be predicted from how different the primary auditory cortex neurograms for those syllables are.
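To make the logic of that second finding concrete, the sketch below computes a crude stand‑in for a neurogram (a log‑power spectrogram) for pairs of syllable recordings and measures how far apart the two representations are; under the hypothesis above, larger distances should correspond to syllable pairs that are easier to discriminate. This is only a minimal illustration: Engineer et al. worked with actual multielectrode recordings and their own similarity measures, and the file names here (sha.wav, da.wav, cha.wav) are hypothetical.

```python
# Sketch: predicting syllable discriminability from "neurogram" distance.
# A plain spectrogram stands in for the cortical neurogram, so this is
# only a rough proxy for what Engineer et al. (2008) measured.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def neurogram(path, nperseg=512, noverlap=384):
    """Log-power time-frequency representation (dB) of a mono WAV file."""
    rate, wave = wavfile.read(path)       # path is a hypothetical recording
    if wave.ndim > 1:                     # mix stereo down to mono
        wave = wave.mean(axis=1)
    f, t, sxx = spectrogram(wave, fs=rate, nperseg=nperseg, noverlap=noverlap)
    return 10 * np.log10(sxx + 1e-12)     # small offset avoids log(0)

def neurogram_distance(a, b):
    """Mean Euclidean distance between two neurograms, cropped to the
    shorter duration so the comparison is frame by frame."""
    n = min(a.shape[1], b.shape[1])
    return np.sqrt(((a[:, :n] - b[:, :n]) ** 2).mean())

# Hypothetical recordings of the syllables discussed in the text.
sha, da, cha = (neurogram(p) for p in ("sha.wav", "da.wav", "cha.wav"))
print("sha vs. da :", neurogram_distance(sha, da))   # larger -> easier pair
print("sha vs. cha:", neurogram_distance(sha, cha))  # smaller -> harder pair
```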
These data suggest that the representation of speech in the primary auditory cortex is still a relatively unsophisticated time–frequency representation of sound features, with very little in the way of recognition, categorization, or interpretation. Calling the primary auditory cortex unsophisticated is, however, probably doing it an injustice. Other animal experiments indicate that neurons in the primary auditory cortex can, for example, change their frequency tuning quickly and substantially if a particular task requires attention to be directed to a particular frequency band (Edeline, Pham, & Weinberger, 1993; Fritz et al., 2003). Primary auditory cortex neurons can even become responsive to stimuli or events that are not auditory at all, if those events are firmly associated with sound‐related tasks that an animal has learned to master (Brosch, Selezneva, & Scheich, 2005). Nevertheless, it is currently thought that the neural representations of sounds and events in the primary auditory cortex are probably based on detecting relatively simple acoustic features and are not specific to speech or vocalizations, given that the primary auditory cortex does not seem to have any obvious preference for speech over nonspeech stimuli. In the human brain, to find the first indication of areas that appear to prefer speech to other, nonspeech sounds, we must move beyond the tonotopic maps of the primary auditory cortex (Belin et al., 2000; Scott et al., 2000).
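As a rough illustration of such rapid retuning, the toy model below describes a neuron's tuning curve as a Gaussian over log frequency and shifts its best frequency partway toward an attended target frequency when the animal engages in a task. All numbers (bandwidth, shift fraction, frequencies) are illustrative assumptions, not parameters taken from Edeline et al. (1993) or Fritz et al. (2003).

```python
# Toy model of rapid task-dependent retuning: a neuron's Gaussian tuning
# curve shifts its best frequency toward the frequency the task makes
# behaviorally relevant. Parameters are illustrative only.
import numpy as np

def tuning_curve(freqs, best_freq, bandwidth=0.5, gain=1.0):
    """Firing rate as a Gaussian over log-frequency distance (in octaves)."""
    octaves = np.log2(freqs / best_freq)
    return gain * np.exp(-0.5 * (octaves / bandwidth) ** 2)

freqs = np.logspace(np.log10(250), np.log10(8000), 200)   # 250 Hz - 8 kHz

passive = tuning_curve(freqs, best_freq=1000.0)            # pre-task tuning
# During the task, the best frequency moves partway (here 60% of the
# log-frequency distance) toward a 2 kHz target frequency.
shift = 0.6
engaged = tuning_curve(freqs, best_freq=1000.0 * (2000.0 / 1000.0) ** shift)

print("passive peak:", freqs[passive.argmax()])   # ~1000 Hz
print("engaged peak:", freqs[engaged.argmax()])   # shifted toward 2000 Hz
```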
In the following sections we will continue our journey through the auditory system into cortical regions that appear to make specialized contributions to speech processing, and which are situated in the temporal, parietal, and frontal lobes. We will also discuss how these regions communicate with each other in noisy contexts and during self‐generated speech, when information from the (pre)motor cortex influences speech perception, and look at representations of speech in time. Figure 3.6 introduces the regions and major connections to be discussed. In brief, we will consider the superior temporal gyrus (STG) and the premotor cortex (PMC), and then loop back to the STG to discuss how brain regions in the auditory system work together as part of a dynamic network.
Figure 3.6 A map of cortical areas involved in the auditory representation of speech. PAC = primary auditory cortex; STG = superior temporal gyrus; aSTG = anterior STG; pSTG = posterior STG; IFG = inferior frontal gyrus; PMC = premotor cortex; SMC = sensorimotor cortex; IPL = inferior parietal lobule. Dashed lines indicate medial areas.
(Source: Adapted from Rauschecker & Scott, 2009.)
What does the higher‐order cortex add?
All the systems that we have reviewed so far on our journey along the auditory pathway have been general auditory‐processing systems. So, although they are important for speech processing, their function is not speech specific. For example, the cochlea converts pressure waves into electrical impulses, whether the pressure waves encode a friendly ‘hello’ or the sound of falling rain; the subcortical pathways process and propagate these neural signals to the primary auditory cortex, regardless of whether they encode a phone conversation, barking dogs, or noisy traffic; and the primary auditory cortex exhibits a tonotopic representation of an auditory stimulus, whether that stimulus is part of a Shakespearean soliloquy or of Ravel’s Boléro. In this section, we encounter a set of cortical areas that preferentially process speech over other kinds of auditory stimuli. We will also describe deeply revealing new work on the linguistic‐phonetic representation of speech, obtained using surgical recordings in human brains.
Speech‐preferential areas
That areas of the brain exist that are necessary for understanding speech but not for general sound perception has been known since the nineteenth century, when the German neurologist Carl Wernicke associated the aphasia that bears his name with damage to the STG (Wernicke, 1874). Wernicke’s eponymous area was, incidentally, reinterpreted by later neurologists to refer only to the posterior third of the STG and adjacent parietal areas (Bogen & Bogen, 1976), although some disagreement about its precise boundaries continues to this day (Tremblay & Dick, 2016).
With the advent of fMRI at the end of the twentieth century, the posterior STG (pSTG) was confirmed to respond more strongly to vocal sounds than to nonvocal sounds (e.g. speech, laughter, or crying compared to the sounds of wind, galloping, or cars; Belin et al., 2000). Neuroimaging also revealed a second, anterior, area in the STG that responds more to vocal than to nonvocal sounds (Belin et al., 2000). These voice‐preferential areas can be found in both hemispheres of the brain. Additional studies have shown that it is not just the voice but also intelligible speech that excites these regions, with speech processing being more specialized in the left hemisphere (Scott et al., 2000). Anatomically, the anterior and posterior STG receive white‐matter connections from the primary auditory cortex, and in turn feed two auditory‐processing streams: one antero‐ventral, which extends into the inferior frontal cortex, and the other postero‐dorsal, which curves into the inferior parietal lobule. The specific functions of these streams remain a matter of debate. For example, Rauschecker and Scott (2009) propose that the paths differ in processing what and where information in the auditory signal, where what refers to recognizing the cause of a sound (e.g. it’s a thunderclap) and where to locating the sound in space (e.g. to the west). Another, more linguistic, suggestion is that the ventral stream is broadly semantic, whereas the dorsal stream may be described as more phonetic in nature (Hickok & Poeppel, 2004). Whatever their functions, however, there appear to be two streams diverging around the anterior and posterior STG.
Over the years, these early STG results have been replicated many times using neuroimaging (Price, 2012). Each technique for observing the activity of the human brain, whether noninvasive, such as magnetoencephalography (MEG) or fMRI, or invasive, such as electrocorticography (ECoG; described in the next section), has its limitations.