The Handbook of Speech Perception
to the spectral shape at stop consonant onset (Blumstein & Stevens, 1980; see also Chang & Blumstein, 1981).
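
      The gist of this gross-spectral-shape measure can be made concrete with a short sketch: compute a short-time spectrum over a brief window at stop onset and summarize its overall tilt. The 26 ms window, the linear tilt statistic, and the toy burst signals below are illustrative assumptions, not Blumstein and Stevens's actual templates.

```python
# Illustrative sketch of a gross onset-spectrum measure in the spirit of
# Blumstein & Stevens (1980); window length and tilt statistic are assumptions.
import numpy as np

def onset_spectral_tilt(signal, fs, window_ms=26.0):
    """Return the slope (dB/kHz) of the log-magnitude spectrum over a
    short window at consonant onset."""
    n = int(fs * window_ms / 1000)
    frame = signal[:n] * np.hamming(n)
    spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(frame)) + 1e-12)
    freqs_khz = np.fft.rfftfreq(n, 1 / fs) / 1000.0
    slope, _ = np.polyfit(freqs_khz, spectrum_db, 1)
    return slope

# Toy bursts: a high-passed burst has a rising tilt (diffuse-rising,
# alveolar-like); a low-passed burst has a falling tilt (labial-like).
fs = 16000
rng = np.random.default_rng(0)
noise = rng.normal(size=fs // 10)
rising = np.diff(noise, prepend=0.0)                       # high-pass
falling = np.convolve(noise, np.ones(8) / 8, mode="same")  # low-pass
print("rising burst tilt (dB/kHz):", round(onset_spectral_tilt(rising, fs), 2))
print("falling burst tilt (dB/kHz):", round(onset_spectral_tilt(falling, fs), 2))
```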

      Invariant properties were identified for additional phonetic features, giving rise to a theory of acoustic invariance hypothesizing that, despite the variability in the acoustic input, there are more generalized patterns that provide the listener with a stable framework for the perception of the phonetic features of language (Blumstein & Stevens, 1981; Stevens & Blumstein, 1981; see also Kewley‐Port, 1983; Nossair & Zahorian, 1991). These features include those signifying manner of articulation for [stops], [glides], [nasals], and [fricatives] (Kurowski & Blumstein, 1984; Mack & Blumstein, 1983; Shinn & Blumstein, 1984; Stevens & Blumstein, 1981). Additionally, research has shown that if the auditory speech input is normalized for speaker and vowel context, generalized patterns can be identified for both stop (Johnson, Reidy, & Edwards, 2018) and fricative (McMurray & Jongman, 2011) place of articulation.
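
      To illustrate what such normalization buys, the following sketch simulates an acoustic cue (a fricative spectral peak) whose raw values are blurred by speaker-specific offsets; z-scoring within speaker, loosely in the spirit of McMurray and Jongman's (2011) approach, recovers the category separation. All values here are simulated, and the specific cue and numbers are assumptions for demonstration.

```python
# Simulated demonstration of speaker normalization; cue values, speaker
# offsets, and category means are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

categories = ["s", "sh"]
true_means = {"s": 7000.0, "sh": 4000.0}  # hypothetical spectral peaks (Hz)

raw, labels, speakers = [], [], []
for spk in range(20):
    offset = rng.normal(0, 800)           # speaker-specific shift
    for cat in categories:
        for _ in range(10):
            raw.append(true_means[cat] + offset + rng.normal(0, 300))
            labels.append(cat)
            speakers.append(spk)
raw, labels, speakers = map(np.array, (raw, labels, speakers))

# Normalize each measurement relative to its own speaker's mean and
# standard deviation; speaker variability largely drops out.
norm = np.empty_like(raw)
for spk in np.unique(speakers):
    m = speakers == spk
    norm[m] = (raw[m] - raw[m].mean()) / raw[m].std()

for cat in categories:
    sel = labels == cat
    print(f"{cat}: raw sd = {raw[sel].std():.0f} Hz, "
          f"normalized mean = {norm[sel].mean():+.2f}")
```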

      A new approach to the question of invariance provides perhaps the strongest support for the notion that listeners extract global invariant acoustic properties in processing the phonetic categories of speech. Pioneering work from the lab of Eddie Chang has examined neural responses to speech using electrocorticography (ECoG). Here, intracranial electrophysiological recordings are made in patients with intractable seizures, with the goal of identifying the site of seizure activity. A grid of electrodes is placed on the surface of the brain and neural activity is recorded directly, with good spatial and temporal resolution. In a recent study (Mesgarani et al., 2014), six participants listened to 500 natural speech sentences produced by 400 speakers. The sentences were segmented into sequences of phonemes. Results showed, not surprisingly, responses to speech in the posterior and mid‐superior temporal gyrus, consistent with fMRI studies showing that the perception of speech recruits temporal neural structures adjacent to the primary auditory areas (for reviews see Price, 2012; Scott & Johnsrude, 2003). Critically important were the patterns of activity that emerged. In particular, Mesgarani et al. (2014) showed selective responses of individual electrodes to features defining natural classes in English. That is, selective responses occurred for stop consonants including [p t k b d g], fricative consonants [s z f š θ], and nasals [m n ŋ]. That these patterns emerged across speakers, vowels, and phonetic contexts indicates that the inherent variability in the speech stream was essentially averaged out, leaving generalized patterns common to those features representing manner of articulation (see also Arsenault & Buchsbaum, 2015). It is unclear whether the patterns extracted are the same as those identified in the Stevens and Blumstein studies described above. However, what is clear is that the basic representational units corresponding to these features are acoustic in nature.
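
      The logic of this clustering result can be sketched as follows: build a phoneme-by-electrode response matrix and cluster the rows. In the toy version below the "electrode responses" are simulated so that phonemes sharing a manner class share a response profile; the analysis then recovers the manner classes. This reproduces the shape of Mesgarani et al.'s (2014) finding, not a reanalysis of their data.

```python
# Toy illustration of manner-class clustering; the responses are
# simulated, not real ECoG recordings.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)

manner = {"p": "stop", "t": "stop", "k": "stop",
          "b": "stop", "d": "stop", "g": "stop",
          "s": "fric", "z": "fric", "f": "fric",
          "m": "nasal", "n": "nasal"}

# Simulate 60 electrodes; each manner class gets its own mean response
# profile, plus phoneme-specific noise (the "variability" to average out).
profiles = {c: rng.normal(0, 1, 60) for c in set(manner.values())}
X = np.vstack([profiles[manner[ph]] + rng.normal(0, 0.3, 60)
               for ph in manner])

# Hierarchical clustering of the phoneme-by-electrode matrix recovers
# the manner classes despite the phoneme-level variability.
Z = linkage(X, method="average", metric="correlation")
clusters = fcluster(Z, t=3, criterion="maxclust")
for ph, cl in zip(manner, clusters):
    print(ph, manner[ph], "-> cluster", cl)
```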

      In another notable study from the Chang lab, Cheung and colleagues (2016) used ECoG to examine neural responses to speech perception in superior temporal gyrus sites, as in Mesgarani et al. (2014). Critically, they also examined neural responses to both speech perception and speech production in frontal areas, in particular in the motor cortex – the ventral half of the lateral sensorimotor cortex (vSMC). Nine participants listened to and produced the consonant–vowel (CV) syllables [pa ta ka ba da ga sa ša] in separate tasks and, in a third task, passively listened to portions of a natural speech corpus (TIMIT) consisting of 499 sentences spoken by a total of 400 male and female speakers. For production, responses in the vSMC reflected the somatotopic organization of the motor cortex, with distinct clustering as a function of place of articulation. That is, as expected, separate clusters emerged reflecting the different motor gestures used to produce labial, alveolar, and velar consonants.

      Results of the passive listening task replicated Mesgarani et al.’s (2014) findings, showing selective responses in the superior temporal gyrus (STG) as a function of manner of articulation; that is, the stop consonants clustered together, as did the fricative consonants. Of importance, a similar pattern emerged in the vSMC: neural activity clustered in terms of manner of articulation, although, interestingly, the consonants within each cluster did not group as tightly as they did in the STG. Thus, frontal areas are indeed activated in speech perception; however, this activation appears to correspond to the acoustic representation of speech extracted from the auditory input rather than to a transformation of the auditory input into articulatory, motor, or gestural representations. These neural findings suggest that the perceptual representation of features, even in motor areas, is acoustic or auditory in nature, not articulatory or motor. The results are preliminary but provocative. Additional research is required to examine neural responses in frontal areas to auditory speech input for the full consonant inventory across vowel contexts, phonetic positions, and speakers. The question is: when consonant, vowel, or speaker variability is increased in the auditory input, will neural responses in frontal areas pattern with spectral and temporal features or with gestural features?
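
      One way to quantify the "did not group as tightly" observation is a cluster-compactness statistic such as the silhouette score, computed over phoneme response patterns labeled by manner class. The sketch below simulates two regions with the same manner structure but different within-class noise; everything except the qualitative finding (tighter clusters in STG than in vSMC, per Cheung et al., 2016) is an assumption.

```python
# Hedged sketch: silhouette scores as a proxy for cluster tightness.
# Response matrices are simulated; only the qualitative contrast is
# taken from Cheung et al. (2016).
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Manner labels for the eight CV onsets [pa ta ka ba da ga sa ša].
manner = np.array(["stop"] * 6 + ["fricative"] * 2)

def simulate_region(within_noise, n_electrodes=40):
    """Phoneme-by-electrode responses: one profile per manner class,
    plus phoneme-level noise controlling cluster tightness."""
    centers = {m: rng.normal(0, 1, n_electrodes) for m in np.unique(manner)}
    return np.vstack([centers[m] + rng.normal(0, within_noise, n_electrodes)
                      for m in manner])

stg = simulate_region(within_noise=0.3)   # tight manner clusters
vsmc = simulate_region(within_noise=0.8)  # same structure, looser clusters

print("STG silhouette:  %.2f" % silhouette_score(stg, manner))
print("vSMC silhouette: %.2f" % silhouette_score(vsmc, manner))
```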

      So what is the story? Does acoustic invariance obviate variability? Does variability trump invariance? In both cases, we believe not. Both the stable acoustic patterns and the variability inherent in the speech stream play a critical role in speech perception and word recognition. Invariant acoustic patterns corresponding to features allow for stability in perception. As such, features serve as essential building blocks for the speaker‐hearer in processing the sounds of language. They provide a framework for processing speech and ultimately words, by allowing acoustically variable manifestations of sound in different phonetic contexts to be realized as one and the same phonetic dimension. In short, features serve as a means of bootstrapping the perceptual system for the critical job of mapping the auditory input not only onto phonetic categories but also onto words.
