The Handbook of Speech Perception. Various authors
Direct measurements of babbling acoustics have shown evidence for babbling drift, albeit only small effects. For example, Whalen, Levitt, and Goldstein (2007) measured voice onset time (VOT) in French‐ and English‐learning infants at ages 9 and 12 months. The two groups did not differ in VOT or in the duration of the prevoicing that was observed. However, prevoicing was more frequent in the French babies, which corresponds to adult French–English differences.
The most serious concern from the existing data is that there is no evidence for speech‐production tuning of targets based on production errors. MacDonald et al.’s (2012) data suggest that young children do not correct errors. However, there are several caveats to that conclusion. First, the magnitude of the perturbation may have to be within a critical range, and the perturbations for all ages in MacDonald et al. were the same in hertz. It is possible that younger children require larger perturbations to elicit compensations. A related issue is that the perturbations may have been within the noisy categories that the children were producing. The variability of production may be an indicator of the category status. However, even if this were true, it raises the question: How could an organism learn to produce adult targets under these conditions? The challenges are enormous. Juveniles in all species have vocal tracts that do not match their parents’ vocal tracts. Birds and other species show marked production variance as juveniles (e.g. Bertram et al., 2014). There is no obvious feedback‐based mechanism that permits the mapping from adult targets to young productions (see Messum & Howard, 2015). Error correction as normally envisioned in motor control may not be engaged.
Perception–production interaction
This puzzle reflects the general problem of understanding the relationship between the processes of listening to speech and producing it. Liberman (1996, p. 247) stated:
In all communication, sender and receiver must be bound by a common understanding about what counts; what counts for the sender must count for the receiver, else communication does not occur. Moreover, the processes of production and perception must somehow be linked; their representation must, at some point, be the same.
This is certainly true in a very general sense but the roles played in communication by the auditory signal that reaches the listener and by the signal that reaches the speaker are dramatically different. For the listener, the signal is involved in categorical discrimination and information transmission, while for the talker the signal is primarily thought to influence motor precision and error correction. These two issues are not independent but are far from equivalent. The problem for researchers is that the perception and production of speech are so intrinsically intertwined in communication that it is difficult to distinguish the influence of these “two solitudes” of speech research on spoken language.
While the relationship between speech perception and production has historically been invoked to explain language change, patterns of language disorder, and the developmental time course of speech acquisition, there has been little comprehensive theorizing about how speech input and output interact (Levelt, 2013). Recently, Kittredge and Dell (2016) outlined three stark hypotheses about the relationship between speech perception and production. In their view, the representations for perception and production could be completely separate, absolutely inseparable, or separable under some if not many conditions.
A number of different types of experimental evidence might distinguish these possibilities, including (1) data that examine whether learning/adaptation changes in perception influence production and vice versa; (2) correlational data showing individual differences in the processing of speech perception and production (e.g. perceptual precision and production variability); and (3) data showing interference between the two processes of perception and production.
Learning/adaptation changes
In speech perception, selective adaptation for both consonants and vowels results in changes to category boundaries after exposure to a repetitive adapting stimulus. Cooper (1974; Cooper & Lauritsen, 1974) reported changes in produced VOT following repeated presentation of a voiceless adapting stimulus. In a manner similar to the effects of selective adaptation on perceptual category boundaries, talkers produced shorter VOTs after adaptation. The effect was attributed to a perceptuomotor mechanism that mediates both the perception and the production of speech. More recently, Shiller et al. (2009) found that, when subjects produced fricatives with frequency‐altered feedback, they produced fricatives that compensated for the perturbation. Most interestingly, the subjects shifted their perceptual boundary for /s/–/sh/ identification following this production perturbation. However, as Perkell (2012) cautions, the segment durations of the fricatives were far beyond the natural range, raising the possibility that the effect was acoustic rather than phonetic. Lametti et al. (2014) showed the opposite direction of influence. A perceptual training task designed to alter perceptual boundaries between vowels preceded a production task. No shift was observed in baseline vowel formant values, but a difference was observed in the magnitude of compensation to F1 perturbations. Oddly, this difference was still observed in a follow‐up days later. The persistence is surprising for a number of reasons. First, the speech adaptation effects produced by formant shifts themselves drift away relatively quickly within an experimental session following return to normal feedback. Second, the perceptual training did not influence baseline vowel production either immediately after training or days later. The influence of perceptual change on production is shown only in the magnitude of compensation (i.e. in the behavior of the auditory feedback processing system).
Finally, the duration of the effect is noteworthy. While it is not unheard of for perceptual effects to persist across many days, it is not common; one such case is the McCollough effect in vision, which has been shown to last for months after a 15‐minute training period (Jones & Holding, 1975). However, the reinforcement learning paradigm used by Lametti et al. (2014) is considerably different from the adaptation approach used in other studies and suggests a more selective influence on the perception–production linkage.
The published data suggest modest effects from speech‐perception training on speech production and vice versa. As Kittredge and Dell (2016) suggest, the pathway for exchange between the input and output systems may be restricted to a small set of special conditions. One possibility they raise is that perceptual behavior that involves prediction invokes the motor system, and that this directly influences production.
A separate line of research has suggested this influence may exist but has shown similarly small effect sizes in experiments. In the study of face‐to‐face conversations, a considerable body of theoretical work supports the idea that interlocutors align their language at many levels (Garrod & Pickering, 2004). At the phonetic level, the findings have been weak but consistent. Few acoustic findings support alignment, but small perceptual effects have been frequently reported (Pardo et al., 2012; Kim, Horton, & Bradlow, 2011). The surprising aspect of these findings is the small effect size. Given the proposed importance of alignment in communication (and the proposed linkage between perception and production; Pickering & Garrod, 2013), the small influence is problematic.