The Handbook of Speech Perception

role for sensory feedback. In such frameworks, auditory feedback is used to establish auditory target regions and to learn and maintain “forward models” that predict the consequences of behavior (e.g. Kawato, 1990). In part, these more computational models were anticipated by earlier physiological ideas about efference copy and corollary discharge.
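
      To make the forward‐model idea concrete, the following minimal sketch predicts the sensory consequences of a motor command and updates itself from the prediction error. The class name, the linear mapping, and the delta‐rule update are our illustrative assumptions, not details of the models cited above.

```python
import numpy as np

class ForwardModel:
    """Toy forward model: predicts the sensory consequences of a
    motor command and is maintained by learning from prediction
    error (illustrative sketch, not a published model)."""

    def __init__(self, n_motor, n_sensory, learning_rate=0.1):
        # Linear motor-to-sensory mapping, initialized to zero.
        self.W = np.zeros((n_sensory, n_motor))
        self.learning_rate = learning_rate

    def predict(self, motor_command):
        # Efference-copy pathway: the predicted auditory consequence
        # of the outgoing motor command.
        return self.W @ motor_command

    def update(self, motor_command, sensory_feedback):
        # Compare actual feedback with the prediction and nudge the
        # mapping toward the observed consequence (delta rule).
        prediction_error = sensory_feedback - self.predict(motor_command)
        self.W += self.learning_rate * np.outer(prediction_error,
                                                motor_command)
        return prediction_error
```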

      The term efference copy is a direct translation of the German Efferenzkopie, introduced by von Holst and Mittelstaedt in 1950 to explain how we might distinguish changes in visual sensation caused by our own movement from changes caused by movement of the world. Crapse and Sommer (2008) consider corollary discharge (coined by Sperry in the same year, 1950) to be the more general term: corollary discharges are viewed as copies of motor commands sent to any sensory structure, whereas efference copies are thought to be sent only to early or primary sensory structures.

      Two current types of neurocomputational models of speech production differentiate how such corollary discharges and sensory feedback could influence speech. The Directions into Velocities of Articulators (DIVA) model and its extension, the Gradient Order DIVA (GODIVA) model, use the comparison of overt auditory feedback to auditory target maps as the mechanism to control speech errors (Guenther & Hickok, 2015). The auditory target maps can be understood as the predictions of the sensory state following a motor program. These predictions are also the goals represented in the speech‐sound map, where a speech sound is defined as a phonetic segment with its own motor program. This model requires two sensory‐to‐movement mappings to be learned in development. The speech‐sound map must be mapped to appropriate movements in what is considered a forward model. When errors are detected by mismatches between feedback and predicted sensory information, a correction must be generated. The sensorimotor mapping responsible for such corrective movements is considered an inverse model.
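
      A minimal sketch of this feedback‐correction scheme, assuming the auditory target region is represented as per‐dimension bounds and that a motor‐to‐auditory Jacobian is available (the function name, the representation, and the gain are illustrative, not the published DIVA equations), might look as follows:

```python
import numpy as np

def diva_style_correction(auditory_feedback, target_min, target_max,
                          jacobian, gain=1.0):
    """Sketch of a DIVA-like feedback pathway (illustrative only):
    feedback inside the target region produces no error; feedback
    outside it is pulled back toward the nearest edge."""
    error = np.where(auditory_feedback < target_min,
                     target_min - auditory_feedback,
                     np.where(auditory_feedback > target_max,
                              target_max - auditory_feedback, 0.0))
    # Inverse-model step: map the auditory error back into corrective
    # articulator movements via the pseudoinverse of the assumed
    # motor-to-auditory Jacobian.
    return gain * np.linalg.pinv(jacobian) @ error
```

      For example, if formant feedback of (520, 1450) Hz is checked against a target region of 450–500 Hz for F1 and 1400–1600 Hz for F2, the auditory error is (−20, 0): only F1, which has strayed above its bound, generates a corrective movement.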

      In contrast, the state feedback control (SFC) model of speech production and its extension, the hierarchical state feedback control (HSFC) model, assume an additional internal feedback loop (Hickok, 2012; Houde & Nagarajan, 2011; Houde & Chang, 2015). Like the DIVA models, the SFC models incorporate a form of corollary discharge. One critical difference is that the corollary discharge in SFC models is checked against an internal target map rather than against overt auditory feedback (i.e. a prediction of speech errors is generated, which provides a mechanism to prevent such errors). Overt auditory feedback enters the model through its influence on how the speech‐error predictions are converted into corrections (Houde & Nagarajan, 2011).
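
      The contrast with the DIVA‐style loop can be sketched in the same toy style. Below, the speech‐error prediction is computed from the corollary discharge against an internal target map, so it exists before any overt feedback arrives; feedback, when present, only refines the internal state estimate. The fixed Kalman‐like gain is our simplifying assumption, standing in for the full state estimation of the SFC models.

```python
import numpy as np

def sfc_style_step(state_prediction, internal_target,
                   overt_feedback=None, feedback_gain=0.3):
    """Sketch of an SFC-like internal loop (illustrative only)."""
    # Internal estimate starts from the corollary discharge (the
    # predicted sensory state), not from overt feedback.
    estimate = np.asarray(state_prediction, dtype=float).copy()
    if overt_feedback is not None:
        # Overt feedback does not define the error directly; it
        # refines the internal estimate via a fixed, assumed
        # Kalman-like correction gain.
        estimate += feedback_gain * (overt_feedback - state_prediction)
    # The error prediction is computed against the internal target
    # map, so it is available even before feedback has arrived.
    return internal_target - estimate
```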

       Neural processing of feedback

      There is an extensive literature on the neural substrates supporting speech production (see Guenther, 2016, for a review). Much of this work is based on mapping the speech‐production network with fMRI (Guenther, Ghosh, & Tourville, 2006). Our focus here is narrower: how the nervous system handles speech sounds produced by the talker. The neural processing of self‐produced sound requires mechanisms that differentiate sound produced by oneself from sound produced by others. Two coexisting processes may play a role: (1) perceptual suppression of external sounds and voices, and (2) specialized processing of one’s own speech (Eliades & Wang, 2008). Cortical suppression serves adaptive functions that differ across species. In nonhuman primates, for example, the ability to discern self‐vocalization from external sound promotes antiphonal calling, in which an animal must recognize its species‐specific call and respond by producing the same call (Miller & Wang, 2006). Takahashi, Fenley, and Ghazanfar (2016) have invoked the development of self‐monitoring and self‐recognition as essential to coordinated turn taking in marmoset monkeys.

      Muller‐Preuss and Ploog (1981) found that most neurons in the primary auditory cortex of unanesthetized squirrel monkeys that were excited by playback of self‐vocalization were either weakened or completely inhibited during phonation. However, approximately half of the neurons in the superior temporal gyrus (primary auditory cortex) did not show this distinction (Muller‐Preuss & Ploog, 1981). This pattern reflects phonation‐dependent suppression in specific populations of auditory cortical neurons. Electrocorticography data in humans have also supported the idea that specific portions of the auditory cortex support auditory feedback processing (Chang et al., 2013).

      In a magnetoencephalography (MEG) study, Houde and colleagues (2002) directly investigated whether vocalization‐induced suppression of the auditory cortex results from a neural comparison between an incoming signal (auditory feedback) and an internal “prediction” of that signal. They created a discrepancy, or “mismatch,” between signal and expectation by altering the auditory feedback: participants heard the sum of their speech and white noise that lasted the duration of their utterance. Suppression of the M100 response was observed during normal self‐produced speech, but when feedback was altered with the gated noise (speech plus white noise), self‐produced speech no longer suppressed M100 amplitude in the auditory cortex. These findings therefore support a forward model in which expected auditory feedback during talking suppresses responses in the auditory cortex.
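
      The altered‐feedback condition can be sketched as a simple signal manipulation: white noise gated to the duration of the utterance is summed with the talker’s speech. The energy‐based gate and the parameter values below are our illustrative assumptions, standing in for the experimental gating of noise to the participant’s vocalization.

```python
import numpy as np

def gated_noise_feedback(speech, fs, noise_rms=0.05, rng=None):
    """Sketch of the gated-noise feedback manipulation
    (parameter values are illustrative, not from the study)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = noise_rms * rng.standard_normal(len(speech))
    # Approximate "for the duration of the utterance" with a gate
    # driven by short-term speech energy (20 ms smoothing window).
    frame = max(1, int(0.02 * fs))
    energy = np.convolve(speech ** 2, np.ones(frame) / frame,
                         mode="same")
    gate = (energy > 1e-4).astype(float)
    # What the participant hears: their own speech summed with noise
    # that is present only while they are speaking.
    return speech + gate * noise
```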

      To determine whether a forward model system truly regulates suppression of the human auditory cortex during speech production, Heinks‐Maldonado and colleagues (2005) examined event‐related potentials (the N100) during speech production. Like Houde et al. (2002), they found that the amplitude of the N100 was reduced in response to unaltered vocalization relative to
