The Handbook of Speech Perception
3 How Does the Brain Represent Speech?
OIWI PARKER JONES1 AND JAN W. H. SCHNUPP2
1 University of Oxford, United Kingdom
2 City University of Hong Kong, Hong Kong
Introduction
In this chapter, we provide a brief overview of how the brain’s auditory system represents speech. The topic is vast: many decades of research have generated several books’ worth of insight into this fascinating question, and engaging closely with the subject matter requires a fair amount of background knowledge in neuroanatomy and physiology, as well as in acoustics and the linguistic sciences. Providing a reasonably comprehensive overview of the topic that is accessible to a wide readership within a short chapter is a near‐impossible task, and we apologize in advance for the shortcomings this chapter will inevitably have. With these caveats, and without further ado, let us jump right in and begin by examining the question: What is there to ‘represent’ in a speech signal?
The word representation is widely used in sensory neuroscience, but it is rarely clearly defined. A neural representation typically refers to the manner in which neural activity patterns encode or process key aspects of the sensory world. Of course, if we want to understand how the brain listens to speech, then grasping how neural activity in early stages of the nervous system encodes speech sounds is only a small part of what we would ideally like to understand. It is a necessary first step that leaves many interesting questions unanswered, as you can easily appreciate if you consider that fairly simple technological devices, such as telephone lines, are able to represent speech with patterns of electrical activity, yet these devices tell us relatively little about what it means for a brain to hear speech. Phone lines merely have to capture enough of the physical parameters of an acoustic waveform to allow the resynthesis of a sufficiently similar acoustic waveform, so that it can be comprehended by a person at the other end of the line. Brains, in contrast, do not just deliver signals to a mind at the other end of the line; they have to make that mind, and to do so they have to learn something from the speech signal about who is speaking, where they might be, what mood they are in, and, most importantly, the ideas the speaker is trying to communicate. Consequently, it would be nice to know how the brain represents not just the acoustics, but also the phonetic, prosodic, and semantic features of the speech it hears.
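To make the contrast concrete, consider the simplest kind of acoustic representation a device might compute: a short‐time magnitude spectrum, or spectrogram, which captures how energy at different frequencies evolves over time. The sketch below (our own illustration, not from this chapter; the function name and parameters are arbitrary choices) computes such a representation in NumPy for a synthetic vowel‐like waveform.

```python
import numpy as np

def spectrogram(waveform, frame_len=256, hop=128):
    """Short-time Fourier magnitudes: a minimal acoustic 'representation'
    of the kind a transmission device needs, with no phonetic,
    prosodic, or semantic content made explicit."""
    n_frames = 1 + (len(waveform) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        waveform[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One magnitude spectrum per frame: time on axis 0, frequency on axis 1.
    return np.abs(np.fft.rfft(frames, axis=1))

# A one-second synthetic 'vowel': a 200 Hz fundamental plus two
# formant-like components at 700 Hz and 1200 Hz, sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
wave = sum(np.sin(2 * np.pi * f * t) for f in (200.0, 700.0, 1200.0))
spec = spectrogram(wave)
print(spec.shape)  # (number of frames, number of frequency bins)
```

Such a representation suffices for resynthesis and transmission, which is exactly the point of the telephone analogy: everything the brain must additionally extract, from talker identity to meaning, lies beyond it.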
Readers of this volume are likely to be well aware that extracting