The Handbook of Speech Perception. Group of authors


by the carefully selected and recently updated collection of chapters in this handbook, is so stimulating precisely because it illuminates the milestones that mark the path to, through, and beyond this nexus.

      “… hearing and speech perception do not function as independent autonomous streams of information or discrete processing operations that take place in isolation from the structure and functioning of the whole information‐processing system. While it is clear that the early stages of speech recognition in listeners with normal hearing are heavily dependent on the initial encoding and registration of highly detailed sensory information, audibility and the sensory processing of speech is only half of the story”.

      The chapters in this handbook provide a superbly sign‐posted map of the full story.

      Any compendium of knowledge on a particular topic represents a body of knowledge that developed in a specific time and place. The contributors to this handbook cover several generations of researchers spread over many academic disciplines working primarily on both sides of the North Atlantic Ocean. Yet, the scientific study of speech perception as presented in this outstanding handbook is still relatively young and localized. Perhaps one of the lasting lessons of the current pandemic is that we are all even more connected than we thought. New ideas and new ways of knowing can circulate as extensively, though maybe not quite as quickly, as a virus. This bodes well for the future of speech perception research.

      Ann R. Bradlow

      Northwestern University

      Historically, the study of audition has lagged behind the study of vision, partly, no doubt, because seeing is our first sense, hearing our second. But beyond this, and perhaps more importantly, instruments for acoustic control and analysis demand a more advanced technology than their optic counterparts: having a sustained natural source of light, but not of sound, we had lenses and prisms long before we had sound generators and oscilloscopes. For speech, moreover, early work revealed that its key perceptual dimensions are not those of the waveform as it impinges on the ear (amplitude, time), but those of its time‐varying Fourier transform, as it might appear at the output of the cochlea (frequency, amplitude, time). So it was only with the invention of instruments for analysis and synthesis of running speech that the systematic study of speech perception could begin: the sound spectrograph of R. K. Potter and his colleagues at Bell Telephone Laboratories in New Jersey during World War II, the Pattern Playback of Franklin Cooper at Haskins Laboratories in New York, a few years later. With these devices and their successors, speech research could finally address the first task of all perceptual study: definition of the stimulus, that is, of the physical conditions under which perception occurs.
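      The contrast drawn above between the waveform (amplitude, time) and its time-varying Fourier transform (frequency, amplitude, time) can be illustrated with a minimal sketch of a short-time Fourier transform, the kind of analysis a spectrogram displays. This is an illustration under stated assumptions, not a reconstruction of the sound spectrograph: the function names, the two-tone test signal, the 8000 Hz sample rate, the 64-sample frame, and the Hann window are all hypothetical choices made for this example.

```python
import cmath
import math


def make_two_tone(sample_rate=8000, n_samples=512, f1=500.0, f2=2000.0):
    """Hypothetical test signal: a tone at f1 for the first half, f2 for the second."""
    half = n_samples // 2
    return [
        math.sin(2.0 * math.pi * (f1 if n < half else f2) * n / sample_rate)
        for n in range(n_samples)
    ]


def stft_magnitudes(samples, frame_len=64, hop=32):
    """Short-time Fourier transform computed with a naive DFT.

    Returns one list per frame (time), each holding the magnitudes
    (amplitude) of bins 0..frame_len // 2 (frequency).
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        # Taper each frame with a Hann window to reduce spectral leakage.
        windowed = [
            samples[start + n]
            * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (frame_len - 1)))
            for n in range(frame_len)
        ]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                windowed[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def peak_frequencies(frames, sample_rate, frame_len):
    """Frequency in Hz of the strongest bin in each frame."""
    return [
        max(range(len(frame)), key=frame.__getitem__) * sample_rate / frame_len
        for frame in frames
    ]


if __name__ == "__main__":
    signal = make_two_tone()
    frames = stft_magnitudes(signal)
    peaks = peak_frequencies(frames, sample_rate=8000, frame_len=64)
    print(peaks[0], peaks[-1])
```

      The raw samples give only amplitude against time. The framed transform adds the frequency dimension, so the change of tone halfway through the signal appears as a shift in the strongest bin from one frame to the next.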

      Yet, a reader unfamiliar with the byways of modern cognitive psychology who chances on this volume may be surprised that speech perception, as a distinct field of study, even exists. Is the topic not subsumed under general auditory perception? Is speech not one of many complex acoustic signals to which we are exposed, and do we not, after all, simply hear it? It is, of course, and we do. But due partly to the peculiar structure of the speech signal and the way it is produced, partly to the peculiar equivalence relation between speaker and hearer, we also do very much more.

      Notice that without the alphabet as a means of notation, linguistics itself, as a field of study, would not exist. But the alphabet is not merely a convenient means of representing language; it is also the primary objective evidence for our intuition that we speak (and language achieves its productivity) by combining a few dozen discrete phonetic elements to form an infinite variety of words and sentences. Thus, the alphabet, recent though it is in human history, is not a secondary, purely cultural aspect of language. The inventors of the alphabet brought into consciousness previously unexploited segmental properties of speech and language, much as, say, the inventors of the bicycle discovered previously unexploited cyclic properties of human locomotion. The biological nature and evolutionary origins of the discrete phonetic categories represented by the alphabet are among many questions on which the study of speech perception may throw light.

      To perceive speech is not merely to recognize the holistic auditory patterns of isolated words or phrases, as a bonobo or some other clever animal might do; it is to parse words from a spoken stream, and segments from a spoken word, at a rate of several scores of words per minute. Notice that this is not a matter of picking up information about an objective environment, about banging doors, passing cars, or even crying infants; it is a matter of hearers recognizing sound patterns coded by a conspecific speaker into an acoustic signal according to the rules of a natural language. Speech perception, unlike general auditory perception, is intrinsically and ineradicably intersubjective, mediated by the shared code of speaker and hearer.

      Curiously, however, the discrete linguistic events that we hear (segments, syllables, words) cannot be reliably traced in either an oscillogram or a spectrogram. In a general way, their absence has been understood for many years as due to their manner of production: extensive temporal and spectral overlap, even across word boundaries, among the gestures that form neighboring phonetic segments. Yet, how a hearer separates the more or less continuous flow into discrete elements is still far from understood. The lack of an adequate perceptual model of the process may be one reason why automatic speech recognition, despite half a century of research, is still well below human levels of performance.

      The ear’s natural ease with the dynamic spectro‐temporal patterns of speech contrasts with the eye’s difficulties: oscillograms are impossible, spectrograms formidably hard, to read – unless one already knows what they say. On the other hand, the eye’s ease with the static linear string of alphabetic symbols contrasts with the ear’s difficulties: the ear has limited powers of temporal resolution, and no one has ever devised an acoustic alphabet more efficient than Morse code, for which professional rates of perception are less than a tenth of either normal speech or normal reading. Thus, properties of speech that lend themselves to hearing (exactly what they are, we still do not know) are obstacles to the eye, while properties
