The Concise Encyclopedia of Applied Linguistics. Carol A. Chapelle

Yu & Deng, 2015; Mitra et al., 2017; Zhang et al., 2017).

Deep learning refers to a set of machine learning techniques and models based on nonlinear information processing and the learning of feature representations. One such model is the deep neural network (DNN), which began to gain widespread adoption in ASR systems around 2010 (Deng & Yu, 2014). HMMs and traditional ANNs rely on shallow architectures (i.e., a single hidden layer) and can handle only context‐dependent, constrained input because they are susceptible to background noise and to differences between training and testing conditions (Mitra et al., 2017). DNNs, by contrast, use multiple layers of representation for acoustic modeling, which improves speech recognition performance (Deng & Yu, 2014). Recent studies have shown that DNN‐based ASR systems can significantly increase recognition accuracy (Mohamed, Dahl, & Hinton, 2012; Deng et al., 2013; Yu & Deng, 2015) and reduce the relative error rate by 20–30% or more (Pan, Liu, Wang, Hu, & Jiang, 2012). Deep learning architectures are now used in all major ASR systems.
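As a rough illustration of "multiple layers of representation," the sketch below passes one acoustic feature frame through a stack of nonlinear hidden layers and a softmax output layer. It assumes the hybrid arrangement in which the network outputs posterior probabilities over HMM states for each frame. The 39-dimensional input, the layer sizes, and the ten output classes are illustrative assumptions, and the weights are random rather than trained, so the output carries no linguistic meaning. The helper `relative_error_reduction` (a name introduced here) shows how a relative figure such as a 20–30% error reduction is computed.

```python
import math
import random


def relu(vector):
    """Rectified linear unit: the nonlinearity applied in each hidden layer."""
    return [max(0.0, x) for x in vector]


def softmax(vector):
    """Turn output activations into a probability distribution (numerically stable)."""
    peak = max(vector)
    exps = [math.exp(x - peak) for x in vector]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, bias):
    """Affine layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def build_dnn(layer_sizes, seed=0):
    """Create randomly initialized (untrained) layers, e.g. [39, 64, 64, 64, 10]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, frame):
    """Map one acoustic feature frame to posterior probabilities over output classes."""
    hidden = frame
    for weights, bias in layers[:-1]:  # every layer except the last is a hidden layer
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(hidden, weights, bias))


def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error rate that the new system removes."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical sizes: 39 input features, three hidden layers, 10 HMM-state classes.
network = build_dnn([39, 64, 64, 64, 10], seed=1)
posteriors = forward(network, [0.1] * 39)
print(len(posteriors), round(sum(posteriors), 6))
# A drop from 20% to 15% word error rate is a 25% relative reduction.
print(relative_error_reduction(20.0, 15.0))  # 0.25
```

Making the network deeper only means passing a longer `layer_sizes` list. In a real system the weights would be learned from transcribed speech, and the per-frame outputs would be combined with HMM and language-model scores during decoding.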

Challenges and Applications of ASR

Developing an effective ASR system poses a number of challenges. These include speech variability (e.g., intra‐ and interspeaker variability such as different voices, accents, styles, contexts, and speech rates), the choice of recognition units (e.g., words and phrases, syllables, phonemes, diphones, and triphones), language complexity (e.g., vocabulary size and difficulty), ambiguity (e.g., homophones, word boundaries, and syntactic and semantic ambiguity), and environmental conditions (e.g., background noise or several people speaking simultaneously).
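The word-boundary side of ambiguity can be made concrete: a continuous stream of phones often has more than one legal division into words. The sketch below enumerates every segmentation of a phone sequence against a tiny pronunciation lexicon. The four-word lexicon and its ARPAbet-style transcriptions are assumptions chosen for illustration; a real recognizer would rank competing segmentations with acoustic and language-model scores rather than list them exhaustively.

```python
# Hypothetical toy lexicon: word -> phone sequence (ARPAbet-style symbols).
LEXICON = {
    "I": ("AY",),
    "ice": ("AY", "S"),
    "scream": ("S", "K", "R", "IY", "M"),
    "cream": ("K", "R", "IY", "M"),
}


def segmentations(phones, lexicon):
    """Return every way of splitting `phones` into a sequence of lexicon words."""
    phones = tuple(phones)
    if not phones:
        return [[]]  # exactly one way to segment nothing: the empty word list
    results = []
    for word, pronunciation in lexicon.items():
        n = len(pronunciation)
        if phones[:n] == pronunciation:  # word matches the start of the stream
            for rest in segmentations(phones[n:], lexicon):
                results.append([word] + rest)
    return results


stream = ["AY", "S", "K", "R", "IY", "M"]
for words in segmentations(stream, LEXICON):
    print(" ".join(words))
# I scream
# ice cream
```

Both readings match the same phones exactly, so only context, which a language model approximates, can decide between "I scream" and "ice cream."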

Despite these challenges, numerous companies, such as Nuance Communications, Google, Microsoft, Apple, and Amazon, have in recent years developed and released ASR systems and software packages. Their applications include computer system interfaces (e.g., voice control of computers, data entry, dictation), education (e.g., toys, games, language translators, language‐learning software), healthcare (e.g., systems for creating various medical reports, aids for blind and visually impaired patients), telecommunications (e.g., phone‐based interactive voice response systems for banking and information services), the military (e.g., voice control of fighter aircraft), and, increasingly, consumer products and services (e.g., car navigation systems, household appliances, and mobile devices). Some of the best‐known ASR software packages include Dragon NaturallySpeaking, Braina, and LumenVox Speech Recognizer, as well as interactive ASR‐supported systems such as Siri, Cortana, and Alexa.

ASR in Applied Linguistics

ASR has tremendous potential in applied linguistics. In one application area, language teaching, Eskenazi (1999) compares the strengths of ASR to those of effective immersion language learning for developing spoken‐language skills. ASR‐based systems can give learners of a foreign language a way to hear large amounts of the language spoken by many different speakers, to produce large amounts of speech themselves, and to receive relevant feedback. Eskenazi (1999) also suggests that ASR‐based computer‐assisted language learning (CALL) materials let learners feel more at ease and receive more consistent assessment of their skills. ASR can also be used for virtual dialogues with native speakers (Harless, Zier, & Duncan, 1999) and for pronunciation training (Dalby & Kewley‐Port, 1999). Most importantly, learners enjoy ASR applications. Study after study indicates that appropriately designed software that includes ASR benefits language learners in terms of practice, motivation, and the feeling that they are actually communicating in the language rather than simply repeating predigested words and sentences.

The holy grail, a computer recognition system that matches human speech recognition, remains out of reach at present. A number of limitations appear consistently when ASR systems are applied to foreign language‐learning contexts. The major limitation is that most ASR systems are designed to work with a limited range of native speech patterns. Consequently, most ASR systems perform poorly on non‐native speech, both because of unexpected phone mappings and because of prosodic differences. In one now‐dated study, Derwing, Munro, and Carbonaro (2000) tested how well Dragon NaturallySpeaking recognized the speech of very advanced L2 speakers of English. Human listeners successfully transcribed between 95% and 99.7% of the words, and the program achieved a respectable recognition rate of 90% for native English speakers. In contrast, the system correctly transcribed only around 70% of the words for the non‐native speakers, even though these speakers were mostly intelligible to human listeners. Despite these problems with L2 speech recognition, recent studies have demonstrated that even imperfect commercial recognizers can help provide feedback on pronunciation (McCrocklin, 2016; Liakin, Cardoso, & Liakina, 2017).
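Recognition rates like those in the Derwing, Munro, and Carbonaro comparison are commonly derived from word error rate (WER): the reference transcript and the recognizer's output are aligned, and substitutions, deletions, and insertions are counted. The study's exact scoring procedure is not described here, so treating word accuracy as 1 − WER is an assumption, and the sentences below are invented examples.

```python
def edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, deletions, and insertions (Levenshtein)."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all reference words
    for j in range(cols):
        dist[0][j] = j  # insert all hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[-1][-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit errors divided by the number of words in the reference."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


reference = "she sells sea shells by the sea shore"
hypothesis = "she sells see shells by sea shore"  # one substitution, one deletion
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}, word accuracy = {1 - wer:.3f}")
# WER = 0.250, word accuracy = 0.750
```

Because insertions count as errors, WER can exceed 1, so word accuracy computed this way can fall below zero for very poor output.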

In addition, ASR systems have been built for word recognition rather than for assessment and feedback, so many commercial recognition systems offer only implicit feedback on pronunciation (e.g., whether a word is recognized as intended) rather than specific mispronunciation detection. Most language learners, however, need assessment of the specifics of their pronunciation, and specific feedback, in order to make progress. Fortunately, these topics are actively being explored in the speech sciences (e.g., Duan, Kawahara, Dantsuji, & Zhang, 2017).

Automatic Rating of Pronunciation

Historically, many studies have examined whether ASR systems can identify pronunciation errors in non‐native speech and give feedback that helps learners and teachers know which areas of foreign‐language pronunciation matter most for intelligibility. Dalby and Kewley‐Port (1999) demonstrated that such diagnosis and assessment are possible, to some extent, for minimal pairs, and that automatic ratings of pronunciation accuracy can correlate with human ratings. However, the feedback given to learners is not usually very helpful. Systems that do attempt to give feedback have two options: giving a global pronunciation rating or identifying specific errors. To reach either goal, an ASR system needs to identify word boundaries, accurately align speech to the intended targets, and compare the segments produced with those that should have been produced. A variety of systems have been designed to provide global evaluations of pronunciation using automatic measures, including speech rate, duration, and spectral analyses (e.g., Neumeyer, Franco, Digalakis, & Weintraub, 2000; Witt & Young, 2000). These studies have all found that automatic measures are never as good as human ratings, but that a combination of automatic measures is always better than any single measure.
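The idea of combining several automatic measures into one global score, and checking it against human judgments, can be sketched as follows. Each measure is standardized (z-scored), the standardized measures are averaged with equal weights, and the result is compared with human ratings using Pearson's correlation. The three measures, their values, and the human ratings are hypothetical, as is the assumption that longer phone durations indicate worse pronunciation. Equal weighting is a simplification; a system could instead fit the weights to human-rated training data.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def zscores(values):
    """Standardize a measure so that measures on different scales can be averaged."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def combined_scores(measures):
    """Equal-weight average of standardized measures, one score per speaker.

    Each measure must be oriented so that higher values mean better pronunciation.
    """
    standardized = [zscores(m) for m in measures]
    return [mean(per_speaker) for per_speaker in zip(*standardized)]


# Hypothetical data for five speakers.
speech_rate = [2.1, 3.4, 2.8, 4.0, 3.1]               # syllables per second
mean_phone_duration = [0.14, 0.10, 0.12, 0.08, 0.11]  # seconds; assumed: longer is worse
spectral_score = [-4.2, -2.9, -3.5, -2.1, -3.0]       # hypothetical per-frame log-likelihood
human_rating = [2, 4, 3, 5, 3]                        # 1-5 scale

# Negate duration so that every measure points the same way (higher = better).
measures = [speech_rate, [-d for d in mean_phone_duration], spectral_score]
machine = combined_scores(measures)
print(f"combined score vs. human rating: r = {pearson(machine, human_rating):.2f}")
```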

ASR systems also have trouble precisely identifying specific errors in articulation: they sometimes flag correct speech as containing errors while failing to identify errors that actually occur. Neri, Cucchiarini, Strik, and Boves (2002) found that their ASR system detected only 25% of pronunciation errors, while also identifying some correct productions as errors. Truong, Neri, de Wet, Cucchiarini, and Strik (2005) studied whether an ASR system could identify mispronunciations of three sounds that learners of Dutch typically mispronounce. Errors were successfully detected for one of the three sounds, but the system was less successful for the other two. Nevertheless, even modest success in error detection has led to fewer pronunciation errors among learners compared with a control group (Cucchiarini, Neri, & Strik, 2009).
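Results like these are commonly expressed as two rates over segments (for example, phones) that human annotators have labeled as correct or mispronounced: the share of real errors the system flags (the detection rate) and the share of correct productions it wrongly flags (often called the false rejection rate). The sketch below computes both. The ten labels are invented so that one of four errors is caught, mirroring the 25% figure above; they are not data from Neri et al.

```python
def detection_rates(is_error, flagged):
    """Compare human error labels with system flags, one boolean pair per segment.

    Returns (detection_rate, false_rejection_rate):
      detection_rate       = flagged errors / all true errors
      false_rejection_rate = flagged correct segments / all correct segments
    """
    if len(is_error) != len(flagged):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(is_error, flagged))
    hits = sum(1 for e, f in pairs if e and f)
    misses = sum(1 for e, f in pairs if e and not f)
    false_alarms = sum(1 for e, f in pairs if not e and f)
    correct_accepts = sum(1 for e, f in pairs if not e and not f)
    detection = hits / (hits + misses) if hits + misses else 0.0
    false_rejection = (
        false_alarms / (false_alarms + correct_accepts)
        if false_alarms + correct_accepts
        else 0.0
    )
    return detection, false_rejection


# Hypothetical labels for ten phones: four real errors, six correct productions.
human = [True, True, True, True, False, False, False, False, False, False]
system = [True, False, False, False, True, False, False, False, False, False]
det, fr = detection_rates(human, system)
print(f"detection rate = {det:.2f}, false rejection rate = {fr:.2f}")
# detection rate = 0.25, false rejection rate = 0.17
```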

Other ASR Applications in Applied Linguistics

Applied linguists have also used ASR in two other areas: reading instruction and dialogue systems in language‐learning software. One particularly successful use of ASR appears to be teaching children to read. Mostow and Aist (1999) found that ASR, used in conjunction with an understanding of teacher–student classroom behavior, was successful in teaching oral reading skills and word recognition. In a later study, Poulsen, Hastings, and Allbritton (2007) found that reading interventions for young learners of English were far more effective when an ASR system was included.

Another use of ASR technology is in spoken CALL dialogue systems. Suppose a software program for practicing spoken language provides the first line of a dialogue and the learner gives one of two possible responses. If these responses are dissimilar, the ASR system can recognize which sentence has been spoken (even with pronunciation
