The Concise Encyclopedia of Applied Linguistics. Carol A. Chapelle

Yu & Deng, 2015; Mitra et al., 2017; Zhang et al., 2017).

      Deep learning refers to a set of machine learning techniques and models that are based on nonlinear information processing and learning of feature representations. One such model is the deep neural network (DNN), which started gaining widespread adoption in ASR systems around 2010 (Deng & Yu, 2014). Unlike HMMs and traditional ANNs, which rely on a shallow architecture (i.e., one hidden layer) and can only handle context‐dependent, constrained input due to their susceptibility to background noise and differences between training and testing conditions (Mitra et al., 2017), DNNs use multiple layers of representation for acoustic modeling that improve speech recognition performance (Deng & Yu, 2014). Recent studies have shown that DNN‐based ASR systems can significantly increase recognition accuracy (Mohamed, Dahl, & Hinton, 2012; Deng et al., 2013; Yu & Deng, 2015) and reduce the error rate by a relative 20–30% or more (Pan, Liu, Wang, Hu, & Jiang, 2012). Deep learning architectures are now used in all major ASR systems.
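A relative reduction in error rate is measured against the baseline system's error rate, not in absolute percentage points. The short Python sketch below illustrates the arithmetic; the function name and the example figures are hypothetical and are not taken from any of the studies cited above.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the reduction in error rate as a fraction of the baseline error rate."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Illustrative (assumed) figures: a baseline system with a 20% error rate
# and a newer system with a 15% error rate.
baseline = 0.20
newer = 0.15
reduction = relative_error_reduction(baseline, newer)
print(f"Absolute reduction: {baseline - newer:.0%}")
print(f"Relative reduction: {reduction:.0%}")
```

With these assumed figures, a drop of five percentage points corresponds to a relative reduction of one quarter of the baseline error.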

      Developing an effective ASR system poses a number of challenges. They include speech variability (e.g., intra‐ and interspeaker variability such as different voices, accents, styles, contexts, and speech rates), recognition units (e.g., words and phrases, syllables, phonemes, diphones, and triphones), language complexity (e.g., vocabulary size and difficulty), ambiguity (e.g., homophones, word boundaries, syntactic and semantic ambiguity), and environmental conditions (e.g., background noise or several people speaking simultaneously).

      ASR has tremendous potential in applied linguistics. In one application area, that of language teaching, Eskenazi (1999) compares the strengths of ASR to effective immersion language learning in developing spoken‐language skills. ASR‐based systems can provide a way for learners of a foreign language to hear large amounts of the foreign language spoken by many different speakers, produce speech in large amounts, and get relevant feedback. In addition, Eskenazi (1999) suggests that using ASR computer‐assisted language learning (CALL) materials allows learners to feel at greater ease and get more consistent assessment of their skills. ASR can also be used for virtual dialogues with native speakers (Harless, Zier, & Duncan, 1999) and for pronunciation training (Dalby & Kewley‐Port, 1999). Most importantly, learners enjoy ASR applications. Study after study indicates that appropriately designed software that includes ASR is a benefit to language learners in terms of practice, motivation, and the feeling that they are actually communicating in the language rather than simply repeating predigested words and sentences.

      The holy grail of a computer recognition system that matches human speech recognition remains out of reach at present. A number of limitations appear consistently in attempts to apply ASR systems to foreign language‐learning contexts. The major limitation occurs because most ASR systems are designed to work with a limited range of native speech patterns. Consequently, most ASR systems do not do well in recognizing non‐native speech, both because of unexpected phone mapping and because of prosody differences. In one now‐dated study, Derwing, Munro, and Carbonaro (2000) tested Dragon NaturallySpeaking's ability to identify errors in the speech of very advanced L2 speakers of English. Human listeners were able to successfully transcribe between 95% and 99.7% of the words, and the program's recognition rate was a respectable 90% for native English speakers. In contrast, the system accurately transcribed only around 70% of the words produced by the non‐native speakers, who were mostly intelligible to human listeners. Despite problems with L2 speech recognition, recent studies have demonstrated that even imperfect commercial recognizers can be helpful in providing feedback on pronunciation (McCrocklin, 2016; Liakin, Cardoso, & Liakina, 2017).
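Transcription accuracy of this kind is commonly scored by comparing the recognizer's output with a reference transcript word by word. The Python sketch below computes a word error rate from the word‐level edit distance (substitutions, deletions, and insertions). It is an illustration of that general metric only: the function name and the example sentences are assumptions, and the scoring procedures of the studies cited above may have differed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word and one omitted word.
reference = "the ship sailed across the bay"
hypothesis = "the sheep sailed across bay"
wer = word_error_rate(reference, hypothesis)
print(f"Word error rate: {wer:.2f}")
print(f"Word accuracy:   {1 - wer:.2f}")
```

Word accuracy is reported here simply as one minus the word error rate; because insertions count as errors, this value can fall below zero for very poor transcripts.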

      In addition, ASR systems have been built for word recognition rather than assessment and feedback, and thus many commercial recognition systems offer only implicit feedback on pronunciation rather than specific mispronunciation detection. However, most language learners require assessment of the specifics of their pronunciation and specific feedback to make progress. Fortunately, these are topics that are consistently being explored in speech sciences (e.g., Duan, Kawahara, Dantsuji, & Zhang, 2017).

      ASR systems also have trouble precisely identifying specific errors in articulation, sometimes flagging correct speech as containing errors while missing errors that actually occur. Neri, Cucchiarini, Strik, and Boves (2002) found that only 25% of pronunciation errors were detected by their ASR system, while some correct productions were identified as errors. Truong, Neri, de Wet, Cucchiarini, and Strik (2005) studied whether an ASR system could identify mispronunciations of three sounds typically mispronounced by learners of Dutch. Errors were successfully detected for one of the three sounds, but the ASR system was less successful for the other sounds. However, even modest success in error detection has led to a reduced number of pronunciation errors in comparison to a control group (Cucchiarini, Neri, & Strik, 2009).
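The two kinds of failure described here, missed errors and correct productions flagged as errors, can be quantified separately. The Python sketch below computes a detection rate (the share of real errors that were flagged) and a false‐alarm rate (the share of correct productions that were flagged) from per‐segment labels. The function name and the data are invented for illustration and do not reproduce any cited study's data or method.

```python
def detection_rates(actual_errors: list[bool], flagged: list[bool]) -> tuple[float, float]:
    """Return (detection rate, false-alarm rate) for per-segment judgments.

    actual_errors[i] is True if segment i was really mispronounced;
    flagged[i] is True if the system marked segment i as an error.
    """
    if len(actual_errors) != len(flagged):
        raise ValueError("both lists must have the same length")

    pairs = list(zip(actual_errors, flagged))
    hits = sum(1 for actual, flag in pairs if actual and flag)
    false_alarms = sum(1 for actual, flag in pairs if not actual and flag)
    total_errors = sum(1 for actual in actual_errors if actual)
    total_correct = len(actual_errors) - total_errors

    detection = hits / total_errors if total_errors else 0.0
    false_alarm = false_alarms / total_correct if total_correct else 0.0
    return detection, false_alarm


# Invented data: four real errors (one detected) and four correct
# productions (one wrongly flagged).
actual = [True, True, True, True, False, False, False, False]
system = [True, False, False, False, True, False, False, False]
detection, false_alarm = detection_rates(actual, system)
print(f"Errors detected:       {detection:.0%}")
print(f"Correct items flagged: {false_alarm:.0%}")
```

Reporting the two rates side by side makes it clear that a system can appear cautious (few false alarms) while still missing most real errors, or the reverse.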

      ASR has been used by applied linguists in two other areas: reading instruction and dialogue systems in language‐learning software. One use of ASR that seems to have been particularly successful has been in teaching children to read. Mostow and Aist (1999) found that ASR used in conjunction with an understanding of teacher–student classroom behavior was successful in teaching oral reading skills and word recognition. In a later study, Poulsen, Hastings, and Allbritton (2007) found that reading interventions for young learners of English were far more effective when an ASR system was included.

      Another use of ASR technology is in spoken CALL dialogue systems. If a software program for practicing spoken language provides the first line of a dialogue, learners give one of two responses. If these responses are dissimilar, the ASR system can recognize which sentence has been spoken (even with pronunciation
