The Handbook of Speech Perception
So far, our example may seem tedious and somewhat arbitrary: we had to come up with attributes such as “manufactured” or “edible,” then consider their merit as semantic feature dimensions without any obvious objective criteria. However, there are many ways to construct word embeddings automatically, without needing to dream up a large set of semantic fields. An incrementally more complex way is to rely on the context words that each of our target words co‐occurs with in a corpus of sentences. Consider a corpus that contains exactly four sentences.
1. The boy rode on the airplane.
2. The boy also rode on the boat.
3. The celery tasted good.
4. The strawberry tasted better.
Our target words are, again, ‘airplane,’ ‘boat,’ ‘celery,’ and ‘strawberry.’ The context words are ‘also,’ ‘better,’ ‘boy,’ ‘good,’ ‘on,’ ‘rode,’ ‘tasted,’ and ‘the’ (ignoring capitalization). If we create a table with target words in rows and context words in columns, we can count how many times each context word occurs in a sentence with each target word. This will produce a new set of word embeddings (Table 3.2).
Unlike the previous semantic‐field embeddings, which were constructed using our “expert opinions,” these context‐word embeddings were learned from data (a corpus of four sentences). Learning a set of word embeddings from data can be very powerful. Indeed, the procedure can be automated, and even a modest computer can process very large corpora of text to produce embeddings for hundreds of thousands of words in seconds. Another strength of creating word embeddings like these is that the procedure is not limited to concrete nouns, since context words can be found for any target word, whether an abstract noun, a verb, or even a function word. You may be wondering how context words are able to represent meaning, but notice that words with similar meanings are bound to co‐occur with similar context words. For example, an ‘airplane’ and a ‘boat’ are both vehicles that you ride in, so they will both occur quite frequently in sentences with the word ‘rode’; however, one will rarely find sentences that contain both ‘celery’ and ‘rode.’ Compared to ‘airplane’ and ‘boat,’ ‘celery’ is more likely to occur in sentences containing the word ‘tasted.’ As the English phonetician Firth (1957, p. 11) wrote: “You shall know a word by the company it keeps.”
Table 3.2 Context‐word encodings of four words.
Word | also | better | boy | good | on | rode | tasted | the |
---|---|---|---|---|---|---|---|---|
airplane | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 2 |
boat | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 2 |
celery | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
strawberry | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
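The counting procedure that produced Table 3.2 can be sketched in a few lines of Python. The tokenization here (lowercasing and stripping the final period) is a simplification that suffices for this toy corpus; real corpora need proper tokenization.

```python
from collections import Counter

# The four-sentence corpus from the text (a toy illustration).
corpus = [
    "The boy rode on the airplane.",
    "The boy also rode on the boat.",
    "The celery tasted good.",
    "The strawberry tasted better.",
]
targets = ["airplane", "boat", "celery", "strawberry"]

# Tokenize: lowercase and strip the final period.
sentences = [s.lower().rstrip(".").split() for s in corpus]

# Context vocabulary = every word that is not a target word.
context = sorted({w for s in sentences for w in s} - set(targets))

# Count how often each context word occurs in a sentence
# containing each target word (this reproduces Table 3.2).
embeddings = {}
for t in targets:
    counts = Counter(w for s in sentences if t in s for w in s if w != t)
    embeddings[t] = [counts[c] for c in context]

print(context)                 # ['also', 'better', 'boy', 'good', 'on', 'rode', 'tasted', 'the']
print(embeddings["airplane"])  # [0, 0, 1, 0, 1, 1, 0, 2]
```

The same loop scales to arbitrarily large corpora; only the definition of “context” (here, co‐occurrence anywhere in the same sentence) would typically be refined, for example to a fixed window of neighboring words.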
With a reasonable vector representation for words like these, one can begin to see how it may be possible to predict the brain activation for word meanings (Mitchell et al., 2008). Start with a fairly large set of words and their vector representations, and record the brain activity they evoke. Put aside some of the words (including perhaps the word ‘strawberry’) and use the remainder as a training set in order to find the best linear equation that maps from word vectors to patterns of brain activation. Finally, use that equation to predict what the brain activation should have been for the words you held back, and test how similar that predicted brain activation is to the one that is actually observed, and whether the activation pattern for ‘strawberry’ is indeed more similar to that for ‘celery’ than it is to that for ‘boat.’ One similarity measure commonly used for this sort of problem is the cosine similarity, which can be defined for two vectors p⃗ and q⃗ according to the following formula:

similarity(p⃗, q⃗) = (p⃗ · q⃗) / (‖p⃗‖ ‖q⃗‖) = Σᵢ pᵢqᵢ / (√(Σᵢ pᵢ²) √(Σᵢ qᵢ²))
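As an illustration, the formula can be applied directly to the embeddings of Table 3.2:

```python
from math import sqrt

# Context-word embeddings from Table 3.2
# (columns: also, better, boy, good, on, rode, tasted, the).
vecs = {
    "airplane":   [0, 0, 1, 0, 1, 1, 0, 2],
    "boat":       [1, 0, 1, 0, 1, 1, 0, 2],
    "celery":     [0, 0, 0, 1, 0, 0, 1, 1],
    "strawberry": [0, 1, 0, 0, 0, 0, 1, 1],
}

def cosine(p, q):
    """Dot product of p and q divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q)))

print(round(cosine(vecs["airplane"], vecs["boat"]), 2))      # 0.94
print(round(cosine(vecs["celery"], vecs["strawberry"]), 2))  # 0.67
```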
Now if we plug the context‐word embeddings for each pair of words from our four‐word set into this equation, we end up with the similarity scores shown in Table 3.3. Note that numbers closer to 1 mean more similar and numbers closer to 0 mean more dissimilar. A perfect score of 1 actually means identical, which we see when we compare any word embedding with itself. Note that we have only populated the diagonal and upper triangle of this table, because the lower part is a reflection of the upper part, and therefore redundant.
As expected, the words ‘airplane’ and ‘boat’ received a very high similarity score (0.94), whereas ‘airplane’ and ‘celery,’ for example, received a lower score (0.44). The score for ‘celery’ and ‘strawberry’ (0.67), however, was also relatively high. Summary statistics such as these are quick and easy to compute, even when the vectors being compared are very long lists of numbers. Exploring them also helps to build an intuition about how encoding models, such as those of Mitchell et al. (2008), represent the meanings of words, and thus what the brain maps they discover represent. Specifically, Firth’s (1957) idea that the company a word keeps can be used to build up a semantic representation of that word has had a profound impact on the study of semantics in recent years, especially in the computational fields of natural language processing and machine learning (including deep learning). Mitchell et al.’s (2008) landmark study bridged natural language processing with neuroscience in a way that continues to provide common ground for both fields. Not only do we expect words that belong to similar semantic domains to co‐occur with similar context words, but, if the brain is capable of statistical learning, as many believe, then this is exactly the kind of pattern we should expect to find encoded in neural representations.
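The train/hold‐out/predict/compare logic described above can be sketched with simulated data. Nothing here reproduces Mitchell et al.’s (2008) actual stimuli or fMRI recordings: the “activation patterns” below are generated from a hidden linear map plus noise, purely to illustrate the evaluation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: count-like word embeddings (rows) and synthetic "activation
# patterns" produced by a hidden linear map plus noise. Real studies use
# measured brain responses; everything here is simulated.
n_words, n_dims, n_voxels = 40, 8, 20
X = rng.poisson(1.0, size=(n_words, n_dims)).astype(float)   # embeddings
W_true = rng.normal(size=(n_dims, n_voxels))                 # hidden map
Y = X @ W_true + 0.1 * rng.normal(size=(n_words, n_voxels))  # "activation"

# Hold out the last two words; fit a least-squares linear map on the rest.
X_train, Y_train = X[:-2], Y[:-2]
X_test, Y_test = X[-2:], Y[-2:]
W_hat, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

# Predict activation for the held-out words and score the predictions
# against the observed patterns with cosine similarity.
Y_pred = X_test @ W_hat

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for i in range(2):
    print(cos(Y_pred[i], Y_test[i]))  # close to 1 when the map is recovered
```

In a real experiment the interesting comparison is whether the predicted pattern for a held‐out word matches its own observed pattern better than it matches the other held‐out word’s pattern, which is exactly the two‐alternative test Mitchell et al. (2008) report.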
Table 3.3 Cosine similarities between four words.
Word | airplane | boat | celery | strawberry |
---|---|---|---|---|
airplane | 1 | 0.94 | 0.44 | 0.44 |
boat | | 1 | 0.41 | 0.41 |
celery | | | 1 | 0.67 |
strawberry | | | | 1 |