Machine Vision Inspection Systems, Machine Learning-Based Approaches. Group of authors
(category_images[category])
3. training_couples.add(image_couples)
4. expected_values.add([1] * image_couples.length)
5. for (category1 in cat_list[])
6.   for (category2 in cat_list[])
7.     if (category1 == category2)
8.       continue
9.     image_couples = get_different_couples(category_images[category1], category_images[category2])
10.    training_couples.add(image_couples)
11.    expected_values.add([0] * image_couples.length)
12. shuffle(training_couples, expected_values)
13. return training_couples[], expected_values[]
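The couple-generation steps listed above could be sketched in Python roughly as follows. This is only an illustrative sketch: the beginning of the listing is not shown here, so the way same-category couples are formed (all unordered pairs within a category) and the way different-category couples are formed (position-wise pairing of two categories) are assumptions, and the function name `make_training_couples` and the `seed` parameter are hypothetical.

```python
import itertools
import random


def make_training_couples(category_images, seed=0):
    """Build image couples and 1/0 labels from a dict: category -> list of images.

    Assumption: same-category couples are all unordered pairs inside a category,
    and different-category couples pair images position-wise across two categories.
    """
    couples, labels = [], []
    categories = list(category_images)

    # Same-category couples are labelled 1 (similar).
    for category in categories:
        same = list(itertools.combinations(category_images[category], 2))
        couples.extend(same)
        labels.extend([1] * len(same))

    # Different-category couples are labelled 0 (dissimilar).
    for cat1 in categories:
        for cat2 in categories:
            if cat1 == cat2:
                continue  # skip pairing a category with itself
            diff = list(zip(category_images[cat1], category_images[cat2]))
            couples.extend(diff)
            labels.extend([0] * len(diff))

    # Shuffle couples and labels with one shared permutation so they stay aligned.
    order = list(range(len(couples)))
    random.Random(seed).shuffle(order)
    return [couples[i] for i in order], [labels[i] for i in order]
```

Shuffling a single index permutation, rather than shuffling the two lists separately, keeps every couple matched with its expected value.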
2.4 Experiments and Results
We experimented with several models based on capsule networks, keeping the convolutional Siamese network proposed by Koch et al. as the baseline. As an initial attempt to understand the applicability of capsules in Siamese networks, we integrated the network proposed by Sabour et al. [9] into a Siamese network; it did not give satisfactory results because it failed to converge properly. Sabour et al. proposed this model for the MNIST dataset, which is a collection of 28 × 28 images of digits. In our study, however, we scaled this model up to the 105 × 105 images of the Omniglot dataset, which makes the model highly compute-intensive to train.
To reduce the computational cost, we improved the previous model using the ideas proposed in DeepCaps [34]: we stacked multiple capsule layers and, finally, replaced the L1 distance layer with a vector difference layer. Validation accuracies for the different models are reported in Table 2.2. Three capsule-based Siamese networks were tested against the Convolutional Siamese network [7] as the baseline. The network based purely on Sabour et al. [9] performed poorly, while Capsule Siamese 1, with deep capsule layers, and Capsule Siamese 2, with deep capsule layers and the new vector difference layer, performed on par with the baseline model. This indicates that a Siamese network built only from the classical capsule layers does not generalize well enough.
Table 2.2 Model validation accuracy.
Model | Validation accuracy (%) |
---|---|
Convolutional Siamese | 94 ± 2 |
Sabour et al. Capsule Siamese | 78 ± 5 |
Deep Capsule Siamese 1 | 89 ± 3 |
Deep Capsule Siamese 2 | 95 ± 2.5 |
2.4.1 N-Way Classification
One expectation of this model is that it can generalize from previous experience and use it to make decisions about completely new, unseen alphabets. The n-way classification task was therefore designed to evaluate the model on classifying previously unseen characters. Here, we used the 20 alphabets, containing 659 characters, from the evaluation set of the Omniglot dataset, which was not used in training. The model is therefore completely unfamiliar with these characters.
In this experiment, we designed the one-shot learning task as deciding the category of a given test image X out of n given categories. For an n-way classification task, we selected n character categories and then selected one character category from the same set as the test category. The one-shot task is then prepared with one test image (X) from the test category and a reference image set {X_n} containing one image for each character category. The Siamese network is fed with the couples (X, X_n) and predicts their similarity. The predicted category n* is the category with the maximum similarity, as in Equation (2.3):

$$ n^* = \operatorname*{argmax}_{n} F(X, X_n) \tag{2.3} $$

Here, argmax returns the index n that maximizes the function F.
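As a minimal sketch of the selection rule described above, the following Python code scores a test image against each reference image and returns the index with the highest similarity. The function `toy_similarity` is a hypothetical stand-in for the trained Siamese network F, and `one_shot_predict` is an illustrative name; neither comes from the original text.

```python
import numpy as np


def one_shot_predict(similarity, test_image, reference_images):
    """Return the index n* of the reference image most similar to test_image."""
    scores = np.array([similarity(test_image, ref) for ref in reference_images])
    return int(np.argmax(scores))


def toy_similarity(a, b):
    """Hypothetical stand-in for the network F: a higher value means more similar."""
    return 1.0 / (1.0 + np.abs(a - b).sum())


# A 3-way task with tiny 2 x 2 "images"; the test image is closest to reference 1.
references = [np.zeros((2, 2)), np.ones((2, 2)), np.full((2, 2), 5.0)]
test_image = np.full((2, 2), 0.9)
predicted = one_shot_predict(toy_similarity, test_image, references)
```

In practice, the `similarity` argument would wrap a forward pass of the trained network on the couple (X, X_n).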
The model is evaluated by N-way classification, with N varying in the range [1, 40]; the results are depicted in Figure 2.2.
Figure 2.2 Omniglot one-shot learning performance of Siamese networks.
According to Figure 2.2, the model proposed in this study, a capsule layer-based Siamese network, performs on par with Koch et al.’s convolutional Siamese network. However, our model has 2.4 million parameters, which is 40% fewer than the 4 million parameters of Koch et al.’s model. Although the overall performance of the two models is on par, there are certain cases in which our model performs better. For instance, the proposed model is better at identifying minor changes in characters.
For the n-way classification task, random guessing is defined as the statistical baseline: if there are n options and only one is correct, the chance of a correct prediction is 1/n. Over repeated experiments, the baseline accuracy is this probability expressed as a percentage. The classification accuracy drops as the reference set grows, because the solution space of the classification task becomes larger. The nearest-neighbor baseline degrades exponentially, whereas the Siamese networks degrade less and perform at a similar level to each other.
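The 1/n random-guessing baseline can be checked numerically with a short simulation. This is an illustrative sketch only; the function names, the number of trials, and the seed are hypothetical choices, not part of the original experiments.

```python
import random


def chance_accuracy(n):
    """Expected accuracy (in percent) of random guessing in an n-way task."""
    return 100.0 / n


def simulate_random_guessing(n, trials=20000, seed=0):
    """Empirical accuracy (in percent) of picking one of n categories at random.

    By symmetry, the correct category is fixed at index 0 in every trial.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randrange(n) == 0)
    return 100.0 * hits / trials


# For a 20-way task the expected chance level is 5%.
expected = chance_accuracy(20)
observed = simulate_random_guessing(20)
```

With enough trials, `observed` stays close to `expected`, which illustrates why the chance level shrinks as the number of reference categories grows.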
Figure 2.3 shows sample 20-way classification tasks (top) together with the classification results of the Capsule Siamese network (middle) and the Convolutional Siamese network (bottom). The figure shows samples of the test images and the corresponding classification results. The capsule-based architecture was able to identify small changes in image structure, as shown in the middle row.
Figure 2.3 illustrates a few 20-way classification problems in which the proposed capsule layer-based Siamese network outperforms the convolutional Siamese network. In most of these cases, the convolutional network fails to identify minor changes in the image, such as small line segments or curves. With the detailed features extracted through capsules, however, the proposed capsule network model makes such decisions easily.
Figure 2.3 Sample 1 classification results.
Figure 2.4 depicts a few samples that the proposed capsule network model fails to classify correctly. For certain characters, the writing styles of two people differ vastly. In such cases, the proposed capsule layer-based Siamese network underperforms the CNN: the capsule network model fails, while the convolutional units identify the character successfully.
As a solution to the decrease of n-way classification accuracy, we propose