Machine Vision Inspection Systems, Machine Learning-Based Approaches. Various authors
The above-mentioned methods required some manual feature engineering, whereas in human cognition the required features are learned along with new visual concepts. For example, when we observe a car, we spontaneously decompose it into wheels, a body, and internal parts, and we use those learned features to differentiate it from a bicycle. A similar process can be replicated in machines using capsule neural networks.
In 2015, Koch et al. [7] proposed a model using Siamese neural networks as a solution to the one-shot learning problem. They used the same dataset and approach as Lake et al. [6], but their model used convolutional units to build an understanding of the image. According to Hinton et al. [11], CNNs are misguided in what they are trying to achieve and are far from how human visual perception works; hence, they proposed capsules instead of convolutions.
In this chapter, we present a Siamese neural network based on capsule networks to solve the one-shot learning problem. The idea of the capsule was first proposed by Hinton et al. in 2011 and has since been used in numerous applications [31, 32]. Generally, CNNs aim for viewpoint invariance of the “neuron” activities, so that characters can be recognized irrespective of the viewing angle. This is achieved by using a single scalar output to summarize the activities of replicated feature detectors [9]. In contrast, capsule networks use local “capsules” that perform internal computations on their inputs and encapsulate the results into an informative output vector [11]. Sabour et al. [9] proposed an algorithm to train capsule networks based on routing by agreement between capsules. This dynamic routing helps to achieve equivariance, whereas CNNs can only achieve invariance through pooling layers.
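To make routing by agreement concrete, the following is a minimal pure-Python sketch of the "squash" non-linearity and the iterative routing procedure as described by Sabour et al. [9]. It is an illustrative assumption, not the implementation used in this chapter: the function names, the nested-list representation of prediction vectors (`u_hat[i][j]` is the prediction of lower capsule `i` for upper capsule `j`), and the default of three iterations are choices made for the example.

```python
import math


def squash(v):
    """Capsule non-linearity: keeps the direction of v, maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in v)
    if sq_norm == 0.0:
        return [0.0 for _ in v]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement (illustrative sketch).

    u_hat[i][j] is the prediction vector from lower capsule i for upper capsule j.
    Returns the output vectors v_j of the upper capsules.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits, start uniform
    v = []
    for _ in range(iterations):
        # Coupling coefficients: each lower capsule distributes its output over upper capsules.
        c = [softmax(b[i]) for i in range(n_lower)]
        v = []
        for j in range(n_upper):
            # Weighted sum of predictions, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)]
            v.append(squash(s_j))
        # Agreement step: raise logits where a prediction points the same way as the output.
        for i in range(n_lower):
            for j in range(n_upper):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v
```

For instance, if two lower capsules both predict `[1.0, 0.0]` for the first upper capsule but contradict each other (`[0.0, 1.0]` versus `[0.0, -1.0]`) for the second, routing strengthens the coupling to the first capsule, whose output grows in length, while the second output stays near zero. The output length, not a single scalar, encodes how strongly an entity is present, and its direction encodes the entity's pose.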
Table 2.1 summarizes the techniques used in related studies on one-shot learning. As the table shows, most studies in recent years have used capsule-based techniques, possibly because capsule networks generalize better with small datasets.
In this chapter, we design a Siamese network similar to that of Koch et al. [7], but with modifications to accommodate the more complex details captured by capsules. A Siamese network is a twin network: it takes two images as input and feeds them to two weight-sharing subnetworks. Our contributions in this chapter include exploring the applicability of capsules in Siamese networks, introducing a novel architecture to handle the intricate details of capsule output, and integrating recent advancements in deep capsule networks [33, 34] into Siamese networks.
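The weight-sharing idea can be sketched in a few lines. In this hypothetical example, `encode` stands in for the twin feature extractor (in this chapter, convolutional and capsule layers) and is reduced to a single linear layer with ReLU. The merge step is assumed to be an element-wise absolute difference followed by a fully connected layer and a sigmoid, in the spirit of Koch et al. [7]; the exact vector difference function used here may differ, and all names and weights are illustrative.

```python
import math


def encode(image, weights):
    """Hypothetical twin feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in weights]


def siamese_probability(x_i, x_j, enc_weights, fc_weights, fc_bias):
    """Probability that x_i and x_j belong to the same category (sketch)."""
    h_i = encode(x_i, enc_weights)  # the SAME weights ...
    h_j = encode(x_j, enc_weights)  # ... are applied to both inputs (weight sharing)
    # Assumed merge: element-wise absolute difference, which makes the score symmetric.
    diff = [abs(a - b) for a, b in zip(h_i, h_j)]
    logit = sum(w * d for w, d in zip(fc_weights, diff)) + fc_bias
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid maps the logit to (0, 1)
```

Because both branches share `enc_weights`, two identical images always produce identical features, and swapping the inputs leaves the score unchanged. With negative `fc_weights`, larger feature differences push the probability down, which is the behavior a verification model needs.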
Table 2.1 Comparison of related studies.
Related work | Bayesian network | Neural network | Siamese neural network | Capsule network |
---|---|---|---|---|
Lake et al. [6] | X | |||
Koch et al. [7] | X | X | ||
Hinton et al. [11] | X | |||
Bertinetto et al. [24] | X | |||
Chen et al. [13] | X | |||
Fei-Fei et al. [4] | X | |||
Lie et al. [15] | X | |||
Bromley et al. [28] | X | |||
Kumar et al. [31] | X | |||
Zhao et al. [32] | X | |||
Sabour et al. [9] | X | |||
Sethy et al. [12] | X |
2.3 System Design
In this research, we define character image classification as a subproblem of character verification and develop an image verification model that learns a function $F$, as shown in Equation (2.1),

$$P_{i,j} = F(X_i, X_j) \tag{2.1}$$

which gives the probability $P_{i,j}$ that two images, $X_i$ and $X_j$, belong to the same category. We expect the model to learn a general representation of images that can be applied to unseen data without any further training. After fine-tuning the model for the verification task, we use a one-shot learning approach to classify images, as explained in Section 2.3.1.
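One common way to turn a verification function $F$ into a one-shot classifier, used by Koch et al. [7], is to compare the query image against one example per class and pick the class with the highest $P_{i,j}$. The sketch below assumes that scheme; the procedure described in Section 2.3.1 may differ in detail. `one_shot_classify` and `toy_verify` are hypothetical names, and `toy_verify` is only a stand-in for the trained Siamese model.

```python
def one_shot_classify(query, support, verify):
    """N-way one-shot classification with a verification model (sketch).

    support: dict mapping each class label to its single example image.
    verify:  function (x_i, x_j) -> P_ij, probability that both images share a class.
    Returns the label whose example the model rates most similar to the query.
    """
    return max(support, key=lambda label: verify(query, support[label]))


def toy_verify(a, b):
    """Hypothetical stand-in for F: higher score for closer feature vectors."""
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, `one_shot_classify([0.9, 0.1], {"A": [1.0, 0.0], "B": [0.0, 1.0]}, toy_verify)` selects `"A"`. No retraining is needed to add a new class: supplying one labeled example in `support` is enough, which is what makes the verification formulation suitable for unseen data.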
We propose a Siamese architecture comprising a twin capsule network and a fully connected network for character verification. As shown in Figure 2.1, this architecture consists mainly of a weight-sharing twin network, a vector difference layer, and a fully connected network. The Siamese network takes two input images, runs them through a feature extraction process and comparison layers, and finally outputs the probability that they belong to the same category. The twin network starts with a convolutional layer and has four capsule layers before the fully connected entity capsule layer. The results from the twin network are merged using a vector difference function and input to the