Machine Vision Inspection Systems, Machine Learning-Based Approaches. Group of authors
Keywords: Character recognition, capsule networks, deep learning, one-shot learning, Sinhala dataset
2.1 Introduction
The ability to learn visual concepts from a small number of examples is a distinctive feature of human cognition. For instance, even a child can correctly distinguish between a bicycle and a car after being shown one example of each. Taking this one step further, if we show them a plane and a ship, which they have never seen before, they can correctly understand that these are two different vehicle types. One could argue that this ability is an application of previous experience and domain knowledge to new situations. How could we reproduce this same ability in machines? In this chapter, we propose a method to transfer previously learned knowledge about characters to differentiate between new character images.
Image classification using few training samples has versatile applications [1–3]. Being able to classify images without any previous training is of great importance in situations such as character recognition, signature verification, and robot vision. This paradigm, where only one sample is used to learn and make predictions, is known as one-shot learning [4]. For low-resource languages in particular, currently available deep learning techniques fail due to the lack of large labeled datasets. If a model could perform one-shot learning for an alphabet, using a single image as the training sample for classification, it could have a massive impact on optical character recognition [5].
This chapter uses the Omniglot dataset [6] to train such a one-shot learning model. Omniglot, named after the online encyclopedia of writing systems and languages, is a dataset of handwritten characters that is widely used in tasks that need a small number of data samples belonging to many classes. In this research, we extend the dataset by introducing a set of characters from the Sinhala language, which has around 17 million native speakers and is used mainly in Sri Lanka. Due to the lack of available resources for the language, applying novel deep learning-based Optical Character Recognition (OCR) methods is challenging. With the trained model introduced in this chapter, significant character recognition accuracy was achieved for the Sinhala language using a small dataset.
Character detection using one-shot learning has been addressed previously by researchers such as Lake et al. [6], using a generative character model, and Koch et al. [7], using Convolutional Neural Networks (CNNs). In this study, we focus on using capsule networks integrated into a Siamese network [8] to learn a generalized abstract function that outputs the similarity of two images. Capsule networks are a recent advancement in the computer vision domain, and they possess several advantages over traditional convolutional layers [9].
Translation invariance, that is, the inability to identify the position of one object relative to another, is a main shortcoming of convolutional layers compared to capsules [10]. Furthermore, the use of global pooling in CNNs causes a loss of valuable information. Hinton et al. [11] proposed capsule networks as a solution to these problems. In this study, by using a capsule-based network architecture, we achieve the same level of performance as the deep convolutional Siamese networks proposed in previous literature, but with a smaller number of parameters.
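The information loss caused by global pooling can be illustrated with a minimal sketch. The two feature maps below are hypothetical, and the helper names `global_max_pool` and `argmax_position` are introduced only for this illustration. Both maps contain the same activation at different positions, yet global max pooling returns the same value for each, so the position is no longer recoverable from the pooled output.

```python
def global_max_pool(feature_map):
    """Collapse a 2-D feature map to its single largest activation."""
    return max(max(row) for row in feature_map)


def argmax_position(feature_map):
    """Return the (row, col) of the largest activation."""
    best = max(
        (value, r, c)
        for r, row in enumerate(feature_map)
        for c, value in enumerate(row)
    )
    return best[1], best[2]


# Hypothetical 4x4 feature maps: the same strong activation (9) sits in
# the top-left corner of one map and the bottom-right corner of the other.
top_left = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
bottom_right = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 9],
]

pooled_a = global_max_pool(top_left)
pooled_b = global_max_pool(bottom_right)

# The pooled values are identical although the positions differ.
print(pooled_a, argmax_position(top_left))
print(pooled_b, argmax_position(bottom_right))
```

Any layer that sees only the pooled values cannot tell these two inputs apart, which is the kind of spatial information the text above describes as lost.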
The main contributions of the study are as follows:
Propose a novel capsule-based Siamese network architecture to perform one-shot learning,
Improve the energy function of the Siamese network to capture the complex information output by capsules,
Evaluate and analyse the performance of the model in identifying previously unseen characters,
Extend the Omniglot dataset by adding new characters from the Sinhala language.
The chapter is structured as follows. Section 2.2 explores related learning techniques. Section 2.3 describes the design and implementation aspects of the proposed capsule layer-based Siamese network. Section 2.4 evaluates the methodology using several experiments and analyzes the results. Section 2.5 discusses the contribution of the proposed solution in relation to existing studies and concludes the chapter.
2.2 Background Study
2.2.1 Convolutional Neural Networks
Convolutional neural networks have been commonly used in computer vision research and applications [12] due to their ability to process a large amount of data and extract meaningful and powerful representations from it [13–15]. Before the era of CNNs, computer vision tasks largely relied on handcrafted features and mathematical modeling. A large number of applications relied on features such as Gabor wavelets [16–18], fractal dimensions [19–21], and symmetric axis chords [22].
However, when it comes to handwritten character classification for low-resource languages, this ability of deep neural networks to exploit large amounts of data becomes more of a limitation, as little labeled training data is available.
An ideal solution for handwritten character recognition should be based on zero-shot learning, where no previous sample is used for classification, or one-shot learning, where only one or a few samples are used for training [23]. Several attempts have been made to modify different deep neural networks to match the requirements of one-shot learning [24–26].
2.2.2 Related Studies on One-Shot Learning
Initial attempts at one-shot learning in the computer vision domain were based on probabilistic approaches. In 2003, Fei-Fei et al. [4] introduced a model that learns visual concepts and then uses that knowledge to learn new categories. They used a variational Bayesian framework, in which probabilistic models represent the object categories and a probability density function denotes the prior knowledge. Their model learned four visual concepts: human faces, aeroplanes, motorcycles, and spotted cats. Initially, abstract knowledge is learned by training on many samples belonging to three of the categories. This knowledge is then used to understand the remaining category with the help of a small number of examples (1 to 5 training examples).
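The role of a prior density in learning from very few examples can be sketched with a much simpler model than the variational framework described above. The following is not the method of Fei-Fei et al. [4]; it is a textbook conjugate Gaussian update for a single scalar parameter, and the function name `gaussian_posterior` and all numbers are assumptions made for illustration. It shows how a prior and a handful of observations combine into a posterior estimate.

```python
def gaussian_posterior(prior_mean, prior_var, noise_var, observations):
    """Update a Gaussian prior on an unknown mean.

    Assumes each observation equals the unknown mean plus Gaussian
    noise with known variance `noise_var`.
    Returns (posterior_mean, posterior_variance).
    """
    n = len(observations)
    # Precisions (inverse variances) of prior and data add up.
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + sum(observations) / noise_var) / precision
    return mean, 1.0 / precision


# Hypothetical numbers: a prior centred at 0.0 (standing in for knowledge
# from earlier categories) and a new category whose examples lie near 2.0.
one_shot = gaussian_posterior(0.0, 1.0, 1.0, [2.0])
five_shot = gaussian_posterior(0.0, 1.0, 1.0, [2.0] * 5)

print(one_shot)   # estimate after a single example
print(five_shot)  # estimate after five examples
```

With one example the estimate lies halfway between the prior and the observation. With five examples it moves closer to the data and the posterior variance shrinks, which mirrors the 1 to 5 example regime mentioned in the text.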
More recently, neural networks have emerged as a solution to the one-shot learning problem. The two main types of networks used in one-shot learning tasks are memory-augmented neural networks [26, 27] and Siamese neural networks [7, 24, 28]. Memory-augmented neural networks are similar to recurrent neural networks (RNNs), but they have an external memory and try to separate computation from memory [29]. Siamese networks have two similar network branches, and the outputs of those branches are compared to reach a decision on the one-shot task. Most of the time, Siamese network branches are built from convolutional layers or fully connected layers.
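The two-branch structure of a Siamese network can be sketched in a few lines. This is a toy illustration under stated assumptions, not the architecture proposed in this chapter: the weights in `SHARED_WEIGHTS` are made-up constants rather than trained values, each branch is a single linear layer, and the comparison is an L1 distance mapped to a score with `exp(-d)`, which is one of several possible choices.

```python
import math

# Hypothetical shared weights: both branches use this same matrix,
# which is what makes the two branches identical twins.
SHARED_WEIGHTS = [
    [0.5, -0.2, 0.1, 0.0],
    [0.0, 0.3, -0.4, 0.2],
]


def embed(image):
    """One branch: map a flattened 4-pixel image to a 2-D embedding."""
    return [sum(w * x for w, x in zip(row, image)) for row in SHARED_WEIGHTS]


def l1_distance(a, b):
    """Sum of absolute differences between two embeddings."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity(image_a, image_b):
    """Compare the two branch outputs; the score lies in (0, 1]."""
    return math.exp(-l1_distance(embed(image_a), embed(image_b)))


image_a = [1, 0, 0, 1]
image_b = [0, 1, 1, 0]

print(similarity(image_a, image_a))  # identical inputs give 1.0
print(similarity(image_a, image_b))  # different inputs give a lower score
```

Because both inputs pass through the same `embed` function, the score is symmetric in its arguments, and in a trained network the shared weights would be learned so that images of the same class receive high scores.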
2.2.3 Character Recognition as a One-Shot Task
In 2013, Lake et al. [6] introduced the Omniglot dataset and defined a one-shot learning problem on it as a handwritten character recognition task. Omniglot is a handwritten character dataset similar to the digit dataset MNIST, which stands for Modified National Institute of Standards and Technology database [30]. However, in contrast to MNIST, Omniglot has 1,600 characters belonging to 50 alphabets.
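Character recognition framed as a one-shot task can be sketched as follows: the classifier receives a query image and a support set holding exactly one example per candidate class, and it returns the class whose example is most similar to the query. The 3x3 binary "characters", the function names, and the pixel-counting similarity below are all assumptions for illustration. In practice the similarity would come from a trained model rather than a pixel comparison.

```python
def pixel_similarity(image_a, image_b):
    """Stand-in for a learned similarity: negative count of differing pixels."""
    return -sum(1 for a, b in zip(image_a, image_b) if a != b)


def one_shot_classify(query, support_set, similarity=pixel_similarity):
    """Return the label whose single support example best matches the query."""
    return max(
        support_set,
        key=lambda label: similarity(query, support_set[label]),
    )


# Hypothetical 3x3 binary characters, flattened row by row.
# Each class is represented by exactly one example.
support_set = {
    "vertical_bar": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "horizontal_bar": [0, 0, 0, 1, 1, 1, 0, 0, 0],
    "diagonal": [1, 0, 0, 0, 1, 0, 0, 0, 1],
}

# A vertical bar with one noisy pixel in the bottom-right corner.
query = [0, 1, 0, 0, 1, 0, 0, 1, 1]

predicted = one_shot_classify(query, support_set)
print(predicted)  # vertical_bar
```

With three classes in the support set this is a 3-way one-shot task. The same function works for any number of classes, since it only ranks the support examples by similarity to the query.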