Computational Statistics in Data Science. Группа авторов
Чтение книги онлайн.
Читать онлайн книгу Computational Statistics in Data Science - Группа авторов страница 40
4 Convolutional Neural Networks
4.1 Introduction
A CNN is a modified DNN that is particularly well equipped to handling image data. CNN usually contains not only fully connected layers but also convolutional layers and pooling layers, which make a difference. Image is a matrix of pixel values, which should be flattened to vectors before feeding into DNN as DNN takes a vector as input. However, spatial information might be lost in this process. The convolutional layer can take a matrix or tensor as input and is able to capture the spatial and temporal dependencies in an image.
In the convolutional layer, the weight matrix (kernel) scans over the input image to produce a feature matrix. This process is called convolution operation. The pooling layer operates similar to the convolutional layer and has two types: Max Pooling and Average Pooling. The Max Pooling layer returns the maximum value from the portion of the image covered by the kernel matrix. The Average Pooling layer returns the average of all values covered by the kernel matrix. The convolution and pooling process can be repeated by adding additional convolutional and pooling layers. Deep convolutional networks have been successfully trained and used in image classification problems.
Figure 2 Convolution operation with stride size
.4.2 Convolutional Layer
The convolution operation is illustrated in Figure 2. The weight matrix of the convolutional layer is usually called the kernel matrix. The kernel matrix (
Another commonly used structure in a CNN is the pooling layer, which is good at extracting dominant features from the input. Two main types of pooling operation are illustrated in Figure 3. Similar to a convolution operation, the kernel shifts over the input matrix with a specified stride size. If Max Pooling is applied to the input, the maximum of the covered portion will be taken as the result. If Average Pooling is applied, the mean of the covered portion will be calculated and taken as the result. The example in Figure 3 shows the result of pooling with kernel size that equals
4.3 LeNet‐5
LeNet‐5 is a CNN introduced by LeCun et al. [8]. This is one of the earliest structures of CNNs and was initially introduced to do handwritten digit recognition on the MNIST dataset [9]. The structure is straightforward and simple to understand, and details are shown in Figure 4.
The LeNet‐5 architecture consists of seven layers, where three are convolutional layers, two are pooling layers, and two are fully connected layers. LeNet‐5 takes images of size
Figure 3 Pooling operation with stride size
.LeNet‐5 of LeCun et al. [8].
Source: Modified from LeCun et al. [8].
The first layer (C1) is a convolutional layer, which consists of six kernel matrices of size
Table 1 Connection between input and output matrices in the third layer of LeNet‐5 [8].
Source: LeCun et al. [8].
Indices of output matrices | ||
---|---|---|
1 | 1 |
5
|