Computational Statistics in Data Science - Group of authors


an MLP can have any number of hidden layers. The more hidden layers there are, the more complex the model, and therefore the more difficult it is to train/optimize the weights. The model remains almost exactly the same, except for the insertion of multiple hidden layers between the first hidden layer and the output layer. Values for each node in a given layer are determined in the same way as before, that is, as a nonlinear transformation of the values of the nodes in the previous layer and the associated weights. Training the network via backpropagation is almost exactly the same.
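The layer-by-layer computation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sigmoid activation on hidden layers, the linear output layer, and the helper names `dense`, `sigmoid`, and `mlp_forward` are choices made here, not taken from the text.

```python
import math

def dense(inputs, weights, biases):
    """Weighted sum of the previous layer's values plus a bias, one per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def sigmoid(z):
    """Assumed nonlinearity; any other activation could be substituted."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, layers):
    """Forward pass through a list of (weights, biases) pairs.

    Every layer except the last is treated as a hidden layer and gets the
    nonlinearity; the last layer is left linear (an assumption).
    """
    values = x
    for index, (weights, biases) in enumerate(layers):
        values = dense(values, weights, biases)
        if index < len(layers) - 1:
            values = [sigmoid(v) for v in values]
    return values

# Hypothetical network: 2 inputs -> 2 hidden -> 1 hidden -> 1 output.
layers = [
    ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),  # first hidden layer
    ([[0.0, 0.0]], [0.0]),                   # second hidden layer
    ([[2.0]], [1.0]),                        # output layer
]
output = mlp_forward([1.0, -1.0], layers)
```

Adding another hidden layer only means adding another `(weights, biases)` pair to the list; the loop itself does not change, which mirrors the statement that the model stays almost exactly the same.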

4 Convolutional Neural Networks

4.1 Introduction

A CNN is a modified DNN that is particularly well suited to handling image data. A CNN usually contains not only fully connected layers but also convolutional layers and pooling layers, and these are what distinguish it from a plain DNN. An image is a matrix of pixel values, which must be flattened into a vector before being fed into a DNN, because a DNN takes a vector as input. However, spatial information might be lost in this process. A convolutional layer can take a matrix or tensor as input and is able to capture the spatial and temporal dependencies in an image.

In the convolutional layer, the weight matrix (kernel) scans over the input image to produce a feature matrix. This process is called the convolution operation. The pooling layer operates similarly to the convolutional layer and has two types: Max Pooling and Average Pooling. The Max Pooling layer returns the maximum value from the portion of the image covered by the kernel matrix. The Average Pooling layer returns the average of all values covered by the kernel matrix. The convolution and pooling process can be repeated by adding further convolutional and pooling layers. Deep convolutional networks have been successfully trained and used in image classification problems.

Figure 2 Convolution operation with stride size 1.

4.2 Convolutional Layer

The convolution operation is illustrated in Figure 2. The weight matrix of the convolutional layer is usually called the kernel matrix. The kernel matrix shifts over the input matrix and, at each position, performs elementwise multiplication between the kernel matrix and the covered portion of the input matrix and sums the products; each sum becomes one entry of the feature matrix. The stride of the kernel matrix determines the amount of movement in each step. In the example in Figure 2, the stride size is 1, so the kernel matrix moves one unit in each step. In total, the kernel matrix takes 9 positions, resulting in a 3 × 3 feature matrix. The stride size does not have to be 1, and a larger stride size means fewer shifts.
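A minimal sketch of the sliding-kernel computation described above, in plain Python. The function name `conv2d` and the example sizes (a 5 × 5 input and a 3 × 3 kernel) are assumptions chosen so that the kernel takes 9 positions at stride 1; they are not claimed to be the sizes used in Figure 2. As is usual in CNNs, the kernel is applied without flipping and without padding.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image`; each output entry is the sum of the
    elementwise products between the kernel and the covered patch."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    features = []
    for top in range(0, n_rows - k_rows + 1, stride):
        feature_row = []
        for left in range(0, n_cols - k_cols + 1, stride):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[top + i][left + j] * kernel[i][j]
            feature_row.append(total)
        features.append(feature_row)
    return features

# Hypothetical 5 x 5 input holding the values 0..24, and a 3 x 3 kernel of ones.
image = [[5 * r + c for c in range(5)] for r in range(5)]
kernel = [[1, 1, 1] for _ in range(3)]

stride_1 = conv2d(image, kernel, stride=1)  # 3 x 3 feature matrix (9 positions)
stride_2 = conv2d(image, kernel, stride=2)  # 2 x 2 feature matrix (4 positions)
```

Comparing `stride_1` and `stride_2` shows the effect noted in the text: a larger stride means fewer kernel positions and a smaller feature matrix.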

Another commonly used structure in a CNN is the pooling layer, which is good at extracting dominant features from the input. The two main types of pooling operation are illustrated in Figure 3. As in a convolution operation, the kernel shifts over the input matrix with a specified stride size. If Max Pooling is applied to the input, the maximum of the covered portion is taken as the result. If Average Pooling is applied, the mean of the covered portion is calculated and taken as the result. The example in Figure 3 shows the result of pooling an input matrix with the kernel shown there and a stride of 1.
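The two pooling operations differ only in how each covered window is reduced to a single number, which the following plain-Python sketch makes explicit. The helper names `pool2d`, `max_pool`, and `avg_pool`, and the 3 × 3 example input with a 2 × 2 kernel, are illustrative assumptions rather than the values from Figure 3.

```python
def pool2d(matrix, size, stride, reduce_window):
    """Slide a size x size window over `matrix` and reduce each window to one value."""
    pooled = []
    for top in range(0, len(matrix) - size + 1, stride):
        pooled_row = []
        for left in range(0, len(matrix[0]) - size + 1, stride):
            window = [matrix[top + i][left + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce_window(window))
        pooled.append(pooled_row)
    return pooled

def max_pool(matrix, size, stride):
    """Max Pooling: keep the largest value in each window."""
    return pool2d(matrix, size, stride, max)

def avg_pool(matrix, size, stride):
    """Average Pooling: keep the mean of each window."""
    return pool2d(matrix, size, stride, lambda window: sum(window) / len(window))

# Hypothetical 3 x 3 input, pooled with a 2 x 2 kernel and stride 1.
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
max_result = max_pool(matrix, size=2, stride=1)  # [[5, 6], [8, 9]]
avg_result = avg_pool(matrix, size=2, stride=1)  # [[3.0, 4.0], [6.0, 7.0]]
```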

4.3 LeNet‐5

LeNet‐5 is a CNN introduced by LeCun et al. [8]. This is one of the earliest structures of CNNs and was initially introduced to do handwritten digit recognition on the MNIST dataset [9]. The structure is straightforward and simple to understand, and details are shown in Figure 4.

The LeNet‐5 architecture consists of seven layers, of which three are convolutional layers, two are pooling layers, and two are fully connected layers. LeNet‐5 takes images of size 32 × 32 as input and outputs a 10‐dimensional vector of predicted scores, one for each class.

Figure 3 Pooling operation with stride size 1.

Figure 4 LeNet‐5 of LeCun et al. [8].

Source: Modified from LeCun et al. [8].

The first layer (C1) is a convolutional layer, which consists of six kernel matrices of size 5 × 5 and stride 1. Each of the kernel matrices scans over the input image and produces a feature matrix of size 28 × 28. Therefore, six different kernel matrices produce six different feature matrices. The second layer (S2) is a Max Pooling layer, which takes the six 28 × 28 feature matrices as input. The kernel size of this pooling layer is 2 × 2, and the stride size is 2. Therefore, the outputs of this layer are six 14 × 14 feature matrices.
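The sizes quoted for C1 and S2 follow from the usual size rule for a kernel applied without padding: output size = (input size - kernel size) / stride + 1. The short check below applies that rule to the first two layers; the function name `output_size` is an assumption introduced here, and the 32 × 32 input is the standard LeNet‐5 input size.

```python
def output_size(input_size, kernel_size, stride):
    """Side length of the output when a square kernel slides over a square
    input with the given stride and no padding."""
    return (input_size - kernel_size) // stride + 1

input_side = 32                                            # LeNet-5 input images are 32 x 32
c1_side = output_size(input_side, kernel_size=5, stride=1)  # C1: 5 x 5 kernels, stride 1
s2_side = output_size(c1_side, kernel_size=2, stride=2)     # S2: 2 x 2 pooling, stride 2

print(f"C1 feature matrices: {c1_side} x {c1_side}")
print(f"S2 feature matrices: {s2_side} x {s2_side}")
```

The number of feature matrices (six) is set by the number of kernels in C1 and is unchanged by pooling, since S2 pools each feature matrix separately.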

Table 1 Connection between input and output matrices in the third layer of LeNet‐5 [8].

Source: LeCun et al. [8].

	Indices of output matrices

1	1	5
