l"/> can be any loss function that evaluates the distance between delta left-parenthesis bold-italic x Subscript i Baseline comma bold-italic theta Subscript script upper M Baseline right-parenthesis and y Subscript i, such as cross‐entropy loss and square loss.

      2.3 Gradient Descent

      The form of the function $\delta$ will usually be fairly complex, so attempting to find $\delta^*(\boldsymbol{X}, \boldsymbol{\theta}_{\mathcal{M}})$ via direct differentiation will not be feasible. Instead, we use gradient descent to minimize the error function.

      Gradient descent is a general optimization algorithm for finding a minimizer of a given differentiable function. We pick an arbitrary starting point, and then at each iteration we take a small step in the direction of steepest decrease, which is given by the negative of the gradient. The idea is that if we repeat this procedure, we will eventually arrive at a minimum. The algorithm guarantees a local minimum, but not necessarily a global one [4]; see Algorithm 1.

      [Algorithm 1: Gradient descent]
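
      Since the graphic for Algorithm 1 is not reproduced in this excerpt, the following is a minimal Python sketch of the generic gradient-descent update described above; the step size, stopping tolerance, iteration cap, and the toy objective in the usage example are illustrative assumptions rather than details from the text.

import numpy as np

def gradient_descent(grad, theta0, step=0.01, tol=1e-6, max_iter=10000):
    # grad: function returning the gradient of the objective at theta
    # theta0: arbitrary starting point
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        theta_new = theta - step * g                  # step against the gradient (steepest decrease)
        if np.linalg.norm(theta_new - theta) < tol:   # stop once updates become negligible
            return theta_new
        theta = theta_new
    return theta

# Example: minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta_min = gradient_descent(lambda t: 2 * (t - 3), theta0=[0.0])
# theta_min is approximately [3.0], a local (here also global) minimum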

      3.1 Introduction

      A feedforward neural network, also known as a multilayer perceptron (MLP), is a popular supervised learning method that provides a parameterized form for the nonlinear map $\delta$ from an input to a predicted label [6]. The form of $\delta$ here can be depicted graphically as a directed layered network, where the directed edges go upward from nodes in one layer to nodes in the next layer. Neural networks have been shown to be very powerful models, as they can approximate any Borel measurable function to an arbitrary degree of accuracy, provided that the parameters are chosen appropriately.

      3.2 Model Description

      The bottom layer of a three-layer MLP is called the input layer, with each node representing the respective element of an input vector. The top layer is known as the output layer and represents the final output of the model, a predicted vector; each node in the output layer represents the predicted score of one of the classes. The middle layer is called the hidden layer and captures the unobserved latent features of the input. This is the only layer whose number of nodes is determined by the user of the model rather than by the problem itself.

      The directed edges in the network represent weights from a node in one layer to a node in the next layer. We denote the weight from a node $x^i$ in the input layer to a node $h^j$ in the hidden layer by $w_{ij}$. The weight from a node $h^j$ in the hidden layer to a node $\hat{y}^k$ in the output layer will be denoted $v_{jk}$. In each of the input and hidden layers, we introduce intercept nodes, denoted $x^0$ and $h^0$, respectively. Weights from them to any other node are called biases. Each node in a given layer is connected by a weight to every node in the layer above except the intercept node.

      The value of each node in the hidden and output layers is determined as a nonlinear transformation of the linear combination of the values of the nodes in the previous layer and the weights from each of those nodes to the node of interest. That is, the value of $h^j$, $j = 1, \ldots, m$, is given by $\gamma(\boldsymbol{w}_j^T \boldsymbol{x})$, where $\boldsymbol{w}_j = (w_{0j}, \ldots, w_{pj})^T$, $\boldsymbol{x} = (1, x^1, \ldots, x^p)^T$, and $\gamma(\cdot)$ is a nonlinear transformation with range in the interval $(0, 1)$. Similarly, the value of $\hat{y}^k$, $k = 1, \ldots, c$, is given by $\tau(\boldsymbol{v}_k^T \boldsymbol{h})$, where $\boldsymbol{v}_k = (v_{0k}, \ldots, v_{mk})^T$, $\boldsymbol{h} = (1, h^1, \ldots, h^m)^T$, and

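      To make the notation above concrete, here is a minimal Python sketch of the forward pass it describes. The logistic sigmoid stands in for $\gamma$ and, because the excerpt is cut off before $\tau$ is defined, also for $\tau$; that choice, the weight-matrix layout, and the example dimensions are assumptions for illustration only.

import numpy as np

def sigmoid(z):
    # nonlinear transformation with range (0, 1), playing the role of gamma
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W, V):
    # x: input vector of length p
    # W: (p + 1) x m matrix; column j holds w_j = (w_0j, ..., w_pj)^T
    # V: (m + 1) x c matrix; column k holds v_k = (v_0k, ..., v_mk)^T
    x_aug = np.concatenate(([1.0], x))   # prepend intercept node x^0 = 1
    h = sigmoid(W.T @ x_aug)             # hidden-layer values h^1, ..., h^m
    h_aug = np.concatenate(([1.0], h))   # prepend intercept node h^0 = 1
    y_hat = sigmoid(V.T @ h_aug)         # output-layer scores for the c classes
    return y_hat

# Example with p = 3 inputs, m = 4 hidden nodes, c = 2 output classes
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))              # (p + 1) x m
V = rng.normal(size=(5, 2))              # (m + 1) x c
print(mlp_forward(np.array([0.5, -1.0, 2.0]), W, V))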