
alt="tau left-parenthesis dot right-parenthesis"/> is also a nonlinear transformation with a range in the interval left-parenthesis 0 comma 1 right-parenthesis.

      More formally, the map $\delta$ provided by an MLP from a sample $\boldsymbol{x}_i$ to $\hat{y}_i$ can be written as follows:

$$\delta(\boldsymbol{x}_i, \boldsymbol{\theta}_{\mathcal{M}}) = \hat{y}_i = \tau\!\left(\boldsymbol{V}^{T} \gamma\!\left(\boldsymbol{W}^{T} \boldsymbol{x}_i\right)\right)$$

      where $\boldsymbol{V} = (\boldsymbol{v}_0, \ldots, \boldsymbol{v}_m)$, $\boldsymbol{W} = (\boldsymbol{w}_0, \ldots, \boldsymbol{w}_m)$, $\boldsymbol{x}_i = (x_i^0, x_i^1, \ldots, x_i^p)$, and $\tau(\cdot)$ and $\gamma(\cdot)$ are nonlinear functions.
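      To make this mapping concrete, the following NumPy sketch implements the forward pass for a single sample, assuming a logistic sigmoid for both $\tau(\cdot)$ and $\gamma(\cdot)$; the layer sizes, the bias-augmented input, and the function names are illustrative assumptions rather than choices made in the text.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: a nonlinear transformation with range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W, V, gamma=sigmoid, tau=sigmoid):
    """One-hidden-layer MLP map: y_hat = tau(V^T gamma(W^T x)).

    x : input vector of length p + 1 (first entry fixed to 1 for the bias)
    W : (p + 1, m + 1) hidden-layer weights, columns w_0, ..., w_m
    V : (m + 1, k) output-layer weights
    """
    hidden = gamma(W.T @ x)   # hidden-layer activations, length m + 1
    return tau(V.T @ hidden)  # predicted output y_hat, length k

# Illustrative usage: p = 3 inputs, m + 1 = 5 hidden units, k = 1 output
rng = np.random.default_rng(0)
x_i = np.concatenate(([1.0], rng.normal(size=3)))  # x_i = (1, x_i^1, ..., x_i^p)
W = rng.normal(size=(4, 5))
V = rng.normal(size=(5, 1))
print(mlp_forward(x_i, W, V))
```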


      3.3 Training an MLP

      We want to choose the weights and biases in such a way that they minimize the total loss within a given dataset. Similar to the general supervised learning approach, we want to find an optimal prediction $\delta^{*}(\boldsymbol{X}, \boldsymbol{W}, \boldsymbol{V})$ that minimizes $\sum_{i=1}^{n} \ell(\hat{y}_i, y_i)$,

      where $\boldsymbol{X} = (\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n)$, and $\ell(\cdot, \cdot)$ is the cross-entropy loss:

      (2) $$\ell(\hat{y}_i, y_i) = -\sum_{c=1}^{m} y_{i,c} \log \hat{y}_{i,c}$$

      where $m$ is the total number of classes; $y_{i,c} = 1$ if the $i$th sample belongs to class $c$, and $y_{i,c} = 0$ otherwise; and $\hat{y}_{i,c}$ is the predicted score of the $i$th sample belonging to class $c$.
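      As a quick illustration of Equation (2), the snippet below evaluates the cross-entropy loss for a single sample from a one-hot label vector and a vector of predicted class probabilities; the example values are made up for illustration.

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Cross-entropy loss of Equation (2): -sum_c y_c * log(y_hat_c).

    y_hat : predicted probabilities over the m classes (should sum to 1)
    y     : one-hot encoded true label (y_c = 1 for the correct class, else 0)
    """
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(y_hat + eps))

# Example with m = 3 classes; the true class is c = 2
y = np.array([0.0, 1.0, 0.0])
y_hat = np.array([0.2, 0.7, 0.1])
print(cross_entropy(y_hat, y))  # -log(0.7), roughly 0.357
```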

      We would like to address the issue of possibly being trapped in local minima: backpropagation is a direct application of gradient descent to neural networks, and gradient descent is prone to finding local minima, especially in high-dimensional spaces. In practice, however, backpropagation typically does not get stuck in local minima and generally reaches the global minimum. There do exist pathological data examples for which backpropagation will not converge to the global minimum, so such convergence is by no means guaranteed. Why backpropagation does in fact generally converge to the global minimum, and under what conditions it will do so, remains largely a theoretical mystery, although some theoretical results have been developed to address this question. In particular, Gori and Tesi [7] established that for linearly separable data, backpropagation will always converge to the global solution.
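      Since backpropagation is simply gradient descent driven by the chain rule, the minimal NumPy sketch below performs repeated gradient-descent updates for the one-hidden-layer network above, using a sigmoid hidden layer, a softmax output, and the cross-entropy loss of Equation (2); the layer sizes, learning rate, and helper names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def gradient_step(x, y, W, V, lr=0.1):
    """One backpropagation (gradient-descent) update for a single sample."""
    # Forward pass
    h = sigmoid(W.T @ x)       # hidden activations
    y_hat = softmax(V.T @ h)   # predicted class probabilities

    # Backward pass (chain rule)
    dz2 = y_hat - y            # gradient of cross-entropy w.r.t. output pre-activations
    dV = np.outer(h, dz2)      # gradient w.r.t. output weights V
    dh = V @ dz2               # propagate the error back to the hidden layer
    dz1 = dh * h * (1.0 - h)   # through the sigmoid derivative
    dW = np.outer(x, dz1)      # gradient w.r.t. hidden weights W

    # Gradient-descent update
    return W - lr * dW, V - lr * dV

# Illustrative run: 4-dimensional input, 5 hidden units, 3 classes
rng = np.random.default_rng(1)
x = rng.normal(size=4)
y = np.array([0.0, 0.0, 1.0])       # true class is the third one
W, V = rng.normal(size=(4, 5)), rng.normal(size=(5, 3))
for _ in range(100):
    W, V = gradient_step(x, y, W, V)
print(softmax(V.T @ sigmoid(W.T @ x)))  # probability mass shifts toward the true class
```

      With a single training point the loss surface is simple and the updates settle quickly; on real data the same loop runs over all samples or mini-batches, which is where the convergence questions discussed above become relevant.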
