
$$
\frac{\partial \ell^{(T)}}{\partial \boldsymbol{W}_{1f}}
= \sum_{t=0}^{T} \frac{\partial \ell^{(T)}}{\partial \boldsymbol{h}^{(T)}}\,
\frac{\partial \boldsymbol{h}^{(T)}}{\partial \boldsymbol{c}^{(T)}}
\left( \prod_{j=t+1}^{T} \boldsymbol{f}^{(j)} + A^{(j)} \right)
\frac{\partial \boldsymbol{c}^{(t)}}{\partial \boldsymbol{W}_{1f}}
$$

      where $A^{(t)}$ represents the other terms in the partial derivative calculation. Since the sigmoid function is used when calculating the values of $\boldsymbol{i}^{(t)}$, $\boldsymbol{f}^{(t)}$, and $\boldsymbol{o}^{(t)}$, this implies that they will be close to either 0 or 1. When $\boldsymbol{f}^{(t)}$ is close to 1, the gradient does not vanish, and when it is close to 0, it means that the previous information is not useful for the current state and should be forgotten.
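      To make the role of the forget gate concrete, the following short NumPy sketch (an illustration of ours, not code from the chapter; the variable names and the chosen pre-activation values are assumptions) compares the product $\prod_{j=t+1}^{T} \boldsymbol{f}^{(j)}$ when the forget gates saturate near 1 with the case when they saturate near 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
T = 50  # number of time steps the gradient must travel through

# Forget-gate pre-activations pushed toward large positive values, so f^(j) is near 1.
f_open = sigmoid(rng.normal(loc=5.0, scale=0.5, size=T))
# Pre-activations pushed toward large negative values, so f^(j) is near 0.
f_closed = sigmoid(rng.normal(loc=-5.0, scale=0.5, size=T))

# The product of the gate values plays the role of prod_{j=t+1}^{T} f^(j) in the gradient.
print("gates near 1:", np.prod(f_open))    # decays slowly; the gradient signal survives
print("gates near 0:", np.prod(f_closed))  # essentially 0; earlier information is forgotten
```

      When the gates stay open, the product decays only slowly with the gap $T - t$, so gradients from distant time steps still contribute; when the gates close, the product collapses to essentially 0, which corresponds to deliberately discarding the earlier cell state rather than to an uncontrolled vanishing gradient.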

      We discussed the architectures of four types of neural networks and their extensions in this chapter. Many other neural networks have been proposed in recent years, but the ones discussed here are the classical architectures that have served as foundations for much subsequent work. Although DNNs have achieved breakthroughs in many fields, their performance on many tasks remains far from perfect. Developing new architectures that improve performance on various tasks or solve new problems is an important research direction, and analyzing the properties and limitations of existing architectures is also of great interest to the community.


       Taiwo Kolajo1,2, Olawande Daramola3, and Ayodele Adebiyi4

       1Federal University Lokoja, Lokoja, Nigeria

       2Covenant University, Ota, Nigeria

       3Cape Peninsula University of Technology, Cape Town, South Africa

       4Landmark University, Omu‐Aran, Kwara, Nigeria
