Read online - Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic. Software. LiveLib

Artificial Intelligence and Quantum Computing for Advanced Wireless Networks - Savo G. Glisic


the neurons of the same layer. The following reasoning can also be extended to positional GNNs and networks with a different number of layers. The function h_w is formally defined in terms of σ_j, a_j, V_j, and t_j

By the chain differentiation rule, we get

A_{n,u} = diag(σ′₃(a₃)) · V₂ · diag(σ′₂(a₂)) · V̄₁

where σ′_j is the derivative of σ_j, diag is an operator that transforms a vector into a diagonal matrix having that vector as its diagonal, and V̄₁ is the submatrix of V₁ that contains only the weights that connect the inputs corresponding to x_u to the hidden layer. The parameters w affect four components of vec(A_{n,u}), that is, a₃, V₂, a₂, and V̄₁. By the properties of derivatives for matrix products and the chain rule,

∂vec(A_{n,u})/∂w = ∂vec(A_{n,u})/∂a₃ · ∂a₃/∂w + ∂vec(A_{n,u})/∂vec(V₂) · ∂vec(V₂)/∂w + ∂vec(A_{n,u})/∂a₂ · ∂a₂/∂w + ∂vec(A_{n,u})/∂vec(V̄₁) · ∂vec(V̄₁)/∂w   (5.86)

holds. Thus, (vec(R_{n,u}))′ · ∂vec(A_{n,u})/∂w is the sum of four contributions. In order to derive a method of computing those terms, let I_a denote the a × a identity matrix. Let ⊗ be the Kronecker product, and suppose that P_a is an a² × a matrix such that vec(diag(v)) = P_a v for any vector v ∈ R^a. By the Kronecker product’s properties, vec(AB) = (B′ ⊗ I_a) · vec(A) holds for matrices A, B, and I_a having compatible dimensions [67]. Thus, we have

which implies

Similarly, using the properties vec(ABC) = (C′ ⊗ A) · vec(B) and vec(AB) = (I_a ⊗ A) · vec(B), it follows that

where d_h is the number of hidden neurons. Then, we have

(5.87) equation

(5.88) equation

(5.89) equation

(5.90) equation

where the aforementioned Kronecker product properties have been used.
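The displayed formulas of this derivation are missing from this copy. What follows is a hedged reconstruction sketch, not the book's own equations. It assumes that A_{n,u} = diag(σ′₃(a₃)) · V₂ · diag(σ′₂(a₂)) · V̄₁, where V̄₁ (written \bar V_1 below) is the submatrix of V₁ acting on x_u, and that A_{n,u} is s × s. The shorthand D_3 and D_2 is introduced here only for readability. Applying the three vec identities quoted in the text under those assumptions gives:

```latex
% Assumed shorthand (not from the source):
%   D_3 = diag(\sigma_3'(a_3))  (s x s),   D_2 = diag(\sigma_2'(a_2))  (d_h x d_h)
\begin{align*}
A_{n,u} &= D_3\, V_2\, D_2\, \bar V_1 \\[4pt]
% vec(AB) = (B' \otimes I) vec(A), with vec(D_3) = P_s \sigma_3'(a_3)
\operatorname{vec}(A_{n,u}) &= \bigl((V_2 D_2 \bar V_1)' \otimes I_s\bigr)\, P_s\, \sigma_3'(a_3)
&&\Longrightarrow&
\frac{\partial \operatorname{vec}(A_{n,u})}{\partial a_3}
  &= \bigl((V_2 D_2 \bar V_1)' \otimes I_s\bigr)\, P_s\, \operatorname{diag}\bigl(\sigma_3''(a_3)\bigr) \\[4pt]
% vec(ABC) = (C' \otimes A) vec(B), with vec(D_2) = P_{d_h} \sigma_2'(a_2)
\operatorname{vec}(A_{n,u}) &= \bigl(\bar V_1' \otimes D_3 V_2\bigr)\, P_{d_h}\, \sigma_2'(a_2)
&&\Longrightarrow&
\frac{\partial \operatorname{vec}(A_{n,u})}{\partial a_2}
  &= \bigl(\bar V_1' \otimes D_3 V_2\bigr)\, P_{d_h}\, \operatorname{diag}\bigl(\sigma_2''(a_2)\bigr) \\[4pt]
% vec(ABC) = (C' \otimes A) vec(B), with B = V_2
&&&&
\frac{\partial \operatorname{vec}(A_{n,u})}{\partial \operatorname{vec}(V_2)}
  &= (D_2 \bar V_1)' \otimes D_3 \\[4pt]
% vec(AB) = (I \otimes A) vec(B), with B = \bar V_1
&&&&
\frac{\partial \operatorname{vec}(A_{n,u})}{\partial \operatorname{vec}(\bar V_1)}
  &= I_s \otimes (D_3 V_2 D_2)
\end{align*}
```

Multiplying each block on the left by (vec(R_{n,u}))′ and on the right by the matching factor ∂a₃/∂w, ∂vec(V₂)/∂w, ∂a₂/∂w, or ∂vec(V̄₁)/∂w yields four terms of the kind the text attributes to Eqs. (5.87)–(5.90). The book's exact arrangement of those terms may differ.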

It follows that (vec(R_{n,u}))′ · ∂vec(A_{n,u})/∂w can be written as the sum of the four contributions given by Eqs. (5.87)–(5.90). The second and the fourth terms, Eqs. (5.88) and (5.90), can be computed directly using the corresponding formulas. The first one can be calculated by observing that [formula missing in source] looks like the function computed by a three‐layered FNN that is the same as h_w except for the activation function of the last layer. In fact, if we denote such a network by [symbol missing in source], then

(5.91) equation

holds, where [formula missing in source]. Similar reasoning applies to the third contribution.
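A small numerical check can make the backpropagation observation concrete. This is a minimal sketch under assumptions that are not taken from the book: tanh stands in for σ₃, the Jacobian block is taken to be A = diag(σ₃′(a₃)) · M with M = V₂ · diag(σ₂′(a₂)) · V̄₁ held fixed, and the modified network is read as one whose output is σ₃′(a₃). All matrices, sizes, and helper names (`kron`, `vec`, `P`, `delta1`) are hypothetical. The sketch compares the row vector (vec R)′ (M′ ⊗ I_s) P_s with the diagonal of R M′, then checks the resulting derivative with respect to a₃ against finite differences.

```python
import math

def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Kronecker product of A and B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def vec(A):
    """Stack the columns of A into one list."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def P(a):
    """a^2 x a selection matrix with vec(diag(v)) = P(a) v."""
    M = [[0.0] * a for _ in range(a * a)]
    for i in range(a):
        M[i * a + i][i] = 1.0
    return M

def d_tanh(x):   # first derivative of tanh, plays the role of sigma_3'
    return 1.0 - math.tanh(x) ** 2

def dd_tanh(x):  # second derivative of tanh, plays the role of sigma_3''
    return -2.0 * math.tanh(x) * d_tanh(x)

# Hypothetical small instance: s = 2 state components, 3 hidden neurons.
s = 2
R = [[0.5, -1.0], [0.0, 2.0]]                               # stands in for R_{n,u}
V2 = [[0.3, -0.2, 0.5], [0.1, 0.4, -0.6]]                   # s x d_h
D2 = [[0.9, 0.0, 0.0], [0.0, 0.7, 0.0], [0.0, 0.0, 0.8]]    # diag(sigma_2'(a_2)), frozen
V1bar = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.6]]              # d_h x s
a3 = [0.3, -0.7]

M = matmul(matmul(V2, D2), V1bar)                           # s x s

# Row vector the long way: (vec R)' (M' kron I_s) P_s
row = matmul([vec(R)], matmul(kron(transpose(M), identity(s)), P(s)))[0]

# Same row vector the short way: the diagonal of R M'
RMt = matmul(R, transpose(M))
delta1 = [RMt[i][i] for i in range(s)]

# Backpropagating delta1 through the last layer of the modified network:
# d(sigma_3'(a_3)) / d(a_3) = diag(sigma_3''(a_3))
grad_a3 = [delta1[i] * dd_tanh(a3[i]) for i in range(s)]

def penalty_term(a):
    """(vec R)' vec(A) with A = diag(sigma_3'(a)) M and M frozen."""
    A = [[d_tanh(a[i]) * M[i][j] for j in range(s)] for i in range(s)]
    return sum(r * x for r, x in zip(vec(R), vec(A)))

def finite_difference(i, eps=1e-6):
    """Central difference of penalty_term with respect to a3[i]."""
    hi = [x + (eps if k == i else 0.0) for k, x in enumerate(a3)]
    lo = [x - (eps if k == i else 0.0) for k, x in enumerate(a3)]
    return (penalty_term(hi) - penalty_term(lo)) / (2 * eps)

fd = [finite_difference(i) for i in range(s)]
print(grad_a3, fd)  # the two lists should agree to several digits
```

Only the derivative with respect to a₃ is checked here. The remaining factor ∂a₃/∂w is what an ordinary backpropagation pass through the network supplies.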

Required number of operations: The above method includes two tasks: the matrix multiplications of Eqs. (5.87)–(5.90) and the backpropagation as defined by Eq. (5.91). The former task consists of several matrix multiplications. By inspection of Eqs. (5.87)–(5.90), the number of floating point operations is approximately 2s² + 12s · hi_h + 10s² · hi_h, where hi_h denotes the number of hidden‐layer neurons implementing the function h. The second task has approximately the same cost as a backpropagation phase through the original function h_w. The estimate for the former task is obtained from the following observations: for an a × b matrix C and a b × c matrix D, the multiplication CD requires approximately 2abc operations; more precisely, abc multiplications and ac(b − 1) sums. If D is a diagonal b × b matrix, then CD requires 2ab operations. Similarly, if C is an a × b matrix, D is a b × a matrix, and P_a is the a² × a matrix defined above and used in Eqs. (5.87)–(5.90), then computing (vec(CD))′ · P_a costs only 2ab operations, provided that a sparse representation is used for P_a. Finally, a₁, a₂, a₃ are already available, since they are computed during the forward phase of the learning algorithm. Thus, the complexity of computing ∂p_w/∂w is [formula missing in source]. Note, however, that even if the sum in Eq. (5.85) ranges over all the arcs of the graph, only those arcs (n, u) such that R_{n,u} ≠ 0 have to be considered. In practice, R_{n,u} ≠ 0 is a rare event, since it happens only when the norm of a column of the Jacobian is larger than μ, and a penalty function was used to limit the occurrence of these cases. As a consequence, a better estimate of the complexity of computing ∂p_w/∂w is O([formula missing in source]), where t_R is the average number of nodes u such that R_{n,u} ≠ 0 holds for some n.
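The operation counts quoted in this paragraph are easy to tabulate. The sketch below only encodes the counts as stated; the helper names (`matmul_ops`, `diag_of_product`, `kronecker_stage_estimate`) and the sample sizes are illustrative and not from the book. It also shows why multiplying vec(CD) by the selection matrix P_a is cheap: only the diagonal of CD is needed, so the full product never has to be formed.

```python
def matmul_ops(a, b, c):
    """Exact count for an (a x b) by (b x c) product:
    a*b*c multiplications plus a*c*(b - 1) additions, roughly 2abc."""
    return a * b * c + a * c * (b - 1)

def diag_of_product(C, D):
    """Diagonal of CD for C (a x b) and D (b x a), without forming CD.
    Returns the diagonal and the number of scalar operations used."""
    a, b = len(C), len(C[0])
    diag = [sum(C[i][k] * D[k][i] for k in range(b)) for i in range(a)]
    ops = a * b + a * (b - 1)  # a*b multiplications, a*(b - 1) additions
    return diag, ops

def kronecker_stage_estimate(s, hi_h):
    """Approximate count quoted in the text for Eqs. (5.87)-(5.90)."""
    return 2 * s**2 + 12 * s * hi_h + 10 * s**2 * hi_h

# Illustrative sizes only.
print(matmul_ops(2, 3, 4), 2 * 2 * 3 * 4)      # 40 exact versus 48 approximate

C = [[1, 2, 3], [4, 5, 6]]
D = [[1, 0], [0, 1], [2, 2]]
print(diag_of_product(C, D))                    # ([7, 17], 10), within 2ab = 12

s, hi_h = 10, 20
total = kronecker_stage_estimate(s, hi_h)
print(total, 10 * s**2 * hi_h / total)          # 22600, with the s^2 * hi_h term near 88%
```

For these sample sizes the 10s² · hi_h term accounts for most of the estimate, so the matrix-multiplication task grows roughly like s² · hi_h.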
