
      1  Instructions b = (∂ew/∂o)(∂Gw/∂x)(x, lN) and c = (∂ew/∂o)(∂Gw/∂w)(x, lN): The terms b and c can be calculated by the backpropagation of ∂ew/∂o through the network that implements gw. Since such an operation must be repeated for each node, the time complexity of instructions b and c is the same for all the GNN models. (A small numerical sketch of these repeated backpropagation calls, together with the computation of d discussed next, is given after the derivation for linear GNNs below.)

      2  Instruction d = z(t) · (∂Fw/∂w)(x, l): By definition of Fw, fw, and BP, we have

(5.92)  z(t)(∂Fw/∂w)(x, l) = Σn∈N zn(t)(∂fw/∂w)(yn) = Σn∈N BP1(fw, zn(t), yn)

      where yn collects the input of fw at node n, that is, the label of n, the labels of its arcs, and the states and labels of its neighbors, and BP1 indicates that we are considering only the first part of the output of BP. Similarly, for nonlinear GNNs,

(5.93)  z(t)(∂Fw/∂w)(x, l) = Σn∈N Σu∈ne[n] zn(t)(∂hw/∂w)(y) = Σ(n, u)∈E BP1(hw, zn(t), y)

      where y = [ln, xu, l(n, u), lu]. These two equations provide a direct method to compute d in positional and nonlinear GNNs, respectively.

      For linear GNNs, let hn, u = An, u xu + bn denote the output of hw and note that

z(t)(∂Fw/∂w)(x, l) = Σn∈N Σu∈ne[n] zn(t)(∂hn, u/∂w) = Σn∈N Σu∈ne[n] zn(t)(∂(An, u xu + bn)/∂w)

      holds, where [An, u]i, j denotes the element in position i, j of the matrix An, u, obtained from the corresponding output of the transition network, [bn]i denotes the i‐th element of the vector bn, given by the corresponding output of the forcing network (see Eq. (5.83)), and [xu]i is the i‐th element of xu. Then

z(t)(∂Fw/∂w)(x, l) = Σn∈N BP1(ρw, δn, ln) + Σ(n, u)∈E BP1(φw, vn, u, y)

      where φw and ρw denote the transition and forcing networks, respectively, y = [ln, l(n, u), lu], δn = ∣ne[n]∣ · zn(t), and vn, u is a vector that stores [zn(t)]i [xu]j in the position corresponding to i, j. Thus, in linear GNNs, d is computed by calling the backpropagation procedure once for each arc and once for each node.
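
      The accumulation of b, c, and d by repeated backpropagation calls can be illustrated with a small numerical sketch. The fragment below is only an illustration of the mechanism, not the implementation of [1]: it assumes that the transition network hw and the output network gw are small PyTorch MLPs, that the graph, the labels, the states x, the error gradient ∂ew/∂o, and z(t) are random stand-ins, and that BP1 is realized as a vector–Jacobian product restricted to the network weights.

# Minimal numerical sketch (not the implementation of [1]) of how b, c, and d are
# accumulated by repeated backpropagation calls, assuming a nonlinear GNN whose
# transition network hw and output network gw are small PyTorch MLPs. All
# dimensions and values are illustrative stand-ins.
import torch

s, l_N, l_E = 4, 3, 2                       # state, node-label, and edge-label dimensions

# hw: input [ln, xu, l(n,u), lu] -> contribution to the new state of node n
h_w = torch.nn.Sequential(torch.nn.Linear(l_N + s + l_E + l_N, 8),
                          torch.nn.Tanh(), torch.nn.Linear(8, s))
# gw: input [xn, ln] -> output on (a scalar output is assumed here)
g_w = torch.nn.Sequential(torch.nn.Linear(s + l_N, 8),
                          torch.nn.Tanh(), torch.nn.Linear(8, 1))

nodes = 5
labels = torch.randn(nodes, l_N)            # node labels ln
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]    # arcs (n, u): u is a neighbor of n
edge_labels = {e: torch.randn(l_E) for e in edges}
x = torch.randn(nodes, s)                   # states after the forward relaxation
de_do = torch.randn(nodes, 1)               # stand-in for the error gradient de_w/do
z = torch.randn(nodes, s)                   # stand-in for z(t)

# Instructions b and c: one backpropagation of de_w/do through gw per node.
g_params = list(g_w.parameters())
b = torch.zeros(nodes, s)
c = [torch.zeros_like(p) for p in g_params]
for n in range(nodes):
    xn = x[n].clone().requires_grad_(True)
    o_n = g_w(torch.cat([xn, labels[n]]))
    grads = torch.autograd.grad(o_n, [xn] + g_params, grad_outputs=de_do[n])
    b[n] = grads[0]                         # row n of (de_w/do)(dGw/dx)
    for acc, g in zip(c, grads[1:]):
        acc += g                            # (de_w/do)(dGw/dw), accumulated over the nodes

# Instruction d for the nonlinear model, Eq. (5.93): one BP1 call per arc, where
# BP1 is a vector-Jacobian product restricted to the weights of hw.
h_params = list(h_w.parameters())
d = [torch.zeros_like(p) for p in h_params]
for n, u in edges:
    y = torch.cat([labels[n], x[u], edge_labels[(n, u)], labels[u]])
    grads = torch.autograd.grad(h_w(y), h_params, grad_outputs=z[n])
    for acc, g in zip(d, grads):
        acc += g                            # z(t)(dFw/dw), accumulated over the arcs

      The per-node and per-arc loops make explicit the linear dependence of these instructions on ∣N∣ and ∣E∣.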

      In positional and nonlinear GNNs, the transition function must be activated itf · ∣N∣ and itf · ∣E∣ times, respectively. Even if such a difference may appear significant, in practice the complexity of the two models is similar, because the network that implements fw is larger than the one that implements hw. In fact, fw has M(s + lE) input neurons, where M is the maximum number of neighbors of a node, whereas hw has only s + lE input neurons. A significant difference can be noticed only for graphs where the number of neighbors of the nodes is highly variable, since the inputs of fw must be sufficient to accommodate the maximum number of neighbors, and many inputs may remain unused when fw is applied. On the other hand, it is observed that in the linear model the FNNs are used only once for each iteration, so that the complexity of each iteration is O(s²∣E∣) instead of O(hih · s · ∣E∣), where the latter holds when hw is implemented by a three‐layered FNN with hih hidden neurons. In practical cases, where hih is often larger than s, the linear model is faster than the nonlinear model. As confirmed by the experiments, such an advantage is offset by the lower accuracy that the linear model usually achieves.
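
      The trade‐off between the two models can be made concrete with a back‐of‐the‐envelope count. The snippet below is only illustrative: it assumes the per‐iteration growth rates discussed above, and the values of ∣E∣, s, and hih are invented, not taken from the experiments of [1].

# Rough per-iteration operation counts for the linear and nonlinear models,
# using the orders of growth discussed above; |E|, s, and h_ih are invented values.
def linear_iteration_cost(num_edges, s):
    # one An,u xu matrix-vector product per arc: about s*s multiplications each
    return num_edges * s * s

def nonlinear_iteration_cost(num_edges, s, h_ih):
    # one evaluation of a three-layered hw with h_ih hidden neurons per arc:
    # about h_ih * s multiplications dominate when s exceeds the label sizes
    return num_edges * h_ih * s

E, s, h_ih = 10_000, 5, 20                  # h_ih larger than s, as is common in practice
print(linear_iteration_cost(E, s))          # 250000
print(nonlinear_iteration_cost(E, s, h_ih)) # 1000000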

      Formally, the cost of each learning epoch is given by the sum of the costs of the instructions in Table 5.2, each multiplied by the number of times it is executed. An inspection of the table shows that the costs of all the instructions involved in the learning phase are linear with respect to both the dimension of the input graph and the dimension of the FNNs. The only exceptions are the computation of z(t + 1) = z(t) · A + b, (∂Fw/∂x)(x, l), and ∂pw/∂w, which depend quadratically on s.

      The most expensive instruction is apparently the computation of ∂pw/∂w in nonlinear GNNs. On the other hand, the experiments have shown [1] that tR is usually a small number. In most epochs, tR is 0, since the Jacobian does not violate the imposed constraint, and in the other cases tR is usually in the range 1–5. Thus, for a small state dimension s, the computation of ∂pw/∂w requires few applications of backpropagation on hw and has a small impact on the global complexity of the learning process. On the other hand, in theory, if s is very large and at the same time tR ≫ 0, the computation of the gradient may become very slow.

      Let G = (V, E) be an undirected graph with vertex set V = {v1, …, vn}. In the following, we assume that the graph G is weighted, that is, each edge between two vertices vi and vj carries a nonnegative weight wij ≥ 0.
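
      A minimal sketch of this setup, with an invented vertex count and invented weights, represents the weighted undirected graph by a symmetric weight matrix W whose entry wij ≥ 0 is the weight of the edge between vi and vj, with wij = 0 when the two vertices are not connected:

# Minimal illustrative sketch of a weighted undirected graph as a symmetric
# weight matrix; the vertex count and the weights below are invented.
import numpy as np

n = 4                                       # vertices v1, ..., v4 (0-indexed below)
W = np.zeros((n, n))                        # weight (weighted adjacency) matrix
weighted_edges = [(0, 1, 0.5), (1, 2, 2.0), (2, 3, 1.0), (0, 2, 0.3)]
for i, j, w in weighted_edges:
    W[i, j] = W[j, i] = w                   # undirected graph: wij = wji >= 0
degrees = W.sum(axis=1)                     # weighted degree of each vertex
print(W)
print(degrees)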
