Introduction to Graph Neural Networks. Zhiyuan Liu

We also list the limitations of the vanilla GNN in representation capability and training efficiency. After this chapter, we will talk about several variants of the vanilla GNN model.

      The concept of GNN was first proposed in Gori et al. [2005], Scarselli et al. [2004, 2009]. For simplicity, we will talk about the model proposed in Scarselli et al. [2009], which aims to extend existing neural networks for processing graph-structured data.

      A node is naturally defined by its features and the related nodes in the graph. The target of GNN is to learn a state embedding h_v ∈ ℝ^s, which encodes the information of the neighborhood, for each node. The state embedding h_v is used to produce an output o_v, such as the distribution of the predicted node label.

      In Scarselli et al. [2009], a typical graph is illustrated in Figure 4.1. The vanilla GNN model deals with undirected homogeneous graphs, where each node has input features x_v and each edge may also have its own features. The paper uses co[v] and ne[v] to denote the set of edges and the set of neighbors of node v, respectively. Variants of GNNs for processing more complicated graphs, such as heterogeneous graphs, can be found in later chapters.

      Given the input features of nodes and edges, we next describe how the model obtains the node embedding h_v and the output embedding o_v.

      To update the node state according to the input neighborhood, there is a parametric function f, called the local transition function, shared among all nodes. To produce the output of a node, there is a parametric function g, called the local output function. Then, h_v and o_v are defined as follows:

h_v = f(x_v, x_co[v], h_ne[v], x_ne[v])    (4.1)

o_v = g(h_v, x_v)    (4.2)

      where x denotes the input feature and h denotes the hidden state. co[v] is the set of edges connected to node v and ne[v] is the set of neighbors of node v. Thus, x_v, x_co[v], h_ne[v], and x_ne[v] are the features of v, the features of its edges, and the states and features of the nodes in the neighborhood of v, respectively. For node l_1 in Figure 4.1, x_l1 is its input feature, co[l_1] contains the edges l_(1,4), l_(1,6), l_(1,2), and l_(3,1), and ne[l_1] contains the nodes l_2, l_3, l_4, and l_6.

      Figure 4.1: An example of the graph based on Scarselli et al. [2009].
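
      To make Eqs. (4.1) and (4.2) concrete, the following is a minimal PyTorch sketch of one possible parameterization. The class name VanillaGNNCell, the dimensions, and the summation over neighbor contributions are illustrative assumptions, not the exact parameterization of Scarselli et al. [2009], which additionally constrains f to be a contraction map (e.g., through a penalty term).

import torch
import torch.nn as nn

class VanillaGNNCell(nn.Module):
    """Illustrative local transition f (Eq. 4.1) and local output g (Eq. 4.2)."""
    def __init__(self, feat_dim, edge_dim, state_dim, out_dim):
        super().__init__()
        # f sees [x_v, x_e, x_u, h_u] for each neighbor u with edge e, then sums the
        # per-neighbor results, so nodes of arbitrary degree share the same parameters.
        self.f = nn.Sequential(
            nn.Linear(2 * feat_dim + edge_dim + state_dim, state_dim),
            nn.Tanh(),
        )
        # g maps the node state and the node feature to the output o_v.
        self.g = nn.Linear(state_dim + feat_dim, out_dim)

    def transition(self, x_v, x_edges, x_neigh, h_neigh):
        # x_v: (feat_dim,), x_edges: (deg, edge_dim), x_neigh: (deg, feat_dim), h_neigh: (deg, state_dim)
        x_v_rep = x_v.unsqueeze(0).expand(x_neigh.size(0), -1)
        msgs = self.f(torch.cat([x_v_rep, x_edges, x_neigh, h_neigh], dim=-1))
        return msgs.sum(dim=0)                          # h_v, Eq. (4.1)

    def output(self, h_v, x_v):
        return self.g(torch.cat([h_v, x_v], dim=-1))    # o_v, Eq. (4.2)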

      Let H, O, X, and X_N be the matrices constructed by stacking all the states, all the outputs, all the features, and all the node features, respectively. Then we have a compact form:

H = F(H, X)    (4.3)

O = G(H, X_N)    (4.4)

      where F is the global transition function and G is the global output function; they are the stacked versions of the local transition function f and the local output function g for all nodes in the graph, respectively. The value of H is the fixed point of Eq. (4.3) and is uniquely defined under the assumption that F is a contraction map.
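
      The global transition function F is simply the local f applied to every node and restacked. Below is a sketch under the assumption that the graph is stored as a list neighbors, where neighbors[v] holds the node indices in ne[v], and a list edge_feats, where edge_feats[v] stacks the features of the edges in co[v]; both names are hypothetical.

def global_transition(cell, H, X, edge_feats, neighbors):
    # Stacked version F of the local transition f: one application of Eq. (4.1) per node (Eq. 4.3).
    new_states = []
    for v, neigh in enumerate(neighbors):
        new_states.append(cell.transition(X[v], edge_feats[v], X[neigh], H[neigh]))
    return torch.stack(new_states)

def global_output(cell, H, X_N):
    # Stacked version G of the local output g (Eq. 4.4).
    return torch.stack([cell.output(H[v], X_N[v]) for v in range(H.size(0))])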

      Following Banach’s fixed point theorem [Khamsi and Kirk, 2011], GNN uses the following classic iterative scheme to compute the state:

H^{t+1} = F(H^t, X)    (4.5)

      where H^t denotes the t-th iteration of H. The dynamical system in Eq. (4.5) converges exponentially fast to the solution of Eq. (4.3) for any initial value H^0. Note that the computations described by f and g can be interpreted as feedforward neural networks (FNNs).
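
      As a sketch of this iteration, assuming F is realized by the global_transition function above (the tolerance and the step cap are arbitrary illustrative choices):

def compute_fixed_point(F, H0, X, tol=1e-5, max_steps=100):
    # Iterate H^{t+1} = F(H^t, X) (Eq. 4.5); the contraction property of F
    # guarantees convergence to the fixed point of Eq. (4.3).
    H = H0
    for _ in range(max_steps):
        H_next = F(H, X)
        if torch.norm(H_next - H) < tol:
            return H_next
        H = H_next
    return H

      Here F could be, for instance, F = lambda H, X: global_transition(cell, H, X, edge_feats, neighbors).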

      After introducing the framework of GNN, the next question is how to learn the parameters of the local transition function f and the local output function g. With the target information (t_v for a specific node) as supervision, the loss can be written as:

loss = Σ_{i=1}^{p} (t_i − o_i)

      where p is the number of supervised nodes. The learning algorithm is based on a gradient-descent strategy and is composed of the following steps (a minimal training-loop sketch follows the list).

      • The states h_v^t are iteratively updated by Eq. (4.1) until a time step T. We then obtain an approximate fixed-point solution of Eq. (4.3): H(T) ≈ H.

      • The gradient of weights W is computed from the loss.

      • The weights W are updated according to the gradient computed in the last step.
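
      A minimal sketch that puts the three steps together is given below. The fixed horizon T, the squared-error form of the per-node loss, and plain gradient descent are assumptions for illustration; this sketch simply unrolls T update steps and back-propagates through them, rather than using the gradient computation described in the original paper.

def train_step(cell, F, H0, X, X_N, targets, supervised_idx, T=10, lr=1e-2):
    params = list(cell.parameters())
    H = H0
    for _ in range(T):                               # step 1: approximate fixed point, H(T) ≈ H
        H = F(H, X)
    O = global_output(cell, H, X_N)                  # outputs o_v for all nodes (Eq. 4.4)
    # step 2: loss over the p supervised nodes (squared error used here as an example)
    loss = ((targets[supervised_idx] - O[supervised_idx]) ** 2).sum()
    grads = torch.autograd.grad(loss, params)        # gradient of the weights W
    with torch.no_grad():                            # step 3: gradient-descent update of W
        for w, grad in zip(params, grads):
            w -= lr * grad
    return loss.item()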

      After running the algorithm, we obtain a model trained for a specific supervised/semi-supervised task, as well as the hidden states of the nodes in the graph. The vanilla GNN model provides an effective way to model graph data, and it is the first step toward incorporating neural networks into the graph domain.

      Though experimental results showed that GNN is a powerful architecture for modeling structured data, the vanilla GNN still has several limitations.

      • First, it is computationally inefficient to update the hidden states of nodes iteratively to reach the fixed point. The model needs T steps of computation to approximate the fixed point. If the assumption of the fixed point is relaxed, we can design a multi-layer GNN to get a stable representation of a node and its neighborhood.

      • Second, vanilla GNN uses the same parameters in every iteration, while most popular neural networks use different parameters in different layers, which serves as a hierarchical feature extraction method. Moreover, the update of node hidden states is a sequential process which can benefit from RNN kernels such as GRU and LSTM.
