Introduction to Graph Neural Networks. Zhiyuan Liu. Synthesis Lectures on Artificial Intelligence and Machine Learning

I is the n × n identity matrix. A⁻¹ exists if and only if |A| ≠ 0.

The transpose of matrix A is represented as A^T, where

(A^T)_{ij} = A_{ji}.

Another frequently used product between matrices is the Hadamard product. The Hadamard product of two matrices A ∈ ℝ^{m × n} and B ∈ ℝ^{m × n} is a matrix C ∈ ℝ^{m × n}, where

C_{ij} = A_{ij} B_{ij}.

• Tensor: An array with an arbitrary number of dimensions. Most matrix operations can also be applied to tensors.
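The matrix operations above can be checked numerically. The following is a minimal sketch using NumPy; the matrices A and B and the tensor T are made-up examples, not taken from the book.

```python
import numpy as np

# Hypothetical example matrices (not from the book).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 1.0],
              [4.0, 5.0]])

# A is invertible because its determinant |A| is nonzero.
det_A = np.linalg.det(A)
A_inv = np.linalg.inv(A)
identity_check = A @ A_inv          # should be close to the 2 x 2 identity

# Transpose: (B^T)_{ij} = B_{ji}.
B_T = B.T

# Hadamard product: elementwise multiplication, C_{ij} = A_{ij} B_{ij}.
C = A * B

# A tensor is an array with more than two axes; many matrix-style
# operations extend to it, for example swapping the last two axes.
T = np.ones((2, 3, 4))
T_swapped = np.transpose(T, (0, 2, 1))

print(det_A)
print(C)
print(T_swapped.shape)
```

Note that `*` on NumPy arrays is the Hadamard product, while `@` is the ordinary matrix product.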

2.1.2 EIGENDECOMPOSITION

Let A be a matrix in ℝ^{n × n}. A nonzero vector v ∈ ℂⁿ is called an eigenvector of A if there exists a scalar λ ∈ ℂ such that

Av = λv.

Here the scalar λ is an eigenvalue of A corresponding to the eigenvector v. If matrix A has n linearly independent eigenvectors {v₁, v₂, …, v_n}, corresponding to the eigenvalues {λ₁, λ₂, …, λ_n}, then it can be deduced that

A [v₁ v₂ … v_n] = [v₁ v₂ … v_n] diag(λ₁, λ₂, …, λ_n).

Let V = [v₁ v₂ … v_n]; since its columns are linearly independent, V is an invertible matrix. We have the eigendecomposition of A (also called diagonalization)

A = V diag(λ₁, λ₂, …, λ_n) V⁻¹.

When the eigenvectors are orthonormal, so that V⁻¹ = V^T, it can also be written in the following form:

A = Σ_{i=1}^{n} λ_i v_i v_i^T.

However, not all square matrices can be diagonalized in this form, because a matrix may not have n linearly independent eigenvectors. Fortunately, it can be proved that every real symmetric matrix has an eigendecomposition.
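As a hedged illustration of the eigendecomposition, the sketch below diagonalizes a small real symmetric matrix with NumPy and then examines a matrix that cannot be diagonalized. The matrices S and J are made-up examples, not taken from the book.

```python
import numpy as np

# Hypothetical real symmetric matrix; its eigenvalues are 1 and 3.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is intended for symmetric matrices: it returns the eigenvalues in
# ascending order and orthonormal eigenvectors as the columns of V.
eigvals, V = np.linalg.eigh(S)

# Eigendecomposition: S = V diag(lambda) V^{-1}.
reconstructed = V @ np.diag(eigvals) @ np.linalg.inv(V)

# Because the columns of V are orthonormal here, S is also the sum of
# lambda_i * v_i v_i^T over all eigenpairs.
outer_sum = sum(lam * np.outer(v, v) for lam, v in zip(eigvals, V.T))

# A matrix that is not diagonalizable: J has the single eigenvalue 1,
# and the eigenspace (null space of J - I) has dimension 2 - rank(J - I).
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])
eigenspace_dim = 2 - np.linalg.matrix_rank(J - np.eye(2))

print(eigvals)
print(eigenspace_dim)
```

Here J has only one linearly independent eigenvector, fewer than the n = 2 needed, so no invertible V exists for it.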

2.1.3 SINGULAR VALUE DECOMPOSITION

As eigendecomposition can only be applied to certain matrices, we introduce the singular value decomposition, which is a generalization to all matrices.

First we need to introduce the concept of singular value. Let A ∈ ℝ^{m × n} and let r denote the rank of A^T A. Then there exist r positive scalars σ₁ ≥ σ₂ ≥ … ≥ σ_r > 0 such that for 1 ≤ i ≤ r, v_i is an eigenvector of A^T A with corresponding eigenvalue σ_i². Note that v₁, v₂, …, v_r are linearly independent. The r positive scalars σ₁, σ₂, …, σ_r are called the singular values of A. Then we have the singular value decomposition

A = U Σ V^T,

where U ∈ ℝ^{m × m} and V ∈ ℝ^{n × n} are orthogonal matrices and Σ is an m × n matrix defined as follows:

Σ_{ij} = σ_i if i = j ≤ r, and Σ_{ij} = 0 otherwise.

In fact, the column vectors of U are eigenvectors of AA^T, and the column vectors of V are eigenvectors of A^T A.
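The relations above can be verified numerically. The following is a minimal sketch using `np.linalg.svd`; the 3 × 2 matrix A is a made-up example, and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical 3 x 2 matrix (m = 3, n = 2), not from the book.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
m, n = A.shape

# Full SVD: U is m x m, Vt is V^T (n x n), and s holds the singular
# values in descending order.
U, s, Vt = np.linalg.svd(A)

# Build the m x n matrix Sigma with the singular values on its diagonal.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)
reconstructed = U @ Sigma @ Vt

# The squared singular values are the eigenvalues of A^T A
# (eigvalsh returns them in ascending order, so reverse).
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]

# Columns of V are eigenvectors of A^T A; columns of U are eigenvectors
# of A A^T (with eigenvalue 0 for the columns beyond the rank).
V = Vt.T
lhs_V = (A.T @ A) @ V
rhs_V = V @ np.diag(s**2)
lhs_U = (A @ A.T) @ U
rhs_U = U @ np.diag(np.concatenate([s**2, np.zeros(m - len(s))]))

print(s)
```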

2.2 PROBABILITY THEORY

Uncertainty is ubiquitous in machine learning, so we need probability theory to quantify and manipulate it. In this section, we review some basic concepts and classic distributions in probability theory that are essential for understanding the rest of the book.

2.2.1 BASIC CONCEPTS AND FORMULAS

In probability theory, a random variable is a variable that takes a random value. For instance, if we denote by X a random variable that has two possible values x₁ and x₂, then the probability that X equals x₁ is P(X = x₁). Clearly, the following equation holds:

P(X = x₁) + P(X = x₂) = 1.

Suppose there is another random variable Y that has y₁ as a possible value. The probability that X = x₁ and Y = y₁ is written as P(X = x₁, Y = y₁), which is called the joint probability of X = x₁ and Y = y₁.

Sometimes we need to know the relationship between random variables, such as the probability of X = x₁ on the condition that Y = y₁, which can be written as P(X = x₁|Y = y₁). We call this the conditional probability of X = x₁ given Y = y₁. With the concepts above, we can write the following two fundamental rules of probability theory:

P(X = x) = Σ_y P(X = x, Y = y),

P(X = x, Y = y) = P(Y = y|X = x) P(X = x).

The former is the sum rule while the latter is the product rule. Slightly modifying the form of the product rule, we get another useful formula:

P(Y = y|X = x) = P(X = x, Y = y) / P(X = x) = P(X = x|Y = y) P(Y = y) / P(X = x),

which is the famous Bayes formula. Note that it also holds for more than two variables:

Using the product rule, we can deduce the chain rule:

P(X₁ = x₁, …, X_n = x_n) = P(X₁ = x₁) Π_{i=2}^{n} P(X_i = x_i|X₁ = x₁, …, X_{i−1} = x_{i−1}),

where X₁, X₂, …, X_n are n random variables.
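The rules of this subsection can be checked on small probability tables. The sketch below is an illustration under assumed numbers: the joint tables `joint` and `joint3` are made up (the second is generated from a fixed random seed) and do not come from the book.

```python
import numpy as np

# Hypothetical joint distribution P(X = x_i, Y = y_j):
# rows index the values of X, columns index the values of Y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.05, 0.25]])
assert np.isclose(joint.sum(), 1.0)

# Sum rule: marginalize the joint over the other variable.
p_x = joint.sum(axis=1)   # approximately [0.4, 0.6]
p_y = joint.sum(axis=0)   # approximately [0.4, 0.25, 0.35]

# Conditional probabilities, defined as joint / marginal.
p_y_given_x = joint / p_x[:, None]
p_x_given_y = joint / p_y[None, :]

# Product rule: P(X, Y) = P(Y | X) P(X).
product = p_y_given_x * p_x[:, None]

# Bayes formula: P(Y | X) = P(X | Y) P(Y) / P(X).
bayes = p_x_given_y * p_y[None, :] / p_x[:, None]

# Chain rule for three binary variables X1, X2, X3 on a made-up table.
rng = np.random.default_rng(0)
joint3 = rng.random((2, 2, 2))
joint3 /= joint3.sum()
p1 = joint3.sum(axis=(1, 2))           # P(X1)
p12 = joint3.sum(axis=2)               # P(X1, X2)
p2_given_1 = p12 / p1[:, None]         # P(X2 | X1)
p3_given_12 = joint3 / p12[:, :, None] # P(X3 | X1, X2)
chain = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_12

print(p_x, p_y)
```

Both `product` and `chain` reproduce the joint tables they were derived from, and `bayes` agrees with `p_y_given_x`.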

The average value of some function f(x) (where x is the value of a certain random variable) under a probability distribution P(x) is called the expectation of f(x). For a discrete distribution, it can be written as

𝔼[f(x)] = Σ_x P(x) f(x).

Usually, when f(x) = x, 𝔼[x] stands for the expectation of x.
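As a small worked example of the discrete expectation, the sketch below uses a fair six-sided die, which is an assumed illustration rather than an example from the book. The helper name `expectation` is hypothetical.

```python
import numpy as np

# Assumed example: a fair six-sided die with values 1..6.
values = np.arange(1, 7)
probs = np.full(6, 1.0 / 6.0)

def expectation(f, values, probs):
    """Return the sum over x of P(x) * f(x) for a discrete distribution."""
    return float(np.sum(probs * f(values)))

# With f(x) = x this is the expectation of x itself.
mean = expectation(lambda x: x, values, probs)

# With f(x) = x^2 it is the second moment.
second_moment = expectation(lambda x: x**2, values, probs)

print(mean, second_moment)
```

For the fair die the mean is 3.5 and the second moment is 91/6, up to floating-point rounding.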
