Statistical Relational Artificial Intelligence. Luc De Raedt
Чтение книги онлайн.
Читать онлайн книгу Statistical Relational Artificial Intelligence - Luc De Raedt страница 9
Undirected graphical models, or Markov networks, [Pearl, 1988] are defined in terms of factors (or potentials) among random variables. A factor represents a function of a set of random variables; for example, a factor on variables {X, Y, Z} is a function that given a value for each variable, returns a non-negative number. The nodes in a Markov network correspond to random variables, and the variables in a factor are neighbors of each other in the graph. These models encapsulate the assumption that a variable is independent of other variables given all of its neighbors. The next two sections give more details.
2.1.1 BAYESIAN NETWORKS
Formally, a Bayesian network (or belief network) is an augmented, acyclic directed graph (DAG), where each node corresponds to a random variable Xi and each edge indicates a direct influence among the random variables. The influence for each variable Xi is quantified with a conditional probability distribution (CPD) P(Xi | Pa(Xi)), where Pa(Xi) are the parents of Xi in the graph. The Bayesian network represents a set of independence assumptions: a variable is independent of the other variables that are not its descendents given its parents.
Example 2.1 As an example of a Bayesian network, consider Judea Pearl’s famous burglary alarm network. The Bayesian network in Fig. 2.1 contains the random variables burglary, earthquake, mary_calls, john_calls, and alarm. The network specifies that burglary and earthquake are independent, and that john_calls is independent of the other variables given alarm. Assume that the random variables are all Boolean, i.e., they can have the range {true, false}, then the Bayesian network defines a probability distribution over truth-assignments to {alarm, earthquake, mary_calls, john_calls, burglary}. To do so, it makes use of CPDs associated to each of the nodes are specified in Table 2.1. They include the CPDs P(alarm | burglary, earthquake), P(earthquake), etc.
Figure 2.1: A Bayesian network for burglary alarms (adapted from Judea Pearl). Nodes denote random variables and edges denote direct influences among the random variables.
Table 2.1: Conditional probability distributions associated with the nodes in the burglary alarm network, cf. Fig. 2.1
The Bayesian network thus has two components: a qualitative one—the directed acyclic graph—and a quantitative one—the conditional probability distributions. Together they specify the joint probability distribution.
Because of the conditional independence assumption, we can write down the joint probability density as
The independence assumption, which states that each variable is independent of its non-descendents given its parents, implies other independencies. These are axiomatized by d-separation [Pearl, 1988], which allows to infer all independencies that are implied from the independence assumptions of Bayesian networks.
2.1.2 MARKOV NETWORKS AND FACTOR GRAPHS
Conceptually, a Markov random field [Pearl, 1988] is an undirected analogue of a Bayesian network. A Markov network is defined in terms of a set of discrete-valued random variables, X = {X1, X2, …, Xn} and a set of factors {f1, …, fm}, where a factor is a non-negative function of a subset of the variables.
A Markov network represents the joint distribution over X as proportional to the product of the factors, i.e.,
where fk(Xk) is a factor on Xk ⊆ X, and xk is x projected onto Xk. Z is a normalization constant known as the partition function.
As long as the factors are all non-zero (they always return a positive number), the distribution can be represented as a log-linear model:
where the factors gk(·) are functions of (a subset of) the configuration x, and wi are real-valued weights.
A Markov network is a graphical representation of a Markov random field where the nodes are the random variables and there is an arc between any two variables that are in a factor together.
As for Bayesian networks, the structure of a Markov network encodes conditional independencies. Basically, two nodes X and Y in a network are conditionally independent given variables Z1, ldots, Zn (i.e., P(X | Y, Z1, …, Zn) = P(X | Z1, …, Zn)) if and only if all paths from X to Y contain one of the variables Zi. The Markov network for the alarm example is illustrated in Fig. 2.2.
A Bayesian network can be seen as a Markov random field, where each conditional probability is a factor. Thus any inference algorithm developed for Markov networks can be applied to Bayesian networks. Many of the inference algorithms are designed for Markov networks for this reason. However, Bayesian networks allow for specialized algorithms, such as recursively pruning variables that are not observed, not queried, and have no children. Any Markov network can also be represented as a Bayesian network by scaling each factor to be in the range [0, 1] (by multiplying by a constant), creating a new variable tk for each factor fk and defining P(tk = true | xk) = fk(xk), and conditioning on tk = true.
An alternative way to depict a Markov random field is in terms of a factor graph. A factor graph is a bipartite graph, which contains a variable node for each random variable xi and a factor node for each factor fi. There is an edge between a variable node and a factor node if and only if the variable appears in the factor. This is often the preferred representation to describe probabilistic inference methods sending messages between variables and factors. The factor graph for the alarm Bayesian network is shown in Fig. 2.3.
Figure 2.2: The Markov network for the burglary alarm Network.
Figure 2.3: The factor graph representing the burglary alarm Bayesian