Statistical Relational Artificial Intelligence. Luc De Raedt
Чтение книги онлайн.
Читать онлайн книгу Statistical Relational Artificial Intelligence - Luc De Raedt страница 8
- Indeed, various techniques for probabilistic learning such as gradient-based methods, the family of EM algorithms or Markov Chain Monte Carlo methods have been developed and exhaustively investigated in different communities, such as in the Uncertainty in AI community for Bayesian networks and in the Computational Linguistics community for Hidden Markov Models. These techniques are not only theoretically sound, they have also resulted in entirely new technologies for, and revolutionary novel products in computer vision, speech recognition, medical diagnostics, troubleshooting systems, etc. Overviews of probabilistic learning can be found in Koller and Friedman [2009], Bishop [2006], Murphy [2012].
- Inductive logic programming and relational learning techniques studied logic learning, i.e., learning using first order logical or relational representations. Inductive Logic Programming has significantly broadened the application domain of data mining especially in bio- and chemo-informatics and now represent some of the best-known examples of scientific discovery by AI systems in the literature. Overviews of inductive logic learning and relational learning can be found in this volume and in Muggleton and De Raedt [1994], Lavrac and Dzeroski [1994], De Raedt [2008].
- Probabilistic logics have been also studied from a knowledge representational perspective [Nilsson, 1986, Halpern, 1990, Poole, 1993b]. The aim of this research was initially more a probabilistic characterization of logic than suitable representations for learning.
In the late 1990s and early 2000s, researchers working on these pairwise combinations started to realize that they also need the third component. Influential approaches of this period include the Probabilistic Relational Models [Friedman et al., 1999], which extended the probabilistic graphical model approach toward relations, the probabilistic logic programming approach of Sato [1995], which extended the knowledge representation approach of Poole [1993b] with a learning mechanism, and approaches such as Bayesian and stochastic logic programs [Kersting and De Raedt, 2001, Cussens, 2001] working within an inductive logic programming tradition. Around 2000, Lise Getoor and David Jensen organized the first workshop on “Statistical Relational Learning” at AAAI which gathered researchers active in the area for this time. This was the start of a successful series of events that continues till today (since 2010 under the header “StarAI”).
In the early days, many additional formalisms were contributed such as RBNs [Jäger, 1997], MMMNs [Taskar et al., 2004], SRMs [Getoor et al., 2001c], MLNs [Richardson and Domingos, 2006], BLOG [Milch et al., 2005], RDNs [Neville and Jensen, 2004], and IBAL [Pfeffer, 2001], many of which are described in the book edited by Getoor and Taskar [2007]. Especially influential was the Markov Logic approach of Richardson and Domingos [2006], as it was an elegant combination of undirected graphical models with relational logic. In this early period, research focused a lot on representational and expressivity issues, often referring to the “alphabet-soup” of systems, reflecting the fact that many systems have names that use three or four letters from the alphabet. Around 2005, it was realized that the commonalities between the different systems are more important than their (often syntactic) differences, and research started to focus on key open issues surrounding learning and inference. One central question is that of lifted inference [Poole, 2003], that is, whether one can perform probabilistic inference without grounding out the probabilistic relational model first. Other questions concern the relationship to existing probabilistic and logical solvers and the continuing quest for efficient inference techniques. Finally, research on SRL and StarAI has also inspired the addition of probabilistic primitives to programming languages, leading to what is now called probabilistic programming. Although some of the early formalisms [Poole, 1993b, Sato, 1995, Pfeffer, 2001] already extend an existing programming language with probabilities, and hence, possess the power of a universal Turing machine, this stream of research became popular since BLOG [Milch et al., 2005] and Church [Goodman et al., 2008].
PART I
Representations
CHAPTER 2
Statistical and Relational AI Representations
Artificial intelligence (AI) is the study of computational agents that act intelligently [Russell and Norvig, 2010, Poole and Mackworth, 2010] and, although it has drawn on many research methodologies, AI research arguably builds on two formal foundations: probability and logic.
The basic argument for probability as a foundation of AI is that agents that act under uncertainty are gambling, and probability is the calculus of gambling in that agents who do not use probability will lose to those that do use it (see [Talbott, 2008], for an overview). While there are a number of interpretations of probability, the most suitable for the present book is a Bayesian or subjective view of probability: our agents do not encounter generic events, but have to make decisions in particular circumstances, and only have access to their percepts (observations) and their beliefs. Probability is the calculus of how beliefs are updated based on observations (evidence).
The basic argument for first-order logic is that an agent needs to reason about individuals, properties of individuals, and relations among these individuals, and at least be able to represent conjunctions, implications, and quantification. These requirements arise from designing languages that allow one to easily and compactly represent the necessary knowledge for a non-trivial task. First-order logic has become widely used in the theory of knowledge representation. It assumes that the world consists of individuals, among which various relations hold or do not hold. Logic provides a semantics, a specification of meaning, which agent designers can then choose to implement in various ways.
Both probability and predicate calculus can be seen as extending propositional logic, one by adding measures over the worlds, and the other by extending the propositional calculus to include individuals and relations. Since, statistical relational AI (StarAI) aims to seamlessly combine these two strands of research, we will first introduce each of these separately before moving on to presenting their possible integrations.
2.1 PROBABILISTIC GRAPHICAL MODELS
The semantics of probability [Halpern, 1990, 2003] can be defined in terms of possible worlds and random variables (although they are neither random nor variable). We can either define random variables in terms of possible worlds (a random variable is a function on possible worlds) or define possible worlds in terms of random variables (a possible world is an assignment of a value to every random variable). Either way, a random variable has a value in every world. A random variable having a particular value is a proposition. Probability is defined in terms of a non-negative measure over sets of possible worlds that follow some intuitive axioms (see, e.g., [Halpern, 2003]). This means that a probabilistic graphical model defines a probability distribution over its possible worlds, or equivalently a joint probability distribution over the propositions, or the assignments of values to the random variables.
In Bayesian probability, we make explicit assumptions and the conclusions are consequences of the specified knowledge and assumptions. The assumptions are about the space of possible hypotheses and the prior probabilities. One particular explicit assumption is the assumption of conditional independence. graphical models [Pearl, 1988, Lauritzen, 1996, Koller and Friedman, 2009] provide representations of such conditional independence of random variables in terms of graphs. A Bayesian network [Pearl, 1988, Darwiche, 2009] is an acyclic directed graphical model of probabilistic dependence where the nodes are random variables. A Bayesian network encapsulates a directed form of independence between the random variables in the graph: a variable is conditionally independent of its non-descendants