Declarative Logic Programming. Michael Kifer
Чтение книги онлайн.
Читать онлайн книгу Declarative Logic Programming - Michael Kifer страница 29
Datalog is also the basis for parallel approaches to data analysis. Google’s Yedalog [Chin et al. 2015] is a recent Datalog-based system that is designed specifically to exploit data parallelism and aggregation of massive amounts of data. Also, the DeALS system [Shkapsky et al. 2013] has compilation techniques to support multicore processing [Yang et al. 2017].
Other Application Areas. We briefly mention other areas in which Datalog—and its derivatives—have been applied. The DLV system [Leone et al. 2006] uses Disjunctive Datalog to represent knowledge and to reason about alternatives. Disjunctive Datalog allows the “or” of predicates in the head of a rule, which gives rise to multiple minimal answer sets, similarly to what is seen for negation in a rule body. DLV also includes integrity constraints that rule out certain answer sets, and “soft” constraints that prioritize among answer sets. Ashley-Rollman et al. [2007] use a Datalog-like language called MELD to plan shape-shifting maneuvers for a swarm of modular robots. They find that the declarative approach allows programs that are 20 times smaller and allow more optimizations, compared to imperative alternatives.
A variety of Web-related applications also use Datalog. The Lixto system [Gottlob et al. 2004] supports Web data extraction and service definition. With Lixto, a designer can specify a data-extraction wrapper via a special browser and the system generates a Datalog program to perform the extraction. Its Elog language is based on Monadic Datalog (where all IDB predicates are unary), and supports visual wrapper specification. Lixto technology is now part of the McKinsey Periscope product. A follow-on project, DIADEM [Furche et al. 2014], targets fully automated data extraction from collections of web pages. DIADEM uses a form of Datalog± internally to represent domain knowledge. Orchestra [Ives et al. 2008] supports data sharing on the Web, and translates data-exchange programs to an extended version of Datalog with Skolem functions to support virtual-entity creation. WebDamlog [Abiteboul et al. 2011] is a Datalog-style language for implementing distributed applications among peer systems on the Web. Toward that goal, WebDamlog supports the exchange of both facts and rules.
We will also see uses of Datalog for data management in LogicBlox and Datomic in the next section. Also see chapters 7 and 9 to learn about applications in bioinformatics and natural language processing.
Summary. Datalog is gaining more acceptance of late, as evaluation methods and optimization techniques proposed on a theoretical level in the past have been implemented and tested on actual applications. These applications have demonstrated the utility of the language as a direct interface, as an intermediate language, and as a framework for additional features. These application have also inspired new extensions, such as for distribution, along with new evaluation and optimization approaches.
There is also a growing number of Datalog vendors and companies that rely on Datalog for their mission-critical applications, including LogicBlox, Datomic, XSB, Inc., Semmle, DLV Systems, Ontotext, and Coherent Knowledge Systems.
1.8 Current Systems and Comparison
This section describes and compares four current systems that implement Datalog: XSB, LogicBlox, Datomic, and Flora-2/Ergo. We show how a selection of Datalog rules and queries are written for each system, and report on experiments that illustrate their performance.
These systems all implement Datalog (or a superset). However, the observed performance of each system is unique. They show asymptotically different behavior due to fundamental factors, including their choices for evaluation strategies, selection of underlying data structures, and data indexing. Even when these factors are the same, there are differences in performance due to implementation details, such as the choice of implementation language.
1.8.1 Preliminaries: Data and Queries
For the illustration of rules and queries, we use the Mathematics Genealogy data [MathGen 2000] that contains basic facts about people, dissertations, and advisor relationships between people and dissertations. We assume the following EDB predicates, which are slightly different from those in the introductory example. In particular, an advisor ID (AID) is connected to a dissertation ID (DID), and a dissertation has a candidate (CID) who wrote it.
The data sets contains 198,962 people, 202,505 dissertations, and 211,107 facts of the advised
predicate. In each of the systems, we answer the following five queries to illustrate the performance.
1. Who are the grand-advisors of David Scott Warren?
2. Which candidates got their degrees from the same university as their advisor?
3. Which candidates worked in a different area than at least one of their advisors?
4. Who are the academic ancestors of David Scott Warren?
5. How many academic ancestors does David Scott Warren have?
These queries make use of simple joins, recursive rules, and aggregations that facilitate assessing the performance of the systems in various aspects. In the next subsections, we introduce four systems and how these queries are expressed in each.
1.8.2 XSB
XSB [Sagonas et al. 1994] is a top-down implementation of Prolog extended with tabled resolution as described in Section 1.5.2.1. XSB allows fine-grained control over which tabling strategy and indexes to use for each predicate, and the choices for tabling and indexing affect asymptotic behavior. Consider Query 1, which is written in XSB as follows:
Here the rule defining enumallq1/0
uses a Prolog idiom of a fail
-loop, which has the effect of generating all results in the most efficient way.
By default, XSB indexes each predicate on its first argument. However that may not be the most effective choice. For example, in the rule defining q1
, after the first subgoal binds D
, top-down evaluation will need to obtain values for the first argument of adv
given a binding for the second argument. The following directive creates an index on the second argument:
To illustrate tabling, consider Query 4:
For this query, one can show that there can be repeated subgoals for predicate anc
. The following tabling directive will avoid reevaluating such repeated subgoals and, therefore, avoid an infinite loop, which would affect an ordinary Prolog system: