Declarative Logic Programming. Michael Kifer

Чтение книги онлайн.

Читать онлайн книгу Declarative Logic Programming - Michael Kifer страница 23

Declarative Logic Programming - Michael Kifer ACM Books

Скачать книгу

with a general fixpoint operator, then optimized into standard relational-algebra expressions with possibly transitive closure over them. Relational algebra generally needs to be translated to SQL before sending it to a relational DBMS. Draxler [1993] provides a direct translation of a logic language into SQL, bypassing relational algebra. The NED-2 Intelligent Information System [Maier et al. 2002] introduced a specialized syntax for database queries in a logic language that are transmitted to the DBMS as SQL. Finally, van Emde Boas and van Emde Boas [1986] describe an approach to extending a relational DBMS (IBM’s Business System 12) with logic programming functionality, but it is unclear whether the project got beyond the prototype stage.

      Optimization of Datalog queries has been studied from a theoretical perspective as well, mostly in the 1980s. Since the optimization problem is equivalent to finding another Datalog program that computes an equivalent set of facts given a set of facts for extensional predicates, but is faster to evaluate, optimization has been mainly looked at in the context of query equivalence or, more generally, query containment. Query containment means that the results of one query are always guaranteed to be a subset of the results of another query. Showing containment in both directions proves equivalence. Many Datalog optimization techniques involve simplifying the original Datalog program by removing rules or subgoals and then testing for equivalence with the original. Unfortunately, the equivalence problem for Datalog queries has been shown to be undecidable [Shmueli 1993]. Furthermore, query equivalence turned out to be undecidable for many subclasses of Datalog, e.g., even if each rule contains at most one recursive predicate [Gaifman et al. 1993]. Thus, fragments of Datalog for which the query containment problem is decidable have been of strong interest. In particular, Cosmadakis et al. [1988] have shown that, for monadic Datalog, where only predicates with one arguments are recursively defined, query containment is decidable and that if binary relations are allowed to be recursively defined then the problem is undecidable. A stronger version of query equivalence, called uniform equivalence, is the decision problem of whether two Datalog programs compute an equivalent set of facts given initial facts for both intensional and extensional predicates. Sagiv [1987] showed that uniform equivalence is decidable, and provided an algorithm for minimizing a set of Datalog rules with respect to uniform equivalence. For an even simpler subset of Datalog, with no recursion altogether, called union of conjunctive queries, many positive results regarding query optimization have been obtained, especially in the context of SQL [Chaudhuri 1998]. More recently, Vardi [2016] showed that the query containment problem for union of conjunctive queries extended with transitive closure is also decidable. For Datalog optimization techniques that involve constraints, see Section 1.4.5. Another class of equivalence results looks at whether a (syntactically) recursive Datalog program has a non-recursive counterpart [Chaudhuri and Vardi 1992, Naughton 1986],

      We should add that the area of Description Logic [Baader et al. 2003] is concerned with the query containment and equivalence questions for various subsets of first-order logic, which have a limited intersection with Datalog. However, we are not aware of a result from that area that yields new insights when applied to that intersection.

       1.6 Early Datalog and Deductive Database Systems

      While much of the initial work on Datalog looked at evaluation options, semantics and extensions, a number of fairly complete implementations of deductive database systems were built in the late 1980s and early 1990s. These systems differed in several ways.

      One was in how they provided for ordered execution. Although Datalog has pure logical semantics and (at least, in theory) the programmer does not need to worry about rules and goal ordering, most applications need operations with side effects, such as updates and I/O, where order is important. Thus, these systems tended either to embed declarative queries in a procedural language, or have certain rules where evaluation order was enforced.

      A second source of variation is in what kinds of extensions of basic Datalog were supported, such as structured terms, collections, negation and aggregation. A third aspect was the optimizations and evaluation strategies employed, which are dictated to some extent by the extensions supported. The final area we call out is how base relations were managed. Some systems had disk-based storage for the EDB, either custom built or using an existing DBMS or storage system. Others worked with a main-memory database (possibly backed to secondary storage between sessions).

      The coverage that follows is by no means exhaustive, but serves to illustrate the differences we mentioned above. We have also leaned toward systems that were publicly distributed. We recommend the survey by Ramakrishnan and Ullman [1995] for a more in-depth comparison of systems from that era. Also, while we refer to these systems in the present tense, to our knowledge only XSB is still being supported and widely used, though some of the concepts and techniques of LDL++ are incorporated in DeALS [Shkapsky et al. 2013].

      1.6.1 LDL and LDL++

      The LDL language and implementation were developed at the Microelectronics and Computer Technology Corporation (MCC) during the mid-1980s as an alternative to lose coupling between logic languages and databases [Tsur and Zaniolo 1986, Zaniolo 1988]. LDL was intended to be complete enough for full application development, although it can be called from procedural languages and can call computed predicates defined in C++. While starting from pure Horn clauses with declarative semantics, it does provide some control constructs to enforce order on updates and screen output.

      LDL preserves complex terms from Prolog, such as date(1981, 'March'), and augments them with sets. Set values can be listed explicitly, or created in rules by grouping results. For example, a rule

Image

      will return result facts such as

Image

      LDL provides a partition predicate that can non-deterministically split a set, to support parallel processing of large datasets. LDL also supports non-determinism through a choice predicate, which selects a specific assignment of values from a range of options. For example, in

Image

      the choice((AID), (PDI)) goal ensures that each advisor id AID will have only one person id PID associated with it in oneAdvisee. (In relational database terms, the view oneAdvisee will satisfy the functional dependency AIDPID.) LDL supports stratified negation, which is handled via an anti-join strategy. For example, the rule

Image

      is evaluated by anti-join of the person table with the advised table on their first arguments. (Anti-join removes any rows from its first argument that connects with at least one row in the second argument on the join condition. Here it will knock out any person-fact that has a matching advised-fact.)

      LDL has two implementations. The first compiles LDL into FAD, an algebraic language designed at MCC for execution on a parallel database machine. A later, single-user prototype for Unix called SALAD [Chimenti et al. 1990] was created to make LDL available more broadly. SALAD was among the first systems to integrate many of the evaluation techniques covered in Section 1.5. SALAD applies both Magic Sets and Generalized Counting [Saccà and Zaniolo 1986a] rewrites. (The latter is suitable for graph-like queries where path length matters but not path membership.) SALAD evaluation is bottom-up semi-naïve, but has several variants for processing joins associated with individual rules, such as various combinations of eager and lazy evaluation,

Скачать книгу