Declarative Logic Programming. Michael Kifer



Declarative Logic Programming - Michael Kifer ACM Books


(which is useful if the EDB is managed externally, say, by a database system). Solving this generalized subgoal will give the set of bindings B1 = {hw, dss}. However, rather than creating a new generalized subgoal related({hw,dss}, P2), these bindings will be merged with the existing generalized subgoal to get

Image

      If any of the new bindings had matched old bindings, it would indicate that there might be some results already available.
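The merge step described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the QSQ algorithm itself: the class name `GeneralizedSubgoal`, the method `merge`, and the pre-existing binding `lvk` are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GeneralizedSubgoal:
    """A subgoal such as related(S, P2), where S is a set of bindings
    for the first argument (hypothetical representation)."""
    predicate: str
    bindings: set = field(default_factory=set)

    def merge(self, new_bindings):
        """Merge new bindings into this subgoal.

        Returns (fresh, seen): bindings not encountered before, and
        bindings that were already present, for which some answers
        might already be available.
        """
        new_bindings = set(new_bindings)
        seen = new_bindings & self.bindings
        fresh = new_bindings - self.bindings
        self.bindings |= fresh
        return fresh, seen


# Assumed starting state: the subgoal already carries the binding "lvk".
subgoal = GeneralizedSubgoal("related", {"lvk"})
fresh, seen = subgoal.merge({"hw", "dss"})
```

Here `seen` comes back empty, so none of the new bindings overlaps with earlier work; a non-empty `seen` would signal that partial results may already exist for those bindings.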

      QSQ comes in two flavors: iterative (QSQI) and recursive (QSQR). Both involve the steps of generating new answers and creating new subgoals [Gottlob et al. 1989]. QSQI gives priority to new answers, and suspends working on any new subgoals until all answers have been generated that do not require those subgoals. In QSQR, on the other hand, when a new subgoal is encountered, it becomes the focus and processing of the current subgoal is suspended. In practice, QSQI tends to be worse because of duplicate rule firings [Bancilhon and Ramakrishnan 1986]. QSQR seems closely related to SLG resolution applied to programs without negation, but to our knowledge no detailed comparison has been attempted.

       1.5.3 Bottom-Up vs. Top-Down

      The question naturally arises as to whether bottom-up or top-down is generally better for Datalog evaluation, and which particular methods are best. Obviously, one can concoct programs and datasets that favor a particular method, and some methods will not work at all on a given program. (For example, some counting variants of Magic Sets work for only some patterns of recursion.) Bancilhon and Ramakrishnan [1986] compare a range of bottom-up and top-down methods analytically on a variety of Datalog programs, EDB structures and query-binding patterns. While there are different orderings of the methods on the different cases, some patterns emerge, such as Magic Sets and QSQR generally being comparable in predicted performance, and usually beating QSQI and semi-naïve. Ullman [1989] claims that there are bottom-up methods that will always perform as well as or better than top-down methods on Datalog. His argument is that for a given Datalog program P and query Q, there is a modified version of P that will generate all answers for Q that, when evaluated by the semi-naïve method, will beat a depth-first top-down method. He further claims that the Magic Sets approach bounds the performance of memoizing versions of top-down such as OLDT and QSQ. Bry [1990], however, disputes Ullman’s conclusion. He defines a top-down method called the Backward Fixpoint Procedure (BFP) and claims that bottom-up rewrite methods and top-down memoizing methods are actually BFP in different forms.

      In terms of asymptotic time and space complexities, Tekle and Liu [2010, 2011] have recently established the precise relationship between (1) bottom-up evaluation using Demand Transformation and Subsumptive Demand Transformation, as well as Magic Sets, and (2) top-down evaluation using variant tabling and subsumptive tabling. However, actual performance varies widely because of constant factors and the performance of internal data structures. Faced with the ambiguity of such analyses, some Datalog systems include both bottom-up and top-down methods, though determining the best method and optimizations to apply to a particular Datalog program and dataset is far from a solved problem.

       1.5.4 Evaluation Methods Using Database Systems

      An obvious approach to scaling is to use a relational DBMS to manage the ground facts in a Datalog program, which are easily stored as rows in a table, where each predicate has a separate table. The DBMS might be used just for secondary-storage management and indexing capabilities, or for more general query processing and evaluation. As an example of the former, consider top-down evaluation of the query related(lvk, Y). If we are considering the rule

Image

      we will need to solve the subgoal advised(A, lvk). If advised is stored in a DBMS, and indexed on the second position, the database could quickly retrieve all facts having lvk as the value in the second position. Even if we were using a very simple top-down strategy, and only considering advised(A,lvk)-facts one at a time, bringing them into memory in a batch and caching them can save I/O time. As an example of the more general strategy, consider a rule-compilation approach. When working on a query, the evaluator can accumulate EDB goals, while trying to solve IDB goals with rules. Then, when a goal list contains only EDB goals, it can be converted to a database query. For example, if for the query related(lvk, Y) we try to solve the goal using the rule

Image

      we obtain a goal list advised(B, lvk), related(B, Y). If we then, in turn, solve the related/2 goal with the previous rule, the resulting goal list is

Image

      which consists entirely of EDB goals. This goal list can be converted to the SQL query

Image

      If there are multiple ways to solve IDB goals, then there will be multiple database queries. (In the presence of recursion, the number of queries may be unbounded, however.)
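The conversion of an all-EDB goal list into a database query can be sketched as follows, using Python's standard sqlite3 module. Several things here are assumptions made for illustration and do not come from the text: the column names `advisor` and `student`, the helper names `is_variable` and `goals_to_sql`, the sample facts, and the goal list itself, which is chosen to have the same general shape as the one discussed above but is not necessarily identical to it.

```python
import sqlite3

# Assumed schema: advised(advisor, student). Column names are hypothetical.
COLUMNS = {"advised": ("advisor", "student")}


def is_variable(term):
    # Datalog convention: variables start with an uppercase letter.
    return term[:1].isupper()


def goals_to_sql(goals, answer_vars):
    """Translate a list of EDB goals [(pred, (arg, ...)), ...] into a
    SELECT statement plus its parameter list."""
    from_parts, where_parts, params = [], [], []
    first_seen = {}  # variable -> "alias.column" of its first occurrence
    for i, (pred, args) in enumerate(goals):
        alias = f"t{i}"
        from_parts.append(f"{pred} AS {alias}")
        for col, term in zip(COLUMNS[pred], args):
            ref = f"{alias}.{col}"
            if not is_variable(term):
                where_parts.append(f"{ref} = ?")  # constant: selection
                params.append(term)
            elif term in first_seen:
                # repeated variable: join condition
                where_parts.append(f"{ref} = {first_seen[term]}")
            else:
                first_seen[term] = ref
    select = ", ".join(first_seen[v] for v in answer_vars)
    sql = f"SELECT DISTINCT {select} FROM {', '.join(from_parts)}"
    if where_parts:
        sql += " WHERE " + " AND ".join(where_parts)
    return sql, params


# Assumed goal list consisting only of EDB goals.
goals = [("advised", ("B", "lvk")),
         ("advised", ("A", "B")),
         ("advised", ("A", "Y"))]
sql, params = goals_to_sql(goals, ["Y"])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE advised (advisor TEXT, student TEXT)")
conn.executemany("INSERT INTO advised VALUES (?, ?)",
                 [("a0", "b1"), ("a0", "b2"), ("b1", "lvk"), ("b2", "x")])
answers = sorted(row[0] for row in conn.execute(sql, params))
```

Each goal becomes one aliased copy of its table, each constant becomes a selection condition, and each repeated variable becomes a join condition. With several ways to solve an IDB goal, a translator of this kind would be called once per resulting goal list.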

      Bottom-up evaluation can also make use of DBMS capabilities. The database can handle all or part of evaluation of each application of T_P for a Datalog program P. A rule that mentions only EDB predicates can be translated into an SQL query. A rule that contains one or more IDB predicates needs to also access newly derived facts during right-to-left evaluation. We can create temporary tables in the database for each IDB predicate, to hold derived IDB facts, thereby having T_P execute entirely within the database. Alternatively, a bottom-up evaluator might hold IDB facts in its own memory, and evaluate each rule with a nested-loop approach, issuing database queries for EDB facts, taking into account each combination of bindings determined by the predicates outside of those queries. We discuss specific instances of these approaches in the following paragraphs: first top-down methods and then bottom-up methods.
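The temporary-table variant can be sketched with sqlite3 as below. This is a naive fixpoint loop under stated assumptions, not a description of any particular system: the two rules for related, the column names, and the sample facts are all hypothetical, since the rules of the running example are not reproduced in this passage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# EDB predicate, stored as an ordinary table (assumed column names).
conn.execute("CREATE TABLE advised (advisor TEXT, student TEXT)")
conn.executemany("INSERT INTO advised VALUES (?, ?)",
                 [("a0", "b1"), ("a0", "b2"), ("b1", "lvk")])

# IDB predicate, held in a temporary table; the key suppresses duplicates.
conn.execute(
    "CREATE TEMP TABLE related (p1 TEXT, p2 TEXT, PRIMARY KEY (p1, p2))")

RULES_AS_SQL = [
    # Assumed rule: related(P1, P2) :- advised(A, P1), advised(A, P2).
    """INSERT OR IGNORE INTO related
       SELECT x.student, y.student FROM advised x, advised y
       WHERE x.advisor = y.advisor""",
    # Assumed rule: related(P1, P2) :- advised(B, P1), related(B, P2).
    """INSERT OR IGNORE INTO related
       SELECT a.student, r.p2 FROM advised a, related r
       WHERE a.advisor = r.p1""",
]


def count_related():
    return conn.execute("SELECT COUNT(*) FROM related").fetchone()[0]


# Repeat until a full round derives no new facts. The rules run one after
# another, so a later rule can already see facts derived earlier in the
# same round; the fixpoint reached is the same as with a strict
# round-by-round application of the operator.
rounds = 0
while True:
    before = count_related()
    for stmt in RULES_AS_SQL:
        conn.execute(stmt)
    rounds += 1
    if count_related() == before:
        break

answers = sorted(row[0] for row in conn.execute(
    "SELECT p2 FROM related WHERE p1 = ?", ("lvk",)))
```

All joins run inside the database, and the Python loop only decides when to stop. A semi-naïve version would additionally keep a table of the facts derived in the previous round and join against that table instead of the full related table.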

      In terms of specific approaches, top-down methods tend to be more “proof” oriented, driven by derivation, whereas bottom-up methods are more “syntactic,” based on the structure of the program rules. Reiter [1977a] talks about an initial “compilation” phase where the IDB is consulted, resulting in a set of queries against the EDB. More specifically, he describes feeding the IDB to a theorem prover that flags EDB literals for later evaluation, and proposes that evaluation take place in a relational database. (Reiter notes that his particular method will not be complete for recursive queries.) The DADM [Kellogg et al. 1977] and DEDUCE 2 [Chang 1977] systems also present approaches where the IDB rules are first processed to yield (perhaps multiple) goal lists of EDB literals that can be evaluated against a relational database. Cuppens and Demolombe [1988] describe an accumulation approach to collect maximal sequences of EDB predicates for database evaluation. Grant and Minker [1981] note that the collection of EDB queries generated by such approaches will often have shared subexpressions that can be factored out for efficiency. Ceri et al. [1989] address overlap of database retrievals by caching the results returned from queries. The BERMUDA system [Ioannidis et al. 1988] takes a similar approach, collecting sequences of EDB predicates for evaluation by the database, caching results in a file, and having a “loader” component provide access to answers to a Prolog interpreter on demand.

      On the bottom-up side, it has long been known that the immediate consequence operator T_P for a Datalog program P can be translated into a relational-algebra expression. Ceri et al. [1986] show that a complete translation of bottom-up evaluation can be obtained by extending relational algebra with a fixpoint operator. Ceri and Tanca [1987] discuss various optimizations that can be applied to such a translation. The PRISMAlog language [Houtsma and Apers 1992] was a Datalog extension that was also translated to relational algebra extended
