Declarative Logic Programming. Michael Kifer
Чтение книги онлайн.
Читать онлайн книгу Declarative Logic Programming - Michael Kifer страница 32
The demand
predicate stores David Scott Warren, and his advisors, so that the new rules only infer facts of the adv
predicate that are relevant to our query. This version takes 944 ms, 33% faster than the original rules. This example illustrates that demand transformation is important, even for a small query.
Query 1.2 This query finds the people who got their degrees from the same university as their advisor.
XSB. Running the rules and query as given in XSB results in quite a long execution time, 49.2 min. However, if we analyze the rules for binding patterns, and therefore add the following indexing directive, which indexes the dissertation
predicate on its first, second, and the combination of second and fifth arguments:
the running time is reduced to 511 ms, almost 5800 times faster than the original! The effect here is more dramatic than in the previous query because of repeated calls to dissertation
with a different binding pattern than only the first argument being bound.
LogicBlox. The original rules and query take 2.14 s to run. Performing demand transformation in the original rules does not result in a new program, since in this order of subgoals, there is no IDB predicate with bound arguments in a left-to-right evaluation. However, we can change the order of subgoals to obtain the following rule for q2
, to effect a different demand for adv
without changing the semantics or performance of the given rule.
The demand for adv can be defined as
We can see that this join is quite expensive, joining every pair of people and filtering with respect to their universities. Therefore, demand transformation results in slower performance. Our experiments confirm this behavior, since this set of rules takes 77.9 s, 36 times slower.
Query 1.3 This query finds the people who worked in a different area than their advisors.
XSB. This query is similar in essence to Query 2, and without indexing the query takes 47.4 minutes to run. With the correct indexing directives, it takes 325 ms, 8750 times faster.
LogicBlox. Since Queries 2 and 3 are very similar, we again get the best performance with the initial rules, since the demand transformation after subgoal reordering results in extra work. The original rules and query take 1.74 s to run. If the adv
predicate is moved to a later position and demand transformation is applied, the resulting rules and query take 74.4 s, 43 times slower.
Queries 1.4/5 These queries find the academic ancestors of David Scott Warren, and their count, respectively. The count is only a small overhead for each system, therefore we have grouped the queries together.
XSB. For all the prior queries, it can be shown that, based on the rules and the nature of the data (the genealogy information does not contain cycles), this query would terminate even without tabling. However, even in the absence of repeated subqueries, recursion could lead to exponential time complexity. As we discussed, the solution to this problem in top-down systems is tabling. To enable tabling, we write the following directive with the rules for ancestry:
Recall that our query will generate an initial subquery for anc
with the second argument bound (to David Scott Warren). However, adding the tabling directive does not seem to help, since for every person, this set of rules will generate a query: Is this person an ancestor of David Scott Warren?
This strategy results in a lot of unnecessary computation, and XSB quickly runs out of memory; tabling did not help us get the answers here. Can we, however, reduce the number of queries, and therefore evaluate the query efficiently? The answer was already provided in Section 1.5.2.1, where we showed that left-recursive transitive closure is much more efficient than right-recursive, if tabling is used. Indeed, if we reorder the goals in the second rule as follows:
we will only ever ask: Who is an ancestor of David Scott Warren? And, voilà, we get the answers quickly, in 3.19 s. This example shows that tabling and subgoal ordering are essential to optimizing queries for top-down evaluation. Getting the count for Query 5 only takes a marginal 0.2% longer. The performance can be further improved by indexing the tabled predicate adv
on its second argument, since it is always called with the second argument bound, with the directive:
This change improves the runtime to 181 ms, 17 times faster than without the index.
Apart from indexing and reordering, we can change the recursion to obtain another left-recursive version of the rules above:
However, this version runs slower at 11.3 s, since it immediately creates a subquery with both arguments free (neither X
nor Z
are bound in the head), therefore it finds all possible ancestry pairs.
These experiments show that the correct choice of directives, of rule type, and goal ordering can drastically affect the performance of queries. As a final remark, note that in the evaluation of such recursive queries, tabling directives should always be provided to avoid infinite recursion or exponential time complexity. Without tabling, XSB fails to complete this query in a pre-defined amount of time (1 h).
LogicBlox. Since basic bottom-up evaluation computes the ancestry for every possible pair of people, it is extremely inefficient not to take the query into account. LogicBlox takes 79.1 s to get the answers to the query, in contrast to XSB answering the query in a fraction of a second.
The time complexity of the evaluation for this query in LogicBlox is quadratic (with a polylog factor) in the size of the genealogy graph. We can reduce that time to linear, and observe a benefit that brings the performance to acceptable levels, if we perform demand transformation after reordering the goals as we did for XSB. These changes yield the following rules:
After this transformation, the query takes only 1.14 s in LogicBlox, 70 times faster than the original rules. Adding the count for Query 5 results in a marginal increase of only 0.03%.
Summary. We have illustrated the behavior of the two systems for each of our queries. In particular, specifying indexes and tabling for XSB, and demand transformation for LogicBlox have been shown to dramatically improve the running time. Note that these improvements are not constant-time, but rather asymptotic speedups. We have also shown that even when the systems theoretically should have similar asymptotic behavior, differences in implementation can result in significant variations in actual running times.
1.9 Conclusions
Datalog was a product of converging trends in databases, logic programming,