Declarative Logic Programming. Michael Kifer
Чтение книги онлайн.
Читать онлайн книгу Declarative Logic Programming - Michael Kifer страница 31
Since the object-oriented schema above is quite different from the original relational schema in Section 1.8.1, the next pair of rules specifies the “bridge” between the two schemes:
We recall that the symbol ->
means “has value” and thus the first rule above says that, given a tuple 〈d, c, t, g, u, y, ar〉 in table dissertation
and a tuple 〈adv, d〉 in advised
, one can derive that d is an object in class Dissertation
such that: its attribute cid
has value c, the attribute advisor
has a value adv, title
has value t, and so on. Note that an attribute can have multiple values and so, if dissertation d was advised by several advisors, the advisor
attribute for d will be multi-valued.
The next batch of statements contains the actual queries for our running example. It starts with three auxiliary rules that define additional view-methods for classes Dissertation
and Person
. The first rule, for example, says that if there is a Dissertation
-object (denoted by a name-less variable ?
) that has a candidate with Id ?P
(i.e., the attribute cid
has value ?P
) and an advisor ?A
then that dissertation’s advisor is also an advisor for ?P
. The next two rules recursively define the method ancAdvisor
in class Person
.
The last rule is interesting because it appears in the form of a fact, but is really a shorthand for a rule like
but count{…}
is an evaluable aggregate function that can be placed directly as an argument to q5
. This last query provides a glimpse of how logical and functional styles can be mixed in Flora-2.
1.8.6 Experiments
In this section, we report experimental results on two of the four modern systems we have discussed: XSB and LogicBlox. The systems that we have chosen for experiments are representative of two schools of evaluation, top-down and bottom-up. Each school has its own challenges, and we will illustrate those challenges via experiments, and show techniques to tackle them.
An issue pertinent to both schools of evaluation is indexing. During an evaluation, a system needs to retrieve facts for a predicate given some bound arguments. XSB, by default, generates an index on the first argument of each predicate. However, as we will illustrate, that index may not be enough. In such cases, XSB allows the user to specify additional indexing directives. LogicBlox automatically generates relevant indexes, and the user cannot affect the system’s choices. We will illustrate the importance of indexing in XSB via experiments.
An issue pertinent only to top-down evaluation is repeated calls to subgoals, as described in Section 1.5.2.1. We will show how the user specifies tabling directives, and the difference in query evaluation time with and without tabling in XSB.
An issue pertinent only to bottom-up evaluation is that, in its basic form, it does not take the particular query into account. Many methods exist to transform the rules so that the query is taken into account, most notably the Magic Sets transformation [Bancilhon et al. 1986]. We will use demand transformation instead [Tekle and Liu 2010], since it is a simpler method and its space complexity can be exponentially less (in the number of arguments of predicates) than Magic Sets. We will illustrate the effect of demand transformation via experiments in the LogicBlox system.
Another impact on performance for a system is the inclusion of database features such as atomicity and durability. These require various mechanisms, including locks and writes to disk, which impact performance negatively. LogicBlox has various such features, but XSB does not, and the experiment results should be evaluated in this light as well.
Optimization of Datalog queries by transforming rules and queries has been widely studied. These include specialization and recursion conversion [Tekle et al. 2008]; the latter is illustrated for relevant queries for both systems.
We now show our experiments for each query. We performed the experiments on a 2.8 GHz Intel Core i5 with 8 GB RAM, using XSB 3.7 and LogicBlox 4.3.10. We performed each experiment 10 times and took the average times of the runs.
Loading Input Data. Input data load times are important to the viability of a system, but it is a task that need be performed only once, and does not change with respect to a given query. For this reason, we separate loading input data from query answering in our experiments. In order to load the input data into XSB, we run the following query:
and in order to load the input data into LogicBlox, we install the definitions of the input predicates, and then run import scripts via its TDX mechanism.
Setting up the environment and loading of input data as described above takes 3.91 s in XSB, and 14.9 s in LogicBlox.
Query 1.1 The first query asks for the grand-advisors of David Scott Warren.
XSB. Recall the rules and query in XSB:
Note that a call to q1
will be made with its argument as a free variable, and therefore the following calls will be made in the body of the rule for q1
: (i) a call to person
with the second argument bound due to the first subgoal, since that argument is a constant; (ii) a call to adv
with the second argument bound due to the second subgoal, since D
will be bound by the first subgoal; (iii) another call to adv
with the second argument bound due to the third subgoal, since X
will be bound by the second subgoal; and (iv) another call to person
with the first argument bound due to the final subgoal, since Y
will be bound by the third subgoal.
The rules and query may be analyzed for each relevant predicate to find binding patterns for predicates, as shown above. For optimal performance, it is critical to have indexes corresponding to each binding pattern for the IDB predicates. As noted, XSB provides indexes on the first argument by default. For each non-default index, we write the indexing directives as follows:
The query is answered in 261 ms without the index, and in 238 ms with the index, 9.6% faster. The effect is not so dramatic, since only one subgoal binds the second argument of person
. We will see more dramatic effects in other queries.
LogicBlox. Recall that in LogicBlox, we do not have control over the indexing mechanism. Running the first query takes 1400 ms. However, LogicBlox does not take the query into account, and therefore will infer facts for the adv
predicate for every possible pair, but the only relevant ones are David Scott Warren’s advisors, and their advisors in turn. To avoid the extra computation, we can perform demand transformation, which yields the following rules: