Declarative Logic Programming. Michael Kifer
Чтение книги онлайн.
Читать онлайн книгу Declarative Logic Programming - Michael Kifer страница 7
The second is a restriction usually placed on Datalog programs that the rules be safe, in the sense that any fact in the IDB contains only the values that appear in the EDB or the rules themselves. For example, the rule
is not safe, because the value of P2
is unconstrained and therefore P2
can be bound to any value, not necessarily the ones that appear in EDB. Generally, safety is guaranteed by a syntactic condition, such as requiring that every variable in the rule head also appear in some predicate in the rule body.
The third point is a notational convenience. The first Datalog rule could be written more succinctly as:
In this formulation, since the actual names of the variables do not matter, logical variables are consistently renamed with 1-letter variable names. Furthermore, inventing names for the variables that occur only once in a rule is an unnecessary burden and the underscore is often used to denote “anonymous” variables. Each occurrence of “_
” is treated as a new variable name, which the parser generates automatically.
1.2 The Emergence of Datalog
Since Datalog is a subset of Prolog, one could create and execute Datalog programs as soon as Prolog evaluators existed—around 1972. However, as a named concept and object of study, Datalog emerged in the mid-1980s, following work in deductive databases in the late 1970s. (See Section 1.3 for more on the naming of Datalog.) Why did Datalog emerge as a topic of interest at that point? It was because Datalog served as a “sweet spot”—or “middle ground”—for related research lines from Logic Programming, Database Systems, and Artificial Intelligence.
Why was Datalog interesting to the three communities? It was because pure Datalog was very simple and had a clean syntax and semantics. Yet it was expressive enough to serve as the basis for theoretical investigations and examination of evaluation alternatives, as well as a foundation from which extensions could be explored and a starting point for knowledge-representation systems. We summarize relevant trends in each of the three communities below, following each by more specific background.
Logic Programming
Logic programmers saw relational databases as an implementation of an important sublanguage, and worked to integrate them into Prolog systems. Since early Prolog implementations assumed all rules and facts were memory-resident, it was clear that for very large fact bases, something like relational database technology was necessary. Sometimes this enhancement took the form of a connection to a relational database management system (RDBMS). Translators were written to convert a subset of Prolog programs into SQL to be passed to an RDBMS and evaluated there. Datalog was a more powerful, more Prolog-ish, data-oriented subset of Prolog than SQL. Datalog was also more “declarative” than Prolog and many in the Logic Programming community liked that aspect. With Prolog, there was a serious gap between Logic Programming theory (programs understood as logical axioms and deduction) and Logic Programming practice (programs as evaluated by Prolog interpreters). For one thing, Prolog contained a number of non-logical primitives; for another, many perfectly logical Prolog programs would not terminate due to the weaknesses of the evaluation strategy used by the Prolog interpreters. With Datalog, that gap almost disappeared—deduction largely coincided with evaluation in terms of the produced results. Also, Datalog provided a basis on which to work out solutions for recursion with negation, some of which could be applied to more general logic languages.
Background. Logic programming grew out of resolution theorem proving proposed by Robinson [1965] and, especially, out of a particularly simplified version of it, called SLD resolution [Lloyd 1993, Kowalski 1974], which worked for special cases of logic, such as Horn clauses.5 In the early 1970s, researchers began to realize that SLD resolution combined with backtracking provided a computational model. In particular, Colmerauer and collaborators developed Prolog [Colmerauer and Roussel 1996] and Kowalski made significant contributions to the theory of logic programming, as it came to be called [Kowalski 1988]. Prolog was the starting point for languages in the Japanese Fifth-Generation Computer Systems (FGCS) project in the early 1980s [Fuchi and Furukawa 1987, Moto-oka and Stone 1984]. The FGCS project sought advances in hardware, databases, parallel computing, deduction and user interfaces to build high-performance knowledge-base systems. In the context of this project, Fuchi [1981] describes Prolog as a basis for bringing together programming languages and database query languages. D. H. D. Warren [1982b] provides interesting insights into the focus on Prolog for the FGCS. The use of Prolog to express queries dates from around the same time. For example, the Chat-80 system [Warren and Pereira 1982] analyzed natural-language questions and turned them into Prolog clauses that could be evaluated against stored predicates. (All examples there are actually in Datalog.) Warren [1981] showed that such queries were amenable to some database-style optimizations via Prolog rewriting and annotation. Prolog itself had been proposed for database query. For example, Maier [1986b] considers Prolog as a database query language and notes its advantages—avoiding the “impedance mismatch” between DBMS and programming language, expressive power, ease of transformation—but also points out its limits in terms of data definition, update, secondary storage, concurrency control and recovery. Zaniolo [1986] also notes that Prolog can be used to write complete database applications, avoiding the impedance mismatch. He further proposes extensions to Prolog for use with a data model supporting entity identity. For a recent survey of the history of logic programming, see Kowalski [2014].
Database Systems
As the relational model gained traction in the 1980s, limitations on expressiveness of the query languages became widely recognized by researchers and practitioners. In particular, fairly common applications—such as the transitive closure of a graph and bill-of-materials roll ups (i.e., aggregation of costs, weights, etc. in a partsubpart hierarchy)—could not be expressed with a single query in most relational query languages. Various approaches for enhancing expressiveness to handle such cases were proposed, such as adding control structures or a fixpoint operator to relational algebra. Datalog was a simple alternative that was similar to domain relational calculus, with which the database theory community was familiar. Thus, it was readily understood, and provided a natural setting in which to study topics such as deductive databases, recursion, its interaction with negation, and evaluation techniques. Much of the early discussion and presentations on Datalog took place at the informal “XP” workshops6 (particularly XP 4.5 in 1983 and XP 7.52 in 1986) and the early symposia on Principals of Database Systems (PODS), which were a follow-up to the XP workshops to some extent. Work on Datalog also highlighted the difference between query answering as evaluation in a model vs. deduction in a theory.
Background. It was recognized early on that there were recursive queries expressible neither in relational algebra nor relational calculus. For example, Aho and Ullman [1979] prove the inexpressibility of transitive closure by finite relational expressions, and consider various extensions to handle it, such as a least-fixed-point operator and embedding in a host programming language. Paredaens [1978] and Bancilhon [1978] also explore this issue. The PROBE system supported traversal recursion over directed graphs [Rosenthal et al. 1986].
The connection of logic and databases predates the relational model. For example, Green and Raphael [1968] uses theorem proving as the basis for the question-answering system QA1 that can “deduce facts that are not explicitly available in its data base.” The 1970s was an active time for investigating the connections between logic and databases, such as model-theoretic vs. proof-theoretic views of a database [Nicolas and Gallaire 1977] and “closed-world” vs. “open-world” assumptions about the information in a database [Reiter