Declarative Logic Programming. Michael Kifer
Чтение книги онлайн.
Читать онлайн книгу Declarative Logic Programming - Michael Kifer страница 26
The Starburst project at IBM sought to make relational DBMSs more extensible in several dimensions, including query processing [Haas et al. 1989]. It introduced table functions. A table function produces a table as a result, and can call itself in its own definition, thus expressing a recursive query. An interesting outcome of the project was a method for applying the Magic Sets technique to full SQL [Mumick and Pirahesh 1994].
FLORID [Frohn et al. 1998] is an object-oriented Datalog system developed in the 1990s at Freiburg University. It differs from all the aforementioned early systems in that it was based on the object-oriented model provided by F-logic (see Section 1.4.6) and not on the usual predicate syntax. It was thus one of the first systems that provided theoretically clean support for semi-structured data.21 It was also one of the first to support the well-founded semantics for negation (Section 1.4.1). In other aspects, FLORID’s evaluation strategy is bottom-up and it has a very flexible support for aggregation, which allows aggregations to be nested arbitrarily.
1.7 The Decline and Resurgence of Datalog
After its swift rise in the 1980s, interest in Datalog entered into a period of decline in 1990s and then started to rise again in 2000s. This section reviews some of the reasons for the decline and the subsequent resurgence.
1.7.1 The Decline
With the notable exception of XSB, the systems in the previous section are no longer supported. A limited form of linear recursion made it from Starburst into IBM’s DB2 product and into the SQL:1999 standard. Datalog and deductive databases did not emerge as a major commercial category. Research on Datalog ebbed in the 1990s. The reasons for the waning of interest are not completely clear, but we consider here some contributing factors. We were helped by comments from colleagues in the field (see Acknowledgments).
No “Killer Application”. A new technology is often propelled by a “killer application” that it enables for which existing technologies are ill-suited. For example, the World Wide Web led to major growth in the Internet (and network connectivity in the home). No such application emerged for Datalog—nobody seemed that interested in computing his or her ancestors. While there were problems that could not be handled directly by conventional relational DBMSs, there were often specific products in the market to solve those problems. For example, IBM’s COPICS product had a module that handled the Bill-of-Materials problem (traversing down the part-subpart hierarchy while aggregating numbers of parts, weight, total cost, and so forth), thus supporting a particular type of recursive query.
Theoretical and Practical Limitations. There were limitations in both evaluation technology and computer hardware that hampered the efficiency and applicability of early Datalog systems. While Magic Sets and similar approaches helped the performance of bottom-up evaluation, in the worst case they could cause an exponential expansion in program size, if the target predicates had many binding patterns. For top-down evaluation, while tabling techniques had been investigated, there were still issues with handling negation, incrementality, and the best way to schedule subgoals. On the hardware side, computers were limited in processor power and memory (4–64 MB of main memory was typical for desktop machines in the 1990s), which meant that Datalog programs on what today would be considered small amounts of data could easily be disk bound.
Loss of Mindshare by Logic Programming in General. Logic programming was being overshadowed by object-oriented languages, which were much better at user interfaces and graphics at a point where powerful desktop workstations were coming on the market. Prolog had modest UI capabilities, and most would not transfer into Datalog, so you could not write a complete application in it, in contrast to object-oriented languages such as Smalltalk and C++
. Programmer productivity issues seemed focused on graphics and interactivity vs. sophisticated query and analysis, though there was still interest in expert systems and knowledge bases. Coverage of Datalog (and logic programming) in academic settings declined, meaning there were fewer programmers versed in logic languages as compared to object-oriented languages.
Antipathy in the Database Systems Community. There was open antagonism by some in the commercial and systems research communities toward deductive database work. Some of that ill will seemed rooted in a vested stake in SQL orthodoxy (which also manifested in criticism of object-oriented databases). That sentiment might also have come from feelings that deductive database work was not addressing the most important issues in data management, such as extensibility and spatial access methods. The nature of some of the research contributions might have added to these perceptions. For example, there were many papers on proposed evaluation techniques, but a lot of them were not actually implemented, much less benchmarked for performance.
Barriers to Entry in the Database Market. While Datalog systems could support more expressive querying than SQL-based DBMSs, the latter had many features that were often primitive or lacking in Datalog implementations, such as concurrency control, recovery, authorizations, cost-based optimization, indexing (and other physical design options), replication and back-up services, integrity constraints, and programming interfaces from multiple languages. While some Datalog implementations had some of these features, most database applications need all of them. Faced with a choice between recursive queries and the other features, developers chose the other features. Recursive queries could be implemented in a generalpurpose programming language making calls to the DBMS, whereas simulating the missing features in Datalog systems that did not have them was in most cases infeasible. Some of the features, such as concurrency and recovery, need quite sophisticated techniques to perform efficiently, and the investment costs to provide them to a Datalog system were likely beyond the means of most development projects. An alternative approach is to have a regular DBMS as a submodule of a Datalog system (and some systems followed this approach). Such a hybrid could provide most of the features mentioned, however, query optimization and evaluation was split across two components, limiting the performance of the resulting system. Datalog systems could not meet the minimum requirements to compete in the data-management market.
Inaccessibility of the Literature. Some whom we consulted said they found the deductive database literature overly dry and complicated. Papers were not easy to understand, and often lacked compelling examples or applications. This aspect may have limited uptake of ideas into existing DBMSs, especially since papers did not say how to fit these approaches into current implementation technology. Also, while Datalog bore close resemblance to the domain relational calculus, leading relational languages, such as SQL and QUEL, were tuple-calculus based. While the difference may seem trivial, and although in many cases Datalog allowed more concise expression of queries, positional notation can be challenging and errorprone for predicates with dozens of attributes, compared to named notation.
Contributions of Datalog. Even during its decline stage and before it started to surge again, Datalog had lasting effects on the contributing areas of logic programming, knowledge representation, and databases. In logic programming, it provided a “declarative reset,” and got people thinking again about how far one could go in logic languages without procedural or extra-logical features. For example, Datalog provided an arena to examine the recursion and negation, to find approaches that were more declarative than negation-as-failure. One approach was the Stable Model semantics, which led to Answer Set programming—the focus of much of current logic programming activity. In knowledge representation, it has influenced languages to deal with graph-based knowledge structures. For example, the core of the SPARQL language for RDF builds from Datalog-like goals over triples of the form (Subject, Predicate, Object). (SELECT queries in SPARQL have been shown equivalent to non-recursive safe Datalog