Declarative Logic Programming. Michael Kifer

Чтение книги онлайн.

Читать онлайн книгу Declarative Logic Programming - Michael Kifer страница 28

Declarative Logic Programming - Michael Kifer ACM Books

Скачать книгу

and analysis. However, queries are written in an SQL-like language that is translated to Datalog and thence to relational database queries [de Moor et al. 2007]. The Doop system [Smaragdakis and Bravenboer 2011] also queries programs using Datalog, such as performing points-to analysis—determining all the memory locations to which a pointer variable might refer. They report that their analyses are as precise as previous methods, but faster. Their evaluation relies on the aggressive optimization of Datalog, as well as programmer-assisted indexing of EDB relations.

      The use of Datalog for program analysis illustrates several themes that emerge across recent applications that motivate some of the renewed interest in the language. One is handling of complex reasoning. For example, program analysis involves mutual recursion among different properties being reported, such as pointsto, call graph and exception analysis. Also, the algorithms are expressed in a way that they can often be readily modified via adding an additional parameter in a predicate, or a few more rules. For example, Lam et al. [2005] point out that the Datalog rules for their context-sensitive points-to analysis are almost identical to their counterparts in context-insensitive points-to analysis. In contrast, capturing the reasoning and relationships for program analysis, along with appropriate control strategies, in an imperative programming language is a labor-intensive and error-prone task. Also, while the data sizes involved are not staggering, they are large enough that manual optimization of code may not be sufficient to deal with them efficiently. Another use of Datalog illustrated by the program-analysis domain is the use of Datalog as an intermediate form. It is often fairly easy to write a translator from a domain-specific syntax to Datalog. From there, a wide variety of implementation and optimization options are available. As we have seen, there are multiple control strategies based on top-down and bottom-up methods, and representation alternatives for data and rules, such as binary decision diagrams. Although not discussed extensively in this chapter, Datalog admits a range of parallel and distributed evaluation techniques. With these approaches, Datalog performance can match or exceed that of custom analysis tools [de Moor et al. 2007, Smaragdakis and Bravenboer 2011].

      Declarative Networking. Datalog has also gained popularity for reasoning about networks. The general area of declarative networking [Loo et al. 2009] has grown to encompass network protocols, services, analysis, and monitoring. It is an area with many examples of recursive reasoning over both extensional (link tables) and intensional (state-transition rules) information. Describing network algorithms in Datalog exposes the essential differences between them, generates predicates that can be reused across algorithms, and suggests new alternatives [Loo et al. 2006]. Simple syntactic extensions of Datalog, such as location variables, can capture key aspects of the domain. A range of distributed evaluation techniques are available, including asynchronous data-flow computation derived from the convergence properties of Datalog. (Bottom-up evaluation of a monotonic Datalog program will always converge to the same result regardless of the order in which rules are applied.)

      The P2 facility [Loo et al. 2005] supports the expression of overlay networks, in a (cyclic) data-flow framework using asynchronous messages. The authors found that using Datalog as a basis supported succinct expression of overlay algorithms. For example, the Chord overlay algorithm can be expressed in 47 rules, with many reusable parts, in contrast to finite-automaton-based approaches that are less concise and more monolithic, and the P2 version provides acceptable performance. The Datalog-based approach called out similarities and distinctions among different algorithms. (For example, a wired and wireless protocol were found to differ only in predicate order in one rule body.) Their approach also made hybrid algorithms easy to construct. Overlay specifications in P2 are written in Overlog, a Datalog-based language with extension for location attributes in tuples and continuous queries over a changing EDB (including handling of deletion). Overlog is translatable into data-flow graphs with local Datalog evaluation engines communicating via asynchronous messages.

      Loo et al. [2006] extend the P2 work, looking at general techniques for prototyping, deploying and evolving continuous recursive queries over networks. In that work, the network is both the object of study and the vehicle for evaluation. Their language NDlog (for Network Datalog) emphasizes this latter aspect through both location flags in predicates and an explicit #link EDB predicate that lists all direct node-to-node connections in the network. Every predicate in NDlog must have a location-specifier variable as the first argument, which is used to distribute tuples in the network. Further, any “non-local” rule must contain a #link literal, and every predicate in the rule must have the source or destination of that link as its location specifier. An example from the paper is

Image

      which says there is a path from source @S to destination @D with first hop to @Z. (NDlog supports list values through built-in functions.) We see that all predicates in the rule have either @S or @Z as the location specifier. NDlog is compiled into local dataflow graphs that run at each node in the network. A non-local rule such as the one above is split into two local rules communicating through messages across the link. Evaluation is a relaxed form of semi-naïve, in which iterations happen locally, with tuples that arrive from other nodes during an iteration either being used immediately or buffered for the next iteration. The system uses both traditional and newly proposed Datalog optimizations. The former include predicate reordering, Magic Sets and aggregate selection, in which presence of an aggregate function, such as min in shortest path, allows suppression of non-optimal results [Furfaro et al. 2002]. The authors also present optimizations particular to the distributed setting, based on caching of results that pass through a node, and merging of messages with certain attribute values in common, as well as choosing between (or possibly combining) alternative evaluation strategies based on costs.

      Abiteboul et al. [2005] apply Datalog to determining possible event sequences leading to alarm reports in a networked application. (A distributed telecom system is their example.) They target Datalog because of the need to reason over extensional (alarms) and intensional (potential execution flows) knowledge with recursive dependencies. The authors note that Datalog’s convergence properties are a good match to asynchronous, distributed settings, such as reasoning in and about networks. Their distributed version of Datalog—dDatalog—assigns whole relations to nodes. They adapt QSQ to evaluating dDatalog in a distributed setting. When a node encounters a remote predicate in evaluating a query, it ships that sub-goal, its binding pattern, and the remainder of the query to the node with the remote predicate.

      The Cassandra access-control framework [Becker and Sewell 2004] is another example of the use of Datalog in a distributed setting. Cassandra augments Datalog with location and credential-issuer qualifiers, as well as constraints. The domain for the constraints is pluggable, and gives rise to “a sequence of increasingly expressive policy specification languages.” For example, a constraint domain for arithmetic allows specification of policies that limit the length of a delegation chain. Policy rules that can invoke remote policies over the network, to request resources, roles, and credentials, and are interpreted in a policy-evaluation engine. Cassandra chooses a tabled, top-down evaluation approach, because of potential problems with bottom-up evaluation. In particular, the distributed setting means that right-to-left evaluation of rules would require distributed query plans.

      Datalog and Big Data. More recently, Datalog has been the basis for scalable dataanalytics systems, especially those targeting graph structures such as arise in social networks. SociaLite [Lam et al. 2013] extends Datalog for analyzing social networks. It supports some kinds of recursive aggregates, and allows the programmer to provide hints on data representation and evaluation order, which leads to performance comparable to imperative implementations, but with programs that are an order of magnitude smaller. Distributed SociaLite [Seo et al. 2013] further allows the programmer to specify how data should be shared across servers, and derives how to distribute evaluation and organize communication from that specification. The Myria system supports iterative analytic routines using recursion in Datalog [Wang et al. 2015]. Myria supports asynchronous and prioritized evaluation of Datalog on a shared-nothing cluster, as well

Скачать книгу