Linked Data Visualization. Laura Po
Figure 1.6: Semantic Web Stack.
The first layer of the stack is the medium for information transfer: the Web platform. The idea behind the Semantic Web was to create a globally distributed database. This requires identifying resources unambiguously and adopting a universally accepted encoding system, so that things can be identified even across countries that use different writing systems. This first step was accomplished by adopting the URI (Uniform Resource Identifier). With the advent of RDF 1.1 in 2014, the naming standard became the IRI (Internationalized Resource Identifier). IRIs are sequences of Unicode characters and support characters from any language. This is an important step forward in the multicultural context of the Internet.
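The relationship between IRIs and URIs can be illustrated with a small sketch. It assumes the usual mapping in which non-ASCII characters are percent-encoded as UTF-8 bytes and host names are converted with IDNA. The function name `iri_to_uri` and the `example.org` addresses are hypothetical, and the sketch deliberately ignores user info and IPv6 hosts.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI reserved characters plus "%" so that
# escapes already present in the IRI are not encoded twice.
SAFE = "/:@!$&'()*+,;=%~"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)
    # Host names are converted with IDNA (Punycode), not percent-encoding.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Everything else: non-ASCII characters become %XX-encoded UTF-8 bytes.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_uri("http://example.org/résumé"))
```

The IRI itself stays readable in any writing system, while the derived URI remains usable by software that only accepts ASCII.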
Once it is clear how to identify and access resources, the next step is to create them and provide additional information about them. The Resource Description Framework (RDF) is the model adopted for this task; it is a general-purpose language for representing information about resources. RDF has a very simple and flexible data model, based on the central concept of the RDF statement. RDF statements describe simple facts as triples of the form Subject – Predicate – Object, consisting of the resource being described (the subject), a property (the predicate), and a property value (the object). In particular, the subject can be either an IRI or a blank node, the predicate must be an IRI, and the object can be an IRI, a blank node, or an RDF literal. A blank node is a placeholder for a resource that has been given neither an IRI nor a literal. A collection of RDF statements (that is, RDF triples) can be intuitively understood as a directed labeled graph, where the resources are nodes and the statements are arcs connecting two nodes (from the subject node to the object node). Finally, a set of RDF triples is called an RDF graph, and a collection of RDF graphs is called an RDF dataset.
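The position rules for triples described above can be sketched in a few lines of Python. The class names `IRI`, `BlankNode`, and `Literal`, the helper functions, and the `http://example.org/` namespace are illustrative assumptions, not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def make_triple(subject, predicate, obj):
    """Build a triple, enforcing which kinds of term each position allows."""
    if not isinstance(subject, (IRI, BlankNode)):
        raise TypeError("subject must be an IRI or a blank node")
    if not isinstance(predicate, IRI):
        raise TypeError("predicate must be an IRI")
    if not isinstance(obj, (IRI, BlankNode, Literal)):
        raise TypeError("object must be an IRI, a blank node, or a literal")
    return (subject, predicate, obj)


EX = "http://example.org/"  # hypothetical namespace

# An RDF graph is simply a set of triples.
graph = {
    make_triple(IRI(EX + "alice"), IRI(EX + "knows"), IRI(EX + "bob")),
    make_triple(IRI(EX + "alice"), IRI(EX + "name"), Literal("Alice")),
    make_triple(IRI(EX + "bob"), IRI(EX + "address"), BlankNode("b0")),
}


def outgoing(graph, node):
    """Arcs leaving `node`, as (predicate, object) pairs."""
    return {(p, o) for s, p, o in graph if s == node}
```

Reading the set as a graph, `outgoing(graph, IRI(EX + "alice"))` yields the two labeled arcs that start at the node for Alice.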
RDF data can be written down in a number of different formats, known as serializations. The first standard serialization format is called RDF/XML and is based on XML tags. Although RDF/XML is still in use, other RDF serializations are now preferred because they are more human-friendly. The other serialization formats include:
• RDFa: notation for embedding RDF metadata in XHTML web pages;
• N-Triples: an intuitive, line-based format. It expresses each triple of an RDF graph on a separate line;
• N3 (Notation 3): a serialization format developed by Tim Berners-Lee and designed to be compact and human-readable;
• Turtle (Terse RDF Triple Language): a compact and human-friendly format. It is a subset of N3;
• TriG: an extension of the Turtle notation that supports multiple (named) graphs;
• N-Quads: a superset of N-Triples, for serializing multiple RDF graphs. The fourth element of the “triple” contains the name of the graph to which the statement belongs; and
• JSON-LD: the standard JSON-based serialization format, which superseded the RDF/JSON format. It can be used for writing RDF triples in a JSON style.
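To make the two line-based formats in the list above more concrete, here is a minimal sketch of an N-Triples and N-Quads writer. The helper names (`iri`, `literal`, `to_ntriples`, `to_nquads`) and the `example.org` data are assumptions made for illustration. The sketch assumes well-formed IRIs and escapes only the most common special characters in literals.

```python
def iri(value: str) -> str:
    """Write an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Write a plain string literal, escaping backslash, quote, and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """One triple per line, each terminated by ' .'."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def to_nquads(quads) -> str:
    """Like N-Triples, with a fourth term naming the graph."""
    return "".join(f"{s} {p} {o} {g} .\n" for s, p, o, g in quads)


EX = "http://example.org/"  # hypothetical namespace
triples = [
    (iri(EX + "alice"), iri(EX + "knows"), iri(EX + "bob")),
    (iri(EX + "alice"), iri(EX + "name"), literal("Alice")),
]
quads = [(s, p, o, iri(EX + "graph1")) for s, p, o in triples]

print(to_ntriples(triples), end="")
print(to_nquads(quads), end="")
```

The only difference between the two outputs is the extra graph name before the final dot, which is why N-Quads is described as a superset of N-Triples.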
The third layer of the stack aims at structuring the data. The RDF model and its extension, RDFS (RDF Schema), were designed to describe resources and the relationships between them using a set of reserved terms called the RDFS vocabulary. They provide constructs for describing types of objects (classes), type hierarchies (subclasses), properties that represent object features (properties), and property hierarchies (subproperties). In particular, a class in RDFS corresponds to the generic concept of a type or category, somewhat like the notion of a class in object-oriented languages, and is defined using the construct rdfs:Class. The resources that belong to a class are called its instances. An instance of a class is a resource having an rdf:type property whose value is that class. Moreover, a resource may be an instance of more than one class. Classes can be organized hierarchically using the construct rdfs:subClassOf. A property in RDFS is used to characterize a class or a set of classes and is defined using the construct rdf:Property.
The Web Ontology Language (OWL) was released in 2004 and is the standard language for defining and instantiating Web ontologies. OWL and RDFS have several similarities. Indeed, OWL is defined as a vocabulary, just as RDF is; however, OWL has richer semantics. An OWL class is defined using the construct owl:Class and represents a set of individuals with common properties. Moreover, OWL provides additional constructors for class definition, including the basic set operations of union, intersection, and complement, which are implemented by the constructs owl:unionOf, owl:intersectionOf, and owl:complementOf, respectively. As for individuals, OWL makes it possible to state that two individuals are identical or different through the owl:sameAs and owl:differentFrom constructs. Unlike RDF Schema, OWL distinguishes a property whose range is a datatype value (owl:DatatypeProperty) from a property whose range is a set of resources (owl:ObjectProperty). In 2009, an extended and revised version of OWL, called OWL 2, became the new W3C recommendation. The OWL 2 Web Ontology Language (OWL 2) has an overall structure very similar to that of OWL 1 and is backward compatible with it, while introducing a plethora of new features.
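Returning to the RDFS constructs described above, the interplay of rdf:type and rdfs:subClassOf can be sketched as follows. This is a simplified illustration, not a full RDFS reasoner: it only follows the subclass hierarchy upwards. The `ex:` names and the function names are hypothetical, and prefixed names are kept as plain strings for brevity.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical example data.
triples = {
    ("ex:Student", RDFS_SUBCLASS_OF, "ex:Person"),
    ("ex:Person", RDFS_SUBCLASS_OF, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:alice", RDF_TYPE, "ex:Musician"),  # an instance of more than one class
}


def superclasses(triples, cls):
    """All classes reachable from `cls` by following rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == RDFS_SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def classes_of(triples, resource):
    """Classes stated via rdf:type, plus those implied by the hierarchy."""
    direct = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(triples, cls)
    return inferred


print(sorted(classes_of(triples, "ex:alice")))
```

Because `ex:Student` is a subclass of `ex:Person`, which is in turn a subclass of `ex:Agent`, the resource `ex:alice` is treated as an instance of all three classes in addition to `ex:Musician`.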
Alongside the development of OWL, countless vocabularies have been developed. To name just a few: VoID (Vocabulary of Interlinked Datasets) contains terms for providing metadata about a dataset; FOAF (Friend of a Friend) operates in the social network domain and contains terms for describing people and their relations; SKOS (Simple Knowledge Organization System) is used for sharing and linking knowledge organization systems such as thesauri or taxonomies; and the RDF Data Cube Vocabulary can be used for publishing multi-dimensional data such as statistics.
The SPARQL Protocol and RDF Query Language (SPARQL) is a W3C recommendation and has been the standard query language for RDF data since 2008. SPARQL is one of the key technologies of the Semantic Web and is used to retrieve and manipulate RDF data from the knowledge graphs available on the Web. The evaluation of SPARQL queries is based on graph pattern matching. Graph patterns are templates consisting of a series of triples, possibly containing variables, that the SPARQL engine looks for inside the store.
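The idea of graph pattern matching can be sketched with a naive evaluator. This is only an illustration of the principle and not how a real SPARQL engine is implemented. Terms are plain strings, variables are strings starting with "?", and the function names and `ex:` data are hypothetical.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, store):
    """Return every variable binding that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in store:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


store = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
]

# "Who knows someone, and what is that someone's name?"
patterns = [("?x", "ex:knows", "?y"), ("?y", "ex:name", "?n")]
print(evaluate(patterns, store))
```

The shared variable `?y` is what joins the two patterns: a solution survives only if the same resource appears as the object of the first triple and the subject of the second.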
SPARQL allows four query forms: SELECT, ASK, CONSTRUCT, and DESCRIBE. The SELECT query form returns a solution sequence, i.e., a sequence of variables and their bindings. The ASK query form returns a Boolean value (true or false), indicating whether a query pattern matches or not. The CONSTRUCT query form returns an RDF graph structured according to the graph template of the query. Finally, the DESCRIBE query form returns an RDF graph that provides a “description” of the matching resources. Thus, depending on the query form, SPARQL query results may be RDF graphs, SPARQL solution sequences, or Boolean values.
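The different result types of the four query forms can be illustrated on top of a naive pattern matcher. The sketch below is an assumption-laden toy, not the SPARQL algebra: all function names and the `ex:` data are hypothetical, variables are strings starting with "?", and `describe` simply returns the triples whose subject is the resource, since the standard leaves the content of a description to the implementation.

```python
def solutions(patterns, store):
    """Bindings that satisfy every triple pattern in `patterns`."""
    results = [{}]
    for pattern in patterns:
        extended = []
        for binding in results:
            for triple in store:
                candidate = dict(binding)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:  # no mismatch found: keep the extended binding
                    extended.append(candidate)
        results = extended
    return results


def select(variables, patterns, store):
    """SELECT: a solution sequence restricted to the requested variables."""
    return [{v: b[v] for v in variables} for b in solutions(patterns, store)]


def ask(patterns, store):
    """ASK: a Boolean saying whether the pattern matches at all."""
    return bool(solutions(patterns, store))


def construct(template, patterns, store):
    """CONSTRUCT: a new graph built by filling the template per solution."""
    return {
        tuple(b.get(term, term) for term in triple)
        for b in solutions(patterns, store)
        for triple in template
    }


def describe(resource, store):
    """DESCRIBE (simplified): the triples with `resource` as subject."""
    return {t for t in store if t[0] == resource}


store = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
]
```

With this data, `select` returns a list of bindings, `ask` returns a Boolean, and `construct` and `describe` both return sets of triples, mirroring the three kinds of result named above.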
Unfortunately, this first version of SPARQL had several gaps, including the lack of support for data management operations. In 2013, the W3C SPARQL Working Group therefore published SPARQL 1.1, which extended the original SPARQL query language in several respects. Specifically, SPARQL 1.1 introduced features for manipulating the content of the store, along with support for nested queries and aggregation functions.
Finally, triples need to be stored in a triplestore. Different proposals have been developed over the years. Monolithic triple stores keep all the triples in a single table. They are easy to implement and cope with a huge number of properties, but they require an intelligent indexing system and several self-joins during queries. A slightly lighter variant of monolithic storage associates each URI and literal with a numerical identifier. This results in two tables: one holds the mapping from URIs and literals to numbers, and the other contains the triples in numerical form. Property tables are triplestores that create a table for each class. This way, tuples with the same characteristics are grouped together.
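The dictionary-encoded variant of monolithic storage described above can be sketched with SQLite. This is a minimal illustration under stated assumptions: the table names `terms` and `triples`, the helper functions, and the `ex:` data are all hypothetical, and IRIs and literals are stored as undifferentiated text for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Mapping table: each URI or literal gets a numerical identifier.
    CREATE TABLE terms (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
    -- Single triple table holding only the numerical identifiers.
    CREATE TABLE triples (
        s INTEGER NOT NULL REFERENCES terms(id),
        p INTEGER NOT NULL REFERENCES terms(id),
        o INTEGER NOT NULL REFERENCES terms(id),
        PRIMARY KEY (s, p, o)
    );
    -- A second index so lookups by predicate and object avoid a full scan.
    CREATE INDEX triples_pos ON triples (p, o, s);
""")


def term_id(value):
    """Return the identifier for `value`, creating it on first use."""
    conn.execute("INSERT OR IGNORE INTO terms (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM terms WHERE value = ?", (value,)).fetchone()
    return row[0]


def add_triple(s, p, o):
    conn.execute(
        "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)",
        (term_id(s), term_id(p), term_id(o)),
    )


for triple in [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
]:
    add_triple(*triple)


def names_of_acquaintances(person):
    """Two triple patterns sharing a variable become a self-join on `triples`."""
    rows = conn.execute(
        """
        SELECT name.value
        FROM triples AS knows
        JOIN triples AS named ON named.s = knows.o
        JOIN terms AS name ON name.id = named.o
        WHERE knows.s = (SELECT id FROM terms WHERE value = ?)
          AND knows.p = (SELECT id FROM terms WHERE value = 'ex:knows')
          AND named.p = (SELECT id FROM terms WHERE value = 'ex:name')
        ORDER BY name.value
        """,
        (person,),
    )
    return [row[0] for row in rows]


print(names_of_acquaintances("ex:alice"))
```

Even this two-pattern question needs the triple table to be joined with itself, which is why monolithic stores depend so heavily on good indexes.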