Linked Data Visualization. Laura Po


Linked Data Visualization - Laura Po Synthesis Lectures on the Semantic Web: Theory and Technology


born, it’s got relationships. And when it has relationships, whenever it expresses a relationship, then the other thing that it’s related to is given one of those names that start with HTTP, so that I can go ahead and look that thing up.

      Shortly before the Linked Data principles were born, Open Data arose along with a set of defining principles. The term “Open Data” first appeared in 1995 in a document from an American scientific agency. That document stated that geophysical and environmental data transcend political borders, and it promoted a complete and open exchange of scientific information between countries. However, a formal definition of the term Open Data had to wait until 2005, with the Open Definition 2.1.⁶ This document lists several characteristics that data must have to be considered open, and it can be summarized as: “Knowledge is open if anyone is free to access, use, modify, and share it—subject, at most, to measures that preserve provenance and openness.” Moreover, a more specific definition of the term Open Government Data⁷ had to wait until 2007, when 30 advocates gathered in Sebastopol, California. The meeting was meant to design a set of principles for open government data, but the same logic can be applied to all kinds of Open Data. At the end of the meeting, it was stated that government data is considered open if it complies with the following principles.

      • Complete. All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

      • Primary. Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

      • Timely. Data is made available as quickly as necessary to preserve the value of the data.

      • Accessible. Data is available to the widest range of users for the widest range of purposes.

      • Machine processable. Data is reasonably structured to allow automated processing.

      • Non-discriminatory. Data is available to anyone, with no requirement of registration.

      • Non-proprietary. Data is available in a format over which no entity has exclusive control.

      • License-free. Data is not subject to any copyright, patent, trademark, or trade secret regulation. Reasonable privacy, security, and privilege restrictions may be allowed.

      Given the advantages that both Linked Data and Open Data offered, it did not take long before people started encouraging the fusion of the two. In fact, in 2010, Tim Berners-Lee himself published an extension of his note containing a star rating system for publishing Linked Open Data (LOD). Each level of this rating system builds on the previous one, which means that a five-star dataset satisfies all the criteria.

      ★ Available on the Web (in whatever format) under an open license, so as to be Open Data. Documents are now publicly available online. Everyone can read, edit, save, share, and print them, but without a custom parser it is hard to extract data from them.

      ★★ Available as machine-readable structured data (e.g., Excel instead of an image scan of a table). Data are now accessible to machines, but they remain bound to a proprietary file format. Extracting data means depending on proprietary software.

      ★★★ Available in a non-proprietary format (e.g., CSV instead of Excel). Data are now fully accessible to everyone (both humans and machines), but they are still locked inside documents and not freely accessible from the Web.

      ★★★★ Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff. Every resource has its own URI that identifies it uniquely. Users can look resources up through HTTP requests and read, edit, and share the data freely. Generally, the data are represented in RDF; however, they can easily be converted to other formats.

      ★★★★★ Link your data to other people’s data to provide context. Data are now fully connected to other resources and their value increases. Both publishers and consumers benefit from the network effect:⁸ the higher the number of consumers, the higher the value of the data.
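
      The cumulative nature of the star scheme can be sketched in a few lines of Python. This is only an illustrative sketch: the `Dataset` class, its field names, and the `star_rating` function are hypothetical and do not come from Berners-Lee's note. The only thing taken from the text is that each star presupposes all the previous ones.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical self-description of a published dataset.

    Each flag corresponds to one level of the five-star scheme;
    the field names are assumptions made for this example.
    """
    on_web_with_open_license: bool   # 1 star
    machine_readable: bool           # 2 stars
    non_proprietary_format: bool     # 3 stars
    uses_uris_and_rdf: bool          # 4 stars
    linked_to_other_data: bool       # 5 stars


def star_rating(ds: Dataset) -> int:
    """Count stars, stopping at the first criterion that is not met.

    Because every level builds on the previous one, a later criterion
    cannot earn a star if an earlier one fails.
    """
    criteria = [
        ds.on_web_with_open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_uris_and_rdf,
        ds.linked_to_other_data,
    ]
    stars = 0
    for satisfied in criteria:
        if not satisfied:
            break
        stars += 1
    return stars


# A CSV file published under an open license: three stars.
csv_file = Dataset(True, True, True, False, False)
print(star_rating(csv_file))
```

      Under this reading, a dataset that links to other data but is published in a proprietary format would still stop at two stars, since the third criterion is not met.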

      The LOD Cloud⁹ is a diagram that depicts the Linked Data datasets publicly available online. The diagram is updated regularly and is maintained by the Insight Center for Data Analytics,¹⁰ one of the biggest data science research centers in Europe.

      Anyone can submit a dataset to the cloud, but it will only be accepted and added if it complies with the LOD Cloud principles, which are a slightly different version of the LD principles described in the section above. To be published, a dataset must respect the following rules.

      1. There must be resolvable http:// (or https://) URIs.

      2. They must resolve, with or without content negotiation, to RDF data in one of the popular RDF formats (RDFa, RDF/XML, Turtle, N-Triples).

      3. The dataset must contain at least 1000 triples.

      4. The dataset must be connected via RDF links to a dataset that is already in the diagram. This means that either your dataset must use URIs from the other dataset, or vice versa. The maintainers arbitrarily require at least 50 links.

      5. Access to the entire dataset must be possible via RDF crawling, via an RDF dump, or via a SPARQL endpoint.
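
      The five rules above can be read as a simple checklist. The following Python sketch applies them to a dictionary of declared metadata. It is an assumption-laden illustration, not the validation code used by the LOD Cloud maintainers: the function name, the dictionary keys, and the access-method labels are all hypothetical, the format list in rule 2 is treated as exhaustive for simplicity, and the sketch only inspects declared values rather than actually resolving any URI.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Thresholds and formats taken from the rules in the text.
MIN_TRIPLES = 1000
MIN_LINKS = 50
RDF_FORMATS = {"RDFa", "RDF/XML", "Turtle", "N-Triples"}
# Hypothetical labels for the three access options of rule 5.
ACCESS_METHODS = {"crawl", "dump", "sparql"}


def check_lod_cloud_rules(meta: Dict) -> List[str]:
    """Return the list of violated rules (empty if all five pass).

    `meta` is a hypothetical metadata record with the keys
    'example_uri', 'rdf_format', 'triples', 'links_to_cloud', 'access'.
    """
    problems = []
    # Rule 1: URIs must use the http or https scheme.
    if urlparse(meta["example_uri"]).scheme not in ("http", "https"):
        problems.append("rule 1: URI is not http(s)")
    # Rule 2: URIs must resolve to RDF in a popular format (declared only).
    if meta["rdf_format"] not in RDF_FORMATS:
        problems.append("rule 2: unsupported RDF format")
    # Rule 3: at least 1000 triples.
    if meta["triples"] < MIN_TRIPLES:
        problems.append("rule 3: fewer than 1000 triples")
    # Rule 4: at least 50 RDF links to a dataset already in the diagram.
    if meta["links_to_cloud"] < MIN_LINKS:
        problems.append("rule 4: fewer than 50 links")
    # Rule 5: at least one way to access the entire dataset.
    if not set(meta["access"]) & ACCESS_METHODS:
        problems.append("rule 5: no crawl, dump, or SPARQL access")
    return problems


candidate = {
    "example_uri": "http://example.org/resource/1",  # placeholder URI
    "rdf_format": "Turtle",
    "triples": 25000,
    "links_to_cloud": 120,
    "access": ["dump", "sparql"],
}
print(check_lod_cloud_rules(candidate))
```

      Returning the list of violated rules, rather than a single boolean, makes it easier to tell a publisher which requirement still needs work.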

      Moreover, the maintainers of the LOD Cloud developed an ad hoc rating system for evaluating the quality of the published datasets. Although all the datasets respect the five rules described above, there is no guarantee that every dataset has the same characteristics. Generally, the evaluation metrics take into account several pieces of metadata associated with the dataset, such as the presence or absence of a SPARQL endpoint, the presence or absence of information about the author, the presence or absence of metadata (and, if present, the kind of metadata provided), and so forth. At the end of the process, each dataset is assigned a number of stars ranging from 1 to 5. The higher the number of stars, the higher the quality of the dataset.

      The Linked Data Cloud was first created in May 2007. At that time, it was composed of only 12 datasets, including the following.

      • DBpedia which is a Linked Data version of Wikipedia.

      • Geonames which contains a Linked Data version of geographical data.

      • DBLP which contains a Linked Data version of academic data.

      • Project Gutenberg and RDF Book Mashup, which contain RDF data about books.

      • Revyu which contains reviews in the form of LD.

      • MusicBrainz, DBtune, and Jamendo which contain RDF data about the music business.

      • FOAF (acronym of Friend of a Friend) which is an ontology containing LD that describes information about people, their relations, their activities, and, more generally, social network data.

      •
