Linked Data Visualization. Laura Po

World Factbook and U.S. census data that contain government data in the form of RDF triples.

      The cloud shows which datasets are related to one another and gives a qualitative indication of how many properties connect each pair of datasets. A thin line indicates that two datasets are connected by a small number of properties, while a thick line represents a large number of connecting relations.

      As time passed, more and more institutions started to publish their data according to the Linked Data Principles described in Section 1.3, and the cloud grew considerably. In only half a year, the Linked Data Cloud doubled in the number of published datasets, reaching 295 datasets in September 2011. There is no data about the size of the cloud in 2012 and 2013, but in 2014 the cloud counted 570 datasets. Again, no data is available for 2015 and 2016, but since 2017 the LOD cloud has been updated regularly and there is plenty of information. The first record of 2017, dated January 26th, reported that the number of datasets had increased to 1,146 (double the number of datasets present in the cloud in 2014). During the following years, the pace of publishing Linked Data slowed down: the update of March 29, 2019 reports 1,239 datasets in the Linked Data Cloud. Figure 1.5 shows the current Linked Data Cloud. Despite the time elapsed and the increasing number of datasets, DBpedia is still the biggest and most representative dataset of the LOD Cloud.


      Figure 1.5: Linked Open Data Cloud (March 29, 2019).

      As can be seen in Figure 1.5, the cloud is depicted as a partially connected graph. Each node of the graph represents a dataset, and a link between two nodes indicates that some property connects elements of the two datasets. Since datasets differ from one another both in size and in the domain they cover, the maintainers of the LOD cloud adopted a visual notation to help users navigate the cloud: the number of triples contained in a dataset determines the size of its node, while the domain determines the node's color. Moreover, to provide further aid during navigation, each domain is subdivided into distinct subsections. A minimal sketch of this encoding is given below.
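      To make this visual notation concrete, the following Python sketch (using the networkx and matplotlib libraries, chosen here purely for illustration and not part of the book's toolchain) draws a tiny, invented version of such a cloud: node size grows with the triple count, node color encodes the domain, and edge width encodes the number of linking properties. All figures except DBpedia's triple count, which is quoted below, are made-up toy values.

# Minimal, illustrative sketch of the LOD cloud's visual notation
# (not the actual rendering code used by lod-cloud.net).
import math
import networkx as nx
import matplotlib.pyplot as plt

# Toy metadata: (dataset, domain, approximate number of triples).
# Only DBpedia's count comes from the text; the rest are invented.
datasets = [
    ("DBpedia", "cross-domain", 9_500_000_000),
    ("Geonames", "geography", 150_000_000),
    ("LinkedGeoData", "geography", 1_200_000_000),
]
# Toy links: (source, target, number of linking properties).
links = [("DBpedia", "Geonames", 80), ("DBpedia", "LinkedGeoData", 20)]

g = nx.Graph()
for name, domain, triples in datasets:
    g.add_node(name, domain=domain, triples=triples)
g.add_weighted_edges_from(links)

colors = {"cross-domain": "#fc8d62", "geography": "#66c2a5"}
node_sizes = [300 * math.log10(g.nodes[n]["triples"]) for n in g.nodes]   # size from triple count
node_colors = [colors[g.nodes[n]["domain"]] for n in g.nodes]             # color from domain
edge_widths = [0.1 * g[u][v]["weight"] for u, v in g.edges]               # width from linking properties

pos = nx.spring_layout(g, seed=42)
nx.draw(g, pos, node_size=node_sizes, node_color=node_colors,
        width=edge_widths, with_labels=True, font_size=8)
plt.savefig("mini_lod_cloud.png", dpi=150)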

      The Linked Open Data Cloud is probably the best representation of the Web of Data. However, Figure 1.5 does not reflect the volume of data it contains: each node of the graph, despite its small size, holds an enormous amount of data. For example, DBpedia alone contains more than 9.5 billion triples. DBpedia is clearly one of the biggest datasets around, but it is not the only one that reaches an extremely high number of triples; Geonames, LinkedGeoData, and BabelNet are just a few other examples of huge datasets. Along with other metadata, the LOD cloud records the number of triples composing every dataset. Unfortunately, the triple count is not available for all datasets, but analyzing those datasets whose count is given, the mean number of triples per dataset is approximately 176 million, which adds up to a total of about 202 billion triples over 1,151 datasets.
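      The mean and the total reported above come from the triple counts recorded in the LOD cloud's own metadata. The short sketch below shows how such an aggregation could be redone; it assumes that the machine-readable dump is available at https://lod-cloud.net/lod-data.json as a JSON object keyed by dataset, each entry carrying a "triples" field. Both the URL and the field name are assumptions about the current dump and may not hold exactly.

# Back-of-the-envelope aggregation of declared triple counts
# from the (assumed) LOD cloud metadata dump.
import json
import urllib.request

URL = "https://lod-cloud.net/lod-data.json"  # assumed location of the dump

with urllib.request.urlopen(URL) as response:
    datasets = json.load(response)  # assumed: dataset id -> metadata dict

counts = []
for meta in datasets.values():
    raw = str(meta.get("triples", "")).replace(",", "")
    if raw.isdigit() and int(raw) > 0:  # keep only datasets declaring a count
        counts.append(int(raw))

if counts:
    print(f"datasets with a triple count: {len(counts)}")
    print(f"mean triples per dataset:     {sum(counts) / len(counts):,.0f}")
    print(f"total declared triples:       {sum(counts):,}")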

      The amount of Linked Data has also risen in recent years because governments have made increasing efforts to be more transparent and responsive to citizens’ demands [Attard et al., 2015], which in most cases has resulted in the publication of (linked) Open Data. Many online data portals exist and play a fundamental role in the expansion of the Web of Data. Portals like DataHub,11 the EU Open Data Portal,12 the European Data Portal,13 Data.Gov,14 and the Asia-Pacific SDG Data Portal15 act as repositories for all kinds of datasets (agriculture, economy, education, environment, government, justice, transport, …) from different countries, so that everyone can freely access the data. The only limitations on the use of the data are defined by the licenses under which they have been published, and these are generally not particularly restrictive. Thanks to those portals, the amount of data accessible through the Web is enormous: taken together, the datasets those portals contain easily exceed the threshold of one million. However, despite their very high number, the datasets are generally small: some can be quite big, but most of them occupy only a few kilobytes.

      There is no precise information about the volume of data already present on the Web, but it is easy to see that the number and size of the datasets can only increase over time, reaching exabytes of information. Since that information hides a real treasure in monetary terms, many data analysis and big data analytics tools have been developed in recent years to support the work of data scientists. One of the leading organizations in this sector is the Apache Software Foundation, which has developed a large number of applications well suited to handling big data, such as Apache Hadoop,16 Apache Spark,17 Apache Cassandra,18 Apache Commons RDF,19 Apache Jena,20 and many others.

      The impact of Open Data at the economic, political, and social levels has become clear in recent years. The European Data Portal21 publishes several studies and reports every year about the state of Open Data in Europe. They distinguish the benefits coming from Open Data into direct and indirect benefits. In their study [Carrara et al., 2015], they defined direct benefits as “monetised benefits that are realized in market transactions in the form of revenues and Gross Value Added (GVA), the number of jobs involved in producing a service or product, and cost savings” and indirect benefits as “new goods and services, time savings for users of applications using Open Data, knowledge economy growth, increased efficiency in public services and growth of related markets.” In the same document, they estimate that the direct value of the Open Data market in the European Union is 55.3 billion Euros, with a potential growth of 36.9% between 2016 and 2020 to a value of 75.7 billion Euros, and that the overall Open Data market is estimated to be between 193 and 209 billion Euros, with a projection of 265–286 billion Euros for 2020. They also quantified the economic benefits by looking at three other indicators: the number of jobs created, cost savings, and efficiency gains. The number of direct Open Data jobs is forecast to rise from 75,000 in 2016 to nearly 100,000 by 2020. Moreover, thanks to the positive economic effect on innovation and the development of numerous tools to increase efficiency, not only the private sector but also the public sector is expected to experience increased cost savings through Open Data, totalling 1.7 billion Euros by 2020. They also estimated 7,000 additional lives saved thanks to quicker emergency response, a 5.5% decrease in road fatalities, a 16% decrease in energy usage, and so on.

      Another important document that assesses the value of Open Data is Manyika et al. [2013]. That document, created in 2013, estimates the value of the worldwide Open Data market at about 3 trillion dollars annually (1.1 trillion for the U.S. market, 0.7 trillion for the European market, and 1.7 trillion for the rest of the world). The value is calculated over seven domains of interest (education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance). The staggering difference between these figures implies that calculating the value of Open Data is not an easy task and that the value is highly dependent on the field under study. To the best of our knowledge, there is no more recent estimate of the value of the U.S. Open Data market.

      In order to unlock the full potential of Linked Data and to understand how to extract the maximum benefit from them, it is important to dive into the technologies that enabled the birth of Linked Data [Bikakis et al., 2013]. The Semantic Web is built upon a series of technologies layered on top of one another. All of these technologies form the
