Semantic Web for the Working Ontologist. Dean Allemang

Чтение книги онлайн.

Читать онлайн книгу Semantic Web for the Working Ontologist - Dean Allemang страница 10

Автор:
Жанр:
Серия:
Издательство:
Semantic Web for the Working Ontologist - Dean  Allemang ACM Books

Скачать книгу

The Linked Open Data Cloud includes datasets in a wide variety of fields, including Geography, Government, Life Sciences, Linguistics, Media, and Publication.

      Another effort built on these same standards is referred to as knowledge graphs. This name refers to a wide range of information sharing approaches, but the term gained popularity around 2012 when Google announced that it was using something it called a knowledge graph to make searches more intelligent. Google felt that the use of the name Knowledge Graph, instead of something that seemed more esoteric like Semantic Web, would make it easier for people to understand the basic concept. That is rather than a Web of Semantics they would prefer to call it a Graph of Knowledge. The name has caught on, and is now used in industrial settings to refer to the use of Semantic Web technology and approaches in an enterprise setting. The basics of the technology are the same; the standards we outline in this book apply equally well to the Semantic Web, linked data, or knowledge graphs.

       What about the round-worlders?

      The network effect has already proven to be an effective and empowering way to muster the effort needed to create a massive information network like the WWW; in fact, it is the only method that has actually succeeded in creating such a structure. The AAA slogan enables the network effect that made the rapid growth of the Web possible. But what are some of the ramifications of such an open system? What does the AAA slogan imply for the content of an organically grown web?

Image

      Figure 1.1 Number of linked open datasets on the Web in the Linked Open Data Cloud. From the Linked Open Data Cloud at lod-cloud.net.

      For the network effect to take hold, we have to be prepared to cope with a wide range of variance in the information on the Web. Sometimes the differences will be minor details in an otherwise agreed-on area; at other times, differences may be essential disagreements that drive political and cultural discourse in our society. This phenomenon is apparent in the hypertext Web today; for just about any topic, it is possible to find web pages that express widely differing opinions about that topic. The ability to disagree, and at various levels, is an essential part of human discourse and a key aspect of the Web that makes it successful. Some people might want to put forth a very odd opinion on any topic; someone might even want to postulate that the world is round, while others insist that it is flat. The infrastructure of the Web must allow both of these (contradictory) opinions to have equal availability and access.

      There are a number of ways in which two speakers on the Web may disagree. We will illustrate each of them with the example of the status of Pluto as a planet:

      • They may fundamentally disagree on some topic. While the IAU has changed its definition of planet in such a way that Pluto is no longer included, it is not necessarily the case that every astronomy club or even national body agrees with this categorization. Many astrologers, in particular, who have a vested interest in considering Pluto to be a planet, have decided to continue to consider Pluto as a planet. In such cases, different sources will simply disagree.

      • Someone might want to intentionally deceive. Someone who markets posters, models, or other works that depict nine planets has a good reason to delay reporting the result from the IAU and even to spreading uncertainty about the state of affairs.

      • Someone might simply be mistaken. Web sites are built and maintained by human beings, and thus they are subject to human error. Some web site might erroneously list Pluto as a planet or, indeed, might even erroneously fail to list one of the eight “nondwarf” planets as a planet.

      • Some information may be out of date. There are a number of displays around the world of scale models of the solar system, in which the status of the planets is literally carved in stone; these will continue to list Pluto as a planet until such time as there is funding to carve a new description for the ninth object. Web sites are not carved in stone, but it does take effort to update them; not everyone will rush to accomplish this.

      While some of the reasons for disagreement might be, well, disagreeable (wouldn’t it be nice if we could stop people from lying?), from a technical perspective, there isn’t any way to tell them apart. The infrastructure of the Web has to be able to cope with the fact that information on the Web will disagree from time to time and that this is not a temporary condition. It is in the very nature of the Web that there be variations and disagreement.

      The Semantic Web is often mistaken for an effort to make everyone agree on a single ontology, that is, to make everyone agree on a single set of terms—but that just isn’t the way the Web works. The Semantic Web isn’t about getting everyone to agree, but rather about coping in a world where not everyone will agree and achieving some degree of interoperability anyway. In the data themselves too we may find disagreement; for instance, the numbers of casualties in a conflict may be reported on the Web of data with very different values from the involved parties. There will always be multiple ontologies and diverging statements, just as there will always be multiple web pages on any given topic. One of the features of the Web that has made it so successful, and so different from anything that came before it, is that it, from the start, has always allowed all these multiple viewpoints to coexist.

       To each their own

      How can the Web architecture support this sort of variation of opinion? That is, how can two people say different things about the same topic? There are two approaches to this issue. First, we have to talk a bit about how one can make any statement at all in a web context.

      The IAU can make a statement in plain English about Pluto, such as “Pluto is a dwarf planet,” but such a statement is fraught with all the ambiguities and contextual dependencies inherent in natural language. We think we know what “Pluto” refers to, but how about “dwarf planet”? Is there any possibility that someone might disagree on what a “dwarf planet” is? How can we even discuss such things?

      The first requirement for making statements on a global web is to have a global way of identifying the entities we are talking about. We need to be able to refer to “the notion of Pluto as used by the IAU” and “the notion of Pluto as used by the American Federation of Astrologers” if we even want to be able to discuss whether the two organizations are referring to the same thing by these names.

      In addition to Pluto, another object was also classified as a “dwarf planet.” This object is sometimes known as UB313 and sometimes known by the name Xena. How can we say that the object known to the IAU as UB313 is the same object that its discoverer Michael Brown calls “Xena”?

      One way to do this would be to have a global arbiter of names decide how to refer to the object. Then Brown and the IAU can both refer to that “official” name and say that they use a private “nickname” for it. Of course, the IAU itself is a good candidate for such a body, but the process to name the object took over two years. Coming up with good, agreed-on global names is not always easy business.

      On the Web, we name things with URIs. The URI standard provides rules to mint identifiers for anything around us. The most common form of URIs are the URLs commonly called Web addresses (for example, http://www.inria.fr/) that locate a specific resource on the Web. In the absence of an agreement, different Web authors will select different URIs for the same real-world resource. Brown’s Xena is IAU’s UB313. When information from these different sources is brought together in the distributed network of data, the Web infrastructure has no way of knowing that these need to be treated as the same entity. The flip side of this is that we cannot assume that just because two URIs are distinct, they refer to distinct resources. This feature of the Semantic Web is called the Nonunique Naming Assumption; that is, we have

Скачать книгу