Estonian Information Society Yearbook 2011/2012. Karin Kastehein
• culture (libraries, archives, museums, broadcasting)
• politics (press releases, strategies, green books)
• education (lectures, textbooks and study materials)
• science and research (research at universities, institutes and public sector)
• legal information (court, legal acts, patents, trademarks, rights and obligations)
• nature (biology, ecological, geological and geophysical information, information on energy resources)
• agriculture, forestry, fishing
• tourism, accommodations and entertainment
• traffic, transport
• social information (statistics, demographics, health, education)
• business and the economy
• meteorology, environmental information
• spatial data
Technically, a published dataset may be a collection of human-readable text files (such as a collection of legislation or regulations, official notices or contracts) or machine-readable data (such as a database exported to files in csv or xml format, or a web service that allows all data to be searched and downloaded in json or xml format).
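To illustrate what machine-readable means in practice, the following minimal Python sketch reads the same small dataset from a csv export and from a json response. The sample records and the field names `id` and `name` are invented for illustration and do not come from any real dataset.

```python
import csv
import io
import json

# Hypothetical sample data: the field names and values are invented.
CSV_EXPORT = "id,name\n1,Tallinn\n2,Tartu\n"
JSON_EXPORT = '[{"id": 1, "name": "Tallinn"}, {"id": 2, "name": "Tartu"}]'


def read_csv_records(text):
    """Parse csv text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse a json array of objects into a list of dicts."""
    return json.loads(text)


csv_records = read_csv_records(CSV_EXPORT)
json_records = read_json_records(JSON_EXPORT)

# csv carries no type information, so every value arrives as a string;
# json keeps the distinction between numbers and strings.
print(type(csv_records[0]["id"]).__name__, type(json_records[0]["id"]).__name__)
```

Either form can be processed automatically, which is not the case for the human-readable text files mentioned above.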
The user must be able to do the following:
• browse and search public datasets for a dataset of interest;
• download a dataset found, either as a whole or, via the search system offered by the service, in parts, without having to negotiate for rights or obtain passwords. In exceptional cases, a fee may be charged for downloading a dataset;
• continue to use the database freely, with the right to download it to one’s computer and use it in applications (both free and paid) without having to pay (additionally) for it or needing permission to do so.
A public sector institution that creates and publishes a dataset has no obligation to offer data users additional amenities such as conversion to a suitable format, building a special network service, translation etc. Nor are officials obliged to ensure that the data is correct or up to date. Instead, the publisher has to explain briefly the nature of the data and document the expected frequency of updates.
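The two things the publisher does owe the user, a brief explanation of the data and the expected update frequency, could be captured in a small description record. The sketch below is only one possible shape: the class name, field names and example values are hypothetical, since the text prescribes no schema.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetDescription:
    # All field names are hypothetical; the text prescribes no schema.
    title: str
    description: str       # brief explanation of the nature of the data
    update_frequency: str  # expected frequency of updates, e.g. "monthly"


def missing_fields(desc):
    """Return the names of fields left empty or containing only whitespace."""
    return [f.name for f in fields(desc) if not getattr(desc, f.name).strip()]


# Invented example in which the update frequency has not been documented.
example = DatasetDescription(
    title="Example register",
    description="Invented example: one row per registered object.",
    update_frequency="",
)
print(missing_fields(example))
```

A check of this kind concerns only the documentation; it says nothing about whether the data itself is correct or up to date.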
Licence and fee for dataset. An open dataset must have a licence that allows it to be used and processed free of charge and without restrictions, and to be distributed further either free of charge or for a fee, at the user’s discretion. Specifically, we recommend selecting a Creative Commons licence [11], and above all the CC BY 3.0 licence [12]. Under this licence, the licensor is the author or the copyright holder and the licensee is the public at large. The licensee has the right to copy (reproduce) the work, to distribute, perform and communicate it to the public, and to adapt, arrange and otherwise develop it, including creating derivative works, on condition that the author is credited.
It is advisable to publish open data for free download, but the publisher has the right to charge a fee for downloading the data in cases set forth in legislation.
Principles for publishing datasets. When publishing data, a compromise between two objectives must be found:
• convenient usability and comprehensibility of the data for the data seeker and the downloader,
• simplicity of publishing the data and minimal labour costs for the publisher.
To achieve this, the first task is to find the easiest, simplest and quickest way of publishing the existing data as they are, and only then to examine ways of making them more user-friendly for information seekers and downloaders. In other words, updating, converting and other operations are to be tackled only once the dataset has been published.
Data can also be updated and converted by a third party, who in turn receives the right to share the data free of charge or for a fee. The open dataset conforms to the following requirements [13]:
[Figure: Tim Berners-Lee’s format level scheme on a “five-star mug”. Source: http://www.cafepress.com/w3c_shop]
1. Integrity. All public data shall be made available. This includes all data not subject to personal data restrictions etc.
2. Comes from original source. The data has been gathered from the original source without modification, preserving its original format and level of detail. In the case of databases, data may not be taken from a secondary database.
3. Up-to-dateness. The dataset is published as rapidly as possible to preserve its up-to-dateness.
4. Availability. The data is available to as wide a circle of users as possible with as wide a range of use as possible.
5. Machine-readability. The data has an understandable structure and can be automatically processed.
6. Avoidance of discrimination. The data is presented publicly; there is no need to register or seek access privileges in order to obtain it.
7. Use of open standards. The data is presented in an open format that is not the exclusive property of any one company or person.
8. Free licence. The data is not under copyright, patent, trademark or business secret protection. Reasonable privacy and security restrictions are permitted.
How to publish?
In what format? The main principle here is that it is much better to publish data in an inconvenient encoding than not to publish it at all on the grounds that the encoding is to be improved at some unspecified time. Moreover, a published dataset can always be republished later in a new, better encoding.
We recommend evaluating the user-friendliness of formats and encodings on the basis of Tim Berners-Lee’s five-star system [14]: the more stars, the more user-friendly the format. Given Estonia’s circumstances, the distribution of formats could be the following:
* data is available online in any format (e.g. jpeg, pdf, doc, docx, xls). Data cannot be separated from the file, or it is presented in formats oriented at proprietary software;
** data is on the website in an open format (e.g. txt, html, odt), but in unstructured form;
*** data is presented on the website in an open and free format (e.g. csv, xml, ods files);
**** the objects in the data are identified by URIs [15];
*****
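The first three levels in the list above depend only on the file format, so they can be approximated from a file’s extension. The following Python sketch does this using the example extensions given in the list; the function name and the mapping are illustrative assumptions, not an official classification. The higher levels depend on the content of the data (for example, whether objects are identified by URIs) and cannot be detected this way.

```python
import os

# Extension lists copied from the examples in the star levels above;
# the mapping itself is an illustrative assumption.
STARS_BY_EXTENSION = {
    1: {"jpeg", "pdf", "doc", "docx", "xls"},
    2: {"txt", "html", "odt"},
    3: {"csv", "xml", "ods"},
}


def format_stars(filename):
    """Return the star level (1 to 3) suggested by the extension, or None."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    for stars, extensions in STARS_BY_EXTENSION.items():
        if ext in extensions:
            return stars
    return None  # extension not covered by the examples in the text


for name in ("report.PDF", "laws.html", "stats.csv", "data.bin"):
    print(name, format_stars(name))
```

An extension is only a hint: a csv file with a broken structure would still be reported as three stars, so a sketch like this can support a manual assessment but not replace it.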
11 http://creativecommons.org
12 http://creativecommons.org/licenses/by/3.0
13 http://www.opengovdata.org/home/8principles
14 http://lab.linkeddata.deri.ie/2010/star-scheme-by-example
15 http://en.wikipedia.org/wiki/Uniform_resource_identifier