Designing Geodatabases for Transportation. J. Allison Butler
Much more about these concepts is discussed later in this chapter. What you need to know for now is that data modeling, as presented here, is founded on the capabilities and constraints of the geodatabase. However, if you are like most transportation-data users, you already have data in a variety of nongeodatabase forms, so this chapter covers other fundamental data structures along with the basic concepts of database design and data models.
Data types
When starting a data-modeling project, you first must understand the data you intend to place into your new geodatabase. In addition to the geometry you use to abstractly represent a real-world entity cartographically, you have the traditional forms of data that have always been part of transport databases.
Figure 2.1 Data types. The ArcGIS geodatabase supports several data types for user-supplied class attributes. The primary ones you will likely use are character string, short integer, long integer, single-precision floating-point (float), double-precision floating-point (double), and date.
One of the most common kinds of data is text, which consists of a string of characters such as letters, digits, and punctuation. Anything you can type on a keyboard can go into a text field. Most text fields are defined by the maximum number of characters they allow. For example, you might see a reference like “String (30)” to define a text field with a maximum length of 30 characters.
An equally popular form of data is a number. There are many different types of number data, but to the user they all consist of a series of digits. Where they differ is how they are stored in the database. In the geodatabase, a short integer will be stored using 2 bytes of memory; a long integer requires 4 bytes. A single-precision or floating-point number is also stored using 4 bytes, while a double-precision number is stored using 8 bytes. The actual numeric range that each of these forms represents varies according to the database management system you use.
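The byte sizes quoted above can be checked with a short Python sketch. It uses the standard `struct` module as a stand-in for database storage; the type labels and the integer ranges shown assume two's-complement integers and are illustrative only, since the actual range depends on the database management system.

```python
import struct

# Standard-size struct codes standing in for the four numeric types.
# The labels are illustrative; a real DBMS may store values differently.
NUMERIC_FORMATS = {
    "short integer": "<h",  # 2-byte signed integer
    "long integer": "<i",   # 4-byte signed integer
    "float": "<f",          # 4-byte single-precision floating point
    "double": "<d",         # 8-byte double-precision floating point
}

def storage_bytes(type_name: str) -> int:
    """Return the number of bytes used by the named numeric type."""
    return struct.calcsize(NUMERIC_FORMATS[type_name])

def signed_range(n_bytes: int) -> tuple[int, int]:
    """Range of a two's-complement signed integer stored in n_bytes bytes."""
    bits = 8 * n_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name in NUMERIC_FORMATS:
    print(f"{name}: {storage_bytes(name)} bytes")
print("short integer range:", signed_range(2))
print("long integer range:", signed_range(4))
```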
Working in concert with the type of number format you select is the way you specify it in an ArcSDE geodatabase. A number field in such a database has two characteristics that go with its type. The first is precision, which specifies the maximum number of digits that can be stored. The second characteristic is scale, which tells the database how many of those digits will fall after the decimal point.
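To make precision and scale concrete, here is a minimal Python sketch that tests whether a number, written as a decimal literal, fits a field declared with a given precision and scale. The function name `fits` is hypothetical, and the check assumes a finite value taken exactly as written; a real database might round or reject the value instead.

```python
from decimal import Decimal

def fits(value: str, precision: int, scale: int) -> bool:
    """Check whether a decimal literal fits a (precision, scale) field.

    precision = maximum total number of digits
    scale     = how many of those digits fall after the decimal point
    """
    _sign, digits, exponent = Decimal(value).as_tuple()
    fractional_digits = max(0, -exponent)
    integral_digits = max(0, len(digits) + exponent)
    # Digits left of the decimal point may use only what scale leaves over.
    return fractional_digits <= scale and integral_digits <= precision - scale

print(fits("123.45", 5, 2))   # three whole digits plus two decimals
print(fits("1234.5", 5, 2))   # four whole digits exceed the three available
```

With a precision of 5 and a scale of 2, at most three digits are left for the whole-number part, which is why the second value does not fit.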
Number type, precision, and scale interact in various ways. For example, the database will ignore scale if you specify a number type of integer, because integers consist only of whole numbers. In that case, the data type overrides the specification. Sometimes it works the other way. For instance, if you specify a single-precision floating-point (float) data type but a precision of seven or more, ArcGIS will change the data type to double-precision.
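The two interactions just described can be sketched as a small Python function. This is a hypothetical helper that mirrors only the rules stated in the text; it is not an ArcGIS API, and the type labels are assumptions chosen for readability.

```python
def resolve_field(data_type: str, precision: int, scale: int) -> tuple[str, int, int]:
    """Return the (type, precision, scale) that would take effect.

    Hypothetical helper: it encodes only the two rules described in the text.
    """
    if data_type in ("short integer", "long integer"):
        # Integers hold whole numbers only, so any requested scale is ignored.
        return data_type, precision, 0
    if data_type == "float" and precision >= 7:
        # A precision of seven or more promotes float to double.
        return "double", precision, scale
    return data_type, precision, scale

print(resolve_field("long integer", 9, 2))
print(resolve_field("float", 8, 3))
print(resolve_field("float", 6, 2))
```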
Most database management systems also support date and time data types. Although these values are stored in vendor-specific ways, ArcGIS presents them to the user consistently, combining date and time into one data type called ‘Date’. The date portion is provided as a two-digit month, a two-digit day, and a four-digit year, with the three components separated by a forward slash character. The time portion is presented as a two-digit hour on a 24-hour clock (00-23), a two-digit minute (00-59), and a seconds component with a precision of 5 and a scale of 3. The three time components are separated using colons.
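The display format described above can be reproduced with Python's standard `datetime` module. This is only a sketch of the layout: the function name `format_date` is hypothetical, and the single space between the date and time portions is an assumption, since the text does not say how the two portions are joined.

```python
from datetime import datetime

def format_date(dt: datetime) -> str:
    """Render a datetime as MM/DD/YYYY HH:MM:SS.sss (24-hour clock)."""
    # Truncate microseconds to three decimal places: seconds then carry
    # five digits in total, three of them after the decimal point.
    millis = dt.microsecond // 1000
    return f"{dt:%m/%d/%Y %H:%M}:{dt.second:02d}.{millis:03d}"

print(format_date(datetime(2008, 3, 7, 14, 5, 9, 250000)))
# prints 03/07/2008 14:05:09.250
```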
Files
Relational databases were not the first kind of electronic data structure. The oldest form of database storage is the file, which consists of a block of data organized into logical groups called fields. Each position in the field is called a column. A file looks like a table: its records (rows) are separated by a special character that signifies the end of a logical group of data. Everything is text. There is no inherent requirement for all the records to have the same structure. For example, the first record, often called a header, could state the number of body records or describe the fields in those records. All the intelligence needed to understand the file’s content is in the application that reads and writes records.
Figure 2.2 Files. A fixed-length file uses column position to identify specific data content forming attributes. A variable-length file uses the sequential order of fields separated by a predefined special character, one that cannot appear in the data. In both cases, the application using the data must know the specific location of each piece of information.
Files come in two basic forms: fixed-length and variable-length. A fixed-length file uses the position of each character in a record to interpret its meaning. Any leftover space not needed to store the data for that record is filled with spaces, either before or after the actual data in the field. Fields in each record are identified by position. For example, a file specification may declare that record characters (columns) 1 through 47 contain an employee’s name right-justified with leading spaces.
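A fixed-length record can be read by slicing on column positions, as in the following Python sketch. The layout is hypothetical: the name field in columns 1 through 47 follows the example in the text, while the `employee_id` field in columns 48 through 53 is an assumption added so the record has more than one field.

```python
# Hypothetical layout: (field name, first column, last column), 1-based inclusive.
LAYOUT = [
    ("name", 1, 47),          # right-justified with leading spaces
    ("employee_id", 48, 53),  # assumed second field, for illustration only
]

def build_fixed(name: str, employee_id: str) -> str:
    """Write one fixed-length record, padding unused space with blanks."""
    return name.rjust(47) + employee_id.ljust(6)

def parse_fixed(record: str) -> dict[str, str]:
    """Read each field by its column positions and strip the padding."""
    return {
        field: record[first - 1:last].strip()
        for field, first, last in LAYOUT
    }

record = build_fixed("Jane Smith", "004217")
print(len(record))
print(parse_fixed(record))
```

Note that nothing in the record itself says where one field ends and the next begins; that knowledge lives entirely in `LAYOUT`, which is the point the text makes about the reading application.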
A variable-length file uses the position of a field within the record to identify its content. Variable-length records avoid space filling by using special characters to say where one field stops and another begins. You may have come across this structure when using comma- and tab-delimited text files. The comma or tab characters separate each record into fields. Usually, there is also a special end-of-file character.
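A delimited file can be read with Python's standard `csv` module, as sketched below. The sample data and the function name `parse_delimited` are made up for illustration; the first record acts as a header describing the fields in the body records.

```python
import csv
import io

def parse_delimited(text: str, delimiter: str = ",") -> list[list[str]]:
    """Split delimited text into records, and each record into fields."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Made-up sample: a header record followed by one body record.
comma_text = "name,employee_id\nJane Smith,004217\n"
tab_text = "name\temployee_id\nJane Smith\t004217\n"

rows = parse_delimited(comma_text)
header, body = rows[0], rows[1:]
print(header)
print(body)
print(parse_delimited(tab_text, delimiter="\t") == rows)
```

Here a field is identified by its order within the record rather than by its column position, so the same data parses identically whether commas or tabs are used.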
The most common ArcGIS file-based data structure is the shapefile. A shapefile is a kind of spatial database structure consisting of several files. There are more than a dozen recognized shapefile component types, each with its own file extension (the three characters after the dot in a typical file name). To copy a shapefile, you must copy all the component files. The minimum components are the geometry (.shp), the nonspatial attribute data (.dbf), and the index (.shx). The structure of each component file is optimized for the information it contains. For example, the geometry file (.shp) contains a 100-byte fixed-length file header followed by variable-length records. Each variable-length record is composed of an 8-byte, fixed-length record header followed by variable-length record contents. Each record defines a single geometry, with the length of the variable portion being determined by the number of vertices and whether measure (m) and elevation (z) coordinate values are included. The fixed-length record header provides a record number and the length of the variable portion.
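The mix of fixed-length and variable-length parts in a .shp file can be illustrated with Python's standard `struct` module. The 100-byte file header and 8-byte record header come from the text; the byte offsets, byte orders, and the file code 9994 follow the published ESRI shapefile technical description as generally documented, not this chapter, and the dictionary keys are names chosen here for illustration. The sketch decodes a synthetic header built in memory and does not read a real file.

```python
import struct

def parse_shp_header(header: bytes) -> dict:
    """Decode the 100-byte fixed-length header of a .shp geometry file."""
    if len(header) != 100:
        raise ValueError("a .shp file header is exactly 100 bytes")
    # Bytes 0-27: file code, 20 unused bytes, file length (big-endian).
    file_code, length_words = struct.unpack(">i20xi", header[:28])
    # Bytes 28-35: version and shape type (little-endian).
    version, shape_type = struct.unpack("<2i", header[28:36])
    # Bytes 36-99: bounding box as eight little-endian doubles.
    xmin, ymin, xmax, ymax, zmin, zmax, mmin, mmax = struct.unpack(
        "<8d", header[36:]
    )
    return {
        "file_code": file_code,
        "file_bytes": length_words * 2,  # lengths are counted in 16-bit words
        "version": version,
        "shape_type": shape_type,
        "bbox_xy": (xmin, ymin, xmax, ymax),
        "z_range": (zmin, zmax),
        "m_range": (mmin, mmax),
    }

def parse_record_header(record_header: bytes) -> tuple[int, int]:
    """Decode an 8-byte record header into (record number, content bytes)."""
    number, length_words = struct.unpack(">2i", record_header)
    return number, length_words * 2

# Synthetic header for demonstration only; no real shapefile is read here.
demo = (
    struct.pack(">i20xi", 9994, 50)
    + struct.pack("<2i", 1000, 1)
    + struct.pack("<8d", 0.0, 0.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0)
)
info = parse_shp_header(demo)
print(info["file_code"], info["file_bytes"], info["shape_type"])
print(parse_record_header(struct.pack(">2i", 1, 10)))
```

A reader would use the content length from each record header to know how many bytes to consume before the next record header begins, which is how the variable-length records are walked in sequence.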
Coverages, which were the original ESRI data structure, are also based on a database structure consisting of multiple files. The coverage format was designed to reduce the size of a spatial database, and software manipulating coverage data must manage a number of composition relationships inherent in the file structure. A special data-exchange file type was developed so that a coverage could be distributed as a single file.
File data structures remain useful today and will continue to be part of GIS datasets long into the future. This book, however, will restrict itself to modeling geodatabases. What you put into and take out of a geodatabase may be a file, but the database to be modeled is a geodatabase.
Tables
The next step along the evolutionary line of database design is the table, which is a fundamental data organization unit of