Practical Field Ecology. C. Philip Wheater
Box 1.5 Differences between interval and ratio data
Interval data have no true zero, so negative values are possible (as in temperature measured on the Celsius scale, where 0 °C refers to the freezing point of water rather than the lowest possible temperature), and measurements cannot be multiplied or divided to give meaningful answers (as in dates).
Ratio data are measurements that have an absolute zero point that is the lowest possible value (as in temperature measured on the Kelvin scale, where zero kelvin is absolute zero) and so negative values are not possible (e.g. you cannot have −6 foxes). With ratio data, all basic mathematical operations can be performed to give meaningful answers. For example, you can derive a ratio of water lost from soil following drying out as follows (where the original mass = 20 g, and dried mass = 16.5 g):
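With the figures given, and assuming the water lost is expressed as a proportion of the original mass (an assumption; the loss could equally be expressed relative to the dried mass), the calculation would be:

```latex
\[
\text{proportion of water lost}
  = \frac{\text{original mass} - \text{dried mass}}{\text{original mass}}
  = \frac{20\,\text{g} - 16.5\,\text{g}}{20\,\text{g}}
  = 0.175 \quad (17.5\%)
\]
```

Because mass has a true zero, this ratio is meaningful, whereas dividing one Celsius temperature by another would not be.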
Note that we can readily reduce measurement data to ordinal or categorical, but not the other way around. Thus, if we count the numbers of invertebrates of different species on a particular type of plant, we could subsequently express this in order of dominance from abundant through to rare (an ordinal scale), or indicate the presence or absence of different species (categories). However, if we merely record presence and absence of species, we cannot subsequently calculate the numbers of individuals. Thus, if in doubt, it is safest to collect the information at the highest resolution possible.
Figure 1.3 Example of a section of a data recording sheet for an investigation into the distribution of woodland birds.
It is good practice to use a standardised data recording sheet (ideally in your field notebook) laid out as closely as possible to the way in which the data will be entered into a computer for analysis. This helps to avoid transcription errors when moving from paper to a computer spreadsheet. In our example (Figure 1.3), we have two types of variables: fixed and measured. It is easier to deal with these in order so that fixed variables come first, followed by measured variables. Fixed variables are those determined by the research design and do not vary during the investigation (record number, site, date, and time). Hence, these can be added to the recording sheet early in its production. Measured variables, on the other hand, are those recorded during the investigation, whose values will vary depending on the site, date, time, etc. (numbers of wrens, blackbirds, etc.). Sometimes, derived variables are also required (i.e. variables produced from measured data, e.g. the proportions that each species forms of the whole catch). Such derived variables can be added to the right of the measured data once the latter have been entered on a computer spreadsheet, since the required computations are usually easily carried out using spreadsheet functions. In most cases, data will be recorded as numerical values. Where categories (e.g. site) occur, codes or names can be used, although some computer programs will not accept letter codes, so you may need to allocate numeric codes to such variables. You should make sure that any paper copies of results sheets are photocopied or scanned as soon as possible after completion, and that electronic copies are properly backed up.
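The layout described above (fixed variables first, then measured variables, then derived variables added on the right) can be sketched in a few lines of Python. This is only an illustration: the site names, dates, times, counts, the extra `robins` column, and the `SITE_CODES` mapping are all invented for the example and are not taken from Figure 1.3.

```python
# Illustrative recording sheet. Every value below is invented for the example.
FIXED = ["record", "site", "date", "time"]       # set by the research design
MEASURED = ["wrens", "blackbirds", "robins"]     # recorded in the field

records = [
    {"record": 1, "site": "A", "date": "2024-05-01", "time": "06:00",
     "wrens": 3, "blackbirds": 5, "robins": 2},
    {"record": 2, "site": "B", "date": "2024-05-01", "time": "07:30",
     "wrens": 0, "blackbirds": 4, "robins": 4},
]

# Hypothetical numeric codes, for programs that will not accept letter codes.
SITE_CODES = {"A": 1, "B": 2}


def add_derived(row):
    """Return a copy of row with a numeric site code and per-species proportions."""
    out = dict(row)
    out["site_code"] = SITE_CODES[row["site"]]
    total = sum(row[species] for species in MEASURED)
    for species in MEASURED:
        # Derived variable: proportion of the whole catch formed by each species.
        out[f"prop_{species}"] = row[species] / total if total else 0.0
    return out


table = [add_derived(row) for row in records]

for row in table:
    print(row)
```

Keeping the derived columns separate from the measured ones means the original field counts are never overwritten, so the proportions can always be recomputed if a transcription error is found later.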
Sampling designs
When implementing a project, it is rarely possible to collect information on all the animals or plants present. Usually we need to use a sample that we hope to be representative of the situation as a whole. The total number of data points that could theoretically be gathered is known as the population (this is a statistical population rather than the actual population of animals or plants – see Box 1.6); the actual number of data points is termed the sample size. Larger samples are usually more representative of populations, although this depends on the variability of the system being studied (small samples may be reliable representations of populations with low variability). Those elements of a system that are calculated (e.g. the mean number of plants, such as plantains, per square metre in a meadow) are termed statistics and are estimates of the true attributes of a statistical population (called parameters – see Box 1.6). So, if we counted all the plantains in the entire meadow, we would be able to calculate the actual mean value per square metre (a parameter). Since it is usually impractical to count all individual plantains, in reality we usually count plantains in a subset of the meadow (i.e. take a sample), and calculate the mean numbers per square metre using this sample in the expectation that it will be representative of the whole site (a statistic). This sort of situation occurs in many types of survey. For example, market researchers obtain opinions from large groups (samples) of people and use these to indicate the attitudes of the population as a whole.
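The distinction between a parameter and a statistic can be illustrated with a small simulation. The sketch below assumes a hypothetical meadow of 2000 one-square-metre cells with invented plantain counts; the meadow size, the count range, and the function name `sample_mean` are illustrative choices, not values from the text.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical meadow: 2000 one-square-metre cells, each with an invented
# plantain count between 0 and 12.
meadow = [random.randint(0, 12) for _ in range(2000)]

# Parameter: the true mean per square metre, obtainable only by counting
# every cell in the meadow.
parameter_mean = statistics.mean(meadow)


def sample_mean(population, n):
    """Statistic: mean count from n cells chosen at random without replacement."""
    return statistics.mean(random.sample(population, n))


small = sample_mean(meadow, 5)    # a small sample
large = sample_mean(meadow, 50)   # a larger sample

print(f"parameter (whole meadow): {parameter_mean:.2f}")
print(f"statistic from 5 cells:   {small:.2f}")
print(f"statistic from 50 cells:  {large:.2f}")
```

Re-running the sampling with different seeds should show the 5-cell estimates scattering more widely around the parameter than the 50-cell estimates, which is the sense in which larger samples are usually more representative.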
Box 1.6 Terms used in sampling theory
See also the Glossary of statistical terms in Appendix 1.
A population is a collection of individuals, normally defined by a given area at a given time. For example, scientists refer to the decline in the world population of Atlantic cod in the last century or the annual harvest of Northeast Atlantic cod. These are both true populations. The size of a population is rarely measured directly but usually estimated from samples.
The term sample can be used ambiguously, but here it means a subset drawn from a population, usually of a stated size. One example is 100 individual fish taken from the Northeast Atlantic cod population and measured in order to get an estimate of body size. Another is 50 small areas taken from a meadow (each 1 square metre in size) in order to count the number of plantains within them.
A parameter is a population metric that is estimated from a variable (e.g. the mean body size of Northeast Atlantic cod, or the mean number of plantains per square metre of a meadow) and can be used to summarise data. Importantly, statistical tests aim to estimate the parameters of a population from samples in order to test for differences, relationships, associations, etc.
A variable is a measurement that may change from sampling unit to sampling unit (e.g. the body size of Northeast Atlantic cod taken from a sample, or the number of plantains in a square metre of a meadow) and can be used to summarise collected data (e.g. by taking the mean).
The decision over which samples to take requires some care, and at this point it is worth discussing why replication is important. Since environmental systems are usually intrinsically variable (i.e. physical, chemical, and biological factors differ spatially and temporally), the larger the sample, the more representative it will be of the population (i.e. the more of the natural variation will be covered). However, the larger the sample, the more time and effort it will take to collect. There are methods to calculate the optimum sample size, but these rely on knowledge of the variability of the system. This is rarely known in advance, although a small pilot study may give some indication. If it is known or suspected that there is substantial variability, then a large sample should be taken. In most ecological surveys, a large sample would include over 50 observations. However, where the population is likely to be very large