Practical Field Ecology. C. Philip Wheater
Box 1.7 Aspects to be considered when determining the sample size
A larger sample size is needed when there is:
high variability – use a pilot study or consult similar investigations in the literature to get a feel for the likely variability;
a small difference, relationship, or association to be detected – it is worth recognising that very small differences may not be important ecologically (e.g. a native plant may have more insect species than an introduced one, but if the difference amounts to only one or two common insects, it is unlikely to be of conservation importance);
a requirement to subdivide the data for analysis (e.g. separate analysis of males and females would require similar appropriate sample sizes of both males and females).
See Krebs (1999), van Belle (2002), and various online calculators²⁵ for further details of the different calculations that can be used to estimate sample sizes, depending on the intended statistical analysis.
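As a rough illustration of how variability and the size of the effect drive sample size, the sketch below uses one common textbook approximation for estimating a mean to within a chosen margin of error, n = (z × s / E)². This is not necessarily the calculation given in the references above; the function name, the normal approximation, and the example numbers are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sd, margin, confidence=0.95):
    """Approximate number of samples needed to estimate a mean.

    sd         -- standard deviation expected from a pilot study (assumed known)
    margin     -- acceptable half-width of the confidence interval
    confidence -- confidence level (0.95 gives z of about 1.96)

    Uses the normal approximation n = (z * sd / margin) ** 2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical pilot values: doubling the variability roughly
# quadruples the number of samples required.
low_variability = sample_size_for_mean(sd=10, margin=2)
high_variability = sample_size_for_mean(sd=20, margin=2)
print(low_variability, high_variability)
```

Halving the margin of error has the same effect as doubling the standard deviation, which is why detecting small differences is so costly in sampling effort.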
In surveys of community structure, it may be important to know that the majority of species in an area have been recorded at least once in your sample. In this case, species accumulation curves may help. At its simplest, this involves plotting the accumulated number of species against increasing sampling effort, where sampling effort is the number of sampling units (quadrats, pitfall traps, animals handled, hours of observation, sites surveyed, etc.). Box 1.8 illustrates the use of species accumulation curves in quadrat sampling (see Chapter 3). There is a variety of methods for modelling species accumulation curves (see Colwell et al. 2004 and Magurran 2004 for further information) and many standard software packages include routines for this (e.g. those obtained from Pisces Conservation).²⁶
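The simplest form of the curve described above can be sketched in a few lines. The example below is a minimal illustration, not the routine used by any particular package: the function names and the quadrat records are hypothetical, and the averaging over random orderings is a simple stand-in for the more formal modelling methods cited.

```python
import random


def species_accumulation(quadrats):
    """Cumulative number of distinct species after each sampling unit."""
    seen = set()
    curve = []
    for species_in_quadrat in quadrats:
        seen.update(species_in_quadrat)
        curve.append(len(seen))
    return curve


def mean_accumulation(quadrats, n_orderings=200, seed=0):
    """Average the curve over random quadrat orderings.

    The raw curve depends on the order in which quadrats happen to be
    listed, so averaging over shuffled orders gives a smoother curve.
    """
    rng = random.Random(seed)
    totals = [0] * len(quadrats)
    order = list(quadrats)
    for _ in range(n_orderings):
        rng.shuffle(order)
        for i, count in enumerate(species_accumulation(order)):
            totals[i] += count
    return [total / n_orderings for total in totals]


# Hypothetical records: the set of species found in each quadrat.
quadrats = [
    {"Poa annua", "Bellis perennis"},
    {"Poa annua"},
    {"Trifolium repens", "Bellis perennis"},
    {"Poa annua", "Plantago major"},
]
print(species_accumulation(quadrats))
print(mean_accumulation(quadrats))
```

Plotting either list against the quadrat number gives the kind of curve shown in Box 1.8.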
Box 1.8 Species accumulation curves for two sites
Plotting the cumulative number of species found against the number of quadrats examined shows that, as the number of quadrats increases, the number of species also increases. At the point at which the curve levels off towards the horizontal (the asymptote), we may assume that we have obtained the maximum number of species and can stop sampling. For site A (dashed line, diamonds), we may not yet have reached the total number of species, even after 30 quadrats, and should consider increasing the sampling effort. For site B (dotted line, squares), it appears that we have reached about the maximum number of species that we can expect to get. In fact, we probably reached this number at around 16 quadrats. This difference between sites A and B might reflect not only a difference in the number of species found there, but also a difference in the heterogeneity of the sites, with site A being less homogeneous than site B. Note that had we looked at the data for site A after 12 quadrats (solid line, diamonds), we might have assumed that we had reached the maximum number of species, since the curve levels off at that point. This highlights the importance of continuing to collect past the point at which the curve first levels off, to check that it truly does reflect the asymptote.
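The warning about site A can be made concrete with a simple check on how many new species the most recent quadrats have added. The sketch below is only a heuristic under stated assumptions: the function name, the window of five quadrats, and the cumulative counts are invented for illustration and are not the data plotted in this box.

```python
def new_species_in_last(curve, k):
    """Number of species added by the last k sampling units of a cumulative curve."""
    if k <= 0 or k >= len(curve):
        raise ValueError("k must be between 1 and len(curve) - 1")
    return curve[-1] - curve[-1 - k]


# Invented cumulative species counts for a site that plateaus early
# and then starts rising again as new habitat patches are sampled.
site_a = [3, 5, 7, 8, 9, 10, 10, 10, 10, 10, 10, 10, 11, 13, 14, 16, 17, 18]

# After 12 quadrats the last five added nothing, so the curve looks flat...
print(new_species_in_last(site_a[:12], 5))
# ...but with all 18 quadrats the last five added several species.
print(new_species_in_last(site_a, 5))
```

A value of zero over a short window is therefore weak evidence of an asymptote on its own, which is why sampling should continue beyond the first apparent plateau.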
Since we generally take a sample in order to make a valid estimate of a parameter of the population (e.g. the number of species, the mean temperature, the proportion of predators), a central requirement is that the individuals sampled are independent of each other. It is important to recognise and avoid or, failing that, account for situations where the individuals sampled are linked in some way as a result of the sampling design. For example, we might compare the number of spangle galls found on leaves chosen at random on oak trees growing in clumps with the number on isolated oak trees. If we found over 20 trees in separate clumps, but only 10 isolated trees, we might be tempted to take twice as many measurements from each isolated tree. However, this would mean that individual data points from isolated trees were linked by virtue of the tree on which they were growing and shared many attributes with each other. Such data, known as pseudoreplicates, would not be independent of each other and hence may cause problems in interpretation, since we would be unsure whether any differences between clumped and isolated trees were due to the multiple measurements from some trees. It would be better to use unbalanced sample sizes (i.e. 20 clumped and 10 isolated trees) than to use non‐independent data. Similarly, we should not take data from more than one tree in any clump, since trees within a clump are likely to be more similar to each other than to those in other clumps. From a statistical analysis point of view, few tests require equal sample sizes and, even where this is a problem, it would be preferable to reduce the number of trees from clumps that were measured. Note that we may wish to take account of some of the variation between leaves on each tree by taking several (perhaps 10) leaves per tree and using a mean value to represent each tree.
There are also statistical tests that allow for multiple measurements per tree, but these usually require the same number of samples per sampling unit – see repeated measures analysis in Chapter 5.
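The approach of using one mean value per tree can be sketched as follows. This is a minimal illustration with invented gall counts and hypothetical tree labels; the point is that the sample size for each group becomes the number of trees, not the number of leaves.

```python
from statistics import mean


def per_tree_means(leaf_counts_by_tree):
    """Collapse several leaf measurements per tree into one value per tree."""
    return {tree: mean(counts) for tree, counts in leaf_counts_by_tree.items()}


# Invented spangle gall counts per leaf, keyed by a hypothetical tree label.
clumped = {"C1": [4, 6, 5], "C2": [2, 3, 4]}
isolated = {"I1": [8, 10, 9], "I2": [7, 5]}

clumped_means = per_tree_means(clumped)
isolated_means = per_tree_means(isolated)

# Each tree now contributes a single independent data point.
print(len(clumped_means), sorted(clumped_means.values()))
print(len(isolated_means), sorted(isolated_means.values()))
```

Any subsequent comparison between clumped and isolated trees would then be run on these per-tree values, even if the two groups contain different numbers of trees.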
If we survey a pond in order to look at the animals and their relationships with several physical, chemical, and/or biological factors, then no matter how many replicates we take, we are merely describing what happens in a single entity (i.e. this one pond). Such a study does not tell us anything about pond ecology in general, and the use of such replicates is termed pseudoreplication and should be avoided (Hurlbert 1984; van Belle 2002). In order to broaden our approach and gain more of an understanding of ponds in general, we would need to study a large number of separate ponds. Thus, studies of single sites or small parts of sites may not reveal information applicable to the wider ecological context.
In some situations, the data collected are linked to each other by design. For example, we might be interested in comparisons of matched data (e.g. examining the animals found on cabbages before and after the application of fertiliser or pesticide, or the numbers of mayfly larvae found above and below storm drain outflows into a series of streams). These designs can be perfectly sound, but because the data are matched (by cabbage or by stream) we require a slightly different approach to the resulting analysis (see Chapter 5).
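One common way of handling matched data is to analyse the difference within each pair instead of treating the two sets of values as independent samples. The sketch below computes a paired t statistic for this purpose; it is an assumption that this is a suitable test for a given data set, the function name is hypothetical, and the mayfly counts are invented for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Mean within-pair difference and its t statistic.

    first and second hold matched measurements in the same order
    (e.g. above and below the outflow on the same stream).
    """
    if len(first) != len(second):
        raise ValueError("matched data need the same number of values")
    differences = [a - b for a, b in zip(first, second)]
    mean_difference = mean(differences)
    standard_error = stdev(differences) / math.sqrt(len(differences))
    return mean_difference, mean_difference / standard_error


# Invented mayfly larvae counts for four streams, matched by stream.
above_outflow = [12, 15, 9, 20]
below_outflow = [8, 11, 7, 13]

mean_difference, t_value = paired_t_statistic(above_outflow, below_outflow)
print(mean_difference, t_value)
```

The t statistic would be compared against a t distribution with one fewer degrees of freedom than the number of pairs (here, three).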
When designing your sampling strategy, it is important to consider the variability and whether the timing or order of sampling might bias the result by measuring only part of the potential variation. For example, sampling the insects present on thistle flower heads will be biased if all the data are collected in the early morning, since this will miss