Practical Field Ecology. C. Philip Wheater
Figure 1.5 Experimental layouts for five different treatments. (a) Clustered design; (b) stratified design; (c) Latin square design. Each treatment is represented by a different symbol.
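As an illustration of the Latin square layout in panel (c), the following is a minimal sketch of how such a layout could be generated for five treatments. The function name `latin_square` and the treatment labels A to E are assumptions made for this example. The approach (a cyclic square with shuffled rows and columns) always gives a valid Latin square, although it does not sample evenly from every possible square.

```python
import random


def latin_square(treatments, seed=None):
    """Return a randomised Latin square as a list of rows.

    Each treatment appears exactly once in every row and every column.
    """
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic base square: row i is the treatment list shifted by i places.
    square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]
    # Shuffling whole rows and whole columns preserves the Latin property.
    rng.shuffle(square)
    columns = list(range(n))
    rng.shuffle(columns)
    return [[row[c] for c in columns] for row in square]


# Hypothetical treatment labels, one per symbol in the figure.
layout = latin_square(["A", "B", "C", "D", "E"], seed=1)
for row in layout:
    print(" ".join(row))
```

Each printed row corresponds to one row of plots in the field, and each letter to the treatment applied to that plot.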
Planning statistical analysis
Although at this stage we will not discuss in detail the ways in which data are analysed, it is important to at least have sight of the likely methods that may be used (see Chapter 5 for a more detailed coverage of statistical analysis). This is because different statistical methods are required to deal with different research questions. For example, if we were interested in trends over time, a regression model would seem appropriate; but if we were motivated to look for differences between treatments, then an ANOVA might be our test of choice. It should be noted that most statistical techniques require data to be gathered in a particular manner and so a lack of care at this stage could result in data being collected that cannot answer the question posed. It could also mean that some statistical methods cannot be used because the data do not meet the minimum number of observations that are needed to obtain meaningful results. This might also apply to those statistical approaches that require balanced designs (i.e. the same number of data points in each factor measured). In this section, we will discuss some of the major types of analyses that you could employ to answer certain commonly asked questions. As always, it is worth looking at the literature to see what types of analysis have been used in similar studies to the one you propose to do. There are several major groups of analysis based on the broad types of approach required.
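The points about minimum numbers of observations and balanced designs can be checked before any analysis is attempted. The sketch below is one possible way of doing this, assuming the data are held as (group, value) pairs. The function name `check_design`, the example records, and the threshold of three observations per group are all invented for illustration; the appropriate minimum depends on the test being planned.

```python
from collections import Counter


def check_design(observations, min_per_group=5):
    """Summarise how many observations fall in each group.

    observations: list of (group_label, value) pairs.
    min_per_group: illustrative threshold, not a statistical rule.
    """
    counts = Counter(group for group, _ in observations)
    # A design is balanced when every group has the same number of points.
    balanced = len(set(counts.values())) == 1
    too_small = sorted(g for g, n in counts.items() if n < min_per_group)
    return {"counts": dict(counts), "balanced": balanced, "too_small": too_small}


# Hypothetical records: treatment label and a measured value.
records = [
    ("control", 12), ("control", 9), ("control", 11),
    ("grazed", 7), ("grazed", 5),
]
report = check_design(records, min_per_group=3)
print(report)
```

Running a check like this on a pilot data set, or on the planned sampling schedule, shows whether a test that needs a balanced design is still an option.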
Describing data
We need a variety of techniques to describe the data that we collect. These may be used for data exploration (to check how variable a data set is, or what sort of distribution we get, etc.), to understand some aspects of the data (e.g. how diverse communities are), and for communication purposes (to be able to discuss the results, orally and in writing, with other people).
Here, simple plotting of measured variables on frequency histograms (or tables), cross‐tabulation of one (nominal) variable against another, and examining the range of the data (from the minimum to the maximum) may help to check for errors and ensure that we can choose the correct type of test for subsequent analysis. Extracting statistics, such as diversity indices to describe species richness, or evenness to describe how equal the proportion of species is within a community, can be important to assess what sort of community we have. Likewise, the estimation of population size or density might also be important. Similarly, we can use the average value of a variable to describe the magnitude of the majority of the data points (usually in conjunction with some measure of how variable the values are and how many data points were collected). These descriptive statistical techniques are reviewed later and more detail can be found in Wheater and Cook (2000, 2015).
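The descriptive measures mentioned above can be sketched in a few lines. The example below reports the range, average, variability and sample size of a variable, and then calculates one diversity index and one evenness measure. The text does not name particular indices, so the choice of the Shannon index and Pielou's evenness here is an assumption, as are the function names and the example counts.

```python
import math
import statistics


def describe(values):
    """Range, mean, standard deviation and sample size of a variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


def shannon_diversity(abundances):
    """Shannon index H' = -sum(p * ln p) over species that are present."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(abundances):
    """Pielou's J' = H' / ln(S), where S is the number of species present."""
    richness = sum(1 for a in abundances if a > 0)
    if richness < 2:
        return math.nan  # evenness is undefined for fewer than two species
    return shannon_diversity(abundances) / math.log(richness)


# Hypothetical data: birds counted per woodland, and individuals per species.
bird_counts = [14, 9, 21, 17, 12]
species_abundances = [10, 10, 10, 10]

print(describe(bird_counts))
print(shannon_diversity(species_abundances))
print(pielou_evenness(species_abundances))
```

With four equally abundant species the evenness is at its maximum of 1, which makes this a convenient case for checking the calculation.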
Table 1.3 Common statistical tests. Note that in each case, there are possible questions (and analyses) dealing with more than two samples and/or variables – see Chapter 5 for further details.
Example question | Null hypothesis | Type of test | Data required |
Is there a difference between the number of birds found in deciduous woodlands and coniferous woodlands? | There is no significant difference between the number of birds in deciduous and coniferous woodlands. | Difference tests, e.g. a t test or a Mann–Whitney U test (p. 305). | Two variables: one nominal describing the woodland type and one based on either measurements (i.e. actual numbers) or on a ranked scale that describes the number of birds. |
Is there a relationship between the number of birds and the size of the woodland? | There is no significant relationship between the number of birds and the size of the woodland. | Relationship tests, e.g. correlation analysis (p. 307). | Two variables: one (either measured or ranked) that describes the number of birds and one (either measured or ranked) that describes the size of the woodland. |
Is there an association between whether birds are resident or not and whether the woodlands are deciduous or coniferous? | There is no significant association between the frequency of residency and the frequency of woodland type. | Frequency analysis, e.g. a Chi‐square test (p. 312). | Two variables: one nominal describing the residency status of the birds and one nominal describing the woodland type. |
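To make the last row of Table 1.3 more concrete, the sketch below calculates a Chi‐square statistic for a table of frequencies, using only the observed counts and the expected counts derived from the row and column totals. The function name `chi_square` and the counts are invented for illustration, and no continuity correction is applied. In practice a statistical package would normally be used, and would also report the probability.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a frequency table.

    table: list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were not associated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts. Rows: resident, non-resident birds.
# Columns: deciduous, coniferous woodland.
observed_counts = [
    [30, 10],
    [20, 40],
]
statistic, df = chi_square(observed_counts)
print(statistic, df)
```

For a two by two table there is one degree of freedom, and the statistic is compared with the critical value of 3.841 at the 5% level. A larger value leads us to reject the null hypothesis of no association.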
Asking questions about data
If we wish to ask specific questions of the data, then we are in the realm of inferential statistics. These usually involve the testing of hypotheses. It is standard practice to set up a null hypothesis alongside the questions to be asked. The null hypothesis states that there is no significant difference between samples (or relationship between variables, or association between categories of variables), and the test estimates how likely the observed result would be if this were true. So if we wish to know whether there is a difference between two samples (e.g. comparing the number of birds found in deciduous woodlands with the number found in coniferous woodlands), then we actually test the null hypothesis that: there is no significant difference between the number of birds in deciduous and coniferous woodlands. Note that we are looking at ‘significant’ differences. These are differences that are unlikely to have resulted from random variation in the individual woodlands sampled. For this we need a method that tests the null hypothesis that there is no significant difference in the sample averages. In addition to difference tests between samples, there are also relationship tests between variables, and tests designed to examine associations between categories of variables. Table 1.3 summarises some commonly used, relatively simple, statistical approaches to these research questions.
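The idea of a difference that is unlikely to have resulted from random variation can be illustrated with a permutation test. This is not one of the tests listed in Table 1.3, but it is used here because it shows the logic of a null hypothesis directly: if woodland type made no difference, the labels could be shuffled without changing the result much. The function name `permutation_test`, the number of permutations and the bird counts below are all assumptions made for the example.

```python
import random
import statistics


def permutation_test(sample_a, sample_b, n_permutations=5000, seed=0):
    """Estimate how often shuffled labels give a difference in means
    at least as large as the one observed."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(sample_a) - statistics.mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical numbers of birds recorded in six woodlands of each type.
deciduous = [20, 22, 25, 23, 21, 24]
coniferous = [10, 12, 11, 13, 9, 14]

observed_difference, p_value = permutation_test(deciduous, coniferous)
print(observed_difference, p_value)
```

A small probability suggests that the observed difference in averages would rarely arise from random variation alone, so the null hypothesis of no difference would be rejected.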
Since there are various questions that we might ask as part of an investigation, it is important to be clear about possible analysis methods in advance of any sampling. The choice of test depends not only on the question being asked, but also on the data types being used. Where data are ranked, but not measured (i.e. ordinal data – p. 27) then a suite of tests called nonparametric tests may be used. The alternative (using parametric tests) is more powerful and generally preferred, but requires data to be on a measurement scale (i.e. interval/ratio data). Therefore, it is usually an advantage to obtain measurement data rather than to rank data wherever possible. Even where measurements are taken, parametric tests may not be the most appropriate. This is because most parametric tests require the data